Commit Graph

11 Commits

Author SHA1 Message Date
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991) 2026-01-15 05:50:37 -08:00
* quant stash
* blockwise_quant
* precommit
* rm tensor.cut
* tp ok
* add swiglu
* rm outdate code
* fix activate ut
* change baseline
* fix baseline error
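
The commit title suggests folding the repo's block-wise FP8 quant variants into one primitive: split the tensor into fixed-size groups, take each group's absmax as its scale, and cast. A minimal CUDA sketch of that shape, assuming 128-element groups, e4m3 (largest finite value 448), and CUDA 11.8+ for cuda_fp8.h; names and layout are illustrative, not the #5991 implementation:

    #include <cuda_fp8.h>  // __nv_fp8_e4m3 (CUDA 11.8+)
    #include <math.h>

    constexpr int kGroup = 128;            // quantization group size (assumed)
    constexpr float kFp8E4m3Max = 448.0f;  // largest finite |value| in e4m3

    // One thread block quantizes one group: find the group absmax, set
    // scale = absmax / 448, then cast x / scale to fp8 e4m3.
    __global__ void block_wise_quant_fp8(const float* __restrict__ x,
                                         __nv_fp8_e4m3* __restrict__ y,
                                         float* __restrict__ scale, int n) {
      __shared__ float smem[kGroup];
      int i = blockIdx.x * kGroup + threadIdx.x;
      float v = (i < n) ? x[i] : 0.0f;
      smem[threadIdx.x] = fabsf(v);
      __syncthreads();
      for (int s = kGroup / 2; s > 0; s >>= 1) {  // shared-memory max tree
        if (threadIdx.x < s)
          smem[threadIdx.x] = fmaxf(smem[threadIdx.x], smem[threadIdx.x + s]);
        __syncthreads();
      }
      float sc = fmaxf(smem[0], 1e-6f) / kFp8E4m3Max;  // guard all-zero groups
      if (threadIdx.x == 0) scale[blockIdx.x] = sc;
      if (i < n) y[i] = __nv_fp8_e4m3(v / sc);
    }
    // launch: block_wise_quant_fp8<<<(n + kGroup - 1) / kGroup, kGroup>>>(...)
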
Ryan 724045c426 add some op infershape&dtype (#5762) 2025-12-26 16:17:39 +08:00
Yuanle Liu cdc0004894 Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611) 2025-12-17 13:59:06 +08:00
This reverts commit 73e1d6aa90.
fxyfxy777 73e1d6aa90 [Feature] add ue8m0 for per_token_quant_fp8 (#5563) 2025-12-16 18:40:12 +08:00
* ue8m0
* add default arg
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
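
For context on this (later reverted) feature: ue8m0 is an exponent-only 8-bit scale format, with no sign or mantissa bits, so every scale is a power of two; per the OCP MX spec the byte encodes 2^(e - 127). A hedged sketch of how a per-token scale might be encoded, not taken from #5563:

    #include <stdint.h>
    #include <math.h>

    // Encode a positive fp32 scale as ue8m0: round up to the next power of
    // two and store only the exponent byte (bias 127, value = 2^(byte-127)).
    // Rounding up keeps |x| / scale inside fp8 range. Hypothetical helper.
    __host__ __device__ inline uint8_t scale_to_ue8m0(float scale) {
      int e = (int)ceilf(log2f(scale));           // smallest 2^e >= scale
      e = e < -127 ? -127 : (e > 127 ? 127 : e);  // clamp to encodable range
      return (uint8_t)(e + 127);
    }
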
周周周 95243f012c [Others] add PADDLE_ENFORCE (#5288) 2025-11-28 14:23:35 +08:00
Ryan e25c067f70 [OP] Add InferShape&InferDtype for per_token_quant_padding (#4667) 2025-10-30 10:28:26 +08:00
* add InferShape&InferDtype for per_token_quant_padding
* fix codestyle
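
Paddle custom ops register static shape and dtype inference through SetInferShapeFn/SetInferDtypeFn on PD_BUILD_OP. A sketch of what such a registration could look like for per_token_quant_padding, assuming an input of [num_tokens, hidden], an fp8 output of the same shape, and one fp32 scale per token; the tensor names, the FLOAT8_E4M3FN dtype, and the shape rule are assumptions, not the #4667 code:

    #include "paddle/extension.h"

    // Assumed contract: input [num_tokens, hidden]; outputs an fp8 tensor of
    // the same shape plus one fp32 scale per token. The real op may pad.
    std::vector<std::vector<int64_t>> PerTokenQuantInferShape(
        const std::vector<int64_t>& x_shape) {
      return {x_shape, {x_shape[0]}};
    }

    std::vector<paddle::DataType> PerTokenQuantInferDtype(
        const paddle::DataType& x_dtype) {
      return {paddle::DataType::FLOAT8_E4M3FN, paddle::DataType::FLOAT32};
    }

    PD_BUILD_OP(per_token_quant_padding)
        .Inputs({"x"})
        .Outputs({"out", "scale"})
        // .SetKernelFn(...) omitted; only the static inference hooks shown
        .SetInferShapeFn(PD_INFER_SHAPE(PerTokenQuantInferShape))
        .SetInferDtypeFn(PD_INFER_DTYPE(PerTokenQuantInferDtype));
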
周周周 76513f6416 Support 45t fp8 8 GPU (#3659) 2025-08-28 10:52:53 +08:00
RichardWooSJTU e39159f3bd Add switch to apply fine-grained per token quant fp8 (#3192) 2025-08-04 19:54:03 -07:00
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
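
The "switch" here contrasts one scale for the whole activation tensor with one scale per token (row). A small host-side sketch of the two granularities, with assumed semantics (scale = absmax / 448) rather than the #3192 code:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // scale = absmax / 448 (fp8 e4m3 max), computed either once for the whole
    // tensor or once per token (row) when the fine-grained switch is on.
    std::vector<float> compute_scales(const std::vector<std::vector<float>>& x,
                                      bool per_token_quant) {
      constexpr float kFp8Max = 448.0f;
      auto row_amax = [](const std::vector<float>& row) {
        float m = 0.0f;
        for (float v : row) m = std::max(m, std::fabs(v));
        return m;
      };
      if (!per_token_quant) {      // coarse: one scale for everything
        float m = 0.0f;
        for (const auto& row : x) m = std::max(m, row_amax(row));
        return {m / kFp8Max};
      }
      std::vector<float> scales;   // fine-grained: one scale per token
      scales.reserve(x.size());
      for (const auto& row : x) scales.push_back(row_amax(row) / kFp8Max);
      return scales;
    }
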
Jiang-Jia-Jun 05c670e593 [Sync] Update to latest code (#2679) 2025-07-03 15:43:53 +08:00
* [Sync] Update to latest code
* Add new code files
* Add new code files
* update code
* Try to fix build.sh
* Try to fix build.sh
* Update code
* Update requirements.txt
* Update code
Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
MARD1NO ac5f860536 use shfl_xor_sync to reduce redundant shfl broadcast 2025-06-30 13:12:21 +08:00
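
A __shfl_down_sync reduction tree leaves the result only in lane 0, so kernels that need it in every lane follow up with a broadcast shuffle; the xor (butterfly) pattern ends with the result in all 32 lanes and drops that extra step. A minimal sketch of the pattern named in the commit, not the commit's own code:

    // Butterfly max-reduction over a 32-lane warp. After the loop every lane
    // holds the warp maximum, so no follow-up broadcast shuffle is needed.
    __device__ inline float warp_reduce_max(float v) {
      #pragma unroll
      for (int mask = 16; mask > 0; mask >>= 1)
        v = fmaxf(v, __shfl_xor_sync(0xffffffffu, v, mask));
      return v;
    }
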
jiangjiajun 684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00