Commit Graph

7 Commits

Author SHA1 Message Date
JYChen c6d8fbe526 [BugFix] fix log with paddlefleet.ops (#6528) 2026-02-27 14:34:29 +08:00
AIbin 0eb87467f8 [BugFix]fix RL bug about blockwisefp8 (#6466)
* fix RL bug about blockwisefp8

* fix the same bug in moe

* fix RL FP8 bug
2026-02-12 09:15:29 +08:00
JYChen 40c952e7b5 fix deepgemm import (#6451)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-02-11 20:10:01 +08:00
JYChen 9bcd863902 [Others] support import deepgemm/deepep from fleet ops (#6351)
* update paddleformers to v1.0

* only change import fleetpath
2026-02-09 11:53:13 +08:00
JYChen c745a22420 [Feature] Support Ernie FP8 on sm100 ( the fixed version) (#6304) 2026-02-03 17:47:38 +08:00
JYChen 6c685c9474 Revert "[Feature] Support Ernie FP8 on sm100 (#5593)" (#6275)
This reverts commit eb80724b71.
2026-01-30 11:22:01 +08:00
JYChen eb80724b71 [Feature] Support Ernie FP8 on sm100 (#5593)
* Deepgemm: temporarily working version

* dense part: e8m0 OK

* version where the EB model runs end to end with E8M0

* code check

* support 21b-tp2, dev_paddle

* version where single-node 4.5T ep works

* restore deleted code; single-node 4.5T ep (without cudagraph)

* eb tp

* Support SM100 block-wise FP8 inference

* refine codes, support deepgemm on sm100

* add thirdparty PFCC/DeepGEMM

* fix ep decode

* use deepep ue8m0 to fix accuracy issue

* fix FP8 TP accuracy

* upgrade Deepgemm and adapt the Hopper logic

* add ue8m0 kernel

* add ue8m0 kernel

* fix custom_ops/gpu_ops/cpp_extensions.cc

* eb output is correct

* eb5 text is right

* accuracy looks consistent on visual inspection

* self-test: accuracy aligned

* replace masked_per_token_quant; ep accuracy OK

* performance improved by about 30%

* ep runs for now, but still has issues

* self-test: results consistent

* rm test fun

* fix ep event

* update Deepgemm for graph-optimization operators

* fix build

* temporarily work around deepgemm CI build issue

* select deepgemm version based on SM

* remove useless code

---------

Co-authored-by: ckl117 <ckl117@163.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: fxyfxy777 <fxyfxy777@163.com>
2026-01-29 13:49:54 +08:00