JYChen
|
c745a22420
|
[Feature] Support Ernie FP8 on sm100 (the fixed version) (#6304)
|
2026-02-03 17:47:38 +08:00 |
|
JYChen
|
6c685c9474
|
Revert "[Feature] Support Ernie FP8 on sm100 (#5593)" (#6275)
This reverts commit eb80724b71.
|
2026-01-30 11:22:01 +08:00 |
|
JYChen
|
eb80724b71
|
[Feature] Support Ernie FP8 on sm100 (#5593)
* temporarily usable DeepGEMM version
* dense part: e8m0 OK
* version with the EB model running end to end on E8M0
* code check
* support 21b-tp2, dev_paddle
* version with single-node 4.5T EP working
* restore deleted code; single-node 4.5T EP (non-cudagraph)
* eb tp
* Support SM100 block-wise FP8 inference
* refine codes, support deepgemm on sm100
* add thirdparty PFCC/DeepGEMM
* fix ep decode
* use DeepEP ue8m0 to fix the accuracy issue
* fix FP8 TP accuracy
* upgrade DeepGEMM and adapt the Hopper logic
* add ue8m0 kernel
* add ue8m0 kernel
* fix custom_ops/gpu_ops/cpp_extensions.cc
* EB output is normal
* eb5 text is right
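* accuracy looks consistent on visual inspection
* accuracy aligned in self-testing
* replace masked_per_token_quant; EP accuracy OK
* performance improved by about 30%
* EP runs for now but still has issues
* self-test results consistent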
* rm test fun
* fix ep event
* update graph-optimization operators for DeepGEMM
* fix build
* temporarily work around the DeepGEMM CI build issue
* select the DeepGEMM version based on SM
* remove useless code
---------
Co-authored-by: ckl117 <ckl117@163.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: fxyfxy777 <fxyfxy777@163.com>
|
2026-01-29 13:49:54 +08:00 |
|
xiaoxiaohehe001
|
00a01ae024
|
[Feature] Support redundant expert for eplb (#5918)
* [BugFix] support redundant expert for eplb
* support redundant expert for eplb
* support redundant expert for eplb
* update
* fix ci eplb
|
2026-01-09 17:13:24 +08:00 |
|
Yuanle Liu
|
5e729bc2ba
|
[OPs] ep_moe_expert_dispatch.cu dispatch num_experts_per_rank 5 (#5890)
|
2026-01-06 10:39:35 +08:00 |
|
周周周
|
ab553b3b8b
|
revert cuda_check (#5883)
|
2026-01-05 20:51:31 +08:00 |
|
周周周
|
e3957a5ebc
|
[Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620)
|
2026-01-04 11:21:15 +08:00 |
|
Longzhi Wang
|
11329ee35e
|
[Model] support mode config for expert_dispatch (#5748)
|
2025-12-29 13:37:20 +08:00 |
|
Ryan
|
724045c426
|
add some op infershape&dtype (#5762)
|
2025-12-26 16:17:39 +08:00 |
|
周周周
|
a36d60aa18
|
[FIX BUG] fix bug in TP in permute_x_fp8_kernel (#5350)
* commit
* commit
* commit
* commit
* commit
* commit
|
2025-12-03 05:17:37 -08:00 |
|
Sunny-bot1
|
3629db4129
|
[Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8
* support dynamic w4afp8
* add test
* fix
* fix
---------
Co-authored-by: zhoutianzi666 <17801055074@163.com>
|
2025-12-02 18:56:16 +08:00 |
|
周周周
|
fb7f951612
|
[UNITTEST] add test (#5305)
|
2025-12-02 17:59:01 +08:00 |
|
chen
|
aa35ce449d
|
[Optimization] EP empty_input_forward Remove Communication (#5254)
|
2025-12-01 21:10:40 +08:00 |
|
周周周
|
95243f012c
|
[Others] add PADDLE_ENFORCE (#5288)
|
2025-11-28 14:23:35 +08:00 |
|
yangjianfengo1
|
ae7bee8122
|
【New Feature】W4afp8 supports per group quantization (#4987)
* w4afp8 supports per group
* code style
* fix transpose
* revert fast hadamard
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
|
2025-11-13 19:17:27 +08:00 |
|
gaoziyuan
|
896e3bb606
|
[NewFeature] add ep rollout model init and update/clear ep buffer (#4039)
* fix gid
* merge
* fix test
* fix bug
* fix
* fix ci
|
2025-09-17 20:24:53 +08:00 |
|
Sunny-bot1
|
442543cd6b
|
fix ep wint8 (#4102)
|
2025-09-16 11:05:33 +08:00 |
|
co63oc
|
2033450391
|
rename ep_moe_prefill_func ep_moe_expert_dispatch (#3938)
|
2025-09-08 15:19:28 +08:00 |
|