Commit Graph

161 Commits

Author SHA1 Message Date
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991)
* quant stash

* blockwise_quant

* precommit

* rm tensor.cut

* tp ok

* add swiglu

* rm outdate code

* fix activate ut

* change baseline

* fix baseline error
2026-01-15 05:50:37 -08:00
lizexu123 6619298b50 【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)
* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
Cheng Yanfei fbcccaa750 [Intel HPU] enable MoE EP for hpu (#5855)
* enable HPU MoE EP

* MoE intermediate_scale stack

* enable loader_v1 esp for tensor_wise_fp8 TP or EP

* modify activation_scale name
2026-01-15 13:08:00 +08:00
RAM b3f59fd9b5 [RL][CI] Support Async R3 And Add Accuracy Test (#5937)
* add bs1 r3 test case

* async put

* r3 test case 1.0

* success run eb5

* refine test case

* pre-commit

* add eb45 & glm testcase

* format code

* add p2pstore requirements

* support only last turn

* R3 use worker log

* refine code &fix ci bug

* refine error mesg

* fix empty input bug

* Success set acc ci of eb45 and glm45

* refine code

* fix bug
2026-01-14 04:25:06 -08:00
xiaoxiaohehe001 00a01ae024 [Feature] Support redundant expert for eplb (#5918)
* [BugFix] support redundant expert for eplb

* support redundant expert for eplb

* support redundant expert for eplb

* update

* fix ci eplb
2026-01-09 17:13:24 +08:00
Ryan 3e74bacc5e add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op (#5847) 2026-01-07 16:17:55 +08:00
lizexu123 1d3ae7c024 [BugFix] fix w4afp8 tp=8 (#5868)
* fix w4afp8 tp=8

* fix
2026-01-05 18:59:02 +08:00
ming1753 f50e1bcc16 [Others] enable use PFCC deep_ep (#5822)
* upstream deep_ep

* fix bug

* fix bug

* modify env name
2026-01-05 02:07:01 -08:00
周周周 dc13344ab8 [Optimization] add del to decrease peak memory in MoE prefill (#5863) 2026-01-05 14:01:48 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
Ryan eb782a0225 [BugFix] Fix return value inconsistency for ep_moe_expert_combine op (#5812) 2025-12-29 16:44:00 +08:00
Nyakku Shigure 11227e00bb [GraphOptimization] Wrap deep gemm and triton as python op (#5673)
* [GraphOptimization] Wrap deep gemm and triton as python op

* add unitest to _base_test && compatibility

* paddle.static.MetaTensor -> "paddle.static.MetaTensor"

* mv register_custom_python_op

* rename yaml

---------

Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>
2025-12-24 15:23:46 +08:00
bukejiyu d1c6e57341 [Others] upgrade paddleformer to 0.4.0 (#5599) 2025-12-23 05:08:01 -08:00
Sunny-bot1 04035e4ebf support w4afp8 two stage (#5608) 2025-12-22 15:13:05 +08:00
Sunny-bot1 40f3897a4e support w4afp8 moe offline permute & load (#5613) 2025-12-22 15:12:57 +08:00
Longzhi Wang d8587e987e [Model] tp+ep support v1_loader (#5465)
* [Model] tp+ep support v1_loader

* fix

* fix mtp_linear

* fix mtp_linear

* fix

* fix

* fix v0 loader

* fix

* Add get_tensor for ep

* fix linear weight_loader

* fix typo

* fix
2025-12-18 14:31:54 +08:00
zhupengyang 8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
fmiao2372 404cf0ece4 [Intel HPU] enable tensor_wise_fp8 (#5324)
* [Intel HPU] enable tensor_wise_fp8

* update code based on comments

* fix code style issue

* fix bug about RP 5138

* mv kv_cache modifications to HPU backend

* fix FP8 Precision Issues

* fix FP8 Precision Issues

* Add quantization UT

---------

Co-authored-by: yanfeich <yanfei.cheng@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-17 16:45:03 +08:00
RAM 6fc5eccf83 [RL] R3 Support RDMA Store (#5467)
* [RL] R3 support rdma store

* refine notes

* refine code

* disable prefix cache

* support preempted task and put cpu tensor
2025-12-16 16:50:13 +08:00
bukejiyu 4066dfb4a6 RL fix (#5503) 2025-12-11 19:25:27 +08:00
周周周 ff353b922f [Others] update tbo related code (#5485)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-12-11 12:34:46 +08:00
Sunny-bot1 364197c4b5 support w4afp8 mtp (#5429) 2025-12-08 20:24:00 +08:00
RAM b2908b8e82 [New][RL] Support Rollout Routing Replay (#5405)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 22:06:26 +08:00
Jiang-Jia-Jun c45e064f3d Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)
This reverts commit 96d2d4877b.
2025-12-05 20:19:39 +08:00
RAM 96d2d4877b [RL] Support Rollout Routing Replay (#5321)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 20:01:33 +08:00
周周周 c83dc58105 [Feature] support Two batch overlap, mainly used in Prefill (#5078) 2025-12-05 14:58:50 +08:00
Longzhi Wang 5cd17fd662 [Models] Add forward_meta to moe models' forward function (#5138)
* [Models] Add forward_meta to moe models' forward function

* fix missing param

* fix

* fix

* fix forward_meta

* fix test and remove chunked MoE releated in config

* fix test

* fix

* fix
2025-12-04 13:26:58 +08:00
fmiao2372 209006e6a6 [Intel HPU] fix memory fragmentation issue due to warmup process and fix moe all_reduce issue (#5357) 2025-12-04 11:29:41 +08:00
lzy 690bcb8e50 [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5315) 2025-12-03 13:33:15 +08:00
Sunny-bot1 3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
K11OntheBoat 2e1680838f [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy

* clean some log info

---------

Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-02 14:11:50 +08:00
chen aa35ce449d [Optimization] EP empty_input_forward Remove Communication (#5254) 2025-12-01 21:10:40 +08:00
Longzhi Wang add524d80c [Feature] support chunked moe (#4575)
* [Feature] support chunked moe

* update

* update

* fix and add test

* update

* fix conflict and modity test

* fix fused_moe

* fix fused_moe

* fix docstring

* fix

* fix typo

* fix test

* fix

* fix

* fix test

* fix test
2025-12-01 15:17:18 +08:00
fmiao2372 2c7683d551 [Intel HPU] change MoE weights and scales from list to tensor and add… (#5289)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* [Intel HPU] change MoE weights and scales from list to tensor and add q/k rms norm

* update doc

* move HPU_CHUNK_SIZE into envs
2025-11-28 19:17:05 +08:00
Yuanle Liu cb56d46694 [Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
* rename nranks to tp_size and fix bias in v1 loader

* fix

* update
2025-11-26 05:09:09 -08:00
chen 209970836e [BugFix] BF16 MoE Cutlass Backend Support EP (#5242) 2025-11-26 19:16:22 +08:00
xiaoxiaohehe001 e150a418d4 support moe offline quant (#5142) 2025-11-24 18:59:18 +08:00
xiaoxiaohehe001 95f3c8c641 [Fix] Fix eplb bug and support fp8 load weight (#5178)
* fix eplb part2

* fix eplb part2

* fix eplb part2
2025-11-24 15:31:37 +08:00
xiaoxiaohehe001 6471dade4a [Fix] Fix noaux ep test (#5161)
* support noaux eplb

* noaux_eplb

* noaux_eplb

* noaux_eplb

* noaux_eplb
2025-11-21 16:36:41 +08:00
xiaoxiaohehe001 6ca2651995 [Feature] Support noaux for eplb (#5143)
* support noaux eplb

* noaux_eplb

* noaux_eplb

* noaux_eplb
2025-11-21 14:10:32 +08:00
Ryan 0857099191 mv import (#5146) 2025-11-20 19:25:56 +08:00
Sunny-bot1 bde97e09f7 support dynamic activation quant for w4afp8 (#5117) 2025-11-19 21:11:16 +08:00
Sunny-bot1 43f0c7557e [Feature] Add an unquantized option for MoE and Dense quant type (#4813) 2025-11-19 16:24:03 +08:00
bukejiyu a82f25ea7b [RL]Resolve shape mismatch problems in RL-related modules (#5032)
* RL fix

* update
2025-11-19 11:12:48 +08:00
MingkunZhang a36c958c66 [Metax] support default_v1 loader based #4988 (#5001)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-11-18 09:44:30 +08:00
yangjianfengo1 3afb717995 【Fix】fix deepep dispatch (#5036)
* fix dispatch

* fix dispatch

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
2025-11-17 10:34:01 +08:00
yzwu 3b80a799ab [Iluvatar][CI] Fix moe_expert_dispatch cannot support dequant_scale (#5012)
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-11-17 10:18:42 +08:00
yangjianfengo1 ae7bee8122 【New Feature】W4afp8 supports per group quantization (#4987)
* w4afp8 支持per group

* code style

* fix transpose

* revert fast hardmard

---------

Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>
2025-11-13 19:17:27 +08:00
ming1753 3148dbca06 [BugFix] fix VL fp8 bug when moe token_num is 0 (#4928)
* [BugFix] fix VL fp8 bug when moe token_num is 0

* fix bug

* format

* fix bug
2025-11-12 21:19:36 +08:00
yzwu 76e60e98f8 [Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972) 2025-11-12 14:13:40 +08:00