FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 08:21:53 +08:00

Author	SHA1	Message	Date
fxyfxy777	4c92035f2d	[Feature] Unify fp8 block_wise quant ops (#5991 ) * quant stash * blockwise_quant * precommit * rm tensor.cut * tp ok * add swiglu * rm outdate code * fix activate ut * change baseline * fix baseline error	2026-01-15 05:50:37 -08:00
lizexu123	6619298b50	【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007 ) * update w4afp8 * build.sh ok * support cuda_graph * fix * add test * fix max_tokens_per_expert * >=70 * fix * compute_max_tokens_from_prefix_sum in w4afp8 * compute_max_tokens use cub	2026-01-15 19:18:42 +08:00
Cheng Yanfei	fbcccaa750	[Intel HPU] enable MoE EP for hpu (#5855 ) * enable HPU MoE EP * MoE intermediate_scale stack * enable loader_v1 esp for tensor_wise_fp8 TP or EP * modify activation_scale name	2026-01-15 13:08:00 +08:00
RAM	b3f59fd9b5	[RL][CI] Support Async R3 And Add Accuracy Test (#5937 ) * add bs1 r3 test case * async put * r3 test case 1.0 * success run eb5 * refine test case * pre-commit * add eb45 & glm testcase * format code * add p2pstore requirements * support only last turn * R3 use worker log * refine code &fix ci bug * refine error mesg * fix empty input bug * Success set acc ci of eb45 and glm45 * refine code * fix bug	2026-01-14 04:25:06 -08:00
xiaoxiaohehe001	00a01ae024	[Feature] Support redundant expert for eplb (#5918 ) * [BugFix] support redundant expert for eplb * support redundant expert for eplb * support redundant expert for eplb * update * fix ci eplb	2026-01-09 17:13:24 +08:00
Ryan	3e74bacc5e	add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op (#5847 )	2026-01-07 16:17:55 +08:00
lizexu123	1d3ae7c024	[BugFix] fix w4afp8 tp=8 (#5868 ) * fix w4afp8 tp=8 * fix	2026-01-05 18:59:02 +08:00
ming1753	f50e1bcc16	[Others] enable use PFCC deep_ep (#5822 ) * upstream deep_ep * fix bug * fix bug * modify env name	2026-01-05 02:07:01 -08:00
周周周	dc13344ab8	[Optimization] add del to decrease peak memory in MoE prefill (#5863 )	2026-01-05 14:01:48 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
Ryan	eb782a0225	[BugFix] Fix return value inconsistency for `ep_moe_expert_combine` op (#5812 )	2025-12-29 16:44:00 +08:00
Nyakku Shigure	11227e00bb	[GraphOptimization] Wrap deep gemm and triton as python op (#5673 ) * [GraphOptimization] Wrap deep gemm and triton as python op * add unitest to _base_test && compatibility * paddle.static.MetaTensor -> "paddle.static.MetaTensor" * mv register_custom_python_op * rename yaml --------- Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>	2025-12-24 15:23:46 +08:00
bukejiyu	d1c6e57341	[Others] upgrade paddleformer to 0.4.0 (#5599 )	2025-12-23 05:08:01 -08:00
Sunny-bot1	04035e4ebf	support w4afp8 two stage (#5608 )	2025-12-22 15:13:05 +08:00
Sunny-bot1	40f3897a4e	support w4afp8 moe offline permute & load (#5613 )	2025-12-22 15:12:57 +08:00
Longzhi Wang	d8587e987e	[Model] tp+ep support v1_loader (#5465 ) * [Model] tp+ep support v1_loader * fix * fix mtp_linear * fix mtp_linear * fix * fix * fix v0 loader * fix * Add get_tensor for ep * fix linear weight_loader * fix typo * fix	2025-12-18 14:31:54 +08:00
zhupengyang	8735cb5045	[XPU] refactor moe ffn (#5501 ) - remove BKCL_DISPATCH_ALL_GATHER - support sparse mode - support moe quant_method	2025-12-18 14:14:05 +08:00
fmiao2372	404cf0ece4	[Intel HPU] enable tensor_wise_fp8 (#5324 ) * [Intel HPU] enable tensor_wise_fp8 * update code based on comments * fix code style issue * fix bug about RP 5138 * mv kv_cache modifications to HPU backend * fix FP8 Precision Issues * fix FP8 Precision Issues * Add quantization UT --------- Co-authored-by: yanfeich <yanfei.cheng@intel.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-17 16:45:03 +08:00
RAM	6fc5eccf83	[RL] R3 Support RDMA Store (#5467 ) * [RL] R3 support rdma store * refine notes * refine code * disable prefix cache * support preempted task and put cpu tensor	2025-12-16 16:50:13 +08:00
bukejiyu	4066dfb4a6	RL fix (#5503 )	2025-12-11 19:25:27 +08:00
周周周	ff353b922f	[Others] update tbo related code (#5485 )	2025-12-11 12:34:46 +08:00
Sunny-bot1	364197c4b5	support w4afp8 mtp (#5429 )	2025-12-08 20:24:00 +08:00
RAM	b2908b8e82	[New][RL] Support Rollout Routing Replay (#5405 ) * [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * add routing replay ci * support glm topk * support orther top_k * fix ci bug * pre-commit * only support chatcmpl * Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)" This reverts commit `c45e064f3d`. * Fix XPU and NPU bug --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-12-05 22:06:26 +08:00
Jiang-Jia-Jun	c45e064f3d	Revert "[RL] Support Rollout Routing Replay (#5321 )" (#5402 ) This reverts commit `96d2d4877b`.	2025-12-05 20:19:39 +08:00
RAM	96d2d4877b	[RL] Support Rollout Routing Replay (#5321 ) * [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * add routing replay ci * support glm topk * support orther top_k * fix ci bug * pre-commit * only support chatcmpl --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-12-05 20:01:33 +08:00
周周周	c83dc58105	[Feature] support Two batch overlap, mainly used in Prefill (#5078 )	2025-12-05 14:58:50 +08:00
Longzhi Wang	5cd17fd662	[Models] Add forward_meta to moe models' forward function (#5138 ) * [Models] Add forward_meta to moe models' forward function * fix missing param * fix * fix * fix forward_meta * fix test and remove chunked MoE releated in config * fix test * fix * fix	2025-12-04 13:26:58 +08:00
fmiao2372	209006e6a6	[Intel HPU] fix memory fragmentation issue due to warmup process and fix moe all_reduce issue (#5357 )	2025-12-04 11:29:41 +08:00
lzy	690bcb8e50	[Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5315 )	2025-12-03 13:33:15 +08:00
Sunny-bot1	3629db4129	[Quantization] Support w4afp8 MoE dynamic quantization (#5282 ) * support dynamic activation quant for w4afp8 * support dynamic w4afp8 * add test * fix * fix --------- Co-authored-by: zhoutianzi666 <17801055074@163.com>	2025-12-02 18:56:16 +08:00
K11OntheBoat	2e1680838f	[PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251 ) * Support deepseekv3 cache transfer for PD deploy * clean some log info --------- Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-12-02 14:11:50 +08:00
chen	aa35ce449d	[Optimization] EP empty_input_forward Remove Communication (#5254 )	2025-12-01 21:10:40 +08:00
Longzhi Wang	add524d80c	[Feature] support chunked moe (#4575 ) * [Feature] support chunked moe * update * update * fix and add test * update * fix conflict and modity test * fix fused_moe * fix fused_moe * fix docstring * fix * fix typo * fix test * fix * fix * fix test * fix test	2025-12-01 15:17:18 +08:00
fmiao2372	2c7683d551	[Intel HPU] change MoE weights and scales from list to tensor and add… (#5289 ) * [Intel HPU] change MoE weights and scales from list to tensor and add q/k rms norm * update doc * move HPU_CHUNK_SIZE into envs	2025-11-28 19:17:05 +08:00
Yuanle Liu	cb56d46694	[Optimization] Refine row parallel bias and nranks and moe all_reduce (#5247 ) * rename nranks to tp_size and fix bias in v1 loader * fix * update	2025-11-26 05:09:09 -08:00
chen	209970836e	[BugFix] BF16 MoE Cutlass Backend Support EP (#5242 )	2025-11-26 19:16:22 +08:00
xiaoxiaohehe001	e150a418d4	support moe offline quant (#5142 )	2025-11-24 18:59:18 +08:00
xiaoxiaohehe001	95f3c8c641	[Fix] Fix eplb bug and support fp8 load weight (#5178 ) * fix eplb part2 * fix eplb part2 * fix eplb part2	2025-11-24 15:31:37 +08:00
xiaoxiaohehe001	6471dade4a	[Fix] Fix noaux ep test (#5161 ) * support noaux eplb * noaux_eplb * noaux_eplb * noaux_eplb * noaux_eplb	2025-11-21 16:36:41 +08:00
xiaoxiaohehe001	6ca2651995	[Feature] Support noaux for eplb (#5143 ) * support noaux eplb * noaux_eplb * noaux_eplb * noaux_eplb	2025-11-21 14:10:32 +08:00
Ryan	0857099191	mv import (#5146 )	2025-11-20 19:25:56 +08:00
Sunny-bot1	bde97e09f7	support dynamic activation quant for w4afp8 (#5117 )	2025-11-19 21:11:16 +08:00
Sunny-bot1	43f0c7557e	[Feature] Add an unquantized option for MoE and Dense quant type (#4813 )	2025-11-19 16:24:03 +08:00
bukejiyu	a82f25ea7b	[RL]Resolve shape mismatch problems in RL-related modules (#5032 ) * RL fix * update	2025-11-19 11:12:48 +08:00
MingkunZhang	a36c958c66	[Metax] support default_v1 loader based #4988 (#5001 )	2025-11-18 09:44:30 +08:00
yangjianfengo1	3afb717995	【Fix】fix deepep dispatch (#5036 ) * fix dispatch * fix dispatch --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com>	2025-11-17 10:34:01 +08:00
yzwu	3b80a799ab	[Iluvatar][CI] Fix moe_expert_dispatch cannot support dequant_scale (#5012 ) Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-11-17 10:18:42 +08:00
yangjianfengo1	ae7bee8122	【New Feature】W4afp8 supports per group quantization (#4987 ) * w4afp8 supports per group * code style * fix transpose * revert fast hardmard --------- Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com> Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com>	2025-11-13 19:17:27 +08:00
ming1753	3148dbca06	[BugFix] fix VL fp8 bug when moe token_num is 0 (#4928 ) * [BugFix] fix VL fp8 bug when moe token_num is 0 * fix bug * format * fix bug	2025-11-12 21:19:36 +08:00
yzwu	76e60e98f8	[Iluvatar][CI] fix safetensors_rust.SafetensorError: framework paddle is invalid (#4972 )	2025-11-12 14:13:40 +08:00

161 Commits