周周周
73bd4ab318
[Feature] Add explicit hidden_size parameter support for FusedMoE ( #7361 )
...
[Feature] Add explicit hidden_size parameter support for FusedMoE
2026-04-13 20:24:58 +08:00
AIbin
1fb8194191
[OP][Models][Optimization] Optimize RoPE CUDA kernel and update DeepSeek V3 configuration ( #7359 )
...
* dsk del prefill mask
* dsk support 1M+ seq_len rope
* update rope tests
* Replace max_position_embeddings with max_model_len
* 1D grid: gridDim.x has a maximum size of 2^31-1, far exceeding the actual number of tokens.
2026-04-13 19:12:36 +08:00
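The bullet above argues that a flat 1D launch grid is safe for 1M+ token RoPE because CUDA's gridDim.x limit of 2^31-1 dwarfs any realistic token count. A minimal sketch of that sizing check (the helper name `rope_launch_grid` is hypothetical, not from the repo):

```python
# CUDA's gridDim.x may be at most 2**31 - 1 blocks (per the CUDA
# programming guide), so one block per token in a 1D grid comfortably
# covers sequences of 1M+ tokens, as the commit notes.
CUDA_MAX_GRID_DIM_X = 2**31 - 1

def rope_launch_grid(num_tokens: int) -> tuple:
    """Return a hypothetical 1D launch grid with one block per token."""
    if num_tokens > CUDA_MAX_GRID_DIM_X:
        raise ValueError("token count exceeds the gridDim.x limit")
    return (num_tokens,)

# Even a 1M-token sequence uses only a tiny fraction of the limit.
grid = rope_launch_grid(1_000_000)
print(grid, grid[0] / CUDA_MAX_GRID_DIM_X)
```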
AIbin
ba01d7a823
[Optimization] [OP] [Models] dsk del prefill mask ( #7313 )
...
* dsk del prefill mask
* dsk support 1M+ seq_len rope
* update rope tests
2026-04-11 19:32:27 +08:00
zhangbo9674
627f0d9cc8
[RL] change rms norm for glm ( #7269 )
...
* change rms norm for glm
* refine code
* refine code
* refine code
2026-04-10 01:02:37 -07:00
JYChen
43ace7af25
[RL] support moe-topk using topk_reduce_func ( #7218 )
...
* support moe-topk using topk_reduce_func
* fix ep error
* fix ut
* fix ut
2026-04-09 11:01:03 +08:00
AIbin
48d2bbeb74
fix dsa ( #7252 )
2026-04-08 20:21:38 +08:00
sunxin
ae2f9f4d22
[BugFix] Enable moe_gate_fp32 using FD_ENABLE_RL ( #7130 )
...
* rl gate fp32
* clean
2026-04-06 21:07:38 -07:00
AIbin
1090f8b123
[Models] support GLM4.7 Flash && Ernie_MLA ( #7139 )
...
* support GLM4.7 Flash && Ernie_MLA
2026-04-03 17:41:33 +08:00
fxyfxy777
9f3b3ce7f5
[Optimization] merge_allreduce ( #7039 )
2026-04-02 19:52:13 +08:00
YilongGuo
dd61e7e421
[Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration ( #7086 )
...
Add clear_grpah_opt_backend method that delegates to the underlying model
to clear the CUDA graph optimization backend.
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-31 13:48:25 +08:00
Nyakku Shigure
8b6bbb3504
[Optimization] Use a separate driver when using Triton with Paddle ( #6897 )
...
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-24 10:56:00 +08:00
jackyYang6
00eb12f656
[BugFix][Models] Unify PaddleFormers fused QKV TP loading and stabilize fallback TP path ( #6555 )
...
* [BugFix][Models] avoid custom all-reduce in PaddleFormers fallback TP path and tighten TP-aware layout matching
* [BugFix][Models] unify PaddleFormers fused QKV TP loading and align fallback tests
2026-03-20 16:37:58 +08:00
AIbin
bf7e2424d0
[Optimization][Feature] Supports multiple batches of DSK-DSA ( #6930 )
...
* support DSA_MUTI_BATCH
* update test topk
* update dsk-dsa
2026-03-20 15:59:22 +08:00
AIbin
4794a28f3d
opt glm5 model ( #6916 )
2026-03-19 11:13:33 +08:00
AIbin
9b117aafac
support glm-moe-dsa model ( #6863 )
2026-03-18 17:21:55 +08:00
gongweibao
a6351dea0b
[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages ( #6533 )
...
* init
* init
* fix format
* add
* add files
* add ut
* fix some
* add ut
* add more
* add
* fix pre-commit
* fix pre-commit
* fix cover
* skip long seq
* add
* add
* fix
* remove not need
* fix set attr
* fix comments
* fix comments
* fix failed tests
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-16 21:32:43 +08:00
AIbin
c9f7f5234e
[Optimization][BugFix] Optimize Deepseek networking code ( #6861 )
...
* update dsk model
* update dsk model
2026-03-16 16:52:43 +08:00
ming1753
bb925c605f
[Other] Adjust GPUModelRunner to enhance compatibility ( #6851 )
2026-03-16 14:49:19 +08:00
fxyfxy777
8eb177147c
[BugFix] rm draft code for glm ( #6810 )
...
* rm draft code for glm
* fix baseline
* fix baseline 2
2026-03-12 23:26:05 -07:00
AIbin
2b8a5b0d81
update indexer model ( #6791 )
2026-03-13 14:11:39 +08:00
fxyfxy777
250ce40b40
[Feature] use phi permute/unpermute & rm swiglu ( #6361 )
...
* tp text output works correctly
* eb5 mini text output works correctly on B cards
* eb5mini ep on B cards, text output works correctly
* default use phi moe op
* stash
* tp works correctly on H cards
* ep ok
* rm debug
* rm debug tool
* rm del ffn_out
* rm swiglu
* add envs to swiglu
* merge dev
* fix ci baseline
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix ci baseline 2
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 02:01:57 -07:00
AIbin
1118351b27
[Optimization] Update Deepseekv3.2 model and dsa-indexer networking and add some unit tests ( #6762 )
...
* add deepseek model doc
* update deepseek model doc
* update deepseek model doc
* update deepseek model doc
* cwb support DSK_V32 Model
* update DSK_V32_DSA modeling
* Ibin Support DSK_DSA
* update kernel
* update yaml
* update requirements
* update pre_commit
* update model-runner
* fix CI bug
* del start.sh
* fix iluvatar_model_runner
* update DSA & add unit tests
* update import deep_gemm
2026-03-11 15:52:54 +08:00
bukejiyu
cffa8c246c
[Others] update paddleformers 1.0.0 ( #6496 )
...
* update paddleformers 1.0.0
* update
2026-03-11 15:06:29 +08:00
AIbin
c3aceb6bdc
[Models][OP][Optimization] Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM ( #6689 )
...
* Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM
2026-03-10 15:05:14 +08:00
周周周
cebe6f7dae
clean nvfp4 related code ( #6644 )
2026-03-05 15:48:33 +08:00
周周周
3cc09418f1
support dsv3 using flashmla ( #6593 )
2026-03-03 11:09:43 +08:00
yzwu
6674131b0b
[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding ( #6553 )
2026-03-02 14:07:17 +08:00
周周周
1503443871
add dsv3 mixed deployment as EP16 TP8 ( #6525 )
2026-02-27 14:08:25 +08:00
sunxin
53aaac69da
[Optimization] Enable BF16 gate computation for GLM and Qwen ( #6457 )
...
* gate bf16
* add gate-fp32
* fix
* update baseline
* update
* update
* fix
2026-02-26 21:08:46 -08:00
jackyYang6
38c3e02470
fix paddleformers fallback ( #6465 )
2026-02-23 15:29:13 +08:00
bukejiyu
dc5917289d
[loader] support wint2 backend ( #6139 )
...
* support wint2
* update
2026-02-08 22:42:36 -08:00
chen
72fe94cb13
[Feature] support glm tp+dp+ep ( #6317 )
2026-02-05 21:47:01 +08:00
GoldPancake
183b8d325a
[RL] Support GLM MTP RL Model ( #6267 )
2026-02-04 20:14:35 +08:00
GoldPancake
fb374238e1
Revert "[RL] Support GLM MTP RL Model ( #6223 )" ( #6301 )
...
This reverts commit af6c84d48d.
2026-02-02 14:08:13 +08:00
GoldPancake
af6c84d48d
[RL] Support GLM MTP RL Model ( #6223 )
...
* support glm mtp rl model
* fix
* fix
* fix ut
* update baseline
2026-01-28 08:28:03 -08:00
ddchenhao66
6d33d5e370
[Models][BugFix] shared experts and dense mlp layer do not require TP split ( #6180 )
...
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-28 18:58:19 +08:00
Haonan Luo
82057cb71f
Support MXFP4 for GPT-OSS ( #5435 )
...
* support mxfp4 in gpt-oss
* support mxfp4 in gpt-oss
* add scope for flashinfer
* remove torch code
* update envs.FD_MXFP4_BACKEND
* update process_weights_after_loading
* update env name
* support tp in gpt-oss, add e2e test
* add flashinfer-python-paddle in requirements
* fix import error
* add test
* add test
* add test
* add test
2026-01-22 14:21:01 +08:00
jackyYang6
988e0bc338
[Feature] Add PaddleFormers fallback backend ( #5999 )
...
* feat(paddleformers): add dense text model fallback backend
* docs(paddleformers): add user guide and fix code review issues
* add fallback unit test
* precommit format
* fix pre-commit
* fix: address code review feedback
* docs: add PaddleFormers backend documentation (EN) and simplify installation
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 21:50:50 +08:00
GoldPancake
879e45f6b3
fix compute logits problem ( #6093 )
2026-01-19 20:12:14 +08:00
sunxin
9dc1c74d36
fix opt qknorm ( #6080 )
2026-01-19 12:07:20 +08:00
GoldPancake
bda38aa519
[Speculative Decoding] Support MTP for GLM-4.5-Air ( #6047 )
...
* glm mtp
* add spec neox partial rope
2026-01-16 14:35:24 +08:00
Cheng Yanfei
fbcccaa750
[Intel HPU] enable MoE EP for hpu ( #5855 )
...
* enable HPU MoE EP
* MoE intermediate_scale stack
* enable loader_v1 esp for tensor_wise_fp8 TP or EP
* modify activation_scale name
2026-01-15 13:08:00 +08:00
xiaoxiaohehe001
6f72be7c3e
[Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization ( #6037 )
...
* [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization
* [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization
2026-01-14 19:21:31 +08:00
sunxin
2533836dbb
[Optimization] Accelerate Qwen3 QK RMSNorm via Fused Triton Kernel ( #5880 )
...
* qk rmsnorm fused
* inplace
* glm
* fix
* add qknorm layer
* fix
* update
* fix qwen3 vl
* update rl baseline
* fix qwen3 vl moe
* test
* fix qwen vl moe rl
* fix
2026-01-12 05:10:21 -08:00
xiaoxiaohehe001
00a01ae024
[Feature] Support redundant expert for eplb ( #5918 )
...
* [BugFix] support redundant expert for eplb
* support redundant expert for eplb
* support redundant expert for eplb
* update
* fix ci eplb
2026-01-09 17:13:24 +08:00
CSWYF3634076
e6cdea4492
[Models] Qwen3VL and Qwen3VL-Moe CUDA graph Support ( #5962 )
...
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v2
* [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v3
2026-01-09 17:09:02 +08:00
Yuanle Liu
d4a386dfc4
Revert "Revert "[TSP] last_norm allgather move to model.py ( #5924 )" ( #5961 )" ( #5972 )
...
This reverts commit 8c3513a410.
2026-01-09 15:58:22 +08:00
Yuanle Liu
8c3513a410
Revert "[TSP] last_norm allgather move to model.py ( #5924 )" ( #5961 )
...
This reverts commit 2bb838fed9.
2026-01-09 15:20:40 +08:00
xiaoluomi
2bb838fed9
[TSP] last_norm allgather move to model.py ( #5924 )
...
* support_lastnorm_gather_split_dev
* support_lastnorm_gather_split_dev1
* support_lastnorm_gather_split_dev3
* support_lastnorm_gather_split_dev4
* support_lastnorm_gather_split_dev5
2026-01-07 23:36:33 -08:00
CSWYF3634076
d8fcb7c07d
[Models] Add Qwen3-VL Moe Model Support ( #5913 )
...
* [Model] add Qwen3vl moe model support
* [Model] add Qwen3vl moe model support remove log
* [Model] add Qwen3vl moe model support unittest
2026-01-08 11:36:42 +08:00