周周周
|
73bd4ab318
|
[Feature] 为 FusedMoE 添加 hidden_size 显式参数支持 (#7361)
[Feature] 为 FusedMoE 添加 hidden_size 显式参数支持
|
2026-04-13 20:24:58 +08:00 |
|
sunxin
|
53aaac69da
|
[Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)
* gate bf16
* add gate-fp32
* fix
* update baseline
* update
* update
* fix
|
2026-02-26 21:08:46 -08:00 |
|
chen
|
29a313a402
|
[Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100
* flash attn backend support mask
* flash attn backend run flashmask correct
* add test for flash_attn_backend and flash_attn_func
* check
* add test for fa4
* requirements.txt add fa4 whl
* check test on sm100
* fix CI conflict
* add enable_torch_proxy for flash_mask
* lazy import fa4
* check
* fix tests import
* check test_load_mpt import
|
2026-02-05 14:39:00 +08:00 |
|
RAM
|
b2908b8e82
|
[New][RL] Support Rollout Routing Replay (#5405)
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add routing replay ci
* support glm topk
* support orther top_k
* fix ci bug
* pre-commit
* only support chatcmpl
* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"
This reverts commit c45e064f3d.
* Fix XPU and NPU bug
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-12-05 22:06:26 +08:00 |
|
Jiang-Jia-Jun
|
c45e064f3d
|
Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)
This reverts commit 96d2d4877b.
|
2025-12-05 20:19:39 +08:00 |
|
RAM
|
96d2d4877b
|
[RL] Support Rollout Routing Replay (#5321)
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add routing replay ci
* support glm topk
* support orther top_k
* fix ci bug
* pre-commit
* only support chatcmpl
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-12-05 20:01:33 +08:00 |
|
Sunny-bot1
|
3629db4129
|
[Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8
* support dynamic w4afp8
* add test
* fix
* fix
---------
Co-authored-by: zhoutianzi666 <17801055074@163.com>
|
2025-12-02 18:56:16 +08:00 |
|