Yuanle Liu
|
6d3fede240
|
[OP][Feature] Unify the limit_thinking_content_length CUDA operators; support response length limits and injected sequences (#6493)
* Initial plan
* Migrate PRs #6311, #6129, #6305 to develop and merge unit tests
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* fix
* update
* fix
* fix ci
* fix ci
* Initial plan
* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* test: add disable-thinking case to test_chat_with_response_max_tokens
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* test: add both reasoning_max_tokens and response_max_tokens case
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* fix ci
* fix ci
* fix ci
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
|
2026-02-25 21:36:50 +08:00 |
|
bukejiyu
|
12d4b4cb87
|
[Feature]Support reorder ids to split prefill and decodes (#5779)
* support reorder ids
* perfect code
* fix
* fix unittest
* delete code
* fix
* add python api
* delete custom op
* update algorithm
* fix swap
* support condense
* support condense
* support mtp
* delete code
* update
* update
* update
* update
* update for other platform
* update
* fix
* fix mtp
* fix ut
* update
* fix ut
* update ut
* fix
* fix encoder_cache
* fix ci
* fix
* fix vl
* Fix performance regression
* fix
* fix
* fix mtp
* fix index->req_id mapping
* fix ut
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2026-02-03 00:28:02 -08:00 |
|
周周周
|
e237313797
|
[BugFix] allow return code 250 in tests/distributed/test_fusedmoe_ep_entry.py (#6269)
|
2026-01-29 16:00:03 +08:00 |
|
sunxin
|
bef6293552
|
[Model Runner] Add exist_prefill_flag (#6172)
|
2026-01-23 13:07:05 +08:00 |
|
周周周
|
8f035101ad
|
initial commit (#6054)
Co-authored-by: xiaoluomi <1037819816@qq.com>
|
2026-01-16 10:49:38 +08:00 |
|
周周周
|
d38cd8b40b
|
[UNITEST] add EP TP test_fused_moe CI (#5989)
|
2026-01-15 21:37:32 +08:00 |
|
RAM
|
b2908b8e82
|
[New][RL] Support Rollout Routing Replay (#5405)
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"
This reverts commit c45e064f3d.
* Fix XPU and NPU bug
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-12-05 22:06:26 +08:00 |
|
Jiang-Jia-Jun
|
c45e064f3d
|
Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)
This reverts commit 96d2d4877b.
|
2025-12-05 20:19:39 +08:00 |
|
RAM
|
96d2d4877b
|
[RL] Support Rollout Routing Replay (#5321)
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
|
2025-12-05 20:01:33 +08:00 |
|
Longzhi Wang
|
5cd17fd662
|
[Models] Add forward_meta to moe models' forward function (#5138)
* [Models] Add forward_meta to moe models' forward function
* fix missing param
* fix
* fix
* fix forward_meta
* fix test and remove chunked MoE related in config
* fix test
* fix
* fix
|
2025-12-04 13:26:58 +08:00 |
|
lzy
|
f458cc5ba4
|
[Optimization]1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5353)
* [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM
* fix test_chunked_moe
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-12-03 16:42:10 +08:00 |
|
YuBaoku
|
dfeabee123
|
[CI] Allow occasional distributed worker exit_code (#5341)
|
2025-12-03 10:56:59 +08:00 |
|
YuBaoku
|
69e003abcb
|
[CI] Fix return_code check in test_chunked_moe.py (#5326)
|
2025-12-02 15:41:26 +08:00 |
|
Longzhi Wang
|
add524d80c
|
[Feature] support chunked moe (#4575)
* [Feature] support chunked moe
* update
* update
* fix and add test
* update
* fix conflict and modify test
* fix fused_moe
* fix fused_moe
* fix docstring
* fix
* fix typo
* fix test
* fix
* fix
* fix test
* fix test
|
2025-12-01 15:17:18 +08:00 |
|
Echo-Nie
|
2aabaecbc2
|
[CI] Add five unittest (#4958)
* add unittest
* Update test_logger.py
|
2025-11-12 10:43:33 +08:00 |
|
chen
|
b134e6afe6
|
[BugFix]Dev fix custom ar unstable result (#4437)
|
2025-10-17 11:47:16 +08:00 |
|
co63oc
|
c4830ef24c
|
fix typos (#4176)
* fix typos
* fix
|
2025-09-22 14:27:17 +08:00 |
|
YUNSHEN XIE
|
3a6058e445
|
Add stable ci (#3460)
* add stable ci
* fix
* update
* fix
* rename tests dir;fix stable ci bug
* add timeout limit
* update
|
2025-08-20 08:57:17 +08:00 |
|