sunxin | a79b82ce68 | 2026-03-05 17:07:26 +08:00
  [BugFix] fix seq_lens_this_time init (#6670)

sunxin | 0dc7034ce0 | 2026-03-05 10:55:42 +08:00
  [Model Runner] Deprecate not_need_stop (#6356)

ming1753 | 02d32eea3b | 2026-03-04 11:23:28 +08:00
  Revert "[Bug Fix] Fix MM mtp incorrect rope emb (#6581)" (#6631)
    This reverts commit c5eb6b65e7.

ming1753 | c5eb6b65e7 | 2026-03-03 19:28:59 +08:00
  [Bug Fix] Fix MM mtp incorrect rope emb (#6581)

周周周 | 3cc09418f1 | 2026-03-03 11:09:43 +08:00
  support dsv3 use flashmla (#6593)

ming1753 | 344db8c8af | 2026-03-02 01:23:44 -08:00
  [BugFix] Fix mtp when token_ids_all is None (#6591)
    * fix bug

周周周 | d957ccd46d | 2026-03-02 11:18:30 +08:00
  seq_lens related tensor shape -> [max_num_seqs] (#6535)

ming1753 | 97eee75677 | 2026-02-28 15:07:43 +08:00
  [Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407)
    * Optim GPU Mem Usage
    Co-authored-by: huzesen <huzesen@baidu.com>

gongweibao | edd31e8849 | 2026-02-26 19:31:51 -08:00
  [Feature] Add Deterministic Inference Support (#6476)
    * add
    * [tests] Add Paddle attention determinism tests and refactor resource manager:
      add comprehensive determinism tests for the Paddle attention layer and
      refactor the resource manager for deterministic-mode support
    * add
    * add more
    * fix some
    * fix bugs
    * only in gpu
    * add docs
    * fix comments
    * fix some
    * fix comments
    * add more
    * fix potential problem
    * remove not need
    * remove no need
    * fix bug
    * fix bugs
    * fix comments
    * Update tests/ce/deterministic/test_determinism_verification.py
    * Update tests/inter_communicator/test_ipc_signal.py
    * Update tests/layers/test_paddle_attention_determinism.py
    * Update tests/engine/test_sampling_params_determinism.py
    * Update tests/layers/test_paddle_attention_determinism.py
    * Update tests/layers/test_paddle_attention_determinism_standalone.py
    * fix comments
    * fix import error
    * fix a bug
    * fix bugs
    * fix coverage
    * refine code
    * fix comments
    * rm not need
    * fix allreduce large tensor bug
    * mv log files
    * add files
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

GoldPancake | 2178f2829b | 2026-02-26 11:42:05 +08:00
  [Speculative Decoding] Support suffix decoding (#6403)

Yuanle Liu | 6d3fede240 | 2026-02-25 21:36:50 +08:00
  [OP][Feature] Unify the limit_thinking_content_length CUDA op; support response-length limits and injected sequences (#6493)
    * Initial plan
    * Migrate PRs #6311, #6129, #6305 to develop and merge unit tests
    * fix
    * update
    * fix
    * fix ci
    * Initial plan
    * test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py
    * test: add disable-thinking case to test_chat_with_response_max_tokens
    * test: add both reasoning_max_tokens and response_max_tokens case
    * fix ci
    Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
    Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

kevin | 52edf5e9b3 | 2026-02-12 19:56:10 +08:00
  fix mtp acceptance rate decline (#6470)

kevin | 3ce842b55b | 2026-02-10 10:29:03 +08:00
  [BugFix] add reset shared inputs when update weight dummy run (#6331)
    * fix dummy run input bug
    * update code

bukejiyu | 5bfc0938e2 | 2026-02-09 04:42:48 -08:00
  [BugFix] PD reorder fix and add ut (#6375)

周周周 | 2b4748de4f | 2026-02-09 10:47:15 +08:00
  [MTP] refactor MTP pre_process (#6358)

sunxin | ef47e6eb46 | 2026-02-04 17:25:19 +08:00
  [Others] skip to_tensor (#6342)

MingkunZhang | e109fb9a0e | 2026-02-03 23:21:35 -08:00
  [Metax][Fix] fix issues based on #6259 (#6338)

sunxin | 9b0a82cfa9 | 2026-02-04 10:49:44 +08:00
  [Model Runner] Support overlap schedule (#6259)

bukejiyu | 12d4b4cb87 | 2026-02-03 00:28:02 -08:00
  [Feature] Support reorder ids to split prefill and decodes (#5779)
    * support reorder ids
    * perfect code
    * fix
    * fix unittest
    * delete code
    * fix
    * add python api
    * delete custom op
    * update algorithm
    * fix swap
    * support condense
    * support mtp
    * delete code
    * update
    * update for other platform
    * update
    * fix
    * fix mtp
    * fix ut
    * update
    * fix ut
    * update ut
    * fix
    * fix encoder_cache
    * fix ci
    * fix
    * fix vl
    * Fix performance regression
    * fix
    * fix mtp
    * fix index->req_id mapping
    * fix ut
    Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
    Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
    Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>