GoldPancake
|
cf0df470cf
|
[Cherry-Pick][Speculative Decoding] Support suffix decoding (#6403) (#6967)
|
2026-03-23 17:33:58 +08:00 |
|
ming1753
|
568cb7102a
|
Revert "[Cherry-Pick] [Bug Fix] Fix MM mtp incorrect rope emb(#6581) (#6586)" (#6633)
This reverts commit 5eaba4e22d.
|
2026-03-04 14:02:45 +08:00 |
|
ming1753
|
5eaba4e22d
|
[Cherry-Pick] [Bug Fix] Fix MM mtp incorrect rope emb(#6581) (#6586)
* [Bug Fix] Fix MM mtp incorrect rope emb
* fix bug
|
2026-03-03 20:07:24 +08:00 |
|
周周周
|
2b4748de4f
|
[MTP] refactor MTP pre_process (#6358)
|
2026-02-09 10:47:15 +08:00 |
|
chen
|
29a313a402
|
[Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100
* flash attn backend support mask
* flash attn backend run flashmask correct
* add test for flash_attn_backend and flash_attn_func
* check
* add test for fa4
* requirements.txt add fa4 whl
* check test on sm100
* fix CI conflict
* add enable_torch_proxy for flash_mask
* lazy import fa4
* check
* fix tests import
* check test_load_mpt import
|
2026-02-05 14:39:00 +08:00 |
|
bukejiyu
|
12d4b4cb87
|
[Feature] Support reorder ids to split prefill and decodes (#5779)
* support reorder ids
* perfect code
* fix
* fix unittest
* delete code
* fix
* add python api
* delete custom op
* update algorithm
* fix swap
* support condense
* support condense
* support mtp
* delete code
* update
* update
* update
* update
* update for other platform
* update
* fix
* fix mtp
* fix ut
* update
* fix ut
* update ut
* fix
* fix encoder_cache
* fix ci
* fix
* fix vl
* Fix performance regression
* fix
* fix
* fix mtp
* fix index->req_id mapping
* fix ut
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2026-02-03 00:28:02 -08:00 |
|
freeliuzc
|
ce06c6dfb3
|
[BugFix] Fix token_penalty kernel (#6069)
* fix token_penalty kernel
* try to fix xpu
* fix xpu
* fix unit test
|
2026-01-28 12:03:05 +08:00 |
|
周周周
|
0966df78dc
|
[Others] remove stop_nums (#6182)
|
2026-01-26 12:12:47 +08:00 |
|
GoldPancake
|
bda38aa519
|
[Speculative Decoding] Support MTP for GLM-4.5-Air (#6047)
* glm mtp
* add spec neox partial rope
|
2026-01-16 14:35:24 +08:00 |
|
Yonghua Li
|
9fc2400e71
|
[BugFix] fix mtp cache attaching for pd disaggregation (#5884)
* [fix] fix mtp cache attaching for pd disaggregation
* [fix] fix test_mtp_proposer.py
|
2026-01-06 14:17:53 +08:00 |
|
kesmeey
|
ac731653b3
|
[CI] [Hackathon 9th Sprint No.12] Add unit tests for the functional module fastdeploy/spec_decode/mtp.py (#5533)
* Add unit tests for MTPProposer class in spec_decode/mtp.py
* fix: remove non-existent QuantizationConfig import in test_mtp_proposer
* fix: add logprobs_mode attribute to FakeModelConfig
* fix: fix test failures in test_mtp_proposer - fix Mock setup, remove arrival_time, add missing keys
* fix: add seq_lens_this_time initialization and kv_cache init before insert_tasks_v1
* fix: check pos_emb_type attribute existence before assertion
* test: add minimal coverage for mtp cache type, mm init, preempted
* test: fix cache_type_branches unsupported platform on 12
* test: refine MTPProposer tests for cache type, requests and chunked prefill
* chore: remove stray spec_decode copy
|
2025-12-17 20:09:45 +08:00 |
|