11 Commits

Author SHA1 Message Date
GoldPancake cf0df470cf [Cherry-Pick][Speculative Decoding] Support suffix decoding (#6403) (#6967) 2026-03-23 17:33:58 +08:00
ming1753 568cb7102a Revert "[Cherry-Pick] [Bug Fix] Fix MM mtp incorrect rope emb(#6581) (#6586)" (#6633)
This reverts commit 5eaba4e22d.
2026-03-04 14:02:45 +08:00
ming1753 5eaba4e22d [Cherry-Pick] [Bug Fix] Fix MM mtp incorrect rope emb(#6581) (#6586)
* [Bug Fix] Fix MM mtp incorrect rope emb

* fix bug
2026-03-03 20:07:24 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
chen 29a313a402 [Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100

* flash attn backend support mask

* flash attn backend run flashmask correct

* add test for flash_attn_backend and flash_attn_func

* check

* add test for fa4

* requirements.txt add fa4 whl

* check test on sm100

* fix CI conflict

* add enable_torch_proxy for flash_mask

* lazy import fa4

* check

* fix tests import

* check test_load_mpt import
2026-02-05 14:39:00 +08:00
bukejiyu 12d4b4cb87 [Feature] Support reorder ids to split prefill and decodes (#5779)
* support reorder ids

* perfect code

* fix

* fix unittest

* delete code

* fix

* add python api

* delete custom op

* update algorithm

* fix swap

* support condense

* support condense

* support mtp

* delete code

* update

* update

* update

* update

* update for other platform

* update

* fix

* fix mtp

* fix ut

* update

* fix ut

* update ut

* fix

* fix encoder_cache

* fix ci

* fix

* fix vl

* Fix performance regression

* fix

* fix

* fix mtp

* fix index->req_id mapping

* fix ut

---------

Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-03 00:28:02 -08:00
freeliuzc ce06c6dfb3 [BugFix] Fix token_penalty kernel (#6069)
* fix token_penalty kernel

* try to fix xpu

* fix xpu

* fix unit test
2026-01-28 12:03:05 +08:00
周周周 0966df78dc [Others] remove stop_nums (#6182) 2026-01-26 12:12:47 +08:00
GoldPancake bda38aa519 [Speculative Decoding] Support MTP for GLM-4.5-Air (#6047)
* glm mtp
* add spec neox partial rope
2026-01-16 14:35:24 +08:00
Yonghua Li 9fc2400e71 [BugFix] fix mtp cache attaching for pd disaggregation (#5884)
* [fix] fix mtp cache attaching for pd disaggregation

* [fix] fix test_mtp_proposer.py
2026-01-06 14:17:53 +08:00
kesmeey ac731653b3 [CI] [Hackathon 9th Sprint No.12] Add unit tests for functional module fastdeploy/spec_decode/mtp.py (#5533)
* Add unit tests for MTPProposer class in spec_decode/mtp.py

* fix: remove non-existent QuantizationConfig import in test_mtp_proposer

* fix: add logprobs_mode attribute to FakeModelConfig

* fix: fix test failures in test_mtp_proposer - fix Mock setup, remove arrival_time, add missing keys

* fix: add seq_lens_this_time initialization and kv_cache init before insert_tasks_v1

* fix: check pos_emb_type attribute existence before assertion

* test: add minimal coverage for mtp cache type, mm init, preempted

* test: fix cache_type_branches unsupported platform on 12

* test: refine MTPProposer tests for cache type, requests and chunked prefill

* chore: remove stray spec_decode copy
2025-12-17 20:09:45 +08:00