sunxin
|
c29e86fc9d
|
[Feature] Support mtp overlap schedule (#7001)
|
2026-04-01 14:24:26 +08:00 |
|
huicongyao
|
25d64efdc4
|
[Speculative Decoding] Refactor Eagle MTP hidden states copy (#6812)
* reformat eagle_get_hidden_states & eagle_get_self_hidden_states
* readability
* fix xpu bug
* fix coverage failure
* change launch params & parallelize position_map compute
* Fix MTP-related bugs in FastDeploy centralized inference
* fix
* refactor mtp hidden_states process
* fix
* add unittest & optimize kernel
* remove useless code
* fix
|
2026-03-25 22:54:31 -07:00 |
|
freeliuzc
|
7a6c28781b
|
[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug (#7005)
* optimize attn_mask_offset and optimize mtp usage
* delete useless branch
* fix kernel format
* fix kernel runner
|
2026-03-25 01:52:06 -07:00 |
|
gongweibao
|
a6351dea0b
|
[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)
* init
* init
* fix format
* add
* add files
* add ut
* fix some
* add ut
* add more
* add
* fix pre-commit
* fix pre-commit
* fix cover
* skip long seq
* add
* add
* fix
* remove unneeded code
* fix set attr
* fix comments
* fix comments
* fix failed tests
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
|
2026-03-16 21:32:43 +08:00 |
|
ming1753
|
bb925c605f
|
[Other] Adjust GPUModelRunner to enhance compatibility (#6851)
|
2026-03-16 14:49:19 +08:00 |
|
ming1753
|
02d32eea3b
|
Revert "[Bug Fix] Fix MM mtp incorrect rope emb (#6581)" (#6631)
This reverts commit c5eb6b65e7.
|
2026-03-04 11:23:28 +08:00 |
|
ming1753
|
c5eb6b65e7
|
[Bug Fix] Fix MM mtp incorrect rope emb (#6581)
* [Bug Fix] Fix MM mtp incorrect rope emb
|
2026-03-03 19:28:59 +08:00 |
|
ming1753
|
97eee75677
|
[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407)
* Optim GPU Mem Usage
---------
Co-authored-by: huzesen <huzesen@baidu.com>
|
2026-02-28 15:07:43 +08:00 |
|
周周周
|
2b4748de4f
|
[MTP] refactor MTP pre_process (#6358)
|
2026-02-09 10:47:15 +08:00 |
|
chen
|
29a313a402
|
[Optimization] Support FA2/FA3/FA4 with attn_mask_q (#6354)
* support FA4 sm100
* flash attn backend support mask
* flash attn backend run flashmask correct
* add test for flash_attn_backend and flash_attn_func
* check
* add test for fa4
* requirements.txt add fa4 whl
* check test on sm100
* fix CI conflict
* add enable_torch_proxy for flash_mask
* lazy import fa4
* check
* fix tests import
* check test_load_mpt import
|
2026-02-05 14:39:00 +08:00 |
|
bukejiyu
|
12d4b4cb87
|
[Feature] Support reorder ids to split prefill and decodes (#5779)
* support reorder ids
* perfect code
* fix
* fix unittest
* delete code
* fix
* add python api
* delete custom op
* update algorithm
* fix swap
* support condense
* support condense
* support mtp
* delete code
* update
* update
* update
* update
* update for other platforms
* update
* fix
* fix mtp
* fix ut
* update
* fix ut
* update ut
* fix
* fix encoder_cache
* fix ci
* fix
* fix vl
* Fix performance regression
* fix
* fix
* fix mtp
* fix index->req_id mapping
* fix ut
---------
Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2026-02-03 00:28:02 -08:00 |
|
freeliuzc
|
ce06c6dfb3
|
[BugFix] Fix token_penalty kernel (#6069)
* fix token_penalty kernel
* try to fix xpu
* fix xpu
* fix unit test
|
2026-01-28 12:03:05 +08:00 |
|
周周周
|
0966df78dc
|
[Others] remove stop_nums (#6182)
|
2026-01-26 12:12:47 +08:00 |
|
GoldPancake
|
bda38aa519
|
[Speculative Decoding] Support MTP for GLM-4.5-Air (#6047)
* glm mtp
* add spec neox partial rope
|
2026-01-16 14:35:24 +08:00 |
|
Yonghua Li
|
9fc2400e71
|
[BugFix] fix mtp cache attaching for pd disaggregation (#5884)
* [fix] fix mtp cache attaching for pd disaggregation
* [fix] fix test_mtp_proposer.py
|
2026-01-06 14:17:53 +08:00 |
|
kesmeey
|
ac731653b3
|
[CI] [Hackathon 9th Sprint No.12] Add unit tests for functional module fastdeploy/spec_decode/mtp.py (#5533)
* Add unit tests for MTPProposer class in spec_decode/mtp.py
* fix: remove non-existent QuantizationConfig import in test_mtp_proposer
* fix: add logprobs_mode attribute to FakeModelConfig
* fix: fix test failures in test_mtp_proposer - fix Mock setup, remove arrival_time, add missing keys
* fix: add seq_lens_this_time initialization and kv_cache init before insert_tasks_v1
* fix: check pos_emb_type attribute existence before assertion
* test: add minimal coverage for mtp cache type, mm init, preempted
* test: fix cache_type_branches unsupported platform on 12
* test: refine MTPProposer tests for cache type, requests and chunked prefill
* chore: remove stray spec_decode copy
|
2025-12-17 20:09:45 +08:00 |
|