Commit Graph

495 Commits

Author SHA1 Message Date
google-labs-jules[bot] 69c7dd0a19 Bolt: Optimize single element list appends
Replaced instances of `.extend([item])` with `.append(item)` in multiple files.
Using `.extend([item])` incurs memory overhead by allocating a new single-element
list and is computationally slower than calling `.append(item)` directly.

Files updated:
- fastdeploy/input/encodings/ernie_encoding.py
- fastdeploy/input/ernie4_5_vl_processor/process.py
- fastdeploy/output/token_processor.py
- fastdeploy/worker/gpu_model_runner.py
- fastdeploy/worker/metax_model_runner.py
2026-04-15 16:45:13 +00:00
GoldPancake a498720a75 [RL] Add clear_graph_opt_backend for glm4_mtp (#7378)
* add clear_grpah func

* fix spell
2026-04-15 19:44:15 +08:00
luukunn 3f84d8d893 [DataProcessor] Refactor multimodal processor: extract encoding strategies and unify MM processing pipeline (#7298)
* merge mm processor
2026-04-15 19:01:06 +08:00
Echo-Nie 8819a039c9 [Others] Fix typo (#7280)
* typo

* typo

* typo

* typo
2026-04-14 17:28:22 +08:00
xiaoxiaohehe001 abba29b348 [BugFix] fix mm rope (#7274) 2026-04-14 11:36:08 +08:00
zhupengyang 27b00cf385 [XPU] glm-4.5-air (#7071) 2026-04-14 11:31:49 +08:00
Yuanle Liu 0ddb6e461c [Optimization] 移除 num_blocks 上限限制 (#7241) 2026-04-13 07:07:41 -07:00
freeliuzc 31e2a8bbad [Speculative Decoding] Support mtp super ultra overlap in pd-split mode with insert_task overlap (#7323)
* support mtp overlap in pd-split mode with insert_task overlap
2026-04-13 19:41:17 +08:00
Nyako Shigure d659099415 [Cleanup] Replace torch proxy alias with public compat API (#7348) 2026-04-13 11:43:26 +08:00
Jiang-Jia-Jun 26d6a20c2f [Optim] Remove IPCLock between CacheManager and WorkerProcess (#7299)
* [Optim] Remove IPCLock between CacheManager and WorkerProcess

* Update envs.py

* Update worker_process.py

---------

Co-authored-by: jiang-jia-jun <jiangjiajun@baidu.com>
2026-04-12 13:59:34 +08:00
sunxin 00005c92e0 [BugFix] Fix mtp empty run issue in overlap schedule and EP model (#7300) 2026-04-10 03:29:45 -07:00
bukejiyu 14d46181b8 [Loader] add multi-thread model loading (#6877)
* multi-thread-loader

* fix ut
2026-04-09 23:40:15 -07:00
GoldPancake c1fb3112f8 [FDConfig] Support CLI args for quantization params and add cudagraph validation (#7281)
* refactor quant cli param
2026-04-10 14:13:42 +08:00
chenjian 427efadaee [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1 (#7159)
* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* [Feature] Support set PREEMPTED_TOKEN_ID in GET_SAVE_OUTPUT_V1

* fix
2026-04-08 19:30:54 +08:00
Jiajun Ji 9b970de029 [XPU] Add TP broadcast after sampling in XPU model runner to ensure consistent results across ranks. (#7096) 2026-04-08 19:26:53 +08:00
RichardWooSJTU 771d42c90b [TBO] Apply tbo to gpu_model_runner (#7165)
* apply tbo in gpu_model_runner

* fix
2026-04-08 16:55:17 +08:00
K11OntheBoat bb48bcbaa2 Split enable_mm (#7183)
Co-authored-by: liuruian <liuruian@MacBook-Pro.local>
2026-04-08 11:25:41 +08:00
GoldPancake 9d4fd19c3f [Speculative Decoding] Auto-scale CUDA graph capture sizes for speculative decoding (#7215) 2026-04-07 20:22:28 +08:00
Nana 367d37b523 fix typo (#7147) 2026-04-07 16:30:32 +08:00
huicongyao 095a11d932 fix MTP bugs in TP and overlap (#7172)
* fix MTP bugs in TP and overlap

* fix
2026-04-03 14:19:11 +08:00
cmcamdy 7a2e33098f [XPU] Refactor pre process (#6993)
* [XPU] support speculate_pre_process

* merge develop

* fix codestype

* fix mtp, support cu_seqlens_q_output

* fix mtp, support cu_seqlens_q_output

* fix test

---------

Co-authored-by: lizan1999 <lizan03@baidu.com>
2026-04-01 20:29:55 +08:00
yzwu ceaf5df350 [Iluvatar] Fix cuda graph error for tp > 1 in ernie models (#7126) 2026-04-01 19:13:34 +08:00
sunxin c29e86fc9d [Feature] Support mtp overlap schedule (#7001) 2026-04-01 14:24:26 +08:00
Yonghua Li a3cc3aa777 [BugFix] reset exist tasks signal in clear_data (#7111)
* [BugFix] reset exist tasks signal in clear_data

* [Fix] fix stale exist tasks signal after weight update

* [Chore] downgrade detected new requests log to DEBUG level

* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
chenjian 6727df8286 [Optimization] Optimize ttft for prefill pd (#6680)
* optimize ttft

* fix

* fix

* fix ci

* fix ci

* fix

* fix bug

* fix

* add comments

* fix ci

* fix

* fix ci

* fix format

* update according to review

* add comment

* fix

* fix format
2026-03-30 20:36:23 +08:00
jackyYang6 05f2d95729 [RL] Adapt async rollout checkpoint update flow (#7042)
* update checkpoint-transfer flow and control update_weights params

* test: add update_weights route validation
2026-03-30 19:19:34 +08:00
yzwu 8789329457 [Iluvatar] Support wi4a16 group_gemm (#7078) 2026-03-30 19:03:51 +08:00
GoldPancake 6693bcd0e4 [BugFix] fix clear_parameters in draft cudagraph (#7035) 2026-03-27 15:28:50 +08:00
Yonghua Li 442514252c [fix] remove all gather ep group control requests in normal cases (#7026) 2026-03-26 18:41:29 +08:00
freeliuzc 4fd877ed43 [Speculative Decoding] Support mtp expert-parallel and support different modality deploy (#7018)
* support mtp ep and support different modality

* fix default arg
2026-03-26 13:52:16 +08:00
Yonghua Li a7f52c300d [Feature] support v1 update/clear api for RL (#6761)
* [Feature] support v1 update/clear api for RL

* [fix] fix execute_model and add sleep/wakeup api

* [fix] fix mtp and key_prefix

* [chore] move _update_key_prefix to resume method

* [fix] make the interface safe to call multiple times

* [fix] fix some tiny bugs

* [chore] make small changes against pr review

* [docs] add docs for weight update

* [test] add some tests and update docs

* [style] fix code style check

* [test] fix ci

* [fix] fix stale control responses when control method timed out

* [chore] remove unused code

* [chore] fix code style

* [chore] optimize tags and key_prefix

* [test] fix ci

* [chore] fix code style

* [test] fix ci

* [fix] fix ep control

* [fix] fix ep control for engine cache queue
2026-03-25 19:18:46 +08:00
freeliuzc e87ce4b8cd [Speculative Decoding] refactor MTP and optimize spec-decoding postprocess (#6973)
* support new mtp

* refactor(speculate_decoding and mtp): optimize mtp sturcture logic. Update spec-branch status-process

* fix cuda-graph for spec-decoding

* fix xpu mtp and fix some note

* fix unittest and optmize note

* fix model status update in eos-branch
2026-03-24 10:19:01 +08:00
bukejiyu c62f6b4ea5 [Others] Fix PD reorder for MTP (#6792)
* fix pd reorder in mtp

* add ut

* update

* fix mtp
2026-03-23 21:10:22 +08:00
xiaoxiaohehe001 c1f7991aec [BugFix] add worker_process no grad (#6971) 2026-03-23 02:10:56 -07:00
sunxin 7a78001be2 fix execute_model_normal in empty run (#6968) 2026-03-23 14:07:46 +08:00
周周周 1c38da2118 Make seq_lens_this_time/decoder/encoder equal shape (#6942) 2026-03-20 15:31:52 +08:00
yzwu 8b890c0d72 [Iluvatar] refactor attn and moe code (#6887) 2026-03-18 10:31:00 +08:00
qwes5s5 3b7507a4c2 test_abort (#6743) 2026-03-17 14:06:40 +08:00
huicongyao eab429d05e fix performance drop while no spec (#6866) 2026-03-17 13:06:36 +08:00
gongweibao a6351dea0b [BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)
* init

* init

* fix format

* add

* add files

* add ut

* fix some

* add ut

* add more

* add

* fix pre-commit

* fix pre-commit

* fix cover

* skip long seq

* add

* add

* fix

* remove not need

* fix set attr

* fix comments

* fix comments

* fix failed tests

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-16 21:32:43 +08:00
ming1753 bb925c605f [Other] Adjust GPUModelRunner to enhance compatibility (#6851) 2026-03-16 14:49:19 +08:00
huicongyao 2e63d88f7a [Optimization][Speculative Decoding]Fuse padding sampling params (#6765)
* optimize speculate pre process unit test

* Add CUDA kernel for building sampling params in speculative decoding

* init infer seed in device

* format code

* add unittest & fix

* fix

* format-code

* format-code

* fix rebase

* .

* fix unitest
2026-03-12 05:05:15 -07:00
MingkunZhang a9ace998db [Metax][Fix] fix ci error based pr#6805 caused by pr#6685 (#6807) 2026-03-12 19:30:16 +08:00
RAM cdaf6dd400 [RL][Cherry-Pick] Support Fully Async and PrefixCache (#6599)
* cherry-pick  Support Fully Async and PrefixCache step 1

* copy routing_indices_cache.py from 2.4

* cherry-pick [RL] R3 Fix the bug for determining the end of a request (#6388)

* cherry-pick [RL] Clear Requests status of R3 (#6569)

* delete code

* fix rename bug

* fix status shape bug

* fix ci
2026-03-12 01:13:30 -07:00
cmcamdy 3543088d3e [XPU] rm stop nums (#6651)
* rm stop nums

* fix conflict

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-03-12 14:05:58 +08:00
RichardWooSJTU 9f0778f991 [Feature] Support EP prefill with num_worst_tokens (#6574)
* support num worst tokens

* support num worst tokens

* fix build error

* support num worst tokens: fix errors

* support num worst tokens: fix feild

* support num worst tokens: delete requiements

* replace permute and depermute op by pure cuda

* replace permute and depermute op by pure cuda

* fix ci

* fix op

* fix nan

* fix code style

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-11 17:09:07 +08:00
Yonghua Li 7811eeccaa [fix] resolve get_save_output_v1 socket name conflicts between multiple instances (#6758) 2026-03-11 15:02:32 +08:00
freeliuzc cf7934a4b2 [Speculative Decoding] Unify Spec and non-spec branch (#6685)
* optimize spec-inference architecture

* delete debug log

* optimize spec_method usage  && fix unit_test

* add claude unit-test skill

* fix some ugly bug

* enhance robustness and bounds check

* unify method & spec_method to method to avoid bug

* activate CI

* fix unit test

* Unify logprobs computation for naive and speculative decoding, fix CUDA kernel

* fix logprob bug && optimize verify kernel

* fix exist_decode() judge
2026-03-10 23:58:44 -07:00
Jiang-Jia-Jun b05a6c4206 [BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP (#6724)
* [BugFix] Support  to fix NaN bug in EP

* Optimze notion for all the funs

* Fix potential lock contention failure issues

* Update fastdeploy/inter_communicator/ipc_signal.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update envs.py

* Update default value for USE_KVCACHE_LOCK

Change default value of USE_KVCACHE_LOCK from 1 to 0.

* Update worker_process.py

* Fix suffix wrong

* Update test_prefix_cache_manager.py

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-10 21:55:32 +08:00
sunxin 812657beee fix pd overlap (#6753) 2026-03-10 20:29:54 +08:00