FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
fxyfxy777	9f3b3ce7f5	[Optimization] merge_allreduce (#7039 )	2026-04-02 19:52:13 +08:00
Yuanle Liu	1af7f80811	Revert "[BugFix][Speculative Decoding] Correct index calculation in speculate…" (#7133 ) This reverts commit `ba1aa1edff`.	2026-04-01 06:54:23 -07:00
luukunn	fa7a84926d	[Optimization]Fix tool parser (#7079 ) * fix tool parser	2026-04-01 21:20:34 +08:00
lonelygsh	ba1aa1edff	[BugFix][Speculative Decoding] Correct index calculation in speculate decoding operators (#7121 ) - Fix accept_idx calculation in spec_set_value_by_stop_seqs - Fix condition check from < to <= for token matching - Fix accept_tokens indexing logic - Remove unnecessary -1 in current_step comparison for max_think_len Co-authored-by: guanshihui] <guanshihui@baidu.com>	2026-04-01 05:36:53 -07:00
cmcamdy	7a2e33098f	[XPU] Refactor pre process (#6993 ) * [XPU] support speculate_pre_process * merge develop * fix code style * fix mtp, support cu_seqlens_q_output * fix mtp, support cu_seqlens_q_output * fix test --------- Co-authored-by: lizan1999 <lizan03@baidu.com>	2026-04-01 20:29:55 +08:00
luukunn	fdfc908e2f	[Others] reuse unit test (#7127 )	2026-04-01 18:36:00 +08:00
sunxin	c29e86fc9d	[Feature] Support mtp overlap schedule (#7001 )	2026-04-01 14:24:26 +08:00
YuBaoku	c6f0c5c3a6	[CI] Optimize test execution with single-GPU parallelism (#7085 ) * [CI] Optimize test execution with single-GPU parallelism and log collection * remove export CUDA_VISIBLE_DEVICES * fix path error * fix log_* path and debug * [CI] Optimize test execution with single-GPU parallelism and log collection	2026-04-01 14:18:40 +08:00
zhouchong	91c832f607	[Feature] Add logging parameters and error output to terminal (#7098 )	2026-04-01 13:18:42 +08:00
luukunn	3651113ee5	[DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052 ) * remove ENABLE_V1_DATA_PROCESSOR * fix unit test * fix unit test	2026-04-01 09:53:41 +08:00
qwes5s5	ee2b965f5f	adjust config info (#7054 )	2026-03-31 21:26:05 +08:00
cloudforge1	5c5dc66aa7	[CI]【Hackathon 10th Spring No.34】add unit tests for async_expert_loader (#6731 ) * [CI]【Hackathon 10th Spring No.34】add unit tests for async_expert_loader * [CI]【Hackathon 10th Spring No.34】add unit tests for async_expert_loader --------- Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-03-31 15:29:35 +08:00
qwes5s5	daa95244f7	abort requests (#6992 )	2026-03-31 11:02:26 +08:00
Yonghua Li	6d9739f360	[BugFix] fix speculative gauge metrics in multi api server (#7082 )	2026-03-31 10:52:50 +08:00
chenjian	6727df8286	[Optimization] Optimize ttft for prefill pd (#6680 ) * optimize ttft * fix * fix * fix ci * fix ci * fix * fix bug * fix * add comments * fix ci * fix * fix ci * fix format * update according to review * add comment * fix * fix format	2026-03-30 20:36:23 +08:00
jackyYang6	05f2d95729	[RL] Adapt async rollout checkpoint update flow (#7042 ) * update checkpoint-transfer flow and control update_weights params * test: add update_weights route validation	2026-03-30 19:19:34 +08:00
yzwu	8789329457	[Iluvatar] Support wi4a16 group_gemm (#7078 )	2026-03-30 19:03:51 +08:00
kevin	18062c55bb	[BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys (#6929 ) * [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [BugFix][KVCache] Fix test_get_block_hash_extra_keys_boundary_cases assertions ## Motivation In the test case `test_get_block_hash_extra_keys_boundary_cases`, the call for Block [4,8) wrongly passed `mm_idx=1`, which skipped img0[2,5); however, img0 covers token 4, and token 4 belongs to block [4,8), so it should be included in hash_keys. In addition, every assertEqual checked only hash_keys and not the returned mm_idx cursor. ## Modifications - `test_get_block_hash_extra_keys_boundary_cases`: - Switch to chained calls, passing the mm_idx returned by the previous call as the next call's argument to mimic the real call loop - Change the Block [4,8) argument from `mm_idx=1` to the previously returned `mm_idx=0`, and the expected value from `[]` to `["hash-0"]` - Change all assertions to `assertEqual((mm_idx, hash_keys), (...))` so the cursor is checked as well - `test_get_block_hash_extra_keys_no_overlap_at_boundaries`: - Change the Case B argument from `mm_idx=1` to `mm_idx=0` (traverse from the start; img-a hits the continue branch) - Add mm_idx checks to all assertions - `test_get_block_hash_extra_keys_image_crosses_block_boundary`: - Add mm_idx checks to all assertions - `test_get_block_hash_extra_keys_no_mm_inputs`: - Add an mm_idx check to the assertion - `test_get_block_hash_extra_keys_handles_multimodal_segments`: - Add mm_idx checks to the call2 and call3 assertions ## Usage or Command ```bash python -m pytest tests/cache_manager/test_prefix_cache_manager.py::TestPrefixCacheManagerCoverage -v -k "get_block_hash_extra_keys" ``` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: chengyanfu <chengyanfu@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 17:13:31 +08:00
luukunn	b9f8873367	[Optimization]Merge Text processor (#7030 ) * merge text processor * update * fix unit test * merge messages2ids * fix unit test * remove duplicate code * remove redundant code * delete code * fix unit test	2026-03-30 15:02:35 +08:00
mpgemm	1a1d048774	[Feature] Support NVFP4 Flashinfer-cutedsl MoE on SM100 (#6963 )	2026-03-30 11:37:04 +08:00
Longzhi Wang	2eea6fa97a	[BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend (#7028 ) * [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend * add constexpr and code style clean * add test * fix code style * fix test	2026-03-30 11:17:15 +08:00
mpgemm	7a20eaebe8	[Feature] Support cute cpp Encoder FA4 (#7016 ) * add cute cpp fa4 * remove comments * fix merge error * move sm_version inside the function * fix CI error	2026-03-30 10:54:56 +08:00
YuBaoku	842c60809a	[CI] Align with Paddle layer_norm kernel update (#7056 )	2026-03-27 22:58:01 +08:00
cloudforge1	11ad95ba91	[CI]【Hackathon 10th Spring No.43】add unit tests for ernie4_5_mtp (#6738 ) * [CI]【Hackathon 10th Spring No.43】add unit tests for ernie4_5_mtp * [CI]【Hackathon 10th Spring No.43】add mapping and forward branch coverage --------- Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com> Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-03-27 17:15:53 +08:00
YuBaoku	10c59f78d6	[CI] disable tests/e2e/test_Qwen3VLMoe_serving.py in unit_test (#7044 )	2026-03-27 14:15:14 +08:00
Jiaxin Sui	c3ed7db28d	[XPU] [CI] Fix xpu ci bug (#7014 ) * fix xpu ci bug * Remove unnecessary blank line in conftest.py * Update upload-artifact action to version 6 * Update _xpu_8cards_case_test.yml * fix ci bug * Change exit code on test failure to 1 * fix ci bug * fix ci bug * fix ci bug * fix ci bug * Update conftest.py	2026-03-27 10:29:34 +08:00
Zhang Yulong	a31d4bfbdf	[CI] update mtp case (#7031 )	2026-03-27 10:21:37 +08:00
huicongyao	25d64efdc4	[Speculative Decoding] Refactor Eagle MTP hidden states copy (#6812 ) * reformat eagle_get_hidden_states & eagle_get_self_hidden_states * readability * fix xpu bug * fix coverage failure * change launch params & parallelize position_map compute * Fix MTP-related bugs in FastDeploy centralized inference * fix * refactor mtp hidden_states process * fix * add unittest & optimize kernel * remove useless code * fix	2026-03-25 22:54:31 -07:00
YuBaoku	61ebac49ef	[CI] Fix test_communication.py and add port cleanup (#7021 )	2026-03-26 10:56:40 +08:00
luukunn	e6804ba97d	[Optimization]Streaming requests return complete special tokens. (#6998 ) * return special token * add completions * update * fix * add prompt_token_ids& completion_token_ids=None, * fix unit test	2026-03-26 09:49:43 +08:00
luukunn	d5cb2767d7	[Optimization] Deduplicate shared image/video utilities across VL processors (#6988 ) * step1~3 * fix import path * remove duplicate code * remove duplicate code * remove duplicate code * fix import path * update * fix import path * add unit test * fix * update * fix unit test	2026-03-26 09:49:33 +08:00
YuBaoku	b8bb34c7dd	[CI] disable tests/distributed/test_communication.py in unit_test (#7019 )	2026-03-25 20:54:55 +08:00
Yonghua Li	a7f52c300d	[Feature] support v1 update/clear api for RL (#6761 ) * [Feature] support v1 update/clear api for RL * [fix] fix execute_model and add sleep/wakeup api * [fix] fix mtp and key_prefix * [chore] move _update_key_prefix to resume method * [fix] make the interface safe to call multiple times * [fix] fix some tiny bugs * [chore] make small changes against pr review * [docs] add docs for weight update * [test] add some tests and update docs * [style] fix code style check * [test] fix ci * [fix] fix stale control responses when control method timed out * [chore] remove unused code * [chore] fix code style * [chore] optimize tags and key_prefix * [test] fix ci * [chore] fix code style * [test] fix ci * [fix] fix ep control * [fix] fix ep control for engine cache queue	2026-03-25 19:18:46 +08:00
gongweibao	48cfb608aa	[FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 (#6997 ) Most single-GPU and small-model deployments do not need 64MB custom all-reduce buffers. Lowering the default to 8MB reduces unnecessary shared memory allocation. Tests that require larger buffers now explicitly set the value. Co-authored-by: gongweibao <gognweibao@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 17:40:01 +08:00
freeliuzc	7a6c28781b	[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug (#7005 ) * optimize attn_mask_offset and optimize mtp usage * delete useless branch * fix kernel format * fix kernel runner	2026-03-25 01:52:06 -07:00
YuBaoku	aee293be0f	[CI] Optimize: add vl swap_test and remove useless code (#7000 )	2026-03-25 11:33:56 +08:00
YuBaoku	4e8d503e3c	Revert "add deepep precision test (#6984 )" (#7004 ) This reverts commit `522d12c25a`.	2026-03-25 10:50:40 +08:00
周周周	522d12c25a	add deepep precision test (#6984 )	2026-03-24 19:51:33 +08:00
SUN Dong	6cff780fdb	[RL] Support moe_topk_select using Paddle native operators and Add fused stack-transpose-quant for BlockWiseFP8 MoE weight quantization and swiglu-fp8-quant op for DeepGemmFusedMoE for training alignment (#6850 ) * [RL] Add fused stack-transpose-quant for BlockWiseFP8 MoE weight quantization * update * update * update * support custom topk in DeepGemmFusedMoeMethod apply_tp * apply_ep_prefill support moe_topk_select * update * add ut * add ut * add ut * modify doc * fix env and docs * add ut --------- Co-authored-by: zhanghonggeng <zhanghonggeng@baidu.com>	2026-03-24 11:12:39 +08:00
freeliuzc	e87ce4b8cd	[Speculative Decoding] refactor MTP and optimize spec-decoding postprocess (#6973 ) * support new mtp * refactor(speculate_decoding and mtp): optimize mtp structure logic. Update spec-branch status-process * fix cuda-graph for spec-decoding * fix xpu mtp and fix some note * fix unittest and optimize note * fix model status update in eos-branch	2026-03-24 10:19:01 +08:00
bukejiyu	c62f6b4ea5	[Others] Fix PD reorder for MTP (#6792 ) * fix pd reorder in mtp * add ut * update * fix mtp	2026-03-23 21:10:22 +08:00
wikilsh	5e469fc901	[RL][BugFix][Optimization] Support chunked part files loading and fix model path format in IPC snapshot strategy (#6852 ) * [RL] Support chunked part files loading in IPC snapshot strategy ## Motivation When using IPC snapshot for elastic recovery in RL training, loading a single large pdparams file causes a significant memory spike. This PR refactors `_update_ipc_snapshot` to support loading chunked part files to avoid the memory spike. ## Modifications Refactored `_update_ipc_snapshot` in `fastdeploy/rl/dynamic_weight_manager.py` with a three-level loading priority: 1. Chunked part files (`model_state.tpR{id}.part{N}.pdparams`): Load multiple smaller shards sequentially, freeing memory between each chunk via `gc.collect()` to avoid memory spike. 2. Single full file (`model_state.tpR{id}.pdparams`): Legacy single-file loading path (preserved for backward compatibility). 3. Shared fallback directory (`/shared_ipc_meta/...`): Oldest legacy fallback path (preserved for backward compatibility). Also fixed the rank ID in the file name pattern from hardcoded `tp0` to dynamic `paddle.distributed.get_rank()`. ## Checklist - [ ] Add at least a tag in the PR title. - [ ] Format your code, run `pre-commit` before commit. - [ ] Add unit tests. Please write the reason in this PR if no unit tests. - [ ] Provide accuracy results. - [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag. Co-Authored-By: lishuaihui <lishuaihui@baidu.com> * [RL][BugFix] Fix ambiguous model path format and add legacy fallback in IPC snapshot ## Motivation The previous snapshot file naming `model_state.tp{rank}{id}` concatenated rank and id without a separator, causing ambiguity (e.g., rank=1, id=234 and rank=12, id=34 both produce `tp1234`). Additionally, after the naming format is updated, existing checkpoints saved in the old format would fail to load during elastic recovery, causing unnecessary failures. 
## Modifications - Add dot separator between rank and id in snapshot file name: `model_state.tp{rank}{id}` → `model_state.tp{rank}.{id}` - Add Priority 3 legacy fallback to load old-format files (`model_state.tp0{id}.pdparams`) for backward compatibility during rolling upgrades - Update docstring and error message to reflect the new 4-level priority Co-Authored-By: lishuaihui <lishuaihui@baidu.com> * [RL][Test] Add unit tests for DynamicWeightManager._update_ipc_snapshot Cover all 4 loading priority branches (chunked part files, single full pdparams, legacy format, shared directory fallback) with mock-based tests to verify correct behavior without filesystem or GPU dependencies. Co-Authored-By: lishuaihui <lishuaihui@baidu.com> * [RL][Test] Remove unused import 'call' in test_update_ipc_snapshot.py Co-Authored-By: lishuaihui <lishuaihui@baidu.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * [RL] Fix snapshot part index to match filename numbering Parse part index from filename (e.g. .part0.) instead of using enumerate index, so that logs and src_type stay consistent with the actual file naming convention. Co-Authored-By: wikilsh <wiki_hui@qq.com> --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-23 16:17:41 +08:00
jc	bb881c2c0a	[PD Disaggregation] pd + cache_storage support vl model (#6906 ) * pd + cache_storage support vl model * support vl model * fix test	2026-03-23 15:35:20 +08:00
jackyYang6	634d23a38a	[Bugfix] Align thinking_budget behavior with ERNIE reasoning flow (#6934 ) * [Bugfix] Align thinking_budget behavior with ERNIE reasoning flow * [Docs] Fix thinking_budget markdown formatting * [Test] Align ernie thinking budget test with process_request_dict	2026-03-23 14:15:55 +08:00
YuBaoku	0b4c1cba9b	[CI] Change 21b ep4 to tp1_dp4 in 4_cards_tests (#6745 ) * [CI] Change 21b ep4 to tp1_dp4 in 4_cards_tests	2026-03-20 20:42:23 +08:00
jackyYang6	00eb12f656	[BugFix][Models] Unify PaddleFormers fused QKV TP loading and stabilize fallback TP path (#6555 ) * [BugFix][Models] avoid custom all-reduce in PaddleFormers fallback TP path and tighten TP-aware layout matching * [BugFix][Models] unify PaddleFormers fused QKV TP loading and align fallback tests	2026-03-20 16:37:58 +08:00
AIbin	bf7e2424d0	[Optimization][Feature]Supports multiple batches of DSK-DSA. (#6930 ) * support DSA_MUTI_BATCH * update test topk * update dsk-dsa	2026-03-20 15:59:22 +08:00
cloudforge1	aca733b95c	[CI]【Hackathon 10th Spring No.32】load_weight_utils unit test (#6740 ) * 【Hackathon 10th Spring No.32】Unit test for load_weight_utils.py * [CI]【Hackathon 10th Spring No.32】rewrite load_weight_utils unit test * [CI]【Hackathon 10th Spring No.32】improve load_weight_utils coverage to 83% - Add test_load_ep_checkpoint_basic: exercises EP checkpoint loading with minimal fixture - Add test_composite_ep_branch: covers EP path in load_composite_checkpoint - Add test_get_weight_iterator_unordered: covers unordered sharded safetensors path * [CI]【Hackathon 10th Spring No.32】align load_weight_utils test with gold standard (tmp_path, split tests) * [CI]【Hackathon 10th Spring No.32】add coverage tests for load_weight_utils - Add test_is_layers_grouped: test layers_are_grouped() with grouped, interleaved, and no-layer keys - Add test_save_model_bf16_cache: exercise save_model decorator with is_checkpoint_bf16=True - Add test_composite_checkpoint_ep: test load_composite_checkpoint use_ep=True branch - Add test_composite_checkpoint_rank_mismatch: test tp_size != rank_dirs ValueError - Add test_composite_checkpoint_kv_quant: test float8_e4m3fn kv_cache path - Add __main__ block for direct execution * [CI]【Hackathon 10th Spring No.32】raise load_weight_utils test delta * [CI]【Hackathon 10th Spring No.32】cover TP sequence-parallel MoE load branches * test: add load_reordered_experts, pre-sharded, and empty-state tests --------- Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>	2026-03-20 13:14:30 +08:00
luukunn	f4a79d4c00	[Optimization]Unified data processing for online and offline (#6891 ) * remove process_request * fix chat * fix unit test * remove process response * fix unit test * fix offline decode * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix sampling_params --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-19 21:56:09 +08:00
luukunn	c3d8db85c4	[Optimization] Update ZMQ server (#6735 ) * add batch zmq send response * update * Revert "update" This reverts commit `0234a25b47`. * update * remove lock * fix unit test * add unit test * add unit test * pre commit * add unit test * fix unit test * add unit test * fix worker>1 * update zmq_worker_pid * fix unit test * fix unit test * fix unit test * add unit test * fix unit test * fix first token time * fix logprobs * add unit test * op * remove debug log --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-19 21:53:16 +08:00


862 Commits
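
The message of commit `5e469fc901` describes a file-name ambiguity: concatenating rank and id without a separator lets two different (rank, id) pairs map to the same snapshot name. The sketch below only reproduces that arithmetic with the two patterns quoted in the message; the helper names `old_name` and `new_name` are hypothetical and are not taken from the FastDeploy code.

```python
def old_name(rank: int, snapshot_id: int) -> str:
    """Old pattern quoted in the commit message: model_state.tp{rank}{id}."""
    return f"model_state.tp{rank}{snapshot_id}"


def new_name(rank: int, snapshot_id: int) -> str:
    """New pattern quoted in the commit message: model_state.tp{rank}.{id}."""
    return f"model_state.tp{rank}.{snapshot_id}"


# The example pairs from the commit message: (rank=1, id=234) and (rank=12, id=34).
pairs = [(1, 234), (12, 34)]

old_names = [old_name(r, i) for r, i in pairs]
new_names = [new_name(r, i) for r, i in pairs]

# Without a separator both pairs collapse to the same name.
print(old_names)
# With the dot separator the two names stay distinct.
print(new_names)
```

With the dot separator, the rank and id can be recovered unambiguously by splitting on `.`, which is what makes the new format safe to parse.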