luukunn
3651113ee5
[DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR ( #7052 )
...
* remove ENABLE_V1_DATA_PROCESSOR
* fix unit test
* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5
ee2b965f5f
adjust config info ( #7054 )
2026-03-31 21:26:05 +08:00
Yonghua Li
a3cc3aa777
[BugFix] reset exist tasks signal in clear_data ( #7111 )
...
* [BugFix] reset exist tasks signal in clear_data
* [Fix] fix stale exist tasks signal after weight update
* [Chore] downgrade detected new requests log to DEBUG level
* [fix] adjust continue place
2026-03-31 21:24:08 +08:00
周周周
fd44bb7cbf
cpmmot ( #7105 )
...
Co-authored-by: liuruian <liuruian@baidu.com>
2026-03-31 16:13:44 +08:00
cloudforge1
5c5dc66aa7
[CI][Hackathon 10th Spring No.34] Add async_expert_loader unit tests ( #6731 )
...
* [CI][Hackathon 10th Spring No.34] Add async_expert_loader unit tests
* [CI][Hackathon 10th Spring No.34] Add async_expert_loader unit tests
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-31 15:29:35 +08:00
YilongGuo
dd61e7e421
[Qwen3VL] Add clear_grpah_opt_backend method to Qwen3VLForConditionalGeneration ( #7086 )
...
Add clear_grpah_opt_backend method that delegates to the underlying model
to clear the CUDA graph optimization backend.
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-03-31 13:48:25 +08:00
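The delegation described in the commit above can be sketched as follows. This is a minimal, hypothetical illustration: the wrapper constructor and inner-model attribute are assumptions; only the method name `clear_grpah_opt_backend` (spelling as in the PR title) comes from the commit.

```python
# Minimal sketch of the delegation pattern described in the commit above.
# InnerModel and the .model attribute are illustrative, not FastDeploy's API.

class InnerModel:
    def __init__(self):
        self.graph_backend_cleared = False

    def clear_grpah_opt_backend(self):
        # The underlying model owns the CUDA graph optimization backend.
        self.graph_backend_cleared = True


class Qwen3VLForConditionalGeneration:
    def __init__(self, model):
        self.model = model  # underlying language model

    def clear_grpah_opt_backend(self):
        # Delegate so callers can clear the backend via the VL wrapper.
        self.model.clear_grpah_opt_backend()
```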
YuBaoku
db6e637f4f
[CI] Remove skip logic for *.txt-only changes ( #7104 )
2026-03-31 13:24:50 +08:00
huicongyao
dd2aa10ed4
fix cuda graph capture failure in CI test ( #7094 )
2026-03-31 11:05:51 +08:00
qwes5s5
daa95244f7
abort requests ( #6992 )
2026-03-31 11:02:26 +08:00
Yonghua Li
6d9739f360
[BugFix] fix speculative gauge metrics in multi api server ( #7082 )
2026-03-31 10:52:50 +08:00
chenjian
6727df8286
[Optimization] Optimize ttft for prefill pd ( #6680 )
...
* optimize ttft
* fix
* fix
* fix ci
* fix ci
* fix
* fix bug
* fix
* add comments
* fix ci
* fix
* fix ci
* fix format
* update according to review
* add comment
* fix
* fix format
2026-03-30 20:36:23 +08:00
jackyYang6
05f2d95729
[RL] Adapt async rollout checkpoint update flow ( #7042 )
...
* update checkpoint-transfer flow and control update_weights params
* test: add update_weights route validation
2026-03-30 19:19:34 +08:00
yzwu
8789329457
[Iluvatar] Support wi4a16 group_gemm ( #7078 )
2026-03-30 19:03:51 +08:00
kevin
18062c55bb
[BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys ( #6929 )
...
* [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* [BugFix][KVCache] Fix test_get_block_hash_extra_keys_boundary_cases assertions
## Motivation
In the test case `test_get_block_hash_extra_keys_boundary_cases`, the call for
block [4,8) incorrectly passed `mm_idx=1`, skipping img0 [2,5); but img0 covers
token 4, and token 4 belongs to block [4,8), so it should be included in
hash_keys. In addition, every assertEqual only checked hash_keys and never
checked the returned mm_idx cursor.
## Modifications
- `test_get_block_hash_extra_keys_boundary_cases`:
- Switched to chained calls that feed the mm_idx returned by the previous call into the next one, simulating the real call loop
- The block [4,8) call now reuses the previously returned `mm_idx=0` instead of `mm_idx=1`, and the expected value changes from `[]` to `["hash-0"]`
- All assertions changed to `assertEqual((mm_idx, hash_keys), (...))` so the cursor is verified as well
- `test_get_block_hash_extra_keys_no_overlap_at_boundaries`:
- Case B input changed from `mm_idx=1` to `mm_idx=0` (traverse from the start; img-a takes the continue branch)
- All assertions now also verify mm_idx
- `test_get_block_hash_extra_keys_image_crosses_block_boundary`:
- All assertions now also verify mm_idx
- `test_get_block_hash_extra_keys_no_mm_inputs`:
- The assertion now also verifies mm_idx
- `test_get_block_hash_extra_keys_handles_multimodal_segments`:
- call2 and call3 assertions now also verify mm_idx
## Usage or Command
```bash
python -m pytest tests/cache_manager/test_prefix_cache_manager.py::TestPrefixCacheManagerCoverage -v -k "get_block_hash_extra_keys"
```
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: chengyanfu <chengyanfu@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 17:13:31 +08:00
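The chained-cursor pattern that PR #6929 describes can be sketched as below. This is a hypothetical, simplified stand-in: the real `get_block_hash_extra_keys` lives in FastDeploy's prefix cache manager, and its signature and data layout may differ; the span tuples and names here are illustrative only.

```python
# Hypothetical sketch of the cursor-chaining behavior described above.
# mm_inputs: list of (start, end, hash) multimodal token spans, sorted by start.
# mm_idx: cursor into mm_inputs left by the previous call, fed into the next.

def get_block_hash_extra_keys(block_start, block_end, mm_inputs, mm_idx):
    """Return (new_mm_idx, hash_keys) for spans overlapping [block_start, block_end)."""
    hash_keys = []
    while mm_idx < len(mm_inputs):
        start, end, h = mm_inputs[mm_idx]
        if end <= block_start:       # span entirely before this block: skip it
            mm_idx += 1
            continue
        if start >= block_end:       # span starts after this block: stop
            break
        hash_keys.append(h)          # span overlaps the block
        if end <= block_end:         # span fully consumed by this block
            mm_idx += 1
        else:                        # span crosses the block boundary: keep cursor
            break
    return mm_idx, hash_keys

# Chained calls, as in the fixed tests: img0 covers tokens [2, 5), so it
# overlaps both block [0, 4) and block [4, 8); the cursor from the first call
# is reused for the second instead of being hard-coded.
mm = [(2, 5, "hash-0")]
idx, keys = get_block_hash_extra_keys(0, 4, mm, 0)
idx, keys = get_block_hash_extra_keys(4, 8, mm, idx)
```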
周周周
76cf5e9496
[append attention] clean code ( #7062 )
2026-03-30 15:07:53 +08:00
luukunn
b9f8873367
[Optimization]Merge Text processor ( #7030 )
...
* merge text processor
* update
* fix unit test
* merge messages2ids
* fix unit test
* remove duplicate code
* remove redundant code
* delete code
* fix unit test
2026-03-30 15:02:35 +08:00
Jiang-Jia-Jun
1670b011a5
Revert "[BugFix] Add lock to avoid generating nan when using storage cache (#…" ( #7075 )
...
This reverts commit 6d2ab8f2c0.
2026-03-30 14:52:05 +08:00
jc
6d2ab8f2c0
[BugFix] Add lock to avoid generating nan when using storage cache ( #7046 )
...
* Add lock to avoid generating nan
* up
2026-03-30 14:50:32 +08:00
zhangbo9674
5c60e2fc6f
fix bug in cudagraph ( #7069 )
2026-03-30 14:24:23 +08:00
mpgemm
1a1d048774
[Feature] Support NVFP4 Flashinfer-cutedsl MoE on SM100 ( #6963 )
2026-03-30 11:37:04 +08:00
mouxin
61a9079c60
[Feature] Update logging ( #7072 )
2026-03-30 11:20:27 +08:00
Longzhi Wang
2eea6fa97a
[BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend ( #7028 )
...
* [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backend
* add constexpr and code style clean
* add test
* fix code style
* fix test
2026-03-30 11:17:15 +08:00
mpgemm
7a20eaebe8
[Feature] Support cute cpp Encoder FA4 ( #7016 )
...
* add cute cpp fa4
* remove comments
* fix merge errors
* move sm_version inside the function
* fix CI errors
2026-03-30 10:54:56 +08:00
kevin
9765fa7313
[Refactor] Replace --skip-mm-profiling with --deploy-modality text ( #7048 )
...
* [Feature] Support --skip-mm-profiling to skip multimodal token overhead in profiling
## Motivation
When deploying multimodal models (e.g. Qwen2.5-VL, ERNIE4.5-VL),
`get_max_chunk_tokens` adds the mm token count on top of the base token count
to reserve GPU memory during the profiling phase. In some scenarios (e.g. the
image token count is known to be small, or memory needs to be saved), users
want to skip this extra multimodal token overhead and profile with the text
token count alone.
## Modifications
- `fastdeploy/engine/args_utils.py`: add a `skip_mm_profiling: bool = False`
field to `EngineArgs` and a `--skip-mm-profiling` launch argument to the parser
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to
`ModelConfig.__init__`; add a `not self.model_config.skip_mm_profiling` check
in `FDConfig.get_max_chunk_tokens` so that, when enabled, the mm token overhead
is skipped and the base `num_tokens` is returned directly
## Usage or Command
Add the argument when launching the service:
```bash
--skip-mm-profiling
```
## Checklist
- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This feature is a simple config-parameter pass-through, already covered by existing config unit tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [Refactor] Replace skip_mm_profiling with deploy_modality=text to skip mm profiling
## Motivation
The original `--skip-mm-profiling` argument semantically overlaps with the
existing `deploy_modality` argument: when deploying in text-only mode
(`deploy_modality=text`), there is no need to reserve memory for multimodal
tokens in the first place. A separate argument adds configuration complexity;
reusing `deploy_modality` is more intuitive and consistent.
## Modifications
- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling`
field and the `--skip-mm-profiling` launch argument
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from
`ModelConfig.__init__`; change the condition in `FDConfig.get_max_chunk_tokens`
to `self.deploy_modality != DeployModality.TEXT`, so that when deploy_modality
is text it returns `max_num_batched_tokens` directly, skipping the mm token
overhead
## Usage or Command
```bash
# Deploy in text mode to skip the mm token profiling overhead (replaces --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
--deploy-modality text \
--model /path/to/model \
...
```
## Checklist
- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This change is a parameter refactor with logically equivalent behavior, already covered by existing config unit tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 19:40:27 -07:00
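The condition change that PR #7048 describes can be sketched as a small standalone function. This is an illustrative sketch, not FastDeploy's implementation: `DeployModality` and `get_max_chunk_tokens` mirror the names in the PR body, but the signature and the enum values here are assumptions.

```python
# Illustrative sketch of the refactored get_max_chunk_tokens condition:
# text-only deployments skip the multimodal token overhead entirely.
# Names mirror the PR description; the signature is an assumption.
from enum import Enum


class DeployModality(Enum):
    TEXT = "text"
    MULTIMODAL = "multimodal"


def get_max_chunk_tokens(max_num_batched_tokens, mm_tokens, deploy_modality):
    # Text mode: no memory is reserved for multimodal tokens during profiling.
    if deploy_modality == DeployModality.TEXT:
        return max_num_batched_tokens
    # Otherwise add the mm token overhead on top of the base token count.
    return max_num_batched_tokens + mm_tokens
```

Reusing `deploy_modality` instead of a separate `--skip-mm-profiling` flag keeps one knob for one concept: a text-only deployment implies there are no mm tokens to profile.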
YuBaoku
a7cbe3ff91
[CI] Adapt to codecov action changes for Node.js 24 ( #7064 )
2026-03-29 16:49:44 +08:00
YuBaoku
842c60809a
[CI] Align with Paddle layer_norm kernel update ( #7056 )
2026-03-27 22:58:01 +08:00
Zhang Yulong
f25760f4e6
[CI] Update docker run command in unit test coverage workflow ( #7050 )
...
Removed the --ipc=host option from the docker run command.
2026-03-27 19:53:09 +08:00
cmcamdy
bf8e9bf81d
[XPU] Fix speculate schedule ( #7049 )
...
* [BugFix] xpu fix speculate schedule cache kernel
* fix code style
2026-03-27 18:28:17 +08:00
cloudforge1
11ad95ba91
[CI][Hackathon 10th Spring No.43] Add ernie4_5_mtp unit tests ( #6738 )
...
* [CI][Hackathon 10th Spring No.43] Add ernie4_5_mtp unit tests
* [CI][Hackathon 10th Spring No.43] add mapping and forward branch coverage
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-03-27 17:15:53 +08:00
fxyfxy777
8ff8236a6f
[Optimization] optimize fused_swiglu_fp8_quant_kernel ( #7007 )
...
* use sharemem
* B card test
* fix acc error
2026-03-27 16:10:16 +08:00
GoldPancake
6693bcd0e4
[BugFix] fix clear_parameters in draft cudagraph ( #7035 )
2026-03-27 15:28:50 +08:00
mouxin
6c24f1955c
[Feature] Update error logging ( #7045 )
2026-03-27 15:13:12 +08:00
YuBaoku
10c59f78d6
[CI] disable tests/e2e/test_Qwen3VLMoe_serving.py in unit_test ( #7044 )
2026-03-27 14:15:14 +08:00
Jiaxin Sui
c3ed7db28d
[XPU] [CI] Fix xpu ci bug ( #7014 )
...
* fix xpu ci bug
* Remove unnecessary blank line in conftest.py
* Update upload-artifact action to version 6
* Update _xpu_8cards_case_test.yml
* fix ci bug
* Change exit code on test failure to 1
* fix ci bug
* fix ci bug
* fix ci bug
* fix ci bug
* Update conftest.py
2026-03-27 10:29:34 +08:00
Zhang Yulong
a31d4bfbdf
[CI] update mtp case ( #7031 )
2026-03-27 10:21:37 +08:00
luukunn
14b17c06af
add completion_tokens default ( #7032 )
2026-03-26 21:06:23 +08:00
xiegegege
209e5cf7f4
[CE]add 21b mooncake yaml ( #7033 )
...
* [CE]add 21b cpu cache ,glm mtp,glm for rl config
* [CE]add 21b tp2 yaml
* [CE]add 21b mooncake yaml
* add fastdeploy benchmark, paddletest-155
* [CE] adjust vl wint4 config
* [CE]add glm mtp with updatemodel config
* [CE]fix
* fix
* test
* test
* test
---------
Co-authored-by: xiegegege <>
2026-03-26 20:01:05 +08:00
Yonghua Li
442514252c
[fix] remove all gather ep group control requests in normal cases ( #7026 )
2026-03-26 18:41:29 +08:00
Dangweichong
3c9fd818e3
[BugFix] Fix RDMA initializes failed ( #7025 )
2026-03-26 17:45:39 +08:00
huicongyao
25d64efdc4
[Speculative Decoding] Refactor Eagle MTP hidden states copy ( #6812 )
...
* reformat eagle_get_hidden_states & eagle_get_self_hidden_states
* readability
* fix xpu bug
* fix coverage failure
* change launch params & parallelize position_map compute
* Fix MTP-related bugs in FastDeploy centralized inference
* fix
* refactor mtp hidden_states process
* fix
* add unittest & optimize kernel
* remove useless code
* fix
2026-03-25 22:54:31 -07:00
freeliuzc
4fd877ed43
[Speculative Decoding] Support mtp expert-parallel and support different modality deploy ( #7018 )
...
* support mtp ep and support different modality
* fix default arg
2026-03-26 13:52:16 +08:00
YuBaoku
61ebac49ef
[CI] Fix test_communication.py and add port cleanup ( #7021 )
2026-03-26 10:56:40 +08:00
luukunn
e6804ba97d
[Optimization]Streaming requests return complete special tokens. ( #6998 )
...
* return special token
* add completions
* update
* fix
* add prompt_token_ids& completion_token_ids=None,
* fix unit test
2026-03-26 09:49:43 +08:00
luukunn
d5cb2767d7
[Optimization] Deduplicate shared image/video utilities across VL processors ( #6988 )
...
* step1~3
* fix import path
* remove duplicate code
* remove duplicate code
* remove duplicate code
* fix import path
* update
* fix import path
* add unit test
* fix
* update
* fix unit test
2026-03-26 09:49:33 +08:00
chen
1502b6f43e
add instantiations for decoder rope enfore_fmul_rn=true ( #7009 )
2026-03-25 22:22:10 +08:00
Jiang-Jia-Jun
482f951ee9
Update copilot-instructions.md
2026-03-25 21:09:24 +08:00
YuBaoku
b8bb34c7dd
[CI] disable tests/distributed/test_communication.py in unit_test ( #7019 )
2026-03-25 20:54:55 +08:00
Yonghua Li
a7f52c300d
[Feature] support v1 update/clear api for RL ( #6761 )
...
* [Feature] support v1 update/clear api for RL
* [fix] fix execute_model and add sleep/wakeup api
* [fix] fix mtp and key_prefix
* [chore] move _update_key_prefix to resume method
* [fix] make the interface safe to call multiple times
* [fix] fix some tiny bugs
* [chore] make small changes against pr review
* [docs] add docs for weight update
* [test] add some tests and update docs
* [style] fix code style check
* [test] fix ci
* [fix] fix stale control responses when control method timed out
* [chore] remove unused code
* [chore] fix code style
* [chore] optimize tags and key_prefix
* [test] fix ci
* [chore] fix code style
* [test] fix ci
* [fix] fix ep control
* [fix] fix ep control for engine cache queue
2026-03-25 19:18:46 +08:00
gongweibao
48cfb608aa
[FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 ( #6997 )
...
Most single-GPU and small-model deployments do not need 64MB custom
all-reduce buffers. Lowering the default to 8MB reduces unnecessary
shared memory allocation. Tests that require larger buffers now
explicitly set the value.
Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 17:40:01 +08:00
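The tunable described in the commit above follows the usual environment-variable-with-default pattern: a lowered default (8 MB) that deployments and tests needing larger buffers can override. A minimal sketch, assuming the variable is read as an integer megabyte count (the helper function name here is hypothetical):

```python
# Illustrative sketch: read FD_CUSTOM_AR_MAX_SIZE_MB with the new 8 MB
# default. The helper name is hypothetical; only the variable name and the
# default value come from the commit above.
import os


def custom_ar_max_size_mb():
    # Deployments or tests that need larger all-reduce buffers set the
    # variable explicitly; everyone else gets the smaller 8 MB default.
    return int(os.getenv("FD_CUSTOM_AR_MAX_SIZE_MB", "8"))
```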
freeliuzc
7a6c28781b
[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug ( #7005 )
...
* optimize attn_mask_offset and optimize mtp usage
* delete useless branch
* fix kernel format
* fix kernel runner
2026-03-25 01:52:06 -07:00