Commit Graph

1765 Commits

Author SHA1 Message Date
YuBaoku 54f7d9f621 [CI] Sync mm_batch_invariant with paddle.mm update (#6557) 2026-02-28 14:56:42 +08:00
Jiang-Jia-Jun 39a5ea66c8 [BugFix] Enable control socket disable option in API server (#6545)
* [BugFix] Enable control socket disable option in API server

* Update requirements.txt

* Update requirements.txt
2026-02-28 10:35:35 +08:00
Weiguo Zhu 8fb24122b8 fix reshard error (#6536) 2026-02-27 22:22:37 +08:00
cmcamdy 13447279aa [XPU] Fix PD + MTP (#6495)
* fix pd + mtp

* fix code style

* fix PD + MTP, D gets P's first token

* add annotation for gpu (speculate_update)

* update draft insert v1

* fix wrapper & kernel

* fix wrapper

* fix code style
2026-02-27 19:07:35 +08:00
JYChen c6d8fbe526 [BugFix] fix log with paddlefleet.ops (#6528) 2026-02-27 14:34:29 +08:00
周周周 1503443871 add dsv3 mixed deploy as EP16 TP8 (#6525) 2026-02-27 14:08:25 +08:00
luukunn 16de778343 update FD_USAGE_STATS_SERVER (#6524) 2026-02-27 13:28:57 +08:00
sunxin 53aaac69da [Optimization] Enable BF16 gate computation for GLM and Qwen (#6457)
* gate bf16

* add gate-fp32

* fix

* update baseline

* update

* update

* fix
2026-02-26 21:08:46 -08:00
gongweibao edd31e8849 [Feature] Add Deterministic Inference Support (#6476)
* add

* [tests] Add Paddle attention determinism tests and refactor resource manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* add

* add

* add

* add

* add more

* add more

* fix some

* fix some

* fix bugs

* fix bugs

* only in gpu

* add docs

* fix comments

* fix some

* fix some

* fix comments

* add more

* fix potential problem

* remove not need

* remove not need

* remove no need

* fix bug

* fix bugs

* fix comments

* fix comments

* Update tests/ce/deterministic/test_determinism_verification.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/inter_communicator/test_ipc_signal.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/engine/test_sampling_params_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update tests/layers/test_paddle_attention_determinism_standalone.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix comments

* fix import error

* fix a bug

* fix bugs

* fix bugs

* fix coverage

* refine codes

* refine code

* fix comments

* fix comments

* fix comments

* rm not need

* fix allreduce large tensor bug

* mv log files

* mv log files

* add files

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00
zccjjj c34cb2a8c2 [XPU] [bugfix] fix moe_ffn_quant_type_map bugs about datatype and tensorshape (#6337) 2026-02-27 09:55:41 +08:00
jc 7b1d787b4b [BugFix] Fix storage_backend_type comparison bug in cache_transfer_manager.py (#6514)
Co-authored-by: root <root@tjzj-inf-sci-k8s-hzz1-h62ni7-2178.tjzj.baidu.com>
2026-02-26 19:32:24 +08:00
MingkunZhang c369f7139f [Metax][Fix] fix error based pr #6493 (#6521) 2026-02-26 18:41:35 +08:00
chen 2d1531f3cb dev: open-source model support for fa4/flashmasV2/V3 (#6518) 2026-02-26 17:46:05 +08:00
GoldPancake 2178f2829b [Speculative Decoding] Support suffix decoding (#6403)
* support suffix decoding
2026-02-26 11:42:05 +08:00
Yuanle Liu 6d3fede240 [OP][Feature] Unify the limit_thinking_content_length CUDA op, supporting response-length limits and injected sequences (#6493)
* Initial plan

* Migrate PRs #6311, #6129, #6305 to develop and merge unit tests

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix

* update

* fix

* fix ci

* fix ci

* Initial plan

* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add disable-thinking case to test_chat_with_response_max_tokens

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add both reasoning_max_tokens and response_max_tokens case

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2026-02-25 21:36:50 +08:00
zhupengyang a303eacf62 [XPU] support norm before rope (#6475) 2026-02-25 18:43:44 +08:00
Wanglongzhi2001 14ea7243e1 [Feature] support mm_processor_kwargs for flexible model 2026-02-25 14:34:33 +08:00
jackyYang6 a29ee57e15 [Feature] Support ThinkingBudget Logits processor to control thinking content length (#6367)
* feat: add thinking budget logits processor

* add unittest

* fix pre-commit

* add unittest

* docs: clarify operator-level vs logits processor usage and conflict guidance

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-02-25 14:17:09 +08:00
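A thinking-budget logits processor of the kind this commit describes can be sketched in a few lines: once the number of generated thinking tokens reaches the budget, every logit except the end-of-thinking token is masked out. The token id and function name below are illustrative assumptions, not FastDeploy's actual API.

```python
# Hypothetical sketch of a thinking-budget logits processor. Token id and
# function name are illustrative, not FastDeploy's real interface.
import math

END_THINK_TOKEN_ID = 2  # hypothetical id of the end-of-thinking token


def thinking_budget_processor(logits, num_thinking_tokens, budget):
    """Force END_THINK_TOKEN_ID once the thinking budget is spent."""
    if num_thinking_tokens < budget:
        return logits  # budget not exhausted: leave the distribution untouched
    forced = [-math.inf] * len(logits)
    forced[END_THINK_TOKEN_ID] = 0.0  # only the end-of-thinking token survives
    return forced


# With the budget spent, argmax must pick the end-of-thinking token.
out = thinking_budget_processor([1.5, 3.0, 0.2, 0.9], num_thinking_tokens=8, budget=8)
print(out.index(max(out)))  # -> 2
```

The commit's docs note clarifies when to prefer this processor over the operator-level limit; the masking idea is the same in both.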
Longzhi Wang 22566168c3 [Feature] support qkv&gate linear fusion (#6455)
* [Feature] support qkv&gate linear fusion

* add test
2026-02-24 15:20:29 +08:00
jackyYang6 38c3e02470 fix paddleformers fallback (#6465) 2026-02-23 15:29:13 +08:00
Yonghua Li e2332a1112 [BugFix] fix num_cpu_blocks computation (#6438)
* [BugFix] fix num_cpu_blocks computation

* [fix] fix syntax and log

* [fix] pre-commit

* [fix] use getattr

* [fix] ci test
2026-02-13 11:05:14 +08:00
kevin 52edf5e9b3 fix mtp acceptance rate decline (#6470) 2026-02-12 19:56:10 +08:00
AIbin 0eb87467f8 [BugFix] fix RL bug about blockwisefp8 (#6466)
* fix RL bug about blockwisefp8

* fix the same bug for moe

* fix RL FP8 bug
2026-02-12 09:15:29 +08:00
Divano ba3b142ff7 [Others] add objgraph to test out of memory (#6456) 2026-02-11 20:17:20 +08:00
JYChen 40c952e7b5 fix deepgemm import (#6451)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-02-11 20:10:01 +08:00
zhupengyang 4a8c54926b [XPU] topk_method=noaux_tc (#6355) 2026-02-11 16:12:20 +08:00
CSWYF3634076 7380bfb476 [BugFix] fix console log metrics waiting queue count (#6432)
* [BugFix] fix console log metrics waiting queue count

* [BugFix] fix console log metrics waiting queue count unittest
2026-02-11 10:51:49 +08:00
yzwu 60e75ea8e8 [Iluvatar][CI] Fix cannot import get_stop (#6165) 2026-02-10 16:57:23 +08:00
chen d937d6ebfd check (#6424) 2026-02-10 15:55:17 +08:00
Dangweichong 62ac1e543f [BugFix] Compatibility fix for download feature links (#6276)
* [BugFix] Compatibility fix for download feature links

* add download time log

* remove paddle tensor case
2026-02-10 14:21:08 +08:00
yuxuan 83b4b082ab [BugFix] Fix model loading error for 300B FP8 EP parallel test case (#6382)
* fix fp8 bug

* fix

* fix comment, cn to en

* fix ci

* del else in utils

* fix review
2026-02-10 11:32:57 +08:00
chen a8ffcaa068 fix fa4 test (#6408) 2026-02-10 10:57:21 +08:00
kevin 3ce842b55b [BugFix] add reset shared inputs when update weight dummy run (#6331)
* fix dummy run input bug

* update code

* update code

* update code

* update code
2026-02-10 10:29:03 +08:00
CSWYF3634076 335ab70b1c [Feature] console print metrics add env (#6413) 2026-02-10 09:37:11 +08:00
Jiang-Jia-Jun 4e06df520e [Feature] Unify the request-completion log format and enrich its statistics (#6405)
Merge the previously separate two log lines into one, while surfacing more statistics.

Main changes:
- Consolidate the original "Request finished" and "token ratio" lines into a single-line format
- Add InputToken: number of input tokens
- Add CachedDetail: cache details (CachedToken/GPU/CPU)
- Add OutputToken: number of output tokens
- Add TTFT: time to first token (seconds)
- Add E2E: end-to-end latency (seconds)
- Keep the IsPrefill and RecoveryStop flags

Example of the new log format:
Request=chatcmpl-xxx, InputToken=18, CachedDetail={"CachedToken": 0, "GPU": 0, "CPU": 0}, OutputToken=247, TokenRatio=315.77, TTFT=0.02, E2E=0.78, IsPrefill=False, RecoveryStop=False

Co-authored-by: Ducc <ducc@baidu.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 21:06:55 +08:00
bukejiyu 5bfc0938e2 [BugFix] PD reorder fix and add ut (#6375) 2026-02-09 04:42:48 -08:00
CSWYF3634076 ec128068b7 [Others] Exit to ensure no residual processes (cpu cache & dp) (#6377)
* [Others] good exit single dp

* [Others] good exit cpu cache dp>1

* [Others] good exit cpu cache dp>1 unittest
2026-02-09 20:38:38 +08:00
Mattheliu d75b1b8df1 [Fix] Use paddle.device.get_device_properties for multi-platform compatibility (#6400)
Replace paddle.device.cuda.get_device_properties() with paddle.device.get_device_properties()
to support all hardware platforms (NVIDIA, ILUVATAR, HPU, etc.)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 19:15:41 +08:00
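The compatibility pattern behind this fix is: prefer the platform-neutral accessor and fall back to the CUDA-specific one on builds that lack it. The sketch below stands a fake namespace in for paddle so it runs anywhere; per the commit message, the real calls are `paddle.device.get_device_properties()` and `paddle.device.cuda.get_device_properties()`.

```python
# Compatibility shim sketch: try the platform-neutral accessor first, fall back
# to the CUDA-specific path. SimpleNamespace stands in for the paddle module.
from types import SimpleNamespace


def get_device_properties(framework, device=None):
    """Use framework.device.get_device_properties if present, else the CUDA path."""
    generic = getattr(framework.device, "get_device_properties", None)
    if generic is not None:
        return generic(device)
    return framework.device.cuda.get_device_properties(device)


# Older build: only the CUDA-specific accessor exists.
old = SimpleNamespace(
    device=SimpleNamespace(cuda=SimpleNamespace(get_device_properties=lambda d: "cuda-props"))
)
# Newer build: the platform-neutral accessor exists (works on ILUVATAR, HPU, ...).
new = SimpleNamespace(
    device=SimpleNamespace(
        get_device_properties=lambda d: "generic-props",
        cuda=SimpleNamespace(get_device_properties=lambda d: "cuda-props"),
    )
)
print(get_device_properties(old), get_device_properties(new))  # -> cuda-props generic-props
```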
chenjian 35c24f3f71 Revert "[Optimize] Optimize ttft for ep (#6098)" (#6402)
This reverts commit 90db0bdd0d.
2026-02-09 19:01:23 +08:00
kevin d60daca4a8 [Feature] consider multimodal model when dummy run (#6045)
* add mm do profile

* update code

* update code

* update code

* update code

* update test case

* update code

* update code

* fix xpu bug

* update code

* add mm do profile

* update test case

* update code
2026-02-09 17:49:55 +08:00
sunxin 783d56e28a [Optimization] Support logprob async copy (#6362)
* support logprob async copy

* fix prompt logprob

* fix xpu
2026-02-09 17:32:12 +08:00
MingkunZhang 268276e287 [Metax][CI] e2e ci tests enable cuda graph (#6401) 2026-02-09 16:25:23 +08:00
CSWYF3634076 eb8d639fe3 [Engine] apiserver&engine exit when work failed to start (#6322) 2026-02-09 15:07:40 +08:00
bukejiyu dc5917289d [loader] support wint2 backend (#6139)
* support wint2

* update
2026-02-08 22:42:36 -08:00
chen f18f3b99ed fix zmq hung when sampled_token_id=0 (#6398) 2026-02-09 14:13:18 +08:00
Mattheliu c776d483e4 [BugFix] fix handling of 4 return values from noaux_tc_redundant op (#6384)
* fix: handle 4 return values from noaux_tc_redundant op

The noaux_tc_redundant CUDA op is defined with 4 outputs in PD_BUILD_STATIC_OP:
- output_tensor (scores)
- topk_values
- topk_indices
- tokens_per_expert_stats_list_out (inplace updated)

The Python code was only unpacking 3 values, causing:
  ValueError: too many values to unpack (expected 3)

This fix correctly unpacks all 4 return values, ignoring the inplace
updated tensor which is the same as the input tokens_per_expert_stats_list.

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

* fix: make noaux_tc_redundant return 4 values to match OP definition

The PD_BUILD_STATIC_OP defines 4 outputs but the function only returned 3,
causing inconsistent behavior across different Paddle framework versions.

This fix explicitly returns 4 values:
- scores (inplace modified)
- topk_values
- topk_indices
- tokens_per_expert_stats_list (inplace modified via atomicAdd)

Co-Authored-By: Claude (Claude Opus 4.5) <noreply@anthropic.com>

---------

Co-authored-by: Claude (Claude Opus 4.5) <noreply@anthropic.com>
2026-02-09 13:17:47 +08:00
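The unpacking bug described in this commit is easy to reproduce: an op defined with four outputs must be bound to four names on the Python side, or the unpack raises ValueError. The stub below mimics the `noaux_tc_redundant` output signature from the commit message; it is not the real CUDA op.

```python
# Stub mimicking noaux_tc_redundant's four outputs, per the commit message.
def noaux_tc_redundant_stub():
    scores = [0.9, 0.1]
    topk_values = [0.9]
    topk_indices = [0]
    tokens_per_expert_stats_list = [1, 0]  # updated in place by the real op
    return scores, topk_values, topk_indices, tokens_per_expert_stats_list


# Buggy: three targets for four values raises
# ValueError: too many values to unpack (expected 3)
try:
    scores, topk_values, topk_indices = noaux_tc_redundant_stub()
except ValueError as err:
    print(err)

# Fixed: unpack all four, discarding the in-place-updated stats tensor, which
# is the same object as the input tokens_per_expert_stats_list.
scores, topk_values, topk_indices, _ = noaux_tc_redundant_stub()
print(topk_indices)  # -> [0]
```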
JYChen 9bcd863902 [Others] support import deepgemm/deepep from fleet ops (#6351)
* update paddleformers to v1.0

* only change import fleetpath
2026-02-09 11:53:13 +08:00
周周周 2b4748de4f [MTP] refactor MTP pre_process (#6358) 2026-02-09 10:47:15 +08:00
Jiang-Jia-Jun 18e79dd660 [Metrics] Support cpu-cache-block-num (#6390)
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
Yonghua Li 5ac5ecd0b0 [BugFix] fix cache transfer tasks failure after cache cleared (#6202)
* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task

* [fix] fix cache manager hang when clearing prefix cache

* [fix] fix list_proxy has no clear method

* [fix] fix barrier

* [fix] add barrier0

* [fix] add cache_task_is_paused_signal

* [fix] fix condition

* [fix] fix cache transfer sync and delay prefix cache tree clearing

* [fix] fix typo

* [chore] polish code

* [fix] revert only rank0 write kv_cache_status_signal

* [fix] fix thread pool and prefix cache manager hang

* [fix] add timeout for task_swapping_event

* [fix] tolerate prefix cache manager error while prefix tree is cleared

* [chore] add more log

* [fix] fix test_prefix_cache_manager

* [fix] fix prefix_cache_status_signal usage
2026-02-08 15:33:56 +08:00