FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
jackyYang6	00a6a73431	docs: fix pre-commit error of markdown (#6100 )	2026-01-20 19:32:05 +08:00
ChowMingSing	bf60e103b6	[CI]Fix test case (#6111 )	2026-01-20 17:47:44 +08:00
Ryan	dda27e50f5	[Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph (#6081 ) * rm static_op_get_block_shape_and_split_kv_block from cudagraph * update max_capture_shape * fallback: zeros -> empty to avoid coverage check * check graph_opt_config exists * add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test * add use_cudagraph flag to control step_use_cudagraph	2026-01-20 14:05:18 +08:00
zhupengyang	45ebb2efb4	[XPU] support plugin model (#6092 )	2026-01-20 13:00:09 +08:00
jackyYang6	988e0bc338	[Feature] Add PaddleFormers fallback backend (#5999 ) * feat(paddleformers): add dense text model fallback backend * docs(paddleformers): add user guide and fix code review issues * add fallback unit test * precommit format * fix pre-commit * fix: address code review feedback * docs: add PaddleFormers backend documentation (EN) and simplify installation --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-19 21:50:50 +08:00
GoldPancake	879e45f6b3	fix compute logits problem (#6093 )	2026-01-19 20:12:14 +08:00
xiegegege	e22c4e29bb	[CE]add paddleocr config yaml (#6097 )	2026-01-19 20:07:42 +08:00
Jingfeng Wu	7d44009f39	[FDConfig] transfer metrics_port (#6056 ) * transfer metrics_port * transfer metrics_port	2026-01-19 19:58:57 +08:00
cmcamdy	211dd81ca7	add pd+mtp ci (#6090 )	2026-01-19 19:21:24 +08:00
Jiaxin Sui	e0d15a2ded	[XPU][CI] Xpu ci update (#6089 ) * add xpu ci case * add xpu ci case * add xpu ci case * Change runner from XPU-P800-8Card to XPU-P800 * Remove cache queue port from test_pd_03b_tp1.py Removed cache queue port arguments from test cases. * Remove cache queue port from test_pd_21b_tp2.py Removed cache queue port arguments from test cases. * Update README with PYTHONPATH setup instructions Added instructions for setting PYTHONPATH in CI scripts.	2026-01-19 16:09:09 +08:00
ChowMingSing	496cc23089	[CI]Fix test cases failing under Python 3.12 (#6059 ) * Fix test case errors under Python 3.12 * Fix test case errors under Python 3.12 --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-19 15:41:12 +08:00
sunxin	a4144e0b8e	[Optimization] Avoid unnecessary penalty computation (#6078 )	2026-01-19 15:24:12 +08:00
GoldPancake	05fbd89a8e	[Speculative Decoding][Bugfix] Fix MTP logprob issues caused by max_num_logprobs (#6084 )	2026-01-19 14:55:36 +08:00
ddchenhao66	3685474799	[XPU] xpu support mm prefill batch (#6072 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2026-01-19 14:36:35 +08:00
sunxin	9dc1c74d36	fix opt qknorm (#6080 )	2026-01-19 12:07:20 +08:00
YuBaoku	ac6fa6d725	[CI] Add 4-GPU e2e test job (#6082 )	2026-01-19 10:42:14 +08:00
kevin	0e0eaa1c57	[BugFix] fix mm revert bug (#6061 ) * fix mm revert bug * update code	2026-01-16 08:13:34 -08:00
Jiaxin Sui	70a962df53	[XPU][CI] XPU CI refactor (#6053 ) * add xpu ci case * add xpu ci case * add xpu ci case * Change runner from XPU-P800-8Card to XPU-P800	2026-01-16 20:57:58 +08:00
GoldPancake	b917b56aca	[Bugfix] Fix logprob issues caused by max_num_logprobs (#6067 )	2026-01-16 04:40:18 -08:00
周周周	97f96e34ca	only update self.exist_prefill_task_signal in v0 (#6064 ) * commit * commit * commit --------- Co-authored-by: xiaoluomi <1037819816@qq.com>	2026-01-16 20:11:55 +08:00
MingkunZhang	0d372e4fb2	[Metax][CI] update jenkins github action version (#6065 )	2026-01-16 15:06:14 +08:00
GoldPancake	bda38aa519	[Speculative Decoding] Support MTP for GLM-4.5-Air (#6047 ) * glm mtp * add spec neox partial rope	2026-01-16 14:35:24 +08:00
qwes5s5	b2a2e11551	[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320 ) * request disconnect * request disconnect * fix bug * fix bug--amend --------- Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-16 11:46:13 +08:00
周周周	8f035101ad	initial commit (#6054 ) Co-authored-by: xiaoluomi <1037819816@qq.com>	2026-01-16 10:49:38 +08:00
fxyfxy777	4c92035f2d	[Feature] Unify fp8 block_wise quant ops (#5991 ) * quant stash * blockwise_quant * precommit * rm tensor.cut * tp ok * add swiglu * rm outdate code * fix activate ut * change baseline * fix baseline error	2026-01-15 05:50:37 -08:00
周周周	d38cd8b40b	[UNITEST] add EP TP test_fused_moe CI (#5989 )	2026-01-15 21:37:32 +08:00
guozhuangzhuang	d2f1ec2b1b	[XPU] fix(xpu_model_runner): reset seq_lens_encoder to 0 for decode role in PD splitwise mode (#6048 ) * fix(xpu_model_runner): reset seq_lens_encoder to 0 for decode role in PD splitwise mode - Set seq_lens_encoder to 0 when splitwise_role is 'decode' during prefill processing - This ensures proper continuation of decoding after P generate first token in PD disaggregated architecture - Fixes potential sequence length inconsistency in PD splitwise deployment scenarios * format	2026-01-15 20:24:56 +08:00
freeliuzc	49617d9832	[Feature]Support tag phase token enforce generation (#6034 ) * support tag phase token enforce generation * optimize note and some feature * fix sampler unit test --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-15 03:59:55 -08:00
freeliuzc	17866c028e	add more cases for attention unit test (#5931 )	2026-01-15 19:52:35 +08:00
cmcamdy	59d8ae0a25	[XPU] Speculate Decoding + PD, benchmark fix (#6036 ) * fix mtp pd * fix kernel * fix code style * fix kernel * fix test / clear debug code * fix test / clear debug code * fix codestyle * fix codestyle * fix codestyle	2026-01-15 19:19:03 +08:00
lizexu123	6619298b50	[Optim] Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007 ) * update w4afp8 * build.sh ok * support cuda_graph * fix * add test * fix max_tokens_per_expert * >=70 * fix * compute_max_tokens_from_prefix_sum in w4afp8 * compute_max_tokens use cub	2026-01-15 19:18:42 +08:00
Jiaxin Sui	b0fc9cadb5	[XPU][CI] update paddle version (#6044 ) * Remove cache queue port from test configuration Removed cache queue port configuration from test. * Remove cache queue port from test_vl_model.py Removed cache queue port argument from test configuration. * Update test_w4a8.py * Remove cache queue port from test_mtp.py Removed cache queue port configuration from test. * Remove cache queue port from test_logprobs_21b_tp4 Removed cache queue port configuration from test. * Remove cache queue port from test configuration Removed cache queue port configuration from test. * Update test_ep4tp4_online.py * Update run_xpu_ci_pytest.sh to comment out installations Comment out PaddlePaddle installation and XVLLM download steps.	2026-01-15 15:17:48 +08:00
Daci	e10b51b8c6	[Feature] get_output_kv_signal blocking read mode & send_first_token (#5836 ) * get_output_kv_signal blocking read mode * send first token before recycle * xpu get_output_kv_signal blocking read mode --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-15 14:11:03 +08:00
Cheng Yanfei	fbcccaa750	[Intel HPU] enable MoE EP for hpu (#5855 ) * enable HPU MoE EP * MoE intermediate_scale stack * enable loader_v1 esp for tensor_wise_fp8 TP or EP * modify activation_scale name	2026-01-15 13:08:00 +08:00
ming1753	7c56041272	[BugFix] fix PaddleOCR-VL illegal memory (#6042 )	2026-01-14 20:07:43 -08:00
zhupengyang	24ffa7c991	[XPU] fix moe num_expert (#6014 )	2026-01-15 10:49:36 +08:00
RAM	b3f59fd9b5	[RL][CI] Support Async R3 And Add Accuracy Test (#5937 ) * add bs1 r3 test case * async put * r3 test case 1.0 * success run eb5 * refine test case * pre-commit * add eb45 & glm testcase * format code * add p2pstore requirements * support only last turn * R3 use worker log * refine code &fix ci bug * refine error mesg * fix empty input bug * Success set acc ci of eb45 and glm45 * refine code * fix bug	2026-01-14 04:25:06 -08:00
ddchenhao66	9373f373dc	[XPU] fix multi-batch bug in VL model (#6015 ) * [XPU] fix multi-batch bug in VL model * Add command to kill additional port processes --------- Co-authored-by: ddchenhao66 <dhaochen163.com> Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2026-01-14 19:44:58 +08:00
xiaoxiaohehe001	6f72be7c3e	[Optimize] Qwen2.5-VL vision model with merged linear layers and unif… (#6037 ) * [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization * [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization	2026-01-14 19:21:31 +08:00
luukunn	93b7675a64	[Feature]Report FD statistical information (#5646 ) * add usage commit * update envs and xpu * add requirements * fix quantization value * add unit test * add unit test * fix unit test * add unit test * add unit test * add unit test * add unit test * add unit test * add unit test * fix FD_USAGE_STATS_SERVER * fix * fix * add doc * add doc * add doc * add doc * add doc * fix file name	2026-01-14 17:54:01 +08:00
MingkunZhang	273e79aa5b	[Metax][Fix] fix self.share_inputs['preempted_idx']=[] incorrect use (#6038 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-14 17:06:00 +08:00
MingkunZhang	32fb04703b	[Metax][Doc] update metax gpu 'get_started' doc (#6035 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-14 16:11:43 +08:00
YuBaoku	2c17acd767	[CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033 )	2026-01-14 15:22:26 +08:00
MingkunZhang	f3587b592c	[Metax][CI] remove 28B VL model test sampling randomness (#6032 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-14 14:00:41 +08:00
Jiaxin Sui	926a26074f	[XPU][CI] Cache queue port bug fix (#6030 ) * Remove cache queue port from test configuration Removed cache queue port configuration from test. * Remove cache queue port from test_vl_model.py Removed cache queue port argument from test configuration. * Update test_w4a8.py * Remove cache queue port from test_mtp.py Removed cache queue port configuration from test. * Remove cache queue port from test_logprobs_21b_tp4 Removed cache queue port configuration from test. * Remove cache queue port from test configuration Removed cache queue port configuration from test. * Update test_ep4tp4_online.py	2026-01-14 12:51:40 +08:00
chenjian	74d0f1c01f	[Optim] Robust sync status when preempted happens (#5796 ) * [Bug fix] Sync status for caching output cache * fix * fix * fix bug * fix * fix * support xpu * fix * fix * fix * fix * fix * fix ci * fix ci * fix xpu --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-14 12:07:33 +08:00
Ryan	0d1a5e70bc	[Graph Optimization] Add `full_cuda_graph` to control subgraph split (#6027 )	2026-01-14 11:43:59 +08:00
Yonghua Li	456637002d	[BugFix] fix cache transfer manager updating/clearing (#5930 ) * [fix] fix cache transfer manager updating/clearing * [fix] fix code style * [fix] fix config * [fix] fix engine client * [fix] let worker update kv cache status signal * [fix] update worker process * [fix] fix clear/update for case if comm group is shutdown * [fix] update dynamic weight manager * [fix] fix port * [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting	2026-01-13 05:09:29 -08:00
chenjian	6da06abc17	[Featue] Enable output caching by default (#5987 ) Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-13 19:34:21 +08:00
MingkunZhang	3772810b0a	[Metax][CI] update test_ernie_28b_vl.py image result keywords (#6022 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-13 17:15:10 +08:00
4454 Commits