FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-10 09:31:48 +08:00

Author	SHA1	Message	Date
RAM	b3f59fd9b5	[RL][CI] Support Async R3 And Add Accuracy Test (#5937 ) * add bs1 r3 test case * async put * r3 test case 1.0 * success run eb5 * refine test case * pre-commit * add eb45 & glm testcase * format code * add p2pstore requirements * support only last turn * R3 use worker log * refine code &fix ci bug * refine error mesg * fix empty input bug * Success set acc ci of eb45 and glm45 * refine code * fix bug	2026-01-14 04:25:06 -08:00
ddchenhao66	9373f373dc	[XPU] fix multi-batch bug in VL model (#6015 ) * [XPU] fix multi-batch bug in VL model * Add command to kill additional port processes --------- Co-authored-by: ddchenhao66 <dhaochen163.com> Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2026-01-14 19:44:58 +08:00
xiaoxiaohehe001	6f72be7c3e	[Optimize] Qwen2.5-VL vision model with merged linear layers and unif… (#6037 ) * [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization * [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization	2026-01-14 19:21:31 +08:00
luukunn	93b7675a64	[Feature]Report FD statistical information (#5646 ) * add usage commit * update envs and xpu * add requirements * fix quantization value * add unit test * add unit test * fix unit test * add unit test * add unit test * add unit test * add unit test * add unit test * add unit test * fix FD_USAGE_STATS_SERVER * fix * fix * add doc * add doc * add doc * add doc * add doc * fix file name	2026-01-14 17:54:01 +08:00
YuBaoku	2c17acd767	[CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033 )	2026-01-14 15:22:26 +08:00
MingkunZhang	f3587b592c	[Metax][CI] remove 28B VL model test sampling randomness (#6032 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-14 14:00:41 +08:00
Jiaxin Sui	926a26074f	[XPU][CI] Cache queue port bug fix (#6030 ) * Remove cache queue port from test configuration Removed cache queue port configuration from test. * Remove cache queue port from test_vl_model.py Removed cache queue port argument from test configuration. * Update test_w4a8.py * Remove cache queue port from test_mtp.py Removed cache queue port configuration from test. * Remove cache queue port from test_logprobs_21b_tp4 Removed cache queue port configuration from test. * Remove cache queue port from test configuration Removed cache queue port configuration from test. * Update test_ep4tp4_online.py	2026-01-14 12:51:40 +08:00
chenjian	74d0f1c01f	[Optim] Robust sync status when preempted happens (#5796 ) * [Bug fix] Sync status for caching output cache * fix * fix * fix bug * fix * fix * support xpu * fix * fix * fix * fix * fix * fix ci * fix ci * fix xpu --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-14 12:07:33 +08:00
Ryan	0d1a5e70bc	[Graph Optimization] Add `full_cuda_graph` to control subgraph split (#6027 )	2026-01-14 11:43:59 +08:00
Yonghua Li	456637002d	[BugFix] fix cache transfer manager updating/clearing (#5930 ) * [fix] fix cache transfer manager updating/clearing * [fix] fix code style * [fix] fix config * [fix] fix engine client * [fix] let worker update kv cache status signal * [fix] update worker process * [fix] fix clear/update for case if comm group is shutdown * [fix] update dynamic weight manager * [fix] fix port * [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting	2026-01-13 05:09:29 -08:00
MingkunZhang	3772810b0a	[Metax][CI] update test_ernie_28b_vl.py image result keywords (#6022 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-13 17:15:10 +08:00
MingkunZhang	5afeef69d6	[Metax][CI] update test_ernie_28b_vl.py (#6019 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-13 15:44:43 +08:00
ming1753	9c559d02d3	[BugFix] Fix insert_zmq_task_to_scheduler break bug (#5960 ) * [BugFix] fix zmq bug * fix bug * formate * fix test bug * fix bug	2026-01-12 19:21:01 -08:00
sunxin	2533836dbb	[Optimization] Accelerate Qwen3 QK RMSNorm via Fused Triton Kernel (#5880 ) * qk rmsnorm fused * inplace * glm * fix * add qknorm layer * fix * update * fix qwen3 vl * update rl baseline * fix qwen3 vl moe * test * fix qwen vl moe rl * fix	2026-01-12 05:10:21 -08:00
xjkmfa	1aa7e82924	[ci case]Check the chunking of the chat interface (#5981 ) * Add ci case for min token and max token * 【CI case】include total_tokens in the last packet of completion interface stream output * [ci case] add Chunk segmentation check * [ci case] add Chunk segmentation check * [ci case] add Chunk segmentation check * [ci case] add Chunk segmentation check --------- Co-authored-by: xujing43 <xujing43@baidu.com>	2026-01-12 16:36:13 +08:00
ddchenhao66	fefc0b8382	[XPU]add ci test cast for P_EP4TP4 D_EP4TP1 (#5988 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2026-01-12 16:30:15 +08:00
Yonghua Li	60ee72f682	[BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935 ) * [fix] fix rdma script and add more error log for multi api server * [fix] log * [fix] fix test_multi_api_server * [fix] fix multi api server port check --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-12 10:38:52 +08:00
zhupengyang	9db48ecb34	[XPU] fix dp4 (#5946 )	2026-01-09 20:36:53 +08:00
MingkunZhang	384ffd6952	[Metax] add ci test file & update run_ci_metax.sh (#5975 ) Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>	2026-01-09 18:47:06 +08:00
xiaoxiaohehe001	00a01ae024	[Feature] Support redundant expert for eplb (#5918 ) * [BugFix] support redundant expert for eplb * support redundant expert for eplb * support redundant expert for eplb * update * fix ci eplb	2026-01-09 17:13:24 +08:00
CSWYF3634076	e6cdea4492	[Models] Qwen3VL and Qwen3VL-Moe CUDA graph Support (#5962 ) * [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support * [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v2 * [Models] add Qwen3VL and Qwen3VL-Moe CUDA graph support v3	2026-01-09 17:09:02 +08:00
zccjjj	20de04e249	[XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu (#5878 )	2026-01-09 16:34:57 +08:00
essos	1d20957340	[CI] [Hackathon 9th Sprint No.50] NO.50 Add unit tests for functional module fastdeploy/entrypoints/engine_client.py -part #5045 (#5807 ) * update test code * reduce mocks * fix style --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-01-09 15:13:19 +08:00
GoldPancake	3ca99ab170	[Speculative Decoding] Return accepted tokens per head in response (#5947 ) * adjust log level * add accepted tokens per head * fix ut * fix	2026-01-09 14:32:08 +08:00
kevin	2d2b156252	[BugFix] fix dyc8 cache bug (#5958 ) * fix dyc8 cache bug * update code	2026-01-08 19:25:47 -08:00
GoldPancake	e41d434548	[Bugfix] Fix entropy calculation bugs (#5941 ) * fix entropy bugs	2026-01-08 20:57:35 +08:00
CSWYF3634076	d8fcb7c07d	[Models] Add Qwen3-VL Moe Model Support (#5913 ) * [Model] add Qwen3vl moe model support * [Model] add Qwen3vl moe model support remove log * [Model] add Qwen3vl moe model support unittest	2026-01-08 11:36:42 +08:00
xunyoyo	78adf83549	[CI] [Hackathon 9th Sprint No.18] NO.18 Add functional module unit tests -new (#5717 ) * Remove paddle import guards from DeepEP tests * Sort imports in DeepEP tests * Refactor assertions for combine handle in test_ep.py Updated assertions to verify combine handle in DeepEPEngine. * Add moe_select coverage in DeepEP tests * Refactor assertions for combine handle in test_ep * Strengthen moe_select assertions in DeepEP tests * Update test_ep.py --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-01-07 17:20:59 +08:00
kevin	eabd01cd21	[BugFix] fix eb5 prefix bug (#5879 ) * fix eb5 prefix bug * update ci test * update code * update code * update code * update code * update code * update code * update code	2026-01-06 23:50:39 -08:00
ddchenhao66	733014bf32	[XPU] Support EP4TP1 in pd disaggregation (#5860 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2026-01-06 15:25:36 +08:00
Yonghua Li	9445fbe054	[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871 ) * [fix] temporarily forbid cpu cache in update/clear api * [fix] stop launching cache transfer manager unless hierarchical cache is enabled * [fix] fix no attr hierarchical cache * [fix] fix ci * [fix] fix test_prefix_cache_manager.py	2026-01-06 14:27:47 +08:00
Yonghua Li	9fc2400e71	[BugFix] fix mtp cache attaching for pd disaggregation (#5884 ) * [fix] fix mtp cache attaching for pd disaggregation * [fix] fix test_mtp_proposer.py	2026-01-06 14:17:53 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888 )	2026-01-06 14:12:14 +08:00
周周周	7a0744f05a	[UT]support attention test tp (#5887 )	2026-01-06 11:15:01 +08:00
Jiaxin Sui	2785b820c8	[XPU][CI] Add XPU logprobs case (#5874 ) * Enhance run_ci_xpu.sh with caching and prefill options * Update model path and configuration in run_ci_xpu.sh * Add '北朝' keyword to assertion in run_45vl.py * Enhance process termination logic in run_ci_xpu.sh * Set timeout for CI_XPU job to 60 minutes * Remove extra newline in stop_processes function * Update paddlepaddle-xpu installation command Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error. * Update PaddlePaddle installation command * Remove max_tokens from model response configuration Removed max_tokens parameter from the model response call. * add xpu logprobs case * Fix formatting and improve setup_logprobs_env Add newline at end of file and update setup_logprobs_env function. * Refactor test_logprobs_21b_tp4.py for clarity * Change top_p value from 1.0 to 0 --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>	2026-01-05 19:01:14 +08:00
jc	8d384f9fd8	[PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742 ) * Update usage of pd disaggregation * up * up * up * up * up * up * up * up * up * up dp docs * up * up * up * fix unittest	2026-01-05 17:51:29 +08:00
jc	e911ac2ce7	[BugFix] Refine the preparation of cpu and storage cache (#5777 ) * Refine the preparation of cpu and storage cache * fix error * fix error * up * fix * up docs * fix unittest * remove debug info	2026-01-05 10:13:30 +08:00
kevin	52dc9a7b85	[BugFix] skip mm revert (#5848 ) * skip mm revert * update code * update test	2026-01-04 14:25:45 +08:00
周周周	e3957a5ebc	[Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620 )	2026-01-04 11:21:15 +08:00
GoldPancake	4e10ae5d99	[Speculative Decoding] Optimize draft logprob (#5842 ) * optimize draft logprob * fix ut	2025-12-31 13:35:56 +08:00
xjkmfa	ed60b4da32	[CI case]Prompt logprob (#5835 ) * [ci case]prompt_logprobs	2025-12-30 21:26:06 +08:00
essos	b03a4f3e3d	[CI] [Hackathon 9th Sprint No.46] NO.46 Add unit tests for functional module fastdeploy/model_executor/guided_decoding/xgrammar_backend.py (#5042 ) * test * rename ut * remove test max_rollback_tokens * update * simplify code * fix: torch use mock --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-30 17:05:26 +08:00
chen	0bcf924e10	[Optimization] Optimization for gather_logprob by 10GB (#5817 ) * opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k	2025-12-30 15:33:34 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
GoldPancake	e78e22ebd5	[BugFix] Fix entropy bugs (#5818 ) * fix entropy bugs * fix ut * fix	2025-12-29 20:44:29 -08:00
周周周	7ae13b2326	[PD Disaggregation]remove unsed para in RDMACommManager (#5814 )	2025-12-30 11:38:30 +08:00
Yonghua Li	a8d3e3ba12	[BugFix] fix shm opened but not closed in set_data_ipc (#5826 )	2025-12-29 23:35:07 +08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
essos	ffb3ccff74	[CI] [Hackathon 9th Sprint No.52] NO.52 Add unit tests for functional module fastdeploy/model_executor/guided_decoding/ernie_tokenizer.py (#5047 ) * add test * update test * simplify code * remove mocks --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-29 13:44:56 +08:00
xunyoyo	7e39560a42	[CI] [Hackathon 9th Sprint No.33] NO.33 Add functional module unit tests -new (#5726 ) * Add cache messager coverage tests * Add default_dtype parameter to test cache manager --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2025-12-29 13:42:27 +08:00

626 Commits