Commit Graph

4373 Commits

Author SHA1 Message Date
YuBaoku 5218d40af6 [CI] Add clang-format 13.0.0 recommendation to pre_commit.sh 2026-01-08 21:47:19 +08:00
GoldPancake e41d434548 [Bugfix] Fix entropy calculation bugs (#5941)
* fix entropy bugs
2026-01-08 20:57:35 +08:00
Jiang-Jia-Jun b9663e5c89 Revise Pull Request guidelines and language section
Updated instructions for Pull Request titles and descriptions, changed language section to 'Others', and added notes on code style and pre-commit usage.
2026-01-08 19:26:05 +08:00
Copilot 6825903559 [BugFix] Fix misleading logging in worker_process for request counting (#5939)
* Initial plan

* Optimize logging in worker_process to accurately reflect request types

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Address feedback: rename to max_occupied_batch_index and simplify logging

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Improve comment clarity for batch request counting

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Fix code style: reorder imports with isort

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-08 16:36:22 +08:00
xiaoluomi 2bb838fed9 [TSP] Move last_norm allgather to model.py (#5924)
* support_lastnorm_gather_split_dev

* support_lastnorm_gather_split_dev1

* support_lastnorm_gather_split_dev3

* support_lastnorm_gather_split_dev4

* support_lastnorm_gather_split_dev5
2026-01-07 23:36:33 -08:00
Bingoo 8e11d719f3 add flashinfer-python-paddle dependency (#5912)
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-01-08 15:08:35 +08:00
GoldPancake a1fc4e249e [Bugfix] Fix mtp logprob hang problem when stop_seq is included (#5927)
* fix mtp logprob hang when stop_seq is included
2026-01-08 14:21:24 +08:00
Jiaxin Sui dc170e3005 [XPU][CI] Update CI workflow to include all file types (#5943)
Removed paths-ignore for markdown and text files.
2026-01-08 12:03:26 +08:00
FocusLuo decbbb3933 [INTEL HPU] support only one release package of PaddleCustomDevice (#5910)
Signed-off-by: Luo, Focus <focus.luo@intel.com>
2026-01-08 11:57:13 +08:00
CSWYF3634076 d8fcb7c07d [Models] Add Qwen3-VL Moe Model Support (#5913)
* [Model] add Qwen3vl moe model support

* [Model] add Qwen3vl moe model support remove log

* [Model] add Qwen3vl moe model support unittest
2026-01-08 11:36:42 +08:00
Daci d8c6ba61f3 [BugFix] resource_manager_v1 lock PD (#5616)
* bugfix resource_manager_v1 lock PD

* with lock add_prefilled_request

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-08 10:02:54 +08:00
YuBaoku 5088d4acdb [CI] Add daily build_linux jobs for CUDA 12.9 (#5936)
To extend the daily CI coverage by adding Linux build jobs for CUDA 12.9.
2026-01-07 23:20:11 +08:00
FocusLuo 64f910553e [INTEL_HPU] support ERNIE-4.5-21B-A3B-Thinking (#5891)
ERNIE-4.5-21B-A3B-Thinking needs to use DefaultModelLoaderV1 mode

reference command line:
ENABLE_V1_KVCACHE_SCHEDULER=1 FD_ENC_DEC_BLOCK_NUM=8 HPU_PERF_BREAKDOWN_SYNC_MODE=1 \
HPU_WARMUP_BUCKET=0 MAX_PREFILL_NUM=1 FD_ATTENTION_BACKEND=HPU_ATTN \
python -m fastdeploy.entrypoints.openai.api_server --model \
./models--baidu--ERNIE-4.5-21B-A3B-Thinking/snapshots/4341bb42644d5422859509fa25d41544c57181f8/ \
--port 8388 --engine-worker-queue-port 8302 --metrics-port 8301 \
--cache-queue-port 8303 --max-model-len 16384 --tensor-parallel-size 1 \
--load-choices "default_v1" --num-gpu-blocks-override 5000 --kv-cache-ratio 0.5 \
--max-num-seqs 128 --block-size 64 --no-enable-prefix-caching \
--graph-optimization-config '{"use_cudagraph":false}'

Signed-off-by: Luo, Focus <focus.luo@intel.com>
2026-01-07 21:31:53 +08:00
mouxin 0a92e96f20 [Feature] Add Golang-based Router for Request Scheduling and Load Balancing (#5882)
* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] add golang router

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

* [Feature] Add Golang-based Router for Request Scheduling and Load Balancing

---------

Co-authored-by: mouxin <mouxin@baidu.com>
2026-01-07 21:28:08 +08:00
chenjian 925e7edd3c [Bug fix] Limit multi-modal request to 1 (#5901) 2026-01-07 20:25:07 +08:00
lizhenyun01 2be8656c29 [BugFix] fix mtp split kv attention (#5920)
* [BugFix] fix mtp split kv attention

* clean code

* clean code
2026-01-07 04:07:31 -08:00
chenjian c883a2d3ec [Optimization] Reduce preemption occurrence when blocks are not enough (#5696)
* [Optimize] Reduce preemption occurrence when blocks are not enough for decoding

* fix

* fix

* fix spell

* optimize performance

* fix
2026-01-07 20:01:16 +08:00
xunyoyo 78adf83549 [CI] 【Hackathon 9th Sprint No.18】NO.18 Supplement unit tests for functional modules -new (#5717)
* Remove paddle import guards from DeepEP tests

* Sort imports in DeepEP tests

* Refactor assertions for combine handle in test_ep.py

Updated assertions to verify combine handle in DeepEPEngine.

* Add moe_select coverage in DeepEP tests

* Refactor assertions for combine handle in test_ep

* Strengthen moe_select assertions in DeepEP tests

* Update test_ep.py

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
2026-01-07 17:20:59 +08:00
Ryan 3e74bacc5e add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op (#5847) 2026-01-07 16:17:55 +08:00
kevin eabd01cd21 [BugFix] fix eb5 prefix bug (#5879)
* fix eb5 prefix bug

* update ci test

* update code

* update code

* update code

* update code

* update code

* update code

* update code
2026-01-06 23:50:39 -08:00
kevin a76e8ae40c [Feature] support rdma pd dy-c8 (#5788)
* add rdma pd dy-c8

* update code
2026-01-07 14:55:25 +08:00
周周周 f15df1ec89 Revert cuda check (#5915)
* commit

* commit
2026-01-07 14:40:18 +08:00
yzwu 29898372e9 [Iluvatar] remove CUDA_VISIBLE_DEVICE in run_ci_iluvatar.sh (#5916) 2026-01-07 14:10:47 +08:00
Jiang-Jia-Jun 15179ab730 Revise language guidelines for PR reviews
Updated language instructions for PR comments.
2026-01-07 13:34:02 +08:00
yangjianfengo1 59523b27de opt w4afp8 (#5853) 2026-01-07 12:22:35 +08:00
sunxin 6ee8241521 [V1 Loader] Support loading static C8 scale JSON (#5909)
* v1 loader: support loading static C8 scale JSON

* update
2026-01-06 19:49:30 -08:00
MingkunZhang 7ad5737560 [Metax] adapt to gemm interface on different versions of maca (#5905)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-07 10:02:24 +08:00
fmiao2372 1ee285c2d6 [Intel HPU] enable chunked prefill (#5903)
* [Intel HPU] enable chunked prefill

* fix bug by copilot comments
2026-01-06 21:01:50 +08:00
周周周 83ae59431e [BugFix] fix BatchMLAWithPagedKVCacheKernel O_tmp (#5895) 2026-01-06 15:39:06 +08:00
ddchenhao66 733014bf32 [XPU] Support EP4TP1 in pd disaggregation (#5860)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-06 15:25:36 +08:00
gaoziyuan e99ec4c9d5 [Bugfix] fix model weight signal tensor num (#5900) 2026-01-06 14:36:59 +08:00
Yonghua Li 9445fbe054 [KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871)
* [fix] temporarily forbid cpu cache in update/clear api

* [fix] stop launching cache transfer manager unless hierarchical cache is enabled

* [fix] fix no attr hierarchical cache

* [fix] fix ci

* [fix] fix test_prefix_cache_manager.py
2026-01-06 14:27:47 +08:00
Yonghua Li 9fc2400e71 [BugFix] fix mtp cache attaching for pd disaggregation (#5884)
* [fix] fix mtp cache attaching for pd disaggregation

* [fix] fix test_mtp_proposer.py
2026-01-06 14:17:53 +08:00
jc e9b25aa72f [BugFix] Storage backend gets env params (#5892)
* Storage backend gets env params

* up

* up

* up
2026-01-06 14:14:17 +08:00
lizexu123 acdf0cd1d9 fix hadamard_block_size (#5888) 2026-01-06 14:12:14 +08:00
qwes5s5 b3ca7f041a [BugFix] Fix redundant prompt_logprobs in the second chunk of streaming response when return_token_ids is enabled for v1/completions and fix trace file name (#5829)
* fix prompt logprobs bug

* fix trace file name

---------

Co-authored-by: qwes5s5 <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-06 14:11:43 +08:00
freeliuzc ca574119e5 support multi-step draft-model with cudagraph (#5886) 2026-01-06 11:16:21 +08:00
周周周 7a0744f05a [UT]support attention test tp (#5887) 2026-01-06 11:15:01 +08:00
Copilot 5c53193c4e [Docs] Update GPU version from 2.3.0 to 2.3.2 in installation documentation (#5894)
* Initial plan

* Update GPU version from 2.3.0 to 2.3.2 in NVIDIA GPU installation documentation

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-06 11:06:32 +08:00
Yuanle Liu 5e729bc2ba [OPs] ep_moe_expert_dispatch.cu dispatch num_experts_per_rank 5 (#5890) 2026-01-06 10:39:35 +08:00
Neil Zhu 272a371635 [Metax] optimize flash attention backend (#5876) 2026-01-06 09:52:09 +08:00
周周周 ab553b3b8b revert cuda_check (#5883) 2026-01-05 20:51:31 +08:00
Jiaxin Sui 2785b820c8 [XPU][CI] Add XPU logprobs case (#5874)
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error.

* Update PaddlePaddle installation command

* Remove max_tokens from model response configuration

Removed max_tokens parameter from the model response call.

* add xpu logprobs case

* Fix formatting and improve setup_logprobs_env

Add newline at end of file and update setup_logprobs_env function.

* Refactor test_logprobs_21b_tp4.py for clarity

* Change top_p value from 1.0 to 0

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2026-01-05 19:01:14 +08:00
lizexu123 1d3ae7c024 [BugFix] fix w4afp8 tp=8 (#5868)
* fix w4afp8 tp=8

* fix
2026-01-05 18:59:02 +08:00
tianhaodongbd 6f14b180e3 [RL] Change 'model' to the instance variable 'tmp_model' (#5872) 2026-01-05 02:09:02 -08:00
ming1753 f50e1bcc16 [Others] enable use of PFCC deep_ep (#5822)
* upstream deep_ep

* fix bug

* fix bug

* modify env name
2026-01-05 02:07:01 -08:00
jc 8d384f9fd8 [PD Disaggregation] Update usage of pd disaggregation and data parallel (#5742)
* Update usage of pd disaggregation

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up dp docs

* up

* up

* up

* fix unittest
2026-01-05 17:51:29 +08:00
cmcamdy 690d4bcdb0 [XPU] Speculative Decoding with PD (#5856)
* [XPU] Speculative Decoding with PD

* fix post process

* share kv cache sender

* support speculate decoding step system cache

* support speculate decoding step system cache

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>
2026-01-05 17:31:03 +08:00
chen ac39c0f887 support fa3 qwen-vl rope (#5869) 2026-01-05 15:29:34 +08:00
sunxin adb91dcacc [BugFix] Fix wint4 ep issue caused by empty run (#5870) 2026-01-05 14:24:37 +08:00