FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
SunLei	32b6900d01	fix code type (#6951)	2026-03-20 16:14:12 +08:00
AIbin	bf7e2424d0	[Optimization][Feature]Supports multiple batches of DSK-DSA. (#6930) * support DSA_MUTI_BATCH * update test topk * update dsk-dsa	2026-03-20 15:59:22 +08:00
周周周	1c38da2118	Make seq_lens_this_time/decoder/encoder equal shape (#6942)	2026-03-20 15:31:52 +08:00
Zhang Yulong	2b10ebc1f1	[benchmark] Refactor debug logging and payload handling (#6949) * Refactor debug logging and payload handling * Update backend_request_func.py	2026-03-20 15:04:10 +08:00
Zhang Yulong	3a4e139f65	[Benchmark] fix multi turn (#6948)	2026-03-20 13:22:30 +08:00
cloudforge1	aca733b95c	[CI]【Hackathon 10th Spring No.32】load_weight_utils unit test (#6740) * 【Hackathon 10th Spring No.32】Unit test for load_weight_utils.py * [CI]【Hackathon 10th Spring No.32】rewrite load_weight_utils unit test * [CI]【Hackathon 10th Spring No.32】improve load_weight_utils coverage to 83% - Add test_load_ep_checkpoint_basic: exercises EP checkpoint loading with minimal fixture - Add test_composite_ep_branch: covers EP path in load_composite_checkpoint - Add test_get_weight_iterator_unordered: covers unordered sharded safetensors path * [CI]【Hackathon 10th Spring No.32】align load_weight_utils test with gold standard (tmp_path, split tests) * [CI]【Hackathon 10th Spring No.32】add coverage tests for load_weight_utils - Add test_is_layers_grouped: test layers_are_grouped() with grouped, interleaved, and no-layer keys - Add test_save_model_bf16_cache: exercise save_model decorator with is_checkpoint_bf16=True - Add test_composite_checkpoint_ep: test load_composite_checkpoint use_ep=True branch - Add test_composite_checkpoint_rank_mismatch: test tp_size != rank_dirs ValueError - Add test_composite_checkpoint_kv_quant: test float8_e4m3fn kv_cache path - Add __main__ block for direct execution * [CI]【Hackathon 10th Spring No.32】raise load_weight_utils test delta * [CI]【Hackathon 10th Spring No.32】cover TP sequence-parallel MoE load branches * test: add load_reordered_experts, pre-sharded, and empty-state tests --------- Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>	2026-03-20 13:14:30 +08:00
xjkmfa	3b203994e2	[Benchmark] Update Qwen3 vl 32k yaml (#6946)	2026-03-20 11:48:53 +08:00
xjkmfa	a81116ad90	[Benchmark] Update Qwen3 vl dense yaml (#6945)	2026-03-20 11:26:47 +08:00
sunxin	d77edf8fc9	opt wfp8afp8 triton moe (#6938)	2026-03-20 11:07:25 +08:00
mouxin	96b0ecea6b	[Feature] Update Counter Release (#6943)	2026-03-20 10:51:37 +08:00
luukunn	f4a79d4c00	[Optimization]Unified data processing for online and offline (#6891) * remove process_request * fix chat * fix unit test * remove process response * fix unit test * fix offline decode * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix sampling_params --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-19 21:56:09 +08:00
luukunn	c3d8db85c4	[Optimization] Update ZMQ server (#6735) * add batch zmq send response * update * Revert "update" This reverts commit `0234a25b47`. * update * remove lock * fix unit test * add unit test * add unit test * pre commit * add unit test * fix unit test * add unit test * fix worker>1 * update zmq_worker_pid * fix unit test * fix unit test * fix unit test * add unit test * fix unit test * fix first token time * fix logprobs * add unit test * op * remove debug log --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-19 21:53:16 +08:00
cloudforge1	9148562ed0	[CI]【Hackathon 10th Spring No.35】resource_manager unit test additions (#6734) * [CI]【Hackathon 10th Spring No.35】resource_manager unit test additions * [CI]【Hackathon 10th Spring No.35】resource_manager unit test additions * [CI]【Hackathon 10th Spring No.35】add __main__ block --------- Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com> Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>	2026-03-19 17:45:21 +08:00
YuBaoku	7141db0e01	[CI] Optimize CI: update nightly test_image build workflow (#6937)	2026-03-19 17:39:01 +08:00
周周周	b1c800b64b	remove load_up_proj_weight_first (#6932)	2026-03-19 17:21:34 +08:00
sunxin	33e01f22a8	[Feature][Sampling] Extend top-k_top-p sampling to all backends and unify greedy decoding with top_k=1 (#6894) * update sampling * fix * fix * fix mtp * fix test	2026-03-19 01:43:10 -07:00
YuBaoku	2b84a4276e	[CI] Optimize CI: add timeout and cancel on PR close (#6933)	2026-03-19 15:54:30 +08:00
JYChen	f95d8ca7df	[RL] support qkrmsnorm use proxy-norm (#6862) * support qkrmsnorm use paddle.nn.functional.rms_norm * remove flags in fd	2026-03-18 23:27:26 -07:00
周周周	1a05744c4e	nvfp4.py support ep (#6920)	2026-03-19 14:07:46 +08:00
周周周	c184a7cb69	remove source in weight_loader in moe.py (#6892)	2026-03-19 13:31:43 +08:00
Nyakku Shigure	dd93f8ffb4	[Optimization] Skip compat guard when torch is not installed (#6913)	2026-03-19 11:29:27 +08:00
AIbin	4794a28f3d	opt glm5 model (#6916)	2026-03-19 11:13:33 +08:00
jc	dd55cda3c8	[CI] Add test for pd and cache storage (#6876) * Add test for pd and cache storage * up * up * fix bug * fix bug * up docker image * up	2026-03-19 10:38:27 +08:00
gongweibao	fb6c56dfd5	[BugFix][DataProcessor] Force top_k=1 for greedy decoding when temperature=0 (#6748) * [BugFix] Force top_k=1 for greedy decoding when temperature=0 When temperature is set to 0 (greedy decoding), only setting temperature to a small epsilon is insufficient: the sampling kernel may still pick non-top-1 tokens. Explicitly set top_k=1 in all processors to guarantee argmax behavior. Additionally, add argmax fast-path in top_k_top_p_sampling() under FD_DETERMINISTIC_MODE to handle non-rejection sampling backends that ignore top_k parameter. * Extract greedy decoding from FD_DETERMINISTIC_MODE guard top_k=1 → argmax is a correctness optimization, not deterministic-specific. Remove the FD_DETERMINISTIC_MODE guard so all-greedy fast-path and mixed-batch override work unconditionally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Update test_torch_model.py --------- Co-authored-by: gongweibao <gognweibao@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-03-18 17:36:43 +08:00
AIbin	9b117aafac	support glm-moe-dsa model (#6863)	2026-03-18 17:21:55 +08:00
YuBaoku	07543685ec	[CI] Isolate cache and ccache for CUDA 13.0 build	2026-03-18 11:41:46 +08:00
fxyfxy777	9660f98837	[BugFix] Set FD_USE_PHI_MOE_PERMUTE = 0 Default (#6886) * FD_USE_PHI_MOE_PERMUTE = 0 * modify comments	2026-03-17 20:05:39 -07:00
yzwu	8b890c0d72	[Iluvatar] refactor attn and moe code (#6887)	2026-03-18 10:31:00 +08:00
YuBaoku	0359794e08	[CI] Sync _log_softmax_batch_invariant with paddle update (#6893)	2026-03-17 23:03:57 +08:00
mouxin	2a371a3450	[Feature] Update tpSize (#6896)	2026-03-17 20:20:39 +08:00
lizan1999	148eee84c6	[XPU] use quant2d_per_token for weight quant int8 && fix some XPU Kernel check (#6869)	2026-03-17 19:44:48 +08:00
Jiaxin Sui	aa9deb6ad4	[XPU] Dockerfiles update (#6898) * Update Dockerfile.xpu * Add build script for XPU Docker image * Refactor Dockerfile to conditionally install packages Added conditional installation for requirements and fastdeploy. * Reorder RUN commands in Dockerfile.xpu * Update Dockerfile.xpu * Delete dockerfiles/build_xpu.sh	2026-03-17 19:43:49 +08:00
gongweibao	e4c9cac124	[BugFix] Cap nvcc -t threads to avoid compilation failures on high-co… (#6885) * [BugFix] Cap nvcc -t threads to avoid compilation failures on high-core machines On machines with many cores (e.g. 192), the nvcc -t flag was set to os.cpu_count(), causing each nvcc process to spawn that many internal threads. Combined with Paddle's ThreadPoolExecutor launching parallel compilations (also based on cpu_count), this leads to ~28K+ threads, resource exhaustion, and silent compilation failures. The linker then cannot find the missing .o files, but a second build succeeds because already-compiled objects are cached. Cap nvcc -t at 4 to keep total parallelism reasonable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: gongweibao <gognweibao@baidu.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-17 19:27:45 +08:00
AIbin	cb6819d086	[Optimization][OP]support per_token_group_fp8_quant cuda kernel (#6865) * support per_token_group_fp8_quant cuda kernel * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * update code --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-17 19:17:51 +08:00
mouxin	b61731bb96	[Feature][Docs] Adjust prefill release & expose load metrics (#6884)	2026-03-17 15:23:13 +08:00
Longzhi Wang	daaf498213	[Feature] support compute shared experts before combine for better overlap (#6697) * [Feature] support compute shared experts before combine for better overlap * fix test * fix xpu * fix	2026-03-17 15:18:51 +08:00
Jiang-Jia-Jun	12eb001d0c	Remove comments on multi-mode request handling Removed comments about multi-mode scenarios and request pulling.	2026-03-17 14:49:00 +08:00
jc	950366e58d	[PD Disaggregation][RL] Register to router with version and support rdma eager connect for pd (#6718) * [Feature] Register to router with version info for PD disaggregation Add RegisterManager for PD (Prefill-Decode) disaggregated deployment: - All instances (Prefill/Decode) register to Router with heartbeat - Prefill instances fetch Decode instance list from Router - Prefill instances establish eager RDMA connections to Decode instances - Register info includes: host_ip, port, role, version, is_paused, connected_decodes Changes: - Add RegisterManager class for managing PD registration and RDMA connections - Add version field to ModelConfig for model version tracking - Add connected_decodes to register_info for tracking connected Decode instances - Add FD_ENABLE_PD_RDMA_EAGER_CONNECT environment variable Test fixes: - Add None checks for load_config in FDConfig.__init__ - Add version attribute to test mock model configs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refine * remove test --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 14:43:35 +08:00
YuBaoku	b152baeeee	[CI] disable test_batch_invariance_op_logsoftmax.py in unit_test	2026-03-17 14:43:14 +08:00
周周周	ea998dd26f	clean clean code in _load_per_tensor_weight_scale (#6868) Co-authored-by: “liuruian” <liuruian@baidu.com>	2026-03-17 14:06:57 +08:00
qwes5s5	3b7507a4c2	test_abort (#6743)	2026-03-17 14:06:40 +08:00
huicongyao	eab429d05e	fix performance drop while no spec (#6866)	2026-03-17 13:06:36 +08:00
luukunn	fe8d58a094	[Optimization]update request in tool parser&reasoning parser (#6858) * update request in tool parser&reasoning parser	2026-03-17 11:51:12 +08:00
RichardWooSJTU	4ed483d20b	[BugFix] Fix ep compatibility issues & Optimize permute operator (#6821) * fix ep compatibility issues & optimize permute operator * fix ut * fix ut	2026-03-17 10:32:11 +08:00
gongweibao	a6351dea0b	[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533) * init * init * fix format * add * add files * add ut * fix some * add ut * add more * add * fix pre-commit * fix pre-commit * fix cover * skip long seq * add * add * fix * remove not need * fix set attr * fix comments * fix comments * fix failed tests --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-16 21:32:43 +08:00
Jiang-Jia-Jun	d113397b09	Simplify available_blocks assignment logic (#6819)	2026-03-16 20:12:30 +08:00
Longzhi Wang	5c92f4d0cd	[Feature] Add deepgemm bias epilogue for SM100 (#6857) * [Feature] Add deepgemm bias epilogue for SM100 * fix	2026-03-16 20:12:00 +08:00
Jiang-Jia-Jun	bd4b6092dd	Update title and activity section in README_CN.md	2026-03-16 19:21:50 +08:00
Jiang-Jia-Jun	c5f402e7aa	Update title and release note in README_CN.md	2026-03-16 19:17:38 +08:00
AIbin	c9f7f5234e	[Optimization][BugFix]Optimize Deepseek networking code (#6861) * update dsk model * update dsk model	2026-03-16 16:52:43 +08:00
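
Two entries above, #6748 and #6894, rely on the same idea. Top-k sampling with k = 1 keeps exactly one candidate, so it is identical to argmax. By contrast, setting the temperature to a small epsilon only sharpens the distribution and can still leave a sampling kernel room to pick a non-top-1 token. The pure-Python sketch below illustrates that rule. `top_k_top_p_sample` is a hypothetical helper written for illustration; it is not FastDeploy's kernel or API. The filtering order it uses (top-k, then a temperature-scaled softmax, then top-p) is also an assumption.

```python
import math
import random


def top_k_top_p_sample(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Return one token id sampled from `logits` (hypothetical illustration).

    top_k <= 0 disables top-k filtering; top_p >= 1.0 disables top-p filtering.
    temperature == 0 is treated as greedy decoding by forcing top_k = 1.
    """
    if temperature == 0:
        top_k = 1  # greedy decoding: keep only the highest-scoring token
    if top_k == 1:
        # Fast path: top_k = 1 is exactly argmax, so no randomness is involved
        # and the division by temperature below is never reached.
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()
    # Rank token ids by logit, highest first, then apply top-k.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Temperature-scaled softmax over surviving candidates (max-subtracted for stability).
    scaled = [logits[i] / temperature for i in order]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p (nucleus) filtering: keep the smallest prefix whose mass reaches top_p.
    if top_p < 1.0:
        cumulative, cutoff = 0.0, len(probs)
        for idx, p in enumerate(probs):
            cumulative += p
            if cumulative >= top_p:
                cutoff = idx + 1
                break
        order, probs = order[:cutoff], probs[:cutoff]

    # random.choices accepts unnormalized relative weights.
    return rng.choices(order, weights=probs, k=1)[0]
```

With `logits = [0.1, 2.5, -1.0, 2.4]`, calling `top_k_top_p_sample(logits, temperature=0.0)` always returns index 1. Passing `top_k=1` at any nonzero temperature does the same. Passing `top_k=2` restricts sampling to indices 1 and 3.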


4862 Commits