FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
kevin	741a01562b	[BugFix][Cherry-Pick] cp fix dyc8 cache bug(#5958 ) (#5959 ) * cp fix dyc8 cache bug * udpate code	2026-01-08 19:25:56 -08:00
GoldPancake	8049a4982e	[Cherry-Pick][Bugfix] Fix entropy calculation bugs (#5941 ) (#5942 ) * fix entropy bug	2026-01-08 20:57:45 +08:00
Jiaxin Sui	7cdffced2d	[Cherry Pick][XPU][CI] Add logprobs Case (#5907 ) * Implement setup_logprobs_env for environment setup Add setup_logprobs_env function to manage environment variables for logprobs. * Update conftest.py * Add logprobs test for ERNIE-4.5-21B-A3B model This test verifies the logprobs functionality of the ERNIE-4.5-21B-A3B model through direct HTTP requests, ensuring correct response structure and log probabilities. * Fix indentation and formatting in conftest.py	2026-01-07 18:37:49 +08:00
kevin	939dfa4877	[BugFix][Cherry-Pick] Cp fix eb5 prefix cache(#5879 ) (#5881 ) * fix eb5 prefix bug * update code * update code * update code * update code	2026-01-06 23:49:32 -08:00
freeliuzc	dcb0cceded	[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 ) (#5793 ) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operater register * update pmtp multi-step mtp strategy in d-split -mode * add note * fix xpu register Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2026-01-05 07:59:17 -08:00
YuBaoku	2a71e427f9	[Cherry-Pick][CI] Fix archive URL injection and add retry(#5725,#5828) (#5832 )	2026-01-04 17:10:03 +08:00
chen	9a7eb33fd4	[Cherry-Pick][Optimization] Optimization for gather_logprob by 10GB (#5817 )(#5846 ) (#5834 ) * [Optimization] Optimization for gather_logprob by 10GB (#5817) * opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k * only cuda run triton op (#5846)	2025-12-31 19:54:14 +08:00
kevin	20024b889c	[Cherry-Pick][BugFix] cp skip_mm_revert(#5848 ) (#5849 ) * cp skip_mm_revert * update test	2025-12-31 17:29:49 +08:00
GoldPancake	f33e642327	[Cherry-Pick][Speculative Decoding] Optimize draft logprob (#5842 ) (#5843 ) * optimize draft logprob * fix ut	2025-12-31 10:43:44 +08:00
GoldPancake	0d29f6df03	[Cherry-Pick][BugFix] Fix entropy bugs (#5818 ) (#5819 ) * [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operater register * update pmtp multi-step mtp strategy in d-split -mode * add note * fix xpu register * fix entropy bugs * Revert "[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)" This reverts commit ba0d35a52e8775300a1459bfcaa39056df570525. * fix ut * fix --------- Co-authored-by: freeliuzc <lzc842650834@gmail.com>	2025-12-29 20:45:03 -08:00
Yonghua Li	ca4ccf2397	[BugFix] fix shm opened but not closed in set_data_ipc (#5827 )	2025-12-29 23:35:31 +08:00
kxz2002	df775c2811	[BugFix] Fix process_response_dict to support async in serving_completion (#5758 ) (#5802 ) * support process_response_dict async initial commit * fixbug * add unit test * optimize	2025-12-29 09:56:42 +08:00
kevin	c170fc4dc5	[FDConfig][Cherry-Pick] Cp disable mm chunked(#5774 ) (#5775 ) * disable chunked_mm_input in ernie5 * cp_disable_mm_chunked * update test case * update code	2025-12-26 15:31:46 +08:00
周周周	d0c5bcec3d	[cherry-pick] support FA3 in mixed mode and support Qwen3 rope (#5655 ) * [Others] Remove useless code (#5404) * FA3 support qwen3 (#5441) * commit	2025-12-25 11:11:16 +08:00
bukejiyu	fc3bccc5b6	[Cherry-Pick][Others]upgrade paddleformer to 0.4.0 #5599 (#5716 ) * update 0.4.0 * update	2025-12-24 06:28:50 -08:00
YuBaoku	70163ddb6b	[Cherry-Pick][CI] Refactor RL tests to reuse upload_clear(#5741 ) (#5755 ) * [Cherry-Pick][CI] Refactor RL tests to reuse upload_clear(#5741)	2025-12-24 21:15:35 +08:00
GoldPancake	e51af01a65	[Cherry-Pick][Feature] Entropy calculation support #5692 (#5731 ) * support entropy * add script --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-24 15:42:43 +08:00
YuBaoku	f50988d917	[Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(#5732 ) (#5733 ) * [Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(#5732) --------- Co-authored-by: yubaoku <yubaoku@baidu.com>	2025-12-24 12:14:34 +08:00
kevin	23bfd28624	[Cherry-Pick][BugFix] cp fix_cpu_cache_bugs(#5544 ) (#5577 ) * cp fix_cpu_cache_bugs * update ce case * update test case * update code	2025-12-19 11:48:50 +08:00
Yuanle Liu	0cb9ad186e	[Cherry-Pick][BugFix] fix speculate_limit_thinking_content_length #5590 (#5615 )	2025-12-18 01:50:18 -08:00
GoldPancake	e56c4dd0a8	[Cherry-Pick] Support for request-level speculative decoding metrics monitoring.(#5518 ) (#5614 ) * support spec metrics monitor per request	2025-12-17 20:53:04 +08:00
qwes5s5	d67b64d5e1	add detoken switch (#5463 ) (#5572 )	2025-12-17 17:04:45 +08:00
freeliuzc	a7359d1c1d	[Cherry-Pick][CI]Support different inferseed in speculate decoding(#5568 ) (#5597 ) * fix mtp entropy drop in RL * optimize usage and fix unit test * optimize padding_sampling_params speed(vectorized)	2025-12-17 16:53:47 +08:00
YuBaoku	53158b7f8d	[Cherry-Pick][CI] Adape unit_test due to incompatibility change(#5578 ) (#5583 ) * [CI] Remove test_metrics.py due to incompatible forced merge (#5578) * [CI] Adapt vl_model baseline changes due to Paddle update (#5576)	2025-12-16 15:45:49 +08:00
YuBaoku	b43563977d	[CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test	2025-12-11 14:14:30 +08:00
zccjjj	bcde798098	[CI][XPU] ep+prefix cache+chunk prefill (#5490 )	2025-12-10 19:40:38 +08:00
RAM	707d1a1fc9	[New][RL] Support Rollout Routing Replay (#5405 ) (#5408 ) * [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot * Apply suggestion from @Copilot * Apply suggestion from @Copilot * Apply suggestion from @Copilot * add routing replay ci * support glm topk * support orther top_k * fix ci bug * pre-commit * only support chatcmpl * Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)" This reverts commit `c45e064f3d`. * Fix XPU and NPU bug --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-12-08 10:00:35 +08:00
Jiang-Jia-Jun	c45e064f3d	Revert "[RL] Support Rollout Routing Replay (#5321 )" (#5402 ) This reverts commit `96d2d4877b`.	2025-12-05 20:19:39 +08:00
lizexu123	d4979347ca	[Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374 ) * fix multi-inputs * fix threshold * fix threshold * fix	2025-12-05 20:18:17 +08:00
RAM	96d2d4877b	[RL] Support Rollout Routing Replay (#5321 ) * [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * add routing replay ci * support glm topk * support orther top_k * fix ci bug * pre-commit * only support chatcmpl --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>	2025-12-05 20:01:33 +08:00
kevin	c9d7f9e7c3	[BugFix] fix async download bug (#5349 ) * fix async download bug * update log * Revert "update log" This reverts commit `5816e602f4`. * update code * fix mtp bug	2025-12-05 18:59:12 +08:00
zccjjj	5b900667e3	[XPU] support ep4tp1+v1 loader (#5398 )	2025-12-05 18:51:15 +08:00
zccjjj	e927c65742	[XPU] [Optimization] [EP] EP communication optimization. (#5145 )	2025-12-05 10:03:45 +08:00
YuBaoku	1b5fd79d6b	[CI] disable test_schedule_output.py in unit_test (#5377 )	2025-12-04 23:18:23 +08:00
chenjian	3878a99b69	[Fearture] Support cache kv cache for output tokens (#4535 ) * [Fearture] Support cache kv cache for output tokens * fix bug * fix ci bug * improve coverage * enable output caching by default * fix ci --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2025-12-04 20:53:08 +08:00
Longzhi Wang	5cd17fd662	[Models] Add forward_meta to moe models' forward function (#5138 ) * [Models] Add forward_meta to moe models' forward function * fix missing param * fix * fix * fix forward_meta * fix test and remove chunked MoE releated in config * fix test * fix * fix	2025-12-04 13:26:58 +08:00
Juncai	f5bdb36e9b	Reduce timeout in unittest (#5366 )	2025-12-04 13:19:02 +08:00
lizexu123	946025480e	[Bug fix] fix pooling models (#5358 ) * fix * fix * fix test * fix gpu_model_runner --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-04 11:06:30 +08:00
qwes5s5	a52aea073c	fix logprobs (#5335 )	2025-12-04 10:38:51 +08:00
ming1753	5f8d4aedea	[Feature] support audio tts (#5333 )	2025-12-03 21:06:48 +08:00
Daci	83dbc4e5dd	[Feature] Guided Decoding add LLguidance backend (#5124 ) * llguidance * add requirements_guided_decoding.txt and doc * fix test_guidance_.py fix test_guidance_.py && mv fix llguidance choice * test_guidance_* * rm lazy loader --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-03 20:23:57 +08:00
lzy	f458cc5ba4	[Optimization]1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5353 ) * [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM * fix test_chunked_moe --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-03 16:42:10 +08:00
YuBaoku	dfeabee123	[CI] Allow occasional distributed worker exit_code (#5341 )	2025-12-03 10:56:59 +08:00
YuBaoku	3e2c13d8c5	[CI] Disable queue state assertion temporarily (#5329 )	2025-12-02 18:57:29 +08:00
Sunny-bot1	3629db4129	[Quantization] Support w4afp8 MoE dynamic quantization (#5282 ) * support dynamic activation quant for w4afp8 * support dynamic w4afp8 * add test * fix * fix --------- Co-authored-by: zhoutianzi666 <17801055074@163.com>	2025-12-02 18:56:16 +08:00
周周周	fb7f951612	[UNITEST] add test (#5305 )	2025-12-02 17:59:01 +08:00
Jiaxin Sui	8e0f4dfd0c	[XPU] [CI] Xpu Ci Refactor (#5252 ) * add xpu ci * add case * add case * fix ci bug * Update Docker image tag to 'latest' in CI workflow * Fix set -e usage in run_xpu_ci_pytest.sh * add pd case * add case * Configure pip to use Tsinghua mirror for dependencies Set the global pip index URL to Tsinghua mirror. * fix ci bug * fix bug * fix bug --------- Co-authored-by: suijiaxin <suijiaxin@Suis-MacBook-Pro.local> Co-authored-by: root <root@gajl-bbc-onlinec-com-1511964.gajl.baidu.com> Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>	2025-12-02 17:15:51 +08:00
YuBaoku	69e003abcb	[CI] Fix return_code check in test_chunked_moe.py (#5326 )	2025-12-02 15:41:26 +08:00
lizexu123	c563eca791	[Feature] support reward model (#5301 ) * Your commit message here * add test * update develop * support reward * support enable_chunk_prefill * support bingfa * support convert is reward * update test * delete print * fix enable_thinking * add document * fix place * fix test * fix * support enable_prefix_caching * add no-enable_prefix-caching test * fix * support enable_prefix_caching * delete print * fix document * fix * fix test * fix document and delete chinese * udpate * enable_thinking * fix test	2025-12-02 14:55:31 +08:00
qwes5s5	117980dd4e	[LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. (#5089 ) * add prompt logprobs * Merge prompt_logprobs_tensors and prompt_logprobs * fix param check * trigger ci * fix unitest * fix logprobs bug	2025-12-02 13:49:51 +08:00

518 Commits