kevin
741a01562b
[BugFix][Cherry-Pick] cp fix dyc8 cache bug( #5958 ) ( #5959 )
...
* cp fix dyc8 cache bug
* update code
2026-01-08 19:25:56 -08:00
GoldPancake
8049a4982e
[Cherry-Pick][Bugfix] Fix entropy calculation bugs ( #5941 ) ( #5942 )
...
* fix entropy bug
2026-01-08 20:57:45 +08:00
Jiaxin Sui
7cdffced2d
[Cherry Pick][XPU][CI] Add logprobs Case ( #5907 )
...
* Implement setup_logprobs_env for environment setup
Add setup_logprobs_env function to manage environment variables for logprobs.
* Update conftest.py
* Add logprobs test for ERNIE-4.5-21B-A3B model
This test verifies the logprobs functionality of the ERNIE-4.5-21B-A3B model through direct HTTP requests, ensuring correct response structure and log probabilities.
* Fix indentation and formatting in conftest.py
2026-01-07 18:37:49 +08:00
kevin
939dfa4877
[BugFix][Cherry-Pick] Cp fix eb5 prefix cache( #5879 ) ( #5881 )
...
* fix eb5 prefix bug
* update code
* update code
* update code
* update code
2026-01-06 23:49:32 -08:00
freeliuzc
dcb0cceded
[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes ( #5738 ) ( #5793 )
...
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split-mode
* add note
* fix xpu register
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2026-01-05 07:59:17 -08:00
YuBaoku
2a71e427f9
[Cherry-Pick][CI] Fix archive URL injection and add retry(#5725,#5828) ( #5832 )
2026-01-04 17:10:03 +08:00
chen
9a7eb33fd4
[Cherry-Pick][Optimization] Optimization for gather_logprob by 10GB ( #5817 )( #5846 ) ( #5834 )
...
* [Optimization] Optimization for gather_logprob by 10GB (#5817 )
* optimize logprobs gather_logprob, reduce device memory usage by 10GB when token_num=8k
* only cuda run triton op (#5846 )
2025-12-31 19:54:14 +08:00
kevin
20024b889c
[Cherry-Pick][BugFix] cp skip_mm_revert( #5848 ) ( #5849 )
...
* cp skip_mm_revert
* update test
2025-12-31 17:29:49 +08:00
GoldPancake
f33e642327
[Cherry-Pick][Speculative Decoding] Optimize draft logprob ( #5842 ) ( #5843 )
...
* optimize draft logprob
* fix ut
2025-12-31 10:43:44 +08:00
GoldPancake
0d29f6df03
[Cherry-Pick][BugFix] Fix entropy bugs ( #5818 ) ( #5819 )
...
* [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 )
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split-mode
* add note
* fix xpu register
* fix entropy bugs
* Revert "[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 )"
This reverts commit ba0d35a52e8775300a1459bfcaa39056df570525.
* fix ut
* fix
---------
Co-authored-by: freeliuzc <lzc842650834@gmail.com >
2025-12-29 20:45:03 -08:00
Yonghua Li
ca4ccf2397
[BugFix] fix shm opened but not closed in set_data_ipc ( #5827 )
2025-12-29 23:35:31 +08:00
kxz2002
df775c2811
[BugFix] Fix process_response_dict to support async in serving_completion ( #5758 ) ( #5802 )
...
* support process_response_dict async initial commit
* fix bug
* add unit test
* optimize
2025-12-29 09:56:42 +08:00
kevin
c170fc4dc5
[FDConfig][Cherry-Pick] Cp disable mm chunked( #5774 ) ( #5775 )
...
* disable chunked_mm_input in ernie5
* cp_disable_mm_chunked
* update test case
* update code
2025-12-26 15:31:46 +08:00
周周周
d0c5bcec3d
[cherry-pick] support FA3 in mixed mode and support Qwen3 rope ( #5655 )
...
* [Others] Remove useless code (#5404 )
* FA3 support qwen3 (#5441 )
* commit
2025-12-25 11:11:16 +08:00
bukejiyu
fc3bccc5b6
[Cherry-Pick][Others]upgrade paddleformer to 0.4.0 #5599 ( #5716 )
...
* update 0.4.0
* update
2025-12-24 06:28:50 -08:00
YuBaoku
70163ddb6b
[Cherry-Pick][CI] Refactor RL tests to reuse upload_clear( #5741 ) ( #5755 )
...
* [Cherry-Pick][CI] Refactor RL tests to reuse upload_clear(#5741 )
2025-12-24 21:15:35 +08:00
GoldPancake
e51af01a65
[Cherry-Pick][Feature] Entropy calculation support #5692 ( #5731 )
...
* support entropy
* add script
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-24 15:42:43 +08:00
YuBaoku
f50988d917
[Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update( #5732 ) ( #5733 )
...
* [Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(#5732 )
---------
Co-authored-by: yubaoku <yubaoku@baidu.com >
2025-12-24 12:14:34 +08:00
kevin
23bfd28624
[Cherry-Pick][BugFix] cp fix_cpu_cache_bugs( #5544 ) ( #5577 )
...
* cp fix_cpu_cache_bugs
* update ce case
* update test case
* update code
2025-12-19 11:48:50 +08:00
Yuanle Liu
0cb9ad186e
[Cherry-Pick][BugFix] fix speculate_limit_thinking_content_length #5590 ( #5615 )
2025-12-18 01:50:18 -08:00
GoldPancake
e56c4dd0a8
[Cherry-Pick] Support for request-level speculative decoding metrics monitoring.( #5518 ) ( #5614 )
...
* support spec metrics monitor per request
2025-12-17 20:53:04 +08:00
qwes5s5
d67b64d5e1
add detoken switch ( #5463 ) ( #5572 )
2025-12-17 17:04:45 +08:00
freeliuzc
a7359d1c1d
[Cherry-Pick][CI]Support different inferseed in speculate decoding( #5568 ) ( #5597 )
...
* fix mtp entropy drop in RL
* optimize usage and fix unit test
* optimize padding_sampling_params speed(vectorized)
2025-12-17 16:53:47 +08:00
YuBaoku
53158b7f8d
[Cherry-Pick][CI] Adapt unit_test due to incompatibility change( #5578 ) ( #5583 )
...
* [CI] Remove test_metrics.py due to incompatible forced merge (#5578 )
* [CI] Adapt vl_model baseline changes due to Paddle update (#5576 )
2025-12-16 15:45:49 +08:00
YuBaoku
b43563977d
[CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test
2025-12-11 14:14:30 +08:00
zccjjj
bcde798098
[CI][XPU] ep+prefix cache+chunk prefill ( #5490 )
2025-12-10 19:40:38 +08:00
RAM
707d1a1fc9
[New][RL] Support Rollout Routing Replay ( #5405 ) ( #5408 )
...
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
* Apply suggestion from @Copilot
* Apply suggestion from @Copilot
* Apply suggestion from @Copilot
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
* Revert "Revert "[RL] Support Rollout Routing Replay (#5321 )" (#5402 )"
This reverts commit c45e064f3d .
* Fix XPU and NPU bug
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-12-08 10:00:35 +08:00
Jiang-Jia-Jun
c45e064f3d
Revert "[RL] Support Rollout Routing Replay ( #5321 )" ( #5402 )
...
This reverts commit 96d2d4877b .
2025-12-05 20:19:39 +08:00
lizexu123
d4979347ca
[Bug fix] Fix the multi-input accuracy issue in the pooling model. ( #5374 )
...
* fix multi-inputs
* fix threshold
* fix threshold
* fix
2025-12-05 20:18:17 +08:00
RAM
96d2d4877b
[RL] Support Rollout Routing Replay ( #5321 )
...
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* add routing replay ci
* support glm topk
* support other top_k
* fix ci bug
* pre-commit
* only support chatcmpl
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
Co-authored-by: Yuanle Liu <yuanlehome@163.com >
2025-12-05 20:01:33 +08:00
kevin
c9d7f9e7c3
[BugFix] fix async download bug ( #5349 )
...
* fix async download bug
* update log
* Revert "update log"
This reverts commit 5816e602f4 .
* update code
* fix mtp bug
2025-12-05 18:59:12 +08:00
zccjjj
5b900667e3
[XPU] support ep4tp1+v1 loader ( #5398 )
2025-12-05 18:51:15 +08:00
zccjjj
e927c65742
[XPU] [Optimization] [EP] EP communication optimization. ( #5145 )
2025-12-05 10:03:45 +08:00
YuBaoku
1b5fd79d6b
[CI] disable test_schedule_output.py in unit_test ( #5377 )
2025-12-04 23:18:23 +08:00
chenjian
3878a99b69
[Feature] Support cache kv cache for output tokens ( #4535 )
...
* [Feature] Support cache kv cache for output tokens
* fix bug
* fix ci bug
* improve coverage
* enable output caching by default
* fix ci
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2025-12-04 20:53:08 +08:00
Longzhi Wang
5cd17fd662
[Models] Add forward_meta to moe models' forward function ( #5138 )
...
* [Models] Add forward_meta to moe models' forward function
* fix missing param
* fix
* fix
* fix forward_meta
* fix test and remove chunked MoE related in config
* fix test
* fix
* fix
2025-12-04 13:26:58 +08:00
Juncai
f5bdb36e9b
Reduce timeout in unittest ( #5366 )
2025-12-04 13:19:02 +08:00
lizexu123
946025480e
[Bug fix] fix pooling models ( #5358 )
...
* fix
* fix
* fix test
* fix gpu_model_runner
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-04 11:06:30 +08:00
qwes5s5
a52aea073c
fix logprobs ( #5335 )
2025-12-04 10:38:51 +08:00
ming1753
5f8d4aedea
[Feature] support audio tts ( #5333 )
2025-12-03 21:06:48 +08:00
Daci
83dbc4e5dd
[Feature] Guided Decoding add LLguidance backend ( #5124 )
...
* llguidance
* add requirements_guided_decoding.txt and doc
* fix test_guidance_*.py
* fix test_guidance_*.py && mv
* fix llguidance choice
* test_guidance_*
* rm lazy loader
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-03 20:23:57 +08:00
lzy
f458cc5ba4
[Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM ( #5353 )
...
* [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM
* fix test_chunked_moe
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com >
2025-12-03 16:42:10 +08:00
YuBaoku
dfeabee123
[CI] Allow occasional distributed worker exit_code ( #5341 )
2025-12-03 10:56:59 +08:00
YuBaoku
3e2c13d8c5
[CI] Disable queue state assertion temporarily ( #5329 )
2025-12-02 18:57:29 +08:00
Sunny-bot1
3629db4129
[Quantization] Support w4afp8 MoE dynamic quantization ( #5282 )
...
* support dynamic activation quant for w4afp8
* support dynamic w4afp8
* add test
* fix
* fix
---------
Co-authored-by: zhoutianzi666 <17801055074@163.com >
2025-12-02 18:56:16 +08:00
周周周
fb7f951612
[UNITEST] add test ( #5305 )
2025-12-02 17:59:01 +08:00
Jiaxin Sui
8e0f4dfd0c
[XPU] [CI] Xpu Ci Refactor ( #5252 )
...
* add xpu ci
* add case
* add case
* fix ci bug
* Update Docker image tag to 'latest' in CI workflow
* Fix set -e usage in run_xpu_ci_pytest.sh
* add pd case
* add case
* Configure pip to use Tsinghua mirror for dependencies
Set the global pip index URL to Tsinghua mirror.
* fix ci bug
* fix bug
* fix bug
---------
Co-authored-by: suijiaxin <suijiaxin@Suis-MacBook-Pro.local >
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511964.gajl.baidu.com >
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com >
2025-12-02 17:15:51 +08:00
YuBaoku
69e003abcb
[CI] Fix return_code check in test_chunked_moe.py ( #5326 )
2025-12-02 15:41:26 +08:00
lizexu123
c563eca791
[Feature] support reward model ( #5301 )
...
* Your commit message here
* add test
* update develop
* support reward
* support enable_chunk_prefill
* support concurrency
* support convert is reward
* update test
* delete print
* fix enable_thinking
* add document
* fix place
* fix test
* fix
* support enable_prefix_caching
* add no-enable_prefix-caching test
* fix
* support enable_prefix_caching
* delete print
* fix document
* fix
* fix test
* fix document and delete Chinese
* update
* enable_thinking
* fix test
2025-12-02 14:55:31 +08:00
qwes5s5
117980dd4e
[LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. ( #5089 )
...
* add prompt logprobs
* Merge prompt_logprobs_tensors and prompt_logprobs
* fix param check
* trigger ci
* fix unittest
* fix logprobs bug
2025-12-02 13:49:51 +08:00