Commit Graph

518 Commits

Author SHA1 Message Date
kevin 741a01562b [BugFix][Cherry-Pick] cp fix dyc8 cache bug(#5958) (#5959)
* cp fix dyc8 cache bug

* update code
2026-01-08 19:25:56 -08:00
GoldPancake 8049a4982e [Cherry-Pick][Bugfix] Fix entropy calculation bugs (#5941) (#5942)
* fix entropy bug
2026-01-08 20:57:45 +08:00
Jiaxin Sui 7cdffced2d [Cherry Pick][XPU][CI] Add logprobs Case (#5907)
* Implement setup_logprobs_env for environment setup

Add setup_logprobs_env function to manage environment variables for logprobs.

* Update conftest.py

* Add logprobs test for ERNIE-4.5-21B-A3B model

This test verifies the logprobs functionality of the ERNIE-4.5-21B-A3B model through direct HTTP requests, ensuring correct response structure and log probabilities.

* Fix indentation and formatting in conftest.py
2026-01-07 18:37:49 +08:00
kevin 939dfa4877 [BugFix][Cherry-Pick] Cp fix eb5 prefix cache(#5879) (#5881)
* fix eb5 prefix bug

* update code

* update code

* update code

* update code
2026-01-06 23:49:32 -08:00
freeliuzc dcb0cceded [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738) (#5793)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in pd-split-mode

* add note

* fix xpu register

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2026-01-05 07:59:17 -08:00
YuBaoku 2a71e427f9 [Cherry-Pick][CI] Fix archive URL injection and add retry(#5725,#5828) (#5832) 2026-01-04 17:10:03 +08:00
chen 9a7eb33fd4 [Cherry-Pick][Optimization] Optimization for gather_logprob by 10GB (#5817)(#5846) (#5834)
* [Optimization] Optimization for gather_logprob by 10GB (#5817)

* opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k

* only cuda run triton op (#5846)
2025-12-31 19:54:14 +08:00
kevin 20024b889c [Cherry-Pick][BugFix] cp skip_mm_revert(#5848) (#5849)
* cp skip_mm_revert

* update test
2025-12-31 17:29:49 +08:00
GoldPancake f33e642327 [Cherry-Pick][Speculative Decoding] Optimize draft logprob (#5842) (#5843)
* optimize draft logprob

* fix ut
2025-12-31 10:43:44 +08:00
GoldPancake 0d29f6df03 [Cherry-Pick][BugFix] Fix entropy bugs (#5818) (#5819)
* [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)

* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in pd-split-mode

* add note

* fix xpu register

* fix entropy bugs

* Revert "[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)"

This reverts commit ba0d35a52e8775300a1459bfcaa39056df570525.

* fix ut

* fix

---------

Co-authored-by: freeliuzc <lzc842650834@gmail.com>
2025-12-29 20:45:03 -08:00
Yonghua Li ca4ccf2397 [BugFix] fix shm opened but not closed in set_data_ipc (#5827) 2025-12-29 23:35:31 +08:00
kxz2002 df775c2811 [BugFix] Fix process_response_dict to support async in serving_completion (#5758) (#5802)
* support process_response_dict async initial commit

* fix bug

* add unit test

* optimize
2025-12-29 09:56:42 +08:00
kevin c170fc4dc5 [FDConfig][Cherry-Pick] Cp disable mm chunked(#5774) (#5775)
* disable chunked_mm_input in ernie5

* cp_disable_mm_chunked

* update test case

* update code
2025-12-26 15:31:46 +08:00
周周周 d0c5bcec3d [cherry-pick] support FA3 in mixed mode and support Qwen3 rope (#5655)
* [Others] Remove useless code (#5404)

* FA3 support qwen3 (#5441)

* commit
2025-12-25 11:11:16 +08:00
bukejiyu fc3bccc5b6 [Cherry-Pick][Others]upgrade paddleformer to 0.4.0 #5599 (#5716)
* update 0.4.0

* update
2025-12-24 06:28:50 -08:00
YuBaoku 70163ddb6b [Cherry-Pick][CI] Refactor RL tests to reuse upload_clear(#5741) (#5755)
* [Cherry-Pick][CI] Refactor RL tests to reuse upload_clear(#5741)
2025-12-24 21:15:35 +08:00
GoldPancake e51af01a65 [Cherry-Pick][Feature] Entropy calculation support #5692 (#5731)
* support entropy

* add script

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-24 15:42:43 +08:00
YuBaoku f50988d917 [Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(#5732) (#5733)
* [Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(#5732)

---------

Co-authored-by: yubaoku <yubaoku@baidu.com>
2025-12-24 12:14:34 +08:00
kevin 23bfd28624 [Cherry-Pick][BugFix] cp fix_cpu_cache_bugs(#5544) (#5577)
* cp fix_cpu_cache_bugs

* update ce case

* update test case

* update code
2025-12-19 11:48:50 +08:00
Yuanle Liu 0cb9ad186e [Cherry-Pick][BugFix] fix speculate_limit_thinking_content_length #5590 (#5615) 2025-12-18 01:50:18 -08:00
GoldPancake e56c4dd0a8 [Cherry-Pick] Support for request-level speculative decoding metrics monitoring.(#5518) (#5614)
* support spec metrics monitor per request
2025-12-17 20:53:04 +08:00
qwes5s5 d67b64d5e1 add detoken switch (#5463) (#5572) 2025-12-17 17:04:45 +08:00
freeliuzc a7359d1c1d [Cherry-Pick][CI]Support different inferseed in speculate decoding(#5568) (#5597)
* fix mtp entropy drop in RL

* optimize usage and fix unit test

* optimize padding_sampling_params speed(vectorized)
2025-12-17 16:53:47 +08:00
YuBaoku 53158b7f8d [Cherry-Pick][CI] Adapt unit_test due to incompatibility change(#5578) (#5583)
* [CI] Remove test_metrics.py due to incompatible forced merge (#5578)
* [CI] Adapt vl_model baseline changes due to Paddle update (#5576)
2025-12-16 15:45:49 +08:00
YuBaoku b43563977d [CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test 2025-12-11 14:14:30 +08:00
zccjjj bcde798098 [CI][XPU] ep+prefix cache+chunk prefill (#5490)
2025-12-10 19:40:38 +08:00
RAM 707d1a1fc9 [New][RL] Support Rollout Routing Replay (#5405) (#5408)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot



* Apply suggestion from @Copilot



* Apply suggestion from @Copilot



* Apply suggestion from @Copilot



* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-08 10:00:35 +08:00
Jiang-Jia-Jun c45e064f3d Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)
This reverts commit 96d2d4877b.
2025-12-05 20:19:39 +08:00
lizexu123 d4979347ca [Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374)
* fix multi-inputs

* fix threshold

* fix threshold

* fix
2025-12-05 20:18:17 +08:00
RAM 96d2d4877b [RL] Support Rollout Routing Replay (#5321)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 20:01:33 +08:00
kevin c9d7f9e7c3 [BugFix] fix async download bug (#5349)
* fix async download bug

* update log

* Revert "update log"

This reverts commit 5816e602f4.

* update code

* fix mtp bug
2025-12-05 18:59:12 +08:00
zccjjj 5b900667e3 [XPU] support ep4tp1+v1 loader (#5398) 2025-12-05 18:51:15 +08:00
zccjjj e927c65742 [XPU] [Optimization] [EP] EP communication optimization. (#5145)
2025-12-05 10:03:45 +08:00
YuBaoku 1b5fd79d6b [CI] disable test_schedule_output.py in unit_test (#5377) 2025-12-04 23:18:23 +08:00
chenjian 3878a99b69 [Feature] Support cache kv cache for output tokens (#4535)
* [Feature] Support cache kv cache for output tokens

* fix bug

* fix ci bug

* improve coverage

* enable output caching by default

* fix ci

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-12-04 20:53:08 +08:00
Longzhi Wang 5cd17fd662 [Models] Add forward_meta to moe models' forward function (#5138)
* [Models] Add forward_meta to moe models' forward function

* fix missing param

* fix

* fix

* fix forward_meta

* fix test and remove chunked MoE related in config

* fix test

* fix

* fix
2025-12-04 13:26:58 +08:00
Juncai f5bdb36e9b Reduce timeout in unittest (#5366) 2025-12-04 13:19:02 +08:00
lizexu123 946025480e [Bug fix] fix pooling models (#5358)
* fix

* fix

* fix test

* fix gpu_model_runner

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-04 11:06:30 +08:00
qwes5s5 a52aea073c fix logprobs (#5335) 2025-12-04 10:38:51 +08:00
ming1753 5f8d4aedea [Feature] support audio tts (#5333) 2025-12-03 21:06:48 +08:00
Daci 83dbc4e5dd [Feature] Guided Decoding add LLguidance backend (#5124)
* llguidance

* add requirements_guided_decoding.txt and doc

* fix test_guidance_*.py

* fix test_guidance_*.py && mv

* fix llguidance choice

* test_guidance_*

* rm lazy loader

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-03 20:23:57 +08:00
lzy f458cc5ba4 [Optimization]1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM (#5353)
* [Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.MAX_PREFILL_NUM

* fix test_chunked_moe

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-03 16:42:10 +08:00
YuBaoku dfeabee123 [CI] Allow occasional distributed worker exit_code (#5341) 2025-12-03 10:56:59 +08:00
YuBaoku 3e2c13d8c5 [CI] Disable queue state assertion temporarily (#5329) 2025-12-02 18:57:29 +08:00
Sunny-bot1 3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
周周周 fb7f951612 [UNITEST] add test (#5305) 2025-12-02 17:59:01 +08:00
Jiaxin Sui 8e0f4dfd0c [XPU] [CI] Xpu Ci Refactor (#5252)
* add xpu ci

* add case

* add case

* fix ci bug

* Update Docker image tag to 'latest' in CI workflow

* Fix set -e usage in run_xpu_ci_pytest.sh

* add pd case

* add case

* Configure pip to use Tsinghua mirror for dependencies

Set the global pip index URL to Tsinghua mirror.

* fix ci bug

* fix bug

* fix bug

---------

Co-authored-by: suijiaxin <suijiaxin@Suis-MacBook-Pro.local>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511964.gajl.baidu.com>
Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
2025-12-02 17:15:51 +08:00
YuBaoku 69e003abcb [CI] Fix return_code check in test_chunked_moe.py (#5326) 2025-12-02 15:41:26 +08:00
lizexu123 c563eca791 [Feature] support reward model (#5301)
* Your commit message here

* add test

* update develop

* support reward

* support enable_chunk_prefill

* support concurrency

* support convert is reward

* update test

* delete print

* fix enable_thinking

* add document

* fix place

* fix test

* fix

* support enable_prefix_caching

* add no-enable_prefix-caching test

* fix

* support enable_prefix_caching

* delete print

* fix document

* fix

* fix test

* fix document and delete chinese

* update

* enable_thinking

* fix test
2025-12-02 14:55:31 +08:00
qwes5s5 117980dd4e [LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. (#5089)
* add prompt logprobs

* Merge prompt_logprobs_tensors and prompt_logprobs

* fix param check

* trigger ci

* fix unittest

* fix logprobs bug
2025-12-02 13:49:51 +08:00