Commit Graph

668 Commits

Author SHA1 Message Date
freeliuzc ce06c6dfb3 [BugFix] Fix token_penalty kernel (#6069)
* fix token_penalty kernel

* try to fix xpu

* fix xpu

* fix unit test
2026-01-28 12:03:05 +08:00
Divano ba9d2a9e5a [CI] add update weights tests (#6242) 2026-01-27 20:54:21 +08:00
qwes5s5 38378415c7 add token ratio metrics (#6236) 2026-01-27 17:00:49 +08:00
周周周 aa57864c5b remove unneeded para from flash_mask_attention (#6218) 2026-01-27 14:04:27 +08:00
Jiaxin Sui f1cee7fd5e [XPU] [CI] XPU CI Update (#6211)
* Update log file path in test_pd_21b_ep4tp1.py

* Update log file path in test_pd_21b_ep4tp4.py

* Update log file path in test_pd_p_tp4ep4_d_tp1ep4
2026-01-27 11:45:53 +08:00
jc b1698a79cb [RL] add version to the key of cache storage && refine raising error (#6160)
* Waiting for cache transfer manager inited

* up

* up

* up

* up

* up

* fix according to comments

* fix unittest

* fix

* fix unittest

* fix error

* pass storage_backend to worker
2026-01-27 10:47:46 +08:00
yinwei 56d01f7e49 [XPU][CI]Add Cuda Graph CI Case (#6229)
* add cuda graph ci case
2026-01-26 23:20:44 +08:00
CSWYF3634076 08c411518f [Loader] support dummy load weight (#6169)
* [Loader] support dummy load weight

* [Loader] support dummy load weight v2

* [Loader] support dummy load weight unittest

* [Loader] support dummy load weight unittest v2

* [Loader] support dummy load weight v3 docs and fp8
2026-01-26 13:58:53 +08:00
周周周 0966df78dc [Others] remove stop_nums (#6182) 2026-01-26 12:12:47 +08:00
zhouchong b5b28eea94 Remove flaky IPC-related test (#6190) 2026-01-26 10:47:50 +08:00
Yonghua Li 833d00e2d7 [BugFix] move cache creation back to cache transfer process and adapt clear/update (#6144)
* [fix] move cache creation back to cache transfer process

* [fix] fix clear cache

* [chore] change some log level

* [fix] fix clear cache

* [fix] fix clear cache for blockwisefp8 and mtp

* [fix] fix c8

* [fix] fix clear_mtp_cache args

* [chore] update cache_transfer_manager

* [fix] fix update mtp cache
2026-01-24 21:59:13 +08:00
Jiaxin Sui 20074d301f [XPU] [CI] add xpu logprobs case (#6187)
* add xpu case

* add xpu case
2026-01-23 19:40:55 +08:00
sunxin bef6293552 [Model Runner] Add exist_prefill_flag (#6172) 2026-01-23 13:07:05 +08:00
GoldPancake 646aced1eb [UT] Add GLM E2E tests for non-MTP and MTP (#6163)
* add glm ut
2026-01-23 10:34:29 +08:00
wangyifei b7c5daa316 [RL] add pause, update_weights, resume interface for async RL (#6052)
* support dynamic run_control_request through zmq from apiserver to common_engine

* support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method

* change /is_paused from HTTP POST method to GET method

* add pause, resume, is_paused implementation

* support engine <==> worker communication(request&response)

* support sync weights through RDMA from checkpoint_transfer

* support specified version, rsync_config in update_weights rpc call

* add pause, update_weights, resume interface for async RL

* bug fix: update_weights support using default arguments

* fix typo

* typo fix

* typo fix

* typo fix

* add unit test for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all

* add "rsync" to LoadConfig.load_strategy Literal type hints

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* typo fix

* typo fix

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* check version/rsync params

* add error log when version.txt not exists

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* raise specified ValueError when parameters check failed

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* tp barrier after run_control_method

* encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue

* typo fix

* typo fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-23 10:18:07 +08:00
Yonghua Li 8d27a523e7 [Feature] [KVCache] support attention_store kv cache backend (#5823)
* [feat] support attention_store kv cache backend

* [fix] fix codestyle

* [chore] optimize log

* [fix] fix write storage task

* [fix] fix read storage

* [fix] fix code conflict after merge develop

* [fix] fix cache bytes and read task token ids

* [chore] add model for cache transfer manager

* [chore] add some log

* [chore] remove launched_cache_manager_signal

* [fix] fix write_back_storage_task match_block_num condition

* [fix] fix swap_cost_time

* [ci] fix ci

* Update fastdeploy/engine/sched/resource_manager_v1.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/cache_transfer_manager.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-22 21:01:23 +08:00
Yonghua Li bb76d3b6f0 [RL] [APIServer] add more status codes for update/clear api (#6141)
* [RL] add more status codes for update/clear api

* [feat] return json response

* [fix] fix ci
2026-01-22 17:26:18 +08:00
luukunn 6b968a76f1 [Optimization] update data_processor & add tool parser plugins (#6096)
* update data_processor

* fix unit test

* fix unit test

* add unit test

* add tool parser plugins

* fix tool call

* fix tool call

* fix tool call

* fix unit test

* fix unit test

* add unit test

* fix unit test

* fix unit test

* fix unit test
2026-01-22 17:17:32 +08:00
RAM 955785e2e0 [RL][R3] Fix typo (#6046)
* fix typo
2026-01-22 15:46:34 +08:00
YuBaoku 1cfb042045 [CI] Add ep4_mtp e2e test (#6153)
* [CI] Add ep4_mtp e2e test
2026-01-22 14:54:18 +08:00
kxz2002 6e416c62dd [Optimization] The pre- and post-processing pipeline do not perform dict conversion (#5494)
* to_request_for_infer initial commit

* refact to from_chat_completion_request

* preprocess use request initial commit

* bugfix

* processors refact to using request

* bug fix

* refact Request from_generic_request

* post process initial commit

* bugfix

* postprocess second commit

* bugfix

* serving_embedding initial commit

* serving_reward initial commit

* bugfix

* replace function name

* async_llm initial commit

* offline initial commit and fix bug

* bugfix

* fix async_llm

* remove add speculate_metrics into data

* fix logprobs bug

* fix echo bug

* fix bug

* fix reasoning_max_tokens

* bugfix

* bugfix and modify unittest

* bugfix and modify unit test

* bugfix

* bugfix

* bugfix

* modify unittest

* fix error when reasoning_content is none for text_processor

* remove some unnecessary logic

* revert removed logic

* implement add and set method for RequestOutput and refact code

* modify unit test

* modify unit test

* union process_request and process_request_obj

* remove a unit test

* union process_response and process_response_obj

* support qwen3_vl_processor

* modify unittest and remove comments

* fix prompt_logprobs

* fix codestyle

* add v1

* v1

* fix unit test

* fix unit test

* fix pre-commit

* fix

* add process request

* add process request

* fix

* fix

* fix unit test

* fix unit test

* fix unit test

* fix unit test

* fix unit test

* remove file

* add unit test

* add unit test

* add unit test

* fix unit test

* fix unit test

* fix

* fix

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
Co-authored-by: luukunn <981429396@qq.com>
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com>
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>
2026-01-22 00:50:52 +08:00
zccjjj 14a64e9b3b [XPU] change XPU EP interface from xDeepEP to paddle (#5706)
* add ENV VAR to control low latency buffer
2026-01-21 18:23:45 +08:00
lizexu123 1f96028bea [BugFix] fix python3.12 v0_loader (#6132) 2026-01-21 16:12:11 +08:00
MingkunZhang 7e04067663 [Metax][CI] restore 'moe_expert_dispatch' outputs (#6130) 2026-01-21 10:33:09 +08:00
lizexu123 f4902fe42d [BugFix] fix wint2 (#6109)
* fix

* fix

* fix
2026-01-20 21:46:21 +08:00
ChowMingSing bf60e103b6 [CI]Fix test case (#6111) 2026-01-20 17:47:44 +08:00
Ryan dda27e50f5 [Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph (#6081)
* rm static_op_get_block_shape_and_split_kv_block from cudagraph

* update max_capture_shape

* fallback: zeros -> empty to avoid coverage check

* check graph_opt_config exists

* add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test

* add use_cudagraph flag to control step_use_cudagraph
2026-01-20 14:05:18 +08:00
jackyYang6 988e0bc338 [Feature] Add PaddleFormers fallback backend (#5999)
* feat(paddleformers): add dense text model fallback backend

* docs(paddleformers): add user guide and fix code review issues

* add fallback unit test

* precommit format

* fix pre-commit

* fix: address code review feedback

* docs: add PaddleFormers backend documentation (EN) and simplify installation

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 21:50:50 +08:00
cmcamdy 211dd81ca7 add pd+mtp ci (#6090) 2026-01-19 19:21:24 +08:00
Jiaxin Sui e0d15a2ded [XPU][CI] Xpu ci update (#6089)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800

* Remove cache queue port from test_pd_03b_tp1.py

Removed cache queue port arguments from test cases.

* Remove cache queue port from test_pd_21b_tp2.py

Removed cache queue port arguments from test cases.

* Update README with PYTHONPATH setup instructions

Added instructions for setting PYTHONPATH in CI scripts.
2026-01-19 16:09:09 +08:00
ChowMingSing 496cc23089 [CI]Fix test cases failing under Python 3.12 (#6059)
* Fix test case errors under Python 3.12

* Fix test case errors under Python 3.12

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 15:41:12 +08:00
YuBaoku ac6fa6d725 [CI] Add 4-GPU e2e test job (#6082) 2026-01-19 10:42:14 +08:00
kevin 0e0eaa1c57 [BugFix] fix mm revert bug (#6061)
* fix mm revert bug

* update code
2026-01-16 08:13:34 -08:00
Jiaxin Sui 70a962df53 [XPU][CI] XPU CI refactor (#6053)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800
2026-01-16 20:57:58 +08:00
GoldPancake bda38aa519 [Speculative Decoding] Support MTP for GLM-4.5-Air (#6047)
* glm mtp
* add spec neox partial rope
2026-01-16 14:35:24 +08:00
qwes5s5 b2a2e11551 [Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)
* request disconnect

* request disconnect

* fix bug

* fix bug--amend

---------

Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-16 11:46:13 +08:00
周周周 8f035101ad initial commit (#6054)
Co-authored-by: xiaoluomi <1037819816@qq.com>
2026-01-16 10:49:38 +08:00
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991)
* quant stash

* blockwise_quant

* precommit

* rm tensor.cut

* tp ok

* add swiglu

* rm outdated code

* fix activate ut

* change baseline

* fix baseline error
2026-01-15 05:50:37 -08:00
周周周 d38cd8b40b [UNITEST] add EP TP test_fused_moe CI (#5989) 2026-01-15 21:37:32 +08:00
freeliuzc 49617d9832 [Feature]Support tag phase token enforce generation (#6034)
* support tag phase token enforce generation

* optimize note and some feature

* fix sampler unit test

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-15 03:59:55 -08:00
freeliuzc 17866c028e add more cases for attention unit test (#5931) 2026-01-15 19:52:35 +08:00
lizexu123 6619298b50 [Optim] Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)
* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
RAM b3f59fd9b5 [RL][CI] Support Async R3 And Add Accuracy Test (#5937)
* add bs1 r3 test case

* async put

* r3 test case 1.0

* success run eb5

* refine test case

* pre-commit

* add eb45 & glm testcase

* format code

* add p2pstore requirements

* support only last turn

* R3 use worker log

* refine code &fix ci bug

* refine error message

* fix empty input bug

* Success set acc ci of eb45 and glm45

* refine code

* fix bug
2026-01-14 04:25:06 -08:00
ddchenhao66 9373f373dc [XPU] fix multi-batch bug in VL model (#6015)
* [XPU] fix multi-batch bug in VL model

* Add command to kill additional port processes

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2026-01-14 19:44:58 +08:00
xiaoxiaohehe001 6f72be7c3e [Optimize] Qwen2.5-VL vision model with merged linear layers and unif… (#6037)
* [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization

* [Optimize] Qwen2.5-VL vision model with merged linear layers and unified normalization
2026-01-14 19:21:31 +08:00
luukunn 93b7675a64 [Feature]Report FD statistical information (#5646)
* add usage commit

* update envs and xpu

* add requirements

* fix quantization value

* add unit test

* add unit test

* fix unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* fix FD_USAGE_STATS_SERVER

* fix

* fix

* add doc

* add doc

* add doc

* add doc

* add doc

* fix file name
2026-01-14 17:54:01 +08:00
YuBaoku 2c17acd767 [CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033) 2026-01-14 15:22:26 +08:00
MingkunZhang f3587b592c [Metax][CI] remove 28B VL model test sampling randomness (#6032)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2026-01-14 14:00:41 +08:00
Jiaxin Sui 926a26074f [XPU][CI] Cache queue port bug fix (#6030)
* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Remove cache queue port from test_vl_model.py

Removed cache queue port argument from test configuration.

* Update test_w4a8.py

* Remove cache queue port from test_mtp.py

Removed cache queue port configuration from test.

* Remove cache queue port from test_logprobs_21b_tp4

Removed cache queue port configuration from test.

* Remove cache queue port from test configuration

Removed cache queue port configuration from test.

* Update test_ep4tp4_online.py
2026-01-14 12:51:40 +08:00
chenjian 74d0f1c01f [Optim] Robust sync status when preempted happens (#5796)
* [Bug fix] Sync status for caching output cache

* fix

* fix

* fix bug

* fix

* fix

* support xpu

* fix

* fix

* fix

* fix

* fix

* fix ci

* fix ci

* fix xpu

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-14 12:07:33 +08:00