4485 Commits

Author SHA1 Message Date
GoldPancake 646aced1eb [UT] Add GLM E2E tests for non-MTP and MTP (#6163)
* add glm ut
2026-01-23 10:34:29 +08:00
wangyifei b7c5daa316 [RL] add pause, update_weights, resume interface for async RL (#6052)
* support dynamic run_control_request through zmq from apiserver to common_engine

* support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method

* change /is_paused from HTTP POST method to GET method

* add pause, resume, is_paused implementation

* support engine <==> worker communication(request&response)

* support sync weights through RDMA from checkpoint_transfer

* support specified version, rsync_config in update_weights rpc call

* add pause, update_weights, resume interface for async RL

* bug fix: update_weights support using default arguments

* fix typo

* typo fix

* typo fix

* typo fix

* add unit test for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all

* add "rsync" to LoadConfig.load_strategy Literal type hints

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* typo fix

* typo fix

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* check version/rsync params

* add error log when version.txt not exists

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* raise specified ValueError when parameter check fails

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* tp barrier after run_control_method

* encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue

* typo fix

* typo fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-23 10:18:07 +08:00
Copilot 96b2cf2c20 [Docs] Update FastDeploy Docker image to 2.4.0 for Nvidia GPU installation (#6168)
* Initial plan

* Update Nvidia GPU Docker image version from 2.3.3 to 2.4.0

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-22 22:01:13 +08:00
Ryan 31c219d483 [Graph Optimization] Add max_capture_shape_prefill && cudagraph_capture_sizes_prefill (#6148)
* Add max_capture_shape_dy2st parameter to YAML config

* split cudagraph capture size between decode and prefill

* rm if

* add default value
2026-01-22 21:37:18 +08:00
Yonghua Li 8d27a523e7 [Feature] [KVCache] support attention_store kv cache backend (#5823)
* [feat] support attention_store kv cache backend

* [fix] fix codestyle

* [chore] optimize log

* [fix] fix write storage task

* [fix] fix read storage

* [fix] fix code conflict after merge develop

* [fix] fix cache bytes and read task token ids

* [chore] add model for cache transfer manager

* [chore] add some log

* [chore] remove launched_cache_manager_signal

* [fix] fix write_back_storage_task match_block_num condition

* [fix] fix swap_cost_time

* [ci] fix ci

* Update fastdeploy/engine/sched/resource_manager_v1.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/cache_transfer_manager.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-22 21:01:23 +08:00
yinwei 3cd0ffe36c Enable CudaGraph 2026-01-22 19:49:33 +08:00
Yonghua Li bb76d3b6f0 [RL] [APIServer] add more status codes for update/clear api (#6141)
* [RL] add more status codes for update/clear api

* [feat] return json response

* [fix] fix ci
2026-01-22 17:26:18 +08:00
luukunn 6b968a76f1 [Optimization] update data_processor & add tool parser plugins (#6096)
* update data_processor

* fix unit test

* fix unit test

* add unit test

* add tool parser plugins

* fix tool call

* fix tool call

* fix tool call

* fix unit test

* fix unit test

* add unit test

* fix unit test

* fix unit test

* fix unit test
2026-01-22 17:17:32 +08:00
RAM 955785e2e0 [RL][R3] Fix typo (#6046)
* fix typo
2026-01-22 15:46:34 +08:00
YuBaoku 1cfb042045 [CI] Add ep4_mtp e2e test (#6153)
* [CI] Add ep4_mtp e2e test
2026-01-22 14:54:18 +08:00
yinwei 1e3c35496c [XPU][Graph Optimization] XPU Support CUDAGraph (#6152)
* support cuda graph
2026-01-22 14:41:56 +08:00
Haonan Luo 82057cb71f Support MXFP4 for GPT-OSS (#5435)
* support mxfp4 in gpt-oss

* support mxfp4 in gpt-oss

* add scope for flashinfer

* remove torch code

* update envs.FD_MXFP4_BACKEND

* update process_weights_after_loading

* update env name

* support tp in gpt-oss, add e2e test

* add flashinfer-python-paddle in requirements

* fix import error

* add test

* add test

* add test

* add test
2026-01-22 14:21:01 +08:00
jc 309c7d9764 router support divided rollout (#6150) 2026-01-22 10:39:39 +08:00
fxyfxy777 9c4db0ac3f [BugFix] fix weight quant op (#6137)
* fix weight quant

* fix weight quant

* bit equal

* code style
2026-01-22 09:50:57 +08:00
kxz2002 6e416c62dd [Optimization] The pre- and post-processing pipelines do not perform dict conversion (#5494)
* to_request_for_infer initial commit

* refactor to from_chat_completion_request

* preprocess use request initial commit

* bugfix

* refactor processors to use request

* bug fix

* refactor Request from_generic_request

* post process initial commit

* bugfix

* postprocess second commit

* bugfix

* serving_embedding initial commit

* serving_reward initial commit

* bugfix

* replace function name

* async_llm initial commit

* offline initial commit and fix bug

* bugfix

* fix async_llm

* remove add speculate_metrics into data

* fix logprobs bug

* fix echo bug

* fix bug

* fix reasoning_max_tokens

* bugfix

* bugfix and modify unittest

* bugfix and modify unit test

* bugfix

* bugfix

* bugfix

* modify unittest

* fix error when reasoning_content is none for text_processor

* remove some unnecessary logic

* revert removed logic

* implement add and set methods for RequestOutput and refactor code

* modify unit test

* modify unit test

* union process_request and process_request_obj

* remove a unit test

* union process_response and process_response_obj

* support qwen3_vl_processor

* modify unittest and remove comments

* fix prompt_logprobs

* fix codestyle

* add v1

* v1

* fix unit test

* fix unit test

* fix pre-commit

* fix

* add process request

* add process request

* fix

* fix

* fix unit test

* fix unit test

* fix unit test

* fix unit test

* fix unit test

* remove file

* add unit test

* add unit test

* add unit test

* fix unit test

* fix unit test

* fix

* fix

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
Co-authored-by: luukunn <981429396@qq.com>
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com>
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>
2026-01-22 00:50:52 +08:00
YuBaoku fe5ba4b509 [CI] Update image used by build_rl in ce_job.yml 2026-01-21 20:57:50 +08:00
yangjianfengo1 bb635e0819 fix text (#6145) 2026-01-21 19:40:30 +08:00
yinwei 9536cd650b [XPU] update release doc (#6143) 2026-01-21 18:31:25 +08:00
zccjjj 14a64e9b3b [XPU] change XPU EP interface from xDeepEP to paddle (#5706)
* add ENV VAR to control low latency buffer
2026-01-21 18:23:45 +08:00
K11OntheBoat 490a6551dc rename params of normalization layer (#6133)
Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-01-21 17:18:35 +08:00
lizexu123 1f96028bea [BugFix] fix python3.12 v0_loader (#6132) 2026-01-21 16:12:11 +08:00
yzwu 837ddca273 [Iluvatar][CI] Fix the error max_tokens_per_expert referenced before assignment (#6083) 2026-01-21 16:01:29 +08:00
yinwei 85d995100a Update Dummy Run To Support Multi-Batch Execution (#6123) 2026-01-21 14:20:44 +08:00
Cheng Yanfei 9ee0156cc3 add HPU tensorwise_fp8 readme (#6091) 2026-01-21 11:48:22 +08:00
MingkunZhang 7e04067663 [Metax][CI] restore 'moe_expert_dispatch' outputs (#6130) 2026-01-21 10:33:09 +08:00
YuBaoku c991fda54c [CI] Enable 4-GPU e2e test in nightly and fix docker_tag_build (#6128) 2026-01-20 22:44:29 +08:00
lizexu123 f4902fe42d [BugFix] fix wint2 (#6109)
* fix

* fix

* fix
2026-01-20 21:46:21 +08:00
yinwei 5385d51808 [XPU]XPU FD Release/2.4 Note 2026-01-20 20:38:34 +08:00
Copilot dcb20c1a2a [WIP] Add directory guide to mkdocs configuration (#6121)
* Initial plan

* Add PaddleFormers Backend documentation to mkdocs.yml navigation

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-20 19:51:27 +08:00
luukunn 56e22a7ddc [Docs]fix doc (#6119)
* fix doc

* fix doc
2026-01-20 19:46:05 +08:00
yinwei 51a8a2ed57 [XPU] Support CudaGraph(add block attn cuda_graph support) (#6116)
* add block attn cuda_graph support
2026-01-20 19:33:11 +08:00
jackyYang6 00a6a73431 docs: fix pre-commit error of markdown (#6100) 2026-01-20 19:32:05 +08:00
ChowMingSing bf60e103b6 [CI]Fix test case (#6111) 2026-01-20 17:47:44 +08:00
Ryan dda27e50f5 [Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph (#6081)
* rm static_op_get_block_shape_and_split_kv_block from cudagraph

* update max_capture_shape

* fallback: zeros -> empty to avoid coverage check

* check graph_opt_config exists

* add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test

* add use_cudagraph flag to control step_use_cudagraph
2026-01-20 14:05:18 +08:00
zhupengyang 45ebb2efb4 [XPU] support plugin model (#6092) 2026-01-20 13:00:09 +08:00
jackyYang6 988e0bc338 [Feature] Add PaddleFormers fallback backend (#5999)
* feat(paddleformers): add dense text model fallback backend

* docs(paddleformers): add user guide and fix code review issues

* add fallback unit test

* precommit format

* fix pre-commit

* fix: address code review feedback

* docs: add PaddleFormers backend documentation (EN) and simplify installation

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 21:50:50 +08:00
GoldPancake 879e45f6b3 fix compute logits problem (#6093) 2026-01-19 20:12:14 +08:00
xiegegege e22c4e29bb [CE]add paddleocr config yaml (#6097) 2026-01-19 20:07:42 +08:00
Jingfeng Wu 7d44009f39 [FDConfig] transfer metrics_port (#6056)
* transfer metrics_port

* transfer metrics_port
2026-01-19 19:58:57 +08:00
cmcamdy 211dd81ca7 add pd+mtp ci (#6090) 2026-01-19 19:21:24 +08:00
Jiaxin Sui e0d15a2ded [XPU][CI] Xpu ci update (#6089)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800

* Remove cache queue port from test_pd_03b_tp1.py

Removed cache queue port arguments from test cases.

* Remove cache queue port from test_pd_21b_tp2.py

Removed cache queue port arguments from test cases.

* Update README with PYTHONPATH setup instructions

Added instructions for setting PYTHONPATH in CI scripts.
2026-01-19 16:09:09 +08:00
ChowMingSing 496cc23089 [CI]Fix test cases failing under Python 3.12 (#6059)
* Fix test case errors under Python 3.12

* Fix test case errors under Python 3.12

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 15:41:12 +08:00
sunxin a4144e0b8e [Optimization] Avoid unnecessary penalty computation (#6078) 2026-01-19 15:24:12 +08:00
GoldPancake 05fbd89a8e [Speculative Decoding][Bugfix] Fix MTP logprob issues caused by max_num_logprobs (#6084) 2026-01-19 14:55:36 +08:00
ddchenhao66 3685474799 [XPU] xpu support mm prefill batch (#6072)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2026-01-19 14:36:35 +08:00
sunxin 9dc1c74d36 fix opt qknorm (#6080) 2026-01-19 12:07:20 +08:00
YuBaoku ac6fa6d725 [CI] Add 4-GPU e2e test job (#6082) 2026-01-19 10:42:14 +08:00
kevin 0e0eaa1c57 [BugFix] fix mm revert bug (#6061)
* fix mm revert bug

* update code
2026-01-16 08:13:34 -08:00
Jiaxin Sui 70a962df53 [XPU][CI] XPU CI refactor (#6053)
* add xpu ci case

* add xpu ci case

* add xpu ci case

* Change runner from XPU-P800-8Card to XPU-P800
2026-01-16 20:57:58 +08:00
GoldPancake b917b56aca [Bugfix] Fix logprob issues caused by max_num_logprobs (#6067) 2026-01-16 04:40:18 -08:00