CSWYF3634076
08c411518f
[Loader] support dummy load weight ( #6169 )
...
* [Loader] support dummy load weight
* [Loader] support dummy load weight v2
* [Loader] support dummy load weight unittest
* [Loader] support dummy load weight unittest v2
* [Loader] support dummy load weight v3 docs and fp8
2026-01-26 13:58:53 +08:00
sunxin
adc69c15d0
[Model Runner] Prepare token count and move FA3 initialization into the graph ( #6170 )
...
* prepare for token num and put FA3 init in graph
2026-01-26 12:16:57 +08:00
周周周
0966df78dc
[Others] remove stop_nums ( #6182 )
2026-01-26 12:12:47 +08:00
wangyifei
84a1780814
[build] support building sm 80, 86, 89, 90 into one whl package ( #6173 )
...
* support building sm 80, 86, 89, 90 into one whl package
* create tmp dir before build custom ops in FD_UNIFY_BUILD mode
* typo fix
* ignore exceptions in xpu ..
2026-01-26 11:30:02 +08:00
zhouchong
b5b28eea94
Remove flaky IPC-related test ( #6190 )
2026-01-26 10:47:50 +08:00
Yuanle Liu
253c5cc16c
Improve deep_ep import handling with logging ( #6207 )
...
* Improve deep_ep import handling with logging
Refactor deep_ep import logic to handle PaddleFleet and PFCCLab imports with error logging.
* Add traceback import to ep.py
2026-01-24 22:41:42 -08:00
Yonghua Li
833d00e2d7
[BugFix] move cache creation back to cache transfer process and adapt clear/update ( #6144 )
...
* [fix] move cache creation back to cache transfer process
* [fix] fix clear cache
* [chore] change some log level
* [fix] fix clear cache
* [fix] fix clear cache for blockwisefp8 and mtp
* [fix] fix c8
* [fix] fix clear_mtp_cache args
* [chore] update cache_transfer_manager
* [fix] fix update mtp cache
2026-01-24 21:59:13 +08:00
RuohengMa
976203cf60
[XPU] fix text_image_gather_scatter in cudagraph mode ( #6049 )
2026-01-23 19:48:43 +08:00
Jiaxin Sui
20074d301f
[XPU] [CI] add xpu logprobs case ( #6187 )
...
* add xpu case
* add xpu case
2026-01-23 19:40:55 +08:00
wangyifei
53dc56f11b
[Docs] add docs of /v1/pause, /v1/resume, /v1/is_paused ( #6192 )
...
* support dynamic run_control_request through zmq from apiserver to common_engine
* support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method
* change /is_paused from HTTP POST method to GET method
* add pause, resume, is_paused implementation
* support engine <==> worker communication (request & response)
* support sync weights through RDMA from checkpoint_transfer
* support specified version, rsync_config in update_weights rpc call
* add pause, update_weights, resume interface for async RL
* bug fix: update_weights support using default arguments
* fix typo
* typo fix
* typo fix
* typo fix
* add unit test for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all
* add "rsync" to LoadConfig.load_strategy Literal type hints
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* typo fix
* typo fix
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* check version/rsync params
* add error log when version.txt does not exist
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* raise specified ValueError when parameter check fails
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* tp barrier after run_control_method
* encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue
* typo fix
* typo fix
* update docs of /v1/pause, /v1/resume, /v1/is_paused
* add zh docs of pause, resume, is_paused
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2026-01-23 17:57:51 +08:00
fxyfxy777
79f42209bf
add scale_wrapper for per_block_cast_to_fp8 ( #6183 )
2026-01-23 00:37:20 -08:00
lizan1999
b3a48529ab
[XPU] add more type for recover batch sequence ( #6142 )
2026-01-23 15:16:05 +08:00
sunxin
bef6293552
[Model Runner] Add exist_prefill_flag ( #6172 )
2026-01-23 13:07:05 +08:00
luukunn
0a19e1b6df
fix image gen ( #6175 )
2026-01-23 11:24:12 +08:00
luukunn
8635d8880d
bug fix tool_calls ( #6166 )
2026-01-23 10:49:27 +08:00
GoldPancake
646aced1eb
[UT] Add GLM E2E tests for non-MTP and MTP ( #6163 )
...
* add glm ut
2026-01-23 10:34:29 +08:00
wangyifei
b7c5daa316
[RL] add pause, update_weights, resume interface for async RL ( #6052 )
...
* support dynamic run_control_request through zmq from apiserver to common_engine
* support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method
* change /is_paused from HTTP POST method to GET method
* add pause, resume, is_paused implementation
* support engine <==> worker communication (request & response)
* support sync weights through RDMA from checkpoint_transfer
* support specified version, rsync_config in update_weights rpc call
* add pause, update_weights, resume interface for async RL
* bug fix: update_weights support using default arguments
* fix typo
* typo fix
* typo fix
* typo fix
* add unit test for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all
* add "rsync" to LoadConfig.load_strategy Literal type hints
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* typo fix
* typo fix
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* check version/rsync params
* add error log when version.txt does not exist
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* raise specified ValueError when parameter check fails
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* tp barrier after run_control_method
* encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue
* typo fix
* typo fix
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2026-01-23 10:18:07 +08:00
Copilot
96b2cf2c20
[Docs] Update FastDeploy Docker image to 2.4.0 for Nvidia GPU installation ( #6168 )
...
* Initial plan
* Update Nvidia GPU Docker image version from 2.3.3 to 2.4.0
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-22 22:01:13 +08:00
Ryan
31c219d483
[Graph Optimization] Add max_capture_shape_prefill && cudagraph_capture_sizes_prefill ( #6148 )
...
* Add max_capture_shape_dy2st parameter to YAML config
* split cudagraph capture size between decode and prefill
* rm if
* add default value
2026-01-22 21:37:18 +08:00
Yonghua Li
8d27a523e7
[Feature] [KVCache] support attention_store kv cache backend ( #5823 )
...
* [feat] support attention_store kv cache backend
* [fix] fix codestyle
* [chore] optimize log
* [fix] fix write storage task
* [fix] fix read storage
* [fix] fix code conflict after merge develop
* [fix] fix cache bytes and read task token ids
* [chore] add model for cache transfer manager
* [chore] add some log
* [chore] remove launched_cache_manager_signal
* [fix] fix write_back_storage_task match_block_num condition
* [fix] fix swap_cost_time
* [ci] fix ci
* Update fastdeploy/engine/sched/resource_manager_v1.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/cache_manager/cache_transfer_manager.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2026-01-22 21:01:23 +08:00
yinwei
3cd0ffe36c
Enable CudaGraph
2026-01-22 19:49:33 +08:00
Yonghua Li
bb76d3b6f0
[RL] [APIServer] add more status codes for update/clear api ( #6141 )
...
* [RL] add more status codes for update/clear api
* [feat] return json response
* [fix] fix ci
2026-01-22 17:26:18 +08:00
luukunn
6b968a76f1
[Optimization] update data_processor & add tool parser plugins ( #6096 )
...
* update data_processor
* fix unit test
* fix unit test
* add unit test
* add tool parser plugins
* fix tool call
* fix tool call
* fix tool call
* fix unit test
* fix unit test
* add unit test
* fix unit test
* fix unit test
* fix unit test
2026-01-22 17:17:32 +08:00
RAM
955785e2e0
[RL][R3] Fix typo ( #6046 )
...
* fix typo
2026-01-22 15:46:34 +08:00
YuBaoku
1cfb042045
[CI] Add ep4_mtp e2e test ( #6153 )
...
* [CI] Add ep4_mtp e2e test
2026-01-22 14:54:18 +08:00
yinwei
1e3c35496c
[XPU][Graph Optimization] XPU Support CUDAGraph ( #6152 )
...
* support cuda graph
2026-01-22 14:41:56 +08:00
Haonan Luo
82057cb71f
Support MXFP4 for GPT-OSS ( #5435 )
...
* support mxfp4 in gpt-oss
* support mxfp4 in gpt-oss
* add scope for flashinfer
* remove torch code
* update envs.FD_MXFP4_BACKEND
* update process_weights_after_loading
* update env name
* support tp in gpt-oss, add e2e test
* add flashinfer-python-paddle in requirements
* fix import error
* add test
* add test
* add test
* add test
2026-01-22 14:21:01 +08:00
jc
309c7d9764
router supports divided rollout ( #6150 )
2026-01-22 10:39:39 +08:00
fxyfxy777
9c4db0ac3f
[BugFix] fix weight quant op ( #6137 )
...
* fix weight quant
* fix weight quant
* bit equal
* code style
2026-01-22 09:50:57 +08:00
kxz2002
6e416c62dd
[Optimization] The pre- and post-processing pipelines do not perform dict conversion ( #5494 )
...
* to_request_for_infer initial commit
* refactor to from_chat_completion_request
* preprocess use request initial commit
* bugfix
* refactor processors to use request
* bug fix
* refactor Request from_generic_request
* post process initial commit
* bugfix
* postprocess second commit
* bugfix
* serving_embedding initial commit
* serving_reward initial commit
* bugfix
* replace function name
* async_llm initial commit
* offline initial commit and fix bug
* bugfix
* fix async_llm
* remove add speculate_metrics into data
* fix logprobs bug
* fix echo bug
* fix bug
* fix reasoning_max_tokens
* bugfix
* bugfix and modify unittest
* bugfix and modify unit test
* bugfix
* bugfix
* bugfix
* modify unittest
* fix error when reasoning_content is None for text_processor
* remove some unnecessary logic
* revert removed logic
* implement add and set methods for RequestOutput and refactor code
* modify unit test
* modify unit test
* merge process_request and process_request_obj
* remove a unit test
* merge process_response and process_response_obj
* support qwen3_vl_processor
* modify unittest and remove comments
* fix prompt_logprobs
* fix codestyle
* add v1
* v1
* fix unit test
* fix unit test
* fix pre-commit
* fix
* add process request
* add process request
* fix
* fix
* fix unit test
* fix unit test
* fix unit test
* fix unit test
* fix unit test
* remove file
* add unit test
* add unit test
* add unit test
* fix unit test
* fix unit test
* fix
* fix
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com >
Co-authored-by: luukunn <981429396@qq.com >
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com >
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com >
2026-01-22 00:50:52 +08:00
YuBaoku
fe5ba4b509
[CI] Update image used by build_rl in ce_job.yml
2026-01-21 20:57:50 +08:00
yangjianfengo1
bb635e0819
fix text ( #6145 )
2026-01-21 19:40:30 +08:00
yinwei
9536cd650b
[XPU] update release doc ( #6143 )
2026-01-21 18:31:25 +08:00
zccjjj
14a64e9b3b
[XPU] change XPU EP interface from xDeepEP to paddle ( #5706 )
...
* add env var to control low-latency buffer
2026-01-21 18:23:45 +08:00
K11OntheBoat
490a6551dc
rename params of normalization layer ( #6133 )
...
Co-authored-by: “liuruian” <liuruian@baidu.com >
2026-01-21 17:18:35 +08:00
lizexu123
1f96028bea
[BugFix] fix python3.12 v0_loader ( #6132 )
2026-01-21 16:12:11 +08:00
yzwu
837ddca273
[Iluvatar][CI] Fix the error max_tokens_per_expert referenced before assignment ( #6083 )
2026-01-21 16:01:29 +08:00
yinwei
85d995100a
Update Dummy Run To Support Multi-Batch Execution ( #6123 )
2026-01-21 14:20:44 +08:00
Cheng Yanfei
9ee0156cc3
add HPU tensorwise_fp8 readme ( #6091 )
2026-01-21 11:48:22 +08:00
MingkunZhang
7e04067663
[Metax][CI] restore 'moe_expert_dispatch' outputs ( #6130 )
2026-01-21 10:33:09 +08:00
YuBaoku
c991fda54c
[CI] Enable 4-GPU e2e test in nightly and fix docker_tag_build ( #6128 )
2026-01-20 22:44:29 +08:00
lizexu123
f4902fe42d
[BugFix] fix wint2 ( #6109 )
...
* fix
* fix
* fix
2026-01-20 21:46:21 +08:00
yinwei
5385d51808
[XPU] XPU FD Release/2.4 Note
2026-01-20 20:38:34 +08:00
Copilot
dcb20c1a2a
[WIP] Add directory guide to mkdocs configuration ( #6121 )
...
* Initial plan
* Add PaddleFormers Backend documentation to mkdocs.yml navigation
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com >
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com >
2026-01-20 19:51:27 +08:00
luukunn
56e22a7ddc
[Docs] fix doc ( #6119 )
...
* fix doc
* fix doc
2026-01-20 19:46:05 +08:00
yinwei
51a8a2ed57
[XPU] Support CudaGraph (add block attn cuda_graph support) ( #6116 )
...
* add block attn cuda_graph support
2026-01-20 19:33:11 +08:00
jackyYang6
00a6a73431
docs: fix pre-commit error of markdown ( #6100 )
2026-01-20 19:32:05 +08:00
ChowMingSing
bf60e103b6
[CI] Fix test case ( #6111 )
2026-01-20 17:47:44 +08:00
Ryan
dda27e50f5
[Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph ( #6081 )
...
* rm static_op_get_block_shape_and_split_kv_block from cudagraph
* update max_capture_shape
* fallback: zeros -> empty to avoid coverage check
* check graph_opt_config exists
* add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test
* add use_cudagraph flag to control step_use_cudagraph
2026-01-20 14:05:18 +08:00
zhupengyang
45ebb2efb4
[XPU] support plugin model ( #6092 )
2026-01-20 13:00:09 +08:00