FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-07 07:59:05 +08:00

Author	SHA1	Message	Date
GoldPancake	646aced1eb	[UT] Add GLM E2E tests for non-MTP and MTP (#6163 ) * add glm ut	2026-01-23 10:34:29 +08:00
wangyifei	b7c5daa316	[RL] add pause, update_weights, resume interface for async RL (#6052 ) * support dynamic run_control_request through zmq from apiserver to common_engine * support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method * change /is_paused from HTTP POST method to GET method * add pause, resume, is_paused implementation * support engine <==> worker communication(request&response) * support sync weights through RDMA from checkpoint_transfer * support specified version, rsync_config in update_weights rpc call * add pause, update_weights, resume interface for async RL * bug fix: update_weights support using default arguments * fix typo * typo fix * typo fix * typo fix * add unit test for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all * add "rsync" to LoadConfig.load_strategy Literal type hints Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * typo fix * typo fix * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * check version/rsync params * add error log when version.txt does not exist Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * raise specified ValueError when parameter check fails Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * tp barrier after run_control_method * encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue * typo fix * typo fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-23 10:18:07 +08:00
Copilot	96b2cf2c20	[Docs] Update FastDeploy Docker image to 2.4.0 for Nvidia GPU installation (#6168 ) * Initial plan * Update Nvidia GPU Docker image version from 2.3.3 to 2.4.0 Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-22 22:01:13 +08:00
Ryan	31c219d483	[Graph Optimization] Add `max_capture_shape_prefill` && `cudagraph_capture_sizes_prefill` (#6148 ) * Add max_capture_shape_dy2st parameter to YAML config * split cudagraph capture size between decode and prefill * rm if * add default value	2026-01-22 21:37:18 +08:00
Yonghua Li	8d27a523e7	[Feature] [KVCache] support attention_store kv cache backend (#5823 ) * [feat] support attention_store kv cache backend * [fix] fix codestyle * [chore] optimize log * [fix] fix write storage task * [fix] fix read storage * [fix] fix code conflict after merge develop * [fix] fix cache bytes and read task token ids * [chore] add model for cache transfer manager * [chore] add some log * [chore] remove launched_cache_manager_signal * [fix] fix write_back_storage_task match_block_num condition * [fix] fix swap_cost_time * [ci] fix ci * Update fastdeploy/engine/sched/resource_manager_v1.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/cache_transfer_manager.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update fastdeploy/cache_manager/transfer_factory/mooncake_store/attention_store.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-22 21:01:23 +08:00
yinwei	3cd0ffe36c	Enable CudaGraph	2026-01-22 19:49:33 +08:00
Yonghua Li	bb76d3b6f0	[RL] [APIServer] add more status codes for update/clear api (#6141 ) * [RL] add more status codes for update/clear api * [feat] return json response * [fix] fix ci	2026-01-22 17:26:18 +08:00
luukunn	6b968a76f1	[Optimization] update data_processor & add tool parser plugins (#6096 ) * update data_processor * fix unit test * fix unit test * add unit test * add tool parser plugins * fix tool call * fix tool call * fix tool call * fix unit test * fix unit test * add unit test * fix unit test * fix unit test * fix unit test	2026-01-22 17:17:32 +08:00
RAM	955785e2e0	[RL][R3] Fix typo (#6046 ) * fix typo	2026-01-22 15:46:34 +08:00
YuBaoku	1cfb042045	[CI] Add ep4_mtp e2e test (#6153 ) * [CI] Add ep4_mtp e2e test	2026-01-22 14:54:18 +08:00
yinwei	1e3c35496c	[XPU][Graph Optimization] XPU Support CUDAGraph (#6152 ) * support cuda graph	2026-01-22 14:41:56 +08:00
Haonan Luo	82057cb71f	Support MXFP4 for GPT-OSS (#5435 ) * support mxfp4 in gpt-oss * support mxfp4 in gpt-oss * add scope for flashinfer * remove torch code * update envs.FD_MXFP4_BACKEND * update process_weights_after_loading * update env name * support tp in gpt-oss, add e2e test * add flashinfer-python-paddle in requirements * fix import error * add test * add test * add test * add test	2026-01-22 14:21:01 +08:00
jc	309c7d9764	router support divided rollout (#6150 )	2026-01-22 10:39:39 +08:00
fxyfxy777	9c4db0ac3f	[BugFix] fix weight quant op (#6137 ) * fix weight quant * fix weight quant * bit equal * code style	2026-01-22 09:50:57 +08:00
kxz2002	6e416c62dd	[Optimization] The pre- and post-processing pipeline does not perform dict conversion (#5494 ) * to_request_for_infer initial commit * refact to from_chat_completion_request * preprocess use request initial commit * bugfix * processors refact to using request * bug fix * refact Request from_generic_request * post process initial commit * bugfix * postprocess second commit * bugfix * serving_embedding initial commit * serving_reward initial commit * bugfix * replace function name * async_llm initial commit * offline initial commit and fix bug * bugfix * fix async_llm * remove add speculate_metrics into data * fix logprobs bug * fix echo bug * fix bug * fix reasoning_max_tokens * bugfix * bugfix and modify unittest * bugfix and modify unit test * bugfix * bugfix * bugfix * modify unittest * fix error when reasoning_content is none for text_processor * remove some unnecessary logic * revert removed logic * implement add and set method for RequestOutput and refact code * modify unit test * modify unit test * union process_request and process_request_obj * remove a unit test * union process_response and process_response_obj * support qwen3_vl_processor * modify unittest and remove comments * fix prompt_logprobs * fix codestyle * add v1 * v1 * fix unit test * fix unit test * fix pre-commit * fix * add process request * add process request * fix * fix * fix unit test * fix unit test * fix unit test * fix unit test * fix unit test * remove file * add unit test * add unit test * add unit test * fix unit test * fix unit test * fix * fix --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com> Co-authored-by: luukunn <981429396@qq.com> Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com> Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>	2026-01-22 00:50:52 +08:00
YuBaoku	fe5ba4b509	[CI] Update image used by build_rl in ce_job.yml	2026-01-21 20:57:50 +08:00
yangjianfengo1	bb635e0819	fix text (#6145 )	2026-01-21 19:40:30 +08:00
yinwei	9536cd650b	[XPU] update release doc (#6143 )	2026-01-21 18:31:25 +08:00
zccjjj	14a64e9b3b	[XPU] change XPU EP interface from xDeepEP to paddle (#5706 ) * add ENV VAR to control low latency buffer	2026-01-21 18:23:45 +08:00
K11OntheBoat	490a6551dc	rename params of normalization layer (#6133 ) Co-authored-by: “liuruian” <liuruian@baidu.com>	2026-01-21 17:18:35 +08:00
lizexu123	1f96028bea	[BugFix] fix python3.12 v0_loader (#6132 )	2026-01-21 16:12:11 +08:00
yzwu	837ddca273	[Iluvatar][CI] Fix the error max_tokens_per_expert referenced before assignment (#6083 )	2026-01-21 16:01:29 +08:00
yinwei	85d995100a	Update Dummy Run to Support Multi-Batch Execution (#6123 )	2026-01-21 14:20:44 +08:00
Cheng Yanfei	9ee0156cc3	add HPU tensorwise_fp8 readme (#6091 )	2026-01-21 11:48:22 +08:00
MingkunZhang	7e04067663	[Metax][CI] restore 'moe_expert_dispatch' outputs (#6130 )	2026-01-21 10:33:09 +08:00
YuBaoku	c991fda54c	[CI] Enable 4-GPU e2e test in nightly and fix docker_tag_build (#6128 )	2026-01-20 22:44:29 +08:00
lizexu123	f4902fe42d	[BugFix] fix wint2 (#6109 ) * fix * fix * fix	2026-01-20 21:46:21 +08:00
yinwei	5385d51808	[XPU]XPU FD Release/2.4 Note	2026-01-20 20:38:34 +08:00
Copilot	dcb20c1a2a	[WIP] Add directory guide to mkdocs configuration (#6121 ) * Initial plan * Add PaddleFormers Backend documentation to mkdocs.yml navigation Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-20 19:51:27 +08:00
luukunn	56e22a7ddc	[Docs]fix doc (#6119 ) * fix doc * fix doc	2026-01-20 19:46:05 +08:00
yinwei	51a8a2ed57	[XPU] Support CudaGraph(add block attn cuda_graph support) (#6116 ) * add block attn cuda_graph support	2026-01-20 19:33:11 +08:00
jackyYang6	00a6a73431	docs: fix pre-commit error of markdown (#6100 )	2026-01-20 19:32:05 +08:00
ChowMingSing	bf60e103b6	[CI]Fix test case (#6111 )	2026-01-20 17:47:44 +08:00
Ryan	dda27e50f5	[Graph Optimization] remove static_op_get_block_shape_and_split_kv_block from cudagraph (#6081 ) * rm static_op_get_block_shape_and_split_kv_block from cudagraph * update max_capture_shape * fallback: zeros -> empty to avoid coverage check * check graph_opt_config exists * add max_capture_shape_dy2st && full_cuda_graph: false -> true in 28B vl test * add use_cudagraph flag to control step_use_cudagraph	2026-01-20 14:05:18 +08:00
zhupengyang	45ebb2efb4	[XPU] support plugin model (#6092 )	2026-01-20 13:00:09 +08:00
jackyYang6	988e0bc338	[Feature] Add PaddleFormers fallback backend (#5999 ) * feat(paddleformers): add dense text model fallback backend * docs(paddleformers): add user guide and fix code review issues * add fallback unit test * precommit format * fix pre-commit * fix: address code review feedback * docs: add PaddleFormers backend documentation (EN) and simplify installation --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-19 21:50:50 +08:00
GoldPancake	879e45f6b3	fix compute logits problem (#6093 )	2026-01-19 20:12:14 +08:00
xiegegege	e22c4e29bb	[CE]add paddleocr config yaml (#6097 )	2026-01-19 20:07:42 +08:00
Jingfeng Wu	7d44009f39	[FDConfig] transfer metrics_port (#6056 ) * transfer metrics_port * transfer metrics_port	2026-01-19 19:58:57 +08:00
cmcamdy	211dd81ca7	add pd+mtp ci (#6090 )	2026-01-19 19:21:24 +08:00
Jiaxin Sui	e0d15a2ded	[XPU][CI] Xpu ci update (#6089 ) * add xpu ci case * add xpu ci case * add xpu ci case * Change runner from XPU-P800-8Card to XPU-P800 * Remove cache queue port from test_pd_03b_tp1.py Removed cache queue port arguments from test cases. * Remove cache queue port from test_pd_21b_tp2.py Removed cache queue port arguments from test cases. * Update README with PYTHONPATH setup instructions Added instructions for setting PYTHONPATH in CI scripts.	2026-01-19 16:09:09 +08:00
ChowMingSing	496cc23089	[CI]Fix test cases failing under Python 3.12 (#6059 ) * Fix test case errors under Python 3.12 * Fix test case errors under Python 3.12 --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-01-19 15:41:12 +08:00
sunxin	a4144e0b8e	[Optimization] Avoid unnecessary penalty computation (#6078 )	2026-01-19 15:24:12 +08:00
GoldPancake	05fbd89a8e	[Speculative Decoding][Bugfix] Fix MTP logprob issues caused by max_num_logprobs (#6084 )	2026-01-19 14:55:36 +08:00
ddchenhao66	3685474799	[XPU] xpu support mm prefill batch (#6072 ) Co-authored-by: ddchenhao66 <dhaochen163.com>	2026-01-19 14:36:35 +08:00
sunxin	9dc1c74d36	fix opt qknorm (#6080 )	2026-01-19 12:07:20 +08:00
YuBaoku	ac6fa6d725	[CI] Add 4-GPU e2e test job (#6082 )	2026-01-19 10:42:14 +08:00
kevin	0e0eaa1c57	[BugFix] fix mm revert bug (#6061 ) * fix mm revert bug * update code	2026-01-16 08:13:34 -08:00
Jiaxin Sui	70a962df53	[XPU][CI] XPU CI refactor (#6053 ) * add xpu ci case * add xpu ci case * add xpu ci case * Change runner from XPU-P800-8Card to XPU-P800	2026-01-16 20:57:58 +08:00
GoldPancake	b917b56aca	[Bugfix] Fix logprob issues caused by max_num_logprobs (#6067 )	2026-01-16 04:40:18 -08:00


4485 Commits