FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
gongweibao	a6351dea0b	[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533 ) * init * init * fix format * add * add files * add ut * fix some * add ut * add more * add * fix pre-commit * fix pre-commit * fix cover * skip long seq * add * add * fix * remove not need * fix set attr * fix comments * fix comments * fix failed tests --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-16 21:32:43 +08:00
Yonghua Li	6520ae807c	[BugFix] fix grpc failure when tracing init before workers forked (#6732 ) * [fix] fix grpc failure when tracing init before workers forked * [fix] change default exporter to http * [fix] fix test_trace	2026-03-10 21:24:10 +08:00
SunLei	5d9524fc3c	[Models][Feature] Support new ERNIE reward model and add return_token_ids to reward API (#6638 ) * reward model * Add support for pooling-based inference in the reward model * bugfix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-06 18:51:00 +08:00
luukunn	caf73e8131	[Feature]add reasoning effort (#6656 ) * add reasoning_effort * fix log * fix reasoning_effort * add reasoning_effort level * fix valid_parameters * fix valid_parameters * fix * fix unit test * add unit test * add unit test	2026-03-06 14:16:02 +08:00
ddchenhao66	fa4815b93a	[BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server (#6598 ) * [BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server * [BugFix] modify request_queue and result_queue of dp scheduler	2026-03-05 10:04:12 +08:00
qwes5s5	375b5b7b21	[Feature]Log Format Normalization and Trace Log Optimization (#6370 ) * log refactor * log refactor 2 * log refactor 3	2026-03-03 11:31:45 +08:00
yzwu	6674131b0b	[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553 )	2026-03-02 14:07:17 +08:00
Jiang-Jia-Jun	39a5ea66c8	[BugFix] Enable control socket disable option in API server (#6545 ) * [BugFix] Enable control socket disable option in API server * Update requirements.txt * Update requirements.txt	2026-02-28 10:35:35 +08:00
Yuanle Liu	6d3fede240	[OP][Feature] Unify the limit_thinking_content_length CUDA operator, supporting response length limits and injected sequences (#6493 ) * Initial plan * Migrate PRs #6311, #6129, #6305 to develop and merge unit tests Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix * update * fix * fix ci * fix ci * Initial plan * test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add disable-thinking case to test_chat_with_response_max_tokens Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add both reasoning_max_tokens and response_max_tokens case Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix ci * fix ci * fix ci --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>	2026-02-25 21:36:50 +08:00
luukunn	765df94e6c	[Optimization]update prompt & prompt_token_ids (#6334 ) * fix prompt * add unit test * add unit test * fix	2026-02-04 20:08:01 +08:00
luukunn	0a19e1b6df	fix image gen (#6175 )	2026-01-23 11:24:12 +08:00
wangyifei	b7c5daa316	[RL] add pause, update_weights, resume interface for async RL (#6052 ) * support dynamic run_control_request through zmq from apiserver to common_engine * support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method * change /is_puased from HTTP POST method to GET method * add pause、resume、is_paused implementation * support engine <==> worker communication(request&response) * support sync weights through RDMA from checkpoint_transfer * support specified version, rsync_config in update_weights rpc call * add pause, update_weights, resume interface for async RL * bug fix: update_weights support using default arguments * fix typo * typo fix * typo fix * typo fix * add unitest for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all * add "rsync" to LoadConfig.load_strategy Literal type hints Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * typo fix * typo fix * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * check version/rsync params * add error log when version.txt not exists Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * raise specified ValueError when paramters check failed Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * tp barrier after run_control_method * encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue * typo fix * typo fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-23 10:18:07 +08:00
Yonghua Li	bb76d3b6f0	[RL] [APIServer] add more status codes for update/clear api (#6141 ) * [RL] add more status codes for update/clear api * [feat] return json response * [fix] fix ci	2026-01-22 17:26:18 +08:00
luukunn	6b968a76f1	【Optimization】update data_processor & add tool parser plugins (#6096 ) * update data_processor * fix unit test * fix unit test * add unit test * add tool parser plugins * fix tool call * fix tool call * fix tool call * fix unit test * fix unit test * add unit test * fix unit test * fix unit test * fix unit test	2026-01-22 17:17:32 +08:00
kxz2002	6e416c62dd	[Optimization] The pre- and post-processing pipeline do not perform dict conversion (#5494 ) * to_request_for_infer initial commit * refact to from_chat_completion_request * preprocess use request initial commit * bugfix * processors refact to using request * bug fix * refact Request from_generic_request * post process initial commit * bugfix * postprocess second commit * bugfix * serving_embedding initial commit * serving_reward initial commit * bugfix * replace function name * async_llm initial commit * offline initial commit and fix bug * bugfix * fix async_llm * remove add speculate_metrics into data * fix logprobs bug * fix echo bug * fix bug * fix reasoning_max_tokens * bugfix * bugfix and modify unittest * bugfix and modify unit test * bugfix * bugfix * bugfix * modify unittest * fix error when reasong_content is none for text_processor * remove some unnessary logic * revert removed logic * implement add and set method for RequestOutput and refact code * modify unit test * modify unit test * union process_request and process_request_obj * remove a unit test * union process_response and process_response_obj * support qwen3_vl_processor * modify unittest and remove comments * fix prompt_logprobs * fix codestyle * add v1 * v1 * fix unit test * fix unit test * fix pre-commit * fix * add process request * add process request * fix * fix * fix unit test * fix unit test * fix unit test * fix unit test * fix unit test * remove file * add unit test * add unit test * add unit test * fix unit test * fix unit test * fix * fix --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com> Co-authored-by: luukunn <981429396@qq.com> Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com> Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>	2026-01-22 00:50:52 +08:00
qwes5s5	b2a2e11551	[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320 ) * request disconnect * request disconnect * fix bug * fix bug--amend --------- Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-16 11:46:13 +08:00
Yonghua Li	60ee72f682	[BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935 ) * [fix] fix rdma script and add more error log for multi api server * [fix] log * [fix] fix test_multi_api_server * [fix] fix multi api server port check --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-12 10:38:52 +08:00
qwes5s5	b3ca7f041a	[BugFix] Fix redundant prompt_logprobs in the second chunk of streaming response when return_token_ids is enabled for v1/completions and fix trace file name (#5829 ) * fix prompt logprobs bug * fix trace file name --------- Co-authored-by: qwes5s5 <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-06 14:11:43 +08:00
Copilot	7d5282e158	[APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT (#5865 ) * Initial plan * Add configurable FD_WORKER_ALIVE_TIMEOUT environment variable Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Add test for FD_WORKER_ALIVE_TIMEOUT environment variable Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Update docs/zh/usage/environment_variables.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update docs/usage/environment_variables.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Improve test coverage to validate integration with check_health calls Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> * Remove test_worker_alive_timeout.py per reviewer feedback Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-05 09:47:12 +08:00
kxz2002	cad2932990	[BugFix] Fix process_response_dict to support async in serving_completion (#5758 ) * support process_response_dict async initial commit * fixbug * add unit test * optimize	2025-12-26 17:40:58 +08:00
memoryCoderC	be3be4913a	[Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195 ) * [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM * [Optimization] refactor(chat_handler,completion_handler): rename class	2025-12-25 16:28:15 +08:00
Yonghua Li	0c8c6369ed	[Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415 ) * [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports * [fix] fix some bugs * [fix] fix rdma port for cache manager/messager * [fix] temporarily cancel port availability check to see if it can pass ci test * [feat] simplify args for multi api server * [fix] fix dp * [fix] fix port for xpu * [fix] add tests for ports post processing & fix ci * [test] fix test_multi_api_server * [fix] fix rdma_comm_ports args for multi_api_server * [fix] fix test_common_engine * [fix] fix test_cache_transfer_manager * [chore] automatically setting FD_ENABLE_MULTI_API_SERVER * [fix] avoid api server from creating engine_args twice * [fix] fix test_run_batch * [fix] fix test_metrics * [fix] fix splitwise connector init * [test] add test_rdma_transfer and test_expert_service * [fix] fix code syntax * [fix] fix test_rdma_transfer and build wheel with rdma script	2025-12-17 15:50:42 +08:00
xiaolei373	a30b4da260	[Feature] Tracing: Fine-Grained Tracing for Request Latency Part1 (#5458 )	2025-12-16 16:36:09 +08:00
GoldPancake	909059c60a	[Feature] Support for request-level speculative decoding metrics monitoring. (#5518 ) * support spec metrics monitor per request * fix bug * remove debug log * fix ut bugs	2025-12-12 12:22:18 +08:00
qwes5s5	d79438bb86	add detoken switch (#5463 )	2025-12-10 21:44:02 +08:00
luukunn	fbc9bce1e9	[Feature]Optimization of Thinking Pattern Framework (#4302 ) * add model status in vl * add x1 parser * add model_status * fix parser * fix parser * fix parser * fix parser * Revert "fix parser" This reverts commit `300f446d8a`. * fix parser * fix * fix * fix * fix * fix parser * fix unit test * fix unit test * add unit test * fix * fix * add unit test * fix unit test * add unit test * add unit test * fix unit test * fix unit test * fix bug * fix unit test * x1 tool parser * fix unit test * fix unit test * fix unit test * fix n * fix unit test * add unit test * add unit test * remove pring	2025-12-10 16:17:06 +08:00
ming1753	9e15191cce	[BugFix] fix audio end bug (#5464 )	2025-12-10 13:37:26 +08:00
Juncai	80efe98f8d	[PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317 ) * Add timestamp for analyzing splitwise deployment * up * up * up * up * up * up * fix format * fix	2025-12-08 10:08:44 +08:00
lizexu123	d4979347ca	[Bug fix] Fix the multi-input accuracy issue in the pooling model. (#5374 ) * fix multi-inputs * fix threshold * fix threshold * fix	2025-12-05 20:18:17 +08:00
ming1753	dd2e9a14c7	[BugFix] Compatible with asynchronous functions (#5378 ) * [BugFix] fix data_processor asyn bug * fix bug	2025-12-05 11:05:21 +08:00
lizexu123	946025480e	[Bug fix] fix pooling models (#5358 ) * fix * fix * fix test * fix gpu_model_runner --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-04 11:06:30 +08:00
qwes5s5	a52aea073c	fix logprobs (#5335 )	2025-12-04 10:38:51 +08:00
ming1753	5f8d4aedea	[Feature] support audio tts (#5333 )	2025-12-03 21:06:48 +08:00
xiaolei373	a4bb3e9960	[bugfix]remove metrics middleware (#5332 )	2025-12-03 17:07:45 +08:00
lizexu123	c563eca791	[Feature] support reward model (#5301 ) * Your commit message here * add test * update develop * support reward * support enable_chunk_prefill * support bingfa * support convert is reward * update test * delete print * fix enable_thinking * add document * fix place * fix test * fix * support enable_prefix_caching * add no-enable_prefix-caching test * fix * support enable_prefix_caching * delete print * fix document * fix * fix test * fix document and delete chinese * udpate * enable_thinking * fix test	2025-12-02 14:55:31 +08:00
qwes5s5	117980dd4e	[LogProbs]Enable prompt logprobs output and modify data transmission method for the online interface. (#5089 ) * add prompt logprobs * Merge prompt_logprobs_tensors and prompt_logprobs * fix param check * trigger ci * fix unitest * fix logprobs bug	2025-12-02 13:49:51 +08:00
Yonghua Li	a535050b11	[FDConfig] remove engine client args, use fd_config instead (#5217 ) * [refactor] remove engine client args, use fd_config instead * [chore] update * [fix] fix * [fix] fix * [chore] rename config to fd_config * [fix] fix run_batch * [ci] add ci case for engine client --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-11-28 01:20:54 -08:00
fl0w2o48	e63d715fc3	[BugFix][Metrics] Fix Prometheus Multiprocess Metrics Issues and Add ZMQ Communication Metrics (#5185 ) * [Feature] add metrics for ZMQ and fix multiprocess metrics * fix test_metrics.py --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>	2025-11-27 15:05:09 +08:00
SunLei	c424e08dc5	[Speculative Decoding] split draft_tokens into standalone post-processing path (#5205 ) * refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs * Restore Request.__repr__ implementation * ci * add envs * fix unittest	2025-11-27 11:22:41 +08:00
kxz2002	2d787590c4	[Feature] The 45VL supports prompt_token_ids + messages input. (#5148 ) * support prompt_token_ids + messages * fix bug * refact code structure * support cache mm items * refact code structure * delete test cases * modify unit test * add unit test * add unit test * fix append * add check for messages	2025-11-25 23:11:44 +08:00
Yonghua Li	09379183e2	[BugFix] fix work metrics not returned by metrics api (#4912 ) * [BugFix] fix work metrics not returned by metrics api * [fix] fix conflict * [fix] fix ci	2025-11-25 19:12:29 +08:00
kevin	8e4e3ff510	[Feature] support eplb in api_server (#4782 ) * support eplb in api_server * update code * add eplb test case * update eplb * support tp+dp eplb * update test cese * update code * update code * fix bug * update copilot review * update test case name	2025-11-24 20:22:29 +08:00
kxz2002	97189079b9	[BugFix] unify max_tokens (#4968 ) * unify max tokens * modify and add unit test * modify and add unit test * modify and add unit tests --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-11-18 20:01:33 +08:00
qwes5s5	36216e62f0	[Log] Add trace log and add loggingInstrumentor tool (#4692 ) * add trace logger and trace print * trigger ci * fix unittest * translate notes and add copyright --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-11-17 11:08:57 +08:00
zhouchong	5444af6ff6	[APIServer] metrics use port the same as api_port (#5016 ) * metrics use port the same as api_port * Be tolerant to tests that monkeypatch/partially mock args. * Reduce code redundancy --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-11-17 10:42:45 +08:00
kxz2002	2dfbcf3cc9	[BugFix] Fix inference_start_time (#4922 ) * fix inference_start_time * fix inference_start_time	2025-11-10 19:28:44 +08:00
kxz2002	87911b7cf1	[Feature] Enable FastDeploy to support adding the “--api-key” authentication parameter. (#4806 ) * add api key initial commit * add unit test * modify unit test * move middleware to a single file and add unit tests	2025-11-08 18:24:02 +08:00
Juncai	08ca0f6aea	[Feature] [PD] add simple router and refine splitwise deployment (#4709 ) * add simple router and refine splitwise deployment * fix	2025-11-06 14:56:02 +08:00
luukunn	7b35488779	【DataProcessor】add options thinking_mode (#4735 ) * add thinking_mode * add thinking_mode * add thinking_mode * add thinking_mode * add thinking_mode * add thinking_mode * add unit test	2025-11-03 14:30:07 +08:00
kxz2002	a2870ed4a9	[Feature] Unify the registration name recognition for tool_parser and reasoning_parser to “-” (#4668 ) * parser register name unify * change ernie_x1 to ernie-x1 * change ernie4_5_vl to ernie-45-vl * fix unit test	2025-10-31 10:45:27 +08:00

160 Commits