FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
K11OntheBoat	b79b094dcc	Change default workers and max-concurrency when launch api-server (#7457 ) Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>	2026-04-20 15:55:06 +08:00
jackyYang6	a729e0f729	[Bugfix][RL] fix control request timeout in async update weights pipeline (#7430 )	2026-04-17 16:45:33 +08:00
zhouchong	6e16438a57	[Feature] implement log channel separation and request log level system (#7190 ) * feat: implement log channel separation and request log level system * fix: log system improvements based on review * add request_id to error logs, use RequestLogLevel enum, and unify logger implementation from utils to logger module	2026-04-16 15:13:05 +08:00
luukunn	14d556692b	[BugFix] fix tool call parser (#7369 ) * fix tool call parser * add unit test * fix unit test * add unit test	2026-04-15 16:21:46 +08:00
Echo-Nie	8819a039c9	[Others] Fix typo (#7280 ) * typo * typo * typo * typo	2026-04-14 17:28:22 +08:00
luukunn	9d9d79c457	[DataProcessor] add strict (#7307 ) * add strict * fix	2026-04-14 17:25:38 +08:00
周周周	a6f0055d51	add ips check (#7352 ) * commit * commit --------- Co-authored-by: “liuruian” <liuruian@baidu.com>	2026-04-13 15:24:22 +08:00
Longzhi Wang	b262419db1	Revert "[Other] support video_fps args for video bench (#7077 )" (#7254 ) This reverts commit `938e7dd881`. Co-authored-by: TBD1 <798934910@qq.com>	2026-04-08 20:13:57 +08:00
K11OntheBoat	bb48bcbaa2	Split enable_mm (#7183 ) Co-authored-by: liuruian <liuruian@MacBook-Pro.local>	2026-04-08 11:25:41 +08:00
Nana	367d37b523	fix typo (#7147 )	2026-04-07 16:30:32 +08:00
luukunn	562fa31791	[BugFix]fix extract_tool_calls (#7154 ) * fix extract_tool_calls	2026-04-02 21:18:37 +08:00
Longzhi Wang	938e7dd881	[Other] support video_fps args for video bench (#7077 )	2026-04-02 10:40:15 +08:00
luukunn	fa7a84926d	[Optimization]Fix tool parser (#7079 ) * fix tool parser	2026-04-01 21:20:34 +08:00
luukunn	3651113ee5	[DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052 ) * remove ENABLE_V1_DATA_PROCESSOR * fix unit test * fix unit test	2026-04-01 09:53:41 +08:00
qwes5s5	ee2b965f5f	adjust config info (#7054 )	2026-03-31 21:26:05 +08:00
qwes5s5	daa95244f7	abort requests (#6992 )	2026-03-31 11:02:26 +08:00
Yonghua Li	6d9739f360	[BugFix] fix speculative gauge metrics in multi api server (#7082 )	2026-03-31 10:52:50 +08:00
jackyYang6	05f2d95729	[RL] Adapt async rollout checkpoint update flow (#7042 ) * update checkpoint-transfer flow and control update_weights params * test: add update_weights route validation	2026-03-30 19:19:34 +08:00
luukunn	14b17c06af	add completion_tokens default (#7032 )	2026-03-26 21:06:23 +08:00
luukunn	e6804ba97d	[Optimization]Streaming requests return complete special tokens. (#6998 ) * return special token * add completions * update * fix * add prompt_token_ids& completion_token_ids=None, * fix unite test	2026-03-26 09:49:43 +08:00
Yonghua Li	a7f52c300d	[Feature] support v1 update/clear api for RL (#6761 ) * [Feature] support v1 update/clear api for RL * [fix] fix execute_model and add sleep/wakeup api * [fix] fix mtp and key_prefix * [chore] move _update_key_prefix to resume method * [fix] make the interface safe to call multiple times * [fix] fix some tiny bugs * [chore] make small changes against pr review * [docs] add docs for weight update * [test] add some tests and update docs * [style] fix code style check * [test] fix ci * [fix] fix stale control responses when control method timed out * [chore] remove unused code * [chore] fix code style * [chore] optimize tags and key_prefix * [test] fix ci * [chore] fix code style * [test] fix ci * [fix] fix ep control * [fix] fix ep control for engine cache queue	2026-03-25 19:18:46 +08:00
luukunn	33e79f922a	[Optimization]Optimize CPU utilization (#6950 ) * Optimize CPU utilization	2026-03-22 23:02:39 +08:00
SunLei	32b6900d01	fix code type (#6951 )	2026-03-20 16:14:12 +08:00
luukunn	f4a79d4c00	[Optimization]Unified data processing for online and offline (#6891 ) * remove process_request * fix chat * fix unit test * remove process response * fix unit test * fix offline decode * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix sampling_params --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-19 21:56:09 +08:00
luukunn	c3d8db85c4	[Optimization] Update ZMQ server (#6735 ) * add batch zmq send reaponse * update * Revert "update" This reverts commit `0234a25b47`. * update * remove lock * fix unit test * add unit test * add unit test * pre commit * add unit test * fix unit test * add unit test * fix worker>1 * update zmq_worker_pid * fix unit test * fix unit test * fix unit test * add unit test * fix unit test * fix first token time * fix logprobs * add unit test * op * remore debug log --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-19 21:53:16 +08:00
luukunn	fe8d58a094	[Optimization]update request in tool parser&reasoning parser (#6858 ) * update request in tool parser&reasoning parser	2026-03-17 11:51:12 +08:00
gongweibao	a6351dea0b	[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533 ) * init * init * fix format * add * add files * add ut * fix some * add ut * add more * add * fix pre-commit * fix pre-commit * fix cover * skip long seq * add * add * fix * remove not need * fix set attr * fix comments * fix comments * fix failed tests --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-16 21:32:43 +08:00
Yonghua Li	6520ae807c	[BugFix] fix grpc failure when tracing init before workers forked (#6732 ) * [fix] fix grpc failure when tracing init before workers forked * [fix] change default exporter to http * [fix] fix test_trace	2026-03-10 21:24:10 +08:00
gongweibao	1e49855b0f	[BugFix][DataProcessor] Add validate_model_path to fail fast on bad model path or unreachable network (#6713 ) * fix * add more endpoint * fix some --------- Co-authored-by: gongweibao <gognweibao@baidu.com>	2026-03-08 12:36:32 +08:00
luukunn	aac1484b0d	[Feature]add arguments string in tool (#6704 ) * add arguments string	2026-03-06 20:45:09 +08:00
SunLei	5d9524fc3c	[Models][Feature] Support new ERNIE reward model and add return_token_ids to reward API (#6638 ) * reward model * Add support for pooling-based inference in the reward model * bugfix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-03-06 18:51:00 +08:00
luukunn	caf73e8131	[Feature]add reasoning effort (#6656 ) * add reasoning_effort * fix log * fix reasoning_effort * add reasoning_effort level * fix valid_parameters * fix valid_parameters * fix * fix unit test * add unit test * add unit test	2026-03-06 14:16:02 +08:00
ddchenhao66	fa4815b93a	[BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server (#6598 ) * [BugFix] fix dp sheduler bug in ep4tp1 when start by using multi_api_server * [BugFix] modify request_queue and result_queue of dp scheduler	2026-03-05 10:04:12 +08:00
qwes5s5	375b5b7b21	[Feature]Log Format Normalization and Trace Log Optimization (#6370 ) * log refactor * log refactor 2 * log refactor 3	2026-03-03 11:31:45 +08:00
yzwu	6674131b0b	[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553 )	2026-03-02 14:07:17 +08:00
Jiang-Jia-Jun	39a5ea66c8	[BugFix] Enable control socket disable option in API server (#6545 ) * [BugFix] Enable control socket disable option in API server * Update requirements.txt * Update requirements.txt	2026-02-28 10:35:35 +08:00
Yuanle Liu	6d3fede240	[OP][Feature] Unify the limit_thinking_content_length CUDA operator; support response length limits and injected sequences (#6493 ) * Initial plan * Migrate PRs #6311, #6129, #6305 to develop and merge unit tests Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix * update * fix * fix ci * fix ci * Initial plan * test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add disable-thinking case to test_chat_with_response_max_tokens Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * test: add both reasoning_max_tokens and response_max_tokens case Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com> * fix ci * fix ci * fix ci --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>	2026-02-25 21:36:50 +08:00
Divano	ba3b142ff7	[Others] add objgraph to test out of memory (#6456 )	2026-02-11 20:17:20 +08:00
Yonghua Li	5ac5ecd0b0	[BugFix] fix cache transfer tasks failure after cache cleared (#6202 ) * [fix] fix cache transfer tasks failure after cache cleared * [fix] fix submit_task * [fix] fix cache manager hang when clearing prefix cache * [fix] fix list_proxy has no clear method * [fix] fix barrier * [fix] add barrier0 * [fix] add cache_task_is_paused_signal * [fix] fix condition * [fix] fix cache transfer sync and delay prefix cache tree clearing * [fix] fix typo * [chore] polish code * [fix] revert only rank0 write kv_cache_status_signal * [fix] fix thread pool and prefix cache manager hang * [fix] add timeout for task_swapping_event * [fix] tolerate prefix cache manager error while prefix tree is cleared * [chore] add more log * [fix] fix test_prefix_cache_manager * [fix] fix prefix_cache_status_signal usage	2026-02-08 15:33:56 +08:00
luukunn	765df94e6c	[Optimization]update prompt & prompt_token_ids (#6334 ) * fix prompt * add unit test * add unit test * fix	2026-02-04 20:08:01 +08:00
luukunn	0a19e1b6df	fix image gen (#6175 )	2026-01-23 11:24:12 +08:00
wangyifei	b7c5daa316	[RL] add pause, update_weights, resume interface for async RL (#6052 ) * support dynamic run_control_request through zmq from apiserver to common_engine * support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method * change /is_puased from HTTP POST method to GET method * add pause, resume, is_paused implementation * support engine <==> worker communication(request&response) * support sync weights through RDMA from checkpoint_transfer * support specified version, rsync_config in update_weights rpc call * add pause, update_weights, resume interface for async RL * bug fix: update_weights support using default arguments * fix typo * typo fix * typo fix * typo fix * add unitest for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all * add "rsync" to LoadConfig.load_strategy Literal type hints Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * typo fix * typo fix * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * check version/rsync params * add error log when version.txt not exists Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * raise specified ValueError when paramters check failed Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * tp barrier after run_control_method * encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue * typo fix * typo fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-23 10:18:07 +08:00
Yonghua Li	bb76d3b6f0	[RL] [APIServer] add more status codes for update/clear api (#6141 ) * [RL] add more status codes for update/clear api * [feat] return json response * [fix] fix ci	2026-01-22 17:26:18 +08:00
luukunn	6b968a76f1	[Optimization] update data_processor & add tool parser plugins (#6096 ) * update data_processor * fix unit test * fix unit test * add unit test * add tool parser plugins * fix tool call * fix tool call * fix tool call * fix unit test * fix unit test * add unit test * fix unit test * fix unit test * fix unit test	2026-01-22 17:17:32 +08:00
kxz2002	6e416c62dd	[Optimization] The pre- and post-processing pipeline do not perform dict conversion (#5494 ) * to_request_for_infer initial commit * refact to from_chat_completion_request * preprocess use request initial commit * bugfix * processors refact to using request * bug fix * refact Request from_generic_request * post process initial commit * bugfix * postprocess second commit * bugfix * serving_embedding initial commit * serving_reward initial commit * bugfix * replace function name * async_llm initial commit * offline initial commit and fix bug * bugfix * fix async_llm * remove add speculate_metrics into data * fix logprobs bug * fix echo bug * fix bug * fix reasoning_max_tokens * bugfix * bugfix and modify unittest * bugfix and modify unit test * bugfix * bugfix * bugfix * modify unittest * fix error when reasong_content is none for text_processor * remove some unnessary logic * revert removed logic * implement add and set method for RequestOutput and refact code * modify unit test * modify unit test * union process_request and process_request_obj * remove a unit test * union process_response and process_response_obj * support qwen3_vl_processor * modify unittest and remove comments * fix prompt_logprobs * fix codestyle * add v1 * v1 * fix unit test * fix unit test * fix pre-commit * fix * add process request * add process request * fix * fix * fix unit test * fix unit test * fix unit test * fix unit test * fix unit test * remove file * add unit test * add unit test * add unit test * fix unit test * fix unit test * fix * fix --------- Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com> Co-authored-by: luukunn <981429396@qq.com> Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com> Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>	2026-01-22 00:50:52 +08:00
qwes5s5	b2a2e11551	[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320 ) * request disconnect * request disconnect * fix bug * fix bug--amend --------- Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>	2026-01-16 11:46:13 +08:00
Yonghua Li	456637002d	[BugFix] fix cache transfer manager updating/clearing (#5930 ) * [fix] fix cache transfer manager updating/clearing * [fix] fix code style * [fix] fix config * [fix] fix engine client * [fix] let worker update kv cache status signal * [fix] update worker process * [fix] fix clear/update for case if comm group is shutdown * [fix] update dynamic weight manager * [fix] fix port * [fix] add num_cpu_blocks arg for async_llm, and remove unnecessary waiting	2026-01-13 05:09:29 -08:00
Yonghua Li	60ee72f682	[BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935 ) * [fix] fix rdma script and add more error log for multi api server * [fix] log * [fix] fix test_multi_api_server * [fix] fix multi api server port check --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>	2026-01-12 10:38:52 +08:00
xiaoxiaohehe001	00a01ae024	[Feature] Support redundant expert for eplb (#5918 ) * [BugFix] support redundant expert for eplb * support redundant expert for eplb * support redundant expert for eplb * update * fix ci eplb	2026-01-09 17:13:24 +08:00
Yonghua Li	9445fbe054	[KVCache] launch cache transfer processes only if hierarchical cache or kv cache storage is enabled (#5871 ) * [fix] temporarily forbid cpu cache in update/clear api * [fix] stop launching cache transfer manager unless hierarchical cache is enabled * [fix] fix no attr hierarchical cache * [fix] fix ci * [fix] fix test_prefix_cache_manager.py	2026-01-06 14:27:47 +08:00


231 Commits