Commit Graph

184 Commits

Author SHA1 Message Date
qwes5s5 8883757bad [BugFix] Fix bugs in /v1/abort_requests interface from PR(#6992) (#7176)
* abort api bug fix

* bug fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-04-21 19:27:25 +08:00
K11OntheBoat b79b094dcc Change default workers and max-concurrency when launch api-server (#7457)
Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
2026-04-20 15:55:06 +08:00
zhouchong 6e16438a57 [Feature] implement log channel separation and request log level system (#7190)
* feat: implement log channel separation and request log level system

* fix: log system improvements based on review

* add request_id to error logs, use RequestLogLevel enum, and unify logger implementation from utils to logger module
2026-04-16 15:13:05 +08:00
luukunn 14d556692b [BugFix] fix tool call parser (#7369)
* fix tool call parser

* add unit test

* fix unit test

* add unit test
2026-04-15 16:21:46 +08:00
Echo-Nie 8819a039c9 [Others] Fix typo (#7280)
* typo

* typo

* typo

* typo
2026-04-14 17:28:22 +08:00
luukunn 9d9d79c457 [DataProcessor] add strict (#7307)
* add strict

* fix
2026-04-14 17:25:38 +08:00
周周周 a6f0055d51 add ips check (#7352)
* commit

* commit

---------

Co-authored-by: “liuruian” <liuruian@baidu.com>
2026-04-13 15:24:22 +08:00
Longzhi Wang b262419db1 Revert "[Other] support video_fps args for video bench (#7077)" (#7254)
This reverts commit 938e7dd881.

Co-authored-by: TBD1 <798934910@qq.com>
2026-04-08 20:13:57 +08:00
Nana 367d37b523 fix typo (#7147) 2026-04-07 16:30:32 +08:00
luukunn 562fa31791 [BugFix]fix extract_tool_calls (#7154)
* fix extract_tool_calls
2026-04-02 21:18:37 +08:00
Longzhi Wang 938e7dd881 [Other] support video_fps args for video bench (#7077) 2026-04-02 10:40:15 +08:00
luukunn fa7a84926d [Optimization]Fix tool parser (#7079)
* fix tool parser
2026-04-01 21:20:34 +08:00
luukunn 3651113ee5 [DataProcessor]Remove ENABLE_V1_DATA_PROCESSOR (#7052)
* remove ENABLE_V1_DATA_PROCESSOR

* fix unit test

* fix unit test
2026-04-01 09:53:41 +08:00
qwes5s5 ee2b965f5f adjust config info (#7054) 2026-03-31 21:26:05 +08:00
qwes5s5 daa95244f7 abort requests (#6992) 2026-03-31 11:02:26 +08:00
Yonghua Li 6d9739f360 [BugFix] fix speculative gauge metrics in multi api server (#7082) 2026-03-31 10:52:50 +08:00
jackyYang6 05f2d95729 [RL] Adapt async rollout checkpoint update flow (#7042)
* update checkpoint-transfer flow and control update_weights params

* test: add update_weights route validation
2026-03-30 19:19:34 +08:00
luukunn 14b17c06af add completion_tokens default (#7032) 2026-03-26 21:06:23 +08:00
luukunn e6804ba97d [Optimization]Streaming requests return complete special tokens. (#6998)
* return special token

* add completions

* update

* fix

* add prompt_token_ids & completion_token_ids=None,

* fix unit test
2026-03-26 09:49:43 +08:00
Yonghua Li a7f52c300d [Feature] support v1 update/clear api for RL (#6761)
* [Feature] support v1 update/clear api for RL

* [fix] fix execute_model and add sleep/wakeup api

* [fix] fix mtp and key_prefix

* [chore] move _update_key_prefix to resume method

* [fix] make the interface safe to call multiple times

* [fix] fix some tiny bugs

* [chore] make small changes against pr review

* [docs] add docs for weight update

* [test] add some tests and update docs

* [style] fix code style check

* [test] fix ci

* [fix] fix stale control responses when control method timed out

* [chore] remove unused code

* [chore] fix code style

* [chore] optimize tags and key_prefix

* [test] fix ci

* [chore] fix code style

* [test] fix ci

* [fix] fix ep control

* [fix] fix ep control for engine cache queue
2026-03-25 19:18:46 +08:00
luukunn 33e79f922a [Optimization]Optimize CPU utilization (#6950)
* Optimize CPU utilization
2026-03-22 23:02:39 +08:00
SunLei 32b6900d01 fix code type (#6951) 2026-03-20 16:14:12 +08:00
luukunn c3d8db85c4 [Optimization] Update ZMQ server (#6735)
* add batch zmq send response

* update

* Revert "update"

This reverts commit 0234a25b47.

* update

* remove lock

* fix unit test

* add unit test

* add unit test

* pre commit

* add unit test

* fix unit test

* add unit test

* fix worker>1

* update zmq_worker_pid

* fix unit test

* fix unit test

* fix unit test

* add unit test

* fix unit test

* fix first token time

* fix logprobs

* add unit test

* op

* remove debug log

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-03-19 21:53:16 +08:00
luukunn fe8d58a094 [Optimization]update request in tool parser&reasoning parser (#6858)
* update request in tool parser&reasoning parser
2026-03-17 11:51:12 +08:00
gongweibao a6351dea0b [BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)
* init

* init

* fix format

* add

* add files

* add ut

* fix some

* add ut

* add more

* add

* fix pre-commit

* fix pre-commit

* fix cover

* skip long seq

* add

* add

* fix

* remove not need

* fix set attr

* fix comments

* fix comments

* fix failed tests

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-16 21:32:43 +08:00
Yonghua Li 6520ae807c [BugFix] fix grpc failure when tracing init before workers forked (#6732)
* [fix] fix grpc failure when tracing init before workers forked

* [fix] change default exporter to http

* [fix] fix test_trace
2026-03-10 21:24:10 +08:00
SunLei 5d9524fc3c [Models][Feature] Support new ERNIE reward model and add return_token_ids to reward API (#6638)
* reward model

* Add support for pooling-based inference in the reward model

* bugfix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-03-06 18:51:00 +08:00
luukunn caf73e8131 [Feature]add reasoning effort (#6656)
* add reasoning_effort

* fix log

* fix reasoning_effort

* add reasoning_effort level

* fix valid_parameters

* fix valid_parameters

* fix

* fix unit test

* add unit test

* add unit test
2026-03-06 14:16:02 +08:00
ddchenhao66 fa4815b93a [BugFix] fix dp scheduler bug in ep4tp1 when started via multi_api_server (#6598)
* [BugFix] fix dp scheduler bug in ep4tp1 when started via multi_api_server

* [BugFix] modify request_queue and result_queue of dp scheduler
2026-03-05 10:04:12 +08:00
qwes5s5 375b5b7b21 [Feature]Log Format Normalization and Trace Log Optimization (#6370)
* log refactor

* log refactor 2

* log refactor 3
2026-03-03 11:31:45 +08:00
yzwu 6674131b0b [Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fused_neox_rope_embedding (#6553) 2026-03-02 14:07:17 +08:00
Jiang-Jia-Jun 39a5ea66c8 [BugFix] Enable control socket disable option in API server (#6545)
* [BugFix] Enable control socket disable option in API server

* Update requirements.txt

* Update requirements.txt
2026-02-28 10:35:35 +08:00
Yuanle Liu 6d3fede240 [OP][Feature] Unify the limit_thinking_content_length CUDA operator, supporting response length limits and injected sequences (#6493)
* Initial plan

* Migrate PRs #6311, #6129, #6305 to develop and merge unit tests

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix

* update

* fix

* fix ci

* fix ci

* Initial plan

* test: add test_chat_with_response_max_tokens to test_EB_VL_Lite_serving.py

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add disable-thinking case to test_chat_with_response_max_tokens

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* test: add both reasoning_max_tokens and response_max_tokens case

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* fix ci

* fix ci

* fix ci

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2026-02-25 21:36:50 +08:00
luukunn 765df94e6c [Optimization]update prompt & prompt_token_ids (#6334)
* fix prompt

* add unit test

* add unit test

* fix
2026-02-04 20:08:01 +08:00
luukunn 0a19e1b6df fix image gen (#6175) 2026-01-23 11:24:12 +08:00
wangyifei b7c5daa316 [RL] add pause, update_weights, resume interface for async RL (#6052)
* support dynamic run_control_request through zmq from apiserver to common_engine

* support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method

* change /is_paused from HTTP POST method to GET method

* add pause, resume, is_paused implementation

* support engine <==> worker communication(request&response)

* support sync weights through RDMA from checkpoint_transfer

* support specified version, rsync_config in update_weights rpc call

* add pause, update_weights, resume interface for async RL

* bug fix: update_weights support using default arguments

* fix typo

* typo fix

* typo fix

* typo fix

* add unit tests for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all

* add "rsync" to LoadConfig.load_strategy Literal type hints

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* typo fix

* typo fix

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* check version/rsync params

* add error log when version.txt not exists

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* raise specified ValueError when parameters check failed

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* tp barrier after run_control_method

* encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue

* typo fix

* typo fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-23 10:18:07 +08:00
Yonghua Li bb76d3b6f0 [RL] [APIServer] add more status codes for update/clear api (#6141)
* [RL] add more status codes for update/clear api

* [feat] return json response

* [fix] fix ci
2026-01-22 17:26:18 +08:00
luukunn 6b968a76f1 [Optimization] update data_processor & add tool parser plugins (#6096)
* update data_processor

* fix unit test

* fix unit test

* add unit test

* add tool parser plugins

* fix tool call

* fix tool call

* fix tool call

* fix unit test

* fix unit test

* add unit test

* fix unit test

* fix unit test

* fix unit test
2026-01-22 17:17:32 +08:00
kxz2002 6e416c62dd [Optimization] The pre- and post-processing pipelines do not perform dict conversion (#5494)
* to_request_for_infer initial commit

* refact to from_chat_completion_request

* preprocess use request initial commit

* bugfix

* processors refact to using request

* bug fix

* refact Request from_generic_request

* post process initial commit

* bugfix

* postprocess second commit

* bugfix

* serving_embedding initial commit

* serving_reward initial commit

* bugfix

* replace function name

* async_llm initial commit

* offline initial commit and fix bug

* bugfix

* fix async_llm

* remove add speculate_metrics into data

* fix logprobs bug

* fix echo bug

* fix bug

* fix reasoning_max_tokens

* bugfix

* bugfix and modify unittest

* bugfix and modify unit test

* bugfix

* bugfix

* bugfix

* modify unittest

* fix error when reasoning_content is none for text_processor

* remove some unnecessary logic

* revert removed logic

* implement add and set method for RequestOutput and refact code

* modify unit test

* modify unit test

* union process_request and process_request_obj

* remove a unit test

* union process_response and process_response_obj

* support qwen3_vl_processor

* modify unittest and remove comments

* fix prompt_logprobs

* fix codestyle

* add v1

* v1

* fix unit test

* fix unit test

* fix pre-commit

* fix

* add process request

* add process request

* fix

* fix

* fix unit test

* fix unit test

* fix unit test

* fix unit test

* fix unit test

* remove file

* add unit test

* add unit test

* add unit test

* fix unit test

* fix unit test

* fix

* fix

---------

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
Co-authored-by: luukunn <981429396@qq.com>
Co-authored-by: luukunn <83932082+luukunn@users.noreply.github.com>
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>
2026-01-22 00:50:52 +08:00
qwes5s5 b2a2e11551 [Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)
* request disconnect

* request disconnect

* fix bug

* fix bug--amend

---------

Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-16 11:46:13 +08:00
Yonghua Li 60ee72f682 [BugFix] [MultiAPIServer] fix rdma script and port check for multi api server (#5935)
* [fix] fix rdma script and add more error log for multi api server

* [fix] log

* [fix] fix test_multi_api_server

* [fix] fix multi api server port check

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2026-01-12 10:38:52 +08:00
qwes5s5 b3ca7f041a [BugFix] Fix redundant prompt_logprobs in the second chunk of streaming response when return_token_ids is enabled for v1/completions and fix trace file name (#5829)
* fix prompt logprobs bug

* fix trace file name

---------

Co-authored-by: qwes5s5 <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
2026-01-06 14:11:43 +08:00
Copilot 7d5282e158 [APIServer][Feature] Add configurable worker health check timeout via FD_WORKER_ALIVE_TIMEOUT (#5865)
* Initial plan

* Add configurable FD_WORKER_ALIVE_TIMEOUT environment variable

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Add test for FD_WORKER_ALIVE_TIMEOUT environment variable

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Update docs/zh/usage/environment_variables.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update docs/usage/environment_variables.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Improve test coverage to validate integration with check_health calls

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Remove test_worker_alive_timeout.py per reviewer feedback

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-05 09:47:12 +08:00
kxz2002 cad2932990 [BugFix] Fix process_response_dict to support async in serving_completion (#5758)
* support process_response_dict async initial commit

* fixbug

* add unit test

* optimize
2025-12-26 17:40:58 +08:00
memoryCoderC be3be4913a [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM (#5195)
* [Optimization] refactor(chat_handler,completion_handler): extract base classes and use AsyncLLM

* [Optimization] refactor(chat_handler,completion_handler): rename class
2025-12-25 16:28:15 +08:00
Yonghua Li 0c8c6369ed [Feature] [PD Disaggregation] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports (#5415)
* [feat] simplify configuration for pd-disaggregated deployment, and refactor post-init and usage for all ports

* [fix] fix some bugs

* [fix] fix rdma port for cache manager/messager

* [fix] temporarily cancel port availability check to see if it can pass ci test

* [feat] simplify args for multi api server

* [fix] fix dp

* [fix] fix port for xpu

* [fix] add tests for ports post processing & fix ci

* [test] fix test_multi_api_server

* [fix] fix rdma_comm_ports args for multi_api_server

* [fix] fix test_common_engine

* [fix] fix test_cache_transfer_manager

* [chore] automatically setting FD_ENABLE_MULTI_API_SERVER

* [fix] avoid api server from creating engine_args twice

* [fix] fix test_run_batch

* [fix] fix test_metrics

* [fix] fix splitwise connector init

* [test] add test_rdma_transfer and test_expert_service

* [fix] fix code syntax

* [fix] fix test_rdma_transfer and build wheel with rdma script
2025-12-17 15:50:42 +08:00
xiaolei373 a30b4da260 [Feature] Tracing: Fine-Grained Tracing for Request Latency Part1 (#5458) 2025-12-16 16:36:09 +08:00
GoldPancake 909059c60a [Feature] Support for request-level speculative decoding metrics monitoring. (#5518)
* support spec metrics monitor per request

* fix bug

* remove debug log

* fix ut bugs
2025-12-12 12:22:18 +08:00
qwes5s5 d79438bb86 add detoken switch (#5463)
2025-12-10 21:44:02 +08:00
luukunn fbc9bce1e9 [Feature]Optimization of Thinking Pattern Framework (#4302)
* add model status in vl

* add x1 parser

* add model_status

* fix parser

* fix parser

* fix parser

* fix parser

* Revert "fix parser"

This reverts commit 300f446d8a.

* fix parser

* fix

* fix

* fix

* fix

* fix parser

* fix unit test

* fix unit test

* add unit test

* fix

* fix

* add unit test

* fix unit test

* add unit test

* add unit test

* fix unit test

* fix unit test

* fix bug

* fix unit test

* x1 tool parser

* fix unit test

* fix unit test

* fix unit test

* fix n

* fix unit test

* add unit test

* add unit test

* remove print
2025-12-10 16:17:06 +08:00