mouxin
6cae9b1f50
[Feature] Config eviction_duration (#7125)
...
* [Feature] Config eviction_duration
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-04-01 16:46:21 +08:00
qwes5s5
daa95244f7
abort requests (#6992)
2026-03-31 11:02:26 +08:00
mouxin
96b0ecea6b
[Feature] Update Counter Release (#6943)
2026-03-20 10:51:37 +08:00
mouxin
b61731bb96
[Feature][Docs] Adjust prefill release & expose load metrics (#6884)
2026-03-17 15:23:13 +08:00
mouxin
49fe68a518
[Docs] Update Golang Router FAQ (#6829)
2026-03-13 15:48:36 +08:00
Jiang-Jia-Jun
18e79dd660
[Metrics] Support cpu-cache-block-num (#6390)
...
Co-authored-by: root <root@szzj-bcc-offline-1487319.szzj.baidu.com>
2026-02-09 10:27:56 +08:00
mouxin
6e96bd0bd2
[Feature] Fix counter release logic & update go-router download URL (#6280)
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
* [Feature] Fix counter release logic
* [Feature] Update go-router download URL
* [Feature] Update token counter logic and docs
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-02-04 15:02:38 +08:00
mouxin
506f1545cd
[Feature] Enhance Router with /v1/completions, docs, scripts, and version info (#5966)
...
* [Doc] Update prerequisites in the documentation
* [Feature] Enhance Router with /v1/completions, docs, scripts, and version info
---------
Co-authored-by: mouxin <mouxin@baidu.com>
2026-01-30 10:28:48 +08:00
qwes5s5
38378415c7
add token ratio metrics (#6236)
2026-01-27 17:00:49 +08:00
wangyifei
53dc56f11b
[Docs] add docs of /v1/pause, /v1/resume, /v1/is_paused (#6192)
...
* support dynamic run_control_request through zmq from apiserver to common_engine
* support pause/resume/is_paused/update_weights in apiserver->common_engine by common run_control_method
* change /is_paused from HTTP POST method to GET method
* add pause, resume, is_paused implementation
* support engine <==> worker communication(request&response)
* support sync weights through RDMA from checkpoint_transfer
* support specified version, rsync_config in update_weights rpc call
* add pause, update_weights, resume interface for async RL
* bug fix: update_weights support using default arguments
* fix typo
* typo fix
* add unit tests for control request/response, localscheduler.get_inflight_requests, resource_manager_v1.preempted_all
* add "rsync" to LoadConfig.load_strategy Literal type hints
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* typo fix
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* check version/rsync params
* add error log when version.txt not exists
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* raise specified ValueError when parameter check fails
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* tp barrier after run_control_method
* encode 'engine_worker_queue_port' to unique name of worker2engine fmq queue
* typo fix
* update docs of /v1/pause, /v1/resume, /v1/is_paused
* add zh docs of pause, resume, is_paused
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-23 17:57:51 +08:00
qwes5s5
d79438bb86
add detoken switch (#5463)
2025-12-10 21:44:02 +08:00
Juncai
80efe98f8d
[PD Disaggregation] Add timestamp for analyzing splitwise deployment (#5317)
...
* Add timestamp for analyzing splitwise deployment
* up
* fix format
* fix
2025-12-08 10:08:44 +08:00
LiqinruiG
df427ba06d
[Docs] add request params (#5207)
...
* [BugFix] rollback max_tokens and min_tokens when continue to infer
* [fix] add more logger info: max_tokens
* [Docs] add request params
---------
Co-authored-by: liqinrui <liqinrui@baidu.com>
2025-11-26 15:04:22 +08:00
Yonghua Li
cead6b26fa
[Metrics] Update time_to_first_token to include tokenization & queue time, and remove redundant metrics (#4993)
...
* [update] update time_to_first_token to include queue time, and remove first_token_latency and infer_latency
* [doc] update docs
* [ci] fix test
* [chore] delete redundant code
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-11-26 14:42:17 +08:00
ApplEOFDiscord
cfdd1600a5
update doc (#4675)
2025-10-30 11:19:04 +08:00
yangjianfengo1
ba5c2b7e37
[Docs] add language (en/cn) switch links (#4470)
...
* add install docs
* Revise documentation
2025-10-17 15:47:41 +08:00
LiqinruiG
4251ac5e95
[Fix] remove text_after_process & raw_prediction (#4421)
...
* remove text_after_process & raw_prediction
2025-10-16 19:00:18 +08:00
qwes5s5
553adb299e
[FastDeploy CLI] collect-env subcommand (#4044)
...
* collect-env subcommand
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com>
2025-09-15 10:31:23 +08:00
qwes5s5
58e0785bab
[metrics] update metrics markdown file (#4061)
...
* adjust md
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com>
2025-09-12 11:13:43 +08:00
xiaolei373
571ddc677b
Modify markdown (#3896)
...
* feat(log): add_request_and_response_log
* modify markdown graceful shutdown
2025-09-08 16:42:34 +08:00
qwes5s5
17169a14f2
[metrics] Add several observability metrics (#3868)
...
* Add several observability metrics
* [wenxin-tools-584] [Observability] Support viewing this node's concurrency, remaining block_size, number of queued requests, and other information
* adjust some metrics and md files
* trigger ci
* adjust ci file
* trigger ci
---------
Co-authored-by: K11OntheBoat <your_email@example.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-08 14:13:13 +08:00
Jiang-Jia-Jun
2bd7d90929
Remove useless parameters
2025-09-01 14:43:56 +08:00
Sunny-bot1
c68c3c4b8b
[Feature] bad words support v1 scheduler and specify token ids (#3608)
...
* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix
2025-08-25 20:14:51 -07:00
chen
9cab3f47ff
[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)
...
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing
* infer engine support temp_scaled_logprobs and top_p_normalized_logprobs
* delete some code
* code check
* code check and add doc
* fix tokenizer.decoder(-1), return 'Invalid Token'
* add ci for temp_scaled and top_p logprobs
* check test
* check seq len time shape
* logprob clip inf
---------
Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
2025-08-25 14:11:49 +08:00
luukunn
9c129813f9
[Feature] add custom chat template (#3251)
...
* add custom chat_template
* add unittest
* fix
* add docs
* fix comment
* add offline chat
* fix unit test
* fix
* fix pre commit
* fix unit test
* add unit test
* fix pre_commit
* fix enable_thinking
* fix pre commit
* fix unit test
* add requirements
2025-08-18 16:34:08 +08:00
Sunny-bot1
19fda4e912
fix docs (#3332)
2025-08-11 21:03:49 +08:00
Sunny-bot1
789dc67ff7
[Docs] fix sampling docs (#3113)
...
* fix sampling docs
* update
2025-08-11 20:42:27 +08:00
LiqinruiG
25005fee30
[Doc] add chat_template_kwargs and update params docs (#3103)
...
* add chat_template_kwargs and update params docs
* update enable_thinking
* pre-commit
* update test case
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-31 19:44:06 +08:00
Jiang-Jia-Jun
998968f1e8
[Doc] Update parameters of serving
2025-07-30 22:35:01 +08:00
李泳桦
b242150f94
[feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client (#3058)
...
* [feat] extra parameters are all passed directly via http payload now, or in extra_body if using openai client
* [fix] delete ci test case for enable_thinking
* [fix] add reasoning_parser when server starts
* [fix] fix ci consistency test error with reasoning parser
* [doc] update docs related to metadata
* [fix] cancel enable_thinking default value
2025-07-30 19:25:20 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule (#2923)
2025-07-19 23:19:27 +08:00
zhenwenDang
5fc659b900
[Docs] add enable_logprob parameter description (#2850)
...
* add enable_logprob parameter description
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-07-15 19:47:45 +08:00
Jiang-Jia-Jun
e5b94d4117
Update README.md
2025-07-03 15:28:05 +08:00
Jiang-Jia-Jun
9f4a65d817
Update README.md
2025-07-02 10:04:58 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00