[Feature] Support stopping the inference for the corresponding request in the online service after a disconnection request. (#5320)

* request disconnect

* request disconnect

* fix bug

* fix bug--amend

---------

Co-authored-by: root <root@yq01-sys-rpm26xc1knu.yq01.baidu.com>
qwes5s5
2026-01-16 11:46:13 +08:00
committed by GitHub
parent 8f035101ad
commit b2a2e11551
25 changed files with 1339 additions and 63 deletions
@@ -58,6 +58,7 @@ class ResourceManager:
         self.req_dict = dict()
         # current batch status of the engine
         self.real_bsz = 0
+        self.abort_req_ids_set = set()
         llm_logger.info(f"{self.info()}")
         main_process_metrics.max_batch_size.set(max_num_seqs)
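
The hunk above only shows the new `abort_req_ids_set` being initialized; how it is filled and drained is not visible here. The following is a minimal sketch of one way such a set could be used, assuming a disconnect handler records request ids and the scheduling loop releases them on its next pass. The class `AbortTrackingManager` and its methods `add_request`, `mark_aborted`, and `step` are hypothetical names for illustration, not the code from this commit.

```python
class AbortTrackingManager:
    """Hypothetical stand-in for illustration; not the ResourceManager in this commit."""

    def __init__(self):
        self.req_dict = dict()          # request id -> request payload
        self.abort_req_ids_set = set()  # ids whose clients have disconnected

    def add_request(self, req_id, payload):
        # Start tracking a request.
        self.req_dict[req_id] = payload

    def mark_aborted(self, req_id):
        # Assumed to be called when a client disconnects.
        # Only tracked ids are recorded, so unknown ids cannot accumulate in the set.
        if req_id in self.req_dict:
            self.abort_req_ids_set.add(req_id)

    def step(self):
        # Release every aborted request, then return the ids that keep running.
        for req_id in list(self.abort_req_ids_set):
            self.req_dict.pop(req_id, None)
        self.abort_req_ids_set.clear()
        return list(self.req_dict)


if __name__ == "__main__":
    manager = AbortTrackingManager()
    manager.add_request("req-a", {"prompt": "hello"})
    manager.add_request("req-b", {"prompt": "world"})
    manager.mark_aborted("req-a")
    print(manager.step())
```

Deferring the release to `step` keeps the disconnect handler cheap and leaves all changes to `req_dict` in one place, which tends to be easier to reason about than removing requests from several call sites.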