[Engine] [Feature] Refactor async_llm:cross-process with EngineService, based on zmq communication (#4868)

* Refactor async_llm:cross-process with EngineService

* fix: async_llm output process

* fix: return prompt_token_ids and prompt_tokens in first res

* optimize common_engine start func
zhouchong
2025-12-09 10:53:40 +08:00
committed by GitHub
parent 2f208db4e9
commit 5d9b5e4a5b
8 changed files with 2217 additions and 1790 deletions
+1 -1
@@ -191,7 +191,7 @@ class ZmqServerBase(ABC):
             return str(e), None

     def recv_result_handle(self):
-        while True:
+        while self.running:
             try:
                 with self.response_token_lock:
                     client, _, request_id = self.socket.recv_multipart(flags=zmq.NOBLOCK)
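
Note on the hunk above: replacing `while True` with `while self.running` lets the receive loop exit once the flag is cleared, so the thread running it can be joined at shutdown. The following is a minimal sketch of that pattern, not the project's code. `ResultReceiver`, `inbox`, `close` and `demo` are hypothetical names, and a standard-library `queue.Queue` stands in for the ZMQ socket so that the example runs without pyzmq.

```python
import queue
import threading
import time


class ResultReceiver:
    """Hypothetical stand-in for a server polling loop (not the project's class)."""

    def __init__(self):
        self.running = True
        self.inbox = queue.Queue()  # assumption: plays the role of the ZMQ socket
        self.received = []

    def recv_result_handle(self):
        # The loop ends once close() clears the flag, unlike `while True`.
        while self.running:
            try:
                # Non-blocking read, analogous to recv_multipart(flags=zmq.NOBLOCK).
                item = self.inbox.get_nowait()
            except queue.Empty:
                time.sleep(0.001)  # back off briefly instead of busy-spinning
                continue
            self.received.append(item)

    def close(self):
        self.running = False


def demo():
    receiver = ResultReceiver()
    worker = threading.Thread(target=receiver.recv_result_handle, daemon=True)
    worker.start()
    receiver.inbox.put("req-1")

    # Wait (bounded) until the worker has consumed the message.
    deadline = time.time() + 2.0
    while not receiver.received and time.time() < deadline:
        time.sleep(0.001)

    receiver.close()          # clear the flag so the loop can exit
    worker.join(timeout=2.0)  # the thread terminates instead of looping forever
    return receiver.received, worker.is_alive()
```

With `while True`, the `join` call in `demo` would only return after its timeout, leaving the thread alive. With the flag, the loop observes `running == False` on its next iteration and returns.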