abort requests (#6992)

2026-04-23 00:17:25 +08:00 · 2026-03-31 11:02:26 +08:00
parent 6d9739f360
commit daa95244f7
13 changed files with 670 additions and 3 deletions
@@ -151,6 +151,7 @@ The Router exposes a set of HTTP services to provide unified request scheduling,
 |----------|------|------|
 | POST | `/v1/chat/completions` | Provide scheduling services for inference requests based on the Chat Completions API |
 | POST | `/v1/completions` | Provide scheduling services for general text completion inference requests |
+| POST | `/v1/abort_requests` | Abort inference requests to release GPU memory and compute resources. Accepts `req_ids` or `abort_all=true`. Returns aborted requests with their generated token counts |
 | POST | `/register` | Allow inference instances to register their metadata with the Router for scheduling |
 | GET | `/registered` | Query the list of currently registered inference instances |
 | GET | `/registered_number` | Query the number of currently registered inference instances |
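
As a minimal sketch of how a client might call the new `/v1/abort_requests` endpoint, the snippet below builds the POST request without sending it. Only the path, the method, and the `req_ids` / `abort_all` fields come from the table above. The Router address, the JSON body encoding, and the helper name `build_abort_request` are assumptions made for illustration, and the response schema is not shown because the table does not specify it.

```python
import json
import urllib.request

# Hypothetical Router address (assumption); replace with the real host and port.
ROUTER_URL = "http://localhost:8080"


def build_abort_request(req_ids=None, abort_all=False, base_url=ROUTER_URL):
    """Build (but do not send) a POST request for /v1/abort_requests.

    Assumes the endpoint takes a JSON body carrying either `req_ids`
    or `abort_all`; the exact wire format is not confirmed by the source.
    """
    if abort_all:
        payload = {"abort_all": True}
    elif req_ids:
        payload = {"req_ids": list(req_ids)}
    else:
        raise ValueError("pass req_ids or abort_all=True")
    return urllib.request.Request(
        url=f"{base_url}/v1/abort_requests",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately keeps the payload easy to inspect before anything reaches a live Router.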