abort requests (#6992)

qwes5s5
2026-03-31 11:02:26 +08:00
committed by GitHub
parent 6d9739f360
commit daa95244f7
13 changed files with 670 additions and 3 deletions
@@ -151,6 +151,7 @@ The Router exposes a set of HTTP services to provide unified request scheduling,
|----------|------|------|
| POST | `/v1/chat/completions` | Provide scheduling services for inference requests based on the Chat Completions API |
| POST | `/v1/completions` | Provide scheduling services for general text completion inference requests |
| POST | `/v1/abort_requests` | Abort inference requests to release the GPU memory and compute resources they hold. Accepts either `req_ids` (specific requests) or `abort_all=true` (all requests). Returns the aborted requests along with the number of tokens each had generated |
| POST | `/register` | Allow inference instances to register their metadata with the Router for scheduling |
| GET | `/registered` | Query the list of currently registered inference instances |
| GET | `/registered_number` | Query the number of currently registered inference instances |
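A minimal Python sketch of calling the new abort endpoint, assuming the Router accepts a JSON body containing either a `req_ids` list or `"abort_all": true`. The table names these parameters but does not specify the request encoding or the response schema. The Router address (`ROUTER_URL`) and the helper names `build_abort_payload` and `abort_requests` are hypothetical and exist only for illustration. Only the path `/v1/abort_requests` comes from the table.

```python
import json
import urllib.request
from typing import Optional, Sequence

# Hypothetical Router address; replace with your deployment's host and port.
ROUTER_URL = "http://127.0.0.1:8000"


def build_abort_payload(req_ids: Optional[Sequence[str]] = None,
                        abort_all: bool = False) -> dict:
    """Build the JSON body for POST /v1/abort_requests.

    Assumption: the body is a JSON object with either a `req_ids` list
    or `abort_all: true`. Rejecting both at once is a client-side choice.
    """
    if abort_all and req_ids:
        raise ValueError("pass either req_ids or abort_all=True, not both")
    if abort_all:
        return {"abort_all": True}
    if not req_ids:
        raise ValueError("req_ids must be non-empty when abort_all is False")
    return {"req_ids": list(req_ids)}


def abort_requests(req_ids: Optional[Sequence[str]] = None,
                   abort_all: bool = False,
                   base_url: str = ROUTER_URL,
                   timeout: float = 10.0) -> dict:
    """POST the abort request to the Router and return the decoded JSON reply."""
    payload = build_abort_payload(req_ids, abort_all)
    request = urllib.request.Request(
        f"{base_url}/v1/abort_requests",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # The reply is described only as "aborted requests with their
        # generated token counts"; inspect it before relying on field names.
        return json.loads(response.read().decode("utf-8"))
```

For example, `abort_requests(req_ids=["req-123"])` would abort one request (the ID is made up), and `abort_requests(abort_all=True)` would abort all of them. Separating payload construction from the HTTP call keeps the request shape easy to check without a running Router.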