polish code with new pre-commit rule (#2923)

Zero Rains
2025-07-19 23:19:27 +08:00
committed by GitHub
parent b8676d71a8
commit 25698d56d1
424 changed files with 14307 additions and 13518 deletions
@@ -18,8 +18,9 @@ Interfaces that support toggling the reasoning mode:
For reasoning models, the length of the reasoning content can be controlled via `reasoning_max_tokens`. Add `metadata={"reasoning_max_tokens": 1024}` to the request.
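For example, with the OpenAI Python client the cap can be passed through the request body's `metadata` field. The sketch below is illustrative only: the endpoint, API key, model name, and user message are placeholders for a service launched as in the Quick Start below, and `extra_body` is used to carry the server-specific `metadata` field.

```python
from openai import OpenAI

# Placeholder endpoint/key/model; adjust to your own deployment.
client = OpenAI(base_url="http://0.0.0.0:8192/v1", api_key="null")

# `reasoning_max_tokens` caps the length of the reasoning content;
# `enable_thinking` toggles reasoning mode, as in the curl example below.
response = client.chat.completions.create(
    model="null",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    extra_body={"metadata": {"enable_thinking": True, "reasoning_max_tokens": 1024}},
)
```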
### Quick Start
When launching the model service, specify the parser name using the `--reasoning-parser` argument.
This parser will process the model's output and extract the `reasoning_content` field.
```bash
python -m fastdeploy.entrypoints.openai.api_server \
--model /path/to/your/model \
@@ -29,7 +30,9 @@ python -m fastdeploy.entrypoints.openai.api_server \
--quantization wint4 \
--reasoning-parser ernie-45-vl
```
Next, send a request to the model; the response should include the reasoning content.
```bash
curl -X POST "http://0.0.0.0:8192/v1/chat/completions" \
-H "Content-Type: application/json" \
@@ -43,10 +46,12 @@ curl -X POST "http://0.0.0.0:8192/v1/chat/completions" \
"metadata": {"enable_thinking": true}
}'
```
The `reasoning_content` field contains the reasoning steps to reach the final conclusion, while the `content` field holds the conclusion itself.
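With the OpenAI Python client, both fields can be read from a non-streaming response roughly as follows. This is a sketch: the endpoint, API key, model name, and message are placeholders, and since `reasoning_content` is not part of the standard OpenAI schema it is read defensively.

```python
from openai import OpenAI

# Placeholder endpoint/key/model; adjust to your own deployment.
client = OpenAI(base_url="http://0.0.0.0:8192/v1", api_key="null")

response = client.chat.completions.create(
    model="null",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    extra_body={"metadata": {"enable_thinking": True}},
)

message = response.choices[0].message
# `reasoning_content` is a non-standard extra field, so read it defensively.
print("reasoning:", getattr(message, "reasoning_content", None))
print("answer:", message.content)
```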
### Streaming chat completions
Streaming chat completions are also supported for reasoning models. The `reasoning_content` field is available in the `delta` field of the chat completion response chunks.
```python
from openai import OpenAI
# Set OpenAI's API key and API base to use FastDeploy's API server.
@@ -69,4 +74,4 @@ for chunk in chat_response:
if chunk.choices[0].delta is not None:
print(chunk.choices[0].delta, end='')
print("\n")
```
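A self-contained variant of the loop above that separates the reasoning from the final answer might look like the sketch below. It assumes each streamed `delta` may carry the extra `reasoning_content` field; the endpoint, API key, model name, and message are placeholders.

```python
from openai import OpenAI

# Placeholder endpoint/key/model; adjust to your own deployment.
client = OpenAI(base_url="http://0.0.0.0:8192/v1", api_key="null")

chat_response = client.chat.completions.create(
    model="null",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    stream=True,
    extra_body={"metadata": {"enable_thinking": True}},
)

reasoning_parts, answer_parts = [], []
for chunk in chat_response:
    delta = chunk.choices[0].delta
    if delta is None:
        continue
    # `reasoning_content` is a non-standard extra field, so read it defensively.
    reasoning_piece = getattr(delta, "reasoning_content", None)
    if reasoning_piece:
        reasoning_parts.append(reasoning_piece)
    if delta.content:
        answer_parts.append(delta.content)

print("reasoning:", "".join(reasoning_parts))
print("answer:", "".join(answer_parts))
```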