[Feature] bad words: support the v1 scheduler and specifying token ids (#3608)

* support bad_words_token_ids

* docs

* fix test

* fix

* bad words support kvcache v1 and token ids

* fix
Sunny-bot1
2025-08-26 11:14:51 +08:00
committed by GitHub
parent c43a4bec00
commit c68c3c4b8b
16 changed files with 420 additions and 62 deletions
+43 -6
@@ -183,7 +183,7 @@ Used to prevent the model from generating certain specific words during the infe
## Usage Instructions
-Include the `bad_words` parameter in the request:
+Include the `bad_words` or `bad_words_token_ids` parameter in the request:
* Example request with curl:
@@ -192,9 +192,22 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
-{"role": "user", "content": "How old are you"}
+{"role": "user", "content": "How are you"}
],
-"bad_words": ["age", "I"]
+"bad_words": [" well", " Today"]
}'
```
Equivalent to:
```bash
curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "How are you"}
],
"bad_words_token_ids": [1622, 25062]
}'
```
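The two curl requests above are interchangeable only because, for the served model's tokenizer, `" well"` and `" Today"` each encode to exactly one token (1622 and 25062 respectively). The sketch below illustrates that word-to-id conversion and the single-token rule using a made-up vocabulary and a greedy longest-match encoder standing in for the real tokenizer. `TOY_VOCAB`, `encode`, and `words_to_token_ids` are hypothetical names, and only the two ids are taken from the example. Note that the leading space belongs to the token, so `"well"` and `" well"` map to different ids.

```python
from typing import Dict, List

# Hypothetical toy vocabulary standing in for the model's real tokenizer.
# Only the ids 1622 and 25062 come from the curl example; the rest are made up.
TOY_VOCAB: Dict[str, int] = {
    " well": 1622,
    " Today": 25062,
    "well": 101,
    " to": 102,
    "day": 103,
}


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Greedy longest-match encoding (a stand-in for a real BPE tokenizer)."""
    ids: List[int] = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                ids.append(vocab[piece])
                i = end
                break
        else:
            raise ValueError(f"cannot encode {text[i:]!r}")
    return ids


def words_to_token_ids(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map each bad word to its token id, rejecting multi-token words."""
    result: List[int] = []
    for word in words:
        ids = encode(word, vocab)
        if len(ids) != 1:
            # Mirrors the documented restriction: each word must be a single token.
            raise ValueError(f"{word!r} encodes to {len(ids)} tokens: {ids}")
        result.append(ids[0])
    return result


print(words_to_token_ids([" well", " Today"], TOY_VOCAB))  # [1622, 25062]
```

With this vocabulary, lowercase `" today"` splits into `" to"` + `"day"`, so `words_to_token_ids([" today"], TOY_VOCAB)` raises `ValueError` instead of silently banning a different token.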
@@ -203,15 +216,37 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
```python
import openai
host = "0.0.0.0"
-port = "8170"
+port = "9222"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")
response = client.chat.completions.create(
model="null",
messages=[
{"role": "system", "content": "I'm a helpful AI assistant."},
{"role": "user", "content": "Hello, how are you?"},
],
-extra_body={"bad_words": ["you", "me"]},
+extra_body={"bad_words": [" well", " Today"]},
stream=True,
)
for chunk in response:
if chunk.choices[0].delta:
print(chunk.choices[0].delta.content, end='')
print('\n')
```
Equivalent to:
```python
import openai
host = "0.0.0.0"
port = "9222"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")
response = client.chat.completions.create(
model="null",
messages=[
{"role": "user", "content": "Hello, how are you?"},
],
extra_body={"bad_words_token_ids": [1622, 25062]},
stream=True,
)
for chunk in response:
@@ -223,3 +258,5 @@ print('\n')
## Parameter Description
`bad_words`: List of forbidden words. Type: list of str. Each word must be a single token.
`bad_words_token_ids`: List of forbidden token ids. Type: list of int.
+6
@@ -153,6 +153,9 @@ include_stop_str_in_output: Optional[bool] = False
bad_words: Optional[List[str]] = None
# List of forbidden words (e.g., sensitive words) that the model should avoid generating (default None means no restriction).
bad_words_token_ids: Optional[List[int]] = None
# List of forbidden token ids that the model should avoid generating (default None means no restriction).
repetition_penalty: Optional[float] = None
# Repetition penalty coefficient, reducing the probability of repeating already generated tokens (`>1.0` suppresses repetition, `<1.0` encourages repetition, default None means disabled).
```
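The parameter descriptions say what is banned, not how the ban is enforced. A common technique, and only an assumption about this server's internals, is to overwrite the logit of each forbidden token id with negative infinity before sampling, so that token's probability becomes exactly zero. `apply_bad_words_mask` is a hypothetical helper written with plain Python lists; it is not a FastDeploy function.

```python
import math
from typing import List, Optional


def apply_bad_words_mask(
    logits: List[float], bad_words_token_ids: Optional[List[int]]
) -> List[float]:
    """Return a copy of `logits` with every forbidden token id set to -inf.

    None (the documented default) means no restriction. Ids outside the
    vocabulary are skipped here; a real server might reject them instead.
    """
    masked = list(logits)
    for token_id in bad_words_token_ids or []:
        if 0 <= token_id < len(masked):
            masked[token_id] = -math.inf
    return masked


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; masked entries get probability 0.0."""
    peak = max(logits)
    if peak == -math.inf:
        raise ValueError("every token is banned; nothing can be sampled")
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 3.0, 0.5]
probs = softmax(apply_bad_words_mask(logits, [2]))
print(probs[2])  # 0.0
```

Masking the logits before the softmax, rather than zeroing probabilities afterwards, keeps the remaining distribution normalized without a second pass. In this example token 2 had the highest logit, so after masking token 0 becomes the most likely choice.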
@@ -340,6 +343,9 @@ include_stop_str_in_output: Optional[bool] = False
bad_words: Optional[List[str]] = None
# List of forbidden words (e.g., sensitive words) that the model should avoid generating (default None means no restriction).
bad_words_token_ids: Optional[List[int]] = None
# List of forbidden token ids that the model should avoid generating (default None means no restriction).
repetition_penalty: Optional[float] = None
# Repetition penalty coefficient, reducing the probability of repeating already generated tokens (`>1.0` suppresses repetition, `<1.0` encourages repetition, default None means disabled).
```
+42 -6
@@ -183,7 +183,7 @@ print('\n')
## Usage Instructions
-Add the bad_words parameter to the request
+Either the bad_words parameter or the bad_words_token_ids parameter can be added to the request
* Example request with curl:
@@ -192,9 +192,22 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
-{"role": "user", "content": "How old are you"}
+{"role": "user", "content": "How are you"}
],
-"bad_words": ["age", "I"]
+"bad_words": [" well", " Today"]
}'
```
Equivalent to:
```bash
curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "How are you"}
],
"bad_words_token_ids": [1622, 25062]
}'
```
@@ -203,15 +216,37 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
```python
import openai
host = "0.0.0.0"
-port = "8170"
+port = "9222"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")
response = client.chat.completions.create(
model="null",
messages=[
{"role": "system", "content": "I'm a helpful AI assistant."},
{"role": "user", "content": "Hello, how are you?"},
],
-extra_body={"bad_words": ["you", "me"]},
+extra_body={"bad_words": [" well", " Today"]},
stream=True,
)
for chunk in response:
if chunk.choices[0].delta:
print(chunk.choices[0].delta.content, end='')
print('\n')
```
Equivalent to:
```python
import openai
host = "0.0.0.0"
port = "9222"
client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")
response = client.chat.completions.create(
model="null",
messages=[
{"role": "user", "content": "Hello, how are you?"},
],
extra_body={"bad_words_token_ids": [1622, 25062]},
stream=True,
)
for chunk in response:
@@ -223,3 +258,4 @@ print('\n')
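## Parameter Description
* `bad_words`: List of words that must not be generated. Type: list, each element a str. Each element must be a single token.
* `bad_words_token_ids`: List of token ids that must not be generated. Type: list, each element an int.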
+3
@@ -153,6 +153,9 @@ include_stop_str_in_output: Optional[bool] = False
bad_words: Optional[List[str]] = None
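# List of forbidden words (e.g., sensitive words); the model will avoid outputting these words (default None means no restriction).
bad_words_token_ids: Optional[List[int]] = None
# List of forbidden token ids; the model will avoid outputting these tokens (default None means no restriction).
repetition_penalty: Optional[float] = None
# Repetition penalty coefficient, reducing the probability of repeating already generated tokens (>1.0 suppresses repetition, <1.0 encourages repetition, default None means disabled).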
```
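Both request fields are optional and default to None (no restriction), so a client only needs to send the ones it uses. The helper below is a hypothetical convenience, not part of FastDeploy or the `openai` package: it checks the documented types (a list of str for `bad_words`, a list of int for `bad_words_token_ids`) and returns a dict that could be passed as `extra_body` to `client.chat.completions.create`, as in the Python examples.

```python
from typing import Any, Dict, List, Optional


def build_extra_body(
    bad_words: Optional[List[str]] = None,
    bad_words_token_ids: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Build an `extra_body` dict for a chat completion request (hypothetical helper).

    Fields left as None are omitted, matching the documented default
    (None means no restriction).
    """
    body: Dict[str, Any] = {}
    if bad_words is not None:
        # Require a real list: a bare str is iterable and would pass the element check.
        if not isinstance(bad_words, list) or not all(isinstance(w, str) for w in bad_words):
            raise TypeError("bad_words must be a list of str")
        body["bad_words"] = list(bad_words)
    if bad_words_token_ids is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(bad_words_token_ids, list) or not all(
            isinstance(t, int) and not isinstance(t, bool) for t in bad_words_token_ids
        ):
            raise TypeError("bad_words_token_ids must be a list of int")
        body["bad_words_token_ids"] = list(bad_words_token_ids)
    return body


print(build_extra_body(bad_words_token_ids=[1622, 25062]))
# {'bad_words_token_ids': [1622, 25062]}
```

Rejecting a bare string matters because a `str` is itself iterable: without the `isinstance(..., list)` check, `"well"` would silently become `['w', 'e', 'l', 'l']`.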