[Feature] bad words support v1 scheduler and specify token ids (#3608)

* support bad_words_token_ids
* docs
* fix test
* fix
* bad words support kvcache v1 and token ids
* fix
2026-05-07 16:08:58 +08:00 · 2025-08-26 11:14:51 +08:00
parent c43a4bec00
commit c68c3c4b8b
16 changed files with 420 additions and 62 deletions
@@ -183,7 +183,7 @@ Used to prevent the model from generating certain specific words during the infe

 ## Usage Instructions

-Include the `bad_words` parameter in the request:
+Include the `bad_words` or `bad_words_token_ids` parameter in the request:

 * Example request with curl:

@@ -192,9 +192,22 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
 -H "Content-Type: application/json" \
 -d '{
  "messages": [
-    {"role": "user", "content": "How old are you"}
+    {"role": "user", "content": "How are you"}
  ],
-  "bad_words": ["age", "I"]
+  "bad_words": [" well", " Today"]
+}'
+```
+
+This is equivalent to:
+
+```bash
+curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
+-H "Content-Type: application/json" \
+-d '{
+  "messages": [
+    {"role": "user", "content": "How are you"}
+  ],
+  "bad_words_token_ids": [1622, 25062]
 }'
 ```

@@ -203,15 +216,37 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
 ```python
 import openai
 host = "0.0.0.0"
-port = "8170"
+port = "9222"
 client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")

 response = client.chat.completions.create(
    model="null",
    messages=[
-        {"role": "system", "content": "I'm a helpful AI assistant."},
+        {"role": "user", "content": "Hello, how are you?"},
    ],
-    extra_body={"bad_words": ["you", "me"]},
+    extra_body={"bad_words": [" well", " Today"]},
+    stream=True,
+)
+for chunk in response:
+    if chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end='')
+print('\n')
+```
+
+This is equivalent to:
+
+```python
+import openai
+host = "0.0.0.0"
+port = "9222"
+client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")
+
+response = client.chat.completions.create(
+    model="null",
+    messages=[
+        {"role": "user", "content": "Hello, how are you?"},
+    ],
+    extra_body={"bad_words_token_ids": [1622, 25062]},
    stream=True,
 )
 for chunk in response:
@@ -223,3 +258,5 @@ print('\n')
 ## Parameter Description

 `bad_words`: List of forbidden words. Type: list of str. Each word must be a single token.
+
+`bad_words_token_ids`: List of forbidden token ids. Type: list of int.
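
Review note: the single-token rule for `bad_words` and its equivalence with `bad_words_token_ids` can be illustrated with a small sketch. This is not the code from this commit. `bad_words_to_token_ids`, `toy_encode`, and `TOY_VOCAB` are hypothetical names, and the toy vocabulary only contains the two ids used in the documentation examples above; a real deployment would use the tokenizer of the served model.

```python
from typing import Callable, Dict, List

def bad_words_to_token_ids(
    bad_words: List[str],
    encode: Callable[[str], List[int]],
) -> List[int]:
    """Map each forbidden word to its token id, rejecting multi-token words."""
    token_ids: List[int] = []
    for word in bad_words:
        ids = encode(word)
        if len(ids) != 1:
            # The docs state that each bad word must be a single token.
            raise ValueError(
                f"{word!r} encodes to {len(ids)} tokens; expected exactly 1"
            )
        token_ids.append(ids[0])
    return token_ids

# Assumption: a toy stand-in for a real tokenizer, holding only the two
# word/id pairs that appear in the documentation examples.
TOY_VOCAB: Dict[str, int] = {" well": 1622, " Today": 25062}

def toy_encode(text: str) -> List[int]:
    """Return one id for known words, otherwise one placeholder id per character."""
    if text in TOY_VOCAB:
        return [TOY_VOCAB[text]]
    return [-1 for _ in text]

if __name__ == "__main__":
    print(bad_words_to_token_ids([" well", " Today"], toy_encode))
```

Note that the leading space is part of the word: with many tokenizers `" well"` and `"well"` map to different tokens, which is presumably why the documentation examples include it.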
@@ -153,6 +153,9 @@ include_stop_str_in_output: Optional[bool] = False
 bad_words: Optional[List[str]] = None
 # List of forbidden words (e.g., sensitive words) that the model should avoid generating (default None means no restriction).

+bad_words_token_ids: Optional[List[int]] = None
+# List of forbidden token ids that the model should avoid generating (default None means no restriction).
+
 repetition_penalty: Optional[float] = None
 # Repetition penalty coefficient, reducing the probability of repeating already generated tokens (`>1.0` suppresses repetition, `<1.0` encourages repetition, default None means disabled).
 ```
@@ -340,6 +343,9 @@ include_stop_str_in_output: Optional[bool] = False
 bad_words: Optional[List[str]] = None
 # List of forbidden words (e.g., sensitive words) that the model should avoid generating (default None means no restriction).

+bad_words_token_ids: Optional[List[int]] = None
+# List of forbidden token ids that the model should avoid generating (default None means no restriction).
+
 repetition_penalty: Optional[float] = None
 # Repetition penalty coefficient, reducing the probability of repeating already generated tokens (`>1.0` suppresses repetition, `<1.0` encourages repetition, default None means disabled).
 ```
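
Review note: a common way for a sampler to honor a list of forbidden token ids is to set their logits to negative infinity before picking the next token. The sketch below shows that general technique under stated assumptions; it is not the implementation added by this commit, and `mask_bad_tokens` and `greedy_pick` are hypothetical helper names.

```python
import math
from typing import List, Optional, Sequence

def mask_bad_tokens(
    logits: Sequence[float],
    bad_words_token_ids: Optional[List[int]],
) -> List[float]:
    """Return a copy of logits with every forbidden token id set to -inf."""
    masked = list(logits)
    for token_id in bad_words_token_ids or []:
        # Ids outside the vocabulary are skipped in this sketch.
        if 0 <= token_id < len(masked):
            masked[token_id] = -math.inf
    return masked

def greedy_pick(logits: Sequence[float]) -> int:
    """Return the index of the largest logit (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])

if __name__ == "__main__":
    logits = [0.1, 2.5, 0.3, 1.9]
    print(greedy_pick(logits))
    print(greedy_pick(mask_bad_tokens(logits, [1])))
```

Because a masked token has probability zero after softmax, the same masking also works with sampling-based decoding, not only with the greedy pick shown here.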
@@ -183,7 +183,7 @@ print('\n')

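 ## Usage Instructions

-Include the `bad_words` parameter in the request:
+Include either the `bad_words` parameter or the `bad_words_token_ids` parameter in the request:

 * Example request with curl: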

@@ -192,9 +192,22 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
 -H "Content-Type: application/json" \
 -d '{
  "messages": [
-    {"role": "user", "content": "How old are you"}
+    {"role": "user", "content": "How are you"}
  ],
-  "bad_words": ["age", "I"]
+  "bad_words": [" well", " Today"]
+}'
+```
+
+This is equivalent to:
+
+```bash
+curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
+-H "Content-Type: application/json" \
+-d '{
+  "messages": [
+    {"role": "user", "content": "How are you"}
+  ],
+  "bad_words_token_ids": [1622, 25062]
 }'
 ```

@@ -203,15 +216,37 @@ curl -X POST "http://0.0.0.0:9222/v1/chat/completions" \
 ```python
 import openai
 host = "0.0.0.0"
-port = "8170"
+port = "9222"
 client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")

 response = client.chat.completions.create(
    model="null",
    messages=[
-        {"role": "system", "content": "I'm a helpful AI assistant."},
+        {"role": "user", "content": "Hello, how are you?"},
    ],
-    extra_body={"bad_words": ["you", "me"]},
+    extra_body={"bad_words": [" well", " Today"]},
+    stream=True,
+)
+for chunk in response:
+    if chunk.choices[0].delta.content:
+        print(chunk.choices[0].delta.content, end='')
+print('\n')
+```
+
+This is equivalent to:
+
+```python
+import openai
+host = "0.0.0.0"
+port = "9222"
+client = openai.Client(base_url=f"http://{host}:{port}/v1", api_key="null")
+
+response = client.chat.completions.create(
+    model="null",
+    messages=[
+        {"role": "user", "content": "Hello, how are you?"},
+    ],
+    extra_body={"bad_words_token_ids": [1622, 25062]},
    stream=True,
 )
 for chunk in response:
@@ -223,3 +258,4 @@ print('\n')
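 ## Parameter Description

 * `bad_words`: List of words that must not be generated. Type: list of str. Each element must be a single token.
+* `bad_words_token_ids`: List of token ids that must not be generated. Type: list of int.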
@@ -153,6 +153,9 @@ include_stop_str_in_output: Optional[bool] = False
 bad_words: Optional[List[str]] = None
 # List of forbidden words (e.g., sensitive words) that the model should avoid generating (default None means no restriction).

+bad_words_token_ids: Optional[List[int]] = None
+# List of forbidden token ids that the model should avoid generating (default None means no restriction).
+
 repetition_penalty: Optional[float] = None
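 # Repetition penalty coefficient, reducing the probability of repeating already generated tokens (`>1.0` suppresses repetition, `<1.0` encourages repetition, default None means disabled).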
 ```