[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post-processing (#3552)

* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post-processing

* inference engine: support temp_scaled_logprobs and top_p_normalized_logprobs

* remove unneeded code

* code style check

* code style check and add docs

* fix tokenizer.decode(-1) to return 'Invalid Token'

* add CI tests for temp_scaled and top_p logprobs

* fix tests

* check seq_lens_this_time shape

* clip inf values in logprobs

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
Authored by: chen
Committed by: GitHub on 2025-08-25 14:11:49 +08:00
Commit: 9cab3f47ff (parent: 2410adb041)
8 changed files with 195 additions and 8 deletions
@@ -45,8 +45,9 @@
 curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
 -H "Content-Type: application/json" \
 -d '{
   "messages": [
-    {"role": "user", "content": "Hello!"}, "logprobs": true, "top_logprobs": 5
-  ]
+    {"role": "user", "content": "Hello!"}
+  ],
+  "logprobs": true, "top_logprobs": 0
 }'
 ```
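For reference, here is the same kind of request expressed in Python with both new flags switched on. This is a minimal sketch assuming an OpenAI-compatible response shape; the `requests` usage, the `temperature`/`top_p` values, and reading `choices[0]["logprobs"]` are illustrative assumptions, not part of this commit:

```python
import requests

url = "http://0.0.0.0:8188/v1/chat/completions"  # endpoint from the example above

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "logprobs": True,
    "top_logprobs": 5,
    "temperature": 0.8,                 # illustrative value
    "top_p": 0.9,                       # illustrative value
    "temp_scaled_logprobs": True,       # divide logits by temperature first
    "top_p_normalized_logprobs": True,  # renormalize within the top-p nucleus
}

resp = requests.post(url, json=payload, timeout=60)
# Assumes an OpenAI-compatible response layout.
print(resp.json()["choices"][0]["logprobs"])
```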
@@ -193,6 +194,12 @@ max_streaming_response_tokens: Optional[int] = None
 disable_chat_template: Optional[bool] = False
 # Whether to disable chat template rendering and use the raw input directly (default False, i.e. the template is applied).
+temp_scaled_logprobs: Optional[bool] = False
+# Whether to divide the logits by the temperature when computing logprobs (default False: no temperature scaling).
+top_p_normalized_logprobs: Optional[bool] = False
+# Whether to apply top-p normalization when computing logprobs (default False: no normalization).
 ```
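Conceptually, the two flags amount to the following post-processing of a logits vector. This is a minimal NumPy sketch based on the parameter descriptions above; the function name, the exact nucleus handling, and the final clipping step are assumptions, not necessarily the engine's actual implementation:

```python
import numpy as np

def post_process_logprobs(logits, temperature=1.0, top_p=1.0,
                          temp_scaled=False, top_p_normalized=False):
    """Sketch of the two logprob post-processing options on a 1-D logits vector."""
    logits = np.asarray(logits, dtype=np.float64)
    if temp_scaled and temperature > 0:
        logits = logits / temperature          # temp_scaled_logprobs
    # Numerically stable log-softmax over the full vocabulary.
    m = logits.max()
    logprobs = logits - m - np.log(np.exp(logits - m).sum())
    if top_p_normalized:
        # Subtract log(mass of the top-p nucleus) so probabilities
        # renormalize to 1 within the nucleus.
        order = np.argsort(logprobs)[::-1]
        cum = np.cumsum(np.exp(logprobs[order]))
        cutoff = np.searchsorted(cum, top_p)   # smallest prefix covering top_p
        nucleus_mass = cum[min(cutoff, len(cum) - 1)]
        logprobs = logprobs - np.log(nucleus_mass)
    # Replace any -inf (e.g. from masked tokens) with the smallest finite
    # float, mirroring the "clip inf values in logprobs" fix in this commit.
    return np.clip(logprobs, np.finfo(np.float64).min, None)
```

Read this way, `temp_scaled_logprobs` makes the returned logprobs reflect the temperature-adjusted sampling distribution rather than the raw model distribution, and `top_p_normalized_logprobs` makes them sum to one over the top-p nucleus instead of the full vocabulary.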
### Differences in Return Fields