[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post-processing (#3552)

* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post-processing

* inference engine: support temp_scaled_logprobs and top_p_normalized_logprobs

* remove unneeded code

* code style check

* code style check and add docs

* fix tokenizer.decode(-1) to return 'Invalid Token'

* add CI tests for temp_scaled and top_p logprobs

* fix tests

* check seq_lens_this_time shape

* clip inf values in logprobs

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
Authored by: chen
Committed by: GitHub on 2025-08-25 14:11:49 +08:00
Commit: 9cab3f47ff (parent: 2410adb041)
8 changed files with 195 additions and 8 deletions
@@ -45,8 +45,9 @@
 curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
 -H "Content-Type: application/json" \
 -d '{
   "messages": [
-    {"role": "user", "content": "Hello!"}, "logprobs": true, "top_logprobs": 5
-  ]
+    {"role": "user", "content": "Hello!"}
+  ],
+  "logprobs": true, "top_logprobs": 0
 }'
 ```
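For reference, here is the same kind of request expressed in Python with both new flags switched on. This is a minimal sketch assuming an OpenAI-compatible response shape; the `requests` usage, the `temperature`/`top_p` values, and reading `choices[0]["logprobs"]` are illustrative assumptions, not part of this commit:

```python
import requests

url = "http://0.0.0.0:8188/v1/chat/completions"  # endpoint from the example above

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "logprobs": True,
    "top_logprobs": 5,
    "temperature": 0.8,                 # illustrative value
    "top_p": 0.9,                       # illustrative value
    "temp_scaled_logprobs": True,       # divide logits by temperature first
    "top_p_normalized_logprobs": True,  # renormalize within the top-p nucleus
}

resp = requests.post(url, json=payload, timeout=60)
# Assumes an OpenAI-compatible response layout.
print(resp.json()["choices"][0]["logprobs"])
```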
@@ -193,6 +194,12 @@ max_streaming_response_tokens: Optional[int] = None
 disable_chat_template: Optional[bool] = False
 # Whether to disable chat template rendering and use the raw input directly (default False, i.e. the template is applied).
+temp_scaled_logprobs: Optional[bool] = False
+# Whether to divide the logits by the temperature when computing logprobs (default False: no temperature scaling).
+top_p_normalized_logprobs: Optional[bool] = False
+# Whether to apply top-p normalization when computing logprobs (default False: no normalization).
 ```
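Conceptually, the two flags amount to the following post-processing of a logits vector. This is a minimal NumPy sketch based on the parameter descriptions above; the function name, the exact nucleus handling, and the final clipping step are assumptions, not necessarily the engine's actual implementation:

```python
import numpy as np

def post_process_logprobs(logits, temperature=1.0, top_p=1.0,
                          temp_scaled=False, top_p_normalized=False):
    """Sketch of the two logprob post-processing options on a 1-D logits vector."""
    logits = np.asarray(logits, dtype=np.float64)
    if temp_scaled and temperature > 0:
        logits = logits / temperature          # temp_scaled_logprobs
    # Numerically stable log-softmax over the full vocabulary.
    m = logits.max()
    logprobs = logits - m - np.log(np.exp(logits - m).sum())
    if top_p_normalized:
        # Subtract log(mass of the top-p nucleus) so probabilities
        # renormalize to 1 within the nucleus.
        order = np.argsort(logprobs)[::-1]
        cum = np.cumsum(np.exp(logprobs[order]))
        cutoff = np.searchsorted(cum, top_p)   # smallest prefix covering top_p
        nucleus_mass = cum[min(cutoff, len(cum) - 1)]
        logprobs = logprobs - np.log(nucleus_mass)
    # Replace any -inf (e.g. from masked tokens) with the smallest finite
    # float, mirroring the "clip inf values in logprobs" fix in this commit.
    return np.clip(logprobs, np.finfo(np.float64).min, None)
```

Read this way, `temp_scaled_logprobs` makes the returned logprobs reflect the temperature-adjusted sampling distribution rather than the raw model distribution, and `top_p_normalized_logprobs` makes them sum to one over the top-p nucleus instead of the full vocabulary.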
### Differences in Return Fields