Mirror of https://github.com/PaddlePaddle/FastDeploy.git (synced 2026-04-23 00:17:25 +08:00)
[Feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post processing (#3552)
* [feature] Add temp_scaled_logprobs and top_p_normalized_logprobs parameters for logits and logprobs post-processing
* Infer engine support for temp_scaled_logprobs and top_p_normalized_logprobs
* Delete some code
* Code check
* Code check and doc additions
* Fix tokenizer.decoder(-1) to return 'Invalid Token'
* Add CI for temp_scaled and top_p logprobs
* Check test
* Check seq len and time shape
* Clip inf in logprobs

---------

Co-authored-by: sunlei1024 <sunlei5788@gmail.com>
````diff
@@ -45,8 +45,9 @@ curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
   -H "Content-Type: application/json" \
   -d '{
   "messages": [
-    {"role": "user", "content": "Hello!"}, "logprobs": true, "top_logprobs": 5
-  ]
+    {"role": "user", "content": "Hello!"}
+  ],
+  "logprobs": true, "top_logprobs": 0,
   }'
 ```
 
````
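As a side note, the request body in the updated example still carries a trailing comma after `"top_logprobs": 0` before the closing `}'`, which strict JSON parsers reject. A minimal Python sketch of building the same body safely (the endpoint `http://0.0.0.0:8188/v1/chat/completions` is taken from the example; sending it is left commented out since it needs a running server):

```python
import json

# Same fields as the docs example above.
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "logprobs": True,
    "top_logprobs": 0,
}
# json.dumps never emits trailing commas, which strict JSON parsers
# (including Python's own json.loads) reject.
body = json.dumps(payload)
assert json.loads(body)["logprobs"] is True
# To actually send it (requires a running FastDeploy server):
# requests.post("http://0.0.0.0:8188/v1/chat/completions",
#               headers={"Content-Type": "application/json"}, data=body)
```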
````diff
@@ -193,6 +194,12 @@ max_streaming_response_tokens: Optional[int] = None
 disable_chat_template: Optional[bool] = False
 # Whether to disable chat template rendering, using raw input directly (default False means template is enabled).
 
+temp_scaled_logprobs: Optional[bool] = False
+# Whether to divide the logits by the temperature coefficient when calculating logprobs (default is False, meaning the logits are not divided by the temperature coefficient).
+
+top_p_normalized_logprobs: Optional[bool] = False
+# Whether to perform top-p normalization when calculating logprobs (default is False, indicating that top-p normalization is not performed).
+
 ```
 
 ### Differences in Return Fields
````
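The semantics described in the two parameter comments, plus the commit's "logprob clip inf" note, can be sketched in a few lines of self-contained Python. This is an illustration only, not FastDeploy's implementation; `logprobs_postprocess` and its signature are hypothetical:

```python
import math

def _softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def logprobs_postprocess(logits, temperature=1.0, top_p=1.0,
                         temp_scaled=False, top_p_normalized=False):
    # temp_scaled_logprobs: divide the logits by the temperature
    # coefficient before the softmax.
    if temp_scaled and temperature > 0:
        logits = [x / temperature for x in logits]
    probs = _softmax(logits)
    # top_p_normalized_logprobs: renormalize probability mass over the
    # smallest prefix of tokens (by descending probability) whose
    # cumulative probability reaches top_p.
    if top_p_normalized and top_p < 1.0:
        order = sorted(range(len(probs)), key=lambda i: -probs[i])
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        mass = sum(probs[i] for i in kept)
        probs = [p / mass if i in kept else 0.0 for i, p in enumerate(probs)]
    # Clip -inf (log of zeroed-out tokens) to a large negative value,
    # in the spirit of the commit's "logprob clip inf" fix.
    return [math.log(p) if p > 0.0 else -1e9 for p in probs]

base = logprobs_postprocess([2.0, 1.0, 0.0])
# Dividing logits by a temperature below 1 sharpens the distribution,
# so the top token's logprob rises relative to the unscaled case.
sharp = logprobs_postprocess([2.0, 1.0, 0.0], temperature=0.5, temp_scaled=True)
```

With `top_p_normalized=True`, the surviving tokens' probabilities sum to 1 again, so their logprobs describe a proper distribution over the nucleus rather than over the full vocabulary.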