FastDeploy/benchmarks/yaml/GLM45-air-32k-bf16-rl.yaml at 39ff38aba17ca23488b66d5e8f4c00c4e0ba3b24

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

xiegegege 51c6fa8afc [CE]add 21b cpu cache ,glm mtp,glm for rl config (#6328)

2026-02-03 20:10:47 +08:00

 tensor_parallel_size: 8
 max_num_seqs: 32
 gpu_memory_utilization: 0.8
 load_choices: default_v1
 enable_prefix_caching: True
 graph_optimization_config: '{"use_cudagraph":true}'
 max_model_len: 66560
 enable_logprob: True
 enable_custom_all_reduce: False
 worker: 2
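
The config above can be sanity-checked with a short script. This is a minimal sketch, not how FastDeploy's benchmark tooling consumes the file (that is not shown here). It assumes the file stays a flat list of `key: value` pairs; `parse_flat_yaml` is a hypothetical helper written for this illustration, not a FastDeploy or PyYAML function. It shows that `graph_optimization_config` is a JSON string embedded in a YAML scalar, so it needs a second decoding step, and that `max_model_len: 66560` equals 65 * 1024 tokens.

```python
import json

# Copy of the config shown above (values taken verbatim from the file).
CONFIG_TEXT = """\
tensor_parallel_size: 8
max_num_seqs: 32
gpu_memory_utilization: 0.8
load_choices: default_v1
enable_prefix_caching: True
graph_optimization_config: '{"use_cudagraph":true}'
max_model_len: 66560
enable_logprob: True
enable_custom_all_reduce: False
worker: 2
"""


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines. Hypothetical helper, not a full YAML parser."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so colons inside the value survive.
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if len(raw) >= 2 and raw[0] == raw[-1] == "'":
            value = raw[1:-1]  # single-quoted scalar stays a string
        elif raw in ("True", "False"):
            value = raw == "True"
        else:
            try:
                value = int(raw)
            except ValueError:
                try:
                    value = float(raw)
                except ValueError:
                    value = raw  # plain string such as default_v1
        result[key.strip()] = value
    return result


config = parse_flat_yaml(CONFIG_TEXT)

# The graph optimization setting is JSON inside a YAML string: decode it separately.
graph_opts = json.loads(config["graph_optimization_config"])

print(config["tensor_parallel_size"], config["max_model_len"] // 1024, graph_opts)
```

For anything beyond a flat file like this one, a real YAML parser would be the safer choice; the hand-rolled helper only keeps the sketch free of third-party dependencies.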