FastDeploy/tests/ce/deploy/ernie45t_21b_sot_fp8.yaml at cae2709efffb2bf7c55be21f087f4e2fa42dad59 - FastDeploy - 子说镜像小站

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 08:21:53 +08:00

Ryan 0d1a5e70bc [Graph Optimization] Add full_cuda_graph to control subgraph split (#6027)

2026-01-14 11:43:59 +08:00

10 lines

220 B

YAML

 max_model_len: 32768
 max_num_seqs: 128
 tensor_parallel_size: 1
 quantization: block_wise_fp8
 graph_optimization_config:
   graph_opt_level: 1
   sot_warmup_sizes: [2,16,32,64]
   use_cudagraph: True
   full_cuda_graph: False