FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-05-10 17:41:13 +08:00

lizexu123 6619298b50 【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)

* update w4afp8

* build.sh ok

* support cuda_graph

* fix

* add test

* fix max_tokens_per_expert

* >=70

* fix

* compute_max_tokens_from_prefix_sum in w4afp8

* compute_max_tokens use cub

2026-01-15 19:18:42 +08:00

DCU

mv test to tests (#4129)

2025-09-16 20:45:40 +08:00

EB_Lite

[CI] Fix unit_test error of unstable execution (#5660)

2025-12-19 22:59:53 +08:00

EB_Lite_with_adapter

[CI] Add unittest (#5328)

2025-12-09 19:19:42 +08:00

EB_VL_Lite

[CI] Adapt vl_model baseline changes due to Paddle update_2 (#6033)

2026-01-14 15:22:26 +08:00

GCU

Add stable ci (#3460)

2025-08-20 08:57:17 +08:00

GLM-45-AIR

[CI] Allow occasional distributed worker exit_code (#5341)

2025-12-03 10:56:59 +08:00

HPU

[INTEL_HPU] [CI] enabled fastdeploy PR testing (#4596)

2025-11-17 19:24:41 +08:00

iluvatar_UT

[Iluvatar] Fix FD launch error when specifing CUDA_VISBLE_DEVICE (#5735)

2025-12-26 14:01:27 +08:00

metrics

[CI] Refactor RL tests to reuse test_metrics (#5741)

2025-12-24 17:08:40 +08:00

Prompt_logprobs

[ci case]Check the chunking of the chat interface (#5981)

2026-01-12 16:36:13 +08:00

Qwen2-7B-Instruct_offline

[Feature] support logits processors (#4515)

2025-10-29 00:08:53 +08:00

Qwen3-MoE

[Feature] support stop_token_ids (#5399)

2025-12-09 17:49:12 +08:00

utils

[CI] stable test_rollout_model.py (#4536)

2025-10-22 01:59:44 -07:00

w4afp8

【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)

2026-01-15 19:18:42 +08:00

XPU_45T

[CI] Add unittest (#5328)

2025-12-09 19:19:42 +08:00