FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

yuxuan 44b52701f6 [Feature] Support NVFP4 MoE on SM100 (#6003)

* fp4 dense

* [WIP] support nvfp4, dense part

* [wip] developing loading qwen model

* loading

* update

* dense fp4 OK, cudagraph error

* [WIP] moe forward part

* with flashinfer-backend

* qwen3_moe_fp4

* update

* support flashinfer-cutlass moe, qwen3-moe-fp4 OK

* support ernie4.5-fp4

* fix load error

* add some ut

* add docs

* fix CLA, test

* fix the apply() in ModelOptNvFp4FusedMoE

* fix CodeStyle

* del the PADDLE_COMPATIBLE_API

* fix broken url: nvidia_gpu.md

* fix docs

* fix token_ids

* fix CI in Hopper

* move flashinfer imports inside the function

* fix model_runner

Removed the logic for generating random padding IDs.

* Remove skip condition for CUDA version in nvfp4 test

* add test for nvfp4

* fix according to review

* Add Chinese translation link to NVFP4 documentation

* del flashinfer.py

* fix unittest

---------

Co-authored-by: zoooo0820 <zoooo0820@qq.com>
Co-authored-by: bukejiyu <395822456@qq.com>

2026-01-29 14:16:07 +08:00

test_kv_cache.py

[BugFix] Fix load kv cache quant scale (#4077)

2025-09-12 17:44:03 +08:00

test_modelopt_nvfp4.py

[Feature] Support NVFP4 MoE on SM100 (#6003)

2026-01-29 14:16:07 +08:00

test_tensor_wise_fp8.py

[Intel HPU] enable tensor_wise_fp8 (#5324)

2025-12-17 16:45:03 +08:00

test_w4a8.py

[Docs] Add License in Unittest (#4957)

2025-11-12 10:44:09 +08:00

test_w4afp8.py

[Others] remove add_bias option (#5425)

2025-12-09 17:39:35 +08:00