lizexu123
6619298b50
【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models ( #6007 )
...
* update w4afp8
* build.sh ok
* support cuda_graph
* fix
* add test
* fix max_tokens_per_expert
* >=70
* fix
* compute_max_tokens_from_prefix_sum in w4afp8
* compute_max_tokens use cub
2026-01-15 19:18:42 +08:00
yangjianfengo1
16e1992eba
[Bugfix] Increase the shape of w4afp8 gemm ( #5957 )
...
* add w4afp8 shape
* add w4afp8 shape
* code style
2026-01-09 14:11:17 +08:00
yangjianfengo1
59523b27de
opt w4afp8 ( #5853 )
2026-01-07 12:22:35 +08:00
lizexu123
acdf0cd1d9
fix hadamard_block_size ( #5888 )
2026-01-06 14:12:14 +08:00
lizexu123
44a13e4557
[Feature] support w4afp8 v1_loader and v0_loader(tp>1) ( #5757 )
...
* support
* fix
* support w4afp8 v1_loader and v0_loader
* fix
* fix test
* fix test
* fix test
* fix moe.py
* add test_ernie_4_5_w4afp8
* add test
* delete tensor
* fix test
* fix
* add
* fix test
2025-12-30 14:11:52 +08:00
lizexu123
6d323769dd
fix w4afp8 ( #5634 )
2025-12-22 13:39:41 +08:00
Sunny-bot1
3629db4129
[Quantization] Support w4afp8 MoE dynamic quantization ( #5282 )
...
* support dynamic activation quant for w4afp8
* support dynamic w4afp8
* add test
* fix
* fix
---------
Co-authored-by: zhoutianzi666 <17801055074@163.com >
2025-12-02 18:56:16 +08:00
yangjianfengo1
3afb717995
【Fix】fix deepep dispatch ( #5036 )
...
* fix dispatch
* fix dispatch
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-11-17 10:34:01 +08:00
yangjianfengo1
ae7bee8122
【New Feature】W4afp8 supports per group quantization ( #4987 )
...
* w4afp8 supports per group
* code style
* fix transpose
* revert fast hadamard
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
Co-authored-by: plusNew001 <95567040+plusNew001@users.noreply.github.com >
2025-11-13 19:17:27 +08:00
YuBaoku
819b2dbbae
Revert "【New Feature】W4afp8 supports per group quantization ( #4272 )" ( #4854 )
...
This reverts commit 93fcf7e4ec .
2025-11-06 17:48:28 +08:00
yangjianfengo1
93fcf7e4ec
【New Feature】W4afp8 supports per group quantization ( #4272 )
...
* w4afp8 supports per group
* code style
* accuracy work done
* revert append attn utils
* ffn1 dynamic quantization
* ffn2 supports dynamic quantization
* code style
* code style
* modify unit tests
* modify unit tests
* fix bug
* Implement conditional parameter creation for layers
Add parameter creation for up_gate_proj_in_scale when ep_size > 1.
* code style
* fix conflict
* code style
* code style
* fix w4aint8 accuracy
* fix ci
---------
Co-authored-by: yuanxiaolan <yuanxiaolan01@baidu.com >
2025-11-05 21:00:23 +08:00
Zhenghai Zhang
1712e1351b
【Hackathon 9th No.86】autogen MoeFastHardamardImplWrapper template_instantiation ( #4592 )
...
* autogen MoeFastHardamardImplWrapper template_instantiation
* fix codestyle
* fix codestyle
* add impl cu files
2025-10-30 10:28:36 +08:00
yangjianfengo1
8e1b35a09b
【Fix bug】Fix w4afp8 nblock at 256 and add a mask parameter to fa3 append attn ( #3771 )
...
* fix w4afp8
* add centralized configuration
* codestyle
* fix fa3 append attn
2025-09-02 19:17:01 +08:00
Yuan Xiaolan
c71ee0831c
add w4afp8 offline script ( #3636 )
2025-08-29 17:56:05 +08:00
Yuan Xiaolan
9205c88da1
support w4afp8 EP inference ( #3044 )
2025-08-25 11:27:45 +08:00
yangjianfengo1
e5aa7087db
【bug fix】Fix slow w4a8 compilation ( #3510 )
...
* fix w4a8 compilation
* code style
* fix tma copy
2025-08-21 18:50:14 +08:00
yangjianfengo1
b047681c5d
【New Feature】Support 2:4 sparsity for Fp8 group Gemm ( #3463 )
...
* support 2:4 sparsity
* code style
* add stmatrix macro definition check
* code style
2025-08-19 02:54:47 -07:00
yangjianfengo1
89397516a8
[New Feature] Support W4Afp8 MoE GroupGemm ( #3171 )
...
* init
* add multi-threaded compilation
* fix bug
* fix bug
* code style
* add fp16
* replace print with assert
* fix stmatrix
* reduce unit test shape
* reduce unit test shape
2025-08-06 10:34:05 +08:00
Zero Rains
25698d56d1
polish code with new pre-commit rule ( #2923 )
2025-07-19 23:19:27 +08:00
Jiang-Jia-Jun
92c2cfa2e7
Sync v2.0 version of code to github repo
2025-06-29 23:29:37 +00:00