Commit Graph

27 Commits

Author SHA1 Message Date
JYChen 9bcd863902 [Others] support import deepgemm/deepep from fleet ops (#6351)
* update paddleformers to v1.0

* only change import fleetpath
2026-02-09 11:53:13 +08:00
JYChen c745a22420 [Feature] Support Ernie FP8 on sm100 ( the fixed version) (#6304) 2026-02-03 17:47:38 +08:00
JYChen 6c685c9474 Revert "[Feature] Support Ernie FP8 on sm100 (#5593)" (#6275)
This reverts commit eb80724b71.
2026-01-30 11:22:01 +08:00
JYChen eb80724b71 [Feature] Support Ernie FP8 on sm100 (#5593)
* Deepgemm暂时可用版本

* dense部分 e8m0 ok

* EB模型E8M0跑通的版本

* code check

* support 21b-tp2, dev_paddle

* 单机4.5T ep OK的版本

* 修复删除的代码,单机4.5T ep(非cudagraph)

* eb tp

* Support SM100 block-wise FP8 inference

* refine codes, support deepgemm on sm100

* add thirdparty PFCC/DeepGEMM

* fix ep decode

* 使用deepep ue8m0, 解决精度问题

* 修复FP8 TP精度

* Deepgemm升级适配Hopper逻辑

* add ue8m0 kernel

* add ue8m0 kernel

* fix custom_ops/gpu_ops/cpp_extensions.cc

* eb 输出正常

* eb5 text is right

* 目测精度一致

* 自测精度对齐

* 替换masked_per_token_quant, ep精度OK

* 性能提升约30%

* 暂时跑通ep但是有问题

* 自测一致

* rm test fun

* fix ep event

* 图优化算子更新Deepgemm

* fix build

* 暂时绕过deepgemm CI编译问题

* 根据SM区分deepgemm版本

* remove useless code

---------

Co-authored-by: ckl117 <ckl117@163.com>
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
Co-authored-by: fxyfxy777 <fxyfxy777@163.com>
2026-01-29 13:49:54 +08:00
fxyfxy777 79f42209bf add scale_wrapper for per_block_cast_to_fp8 (#6183) 2026-01-23 00:37:20 -08:00
Haonan Luo 82057cb71f Support MXFP4 for GPT-OSS (#5435)
* support mxfp4 in gpt-oss

* support mxfp4 in gpt-oss

* add scope for flashinfer

* remove torch code

* update envs.FD_MXFP4_BACKEND

* update process_weights_after_loading

* update env name

* support tp in gpt-oss, add e2e test

* add flashinfer-python-paddle in requirements

* fix import error

* add test

* add test

* add test

* add test
2026-01-22 14:21:01 +08:00
fxyfxy777 9c4db0ac3f [BugFix] fix weight quant op (#6137)
* fix weight quant

* fix weight quant

* bit equal

* code style
2026-01-22 09:50:57 +08:00
fxyfxy777 4c92035f2d [Feature] Unify fp8 block_wise quant ops (#5991)
* quant stash

* blockwise_quant

* precommit

* rm tensor.cut

* tp ok

* add swiglu

* rm outdate code

* fix activate ut

* change baseline

* fix baseline error
2026-01-15 05:50:37 -08:00
K11OntheBoat 8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-09 14:17:30 +08:00
Sunny-bot1 364197c4b5 support w4afp8 mtp (#5429) 2025-12-08 20:24:00 +08:00
Sunny-bot1 3629db4129 [Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666 <17801055074@163.com>
2025-12-02 18:56:16 +08:00
bukejiyu 1539fd6056 [BugFix]Set default OMP_NUM_THREADS=3 and fix extra GPU memory usage in DeepSeek (#5219)
* fix bug

* update

* update

* update

* fix copy

* update
2025-11-28 14:22:04 +08:00
chen 7c1fd19f0f [OPs] MoE support wfp8afp8(channelwise) and improve per_token_quant_fp8 (#4238) 2025-09-24 16:39:51 +08:00
lizexu123 c96a535a5d [Feature] support qwen3-embedding model load (#4202)
* support qwen3-embedding

* fix ci bug

* fix

* fix ci bug

* fix ci bug

* fix
2025-09-23 00:14:35 -07:00
gaoziyuan ab1929f5ff fix mem boom in ep (#3854) 2025-09-05 11:48:21 +08:00
Ryan 45f81b34f0 add dtype int32 (#3692)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-08-29 14:56:35 +08:00
bukejiyu 77514e3e1e [V1 Loader] support weight_only (#3413)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* support wint4/wint8

* delete smoe case

* update ci

* print log
2025-08-23 13:13:41 +08:00
Zero Rains ce1f353c70 Move create_parameters to __init__ in FuseMOE for CultassBackend and TritonBackend (#3148)
* w4a8 bug

* fix w4a8 bug

* remove code

* modify the triton backend

* fix ep

* fix the bug with tensor_wise_fp8 in triton backend

* fix the RL

* fix bug by merge

* fix the bug in w4a8

* fix the tensor_wise_fp8 bug

* fix RL
2025-08-08 15:55:47 +08:00
bukejiyu 8e203666d9 w4a8 offline (#3074)
* w4a8 offline

* update

* update

* update
2025-07-30 16:33:30 +08:00
Zero Rains 0fb37ab7e4 update flake8 version to support pre-commit in python3.12 (#3000)
* update flake8 version to support pre-commit in python3.12

* polish code
2025-07-24 01:43:31 -07:00
bukejiyu bfeb664ab8 update (#2978)
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-07-24 00:16:42 +08:00
chen ad202272ed 【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (#2942) 2025-07-23 13:02:50 +08:00
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
liddk1121 1b54a2831e Adapt for iluvatar gpu (#2684) 2025-07-07 16:53:14 +08:00
Jiang-Jia-Jun 05c670e593 [Sync] Update to latest code (#2679)
* [Sync] Update to latest code

* Add new code files

* Add new code files

* update code

* Try to fix build.sh

* Try to fix build.sh

* Update code

* Update requirements.txt

* Update code

---------

Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
2025-07-03 15:43:53 +08:00
Jiang-Jia-Jun 92c2cfa2e7 Sync v2.0 version of code to github repo 2025-06-29 23:29:37 +00:00
jiangjiajun 684703fd72 [LLM] First commit the llm deployment code 2025-06-09 19:20:15 +08:00