Commit Graph

369 Commits

Author SHA1 Message Date
Yuanle Liu 8c3513a410 Revert "[TSP] last_norm allgather move to model.py (#5924)" (#5961)
This reverts commit 2bb838fed9.
2026-01-09 15:20:40 +08:00
xiaoluomi 2bb838fed9 [TSP] last_norm allgather move to model.py (#5924)
* support_lastnorm_gather_split_dev

* support_lastnorm_gather_split_dev1

* support_lastnorm_gather_split_dev3

* support_lastnorm_gather_split_dev4

* support_lastnorm_gather_split_dev5
2026-01-07 23:36:33 -08:00
GoldPancake a1fc4e249e [Bugfix] Fix mtp logprob hang problem when include stop_seq (#5927)
* fix mtp logprob hang when include stop_seq
2026-01-08 14:21:24 +08:00
lizhenyun01 2be8656c29 [BugFix] fix mtp split kv attetion (#5920)
* [BugFix] fix mtp split kv attetion

* clean code

* clean code
2026-01-07 04:07:31 -08:00
Ryan 3e74bacc5e add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op (#5847) 2026-01-07 16:17:55 +08:00
fmiao2372 1ee285c2d6 [Intel HPU] enable chunked prefill (#5903)
* [Intel HPU] enable chunked prefill

* fix bug by copilot comments
2026-01-06 21:01:50 +08:00
lizexu123 acdf0cd1d9 fix hadamard_block_size (#5888) 2026-01-06 14:12:14 +08:00
Neil Zhu 272a371635 [Metax] optimize flash attention backend (#5876) 2026-01-06 09:52:09 +08:00
lizexu123 1d3ae7c024 [BugFix] fix w4afp8 tp=8 (#5868)
* fix w4afp8 tp=8

* fix
2026-01-05 18:59:02 +08:00
ming1753 f50e1bcc16 [Others] enable use PFCC deep_ep (#5822)
* upstream deep_ep

* fix bug

* fix bug

* modify env name
2026-01-05 02:07:01 -08:00
周周周 dc13344ab8 [Optimization] add del to decrease peak memory in MoE prefill (#5863) 2026-01-05 14:01:48 +08:00
chen 193886e745 only cuda run triton op (#5846) 2025-12-31 14:17:31 +08:00
GoldPancake 4e10ae5d99 [Speculative Decoding] Optimize draft logprob (#5842)
* optimize draft logprob

* fix ut
2025-12-31 13:35:56 +08:00
chen 0bcf924e10 [Optimization] Optimization for gather_logprob by 10GB (#5817)
* opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k
2025-12-30 15:33:34 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
Ryan eb782a0225 [BugFix] Fix return value inconsistency for ep_moe_expert_combine op (#5812) 2025-12-29 16:44:00 +08:00
周周周 03363cab4c make flash_mask attention pybind (#5783) 2025-12-26 14:31:35 +08:00
Nyakku Shigure 11227e00bb [GraphOptimization] Wrap deep gemm and triton as python op (#5673)
* [GraphOptimization] Wrap deep gemm and triton as python op

* add unitest to _base_test && compatibility

* paddle.static.MetaTensor -> "paddle.static.MetaTensor"

* mv register_custom_python_op

* rename yaml

---------

Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>
2025-12-24 15:23:46 +08:00
GoldPancake 23d488c488 [Feature] Entropy calculation support (#5692)
* support entropy

* fix bug

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-23 21:19:47 +08:00
bukejiyu d1c6e57341 [Others] upgrade paddleformer to 0.4.0 (#5599) 2025-12-23 05:08:01 -08:00
RuohengMa 2c3c983b96 [XPU] modify speculate_verify (#5522) 2025-12-23 14:50:30 +08:00
Sunny-bot1 04035e4ebf support w4afp8 two stage (#5608) 2025-12-22 15:13:05 +08:00
Sunny-bot1 40f3897a4e support w4afp8 moe offline permute & load (#5613) 2025-12-22 15:12:57 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
Longzhi Wang d8587e987e [Model] tp+ep support v1_loader (#5465)
* [Model] tp+ep support v1_loader

* fix

* fix mtp_linear

* fix mtp_linear

* fix

* fix

* fix v0 loader

* fix

* Add get_tensor for ep

* fix linear weight_loader

* fix typo

* fix
2025-12-18 14:31:54 +08:00
zhupengyang 8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
fmiao2372 404cf0ece4 [Intel HPU] enable tensor_wise_fp8 (#5324)
* [Intel HPU] enable tensor_wise_fp8

* update code based on comments

* fix code style issue

* fix bug about RP 5138

* mv kv_cache modifications to HPU backend

* fix FP8 Precision Issues

* fix FP8 Precision Issues

* Add quantization UT

---------

Co-authored-by: yanfeich <yanfei.cheng@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-17 16:45:03 +08:00
freeliuzc 15f5112ecb [Speculative Decoding]Support different inferseed in speculate decoding (#5568)
* fix mtp entropy drop in RL

* optimize usage and fix unit test

* optimize padding_sampling_params speed(vectorized)

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-17 16:14:29 +08:00
RAM 6fc5eccf83 [RL] R3 Support RDMA Store (#5467)
* [RL] R3 support rdma store

* refine notes

* refine code

* disable prefix cache

* support preempted task and put cpu tensor
2025-12-16 16:50:13 +08:00
Yuanle Liu b8e4828373 [BugFix] fix dynamic c8 in v1 loader (#5562) 2025-12-15 04:07:54 -08:00
zhang-chenyi 77f8ba06e7 [Metax] fix release2.4 and support cudagraph (#5547)
Co-authored-by: xiaozude <xiaozude@outlook.com>
2025-12-15 14:23:33 +08:00
Ryan d01cb274d6 [Graph Optimization][CI] Add ERNIE45T 21B sot test (#5538)
2025-12-13 00:43:15 +08:00
Lucas 888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511) 2025-12-12 14:30:09 +08:00
Ryan 4eb55332f6 [Models] Add forward_meta to VocabParallelEmbedding of all models (#5524) 2025-12-12 14:11:31 +08:00
bukejiyu 4066dfb4a6 RL fix (#5503) 2025-12-11 19:25:27 +08:00
Ryan e58fed3665 [Graph Optimization][BugFix][CI] Fix 0size bug && add unitest (#5495) 2025-12-11 16:25:26 +08:00
周周周 ff353b922f [Others] update tbo related code (#5485)
2025-12-11 12:34:46 +08:00
Neil Zhu 4403a21d4b [Metax] refactor cutlass moe and optimize flash attention (#5361)
* [Metax] refactor moe and flash attention backend
---------

Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>
2025-12-10 17:15:17 +08:00
周周周 83a9ef51d7 [Others] add assert and only count the actual load in cuda_graph (#5445)
2025-12-10 11:22:54 +08:00
freeliuzc 53460935ec fix attention bug in spec decoding (#5460) 2025-12-10 10:56:37 +08:00
Haonan Luo e397c4fba6 [Others] remove add_bias option (#5425) 2025-12-09 17:39:35 +08:00
周周周 31410415db FA3 support qwen3 (#5441) 2025-12-09 16:16:16 +08:00
K11OntheBoat 8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-09 14:17:30 +08:00
chen 76649b45c1 [Optimization] compulte real max_logprobs in batch (#5430) 2025-12-09 14:15:05 +08:00
xiaozude c06a6234b9 [Metax] optimize mla attention (#5258) 2025-12-09 11:18:19 +08:00
Sunny-bot1 364197c4b5 support w4afp8 mtp (#5429) 2025-12-08 20:24:00 +08:00
周周周 2aea8a3a60 [Others] Remove useless code (#5404) 2025-12-08 13:59:46 +08:00
bukejiyu c3a8a16f4c fix deepseek (#5410)
2025-12-06 00:45:48 +08:00
bukejiyu f6eb4dcc40 bf16 deepseek (#5379) 2025-12-05 22:23:30 +08:00