Commit Graph

64 Commits

Author SHA1 Message Date
ShaneGZhu 2d8338f9e4 [Optimization][DeepSeekV3.2]Reducing slot_mapping compute frequency from twice per layer to a single pre-processing step. (#7367) 2026-04-16 19:54:12 +08:00
RichardWooSJTU 420a8c1af5 fix deep gemm import (#7425) 2026-04-16 17:56:56 +08:00
GoldPancake a498720a75 [RL] Add clear_graph_opt_backend for glm4_mtp (#7378)
* add clear_grpah func

* fix spell
2026-04-15 19:44:15 +08:00
AIbin 8995a38fa4 fix dsa indexer norm to layernorm (#7398) 2026-04-15 11:42:45 +08:00
AIbin bb30f88f1a [Models] support MLA gate attention (#7404)
* support mla gate attn

* support mla gate attn
2026-04-15 11:42:34 +08:00
周周周 73bd4ab318 [Feature] 为 FusedMoE 添加 hidden_size 显式参数支持 (#7361)
[Feature] 为 FusedMoE 添加 hidden_size 显式参数支持
2026-04-13 20:24:58 +08:00
AIbin 1fb8194191 [OP][Models][Optimization] 优化 RoPE CUDA kernel 并更新 DeepSeek V3 配置 (#7359)
* dsk del prefill mask

* dsk support 1M+ seq_len rope

* update rope tests

* Replace max_position_embeddings with max_model_len

* 1D grid: gridDim.x has a maximum size of 2^31-1, far exceeding the actual number of tokens.
2026-04-13 19:12:36 +08:00
AIbin ba01d7a823 [Optimization] [OP] [Models] dsk del prefill mask (#7313)
* dsk del prefill mask

* dsk support 1M+ seq_len rope

* update rope tests
2026-04-11 19:32:27 +08:00
AIbin 48d2bbeb74 fix dsa (#7252) 2026-04-08 20:21:38 +08:00
AIbin 1090f8b123 [Models]support GLM4.7 Flash && Ernie_MLA (#7139)
* support GLM4.7 Flash && Ernie_MLA
2026-04-03 17:41:33 +08:00
Nyakku Shigure 8b6bbb3504 [Optimization] Use a separate driver when using Triton with Paddle (#6897)
---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-24 10:56:00 +08:00
AIbin bf7e2424d0 [Optimization][Feature]Supports multiple batches of DSK-DSA. (#6930)
* support DSA_MUTI_BATCH

* update test topk

* update dsk-dsa
2026-03-20 15:59:22 +08:00
AIbin 4794a28f3d opt glm5 model (#6916) 2026-03-19 11:13:33 +08:00
AIbin 9b117aafac support glm-moe-dsa model (#6863) 2026-03-18 17:21:55 +08:00
AIbin c9f7f5234e [Optimization][BugFix]Optimize Deepseek networking code (#6861)
* update dsk model

* update dsk model
2026-03-16 16:52:43 +08:00
ming1753 bb925c605f [Other] Adjust GPUModelRunner to enhance compatibility (#6851) 2026-03-16 14:49:19 +08:00
AIbin 2b8a5b0d81 update indexer model (#6791) 2026-03-13 14:11:39 +08:00
AIbin 1118351b27 [Optimization] Update Deepseekv3.2 model and dsa-indexer networking and add some unitest (#6762)
* add deepseek model doc

* update deepseek model doc

* update deepseek model doc

* update deepseek model doc

* cwb suppor DSK_V32 Model

* update DSK_V32_DSA modeling

* Ibin Support DSK_DSA

* update kernel

* update yaml

* update requirements

* update pre_commit

* update model-runner

* fix CI bug

* del start.sh

* fix iluvatar_model_runner

* update DSA & add unitest

* update import deep_gemm
2026-03-11 15:52:54 +08:00
AIbin c3aceb6bdc [Models][OP][Optimization] Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM (#6689)
* Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM
2026-03-10 15:05:14 +08:00
周周周 3cc09418f1 support dsv3 use flashmla (#6593) 2026-03-03 11:09:43 +08:00
周周周 1503443871 add dsv3 mixed deploy as EP16 TP8 (#6525) 2026-02-27 14:08:25 +08:00
bukejiyu dc5917289d [loader]supoort wint2 backend (#6139)
* support wint2

* update
2026-02-08 22:42:36 -08:00
Yuanle Liu d4a386dfc4 Revert "Revert "[TSP] last_norm allgather move to model.py (#5924)" (#5961)" (#5972)
This reverts commit 8c3513a410.
2026-01-09 15:58:22 +08:00
Yuanle Liu 8c3513a410 Revert "[TSP] last_norm allgather move to model.py (#5924)" (#5961)
This reverts commit 2bb838fed9.
2026-01-09 15:20:40 +08:00
xiaoluomi 2bb838fed9 [TSP] last_norm allgather move to model.py (#5924)
* support_lastnorm_gather_split_dev

* support_lastnorm_gather_split_dev1

* support_lastnorm_gather_split_dev3

* support_lastnorm_gather_split_dev4

* support_lastnorm_gather_split_dev5
2026-01-07 23:36:33 -08:00
周周周 83ae59431e [BugFix] fix BatchMLAWithPagedKVCacheKernel O_tmp (#5895) 2026-01-06 15:39:06 +08:00
bukejiyu d1c6e57341 [Others] upgrade paddleformer to 0.4.0 (#5599) 2025-12-23 05:08:01 -08:00
Ryan 4eb55332f6 [Models] Add forward_meta to VocabParallelEmbedding of all models (#5524) 2025-12-12 14:11:31 +08:00
xiaozude c06a6234b9 [Metax] optimize mla attention (#5258) 2025-12-09 11:18:19 +08:00
Longzhi Wang 5cd17fd662 [Models] Add forward_meta to moe models' forward function (#5138)
* [Models] Add forward_meta to moe models' forward function

* fix missing param

* fix

* fix

* fix forward_meta

* fix test and remove chunked MoE releated in config

* fix test

* fix

* fix
2025-12-04 13:26:58 +08:00
K11OntheBoat 2e1680838f [PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy

* clean some log info

---------

Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
2025-12-02 14:11:50 +08:00
周周周 51b1f13547 [Executor]move batch_id_per_token (#4853) 2025-11-14 15:38:48 +08:00
xiaozude c45b3ccb52 [Metax] optimize flash mla (#4915) 2025-11-12 16:43:46 +08:00
bukejiyu b09ebb2813 refactor pt loading (#4532)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FD Image Build (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-11-11 21:30:39 +08:00
Yuanle Liu 3dc0ffa46d [TSP] Support qwen3 moe tsp + cudagraph (#4871)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support qwen3_moe tsp mode

* fix

* fix

* update

* update

* update

* fix

* support external_rmsnorm

* update

* fix
2025-11-10 23:37:51 +08:00
xiaozude f7069b8057 [Metax] adapt DeepSeek (#4498) 2025-10-24 10:14:53 +08:00
chen b134e6afe6 [BugFix]Dev fix custom ar unstable result (#4437) 2025-10-17 11:47:16 +08:00
Yuanle Liu b455fd39f3 register_model_class compatible with plugins (#4236)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
2025-09-24 11:17:12 +08:00
chen 1a6283424e Fix noaux_tc cuda Error 700 in CUDAGraph (#4174)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
Publish Job / Run Stable Tests (push) Has been cancelled
CI Images Build / FD-Clone-Linux (push) Has been cancelled
CI Images Build / Show Code Archive Output (push) Has been cancelled
CI Images Build / CI Images Build (push) Has been cancelled
CI Images Build / BUILD_SM8090 (push) Has been cancelled
CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled
CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
CI Images Build / Run Base Tests (push) Has been cancelled
CI Images Build / Run Accuracy Tests (push) Has been cancelled
CI Images Build / Run Stable Tests (push) Has been cancelled
CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled
2025-09-23 18:41:33 +08:00
lizexu123 c86945ef49 [Feature] support pool (#3827)
* support pool

* update pooling

* add pooler_config and check

* update

* support AutoWeightsLoader load weight

* fix

* update

* delete print

* update pre-commit

* fix

* fix xpu

* fix ModelRegistry->model_registry

* fix Copilot review

* fix pooler.py

* delete StepPooler

* fix abstract

* fix default_loader_v1

* fix Pre Commit

* support torch qwen3 dense

* add test and fix torch-qwen

* fix

* fix

* adapter ci:

* fix review

* fix pooling_params.py

* fix

* fix tasks.py 2025

* fix print and logger

* Modefy ModelRegistry and delete AutoWeightsLoader

* fix logger

* fix test_embedding

* fix ci bug

* ernie4_5 model_registry

* fix test

* support Qwen3-Embedding-0.6B tp=1 load

* fix extra code

* fix

* delete fix vocab_size

* delete prepare_params_dict

* fix:
2025-09-22 14:09:09 +08:00
YuanRisheng 2e9e53ff7e [FDConfig]Remove max_num_batched_tokens/max_num_seqs in parallel config (#4116)
* remove max_num_batched_tokens in parallel config

* remove max_num_seqs

* update test case

* fix test

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
2025-09-17 10:43:35 +08:00
AIbin 41aee08982 【Inference Optimize】Update MergedReplicatedLinear for DSK qkv_a_proj_with_mqa. (#3673)
* support MergedReplicatedLinear

* update MergedReplicatedLinear to support DSK_wint4 V1_load

* update model name

* update linear class

* fix

* fix v0 moe_bias load

---------

Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
2025-09-04 21:16:05 -07:00
co63oc 5441538173 rename fused_get_rope.cu (#3752)
* rename fused_get_rope.cu

* fix

* fix typos

* fix

* fix
2025-09-03 10:54:34 +08:00
chen ce9c0917c5 [Precision] Support lm_head layer running in float32 (#3597)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
* support lm_head fp32 bf16 fp16

* support lm_head fp32 bf16 fp16

* add doc and check code

* lm_head_fp32 specify lm_head as fp32

* code check

* check doc
2025-08-27 11:34:53 +08:00
AIbin 0a0d2959b9 qkv_a_proj horizontal fusion (#3591)
Support DSK qkv_a_proj horizontal fusion under V0 Loder
2025-08-26 14:25:57 +08:00
RAM 2fa173e327 [Executor] CUDAGraph support RL training (#3265)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* add clear graph opt backend

* cuda graph support rl

* add branch

* 1.fix dynamic_weight_manager bug 2.add clear api for CasualLM

* open test case

* fix typo

* update mkdocs.yaml

* [Docs]Update mkdocs.yml

* update test case

* use unittest in graph test case
2025-08-25 20:59:30 +08:00
bukejiyu 77514e3e1e [V1 Loader] support weight_only (#3413)
CE Compile Job / ce_job_pre_check (push) Has been cancelled
CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled
CE Compile Job / FD-Clone-Linux (push) Has been cancelled
CE Compile Job / Show Code Archive Output (push) Has been cancelled
CE Compile Job / BUILD_SM8090 (push) Has been cancelled
CE Compile Job / BUILD_SM8689 (push) Has been cancelled
CE Compile Job / CE_UPLOAD (push) Has been cancelled
Deploy GitHub Pages / deploy (push) Has been cancelled
Publish Job / publish_pre_check (push) Has been cancelled
Publish Job / print_publish_pre_check_outputs (push) Has been cancelled
Publish Job / FD-Clone-Linux (push) Has been cancelled
Publish Job / Show Code Archive Output (push) Has been cancelled
Publish Job / BUILD_SM8090 (push) Has been cancelled
Publish Job / BUILD_SM8689 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled
Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled
Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled
Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled
Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled
Publish Job / Run Base Tests (push) Has been cancelled
Publish Job / Run Accuracy Tests (push) Has been cancelled
* support wint4/wint8

* delete smoe case

* update ci

* print log
2025-08-23 13:13:41 +08:00
AIbin beec24fd89 【Inference Optimize】DeepSeek-v3 model inference performance optimization (#3455)
* DSK_OPT_01

* update FA3
2025-08-19 10:42:42 +08:00
YuanRisheng 09c979f3dd [V1 Loader] Support Ernie text(moe and dense) (#3110)
* new loader support 0.3B

* fix weight

* support parallel load

* support parallel load

* fix slice

* support moe

* delete code

* perfect code

* perfect code
2025-08-14 20:25:28 +08:00
Zero Rains be94bdd0b0 [Loader V1] modify layername for DeepSeekV3 (#3336)
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
2025-08-13 15:47:06 +08:00