Commit Graph

448 Commits

Author SHA1 Message Date
周周周 e3957a5ebc [Others] remove template NUM_EXPERTS_PER_RANK in permute_x_fp8_kernel (#5620) 2026-01-04 11:21:15 +08:00
MingkunZhang f732d7d2ad [Metax] adapt prefix caching & cpu swap (#5844)
Co-authored-by: root <root@lt-wks-10-0-180-15.pub.metax-tech.com>
2025-12-31 17:02:48 +08:00
ddchenhao66 9e45ef7ca9 [XPU]MAX_BSZ aligns gpu settings and disable prefix cache in OCR VL (#5831) 2025-12-31 09:49:12 +08:00
Sunny-bot1 598d292a69 w4afp8 fix quant (#5830) 2025-12-30 21:16:13 +08:00
lizexu123 44a13e4557 [Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757)
* support

* fix

* support w4afp8 v1_loader and v0_loader

* fix

* fix test

* fix test

* fix test

* fix moe.py

* add test_ernie_4_5_w4afp8

* add test

* delete tensor

* fix test

* fix

* add

* fix test
2025-12-30 14:11:52 +08:00
Yonghua Li a8d3e3ba12 [BugFix] fix shm opened but not closed in set_data_ipc (#5826) 2025-12-29 23:35:07 +08:00
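The shm fix above (#5826) addresses the classic open-without-close leak in IPC code. The actual fix lives in FastDeploy's C++ `set_data_ipc` path; the sketch below only models the same pattern using Python's shared-memory API, with illustrative function names that are not from the repo.

```python
# Illustrative sketch only: models the open-without-close leak pattern
# fixed in set_data_ipc, using Python's shared-memory API. Function
# names are hypothetical, not FastDeploy's.
from multiprocessing import shared_memory


def set_data_leaky(name: str, payload: bytes) -> None:
    # Bug pattern: the segment handle is opened but never closed,
    # leaking one mapping/descriptor per call.
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    # missing shm.close()


def set_data_fixed(name: str, payload: bytes) -> bytes:
    # Fix pattern: always close the handle once the write is done,
    # even if the write raises.
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    try:
        shm.buf[: len(payload)] = payload
        return bytes(shm.buf[: len(payload)])
    finally:
        shm.close()   # release this process's mapping
        shm.unlink()  # remove the segment itself (demo cleanup)
```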
CSWYF3634076 9286403570 [Models] Add Qwen3-VL Model Support (#5763)
* support v1 loader

* remove useless code

* remove useless

* [Model] support Qwen3VL images success

* [Model] support Qwen3VL rope_3d

* [Model] support Qwen3VL remove log

* [Model] support Qwen3VL RL

* [Model] support Qwen3VL tp

* [Model] support Qwen3VL video

* [Model] support Qwen3VL fix ernievl

* [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds

* [Model] support Qwen3VL fix multi card

* [Model] support Qwen3VL file close

* [Model] support Qwen3VL fix ce

* [Model] support Qwen3VL fix unittest

* [Model] support Qwen3VL add unittest

---------

Co-authored-by: Ayakouji <yuhongh@qq.com>
2025-12-29 17:39:33 +08:00
周周周 a3f0696e35 [BugFix] fix compile error in sm89 (#5809) 2025-12-29 16:55:52 +08:00
Longzhi Wang 11329ee35e [Model] support mode config for expert_dispatch (#5748) 2025-12-29 13:37:20 +08:00
Ryan 09229d8953 change count_tokens_per_expert_func declaration: Tensor -> vector<Tensor> (#5794) 2025-12-26 19:02:28 +08:00
Ryan 724045c426 add some op infershape&dtype (#5762) 2025-12-26 16:17:39 +08:00
周周周 03363cab4c make flash_mask attention pybind (#5783) 2025-12-26 14:31:35 +08:00
kevin 5538dda3c8 [Feature] pd support dy-c8 ipc (#5750)
* pd support dy-c8 ipc

* update code

* support v0

* update code
2025-12-25 21:22:34 +08:00
freeliuzc 9018ccf74e [Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in pd-split mode

* add note

* fix xpu register
2025-12-25 01:54:59 -08:00
Juncai 412867fd99 [Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store

* up

* up

* add op

* fix conflict

* fix error

* up for comments

* avoid thread lock

* up

* fix unittest

* fix unittest

* remove debug info

* consider tp_size > 1

* add default rdma_nics

* add utils

* up

* fix error

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-25 16:30:35 +08:00
RuohengMa e154c03416 [XPU] refine moe_expert_ffn ut (#5743) 2025-12-25 10:35:24 +08:00
chen c7ab32d154 check (#5736) 2025-12-24 16:49:20 +08:00
周周周 922a73ddd6 [Others] clean code (#5691) 2025-12-24 11:28:47 +08:00
RuohengMa 2c3c983b96 [XPU] modify speculate_verify (#5522) 2025-12-23 14:50:30 +08:00
lizexu123 6d323769dd fix w4afp8 (#5634) 2025-12-22 13:39:41 +08:00
chen a32cb54d0b [BugFix] Fix custom_all_reduce overflow (#5662)
* check

* check

* code style
2025-12-19 18:24:21 +08:00
lizan1999 ec6811f648 support token num = 0 (#5635)
Co-authored-by: lizan1999 <lizan03@baidu.com>
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
2025-12-19 10:20:38 +08:00
yzwu ac013803f3 [Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555) 2025-12-18 02:14:25 -08:00
lizan1999 e1a9b282eb fix bug for EP+MTP (#5605)
Co-authored-by: lizan1999 <lizan03@baidu.com>
2025-12-18 14:34:54 +08:00
zhupengyang 8735cb5045 [XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
2025-12-18 14:14:05 +08:00
Yuanle Liu cdc0004894 Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611)
This reverts commit 73e1d6aa90.
2025-12-17 13:59:06 +08:00
Yuanle Liu 867803ae10 [BugFix] fix speculate_limit_thinking_content_length (#5590)
* fix speculate_limit_thinking_content_length

* update
2025-12-16 04:31:45 -08:00
chen 27ef3610b5 support glm fa3 (#5586) 2025-12-16 19:33:27 +08:00
fxyfxy777 73e1d6aa90 [Feature] add ue8m0 for per_token_quant_fp8 (#5563)
* ue8m0

* add default arg

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-16 18:40:12 +08:00
Echo-Nie 50100f98d7 [Feature] Support fusedmoe on Blackwell (#5325)
* update sm100

* fix

* fix style
2025-12-16 11:58:50 +08:00
freeliuzc 532f9ba227 [BugFix][Speculative Decoding] (Spent many days to solve) Fix write qknorm cache bug in speculative decoding (#5491)
* [liuzichang spent 10 days] fix write qknorm cache bug

* fix 'fix cachekv bug'
2025-12-15 18:27:11 +08:00
ddchenhao66 9f70f4310e [PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550)
Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-12-15 15:39:38 +08:00
chen a389bb7c5c [Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486) 2025-12-12 17:10:17 +08:00
RuohengMa 12c76f8137 [XPU] add speculate_get_logits (#5497)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

* [XPU] add speculate_get_logits

* delete context

* add ptr check

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-12 15:38:30 +08:00
Lucas 888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511) 2025-12-12 14:30:09 +08:00
Juncai d67388a479 [PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (#5514)
* Distinguish the pipelines for sending kv signal in different prefill

* up
2025-12-12 14:05:36 +08:00
cmcamdy 3c1f7b85a4 [XPU] support get hidden state for mix (#5513)
* fix get hidden states

* fix code style

* fix code style
2025-12-12 10:31:20 +08:00
FocusLuo c3aaa7e441 [BugFix] Fixed build script issue on Intel HPU platforms (#5455)
* [INTEL HPU] Fixed build script issue for non-gpu platforms

Signed-off-by: Luo, Focus <focus.luo@intel.com>

* [INTEL HPU] PR CI HPU will not use fixed version of fastdeploy_intel_hpu

Signed-off-by: Luo, Focus <focus.luo@intel.com>

---------

Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-11 16:36:37 +08:00
Neil Zhu 4403a21d4b [Metax] refactor cutlass moe and optimize flash attention (#5361)
* [Metax] refactor moe and flash attention backend
---------

Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>
2025-12-10 17:15:17 +08:00
Copilot e38709b499 [BugFix] Fix limit_thinking early return logic in CUDA kernels (#5471)
* Initial plan

* [BugFix] Fix limit_thinking bug - change AND to OR in condition checks

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

* Update Chinese comments to reflect OR logic instead of AND

Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
2025-12-10 11:03:19 +08:00
lzy 99f607eef5 [Others] Maintain the mtp branch temporarily. (#5446) 2025-12-09 19:17:53 +08:00
lizexu123 95eab9f9ee [Feature] support stop_token_ids (#5399)
* support stop_token_ids

* fix

* delete chinese

* support both

* delete print
2025-12-09 17:49:12 +08:00
xiaozude df67379bc3 [Metax] modify wrapSize to WARP_SIZE (#5442) 2025-12-09 01:44:02 -08:00
周周周 31410415db FA3 support qwen3 (#5441) 2025-12-09 16:16:16 +08:00
RuohengMa 8178e3fc6a [XPU] add speculate_step_system_cache (#5397)
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
2025-12-09 14:40:11 +08:00
K11OntheBoat 8d99bac532 Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com>
2025-12-09 14:17:30 +08:00
周周周 2aea8a3a60 [Others] Remove useless code (#5404) 2025-12-08 13:59:46 +08:00
GoldPancake 8545b705ed fix top_p_candidates (#5400)
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
2025-12-05 20:01:05 +08:00
Lucas 8f2b85362d [XPU] support moe_expert_ffn TGEMM selection (#5375) 2025-12-05 17:49:40 +08:00
Lucas 3aed8d257d [XPU] redirect xvllm/xtdk/xhpc downloading log (#5388) 2025-12-05 17:34:17 +08:00