kevin
|
5538dda3c8
|
[Feature] pd support dy-c8 ipc (#5750)
* pd support dy-c8 ipc
* update code
* support v0
* update code
|
2025-12-25 21:22:34 +08:00 |
|
freeliuzc
|
9018ccf74e
|
[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738)
* fix attn_mask_offset in mtp with multi-step and pd-split-mode
* fix xpu operator register
* update pmtp multi-step mtp strategy in d-split-mode
* add note
* fix xpu register
|
2025-12-25 01:54:59 -08:00 |
|
Juncai
|
412867fd99
|
[Feature] Support KV Cache Storage (#5571)
* Support Mooncake Store
* up
* up
* add op
* fix conflict
* fix error
* up for comments
* avoid thread lock
* up
* fix unittest
* fix unittest
* remove debug info
* consider tp_size > 1
* add default rdma_nics
* add utils
* up
* fix error
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-12-25 16:30:35 +08:00 |
|
RuohengMa
|
e154c03416
|
[XPU] refine moe_expert_ffn ut (#5743)
|
2025-12-25 10:35:24 +08:00 |
|
chen
|
c7ab32d154
|
check (#5736)
|
2025-12-24 16:49:20 +08:00 |
|
周周周
|
922a73ddd6
|
[Others] clean code (#5691)
|
2025-12-24 11:28:47 +08:00 |
|
RuohengMa
|
2c3c983b96
|
[XPU] modify speculate_verify (#5522)
|
2025-12-23 14:50:30 +08:00 |
|
lizexu123
|
6d323769dd
|
fix w4afp8 (#5634)
|
2025-12-22 13:39:41 +08:00 |
|
chen
|
a32cb54d0b
|
[BugFix] Fix custom_all_reduce overflow (#5662)
* check
* check
* code style
|
2025-12-19 18:24:21 +08:00 |
|
lizan1999
|
ec6811f648
|
support token num = 0 (#5635)
Co-authored-by: lizan1999 <lizan03@baidu.com>
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
|
2025-12-19 10:20:38 +08:00 |
|
yzwu
|
ac013803f3
|
[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555)
|
2025-12-18 02:14:25 -08:00 |
|
lizan1999
|
e1a9b282eb
|
fix bug for EP+MTP (#5605)
Co-authored-by: lizan1999 <lizan03@baidu.com>
|
2025-12-18 14:34:54 +08:00 |
|
zhupengyang
|
8735cb5045
|
[XPU] refactor moe ffn (#5501)
- remove BKCL_DISPATCH_ALL_GATHER
- support sparse mode
- support moe quant_method
|
2025-12-18 14:14:05 +08:00 |
|
Yuanle Liu
|
cdc0004894
|
Revert "[Feature] add ue8m0 for per_token_quant_fp8 (#5563)" (#5611)
This reverts commit 73e1d6aa90.
|
2025-12-17 13:59:06 +08:00 |
|
Yuanle Liu
|
867803ae10
|
[BugFix] fix speculate_limit_thinking_content_length (#5590)
* fix speculate_limit_thinking_content_length
* update
|
2025-12-16 04:31:45 -08:00 |
|
chen
|
27ef3610b5
|
support glm fa3 (#5586)
|
2025-12-16 19:33:27 +08:00 |
|
fxyfxy777
|
73e1d6aa90
|
[Feature] add ue8m0 for per_token_quant_fp8 (#5563)
* ue8m0
* add default arg
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-12-16 18:40:12 +08:00 |
|
Echo-Nie
|
50100f98d7
|
[Feature] Support fusedmoe on Blackwell (#5325)
* update sm100
* fix
* fix style
|
2025-12-16 11:58:50 +08:00 |
|
freeliuzc
|
532f9ba227
|
[BugFix][Speculative Decoding] (Spent many days to solve) Fix write qknorm cache bug in speculative decoding (#5491)
* [liuzichang spent 10 days] fix write qknorm cache bug
* fix 'fix cachekv bug'
|
2025-12-15 18:27:11 +08:00 |
|
ddchenhao66
|
9f70f4310e
|
[PD Disaggregation][XPU] update_inputs_v1 operator supports PD (#5550)
Co-authored-by: ddchenhao66 <dhaochen163.com>
|
2025-12-15 15:39:38 +08:00 |
|
chen
|
a389bb7c5c
|
[Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (#5486)
|
2025-12-12 17:10:17 +08:00 |
|
RuohengMa
|
12c76f8137
|
[XPU] add speculate_get_logits (#5497)
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_get_logits
* delete context
* add ptr check
---------
Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-12-12 15:38:30 +08:00 |
|
Lucas
|
888c4b992d
|
[XPU] refactor of block_attn param 'pos_emb_type' (#5511)
|
2025-12-12 14:30:09 +08:00 |
|
Juncai
|
d67388a479
|
[PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (#5514)
* Distinguish the pipelines for sending kv signal in different prefill
* up
|
2025-12-12 14:05:36 +08:00 |
|
cmcamdy
|
3c1f7b85a4
|
[XPU] support get hidden state for mix (#5513)
* fix get hidden states
* fix code style
* fix code style
|
2025-12-12 10:31:20 +08:00 |
|
FocusLuo
|
c3aaa7e441
|
[BugFix] Fixed build script issue on Intel HPU platforms (#5455)
* [INTEL HPU] Fixed build script issue for non-gpu platforms
Signed-off-by: Luo, Focus <focus.luo@intel.com>
* [INTEL HPU] PR CI HPU will not use fixed version of fastdeploy_intel_hpu
Signed-off-by: Luo, Focus <focus.luo@intel.com>
---------
Signed-off-by: Luo, Focus <focus.luo@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
|
2025-12-11 16:36:37 +08:00 |
|
Neil Zhu
|
4403a21d4b
|
[Metax] refactor cutlass moe and optimize flash attention (#5361)
* [Metax] refactor moe and flash attention backend
---------
Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>
|
2025-12-10 17:15:17 +08:00 |
|
Copilot
|
e38709b499
|
[BugFix] Fix limit_thinking early return logic in CUDA kernels (#5471)
* Initial plan
* [BugFix] Fix limit_thinking bug - change AND to OR in condition checks
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
* Update Chinese comments to reflect OR logic instead of AND
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: yuanlehome <23653004+yuanlehome@users.noreply.github.com>
|
2025-12-10 11:03:19 +08:00 |
|
lzy
|
99f607eef5
|
[Others] Maintain the mtp branch temporarily. (#5446)
|
2025-12-09 19:17:53 +08:00 |
|
lizexu123
|
95eab9f9ee
|
[Feature] support stop_token_ids (#5399)
* support stop_token_ids
* fix
* delete Chinese
* support both
* delete print
|
2025-12-09 17:49:12 +08:00 |
|
xiaozude
|
df67379bc3
|
[Metax] modify wrapSize to WARP_SIZE (#5442)
|
2025-12-09 01:44:02 -08:00 |
|
周周周
|
31410415db
|
FA3 support qwen3 (#5441)
|
2025-12-09 16:16:16 +08:00 |
|
RuohengMa
|
8178e3fc6a
|
[XPU] add speculate_step_system_cache (#5397)
* [XPU] add speculate_step_system_cache
* [XPU] add speculate_step_system_cache
---------
Co-authored-by: cmcamdy <1027740945@qq.com>
|
2025-12-09 14:40:11 +08:00 |
|
K11OntheBoat
|
8d99bac532
|
Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440)
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
|
2025-12-09 14:17:30 +08:00 |
|
周周周
|
2aea8a3a60
|
[Others] Remove useless code (#5404)
|
2025-12-08 13:59:46 +08:00 |
|
GoldPancake
|
8545b705ed
|
fix top_p_candidates (#5400)
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
|
2025-12-05 20:01:05 +08:00 |
|
Lucas
|
8f2b85362d
|
[XPU] support moe_expert_ffn TGEMM selection (#5375)
|
2025-12-05 17:49:40 +08:00 |
|
Lucas
|
3aed8d257d
|
[XPU] redirect xvllm/xtdk/xhpc downloading log (#5388)
|
2025-12-05 17:34:17 +08:00 |
|
cmcamdy
|
86b6430582
|
fix split_rope_cache_kv_encoder in mix mtp (#5384)
|
2025-12-05 14:33:17 +08:00 |
|
Lucas
|
7b0b6e470a
|
[XPU] support XDNN downloading function (#5365)
|
2025-12-05 11:16:45 +08:00 |
|
Nyakku Shigure
|
f88c159de1
|
[BugFix] Exit if neither modern nor legacy wheel dir is found (#5367)
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-12-04 16:45:48 +08:00 |
|
Yonghua Li
|
f4119d51b4
|
[PD Disaggregation] support DP via v1 router and decouple DP and EP (#5197)
* [fix] support DP via v1 router and decouple DP and EP
* [fix] fix scripts
* [fix] reset model path
* [fix] dp use get_output_ep, fix router port type, update scripts
* [merge] merge with latest code
* [chore] remove some debug log
* [fix] fix code style check
* [fix] fix test_multi_api_server for log_dir name
* [chore] reduce logs
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
2025-12-04 15:38:43 +08:00 |
|
周周周
|
a36d60aa18
|
[FIX BUG] fix bug in TP in permute_x_fp8_kernel (#5350)
* commit
* commit
* commit
* commit
* commit
* commit
|
2025-12-03 05:17:37 -08:00 |
|
Sunny-bot1
|
d5a9b75b4e
|
fix cutlass ep (#5337)
|
2025-12-03 14:06:01 +08:00 |
|
lzy
|
c71a44c7e5
|
supports mtp split_kv_attn (#5343)
|
2025-12-03 12:40:16 +08:00 |
|
Sunny-bot1
|
3629db4129
|
[Quantization] Support w4afp8 MoE dynamic quantization (#5282)
* support dynamic activation quant for w4afp8
* support dynamic w4afp8
* add test
* fix
* fix
---------
Co-authored-by: zhoutianzi666 <17801055074@163.com>
|
2025-12-02 18:56:16 +08:00 |
|
周周周
|
fb7f951612
|
[UNITTEST] add test (#5305)
|
2025-12-02 17:59:01 +08:00 |
|
qw86972190
|
6048ea37bd
|
[XPU] add enable_logprob (#5279)
* [XPU] Update document
* [XPU] Update documentation
* [XPU] add enable_logprob
* Fix code style issues
* “doc”
* “docs”
* “doc”
* Fix code style via pre-commit
---------
Co-authored-by: root <root@gajl-bbc-onlinec-com-1498354.gajl.baidu.com>
|
2025-12-02 15:32:28 +08:00 |
|
K11OntheBoat
|
2e1680838f
|
[PD Disaggregation] Support PD deployment of DeepSeekv3. (#5251)
* Support deepseekv3 cache transfer for PD deploy
* clean some log info
---------
Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>
|
2025-12-02 14:11:50 +08:00 |
|
chen
|
aa35ce449d
|
[Optimization] EP empty_input_forward Remove Communication (#5254)
|
2025-12-01 21:10:40 +08:00 |
|