Commit Graph

39 Commits

Author SHA1 Message Date
AIbin 1090f8b123 [Models] support GLM4.7 Flash && Ernie_MLA (#7139)
* support GLM4.7 Flash && Ernie_MLA
2026-04-03 17:41:33 +08:00
cmcamdy 7a2e33098f [XPU] Refactor pre process (#6993)
* [XPU] support speculate_pre_process

* merge develop

* fix codestyle

* fix mtp, support cu_seqlens_q_output

* fix mtp, support cu_seqlens_q_output

* fix test

---------

Co-authored-by: lizan1999 <lizan03@baidu.com>
2026-04-01 20:29:55 +08:00
sunxin c29e86fc9d [Feature] Support mtp overlap schedule (#7001) 2026-04-01 14:24:26 +08:00
ming1753 bb925c605f [Other] Adjust GPUModelRunner to enhance compatibility (#6851) 2026-03-16 14:49:19 +08:00
cmcamdy 7591e0d6bc fix eb5 mtp(mix) (#6800) 2026-03-13 17:36:57 +08:00
AIbin c3aceb6bdc [Models][OP][Optimization] Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM (#6689)
* Support DeepSeek-v3.2 model, integrate DSA & Indexer architecture with FlashMLA/DeepGEMM
2026-03-10 15:05:14 +08:00
Ryan 5e78c1ac87 [Graph Optimization] Support CUDAGraph for P/PD mixed Batch using SOT subgraph splitting mode (#6196)
* refine comment && refine variable name

* replace comment
2026-01-29 16:29:54 +08:00
yinwei 1e3c35496c [XPU][Graph Optimization] XPU Support CUDAGraph (#6152)
* support cuda graph
2026-01-22 14:41:56 +08:00
jackyYang6 988e0bc338 [Feature] Add PaddleFormers fallback backend (#5999)
* feat(paddleformers): add dense text model fallback backend

* docs(paddleformers): add user guide and fix code review issues

* add fallback unit test

* precommit format

* fix pre-commit

* fix: address code review feedback

* docs: add PaddleFormers backend documentation (EN) and simplify installation

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2026-01-19 21:50:50 +08:00
fmiao2372 1ee285c2d6 [Intel HPU] enable chunked prefill (#5903)
* [Intel HPU] enable chunked prefill

* fix bug by copilot comments
2026-01-06 21:01:50 +08:00
fmiao2372 404cf0ece4 [Intel HPU] enable tensor_wise_fp8 (#5324)
* [Intel HPU] enable tensor_wise_fp8

* update code based on comments

* fix code style issue

* fix bug about PR 5138

* mv kv_cache modifications to HPU backend

* fix FP8 Precision Issues

* fix FP8 Precision Issues

* Add quantization UT

---------

Co-authored-by: yanfeich <yanfei.cheng@intel.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-17 16:45:03 +08:00
Lucas 888c4b992d [XPU] refactor of block_attn param 'pos_emb_type' (#5511) 2025-12-12 14:30:09 +08:00
Ryan e58fed3665 [Graph Optimization][BugFix][CI] Fix 0size bug && add unitest (#5495) 2025-12-11 16:25:26 +08:00
RAM b2908b8e82 [New][RL] Support Rollout Routing Replay (#5405)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)"

This reverts commit c45e064f3d.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 22:06:26 +08:00
Jiang-Jia-Jun c45e064f3d Revert "[RL] Support Rollout Routing Replay (#5321)" (#5402)
This reverts commit 96d2d4877b.
2025-12-05 20:19:39 +08:00
RAM 96d2d4877b [RL] Support Rollout Routing Replay (#5321)
* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add routing replay ci

* support glm topk

* support other top_k

* fix ci bug

* pre-commit

* only support chatcmpl

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
2025-12-05 20:01:33 +08:00
Longzhi Wang 5cd17fd662 [Models] Add forward_meta to moe models' forward function (#5138)
* [Models] Add forward_meta to moe models' forward function

* fix missing param

* fix

* fix

* fix forward_meta

* fix test and remove chunked-MoE-related config

* fix test

* fix

* fix
2025-12-04 13:26:58 +08:00
ddchenhao66 e70e2279ce [PD Disaggregation][XPU] Add XPU support for PD disaggregation (#5113)
* [XPU] xpu support PD disaggregation

* [XPU] fix the issue of cache KV transfer process startup failure on non-zero XPU cards

* [XPU] xpu support PD disaggregation in v1 scheduler

---------

Co-authored-by: ddchenhao66 <dhaochen163.com>
2025-11-21 14:09:01 +08:00
Yonghua Li 43097a512a [BugFix] [PD Disaggregation] fix v1 scheduler prefill node profile run & ipc transfer protocol (#5132)
* [fix] fix v1 scheduler profile run for append attention in prefill node

* [fix] skip send_signal if kv signal not inited for gpu and xpu

* [fix] extend fix to flash_attn & mla_attn

* [fix] fix v1 pd run in ipc transfer protocol

* [ci] add test for v1 pd profile run using ipc transfer protocol

* [style] fix code style check

* [style] fix code style again

* [fix] fix profile run

* [update] remove --num-gpu-blocks-override in example script

* [chore] rename forward_meta is_profiling to is_dummy_or_profile_run
2025-11-20 21:39:22 +08:00
周周周 876e4a8935 remove input_ids from ForwardMeta (#4793)
2025-11-05 11:55:51 +08:00
JYChen 83d45af1f3 fix import image_ops error on some platforms (#4559) 2025-10-24 16:09:20 +08:00
Sunny-bot1 a751d977bc [Optimization] Fuse get_max_len and get_kv_max_len (#4369)
* opt split_q_block

* fuse max_lens and max kv_len
2025-10-13 20:35:00 +08:00
yinwei 20c7b741f4 [XPU] Support W4A8C8-TP4-300B Model (#4068)
* support w4a8

* delete ep block attn

* delete moe_topk_select

* update note

* update

* delete useless info

* update

* add some note

* fix some format

* update scale info

* add ans baseline

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-10-10 15:41:32 +08:00
RAM aa27b03bc0 [Executor] CUDAGraph support Speculate Decode (#3769)
* success run ngram

* Revert "[Code Simplification] remove cum_offsets (#3410)"

This reverts commit 32b39620bc.

* success run ngram5 tp4 42bs

* success run ngram5 tp4 42bs

* mtp draft commit

* add decorator for target model

* enable draft model in cudagraph v0.5

* revert revert cum_offsets

* enable target model in cudagraph v0.9 And clean debug code

* Revert "success run ngram"

This reverts commit 8351e83993.

* add reverted code

* enable target model in cudagraph v0.9

* solve comment

* fix bid < 0

* Enable Target Model Padding And Draft Model in cudagraph

* solve problem

* delete rebuild padding debug note

* fast compile

* Add capture list for mtp

* success run 256 tp1 mtp

* Enable Lite TP2 Bsz256

* really enable tp2 bsz 256

* fix problem

* Solve problem for Draft model in cudagraph

* Solve comment

* replace empty tensor with zeros

* Solve comments

* Revert "fast compile"

This reverts commit 834639a7ff.

* fix bug

* fix merge bug

* fix typo

* fix bug

---------

Co-authored-by: lizexu <2694294196@qq.com>
Co-authored-by: littledgg <1658565283@qq.com>
Co-authored-by: zeroRains <linjunlu@zerorains.top>
Co-authored-by: gongshaotian <gstain5555@outlook.com>
2025-10-09 21:18:29 +08:00
Lucas 87179cb744 [XPU] support XPU VL model inference (#4030)
* [XPU] support XPU VL model inference

* fix image op import and device check

* rebase develop

* fix perf
2025-09-25 14:34:15 +08:00
fmiao2372 f1b5392e20 [Intel HPU] Support intel hpu platform (#4161)
* [Intel HPU] Support intel hpu platform

* fix some issues

* apply precommit and move AttentionBackend_HPU

* fix format issue

* correct ops import

* fix ci issue

* update code in layers

* fix code style issue

* remove dense tp moe ep mode

* fix enc_dec_block_num

* fix rebase issue

* rename hpu to gaudi in readme

* rename ForwardMeta_HPU to HPUForwardMeta
2025-09-24 12:27:50 +08:00
AIbin a7392a0ff9 【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886)
* support MLA chunk_size auto search & cuda_graph
2025-09-11 10:46:09 +08:00
Jundong Liu 3d0aaa5923 [Executor] Experimental Feature: Support Prefill in cudagraph (#3459)
* Support prefill in Cudagraph

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4

* Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5

* Solve problem about encoder_num_blocks_x_cpu

* Add early-exit mechanism for attention kernel

* fix test case about append-attention

* Update testcode, Add annotations to related tensors

* move get_input_length_list

* solve test_code

* Add annotations about early-exit for attention kernel

* Add annotations about early-exit for attention kernel2

* solve comment

* solve mtp

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-09-08 13:12:24 +08:00
lifulll 72094d4d82 enable dcu ci (#3402) 2025-08-29 10:23:08 +08:00
Yuanle Liu 4957908275 add input_processor plugin (#3657)
* add input_processor plugin

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update
2025-08-28 22:53:57 +08:00
Jundong Liu ea4a3b479c [Executor] Increase buffer size to prevent address corruption; add forward metadata debug tool (#3404)
* Fix insufficient buffer allocation; add a tool for printing forward metadata

* fix mistake

* Make CPU tensor in CPUPlace

* Add test about forward_meta_str and Add unitest_requirement

---------

Co-authored-by: RAM <gstian5555@outlook.com>
2025-08-18 16:14:09 +08:00
Kane2011 b4fef2cf29 [MetaxGPU] Support FastDeploy on metax gpu (#3241)
* [MetaxGPU] Support FastDeploy on metax gpu

* Update metax_worker.py

1. change worker log;
2. remove custom allreduce, adapt it later;
3. remove cuda graph;

* Update __init__.py

1. remove metax's key word comment

* Update __init__.py

1. remove metax's key word comment;
2. add fused_moe_kernel_paddle import

---------

Co-authored-by: yongqiangma <xing.wo@163.com>
2025-08-13 11:11:54 +08:00
RAM d850660872 [Executor] Refactor GetBlockShapeAndSplitKVBlock Kernel (#2989)
* reset decoder_block_shape_q buffer

* refactor GetBlockShapeAndSplitKVBlock Kernel and cudagraph padding batch

* update decode_max_tile_size

* fix pre-commit

* update block_multihead_attn_backend

* update flash attn backend

* update MLA Attention

* update XPU Attention

* update gcu,iluvatar model runner

* Update MTP

* fix MTP bug
2025-07-31 00:09:31 +08:00
Yuanle Liu 2f74e93d7e use dist.all_reduce(min) to sync num_blocks_local (#2933)
* pre-commit all files check

* reduce min num_blocks_local

* fix nranks=1

* pre-commit when commit-msg
2025-07-21 01:23:36 -07:00
周周周 8c5407d9e4 remove cum_offsets from ForwardMeta (#2925)
2025-07-19 23:57:27 +08:00
Zero Rains 25698d56d1 polish code with new pre-commit rule (#2923) 2025-07-19 23:19:27 +08:00
周周周 ddb10ac509 [Inference, rename] remove padding_offsets from atten, use batch_id_per_token (#2880)
* remove padding_offsets from atten
2025-07-17 18:41:31 +08:00
RAM 0fad10b35a [Executor] CUDA Graph support padding batch (#2844)
* cuda graph support padding batch

* Integrate the startup parameters for the graph optimization backend and provide support for user-defined capture sizes.

* Do not insert max_num_seqs when the user specifies a capture list

* Support set graph optimization config from YAML file

* update cuda graph ci

* fix ci bug

* fix ci bug
2025-07-15 19:49:01 -07:00
littledgg 59071268b6 [Executor] Move forward_meta.py to fastdeploy/model_executor (#2774)
* Use PEP 563 in attention.py and fix conflict

* merge commit

* Change what was left out last time
2025-07-10 20:36:51 +08:00