FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
bukejiyu	12d4b4cb87	[Feature]Support reorder ids to split prefill and decodes (#5779 ) * support reorder ids * perfect code * fix * fix unittest * delete code * fix * add python api * delete custom op * update algorithm * fix swap * support condense * support condense * support mtp * delete code * update * update * update * update * update for other platfrom * update * fix * fix mtp * fix ut * update * fix ut * update ut * fix * fix encoder_cache * fix ci * fix * fix vl * Fix performance regression * fix * fix * fix mtp * fix index->req_id mapping * fix ut --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com> Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2026-02-03 00:28:02 -08:00
xiaozude	030647521a	[Metax] adapt to the latest develop (#6282 )	2026-01-29 23:21:20 -08:00
freeliuzc	ce06c6dfb3	[BugFix] Fix token_penalty kernel (#6069 ) * fix token_penalty kernel * try to fix xpu * fix xpu * fix unit test	2026-01-28 12:03:05 +08:00
sunxin	adc69c15d0	[Model Runner] Prepare token count and move FA3 initialization into the graph (#6170 ) * prepare for token num and put FA3 init in graph	2026-01-26 12:16:57 +08:00
周周周	0966df78dc	[Others] remove stop_nums (#6182 )	2026-01-26 12:12:47 +08:00
Yonghua Li	833d00e2d7	[BugFix] move cache creation back to cache transfer process and adapt clear/update (#6144 ) * [fix] move cache creation back to cache transfer process * [fix] fix clear cache * [chore] change some log level * [fix] fix clear cache * [fix] fix clear cache for blockwisefp8 and mtp * [fix] fix c8 * [fix] fix clear_mtp_cache args * [chore] update cache_transfer_manager * [fix] fix update mtp cache	2026-01-24 21:59:13 +08:00
GoldPancake	bda38aa519	[Speculative Decoding] Support MTP for GLM-4.5-Air (#6047 ) * glm mtp * add spec neox partial rope	2026-01-16 14:35:24 +08:00
xiaoluomi	62bd92f9ba	dev_fix_mtp_forward_meta (#5976 )	2026-01-10 00:40:56 +08:00
Yuanle Liu	d4a386dfc4	Revert "Revert "[TSP] last_norm allgather move to model.py (#5924 )" (#5961 )" (#5972 ) This reverts commit `8c3513a410`.	2026-01-09 15:58:22 +08:00
Yuanle Liu	8c3513a410	Revert "[TSP] last_norm allgather move to model.py (#5924 )" (#5961 ) This reverts commit `2bb838fed9`.	2026-01-09 15:20:40 +08:00
xiaoluomi	2bb838fed9	[TSP] last_norm allgather move to model.py (#5924 ) * support_lastnorm_gather_split_dev * support_lastnorm_gather_split_dev1 * support_lastnorm_gather_split_dev3 * support_lastnorm_gather_split_dev4 * support_lastnorm_gather_split_dev5	2026-01-07 23:36:33 -08:00
Yonghua Li	9fc2400e71	[BugFix] fix mtp cache attaching for pd disaggregation (#5884 ) * [fix] fix mtp cache attaching for pd disaggregation * [fix] fix test_mtp_proposer.py	2026-01-06 14:17:53 +08:00
freeliuzc	ca574119e5	support multi-step draft-model with cudagraph (#5886 )	2026-01-06 11:16:21 +08:00
cmcamdy	690d4bcdb0	[XPU] Speculative Decoding with PD (#5856 ) * [XPU] Speculative Decoding with PD * fix post process * share kv cache sender * support speculate decoding step system cache * support speculate decoding step system cache --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>	2026-01-05 17:31:03 +08:00
Yonghua Li	5e4e6692a4	[BugFix] fix cache manager not launched in case of mtp or blockwise fp8 (#5840 ) * [BugFix] fix cache manager not launched in case of mtp or blockwise fp8 * [fix] fix mtp cache in mtp.py * [fix] fix gpu ops import * [fix] fix mtp layer idx * [fix] fix xpu model runner mtp cache * [fix] fix mtp import	2026-01-04 04:35:37 -08:00
GoldPancake	4e10ae5d99	[Speculative Decoding] Optimize draft logprob (#5842 ) * optimize draft logprob * fix ut	2025-12-31 13:35:56 +08:00
freeliuzc	9018ccf74e	[Speculative Decoding] Fix attn_mask_offset for multi-step MTP in mixed and PD-split modes (#5738 ) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operater register * update pmtp multi-step mtp strategy in d-split -mode * add note * fix xpu register	2025-12-25 01:54:59 -08:00
lizan1999	e1a9b282eb	fix bug for EP+MTP (#5605 ) Co-authored-by: lizan1999 <lizan03@baidu.com>	2025-12-18 14:34:54 +08:00
Lucas	888c4b992d	[XPU] refactor of block_attn param 'pos_emb_type' (#5511 )	2025-12-12 14:30:09 +08:00
kevin	db936ab3e4	fix mtp prefix_cache dy-c8 bug (#5390 )	2025-12-05 19:03:19 +08:00
kevin	c9d7f9e7c3	[BugFix] fix async download bug (#5349 ) * fix async download bug * update log * Revert "update log" This reverts commit `5816e602f4`. * update code * fix mtp bug	2025-12-05 18:59:12 +08:00
Longzhi Wang	5cd17fd662	[Models] Add forward_meta to moe models' forward function (#5138 ) * [Models] Add forward_meta to moe models' forward function * fix missing param * fix * fix * fix forward_meta * fix test and remove chunked MoE releated in config * fix test * fix * fix	2025-12-04 13:26:58 +08:00
cmcamdy	9f4977eb74	[xpu] support mtp for xpu(mix) (#5274 ) * [XPU] support kernel for mtp(base) * [XPU] support kernel for mtp(base) * format * format * format * fix gather next token * fix step && add test * fix * mv pre/post process * add adjust batch / gather next token for mtp * fix code style * fix mtp kenrel name * fix mtp kernel test * mv xpu pre/post process * mv xpu pre/post process * [xpu] support mtp * fix code style	2025-12-01 11:03:14 +08:00
GoldPancake	cfc5b0ccf9	[BugFix] fix mtp logprob bugs in chunk prefill (#5244 ) * fix mtp logprob bugs in chunk prefill * fix * fix	2025-11-27 11:31:29 +08:00
freeliuzc	ba915e03e1	[BugFix]Fix attention mask bug in D-Node of PD-split mode (#5245 )	2025-11-26 17:56:28 +08:00
freeliuzc	214942e1ae	fix kernel output extract (#5208 )	2025-11-26 16:48:42 +08:00
freeliuzc	5c8c2d47eb	[Speculative Decoding][MTP]Update extract_mtp_weight script and optimize config (#5183 ) * update extract_mtp_model * modify config usage	2025-11-25 14:09:03 +08:00
freeliuzc	2d1dade5e2	[Speculative Decoding][MTP] Support static CacheKV C8 quantization and optimize memory usage (#5155 ) * support static cachekv c8 quantization in mtp mode * optimize memory allocation	2025-11-21 15:10:13 +08:00
ltd0924	5bf48de999	[KVCache] support unified cache backend (#4903 ) * [Feature] support unified cache backend * fix * fix * fix * fix * Update metax_model_runner.py * fix * update * Update test_moba_attention_backend.py --------- Co-authored-by: ltd0924 <luotingdan@baidu.com>	2025-11-12 14:54:52 +08:00
K11OntheBoat	76be598129	replace paddle.max by numpy to avoid useless error log (#4893 ) Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-11-11 16:28:05 +08:00
周周周	876e4a8935	remove input_ids from ForwardMeta (#4793 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-11-05 11:55:51 +08:00
freeliuzc	11398790d3	[Speculative Decoding][MTP]Support attn mask offset (#4641 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * [MTP]Merge support attn (#4591) * support mask_offset in speculate decoding * fix dummpy run output * add unit test * fix unit test import * support attn_mask_offset in mtp mode * add update_attn_mask op * fix unit test && fix code-style	2025-11-03 10:08:01 +08:00
freeliuzc	f44f4bafd1	support mtp in splitewise and scheduler_v1 mode (#4743 )	2025-11-03 10:07:15 +08:00
GoldPancake	1f3ce65b58	[Feature] support mtp distribution equivalence verification (#4699 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details	2025-10-31 11:45:04 +08:00
RAM	cd3b7cc392	[Graph Optimization] Add the CUDAGraph usage switch for Draft Model (#4601 ) * add draft model using cudagraph switch * set default as false * capture draft model in ci * fix bug	2025-10-30 11:44:50 +08:00
freeliuzc	c63361fd1d	[Speculative Decoding][MTP]Support mtp in epdptp mode (#4614 ) * support mtp many features * support mtp reshard in rl mode * fix function * support mtp ep * support mtp in hybird-dp-tp mode * default open scheduler_v1 in mtp	2025-10-28 16:02:47 +08:00
GoldPancake	47595a2480	[Feature] support mtp logprob (#4464 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * support mtp logprob * fix unitest	2025-10-20 15:18:12 +08:00
RAM	528c55776e	[Graph Optimization][Speculative Decoding] Fix the bug of CUDAGraph + MTP + EP (#4456 ) * Fix MTP dummy run bug * Target Model and Draft Model using the same flag * In mtp replace use_cudagraph as step_use_cudagraph	2025-10-20 10:38:55 +08:00
YuanRisheng	0355235fb9	[FDConfig]Remove total_block_num/dtype/block_size/enc_dec_block_num in ParallelConfig (#4400 ) * delete some attr in parallel config * delete comment --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-16 20:00:37 +08:00
Sunny-bot1	a751d977bc	[Optimization] Fuse get_max_len and get_kv_max_len (#4369 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * opt split_q_block * fuse max_lens and max kv_len	2025-10-13 20:35:00 +08:00
YuanRisheng	a2ec2c4152	[FDConfig]Remove max_model_len in FDConfig (#4350 ) * modify max_model_len * fix unittest * fix unittest --------- Co-authored-by: root <root@yqlcc01-sys-rpm12rzmwjd.yqlcc01.baidu.com>	2025-10-11 14:04:17 +08:00
RAM	aa27b03bc0	[Executor]CUDAGraph support Speculate Decode (#3769 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * success run ngram * Revert "[Code Simplification] remove cum_offsets (#3410)" This reverts commit `32b39620bc`. * success run ngram5 tp4 42bs * success run ngram5 tp4 42bs * mtp draft commit * add decorator for target model * enable draft model in cudagraph v0.5 * revert revrt cum_offset * enable target model in cudagraph v0.9 And clean debug code * Revert "success run ngram" This reverts commit `8351e83993`. * add reverted code * enable target model in cudagraph v0.9 * solve comment * fix bid < 0 * Enable Target Model Padding And Draft Model in cudagraph * solve problem * delete rebuild padding debug note * fast compile * Add capture list for mtp * success run 256 tp1 mtp * Enable Lite TP2 Bsz256 * realy enable tp2 bsz 256 * fix problem * Solve problem for Draft model in cudagraph * Solve comment * replace emptytensor as zeros * Solve comments * Revert "fast compile" This reverts commit `834639a7ff`. * fix bug * fix merge bug * fix typo * fix bug --------- Co-authored-by: lizexu <2694294196@qq.com> Co-authored-by: littledgg <1658565283@qq.com> Co-authored-by: zeroRains <linjunlu@zerorains.top> Co-authored-by: gongshaotian <gstain5555@outlook.com>	2025-10-09 21:18:29 +08:00
YuanRisheng	24180fba0a	[FDConfig]Remove splitwise_role and engine_worker_queue_port in FDConfig (#4147 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details * remove splitwise_role and engine_worker_queue_port * fix xpu * fix xpu * fix xpu * fix unittest * resolve conflct	2025-09-19 17:01:52 +08:00
freeliuzc	46911f903d	[MTP]update hybrid-mtp-with-ngram (#4047 )	2025-09-15 17:13:31 +08:00
AIbin	a7392a0ff9	【Inference Optimize】DeepSeek-V3-model MLA Optimize (#3886 ) * support MLA chunk_size auto search & cuda_graph	2025-09-11 10:46:09 +08:00
Yuanle Liu	c3b2a60fb8	[BugFix] Fix the abnormal memory usage caused by shape errors in the triton moe backend (#4026 ) * fix device_id to in * fix triton_moe bug	2025-09-09 20:05:54 -07:00
Jundong Liu	3d0aaa5923	[Excutor] Experiment Feature-Support Prefill in cudagraph (#3459 ) * Support prefill in Cudagraph * Refactor GetBlockShapeAndSplitKVBlock Kernel V2 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.1 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.2 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.3 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.4 * Refactor GetBlockShapeAndSplitKVBlock Kernel V2.5 * Solve problem about encoder_num_blocks_x_cpu * Add early-exit mechanism for attention kernel * fix test case about append-attention * Update testcode, Add annotations to related tensors * move get_input_length_list * solve test_code * Add annotations about early-exit for attention kernel * Add annotations about early-exit for attention kernel2 * solve comment * solve mtp --------- Co-authored-by: RAM <gstian5555@outlook.com>	2025-09-08 13:12:24 +08:00
freeliuzc	88d44a2c93	support mtp in v1_scheduler mode (#3695 ) CE Compile Job / ce_job_pre_check (push) Has been cancelled Details CE Compile Job / print_ce_job_pre_check_outputs (push) Has been cancelled Details CE Compile Job / FD-Clone-Linux (push) Has been cancelled Details CE Compile Job / Show Code Archive Output (push) Has been cancelled Details CE Compile Job / BUILD_SM8090 (push) Has been cancelled Details CE Compile Job / BUILD_SM8689 (push) Has been cancelled Details CE Compile Job / CE_UPLOAD (push) Has been cancelled Details Deploy GitHub Pages / deploy (push) Has been cancelled Details Publish Job / publish_pre_check (push) Has been cancelled Details Publish Job / print_publish_pre_check_outputs (push) Has been cancelled Details Publish Job / FD-Clone-Linux (push) Has been cancelled Details Publish Job / Show Code Archive Output (push) Has been cancelled Details Publish Job / BUILD_SM8090 (push) Has been cancelled Details Publish Job / BUILD_SM8689 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled Details Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled Details Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details Publish Job / Run Base Tests (push) Has been cancelled Details Publish Job / Run Accuracy Tests (push) Has been cancelled Details Publish Job / Run Stable Tests (push) Has been cancelled Details CI Images Build / FD-Clone-Linux (push) Has been cancelled Details CI Images Build / Show Code Archive Output (push) Has been cancelled Details CI Images Build / CI Images Build (push) Has been cancelled Details CI Images Build / BUILD_SM8090 (push) Has been cancelled Details CI Images Build / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details CI Images Build / Run FastDeploy LogProb Tests (push) Has been cancelled Details CI Images Build / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details CI Images Build / Run Base Tests (push) Has been cancelled Details CI Images Build / Run Accuracy Tests (push) Has been cancelled Details CI Images Build / Run Stable Tests (push) Has been cancelled Details CI Images Build / Publish Docker Images Pre Check (push) Has been cancelled Details	2025-09-04 17:39:59 +08:00
freeliuzc	52eda7fdb3	[Feature][MTP]support new speculative decoding method named hybrid mtp with ngram (#3610 )	2025-08-26 14:29:22 +08:00
lizexu123	32b39620bc	[Code Simplification] remove cum_offsets (#3410 ) Deploy GitHub Pages / deploy (push) Has been cancelled Details Publish Job / publish_pre_check (push) Has been cancelled Details Publish Job / print_publish_pre_check_outputs (push) Has been cancelled Details Publish Job / FD-Clone-Linux (push) Has been cancelled Details Publish Job / Show Code Archive Output (push) Has been cancelled Details Publish Job / BUILD_SM8090 (push) Has been cancelled Details Publish Job / BUILD_SM8689 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8090 (push) Has been cancelled Details Publish Job / PADDLE_PYPI_UPLOAD_8689 (push) Has been cancelled Details Publish Job / Run FastDeploy Unit Tests and Coverage (push) Has been cancelled Details Publish Job / Run FastDeploy LogProb Tests (push) Has been cancelled Details Publish Job / Extracted partial CE model tasks to run in CI. (push) Has been cancelled Details Publish Job / Run Base Tests (push) Has been cancelled Details Publish Job / Run Accuracy Tests (push) Has been cancelled Details	2025-08-18 20:21:25 +08:00

1 2

75 Commits