FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 08:21:53 +08:00

Author	SHA1	Message	Date
Yuanle Liu	8c3513a410	Revert "[TSP] last_norm allgather move to model.py (#5924 )" (#5961 ) This reverts commit `2bb838fed9`.	2026-01-09 15:20:40 +08:00
xiaoluomi	2bb838fed9	[TSP] last_norm allgather move to model.py (#5924 ) * support_lastnorm_gather_split_dev * support_lastnorm_gather_split_dev1 * support_lastnorm_gather_split_dev3 * support_lastnorm_gather_split_dev4 * support_lastnorm_gather_split_dev5	2026-01-07 23:36:33 -08:00
GoldPancake	a1fc4e249e	[Bugfix] Fix mtp logprob hang problem when include stop_seq (#5927 ) * fix mtp logprob hang when include stop_seq	2026-01-08 14:21:24 +08:00
lizhenyun01	2be8656c29	[BugFix] fix mtp split kv attetion (#5920 ) * [BugFix] fix mtp split kv attetion * clean code * clean code	2026-01-07 04:07:31 -08:00
Ryan	3e74bacc5e	add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op (#5847 )	2026-01-07 16:17:55 +08:00
fmiao2372	1ee285c2d6	[Intel HPU] enable chunked prefill (#5903 ) * [Intel HPU] enable chunked prefill * fix bug by copilot comments	2026-01-06 21:01:50 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888 )	2026-01-06 14:12:14 +08:00
Neil Zhu	272a371635	[Metax] optimize flash attention backend (#5876 )	2026-01-06 09:52:09 +08:00
lizexu123	1d3ae7c024	[BugFix] fix w4afp8 tp=8 (#5868 ) * fix w4afp8 tp=8 * fix	2026-01-05 18:59:02 +08:00
ming1753	f50e1bcc16	[Others] enable use PFCC deep_ep (#5822 ) * upstream deep_ep * fix bug * fix bug * modify env name	2026-01-05 02:07:01 -08:00
周周周	dc13344ab8	[Optimization] add del to decrease peak memory in MoE prefill (#5863 )	2026-01-05 14:01:48 +08:00
chen	193886e745	only cuda run triton op (#5846 )	2025-12-31 14:17:31 +08:00
GoldPancake	4e10ae5d99	[Speculative Decoding] Optimize draft logprob (#5842 ) * optimize draft logprob * fix ut	2025-12-31 13:35:56 +08:00
chen	0bcf924e10	[Optimization] Optimization for gather_logprob by 10GB (#5817 ) * opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k	2025-12-30 15:33:34 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
Ryan	eb782a0225	[BugFix] Fix return value inconsistency for `ep_moe_expert_combine` op (#5812 )	2025-12-29 16:44:00 +08:00
周周周	03363cab4c	make flash_mask attention pybind (#5783 )	2025-12-26 14:31:35 +08:00
Nyakku Shigure	11227e00bb	[GraphOptimization] Wrap deep gemm and triton as python op (#5673 ) * [GraphOptimization] Wrap deep gemm and triton as python op * add unitest to _base_test && compatibility * paddle.static.MetaTensor -> "paddle.static.MetaTensor" * mv register_custom_python_op * rename yaml --------- Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>	2025-12-24 15:23:46 +08:00
GoldPancake	23d488c488	[Feature] Entropy calculation support (#5692 ) * support entropy * fix bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-23 21:19:47 +08:00
bukejiyu	d1c6e57341	[Others] upgrade paddleformer to 0.4.0 (#5599 )	2025-12-23 05:08:01 -08:00
RuohengMa	2c3c983b96	[XPU] modify speculate_verify (#5522 )	2025-12-23 14:50:30 +08:00
Sunny-bot1	04035e4ebf	support w4afp8 two stage (#5608 )	2025-12-22 15:13:05 +08:00
Sunny-bot1	40f3897a4e	support w4afp8 moe offline permute & load (#5613 )	2025-12-22 15:12:57 +08:00
yzwu	ac013803f3	[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555 )	2025-12-18 02:14:25 -08:00
Longzhi Wang	d8587e987e	[Model] tp+ep support v1_loader (#5465 ) * [Model] tp+ep support v1_loader * fix * fix mtp_linear * fix mtp_linear * fix * fix * fix v0 loader * fix * Add get_tensor for ep * fix linear weight_loader * fix typo * fix	2025-12-18 14:31:54 +08:00
zhupengyang	8735cb5045	[XPU] refactor moe ffn (#5501 ) - remove BKCL_DISPATCH_ALL_GATHER - support sparse mode - support moe quant_method	2025-12-18 14:14:05 +08:00
fmiao2372	404cf0ece4	[Intel HPU] enable tensor_wise_fp8 (#5324 ) * [Intel HPU] enable tensor_wise_fp8 * update code based on comments * fix code style issue * fix bug about RP 5138 * mv kv_cache modifications to HPU backend * fix FP8 Precision Issues * fix FP8 Precision Issues * Add quantization UT --------- Co-authored-by: yanfeich <yanfei.cheng@intel.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-17 16:45:03 +08:00
freeliuzc	15f5112ecb	[Speculative Decoding]Support different inferseed in speculate decoding (#5568 ) * fix mtp entropy drop in RL * optimize usage and fix unit test * optimize padding_sampling_params speed(vectorized) --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-17 16:14:29 +08:00
RAM	6fc5eccf83	[RL] R3 Support RDMA Store (#5467 ) * [RL] R3 support rdma store * refine notes * refine code * disable prefix cache * support preempted task and put cpu tensor	2025-12-16 16:50:13 +08:00
Yuanle Liu	b8e4828373	[BugFix] fix dynamic c8 in v1 loader (#5562 )	2025-12-15 04:07:54 -08:00
zhang-chenyi	77f8ba06e7	[Metax] fix release2.4 and support cudagraph (#5547 ) Co-authored-by: xiaozude <xiaozude@outlook.com>	2025-12-15 14:23:33 +08:00
Ryan	d01cb274d6	[Graph Optimization][CI] Add ERNIE45T 21B sot test (#5538 )	2025-12-13 00:43:15 +08:00
Lucas	888c4b992d	[XPU] refactor of block_attn param 'pos_emb_type' (#5511 )	2025-12-12 14:30:09 +08:00
Ryan	4eb55332f6	[Models] Add forward_meta to VocabParallelEmbedding of all models (#5524 )	2025-12-12 14:11:31 +08:00
bukejiyu	4066dfb4a6	RL fix (#5503 )	2025-12-11 19:25:27 +08:00
Ryan	e58fed3665	[Graph Optimization][BugFix][CI] Fix 0size bug && add unitest (#5495 )	2025-12-11 16:25:26 +08:00
周周周	ff353b922f	[Others] update tbo related code (#5485 )	2025-12-11 12:34:46 +08:00
Neil Zhu	4403a21d4b	[Metax] refactor cutlass moe and optimize flash attention (#5361 ) * [Metax] refactor moe and flash attention backend --------- Co-authored-by: zhangchenyi_dl <16219492+zhangchenyidl@user.noreply.gitee.com>	2025-12-10 17:15:17 +08:00
周周周	83a9ef51d7	[Others] add assert and only count the actual load in cuda_graph (#5445 )	2025-12-10 11:22:54 +08:00
freeliuzc	53460935ec	fix attention bug in spec decoding (#5460 )	2025-12-10 10:56:37 +08:00
Haonan Luo	e397c4fba6	[Others] remove add_bias option (#5425 )	2025-12-09 17:39:35 +08:00
周周周	31410415db	FA3 support qwen3 (#5441 )	2025-12-09 16:16:16 +08:00
K11OntheBoat	8d99bac532	Remove CUDA ERROR 9 of inputs of get_padding_offset kernel (#5440 ) Co-authored-by: K11OntheBoat <“ruianmaidanglao@163.com”>	2025-12-09 14:17:30 +08:00
chen	76649b45c1	[Optimization] compulte real max_logprobs in batch (#5430 )	2025-12-09 14:15:05 +08:00
xiaozude	c06a6234b9	[Metax] optimize mla attention (#5258 )	2025-12-09 11:18:19 +08:00
Sunny-bot1	364197c4b5	support w4afp8 mtp (#5429 )	2025-12-08 20:24:00 +08:00
周周周	2aea8a3a60	[Others] Remove useless code (#5404 )	2025-12-08 13:59:46 +08:00
bukejiyu	c3a8a16f4c	fix deepseek (#5410 )	2025-12-06 00:45:48 +08:00
bukejiyu	f6eb4dcc40	bf16 deepseek (#5379 )	2025-12-05 22:23:30 +08:00


369 Commits