FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Author	SHA1	Message	Date
zccjjj	20de04e249	[XPU] move xpu_attn_backend.py to FastDeploy/fastdeploy/model_executor/layers/backends/xpu (#5878 )	2026-01-09 16:34:57 +08:00
Yuanle Liu	d4a386dfc4	Revert "Revert "[TSP] last_norm allgather move to model.py (#5924 )" (#5961 )" (#5972 ) This reverts commit `8c3513a410`.	2026-01-09 15:58:22 +08:00
Yuanle Liu	8c3513a410	Revert "[TSP] last_norm allgather move to model.py (#5924 )" (#5961 ) This reverts commit `2bb838fed9`.	2026-01-09 15:20:40 +08:00
GoldPancake	e41d434548	[Bugfix] Fix entropy calculation bugs (#5941 ) * fix entropy bugs	2026-01-08 20:57:35 +08:00
xiaoluomi	2bb838fed9	[TSP] last_norm allgather move to model.py (#5924 ) * support_lastnorm_gather_split_dev * support_lastnorm_gather_split_dev1 * support_lastnorm_gather_split_dev3 * support_lastnorm_gather_split_dev4 * support_lastnorm_gather_split_dev5	2026-01-07 23:36:33 -08:00
GoldPancake	a1fc4e249e	[Bugfix] Fix mtp logprob hang problem when include stop_seq (#5927 ) * fix mtp logprob hang when include stop_seq	2026-01-08 14:21:24 +08:00
FocusLuo	decbbb3933	[INTEL HPU] support only one release package of PaddleCustomDevice (#5910 ) Signed-off-by: Luo, Focus <focus.luo@intel.com>	2026-01-08 11:57:13 +08:00
CSWYF3634076	d8fcb7c07d	[Models] Add Qwen3-VL Moe Model Support (#5913 ) * [Model] add Qwen3vl moe model support * [Model] add Qwen3vl moe model support remove log * [Model] add Qwen3vl moe model support unittest	2026-01-08 11:36:42 +08:00
FocusLuo	64f910553e	[INTEL_HPU] supported ERNIE-4.5-21B-A3B-Thinking (#5891 ) ERNIE-4.5-21B-A3B-Thinking needs to use DefaultModelLoaderV1 mode reference command line: ENABLE_V1_KVCACHE_SCHEDULER=1 FD_ENC_DEC_BLOCK_NUM=8 HPU_PERF_BREAKDOWN_SYNC_MODE=1 \ HPU_WARMUP_BUCKET=0 MAX_PREFILL_NUM=1 FD_ATTENTION_BACKEND=HPU_ATTN \ python -m fastdeploy.entrypoints.openai.api_server --model \ ./models--baidu--ERNIE-4.5-21B-A3B-Thinking/snapshots/4341bb42644d5422859509fa25d41544c57181f8/ \ --port 8388 --engine-worker-queue-port 8302 --metrics-port 8301 \ --cache-queue-port 8303 --max-model-len 16384 --tensor-parallel-size 1 \ --load-choices "default_v1" --num-gpu-blocks-override 5000 --kv-cache-ratio 0.5 \ --max-num-seqs 128 --block-size 64 --no-enable-prefix-caching \ --graph-optimization-config '{"use_cudagraph":false}' Signed-off-by: Luo, Focus <focus.luo@intel.com>	2026-01-07 21:31:53 +08:00
lizhenyun01	2be8656c29	[BugFix] fix mtp split kv attetion (#5920 ) * [BugFix] fix mtp split kv attetion * clean code * clean code	2026-01-07 04:07:31 -08:00
Ryan	3e74bacc5e	add m_grouped_gemm_fp8_fp8_bf16_nt_contiguous_custom_python_op (#5847 )	2026-01-07 16:17:55 +08:00
sunxin	6ee8241521	[V1 Loader] Support loading static C8 scale JSON (#5909 ) * v1 loader: support loading static C8 scale JSON * update	2026-01-06 19:49:30 -08:00
fmiao2372	1ee285c2d6	[Intel HPU] enable chunked prefill (#5903 ) * [Intel HPU] enable chunked prefill * fix bug by copilot comments	2026-01-06 21:01:50 +08:00
周周周	83ae59431e	[BugFix] fix BatchMLAWithPagedKVCacheKernel O_tmp (#5895 )	2026-01-06 15:39:06 +08:00
lizexu123	acdf0cd1d9	fix hadamard_block_size (#5888 )	2026-01-06 14:12:14 +08:00
Neil Zhu	272a371635	[Metax] optimize flash attention backend (#5876 )	2026-01-06 09:52:09 +08:00
lizexu123	1d3ae7c024	[BugFix] fix w4afp8 tp=8 (#5868 ) * fix w4afp8 tp=8 * fix	2026-01-05 18:59:02 +08:00
ming1753	f50e1bcc16	[Others] enable use PFCC deep_ep (#5822 ) * upstream deep_ep * fix bug * fix bug * modify env name	2026-01-05 02:07:01 -08:00
cmcamdy	690d4bcdb0	[XPU] Speculative Decoding with PD (#5856 ) * [XPU] Speculative Decoding with PD * fix post process * share kv cache sender * support speculate decoding step system cache * support speculate decoding step system cache --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1512108.gajl.baidu.com>	2026-01-05 17:31:03 +08:00
周周周	dc13344ab8	[Optimization] add del to decrease peak memory in MoE prefill (#5863 )	2026-01-05 14:01:48 +08:00
chen	193886e745	only cuda run triton op (#5846 )	2025-12-31 14:17:31 +08:00
GoldPancake	4e10ae5d99	[Speculative Decoding] Optimize draft logprob (#5842 ) * optimize draft logprob * fix ut	2025-12-31 13:35:56 +08:00
chen	0bcf924e10	[Optimization] Optimization for gather_logprob by 10GB (#5817 ) * opt logprobs gather_logprob,reduce device memory usage by 10GB when token_num=8k	2025-12-30 15:33:34 +08:00
lizexu123	44a13e4557	[Feature] support w4afp8 v1_loader and v0_loader(tp>1) (#5757 ) * support * fix * support w4afp8 v1_loader and v0_loader * fix * fix test * fix test * fix test * fix moe.py * add test_ernie_4_5_w4afp8 * add test * delete tensor * fix test * fix * add * fix test	2025-12-30 14:11:52 +08:00
GoldPancake	e78e22ebd5	[BugFix] Fix entropy bugs (#5818 ) * fix entropy bugs * fix ut * fix	2025-12-29 20:44:29 -08:00
CSWYF3634076	9286403570	[Models] Add Qwen3-VL Model Support (#5763 ) * support v1 loader * remove useless code * remove useless * [Model] support Qwen3VL images success * [Model] support Qwen3VL rope_3d * [Model] support Qwen3VL remove log * [Model] support Qwen3VL RL * [Model] support Qwen3VL tp * [Model] support Qwen3VL video * [Model] support Qwen3VL fix ernievl * [Model] support Qwen3VL fix get_image_boundaries.cc array out of bounds * [Model] support Qwen3VL fix multi card * [Model] support Qwen3VL file close * [Model] support Qwen3VL fix ce * [Model] support Qwen3VL fix unittest * [Model] support Qwen3VL add unittest --------- Co-authored-by: Ayakouji <yuhongh@qq.com>	2025-12-29 17:39:33 +08:00
Ryan	eb782a0225	[BugFix] Fix return value inconsistency for `ep_moe_expert_combine` op (#5812 )	2025-12-29 16:44:00 +08:00
Nyakku Shigure	da9ea88a3b	[BugFix] Correct condition for `reversed_window_indices` in `SiglipEncoder` (#5795 )	2025-12-26 19:16:07 +08:00
周周周	03363cab4c	make flash_mask attention pybind (#5783 )	2025-12-26 14:31:35 +08:00
yzwu	7b6cc11952	[Iluvatar] Fix FD launch error when specifing CUDA_VISBLE_DEVICE (#5735 )	2025-12-26 14:01:27 +08:00
qw86972190	135e47d551	[XPU]ZMQ logprob (#5628 ) * [XPU]ZMQ logprob	2025-12-25 14:50:01 +08:00
bukejiyu	f0bbdce849	[Loader]Fix bug in MTP weight loading (#5744 ) * fix torch mtp * fix * update	2025-12-25 11:32:17 +08:00
Nyakku Shigure	11227e00bb	[GraphOptimization] Wrap deep gemm and triton as python op (#5673 ) * [GraphOptimization] Wrap deep gemm and triton as python op * add unitest to _base_test && compatibility * paddle.static.MetaTensor -> "paddle.static.MetaTensor" * mv register_custom_python_op * rename yaml --------- Co-authored-by: DrRyanHuang <zihaohuang@aliyun.com>	2025-12-24 15:23:46 +08:00
bukejiyu	ba4b7afb3a	[Others] Rename tensor_parallel_degree to tensor_model_parallel_size for paddleformers 0.4.1 (#5727 )	2025-12-23 23:19:11 -08:00
GoldPancake	23d488c488	[Feature] Entropy calculation support (#5692 ) * support entropy * fix bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-23 21:19:47 +08:00
bukejiyu	d1c6e57341	[Others] upgrade paddleformer to 0.4.0 (#5599 )	2025-12-23 05:08:01 -08:00
RuohengMa	2c3c983b96	[XPU] modify speculate_verify (#5522 )	2025-12-23 14:50:30 +08:00
bukejiyu	6c36a17369	[Others]Prevent core dumps during Paddle version check (#5657 )	2025-12-22 21:57:45 -08:00
Sunny-bot1	04035e4ebf	support w4afp8 two stage (#5608 )	2025-12-22 15:13:05 +08:00
Sunny-bot1	40f3897a4e	support w4afp8 moe offline permute & load (#5613 )	2025-12-22 15:12:57 +08:00
bukejiyu	4aa2c6871b	[RL]Support loading weights via the load_weights function for RL (#5549 ) * RL support load_weights * fix	2025-12-18 02:27:05 -08:00
yzwu	ac013803f3	[Iluvatar] Support V1_KVCACHE_SCHEDULER and paddleocr-vl rope mode (#5555 )	2025-12-18 02:14:25 -08:00
Longzhi Wang	d8587e987e	[Model] tp+ep support v1_loader (#5465 ) * [Model] tp+ep support v1_loader * fix * fix mtp_linear * fix mtp_linear * fix * fix * fix v0 loader * fix * Add get_tensor for ep * fix linear weight_loader * fix typo * fix	2025-12-18 14:31:54 +08:00
zhupengyang	8735cb5045	[XPU] refactor moe ffn (#5501 ) - remove BKCL_DISPATCH_ALL_GATHER - support sparse mode - support moe quant_method	2025-12-18 14:14:05 +08:00
fmiao2372	404cf0ece4	[Intel HPU] enable tensor_wise_fp8 (#5324 ) * [Intel HPU] enable tensor_wise_fp8 * update code based on comments * fix code style issue * fix bug about RP 5138 * mv kv_cache modifications to HPU backend * fix FP8 Precision Issues * fix FP8 Precision Issues * Add quantization UT --------- Co-authored-by: yanfeich <yanfei.cheng@intel.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-17 16:45:03 +08:00
freeliuzc	15f5112ecb	[Speculative Decoding]Support different inferseed in speculate decoding (#5568 ) * fix mtp entropy drop in RL * optimize usage and fix unit test * optimize padding_sampling_params speed(vectorized) --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>	2025-12-17 16:14:29 +08:00
Yuanle Liu	867803ae10	[BugFix] fix speculate_limit_thinking_content_length (#5590 ) * fix speculate_limit_thinking_content_length * update	2025-12-16 04:31:45 -08:00
RAM	6fc5eccf83	[RL] R3 Support RDMA Store (#5467 ) * [RL] R3 support rdma store * refine notes * refine code * disable prefix cache * support preempted task and put cpu tensor	2025-12-16 16:50:13 +08:00
Yuanle Liu	b8e4828373	[BugFix] fix dynamic c8 in v1 loader (#5562 )	2025-12-15 04:07:54 -08:00
zhang-chenyi	77f8ba06e7	[Metax] fix release2.4 and support cudagraph (#5547 ) Co-authored-by: xiaozude <xiaozude@outlook.com>	2025-12-15 14:23:33 +08:00

521 Commits