FastDeploy

Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-23 00:17:25 +08:00



lizexu123 f4902fe42d [BugFix] fix wint2 (#6109)

* fix

* fix

* fix

2026-01-20 21:46:21 +08:00

append_attn

[Speculative Decoding] Support MTP for GLM-4.5-Air (#6047)

2026-01-16 14:35:24 +08:00

common

…

custom_all_reduce

Support setting communication groups in custom_allreduce and the all-to-all\transpose fused operator during the decoding phase. (#5917)

2026-01-12 14:09:39 +08:00

cutlass_extensions

…

cutlass_kernels

…

flash_mask_attn

…

fp8_gemm_with_cutlass

…

glog

…

int8_gemm_with_cutlass

…

machete

…

mla_attn

[BugFix] fix BatchMLAWithPagedKVCacheKernel O_tmp (#5895)

2026-01-06 15:39:06 +08:00

moba_attn

…

moe

【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)

2026-01-15 19:18:42 +08:00

quantization

…

sample_kernels

…

speculate_decoding

[Feature]Support tag phase token enforce generation (#6034)

2026-01-15 03:59:55 -08:00

w4afp8_gemm

[BugFix] fix wint2 (#6109)

2026-01-20 21:46:21 +08:00

wfp8afp8_sparse_gemm

…

append_attention.cu

…

beam_search_softmax.cu

…

cpp_extensions.cc

[Feature] Unify fp8 block_wise quant ops (#5991)

2026-01-15 05:50:37 -08:00

cuda_multiprocess.h

…

dequant_int8.cu

…

enforce_generation.cu

…

env.h

…

fused_get_rotary_embedding.cu

…

fused_hadamard_quant_fp8.cu

…

fused_neox_rope_embedding.cu

…

fused_rotary_position_encoding.cu

…

gather_idx.cu

…

gelu_tanh.cu

…

get_data_ptr_ipc.cu

…

get_img_boundaries.cc

…

get_mm_split_fuse.cc

…

get_output_ep.cc

…

get_output_msg_with_topk.cc

…

get_output.cc

…

get_padding_offset_system.cu

…

get_padding_offset.cu

…

get_position_ids_and_mask_encoder_batch.cu

…

helper.cu

…

helper.h

…

init_signal_layerwise.cc

…

ipc_sent_key_value_cache_by_remote_ptr.cu

[Feature] support rdma pd dy-c8 (#5788)

2026-01-07 14:55:25 +08:00

limit_thinking_content_length_v1.cu

…

limit_thinking_content_length_v2.cu

…

merge_prefill_decode_output.cu

…

msg_utils.h

…

multi_head_latent_attention.cu

…

ngram_mask.cu

…

noaux_tc_redundant.cu

…

noaux_tc.cu

…

noauxtc_kernel.h

…

open_shm_and_get_meta_signal.cc

…

per_token_quant_fp8.cu

[Feature] Unify fp8 block_wise quant ops (#5991)

2026-01-15 05:50:37 -08:00

read_data_ipc.cu

…

read_ids.py

…

read_temp_ids.py

…

reasoning_phase_token_constraint.cu

[Feature]Support tag phase token enforce generation (#6034)

2026-01-15 03:59:55 -08:00

rebuild_padding.cu

…

recover_decode_task.cu

…

remote_cache_kv_ipc.cc

…

remote_cache_kv_ipc.h

…

save_output_msg_with_topk.cc

[Optim] Robust sync status when preempted happens (#5796)

2026-01-14 12:07:33 +08:00

save_with_output_msg.cc

[Optim] Robust sync status when preempted happens (#5796)

2026-01-14 12:07:33 +08:00

save_with_output_msg.h

…

save_with_output.cc

…

scaled_gemm_f8_i4_f16_gemm.cu

…

scaled_gemm_f8_i4_f16_weight_quantize.cu

…

seqs2seqs.cu

…

set_data_ipc.cu

…

set_flags.cu

…

set_mask_value.cu

…

set_value_by_flags_and_idx.cu

…

share_external_data.cu

…

step_reschedule.cu

…

step_system_cache.cu

…

step.cu

…

stop_generation_multi_ends.cu

…

stop_generation.cu

…

swap_cache_batch.cu

…

swap_cache_layout.cu

…

swap_cache.cu

…

system2group.cu

…

text_image_gather_scatter.cu

…

text_image_index_out.cu

…

token_penalty_multi_scores.cu

[Optimization] Avoid unnecessary penalty computation (#6078)

2026-01-19 15:24:12 +08:00

token_penalty_only_once.cu

…

token_transfer.hpp

…

transfer_output.cc

…

tune_cublaslt_gemm.cu

…

unset_data_ipc.cu

…

update_attn_mask_offsets.cu

…

update_inputs_beam.cu

…

update_inputs_v1.cu

…

update_inputs.cu

…

update_split_fuse_input.cu

…