apps/FastDeploy
Mirror of https://github.com/PaddlePaddle/FastDeploy.git, synced 2026-04-23 00:17:25 +08:00
FastDeploy/custom_ops/gpu_ops at commit f4902fe42dee8054d373f016512529d2d06d1f19
Latest commit: lizexu123, f4902fe42d, [BugFix] fix wint2 (#6109), 2026-01-20 21:46:21 +08:00
* fix * fix * fix
append_attn
[Speculative Decoding] Support MTP for GLM-4.5-Air (#6047)
2026-01-16 14:35:24 +08:00
common
…
custom_all_reduce
Support setting communication groups in custom_allreduce and the all-to-all\transpose fused operator during the decoding phase. (#5917)
2026-01-12 14:09:39 +08:00
cutlass_extensions
…
cutlass_kernels
…
flash_mask_attn
…
fp8_gemm_with_cutlass
…
glog
…
int8_gemm_with_cutlass
…
machete
…
mla_attn
[BugFix] fix BatchMLAWithPagedKVCacheKernel O_tmp (#5895)
2026-01-06 15:39:06 +08:00
moba_attn
…
moe
【Optim】Optimize grid dimensions using max_tokens_per_expert for MoE models (#6007)
2026-01-15 19:18:42 +08:00
quantization
…
sample_kernels
…
speculate_decoding
[Feature]Support tag phase token enforce generation (#6034)
2026-01-15 03:59:55 -08:00
w4afp8_gemm
[BugFix] fix wint2 (#6109)
2026-01-20 21:46:21 +08:00
wfp8afp8_sparse_gemm
…
append_attention.cu
…
beam_search_softmax.cu
…
cpp_extensions.cc
[Feature] Unify fp8 block_wise quant ops (#5991)
2026-01-15 05:50:37 -08:00
cuda_multiprocess.h
…
dequant_int8.cu
…
enforce_generation.cu
…
env.h
…
fused_get_rotary_embedding.cu
…
fused_hadamard_quant_fp8.cu
…
fused_neox_rope_embedding.cu
…
fused_rotary_position_encoding.cu
…
gather_idx.cu
…
gelu_tanh.cu
…
get_data_ptr_ipc.cu
…
get_img_boundaries.cc
…
get_mm_split_fuse.cc
…
get_output_ep.cc
…
get_output_msg_with_topk.cc
…
get_output.cc
…
get_padding_offset_system.cu
…
get_padding_offset.cu
…
get_position_ids_and_mask_encoder_batch.cu
…
helper.cu
…
helper.h
…
init_signal_layerwise.cc
…
ipc_sent_key_value_cache_by_remote_ptr.cu
[Feature] support rdma pd dy-c8 (#5788)
2026-01-07 14:55:25 +08:00
limit_thinking_content_length_v1.cu
…
limit_thinking_content_length_v2.cu
…
merge_prefill_decode_output.cu
…
msg_utils.h
…
multi_head_latent_attention.cu
…
ngram_mask.cu
…
noaux_tc_redundant.cu
…
noaux_tc.cu
…
noauxtc_kernel.h
…
open_shm_and_get_meta_signal.cc
…
per_token_quant_fp8.cu
[Feature] Unify fp8 block_wise quant ops (#5991)
2026-01-15 05:50:37 -08:00
read_data_ipc.cu
…
read_ids.py
…
read_temp_ids.py
…
reasoning_phase_token_constraint.cu
[Feature]Support tag phase token enforce generation (#6034)
2026-01-15 03:59:55 -08:00
rebuild_padding.cu
…
recover_decode_task.cu
…
remote_cache_kv_ipc.cc
…
remote_cache_kv_ipc.h
…
save_output_msg_with_topk.cc
[Optim] Robust sync status when preempted happens (#5796)
2026-01-14 12:07:33 +08:00
save_with_output_msg.cc
[Optim] Robust sync status when preempted happens (#5796)
2026-01-14 12:07:33 +08:00
save_with_output_msg.h
…
save_with_output.cc
…
scaled_gemm_f8_i4_f16_gemm.cu
…
scaled_gemm_f8_i4_f16_weight_quantize.cu
…
seqs2seqs.cu
…
set_data_ipc.cu
…
set_flags.cu
…
set_mask_value.cu
…
set_value_by_flags_and_idx.cu
…
share_external_data.cu
…
step_reschedule.cu
…
step_system_cache.cu
…
step.cu
…
stop_generation_multi_ends.cu
…
stop_generation.cu
…
swap_cache_batch.cu
…
swap_cache_layout.cu
…
swap_cache.cu
…
system2group.cu
…
text_image_gather_scatter.cu
…
text_image_index_out.cu
…
token_penalty_multi_scores.cu
[Optimization] Avoid unnecessary penalty computation (#6078)
2026-01-19 15:24:12 +08:00
token_penalty_only_once.cu
…
token_transfer.hpp
…
transfer_output.cc
…
tune_cublaslt_gemm.cu
…
unset_data_ipc.cu
…
update_attn_mask_offsets.cu
…
update_inputs_beam.cu
…
update_inputs_v1.cu
…
update_inputs.cu
…
update_split_fuse_input.cu
…