FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

gongweibao 3fabba0dc7 [Feature] Add Triton unified attention kernel for deterministic inference (#6795)

* [Feature] Add Triton unified attention kernel for deterministic inference

Add a Triton-based unified extend attention kernel that processes both
prefix (cached) and extend (new) KV tokens through a single kernel with
unified kv_indices, ensuring identical accumulation order regardless of
cache hit/miss patterns.
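
To illustrate why a single pass over unified kv_indices fixes the accumulation order, here is a minimal pure-Python sketch of online softmax for one query. It is not the Triton kernel: the function names, the tile size `block`, and the toy data are assumptions made for illustration only.

```python
import math

def online_softmax_attention(q, keys, values, kv_indices, block=2):
    """Attend one query over the KV slots listed in kv_indices, tile by tile."""
    m = float("-inf")            # running maximum of the scores seen so far
    l = 0.0                      # running softmax denominator
    acc = [0.0] * len(values[0]) # running weighted sum of values
    scale = 1.0 / math.sqrt(len(q))
    for start in range(0, len(kv_indices), block):
        tile = kv_indices[start:start + block]
        scores = [scale * sum(a * b for a, b in zip(q, keys[i])) for i in tile]
        m_new = max(m, max(scores))
        alpha = math.exp(m - m_new)  # rescale earlier partial sums to the new max
        l *= alpha
        acc = [a * alpha for a in acc]
        for i, s in zip(tile, scores):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * v for a, v in zip(acc, values[i])]
        m = m_new
    return [a / l for a in acc]

def unified_attention(q, keys, values, prefix_idx, extend_idx):
    """Hypothetical wrapper: concatenate prefix and extend slots, then run one pass."""
    return online_softmax_attention(q, keys, values, list(prefix_idx) + list(extend_idx))

# Toy data (assumed): 5 KV slots, head dimension 2.
q = [0.3, -0.7]
keys = [[0.1 * i, 0.5 - 0.2 * i] for i in range(5)]
values = [[float(i), 1.0 - 0.25 * i] for i in range(5)]
slots = [3, 0, 2, 1, 4]  # physical slots in token order

# The same token order is processed no matter where the prefix/extend split falls.
outputs = [unified_attention(q, keys, values, slots[:k], slots[k:]) for k in range(6)]
print(all(o == outputs[0] for o in outputs))
```

Because the split point only changes which list a slot came from, not the order of the concatenated list, every floating-point operation happens in the same sequence and the outputs compare equal exactly rather than approximately.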

Key components:
- _fwd_kernel_unified: Triton JIT kernel with online softmax, paged KV
  cache support, and causal masking for prefix+extend
- Index building utilities: triton_cumsum_with_zero_prefix,
  build_kv_indices_from_block_tables, build_unified_kv_indices,
  _scatter_extend_kv_indices_kernel (all CUDA Graph compatible)
- pre_cache_len_concat_triton: GPU-only replacement for C++ op
- Reference implementations (_ref variants) for correctness validation
- Comprehensive tests: kernel correctness, split invariance,
  determinism, production-scale, cross-validation
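
As a rough picture of what the index-building utilities compute, the following pure-Python sketch builds a zero-prefixed cumulative sum and flattens paged block tables into physical KV slot indices. The `_sketch` function names, their signatures, and the slot layout `block_id * block_size + offset` are assumptions for illustration, not the signatures of the utilities listed above.

```python
def cumsum_with_zero_prefix_sketch(lengths):
    """Return [0, l0, l0+l1, ...], so entry i is the start offset of sequence i."""
    out = [0]
    for n in lengths:
        out.append(out[-1] + n)
    return out

def kv_indices_from_block_tables_sketch(block_tables, seq_lens, block_size):
    """Flatten per-sequence block tables into one list of physical KV slots.

    Assumes logical token `pos` of a sequence lives in block
    block_tables[seq][pos // block_size] at offset pos % block_size.
    """
    indptr = cumsum_with_zero_prefix_sketch(seq_lens)
    indices = [0] * indptr[-1]
    for seq, (table, n) in enumerate(zip(block_tables, seq_lens)):
        for pos in range(n):
            block_id = table[pos // block_size]
            indices[indptr[seq] + pos] = block_id * block_size + pos % block_size
    return indptr, indices

# Two sequences of lengths 3 and 2, block size 2.
indptr, indices = kv_indices_from_block_tables_sketch([[5, 2], [7]], [3, 2], block_size=2)
print(indptr)   # [0, 3, 5]
print(indices)  # [10, 11, 4, 14, 15]
```

The `indptr` array tells a kernel where each sequence's slice of `indices` begins and ends, which is what lets one kernel walk prefix and extend tokens through the same lookup path.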

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Vectorize causal mask in test references for ~26x speedup

Replace the triple-nested Python for-loop with a paddle.where-based
vectorized mask in naive_attention and _build_causal_mask. The seq4096
test runtime drops from 2m39s to 6s.
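
A minimal sketch of this kind of change, written with NumPy rather than Paddle so it runs without a GPU framework. The function names, the tensor shapes, and the rule that extend query i sits at global position prefix_len + i are assumptions; the actual test helpers may differ.

```python
import numpy as np

def causal_mask_loop(batch, q_len, prefix_len):
    """Reference: fill the additive mask one element at a time."""
    kv_len = prefix_len + q_len
    mask = np.full((batch, q_len, kv_len), -np.inf, dtype=np.float32)
    for b in range(batch):
        for i in range(q_len):
            for j in range(kv_len):
                # Query i may see every prefix key and extend keys up to itself.
                if j <= prefix_len + i:
                    mask[b, i, j] = 0.0
    return mask

def causal_mask_vectorized(batch, q_len, prefix_len):
    """Same mask from broadcast index comparisons and a single where call."""
    kv_len = prefix_len + q_len
    rows = np.arange(q_len)[:, None] + prefix_len  # global position of each query
    cols = np.arange(kv_len)[None, :]              # global position of each key
    mask2d = np.where(cols <= rows, 0.0, -np.inf).astype(np.float32)
    return np.broadcast_to(mask2d, (batch, q_len, kv_len)).copy()

print(np.array_equal(causal_mask_loop(2, 3, 2), causal_mask_vectorized(2, 3, 2)))
```

The vectorized version does the comparison once on a 2-D grid and broadcasts it over the batch, so the per-element Python overhead that dominates the loop version at long sequence lengths disappears.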

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix cover

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-16 14:29:45 +08:00

test_build_triton_indices.py

[Feature] Add Triton unified attention kernel for deterministic inference (#6795)

2026-03-16 14:29:45 +08:00

test_c16_warp1_4_determinism.py

[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM (#6610)

2026-03-09 10:27:53 +08:00

test_determinism_offline_single_gpu.py

[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM (#6610)

2026-03-09 10:27:53 +08:00

test_determinism_standalone.py

…