FastDeploy

mirror of https://github.com/PaddlePaddle/FastDeploy.git synced 2026-04-23 00:17:25 +08:00

Files

T

gongweibao 8906e09e0f [Feature][OP] Add batch-invariant RMSNorm kernel and TP embedding Custom AR path (#6749 )

* [Feature] Add batch-invariant RMSNorm kernel and TP embedding Custom AR path

- Add Triton-based rms_norm_batch_invariant kernel for M-invariant RMSNorm
- Add linear/linear_v2 tracking wrappers in batch_invariant_mode
- Route TP VocabParallelEmbedding through Custom AR instead of NCCL
- Increase FD_CUSTOM_AR_MAX_SIZE_MB default from 8 to 64
- Add unit tests for RMSNorm and TP embedding invariance

* [Fix] Fix test tolerances for bfloat16 RMSNorm and custom AR buffer size

- Relax bfloat16 atol from 1e-3 to 1e-2 for D=3584 in RMSNorm numerical
  correctness test (0.0078125 diff is expected at bfloat16 precision)
- Update test_communication expected buffer size from 8MB to 64MB to match
  FD_CUSTOM_AR_MAX_SIZE_MB default change in envs.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add RMSNorm layer batch_invariant_mode unit test for coverage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add pragma no cover for Triton kernel and multi-GPU embedding path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-13 14:34:44 +08:00

test_batch_invariance_op_addmm.py

[Feature] Add Deterministic Inference Support (#6476 )

2026-02-26 19:31:51 -08:00

test_batch_invariance_op_bmm.py

[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM (#6610 )

2026-03-09 10:27:53 +08:00

test_batch_invariance_op_logsoftmax.py

[Deterministic] Move paddle version batch invariant pkg to Fastdeploy (#4763 )

2025-12-01 11:25:48 +08:00

test_batch_invariance_op_mean.py

[Deterministic] Move paddle version batch invariant pkg to Fastdeploy (#4763 )