Nyakku Shigure
d659099415
[Cleanup] Replace torch proxy alias with public compat API (#7348)
2026-04-13 11:43:26 +08:00
Nyakku Shigure
8b6bbb3504
[Optimization] Use a separate driver when using Triton with Paddle (#6897)
...
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-24 10:56:00 +08:00
YuBaoku
0359794e08
[CI] Sync _log_softmax_batch_invariant with paddle update (#6893)
2026-03-17 23:03:57 +08:00
gongweibao
a6351dea0b
[BugFix][Optimization] Replace silent failures with catchable exceptions and informative error messages (#6533)
...
* init
* init
* fix format
* add
* add files
* add ut
* fix some
* add ut
* add more
* add
* fix pre-commit
* fix pre-commit
* fix cover
* skip long seq
* add
* add
* fix
* remove not need
* fix set attr
* fix comments
* fix comments
* fix failed tests
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-16 21:32:43 +08:00
gongweibao
8906e09e0f
[Feature][OP] Add batch-invariant RMSNorm kernel and TP embedding Custom AR path (#6749)
...
* [Feature] Add batch-invariant RMSNorm kernel and TP embedding Custom AR path
- Add Triton-based rms_norm_batch_invariant kernel for M-invariant RMSNorm
- Add linear/linear_v2 tracking wrappers in batch_invariant_mode
- Route TP VocabParallelEmbedding through Custom AR instead of NCCL
- Increase FD_CUSTOM_AR_MAX_SIZE_MB default from 8 to 64
- Add unit tests for RMSNorm and TP embedding invariance
* [Fix] Fix test tolerances for bfloat16 RMSNorm and custom AR buffer size
- Relax bfloat16 atol from 1e-3 to 1e-2 for D=3584 in RMSNorm numerical
correctness test (0.0078125 diff is expected at bfloat16 precision)
- Update test_communication expected buffer size from 8MB to 64MB to match
FD_CUSTOM_AR_MAX_SIZE_MB default change in envs.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add RMSNorm layer batch_invariant_mode unit test for coverage
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add pragma no cover for Triton kernel and multi-GPU embedding path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 14:34:44 +08:00
gongweibao
30f9f33f34
[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kernel-level Fixes and Batch-invariant BMM (#6610)
...
* add fa deter
* add ut
* add long sentence
* fix basic
* fix bugs
* fix adn
* fix first
* fix single
* fix single
* fix single test
* refine
* add more test
* refine comments
* add comments of bmm
* fix ci
* remove probe
* add
* remove not need
* refine tests
* fix comments and refine code
* refine code
* refine test
* refine test
* mv 4cards tests
* fix tests
* add
* fix comments
* fix cover
* fix cover
---------
Co-authored-by: gongweibao <gognweibao@baidu.com>
2026-03-09 10:27:53 +08:00
YuBaoku
54f7d9f621
[CI] Sync mm_batch_invariant with paddle.mm update (#6557)
2026-02-28 14:56:42 +08:00
gongweibao
edd31e8849
[Feature] Add Deterministic Inference Support (#6476)
...
* add
* [tests] Add Paddle attention determinism tests and refactor resource manager
Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* add
* add
* add
* add
* add more
* add more
* fixsome
* fixsome
* fix bugs
* fix bugs
* only in gpu
* add docs
* fix comments
* fix some
* fix some
* fix comments
* add more
* fix potential problem
* remove not need
* remove not need
* remove no need
* fix bug
* fix bugs
* fix comments
* fix comments
* Update tests/ce/deterministic/test_determinism_verification.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/inter_communicator/test_ipc_signal.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/engine/test_sampling_params_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/layers/test_paddle_attention_determinism_standalone.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix comments
* fix import error
* fix a bug
* fix bugs
* fix bugs
* fix coverage
* refine codes
* refine code
* fix comments
* fix comments
* fix comments
* rm not need
* fix allreduce large tensor bug
* mv log files
* mv log files
* add files
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-26 19:31:51 -08:00
Jundong Liu
6f42c37359
[Deterministic] Move paddle version batch invariant pkg to Fastdeploy (#4763)
...
* Move batch invariant pkg to Fastdeploy
* fix problem and pre-commit
* move test
* Change testcase to FD style
* Add testcase for log_softmax
* Add testcase for mean
* Add testcase for addmm
* fix pre-commit
* API check v0.9
* move to layers and add comment about log_softmax
* Update fastdeploy/model_executor/layers/batch_invariant_ops/batch_invariant_ops.py
Version-control leftovers in the original code comments should indeed be removed
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/batch_invariant/test_batch_invariance_op_mean.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update tests/batch_invariant/test_batch_invariance_op_logsoftmax.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update fastdeploy/model_executor/layers/batch_invariant_ops/batch_invariant_ops.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* change comment after copilot fix
* fix bug about addmm
* avoid global effect by enable_torch_proxy
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
2025-12-01 11:25:48 +08:00