[FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 (#6997)

Most single-GPU and small-model deployments do not need 64 MB custom all-reduce buffers. Lowering the default to 8 MB reduces unnecessary shared memory allocation. Tests that require larger buffers now explicitly set the value.

Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 17:49:42 +08:00 · 2026-03-25 17:40:01 +08:00
parent 7a6c28781b
commit 48cfb608aa
2 changed files with 3 additions and 3 deletions
@@ -143,7 +143,7 @@ def _module_env():
        {
            "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES", "0,1,2,3"),
            "FD_DETERMINISTIC_MODE": "1",
-            "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "57"),
+            "FD_CUSTOM_AR_MAX_SIZE_MB": os.environ.get("FD_CUSTOM_AR_MAX_SIZE_MB", "64"),
            "FLAGS_max_partition_size": _CHUNK_SIZE_FOR_TEST,
        }
    ):
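
The hunk above relies on `os.environ.get` with a test-specific fallback, so a value exported by the caller still wins over the hard-coded one. A minimal sketch of that lookup, assuming the value is a whole number of megabytes; the helper name `resolve_custom_ar_max_size_mb` and its `fallback` parameter are hypothetical and not part of the project:

```python
import os


def resolve_custom_ar_max_size_mb(env=None, fallback="64"):
    """Return the custom all-reduce buffer size in MB.

    Hypothetical helper for illustration only. It mirrors the
    os.environ.get(...) call in the test: an explicitly exported value
    takes precedence, otherwise the test's own fallback is used.
    """
    env = os.environ if env is None else env
    return int(env.get("FD_CUSTOM_AR_MAX_SIZE_MB", fallback))


if __name__ == "__main__":
    # With nothing exported, the test fallback applies.
    print(resolve_custom_ar_max_size_mb(env={}))
    # An exported value (here the new 8 MB default) overrides the fallback.
    print(resolve_custom_ar_max_size_mb(env={"FD_CUSTOM_AR_MAX_SIZE_MB": "8"}))
```

Keeping the fallback inside the test means the test no longer depends on the library-wide default, which is the point of setting the value explicitly after the default was lowered.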