[BugFix] Cap nvcc -t threads to avoid compilation failures on high-core machines (#6885)

* [BugFix] Cap nvcc -t threads to avoid compilation failures on high-core machines

On machines with many cores (e.g. 192), the nvcc -t flag was set to
os.cpu_count(), causing each nvcc process to spawn that many internal
threads. Combined with Paddle's ThreadPoolExecutor launching parallel
compilations (also sized by cpu_count), this led to ~28K+ threads,
resource exhaustion, and silent compilation failures. The linker then
could not find the missing .o files, but a second build would succeed
because the already-compiled objects were cached.

Cap nvcc -t at 4 to keep total parallelism reasonable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This commit is contained in:
gongweibao
2026-03-17 19:27:45 +08:00
committed by GitHub
parent cb6819d086
commit e4c9cac124
+5 -2
@@ -363,8 +363,11 @@ elif paddle.is_compiled_with_cuda():
         "-Igpu_ops",
         "-Ithird_party/nlohmann_json/include",
     ]
-    worker_threads = os.cpu_count()
-    nvcc_compile_args += ["-t", str(worker_threads)]
+    # Limit nvcc internal threads to avoid resource exhaustion when Paddle's
+    # ThreadPoolExecutor also launches many parallel compilations.
+    # Total threads ≈ (number of parallel compile jobs) × nvcc_threads, so cap nvcc_threads at 4.
+    nvcc_threads = min(os.cpu_count() or 1, 4)
+    nvcc_compile_args += ["-t", str(nvcc_threads)]
     nvcc_version = get_nvcc_version()
     print(f"nvcc_version = {nvcc_version}")
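The thread arithmetic behind the bug and the fix can be sketched as follows. This is a minimal illustration, not code from the repository: `total_compile_threads` is a hypothetical helper that models the estimate from the commit message (parallel compile jobs × nvcc internal threads).

```python
import os


def total_compile_threads(nvcc_threads, jobs=None):
    """Rough upper bound on total threads: Paddle's ThreadPoolExecutor
    runs `jobs` nvcc processes in parallel, and each nvcc process spawns
    `nvcc_threads` internal worker threads via its -t flag."""
    if jobs is None:
        jobs = os.cpu_count() or 1
    return jobs * nvcc_threads


# On a hypothetical 192-core machine, where both the job count and the
# old -t value were os.cpu_count():
before = total_compile_threads(192, jobs=192)  # 192 * 192 = 36864 threads
after = total_compile_threads(4, jobs=192)     # 192 * 4   = 768 threads
print(before, after)
```

With the cap, total parallelism stays proportional to the core count rather than its square, which is why `min(os.cpu_count() or 1, 4)` keeps the build well under OS thread limits.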