[BugFix] Cap nvcc -t threads to avoid compilation failures on high-core machines (#6885)

* [BugFix] Cap nvcc -t threads to avoid compilation failures on high-core machines

On machines with many cores (e.g. 192), the nvcc -t flag was set to
os.cpu_count(), causing each nvcc process to spawn that many internal
threads. Combined with Paddle's ThreadPoolExecutor launching parallel
compilations (also sized by cpu_count), this led to ~28K+ threads,
resource exhaustion, and silent compilation failures. The linker then
could not find the missing .o files, but a second build would succeed
because the already-compiled objects were cached.

Cap nvcc -t at 4 to keep total parallelism reasonable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: gongweibao <gognweibao@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This commit is contained in:
gongweibao
2026-03-17 19:27:45 +08:00
committed by GitHub
parent cb6819d086
commit e4c9cac124
+5 -2
@@ -363,8 +363,11 @@ elif paddle.is_compiled_with_cuda():
         "-Igpu_ops",
         "-Ithird_party/nlohmann_json/include",
     ]
-    worker_threads = os.cpu_count()
-    nvcc_compile_args += ["-t", str(worker_threads)]
+    # Limit nvcc internal threads to avoid resource exhaustion when Paddle's
+    # ThreadPoolExecutor also launches many parallel compilations.
+    # Total threads ≈ (number of parallel compile jobs) × nvcc_threads, so cap nvcc_threads at 4.
+    nvcc_threads = min(os.cpu_count() or 1, 4)
+    nvcc_compile_args += ["-t", str(nvcc_threads)]
     nvcc_version = get_nvcc_version()
     print(f"nvcc_version = {nvcc_version}")
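The thread arithmetic behind the bug and the fix can be sketched as follows. This is a minimal illustration, not code from the repository: `total_compile_threads` is a hypothetical helper that models the estimate from the commit message (parallel compile jobs × nvcc internal threads).

```python
import os


def total_compile_threads(nvcc_threads, jobs=None):
    """Rough upper bound on total threads: Paddle's ThreadPoolExecutor
    runs `jobs` nvcc processes in parallel, and each nvcc process spawns
    `nvcc_threads` internal worker threads via its -t flag."""
    if jobs is None:
        jobs = os.cpu_count() or 1
    return jobs * nvcc_threads


# On a hypothetical 192-core machine, where both the job count and the
# old -t value were os.cpu_count():
before = total_compile_threads(192, jobs=192)  # 192 * 192 = 36864 threads
after = total_compile_threads(4, jobs=192)     # 192 * 4   = 768 threads
print(before, after)
```

With the cap, total parallelism stays proportional to the core count rather than its square, which is why `min(os.cpu_count() or 1, 4)` keeps the build well under OS thread limits.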