[BugFix] Fix clear_parameters hang issue in MTP during weight cleanup in RL (#7522)

* fix mtp clear graph bugs in rl
GoldPancake
2026-04-22 15:24:01 +08:00
committed by GitHub
parent e580cf0fef
commit 68dbe71d77
+9 -2
@@ -2910,12 +2910,19 @@ class GPUModelRunner(ModelRunnerBase):
         # Clear CUDAGraph
         if self.use_cudagraph:
             self.model.clear_graph_opt_backend()
+        if (
+            self.speculative_decoding
+            and self.spec_method == SpecMethod.MTP
+            and self.graph_opt_config.draft_model_use_cudagraph
+        ):
+            self.proposer.model.clear_graph_opt_backend()
         # Clear parameters and Send single
         self.dynamic_weight_manager.clear_parameters(
             pid, self.fd_config.parallel_config.shutdown_comm_group_if_worker_idle
         )
-        if self.spec_method == SpecMethod.MTP:
-            self.proposer.model.clear_graph_opt_backend()
+        # NOTE(wangyanpeng): MTP cache must be cleared before clearing the main KV cache
+        if self.speculative_decoding and self.spec_method == SpecMethod.MTP:
             self.proposer.clear_mtp_cache()
         self.clear_cache()
         paddle.device.cuda.empty_cache()