Files
ice/candidate_server_reflexive.go
T
sirzooro ceccc15191 Fix race and disconnection after candidate replace (#913)
This PR fixes two race conditions introduced by peer-reflexive candidate
replacement and addresses a connectivity regression where an agent could
transition from Connected to Disconnected/Failed shortly after
replacement.

## Problems

1. Data race in candidate pair access:

- `replaceRedundantPeerReflexiveCandidates` updated `pair.Remote` in
place.
- At the same time, hot-path reads (for example `CandidatePair.Write`)
could read `pair.Remote` without synchronization.

2. Data race in remote candidate cache map:

- `candidateBase.handleInboundPacket` (non-STUN path) read/wrote
`remoteCandidateCaches` from recv loop goroutines.
- Concurrent writes also happened from agent-loop code
(`replaceRemoteCandidateCacheValues`).

3. Connectivity regression after replacement:

- When replacing a selected pair's remote candidate (prflx -> signaled),
the new candidate could have zero `LastReceived`/`LastSent`.
- `validateSelectedPair` uses `selectedPair.Remote.LastReceived()`,
which could cause quick `Connected -> Disconnected -> Failed`
transitions.

## Root Cause

Replacement logic preserved pair priority but mutated shared objects in
place and did not preserve candidate activity timestamps across
candidate object replacement.

## Fixes

### 1) Replace candidate pair objects instead of mutating `pair.Remote`

- Added helper `replacePairRemote(pair, remote)` to clone pair state
into a new `CandidatePair`.
- Preserved pair fields and runtime stats:
  - `id`, role, state, nomination flags, binding counters.
  - RTT fields, packet/byte counters, request/response counters.
  - timestamp `atomic.Value` fields.
- Replaced references in:
  - `a.checklist[i]`
  - `a.pairsByID[pair.id]`
- If old pair was selected, published replacement via
`a.setSelectedPair(replacement)`.

### 2) Serialize remote cache access on agent loop

- Updated non-STUN path in `candidateBase.handleInboundPacket`:
- Cache validation, remote candidate lookup, `seen(false)`, and cache
insert now run inside `agent.loop.Run(...)`.
- This removes concurrent map read/write between recv loop and
agent-loop updates.

### 3) Preserve candidate activity on replacement

- Added `copyCandidateActivity(dst, src)` to transfer:
  - `LastReceived`
  - `LastSent`
- Applied before replacing references so selected-pair liveness checks
remain stable.
2026-04-21 06:27:53 +02:00

68 lines
1.6 KiB
Go

// SPDX-FileCopyrightText: 2026 The Pion community <https://pion.ly>
// SPDX-License-Identifier: MIT
package ice
import (
"net"
"net/netip"
)
// CandidateServerReflexive ...
type CandidateServerReflexive struct {
candidateBase
}
// CandidateServerReflexiveConfig is the config required to create a new CandidateServerReflexive.
type CandidateServerReflexiveConfig struct {
CandidateID string
Network string
Address string
Port int
Component uint16
Priority uint32
Foundation string
RelAddr string
RelPort int
}
// NewCandidateServerReflexive creates a new server reflective candidate.
func NewCandidateServerReflexive(config *CandidateServerReflexiveConfig) (*CandidateServerReflexive, error) {
ipAddr, err := netip.ParseAddr(config.Address)
if err != nil {
return nil, err
}
networkType, err := determineNetworkType(config.Network, ipAddr)
if err != nil {
return nil, err
}
candidateID := config.CandidateID
if candidateID == "" {
candidateID = globalCandidateIDGenerator.Generate()
}
return &CandidateServerReflexive{
candidateBase: candidateBase{
id: candidateID,
networkType: networkType,
candidateType: CandidateTypeServerReflexive,
address: config.Address,
port: config.Port,
resolvedAddr: &net.UDPAddr{
IP: ipAddr.AsSlice(),
Port: config.Port,
Zone: ipAddr.Zone(),
},
component: config.Component,
foundationOverride: config.Foundation,
priorityOverride: config.Priority,
relatedAddress: &CandidateRelatedAddress{
Address: config.RelAddr,
Port: config.RelPort,
},
},
}, nil
}