When `func (h *Handle) filterModify(...)` handles a `U32` filter, it also corrects the endianness of the `Mask` and `Val` fields in the filter's `Sel.Keys`. To do so, it creates a new Keys slice and copies the values from the old one. This new slice is created with an incorrect size: the intention was likely to specify its capacity, but the length is specified instead.
The old code happens to work correctly in practice when the number of keys is a power of 2. Otherwise empty (match all) keys are added to the end to make the number a power of 2.
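The length-versus-capacity mix-up can be reproduced in isolation. The following is a minimal sketch, not the actual `filterModify` code: the `key` type and both helper functions are hypothetical, and the source slice is given 5 elements and a capacity of 8 explicitly so the result does not depend on how the Go runtime grows slices.

```go
package main

import "fmt"

// key is a hypothetical stand-in for a U32 selector key.
type key struct {
	Mask, Val uint32
}

// copyWithLen reproduces the mistake: the value meant as a capacity is
// passed as the length, so the result carries trailing zero-valued keys.
func copyWithLen(src []key) []key {
	dst := make([]key, cap(src))
	copy(dst, src)
	return dst
}

// copyWithCap keeps the original length and only reserves the extra capacity.
func copyWithCap(src []key) []key {
	dst := make([]key, len(src), cap(src))
	copy(dst, src)
	return dst
}

func main() {
	// Five keys in a backing array of eight, chosen explicitly for the demo.
	src := make([]key, 5, 8)
	for i := range src {
		src[i] = key{Mask: 0xffffffff, Val: uint32(i + 1)}
	}
	fmt.Println(len(copyWithLen(src)), len(copyWithCap(src))) // prints "8 5"
}
```

The three extra elements in the first result are zero-valued, which corresponds to the empty (match all) keys described above.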
This commit fixes the issue. It was well tested; here's an excerpt:
- Create a U32 filter with 5 Keys. The content of keys is irrelevant, only the number matters.
- Print the filter back with `tc filter show ...`.
The old behaviour:
```
filter parent ffff: protocol all pref 49150 u32 chain 0 fh 800::601 order 1537 key ht 800 bkt 0 *flowid :1 not_in_hw
match 40000000/60000000 at 0
match 07010723/ffffffff at 24
match 07450767/ffffffff at 28
match 07890733/ffffffff at 32
match 07420801/ffe00000 at 36
match 00000000/00000000 at 0
match 00000000/00000000 at 0
match 00000000/00000000 at 0
```
The last 3 entries were added by netlink.
New behaviour:
```
filter parent ffff: protocol all pref 49150 u32 chain 0 fh 800::801 order 2049 key ht 800 bkt 0 flowid :1 not_in_hw
match 60000000/f0000000 at 0
match 07010723/ffffffff at 24
match 07450767/ffffffff at 28
match 07890733/ffffffff at 32
match 07400000/ffe00000 at 36
```
Add support for geneve feature to specify source port range, see
kernel commits:
- e1f95b1992b8 ("geneve: Allow users to specify source port range")
- 5a41a00cd5d5 ("geneve, specs: Add port range to rt_link specification")
This is exactly equivalent to what is done for vxlan today.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This requirement limits the usefulness of labels (given the total label
length can only be 15 characters).
Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Some calls were already using it, some were not, but fix the remaining
ones.
Without this flag, the file descriptor would leak to the child process after
fork/exec.
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Introduce AddQueues and RemoveQueues methods for attaching and detaching
queue file descriptors to an existing TUN/TAP interface in multi-queue mode.
This enables controlled testing of disabled queues and fine-grained queue
management without relying on interface recreation.
Signed-off-by: Ivan Tsvetkov <ivanfromearth@gmail.com>
On Linux, Netlink provides NDA_CACHEINFO which carries timestamps about
when ARP/ND was updated, used, and confirmed.
Expose these fields in the Neigh type.
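For illustration, the attribute payload can be decoded as four 32-bit counters, following the layout of the kernel's `struct nda_cacheinfo`. This is a sketch under assumptions: the `cacheInfo` type and `parseCacheInfo` function are hypothetical names rather than the library's, the host is assumed to be little-endian (netlink uses host byte order), and the values are left as raw counters.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// cacheInfo mirrors the field order of the kernel's struct nda_cacheinfo.
// The Go names are hypothetical.
type cacheInfo struct {
	Confirmed uint32
	Used      uint32
	Updated   uint32
	RefCnt    uint32
}

// parseCacheInfo decodes an NDA_CACHEINFO payload.
// Assumption: little-endian host byte order.
func parseCacheInfo(b []byte) (cacheInfo, error) {
	var ci cacheInfo
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &ci)
	return ci, err
}

func main() {
	payload := []byte{
		10, 0, 0, 0, // confirmed
		20, 0, 0, 0, // used
		30, 0, 0, 0, // updated
		1, 0, 0, 0, // refcnt
	}
	ci, err := parseCacheInfo(payload)
	fmt.Printf("%+v %v\n", ci, err)
}
```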
The `RouteGetWithOptions` function currently has an `Oif` option which
gets translated from link name to link index via a `LinkByName` call.
This adds unnecessary overhead when the link index is already known.
This commit adds a new `OifIndex` option to `RouteGetWithOptions` which
can be specified instead of `Oif` to skip the internal link index
translation.
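The idea can be sketched without the library. In the following, `getOptions` and `resolveOif` are hypothetical stand-ins, the lookup function plays the role of `LinkByName`, and giving the index precedence when both fields are set is an assumption rather than documented behaviour.

```go
package main

import "fmt"

// getOptions is a hypothetical stand-in for the route-get options.
type getOptions struct {
	Oif      string // link name, needs a lookup
	OifIndex int    // link index, usable directly
}

// resolveOif returns the output interface index and calls lookup only
// when no index was supplied. Precedence of OifIndex is an assumption.
func resolveOif(o getOptions, lookup func(string) (int, error)) (int, error) {
	if o.OifIndex > 0 {
		return o.OifIndex, nil
	}
	if o.Oif == "" {
		return 0, nil // no output interface requested
	}
	return lookup(o.Oif)
}

func main() {
	calls := 0
	lookup := func(name string) (int, error) {
		calls++
		return 7, nil // pretend every name resolves to index 7
	}
	idx, _ := resolveOif(getOptions{OifIndex: 3}, lookup)
	fmt.Println(idx, calls) // prints "3 0"
	idx, _ = resolveOif(getOptions{Oif: "eth0"}, lookup)
	fmt.Println(idx, calls) // prints "7 1"
}
```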
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
When adding a route with "mtu lock <mtu>", path MTU discovery (PMTUD)
will not be tried and packets will be sent without the DF bit set. Upon
receiving an ICMP "needs frag" message due to PMTUD, the kernel will not
install a cached route and lower the MTU.
Signed-off-by: Tim Rozet <trozet@redhat.com>
binary.Read() returning non-nil means the error case, so with the current
check vxlan.Port{Low,High} are never populated. Fix the check.
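A minimal sketch of the corrected pattern follows. The `portRange` type and `parsePortRange` function are hypothetical, and the payload is assumed to be two big-endian 16-bit values (low port, then high port).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// portRange is a hypothetical view of the port range payload.
type portRange struct {
	Lo, Hi uint16
}

// parsePortRange returns the decoded range and true only when binary.Read
// succeeds, i.e. when it returns a nil error.
func parsePortRange(b []byte) (portRange, bool) {
	var pr portRange
	if err := binary.Read(bytes.NewReader(b), binary.BigEndian, &pr); err != nil {
		return portRange{}, false
	}
	return pr, true
}

func main() {
	pr, ok := parsePortRange([]byte{0x03, 0xe8, 0x07, 0xd0})
	fmt.Println(pr.Lo, pr.Hi, ok) // prints "1000 2000 true"
}
```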
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, the ConntrackDeleteFilters captures all flow entries
it fails to delete and reports them as errors. This behavior
can potentially lead to memory leaks in high-traffic systems,
where thousands of conntrack flow entries are cleared in a single
batch. With this commit, instead of returning all the un-deleted
flow entries, we now return a single error message for all of them.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
These attributes are supported since kernel v5.14 (see [1]). Here's
what iproute2 shows:
```
$ ip -d link show eth0
4: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65535 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
... parentbus virtio parentdev virtio0
```
[1]: https://github.com/torvalds/linux/commit/00e77ed8e64d5f271c1f015c7153545980d48a76
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Add deserialization of the `IFF_RUNNING` link flag which translates to
`net.FlagRunning`.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Update the Go version we test against to Go v1.22 which is currently the
oldest version still receiving security updates.
Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
Add a specific error to report that a netlink response had
NLM_F_DUMP_INTR set, indicating that the set of results may be
incomplete or inconsistent.
unix.EINTR was previously returned (with no results) when the
NLM_F_DUMP_INTR flag was set. Now, errors.Is(err, unix.EINTR) will
still work. But, this will be a breaking change for any code that's
checking for equality with unix.EINTR.
Return results with ErrDumpInterrupted. Results may be incomplete
or inconsistent, but give the caller the option of using them.
Look for NLM_F_DUMP_INTR in more places:
- linkSubscribeAt, neighSubscribeAt, routeSubscribeAt
  - can do an initial dump, which may report inconsistent results
    -> if there's an error callback, call it with ErrDumpInterrupted
- socketDiagXDPExecutor
  - makes an NLM_F_DUMP request, without using Execute()
    -> give it the same behaviour as functions that do use Execute()
Signed-off-by: Rob Murray <rob.murray@docker.com>
They were implemented using SO_SNDTIMEO/SO_RCVTIMEO on the
socket descriptor - but that doesn't work now that the socket is
non-blocking. Instead, set deadlines on the file read/write.
Signed-off-by: Rob Murray <rob.murray@docker.com>
Commit c96b03b4be changed the signature
of this method to accept a list of filters and renamed it to
ConntrackDeleteFilters (plural).
This patch:
- adds back ConntrackDeleteFilter as an alias
- marks it as deprecated in favor of the new version
- adds missing stubs for other platforms
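The alias pattern can be sketched as below. The signatures are simplified and hypothetical (the real functions take a conntrack table and address family); only the shape of the deprecated wrapper is the point.

```go
package main

import "fmt"

// flowFilter is a hypothetical, simplified filter type.
type flowFilter func(flow string) bool

// ConntrackDeleteFilters counts the flows matched by any of the filters.
// Simplified stand-in for the real function.
func ConntrackDeleteFilters(flows []string, filters ...flowFilter) uint {
	var matched uint
	for _, fl := range flows {
		for _, f := range filters {
			if f(fl) {
				matched++
				break
			}
		}
	}
	return matched
}

// ConntrackDeleteFilter handles a single filter.
//
// Deprecated: use ConntrackDeleteFilters instead.
func ConntrackDeleteFilter(flows []string, f flowFilter) uint {
	return ConntrackDeleteFilters(flows, f)
}

func main() {
	isTCP := func(flow string) bool { return flow == "tcp" }
	fmt.Println(ConntrackDeleteFilter([]string{"tcp", "udp", "tcp"}, isTCP)) // prints "2"
}
```

The `Deprecated:` paragraph in the doc comment is the convention Go tooling recognises for flagging callers of the old name.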
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>