When RUNC_USE_SYSTEMD is set, tests/rootless.sh uses
ssh -tt rootless@localhost
to run the tests as a rootless user. In this case, the local environment is not
passed to the user's ssh session (unless explicitly specified), and so
the tests do not get ROOTLESS_FEATURES.
As a result, idmap-related tests are skipped when running as rootless
with the systemd cgroup driver:
integration test (systemd driver)
...
[02] run rootless tests ... (idmap)
...
ok 286 runc run detached ({u,g}id != 0) # skip test requires rootless_idmap
...
Fix this by creating a list of environment variables needed by the
tests, and adding those to the ssh command line (in the ssh case) or
exporting them (in the sudo case) so that both cases work similarly.
Also, modify disable_idmap to unset variables set in enable_idmap so
they are not exported at all if idmap is not in features.
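A minimal sketch of the idea, not the actual tests/rootless.sh code. The
helper name env_assignments and the contents of ENV_VARS are assumptions
made for illustration only:

```shell
#!/bin/bash
# Hypothetical list of variables the tests need; the real list may differ.
ENV_VARS=(ROOTLESS_FEATURES ROOTLESS_UIDMAP_START)

# Print a shell-quoted VAR=value word for every listed variable that is set.
# Unset variables are skipped, so they are not passed on at all.
env_assignments() {
	local name
	for name in "${ENV_VARS[@]}"; do
		if [ -n "${!name+x}" ]; then
			printf '%s=%q ' "$name" "${!name}"
		fi
	done
}

ROOTLESS_FEATURES="idmap"
unset ROOTLESS_UIDMAP_START

# ssh case: the assignments would be prepended to the remote command line.
# sudo case: the same variables would be exported before calling sudo -E.
echo "remote command prefix: $(env_assignments)"
```

Because unset variables are skipped, unsetting the idmap variables when
idmap is not in the features keeps them out of both code paths.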
Fixes: bf15cc99 ("cgroup v2: support rootless systemd")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Some of the runc integration tests may do things that I would not like
when running them on my development laptop. Examples include
- changing the root mount propagation [1];
- replacing /root/runc [2];
- changing a file in /etc (see checkpoint.bats).
Yet it is totally fine to do all that in a throwaway CI environment,
or inside a Docker container.
Introduce a mechanism to skip specific "unsafe" tests unless an
environment variable, RUNC_ALLOW_UNSAFE_TESTS, is set. Use it
from a specific checkpoint/restore test which modifies
/etc/criu/default.conf.
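A rough sketch of how such a guard could look. The function names are
hypothetical, and treating an empty value as "not set" is an assumption.
In a real bats test the guard would call the bats "skip" command instead
of printing a message:

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when unsafe tests are explicitly allowed.
unsafe_tests_allowed() {
	[ -n "${RUNC_ALLOW_UNSAFE_TESTS:-}" ]
}

# Stand-in for a test body; it only prints what it would do.
modify_criu_config_test() {
	if ! unsafe_tests_allowed; then
		echo "skip: unsafe test; set RUNC_ALLOW_UNSAFE_TESTS to run it"
		return 0
	fi
	echo "running test that modifies /etc/criu/default.conf"
}

modify_criu_config_test
```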
[1]: https://github.com/opencontainers/runc/pull/5200
[2]: https://github.com/opencontainers/runc/pull/5207
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since switching to Go 1.25 in go.mod, the "detect fd leaks" test fails
like this:
> not ok 57 runc create[detect fd leak as comprehensively as possible]
> # (in test file tests/integration/create.bats, line 76)
> # `[ "$violation_found" -eq 0 ]' failed
> ...
> # Violation: FD 9 -> '/system.slice/runc-test_busybox.scope/cpu.cfs_quota_us'
> # Violation: FD 10 -> '/system.slice/runc-test_busybox.scope/cpu.cfs_period_us'
> ...
This happens because Go 1.25 adds a feature to dynamically set GOMAXPROCS
based on the current CPU quota values. This feature can be disabled by setting
GODEBUG=containermaxprocs=0,updatemaxprocs=0
but it is harmless to keep it (except for the above test failure).
Add an exception to the test case.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This adds support for the WaitKillableRecv seccomp flag
(also known as SCMP_FLTATR_CTL_WAITKILL in libseccomp and
as SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV in the kernel).
This requires:
- libseccomp >= 2.6.0
- libseccomp-golang >= 0.11.0
- linux kernel >= 5.19
Note that this flag does not make sense without NEW_LISTENER, and
the kernel returns EINVAL when SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
is set but SECCOMP_FILTER_FLAG_NEW_LISTENER is not set.
For runc this means that .linux.seccomp.listenerPath should also be set,
and some of the seccomp rules should have SCMP_ACT_NOTIFY action. This
is why the flag is tested separately in seccomp-notify.bats.
At the moment the only adequate CI environment for this functionality is
Fedora 43. On all other platforms (including CentOS 10 and Ubuntu 24.04)
the test is skipped with a message similar to this:
> ok 251 runc run [seccomp] (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) # skip requires libseccomp >= 2.6.0 and API level >= 7 (current version: 2.5.6, API level: 6)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
SCMP_ACT_KILL terminates the process with a fatal signal, which may
produce a core dump depending on the host configuration.
While this is harmless on ephemeral CI instances, it can leave unwanted
core files on developer or customer systems. It also interferes with
test environments that detect unexpected core dumps.
Signed-off-by: Ricardo Branco <rbranco@suse.de>
When parsing mount options into recAttrSet and recAttrClr,
the code sets attr_clr to individual atime flags (e.g.
MOUNT_ATTR_NOATIME or MOUNT_ATTR_STRICTATIME) when clearing
atime attributes. However, this violates the kernel's
requirement documented in mount_setattr(2)[1]:
> Note that, since the access-time values are an enumeration
> rather than bit values, a caller wanting to transition to a
> different access-time setting cannot simply specify the
> access-time setting in attr_set, but must also include
> MOUNT_ATTR__ATIME in the attr_clr field. The kernel will
> verify that MOUNT_ATTR__ATIME isn't partially set in
> attr_clr (i.e., either all bits in the MOUNT_ATTR__ATIME
> bit field are either set or clear), and that attr_set
> doesn't have any access-time bits set if MOUNT_ATTR__ATIME
> isn't set in attr_clr.
Passing only a single atime flag (e.g. MOUNT_ATTR_RELATIME) in
attr_clr causes mount_setattr() to fail with EINVAL.
This change ensures that whenever an atime mode is updated,
attr_clr includes MOUNT_ATTR__ATIME to properly reset the
entire access-time attribute field before applying the new mode.
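The rule can be illustrated with plain bit arithmetic. This is only a
sketch: the flag values are those from the Linux UAPI header
linux/mount.h, while the helper name atime_attrs is hypothetical and is
not part of runc:

```shell
#!/bin/bash
# Flag values as defined in <linux/mount.h>.
MOUNT_ATTR__ATIME=0x70       # mask covering the whole access-time field
MOUNT_ATTR_RELATIME=0x00
MOUNT_ATTR_NOATIME=0x10
MOUNT_ATTR_STRICTATIME=0x20

# Hypothetical helper: given the new atime mode, print the attr_set and
# attr_clr values that would be passed to mount_setattr(2).
atime_attrs() {
	local new_mode=$1
	# Always clear the entire atime field, never a single atime flag.
	printf 'attr_set=0x%02x attr_clr=0x%02x\n' \
		"$((new_mode))" "$((MOUNT_ATTR__ATIME))"
}

atime_attrs "$MOUNT_ATTR_NOATIME"
```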
[1] https://man7.org/linux/man-pages/man2/mount_setattr.2.html
Signed-off-by: lifubang <lifubang@acmcoder.com>
We intentionally broke this in commit d40b3439a9 ("rootfs: switch to
fd-based handling of mountpoint targets") under the assumption that most
users do not need this feature. Sadly it turns out they do, and so
commit 3f925525b4 ("rootfs: re-allow dangling symlinks in mount
targets") added a hotfix to re-add this functionality.
This patch adds some much-needed tests for this behaviour, since it
seems we are going to need to keep this for compatibility reasons (at
least until runc v2...).
Co-developed-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
On some systems (e.g., AlmaLinux 8), systemd automatically removes cgroup paths
when they become empty (i.e., contain no processes). To prevent this, we spawn
a dummy process to pin the cgroup in place.
Fix: https://github.com/opencontainers/runc/issues/5003
Signed-off-by: lifubang <lifubang@acmcoder.com>
This was always the intended behaviour but commit 72fbb34f50 ("rootfs:
switch to fd-based handling of mountpoint targets") regressed it when
adding a mechanism to create a file handle to the target if it didn't
already exist (causing the later stat to always succeed).
A lot of people depend on this functionality, so add some tests to make
sure we don't break it in the future.
Fixes: 72fbb34f50 ("rootfs: switch to fd-based handling of mountpoint targets")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This is mostly to improve readability. While at it, make the script more
robust by adding the -e option to the shell. The exception is echo $pid, which
is opportunistic and may fail depending on the order of pids in the file.
Also, remove the empty comment and a shellcheck annotation.
Fixes: c91fe9ae
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The "runc delete --force [paused container]" test case does not check
the runc pause exit code, and if such a check is added, the test fails
in rootless tests, because:
- not all rootless tests have access to cgroups;
- rootless containers do not have a default cgroup path.
To fix, add:
- setup for rootless case;
- require cgroups_freezer;
- runc pause exit code check.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In our bats tests, runc itself is a wrapper which calls the bats run helper,
so using "run runc" is wrong as it results in calling the run helper twice.
Fixes: 8d180e965
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commands that are not run via the "run" helper (cat, mkdir, __runc)
do not set $status, so it makes no sense to check it.
Fixes: 94505a04, ed548376
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a bit opinionated, but some comments in integration tests do not
really help to understand the nature of the tests being performed by
stating something very obvious, like
# run busybox detached
runc run -d busybox
To make things worse, these not-so-helpful messages are being
copy/pasted over and over, and that is the main reason to remove them.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. Remove the devicemapper driver mentions, as it is no longer
supported by docker (or podman).
2. Remove the test example -- we have plenty of real ones.
3. Add a link to (well written and extensive) bats documentation.
4. Fix capitalization in a sentence.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This removes `mips64le` (no longer supported by the image / upstream in Debian Trixie+) and adds `riscv64`.
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
The main benefit here is that when we are using the systemd cgroup driver,
we actually ask systemd to add a PID, rather than doing it ourselves.
This way, we can add a rootless exec PID to a cgroup.
This requires newer opencontainers/cgroups and coreos/go-systemd.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When a non–page-aligned value is written to memory.max, the kernel aligns it
down to the nearest page boundary. On systems with a page size greater
than 4K (e.g., 64K), this caused failures because the configured
memory.max value was not 64K aligned.
This patch fixes the issue by explicitly aligning the memory.max value
to 64K. Since 64K is also a multiple of 4K, the value is correctly
aligned on both 4K and 64K page size systems.
However, this approach will still fail on systems where the hardcoded
memory.max value is not aligned to the system page size.
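The rounding can be reproduced with integer arithmetic. A small sketch
follows; align_down is a hypothetical helper name, and the numbers are
example values rather than the ones used in the tests:

```shell
#!/bin/bash
# Round a byte count down to a multiple of the given alignment, which is
# what the kernel does with the page size when memory.max is written.
align_down() {
	local value=$1 align=$2
	echo $(( value / align * align ))
}

# A value that is not page aligned is rounded differently on 4K and 64K pages.
align_down 100000 4096
align_down 100000 65536

# A 64K-aligned value is left unchanged for both page sizes.
align_down 131072 4096
align_down 131072 65536
```

On a given host, the page size to check against can be obtained with
getconf PAGESIZE.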
Fixes: https://github.com/opencontainers/runc/issues/4841
Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
1. If the runc binary file name is not runc, the test fails as shown
below. The fix is to get the binary name from $RUNC.
✗ runc command -h
(in test file tests/integration/help.bats, line 27)
`[[ ${lines[1]} =~ runc\ checkpoint+ ]]' failed
runc-go1.25.0-main checkpoint -h (status=0):
NAME:
runc-go1.25.0-main checkpoint - checkpoint a running container
2. Simplify the test by adding a loop over all commands. While at it, add
a loop over -h and --help as well.
3. Add missing commands (create, ps, features).
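A simplified sketch of the resulting test structure, not the actual
help.bats code. The helper name expected_help_prefix, the default path,
and the shortened command list are assumptions for illustration:

```shell
#!/bin/bash
# RUNC may point to a binary with any file name; this default exists only
# so that the sketch runs on its own.
RUNC=${RUNC:-/usr/local/bin/runc-go1.25.0-main}

# Hypothetical helper: the text expected in the help output for a command,
# built from the binary's file name instead of a hardcoded "runc".
expected_help_prefix() {
	echo "$(basename "$1") $2"
}

# Hypothetical subset of commands; the real test covers more of them.
for cmd in checkpoint create ps features; do
	for flag in -h --help; do
		# A real test would run "$RUNC" "$cmd" "$flag" here and match
		# its output against the expected prefix.
		echo "$cmd $flag: expect '$(expected_help_prefix "$RUNC" "$cmd")'"
	done
done
```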
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The setup in selinux.bats assumes that the $RUNC binary name ends in runc, and
thus it fails when we run it like this:
sudo -E RUNC=$(pwd)/runc.patched bats tests/integration/selinux.bats
Fix is easy.
Fixes: b39781b06 ("tests/int: add selinux test case")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In certain deployments, it's possible for runc to be spawned by a
process with a restrictive cpumask (such as from a systemd unit with
CPUAffinity=... configured) which will be inherited by runc and thus the
container process by default.
The cpuset cgroup used to reconfigure the cpumask automatically for
joining processes, but kernel commit da019032819a ("sched: Enforce user
requested affinity") changed this behaviour in Linux 6.2.
The solution is to try to emulate the expected behaviour by resetting
our cpumask to correspond with the configured cpuset (in the case of
"runc exec", if the user did not configure an alternative one). Normally
we would have to parse /proc/stat and /sys/fs/cgroup, but luckily
sched_setaffinity(2) will transparently convert an all-set cpumask (even
if it has more entries than the number of CPUs on the system) to the
correct value for our usecase.
For some reason, in our CI it seems that rootless --systemd-cgroup
results in the cpuset (presumably temporarily?) being configured such
that sched_setaffinity(2) will allow the full set of CPUs. For this
particular case, all we care about is that it is different to the
original set, so include some special-casing (but we should probably
investigate this further...).
Reported-by: ningmingxiao <ning.mingxiao@zte.com.cn>
Reported-by: Martin Sivak <msivak@redhat.com>
Reported-by: Peter Hunt <pehunt@redhat.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Sometimes we need to run runc through some wrapper (like nohup), but
because "__runc" and "runc" are bash functions in our test suite this
doesn't work trivially -- and you cannot just pass "$RUNC" because you
need to set --root for rootless tests.
So create a setup_runc_cmdline helper which sets $RUNC_CMDLINE to the
beginning cmdline used by __runc (and switch __runc to use that).
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
"runc" was a special wrapper around bats's "run" which output some very
useful diagnostic information to the bats log, but this was not usable
for other commands. So let's make it a more generic helper that we can
use for other commands.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
openSUSE has an unfortunate default udev setup which forcefully sets all
loop devices to use the "none" scheduler, even if you manually set it.
As this is a property of the host configuration (and udev is monitoring
from the host) we cannot really change this behaviour from inside our
test container.
So we should just skip the test in this (hopefully unusual) case.
Ideally tools running the test suite should disable this behaviour when
running our test suite.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
If an error occurs during a test which sets up loopback devices, the
loopback device is not freed. Since most systems have very conservative
limits on the number of loopback devices, re-running a failing test
locally to debug it often ends up erroring out due to loopback device
exhaustion.
So let's just move the "losetup -d" to teardown, where it belongs.
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>