zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2026-04-24 00:30:44 +08:00

Author	SHA1	Message	Date
Sebastiaan van Stijn	ba83c7c7d7	libcontainer/devices: add '//go:fix inline' directives

This allows users to automatically migrate to the new location using `go fix`. It has some limitations, but can help smoothen the transition; for example, taking this file:

```
package main

import (
	"github.com/opencontainers/runc/libcontainer/devices"
)

func main() {
	_, _ = devices.DeviceFromPath("a", "b")
	_, _ = devices.HostDevices()
	_, _ = devices.GetDevices("a")
}
```

Running `go fix -mod=readonly ./...` will migrate the code:

```
package main

import (
	devices0 "github.com/moby/sys/devices"
)

func main() {
	_, _ = devices0.DeviceFromPath("a", "b")
	_, _ = devices0.HostDevices()
	_, _ = devices0.GetDevices("a")
}
```

updates `b345c78dca`

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2026-04-04 19:36:43 +02:00
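As a rough illustration of the kind of shim this commit describes, the sketch below marks a deprecated wrapper with `//go:fix inline` so that tooling can rewrite callers to the replacement. Everything here lives in one file so that it runs on its own; `NewHostDevices` and the returned paths are hypothetical stand-ins, not the real moby/sys/devices API.

```go
package main

import "fmt"

// NewHostDevices stands in for the function at its new location.
// (Hypothetical name and return value, used only for this sketch.)
func NewHostDevices() []string {
	return []string{"/dev/null", "/dev/zero"}
}

// HostDevices is the compatibility shim kept at the old location.
//
// Deprecated: use NewHostDevices instead.
//
//go:fix inline
func HostDevices() []string {
	// The body is a single forwarding call, which is what lets a
	// fixer replace each call site with the call to the new function.
	return NewHostDevices()
}

func main() {
	// Old call sites keep working until they are migrated.
	fmt.Println(HostDevices())
}
```

The directive is an ordinary comment to the compiler, so the shim behaves identically whether or not a fixer is ever run.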
Aleksa Sarai	b345c78dca	libct/devices: deprecate in favour of moby/sys/devices The libcontainer/devices package has been moved to moby/sys/devices, so we can just point users to that and keep some compatibility shims around until runc 1.6. We don't use it at all so there are no other changes needed. Signed-off-by: Aleksa Sarai <aleksa@amutable.com>	2026-04-02 22:54:14 +11:00
lfbzhm	5b094ed1ac	libct: use preopened rootfs more This uses preopened rootfs in Chdir and pivotRoot. While at it, add O_PATH when opening oldroot in pivotRoot. Co-authored-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: lfbzhm <lifubang@acmcoder.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-29 12:02:38 -07:00
Kir Kolyshkin	28cb321887	Pre-open container root directory A lot of filesystem-related stuff happens inside the container root directory, and we have used its name before. It makes sense to pre-open it and use an os.File handle instead. Function names in internal/pathrs are kept as is for simplicity (and it is an internal package), but they now accept root as os.File. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-29 12:02:36 -07:00
Kir Kolyshkin	78b80677f6	libct: minor refactor in mountToRootfs No change in functionality, just a preparation for the next patch. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-28 23:48:07 -07:00
Kir Kolyshkin	60352524d3	libct: mountCgroupV1: address TODO Indeed, it does not make sense to prepend c.root once we started using MkdirAllInRoot in commit `63c29081`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-28 23:48:07 -07:00
Aleksa Sarai	7b40afb6cc	merge #5177 into opencontainers/runc:main Li Fubang (3): test: check mount source fds are cleaned up with idmapped mounts libct: close mount source fd as soon as possible libct: add a nil check for mountError LGTMs: kolyshkin rata cyphar	2026-03-28 17:32:21 +11:00
Kir Kolyshkin	f00b2f9fd5	libct/exeseal: drop own F_SEAL_EXEC Since golang.org/x/sys@v0.22 it is available from unix. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-20 15:57:55 -07:00
lifubang	c77e71a3e7	libct: close mount source fd as soon as possible This commit factors out setupAndMountToRootfs without changing any logic. Use "Hide whitespace changes" during review to focus on the actual changes. The refactor ensures the mount source file descriptor is closed via defer in each loop iteration, reducing the total number of open FDs in runc. This helps avoid hitting the file descriptor limit under high concurrency or when handling many mounts. Signed-off-by: lifubang <lifubang@acmcoder.com>	2026-03-20 01:09:49 +00:00
lifubang	0d0fd95731	libct: add a nil check for mountError Signed-off-by: lifubang <lifubang@acmcoder.com>	2026-03-19 15:47:32 +00:00
Kir Kolyshkin	0079bee17f	Support specs.LinuxSeccompFlagWaitKillableRecv

This adds support for WaitKillableRecv seccomp flag (also known as SCMP_FLTATR_CTL_WAITKILL in libseccomp and as SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV in the kernel).

This requires:
- libseccomp >= 2.6.0
- libseccomp-golang >= 0.11.0
- linux kernel >= 5.19

Note that this flag does not make sense without NEW_LISTENER, and the kernel returns EINVAL when SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV is set but SECCOMP_FILTER_FLAG_NEW_LISTENER is not set.

For runc this means that .linux.seccomp.listenerPath should also be set, and some of the seccomp rules should have SCMP_ACT_NOTIFY action. This is why the flag is tested separately in seccomp-notify.bats.

At the moment the only adequate CI environment for this functionality is Fedora 43. On all other platforms (including CentOS 10 and Ubuntu 24.04) it is skipped similar to this:

> ok 251 runc run [seccomp] (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) # skip requires libseccomp >= 2.6.0 and API level >= 7 (current version: 2.5.6, API level: 6)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-16 10:48:42 -07:00
Kir Kolyshkin	d2abe47689	libct/configs: exclude Relabel from json [un]marshaling When deprecating Relabel field, its json attributes were mistakenly removed, so now it is: - saved to JSON under "Relabel" (rather than "relabel"); - won't be ignored if empty. Let's fix it before it's too late. Fixes: `8b2b5e94` ("libct: remove relabeling dead code") Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-10 14:13:11 -07:00
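The JSON behaviour this commit refers to can be reproduced with three small structs: a field with no tag, a field with a lowercase `omitempty` tag, and a field excluded with `json:"-"`. The struct names below are illustrative only and are not the runc types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untagged has no json tag: the key is the Go field name and
// empty values are still written out.
type untagged struct {
	Relabel string
}

// tagged uses a lowercase key and drops the field when it is empty.
type tagged struct {
	Relabel string `json:"relabel,omitempty"`
}

// excluded is skipped by both Marshal and Unmarshal.
type excluded struct {
	Relabel string `json:"-"`
}

// marshal is a small helper that returns the JSON text or the error.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(untagged{}))             // {"Relabel":""}
	fmt.Println(marshal(tagged{}))               // {}
	fmt.Println(marshal(excluded{Relabel: "z"})) // {}
}
```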
Aleksa Sarai	5f3ac16d18	merge #5152 into opencontainers/runc:main Kir Kolyshkin (1): libct: remove relabeling dead code LGTMs: cyphar rata	2026-03-08 00:16:03 +09:00
Rodrigo Campos Catelin	2db0c5e8b1	Merge pull request #5155 from cyphar/intelrdt-improve-mkdir libct: intelrdt: improve directory cleanup logic	2026-03-06 14:40:40 +01:00
Aleksa Sarai	1c35df9ea2	merge #5153 into opencontainers/runc:main Kir Kolyshkin (1): Revert "Preventing containers from being unable to be deleted" LGTMs: cyphar rata	2026-03-06 18:48:53 +09:00
Aleksa Sarai	fbaf5e3161	libct: intelrdt: improve directory cleanup logic It makes more sense to save whether we should cleanup the directory after it gets created (to avoid error cases deleting a different directory) as well as tying this check to the existing os.ErrExist check rather than doing an extra stat(2). Fixes: `e2baa3ad10` ("Intel RDT: update according to spec changes.") Suggested-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2026-03-06 18:45:36 +09:00
Kir Kolyshkin	5996fe143a	Revert "Preventing containers from being unable to be deleted" This fixes random failures to start a container in conmon integration tests (see issue 5151). I guess we need to find another way to fix issue 4645. This reverts commit `1b39997e73`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-05 18:44:30 -08:00
Kir Kolyshkin	8b2b5e9492	libct: remove relabeling dead code There is no way to set Mount.Relabel field via OCI spec (config.json), and so the relabeling code is never used. My guess is it's a leftover from times when runc used to be part of Docker. Remove it, and mark Relabel field as deprecated. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-03-05 14:57:21 -08:00
Ismo Puustinen	e2baa3ad10	Intel RDT: update according to spec changes. There is one proposed clarification to the OCI spec: the subdirectory needs to be deleted. Runc already does that, but the clarification calls for directory removal only if the directory was created by us. Signed-off-by: Ismo Puustinen <ismo.puustinen@intel.com> Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2026-03-05 12:28:32 +11:00
Antti Kervinen	700c944c4d	libct: fix resetting CPU affinity unix.CPUSet is limited to 1024 CPUs. Calling unix.SchedSetaffinity(pid, cpuset) removes all CPUs starting from 1024 from allowed CPUs of pid, even if cpuset is all ones. As a consequence, when runc tries to reset CPU affinity to "allow all" by default, it prevents all containers from CPUs 1024 onwards. This change uses a huge CPU mask to play safe and get all possible CPUs enabled with a single sched_setaffinity call. Fixes: #5023 Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>	2026-03-04 13:06:33 -08:00
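The idea of passing a mask wider than 1024 bits can be sketched with a raw `sched_setaffinity` call on Linux. This is not the code from the commit: the 8192-CPU bound, `allOnesMask`, and `resetAffinity` are assumptions made for illustration, and the sketch affects only the calling thread (pid 0).

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// maxCPUs is an assumed upper bound chosen for this sketch.
const maxCPUs = 8192

// allOnesMask returns a bitmask with every bit set, sized to cover
// ncpu CPUs and rounded up to whole 64-bit words.
func allOnesMask(ncpu int) []uint64 {
	words := (ncpu + 63) / 64
	mask := make([]uint64, words)
	for i := range mask {
		mask[i] = ^uint64(0)
	}
	return mask
}

// resetAffinity asks the kernel to let the calling thread (pid 0)
// run on every CPU covered by mask. Linux only.
func resetAffinity(mask []uint64) error {
	if len(mask) == 0 {
		return syscall.EINVAL
	}
	_, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_SETAFFINITY,
		0, uintptr(len(mask)*8), uintptr(unsafe.Pointer(&mask[0])))
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	mask := allOnesMask(maxCPUs)
	fmt.Printf("mask covers %d CPUs in %d words\n", len(mask)*64, len(mask))
	if err := resetAffinity(mask); err != nil {
		fmt.Println("sched_setaffinity failed:", err)
	}
}
```

For comparison, a fixed 1024-bit set is 16 words of 64 bits, so CPU 1024 would need a 17th word that such a set cannot hold.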
Aleksa Sarai	625ef531b7	libct: devices: drop deprecated cgroup types These were all marked deprecated in commit `a75076b4a4` ("Switch to opencontainers/cgroups") when we switched maintenance of our cgroup code to opencontainers/cgroups. Users have had ample time to switch to opencontainers/cgroups themselves, so we can finally remove this. Note that the whole libcontainer/devices package will be moved to moby/sys in the near future, so this whole package will be marked deprecated soon. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2026-03-05 00:04:34 +11:00
Aleksa Sarai	6a77ee7864	libct: remove deprecated MPOL_* constants These were inadvertently added to our exported APIs by commit eeda7bdf80cca ("Add memory policy support"). We couldn't remove them from runc 1.4.x, but we deprecated them in commit `3741f9186d` ("libct/configs: mark MPOL_* constants as deprecated") and marked them for removal in runc 1.5. Users should never have used these in the first place. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2026-03-05 00:04:34 +11:00
Aleksa Sarai	87b0804345	libct: remove deprecated HooksList.RunHooks This was deprecated in commit e6a4870e4ac40 ("libct: better errors for hooks"), and users have had ample time to migrate to Hooks.Run since. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2026-03-04 23:01:30 +11:00
Aleksa Sarai	8fd8e433f8	libct: config: remove deprecated cgroup types These were all marked deprecated in commit `a75076b4a4` ("Switch to opencontainers/cgroups") when we switched maintenance of our cgroup code to opencontainers/cgroups. Users have had ample time to switch to opencontainers/cgroups themselves, so we can finally remove this. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2026-03-04 23:01:16 +11:00
lfbzhm	4d4e064109	Merge pull request #5133 from kolyshkin/usec libct/specconv: fix panic in initSystemdProps	2026-02-27 18:55:42 +08:00
lfbzhm	8de198f11d	Merge pull request #5118 from kolyshkin/lint29 ci: bump golangci-lint to v2.10, fix some prealloc linter warnings	2026-02-27 12:37:12 +08:00
Kir Kolyshkin	a48a7cef96	libct/specconv: fix panic in initSystemdProps There is a chance of panic here -- eliminate it. Add a test case (which panics before the fix). Reported-by: Luke Hinds <luke@stacklok.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-02-26 18:26:46 -08:00
Kir Kolyshkin	392a221293	libct/specconv: TestInitSystemdProps: use t.Run Use t.Run for individual tests. Add missing desc fields. Best reviewed with --ignore-all-space. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-02-26 18:26:46 -08:00
Kir Kolyshkin	6a374e6c1d	libcontainer: move example code out of README Example code in README is outdated (especially since cgroups is moved to a separate repository) and lacks proper import statements. And, since it is not code, it is hard to keep it up to date. Let's move it out to the example_test.go file and refer to it. Note we still don't run it, but it will be compiled and linted in CI. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-02-26 09:36:56 -08:00
Kir Kolyshkin	5a6e1e18f9	Preallocate some slices Fix some of the prealloc linter warnings. While it does not make sense to address all warnings (or add prealloc to the list of linters we run in CI), some do make sense. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-02-25 13:48:55 -08:00
Kir Kolyshkin	6c07a37a58	libct: prepareCgroupFD: fall back to container init cgroup

Previously, when prepareCgroupFD could not open container's cgroup (as configured in config.json and saved to state.json), it returned a fatal error, as we presumed a container can't exist without its own cgroup. Apparently, it can.

In a case when container is configured without cgroupns (i.e. it uses host's cgroups), and /sys/fs/cgroup is mounted read-write, a rootful container's init can move itself to an entirely different cgroup (even a new one that it just created), and then the original container cgroup is removed by the kernel (or systemd?) as it has no processes left. By the way, from the systemd point of view the container is gone. And yet it is still there, and users want runc exec to work!

And it worked, thanks to the "let's try container init's cgroup" fallback as added by commit `c91fe9aeba` ("cgroup2: exec: join the cgroup of the init process on EBUSY"). The fallback was added for an entirely different reason, but it happened to work in this very case, too.

This behavior was broken with the introduction of CLONE_INTO_CGROUP support.

While it is debatable whether this is a valid scenario when a container moves itself into a different cgroup, this very setup is used by e.g. buildkitd running in a privileged kubernetes container (see issue 5089).

To restore the way things are expected to work, add the same "try container init's cgroup" fallback into prepareCgroupFD. While at it, simplify the code flow.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-02-11 11:57:25 -08:00
Kir Kolyshkin	1d030fab7d	libct: refactor addIntoCgroupV2, fix wrt rootless 1. Refactor addIntoCgroupV2 in an attempt to simplify it. 2. Fix the bug of not trying the init cgroup fallback if rootlessCgroup is set. This is a bug because rootlessCgroup tells to ignore cgroup join errors, not to never try the fallback. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-02-11 11:56:57 -08:00
Kir Kolyshkin	94133fab97	libct: factor out initProcessCgroupPath Separate initProcessCgroupPath code out of addIntoCgroupV2. To be used by the next patch. While at it, describe the new scenario in which the container's configured cgroup might not be available. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-02-11 11:52:59 -08:00
lifubang	5560d55bfd	libct/specconv: fix partial clear of atime mount flags

When parsing mount options into recAttrSet and recAttrClr, the code sets attr_clr to individual atime flags (e.g. MOUNT_ATTR_NOATIME or MOUNT_ATTR_STRICTATIME) when clearing atime attributes. However, this violates the kernel's requirement documented in mount_setattr(2)[1]:

> Note that, since the access-time values are an enumeration
> rather than bit values, a caller wanting to transition to a
> different access-time setting cannot simply specify the
> access-time setting in attr_set, but must also include
> MOUNT_ATTR__ATIME in the attr_clr field. The kernel will
> verify that MOUNT_ATTR__ATIME isn't partially set in
> attr_clr (i.e., either all bits in the MOUNT_ATTR__ATIME
> bit field are either set or clear), and that attr_set
> doesn't have any access-time bits set if MOUNT_ATTR__ATIME
> isn't set in attr_clr.

Passing only a single atime flag (e.g. MOUNT_ATTR_RELATIME) in attr_clr causes mount_setattr() to fail with EINVAL.

This change ensures that whenever an atime mode is updated, attr_clr includes MOUNT_ATTR__ATIME to properly reset the entire access-time attribute field before applying the new mode.

[1] https://man7.org/linux/man-pages/man2/mount_setattr.2.html

Signed-off-by: lifubang <lifubang@acmcoder.com>	2026-02-06 03:30:55 +00:00
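The rule quoted from mount_setattr(2) can be sketched as a small helper that always clears the whole access-time field before setting a new mode. The constant values mirror the kernel's `MOUNT_ATTR_*` definitions but are declared locally so the sketch needs no external module; the `attrs` type and its methods are hypothetical and are not the specconv code.

```go
package main

import "fmt"

// Local copies of the kernel's access-time attribute values.
const (
	mountAttrRelatime    uint64 = 0x00
	mountAttrNoatime     uint64 = 0x10
	mountAttrStrictatime uint64 = 0x20
	mountAttrAtimeMask   uint64 = 0x70 // MOUNT_ATTR__ATIME
)

// attrs mirrors the attr_set / attr_clr pair passed to mount_setattr.
type attrs struct {
	set, clr uint64
}

// setAtime switches to a new access-time mode. The whole atime field
// is cleared first, because the modes are an enumeration, not flags.
func (a *attrs) setAtime(mode uint64) {
	a.clr |= mountAttrAtimeMask
	a.set &^= mountAttrAtimeMask
	a.set |= mode
}

// validAtimeClear reports whether the atime field in clr is either
// fully set or fully clear, which is what the kernel checks.
func validAtimeClear(clr uint64) bool {
	field := clr & mountAttrAtimeMask
	return field == 0 || field == mountAttrAtimeMask
}

func main() {
	// Clearing a single atime value is only a partial clear.
	fmt.Println(validAtimeClear(mountAttrNoatime)) // false

	var a attrs
	a.setAtime(mountAttrStrictatime)
	fmt.Println(validAtimeClear(a.clr))        // true
	fmt.Printf("set=%#x clr=%#x\n", a.set, a.clr) // set=0x20 clr=0x70
}
```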
Kir Kolyshkin	cb31d62f1c	Fix exec vs Go 1.26

Since [PR 4812], runc exec tries to use clone3 syscall with CLONE_INTO_CGROUP, falling back to the old method if it is not supported.

One issue with that approach is, a

> Cmd cannot be reused after calling its [Cmd.Start], [Cmd.Run],
> [Cmd.Output], or [Cmd.CombinedOutput] methods.

(from https://pkg.go.dev/os/exec#Cmd).

This is enforced since Go 1.26, see [CL 728642], and so runc exec actually fails in specific scenarios (go1.26 and no CLONE_INTO_CGROUP support).

The easiest workaround is to pre-copy the p.cmd structure (copy = *cmd). From the [CL 734200] it looks like it is an acceptable way, but it might break in the future as it also copies the private fields, so let's do a proper field-by-field copy. If the upstream will add cmd.Clone method, we will switch to it.

Also, we can probably be fine with a post-copy (once the first Start has failed), but let's be conservative here and do a pre-copy.

[PR 4812]: https://github.com/opencontainers/runc/pull/4812
[CL 728642]: https://go.dev/cl/728642
[CL 734200]: https://go.dev/cl/734200

Reported-by: Efim Verzakov <efimverzakov@gmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-01-29 13:49:34 -08:00
Kir Kolyshkin	82b7597a26	libct: check cmd.Err after exec.Command call Theoretically, exec.Command can set cmd.Err. Practically, this should never happen (Linux, Go <= 1.26, exePath is absolute), but in the unlikely case it does, let's fail early. This is related to the cloneCmd (to be introduced by the following commit) which chooses to not copy the Err field. Theoretically, exec.Command can set Err and so the first call to cmd.Start will fail (since Err != nil), and the second call to cmd.Start may succeed because Err == nil. This scenario is highly unlikely, but better safe than sorry. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-01-29 13:49:04 -08:00
Kir Kolyshkin	593ac3b7d9	libct: use pointers for Process methods The Process type is quite big (currently 368 bytes on a 64 bit Linux) and using non-pointer receivers in its methods results in copying which is totally unnecessary. Change the methods to use pointer receivers. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-01-26 14:17:46 -08:00
Kir Kolyshkin	6cd91f665e	libct/configs: use pointers for Config methods The Config type is quite big (currently 554 bytes on a 64 bit Linux) and using non-pointer receivers in its methods results in copying which is totally unnecessary. Change the methods to use pointer receivers. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-01-26 14:17:44 -08:00
Kir Kolyshkin	2088e000eb	libct/configs: Id -> ID Rename a function parameter (containerId -> containerID) to avoid a linter warning: > var-naming: method parameter containerId should be containerID (revive) In many other places, including config.json (.linux.uidMappings and .gidMappings) it is already called containerID, so let's rename. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2026-01-26 14:16:19 -08:00
Kir Kolyshkin	652269729d	libc/int: use strings.Builder Generated by modernize@latest (v0.21.0). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-16 15:04:04 -08:00
Akihiro Suda	4dcda051da	Merge pull request #5055 from kolyshkin/mpol-2 libct/configs: mark MPOL_* constants as deprecated	2025-12-16 10:39:09 +09:00
Curd Becker	536e183451	Replace os.Is* error checking functions with their errors.Is counterpart Signed-off-by: Curd Becker <me@curd-becker.de>	2025-12-11 03:16:02 +01:00
Kir Kolyshkin	3741f9186d	libct/configs: mark MPOL_* constants as deprecated Alas, these new constants are already in v1.4.0 release so we can't remove those right away, but we can mark them as deprecated now and target removal for v1.5.0. So, - mark them as deprecated; - redefine via unix.MPOL_* counterparts; - fix the validator code to use unix.MPOL_* directly. This amends commit `a0e809a8`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-08 15:36:29 -08:00
Kir Kolyshkin	8a9b4dcda6	libct: mountFd: close mountFile on error Reported in issue 5008. Reported-by: Arina Cherednik <arinacherednik034@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-02 15:15:23 -08:00
Kir Kolyshkin	c24965b742	libct: newProcessComm: close fds on error Reported in issue 5008. Reported-by: Arina Cherednik <arinacherednik034@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-02 15:15:23 -08:00
Kir Kolyshkin	88f897160c	libct: startInitialization: add defer close This function calls Init which normally never returns, so the defer only works if there is an error and we can safely use it to close those fds we opened. This was done for most but not all fds. Reported in issue 5008. Reported-by: Arina Cherednik <arinacherednik034@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-02 15:15:23 -08:00
Kir Kolyshkin	1f1ff4be06	Merge pull request #5051 from cyphar/libct-utils-deprecated libct/utils: remove Deprecated functions	2025-12-02 15:06:01 -08:00
Akihiro Suda	64c3c8eea6	Merge pull request #4994 from kolyshkin/gofumpt-extra Enable gofumpt extra rules	2025-11-28 09:30:57 +09:00
Aleksa Sarai	a412bd93e9	libct/utils: remove Deprecated functions These were all marked for deprecation in runc 1.5.0, so remove them now to make sure we don't forget. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-28 11:11:11 +11:00
Aleksa Sarai	195e9551e4	pathrs: add MkdirAllParentInRoot helper While CreateInRoot supports hallucinating the target path, we do not use it directly when constructing device inode targets because we need to have different handling for mknod and bind-mounts. The solution is to simply have a more generic MkdirAllParentInRoot helper that MkdirAll's the parent directory of the target path and then allows the caller to create the trailing component however they like. (This can be used by CreateInRoot internally as well!) Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2025-11-26 21:04:05 +11:00

3053 Commits