zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2026-04-24 16:39:52 +08:00

Author	SHA1	Message	Date
Akihiro Suda	4dcda051da	Merge pull request #5055 from kolyshkin/mpol-2 libct/configs: mark MPOL_* constants as deprecated	2025-12-16 10:39:09 +09:00
Curd Becker	536e183451	Replace os.Is* error checking functions with their errors.Is counterpart Signed-off-by: Curd Becker <me@curd-becker.de>	2025-12-11 03:16:02 +01:00
Kir Kolyshkin	3741f9186d	libct/configs: mark MPOL_* constants as deprecated Alas, these new constants are already in v1.4.0 release so we can't remove those right away, but we can mark them as deprecated now and target removal for v1.5.0. So, - mark them as deprecated; - redefine via unix.MPOL_* counterparts; - fix the validator code to use unix.MPOL_* directly. This amends commit `a0e809a8`. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-12-08 15:36:29 -08:00
Kir Kolyshkin	28daf53d7e	Merge pull request #4832 from marquiz/devel/rdt-enablemonitoring libcontainer/intelrdt: add support for EnableMonitoring field	2025-10-08 00:18:02 -07:00
Antti Kervinen	eda7bdf80c	Add memory policy support Implement support for Linux memory policy in OCI spec PR: https://github.com/opencontainers/runtime-spec/pull/1282 Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>	2025-10-07 15:06:37 +03:00
Markus Lehtonen	7aa4e1a63d	libcontainer/intelrdt: add support for EnableMonitoring field The linux.intelRdt.enableMonitoring field enables the creation of a per-container monitoring group. The monitoring group is removed when the container is destroyed. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2025-09-17 08:54:08 +03:00
Tycho Andersen	70d88bc449	libcontainer/validator: allow setting user.* sysctls inside userns These sysctls are all per-userns (termed `ucounts` in the kernel code) and are settable with CAP_SYS_RESOURCE in the user namespace. Signed-off-by: Tycho Andersen <tycho@tycho.pizza>	2025-09-12 12:40:44 -06:00
Rodrigo Campos	7a982f4282	Merge pull request #4854 from marquiz/devel/rdt-root-clos libcontainer/intelrdt: support explicit assignment to root CLOS	2025-08-29 07:17:43 -03:00
Markus Lehtonen	762819496e	libcontainer/configs/validate: add doc.go Add package comment to make revive pass muster. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2025-08-29 12:36:04 +03:00
Markus Lehtonen	ba68a17ad1	libcontainer/configs: add validator unit tests for intelRdt Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2025-08-28 14:11:07 +03:00
Markus Lehtonen	b8a83ac255	libcontainer/intelrdt: support explicit assignment to root CLOS Makes it possible e.g. to enable monitoring (linux.intelRdt.enableMonitoring) without creating a CLOS (resctrl group) for the container. Implements https://github.com/opencontainers/runtime-spec/pull/1289. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2025-08-28 14:08:37 +03:00
Kir Kolyshkin	89e59902c4	Modernize code for Go 1.24 Brought to you by modernize -fix -test ./... Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-08-27 19:11:02 -07:00
Markus Lehtonen	e846add595	libcontainer/configs/validate: check that intelrdt is enabled If intelRdt is specified in the spec, check that the resctrl fs is actually mounted. Fixes e.g. the case where "intelRdt.closID" is specified but runc silently ignores this if resctrl is not mounted. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2025-08-01 10:03:54 +03:00
Antonio Ojea	8d180e9658	Add support for Linux Network Devices Implement support for passing Linux Network Devices to the container network namespace. The network device is passed during the creation of the container, before the process is started. It implements the logic defined in the OCI runtime specification. Signed-off-by: Antonio Ojea <aojea@google.com>	2025-06-18 15:52:30 +01:00
Kir Kolyshkin	a75076b4a4	Switch to opencontainers/cgroups This removes libcontainer/cgroups packages and starts using those from github.com/opencontainers/cgroups repo. Mostly generated by: git rm -f libcontainer/cgroups find . -type f -name "*.go" -exec sed -i \ 's|github.com/opencontainers/runc/libcontainer/cgroups|github.com/opencontainers/cgroups|g' \ {} + go get github.com/opencontainers/cgroups@v0.0.1 make vendor gofumpt -w . Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-02-28 15:20:33 -08:00
Kir Kolyshkin	746a5c23c9	libcontainer/configs/validate: improve rootlessEUIDMount 1. Avoid splitting mount data into []string if it does not contain options we're interested in. This should result in slightly less garbage to collect. 2. Use if / else if instead of continue, to make it clearer that we're processing one option at a time. 3. Print the whole option as a string in an error message; practically this should not have any effect, it's just simpler. 4. Improve some comments. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-02-06 19:47:23 -08:00
Kir Kolyshkin	055041e874	libct: use strings.CutPrefix where possible Using strings.CutPrefix (available since Go 1.20) instead of strings.HasPrefix and/or strings.TrimPrefix makes the code a tad more straightforward. No functional change. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2025-02-06 19:42:35 -08:00
Kir Kolyshkin	57462491c1	libct/configs/validate: add IOPriority.Class validation Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-12-22 18:17:44 -08:00
Kir Kolyshkin	b1449fd510	libct: use Namespaces.IsPrivate more In these cases, this is exactly what we want to find out. Slightly improves performance and readability. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2024-09-17 22:49:29 -07:00
utam0k	bfbd0305ba	Add I/O priority Signed-off-by: utam0k <k0ma@utam0k.jp>	2024-03-30 22:31:54 +09:00
lengrongfu	68438ba272	fix scheduler validate Signed-off-by: lengrongfu <lenronfu@gmail.com>	2024-01-05 09:50:41 +08:00
Aleksa Sarai	3b57e45cbf	mount: add support for ridmap and idmap ridmap indicates that the id mapping should be applied recursively (only really relevant for rbind mount entries), and idmap indicates that it should not be applied recursively (the default). If no mappings are specified for the mount, we use the userns configuration of the container. This matches the behaviour in the currently-unreleased runtime-spec. This includes a minor change to the state.json serialisation format, but because there has been no released version of runc with commit `fbf183c6f8` ("Add uid and gid mappings to mounts"), we can safely make this change without affecting running containers. Doing it this way makes it much easier to handle m.IsIDMapped() and indicating that a mapping has been specified. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-14 11:36:42 +11:00
Aleksa Sarai	5ae88daf06	idmap: allow arbitrary idmap mounts regardless of userns configuration With the rework of nsexec.c to handle MOUNT_ATTR_IDMAP in our Go code we can now handle arbitrary mappings without issue, so remove the primary artificial limit of mappings (must use the same mapping as the container's userns) and add some tests. We still only support idmap mounts for bind-mounts because configuring mappings for other filesystems would require switching our entire mount machinery to the new mount API. The current design would easily allow for this but we would need to convert new mount options entirely to the fsopen/fsconfig/fsmount API. This can be done in the future. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-14 11:36:41 +11:00
Aleksa Sarai	09822c3da8	configs: disallow ambiguous userns and timens configurations For userns and timens, the mappings (and offsets, respectively) cannot be changed after the namespace is first configured. Thus, configuring a container with a namespace path to join means that you cannot also provide configuration for said namespace. Previously we would silently ignore the configuration (and just join the provided path), but we really should be returning an error (especially when you consider that the configuration userns mappings are used quite a bit in runc with the assumption that they are the correct mapping for the userns -- but in this case they are not). In the case of userns, the mappings are also required if you _do not_ specify a path, while in the case of the time namespace you can have a container with a timens but no mappings specified. It should be noted that the case checking that the user has not specified a userns path and a userns mapping needs to be handled in specconv (as opposed to the configuration validator) because with this patchset we now cache the mappings of path-based userns configurations and thus the validator can't be sure whether the mapping is a cached mapping or a user-specified one. So we do the validation in specconv, and thus the test for this needs to be an integration test. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-05 17:46:09 +11:00
Aleksa Sarai	1912d5988b	*: actually support joining a userns with a new container Our handling for name space paths with user namespaces has been broken for a long time. In particular, the need to parse /proc/self/id_map in quite a few places meant that we would treat userns configurations that had a namespace path as if they were a userns configuration without mappings, resulting in errors. The primary issue was down to the id translation helper functions, which could only handle configurations that had explicit mappings. Obviously, when joining a user namespace we need to map the ids but figuring out the correct mapping is non-trivial in comparison. In order to get the mapping, you need to read /proc/<pid>/id_map of a process inside the userns -- while most userns paths will be of the form /proc/<pid>/ns/user (and we have a fast-path for this case), this is not guaranteed and thus it is necessary to spawn a process inside the container and read its /proc/<pid>/id_map files in the general case. As Go does not allow us to spawn a subprocess into a target userns, we have to use CGo to fork a sub-process which does the setns(2). To be honest, this is a little dodgy in regards to POSIX signal-safety(7) but since we do no allocations and we are executing in the forked context from a Go program (not a C program), it should be okay. The other alternative would be to do an expensive re-exec (a-la nsexec which would make several other bits of runc more complicated), or to use nsenter(1) which might not exist on the system and is less than ideal. Because we need to logically remap users quite a few times in runc (including in "runc init", where joining the namespace is not feasible), we cache the mapping inside the libcontainer config struct. A future patch will make sure that we stop allowing invalid user configurations where a mapping is specified as well as a userns path to join. Finally, add an integration test to make sure we don't regress this again. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-12-05 17:46:08 +11:00
Aleksa Sarai	669f4dbef8	configs: validate: add validation for bind-mount fsflags Bind-mounts cannot have any filesystem-specific "data" arguments, because the kernel ignores the data argument for MS_BIND and MS_BIND|MS_REMOUNT and we cannot safely try to override the flags because those would affect mounts on the host (these flags affect the superblock). It should be noted that there are cases where the filesystem-specified flags will also be ignored for non-bind-mounts but those are kernel quirks and there's no real way for us to work around them. And users wouldn't get any real benefit from us adding guardrails to existing kernel behaviour. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-11-02 07:50:03 +11:00
Rodrigo Campos	4bf8b55594	libct: Remove old comment We changed it in PR: https://github.com/opencontainers/runtime-spec/pull/1225 But we forgot to remove this comment. Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2023-11-01 12:48:42 +01:00
utam0k	770728e16e	Support `process.scheduler` Spec: https://github.com/opencontainers/runtime-spec/pull/1188 Fix: https://github.com/opencontainers/runc/issues/3895 Co-authored-by: lifubang <lifubang@acmcoder.com> Signed-off-by: utam0k <k0ma@utam0k.jp> Signed-off-by: lifubang <lifubang@acmcoder.com>	2023-10-04 15:53:18 +08:00
Rodrigo Campos	b17c6f237d	validator: Relax warning for not abs mount dst path The runtime spec now allows relative mount dst paths, so remove the comment saying we will switch this to an error later and change the error messages to reflect that. Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2023-09-11 16:02:41 +02:00
Aleksa Sarai	aa5f4c1137	tests: add several timens tests These are not exhaustive, but at least confirm that the feature is not obviously broken (we correctly set the time offsets). Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-08-10 19:01:31 +10:00
Aleksa Sarai	9acfd7b1a3	timens: minor cleanups Fix up a few things that were flagged in the review of the original timens PR, namely around error handling and validation. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2023-08-10 18:59:55 +10:00
Aleksa Sarai	0866112e81	merge #3876 into opencontainers/runc:main Chethan Suresh (1): Support time namespace LGTMs: kolyskin cyphar Closes #3876	2023-08-10 18:27:17 +10:00
Rodrigo Campos	19d26a6596	Revert "libct/validator: Error out on non-abs paths" This reverts commit `881e92a3fd` and adjust the code so the idmap validations are strict. We now only throw a warning and the container is started just fine. Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2023-08-08 13:45:31 +02:00
Chethan Suresh	ebc2e7c435	Support time namespace "time" namespace was introduced in Linux v5.6 support new time namespace to set boottime and monotonic time offset Example runtime spec "timeOffsets": { "monotonic": { "secs": 172800, "nanosecs": 0 }, "boottime": { "secs": 604800, "nanosecs": 0 } } Signed-off-by: Chethan Suresh <chethan.suresh@sony.com>	2023-08-03 10:12:01 +05:30
Francis Laniel	c47f58c4e9	Capitalize [UG]idMappings as [UG]IDMappings Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>	2023-07-21 13:55:34 +02:00
Rodrigo Campos	fbf183c6f8	Add uid and gid mappings to mounts Co-authored-by: Francis Laniel <flaniel@linux.microsoft.com> Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2023-07-11 16:17:48 +02:00
Rodrigo Campos	881e92a3fd	libct/validator: Error out on non-abs paths This was a warning already and it was requested to make this an error while we will add validation of idmap mounts: https://github.com/opencontainers/runc/pull/3717#discussion_r1154705318 I've also tested a k8s cluster and the config.json generated by containerd didn't use any relative paths. I tested one pod, so it was definitely not an extensive test. Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>	2023-07-07 12:00:33 +02:00
utam0k	d9230602e9	Implement to set a domainname opencontainers/runtime-spec#1156 Signed-off-by: utam0k <k0ma@utam0k.jp>	2023-04-12 13:31:20 +00:00
Kir Kolyshkin	45cc290f02	libct: fixes for godoc 1.19 Since Go 1.19, godoc recognizes lists, code blocks, headings etc. It also reformats the sources making it more apparent that these features are used. Fix a few places where it misinterpreted the formatting (such as indented vs unindented), and format the result using the gofumpt from HEAD, which already incorporates gofmt 1.19 changes. Some more fixes (and enhancements) might be required. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-08-16 09:53:54 -07:00
Kir Kolyshkin	48006d0007	libct/configs/validate: rootlessEUIDMount: speedup 1. Fix function docs. In particular, remove the part which is not true ("verifies that the user isn't trying to set up any mounts they don't have the rights to do"), and fix the part that says "that doesn't resolve to root" (which is no longer true since commit `d8b669400a`). 2. Replace fmt.Sscanf (which is slow and does lots of allocations) with strings.TrimPrefix and strconv.Atoi. 3. Add a benchmark for rootlessEUIDMount. Comparing the old and the new implementations: name old time/op new time/op delta RootlessEUIDMount-4 1.01µs ± 2% 0.16µs ± 1% -84.15% (p=0.008 n=5+5) name old alloc/op new alloc/op delta RootlessEUIDMount-4 224B ± 0% 80B ± 0% -64.29% (p=0.008 n=5+5) name old allocs/op new allocs/op delta RootlessEUIDMount-4 7.00 ± 0% 1.00 ± 0% -85.71% (p=0.008 n=5+5) Note this code is already tested (in rootless_test.go). Fixes: `d8b669400a` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-03-17 13:39:55 -07:00
Markus Lehtonen	1d5c331042	configs/validate: looser validation for RDT Don't require CAT or MBA because we don't detect those correctly (we don't support L2 or L3DATA/L3CODE for example, and in the future possibly even more). With plain "ClosId mode" we don't really care: we assign the container to a pre-configured CLOS without trying to do anything smarter. Moreover, this was a duplicate/redundant check anyway, as for CAT and MBA there is another specific sanity check that is done if L3 or MB is specified in the config. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2022-02-18 16:24:50 +02:00
Kir Kolyshkin	0d21515038	libct: remove Validator interface We only have one implementation of config validator, which is always used. It makes no sense to have Validator interface. Having validate.Validator field in Factory does not make sense for all the same reasons. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2022-02-03 11:40:29 -08:00
Mengjiao Liu	a9bb11ec3c	Fix the conversion of sysctl variable dots and slashes Signed-off-by: Mengjiao Liu <mengjiao.liu@daocloud.io>	2021-11-04 11:45:15 +08:00
Mengjiao Liu	0f933d54fe	Rename package validate_test to package validate Signed-off-by: Mengjiao Liu <mengjiao.liu@daocloud.io>	2021-11-04 11:45:15 +08:00
Kir Kolyshkin	972aea3af0	libct/configs/validate: allow / in sysctl names Runtime spec says: > sysctl (object, OPTIONAL) allows kernel parameters to be modified at > runtime for the container. For more information, see the sysctl(8) > man page. and sysctl(8) says: > variable > The name of a key to read from. An example is > kernel.ostype. The '/' separator is also accepted in place of a '.'. Apparently, runc config validator does not support sysctls with / as a separator. Fortunately this is a one-line fix. Add some more test data where / is used as a separator. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-10-29 09:45:55 -07:00
Akihiro Suda	bd75bc2dc6	Merge pull request #3176 from kolyshkin/rm-config-error-alt libct/error.go: rm ConfigError (alt)	2021-09-02 14:34:32 +09:00
Kir Kolyshkin	6145628fff	configs/validate: audit all returned errors All the errors returned from Validate should tell about a configuration error. Some were lacking a context, so add it. While at it, fix abusing fmt.Errorf and logrus.Warnf where the argument does not contain %-style formatting. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-08-23 18:54:47 -07:00
Markus Lehtonen	17e3b41dd0	libcontainer/intelrdt: support ClosID parameter Handle ClosID parameter of IntelRdt. Makes it possible to use pre-configured classes/ClosIDs and avoid running out of available IDs which easily happens with per-container classes. Remove validator checks for empty L3CacheSchema and MemBwSchema fields in order to be able to leave them empty, and only specify ClosID for a pre-configured class. Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>	2021-08-09 15:58:03 +03:00
Kir Kolyshkin	a91ce3062f	libct/*_test.go: use t.TempDir Replace ioutil.TempDir (mostly) with t.TempDir, which requires no explicit cleanup. While at it, fix incorrect usage of os.ModePerm in libcontainer/intelrdt test. This is supposed to be a mask, not mode bits. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-07-27 01:41:47 -07:00
Kir Kolyshkin	7be93a66b9	*: fmt.Errorf: use %w when appropriate This should result in no change when the error is printed, but make the errors returned unwrappable, meaning errors.As and errors.Is will work. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-22 16:09:47 -07:00
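
Two entries in this log (`7be93a66b9`, which switches fmt.Errorf to %w, and `536e183451`, which replaces the os.Is* helpers with errors.Is) are easier to follow together. The sketch below is not code from runc; `openConfig` and `classify` are hypothetical names, and the path used in `main` is assumed not to exist. It shows that once an error is wrapped with %w, os.IsNotExist no longer recognizes it while errors.Is still does.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// openConfig is a hypothetical helper (not a runc function). It adds
// context to the underlying error using %w, so the result stays unwrappable.
func openConfig(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("loading config %q: %w", path, err)
	}
	return f.Close()
}

// classify reports how the legacy check (os.IsNotExist) and the
// errors.Is check each see the same error value.
func classify(err error) (legacy, modern bool) {
	return os.IsNotExist(err), errors.Is(err, fs.ErrNotExist)
}

func main() {
	// Assumption: this path does not exist on the machine running the example.
	err := openConfig("/no/such/dir/config.json")
	legacy, modern := classify(err)
	fmt.Println("os.IsNotExist:", legacy)
	fmt.Println("errors.Is(fs.ErrNotExist):", modern)
}
```

os.IsNotExist predates error wrapping and only inspects a few concrete error types, so it does not look through the fmt.Errorf wrapper; errors.Is walks the whole chain.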


95 Commits
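
The sysctl entries above (`972aea3af0` and `a9bb11ec3c`) concern names that use '/' instead of '.' as the separator, as sysctl(8) permits. The following is a minimal sketch of one way to normalize such names, assuming the rule that the first separator in the name decides which convention is in use; `normalizeSysctl` is a hypothetical name and this is not presented as runc's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSysctl converts a sysctl name to the dot-separated form.
// Assumed rule: if the first separator found is '.', the name is already
// dot-separated and is returned unchanged; otherwise '/' and '.' are
// swapped, so a '.' inside a component (e.g. an interface name) becomes '/'.
func normalizeSysctl(name string) string {
	if i := strings.IndexAny(name, "./"); i >= 0 && name[i] == '.' {
		return name
	}
	swap := func(r rune) rune {
		switch r {
		case '.':
			return '/'
		case '/':
			return '.'
		}
		return r
	}
	return strings.Map(swap, name)
}

func main() {
	for _, n := range []string{
		"kernel.ostype",
		"kernel/ostype",
		"net/ipv4/conf/eno2.100/rp_filter",
	} {
		fmt.Printf("%s -> %s\n", n, normalizeSysctl(n))
	}
}
```

Swapping both characters, instead of only replacing '/', keeps a name such as the VLAN interface eno2.100 unambiguous after conversion.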