Commit Graph

95 Commits

Author SHA1 Message Date
Akihiro Suda 4dcda051da Merge pull request #5055 from kolyshkin/mpol-2
libct/configs: mark MPOL_* constants as deprecated
2025-12-16 10:39:09 +09:00
Curd Becker 536e183451 Replace os.Is* error checking functions with their errors.Is counterpart
Signed-off-by: Curd Becker <me@curd-becker.de>
2025-12-11 03:16:02 +01:00
Kir Kolyshkin 3741f9186d libct/configs: mark MPOL_* constants as deprecated
Alas, these new constants are already in the v1.4.0 release, so we can't
remove those right away, but we can mark them as deprecated now
and target removal for v1.5.0.

So,
 - mark them as deprecated;
 - redefine via unix.MPOL_* counterparts;
 - fix the validator code to use unix.MPOL_* directly.

This amends commit a0e809a8.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-12-08 15:36:29 -08:00
Kir Kolyshkin 28daf53d7e Merge pull request #4832 from marquiz/devel/rdt-enablemonitoring
libcontainer/intelrdt: add support for EnableMonitoring field
2025-10-08 00:18:02 -07:00
Antti Kervinen eda7bdf80c Add memory policy support
Implement support for Linux memory policy in OCI spec PR:
https://github.com/opencontainers/runtime-spec/pull/1282

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2025-10-07 15:06:37 +03:00
Markus Lehtonen 7aa4e1a63d libcontainer/intelrdt: add support for EnableMonitoring field
The linux.intelRdt.enableMonitoring field enables the creation of
a per-container monitoring group. The monitoring group is removed when
the container is destroyed.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-09-17 08:54:08 +03:00
Tycho Andersen 70d88bc449 libcontainer/validator: allow setting user.* sysctls inside userns
These sysctls are all per-userns (termed `ucounts` in the kernel code) and
are settable with CAP_SYS_RESOURCE in the user namespace.
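A simplified sketch of such a check, assuming the validator only has to know whether the container gets its own user namespace. `validateUserSysctls` is a hypothetical name, and other sysctl families are deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// validateUserSysctls is a hypothetical, simplified validator. "user.*"
// sysctls (e.g. user.max_user_namespaces) are per-userns limits, so this
// sketch accepts them only when the container has its own user namespace.
// Other sysctl families are not checked here.
func validateUserSysctls(sysctls map[string]string, hasUserns bool) error {
	for name := range sysctls {
		if !strings.HasPrefix(name, "user.") {
			continue
		}
		if !hasUserns {
			return fmt.Errorf("sysctl %q is not allowed as the container does not have its own user namespace", name)
		}
	}
	return nil
}

func main() {
	s := map[string]string{"user.max_user_namespaces": "0"}
	fmt.Println(validateUserSysctls(s, true))         // <nil>
	fmt.Println(validateUserSysctls(s, false) != nil) // true
}
```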

Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
2025-09-12 12:40:44 -06:00
Rodrigo Campos 7a982f4282 Merge pull request #4854 from marquiz/devel/rdt-root-clos
libcontainer/intelrdt: support explicit assignment to root CLOS
2025-08-29 07:17:43 -03:00
Markus Lehtonen 762819496e libcontainer/configs/validate: add doc.go
Add package comment to make revive pass muster.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-08-29 12:36:04 +03:00
Markus Lehtonen ba68a17ad1 libcontainer/configs: add validator unit tests for intelRdt
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-08-28 14:11:07 +03:00
Markus Lehtonen b8a83ac255 libcontainer/intelrdt: support explicit assignment to root CLOS
Makes it possible e.g. to enable monitoring
(linux.intelRdt.enableMonitoring) without creating a CLOS (resctrl
group) for the container.

Implements https://github.com/opencontainers/runtime-spec/pull/1289.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-08-28 14:08:37 +03:00
Kir Kolyshkin 89e59902c4 Modernize code for Go 1.24
Brought to you by

	modernize -fix -test ./...

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-08-27 19:11:02 -07:00
Markus Lehtonen e846add595 libcontainer/configs/validate: check that intelrdt is enabled
If intelRdt is specified in the spec, check that the resctrl fs is
actually mounted. Fixes e.g. the case where "intelRdt.closID" is
specified but runc silently ignores this if resctrl is not mounted.
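One way such a check could look, as a sketch only: scan text in `/proc/self/mountinfo` format for a filesystem of type `resctrl`. The mountinfo content is passed in as a string so the logic can be tried without a real resctrl mount. `resctrlMountpoint` and `validateIntelRdt` are hypothetical names.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// resctrlMountpoint scans text in /proc/self/mountinfo format and returns
// the mount point of the first filesystem whose type is "resctrl".
func resctrlMountpoint(mountinfo string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		// The optional fields end with a lone "-"; the fs type follows it.
		before, after, ok := strings.Cut(sc.Text(), " - ")
		if !ok {
			continue
		}
		pre, post := strings.Fields(before), strings.Fields(after)
		// pre[4] is the mount point (fifth field), post[0] the fs type.
		if len(pre) >= 5 && len(post) >= 1 && post[0] == "resctrl" {
			return pre[4], true
		}
	}
	return "", false
}

// validateIntelRdt fails if intelRdt is configured but resctrl is not mounted.
func validateIntelRdt(intelRdtConfigured bool, mountinfo string) error {
	if !intelRdtConfigured {
		return nil
	}
	if _, ok := resctrlMountpoint(mountinfo); !ok {
		return errors.New("intelRdt is specified in config, but resctrl is not mounted")
	}
	return nil
}

func main() {
	const sample = "47 22 0:42 / /sys/fs/resctrl rw,relatime shared:25 - resctrl resctrl rw\n"
	fmt.Println(resctrlMountpoint(sample))         // /sys/fs/resctrl true
	fmt.Println(validateIntelRdt(true, "") != nil) // true
}
```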

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-08-01 10:03:54 +03:00
Antonio Ojea 8d180e9658 Add support for Linux Network Devices
Implement support for passing Linux Network Devices to the container
network namespace.

The network device is passed during the creation of the container,
before the process is started.

It implements the logic defined in the OCI runtime specification.

Signed-off-by: Antonio Ojea <aojea@google.com>
2025-06-18 15:52:30 +01:00
Kir Kolyshkin a75076b4a4 Switch to opencontainers/cgroups
This removes libcontainer/cgroups packages and starts
using those from github.com/opencontainers/cgroups repo.

Mostly generated by:

  git rm -f libcontainer/cgroups

  find . -type f -name "*.go" -exec sed -i \
    's|github.com/opencontainers/runc/libcontainer/cgroups|github.com/opencontainers/cgroups|g' \
    {} +

  go get github.com/opencontainers/cgroups@v0.0.1
  make vendor
  gofumpt -w .

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-28 15:20:33 -08:00
Kir Kolyshkin 746a5c23c9 libcontainer/configs/validate: improve rootlessEUIDMount
1. Avoid splitting mount data into []string if it does not contain
   options we're interested in. This should result in slightly less
   garbage to collect.

2. Use if / else if instead of continue, to make it clearer that
   we're processing one option at a time.

3. Print the whole option as a string in an error message; practically
   this should not have any effect, it's just simpler.

4. Improve some comments.
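The four points above can be illustrated with a minimal sketch. This is not the runc function. `checkMountData`, `idMap` and `mapped` are hypothetical, and treating a non-numeric ID as an error is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// idMap is a simplified, hypothetical stand-in for a userns ID mapping.
type idMap struct{ ContainerID, HostID, Size int }

// mapped reports whether id falls inside one of the container-side ranges.
func mapped(maps []idMap, id int) bool {
	for _, m := range maps {
		if id >= m.ContainerID && id < m.ContainerID+m.Size {
			return true
		}
	}
	return false
}

// checkMountData rejects uid=/gid= mount options whose values are not
// mapped into the container's user namespace.
func checkMountData(data string, uidMaps, gidMaps []idMap) error {
	// (1) Cheap pre-check: skip the split when neither option can be present.
	if !strings.Contains(data, "uid=") && !strings.Contains(data, "gid=") {
		return nil
	}
	for _, opt := range strings.Split(data, ",") {
		// (2) One option at a time: if / else if rather than continue.
		if s, ok := strings.CutPrefix(opt, "uid="); ok {
			id, err := strconv.Atoi(s)
			if err != nil || !mapped(uidMaps, id) {
				// (3) Report the whole option string.
				return fmt.Errorf("cannot specify %s mount option for rootless container: not a valid ID mapped into the user namespace", opt)
			}
		} else if s, ok := strings.CutPrefix(opt, "gid="); ok {
			id, err := strconv.Atoi(s)
			if err != nil || !mapped(gidMaps, id) {
				return fmt.Errorf("cannot specify %s mount option for rootless container: not a valid ID mapped into the user namespace", opt)
			}
		}
	}
	return nil
}

func main() {
	maps := []idMap{{ContainerID: 0, HostID: 1000, Size: 1}}
	fmt.Println(checkMountData("mode=755,uid=0,gid=0", maps, maps)) // <nil>
	fmt.Println(checkMountData("uid=5", maps, maps))
}
```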

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-06 19:47:23 -08:00
Kir Kolyshkin 055041e874 libct: use strings.CutPrefix where possible
Using strings.CutPrefix (available since Go 1.20) instead of
strings.HasPrefix and/or strings.TrimPrefix makes the code
a tad more straightforward.

No functional change.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-06 19:42:35 -08:00
Kir Kolyshkin 57462491c1 libct/configs/validate: add IOPriority.Class validation
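A sketch of what such validation might check, assuming the class names defined by the runtime-spec `process.ioPriority` object (`IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, `IOPRIO_CLASS_IDLE`). The 0..7 priority range check is included as an assumption about the surrounding code, and `IOPriority` here is a local, simplified type.

```go
package main

import "fmt"

// IOPriority is a simplified, local mirror of process.ioPriority.
type IOPriority struct {
	Class    string
	Priority int
}

func validateIOPriority(p *IOPriority) error {
	if p == nil {
		return nil
	}
	switch p.Class {
	case "IOPRIO_CLASS_RT", "IOPRIO_CLASS_BE", "IOPRIO_CLASS_IDLE":
		// Valid class.
	default:
		return fmt.Errorf("invalid ioPriority.class: %q", p.Class)
	}
	if p.Priority < 0 || p.Priority > 7 {
		return fmt.Errorf("invalid ioPriority.priority: %d (must be 0..7)", p.Priority)
	}
	return nil
}

func main() {
	fmt.Println(validateIOPriority(&IOPriority{Class: "IOPRIO_CLASS_BE", Priority: 4})) // <nil>
	fmt.Println(validateIOPriority(&IOPriority{Class: "IOPRIO_CLASS_FOO"}))
}
```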
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-12-22 18:17:44 -08:00
Kir Kolyshkin b1449fd510 libct: use Namespaces.IsPrivate more
In these cases, this is exactly what we want to find out.

Slightly improves performance and readability.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-17 22:49:29 -07:00
utam0k bfbd0305ba Add I/O priority
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-03-30 22:31:54 +09:00
lengrongfu 68438ba272 fix scheduler validate
Signed-off-by: lengrongfu <lenronfu@gmail.com>
2024-01-05 09:50:41 +08:00
Aleksa Sarai 3b57e45cbf mount: add support for ridmap and idmap
ridmap indicates that the id mapping should be applied recursively (only
really relevant for rbind mount entries), and idmap indicates that it
should not be applied recursively (the default). If no mappings are
specified for the mount, we use the userns configuration of the
container. This matches the behaviour in the currently-unreleased
runtime-spec.
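A minimal sketch of the option handling described above: `ridmap` means recursive, `idmap` means non-recursive, and a mount without its own mappings falls back to the container's userns mappings. `idmapFlags`, `effectiveMappings` and `IDMap` are hypothetical names. Rejecting `idmap` together with `ridmap` is an assumption made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// IDMap is a simplified, hypothetical ID-mapping entry.
type IDMap struct{ ContainerID, HostID, Size int64 }

// idmapFlags scans mount options for "idmap" (non-recursive) or
// "ridmap" (recursive). Giving more than one is treated as an error.
func idmapFlags(options []string) (idmapped, recursive bool, err error) {
	for _, o := range options {
		if o != "idmap" && o != "ridmap" {
			continue
		}
		if idmapped {
			return false, false, errors.New("only one of idmap or ridmap may be specified")
		}
		idmapped, recursive = true, o == "ridmap"
	}
	return idmapped, recursive, nil
}

// effectiveMappings falls back to the container's userns mappings when
// the mount does not specify its own.
func effectiveMappings(mountMaps, usernsMaps []IDMap) []IDMap {
	if len(mountMaps) == 0 {
		return usernsMaps
	}
	return mountMaps
}

func main() {
	idm, rec, err := idmapFlags([]string{"rbind", "ridmap"})
	fmt.Println(idm, rec, err) // true true <nil>
	userns := []IDMap{{ContainerID: 0, HostID: 100000, Size: 65536}}
	fmt.Println(len(effectiveMappings(nil, userns))) // 1
}
```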

This includes a minor change to the state.json serialisation format, but
because there has been no released version of runc with commit
fbf183c6f8 ("Add uid and gid mappings to mounts"), we can safely make
this change without affecting running containers. Doing it this way
makes it much easier to handle m.IsIDMapped() and indicating that a
mapping has been specified.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 11:36:42 +11:00
Aleksa Sarai 5ae88daf06 idmap: allow arbitrary idmap mounts regardless of userns configuration
With the rework of nsexec.c to handle MOUNT_ATTR_IDMAP in our Go code we
can now handle arbitrary mappings without issue, so remove the primary
artificial limit of mappings (must use the same mapping as the
container's userns) and add some tests.

We still only support idmap mounts for bind-mounts because configuring
mappings for other filesystems would require switching our entire mount
machinery to the new mount API. The current design would easily allow
for this but we would need to convert new mount options entirely to the
fsopen/fsconfig/fsmount API. This can be done in the future.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 11:36:41 +11:00
Aleksa Sarai 09822c3da8 configs: disallow ambiguous userns and timens configurations
For userns and timens, the mappings (and offsets, respectively) cannot
be changed after the namespace is first configured. Thus, configuring a
container with a namespace path to join means that you cannot also
provide configuration for said namespace. Previously we would silently
ignore the configuration (and just join the provided path), but we
really should be returning an error (especially when you consider that
the configuration userns mappings are used quite a bit in runc with the
assumption that they are the correct mapping for the userns -- but in
this case they are not).

In the case of userns, the mappings are also required if you _do not_
specify a path, while in the case of the time namespace you can have a
container with a timens but no mappings specified.
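The rules above, gathered into one simplified sketch. As the next paragraph explains, runc itself performs the userns path-plus-mapping check in specconv rather than in the validator. The types and `validateNamespaces` are hypothetical stand-ins, not libcontainer's.

```go
package main

import (
	"errors"
	"fmt"
)

// IDMap and Config are simplified, hypothetical stand-ins for the
// libcontainer configuration types.
type IDMap struct{ ContainerID, HostID, Size int64 }

type Config struct {
	Userns      bool   // container has a user namespace
	UsernsPath  string // non-empty: join an existing userns
	UIDMappings []IDMap
	GIDMappings []IDMap

	Timens      bool
	TimensPath  string
	TimeOffsets map[string]int64 // clock name -> seconds (simplified)
}

func validateNamespaces(c Config) error {
	if c.Userns {
		hasMaps := len(c.UIDMappings) > 0 || len(c.GIDMappings) > 0
		switch {
		case c.UsernsPath != "" && hasMaps:
			return errors.New("userns path and uid/gid mappings cannot both be specified")
		case c.UsernsPath == "" && (len(c.UIDMappings) == 0 || len(c.GIDMappings) == 0):
			return errors.New("a new user namespace requires uid and gid mappings")
		}
	}
	// A timens without offsets is fine; a path plus offsets is ambiguous.
	if c.Timens && c.TimensPath != "" && len(c.TimeOffsets) > 0 {
		return errors.New("timens path and time offsets cannot both be specified")
	}
	return nil
}

func main() {
	m := []IDMap{{0, 1000, 1}}
	fmt.Println(validateNamespaces(Config{Userns: true, UIDMappings: m, GIDMappings: m})) // <nil>
	fmt.Println(validateNamespaces(Config{Userns: true, UsernsPath: "/proc/42/ns/user", UIDMappings: m}))
}
```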

It should be noted that the case checking that the user has not
specified a userns path and a userns mapping needs to be handled in
specconv (as opposed to the configuration validator) because with this
patchset we now cache the mappings of path-based userns configurations
and thus the validator can't be sure whether the mapping is a cached
mapping or a user-specified one. So we do the validation in specconv,
and thus the test for this needs to be an integration test.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-05 17:46:09 +11:00
Aleksa Sarai 1912d5988b *: actually support joining a userns with a new container
Our handling of namespace paths for user namespaces has been broken
for a long time. In particular, the need to parse /proc/self/*id_map in
quite a few places meant that we would treat userns configurations that
had a namespace path as if they were a userns configuration without
mappings, resulting in errors.

The primary issue was down to the id translation helper functions, which
could only handle configurations that had explicit mappings. Obviously,
when joining a user namespace we need to map the ids but figuring out
the correct mapping is non-trivial in comparison.

In order to get the mapping, you need to read /proc/<pid>/*id_map of a
process inside the userns -- while most userns paths will be of the form
/proc/<pid>/ns/user (and we have a fast-path for this case), this is not
guaranteed and thus it is necessary to spawn a process inside the
container and read its /proc/<pid>/*id_map files in the general case.
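A sketch of the fast path and of reading the mapping files, assuming the documented `uid_map`/`gid_map` line format (ID inside the namespace, ID outside, range length). All function names here are hypothetical, and the general case (spawning a process inside the userns) is not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// IDMap is a simplified, hypothetical ID-mapping entry.
type IDMap struct{ ContainerID, HostID, Size int64 }

// pidFromUsernsPath returns the PID if path has the form /proc/<pid>/ns/user.
func pidFromUsernsPath(path string) (int, bool) {
	rest, ok := strings.CutPrefix(path, "/proc/")
	if !ok {
		return 0, false
	}
	pidStr, ok := strings.CutSuffix(rest, "/ns/user")
	if !ok {
		return 0, false
	}
	pid, err := strconv.Atoi(pidStr)
	if err != nil || pid <= 0 {
		return 0, false
	}
	return pid, true
}

// idMapPaths returns the uid_map and gid_map paths for pid.
func idMapPaths(pid int) (uidMap, gidMap string) {
	return fmt.Sprintf("/proc/%d/uid_map", pid), fmt.Sprintf("/proc/%d/gid_map", pid)
}

// parseIDMap parses the contents of a uid_map or gid_map file.
func parseIDMap(content string) ([]IDMap, error) {
	var maps []IDMap
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		f := strings.Fields(line)
		if len(f) == 0 {
			continue
		}
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed id_map line %q", line)
		}
		var vals [3]int64
		for i, s := range f {
			v, err := strconv.ParseInt(s, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("malformed id_map line %q: %w", line, err)
			}
			vals[i] = v
		}
		maps = append(maps, IDMap{vals[0], vals[1], vals[2]})
	}
	return maps, nil
}

func main() {
	if pid, ok := pidFromUsernsPath("/proc/1234/ns/user"); ok {
		fmt.Println(idMapPaths(pid)) // /proc/1234/uid_map /proc/1234/gid_map
	}
	maps, err := parseIDMap("         0       1000          1\n")
	fmt.Println(maps, err) // [{0 1000 1}] <nil>
}
```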

As Go does not allow us to spawn a subprocess into a target userns,
we have to use CGo to fork a sub-process which does the setns(2). To be
honest, this is a little dodgy with regard to POSIX signal-safety(7), but
since we do no allocations and we are executing in the forked context
from a Go program (not a C program), it should be okay. The other
alternative would be to do an expensive re-exec (à la nsexec, which would
make several other bits of runc more complicated), or to use nsenter(1),
which might not exist on the system and is less than ideal.

Because we need to logically remap users quite a few times in runc
(including in "runc init", where joining the namespace is not feasible),
we cache the mapping inside the libcontainer config struct. A future
patch will make sure that we stop allowing invalid user configurations
where a mapping is specified as well as a userns path to join.

Finally, add an integration test to make sure we don't regress this again.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-05 17:46:08 +11:00
Aleksa Sarai 669f4dbef8 configs: validate: add validation for bind-mount fsflags
Bind-mounts cannot have any filesystem-specific "data" arguments,
because the kernel ignores the data argument for MS_BIND and
MS_BIND|MS_REMOUNT and we cannot safely try to override the flags
because those would affect mounts on the host (these flags affect the
superblock).
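A minimal sketch of that rule. `Mount` and `validateBindMountData` are hypothetical, and `msBind` is the Linux `MS_BIND` value (4096), redefined locally so the sketch builds on any platform.

```go
package main

import "fmt"

// msBind is MS_BIND from <sys/mount.h> (syscall.MS_BIND on Linux),
// redefined here so the sketch builds on any GOOS.
const msBind = 0x1000

// Mount is a simplified, hypothetical mount entry.
type Mount struct {
	Destination string
	Flags       int
	Data        string
}

// validateBindMountData rejects filesystem-specific data on bind mounts,
// since the kernel ignores it for MS_BIND and MS_BIND|MS_REMOUNT.
func validateBindMountData(m Mount) error {
	if m.Flags&msBind != 0 && m.Data != "" {
		return fmt.Errorf("mount %s: bind mounts cannot have filesystem-specific data %q", m.Destination, m.Data)
	}
	return nil
}

func main() {
	fmt.Println(validateBindMountData(Mount{Destination: "/data", Flags: msBind})) // <nil>
	fmt.Println(validateBindMountData(Mount{Destination: "/data", Flags: msBind, Data: "size=64k"}))
}
```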

It should be noted that there are cases where the filesystem-specified
flags will also be ignored for non-bind-mounts but those are kernel
quirks and there's no real way for us to work around them. And users
wouldn't get any real benefit from us adding guardrails to existing
kernel behaviour.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-11-02 07:50:03 +11:00
Rodrigo Campos 4bf8b55594 libct: Remove old comment
We changed it in PR:
	https://github.com/opencontainers/runtime-spec/pull/1225

But we forgot to remove this comment.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-11-01 12:48:42 +01:00
utam0k 770728e16e Support process.scheduler
Spec: https://github.com/opencontainers/runtime-spec/pull/1188
Fix: https://github.com/opencontainers/runc/issues/3895

Co-authored-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: utam0k <k0ma@utam0k.jp>
Signed-off-by: lifubang <lifubang@acmcoder.com>
2023-10-04 15:53:18 +08:00
Rodrigo Campos b17c6f237d validator: Relax warning for not abs mount dst path
The runtime spec now allows relative mount dst paths, so remove the
comment saying we will switch this to an error later and change the
error messages to reflect that.
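A sketch of "warn, don't fail" for relative destinations. `mountDestinationWarning` is hypothetical, and the standard library `log` package stands in for logrus here.

```go
package main

import (
	"fmt"
	"log"
	"path/filepath"
)

// mountDestinationWarning returns a warning for a relative destination,
// or "" if the destination is absolute. The mount is still accepted.
func mountDestinationWarning(dest string) string {
	if filepath.IsAbs(dest) {
		return ""
	}
	return fmt.Sprintf("mount destination %s is not absolute", dest)
}

func main() {
	for _, d := range []string{"/proc", "tmp"} {
		if w := mountDestinationWarning(d); w != "" {
			log.Printf("warning: %s", w)
		}
	}
}
```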

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-09-11 16:02:41 +02:00
Aleksa Sarai aa5f4c1137 tests: add several timens tests
These are not exhaustive, but at least confirm that the feature is not
obviously broken (we correctly set the time offsets).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-08-10 19:01:31 +10:00
Aleksa Sarai 9acfd7b1a3 timens: minor cleanups
Fix up a few things that were flagged in the review of the original
timens PR, namely around error handling and validation.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-08-10 18:59:55 +10:00
Aleksa Sarai 0866112e81 merge #3876 into opencontainers/runc:main
Chethan Suresh (1):
  Support time namespace

LGTMs: kolyshkin cyphar
Closes #3876
2023-08-10 18:27:17 +10:00
Rodrigo Campos 19d26a6596 Revert "libct/validator: Error out on non-abs paths"
This reverts commit 881e92a3fd and adjusts
the code so the idmap validations are strict.

We now only throw a warning and the container is started just fine.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-08-08 13:45:31 +02:00
Chethan Suresh ebc2e7c435 Support time namespace
The "time" namespace was introduced in Linux v5.6.
Support the new time namespace to set boottime and monotonic time offsets.

Example runtime spec

"timeOffsets": {
    "monotonic": {
        "secs": 172800,
        "nanosecs": 0
    },
    "boottime": {
        "secs": 604800,
        "nanosecs": 0
    }
}
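The example offsets shift the container's monotonic clock by 172800 s = 2 days and its boot-time clock by 604800 s = 7 days. A sketch of decoding and sanity-checking such a fragment, assuming only the `monotonic` and `boottime` clocks are accepted and that `nanosecs` must stay below one second. `timeOffset` and `parseTimeOffsets` are local, hypothetical definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// timeOffset mirrors one entry of the "timeOffsets" object
// (a local, simplified definition for this sketch).
type timeOffset struct {
	Secs     int64  `json:"secs"`
	Nanosecs uint32 `json:"nanosecs"`
}

// parseTimeOffsets decodes a config fragment and checks clock names and
// the nanosecond range.
func parseTimeOffsets(raw []byte) (map[string]timeOffset, error) {
	var cfg struct {
		TimeOffsets map[string]timeOffset `json:"timeOffsets"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	for clock, off := range cfg.TimeOffsets {
		if clock != "monotonic" && clock != "boottime" {
			return nil, fmt.Errorf("timeOffsets: unsupported clock %q", clock)
		}
		if off.Nanosecs >= 1_000_000_000 {
			return nil, fmt.Errorf("timeOffsets.%s.nanosecs out of range: %d", clock, off.Nanosecs)
		}
	}
	return cfg.TimeOffsets, nil
}

func main() {
	raw := []byte(`{"timeOffsets": {
		"monotonic": {"secs": 172800, "nanosecs": 0},
		"boottime":  {"secs": 604800, "nanosecs": 0}}}`)
	offs, err := parseTimeOffsets(raw)
	fmt.Println(offs["monotonic"].Secs/86400, offs["boottime"].Secs/86400, err) // 2 7 <nil>
}
```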

Signed-off-by: Chethan Suresh <chethan.suresh@sony.com>
2023-08-03 10:12:01 +05:30
Francis Laniel c47f58c4e9 Capitalize [UG]idMappings as [UG]IDMappings
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
2023-07-21 13:55:34 +02:00
Rodrigo Campos fbf183c6f8 Add uid and gid mappings to mounts
Co-authored-by: Francis Laniel <flaniel@linux.microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-07-11 16:17:48 +02:00
Rodrigo Campos 881e92a3fd libct/validator: Error out on non-abs paths
This was a warning already and it was requested to make this an error
while we add validation of idmap mounts:
	https://github.com/opencontainers/runc/pull/3717#discussion_r1154705318

I've also tested a k8s cluster and the config.json generated by
containerd didn't use any relative paths. I tested one pod, so it was
definitely not an extensive test.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-07-07 12:00:33 +02:00
utam0k d9230602e9 Implement setting a domainname
opencontainers/runtime-spec#1156

Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-04-12 13:31:20 +00:00
Kir Kolyshkin 45cc290f02 libct: fixes for godoc 1.19
Since Go 1.19, godoc recognizes lists, code blocks, headings etc. It
also reformats the sources making it more apparent that these features
are used.

Fix a few places where it misinterpreted the formatting (such as
indented vs unindented), and format the result using the gofumpt
from HEAD, which already incorporates gofmt 1.19 changes.

Some more fixes (and enhancements) might be required.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-08-16 09:53:54 -07:00
Kir Kolyshkin 48006d0007 libct/configs/validate: rootlessEUIDMount: speedup
1. Fix function docs. In particular, remove the part
   which is not true ("verifies that the user isn't trying to set up any
   mounts they don't have the rights to do"), and fix the part that
   says "that doesn't resolve to root" (which is no longer true since
   commit d8b669400a).

2. Replace fmt.Sscanf (which is slow and does lots of allocations)
   with strings.TrimPrefix and strconv.Atoi.

3. Add a benchmark for rootlessEUIDMount. Comparing the old and the new
   implementations:

	name                 old time/op    new time/op    delta
	RootlessEUIDMount-4    1.01µs ± 2%    0.16µs ± 1%  -84.15%  (p=0.008 n=5+5)

	name                 old alloc/op   new alloc/op   delta
	RootlessEUIDMount-4      224B ± 0%       80B ± 0%  -64.29%  (p=0.008 n=5+5)

	name                 old allocs/op  new allocs/op  delta
	RootlessEUIDMount-4      7.00 ± 0%      1.00 ± 0%  -85.71%  (p=0.008 n=5+5)

Note this code is already tested (in rootless_test.go).

Fixes: d8b669400a
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-03-17 13:39:55 -07:00
Markus Lehtonen 1d5c331042 configs/validate: looser validation for RDT
Don't require CAT or MBA because we don't detect those correctly (we
don't support L2 or L3DATA/L3CODE for example, and in the future
possibly even more). With plain "ClosId mode" we don't really care: we
assign the container to a pre-configured CLOS without trying to do
anything smarter.

Moreover, this was a duplicate/redundant check anyway, as for CAT and
MBA there is another specific sanity check that is done if L3 or MB
is specified in the config.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2022-02-18 16:24:50 +02:00
Kir Kolyshkin 0d21515038 libct: remove Validator interface
We only have one implementation of config validator, which is always
used. It makes no sense to have a Validator interface.

Having a validate.Validator field in Factory does not make sense for all
the same reasons.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-02-03 11:40:29 -08:00
Mengjiao Liu a9bb11ec3c Fix the conversion of sysctl variable dots and slashes
Signed-off-by: Mengjiao Liu <mengjiao.liu@daocloud.io>
2021-11-04 11:45:15 +08:00
Mengjiao Liu 0f933d54fe Rename package validate_test to package validate
Signed-off-by: Mengjiao Liu <mengjiao.liu@daocloud.io>
2021-11-04 11:45:15 +08:00
Kir Kolyshkin 972aea3af0 libct/configs/validate: allow / in sysctl names
Runtime spec says:

> sysctl (object, OPTIONAL) allows kernel parameters to be modified at
> runtime for the container. For more information, see the sysctl(8)
> man page.

and sysctl(8) says:

> variable
>    The name of a key to read from. An example is
>    kernel.ostype. The '/' separator is also accepted in place of a '.'.

Apparently, the runc config validator does not support sysctls with / as a
separator. Fortunately this is a one-line fix.

Add some more test data where / is used as a separator.
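A sketch of normalizing such names to the dotted form. It assumes that the first separator in the name decides which convention is in use, and that a '.' inside a '/'-separated component (for example the VLAN interface name "eth0.100") is written as '/' in the dotted form. `toDotSeparated` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// toDotSeparated converts a '/'-separated sysctl name into the
// '.'-separated form by swapping '.' and '/'. Names whose first separator
// is already '.' are returned unchanged.
func toDotSeparated(name string) string {
	i := strings.IndexAny(name, "./")
	if i == -1 || name[i] == '.' {
		return name
	}
	return strings.Map(func(r rune) rune {
		switch r {
		case '.':
			return '/'
		case '/':
			return '.'
		}
		return r
	}, name)
}

func main() {
	fmt.Println(toDotSeparated("kernel/ostype"))                     // kernel.ostype
	fmt.Println(toDotSeparated("net/ipv4/conf/eth0.100/forwarding")) // net.ipv4.conf.eth0/100.forwarding
}
```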

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-10-29 09:45:55 -07:00
Akihiro Suda bd75bc2dc6 Merge pull request #3176 from kolyshkin/rm-config-error-alt
libct/error.go: rm ConfigError (alt)
2021-09-02 14:34:32 +09:00
Kir Kolyshkin 6145628fff configs/validate: audit all returned errors
All the errors returned from Validate should tell about a configuration
error. Some were lacking a context, so add it.

While at it, fix misuses of fmt.Errorf and logrus.Warnf where the arguments
do not contain %-style formatting.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-23 18:54:47 -07:00
Markus Lehtonen 17e3b41dd0 libcontainer/intelrdt: support ClosID parameter
Handle ClosID parameter of IntelRdt. Makes it possible to use
pre-configured classes/ClosIDs and avoid running out of available IDs
which easily happens with per-container classes.

Remove validator checks for empty L3CacheSchema and MemBwSchema fields
in order to be able to leave them empty, and only specify ClosID for
a pre-configured class.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2021-08-09 15:58:03 +03:00
Kir Kolyshkin a91ce3062f libct/*_test.go: use t.TempDir
Replace ioutil.TempDir (mostly) with t.TempDir, which requires no
explicit cleanup.

While at it, fix incorrect usage of os.ModePerm in libcontainer/intelrdt
test. This is supposed to be a mask, not mode bits.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-07-27 01:41:47 -07:00
Kir Kolyshkin 7be93a66b9 *: fmt.Errorf: use %w when appropriate
This should result in no change when the error is printed, but makes the
returned errors unwrappable, meaning errors.As and errors.Is will work.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00