Commit Graph

215 Commits

Author SHA1 Message Date
Kir Kolyshkin a48a7cef96 libct/specconv: fix panic in initSystemdProps
There is a chance of panic here -- eliminate it.

Add a test case (which panics before the fix).

Reported-by: Luke Hinds <luke@stacklok.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2026-02-26 18:26:46 -08:00
Kir Kolyshkin 392a221293 libct/specconv: TestInitSystemdProps: use t.Run
Use t.Run for individual tests. Add missing desc fields.

Best reviewed with --ignore-all-space.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2026-02-26 18:26:46 -08:00
lifubang 5560d55bfd libct/specconv: fix partial clear of atime mount flags
When parsing mount options into recAttrSet and recAttrClr,
the code sets attr_clr to individual atime flags (e.g.
MOUNT_ATTR_NOATIME or MOUNT_ATTR_STRICTATIME) when clearing
atime attributes. However, this violates the kernel's
requirement documented in mount_setattr(2)[1]:

> Note that, since the access-time values are an enumeration
> rather than bit values, a caller wanting to transition to a
> different access-time setting cannot simply specify the
> access-time setting in attr_set, but must also include
> MOUNT_ATTR__ATIME in the attr_clr field.  The kernel will
> verify that MOUNT_ATTR__ATIME isn't partially set in
> attr_clr (i.e., either all bits in the MOUNT_ATTR__ATIME
> bit field are either set or clear), and that attr_set
> doesn't have any access-time bits set if MOUNT_ATTR__ATIME
> isn't set in attr_clr.

Passing only a single atime flag (e.g. MOUNT_ATTR_RELATIME) in
attr_clr causes mount_setattr() to fail with EINVAL.

This change ensures that whenever an atime mode is updated,
attr_clr includes MOUNT_ATTR__ATIME to properly reset the
entire access-time attribute field before applying the new mode.

[1] https://man7.org/linux/man-pages/man2/mount_setattr.2.html

Signed-off-by: lifubang <lifubang@acmcoder.com>
2026-02-06 03:30:55 +00:00
Curd Becker 536e183451 Replace os.Is* error checking functions with their errors.Is counterpart
Signed-off-by: Curd Becker <me@curd-becker.de>
2025-12-11 03:16:02 +01:00
Aleksa Sarai 42a1e19d67 libcontainer: move CleanPath and StripRoot to internal/pathrs
These helpers will be needed for the compatibility code added in future
patches in this series, but because "internal/pathrs" is imported by
"libcontainer/utils" we need to move them so that we can avoid circular
dependencies.

Because the old functions were in a non-internal package it is possible
some downstreams use them, so add some wrappers but mark them as
deprecated.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-11-26 21:03:29 +11:00
Aleksa Sarai a0e809a8ba libct: switch to unix.SetMemPolicy wrapper
This is mostly a mechanical change, but we also need to change some
types to match the "mode int" argument that golang.org/x/sys/unix
decided to use.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2025-11-10 16:03:02 +11:00
Kir Kolyshkin 28daf53d7e Merge pull request #4832 from marquiz/devel/rdt-enablemonitoring
libcontainer/intelrdt: add support for EnableMonitoring field
2025-10-08 00:18:02 -07:00
Antti Kervinen eda7bdf80c Add memory policy support
Implement support for Linux memory policy in OCI spec PR:
https://github.com/opencontainers/runtime-spec/pull/1282

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2025-10-07 15:06:37 +03:00
Markus Lehtonen 7aa4e1a63d libcontainer/intelrdt: add support for EnableMonitoring field
The linux.intelRdt.enableMonitoring field enables the creation of
a per-container monitoring group. The monitoring group is removed when
the container is destroyed.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-09-17 08:54:08 +03:00
Kir Kolyshkin b5cb56413c Merge pull request #4830 from marquiz/devel/rdt-schemata-field
libcontainer/intelrdt: add support for Schemata field
2025-09-16 13:23:43 -07:00
Markus Lehtonen 41553216ee libcontainer/intelrdt: add support for Schemata field
Implement support for the linux.intelRdt.schemata field of the spec.
This allows management of the "schemata" file in the resctrl group in a
generic way.

Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
2025-09-15 15:09:06 +03:00
Kir Kolyshkin 89e59902c4 Modernize code for Go 1.24
Brought to you by

	modernize -fix -test ./...

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-08-27 19:11:02 -07:00
Antonio Ojea 8d180e9658 Add support for Linux Network Devices
Implement support for passing Linux Network Devices to the container
network namespace.

The network device is passed during the creation of the container,
before the process is started.

It implements the logic defined in the OCI runtime specification.

Signed-off-by: Antonio Ojea <aojea@google.com>
2025-06-18 15:52:30 +01:00
Kir Kolyshkin 7fdec327a0 Use any instead of interface{}
The keyword is available since Go 1.18 (see
https://pkg.go.dev/builtin#any).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-31 17:15:06 -07:00
Kir Kolyshkin 17570625c0 Use for range over integers
This appears in Go 1.22 (see https://tip.golang.org/ref/spec#For_range).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-31 17:15:06 -07:00
Kir Kolyshkin 0fc2338d59 libct/specconv: use maps.Clone
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-31 17:15:06 -07:00
Kir Kolyshkin 431b8bb4d8 int/linux: add/use Getwd
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-26 14:16:53 -07:00
Kir Kolyshkin 10ca66bff5 runc exec: implement CPU affinity
As per
- https://github.com/opencontainers/runtime-spec/pull/1253
- https://github.com/opencontainers/runtime-spec/pull/1261

CPU affinity can be set in two ways:
1. When creating/starting a container, in config.json's
   Process.ExecCPUAffinity, which is when applied to all execs.
2. When running an exec, in process.json's CPUAffinity, which
   applied to a given exec and overrides the value from (1).

Add some basic tests.

Note that older kernels (RHEL8, Ubuntu 20.04) change CPU affinity of a
process to that of a container's cgroup, as soon as it is moved to that
cgroup, while newer kernels (Ubuntu 24.04, Fedora 41) don't do that.

Because of the above,
 - it's impossible to really test initial CPU affinity without adding
   debug logging to libcontainer/nsenter;
 - for older kernels, there can be a brief moment when exec's affinity
   is different than either initial or final affinity being set;
 - exec's final CPU affinity, if not specified, can be different
   depending on the kernel, therefore we don't test it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-03-02 19:17:41 -08:00
Kir Kolyshkin a75076b4a4 Switch to opencontainers/cgroups
This removes libcontainer/cgroups packages and starts
using those from github.com/opencontainers/cgroups repo.

Mostly generated by:

  git rm -f libcontainer/cgroups

  find . -type f -name "*.go" -exec sed -i \
    's|github.com/opencontainers/runc/libcontainer/cgroups|github.com/opencontainers/cgroups|g' \
    {} +

  go get github.com/opencontainers/cgroups@v0.0.1
  make vendor
  gofumpt -w .

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-28 15:20:33 -08:00
Kir Kolyshkin 055041e874 libct: use strings.CutPrefix where possible
Using strings.CutPrefix (available since Go 1.20) instead of
strings.HasPrefix and/or strings.TrimPrefix makes the code
a tad more straightforward.

No functional change.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-02-06 19:42:35 -08:00
Kir Kolyshkin 6c9ddcc648 libct: switch from libct/devices to libct/cgroups/devices/config
Use the old package name as an alias to minimize the patch.

No functional change; this just eliminates a bunch of deprecation
warnings.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-01-31 16:51:09 -08:00
Kir Kolyshkin 6171da6005 libct/configs: add HookList.SetDefaultEnv
1. Make CommandHook.Command a pointer, which reduces the amount of data
   being copied when using hooks, and allows to modify command hooks.

2. Add SetDefaultEnv, which is to be used by the next commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2025-01-09 18:22:53 +08:00
Kir Kolyshkin 394f4c3b70 Re-add tun/tap to default device rules
Since v1.2.0 was released, a number of users complained that the removal
of tun/tap device access from the default device ruleset is causing a
regression in their workloads.

Additionally, it seems that some upper-level orchestration tools
(Docker Swarm, Kubernetes) makes it either impossible or cumbersome
to supply additional device rules.

While it's probably not quite right to have /dev/net/tun in a default
device list, it was there from the very beginning, and users rely on it.
Let's keep it there for the sake of backward compatibility.

This reverts commit 2ce40b6ad7.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-12-16 12:01:10 -08:00
Kir Kolyshkin a56f85f87b libct/*: switch from configs to cgroups
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-12-11 19:08:40 -08:00
Kir Kolyshkin 0de1953333 runc spec, libct/int: do not add ambient capabilities
Commit 98fe566c removed inheritable capabilities from the example spec
(used by runc spec) and from the libcontainer/integration test config,
but neglected to also remove ambient capabilities.

An ambient capability could only be set if the same inheritable
capability is set, so as a result of the above change ambient
capabilities were not set (but due to a bug in gocapability package,
those errors are never reported).

Once we start using a library with the fix [1], that bug will become
apparent (both bats-based and libct/int tests will fail).

[1]: https://github.com/kolyshkin/capability/pull/3

Fixes: 98fe566c ("runc: do not set inheritable capabilities")
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-25 21:48:29 -07:00
Kir Kolyshkin b1449fd510 libct: use Namespaces.IsPrivate more
In these cases, this is exactly what we want to find out.

Slightly improves performance and readability.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2024-09-17 22:49:29 -07:00
Sebastiaan van Stijn 30b530ca94 libct/userns: split userns detection from internal userns code
Commit 4316df8b53 isolated RunningInUserNS
to a separate package to make it easier to consume without bringing in
additional dependencies, and with the potential to move it separate in
a similar fashion as libcontainer/user was moved to a separate module
in commit ca32014adb. While RunningInUserNS
is fairly trivial to implement, it (or variants of this utility) is used
in many codebases, and moving to a separate module could consolidate
those implementations, as well as making it easier to consume without
large dependency trees (when being a package as part of a larger code
base).

Commit 1912d5988b and follow-ups introduced
cgo code into the userns package, and code introduced in those commits
are not intended for external use, therefore complicating the potential
of moving the userns package separate.

This commit moves the new code to a separate package; some of this code
was included in v1.1.11 and up, but I could not find external consumers
of `GetUserNamespaceMappings` and `IsSameMapping`. The `Mapping` and
`Handles` types (added in ba0b5e2698) only
exist in main and in non-stable releases (v1.2.0-rc.x), so don't need
an alias / deprecation.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-30 20:06:30 +02:00
utam0k bfbd0305ba Add I/O priority
Signed-off-by: utam0k <k0ma@utam0k.jp>
2024-03-30 22:31:54 +09:00
dependabot[bot] 606251ab33 build(deps): bump github.com/opencontainers/runtime-spec
Bumps [github.com/opencontainers/runtime-spec](https://github.com/opencontainers/runtime-spec) from 1.1.1-0.20230823135140-4fec88fd00a4 to 1.2.0.
- [Release notes](https://github.com/opencontainers/runtime-spec/releases)
- [Changelog](https://github.com/opencontainers/runtime-spec/blob/main/ChangeLog)
- [Commits](https://github.com/opencontainers/runtime-spec/commits/v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/opencontainers/runtime-spec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2024-03-07 14:43:33 +09:00
lfbzhm 371ff9c5e7 Merge pull request #3985 from cyphar/idmap-generic
libcontainer: remove all mount logic from nsexec
2023-12-18 13:10:45 +08:00
Aleksa Sarai 482e56379a configs: make id mappings int64 to better handle 32-bit
Using ints for all of our mapping structures means that a 32-bit binary
errors out when trying to parse /proc/self/*id_map:

  failed to cache mappings for userns: failed to parse uid_map of userns /proc/1/ns/user:
  parsing id map failed: invalid format in line "         0          0 4294967295": integer overflow on token 4294967295

This issue was unearthed by commit 1912d5988b ("*: actually support
joining a userns with a new container") but the underlying issue has
been present since the docker/libcontainer days.

In theory, switching to uint32 (to match the spec) instead of int64
would also work, but keeping everything signed seems much less
error-prone. It's also important to note that a mapping might be too
large for an int on 32-bit, so we detect this during the mapping.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 12:14:32 +11:00
Aleksa Sarai 3b57e45cbf mount: add support for ridmap and idmap
ridmap indicates that the id mapping should be applied recursively (only
really relevant for rbind mount entries), and idmap indicates that it
should not be applied recursively (the default). If no mappings are
specified for the mount, we use the userns configuration of the
container. This matches the behaviour in the currently-unreleased
runtime-spec.

This includes a minor change to the state.json serialisation format, but
because there has been no released version of runc with commit
fbf183c6f8 ("Add uid and gid mappings to mounts"), we can safely make
this change without affecting running containers. Doing it this way
makes it much easier to handle m.IsIDMapped() and indicating that a
mapping has been specified.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 11:36:42 +11:00
Aleksa Sarai 7795ca4668 specconv: handle recursive attribute clearing more consistently
If a user specifies a configuration like "rro, rrw", we should have
similar behaviour to "ro, rw" where we clear the previous flags so that
the last specified flag takes precendence.

Fixes: 382eba4354 ("Support recursive mount attrs ("rro", "rnosuid", "rnodev", ...)")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-14 11:36:42 +11:00
Aleksa Sarai ebcef3e651 specconv: temporarily allow userns path and mapping if they match
It turns out that the error added in commit 09822c3da8 ("configs:
disallow ambiguous userns and timens configurations") causes issues with
containerd and CRIO because they pass both userns mappings and a userns
path.

These configurations are broken, but to avoid the regression in this one
case, output a warning to tell the user that the configuration is
incorrect but we will continue to use it if and only if the configured
mappings are identical to the mappings of the provided namespace.

Fixes: 09822c3da8 ("configs: disallow ambiguous userns and timens configurations")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-10 20:49:43 +11:00
Aleksa Sarai 09822c3da8 configs: disallow ambiguous userns and timens configurations
For userns and timens, the mappings (and offsets, respectively) cannot
be changed after the namespace is first configured. Thus, configuring a
container with a namespace path to join means that you cannot also
provide configuration for said namespace. Previously we would silently
ignore the configuration (and just join the provided path), but we
really should be returning an error (especially when you consider that
the configuration userns mappings are used quite a bit in runc with the
assumption that they are the correct mapping for the userns -- but in
this case they are not).

In the case of userns, the mappings are also required if you _do not_
specify a path, while in the case of the time namespace you can have a
container with a timens but no mappings specified.

It should be noted that the case checking that the user has not
specified a userns path and a userns mapping needs to be handled in
specconv (as opposed to the configuration validator) because with this
patchset we now cache the mappings of path-based userns configurations
and thus the validator can't be sure whether the mapping is a cached
mapping or a user-specified one. So we do the validation in specconv,
and thus the test for this needs to be an integration test.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-05 17:46:09 +11:00
Aleksa Sarai 1912d5988b *: actually support joining a userns with a new container
Our handling for name space paths with user namespaces has been broken
for a long time. In particular, the need to parse /proc/self/*id_map in
quite a few places meant that we would treat userns configurations that
had a namespace path as if they were a userns configuration without
mappings, resulting in errors.

The primary issue was down to the id translation helper functions, which
could only handle configurations that had explicit mappings. Obviously,
when joining a user namespace we need to map the ids but figuring out
the correct mapping is non-trivial in comparison.

In order to get the mapping, you need to read /proc/<pid>/*id_map of a
process inside the userns -- while most userns paths will be of the form
/proc/<pid>/ns/user (and we have a fast-path for this case), this is not
guaranteed and thus it is necessary to spawn a process inside the
container and read its /proc/<pid>/*id_map files in the general case.

As Go does not allow us spawn a subprocess into a target userns,
we have to use CGo to fork a sub-process which does the setns(2). To be
honest, this is a little dodgy in regards to POSIX signal-safety(7) but
since we do no allocations and we are executing in the forked context
from a Go program (not a C program), it should be okay. The other
alternative would be to do an expensive re-exec (a-la nsexec which would
make several other bits of runc more complicated), or to use nsenter(1)
which might not exist on the system and is less than ideal.

Because we need to logically remap users quite a few times in runc
(including in "runc init", where joining the namespace is not feasable),
we cache the mapping inside the libcontainer config struct. A future
patch will make sure that we stop allow invalid user configurations
where a mapping is specified as well as a userns path to join.

Finally, add an integration test to make sure we don't regress this again.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-12-05 17:46:08 +11:00
Zheao.Li 98511bb40e linux: Support setting execution domain via linux personality
carry #3126

Co-authored-by: Aditya R <arajan@redhat.com>
Signed-off-by: Zheao.Li <me@manjusaka.me>
2023-10-27 19:33:37 +08:00
Aleksa Sarai 7c71a22705 rootfs: remove --no-mount-fallback and finally fix MS_REMOUNT
The original reasoning for this option was to avoid having mount options
be overwritten by runc. However, adding command-line arguments has
historically been a bad idea because it forces strict-runc-compatible
OCI runtimes to copy out-of-spec features directly from runc and these
flags are usually quite difficult to enable by users when using runc
through several layers of engines and orchestrators.

A far more preferable solution is to have a heuristic which detects
whether copying the original mount's mount options would override an
explicit mount option specified by the user. In this case, we should
return an error. You only end up in this path in the userns case, if you
have a bind-mount source with locked flags.

During the course of writing this patch, I discovered that several
aspects of our handling of flags for bind-mounts left much to be
desired. We have completely botched the handling of explicitly cleared
flags since commit 97f5ee4e6a ("Only remount if requested flags differ
from current"), with our behaviour only becoming increasingly more weird
with 50105de1d8 ("Fix failure with rw bind mount of a ro fuse") and
da780e4d27 ("Fix bind mounts of filesystems with certain options
set"). In short, we would only clear flags explicitly request by the
user purely by chance, in ways that it really should've been reported to
us by now. The most egregious is that mounts explicitly marked "rw" were
actually mounted "ro" if the bind-mount source was "ro" and no other
special flags were included. In addition, our handling of atime was
completely broken -- mostly due to how subtle the semantics of atime are
on Linux.

Unfortunately, while the runtime-spec requires us to implement
mount(8)'s behaviour, several aspects of the util-linux mount(8)'s
behaviour are broken and thus copying them makes little sense. Since the
runtime-spec behaviour for this case (should mount options for a "bind"
mount use the "mount --bind -o ..." or "mount --bind -o remount,..."
semantics? Is the fallback code we have for userns actually
spec-compliant?) and the mount(8) behaviour (see [1]) are not
well-defined, this commit simply fixes the most obvious aspects of the
behaviour that are broken while keeping the current spirit of the
implementation.

NOTE: The handling of atime in the base case is left for a future PR to
deal with. This means that the atime of the source mount will be
silently left alone unless the fallback path needs to be taken, and any
flags not explicitly set will be cleared in the base case. Whether we
should always be operating as "mount --bind -o remount,..." (where we
default to the original mount source flags) is a topic for a separate PR
and (probably) associated runtime-spec PR.

So, to resolve this:

* We store which flags were explicitly requested to be cleared by the
  user, so that we can detect whether the userns fallback path would end
  up setting a flag the user explicitly wished to clear. If so, we
  return an error because we couldn't fulfil the configuration settings.

* Revert 97f5ee4e6a ("Only remount if requested flags differ from
  current"), as missing flags do not mean we can skip MS_REMOUNT (in
  fact, missing flags are how you indicate a flag needs to be cleared
  with mount(2)). The original purpose of the patch was to fix the
  userns issue, but as mentioned above the correct mechanism is to do a
  fallback mount that copies the lockable flags from statfs(2).

* Improve handling of atime in the fallback case by:
    - Correctly handling the returned flags in statfs(2).
    - Implement the MNT_LOCK_ATIME checks in our code to ensure we
      produce errors rather than silently producing incorrect atime
      mounts.

* Improve the tests so we correctly detect all of these contingencies,
  including a general "bind-mount atime handling" test to ensure that
  the behaviour described here is accurate.

This change also inlines the remount() function -- it was only ever used
for the bind-mount remount case, and its behaviour is very bind-mount
specific.

[1]: https://github.com/util-linux/util-linux/issues/2433

Reverts: 97f5ee4e6a ("Only remount if requested flags differ from current")
Fixes: 50105de1d8 ("Fix failure with rw bind mount of a ro fuse")
Fixes: da780e4d27 ("Fix bind mounts of filesystems with certain options set")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-10-24 17:28:25 +11:00
utam0k 770728e16e Support process.scheduler
Spec: https://github.com/opencontainers/runtime-spec/pull/1188
Fix: https://github.com/opencontainers/runc/issues/3895

Co-authored-by: lifubang <lifubang@acmcoder.com>
Signed-off-by: utam0k <k0ma@utam0k.jp>
Signed-off-by: lifubang <lifubang@acmcoder.com>
2023-10-04 15:53:18 +08:00
Eng Zer Jun c378602bf6 libct/specconv: remove redundant nil check
From the Go specification:

  "1. For a nil slice, the number of iterations is 0." [1]

Therefore, an additional nil check for before the loop is unnecessary.

[1]: https://go.dev/ref/spec#For_range

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-09-08 23:40:51 +08:00
Kailun Qin e1584831b6 libct/cg: add CFS bandwidth burst for CPU
Burstable CFS controller is introduced in Linux 5.14. This helps with
parallel workloads that might be bursty. They can get throttled even
when their average utilization is under quota. And they may be latency
sensitive at the same time so that throttling them is undesired.

This feature borrows time now against the future underrun, at the cost
of increased interference against the other system users, by introducing
cfs_burst_us into CFS bandwidth control to enact the cap on unused
bandwidth accumulation, which will then used additionally for burst.

The patch adds the support/control for CFS bandwidth burst.

runtime-spec: https://github.com/opencontainers/runtime-spec/pull/1120

Co-authored-by: Akihiro Suda <suda.kyoto@gmail.com>
Co-authored-by: Nadeshiko Manju <me@manjusaka.me>
Signed-off-by: Kailun Qin <kailun.qin@intel.com>
2023-09-06 23:23:30 +08:00
Kir Kolyshkin f62f0bdfbf Remove nolint annotations for unix errno comparisons
golangci-lint v1.54.2 comes with errorlint v1.4.4, which contains
the fix [1] whitelisting all errno comparisons for errors coming from
x/sys/unix.

Thus, these annotations are no longer necessary. Hooray!

[1] https://github.com/polyfloyd/go-errorlint/pull/47
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2023-08-24 17:28:10 -07:00
Aleksa Sarai 9acfd7b1a3 timens: minor cleanups
Fix up a few things that were flagged in the review of the original
timens PR, namely around error handling and validation.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-08-10 18:59:55 +10:00
Chethan Suresh ebc2e7c435 Support time namespace
"time" namespace was introduced in Linux v5.6
support new time namespace to set boottime and monotonic time offset

Example runtime spec

"timeOffsets": {
    "monotonic": {
        "secs": 172800,
        "nanosecs": 0
    },
    "boottime": {
        "secs": 604800,
        "nanosecs": 0
    }
}

Signed-off-by: Chethan Suresh <chethan.suresh@sony.com>
2023-08-03 10:12:01 +05:30
Ruediger Pluem da780e4d27 Fix bind mounts of filesystems with certain options set
Currently bind mounts of filesystems with nodev, nosuid, noexec,
noatime, relatime, strictatime, nodiratime options set fail in rootless
mode if the same options are not set for the bind mount.
For ro filesystems this was resolved by #2570 by remounting again
with ro set.

Follow the same approach for nodev, nosuid, noexec, noatime, relatime,
strictatime, nodiratime but allow to revert back to the old behaviour
via the new `--no-mount-fallback` command line option.

Add a testcase to verify that bind mounts of filesystems with nodev,
nosuid, noexec, noatime options set work in rootless mode.
Add a testcase that mounts a nodev, nosuid, noexec, noatime filesystem
with a ro flag.
Add two further testcases that ensure that the above testcases would
fail if the `--no-mount-fallback` command line option is set.

* contrib/completions/bash/runc:
      Add `--no-mount-fallback` command line option for bash completion.

* create.go:
      Add `--no-mount-fallback` command line option.

* restore.go:
      Add `--no-mount-fallback` command line option.

* run.go:
      Add `--no-mount-fallback` command line option.

* libcontainer/configs/config.go:
      Add `NoMountFallback` field to the `Config` struct to store
      the command line option value.

* libcontainer/specconv/spec_linux.go:
      Add `NoMountFallback` field to the `CreateOpts` struct to store
      the command line option value and store it in the libcontainer
      config.

* utils_linux.go:
      Store the command line option value in the `CreateOpts` struct.

* libcontainer/rootfs_linux.go:
      In case that `--no-mount-fallback` is not set try to remount the
      bind filesystem again with the options nodev, nosuid, noexec,
      noatime, relatime, strictatime or nodiratime if they are set on
      the source filesystem.

* tests/integration/mounts_sshfs.bats:
      Add testcases and rework sshfs setup to allow specifying
      different mount options depending on the test case.

Signed-off-by: Ruediger Pluem <ruediger.pluem@vodafone.com>
2023-07-28 16:32:02 -07:00
Francis Laniel c47f58c4e9 Capitalize [UG]idMappings as [UG]IDMappings
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
2023-07-21 13:55:34 +02:00
Rodrigo Campos 4b668a8224 Switch setupUserNamespace() to use the toConfigIDMap() helper
We just added this helper for other parts of the code, let's switch this
function to use the helper too.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-07-11 16:17:48 +02:00
Rodrigo Campos fbf183c6f8 Add uid and gid mappings to mounts
Co-authored-by: Francis Laniel <flaniel@linux.microsoft.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-07-11 16:17:48 +02:00
utam0k d9230602e9 Implement to set a domainname
opencontainers/runtime-spec#1156

Signed-off-by: utam0k <k0ma@utam0k.jp>
2023-04-12 13:31:20 +00:00
Akihiro Suda 92a4ccb84e specconv: avoid mapping "acl" to MS_POSIXACL
From https://github.com/torvalds/linux/commit/caaef1ba8c9ee7a54b53dd8bf4bb7e8658185583 :

> In fact SB_POSIXACL is an internal flag, and accepting MS_POSIXACL on
> the mount(2) interface is possibly a bug.

Fix issue 3738

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-02-11 22:50:41 +09:00