Two new seccomp actions have been added to the libseccomp-golang
dependency, which can now be supported by runc, too.
ActKillThread kills the thread that violated the rule. It is the same as
ActKill. All other threads from the same thread group will continue to
execute.
ActKillProcess kills the process that violated the rule. All threads in
the thread group are also terminated. This action is only usable when
libseccomp API level 3 or higher is supported.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
This commit implements support for the SCMP_ACT_NOTIFY action. It
requires libseccomp-2.5.0 to work but runc still works with older
libseccomp if the seccomp policy does not use the SCMP_ACT_NOTIFY
action.
A new synchronization step between runc[INIT] and runc run is introduced
to pass the seccomp fd. runc run fetches the seccomp fd with pidfd_getfd
from the runc[INIT] process and sends it to the seccomp agent using
SCM_RIGHTS.
As suggested by @kolyshkin, we also make writeSync() a wrapper of
writeSyncWithFd() and wrap the error there. To avoid pointless errors,
we made some existing code paths just return the error instead of
re-wrapping it. If we don't do it, the error will look like:
writing syncT <act>: writing syncT: <err>
By adjusting the code path, the errors now look like this:
writing syncT <act>: <err>
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
Go 1.17 introduces a new (and better) way to specify build tags.
For more info, see https://golang.org/design/draft-gobuild.
As a way to seamlessly switch from old to new build tags, gofmt (and
gopls) from go 1.17 adds the new tags along with the old ones.
Later, when go < 1.17 is no longer supported, the old build tags
can be removed.
Now, as I started to use the latest gopls (v0.7.1), it adds these tags
while I edit. Rather than randomly adding new build tags, I guess
it is better to do it once for all files.
Mind that previous commits removed some tags that were useless,
so this one only touches packages that can at least be built
on non-linux.
Brought to you by
go1.17 fmt ./...
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
All the errors returned from Validate should tell about a configuration
error. Some were lacking a context, so add it.
While at it, fix abusing fmt.Errorf and logrus.Warnf where the arguments
do not contain %-style formatting.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is helpful to kubernetes in cases where it knows for sure that the freeze
is not required (since it created the systemd unit with no device
restrictions).
As the code is trivial, no tests are required.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Handle the ClosID parameter of IntelRdt. This makes it possible to use
pre-configured classes/ClosIDs and avoid running out of available IDs,
which easily happens with per-container classes.
Remove validator checks for empty L3CacheSchema and MemBwSchema fields
in order to be able to leave them empty, and only specify ClosID for
a pre-configured class.
Signed-off-by: Markus Lehtonen <markus.lehtonen@intel.com>
This was initially added by commits 41d9d26513 and 4a8f0b4db4,
apparently to implement docker run --cgroup container:ID, which was
never merged. Therefore, this code is not and was never used.
It needs to be removed mainly because having it makes it much harder to
understand how the cgroup manager works (because with this in place we
have not one or two but three sets of cgroup paths to think about).
Note that if the paths are known and there is a need to add a PID to an
existing cgroup, the cgroup manager is not needed at all -- something like
cgroups.WriteCgroupProc or cgroups.EnterPid is sufficient (and the
latter is what runc exec uses in (*setnsProcess).start).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Replace ioutil.TempDir (mostly) with t.TempDir, which requires no
explicit cleanup.
While at it, fix incorrect usage of os.ModePerm in libcontainer/intelrdt
test. This is supposed to be a mask, not mode bits.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This should result in no change when the error is printed, but makes the
errors returned unwrappable, meaning errors.As and errors.Is will work.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using fmt.Errorf for errors that do not have %-style formatting
directives is overkill. Switch to errors.New.
Found by
git grep fmt.Errorf | grep -v ^vendor | grep -v '%'
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commits 1f1e91b1a0 and 2192670a24
added validation for mountpoints to be an absolute path, to match the OCI
specs.
Unfortunately, the old behavior (accepting the path to be a relative path)
has been around for a long time, and although "not according to the spec",
various higher level runtimes rely on this behavior.
While higher level runtimes have been updated to address this requirement,
there will be a transition period before all runtimes are updated to carry
these fixes.
This patch relaxes the validation to generate a WARNING instead of failing.
This gives runtimes time to update, while still allowing them to move to the
current runc version, which includes security fixes.
We can remove this exception in a future patch release.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These were deprecated and moved; the stubs were included in the
last two (rc94, rc95) releases, so external consumers would have
the chance to update their code.
Remove them now so that they do not get into v1.0.0 GA.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The runc update CLI is not able to modify devices, so let's set SkipDevices
(so that a cgroup controller won't try to update devices cgroup).
This helps use cases when some other device management (NVIDIA GPUs)
applies its configuration on top of what runc does.
Make sure we do not save SkipDevices into state.json.
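One common way to keep a field out of a JSON state file in Go is the
`json:"-"` struct tag. The sketch below uses a hypothetical, cut-down
resources struct and an invented encodeState helper; whether runc uses
exactly this mechanism is an assumption here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resources is a hypothetical stand-in for the cgroup resources struct.
type resources struct {
	Memory      int64 `json:"memory"`
	SkipDevices bool  `json:"-"` // runtime-only flag, never serialized
}

// encodeState marshals r the way a state file writer might.
func encodeState(r resources) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", fmt.Errorf("encoding state: %w", err)
	}
	return string(b), nil
}

func main() {
	s, err := encodeState(resources{Memory: 1024, SkipDevices: true})
	fmt.Println(s, err)
}
```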
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.
Brought to you by
git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w
Looking at the diff, all these changes make sense.
Also, replace gofmt with gofumpt in golangci.yml.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a somewhat radical approach to deal with kernel memory.
Per-cgroup kernel memory limiting was always problematic. A few
examples:
- older kernels had bugs and were even oopsing sometimes (best example
is RHEL7 kernel);
- kernel is unable to reclaim the kernel memory so once the limit is
hit a cgroup is toasted;
- some kernel memory allocations don't allow failing.
In addition to that,
- users don't have a clue about how to set kernel memory limits
(as the concept is much more complicated than e.g. [user] memory);
- different kernels might have different kernel memory usage,
which is sort of unexpected;
- cgroup v2 does not have a [dedicated] kmem limit knob, and thus
runc silently ignores kernel memory limits for v2;
- kernel v5.4 made cgroup v1 kmem.limit obsolete (see
https://github.com/torvalds/linux/commit/0158115f702b).
In view of all this, and as the runtime-spec lists memory.kernel
and memory.kernelTCP as OPTIONAL, let's ignore kernel memory
limits (for cgroup v1, same as we're already doing for v2).
This should result in fewer bugs and a better user experience.
The only bad side effect from it might be that stat can show kernel
memory usage as 0 (since the accounting is not enabled).
[v2: add a warning in specconv that limits are ignored]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit ccdd75760c introduced the HookName type
for hooks, but set this type only on the Prestart const, not on the
other hooks.
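The underlying Go rule is that a constant with its own explicit value
does not inherit the type of the previous line in a const block. A
reduced sketch (two hooks per block only; typeOf is a hypothetical
helper):

```go
package main

import "fmt"

type HookName string

// Problem: only Prestart is typed; CreateRuntime has its own value and
// no type, so it is an untyped string constant.
const (
	Prestart      HookName = "prestart"
	CreateRuntime          = "createRuntime"
)

// Fix: spell out the type on every constant.
const (
	Poststart HookName = "poststart"
	Poststop  HookName = "poststop"
)

// typeOf returns the dynamic type a value gets when stored in an interface.
func typeOf(v interface{}) string { return fmt.Sprintf("%T", v) }

func main() {
	fmt.Println(typeOf(Prestart), typeOf(CreateRuntime))
	fmt.Println(typeOf(Poststart), typeOf(Poststop))
}
```

Because untyped string constants convert implicitly, code that assigns
CreateRuntime to a HookName variable still compiles, which is why the
omission is easy to miss.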
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add some minimal validation for cgroups. The following checks
are implemented:
- cgroup name and/or prefix (or path) is set;
- for cgroup v1, unified resources are not set;
- for cgroup v2, if memorySwap is set, memory is also set,
and memorySwap > memory.
This makes some invalid configurations fail earlier (before runc init
is started), which is better.
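The three checks listed above could look roughly like this. It is a
sketch under stated assumptions: cgroupConfig and validateCgroup are
hypothetical, 0 is treated as "not set", special values such as -1 are
not handled, and the first check is read as "at least one of name,
prefix or path is non-empty".

```go
package main

import (
	"errors"
	"fmt"
)

// cgroupConfig is a hypothetical, cut-down cgroup configuration.
type cgroupConfig struct {
	Name, Prefix, Path string
	Memory, MemorySwap int64             // 0 means "not set" in this sketch
	Unified            map[string]string // cgroup v2 only
}

// validateCgroup applies the checks before any process is started.
func validateCgroup(c cgroupConfig, v2 bool) error {
	if c.Name == "" && c.Prefix == "" && c.Path == "" {
		return errors.New("cgroup: name/prefix or path must be set")
	}
	if !v2 {
		if len(c.Unified) > 0 {
			return errors.New("cgroup: unified resources require cgroup v2")
		}
		return nil
	}
	if c.MemorySwap != 0 {
		if c.Memory == 0 {
			return errors.New("cgroup: memorySwap is set but memory is not")
		}
		if c.MemorySwap <= c.Memory {
			return fmt.Errorf("cgroup: memorySwap (%d) must be greater than memory (%d)",
				c.MemorySwap, c.Memory)
		}
	}
	return nil
}

func main() {
	c := cgroupConfig{Path: "/mygroup", Memory: 200, MemorySwap: 100}
	fmt.Println(validateCgroup(c, true))
}
```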
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As reported by go test -race ./libcontainer/configs:
=== RUN TestCommandHookRunTimeout
==================
WARNING: DATA RACE
Read at 0x00c000202230 by goroutine 23:
os/exec.(*Cmd).Wait()
/usr/lib/golang/src/os/exec/exec.go:502 +0x91
github.com/opencontainers/runc/libcontainer/configs.Command.Run()
/home/kir/go/src/github.com/opencontainers/runc/libcontainer/configs/config.go:390 +0x58c
github.com/opencontainers/runc/libcontainer/configs_test.TestCommandHookRunTimeout()
/home/kir/go/src/github.com/opencontainers/runc/libcontainer/configs/config_test.go:223 +0x3ed
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:1123 +0x202
Previous write at 0x00c000202230 by goroutine 27:
os/exec.(*Cmd).Wait()
/usr/lib/golang/src/os/exec/exec.go:505 +0xb4
github.com/opencontainers/runc/libcontainer/configs.Command.Run.func1()
/home/kir/go/src/github.com/opencontainers/runc/libcontainer/configs/config.go:373 +0x55
Goroutine 23 (running) created at:
testing.(*T).Run()
/usr/lib/golang/src/testing/testing.go:1168 +0x5bb
testing.runTests.func1()
/usr/lib/golang/src/testing/testing.go:1439 +0xa6
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:1123 +0x202
testing.runTests()
/usr/lib/golang/src/testing/testing.go:1437 +0x612
testing.(*M).Run()
/usr/lib/golang/src/testing/testing.go:1345 +0x3b3
main.main()
_testmain.go:69 +0x236
Goroutine 27 (running) created at:
github.com/opencontainers/runc/libcontainer/configs.Command.Run()
/home/kir/go/src/github.com/opencontainers/runc/libcontainer/configs/config.go:372 +0x415
github.com/opencontainers/runc/libcontainer/configs_test.TestCommandHookRunTimeout()
/home/kir/go/src/github.com/opencontainers/runc/libcontainer/configs/config_test.go:223 +0x3ed
testing.tRunner()
/usr/lib/golang/src/testing/testing.go:1123 +0x202
==================
testing.go:1038: race detected during execution of test
--- FAIL: TestCommandHookRunTimeout (0.10s)
Apparently, the issue is that we call Wait() twice for the same command,
and the two calls can race internally.
The fix is easy -- since we already have a waiting goroutine,
wait for it to return instead of calling a second Wait().
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Test that CommandHook actually executes a new process with the given env
variables, parameters and json state.
This commit also solves an issue with the previous approach, which called
'os.Exit(0)' and thus failed to signal test failures.
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
In case many net.* sysctls are provided, and we're not running
in the host netns, the function keeps repeating the isNetNS check
for every such sysctl. This is a waste of resources.
Do the isNetNS check only once, and only if needed.
Note that using sync.Once is not really needed here; we could
have used a boolean variable to skip the repeated check, but
it looks more idiomatic this way.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
In case an nsfs mount (such as /run/docker/netns/xxxx) is provided as
the netns path, the current way of determining whether the path is the
host netns does not work.
The proper way to check is to do stat(2) and compare dev_t and
inode fields, which is what this commit does.
This is a minimal fix which does not try to optimize the repeated
check in case more than one net.* sysctl is given and there is
no error.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Move the Device-related types to libcontainer/devices, so that
the package can be used in isolation. Aliases have been created
in libcontainer/configs for backward compatibility.
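A type alias makes the old and new names refer to the very same type, so
existing callers need no conversions. The sketch below squeezes both
"packages" into one file for brevity; devicesDevice and describe are
hypothetical names, and the fields shown are only a subset chosen for
illustration.

```go
package main

import "fmt"

// devicesDevice stands in for the type in its new home package.
type devicesDevice struct {
	Path         string
	Major, Minor int64
}

// Device is the backward-compatible alias kept in the old package.
// Note the "=": this is an alias, not a new named type.
type Device = devicesDevice

// describe accepts the new type; values declared through the alias can
// be passed without conversion.
func describe(d devicesDevice) string {
	return fmt.Sprintf("%s (%d:%d)", d.Path, d.Major, d.Minor)
}

func main() {
	old := Device{Path: "/dev/null", Major: 1, Minor: 3}
	fmt.Println(describe(old))
}
```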
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>