Commit Graph

38 Commits

Author SHA1 Message Date
Kir Kolyshkin 0079bee17f Support specs.LinuxSeccompFlagWaitKillableRecv
This adds support for WaitKillableRecv seccomp flag
(also known as SCMP_FLTATR_CTL_WAITKILL in libseccomp and
as SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV in the kernel).

This requires:
 - libseccomp >= 2.6.0
 - libseccomp-golang >= 0.11.0
 - linux kernel >= 5.19

Note that this flag does not make sense without NEW_LISTENER, and
the kernel returns EINVAL when SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
is set but SECCOMP_FILTER_FLAG_NEW_LISTENER is not set.

For runc this means that .linux.seccomp.listenerPath should also be set,
and some of the seccomp rules should have SCMP_ACT_NOTIFY action. This
is why the flag is tested separately in seccomp-notify.bats.

At the moment the only adequate CI environment for this functionality is
Fedora 43. On all other platforms (including CentOS 10 and Ubuntu 24.04)
it is skipped similar to this:

> ok 251 runc run [seccomp] (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) # skip requires libseccomp >= 2.6.0 and API level >= 7 (current version: 2.5.6, API level: 6)

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2026-03-16 10:48:42 -07:00
Akihiro Suda e7848482e2 Revert "libcontainer: seccomp: pass around *os.File for notifyfd"
This reverts commit 20b95f23ca.

> Conflicts:
>	libcontainer/init_linux.go

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2024-07-03 17:28:12 +09:00
Sebastiaan van Stijn c14213399a remove pre-go1.17 build-tags
Removed pre-go1.17 build-tags with go fix;

    go fix -mod=readonly ./...

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-29 15:45:25 +02:00
Aleksa Sarai 20b95f23ca libcontainer: seccomp: pass around *os.File for notifyfd
*os.File is correctly tracked by the garbage collector, and there's no
need to use raw file descriptors for this code.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2023-08-15 19:54:24 -07:00
Kir Kolyshkin 19a9d9fc9e tests/int: use runc features in seccomp flags test
This test (initially added by commit 58ea21daef and later amended in
commit 26dc55ef1a) currently has two major deficiencies:

1. All possible flag combinations, and their respective numeric values,
   have to be explicitly listed. Currently we support 3 flags, so
   there is only 2^3 - 1 = 7 combinations, but adding more flags will
   become increasingly difficult (for example, 5 flags will result in
   31 combinations).

2. The test requires kernel 4.17 (for SECCOMP_FILTER_FLAG_SPEC_ALLOW),
   and not doing any tests when running on an older kernel. This, too,
   will make it more difficult to add extra flags in the future.

Both issues can be solved by using runc features which now prints all
known and supported runc flags. We still have to hardcode the numeric
values of all flags, but most of the other work is coded now.

In particular:

 * The test only uses supported flags, meaning it can be used with
   older kernels, removing the limitation (2) above.

 * The test calculates the powerset (all possible combinations) of
   flags and their numeric values. This makes it easier to add more
   flags, removing the limitation (1) above.

 * The test will fail (in flags_value) if any new flags will be added
   to runc but the test itself is not amended.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-11-29 17:24:32 -08:00
Kir Kolyshkin 076745a40f runc features: add seccomp filter flags
Amend runc features to print seccomp flags. Two set of flags are added:
 * known flags are those that this version of runc is aware of;
 * supported flags are those that can be set; normally, this is the same
   set as known flags, but due to older version of kernel and/or
   libseccomp, some known flags might be unsupported.

This commit also consolidates three different switch statements dealing
with flags into one, in func setFlag. A note is added to this function
telling what else to look for when adding new flags.

Unfortunately, it also adds a list of known flags, that should be
kept in sync with the switch statement.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-11-29 17:24:32 -08:00
Kir Kolyshkin b265d1288f libct/seccomp: enable binary tree optimization
This makes libseccomp produce a BPF which uses a binary tree for
syscalls (instead of linear set of if statements).

It does not make sense to enable binary tree for small set of rules,
so don't do that if we have less than 8 syscalls (the number is chosen
arbitrarily).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-10-31 16:59:49 -07:00
Kir Kolyshkin c7dc8b1fed libct/seccomp/patchbpf: support SPEC_ALLOW
Commit 58ea21daef added support for seccomp flags such as
SPEC_ALLOW, but it does not work as expected, because since commit
7a8d7162f9 we do not use libseccomp-golang's Load(), but
handle flags separately in patchbfp.

This fixes setting SPEC_ALLOW flag.

Add a comment to not forget to amend filterFlags when adding new flags.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2022-08-29 15:48:10 -07:00
Alban Crequy 58ea21daef seccomp: add support for flags
List of seccomp flags defined in runtime-spec:
* SECCOMP_FILTER_FLAG_TSYNC
* SECCOMP_FILTER_FLAG_LOG
* SECCOMP_FILTER_FLAG_SPEC_ALLOW

Note that runc does not apply SECCOMP_FILTER_FLAG_TSYNC. It does not
make sense to apply the seccomp filter on only one thread; other threads
will be terminated after exec anyway.

See similar commit in crun:
https://github.com/containers/crun/commit/fefabffa2816ea343068ed036a86944393db189a

Note that SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV (introduced by
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?id=c2aa2dfef243
in Linux 5.19-rc1) is not added yet because Linux 5.19 is not released
yet.

Signed-off-by: Alban Crequy <albancrequy@microsoft.com>
2022-07-28 16:25:26 +02:00
CrazyMax 29a56b5206 fix deprecated ActKill
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2022-05-03 13:15:01 +02:00
Akihiro Suda 520702dac5 Add runc features command
Fix issue 3274

See `types/features/features.go`.

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2021-11-30 16:40:39 +09:00
Sascha Grunert 4a4d4f109b Add support for seccomp actions ActKillThread and ActKillProcess
Two new seccomp actions have been added to the libseccomp-golang
dependency, which can be now supported by runc, too.

ActKillThread kills the thread that violated the rule. It is the same as
ActKill. All other threads from the same thread group will continue to
execute.

ActKillProcess kills the process that violated the rule. All threads in
the thread group are also terminated. This action is only usable when
libseccomp API level 3 or higher is supported.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-09-09 17:47:00 +10:00
Aleksa Sarai 4a751b05a9 seccomp: drop unnecessary const SCMP_ACT_* defines
These are just boilerplate and are only really useful for the two
actions which require us to set a default errno/aux value (ActErrno and
ActTrace).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-09-09 17:46:34 +10:00
Alban Crequy 2b025c0173 Implement Seccomp Notify
This commit implements support for the SCMP_ACT_NOTIFY action. It
requires libseccomp-2.5.0 to work but runc still works with older
libseccomp if the seccomp policy does not use the SCMP_ACT_NOTIFY
action.

A new synchronization step between runc[INIT] and runc run is introduced
to pass the seccomp fd. runc run fetches the seccomp fd with pidfd_get
from the runc[INIT] process and sends it to the seccomp agent using
SCM_RIGHTS.

As suggested by @kolyshkin, we also make writeSync() a wrapper of
writeSyncWithFd() and wrap the error there. To avoid pointless errors,
we made some existing code paths just return the error instead of
re-wrapping it. If we don't do it, error will look like:

	writing syncT <act>: writing syncT: <err>

By adjusting the code path, now they just look like this
	writing syncT <act>: <err>

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
2021-09-07 13:04:24 +02:00
Kir Kolyshkin d8da00355e *: add go-1.17+ go:build tags
Go 1.17 introduce this new (and better) way to specify build tags.
For more info, see https://golang.org/design/draft-gobuild.

As a way to seamlessly switch from old to new build tags, gofmt (and
gopls) from go 1.17 adds the new tags along with the old ones.

Later, when go < 1.17 is no longer supported, the old build tags
can be removed.

Now, as I started to use latest gopls (v0.7.1), it adds these tags
while I edit. Rather than to randomly add new build tags, I guess
it is better to do it once for all files.

Mind that previous commits removed some tags that were useless,
so this one only touches packages that can at least be built
on non-linux.

Brought to you by

        go1.17 fmt ./...

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-30 20:58:22 -07:00
Kir Kolyshkin 9ff64c3d97 *: rm redundant linux build tag
For files that end with _linux.go or _linux_test.go, there is no need to
specify linux build tag, as it is assumed from the file name.

In addition, rename libcontainer/notify_linux_v2.go -> libcontainer/notify_v2_linux.go
for the file name to make sense.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-30 20:15:00 -07:00
Kir Kolyshkin 5dd92fd9b4 libct/seccomp: skip redundant rules
This fixes using runc with podman on my system (Fedora 34).

> $ podman --runtime `pwd`/runc run --rm --memory 4M fedora echo it works
> Error: unable to start container process: error adding seccomp filter rule for syscall bdflush: permission denied: OCI permission denied

The problem is, libseccomp returns EPERM when a redundant rule (i.e. the
rule with the same action as the default one) is added, and podman (on
my machine) sets the following rules in config.json:

    <....>
    "seccomp": {
      "defaultAction": "SCMP_ACT_ERRNO",
      "architectures": [
        "SCMP_ARCH_X86_64",
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ],
      "syscalls": [
        {
          "names": [
            "bdflush",
            "io_pgetevents",
            <....>
          ],
          "action": "SCMP_ACT_ERRNO",
          "errnoRet": 1
        },
        <....>

(Note that defaultErrnoRet is not set, but it defaults to 1).

With this commit, it works:

> $ podman --runtime `pwd`/runc run --memory 4M fedora echo it works
> it works

Add an integration test (that fails without the fix).

Similar crun commit:
 * https://github.com/containers/crun/commit/08229f3fb904c5ea19a7d9

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-07-27 00:04:59 -07:00
Kir Kolyshkin e44bee1026 libct/seccomp: warn about unknown syscalls
Rather than silently ignoring unknown syscalls, print a warning.

While at it, fix imports ordering (stdlib, others, ours).

[v2: demote Warn to Debug]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-07-27 00:04:59 -07:00
Kir Kolyshkin 7be93a66b9 *: fmt.Errorf: use %w when appropriate
This should result in no change when the error is printed, but make the
errors returned unwrappable, meaning errors.As and errors.Is will work.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-22 16:09:47 -07:00
Giuseppe Scrivano c61f606254 libcontainer: honor seccomp defaultErrnoRet
https://github.com/opencontainers/runtime-spec/pull/1087 added support
for defaultErrnoRet to the OCI runtime specs.

If a defaultErrnoRet is specified, disable patching the generated
libseccomp cBPF.

Closes: https://github.com/opencontainers/runc/issues/2943

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-05-17 09:23:32 +02:00
Kir Kolyshkin ac93746c4d libct/seccomp: rm IsEnabled
seccomp.IsEnabled is not well defined (the presence of Seccomp: field
in /proc/self/status does not tell us whether CONFIG_SECCOMP_FILTER
is enabled in the kernel; parsing all keys in /proc/self/status is a
moderate waste of resources, etc).

I traced its addition back to [1] and even in there it is not clear
what for it was added. There were never an internal user (except
for the recently added one, removed by the previous commit), and
can't find any external users (but found two copy-pastes of this
code, suffering from the same problems, see [2] and [3]).

Since it is broken and has no users, remove it.

[1] https://github.com/opencontainers/runc/pull/471
[2] https://github.com/containerd/containerd/blob/master/pkg/seccomp/seccomp_linux.go
[3] https://github.com/containers/common/blob/master/pkg/seccomp/supported.go

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-03-23 16:59:46 -07:00
Aleksa Sarai 7a8d7162f9 seccomp: prepend -ENOSYS stub to all filters
Having -EPERM is the default was a fairly significant mistake from a
future-proofing standpoint in that it makes any new syscall return a
non-ignorable error (from glibc's point of view). We need to correct
this now because faccessat2(2) is something glibc critically needs to
have support for, but they're blocked on container runtimes because we
return -EPERM unconditionally (leading to confusion in glibc). This is
also a problem we're probably going to keep running into in the future.

Unfortunately there are several issues which stop us from having a clean
solution to this problem:

 1. libseccomp has several limitations which require us to emulate
    behaviour we want:

    a. We cannot do logic based on syscall number, meaning we cannot
       specify a "largest known syscall number";
    b. libseccomp doesn't know in which kernel version a syscall was
       added, and has no API for "minimum kernel version" so we cannot
       simply ask libseccomp to generate sane -ENOSYS rules for us.
    c. Additional seccomp rules for the same syscall are not treated as
       distinct rules -- if rules overlap, seccomp will merge them. This
       means we cannot add per-syscall -EPERM fallbacks;
    d. There is no inverse operation for SCMP_CMP_MASKED_EQ;
    e. libseccomp does not allow you to specify multiple rules for a
       single argument, making it impossible to invert OR rules for
       arguments.

 2. The runtime-spec does not have any way of specifying:

    a. The errno for the default action;
    b. The minimum kernel version or "newest syscall at time of profile
       creation"; nor
    c. Which syscalls were intentionally excluded from the allow list
       (weird syscalls that are no longer used were excluded entirely,
       but Docker et al expect those syscalls to get EPERM not ENOSYS).

 3. Certain syscalls should not return -ENOSYS (especially only for
    certain argument combinations) because this could also trigger glibc
    confusion. This means we have to return -EPERM for certain syscalls
    but not as a global default.

 4. There is not an obvious (and reasonable) upper limit to syscall
    numbers, so we cannot create a set of rules for each syscall above
    the largest syscall number in libseccomp. This means we must handle
    inverse rules as described below.

 5. Any syscall can be specified multiple times, which can make
    generation of hotfix rules much harder.

As a result, we have to work around all of these things by coming up
with a heuristic to stop the bleeding. In the future we could hopefully
improve the situation in the runtime-spec and libseccomp.

The solution applied here is to prepend a "stub" filter which returns
-ENOSYS if the requested syscall has a larger syscall number than any
syscall mentioned in the filter. The reason for this specific rule is
that syscall numbers are (roughly) allocated sequentially and thus newer
syscalls will (usually) have a larger syscall number -- thus causing our
filters to produce -ENOSYS if the filter was written before the syscall
existed.

Sadly this is not a perfect solution because syscalls can be added
out-of-order and the syscall table can contain holes for several
releases. Unfortuntely we do not have a nicer solution at the moment
because there is no library which provides information about which Linux
version a syscall was introduced in. Until that exists, this workaround
will have to be good enough.

The above behaviour only happens if the default action is a blocking
action (in other words it is not SCMP_ACT_LOG or SCMP_ACT_ALLOW). If the
default action is permissive then we don't do any patching.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-01-28 23:11:22 +11:00
Akihiro Suda 6249136a29 add libseccomp version to runc --version
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-08-08 04:56:29 +09:00
Giuseppe Scrivano 41aa19662b libcontainer: honor seccomp errnoRet
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-05-20 09:11:55 +02:00
John Hwang 7fc291fd45 Replace formatted errors when unneeded
Signed-off-by: John Hwang <John.F.Hwang@gmail.com>
2020-05-16 18:13:21 -07:00
blacktop 84373aaa56 Add SCMP_ACT_LOG as a valid Seccomp action (#1951)
Signed-off-by: blacktop <blacktop@users.noreply.github.com>
2019-09-26 11:03:03 -04:00
Matthew Heon e9193ba6e6 Fix breaking change in Seccomp profile behavior
Multiple conditions were previously allowed to be placed upon the
same syscall argument. Restore this behavior.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2017-10-18 11:53:56 -04:00
Tobias Klauser a380fae959 libcontainer: use Prctl() from x/sys/unix
Use unix.Prctl() instead of manually reimplementing it using
unix.RawSyscall. Also use unix.SECCOMP_MODE_FILTER instead of locally
defining it.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-07-10 10:56:58 +02:00
Christy Perez 3d7cb4293c Move libcontainer to x/sys/unix
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.

There are still some dependencies on the syscall package that will
remain in syscall for the forseeable future:

Errno
Signal
SysProcAttr

Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-05-22 17:35:20 -05:00
Wang Long 3a71eb0256 move error check out of the for loop
The `bufio.Scanner.Scan` method returns false either by reaching the
end of the input or an error. After Scan returns false, the Err method
will return any error that occurred during scanning, except that if it
was io.EOF, Err will return nil.

We should check the error when Scan return false(out of the for loop).

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-18 05:02:39 +00:00
Michael Crosby 8873ac26a5 Remove log from seccomp package
Logging in packages is bad, mkay.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-03-25 14:21:30 -07:00
Michael Crosby 9c41e8388c Handle seccomp proc parsing errors
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-19 11:43:49 -08:00
Jessica Frazelle 41edbeb25e add seccomp.IsEnabled() function
This is much like apparmor.IsEnabled() function and a nice helper.

Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-01-18 10:44:31 -08:00
Michael Crosby caca840972 Add seccomp trace support
Closes #347

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-11-12 17:03:53 -08:00
Matthew Heon 795a6c9702 Libcontainer: Add support for multiple architectures in Seccomp
This commit allows additional architectures to be added to Seccomp filters
created by containers. This allows containers to make syscalls using these
architectures. For example, in a container on an AMD64 system, only AMD64
syscalls would be usable unless x86 was added to the filter using this patch,
which would allow both 32-bit and 64-bit syscalls to be used.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-09-23 13:54:24 -04:00
Michael Crosby a8e0185d97 Add seccomp build tag
Add a seccomp build tag and also support in the Makefile to add or
remove build tags.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-11 12:03:57 -07:00
Matthew Heon a6b73dbc73 Remove Seccomp build tag to fix godep
Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 15:23:43 -04:00
Matthew Heon 2ae581ae62 Convert Seccomp support to use Libseccomp
This removes the existing, native Go seccomp filter generation and replaces it
with Libseccomp. Libseccomp is a C library which provides architecture
independent generation of Seccomp filters for the Linux kernel.

This adds a dependency on v2.2.1 or above of Libseccomp.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 07:56:27 -04:00