zishuo/runc

mirror of https://github.com/opencontainers/runc.git synced 2026-04-22 23:17:17 +08:00

Author	SHA1	Message	Date
Kir Kolyshkin	e6048715e4	Use gofumpt to format code gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules. Brought to you by git ls-files \*.go \| grep -v ^vendor/ \| xargs gofumpt -s -w Looking at the diff, all these changes make sense. Also, replace gofmt with gofumpt in golangci.yml. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-06-01 12:17:27 -07:00
Giuseppe Scrivano	c61f606254	libcontainer: honor seccomp defaultErrnoRet https://github.com/opencontainers/runtime-spec/pull/1087 added support for defaultErrnoRet to the OCI runtime specs. If a defaultErrnoRet is specified, disable patching the generated libseccomp cBPF. Closes: https://github.com/opencontainers/runc/issues/2943 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2021-05-17 09:23:32 +02:00
Kir Kolyshkin	1f1e91b1a0	libct/specconv: check mount destination is absolute Per OCI runtime spec, mount destination MUST be absolute. Let's check that and return an error if not. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-20 11:26:16 -07:00
Qiang Huang	2d38476c96	Merge pull request #2840 from kolyshkin/ignore-kmem Ignore kernel memory settings	2021-04-13 09:44:14 +08:00
Kir Kolyshkin	52390d6804	Ignore kernel memory settings This is somewhat radical approach to deal with kernel memory. Per-cgroup kernel memory limiting was always problematic. A few examples: - older kernels had bugs and were even oopsing sometimes (best example is RHEL7 kernel); - kernel is unable to reclaim the kernel memory so once the limit is hit a cgroup is toasted; - some kernel memory allocations don't allow failing. In addition to that, - users don't have a clue about how to set kernel memory limits (as the concept is much more complicated than e.g. [user] memory); - different kernels might have different kernel memory usage, which is sort of unexpected; - cgroup v2 do not have a [dedicated] kmem limit knob, and thus runc silently ignores kernel memory limits for v2; - kernel v5.4 made cgroup v1 kmem.limit obsoleted (see https://github.com/torvalds/linux/commit/0158115f702b). In view of all this, and as the runtime-spec lists memory.kernel and memory.kernelTCP as OPTIONAL, let's ignore kernel memory limits (for cgroup v1, same as we're already doing for v2). This should result in less bugs and better user experience. The only bad side effect from it might be that stat can show kernel memory usage as 0 (since the accounting is not enabled). [v2: add a warning in specconv that limits are ignored] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-12 12:18:11 -07:00
Kir Kolyshkin	27bb1bd5ea	libct/specconv/CreateCgroupConfig: don't set c.Parent default c.Parent is only used by systemd cgroup drivers, and both v1 and v2 drivers do have code to set the default if it is empty, so setting it here is redundant. In addition, in case of cgroup v2 rootless container setting it here is harmful as the default should be user.slice not system.slice. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-01 19:50:37 -07:00
Kir Kolyshkin	f0dec0b4bf	libct/specconv/CreateCgroupConfig: nit Do not call libcontainerUtils.CleanPath in case its result will not be used. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2021-04-01 19:33:01 -07:00
Iceber Gu	fa52df9493	libcontainer: fix the file mode of the device Signed-off-by: Iceber Gu <wei.cai-nat@daocloud.io>	2021-02-17 15:08:22 +08:00
Aleksa Sarai	7a8d7162f9	seccomp: prepend -ENOSYS stub to all filters Having -EPERM is the default was a fairly significant mistake from a future-proofing standpoint in that it makes any new syscall return a non-ignorable error (from glibc's point of view). We need to correct this now because faccessat2(2) is something glibc critically needs to have support for, but they're blocked on container runtimes because we return -EPERM unconditionally (leading to confusion in glibc). This is also a problem we're probably going to keep running into in the future. Unfortunately there are several issues which stop us from having a clean solution to this problem: 1. libseccomp has several limitations which require us to emulate behaviour we want: a. We cannot do logic based on syscall number, meaning we cannot specify a "largest known syscall number"; b. libseccomp doesn't know in which kernel version a syscall was added, and has no API for "minimum kernel version" so we cannot simply ask libseccomp to generate sane -ENOSYS rules for us. c. Additional seccomp rules for the same syscall are not treated as distinct rules -- if rules overlap, seccomp will merge them. This means we cannot add per-syscall -EPERM fallbacks; d. There is no inverse operation for SCMP_CMP_MASKED_EQ; e. libseccomp does not allow you to specify multiple rules for a single argument, making it impossible to invert OR rules for arguments. 2. The runtime-spec does not have any way of specifying: a. The errno for the default action; b. The minimum kernel version or "newest syscall at time of profile creation"; nor c. Which syscalls were intentionally excluded from the allow list (weird syscalls that are no longer used were excluded entirely, but Docker et al expect those syscalls to get EPERM not ENOSYS). 3. Certain syscalls should not return -ENOSYS (especially only for certain argument combinations) because this could also trigger glibc confusion. This means we have to return -EPERM for certain syscalls but not as a global default. 4. There is not an obvious (and reasonable) upper limit to syscall numbers, so we cannot create a set of rules for each syscall above the largest syscall number in libseccomp. This means we must handle inverse rules as described below. 5. Any syscall can be specified multiple times, which can make generation of hotfix rules much harder. As a result, we have to work around all of these things by coming up with a heuristic to stop the bleeding. In the future we could hopefully improve the situation in the runtime-spec and libseccomp. The solution applied here is to prepend a "stub" filter which returns -ENOSYS if the requested syscall has a larger syscall number than any syscall mentioned in the filter. The reason for this specific rule is that syscall numbers are (roughly) allocated sequentially and thus newer syscalls will (usually) have a larger syscall number -- thus causing our filters to produce -ENOSYS if the filter was written before the syscall existed. Sadly this is not a perfect solution because syscalls can be added out-of-order and the syscall table can contain holes for several releases. Unfortuntely we do not have a nicer solution at the moment because there is no library which provides information about which Linux version a syscall was introduced in. Until that exists, this workaround will have to be good enough. The above behaviour only happens if the default action is a blocking action (in other words it is not SCMP_ACT_LOG or SCMP_ACT_ALLOW). If the default action is permissive then we don't do any patching. Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2021-01-28 23:11:22 +11:00
Sebastiaan van Stijn	4fc2de77e9	libcontainer/devices: remove "Device" prefix from types Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-12-01 11:11:23 +01:00
Sebastiaan van Stijn	677baf22d2	libcontainer: isolate libcontainer/devices Move the Device-related types to libcontainer/devices, so that the package can be used in isolation. Aliases have been created in libcontainer/configs for backward compatibility. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-12-01 11:11:21 +01:00
Ashok Pon Kumar	fcf210d631	Fix goreport warnings of ineffassign and misspell Signed-off-by: Ashok Pon Kumar <ashokponkumar@gmail.com>	2020-10-02 09:03:45 +05:30
Kenta Tada	3d5dec2f44	libcontainer: remove the unused variable from spec Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>	2020-10-01 12:29:48 +09:00
Sebastiaan van Stijn	8bf216728c	use string-concatenation instead of sprintf for simple cases Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-09-30 10:51:59 +02:00
Kir Kolyshkin	b006f4a180	libct/cgroups: support Cgroups.Resources.Unified Add support for unified resource map (as per [1]), and add some test cases for the new functionality. [1] https://github.com/opencontainers/runtime-spec/pull/1040 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-09-24 15:29:35 -07:00
Giuseppe Scrivano	a63f99fcc5	Add support for umask Signed-off-by: Ashley Cui <acui@redhat.com>	2020-08-20 11:39:43 -04:00
Cesar Talledo	0709202da7	Remove runc default devices that overlap with spec devices. Runc has a set of default devices that it includes in Linux containers (e.g., /dev/null, /dev/random, /dev/tty, etc.) However if the container's OCI spec includes all or a subset of those same devices, runc is currently not detecting the redundancy, causing it to create a lib container config that has redundant device configurations. This causes a failure in rootless mode, in particular when the /dev/tty device has a redundant config: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: rootfs_linux.go:70: creating device nodes caused: open /tmp/busyboxtest/rootfs/dev/tty: no such device or address" The reason this fails in rootless mode only is that in this case runc sets up /dev/tty not by doing mknod (it's not allowed within a user-ns) but rather by creating a regular file under /dev/tty and bind-mounting the host's /dev/tty to the container's /dev/tty. When this operation is done redundantly, it fails the second time. This change fixes this problem by ensuring runc checks for redundant devices between the OCI spec it receives and the default devices it configures. If a redundant device is detected, the OCI spec takes priority. The change adds both a unit test and an integration test to verify the behavior. Without this fix, this new integration test fails as shown above. Signed-off-by: Cesar Talledo <ctalledo@nestybox.com>	2020-08-07 16:46:15 -07:00
Renaud Gaubert	ccdd75760c	Add the CreateRuntime, CreateContainer and StartContainer Hooks Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>	2020-06-17 02:10:00 +00:00
Kir Kolyshkin	4189cb65f8	cgroups: remove cgroup.Resources.CpuMax This (and the converting function) is only used by one of the four cgroup drivers. The other three do some checking and conversion in place, so let the fs2 do the same. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-09 17:15:38 -07:00
Giuseppe Scrivano	41aa19662b	libcontainer: honor seccomp errnoRet Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-05-20 09:11:55 +02:00
Aleksa Sarai	24388be71e	configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Aleksa Sarai	60e21ec26e	specconv: remove default /dev/console access /dev/console is a host resouce which gives a bunch of permissions that we really shouldn't be giving to containers, not to mention that /dev/console in containers is actually /dev/pts/$n. Drop this since arguably this is a fairly scary thing to allow... Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Aleksa Sarai	b2bec9806f	cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black- list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Pradyumna Agrawal	4aa9101477	Honor spec.Process.NoNewPrivileges in specconv.CreateLibcontainerConfig The change ensures that the passed in value of NoNewPrivileges under spec.Process is reflected in the container config generated by specconv.CreateLibcontainerConfig Closes #2397 Signed-off-by: Pradyumna Agrawal <pradyumnaa@vmware.com>	2020-05-11 13:38:14 -07:00
lifubang	d2a9c5da37	using default allowed devices when linux resources is null Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-04-16 11:40:44 +08:00
Akihiro Suda	cc183ca662	Merge pull request #2242 from AkihiroSuda/vendor-systemd vendor: update go-systemd and godbus	2020-03-25 02:40:22 +09:00
Akihiro Suda	492d525e55	vendor: update go-systemd and godbus Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-03-16 13:26:03 +09:00
Akihiro Suda	aa269315a4	cgroup2: add CpuMax conversion Fix #2243 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-03-13 02:58:39 +09:00
Kir Kolyshkin	1cd71dfd71	systemd properties: support for *Sec values Some systemd properties are documented as having "Sec" suffix (e.g. "TimeoutStopSec") but are expected to have "USec" suffix when passed over dbus, so let's provide appropriate conversion to improve compatibility. This means, one can specify TimeoutStopSec with a numeric argument, in seconds, and it will be properly converted to TimeoutStopUsec with the argument in microseconds. As a side bonus, even float values are converted, so e.g. TimeoutStopSec=1.5 is possible. This turned out a bit more tricky to implement when I was originally expected, since there are a handful of numeric types in dbus and each one requires explicit conversion. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-02-17 16:07:19 -08:00
Kir Kolyshkin	4c5c3fb960	Support for setting systemd properties via annotations In case systemd is used to set cgroups for the container, it creates a scope unit dedicated to it (usually named `runc-$ID.scope`). This patch adds an ability to set arbitrary systemd properties for the systemd unit via runtime spec annotations. Initially this was developed as an ability to specify the `TimeoutStopUSec` property, but later generalized to work with arbitrary ones. Example usage: add the following to runtime spec (config.json): ``` "annotations": { "org.systemd.property.TimeoutStopUSec": "uint64 123456789", "org.systemd.property.CollectMode":"'inactive-or-failed'" }, ``` and start the container (e.g. `runc --systemd-cgroup run $ID`). The above will set the following systemd parameters: * `TimeoutStopSec` to 2 minutes and 3 seconds, * `CollectMode` to "inactive-or-failed". The values are in the gvariant format (see [1]). To figure out which type systemd expects for a particular parameter, see systemd sources. In particular, parameters with `USec` suffix require an `uint64` typed argument, while gvariant assumes int32 for a numeric values, therefore the explicit type is required. NOTE that systemd receives the time-typed parameters as USec but shows them (in `systemctl show`) as Sec. For example, the stop timeout should be set as `TimeoutStopUSec` but is shown as `TimeoutStopSec`. [1] https://developer.gnome.org/glib/stable/gvariant-text.html Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-02-17 16:07:19 -08:00
Boris Popovschi	7c439cc6f6	Added conversion for cpu.weight v2 Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>	2020-02-12 11:32:34 +02:00
Julio Montes	cd7c59d042	libcontainer: export createCgroupConfig A `config.Cgroups` object is required to manipulate cgroups v1 and v2 using libcontainer. Export `createCgroupConfig` to allow API users to create `config.Cgroups` objects using directly libcontainer API. Signed-off-by: Julio Montes <julio.montes@intel.com>	2019-12-17 22:46:03 +00:00
Akihiro Suda	faf673ee45	cgroup2: port over eBPF device controller from crun The implementation is based on https://github.com/containers/crun/blob/0.10.2/src/libcrun/ebpf.c Although ebpf.c is originally licensed under LGPL-3.0-or-later, the author Giuseppe Scrivano agreed to relicense the file in Apache License 2.0: https://github.com/opencontainers/runc/issues/2144#issuecomment-543116397 See libcontainer/cgroups/ebpf/devicefilter/devicefilter_test.go for tested configurations. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2019-10-31 14:01:46 +09:00
Aleksa Sarai	8296826da5	specconv: always set "type: bind" in case of MS_BIND We discovered in umoci that setting a dummy type of "none" would result in file-based bind-mounts no longer working properly, which is caused by a restriction for when specconv will change the device type to "bind" to work around rootfs_linux.go's ... issues. However, bind-mounts don't have a type (and Linux will ignore any type specifier you give it) because the type is copied from the source of the bind-mount. So we should always overwrite it to avoid user confusion. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2019-04-08 15:08:08 +10:00
Filipe Brandenburger	4b2b978291	Add cgroup name to error message More information should help troubleshoot an issue when this error occurs. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2019-03-14 10:25:00 -07:00
Mrunal Patel	4769cdf607	Merge pull request #1916 from crosbymichael/cgns Add support for cgroup namespace	2018-11-13 12:21:38 -08:00
Ace-Tang	16d55f17a8	libcontainer: fix potential panic if spec.Process is nil for the code logic, pointer 'spec.Process' should be judge first to avoid panic. Signed-off-by: Ace-Tang <aceapril@126.com>	2018-11-06 11:55:30 +08:00
Aleksa Sarai	9a3a8a5ebf	libcontainer: implement CLONE_NEWCGROUP This is a very simple implementation because it doesn't require any configuration unlike the other namespaces, and in its current state it only masks paths. This feature is available in Linux 4.6+ and is enabled by default for kernels compiled with CONFIG_CGROUP=y. Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-10-23 16:23:00 -04:00
Xiaochen Shen	27560ace2f	libcontainer: intelrdt: add support for Intel RDT/MBA in runc Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT) which is supported on some Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate throttle over memory bandwidth for the software. A user controls the resource by indicating the percentage of maximum memory bandwidth. Hardware details of Intel RDT/MBA can be found in section 17.18 of Intel Software Developer Manual: https://software.intel.com/en-us/articles/intel-sdm In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel config CONFIG_INTEL_RDT. If hardware support, CPU flags `rdt_a` and `mba` will be set in /proc/cpuinfo. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ \|-- info \| \|-- L3 \| \| \|-- cbm_mask \| \| \|-- min_cbm_bits \| \| \|-- num_closids \| \|-- MB \| \|-- bandwidth_gran \| \|-- delay_linear \| \|-- min_bandwidth \| \|-- num_closids \|-- ... \|-- schemata \|-- tasks \|-- <container_id> \|-- ... \|-- schemata \|-- tasks For MBA support for `runc`, we will reuse the infrastructure and code base of Intel RDT/CAT which implemented in #1279. We could also make use of `tasks` and `schemata` configuration for memory bandwidth resource constraints. The file `tasks` has a list of tasks that belongs to this group (e.g., <container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. The file `schemata` has a list of all the resources available to this group. Each resource (L3 cache, memory bandwidth) has its own line and format. Memory bandwidth schema: It has allocation values for memory bandwidth on each socket, which contains L3 cache id and memory bandwidth percentage. Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." The minimum bandwidth percentage value for each CPU model is predefined and can be looked up through "info/MB/min_bandwidth". The bandwidth granularity that is allocated is also dependent on the CPU model and can be looked up at "info/MB/bandwidth_gran". The available bandwidth control steps are: min_bw + N * bw_gran. Intermediate values are rounded to the next control step available on the hardware. For more information about Intel RDT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the minimum memory bandwidth of 10% with a memory bandwidth granularity of 10%. Tasks inside the container may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1. "linux": { "intelRdt": { "memBwSchema": "MB:0=20;1=70" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:29 +08:00
Mrunal Patel	a00bf01908	Merge pull request #1862 from AkihiroSuda/decompose-rootless-pr Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression)	2018-10-15 17:32:15 -07:00
Jonathan Marler	1499c746a1	Move spec.Linux.IntelRdt check to spec.Linux != nil block Signed-off-by: Jonathan Marler <johnnymarler@gmail.com>	2018-10-04 21:30:55 -06:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto\|true\|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Alban Crequy	3321aa1af7	Fix regression with mounts with non-absolute source path PR #1753 introduced a test on the mount flags but the binary operator was wrong, see https://github.com/opencontainers/runc/pull/1753#discussion_r203445652 This was noticed when investigating https://github.com/opencontainers/runtime-tools/issues/651 Symptoms: in the container, /proc/self/mountinfo displays some mounts as follow: 296 279 0:67 / /tmp rw,nosuid - tmpfs /home/dpark/go/src/github.com/opencontainers/runc/tmpfs rw,size=65536k,mode=755 Signed-off-by: Alban Crequy <alban@kinvolk.io>	2018-07-18 18:30:49 +02:00
Qiang Huang	dd67ab10d7	Merge pull request #1759 from cyphar/rootless-erofs-as-eperm rootless: cgroup: treat EROFS as a skippable error	2018-05-25 09:24:16 +08:00
dlorenc	40680b2d37	Make the setupSeccomp function public. This function is useful for converting from the OCI spec format to the one used by runC/libcontainer. Signed-off-by: dlorenc <lorenc.d@gmail.com>	2018-04-17 10:47:22 -07:00
Michael Crosby	d56f6cc202	Merge pull request #1753 from wking/do-not-require-bind-mount-type libcontainer/specconv/spec_linux: Support empty 'type' for bind mounts	2018-04-16 11:01:53 -04:00
Michael Crosby	9f0eca2a94	Merge pull request #1777 from nalind/no-config-for-extant-netns Only configure networking when creating a net ns	2018-04-12 10:55:02 -04:00
Nalin Dahyabhai	4521d4b19c	Only configure networking when creating a net ns When joining an existing namespace, don't default to configuring a loopback interface in that namespace. Its creator should have done that, and we don't want to fail to create the container when we don't have sufficient privileges to configure the network namespace. Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>	2018-04-11 13:28:19 -04:00
Aleksa Sarai	fd3a6e6c83	libcontainer: handle unset oomScoreAdj corectly Previously if oomScoreAdj was not set in config.json we would implicitly set oom_score_adj to 0. This is not allowed according to the spec: > If oomScoreAdj is not set, the runtime MUST NOT change the value of > oom_score_adj. Change this so that we do not modify oom_score_adj if oomScoreAdj is not present in the configuration. While this modifies our internal configuration types, the on-disk format is still compatible. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-03-17 13:53:42 +11:00
W. Trevor King	0aa6e4e5d3	libcontainer/specconv/spec_linux: Support empty 'type' for bind mounts From the "Creating a bind mount" section of mount(2) [1]: > If mountflags includes MS_BIND (available since Linux 2.4), then > perform a bind mount... > > The filesystemtype and data arguments are ignored. This commit adds support for configurations that leave the OPTIONAL type [2] unset for bind mounts. There's a related spec-example change in flight with [3], although my personal preference would be a more explicit spec for the whole mount structure [4]. [1]: http://man7.org/linux/man-pages/man2/mount.2.html [2]: https://github.com/opencontainers/runtime-spec/blame/v1.0.1/config.md#L102 [3]: https://github.com/opencontainers/runtime-spec/pull/954 [4]: https://github.com/opencontainers/runtime-spec/pull/771 Signed-off-by: W. Trevor King <wking@tremily.us>	2018-03-07 10:23:42 -08:00

1 2 3

101 Commits