28 Commits

Author SHA1 Message Date
Kir Kolyshkin e6048715e4 Use gofumpt to format code
gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.

Brought to you by

	git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w

Looking at the diff, all these changes make sense.

Also, replace gofmt with gofumpt in golangci.yml.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-06-01 12:17:27 -07:00
Daniel Dao 8c7ece1e6d fs2: fallback to setting io.weight if io.bfq.weight
If bfq is not loaded, io.bfq.weight is not available. io.weight
should always be available and is the next best equivalent.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2021-03-05 13:55:36 +00:00
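A minimal sketch of the fallback described in this commit, not the actual runc code. The helper names `writeExisting`, `setIOWeight`, and `demo` are hypothetical, and the values written are illustrative only. It assumes that a missing io.bfq.weight file shows up as a write error, so the code simply tries io.bfq.weight first and then io.weight.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeExisting writes data to an existing file. Unlike os.WriteFile it never
// creates the file, so a missing control file is reported as an error.
func writeExisting(path, data string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_TRUNC, 0)
	if err != nil {
		return err
	}
	if _, err := f.WriteString(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// setIOWeight (hypothetical) tries io.bfq.weight first and falls back to
// io.weight. It returns the name of the file that was written.
func setIOWeight(cgroupDir, bfqValue, ioValue string) (string, error) {
	if err := writeExisting(filepath.Join(cgroupDir, "io.bfq.weight"), bfqValue); err == nil {
		return "io.bfq.weight", nil
	}
	if err := writeExisting(filepath.Join(cgroupDir, "io.weight"), ioValue); err != nil {
		return "", fmt.Errorf("setting io weight: %w", err)
	}
	return "io.weight", nil
}

// demo uses a temporary directory as a stand-in for a cgroup v2 directory
// on a system where bfq is not loaded: io.weight exists, io.bfq.weight does not.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "cgroup-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "io.weight"), nil, 0o644); err != nil {
		return "", err
	}
	return setIOWeight(dir, "750", "7475")
}

func main() {
	used, err := demo()
	fmt.Println(used, err)
}
```

Trying the write and reacting to the error avoids a separate existence check that could race with the scheduler being loaded or unloaded.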
Daniel Dao c3ffd2ef81 Do not convert blkio weight value using blkio->io conversion scheme
The bfq weight controller (i.e. io.bfq.weight, if present) still uses the
same bfq weight scheme (i.e. 1->1000, see [1]). Unfortunately, the
documentation for this was wrong and was fixed only recently [2].

Therefore, if we map blkio weight to io.bfq.weight, there's no need to
do any conversion. Otherwise, we would try to write an invalid value, which
results in an error such as:

```
time="2021-02-03T14:55:30Z" level=error msg="container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write \"7475\": write /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup/io.bfq.weight: numerical result out of range"
```

[1] https://github.com/torvalds/linux/blob/master/Documentation/block/bfq-iosched.rst
[2] https://github.com/torvalds/linux/commit/65752aef0a407e1ef17ec78a7fc31ba4e0b360f9

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2021-02-23 19:46:16 -08:00
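As a rough illustration of the point made in this commit, the sketch below passes the blkio weight through unchanged when the target is io.bfq.weight. The function name `bfqWeightValue` is hypothetical, and the 1..1000 range check is an assumption taken from the "1->1000" scheme mentioned in the message. A converted value such as the "7475" seen in the log would be rejected by it.

```go
package main

import (
	"fmt"
	"strconv"
)

// bfqWeightValue (hypothetical) returns the string to write to io.bfq.weight.
// The blkio weight is used as is, with no blkio->io conversion.
// A weight of 0 is treated as "not set" and yields an empty string.
func bfqWeightValue(weight uint16) (string, error) {
	if weight == 0 {
		return "", nil
	}
	if weight > 1000 {
		return "", fmt.Errorf("bfq weight %d is outside 1..1000", weight)
	}
	return strconv.FormatUint(uint64(weight), 10), nil
}

func main() {
	fmt.Println(bfqWeightValue(750))  // accepted, written unchanged
	fmt.Println(bfqWeightValue(7475)) // rejected, like the value in the log
}
```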
Aleksa Sarai 6eed6e5795 merge branch 'pr-2599'
Kir Kolyshkin (4):
  libct/cgroups/fs/cpuset: don't use MkdirAll
  libct/cg/fs/cpuset: don't parse mountinfo
  libct/cg/fs.getCgroupRoot: reuse (cached) cgroup mountinfo
  libct/cgroups/v1_utils: implement mountinfo cache

LGTMs: @AkihiroSuda @cyphar
Closes #2599
2021-02-01 11:04:11 +11:00
Kir Kolyshkin 657a24ce01 libct/cg/TestGetHugePageSizeImpl: only log errors
This:

> === RUN   TestGetHugePageSizeImpl
>     utils_test.go:504: (input [hugepages-akB], error strconv.Atoi: parsing "a": invalid syntax)

feels like an error but it's not.

Only log errors.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-01-14 10:38:59 -08:00
Kir Kolyshkin ed70dfa732 libct/cgroups/v1_utils: implement mountinfo cache
Apparently it is inevitable that we need mountinfo data multiple
times when dealing with cgroup v1. It seems we can read it only once
and reuse the data, without major modifications to the code.

This commit does a few things.

1. Drop our custom mountinfo parser implementation in favor of
   moby/sys/mountinfo. While the custom parser is faster
   (about 2x according to benchmark) for this particular case,
   the one from the package is more correct and future-proof.

2. Read mountinfo only once, caching all the entries with fstype of
   cgroup.  With this, there's no need to worry about performance
   degradation introduced above.

3. Drop "isSubsystemAvailable" optimization (introduced by commit
   2a1a6cdf44) because now with the cache it is probably slowing
   things down.

4. Modify the tests accordingly.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-01-06 14:54:22 -08:00
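The caching idea from item 2 can be sketched roughly as follows. This is not the code from the commit: the `Mount` type, the `cgroupMountCache` type, and the injected `load` function are hypothetical stand-ins (in the commit, the parsing is done by moby/sys/mountinfo). The sketch only shows the "read once, keep the cgroup entries" pattern using `sync.Once`.

```go
package main

import (
	"fmt"
	"sync"
)

// Mount is a simplified, hypothetical stand-in for a parsed mountinfo entry.
type Mount struct {
	Mountpoint string
	FSType     string
}

// cgroupMountCache reads the mount table at most once and keeps only
// the entries whose filesystem type is "cgroup".
type cgroupMountCache struct {
	once   sync.Once
	load   func() ([]Mount, error) // e.g. a mountinfo parser
	mounts []Mount
	err    error
}

// Get returns the cached cgroup mounts, loading them on the first call.
func (c *cgroupMountCache) Get() ([]Mount, error) {
	c.once.Do(func() {
		all, err := c.load()
		if err != nil {
			c.err = err
			return
		}
		for _, m := range all {
			if m.FSType == "cgroup" {
				c.mounts = append(c.mounts, m)
			}
		}
	})
	return c.mounts, c.err
}

// demo calls Get three times and reports how often the loader ran
// and how many cgroup entries were cached.
func demo() (loads int, cached int) {
	c := &cgroupMountCache{load: func() ([]Mount, error) {
		loads++
		return []Mount{
			{Mountpoint: "/proc", FSType: "proc"},
			{Mountpoint: "/sys/fs/cgroup/cpu", FSType: "cgroup"},
			{Mountpoint: "/sys/fs/cgroup/memory", FSType: "cgroup"},
		}, nil
	}}
	for i := 0; i < 3; i++ {
		mounts, _ := c.Get()
		cached = len(mounts)
	}
	return loads, cached
}

func main() {
	fmt.Println(demo())
}
```

One consequence of this design is that mounts added after the first call are not seen, which is the usual trade-off of a process-lifetime cache.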
Kir Kolyshkin 7cd062d7be libct/cgroup/utils: fix GetCgroupMounts(all=true)
The `all` argument was introduced by commit f557996401 specifically
for use by cAdvisor (see [1]), but there were no test cases added,
so it was later broken by 5ee0648bfb which started incrementing
numFound unconditionally.

Fix this (by not checking numFound in case all is true), and add a
simple test case to avoid future regressions.

[1] https://github.com/google/cadvisor/pull/1476

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-12-01 15:15:30 -08:00
Kir Kolyshkin 360981ae1d libct/cgroups: rewrite getHugePageSizeFromFilenames
This is a function to convert huge page sizes (obtained by reading
/sys/kernel/mm/hugepages directory entries) to strings used for hugetlb
cgroup controller resource files. Those strings are then used to get the
hugetlb resource statistics.

This function used an external library and floating point numbers, and could
(theoretically) produce invalid values, since the kernel only uses KB,
MB, and GB suffixes.

Rewrite it to produce the same strings as used in the kernel (see [1]).
As a result, it's also faster, more future-proof (entries that do not
start with "hugepages-" and/or have an incorrect suffix are skipped), and does
more input sanity checks. As a side effect, libcontainer no longer
depends on docker/go-units.

While at it, add more test cases.

Before:
	BenchmarkGetHugePageSize-8       	  187452	      6265 ns/op
	BenchmarkGetHugePageSizeImpl-8   	  396769	      2998 ns/op

After:
	BenchmarkGetHugePageSize-8       	  222898	      4554 ns/op
	BenchmarkGetHugePageSizeImpl-8   	 4738924	       241 ns/op

NOTE on removing HugePageSizeUnitList -- this was added by commit
6f77e35da and was used by kubernetes code in [2], which was later
superseded by [3], so there are (hopefully) no external users.
If there are any, they should not be doing that.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?id=eff48ddeab782e35e58ccc8853f7386bbae9dec4#n574
[2] https://github.com/kubernetes/kubernetes/pull/78495
[3] https://github.com/kubernetes/kubernetes/pull/84154

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-30 10:58:31 -07:00
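A minimal sketch of the kind of conversion this commit describes, using integer arithmetic only. It is not the function from the commit: the name `hugePageSizes` is hypothetical, and the choice of thresholds (values in kB, shifted down to MB or GB) is an assumption consistent with the KB, MB, and GB suffixes mentioned above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hugePageSizes (hypothetical) converts directory entry names such as
// "hugepages-2048kB" into strings such as "2MB".
// Entries without the expected prefix or suffix are skipped;
// a non-numeric size is reported as an error.
func hugePageSizes(names []string) ([]string, error) {
	sizes := make([]string, 0, len(names))
	for _, name := range names {
		if !strings.HasPrefix(name, "hugepages-") || !strings.HasSuffix(name, "kB") {
			continue
		}
		num := strings.TrimSuffix(strings.TrimPrefix(name, "hugepages-"), "kB")
		kb, err := strconv.Atoi(num)
		if err != nil {
			return nil, fmt.Errorf("invalid entry %q: %w", name, err)
		}
		switch {
		case kb >= 1<<20:
			sizes = append(sizes, strconv.Itoa(kb>>20)+"GB")
		case kb >= 1<<10:
			sizes = append(sizes, strconv.Itoa(kb>>10)+"MB")
		default:
			sizes = append(sizes, strconv.Itoa(kb)+"KB")
		}
	}
	return sizes, nil
}

func main() {
	fmt.Println(hugePageSizes([]string{
		"hugepages-64kB", "hugepages-2048kB", "hugepages-1048576kB", "README",
	}))
}
```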
Kir Kolyshkin 9aff7aaeb6 libct/utils: add GetHugePageSize benchmark
On my laptop, I get

BenchmarkGetHugePageSize
BenchmarkGetHugePageSize-8       	  115213	      9400 ns/op
BenchmarkGetHugePageSizeImpl
BenchmarkGetHugePageSizeImpl-8   	  397873	      2971 ns/op

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-09-22 15:25:02 -07:00
Kir Kolyshkin 0626c150c1 libct/cgroupv1: fix TestGetCgroupMounts test cases
When testing GetCgroupMounts, the map data is supposed to be obtained
from /proc/self/cgroup, but since we're mocking things, we provide
our own map.

Unfortunately, not all controllers existing in mountinfos were listed.
Also, "name=systemd" needs special handling, so add it.

The controllers added were:

 * for fedoraMountinfo case: name=systemd
 * for systemdMountinfo case: name=systemd, net_prio
 * for bedrockMountinfo case: name=systemd, net_prio, pids

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:30 -07:00
Kir Kolyshkin 4189cb65f8 cgroups: remove cgroup.Resources.CpuMax
This (and the converting function) is only used by one of the four
cgroup drivers. The other three do some checking and conversion in
place, so let the fs2 do the same.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-09 17:15:38 -07:00
Kir Kolyshkin 2db3240f35 libct/cgroups: rm GetClosestMountpointAncestor
The function GetClosestMountpointAncestor is not very efficient,
does not really belong to cgroup package, and is only used once
(from fs/cpuset.go).

Remove it, replacing it with an implementation based on the
moby/sys/mountinfo parser.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 17:32:06 -07:00
Kir Kolyshkin f160352682 libct/cgroup: prep to rm GetClosestMountpointAncestor
This function is not very efficient, does not really belong to cgroup
package, and is only used once (from fs/cpuset.go).

Prepare to remove it by replacing it with an implementation based on
the parser from github.com/moby/sys/mountinfo.

This commit is here to make sure the proposed replacement passes the
unit test.

Funny, but the unit test needs to be slightly modified since it
supplies the wrong mountinfo (space as the first character, empty line
at the end).

Validated by

 $ go test -v -run Ance
 === RUN   TestGetClosestMountpointAncestor
 --- PASS: TestGetClosestMountpointAncestor (0.00s)
 PASS
 ok  	github.com/opencontainers/runc/libcontainer/cgroups	0.002s

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-13 16:26:16 -07:00
Kir Kolyshkin c86be8a2c1 cgroupv2: fix setting MemorySwap
The resources.MemorySwap field from OCI is memory+swap, while cgroupv2
has a separate swap limit, so subtract memory from the limit (and make
sure values are set and sane).

Make sure to set MemorySwapMax for systemd, too. Since systemd does not
have MemorySwapMax for cgroupv1, it is only needed for v2 driver.

[v2: return -1 on any negative value, add unit test]
[v3: treat any negative value other than -1 as error]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-04-07 20:45:53 -07:00
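The subtraction described in this commit might look roughly like the sketch below. The function name `swapLimitV2` is hypothetical. Treating 0 as "not set" and requiring a positive memory limit are assumptions on top of what the message states (-1 means unlimited, other negative values are errors).

```go
package main

import (
	"errors"
	"fmt"
)

// swapLimitV2 (hypothetical) derives a cgroup v2 swap limit from the OCI
// memory+swap value and the memory limit.
// -1 (unlimited) and 0 (assumed to mean "not set") are returned unchanged.
func swapLimitV2(memorySwap, memory int64) (int64, error) {
	switch {
	case memorySwap == -1 || memorySwap == 0:
		return memorySwap, nil
	case memorySwap < 0:
		return 0, fmt.Errorf("invalid memory+swap limit %d", memorySwap)
	case memory <= 0:
		return 0, errors.New("memory limit must be set to derive a swap limit")
	case memorySwap < memory:
		return 0, errors.New("memory+swap limit is smaller than memory limit")
	}
	return memorySwap - memory, nil
}

func main() {
	fmt.Println(swapLimitV2(200, 100)) // swap-only limit
	fmt.Println(swapLimitV2(-1, 100))  // unlimited stays unlimited
	fmt.Println(swapLimitV2(-2, 100))  // any other negative value is an error
}
```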
Akihiro Suda aa269315a4 cgroup2: add CpuMax conversion
Fix #2243

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:58:39 +09:00
Akihiro Suda 64e9a97981 cgroup2: fix conversion
* TestConvertCPUSharesToCgroupV2Value(0) was returning 70369281052672, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(0) was returning 32, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(1000) was returning 4, while the correct value is 10000

Fix #2244
Follow-up to #2212 #2213

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:57:07 +09:00
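The wrong values listed in this commit are exactly what unsigned wraparound produces, so the sketch below reproduces them under that assumption. The linear formulas and all function names here are illustrative reconstructions (they match every value quoted in the message) and should not be read as the exact code before or after the fix.

```go
package main

import "fmt"

// buggyBlkIO evaluates the whole expression in uint16, so both the
// subtraction (for 0) and the multiplication wrap around.
func buggyBlkIO(w uint16) uint64 {
	return uint64(1 + (w-10)*9999/990)
}

// convertBlkIO maps a blkio weight in 10..1000 linearly onto 1..10000.
// 0 means "not set" and is returned unchanged; the arithmetic is done
// in uint64 so the multiplication cannot overflow.
func convertBlkIO(w uint16) uint64 {
	if w == 0 {
		return 0
	}
	return 1 + (uint64(w)-10)*9999/990
}

// buggyCPUShares wraps around for 0 because s-2 underflows.
func buggyCPUShares(s uint64) uint64 {
	return 1 + ((s-2)*9999)/262142
}

// convertCPUShares maps CPU shares in 2..262144 linearly onto 1..10000,
// returning 0 unchanged.
func convertCPUShares(s uint64) uint64 {
	if s == 0 {
		return 0
	}
	return 1 + ((s-2)*9999)/262142
}

func main() {
	fmt.Println(buggyBlkIO(0), buggyBlkIO(1000))     // 32 4
	fmt.Println(convertBlkIO(0), convertBlkIO(1000)) // 0 10000
	fmt.Println(buggyCPUShares(0))                   // 70369281052672
	fmt.Println(convertCPUShares(0))                 // 0
}
```

Guarding the zero case and widening the type before multiplying are two separate fixes; either one alone would still leave one of the listed values wrong.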
Odin Ugedal c6445b1c1c Add tests for GetHugePageSize
Add tests to avoid regressions

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 17:27:32 +02:00
Michael Crosby 76520a4bf0 Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint
Respect container's cgroup path
2018-11-16 14:06:54 -05:00
Danail Branekov a1d5398afa Respect container's cgroup path
Respect the container's cgroup path when finding the container's
cgroup mount point, which is useful in multi-tenant environments, where
containers have their own unique cgroup mounts

Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io>
Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
2018-09-25 17:43:36 +01:00
Jay Kamat e5a7c61f3c Add test for testing cgroup mounts on bedrock linux
Add a mountinfo from a bedrock linux system with 4 strata, and include
it for tests

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2018-06-24 00:01:07 +01:00
Daniel Dao 5ee0648bfb Stop relying on number of subsystems for cgroups
When there are complicated mount setups, there can be multiple mount
points which have the subsystem we are looking for. Instead of
counting the mountpoints, tick off subsystems until we have found them
all.

Without the 'all' flag, ignore duplicate subsystems after the first.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2018-06-24 00:00:58 +01:00
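The "tick off subsystems" approach, including the `all` flag, can be sketched as below. This is an illustration only: the `Mount` type and the `cgroupMounts` function are hypothetical, and the real function works on parsed mountinfo rather than a prepared slice.

```go
package main

import "fmt"

// Mount is a simplified, hypothetical cgroup mount entry.
type Mount struct {
	Mountpoint string
	Subsystems []string
}

// cgroupMounts (hypothetical) walks the candidates in order and ticks off
// each wanted subsystem the first time it is seen.
// With all=false, duplicates are ignored and the walk stops once every
// subsystem has been found. With all=true, every matching mount is returned.
func cgroupMounts(candidates []Mount, subsystems []string, all bool) []Mount {
	missing := make(map[string]bool, len(subsystems)) // true = not found yet
	for _, s := range subsystems {
		missing[s] = true
	}
	remaining := len(missing)

	var res []Mount
	for _, c := range candidates {
		if !all && remaining == 0 {
			break
		}
		m := Mount{Mountpoint: c.Mountpoint}
		for _, s := range c.Subsystems {
			stillMissing, wanted := missing[s]
			if !wanted || (!all && !stillMissing) {
				continue
			}
			m.Subsystems = append(m.Subsystems, s)
			if stillMissing {
				missing[s] = false
				remaining--
			}
		}
		if len(m.Subsystems) > 0 {
			res = append(res, m)
		}
	}
	return res
}

// demo returns the number of mounts selected without and with the all flag.
func demo() (first int, everything int) {
	candidates := []Mount{
		{Mountpoint: "/sys/fs/cgroup/cpu,cpuacct", Subsystems: []string{"cpu", "cpuacct"}},
		{Mountpoint: "/other/cpu,cpuacct", Subsystems: []string{"cpu", "cpuacct"}},
		{Mountpoint: "/sys/fs/cgroup/memory", Subsystems: []string{"memory"}},
	}
	wanted := []string{"cpu", "cpuacct", "memory"}
	return len(cgroupMounts(candidates, wanted, false)), len(cgroupMounts(candidates, wanted, true))
}

func main() {
	fmt.Println(demo())
}
```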
Craig Furman f5c5aac958 Create containers when cgroups already mounted
Runc needs to copy certain files from the top of the cgroup cpuset hierarchy
into the container's cpuset cgroup directory. Currently, runc determines
which directory is the top of the hierarchy by using the parent dir of
the first entry in /proc/self/mountinfo of type cgroup.

This creates problems when cgroup subsystems are mounted arbitrarily in
different dirs on the host.

Now, we use the most deeply nested mountpoint that contains the
container's cpuset cgroup directory.

Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
2017-03-15 10:10:30 +00:00
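Choosing "the most deeply nested mountpoint that contains the directory" can be sketched as a longest-ancestor search. The function names are hypothetical, and the sketch assumes that all paths are already absolute and cleaned.

```go
package main

import (
	"fmt"
	"strings"
)

// isAncestorOrSelf reports whether dir is mp itself or lies below it.
// The comparison is made on path boundaries, so "/a/cpu" is not
// treated as an ancestor of "/a/cpuset".
func isAncestorOrSelf(mp, dir string) bool {
	if mp == "/" {
		return strings.HasPrefix(dir, "/")
	}
	return dir == mp || strings.HasPrefix(dir, mp+"/")
}

// closestMountpoint (hypothetical) returns the longest mountpoint that
// contains dir, or "" if none does.
func closestMountpoint(dir string, mountpoints []string) string {
	best := ""
	for _, mp := range mountpoints {
		if isAncestorOrSelf(mp, dir) && len(mp) > len(best) {
			best = mp
		}
	}
	return best
}

func main() {
	mounts := []string{
		"/sys",
		"/sys/fs/cgroup",
		"/sys/fs/cgroup/cpu",
		"/sys/fs/cgroup/cpuset",
		"/other/cpuset",
	}
	fmt.Println(closestMountpoint("/sys/fs/cgroup/cpuset/docker/abc", mounts))
}
```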
Mrunal Patel c7ebda72ac Add a test for testing that we ignore cgroup2 mounts
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-01-11 16:49:53 -08:00
Mrunal Patel f557996401 Add flag to allow getting all mounts for cgroups subsystems
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-09-15 15:19:27 -04:00
Euan Kemp 394610a396 cgroups: Parse correctly if cgroup path contains :
Prior to this change a cgroup with a `:` character in its path was not
parsed correctly (as occurs on some instances of systemd cgroups under
some versions of systemd, e.g. 225 with accounting).

This fixes that issue and adds a test.

Signed-off-by: Euan Kemp <euank@coreos.com>
2016-06-10 23:09:03 -07:00
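One way to parse such lines safely is to split on the first two colons only, so that any further `:` stays part of the path. The sketch below assumes the usual `id:controllers:path` layout of /proc/self/cgroup; the function name and the sample line are hypothetical and not taken from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCgroupLine (hypothetical) splits one /proc/self/cgroup line into its
// controller list and cgroup path. Only the first two colons are treated
// as separators, so the path may itself contain ':'.
func parseCgroupLine(line string) ([]string, string, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return nil, "", fmt.Errorf("invalid cgroup entry: %q", line)
	}
	var controllers []string
	if parts[1] != "" {
		controllers = strings.Split(parts[1], ",")
	}
	return controllers, parts[2], nil
}

func main() {
	fmt.Println(parseCgroupLine("2:cpu,cpuacct:/system.slice/foo:bar.service"))
}
```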
rajasec 3b2805834b Adding linux label to test file
Signed-off-by: rajasec <rajasec79@gmail.com>

Fixed review comments

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-02-25 07:52:32 +05:30
Alexander Morozov 98cbce80fb Look for " - " instead of just - as separator
The "-" symbol can appear in any path.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-02-18 09:58:29 -08:00
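A small sketch of why the separator has to be matched with its surrounding spaces. It assumes the mountinfo layout documented in proc(5), where spaces inside paths are escaped as `\040`, so the sequence " - " can only be the field separator. The function name and the sample line are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMountinfoLine (hypothetical) extracts the mount point and filesystem
// type from one mountinfo line. It splits on " - " rather than on "-",
// because a bare "-" can appear inside any path.
func parseMountinfoLine(line string) (mountpoint, fstype string, err error) {
	i := strings.Index(line, " - ")
	if i < 0 {
		return "", "", fmt.Errorf("no separator in %q", line)
	}
	pre := strings.Fields(line[:i])    // fields before the separator
	post := strings.Fields(line[i+3:]) // fstype, source, super options
	if len(pre) < 6 || len(post) < 1 {
		return "", "", fmt.Errorf("too few fields in %q", line)
	}
	return pre[4], post[0], nil
}

func main() {
	line := "30 23 0:25 / /sys/fs/cgroup/my-cpu rw,nosuid shared:10 - cgroup cgroup rw,cpu"
	fmt.Println(parseMountinfoLine(line))
}
```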
Alexander Morozov 97146f4dc6 Remove usage of GetMounts from GetCgroupMounts
GetMounts is very cpu-expensive. I'll change other funcs in this package
to reuse code from GetCgroupMounts later.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-02-01 11:00:23 -08:00