gofumpt (mvdan.cc/gofumpt) is a fork of gofmt with stricter rules.
Brought to you by
git ls-files \*.go | grep -v ^vendor/ | xargs gofumpt -s -w
Looking at the diff, all these changes make sense.
Also, replace gofmt with gofumpt in golangci.yml.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If bfq is not loaded, then io.bfq.weight is not available. io.weight
should always be available and is the next best equivalent thing.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
The bfq weight controller (i.e. io.bfq.weight, if present) still uses the
same bfq weight scheme (i.e. 1->1000, see [1]). Unfortunately, the
documentation for this was wrong, and was only fixed recently [2].
Therefore, if we map blkio weight to io.bfq.weight, there's no need to
do any conversion. Otherwise, we will try to write an invalid value, which
results in an error such as:
```
time="2021-02-03T14:55:30Z" level=error msg="container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: failed to write \"7475\": write /sys/fs/cgroup/runc-cgroups-integration-test/test-cgroup/io.bfq.weight: numerical result out of range"
```
[1] https://github.com/torvalds/linux/blob/master/Documentation/block/bfq-iosched.rst
[2] https://github.com/torvalds/linux/commit/65752aef0a407e1ef17ec78a7fc31ba4e0b360f9
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
This:
> === RUN TestGetHugePageSizeImpl
> utils_test.go:504: (input [hugepages-akB], error strconv.Atoi: parsing "a": invalid syntax)
feels like an error but it's not.
Only log errors.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Apparently it is inevitable that we have to read mountinfo multiple
times when dealing with cgroup v1. However, it seems we can read it
only once and reuse the data, without major modifications to the code.
This commit does a few things.
1. Drop our custom mountinfo parser implementation in favor of
moby/sys/mountinfo. While the custom parser is faster
(about 2x according to benchmark) for this particular case,
the one from the package is more correct and future-proof.
2. Read mountinfo only once, caching all the entries with fstype of
cgroup. With this, there's no need to worry about performance
degradation introduced above.
3. Drop "isSubsystemAvailable" optimization (introduced by commit
2a1a6cdf44) because now with the cache it is probably slowing
things down.
4. Modify the tests accordingly.
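A minimal sketch of the read-once idea from item 2, under stated assumptions: mountEntry and cgroupMountCache are hypothetical names, and the read field stands in for whatever parser is used (the real code relies on moby/sys/mountinfo, which is not reproduced here).

```go
package main

import (
	"fmt"
	"sync"
)

// mountEntry is a simplified, hypothetical mountinfo record.
type mountEntry struct {
	Mountpoint string
	FSType     string
}

// cgroupMountCache reads mountinfo at most once and keeps only the entries
// with fstype "cgroup".
type cgroupMountCache struct {
	once    sync.Once
	read    func() ([]mountEntry, error) // stands in for the real parser
	entries []mountEntry
	err     error
}

func (c *cgroupMountCache) get() ([]mountEntry, error) {
	c.once.Do(func() {
		all, err := c.read()
		if err != nil {
			c.err = err
			return
		}
		for _, e := range all {
			if e.FSType == "cgroup" {
				c.entries = append(c.entries, e)
			}
		}
	})
	return c.entries, c.err
}

func main() {
	calls := 0
	cache := &cgroupMountCache{read: func() ([]mountEntry, error) {
		calls++
		return []mountEntry{
			{Mountpoint: "/proc", FSType: "proc"},
			{Mountpoint: "/sys/fs/cgroup/cpuset", FSType: "cgroup"},
		}, nil
	}}
	cache.get()
	entries, _ := cache.get()
	// One cgroup entry is cached, and the parser ran only once.
	fmt.Println(len(entries), calls)
}
```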
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The `all` argument was introduced by commit f557996401 specifically
for use by cAdvisor (see [1]), but there were no test cases added,
so it was later broken by 5ee0648bfb which started incrementing
numFound unconditionally.
Fix this (by not checking numFound in case all is true), and add a
simple test case to avoid future regressions.
[1] https://github.com/google/cadvisor/pull/1476
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This is a function to convert huge page sizes (obtained by reading
/sys/kernel/mm/hugepages directory entries) to strings used for hugetlb
cgroup controller resource files. Those strings are then used to get the
hugetlb resource statistics.
This function used an external library and floating point numbers, and could
(theoretically) produce invalid values, since the kernel only uses KB,
MB, and GB suffixes.
Rewrite it to produce the same strings as used in the kernel (see [1]).
As a result, it's also faster, more future-proof (entries that do not
start with "hugepages-" and/or have an incorrect suffix are skipped), and
does more input sanity checks. As a side effect, libcontainer no longer
depends on docker/go-units.
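The conversion can be sketched roughly as follows. This is an illustration, not the committed code: hugePageSizeName is a hypothetical name, and the sketch assumes the sysfs entries look like "hugepages-2048kB" (size in KiB with a "kB" suffix).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hugePageSizeName is a hypothetical helper. It converts a directory entry
// name such as "hugepages-2048kB" into the string used in hugetlb cgroup
// file names ("2MB"). ok is false for entries that should be skipped.
func hugePageSizeName(entry string) (name string, ok bool) {
	if !strings.HasPrefix(entry, "hugepages-") || !strings.HasSuffix(entry, "kB") {
		return "", false // unexpected prefix or suffix
	}
	val := strings.TrimSuffix(strings.TrimPrefix(entry, "hugepages-"), "kB")
	size, err := strconv.Atoi(val) // size in KiB
	if err != nil || size <= 0 {
		return "", false
	}
	// Integer arithmetic only; the suffix is one of KB, MB, GB.
	switch {
	case size >= 1<<20:
		return strconv.Itoa(size>>20) + "GB", true
	case size >= 1<<10:
		return strconv.Itoa(size>>10) + "MB", true
	default:
		return strconv.Itoa(size) + "KB", true
	}
}

func main() {
	for _, e := range []string{"hugepages-64kB", "hugepages-2048kB", "hugepages-1048576kB", "hugepages-akB"} {
		if name, ok := hugePageSizeName(e); ok {
			fmt.Println(e, "->", name)
		}
	}
}
```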
While at it, add more test cases.
Before:
BenchmarkGetHugePageSize-8 187452 6265 ns/op
BenchmarkGetHugePageSizeImpl-8 396769 2998 ns/op
After:
BenchmarkGetHugePageSize-8 222898 4554 ns/op
BenchmarkGetHugePageSizeImpl-8 4738924 241 ns/op
NOTE on removing HugePageSizeUnitList -- this was added by commit
6f77e35da and was used by kubernetes code in [2], which was later
superseded by [3], so there are (hopefully) no external users.
If there are any, they should not be doing that.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?id=eff48ddeab782e35e58ccc8853f7386bbae9dec4#n574
[2] https://github.com/kubernetes/kubernetes/pull/78495
[3] https://github.com/kubernetes/kubernetes/pull/84154
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
On my laptop, I get
BenchmarkGetHugePageSize
BenchmarkGetHugePageSize-8 115213 9400 ns/op
BenchmarkGetHugePageSizeImpl
BenchmarkGetHugePageSizeImpl-8 397873 2971 ns/op
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When testing GetCgroupMounts, the map data is supposed to be obtained
from /proc/self/cgroup, but since we're mocking things, we provide
our own map.
Unfortunately, not all controllers existing in mountinfos were listed.
Also, "name=systemd" needs special handling, so add it.
The controllers added were:
* for fedoraMountinfo case: name=systemd
* for systemdMountinfo case: name=systemd, net_prio
* for bedrockMountinfo case: name=systemd, net_prio, pids
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This (and the converting function) is only used by one of the four
cgroup drivers. The other three do some checking and conversion in
place, so let fs2 do the same.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The function GetClosestMountpointAncestor is not very efficient,
does not really belong to cgroup package, and is only used once
(from fs/cpuset.go).
Remove it, replacing it with an implementation based on the
moby/sys/mountinfo parser.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This function is not very efficient, does not really belong to cgroup
package, and is only used once (from fs/cpuset.go).
Prepare to remove it by replacing it with an implementation based on
the parser from github.com/moby/sys/mountinfo.
This commit is here to make sure the proposed replacement passes the
unit test.
Funny, but the unit test needs to be slightly modified since it
supplies the wrong mountinfo (space as the first character, empty line
at the end).
Validated by
$ go test -v -run Ance
=== RUN TestGetClosestMountpointAncestor
--- PASS: TestGetClosestMountpointAncestor (0.00s)
PASS
ok github.com/opencontainers/runc/libcontainer/cgroups 0.002s
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The resources.MemorySwap field from OCI is memory+swap, while cgroupv2
has a separate swap limit, so subtract memory from the limit (and make
sure values are set and sane).
Make sure to set MemorySwapMax for systemd, too. Since systemd does not
have MemorySwapMax for cgroupv1, it is only needed for v2 driver.
[v2: return -1 on any negative value, add unit test]
[v3: treat any negative value other than -1 as error]
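The conversion described above could look roughly like the sketch below. swapLimitV2 is a hypothetical name, and the exact checks are assumptions drawn from this message: -1 (unlimited) and 0 (unset) pass through unchanged, any other negative value is an error, and the memory limit must be set and must not exceed memory+swap.

```go
package main

import (
	"errors"
	"fmt"
)

// swapLimitV2 is a hypothetical sketch. memorySwap is the OCI memory+swap
// limit and memory is the memory limit; the result is the swap-only limit
// that cgroup v2 expects.
func swapLimitV2(memorySwap, memory int64) (int64, error) {
	if memorySwap == -1 || memorySwap == 0 {
		// -1 means unlimited, 0 means unset: pass through as is.
		return memorySwap, nil
	}
	if memorySwap < 0 {
		return 0, fmt.Errorf("invalid memory+swap value: %d", memorySwap)
	}
	if memory <= 0 {
		return 0, errors.New("unable to set swap limit without a valid memory limit")
	}
	if memorySwap < memory {
		return 0, errors.New("memory+swap limit should be >= memory limit")
	}
	return memorySwap - memory, nil
}

func main() {
	// 300 MiB of memory+swap with 200 MiB of memory leaves 100 MiB of swap.
	swap, err := swapLimitV2(300<<20, 200<<20)
	fmt.Println(swap, err)
}
```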
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* ConvertCPUSharesToCgroupV2Value(0) was returning 70369281052672, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(0) was returning 32, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(1000) was returning 4, while the correct value is 10000
Fix #2244
Follow-up to #2212 #2213
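A sketch of conversions that satisfy the values listed above. The function names are hypothetical, and the linear mappings (cpu shares [2, 262144] and blkio weight [10, 1000], both onto [1, 10000]) are assumptions; only the results for 0 and 1000 come from this message.

```go
package main

import "fmt"

// cpuSharesToV2Weight is a hypothetical sketch. The input is assumed to be
// either 0 (unset) or within [2, 262144]; other values are not handled.
func cpuSharesToV2Weight(shares uint64) uint64 {
	if shares == 0 {
		// Without this guard, shares-2 underflows and yields a huge value.
		return 0
	}
	return 1 + ((shares-2)*9999)/262142
}

// blkioToV2Weight is a hypothetical sketch. The input is assumed to be
// either 0 (unset) or within [10, 1000]; other values are not handled.
func blkioToV2Weight(weight uint16) uint64 {
	if weight == 0 {
		return 0
	}
	return 1 + (uint64(weight)-10)*9999/990
}

func main() {
	fmt.Println(cpuSharesToV2Weight(0), cpuSharesToV2Weight(2), cpuSharesToV2Weight(262144))
	fmt.Println(blkioToV2Weight(0), blkioToV2Weight(10), blkioToV2Weight(1000))
}
```

Both functions treat 0 as "unset" and return it unchanged, which is the behavior the first two items ask for.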
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Respect the container's cgroup path when finding the container's
cgroup mount point, which is useful in multi-tenant environments, where
containers have their own unique cgroup mounts.
Signed-off-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io>
Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Add a mountinfo from a Bedrock Linux system with 4 strata, and include
it in the tests.
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
When there are complicated mount setups, there can be multiple mount
points which have the subsystem we are looking for. Instead of
counting the mountpoints, tick off subsystems until we have found them
all.
Without the 'all' flag, ignore duplicate subsystems after the first.
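The "tick off" approach might be sketched like this. The mount type and pickCgroupMounts are hypothetical and work on already-parsed entries, not on raw mountinfo.

```go
package main

import "fmt"

// mount is a simplified, hypothetical record of one cgroup mount.
type mount struct {
	Mountpoint string
	Subsystems []string
}

// pickCgroupMounts is a hypothetical sketch. It ticks off each known
// subsystem the first time it is seen. Without all, later duplicates are
// ignored and the scan stops once every subsystem has been found; with all,
// every mount that carries a known subsystem is returned.
func pickCgroupMounts(mounts []mount, known []string, all bool) []mount {
	seen := make(map[string]bool, len(known)) // known subsystem -> found yet?
	for _, s := range known {
		seen[s] = false
	}
	found := 0
	var res []mount
	for _, m := range mounts {
		if !all && found == len(seen) {
			break
		}
		picked := mount{Mountpoint: m.Mountpoint}
		for _, s := range m.Subsystems {
			wasSeen, isKnown := seen[s]
			if !isKnown || (!all && wasSeen) {
				continue
			}
			if !wasSeen {
				seen[s] = true
				found++
			}
			picked.Subsystems = append(picked.Subsystems, s)
		}
		if len(picked.Subsystems) > 0 {
			res = append(res, picked)
		}
	}
	return res
}

func main() {
	mounts := []mount{
		{"/sys/fs/cgroup/cpu,cpuacct", []string{"cpu", "cpuacct"}},
		{"/other/cpu", []string{"cpu"}}, // duplicate of an earlier subsystem
		{"/sys/fs/cgroup/memory", []string{"memory"}},
	}
	known := []string{"cpu", "cpuacct", "memory"}
	fmt.Println(len(pickCgroupMounts(mounts, known, false)), len(pickCgroupMounts(mounts, known, true)))
}
```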
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
Runc needs to copy certain files from the top of the cgroup cpuset hierarchy
into the container's cpuset cgroup directory. Currently, runc determines
which directory is the top of the hierarchy by using the parent dir of
the first entry in /proc/self/mountinfo of type cgroup.
This creates problems when cgroup subsystems are mounted arbitrarily in
different dirs on the host.
Now, we use the most deeply nested mountpoint that contains the
container's cpuset cgroup directory.
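A minimal sketch of the "most deeply nested mountpoint" lookup, assuming the mountpoints have already been extracted from mountinfo. closestMountpoint is a hypothetical name, not the function added by this change.

```go
package main

import (
	"fmt"
	"strings"
)

// closestMountpoint is a hypothetical sketch. It returns the longest
// mountpoint that is either dir itself or an ancestor of dir, or "" if
// there is none.
func closestMountpoint(dir string, mountpoints []string) string {
	best := ""
	for _, mp := range mountpoints {
		// Compare whole path components, so that /sys/fs/cgroup/cpu is not
		// taken for an ancestor of /sys/fs/cgroup/cpuset.
		if mp != dir && !strings.HasPrefix(dir, strings.TrimSuffix(mp, "/")+"/") {
			continue
		}
		if len(mp) > len(best) {
			best = mp
		}
	}
	return best
}

func main() {
	mountpoints := []string{"/", "/sys", "/sys/fs/cgroup", "/sys/fs/cgroup/cpu", "/sys/fs/cgroup/cpuset"}
	fmt.Println(closestMountpoint("/sys/fs/cgroup/cpuset/container1", mountpoints))
}
```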
Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
Prior to this change, a cgroup with a `:` character in its path was not
parsed correctly (this occurs on some instances of systemd cgroups under
some versions of systemd, e.g. 225 with accounting).
This fixes that issue and adds a test.
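One way to handle such paths, shown here as a sketch, is to split each /proc/self/cgroup line into at most three fields, so that any `:` after the second separator stays in the path. parseCgroupLine is a hypothetical name, the sample line is made up, and the actual fix may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCgroupLine is a hypothetical sketch. A line has the form
// "hierarchy-ID:controller-list:cgroup-path"; splitting into at most three
// fields keeps any ':' inside the path intact.
func parseCgroupLine(line string) (subsystems []string, path string, err error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) < 3 {
		return nil, "", fmt.Errorf("invalid cgroup entry: %q", line)
	}
	return strings.Split(parts[1], ","), parts[2], nil
}

func main() {
	// Made-up example line with a ':' in the path.
	subs, path, err := parseCgroupLine("4:cpu,cpuacct:/system.slice/example:scope")
	fmt.Println(subs, path, err)
}
```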
Signed-off-by: Euan Kemp <euank@coreos.com>
GetMounts is very CPU-expensive. I'll change other funcs in this package
to reuse code from GetCgroupMounts later.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>