mirror of
https://github.com/samber/lo.git
synced 2026-04-22 15:37:14 +08:00
035f1b358a
* feat(exp,simd): adding SumAxB helpers * feat(exp,simd): adding MeanAxB and ClampAxB helpers * feat(exp,simd): adding MinAxB and MaxAxB helpers * refactor(exp,simd): group perf helper category + architecture * feat(exp,simd): adding ContainsAxB helpers * perf(exp,simd): cast to unsafe slice once * feat(exp,simd): call the right SIMD helper based on local architecture * chore: internal dependency linking * Update exp/simd/math.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * style: fix linter * style: fix linter * chore: enable simd in makefile * chore(ci): add simd package to test runs * chore(ci): add simd package to test runs only for go 1.26 * fix(simd): fix overflow * fix(simd): fix overflow and apply the same behavior than lo.Mean * doc(exp,simd): adding initial doc * refactor(simd): move intersect_avx2 and intersect_sse code into intersect_avx512 * fix(simd): call SSE fallback instead of lo.Sum for default helpers * feat(simd): cache simd features on package init to avoid repeated checks * perf(exp,simd): precompute length + improve code quality * perf(exp,simd): faster iteration for min/max value * test(exp,simd): adding benchmarks * test(exp,simd): adding benchmarks results * test(exp,simd): adding benchmarks results * doc(exp,simd): adding warning for overflows in SIMD operations * feat(exp,simd): adding more dispatch helpers * feat(exp,simd): adding SumBy variants * feat(exp,simd): adding MeanBy variants * fix(exp,simd): faster clamp * 💄 * doc(exp,simd): adding SumBy + MeanBy * fix(exp,simd): faster SIMD operations * chore(ci): enable the benchmarks temporary * chore(ci): display cpu architecture before running tests * chore(ci): github actions are hidding some useful stuffs * chore(ci): no SIMD VM available at Github during the weekend ??? * test(exp,simd): larger epsilon * oops * perf(exp,simd): faster iterations * doc(exp,simd): report last version of benchmarks * 💄 --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
41 KiB
41 KiB
Benchmark
Summary
Benchmarks show that running SIMD operations on small datasets is slower:
BenchmarkSumInt8/small/Fallback-lo-2 248740710 5.218 ns/op
BenchmarkSumInt8/small/SSE-x16-2 126181464 9.485 ns/op
BenchmarkSumInt8/small/AVX2-x32-2 73059427 14.44 ns/op
BenchmarkSumInt8/small/AVX512-x64-2 49913169 24.41 ns/op
But SIMD is much faster on large datasets:
BenchmarkSumInt8/xlarge/Fallback-lo-2 273898 4383 ns/op
BenchmarkSumInt8/xlarge/SSE-x16-2 6928408 173.1 ns/op
BenchmarkSumInt8/xlarge/AVX2-x32-2 12639586 94.09 ns/op
BenchmarkSumInt8/xlarge/AVX512-x64-2 13509693 89.67 ns/op
Run
export GOEXPERIMENT=simd
cd exp/simd/
go test -bench ./... -run=^Benchmark -benchmem -bench
# get instruction set
cat /proc/cpuinfo
Result
archsimd.X86: AVX=true AVX2=true AVX512=true
goos: linux
goarch: amd64
pkg: github.com/samber/lo/exp/simd
cpu: AMD EPYC 9454P 48-Core Processor
...
PASS
ok github.com/samber/lo/exp/simd 596.213s
| Benchmark | Iterations | Time/op | Bytes/op | Allocs/op |
|---|---|---|---|---|
| BenchmarkContainsInt8/tiny/SSE-x16-2 | 312359204 | 3.625 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/tiny/AVX2-x32-2 | 277194441 | 4.531 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/tiny/AVX512-x64-2 | 336853209 | 3.401 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/small/SSE-x16-2 | 449132103 | 2.670 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/small/AVX2-x32-2 | 148648339 | 8.332 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/small/AVX512-x64-2 | 143124861 | 7.982 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/medium/SSE-x16-2 | 276816714 | 4.302 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/medium/AVX2-x32-2 | 345774957 | 3.529 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/medium/AVX512-x64-2 | 449868722 | 2.669 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/large/SSE-x16-2 | 100000000 | 10.68 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/large/AVX2-x32-2 | 172934200 | 6.941 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/large/AVX512-x64-2 | 280992625 | 4.384 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/xlarge/SSE-x16-2 | 187189599 | 6.203 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/xlarge/AVX2-x32-2 | 274289563 | 4.042 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/xlarge/AVX512-x64-2 | 375048555 | 2.953 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/massive/SSE-x16-2 | 86434948 | 14.02 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/massive/AVX2-x32-2 | 153742346 | 8.012 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8/massive/AVX512-x64-2 | 259404483 | 5.214 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/tiny/SSE-x8-2 | 270309470 | 4.315 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/tiny/AVX2-x16-2 | 264874646 | 4.281 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/tiny/AVX512-x32-2 | 328810479 | 3.593 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/small/SSE-x8-2 | 374742561 | 3.206 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/small/AVX2-x16-2 | 449838870 | 2.678 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/small/AVX512-x32-2 | 143845734 | 8.484 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/medium/SSE-x8-2 | 185415590 | 6.448 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/medium/AVX2-x16-2 | 273780868 | 4.268 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/medium/AVX512-x32-2 | 350067484 | 3.431 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/large/SSE-x8-2 | 61109778 | 19.66 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/large/AVX2-x16-2 | 100000000 | 10.74 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/large/AVX512-x32-2 | 182886646 | 6.575 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/xlarge/SSE-x8-2 | 15220682 | 71.53 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/xlarge/AVX2-x16-2 | 31876572 | 37.57 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/xlarge/AVX512-x32-2 | 61992217 | 19.55 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/massive/SSE-x8-2 | 4372000 | 262.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/massive/AVX2-x16-2 | 9019658 | 131.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt16/massive/AVX512-x32-2 | 16568430 | 74.25 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/tiny/SSE-x4-2 | 499209442 | 2.406 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/tiny/AVX2-x8-2 | 350479609 | 3.433 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/tiny/AVX512-x16-2 | 280918554 | 4.309 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/small/SSE-x4-2 | 299561596 | 4.028 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/small/AVX2-x8-2 | 374064310 | 3.205 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/small/AVX512-x16-2 | 499219765 | 2.418 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/medium/SSE-x4-2 | 100000000 | 10.42 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/medium/AVX2-x8-2 | 187391635 | 6.403 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/medium/AVX512-x16-2 | 307955800 | 3.875 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/large/SSE-x4-2 | 33256420 | 36.05 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/large/AVX2-x8-2 | 62421526 | 19.23 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/large/AVX512-x16-2 | 100000000 | 10.36 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/xlarge/SSE-x4-2 | 8328856 | 144.9 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/xlarge/AVX2-x8-2 | 17039037 | 71.14 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/xlarge/AVX512-x16-2 | 28740241 | 41.77 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/massive/SSE-x4-2 | 3525885 | 332.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/massive/AVX2-x8-2 | 7318027 | 164.5 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt32/massive/AVX512-x16-2 | 12181366 | 99.08 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/tiny/SSE-x2-2 | 409014308 | 2.934 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/tiny/AVX2-x4-2 | 449210791 | 2.667 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/tiny/AVX512-x8-2 | 280998146 | 4.293 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/small/SSE-x2-2 | 195631429 | 6.172 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/small/AVX2-x4-2 | 281272394 | 4.308 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/small/AVX512-x8-2 | 408933924 | 3.044 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/medium/SSE-x2-2 | 63006909 | 18.94 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/medium/AVX2-x4-2 | 100000000 | 10.67 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/medium/AVX512-x8-2 | 197411126 | 6.016 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/large/SSE-x2-2 | 17098578 | 70.57 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/large/AVX2-x4-2 | 32558013 | 37.07 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/large/AVX512-x8-2 | 57629485 | 20.94 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/xlarge/SSE-x2-2 | 4286155 | 281.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/xlarge/AVX2-x4-2 | 8344772 | 143.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/xlarge/AVX512-x8-2 | 14428276 | 83.14 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/massive/SSE-x2-2 | 1000000 | 1012 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/massive/AVX2-x4-2 | 2350525 | 510.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64/massive/AVX512-x8-2 | 3773523 | 318.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/tiny/SSE-x16-2 | 338880315 | 3.332 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/tiny/AVX2-x32-2 | 320784217 | 3.559 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/tiny/AVX512-x64-2 | 341599854 | 3.331 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/small/SSE-x16-2 | 449579424 | 2.670 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/small/AVX2-x32-2 | 140368142 | 8.648 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/small/AVX512-x64-2 | 146828888 | 8.182 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/medium/SSE-x16-2 | 374443974 | 3.472 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/medium/AVX2-x32-2 | 449271607 | 2.672 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/medium/AVX512-x64-2 | 598525731 | 2.018 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/large/SSE-x16-2 | 254828565 | 4.956 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/large/AVX2-x32-2 | 407777484 | 2.938 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/large/AVX512-x64-2 | 443472316 | 2.666 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/xlarge/SSE-x16-2 | 162196827 | 7.867 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/xlarge/AVX2-x32-2 | 268324950 | 4.518 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/xlarge/AVX512-x64-2 | 400437789 | 2.952 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/massive/SSE-x16-2 | 214548872 | 5.640 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/massive/AVX2-x32-2 | 348431553 | 3.391 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint8/massive/AVX512-x64-2 | 459781908 | 2.455 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/tiny/SSE-x8-2 | 276271912 | 4.297 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/tiny/AVX2-x16-2 | 281145528 | 4.270 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/tiny/AVX512-x32-2 | 315343911 | 3.667 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/small/SSE-x8-2 | 374632351 | 3.204 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/small/AVX2-x16-2 | 449355727 | 2.670 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/small/AVX512-x32-2 | 138088146 | 8.395 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/medium/SSE-x8-2 | 187276191 | 6.582 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/medium/AVX2-x16-2 | 281107980 | 4.306 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/medium/AVX512-x32-2 | 358850328 | 3.516 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/large/SSE-x8-2 | 59025931 | 19.98 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/large/AVX2-x16-2 | 100000000 | 10.68 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/large/AVX512-x32-2 | 179631354 | 6.569 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/xlarge/SSE-x8-2 | 16576267 | 71.63 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/xlarge/AVX2-x16-2 | 32578981 | 36.96 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/xlarge/AVX512-x32-2 | 61464870 | 19.44 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/massive/SSE-x8-2 | 2153736 | 557.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/massive/AVX2-x16-2 | 4225728 | 281.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint16/massive/AVX512-x32-2 | 7829936 | 145.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/tiny/SSE-x4-2 | 499390296 | 2.403 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/tiny/AVX2-x8-2 | 362964080 | 3.342 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/tiny/AVX512-x16-2 | 281063364 | 4.268 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/small/SSE-x4-2 | 293867554 | 4.004 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/small/AVX2-x8-2 | 374510434 | 3.203 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/small/AVX512-x16-2 | 499714206 | 2.402 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/medium/SSE-x4-2 | 100000000 | 10.42 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/medium/AVX2-x8-2 | 187258657 | 6.405 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/medium/AVX512-x16-2 | 312999210 | 3.881 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/large/SSE-x4-2 | 33298366 | 36.02 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/large/AVX2-x8-2 | 62409421 | 19.23 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/large/AVX512-x16-2 | 100000000 | 10.10 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/xlarge/SSE-x4-2 | 7948898 | 143.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/xlarge/AVX2-x8-2 | 17021738 | 70.49 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/xlarge/AVX512-x16-2 | 28742320 | 41.77 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/massive/SSE-x4-2 | 1595774 | 751.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/massive/AVX2-x8-2 | 3094242 | 381.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint32/massive/AVX512-x16-2 | 5080051 | 238.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/tiny/SSE-x2-2 | 374760351 | 3.203 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/tiny/AVX2-x4-2 | 498763054 | 2.419 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/tiny/AVX512-x8-2 | 319635274 | 3.582 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/small/SSE-x2-2 | 187032452 | 6.447 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/small/AVX2-x4-2 | 299546244 | 4.009 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/small/AVX512-x8-2 | 373937659 | 3.207 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/medium/SSE-x2-2 | 62413118 | 19.23 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/medium/AVX2-x4-2 | 113978791 | 10.42 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/medium/AVX512-x8-2 | 186965330 | 6.484 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/large/SSE-x2-2 | 17005768 | 70.57 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/large/AVX2-x4-2 | 33286495 | 36.69 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/large/AVX512-x8-2 | 61486065 | 19.93 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/xlarge/SSE-x2-2 | 4154370 | 280.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/xlarge/AVX2-x4-2 | 8371358 | 148.2 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/xlarge/AVX512-x8-2 | 14193795 | 72.36 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/massive/SSE-x2-2 | 1773937 | 676.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/massive/AVX2-x4-2 | 3500168 | 343.0 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsUint64/massive/AVX512-x8-2 | 7097266 | 249.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/tiny/SSE-x4-2 | 410522160 | 2.675 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/tiny/AVX2-x8-2 | 308565882 | 3.814 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/tiny/AVX512-x16-2 | 315331897 | 3.755 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/small/SSE-x4-2 | 278219434 | 4.642 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/small/AVX2-x8-2 | 362945481 | 3.287 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/small/AVX512-x16-2 | 408523153 | 2.941 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/medium/SSE-x4-2 | 100000000 | 10.77 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/medium/AVX2-x8-2 | 186186376 | 6.409 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/medium/AVX512-x16-2 | 264255108 | 4.619 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/large/SSE-x4-2 | 33028701 | 36.27 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/large/AVX2-x8-2 | 62465360 | 19.53 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/large/AVX512-x16-2 | 108213310 | 10.95 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/xlarge/SSE-x4-2 | 8359381 | 143.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/xlarge/AVX2-x8-2 | 17042701 | 70.46 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/xlarge/AVX512-x16-2 | 31806921 | 37.13 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/massive/SSE-x4-2 | 1000000 | 1100 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/massive/AVX2-x8-2 | 2164672 | 554.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat32/massive/AVX512-x16-2 | 4201453 | 293.9 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/tiny/SSE-x2-2 | 362183925 | 3.223 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/tiny/AVX2-x4-2 | 449021466 | 2.687 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/tiny/AVX512-x8-2 | 320176149 | 3.820 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/small/SSE-x2-2 | 187139116 | 6.415 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/small/AVX2-x4-2 | 280722585 | 4.300 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/small/AVX512-x8-2 | 335670502 | 3.472 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/medium/SSE-x2-2 | 62343927 | 19.23 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/medium/AVX2-x4-2 | 112332902 | 10.69 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/medium/AVX512-x8-2 | 179610780 | 6.741 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/large/SSE-x2-2 | 16996959 | 70.51 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/large/AVX2-x4-2 | 33017950 | 36.29 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/large/AVX512-x8-2 | 60322328 | 19.73 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/xlarge/SSE-x2-2 | 4141281 | 282.9 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/xlarge/AVX2-x4-2 | 7856590 | 145.0 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/xlarge/AVX512-x8-2 | 16623739 | 72.06 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/massive/SSE-x2-2 | 541202 | 2195 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/massive/AVX2-x4-2 | 1000000 | 1158 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsFloat64/massive/AVX512-x8-2 | 2115301 | 560.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsWorstCase/SSE-x4-2 | 7651734 | 145.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsWorstCase/AVX2-x8-2 | 14921599 | 70.49 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsWorstCase/AVX512-x16-2 | 28708478 | 41.38 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsBestCase/SSE-x4-2 | 534237578 | 2.136 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsBestCase/AVX2-x8-2 | 561252645 | 2.159 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsBestCase/AVX512-x16-2 | 560396454 | 2.137 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/tiny/SSE-x4-2 | 499649139 | 2.401 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/tiny/AVX2-x8-2 | 329743240 | 3.421 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/tiny/AVX512-x16-2 | 280516392 | 4.276 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/small/SSE-x4-2 | 299373171 | 4.006 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/small/AVX2-x8-2 | 374407988 | 3.267 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/small/AVX512-x16-2 | 486948346 | 2.424 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/medium/SSE-x4-2 | 100000000 | 10.41 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/medium/AVX2-x8-2 | 182899621 | 6.412 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/medium/AVX512-x16-2 | 311969776 | 3.829 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/large/SSE-x4-2 | 33309816 | 36.04 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/large/AVX2-x8-2 | 59912676 | 19.74 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/large/AVX512-x16-2 | 100000000 | 10.65 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/xlarge/SSE-x4-2 | 8346818 | 143.7 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/xlarge/AVX2-x8-2 | 16980399 | 70.54 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/xlarge/AVX512-x16-2 | 28676455 | 42.94 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/massive/SSE-x4-2 | 1000000 | 1151 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/massive/AVX2-x8-2 | 2161594 | 555.2 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsNegative/massive/AVX512-x16-2 | 3549094 | 350.5 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8ByWidth/SSE-x16-2 | 331533141 | 3.222 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8ByWidth/AVX2-x32-2 | 408741681 | 3.193 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt8ByWidth/AVX512-x64-2 | 365382873 | 3.241 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64SteadyState/SSE-x2-2 | 5722603 | 211.5 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64SteadyState/AVX2-x4-2 | 11711869 | 103.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkContainsInt64SteadyState/AVX512-x8-2 | 19671033 | 61.36 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/small/Fallback-lo-2 | 248740710 | 5.218 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/small/SSE-x16-2 | 126181464 | 9.485 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/small/AVX2-x32-2 | 73059427 | 14.44 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/small/AVX512-x64-2 | 49913169 | 24.41 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/medium/Fallback-lo-2 | 17278075 | 69.96 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/medium/SSE-x16-2 | 100000000 | 10.58 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/medium/AVX2-x32-2 | 91620999 | 13.10 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/medium/AVX512-x64-2 | 54082130 | 22.20 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/large/Fallback-lo-2 | 2006178 | 576.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/large/SSE-x16-2 | 41836690 | 27.82 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/large/AVX2-x32-2 | 51735399 | 23.04 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/large/AVX512-x64-2 | 40861586 | 29.40 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/xlarge/Fallback-lo-2 | 273898 | 4383 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/xlarge/SSE-x16-2 | 6928408 | 173.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/xlarge/AVX2-x32-2 | 12639586 | 94.09 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8/xlarge/AVX512-x64-2 | 13509693 | 89.67 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/small/Fallback-lo-2 | 249444103 | 5.012 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/small/SSE-x8-2 | 244927230 | 5.052 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/small/AVX2-x16-2 | 122088517 | 9.715 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/small/AVX512-x32-2 | 54098370 | 22.00 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/medium/Fallback-lo-2 | 15782683 | 72.54 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/medium/SSE-x8-2 | 100000000 | 10.51 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/medium/AVX2-x16-2 | 100000000 | 10.75 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/medium/AVX512-x32-2 | 56147455 | 21.38 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/large/Fallback-lo-2 | 2173214 | 598.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/large/SSE-x8-2 | 26319481 | 44.73 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/large/AVX2-x16-2 | 40459519 | 27.91 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/large/AVX512-x32-2 | 39359752 | 31.28 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/xlarge/Fallback-lo-2 | 273932 | 4382 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/xlarge/SSE-x8-2 | 3557265 | 331.2 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/xlarge/AVX2-x16-2 | 6930166 | 173.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt16/xlarge/AVX512-x32-2 | 12100244 | 97.01 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/small/Fallback-lo-2 | 249566539 | 4.808 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/small/SSE-x4-2 | 259250019 | 4.581 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/small/AVX2-x8-2 | 232858933 | 5.404 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/small/AVX512-x16-2 | 100000000 | 11.18 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/medium/Fallback-lo-2 | 17274441 | 72.28 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/medium/SSE-x4-2 | 58400258 | 20.56 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/medium/AVX2-x8-2 | 110851756 | 10.67 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/medium/AVX512-x16-2 | 106593603 | 11.25 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/large/Fallback-lo-2 | 2171817 | 551.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/large/SSE-x4-2 | 8270253 | 146.0 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/large/AVX2-x8-2 | 22234518 | 46.06 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/large/AVX512-x16-2 | 37448763 | 32.31 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/xlarge/Fallback-lo-2 | 273699 | 4559 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/xlarge/SSE-x4-2 | 1000000 | 1102 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/xlarge/AVX2-x8-2 | 3586887 | 332.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt32/xlarge/AVX512-x16-2 | 7214437 | 170.5 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/small/Fallback-lo-2 | 417473124 | 2.886 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/small/SSE-x2-2 | 287521756 | 4.169 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/small/AVX2-x4-2 | 277783513 | 4.311 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/small/AVX512-x8-2 | 172823103 | 6.993 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/medium/Fallback-lo-2 | 34022653 | 35.27 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/medium/SSE-x2-2 | 49241248 | 24.05 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/medium/AVX2-x4-2 | 78897342 | 14.58 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/medium/AVX512-x8-2 | 84361297 | 14.03 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/large/Fallback-lo-2 | 3680988 | 282.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/large/SSE-x2-2 | 6293607 | 170.7 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/large/AVX2-x4-2 | 12739849 | 91.28 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/large/AVX512-x8-2 | 25508130 | 46.30 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/xlarge/Fallback-lo-2 | 546321 | 2283 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/xlarge/SSE-x2-2 | 877434 | 1289 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/xlarge/AVX2-x4-2 | 1845892 | 650.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64/xlarge/AVX512-x8-2 | 2148355 | 550.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/small/Fallback-lo-2 | 411100770 | 2.951 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/small/SSE-x4-2 | 264013596 | 4.572 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/small/AVX2-x8-2 | 174478266 | 6.911 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/small/AVX512-x16-2 | 61182673 | 19.78 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/medium/Fallback-lo-2 | 33815070 | 35.68 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/medium/SSE-x4-2 | 58238188 | 20.66 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/medium/AVX2-x8-2 | 91316544 | 13.26 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/medium/AVX512-x16-2 | 80046624 | 15.08 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/large/Fallback-lo-2 | 4304168 | 278.7 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/large/SSE-x4-2 | 6198957 | 184.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/large/AVX2-x8-2 | 12260169 | 86.60 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/large/AVX512-x16-2 | 22147112 | 45.34 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/xlarge/Fallback-lo-2 | 546901 | 2193 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/xlarge/SSE-x4-2 | 736503 | 1622 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/xlarge/AVX2-x8-2 | 1493887 | 810.5 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat32/xlarge/AVX512-x16-2 | 2959298 | 393.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/small/Fallback-lo-2 | 410778070 | 3.043 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/small/SSE-x2-2 | 254156008 | 4.714 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/small/AVX2-x4-2 | 227604434 | 5.323 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/small/AVX512-x8-2 | 170099748 | 7.115 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/medium/Fallback-lo-2 | 33646345 | 35.78 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/medium/SSE-x2-2 | 32931152 | 34.92 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/medium/AVX2-x4-2 | 75389446 | 16.79 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/medium/AVX512-x8-2 | 89826181 | 13.33 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/large/Fallback-lo-2 | 4293837 | 302.8 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/large/SSE-x2-2 | 3146601 | 381.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/large/AVX2-x4-2 | 6373876 | 184.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/large/AVX512-x8-2 | 13464712 | 88.96 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/xlarge/Fallback-lo-2 | 545764 | 2193 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/xlarge/SSE-x2-2 | 368846 | 3390 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/xlarge/AVX2-x4-2 | 709940 | 1613 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumFloat64/xlarge/AVX512-x8-2 | 1480214 | 808.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/small/Fallback-lo-2 | 411529147 | 3.043 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/small/SSE-x4-2 | 204428401 | 5.872 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/small/AVX2-x8-2 | 187573928 | 6.214 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/small/AVX512-x16-2 | 98346700 | 12.12 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/medium/Fallback-lo-2 | 33481442 | 35.72 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/medium/SSE-x4-2 | 52042394 | 22.12 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/medium/AVX2-x8-2 | 96288541 | 13.44 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/medium/AVX512-x16-2 | 100995780 | 11.90 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/large/Fallback-lo-2 | 4296570 | 289.9 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/large/SSE-x4-2 | 7743022 | 146.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/large/AVX2-x8-2 | 24355988 | 46.26 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/large/AVX512-x16-2 | 37322655 | 32.89 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/xlarge/Fallback-lo-2 | 547008 | 2193 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/xlarge/SSE-x4-2 | 1087246 | 1112 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/xlarge/AVX2-x8-2 | 1386868 | 761.9 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanInt32/xlarge/AVX512-x16-2 | 7166142 | 170.7 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/small/Fallback-lo-2 | 349760005 | 3.449 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/small/SSE-x2-2 | 189674538 | 6.293 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/small/AVX2-x4-2 | 159228600 | 7.531 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/small/AVX512-x8-2 | 110196433 | 10.89 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/medium/Fallback-lo-2 | 32968618 | 36.17 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/medium/SSE-x2-2 | 30863817 | 37.69 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/medium/AVX2-x4-2 | 62428772 | 19.66 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/medium/AVX512-x8-2 | 77140984 | 15.54 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/large/Fallback-lo-2 | 4281057 | 280.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/large/SSE-x2-2 | 3057349 | 389.4 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/large/AVX2-x4-2 | 6509438 | 185.9 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/large/AVX512-x8-2 | 12668032 | 93.50 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/xlarge/Fallback-lo-2 | 545898 | 2288 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/xlarge/SSE-x2-2 | 367671 | 4048 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/xlarge/AVX2-x4-2 | 739941 | 1621 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMeanFloat64/xlarge/AVX512-x8-2 | 1434867 | 811.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/small/SSE-x4-2 | 312338268 | 3.860 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/small/AVX2-x8-2 | 238034872 | 5.042 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/small/AVX512-x16-2 | 152600943 | 6.661 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/medium/SSE-x4-2 | 61051266 | 19.73 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/medium/AVX2-x8-2 | 91792144 | 13.11 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/medium/AVX512-x16-2 | 99994540 | 12.18 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/large/SSE-x4-2 | 8604774 | 140.5 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/large/AVX2-x8-2 | 15581037 | 77.56 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/large/AVX512-x16-2 | 30512421 | 40.24 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/xlarge/SSE-x4-2 | 1000000 | 1110 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/xlarge/AVX2-x8-2 | 2158272 | 557.2 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinInt32/xlarge/AVX512-x16-2 | 4253668 | 282.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/small/SSE-x2-2 | 264129410 | 4.544 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/small/AVX2-x4-2 | 299587609 | 4.008 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/small/AVX512-x8-2 | 100000000 | 10.05 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/medium/SSE-x2-2 | 32778514 | 36.93 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/medium/AVX2-x4-2 | 53356347 | 20.30 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/medium/AVX512-x8-2 | 74832976 | 16.21 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/large/SSE-x2-2 | 3863326 | 300.0 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/large/AVX2-x4-2 | 7670576 | 146.5 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/large/AVX512-x8-2 | 14017984 | 78.21 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/xlarge/SSE-x2-2 | 492739 | 2195 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/xlarge/AVX2-x4-2 | 1000000 | 1103 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMinFloat64/xlarge/AVX512-x8-2 | 2145290 | 560.3 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/small/SSE-x4-2 | 306585705 | 3.860 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/small/AVX2-x8-2 | 237347997 | 5.086 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/small/AVX512-x16-2 | 201433966 | 6.130 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/medium/SSE-x4-2 | 60759631 | 19.92 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/medium/AVX2-x8-2 | 90934662 | 13.13 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/medium/AVX512-x16-2 | 98517944 | 12.18 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/large/SSE-x4-2 | 8590542 | 139.6 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/large/AVX2-x8-2 | 15770372 | 77.69 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/large/AVX512-x16-2 | 30197324 | 39.32 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/xlarge/SSE-x4-2 | 1000000 | 1104 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/xlarge/AVX2-x8-2 | 2152038 | 562.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxInt32/xlarge/AVX512-x16-2 | 3917990 | 296.7 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/small/SSE-x2-2 | 249617162 | 4.816 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/small/AVX2-x4-2 | 207017514 | 5.855 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/small/AVX512-x8-2 | 66520290 | 17.74 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/medium/SSE-x2-2 | 32307492 | 36.92 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/medium/AVX2-x4-2 | 57306838 | 20.77 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/medium/AVX512-x8-2 | 56911946 | 21.12 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/large/SSE-x2-2 | 4259366 | 287.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/large/AVX2-x4-2 | 7905420 | 148.9 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/large/AVX512-x8-2 | 14100686 | 83.43 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/xlarge/SSE-x2-2 | 545378 | 2243 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/xlarge/AVX2-x4-2 | 1000000 | 1113 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkMaxFloat64/xlarge/AVX512-x8-2 | 2119741 | 565.7 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8ByWidth/Fallback-lo-2 | 896775 | 1335 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8ByWidth/SSE-x16-2 | 12557700 | 94.52 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8ByWidth/AVX2-x32-2 | 18702537 | 55.03 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt8ByWidth/AVX512-x64-2 | 21342572 | 56.10 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64SteadyState/Fallback-lo-2 | 513738 | 2195 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64SteadyState/SSE-x2-2 | 928376 | 1296 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64SteadyState/AVX2-x4-2 | 1836968 | 888.1 ns/op | 0 B/op | 0 allocs/op |
| BenchmarkSumInt64SteadyState/AVX512-x8-2 | 2141715 | 551.3 ns/op | 0 B/op | 0 allocs/op |