mirror of
https://github.com/samber/lo.git
synced 2026-04-22 15:37:14 +08:00
035f1b358a
2.9 KiB
| name | slug | sourceRef | category | subCategory | similarHelpers | position | signatures |
|---|---|---|---|---|---|---|---|
| Min | min | exp/simd/math_sse.go#L834 | exp | simd | | 20 | |
Finds the minimum value in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.
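The lane suffix follows from the register width divided by the element width. The sketch below is illustrative only: `lanes` is a hypothetical helper, not part of the library, and it assumes the usual register widths of 128 bits for SSE, 256 bits for AVX2, and 512 bits for AVX-512.

```go
package main

import "fmt"

// lanes returns how many elements of elemBits fit into one SIMD register
// of registerBits. Hypothetical helper for illustration; not part of lo.
func lanes(registerBits, elemBits int) int {
	return registerBits / elemBits
}

func main() {
	fmt.Println(lanes(128, 32)) // SSE register, int32 elements: the x4 suffix
	fmt.Println(lanes(256, 8))  // AVX2 register, int8 elements: the x32 suffix
	fmt.Println(lanes(512, 32)) // AVX-512 register, float32 elements: the x16 suffix
}
```

For example, an int8 helper on a 256-bit AVX2 register gets the x32 suffix, while an int32 helper on a 128-bit SSE register gets x4.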
Requirements
- Go 1.26+ with GOEXPERIMENT=simd
- amd64 architecture only
CPU compatibility
| SIMD variant | Lanes | Required flags | Typical CPUs |
|---|---|---|---|
| SSE (xN) | 2-16 | sse2 | All amd64 |
| AVX2 (xN) | 4-32 | avx2 | Intel Haswell+, AMD Excavator+ |
| AVX-512 (xN) | 8-64 | avx512f | Intel Skylake-X+, some Xeons |
Note: Choose the variant matching your CPU's capabilities. Higher lane counts generally provide better throughput but require newer CPU support.
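One way to make that choice at runtime is to probe the CPU flags once and fall back from the widest variant to the narrowest. The sketch below is a minimal illustration under assumptions: `cpuFeatures` and `widestVariant` are hypothetical names, and the flags are supplied by hand rather than detected. In a real program they could come from a feature probe such as `cpu.X86.HasAVX2` and `cpu.X86.HasAVX512F` in `golang.org/x/sys/cpu`.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for a real CPU feature probe
// (for example cpu.X86 from golang.org/x/sys/cpu).
type cpuFeatures struct {
	HasAVX2    bool
	HasAVX512F bool
}

// widestVariant returns the widest SIMD family from the compatibility
// table that the given features allow. SSE2 is the amd64 baseline, so
// it serves as the default.
func widestVariant(f cpuFeatures) string {
	switch {
	case f.HasAVX512F:
		return "AVX-512"
	case f.HasAVX2:
		return "AVX2"
	default:
		return "SSE"
	}
}

func main() {
	fmt.Println(widestVariant(cpuFeatures{}))                                  // SSE
	fmt.Println(widestVariant(cpuFeatures{HasAVX2: true}))                     // AVX2
	fmt.Println(widestVariant(cpuFeatures{HasAVX2: true, HasAVX512F: true}))   // AVX-512
}
```

Checking the widest family first means the highest supported lane count wins, with SSE as the default.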
// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
minInt8 := simd.MinInt8x32([]int8{5, 2, 8, 1, 9})
// 1
// Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
minFloat32 := simd.MinFloat32x16([]float32{3.5, 1.2, 4.8, 2.1})
// 1.2
// Using SSE variant (4 lanes at once) - works on all amd64
minInt32 := simd.MinInt32x4([]int32{100, 50, 200, 75})
// 50
// Empty collection returns 0
minUint16 := simd.MinUint16x8([]uint16{})
// 0
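Because an empty collection returns 0, a result of 0 does not tell the caller whether the input was empty or genuinely had 0 as its minimum. The sketch below shows one way to tell the two apart. It uses `scalarMin`, a plain-Go stand-in that only mirrors the documented empty-input behavior so the example runs without GOEXPERIMENT=simd. Both `scalarMin` and `minOK` are hypothetical names, not part of the library.

```go
package main

import "fmt"

// scalarMin is a plain-Go stand-in for a SIMD Min helper. It mirrors the
// documented behavior of returning 0 for an empty collection.
func scalarMin(values []uint16) uint16 {
	if len(values) == 0 {
		return 0
	}
	m := values[0]
	for _, v := range values[1:] {
		if v < m {
			m = v
		}
	}
	return m
}

// minOK wraps scalarMin and reports whether a minimum actually exists,
// so an empty input is distinguishable from a real minimum of 0.
func minOK(values []uint16) (uint16, bool) {
	if len(values) == 0 {
		return 0, false
	}
	return scalarMin(values), true
}

func main() {
	m, ok := minOK([]uint16{})
	fmt.Println(m, ok) // 0 false

	m, ok = minOK([]uint16{0, 7})
	fmt.Println(m, ok) // 0 true
}
```

The same length check can be placed in front of any of the SIMD variants when the distinction matters.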