lo/docs/data/simd-max.md
Samuel Berthe 035f1b358a Experiments: adding SIMD helpers (#801)
2026-02-21 19:19:36 +01:00


| name | slug | sourceRef | category | subCategory | similarHelpers | position |
|------|------|-----------|----------|-------------|----------------|----------|
| Max | max | exp/simd/math_sse.go#L1328 | exp | simd | exp#simd#max | 30 |

signatures:

func MaxInt8x16[T ~int8](collection []T) T
func MaxInt8x32[T ~int8](collection []T) T
func MaxInt8x64[T ~int8](collection []T) T
func MaxInt16x8[T ~int16](collection []T) T
func MaxInt16x16[T ~int16](collection []T) T
func MaxInt16x32[T ~int16](collection []T) T
func MaxInt32x4[T ~int32](collection []T) T
func MaxInt32x8[T ~int32](collection []T) T
func MaxInt32x16[T ~int32](collection []T) T
func MaxInt64x2[T ~int64](collection []T) T
func MaxInt64x4[T ~int64](collection []T) T
func MaxInt64x8[T ~int64](collection []T) T
func MaxUint8x16[T ~uint8](collection []T) T
func MaxUint8x32[T ~uint8](collection []T) T
func MaxUint8x64[T ~uint8](collection []T) T
func MaxUint16x8[T ~uint16](collection []T) T
func MaxUint16x16[T ~uint16](collection []T) T
func MaxUint16x32[T ~uint16](collection []T) T
func MaxUint32x4[T ~uint32](collection []T) T
func MaxUint32x8[T ~uint32](collection []T) T
func MaxUint32x16[T ~uint32](collection []T) T
func MaxUint64x2[T ~uint64](collection []T) T
func MaxUint64x4[T ~uint64](collection []T) T
func MaxUint64x8[T ~uint64](collection []T) T
func MaxFloat32x4[T ~float32](collection []T) T
func MaxFloat32x8[T ~float32](collection []T) T
func MaxFloat32x16[T ~float32](collection []T) T
func MaxFloat64x2[T ~float64](collection []T) T
func MaxFloat64x4[T ~float64](collection []T) T
func MaxFloat64x8[T ~float64](collection []T) T

Finds the maximum value in a collection using SIMD instructions. The suffix (x2, x4, x8, x16, x32, x64) indicates the number of lanes processed simultaneously.

Requirements

  • Go 1.26+ with GOEXPERIMENT=simd
  • amd64 architecture only

CPU compatibility

| SIMD variant | Lanes | Required flags | Typical CPUs |
|--------------|-------|----------------|--------------|
| SSE (xN) | 2-16 | sse2 | All amd64 |
| AVX2 (xN) | 4-32 | avx2 | Intel Haswell+, AMD Excavator+ |
| AVX-512 (xN) | 8-64 | avx512f | Intel Skylake-X+, some Xeons |
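
The lane ranges in the table follow from dividing the vector register width by the element width. The sketch below works through that arithmetic, assuming the usual register sizes of 128 bits (SSE), 256 bits (AVX2) and 512 bits (AVX-512). The `lanes` function is a hypothetical helper written for illustration; it is not part of the `simd` package.

```go
package main

import "fmt"

// lanes returns how many elements of elemBits width fit into one
// vector register of registerBits width.
// Hypothetical helper for illustration only; not part of the simd package.
func lanes(registerBits, elemBits int) int {
	return registerBits / elemBits
}

func main() {
	// Assumed register widths for each instruction set.
	widths := []struct {
		name string
		bits int
	}{
		{"SSE", 128},
		{"AVX2", 256},
		{"AVX-512", 512},
	}

	for _, w := range widths {
		for _, elem := range []int{8, 16, 32, 64} {
			fmt.Printf("%-7s %2d-bit elements -> x%d\n", w.name, elem, lanes(w.bits, elem))
		}
	}
}
```

For example, `MaxInt8x32` packs 32 8-bit values into a 256-bit register (AVX2), while `MaxFloat64x2` packs 2 64-bit values into a 128-bit register (SSE).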

Note: Choose the variant matching your CPU's capabilities. Higher lane counts provide better performance but require newer CPU support.
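
One way to apply this advice is to pick the widest variant the CPU supports and fall back to SSE otherwise. The sketch below only shows the selection order for the `int32` helpers. The `cpuFeatures` struct and the `pickMaxInt32` function are hypothetical names introduced here, and the flags are filled in by hand. In a real program they might come from a CPU-detection package such as `golang.org/x/sys/cpu`, which is an assumption and not something this page prescribes.

```go
package main

import "fmt"

// cpuFeatures is a hypothetical stand-in for CPU flags detected at startup.
type cpuFeatures struct {
	AVX2    bool
	AVX512F bool
}

// pickMaxInt32 returns the name of the widest MaxInt32 variant that the
// given feature set can run. Hypothetical helper for illustration only.
func pickMaxInt32(f cpuFeatures) string {
	switch {
	case f.AVX512F:
		return "MaxInt32x16" // 512-bit registers, 16 lanes
	case f.AVX2:
		return "MaxInt32x8" // 256-bit registers, 8 lanes
	default:
		return "MaxInt32x4" // SSE2 is part of the amd64 baseline
	}
}

func main() {
	fmt.Println(pickMaxInt32(cpuFeatures{}))                            // no AVX2 or AVX-512
	fmt.Println(pickMaxInt32(cpuFeatures{AVX2: true}))                  // AVX2 only
	fmt.Println(pickMaxInt32(cpuFeatures{AVX2: true, AVX512F: true}))   // AVX-512 available
}
```

Checking the widest instruction set first keeps the fallback chain in one place, so the calling code does not need to know which CPU it runs on.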

// Using AVX2 variant (32 lanes at once) - Intel Haswell+ / AMD Excavator+
maxInt8 := simd.MaxInt8x32([]int8{5, 2, 8, 1, 9})
// 9

// Using AVX-512 variant (16 lanes at once) - Intel Skylake-X+
maxFloat32 := simd.MaxFloat32x16([]float32{3.5, 1.2, 4.8, 2.1})
// 4.8

// Using SSE variant (4 lanes at once) - works on all amd64
maxInt32 := simd.MaxInt32x4([]int32{100, 50, 200, 75})
// 200

// Empty collection returns 0
maxUint16 := simd.MaxUint16x8([]uint16{})
// 0
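
For comparison, the documented behavior (largest element, zero value for an empty collection) can be written as a plain scalar loop. The sketch below is a minimal reference version that could serve for checking results or as a fallback on platforms without SIMD support. The `number` constraint and the `scalarMax` function are hypothetical names, not part of the `simd` package, and the handling of NaN values here is not guaranteed to match the SIMD helpers.

```go
package main

import "fmt"

// number lists the element types covered by the Max helpers on this page.
// Hypothetical constraint for illustration only.
type number interface {
	~int8 | ~int16 | ~int32 | ~int64 |
		~uint8 | ~uint16 | ~uint32 | ~uint64 |
		~float32 | ~float64
}

// scalarMax returns the largest element of collection, or the zero value
// of T when the collection is empty. It processes one element at a time.
func scalarMax[T number](collection []T) T {
	var zero T
	if len(collection) == 0 {
		return zero
	}

	best := collection[0]
	for _, v := range collection[1:] {
		if v > best {
			best = v
		}
	}
	return best
}

func main() {
	fmt.Println(scalarMax([]int8{5, 2, 8, 1, 9}))
	fmt.Println(scalarMax([]int32{100, 50, 200, 75}))
	fmt.Println(scalarMax([]uint16{}))
}
```

Starting from the first element rather than from zero matters for signed and floating-point types: a collection containing only negative values still returns its true maximum.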