
24 Commits

Author SHA1 Message Date
d-enk 68f827d9bf perf: optimize Substring to work directly with strings instead of converting to runes (#822)
* perf: optimize Substring to work directly with strings instead of converting to runes

- Rewrite Substring to iterate over string bytes directly, avoiding full []rune conversion
- Improve performance for long strings by only processing necessary portions
- Add comprehensive test cases for Unicode handling, invalid UTF-8, and edge cases
- Add BenchmarkSubstring to measure performance improvements
- Improve documentation with detailed parameter descriptions
- Handle invalid UTF-8 sequences by converting to []rune when needed
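
The approach in the list above can be sketched as follows. This is a minimal illustration, not the code merged in this commit: `substringSketch` is a hypothetical name, `length` is taken as an `int` for simplicity, and the `[]rune` fallback for invalid UTF-8 is left out. The point is that ranging over a string yields the byte index of each rune, so the result can be sliced out of the original string without allocating.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// substringSketch is a hypothetical helper (not the library's implementation).
// offset and length are counted in runes; a negative offset counts from the end.
func substringSketch(s string, offset, length int) string {
	if length <= 0 {
		return ""
	}
	if offset < 0 {
		// Counting runes walks the whole string but does not allocate.
		offset += utf8.RuneCountInString(s)
		if offset < 0 {
			offset = 0 // clamp to the start of the string
		}
	}

	start := -1
	remaining := length
	for i := range s { // i is the byte index where each rune starts
		if start < 0 {
			if offset > 0 {
				offset-- // still skipping runes before the window
				continue
			}
			start = i
		}
		if remaining == 0 {
			return s[start:i] // stop early: the rest of s is never visited
		}
		remaining--
	}
	if start < 0 {
		return "" // offset is at or past the end of the string
	}
	return s[start:] // length reached past the end, so it is clamped
}

func main() {
	fmt.Println(substringSketch("héllo wörld", 1, 4))  // éllo
	fmt.Println(substringSketch("héllo wörld", -5, 3)) // wör
}
```

For a non-negative offset the loop returns as soon as the window is complete, which is where the gain on long inputs would come from in a design like this one.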

Benchstat:

                   │    old.txt    │               new.txt               │
                   │    sec/op     │    sec/op     vs base               │
Substring/{10_10}-4    558.85n ±  9%   39.75n ± 10%  -92.89% (p=0.000 n=8)
Substring/{50_50}-4    783.10n ±  6%   85.15n ±  5%  -89.13% (p=0.000 n=8)
Substring/{50_45}-4    773.30n ±  3%   126.5n ±  7%  -83.65% (p=0.000 n=8)
Substring/{-50_50}-4   794.00n ±  2%   177.6n ±  7%  -77.63% (p=0.000 n=8)
Substring/{-10_10}-4   542.85n ± 20%   41.82n ±  6%  -92.30% (p=0.000 n=8)
geomean               680.4n         79.52n        -88.31%

                   │  old.txt   │               new.txt                │
                   │    B/op    │   B/op    vs base                    │
Substring/{10_10}-4    432.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_50}-4    480.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_45}-4    464.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-50_50}-4   480.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-10_10}-4   432.0 ± 0%   0.0 ± 0%  -100.00% (p=0.000 n=8)

                   │  old.txt   │                new.txt                 │
                   │ allocs/op  │ allocs/op   vs base                    │
Substring/{10_10}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_50}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{50_45}-4    2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-50_50}-4   2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)
Substring/{-10_10}-4   2.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=8)

* Enhance substring documentation with Unicode details

Returns a substring starting at the given offset with the specified length. Supports negative offsets, which count from the end of the string; out-of-bounds values are clamped. Operates on Unicode runes (characters) and is optimized for zero allocations.

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2026-02-27 22:19:20 +01:00
Samuel Berthe a602a36075 test: adding missing test cases to ellipsis (#809) 2026-02-21 22:56:05 +01:00
Samuel Berthe 7f2504a902 💄 2026-02-21 19:32:45 +01:00
Varun Chawla 0b4623da1e fix: make Ellipsis operate on runes instead of bytes to prevent Unicode truncation (#796)
* fix: make Ellipsis operate on runes instead of bytes to prevent Unicode truncation

The Ellipsis function previously used byte-based length counting (len(str))
and byte-based slicing (str[:length-3]), which could split multi-byte
Unicode characters in the middle, producing garbled output.

This changes the function to use []rune conversion so the length parameter
counts Unicode code points instead of bytes. Emoji, CJK ideographs, and
other multi-byte characters are now never split in the middle.
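
The difference between the two slicing strategies can be shown with a small, self-contained sketch. `truncateBytes` and `truncateRunes` are hypothetical names used only for illustration, not functions from the library, and both assume `n >= 0`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateBytes keeps the first n bytes and may cut a multi-byte character in half.
func truncateBytes(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n]
}

// truncateRunes keeps the first n code points, so no character is split.
func truncateRunes(s string, n int) string {
	r := []rune(s)
	if len(r) <= n {
		return s
	}
	return string(r[:n])
}

func main() {
	s := "日本語テキスト" // every character here is 3 bytes in UTF-8

	b := truncateBytes(s, 4)
	fmt.Println(utf8.ValidString(b)) // false: the 4th byte is a fragment of "本"

	r := truncateRunes(s, 4)
	fmt.Println(r, utf8.ValidString(r)) // 日本語テ true
}
```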

Fixes #520

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: avoid rune slice allocation in Ellipsis

Use range-based iteration to count runes without allocating a []rune
slice, per reviewer suggestion. The early-return for length < 3 is
kept explicit for clarity.
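
A rough sketch of the range-based idea, assuming the trimming behaviour from the earlier Ellipsis commits. `ellipsisSketch` is a hypothetical name and the handling of `length < 3` is an assumption made for this illustration, not necessarily what the merged code does.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const ellipsis = "..."

// ellipsisSketch is a hypothetical helper: it truncates s to at most length
// runes (including the ellipsis) without building a []rune slice.
func ellipsisSketch(s string, length int) string {
	s = strings.TrimSpace(s)

	// Assumption: when there is no room for text plus the ellipsis,
	// return the bare ellipsis if s does not fit.
	if length < 3 {
		if utf8.RuneCountInString(s) > length {
			return ellipsis
		}
		return s
	}

	cut := 0   // byte index just after the first length-3 runes
	count := 0 // number of runes seen before the current one
	for i := range s {
		if count == length-3 {
			cut = i
		}
		if count == length {
			// A rune beyond the limit exists, so s is too long.
			return strings.TrimSpace(s[:cut]) + ellipsis
		}
		count++
	}
	return s // s has at most length runes
}

func main() {
	fmt.Println(ellipsisSketch("hello world", 8))    // hello...
	fmt.Println(ellipsisSketch("日本語テキスト", 5)) // 日本...
}
```

Recording the cut position while counting means the string is walked at most once, and the loop stops as soon as one rune past the limit is seen.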

* Simplify Ellipsis: remove early return for length < 3, reuse ellipsis const

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 19:29:10 +01:00
Samuel Berthe fedd0b6d2d doc: explain chunkstring inconsistency (#789)
* doc: explain chunkstring inconsistency

* doc: explain chunkstring inconsistency
2026-01-27 18:53:04 +01:00
d-enk 123d5c2531 refactor: remove some redundant checks (#771) 2026-01-12 20:42:12 +01:00
Nathan Baulch 43cef1f439 feat: new iter package (#672)
* lint: pin golangci-lint version

* lint: fix issues triggered by go1.23 upgrade

* feat: new iter package

* lint: fix linter issues

* fix: restore go1.18

* fix: rename package to "it"

* feat: assign multiple sequences of maps

* fix: panic in DropRight if n = 0

* docs: fix incorrect non-iter helper references

* feat: implement Invert helper

* feat: helpers for creating and checking empty sequences

* feat: implement Reverse helper

* feat: implement ReduceRight helper

* feat: implement Shuffle helper

* feat: implement Sample* helpers

* refactor: rename helpers with Seq convention

* feat: implement SeqToChannel2 helper

* feat: implement HasPrefix/HasSuffix helpers

* chore: port recent changes

* perf: only iterate collection once in Every

* refactor: reduce dupe code by reusing helpers internally

* perf: reuse internal Mode slice

* feat: implement Length helper

* chore: duplicate unit tests for *I helpers

* fix: omit duplicates in second Intersect list

* feat: intersect more than 2 sequences

* feat: implement Drain helper

* feat: implement Seq/Seq2 conversion helpers

* refactor: rename *Right* to *Last*

* chore: minor cleanup

* refactor: consistent predicate/transform parameter names

* perf: abort Slice/Subset once upper bound reached

* refactor: rename IsSortedByKey to IsSortedBy

* refactor: reuse more helpers internally

* feat: implement Cut* helpers

* feat: implement Trim* helpers

* perf: reduce allocations

* docs: describe iteration and allocation expectations

* Update .github/workflows/lint.yml

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-10-02 19:23:16 +02:00
Nathan Baulch 1b92b5c7db lint: enable 7 more linters (#686)
* lint: enable and fix perfsprint issues

* lint: enable and fix nolintlint issues

* lint: enable and fix godot issues

* lint: enable and fix thelper issues

* lint: enable and fix tparallel issues

* lint: enable and fix paralleltest issues

* lint: enable and fix predeclared issues
2025-09-25 13:18:25 +02:00
Samuel Berthe 268215359e fix(string): fix division by zero (#684) 2025-09-25 04:21:56 +02:00
Nathan Baulch 7170719ec0 lint: unit test improvements (#674)
* lint: pin golangci-lint version

* lint: use is.Empty where possible

* lint: use is.ElementsMatch for unsorted slices

* lint: remove redundant is.Len assertions

* lint: use is.Zero to assert zero structs

* fix: misc assertion issues

* lint: more consistent test case pattern

* fix: reversed expect/actual assert values

* lint: use is.ErrorIs and is.EqualError for errors

* Update golangci-lint version in workflow

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-09-24 21:02:52 +02:00
Nathan Baulch b5e290abe0 fix: more consistent panic strings (#678)
* lint: pin golangci-lint version

* fix: more consistent panic strings

* Update golangci-lint version in workflow

Updated golangci-lint action version to v2.4.

---------

Co-authored-by: Samuel Berthe <dev@samuel-berthe.fr>
2025-09-24 21:02:02 +02:00
Nathan Baulch 76b76a7adb lint: Apply testifylint linter recommendations (#669) 2025-09-20 00:50:00 +02:00
Samuel Berthe 741cdfdb03 feat(ellipsis): trim after truncating 2024-08-19 00:14:36 +02:00
Samuel Berthe 40f630f33a feat(ellipsis): trim before truncating 2024-08-18 16:36:51 +02:00
Samuel Berthe de8e023551 fix: rename Elipse to Ellipsis 2024-08-18 16:33:20 +02:00
mr 1ca9c7b4e5 Update string.go (#496)
* Update string.go

more reasonable

* test
2024-07-17 19:08:46 +02:00
Samuel Berthe e5e4f028e4 feat: adding Elipse (#470) 2024-06-27 15:42:03 +02:00
eiixy 266436bb40 feat: add string conversion functions (#466)
* feat: add string conversion functions

* fix: fix `Capitalize`, update tests

* fix: fix `Capitalize`, update tests

* update README.md

* update tests

* update `Capitalize`

* style: unify coding style
2024-06-27 12:56:08 +02:00
Samuel Berthe 9ec076e4f6 test: adding some tests to Substring (see #288) 2023-03-20 17:59:52 +01:00
Liu Shuang de3bccf5d0 fix: substring support utf8 character (#327) 2023-03-20 15:14:00 +01:00
Corentin Clabaut a3c90f1ac4 Add RandomString (#266)
* Add RandomString

* PR update
2022-11-15 23:12:57 +01:00
Samuel Berthe 31f3bc3a85 test: parallel tests everywhere (#228) 2022-10-02 21:38:26 +02:00
Corentin Clabaut 6126b6497c Implement ChunkString (#188)
Implement ChunkString
2022-07-29 11:38:33 +02:00
Samuel Berthe 94d54a8f47 feat: adding runelength 2022-05-01 00:22:36 +02:00