Archive Compress
Unified compression and decompression utilities for multiple algorithms with automatic format detection, encoding/decoding support, and transparent Reader/Writer wrapping.
Table of Contents
- Overview
- Architecture
- Performance
- Use Cases
- Quick Start
- Best Practices
- API Reference
- Contributing
- Improvements & Security
- Resources
- AI Transparency
- License
Overview
The compress package provides a simple, consistent interface for working with various compression formats including Gzip, Bzip2, LZ4, and XZ. It offers automatic format detection, encoding/decoding support (JSON, text marshaling), and transparent Reader/Writer wrapping for seamless integration with Go's standard io interfaces.
Design Philosophy
- Algorithm Agnostic: Single interface for multiple compression formats (Gzip, Bzip2, LZ4, XZ)
- Auto-Detection: Automatic compression format detection from data headers
- Standard Compliance: Implements encoding.TextMarshaler/Unmarshaler and json.Marshaler/Unmarshaler
- Zero-Copy Wrapping: Efficient Reader/Writer wrapping without data buffering
- Type Safety: Enum-based algorithm selection prevents invalid format strings
Key Features
- ✅ Unified Algorithm Enumeration: 5 supported formats (None, Gzip, Bzip2, LZ4, XZ)
- ✅ Automatic Detection: Magic number analysis for format identification
- ✅ Reader/Writer Factories: Transparent compression/decompression wrappers
- ✅ JSON/Text Marshaling: Configuration serialization support
- ✅ Header Validation: Format verification via magic numbers
- ✅ 97.7% Test Coverage: 165 comprehensive test specs with race detection
Architecture
Component Diagram
┌─────────────────────────────────────────────────────┐
│ Algorithm (enum type) │
├─────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ Format │ │ Detection & Parsing │ │
│ │ │ │ │ │
│ │ • String() │ │ • Parse(string) │ │
│ │ • Extension()│ │ • Detect(io.Reader) │ │
│ │ • IsNone() │ │ • DetectOnly(io.Reader) │ │
│ └──────────────┘ │ • DetectHeader([]byte) │ │
│ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ I/O Wrapping │ │
│ │ │ │
│ │ • Reader(io.Reader) → io.ReadCloser │ │
│ │ • Writer(io.WriteCloser) → io.WriteCloser │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Encoding/Marshaling │ │
│ │ │ │
│ │ • MarshalText() / UnmarshalText() │ │
│ │ • MarshalJSON() / UnmarshalJSON() │ │
│ └──────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────┐
│ Standard Library & External │
│ │
│ compress/gzip compress/bzip2 lz4 xz │
└─────────────────────────────────────────────────────┘
| Component | Memory | Complexity | Thread-Safe |
|---|---|---|---|
| Algorithm | O(1) | Simple | ✅ Stateless |
| Parse/Detect | O(1) | Header scan | ✅ Stateless |
| Reader/Writer | O(1) | Delegation | ✅ Per instance |
Data Flow
User Input → Parse/Detect → Algorithm Selection
│
▼
Reader/Writer Wrapping
│
▼
Stdlib/External Compression Library
│
▼
Compressed/Decompressed Output
Supported Algorithms
The package supports five compression algorithms:
| Algorithm | Extension | Compression | Speed | Use Case |
|---|---|---|---|---|
| None | (none) | 0% | Instant | Pass-through, testing |
| LZ4 | .lz4 | 20-30% | ~500 MB/s | Real-time, logging |
| Gzip | .gz | 30-50% | ~100 MB/s | Web content, general |
| Bzip2 | .bz2 | 40-60% | ~10 MB/s | Archival, cold storage |
| XZ | .xz | 50-70% | ~5 MB/s | Distribution, max compression |
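In practice the tradeoffs above reduce to a simple policy: speed-critical paths favor LZ4, cold storage favors XZ, and Gzip is a safe default. The selector below is a hypothetical helper sketching that policy; it is not part of the package API.

```go
package main

import "fmt"

// pickAlgorithm applies the tradeoffs from the table above:
// LZ4 for speed-critical paths, XZ for maximum compression,
// Gzip as a balanced default. Hypothetical helper.
func pickAlgorithm(realtime, archival bool) string {
	switch {
	case realtime:
		return "lz4" // ~500 MB/s, lowest latency
	case archival:
		return "xz" // best ratio, slowest
	default:
		return "gzip" // broadly supported middle ground
	}
}

func main() {
	fmt.Println(pickAlgorithm(true, false))  // lz4
	fmt.Println(pickAlgorithm(false, true))  // xz
	fmt.Println(pickAlgorithm(false, false)) // gzip
}
```

The returned strings match the names accepted by compress.Parse, so a selector like this composes naturally with the package.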
Magic Numbers (Header Detection):
Gzip: 0x1F 0x8B
Bzip2: 'B' 'Z' 'h' [0-9]
LZ4: 0x04 0x22 0x4D 0x18
XZ: 0xFD 0x37 0x7A 0x58 0x5A 0x00
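These headers can be verified with plain byte comparisons. The sketch below reimplements the checks using only the standard library; detectFormat is a hypothetical helper, whereas the package exposes the same logic as Algorithm.DetectHeader.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// detectFormat mirrors the magic-number table above using
// only stdlib byte comparisons (hypothetical helper, not
// part of the package API).
func detectFormat(hdr []byte) string {
	switch {
	case len(hdr) >= 2 && hdr[0] == 0x1F && hdr[1] == 0x8B:
		return "gzip"
	case len(hdr) >= 4 && hdr[0] == 'B' && hdr[1] == 'Z' && hdr[2] == 'h' &&
		hdr[3] >= '0' && hdr[3] <= '9':
		return "bzip2"
	case len(hdr) >= 4 && bytes.Equal(hdr[:4], []byte{0x04, 0x22, 0x4D, 0x18}):
		return "lz4"
	case len(hdr) >= 6 && bytes.Equal(hdr[:6], []byte{0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00}):
		return "xz"
	default:
		return "none"
	}
}

func main() {
	// Compress something so the output carries a real gzip header.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte("hello"))
	zw.Close()

	fmt.Println(detectFormat(buf.Bytes()))      // gzip
	fmt.Println(detectFormat([]byte("plain"))) // none
}
```

Note that Gzip needs only 2 bytes while XZ needs 6, which is why detection only has to peek a 6-byte buffer.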
Performance
Benchmarks
Based on actual benchmark results (AMD64, Go 1.25, 20 samples per test):
Compression Performance
Small Data (1KB):
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|---|---|---|---|---|---|---|
| LZ4 | <1µs | <1µs | 0.032ms | 4.5 KB | 16 | 93.1% |
| Gzip | <1µs | <1µs | 0.073ms | 795 KB | 24 | 94.2% |
| Bzip2 | 100µs | 200µs | 0.186ms | 650 KB | 34 | 90.4% |
| XZ | 300µs | 500µs | 0.513ms | 8,226 KB | 144 | 89.8% |
Medium Data (10KB):
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|---|---|---|---|---|---|---|
| LZ4 | <1µs | <1µs | 0.019ms | 4.5 KB | 17 | 99.0% |
| Gzip | <1µs | 100µs | 0.089ms | 795 KB | 25 | 99.1% |
| Bzip2 | 200µs | 300µs | 0.339ms | 822 KB | 37 | 98.8% |
| XZ | 300µs | 400µs | 0.378ms | 8,226 KB | 147 | 98.7% |
Large Data (100KB):
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|---|---|---|---|---|---|---|
| LZ4 | <1µs | <1µs | 0.044ms | 1.2 KB | 11 | 99.5% |
| Gzip | 300µs | 400µs | 0.351ms | 796 KB | 26 | 99.7% |
| Bzip2 | 2.7ms | 2.8ms | 2.753ms | 2,544 KB | 38 | 99.9% |
| XZ | 6.9ms | 7.0ms | 6.994ms | 8,228 KB | 327 | 99.8% |
Decompression Performance
Small Data (1KB):
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|---|---|---|---|---|---|
| LZ4 | <1µs | <1µs | 0.018ms | 1.2 KB | 7 |
| Gzip | <1µs | <1µs | 0.024ms | 24.6 KB | 16 |
| Bzip2 | <1µs | 100µs | 0.098ms | 276 KB | 25 |
| XZ | 100µs | 200µs | 0.192ms | 8,225 KB | 89 |
Medium Data (10KB):
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|---|---|---|---|---|---|
| LZ4 | <1µs | <1µs | 0.017ms | 1.2 KB | 8 |
| Gzip | <1µs | <1µs | 0.033ms | 33.4 KB | 17 |
| Bzip2 | 100µs | 100µs | 0.133ms | 276 KB | 26 |
| XZ | 100µs | 100µs | 0.144ms | 8,225 KB | 92 |
Large Data (100KB):
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|---|---|---|---|---|---|
| LZ4 | <1µs | <1µs | 0.028ms | 1.2 KB | 6 |
| Gzip | 100µs | 100µs | 0.112ms | 312 KB | 19 |
| Bzip2 | 1.3ms | 1.3ms | 1.259ms | 276 KB | 28 |
| XZ | 800µs | 1.0ms | 0.970ms | 8,225 KB | 192 |
Detection & Parsing
| Operation | Median | Mean | Max | Throughput |
|---|---|---|---|---|
| Parse (string) | <1µs | <1µs | 100µs | >1M ops/sec |
| Detection (6 bytes) | <1µs | <1µs | 100µs | >1M ops/sec |
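Figures of this kind can be reproduced with a stdlib-only harness. The sketch below times Gzip on a ~1 KB payload with testing.Benchmark; the package's own suite uses Ginkgo's gmeasure instead, so absolute numbers will differ by machine.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"testing"
)

// benchGzip measures gzip compression of payload using the
// stdlib testing harness (a reproduction sketch, not the
// package's actual benchmark code).
func benchGzip(payload []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			var buf bytes.Buffer
			zw := gzip.NewWriter(&buf)
			zw.Write(payload)
			zw.Close() // flush so the full cost is measured
		}
	})
}

func main() {
	payload := bytes.Repeat([]byte("benchmark data "), 68) // ~1 KB
	res := benchGzip(payload)
	fmt.Println(res.String())    // e.g. iterations and ns/op
	fmt.Println(res.MemString()) // B/op and allocs/op
}
```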
Memory Usage
Algorithm-Specific Memory Footprint:
Base overhead: 1 byte (enum type)
Detection: 6-byte peek buffer
Parse operations: Minimal (string length)
Compression buffers:
- LZ4: ~4.5 KB (fastest, lowest memory)
- Bzip2: ~650 KB (1KB) to ~2.5 MB (100KB)
- Gzip: ~795 KB (consistent across sizes)
- XZ: ~8.2 MB (highest memory usage)
Decompression buffers:
- LZ4: ~1.2 KB (minimal footprint)
- Gzip: ~25-300 KB (size-dependent)
- Bzip2: ~276 KB (consistent)
- XZ: ~8.2 MB (consistent)
Scalability
- Stateless Operations: All format functions (String, Extension, IsNone) are O(1) and thread-safe
- Detection: O(1) header scan, requires only 6 bytes
- Concurrent Use: Multiple goroutines can use separate Algorithm instances safely
- Minimal Allocations: Parse and detection operations carry little to no allocation overhead
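The per-goroutine ownership rule can be sketched with stdlib gzip standing in for alg.Writer: the Algorithm enum itself can be shared freely, but each goroutine must create and close its own writer.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"sync"
)

// compressAll compresses each input in its own goroutine.
// The stateless part (the algorithm choice) is shared; the
// stateful part (the writer) is owned per goroutine. Stdlib
// gzip is used here as a stand-in for alg.Writer.
func compressAll(inputs [][]byte) [][]byte {
	out := make([][]byte, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in []byte) {
			defer wg.Done()
			var buf bytes.Buffer
			zw := gzip.NewWriter(&buf) // per-goroutine writer
			zw.Write(in)
			zw.Close()
			out[i] = buf.Bytes() // distinct index, no race
		}(i, in)
	}
	wg.Wait()
	return out
}

func main() {
	inputs := [][]byte{
		bytes.Repeat([]byte("a"), 1024),
		bytes.Repeat([]byte("b"), 2048),
	}
	for i, c := range compressAll(inputs) {
		fmt.Printf("input %d: %d -> %d bytes\n", i, len(inputs[i]), len(c))
	}
}
```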
Use Cases
1. File Archiving with Auto-Detection
Extract files regardless of compression format:
func ExtractFile(src, dst string) error {
in, err := os.Open(src)
if err != nil {
return err
}
defer in.Close()
alg, reader, err := compress.Detect(in)
if err != nil {
return err
}
defer reader.Close()
log.Printf("Detected: %s", alg.String())
out, err := os.Create(dst)
if err != nil {
return err
}
defer out.Close()
_, err = io.Copy(out, reader)
return err
}
2. HTTP Response Compression
Compress HTTP responses based on client capabilities:
func CompressResponse(w http.ResponseWriter, data []byte, format string) error {
alg := compress.Parse(format)
if alg == compress.None {
w.Write(data)
return nil
}
w.Header().Set("Content-Encoding", alg.String())
// http.ResponseWriter has no Close method; wrap it with a
// no-op Closer so it satisfies the io.WriteCloser parameter
writer, err := alg.Writer(struct {
io.Writer
io.Closer
}{w, io.NopCloser(nil)})
if err != nil {
return err
}
defer writer.Close()
_, err = writer.Write(data)
return err
}
3. Log File Rotation with Compression
Compress rotated log files:
func RotateLog(path string, compression compress.Algorithm) error {
src, err := os.Open(path)
if err != nil {
return err
}
defer src.Close()
dstPath := path + compression.Extension()
dst, err := os.Create(dstPath)
if err != nil {
return err
}
defer dst.Close()
writer, err := compression.Writer(dst)
if err != nil {
return err
}
defer writer.Close()
_, err = io.Copy(writer, src)
return err
}
4. Configuration with Compression Settings
Store compression preferences in config files:
type AppConfig struct {
DataCompression compress.Algorithm `json:"data_compression"`
LogCompression compress.Algorithm `json:"log_compression"`
}
// Save config
cfg := AppConfig{
DataCompression: compress.LZ4,
LogCompression: compress.Gzip,
}
data, _ := json.Marshal(cfg)
os.WriteFile("config.json", data, 0644)
// Load config
data, _ = os.ReadFile("config.json")
var loaded AppConfig
json.Unmarshal(data, &loaded)
Quick Start
Installation
go get github.com/nabbar/golib/archive/compress
Basic Compression
package main
import (
"log"
"os"
"github.com/nabbar/golib/archive/compress"
)
func main() {
file, err := os.Create("output.txt.gz")
if err != nil {
log.Fatal(err)
}
defer file.Close()
// Create gzip writer
writer, err := compress.Gzip.Writer(file)
if err != nil {
log.Fatal(err)
}
defer writer.Close()
// Write compressed data
writer.Write([]byte("This data will be compressed"))
}
Basic Decompression
package main
import (
"fmt"
"io"
"log"
"os"
"github.com/nabbar/golib/archive/compress"
)
func main() {
file, err := os.Open("input.txt.gz")
if err != nil {
log.Fatal(err)
}
defer file.Close()
// Create gzip reader
reader, err := compress.Gzip.Reader(file)
if err != nil {
log.Fatal(err)
}
defer reader.Close()
// Read decompressed data
data, _ := io.ReadAll(reader)
fmt.Println(string(data))
}
Automatic Detection
package main
import (
"fmt"
"io"
"log"
"os"
"github.com/nabbar/golib/archive/compress"
)
func main() {
file, err := os.Open("unknown.dat")
if err != nil {
log.Fatal(err)
}
defer file.Close()
// Detect and decompress automatically
alg, reader, err := compress.Detect(file)
if err != nil {
log.Fatal(err)
}
defer reader.Close()
fmt.Printf("Detected: %s\n", alg.String())
data, _ := io.ReadAll(reader)
fmt.Println(string(data))
}
Format Parsing
package main
import (
"fmt"
"github.com/nabbar/golib/archive/compress"
)
func main() {
// Parse from string
alg := compress.Parse("gzip")
fmt.Println(alg.String()) // "gzip"
fmt.Println(alg.Extension()) // ".gz"
// List all algorithms
algorithms := compress.List()
for _, alg := range algorithms {
fmt.Printf("%s (%s)\n", alg.String(), alg.Extension())
}
}
Best Practices
Testing
The package includes a comprehensive test suite with 97.7% code coverage and 165 test specifications using BDD methodology (Ginkgo v2 + Gomega).
Key test coverage:
- ✅ All algorithm operations (String, Extension, IsNone, DetectHeader)
- ✅ Format detection and parsing
- ✅ JSON and text marshaling/unmarshaling
- ✅ Reader/Writer wrapping for all algorithms
- ✅ Round-trip compression/decompression
- ✅ Edge cases and error handling
- ✅ Concurrent access with race detector (zero races detected)
- ✅ Performance benchmarks
For detailed test documentation, see TESTING.md.
✅ DO
Resource Management:
// ✅ GOOD: Always close resources
writer, err := alg.Writer(file)
if err != nil {
file.Close() // Close file if writer creation failed
return err
}
defer writer.Close() // Writer.Close() also flushes buffers
Error Handling:
// ✅ GOOD: Check errors and validate
alg, reader, err := compress.Detect(input)
if err != nil {
// Fallback to uncompressed
reader = io.NopCloser(input)
alg = compress.None
}
defer reader.Close()
Format Validation:
// ✅ GOOD: Validate parsed format
alg := compress.Parse(userInput)
if alg == compress.None && userInput != "none" {
return fmt.Errorf("unsupported compression: %s", userInput)
}
Algorithm Selection:
// ✅ GOOD: Check if compression is needed
if !alg.IsNone() {
writer, _ := alg.Writer(file)
defer writer.Close()
// ... write compressed data
}
❌ DON'T
Don't forget to close:
// ❌ BAD: Writer not closed (data loss)
writer, _ := alg.Writer(file)
writer.Write(data) // Data may be buffered
// ✅ GOOD: Always close
writer, _ := alg.Writer(file)
defer writer.Close()
writer.Write(data)
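The data loss is easy to demonstrate with stdlib gzip standing in for alg.Writer: without Close, the payload stays in the writer's internal buffer and never reaches the destination.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compressWith writes data through a gzip writer and closes
// it only if closeIt is true, to show why Close matters.
func compressWith(data []byte, closeIt bool) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write(data) // data may sit in the writer's buffer
	if closeIt {
		zw.Close() // flushes buffers and writes the trailer
	}
	return buf.Bytes()
}

func main() {
	data := []byte("this payload is lost without Close")

	truncated := compressWith(data, false)
	complete := compressWith(data, true)
	fmt.Printf("without Close: %d bytes, with Close: %d bytes\n",
		len(truncated), len(complete))

	// Only the closed stream decompresses back to the input.
	zr, _ := gzip.NewReader(bytes.NewReader(complete))
	out, _ := io.ReadAll(zr)
	fmt.Println(string(out)) // this payload is lost without Close
}
```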
Don't assume format:
// ❌ BAD: Assuming format without detection
reader, _ := compress.Gzip.Reader(file)
// ✅ GOOD: Use automatic detection
alg, reader, _ := compress.Detect(file)
Don't use DetectHeader with truncated data:
// ❌ BAD: Truncated header (returns false, not error)
data := []byte{0x1F}
if compress.Gzip.DetectHeader(data) { // Always false
// Never executed
}
// ✅ GOOD: Ensure sufficient data
if len(data) >= 6 && compress.Gzip.DetectHeader(data) {
// Now safe
}
Don't parse untrusted input without validation:
// ❌ BAD: No validation
alg := compress.Parse(untrustedInput)
writer, _ := alg.Writer(file)
// ✅ GOOD: Validate result
alg := compress.Parse(untrustedInput)
if alg == compress.None && untrustedInput != "none" {
return errors.New("invalid compression format")
}
API Reference
Algorithm Type
type Algorithm uint8
const (
None Algorithm = iota // No compression
Bzip2 // Bzip2 compression
Gzip // Gzip compression
LZ4 // LZ4 compression
XZ // XZ compression
)
Methods:
- String() string - Get lowercase string representation
- Extension() string - Get file extension (e.g., ".gz")
- IsNone() bool - Check if algorithm is None
- DetectHeader([]byte) bool - Validate magic number
- Reader(io.Reader) (io.ReadCloser, error) - Create decompression reader
- Writer(io.WriteCloser) (io.WriteCloser, error) - Create compression writer
- MarshalText() ([]byte, error) - Text marshaling
- UnmarshalText([]byte) error - Text unmarshaling
- MarshalJSON() ([]byte, error) - JSON marshaling
- UnmarshalJSON([]byte) error - JSON unmarshaling
Core Functions
Parse:
func Parse(s string) Algorithm
Parses a string into an Algorithm (case-insensitive). Returns None if the string is unknown.
Detect:
func Detect(r io.Reader) (Algorithm, io.ReadCloser, error)
Auto-detect format and return decompression reader.
DetectOnly:
func DetectOnly(r io.Reader) (Algorithm, io.ReadCloser, error)
Detect the format and return the stream (with the peeked header bytes restored) without wrapping it in a decompression reader.
List:
func List() []Algorithm
Return all supported algorithms.
ListString:
func ListString() []string
Return string names of all algorithms.
Encoding Support
The package implements standard encoding interfaces:
- encoding.TextMarshaler / encoding.TextUnmarshaler
- json.Marshaler / json.Unmarshaler
JSON Marshaling:
cfg := Config{Compression: compress.Gzip}
data, _ := json.Marshal(cfg) // {"compression":"gzip"}
// None is marshaled as null
cfg = Config{Compression: compress.None}
data, _ = json.Marshal(cfg) // {"compression":null}
Contributing
Contributions are welcome! Please follow these guidelines:
1. Code Quality
   - Follow Go best practices and idioms
   - Maintain or improve code coverage (target: >80%)
   - Pass all tests including race detector
   - Use gofmt and golint
2. AI Usage Policy
   - ❌ AI must NEVER be used to generate package code or core functionality
   - ✅ AI assistance is limited to:
     - Testing (writing and improving tests)
     - Debugging (troubleshooting and bug resolution)
     - Documentation (comments, README, TESTING.md)
   - All AI-assisted work must be reviewed and validated by humans
3. Testing
   - Add tests for new features
   - Use Ginkgo v2 / Gomega for test framework
   - Use gmeasure for performance benchmarks
   - Ensure zero race conditions with go test -race
4. Documentation
   - Update GoDoc comments for public APIs
   - Add examples for new features
   - Update README.md and TESTING.md if needed
5. Pull Request Process
   - Fork the repository
   - Create a feature branch
   - Write clear commit messages
   - Ensure all tests pass
   - Update documentation
   - Submit PR with description of changes
Improvements & Security
Current Status
The package is production-ready with no urgent improvements or security vulnerabilities identified.
Code Quality Metrics
- ✅ 97.7% test coverage (target: >80%)
- ✅ Zero race conditions detected with the -race flag
- ✅ Thread-safe stateless operations
- ✅ Memory-safe with proper resource cleanup
- ✅ Standard interfaces for maximum compatibility
Future Enhancements (Non-urgent)
The following enhancements could be considered for future versions:
- Compression Levels: Support for configurable compression levels (currently uses defaults)
- Custom Parameters: Support for algorithm-specific compression parameters
- Streaming API: Additional helpers for streaming large files
- Multi-Format Archives: Integration with tar/zip for complete archive solutions
- Progress Callbacks: Optional progress reporting for long operations
These are optional improvements and not required for production use. The current implementation is stable, performant, and feature-complete for its intended use cases.
Resources
Package Documentation
- GoDoc - Complete API reference with function signatures, method descriptions, and runnable examples. Essential for understanding the public interface and usage patterns.
- doc.go - In-depth package documentation including design philosophy, supported algorithms, magic numbers, performance characteristics, use cases, and implementation details. Provides detailed explanations of internal mechanisms and best practices for production use.
- TESTING.md - Comprehensive test suite documentation covering test architecture, BDD methodology with Ginkgo v2, 97.7% coverage analysis, performance benchmarks, and guidelines for writing new tests. Includes troubleshooting and test execution examples.
Related golib Packages
- github.com/nabbar/golib/archive/tar - Tar archive format support that works with compress for compressed tar files (.tar.gz, .tar.bz2, etc.).
- github.com/nabbar/golib/archive/zip - Zip archive format support with optional compression integration.
External Dependencies
- compress/gzip - Standard library Gzip support. Used for Gzip compression and decompression.
- compress/bzip2 - Standard library Bzip2 decompression support (read-only).
- github.com/dsnet/compress - Third-party Bzip2 compression support (write operations).
- github.com/pierrec/lz4 - Pure Go LZ4 compression and decompression implementation.
- github.com/ulikunitz/xz - Pure Go XZ compression and decompression implementation.
Standard Library References
- io - Standard I/O interfaces implemented by compress. The package fully implements io.ReadCloser and io.WriteCloser for seamless integration with Go's I/O ecosystem.
- encoding - Standard encoding interfaces. The package implements TextMarshaler/Unmarshaler for text-based serialization.
- encoding/json - JSON marshaling support. The package implements json.Marshaler/Unmarshaler for JSON serialization.
External References
- Effective Go - Official Go programming guide covering best practices for interfaces, error handling, and I/O patterns. The compress package follows these conventions for idiomatic Go code.
- RFC 1952 (Gzip) - Gzip file format specification. Understanding this helps with debugging Gzip-specific issues.
AI Transparency
In compliance with EU AI Act Article 50.4: AI assistance was used for testing, documentation, and bug resolution under human supervision. All core functionality is human-designed and validated.
License
MIT License - See LICENSE file for details.
Copyright (c) 2025 Nicolas JUHEL
Maintained by: Nicolas JUHEL
Package: github.com/nabbar/golib/archive/compress
Version: See releases for versioning