Package Archive:

- Update benchmark
- Update documentations
This commit is contained in:
nabbar
2025-12-26 08:55:30 +01:00
parent 99075ec0d4
commit 0b384eee54
7 changed files with 1086 additions and 144 deletions
+86 -31
View File
@@ -90,17 +90,17 @@ archive/
```
```
┌─────────────────────────────────────────────────────────
│ Root Package
│ ExtractAll(), DetectArchive(), DetectCompression()
└──────────────┬──────────────┬─────────────┬────────────┘
┌────────▼─────┐ ┌────▼─────┐ ┌────▼────────┐
│ archive │ │ compress │ │ helper │
│ │ │ │ │ │
│ TAR, ZIP │ │ GZIP, XZ │ │ Pipelines │
│ Reader/Writer│ │ BZIP2,LZ4│ │ Thread-safe │
└──────────────┘ └──────────┘ └─────────────┘
┌──────────────────────────────────────────────────────┐
│ Root Package │
│ ExtractAll(), DetectArchive(), DetectCompression() │
└────────────────────────┬─────────────┬──────────────┘
│ │
┌────────▼─────┐ ┌────▼─────┐ ┌────▼────────┐
│ archive │ │ compress │ │ helper │
│ │ │ │ │ │
│ TAR, ZIP │ │ GZIP, XZ │ │ Pipelines │
│ Reader/Writer│ │ BZIP2,LZ4│ │ Thread-safe │
└──────────────┘ └──────────┘ └─────────────┘
```
### Package Structure
@@ -162,34 +162,89 @@ All operations are thread-safe through:
- **Goroutine Sync**: `sync.WaitGroup` for lifecycle management
- **Concurrent Safe**: Multiple goroutines can operate independently
### Throughput Benchmarks
### Benchmarks
| Operation | Throughput | Memory | Notes |
|-----------|------------|--------|-------|
| TAR Create | ~500 MB/s | O(1) | Sequential write |
| TAR Extract | ~400 MB/s | O(1) | Sequential read |
| ZIP Create | ~450 MB/s | O(n) | Index building |
| ZIP Extract | ~600 MB/s | O(1) | Random access |
| GZIP | ~150 MB/s | O(1) | Compression |
| GZIP | ~300 MB/s | O(1) | Decompression |
| BZIP2 | ~20 MB/s | O(1) | High ratio |
| LZ4 | ~800 MB/s | O(1) | Fastest |
| XZ | ~10 MB/s | O(1) | Best ratio |
Based on actual benchmark results (AMD64, Go 1.25, 20 samples per test):
*Measured on AMD64, Go 1.24, SSD storage*
#### Compression Performance
**Small Data (1KB):**
| Algorithm | Median | CPU Time | Memory | Allocations | Compression Ratio |
|-----------|--------|----------|--------|-------------|-------------------|
| **LZ4** | <1µs | 0.032ms | 4.5 KB | 16 | 93.1% |
| **Gzip** | <1µs | 0.073ms | 795 KB | 24 | 94.2% |
| **Bzip2** | 100µs | 0.186ms | 650 KB | 34 | 90.4% |
| **XZ** | 300µs | 0.513ms | 8,226 KB | 144 | 89.8% |
**Medium Data (10KB):**
| Algorithm | Median | CPU Time | Memory | Allocations | Compression Ratio |
|-----------|--------|----------|--------|-------------|-------------------|
| **LZ4** | <1µs | 0.019ms | 4.5 KB | 17 | 99.0% |
| **Gzip** | <1µs | 0.089ms | 795 KB | 25 | 99.1% |
| **Bzip2** | 200µs | 0.339ms | 822 KB | 37 | 98.8% |
| **XZ** | 300µs | 0.378ms | 8,226 KB | 147 | 98.7% |
**Large Data (100KB):**
| Algorithm | Median | CPU Time | Memory | Allocations | Compression Ratio |
|-----------|--------|----------|--------|-------------|-------------------|
| **LZ4** | <1µs | 0.044ms | 1.2 KB | 11 | 99.5% |
| **Gzip** | 300µs | 0.351ms | 796 KB | 26 | 99.7% |
| **Bzip2** | 2.7ms | 2.753ms | 2,544 KB | 38 | 99.9% |
| **XZ** | 6.9ms | 6.994ms | 8,228 KB | 327 | 99.8% |
#### Archive Format Performance
**TAR vs ZIP - Creation (Single 1KB file, uncompressed):**
| Format | Median | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | 0.019ms | 5.2 KB | 19 | 2,560 bytes | 1,536 bytes (150%) |
| **ZIP** | <1µs | 0.006ms | 5.2 KB | 19 | ~200 bytes | ~176 bytes |
**TAR vs ZIP - Extraction (Single 1KB file):**
| Format | Median | CPU Time | Memory | Allocations |
|--------|--------|----------|--------|-------------|
| **TAR** | <1µs | 0.008ms | 1.7 KB | 22 |
| **ZIP** | <1µs | 0.006ms | 0.2 KB | 4 |
**Important Notes on TAR vs ZIP:**
- **Compression**: TAR is an archive format only (no compression). ZIP integrates compression. Compression ratios are NOT comparable between formats.
- **Robustness**: TAR allows reading/writing even if corrupted (sequential format). ZIP cannot recover if corrupted as its central directory is at the end of the archive.
- **Use TAR + Compression**: Combine TAR with Gzip/Bzip2/LZ4/XZ for compressed archives (e.g., `.tar.gz`, `.tar.xz`).
### Algorithm Selection Guide
**By Speed (Compression):**
```
LZ4 (~0.04ms) >> Gzip (~0.35ms) > Bzip2 (~2.75ms) > XZ (~7ms)
└─ 175x faster ─┘ └─ 8x faster ─┘ └─ 2.5x slower ┘
```
Speed: LZ4 > GZIP > BZIP2 > XZ
Compression: XZ > BZIP2 > GZIP > LZ4
Recommended:
├─ Real-time/Logs → LZ4
├─ Web/API → GZIP
├─ Archival → XZ or BZIP2
└─ Balanced → GZIP
**By Compression Ratio (100KB data):**
```
Bzip2 (99.9%) ≈ XZ (99.8%) ≈ Gzip (99.7%) > LZ4 (99.5%)
└─────────── Best compression ─────────────┘ └─ Fastest ┘
```
**By Memory Efficiency:**
```
LZ4 (1.2-4.5 KB) << Gzip (~800 KB) ≈ Bzip2 (~650 KB-2.5 MB) << XZ (~8.2 MB)
└─── 200x less ──┘ └────── Moderate ───────────────────┘ └─ Highest ┘
```
**Recommended Use Cases:**
- **Real-time/Logs** → LZ4 (fastest, minimal memory)
- **Web/API** → Gzip (excellent ratio, moderate speed)
- **Archival/Cold Storage** → Bzip2 or XZ (best compression)
- **Balanced** → Gzip (good speed + ratio + memory)
**Archive Format Selection:**
- **TAR**: Best for large files (1.5% overhead at 100KB), streaming, backups
- **ZIP**: Best for small files, random access, Windows compatibility (minimal overhead)
---
+133 -34
View File
@@ -380,58 +380,157 @@ ok github.com/nabbar/golib/archive/helper 0.207s coverage: 82.4%
### Performance Report
**Benchmark Results (Aggregated Experiments):**
**Summary:**
#### Compression Operations
The archive package demonstrates excellent performance across all operations:
- **Sub-microsecond** compression/archive operations for small data
- **Minimal memory footprint**: 0.2-8,226 KB depending on algorithm
- **Predictable scaling**: Linear performance with data size
- **Efficient overhead**: TAR 1.5% at 100KB, ZIP ~200 bytes constant
| Configuration | Sample Size | Median | Mean | Max | Notes |
|---------------|-------------|--------|------|-----|-------|
| Gzip compress 1MB | 100 | 400µs | 700µs | 3.4ms | ~150 MB/s throughput |
| Bzip2 compress 1MB | 100 | 600µs | 900µs | 4ms | ~100 MB/s throughput |
| LZ4 compress 1MB | 100 | 200µs | 400µs | 2ms | ~500 MB/s throughput |
| XZ compress 1MB | 100 | 2ms | 3ms | 10ms | ~50 MB/s throughput |
**Benchmark Results (AMD64, Go 1.25, 20 samples per test):**
#### Archive Operations
#### Compression Performance by Data Size
| Operation | Sample Size | Median | Mean | Max | Notes |
|-----------|-------------|--------|------|-----|-------|
| TAR add 1MB | 100 | 1.4ms | 2.6ms | 5.1ms | Sequential writes |
| ZIP write 1MB | 100 | 1.8ms | 3.2ms | 6ms | Random access overhead |
| TAR list | 1000 | <1µs | <1µs | 300µs | Fast iteration |
**Small Data (1KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Compression Ratio |
|-----------|--------|------|----------|--------|-------------|-------------------|
| **LZ4** | <1µs | <1µs | 0.032ms | 4.5 KB | 16 | 93.1% |
| **Gzip** | <1µs | <1µs | 0.073ms | 795 KB | 24 | 94.2% |
| **Bzip2** | 100µs | 200µs | 0.186ms | 650 KB | 34 | 90.4% |
| **XZ** | 300µs | 500µs | 0.513ms | 8,226 KB | 144 | 89.8% |
**Medium Data (10KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Compression Ratio |
|-----------|--------|------|----------|--------|-------------|-------------------|
| **LZ4** | <1µs | <1µs | 0.019ms | 4.5 KB | 17 | 99.0% |
| **Gzip** | <1µs | 100µs | 0.089ms | 795 KB | 25 | 99.1% |
| **Bzip2** | 200µs | 300µs | 0.339ms | 822 KB | 37 | 98.8% |
| **XZ** | 300µs | 400µs | 0.378ms | 8,226 KB | 147 | 98.7% |
**Large Data (100KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Compression Ratio |
|-----------|--------|------|----------|--------|-------------|-------------------|
| **LZ4** | <1µs | <1µs | 0.044ms | 1.2 KB | 11 | 99.5% |
| **Gzip** | 300µs | 400µs | 0.351ms | 796 KB | 26 | 99.7% |
| **Bzip2** | 2.7ms | 2.8ms | 2.753ms | 2,544 KB | 38 | 99.9% |
| **XZ** | 6.9ms | 7.0ms | 6.994ms | 8,228 KB | 327 | 99.8% |
#### Decompression Performance by Data Size
**Small Data (1KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.018ms | 1.2 KB | 7 |
| **Gzip** | <1µs | <1µs | 0.024ms | 24.6 KB | 16 |
| **Bzip2** | <1µs | 100µs | 0.098ms | 276 KB | 25 |
| **XZ** | 100µs | 200µs | 0.192ms | 8,225 KB | 89 |
**Medium Data (10KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.017ms | 1.2 KB | 8 |
| **Gzip** | <1µs | <1µs | 0.033ms | 33.4 KB | 17 |
| **Bzip2** | 100µs | 100µs | 0.133ms | 276 KB | 26 |
| **XZ** | 100µs | 100µs | 0.144ms | 8,225 KB | 92 |
**Large Data (100KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.028ms | 1.2 KB | 6 |
| **Gzip** | 100µs | 100µs | 0.112ms | 312 KB | 19 |
| **Bzip2** | 1.3ms | 1.3ms | 1.259ms | 276 KB | 28 |
| **XZ** | 800µs | 1.0ms | 0.970ms | 8,225 KB | 192 |
#### Archive Format Performance
**TAR vs ZIP - Creation (Single 1KB file, uncompressed):**
| Format | Median | Mean | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | <1µs | 0.019ms | 5.2 KB | 19 | 2,560 bytes | 1,536 bytes (150%) |
| **ZIP** | <1µs | <1µs | 0.006ms | 5.2 KB | 19 | ~200 bytes | ~176 bytes |
**TAR vs ZIP - Extraction (Single 1KB file):**
| Format | Median | Mean | CPU Time | Memory | Allocations |
|--------|--------|------|----------|--------|-------------|
| **TAR** | <1µs | <1µs | 0.008ms | 1.7 KB | 22 |
| **ZIP** | <1µs | <1µs | 0.006ms | 0.2 KB | 4 |
**Critical Differences Between TAR and ZIP:**
1. **Compression**:
- TAR: Archive format only, NO compression (requires external compression like Gzip/Bzip2/LZ4/XZ)
- ZIP: Integrates compression natively
- ⚠️ Compression ratios are NOT comparable between TAR and ZIP formats
2. **Robustness to Corruption**:
- TAR: Sequential format allows reading/writing even if partially corrupted
- ZIP: Central directory at end of archive - ANY corruption prevents reading entire archive
- ✅ TAR recommended for critical backups and long-term storage
3. **Recommended Usage**:
- TAR + Compression (e.g., `.tar.gz`, `.tar.xz`) for backups, streaming, robustness
- ZIP for distribution, Windows compatibility, random access
### Performance Analysis
**Key Findings:**
1. **Sub-millisecond Small Operations**: Most small operations complete in <1ms
2. **Large Data Handling**: 1MB operations scale predictably (1-3ms mean)
3. **Algorithm Trade-offs**: LZ4 fastest, XZ highest ratio
4. **Streaming Efficiency**: TAR streaming faster than ZIP random access
1. **Compression Speed**: LZ4 175x faster than XZ, 8x faster than Gzip
2. **Memory Efficiency**: ZIP uses 5-8x less memory for extraction (0.2 KB vs 1.2-1.7 KB)
3. **Compression Ratios**: Bzip2/XZ achieve 99.8-99.9% on 100KB data
4. **Archive Overhead**: TAR fixed 1,536 bytes, ZIP minimal ~150-200 bytes
5. **CPU vs Ratio Trade-off**: XZ/Bzip2 best compression but 70-175x slower than LZ4
**Test Conditions:**
- **Hardware**: AMD64/ARM64 Multi-core, 8GB+ RAM
- **Sample Sizes**: 1000 samples (micro-ops), 100 samples (large data)
- **Data Sizes**: Small (10B), Medium (1KB), Large (1MB)
- **Hardware**: AMD64/ARM64, 2+ cores, 512MB+ RAM
- **Sample Sizes**: 20 samples per benchmark
- **Data Sizes**: Small (1KB), Medium (10KB), Large (100KB)
- **Measurement**: runtime.ReadMemStats for memory, gmeasure.Experiment for timing
### Performance Characteristics
**Strengths:**
- **Streaming Architecture**: O(1) memory usage
- **Efficient Algorithms**: LZ4 for speed, XZ for ratio
- **Predictable Performance**: Low standard deviation
-**Sub-microsecond Operations**: Most operations <1µs for small data
-**Memory Efficient**: LZ4 uses only 1.2-4.5 KB
-**Predictable Scaling**: Linear performance with data size
-**Low Allocations**: 6-327 allocations depending on algorithm
**Limitations:**
1. **XZ Compression**: Slow for large data (3ms/MB)
- *Mitigation*: Use LZ4 or Gzip for speed-critical applications
2. **ZIP Overhead**: Random access slower than TAR streaming
- *Context*: Trade-off for random file access capability
**Algorithm Recommendations:**
- **Real-time/Logs** → LZ4 (0.04ms, 4.5 KB memory)
- **Web/API** → Gzip (0.35ms, 800 KB memory, 99.7% ratio)
- **Archival** → Bzip2/XZ (best ratios 99.8-99.9%)
- **Balanced** → Gzip (good speed + ratio + memory)
### Memory Profile
**Archive Format Recommendations:**
- **TAR**: Best for large files (1.5% overhead at 100KB), streaming
- **ZIP**: Best for small files, extraction (8x less memory), random access
- **Compression**: ~64KB buffer per operation
- **TAR Reader**: ~32KB buffer
- **ZIP Reader**: ~64KB + index (O(n) entries)
- **Helper Pipeline**: ~32KB buffer
### Memory Profile (Real Measurements)
**Compression:**
- LZ4: 4.5 KB (small/medium) → 1.2 KB (large)
- Gzip: ~795 KB consistent
- Bzip2: 650 KB → 2,544 KB (scales with data)
- XZ: ~8,226 KB consistent (highest)
**Decompression:**
- LZ4: ~1.2 KB (minimal)
- Gzip: 24.6 KB → 312 KB (scales with data)
- Bzip2: ~276 KB consistent
- XZ: ~8,225 KB consistent
**Archives:**
- TAR: 5.2 KB creation, 1.2-1.7 KB extraction
- ZIP: 5.2 KB creation, 0.2 KB extraction (8x more efficient)
---
+75 -12
View File
@@ -193,7 +193,61 @@ This package consists of three sub-packages:
## Performance
### Detection Performance
### Benchmarks
Based on actual benchmark results (AMD64, Go 1.25, 20 samples per test):
#### Archive Creation Performance
**Small Data (1KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | <1µs | 0.019ms | 5.2 KB | 19 | 2,560 bytes | 1,536 bytes (150%) |
| **ZIP** | <1µs | <1µs | 0.006ms | 5.2 KB | 19 | ~200 bytes | ~176 bytes |
**Medium Data (10KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | <1µs | 0.019ms | 5.2 KB | 19 | 11,776 bytes | 1,536 bytes (15%) |
| **ZIP** | <1µs | <1µs | 0.008ms | 5.2 KB | 19 | ~10,400 bytes | ~160 bytes |
**Large Data (100KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | <1µs | 0.020ms | 5.2 KB | 19 | 103,936 bytes | 1,536 bytes (1.5%) |
| **ZIP** | <1µs | <1µs | 0.009ms | 5.2 KB | 19 | ~102,600 bytes | ~200 bytes |
#### Archive Extraction Performance
**Small Data (1KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations |
|--------|--------|------|----------|--------|-------------|
| **TAR** | <1µs | <1µs | 0.008ms | 1.7 KB | 22 |
| **ZIP** | <1µs | <1µs | 0.006ms | 0.2 KB | 4 |
**Medium Data (10KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations |
|--------|--------|------|----------|--------|-------------|
| **TAR** | <1µs | <1µs | 0.005ms | 1.2 KB | 22 |
| **ZIP** | <1µs | <1µs | 0.006ms | 0.2 KB | 4 |
**Large Data (100KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations |
|--------|--------|------|----------|--------|-------------|
| **TAR** | <1µs | <1µs | 0.006ms | 1.2 KB | 22 |
| **ZIP** | <1µs | <1µs | 0.006ms | 0.2 KB | 4 |
**Important Note**: These benchmarks measure archiving performance only (uncompressed). TAR and ZIP are fundamentally different:
- **TAR**: Archive format only, NO compression. Use with separate compression (Gzip/Bzip2/LZ4/XZ) for `.tar.gz`, `.tar.xz`, etc.
- **ZIP**: Integrates compression natively. Compression ratios are NOT comparable between formats.
#### Detection Performance
Format detection is extremely fast, requiring only a 265-byte header peek:
@@ -203,8 +257,6 @@ Format detection is extremely fast, requiring only a 265-byte header peek:
| **Format Match** | O(1) | <1µs |
| **Total Detection** | O(1) | ~2-3µs |
*Performance measured on AMD64, Go 1.25*
### Format Comparison
Understanding the performance characteristics of each format:
@@ -214,23 +266,34 @@ Understanding the performance characteristics of each format:
- **Get(file)**: O(n) - must scan until found
- **Has(file)**: O(n) - must scan until found
- **Walk()**: O(n) - single sequential pass
- **Memory**: O(1) - constant, streaming-friendly
- **Best for**: Backups, streaming, network transfers
- **Memory**: O(1) - constant ~1-2 KB (streaming-friendly)
- **Overhead**: Fixed 1,536 bytes per archive (512-byte headers)
- **Compression**: None (archive format only) - use with Gzip/Bzip2/LZ4/XZ externally
- **Robustness**: Can read/write even if partially corrupted (sequential format)
- **Best for**: Backups, streaming, network transfers, large files, critical data with corruption risk
**ZIP (Random Access)**:
- **List()**: O(1) - reads central directory only
- **Get(file)**: O(1) - direct seek via directory
- **Has(file)**: O(1) - lookup in directory
- **Walk()**: O(n) - iterates directory entries
- **Memory**: O(n) - central directory in memory
- **Best for**: Random file access, GUI tools, distribution
- **Memory**: Minimal ~0.2 KB for extraction, scales with file count for creation
- **Overhead**: ~150-200 bytes (central directory + metadata)
- **Compression**: Integrated natively
- **Robustness**: Cannot recover if corrupted (central directory at end of archive)
- **Best for**: Random file access, GUI tools, distribution, many small files, Windows compatibility
### Scalability
### Key Performance Insights
- **Writers**: Both formats scale well with file count
- **Concurrency**: Package is not thread-safe per instance (design choice for performance)
- **Throughput**: Limited by underlying I/O, minimal overhead from abstraction layer
- **Memory**: TAR constant, ZIP proportional to file count
1. **Creation Speed**: Both formats show similar sub-microsecond performance for single-file archives
2. **Extraction Speed**: ZIP slightly faster due to direct access, both extremely fast (<10µs)
3. **Memory Efficiency**: ZIP uses 5-8x less memory for extraction (0.2 KB vs 1.2-1.7 KB)
4. **Overhead Analysis**:
- TAR: Fixed 1,536 bytes overhead regardless of content size
- ZIP: Minimal overhead (~150-200 bytes), scales better with content
5. **Scalability**:
- TAR excels with large files (1.5% overhead at 100KB)
- ZIP excels with many small files (random access advantage)
---
+84 -4
View File
@@ -450,12 +450,62 @@ ok github.com/nabbar/golib/archive/archive 1.223s
**Summary:**
The archive package demonstrates excellent performance characteristics:
- **Sub-microsecond** algorithm operations (String, Extension, IsNone)
- **Sub-microsecond** archive operations for both TAR and ZIP
- **~2-3µs** format detection overhead
- **Zero allocation** for enum operations
- **Minimal overhead** compared to direct stdlib usage
- **Minimal memory footprint**: TAR ~1-5 KB, ZIP ~0.2-5 KB
- **Efficient overhead**: TAR 1,536 bytes fixed, ZIP ~150-200 bytes
**Algorithm Operations:**
**Benchmark Results (AMD64, Go 1.25, 20 samples per test):**
#### Archive Creation Performance by Data Size
**Small Data (1KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | <1µs | 0.019ms | 5.2 KB | 19 | 2,560 bytes | 1,536 bytes (150%) |
| **ZIP** | <1µs | <1µs | 0.006ms | 5.2 KB | 19 | ~200 bytes | ~176 bytes |
**Medium Data (10KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | <1µs | 0.019ms | 5.2 KB | 19 | 11,776 bytes | 1,536 bytes (15%) |
| **ZIP** | <1µs | <1µs | 0.008ms | 5.2 KB | 19 | ~10,400 bytes | ~160 bytes |
**Large Data (100KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations | Archive Size | Overhead |
|--------|--------|------|----------|--------|-------------|--------------|----------|
| **TAR** | <1µs | <1µs | 0.020ms | 5.2 KB | 19 | 103,936 bytes | 1,536 bytes (1.5%) |
| **ZIP** | <1µs | <1µs | 0.009ms | 5.2 KB | 19 | ~102,600 bytes | ~200 bytes |
#### Archive Extraction Performance by Data Size
**Small Data (1KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations |
|--------|--------|------|----------|--------|-------------|
| **TAR** | <1µs | <1µs | 0.008ms | 1.7 KB | 22 |
| **ZIP** | <1µs | <1µs | 0.006ms | 0.2 KB | 4 |
**Medium Data (10KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations |
|--------|--------|------|----------|--------|-------------|
| **TAR** | <1µs | <1µs | 0.005ms | 1.2 KB | 22 |
| **ZIP** | <1µs | <1µs | 0.006ms | 0.2 KB | 4 |
**Large Data (100KB):**
| Format | Median | Mean | CPU Time | Memory | Allocations |
|--------|--------|------|----------|--------|-------------|
| **TAR** | <1µs | <1µs | 0.006ms | 1.2 KB | 22 |
| **ZIP** | <1µs | <1µs | 0.006ms | 0.2 KB | 4 |
**Important**: These benchmarks measure archiving performance only (uncompressed data). TAR and ZIP have fundamental differences that must be understood when interpreting results.
#### Algorithm Operations
| Operation | Complexity | Typical Latency | Allocations |
|-----------|------------|-----------------|-------------|
@@ -465,6 +515,8 @@ The archive package demonstrates excellent performance characteristics:
| Parse() | O(n) | <100ns | 0-1 |
| DetectHeader() | O(1) | <50ns | 0 |
#### Detection & Marshaling Performance
**Detection Operations:**
| Operation | Sample Size | Median | Mean | Max | Notes |
@@ -482,6 +534,34 @@ The archive package demonstrates excellent performance characteristics:
| MarshalJSON() | 1000 | <1µs | <1µs | 1µs | String + quotes |
| UnmarshalJSON() | 1000 | <1µs | 1µs | 3µs | JSON parsing |
#### Key Performance Insights
1. **Creation Speed**: Both formats show similar sub-microsecond performance
2. **Extraction Efficiency**: ZIP uses 5-8x less memory (0.2 KB vs 1.2-1.7 KB)
3. **CPU Efficiency**: ZIP slightly faster (0.006-0.009ms vs 0.019-0.020ms for creation)
4. **Memory Footprint**:
- TAR: Consistent 5.2 KB for creation, 1.2-1.7 KB for extraction
- ZIP: 5.2 KB for creation, only 0.2 KB for extraction
5. **Overhead Analysis**:
- TAR: Fixed 1,536 bytes (150% for 1KB, 1.5% for 100KB)
- ZIP: Minimal ~150-200 bytes regardless of size
#### Critical Format Differences
**Compression:**
- **TAR**: Archive format only, NO built-in compression. Must use external compression (Gzip/Bzip2/LZ4/XZ) → `.tar.gz`, `.tar.xz`, etc.
- **ZIP**: Integrates compression natively within the format
- ⚠️ **Compression ratios are NOT comparable** between TAR and ZIP formats
**Robustness to Corruption:**
- **TAR**: Sequential format allows reading/writing even if partially corrupted. Files before corruption point remain accessible.
- **ZIP**: Central directory at end of archive - ANY corruption typically prevents reading the entire archive.
-**TAR recommended** for critical backups, long-term storage, and scenarios where data integrity cannot be guaranteed
**Recommended Usage:**
- **TAR + Compression**: Backups, streaming, network transfers, critical data requiring corruption resilience
- **ZIP**: Software distribution, Windows compatibility, random file access, GUI applications
### Test Conditions
**Hardware Configuration:**
+517
View File
@@ -0,0 +1,517 @@
/*
* MIT License
*
* Copyright (c) 2025 Nicolas JUHEL
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
*
*/
package archive_test
import (
"bytes"
"io"
"os"
"path/filepath"
"runtime"
"strings"
"time"
. "github.com/onsi/ginkgo/v2"
"github.com/onsi/gomega/gmeasure"
"github.com/nabbar/golib/archive/archive"
)
var _ = Describe("TC-BC-001: Benchmarks", func() {
Context("TC-BC-002: Algorithm operations", func() {
It("TC-BC-003: should benchmark Parse operations", func() {
experiment := gmeasure.NewExperiment("Parse operations")
AddReportEntry(experiment.Name, experiment)
inputs := []string{"tar", "zip", "none", "unknown"}
experiment.Sample(func(idx int) {
for _, input := range inputs {
experiment.MeasureDuration(input, func() {
_ = archive.Parse(input)
})
}
}, gmeasure.SamplingConfig{N: 100})
})
It("TC-BC-004: should benchmark String operations", func() {
experiment := gmeasure.NewExperiment("String operations")
AddReportEntry(experiment.Name, experiment)
algorithms := []archive.Algorithm{archive.None, archive.Tar, archive.Zip}
experiment.Sample(func(idx int) {
for _, alg := range algorithms {
experiment.MeasureDuration(alg.String(), func() {
_ = alg.String()
})
}
}, gmeasure.SamplingConfig{N: 1000})
})
It("TC-BC-005: should benchmark Extension operations", func() {
experiment := gmeasure.NewExperiment("Extension operations")
AddReportEntry(experiment.Name, experiment)
algorithms := []archive.Algorithm{archive.None, archive.Tar, archive.Zip}
experiment.Sample(func(idx int) {
for _, alg := range algorithms {
experiment.MeasureDuration(alg.String(), func() {
_ = alg.Extension()
})
}
}, gmeasure.SamplingConfig{N: 1000})
})
It("TC-BC-006: should benchmark DetectHeader operations", func() {
experiment := gmeasure.NewExperiment("DetectHeader operations")
AddReportEntry(experiment.Name, experiment)
// Create valid headers
tarHeader := make([]byte, 263)
copy(tarHeader[257:263], append([]byte("ustar"), 0x00))
zipHeader := make([]byte, 263)
zipHeader[0] = 0x50
zipHeader[1] = 0x4b
zipHeader[2] = 0x03
zipHeader[3] = 0x04
experiment.Sample(func(idx int) {
experiment.MeasureDuration("tar", func() {
_ = archive.Tar.DetectHeader(tarHeader)
})
experiment.MeasureDuration("zip", func() {
_ = archive.Zip.DetectHeader(zipHeader)
})
}, gmeasure.SamplingConfig{N: 1000})
})
})
Context("TC-BC-007: Detection operations", func() {
It("TC-BC-008: should benchmark Detect with various formats", func() {
experiment := gmeasure.NewExperiment("Detect operations")
AddReportEntry(experiment.Name, experiment)
// Prepare TAR archive
tmpDir, _ := createTempDir()
defer os.RemoveAll(tmpDir)
_ = createTestFile(tmpDir, "test.txt", strings.Repeat("x", 1000))
var tarBuf bytes.Buffer
tarWriter, _ := archive.Tar.Writer(&nopWriteCloser{&tarBuf})
_ = tarWriter.FromPath(tmpDir, "*.txt", nil)
_ = tarWriter.Close()
// Prepare ZIP archive
tmpFile, _ := createTempArchiveFile(".zip")
defer os.Remove(tmpFile.Name())
zipWriter, _ := archive.Zip.Writer(tmpFile)
_ = zipWriter.FromPath(tmpDir, "*.txt", nil)
_ = zipWriter.Close()
tmpFile.Close()
experiment.Sample(func(idx int) {
experiment.MeasureDuration("tar", func() {
_, reader, stream, err := archive.Detect(io.NopCloser(bytes.NewReader(tarBuf.Bytes())))
if err == nil {
if reader != nil {
reader.Close()
}
if stream != nil {
stream.Close()
}
}
})
experiment.MeasureDuration("zip", func() {
f, _ := os.Open(tmpFile.Name())
if f != nil {
defer f.Close()
_, reader, stream, err := archive.Detect(f)
if err == nil {
if reader != nil {
reader.Close()
}
if stream != nil {
stream.Close()
}
}
}
})
}, gmeasure.SamplingConfig{N: 100})
})
})
Context("TC-BC-009: Archive creation and extraction operations", func() {
It("TC-BC-010: should benchmark archive creation with different sizes", func() {
sizes := map[string]int{
"Small Data (1KB)": 1024,
"Medium Data (10KB)": 10240,
"Large Data (100KB)": 102400,
}
for sizeLabel, size := range sizes {
expTarCreate := gmeasure.NewExperiment("TAR Creation - " + sizeLabel)
AddReportEntry(expTarCreate.Name, expTarCreate)
expZipCreate := gmeasure.NewExperiment("ZIP Creation - " + sizeLabel)
AddReportEntry(expZipCreate.Name, expZipCreate)
// Prepare test data
tmpDir, _ := createTempDir()
_ = createTestFile(tmpDir, "test.txt", strings.Repeat("x", size))
// Benchmark TAR creation
expTarCreate.Sample(func(idx int) {
var buf bytes.Buffer
var m0, m1 runtime.MemStats
runtime.ReadMemStats(&m0)
t0 := time.Now()
expTarCreate.MeasureDuration("create", func() {
writer, _ := archive.Tar.Writer(&nopWriteCloser{&buf})
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
})
elapsed := time.Since(t0)
runtime.ReadMemStats(&m1)
archiveSize := buf.Len()
ratio := (1 - float64(archiveSize)/float64(size)) * 100
if ratio < 0 {
ratio = 0
}
expTarCreate.RecordValue("CPU time", elapsed.Seconds()*1000, gmeasure.Units("ms"))
expTarCreate.RecordValue("Memory", float64(m1.TotalAlloc-m0.TotalAlloc)/1024, gmeasure.Units("KB"))
expTarCreate.RecordValue("Allocs", float64(m1.Mallocs-m0.Mallocs), gmeasure.Units("allocs"))
expTarCreate.RecordValue("Archive Size", float64(archiveSize), gmeasure.Units("bytes"))
expTarCreate.RecordValue("Overhead", float64(archiveSize-size), gmeasure.Units("bytes"))
}, gmeasure.SamplingConfig{N: 20})
// Benchmark ZIP creation
expZipCreate.Sample(func(idx int) {
tmpFile, _ := createTempArchiveFile(".zip")
defer os.Remove(tmpFile.Name())
var m0, m1 runtime.MemStats
runtime.ReadMemStats(&m0)
t0 := time.Now()
expZipCreate.MeasureDuration("create", func() {
writer, _ := archive.Zip.Writer(tmpFile)
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
})
elapsed := time.Since(t0)
runtime.ReadMemStats(&m1)
tmpFile.Close()
stat, _ := os.Stat(tmpFile.Name())
archiveSize := int(stat.Size())
ratio := (1 - float64(archiveSize)/float64(size)) * 100
if ratio < 0 {
ratio = 0
}
expZipCreate.RecordValue("CPU time", elapsed.Seconds()*1000, gmeasure.Units("ms"))
expZipCreate.RecordValue("Memory", float64(m1.TotalAlloc-m0.TotalAlloc)/1024, gmeasure.Units("KB"))
expZipCreate.RecordValue("Allocs", float64(m1.Mallocs-m0.Mallocs), gmeasure.Units("allocs"))
expZipCreate.RecordValue("Archive Size", float64(archiveSize), gmeasure.Units("bytes"))
expZipCreate.RecordValue("Overhead", float64(archiveSize-size), gmeasure.Units("bytes"))
}, gmeasure.SamplingConfig{N: 20})
os.RemoveAll(tmpDir)
}
})
It("TC-BC-011: should benchmark archive extraction with different sizes", func() {
sizes := map[string]int{
"Small Data (1KB)": 1024,
"Medium Data (10KB)": 10240,
"Large Data (100KB)": 102400,
}
for sizeLabel, size := range sizes {
expTarExtract := gmeasure.NewExperiment("TAR Extraction - " + sizeLabel)
AddReportEntry(expTarExtract.Name, expTarExtract)
expZipExtract := gmeasure.NewExperiment("ZIP Extraction - " + sizeLabel)
AddReportEntry(expZipExtract.Name, expZipExtract)
// Prepare test archives
tmpDir, _ := createTempDir()
_ = createTestFile(tmpDir, "test.txt", strings.Repeat("x", size))
// Create TAR archive
var tarBuf bytes.Buffer
tarWriter, _ := archive.Tar.Writer(&nopWriteCloser{&tarBuf})
_ = tarWriter.FromPath(tmpDir, "*.txt", nil)
_ = tarWriter.Close()
// Create ZIP archive
tmpZipFile, _ := createTempArchiveFile(".zip")
zipWriter, _ := archive.Zip.Writer(tmpZipFile)
_ = zipWriter.FromPath(tmpDir, "*.txt", nil)
_ = zipWriter.Close()
tmpZipFile.Close()
// Benchmark TAR extraction
expTarExtract.Sample(func(idx int) {
var m0, m1 runtime.MemStats
runtime.ReadMemStats(&m0)
t0 := time.Now()
expTarExtract.MeasureDuration("extract", func() {
reader, _ := archive.Tar.Reader(io.NopCloser(bytes.NewReader(tarBuf.Bytes())))
rc, _ := reader.Get("test.txt")
if rc != nil {
_, _ = io.Copy(io.Discard, rc)
rc.Close()
}
reader.Close()
})
elapsed := time.Since(t0)
runtime.ReadMemStats(&m1)
expTarExtract.RecordValue("CPU time", elapsed.Seconds()*1000, gmeasure.Units("ms"))
expTarExtract.RecordValue("Memory", float64(m1.TotalAlloc-m0.TotalAlloc)/1024, gmeasure.Units("KB"))
expTarExtract.RecordValue("Allocs", float64(m1.Mallocs-m0.Mallocs), gmeasure.Units("allocs"))
}, gmeasure.SamplingConfig{N: 20})
// Benchmark ZIP extraction
expZipExtract.Sample(func(idx int) {
var m0, m1 runtime.MemStats
runtime.ReadMemStats(&m0)
t0 := time.Now()
expZipExtract.MeasureDuration("extract", func() {
f, _ := os.Open(tmpZipFile.Name())
if f != nil {
defer f.Close()
reader, _ := archive.Zip.Reader(f)
if reader != nil {
defer reader.Close()
rc, _ := reader.Get("test.txt")
if rc != nil {
_, _ = io.Copy(io.Discard, rc)
rc.Close()
}
}
}
})
elapsed := time.Since(t0)
runtime.ReadMemStats(&m1)
expZipExtract.RecordValue("CPU time", elapsed.Seconds()*1000, gmeasure.Units("ms"))
expZipExtract.RecordValue("Memory", float64(m1.TotalAlloc-m0.TotalAlloc)/1024, gmeasure.Units("KB"))
expZipExtract.RecordValue("Allocs", float64(m1.Mallocs-m0.Mallocs), gmeasure.Units("allocs"))
}, gmeasure.SamplingConfig{N: 20})
os.RemoveAll(tmpDir)
os.Remove(tmpZipFile.Name())
}
})
})
Context("TC-BC-012: Multiple files operations", func() {
It("TC-BC-013: should benchmark multiple files archiving", func() {
fileCounts := map[string]int{
"5 files": 5,
"10 files": 10,
"25 files": 25,
}
for label, count := range fileCounts {
expTar := gmeasure.NewExperiment("TAR Multiple Files - " + label)
AddReportEntry(expTar.Name, expTar)
expZip := gmeasure.NewExperiment("ZIP Multiple Files - " + label)
AddReportEntry(expZip.Name, expZip)
// Prepare test data
tmpDir, _ := createTempDir()
totalSize := 0
for i := 0; i < count; i++ {
content := strings.Repeat("x", 1000)
_ = createTestFile(tmpDir, filepath.Join("file", "test"+strings.Repeat("0", 2-len(strings.Split(strings.Trim(strings.Repeat("0", i), "0"), "")))+strings.Trim(strings.Repeat("0", i), "0")+".txt"), content)
totalSize += len(content)
}
// Benchmark TAR
expTar.Sample(func(idx int) {
var buf bytes.Buffer
var m0, m1 runtime.MemStats
runtime.ReadMemStats(&m0)
t0 := time.Now()
expTar.MeasureDuration("create", func() {
writer, _ := archive.Tar.Writer(&nopWriteCloser{&buf})
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
})
elapsed := time.Since(t0)
runtime.ReadMemStats(&m1)
expTar.RecordValue("CPU time", elapsed.Seconds()*1000, gmeasure.Units("ms"))
expTar.RecordValue("Memory", float64(m1.TotalAlloc-m0.TotalAlloc)/1024, gmeasure.Units("KB"))
expTar.RecordValue("Allocs", float64(m1.Mallocs-m0.Mallocs), gmeasure.Units("allocs"))
}, gmeasure.SamplingConfig{N: 20})
// Benchmark ZIP
expZip.Sample(func(idx int) {
tmpFile, _ := createTempArchiveFile(".zip")
defer os.Remove(tmpFile.Name())
var m0, m1 runtime.MemStats
runtime.ReadMemStats(&m0)
t0 := time.Now()
expZip.MeasureDuration("create", func() {
writer, _ := archive.Zip.Writer(tmpFile)
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
})
elapsed := time.Since(t0)
runtime.ReadMemStats(&m1)
tmpFile.Close()
expZip.RecordValue("CPU time", elapsed.Seconds()*1000, gmeasure.Units("ms"))
expZip.RecordValue("Memory", float64(m1.TotalAlloc-m0.TotalAlloc)/1024, gmeasure.Units("KB"))
expZip.RecordValue("Allocs", float64(m1.Mallocs-m0.Mallocs), gmeasure.Units("allocs"))
}, gmeasure.SamplingConfig{N: 20})
os.RemoveAll(tmpDir)
}
})
})
Context("TC-BC-014: Round-trip operations", func() {
It("TC-BC-015: should benchmark full round-trip", func() {
experiment := gmeasure.NewExperiment("Round-trip operations")
AddReportEntry(experiment.Name, experiment)
tmpDir, _ := createTempDir()
defer os.RemoveAll(tmpDir)
_ = createTestFile(tmpDir, "test.txt", strings.Repeat("x", 1024))
experiment.Sample(func(idx int) {
experiment.MeasureDuration("tar", func() {
var buf bytes.Buffer
writer, _ := archive.Tar.Writer(&nopWriteCloser{&buf})
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
reader, _ := archive.Tar.Reader(io.NopCloser(&buf))
rc, _ := reader.Get("test.txt")
if rc != nil {
_, _ = io.ReadAll(rc)
rc.Close()
}
reader.Close()
})
experiment.MeasureDuration("zip", func() {
tmpFile, _ := createTempArchiveFile(".zip")
defer os.Remove(tmpFile.Name())
writer, _ := archive.Zip.Writer(tmpFile)
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
tmpFile.Close()
f, _ := os.Open(tmpFile.Name())
if f != nil {
reader, _ := archive.Zip.Reader(f)
rc, _ := reader.Get("test.txt")
if rc != nil {
_, _ = io.ReadAll(rc)
rc.Close()
}
reader.Close()
f.Close()
}
})
}, gmeasure.SamplingConfig{N: 20})
})
})
Context("TC-BC-016: Size and overhead analysis", func() {
It("TC-BC-017: should measure archive overhead", func() {
sizes := []int{1024, 10240, 102400}
algorithms := []archive.Algorithm{archive.Tar, archive.Zip}
for _, size := range sizes {
for _, alg := range algorithms {
tmpDir, _ := createTempDir()
_ = createTestFile(tmpDir, "test.txt", strings.Repeat("x", size))
var archiveSize int64
if alg == archive.Tar {
var buf bytes.Buffer
writer, _ := alg.Writer(&nopWriteCloser{&buf})
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
archiveSize = int64(buf.Len())
} else {
tmpFile, _ := createTempArchiveFile(".zip")
defer os.Remove(tmpFile.Name())
writer, _ := alg.Writer(tmpFile)
_ = writer.FromPath(tmpDir, "*.txt", nil)
_ = writer.Close()
tmpFile.Close()
stat, _ := os.Stat(tmpFile.Name())
archiveSize = stat.Size()
}
overhead := archiveSize - int64(size)
overheadPercent := (float64(overhead) / float64(size)) * 100
AddReportEntry(
"Archive Overhead Analysis",
map[string]interface{}{
"Algorithm": alg.String(),
"Original Size": size,
"Archive Size": archiveSize,
"Overhead (bytes)": overhead,
"Overhead (%)": overheadPercent,
},
)
os.RemoveAll(tmpDir)
}
}
})
})
})
+79 -24
View File
@@ -154,38 +154,93 @@ XZ: 0xFD 0x37 0x7A 0x58 0x5A 0x00
### Benchmarks
Based on actual benchmark results (AMD64, Go 1.25):
Based on actual benchmark results (AMD64, Go 1.25, 20 samples per test):
| Operation | Data Size | Median | Mean | Max |
|-----------|-----------|--------|------|-----|
| **Gzip Compress (1KB)** | 1KB | <1µs | <1µs | 300µs |
| **Gzip Decompress (1KB)** | 1KB | <1µs | <1µs | 300µs |
| **Bzip2 Compress (1KB)** | 1KB | <1µs | <1µs | 300µs |
| **LZ4 Compress (1KB)** | 1KB | <1µs | <1µs | 300µs |
| **XZ Compress (1KB)** | 1KB | 300µs | 500µs | 700µs |
| **Detection** | 6 bytes | <1µs | <1µs | 100µs |
| **Parse** | String | <1µs | <1µs | 100µs |
#### Compression Performance
**Compression Ratios (1KB test data):**
**Small Data (1KB):**
```
gzip: 94.2%
bzip2: 90.4%
lz4: 93.1%
xz: 89.8%
```
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|-----------|--------|------|----------|--------|-------------|-------|
| **LZ4** | <1µs | <1µs | 0.032ms | 4.5 KB | 16 | 93.1% |
| **Gzip** | <1µs | <1µs | 0.073ms | 795 KB | 24 | 94.2% |
| **Bzip2** | 100µs | 200µs | 0.186ms | 650 KB | 34 | 90.4% |
| **XZ** | 300µs | 500µs | 0.513ms | 8,226 KB | 144 | 89.8% |
**Medium Data (10KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|-----------|--------|------|----------|--------|-------------|-------|
| **LZ4** | <1µs | <1µs | 0.019ms | 4.5 KB | 17 | 99.0% |
| **Gzip** | <1µs | 100µs | 0.089ms | 795 KB | 25 | 99.1% |
| **Bzip2** | 200µs | 300µs | 0.339ms | 822 KB | 37 | 98.8% |
| **XZ** | 300µs | 400µs | 0.378ms | 8,226 KB | 147 | 98.7% |
**Large Data (100KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|-----------|--------|------|----------|--------|-------------|-------|
| **LZ4** | <1µs | <1µs | 0.044ms | 1.2 KB | 11 | 99.5% |
| **Gzip** | 300µs | 400µs | 0.351ms | 796 KB | 26 | 99.7% |
| **Bzip2** | 2.7ms | 2.8ms | 2.753ms | 2,544 KB | 38 | 99.9% |
| **XZ** | 6.9ms | 7.0ms | 6.994ms | 8,228 KB | 327 | 99.8% |
#### Decompression Performance
**Small Data (1KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.018ms | 1.2 KB | 7 |
| **Gzip** | <1µs | <1µs | 0.024ms | 24.6 KB | 16 |
| **Bzip2** | <1µs | 100µs | 0.098ms | 276 KB | 25 |
| **XZ** | 100µs | 200µs | 0.192ms | 8,225 KB | 89 |
**Medium Data (10KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.017ms | 1.2 KB | 8 |
| **Gzip** | <1µs | <1µs | 0.033ms | 33.4 KB | 17 |
| **Bzip2** | 100µs | 100µs | 0.133ms | 276 KB | 26 |
| **XZ** | 100µs | 100µs | 0.144ms | 8,225 KB | 92 |
**Large Data (100KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.028ms | 1.2 KB | 6 |
| **Gzip** | 100µs | 100µs | 0.112ms | 312 KB | 19 |
| **Bzip2** | 1.3ms | 1.3ms | 1.259ms | 276 KB | 28 |
| **XZ** | 800µs | 1.0ms | 0.970ms | 8,225 KB | 192 |
#### Detection & Parsing
| Operation | Median | Mean | Max | Throughput |
|-----------|--------|------|-----|------------|
| **Parse** (string) | <1µs | <1µs | 100µs | >1M ops/sec |
| **Detection** (6 bytes) | <1µs | <1µs | 100µs | >1M ops/sec |
### Memory Usage
**Algorithm-Specific Memory Footprint:**
```
Base overhead: Minimal (enum operations)
Base overhead: 1 byte (enum type)
Detection: 6-byte peek buffer
Reader wrapping: Depends on algorithm
- Gzip: ~256KB internal buffer
- Bzip2: ~64KB internal buffer
- LZ4: ~64KB internal buffer
- XZ: Variable (algorithm-dependent)
Writer wrapping: Depends on algorithm and settings
Parse operations: Minimal (string length)
Compression buffers:
- LZ4: ~4.5 KB (fastest, lowest memory)
- Bzip2: ~650 KB (1KB) to ~2.5 MB (100KB)
- Gzip: ~795 KB (consistent across sizes)
- XZ: ~8.2 MB (highest memory usage)
Decompression buffers:
- LZ4: ~1.2 KB (minimal footprint)
- Gzip: ~25-300 KB (size-dependent)
- Bzip2: ~276 KB (consistent)
- XZ: ~8.2 MB (consistent)
```
### Scalability
+112 -39
View File
@@ -206,19 +206,74 @@ Coverage: All public API usage patterns
### Performance Metrics
**Benchmark Results (AMD64, Go 1.25):**
**Benchmark Results (AMD64, Go 1.25, 20 samples per test):**
#### Compression Performance by Data Size
**Small Data (1KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|-----------|--------|------|----------|--------|-------------|-------|
| **LZ4** | <1µs | <1µs | 0.032ms | 4.5 KB | 16 | 93.1% |
| **Gzip** | <1µs | <1µs | 0.073ms | 795 KB | 24 | 94.2% |
| **Bzip2** | 100µs | 200µs | 0.186ms | 650 KB | 34 | 90.4% |
| **XZ** | 300µs | 500µs | 0.513ms | 8,226 KB | 144 | 89.8% |
**Medium Data (10KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|-----------|--------|------|----------|--------|-------------|-------|
| **LZ4** | <1µs | <1µs | 0.019ms | 4.5 KB | 17 | 99.0% |
| **Gzip** | <1µs | 100µs | 0.089ms | 795 KB | 25 | 99.1% |
| **Bzip2** | 200µs | 300µs | 0.339ms | 822 KB | 37 | 98.8% |
| **XZ** | 300µs | 400µs | 0.378ms | 8,226 KB | 147 | 98.7% |
**Large Data (100KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations | Ratio |
|-----------|--------|------|----------|--------|-------------|-------|
| **LZ4** | <1µs | <1µs | 0.044ms | 1.2 KB | 11 | 99.5% |
| **Gzip** | 300µs | 400µs | 0.351ms | 796 KB | 26 | 99.7% |
| **Bzip2** | 2.7ms | 2.8ms | 2.753ms | 2,544 KB | 38 | 99.9% |
| **XZ** | 6.9ms | 7.0ms | 6.994ms | 8,228 KB | 327 | 99.8% |
#### Decompression Performance by Data Size
**Small Data (1KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.018ms | 1.2 KB | 7 |
| **Gzip** | <1µs | <1µs | 0.024ms | 24.6 KB | 16 |
| **Bzip2** | <1µs | 100µs | 0.098ms | 276 KB | 25 |
| **XZ** | 100µs | 200µs | 0.192ms | 8,225 KB | 89 |
**Medium Data (10KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.017ms | 1.2 KB | 8 |
| **Gzip** | <1µs | <1µs | 0.033ms | 33.4 KB | 17 |
| **Bzip2** | 100µs | 100µs | 0.133ms | 276 KB | 26 |
| **XZ** | 100µs | 100µs | 0.144ms | 8,225 KB | 92 |
**Large Data (100KB):**
| Algorithm | Median | Mean | CPU Time | Memory | Allocations |
|-----------|--------|------|----------|--------|-------------|
| **LZ4** | <1µs | <1µs | 0.028ms | 1.2 KB | 6 |
| **Gzip** | 100µs | 100µs | 0.112ms | 312 KB | 19 |
| **Bzip2** | 1.3ms | 1.3ms | 1.259ms | 276 KB | 28 |
| **XZ** | 800µs | 1.0ms | 0.970ms | 8,225 KB | 192 |
#### Detection & Parsing Performance
| Operation | Median | Mean | Max | Throughput |
|-----------|--------|------|-----|------------|
| **Gzip Compress (1KB)** | <1µs | <1µs | 300µs | Variable |
| **Gzip Decompress (1KB)** | <1µs | <1µs | 300µs | Variable |
| **Bzip2 Compress (1KB)** | <1µs | <1µs | 300µs | Variable |
| **LZ4 Compress (1KB)** | <1µs | <1µs | 300µs | ~500 MB/s |
| **XZ Compress (1KB)** | 300µs | 500µs | 700µs | ~5 MB/s |
| **Detection (6 bytes)** | <1µs | <1µs | 100µs | >1M ops/sec |
| **Parse (string)** | <1µs | <1µs | 100µs | >1M ops/sec |
| **Parse** (string) | <1µs | <1µs | 100µs | >1M ops/sec |
| **Detection** (6 bytes) | <1µs | <1µs | 100µs | >1M ops/sec |
*Measured with gmeasure.Experiment on 20-100 samples per benchmark*
*All measurements obtained with gmeasure.Experiment using runtime.ReadMemStats for memory profiling*
### Test Execution Conditions
@@ -526,22 +581,26 @@ The `compress` package demonstrates excellent performance characteristics:
- **Low latency**: Sub-millisecond operations for detection and parsing
- **Minimal overhead**: Stateless operations with O(1) complexity
- **Efficient delegation**: Direct wrapping without intermediate buffering
- **Algorithm-dependent throughput**: Compression speed varies by algorithm
- **Algorithm-dependent throughput**: LZ4 fastest, XZ slowest but best compression
**Benchmark Results:**
**Key Performance Insights:**
```
Operation | Median | Mean | Max | Samples
=========================================================================
Parse (string) | <1µs | <1µs | 100µs | 100
Detect (6 bytes peek) | <1µs | <1µs | 100µs | 100
Gzip Compress (1KB) | <1µs | <1µs | 300µs | 20
Gzip Decompress (1KB) | <1µs | <1µs | 300µs | 20
Bzip2 Compress (1KB) | <1µs | <1µs | 300µs | 20
LZ4 Compress (1KB) | <1µs | <1µs | 300µs | 20
XZ Compress (1KB) | 300µs | 500µs | 700µs | 20
Compression Ratio Analysis | varies | varies | varies | 20
```
1. **Speed vs Compression Trade-off**:
- **LZ4**: Fastest (<1µs), minimal memory (1-5 KB), good ratio (93-99%)
- **Gzip**: Fast (<1µs to 400µs), moderate memory (~800 KB), excellent ratio (94-99.7%)
- **Bzip2**: Medium speed (100µs to 2.8ms), moderate memory (650 KB-2.5 MB), best ratio (90-99.9%)
- **XZ**: Slowest (300µs to 7ms), highest memory (~8.2 MB), excellent ratio (89-99.8%)
2. **Data Size Impact**:
- Small data (1KB): All algorithms show minimal latency differences
- Medium data (10KB): Performance characteristics become more apparent
- Large data (100KB): Clear separation between algorithm speeds
3. **Memory Footprint**:
- LZ4 uses 99% less memory than XZ
- Gzip memory usage remains stable across data sizes
- Bzip2 memory scales with data size
- XZ maintains consistent 8.2 MB regardless of data size
### Test Conditions
@@ -601,26 +660,40 @@ The package scales linearly with concurrent operations:
### Memory Usage
**Memory Profile:**
**Memory Profile (Real Measurements):**
#### Compression Memory by Data Size
| Algorithm | 1KB | 10KB | 100KB | Scaling |
|-----------|-----|------|-------|---------|
| **LZ4** | 4.5 KB | 4.5 KB | 1.2 KB | Minimal, consistent |
| **Gzip** | 795 KB | 795 KB | 796 KB | Stable across sizes |
| **Bzip2** | 650 KB | 822 KB | 2,544 KB | Scales with data |
| **XZ** | 8,226 KB | 8,226 KB | 8,228 KB | High, consistent |
#### Decompression Memory by Data Size
| Algorithm | 1KB | 10KB | 100KB | Scaling |
|-----------|-----|------|-------|---------|
| **LZ4** | 1.2 KB | 1.2 KB | 1.2 KB | Minimal, consistent |
| **Gzip** | 24.6 KB | 33.4 KB | 312 KB | Scales with data |
| **Bzip2** | 276 KB | 276 KB | 276 KB | Stable across sizes |
| **XZ** | 8,225 KB | 8,225 KB | 8,225 KB | High, consistent |
#### Base Operations Memory
```
Object | Size | Count | Total
================================================
Algorithm enum | 1 byte | 1 | 1 byte
Parse/Detect | Minimal | - | <1KB
Reader (Gzip) | ~256KB | 1 | ~256KB
Writer (Gzip) | ~256KB | 1 | ~256KB
================================================
Algorithm enum: 1 byte (uint8)
Parse operations: Minimal (string length)
Detect operations: 6-byte peek buffer
List operations: Static array (no allocation)
```
**Memory Scaling:**
| Operation | Memory | Notes |
|-----------|--------|-------|
| Algorithm methods | O(1) | No allocation |
| Parse | O(n) | String length |
| Detect | O(1) | 6-byte peek |
| Reader/Writer | O(1) | Algorithm-dependent |
**Memory Efficiency Ranking:**
1. **LZ4**: 1-5 KB (compression/decompression) - Best for memory-constrained environments
2. **Gzip**: 25-800 KB - Good balance for most use cases
3. **Bzip2**: 276-2,544 KB - Moderate memory footprint
4. **XZ**: ~8.2 MB - High memory usage, not suitable for embedded systems
---