Engineering notes

What the compression tax costs you — and when it still pays

If you are sizing a DeltaGlider deployment, negotiating SLAs, or choosing between cheap plaintext storage and delta compression plus optional proxy encryption, you need numbers tied to your artifact shape and network — not a slogan. This page is the published narrative around our compression-tax benchmark harness: reproducible, forkable, and meant to survive scrutiny from your own performance reviewers. It explains what the harness measures, what the pinned sample run implies for capacity planning, and how you can rerun the same workload on your hardware. The charts render in your browser (Chart.js); values come from one archived bundle so you can verify them against the repo.

Why read this

What you can take away.

You are not here for our implementation details — you want decisions: sizing, risk, and economics.
  • Capacity & latency budgets. You see wall-clock MB/s for PUT and GET per mode, so you can compare upload pipelines (CI bursts) vs download-heavy workloads (deploys, mirrors) against what your users tolerate. A back-of-envelope MB/s sketch follows this list.
  • Storage economics vs CPU. When deltas shrink stored bytes enough, you fund cheaper object storage or longer retention; the charts show whether that trade shows up on your profile or disappears under network noise.
  • Encryption without rewriting clients. Proxy-side AES-GCM keeps SigV4 and S3 semantics; the benchmark isolates encryption cost so you can judge whether your bottleneck is crypto, disk, or something else.
  • Reproducibility. Everything is scripted — if our headline claims do not match what you see on your VM, you still own the methodology and raw CSV/JSON to explain the gap (region, disk class, concurrency, artifact mix).
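
To make those MB/s rows concrete, here is a minimal sketch of a wall-clock throughput calculation; the function, the decimal-megabyte assumption, and the example numbers are illustrative, not taken from the harness.

# Illustrative only: wall-clock throughput as the tables report it.
# total_bytes covers every object in the phase; elapsed_s is the wall-clock
# duration of that phase, request overhead included; MB is taken as 10^6 bytes.
def wall_clock_mb_per_s(total_bytes: int, elapsed_s: float) -> float:
    return (total_bytes / 1_000_000) / elapsed_s

# Made-up example: five ~63 MB ISOs uploaded in 3.0 s of wall time -> 105.0 MB/s.
print(round(wall_clock_mb_per_s(5 * 63_000_000, 3.0), 2))
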
Premise

What the harness answers — in operator terms.

These are the questions we encode into CSV + summary.json + HTML so you are not guessing from a single headline metric.
  • Will uploads keep up? Wall-clock PUT MB/s with compression on vs off — so you know whether ingest pipelines need wider concurrency or bigger proxy CPUs.
  • Will downloads feel OK? Cold vs warm GET MB/s when objects are stored as deltas or ciphertext — so you know whether edge caches or client parallelism matter more than proxy tuning.
  • Will the proxy fit? RSS, Docker CPU (mean/max over each mode window), and backend disk footprint — so you can map containers to instance sizes and spot noisy neighbours.
  • Will storage shrink enough? Logical bytes vs Prometheus Δ saved — so you can translate artifact churn into TB/month before you trust a cheaper tier. The sketch after this list shows that translation.
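
As a back-of-envelope for that last question, here is a hypothetical planning helper (ours, not part of the harness) that turns a run's savings ratio plus your monthly artifact churn into stored TB/month; the ratio below comes from the pinned sample run described under Conclusions, and your own run's summary output is what you should plug in.

# Hypothetical planning helper (not part of the harness): project stored TB/month
# from logical artifact churn and the savings ratio a benchmark run reported.
def stored_tb_per_month(logical_tb_churn: float, savings_ratio: float) -> float:
    # savings_ratio = 1 - implied_stored_bytes / logical_bytes for that run
    return logical_tb_churn * (1.0 - savings_ratio)

sample_savings = 1.0 - 0.124 / 0.317            # ~0.61 from the pinned Alpine ISO run
print(round(stored_tb_per_month(10.0, sample_savings), 2))  # 10 TB churn -> ~3.91 TB stored
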
Methodology

Same artifacts, four buckets, one harness.

The runner ships under docs/benchmark/. It downloads real artifacts (here: consecutive Alpine virt ISO releases), runs PUT → cold GET → warm GET per mode, scrapes Prometheus and health, optionally captures host JSON (docker stats, du), and emits CSVs plus an HTML report — so your operators get the same artefact trail they would keep for an internal performance review.

Four modes map to four buckets: passthrough, compression-only, encryption-only (proxy AES-GCM at rest), compression + encryption — so you never confuse codec effects with backend routing.
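
For orientation, here is a minimal sketch of that per-mode loop as a plain S3 client (boto3) pointed at the proxy would drive it; the endpoint, bucket names, and artifact list are placeholders, and the real runner under docs/benchmark/ adds timing, Prometheus scraping, and restart handling.

# Sketch only: one bucket per mode, PUT then cold GET then warm GET per artifact.
# Endpoint, credentials, bucket names, and file list are placeholders.
import boto3

MODE_BUCKETS = {
    "passthrough": "bench-passthrough",
    "compression": "bench-compression",
    "encryption": "bench-encryption",
    "compression-encryption": "bench-comp-enc",
}

s3 = boto3.client("s3", endpoint_url="http://proxy.example:9000")

for mode, bucket in MODE_BUCKETS.items():
    for path in ["alpine-virt-3.19.0-x86_64.iso"]:          # artifact list placeholder
        key = path
        s3.upload_file(path, bucket, key)                    # PUT phase
        s3.download_file(bucket, key, f"/tmp/cold-{key}")    # cold GET (fresh cache)
        s3.download_file(bucket, key, f"/tmp/warm-{key}")    # warm GET (cache primed)
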

Isolation. Single-VM smoke can restart the proxy between modes so per-mode RSS reflects a fresh process; split client/proxy VMs are what you use when you want publication-grade separation (see README).

Honest metrics. An inner docker restart between PUT and cold GET resets Prometheus counters — verification fails unless you choose --no-proxy-restart or --skip-compression-verify. You decide which trade-off matches how you test cold reads.

Results

Interactive charts (pinned sample run).

Charts render client-side with Chart.js. Numbers come from `benchmarkSampleRun.ts`; refresh that file when you adopt a new canonical run for marketing.
Interpretation

How to read Docker CPU without fooling yourself.

CPU% from a single idle snapshot proved misleading, so the report aggregates mean/max inside each mode window whenever a timeseries exists.

For your planning, treat throughput MB/s as the primary speed signal for user-visible work. CPU charts complement that: they show whether you are thermally or scheduler-bound during mixed phases, not a perfect map of “xdelta cost” vs “memcpy cost.”
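
To illustrate that rollup, here is a minimal sketch that reduces a docker-stats-style CPU timeseries to mean/max inside one mode window; the field layout and timestamps are assumptions, not the harness's actual schema.

# Sketch of the per-window rollup; field names and timestamps are assumed, not real.
from statistics import mean

samples = [                      # (unix_ts, cpu_percent) from periodic docker stats
    (1700000000, 12.0), (1700000005, 87.5), (1700000010, 91.2), (1700000015, 8.3),
]

def window_cpu(samples, start_ts, end_ts):
    vals = [cpu for ts, cpu in samples if start_ts <= ts < end_ts]
    return {"mean": mean(vals), "max": max(vals)} if vals else None

print(window_cpu(samples, 1700000003, 1700000013))  # e.g. the PUT window for one mode
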

Whole-window averages blend PUT spikes with quieter GET phases — ordering effects happen. If your SLA is upload-bound, weight PUT; if mirror/download dominates, weight GET rows from the summary tables under the charts.

Conclusions

What this sample run implies — and what it does not.

These conclusions are grounded in the pinned Hetzner single-VM Alpine ISO bundle referenced on this page. Your mileage will differ; treat them as structured takeaways, not guarantees.

Throughput

  • On this hardware and artifact mix, passthrough PUT landed near 105.57 MB/s wall-clock vs compression near 8.88 MB/s — roughly 11.9× faster ingest without delta encoding. For you: if CI spends most of its wall time in uploads and CPU headroom is scarce, budget more cores or fewer concurrent writers before blaming the network. A quick ingest-time budgeting sketch follows this list.
  • Encryption-mode cold GET showed very high reported MB/s (299.9 MB/s vs passthrough ~97.05 MB/s on this run). The client still receives plaintext; the gap reflects workload timing and cache state — not a promise every workload decrypts “faster than plaintext.” Use it as a sign your bottleneck may not be AES here.
  • Compression GET MB/s sat in the same ballpark as passthrough cold/warm on this profile — for you: once objects are written, read paths may be acceptable even when PUT was expensive; validate with your object sizes and cache behaviour.
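
To turn those PUT numbers into a time budget, here is a rough helper of our own (not harness output) that estimates ingest wall time for a CI burst at the sample run's PUT rates; plug in your own burst size and measured rates.

# Rough CI-burst budgeting from measured PUT MB/s; rates are the sample run's,
# the burst size is a placeholder for your own artifact mix.
def ingest_seconds(total_mb: float, put_mb_per_s: float) -> float:
    return total_mb / put_mb_per_s

burst_mb = 5_000                        # e.g. a 5 GB artifact burst
for mode, rate in [("passthrough", 105.57), ("compression", 8.88)]:
    print(mode, round(ingest_seconds(burst_mb, rate), 1), "s")   # ~47 s vs ~563 s
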

Storage

Compression modes pushed implied stored size to about 61% below logical uploads (~0.124 GB implied vs ~0.317 GB logical for the same five ISOs). For you: when object-store bills dominate opex, that delta can fund cheaper tiers or longer retention — rerun with your binaries (kernels, models, DB dumps) because similarity drives savings.

Footprint & ops

  • Docker CPU mean/max varied by mode window — use it alongside RSS and disk charts when rightsizing the proxy container versus colocated workloads.
  • Single-VM smoke intentionally stresses everything on one box — fine for regression detection and quick parity checks; it is not a substitute for isolating client network RTT or backend latency in production. When you publish externally, prefer the two-VM topology from the benchmark README.

Bottom line for buyers & builders

  • Need minimum ingest latency and do not pay per GB? Passthrough or tuned concurrency may beat compression for that pipeline — confirm with your files.
  • Need minimum stored bytes on similar sequential artifacts? Compression modes here show the order-of-magnitude storage win you should model before picking cold storage.
  • Need ciphertext at the provider without client changes? Encryption modes isolate that tax so you can decide if proxy AES fits your CPU envelope.
  • None of this replaces your proof: rerun the harness, compare charts to this page, and attach the tarball to your internal sign-off.
Reproduce

Run the same benchmark yourself.

Clone https://github.com/beshu-tech/deltaglider_proxy, install Python deps, export HCLOUD_TOKEN, then drive the lifecycle (single-VM smoke below; swap for a split client/proxy run when you want cleaner networking).

1 · Toolchain

python3 -m venv .venv-dgp-bench
source .venv-dgp-bench/bin/activate
pip install -r docs/benchmark/requirements.txt

2 · Provision Hetzner single VM

export HCLOUD_TOKEN=…
python docs/benchmark/bench_production_tax.py up \
  --run-id dgp-bench-$(date -u +%Y%m%d-%H%M%SZ) \
  --single-vm --location hel1 --client-type ccx33 \
  --ssh-key-name YOUR_HCLOUD_SSH_KEY_NAME

3 · Execute smoke + download bundle

python docs/benchmark/bench_production_tax.py single-vm-smoke \
  --run-id YOUR_RUN_ID \
  --artifact-count 5 \
  --artifact-source alpine-iso \
  --artifact-extension .iso \
  --alpine-branch v3.19 \
  --alpine-flavor virt \
  --alpine-arch x86_64 \
  --concurrency 1 \
  --no-proxy-restart

Use --no-proxy-restart so Prometheus verification stays aligned; otherwise add --skip-compression-verify. Bundles land in docs/benchmark/results/<run-id>.tgz.
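
If you want to inspect the numbers before rendering HTML, here is a small sketch that peeks inside the bundle; the exact member paths inside the tarball are an assumption, so list the archive first on your run.

# Sketch: peek inside the result bundle. The summary.json location inside the
# tarball is an assumption; list members first to confirm it on your run.
import json, tarfile

with tarfile.open("docs/benchmark/results/YOUR_RUN_ID.tgz") as tar:
    for member in tar.getnames():
        print(member)                                  # find the CSVs and summary.json
    summary = next(m for m in tar.getnames() if m.endswith("summary.json"))
    data = json.load(tar.extractfile(summary))
    print(json.dumps(data, indent=2)[:500])            # first ~500 chars of the summary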

4 · Render the HTML report

python docs/benchmark/bench_production_tax.py html-report \
  --bundle docs/benchmark/results/YOUR_RUN_ID.tgz \
  --out docs/benchmark/results/YOUR_RUN_ID-report.html

Deeper methodology and Grafana mapping: docs/benchmark/README.md, docs/benchmark/grafana-parity.md.

The harness and HTML report evolve with the proxy (CPU rollup semantics, restart flags). Pin benchmarkSampleRun.ts when you refresh marketing numbers — your stakeholders should always be able to diff narrative claims against an archived tarball.