Benchmarks
Sairo ships with a benchmark suite you can run against your own deployment. All numbers on the landing page are derived from these reproducible tests.
Test Environment
| Parameter | Value |
|---|---|
| Runtime | Docker container, single Uvicorn process |
| Host | macOS, Apple Silicon (Docker Desktop) |
| Local S3 | MinIO (localhost:9000) |
| Production S3 | Remote S3-compatible object storage |
| Production dataset | 134,707 objects, 38.25 TB |
| Methodology | 30 iterations per measurement, percentile reporting |
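The measurement loop behind the percentile reporting can be sketched as follows (a simplified stand-in for the actual harness in `benchmark/`; the `measure` helper and sample workload are illustrative):

```python
import statistics
import time

def measure(fn, iterations=30):
    """Time fn() `iterations` times and report p50/p95 in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) yields 99 cut points; index 49 is p50, index 94 is p95
    cuts = statistics.quantiles(samples, n=100)
    return {"p50": cuts[49], "p95": cuts[94]}

result = measure(lambda: sum(range(10_000)))
```

Reporting percentiles rather than means keeps a single slow outlier (GC pause, cold cache) from distorting the headline number.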
Search Latency
FTS5 trigram search against the production bucket (134,707 objects, 38.25 TB):
| Query | Results | p50 | p95 |
|---|---|---|---|
| parquet (limit=100) | 100 | 3.1ms | 4.8ms |
| events | 200 | 2.4ms | 16.2ms |
| tracking | 200 | 2.3ms | 3.0ms |
| analytics | 200 | 2.5ms | 3.4ms |
| ingest | 200 | 2.4ms | 5.5ms |
| metadata | 200 | 2.4ms | 4.3ms |
| 2026 | 200 | 2.2ms | 22.0ms |
Fastest observed: 1.7ms. Typical queries return in 2-3ms p50 against 134K objects.
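The FTS5 trigram search described above can be reproduced in miniature with SQLite alone, using its built-in trigram tokenizer (available in SQLite 3.34+). The table name and keys below are illustrative, not Sairo's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trigram tokenizer enables fast substring matching on object keys
conn.execute("CREATE VIRTUAL TABLE objects_fts USING fts5(key, tokenize='trigram')")
keys = [
    "data/events/2026/01/part-0001.parquet",
    "logs/ingest/2025/app.log",
    "reports/analytics/summary.csv",
]
conn.executemany("INSERT INTO objects_fts(key) VALUES (?)", [(k,) for k in keys])

# Substring search: any key containing 'parquet'
rows = conn.execute(
    "SELECT key FROM objects_fts WHERE objects_fts MATCH ? LIMIT 100", ("parquet",)
).fetchall()
# -> [('data/events/2026/01/part-0001.parquet',)]
```

Because the index stores trigrams, the query term can match anywhere inside a key, which is what makes searches like `2026` or `parquet` fast without a full scan.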
Indexing Throughput
| Bucket | Objects | Duration | Throughput |
|---|---|---|---|
| bench-small | 1,000 | 1.08s | 926 obj/s |
| bench-mixed | 2,416 | 1.08s | 1,348 obj/s |
| production-bucket | 134,707 | completed | Production-scale crawl |
Indexing rate: 1,000–1,350 objects/second on local MinIO.
Upload Throughput
| File Size | p50 | Throughput |
|---|---|---|
| 1 KB | 44.9ms | — |
| 1 MB | 51.9ms | 19.3 MB/s |
| 10 MB | 130.7ms | 76.5 MB/s |
| 50 MB | 436.2ms | 114.6 MB/s |
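The throughput column is simply file size divided by median upload time; a quick sanity check of the table above:

```python
# Throughput (MB/s) = size (MB) / p50 latency (s), per the table above
measurements = {1: 0.0519, 10: 0.1307, 50: 0.4362}  # size_mb -> p50 in seconds
for size_mb, p50_s in measurements.items():
    print(f"{size_mb} MB: {size_mb / p50_s:.1f} MB/s")
# 1 MB: 19.3 MB/s
# 10 MB: 76.5 MB/s
# 50 MB: 114.6 MB/s
```

The ~45ms floor on the 1 KB upload suggests per-request overhead dominates small files, which is why throughput climbs with size.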
API Response Times
| Endpoint | p50 | p95 |
|---|---|---|
| /healthz | 2.1ms | 3.6ms |
| /api/buckets | 4.3ms | 5.8ms |
| Object listing (production) | 2.4ms | 4.6ms |
| Presigned URL generation | 3.1ms | 5.6ms |
Concurrent Load
Production S3, concurrent search queries:
| Concurrent Users | Requests/sec |
|---|---|
| 5 | 236 |
| 10 | 333 |
| 25 | 528 |
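A concurrent-load harness along these lines can reproduce the measurement; the worker below is a stub standing in for a real HTTP search call, and the function names are illustrative rather than part of the actual suite:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load(worker, users, requests_per_user=20):
    """Fire `users` concurrent workers and report aggregate requests/sec."""
    def one_user():
        for _ in range(requests_per_user):
            worker()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_user) for _ in range(users)]
        for f in futures:
            f.result()  # propagate any worker exceptions
    elapsed = time.perf_counter() - start
    return (users * requests_per_user) / elapsed

# Stub standing in for an HTTP search request against Sairo
rps = run_load(lambda: time.sleep(0.001), users=5)
```

Against a live deployment, the stub would be replaced with an authenticated `GET` to the search endpoint.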
Scaling Performance (v3.0)
Benchmarked on production data running in Docker against Leaseweb StorageGRID:
Folder Listing (the user-facing speedup)
| Bucket | Objects | Before (v2.0) | After (v3.0) | Speedup |
|---|---|---|---|---|
| ssp-production-reports | 557K | 114ms | 0.048ms | 2,378x |
| ds-mletl-data | 139K | 27ms | 0.056ms | 486x |
| druid-lw-prod | 2M (listing previously disabled above 1M) | 312ms (DISTINCT fallback) | 0.002ms | 191,231x |
Folder listing reads pre-computed `prefix_children` rows built with SQL-only aggregation. It was previously disabled for buckets over 1M objects because the aggregation could run out of memory; it now works at any scale.
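The idea can be sketched with a hypothetical `objects`/`prefix_children` schema (Sairo's real tables may differ): the immediate children of a prefix are derived entirely in SQL, then listings become constant-time index lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE prefix_children (parent TEXT, child TEXT, PRIMARY KEY (parent, child))"
)
conn.executemany("INSERT INTO objects VALUES (?)", [
    ("reports/2026/01/a.parquet",),
    ("reports/2026/02/b.parquet",),
    ("reports/summary.csv",),
])

# SQL-only aggregation: derive the immediate children of 'reports/'
# without materializing every key in application memory.
conn.execute("""
    INSERT OR IGNORE INTO prefix_children
    SELECT 'reports/',
           CASE WHEN instr(rest, '/') > 0
                THEN substr(rest, 1, instr(rest, '/'))
                ELSE rest END
    FROM (SELECT substr(key, length('reports/') + 1) AS rest
          FROM objects WHERE key LIKE 'reports/%')
""")

# Folder listing is now an index lookup, not a scan over object keys
children = [r[0] for r in conn.execute(
    "SELECT child FROM prefix_children WHERE parent = 'reports/' ORDER BY child")]
# -> ['2026/', 'summary.csv']
```

Keeping the aggregation inside SQLite is what removes the out-of-memory risk that forced the old 1M-object cutoff.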
SQLite PRAGMA Tuning Impact
| Query | Before | After | Dataset |
|---|---|---|---|
| COUNT(*) | 2.0ms | 1.5ms | 557K objects |
| COUNT(*) | 11.3ms | 7.3ms | 2M objects |
| SUM(size) | 40ms | 62ms | 557K objects |
| Folder stats rebuild | 63ms | 56ms | 139K objects |
PRAGMAs applied: `cache_size=-64000` (64 MB), `mmap_size=268435456` (256 MB), `temp_store=MEMORY`.
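In Python's `sqlite3`, applying these PRAGMAs at connection time looks like this (an in-memory connection is shown for illustration; Sairo applies them to its on-disk index):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; the real index is a file
# Negative cache_size is in KiB: -64000 -> 64 MB page cache
conn.execute("PRAGMA cache_size = -64000")
# Memory-map up to 256 MB of the database file to cut read syscalls
conn.execute("PRAGMA mmap_size = 268435456")
# Keep temp tables and sort/aggregation scratch space in RAM
conn.execute("PRAGMA temp_store = MEMORY")
```

These settings are per-connection, so they must be re-applied whenever a new connection is opened.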
Crawl Performance
| Setting | Before | After |
|---|---|---|
| Crawl workers | 6 | 12 |
| Prefix workers | 4 | 16 |
| Batch size | 2,000 | 10,000 |
| Update chunks | 500 | 2,000 |
| FTS rebuild | Blocks 15+ min | Background thread |
| Sub-prefix splitting | None | Auto for buckets with few prefixes |
At 1PB Scale (Projected)
| Operation | Expected Performance |
|---|---|
| Folder listing | < 1ms (constant time, index lookup) |
| Search | ~1ms (FTS5, always available) |
| Storage breakdown | ~500ms (full scan with PRAGMA tuning) |
| Crawl (50M objects) | ~25 min (with sub-prefix splitting) |
| FTS rebuild (50M objects) | ~83 min (background, non-blocking) |
Running the Benchmarks
The benchmark suite lives in the `benchmark/` directory of the repository.
1. Seed test data
```sh
# Requires MinIO CLI (mc) configured with alias "local"
cd benchmark
./seed-data.sh          # Seeds bench-small (1K) + bench-mixed (2.4K)
./seed-data.sh medium   # Seeds bench-medium (10K objects)
./seed-data.sh large    # Seeds bench-large (50K objects)
```

The seeder creates four buckets with realistic file structures:
| Bucket | Objects | Pattern |
|---|---|---|
| bench-small | 1,000 | 5 dirs × 10 months × 20 files |
| bench-mixed | ~2,400 | Parquet data lake, logs, configs, CSV reports |
| bench-medium | 10,000 | 10 dirs × 10 months × 10 sub-dirs × 10 files |
| bench-large | 50,000 | 10 dirs × 50 partitions × 100 records |
2. Run benchmarks
```sh
# Run all benchmarks
./run-benchmarks.sh

# Run specific benchmark categories
./run-benchmarks.sh search         # Search latency only
./run-benchmarks.sh crawl          # Crawl/indexing only
./run-benchmarks.sh crawl listing  # Crawl + listing
```

Prerequisites
- Sairo running on `localhost:8000` (or set `SAIRO_URL`)
- MinIO running on `localhost:9000`
- Test buckets seeded via `seed-data.sh`
- Default admin credentials (`admin`/`password`) or set `ADMIN_USER`/`ADMIN_PASS`
3. Results
Results are saved to `benchmark/results/`:

- JSON: machine-readable per-run results (`benchmark-YYYYMMDD-HHMMSS.json`)
- Markdown: human-readable summary (`LATEST-RESULTS.md`)
Landing Page Claims
Every number on the landing page maps to a specific benchmark result:
| Landing Page Claim | Benchmark Evidence |
|---|---|
| “Single-digit millisecond search” | Production p50 = 2.2–3.1ms on 134K objects |
| “1,300+ obj/sec indexing” | 1,348 obj/s measured on bench-mixed |
| “Sub-5ms API responses” | `/healthz` p50 = 2.1ms, most endpoints < 5ms |
| “500+ requests/second” | 528 req/s at 25 concurrent users (production) |
| “114 MB/s upload” | Sustained throughput uploading a 50 MB file |