Architecture
Sairo is designed as a single-container application that combines a FastAPI backend and a React frontend into one deployable unit.
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI, Uvicorn |
| S3 Client | boto3 |
| Database | SQLite with WAL mode and FTS5 |
| Frontend | React 18, Vite, @tanstack/react-virtual |
| Auth | JWT (PyJWT), passlib (bcrypt), pyotp (TOTP), slowapi (rate limiting) |
| Encryption | Fernet (cryptography library) |
Architecture Overview
Single Container Design
The FastAPI backend serves the React SPA as static files. There is no separate web server or reverse proxy required inside the container. A single docker run or Kubernetes pod gives you a fully functional instance.
The container runs as a non-root user (UID 1000) with no privilege escalation capabilities.
SQLite Indexing
Each bucket gets its own SQLite database file stored in the /data directory (configurable via DB_DIR). This design provides:
- Isolation — a corrupted or slow index for one bucket does not affect others
- WAL mode — concurrent reads during writes without blocking
- FTS5 full-text search — fast prefix and substring search across object keys and metadata
The database schema stores object keys, sizes, last-modified timestamps, ETags, and content types. The FTS5 index covers object keys for search.
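The per-bucket index can be sketched with the standard sqlite3 module. The exact table and column names below are assumptions based on the description above, not Sairo's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # one database file per bucket in practice
conn.execute("PRAGMA journal_mode=WAL")  # effective on disk files; no-op here

# Hypothetical schema covering the fields described above.
conn.execute("""
    CREATE TABLE objects (
        key           TEXT PRIMARY KEY,
        size          INTEGER,
        last_modified TEXT,
        etag          TEXT,
        content_type  TEXT
    )
""")
# FTS5 virtual table over object keys for fast search.
conn.execute("CREATE VIRTUAL TABLE objects_fts USING fts5(key)")

conn.executemany("INSERT INTO objects VALUES (?,?,?,?,?)", [
    ("logs/app.log", 1024, "2026-01-15T10:30:00Z", "abc", "text/plain"),
    ("backups/db.sqlite", 2048, "2026-01-15T11:00:00Z", "def",
     "application/octet-stream"),
])
conn.execute("INSERT INTO objects_fts (key) SELECT key FROM objects")

# FTS5 tokenizes keys on '/' and '.', so folder names are searchable terms.
hits = [r[0] for r in conn.execute(
    "SELECT key FROM objects_fts WHERE objects_fts MATCH 'logs'")]
```

Because each bucket is its own file, dropping or rebuilding one index never touches another bucket's data.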
Prefix-Parallel Crawling
When indexing a bucket, Sairo first discovers top-level prefixes (folders) via a shallow ListObjectsV2 call. It then crawls each prefix in parallel using a thread pool.
```
Bucket: my-data
├── logs/          ─── Thread 1
├── backups/       ─── Thread 2
├── uploads/       ─── Thread 3
├── archives/      ─── Thread 4
└── (root objects) ─── Main thread
```

Configuration:
- 16 threads per bucket — each top-level prefix is crawled by one of 16 concurrent threads
- 12 buckets max concurrent — the background ThreadPoolExecutor(12) limits total crawling parallelism
- 10,000 objects per batch — SQLite inserts are batched for minimal commit overhead
- Sub-prefix splitting — buckets with 3 or fewer top-level prefixes and 500K+ objects automatically discover sub-prefixes for better parallelism
- Recrawl interval — controlled by RECRAWL_INTERVAL (default 120 seconds)
- Async FTS rebuild — the full-text search index rebuilds in a background thread after the crawl completes, keeping search available at all times
This approach scales to petabyte-level buckets with tens of millions of objects.
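The crawl strategy above can be sketched as follows, with a stubbed in-memory lister standing in for the real paginated ListObjectsV2 calls (the function names and fake data are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a shallow ListObjectsV2 call with Delimiter="/" that
# returns CommonPrefixes; a real crawler would use boto3 here.
FAKE_BUCKET = {
    "logs/": ["logs/a.log", "logs/b.log"],
    "backups/": ["backups/db.sqlite"],
    "uploads/": ["uploads/img.png"],
}

def list_top_level_prefixes():
    return sorted(FAKE_BUCKET)

def crawl_prefix(prefix):
    # Real code would paginate ListObjectsV2(Prefix=prefix) and batch
    # the results into SQLite; here we just collect the keys.
    return FAKE_BUCKET[prefix]

# Each top-level prefix is crawled by its own worker thread
# (Sairo uses 16 threads per bucket).
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(crawl_prefix, list_top_level_prefixes()))

all_keys = sorted(k for keys in results for k in keys)
```

The payoff of the per-prefix split is that listing throughput scales with the number of top-level folders instead of being bound by one sequential pagination loop.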
SQLite Performance Tuning
Sairo applies aggressive PRAGMA tuning for maximum query performance:
| PRAGMA | Value | Purpose |
|---|---|---|
| journal_mode | WAL | Concurrent reads during writes |
| synchronous | NORMAL | Reduced fsync for faster writes |
| cache_size | -64000 (64MB) | Large page cache for hot data |
| mmap_size | 268435456 (256MB) | Memory-mapped I/O for read-heavy workloads |
| temp_store | MEMORY | Temp tables in RAM |
These settings make folder listings return in < 0.1ms and COUNT queries complete in < 2ms on 500K+ object databases.
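Applying these PRAGMAs on connection open might look like the following sketch (a minimal illustration, not Sairo's actual code):

```python
import sqlite3

# The five PRAGMAs from the table above; a negative cache_size is
# interpreted by SQLite as a size in KiB, i.e. a 64MB page cache.
PRAGMAS = {
    "journal_mode": "WAL",
    "synchronous": "NORMAL",
    "cache_size": -64000,
    "mmap_size": 268435456,   # 256MB memory-mapped I/O window
    "temp_store": "MEMORY",
}

def open_index(path):
    conn = sqlite3.connect(path)
    for name, value in PRAGMAS.items():
        conn.execute(f"PRAGMA {name}={value}")
    return conn

conn = open_index(":memory:")
# PRAGMA temp_store reads back 2 when set to MEMORY.
temp_store = conn.execute("PRAGMA temp_store").fetchone()[0]
```

Note that PRAGMAs apply per connection (except journal_mode, which persists in the database file), so they must be re-applied on every connect.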
Pre-Computed Prefix Hierarchy
After each crawl, Sairo builds a prefix_children table that maps parent → child folder relationships with pre-aggregated object counts and sizes. This means folder navigation is a simple indexed lookup, not a full table scan:
```sql
-- Instant: 0.05ms regardless of bucket size
SELECT * FROM prefix_children WHERE parent_prefix = '';

-- Old approach (disabled for >1M objects): 300ms+ full table scan
SELECT DISTINCT SUBSTR(key, ...) FROM objects WHERE ...
```

This works at any scale — tested up to 2M objects per bucket.
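One hypothetical way to populate such a table is a single aggregation pass after the crawl; the column names beyond parent_prefix are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, size INTEGER)")
conn.executemany("INSERT INTO objects VALUES (?,?)", [
    ("logs/a.log", 10), ("logs/b.log", 20), ("backups/db.sqlite", 30)])

# Hypothetical shape of the pre-computed hierarchy table: one row per
# (parent, child-folder) pair with aggregated counts and sizes.
conn.execute("""
    CREATE TABLE prefix_children (
        parent_prefix TEXT,
        child         TEXT,
        object_count  INTEGER,
        total_size    INTEGER,
        PRIMARY KEY (parent_prefix, child)
    )
""")
# Aggregate the top-level folders under the bucket root ('').
conn.execute("""
    INSERT INTO prefix_children
    SELECT '', substr(key, 1, instr(key, '/')), COUNT(*), SUM(size)
    FROM objects
    WHERE instr(key, '/') > 0
    GROUP BY 2
""")

# Folder navigation becomes an indexed primary-key lookup.
rows = conn.execute(
    "SELECT child, object_count, total_size FROM prefix_children "
    "WHERE parent_prefix = '' ORDER BY child").fetchall()
```

Paying the aggregation cost once per crawl is what keeps navigation constant-time no matter how many objects sit under each folder.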
NDJSON Streaming
Object listings are streamed to the frontend as Newline-Delimited JSON (NDJSON). Instead of waiting for the entire listing to complete, the backend writes one JSON object per line as objects are discovered:
```
{"key":"file1.txt","size":1024,"last_modified":"2026-01-15T10:30:00Z"}
{"key":"file2.txt","size":2048,"last_modified":"2026-01-15T11:00:00Z"}
...
```

The React frontend renders rows incrementally as they arrive, so the UI feels responsive even for buckets with tens of thousands of objects. The @tanstack/react-virtual library virtualizes the list so only visible rows are rendered in the DOM.
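On the backend, NDJSON streaming reduces to a generator that emits one JSON line per object; in FastAPI such a generator would typically be wrapped in a StreamingResponse. A minimal sketch:

```python
import json

def ndjson_stream(objects):
    # Yield one JSON object per line as results arrive, instead of
    # buffering the full listing in memory before responding.
    for obj in objects:
        yield json.dumps(obj, separators=(",", ":")) + "\n"

lines = list(ndjson_stream([
    {"key": "file1.txt", "size": 1024,
     "last_modified": "2026-01-15T10:30:00Z"},
    {"key": "file2.txt", "size": 2048,
     "last_modified": "2026-01-15T11:00:00Z"},
]))
```

Because each line is a complete JSON document, the frontend can parse and render every row the moment it arrives, without waiting for a closing bracket.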
Multi-Endpoint S3 Support
Sairo can connect to multiple S3-compatible storage backends simultaneously. The S3ClientManager uses Python’s contextvars to route each HTTP request to the correct boto3 S3 client based on the selected endpoint.
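The routing idea can be sketched with a ContextVar holding the active endpoint per request; the class and variable names here are illustrative, not Sairo's actual API:

```python
import contextvars

# Each request sets the active endpoint in a ContextVar; because
# ContextVars are per-task/per-thread, concurrent requests each see
# their own value without locking.
current_endpoint = contextvars.ContextVar("current_endpoint")

class S3ClientManager:
    def __init__(self):
        self._clients = {}  # endpoint name -> boto3 client (stubbed below)

    def register(self, name, client):
        self._clients[name] = client

    def client(self):
        # Resolved per request from the context, not from global state.
        return self._clients[current_endpoint.get()]

manager = S3ClientManager()
manager.register("minio-local", "client-A")      # stand-ins for boto3 clients
manager.register("aws-us-east-1", "client-B")

current_endpoint.set("minio-local")
picked = manager.client()
```

In a real app, middleware would call current_endpoint.set(...) from a request header or session before any handler touches S3.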
Endpoint credentials are encrypted at rest using Fernet symmetric encryption (from the cryptography library). The encryption key is derived from the JWT_SECRET.
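One plausible shape for that key derivation (the actual KDF is not specified here, so this is an assumption): hash the secret and base64-encode the digest into the 32-byte urlsafe key format that cryptography's Fernet expects:

```python
import base64
import hashlib

def fernet_key_from_secret(jwt_secret: str) -> bytes:
    # Hypothetical derivation: Fernet requires a urlsafe-base64 encoding
    # of exactly 32 bytes, and SHA-256 conveniently yields 32 bytes.
    digest = hashlib.sha256(jwt_secret.encode()).digest()
    return base64.urlsafe_b64encode(digest)

# The result could be passed straight to cryptography.fernet.Fernet(key).
key = fernet_key_from_secret("example-secret")
```

Deriving the key from JWT_SECRET means one secret protects both sessions and stored credentials, so rotating it invalidates both.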
```
Request → Endpoint Selection → contextvars → S3ClientManager → boto3 client → S3 Backend
```

Admins add, edit, and remove endpoints from the Endpoints page in the UI. Each endpoint has its own set of buckets and indexes.
Security Properties
- Non-root container — runs as UID 1000 with no capability additions
- No privilege escalation — container security context prevents escalation
- Encrypted credentials — stored endpoint credentials use Fernet encryption
- httpOnly cookies — JWT tokens cannot be read by client-side JavaScript
- bcrypt passwords — all passwords are hashed, never stored in plaintext
- Rate limiting — configurable per-endpoint rate limits prevent abuse