Concepts

This page explains the core ideas behind Sairo so you can operate it effectively and troubleshoot when needed.

Architecture overview

Sairo is a single container that sits between your browser and any S3-compatible storage backend. There are no external databases, no message queues, and no additional services to manage.

Sairo architecture diagram: Client Layer, Sairo Container, Persistence Layer, and External Storage Layer.

Frontend

A React 18 single-page application served as static files by the same container. All UI assets are bundled at build time.

Backend

A FastAPI application that handles authentication, serves the API, manages the SQLite index, and proxies S3 operations.

Storage

Any S3-compatible endpoint (AWS S3, MinIO, Ceph, R2, B2, etc.) accessed via boto3 with Signature Version 4 authentication.

Index

Per-bucket SQLite databases with FTS5 full-text search. WAL mode enables concurrent reads during indexing.

SQLite index

Sairo maintains a lightweight local index of every object in your S3 buckets so that browsing and searching are instant, even across millions of objects.

Per-bucket databases

Each bucket gets its own SQLite database file in the /data directory inside the container (configurable via DB_DIR):

/data/
- media.db: index for the “media” bucket
- media.db-wal: write-ahead log for media.db
- backups.db: index for the “backups” bucket
- backups.db-wal: write-ahead log for backups.db
- logs.db: index for the “logs” bucket
- logs.db-wal: write-ahead log for logs.db

This isolation means a large or slow bucket does not affect the responsiveness of others.
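Below is a minimal Python sketch of the one-file-per-bucket layout. It assumes the file name is the bucket name plus `.db`, as in the listing above, and that `DB_DIR` falls back to `/data`. The helper names and the `objects` table are hypothetical and are not Sairo's actual schema.

```python
import os
import re
import sqlite3
from typing import Optional

# S3 bucket names are 3-63 characters of lowercase letters, digits, dots and
# hyphens, starting and ending with a letter or digit. Validating the name
# keeps input like "../etc" from escaping the data directory.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def db_path_for_bucket(bucket: str, db_dir: Optional[str] = None) -> str:
    """Return the index file for one bucket, e.g. /data/media.db."""
    db_dir = db_dir or os.environ.get("DB_DIR", "/data")
    if not BUCKET_NAME.fullmatch(bucket):
        raise ValueError(f"not a valid S3 bucket name: {bucket!r}")
    return os.path.join(db_dir, f"{bucket}.db")


def open_bucket_index(bucket: str, db_dir: Optional[str] = None) -> sqlite3.Connection:
    """Open (creating if needed) the SQLite file that belongs to one bucket."""
    path = db_path_for_bucket(bucket, db_dir)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    conn = sqlite3.connect(path)
    # Hypothetical minimal schema for the per-object metadata.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "key TEXT PRIMARY KEY, size INTEGER, last_modified TEXT, etag TEXT)"
    )
    return conn
```

SQLite takes its locks per database file. Giving each bucket its own file and connection therefore means a long write transaction on one bucket's index never holds a lock on another bucket's index. Some S3-compatible backends accept names outside the AWS rules checked here, so a real deployment might need a looser pattern.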

WAL mode

All databases use SQLite’s Write-Ahead Logging (WAL) mode. WAL lets readers keep working while the crawler writes new entries, so the UI stays responsive during indexing. This is critical: in SQLite’s default rollback-journal mode, a writer needs exclusive access to the file to commit, so browsing would block during recrawl cycles.
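The WAL behavior can be illustrated with Python's built-in `sqlite3` module. In the sketch below, a reader connection keeps answering queries from the last committed snapshot while a writer, standing in for the crawler, has an uncommitted insert open. The file and table names are illustrative only.

```python
import os
import sqlite3
import tempfile


def demo_wal_snapshot():
    """Return (journal_mode, rows seen before commit, rows seen after commit)."""
    path = os.path.join(tempfile.mkdtemp(), "media.db")

    writer = sqlite3.connect(path)
    # journal_mode=WAL is persistent: it is recorded in the database file.
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute("CREATE TABLE objects (key TEXT PRIMARY KEY)")
    writer.commit()

    reader = sqlite3.connect(path)

    # The "crawler" inserts a row; Python's sqlite3 opens a transaction
    # implicitly, and it stays open until commit().
    writer.execute("INSERT INTO objects VALUES ('invoices/2024/march/report.pdf')")
    # The "UI" is not blocked and reads the last committed snapshot.
    before = reader.execute("SELECT COUNT(*) FROM objects").fetchone()[0]

    writer.commit()
    after = reader.execute("SELECT COUNT(*) FROM objects").fetchone()[0]

    reader.close()
    writer.close()
    return mode, before, after


print(demo_wal_snapshot())  # ('wal', 0, 1)
```

WAL needs a file-backed database. An in-memory database reports `memory` instead of `wal` for this pragma.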

Full-text search (FTS5)

Each database includes an FTS5 virtual table that indexes object keys. This is what powers the / search feature in the UI:

What’s indexed	Example match
Object key segments	`invoices/2024/march/report.pdf` matches `march report`
File extensions	`.parquet` matches all Parquet files
Prefix paths	`backups/daily/` matches all daily backups

Results return in milliseconds, even across buckets with 100K+ objects.
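Here is a sketch of key search using SQLite's FTS5 extension, which is included in most Python builds. It assumes FTS5's default `unicode61` tokenizer. That tokenizer lowercases text and splits keys on punctuation such as `/`, `.` and `-`, so path segments and file extensions become separate tokens. Raw input like `.parquet` is not valid FTS5 query syntax, so the hypothetical `to_fts_query` helper turns the user's text into quoted terms that must all match. This page does not describe how Sairo itself builds its queries.

```python
import re
import sqlite3


def build_index(keys):
    """Create an in-memory FTS5 table over object keys."""
    conn = sqlite3.connect(":memory:")
    # Default unicode61 tokenizer: case-folds and splits on punctuation.
    conn.execute("CREATE VIRTUAL TABLE keys_fts USING fts5(key)")
    conn.executemany("INSERT INTO keys_fts(key) VALUES (?)", [(k,) for k in keys])
    conn.commit()
    return conn


def to_fts_query(text: str) -> str:
    """Turn free text such as '.parquet' into '"parquet"' (all terms must match)."""
    # Quoting each word avoids FTS5 syntax errors on characters like '.' or '/'.
    return " ".join(f'"{word}"' for word in re.findall(r"\w+", text))


def search(conn, text: str, limit: int = 50):
    query = to_fts_query(text)
    if not query:
        return []
    rows = conn.execute(
        "SELECT key FROM keys_fts WHERE keys_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [key for (key,) in rows]


conn = build_index([
    "invoices/2024/march/report.pdf",
    "warehouse/events/part-0001.parquet",
    "backups/daily/db-2024-03-01.tar.gz",
    "backups/weekly/db-2024-03-03.tar.gz",
])
print(search(conn, "march report"))    # ['invoices/2024/march/report.pdf']
print(search(conn, ".parquet"))        # ['warehouse/events/part-0001.parquet']
print(search(conn, "backups/daily/"))  # ['backups/daily/db-2024-03-01.tar.gz']
```

Token matching is looser than a true prefix filter: `"backups" "daily"` would also match a key such as `archive/daily/backups.tar`. Enforcing a strict path prefix would need an extra condition, for example `key LIKE 'backups/daily/%'`.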

Crawl lifecycle

Sairo’s background crawler keeps the local index in sync with your S3 storage.

Startup crawl

When the container starts, Sairo immediately begins a full crawl of every accessible bucket. It uses ListObjectsV2 to enumerate all objects and inserts their metadata into the corresponding SQLite databases.

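Listing a whole bucket means following ListObjectsV2 pagination, because each response returns at most 1,000 keys. The sketch below uses boto3's built-in paginator. The function name is hypothetical. The client is passed in as a parameter so the same code works against any S3-compatible endpoint; the endpoint URL in the usage comment is a placeholder.

```python
from typing import Any, Dict, Iterator


def list_all_objects(s3: Any, bucket: str, prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield every object under `prefix`, following ListObjectsV2 pagination."""
    # The paginator re-issues ListObjectsV2 with the previous page's
    # continuation token until the listing is exhausted.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no "Contents" key at all.
        yield from page.get("Contents", [])


# Usage (requires boto3; the endpoint URL is a placeholder):
#   import boto3
#   from botocore.config import Config
#   s3 = boto3.client(
#       "s3",
#       endpoint_url="https://s3.example.internal",
#       config=Config(signature_version="s3v4"),  # Signature Version 4
#   )
#   for obj in list_all_objects(s3, "media"):
#       print(obj["Key"], obj["Size"], obj["LastModified"], obj["ETag"])
```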
Parallel indexing

The crawler lists each bucket with 16 concurrent threads, each working on a different prefix. For buckets with deep prefix hierarchies, this cuts indexing time substantially compared with a single-threaded crawl. See Architecture for the full crawl model.

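The sketch below shows one way to get 16-way parallelism per bucket, using the standard library's `ThreadPoolExecutor`. It assumes the bucket has already been split into prefixes, for example the top-level `CommonPrefixes` returned by a ListObjectsV2 call with `Delimiter="/"`. It also assumes a `list_prefix` callable that lists a single prefix. Both the split and the callable are assumptions; the Architecture page describes Sairo's actual crawl model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

PREFIX_WORKERS = 16  # per-bucket thread count from the description above


def crawl_prefixes_parallel(
    prefixes: Iterable[str],
    list_prefix: Callable[[str], List[dict]],
    workers: int = PREFIX_WORKERS,
) -> Dict[str, List[dict]]:
    """List many prefixes concurrently, one prefix per task."""
    prefixes = list(prefixes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zip pairs each prefix with
        # its own listing even though the tasks may finish in any order.
        return dict(zip(prefixes, pool.map(list_prefix, prefixes)))


# Usage sketch with boto3 (not executed here; bucket name is a placeholder):
#   top = s3.list_objects_v2(Bucket="media", Delimiter="/")
#   prefixes = [p["Prefix"] for p in top.get("CommonPrefixes", [])]
#   listings = crawl_prefixes_parallel(prefixes, list_one_prefix)
# list_one_prefix is a hypothetical function that pages through one prefix.
# Keys at the bucket root come back in top["Contents"] and need separate handling.
```

Threads suit this work because it is I/O-bound: Python releases the GIL while a thread waits on the network. boto3 also documents its low-level clients as generally thread-safe, so one client can be shared across the workers.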
Metadata stored

For each object, the index stores:
- Object key (full path)
- Size in bytes
- Last modified timestamp
- ETag

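The sketch below stores those four fields from ListObjectsV2 `Contents` entries, which carry them as `Key`, `Size`, `LastModified` and `ETag`. The table layout, the ISO-8601 timestamp format and the choice to strip the quotes S3 puts around ETags are assumptions made for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "key TEXT PRIMARY KEY, size INTEGER NOT NULL, "
        "last_modified TEXT NOT NULL, etag TEXT NOT NULL)"
    )


def store_objects(conn: sqlite3.Connection, contents) -> int:
    """Insert or refresh one page of ListObjectsV2 'Contents' entries."""
    rows = [
        (
            obj["Key"],
            obj["Size"],
            # boto3 parses LastModified into a timezone-aware datetime.
            obj["LastModified"].isoformat(),
            # S3 returns ETags wrapped in double quotes; store the bare value.
            obj["ETag"].strip('"'),
        )
        for obj in contents
    ]
    # One transaction per page avoids paying for a commit on every row.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO objects (key, size, last_modified, etag) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```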
Auto-recrawl loop

After the initial crawl completes, Sairo waits RECRAWL_INTERVAL seconds (default: 120) and then refreshes the index. This loop repeats indefinitely, keeping the index fresh.

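The loop can be sketched with a `threading.Event`, which doubles as an interruptible sleep so that shutdown does not have to wait out the interval. `crawl_all` is a hypothetical callable that crawls every bucket. Only `RECRAWL_INTERVAL` and its 120-second default come from the text above.

```python
import logging
import os
import threading
from typing import Callable, Optional

log = logging.getLogger("crawler")


def recrawl_loop(
    crawl_all: Callable[[], None],
    stop: threading.Event,
    interval: Optional[float] = None,
) -> int:
    """Crawl once at startup, then again RECRAWL_INTERVAL seconds after each crawl."""
    if interval is None:
        interval = float(os.environ.get("RECRAWL_INTERVAL", "120"))
    cycles = 0
    while not stop.is_set():
        try:
            crawl_all()
        except Exception:
            # A failed cycle (e.g. a network error) is logged, not fatal.
            log.exception("crawl cycle failed")
        cycles += 1
        # Event.wait() sleeps up to `interval` seconds but returns True as
        # soon as stop.set() is called, so shutdown is immediate.
        if stop.wait(interval):
            break
    return cycles


# Typical use: run it on a background thread.
#   stop = threading.Event()
#   threading.Thread(target=recrawl_loop, args=(crawl_all, stop), daemon=True).start()
```

The wait starts only after each crawl returns, so the gap between the end of one crawl and the start of the next is the full interval.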
Adaptive delta crawling for large buckets

A small bucket is cheap to re-crawl in full, so Sairo re-crawls it in full every cycle. Re-walking a multi-million-object bucket every RECRAWL_INTERVAL (two minutes by default), however, would be slow and expensive. Once a bucket’s full crawl takes longer than LARGE_BUCKET_SECONDS (default: 60), Sairo switches that bucket to incremental delta crawls:

- Each cycle re-lists only the newest, “hot” prefixes (for example, today’s partition in a year=/month=/day= layout), so new objects are picked up within seconds.
- A full reconcile runs every FULL_CRAWL_INTERVAL seconds (default: 3600) to catch deletions and changes in cold prefixes.

This keeps huge buckets current without re-listing the whole bucket on every cycle.
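The decision above can be sketched as a small per-bucket policy. `LARGE_BUCKET_SECONDS`, `FULL_CRAWL_INTERVAL` and their defaults come from the text. The state record, the function names and the zero-padded `year=/month=/day=` format are hypothetical; this page does not describe how Sairo actually detects its hot prefixes.

```python
import os
import time
from dataclasses import dataclass
from datetime import date
from typing import Optional

LARGE_BUCKET_SECONDS = float(os.environ.get("LARGE_BUCKET_SECONDS", "60"))
FULL_CRAWL_INTERVAL = float(os.environ.get("FULL_CRAWL_INTERVAL", "3600"))


@dataclass
class BucketCrawlState:
    last_full_duration: Optional[float] = None  # seconds the last full crawl took
    last_full_at: Optional[float] = None        # when it finished (time.time())


def choose_crawl_mode(
    state: BucketCrawlState,
    now: Optional[float] = None,
    large_bucket_seconds: float = LARGE_BUCKET_SECONDS,
    full_crawl_interval: float = FULL_CRAWL_INTERVAL,
) -> str:
    """Return "full" or "delta" for the next cycle of one bucket."""
    now = time.time() if now is None else now
    if state.last_full_duration is None or state.last_full_at is None:
        return "full"   # never fully crawled: nothing to build a delta on
    if state.last_full_duration <= large_bucket_seconds:
        return "full"   # small bucket: re-crawling everything is cheap
    if now - state.last_full_at >= full_crawl_interval:
        return "full"   # periodic reconcile catches deletions and cold changes
    return "delta"      # large bucket: re-list only the hot prefixes


def hot_prefix(today: date, base: str = "") -> str:
    """Hypothetical hot prefix: today's partition in a year=/month=/day= layout."""
    # Zero-padded month/day values are an assumption about the key layout.
    return f"{base}year={today.year:04d}/month={today.month:02d}/day={today.day:02d}/"
```

A delta cycle never re-lists cold prefixes, so it cannot notice objects deleted there. That gap is why the full reconcile still has to run periodically.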

Authentication model

Sairo’s authentication is built on JSON Web Tokens (JWTs).

How it works

- Login: The user submits credentials (local, LDAP, or OAuth). The server validates them and issues a signed JWT.
- Cookie storage: The JWT is stored in an httpOnly, Secure cookie. httpOnly keeps page JavaScript, including injected XSS scripts, from reading the token; the Secure flag means the browser sends the cookie only over HTTPS.
- Request auth: The browser attaches the cookie to every subsequent API request automatically. The server verifies the JWT signature and checks its expiration.
- Session expiry: Tokens expire after SESSION_HOURS hours (default: 24), after which the user must sign in again.
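To make "verifies the JWT signature and checks expiration" concrete, here is a standard-library sketch of HS256 signing and verification, plus the `Set-Cookie` header with the two flags described above. It is not Sairo's code; a production FastAPI service would normally use a maintained JWT library. The HS256 algorithm, the `sub`/`iat`/`exp` claims and the cookie name `session` are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from http.cookies import SimpleCookie
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(username: str, secret: bytes, session_hours: int = 24,
                now: Optional[float] = None) -> str:
    """Create an HS256 JWT that expires after session_hours (SESSION_HOURS)."""
    iat = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": username, "iat": iat, "exp": iat + session_hours * 3600}
    parts = [_b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, claims)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims, or raise ValueError if the token is forged or expired."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{claims_b64}", secret)
    # compare_digest is designed to resist timing attacks on the comparison.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if (time.time() if now is None else now) >= claims["exp"]:
        raise ValueError("token expired")
    return claims


def session_cookie_header(token: str, session_hours: int = 24) -> str:
    """Build a Set-Cookie header carrying the JWT with HttpOnly and Secure set."""
    cookie = SimpleCookie()
    cookie["session"] = token
    cookie["session"]["httponly"] = True  # not readable from page JavaScript
    cookie["session"]["secure"] = True    # only sent over HTTPS
    cookie["session"]["max-age"] = session_hours * 3600
    return cookie.output(header="Set-Cookie:")
```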

Roles & permissions

Sairo supports two roles with granular bucket-level permissions:

Admin

- Full access to all buckets
- User management (create, edit, delete accounts)
- System settings and configuration
- Audit log access
- Granting and revoking bucket permissions for viewers

Viewer

- Access only to the buckets an admin has granted
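The permission rules above reduce to a small check, sketched here with hypothetical `User` records. Admins pass for every bucket, viewers pass only for buckets they have been granted, and only admins can change grants. This page does not describe how Sairo actually stores roles and grants.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class User:
    username: str
    role: str                                       # "admin" or "viewer"
    buckets: Set[str] = field(default_factory=set)  # grants, used for viewers


def can_access_bucket(user: User, bucket: str) -> bool:
    if user.role == "admin":
        return True                    # admins see every bucket
    if user.role == "viewer":
        return bucket in user.buckets  # viewers see only granted buckets
    return False                       # unknown roles are denied by default


def grant_bucket(actor: User, viewer: User, bucket: str) -> None:
    if actor.role != "admin":
        raise PermissionError("only admins can grant bucket permissions")
    viewer.buckets.add(bucket)


def revoke_bucket(actor: User, viewer: User, bucket: str) -> None:
    if actor.role != "admin":
        raise PermissionError("only admins can revoke bucket permissions")
    viewer.buckets.discard(bucket)
```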

Authentication providers

Sairo supports multiple authentication backends. They can be used independently or combined:

Provider	Use case	Config
Local	Default admin account, standalone users	`ADMIN_USER`, `ADMIN_PASS`
LDAP	Corporate directory integration	`LDAP_*` env vars
Google OAuth	Google Workspace SSO	`OAUTH_GOOGLE_*` env vars
GitHub OAuth	GitHub organization SSO	`OAUTH_GITHUB_*` env vars
API Tokens	CI/CD pipelines, automation	Generated in the UI

See Authentication and OAuth & LDAP for setup guides.
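Because providers can be combined, a deployment's active set depends on which variables are present. The sketch below detects that from the environment using only the variable names and prefixes in the table above. Its rules are assumptions, not Sairo's actual startup logic: a provider counts as configured when any non-empty variable with its prefix is set, and local login counts when both `ADMIN_USER` and `ADMIN_PASS` are set. API tokens are left out because they are created in the UI.

```python
import os
from typing import List, Mapping

# Prefixes from the providers table; the exact variable names behind each
# prefix are covered in the setup guides and are not assumed here.
PROVIDER_PREFIXES = {
    "ldap": "LDAP_",
    "google": "OAUTH_GOOGLE_",
    "github": "OAUTH_GITHUB_",
}


def enabled_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical heuristic: list providers that appear to be configured."""
    providers = []
    if env.get("ADMIN_USER") and env.get("ADMIN_PASS"):
        providers.append("local")
    for name, prefix in PROVIDER_PREFIXES.items():
        # Empty values are ignored so a blank placeholder does not enable a provider.
        if any(key.startswith(prefix) and value for key, value in env.items()):
            providers.append(name)
    return providers
```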