
Concepts

This page explains the core ideas behind Sairo so you can operate it effectively and troubleshoot when needed.

Sairo is a single container that sits between your browser and any S3-compatible storage backend. There are no external databases, no message queues, and no additional services to manage.

[Figure: Sairo architecture diagram showing the Client Layer, Sairo Container, Persistence Layer, and External Storage Layer]

Frontend

A React 18 single-page application served as static files by the same container. All UI assets are bundled at build time.

Backend

A FastAPI application that handles authentication, serves the API, manages the SQLite index, and proxies S3 operations.

Storage

Any S3-compatible endpoint (AWS S3, MinIO, Ceph, R2, B2, etc.) accessed via boto3 with Signature Version 4 authentication.

Index

Per-bucket SQLite databases with FTS5 full-text search. WAL mode enables concurrent reads during indexing.

Sairo maintains a lightweight local index of every object in your S3 buckets so that browsing and searching are instant, even across millions of objects.

Each bucket gets its own SQLite database file in the /data directory inside the container (configurable via DB_DIR):

  • /data/
    • media.db — Index for the “media” bucket
    • media.db-wal — Write-ahead log
    • backups.db — Index for the “backups” bucket
    • backups.db-wal
    • logs.db — Index for the “logs” bucket
    • logs.db-wal

This isolation means a large or slow bucket does not affect the responsiveness of others.

All databases use SQLite’s Write-Ahead Logging (WAL) mode. WAL lets readers proceed while the crawler writes new entries, so the UI stays responsive during indexing. Without WAL, browsing would block during recrawl cycles.
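As a minimal sketch of the per-bucket layout and WAL behavior described above, the snippet below opens one database file per bucket and enables WAL. The function name `open_bucket_index`, the `objects` table, and its columns are hypothetical and chosen only for illustration; Sairo's actual schema is not documented on this page.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_bucket_index(bucket: str, db_dir: str) -> sqlite3.Connection:
    """Open (or create) <db_dir>/<bucket>.db and switch it to WAL mode."""
    conn = sqlite3.connect(Path(db_dir) / f"{bucket}.db")
    # WAL is a persistent setting: it is recorded in the database file.
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema, limited to the metadata fields this page lists.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "key TEXT PRIMARY KEY, size INTEGER, last_modified TEXT, etag TEXT)"
    )
    conn.commit()
    return conn


def count_objects(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM objects").fetchone()[0]


if __name__ == "__main__":
    db_dir = tempfile.mkdtemp()  # stand-in for /data (DB_DIR)
    writer = open_bucket_index("media", db_dir)
    reader = open_bucket_index("media", db_dir)

    # The writer holds an open, uncommitted transaction...
    writer.execute(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        ("video/intro.mp4", 1024, "2024-03-01T00:00:00Z", "abc123"),
    )
    # ...and the reader still answers immediately from the last committed state.
    print("visible before commit:", count_objects(reader))
    writer.commit()
    print("visible after commit:", count_objects(reader))
```

Because each bucket has its own file, a long write transaction on one bucket's database never touches the others.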

Each database includes an FTS5 virtual table that indexes object keys. This is what powers the / search feature in the UI:

What’s indexed | Example match
Object key segments | invoices/2024/march/report.pdf matches march report
File extensions | .parquet matches all Parquet files
Prefix paths | backups/daily/ matches all daily backups

Results return in milliseconds, even across buckets with 100K+ objects.
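The three kinds of match in the table can be reproduced with a small FTS5 sketch. The table name `object_fts`, the helper names, and the use of FTS5's default tokenizer are assumptions; Sairo's real tokenizer settings and query handling may differ. It also assumes the Python build ships SQLite with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3


def build_search_index(keys):
    """Create an in-memory FTS5 table holding one row per object key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE object_fts USING fts5(key)")
    conn.executemany(
        "INSERT INTO object_fts (key) VALUES (?)", [(k,) for k in keys]
    )
    return conn


def search(conn, text):
    """Return keys that match every whitespace-separated term in `text`."""
    # Quote each term so '/', '.' and '-' are not parsed as FTS5 query syntax.
    # A quoted term containing separators becomes a phrase of adjacent tokens.
    terms = " ".join('"' + t.replace('"', '""') + '"' for t in text.split())
    if not terms:
        return []
    rows = conn.execute(
        "SELECT key FROM object_fts WHERE object_fts MATCH ? ORDER BY rank",
        (terms,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_search_index([
        "invoices/2024/march/report.pdf",
        "data/events.parquet",
        "backups/daily/db.tar.gz",
        "backups/weekly/db.tar.gz",
    ])
    for query in ("march report", ".parquet", "backups/daily/"):
        print(query, "->", search(conn, query))
```

With the default tokenizer, a key is split on non-alphanumeric characters, so a query such as backups/daily/ behaves as the phrase "backups daily" and can match anywhere in a key, not only at its start.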

Sairo’s background crawler keeps the local index in sync with your S3 storage.

  1. Startup crawl

    When the container starts, Sairo immediately begins a full crawl of every accessible bucket. It uses ListObjectsV2 to enumerate all objects and inserts their metadata into the corresponding SQLite databases.

  2. Parallel indexing

    The crawler runs 4 threads per bucket, each listing a different prefix in parallel. For buckets with deep prefix hierarchies, this dramatically reduces indexing time compared to a single-threaded approach.

  3. Metadata stored

    For each object, the index stores:

    • Object key (full path)
    • Size in bytes
    • Last modified timestamp
    • ETag

  4. Auto-recrawl loop

    After the initial crawl completes, Sairo waits RECRAWL_INTERVAL seconds (default: 120) and then starts another crawl. This loop repeats indefinitely, so the index lags the bucket by roughly one interval plus the duration of a crawl at most.
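The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Sairo's crawler: `client` is any object exposing `list_objects_v2` the way a boto3 S3 client does, the caller supplies the list of prefixes (how Sairo partitions a bucket is not described here), and the function names and the `objects` table are hypothetical.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor


def create_index(conn: sqlite3.Connection) -> None:
    # Hypothetical schema holding the four metadata fields listed above.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects ("
        "key TEXT PRIMARY KEY, size INTEGER, last_modified TEXT, etag TEXT)"
    )


def list_prefix(client, bucket, prefix):
    """Yield (key, size, last_modified, etag) for every object under prefix."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield (obj["Key"], obj["Size"],
                   obj["LastModified"].isoformat(), obj["ETag"])
        if not page.get("IsTruncated"):
            return
        # ListObjectsV2 is paginated; follow the continuation token.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def crawl_bucket(client, bucket, prefixes, conn, workers=4):
    """List prefixes in parallel, then write rows on the calling thread."""
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(
            lambda p: list(list_prefix(client, bucket, p)), prefixes
        )
        # Only listing is parallel; the sqlite3 connection stays on one thread.
        for rows in batches:
            conn.executemany(
                "INSERT OR REPLACE INTO objects VALUES (?, ?, ?, ?)", rows
            )
            total += len(rows)
    conn.commit()
    return total


def recrawl_forever(crawl_once, interval=120, sleep=time.sleep, max_cycles=None):
    """Crawl, wait `interval` seconds, repeat. max_cycles exists for testing."""
    cycles = 0
    while True:
        crawl_once()
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            return
        sleep(interval)
```

With a real backend, `client` would be something like `boto3.client("s3", endpoint_url=...)`, and `crawl_once` would call `crawl_bucket` for each accessible bucket. Objects that sit outside every supplied prefix would be missed, so a real crawler also needs to cover keys at the bucket root.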

Sairo uses a simple but secure authentication system based on JSON Web Tokens.

  1. Login — User submits credentials (local, LDAP, or OAuth). The server validates them and issues a signed JWT.

  2. Cookie storage — The JWT is stored in an httpOnly, Secure cookie. The httpOnly flag hides the cookie from JavaScript, which prevents token theft through XSS. The Secure flag ensures the cookie is sent only over HTTPS.

  3. Request auth — Every subsequent API request includes the cookie automatically. The server verifies the JWT signature and checks expiration.

  4. Session expiry — Tokens expire after SESSION_HOURS hours (default: 24). The user must re-authenticate.
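The login flow above can be illustrated with a standard-library sketch that signs a token, verifies its signature and expiry, and builds the cookie header. The HS256 algorithm, the claim names, the cookie name `session`, and all function names are assumptions made for the example; the page does not say which algorithm or JWT library Sairo uses.

```python
import base64
import hashlib
import hmac
import json
import time
from http.cookies import SimpleCookie


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(username, secret, session_hours=24, now=None):
    """Return an HS256-signed JWT that expires after session_hours."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "iat": now, "exp": now + session_hours * 3600}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the payload if the signature is valid and unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    if not isinstance(payload, dict) or payload.get("exp", 0) <= now:
        return None
    return payload


def session_cookie_header(token, session_hours=24):
    """Build a Set-Cookie value with the httpOnly and Secure flags set."""
    cookie = SimpleCookie()
    cookie["session"] = token  # hypothetical cookie name
    cookie["session"]["httponly"] = True
    cookie["session"]["secure"] = True
    cookie["session"]["path"] = "/"
    cookie["session"]["max-age"] = session_hours * 3600
    return cookie["session"].OutputString()
```

A production service would normally rely on a maintained JWT library rather than hand-rolled signing; the sketch only makes the sign, verify, and expire steps visible.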

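Sairo supports two roles with granular bucket-level permissions. Administrators have the capabilities listed below; viewers can access only the buckets an administrator grants them: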

  • Full access to all buckets
  • User management (create, edit, delete accounts)
  • System settings and configuration
  • Audit log access
  • Can grant/revoke bucket permissions for viewers
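One way to picture bucket-level permissions is a per-user set of granted buckets that only an administrator may change. The sketch below is purely illustrative: the `User` class, the role strings "admin" and "viewer", and the function names are hypothetical and do not reflect Sairo's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    role: str  # "admin" or "viewer" (assumed role identifiers)
    buckets: set = field(default_factory=set)  # buckets granted to a viewer


def can_access(user: User, bucket: str) -> bool:
    """Admins see every bucket; viewers see only what was granted."""
    return user.role == "admin" or bucket in user.buckets


def grant(actor: User, target: User, bucket: str) -> None:
    """Grant a bucket to a user; only admins may do this."""
    if actor.role != "admin":
        raise PermissionError(f"{actor.name} cannot grant bucket permissions")
    target.buckets.add(bucket)


def revoke(actor: User, target: User, bucket: str) -> None:
    """Revoke a bucket from a user; only admins may do this."""
    if actor.role != "admin":
        raise PermissionError(f"{actor.name} cannot revoke bucket permissions")
    target.buckets.discard(bucket)
```

For example, after `grant(admin, viewer, "media")`, `can_access(viewer, "media")` is true while `can_access(viewer, "backups")` remains false.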

Sairo supports multiple authentication backends. They can be used independently or combined:

Provider | Use case | Config
Local | Default admin account, standalone users | ADMIN_USER, ADMIN_PASS
LDAP | Corporate directory integration | LDAP_* env vars
Google OAuth | Google Workspace SSO | OAUTH_GOOGLE_* env vars
GitHub OAuth | GitHub organization SSO | OAUTH_GITHUB_* env vars
API Tokens | CI/CD pipelines, automation | Generated in the UI

See Authentication and OAuth & LDAP for setup guides.