Air-gapped deployment.
A step-by-step guide to installing Sector88 Runtime and Hub in environments with no external network access. Based on field deployments in classified facilities and sovereign compute sites.
[ Overview ]
Air-gapped deployments are the default for defence, sovereign, and critical infrastructure sites. The challenge is not just getting software onto the machine — it is proving that the software will keep running without phoning home, without pulling updates, and without relying on a cloud API that does not exist.
This playbook covers the full lifecycle: hardware audit, offline packaging, install, model validation, security hardening, and operator handover. Every step has been tested in facilities where the only network is an internal air-gap.
[ At a Glance ]
- Prerequisites: x86_64 or ARM64, CUDA-capable GPU, 32GB+ RAM
- Install time: 2–4 hours depending on model size
- Network: None required at runtime
- Auth: API-key or no-auth, depending on the site's security regime
[ The Steps ]
Hardware audit
Before anything ships, we profile the target hardware remotely or on-site. CPU architecture, GPU VRAM, available RAM, NVMe capacity, and PCIe bandwidth. We also verify the OS kernel version, driver stack, and whether the environment is fully offline or has a bastion host. This audit produces a hardware-fit report that tells us which model sizes can run, which quantisation levels are required, and whether the node needs a GPU upgrade.
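The fit logic can be sketched roughly as follows. This is a hypothetical illustration, not Sector88's actual sizing tool: the model names, VRAM figures, headroom fraction, and the assumption that int4 quantisation roughly quarters the fp16 weight footprint are all placeholder values.

```python
# Hypothetical hardware-fit check: given a node's GPU VRAM, decide which
# model sizes fit and whether quantisation is required. Thresholds and
# model names are illustrative, not Sector88 specifics.

def hardware_fit(gpu_vram_gb: float, models: dict[str, float]) -> dict[str, str]:
    """Return a fit verdict per model: 'fp16', 'int4', or 'no-fit'.

    `models` maps model name -> fp16 VRAM requirement in GB. Assume int4
    quantisation roughly quarters the weight footprint, and keep ~20%
    VRAM headroom for KV cache and activations.
    """
    usable = gpu_vram_gb * 0.8  # 20% headroom under sustained load
    report = {}
    for name, fp16_gb in models.items():
        if fp16_gb <= usable:
            report[name] = "fp16"
        elif fp16_gb / 4 <= usable:
            report[name] = "int4"
        else:
            report[name] = "no-fit"
    return report

print(hardware_fit(24, {"7B": 14, "13B": 26, "70B": 140}))
# → {'7B': 'fp16', '13B': 'int4', '70B': 'no-fit'}
```

The same check, inverted, tells us when a node needs a GPU upgrade: if every target model lands in "no-fit", the audit flags the hardware before anything ships.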
Offline packaging
All runtime binaries, Hub containers, model weights, and dependency wheels are packaged into a single signed archive. The archive includes the exact CUDA driver version, Python wheels, and system libraries validated during the audit. Nothing pulls from the internet at install time. The archive is checksummed and GPG-signed so the receiving team can verify integrity before opening.
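The receiving team's integrity check can be as simple as the sketch below: recompute the archive's SHA-256 and compare it to the shipped digest. File paths are hypothetical, and GPG signature verification (e.g. `gpg --verify` against a detached signature) would happen alongside this; it is not shown here.

```python
# Receiving-side integrity check: recompute the archive's SHA-256 and
# compare against the shipped digest before opening. Paths are
# hypothetical; GPG verification happens separately.
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so multi-hundred-GB archives
        # never need to fit in memory.
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify_archive(archive_path: str, expected_digest: str) -> bool:
    # Normalise case and whitespace so a digest copied from a
    # checksum file does not produce a false mismatch.
    return sha256_of(archive_path) == expected_digest.strip().lower()
```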
Install and bootstrap
Runtime installs as a systemd service or Docker container depending on the site's security regime. Hub installs as a separate container with a local SQLite or Postgres backend. The install script is idempotent — if it fails halfway, re-running picks up where it left off. We do not require root beyond the initial service registration.
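The idempotent-resume behaviour follows a standard marker-file pattern, sketched below. Step names and the state directory are assumptions for illustration, not the actual install script.

```python
# Marker-file pattern for an idempotent installer: each step records a
# marker on success, and a re-run skips completed steps. Step names and
# the state directory are hypothetical.
import os

def run_steps(steps, state_dir="/var/lib/sector88/install-state"):
    os.makedirs(state_dir, exist_ok=True)
    for name, fn in steps:
        marker = os.path.join(state_dir, name + ".done")
        if os.path.exists(marker):
            continue  # already completed on a previous run
        fn()  # raises on failure, leaving no marker; a re-run resumes here
        open(marker, "w").close()
```

Because a failed step leaves no marker, re-running the installer retries exactly the step that failed and everything after it, with nothing repeated before it.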
Model loading and validation
Models load from the local filesystem archive, not a remote registry. Runtime verifies layer hashes before loading weights into memory. We run a validation suite: token generation latency at 1, 8, and 32 concurrent requests; VRAM headroom under sustained load; thermal baseline after 30 minutes of inference. Results are written to a local report the operator can review.
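The report the operator reviews has roughly the shape sketched below: percentile token latencies per concurrency level, serialised as JSON. The measurement harness itself is site-specific and not shown; field names and the percentile method are illustrative assumptions.

```python
# Illustrative shape of the local validation report: p50/p95/max token
# latency per concurrency level (1, 8, 32). Field names are assumptions,
# not the actual Sector88 report schema.
import json

def percentile(samples: list[float], q: float) -> float:
    # Nearest-rank percentile; adequate for a readable operator report.
    xs = sorted(samples)
    i = min(len(xs) - 1, round(q * (len(xs) - 1)))
    return xs[i]

def validation_report(latencies_ms: dict[int, list[float]]) -> str:
    report = {
        str(c): {
            "p50_ms": percentile(xs, 0.50),
            "p95_ms": percentile(xs, 0.95),
            "max_ms": max(xs),
        }
        for c, xs in latencies_ms.items()
    }
    return json.dumps(report, indent=2)
```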
Security hardening
Default install runs in dev mode. For production, we enable API key auth, disable the Hub UI if not required, configure audit logging to the local SIEM format, and set up the GPU watchdog with thermal and hang thresholds appropriate to the site's ambient conditions. We also verify no outbound network paths exist.
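A pre-handover check that the dev-mode defaults are actually gone might look like the sketch below. The config keys are hypothetical stand-ins for the real settings named above (auth, Hub UI, audit logging); the outbound-path and watchdog checks are done separately at the network and driver level.

```python
# Hypothetical hardening lint: flag any production config still carrying
# dev-mode defaults. Key names are illustrative, not the real schema.
def hardening_issues(cfg: dict) -> list[str]:
    issues = []
    if cfg.get("mode") != "production":
        issues.append("runtime still in dev mode")
    if not cfg.get("api_key_auth", False):
        issues.append("API key auth disabled")
    if cfg.get("hub_ui_enabled", True):
        issues.append("Hub UI enabled; disable if not required")
    if not cfg.get("audit_log_path"):
        issues.append("audit logging not configured")
    return issues
```

An empty list is the gate for handover; anything else goes back to the hardening step.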
Handover
The operator receives a runbook: how to start and stop services, how to read the /health endpoint, how to interpret thermal alerts, and who to call if the model fails to load. Our engineers leave. The platform stays.
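The runbook's /health interpretation can be reduced to a small decision rule, sketched here with assumed field names; the real payload schema ships with the runbook.

```python
# Sketch of how the runbook might interpret a /health payload.
# Field names (model_loaded, gpu_temp_c, thermal_limit_c) are
# assumptions for illustration.
def health_status(payload: dict) -> str:
    if not payload.get("model_loaded", False):
        return "CRITICAL: model failed to load; follow runbook escalation"
    if payload.get("gpu_temp_c", 0) >= payload.get("thermal_limit_c", 90):
        return "WARNING: thermal threshold reached; check airflow"
    return "OK"
```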
Deploy with us.
This playbook is a starting point. Every site is different. Our engineers will adapt it to yours.