[ For Research Partners ]

Research-grade AI.
Production-grade infrastructure.

The platform research teams build on when the model has to run on real hardware. Constrained, sovereign, or air-gapped. Same model. Your environment.

Talk to the team How we engage

Program names and logos are property of their respective owners.

[ Two Paths ]

Shrink the model. Or keep it.

The conventional path makes the model smaller until it fits the hardware. We take the opposite path. Keep the model. Make the hardware reach further.

The conventional path

Shrink the model

Distil, quantise, prune. The model that ships is not the model that was trained.

The Sector88 path

Keep the model. Tier the memory.

We orchestrate model weights across whatever memory your system has available. The model that ships is the model that was trained.

[ The Hard Part ]

A systems problem. Years in the making.

Memory management, kernel selection, hardware abstraction, deployment, fleet telemetry, security hardening. All of it has to work together, on every device, in every environment.

What it looks like

The duct-tape stack

Patched inference servers, custom CUDA kernels, bash scripts for deployment. Works on one machine until the hardware changes.

What we built

One integrated platform

Memory tiering, engine selection, fleet control, air-gapped deploy, audit, identity. Tested across Jetson, x86, GPU, CPU-only.

[ Runtime ]

Memory hierarchy.
In production.

Runtime probes the hardware, picks the engine, and tiers the model across whatever memory is available. GPU, system RAM, and disk. p95 latency stays predictable. The model that was trained is the model that serves.

Same Runtime on a Jetson at a ground station, a research workstation, and a server rack in a SCIF.

Explore Runtime

s88 serve --model Llama-3-70B-Q4_K_M

INITIALIZING

Llama-3-70B-Q4_K_M

GGUF Q4_K_M 70B params

Detected

Backend Selection

Auto

llama.cpp vLLM TensorRT-LLM Triton

Memory Hierarchy

PASS

VRAM (Tier 1)

16.8 / 24 GB

RAM (Tier 2)

42.3 / 64 GB

SSD Cache (Tier 3)

128 / 512 GB

Serving

localhost:8088/v1/chat/completions

Throughput

7.8 tok/s

Latency

118 ms

OOM Events

Uptime

[ The Platform ]

Three layers. One platform.

Fig 1.1

Hub

The control plane. Manage models, nodes, and deployments across every site. Audit, RBAC, telemetry, version control.

Explore Hub

Fig 1.2

Runtime

The execution layer. Tiered memory across VRAM, RAM, NVMe. Models that should not fit run stable. GPU, CPU, or mixed.

Explore Runtime

Fig 1.3

Deploy

Single Helm chart. Air-gapped install from media. No external dependencies. Same artifact from lab to ground station to facility.

Explore Deploy

[ Hub ]

Every node. One pane.

Canary deploys, rollback on failure, hot-swap models, rotate credentials. Same model promoted across the lab, the field, and the facility.

Audit trails into your SIEM. Identity through your IdP. Telemetry stays local, syncs when connectivity does.

Explore Hub

Sector88 Hub Fleet Overview

Live

Nodes

Serving

Fleet Uptime

99.9%

OOM Events

Active Deployments

ground-station-08

Svalbard, Norway

Llama-3-70B llama.cpp

VRAM

16.8/24

tok/s

7.8

Uptime

22d

VRAM 16.8/24 tok/s 7.8 22d up

ops-center-03

Edwards AFB, CA

Mistral-7B vLLM

VRAM

5.2/16

tok/s

24.1

Uptime

VRAM 5.2/16 tok/s 24.1 8d up

rig-platform-11

North Sea, Offshore

Llama-3-8B llama.cpp

VRAM

6.1/8

tok/s

18.6

Uptime

45d

VRAM 6.1/8 tok/s 18.6 45d up

datacenter-sg-02

Singapore, APAC Warming

Qwen2-72B TensorRT-LLM

VRAM

tok/s

Uptime

VRAM -- tok/s -- 0s up

Activity

2m agoModel Llama-3-70B serving on ground-station-08

5m agoPreflight passed on datacenter-sg-02. Loading Qwen2-72B.

18m agoTier swap on rig-platform-11. 2 layers RAM → VRAM.

[ How We Engage ]

Three ways in.

We work with research partners in a few different shapes. Each one starts the same way: a real environment and a real problem.

Hands-on access

We give your team direct access to the platform in a real environment. You run the models, we sit alongside as you go.

Co-funded research

We co-apply on research programs where you bring the academic side and we bring the industry side. National and international funds.

Embedded partnership

Longer-term research engagements where we deploy alongside your program. Shape and terms scoped to the relationship.

[ For the Engineers ]

The technical shape.

Fig 3.1

Memory orchestration

Weight tiering across VRAM, RAM, and NVMe. p95 latency stays predictable under load.

Fig 3.2

Model support

Open-weight transformers, custom fine-tunes, multimodal. vLLM, Triton, llama.cpp serving paths.

Fig 3.3

Validated hardware

Jetson Orin and Orin NX, industrial edge PCs, x86 with or without GPUs, CPU-only environments.

Fig 3.4

Air-gapped install

Single Helm chart. Offline model registry. Signed bundles. Zero external dependencies.

Fig 3.5

Audit and identity

Audit trails into your SIEM. Identity through your IdP. RBAC at model and node level.

Fig 3.6

Sovereign deployment

Data stays on premises. Models stay on premises. No phone-home. No outbound calls.

[ Where It Applies ]

Real environments. Real constraints.

Fig 4.1

Ground station

Vision-language model on a Jetson Orin. Imagery processed at the edge. Same artifact promotes to a server rack.

Fig 4.2

Sovereign facility

LLM on classified hardware. Air-gapped install from media. No phone-home, no licence server, no outbound calls.

Fig 4.3

University lab

Multi-model benchmarking on shared, mixed-generation GPUs. Auto-detect, auto-tier, version-tracked results.

[ Research Output ]

Publishable work. Not just a deployment.

Benchmarks

Latency, throughput, and memory data across hardware. Reproducible and citable.

Co-authored papers

Memory-tiered inference, edge deployment, sovereign AI. Your research, our experimental foundation.

Grant support

Deployment evidence, validation data, and industry letters for funding applications.

Build with us.

We work with a small number of research partners at a time. Tell us about the environment, the hardware, and the model you are trying to run.

Talk to the team View partner programs

Research-grade AI. Production-grade infrastructure.

Shrink the model. Or keep it.

Shrink the model

Keep the model. Tier the memory.

A systems problem. Years in the making.

The duct-tape stack

One integrated platform

Memory hierarchy.In production.

Three layers. One platform.

Hub

Runtime

Deploy

Every node. One pane.

Three ways in.

Hands-on access

Co-funded research

Embedded partnership

The technical shape.

Memory orchestration

Model support

Validated hardware

Air-gapped install

Audit and identity

Sovereign deployment

Real environments. Real constraints.

Ground station

Sovereign facility

University lab

Publishable work. Not just a deployment.

Build with us.

Research-grade AI.
Production-grade infrastructure.

Memory hierarchy.
In production.