[ For Research Partners ]

Research-grade AI. Production-grade infrastructure.

The platform research teams build on when the model has to run on real hardware. Constrained, sovereign, or air-gapped. Same model. Your environment.

  • NVIDIA Inception
  • AWS Activate
  • Microsoft for Startups
  • Garage&Co

Program names and logos are property of their respective owners.

[ Two Paths ]

Shrink the model. Or keep it.

The conventional path makes the model smaller until it fits the hardware. We take the opposite path. Keep the model. Make the hardware reach further.

The conventional path

Shrink the model

Distil, quantise, prune. The model that ships is not the model that was trained.

The Sector88 path

Keep the model. Tier the memory.

We orchestrate model weights across whatever memory your system has available. The model that ships is the model that was trained.

[ The Hard Part ]

A systems problem. Years in the making.

Memory management, kernel selection, hardware abstraction, deployment, fleet telemetry, security hardening. All of it has to work together, on every device, in every environment.

What it looks like

The duct-tape stack

Patched inference servers, custom CUDA kernels, bash scripts for deployment. Works on one machine until the hardware changes.

What we built

One integrated platform

Memory tiering, engine selection, fleet control, air-gapped deploy, audit, identity. Tested across Jetson, x86, GPU, CPU-only.

[ Runtime ]

Memory hierarchy.
In production.

Runtime probes the hardware, picks the engine, and tiers the model across whatever memory is available. GPU, system RAM, and disk. p95 latency stays predictable. The model that was trained is the model that serves.

Same Runtime on a Jetson at a ground station, a research workstation, and a server rack in a SCIF.

Explore Runtime
s88 serve --model Llama-3-70B-Q4_K_M
INITIALIZING

Llama-3-70B-Q4_K_M

GGUF Q4_K_M 70B params
Detected

Backend Selection

Auto
llama.cpp vLLM TensorRT-LLM Triton

Memory Hierarchy

PASS

VRAM (Tier 1)

16.8 / 24 GB

RAM (Tier 2)

42.3 / 64 GB

SSD Cache (Tier 3)

128 / 512 GB

Serving

localhost:8088/v1/chat/completions

Throughput

7.8 tok/s

Latency

118 ms

OOM Events

0

Uptime

0s

[ Hub ]

Every node. One pane.

Canary deploys, rollback on failure, hot-swap models, rotate credentials. Same model promoted across the lab, the field, and the facility.

Audit trails into your SIEM. Identity through your IdP. Telemetry stays local, syncs when connectivity does.

Explore Hub
Sector88 Hub
Live

Nodes

4

Serving

3

Fleet Uptime

99.9%

OOM Events

0

Active Deployments

ground-station-08

Svalbard, Norway
Llama-3-70B llama.cpp
VRAM 16.8/24 tok/s 7.8 22d up

ops-center-03

Edwards AFB, CA
Mistral-7B vLLM
VRAM 5.2/16 tok/s 24.1 8d up

rig-platform-11

North Sea, Offshore
Llama-3-8B llama.cpp
VRAM 6.1/8 tok/s 18.6 45d up

datacenter-sg-02

Singapore, APAC Warming
Qwen2-72B TensorRT-LLM
VRAM -- tok/s -- 0s up

Activity

2m agoModel Llama-3-70B serving on ground-station-08
5m agoPreflight passed on datacenter-sg-02. Loading Qwen2-72B.
18m agoTier swap on rig-platform-11. 2 layers RAM → VRAM.

[ How We Engage ]

Three ways in.

We work with research partners in a few different shapes. Each one starts the same way: a real environment and a real problem.

01

Hands-on access

We give your team direct access to the platform in a real environment. You run the models, we sit alongside as you go.

02

Co-funded research

We co-apply on research programs where you bring the academic side and we bring the industry side. National and international funds.

03

Embedded partnership

Longer-term research engagements where we deploy alongside your program. Shape and terms scoped to the relationship.

[ For the Engineers ]

The technical shape.

Fig 3.1

Memory orchestration

Memory orchestration

Weight tiering across VRAM, RAM, and NVMe. p95 latency stays predictable under load.

Fig 3.2

Model support

Model support

Open-weight transformers, custom fine-tunes, multimodal. vLLM, Triton, llama.cpp serving paths.

Fig 3.3

Validated hardware

Validated hardware

Jetson Orin and Orin NX, industrial edge PCs, x86 with or without GPUs, CPU-only environments.

Fig 3.4

Air-gapped install

Air-gapped install

Single Helm chart. Offline model registry. Signed bundles. Zero external dependencies.

Fig 3.5

Audit and identity

Audit and identity

Audit trails into your SIEM. Identity through your IdP. RBAC at model and node level.

Fig 3.6

Sovereign deployment

Sovereign deployment

Data stays on premises. Models stay on premises. No phone-home. No outbound calls.

[ Where It Applies ]

Real environments. Real constraints.

Fig 4.1

Ground station deployment

Ground station

Vision-language model on a Jetson Orin. Imagery processed at the edge. Same artifact promotes to a server rack.

Fig 4.2

Air-gapped sovereign facility

Sovereign facility

LLM on classified hardware. Air-gapped install from media. No phone-home, no licence server, no outbound calls.

Fig 4.3

University lab fleet

University lab

Multi-model benchmarking on shared, mixed-generation GPUs. Auto-detect, auto-tier, version-tracked results.

[ Research Output ]

Publishable work. Not just a deployment.

Benchmarks

Benchmark data

Latency, throughput, and memory data across hardware. Reproducible and citable.

Co-authored papers

Research publications

Memory-tiered inference, edge deployment, sovereign AI. Your research, our experimental foundation.

Grant support

Grant evidence

Deployment evidence, validation data, and industry letters for funding applications.

Build with us.

We work with a small number of research partners at a time. Tell us about the environment, the hardware, and the model you are trying to run.