Documentation
Technical Resources
Full technical documentation, deployment guides, and configuration references are provided during evaluation.
Architecture: System design, memory tiers, policy engine
Deployment: Installation, configuration, production setup
API & CLI: Commands, metrics, telemetry endpoints
Hardware Platforms
NVIDIA CUDA: RTX series, A-series, H100, H200
AMD ROCm: MI series accelerators
Intel Gaudi / Xeon: AI accelerators, CPU
Google TPU: v4, v5 pods
Qualcomm AI: Edge accelerators
Apple Silicon: M-series processors
CPU Servers: x86, ARM architectures
Custom Hardware: Additional platforms on request
Inference Backends
PyTorch (Supported): Native inference
vLLM (Supported): PagedAttention optimization
llama.cpp (Supported): GGUF models, CPU/GPU
TensorRT-LLM (Roadmap): NVIDIA optimization
Triton (Roadmap): NVIDIA inference server
Ollama (Roadmap): Developer tooling
HuggingFace Transformers (Roadmap): Direct library integration
SGLang (Roadmap): Structured generation
MLC LLM (Roadmap): Universal deployment
ExLlamaV2 (Roadmap): GPTQ inference
Custom Backends (On Request): Additional engines and integrations