No servers, no API keys, no data leaves your device. Powered by WebLLM and Transformers.js — everything runs locally on your hardware.
Models are compiled ahead-of-time with Apache TVM / MLC (Machine Learning Compilation). The compiler quantizes the weights and lowers the model's operators into optimized WebGPU compute shaders that run directly on your GPU; see the sketch after the model list.
Used by
SmolLM2 360M, SmolLM2 1.7B, Llama 3.2 1B, Phi-3.5 Mini
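A minimal sketch of loading and prompting one of these models with WebLLM, assuming the `@mlc-ai/web-llm` package; the model ID is taken from WebLLM's prebuilt model list and is illustrative:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads the precompiled weights and WebGPU shader binaries on first
// run, then serves them from the browser cache on subsequent loads.
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => console.log(report.text),
});

// OpenAI-style chat completion, executed entirely on the local GPU.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
});
console.log(reply.choices[0].message.content);
```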
Models are stored in the standard ONNX (Open Neural Network Exchange) format. ONNX Runtime Web loads the model graph and executes it on your GPU via WebGPU, falling back to WebAssembly on hardware without WebGPU support; a usage sketch follows the model list.
Used by
Qwen3 4B Instruct, GPT-OSS 20B
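A minimal sketch of the same flow with Transformers.js, assuming the `@huggingface/transformers` package; the model ID shown is illustrative:

```ts
import { pipeline } from "@huggingface/transformers";

// ONNX Runtime Web loads the graph and picks an execution backend:
// "webgpu" targets the GPU; omitting the option falls back to WASM.
const generate = await pipeline(
  "text-generation",
  "onnx-community/Qwen3-4B-ONNX", // illustrative model ID
  { device: "webgpu" },
);

const messages = [{ role: "user", content: "Explain ONNX in one sentence." }];
const output = await generate(messages, { max_new_tokens: 64 });

// The pipeline returns the full chat history; the last entry is the reply.
console.log(output[0].generated_text.at(-1).content);
```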