Run AI entirely in your browser

No servers, no API keys, no data leaves your device. Powered by WebLLM and Transformers.js — everything runs locally on your hardware.

First load downloads model weights to your browser — this is a one-time download. After that, the model loads from cache in seconds.
WebLLM models require WebGPU (Chrome 113+, Edge 113+). Transformers.js models use ONNX Runtime Web. All need network access to huggingface.co
How do these models run in your browser? ↓

Two paths to in-browser AI

WebLLM + MLC

HuggingFaceMLC-format weights
TVM / MLC CompilerAhead-of-time compilation
WebGPU Compute ShadersPre-optimized GPU kernels
Your GPU

Models are compiled ahead-of-time using Apache TVM / MLC (Machine Learning Compilation). The compiler transforms model weights and operations into optimized WebGPU compute shaders that run directly on your GPU.

Trade-offs
  • Fast inference — kernels are pre-optimized
  • First run compiles shaders for your GPU (cached after)
  • Model must be specifically compiled for MLC

Used by

SmolLM2 360M, SmolLM2 1.7B, Llama 3.2 1B, Phi-3.5 Mini

Transformers.js + ONNX Runtime Web

HuggingFaceONNX-format model graph + weights
ONNX Runtime WebBuilds execution plan at load time
WebGPU
WASM fallback
Your GPU / CPU

Models are stored in the standard ONNX (Open Neural Network Exchange) format. ONNX Runtime Web interprets the model graph at load time and executes it on your GPU via WebGPU, or falls back to WebAssembly on unsupported hardware.

Trade-offs
  • Supports any model exportable to ONNX
  • Can fall back to WASM if WebGPU is unavailable
  • Slightly more overhead than pre-compiled kernels

Used by

Qwen3 4B Instruct, GPT-OSS 20B

Both methods use WebGPU for GPU acceleration. All model weights are cached in your browser after the first download — no server involved.
Initializing engine…
Download
Compile
Ready
0% 0s elapsed
Downloading model weights — this only happens once, then it's cached locally.

System Prompt

Generation

Knowledge Base

Embedding model not loaded
No documents added
Drop file to add as context
Model loaded · all processing happens here
0 tokens
Ready

Conversations

No saved conversations yet
Loading Embedding Model
Downloading model weights — this only happens once.
0%