Run AI entirely in your browser
Your conversations never leave your device. No servers, no API keys, no telemetry. Just you and a model running on your own hardware.
Why ThinkHere?
Privacy First
Your data never leaves your device. Every conversation, every prompt, every response stays in your browser. There is no server to breach because there is no server at all.
Zero Infrastructure
No servers to manage, no API keys to rotate, no cloud bills. Open the page, load a model, and start chatting. Everything runs on WebGPU right in your browser tab.
Open Core
The free tier is MIT licensed and fully open source. Inspect the code, fork it, contribute to it. Additional features are available with a free account, and premium features with a paid subscription.
Three paths to in-browser AI
ThinkHere supports three inference backends, each with its own trade-offs. All three accelerate inference with WebGPU (one can fall back to WebAssembly), and all cache model weights in your browser after the first download.
WebLLM + MLC
Models are compiled ahead-of-time using Apache TVM / MLC (Machine Learning Compilation). The compiler transforms model weights and operations into optimized WebGPU compute shaders that run directly on your GPU.
- Fast inference — kernels are pre-optimized
- First run compiles shaders for your GPU (cached after)
- Model must be specifically compiled for MLC
Used by
SmolLM2 360M, SmolLM2 1.7B, Qwen3 4B, Phi-3.5 Mini, Llama 3.2 1B
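As a rough sketch of what this backend involves, here is how a WebLLM chat session is typically set up. The model ID below is illustrative (an MLC-compiled Llama 3.2 1B build from the public WebLLM model list) and may not match the exact builds ThinkHere ships:

```javascript
// Sketch: chat with an MLC-compiled model via WebLLM.
// Assumes the @mlc-ai/web-llm npm package and a WebGPU-capable browser.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// The first run downloads weights and compiles WebGPU shaders for your GPU;
// both are cached, so subsequent loads are much faster.
const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
  initProgressCallback: (report) => console.log(report.text),
});

// WebLLM exposes an OpenAI-style chat completions API.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Explain WebGPU in one sentence." }],
});
console.log(reply.choices[0].message.content);
```

Everything here runs in the browser tab: the only network traffic is the initial weight download.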
Transformers.js + ONNX Runtime Web
Models are stored in the standard ONNX (Open Neural Network Exchange) format. ONNX Runtime Web interprets the model graph at load time and executes it on your GPU via WebGPU, or falls back to WebAssembly on unsupported hardware.
- Supports any model exportable to ONNX
- Can fall back to WASM if WebGPU is unavailable
- Slightly more overhead than pre-compiled kernels
Used by
Qwen3.5 0.8B, Qwen3.5 2B, Qwen3.5 4B
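For comparison, the same task through Transformers.js is usually a pipeline call. The model ID below is an illustrative ONNX community build, not necessarily the one ThinkHere uses:

```javascript
// Sketch: text generation with Transformers.js over ONNX Runtime Web.
// Assumes the @huggingface/transformers npm package (v3+).
import { pipeline } from "@huggingface/transformers";

// Request WebGPU; on hardware without WebGPU support, Transformers.js
// can run the same ONNX graph on its WebAssembly backend instead.
const generate = await pipeline(
  "text-generation",
  "onnx-community/Qwen2.5-0.5B-Instruct",
  { device: "webgpu" },
);

const output = await generate("What is ONNX?", { max_new_tokens: 64 });
console.log(output[0].generated_text);
```

Because the graph is interpreted at load time rather than pre-compiled, any model exportable to ONNX works, at the cost of some per-operation overhead.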
MediaPipe + LiteRT
Google's MediaPipe LLM Inference API loads Gemma models in the LiteRT format (formerly TFLite). It supports multimodal input (text and images), all processed on-device via WebGPU.
- Multimodal: text and image input
- Single large file download (no split shards)
- Requires WebGPU — no WASM fallback
Used by
Gemma 3n E2B, Gemma 3n E4B
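A minimal text-only sketch of this backend, assuming the `@mediapipe/tasks-genai` package and a placeholder path to a LiteRT (`.task`) Gemma bundle; the CDN URL and model filename are illustrative:

```javascript
// Sketch: Gemma inference through MediaPipe's LLM Inference API.
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Load the WASM files that host the task runtime, then create the
// inference engine. WebGPU is required; there is no WASM compute fallback.
const genaiFileset = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm",
);
const llm = await LlmInference.createFromOptions(genaiFileset, {
  baseOptions: { modelAssetPath: "/models/gemma-3n-e2b.task" },
  maxTokens: 512,
});

console.log(await llm.generateResponse("Summarize WebGPU in one line."));
```

Image input goes through the same API with a multimodal prompt; the text-only call above is the simplest starting point.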
Ready to try private AI that runs entirely on your device?