Run AI entirely in your browser
Your conversations never leave your device. No servers, no API keys, no telemetry. Just you and a model running on your own hardware.
Why ThinkHere?
Privacy First
Your data never leaves your device. Every conversation, every prompt, every response stays in your browser. There is no server to breach because there is no server at all.
Zero Infrastructure
No servers to manage, no API keys to rotate, no cloud bills. Open the page, load a model, and start chatting. Everything runs on WebGPU right in your browser tab.
Open Core
The free tier is MIT licensed and fully open source. Inspect the code, fork it, contribute to it. Additional features are available with a free account, and premium features with a paid subscription.
Two paths to in-browser AI
ThinkHere supports two inference backends, each with its own trade-offs. Both use WebGPU for GPU acceleration, and all model weights are cached in your browser after the first download.
WebLLM + MLC
Models are compiled ahead-of-time using Apache TVM / MLC (Machine Learning Compilation). The compiler transforms model weights and operations into optimized WebGPU compute shaders that run directly on your GPU.
- Fast inference — kernels are pre-optimized
- First run compiles shaders for your GPU (cached after)
- Model must be specifically compiled for MLC
Used by
SmolLM2 1.7B, Mistral 7B, Llama 3.2 1B
Transformers.js + ONNX Runtime Web
Models are stored in the standard ONNX (Open Neural Network Exchange) format. ONNX Runtime Web interprets the model graph at load time and executes it on your GPU via WebGPU, or falls back to WebAssembly on unsupported hardware.
- Supports any model exportable to ONNX
- Can fall back to WASM if WebGPU is unavailable
- Slightly more overhead than pre-compiled kernels
Used by
Gemma 4 E2B, Gemma 4 E4B, Qwen3.5 0.8B, Qwen3.5 2B, Qwen3.5 4B
Ready to try private AI that runs entirely on your device?