Run AI entirely in your browser

Your conversations never leave your device. No servers, no API keys, no telemetry. Just you and a model running on your own hardware.

Why ThinkHere?

Privacy First

Your data never leaves your device. Every conversation, every prompt, every response stays in your browser. There is no server to breach because there is no server at all.

Zero Infrastructure

No servers to manage, no API keys to rotate, no cloud bills. Open the page, load a model, and start chatting. Everything runs locally in your browser tab, accelerated by WebGPU.
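Since the whole experience depends on WebGPU being present in the tab, a page like this would typically check for it before offering a model. Below is a minimal TypeScript sketch. The helper name `hasWebGPU` and the small structural types are hypothetical (not ThinkHere's actual code), while `navigator.gpu` and `requestAdapter()` are the standard WebGPU entry points:

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GPULike {
  requestAdapter(): Promise<unknown>;
}
interface NavigatorLike {
  gpu?: GPULike;
}

// Resolves to true only if WebGPU is exposed *and* an adapter can be obtained.
// requestAdapter() resolves to null when no suitable adapter is available,
// so checking for navigator.gpu alone is not enough.
async function hasWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter != null;
  } catch {
    // Treat any failure while requesting an adapter as "no WebGPU".
    return false;
  }
}
```

In a browser you would pass the global `navigator`. Checking for an adapter, not just for `navigator.gpu`, matters because the API object can exist while no usable GPU adapter is available.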

Open Core

The free tier is MIT licensed and fully open source. Inspect the code, fork it, contribute to it. Additional features are available with a free account, and premium features with a paid subscription.

Two paths to in-browser AI

ThinkHere supports two inference backends, each with its own trade-offs. Both use WebGPU for GPU acceleration, and all model weights are cached in your browser after the first download.
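Caching weights after the first download amounts to a cache-first load. Below is a minimal sketch that assumes a hypothetical `loadWeights` helper and an injected download function. In a browser, the store would be persistent storage such as the Cache API or IndexedDB, not the in-memory `Map` used here:

```typescript
// Minimal storage interface. In a browser this role would be played by
// persistent storage; an in-memory Map is enough for the sketch.
interface WeightStore {
  get(key: string): Uint8Array | undefined;
  set(key: string, value: Uint8Array): void;
}

// Hypothetical downloader; in a browser this would wrap fetch().
type Fetcher = (url: string) => Promise<Uint8Array>;

// Cache-first loading: the network is hit only on the first request for a URL;
// every later load of the same URL is served from the store.
async function loadWeights(
  url: string,
  store: WeightStore,
  fetchBytes: Fetcher,
): Promise<Uint8Array> {
  const cached = store.get(url);
  if (cached !== undefined) return cached;
  const bytes = await fetchBytes(url);
  store.set(url, bytes);
  return bytes;
}
```

A real loader would also need to handle interrupted downloads and storage quota limits, which the sketch leaves out.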

WebLLM + MLC

HuggingFace (MLC-format weights) → TVM / MLC Compiler (ahead-of-time compilation) → WebGPU Compute Shaders (pre-optimized GPU kernels) → Your GPU

Models are compiled ahead of time using Apache TVM / MLC (Machine Learning Compilation). The compiler turns the model's operations into optimized WebGPU compute shaders that run directly on your GPU, while the weights are converted into the MLC format and loaded separately.

Trade-offs
  • Fast inference — kernels are pre-optimized
  • First run compiles shaders for your GPU (cached after)
  • Model must be specifically compiled for MLC
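The shader-caching trade-off above is a compile-once, reuse-later pattern. Below is a minimal in-memory sketch of that idea. It assumes a hypothetical `KernelCache` class and a caller-supplied `compile` function that stands in for the browser's shader compilation. It does not model how WebLLM or the browser actually persists compiled shaders across page reloads:

```typescript
// Hypothetical compiled-artifact type; a real engine would hold GPU pipeline objects.
type CompiledKernel = { gpu: string; kernel: string };

// Memoizes per-GPU kernel compilation: the expensive compile runs once per
// (GPU, kernel) pair, and every later request reuses the cached result.
class KernelCache {
  private readonly cache = new Map<string, CompiledKernel>();
  private readonly compile: (gpu: string, kernel: string) => CompiledKernel;
  compileCount = 0;

  constructor(compile: (gpu: string, kernel: string) => CompiledKernel) {
    this.compile = compile;
  }

  get(gpu: string, kernel: string): CompiledKernel {
    const key = `${gpu}::${kernel}`;
    let compiled = this.cache.get(key);
    if (compiled === undefined) {
      // First request for this GPU + kernel: pay the compile cost once.
      compiled = this.compile(gpu, kernel);
      this.compileCount++;
      this.cache.set(key, compiled);
    }
    return compiled;
  }
}
```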

Used by

SmolLM2 1.7B, Mistral 7B, Llama 3.2 1B

Transformers.js + ONNX Runtime Web

HuggingFace (ONNX-format model graph + weights) → ONNX Runtime Web (builds execution plan at load time) → WebGPU, or WASM fallback → Your GPU / CPU

Models are stored in the standard ONNX (Open Neural Network Exchange) format. At load time, ONNX Runtime Web reads the model graph, builds an execution plan, and runs it on your GPU via WebGPU, falling back to WebAssembly (WASM) on the CPU when WebGPU is unavailable.
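Building an execution plan at load time can be illustrated with a toy sketch of one step such a runtime performs: ordering graph nodes so that each runs only after its inputs exist (a topological sort). The `GraphNode` shape and `buildExecutionPlan` function are hypothetical simplifications. Real ONNX Runtime Web also applies graph optimizations and assigns nodes to execution providers such as WebGPU or WASM, none of which is modeled here:

```typescript
// A toy ONNX-style node: it consumes named tensors and produces named tensors.
interface GraphNode {
  name: string;
  inputs: string[];
  outputs: string[];
}

// Builds an execution order in which every node runs only after all of its
// input tensors exist (a simple topological sort). `graphInputs` are tensors
// available up front, such as weights and the user's input.
function buildExecutionPlan(nodes: GraphNode[], graphInputs: string[]): string[] {
  const available = new Set<string>(graphInputs);
  const remaining = nodes.slice();
  const plan: string[] = [];
  while (remaining.length > 0) {
    // Find any node whose inputs are all ready.
    const ready = remaining.find((n) => n.inputs.every((t) => available.has(t)));
    if (ready === undefined) {
      throw new Error("graph has a cycle or an unsatisfied input");
    }
    remaining.splice(remaining.indexOf(ready), 1);
    for (const t of ready.outputs) available.add(t);
    plan.push(ready.name);
  }
  return plan;
}
```

For example, a graph listed as `add(h, b) → y` before `matmul(x, W) → h`, with `x`, `W`, and `b` supplied up front, is ordered as `matmul` then `add`.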

Trade-offs
  • Supports any model exportable to ONNX
  • Can fall back to WASM if WebGPU is unavailable
  • Slightly more overhead than pre-compiled kernels

Used by

Gemma 4 E2B, Gemma 4 E4B, Qwen3.5 0.8B, Qwen3.5 2B, Qwen3.5 4B
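With both paths in place, choosing a backend for a given model can be sketched as a lookup plus a fallback rule. The model lists below are copied from this page. The `chooseBackend` function and the backend labels are hypothetical. The sketch assumes, as the trade-off lists suggest, that only the ONNX path can fall back to WASM when WebGPU is missing:

```typescript
type Backend = "webllm-webgpu" | "onnx-webgpu" | "onnx-wasm";

// Model-to-format mapping taken from the "Used by" lists on this page.
const MLC_MODELS = new Set<string>(["SmolLM2 1.7B", "Mistral 7B", "Llama 3.2 1B"]);
const ONNX_MODELS = new Set<string>([
  "Gemma 4 E2B",
  "Gemma 4 E4B",
  "Qwen3.5 0.8B",
  "Qwen3.5 2B",
  "Qwen3.5 4B",
]);

// Picks a backend for a model, or returns null if it cannot run here.
// MLC-compiled models need WebGPU; ONNX models can fall back to WASM (CPU).
function chooseBackend(model: string, webgpuAvailable: boolean): Backend | null {
  if (MLC_MODELS.has(model)) return webgpuAvailable ? "webllm-webgpu" : null;
  if (ONNX_MODELS.has(model)) return webgpuAvailable ? "onnx-webgpu" : "onnx-wasm";
  return null; // Unknown model: no compiled or exported weights available.
}
```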


Ready to try private AI that runs entirely on your device?