ThinkHere — In-Browser AI

ThinkHere

Run AI entirely in your browser

No servers, no API keys, no data leaves your device. Powered by WebLLM and Transformers.js — everything runs locally on your hardware.

First load downloads model weights to your browser — this is a one-time download. After that, the model loads from cache in seconds.

WebLLM models require WebGPU (Chrome 113+, Edge 113+). Transformers.js models use ONNX Runtime Web. All of them need network access to huggingface.co.
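The requirement above can be sketched as a small capability check. This is a minimal illustration, not the page's actual code: the names `Runtime`, `Backend`, `hasWebGPU`, and `pickBackend` are hypothetical. It assumes that WebGPU support is signaled by a `gpu` property on the `navigator` object. A real check would also call `navigator.gpu.requestAdapter()`, which can resolve to `null` even when the property exists.

```typescript
// Hypothetical names throughout; a sketch of the rule stated in the text.
type Runtime = "webllm" | "transformers.js";
type Backend = "webgpu" | "wasm";

// WebGPU-capable browsers expose a `gpu` property on `navigator`.
// The parameter is typed as `object` so the browser's `navigator` or a stub can be passed.
function hasWebGPU(nav: object): boolean {
  return "gpu" in nav && (nav as { gpu?: unknown }).gpu != null;
}

// Returns the backend a runtime would use, or null when it cannot run at all.
function pickBackend(runtime: Runtime, nav: object): Backend | null {
  if (hasWebGPU(nav)) return "webgpu";
  // WebLLM requires WebGPU; ONNX Runtime Web can fall back to WASM.
  return runtime === "transformers.js" ? "wasm" : null;
}
```

In a page, the second argument would be the browser's `navigator`. A `null` result means the model should be hidden or disabled rather than loaded.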

Two paths to in-browser AI

WebLLM + MLC

Hugging Face (MLC-format weights) → TVM / MLC Compiler (ahead-of-time compilation) → WebGPU Compute Shaders (pre-optimized GPU kernels) → Your GPU

Models are compiled ahead-of-time using Apache TVM / MLC (Machine Learning Compilation). The compiler transforms model weights and operations into optimized WebGPU compute shaders that run directly on your GPU.

Trade-offs

Fast inference — kernels are pre-optimized
First run compiles shaders for your GPU (cached after)
Model must be specifically compiled for MLC

Used by

SmolLM2 360M, SmolLM2 1.7B, Llama 3.2 1B, Phi-3.5 Mini

Transformers.js + ONNX Runtime Web

Hugging Face (ONNX-format model graph + weights) → ONNX Runtime Web (builds execution plan at load time) → WebGPU, or WASM fallback → Your GPU / CPU

Models are stored in the standard ONNX (Open Neural Network Exchange) format. ONNX Runtime Web interprets the model graph at load time and executes it on your GPU via WebGPU, or falls back to WebAssembly on the CPU when WebGPU is unavailable.
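To make "interprets the model graph at load time" concrete, here is a toy sketch of the idea: order the graph's nodes once when the model loads, then walk that plan for every inference. The graph format, the three ops, and the names `GraphNode`, `buildPlan`, and `runPlan` are all hypothetical and far simpler than real ONNX. This is not how ONNX Runtime Web is implemented, only an illustration of the load-time versus run-time split.

```typescript
// Toy graph format (hypothetical); values are single numbers instead of tensors.
type Op = "add" | "mul" | "relu";

interface GraphNode {
  name: string;     // name of the value this node produces
  op: Op;
  inputs: string[]; // names of graph inputs or of other nodes' outputs
}

// Load time: order nodes so each one runs after the nodes it depends on.
function buildPlan(nodes: GraphNode[], graphInputs: string[]): GraphNode[] {
  const ready = new Set(graphInputs);
  const pending = [...nodes];
  const plan: GraphNode[] = [];
  while (pending.length > 0) {
    const i = pending.findIndex((n) => n.inputs.every((x) => ready.has(x)));
    if (i === -1) throw new Error("graph has a cycle or a missing input");
    const [node] = pending.splice(i, 1);
    plan.push(node);
    ready.add(node.name);
  }
  return plan;
}

// Run time: walk the plan once, reading and writing named values.
function runPlan(plan: GraphNode[], feeds: Record<string, number>): Record<string, number> {
  const values: Record<string, number> = { ...feeds };
  for (const node of plan) {
    const args = node.inputs.map((x) => values[x]);
    switch (node.op) {
      case "add": values[node.name] = args[0] + args[1]; break;
      case "mul": values[node.name] = args[0] * args[1]; break;
      case "relu": values[node.name] = Math.max(0, args[0]); break;
    }
  }
  return values;
}
```

The plan is built once and reused for every call to `runPlan`. A pre-compiled pipeline skips this planning step at load time, which is one reason the ahead-of-time path can have less overhead.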

Trade-offs

Supports any model exportable to ONNX
Can fall back to WASM if WebGPU is unavailable
Slightly more overhead than pre-compiled kernels

Used by

Qwen3 4B Instruct, GPT-OSS 20B

Both methods use WebGPU for GPU acceleration. All model weights are cached in your browser after the first download — no server involved.
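The "download once, then load from cache" behavior follows a cache-first pattern, sketched below. This is an assumption-laden illustration: WebLLM and Transformers.js manage their own caches, and the names `WeightCache`, `Fetcher`, `loadWeights`, and `memoryCache` are hypothetical. The cache shape is loosely modeled on the `match`/`put` methods of the browser Cache API, and an in-memory stand-in is used so the sketch runs outside a browser.

```typescript
// Minimal async cache shape (hypothetical), loosely modeled on Cache API match/put.
interface WeightCache {
  match(url: string): Promise<Uint8Array | undefined>;
  put(url: string, data: Uint8Array): Promise<void>;
}

// Function that downloads the bytes at a URL; injected so it can be stubbed.
type Fetcher = (url: string) => Promise<Uint8Array>;

// Cache-first load: use the cached copy if present, otherwise download and store it.
async function loadWeights(
  url: string,
  cache: WeightCache,
  fetcher: Fetcher,
): Promise<{ data: Uint8Array; fromCache: boolean }> {
  const hit = await cache.match(url);
  if (hit !== undefined) return { data: hit, fromCache: true };
  const data = await fetcher(url);
  await cache.put(url, data);
  return { data, fromCache: false };
}

// In-memory stand-in for a persistent browser cache.
function memoryCache(): WeightCache {
  const store = new Map<string, Uint8Array>();
  return {
    match: async (url) => store.get(url),
    put: async (url, data) => { store.set(url, data); },
  };
}
```

The first call for a given URL reaches the network and the second does not, which is why later loads take seconds rather than a full download.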

System Prompt

Generation

Temperature 0.7

Top-P 0.9

Max Tokens 1024
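As a rough guide to what these settings control, the sketch below shows the usual meaning of temperature and top-p (nucleus) sampling. Max Tokens simply caps how many tokens are generated. The function names are hypothetical and this is a textbook formulation, not necessarily how WebLLM or Transformers.js implement sampling internally.

```typescript
// Softmax over logits divided by the temperature (assumes temperature > 0).
// Lower temperatures sharpen the distribution; higher ones flatten it.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Keep the smallest set of most-likely tokens whose total probability
// reaches topP, zero out the rest, and renormalize.
function topPFilter(probs: number[], topP: number): number[] {
  const order = probs.map((_, i) => i).sort((a, b) => probs[b] - probs[a]);
  const kept = new Array<number>(probs.length).fill(0);
  let mass = 0;
  for (const i of order) {
    kept[i] = probs[i];
    mass += probs[i];
    if (mass >= topP) break;
  }
  return kept.map((p) => p / mass);
}

// Draw one token index. `rand` is injectable so results can be reproduced.
function sampleToken(
  logits: number[],
  temperature: number,
  topP: number,
  rand: () => number = Math.random,
): number {
  const probs = topPFilter(softmaxWithTemperature(logits, temperature), topP);
  let r = rand();
  let fallback = 0;
  for (let i = 0; i < probs.length; i++) {
    if (probs[i] > 0) {
      fallback = i; // last kept token, used if round-off leaves r >= 0
      r -= probs[i];
      if (r < 0) return i;
    }
  }
  return fallback;
}
```

With the defaults shown here, a call would look like `sampleToken(logits, 0.7, 0.9)`, repeated until an end-of-sequence token appears or 1024 tokens have been produced.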
