Introducing ThinkHere — Private AI That Runs in Your Browser
We built an AI assistant that runs entirely on your hardware. No cloud servers, no data collection, no API keys. Here is why we think local-first AI matters, and how we made it work with WebGPU and MediaPipe.
Today we are launching ThinkHere, an AI chat application that runs entirely in your web browser. There is no server processing your prompts, no cloud storing your conversations, and no company reading your data. When you use ThinkHere, the language model runs on your own hardware, and your words never leave your device.
We believe this is how AI tools should work by default. And thanks to recent advances in browser technology, it is now practical to deliver that experience without asking users to install anything at all.
The privacy problem with cloud AI
Most AI chat tools work the same way: you type a prompt, it gets sent to a server, a model generates a response, and the server sends it back. Along the way, your prompt is transmitted over the internet, processed on hardware you do not control, and often stored in logs you cannot access or delete.
For casual questions, this might feel acceptable. But the moment you ask an AI to help with medical information, legal documents, financial planning, personal journal entries, or proprietary business data, the calculus changes entirely. You are trusting a third party with sensitive information, often without a clear understanding of how that data will be retained, used for training, or shared.
ThinkHere eliminates this tradeoff. Your prompts and responses never leave the browser tab. There is no network request carrying your conversation to a remote server. The model weights are downloaded once and cached locally, and every inference operation runs on your device's GPU via WebGPU. If you close the tab, the conversation exists nowhere but in your browser's local storage.
How in-browser AI actually works
Running a language model in a browser tab sounds improbable, but the underlying technology is mature and well-supported. ThinkHere is built on two key pieces of infrastructure:
- WebGPU is a modern browser API that provides low-level access to the GPU. Unlike WebGL, which was designed for graphics rendering, WebGPU is built for general-purpose compute workloads, including the matrix multiplications that power neural network inference. Chrome, Edge, and Safari all ship with WebGPU support.
- MediaPipe LLM Inference API is Google's framework for running large language models on edge devices. It handles model loading, tokenization, context management, and inference scheduling. ThinkHere uses MediaPipe to orchestrate the entire pipeline inside the browser.
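Because everything depends on WebGPU being available, the first thing an app like this has to do is feature-detect it. Here is a minimal sketch of that check — the function names are illustrative, not ThinkHere's actual code, and the `navigator` object is passed in as a parameter purely so the logic can be exercised outside a browser:

```javascript
// Minimal WebGPU feature check. In a real page you would pass the global
// `navigator`; taking it as a parameter keeps the sketch testable.
function supportsWebGPU(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav;
}

// Requesting an adapter confirms a usable GPU is actually present, not just
// that the API surface exists. `requestAdapter()` resolves to null when no
// suitable GPU is available.
async function probeWebGPU(nav) {
  if (!supportsWebGPU(nav)) return false;
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null;
}
```

The two-step shape matters: `"gpu" in navigator` only tells you the browser ships the API, while the adapter request tells you this particular device can run compute workloads on it.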
When you first load a model in ThinkHere, the browser downloads the model weights (typically 1 to 3 GB depending on the model). These weights are stored in the browser's Cache Storage API, so you only download them once. On subsequent visits, the model loads from the local cache in a matter of seconds. From there, MediaPipe compiles the model for your specific GPU using WebGPU shader compilation, and inference runs natively on your hardware.
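The download-once behavior comes down to a simple check-the-cache-first pattern over the browser's Cache Storage API. The sketch below shows the idea, with the cache and fetch function injected as parameters so it can run anywhere; in a page you would pass `await caches.open("model-weights")` and the global `fetch`. The helper name and cache name are illustrative, not ThinkHere's internals:

```javascript
// Download-once pattern for model weights: serve from the local cache when
// possible, otherwise fetch and store for subsequent visits.
async function loadWeights(cache, fetchFn, url) {
  const cached = await cache.match(url);
  if (cached) return cached; // cache hit: no network request at all

  const response = await fetchFn(url); // first visit: download the weights
  if (!response.ok) throw new Error(`download failed: ${response.status}`);

  // Response bodies are one-shot streams, so store a clone and return
  // the original to the caller.
  await cache.put(url, response.clone());
  return response;
}
```

The payoff is that the multi-gigabyte download happens exactly once per model; every later visit is a local read.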
The result is a fully functional AI assistant running in a browser tab, with generation speeds that are practical for real conversations. On a modern laptop with a discrete or integrated GPU, you can expect anywhere from 10 to 30+ tokens per second depending on the model and hardware.
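To make those numbers concrete, here is the back-of-the-envelope arithmetic for how long a reply takes at a given generation speed (a worked example, not a benchmark):

```javascript
// Rough wall-clock estimate for generating a reply of a given length.
function generationSeconds(tokens, tokensPerSecond) {
  if (tokensPerSecond <= 0) {
    throw new RangeError("tokensPerSecond must be positive");
  }
  return tokens / tokensPerSecond;
}

// A ~300-token reply at the low end (10 tok/s) takes about 30 seconds;
// at 30 tok/s the same reply streams out in about 10 seconds.
```

Since tokens stream in as they are generated, even the low end feels responsive: you start reading the reply immediately rather than waiting for the full 30 seconds.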
The tier system
ThinkHere is free to use, and we want to keep it that way for the core experience. Here is how our tiers work:
- Free (no account required): You can load and chat with Gemma 3n E2B immediately, with no sign-up. This is a capable, multimodal model that supports image input. You get a clean chat interface with no strings attached.
- Logged in (free account): Creating a free account unlocks the full model library (nine models and growing), system prompts, temperature and generation controls, conversation history, export, knowledge base with RAG, and file upload as context. The account is free and always will be.
- Paid (coming soon): We are designing a paid tier that will include priority support and premium features as we develop them. Details will be announced when it is ready.
Importantly, all tiers run the model locally in your browser. The account system manages preferences, settings, and feature access, but the core AI inference is always happening on your hardware. We never see your conversations regardless of which tier you are on.
Device requirements
Because ThinkHere runs on your local hardware, device capability matters. Here is what you need:
- Browser: Chrome 113+, Edge 113+, or Safari 18+ with WebGPU enabled
- RAM: At least 6 GB for the smallest model (Gemma 3n E2B). Larger models need more.
- Best experience: Desktop or laptop with a dedicated or integrated GPU
- iPhone: Not currently supported due to iOS memory limits that prevent model loading
- iPad: M-series iPads may work, though the experience varies
We are actively working to expand device support as browser capabilities and model efficiency improve. The trend is in our direction: models are getting smaller and more efficient, and WebGPU support is spreading to more devices.
What comes next
ThinkHere is open source under the MIT license, and we are building it in the open. Our near-term roadmap includes expanding the model library, improving generation speed, adding more document and context features for the logged-in tier, and designing the paid tier based on community feedback.
We think private, local-first AI is not a niche preference but an inevitability. As models become more capable at smaller sizes, and as hardware acceleration in browsers continues to mature, running AI on your own device will be the default, not the exception. ThinkHere is our bet on that future.
Ready to try private AI that runs in your browser?