
Guide & Help Centre

Getting started, features, and comprehensive help documentation

Getting Started

ThinkHere runs entirely in your browser. There is nothing to install and no account is required to start chatting.

No account

Step 1
Open thinkhere.ai in a supported browser
Step 2
Click Load Model to download and initialize SmolLM2 1.7B
Step 3
Start chatting — everything runs locally on your device

With a free account

Step 1
Create a free account at app.thinkhere.ai
Step 2
Choose from 8 models and click Load
Step 3
Conversations are saved automatically and can be exported

The first load downloads model weights (600 MB–4.5 GB depending on the model). After that, the model loads from your browser's cache in seconds.
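Under the hood, this caching uses the browser's Cache Storage API. As a rough sketch — the cache name and weight URL below are hypothetical stand-ins, not ThinkHere's actual keys — a check for an already-downloaded model could look like:

```typescript
// Hypothetical cache name and URL, for illustration only --
// ThinkHere's real storage keys may differ.
async function isModelCached(cacheName: string, weightUrl: string): Promise<boolean> {
  const cacheStorage: any = (globalThis as any).caches;
  if (!cacheStorage) return false; // Cache Storage unavailable (e.g. outside a browser)
  const cache = await cacheStorage.open(cacheName);
  const hit = await cache.match(weightUrl); // undefined if never downloaded
  return hit !== undefined;
}
```

If this returns true, the app can skip the large first-time download and load the weights locally.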

Feature Tiers

ThinkHere offers three tiers. All tiers run the model locally in your browser — your conversations never leave your device.

Feature                                                   Free   Logged In   Pro
Text model (SmolLM2 1.7B)                                 ✓      ✓           ✓
File & PDF upload as context                              ✓      ✓           ✓
Conversation history & saving                             —      ✓           ✓
All 8 models                                              —      ✓           ✓
Conversation export                                       —      ✓           ✓
Long conversation memory (context compression)            —      ✓           ✓
Custom instructions (system prompts)                      —      —           ✓
Creativity settings (temperature & generation controls)   —      —           ✓
File-powered answers (knowledge base / RAG)               —      —           ✓
Voice transcription                                       —      —           ✓

Chat Models

The free tier on thinkhere.ai includes SmolLM2 1.7B. All 8 models are available with a free account at app.thinkhere.ai.

Model          Family       Runtime           Size      Multi-modal   Tier
SmolLM2 1.7B   SmolLM       WebLLM            ~1 GB     —             Free
Qwen3.5 0.8B   Qwen         Transformers.js   ~600 MB   ✓             Logged In
Qwen3.5 2B     Qwen         Transformers.js   ~1.5 GB   ✓             Logged In
Qwen3.5 4B     Qwen         Transformers.js   ~2.9 GB   ✓             Logged In
Mistral 7B     Mistral AI   WebLLM            ~3.8 GB   —             Logged In
Llama 3.2 1B   Llama        WebLLM            ~700 MB   —             Logged In
Gemma 3n E2B   Gemma        MediaPipe         ~3 GB     ✓             Logged In
Gemma 3n E4B   Gemma        MediaPipe         ~4.3 GB   ✓             Logged In

Model sizes are approximate download sizes. Larger models generally produce higher quality output but require more RAM and generate tokens more slowly. Models marked as multi-modal support image and PDF input alongside text.

Supporting models

ThinkHere also uses the following models behind the scenes for Pro features. All run entirely on-device via WebGPU — no data is sent to any server.

Model                    Purpose                           Runtime                          Size   Tier
Qwen3 Embedding 0.6B     Knowledge base embeddings (RAG)   Transformers.js · ONNX Runtime   —      Pro
Whisper Large V3 Turbo   Voice transcription               Transformers.js · ONNX Runtime   —      Pro

Device Requirements

ThinkHere runs AI inference on your local hardware, so device capability matters.

Browser
Chrome 113+, Edge 113+, Safari 18+
API
WebGPU required
Memory
Minimum 4 GB RAM for SmolLM2 1.7B
Best Experience
Desktop or laptop with dedicated or integrated GPU
iPhone
Not supported — iOS memory limits prevent model loading
iPad
M-series iPads may work, though experience varies

Getting Started

1. What is ThinkHere?

ThinkHere is a private AI chat app that runs entirely in your browser on your own hardware. Instead of sending your prompts to a remote AI server, ThinkHere downloads a language model to your device and runs it locally — using your GPU via WebGPU.

The result is an AI assistant where your prompts and responses never leave your browser tab. There is no cloud inference, no server storing your conversations, and no third party processing what you write.

ThinkHere is built by Qanata Lab and core features are open source under the MIT licence.

↑ Back to top

2. How to start using ThinkHere

To begin, open ThinkHere in a supported browser on a compatible device. No installation is required — ThinkHere runs entirely in the browser tab.

  • Open thinkhere.ai in Chrome 113+, Edge 113+, or Safari 18+.
  • ThinkHere will check that your browser supports WebGPU. If it does, you will be taken straight to the chat interface.
  • Without an account, SmolLM2 1.7B loads automatically. The first load includes a model download of ~1 GB, so allow a few minutes depending on your connection.
  • Once the model is ready, type your first message and start chatting. All processing happens on your device.
  • If you want access to more models and features, create a free account.

↑ Back to top

3. Use ThinkHere without an account

No account is required to use ThinkHere. Visit the site in a supported browser and you can start chatting immediately with SmolLM2 1.7B — a fast, lightweight chat model optimized for on-device use.

The no-account experience gives you a clean, fast path into local AI with no sign-up friction. It is ideal for trying ThinkHere for the first time or for one-off conversations you do not need to save.

Without an account you do not get conversation history, export, the full model library, system prompts, or knowledge base features. See what a free account unlocks.

↑ Back to top

4. Create a free account in ThinkHere

Creating a free account unlocks the full ThinkHere experience at no cost. To sign up, click Create Account on the ThinkHere homepage and provide your name and email address.

Signing in does not change the core architecture. ThinkHere still runs the AI model locally in your browser on your own hardware — your conversations are never sent to Qanata Lab's servers regardless of whether you are signed in.

Your account stores your preferences, settings, and feature access only. See everything a free account includes.

↑ Back to top

5. Your first model in ThinkHere

The first time you load a model in ThinkHere, the browser downloads the model weights (typically 600 MB–4.5 GB depending on the model), stores them in your browser's Cache Storage, and compiles them for your specific GPU using WebGPU shader compilation. This only happens once per model.

On subsequent visits, the model loads from the local cache in seconds. You do not need to download it again unless you clear your browser's cached data.

If the first load seems to stall, check your network connection and make sure you have at least 6 GB of free RAM. Closing other browser tabs frees up memory and can help.

↑ Back to top

How ThinkHere Works

6. How ThinkHere runs AI in the browser

ThinkHere uses two key pieces of browser technology to run AI locally: WebGPU for GPU-accelerated compute, and the WebLLM framework to manage the model pipeline.

When you send a message, ThinkHere processes it entirely within the browser tab. The model receives your prompt, generates a response token by token, and streams it back to the interface — all without any network request leaving your device. The full pipeline looks like this:

  • Model weights are downloaded once and cached in the browser's Cache Storage
  • On load, WebLLM compiles the model for your GPU using WebGPU shader compilation
  • Your prompt is tokenised and passed to the model running on your GPU
  • The model generates a response locally and streams it to the chat interface
  • Nothing in this process involves a server — it is entirely on-device
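The token-by-token streaming step can be sketched with plain generator code. Everything below is an illustrative stub — the canned reply, the toy "tokenisation", and the function names are invented for this example, not WebLLM's actual API:

```typescript
type Token = string;

// Stand-in for the model: a real model predicts each next token from the
// prompt plus the tokens generated so far; this stub yields a canned reply.
function* generate(_prompt: Token[]): Generator<Token> {
  const reply = ["Hello", " from", " your", " device"];
  for (const t of reply) yield t;
}

// Streams tokens to the UI as they are produced, the way the chat
// interface renders a response incrementally.
function streamToUi(prompt: string, onToken: (t: Token) => void): string {
  const tokens = prompt.split(" "); // toy tokenisation for illustration
  let full = "";
  for (const t of generate(tokens)) {
    full += t;
    onToken(t); // the UI appends each token as it arrives
  }
  return full;
}
```

The key point the sketch shows: the loop that produces tokens and the callback that renders them both live in the same browser tab, so no network request is needed between them.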
↑ Back to top

7. What WebGPU does in ThinkHere

WebGPU is a modern browser API that provides low-level access to your device's GPU. Unlike WebGL, which was designed for graphics rendering, WebGPU is built for general-purpose compute — including the matrix multiplications that power AI inference.

ThinkHere uses WebGPU to run language model inference directly on your hardware. Without WebGPU, the browser cannot access the GPU efficiently enough to run models at a usable speed.

WebGPU is supported in Chrome 113+, Edge 113+, and Safari 18+. It is not available on older browser versions or on iPhones due to iOS memory constraints.

↑ Back to top

8. What WebLLM does in ThinkHere

WebLLM is an open-source framework for running large language models in web browsers. ThinkHere uses WebLLM to handle model loading, tokenisation, context management, and inference scheduling — all inside the browser.

For most users, WebLLM is invisible. It is the layer that makes the browser capable of running a full language model pipeline without relying on a server. If you are curious about the technical detail, the WebLLM documentation is publicly available.

↑ Back to top

9. Why models are downloaded and cached locally

Because ThinkHere runs the AI model on your device, your browser needs access to the model weights — the large files that define how the model thinks and responds. These are downloaded from Qanata Lab's servers on first use and stored in your browser's Cache Storage.

The download is the only part of ThinkHere that requires an internet connection for inference. Once cached, the model loads locally in seconds on future visits, and every conversation happens entirely on-device with no network activity.

Model files are typically 600 MB–4.5 GB. You can clear cached models at any time in the app or through your browser's storage settings.
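Clearing a cached model ultimately maps onto the Cache Storage API's delete call. As a hedged sketch — the cache name here is hypothetical, not ThinkHere's real key:

```typescript
// "model-weights" is a hypothetical cache name for illustration.
async function clearModelCache(cacheName = "model-weights"): Promise<boolean> {
  const cacheStorage: any = (globalThis as any).caches;
  if (!cacheStorage) return false; // not running in a browser context
  // caches.delete resolves true if a cache with that name existed and was removed
  return cacheStorage.delete(cacheName);
}
```

After this, the next visit triggers a fresh download, exactly as on first use.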

↑ Back to top

10. What happens when you close the tab

When you close the ThinkHere tab, the active AI session ends and the model is unloaded from memory. The model weights remain cached in your browser's Cache Storage, so the next time you open ThinkHere the model loads quickly without a re-download.

If you are signed in, your conversation history is saved in your browser's local storage and will be available when you return. If you are using ThinkHere without an account, or in a private/incognito window, the conversation is not retained after the tab closes.

Qanata Lab has no access to your conversation history. It exists only in your browser's local storage, on your device.

↑ Back to top

Privacy and Security

11. How ThinkHere protects your privacy

ThinkHere's privacy model is architectural, not just policy-based. The AI model runs on your device using WebGPU. Your prompts and responses are processed locally and stored only in your browser's local storage. No network request carries your conversation to any server at any point during inference.

The only outbound requests ThinkHere makes are: downloading model weights on first use (standard HTTPS), and account authentication if you are signed in. Neither involves your conversation content.

You can verify this yourself — ThinkHere is open source under the MIT licence and the full codebase is publicly available.

↑ Back to top

12. Does ThinkHere send my prompts to the cloud?

No. ThinkHere does not send your prompts to the cloud. Every response is generated on your own device. There is no cloud inference provider in the pipeline — not for any tier, not for any model available in ThinkHere.

The only data that leaves your device is: model weight downloads on first use, and standard account authentication requests if you are signed in. Your conversation content is never part of either.

↑ Back to top

13. Who can read my conversations?

Nobody at Qanata Lab can read your conversations. Because inference runs locally and conversations are stored only in your browser's local storage, they never reach our servers.

The only people who can access your ThinkHere conversations are those with physical or account-level access to your device and browser. Treat your conversation history the same way you would treat any other sensitive data stored locally on your machine.

↑ Back to top

14. Where conversations are stored

Conversations are stored in your browser's local storage, on your device. They are not synced to Qanata Lab's servers and are not associated with your account even if you are signed in.

This means:

  • Clearing your browser's local storage will delete your conversation history
  • Using ThinkHere in a private or incognito window means conversations are not retained after the session
  • Switching devices or browsers will not carry your history across — it stays on the original device
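To illustrate what "local storage only" means in practice, here is a hedged sketch of reading history out of a localStorage-style store. The storage key and record shape are hypothetical, not the app's real schema:

```typescript
// Minimal shape of a key/value store like window.localStorage.
interface StorageLike {
  getItem(key: string): string | null;
}

// "conversations" is a hypothetical key; the real app's key may differ.
// The point: history is just JSON sitting in your own browser's storage.
function exportConversations(store: StorageLike, key = "conversations"): object[] {
  const raw = store.getItem(key);
  if (raw === null) return []; // nothing saved on this device
  return JSON.parse(raw);
}

// Usable against the real localStorage in a browser, or any compatible store:
const fakeStore: StorageLike = {
  getItem: () => JSON.stringify([{ title: "First chat", messages: 2 }]),
};
```

Because the data lives only in that store, clearing browser storage deletes it and no server-side copy exists to recover.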
↑ Back to top

15. Is ThinkHere suitable for sensitive information?

ThinkHere's local-first design makes it well-suited for privacy-sensitive work. Because your prompts and responses never leave your device, ThinkHere avoids the exposure risk that comes with cloud AI tools — where conversations may be logged, retained, or used for model training.

For professionally sensitive workflows — legal, medical, financial, or confidential business content — ThinkHere offers a meaningful privacy advantage over cloud-based alternatives. That said, you should still apply your organisation's own security standards for device access, browser data, and local storage.

ThinkHere does not replace formal compliance or data governance processes. If your organisation has specific requirements around AI tool usage, review those alongside ThinkHere's open-source codebase and Privacy Policy before deploying.

↑ Back to top

Accounts and Tiers

16. What you get with no account

Without an account you get immediate access to ThinkHere with no sign-up required. You can start chatting with SmolLM2 1.7B — a fast, lightweight chat model — as soon as the model has downloaded and cached on your device.

The no-account experience is the fastest way to try local AI. It is intentionally simple: one model, no history, no advanced controls. Everything still runs locally on your device.

↑ Back to top

17. Does signing in change where inference happens?

No. Signing in unlocks features but does not change the architecture. ThinkHere always runs the AI model locally in your browser, on your hardware — whether you have an account or not, whether you are on the free or paid tier.

Your account only manages your preferences, settings, and feature access. It has no role in the inference pipeline.

↑ Back to top

Models and Performance

18. Which models are available in ThinkHere?

Without an account, ThinkHere gives you access to SmolLM2 1.7B — a fast, lightweight chat model optimized for on-device inference. It is a good starting point for most tasks and works well on a wide range of hardware.

With a free signed-in account, the full model library is available — currently eight models and growing. Models vary in size, capability, and hardware requirements. Larger models produce higher quality output but require more RAM and take longer to load. Smaller models are faster and work on lighter hardware.

The in-app model selector shows available models alongside their approximate size and recommended RAM. Check the ThinkHere Docs for the current full model list.

↑ Back to top

19. Why the first model load takes time

The first load involves three steps that do not repeat on subsequent visits:

  • Download — model weights (600 MB–4.5 GB depending on the model) are downloaded from Qanata Lab's servers over HTTPS.
  • Cache — the weights are stored in your browser's Cache Storage so they are available locally in future.
  • Compile — WebLLM compiles the model for your specific GPU using WebGPU shader compilation. This step can take 30–60 seconds on first run.
After this one-time setup, the model loads from the local cache in seconds on every subsequent visit.

↑ Back to top

20. How much RAM do I need?

You need at least 4 GB of RAM to run the smallest model (SmolLM2 1.7B). Larger models require more — the in-app model selector shows recommended RAM for each option.

RAM available to ThinkHere is shared with your operating system, browser, and other open applications. If you are close to the minimum, closing other tabs and applications before loading a model can make a meaningful difference.

  • 4 GB RAM — minimum for SmolLM2 1.7B
  • 8–12 GB RAM — comfortable for mid-size models
  • 16 GB+ RAM — recommended for larger, higher-quality models
↑ Back to top

21. How fast is ThinkHere?

On a modern laptop or desktop with a dedicated or integrated GPU, ThinkHere typically generates 10–30+ tokens per second — fast enough for natural back-and-forth conversation. High-spec devices at the top end of that range feel noticeably more responsive.

Speed depends on three factors: the model you have selected (smaller models are faster), your device's GPU capability, and how much RAM is available. On older or lower-powered hardware, generation may be slower but still functional with a lighter model.

↑ Back to top

22. How to choose the best model for your device

Start with the smallest model your use case allows. SmolLM2 1.7B is a solid default — it runs on 4 GB of RAM and handles most everyday tasks well. Move up to a larger model only if you need higher quality output and your hardware is comfortable at the current level.

  • If responses feel slow, switch to a lighter model
  • If the model fails to load, you may not have enough free RAM — close other applications and try again, or select a smaller model
  • If quality is the priority and your device has 16 GB+ RAM, try one of the larger models in the library

The model selector in the app shows the recommended minimum RAM for each model to help you decide.
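The guidance above can be expressed as a small helper. The RAM bands mirror the ones listed earlier in this guide; mapping each band to a specific model from the model table is an illustrative assumption, not the app's actual selection logic:

```typescript
// RAM bands follow this guide's recommendations (4 GB minimum, 8-12 GB
// comfortable, 16 GB+ for large models). The specific model picks are
// illustrative assumptions, not ThinkHere's real defaults.
function suggestModel(freeRamGb: number): string {
  if (freeRamGb < 4) return "none";          // below the 4 GB minimum
  if (freeRamGb < 8) return "SmolLM2 1.7B";  // smallest model, runs on 4 GB
  if (freeRamGb < 16) return "Qwen3.5 2B";   // a comfortable mid-size pick
  return "Gemma 3n E4B";                     // larger model for 16 GB+ devices
}
```

For example, a laptop with roughly 8 GB free would land on a mid-size model, while a 4 GB machine should stay on the smallest one.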

↑ Back to top

Files and Knowledge Base

23. How file upload works in ThinkHere

File upload lets you add a local document or file as context for your conversation. Once uploaded, the content of the file is available to the model as part of the conversation — you can ask questions about it, summarise it, or use it as reference material.

Uploaded files are processed locally in your browser. They are not sent to Qanata Lab's servers. File upload is available on every tier, including without an account.

Check the ThinkHere app for the current list of supported file types and any size limits, as these may be updated as the product develops.

↑ Back to top

24. What is the ThinkHere knowledge base?

The ThinkHere knowledge base lets you store reference documents and use them as persistent context across conversations. Instead of uploading a file each time, you can add documents to your knowledge base once and have the model draw on them whenever relevant.

The knowledge base uses retrieval-augmented generation (RAG) to find and inject the most relevant sections of your stored documents into the model's context when you ask a question. This means the model can answer questions grounded in your own material rather than just its training data.

The knowledge base is available on the Pro tier.

↑ Back to top

25. What RAG means in ThinkHere

RAG stands for retrieval-augmented generation. It is the technique ThinkHere uses to let the model answer questions using documents from your knowledge base.

In practice it works like this: when you ask a question, ThinkHere searches your stored documents for the most relevant passages and includes them in the model's context alongside your question. The model then generates a response that draws on both its training and your specific material.

The benefit is that the model can give you accurate, grounded answers about your own documents — without you needing to paste the content into every message manually.
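The retrieval step can be illustrated with a toy example. A real system embeds text with a learned model (such as the embedding model listed under "Supporting models"); here, hand-made vectors and cosine similarity stand in for that, so this is a sketch of the idea, not ThinkHere's implementation:

```typescript
// Cosine similarity: how closely two embedding vectors point the same way.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Chunk { text: string; embedding: number[]; }

// Returns the k stored chunks most similar to the query embedding --
// these are the passages that get injected into the model's context.
function retrieve(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

With real embeddings, a question about billing would score billing-related passages highest, and only those passages reach the model — which is why answers stay grounded in your documents.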

↑ Back to top

26. How local-first AI affects document workflows

With cloud AI tools, uploading a document means sending it to a third-party server. That content may be logged, retained, or used downstream in ways that are difficult to verify.

With ThinkHere, files you upload or add to your knowledge base stay on your device. The model reads and processes them locally. Nothing in your documents is transmitted to Qanata Lab. This makes ThinkHere particularly useful when working with confidential documents, internal business material, or personal content you would prefer not to share with a cloud service.

↑ Back to top

Device and Browser Support

27. Supported browsers for ThinkHere

ThinkHere requires a browser with WebGPU support. The following browsers are officially supported:

Browser           Minimum version   Notes
Google Chrome     113+              Recommended — best WebGPU support
Microsoft Edge    113+              Chromium-based; performs similarly to Chrome
Apple Safari      18+               Supported on macOS and iPadOS (M-series)
Mozilla Firefox   —                 Not currently supported; WebGPU is not stable in Firefox

If your browser is not listed or is below the minimum version, ThinkHere will display an error on load. Updating your browser to the latest version is the simplest fix.

↑ Back to top

28. What devices work best with ThinkHere

ThinkHere runs best on desktop and laptop computers with at least 6 GB of RAM and a modern GPU. Devices with a discrete GPU (such as an NVIDIA or AMD card) or a capable integrated GPU (such as those in Apple Silicon Macs) generally provide the fastest experience.

  • Best: Mac with Apple Silicon (M1 or later), Windows desktop or laptop with a discrete GPU
  • Good: Most modern laptops with 8 GB+ RAM and a recent integrated GPU
  • Limited: Older laptops with 6 GB RAM — use the smallest model only
  • Not supported: iPhone (iOS memory constraints prevent model loading)
↑ Back to top

29. Does ThinkHere work on iPhone?

No. ThinkHere does not currently work on iPhone. iOS imposes memory limits that prevent the browser from loading model weights of the size required to run a language model. This is a hardware and OS constraint, not a browser version issue.

We are monitoring improvements in browser capabilities and model efficiency. If iPhone support becomes practical in the future, we will announce it. For now, iPhone users should use ThinkHere on a desktop or laptop.

↑ Back to top

30. Does ThinkHere work on iPad?

ThinkHere may work on iPads with an M-series chip (M1 or later), using Safari 18+. M-series iPads have more RAM and better GPU access than older iPads, which makes model loading more feasible.

However, performance can vary. Expect longer load times and slower generation than on a Mac or PC. Older iPads without an M-series chip are unlikely to load models successfully due to memory constraints, similar to iPhone.

If ThinkHere fails to load on your iPad, check that you are using Safari 18+ and that the device has an M-series chip. If it still does not work, use a desktop or laptop instead.

↑ Back to top

31. How to check whether WebGPU is enabled

ThinkHere checks for WebGPU support automatically when you open the app and will display an error if it is unavailable. If you want to check manually:

In Chrome or Edge:

  • Type chrome://gpu in the address bar and press Enter.
  • Look for WebGPU in the Graphics Feature Status list. It should show Hardware accelerated.
  • If it shows Disabled or Software only, your hardware or driver may not support WebGPU — update your graphics drivers and try again.
In Safari:

  • Open Safari Preferences → Advanced → enable Show features for web developers.
  • Go to Develop → Experimental Features and confirm WebGPU is enabled.
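The same check can be run programmatically from a page's developer console. This is the standard WebGPU feature-detection pattern: confirm the API exists, then ask for an adapter:

```typescript
// Standard WebGPU feature detection: the API is exposed and an adapter
// (a handle to a usable GPU) is actually granted.
async function hasWebGPU(): Promise<boolean> {
  const nav: any = (globalThis as any).navigator;
  if (!nav || !nav.gpu) return false; // browser does not expose WebGPU at all
  const adapter = await nav.gpu.requestAdapter();
  return adapter !== null; // null means no usable GPU adapter was found
}
```

A `false` result matches the "WebGPU unavailable" error ThinkHere shows on load.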

↑ Back to top

Troubleshooting

32. ThinkHere is stuck loading a model

If the loading progress bar has stopped moving or the model has not finished loading after several minutes, try the following steps in order:

  • Check your internet connection. The first load requires a 600 MB–4.5 GB download. A slow or interrupted connection can cause the download to stall. Refresh the page to try again.
  • Refresh the page. ThinkHere will resume from the cached portion of the download if part of it completed.
  • Close other browser tabs and applications to free up RAM. If available memory drops too low, the model may fail to load.
  • Check that your browser supports WebGPU. See How to check whether WebGPU is enabled.
  • If the issue persists, try a different supported browser (Chrome 113+ is recommended) or select a smaller model from the model picker.

↑ Back to top

33. ThinkHere feels slow

Slow generation speed is usually caused by one or more of these factors:

  • Model size — larger models generate more slowly. Switch to a smaller model if speed is a priority.
  • Available RAM — if your device is low on memory, generation slows. Close other applications and browser tabs.
  • Background workloads — other GPU-intensive tasks (video playback, games, video calls) compete with ThinkHere for GPU resources.
  • Browser — Chrome and Edge currently have the most mature WebGPU implementations. Try switching if you are on Safari and performance is poor.

On modern hardware with the recommended RAM, expect 10–30+ tokens per second with appropriately sized models.

↑ Back to top

34. My browser says WebGPU is unavailable

If ThinkHere reports that WebGPU is unavailable, work through these checks:

  • Confirm you are using a supported browser: Chrome 113+, Edge 113+, or Safari 18+. Firefox does not support WebGPU.
  • Update your browser to the latest version. WebGPU support has improved significantly in recent releases.
  • Update your graphics drivers. Outdated drivers can block WebGPU even on supported browsers.
  • Check that hardware acceleration is enabled in your browser settings. In Chrome, go to Settings → System and make sure Use hardware acceleration when available is turned on.
  • If you are on a corporate device, IT policy may have disabled WebGPU. Contact your IT team.
  • If WebGPU remains unavailable after these steps, your hardware may not support it. ThinkHere requires a GPU that is compatible with the WebGPU standard.

↑ Back to top

35. I do not have enough memory to load a model

If ThinkHere fails with a memory error, or the tab crashes during model loading, your device does not have enough free RAM for the selected model.

  • Close other browser tabs — each tab uses memory that ThinkHere needs.
  • Close other applications running in the background.
  • Select a smaller model. The minimum is 4 GB of RAM for SmolLM2 1.7B. Larger models require more.
  • Restart your browser with only ThinkHere open before attempting to load the model again.
If you consistently cannot load even the smallest model after following these steps, your device may not meet the minimum requirements: at least 4 GB of RAM for SmolLM2 1.7B, with loading most reliable when around 6 GB is actually free.

↑ Back to top

36. The app works on one browser but not another

WebGPU support varies between browsers. Chrome and Edge currently have the most stable and performant WebGPU implementations. Safari's support is improving but may behave differently on some hardware combinations. Firefox does not support WebGPU.

If ThinkHere works in Chrome but not in another browser, this is expected. Use Chrome 113+ or Edge 113+ for the most consistent experience. If Safari 18+ is your only option, make sure Experimental WebGPU is enabled (see How to check whether WebGPU is enabled).

↑ Back to top

FAQ

37. Is ThinkHere really private?

Yes. ThinkHere's privacy is architectural. The AI model runs on your device using WebGPU, and your prompts and responses are stored only in your browser's local storage. No network request carries your conversation to any server during inference — not for any tier, not for any model.

You can verify this yourself. ThinkHere is open source under the MIT licence. The codebase is publicly available for inspection. You can also watch your browser's network activity while using ThinkHere — you will see no outbound requests during a conversation.

↑ Back to top

38. Do I need an account to use ThinkHere?

No. You can start using ThinkHere immediately without creating an account. Visit the site in a supported browser and SmolLM2 1.7B loads automatically after a one-time model download.

A free account unlocks the full model library, conversation history, export, and long conversation memory; the Pro tier adds system prompts, generation controls, the knowledge base, and voice transcription. See ThinkHere pricing and tiers explained for the full comparison.

↑ Back to top

39. Does ThinkHere use cloud inference?

No. ThinkHere does not use cloud inference at any tier. Every response is generated locally on your device using WebGPU. There is no remote AI server processing your prompts — not even in the background.

The only connection ThinkHere makes to external servers is downloading model weights on first use, and account authentication if you are signed in. Neither involves your conversation content.

↑ Back to top

40. Why do models need to be downloaded?

Because ThinkHere runs the AI model on your device rather than on a server, your browser needs the model files available locally. Model weights — the data that defines how the model responds — are typically 600 MB–4.5 GB and cannot be generated or approximated on the fly.

The download happens once per model. After that, ThinkHere loads the model from your browser's local cache in seconds. You only need an internet connection for the initial download, not for ongoing use.

↑ Back to top

41. Can I use ThinkHere on mobile?

iPhone is not currently supported. iOS memory constraints prevent the browser from loading model weights of the size required to run a language model in ThinkHere.

M-series iPads (M1 or later) may work using Safari 18+, though performance varies and load times are longer than on a desktop or laptop.

Android is tentatively supported on newer high-end devices with Chrome 113+. Because of the memory demands, most phones will crash the browser or fail to load a model at all.

For the best experience, use ThinkHere on a desktop or laptop with Chrome 113+, Edge 113+, or Safari 18+ and at least 6 GB of free RAM.

↑ Back to top

By using ThinkHere you agree to our Terms of Use, Privacy Policy and Usage Policies · A Qanata Lab product