LocalAI Chat — Offline GGUF Assistant



Overview
Chat with any GGUF model 100% offline in your browser. Streaming, vision, PDF/Word/CSV analysis, workspace file editing.
LocalAI Chat lets you load and chat with any GGUF language model directly in your browser. No internet connection is required after the extension is installed.

HOW IT WORKS
- Click the extension icon → a new tab opens automatically
- Drop any .gguf model file onto the page
- The model loads locally using WebAssembly (llama.cpp)
- Chat with full streaming output, just like ChatGPT, but 100% private

FEATURES
✓ Fully offline: zero network requests during inference
✓ Works with any GGUF model (Llama 3, Mistral, Qwen, Gemma, Phi, and more)
✓ Streaming token output with live tokens-per-second display
✓ Adjustable temperature, top-p, max tokens, context length, CPU threads
✓ Custom system prompt
✓ Chat history with copy button on every message
✓ Stop generation at any time
✓ New chat / change model buttons
✓ No account, no API key, no subscription

RECOMMENDED MODELS
Any Q4_K_M quantized model works well. Good starting points:
- Llama-3.2-1B-Instruct-Q4_K_M (fast, ~800 MB)
- Llama-3.2-3B-Instruct-Q4_K_M (balanced, ~2 GB)
- Mistral-7B-Instruct-Q4_K_M (powerful, ~4 GB)
Download models from: huggingface.co

PRIVACY
Your conversations and model files never leave your computer. No analytics. No tracking. No telemetry. Ever.
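For readers unsure what the temperature and top-p settings listed under FEATURES control, here is a minimal sketch of the general sampling technique in Python. It is not the extension's code (llama.cpp has its own sampler), and the function name `sample_top_p` and its default values are hypothetical, chosen only for illustration.

```python
import math
import random


def sample_top_p(logits, temperature=0.8, top_p=0.95, rng=random):
    """Pick one token index from raw logits using temperature and top-p sampling.

    Illustrative only: the name and defaults are assumptions, not the extension's API.
    """
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always take the best token.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature rescales the logits: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-p keeps the smallest set of most likely tokens whose combined
    # probability reaches top_p, and discards the rest.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample among the kept tokens; choices() normalizes the weights itself.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

In this sketch, lowering the temperature or top-p makes output more predictable, while raising them allows less likely tokens through and makes replies more varied.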
0 out of 5 (no ratings)
Details
- Version: 1.1.1
- Updated: June 29, 2026
- Size: 2.63 MiB
- Languages: English
- Developer
  Email: yaniv.schwartz1@gmail.com
- Non-trader: This developer has not identified itself as a trader. For consumers in the European Union, please note that consumer rights do not apply to contracts between you and this developer.
Privacy
This developer declares that your data is:
- Not being sold to third parties, outside of the approved use cases
- Not being used or transferred for purposes that are unrelated to the item's core functionality
- Not being used or transferred to determine creditworthiness or for lending purposes