LocalSub - YouTube Local AI Subtitles & Translate
Overview
Transcribe any YouTube video and translate subtitles into your language. 100% local AI, no API key, no internet after setup.
Stop uploading your audio just to read it back. Most YouTube subtitle extensions send your video URL or your audio to a server, log your activity, lock features behind a sign-in, or stop working the moment you go offline. If you watch tutorials, lectures, dev talks, or foreign-language content every day, that adds up: to dependency, to slowdowns, and to a privacy tradeoff you never agreed to. LocalSub takes a different path. The transcription model and the translation model both run inside your browser, on your machine. Your audio never leaves Chrome.

WHY YOUTUBE SUBTITLES ARE HARDER THAN THEY LOOK
YouTube has captions for many videos, but the experience is patchy:
- The CC track is in the wrong language.
- Auto-generated captions are noisy and untranslated.
- The video has no captions at all (common for new uploads, podcasts, and screencasts).
- The official translation drops technical terms and proper nouns.
- You want a clean SRT or TXT file for note-taking, but the page only shows the text as an overlay.
- You want the original and the translation side by side, not just one of the two.
Most tools solve part of this and create new problems: they require an API key, upload your audio to an unknown server, force a paid plan after a few videos, or break whenever YouTube changes a player script. LocalSub is built to solve all of it in one panel without sending the audio anywhere.

WHAT THIS EXTENSION DOES
LocalSub adds a side panel to YouTube that does three things, in this order. First, it reads YouTube's existing captions directly from the player when they exist, including auto-generated ones. Second, if there is no caption track, it transcribes the audio locally with Whisper, an open speech-recognition model used in many transcription tools, running fully inside your browser (WebGPU when available, WebAssembly otherwise). Third, it translates every segment into the target language you picked, using NLLB-200, a 200-language translation model from Meta AI Research that also runs locally.
When NLLB-200 cannot fit in memory on a smaller device, the panel automatically falls back to a free public translation endpoint, so the workflow never dead-ends.

Here is what you get:
- Local Whisper transcription: works on videos that have no captions, without needing a signed audio token or a third-party transcription service.
- Local NLLB-200 translation: 19 target languages (English, German, French, Spanish, Italian, Portuguese, Dutch, Russian, Polish, Turkish, Arabic, Japanese, Korean, Simplified Chinese, Traditional Chinese, Hindi, Thai, Vietnamese, and Indonesian).
- Side-panel UI: the video stays where it is, and the transcript sits in a column you can read while the video plays.
- Clickable transcript: every segment carries a timestamp; click it and the video in the source tab seeks to that moment.
- Bilingual rendering: by default each segment shows the translated line on top and the original line below it, so you can check the translation against the source line by line.
- SRT export: produces a standard subtitle file you can drop into VLC, Premiere, DaVinci Resolve, or any other player or editor.
- TXT export: produces a clean reading transcript without timestamps, useful for note apps, AI chats, or sharing.
- Export mode selector: one tap switches between bilingual, translation only, or original only, for both SRT and TXT.
- 14 UI locales: the panel itself is available in English, Simplified Chinese, Traditional Chinese, Japanese, Korean, German, French, Spanish, Brazilian Portuguese, Italian, Arabic, Dutch, Russian, and Turkish.

HOW IT WORKS
1. Open any video on youtube.com.
2. Pick your target language in the side panel. You only do this once.
3. Click Transcribe + Translate beside the video title.
4. Wait for the pipeline: fetching captions, transcribing if needed, and translating.
5. Read the transcript, click any line to jump to that moment, or export to SRT or TXT.
That is the whole loop. There is no account to create, no API key to paste, and no credit card to add.
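As a rough illustration of how the three export modes map onto the two file formats, here is a minimal Python sketch. It is not LocalSub's code: the Segment record, the ExportMode names, and the helper functions are hypothetical. It assumes each segment carries a start and end time in seconds plus the original and translated text, and that bilingual files use the same order as the panel (translation on top, original below). The SRT output follows the standard SubRip layout: cue number, an HH:MM:SS,mmm --> HH:MM:SS,mmm timing line, the text lines, and a blank line between cues.

```python
from dataclasses import dataclass
from enum import Enum


class ExportMode(Enum):
    BILINGUAL = "bilingual"
    TRANSLATION = "translation"
    ORIGINAL = "original"


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    original: str
    translation: str


def lines_for(seg: Segment, mode: ExportMode) -> list[str]:
    """Pick the text lines for one segment; bilingual puts the translation on top (assumed order)."""
    if mode is ExportMode.BILINGUAL:
        return [seg.translation, seg.original]
    if mode is ExportMode.TRANSLATION:
        return [seg.translation]
    return [seg.original]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment], mode: ExportMode) -> str:
    """Numbered cues, each with a timing line and its text lines, separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append("\n".join([str(index), timing, *lines_for(seg, mode)]))
    return "\n\n".join(blocks) + "\n"


def to_txt(segments: list[Segment], mode: ExportMode) -> str:
    """Plain reading transcript: same line choice as SRT, no cue numbers or timestamps."""
    return "\n".join(line for seg in segments for line in lines_for(seg, mode)) + "\n"
```

Because one lines_for helper feeds both writers, switching the mode changes SRT and TXT output together, which matches the listing's note that the active mode applies to both exports.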
WHY A SIDE PANEL INSTEAD OF A POPUP
Popups vanish the moment you click anywhere else, which is exactly what you do when you scrub the video, change tabs, or open the description. A side panel stays open, shares the window with the video, and gives the transcript enough vertical space to be readable without truncation. It also lets you copy a long line without the panel closing under your cursor.

EXPORT MODES IN DETAIL
BILINGUAL: both lines per segment. Useful for language learners and translators who need to verify the translation against the source.
TRANSLATION: only the target language. Useful for clean reading and for handing the SRT to a player that does not support multi-line styling.
ORIGINAL: only the source language. Useful when the video already has good captions and you just want a clean SRT or TXT with no AI rewriting.
The active mode applies to both SRT and TXT export, so you do not have to switch twice.

REAL SCENARIOS WHERE THIS HELPS
Watching a 90-minute Korean dev talk that only has Korean auto-captions: LocalSub reads the Korean track from the player, translates every line into your language, and lets you click your way back to the moments that matter. No upload, no waiting in someone else's queue.
Following a German university lecture with no captions at all: The page has no CC track, so LocalSub reads the audio from the player, runs Whisper on your machine, and produces timestamped segments. You then export SRT and load it into your player of choice.
Studying a Japanese podcast clip embedded on YouTube: You want bilingual subtitles to read along. LocalSub transcribes the Japanese, translates it into English, and exports a bilingual SRT. You drop it into VLC and watch with both lines on screen.
Skimming a four-hour English course in your native language: You only have time for the highlights. LocalSub gives you a clickable transcript in the side panel; you scan, you click, you watch the relevant 30 seconds, and you move on.
Saving an Arabic news video for later reading: You want the content in text form. Export TXT in original mode, paste it into your reading app, and you are done.
Capturing quotes from a Spanish interview for a piece of writing: Click the segment, the video seeks, you confirm the speaker, and you copy the bilingual line into your draft. No re-listening to a 12-minute clip to find one sentence.

WHAT THIS EXTENSION DOES NOT DO
It does not upload your audio to any external transcription server. It does not require a Google, OpenAI, or LocalSub account; there is no sign-in. It does not track your viewing history, share data with analytics services, or fingerprint your browser.
The only permissions it uses:
- storage: remembers your target language and side-panel state.
- downloads: saves SRT and TXT files when you click export.
- activeTab and tabs: let the side panel know which YouTube tab you are on so it can seek the video when you click a segment.
- scripting: injects the small reader that pulls captions out of the YouTube player.
- sidePanel: opens the transcript panel without taking over the page.
- youtube.com and googlevideo.com: needed to read captions and audio from the player.
- huggingface.co and related CDNs: used once, on first run, to download the Whisper and NLLB-200 model files into your browser cache.
- translate.googleapis.com: used as a fallback only when the local NLLB model cannot fit in WebAssembly memory on the current device.
That is the full list. After the one-time model download, transcribing and translating a video needs no network calls beyond the YouTube traffic the player already makes (and the translation fallback, on devices that need it).

PRICING
LocalSub is free. There is no daily cap, no per-video paywall, and no premium tier hiding the export buttons. The models download once and live in your browser cache. After that, transcription and translation run on your device without any server, with one caveat: the public translation fallback does need network access if your device cannot fit NLLB-200 locally.
UNDER THE HOOD
- Transcription: Whisper-base, q4-quantized, around 142 MB.
- Translation: NLLB-200 distilled, 600M parameters, q4-quantized, around 240 MB.
- Runtime: transformers.js with onnxruntime-web, using WebGPU when available and WebAssembly otherwise.
- Caption fallback: a fetch and XHR hook on the player, so the extension reads the same caption stream the player itself just downloaded instead of re-requesting the protected timedtext endpoint.
- Translation fallback: when WebAssembly cannot allocate enough memory for NLLB on a smaller machine, the panel switches to a public Google translation endpoint for that session.
No hidden cloud worker, no proxy server, no telemetry beacon. The code is straightforward enough that you can read it before installing.

WHO THIS IS FOR
- Language learners who want bilingual subtitles on every video, not just the few that ship with translated CC.
- Researchers and students who watch lectures and conference talks across languages and need clean SRT or TXT for notes.
- Developers and creators who follow tutorials in other languages and want timestamped, exportable transcripts.
- Travel, finance, and news viewers who watch source-language videos and want a fast way to read them in their own language.
- Privacy-minded users who would rather keep their audio inside their browser.

GETTING STARTED
Install the extension, pin it if you like, open a YouTube video, pick your language in the side panel, and click Transcribe + Translate. The first run downloads the models; every run after that uses the cached local copies. No account. No sign-up. No credit card. Just press the button.
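The order in which the panel tries each stage (captions from the player first, local Whisper only when there is no caption track, local NLLB-200 translation, and the public endpoint only after a memory failure, kept for the rest of the session) can be summarized in a short Python sketch. It is hypothetical and not LocalSub's implementation, which runs in the browser on transformers.js: the Pipeline class and its four stage functions are illustrative names, and Python's MemoryError stands in for whatever allocation error WebAssembly actually reports.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A segment is (start_seconds, end_seconds, text); purely illustrative.
Segments = list[tuple[float, float, str]]


@dataclass
class Pipeline:
    """Hypothetical stage order: captions, then local Whisper, then local or remote translation."""

    read_captions: Callable[[], Optional[Segments]]       # None/empty when the video has no caption track
    transcribe_locally: Callable[[], Segments]            # stands in for in-browser Whisper
    translate_locally: Callable[[list[str]], list[str]]   # stands in for in-browser NLLB-200
    translate_remotely: Callable[[list[str]], list[str]]  # stands in for the public endpoint
    use_remote_translation: bool = False                  # once set, kept for the rest of the session

    def run(self) -> tuple[Segments, list[str], str]:
        # 1. Prefer the caption track the player already has.
        segments = self.read_captions()
        source = "captions"
        if not segments:
            # 2. No captions: transcribe the audio on-device.
            segments = self.transcribe_locally()
            source = "whisper"
        texts = [text for _, _, text in segments]

        # 3. Translate locally unless an earlier run already hit the memory limit.
        if not self.use_remote_translation:
            try:
                return segments, self.translate_locally(texts), source
            except MemoryError:
                # MemoryError is a stand-in for a WebAssembly allocation failure.
                self.use_remote_translation = True
        # 4. Fallback: the public translation endpoint, for the rest of this session.
        return segments, self.translate_remotely(texts), source
```

Keeping the fallback flag on the session object, rather than retrying the local model for every video, mirrors the listing's description that the switch lasts "for that session".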
Details
- Version: 0.1.1
- Updated: May 4, 2026
- Offered by: monetgen.com
- Size: 7.95 MiB
- Languages: 14 languages
- Developer email: monetgen.com@gmail.com
- Non-trader: This developer has not identified itself as a trader. For consumers in the European Union, please note that consumer rights do not apply to contracts between you and this developer.
Privacy
This developer declares that your data is
- Not being sold to third parties, outside of the approved use cases
- Not being used or transferred for purposes that are unrelated to the item's core functionality
- Not being used or transferred to determine creditworthiness or for lending purposes