RAG Text Scraper

myvibedcode.com

Overview

Extracts clean article text from a list of URLs and saves it as .txt files.

The RAG Text Scraper is a powerful tool designed for developers, Vibe Coders, Product Managers, and Researchers who need to build high-quality text datasets for Retrieval-Augmented Generation (RAG) systems. Tired of manually cleaning up ads, headers, and other clutter from web articles? This extension automates the entire process, allowing you to turn a list of URLs into clean, ready-to-use .txt files with just one click.

✨ KEY FEATURES ✨

**Time Save! Bulk & Single Page Scraping:** No more copying and pasting individual articles into a separate Word doc, cleaning the data, and saving it as a .txt file!

**More Time Save! Intelligent Content Extraction:** Powered by Mozilla's Readability.js library, the extension intelligently removes ads, banners, and navigation menus to isolate the core article content.

**The Best Time Save! AI-Powered Cleaning:** Take your data quality to the next level. Connect your own API key to use powerful language models (Google Gemini, OpenAI GPT, or Anthropic Claude) to fix paragraphing, remove duplicate sentences, and eliminate any remaining artifacts.

---

👤 WHO IS THIS FOR? 👤

  • AI Developers & Vibe Coders: Quickly build and expand knowledge bases for your RAG applications.
  • Data Scientists: Efficiently gather and preprocess large text corpora for analysis and model training.
  • Product Managers: Rapidly create a proof-of-concept or MVP for an AI feature by sourcing a clean, initial dataset without needing an engineering team.
  • Researchers & Students: Collect and archive articles and online sources for academic work without the noise.

---

⚙️ HOW IT WORKS ⚙️

The extension uses a two-stage process:

1. **Extraction:** It first uses Readability.js to find the main content of a webpage.
2. **AI Cleaning (Optional):** If you enable the AI feature, the extracted text is then sent to your chosen AI provider with a specific prompt to perform a final, high-fidelity cleanup, so the output is ready for ingestion into a vector database.

Get started in seconds. Configure your settings, paste your URLs, and start building your dataset today!
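
For readers curious how the two-stage process described above might fit together, here is a minimal TypeScript sketch. It is an illustration under stated assumptions, not the extension's actual source: every name in it (`Extractor`, `Cleaner`, `toTxtFilename`, `buildCleaningPrompt`, `scrapeUrls`) is hypothetical, the prompt wording is invented for the example, and the extraction and AI calls are passed in as plain functions so the sketch stays self-contained. In a real extension, stage 1 would typically wrap Readability.js (for example `new Readability(doc).parse()`), and stage 2 would call whichever AI provider you configured.

```typescript
// Hypothetical sketch: none of these names come from the extension itself.

// Stage 1: takes raw HTML, returns the main article text (or null if none found).
// In practice this could wrap Readability.js: new Readability(doc).parse()?.textContent
type Extractor = (html: string) => string | null;

// Stage 2 (optional): sends a prompt to an AI provider and returns cleaned text.
type Cleaner = (prompt: string) => Promise<string>;

interface ScrapedFile {
  filename: string;
  text: string;
}

// Derive a filesystem-safe .txt filename from a URL.
function toTxtFilename(url: string): string {
  const slug = url
    .toLowerCase()
    .replace(/^[a-z]+:\/\//, "") // drop the scheme
    .replace(/[?#].*$/, "") // drop query string and fragment
    .replace(/[^a-z0-9]+/g, "-") // collapse everything else to hyphens
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
  return `${slug || "page"}.txt`;
}

// Assumed prompt wording, based only on the cleanup goals listed in the description.
function buildCleaningPrompt(text: string): string {
  return [
    "Clean the article text below for ingestion into a vector database.",
    "Fix paragraphing, remove duplicate sentences, and strip leftover artifacts.",
    "Return only the cleaned text.",
    "",
    text,
  ].join("\n");
}

// Run both stages over a list of URLs, producing one .txt payload per article.
async function scrapeUrls(
  urls: string[],
  fetchHtml: (url: string) => Promise<string>,
  extract: Extractor,
  clean?: Cleaner, // omit to skip the optional AI cleaning stage
): Promise<ScrapedFile[]> {
  const files: ScrapedFile[] = [];
  for (const url of urls) {
    const html = await fetchHtml(url);
    const extracted = extract(html);
    if (extracted === null) continue; // no article content found; skip this URL
    const text = clean ? await clean(buildCleaningPrompt(extracted)) : extracted;
    files.push({ filename: toTxtFilename(url), text: text.trim() });
  }
  return files;
}
```

Keeping the cleaner optional mirrors the description: with no API key configured, the output of the extraction stage is saved directly.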

Details

  • Version
    1.0
  • Updated
    October 30, 2025
  • Size
    316KiB
  • Languages
    English
  • Developer
    Website
    Email
    oscarcraven@gmail.com
  • Non-trader
    This developer has not identified itself as a trader. For consumers in the European Union, please note that consumer rights do not apply to contracts between you and this developer.

Privacy

The developer has disclosed that it will not collect or use your data. To learn more, see the developer’s privacy policy.

This developer declares that your data is

  • Not being sold to third parties, outside of the approved use cases
  • Not being used or transferred for purposes that are unrelated to the item's core functionality
  • Not being used or transferred to determine creditworthiness or for lending purposes

Support

For help with questions, suggestions, or problems, visit the developer's support site

Related

Webpage to Markdown

0.0

Convert any webpage to clean markdown with one click. Extract article content and download or copy as markdown.

Right-Click Save Text

5.0

Enable right-click, copy text on any site, unblock selection, extract clean article text, and save content as TXT or PDF.

Summarization AI

3.0

Chrome extension to summarize the content of a URL

AI Text Cleaner: Remove Text Formatting

5.0

Use AI Text Cleaner to clean up text, remove formatting styles, and clear watermarks online - all in one tool.

AI Text Generator

5.0

Ask AI Text Generator, your ai writer, to continue a paragraph, rewrite text, and get a custom response from webpage content.

Bulk Links Scraper

4.2

Scan links and extract data: download files, save HTML, extract emails, and find patterns with regex

URL List – save and export lists of URLs

5.0

Save and export lists of URLs

Website Summarizer

5.0

One-click website summarizer to shorten & summarize text fast: smart article, text and sentence shortener, clean summary instantly

Eidexa AI

5.0

Extract and store webpage content using advanced content processing

Claude Research Extractor

5.0

Extract Claude AI research with links preserved. Open at markdown.vc to export as PDF, DOC, or HTML.

Html to Word

4.0

Clip articles from the web, process them with AI, and save as Word documents.

SuperCurate Web Clipper

0.0

Save web content to your SuperCurate library with one click. Clip articles, text snippets, images and PDFs from any webpage.
