Fengcai (蜂采) - Product Data Scraper

Overview

Batch-scrape product detail pages, including product name, price, brand, model, sales unit, product images, and product description.

# Fengcai (蜂采) · Product Data Scraper: Feature Guide & User Manual

---

## Table of Contents

- [Introduction](#introduction)
- [UI Overview](#ui-overview)
- [Step 1: Product List](#step-1-product-list)
- [Step 2: Detail Page Rules](#step-2-detail-page-rules)
- [Step 3: Run Scraping](#step-3-run-scraping)
- [Screenshot Feature](#screenshot-feature)
- [Export Results](#export-results)
- [Rule Management](#rule-management)
- [Notes](#notes)

---

## Introduction

**Fengcai** is a Chrome side-panel extension for e-commerce product research and product data collection. No coding is required: configure rules by pointing and clicking, then batch-scrape product detail pages, automatically download images and screenshots, and export the results to Excel.

---

## UI Overview

The extension runs as a side panel with a three-step navigation bar at the top:

| Step | Tab               | Description                                                        |
| ---- | ----------------- | ------------------------------------------------------------------ |
| 1    | Product List      | Manage the SKUs and URLs to scrape                                 |
| 2    | Detail Page Rules | Configure field extraction rules and screenshot settings per site  |
| 3    | Run Scraping      | Set scraping parameters, start scraping, and view real-time logs   |

---

## Step 1: Product List

### Two Modes

**① Manual Mode (Default)**

- Import a CSV or Excel file (must include `SKU` and `URL` columns)
- Or enter products manually, one per line, in the format `SKU URL` (separated by a space)
- Both the SKU and the URL in the list are clickable and open the product link in a new tab
- Supports select-all or individual selection for bulk deletion

**② List Page Mode**

- Suited to bulk collection from category pages and search results pages
- Configuration:
  - **Entry URL**: The page where scraping starts
  - **Item Selector**: CSS selector that locates each product card
  - **List Fields**: Fields extracted directly from the list page (title, price, main image, etc.)
  - **Pagination**: Next-page button or infinite scroll, with configurable stop conditions
  - **Enter Detail Page**: When checked, lets you configure a detail link selector so that detail pages are scraped as well

> If the entry URL matches the currently active browser tab, the extension reuses that tab instead of opening a new one.

---

## Step 2: Detail Page Rules

### Rule Sets

Each **rule set** corresponds to one or more sites and contains:

- **Name**: Identifier for the rule set
- **URL Pattern**: One or more regular expressions, separated by commas, used to match product URLs automatically
- **Field Rules**: The fields to extract
- **Advanced Settings**: Page load condition, auto-scroll, pre-script
- **Screenshot Settings**: See the Screenshot Feature section

During scraping, the rule set is selected automatically based on each product's URL; products that match no rule set are skipped.
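
The rule-set matching described above can be pictured roughly as follows. This is only a sketch: `findRuleSet`, the `urlPattern` property, and the sample rule sets are hypothetical names chosen for illustration, and the extension's real implementation may differ (for example in how it treats a comma inside a regex quantifier such as `{1,3}`).

```javascript
// Hypothetical helper: names and data shapes are assumptions, not the extension's code.
function findRuleSet(url, ruleSets) {
  for (const ruleSet of ruleSets) {
    // The URL Pattern field holds one or more regexes separated by commas.
    const patterns = ruleSet.urlPattern
      .split(",")
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
    // The first rule set with any matching pattern wins.
    if (patterns.some((p) => new RegExp(p).test(url))) {
      return ruleSet;
    }
  }
  return null; // no match: the product would be skipped
}

// Made-up rule sets for illustration only.
const ruleSets = [
  { name: "Shop A", urlPattern: "shop-a\\.example\\.com/item/, shop-a\\.example\\.com/p/" },
  { name: "Shop B", urlPattern: "shop-b\\.example\\.org/product" },
];

console.log(findRuleSet("https://shop-a.example.com/p/123", ruleSets)?.name); // Shop A
console.log(findRuleSet("https://other.example.net/x", ruleSets)); // null
```

Because this naive split cuts on every comma, a pattern that itself contains a comma would be broken apart; when writing real patterns it is safest to avoid commas inside a single regex.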
---

### Field Rule Configuration

Each field rule includes:

| Setting               | Description                                                                        |
| --------------------- | ---------------------------------------------------------------------------------- |
| Field Name            | Column header in the exported Excel file                                           |
| CSS Selector          | Multiple lines supported; selectors are tried in order and the first match wins    |
| Field Type            | Text / Number / Image / Image List / HTML / Element Attribute                      |
| Is List               | Extract multiple values (checked automatically for Image List)                     |
| Image Storage         | None / Main Image / Description Image                                              |
| Post-process Function | JavaScript function that transforms the extracted value                            |
| Column Width          | Width of this column in the exported Excel file                                    |

Features:

- **Element Picker**: Click the button, then click an element on the page; the selector is generated automatically
- **Test Match**: Verify the selector against the current page in real time
- **Drag to Reorder**: Drag the handle on the left to rearrange fields

---

### Advanced Settings

| Setting        | Description                                                                    |
| -------------- | ------------------------------------------------------------------------------ |
| Load Condition | Wait for an element to appear or disappear, given a selector and a timeout     |
| Auto Scroll    | Scroll the page before extraction to trigger lazy-loaded content               |
| Pre-script     | Run custom JavaScript before data extraction (supports async/await)            |

---

## Step 3: Run Scraping

| Setting         | Description                                                                                         |
| --------------- | --------------------------------------------------------------------------------------------------- |
| Scrape Interval | Wait time in seconds between two consecutive products; 5 or more is recommended                     |
| Image Naming    | File name pattern for main and description images (supports `{sku}`, `{index}`, `{ext}` variables)  |
| Image Storage   | Merged (main and description images share a directory) / Separated (one subdirectory per SKU)       |

Controls: **Start → Pause → Resume → Stop**

The real-time log panel on the right shows the scraping status, image download progress, and screenshot result for each product. The export directory structure preview at the bottom of the Step 3 tab updates live to reflect the current settings, including whether the `screenshots/` folder is shown.

---

## Screenshot Feature

Screenshot settings are located below the field list in the **Detail Page Rules** tab and are configured independently for each rule set.

### Enable Screenshot

When **Enable Screenshot** is checked, a screenshot is taken automatically after each detail page is scraped.
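
The image naming variables from Step 3 (`{sku}`, `{index}`, `{ext}`) behave like simple placeholders. The sketch below shows one way such a pattern could be expanded; `expandImageName` is a hypothetical helper, and both the pattern `{index}_{sku}.{ext}` and the zero-padded index are assumptions chosen to reproduce a file name like `01_ABC001.jpg` from the export directory example, not confirmed extension behavior.

```javascript
// Hypothetical sketch of placeholder substitution; not the extension's actual code.
function expandImageName(pattern, vars) {
  // Replace each {name} placeholder with its value; leave unknown placeholders untouched.
  return pattern.replace(/\{(\w+)\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match
  );
}

// Assumed pattern and zero-padded index (both are illustrative choices).
const imageName = expandImageName("{index}_{sku}.{ext}", {
  sku: "ABC001",
  index: String(1).padStart(2, "0"),
  ext: "jpg",
});

console.log(imageName); // 01_ABC001.jpg
```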
### Screenshot Modes

| Mode                                         | Description                                                                            |
| -------------------------------------------- | -------------------------------------------------------------------------------------- |
| Visible Area (Default)                       | Captures the current browser viewport                                                  |
| Full Page                                    | Captures the entire page, including scrolled content, via the Chrome DevTools Protocol |
| Scroll to element, then capture visible area | Scrolls to the target element, then captures the viewport                              |
| Capture only the specific element            | Crops precisely to the element's boundaries, excluding surrounding content             |

> The "Full Page" and "Capture only the specific element" modes require the `debugger` permission, which is released immediately after each screenshot.

### Element Selector

- Enter a CSS selector for the target element
- Click the **Picker button** (crosshair icon) to pick an element directly from the page
- Click the **Test button** (checkmark icon) to verify that the selector matches the intended element; the element's position and dimensions are shown

### File Naming & Save Location

- File name: `{SKU}.png` (special characters are replaced with `_`)
- Saved to: the `screenshots/` subfolder inside the export directory
- The folder is created automatically when scraping begins

### Import / Export

Screenshot settings are saved as part of the rule set and are included when a rule set is imported from or exported to a JSON file.

---

## Export Results

After scraping finishes, the following are exported automatically:

- **Excel file** (`.xlsx`, with a timestamp in the name): styled headers, borders, auto-filter, frozen first row
- **Image files**: downloaded to the directories determined by the configured rules
- **Screenshots**: saved to the `screenshots/` directory (if enabled)

### Export Directory Structure Example

```
export-dir/
├── ProductData_2026-04-22T13-30-00.xlsx
├── images/
│   ├── ABC001/
│   │   ├── main/
│   │   │   ├── 01_ABC001.jpg
│   │   │   └── 02_ABC001.jpg
│   │   └── description/
│   │       └── ABC001_01.jpg
│   └── XYZ002/
│       └── ...
└── screenshots/          # Only present when screenshot is enabled
    ├── ABC001.png
    └── XYZ002.png
```

---

## Rule Management

| Action                      | Description                                                                        |
| --------------------------- | ---------------------------------------------------------------------------------- |
| Export Rule Set             | Save the current rule set (including screenshot settings) as a JSON file           |
| Import Rule Set             | Load a rule set from a JSON file; optionally apply it immediately                  |
| Backup Custom Rules         | Export all custom rule sets as a single JSON backup file                           |
| Global Config Export/Import | Full backup including all rule sets, settings, and list configuration              |

---

## Notes

- Keep the scrape interval at 5 seconds or more to avoid putting pressure on the target site
- Before using list mode, test the selectors and field matching first
- Some sites require you to be logged in before full product data can be scraped
- List pagination is capped at 200 pages; scraping stops automatically once the cap is reached
- Full-page screenshot mode briefly attaches the debugger to the tab; some sites with anti-bot mechanisms may detect this
- Screenshot capture requires the tab to be active during scraping; the extension switches to it automatically
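
As an illustration of the post-process function mentioned under Field Rule Configuration, here is a minimal sketch that turns a raw price string into a number. It assumes the function receives the extracted text and returns the value to be written to the Excel column; that calling convention, and the name `parsePrice`, are assumptions made for this example and are not documented behavior.

```javascript
// Hypothetical post-process function: assumes it receives the raw extracted text
// and returns the value that ends up in the exported column.
function parsePrice(raw) {
  if (typeof raw !== "string") return null;
  // Remove thousands separators, then take the first number in the text.
  const found = raw.replace(/,/g, "").match(/\d+(?:\.\d+)?/);
  return found ? Number(found[0]) : null;
}

console.log(parsePrice("¥1,299.00")); // 1299
console.log(parsePrice("Price: $45.5 / box")); // 45.5
console.log(parsePrice("Sold out")); // null
```

This sketch treats the comma as a thousands separator, so prices that use a decimal comma (for example `12,50`) would need different handling.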

Details

Version
1.3.2
Updated
May 18, 2026
Offered by
ezmo
Size
1.79MiB
Languages
2 languages
Developer
陈继虎
南翔镇栖林路425弄1号2601室嘉定区, 上海市 201802 CN
Email
chenjihu@gmail.com
Phone
+86 186 2166 9879
Trader
This developer has identified itself as a trader per the definition from the European Union and committed to only offer products or services that comply with EU laws.

Privacy


The developer has disclosed that it will not collect or use your data.

This developer declares that your data is

Not being sold to third parties, outside of the approved use cases
Not being used or transferred for purposes that are unrelated to the item's core functionality
Not being used or transferred to determine creditworthiness or for lending purposes
