蜂采 (Fengcai): Product Data Scraper
Overview
Batch-scrape product detail pages, including product name, price, brand, model, sales unit, product images, and product description.
# 蜂采 (Fengcai) · Product Data Scraper: Feature Guide & User Manual

---

## Table of Contents

- [Introduction](#introduction)
- [UI Overview](#ui-overview)
- [Step 1: Product List](#step-1-product-list)
- [Step 2: Detail Page Rules](#step-2-detail-page-rules)
- [Step 3: Run Scraping](#step-3-run-scraping)
- [Screenshot Feature](#screenshot-feature)
- [Export Results](#export-results)
- [Rule Management](#rule-management)
- [Notes](#notes)

---

## Introduction

**蜂采 (Fengcai)** is a Chrome side-panel extension designed for e-commerce product research and product data organization. No coding is required: you configure rules by pointing and clicking, then batch-scrape product detail pages, automatically download images, take screenshots, and export the results to Excel.

---

## UI Overview

The extension runs as a side panel, with a three-step navigation bar at the top:

| Step | Tab               | Description                                                        |
| ---- | ----------------- | ------------------------------------------------------------------ |
| 1    | Product List      | Manage the SKUs and URLs to scrape                                 |
| 2    | Detail Page Rules | Configure field extraction rules and screenshot settings per site  |
| 3    | Run Scraping      | Set scraping parameters, start scraping, and view real-time logs   |

---

## Step 1: Product List

### Two Modes

**① Manual Mode (default)**

- Import a CSV or Excel file (must include `SKU` and `URL` columns)
- Or enter products manually, one per line, as the SKU and the URL separated by a space: `SKU URL`
- Both the SKU and the URL in the list are clickable and open the corresponding link in a new tab
- Select all rows or individual rows, then delete them in bulk

**② List Page Mode**

- Suited to bulk collection from category pages and search results pages
- Configuration:
  - **Entry URL**: the page where scraping starts
  - **Item Selector**: CSS selector that locates each product card
  - **List Fields**: fields extracted directly from the list page (title, price, main image, etc.)
  - **Pagination**: next-page button or infinite scroll, with configurable stop conditions
  - **Enter Detail Page**: when checked, you can configure a detail link selector so that detail pages are scraped as well

> If the entry URL matches the currently active browser tab, the extension reuses that tab instead of opening a new one.

---

## Step 2: Detail Page Rules

### Rule Sets

Each **rule set** corresponds to one or more sites and contains:

- **Name**: identifier for the rule set
- **URL Pattern**: one or more regular expressions, comma-separated, used to match product URLs automatically
- **Field Rules**: the fields to extract
- **Advanced Settings**: page load condition, auto-scroll, pre-script
- **Screenshot Settings**: see the Screenshot Feature section below

During scraping, the rule set is selected automatically by matching the product URL; products with no matching rule set are skipped.
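The rule-set matching described above can be pictured with a small sketch. The rule-set shape (`name`, `urlPattern`), the function names, and the example URLs are all hypothetical and are not taken from the extension; the sketch only assumes that patterns are comma-separated regular expressions and that the first matching rule set is used.

```javascript
// Hypothetical rule-set shape; field names and URLs are illustrative only.
const ruleSets = [
  { name: "Shop A", urlPattern: "shop-a\\.example\\.com/item/\\d+, shop-a\\.example\\.com/p/" },
  { name: "Shop B", urlPattern: "^https://www\\.shop-b\\.example/product" },
];

// Split a comma-separated pattern string into RegExp objects.
// Caveat: a naive split like this breaks patterns that themselves contain
// commas, such as \d{1,3}.
function compilePatterns(urlPattern) {
  return urlPattern
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0)
    .map((p) => new RegExp(p));
}

// Return the first rule set with a pattern matching the URL, or null if none does.
function matchRuleSet(url, sets) {
  for (const set of sets) {
    if (compilePatterns(set.urlPattern).some((re) => re.test(url))) return set;
  }
  return null;
}

const products = [
  { sku: "ABC001", url: "https://shop-a.example.com/item/123" },
  { sku: "XYZ002", url: "https://unknown.example.org/x" },
];
for (const { sku, url } of products) {
  const set = matchRuleSet(url, ruleSets);
  console.log(sku, set ? `-> ${set.name}` : "-> skipped (no matching rule set)");
}
```

In this sketch a product whose URL matches no pattern yields `null`, which corresponds to the "skipped" behavior described above.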
---

### Field Rule Configuration

Each field rule includes:

| Setting               | Description                                                                                     |
| --------------------- | ----------------------------------------------------------------------------------------------- |
| Field Name            | Column header in the exported Excel file                                                        |
| CSS Selector          | Multiple lines supported; selectors are tried in order and the first one that matches is used  |
| Field Type            | Text / Number / Image / Image List / HTML / Attribute                                           |
| Is List               | Extract multiple values (checked automatically for Image List)                                  |
| Image Storage         | None / Main Image / Description Image                                                           |
| Post-process Function | JavaScript function that transforms the extracted value                                         |
| Column Width          | Width of this column in the exported Excel file                                                 |

Also supported:

- **Element Picker**: click the button, then click an element on the page; the selector is generated automatically
- **Test Match**: verify in real time which elements the selector matches on the current page
- **Drag to Reorder**: drag the handle on the left to change the field order

---

### Advanced Settings

| Setting        | Description                                                                          |
| -------------- | ------------------------------------------------------------------------------------ |
| Load Condition | Wait for an element to appear or disappear; used with a selector and a timeout       |
| Auto Scroll    | Scroll the page automatically before extraction to trigger lazy-loaded content       |
| Pre-script     | Run custom JavaScript before data extraction (supports async/await)                  |

---

## Step 3: Run Scraping

| Setting         | Description                                                                                                |
| --------------- | ---------------------------------------------------------------------------------------------------------- |
| Scrape Interval | Wait time in seconds between two consecutive products; 5 or more is recommended                            |
| Image Naming    | File name pattern for main and description images; supports the `{sku}`, `{index}`, and `{ext}` variables  |
| Image Storage   | Merged (main and description images share directories) / Separated (one subdirectory per SKU)              |

Controls: **Start → Pause → Resume → Stop**

The real-time log panel on the right shows the scraping status, image download progress, and screenshot result for each product.

The export directory structure preview at the bottom of the Step 3 tab updates live to reflect the current settings, including whether the `screenshots/` folder is shown.

---

## Screenshot Feature

Screenshot settings are located below the field list in the **Detail Page Rules** tab and are configured independently for each rule set.

### Enable Screenshot

When **Enable Screenshot** is checked, a screenshot is taken automatically after each detail page is scraped.
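To make the interaction between the Step 3 settings and the screenshot toggle more concrete, here is a minimal sketch of how an export layout preview could be derived. Everything in it is an assumption made for illustration: the function names, the `settings` shape, the two-digit zero-padding of `{index}`, the example naming patterns, and the exact directory layout used for the merged and separated storage modes.

```javascript
// Hypothetical helpers; the extension's real implementation is not documented here.

// Replace {sku}, {index} and {ext} in a naming pattern.
// Assumption: the index is zero-padded to two digits.
function expandPattern(pattern, { sku, index, ext }) {
  const values = { sku, index: String(index).padStart(2, "0"), ext };
  return pattern.replace(/\{(sku|index|ext)\}/g, (_, key) => values[key]);
}

// Build a preview of the export layout for one SKU.
// Assumption: "separated" nests images under images/<sku>/<kind>/,
// while "merged" uses shared images/<kind>/ directories.
function previewLayout(settings, sku) {
  const paths = [];
  for (const kind of ["main", "description"]) {
    const dir = settings.storage === "separated" ? `images/${sku}/${kind}` : `images/${kind}`;
    const file = expandPattern(settings.naming[kind], { sku, index: 1, ext: "jpg" });
    paths.push(`${dir}/${file}`);
  }
  // The screenshots/ folder only appears when screenshots are enabled.
  if (settings.screenshotEnabled) paths.push("screenshots/");
  return paths;
}

const settings = {
  storage: "separated",
  naming: { main: "{index}_{sku}.{ext}", description: "{sku}_{index}.{ext}" },
  screenshotEnabled: true,
};
console.log(previewLayout(settings, "ABC001"));
```

Switching `screenshotEnabled` to `false` drops the `screenshots/` entry from the preview, and switching `storage` to `"merged"` removes the per-SKU directory level in this sketch.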
### Screenshot Modes

| Mode                                         | Description                                                                                          |
| -------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| Visible Area (default)                       | Captures the current browser viewport                                                                |
| Full Page                                    | Captures the entire page, including content outside the viewport, via the Chrome DevTools Protocol   |
| Scroll to element, then capture visible area | Scrolls to the target element, then captures the viewport                                            |
| Capture only the specified element           | Crops precisely to the element's boundaries, excluding surrounding content                           |

> The "Full Page" and "Capture only the specified element" modes require the `debugger` permission, which is released immediately after each screenshot.

### Element Selector

- Enter a CSS selector that identifies the target element
- Click the **Picker button** (crosshair icon) to pick an element directly from the page
- Click the **Test button** (checkmark icon) to verify that the selector matches the intended element; the element's position and dimensions are shown

### File Naming & Save Location

- File name: `{SKU}.png` (special characters are replaced with `_` automatically)
- Save location: the `screenshots/` subfolder inside the export directory
- The folder is created automatically when scraping begins

### Import / Export

Screenshot settings are saved together with the rule set and are included when rule sets are imported or exported as JSON.

---

## Export Results

After scraping completes, the following are exported automatically:

- **Excel file** (`.xlsx`, with a timestamp in the name): styled header row, borders, auto-filter, frozen first row
- **Image files**: downloaded automatically to the directories determined by your settings
- **Screenshots**: saved to the `screenshots/` directory (if enabled)

### Export Directory Structure Example

```
export-dir/
├── ProductData_2026-04-22T13-30-00.xlsx
├── images/
│   ├── ABC001/
│   │   ├── main/
│   │   │   ├── 01_ABC001.jpg
│   │   │   └── 02_ABC001.jpg
│   │   └── description/
│   │       └── ABC001_01.jpg
│   └── XYZ002/
│       └── ...
└── screenshots/          # Only present when screenshots are enabled
    ├── ABC001.png
    └── XYZ002.png
```

---

## Rule Management

| Action                      | Description                                                                    |
| --------------------------- | ------------------------------------------------------------------------------ |
| Export Rule Set             | Save the current rule set (including screenshot settings) as a JSON file       |
| Import Rule Set             | Load a rule set from a JSON file; optionally apply it immediately              |
| Backup Custom Rules         | Export all custom rule sets as a single JSON backup file                       |
| Global Config Export/Import | Full backup including all rule sets, settings, and list configuration          |

---

## Notes

- Set the scrape interval to 5 seconds or more to avoid putting pressure on the target site
- Before using List Page Mode, test that your selectors and fields match as expected
- Some sites require you to be logged in before full product data can be scraped
- List pagination is capped at 200 pages; scraping stops automatically once the cap is reached
- Full-page screenshot mode briefly attaches the debugger to the tab; sites with anti-scraping measures may detect this
- Screenshot capture requires the tab to stay active during scraping; the extension switches to it automatically
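The 200-page pagination cap mentioned in the notes can be pictured as a simple guard in a list-mode loop. This is only a sketch under stated assumptions: `getPage`, `collectItems`, and the shape of the returned object are hypothetical names invented for the example, and the extension's actual loop is not documented here.

```javascript
// Hypothetical sketch of a page-capped pagination loop.
// `getPage(n)` stands in for whatever loads page n and returns
// { items: [...], hasNext: boolean }; it is not part of the extension's API.
const MAX_PAGES = 200; // cap stated in the notes

function collectItems(getPage, maxPages = MAX_PAGES) {
  const items = [];
  let pagesVisited = 0;
  let stoppedByCap = false;
  for (let page = 1; ; page++) {
    if (page > maxPages) {
      stoppedByCap = true; // more pages exist, but the cap was reached
      break;
    }
    const { items: pageItems, hasNext } = getPage(page);
    items.push(...pageItems);
    pagesVisited = page;
    if (!hasNext) break; // normal stop condition
  }
  return { items, pagesVisited, stoppedByCap };
}

// A fake source that never runs out of pages, to show the cap taking effect.
const endless = (n) => ({ items: [`item-${n}`], hasNext: true });
const result = collectItems(endless);
console.log(result.pagesVisited, result.stoppedByCap);
```

Passing `getPage` in as a parameter keeps the loop testable without a browser; a real list-page scraper would replace it with page navigation or scrolling.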
0 out of 5 (no ratings)
Details
- Version: 1.3.1
- Updated: April 26, 2026
- Offered by: ezmo
- Size: 1.79MiB
- Languages: 2 languages
- Developer: 陈继虎
  南翔镇栖林路425弄1号2601室 嘉定区, 上海市 201802 CN
  Email: chenjihu@gmail.com
  Phone: +86 186 2166 9879
- Trader: This developer has identified itself as a trader per the definition from the European Union and committed to only offer products or services that comply with EU laws.
Privacy
This developer declares that your data is
- Not being sold to third parties, outside of the approved use cases
- Not being used or transferred for purposes that are unrelated to the item's core functionality
- Not being used or transferred to determine creditworthiness or for lending purposes