wechat-article-extractor
17
Total installs
16
Weekly installs
#20734
Site-wide rank
Install command
npx skills add https://github.com/freestylefly/wechat-article-extractor-skill --skill wechat-article-extractor
Agent install distribution
opencode
16
gemini-cli
16
github-copilot
16
amp
16
codex
16
openclaw
16
Skill documentation
WeChat Article Extractor
Extract metadata and content from WeChat Official Account (微信公众号) articles.
Capabilities
- Parse WeChat article URLs (mp.weixin.qq.com)
- Extract article metadata: title, author, description, publish time
- Extract account info: name, avatar, alias, description
- Get article content (HTML)
- Get cover image URL
- Support multiple article types: post, video, image, voice, text, repost
- Handle various error cases: deleted content, expired links, access limits
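Before calling the extractor, it can help to check that an input string is a URL on one of the hosts this document names. The following is a minimal sketch, not part of the skill: `isSupportedArticleUrl` and `ARTICLE_HOSTS` are hypothetical names, and the host list is assumed from the hosts mentioned in this README.

```javascript
// Hypothetical pre-check; scripts/extract.js may apply different rules.
const ARTICLE_HOSTS = new Set(['mp.weixin.qq.com', 'weixin.sogou.com']);

function isSupportedArticleUrl(input) {
  let parsed;
  try {
    parsed = new URL(input);
  } catch {
    // Not parseable as an absolute URL (it may be raw HTML instead).
    return false;
  }
  const isHttp = parsed.protocol === 'http:' || parsed.protocol === 'https:';
  return isHttp && ARTICLE_HOSTS.has(parsed.hostname);
}
```

A caller could use this to decide between passing a URL to `extract` directly and treating the input as HTML.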
Usage
Basic Extraction from URL
const { extract } = require('./scripts/extract.js');
(async () => {
  // await is only valid inside an async function in a CommonJS script
  const result = await extract('https://mp.weixin.qq.com/s?__biz=...');
  // Returns: { done: true, code: 0, data: {...} }
})();
Extraction from HTML
// Run inside an async function; sourceUrl is the article's original URL
const html = await fetch(sourceUrl).then(r => r.text());
const result = await extract(html, { url: sourceUrl });
Options
const result = await extract(url, {
shouldReturnContent: true, // Return HTML content (default: true)
shouldReturnRawMeta: false, // Return raw metadata (default: false)
shouldFollowTransferLink: true, // Follow migrated account links (default: true)
shouldExtractMpLinks: false, // Extract embedded mp.weixin links (default: false)
shouldExtractTags: false, // Extract article tags (default: false)
shouldExtractRepostMeta: false // Extract repost source info (default: false)
});
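To see which settings are in effect after overrides, the documented defaults can be merged with caller-supplied options. This is a sketch under assumptions: `resolveOptions` is a hypothetical helper, the defaults are copied from the comments above, and rejecting unknown keys is an illustrative choice rather than documented behavior of `extract`.

```javascript
// Defaults as listed in the Options section; not read from the library itself.
const DEFAULT_OPTIONS = Object.freeze({
  shouldReturnContent: true,
  shouldReturnRawMeta: false,
  shouldFollowTransferLink: true,
  shouldExtractMpLinks: false,
  shouldExtractTags: false,
  shouldExtractRepostMeta: false,
});

// `url` is passed alongside raw HTML in the "Extraction from HTML" example.
const PASSTHROUGH_KEYS = new Set(['url']);

function resolveOptions(overrides = {}) {
  // Catch misspelled option names early instead of silently ignoring them.
  const unknown = Object.keys(overrides).filter(
    (key) => !(key in DEFAULT_OPTIONS) && !PASSTHROUGH_KEYS.has(key)
  );
  if (unknown.length > 0) {
    throw new TypeError(`Unknown option(s): ${unknown.join(', ')}`);
  }
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```

The returned object can be passed as the second argument to `extract`, for example `extract(url, resolveOptions({ shouldExtractTags: true }))`.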
Response Format
Success Response
{
done: true,
code: 0,
data: {
// Account info
account_name: "Official Account name",
account_alias: "WeChat ID",
account_avatar: "avatar URL",
account_description: "account description",
account_id: "original ID",
account_biz: "biz parameter",
account_biz_number: 1234567890,
account_qr_code: "QR code URL",
// Article info
msg_title: "article title",
msg_desc: "article summary",
msg_content: "HTML content",
msg_cover: "cover image URL",
msg_author: "author",
msg_type: "post", // post|video|image|voice|text|repost
msg_has_copyright: true,
msg_publish_time: Date,
msg_publish_time_str: "2024/01/15 10:30:00",
// Link params
msg_link: "article link",
msg_source_url: "'Read original' link",
msg_sn: "sn parameter",
msg_mid: 1234567890,
msg_idx: 1
}
}
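The link-parameter fields appear to mirror the query string of a long-form article URL. The sketch below shows how such values could be read with the standard `URL` API. `parseLinkParams` is a hypothetical helper, and the query names `mid`, `idx`, and `sn` are assumed from the field names above; only `__biz` appears in this document's example URL.

```javascript
// Hypothetical helper: reads link parameters from an article URL.
function parseLinkParams(msgLink) {
  const params = new URL(msgLink).searchParams;
  // Missing parameters stay null rather than becoming NaN.
  const toInt = (value) => (value === null ? null : Number.parseInt(value, 10));
  return {
    account_biz: params.get('__biz'),
    msg_mid: toInt(params.get('mid')),
    msg_idx: toInt(params.get('idx')),
    msg_sn: params.get('sn'),
  };
}
```

For a URL without these query parameters, every field comes back as `null`, which matches the note that some fields may be null.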
Error Response
{
done: false,
code: 1001,
msg: "无法获取文章信息" // "Unable to get article info"
}
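Because failures are reported through `done` and `code` in the returned object, callers may prefer to convert them into exceptions. This is a minimal sketch assuming only the two response shapes shown above; `ExtractError` and `unwrapResult` are hypothetical names, not part of the skill.

```javascript
// Hypothetical error type carrying the numeric code from the response.
class ExtractError extends Error {
  constructor(code, msg) {
    super(msg);
    this.name = 'ExtractError';
    this.code = code;
  }
}

// Returns result.data on success; throws ExtractError otherwise.
function unwrapResult(result) {
  if (result && result.done === true && result.code === 0) {
    return result.data;
  }
  const code = result ? result.code : undefined;
  const msg = result && result.msg ? result.msg : 'unknown extraction failure';
  throw new ExtractError(code, msg);
}
```

With this wrapper, `unwrapResult(await extract(url))` yields the `data` object directly, and a `try`/`catch` can branch on `error.code`.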
Error Codes
| Code | Message | Description |
|---|---|---|
| 1000 | 文章获取失败 | General failure |
| 1001 | 无法获取文章信息 | Missing title or publish time |
| 1002 | 请求失败 | HTTP request failed |
| 1003 | 响应为空 | Empty response |
| 1004 | 访问过于频繁 | Rate limited |
| 1005 | 脚本解析失败 | Script parsing error |
| 1006 | 公众号已迁移 | Account migrated |
| 2001 | 请提供文章内容或链接 | Missing input |
| 2002 | 链接已过期 | Link expired |
| 2003 | 内容涉嫌侵权 | Content removed (copyright) |
| 2004 | 无法获取迁移后的链接 | Migration link failed |
| 2005 | 内容已被发布者删除 | Content deleted by author |
| 2006 | 内容因违规无法查看 | Content blocked |
| 2007 | 内容发送失败 | Failed to send |
| 2008 | 系统出错 | System error |
| 2009 | 不支持的链接 | Unsupported URL |
| 2010 | 内容获取失败 | Content fetch failed |
| 2011 | 涉嫌过度营销 | Marketing/spam content |
| 2012 | 账号已被屏蔽 | Account blocked |
| 2013 | 账号已自主注销 | Account deleted |
| 2014 | 内容被投诉 | Content reported |
| 2015 | 账号处于迁移流程中 | Account migrating |
| 2016 | 冒名侵权 | Impersonation |
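One way to act on these codes is to group them by how a caller might respond. The grouping below is an editorial reading of the table, not behavior defined by the skill: which codes count as transient is an assumption, and `classifyErrorCode` is a hypothetical helper.

```javascript
// Assumed groupings based on the descriptions in the table above.
const TRANSIENT_CODES = new Set([1002, 1003, 1004, 2008, 2010]);
const MIGRATION_CODES = new Set([1006, 2004, 2015]);
const BAD_INPUT_CODES = new Set([2001, 2009]);

function classifyErrorCode(code) {
  if (code === 0) return 'ok';
  if (TRANSIENT_CODES.has(code)) return 'retry-later';
  if (MIGRATION_CODES.has(code)) return 'migration';
  if (BAD_INPUT_CODES.has(code)) return 'bad-input';
  // Deleted, blocked, expired, and similar states are unlikely to change.
  return 'permanent';
}
```

A scheduler could retry only the `retry-later` group with a delay and record the rest without further requests.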
Dependencies
Required npm packages:
- cheerio – HTML parsing
- dayjs – Date formatting
- request-promise – HTTP requests
- qs – Query string parsing
- lodash.unescape – HTML entities
Notes
- Handles various WeChat page structures and anti-scraping measures
- Automatically detects article type from page content
- Supports extracting from Sogou WeChat search results (weixin.sogou.com)
- Some fields may be null depending on article type and page structure