wechat-article-fetcher
npx skills add https://github.com/wwwzhouhui/skills_collection --skill wechat-article-fetcher
Skill Documentation
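WeChat Official Account Article Fetcher
Fetches, parses, and saves WeChat Official Account articles. Supports single-article and batch downloads, metadata extraction, image downloads, and Markdown conversion.
Quick Start
Fetch a single article: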
python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx"
Fetch multiple articles in one batch (space-separated):
python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output
Fetch multiple articles in one batch (comma-separated):
python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output
Print metadata only (no files are saved):
python scripts/fetch_wechat_article.py "https://mp.weixin.qq.com/s/xxxxx" --json
Installing Dependencies
pip install beautifulsoup4 html2text requests
Features
1. Fetch an article and save it locally
python scripts/fetch_wechat_article.py "<url>" --output-dir ./output
Output directory structure:
output/<account name>/<date>_<title>/
├── index.html # Formatted standalone HTML file
├── article.md # Markdown version
├── meta.json # Article metadata
└── images/ # Downloaded images
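The layout above can be sketched as a small path-building helper. This is only an illustration: the names `sanitize` and `article_dir` are hypothetical, and the script's actual date format and file-name sanitization rules are not documented here, so both are assumptions.

```python
import re
from pathlib import Path


def sanitize(name: str) -> str:
    """Replace characters that are unsafe in file names with underscores.

    Assumption: the real script may use different sanitization rules.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", name).strip("_")
    return cleaned or "untitled"


def article_dir(output_dir: str, account: str, date: str, title: str) -> Path:
    """Build <output_dir>/<account name>/<date>_<title>/ for one article."""
    return Path(output_dir) / sanitize(account) / f"{sanitize(date)}_{sanitize(title)}"


if __name__ == "__main__":
    target = article_dir("output", "My Account", "2024-01-05", "Hello: World?")
    print(target.as_posix())
```

Keeping the account name as its own path component is what lets articles from one account end up side by side.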
2. Extract metadata only
python scripts/fetch_wechat_article.py "<url>" --json
The returned JSON contains: title (article title), author (author), account_nickname (Official Account name), description (summary), create_time (publication time), content_text (body as plain text), content_markdown (body as Markdown), cover_image (cover image), and source_url (original article link).
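A caller that consumes this output may want to check that every listed field is present before using it. The sketch below assumes that `--json` prints a single JSON object; the helper name `parse_metadata` is hypothetical, and the value types (for example, whether create_time is a string or a timestamp) are not specified in this document.

```python
import json

# Field names taken from the list above.
EXPECTED_FIELDS = (
    "title",
    "author",
    "account_nickname",
    "description",
    "create_time",
    "content_text",
    "content_markdown",
    "cover_image",
    "source_url",
)


def parse_metadata(raw: str) -> dict:
    """Parse the JSON text and verify that all expected fields exist."""
    meta = json.loads(raw)
    if not isinstance(meta, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in EXPECTED_FIELDS if key not in meta]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return meta


if __name__ == "__main__":
    sample = json.dumps({key: "" for key in EXPECTED_FIELDS})
    print(sorted(parse_metadata(sample)))
```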
3. Batch-download multiple articles
Separate multiple links with spaces:
python scripts/fetch_wechat_article.py "url1" "url2" "url3" --output-dir ./output
Separate multiple links with commas:
python scripts/fetch_wechat_article.py "url1,url2,url3" --output-dir ./output
Set a custom download interval (default: 3 seconds, to avoid triggering anti-scraping measures):
python scripts/fetch_wechat_article.py "url1" "url2" --interval 5
Articles from the same Official Account are automatically grouped into the same directory.
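The two input styles and the pause between downloads can be illustrated roughly as follows. This is a sketch of the idea, not the script's actual code: `split_urls` and `paced` are hypothetical names, and it assumes that article links never contain a literal comma.

```python
import time
from typing import Callable, Iterable, Iterator, List


def split_urls(args: Iterable[str]) -> List[str]:
    """Flatten space-separated and comma-separated arguments into one list."""
    urls: List[str] = []
    for arg in args:
        for part in arg.split(","):
            part = part.strip()
            if part:
                urls.append(part)
    return urls


def paced(
    urls: Iterable[str],
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield each URL, sleeping `interval` seconds between consecutive ones."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(interval)  # no wait before the first article
        yield url


if __name__ == "__main__":
    for url in paced(split_urls(["url1,url2", "url3"]), interval=0.1):
        print("would fetch", url)
```

Passing `sleep` as a parameter keeps the pacing logic easy to exercise without real delays.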
4. Skip image downloads
python scripts/fetch_wechat_article.py "<url>" --no-images
5. Use as a Python library
from scripts.fetch_wechat_article import fetch_article, batch_fetch
# Fetch a single article and save it
result = fetch_article("https://mp.weixin.qq.com/s/xxxxx", output_dir="./output")
print(result['title'], result['path'])
# Fetch only the metadata of a single article
meta = fetch_article("https://mp.weixin.qq.com/s/xxxxx", json_only=True)
print(meta['title'])
print(meta['content_text'][:200])
# Batch fetch
urls = ["https://mp.weixin.qq.com/s/aaa", "https://mp.weixin.qq.com/s/bbb"]
stats = batch_fetch(urls, output_dir="./output", interval=3.0)
print(f"Succeeded: {stats['success']}, failed: {stats['fail']}")
主è¦å½æ°åæ°ï¼
urlï¼æç« é¾æ¥ï¼æ¯æç龿¥åé¿é¾æ¥ï¼output_dirï¼ä¿åç®å½ï¼é»è®¤ï¼./wechat_articlesï¼download_imgï¼æ¯å¦ä¸è½½å¾çï¼é»è®¤ï¼Trueï¼to_markdownï¼æ¯å¦è½¬æ¢ä¸º Markdownï¼é»è®¤ï¼Trueï¼json_onlyï¼ä» è¿åå æ°æ®åå ¸ï¼ä¸ä¿åæä»¶
batch_fetch é¢å¤åæ°ï¼
urlsï¼æç« é¾æ¥å表intervalï¼æ¯ç¯æç« ä¹é´çä¸è½½é´éç§æ°ï¼é»è®¤ï¼3.0ï¼
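Since url accepts both link forms, a caller might want to tell them apart before fetching. The sketch below is one possible heuristic, not part of the script's API: `classify_link` is a hypothetical helper, and it assumes short links look like /s/<id> while long links carry a __biz query parameter.

```python
from urllib.parse import parse_qs, urlparse


def classify_link(url: str) -> str:
    """Return "short", "long", or "unknown" for a WeChat article link."""
    parsed = urlparse(url)
    if parsed.netloc != "mp.weixin.qq.com":
        return "unknown"
    if "__biz" in parse_qs(parsed.query):
        return "long"
    if parsed.path.startswith("/s/") and len(parsed.path) > len("/s/"):
        return "short"
    return "unknown"


if __name__ == "__main__":
    print(classify_link("https://mp.weixin.qq.com/s/xxxxx"))
```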
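Notes
- Prefer short links (/s/xxxxx); long links carrying the __biz parameter may trigger a CAPTCHA.
- Batch downloads wait 3 seconds between articles by default; adjust this with --interval to avoid triggering WeChat's anti-scraping mechanism.
- A WeChat mobile User-Agent is used automatically to get past access restrictions.
- WeChat images use the data-src attribute (not src) because they are lazy-loaded.
- Image downloads must send the Referer: https://mp.weixin.qq.com/ request header.
- For details of the HTML structure, see references/wechat_html_structure.md.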
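The image-related notes can be illustrated with a short sketch. It uses the standard-library `html.parser` rather than BeautifulSoup so that it runs without extra packages. The names `ImageCollector`, `extract_image_urls`, and `IMAGE_HEADERS` are hypothetical, and the User-Agent value is only a placeholder, since the exact string the script sends is not given here.

```python
from html.parser import HTMLParser
from typing import List

# Headers an image request would carry. The Referer value comes from the
# notes above; the User-Agent is an illustrative placeholder, not the
# string the script actually uses.
IMAGE_HEADERS = {
    "Referer": "https://mp.weixin.qq.com/",
    "User-Agent": "Mozilla/5.0 (iPhone) MicroMessenger",
}


class ImageCollector(HTMLParser):
    """Collect image URLs, preferring data-src over src (lazy loading)."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        url = attributes.get("data-src") or attributes.get("src")
        if url:
            self.urls.append(url)


def extract_image_urls(html: str) -> List[str]:
    """Return the image URLs found in an article's HTML, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


if __name__ == "__main__":
    page = '<img data-src="https://mmbiz.qpic.cn/a.png" src="placeholder.gif">'
    print(extract_image_urls(page))
```

With the requests package listed under dependencies, each URL could then be fetched with `requests.get(url, headers=IMAGE_HEADERS, timeout=10)`.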