china-news-crawler
Total installs: 35
Weekly installs: 35
Site-wide rank: #5857
Install command
npx skills add https://github.com/nanmicoder/newscrawler --skill china-news-crawler
Installs by agent
opencode: 26
claude-code: 21
gemini-cli: 16
codex: 16
openclaw: 15
antigravity: 12
Skill documentation
China News Crawler Skill
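Extracts article content from mainstream Chinese news platforms and outputs it in JSON and Markdown formats.
Self-contained and portable: this Skill includes all the code it needs, has no dependencies on external code, and can be copied directly into other projects.
Supported platforms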
| Platform | ID | Example URL |
|---|---|---|
| WeChat Official Accounts (微信公众号) | wechat | https://mp.weixin.qq.com/s/xxxxx |
| Toutiao (今日头条) | toutiao | https://www.toutiao.com/article/123456/ |
| NetEase News (网易新闻) | netease | https://www.163.com/news/article/ABC123.html |
| Sohu News (搜狐新闻) | sohu | https://www.sohu.com/a/123456_789 |
| Tencent News (腾讯新闻) | tencent | https://news.qq.com/rain/a/20251016A07W8J00 |
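To illustrate how a URL could be mapped to one of the platform IDs above, here is a minimal sketch that matches on the hostname of the example URLs. The `detect_platform` function and the `HOST_TO_PLATFORM` table are hypothetical names; the Skill's own `detector.py` may use different rules (for example, path-based patterns).

```python
from typing import Optional
from urllib.parse import urlparse

# Hostname -> platform ID, taken from the example URLs in the table.
# Assumption: the real detector.py may match on more than the hostname.
HOST_TO_PLATFORM = {
    "mp.weixin.qq.com": "wechat",
    "www.toutiao.com": "toutiao",
    "www.163.com": "netease",
    "www.sohu.com": "sohu",
    "news.qq.com": "tencent",
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform ID for a URL, or None if no known host matches."""
    host = (urlparse(url).hostname or "").lower()
    return HOST_TO_PLATFORM.get(host)


if __name__ == "__main__":
    print(detect_platform("https://www.sohu.com/a/123456_789"))
```

A lookup table keeps the sketch easy to extend: supporting another site only means adding one entry.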
Usage
Basic usage
# Extract a news article (platform auto-detected; outputs JSON + Markdown)
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL"
# Specify the output directory
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL" --output ./output
# Output JSON only
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL" --format json
# Output Markdown only
uv run .claude/skills/china-news-crawler/scripts/extract_news.py "URL" --format markdown
# List supported platforms
uv run .claude/skills/china-news-crawler/scripts/extract_news.py --list-platforms
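When the script is driven from another Python program rather than typed by hand, the commands above can be assembled programmatically. The sketch below only uses the `--output` and `--format` flags shown in the commands; `build_command` and `run_extractor` are hypothetical helper names, not part of the Skill.

```python
import shlex
import subprocess
from typing import List, Optional

# Script path as shown in the usage commands.
SCRIPT = ".claude/skills/china-news-crawler/scripts/extract_news.py"


def build_command(url: str,
                  output: Optional[str] = None,
                  fmt: Optional[str] = None) -> List[str]:
    """Build the `uv run` argument list for one article URL."""
    cmd = ["uv", "run", SCRIPT, url]
    if output is not None:
        cmd += ["--output", output]
    if fmt is not None:
        # Only the two format values shown in the usage commands are accepted.
        if fmt not in ("json", "markdown"):
            raise ValueError(f"unsupported format: {fmt!r}")
        cmd += ["--format", fmt]
    return cmd


def run_extractor(url: str,
                  output: Optional[str] = None,
                  fmt: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run the extractor; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(url, output, fmt), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the demo needs no network.
    print(shlex.join(build_command("URL", output="./output", fmt="json")))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.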
Output files
By default the script writes both formats to the target directory (default ./output):
- {news_id}.json – structured JSON data
- {news_id}.md – the article in Markdown format
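Workflow
- Receive URL – the user provides a news link
- Detect platform – the platform type is identified automatically
- Extract content – the matching crawler fetches and parses the content
- Convert format – JSON and Markdown are generated
- Write files – the results are saved to the target directory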
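The steps above can be sketched as a small pipeline that dispatches to a crawler by platform ID and writes both output files. This is an illustration under stated assumptions, not the Skill's actual code: `run_pipeline` and its callback parameters are hypothetical, and the demo uses stub functions in place of real crawlers.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List

Article = Dict[str, Any]


def run_pipeline(url: str,
                 detect: Callable[[str], str],
                 crawlers: Dict[str, Callable[[str], Article]],
                 to_markdown: Callable[[Article], str],
                 out_dir: Path) -> List[Path]:
    """Detect the platform, extract the article, and write JSON + Markdown."""
    platform = detect(url)                       # platform detection
    if platform not in crawlers:
        raise ValueError(f"unsupported platform: {platform!r}")
    article = crawlers[platform](url)            # content extraction
    out_dir.mkdir(parents=True, exist_ok=True)
    json_path = out_dir / f"{article['news_id']}.json"
    md_path = out_dir / f"{article['news_id']}.md"
    json_path.write_text(                        # format conversion + output
        json.dumps(article, ensure_ascii=False, indent=2), encoding="utf-8")
    md_path.write_text(to_markdown(article), encoding="utf-8")
    return [json_path, md_path]


if __name__ == "__main__":
    # Stub crawler and formatter: placeholders for the real modules.
    stub_crawlers = {"demo": lambda u: {"news_id": "123", "title": "Demo", "news_url": u}}
    with tempfile.TemporaryDirectory() as tmp:
        written = run_pipeline("https://example.com/a/123",
                               detect=lambda u: "demo",
                               crawlers=stub_crawlers,
                               to_markdown=lambda a: f"# {a['title']}\n",
                               out_dir=Path(tmp))
        print([p.name for p in written])
```

Passing the detector, crawlers, and formatter in as arguments keeps each step replaceable and makes the flow testable without network access.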
Output format
JSON structure
{
  "title": "Article title",
  "news_url": "Original link",
  "news_id": "Article ID",
  "meta_info": {
    "author_name": "Author/source",
    "author_url": "",
    "publish_time": "2024-01-01 12:00"
  },
  "contents": [
    {"type": "text", "content": "Paragraph text", "desc": ""},
    {"type": "image", "content": "https://...", "desc": ""},
    {"type": "video", "content": "https://...", "desc": ""}
  ],
  "texts": ["Paragraph 1", "Paragraph 2"],
  "images": ["Image URL 1", "Image URL 2"],
  "videos": []
}
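A consumer of this JSON will usually walk the ordered `contents` list. The sketch below groups its items by `type`, which is one plausible way the flat `texts`, `images`, and `videos` lists relate to `contents`; that relationship is an assumption, and `split_by_type` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

# A small document in the shape shown above (values are placeholders).
SAMPLE = """
{
  "title": "Article title",
  "contents": [
    {"type": "text", "content": "Paragraph text", "desc": ""},
    {"type": "image", "content": "https://example.com/a.png", "desc": ""},
    {"type": "video", "content": "https://example.com/v.mp4", "desc": ""}
  ]
}
"""


def split_by_type(contents: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group the `content` values of an ordered contents list by item type."""
    groups: Dict[str, List[str]] = {"text": [], "image": [], "video": []}
    for item in contents:
        # Unknown types get their own bucket instead of being dropped.
        groups.setdefault(item["type"], []).append(item["content"])
    return groups


if __name__ == "__main__":
    article = json.loads(SAMPLE)
    print(split_by_type(article["contents"]))
```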
Markdown structure
# Article title
## Article information
**Author**: xxx
**Published**: 2024-01-01 12:00
**Original link**: [Link](URL)
---
## Body content
Paragraph content...
![Image](URL)
---
## Media resources
### Images (N)
1. URL1
2. URL2
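For readers who want to see how the JSON fields could map onto this layout, here is a rough sketch of a renderer. It is not the Skill's `formatter.py`: `to_markdown` is a hypothetical function, the section labels are English renderings chosen for this example, and inline images in the body are omitted for brevity.

```python
from typing import Any, Dict, List


def to_markdown(article: Dict[str, Any]) -> str:
    """Render an article dict (JSON structure) into the Markdown layout."""
    meta = article.get("meta_info", {})
    lines: List[str] = [
        f"# {article['title']}",
        "## Article information",
        f"**Author**: {meta.get('author_name', '')}",
        f"**Published**: {meta.get('publish_time', '')}",
        f"**Original link**: [Link]({article['news_url']})",
        "---",
        "## Body content",
    ]
    lines.extend(article.get("texts", []))        # one paragraph per entry
    images = article.get("images", [])
    lines += ["---", "## Media resources", f"### Images ({len(images)})"]
    lines += [f"{i}. {url}" for i, url in enumerate(images, start=1)]
    # Blank lines between blocks keep headings and paragraphs separate.
    return "\n\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = {
        "title": "Demo",
        "news_url": "https://example.com/a/1",
        "meta_info": {"author_name": "Someone", "publish_time": "2024-01-01 12:00"},
        "texts": ["First paragraph.", "Second paragraph."],
        "images": ["https://example.com/a.png", "https://example.com/b.png"],
    }
    print(to_markdown(demo))
```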
Usage examples
Extract a WeChat Official Account article
uv run .claude/skills/china-news-crawler/scripts/extract_news.py \
"https://mp.weixin.qq.com/s/ebMzDPu2zMT_mRgYgtL6eQ"
Output:
[INFO] Platform detected: wechat (微信公众号)
[INFO] Extracting content...
[INFO] Title: Article title
[INFO] Author: Official account name
[INFO] Text paragraphs: 15
[INFO] Images: 3
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.json
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.md
Extract a Toutiao article
uv run .claude/skills/china-news-crawler/scripts/extract_news.py \
"https://www.toutiao.com/article/7434425099895210546/"
Dependencies
This Skill requires the following Python packages (usually already installed in the main project):
- parsel
- pydantic
- requests
- curl-cffi
- tenacity
- demjson3
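Before copying the Skill into another project, it can help to check which of these packages are importable there. The sketch below uses only the standard library; the import names are the usual ones for these distributions (note `curl-cffi` is imported as `curl_cffi`), and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec
from typing import Dict, List

# Distribution name -> top-level import name.
REQUIRED: Dict[str, str] = {
    "parsel": "parsel",
    "pydantic": "pydantic",
    "requests": "requests",
    "curl-cffi": "curl_cffi",
    "tenacity": "tenacity",
    "demjson3": "demjson3",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the distribution names whose import module cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```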
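Error handling
| Error type | Description | Solution |
|---|---|---|
| Platform not recognized (无法识别该平台) | The URL does not match any supported platform | Check that the URL is correct |
| Platform not supported (平台不支持) | The site is not a Chinese site | This Skill supports Chinese news sites only |
| Extraction failed (提取失败) | Network error or a change in the page structure | Retry, or check that the URL is valid |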
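For the "Extraction failed" case, a caller might wrap the extraction in a bounded retry with exponential backoff. The sketch below is a generic standard-library version; `retry` is a hypothetical helper, and whether the Skill itself retries this way (for example via `tenacity`, which is in its dependency list) is not stated in this document.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T],
          attempts: int = 3,
          base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func until it succeeds, waiting base_delay * 2**n between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Fails twice, then succeeds, to simulate a transient network error.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(retry(flaky, base_delay=0.01))
```

Keeping the attempt count small matters here: a page-structure change will not fix itself, so repeated retries only add load on the target site.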
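Notes
- For educational and research purposes only
- Do not crawl at large scale
- Respect the target site's robots.txt and terms of service
- WeChat Official Accounts may require a valid Cookie (the current default configuration usually works)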
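The robots.txt point can be checked in code before fetching a page. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text that has already been downloaded; `allowed_by_robots` is a hypothetical helper, and nothing here implies the Skill performs this check itself.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example rules; a real check would use the target site's own robots.txt.
    rules = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(rules, "https://example.com/news/1"))
    print(allowed_by_robots(rules, "https://example.com/private/1"))
```

Terms of service are not machine-readable in the same way, so they still need to be read by hand.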
Directory structure
china-news-crawler/
├── SKILL.md                    # [Required] Skill definition file
├── references/
│   └── platform-patterns.md    # Platform URL pattern notes
└── scripts/
    ├── extract_news.py         # CLI entry script
    ├── models.py               # Data models
    ├── detector.py             # Platform detection
    ├── formatter.py            # Markdown formatting
    └── crawlers/               # Crawler modules
        ├── __init__.py
        ├── base.py             # BaseNewsCrawler base class
        ├── fetchers.py         # HTTP fetch strategies
        ├── wechat.py           # WeChat Official Accounts
        ├── toutiao.py          # Toutiao
        ├── netease.py          # NetEase News
        ├── sohu.py             # Sohu News
        └── tencent.py          # Tencent News