news-extractor
3
Total installs
3
Weekly installs
#55869
Site-wide rank
Install command
npx skills add https://github.com/jackjin1997/clawforge --skill news-extractor
Agent install distribution
mcpjam
3
claude-code
3
replit
3
junie
3
windsurf
3
zencoder
3
Skill documentation
News Extractor Skill
Extracts article content from mainstream news platforms and outputs it as JSON and Markdown.
Supported platforms
| Platform | ID | Example URL |
|---|---|---|
| WeChat Official Accounts | wechat | https://mp.weixin.qq.com/s/xxxxx |
| Toutiao | toutiao | https://www.toutiao.com/article/123456/ |
| NetEase News | netease | https://www.163.com/news/article/ABC123.html |
| Sohu News | sohu | https://www.sohu.com/a/123456_789 |
| Tencent News | tencent | https://news.qq.com/rain/a/20251016A07W8J00 |
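The platform IDs in the table can be tied to URLs with a simple host lookup. The following is a minimal sketch of that idea, not the skill's actual detection code: `PLATFORM_HOSTS` and `detect_platform` are hypothetical names, and matching on the host suffix is an assumption based only on the example URLs above.

```python
from typing import Optional
from urllib.parse import urlparse

# Host suffix -> platform ID, taken from the table of supported platforms.
# The suffix-matching rule is an assumption, not the skill's real logic.
PLATFORM_HOSTS = {
    "mp.weixin.qq.com": "wechat",
    "toutiao.com": "toutiao",
    "163.com": "netease",
    "sohu.com": "sohu",
    "news.qq.com": "tencent",
}


def detect_platform(url: str) -> Optional[str]:
    """Return the platform ID for a URL, or None if no host matches."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, platform_id in PLATFORM_HOSTS.items():
        # Accept the exact host or any subdomain of it (e.g. www.sohu.com).
        if host == suffix or host.endswith("." + suffix):
            return platform_id
    return None
```

A URL whose host is not in the mapping yields `None`, which a caller could report as an unrecognized platform.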
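Installing dependencies
This skill uses uv to manage dependencies. Install them before first use:
# Project-level use (recommended)
cd .claude/skills/news-extractor
uv sync
# Or user-level use
cd ~/.claude/skills/news-extractor
uv sync
Note: This skill can live in either the project-level (.claude/skills/) or the user-level (~/.claude/skills/) directory. A project-level install is easier to share with a team; a user-level install can be reused across projects.
Important: Run every script with uv run, not with python directly. uv run automatically uses the dependencies in the project's virtual environment.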
Dependency list
| Package | Purpose |
|---|---|
| pydantic | Data model validation |
| requests | HTTP requests |
| curl_cffi | Fetching with browser impersonation |
| tenacity | Retry mechanism |
| parsel | HTML/XPath parsing |
| demjson3 | Non-standard JSON parsing |
Usage
Basic usage
# Extract an article (platform auto-detected), output JSON + Markdown
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL"
# Specify the output directory
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL" --output ./output
# Output JSON only
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL" --format json
# Output Markdown only
uv run .claude/skills/news-extractor/scripts/extract_news.py "URL" --format markdown
# List supported platforms
uv run .claude/skills/news-extractor/scripts/extract_news.py --list-platforms
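When the extractor is driven from another Python program, the commands above can be assembled as an argument list. This is a rough sketch assuming a project-level install and `uv` on the PATH. `build_command` and `run_extraction` are hypothetical helpers, and only the `--output` and `--format` flags shown above are used. The meaning of the script's exit code is not documented here, so the sketch simply returns it.

```python
import subprocess
from typing import List, Optional

# Script path as used in the commands above (project-level install).
SCRIPT = ".claude/skills/news-extractor/scripts/extract_news.py"


def build_command(url: str, output: Optional[str] = None,
                  fmt: Optional[str] = None) -> List[str]:
    """Build the `uv run` argument list for a single extraction."""
    if fmt not in (None, "json", "markdown"):
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["uv", "run", SCRIPT, url]
    if output is not None:
        cmd += ["--output", output]
    if fmt is not None:
        cmd += ["--format", fmt]
    return cmd


def run_extraction(url: str, **options) -> int:
    """Run the extractor and return its exit code (requires uv on PATH)."""
    return subprocess.run(build_command(url, **options)).returncode
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.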
Output files
By default, the script writes both formats to the output directory (default ./output):
- {news_id}.json – structured JSON data
- {news_id}.md – the article in Markdown format
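A caller that wants to read the results back can derive both file paths from the article ID. This is a small sketch under the naming scheme described above. `output_paths` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path
from typing import Dict


def output_paths(news_id: str, output_dir: str = "./output") -> Dict[str, Path]:
    """Return the expected JSON and Markdown paths for one article."""
    base = Path(output_dir)
    return {
        "json": base / f"{news_id}.json",
        "markdown": base / f"{news_id}.md",
    }
```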
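Workflow
- Receive URL – the user provides a news link
- Detect platform – the platform type is identified automatically
- Extract content – the matching crawler fetches and parses the content
- Convert format – JSON and Markdown are generated
- Write files – results are saved to the output directory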
Output format
JSON structure
{
  "title": "Article title",
  "news_url": "Original URL",
  "news_id": "Article ID",
  "meta_info": {
    "author_name": "Author/source",
    "author_url": "",
    "publish_time": "2024-01-01 12:00"
  },
  "contents": [
    {"type": "text", "content": "Paragraph text", "desc": ""},
    {"type": "image", "content": "https://...", "desc": ""},
    {"type": "video", "content": "https://...", "desc": ""}
  ],
  "texts": ["Paragraph 1", "Paragraph 2"],
  "images": ["Image URL 1", "Image URL 2"],
  "videos": []
}
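To consume this JSON from Python, the fields can be mapped onto typed objects. The dependency list names pydantic for data validation, but the skill's own models are not shown, so this sketch uses standard-library dataclasses instead. `ContentItem`, `NewsItem`, and `parse_news` are hypothetical names, and flattening `meta_info` into the top-level object is a choice made for this example only.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentItem:
    type: str       # "text", "image", or "video"
    content: str    # paragraph text or media URL
    desc: str = ""


@dataclass
class NewsItem:
    title: str
    news_url: str
    news_id: str
    author_name: str = ""
    author_url: str = ""
    publish_time: str = ""
    contents: List[ContentItem] = field(default_factory=list)
    texts: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)


def parse_news(raw: str) -> NewsItem:
    """Parse a JSON string shaped like the structure above."""
    data = json.loads(raw)
    meta = data.get("meta_info", {})
    return NewsItem(
        title=data["title"],
        news_url=data["news_url"],
        news_id=data["news_id"],
        author_name=meta.get("author_name", ""),
        author_url=meta.get("author_url", ""),
        publish_time=meta.get("publish_time", ""),
        contents=[ContentItem(**item) for item in data.get("contents", [])],
        texts=data.get("texts", []),
        images=data.get("images", []),
        videos=data.get("videos", []),
    )
```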
Markdown structure
# Article title
## Article info
**Author**: xxx
**Published**: 2024-01-01 12:00
**Original link**: [Link](URL)
---
## Body
Paragraph content...
![Image](URL)
---
## Media resources
### Images (N)
1. URL1
2. URL2
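The conversion from the JSON structure to this layout can be sketched as a small renderer. This is an illustration under assumptions, not the skill's converter. `render_markdown` is a hypothetical function, the English heading strings are placeholders for whatever the script actually emits, and video items are skipped because the layout above does not show how they are rendered.

```python
from typing import Any, Dict, List


def render_markdown(news: Dict[str, Any]) -> str:
    """Render a dict shaped like the JSON structure into the layout above."""
    meta = news.get("meta_info", {})
    lines: List[str] = [
        f"# {news['title']}",
        "",
        "## Article info",
        "",
        f"**Author**: {meta.get('author_name', '')}",
        f"**Published**: {meta.get('publish_time', '')}",
        f"**Original link**: [Link]({news['news_url']})",
        "",
        "---",
        "",
        "## Body",
        "",
    ]
    # Body: text paragraphs and inline images, in document order.
    for item in news.get("contents", []):
        if item["type"] == "text":
            lines += [item["content"], ""]
        elif item["type"] == "image":
            lines += [f"![Image]({item['content']})", ""]
    # Trailing media section: a numbered list of image URLs.
    images = news.get("images", [])
    if images:
        lines += ["---", "", "## Media resources", "",
                  f"### Images ({len(images)})", ""]
        lines += [f"{i}. {url}" for i, url in enumerate(images, start=1)]
    return "\n".join(lines).rstrip() + "\n"
```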
Usage examples
Extracting a WeChat Official Account article
uv run .claude/skills/news-extractor/scripts/extract_news.py \
"https://mp.weixin.qq.com/s/ebMzDPu2zMT_mRgYgtL6eQ"
Output:
[INFO] Platform detected: wechat (WeChat Official Accounts)
[INFO] Extracting content...
[INFO] Title: Article title
[INFO] Author: Official account name
[INFO] Text paragraphs: 15
[INFO] Images: 3
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.json
[SUCCESS] Saved: ./output/ebMzDPu2zMT_mRgYgtL6eQ.md
Extracting a Toutiao article
uv run .claude/skills/news-extractor/scripts/extract_news.py \
"https://www.toutiao.com/article/7434425099895210546/"
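Error handling
| Error type | Description | Solution |
|---|---|---|
| Platform not recognized | The URL does not match any supported platform | Check that the URL is correct |
| Platform not supported | The site is not a supported one | This skill only supports the news sites listed above |
| Extraction failed | Network error or a change in page structure | Retry, or check that the URL is valid |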
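For transient network errors, the retry advice can be automated with a bounded backoff loop. The dependency list names tenacity for retries, but the skill's retry settings are not documented here, so this sketch uses a plain standard-library loop instead. `retry` is a hypothetical helper, and the attempt count, delays, and `OSError` default are assumptions.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
          retry_on: Tuple[Type[BaseException], ...] = (OSError,)) -> T:
    """Call func, retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Retrying only helps with network errors. A failure caused by a change in page structure will repeat on every attempt, so the attempt count should stay small.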
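Notes
- For educational and research purposes only
- Do not crawl at large scale
- Respect the target site's robots.txt and terms of service
- WeChat Official Accounts may require a valid Cookie (the current default configuration usually works)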
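The robots.txt point can be checked programmatically before a URL is fetched. This is a minimal sketch using Python's standard `urllib.robotparser`. `is_allowed` is a hypothetical helper, and it assumes the robots.txt text has already been downloaded, so the example itself makes no network request.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, a URL under `/private/` is rejected while other paths on the same host are allowed.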