content-extract
Total installs: 33
Weekly installs: 3
Site rank: #11079
Install command:
npx skills add https://github.com/blessonism/openclaw-search-skills --skill content-extract
Agent install distribution
- openclaw: 2
- replit: 1
- amp: 1
- kimi-cli: 1
- codex: 1
Skill documentation
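content-extract: content-parsing entry point (aligned with MCP semantics, but does not run an MCP Server)
Goal: turn "give me a URL, get back readable Markdown plus a traceable entry point" into a single unified entry point that all later business skills (github-explorer, writing skills, daily reports, etc.) can reuse.
Core principles (inspired by the Excel Skill breakdown article you sent):
- Behavior contract layer: always provide a traceable entry point (original URL + path/link to the parsed artifacts); never fabricate sources.
- Token probe: first use a low-cost probe to decide whether the page can be fetched directly; if not, fall back to heavy parsing (MinerU).
- Rebound mechanism: on failure, return a "suggested next action" rather than a pile of exception stack traces.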
Workflow (Decision Tree)
Input: url
- Domain Whitelist (skip the probe): if the URL belongs to a site that is very likely to be anti-scraping or dynamically rendered (WeChat, Zhihu, etc.), go straight to MinerU.
  - Whitelist file: references/domain-whitelist.md
  - For URLs that match the whitelist: force model_version=MinerU-HTML
- Probe (low cost): prefer web_fetch(url)
  - Goal: get the body text as markdown (cheap and fast)
  - The conditions for "failed/unacceptable" are in references/heuristics.md and include:
    - 403/401/anti-scraping responses
    - Nothing but prompts such as "environment abnormal", "CAPTCHA", or "please open in WeChat"
    - Content too short, obviously a navigation page, or body text missing
- Fallback (high fidelity): use the official MinerU API
  - Call the driver script: skills/mineru-extract/scripts/mineru_parse_documents.py
  - For HTML pages (WeChat, etc.): force model_version=MinerU-HTML
- Output the unified Result Contract
Whether the probe or MinerU was used, the same structure is returned:
{
  "ok": true,
  "source_url": "...",
  "engine": "web_fetch",
  "markdown": "...",
  "artifacts": {
    "out_dir": "...",
    "markdown_path": "...",
    "zip_path": "..."
  },
  "sources": [
    "original URL",
    "(if MinerU was used) MinerU full_zip_url",
    "(if MinerU was used) local markdown_path"
  ],
  "notes": ["any important limitations / failure reasons / next-step suggestions"]
}
注æï¼
engineå¯è½æ¯web_fetchæmineruã
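The decision tree and Result Contract above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the whitelist entries, the marker strings, the `MIN_CHARS` threshold, and the `probe` and `mineru` callables (including the shape of what `mineru` returns) are all hypothetical. The real rules live in references/domain-whitelist.md and references/heuristics.md.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# Hypothetical entries; the real list lives in references/domain-whitelist.md.
WHITELIST = {"mp.weixin.qq.com", "zhuanlan.zhihu.com"}
# Hypothetical block-page markers; the real rules live in references/heuristics.md.
BLOCK_MARKERS = ("captcha", "environment abnormal", "open in wechat")
MIN_CHARS = 200  # assumed cutoff for "content too short"


def is_whitelisted(url: str) -> bool:
    """True if the host is a whitelisted domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in WHITELIST)


def looks_unusable(markdown: Optional[str]) -> bool:
    """Reject empty, very short, or block-page-like probe output."""
    if not markdown or len(markdown.strip()) < MIN_CHARS:
        return True
    lowered = markdown.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def make_result(url, ok, engine, markdown="", artifacts=None, sources=None, notes=None):
    """Build the Result Contract; the original URL is always the first source."""
    return {
        "ok": ok,
        "source_url": url,
        "engine": engine,
        "markdown": markdown,
        "artifacts": artifacts or {},
        "sources": [url] + list(sources or []),
        "notes": list(notes or []),
    }


def extract(url: str, probe: Callable, mineru: Callable) -> dict:
    """probe(url) returns markdown or None; mineru(url) returns a dict (assumed shape)."""
    notes = []
    if not is_whitelisted(url):  # whitelisted domains skip the probe entirely
        try:
            md = probe(url)
        except Exception as exc:
            md = None
            notes.append(f"probe error: {exc}")
        if not looks_unusable(md):
            return make_result(url, True, "web_fetch", md, notes=notes)
        notes.append("probe did not yield usable content; falling back to MinerU")
    try:
        out = mineru(url)
    except Exception as exc:
        notes.append(f"MinerU failed: {exc}")
        notes.append("next step: provide an accessible copy of the page or upload the HTML file")
        return make_result(url, False, "mineru", notes=notes)
    artifacts = out.get("artifacts", {})
    extra = [s for s in (out.get("full_zip_url"), artifacts.get("markdown_path")) if s]
    return make_result(url, True, "mineru", out.get("markdown", ""), artifacts, extra, notes)
```

Passing `probe` and `mineru` in as callables keeps the routing logic independent of how fetching is actually done, so the same function can be exercised with stubs.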
Calling MinerU (a deterministic script for the agent)
When MinerU is needed, use this command (it returns JSON and can inline the markdown into that JSON, which makes downstream summarization easier):
python3 mineru-extract/scripts/mineru_parse_documents.py \
--file-sources "<URL>" \
--model-version MinerU-HTML \
--emit-markdown --max-chars 20000
Path note: the command above assumes you run it from the skills installation root. If mineru-extract is installed elsewhere, substitute the actual path.
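If an agent drives the script from Python rather than a shell, the call might look like the sketch below. It uses only the flags shown in the command above. It assumes, without confirmation from the source, that the script prints a single JSON document to stdout and exits non-zero on failure; `build_mineru_command`, `run_mineru`, and the 600-second timeout are hypothetical names and values chosen for illustration.

```python
import json
import subprocess

# Script path relative to the skills installation root, as in the command above.
SCRIPT = "mineru-extract/scripts/mineru_parse_documents.py"


def build_mineru_command(url, script=SCRIPT, max_chars=20000):
    """Return the argv list equivalent to the documented shell command."""
    return [
        "python3", script,
        "--file-sources", url,
        "--model-version", "MinerU-HTML",
        "--emit-markdown",
        "--max-chars", str(max_chars),
    ]


def run_mineru(url, skills_root=".", timeout=600):
    """Run the script from skills_root and parse its stdout as JSON (assumed format)."""
    proc = subprocess.run(
        build_mineru_command(url),
        cwd=skills_root,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"MinerU script exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return json.loads(proc.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the URL contains characters such as `&` or `?`.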
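Delivery rules (mandatory)
- The output must include sources (the original entry point plus the entry point to the parsed artifacts).
- If MinerU succeeds, markdown_path (the local path) must be written into sources so the result can be rechecked.
- If both paths fail, state the failure reason explicitly and give a next step (for example: ask Boss for an accessible mirror link, allow me to export the HTML through a browser relay, or fall back to parsing an uploaded HTML file).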
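These delivery rules can be checked mechanically before a result is handed back. The sketch below is one possible validator over the Result Contract fields; `check_delivery` is a hypothetical helper, not part of the skill, and it treats a non-empty `notes` list as evidence that a failure reason and next step were given.

```python
from typing import List


def check_delivery(result: dict) -> List[str]:
    """Return a list of delivery-rule violations; an empty list means the result passes."""
    problems = []
    sources = result.get("sources") or []

    # Rule 1: the original URL must always appear in sources.
    if result.get("source_url") not in sources:
        problems.append("sources must include the original URL")

    # Rule 2: after a MinerU success, the local markdown_path must be in sources.
    if result.get("ok") and result.get("engine") == "mineru":
        md_path = (result.get("artifacts") or {}).get("markdown_path")
        if not md_path or md_path not in sources:
            problems.append("markdown_path must be listed in sources after a MinerU success")

    # Rule 3: a failed result must explain itself and suggest a next step.
    if not result.get("ok") and not result.get("notes"):
        problems.append("a failed result must state the cause and a next step in notes")

    return problems
```

Returning a list of problems instead of raising keeps the check in line with the principle of reporting a suggested next action rather than a stack trace.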
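What this skill does not do
- It does not run an MCP Server (avoiding a resident service and its operational burden).
- It does not try to bypass logins or CAPTCHAs (that is an access-layer problem; this skill only handles the parsing layer and workflow routing).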
References
- MinerU API docs: https://mineru.net/apiManage/docs
- MinerU output files: https://opendatalab.github.io/MinerU/reference/output_files/