ljg-fetch
npx skills add https://github.com/lijigang/ljg-skill-fetch --skill ljg-fetch
Usage
Constraints
Completion action
After the file is written, report the file path and the fetch strategy used to the user.
Instructions
You are Fetch (the fetcher), with a single job: turn the target content into a clean local markdown file.
Step 1: Parse the input
Extract the following from the user's input:
| Part | Description |
|---|---|
| target | URL or local file path |
| -o name | Optional; output filename (without extension) |
Determine the input type:
| Input | Classification | Route |
|---|---|---|
| Starts with `http://` or `https://` | URL mode | → Step 2 |
| Local file path (exists) | File mode | → Step 3 |
| Anything else | Invalid input | → Tell the user and stop |
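Step 1 can be sketched in Python as below, assuming the input arrives as one shell-style string. `parse_input` and `classify` are hypothetical helper names, not part of the skill; the routing mirrors the table above.

```python
import os
import shlex
from typing import Optional, Tuple


def parse_input(raw: str) -> Tuple[str, Optional[str]]:
    """Split user input into (target, optional -o name)."""
    tokens = shlex.split(raw)  # respects quotes, so paths with spaces survive
    target: Optional[str] = None
    name: Optional[str] = None
    i = 0
    while i < len(tokens):
        if tokens[i] == "-o" and i + 1 < len(tokens):
            name = tokens[i + 1]
            i += 2
            continue
        if target is None:
            target = tokens[i]
        i += 1
    if target is None:
        raise ValueError("no target given")
    return target, name


def classify(target: str) -> str:
    """Return 'url', 'file', or 'invalid' following the routing table."""
    if target.startswith(("http://", "https://")):
        return "url"
    if os.path.isfile(os.path.expanduser(target)):
        return "file"
    return "invalid"
```

For example, `parse_input("https://example.com/a -o notes")` yields the URL as target and `notes` as the output name, and `classify` then routes it to Step 2.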
Step 2: URL mode → fetch the content (automatic fallback)
Try the strategies below in order; if one fails, switch automatically to the next. Record the name of the strategy that ultimately succeeds.
Strategy A: WebFetch
Use the built-in WebFetch tool to fetch the content with this prompt:
prompt: "Extract the complete main content of this page. Return the full text in clean markdown format, preserving all headings, lists, code blocks, tables, and links. Remove navigation, ads, footers, sidebars, and cookie banners."
Success criterion: the returned content is longer than 200 characters and is not an error message.
On success → save the returned markdown directly as the result and skip to Step 4.
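The Strategy A success check could look like the sketch below. The 200-character threshold comes from the text; the phrases in `ERROR_PREFIXES` are an illustrative assumption, since the skill does not define what counts as an error message, and `webfetch_succeeded` is a hypothetical name.

```python
# Hypothetical phrases that mark a failed fetch; the skill only says the
# content must not be an error message, so tune this list as needed.
ERROR_PREFIXES = ("error", "failed to fetch", "unable to fetch", "access denied", "403", "404")


def webfetch_succeeded(content: str, min_chars: int = 200) -> bool:
    """Strategy A check: more than min_chars characters and no error prefix."""
    text = content.strip()
    if len(text) <= min_chars:
        return False
    # Inspect only the start, so articles that merely mention errors still pass
    return not text.lower().startswith(ERROR_PREFIXES)
```

Checking only the opening of the text is a deliberate trade-off: a page about error handling should not be rejected just because the word "error" appears in it.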
Strategy B: curl + markitdown
```bash
curl -L -s -o /tmp/ljg_fetch_raw.html \
  -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)" \
  --max-time 30 \
  "{URL}"
```
Success criterion: the file exists and its size is > 0.
On success → run `markitdown /tmp/ljg_fetch_raw.html`, capture its output as the result, and skip to Step 4.
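Strategies B, C, and D all end by running markitdown on `/tmp/ljg_fetch_raw.html` and capturing its output. A sketch of that shared step, assuming the `markitdown` CLI is on `PATH` and prints markdown to stdout; `raw_file_ok`, `markitdown_to_text`, and the `cmd` override are hypothetical names added for illustration and testing.

```python
import os
import subprocess

RAW_HTML = "/tmp/ljg_fetch_raw.html"


def raw_file_ok(path: str = RAW_HTML) -> bool:
    """Strategy B success check: the file exists and is non-empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def markitdown_to_text(path: str = RAW_HTML, cmd: tuple = ("markitdown",)) -> str:
    """Run the converter on `path` and return its stdout.

    A non-zero exit raises subprocess.CalledProcessError, which the caller
    can treat as this strategy failing.
    """
    result = subprocess.run([*cmd, path], capture_output=True, text=True, check=True)
    return result.stdout
```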
Strategy C: Python requests + BeautifulSoup
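```python
import requests
from bs4 import BeautifulSoup

url = "{URL}"  # target URL from Step 1
resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0 ..."}, timeout=30)
resp.raise_for_status()  # 4xx/5xx raises, so this strategy counts as failed
# Save the raw bytes so markitdown sees the page's own charset declaration
with open("/tmp/ljg_fetch_raw.html", "wb") as f:
    f.write(resp.content)
```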
Success criterion: HTTP 200 and non-empty content.
On success → run `markitdown /tmp/ljg_fetch_raw.html`, capture its output, and skip to Step 4.
Strategy D: Playwright (dynamic pages)
```python
from playwright.sync_api import sync_playwright

url = "{URL}"  # target URL from Step 1
with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    # Wait for network idle so client-rendered content is in the DOM
    page.goto(url, wait_until="networkidle", timeout=30000)
    html = page.content()
    with open("/tmp/ljg_fetch_raw.html", "w", encoding="utf-8") as f:
        f.write(html)
    browser.close()
```
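Success criterion: the HTML content is non-empty.
On success → run `markitdown /tmp/ljg_fetch_raw.html`, capture its output, and skip to Step 4.
If every strategy fails → tell the user that all strategies failed, include each strategy's error message, and stop.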
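The Step 2 fallback chain (try in order, record the winning strategy, report every error if all fail) could be sketched as follows. It assumes each strategy is wrapped as a zero-argument function that returns markdown or raises when its success check fails; `fetch_with_fallback` and `AllStrategiesFailed` are hypothetical names.

```python
from typing import Callable, List, Tuple


class AllStrategiesFailed(Exception):
    """Raised when every strategy fails; keeps each strategy's error."""

    def __init__(self, errors: List[Tuple[str, str]]):
        self.errors = errors
        lines = [f"{name}: {err}" for name, err in errors]
        super().__init__("All strategies failed:\n" + "\n".join(lines))


def fetch_with_fallback(
    strategies: List[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (name, fn) in order; return (name, markdown) of the first success."""
    errors: List[Tuple[str, str]] = []
    for name, fn in strategies:
        try:
            return name, fn()
        except Exception as exc:  # any failure moves on to the next strategy
            errors.append((name, str(exc)))
    raise AllStrategiesFailed(errors)
```

Using the same labels as the Step 5 report (`WebFetch`, `curl+markitdown`, and so on) lets the returned name go straight into the final message.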
Step 3: Local file mode
Convert the file directly with markitdown:
```bash
markitdown "{file_path}"
```
Formats supported by markitdown include HTML, PDF, DOCX, PPTX, XLSX, images, and audio.
Capture the output as the result and skip to Step 4.
If markitdown reports an error, tell the user the file format may not be supported, and stop.
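Step 4: Generate the output filename
Determine the filename (without the .md extension):
- If the user specified `-o name`, use `name`.
- Otherwise, infer it from the content or the URL:
  - URL mode: take the last segment of the URL path and clean it into kebab-case (removing special characters).
  - File mode: take the original filename, minus its extension.
  - If inference fails, use `fetch-{YYYYMMDD-HHMMSS}`.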
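The naming rules above can be sketched as follows. Assumptions beyond the text: a trailing `.html`/`.htm` is stripped, and Unicode letters (for example Chinese) count as word characters to keep rather than special characters to remove. Inferring a name from the page content, such as its title, is not covered. `output_name` and `to_kebab` are hypothetical helpers.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlparse


def to_kebab(text: str) -> str:
    """Lowercase and collapse each run of non-word chars or underscores to '-'."""
    return re.sub(r"[\W_]+", "-", text.lower()).strip("-")


def output_name(target: str, mode: str, user_name: Optional[str] = None,
                now: Optional[datetime] = None) -> str:
    """Pick the output filename (no .md) following the Step 4 rules."""
    if user_name:  # an explicit -o name always wins
        return user_name
    if mode == "url":
        segments = [s for s in urlparse(target).path.split("/") if s]
        last = unquote(segments[-1]) if segments else ""
        # Assumption: a trailing .html/.htm is not wanted in the name
        last = re.sub(r"\.html?$", "", last, flags=re.IGNORECASE)
        name = to_kebab(last)
    else:
        name = PurePosixPath(target).stem  # original filename minus extension
    if name:
        return name
    # Inference failed: fall back to a timestamp
    return (now or datetime.now()).strftime("fetch-%Y%m%d-%H%M%S")
```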
Step 5: Write and report
- Write the result to `~/Downloads/{filename}.md`.
- Clean up the temporary file: `rm -f /tmp/ljg_fetch_raw.html`.
- Report:
  Fetched → ~/Downloads/{filename}.md ({strategy name})
  One sentence, including the strategy name (WebFetch / curl+markitdown / requests+markitdown / playwright+markitdown).
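Step 5 could be sketched as below. The `out_dir` and `tmp_path` parameters are hypothetical, added so the function can target somewhere other than `~/Downloads` and `/tmp`; silently overwriting an existing file with the same name is also an assumption, since the skill does not say what to do on a collision.

```python
import os
from pathlib import Path

TMP_RAW = "/tmp/ljg_fetch_raw.html"


def write_and_report(markdown: str, filename: str, strategy: str,
                     out_dir: str = "~/Downloads", tmp_path: str = TMP_RAW) -> str:
    """Write {filename}.md, remove the temp HTML, and return the report line."""
    out = Path(out_dir).expanduser() / f"{filename}.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(markdown, encoding="utf-8")  # overwrites an existing file
    # Same effect as `rm -f`: a missing temp file is not an error
    try:
        os.remove(tmp_path)
    except FileNotFoundError:
        pass
    display = f"{out_dir.rstrip('/')}/{filename}.md"  # keep '~' as the user typed it
    return f"Fetched → {display} ({strategy})"
```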