web-scraping-automation
Total installs: 62
Weekly installs: 62
Site-wide rank: #3520
Install command:
npx skills add https://github.com/aaaaqwq/claude-code-skills --skill web-scraping-automation
Agent install distribution:
claude-code: 41
gemini-cli: 39
cursor: 39
codex: 37
antigravity: 36
Skill Documentation
Website Scraping and API Automation
Features
This skill is dedicated to automating website data scraping and API calls, covering:
- Analyzing and scraping website structures
- Calling and testing REST/GraphQL APIs
- Building automated scraper scripts
- Parsing and cleaning data
- Handling anti-scraping mechanisms
- Scheduled jobs and data storage
Use Cases
- "Scrape the product information from this website"
- "Help me call this API and parse the response data"
- "Create a script that fetches news on a schedule"
- "Analyze this website's API documentation"
- "Bypass this website's anti-scraping restrictions"
Tech Stack
Python scraping
- requests: HTTP request library
- BeautifulSoup4: HTML parsing
- Scrapy: full-featured crawler framework (see the sketch after this list)
- Selenium: browser automation
- Playwright: modern browser automation
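
Scrapy is listed above but does not appear in the examples further down; here is a minimal spider sketch (the start URL and CSS selectors are placeholders that must be adapted to the target site):

import scrapy

class ProductSpider(scrapy.Spider):
    """Minimal spider; run with: scrapy runspider product_spider.py -o products.json"""
    name = "products"
    start_urls = ["https://example.com/products"]  # placeholder URL

    def parse(self, response):
        # Yield one record per product card
        for item in response.css(".product"):
            yield {
                "title": item.css(".title::text").get(),
                "price": item.css(".price::text").get(),
            }
        # Follow pagination if the site exposes a "next" link
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
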
JavaScript scraping
- axios: HTTP client
- cheerio: server-side jQuery-style parsing
- puppeteer: Chrome automation
- node-fetch: Fetch API for Node.js
Workflow
1. Target analysis:
   - Inspect the site structure and where the data lives
   - Analyze API endpoints and authentication
   - Assess anti-scraping mechanisms
2. Solution design:
   - Choose an appropriate tech stack
   - Design the data extraction strategy
   - Plan error handling and retry mechanisms
3. Script development:
   - Write the scraper code
   - Implement the data parsing logic
   - Add logging and monitoring (see the sketch after this list)
4. Testing and optimization:
   - Verify data accuracy
   - Optimize performance and stability
   - Handle edge cases
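
A minimal sketch of the script-development step, assuming a requests + BeautifulSoup scraper: structured logging around fetching and parsing, plus a simple retry loop (the URL and the .item selector are placeholders):

import logging
import time

import requests
from bs4 import BeautifulSoup

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("scraper")

def fetch_with_retry(url, retries=3, delay=2):
    """Fetch a page, retrying on network errors with a fixed delay."""
    for attempt in range(1, retries + 1):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.text
        except requests.RequestException as exc:
            log.warning("Attempt %d/%d for %s failed: %s", attempt, retries, url, exc)
            time.sleep(delay)
    raise RuntimeError(f"Giving up on {url} after {retries} attempts")

def run(url):
    html = fetch_with_retry(url)
    soup = BeautifulSoup(html, "html.parser")
    items = soup.select(".item")  # placeholder selector
    log.info("Extracted %d items from %s", len(items), url)
    return [el.get_text(strip=True) for el in items]
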
Best Practices
- Respect robots.txt rules (see the sketch after this list)
- Leave a reasonable interval between requests
- Use a realistic User-Agent and request headers
- Implement error retry mechanisms
- Deduplicate and validate the data
- Use a proxy pool if needed
- Keep the raw data and logs
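
A minimal sketch of the first two practices, using the standard-library robots.txt parser and a fixed delay (the bot name and the delay value are arbitrary choices):

import time
from urllib import robotparser
from urllib.parse import urljoin

import requests

USER_AGENT = "my-scraper-bot/1.0"  # hypothetical identifier

def polite_fetch(base_url, paths, delay_seconds=2.0):
    """Fetch only the paths that robots.txt allows, pausing between requests."""
    rp = robotparser.RobotFileParser()
    rp.set_url(urljoin(base_url, "/robots.txt"))
    rp.read()

    pages = {}
    for path in paths:
        url = urljoin(base_url, path)
        if not rp.can_fetch(USER_AGENT, url):
            continue  # disallowed by robots.txt, skip it
        response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
        pages[url] = response.text
        time.sleep(delay_seconds)  # reasonable interval between requests
    return pages
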
Common Examples
1. Simple web scraping

import requests
from bs4 import BeautifulSoup

def scrape_website(url):
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # fail fast on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract one record per product card
    data = []
    for item in soup.select('.product'):
        data.append({
            'title': item.select_one('.title').text,
            'price': item.select_one('.price').text
        })
    return data
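
Usage, assuming the target page actually uses the .product, .title and .price classes:

if __name__ == "__main__":
    products = scrape_website("https://example.com/products")  # placeholder URL
    for product in products:
        print(product["title"], product["price"])
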
2. API call

import requests

def call_api(endpoint, params=None):
    headers = {
        'Authorization': 'Bearer YOUR_TOKEN',
        'Content-Type': 'application/json'
    }
    response = requests.get(endpoint, headers=headers, params=params)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error body
    return response.json()
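
To pair this with the retry practice above, one option is requests' adapter support for urllib3 retries (the retry counts and status codes below are arbitrary; the token is a placeholder):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_api_session(token):
    """Build a session that retries transient failures with exponential backoff."""
    retry = Retry(
        total=3,
        backoff_factor=1,  # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"],
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    session.headers.update({
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    })
    return session

# session = make_api_session("YOUR_TOKEN")
# data = session.get(endpoint, params=params).json()
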
3. Dynamic page scraping

from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_dynamic_page(url):
    driver = webdriver.Chrome()
    driver.get(url)
    # Let element lookups below wait up to 10s for the page to finish rendering
    driver.implicitly_wait(10)
    # Extract data
    elements = driver.find_elements(By.CLASS_NAME, 'item')
    data = [elem.text for elem in elements]
    driver.quit()
    return data
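
The same kind of page can be handled with Playwright's sync API, which waits for a specific selector instead of relying on an implicit timeout (the .item selector is a placeholder):

from playwright.sync_api import sync_playwright

def scrape_dynamic_page_pw(url):
    """Render a JavaScript-heavy page with Playwright and extract element text."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.wait_for_selector(".item")  # wait until the content is rendered
        data = [el.inner_text() for el in page.query_selector_all(".item")]
        browser.close()
    return data
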
Anti-Scraping Countermeasures
- Header spoofing: mimic a real browser
- Proxy rotation: use a proxy pool
- CAPTCHA handling: OCR or a third-party solving service
- Cookie management: maintain session state
- Request rate control: avoid triggering limits
- JavaScript rendering: use Selenium/Playwright
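
A sketch combining several of these tactics in one session: browser-like headers, a rotating proxy, cookies kept by the session object, and a jittered delay (the header values and proxy addresses are placeholders):

import random
import time

import requests

BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml",
}

PROXY_POOL = [
    "http://proxy1.example.com:8080",  # hypothetical proxy pool entries
    "http://proxy2.example.com:8080",
]

def fetch(url, session=None):
    """GET a URL with spoofed headers, a random proxy and a randomized pause."""
    session = session or requests.Session()  # the session keeps cookies between calls
    proxy = random.choice(PROXY_POOL)
    response = session.get(
        url,
        headers=BROWSER_HEADERS,
        proxies={"http": proxy, "https": proxy},
        timeout=15,
    )
    time.sleep(random.uniform(1.0, 3.0))  # jittered delay to stay under rate limits
    return response
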
Data Storage Options
- CSV/Excel: simple data export
- JSON: structured data storage
- Databases: MySQL, PostgreSQL, MongoDB
- Cloud storage: S3, OSS
- Data warehouse: for large-scale analysis
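
A minimal sketch of the lighter-weight options, writing the same records to CSV, JSON and a local SQLite database (the title/price fields match the scraping examples above; file names are placeholders):

import csv
import json
import sqlite3

def save_records(records, basename="products"):
    """Persist a list of {'title': ..., 'price': ...} dicts in three formats."""
    # CSV / Excel-friendly export
    with open(f"{basename}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(records)

    # JSON for structured storage
    with open(f"{basename}.json", "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

    # SQLite as a zero-setup local database
    with sqlite3.connect(f"{basename}.db") as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS products (title TEXT, price TEXT)")
        conn.executemany(
            "INSERT INTO products (title, price) VALUES (:title, :price)", records
        )
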