陌讯 Skills Aggregation Platform

web-scraping

📁 mindrally/skills 📅 Jan 25, 2026

Total installs: 283

Weekly installs: 111

Site-wide rank: #2035

Install command

npx skills add https://github.com/mindrally/skills --skill web-scraping

Installs by agent

opencode 69

claude-code 67

gemini-cli 57

openclaw 57

cursor 49

codex 48

Skill documentation

Web Scraping

You are an expert in web scraping and data extraction using Python tools and frameworks.

Core Tools

Static Sites

Use requests for HTTP requests
Use BeautifulSoup for HTML parsing
Use lxml for fast XML/HTML processing
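
The static-site tools above can be combined roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names fetch_html and extract_links, the 10-second timeout, and the choice to collect anchor links are all assumptions made for illustration.

```python
from typing import List, Tuple

from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with requests and return its decoded text."""
    # Imported here so the parsing helper below works without a network stack.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP 4xx/5xx responses into exceptions
    return response.text


def extract_links(html: str) -> List[Tuple[str, str]]:
    """Return (link text, href) pairs for every anchor that has an href."""
    # "html.parser" is the stdlib backend; pass "lxml" instead for speed
    # if lxml is installed.
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]
```

Keeping the download and the parsing in separate functions lets the parsing logic be exercised on saved HTML without touching the network.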

Dynamic Content

Use Selenium for JavaScript-rendered pages
Use Playwright for modern web automation
Use pyppeteer (an unofficial Python port of Puppeteer) for headless browsing

Large-Scale Extraction

Use Scrapy for structured crawling
Use Jina for AI-powered extraction
Use Firecrawl for large-scale scraping

Complex Workflows

Use AgentQL for structured queries
Use MultiOn for complex automation

Best Practices

Implement rate limiting and delays
Respect robots.txt
Set a descriptive User-Agent header that identifies your scraper
Handle errors gracefully
Implement retry logic
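
Several of the practices above can be sketched with the standard library alone. The names below (USER_AGENT, allowed_by_robots, RateLimiter, with_retries) and the exponential backoff schedule are hypothetical choices for illustration, not part of the skill itself.

```python
import time
import urllib.robotparser
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Hypothetical identifier; substitute your own project name and contact URL.
USER_AGENT = "example-scraper/0.1 (+https://example.com/contact)"


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check a URL against robots.txt rules that have already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def with_retries(fetch: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(), retrying on OSError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:  # covers timeouts and connection errors
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A typical loop would call limiter.wait() before each request and wrap the request itself in with_retries. Catching OSError is a deliberate simplification: it covers the standard library's network errors, and requests.RequestException also derives from it.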

Error Handling

Handle network timeouts
Deal with blocked requests
Manage session cookies
Handle pagination properly
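
As one possible shape for the pagination and blocked-request points above, the sketch below walks "next page" links with a loop guard and a page cap. The fetcher interface, the BlockedError class, and the decision to treat HTTP 403 and 429 as "blocked" are assumptions for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A fetcher returns the items on one page plus the next page's URL (or None).
PageFetcher = Callable[[str], Tuple[List[str], Optional[str]]]


class BlockedError(Exception):
    """Raised when the server appears to be refusing or throttling requests."""


def check_status(status_code: int, url: str) -> None:
    """Raise BlockedError for status codes that usually indicate blocking."""
    # Treating exactly 403 and 429 as "blocked" is an assumption of this sketch.
    if status_code in (403, 429):
        raise BlockedError(f"{url} returned HTTP {status_code}")


def paginate(fetch_page: PageFetcher, start_url: str,
             max_pages: int = 50) -> Iterator[str]:
    """Yield items page by page until there is no next link.

    Stops early if a URL repeats (a pagination loop) or max_pages is reached.
    """
    seen = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        items, url = fetch_page(url)
        yield from items
```

In practice fetch_page could wrap a requests.Session, which persists cookies across requests, pass an explicit timeout to each call, and run check_status on the response before parsing it.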

Ethical Considerations

Follow website terms of service
Don’t overload servers
Cache results when possible
Be transparent about scraping
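
Caching is the point above that translates most directly into code. The following is a minimal in-memory sketch; the TTLCache class, its get_or_fetch method, and the time-to-live policy are hypothetical, and a real project might prefer an on-disk or HTTP-aware cache.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache fetched page bodies in memory for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        """Return the cached body for url, calling fetch only when needed."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no request is sent
        body = fetch(url)
        self._entries[url] = (now, body)
        return body
```

Repeated runs during development then hit the target server once per URL per TTL window instead of once per run.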

Data Processing

Clean and validate extracted data
Handle encoding issues
Store data efficiently
Implement deduplication
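
The processing steps above might look like the sketch below, using only the standard library. The function names, the UTF-8 fallback order, and the idea of deduplicating on a tuple of normalized fields are assumptions chosen for illustration.

```python
import unicodedata
from typing import Dict, Iterable, List, Optional, Sequence


def decode_body(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode bytes using the declared encoding, then UTF-8, then a lossy fallback."""
    for encoding in (declared, "utf-8"):
        if encoding:
            try:
                return raw.decode(encoding)
            except (UnicodeDecodeError, LookupError):
                continue  # wrong or unknown encoding: try the next candidate
    return raw.decode("utf-8", errors="replace")


def clean_text(text: str) -> str:
    """Normalize Unicode forms and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def dedupe(records: Iterable[Dict[str, str]],
           key_fields: Sequence[str]) -> List[Dict[str, str]]:
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(clean_text(str(record.get(f, ""))).lower() for f in key_fields)
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique
```

Comparing cleaned, lowercased keys rather than raw strings means records that differ only in spacing or letter case are treated as duplicates.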
