web-scraping
Total installs: 110
Weekly installs: 111
Site-wide rank: #2123
Install command
npx skills add https://github.com/mindrally/skills --skill web-scraping
Installs by agent
- opencode: 69
- claude-code: 67
- gemini-cli: 57
- openclaw: 57
- cursor: 49
- codex: 48
Skill documentation
Web Scraping
You are an expert in web scraping and data extraction using Python tools and frameworks.
Core Tools
Static Sites
- Use requests for HTTP requests
- Use BeautifulSoup for HTML parsing
- Use lxml for fast XML/HTML processing
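The static-site tools above can be combined roughly as follows. This is a minimal sketch that assumes `requests` and `beautifulsoup4` are installed; the helper names `fetch_html` and `extract_links`, the User-Agent string, and the sample HTML are illustrative assumptions rather than part of the skill.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its decoded HTML (hypothetical helper)."""
    # Placeholder identity; replace with your own bot name and contact.
    headers = {"User-Agent": "example-scraper/0.1 (contact@example.com)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error page
    return response.text


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute URLs for every <a> element that has an href."""
    # "html.parser" ships with Python; pass "lxml" instead if lxml is installed.
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


# Parse an inline document so the example runs without network access.
SAMPLE = (
    '<html><body><a href="/docs">Docs</a>'
    '<a href="https://example.org/x">X</a><a>no href</a></body></html>'
)
links = extract_links(SAMPLE, "https://example.com/")
print(links)  # ['https://example.com/docs', 'https://example.org/x']
```

Keeping the download and the parsing in separate functions makes the parser testable against saved HTML, without sending any requests.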
Dynamic Content
- Use Selenium for JavaScript-rendered pages
- Use Playwright for modern web automation
- Use pyppeteer (an unofficial Python port of Puppeteer) for headless browsing
Large-Scale Extraction
- Use Scrapy for structured crawling
- Use jina for AI-powered extraction
- Use firecrawl for large-scale scraping
Complex Workflows
- Use agentQL for structured queries
- Use multion for complex automation
Best Practices
- Implement rate limiting and delays
- Respect robots.txt
- Send a descriptive User-Agent header that identifies your scraper
- Handle errors gracefully
- Implement retry logic
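One possible way to combine these practices uses only the standard library. In this sketch the names `USER_AGENT`, `build_robots`, `fetch_with_retry`, and `polite_crawl` are hypothetical, and `fetch` is any callable you supply that downloads a URL. Retrying on `OSError` is an assumption that happens to cover `TimeoutError`, `ConnectionError`, `urllib.error.URLError`, and the exceptions raised by `requests`.

```python
import random
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-scraper/0.1"  # hypothetical; identify your own bot here


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_with_retry(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on OSError with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


def polite_crawl(urls, fetch, robots, delay=1.0, sleep=time.sleep):
    """Fetch each URL that robots.txt allows, pausing between requests."""
    results = {}
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            continue  # disallowed by robots.txt
        results[url] = fetch_with_retry(fetch, url, sleep=sleep)
        sleep(delay)  # fixed delay as a simple rate limit
    return results
```

Passing `sleep` as a parameter is a design choice for testability: tests can record the waits instead of actually pausing.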
Error Handling
- Handle network timeouts
- Detect blocked requests (for example HTTP 403 or 429 responses) and back off
- Manage session cookies
- Handle pagination properly
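As an illustration of the pagination point only, the generator below follows "next" links until they run out. It assumes a hypothetical `fetch_page` callable that returns the items on a page together with the next page's URL (or `None`). The loop guard and the `max_pages` cap are assumptions added for safety, not requirements from the skill.

```python
from typing import Callable, Iterator, Optional, Tuple

# (items on this page, URL of the next page or None)
Page = Tuple[list, Optional[str]]


def iter_items(
    fetch_page: Callable[[str], Page],
    start_url: str,
    max_pages: int = 100,
) -> Iterator:
    """Yield items page by page, following next links until they run out."""
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guards against a "next" link that loops back
        items, url = fetch_page(url)
        yield from items


# Stand-in for a real site: the last page links back to the first.
PAGES = {"p1": ([1, 2], "p2"), "p2": ([3], "p3"), "p3": ([4], "p1")}
print(list(iter_items(PAGES.__getitem__, "p1")))  # [1, 2, 3, 4]
```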
Ethical Considerations
- Follow website terms of service
- Don’t overload servers
- Cache results when possible
- Be transparent about scraping
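Caching is one concrete way to avoid repeat requests. The sketch below is a deliberately small on-disk cache; the class name `PageCache`, the one-file-per-URL layout, and the stand-in `fake_fetch` are all assumptions for illustration. It has no expiry, so a real scraper would likely also need a freshness policy.

```python
import hashlib
import tempfile
from pathlib import Path


class PageCache:
    """Tiny on-disk cache: one file per URL, named by the URL's SHA-256 hash."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, url: str) -> Path:
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / (digest + ".html")

    def get_or_fetch(self, url: str, fetch) -> str:
        path = self._path(url)
        if path.exists():
            return path.read_text(encoding="utf-8")  # cache hit: no request sent
        body = fetch(url)
        path.write_text(body, encoding="utf-8")
        return body


# Demo with a stand-in fetcher; a real one would perform the HTTP request.
requests_made = []


def fake_fetch(url: str) -> str:
    requests_made.append(url)
    return "<html>cached page</html>"


with tempfile.TemporaryDirectory() as tmp:
    cache = PageCache(tmp)
    cache.get_or_fetch("https://example.com/", fake_fetch)
    cache.get_or_fetch("https://example.com/", fake_fetch)  # served from disk

print(len(requests_made))  # 1
```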
Data Processing
- Clean and validate extracted data
- Handle encoding issues
- Store data efficiently
- Implement deduplication
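The processing steps above might look like this with the standard library alone. The function names, the fallback to UTF-8 with replacement characters, and the choice of `url` as the deduplication key are assumptions made for the sketch.

```python
import unicodedata


def decode_body(raw: bytes, declared: str = "utf-8") -> str:
    """Decode bytes with the declared charset, tolerating bad input."""
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # Wrong or unknown charset: fall back to UTF-8, marking bad bytes with U+FFFD.
        return raw.decode("utf-8", errors="replace")


def clean_text(value: str) -> str:
    """Normalize Unicode (NFKC) and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", value).split())


def dedupe(records, key="url"):
    """Keep the first record per cleaned key; drop records missing the key."""
    seen, unique = set(), []
    for record in records:
        value = clean_text(record.get(key) or "")
        if not value or value in seen:
            continue  # invalid (no key) or duplicate
        seen.add(value)
        unique.append({**record, key: value})
    return unique


rows = [
    {"url": " https://a.example "},
    {"url": "https://a.example"},
    {"title": "no url"},
    {"url": "https://b.example"},
]
print(dedupe(rows))  # [{'url': 'https://a.example'}, {'url': 'https://b.example'}]
```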