oxylabs-web-scraper

📁 oxylabs/agent-skills 📅 9 days ago
3
总安装量
3
周安装量
#62083
全站排名
安装命令
npx skills add https://github.com/oxylabs/agent-skills --skill oxylabs-web-scraper

Agent 安装分布

amp 3
gemini-cli 3
github-copilot 3
codex 3
kimi-cli 3
opencode 3

Skill 文档

Oxylabs Web Scraper API

Authentication

Requires HTTP Basic Auth with credentials from environment variables:

curl -u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" ...

Endpoint

POST https://realtime.oxylabs.io/v1/queries
Content-Type: application/json

Core Parameters

Parameter Required Description
source Yes Target scraper (e.g., universal, amazon_product, google_search)
url Conditional URL to scrape (for universal and *_url sources)
query Conditional Search query or product ID (for *_search and *_product sources)
parse No Enable structured data parsing (recommended for supported sources)
render No JavaScript rendering: html or png
geo_location No Geographic targeting (country, state, or ZIP code)

Quick Start

Scrape any URL:

curl -X POST 'https://realtime.oxylabs.io/v1/queries' \
  -u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" \
  -H 'Content-Type: application/json' \
  -d '{"source": "universal", "url": "https://example.com"}'

Google search with parsing:

curl -X POST 'https://realtime.oxylabs.io/v1/queries' \
  -u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" \
  -H 'Content-Type: application/json' \
  -d '{"source": "google_search", "query": "best laptops", "parse": true}'

Amazon product by ASIN:

curl -X POST 'https://realtime.oxylabs.io/v1/queries' \
  -u "$OXY_WSA_USERNAME:$OXY_WSA_PASSWORD" \
  -H 'Content-Type: application/json' \
  -d '{"source": "amazon_product", "query": "B07FZ8S74R", "parse": true}'

Choosing the Right Source

  1. Use specific sources when available (amazon_product, google_search) – better parsing and reliability
  2. Use universal for unsupported sites – works with any URL
  3. Enable parse: true for structured JSON output on supported sources

Response Structure

{
  "results": [{
    "content": "...",
    "status_code": 200,
    "url": "https://..."
  }]
}

With parse: true, content contains structured data (title, price, reviews, etc.) instead of raw HTML.

Available Sources

For the complete list of 40+ supported sources organized by category, see sources.md.

More Examples

For detailed request/response examples including geo-location, JavaScript rendering, and custom headers, see examples.md.

Error Handling

Code Meaning
200 Success
400 Invalid parameters
401 Authentication failed
403 Access denied
429 Rate limit exceeded

Key Guidelines

  • Always set parse: true for supported sources to get structured data
  • Use ZIP codes for US e-commerce geo-location (e.g., "90210")
  • Use country/state format for search engines (e.g., "California,United States")
  • Add render: "html" for JavaScript-heavy pages