linkedin-scraper
Install command
npx skills add https://github.com/aspenas/ironclaw-skills --skill linkedin-scraper
Skill Documentation
LinkedIn Scraper: Chrome Profile Web Scraping
Scrape LinkedIn profiles and search results using the user’s authenticated Chrome browser session. No API keys are needed: the skill uses the browser tool with the Chrome profile relay.
Prerequisites
- Chrome browser with active LinkedIn login
- Browser relay connected (Chrome extension or openclaw browser profile)
- DuckDB workspace for storing results (optional)
Core Workflow
1. Single Profile Scrape
browser → open LinkedIn profile URL
browser → snapshot (extract structured data)
→ Parse: name, headline, title, company, location, education, experience, connections, about
→ Return structured JSON or insert into DuckDB
2. Search + Bulk Scrape
browser → open LinkedIn search URL with filters
browser → snapshot (extract result cards)
→ Parse each result: name, title, company, profile URL
→ For each profile URL: open → snapshot → parse full profile
→ Batch insert into DuckDB
3. Company Page Scrape
browser → open LinkedIn company page
→ Parse: company name, industry, size, description, specialties, employee count
→ Navigate to /people tab for employee list
Implementation Rules
Rate Limiting (CRITICAL)
- Keep a minimum delay of 3-5 seconds between page loads
- Maximum 80 profiles per session (LinkedIn rate limits)
- Randomize delays between 3-8 seconds (avoid detection)
- After every 20 profiles, take a 60-second break
- If CAPTCHA or “unusual activity” detected, stop immediately and alert user
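The pacing rules above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: the `Pacer` class, its method names, and the constants are hypothetical, and CAPTCHA detection is left to the caller. The sleep function and random generator are injectable so the logic can be exercised without actually waiting.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Callable

MAX_PROFILES_PER_SESSION = 80    # session cap from the rules above
BREAK_EVERY = 20                 # take a long pause after this many profiles
BREAK_SECONDS = 60.0
MIN_DELAY, MAX_DELAY = 3.0, 8.0  # randomized delay window, in seconds


class SessionLimitReached(Exception):
    """Raised when the per-session profile cap has been hit."""


@dataclass
class Pacer:
    # Hypothetical helper; sleep and rng are injectable for testing.
    sleep: Callable[[float], None] = time.sleep
    rng: random.Random = field(default_factory=random.Random)
    visited: int = 0

    def wait_before_next(self) -> float:
        """Sleep before the next page load and return the delay that was used."""
        if self.visited >= MAX_PROFILES_PER_SESSION:
            raise SessionLimitReached(
                f"{MAX_PROFILES_PER_SESSION} profiles already visited"
            )
        if self.visited > 0 and self.visited % BREAK_EVERY == 0:
            delay = BREAK_SECONDS  # 60-second break after every 20 profiles
        else:
            delay = self.rng.uniform(MIN_DELAY, MAX_DELAY)
        self.sleep(delay)
        self.visited += 1
        return delay
```

A caller would invoke `wait_before_next()` before each page load and stop the session when `SessionLimitReached` is raised.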
Stealth Patterns
- Use natural scrolling (scroll down slowly, pause, scroll more)
- Don’t scrape the same search results page more than twice
- Vary the order of profile visits (don’t go sequentially)
- Close and reopen tabs periodically
Data Extraction: Profile Page
From a LinkedIn profile snapshot, extract these fields:
| Field | Location | Notes |
|---|---|---|
| name | Main heading h1 | Full name |
| headline | Below name | Title + Company usually |
| location | Location section | City, State/Country |
| current_title | Experience section, first entry | Most recent role |
| current_company | Experience section, first entry | Company name |
| education | Education section | School, degree, dates |
| connections | Connections count | Number or “500+” |
| about | About section | Bio text (may need “see more” click) |
| experience | Experience section | All roles with dates |
| profile_url | Browser URL bar | Canonical LinkedIn URL |
Data Extraction: Search Results
From LinkedIn search results page:
| Field | Location |
|---|---|
| name | Result card heading |
| headline | Below name in card |
| location | Card metadata |
| profile_url | Link href on name |
| mutual_connections | Card footer |
Search URL Patterns
# People search
https://www.linkedin.com/search/results/people/?keywords={query}
# With filters
&geoUrn=%5B%22103644278%22%5D # United States
&network=%5B%22F%22%2C%22S%22%5D # 1st + 2nd connections
&currentCompany=%5B%22{company_id}%22%5D # Current company
&schoolFilter=%5B%22{school_id}%22%5D # School filter
# YC founders (common query)
https://www.linkedin.com/search/results/people/?keywords=Y%20Combinator%20founder
# Company employees
https://www.linkedin.com/company/{slug}/people/
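The filter values in these patterns are URL-encoded JSON arrays (for example, `%5B%22F%22%2C%22S%22%5D` decodes to `["F","S"]`). A minimal sketch of building such URLs with the Python standard library follows. The function name `build_people_search_url` and its argument names are hypothetical; the query parameter names follow the patterns above, with the company filter assumed to be `currentCompany`.

```python
import json
from typing import Optional, Sequence
from urllib.parse import quote, urlencode

PEOPLE_SEARCH = "https://www.linkedin.com/search/results/people/"


def _encode_list(values: Sequence[str]) -> str:
    # Filters are compact JSON arrays of strings, e.g. ["F","S"], before URL encoding
    return json.dumps(list(values), separators=(",", ":"))


def build_people_search_url(
    keywords: str,
    geo_urns: Optional[Sequence[str]] = None,
    network: Optional[Sequence[str]] = None,
    current_company: Optional[Sequence[str]] = None,
    school_filter: Optional[Sequence[str]] = None,
) -> str:
    """Build a people-search URL; only the filters that are provided are added."""
    params = {"keywords": keywords}
    filters = {
        "geoUrn": geo_urns,
        "network": network,
        "currentCompany": current_company,  # assumed parameter name
        "schoolFilter": school_filter,
    }
    for name, values in filters.items():
        if values:
            params[name] = _encode_list(values)
    # quote_via=quote encodes spaces as %20 (the default quote_plus would emit "+")
    return PEOPLE_SEARCH + "?" + urlencode(params, quote_via=quote)
```

For example, `build_people_search_url("Y Combinator founder")` reproduces the YC founders URL listed above.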
DuckDB Integration
When storing to DuckDB, use the Ironclaw workspace database:
-- Check if leads/contacts object exists
SELECT * FROM objects WHERE name = 'leads' OR name = 'contacts';
-- Insert via the EAV pattern or direct pivot view
INSERT INTO v_leads ("Name", "Title", "Company", "LinkedIn URL", "Location", "Source")
VALUES (?, ?, ?, ?, ?, 'LinkedIn Scrape');
If no suitable object exists, create one:
-- Use Ironclaw's object creation pattern from the dench skill
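As a rough sketch, the parameterized insert above could be driven from Python like this. It assumes a connection object exposing `execute(query, parameters)`, as DuckDB's Python connections do, and it assumes that the `v_leads` view exists and accepts inserts in the Ironclaw workspace. The helper names `lead_params` and `insert_leads` are hypothetical, and the profile keys follow the JSON output format described in this document.

```python
from typing import Any, Iterable, List, Mapping

INSERT_LEAD_SQL = (
    'INSERT INTO v_leads ("Name", "Title", "Company", "LinkedIn URL", "Location", "Source") '
    "VALUES (?, ?, ?, ?, ?, 'LinkedIn Scrape')"
)


def lead_params(profile: Mapping[str, Any]) -> List[Any]:
    """Order profile fields to match the five placeholders in INSERT_LEAD_SQL."""
    return [
        profile.get("name"),
        profile.get("current_title"),
        profile.get("current_company"),
        profile.get("linkedin_url"),
        profile.get("location"),
    ]


def insert_leads(conn: Any, profiles: Iterable[Mapping[str, Any]]) -> int:
    """Insert each profile through conn.execute and return the number inserted."""
    count = 0
    for profile in profiles:
        conn.execute(INSERT_LEAD_SQL, lead_params(profile))
        count += 1
    return count
```

Missing fields are passed as `None`, which the database driver would normally store as NULL.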
Error Handling
| Error | Action |
|---|---|
| “Sign in” page | LinkedIn session expired; alert user to re-login in Chrome |
| CAPTCHA / Security check | Stop immediately, wait 30+ min, alert user |
| “Profile not found” | Skip, log URL as invalid |
| Rate limit (429) | Stop, wait 15 min, retry with longer delays |
| Empty snapshot | Page still loading; wait 3s and re-snapshot |
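One way to apply this table is to classify each snapshot before parsing it. The sketch below is illustrative only: the `Action` enum and `classify_snapshot` function are hypothetical, and the marker strings are guesses drawn from the table rather than verified LinkedIn page text.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ALERT_RELOGIN = "alert user to re-login in Chrome"
    STOP_AND_ALERT = "stop immediately, wait 30+ min, alert user"
    SKIP_INVALID = "skip, log URL as invalid"
    BACKOFF = "stop, wait 15 min, retry with longer delays"
    RESNAPSHOT = "wait 3s and re-snapshot"
    PROCEED = "parse the snapshot"


def classify_snapshot(text: str, status_code: Optional[int] = None) -> Action:
    """Map a page snapshot (and optional HTTP status) to an action from the table."""
    lowered = text.strip().lower()
    if status_code == 429:
        return Action.BACKOFF
    if not lowered:
        return Action.RESNAPSHOT  # empty snapshot: page is probably still loading
    # Marker strings are assumptions; adjust them to what the snapshot really contains
    if any(m in lowered for m in ("captcha", "security check", "unusual activity")):
        return Action.STOP_AND_ALERT
    if "profile not found" in lowered:
        return Action.SKIP_INVALID
    if "sign in" in lowered:
        return Action.ALERT_RELOGIN
    return Action.PROCEED
```

The security check is tested before the other text markers so that a stop condition always takes priority over skipping or re-login prompts.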
Output Formats
JSON (default)
{
  "name": "Jane Doe",
  "headline": "CEO at Acme Corp",
  "current_title": "CEO",
  "current_company": "Acme Corp",
  "location": "San Francisco, CA",
  "linkedin_url": "https://www.linkedin.com/in/janedoe",
  "connections": "500+",
  "education": [{"school": "Stanford", "degree": "BS CS", "years": "2010-2014"}],
  "experience": [{"title": "CEO", "company": "Acme Corp", "duration": "2020-Present"}],
  "scraped_at": "2026-02-17T14:30:00Z"
}
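A record in this shape could be produced by stamping the parsed fields with a UTC timestamp and serializing them. The following is a small sketch under that assumption; `finalize_record` is a hypothetical helper name, and the timestamp format is inferred from the `scraped_at` value in the example.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def finalize_record(fields: Dict[str, Any], now: Optional[datetime] = None) -> str:
    """Attach a UTC `scraped_at` timestamp and serialize the record as JSON."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = dict(fields)  # copy so the caller's dict is not mutated
    record["scraped_at"] = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    # ensure_ascii=False keeps non-ASCII names readable in the output
    return json.dumps(record, indent=2, ensure_ascii=False)
```

Passing `now` explicitly is mainly useful for tests; in normal use the current time is taken automatically.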
Progress Reporting
For bulk scrapes, report progress:
Scraping: 15/50 profiles (30%) - Last: Jane Doe (Acme Corp)
Rate: ~4 profiles/min - ETA: 9 min remaining
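The figures in this report follow from simple arithmetic: 15 of 50 is 30%, and the remaining 35 profiles at about 4 per minute take 8.75 minutes, rounded up to 9. A minimal formatter along those lines might look like the sketch below; `format_progress` is a hypothetical name, and the separator between fields is a plain hyphen by assumption.

```python
import math


def format_progress(done: int, total: int, last_name: str, last_company: str,
                    profiles_per_min: float) -> str:
    """Return the two-line progress report for a bulk scrape."""
    percent = round(100 * done / total)
    # Round the ETA up so the estimate is never optimistic
    remaining_min = math.ceil((total - done) / profiles_per_min)
    return (
        f"Scraping: {done}/{total} profiles ({percent}%) - "
        f"Last: {last_name} ({last_company})\n"
        f"Rate: ~{profiles_per_min:g} profiles/min - ETA: {remaining_min} min remaining"
    )
```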
Safety
- Never scrape private/restricted profiles
- Respect LinkedIn’s robots.txt for public pages
- Store data locally only (DuckDB); never exfiltrate
- User must have legitimate LinkedIn access
- This tool assists the user’s own manual browsing at scale