setup-scheduled-scraper
Install command
npx skills add https://github.com/sawyerh/agents --skill setup-scheduled-scraper
Skill documentation
Setup Scheduled Scraper
Overview
Build a local, scheduled scraper that runs via Playwright and writes JSON results, with an optional Next.js viewer for tables/charts. Default stack: TypeScript, Playwright test runner, Next.js App Router, Tailwind v4, Shadcn UI, and launchd scheduling.
Example Project Structure
project/
├── src/
│   ├── app/
│   │   ├── layout.tsx                   # Next.js root layout
│   │   └── page.tsx                     # Viewer entry page
│   ├── lib/                             # Viewer helpers
│   ├── scraper.ts                       # Playwright entry (called by test spec)
│   └── scrape.spec.ts                   # Playwright spec that invokes scraper
├── scripts/
│   ├── run_scrape_daily.sh              # Scheduled wrapper (logs + npm run scrape)
│   ├── update-schedule.sh               # Updates launchd schedule times
│   └── schedule-wakes.sh                # Optional pmset wake scheduling
├── src/launchd/
│   ├── com.example.scraper.plist        # LaunchAgent schedule
│   └── com.example.scraper-wake.plist   # LaunchDaemon wake helper
├── results.json                         # Scheduled output (read-only)
├── results-local.json                   # Manual run output
├── scraper-metadata.json                # Run metadata
├── package.json
├── tsconfig.json
└── README.md
Workflow
- Intake the request (read references/intake.md).
- Scaffold the project (Next.js app + Playwright + TypeScript).
- Implement the scraper pipeline (URLs -> parsed data -> JSON).
- Add the optional viewer (read-only).
- Add scheduling + logging with launchd.
- Verify manual run, schedule, and viewer.
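As a rough illustration of the pipeline step (URLs -> parsed data -> JSON), the sketch below covers only the parse-and-serialize half, so it runs without a browser. The RawPage shape and the names toRecord, buildResults, and serializeResults are hypothetical and not part of the skill; in a real project the raw title and start time would presumably be read from a Playwright page inside scraper.ts.

```typescript
// Hypothetical shape of what the browser step hands over for one URL.
interface RawPage {
  url: string;
  title: string | null;
  gameStartTime: string | null;
}

// Record shape mirroring the example results.json in this document.
interface GameRecord {
  url: string;
  title: string;
  game_start_time: string;
  scraped_at: string;
}

// Convert one raw page into a record, or null if a required field is missing.
function toRecord(page: RawPage, scrapedAt: string): GameRecord | null {
  const title = page.title ? page.title.trim() : "";
  if (!title || !page.gameStartTime) return null;
  return {
    url: page.url,
    title,
    game_start_time: page.gameStartTime,
    scraped_at: scrapedAt,
  };
}

// Parse all pages, skip incomplete ones, and keep the first record per URL.
function buildResults(pages: RawPage[], scrapedAt: string): GameRecord[] {
  const byUrl = new Map<string, GameRecord>();
  for (const page of pages) {
    const record = toRecord(page, scrapedAt);
    if (record && !byUrl.has(record.url)) byUrl.set(record.url, record);
  }
  return Array.from(byUrl.values());
}

// Serialize with two-space indentation so the output file diffs cleanly.
function serializeResults(records: GameRecord[]): string {
  return JSON.stringify(records, null, 2) + "\n";
}
```

Keeping the parsing separate from the browser code is one possible design; it lets the record logic be checked without launching Playwright.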
Data conventions
- Use results.json for scheduled runs; use results-local.json for manual runs.
- Support overriding the output path via SCRAPE_RESULTS_PATH.
- Store run metadata in scraper-metadata.json (timestamp, counts, errors).
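One way to express the output-path convention is a small pure function. This is a minimal sketch: the RunMode type and the resolveResultsPath name are assumptions for illustration, and how a run identifies itself as scheduled or manual is left to the project.

```typescript
// Hypothetical run modes; the skill only distinguishes scheduled and manual runs.
type RunMode = "scheduled" | "manual";

// Pick the output file: a non-empty SCRAPE_RESULTS_PATH wins, otherwise
// scheduled runs write results.json and manual runs write results-local.json.
function resolveResultsPath(
  env: Record<string, string | undefined>,
  mode: RunMode,
): string {
  const raw = env["SCRAPE_RESULTS_PATH"];
  const override = raw ? raw.trim() : "";
  if (override) return override;
  return mode === "scheduled" ? "results.json" : "results-local.json";
}
```

A scraper could call resolveResultsPath(process.env, mode) once at startup and pass the result to whatever writes the file.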
Example JSON
results.json (array of records):
[
  {
    "url": "https://example.com/scoreboard/some-unique-id",
    "title": "Knicks at Lakers",
    "game_start_time": "2026-02-01T19:00:00-08:00",
    "scraped_at": "2026-02-01T07:00:12-08:00"
  },
  {
    "url": "https://example.com/scoreboard/some-unique-id-2",
    "title": "Bucks at Warriors",
    "game_start_time": "2026-02-01T21:30:00-08:00",
    "scraped_at": "2026-02-01T07:00:12-08:00"
  }
]
scraper-metadata.json:
{"last_scraped_at": "2026-02-01T07:00:12-08:00"}
Scheduling (macOS launchd)
- Use a LaunchAgent to run a wrapper script at scheduled times.
- Keep the LaunchAgent plist in the repo and symlink it into ~/Library/LaunchAgents.
- Wrapper script sets PATH, logs JSON lines to ~/Library/Logs, and runs npm run scrape.
- If the user wants wake-from-sleep, add a LaunchDaemon + pmset schedule wakeorpoweron helper.
- For wake scheduling, copy the LaunchDaemon plist into /Library/LaunchDaemons (not a symlink) and set ownership to root:wheel.
- Provide an update-schedule.sh helper to edit StartCalendarInterval with two daily times. If more than two times are needed, ask before expanding the schedule logic.
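To make the two-daily-times schedule concrete, the sketch below renders a StartCalendarInterval fragment as an array of Hour/Minute dicts, which is the form launchd accepts for multiple run times. It is written in TypeScript for illustration only; the skill's update-schedule.sh is a shell script whose contents are not shown here, and the parseTime and startCalendarIntervalXml names are hypothetical.

```typescript
// Parse "HH:MM" (24-hour clock) into launchd's Hour and Minute integers.
function parseTime(time: string): { hour: number; minute: number } {
  const match = /^(\d{1,2}):(\d{2})$/.exec(time);
  if (!match) throw new Error(`invalid time: ${time}`);
  const hour = Number(match[1]);
  const minute = Number(match[2]);
  if (hour > 23 || minute > 59) throw new Error(`time out of range: ${time}`);
  return { hour, minute };
}

// Render the StartCalendarInterval key with one dict per daily run.
function startCalendarIntervalXml(times: [string, string]): string {
  const dicts = times.map((time) => {
    const { hour, minute } = parseTime(time);
    return [
      "    <dict>",
      "      <key>Hour</key>",
      `      <integer>${hour}</integer>`,
      "      <key>Minute</key>",
      `      <integer>${minute}</integer>`,
      "    </dict>",
    ].join("\n");
  });
  return [
    "  <key>StartCalendarInterval</key>",
    "  <array>",
    ...dicts,
    "  </array>",
  ].join("\n");
}
```

The tuple type restricts callers to exactly two times, matching the guidance to ask before expanding the schedule logic.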
Multi-project notes
- Ensure each project has a unique LaunchAgent label and plist filename.
- Use distinct log file paths per project.
- If using a wake LaunchDaemon, give it a unique label and owner tag.
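Deriving every per-project name from a single slug is one way to keep labels, plist filenames, and log paths from colliding. This sketch follows the com.example.scraper naming from the example project structure; the launchdNames function, the default com.example prefix, and the choice to reuse the wake label as the owner tag are all assumptions.

```typescript
interface LaunchdNames {
  agentLabel: string;
  agentPlist: string;
  wakeLabel: string;
  wakePlist: string;
  wakeOwner: string;
  outLog: string;
  errLog: string;
}

// Build unique launchd names and log paths from a project name.
function launchdNames(project: string, prefix = "com.example"): LaunchdNames {
  // Lowercase, collapse anything non-alphanumeric to "-", trim stray dashes.
  const slug = project
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (!slug) throw new Error("project name has no usable characters");
  const agentLabel = `${prefix}.${slug}`;
  const wakeLabel = `${agentLabel}-wake`;
  return {
    agentLabel,
    agentPlist: `${agentLabel}.plist`,
    wakeLabel,
    wakePlist: `${wakeLabel}.plist`,
    wakeOwner: wakeLabel, // assumption: reuse the wake label as the owner tag
    outLog: `~/Library/Logs/${slug}.out.log`,
    errLog: `~/Library/Logs/${slug}.err.log`,
  };
}
```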
Viewer guidelines
- Use Next.js App Router and keep the UI read-only.
- Prefer Shadcn components and Tailwind defaults; avoid extra overrides.
- Derive filtered subsets once, then compute metrics/views from those subsets.
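The "filter once, then compute" guideline might look like the following in a viewer helper. This is a sketch under assumptions: GameRow reuses two fields from the example JSON, and filterByDate and summarize are hypothetical names. The date filter matches the local date as written in the timestamp string rather than converting time zones.

```typescript
interface GameRow {
  title: string;
  game_start_time: string;
}

interface Summary {
  count: number;
  firstStart: string | null;
  titles: string[];
}

// Step 1: derive the filtered subset once (match on the "YYYY-MM-DD" prefix).
function filterByDate(rows: GameRow[], datePrefix: string): GameRow[] {
  return rows.filter((row) => row.game_start_time.indexOf(datePrefix) === 0);
}

// Step 2: compute every metric and view from that one subset.
function summarize(subset: GameRow[]): Summary {
  const sorted = subset
    .slice()
    .sort((a, b) => Date.parse(a.game_start_time) - Date.parse(b.game_start_time));
  const first = sorted[0];
  return {
    count: sorted.length,
    firstStart: first ? first.game_start_time : null,
    titles: sorted.map((row) => row.title),
  };
}
```

In a component, the subset would be computed once per render (or memoized) and then passed to both the table and the chart, so the two views cannot disagree about which rows are included.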
Verification
- Manual run: npm run scrape (and npm run scrape:ui for Playwright UI).
- Viewer: npm run dev.
- Schedule checks: launchctl list and pmset -g sched.
- Logs: tail -n 200 ~/Library/Logs/<project>.out.log ~/Library/Logs/<project>.err.log.
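Because the wrapper script logs JSON lines, a small helper can pull the most recent error out of tailed log text during verification. The entry fields (time, level, message) are hypothetical, since the skill does not specify a log schema, and so are the parseJsonLines and lastError names; lines that are not JSON, such as raw npm output, are skipped.

```typescript
// Hypothetical log entry shape; the skill does not define the actual fields.
interface LogEntry {
  time: string;
  level: string;
  message: string;
}

// Parse JSON-lines text, ignoring blank lines and anything that is not a valid entry.
function parseJsonLines(text: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value !== "object" || value === null) continue;
      const obj = value as Record<string, unknown>;
      const time = obj["time"];
      const level = obj["level"];
      const message = obj["message"];
      if (typeof time === "string" && typeof level === "string" && typeof message === "string") {
        entries.push({ time, level, message });
      }
    } catch (err) {
      // Not JSON; skip the line.
    }
  }
  return entries;
}

// Return the most recent entry whose level is "error", or null if none exists.
function lastError(entries: LogEntry[]): LogEntry | null {
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (entry && entry.level === "error") return entry;
  }
  return null;
}
```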
References
- references/intake.md
- references/checklists.md