setup-scheduled-scraper

📁 sawyerh/agents 📅 12 days ago
Install command
npx skills add https://github.com/sawyerh/agents --skill setup-scheduled-scraper

Skill Documentation

Setup Scheduled Scraper

Overview

Build a local, scheduled scraper that runs via Playwright and writes JSON results, with an optional Next.js viewer for tables/charts. Default stack: TypeScript, Playwright test runner, Next.js App Router, Tailwind v4, Shadcn UI, and launchd scheduling.

Workflow

  1. Intake the request (read references/intake.md).
  2. Scaffold the project (Next.js app + Playwright + TypeScript).
  3. Implement the scraper pipeline (URLs -> parsed data -> JSON).
  4. Add the optional viewer (read-only).
  5. Add scheduling + logging with launchd.
  6. Verify manual run, schedule, and viewer.

Example Project Structure

project/
├── src/
│   ├── app/
│   │   ├── layout.tsx            # Next.js root layout
│   │   └── page.tsx              # Viewer entry page
│   ├── lib/                      # Viewer helpers
│   ├── launchd/
│   │   ├── com.example.scraper.plist       # LaunchAgent schedule
│   │   └── com.example.scraper-wake.plist  # LaunchDaemon wake helper
│   ├── scraper.ts                # Playwright entry (called by test spec)
│   └── scrape.spec.ts            # Playwright spec that invokes scraper
├── scripts/
│   ├── run_scrape_daily.sh       # Scheduled wrapper (logs + npm run scrape)
│   ├── update-schedule.sh        # Updates launchd schedule times
│   └── schedule-wakes.sh         # Optional pmset wake scheduling
├── results.json                  # Scheduled output (read-only)
├── results-local.json            # Manual run output
├── scraper-metadata.json         # Run metadata
├── package.json
├── tsconfig.json
└── README.md
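Step 3 (URLs -> parsed data -> JSON) can be sketched as a pure transform that the Playwright spec feeds raw page data into. The names below (`RawRow`, `toRecord`, `toJson`) are illustrative, not part of the skill; the record shape matches the Example JSON section:

```typescript
// Illustrative shape of one row as scraped from the page.
interface RawRow {
  url: string;
  title: string;
  startTime: string;
}

// The record shape written to results.json.
interface ScrapedRecord {
  url: string;
  title: string;
  game_start_time: string;
  scraped_at: string;
}

// Map one raw row into the results.json record shape.
function toRecord(row: RawRow, scrapedAt: string): ScrapedRecord {
  return {
    url: row.url,
    title: row.title,
    game_start_time: row.startTime,
    scraped_at: scrapedAt,
  };
}

// Serialize all records; the Playwright spec would write this string
// to the resolved results file.
function toJson(rows: RawRow[], scrapedAt: string): string {
  return JSON.stringify(rows.map((r) => toRecord(r, scrapedAt)), null, 2);
}
```

Keeping the parsing pure (no Playwright types in the transform) makes the pipeline testable without launching a browser.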

Data conventions

  • Use results.json for scheduled runs; use results-local.json for manual runs.
  • Support overriding the output path via SCRAPE_RESULTS_PATH.
  • Store run metadata in scraper-metadata.json (timestamp, counts, errors).
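A minimal sketch of the output-path convention. Only SCRAPE_RESULTS_PATH is named above; the SCHEDULED_RUN flag is a hypothetical variable the wrapper script could export to mark scheduled runs:

```typescript
// Sketch only: SCHEDULED_RUN is a hypothetical flag set by the wrapper script;
// the skill itself only specifies SCRAPE_RESULTS_PATH as the override.
function resolveResultsPath(env: Record<string, string | undefined>): string {
  // Explicit override always wins.
  if (env.SCRAPE_RESULTS_PATH) return env.SCRAPE_RESULTS_PATH;
  // Scheduled runs write results.json; manual runs write results-local.json.
  return env.SCHEDULED_RUN ? "results.json" : "results-local.json";
}
```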

Example JSON

results.json (array of records):

[
  {
    "url": "https://example.com/scoreboard/some-unique-id",
    "title": "Knicks at Lakers",
    "game_start_time": "2026-02-01T19:00:00-08:00",
    "scraped_at": "2026-02-01T07:00:12-08:00"
  },
  {
    "url": "https://example.com/scoreboard/some-unique-id-2",
    "title": "Bucks at Warriors",
    "game_start_time": "2026-02-01T21:30:00-08:00",
    "scraped_at": "2026-02-01T07:00:12-08:00"
  }
]

scraper-metadata.json:

{"last_scraped_at": "2026-02-01T07:00:12-08:00"}

Scheduling (macOS launchd)

  • Use a LaunchAgent to run a wrapper script at scheduled times.
  • Keep the LaunchAgent plist in the repo and symlink it into ~/Library/LaunchAgents.
  • Wrapper script sets PATH, logs JSON lines to ~/Library/Logs, and runs npm run scrape.
  • If the user wants wake-from-sleep, add a LaunchDaemon + pmset schedule wakeorpoweron helper.
  • For wake scheduling, copy the LaunchDaemon plist into /Library/LaunchDaemons (not a symlink) and set ownership to root:wheel.
  • Provide an update-schedule.sh helper to edit StartCalendarInterval with two daily times. If more than two times are needed, ask before expanding the schedule logic.
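A minimal LaunchAgent plist of the shape described above, with two daily StartCalendarInterval times. The label, script path, log paths, and times are placeholders to adapt per project (launchd does not expand ~, so log paths must be absolute):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.example.scraper</string>
  <key>ProgramArguments</key>
  <array>
    <string>/bin/bash</string>
    <string>/path/to/project/scripts/run_scrape_daily.sh</string>
  </array>
  <key>StartCalendarInterval</key>
  <array>
    <dict>
      <key>Hour</key><integer>7</integer>
      <key>Minute</key><integer>0</integer>
    </dict>
    <dict>
      <key>Hour</key><integer>19</integer>
      <key>Minute</key><integer>0</integer>
    </dict>
  </array>
  <key>StandardOutPath</key>
  <string>/Users/yourname/Library/Logs/scraper.out.log</string>
  <key>StandardErrorPath</key>
  <string>/Users/yourname/Library/Logs/scraper.err.log</string>
</dict>
</plist>
```

update-schedule.sh would rewrite the Hour/Minute values in the two dicts above.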

Multi-project notes

  • Ensure each project has a unique LaunchAgent label and plist filename.
  • Use distinct log file paths per project.
  • If using a wake LaunchDaemon, give it a unique label and owner tag.

Viewer guidelines

  • Use Next.js App Router and keep the UI read-only.
  • Prefer Shadcn components and Tailwind defaults; avoid extra overrides.
  • Derive filtered subsets once, then compute metrics/views from those subsets.
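The "derive filtered subsets once" guideline can be sketched as below. The record shape follows the results.json example; the upcoming/past split and metric names are illustrative:

```typescript
interface GameRecord {
  url: string;
  title: string;
  game_start_time: string;
  scraped_at: string;
}

// Derive each filtered subset exactly once...
function splitByStart(records: GameRecord[], now: Date) {
  const upcoming = records.filter((r) => new Date(r.game_start_time) > now);
  const past = records.filter((r) => new Date(r.game_start_time) <= now);
  // ...then compute metrics/views from the subsets, not from re-filtered copies.
  return { upcoming, past, upcomingCount: upcoming.length, pastCount: past.length };
}
```

Components then render tables/charts from the returned subsets so every view agrees on the same filtering.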

Verification

  • Manual run: npm run scrape (and npm run scrape:ui for Playwright UI).
  • Viewer: npm run dev.
  • Schedule checks: launchctl list and pmset -g sched.
  • Logs: tail -n 200 ~/Library/Logs/<project>.out.log ~/Library/Logs/<project>.err.log.

References

  • references/intake.md
  • references/checklists.md