Install command
npx skills add https://github.com/shipshitdev/library --skill incremental-fetch
Incremental Fetch
Build data pipelines that never lose progress and never re-fetch existing data.
The Two Watermarks Pattern
Track TWO cursors to support both forward and backward fetching:
| Watermark | Purpose | API Parameter |
|---|---|---|
| newest_id | Fetch new data since last run | since_id |
| oldest_id | Backfill older data | until_id |
A single watermark only fetches forward. Two watermarks enable:
- Regular runs: fetch NEW data (since newest_id)
- Backfill runs: fetch OLD data (until oldest_id)
- No overlap, no gaps
Critical: Data vs Watermark Saving
These are different operations with different timing:
| What | When to Save | Why |
|---|---|---|
| Data records | After EACH page | Resilience: interrupted on page 47? Keep 46 pages |
| Watermarks | ONCE at end of run | Correctness: only commit progress after full success |
fetch page 1 → save records → fetch page 2 → save records → ... → update watermarks
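The timing rule above can be sketched as a small loop. This is a minimal illustration, not the skill's reference implementation: `run_fetch`, `save_records`, `commit_state`, and the `state` dict layout are hypothetical names, and each record is assumed to carry a numeric `id` field.

```python
from typing import Callable, Iterable, Optional

State = dict[str, Optional[int]]  # assumed shape: {"newest_id": ..., "oldest_id": ...}


def run_fetch(
    pages: Iterable[list[dict]],
    save_records: Callable[[list[dict]], None],
    state: State,
    commit_state: Callable[[State], None],
) -> None:
    """Save records after every page; commit watermarks once at the end."""
    seen_newest: Optional[int] = None
    seen_oldest: Optional[int] = None

    # Any exception raised while fetching or saving aborts the run
    # before the watermark commit below is reached.
    for page in pages:
        if not page:
            continue
        save_records(page)  # data: persisted after EACH page
        ids = [record["id"] for record in page]
        seen_newest = max(ids) if seen_newest is None else max(seen_newest, *ids)
        seen_oldest = min(ids) if seen_oldest is None else min(seen_oldest, *ids)

    if seen_newest is None or seen_oldest is None:
        return  # nothing fetched, so the stored watermarks stay as they are

    # Merge with stored watermarks so an update run never shrinks the
    # backfill boundary, and a backfill run never lowers newest_id.
    stored_newest = state.get("newest_id")
    stored_oldest = state.get("oldest_id")
    commit_state({
        "newest_id": seen_newest if stored_newest is None else max(stored_newest, seen_newest),
        "oldest_id": seen_oldest if stored_oldest is None else min(stored_oldest, seen_oldest),
    })
```

If the page iterator raises partway through, the records already passed to `save_records` are kept, while `commit_state` is never called, so the next run starts again from the old watermarks. This assumes record inserts are idempotent (for example, an upsert keyed on `id`), since the interrupted range will be fetched again.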
Workflow Decision Tree
First run (no watermarks)?
├── YES → Full fetch (no since_id, no until_id)
└── NO → Backfill flag set?
    ├── YES → Backfill mode (until_id = oldest_id)
    └── NO → Update mode (since_id = newest_id)
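The decision tree maps directly onto a function that turns stored watermarks into request parameters. The sketch below is illustrative only: `build_params` is a hypothetical helper, and treating a partially populated state (one watermark missing) as a first run is an assumption, not something the skill specifies.

```python
from typing import Optional


def build_params(
    newest_id: Optional[int],
    oldest_id: Optional[int],
    backfill: bool = False,
) -> dict[str, int]:
    """Pick the fetch mode and return the matching API query parameters."""
    # First run: no usable watermarks yet, so do a full fetch.
    # Assumption: an incomplete state is handled like a first run.
    if newest_id is None or oldest_id is None:
        return {}
    if backfill:
        # Backfill mode: walk backwards from the oldest record we hold.
        return {"until_id": oldest_id}
    # Update mode: only ask for records newer than the newest we hold.
    return {"since_id": newest_id}
```

For example, `build_params(100, 50)` yields `{"since_id": 100}`, while `build_params(100, 50, backfill=True)` yields `{"until_id": 50}`.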
Implementation Checklist
- Database: Create ingestion_state table (see patterns.md)
- Fetch loop: Insert records immediately after each API page
- Watermark tracking: Track newest/oldest IDs seen in this run
- Watermark update: Save watermarks ONCE at end of successful run
- Retry: Exponential backoff with jitter
- Rate limits: Wait for reset or skip and record for next run
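For the retry item in the checklist, one common form of exponential backoff with jitter is "full jitter": sleep for a random duration between zero and an exponentially growing, capped ceiling. The sketch below assumes that variant; `backoff_delay`, `with_retries`, and their default values are hypothetical, and retrying on every `Exception` is a simplification (a real pipeline would usually retry only transient errors).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying failures with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, base, cap))
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.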
Pagination Types
This pattern works best with ID-based pagination (numeric IDs that can be compared). For other pagination types:
| Type | Adaptation |
|---|---|
| Cursor/token | Store cursor string instead of ID; can’t compare numerically |
| Timestamp | Use last_timestamp column; compare as dates |
| Offset/limit | Store page number; resume from last saved page |
See references/patterns.md for schemas and code examples.