openalex
npx skills add https://github.com/ondata/skills --skill openalex
OpenAlex
Use this skill to run reliable OpenAlex API workflows from shell.
IMPORTANT: Always write curl commands on a single line. Multi-line \ continuation breaks argument parsing in agent environments and will cause errors.
Definition of Done
A task is complete when:
Results
- The API returns at least one result (or a clear “no results found” message)
- Each result shows: title (display_name), year, citation count
- Output is readable, not a raw JSON blob
Process
- curl written on a single line
- api_key included in every request
- select= used to limit returned fields
- jq used to format output
PDF download (when requested)
- If PDF is available: file saved locally, path printed
- If PDF is not available: clear message, exit code 2, no crash
Quick Start
- Export API key:
export OPENALEX_API_KEY='...'
- Run list query (works):
curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'search="data quality" AND "open government data"' --data-urlencode 'filter=type:article,from_publication_date:2023-01-01' --data-urlencode 'sort=relevance_score:desc' --data-urlencode 'per-page=200' --data-urlencode 'select=id,display_name,publication_year,cited_by_count,doi' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, year:.publication_year, cited_by:.cited_by_count, doi}'
Workflow
- Define entity endpoint (works, authors, sources, etc.).
- Build a search block with boolean logic (AND, OR, NOT, quotes, parentheses).
- Add structured filter constraints (type/date/language/OA/citation fields).
- Restrict output with select (root-level fields only).
- Page results with page or cursor=*.
- Extract fields via jq and save/transform as needed.
Iterative Validation Workflow
Use this when building or debugging non-trivial queries.
- Start with a toy query (per-page=5 or per-page=10) and minimal select=.
- Manually inspect 5-10 records for relevance and field quality (display_name, year, DOI).
- Compare a baseline and a variant before scaling:
  - baseline: filter=title.search:"..."
  - variant: search=... with same filters
- Tune one parameter at a time (search, filter, sort, per-page, pagination mode).
- Scale only after validation (per-page=200, then cursor=* for deep pagination).
- Log each run: command, key parameters, result count, and quick notes.
Avoid jumping directly from a paper/spec to a full extraction script without this short validation loop.
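The baseline-versus-variant comparison can be sketched as a pair of count queries. This assumes the list response exposes the total match count as .meta.count; the helper names count_works and compare_counts are hypothetical, and OPENALEX_API_KEY is expected to be exported.

```shell
# Hypothetical helper: print the total match count (.meta.count) for a works query.
# Extra arguments are passed to curl unchanged, so the command stays on a single line.
count_works() {
  curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'per-page=1' --data-urlencode 'select=id' --data-urlencode "api_key=$OPENALEX_API_KEY" "$@" | jq '.meta.count'
}

# compare_counts PHRASE FILTERS
# Baseline: title-only match. Variant: full-text search with the same structured filters.
compare_counts() {
  baseline=$(count_works --data-urlencode "filter=title.search:$1,$2")
  variant=$(count_works --data-urlencode "search=$1" --data-urlencode "filter=$2")
  printf 'baseline=%s variant=%s\n' "$baseline" "$variant"
}
```

For example, compare_counts '"open government data"' 'type:article' prints both counts on one line, which can be copied into the run log before deciding whether to scale.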
Query Blocks
- title.search=: searches only in the title; use this by default for focused results. Must be passed inside filter=, not as a standalone parameter: filter=title.search:"your query".
- search=: full-text search across the entire document; use only when title-only matching is too restrictive.
- search.semantic=: semantic/conceptual search (costs $0.001/request; requires API key).
- filter=: exact/structured constraints; comma means AND.
- sort=: relevance_score:desc, cited_by_count:desc, publication_date:desc, etc.
- per-page=: 1..200. Default is 25; always set per-page=200 for bulk queries (8× fewer API calls).
- page=: page number for standard pagination.
- cursor=*: deep pagination beyond first 10k records.
- select=: reduce payload; nested paths are not allowed in select.
- group_by=: aggregate results by a field (e.g. group_by=publication_year, group_by=topics.id).
- sample=: random sample of N results (e.g. sample=20). Add seed=42 for reproducibility.
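Deep pagination with cursor=* can be sketched as a loop that feeds each response's cursor into the next request. This assumes the response carries the next cursor in .meta.next_cursor and that it is null once the results are exhausted; the function name fetch_all_works is hypothetical.

```shell
# Hypothetical helper: stream every work matching a filter, one JSON object per line.
# Assumes OPENALEX_API_KEY is exported and .meta.next_cursor is null on the last page.
fetch_all_works() {
  filter=$1
  cursor='*'
  while [ -n "$cursor" ]; do
    page=$(curl -sS --get 'https://api.openalex.org/works' --data-urlencode "filter=$filter" --data-urlencode 'per-page=200' --data-urlencode 'select=id,display_name,publication_year' --data-urlencode "cursor=$cursor" --data-urlencode "api_key=$OPENALEX_API_KEY") || return 1
    # Emit this page's records as compact JSON lines.
    printf '%s\n' "$page" | jq -c '.results[]' || return 1
    # An empty string ends the loop when next_cursor is null or missing.
    cursor=$(printf '%s\n' "$page" | jq -r '.meta.next_cursor // empty')
  done
}
```

A call such as fetch_all_works 'type:article,from_publication_date:2023-01-01' > works.jsonl writes one record per line, which keeps later jq processing simple.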
Filter Syntax
Filters are comma-separated AND conditions. Within a single attribute:
| Logic | Syntax | Example |
|---|---|---|
| AND (comma) | filter=a:x,b:y | filter=type:article,is_oa:true |
| OR (pipe) | filter=type:article\|book | multiple values for same field |
| NOT (exclamation) | filter=type:!journal-article | negation |
| Greater than | filter=cited_by_count:>100 | comparison |
| Less than | filter=publication_year:<2020 | comparison |
| Range | filter=publication_year:2020-2023 | inclusive range |
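The operators in the table compose mechanically: alternative values for one field are joined with a pipe, and separate conditions are joined with a comma. A small sketch follows; the helper name join_by is hypothetical and the filter values are taken from the table.

```shell
# Hypothetical helper: join_by SEP ITEM... joins its arguments with a separator.
join_by() {
  sep=$1
  shift
  out=''
  for item in "$@"; do
    # Prepend the separator only when something has already been collected.
    out="${out:+$out$sep}$item"
  done
  printf '%s\n' "$out"
}

# Alternative values for one field are joined with a pipe (OR).
types=$(join_by '|' article book)
# Separate conditions are joined with a comma (AND).
filter=$(join_by ',' "type:$types" 'is_oa:true' 'cited_by_count:>100' 'publication_year:2020-2023')
printf '%s\n' "$filter"
# prints: type:article|book,is_oa:true,cited_by_count:>100,publication_year:2020-2023
```

Passing the result as --data-urlencode "filter=$filter" lets curl encode the pipe and the comparison characters, so the string never needs manual escaping.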
Batch Lookup
Combine up to 50 IDs in one request using the pipe operator; avoid sequential calls:
# Batch DOI lookup (up to 50 per request)
curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'filter=doi:https://doi.org/10.1/abc|https://doi.org/10.2/def' --data-urlencode 'per-page=50' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, doi}'
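When a list holds more than 50 IDs, it has to be split into groups before the pipe operator is applied. A minimal sketch, assuming one ID per line on standard input; the function name chunk_ids is hypothetical, and the third DOI in the usage line is a placeholder in the same style as the two above.

```shell
# Hypothetical helper: read one ID per line on stdin and print pipe-joined
# groups of at most N IDs (default 50), one group per output line.
chunk_ids() {
  awk -v n="${1:-50}" '
    NF {
      # Start a new group every n IDs; otherwise append with a pipe.
      line = (count % n ? line "|" : "") $0
      count++
      if (count % n == 0) { print line; line = "" }
    }
    END { if (count % n) print line }
  '
}

# Group size 2 is used here only to keep the demonstration short.
printf '%s\n' 'https://doi.org/10.1/abc' 'https://doi.org/10.2/def' 'https://doi.org/10.3/ghi' | chunk_ids 2
```

Each output line can then be used as the value in --data-urlencode "filter=doi:$group", giving one request per group instead of one per ID.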
Two-Step Entity Lookup
Names are ambiguous; always resolve to an OpenAlex ID first, then filter.
Step 1 (find the entity ID):
curl -sS --get 'https://api.openalex.org/authors' --data-urlencode 'search=Heather Piwowar' --data-urlencode 'per-page=5' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {id, display_name}'
Step 2 (use the ID in a filter):
curl -sS --get 'https://api.openalex.org/works' --data-urlencode 'filter=authorships.author.id:A5023888391' --data-urlencode 'per-page=200' --data-urlencode 'select=id,display_name,publication_year,cited_by_count' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.results[] | {title:.display_name, year:.publication_year}'
Applies to: authors (authorships.author.id), institutions (authorships.institutions.id), sources/journals (primary_location.source.id). External IDs are also accepted: ORCID, ROR, ISSN, DOI.
PDF Retrieval
For a work ID:
- Fetch work metadata.
- Resolve PDF URL in this order:
  - .content_urls.pdf
  - .best_oa_location.pdf_url
  - .primary_location.pdf_url
  - first non-null .locations[].pdf_url
- Download with the api_key query parameter when the source is content.openalex.org.
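The resolution order maps onto jq's alternative operator (//), which yields the first operand that is neither null nor false. A sketch, assuming the work metadata arrives as JSON on standard input, that any of the listed fields may be absent, and that .content_urls.pdf is a plain string; the function name resolve_pdf_url is hypothetical. It returns exit code 2 when no URL is found, in line with the Definition of Done.

```shell
# Hypothetical helper: read work metadata (JSON) on stdin and print the first PDF URL found.
# Returns 2 with a clear message when the work has no downloadable PDF.
resolve_pdf_url() {
  url=$(jq -r '.content_urls.pdf // .best_oa_location.pdf_url // .primary_location.pdf_url // ([.locations[]?.pdf_url | select(. != null)] | first) // empty') || return 1
  if [ -z "$url" ]; then
    echo 'No PDF available for this work.' >&2
    return 2
  fi
  printf '%s\n' "$url"
}
```

The function only resolves the URL; the download itself would be a separate single-line curl call that adds api_key when the host is content.openalex.org.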
Output Format
When displaying results, always show display_name as the title â never use doi or id in its place.
Minimal jq for a results table:
| jq -r '.results[] | [.display_name, .publication_year, .cited_by_count, .doi] | @tsv'
Or as structured objects:
| jq '.results[] | {title: .display_name, year: .publication_year, cited_by: .cited_by_count, doi}'
CSV Export
To save results as a CSV file, use jq with @csv and include a header row:
curl -sS --get 'https://api.openalex.org/works' ... --data-urlencode "api_key=$OPENALEX_API_KEY" | jq -r '["title","year","cited_by","doi"], (.results[] | [.display_name, .publication_year, .cited_by_count, (.doi // "")]) | @csv' > results.csv
Rules:
- Use // "" for fields that may be null (e.g. doi) so missing values are written as an explicit empty string.
- The header array and data array must have the same number of columns.
- Use -r (raw output) so @csv produces plain text, not JSON strings.
Error Handling
Implement exponential backoff on 403 (rate limit) and 500 (server error):
attempt 1 → wait 1s → attempt 2 → wait 2s → attempt 3 → wait 4s → attempt 4 → wait 8s
HTTP codes:
- 200: success
- 400: invalid parameter or filter syntax; fix the query
- 403: rate limit exceeded; back off and retry
- 404: entity not found
- 500: temporary server error; retry with backoff
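The backoff schedule and status codes can be combined in a small curl wrapper. This is a sketch under one stated assumption: the schedule is read as waits of 1, 2, 4, and 8 seconds between attempts, so at most five attempts are made. The function name curl_with_backoff is hypothetical.

```shell
# Hypothetical wrapper around curl: retries on HTTP 403 and 500 with waits of 1, 2, 4, 8 seconds.
# All arguments are passed to curl unchanged. Prints the body on 200; returns 1 otherwise.
curl_with_backoff() {
  body_file=$(mktemp) || return 1
  delay=1
  while :; do
    # -w prints only the status code to stdout; the body goes to the temp file.
    status=$(curl -sS -o "$body_file" -w '%{http_code}' "$@")
    case "$status" in
      200) cat "$body_file"; rm -f "$body_file"; return 0 ;;
      403|500) [ "$delay" -le 8 ] || break ;;  # give up once the 8s wait has been used
      *) break ;;                               # 400, 404, network errors: retrying will not help
    esac
    sleep "$delay"
    delay=$((delay * 2))
  done
  echo "Request failed with HTTP status $status" >&2
  rm -f "$body_file"
  return 1
}
```

It is used like plain curl, for example curl_with_backoff --get 'https://api.openalex.org/works' --data-urlencode 'filter=type:article' --data-urlencode "api_key=$OPENALEX_API_KEY" | jq '.meta.count', and only the successful body reaches the pipe.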
Endpoint Costs
With the free $1/day budget:
| Request type | Cost | Daily limit |
|---|---|---|
| Singleton (/works/W123) | free | unlimited |
| List / filter | $0.0001 | ~10,000 requests |
| Search (full-text or semantic) | $0.001 | ~1,000 requests |
| PDF download (content.openalex.org) | $0.01 | ~100 downloads |
Use select= and per-page=200 to minimize request count.
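The effect of per-page on the budget is simple arithmetic: requests needed equals records divided by page size, rounded up, and cost equals requests times the per-request price. A worked sketch follows; the 10,000-record extraction is an illustrative figure, and the helper names requests_for and cost_usd are hypothetical.

```shell
# requests_for RECORDS PER_PAGE -> number of list requests needed (rounded up)
requests_for() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# cost_usd REQUESTS PRICE -> total cost in dollars, four decimal places
cost_usd() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.4f\n", n * p }'
}

requests_for 10000 25    # prints 400 (default page size)
requests_for 10000 200   # prints 50 (8x fewer requests)
cost_usd 400 0.0001      # prints 0.0400
cost_usd 50 0.0001       # prints 0.0050
```

At the list price in the table, the same extraction costs $0.04 with the default page size and $0.005 with per-page=200.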
Common Pitfalls
- Do not sort by relevance_score without a search query.
- Do not use nested fields in select (example: use open_access, then parse .open_access.is_oa with jq).
- Do not filter by entity names directly; use the two-step entity lookup to get the ID first.
- Do not use sequential calls for batch ID lookups; batch up to 50 with the pipe operator.
- Do not use per-page=25 (default) for bulk extraction; always set per-page=200.
- Expect some records to have no downloadable PDF.
- search= searches full text and can return loosely related results. Use title.search= when the topic must appear in the title.
- Always write curl commands on a single line; multi-line \ continuation breaks argument parsing in agent environments.
- title.search is NOT a valid standalone parameter; always pass it inside filter=: filter=title.search:"your query".
- Always include api_key=$OPENALEX_API_KEY in every request.
Resources
- Query recipes and jq snippets: references/query-recipes.md
- Generic query helper: scripts/openalex_query.sh
- PDF downloader for work IDs: scripts/openalex_download_pdf.sh