scry
npx skills add https://github.com/exopriors/skills --skill scry
Scry Skill
Scry gives you read-only SQL access to the ExoPriors public corpus (229M+ entities)
via a single HTTP endpoint. You write Postgres SQL against a curated scry.* schema
and get JSON rows back. There is no ORM, no GraphQL, no pagination token — just SQL.
A) When to use / not use
Use this skill when:
- Searching, filtering, or aggregating content across the ExoPriors corpus
- Running lexical (BM25) or hybrid searches
- Exploring author networks, cross-platform identities, or publication patterns
- Creating shareable artifacts from query results
- Emitting structured agent judgements about entities or external references
Do NOT use this skill when:
- The user wants semantic/vector search composition or embedding algebra (use the vector-composition skill)
- The user wants LLM-based reranking (use the rerank skill)
- The user wants cross-platform people graph traversal (use the people-graph skill)
- The user wants OpenAlex academic helpers (use the openalex skill)
- The user is querying their own local database
B) Golden Rules
- **Schema first.** ALWAYS call `GET /v1/scry/schema` before writing SQL. Never guess column names or types. The schema endpoint returns live column metadata and row-count estimates for every view.
- **LIMIT always.** Every query MUST include a LIMIT clause. Max 10,000 rows. Queries without LIMIT are rejected by the SQL validator.
- **Prefer materialized views.** `scry.entities` has 229M+ rows. Scanning it without filters is slow. Use `scry.mv_lesswrong_posts`, `scry.mv_arxiv_papers`, `scry.mv_hackernews_posts`, etc. for targeted access. They are pre-filtered and often have embeddings pre-joined.
- **Filter dangerous content.** Always include `WHERE content_risk IS DISTINCT FROM 'dangerous'` unless the user explicitly asks for unfiltered results. Dangerous content contains adversarial prompt-injection payloads.
- **Raw SQL, not JSON.** `POST /v1/scry/query` takes `Content-Type: text/plain` with raw SQL in the body, not JSON-wrapped SQL.
For full tier limits, timeout policies, and degradation strategies, see Shared Guardrails.
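These rules can also be enforced client-side before a query is sent. A minimal sketch (the `preflight` helper and its heuristics are illustrative, not part of the API; real validation happens server-side):

```python
import re

def preflight(sql: str) -> list[str]:
    """Return a list of guardrail violations for a candidate Scry query."""
    problems = []
    # Rule: every query must carry an explicit LIMIT clause.
    if not re.search(r"\bLIMIT\s+\d+\b", sql, re.IGNORECASE):
        problems.append("missing LIMIT clause")
    # Rule: filter dangerous content when touching the full entities table.
    if "scry.entities" in sql and "content_risk" not in sql:
        problems.append("scry.entities queried without content_risk filter")
    return problems

# Rule: the query body is raw SQL, not JSON-wrapped.
HEADERS = {
    "Authorization": "Bearer $EXOPRIORS_KEY",  # placeholder, not a real key
    "Content-Type": "text/plain",
}
```

Run `preflight` on generated SQL and refuse to POST when it returns a non-empty list.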
C) Quickstart
One end-to-end example: find recent high-scoring LessWrong posts about RLHF.
Step 1: Get schema
GET https://api.exopriors.com/v1/scry/schema
Authorization: Bearer $EXOPRIORS_KEY
Step 2: Run query
POST https://api.exopriors.com/v1/scry/query
Authorization: Bearer $EXOPRIORS_KEY
Content-Type: text/plain
WITH hits AS (
SELECT id FROM scry.search('RLHF reinforcement learning human feedback',
kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp, e.score
FROM hits h
JOIN scry.entities e ON e.id = h.id
WHERE e.source = 'lesswrong'
AND e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY e.score DESC NULLS LAST
LIMIT 20
Response shape:
{
"columns": ["uri", "title", "original_author", "original_timestamp", "score"],
"rows": [["https://...", "My RLHF Post", "author", "2025-01-15T...", 142], ...],
"row_count": 20,
"duration_ms": 312,
"truncated": false
}
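Because `rows` is positional, a small helper can rejoin each row with `columns` for easier downstream handling (illustrative sketch; `rows_as_dicts` is not part of the API):

```python
def rows_as_dicts(response: dict) -> list[dict]:
    """Pair the parallel `columns` and `rows` arrays into one dict per row."""
    return [dict(zip(response["columns"], row)) for row in response["rows"]]

# Hypothetical response in the shape documented above.
resp = {
    "columns": ["uri", "title"],
    "rows": [["https://example.org/p1", "My RLHF Post"]],
    "row_count": 1,
    "truncated": False,
}
print(rows_as_dicts(resp)[0]["title"])  # My RLHF Post
```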
D) Decision Tree
User wants to search the ExoPriors corpus?
|
+-- By keywords/phrases? --> scry.search() (BM25 lexical)
| +-- Specific forum? --> pass mode='mv_lesswrong_posts' or kinds filter
| +-- Reddit? --> scry.search_reddit_posts() / search_reddit_comments()
| +-- Large result? --> scry.search_ids() (IDs only, up to 2000)
|
+-- By structured filters (source, date, author)? --> Direct SQL on MVs
|
+-- By semantic similarity? --> (vector-composition skill, not this one)
|
+-- Hybrid (keywords + semantic rerank)? --> scry.hybrid_search() or
| lexical CTE + JOIN scry.embeddings
|
+-- Author/people lookup? --> scry.actors, scry.people, scry.person_aliases
|
+-- Need to share results? --> POST /v1/scry/shares
|
+-- Need to emit a structured observation? --> POST /v1/scry/judgements
E) Recipes
E1. Lexical search (BM25)
WITH c AS (
SELECT id FROM scry.search('your query here',
kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp
FROM c JOIN scry.entities e ON e.id = c.id
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
LIMIT 50
Default kinds if omitted: ['post','paper','document','webpage','twitter_thread','grant'].
Pass kinds=>ARRAY['comment'] or kinds=>ARRAY['tweet'] explicitly for those types.
Pass mode=>'mv_lesswrong_posts' to scope to LessWrong posts.
E2. Reddit-specific search
SELECT id, uri, subreddit, original_author, original_timestamp, score
FROM scry.search_reddit_posts(
'transformer architecture',
subreddits=>ARRAY['MachineLearning','LocalLLaMA'],
limit_n=>50,
window_key=>'recent'
)
ORDER BY score DESC
Window keys: recent, 2022_2023, 2020_2021, 2018_2019, 2014_2017,
2010_2013, 2005_2009. Also: scry.search_reddit_comments(...).
E3. Source-filtered materialized view query
SELECT entity_id, uri, title, original_author, score, original_timestamp
FROM scry.mv_arxiv_papers
WHERE original_timestamp >= '2025-01-01'
ORDER BY original_timestamp DESC
LIMIT 50
E4. Author activity across sources
SELECT e.source::text, COUNT(*) AS docs, MAX(e.original_timestamp) AS latest
FROM scry.entities e
WHERE e.original_author ILIKE '%yudkowsky%'
AND e.content_risk IS DISTINCT FROM 'dangerous'
GROUP BY e.source::text
ORDER BY docs DESC
LIMIT 20
E5. Entity kind distribution for a source
SELECT kind::text, COUNT(*)
FROM scry.entities
WHERE source = 'hackernews'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20
E6. Hybrid search (lexical + semantic rerank in SQL)
WITH c AS (
SELECT id FROM scry.search('deceptive alignment',
kinds=>ARRAY['post'], limit_n=>200)
)
SELECT e.uri, e.title, e.original_author,
emb.embedding_voyage4 <=> @p_deadbeef_topic AS distance
FROM c
JOIN scry.entities e ON e.id = c.id
JOIN scry.embeddings emb ON emb.entity_id = c.id AND emb.chunk_index = 0
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY distance
LIMIT 50
Requires a stored embedding handle (@p_deadbeef_topic). See vector-composition
skill for creating handles.
E7. Cost estimation before execution
curl -s -X POST https://api.exopriors.com/v1/scry/estimate \
-H "Authorization: Bearer $EXOPRIORS_KEY" \
-H "Content-Type: application/json" \
-d '{"sql": "SELECT id, title FROM scry.mv_arxiv_papers LIMIT 1000"}'
Returns EXPLAIN (FORMAT JSON) output. Use this for expensive queries before committing.
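The estimate can gate execution automatically. A sketch, assuming the endpoint returns standard Postgres `EXPLAIN (FORMAT JSON)` output (a list of `{"Plan": {...}}` objects with `"Plan Rows"` on each node); the threshold is arbitrary:

```python
def looks_expensive(explain_json: list, max_rows: int = 1_000_000) -> bool:
    """Flag a plan whose top node expects to produce too many rows."""
    plan = explain_json[0]["Plan"]
    return plan.get("Plan Rows", 0) > max_rows

# Fabricated plan for illustration only.
sample = [{"Plan": {"Node Type": "Seq Scan", "Plan Rows": 229_000_000}}]
print(looks_expensive(sample))  # True
```

If `looks_expensive` returns True, tighten the WHERE clause or switch to a materialized view before running the real query.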
E8. Create a shareable artifact
# 1. Run query and capture results
# 2. POST share
curl -s -X POST https://api.exopriors.com/v1/scry/shares \
-H "Authorization: Bearer $EXOPRIORS_KEY" \
-H "Content-Type: application/json" \
-d '{
"kind": "query",
"title": "Top RLHF posts on LessWrong",
"summary": "20 highest-scored LW posts mentioning RLHF.",
"payload": {
"sql": "...",
"result": {"columns": [...], "rows": [...]}
}
}'
Kinds: query, rerank, insight, chat.
Progressive update: create stub immediately, then PATCH /v1/scry/shares/{slug}.
Rendered at: https://exopriors.com/scry/share/{slug}.
E9. Emit a structured agent judgement
curl -s -X POST https://api.exopriors.com/v1/scry/judgements \
-H "Authorization: Bearer $EXOPRIORS_KEY" \
-H "Content-Type: application/json" \
-d '{
"emitter": "my-agent",
"judgement_kind": "topic_classification",
"target_external_ref": "arxiv:2401.12345",
"summary": "Paper primarily about mechanistic interpretability.",
"payload": {"primary_topic": "mech_interp", "confidence_detail": "title+abstract match"},
"confidence": 0.88,
"tags": ["arxiv", "mech_interp"],
"privacy_level": "public"
}'
Exactly one target required: target_entity_id, target_actor_id,
target_judgement_id, or target_external_ref.
Judgement-on-judgement: use target_judgement_id to chain observations.
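The exactly-one-target constraint can be checked before POSTing. An illustrative sketch (field names come from the request body above; the helper itself is hypothetical):

```python
TARGET_FIELDS = (
    "target_entity_id", "target_actor_id",
    "target_judgement_id", "target_external_ref",
)

def validate_judgement(body: dict) -> None:
    """Raise if the judgement body does not name exactly one target."""
    targets = [f for f in TARGET_FIELDS if body.get(f) is not None]
    if len(targets) != 1:
        raise ValueError(f"exactly one target required, got {targets}")

validate_judgement({"target_external_ref": "arxiv:2401.12345"})  # passes
```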
E10. People / author lookup
-- Per-source author grouping
SELECT a.handle, a.display_name, a.source::text, COUNT(*) AS docs
FROM scry.entities e
JOIN scry.actors a ON a.id = e.author_actor_id
WHERE e.source = 'twitter'
GROUP BY a.handle, a.display_name, a.source::text
ORDER BY docs DESC
LIMIT 50
E11. Thread navigation (replies)
-- Find all replies to a root post
SELECT id, uri, title, original_author, original_timestamp
FROM scry.entities
WHERE anchor_entity_id = 'ROOT_ENTITY_UUID'
ORDER BY original_timestamp
LIMIT 100
anchor_entity_id is the root subject; parent_entity_id is the direct parent.
E12. Count estimation (safe pattern)
Avoid COUNT(*) on large tables. Instead, use schema endpoint row estimates or:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mv_lesswrong_posts'
LIMIT 1
Note: pg_class access is blocked for public keys. Use /v1/scry/schema instead.
F) Error Handling
See references/error-reference.md for the full catalogue. Key patterns:
| HTTP | Code | Meaning | Action |
|---|---|---|---|
| 400 | invalid_request | SQL parse error, missing LIMIT, bad params | Fix query |
| 401 | unauthorized | Missing or invalid API key | Check key |
| 402 | insufficient_credits | Token budget exhausted | Notify user |
| 429 | rate_limited | Too many requests | Respect Retry-After header |
| 503 | service_unavailable | Scry pool down or overloaded | Wait and retry |
Quota fallback strategy:
- If 429: wait `Retry-After` seconds, then retry once.
- If 402: tell the user their token budget is exhausted.
- If 503: retry after 30s with exponential backoff (max 3 attempts).
- If the query times out: simplify (use an MV instead of the full table, reduce LIMIT, add tighter WHERE filters).
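This fallback strategy maps cleanly to a wait schedule. A sketch (`retry_plan` is an illustrative helper, not part of any client library):

```python
def retry_plan(status, retry_after=None):
    """Map an error status to a list of wait times (seconds) before retries.

    429 honours Retry-After and retries once (1s default if the header is
    absent); 503 backs off exponentially from 30s for up to 3 attempts;
    other errors are not retried.
    """
    if status == 429:
        return [retry_after if retry_after is not None else 1.0]
    if status == 503:
        return [30.0 * (2 ** i) for i in range(3)]  # 30s, 60s, 120s
    return []  # 400, 401, 402: fix the request or stop

print(retry_plan(503))  # [30.0, 60.0, 120.0]
```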
G) Output Contract
When this skill completes a query task, return a consistent structure:
## Scry Result
**Query**: <natural language description>
**SQL**: ```sql <the SQL that ran> ```
**Rows returned**: <N> (truncated: <yes/no>)
**Duration**: <N>ms
<formatted results table or summary>
**Share**: <share URL if created>
**Caveats**: <any data quality notes, e.g., "score is NULL for arXiv">
Handoff Contract
Produces: JSON with columns, rows, row_count, duration_ms, truncated
Feeds into:
- rerank: ensure SQL returns `id` and `payload` columns for candidate sets
- vector-composition: save entity IDs for embedding lookup and semantic reranking
- research-workflow: any query result can start a research pipeline
- people-graph: entity results with `author_actor_id` feed into identity resolution
- openalex: entity IDs or DOIs can seed academic graph traversal

Receives from: none (entry point for SQL-based corpus access)
Related Skills
- vector-composition — embed concepts as @handles, search by cosine distance, debias with vector algebra
- rerank — LLM-powered multi-attribute reranking of candidate sets via pairwise comparison
- people-graph — cross-platform author identity resolution (actors, people, aliases)
- openalex — navigate the OpenAlex academic graph (authors, citations, institutions, concepts)
- research-workflow — end-to-end research pipeline orchestrator chaining all skills
- tutorial — interactive guided onboarding for first-time Scry users
- scry-people-finder — people-finding workflow using vectors + rerank
For detailed schema documentation, see references/schema-guide.md.
For the full pattern library, see references/query-patterns.md.
For error codes and quota details, see references/error-reference.md.