scry

📁 exopriors/skills 📅 Today
2
总安装量
1
周安装量
#74598
全站排名
安装命令
npx skills add https://github.com/exopriors/skills --skill scry

Agent 安装分布

amp 1
cline 1
openclaw 1
opencode 1
cursor 1
kimi-cli 1

Skill 文档

Scry Skill

Scry gives you read-only SQL access to the ExoPriors public corpus (229M+ entities) via a single HTTP endpoint. You write Postgres SQL against a curated scry.* schema and get JSON rows back. There is no ORM, no GraphQL, no pagination token — just SQL.

A) When to use / not use

Use this skill when:

  • Searching, filtering, or aggregating content across the ExoPriors corpus
  • Running lexical (BM25) or hybrid searches
  • Exploring author networks, cross-platform identities, or publication patterns
  • Creating shareable artifacts from query results
  • Emitting structured agent judgements about entities or external references

Do NOT use this skill when:

  • The user wants semantic/vector search composition or embedding algebra (use the vector-composition skill)
  • The user wants LLM-based reranking (use the rerank skill)
  • The user wants cross-platform people graph traversal (use the people-graph skill)
  • The user wants OpenAlex academic helpers (use the openalex skill)
  • The user is querying their own local database

B) Golden Rules

  1. Schema first. ALWAYS call GET /v1/scry/schema before writing SQL. Never guess column names or types. The schema endpoint returns live column metadata and row-count estimates for every view.

  2. LIMIT always. Every query MUST include a LIMIT clause. Max 10,000 rows. Queries without LIMIT are rejected by the SQL validator.

  3. Prefer materialized views. scry.entities has 229M+ rows. Scanning it without filters is slow. Use scry.mv_lesswrong_posts, scry.mv_arxiv_papers, scry.mv_hackernews_posts, etc. for targeted access. They are pre-filtered and often have embeddings pre-joined.

  4. Filter dangerous content. Always include WHERE content_risk IS DISTINCT FROM 'dangerous' unless the user explicitly asks for unfiltered results. Dangerous content contains adversarial prompt-injection payloads.

  5. Raw SQL, not JSON. POST /v1/scry/query takes Content-Type: text/plain with raw SQL in the body. Not JSON-wrapped SQL.

For full tier limits, timeout policies, and degradation strategies, see Shared Guardrails.

C) Quickstart

One end-to-end example: find recent high-scoring LessWrong posts about RLHF.

Step 1: Get schema
GET https://api.exopriors.com/v1/scry/schema
Authorization: Bearer $EXOPRIORS_KEY

Step 2: Run query
POST https://api.exopriors.com/v1/scry/query
Authorization: Bearer $EXOPRIORS_KEY
Content-Type: text/plain

WITH hits AS (
  SELECT id FROM scry.search('RLHF reinforcement learning human feedback',
    kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp, e.score
FROM hits h
JOIN scry.entities e ON e.id = h.id
WHERE e.source = 'lesswrong'
  AND e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY e.score DESC NULLS LAST
LIMIT 20

Response shape:

{
  "columns": ["uri", "title", "original_author", "original_timestamp", "score"],
  "rows": [["https://...", "My RLHF Post", "author", "2025-01-15T...", 142], ...],
  "row_count": 20,
  "duration_ms": 312,
  "truncated": false
}

D) Decision Tree

User wants to search the ExoPriors corpus?
  |
  +-- By keywords/phrases? --> scry.search() (BM25 lexical)
  |     +-- Specific forum?  --> pass mode='mv_lesswrong_posts' or kinds filter
  |     +-- Reddit?          --> scry.search_reddit_posts() / search_reddit_comments()
  |     +-- Large result?    --> scry.search_ids() (IDs only, up to 2000)
  |
  +-- By structured filters (source, date, author)? --> Direct SQL on MVs
  |
  +-- By semantic similarity? --> (vector-composition skill, not this one)
  |
  +-- Hybrid (keywords + semantic rerank)? --> scry.hybrid_search() or
  |     lexical CTE + JOIN scry.embeddings
  |
  +-- Author/people lookup? --> scry.actors, scry.people, scry.person_aliases
  |
  +-- Need to share results? --> POST /v1/scry/shares
  |
  +-- Need to emit a structured observation? --> POST /v1/scry/judgements

E) Recipes

E1. Lexical search (BM25)

WITH c AS (
  SELECT id FROM scry.search('your query here',
    kinds=>ARRAY['post'], limit_n=>100)
)
SELECT e.uri, e.title, e.original_author, e.original_timestamp
FROM c JOIN scry.entities e ON e.id = c.id
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
LIMIT 50

Default kinds if omitted: ['post','paper','document','webpage','twitter_thread','grant']. Pass kinds=>ARRAY['comment'] or kinds=>ARRAY['tweet'] explicitly for those types. Pass mode=>'mv_lesswrong_posts' to scope to LessWrong posts.

E2. Reddit-specific search

SELECT id, uri, subreddit, original_author, original_timestamp
FROM scry.search_reddit_posts(
  'transformer architecture',
  subreddits=>ARRAY['MachineLearning','LocalLLaMA'],
  limit_n=>50,
  window_key=>'recent'
)
ORDER BY score DESC

Window keys: recent, 2022_2023, 2020_2021, 2018_2019, 2014_2017, 2010_2013, 2005_2009. Also: scry.search_reddit_comments(...).

E3. Source-filtered materialized view query

SELECT entity_id, uri, title, original_author, score, original_timestamp
FROM scry.mv_arxiv_papers
WHERE original_timestamp >= '2025-01-01'
ORDER BY original_timestamp DESC
LIMIT 50

E4. Author activity across sources

SELECT e.source::text, COUNT(*) AS docs, MAX(e.original_timestamp) AS latest
FROM scry.entities e
WHERE e.original_author ILIKE '%yudkowsky%'
  AND e.content_risk IS DISTINCT FROM 'dangerous'
GROUP BY e.source::text
ORDER BY docs DESC
LIMIT 20

E5. Entity kind distribution for a source

SELECT kind::text, COUNT(*)
FROM scry.entities
WHERE source = 'hackernews'
GROUP BY kind::text
ORDER BY 2 DESC
LIMIT 20

E6. Hybrid search (lexical + semantic rerank in SQL)

WITH c AS (
  SELECT id FROM scry.search('deceptive alignment',
    kinds=>ARRAY['post'], limit_n=>200)
)
SELECT e.uri, e.title, e.original_author,
       emb.embedding_voyage4 <=> @p_deadbeef_topic AS distance
FROM c
JOIN scry.entities e ON e.id = c.id
JOIN scry.embeddings emb ON emb.entity_id = c.id AND emb.chunk_index = 0
WHERE e.content_risk IS DISTINCT FROM 'dangerous'
ORDER BY distance
LIMIT 50

Requires a stored embedding handle (@p_deadbeef_topic). See vector-composition skill for creating handles.

E7. Cost estimation before execution

curl -s -X POST https://api.exopriors.com/v1/scry/estimate \
  -H "Authorization: Bearer $EXOPRIORS_KEY" \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT id, title FROM scry.mv_arxiv_papers LIMIT 1000"}'

Returns EXPLAIN (FORMAT JSON) output. Use this for expensive queries before committing.

E8. Create a shareable artifact

# 1. Run query and capture results
# 2. POST share
curl -s -X POST https://api.exopriors.com/v1/scry/shares \
  -H "Authorization: Bearer $EXOPRIORS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "query",
    "title": "Top RLHF posts on LessWrong",
    "summary": "20 highest-scored LW posts mentioning RLHF.",
    "payload": {
      "sql": "...",
      "result": {"columns": [...], "rows": [...]}
    }
  }'

Kinds: query, rerank, insight, chat. Progressive update: create stub immediately, then PATCH /v1/scry/shares/{slug}. Rendered at: https://exopriors.com/scry/share/{slug}.

E9. Emit a structured agent judgement

curl -s -X POST https://api.exopriors.com/v1/scry/judgements \
  -H "Authorization: Bearer $EXOPRIORS_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "emitter": "my-agent",
    "judgement_kind": "topic_classification",
    "target_external_ref": "arxiv:2401.12345",
    "summary": "Paper primarily about mechanistic interpretability.",
    "payload": {"primary_topic": "mech_interp", "confidence_detail": "title+abstract match"},
    "confidence": 0.88,
    "tags": ["arxiv", "mech_interp"],
    "privacy_level": "public"
  }'

Exactly one target required: target_entity_id, target_actor_id, target_judgement_id, or target_external_ref. Judgement-on-judgement: use target_judgement_id to chain observations.

E10. People / author lookup

-- Per-source author grouping
SELECT a.handle, a.display_name, a.source::text, COUNT(*) AS docs
FROM scry.entities e
JOIN scry.actors a ON a.id = e.author_actor_id
WHERE e.source = 'twitter'
GROUP BY a.handle, a.display_name, a.source::text
ORDER BY docs DESC
LIMIT 50

E11. Thread navigation (replies)

-- Find all replies to a root post
SELECT id, uri, title, original_author, original_timestamp
FROM scry.entities
WHERE anchor_entity_id = 'ROOT_ENTITY_UUID'
ORDER BY original_timestamp
LIMIT 100

anchor_entity_id is the root subject; parent_entity_id is the direct parent.

E12. Count estimation (safe pattern)

Avoid COUNT(*) on large tables. Instead, use schema endpoint row estimates or:

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'mv_lesswrong_posts'
LIMIT 1

Note: pg_class access is blocked for public keys. Use /v1/scry/schema instead.

F) Error Handling

See references/error-reference.md for the full catalogue. Key patterns:

HTTP Code Meaning Action
400 invalid_request SQL parse error, missing LIMIT, bad params Fix query
401 unauthorized Missing or invalid API key Check key
402 insufficient_credits Token budget exhausted Notify user
429 rate_limited Too many requests Respect Retry-After header
503 service_unavailable Scry pool down or overloaded Wait and retry

Quota fallback strategy:

  1. If 429: wait Retry-After seconds, retry once.
  2. If 402: tell the user their token budget is exhausted.
  3. If 503: retry after 30s with exponential backoff (max 3 attempts).
  4. If query times out: simplify (use MV instead of full table, reduce LIMIT, add tighter WHERE filters).

G) Output Contract

When this skill completes a query task, return a consistent structure:

## Scry Result

**Query**: <natural language description>
**SQL**: ```sql <the SQL that ran> ```
**Rows returned**: <N> (truncated: <yes/no>)
**Duration**: <N>ms

<formatted results table or summary>

**Share**: <share URL if created>
**Caveats**: <any data quality notes, e.g., "score is NULL for arXiv">

Handoff Contract

Produces: JSON with columns, rows, row_count, duration_ms, truncated Feeds into:

  • rerank: ensure SQL returns id and payload columns for candidate sets
  • vector-composition: save entity IDs for embedding lookup and semantic reranking
  • research-workflow: any query result can start a research pipeline
  • people-graph: entity results with author_actor_id feed into identity resolution
  • openalex: entity IDs or DOIs can seed academic graph traversal Receives from: none (entry point for SQL-based corpus access)

Related Skills

  • vector-composition — embed concepts as @handles, search by cosine distance, debias with vector algebra
  • rerank — LLM-powered multi-attribute reranking of candidate sets via pairwise comparison
  • people-graph — cross-platform author identity resolution (actors, people, aliases)
  • openalex — navigate the OpenAlex academic graph (authors, citations, institutions, concepts)
  • research-workflow — end-to-end research pipeline orchestrator chaining all skills
  • tutorial — interactive guided onboarding for first-time Scry users
  • scry-people-finder — people-finding workflow using vectors + rerank

For detailed schema documentation, see references/schema-guide.md. For the full pattern library, see references/query-patterns.md. For error codes and quota details, see references/error-reference.md.