sm-bigquery-analyst

Install command
npx skills add https://github.com/source-medium/skills --skill sm-bigquery-analyst

Skill Documentation

SourceMedium BigQuery Analyst

Use this skill to help end users work with SourceMedium BigQuery data from setup to analysis.

Workflow

  1. Verify the environment (run the setup checks below before any analysis)
  2. Confirm project and dataset/table visibility
  3. Use docs-first guidance for definitions and table discovery
  4. Answer analytical questions with reproducible SQL receipts
  5. Call out assumptions and caveats explicitly

Setup Verification

Run these commands in order before writing analysis SQL:

# 1. Check CLI tools are installed
gcloud --version && bq version

# 2. Check authenticated account
gcloud auth list

# 3. Check active project
gcloud config get-value project

# 4. Validate BigQuery API access (dry-run)
bq query --use_legacy_sql=false --dry_run 'SELECT 1 AS ok'

# 5. Test table access (replace YOUR_PROJECT_ID)
bq query --use_legacy_sql=false --dry_run "
  SELECT 1 
  FROM \`YOUR_PROJECT_ID.sm_transformed_v2.obt_orders\` 
  LIMIT 1
"

If any step fails, see references/TROUBLESHOOTING.md and guide the user to request access.

Safety Rules

These are hard constraints. Do not bypass.

Query Safety

  1. SELECT-only — deny: INSERT, UPDATE, DELETE, MERGE, CREATE, DROP, EXPORT, COPY
  2. Dry-run first when iterating on new queries: bq query --use_legacy_sql=false --dry_run '...'
  3. Maximum bytes billed — warn if the estimated scan exceeds 1 GB without explicit approval (bq's --maximum_bytes_billed flag can enforce a hard cap)
  4. Always bound queries:
    • Add a LIMIT clause (max 100 rows for exploratory queries)
    • Use date/partition filters when querying partitioned tables
    • Prefer WHERE filters on partition columns

Data Safety

  1. Default to aggregates — avoid outputting raw rows unless explicitly requested
  2. PII handling:
    • Do not output columns likely containing PII (email, phone, address, name) without explicit confirmation
    • If PII is requested, confirm scope and purpose before proceeding
    • Suggest anonymization (hashing, aggregation) as alternatives; see the sketch after this list
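
A minimal sketch of the hashing alternative. The column names customer_email and order_id are illustrative assumptions, not confirmed schema; check references/SCHEMA.md for the real names:

-- Hypothetical columns: customer_email and order_id are illustrative names.
-- Hashing lets you group per customer without exposing the raw address.
SELECT
  TO_HEX(SHA256(customer_email)) AS customer_key,
  COUNT(DISTINCT order_id) AS orders
FROM `YOUR_PROJECT_ID.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
GROUP BY customer_key
ORDER BY orders DESC
LIMIT 100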

Cost Guardrails

-- Good: bounded scan
SELECT ... FROM `project.dataset.table`
WHERE DATE(column) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
LIMIT 100

-- Bad: full table scan
SELECT ... FROM `project.dataset.table`  -- no filters

Output Contract

For analytical questions, always return:

  1. Answer — concise plain-English conclusion
  2. SQL (copy/paste) — BigQuery Standard SQL used for the result
  3. Notes — timeframe, metric definitions, grain, scope, timezone, attribution lens
  4. Verify — bq query --use_legacy_sql=false --dry_run '<SQL>', as in the example below
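
For example, a response to "What was net revenue over the last 30 days?" might look like this (figures are placeholders; order_id and order_local_datetime are illustrative column names, not confirmed schema):

  1. Answer — Net revenue over the trailing 30 days was $X across N valid orders.
  2. SQL —
     SELECT
       COUNT(DISTINCT order_id) AS valid_orders,
       SUM(order_net_revenue) AS net_revenue
     FROM `YOUR_PROJECT_ID.sm_transformed_v2.obt_orders`
     WHERE is_order_sm_valid = TRUE
       AND DATE(order_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  3. Notes — trailing 30 calendar days in store-local time; revenue uses order_net_revenue; valid orders only (is_order_sm_valid = TRUE); result grain is a single summary row
  4. Verify — bq query --use_legacy_sql=false --dry_run '<SQL above>'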

If access/setup fails, do not fabricate results. Return:

  1. Exact failing step
  2. Exact project/dataset that failed
  3. A pointer to assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md
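
For example (project and step are hypothetical): "Step 5 (table access dry-run) failed for project my-project-id, dataset sm_transformed_v2; use assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md to request access."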

Query Guardrails

  1. Fully qualify tables as `project.dataset.table`
  2. For order analyses, default to WHERE is_order_sm_valid = TRUE
  3. Use sm_store_id (not smcid — that name does not exist in customer tables)
  4. Use SAFE_DIVIDE for ratio math (applied in the sketch after this list)
  5. Handle DATE/TIMESTAMP typing explicitly (DATE(ts_col) when comparing to dates)
  6. Use order_net_revenue for revenue metrics (not order_gross_revenue unless explicitly asked)
  7. Use *_local_datetime columns for date-based reporting (not UTC *_at columns)
  8. Avoid LIKE/REGEXP on low-cardinality fields; discover values first with SELECT DISTINCT, then use exact match
  9. LIKE is acceptable for free-text fields (utm_campaign, product_title, page_path)
  10. LTV tables (rpt_cohort_ltv_*): always filter sm_order_line_type to exactly ONE value
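
A minimal sketch applying guardrails 1, 2, 4, 5, 6, and 7 together. The columns order_id and order_local_datetime are assumed names; confirm against references/SCHEMA.md:

-- Assumed columns: order_id, order_local_datetime (illustrative names).
-- is_order_sm_valid and order_net_revenue are the documented defaults above.
SELECT
  DATE(order_local_datetime) AS order_date,
  COUNT(DISTINCT order_id) AS valid_orders,
  SUM(order_net_revenue) AS net_revenue,
  SAFE_DIVIDE(SUM(order_net_revenue), COUNT(DISTINCT order_id)) AS avg_order_value
FROM `YOUR_PROJECT_ID.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
  AND DATE(order_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY order_date
ORDER BY order_date
LIMIT 100

For guardrail 8, discover exact values before filtering (sales_channel is an illustrative field name):

-- List the distinct values first, then filter with equality instead of LIKE.
SELECT DISTINCT sales_channel
FROM `YOUR_PROJECT_ID.sm_transformed_v2.obt_orders`
LIMIT 100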

References

  • references/SCHEMA.md — key tables, grains, columns, and naming conventions
  • references/QUERY_PATTERNS.md — common SQL patterns and LTV/cohort rules
  • references/TROUBLESHOOTING.md — auth, permission, and API issues
  • assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md — copy/paste request for users without access