sm-bigquery-analyst
Total installs: 4
Weekly installs: 2
Site-wide rank: #49897
Install command:
npx skills add https://github.com/source-medium/skills --skill sm-bigquery-analyst
Install distribution by agent:
- opencode: 2
- cursor: 2
- codex: 2
- github-copilot: 2
- claude-code: 2
- qwen-code: 2
Skill Documentation
SourceMedium BigQuery Analyst
Use this skill to help end users work with SourceMedium BigQuery data, from setup through analysis.
Workflow
- Verify the environment (run the setup checks below before any analysis)
- Confirm project and dataset/table visibility
- Use docs-first guidance for definitions and table discovery
- Answer analytical questions with reproducible SQL receipts
- Call out assumptions and caveats explicitly
Setup Verification
Run these commands in order before writing analysis SQL:
# 1. Check CLI tools are installed
gcloud --version && bq version
# 2. Check authenticated account
gcloud auth list
# 3. Check active project
gcloud config get-value project
# 4. Validate BigQuery API access (dry-run)
bq query --use_legacy_sql=false --dry_run 'SELECT 1 AS ok'
# 5. Test table access (replace YOUR_PROJECT_ID)
bq query --use_legacy_sql=false --dry_run "
SELECT 1
FROM \`YOUR_PROJECT_ID.sm_transformed_v2.obt_orders\`
LIMIT 1
"
If any step fails, see references/TROUBLESHOOTING.md and guide the user to request access.
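The five checks above can also be scripted so the first failing step is reported exactly, as the output contract requires. This is an illustrative sketch, not part of the skill itself; the helper name and structure are this example's own.

```python
import subprocess

def run_checks(checks):
    """checks: list of (label, argv) pairs. Returns (ok, first_failed_label)."""
    for label, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            return False, label  # stop at the first failing step
    return True, None

# The verification steps from above, expressed as argv lists.
SETUP_CHECKS = [
    ("cli-tools", ["gcloud", "--version"]),
    ("auth", ["gcloud", "auth", "list"]),
    ("project", ["gcloud", "config", "get-value", "project"]),
    ("api-access", ["bq", "query", "--use_legacy_sql=false", "--dry_run",
                    "SELECT 1 AS ok"]),
]
```

Running `run_checks(SETUP_CHECKS)` returns `(True, None)` on a healthy setup, or the label of the first step to fix otherwise.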
Safety Rules
These are hard constraints. Do not bypass.
Query Safety
- SELECT-only → deny: INSERT, UPDATE, DELETE, MERGE, CREATE, DROP, EXPORT, COPY
- Dry-run first when iterating on new queries: `bq query --dry_run '...'`
- Maximum bytes billed → warn if a scan exceeds 1 GB without explicit approval
- Always bound queries:
  - Add a `LIMIT` clause (max 100 rows for exploratory queries)
  - Use date/partition filters when querying partitioned tables
  - Prefer `WHERE` filters on partition columns
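The SELECT-only rule can be approximated with a keyword scan. This is a hedged sketch: a production guard should use a real SQL parser, since a keyword scan can misfire on identifiers that happen to match a denied word.

```python
import re

# Write/DDL keywords the skill must deny.
DENIED = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "DROP", "EXPORT", "COPY"}

def is_select_only(sql: str) -> bool:
    # Strip string literals so keywords inside quotes don't trigger a false deny.
    no_literals = re.sub(r"'[^']*'|\"[^\"]*\"", "", sql)
    tokens = set(re.findall(r"[A-Za-z_]+", no_literals.upper()))
    return not (tokens & DENIED)
```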
Data Safety
- Default to aggregates → avoid outputting raw rows unless explicitly requested
- PII handling:
  - Do not output columns likely containing PII (email, phone, address, name) without explicit confirmation
  - If PII is requested, confirm scope and purpose before proceeding
  - Suggest anonymization (hashing, aggregation) as alternatives
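A minimal sketch of the PII check: flag column names that look like PII so the assistant can ask for confirmation before outputting them. The hint list mirrors the examples above and is deliberately not exhaustive.

```python
# Substrings that suggest a column carries PII (assumption: name-based matching
# is a heuristic, not a guarantee of what the column actually contains).
PII_HINTS = ("email", "phone", "address", "name")

def flag_pii_columns(columns):
    """Return the columns whose names suggest PII."""
    return [c for c in columns if any(hint in c.lower() for hint in PII_HINTS)]
```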
Cost Guardrails
-- Good: bounded scan
SELECT ... FROM `project.dataset.table`
WHERE DATE(column) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
LIMIT 100
-- Bad: full table scan
SELECT ... FROM `project.dataset.table` -- no filters
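The 1 GB warning threshold can be enforced by parsing the byte estimate from a dry run before executing the real query. The dry-run message format below ("... will process N bytes of data.") is an assumption about `bq`'s output and may differ across versions.

```python
import re

GB = 1024 ** 3

def estimated_bytes(dry_run_output: str):
    """Extract the byte estimate from a bq dry-run message, or None if absent."""
    match = re.search(r"process (\d+) bytes", dry_run_output)
    return int(match.group(1)) if match else None

def needs_approval(bytes_scanned: int, limit_bytes: int = 1 * GB) -> bool:
    """True when the estimated scan exceeds the 1 GB warning threshold."""
    return bytes_scanned > limit_bytes
```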
Output Contract
For analytical questions, always return:
- Answer: a concise plain-English conclusion
- SQL (copy/paste): the BigQuery Standard SQL used for the result
- Notes: timeframe, metric definitions, grain, scope, timezone, attribution lens
- Verify: the `bq query --use_legacy_sql=false --dry_run '<SQL>'` command
If access/setup fails, do not fabricate results. Return:
- Exact failing step
- Exact project/dataset that failed
- Direct the user to assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md
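One way to keep the four parts of the contract together is to build them as a single structure, so no field can be silently dropped. The dict shape is a convention of this example, not an API the skill defines.

```python
def analysis_receipt(answer: str, sql: str, notes: str) -> dict:
    """Bundle the four required parts of the output contract."""
    return {
        "answer": answer,
        "sql": sql,
        "notes": notes,
        # The verify command is derived from the SQL, so it always matches.
        "verify": f"bq query --use_legacy_sql=false --dry_run '{sql}'",
    }
```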
Query Guardrails
- Fully qualify tables as `project.dataset.table`
- For order analyses, default to `WHERE is_order_sm_valid = TRUE`
- Use `sm_store_id` (not `smcid`; that name does not exist in customer tables)
- Use `SAFE_DIVIDE` for ratio math
- Handle DATE/TIMESTAMP typing explicitly (`DATE(ts_col)` when comparing to dates)
- Use `order_net_revenue` for revenue metrics (not `order_gross_revenue` unless explicitly asked)
- Use `*_local_datetime` columns for date-based reporting (not UTC `*_at` columns)
- Avoid `LIKE`/`REGEXP` on low-cardinality fields; discover values first with `SELECT DISTINCT`, then use exact matches. `LIKE` is acceptable for free-text fields (`utm_campaign`, `product_title`, `page_path`)
- LTV tables (`rpt_cohort_ltv_*`): always filter `sm_order_line_type` to exactly ONE value
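Several of these guardrails can be applied at once with a query template. A sketch under stated assumptions: `obt_orders` appears in the setup check above, but `order_local_datetime` is an assumed `*_local_datetime` column name, so verify it against references/SCHEMA.md before use.

```python
def bounded_order_revenue_sql(project: str, days: int = 30, limit: int = 100) -> str:
    """Build a fully qualified, validity-filtered, date-bounded, LIMITed query."""
    return f"""
SELECT
  DATE(order_local_datetime) AS day,
  SUM(order_net_revenue) AS net_revenue
FROM `{project}.sm_transformed_v2.obt_orders`
WHERE is_order_sm_valid = TRUE
  AND DATE(order_local_datetime) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)
GROUP BY day
ORDER BY day
LIMIT {limit}
""".strip()
```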
References
- references/SCHEMA.md: key tables, grains, columns, and naming conventions
- references/QUERY_PATTERNS.md: common SQL patterns and LTV/cohort rules
- references/TROUBLESHOOTING.md: auth, permission, and API issues
- assets/BIGQUERY_ACCESS_REQUEST_TEMPLATE.md: copy/paste request for users without access