bigquery

📁 jwmossmoz/agent-skills 📅 1 day ago
1
总安装量
1
周安装量
#54028
全站排名
安装命令
npx skills add https://github.com/jwmossmoz/agent-skills --skill bigquery

Agent 安装分布

amp 1
opencode 1
kimi-cli 1
codex 1
claude-code 1

Skill 文档

BigQuery

Query Mozilla telemetry data directly using the bq CLI.

Prerequisites

  • gcloud and bq CLI installed (brew install google-cloud-sdk)
  • Authenticated: gcloud auth login with a Mozilla account
  • Billing project set: queries run against a project you have bigquery.jobs.create on
  • (Optional but highly recommended) mozdata-claude-plugin — provides Glean Dictionary MCP for metric/ping discovery, making it much easier to find the right tables and columns

Authentication

# Check current account
gcloud config get-value account

# Re-authenticate if needed
gcloud auth login

# List available projects
gcloud projects list --format="table(projectId,name)"

# Set billing project (mozdata is the standard choice)
gcloud config set project mozdata

If queries fail with “Access Denied”, the billing project likely lacks permissions. Try --project_id=mozdata.

Running Queries

# Basic query
bq query --project_id=mozdata --use_legacy_sql=false --format=pretty "SELECT ..."

# Dry run (check cost before executing)
bq query --project_id=mozdata --use_legacy_sql=false --dry_run "SELECT ..."

Always use --project_id=mozdata and --use_legacy_sql=false.

Table Selection

Choose the right table — this is the most important optimization:

Query Type Table Why
Windows version distribution telemetry.windows_10_aggregate Pre-aggregated, instant
DAU/MAU by standard dimensions firefox_desktop_derived.active_users_aggregates_v3 Pre-computed, 100x faster
DAU with custom dimensions firefox_desktop.baseline_clients_daily One row per client per day
MAU/WAU/retention firefox_desktop.baseline_clients_last_seen Bit patterns, scan 1 day not 28
Event analysis firefox_desktop.events_stream Pre-unnested, clustered
Mobile search search.mobile_search_clients_daily_v2 Pre-aggregated
Specific Glean metric firefox_desktop.metrics Raw metrics ping

All tables are in the moz-fx-data-shared-prod project. Fully qualify as `moz-fx-data-shared-prod.{dataset}.{table}`.

Critical Rules

  • Always use aggregate tables first — raw tables are 10-100x more expensive
  • Always include partition filter — submission_date or DATE(submission_timestamp)
  • Use sample_id = 0 for development (1% sample) — remove for production
  • Say “clients” not “users” — BigQuery tracks client_id, not actual humans
  • Never join across products by client_id — each product has its own namespace
  • Use events_stream for events — never raw events_v1 (requires UNNEST)
  • Use baseline_clients_last_seen for MAU — bit patterns, scan 1 day not 28

References

  • references/tables.md — Detailed table schemas and common query patterns
  • references/os-versions.md — Windows, macOS, and Linux version distribution queries with build number, Darwin, and kernel version mappings

Related Skills

  • redash — Web UI frontend to BigQuery with visualizations and sharing
  • mozdata:query-writing — Guided query writing with Glean Dictionary MCP
  • mozdata:probe-discovery — Find Glean metrics and telemetry probes