jgi-lakehouse

📁 fmschulz/omics-skills 📅 7 days ago
Install command
npx skills add https://github.com/fmschulz/omics-skills --skill jgi-lakehouse


Skill documentation

JGI Lakehouse

Use JGI Lakehouse (Dremio) for metadata queries and the JGI filesystem for sequence downloads.

Instructions

  1. Authenticate to Dremio using a personal access token (PAT).
  2. Explore schemas and tables to find the required metadata.
  3. Run SQL queries for project/sample/taxon discovery.
  4. Use IMG taxon OIDs to fetch genome packages from the filesystem.
  5. Validate outputs and record provenance.
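
Steps 1 to 3 can be sketched with the Python standard library. This sketch assumes the Lakehouse exposes Dremio's REST SQL endpoint (`POST /api/v3/sql`) and accepts the PAT as a bearer token. The host `dremio.example.org` and the helper name `build_sql_request` are hypothetical; docs/authentication.md remains the reference for the actual endpoint and access route. The helper only builds the request, so nothing is sent until the caller passes it to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_sql_request(base_url: str, pat: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a SQL submission request for a Dremio server.

    Assumption: the server accepts POST /api/v3/sql with a bearer PAT.
    """
    if not pat:
        raise ValueError("PAT is empty; export DREMIO_PAT before querying")
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v3/sql",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {pat}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical host; replace with the endpoint from docs/authentication.md.
request = build_sql_request(
    "https://dremio.example.org",
    "example-token",
    'SELECT gold_id, project_name FROM "gold-db-2 postgresql".gold.project LIMIT 5',
)
```

Keeping request construction separate from sending makes the query and headers easy to inspect or log for provenance before any network call happens.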

Quick Reference

Task              Action
Auth setup        See docs/authentication.md
SQL cheatsheet    See docs/sql-quick-reference.md
Table catalog     See docs/data-catalog.md
GOLD exploration  See docs/explore_gold.md

Input Requirements

  • DREMIO_PAT environment variable holding the Dremio PAT (for Lakehouse access)
  • Query intent (taxonomy, ecosystem, project IDs, etc.)
  • JGI filesystem access for downloads

Output

  • Query results (tables or CSVs)
  • Lists of taxon OIDs or accessions
  • Downloaded genome packages (FNA/FAA/GFF)
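
One possible way to produce the CSV output together with a provenance record is sketched below. It assumes query results are already available as a list of dictionaries, one per row. The helper name `save_results` and the `.provenance.json` sidecar convention are illustrative choices, not part of the skill.

```python
import csv
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def save_results(rows, sql, csv_path):
    """Write rows (a list of dicts) to CSV plus a JSON provenance sidecar."""
    csv_path = Path(csv_path)
    fieldnames = list(rows[0].keys()) if rows else []
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    # Record what was run, how many rows came back, and a checksum of the file.
    provenance = {
        "sql": sql,
        "row_count": len(rows),
        "sha256": hashlib.sha256(csv_path.read_bytes()).hexdigest(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = csv_path.with_name(csv_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(provenance, indent=2), encoding="utf-8")
    return provenance
```

Storing the SQL text and a checksum next to the CSV lets a later run confirm that a result file still matches the query that produced it.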

Quality Gates

  • SQL queries return expected row counts
  • Taxon OIDs map to existing filesystem packages
  • Downloaded files pass basic integrity checks
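
The three gates above could be checked along these lines. This is a minimal sketch under stated assumptions: the directory layout `<package_root>/<taxon_oid>/` is hypothetical (the real filesystem layout is not given here), and the "basic integrity check" is interpreted as "non-empty FASTA whose first non-blank line starts with `>`", which applies to FNA and FAA files but not to GFF.

```python
from pathlib import Path


def check_row_count(rows, minimum=1, maximum=None):
    """Return True if the number of rows falls within the expected range."""
    n = len(rows)
    return n >= minimum and (maximum is None or n <= maximum)


def find_missing_packages(taxon_oids, package_root):
    """Return taxon OIDs with no directory under package_root.

    Assumption: packages live at <package_root>/<taxon_oid>/ (hypothetical).
    """
    root = Path(package_root)
    return [oid for oid in taxon_oids if not (root / str(oid)).is_dir()]


def looks_like_fasta(path):
    """Return True if the file is non-empty and starts with a FASTA header."""
    path = Path(path)
    if not path.is_file() or path.stat().st_size == 0:
        return False
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False
```

A non-empty list from `find_missing_packages` points at OIDs to re-verify before attempting any download.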

Examples

Example 1: Basic GOLD query

SELECT gold_id, project_name
FROM "gold-db-2 postgresql".gold.project
WHERE is_public = 'Yes'
LIMIT 5;

Troubleshooting

Issue: Authentication failures
Solution: Re-create the PAT and confirm it is exported before querying.

Issue: Missing genome files
Solution: Verify IMG taxon OIDs and filesystem path permissions.
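
For the authentication issue, a quick way to confirm the PAT is visible to the process that runs the query is sketched below. The helper name `require_pat` is hypothetical; it only checks that `DREMIO_PAT` is set and non-blank, not that the token is still valid on the server.

```python
import os


def require_pat(env=None, name="DREMIO_PAT"):
    """Return the PAT from the environment, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set; export it in the shell that runs the query"
        )
    return token
```

Calling `require_pat()` at the start of a script fails early with a clear message instead of surfacing later as an authentication error from the server.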