token-efficiency

📁 delphine-l/claude_global 📅 Jan 24, 2026
93 total installs
94 weekly installs
#2490 site-wide rank
Install command
npx skills add https://github.com/delphine-l/claude_global --skill token-efficiency

Install distribution by agent

opencode 64
claude-code 62
codex 55
gemini-cli 51
github-copilot 36
openclaw 33

Skill Documentation

Token Efficiency Expert

This skill provides token optimization strategies for cost-effective Claude Code usage across all projects. These guidelines help minimize token consumption while maintaining high-quality assistance.

Core Principle

ALWAYS follow these optimization guidelines by default unless the user explicitly requests verbose output or full file contents.

Default assumption: Users prefer efficient, cost-effective assistance.


Model Selection Strategy

Use the right model for the task to optimize cost and performance:

Opus – For Learning and Deep Understanding

Use Opus when:

  • 🎓 Learning new codebases – Understanding architecture, code structure, design patterns
  • 📚 Broad exploration – Identifying key files, understanding repository organization
  • 🔍 Deep analysis – Analyzing complex algorithms, performance optimization
  • 📖 Reading and understanding – When you need to comprehend existing code before making changes
  • 🧠 Very complex debugging – Only when Sonnet can’t solve it or the issue is architectural

Why Opus: More powerful reasoning for understanding complex systems and relationships

Example prompts:

"Use Opus to understand the architecture of this codebase"
"Switch to Opus - I need help understanding how this component works"
"Use Opus for this deep dive into the authentication system"

Sonnet – For Regular Development Tasks (DEFAULT)

Use Sonnet (default) for:

  • ✏️ Writing code – Creating new files, implementing features
  • 🔧 Editing and fixing – Updating configurations, fixing bugs
  • 🐛 Debugging – Standard debugging, error analysis, troubleshooting (use Sonnet unless very complex)
  • 🧪 Testing – Writing tests, running test suites
  • 📝 Documentation – Writing READMEs, comments, docstrings
  • 🚀 Deployment tasks – Running builds, deploying code
  • 💬 General questions – Quick clarifications, simple explanations

Why Sonnet: Faster and more cost-effective for straightforward tasks, handles most debugging well

Example workflow:

1. [Opus] Learn codebase structure and identify key components (one-time)
2. [Sonnet] Implement the feature based on understanding
3. [Sonnet] Debug and fix issues as they arise
4. [Sonnet] Write tests and documentation
5. [Opus] Only if stuck on architectural or very complex issues
6. [Sonnet] Final cleanup and deployment

Cost Optimization Strategy

Typical session pattern:

  1. Start with Opus – Spend 10-15 minutes understanding the codebase (one-time investment)
  2. Switch to Sonnet – Use for ALL implementation, debugging, and routine work
  3. Return to Opus – Only when explicitly needed for deep architectural understanding

Savings example:

  • 2 hours of work = 120 minutes
  • Opus for learning: 15 minutes (~5K tokens)
  • Sonnet for everything else: 105 minutes (~15K tokens)
  • vs all Opus: ~40K tokens
  • Savings: ~50% token cost

Remember: Sonnet is very capable – use it by default, including for debugging. Only escalate to Opus when the problem requires deep architectural insight.


Skills and Token Efficiency

Common Misconception

Myth: Having many skills in .claude/skills/ increases token usage.

Reality: Skills use progressive disclosure – Claude loads them intelligently:

  1. At session start: Claude sees only skill descriptions (minimal tokens)
  2. When activated: Full skill content loaded only for skills being used
  3. Unused skills: Consume almost no tokens (just the description line)

Example Token Usage

.claude/skills/
├── vgp-pipeline/          # ~50 tokens (description only)
├── galaxy-tool-wrapping/  # ~40 tokens (description only)
├── token-efficiency/      # ~30 tokens (description only)
└── python-testing/        # ~35 tokens (description only)

Total overhead: ~155 tokens for 4 skills (just descriptions)

When skill activated: Additional 2,000-5,000 tokens loaded for that specific skill
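
What that description line looks like depends on how the skill is packaged. As a hedged illustration, assuming each skill keeps its metadata in a SKILL.md frontmatter, only those first few lines are indexed at session start:

# Hypothetical layout – only the short frontmatter description loads by default
head -4 .claude/skills/token-efficiency/SKILL.md
# ---
# name: token-efficiency
# description: Token optimization strategies for cost-effective Claude Code usage
# ---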

Implication for Centralized Skills

It’s safe to symlink multiple skills to a project!

  • Link 10+ skills from $CLAUDE_METADATA → only ~500 tokens overhead
  • Only activate skills you need by mentioning them by name
  • Example: “Use the vgp-pipeline skill to check status” → loads only that skill

Best practice:

# Link all potentially useful skills
ln -s $CLAUDE_METADATA/skills/vgp-pipeline .claude/skills/vgp-pipeline
ln -s $CLAUDE_METADATA/skills/galaxy-tool-wrapping .claude/skills/galaxy-tool-wrapping
ln -s $CLAUDE_METADATA/skills/python-testing .claude/skills/python-testing

# Activate selectively during session
"Use the vgp-pipeline skill to debug this workflow"  # Only VGP skill fully loaded

Token waste comes from:

  • ❌ Reading large log files unnecessarily
  • ❌ Running verbose commands
  • ❌ Reading unchanged files multiple times

NOT from:

  • ✅ Having many skills available
  • ✅ Well-organized skill directories
  • ✅ Using centralized skill repositories

Token Optimization Rules

1. Use Quiet/Minimal Output Modes

For commands with --quiet, --silent, or -q flags:

# ❌ DON'T: Use verbose mode by default
command --verbose

# ✅ DO: Use quiet mode by default
command --quiet
command -q
command --silent

Common commands with quiet modes:

  • grep -q (quiet, exit status only)
  • git --quiet or git -q
  • curl -s or curl --silent
  • wget -q
  • make -s (silent)
  • Custom scripts with --quiet flags

When to use verbose: Only when user explicitly asks for detailed output.


2. NEVER Read Entire Log Files

Log files can be 50-200K tokens. ALWAYS filter before reading.

# ❌ NEVER DO THIS:
Read: /var/log/application.log
Read: debug.log
Read: error.log

# ✅ ALWAYS DO ONE OF THESE:

# Option 1: Read only the end (most recent)
Bash: tail -100 /var/log/application.log

# Option 2: Filter for errors/warnings
Bash: grep -A 10 -i "error\|fail\|warning" /var/log/application.log | head -100

# Option 3: Specific time range (if timestamps present)
Bash: grep "2025-01-15" /var/log/application.log | tail -50

# Option 4: Count occurrences first
Bash: grep -c "ERROR" /var/log/application.log  # See if there are many errors
Bash: grep "ERROR" /var/log/application.log | tail -20  # Then read recent ones

Exceptions: Only read full log if:

  • User explicitly says “read the full log”
  • Filtered output lacks necessary context
  • Log is known to be small (<1000 lines)
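
A quick size check makes that last call easy; a minimal sketch, reusing the 1000-line threshold above:

# Check size before committing to a full read
lines=$(wc -l < application.log)
if [ "$lines" -lt 1000 ]; then
    echo "small log ($lines lines) – full read is acceptable"
else
    echo "large log ($lines lines) – filter with grep/tail first"
fi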

3. Check Lightweight Sources First

Before reading large files, check if info is available in smaller sources:

For Git repositories:

# ✅ Check status first (small output)
Bash: git status --short
Bash: git log --oneline -10

# ❌ Don't immediately read
Read: .git/logs/HEAD  # Can be large

For Python/Node projects:

# ✅ Check package info (small files)
Bash: cat package.json | jq '.dependencies'
Bash: cat requirements.txt | head -20

# ❌ Don't immediately read
Read: node_modules/  # Huge directory
Read: venv/  # Large virtual environment

For long-running processes:

# ✅ Check process status
Bash: ps aux | grep python
Bash: top -b -n 1 | head -20

# ❌ Don't read full logs immediately
Read: /var/log/syslog

4. Use Grep Instead of Reading Files

When searching for specific content:

# ❌ DON'T: Read file then manually search
Read: large_file.py  # 30K tokens
# Then manually look for "def my_function"

# ✅ DO: Use Grep to find it
Grep: "def my_function" large_file.py
# Then only read relevant sections if needed

Advanced grep usage:

# Find with context
Bash: grep -A 5 -B 5 "pattern" file.py  # 5 lines before/after

# Case-insensitive search
Bash: grep -i "error" logfile.txt

# Recursive search in directory
Bash: grep -r "TODO" src/ | head -20

# Count matches
Bash: grep -c "import" *.py

5. Read Files with Limits

If you must read a file, use offset and limit parameters:

# ✅ Read first 100 lines to understand structure
Read: large_file.py (limit: 100)

# ✅ Read specific section
Read: large_file.py (offset: 500, limit: 100)

# ✅ Read just the imports/header
Read: script.py (limit: 50)

For very large files:

# Check file size first
Bash: wc -l large_file.txt
# Output: 50000 lines

# Then read strategically
Bash: head -100 large_file.txt  # Beginning
Bash: tail -100 large_file.txt  # End
Bash: sed -n '1000,1100p' large_file.txt  # Specific middle section

Reading Large Test Output Files:

For Galaxy tool_test_output.json files (can be 30K+ lines):

# Read summary first (top of file)
Read(file_path, limit=10)  # Just get summary section

# Then read specific test results
Read(file_path, offset=140, limit=120)  # Target specific test

# Search for patterns
Bash("grep -n 'test_index' tool_test_output.json")  # Find test boundaries

Token savings:

  • Full file: ~60K tokens
  • Targeted reads: ~5K tokens
  • Savings: 55K tokens (92%)
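
If jq is available, a summary can sometimes be pulled without any Read at all. The field names below are assumptions about the test-report layout, not a guaranteed schema – inspect the structure first:

# Inspect the top-level structure before relying on specific fields
jq 'keys' tool_test_output.json
# Hypothetical field names – adjust after inspecting the real file
jq '[.tests[] | select(.data.status != "success")] | length' tool_test_output.json
jq -r '.tests[] | select(.data.status != "success") | .id' tool_test_output.json | head -20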

6. Use Bash Commands Instead of Reading Files

CRITICAL OPTIMIZATION: For file operations, use bash commands directly instead of reading files into Claude’s context.

Reading a file pulls its entire contents into context; a bash command that modifies the file in place costs only the handful of tokens in the command itself.

Copy File Contents

# ❌ DON'T: Read and write (costs tokens for file content)
Read: source_file.txt
Write: destination_file.txt (with content from source_file.txt)

# ✅ DO: Use cp command (zero token cost for file content)
Bash: cp source_file.txt destination_file.txt

Token savings: 100% of file content

Replace Text in Files

# ❌ DON'T: Read, edit, write (costs tokens for entire file)
Read: config.yaml
Edit: config.yaml (old_string: "old_value", new_string: "new_value")

# ✅ DO: Use sed in-place (zero token cost for file content)
Bash: sed -i '' 's/old_value/new_value/g' config.yaml
# or
Bash: sed -i.bak 's/old_value/new_value/g' config.yaml  # with backup

# For literal strings with special characters
Bash: sed -i '' 's|old/path|new/path|g' config.yaml  # Use | as delimiter

Token savings: 100% of file content

macOS vs Linux compatibility:

# macOS (BSD sed) - requires empty string after -i
sed -i '' 's/old/new/g' file.txt

# Linux (GNU sed) - no argument needed
sed -i 's/old/new/g' file.txt

# Cross-platform solution (works everywhere):
sed -i.bak 's/old/new/g' file.txt && rm file.txt.bak
# OR detect OS:
if [[ "$OSTYPE" == "darwin"* ]]; then
    sed -i '' 's/old/new/g' file.txt
else
    sed -i 's/old/new/g' file.txt
fi

# Portable alternative (no -i flag):
sed 's/old/new/g' file.txt > file.tmp && mv file.tmp file.txt

Why this matters: Scripts using sed -i will fail on macOS with cryptic errors like “can’t read /pattern/…” if the empty string is omitted. Always use sed -i '' for macOS compatibility or sed -i.bak for cross-platform safety.

Append to Files

# ❌ DON'T: Read and write entire file
Read: log.txt
Write: log.txt (with existing content + new line)

# ✅ DO: Use echo or append
Bash: echo "New log entry" >> log.txt
Bash: cat >> log.txt << 'EOF'
Multiple lines
of content
EOF

Token savings: 100% of existing file content

Delete Lines from Files

# ❌ DON'T: Read, filter, write
Read: data.txt
Write: data.txt (without lines containing "DELETE")

# ✅ DO: Use sed or grep
Bash: sed -i '' '/DELETE/d' data.txt
# or
Bash: grep -v "DELETE" data.txt > data_temp.txt && mv data_temp.txt data.txt

Extract Specific Lines

# ❌ DON'T: Read entire file to get a few lines
Read: large_file.txt (find lines 100-110)

# ✅ DO: Use sed or awk
Bash: sed -n '100,110p' large_file.txt
Bash: awk 'NR>=100 && NR<=110' large_file.txt
Bash: head -110 large_file.txt | tail -11

Rename Files in Bulk

# ❌ DON'T: Read directory, loop in Claude, execute renames
Read directory listing...
For each file: mv old_name new_name

# ✅ DO: Use bash loop or rename command
Bash: for f in *.txt; do mv "$f" "${f%.txt}.md"; done
Bash: rename 's/\.txt$/.md/' *.txt  # if rename command available

Merge Files

# ❌ DON'T: Read multiple files and write combined
Read: file1.txt
Read: file2.txt
Write: combined.txt

# ✅ DO: Use cat
Bash: cat file1.txt file2.txt > combined.txt
# or append
Bash: cat file2.txt >> file1.txt

Count Lines/Words/Characters

# ❌ DON'T: Read file to count
Read: document.txt
# Then count lines manually

# ✅ DO: Use wc
Bash: wc -l document.txt  # Lines
Bash: wc -w document.txt  # Words
Bash: wc -c document.txt  # Characters

Check if File Contains Text

# ❌ DON'T: Read file to search
Read: config.yaml
# Then search for text

# ✅ DO: Use grep with exit code
Bash: grep -q "search_term" config.yaml && echo "Found" || echo "Not found"
# or just check exit code
Bash: grep -q "search_term" config.yaml  # Exit 0 if found, 1 if not

Sort File Contents

# ❌ DON'T: Read, sort in memory, write
Read: unsorted.txt
Write: sorted.txt (with sorted content)

# ✅ DO: Use sort command
Bash: sort unsorted.txt > sorted.txt
Bash: sort -u unsorted.txt > sorted_unique.txt  # Unique sorted
Bash: sort -n numbers.txt > sorted_numbers.txt  # Numeric sort

Remove Duplicate Lines

# ❌ DON'T: Read and deduplicate manually
Read: file_with_dupes.txt
Write: file_no_dupes.txt

# ✅ DO: Use sort -u or uniq
Bash: sort -u file_with_dupes.txt > file_no_dupes.txt
# or preserve order
Bash: awk '!seen[$0]++' file_with_dupes.txt > file_no_dupes.txt

Find and Replace Across Multiple Files

# ❌ DON'T: Read each file, edit, write back
Read: file1.py
Edit: file1.py (replace text)
Read: file2.py
Edit: file2.py (replace text)
# ... repeat for many files

# ✅ DO: Use sed with find or loop
Bash: find . -name "*.py" -exec sed -i '' 's/old_text/new_text/g' {} +
# or
Bash: for f in *.py; do sed -i '' 's/old_text/new_text/g' "$f"; done

Create File with Template Content

# ❌ DON'T: Use Write tool for static content
Write: template.txt (with multi-line template)

# ✅ DO: Use heredoc or echo
Bash: cat > template.txt << 'EOF'
Multi-line
template
content
EOF

# or for simple content
Bash: echo "Single line content" > file.txt

When to Break These Rules

Still use Read/Edit/Write when:

  1. Complex logic required: Conditional edits based on file structure
  2. Code-aware changes: Editing within functions, preserving indentation
  3. Validation needed: Need to verify content before changing
  4. Interactive review: User needs to see content before approving changes
  5. Multi-step analysis: Need to understand code structure first

Example where Read/Edit is better:

# Changing function signature requires understanding context
Read: module.py
Edit: module.py (update specific function while preserving structure)

Example where bash is better:

# Simple text replacement
Bash: sed -i '' 's/old_api_url/new_api_url/g' config.py

Token Savings Examples

Example 1: Update 10 config files

Wasteful approach:

Read: config1.yaml  # 5K tokens
Edit: config1.yaml
Read: config2.yaml  # 5K tokens
Edit: config2.yaml
# ... repeat 10 times = 50K tokens

Efficient approach:

Bash: for f in config*.yaml; do sed -i '' 's/old/new/g' "$f"; done
# Token cost: ~100 tokens for command, 0 for file content

Savings: 49,900 tokens (99.8%)

Example 2: Copy configuration

Wasteful approach:

Read: template_config.yaml  # 10K tokens
Write: project_config.yaml  # 10K tokens
# Total: 20K tokens

Efficient approach:

Bash: cp template_config.yaml project_config.yaml
# Token cost: ~50 tokens

Savings: 19,950 tokens (99.75%)

Example 3: Append log entry

Wasteful approach:

Read: application.log  # 50K tokens (large file)
Write: application.log  # 50K tokens
# Total: 100K tokens

Efficient approach:

Bash: echo "[$(date)] Log entry" >> application.log
# Token cost: ~50 tokens

Savings: 99,950 tokens (99.95%)

Find CSV Column Indices

# ❌ DON'T: Read entire CSV file to find column numbers
Read: large_table.csv (100+ columns, thousands of rows)
# Then manually count columns

# ✅ DO: Extract and number header row
Bash: head -1 file.csv | tr ',' '\n' | nl

# ✅ DO: Find specific columns by pattern
Bash: head -1 VGP-table.csv | tr ',' '\n' | nl | grep -i "chrom"
# Output shows column numbers and names:
#  54 num_chromosomes
# 106 total_number_of_chromosomes
# 122 num_chromosomes_haploid

How it works:

  • head -1: Get header row only
  • tr ',' '\n': Convert comma-separated to newlines
  • nl: Number the lines (gives column index)
  • grep -i: Filter by pattern (case-insensitive)

Use case: Quickly identify which columns contain needed data in wide tables (100+ columns).

Token savings: 100% of file content – Only see column headers, not data rows.
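
Once the index is known, the same idea extends to pulling just that column. Column 54 is taken from the illustrative output above, and plain awk splitting assumes the data contains no quoted commas:

# Preview a single column by index – the table itself never enters context
awk -F',' '{print $54}' VGP-table.csv | head -10
# Or summarize its values
awk -F',' 'NR>1 {print $54}' VGP-table.csv | sort | uniq -c | sort -rn | head -5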

Python Data Filtering Pattern

# ✅ Create separate filtered files rather than overwriting
import csv

# Read original
species_data = []
with open('data.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['accession'] and row['chromosome_count']:  # Filter criteria
            species_data.append(row)

# Write to NEW file with descriptive suffix
output_file = 'data_filtered.csv'  # Not 'data.csv'
with open(output_file, 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(species_data)

Benefits:

  • Preserves original data for comparison
  • Clear naming indicates filtering applied
  • Can generate multiple filtered versions
  • Easier to debug and verify filtering logic
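
When the filter is purely textual, a hedged bash equivalent keeps the data out of context entirely. Columns 3 and 7 are hypothetical stand-ins for accession and chromosome_count – find the real indices with the nl trick above, and note that plain awk splitting assumes no quoted commas:

# Keep the header, then keep only rows where both key columns are non-empty
head -1 data.csv > data_filtered.csv
awk -F',' 'NR > 1 && $3 != "" && $7 != ""' data.csv >> data_filtered.csv
wc -l data.csv data_filtered.csv  # Sanity check: how many rows survived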

Handling Shell Aliases in Python Scripts

Problem: Python’s subprocess.run() doesn’t expand shell aliases.

# ❌ FAILS if 'datasets' is an alias
subprocess.run(['datasets', 'summary', ...])
# Error: [Errno 2] No such file or directory: 'datasets'

Solution: Use full path to executable

# Find full path
type -a datasets
# Output: datasets is an alias for ~/Workdir/ncbi_tests/datasets

echo ~/Workdir/ncbi_tests/datasets  # Expand ~
# Output: /Users/delphine/Workdir/ncbi_tests/datasets
# Use full path in script
datasets_cmd = '/Users/delphine/Workdir/ncbi_tests/datasets'
subprocess.run([datasets_cmd, 'summary', ...])

Alternative: Use shell=True (but avoid it when user input is involved, for security reasons)


Key Principle for File Operations

Ask yourself first:

  1. Can this be done with cp, mv, sed, awk, grep?
  2. Is the change purely textual (not logic-dependent)?
  3. Do I need to see the file content, or just modify it?

If answers are YES, YES, NO → Use bash commands, not Read/Edit/Write
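
A worked example of that three-question check, as a sketch (the directory and key names are hypothetical):

# Rename a config key across a directory
# 1) Doable with sed? yes   2) Purely textual? yes   3) Need to see content? no → bash
grep -rl "retry_count" configs/ | xargs sed -i.bak 's/retry_count/max_retries/g'
find configs/ -name "*.bak" -delete  # Drop the sed backups once the change is verified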


7. Filter Command Output

For commands that produce large output:

# ❌ DON'T: Capture all output
Bash: find / -name "*.py"  # Could return 10,000+ files

# ✅ DO: Limit or filter output
Bash: find /specific/path -name "*.py" | head -50
Bash: find . -name "*.py" -type f | wc -l  # Count first
Bash: find . -name "*.py" -type f | grep "test" | head -20  # Filter

# ❌ DON'T: Run verbose commands without filtering
Bash: ls -laR /  # Recursive listing of entire filesystem!

# ✅ DO: Limit scope and depth
Bash: ls -la
Bash: find . -maxdepth 2 -type f
Bash: tree -L 2  # Limit tree depth

8. Summarize, Don’t Dump

When explaining command output or file contents:

User: "What's in this directory?"

❌ BAD RESPONSE:
[Paste entire 5K token ls -la output with 500 files]

✅ GOOD RESPONSE:
"This directory contains 487 files. Key items:
- 235 Python files (*.py)
- 142 test files in tests/
- 89 config files (*.yaml, *.json)
- Main entry point: main.py
- Documentation in docs/

Would you like to see specific files or file types?"

For code:

User: "What does this script do?"

❌ BAD: [Read entire 500-line file, paste all code]

✅ GOOD:
1. Read: script.py (limit: 50)  # Just the header/imports
2. Grep: "^def " script.py  # List all functions
3. Summarize: "This script has 5 main functions:
   - parse_args(): Command-line argument parsing
   - load_data(): Reads CSV files
   - process_data(): Applies transformations
   - validate_output(): Checks results
   - main(): Orchestrates the workflow

   Would you like details on any specific function?"

9. Use Head/Tail for Large Output

When commands produce large output:

# ✅ Limit output length
Bash: cat large_file.txt | head -100
Bash: cat large_file.txt | tail -100
Bash: docker logs container_name | tail -50

# ✅ Sample from middle
Bash: cat large_file.txt | head -500 | tail -100  # Lines 401-500

# ✅ Check size before reading
Bash: wc -l file.txt
# If > 1000 lines, use head/tail

10. Use JSON/Data Tools Efficiently

For JSON, YAML, XML files:

# ❌ DON'T: Read entire file
Read: large_config.json  # Could be 50K tokens

# ✅ DO: Extract specific fields
Bash: cat large_config.json | jq '.metadata'
Bash: cat large_config.json | jq 'keys'  # Just see top-level keys
Bash: cat config.yaml | yq '.database.host'

# For XML
Bash: xmllint --xpath '//database/host' config.xml

For CSV files:

# ❌ DON'T: Read entire CSV
Read: large_data.csv  # Could be millions of rows

# ✅ DO: Sample and analyze
Bash: head -20 large_data.csv  # See header and sample rows
Bash: wc -l large_data.csv  # Count rows
Bash: csvstat large_data.csv  # Get statistics (if csvkit installed)

11. Optimize Code Reading

For understanding codebases:

# ✅ STEP 1: Get overview
Bash: find . -name "*.py" | head -20  # List files
Bash: grep -r "^class " --include="*.py" | head -20  # List classes
Bash: grep -r "^def " --include="*.py" | wc -l  # Count functions

# ✅ STEP 2: Read structure only
Read: main.py (limit: 100)  # Just imports and main structure

# ✅ STEP 3: Search for specific code
Grep: "class MyClass" src/

# ✅ STEP 4: Read only relevant sections
Read: src/mymodule.py (offset: 150, limit: 50)  # Just the relevant class

# ❌ DON'T: Read entire files sequentially
Read: file1.py  # 30K tokens
Read: file2.py  # 30K tokens
Read: file3.py  # 30K tokens

12. Use Task Tool for Exploratory Searches

When exploring a codebase to understand patterns or find information (as opposed to needle queries for a single known file):

❌ Inefficient approach (many tool calls, large context):

# Direct grep through many files
Grep(pattern="some_pattern", path=".", output_mode="content")
# Followed by multiple Read calls to understand context
Read("file1.py")
Read("file2.py")
# Followed by more Grep calls for related patterns
Grep(pattern="related_pattern", path=".", output_mode="content")
# Results in dozens of tool calls and accumulating context

✅ Efficient approach (single consolidated response):

# Use Task tool with Explore subagent
Task(
    subagent_type="Explore",
    description="Research how Galaxy API works",
    prompt="""Explore the codebase to understand how Galaxy API calls are made.
    I need to know:
    - Which files contain API call patterns
    - How authentication is handled
    - Common error handling patterns
    Return a summary with file locations and key patterns."""
)

When to use Task/Explore:

  • “How does X work in this codebase?”
  • “Where are errors from Y handled?”
  • “What is the structure of Z?”
  • Searching for patterns across multiple files
  • Need context from multiple locations
  • Exploring unfamiliar codebases

When to use direct tools instead:

  • “Read file at specific path X” → Use Read
  • “Find class definition Foo” → Use Glob("**/foo.py") or Grep("class Foo")
  • “Search for specific string in file X” → Use Grep(pattern, path="file.py")
  • You know exactly which file to check

Token savings:

  • Task tool: ~5-10K tokens for consolidated response
  • Direct exploration: ~30-50K tokens (many tool calls + context accumulation)
  • Savings: 70-80% for exploratory searches

Example comparison:

# ❌ Inefficient: Exploring workflow patterns manually
Grep("workflow", output_mode="content")  # 15K tokens
Read("workflow1.py")  # 20K tokens
Read("workflow2.py")  # 18K tokens
Grep("error handling", output_mode="content")  # 12K tokens
# Total: ~65K tokens

# ✅ Efficient: Using Task tool
Task(
    subagent_type="Explore",
    description="Understand workflow error handling",
    prompt="Explore how workflows handle errors. Return patterns and file locations."
)
# Total: ~8K tokens (single consolidated response)
# Savings: 88%

13. Efficient Scientific Literature Searches

When searching for data across multiple species (karyotypes, traits, etc.):

❌ Inefficient: Sequential searches

for species in species_list:
    search(species)  # One at a time

✅ Efficient: Parallel searches in batches

# Make 5 searches simultaneously
WebSearch("species1 karyotype")
WebSearch("species2 karyotype")
WebSearch("species3 karyotype")
WebSearch("species4 karyotype")
WebSearch("species5 karyotype")

Benefits:

  • 5x faster for user
  • Same token usage per search
  • Better user experience
  • Allows quick progress saves before session limits

Best practices:

  • Batch 3-5 related searches together
  • Group by taxonomy or data type
  • Save results immediately after each batch
  • Document “not found” species to avoid re-searching
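
A minimal sketch for that last point – recording misses so a later session does not repeat them (the tracking file name is arbitrary):

# Record a species with no published data, keeping the list deduplicated
echo "Anas acuta" >> karyotype_not_found.txt
sort -u karyotype_not_found.txt -o karyotype_not_found.txt

# Before a new search, skip anything already recorded
grep -qxF "Anas acuta" karyotype_not_found.txt && echo "already searched – skipping"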

Dealing with Session Interruptions

When user warns about daily limits:

  1. Immediately save progress:

    • Write findings to file
    • Update CSV/database with confirmed data
    • Create detailed progress document
  2. Document search status:

    • Which species searched
    • Which confirmed/not found
    • Which remain to search
    • Next steps with priority order
  3. Create resume file with:

    • Current totals
    • Completed work
    • Pending tasks with priorities
    • Recommendations for next session

Example: PROGRESS_YYYYMMDD.md file with clear resumption instructions
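
A hedged template for such a file, written with a heredoc so nothing existing needs to be read back in (all figures are placeholders to fill in):

cat > PROGRESS_$(date +%Y%m%d).md << 'EOF'
## Progress snapshot
- Confirmed entries: <count> (saved in data_filtered.csv)
- Not found: see karyotype_not_found.txt
- Remaining to search: <count> species

## Next steps (priority order)
1. Search remaining species with family-level terms
2. Merge confirmed results into the main table
EOF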

Search Term Iteration

When initial searches fail, refine systematically:

  1. First try: Specific scientific terms

    • “Anas acuta karyotype 2n”
  2. Second try: Common name + scientific

    • “northern pintail Anas acuta chromosome number”
  3. Third try: Genus-level patterns

    • “Anas genus karyotype waterfowl”
  4. Fourth try: Family-level studies

    • “Anatidae chromosome evolution cytogenetics”

Don’t: Keep searching the same terms repeatedly.
Do: Escalate to higher taxonomic levels or comparative studies.


Token Savings Examples

Example 1: Status Check

Scenario: User asks “What’s the status of my application?”

❌ Wasteful approach (50K tokens):

Read: /var/log/app.log  # 40K tokens
Bash: systemctl status myapp  # 10K tokens

✅ Efficient approach (3K tokens):

Bash: systemctl status myapp --no-pager | head -20  # 1K tokens
Bash: tail -50 /var/log/app.log  # 2K tokens

Savings: 94%


Example 2: Debugging Errors

Scenario: User says “My script is failing, help debug”

❌ Wasteful approach (200K tokens):

Read: debug.log  # 150K tokens
Read: script.py  # 30K tokens
Read: config.json  # 20K tokens

✅ Efficient approach (8K tokens):

Bash: tail -100 debug.log  # 3K tokens
Bash: grep -i "error\|traceback" debug.log | tail -50  # 2K tokens
Grep: "def main" script.py  # 1K tokens
Read: script.py (offset: 120, limit: 50)  # 2K tokens (just the failing function)

Savings: 96%


Example 3: Code Review

Scenario: User asks “Review this codebase”

❌ Wasteful approach (500K tokens):

Read: file1.py
Read: file2.py
Read: file3.py
Read: file4.py
# ... reads 20+ files

✅ Efficient approach (20K tokens):

Bash: find . -name "*.py" | head -30  # 1K
Bash: cloc .  # Lines of code summary - 1K
Bash: grep -r "^class " --include="*.py" | head -20  # 2K
Bash: grep -r "^def " --include="*.py" | wc -l  # 1K
Read: main.py (limit: 100)  # 3K
Read: README.md  # 5K
Grep: "TODO\|FIXME\|XXX" -r .  # 2K
# Then ask user what specific areas to review

Savings: 96%


When to Override These Guidelines

Override efficiency rules when:

  1. User explicitly requests full output:

    • “Show me the entire log file”
    • “Read the full source code”
    • “I don’t care about token cost”
  2. Filtered output lacks necessary context:

    • Error message references line numbers not in filtered output
    • Need to understand full data flow
    • Debugging requires seeing complete state
  3. File is known to be small:

    • File is < 200 lines
    • Config files with minimal content
    • Small documentation files
  4. Learning code structure and architecture (IMPORTANT):

    • User is exploring a new codebase to understand its organization
    • Learning coding patterns, idioms, or best practices from existing code
    • Understanding how modules/classes are structured
    • Studying implementation approaches for educational purposes
    • Reading example code or reference implementations
    • Initial exploration phase before making changes

    Key indicators for learning mode:

    • User says: “help me understand this codebase”, “how does X work?”, “show me how this is implemented”
    • User is asking conceptual questions: “what patterns are used?”, “how is this organized?”
    • User wants to learn from the code, not just debug or modify it
    • User is new to the project or technology

    In learning mode:

    ✅ DO: Read full files to show complete patterns and structure
    ✅ DO: Read multiple related files to show how components interact
    ✅ DO: Show full function/class implementations as examples
    ✅ DO: Explain code in detail with context
    
    ⚠️ BALANCE: Still use strategic efficiency (don't read 50 files at once)
    - Apply strategic file selection (see section below)
    - Read 2-5 key files fully to establish understanding
    - Use grep to find other relevant examples
    - Summarize patterns found across many files
    

    After learning phase, return to efficient mode for implementation.

In cases 1-3, explain to the user:

"This will use approximately [X]K tokens. Should I proceed?
Or would you prefer a filtered/summarized view first?"

In learning mode (case 4), prioritize understanding over token efficiency, but still be strategic about which files to read fully (see Strategic File Selection below).


Strategic File Selection for Learning Mode

When entering learning mode, first determine if this is broad exploration or targeted learning, then apply the appropriate strategy.

Learning Mode Types

Type 1: Broad Exploration – “Help me understand this codebase”, “How is this organized?” → Use repository-based strategies below (identify type, read key files)

Type 2: Targeted Pattern Learning – “How do I implement X?”, “Show me examples of Y” → Use targeted concept search (see Targeted Pattern Learning section below)


Targeted Pattern Learning

When user asks about a specific technique or pattern, use this focused approach instead of broad exploration.

Examples of Targeted Learning Queries

  • “How do variable number of outputs work in Galaxy wrappers?”
  • “Show me how to fetch invocation data from Galaxy API”
  • “How do I implement conditional parameters in Galaxy tools?”
  • “How does error handling work in this codebase?”
  • “Show me examples of async function patterns”
  • “How are tests structured for workflow X?”

Targeted Learning Workflow

STEP 1: Identify the Specific Concept

Extract the key concept from user’s question:

User: "How do variable number of outputs work in Galaxy wrappers?"
→ Concept: "variable number of outputs" OR "dynamic outputs"
→ Context: "Galaxy tool wrappers"
→ File types: ".xml" (Galaxy tool wrappers)

User: "How to fetch invocation data from Galaxy API?"
→ Concept: "fetch invocation" OR "invocation data" OR "get invocation"
→ Context: "Galaxy API calls"
→ File types: ".py" with Galaxy API usage


STEP 2: Search for Examples

Use targeted searches to find relevant code:

# For Galaxy variable outputs example
grep -r "discover_datasets\|collection_type.*list" --include="*.xml" | head -20
grep -r "<outputs>" --include="*.xml" -A 10 | grep -i "collection\|discover"

# For Galaxy invocation fetching
grep -r "invocation" --include="*.py" -B 2 -A 5 | head -50
grep -r "show_invocation\|get_invocation" --include="*.py" -l

# For conditional parameters
grep -r "<conditional" --include="*.xml" -l | head -10

# For error handling patterns
grep -r "try:\|except\|raise" --include="*.py" -l | xargs grep -l "class.*Error"

STEP 3: Rank and Select Examples

Selection criteria (in priority order):

  1. Documentation/Comments – Files with good comments explaining the pattern

    # Find well-documented examples
    grep -r "pattern-keyword" --include="*.py" -B 5 | grep -E "^\s*#|^\s*\"\"\"" | wc -l
    
  2. Simplicity – Simpler examples are better for learning

    # Find shorter files (likely simpler)
    grep -rl "pattern-keyword" --include="*.py" | xargs wc -l | sort -n | head -5
    
  3. Recency – Recent code shows current best practices

    # Find recent examples
    grep -rl "pattern-keyword" --include="*.py" | xargs ls -lt | head -5
    
  4. Multiple variations – Show different approaches if they exist

    # Compare different implementations
    grep -r "pattern-keyword" --include="*.py" -l | head -3
    

STEP 4: Read Examples Fully

Read 2-3 selected examples completely to understand the pattern:

# Example: Variable outputs in Galaxy
# After finding: tools/tool1.xml, tools/tool2.xml, tools/advanced.xml

Read: tools/tool1.xml  # Simple example
Read: tools/tool2.xml  # Standard example
Read: tools/advanced.xml  # Complex variation (if needed)

STEP 5: Extract and Explain the Pattern

After reading examples, explain:

  1. The core pattern – How it works conceptually
  2. Required elements – What’s needed to implement it
  3. Common variations – Different ways to use it
  4. Common pitfalls – What to avoid
  5. Best practices – Recommended approach

Targeted Learning Examples

Example 1: Variable Number of Outputs in Galaxy Wrappers

User query: “How do variable number of outputs work in Galaxy wrappers?”

Execution:

# Step 1: Identify concept
# Concept: dynamic outputs, discover_datasets
# File type: Galaxy XML wrappers (*.xml)

# Step 2: Search for examples
grep -r "discover_datasets" --include="*.xml" -l
# Output: tools/samtools/samtools_merge.xml, tools/kraken2/kraken2.xml, ...

# Step 3: Rank examples
ls -lt tools/*/samtools_merge.xml tools/*/kraken2.xml
# Pick: samtools_merge.xml (recent, well-maintained)

# Step 4: Read example fully
Read: tools/samtools/samtools_merge.xml

# Step 5: Find another variation
grep -r "collection_type.*list" --include="*.xml" -l | head -1
Read: tools/example/collection_output.xml

Explanation provided:

I found two main approaches for variable outputs in Galaxy:

1. **discover_datasets pattern** (samtools_merge.xml:45-52):
   - Use <discover_datasets> in output section
   - Specify pattern and format
   - Galaxy auto-detects files matching pattern

2. **output_collection pattern** (collection_output.xml:78-85):
   - Define collection type (list, paired, etc.)
   - Use discover_datasets within collection
   - For grouped/structured outputs

Key requirements:
- Set discover_datasets pattern attribute
- Specify format for discovered files
- Optional: Use directory attribute for subdirectories

Would you like me to show a specific implementation for your use case?

Example 2: Fetching Invocation Data from Galaxy API

User query: “How do I fetch invocation data using the Galaxy API?”

Execution:

# Step 1: Identify concept
# Concept: fetch invocation, show_invocation, invocation data
# Context: Galaxy API, bioblend usage
# File type: Python files with bioblend/API calls

# Step 2: Search for examples
grep -r "show_invocation\|get_invocation" --include="*.py" -l
# Output: galaxy_client.py, orchestrator.py, check_status.py

grep -r "invocations.show\|\.show_invocation" --include="*.py" -B 3 -A 8
# Shows actual usage patterns

# Step 3: Rank examples
# galaxy_client.py - likely has core API wrapper functions
# orchestrator.py - shows usage in context
# Pick both for different perspectives

# Step 4: Read examples
Read: batch_vgp_run/galaxy_client.py
# Focus on invocation-related functions

grep -n "def.*invocation" batch_vgp_run/galaxy_client.py
# Shows: check_invocation_complete (line 250), rerun_failed_invocation (line 847)

Read: batch_vgp_run/galaxy_client.py (offset: 245, limit: 60)
Read: batch_vgp_run/galaxy_client.py (offset: 840, limit: 70)

Explanation provided:

I found the pattern for fetching invocation data (galaxy_client.py:250-285):

Basic invocation fetch:

invocation = gi.invocations.show_invocation(invocation_id)
state = invocation['state']  # 'ok', 'running', 'failed', 'cancelled'

Detailed invocation with steps:

invocation = gi.invocations.show_invocation(invocation_id, include_workflow_steps=True)
steps = invocation.get('steps', {})
for step_id, step_data in steps.items():
    step_state = step_data['state']
    job_id = step_data.get('job_id')

Key fields available:

  • invocation['state'] – Overall invocation status
  • invocation['steps'] – Individual step details
  • invocation['workflow_id'] – Associated workflow
  • invocation['history_id'] – Target history
  • invocation['update_time'] – Last update timestamp

Common patterns:

  1. Poll for completion (galaxy_client.py:250-285)
  2. Get failed step details (galaxy_client.py:320-340)
  3. Rerun with modifications (galaxy_client.py:847-899)

Which pattern do you need for your use case?


Example 3: Conditional Parameters in Galaxy Tools

User query: "How do conditional parameters work in Galaxy tool wrappers?"

Execution:

# Step 1: Identify concept
# Concept: conditional parameters, when expression
# File type: Galaxy XML wrappers

# Step 2: Search
grep -r "<conditional" --include="*.xml" -l | head -10
grep -r "<conditional" --include="*.xml" -A 15 | head -50

# Step 3: Find simple example first
grep -rl "<conditional" --include="*.xml" | xargs wc -l | sort -n | head -3
# Pick shortest file with conditionals

# Step 4: Read examples
Read: tools/simple-tool/simple_conditional.xml  # Simple case
Read: tools/complex-tool/advanced_conditional.xml  # Nested case

Explanation provided:

Conditional parameters in Galaxy (simple_conditional.xml:34-58):

Basic structure:

<conditional name="output_choice">
    <param name="output_type" type="select" label="Output type">
        <option value="single">Single file</option>
        <option value="collection">Collection</option>
    </param>

    <when value="single">
        <param name="format" type="select" label="Format">
            <option value="txt">Text</option>
            <option value="csv">CSV</option>
        </param>
    </when>

    <when value="collection">
        <param name="collection_type" type="select" label="Collection type">
            <option value="list">List</option>
            <option value="paired">Paired</option>
        </param>
    </when>
</conditional>

In command block (Cheetah syntax):

#if $output_choice.output_type == "single":
    --format ${output_choice.format}
#else:
    --collection-type ${output_choice.collection_type}
#end if

Advanced: Nested conditionals (advanced_conditional.xml:67-120):

  • Conditionals can contain other conditionals
  • Each is independent
  • Access nested values: ${outer.inner.value}

Would you like to see nested examples or a specific use case?


When to Use Targeted Learning

Use targeted learning when the user:

  • ✅ Asks "how do I..." about a specific feature
  • ✅ Requests "show me examples of X"
  • ✅ Wants to learn a specific pattern/technique
  • ✅ Has a focused technical question
  • ✅ References a specific concept/API/feature

Don't use it for:

  • ❌ "Understand this codebase" (use broad exploration)
  • ❌ "What does this project do?" (use documentation reading)
  • ❌ "Debug this error" (use debugging mode, not learning mode)

Key Principles for Targeted Learning

  1. Search first, read second

    • Use grep to find relevant examples
    • Rank by quality/simplicity/recency
    • Then read selected examples fully
  2. Read 2-3 examples, not 20

    • Simple example (minimal working code)
    • Standard example (common usage)
    • Complex example (advanced features) – optional
  3. Extract the pattern

    • Don't just show code, explain the pattern
    • Highlight key elements and structure
    • Show variations and alternatives
  4. Provide context

    • Where this pattern is used
    • When to use it vs alternatives
    • Common pitfalls and best practices
  5. Confirm understanding

    • Ask if the user needs a specific variation
    • Offer to show related patterns
    • Check if the explanation answered their question

General Exploration vs Targeted Learning

When the user says → use this approach:

| User Request | Approach | Strategy |
|--------------|----------|----------|
| "Help me understand this codebase" | General Exploration | Identify repo type → Read key files |
| "How is this project organized?" | General Exploration | Read docs → Entry points → Architecture |
| "Show me how to implement X" | Targeted Learning | Search for X → Read examples → Extract pattern |
| "How does feature Y work?" | Targeted Learning | Grep for Y → Find best examples → Explain |
| "What patterns are used here?" | General Exploration | Read core files → Identify patterns |
| "How do I use API method Z?" | Targeted Learning | Search for Z usage → Show examples |

Broad Repository Exploration

When entering broad exploration mode, first identify the repository context, then apply the appropriate exploration strategy.

STEP 1: Identify Repository Type

Ask these questions or check indicators:

# Check for multiple independent tools/packages
ls -d */ | wc -l  # Many directories at root level?
ls recipes/ tools/ packages/ 2>/dev/null  # Collection structure?

# Check for submission/contribution guidelines
ls -la | grep -i "contrib\|guideline\|submiss"
cat CONTRIBUTING.md README.md 2>/dev/null | grep -i "structure\|organization\|layout"

# Check for monolithic vs modular structure
find . -name "setup.py" -o -name "package.json" -o -name "Cargo.toml" | wc -l
# 1 = monolithic, many = multi-package

# Check for specific patterns
ls -F | grep -E "recipes/|tools/|workflows/|plugins/|examples/"

Repository type indicators:

  1. Tool Library / Recipe Collection (bioconda, tool collections)

    • Multiple independent directories at same level
    • Each subdirectory is self-contained
    • Examples: recipes/tool1/, recipes/tool2/, workflows/workflow-a/
    • Indicator files: recipes/, tools/, packages/, multiple meta.yaml or package.json
  2. Monolithic Application (single integrated codebase)

    • One main entry point
    • Hierarchical module structure
    • Shared dependencies and utilities
    • Examples: src/, lib/, single setup.py, main.py
    • Indicator files: Single setup.py, main.py, __init__.py, src/ directory
  3. Framework / SDK (extensible system)

    • Core framework + plugins/extensions
    • Base classes and interfaces
    • Examples: core/, plugins/, extensions/, base/
    • Indicator files: core/, plugins/, documentation on extending
  4. Example / Template Repository

    • Multiple example implementations
    • Each directory shows different pattern
    • Examples: examples/, samples/, templates/
    • Indicator files: examples/, README in each subdirectory

STEP 2: Apply Context-Specific Strategy

Strategy A: Tool Library / Recipe Collection

Goal: Learn the pattern from representative examples

Approach:

# 1. Find most recently modified (shows current best practices)
ls -lt recipes/ | head -10  # or tools/, workflows/, etc.

# 2. Find most common patterns
find recipes/ -name "meta.yaml" -o -name "*.xml" | head -1 | xargs dirname

# 3. Read submission guidelines first
cat CONTRIBUTING.md README.md | grep -A 20 -i "structure\|format\|template"

# 4. Read 2-3 representative examples
# Pick: 1 recent, 1 complex, 1 simple
ls -lt recipes/ | head -3

Files to read (in order):

  1. CONTRIBUTING.md or submission guidelines → Learn required structure
  2. Recent tool/recipe → Current best practices
  3. Well-established tool/recipe → Proven patterns
  4. Template or example → Base structure

Example:

# For bioconda-style repository
Read: CONTRIBUTING.md
ls -lt recipes/ | head -5  # Pick a recent one
Read: recipes/recent-tool/meta.yaml
Read: recipes/established-tool/meta.yaml  # Compare patterns

Strategy B: Monolithic Application

Goal: Understand execution flow and architecture

Approach:

# 1. Find entry point
find . -name "main.py" -o -name "app.py" -o -name "run*.py" | grep -v test | head -5

# 2. Find most imported modules (core components)
grep -r "^import\|^from" --include="*.py" . | \
  sed 's/.*import //' | cut -d' ' -f1 | cut -d'.' -f1 | \
  sort | uniq -c | sort -rn | head -10

# 3. Find orchestrators/managers
find . -name "*manager.py" -o -name "*orchestrator.py" -o -name "*controller.py"

# 4. Check recent changes (active development areas)
git log --name-only --pretty=format: --since="1 month ago" | \
  sort | uniq -c | sort -rn | head -10

Files to read (in order):

  1. README.md → Overview and architecture
  2. Entry point (main.py, run_all.py) → Execution flow
  3. Core orchestrator/manager → Main logic
  4. Most-imported utility module → Common patterns
  5. One domain-specific module → Implementation details

Example:

# For Python application
Read: README.md
Read: main.py  # Entry point
grep -r "^from.*import" main.py | head -10  # See what it imports
Read: src/orchestrator.py  # Core component
Read: src/utils.py  # Common utilities

Strategy C: Framework / SDK

Goal: Understand core abstractions and extension points

Approach:

# 1. Find base classes and interfaces
grep -r "^class.*Base\|^class.*Interface\|^class.*Abstract" --include="*.py" | head -10

# 2. Find core module
ls -la | grep -E "core/|base/|framework/"

# 3. Find plugin/extension examples
ls -la | grep -E "plugins?/|extensions?/|examples?/"

# 4. Check documentation for architecture
find . -name "*.md" | xargs grep -l -i "architecture\|design\|pattern" | head -5

Files to read (in order):

  1. Architecture documentation → Design philosophy
  2. Base/core classes → Fundamental abstractions
  3. Simple plugin/extension → How to extend
  4. Complex plugin/extension → Advanced patterns

Example:

# For plugin-based framework
Read: docs/architecture.md
Read: core/base.py  # Base classes
Read: plugins/simple-example/  # How to extend
Read: plugins/advanced-example/  # Advanced usage

Strategy D: Example / Template Repository

Goal: Learn different patterns and use cases

Approach:

# 1. List all examples
ls -d examples/*/ samples/*/ templates/*/

# 2. Read index/catalog if available
cat examples/README.md examples/INDEX.md

# 3. Pick representative examples
# - Simple/basic example
# - Medium complexity
# - Advanced/complete example

Files to read (in order):

  1. examples/README.md → Overview of examples
  2. Basic example → Minimal working pattern
  3. Advanced example → Full-featured pattern
  4. Compare differences → Learn progression

STEP 3: Execution Strategy Template

For ANY repository type, use this workflow:

# PHASE 1: Context Discovery (always token-efficient)
ls -la  # Repository structure
cat README.md  # Overview
ls -la .github/ docs/ | head -20  # Find documentation
cat CONTRIBUTING.md 2>/dev/null | head -50  # Submission guidelines

# PHASE 2: Identify Type (ask user if unclear)
"I see this repository has [X structure]. Is this:
A) A tool library where each tool is independent?
B) A monolithic application with integrated components?
C) A framework with core + plugins?
D) A collection of examples/templates?

This helps me choose the best files to learn from."

# PHASE 3: Strategic Reading (based on type)
[Apply appropriate strategy A/B/C/D from above]
Read 2-5 key files fully
Grep for patterns across remaining files

# PHASE 4: Summarize and Confirm
"Based on [files read], I understand:
- Pattern/architecture: [summary]
- Key components: [list]
- Common patterns: [examples]

Is this the area you want to focus on, or should I explore [other aspect]?"

File Selection Priorities (General Rules)

Priority 1: Documentation

README.md, CONTRIBUTING.md, docs/architecture.md
# These explain intent, not just implementation

Priority 2: Entry Points

# Monolithic: main.py, app.py, run.py, __main__.py
# Library: Most recent example in collection

Priority 3: Core Components

# Most imported modules
grep -r "import" | cut -d: -f2 | sort | uniq -c | sort -rn

# "Manager", "Controller", "Orchestrator", "Core", "Base"
find . -name "*manager*" -o -name "*core*" -o -name "*base*"

Priority 4: Representative Examples

# Recent files (current best practices)
ls -lt directory/ | head -5

# Medium complexity (not too simple, not too complex)
wc -l **/*.py | sort -n | awk 'NR > 10 && NR < 20'

Priority 5: Active Development Areas

# Git history (if available)
git log --name-only --since="1 month ago" --pretty=format: | sort | uniq -c | sort -rn

Practical Examples

Example 1: Learning bioconda recipe patterns

# Step 1: Identify type
ls recipes/ | wc -l
# Output: 3000+ → Tool library

# Step 2: Check guidelines
Read: CONTRIBUTING.md  # Learn structure requirements

# Step 3: Find representative recipes
ls -lt recipes/ | head -5  # Get recent ones
# Pick one that was updated recently (current practices)
Read: recipes/recent-tool/meta.yaml

# Pick one established recipe for comparison
Read: recipes/samtools/meta.yaml

# Step 4: Summarize pattern
"I see bioconda recipes follow this structure:
- Jinja2 variables at top
- package/source/build/requirements/test/about sections
- Current practice: use pip install for Python packages
- sha256 checksums required
Should I look at any specific type of recipe (Python/R/compiled)?"

Example 2: Learning VGP pipeline orchestration

# Step 1: Identify type
ls *.py
# Output: run_all.py, orchestrator.py → Monolithic application

# Step 2: Read entry point
Read: run_all.py

# Step 3: Find core components
grep "^from batch_vgp_run import" run_all.py
# Shows: orchestrator, galaxy_client, workflow_manager

# Step 4: Read core orchestrator
Read: batch_vgp_run/orchestrator.py  # Full file to understand flow

# Step 5: Read supporting modules selectively
grep "def run_species_workflows" batch_vgp_run/orchestrator.py -A 5
Read: batch_vgp_run/galaxy_client.py  # Key helper functions

Example 3: Learning Galaxy workflow patterns

# Step 1: Identify type
ls -d */  # Shows category directories
# Output: transcriptomics/, genome-assembly/, etc. → Example collection

# Step 2: Read guidelines
Read: .github/CONTRIBUTING.md

# Step 3: Pick representative workflows
ls -lt transcriptomics/  # Recent workflows
Read: transcriptomics/recent-workflow/workflow.ga
Read: transcriptomics/recent-workflow/README.md

# Step 4: Compare with another category
Read: genome-assembly/example-workflow/workflow.ga

# Step 5: Extract common patterns
grep -r "\"format-version\"" . | head -5
grep -r "\"creator\"" . | head -5

Key Principle for Learning Mode

Balance understanding with efficiency:

  • ✅ Read 2-5 strategic files fully (based on context)
  • ✅ Use grep/head/tail for pattern discovery across many files
  • ✅ Ask user which aspect to focus on after initial exploration
  • ✅ Summarize findings before reading more

Don’t:

  • ❌ Read 20+ files sequentially without strategy
  • ❌ Read files without understanding their role
  • ❌ Ignore repository context and documentation

Quick Reference Card

Model Selection (First Priority):

  • 🎓 Learning/Understanding → Use Opus
  • 🔧 Development/Debugging/Implementation → Use Sonnet (default)

Before ANY file operation, ask yourself:

  1. Can I use bash commands instead? (cp, sed, awk, grep) → 99%+ token savings
  2. Is this a simple text operation? → Use sed/awk, not Read/Edit
  3. Am I copying/merging files? → Use cp/cat, not Read/Write
  4. Can I check metadata first? (file size, line count, modification time)
  5. Can I filter before reading? (grep, head, tail)
  6. Can I read just the structure? (first 50 lines, function names)
  7. Can I summarize instead of showing raw data?
  8. Does the user really need the full content?

Default strategy for file operations:

# FIRST: Try bash commands
cp source.txt dest.txt                    # Instead of Read + Write
sed -i '' 's/old/new/g' file.txt         # Instead of Read + Edit
cat file1.txt file2.txt > combined.txt   # Instead of Read + Read + Write
echo "text" >> file.txt                  # Instead of Read + Write (append)

# ONLY IF NEEDED: Read files
wc -l file.txt                           # Check size first
head -20 file.txt                        # Read sample
grep "pattern" file.txt | head -50       # Filter before reading

# LAST RESORT: Full file read
# Only when you need to understand code structure or complex logic

Cost Impact

Conservative estimate for typical usage:

| Approach | Tokens/Week | Claude Pro | Claude Team | Notes |
|----------|-------------|------------|-------------|-------|
| Wasteful (Read/Edit/Write everything) | 500K | ⚠️ At risk of limits | ✅ OK | Reading files unnecessarily |
| Moderate (filtered reads only) | 200K | ✅ Comfortable | ✅ Very comfortable | Grep/head/tail usage |
| Efficient (bash commands + filters) | 30-50K | ✅ Very comfortable | ✅ Excellent | Using cp/sed/awk instead of Read |

Applying these rules reduces costs by 90-95% on average.

Bash commands optimization alone:

  • File operations: 99%+ token savings (e.g., 50K tokens → 50 tokens)
  • Most impactful single optimization
  • Zero learning curve (standard bash commands)

Implementation

This skill automatically applies these optimizations when:

  • Reading log files
  • Executing commands with large output
  • Navigating codebases
  • Debugging errors
  • Checking system status

You can always override by saying:

  • “Show me the full output”
  • “Read the entire file”
  • “I want verbose mode”
  • “Don’t worry about tokens”

Managing Long-Running Background Processes

Best Practices for Background Tasks

When running scripts that take hours, properly manage background processes to prevent resource leaks and enable clean session transitions:

1. Run in background with Bash tool run_in_background: true

2. Document the process in status files:

## Background Processes
- Script: comprehensive_search.py
- Process ID: Available via BashOutput tool
- Status: Running (~6% complete)
- How to check: BashOutput tool with bash_id

3. Kill cleanly before session end:

# Before ending session:
# 1. Kill all background processes
KillShell(shell_id="abc123")

# 2. Create resume documentation (see claude-collaboration skill)
# 3. Document current progress (files, counts, status)
# 4. Save intermediate results

4. Design scripts to be resumable (see Python Environment Management skill):

  • Check for existing output files (skip if present)
  • Load existing results and append new ones
  • Save progress incrementally (not just at end)
  • Track completion status in structured format
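
A minimal sketch of that resumable pattern in bash (the file names and helper script are hypothetical; the same idea applies inside a Python script):

# Process a list of items, skipping anything that already has a finished output file
mkdir -p results
while read -r species; do
    out="results/${species// /_}.json"
    [ -s "$out" ] && continue                                       # Already done – skip
    ./fetch_data.sh "$species" > "$out.tmp" && mv "$out.tmp" "$out"  # Write atomically
    echo "$species" >> completed.txt                                 # Incremental progress log
done < species_list.txt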

Pre-Interruption Checklist

Before ending a session with running processes:

  1. ✅ Check background process status
  2. ✅ Kill all background processes cleanly
  3. ✅ Create resume documentation (RESUME_HERE.md)
  4. ✅ Document current progress with metrics
  5. ✅ Save intermediate results to disk
  6. ✅ Verify resume commands in documentation

Token Efficiency Benefit

Properly managing background processes:

  • Prevents context pollution – Old process output doesn’t leak into new sessions
  • Enables clean handoff – Resume docs allow fresh session without re-explaining
  • Avoids redundant work – Resumable scripts don’t repeat completed tasks

Repository Organization for Long Projects

Problem

Data enrichment and analysis projects generate many intermediate files, scripts, and logs that clutter the root directory, making it hard to:

  • Find the current working dataset
  • Identify which scripts are actively used
  • Navigate the project structure
  • Maintain focus on important files

Solution: Organize Early and Often

Create dedicated subfolders at project start:

mkdir -p python_scripts/ logs/ tables/

Organization strategy:

  • python_scripts/ – All analysis and processing scripts (16+ scripts in VGP project)
  • logs/ – All execution logs from script runs (38+ logs in VGP project)
  • tables/ – Intermediate results, old versions, and archived data
  • Root directory – Only main working dataset and current outputs

Benefits:

  • Reduces cognitive load when scanning directory
  • Makes git status cleaner and more readable
  • Easier to exclude intermediate files from version control
  • Faster file navigation with autocomplete
  • Professional project structure for collaboration

When to organize:

  • At project start (ideal)
  • After accumulating 5+ scripts or logs (acceptable)
  • Before sharing project with collaborators (essential)

Example cleanup script:

# Move all Python scripts
mkdir -p python_scripts
mv *.py python_scripts/

# Move all logs
mkdir -p logs
mv *.log logs/

# Move intermediate tables (keep main dataset in root)
mkdir -p tables
mv *_intermediate.csv *_backup.csv *_old.csv tables/
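
If the intermediate folders should also stay out of version control, a small follow-up – assuming logs/ and tables/ hold only regenerable files:

# Ignore intermediate outputs; keep scripts under version control
cat >> .gitignore << 'EOF'
logs/
tables/
EOF
git status --short  # Confirm the noise is gone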

Token efficiency impact:

  • Cleaner ls outputs (fewer lines to process)
  • Easier to target specific directories with Glob
  • Reduced cognitive overhead when navigating
  • Faster file location with autocomplete

Summary

Core motto: Right model. Bash over Read. Filter first. Read selectively. Summarize intelligently.

Model selection (highest impact):

  • Use Opus for learning/understanding (one-time investment)
  • Use Sonnet for development/debugging/implementation (default)
  • This alone can save ~50% cost vs using Opus for everything

Primary optimization rule:

  • Use bash commands for file operations (cp, sed, awk, grep) instead of Read/Edit/Write
  • This alone can save 99%+ tokens on file operations

Secondary rules:

  • Filter before reading (grep, head, tail)
  • Read with limits when needed
  • Summarize instead of showing raw output
  • Use quiet modes for commands
  • Strategic file selection for learning

By following these guidelines, users can get 5-10x more value from their Claude subscription while maintaining high-quality assistance.