bio-vcf
npx skills add https://github.com/dakesan/cc-dnawork-plugin --skill bio-vcf
VCF Toolkit
Toolkit for VCF/BCF variant file analysis: calculate statistics, filter variants, and export as JSON. Designed for WGS/WES sequencing result inspection and quality control.
Quick Start
Install
uv pip install pysam typer
Basic Usage
# 1. Get VCF statistics
python scripts/vcf_stats.py --vcf variants.vcf.gz --chrom chr1
# 2. Filter only high-quality variants and write them to a new VCF
python scripts/filter_vcf.py \
--vcf variants.vcf.gz \
--output high_quality.vcf \
--min-qual 30 \
--min-dp 10
# 3. Export the filtered variants as JSON (≤100 entries)
python scripts/inspect_vcf.py \
--vcf high_quality.vcf \
--chrom chr1 \
--output chr1.json
Scripts
inspect_vcf.py – VCF Inspection & JSON Export
Extract variants from a VCF file for a specific chromosome or region and export them as JSON.
Required Arguments
- --vcf PATH – Input VCF file path
- --chrom TEXT or --region TEXT – One of these is required
  - --chrom: Entire chromosome (e.g., chr1)
  - --region: Specific region (e.g., chr1:1000000-2000000)
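Region strings follow the samtools/tabix chrom:start-end convention, with 1-based, inclusive coordinates. The sketch below shows one way such a string could be validated and split; parse_region and to_zero_based_half_open are hypothetical helpers, not part of the toolkit. The conversion helper reflects that pysam's fetch() takes 0-based, half-open start/stop values, so a script built on pysam has to shift the start by one.

```python
import re

# Hypothetical helper: split "chr1:1000000-2000000" into its parts.
# Assumes 1-based, inclusive coordinates; commas in numbers are tolerated.
_REGION_RE = re.compile(r"^([^:\s]+):(\d[\d,]*)-(\d[\d,]*)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Return (chrom, start, end) as 1-based inclusive coordinates."""
    match = _REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"Invalid region {region!r}; expected chrom:start-end")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"Invalid coordinates in {region!r}")
    return chrom, start, end


def to_zero_based_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert 1-based inclusive [start, end] to 0-based half-open [start-1, end)."""
    return start - 1, end


if __name__ == "__main__":
    chrom, start, end = parse_region("chr1:1000000-2000000")
    print(chrom, start, end)                    # chr1 1000000 2000000
    print(to_zero_based_half_open(start, end))  # (999999, 2000000)
```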
Optional Arguments
Output:
--output PATH– JSON output path (default: stdout)
Filter Conditions:
- --min-qual FLOAT – Minimum quality score (QUAL >= X)
- --min-dp INT – Minimum depth (INFO/DP >= X)
- --min-af FLOAT – Minimum allele frequency (INFO/AF >= X)
- --max-af FLOAT – Maximum allele frequency (INFO/AF <= X)
- --pass-only / --all-filters – Keep PASS variants only (default) / Include variants with any FILTER value
Limits:
- --max-variants INT – Maximum number of variants to export (default: 100)
- --force – Ignore the variant limit (allows large JSON output)
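The filter options above can be read as a single predicate applied to each variant. The sketch below is one plausible reading, not the script's confirmed behavior: it works on plain dicts shaped like the JSON records under Output Format, ANDs all conditions, rejects a record that lacks a field a threshold refers to, and (an assumption for multi-allelic sites) accepts an AF bound if at least one ALT allele's AF falls inside it. passes_filters is a hypothetical name.

```python
from typing import Any, Optional


def passes_filters(
    record: dict[str, Any],
    min_qual: Optional[float] = None,
    min_dp: Optional[int] = None,
    min_af: Optional[float] = None,
    max_af: Optional[float] = None,
    pass_only: bool = True,
) -> bool:
    """Hypothetical predicate mirroring --min-qual/--min-dp/--min-af/--max-af/--pass-only.

    Assumptions: conditions are ANDed; missing fields fail a set threshold;
    AF bounds pass if any ALT allele's AF lies within them.
    """
    if pass_only and record.get("filter") != ["PASS"]:
        return False
    qual = record.get("qual")
    if min_qual is not None and (qual is None or qual < min_qual):
        return False
    info = record.get("info", {})
    dp = info.get("DP")
    if min_dp is not None and (dp is None or dp < min_dp):
        return False
    if min_af is not None or max_af is not None:
        afs = info.get("AF")
        if not afs:
            return False
        lo = min_af if min_af is not None else float("-inf")
        hi = max_af if max_af is not None else float("inf")
        if not any(lo <= af <= hi for af in afs):
            return False
    return True
```

With the example record from the Output Format section (QUAL 100, DP 50, AF 0.5), passes_filters(record, min_qual=30, min_dp=10) is True, while passes_filters(record, max_af=0.01) is False.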
Output Format (JSON)
{
"num_variants": 45,
"samples": ["sample1", "sample2"],
"variants": [
{
"chrom": "chr1",
"pos": 12345,
"id": "rs123456",
"ref": "A",
"alts": ["G"],
"qual": 100.0,
"filter": ["PASS"],
"info": {
"DP": 50,
"AF": [0.5],
"AC": [25]
},
"samples": {
"sample1": {"GT": "0/1", "DP": 25, "GQ": 99},
"sample2": {"GT": "0/0", "DP": 25, "GQ": 99}
}
}
]
}
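To show how one tab-separated VCF data line maps onto the record layout above, here is an illustrative stdlib parser. It is not the script's implementation (which presumably relies on pysam): it ignores the header's Type/Number declarations, guesses numeric types from the text, and assumes AF and AC are per-ALT (Number=A) lists, as in the example. vcf_line_to_record and PER_ALT_KEYS are hypothetical names.

```python
import json


def _convert(value: str):
    """Best-effort conversion of a VCF value string to int, float, or str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


PER_ALT_KEYS = {"AF", "AC"}  # assumption: treat these as Number=A (one value per ALT)


def vcf_line_to_record(line: str, samples: list[str]) -> dict:
    """Map one VCF data line onto the JSON record layout shown above."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info_str = fields[:8]
    info = {}
    if info_str != ".":
        for entry in info_str.split(";"):
            key, _, raw = entry.partition("=")
            if not raw:  # INFO flags (e.g. DB) carry no value
                info[key] = True
            elif key in PER_ALT_KEYS:
                info[key] = [_convert(v) for v in raw.split(",")]
            else:
                info[key] = _convert(raw)
    sample_data = {}
    if len(fields) > 9:
        fmt_keys = fields[8].split(":")
        for name, column in zip(samples, fields[9:]):
            values = column.split(":")
            # Keep GT as a string; convert the other FORMAT values.
            sample_data[name] = {
                key: (val if key == "GT" else _convert(val))
                for key, val in zip(fmt_keys, values)
            }
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": None if vid == "." else vid,
        "ref": ref,
        "alts": [] if alt == "." else alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": [] if filt == "." else filt.split(";"),
        "info": info,
        "samples": sample_data,
    }


line = "chr1\t12345\trs123456\tA\tG\t100\tPASS\tDP=50;AF=0.5;AC=25\tGT:DP:GQ\t0/1:25:99\t0/0:25:99"
print(json.dumps(vcf_line_to_record(line, ["sample1", "sample2"]), indent=2))
```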
vcf_stats.py – VCF Statistics
Calculate comprehensive statistics from VCF files and output as JSON. Includes variant counts, quality distributions, depth distributions, and allele frequency statistics.
Arguments
Required:
--vcf PATH– Input VCF file path
Optional:
- --chrom TEXT – Chromosome to analyze (default: all chromosomes)
- --region TEXT – Region to analyze (e.g., chr1:1000-2000)
- --output PATH – JSON output path (default: stdout)
Output Content (JSON)
- total_variants – Total variant count
- filter_counts – Breakdown by FILTER value (PASS, LowQual, etc.)
- variant_types – Breakdown by variant type (SNP, insertion, deletion)
- chrom_counts – Variant count per chromosome
- quality_stats – Quality score statistics (min, max, mean, median)
- depth_stats – Depth statistics (INFO/DP)
- allele_frequency_stats – Allele frequency statistics (INFO/AF)
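A minimal sketch of two of these summaries, assuming records shaped like the inspect_vcf.py JSON. Variant type is classified by comparing REF and ALT lengths, a common simplification; the extra MNP and other labels are this sketch's own, and vcf_stats.py may count per record rather than per ALT allele. Quality statistics use Python's statistics module. All function names are hypothetical.

```python
import statistics
from collections import Counter


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length (simplified, hypothetical rule)."""
    if alt.startswith("<") or alt in {"*", "."}:
        return "other"  # symbolic or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"
    return "insertion" if len(alt) > len(ref) else "deletion"


def variant_type_counts(records: list[dict]) -> Counter:
    """Count allele types, one count per ALT allele."""
    return Counter(classify_allele(r["ref"], alt) for r in records for alt in r["alts"])


def quality_stats(records: list[dict]) -> dict:
    """min/max/mean/median of QUAL, skipping records with missing QUAL."""
    quals = [r["qual"] for r in records if r.get("qual") is not None]
    if not quals:
        return {"min": None, "max": None, "mean": None, "median": None}
    return {
        "min": min(quals),
        "max": max(quals),
        "mean": statistics.mean(quals),
        "median": statistics.median(quals),
    }


records = [
    {"ref": "A", "alts": ["G"], "qual": 100.0},
    {"ref": "AT", "alts": ["A"], "qual": 40.0},
    {"ref": "C", "alts": ["CTT", "T"], "qual": 70.0},
]
print(variant_type_counts(records))  # Counter({'SNP': 2, 'deletion': 1, 'insertion': 1})
print(quality_stats(records))        # {'min': 40.0, 'max': 100.0, 'mean': 70.0, 'median': 70.0}
```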
Usage Examples
# Calculate statistics for chr1
python scripts/vcf_stats.py --vcf variants.vcf.gz --chrom chr1
# Calculate statistics for all chromosomes (output to JSON file)
python scripts/vcf_stats.py --vcf variants.vcf.gz --output stats.json
# Calculate statistics for specific region
python scripts/vcf_stats.py --vcf variants.vcf.gz --region chr1:10000-20000
filter_vcf.py – VCF Filtering
Filter VCF files by quality, depth, and allele frequency criteria. Output filtered variants as a new VCF file.
Arguments
Required:
- --vcf PATH – Input VCF file path
- --output PATH – Output VCF file path
Optional:
- --chrom TEXT – Chromosome to restrict to
- --region TEXT – Region to restrict to (e.g., chr1:1000-2000)
- --min-qual FLOAT – Minimum quality score (QUAL)
- --min-dp INT – Minimum depth (INFO/DP)
- --min-af FLOAT – Minimum allele frequency (INFO/AF)
- --max-af FLOAT – Maximum allele frequency (INFO/AF)
- --pass-only – Keep PASS variants only (default: False)
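Writing a filtered VCF mostly comes down to copying the header unchanged and emitting only the data lines that pass. The stdlib sketch below shows that pattern on uncompressed text for QUAL, INFO/DP, and FILTER only; filter_vcf_text is a hypothetical name, and the real script presumably uses pysam, which also handles bgzip/BCF input and typed header definitions. Unlike inspect_vcf.py, filter_vcf.py documents --pass-only as defaulting to False, so the sketch defaults the same way.

```python
import io
from typing import Iterable, Optional, TextIO


def filter_vcf_text(
    lines: Iterable[str],
    out: TextIO,
    min_qual: Optional[float] = None,
    min_dp: Optional[int] = None,
    pass_only: bool = False,  # matches filter_vcf.py's documented default
) -> int:
    """Copy '#' header lines verbatim; write data lines that pass. Returns kept count."""
    kept = 0
    for line in lines:
        if line.startswith("#"):
            out.write(line)
            continue
        cols = line.rstrip("\n").split("\t")
        qual, filt, info = cols[5], cols[6], cols[7]
        if pass_only and filt != "PASS":
            continue
        if min_qual is not None and (qual == "." or float(qual) < min_qual):
            continue
        if min_dp is not None:
            # Look up INFO/DP; a missing DP fails the threshold (assumption).
            dp = next((kv[3:] for kv in info.split(";") if kv.startswith("DP=")), None)
            if dp is None or int(dp) < min_dp:
                continue
        out.write(line)
        kept += 1
    return kept


vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=20\n"
    "chr1\t200\t.\tC\tT\t12\tLowQual\tDP=4\n"
)
buf = io.StringIO()
print(filter_vcf_text(io.StringIO(vcf), buf, min_qual=30, min_dp=10))  # 1
```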
Usage Examples
# Extract chr1 PASS variants only
python scripts/filter_vcf.py \
--vcf variants.vcf.gz \
--output chr1_pass.vcf \
--chrom chr1 \
--pass-only
# Extract high-quality variants (QUAL >= 30, DP >= 10)
python scripts/filter_vcf.py \
--vcf variants.vcf.gz \
--output high_quality.vcf \
--min-qual 30 \
--min-dp 10
# Extract rare variants (AF <= 0.01)
python scripts/filter_vcf.py \
--vcf variants.vcf.gz \
--output rare_variants.vcf \
--max-af 0.01
Workflow Examples
Example 1: Comprehensive Variant Analysis Workflow
Combine all three scripts for complete VCF analysis:
# Step 1: Calculate statistics for chr1
python scripts/vcf_stats.py --vcf variants.vcf.gz --chrom chr1 --output stats.json
# Step 2: Filter high-quality variants to new VCF
python scripts/filter_vcf.py \
--vcf variants.vcf.gz \
--output high_quality.vcf \
--chrom chr1 \
--min-qual 30 \
--min-dp 10 \
--pass-only
# Step 3: Export filtered variants as JSON for downstream analysis
python scripts/inspect_vcf.py \
--vcf high_quality.vcf \
--chrom chr1 \
--output chr1_filtered.json
Example 2: Rare Variant Discovery
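Identify and export rare variants from a specific region (here, the BRCA1 locus in GRCh37/hg19 coordinates). Note that --max-af compares INFO/AF, which in most callers' output is the allele frequency among the samples in this VCF, not a population frequency: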
# Filter rare variants (AF <= 0.01)
python scripts/filter_vcf.py \
--vcf variants.vcf.gz \
--output rare.vcf \
--region chr17:41196312-41277500 \
--max-af 0.01
# Export as JSON for analysis
python scripts/inspect_vcf.py \
--vcf rare.vcf \
--region chr17:41196312-41277500 \
--output brca1_rare.json
Error Handling
Variant Count Exceeds Limit
$ python scripts/inspect_vcf.py --vcf huge.vcf --chrom chr1 --output out.json
Error: VCF contains 1,234+ variants after filtering (limit: 100).
Suggestions:
- Apply more restrictive filters: --min-qual, --min-dp, --pass-only
- Specify a genomic region: --region chr1:1000-2000
- Override limit with --force (warning: may produce very large JSON)
- Use bcftools directly for large-scale processing
Current filter conditions:
--chrom chr1 --pass-only
Solutions:
- Apply more restrictive filters: --min-qual 30, --min-dp 10
- Narrow down the region: --region chr1:1000000-1100000
- Override the limit with --force (use cautiously)
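A limit check like this can stop reading as soon as the cap is exceeded instead of counting every variant first, which keeps the failure fast on very large files. The sketch below shows that pattern for any iterable of records; collect_with_limit and VariantLimitError are hypothetical names, and whether inspect_vcf.py stops early in this way is not stated above.

```python
from itertools import islice
from typing import Iterable, TypeVar

T = TypeVar("T")


class VariantLimitError(Exception):
    """Raised when more records than allowed are found (hypothetical)."""


def collect_with_limit(records: Iterable[T], max_variants: int = 100, force: bool = False) -> list[T]:
    """Return all records, or raise once more than max_variants are seen (unless force)."""
    if force:
        return list(records)
    # Read at most one record past the limit; that is enough to know it was exceeded.
    head = list(islice(records, max_variants + 1))
    if len(head) > max_variants:
        raise VariantLimitError(
            f"more than {max_variants} variants after filtering; "
            "narrow with --region or stricter filters, or pass --force"
        )
    return head


print(len(collect_with_limit(iter(range(100)))))  # 100
```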
Missing Chromosome/Region Specification
$ python scripts/inspect_vcf.py --vcf variants.vcf --output out.json
Error: Either --chrom or --region must be specified.
Solutions:
- Add --chrom chr1 or --region chr1:1000-2000 to the command
Best Practices
1. Always Specify Chromosome or Region
inspect_vcf.py requires --chrom or --region (see Error Handling above). Restricting the query to one chromosome or region also avoids scanning the entire VCF file.
# ❌ Bad: No chromosome or region specified
python scripts/inspect_vcf.py --vcf variants.vcf
# ✅ Good: Chromosome specified
python scripts/inspect_vcf.py --vcf variants.vcf --chrom chr1
2. Apply Additional Filters for Efficiency
Combine quality and depth thresholds with the default PASS-only filter to keep the exported set small and limited to well-supported calls.
# ✅ Good: Multiple filters applied
python scripts/inspect_vcf.py \
--vcf variants.vcf \
--chrom chr1 \
--min-qual 30 \
--min-dp 10
3. Respect 100-Entry Limit for JSON Export
Use inspect_vcf.py for small datasets only. Pre-filter large VCF files with filter_vcf.py or bcftools before JSON export.
# Pre-filter large datasets with bcftools (-r requires a bgzip-compressed, indexed VCF)
bcftools view -i 'QUAL>=30 && INFO/DP>=10' -r chr1:1000000-2000000 variants.vcf.gz > filtered.vcf
# Then export to JSON
python scripts/inspect_vcf.py --vcf filtered.vcf --chrom chr1 --output filtered.json
4. Use --force Cautiously
Use --force only when necessary: JSON files with thousands of entries can grow to several MB or even tens of MB.
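One way to sanity-check that size before passing --force is to serialize one representative record and multiply. The sketch below uses the example record from the Output Format section; real records grow with the number of samples and INFO fields, and the indent=2 default is an assumption about how the script formats its output, so treat the result as a rough estimate. estimate_json_bytes is a hypothetical helper.

```python
import json
from typing import Optional

# Example record copied from the Output Format section above.
record = {
    "chrom": "chr1", "pos": 12345, "id": "rs123456", "ref": "A", "alts": ["G"],
    "qual": 100.0, "filter": ["PASS"],
    "info": {"DP": 50, "AF": [0.5], "AC": [25]},
    "samples": {
        "sample1": {"GT": "0/1", "DP": 25, "GQ": 99},
        "sample2": {"GT": "0/0", "DP": 25, "GQ": 99},
    },
}


def estimate_json_bytes(example: dict, n_records: int, indent: Optional[int] = 2) -> int:
    """Rough output size: bytes of one serialized record times the record count."""
    per_record = len(json.dumps(example, indent=indent).encode("utf-8"))
    return per_record * n_records


for n in (100, 10_000, 100_000):
    print(n, "records ~", round(estimate_json_bytes(record, n) / 1e6, 1), "MB")
```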
When to Use vcf-toolkit vs bcftools
| Task | vcf-toolkit | bcftools |
|---|---|---|
| Small dataset JSON export | ✅ inspect_vcf.py | – |
| Large-scale filtering | filter_vcf.py | ✅ bcftools view |
| Complex filter expressions | – | ✅ bcftools |
| VCF-to-VCF conversion | filter_vcf.py | ✅ bcftools |
| Variant statistics | ✅ vcf_stats.py | ✅ bcftools stats |
Recommended Workflow:
- Pre-filter large datasets with bcftools or filter_vcf.py
- Export filtered results to JSON with inspect_vcf.py for detailed inspection
- Perform downstream analysis in Python/R using JSON output
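As an example of that last step, the sketch below counts genotype classes per sample in a report produced by inspect_vcf.py. It relies only on the JSON layout shown under Output Format; the inline report is a trimmed copy of that example, standing in for a real file such as chr1_filtered.json from Example 1. The genotype labels assume diploid GT strings, and all helper names are hypothetical.

```python
import json
from collections import Counter, defaultdict


def genotype_class(gt: str) -> str:
    """Label a diploid GT string such as '0/1' or '1|1' (simplified)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if all(a == "0" for a in alleles):
        return "hom_ref"
    if len(set(alleles)) == 1:
        return "hom_alt"
    return "het"


def genotype_counts(report: dict) -> dict:
    """Count genotype classes per sample across all exported variants."""
    counts = defaultdict(Counter)
    for variant in report["variants"]:
        for sample, fmt in variant["samples"].items():
            counts[sample][genotype_class(fmt.get("GT", "./."))] += 1
    return dict(counts)


def load_report(path: str) -> dict:
    """Read a JSON file written by inspect_vcf.py."""
    with open(path) as fh:
        return json.load(fh)


# Trimmed stand-in for load_report("chr1_filtered.json"), based on the example record.
report = {
    "num_variants": 1,
    "samples": ["sample1", "sample2"],
    "variants": [
        {"chrom": "chr1", "pos": 12345, "samples": {
            "sample1": {"GT": "0/1", "DP": 25, "GQ": 99},
            "sample2": {"GT": "0/0", "DP": 25, "GQ": 99},
        }},
    ],
}
print(genotype_counts(report))
```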
Related Skills
- pysam – BAM/CRAM alignment file operations
- sequence-io – FASTA/FASTQ sequence file operations
- blast-search – BLAST homology search
- blat-api-searching – BLAT genome mapping
Troubleshooting
VCF File Too Large
Specify a narrower region or pre-filter with bcftools before JSON export.
# Specify narrower region
python scripts/inspect_vcf.py --vcf variants.vcf --region chr1:1000000-1100000
# Pre-filter with bcftools
bcftools view -i 'QUAL>=50' variants.vcf | python scripts/inspect_vcf.py --vcf - --chrom chr1
Index Error
Create a tabix index for the bgzip-compressed VCF file.
# Compress with bgzip
bgzip variants.vcf
# Create tabix index
tabix -p vcf variants.vcf.gz
# Use indexed VCF
python scripts/inspect_vcf.py --vcf variants.vcf.gz --chrom chr1
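Before running a region query, it can help to confirm that the file really is BGZF (plain gzip output cannot be indexed by tabix) and that an index sits next to it. tabix -p vcf writes variants.vcf.gz.tbi, and bcftools index writes a .csi file by default. The stdlib sketch below checks the BGZF block header (gzip magic with the FEXTRA flag and a "BC" subfield at byte offset 12) and looks only for those two conventional index names; is_bgzf and find_index are hypothetical helpers that do not validate index contents.

```python
from pathlib import Path
from typing import Optional

BGZF_MAGIC = b"\x1f\x8b\x08\x04"  # gzip magic, deflate method, FEXTRA flag set


def is_bgzf(path: Path) -> bool:
    """True if the file starts with a BGZF block header ('BC' extra subfield)."""
    with open(path, "rb") as fh:
        head = fh.read(14)
    return len(head) == 14 and head[:4] == BGZF_MAGIC and head[12:14] == b"BC"


def find_index(vcf_path: Path) -> Optional[Path]:
    """Return the .tbi or .csi file next to vcf_path, if either exists."""
    for suffix in (".tbi", ".csi"):
        candidate = vcf_path.with_name(vcf_path.name + suffix)
        if candidate.exists():
            return candidate
    return None
```

For example, find_index(Path("variants.vcf.gz")) returns None until tabix -p vcf variants.vcf.gz has been run.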
Include Non-PASS Variants
Use the --all-filters flag to include all variants regardless of the FILTER field.
python scripts/inspect_vcf.py --vcf variants.vcf --chrom chr1 --all-filters