bio-copy-number-gatk-cnv
Install command
npx skills add https://github.com/gptomics/bioskills --skill bio-copy-number-gatk-cnv
GATK CNV Workflow
Somatic CNV Workflow Overview
1. PreprocessIntervals → intervals.interval_list
2. CollectReadCounts → sample.counts.hdf5
3. CreateReadCountPanelOfNormals → pon.hdf5
4. DenoiseReadCounts → sample.denoised.tsv
5. CollectAllelicCounts → sample.allelicCounts.tsv
6. ModelSegments → sample.modelFinal.seg
7. CallCopyRatioSegments → sample.called.seg
Step 1: Preprocess Intervals
# For WES/targeted
gatk PreprocessIntervals \
-R reference.fa \
-L targets.interval_list \
--bin-length 0 \
--interval-merging-rule OVERLAPPING_ONLY \
-O preprocessed.interval_list
# For WGS
gatk PreprocessIntervals \
-R reference.fa \
--bin-length 1000 \
--padding 0 \
-O wgs.interval_list
Step 2: Collect Read Counts
# For each sample
gatk CollectReadCounts \
-R reference.fa \
-I sample.bam \
-L preprocessed.interval_list \
--interval-merging-rule OVERLAPPING_ONLY \
-O sample.counts.hdf5
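When Step 2 is run over many BAMs, output names are usually derived from the input path. The following is a minimal sketch, assuming one sample per BAM; the helper name `counts_name` and the `bams/` and `counts/` directories are hypothetical. The loop is left as comments because the file layout is an assumption.

```shell
# Hypothetical helper: map /path/to/sample.bam -> sample.counts.hdf5
counts_name() {
    # basename strips the directory and the .bam suffix
    printf '%s.counts.hdf5\n' "$(basename "$1" .bam)"
}

# Example loop (assumes BAMs live in ./bams and preprocessed.interval_list exists):
# for bam in bams/*.bam; do
#     gatk CollectReadCounts \
#         -R reference.fa \
#         -I "$bam" \
#         -L preprocessed.interval_list \
#         --interval-merging-rule OVERLAPPING_ONLY \
#         -O "counts/$(counts_name "$bam")"
# done
```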
Step 3: Create Panel of Normals
# Combine multiple normal samples
gatk CreateReadCountPanelOfNormals \
-I normal1.counts.hdf5 \
-I normal2.counts.hdf5 \
-I normal3.counts.hdf5 \
--minimum-interval-median-percentile 5.0 \
-O cnv_pon.hdf5
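With more than a handful of normals, typing each `-I` by hand gets error-prone. The sketch below assembles the command from every `*.counts.hdf5` file in a directory and prints it instead of running it, so it can be reviewed first. The helper name `pon_command` is hypothetical, and the sketch assumes paths without spaces.

```shell
# Hypothetical helper: print a CreateReadCountPanelOfNormals command that takes
# every *.counts.hdf5 file in a directory as input (dry run; nothing is executed).
pon_command() (
    dir=$1
    out=$2
    printf 'gatk CreateReadCountPanelOfNormals'
    for f in "$dir"/*.counts.hdf5; do
        # Skip the unexpanded glob when the directory has no matching files
        [ -e "$f" ] || continue
        printf ' -I %s' "$f"
    done
    printf ' --minimum-interval-median-percentile 5.0 -O %s\n' "$out"
)
```

For example, `pon_command normals/ cnv_pon.hdf5` prints a single command line that can be inspected and then pasted into the terminal.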
Step 4: Denoise Read Counts
# Using panel of normals
gatk DenoiseReadCounts \
-I tumor.counts.hdf5 \
--count-panel-of-normals cnv_pon.hdf5 \
--standardized-copy-ratios tumor.standardized.tsv \
--denoised-copy-ratios tumor.denoised.tsv
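A quick look at the denoised output can catch gross problems before segmentation. The sketch below prints the mean log2 copy ratio per contig, assuming the denoised TSV has `@`-prefixed header lines followed by the columns CONTIG, START, END, LOG2_COPY_RATIO. The helper name `mean_log2_by_contig` is hypothetical.

```shell
# Hypothetical helper: mean LOG2_COPY_RATIO per contig, in file order.
mean_log2_by_contig() {
    awk -F '\t' '
        /^@/ || $1 == "CONTIG" { next }      # skip header lines
        {
            if (!($1 in n)) order[++k] = $1   # remember first-seen order
            sum[$1] += $4
            n[$1]++
        }
        END {
            for (i = 1; i <= k; i++)
                printf "%s\t%.3f\n", order[i], sum[order[i]] / n[order[i]]
        }
    ' "$1"
}
```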
Step 5: Collect Allelic Counts
# From known SNP sites (for LOH detection)
gatk CollectAllelicCounts \
-R reference.fa \
-I tumor.bam \
-L common_snps.vcf \
-O tumor.allelicCounts.tsv
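ModelSegments needs enough well-covered heterozygous sites to model allelic imbalance, so it can help to check the allelic counts first. The sketch below is a rough sanity check only: it assumes the columns CONTIG, POSITION, REF_COUNT, ALT_COUNT after `@`-prefixed header lines, and the 0.3 to 0.7 window and default depth of 20 are arbitrary illustrative cutoffs, not GATK's genotyping logic. The helper name `het_like_sites` is hypothetical.

```shell
# Hypothetical helper: count sites with depth >= min_depth, and how many of
# those have an alt-allele fraction between 0.3 and 0.7 (arbitrary cutoffs).
het_like_sites() {
    awk -F '\t' -v min_depth="${2:-20}" '
        /^@/ { next }                      # skip SAM-style header lines
        $1 == "CONTIG" { next }            # skip the column header
        {
            depth = $3 + $4                # REF_COUNT + ALT_COUNT
            if (depth == 0 || depth < min_depth) next
            covered++
            frac = $4 / depth
            if (frac >= 0.3 && frac <= 0.7) het++
        }
        END { printf "covered=%d het_like=%d\n", covered, het }
    ' "$1"
}
```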
Step 6: Model Segments
# Somatic with matched normal allelic counts
gatk ModelSegments \
--denoised-copy-ratios tumor.denoised.tsv \
--allelic-counts tumor.allelicCounts.tsv \
--normal-allelic-counts normal.allelicCounts.tsv \
--output-prefix tumor \
-O results/
# Output files: tumor.cr.seg, tumor.modelFinal.seg, tumor.hets.tsv
Step 7: Call Copy Ratio Segments
gatk CallCopyRatioSegments \
-I results/tumor.cr.seg \
-O results/tumor.called.seg
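The called segments file can be filtered down to the altered regions with a few lines of awk. This sketch assumes the file has `@`-prefixed header lines, then a column header containing `MEAN_LOG2_COPY_RATIO` and `CALL`, with calls written as `+`, `-`, or `0`. The helper name `altered_segments` is hypothetical.

```shell
# Hypothetical helper: print amplified (+) and deleted (-) segments as
# contig, start, end, mean log2 copy ratio, call.
altered_segments() {
    awk -F '\t' -v OFS='\t' '
        /^@/ { next }                          # SAM-style header lines
        !call_col {                            # first non-@ line is the column header
            for (i = 1; i <= NF; i++) {
                if ($i == "CALL") call_col = i
                if ($i == "MEAN_LOG2_COPY_RATIO") ratio_col = i
            }
            next
        }
        $call_col == "+" || $call_col == "-" {
            print $1, $2, $3, $ratio_col, $call_col
        }
    ' "$1"
}
```

Columns are looked up by name rather than position, so the filter keeps working if the column order differs from the one assumed here.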
Plotting
# Plot copy ratios and segments
gatk PlotDenoisedCopyRatios \
--standardized-copy-ratios tumor.standardized.tsv \
--denoised-copy-ratios tumor.denoised.tsv \
--sequence-dictionary reference.dict \
--minimum-contig-length 46709983 \
--output-prefix tumor \
-O plots/
# Plot segments with allelic information
gatk PlotModeledSegments \
--denoised-copy-ratios tumor.denoised.tsv \
--allelic-counts results/tumor.hets.tsv \
--segments results/tumor.modelFinal.seg \
--sequence-dictionary reference.dict \
--minimum-contig-length 46709983 \
--output-prefix tumor \
-O plots/
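The value 46709983 is the length of chr21 in GRCh38, so the plots keep the primary chromosomes and drop shorter contigs. For another reference build, a matching threshold can be read from the sequence dictionary. The sketch below assumes primary contigs are named 1-22, X, Y with or without a `chr` prefix; the helper name `min_primary_contig_length` is hypothetical.

```shell
# Hypothetical helper: length of the shortest primary contig in a .dict file.
min_primary_contig_length() {
    awk -F '\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len = substr($i, 4) + 0
            }
            # Keep only 1-22, X, Y, with or without a "chr" prefix
            if (name ~ /^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y)$/ && (min == "" || len < min))
                min = len
        }
        END { if (min != "") print min }
    ' "$1"
}
```

The result can be passed directly, for example `--minimum-contig-length "$(min_primary_contig_length reference.dict)"`.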
Germline CNV Workflow
# For germline: use cohort mode
# 1. Collect counts (same as above)
# 2. Determine contig ploidy
# Cohort mode takes intervals and ploidy priors; --model is only used in case mode
gatk DetermineGermlineContigPloidy \
-L preprocessed.interval_list \
--interval-merging-rule OVERLAPPING_ONLY \
-I sample1.counts.hdf5 \
-I sample2.counts.hdf5 \
--contig-ploidy-priors ploidy_priors.tsv \
--output-prefix ploidy \
-O ploidy-calls/
# Writes ploidy-calls/ploidy-model and ploidy-calls/ploidy-calls
# 3. Call germline CNVs
gatk GermlineCNVCaller \
--run-mode COHORT \
-L preprocessed.interval_list \
--interval-merging-rule OVERLAPPING_ONLY \
-I sample1.counts.hdf5 \
-I sample2.counts.hdf5 \
--contig-ploidy-calls ploidy-calls/ploidy-calls \
--annotated-intervals annotated_intervals.tsv \
--output-prefix cohort \
-O germline_cnv_calls/
# 4. Post-process calls per sample (interval and segment outputs are VCFs)
gatk PostprocessGermlineCNVCalls \
--calls-shard-path germline_cnv_calls/cohort-calls \
--model-shard-path germline_cnv_calls/cohort-model \
--sample-index 0 \
--contig-ploidy-calls ploidy-calls/ploidy-calls \
--sequence-dictionary reference.dict \
--output-genotyped-intervals sample1.genotyped_intervals.vcf \
--output-genotyped-segments sample1_segments.vcf \
--output-denoised-copy-ratios sample1.denoised.tsv
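Post-processing runs once per sample, and `--sample-index` has to match the sample's position in the cohort calls. The sketch below lists index and sample name pairs, assuming the calls directory contains `SAMPLE_<n>` subdirectories that each hold a `sample_name.txt` file. That layout is an assumption to verify against your own output, and the helper name `list_sample_indices` is hypothetical.

```shell
# Hypothetical helper: print "<index><TAB><sample name>" for each SAMPLE_<n>
# directory under a calls directory, sorted numerically by index.
list_sample_indices() (
    for d in "$1"/SAMPLE_*; do
        [ -d "$d" ] || continue
        idx=${d##*SAMPLE_}                 # text after the last "SAMPLE_"
        name=unknown
        if [ -f "$d/sample_name.txt" ]; then
            name=$(cat "$d/sample_name.txt")
        fi
        printf '%s\t%s\n' "$idx" "$name"
    done | sort -n
)
```

The output can drive a loop that calls PostprocessGermlineCNVCalls once per line, using the first column as `--sample-index` and the second to name the output files.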
Complete Somatic Pipeline Script
#!/bin/bash
# Usage: <script> tumor.bam normal.bam outdir
# Stop on the first failed command, unset variable, or failed pipe
set -euo pipefail
REFERENCE=reference.fa
# Must be the same preprocessed intervals used to build the panel of normals
INTERVALS=preprocessed.interval_list
PON=cnv_pon.hdf5
SNP_SITES=common_snps.vcf
TUMOR=$1
NORMAL=$2
OUTDIR=$3
mkdir -p "$OUTDIR"
# Collect read counts
gatk CollectReadCounts -R "$REFERENCE" -I "$TUMOR" -L "$INTERVALS" \
--interval-merging-rule OVERLAPPING_ONLY \
-O "$OUTDIR/tumor.counts.hdf5"
gatk CollectReadCounts -R "$REFERENCE" -I "$NORMAL" -L "$INTERVALS" \
--interval-merging-rule OVERLAPPING_ONLY \
-O "$OUTDIR/normal.counts.hdf5"
# Denoise tumor counts against the panel of normals
gatk DenoiseReadCounts -I "$OUTDIR/tumor.counts.hdf5" \
--count-panel-of-normals "$PON" \
--standardized-copy-ratios "$OUTDIR/tumor.standardized.tsv" \
--denoised-copy-ratios "$OUTDIR/tumor.denoised.tsv"
# Allelic counts at known SNP sites, tumor and matched normal
gatk CollectAllelicCounts -R "$REFERENCE" -I "$TUMOR" -L "$SNP_SITES" \
-O "$OUTDIR/tumor.allelicCounts.tsv"
gatk CollectAllelicCounts -R "$REFERENCE" -I "$NORMAL" -L "$SNP_SITES" \
-O "$OUTDIR/normal.allelicCounts.tsv"
# Model and call
gatk ModelSegments \
--denoised-copy-ratios "$OUTDIR/tumor.denoised.tsv" \
--allelic-counts "$OUTDIR/tumor.allelicCounts.tsv" \
--normal-allelic-counts "$OUTDIR/normal.allelicCounts.tsv" \
--output-prefix tumor -O "$OUTDIR/"
gatk CallCopyRatioSegments -I "$OUTDIR/tumor.cr.seg" -O "$OUTDIR/tumor.called.seg"
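Because each GATK step can run for a long time, it is worth checking the inputs before the first command starts. A minimal preflight sketch follows; the helper name `require_files` is hypothetical, and it only checks that each file exists and is non-empty, not that it is valid.

```shell
# Hypothetical helper: return non-zero if any listed file is missing or empty,
# reporting every problem file on stderr.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example use near the top of the pipeline script:
# require_files "$REFERENCE" "$INTERVALS" "$PON" "$SNP_SITES" "$TUMOR" "$NORMAL" || exit 1
```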
Key Output Files
| File | Description |
|---|---|
| .counts.hdf5 | Raw read counts per interval |
| .denoised.tsv | Denoised log2 copy ratios |
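| .modelFinal.seg | Final segments with posterior estimates of log2 copy ratio and minor-allele fraction |
| .called.seg | Copy-ratio segments called as amplified (+), deleted (-), or neutral (0) |
| .hets.tsv | Allelic counts at sites genotyped as heterozygous |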
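The log2 copy ratios in these files can be turned into an approximate copy number with CN ≈ 2 · 2^ratio. This is a simplification: it assumes a diploid baseline and a pure tumor sample, and ignores purity and ploidy corrections. The helper name `log2_to_copy_number` is hypothetical.

```shell
# Hypothetical helper: approximate copy number from a log2 copy ratio,
# assuming a diploid baseline and 100% tumor purity.
log2_to_copy_number() {
    # 2^r is computed as exp(r * log(2)) for portability across awk versions
    awk -v r="$1" 'BEGIN { printf "%.2f\n", 2 * exp(r * log(2)) }'
}
```

Under these assumptions a log2 ratio of 0 maps to two copies, 1 to four copies, and -1 to one copy.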
Related Skills
- copy-number/cnvkit-analysis – Alternative CNV caller
- copy-number/cnv-visualization – Plotting results
- alignment-files/bam-statistics – Input BAM QC
- variant-calling/variant-calling – SNP calling for allelic counts