bio-read-qc-quality-reports
3
总安装量
3
周安装量
#60744
全站排名
安装命令
npx skills add https://github.com/gptomics/bioskills --skill bio-read-qc-quality-reports
Agent 安装分布
trae
2
windsurf
1
opencode
1
codex
1
claude-code
1
antigravity
1
Skill 文档
Quality Reports
Generate quality reports for FASTQ files using FastQC and aggregate multiple reports with MultiQC.
FastQC – Single Sample Reports
Basic Usage
# Single file
fastqc sample.fastq.gz
# Multiple files
fastqc *.fastq.gz
# Specify output directory
fastqc -o qc_reports/ sample_R1.fastq.gz sample_R2.fastq.gz
# Set threads
fastqc -t 4 *.fastq.gz
Output Files
FastQC produces two files per input:
sample_fastqc.html– Interactive HTML reportsample_fastqc.zip– Data files and images
Key Modules
| Module | What It Shows | Warning Signs |
|---|---|---|
| Per base sequence quality | Quality scores across read | Drop below Q20 at 3′ end |
| Per sequence quality | Quality score distribution | Bimodal distribution |
| Per base sequence content | Nucleotide composition | Imbalance at start (normal) |
| Per sequence GC content | GC distribution | Secondary peak (contamination) |
| Per base N content | Unknown bases | High N content |
| Sequence length distribution | Read lengths | Unexpected variation |
| Sequence duplication | Duplicate reads | High duplication (PCR) |
| Overrepresented sequences | Common sequences | Adapter contamination |
| Adapter content | Adapter sequences | Visible adapter curves |
Extract Data from ZIP
# Unzip to access raw data
unzip sample_fastqc.zip
# View summary
cat sample_fastqc/summary.txt
# Get per-base quality
cat sample_fastqc/fastqc_data.txt | grep -A 50 ">>Per base sequence quality"
MultiQC – Aggregate Reports
Basic Usage
# Aggregate all FastQC reports in current directory
multiqc .
# Specify input and output
multiqc qc_reports/ -o multiqc_output/
# Custom report name
multiqc . -n my_project_qc
# Force overwrite
multiqc . -f
Common Options
# Flat directory (no sample subdirs)
multiqc --flat .
# Export data as TSV
multiqc . --export
# Only specific modules
multiqc . -m fastqc
# Exclude patterns
multiqc . --ignore '*_trimmed*'
# Include patterns
multiqc . --ignore-samples '*negative*'
Output Files
multiqc_report.html– Interactive HTML reportmultiqc_data/– Directory with data tablesmultiqc_fastqc.txt– FastQC metricsmultiqc_general_stats.txt– Summary statisticsmultiqc_sources.txt– Source files used
Extract Data Programmatically
import pandas as pd
general_stats = pd.read_csv('multiqc_data/multiqc_general_stats.txt', sep='\t')
print(general_stats.columns)
fastqc_data = pd.read_csv('multiqc_data/multiqc_fastqc.txt', sep='\t')
Batch Processing
Process Multiple Samples
# All FASTQ files in parallel
fastqc -t 8 -o qc_reports/ raw_data/*.fastq.gz
# Then aggregate
multiqc qc_reports/ -o multiqc_output/
Before and After Trimming
# Create separate directories
mkdir -p qc_reports/raw qc_reports/trimmed
# QC raw reads
fastqc -o qc_reports/raw/ raw_data/*.fastq.gz
# After trimming (using fastp, cutadapt, etc.)
fastqc -o qc_reports/trimmed/ trimmed_data/*.fastq.gz
# Compare with MultiQC
multiqc qc_reports/ -o qc_comparison/
Interpretation Guide
Quality Scores
| Phred Score | Error Rate | Interpretation |
|---|---|---|
| Q40 | 0.0001 | Excellent |
| Q30 | 0.001 | Good (Illumina target) |
| Q20 | 0.01 | Acceptable |
| Q10 | 0.1 | Poor |
Common Issues
| Issue | Likely Cause | Action |
|---|---|---|
| Low quality at 3′ end | Normal degradation | Trim 3′ end |
| Adapter contamination | Short inserts | Trim adapters |
| GC bias | Library prep | Consider correction |
| High duplication | Low complexity, PCR | Mark/remove duplicates |
| Overrepresented seqs | Adapters, primers | Check sequences |
Configuration
Custom Adapters
Create ~/.fastqc/Configuration/adapter_list.txt:
Custom_Adapter_Name ACGTACGTACGT
Custom Limits
Create ~/.fastqc/Configuration/limits.txt to customize thresholds:
# Warn if mean quality below 25
quality_sequence warn 25
quality_sequence error 20
Related Skills
- adapter-trimming – Remove adapters detected by FastQC
- fastp-workflow – All-in-one QC and trimming
- sequence-io – FASTQ file reading/writing