genomics-pipeline

Step 3: Variant Calling (BAM to VCF)

What This Does

Identifies all positions where the sample’s DNA differs from the reference genome: SNPs (single-nucleotide changes) and small indels (insertions/deletions shorter than 50 bp).
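As a rough illustration of the SNP/indel distinction, the sketch below classifies a single REF/ALT allele pair by length. `classify_variant` is a hypothetical helper, not part of DeepVariant or this pipeline, and it assumes a biallelic record with plain base strings (no `.`, `*`, or symbolic alleles).

```shell
# Hypothetical helper: classify one biallelic REF/ALT pair by allele length.
# Assumes plain base strings; symbolic or missing alleles are not handled.
classify_variant() {
  ref=$1
  alt=$2
  if [ "${#ref}" -eq 1 ] && [ "${#alt}" -eq 1 ]; then
    echo snp      # one base replaced by another
  elif [ "${#ref}" -ne "${#alt}" ]; then
    echo indel    # bases inserted or deleted
  else
    echo other    # e.g. a multi-nucleotide substitution
  fi
}

classify_variant A G      # single-base substitution
classify_variant AT A     # deletion of one base
classify_variant A ATTG   # insertion of three bases
```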

Why

The VCF file is the foundation for ALL downstream analyses: ClinVar screening, pharmacogenomics, polygenic risk scores (PRS), runs of homozygosity (ROH), etc.

Tool

Docker Image

google/deepvariant:1.6.0

Prerequisites

Command

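SAMPLE=your_sample
GENOME_DIR=/path/to/your/data

# Create the output directory on the host in case it does not exist yet
mkdir -p "${GENOME_DIR}/${SAMPLE}/vcf"

# Inputs: the BAM must be coordinate-sorted with a .bai index next to it,
# and the reference FASTA needs its .fai index.
# --num_shards is set to match --cpus.
docker run --rm \
  --cpus 16 --memory 32g \
  -v "${GENOME_DIR}":/genome \
  google/deepvariant:1.6.0 \
  /opt/deepvariant/bin/run_deepvariant \
    --model_type=WGS \
    --ref=/genome/reference/Homo_sapiens_assembly38.fasta \
    --reads="/genome/${SAMPLE}/aligned/${SAMPLE}_sorted.bam" \
    --output_vcf="/genome/${SAMPLE}/vcf/${SAMPLE}.vcf.gz" \
    --num_shards=16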

# Output: ~93MB VCF with ~5.5M total variants (~4.6M PASS)
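
One way to sanity-check a finished run against the approximate counts above is to count the records in the output VCF. The sketch below uses only `gzip` and `awk`; `count_vcf_variants` is a hypothetical helper name, and it assumes that "total" means every non-header record (a multi-allelic site counts once) and that "PASS" means the FILTER column is exactly `PASS`.

```shell
# Hypothetical helper: print "<total> <pass>" record counts for a compressed VCF.
# bgzip output is gzip-compatible, so gzip -dc can read it.
count_vcf_variants() {
  gzip -dc "$1" | awk -F'\t' '
    /^#/ { next }                  # skip header lines
    { total++ }                    # every remaining line is one record
    $7 == "PASS" { pass++ }        # FILTER is the 7th column
    END { printf "%d %d\n", total, pass }
  '
}

# Usage (path follows the command above):
# count_vcf_variants "${GENOME_DIR}/${SAMPLE}/vcf/${SAMPLE}.vcf.gz"
```

Counts far from the figures quoted above may point to a truncated run or the wrong input, though some variation between samples is expected.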

Resource Requirements

Output Interpretation

Notes