genomics-pipeline

Step 19: Structural Variant Calling with Delly

What This Does

Runs Delly as the third structural variant (SV) caller in the pipeline. Delly combines paired-end, split-read, and read-depth signals to detect deletions, duplications, inversions, translocations, and insertions.

Why

Running multiple SV callers and intersecting their results reduces false positives:

SVs called by two or more callers have lower false-positive rates than SVs reported by a single caller. Multi-caller intersection is a common strategy in WGS pipelines, though dedicated tools like SURVIVOR or Jasmine provide more precise breakpoint-aware merging than simple position overlap.

Tool

Docker Image

quay.io/biocontainers/delly:1.7.3--hd6466ae_0

Command

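# SV calling (all SV types)
# Delly expects a coordinate-sorted, indexed BAM, ideally duplicate-marked.
# It parallelizes per sample, so a single-sample call gains little from extra cores.
docker run --rm \
  --cpus 4 --memory 8g \
  -v "${GENOME_DIR}:/genome" \
  quay.io/biocontainers/delly:1.7.3--hd6466ae_0 \
  delly call \
    -g /genome/reference/Homo_sapiens_assembly38.fasta \
    -o "/genome/${SAMPLE}/delly/${SAMPLE}_sv.bcf" \
    "/genome/${SAMPLE}/aligned/${SAMPLE}_sorted.bam"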

# Convert BCF to bgzip-compressed VCF for downstream tools
docker run --rm \
  -v "${GENOME_DIR}:/genome" \
  staphb/bcftools:1.21 \
  bcftools view \
    -Oz -o "/genome/${SAMPLE}/delly/${SAMPLE}_sv.vcf.gz" \
    "/genome/${SAMPLE}/delly/${SAMPLE}_sv.bcf"

# Index the compressed VCF (tabix .tbi)
docker run --rm \
  -v "${GENOME_DIR}:/genome" \
  staphb/bcftools:1.21 \
  bcftools index -t \
    "/genome/${SAMPLE}/delly/${SAMPLE}_sv.vcf.gz"
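
As a quick sanity check on the converted VCF, the records can be tallied per SV type. This is a minimal sketch that assumes uncompressed VCF text arrives on stdin; count_svtypes is a hypothetical helper name, not part of Delly or bcftools.

```shell
# Count records per SVTYPE from uncompressed VCF text on stdin.
# Prints one "TYPE<TAB>count" line per SV type, sorted by type.
count_svtypes() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      n = split($8, kv, ";")             # INFO is column 8
      for (i = 1; i <= n; i++)
        if (kv[i] ~ /^SVTYPE=/) {
          count[substr(kv[i], 8)]++      # text after "SVTYPE="
          break
        }
    }
    END { for (t in count) print t "\t" count[t] }
  ' | sort
}
```

It can be fed from bcftools, for example: bcftools view ${SAMPLE}_sv.vcf.gz | count_svtypes. A type that is missing entirely, or a count that is far out of line with the other callers, is worth investigating before merging.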

Optional: Dedicated CNV Calling

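Delly also has a dedicated CNV mode based on read depth (similar to CNVnator). This mode corrects for GC content and mappability, so it needs a mappability map for the reference, passed with -m:

# NOTE: the -m path is a placeholder (assumed location); use the mappability
# map that matches your reference build and contig naming.
docker run --rm \
  --cpus 4 --memory 8g \
  -v "${GENOME_DIR}:/genome" \
  quay.io/biocontainers/delly:1.7.3--hd6466ae_0 \
  delly cnv \
    -g /genome/reference/Homo_sapiens_assembly38.fasta \
    -m /genome/reference/mappability.map.gz \
    -o "/genome/${SAMPLE}/delly/${SAMPLE}_cnv.bcf" \
    "/genome/${SAMPLE}/aligned/${SAMPLE}_sorted.bam"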

Output

Filtering

# Keep only PASS variants
bcftools view -f PASS "${SAMPLE}_sv.vcf.gz"

# Filter by SV type
bcftools view -i 'INFO/SVTYPE="DEL"' "${SAMPLE}_sv.vcf.gz"
bcftools view -i 'INFO/SVTYPE="INV"' "${SAMPLE}_sv.vcf.gz"
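
The two filters can also be combined in a single bcftools view call. The sketch below loops over the SV types Delly reports and writes one PASS-only file per type. The function name and the output file naming are assumptions for illustration, and bcftools is expected on PATH (or wrapped in the docker invocation used above).

```shell
# Write one PASS-only, bgzip-compressed VCF per SV type.
# Usage: split_pass_by_svtype <input.vcf.gz> <output_dir>
split_pass_by_svtype() {
  mkdir -p "$2" || return 1
  for svtype in DEL DUP INV BND INS; do
    # -f PASS keeps records whose FILTER is PASS; -i adds the SVTYPE condition
    bcftools view -f PASS -i "INFO/SVTYPE=\"${svtype}\"" \
      -Oz -o "$2/${svtype}.pass.vcf.gz" "$1" || return 1
  done
}
```

For example: split_pass_by_svtype "${SAMPLE}_sv.vcf.gz" "${SAMPLE}_by_type". Per-type files are convenient when downstream merging or benchmarking treats deletions, duplications, and breakends differently.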

Runtime

~2-4 hours per 30X WGS genome.

Notes