genomics-pipeline

Step 18: Depth-Based CNV Calling with CNVnator

What This Does

Detects copy number variants (CNVs) from read-depth signal, complementing Manta’s paired-end/split-read approach. Especially effective for large CNVs (>1 kb) that Manta may miss.

Why

Manta (step 4) detects SVs from discordant read pairs and split reads, which works well for balanced SVs (inversions, translocations) and smaller deletions/duplications. CNVnator uses read-depth signal only, which makes it better suited to the large CNVs (>1 kb) that Manta may miss.

Tool

Docker Image

quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11

Command

# Step 1: Extract read mapping from BAM
docker run --rm \
  --cpus 4 --memory 8g \
  -v ${GENOME_DIR}:/genome \
  quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11 \
  cnvnator \
    -root /genome/${SAMPLE}/cnvnator/${SAMPLE}.root \
    -tree /genome/${SAMPLE}/aligned/${SAMPLE}_sorted.bam

# Step 2: Generate read-depth histogram
docker run --rm \
  --cpus 4 --memory 8g \
  -v ${GENOME_DIR}:/genome \
  quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11 \
  cnvnator \
    -root /genome/${SAMPLE}/cnvnator/${SAMPLE}.root \
    -his 1000 \
    -fasta /genome/reference/Homo_sapiens_assembly38.fasta

# Step 3: Statistics
docker run --rm \
  --cpus 4 --memory 8g \
  -v ${GENOME_DIR}:/genome \
  quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11 \
  cnvnator \
    -root /genome/${SAMPLE}/cnvnator/${SAMPLE}.root \
    -stat 1000

# Step 4: Partition
docker run --rm \
  --cpus 4 --memory 8g \
  -v ${GENOME_DIR}:/genome \
  quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11 \
  cnvnator \
    -root /genome/${SAMPLE}/cnvnator/${SAMPLE}.root \
    -partition 1000

# Step 5: Call CNVs
docker run --rm \
  --cpus 4 --memory 8g \
  -v ${GENOME_DIR}:/genome \
  quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11 \
  cnvnator \
    -root /genome/${SAMPLE}/cnvnator/${SAMPLE}.root \
    -call 1000 \
    > ${GENOME_DIR}/${SAMPLE}/cnvnator/${SAMPLE}_cnvs.txt
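
The five invocations above differ only in their final arguments, so they can be wrapped in one helper. The following is a minimal sketch, not part of the pipeline itself: the function names `cnvnator_stage` and `run_cnvnator` and the variables `BIN_SIZE` and `DOCKER` are hypothetical, introduced only for illustration. It assumes `GENOME_DIR` and `SAMPLE` are set as in the commands above and that the `cnvnator/` output directory already exists.

```shell
#!/bin/sh
set -eu

# Hypothetical knobs (not defined elsewhere in this pipeline).
: "${BIN_SIZE:=1000}"   # bin size in bp, reused by every stage
: "${DOCKER:=docker}"   # set DOCKER=echo to print the commands instead of running them
IMAGE="quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11"

# Run a single CNVnator stage in the container, appending stage-specific flags.
cnvnator_stage() {
  "$DOCKER" run --rm \
    --cpus 4 --memory 8g \
    -v "${GENOME_DIR}:/genome" \
    "$IMAGE" \
    cnvnator -root "/genome/${SAMPLE}/cnvnator/${SAMPLE}.root" "$@"
}

# Run the five stages in order; CNV calls from the last stage go to stdout.
run_cnvnator() {
  cnvnator_stage -tree "/genome/${SAMPLE}/aligned/${SAMPLE}_sorted.bam"
  cnvnator_stage -his "$BIN_SIZE" -fasta /genome/reference/Homo_sapiens_assembly38.fasta
  cnvnator_stage -stat "$BIN_SIZE"
  cnvnator_stage -partition "$BIN_SIZE"
  cnvnator_stage -call "$BIN_SIZE"
}
```

A possible invocation would be `run_cnvnator > "${GENOME_DIR}/${SAMPLE}/cnvnator/${SAMPLE}_cnvs.txt"`. Note that this redirects the stdout of all five stages into the file, whereas the original commands capture only the `-call` stage, so any progress messages printed by earlier stages may need to be stripped.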

Bin Size

The value 1000 passed to -his, -stat, -partition, and -call is the bin size in base pairs. Use the same value in all four steps.

Output

Filtering

# Keep only significant CNVs (column 5, e-val1 < 0.01; column 3, size > 1 kb)
awk '$5 < 0.01 && $3 > 1000' "${SAMPLE}_cnvs.txt"
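
Downstream tools often expect BED rather than CNVnator's native text output. The sketch below applies the same thresholds and converts the surviving calls to BED-like rows. It assumes CNVnator's usual column order (type, `chr:start-end`, size, normalized read depth, e-val1 to e-val4, q0) and 1-based inclusive coordinates. The function name `cnvs_to_bed` is hypothetical.

```shell
# Hypothetical helper: filter CNVnator calls and print BED-like rows
# (chrom, 0-based start, end, CNV type, normalized read depth).
cnvs_to_bed() {
  awk -v OFS='\t' '
    $5 < 0.01 && $3 > 1000 {
      # Split on the LAST ":" so contig names that contain ":" stay intact.
      n = split($2, a, ":")
      range = a[n]
      chrom = substr($2, 1, length($2) - length(range) - 1)
      split(range, se, "-")
      # Assumes 1-based inclusive input; BED start is 0-based.
      print chrom, se[1] - 1, se[2], $1, $4
    }' "$1"
}
```

For example, `cnvs_to_bed "${SAMPLE}_cnvs.txt" > "${SAMPLE}_cnvs.bed"` would write one row per retained call. Splitting on the last colon is a deliberate choice, because some contigs in GRCh38-style references (for example HLA alleles) contain colons in their names.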

Runtime

~2-4 hours per 30X WGS genome.

Notes