genomics-pipeline

Step 6: ClinVar Pathogenic Variant Screening

What This Does

Intersects your sample VCF with a ClinVar VCF filtered to pathogenic variants, identifying every site where your genome carries the same allele (matching chromosome, position, REF and ALT) as a clinically reported pathogenic variant.
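To make the matching rule concrete, here is a minimal shell/awk sketch of exact-allele intersection. It approximates what `bcftools isec -n =2 -w 1` does with default settings, which treat records as matching only when REF and ALT are identical. It assumes uncompressed, biallelic VCFs. `isec_exact` is a hypothetical helper for illustration, not part of this pipeline; real data should go through bcftools.

```shell
# Hypothetical helper: exact-allele intersection of two UNCOMPRESSED VCFs.
# Prints the header and the records of the sample VCF whose CHROM, POS, REF
# and ALT also occur in the ClinVar VCF (a simplified picture of
# `bcftools isec -n =2 -w 1`). Assumes biallelic records: multiallelic ALT
# strings only match if they are byte-for-byte identical.
isec_exact() {
  sample_vcf=$1
  clinvar_vcf=$2
  awk -F'\t' '
    # First file (ClinVar): remember each CHROM:POS:REF:ALT key.
    FILENAME == ARGV[1] {
      if ($0 !~ /^#/) seen[$1 ":" $2 ":" $4 ":" $5] = 1
      next
    }
    # Second file (sample): pass header lines through unchanged.
    /^#/ { print; next }
    # Keep sample records whose key was seen in ClinVar.
    (($1 ":" $2 ":" $4 ":" $5) in seen) { print }
  ' "$clinvar_vcf" "$sample_vcf"
}
```

For example, after decompressing both files with `gzip -dc`, `isec_exact sample.vcf clinvar_pathogenic.vcf > hits.vcf` (hypothetical file names) keeps only sample records whose exact allele is also in ClinVar. A site where the sample carries a different ALT than the ClinVar record at the same position is not reported.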

Why

ClinVar, NCBI's public archive of variant interpretations, is the standard reference for known disease-causing variants. This screen catches pathogenic SNPs and indels that have been reported in clinical settings, flagging carrier status, dominant disease risk, and pharmacogenomic findings.

Tool

bcftools

Docker Image

staphb/bcftools:1.21

Prerequisites

Command

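SAMPLE=your_sample
GENOME_DIR=/path/to/your/data

# bcftools will not create the output directory, so make it first
mkdir -p "${GENOME_DIR}/${SAMPLE}/clinvar"

# Step 1: Intersect sample VCF with ClinVar pathogenic database
# -n =2 -w 1 writes the sample records whose CHROM/POS/REF/ALT also occur in
# ClinVar; both input VCFs must be bgzipped and indexed
docker run --rm \
  --cpus 2 --memory 4g \
  -v "${GENOME_DIR}":/genome \
  staphb/bcftools:1.21 \
  bcftools isec \
    -n =2 -w 1 \
    -Oz -o /genome/${SAMPLE}/clinvar/${SAMPLE}_clinvar_hits.vcf.gz \
    /genome/${SAMPLE}/vcf/${SAMPLE}.vcf.gz \
    /genome/clinvar/clinvar_pathogenic_chr.vcf.gz

# Step 2: Index the result
docker run --rm \
  -v "${GENOME_DIR}":/genome \
  staphb/bcftools:1.21 \
  bcftools index -t /genome/${SAMPLE}/clinvar/${SAMPLE}_clinvar_hits.vcf.gz

# Step 3: Extract human-readable summary (one line per hit, genotype last)
# The hits carry the sample's INFO fields, not ClinVar's, so copy CLNSIG, CLNDN
# and GENEINFO from ClinVar first. Write with -o inside the container: a '>'
# redirect would be resolved by the host shell, where /genome usually does not exist.
docker run --rm \
  -v "${GENOME_DIR}":/genome \
  staphb/bcftools:1.21 \
  sh -c "bcftools annotate \
      -a /genome/clinvar/clinvar_pathogenic_chr.vcf.gz \
      -c INFO/CLNSIG,INFO/CLNDN,INFO/GENEINFO \
      /genome/${SAMPLE}/clinvar/${SAMPLE}_clinvar_hits.vcf.gz \
    | bcftools query \
      -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/CLNSIG\t%INFO/CLNDN\t%INFO/GENEINFO[\t%GT]\n' \
      -o /genome/${SAMPLE}/clinvar/${SAMPLE}_clinvar_summary.tsv -"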
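A quick first look at the summary is a count of hits per clinical-significance value. This sketch assumes CLNSIG is the fifth column, as in the Step 3 format string; `tally_clnsig` is a hypothetical helper name, not part of the pipeline.

```shell
# Hypothetical helper: count summary rows per ClinVar CLNSIG value.
# Assumes CLNSIG is column 5 of the tab-separated summary (Step 3 format).
# Output: "<count> <CLNSIG>" lines, most frequent first.
tally_clnsig() {
  cut -f5 "$1" | sort | uniq -c | sort -rn
}
```

For example, `tally_clnsig your_sample_clinvar_summary.tsv` separates confident `Pathogenic` calls from combined values such as `Pathogenic/Likely_pathogenic`, which is a reasonable order in which to review the hits.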

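Output

- ${SAMPLE}_clinvar_hits.vcf.gz: the sample's records that match a ClinVar pathogenic variant (Step 1)
- ${SAMPLE}_clinvar_hits.vcf.gz.tbi: tabix index of the hits file (Step 2)
- ${SAMPLE}_clinvar_summary.tsv: tab-separated summary with one line per hit, including the ClinVar clinical significance (CLNSIG) and condition name (CLNDN) (Step 3)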

Interpreting Results

| Scenario | Meaning | Action |
|---|---|---|
| Heterozygous + autosomal recessive | Healthy carrier | Note for family planning only |
| Homozygous + autosomal recessive | Affected | Investigate; confirm with phenotype |
| Any genotype + autosomal dominant | Potentially affected | Investigate; check penetrance and phenotype |
| Compound heterozygous (two variants, same gene) | Potentially affected (recessive) | Check whether the variants are on different alleles (phasing) |
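Applying the table requires each hit's genotype and, for the compound-heterozygous row, its gene. The sketch below assumes a summary with GENEINFO in column 7 and the genotype in column 8, that is, a query format ending in `%INFO/CLNDN\t%INFO/GENEINFO[\t%GT]\n` with `INFO/GENEINFO` also copied from ClinVar. It labels each hit heterozygous or homozygous and lists genes with two or more heterozygous hits as compound-het candidates. Mode of inheritance is not carried in the ClinVar VCF's standard INFO fields, so it still has to be looked up separately. `triage_hits` is a hypothetical helper and the column layout is an assumption.

```shell
# Hypothetical helper: add zygosity to summary rows and flag possible compound
# heterozygotes. Assumed input columns (tab-separated):
#   1 CHROM  2 POS  3 REF  4 ALT  5 CLNSIG  6 CLNDN  7 GENEINFO  8 GT
# Output columns: CHROM POS REF ALT ZYGOSITY GENE CLNSIG CLNDN, followed by
# "COMPOUND_HET_CANDIDATE<TAB>GENE<TAB>N" lines for genes with N >= 2 het hits.
triage_hits() {
  awk -F'\t' -v OFS='\t' '
  {
    gt = $8
    gsub(/[|]/, "/", gt)                # treat phased 0|1 like unphased 0/1
    n = split(gt, a, "/")
    if (n != 2 || a[1] == "." || a[2] == ".") zyg = "other"   # haploid or missing
    else if (a[1] != a[2]) zyg = "heterozygous"
    else if (a[1] != "0") zyg = "homozygous"
    else zyg = "reference"
    gene = $7
    sub(/:.*/, "", gene)                # GENEINFO "GENE:ID|..." -> first symbol
    print $1, $2, $3, $4, zyg, gene, $5, $6
    if (zyg == "heterozygous" && gene != ".") het[gene]++
  }
  END {
    for (g in het)
      if (het[g] >= 2) print "COMPOUND_HET_CANDIDATE", g, het[g]
  }' "$1"
}
```

A compound-het candidate only means two heterozygous hits share a gene; as the table notes, phasing is still needed to tell whether they sit on different alleles.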

Important Notes