Personal-Genome-Pipeline

Step 6: ClinVar Pathogenic Variant Screening

What This Does

Intersects your sample VCF against the ClinVar database of known pathogenic variants, identifying any positions where your genome carries a clinically reported disease variant.

Why

ClinVar is the most widely used public database of clinically reported variants. This screen catches pathogenic SNPs and indels submitted by clinical labs, covering carrier status, dominant disease risk, and pharmacogenomic flags. ClinVar entries vary in evidence quality (see star ratings in interpreting-results.md).

Tool

Docker Image

staphb/bcftools:1.21

Prerequisites

Command

export GENOME_DIR=/path/to/data
./scripts/06-clinvar-screen.sh <sample_name>

# For long-read Clair3 output:
VCF_DIR=vcf_clair3 ./scripts/06-clinvar-screen.sh <sample_name>

What the Script Does

  1. Filters the sample VCF to PASS variants only (bcftools view -f PASS)
  2. Intersects the PASS VCF against the ClinVar pathogenic subset using bcftools isec -p
  3. Reports the count of shared variants (positions in both the sample and ClinVar pathogenic)
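The three steps above can be sketched roughly as follows. This is not the contents of 06-clinvar-screen.sh, only an illustration under stated assumptions: bcftools is on the PATH (for example inside the staphb/bcftools:1.21 container), the function name `clinvar_screen` and all input paths are hypothetical, and both input VCFs are bgzipped, with the ClinVar subset already indexed.

```shell
# Illustrative sketch only; function name and argument layout are assumptions.
clinvar_screen() {
  local sample="$1"
  local sample_vcf="$2"          # bgzipped sample VCF (hypothetical path)
  local clinvar_vcf="$3"         # bgzipped, indexed ClinVar pathogenic subset (hypothetical path)
  local out_dir="${4:-clinvar}"
  local pass_vcf="${out_dir}/${sample}_pass.vcf.gz"
  local shared

  mkdir -p "${out_dir}" || return 1

  # 1. Keep PASS records only; write bgzipped output and index it for isec
  bcftools view -f PASS -Oz -o "${pass_vcf}" "${sample_vcf}" || return 1
  bcftools index -t -f "${pass_vcf}" || return 1

  # 2. Intersect; with -p, isec writes 0000.vcf to 0003.vcf into the given directory
  bcftools isec -p "${out_dir}/isec" "${pass_vcf}" "${clinvar_vcf}" || return 1

  # 3. Count shared records (non-header lines of 0002.vcf)
  shared=$(grep -vc '^#' "${out_dir}/isec/0002.vcf" || true)
  echo "Shared variants with ClinVar pathogenic: ${shared}"
}
```

A call might look like `clinvar_screen my_sample vcf/my_sample.vcf.gz clinvar_pathogenic.vcf.gz`, where both file names are placeholders. The sample VCF is passed first to isec so that 0002.vcf holds the sample's own records for the shared sites.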

Output

| File | Description |
|------|-------------|
| clinvar/${SAMPLE}_pass.vcf.gz | PASS-only subset of the sample VCF (intermediate) |
| clinvar/isec/0000.vcf | Variants unique to the sample |
| clinvar/isec/0001.vcf | Variants unique to ClinVar pathogenic |
| clinvar/isec/0002.vcf | Shared variants, as recorded in the sample VCF (positions overlapping ClinVar pathogenic entries) |
| clinvar/isec/0003.vcf | Shared variants, as recorded in ClinVar (carries the ClinVar annotations) |
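A quick way to sanity-check these outputs is to count the non-header records in each file. The helper below is a minimal sketch using only grep; the function name `count_isec_records` is hypothetical and not part of the pipeline.

```shell
# Print "<file>\t<record count>" for each isec output that exists.
count_isec_records() {
  local isec_dir="$1"
  local f n
  for f in 0000 0001 0002 0003; do
    if [ -f "${isec_dir}/${f}.vcf" ]; then
      # grep -c exits non-zero when the count is 0, hence the "|| true"
      n=$(grep -vc '^#' "${isec_dir}/${f}.vcf" || true)
      printf '%s.vcf\t%s\n' "${f}" "${n}"
    fi
  done
}
```

Running `count_isec_records clinvar/isec` should normally report the same count for 0002.vcf and 0003.vcf, since both describe the shared sites from the two inputs' points of view.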

Interpreting Results

This step screens against Pathogenic and Likely_pathogenic variants only — benign and VUS entries are excluded at the database level (see step 00 reference setup). Every hit in the output is at a position ClinVar classifies as disease-associated.
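To see what ClinVar says about each hit, the ClinVar-side records in 0003.vcf can be summarized. The sketch below assumes the pathogenic subset kept the standard ClinVar INFO keys `CLNSIG`, `CLNREVSTAT`, and `GENEINFO`; the function name `summarize_clinvar_hits` is hypothetical. It uses plain awk so it runs without bcftools.

```shell
# Print CHROM, POS, REF, ALT, gene, clinical significance, and review status per hit.
summarize_clinvar_hits() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      sig = "."; rev = "."; gene = "."
      n = split($8, kv, ";")           # INFO is column 8, entries separated by ";"
      for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        if (eq == 0) continue          # flag-type INFO entry without a value
        key = substr(kv[i], 1, eq - 1)
        val = substr(kv[i], eq + 1)
        if (key == "CLNSIG") sig = val
        else if (key == "CLNREVSTAT") rev = val
        else if (key == "GENEINFO") gene = val
      }
      printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, $4, $5, gene, sig, rev
    }
  ' "$1"
}
```

For example, `summarize_clinvar_hits clinvar/isec/0003.vcf` lists one line per hit. The review status column is the field behind the star ratings, so it is a reasonable first thing to check before acting on a hit.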

| Scenario | Meaning | Action |
|----------|---------|--------|
| Heterozygous + autosomal recessive | Healthy carrier | Note for family planning only |
| Homozygous + autosomal recessive | Possibly affected (requires clinical confirmation) | Investigate: confirm with clinical evaluation and ClinVar review status |
| Any genotype + autosomal dominant | Possibly affected (requires clinical confirmation) | Investigate: check penetrance, ClinVar review status, and phenotype |
| Compound het (two variants, same gene) | Potentially affected (recessive) | Check if variants are on different alleles (phasing) |
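Applying the scenarios above starts with knowing each hit's genotype. The sketch below reads the GT field of the first sample column in 0002.vcf and labels each hit as het, hom_alt, and so on. It assumes a single-sample VCF, and the function name `zygosity_of_hits` is hypothetical. The inheritance mode (recessive or dominant) is not in the VCF and has to come from the ClinVar record or the gene's literature.

```shell
# Print CHROM, POS, REF, ALT, GT, and a zygosity label for each hit.
zygosity_of_hits() {
  awk -F'\t' '
    /^#/ { next }                               # skip header lines
    {
      # FORMAT is column 9; the first sample is column 10
      nf = split($9, fmt, ":"); split($10, smp, ":")
      gt = "."
      for (i = 1; i <= nf; i++) if (fmt[i] == "GT") gt = smp[i]
      na = split(gt, a, "[/|]")                 # "/" unphased, "|" phased
      if (gt ~ /\./)         z = "missing"
      else if (na == 1)      z = (a[1] == "0") ? "hom_ref" : "hemizygous"
      else if (a[1] == a[2]) z = (a[1] == "0") ? "hom_ref" : "hom_alt"
      else                   z = "het"
      printf "%s\t%s\t%s\t%s\t%s\t%s\n", $1, $2, $4, $5, gt, z
    }
  ' "$1"
}
```

For example, `zygosity_of_hits clinvar/isec/0002.vcf` prints one labeled line per hit. Two het hits in the same gene are only candidates for compound heterozygosity; a `|` separator in GT indicates phased calls, which is what is needed to tell whether they sit on different alleles.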

Limitations

Important Notes