Intersects your sample VCF against the ClinVar database of known pathogenic variants, identifying any positions where your genome carries a clinically reported disease variant.
ClinVar is the most widely used public database of clinically reported variants. This screen catches pathogenic SNPs and indels that have been submitted by clinical labs — carrier status, dominant disease risk, and pharmacogenomic flags. Note that ClinVar entries vary in evidence quality (see star ratings in interpreting-results.md).
staphb/bcftools:1.21
clinvar_pathogenic_chr.vcf.gz from reference setup (step 00): chr-prefixed, filtered to Pathogenic/Likely_pathogenic only
export GENOME_DIR=/path/to/data
./scripts/06-clinvar-screen.sh <sample_name>
# For long-read Clair3 output:
VCF_DIR=vcf_clair3 ./scripts/06-clinvar-screen.sh <sample_name>
1. Filter the sample VCF to PASS-only variants (bcftools view -f PASS)
2. Intersect the PASS subset with the ClinVar pathogenic VCF (bcftools isec -p)

| File | Description |
|---|---|
| clinvar/${SAMPLE}_pass.vcf.gz | PASS-only subset of the sample VCF (intermediate) |
| clinvar/isec/0000.vcf | Variants unique to the sample |
| clinvar/isec/0001.vcf | Variants unique to ClinVar pathogenic |
| clinvar/isec/0002.vcf | Shared variants: positions overlapping ClinVar pathogenic entries |
| clinvar/isec/0003.vcf | Shared variants (ClinVar’s perspective) |
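As a rough sketch of how the files listed above could be produced, the function below filters a sample VCF to PASS records and intersects it with the ClinVar pathogenic VCF. It is not the contents of 06-clinvar-screen.sh: the function name `clinvar_screen_sketch`, its argument order, and the assumption that `bcftools` is available on PATH (rather than run through the staphb/bcftools:1.21 container) are all illustrative.

```shell
# Hypothetical helper, not the pipeline's real script.
# Args: <sample_vcf> <clinvar_vcf> <out_dir> <sample_name>
clinvar_screen_sketch() {
  local sample_vcf="$1" clinvar_vcf="$2" out_dir="$3" sample="$4"
  local pass_vcf="${out_dir}/${sample}_pass.vcf.gz"

  mkdir -p "${out_dir}" || return 1

  # Keep only records whose FILTER column is PASS; write bgzipped VCF.
  bcftools view -f PASS -Oz -o "${pass_vcf}" "${sample_vcf}" || return 1

  # isec expects bgzipped, indexed inputs (the ClinVar VCF is assumed
  # to have been indexed already during reference setup).
  bcftools index -t -f "${pass_vcf}" || return 1

  # Writes 0000.vcf .. 0003.vcf into <out_dir>/isec.
  bcftools isec -p "${out_dir}/isec" "${pass_vcf}" "${clinvar_vcf}" || return 1
}
```

Called as `clinvar_screen_sketch sample.vcf.gz clinvar_pathogenic_chr.vcf.gz clinvar <sample_name>`, this would leave the PASS subset in clinvar/ and the four intersection files in clinvar/isec/.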
This step screens against Pathogenic and Likely_pathogenic variants only — benign and VUS entries are excluded at the database level (see step 00 reference setup). Every hit in the output is at a position ClinVar classifies as disease-associated.
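To see which classification and evidence level sits behind each hit, the ClinVar-side records (clinvar/isec/0003.vcf) can be summarized with plain awk. The sketch below assumes the ClinVar VCF carries the standard `CLNSIG` and `CLNREVSTAT` INFO keys; the function name `clinvar_hit_summary` is hypothetical, and the star mapping reflects ClinVar's published review-status scale as best understood here, so it should be checked against interpreting-results.md.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT, CLNSIG, CLNREVSTAT, stars
# for every record in an uncompressed VCF (e.g. clinvar/isec/0003.vcf).
clinvar_hit_summary() {
  awk '
    BEGIN { FS = "\t"; OFS = "\t" }
    /^#/ { next }                       # skip header lines
    {
      sig = "."; rev = "."
      n = split($8, kv, ";")            # INFO is a ;-separated key=value list
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^CLNSIG=/)     sig = substr(kv[i], 8)
        if (kv[i] ~ /^CLNREVSTAT=/) rev = substr(kv[i], 12)
      }
      # Assumed mapping from review status to stars.
      if      (rev == "practice_guideline")                                   stars = 4
      else if (rev == "reviewed_by_expert_panel")                             stars = 3
      else if (rev == "criteria_provided,_multiple_submitters,_no_conflicts") stars = 2
      else if (rev ~ /^criteria_provided/)                                    stars = 1
      else                                                                    stars = 0
      print $1, $2, $4, $5, sig, rev, stars
    }
  ' "$1"
}
```

Running `clinvar_hit_summary clinvar/isec/0003.vcf` gives one tab-separated line per hit, which makes low-star entries easy to spot before any further interpretation.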
| Scenario | Meaning | Action |
|---|---|---|
| Heterozygous + autosomal recessive | Healthy carrier | Note for family planning only |
| Homozygous + autosomal recessive | Possibly affected — requires clinical confirmation | Investigate — confirm with clinical evaluation and ClinVar review status |
| Any genotype + autosomal dominant | Possibly affected — requires clinical confirmation | Investigate — check penetrance, ClinVar review status, and phenotype |
| Compound het (two variants, same gene) | Potentially affected (recessive) | Check if variants are on different alleles (phasing) |
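The genotype half of the table above can be read directly from the sample-side hits (clinvar/isec/0002.vcf). The following is a minimal sketch, assuming a single-sample, uncompressed VCF; the function name `zygosity_from_vcf` and the labels it prints are hypothetical. Inheritance mode is not stored in the VCF, so it still has to be looked up per gene, and two het calls in the same gene say nothing about phasing on their own.

```shell
# Hypothetical helper: print CHROM, POS, REF, ALT, GT and a zygosity label
# for each record of a single-sample VCF.
zygosity_from_vcf() {
  awk '
    BEGIN { FS = "\t"; OFS = "\t" }
    /^#/ { next }                          # skip header lines
    {
      # Locate GT within the FORMAT column (column 9).
      n = split($9, keys, ":")
      gt_idx = 0
      for (i = 1; i <= n; i++) if (keys[i] == "GT") gt_idx = i
      if (gt_idx == 0) next                # no genotype for this record

      split($10, vals, ":")                # first (only) sample column
      gt = vals[gt_idx]

      # Alleles are separated by "/" (unphased) or "|" (phased).
      m = split(gt, a, "[/|]")
      if (m != 2 || a[1] == "." || a[2] == ".") call = "unknown"   # haploid or missing
      else if (a[1] == a[2] && a[1] != "0")     call = "hom_alt"
      else if (a[1] == a[2])                    call = "hom_ref"
      else                                      call = "het"       # includes 1/2
      print $1, $2, $4, $5, gt, call
    }
  ' "$1"
}
```

For example, `zygosity_from_vcf clinvar/isec/0002.vcf` labels each hit as `het`, `hom_alt`, `hom_ref`, or `unknown`, which can then be matched against the scenario rows above.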
Caveat (bcftools isec): representation differences between callers, such as how an indel is aligned or whether multiallelic sites are split, can cause both false positives and missed overlaps. The Nextflow module normalizes VCFs before intersection; the bash script relies on upstream normalization.
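One common way to reduce representation differences is to left-align indels and split multiallelic records with `bcftools norm` before intersecting. Whether the Nextflow module uses exactly these options is not stated here; the function name `normalize_vcf` and the reference FASTA argument are assumptions for illustration, and `bcftools` is again assumed to be on PATH.

```shell
# Hypothetical helper, not taken from the pipeline.
# Args: <in_vcf> <reference_fasta> <out_vcf_gz>
normalize_vcf() {
  local in_vcf="$1" ref_fasta="$2" out_vcf="$3"

  # -f: left-align and normalize indels against the reference
  # -m -any: split multiallelic records into one ALT allele per line
  bcftools norm -f "${ref_fasta}" -m -any -Oz -o "${out_vcf}" "${in_vcf}" || return 1

  # Index the result so it can be passed to bcftools isec.
  bcftools index -t -f "${out_vcf}" || return 1
}
```

For the comparison to be meaningful, the sample VCF and the ClinVar VCF would both need to be normalized against the same reference build.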