genomics-pipeline

Step 23: Clinical Variant Filter

What This Does

Extracts the small subset of clinically interesting variants from your VEP-annotated VCF. Instead of manually searching through 4-5 million variants, this step produces a focused list of ~200-500 variants that are rare AND functionally impactful.

Why

The biggest challenge after VEP annotation is the question “I have millions of variants, what do I look at?” This step addresses it by applying conservative filters on predicted impact, population frequency and ClinVar status to surface the variants most likely to be medically relevant.

Tool

bcftools with the +split-vep plugin, which parses the VEP CSQ field structurally instead of grepping the raw INFO string.

Docker Image

staphb/bcftools:1.21
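Which of the variant sets below can be built depends on the CSQ subfields in your VEP output (for example gnomADe_AF and CLIN_SIG). A minimal sketch for checking this with bcftools +split-vep -l inside the same Docker image, assuming GENOME_DIR is the host directory mounted at /genome (as in the examples further down) and a hypothetical VEP_VCF variable holding the VEP VCF's path relative to GENOME_DIR; list_csq_fields and has_field are illustrative names, not part of the pipeline:

```shell
#!/bin/sh
# Sketch: discover which VEP CSQ subfields exist before choosing filters.
# ASSUMPTIONS: GENOME_DIR is mounted at /genome; VEP_VCF (hypothetical)
# is the VEP-annotated VCF's path relative to GENOME_DIR.

# Print one CSQ subfield name per line.
# `bcftools +split-vep -l` prints "<index><TAB><name>", so keep column 2.
list_csq_fields() {
  docker run --rm -v "${GENOME_DIR}:/genome" staphb/bcftools:1.21 \
    bcftools +split-vep -l "/genome/${VEP_VCF}" | cut -f2
}

# Succeed if the field name given in $1 appears as a whole line on stdin.
has_field() {
  grep -qx -- "$1"
}
```

For example, after fields=$(list_csq_fields), running printf '%s\n' "$fields" | has_field CLIN_SIG tells you whether the ClinVar set can be produced, and the same check with gnomADe_AF tells you whether the MODERATE frequency filter will apply.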

Input

Command

./scripts/23-clinical-filter.sh your_name

What Gets Filtered

The script produces up to three variant sets, depending on which VEP annotations are available, and merges them into one VCF:

HIGH Impact Variants

Expected count: 100-200 per genome. Uses bcftools +split-vep -s worst to select the most severe consequence per variant.
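A sketch of what a HIGH-impact extraction with split-vep can look like. -c, -s worst and -i are documented split-vep options, but this page does not show the script's exact flags, so treat this as illustrative; extract_high_impact is a hypothetical name and both paths are assumed to be inside the container (/genome/...):

```shell
#!/bin/sh
# Sketch: keep variants whose most severe consequence is HIGH impact.
# $1 = input VEP VCF, $2 = output VCF (container paths under /genome).
extract_high_impact() {
  # -c IMPACT : extract the IMPACT subfield (added as an INFO tag in VCF output)
  # -s worst  : use the transcript consequence with the highest severity
  # -i ...    : keep only records whose extracted IMPACT is HIGH
  docker run --rm -v "${GENOME_DIR}:/genome" staphb/bcftools:1.21 \
    bcftools +split-vep \
      -c IMPACT \
      -s worst \
      -i 'IMPACT="HIGH"' \
      -Oz -o "$2" \
      "$1"
}
```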

Rare MODERATE Impact Variants

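Expected count: 200-400 per genome after frequency filtering. If the VEP output lacks gnomAD exome frequencies (VEP was run without --af_gnomade), the frequency filter is skipped and all MODERATE variants are included, so expect a count above this range.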
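The fallback described above amounts to choosing a different split-vep filter expression depending on whether gnomADe_AF exists. A sketch, where the 0.01 allele-frequency cut-off is a hypothetical default (this page does not state the script's threshold) and moderate_filter_expr is an illustrative name:

```shell
#!/bin/sh
# Sketch: build the -i expression for "rare MODERATE" variants.
# $1 = CSQ field names, one per line (e.g. `bcftools +split-vep -l | cut -f2`)
# $2 = maximum gnomAD exome AF; 0.01 is a HYPOTHETICAL default
moderate_filter_expr() {
  max_af=${2:-0.01}
  if printf '%s\n' "$1" | grep -qx 'gnomADe_AF'; then
    # Keep variants absent from gnomAD ("." = missing) or below the cut-off
    printf 'IMPACT="MODERATE" && (gnomADe_AF="." || gnomADe_AF<%s)\n' "$max_af"
  else
    # No frequency data: fall back to all MODERATE variants
    printf 'IMPACT="MODERATE"\n'
  fi
}
```

The expression would be passed as -i "$(moderate_filter_expr "$fields")" to bcftools +split-vep together with -s worst, adding gnomADe_AF:Float to -c when the field exists so the comparison is numeric. Variants missing from gnomAD are kept deliberately, because absence from gnomAD is not evidence that a variant is common.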

ClinVar Pathogenic/Likely Pathogenic (Conditional)

Expected count: 10-50 per genome (depends on ClinVar version). This set is produced only when the VEP output includes the CLIN_SIG field.
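Matching CLIN_SIG on the substring "pathogenic" would also catch conflicting_interpretations_of_pathogenicity. A sketch of a whole-term match instead, assuming VEP's lowercase CLIN_SIG terms joined by & or / (PLP_REGEX and is_plp are hypothetical names):

```shell
#!/bin/sh
# ERE matching "pathogenic" or "likely_pathogenic" as a whole term inside a
# CLIN_SIG value. ASSUMPTION: terms are separated by "&" or "/".
PLP_REGEX='(^|[&/])(likely_)?pathogenic($|[&/])'

# Succeed (exit 0) if the CLIN_SIG string in $1 contains a P/LP term.
is_plp() {
  printf '%s\n' "$1" | grep -Eq -- "$PLP_REGEX"
}
```

bcftools filter expressions support regex matching with ~, so the same pattern could in principle be used as -i 'CLIN_SIG~"..."' with split-vep; check how your bcftools version interprets the regex before relying on it.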

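One plausible way to merge whichever sets exist into the combined VCF is bcftools concat with overlap handling, so a variant that is both HIGH impact and ClinVar P/LP appears once. This is a sketch, not necessarily how the script merges; merge_clinical_sets is a hypothetical name, and it assumes each set is bgzipped, coordinate-sorted and indexed (required by concat -a) under ${GENOME_DIR}/${SAMPLE}/clinical:

```shell
#!/bin/sh
# Sketch: merge the available per-category VCFs for sample $1.
merge_clinical_sets() {
  sample=$1
  host_dir="${GENOME_DIR}/${sample}/clinical"
  ctr_dir="/genome/${sample}/clinical"
  set --   # reuse positional parameters as the list of input files
  for name in high_impact rare_moderate clinvar_pathogenic; do
    # A set may be absent, e.g. no CLIN_SIG field means no ClinVar set.
    if [ -s "${host_dir}/${sample}_${name}.vcf.gz" ]; then
      set -- "$@" "${ctr_dir}/${sample}_${name}.vcf.gz"
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "no variant sets found in ${host_dir}" >&2
    return 1
  fi
  # -a: allow overlapping positions across files (inputs must be indexed)
  # -D: write records present in several sets only once (alias of -d exact)
  docker run --rm -v "${GENOME_DIR}:/genome" staphb/bcftools:1.21 \
    bcftools concat -a -D -Oz -o "${ctr_dir}/${sample}_clinical.vcf.gz" "$@"
}
```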
Output

| File | Contents | Typical size |
| --- | --- | --- |
| ${SAMPLE}_clinical.vcf.gz | Combined clinically interesting VCF | < 5 MB |
| ${SAMPLE}_clinical_summary.tsv | Human-readable tab-delimited table | < 1 MB |
| ${SAMPLE}_high_impact.vcf.gz | HIGH impact variants only | < 2 MB |
| ${SAMPLE}_rare_moderate.vcf.gz | Rare MODERATE variants only | < 3 MB |
| ${SAMPLE}_clinvar_pathogenic.vcf.gz | ClinVar pathogenic/likely pathogenic only (if CLIN_SIG is available) | < 1 MB |
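To see which outputs were produced and how many records each holds, for comparison with the expected counts above, a sketch using only gzip and grep (bgzip output is gzip-compatible). report_outputs and count_records are hypothetical helper names, and the ${GENOME_DIR}/${SAMPLE}/clinical layout is taken from the examples below:

```shell
#!/bin/sh
# Sketch: report which Step 23 VCFs exist and how many records each holds.
# ASSUMPTION: outputs live in ${GENOME_DIR}/<sample>/clinical.

# Count data (non-header) lines in a bgzipped VCF.
count_records() {
  gzip -cd -- "$1" | grep -vc '^#'
}

# Print "<file><TAB><record count>" per expected VCF, or "not produced".
report_outputs() {
  sample=$1
  dir="${GENOME_DIR}/${sample}/clinical"
  for name in clinical high_impact rare_moderate clinvar_pathogenic; do
    f="${dir}/${sample}_${name}.vcf.gz"
    if [ -s "$f" ]; then
      printf '%s\t%s\n' "${f##*/}" "$(count_records "$f")"
    else
      printf '%s\tnot produced\n' "${f##*/}"
    fi
  done
}
```

Usage: report_outputs "${SAMPLE}". A missing clinvar_pathogenic file is expected when the VEP output has no CLIN_SIG field.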

Runtime

~5-10 minutes (I/O-bound, reading the large VEP VCF)

How to Use the Output

Quick look at the summary

# Preview the first 20 rows as aligned columns.
# Split on tabs only, so values containing spaces stay in one column.
column -t -s $'\t' "${GENOME_DIR}/${SAMPLE}/clinical/${SAMPLE}_clinical_summary.tsv" | head -20
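The summary's column layout is not documented on this page. Assuming it has a header row that includes an IMPACT column (a hypothetical column name; adjust it to the real header), a small awk helper, here called count_by_column for illustration, tallies rows per value:

```shell
#!/bin/sh
# Sketch: count rows per value of a named column in a tab-delimited file
# with a header row. $1 = TSV path, $2 = column name (e.g. IMPACT, assumed).
count_by_column() {
  awk -F'\t' -v col="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
    c { n[$c]++ }
    END {
      if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
      for (k in n) print k "\t" n[k]
    }
  ' "$1" | sort
}
```

Usage: count_by_column "${GENOME_DIR}/${SAMPLE}/clinical/${SAMPLE}_clinical_summary.tsv" IMPACT.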

Cross-reference with ClinVar

# Find which clinical variants are also in ClinVar.
# Both VCFs must be bgzipped and indexed; -n=2 keeps sites present in both
# files and -w1 writes the records from the first (clinical) file.
docker run --rm -v "${GENOME_DIR}:/genome" staphb/bcftools:1.21 \
  bcftools isec -n=2 -w1 \
    "/genome/${SAMPLE}/clinical/${SAMPLE}_clinical.vcf.gz" \
    /genome/clinvar/clinvar.vcf.gz \
    -Oz -o "/genome/${SAMPLE}/clinical/${SAMPLE}_clinical_clinvar.vcf.gz"
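bcftools isec compares records by contig, position and alleles, so it finds no overlap if one file names contigs chr1 and the other 1 (the ClinVar VCF distributed by NCBI uses names such as 1 and X). A sketch that reports the naming convention of an indexed VCF via bcftools index -s; chr_style and vcf_chr_style are hypothetical names:

```shell
#!/bin/sh
# Print "chr" if any contig name on stdin starts with "chr", else "nochr".
chr_style() {
  if grep -q '^chr'; then echo chr; else echo nochr; fi
}

# $1 = indexed VCF path inside the container (/genome/...).
# `bcftools index -s` prints one line per contig; column 1 is the name.
vcf_chr_style() {
  docker run --rm -v "${GENOME_DIR}:/genome" staphb/bcftools:1.21 \
    bcftools index -s "$1" | cut -f1 | chr_style
}
```

If the two files differ, bcftools annotate --rename-chrs with a two-column (old name, new name) mapping file can rename one side before intersecting.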

Load in a genome browser

The ${SAMPLE}_clinical.vcf.gz file is small enough to load in IGV Web or gene.iobio for visual inspection.

Limitations

Notes