Prioritizes clinically interesting variants using tiered filters and detects compound heterozygote candidates. Optionally annotates results with gnomAD gene constraint metrics.
A typical VEP-annotated VCF contains thousands of MODERATE/HIGH impact variants. Most are common population variants or benign polymorphisms. This step filters down to the variants most likely to be clinically relevant using a three-tier approach, then flags potential compound heterozygotes that could cause autosomal recessive disease.
slivar (by Brent Pedersen, author of vcfanno, mosdepth, and duphold) is a streaming VCF filter that replaces the unmaintained GEMINI database approach. It evaluates JavaScript expressions against each variant, so filters stay flexible without loading variants into a database.
${GENOME_DIR}/annotations/gnomad_v4.1_constraint.tsv
quay.io/biocontainers/slivar:0.3.3--h5f107b1_0 # compound het detection
staphb/bcftools:1.21 # variant filtering via split-vep
export GENOME_DIR=/path/to/your/data
./scripts/31-slivar.sh your_sample
conflicting_interpretations_of_pathogenicity (the substring match ~"pathogenic" would otherwise include these)
All tiers are merged and deduplicated into a single prioritized VCF.
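The ClinVar exclusion can be illustrated with a small shell sketch. The function name `clinvar_decision` and the example strings are hypothetical; the script itself filters with bcftools split-vep, so this only shows why a plain substring match on "pathogenic" is not enough.

```shell
# Hypothetical helper: decide whether a ClinVar significance string counts as P/LP.
# Prints "keep" or "drop".
clinvar_decision() {
  # Lowercase first so "Pathogenic" and "pathogenic" are treated alike.
  sig="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$sig" in
    *conflicting*) echo drop ;;  # contains "pathogenic" as a substring, so test it first
    *pathogenic*)  echo keep ;;
    *)             echo drop ;;
  esac
}

clinvar_decision "likely_pathogenic"                             # keep
clinvar_decision "conflicting_interpretations_of_pathogenicity"  # drop
```

The order of the `case` branches matters: checking for "conflicting" before "pathogenic" is what keeps conflicting interpretations out of the ClinVar tier.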
slivar’s compound-hets command groups heterozygous variants by gene from the prioritized VCF and reports pairs that could form compound heterozygotes (two different damaging variants in the same gene, one from each parent). It requires a PED file (--ped) describing sample relationships. For singleton samples (no trio), the --allow-non-trios flag is required.
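For a singleton, the PED file can be a single line. The sketch below assumes the standard six-column PED layout and marks the sample as affected. The helper name `write_singleton_ped` and the file names are hypothetical, and the commented slivar call is illustrative rather than the script's actual command line.

```shell
# Hypothetical helper: write a one-line PED file for a singleton sample.
# Columns: family ID, sample ID, father ID, mother ID, sex, phenotype.
# Father/mother "0" = not available; sex 0 = unknown; phenotype 2 = affected (assumed here).
write_singleton_ped() {
  printf '%s\t%s\t0\t0\t0\t2\n' "$1" "$1" > "$2"
}

ped_file="$(mktemp)"
write_singleton_ped your_sample "$ped_file"
cat "$ped_file"

# Illustrative call only (file names are placeholders, not the script's real paths):
# slivar compound-hets --vcf prioritized.vcf.gz --ped "$ped_file" --allow-non-trios > compound_hets.vcf
```

The sample ID in the PED file has to match the sample name in the VCF header, otherwise slivar cannot associate genotypes with the pedigree entry.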
The command writes VCF to stdout with INFO/slivar_comphet annotations linking partner variants. Each VCF record represents a unique variant; its slivar_comphet field lists all of that variant's compound-het partners (format: sample/GENE/PAIR_ID/chrom/pos/ref/alt, comma-separated). A gene with N variants therefore produces up to C(N,2) pairs but only N VCF records. The script counts unique pair IDs from this field and exports a human-readable TSV with the columns GENE, CHROM, POS, REF, ALT, IMPACT, Consequence, and GT, sorted by gene so that compound-het partners appear in consecutive rows.
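Counting pairs rather than records can be sketched as follows. This is not the script's actual code: the function name `count_comphet_pairs` and the example values are made up, and it assumes pair IDs are unique across the file and sit in the third slash-separated position, as in the format described above.

```shell
# Count distinct pair IDs (third "/"-separated field) across slivar_comphet values.
# Reads one slivar_comphet value per line on stdin.
count_comphet_pairs() {
  tr ',' '\n' | awk -F'/' 'NF >= 3 { print $3 }' | sort -u | wc -l | tr -d ' '
}

# Made-up example: one gene with three variants, so C(3,2) = 3 pairs over 3 records.
n_pairs="$(printf '%s\n' \
  's1/GENE_A/1/chr1/200/C/T,s1/GENE_A/2/chr1/300/G/A' \
  's1/GENE_A/1/chr1/100/A/G,s1/GENE_A/3/chr1/300/G/A' \
  's1/GENE_A/2/chr1/100/A/G,s1/GENE_A/3/chr1/200/C/T' \
  | count_comphet_pairs)"
echo "$n_pairs"   # 3
```

On a real file, the input could be produced with something like `bcftools query -f '%INFO/slivar_comphet\n'` on the compound-het VCF. Records without the tag print as `.` and are skipped by the `NF >= 3` guard.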
Important: With single-sample unphased data, these are candidates only. The two variants might be on the same haplotype (cis) rather than different haplotypes (trans). Trio data or read-backed phasing is needed to confirm true compound hets.
If gnomad_v4.1_constraint.tsv is available, the summary TSV is enriched with per-gene constraint metrics:
| Column | Description | Threshold |
|---|---|---|
| LOEUF | Loss-of-function observed/expected upper bound | < 0.35 = constrained |
| pLI | Probability of LoF intolerance | > 0.9 = constrained |
| mis_z | Missense Z-score | > 3.09 = constrained |
| CONSTRAINED | YES if LOEUF < 0.35 or pLI > 0.9 | Flag column |
Variants in constrained genes are more likely to be pathogenic because these genes are under strong purifying selection against damaging variants.
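The CONSTRAINED flag from the table can be sketched with awk. The three-column input layout (gene, LOEUF, pLI), the gene names, and the function name `flag_constrained` are assumptions for illustration; the real gnomAD constraint file has many more columns and the script's own join logic may differ.

```shell
# Append a CONSTRAINED column: YES if LOEUF < 0.35 or pLI > 0.9, otherwise NO.
# Expects a tab-separated table on stdin with a header and columns: gene, LOEUF, pLI.
flag_constrained() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print $0, "CONSTRAINED"; next }   # header row
    {
      flag = "NO"
      # Skip missing values explicitly: awk would otherwise read "NA" as 0.
      if ($2 != "" && $2 != "NA" && $2 + 0 < 0.35) flag = "YES"   # LOEUF
      if ($3 != "" && $3 != "NA" && $3 + 0 > 0.9)  flag = "YES"   # pLI
      print $0, flag
    }'
}

printf 'gene\tLOEUF\tpLI\nGENE_A\t0.20\t0.50\nGENE_B\t0.80\t0.95\nGENE_C\t0.80\t0.10\nGENE_D\tNA\tNA\n' \
  | flag_constrained
```

The explicit missing-value check matters because a gene with no LOEUF estimate would otherwise be coerced to 0 and wrongly flagged as constrained.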
| File | Description |
|---|---|
| slivar/${SAMPLE}_prioritized.vcf.gz | All prioritized variants (merged, deduplicated) |
| slivar/${SAMPLE}_slivar_summary.tsv | Human-readable table with gene constraint |
| slivar/${SAMPLE}_compound_hets.vcf.gz | Compound het candidate variants (VCF) |
| slivar/${SAMPLE}_compound_hets.tsv | Compound het candidates (human-readable TSV) |
| slivar/${SAMPLE}_rare_high.vcf.gz | Tier 1: HIGH impact |
| slivar/${SAMPLE}_rare_moderate_del.vcf.gz | Tier 2: MODERATE + deleterious |
| slivar/${SAMPLE}_clinvar_path.vcf.gz | Tier 3: ClinVar P/LP |
Runtime: ~5-10 minutes. Most of that time is spent on bcftools split-vep filtering.
A typical 30X WGS sample produces:
Focus review on:
See docs/interpreting-results.md for pathogenicity score thresholds.