Prepares a WGS VCF for submission to the Michigan Imputation Server (MIS) or TOPMed Imputation Server. Splits the VCF by chromosome, filters to PASS variants, and converts to the required format.
Imputation servers statistically infer missing genotypes using large reference panels. For WGS data, imputation is primarily useful for phasing (determining which alleles are on the same chromosome) rather than filling in missing variants. Phased data is required for haplotype-level analyses and accurate PRS calculation.
staphb/bcftools:1.21
SAMPLE=your_sample
GENOME_DIR=/path/to/your/data
# Step 1: Filter to PASS variants only
docker run --rm \
-v ${GENOME_DIR}/${SAMPLE}/vcf:/data \
staphb/bcftools:1.21 \
bcftools view -f PASS \
/data/${SAMPLE}.vcf.gz \
-Oz -o /data/${SAMPLE}_pass.vcf.gz
# Step 2: Index the filtered VCF
docker run --rm \
-v ${GENOME_DIR}/${SAMPLE}/vcf:/data \
staphb/bcftools:1.21 \
bcftools index -t /data/${SAMPLE}_pass.vcf.gz
# Step 3: Split by chromosome (chr1-22, autosomes only)
for CHR in $(seq 1 22); do
docker run --rm \
-v ${GENOME_DIR}/${SAMPLE}/vcf:/data \
staphb/bcftools:1.21 \
bcftools view -r chr${CHR} \
/data/${SAMPLE}_pass.vcf.gz \
-Oz -o /data/imputation/chr${CHR}.vcf.gz
docker run --rm \
-v ${GENOME_DIR}/${SAMPLE}/vcf:/data \
staphb/bcftools:1.21 \
bcftools index -t /data/imputation/chr${CHR}.vcf.gz
done
# Output: 22 per-chromosome VCF files in /data/imputation/
| Server | Panel | Samples | Build | URL | |—|—|—|—|—| | Michigan (MIS) | HRC r1.1 | 32,470 | GRCh37/38 | imputationserver.sph.umich.edu | | TOPMed | TOPMed r2 | 132,070 | GRCh38 native | imputation.biodatacatalyst.nhlbi.nih.gov |
.vcf.gz format — ensure files are bgzipped (bcftools output is bgzipped by default)mkdir -p ${GENOME_DIR}/${SAMPLE}/vcf/imputation