Calculates polygenic risk scores (PRS) for 10 common conditions with plink2, using validated scoring files from the PGS Catalog. Each PRS aggregates the small effects of hundreds to millions of genetic variants into a single number representing your relative genetic predisposition for a trait or disease.
Most common diseases (heart disease, diabetes, cancer) are not caused by a single gene. They result from the combined effect of many variants, each contributing a small amount of risk. A PRS sums these contributions using weights derived from large genome-wide association studies (GWAS). While no single variant is predictive on its own, the aggregate score can be informative.
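As a minimal sketch of the weighted sum described above, the snippet below multiplies each variant's effect-allele dosage (0, 1, or 2 copies) by its GWAS weight and adds the results. The variant IDs, weights, and dosages are made-up toy values, and `polygenic_score` is a hypothetical helper for illustration, not part of the pipeline.

```python
# Toy, made-up data: per-variant effect weights and one person's
# effect-allele dosages (0, 1, or 2 copies). Real scoring files hold
# hundreds to millions of variants.
weights = {"1:1000": 0.12, "2:2000": -0.05, "3:3000": 0.30}
dosages = {"1:1000": 2, "2:2000": 1, "3:3000": 0}


def polygenic_score(weights, dosages):
    """Sum weight * dosage over variants present in both inputs.

    Returns the score and the number of variants that contributed.
    """
    matched = [variant for variant in weights if variant in dosages]
    score = sum(weights[variant] * dosages[variant] for variant in matched)
    return score, len(matched)


score, n_matched = polygenic_score(weights, dosages)
print(f"score={score:.2f} from {n_matched} variants")
```

Variants missing from the genotype data simply drop out of the sum here, which is why the number of matched variants is worth reporting next to the score.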
Docker image: `pgscatalog/plink2:2.00a5.10`
Input: `${GENOME_DIR}/${SAMPLE}/vcf/${SAMPLE}.vcf.gz`
Usage: `./scripts/25-prs.sh your_name`
| Condition | PGS ID | Source |
|---|---|---|
| Coronary artery disease | PGS000018 | Khera et al. 2018 |
| Type 2 diabetes | PGS000014 | Mahajan et al. 2018 |
| Breast cancer | PGS000004 | Mavaddat et al. 2019 |
| Prostate cancer | PGS000662 | Conti et al. 2021 |
| Atrial fibrillation | PGS000016 | Khera et al. 2018 |
| Alzheimer’s disease | PGS000334 | De Rojas et al. 2021 |
| Body mass index | PGS000027 | Khera et al. 2019 |
| Schizophrenia | PGS000738 | PGC 2022 |
| Inflammatory bowel disease | PGS000020 | Khera et al. 2018 |
| Colorectal cancer | PGS000055 | Huyghe et al. 2019 |
1. Downloads the scoring files from the PGS Catalog (cached in `${GENOME_DIR}/prs_scores/`)
2. Converts the VCF to plink2 binary format, with variant IDs in `chr:pos` format (matching PGS Catalog convention)
3. Reformats each scoring file into plink2 `--score` input format, deduplicating entries with the same variant ID and allele
4. Runs `plink2 --score` for each condition, producing a `.sscore` file with the aggregate score and the number of variants matched

| File | Contents |
|---|---|
| `${SAMPLE}_prs_summary.tsv` | Tab-delimited summary: condition, PGS ID, score, variants used, variants total |
| `${PGS_ID}.sscore` | Raw plink2 score output per condition |
| `${PGS_ID}_formatted.tsv` | Reformatted scoring file used for each calculation |
| `${SAMPLE}.pgen/.pvar/.psam` | plink2 binary genotype files (intermediate) |
All output is written to ${GENOME_DIR}/${SAMPLE}/prs/.
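The scoring-file reformatting step (building `chr:pos` IDs and dropping entries that repeat the same variant ID and allele) could look roughly like the sketch below. It assumes the harmonized PGS Catalog column names `hm_chr`, `hm_pos`, `effect_allele`, and `effect_weight`, and that the `#` comment header lines have already been stripped; the actual script may read different columns. `reformat_scoring_rows` is a hypothetical name.

```python
import csv
import io


def reformat_scoring_rows(rows):
    """Yield (variant_id, effect_allele, weight) tuples.

    Only the first entry for each (variant ID, allele) pair is kept,
    so repeated pairs do not appear twice in the output.
    """
    seen = set()
    for row in rows:
        variant_id = f"{row['hm_chr']}:{row['hm_pos']}"
        key = (variant_id, row["effect_allele"])
        if key in seen:
            continue  # duplicate ID/allele pair: skip it
        seen.add(key)
        yield variant_id, row["effect_allele"], float(row["effect_weight"])


# Made-up scoring file content with one duplicated entry.
example = io.StringIO(
    "hm_chr\thm_pos\teffect_allele\teffect_weight\n"
    "1\t1000\tA\t0.12\n"
    "1\t1000\tA\t0.12\n"
    "2\t2000\tG\t-0.05\n"
)
formatted = list(reformat_scoring_rows(csv.DictReader(example, delimiter="\t")))
for variant_id, allele, weight in formatted:
    print(variant_id, allele, weight, sep="\t")
```

Keeping the first occurrence is one possible policy; whichever rule the script applies, the point is that each ID and allele pair reaches plink2 only once.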
Runtime: ~20-40 minutes total (dominated by VCF-to-plink conversion and scoring across all 10 conditions).
The summary TSV contains one row per condition: the condition name, its PGS ID, the raw score, and the number of scoring variants used out of the total in the scoring file.
Raw PRS become useful only when compared against a population distribution. To convert your score into a percentile, you need a reference panel of thousands of individuals with scores computed using the same scoring file. The PGS Catalog provides some population-level statistics, but full percentile calculation requires a reference cohort (not included in this pipeline).
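To illustrate what comparing against a population distribution involves, here is a small sketch that places one raw score within a set of reference scores, as an empirical percentile and as a z-score. The reference values are invented for the example and the function names are hypothetical; a real reference panel would contain thousands of individuals scored with the same scoring file.

```python
from bisect import bisect_right
from statistics import mean, stdev


def percentile_in_reference(score, reference_scores):
    """Percentage of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def z_score(score, reference_scores):
    """Distance of `score` from the reference mean, in standard deviations."""
    return (score - mean(reference_scores)) / stdev(reference_scores)


# Invented reference scores, all assumed to come from the same scoring file.
reference = [0.10, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30, 0.35]
my_score = 0.23

pct = percentile_in_reference(my_score, reference)
z = z_score(my_score, reference)
print(f"percentile={pct:.1f}, z={z:.2f}")
```

Neither number means anything if the reference scores were produced with a different scoring file, genome build, or preprocessing than your own score.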
Comparing two people is only defensible when both were scored with the same PGS ID, the same scoring file version, the same genome build conventions, and the same preprocessing. Even then, treat the comparison as directional rather than clinically calibrated unless you also have a matched reference distribution.
As a rough guide:
Check the Variants_Used / Variants_Total ratio. If fewer than 50% of scoring variants matched, the score is less reliable. Low matching rates usually indicate:
- Missing genotypes are not mean-imputed (plink2's `no-mean-imputation` flag), so missing variants reduce the score proportionally rather than being imputed to population averages.
- Scoring files are cached in `${GENOME_DIR}/prs_scores/`. Delete this directory to force re-download.
- To add or change conditions, edit the `PGS_IDS` associative array in the script. Browse available scores at pgscatalog.org.
- Record the PGS ID, the harmonized scoring file version/date, and the pipeline commit together so score changes remain auditable.
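The variant-matching check (a Variants_Used / Variants_Total ratio below 50%) can be automated with a short script. The sketch below assumes the summary TSV has a header row with `PGS_ID`, `Variants_Used`, and `Variants_Total` columns; only the last two names appear in this document, so adjust them to the real header. The example rows are made up and `flag_low_match` is a hypothetical helper.

```python
import csv
import io

MIN_MATCH_RATE = 0.5  # the 50% threshold mentioned in the guidance


def flag_low_match(summary_file, min_rate=MIN_MATCH_RATE):
    """Return (PGS ID, match rate) for rows whose match rate is below min_rate."""
    flagged = []
    for row in csv.DictReader(summary_file, delimiter="\t"):
        used = int(row["Variants_Used"])
        total = int(row["Variants_Total"])
        rate = used / total if total else 0.0
        if rate < min_rate:
            flagged.append((row["PGS_ID"], rate))
    return flagged


# Made-up summary content; the column names are assumptions.
example = io.StringIO(
    "Condition\tPGS_ID\tScore\tVariants_Used\tVariants_Total\n"
    "Example trait A\tPGS_A\t0.0012\t900\t1000\n"
    "Example trait B\tPGS_B\t0.0450\t30\t100\n"
)
flagged = flag_low_match(example)
for pgs_id, rate in flagged:
    print(f"{pgs_id}: only {rate:.0%} of scoring variants matched")
```

To run it on real output, pass an open file handle for `${SAMPLE}_prs_summary.tsv` instead of the in-memory example.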