Personal-Genome-Pipeline

Step 17: Cancer Predisposition Screening with CPSR

What This Does

Screens germline variants against curated cancer predisposition gene panels to identify clinically actionable cancer risk variants. CPSR uses its own panels, sourced from Genomics England PanelApp and other curated databases. These panels are cancer-focused and distinct from the 81-gene ACMG SF v3.2 list, which also includes cardiac and metabolic genes that CPSR does not cover.

Why

ClinVar screening (step 6) finds only variants that are already classified as pathogenic. CPSR applies ACMG/AMP classification criteria to novel or rare variants in cancer predisposition genes, so it can flag variants that ClinVar hasn’t yet classified.

Tool

Docker Image

sigven/pcgr:2.2.5

The CPSR binary is at /usr/local/bin/cpsr inside this image. It requires a separate ref data bundle (~5 GB) and a VEP cache.

Prerequisites

1. VEP Cache

PCGR 2.2.5 bundles VEP 113, which requires the release-113 cache. This is different from the release-112 cache used by step 13. Both coexist under the same vep_cache/ directory in separate subdirectories (homo_sapiens/112_GRCh38/ and homo_sapiens/113_GRCh38/).

mkdir -p "${GENOME_DIR}/vep_cache"
wget -c -P "${GENOME_DIR}/vep_cache" https://ftp.ensembl.org/pub/release-113/variation/indexed_vep_cache/homo_sapiens_vep_113_GRCh38.tar.gz
tar xzf "${GENOME_DIR}/vep_cache/homo_sapiens_vep_113_GRCh38.tar.gz" -C "${GENOME_DIR}/vep_cache"
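Because two cache releases share one directory, it can help to confirm that the right one is unpacked before moving on. The following is a minimal sketch, not part of VEP or PCGR: `check_vep_cache` is a hypothetical helper, and it assumes the tarball unpacks into `homo_sapiens/<release>_GRCh38/` under the cache directory.

```shell
# Hypothetical helper: report whether a VEP cache release is unpacked.
# Assumes the layout <cache_root>/homo_sapiens/<release>_GRCh38/.
check_vep_cache() {
  local cache_root="$1" release="$2"
  local dir="${cache_root}/homo_sapiens/${release}_GRCh38"
  if [ -d "$dir" ]; then
    echo "ok: ${dir}"
  else
    echo "missing: ${dir}" >&2
    return 1
  fi
}
```

For this step it would be called as `check_vep_cache "${GENOME_DIR}/vep_cache" 113`. Passing `112` checks the cache used by step 13.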

2. PCGR Ref Data Bundle

PCGR 2.x uses a new, smaller ref data bundle (~5 GB) separate from VEP:

mkdir -p "${GENOME_DIR}/pcgr_data"
cd "${GENOME_DIR}/pcgr_data"
wget -c https://insilico.hpc.uio.no/pcgr/pcgr_ref_data.20250314.grch38.tgz
tar xzf pcgr_ref_data.20250314.grch38.tgz
mkdir -p 20250314 && mv data/ 20250314/

This creates a 20250314/data/ directory with ClinVar, CancerMine, UniProt, and other databases. VEP cache is now mounted separately.

Command

docker run --rm --user root \
  --cpus 4 --memory 8g \
  -v "${GENOME_DIR}/vep_cache:/mnt/.vep" \
  -v "${GENOME_DIR}/pcgr_data/20250314:/mnt/bundle" \
  -v "${GENOME_DIR}/${SAMPLE}/vcf:/mnt/inputs" \
  -v "${GENOME_DIR}/${SAMPLE}/cpsr:/mnt/outputs" \
  sigven/pcgr:2.2.5 \
  cpsr \
    --input_vcf "/mnt/inputs/${SAMPLE}.vcf.gz" \
    --vep_dir /mnt/.vep \
    --refdata_dir /mnt/bundle \
    --output_dir /mnt/outputs \
    --genome_assembly grch38 \
    --sample_id "${SAMPLE}" \
    --panel_id 0 \
    --classify_all \
    --force_overwrite
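The command mounts four host paths, and a missing one tends to surface only as an error inside the container. A small pre-flight check can catch that earlier. This is a sketch under stated assumptions, not part of CPSR: `cpsr_preflight` is a hypothetical helper, the paths mirror the mounts used in this step, and the tabix index check is only a warning because this step does not state whether an index is required.

```shell
# Hypothetical pre-flight helper for the CPSR run; not part of PCGR/CPSR.
# Checks the ref data bundle and the input VCF, then creates the output dir.
cpsr_preflight() {
  local genome_dir="$1" sample="$2"
  local vcf="${genome_dir}/${sample}/vcf/${sample}.vcf.gz"
  local bundle="${genome_dir}/pcgr_data/20250314/data"
  local status=0

  if [ ! -d "$bundle" ]; then
    echo "missing ref data: ${bundle}" >&2
    status=1
  fi
  if [ ! -s "$vcf" ]; then
    echo "missing or empty VCF: ${vcf}" >&2
    status=1
  fi
  # Assumption: a tabix index next to the VCF is useful; warn, do not fail.
  if [ ! -f "${vcf}.tbi" ]; then
    echo "warning: no tabix index at ${vcf}.tbi" >&2
  fi

  # Create the output directory up front, so that Docker does not have to
  # create the bind-mount source itself.
  if [ "$status" -eq 0 ]; then
    mkdir -p "${genome_dir}/${sample}/cpsr"
  fi
  return "$status"
}
```

A typical call would be `cpsr_preflight "${GENOME_DIR}" "${SAMPLE}" && docker run ...`, so the container starts only when the inputs are in place.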

Panel Options

| Panel ID | Description |
|---|---|
| 0 | Comprehensive cancer superpanel (500+ genes); recommended |
| 1 | Adult-onset hereditary cancer |
| 2 | Childhood-onset hereditary cancer |
| 3 | Lynch syndrome |
| 4 | BRCA1/BRCA2 |

Output

Runtime

~30-60 minutes per genome (depends on variant count).

Notes