Screens germline variants against curated cancer predisposition gene panels to identify clinically actionable cancer risk variants. CPSR uses its own panels sourced from Genomics England PanelApp and other curated databases — these are cancer-focused and distinct from the 81-gene ACMG SF v3.2 list (which also includes cardiac and metabolic genes not covered by CPSR).
ClinVar screening (step 6) finds known pathogenic variants, but CPSR applies ACMG/AMP classification criteria to novel or rare variants in cancer predisposition genes — catching variants ClinVar hasn’t yet classified.
sigven/pcgr:2.2.5
CPSR binary is at /usr/local/bin/cpsr inside this image. Requires a separate ref data bundle (~5 GB) and a VEP cache.
PCGR 2.2.5 bundles VEP 113, which requires the release-113 cache. This is different from the release-112 cache used by step 13; both coexist under vep_cache/homo_sapiens/ in separate subdirectories (112_GRCh38/ and 113_GRCh38/).
mkdir -p ${GENOME_DIR}/vep_cache
wget -c -P ${GENOME_DIR}/vep_cache https://ftp.ensembl.org/pub/release-113/variation/indexed_vep_cache/homo_sapiens_vep_113_GRCh38.tar.gz
tar xzf ${GENOME_DIR}/vep_cache/homo_sapiens_vep_113_GRCh38.tar.gz -C ${GENOME_DIR}/vep_cache
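Before moving on, it can help to confirm that the cache was unpacked where VEP will look for it. The sketch below assumes the standard Ensembl layout of homo_sapiens/<release>_<assembly>/ under the cache directory; check_vep_cache is a hypothetical helper name, not part of PCGR or VEP.

```shell
# Hypothetical helper: confirm that a VEP cache release is unpacked where expected.
# Usage: check_vep_cache <vep_cache_dir> <release> [assembly]
check_vep_cache() {
    local cache_dir="$1" release="$2" assembly="${3:-GRCh38}"
    # Assumed layout: <cache_dir>/homo_sapiens/<release>_<assembly>/
    local expected="${cache_dir}/homo_sapiens/${release}_${assembly}"
    if [ -d "$expected" ]; then
        echo "OK: ${expected}"
    else
        echo "MISSING: ${expected}" >&2
        return 1
    fi
}

# Example call (commented out so that sourcing this file has no side effects):
# check_vep_cache "${GENOME_DIR}/vep_cache" 113
```

The same call with 112 as the release checks the cache used by step 13.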
PCGR 2.x uses a new, smaller ref data bundle (~5 GB) separate from VEP:
mkdir -p ${GENOME_DIR}/pcgr_data
cd ${GENOME_DIR}/pcgr_data
wget -c https://insilico.hpc.uio.no/pcgr/pcgr_ref_data.20250314.grch38.tgz
tar xzf pcgr_ref_data.20250314.grch38.tgz
mkdir -p 20250314 && mv data/ 20250314/
This creates a 20250314/data/ directory with ClinVar, CancerMine, UniProt, and other databases. VEP cache is now mounted separately.
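The mkdir and mv sequence above is not safe to re-run once 20250314/data/ exists. A minimal sketch of an idempotent alternative follows, assuming (as described above) that the tarball's top-level entry is data/; install_bundle is a hypothetical helper name.

```shell
# Hypothetical helper: unpack a PCGR ref data tarball into a versioned
# directory, skipping the work if it has already been done.
# Assumes the tarball's top-level entry is data/.
# Usage: install_bundle <tarball> <version_dir>
install_bundle() {
    local tarball="$1" version_dir="$2"
    if [ -d "${version_dir}/data" ]; then
        echo "already installed: ${version_dir}/data"
        return 0
    fi
    mkdir -p "$version_dir" || return 1
    # Extract straight into the versioned directory instead of extract-then-mv.
    tar xzf "$tarball" -C "$version_dir" || return 1
    if [ ! -d "${version_dir}/data" ]; then
        echo "no data/ directory found in ${tarball}" >&2
        return 1
    fi
    echo "installed: ${version_dir}/data"
}

# Example call (commented out so that sourcing this file has no side effects):
# install_bundle pcgr_ref_data.20250314.grch38.tgz "${GENOME_DIR}/pcgr_data/20250314"
```

Extracting directly with -C yields the same 20250314/data/ layout as the manual commands.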
docker run --rm --user root \
--cpus 4 --memory 8g \
-v ${GENOME_DIR}/vep_cache:/mnt/.vep \
-v ${GENOME_DIR}/pcgr_data/20250314:/mnt/bundle \
-v ${GENOME_DIR}/${SAMPLE}/vcf:/mnt/inputs \
-v ${GENOME_DIR}/${SAMPLE}/cpsr:/mnt/outputs \
sigven/pcgr:2.2.5 \
cpsr \
--input_vcf /mnt/inputs/${SAMPLE}.vcf.gz \
--vep_dir /mnt/.vep \
--refdata_dir /mnt/bundle \
--output_dir /mnt/outputs \
--genome_assembly grch38 \
--sample_id ${SAMPLE} \
--panel_id 0 \
--classify_all \
--force_overwrite
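A short pre-flight check before the docker run can save a failed 30-minute job. The sketch below uses the directory layout from the mounts above. It assumes that a tabix index (.tbi) is expected next to the VCF, which should be verified against the CPSR documentation. It also creates the output directory up front, because Docker creates missing bind-mount paths owned by root. cpsr_preflight is a hypothetical helper name.

```shell
# Hypothetical helper: check inputs and prepare the output directory for CPSR.
# Usage: cpsr_preflight <genome_dir> <sample>
cpsr_preflight() {
    local genome_dir="$1" sample="$2"
    local vcf="${genome_dir}/${sample}/vcf/${sample}.vcf.gz"
    local f status=0
    # Assumption: a .tbi index is required alongside the bgzipped VCF.
    for f in "$vcf" "${vcf}.tbi"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    # Create the output mount point as the current user, not as root via Docker.
    mkdir -p "${genome_dir}/${sample}/cpsr" || status=1
    return "$status"
}

# Example call (commented out so that sourcing this file has no side effects):
# cpsr_preflight "${GENOME_DIR}" "${SAMPLE}" || exit 1
```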
| Panel ID | Description |
|---|---|
| 0 | Comprehensive cancer superpanel (500+ genes), recommended |
| 1 | Adult-onset hereditary cancer |
| 2 | Childhood-onset hereditary cancer |
| 3 | Lynch syndrome |
| 4 | BRCA1/BRCA2 |
- ${SAMPLE}.cpsr.grch38.html: Interactive HTML report with classified variants
- ${SAMPLE}.cpsr.grch38.snvs_indels.tiers.tsv: Tab-separated variant classifications

~30-60 minutes per genome (depends on variant count).
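Once the run finishes, the tab-separated output can be summarised from the command line. The sketch below counts rows per value of a named column, looking the column up in the header row. The column name in the example call, FINAL_CLASSIFICATION, is an assumption; check the header of your own file. count_by_column is a hypothetical helper name.

```shell
# Hypothetical helper: count rows per distinct value of a named TSV column.
# Usage: count_by_column <tsv_file> <column_name>
count_by_column() {
    local tsv="$1" column="$2" out
    out=$(awk -F'\t' -v col="$column" '
        NR == 1 {
            # Locate the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { counts[$idx]++ }
        END { for (k in counts) print counts[k] "\t" k }
    ' "$tsv") || return 1
    # Most frequent value first.
    if [ -n "$out" ]; then
        printf '%s\n' "$out" | sort -k1,1nr
    fi
}

# Example call (the column name is an assumption; commented out to avoid side effects):
# count_by_column "${GENOME_DIR}/${SAMPLE}/cpsr/${SAMPLE}.cpsr.grch38.snvs_indels.tiers.tsv" FINAL_CLASSIFICATION
```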
- The --pcgr_dir flag (which internally appended /data) is replaced by --refdata_dir and --vep_dir as separate mount points.
- The single monolithic data bundle is split into a smaller ref data bundle plus the standard Ensembl VEP cache.
- Docker volume mounts changed from a single :/genome to four separate mounts for VEP, bundle, inputs, and outputs.
- The 20250314 bundle dates from March 2025. Check the PCGR releases page periodically for updated bundles; newer bundles include more recent ClinVar classifications and gene-disease annotations.
- Both VEP caches live under vep_cache/homo_sapiens/ (subdirectories 112_GRCh38/ and 113_GRCh38/). You need both if running both steps.
- Use --panel_id 0 for the comprehensive cancer superpanel (500+ genes). Note: this is broader than ACMG SF but cancer-focused, so it does not replace a full ACMG incidental-findings screen.
- --classify_all ensures all variants in target genes get ACMG classification, not just known pathogenic ones.
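Because the bundle directory name is a date stamp, its age can be estimated without contacting any server. The sketch below computes the whole-month difference between a YYYYMMDD bundle version and today's date (or a date passed explicitly). It ignores the day of the month, and bundle_age_months is a hypothetical helper name.

```shell
# Hypothetical helper: age of a date-stamped bundle in calendar months.
# Usage: bundle_age_months <YYYYMMDD> [today_as_YYYYMMDD]
bundle_age_months() {
    local version="$1" today="${2:-$(date +%Y%m%d)}"
    local by bm ty tm
    by=$(printf '%s' "$version" | cut -c1-4)
    bm=$(printf '%s' "$version" | cut -c5-6)
    ty=$(printf '%s' "$today" | cut -c1-4)
    tm=$(printf '%s' "$today" | cut -c5-6)
    # Strip a leading zero so months like 08 are not parsed as octal.
    echo $(( ($ty - $by) * 12 + ${tm#0} - ${bm#0} ))
}

# Example call (commented out so that sourcing this file has no side effects):
# bundle_age_months 20250314
```

What counts as "too old" is a local policy decision; the number only serves as a prompt to look for a newer bundle.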