Personal-Genome-Pipeline

Lessons Learned

Every failure encountered during pipeline development (Mar 2026), documented so that it doesn’t happen again.

Docker Image Issues

AnnotSV: Official image doesn’t exist

SnpEff/SnpSift: Combined package

SnpEff Database: Azure blob storage outage

ExpansionHunter: Different image formats

ExpansionHunter: Missing required --log parameter

StellarPGx: Empty Docker image

Tool-Specific Issues

TelomereHunter: Permission denied

TelomereHunter: pip install on host fails

HLA-LA: Graph not serialized (3 failures)

HLA-LA: Binary crash with pre-built graph image

T1K: Coordinate file with wrong values

T1K: BAM extraction produces 0-byte FASTQ

Cyrius CYP2D6: Inconclusive on short-read WGS

bcftools/htslib Issues

bgzip/tabix not in bcftools image PATH

MIS VCF conversion: 0-byte output files

VEP Cache Issues

VEP INSTALL.pl: Permission denied on temp directory

VEP INSTALL.pl: Silent download failure

General Docker Tips

Always use resource limits

docker run --cpus 4 --memory 8g ...

Without limits, tools like DeepVariant or minimap2 can consume all available RAM and crash the host.
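One way to make the limits hard to forget is to route every `docker run` through a small wrapper. The following is a minimal sketch, not part of the pipeline: the function name `run_limited` and the variables `PGP_CPUS`, `PGP_MEM`, and `DRY_RUN` are assumptions chosen for illustration, and the defaults simply mirror the `--cpus 4 --memory 8g` example above.

```shell
# Hypothetical wrapper: prepend CPU and memory limits to every docker run.
# PGP_CPUS, PGP_MEM and DRY_RUN are assumed names, not pipeline settings.
run_limited() {
  set -- docker run --cpus "${PGP_CPUS:-4}" --memory "${PGP_MEM:-8g}" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of executing it
  else
    "$@"        # execute docker with the limits applied
  fi
}

# Preview the command without starting a container.
# The image name and tool arguments are placeholders.
DRY_RUN=1
run_limited --rm example/image:tag some-tool --help
```

Setting `PGP_MEM=32g` before the call would raise the limit for a memory-hungry tool while keeping the default for everything else.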

Build for amd64 from macOS

docker build --platform linux/amd64 ...

Apple Silicon Macs are arm64; most Linux servers are amd64. Images built on such a Mac without --platform are arm64 images and won’t run on amd64 servers.
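A build script can check the host architecture before building and warn when the flag is needed. This is a rough sketch under the assumption that the deployment target is always `linux/amd64`; the function name `docker_platform_for` and the variable names are hypothetical.

```shell
# Hypothetical helper: map `uname -m` output to a Docker platform string.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Assumed target: the amd64 Linux servers the images are deployed to.
target_platform="linux/amd64"
host_platform="$(docker_platform_for "$(uname -m)")"

if [ "$host_platform" != "$target_platform" ]; then
  echo "Host is $host_platform: pass --platform $target_platform to docker build"
fi
```

Passing `--platform linux/amd64` unconditionally is also safe on amd64 hosts, so the check mainly serves as a reminder of why the flag is there.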

Use --rm for one-shot containers

Always use --rm for analysis containers to avoid accumulating stopped containers. Use -d (detached) for long-running jobs.

Always use --user root for write access

Most bioinformatics containers run as non-root users. If a container writes to bind-mounted volumes, add --user root to avoid permission errors.
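The two tips above (`--rm` and `--user root`) can be combined in one helper for one-shot analysis runs that write results to the host. This is only a sketch: the function name `run_writable`, the `/data` mount point, and the `DRY_RUN` variable are assumptions, and the image and tool names are placeholders.

```shell
# Hypothetical helper: one-shot container, run as root, with a host
# directory bind-mounted at /data (an assumed mount point) for output.
run_writable() {
  host_dir="$1"
  shift
  set -- docker run --rm --user root -v "${host_dir}:/data" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of executing it
  else
    "$@"
  fi
}

# Preview the command; image and tool arguments are placeholders.
DRY_RUN=1
run_writable "$PWD/results" example/image:tag some-tool --out /data/result.vcf
```

On a Linux host, files written this way are typically owned by root on the host side, so a later `chown` may be needed before other tools can modify them.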

CI / Workflow Issues

ShellCheck warnings still fail the GitHub Action

Protect main only after CI is green

MToolBox Issues

MToolBox: No working Docker image exists

CNVnator Issues

CNVnator: Biocontainer tag with wrong build hash

Delly Issues

Delly: Biocontainer tag doesn’t exist

Delly: SV annotation phase takes 2-3 hours

Delly: Output is BCF format, not VCF

CNVnator: ROOT file appears empty (266 bytes) during tree extraction

CPSR/PCGR Issues

CPSR: --pcgr_dir path confusion (PCGR 1.x, historical)

CPSR: Docker image is inside PCGR

PCGR 2.x Migration (1.4.1 to 2.2.5)

Michigan Imputation Server Notes

Minimum 20 samples per job

Registration required

TOPMed panel is best for Europeans

Alternative Variant Caller Issues

FreeBayes 1.3.7: SIGILL crash (exit code 132)

FreeBayes: Memory grows to ~13 GB on full genome

FreeBayes: Single-threaded, no parallelism

GATK HaplotypeCaller: bcftools index fails on existing .tbi

GATK HaplotypeCaller: Requires .dict file

bcftools isec: -R vs -r for region strings

TIDDIT >=3.9: Requires BWA index for local assembly

TIDDIT: Image tag 3.7.0 doesn’t exist on quay.io

Strelka2: --callRegions needs bgzipped + tabixed BED

bgzip/tabix not in staphb/samtools or staphb/bcftools images

FreeBayes chr22 variant count (3x more than DeepVariant)

Chip Data Conversion (Genotyping Arrays → VCF)

MyHeritage CSV must be converted to TSV

bcftools hg19 VCF needs chr prefix before liftover

PharmCAT chip vs WGS results (MyHeritage GSA, verified 2026-03-31)

ROH and PRS need special flags for chip data

v0.3.0 Tool Additions (Apr 2026)

ExpansionHunter v5.0.0: Completely different CLI from v2.5.5

GRIDSS: Requires BWA index (not minimap2)

GRIDSS: 32 GB memory requirement

GRIDSS: ENCODE blacklist download

fastp: Maximum 16 threads despite --workers flag

fastp: BGI/MGI adapter auto-detection

mosdepth: --fast-mode skips per-base output

MultiQC: Auto-discovers fastp JSON by content, not filename

Octopus: No issues observed

PharmCAT 3.x Migration (2.15.5 to 3.2.0)

Preprocessor script renamed (no .py extension)

Reporter flags: must be explicit for both formats

JSON property rename: wildtypeAllele to referenceAllele

New features in 3.2.0