genomics-pipeline

Lessons Learned

Every failure encountered during pipeline development (Mar 2026), documented so it doesn’t happen again.

Docker Image Issues

AnnotSV: Official image doesn’t exist

SnpEff/SnpSift: Combined package

SnpEff Database: Azure blob storage outage

ExpansionHunter: Different image formats

ExpansionHunter: Missing required --log parameter

StellarPGx: Empty Docker image

Tool-Specific Issues

TelomereHunter: Permission denied

TelomereHunter: pip install on host fails

HLA-LA: Graph not serialized (3 failures)

HLA-LA: Binary crash with pre-built graph image

T1K: Coordinate file with wrong values

T1K: BAM extraction produces 0-byte FASTQ

Cyrius CYP2D6: Inconclusive on short-read WGS

bcftools/htslib Issues

bgzip/tabix not in bcftools image PATH

MIS VCF conversion: 0-byte output files

VEP Cache Issues

VEP INSTALL.pl: Permission denied on temp directory

VEP INSTALL.pl: Silent download failure

General Docker Tips

Always use resource limits

docker run --cpus 4 --memory 8g ...

Without limits, tools like DeepVariant or minimap2 can consume all available RAM and crash the host.
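To make the limits hard to forget, the flags can be wrapped in a small shell function. This is a minimal sketch, not part of the pipeline: `run_limited`, `LIMIT_CPUS`, `LIMIT_MEM`, and `DRY_RUN` are hypothetical names, and the 4 CPU / 8g defaults simply mirror the command above. Setting `--memory-swap` to the same value as `--memory` is an added assumption; it keeps the container from falling back to swap once the memory cap is reached.

```shell
# Hypothetical wrapper: always start containers with CPU and memory caps.
# Override the defaults with LIMIT_CPUS / LIMIT_MEM; set DRY_RUN=1 to print
# the assembled command instead of running it.
run_limited() {
  cpus="${LIMIT_CPUS:-4}"
  mem="${LIMIT_MEM:-8g}"
  # Prepend the docker invocation and limits to the caller's arguments.
  set -- docker run --cpus "$cpus" --memory "$mem" --memory-swap "$mem" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 run_limited my-image:tag my-tool --help` prints the command without starting a container (the image and tool names are placeholders).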

Build for amd64 from macOS

docker build --platform linux/amd64 ...

Apple Silicon Macs are arm64; most Linux servers are amd64. Images built on such a Mac without --platform linux/amd64 default to arm64 and won’t run natively on amd64 servers.
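A build script can check the host architecture before deciding whether the flag is needed. The sketch below assumes the deployment target is always linux/amd64; the function names `docker_platform_for` and `needs_platform_flag` are hypothetical, and the mapping covers only the common `uname -m` values.

```shell
# Map a `uname -m` value to the matching Docker platform string.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    arm64|aarch64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Succeeds (exit 0) when the host is not amd64, i.e. when an explicit
# --platform linux/amd64 is needed for an amd64 target (assumed here).
# Unknown architectures are treated as needing the flag.
needs_platform_flag() {
  [ "$(docker_platform_for "$1")" != "linux/amd64" ]
}
```

A typical call would be `needs_platform_flag "$(uname -m)" && echo "build with --platform linux/amd64"`. After building, `docker image inspect --format '{{.Os}}/{{.Architecture}}' <image>` shows which platform the image actually targets.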

Use --rm for one-shot containers

Always use --rm for analysis containers to avoid accumulating stopped containers. Use -d (detached) for long-running jobs.

Always use --user root for write access

Many bioinformatics containers run as a non-root user. When writing to bind-mounted volumes, add --user root to avoid permission errors.
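The pattern can be sketched as a helper that creates the output directory, resolves it to an absolute path, and bind-mounts it into a root-run container. Everything named here is an assumption for illustration: the function `run_as_root_with_output`, the `/output` mount point inside the container, and the `DRY_RUN` switch.

```shell
# Hypothetical helper: run a one-shot container as root with a writable
# bind-mounted output directory. Set DRY_RUN=1 to print the command only.
run_as_root_with_output() {
  outdir="$1"
  shift
  # Create the host directory first so the mount source exists.
  mkdir -p "$outdir" || return 1
  # Resolve to an absolute path, which -v needs to treat it as a bind mount.
  abs_out="$(cd "$outdir" && pwd)"
  set -- docker run --rm --user root -v "$abs_out:/output" "$@"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Files written this way are owned by root on the host, so a follow-up such as `sudo chown -R "$(id -u):$(id -g)" <outdir>` may be needed before other tools can modify them.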

CI / Workflow Issues

ShellCheck warnings still fail the GitHub Action

Protect main only after CI is green

MToolBox Issues

MToolBox: No working Docker image exists

CNVnator Issues

CNVnator: Biocontainer tag with wrong build hash

Delly Issues

Delly: Biocontainer tag doesn’t exist

Delly: SV annotation phase takes 2-3 hours

Delly: Output is BCF format, not VCF

CNVnator: ROOT file appears empty (266 bytes) during tree extraction

CPSR/PCGR Issues

CPSR: --pcgr_dir path confusion

CPSR: Docker image is inside PCGR

Michigan Imputation Server Notes

Minimum 20 samples per job

Registration required

TOPMed panel is best for Europeans

Alternative Variant Caller Issues

FreeBayes 1.3.7: SIGILL crash (exit code 132)

FreeBayes: Memory grows to ~13 GB on full genome

FreeBayes: Single-threaded, no parallelism

GATK HaplotypeCaller: bcftools index fails on existing .tbi

GATK HaplotypeCaller: Requires .dict file

bcftools isec: -R vs -r for region strings

TIDDIT >=3.9: Requires BWA index for local assembly

TIDDIT: Image tag 3.7.0 doesn’t exist on quay.io

Strelka2: --callRegions needs bgzipped + tabixed BED

bgzip/tabix not in staphb/samtools or staphb/bcftools images

FreeBayes chr22 variant count (3x more than DeepVariant)

Chip Data Conversion (Genotyping Arrays → VCF)

MyHeritage CSV must be converted to TSV

bcftools hg19 VCF needs chr prefix before liftover

PharmCAT chip vs WGS results (MyHeritage GSA, verified 2026-03-31)

ROH and PRS need special flags for chip data