Every failure encountered during pipeline development (Mar 2026), documented so it doesn't happen again.
- `bioinfochrustrasbourg/annotsv:3.4.4` — no such image on Docker Hub. Use `getwilds/annotsv:latest` instead (Fred Hutch maintained).
- `quay.io/biocontainers/snpsift:5.2--hdfd78af_1` — no such manifest. Use `quay.io/biocontainers/snpeff:5.2--hdfd78af_1` — SnpSift is bundled inside the SnpEff package.
- `snpEff download GRCh38.105` (and all other database names) — Azure blob storage returned 0-byte files for ALL URLs, including GRCh38.105 and GRCh38.mane.1.2.ensembl.
- `quay.io/biocontainers/expansionhunter:5.0.0--hd03093a_1` — manifest not found. Also `mgibio/expansionhunter:latest` — not found. Use `weisburd/expansionhunter:latest` — binary at `/ExpansionHunter/bin/ExpansionHunter`, variant catalogs at `/pathogenic_repeats/GRCh38/`. This build takes `--repeat-specs` (a directory), not `--variant-catalog` (a single JSON), and needs `--log /output/sample_eh.log` in the command.
- `twesigomwedavid/stellarpgx:latest` — image exists but contains no StellarPGx binaries.
- `OSError: [Errno 13] Permission denied: '/output/sergio'` when writing output (the sample name in the error will vary). Add the `--user root` flag to `docker run`.
- `pip install telomerehunter` on the host gives `UnicodeDecodeError` — Python environment issues. Use `lgalarno/telomerehunter:latest` instead of a native install.
- `zlskidmore/hla-la:latest` — the graph at `src/additionalReferences/PRG_MHC_GRCh38_withIMGT/` exists but is NOT serialized, so HLA-LA exits silently. Pointing it at `graphs/PRG_MHC_GRCh38_withIMGT/` gives "graph not complete". Running `--action prepareGraph` in a detached container — the container exited during prep, graph still not serialized. Use `jiachenzdocker/hla-la:latest` (27.5GB image with pre-built graph) OR switch to T1K.
- `jiachenzdocker/hla-la:latest` — read extraction succeeds but the HLA-LA C++ binary crashes during graph alignment even with 32GB RAM and 8 threads. Error: "HLA-LA execution not successful."
- `t1k-build.pl -d hla.dat -g reference.fasta.fai` produced a coordinate file with `chr19 -1 -1 +` for all HLA genes. The `-g` parameter expects the actual reference FASTA (3.1GB), not the FAI index (158KB).
  The AddGeneCoord.pl script needs to align HLA sequences against the genome to find coordinates, so pass `-g Homo_sapiens_assembly38.fasta` (the full FASTA, not the .fai).
- T1K's bam-extractor runs but produces empty `_candidate_1.fq` and `_candidate_2.fq` — the coordinate file had `-1 -1` coordinates (see above), so no genomic region was extracted.
- `None` reported for CYP2D6 star alleles.
- `staphb/bcftools:1.21` does not include bgzip or tabix in `$PATH`. Use `bcftools view -Oz -o output.vcf.gz` (native bgzip output) and `bcftools index -t` (native tabix index) instead of piping to bgzip/tabix, or switch to `quay.io/biocontainers/samtools:1.21`, which includes all htslib tools.
- Files in `mis_ready/` were 0 bytes — the script piped through `bgzip -c > output.vcf.gz`, which failed silently because bgzip wasn't available. Use `bcftools view -Oz -o` instead.
- VEP: `Cannot open Local file /opt/vep/.vep/tmp/homo_sapiens_vep_112_GRCh38.tar.gz`. Run with `--user root` and pre-create the temp dir: `mkdir -p /opt/vep/.vep/tmp && chmod 777 /opt/vep/.vep/tmp`. Better: download the cache with `wget -c` (supports resume), extract with `tar xzf`, then run VEP with `--cache --dir_cache /path/to/cache`.
- Always set resource limits: `docker run --cpus 4 --memory 8g ...`
Without limits, tools like DeepVariant or minimap2 will consume ALL available RAM and crash the host.
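Several failures above (the 0-byte snpEff downloads, the silent bgzip pipe) share one symptom: a step "succeeds" but leaves an empty file. A small guard catches this whole class of bug. This is a sketch, not part of the pipeline; the function name is invented.

```shell
# require_nonempty: fail loudly if any listed output file is missing or 0 bytes.
# Call it after every download or compression step instead of trusting exit codes.
require_nonempty() {
  rc=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "ERROR: $f is missing or empty" >&2
      rc=1
    fi
  done
  return $rc
}

# Example (hypothetical path): verify a downloaded snpEff database before use
# require_nonempty snpEff/data/GRCh38.105/snpEffectPredictor.bin || exit 1
```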
Always build with `docker build --platform linux/amd64 ...`
macOS is arm64; most Linux servers are amd64. Images built on Mac without --platform won’t run on amd64 servers.
Always use --rm for analysis containers to avoid accumulating stopped containers. Use -d (detached) for long-running jobs.
Most bioinformatics containers run as non-root users. If writing to bind-mounted volumes, add --user root to avoid permission issues.
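The run conventions above can be collected into one wrapper. This is a sketch, not code from the pipeline: the function name and defaults are invented, and `DRY_RUN=1` prints the assembled command so flags can be checked without Docker installed.

```shell
# run_tool: assemble a docker run command following the rules above:
# --rm, --user root for bind-mounted output, resource limits, amd64 platform.
# Set DRY_RUN=1 to print the command instead of executing it.
run_tool() {
  image=$1; shift
  cmd="docker run --rm --user root --platform linux/amd64 \
--cpus ${CPUS:-4} --memory ${MEMORY:-8g} \
-v ${OUTDIR:-$PWD/output}:/output $image $*"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    eval "$cmd"   # sketch only: real use should quote arguments properly
  fi
}
```

Usage: `DRY_RUN=1 run_tool getwilds/annotsv:latest AnnotSV --help` prints the full command for review.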
- `ludeeus/action-shellcheck@master` exits non-zero even when configured with `severity: warning`, so as a required check on `main`, "warning-only" findings still block merges. Make checks required on `main` only after CI is green: get Lint and Smoke Tests green first, then enable required checks, block force-pushes, and block deletion.
- `robertopreste/mtoolbox:latest` — "repository does not exist or may require docker login". Use Mutect2 (`broadinstitute/gatk:latest`) instead; Mutect2 handles mitochondrial heteroplasmy detection natively and is well-maintained.
- `quay.io/biocontainers/cnvnator:0.4.1--py312hc02a2a2_7` — manifest not found. Use `quay.io/biocontainers/cnvnator:0.4.1--py312h99c8fb2_11`. Biocontainer hashes encode the conda build hash and change between builds. Always verify at quay.io/repository/biocontainers/cnvnator.
- `quay.io/biocontainers/delly:1.2.9--ha41ced6_0` — manifest not found. Use `quay.io/biocontainers/delly:1.7.3--hd6466ae_0` (latest as of Mar 2026). Biocontainer tags are version-specific and change frequently — always verify at quay.io/repository/biocontainers/delly.
- For long-running jobs, use `docker stats` to confirm the container is still using CPU. If CPU is at 0%, the process may actually be stuck.
- Delly output arrives with a `.bcf` extension. Convert with `bcftools view input.bcf -Oz -o output.vcf.gz` and index with `bcftools index -t output.vcf.gz`. The pipeline script handles this automatically.
- During CNVnator's `-tree` step, the `.root` file stays at 266 bytes (just the ROOT header) until the entire BAM is parsed. If it never grows, a `.root` file from a previous failed run is blocking it. Delete and retry.
- `cpsr --pcgr_dir /genome/pcgr_data/data` → "Data directory (/genome/pcgr_data/data/data) does not exist". CPSR appends `/data` to whatever `--pcgr_dir` you pass.
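A preflight check avoids the failed CPSR launch entirely. This is a sketch with an invented function name, assuming the standard extracted bundle layout with a `data/` subdirectory.

```shell
# check_pcgr_dir: verify the --pcgr_dir argument before launching CPSR.
# It must be the PARENT of data/, because CPSR appends /data to whatever you pass.
check_pcgr_dir() {
  if [ -d "$1/data" ]; then
    echo "OK: $1/data exists; pass --pcgr_dir $1"
  elif [ "$(basename "$1")" = "data" ] && [ -d "$1" ]; then
    echo "ERROR: pass the parent directory $(dirname "$1"), not $1" >&2
    return 1
  else
    echo "ERROR: $1/data not found" >&2
    return 1
  fi
}
```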
  If you point to the data/ directory inside the extracted bundle, it looks for data/data/. Point `--pcgr_dir` to the parent of the data/ directory: `--pcgr_dir /genome/pcgr_data` (not `/genome/pcgr_data/data`).
- `sigven/cpsr:2.0.0` does not exist on Docker Hub. Use `sigven/pcgr:1.4.1`, which bundles both pcgr and cpsr binaries at `/usr/local/bin/`.
- The `chr` prefix is required (GRCh38 BAMs already have it).
- `quay.io/biocontainers/freebayes:1.3.7--h1870644_0` — `freebayes --version` works, but actual variant calling triggers SIGILL (illegal instruction, exit code 132). Use `quay.io/biocontainers/freebayes:1.3.6--hbfe0e7f_2`, which works correctly.
- `--memory 16g` was too tight — it would have been OOM-killed at 80% usage. Use `--memory 32g` for full-genome runs. Peak observed was 12.8GB, but growth is non-linear and region-dependent.
- freebayes has no `-t` or `--threads` flag. A full 30X WGS run takes 8-12 hours. Use `--region chr22` (or `INTERVALS=chr22`) for quick testing (~20-40 min).
- `bcftools index -t` fails with "index file exists" after GATK already creates its own .tbi. Use `bcftools index -ft` (the `-f` force flag) to overwrite the GATK-generated index.
- If `Homo_sapiens_assembly38.dict` is missing, create it with `gatk CreateSequenceDictionary -R /genome/reference/Homo_sapiens_assembly38.fasta`.
- `bcftools isec -R chr22` treats `-R` (uppercase) as a BED file path and fails with "file not found". Use `-r chr22` (lowercase) for region strings; `-R` expects a file.
- `tiddit --sv` exits with "The reference must be indexed using bwa index; run bwa index, or skip local assembly (--skip_assembly)". Pass `--skip_assembly` when using minimap2 alignments (no BWA index available). If using BWA-MEM2 alignment, the index files are compatible.
- `quay.io/biocontainers/tiddit:3.7.0--py312h24f4cff_1` — manifest unknown. Use `quay.io/biocontainers/tiddit:3.9.5--py312h6e8b409_0`. Always verify tags at quay.io/repository/biocontainers/tiddit.
- `--callRegions reference.fasta.fai` → "Can't find expected call-regions bed index file". `--callRegions` needs a bgzip-compressed, tabix-indexed BED. Neither `staphb/samtools:1.20` nor `staphb/bcftools:1.21` includes bgzip or tabix in PATH; use `broadinstitute/gatk:4.6.1.0`, which has both at `/usr/bin/bgzip` and `/usr/bin/tabix`.
  Or use `bcftools view -Oz` as a bgzip alternative.
- Filter with `bcftools filter` or `vcffilter` before use.
- 23andMe import via `plink --23file` (1.9) plus `plink2 --ref-from-fa force` to fix REF/ALT is broken for homozygous-ALT sites: .bim stores only one allele for these, and `--ref-from-fa` cannot create a proper ALT because there's no second allele slot, so homozygous ALT genotypes silently become homozygous REF. plink: REF=A, ALT=., GT=0/0 (WRONG). bcftools: REF=T, ALT=A, GT=1/1 (CORRECT). ~66K positions (11%) corrupted. Use `bcftools convert --tsv2vcf -f <reference.fa>` instead. Single command, no intermediate binary format.
- `--allow-extra-chr` cannot be used with `--23file`.
- Files with `"RSID","CHROMOSOME","POSITION","RESULT"` columns need preprocessing: strip `##` comments, the header line, and quotes; convert commas to tabs.
- Run `bcftools annotate --rename-chrs` between conversion and liftover.
- `-G30` required (no FORMAT/PL in chip VCF).
- `no-mean-imputation` required for `--score` (single sample lacks allele frequencies).
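The CSV cleanup above fits in one awk pass. This is a sketch with an invented function name; it assumes the quoted four-column layout and `##` comment lines described above, and the column mapping should be verified against the actual export file.

```shell
# chip_csv_to_tsv: strip ## comments, the header row, and quotes; commas -> tabs.
# Emits 4 tab-separated columns (RSID, CHROMOSOME, POSITION, RESULT) for
# downstream conversion with bcftools convert --tsv2vcf.
chip_csv_to_tsv() {
  awk -F',' '
    /^##/ { next }                       # metadata comment lines
    $1 ~ /RSID/ { next }                 # header row
    {
      gsub(/"/, "")                      # drop all quotes (re-splits fields)
      print $1 "\t" $2 "\t" $3 "\t" $4
    }' "$1"
}
```

Usage: `chip_csv_to_tsv raw_export.csv > chip.tsv`, then convert and rename chromosomes before liftover as noted above.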