genomics-pipeline

Step 2: Alignment (FASTQ to BAM)

What This Does

Aligns paired-end sequencing reads (FASTQ) to the GRCh38 human reference genome and produces a coordinate-sorted, indexed BAM file.

Why

Alignment maps each 150bp sequencing read to its position in the human genome. All downstream variant calling depends on this step.
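Each aligned read ends up as one record in the BAM, which is the compressed binary form of SAM. To show what "its position in the genome" looks like in the output, the sketch below prints the read name, reference sequence, and 1-based leftmost position (SAM columns 1, 3 and 4) from SAM text such as the output of samtools view. The function name sam_positions and the example record are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print QNAME, RNAME and POS (SAM columns 1, 3, 4)
# for each alignment record read from stdin.
# Header lines (starting with '@') are skipped.
sam_positions() {
  awk -F'\t' '!/^@/ { print $1, $3, $4 }'
}

# Example with a made-up record:
# printf 'read1\t99\tchr1\t10468\t60\t150M\t=\t10700\t382\t*\t*\n' | sam_positions
# → read1 chr1 10468
```

On the real output this could be run as, for example, samtools view ${SAMPLE}_sorted.bam | head | sam_positions (samtools view omits the header unless -h is given).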

Tools

minimap2 (short-read alignment with the sr preset) and samtools (sorting and indexing).

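Before a multi-hour run it can help to confirm both tools are actually on PATH. This is a minimal sketch; require_tools is a hypothetical helper name, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any required tools that are not on PATH.
# Returns 0 if every tool is found, 1 otherwise.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# require_tools minimap2 samtools || exit 1
```

Recording the output of minimap2 --version and samtools --version next to the BAM is a cheap way to keep runs reproducible.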
Docker Images

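Prerequisites

The GRCh38 reference FASTA (Homo_sapiens_assembly38.fasta) under ${GENOME_DIR}/reference/, and the paired-end FASTQ files ${SAMPLE}_R1.fastq.gz and ${SAMPLE}_R2.fastq.gz under ${GENOME_DIR}/${SAMPLE}/fastq/, as used in the commands below.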

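minimap2 treats the two FASTQ files as mates listed in the same order, so a read-count mismatch between R1 and R2 points to truncated or mismatched files. A minimal sketch of an input check, assuming gzipped FASTQ with 4 lines per record; the function names are hypothetical, and counting decompresses both files in full, so on 30X WGS it takes a while.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a gzipped FASTQ (4 lines per record).
fastq_read_count() {
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical helper: check that the reference and both FASTQs exist and
# are non-empty, and that R1 and R2 hold the same number of reads.
check_inputs() {
  local ref=$1 r1=$2 r2=$3 f
  for f in "$ref" "$r1" "$r2"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty input: $f" >&2
      return 1
    fi
  done
  local n1 n2
  n1=$(fastq_read_count "$r1")
  n2=$(fastq_read_count "$r2")
  if [ "$n1" != "$n2" ]; then
    echo "read count mismatch: R1=$n1 R2=$n2" >&2
    return 1
  fi
  echo "inputs OK: $n1 read pairs"
}

# Usage:
# check_inputs "$REF" \
#   "${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R1.fastq.gz" \
#   "${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R2.fastq.gz"
```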
Commands

SAMPLE=your_sample
GENOME_DIR=/path/to/your/data
REF=${GENOME_DIR}/reference/Homo_sapiens_assembly38.fasta

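# Step 1: Create minimap2 index (one-time, ~30 min)
# Build with the same -x sr preset used for mapping: k-mer and window sizes
# are fixed in the .mmi and cannot be changed at mapping time.
minimap2 -x sr -d ${GENOME_DIR}/reference/GRCh38.mmi $REF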

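# Step 2: Align + sort (1-2 hours for 30X WGS)
# samtools sort does not create missing directories, so create the output dir first.
mkdir -p ${GENOME_DIR}/${SAMPLE}/aligned
# -R adds a read group (@RG) carrying the sample name; GATK tools, for example, require one.
minimap2 -a -x sr -t 16 \
  -R "@RG\tID:${SAMPLE}\tSM:${SAMPLE}" \
  ${GENOME_DIR}/reference/GRCh38.mmi \
  ${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R1.fastq.gz \
  ${GENOME_DIR}/${SAMPLE}/fastq/${SAMPLE}_R2.fastq.gz \
| samtools sort -@ 8 -o ${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam -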

# Step 3: Index BAM
samtools index ${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam

# Output: ~30-40GB BAM + ~9MB BAI index
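Once indexing finishes, a quick sanity check on the outputs can catch an interrupted run before variant calling starts. This is a minimal sketch with a hypothetical function name; it relies on samtools index writing the index as <bam>.bai, and on samtools quickcheck, which checks the BAM header and end-of-file marker rather than validating every record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the sorted BAM and its .bai index exist and are
# non-empty; if samtools is on PATH, also run samtools quickcheck on the BAM.
check_outputs() {
  local bam=$1
  if [ ! -s "$bam" ] || [ ! -s "${bam}.bai" ]; then
    echo "missing BAM or index: $bam" >&2
    return 1
  fi
  if command -v samtools >/dev/null 2>&1; then
    samtools quickcheck "$bam" || { echo "quickcheck failed: $bam" >&2; return 1; }
  fi
  echo "outputs OK: $bam"
}

# Usage:
# check_outputs ${GENOME_DIR}/${SAMPLE}/aligned/${SAMPLE}_sorted.bam
```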

Resource Requirements

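The BAM alone is roughly 30-40GB, and samtools sort may also write temporary files while sorting (see its -T option), so checking free disk space before the multi-hour alignment can save a failed job. A minimal sketch using POSIX df; the function name has_free_space and the 80GB threshold in the usage line are illustrative assumptions, not measured requirements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# MIN_GB gigabytes available. df -P -k prints one line per filesystem in
# 1024-byte blocks; column 4 of the data line is the available space.
has_free_space() {
  local dir=$1 min_gb=$2 avail_kb
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# Usage (threshold is an assumption; adjust to your data):
# has_free_space "${GENOME_DIR}" 80 || echo "warning: low disk space" >&2
```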
Notes