Runs principal component analysis (PCA) on your sample using common SNPs shared with the 1000 Genomes Project reference panel. Because this is single-sample PCA (not a joint projection with the 1000G cohort), the resulting PC values capture your genome’s internal variance structure but are not directly comparable to published population cluster plots. See Single-sample limitation below for details.
Knowing your genetic ancestry is useful for two practical reasons:
Note: This step produces single-sample PCA, which is a starting point but cannot place you on a population map without joint analysis. See Interpreting Results for details.
pgscatalog/plink2:2.00a5.10
staphb/bcftools:1.21
${GENOME_DIR}/${SAMPLE}/vcf/${SAMPLE}.vcf.gz
./scripts/26-ancestry.sh your_name
bcftools isec

| File | Contents |
|---|---|
| ${SAMPLE}_pca.eigenvec | Principal component values (10 PCs per sample) |
| ${SAMPLE}_pca.eigenval | Eigenvalues showing variance explained by each PC |
| ${SAMPLE}_shared.vcf.gz | SNPs shared between your sample and 1000G |
| ${SAMPLE}_ld.prune.in | SNPs retained after LD pruning |
| ${SAMPLE}_ld.prune.out | SNPs removed by LD pruning |
All output is written to ${GENOME_DIR}/${SAMPLE}/ancestry/. Reference data is cached in ${GENOME_DIR}/ancestry_ref/.
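To confirm that a run finished cleanly, one option is to check that every file from the output table exists. The following is a minimal sketch, not part of the pipeline: the helper name `missing_outputs` and the constant `OUTPUT_SUFFIXES` are hypothetical, and the directory layout is assumed to match the path stated above.

```python
from pathlib import Path

# File suffixes taken from the output table; the list name is illustrative.
OUTPUT_SUFFIXES = [
    "_pca.eigenvec",
    "_pca.eigenval",
    "_shared.vcf.gz",
    "_ld.prune.in",
    "_ld.prune.out",
]


def missing_outputs(genome_dir, sample):
    """Return the expected output files that are not present on disk.

    Assumes outputs live in <genome_dir>/<sample>/ancestry/ and are named
    <sample><suffix>, as described in this section.
    """
    out_dir = Path(genome_dir) / sample / "ancestry"
    expected = [out_dir / f"{sample}{suffix}" for suffix in OUTPUT_SUFFIXES]
    return [path for path in expected if not path.is_file()]
```

Calling `missing_outputs(genome_dir, "your_name")` with your own `GENOME_DIR` value returns an empty list when all five files are in place.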
~15-30 minutes (dominated by the initial 1000G download on first run; subsequent runs are faster).
The eigenvec file contains your sample’s coordinates on 10 principal components. The eigenval file shows how much variance each PC explains.
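As a rough illustration of reading these two files, the sketch below parses their text contents in plain Python. It assumes the usual plink2 layout: a whitespace-separated eigenvec file whose header line starts with `#` and contains an `IID` column followed by `PC1`, `PC2`, ..., and an eigenval file with one eigenvalue per line. The function names are hypothetical, and the shares it computes are relative to the reported eigenvalues only, not to the total variance in the data.

```python
def parse_eigenvec(text):
    """Map each sample ID to its list of PC coordinates.

    Assumes a header line such as '#IID PC1 PC2 ...' or '#FID IID PC1 ...'.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].lstrip("#").split()
    iid_col = header.index("IID")
    pc_cols = [i for i, name in enumerate(header) if name.startswith("PC")]
    coords = {}
    for line in lines[1:]:
        fields = line.split()
        coords[fields[iid_col]] = [float(fields[i]) for i in pc_cols]
    return coords


def variance_shares(text):
    """Return each eigenvalue divided by the sum of all reported eigenvalues.

    Assumes one eigenvalue per line (any whitespace separation is accepted).
    """
    values = [float(token) for token in text.split()]
    total = sum(values)
    return [value / total for value in values]
```

To use it, pass the file contents, for example `parse_eigenvec(open(path).read())`, where `path` points at your `${SAMPLE}_pca.eigenvec` file.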
This script runs PCA on your sample alone, not jointly with the 1000G reference panel. This is a fundamental limitation: in population-structure PCA (Price et al. 2006), the PC axes are defined by the variance across many individuals. With a single sample, the axes instead capture internal genotype variance (e.g., heterozygosity patterns), which does not map onto population-level structure.
The PC values from this step are not comparable to published 1000G PCA plots, where PC1 separates African from non-African ancestry and PC2 separates European from East Asian. Those axis interpretations require joint PCA across a multi-population cohort.
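The dependence on multiple individuals can be seen in a toy calculation. Population-structure PCA works from how genotypes vary across samples at each SNP; the sketch below computes that per-SNP variance for a small made-up cohort and for a single sample. The genotype values (alternate-allele counts of 0, 1, or 2) are invented purely for illustration, and this is not how the pipeline or plink2 computes anything internally.

```python
def per_snp_variance(genotypes):
    """Population variance of allele counts across samples, one value per SNP.

    genotypes: list of samples, each a list of 0/1/2 alternate-allele counts.
    """
    n_samples = len(genotypes)
    variances = []
    for column in zip(*genotypes):  # one column = one SNP across all samples
        mean = sum(column) / n_samples
        variances.append(sum((g - mean) ** 2 for g in column) / n_samples)
    return variances


# Made-up genotypes: three samples, four SNPs.
cohort = [
    [0, 1, 2, 0],
    [2, 1, 0, 0],
    [1, 1, 2, 0],
]
single = [cohort[0]]  # the same first sample, analysed on its own

cohort_variance = per_snp_variance(cohort)
single_variance = per_snp_variance(single)
```

With three samples, some SNPs show non-zero variance across individuals, which is the signal joint PCA decomposes. With one sample, every SNP's across-sample variance is zero, so there is no between-individual structure for the axes to describe.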
To properly place yourself on a population map, you would need to:
This pipeline does not perform joint PCA. The single-sample output is included as a starting point for users who want to extend it with their own reference panel.
${GENOME_DIR}/ancestry_ref/. Delete this directory to force re-download.
--admixture approach.