Some tips when working with PLINK2

1 minute read

Published:

PLINK is a well-established software for genetic analysis. In many project, we used plink2 for genome-wide association study (GWAS) and other computations related to the raw genotype matrix. Here, I list several tips on the use of this software.

pgenlibr to read plink2 pgen files from R

To read genotype data stored in plink2 pgen/pvar/psam files, one can use an R package called pgenlibr. Here I show some example usage of pgenlibr.

How to install pgenlibr in your R environment?

$ git clone https://github.com/chrchang/plink-ng.git
$ R
> install.packages('plink-ng/2.0/pgenlibr', repos = NULL, type='source')

Some basic usage of pgenlibr

require(pgenlibr)

pfile <- '@@@/sample_genotype_data'

pvar <- NewPvar(paste0(pfile, '.pvar.zst'))
pgen <- NewPgen(paste0(pfile, '.pgen'), pvar=pvar)

GetVariantId(pvar, 1)
GetVariantId(pvar, 10)
GetVariantCt(pvar)
GetRawSampleCt(pgen)

# read a single variant
buf <- pgenlibr::Buf(pgen)
pgenlibr::Read(pgen, buf, 1)

# read a list of variants
## var.idxs (list of variants you would like to read)
geno_mat <- ReadList(pgen, var.idxs, meanimpute=F)

Running recessive model on chrX

As illustrated in PLINK2 website, --glm recessive will be the modifier to run GWAS scan with recessive model. However, this does not work well for X chromosome (chrX). To mitigate this limitation, we can record the chromosome X in the pvar file and pretend as if another autosome first and run thee association analysis.

  1. use --output-chr 26 to change the .pvar to encode chrX as “23”
  2. run the original --glm command with e.g. --chr-set 40 (which specifies 40 instead of 22 autosomes).

GWAS of multiple quantitative traits

Starting plink2/20190401, the program supports efficient linear regression of multiple quantitative phenotypes.

–glm without –adjust now detects groups of quantitative phenotypes with the same “missingness pattern”, and processes them together

One should use this feature for efficient computation.