Genetic risk modeling based on L1-penalized regression


I gave a research-in-progress talk in our department seminar.


The polygenic prediction has been extensively used to identify individuals with elevated risk in a given population. Albeit advancements in modern high-dimensional statistics, most of the genetic risk scores are built based on the significant associations from univariate marginal GWAS summary statistics for a single trait, limiting the incorporation of informative alleles with modest effects across the relevant phenotypes. To address this, we build genome-wide multivariate penalized regression models and their corresponding polygenic risk scores using batch screening iterative Lasso (BASIL) algorithm implemented in R snpnet package across more than 50 traits in UK Biobank, including serum and urine biomarkers and cancer phenotypes. We demonstrate its consistency with the GWAS-based approach and show improved predictive performance. As an example of further extensions of our approach, we introduce a method to combine risk scores across multiple traits to fit a better predictive model for a disease outcome.