Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Science outreach activities for high school students

less than 1 minute read

Published:

Occasionally, Yosuke gives science outreach talks. This post describes one such activity aimed at Japanese high school students.

Ergonomics

less than 1 minute read

Published:

Kinesis Advantage2 keyboard.

Using tmux on mac

less than 1 minute read

Published:

Terminal multiplexer software, such as tmux and screen, is a useful tool. When using tmux on mac, I encountered the following error.

Some tips when working with PLINK2

1 minute read

Published:

PLINK is a well-established software tool for genetic analysis. In many projects, we have used plink2 for genome-wide association studies (GWAS) and other computations on the raw genotype matrix. Here, I list several tips for using this software.

Working with very small values in R

less than 1 minute read

Published:

When you’re working with extremely small floating-point numbers in R (for example, very small p-values), there are a few options.

Using Bundler via Docker on mac

less than 1 minute read

Published:

Bundler is a convenient tool for managing Ruby environments. This website, for example, is built with Jekyll.

figshare API

less than 1 minute read

Published:

Figshare is a data hosting service for making scientific results available. While it has a nice web interface for uploading files, you sometimes want to upload a large file (> 5GB, for example). For that purpose, Figshare provides API access. Here is my example of how to use the API.

Useful links

less than 1 minute read

Published:

Here, I list publicly available resources.

SLURM

less than 1 minute read

Published:

SLURM is one of the most common job schedulers used on high-performance computing (HPC) clusters. Here, I summarize useful SLURM commands.

publications

Collaborative environmental DNA sampling from petal surfaces of flowering cherry Cerasus × yedoensis ‘Somei-yoshino’ across the Japanese archipelago

Published in Journal of Plant Research, 2018

Cerasus × yedoensis ‘Somei-yoshino’ is a common cherry blossom tree in Japan. Because this single clone is widespread across the country, it provides an excellent opportunity to investigate the composition of its microbiome and its interaction with the surrounding environment. As a pilot project for such an analysis, we collected 577 environmental DNA samples via crowdsourcing among participants of a scientific conference, and characterized the composition of microbial species on petal surfaces.


Citation: T. Ohta, T. Kawashima, N. O. Shinozaki, A. Dobashi, S. Hiraoka, T. Hoshino, K. Kanno, T. Kataoka, S. Kawashima, M. Matsui, W. Nemoto, S. Nishijima, N. Suganuma, H. Suzuki, Y. Taguchi, Y. Takenaka, Y. Tanigawa, M. Tsuneyoshi, K. Yoshitake, Y. Sato, R. Yamashita, K. Arakawa, W. Iwasaki, Collaborative environmental DNA sampling from petal surfaces of flowering cherry Cerasus × yedoensis ‘Somei-yoshino’ across the Japanese archipelago. J Plant Res. 131, 709-717 (2018). https://doi.org/10.1007/s10265-018-1017-x

Medical relevance of protein-truncating variants across 337,205 individuals in the UK Biobank study

Published in Nature Communications, 2018

Using the UK Biobank population cohort, we investigated the genetic effects of protein-truncating variants (PTVs) and their clinical impact.


Citation: C. DeBoever, Y. Tanigawa, M. E. Lindholm, G. McInnes, A. Lavertu, E. Ingelsson, C. Chang, E. A. Ashley, C. D. Bustamante, M. J. Daly, M. A. Rivas, Medical relevance of protein-truncating variants across 337,205 individuals in the UK Biobank study. Nat Commun. 9, 1612 (2018). https://doi.org/10.1038/s41467-018-03910-9

SNPs2ChIP: Latent Factors of ChIP-seq to infer functions of non-coding SNPs

Published in Pacific Symposium on Biocomputing, 2018

We propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly-available epigenetic datasets.

Citation: S. Anand, L. Kalesinskas, C. Smail, Y. Tanigawa, SNPs2ChIP: Latent Factors of ChIP-seq to infer functions of non-coding SNPs. Pac Symp Biocomput. 2019, 24: 184-195 (WORLD SCIENTIFIC, 2018). https://doi.org/10.1142/9789813279827_0017

Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics

Published in Bioinformatics, 2018

We present Global Biobank Engine as a platform to visualize genome- and phenome-wide associations and to perform statistical inference using those association data.


Citation: G. McInnes, Y. Tanigawa, C. DeBoever, A. Lavertu, J. E. Olivieri, M. Aguirre, M. A. Rivas, Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics. Bioinformatics 35(14), 2495-2497 (2019). https://doi.org/10.1093/bioinformatics/bty999

Significant shared heritability underlies suicide attempt and clinically predicted probability of attempting suicide

Published in Molecular Psychiatry, 2019

Using two independent datasets from genotyped cohorts (UK Biobank and electronic medical records (EMR) at Vanderbilt University Medical Center), we estimated the heritability of suicide attempt. We also showed the shared genetic basis of suicide attempt and other phenotypes.


Citation: D. M. Ruderfer, C. G. Walsh, M. W. Aguirre, Y. Tanigawa, J. D. Ribeiro, J. C. Franklin, M. A. Rivas, Significant shared heritability underlies suicide attempt and clinically predicted probability of attempting suicide. Mol Psychiatry. 25, 2422-2430 (2020). https://doi.org/10.1038/s41380-018-0326-8

[Preprint] A Fast and Scalable Framework for Large-scale and Ultrahigh-dimensional Sparse Regression with Application to the UK Biobank

Preprint posted on bioRxiv, 2019

In this project led by Junyang Qian, we developed BASIL, a novel algorithm that fits large-scale L1-penalized (Lasso) regression models using an iterative procedure, and implemented it in snpnet, an R package designed specifically for genetic data. We demonstrate the approach in an application to the UK Biobank dataset.


Citation: J. Qian, Y. Tanigawa, W. Du, M. Aguirre, R. Tibshirani, M. A. Rivas, T. Hastie, A Fast and Scalable Framework for Large-scale and Ultrahigh-dimensional Sparse Regression with Application to the UK Biobank. bioRxiv, 630079 (2019). https://doi.org/10.1101/630079

[Preprint] Genetics of 38 blood and urine biomarkers in the UK Biobank

Preprint posted on bioRxiv, 2019

We characterized the genetics of 35 biomarkers in UK Biobank. We performed association and fine-mapping analyses to prioritize causal variants, constructed polygenic risk score (PRS) models, and evaluated their medical relevance with causal inference and PRS-PheWAS. We demonstrate a new approach, called multi-PRS, that improves PRS by combining PRSs across traits.


Citation: N. Sinnott-Armstrong*, Y. Tanigawa*, D. Amar, N. J. Mars, M. Aguirre, G. R. Venkataraman, M. Wainberg, H. M. Ollila, J. P. Pirruccello, J. Qian, A. Shcherbina, FinnGen, F. Rodriguez, T. L. Assimes, V. Agarwala, R. Tibshirani, T. Hastie, S. Ripatti, J. K. Pritchard, M. J. Daly, M. A. Rivas, Genetics of 38 blood and urine biomarkers in the UK Biobank. bioRxiv, 660506 (2019). https://doi.org/10.1101/660506

[Preprint] WhichTF is dominant in your open chromatin data?

Preprint posted on bioRxiv, 2019

To identify functionally important transcription factors (TFs), we developed WhichTF. This method takes experimentally measured chromatin accessibility as input and returns a ranked list of TFs. For this task, we combined available genomic resources, such as gene regulatory domain models, conservation-aware prediction of TF binding sites, and ontology annotation of genes.


Citation: Y. Tanigawa*, E. S. Dyer*, G. Bejerano, WhichTF is dominant in your open chromatin data? bioRxiv, 730200 (2019). https://doi.org/10.1101/730200

Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology

Published in Nature Communications, 2019

While many pleiotropic genetic loci have been identified, how they contribute to phenotypes across traits and diseases is unclear. Here, the authors propose decomposition of genetic associations (DeGAs), which uses singular value decomposition, to characterize the underlying latent structure of genetic associations of 2,138 phenotypes.


Citation: Y. Tanigawa*, J. Li*, J. M. Justesen, H. Horn, M. Aguirre, C. DeBoever, C. Chang, B. Narasimhan, K. Lage, T. Hastie, C. Y. Park, G. Bejerano, E. Ingelsson, M. A. Rivas, Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology. Nat Commun. 10, 4064 (2019). https://doi.org/10.1038/s41467-019-11953-9

[Preprint] Polygenic risk modeling with latent trait-related genetic components

Preprint posted on bioRxiv, 2019

Polygenic risk scores (PRS) have been proposed for disease risk prediction with potential clinical relevance for some traits, but their personalized interpretation is generally difficult, especially when there exist disease subtypes driven by different genetic components. Here, we introduce dPRS (DeGAs-PRS) as an extension of Decomposition of Genetic Associations (DeGAs) to decompose the polygenic risk of an individual into latent components of genetic associations characterized from hundreds of thousands of traits.


Citation: M. Aguirre, Y. Tanigawa, G. Venkataraman, R. J. Tibshirani, T. Hastie, M. A. Rivas, Polygenic risk modeling with latent trait-related genetic components. bioRxiv, 808675 (2019). https://doi.org/10.1101/808675

[Preprint] Sex-specific genetic effects across biomarkers

Preprint posted on bioRxiv, 2019

In this study led by Emily Flynn, we discovered a surprising sex-specificity in the genetics of testosterone. Yosuke performed polygenic risk score (PRS) analysis and demonstrated that PRS models trained for each sex show improvements in predictive accuracy.


Citation: E. Flynn, Y. Tanigawa, F. Rodriguez, R. B. Altman, N. Sinnott-Armstrong, M. A. Rivas, Sex-specific genetic effects across biomarkers. bioRxiv, 837021 (2019). https://doi.org/10.1101/837021

[Preprint] Medical relevance of common protein-altering variants in GPCR genes across 337,205 individuals in the UK Biobank study

Preprint posted on bioRxiv, 2019

Citation: C. DeBoever, A. J. Venkatakrishnan, J. M. Paggi, F. M. Heydenreich, S.-A. Laurin, M. Masureel, Y. Tanigawa, G. Venkataraman, M. Bouvier, R. Dror, M. A. Rivas, Medical relevance of common protein-altering variants in GPCR genes across 337,205 individuals in the UK Biobank study. bioRxiv, 2019.12.13.876250 (2019). https://doi.org/10.1101/2019.12.13.876250

[Preprint] Fast Lasso method for Large-scale and Ultrahigh-dimensional Cox Model with applications to UK Biobank

Preprint posted on bioRxiv, 2020

We propose an extension of the BASIL/snpnet algorithm to fit L1-penalized Cox proportional hazards models using a large-scale dataset from a genotyped cohort. We present its application to 300+ time-to-event traits in UK Biobank.


Citation: R. Li, C. Chang, J. M. Justesen, Y. Tanigawa, J. Qian, T. Hastie, M. A. Rivas, R. J. Tibshirani, Fast Lasso method for Large-scale and Ultrahigh-dimensional Cox Model with applications to UK Biobank. bioRxiv, 2020.01.20.913194 (2020). https://doi.org/10.1101/2020.01.20.913194

[Preprint] Cardiac imaging of aortic valve area from 26,142 UK Biobank participants reveal novel genetic associations and shared genetic comorbidity with multiple disease phenotypes

Preprint posted on medRxiv, 2020

Citation: A. Cordova-Palomera, C. Tcheandjieu, J. Fries, P. Varma, V. Chen, M. Fiterau, K. Xiao, H. Tejeda, B. Keavney, H. Cordell, Y. Tanigawa, G. Venkataraman, M. Rivas, C. Re, E. Ashley, J. R. Priest, Cardiac imaging of aortic valve area from 26,142 UK Biobank participants reveal novel genetic associations and shared genetic comorbidity with multiple disease phenotypes. medRxiv, 2020.04.09.20060012 (2020). https://doi.org/10.1101/2020.04.09.20060012

Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma

Published in PLOS Genetics, 2020

From the analysis of more than 500,000 individuals in population cohorts, we identified rare protein-altering variants in ANGPTL7 that reduce the risk of glaucoma. One of the alleles reported in the study (220C) is highly (more than 50-fold) enriched in the Finnish population, highlighting the power of founder populations with a prior bottleneck event for genetic discovery. With the comprehensive health information in the two studied cohorts, we assessed the potential impact of the rare variants on a spectrum of human disorders. We did not find any severe medical consequences. Taken together, our results point to ANGPTL7 as a safe and effective therapeutic target for glaucoma.
This paper was highlighted as Editors’ Choice in Science.


Citation: Y. Tanigawa, M. Wainberg, J. Karjalainen, T. Kiiskinen, G. Venkataraman, S. Lemmelä, J. A. Turunen, R. R. Graham, A. S. Havulinna, M. Perola, A. Palotie, FinnGen, M. J. Daly, M. A. Rivas, Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma. PLOS Genetics. 16, e1008682 (2020). https://doi.org/10.1371/journal.pgen.1008682

Assessing Digital Phenotyping to Enhance Genetic Studies of Human Diseases

Published in The American Journal of Human Genetics, 2020

Large-scale population-based genotyped biobanks with dense phenotypic information provide opportunities for genetic analysis at scale. However, the heterogeneous phenotypic data sources in such biobanks present challenges in disease case ascertainment. Here, we evaluated the consistency of genetic associations identified from hospital records, questionnaire responses, and family history of diseases using genetic parameters, such as genetic correlation. We also showed the utility of combining those unstructured heterogeneous data sources to improve the power of genetic analysis.


Citation: C. DeBoever, Y. Tanigawa, M. Aguirre, G. McInnes, A. Lavertu, M. A. Rivas, Assessing Digital Phenotyping to Enhance Genetic Studies of Human Diseases. The American Journal of Human Genetics. 106, 611-622 (2020). https://doi.org/10.1016/j.ajhg.2020.03.007

[Preprint] Pervasive additive and non-additive effects within the HLA region contribute to disease risk in the UK Biobank

Preprint posted on bioRxiv, 2020

We characterized the genetic associations between HLA allelotypes and comprehensive human disease phenotypes in UK Biobank.


Citation: G. R. Venkataraman, J. E. Olivieri, C. DeBoever, Y. Tanigawa, J. M. Justesen, M. A. Rivas, Pervasive additive and non-additive effects within the HLA region contribute to disease risk in the UK Biobank. bioRxiv, 2020.05.28.119669 (2020). https://doi.org/10.1101/2020.05.28.119669

[Preprint] Large-Scale Sparse Regression for Multiple Responses with Applications to UK Biobank

Preprint posted on bioRxiv, 2020

In this study led by Junyang Qian, we present a method to fit sparse multivariate and multi-response regression models. We demonstrate its application to the UK Biobank biomarker traits, where we investigated the latent structure of regression coefficients using a biplot representation.


Citation: J. Qian, Y. Tanigawa, R. Li, R. Tibshirani, M. A. Rivas, T. Hastie, Large-Scale Sparse Regression for Multiple Responses with Applications to UK Biobank. bioRxiv, 2020.05.30.125252 (2020). https://doi.org/10.1101/2020.05.30.125252

[Preprint] High-throughput SARS-CoV-2 and host genome sequencing from single nasopharyngeal swabs

Preprint posted on medRxiv, 2020

In this preprint, we describe a new method to generate host and pathogen genomic data.


Citation: J. E. Gorzynski*, H. N. D. Jong*, D. Amar, C. R. Hughes, A. Ioannidis, R. Bierman, D. Liu, Y. Tanigawa, A. Kistler, J. Kamm, J. Kim, L. Cappello, N. F. Neff, S. Rubinacci, O. Delaneau, M. J. Shoura, K. Seo, A. Kirillova, A. Raja, S. Sutton, C. Huang, M. K. Sahoo, K. C. Mallempati, G. Montero-Martin, K. Osoegawa, N. Watson, N. Hammond, R. Joshi, M. Fernandez-Vina, J. W. Christle, M. T. Wheeler, P. Febbo, K. Farh, G. Schroth, F. Desouza, J. Palacios, J. Salzman, B. A. Pinsky, M. A. Rivas, C. D. Bustamante, E. A. Ashley, V. N. Parikh, High-throughput SARS-CoV-2 and host genome sequencing from single nasopharyngeal swabs. medRxiv, 2020.07.27.20163147 (2020). https://doi.org/10.1101/2020.07.27.20163147

[Preprint] LPA and APOE are associated with statin selection in the UK Biobank

Preprint posted on bioRxiv, 2020

Statins are commonly used drugs for high cholesterol. Physicians adjust the type and dose of statin based on the observed response to treatment. To investigate the role of genetics, we performed a genome-wide association scan to identify genetic variants associated with statin selection. When we investigated the identified variants in LPA and APOE, we found that carriers of those variants are more likely to be on a higher dose of statin.


Citation: A. Lavertu*, G. M. McInnes*, Y. Tanigawa, R. B. Altman, M. A. Rivas, LPA and APOE are associated with statin selection in the UK Biobank. bioRxiv, 2020.08.28.272765 (2020). https://doi.org/10.1101/2020.08.28.272765

Sex-specific genetic effects across biomarkers

Published in European Journal of Human Genetics, 2020

In this study led by Emily Flynn, we discovered a surprising sex-specificity in the genetics of testosterone. Yosuke performed polygenic risk score (PRS) analysis and demonstrated that PRS models trained for each sex show improvements in predictive accuracy.


Citation: E. Flynn, Y. Tanigawa, F. Rodriguez, R. B. Altman, N. Sinnott-Armstrong, M. A. Rivas, Sex-specific genetic effects across biomarkers. European Journal of Human Genetics, 1-10 (2020). https://doi.org/10.1038/s41431-020-00712-w

Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank

Published in Biostatistics, 2020

We propose an extension of the BASIL/snpnet algorithm to fit L1-penalized Cox proportional hazards models using a large-scale dataset from a genotyped cohort. We present its application to 300+ time-to-event traits in UK Biobank.


Citation: R. Li, C. Chang, J. M. Justesen, Y. Tanigawa, J. Qian, T. Hastie, M. A. Rivas, R. Tibshirani, Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank. Biostatistics (2020). https://doi.org/10.1093/biostatistics/kxaa038

resources

Global Biobank Engine

Published:

We, the Rivas Lab, have aggregated summary statistics from population cohorts, originally from over 330,000 individuals in UK Biobank, and provide a browser and inference engine for the community. As of July 2020, our data feature over 750,000 individuals across three cohorts: UK Biobank, Million Veteran Program, and Biobank Japan.


Resource: Global Biobank Engine https://gbe.stanford.edu/

talks

Multi-trait analysis informs genetic disease studies

Published:

I had a wonderful opportunity to give a virtual oral presentation at the Informatics in Biology, Medicine, and Pharmacology conference in 2020. I talked about the joint analysis of multiple traits in genetic disease studies, using DeGAs and multi-PRS as example projects.


teaching