We develop an ontology-guided approach to ranking tissue-/cell-type-specific transcription factors (TFs) from chromatin accessibility data.
Transcription factors (TFs) regulate context-specific gene expression. Genome-wide chromatin accessibility profile coupled with a computational analysis has been one of the most promising approaches to rank TFs enriched in a given sample. Most existing methods rely on statistical enrichment, either on sequence motifs or occurrences of the TF binding sites. We empirically find that those abundance-based approaches tend to be less cell-type-specific. For example, CTCF is frequently ranked among the top hits.
To address this, we propose an alternative, WhichTF, that aggregates the results of ontology-based stratified enrichment tests to rank TFs. Our model considers ontology-based annotations of genomic regions on top of high-confidence prediction of conserved TF binding sites. Specifically, we decompose the user-provided accessible profile into a series of genomic tracks annotated for ontology terms. We consider the most enriched terms in the accessible peaks as a proxy to capture cellular contexts. Aggregating across terms, we score and rank TFs.
When applied to well-characterized datasets, we find the cell-type-specific roles of the top-ranked TFs are supported by the literature.
Applying across ~90 human samples followed by the t-SNE projection of the WhichTF scores TFs indicates WhichTF captures biological similarities and dissimilarities of TF-mediated transcriptional programs.
To investigate closely related cells, we present differential TF ranking with WhichTF, where we apply WhichTF on differentially accessible regions. The top-ranked TFs are well-supported by the literature.
Beyond well-characterized cell types, we present applications in mesoderm development,
… and patient-derived samples for systemic lupus erythematosus.
Overall, we show that ontology-guided stratified enrichment is a powerful approach to ranking cell-type-specific TFs. We highlight examples in human and mouse cells, developmental trajectories, and disease samples.
This work was started and was jointly led by Ethan S. Dyer and myself while I was in Prof. Gill Bejerano’s lab at Stanford, with tremendous support and help from many lab members. I want to thank Gill, Ethan, and my colleagues. Thank you, all!
WhichTF software and reference data are available on BitBucket and figshare.
Also, as a part of this project, we performed a v4 update of the Genomic Regions Enrichment of Annotations Tool (GREAT).
Reference: Y. Tanigawa*, E. S. Dyer*, G. Bejerano, WhichTF is dominant in your open chromatin data? PLOS Comput Biol 18(8): e1010378. (2022). https://doi.org/10.1371/journal.pcbi.1010378