scispace - formally typeset
Search or ask a question
Author

Katherine Decker

Bio: Katherine Decker is an academic researcher from University of California, San Diego. The author has contributed to research in topics: Chromatin immunoprecipitation & Genome. The author has an hindex of 3, co-authored 6 publications receiving 20 citations. Previous affiliations of Katherine Decker include University of California, Berkeley.

Papers
More filters
Journal ArticleDOI
TL;DR: iModulonDB (imodulondb.org), a knowledgebase of prokaryotic transcriptional regulation computed from high-quality transcriptomic datasets using ICA, is presented, enhancing research by presenting scientists of all backgrounds with co-expressed gene sets and their activity levels.
Abstract: Independent component analysis (ICA) of bacterial transcriptomes has emerged as a powerful tool for obtaining co-regulated, independently-modulated gene sets (iModulons), inferring their activities across a range of conditions, and enabling their association to known genetic regulators. By grouping and analyzing genes based on observations from big data alone, iModulons can provide a novel perspective into how the composition of the transcriptome adapts to environmental conditions. Here, we present iModulonDB (imodulondb.org), a knowledgebase of prokaryotic transcriptional regulation computed from high-quality transcriptomic datasets using ICA. Users select an organism from the home page and then search or browse the curated iModulons that make up its transcriptome. Each iModulon and gene has its own interactive dashboard, featuring plots and tables with clickable, hoverable, and downloadable features. This site enhances research by presenting scientists of all backgrounds with co-expressed gene sets and their activity levels, which lead to improved understanding of regulator-gene relationships, discovery of transcription factors, and the elucidation of unexpected relationships between conditions and genetic regulatory activity. The current release of iModulonDB covers three organisms (Escherichia coli, Staphylococcus aureus and Bacillus subtilis) with 204 iModulons, and can be expanded to cover many additional organisms.

40 citations

Posted ContentDOI
14 Aug 2020-bioRxiv
TL;DR: iModulonDB (imodulondb.org), a knowledgebase of prokaryotic transcriptional regulation computed from high-quality transcriptomic datasets using ICA, is presented, enhancing research by presenting scientists of all backgrounds with co-expressed gene sets and their activity levels.
Abstract: Independent component analysis (ICA) of bacterial transcriptomes has emerged as a powerful tool for obtaining co-regulated, independently-modulated gene sets (iModulons), inferring their activities across a range of conditions, and enabling their association to known genetic regulators. By grouping and analyzing genes based on observations from big data alone, iModulons can provide a novel perspective into how the composition of the transcriptome adapts to environmental conditions. Here, we present iModulonDB (imodulondb.com), a knowledgebase of prokaryotic transcriptional regulation computed from high-quality transcriptomic datasets using ICA. Users select an organism from the home page and then search or browse the curated iModulons that make up its transcriptome. Each iModulon and gene has its own interactive dashboard, featuring plots and tables with clickable, hoverable, and downloadable features. This site enhances research by presenting scientists of all backgrounds with co-expressed gene sets and their activity levels, which lead to improved understanding of regulator-gene relationships, discovery of transcription factors, and the elucidation of unexpected relationships between conditions and genetic regulatory activity. The current release of iModulonDB covers three organisms (E. coli, S. aureus, and B. subtilis) with 204 iModulons, and can be expanded to cover many additional organisms.

29 citations

Journal ArticleDOI
10 Nov 2020
TL;DR: Five different modes of TCS transcriptional regulation can lead to an understanding of the metabolic capabilities of bacteria and correctly predict complex phenotype under diverse conditions, especially when further incorporated with genome-scale metabolic models.
Abstract: Escherichia coli uses two-component systems (TCSs) to respond to environmental signals. TCSs affect gene expression and are parts of E. coli's global transcriptional regulatory network (TRN). Here, we identified the regulons of five TCSs in E. coli MG1655: BaeSR and CpxAR, which were stimulated by ethanol stress; KdpDE and PhoRB, induced by limiting potassium and phosphate, respectively; and ZraSR, stimulated by zinc. We analyzed RNA-seq data using independent component analysis (ICA). ChIP-exo data were used to validate condition-specific target gene binding sites. Based on these data, we do the following: (i) identify the target genes for each TCS; (ii) show how the target genes are transcribed in response to stimulus; and (iii) reveal novel relationships between TCSs, which indicate noncognate inducers for various response regulators, such as BaeR to iron starvation, CpxR to phosphate limitation, and PhoB and ZraR to cell envelope stress. Our understanding of the TRN in E. coli is thus notably expanded.IMPORTANCE E. coli is a common commensal microbe found in the human gut microenvironment; however, some strains cause diseases like diarrhea, urinary tract infections, and meningitis. E. coli's two-component systems (TCSs) modulate target gene expression, especially related to virulence, pathogenesis, and antimicrobial peptides, in response to environmental stimuli. Thus, it is of utmost importance to understand the transcriptional regulation of TCSs to infer bacterial environmental adaptation and disease pathogenicity. Utilizing a combinatorial approach integrating RNA sequencing (RNA-seq), independent component analysis, chromatin immunoprecipitation coupled with exonuclease treatment (ChIP-exo), and data mining, we suggest five different modes of TCS transcriptional regulation. Our data further highlight noncognate inducers of TCSs, which emphasizes the cross-regulatory nature of TCSs in E. coli and suggests that TCSs may have a role beyond their cognate functionalities. In summary, these results can lead to an understanding of the metabolic capabilities of bacteria and correctly predict complex phenotype under diverse conditions, especially when further incorporated with genome-scale metabolic models.

20 citations

Posted ContentDOI
11 Apr 2021-bioRxiv
TL;DR: The PRECISE 2.0 dataset as discussed by the authors is a large-scale, high-quality transcriptomics dataset which may be analyzed at multiple scales to yield important biological insights. But it is limited to Escherichia coli K-12 MG1655, containing 278 RNA-seq datasets created using a standardized protocol.
Abstract: Uncovering the structure of the transcriptional regulatory network (TRN) that modulates gene expression in prokaryotes remains an important challenge. Transcriptomics data is plentiful, necessitating the development of scalable methods for converting this data into useful knowledge about the TRN. Previously, we published the PRECISE dataset for Escherichia coli K-12 MG1655, containing 278 RNA-seq datasets created using a standardized protocol. Here, we present PRECISE 2.0, which is nearly three times the size of the original PRECISE dataset and also created using a standardized protocol. We analyze PRECISE 2.0 at multiple scales, demonstrating multiple analytical strategies for extracting knowledge from this dataset. Specifically, we: (1) highlight patterns in gene expression across the dataset; (2) utilize independent component analysis to extract 218 independently modulated groups of genes (iModulons) that describe the TRN at the systems level; (3) demonstrate the utility of iModulons over traditional differential expression analysis; and (4) uncover 6 new potential regulons. Thus, PRECISE 2.0 is a large-scale, high-quality transcriptomics dataset which may be analyzed at multiple scales to yield important biological insights.

19 citations

Posted ContentDOI
24 Jun 2020-bioRxiv
TL;DR: It is shown that E. coli’s two-component system (TCS) have five different modes of transcriptional regulation, which suggests that TCSs may have a role beyond their cognate functionalities and when further incorporated with genome scale metabolic models can lead to understanding of metabolic capabilities of bacteria.
Abstract: Escherichia coli uses two-component systems (TCSs) to respond to environmental signals TCSs affect gene expression and are parts of E coli’s global transcriptional regulatory network (TRN) Here, we identified the regulons of five TCSs in E coli MG1655: BaeSR and CpxAR, which were stimulated by ethanol stress; KdpDE and PhoRB, induced by limiting potassium and phosphate, respectively; and ZraSR, stimulated by zinc We analyzed RNA-seq data using independent component analysis (ICA) ChIP-exo data was used to validate condition-specific target gene binding sites Based on this data we (1) identify the target genes for each TCS; (2) show how the target genes are transcribed in response to stimulus; and (3) reveal novel relationships between TCSs, which indicate non-cognate inducers for various response regulators, such as BaeR to iron starvation, CpxR to phosphate limitation, and PhoB and ZraR to cell envelope stress Our understanding of the TRN in E coli is thus notably expanded Importance E coli is a common commensal microbe found in human gut microenvironment; however, some strains cause diseases like diarrhea, urinary tract infections and meningitis E coli’s two-component system (TCS) modulates target gene expression, specially related to virulence, pathogenesis and anti-microbial peptides, in response to environmental stimuli Thus, it is of utmost importance to understand the transcriptional regulation of the TCSs to infer its environmental adaptation and disease pathogenicity Utilizing a combinatorial approach integrating RNAseq, independent component analysis, ChIP-exo and data mining, we show that TCSs have five different modes of transcriptional regulation Our data further highlights non-cognate inducers of TCSs emphasizing cross-regulatory nature of TCSs in E coli and suggests that TCSs may have a role beyond their cognate functionalities In summary, these results when further incorporated with genome scale metabolic models can lead to understanding of metabolic capabilities of bacteria and correctly predict complex phenotype under diverse conditions

17 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: The TRN structure and its condition-dependent activity uncover putative or recently discovered roles for at least five regulons, such as a relationship between histidine utilization and quorum sensing, and facilitates quantification of population-level sporulation states.
Abstract: The transcriptional regulatory network (TRN) of Bacillus subtilis coordinates cellular functions of fundamental interest, including metabolism, biofilm formation, and sporulation. Here, we use unsupervised machine learning to modularize the transcriptome and quantitatively describe regulatory activity under diverse conditions, creating an unbiased summary of gene expression. We obtain 83 independently modulated gene sets that explain most of the variance in expression and demonstrate that 76% of them represent the effects of known regulators. The TRN structure and its condition-dependent activity uncover putative or recently discovered roles for at least five regulons, such as a relationship between histidine utilization and quorum sensing. The TRN also facilitates quantification of population-level sporulation states. As this TRN covers the majority of the transcriptome and concisely characterizes the global expression state, it could inform research on nearly every aspect of transcriptional regulation in B. subtilis.

34 citations

Journal ArticleDOI
TL;DR: A radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, Chip-exo, gSELEX and DAP-seq experiments.
Abstract: Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus, or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high and low-throughput data on transcriptional regulation in Escherichia coli K-12 to easily use both as whole datasets, or as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, besides expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors, as well as data uniformly processed in-house, enhancing their comparability, as well as the traceability of the methods and reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata. A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying Natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.

25 citations

Posted ContentDOI
02 Jul 2021-bioRxiv
TL;DR: In this paper, the authors introduce a workflow that converts all public gene expression data for a microbe into a dynamic representation of the organism's transcriptional regulatory network, which can be used to predict new regulons and analyze datasets in the context of all published data.
Abstract: We are firmly in the era of biological big data. Millions of omics datasets are publicly accessible and can be employed to support scientific research or build a holistic view of an organism. Here, we introduce a workflow that converts all public gene expression data for a microbe into a dynamic representation of the organism’s transcriptional regulatory network. This five-step process walks researchers through the mining, processing, curation, analysis, and characterization of all available expression data, using Bacillus subtilis as an example. The resulting reconstruction of the B. subtilis regulatory network can be leveraged to predict new regulons and analyze datasets in the context of all published data. The results are hosted at https://imodulondb.org/, and additional analyses can be performed using the PyModulon Python package. As the number of publicly available datasets increases, this pipeline will be applicable to a wide range of microbial pathogens and cell factories.

20 citations

Posted ContentDOI
11 Apr 2021-bioRxiv
TL;DR: The PRECISE 2.0 dataset as discussed by the authors is a large-scale, high-quality transcriptomics dataset which may be analyzed at multiple scales to yield important biological insights. But it is limited to Escherichia coli K-12 MG1655, containing 278 RNA-seq datasets created using a standardized protocol.
Abstract: Uncovering the structure of the transcriptional regulatory network (TRN) that modulates gene expression in prokaryotes remains an important challenge. Transcriptomics data is plentiful, necessitating the development of scalable methods for converting this data into useful knowledge about the TRN. Previously, we published the PRECISE dataset for Escherichia coli K-12 MG1655, containing 278 RNA-seq datasets created using a standardized protocol. Here, we present PRECISE 2.0, which is nearly three times the size of the original PRECISE dataset and also created using a standardized protocol. We analyze PRECISE 2.0 at multiple scales, demonstrating multiple analytical strategies for extracting knowledge from this dataset. Specifically, we: (1) highlight patterns in gene expression across the dataset; (2) utilize independent component analysis to extract 218 independently modulated groups of genes (iModulons) that describe the TRN at the systems level; (3) demonstrate the utility of iModulons over traditional differential expression analysis; and (4) uncover 6 new potential regulons. Thus, PRECISE 2.0 is a large-scale, high-quality transcriptomics dataset which may be analyzed at multiple scales to yield important biological insights.

19 citations

Journal ArticleDOI
TL;DR: In this paper , a review of the application of global transcription machinery engineering technology and the transcription factor-based biosensors and quorum sensing systems is presented, and strategies for optimizing the transcriptional regulatory tools' performance by refactoring transcription factors are summarized.

18 citations