Open Access Dissertation
G-quadruplexes and gene expression in Arabidopsis thaliana
TLDR
A novel method for identifying PG4s is introduced, which uses a machine learning approach trained on datasets derived from high-throughput sequencing of G4 structures, and is applied to study the prevalence of PG4s in the genome of the model plant Arabidopsis thaliana.
Abstract:
G-quadruplexes (G4s) are four-stranded DNA structures which form in regions with high GC content and high GC skew. Because G4 structure depends on specific sequence motifs, it is possible to predict putative G4s (PG4s) throughout a genomic sequence. PG4s are non-uniformly distributed in genomes, with higher densities within various genic features, particularly promoters, 5′ untranslated regions (UTRs) and coding sequences (CDSs). When they form G4s, these sequences can affect a variety of biological processes, including replication, transcription, translation and splicing. Here, we introduce a novel method for identifying PG4s, which uses a machine learning approach trained on datasets derived from high-throughput sequencing of G4 structures. We apply this and other techniques to study the prevalence of PG4s in the genome of the model plant Arabidopsis thaliana. Finally, we study the effect of G4 stabilisation on gene expression in Arabidopsis using the G-quadruplex-binding agent N-methyl mesoporphyrin (NMM). We identify a family of genes which are strongly downregulated by NMM, and find that they contain large numbers of PG4s in their CDSs.
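To illustrate what sequence-based PG4 prediction can look like, the following is a minimal sketch of the widely used pattern-matching rule (four runs of at least three guanines separated by loops of 1 to 7 nucleotides), together with a simple GC skew calculation. It is not the machine learning method introduced in the dissertation; the function names `find_pg4s` and `gc_skew`, the loop-length limits, and the example sequence are assumptions chosen for illustration.

```python
import re

# Assumption: classic "G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+" motif, used here only
# as an illustrative baseline; it is NOT the dissertation's ML predictor.
G_MOTIF = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")
# A C-rich motif on the given strand implies a G-rich PG4 on the opposite strand.
C_MOTIF = re.compile(r"C{3,}(?:[ACGTN]{1,7}C{3,}){3}")


def find_pg4s(seq):
    """Return non-overlapping PG4 candidates as (start, end, strand) tuples.

    Coordinates are 0-based, end-exclusive, relative to the input sequence.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G_MOTIF.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in C_MOTIF.finditer(seq)]
    return sorted(hits)


def gc_skew(seq):
    """Compute GC skew, (G - C) / (G + C); returns 0.0 if there is no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


if __name__ == "__main__":
    example = "ATGGGAGGGTGGGAGGGTA"  # hypothetical sequence
    for start, end, strand in find_pg4s(example):
        print(start, end, strand, example[start:end])
    print("GC skew:", gc_skew(example))
```

A fixed pattern like this is simple to apply genome-wide, but it can only report sequences that fit the assumed run and loop lengths, which is one motivation for approaches trained on experimental G4 sequencing data.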