Showing papers by "Michael Snyder" published in 2007

Open Access

Journal Article•DOI•

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

Ewan Birney, John A. Stamatoyannopoulos¹, Anindya Dutta², Roderic Guigó³ +317 more•Institutions (44)

14 Jun 2007-Nature

TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.

Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

5,091 citations

Journal Article•DOI•

Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing

Gordon Robertson, Martin Hirst, Matthew N. Bainbridge, Misha Bilenky, Yongjun Zhao, Thomas Zeng, Ghia Euskirchen¹, Bridget Bernier, Richard Varhol, Allen Delaney, Nina Thiessen, Obi L. Griffith, A He, Marco A. Marra, Michael Snyder¹, Steven J.M. Jones•Institutions (1)

Yale University¹

11 Jun 2007-Nature Methods

TL;DR: ChIP-seq identified 41,582 and 11,004 putative STAT1-binding regions in stimulated and unstimulated cells, respectively, and found 24 of the 34 loci known to contain STAT1 interferon-responsive binding sites; its targets were enriched in sequences similar to known STAT1 binding motifs.

Abstract: We developed a method, ChIP-sequencing (ChIP-seq), combining chromatin immunoprecipitation (ChIP) and massively parallel sequencing to identify mammalian DNA sequences bound by transcription factors in vivo. We used ChIP-seq to map STAT1 targets in interferon-γ (IFN-γ)–stimulated and unstimulated human HeLa S3 cells, and compared the method's performance to ChIP-PCR and to ChIP-chip for four chromosomes. By ChIP-seq, using 15.1 and 12.9 million uniquely mapped sequence reads, and an estimated false discovery rate of less than 0.001, we identified 41,582 and 11,004 putative STAT1-binding regions in stimulated and unstimulated cells, respectively. Of the 34 loci known to contain STAT1 interferon-responsive binding sites, ChIP-seq found 24 (71%). ChIP-seq targets were enriched in sequences similar to known STAT1 binding motifs. Comparisons with two ChIP-PCR data sets suggested that ChIP-seq sensitivity was between 70% and 92% and specificity was at least 95%.

1,444 citations

Journal Article•DOI•

Paired-end mapping reveals extensive structural variation in the human genome.

19 Oct 2007-Science

TL;DR: High-throughput and massive paired-end mapping (PEM) was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome, documenting that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function.

Abstract: Structural variation of the genome involves kilobase- to megabase-sized deletions, duplications, insertions, inversions, and complex combinations of rearrangements. We introduce high-throughput and massive paired-end mapping (PEM), a large-scale genome-sequencing method to identify structural variants (SVs) ∼3 kilobases (kb) or larger that combines the rescue and capture of paired ends of 3-kb fragments, massive 454 sequencing, and a computational approach to map DNA reads onto a reference genome. PEM was used to map SVs in an African and in a putatively European individual and identified shared and divergent SVs relative to the reference genome. Overall, we fine-mapped more than 1000 SVs and documented that the number of SVs among humans is much larger than initially hypothesized; many of the SVs potentially affect gene function. The breakpoint junction sequences of more than 200 SVs were determined with a novel pooling strategy and computational analysis. Our analysis provided insights into the mechanisms of SV formation in humans.

1,211 citations

Journal Article•DOI•

What is a gene, post-ENCODE? History and updated definition

Mark Gerstein¹, Joel Rozowsky¹, Deyou Zheng¹, Jiang Du¹, Jan O. Korbel¹, Olof Emanuelsson, Zhengdong D. Zhang¹, Sherman M. Weissman¹, Michael Snyder¹•Institutions (1)

Yale University¹

01 Jun 2007-Genome Research

TL;DR: The proposed definition of a gene sidesteps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene.

Abstract: While sequencing of the human genome surprised us with how many protein-coding genes there are, it did not fundamentally change our perspective on what a gene is. In contrast, the complex patterns of dispersed regulation and pervasive transcription uncovered by the ENCODE project, together with non-genic conservation and the abundance of noncoding RNA genes, have challenged the notion of the gene. To illustrate this, we review the evolution of operational definitions of a gene over the past century—from the abstract elements of heredity of Mendel and Morgan to the present-day ORFs enumerated in the sequence databanks. We then summarize the current ENCODE findings and provide a computational metaphor for the complexity. Finally, we propose a tentative update to the definition of a gene: A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products. Our definition sidesteps the complexities of regulation and transcription by removing the former altogether from the definition and arguing that final, functional gene products (rather than intermediate transcripts) should be used to group together entities associated with a single gene. It also manifests how integral the concept of biological function is in defining genes.

678 citations

Journal Article•DOI•

Getting connected: analysis and principles of biological networks

Xiaowei Zhu¹, Mark Gerstein, Michael Snyder•Institutions (1)

Yale University¹

01 May 2007-Genes & Development

TL;DR: Systematic approaches to study large numbers of proteins, metabolites, and their modification have revealed complex molecular networks which provide novel insights in understanding basic mechanisms controlling normal cellular processes and disease pathologies.

Abstract: The execution of complex biological processes requires the precise interaction and regulation of thousands of molecules. Systematic approaches to study large numbers of proteins, metabolites, and their modification have revealed complex molecular networks. These biological networks are significantly different from random networks and often exhibit ubiquitous properties in terms of their structure and organization. Analyzing these networks provides novel insights in understanding basic mechanisms controlling normal cellular processes and disease pathologies.

555 citations

Journal Article•DOI•

New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis

Michael G. Smith¹, Tara A. Gianoulis, Stefan Pukatzki, John J. Mekalanos, L. Nicholas Ornston, Mark Gerstein, Michael Snyder•Institutions (1)

Yale University¹

01 Mar 2007-Genes & Development

TL;DR: The pathogenic content of this harmful pathogen is explored using a combination of DNA sequencing and insertional mutagenesis and it is verified that six of the islands contain virulence genes, including two novel islands containing genes that lacked homology with others in the databases.

Abstract: Acinetobacter baumannii has emerged as an important and problematic human pathogen as it is the causative agent of several types of infections including pneumonia, meningitis, septicemia, and urinary tract infections. We explored the pathogenic content of this harmful pathogen using a combination of DNA sequencing and insertional mutagenesis. The genome of this organism was sequenced using a strategy involving high-density pyrosequencing, a novel, rapid method of high-throughput sequencing. Excluding the rDNA repeats, the assembled genome is 3,976,746 base pairs (bp) and has 3830 ORFs. A significant fraction of ORFs (17.2%) are located in 28 putative alien islands, indicating that the genome has acquired a large amount of foreign DNA. Consistent with its role in pathogenesis, a remarkable number of the islands (16) contain genes implicated in virulence, indicating the organism devotes a considerable portion of its genes to pathogenesis. The largest island contains elements homologous to the Legionella/Coxiella Type IV secretion apparatus. Type IV secretion systems have been demonstrated to be important for virulence in other organisms and thus are likely to help mediate pathogenesis of A. baumannii. Insertional mutagenesis generated avirulent isolates of A. baumannii and verified that six of the islands contain virulence genes, including two novel islands containing genes that lacked homology with others in the databases. The DNA sequencing approach described in this study allows the rapid elucidation of the DNA sequence of any microbe and, when combined with genetic screens, can identify many novel genes important for microbial pathogenesis.

490 citations

Journal Article•DOI•

HTRA1 promoter polymorphism in wet age-related macular degeneration

Andrew T. DeWan, M.–G. Liu, Sara J. Hartman, Samuel Shao Min Zhang, David T.L. Liu, Connie Zhao, Pancy O. S. Tam, Wai-Man Chan, Dennis S.C. Lam, Michael Snyder, Colin J. Barnstable, Chi Pui Pang, Josephine Hoh

01 Feb 2007-American Journal of Ophthalmology

TL;DR: It is reported that a single-nucleotide polymorphism in the promoter region of HTRA1, a serine protease gene on chromosome 10q26, is a major genetic risk factor for wet AMD.

464 citations

Journal Article•DOI•

Divergence of transcription factor binding sites across related yeast species

Anthony R. Borneman¹, Tara A. Gianoulis¹, Zhengdong D. Zhang¹, Haiyuan Yu¹, Joel Rozowsky¹, Michael Seringhaus¹, Lu Yong Wang², Mark Gerstein¹, Michael Snyder¹•Institutions (2)

Yale University¹, Princeton University²

10 Aug 2007-Science

TL;DR: It is shown that most of the binding sites of the pseudohyphal regulators Ste12 and Tec1 have diverged across these species, far exceeding the interspecies variation in orthologous genes.

Abstract: Characterization of interspecies differences in gene regulation is crucial for understanding the molecular basis of both phenotypic diversity and evolution. By means of chromatin immunoprecipitation and DNA microarray analysis, the divergence in the binding sites of the pseudohyphal regulators Ste12 and Tec1 was determined in the yeasts Saccharomyces cerevisiae, S. mikatae, and S. bayanus under pseudohyphal conditions. We have shown that most of these sites have diverged across these species, far exceeding the interspecies variation in orthologous genes. A group of Ste12 targets was shown to be bound only in S. mikatae and S. bayanus under pseudohyphal conditions. Many of these genes are targets of Ste12 during mating in S. cerevisiae, indicating that specialization between the two pathways has occurred in this species. Transcription factor binding sites have therefore diverged substantially faster than ortholog content. Thus, gene regulation resulting from transcription factor binding is likely to be a major cause of divergence between related species.

374 citations

Journal Article•DOI•

Differential binding of calmodulin-related proteins to their targets revealed through high-density Arabidopsis protein microarrays

Sorina C. Popescu¹, George V. Popescu, Shawn Bachan, Zimei Zhang, Montrell Seay, Mark Gerstein, Michael Snyder, Savithramma P. Dinesh-Kumar•Institutions (1)

Yale University¹

13 Mar 2007-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: It is suggested that calcium functions through distinct CaM/CML proteins to regulate a wide range of targets and cellular activities.

Abstract: Calmodulins (CaMs) are the most ubiquitous calcium sensors in eukaryotes. A number of CaM-binding proteins have been identified through classical methods, and many proteins have been predicted to bind CaMs based on their structural homology with known targets. However, multicellular organisms typically contain many CaM-like (CML) proteins, and a global identification of their targets and specificity of interaction is lacking. In an effort to develop a platform for large-scale analysis of proteins in plants we have developed a protein microarray and used it to study the global analysis of CaM/CML interactions. An Arabidopsis thaliana expression collection containing 1,133 ORFs was generated and used to produce proteins with an optimized medium-throughput plant-based expression system. Protein microarrays were prepared and screened with several CaMs/CMLs. A large number of previously known and novel CaM/CML targets were identified, including transcription factors, receptor and intracellular protein kinases, F-box proteins, RNA-binding proteins, and proteins of unknown function. Multiple CaM/CML proteins bound many binding partners, but the majority of targets were specific to one or a few CaMs/CMLs indicating that different CaM family members function through different targets. Based on our analyses, the emergent CaM/CML interactome is more extensive than previously predicted. Our results suggest that calcium functions through distinct CaM/CML proteins to regulate a wide range of targets and cellular activities.

357 citations

Journal Article•DOI•

Protein microarray technology.

David A. Hall¹, Jason Ptacek¹, Michael Snyder¹•Institutions (1)

Yale University¹

01 Jan 2007-Mechanisms of Ageing and Development

TL;DR: Current methods in the generation and applications of protein microarrays are reviewed, including protein–protein interactions, protein–phospholipid interactions, small molecule targets, and substrates of protein kinases.

287 citations

Journal Article•DOI•

Identification of differentially expressed proteins in ovarian cancer using high-density protein microarrays

Michael E. Hudson¹, Irina Pozdnyakova, Kenneth Haines, Gil Mor, Michael Snyder•Institutions (1)

Yale University¹

30 Oct 2007-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: Overall these studies identified candidate tissue marker proteins for ovarian cancer and demonstrate that protein microarrays provide a powerful approach to identify proteins aberrantly expressed in disease states.

Abstract: Ovarian cancer is a leading cause of deaths, yet many aspects of the biology of the disease and a routine means of its detection are lacking. We have used protein microarrays and autoantibodies from cancer patients to identify proteins that are aberrantly expressed in ovarian tissue. Sera from 30 cancer patients and 30 healthy individuals were used to probe microarrays containing 5,005 human proteins. Ninety-four antigens were identified that exhibited enhanced reactivity from sera in cancer patients relative to control sera. The differential reactivity of four antigens was tested by using immunoblot analysis and tissue microarrays. Lamin A/C, SSRP1, and RALBP1 were found to exhibit increased expression in the cancer tissue relative to controls. The combined signals from multiple antigens proved to be a robust test to identify cancerous ovarian tissue. These antigens were also reactive with tissue from other types of cancer and thus are not specific to ovarian cancer. Overall our studies identified candidate tissue marker proteins for ovarian cancer and demonstrate that protein microarrays provide a powerful approach to identify proteins aberrantly expressed in disease states.

Journal Article•DOI•

Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution

Deyou Zheng¹, Adam Frankish², Robert Baertsch³, Philipp Kapranov⁴, Alexandre Reymond⁵,⁶, Siew Woh Choo⁷, Yontao Lu³, Stylianos E. Antonarakis⁵, Michael Snyder¹, Yijun Ruan⁷, Chia-Lin Wei⁷, Thomas R. Gingeras⁴, Roderic Guigó⁸, Jennifer Harrow², Mark Gerstein¹•Institutions (8)

Yale University¹, Wellcome Trust Sanger Institute², University of California, Santa Cruz³, Thermo Fisher Scientific⁴, University of Geneva⁵, University of Lausanne⁶, Agency for Science, Technology and Research⁷, Pompeu Fabra University⁸

01 Jun 2007-Genome Research

TL;DR: This work extensively examined the transcriptional activity of the ENCODE pseudogenes and performed a systematic series of pseudogene-specific RACE analyses, demonstrating that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

Abstract: Arising from either retrotransposition or genomic duplication of functional genes, pseudogenes are “genomic fossils” valuable for exploring the dynamics and evolution of genes and genomes. Pseudogene identification is an important problem in computational genomics, and is also critical for obtaining an accurate picture of a genome’s structure and function. However, no consensus computational scheme for defining and detecting pseudogenes has been developed thus far. As part of the ENCyclopedia Of DNA Elements (ENCODE) project, we have compared several distinct pseudogene annotation strategies and found that different approaches and parameters often resulted in rather distinct sets of pseudogenes. We subsequently developed a consensus approach for annotating pseudogenes (derived from protein coding genes) in the ENCODE regions, resulting in 201 pseudogenes, two-thirds of which originated from retrotransposition. A survey of orthologs for these pseudogenes in 28 vertebrate genomes showed that a significant fraction (∼80%) of the processed pseudogenes are primate-specific sequences, highlighting the increasing retrotransposition activity in primates. Analysis of sequence conservation and variation also demonstrated that most pseudogenes evolve neutrally, and processed pseudogenes appear to have lost their coding potential immediately or soon after their emergence. In order to explore the functional implication of pseudogene prevalence, we have extensively examined the transcriptional activity of the ENCODE pseudogenes. We performed systematic series of pseudogene-specific RACE analyses. These, together with complementary evidence derived from tiling microarrays and high throughput sequencing, demonstrated that at least a fifth of the 201 pseudogenes are transcribed in one or more cell lines or tissues.

Journal Article•DOI•

Mapping of transcription factor binding regions in mammalian cells by ChIP: Comparison of array- and sequencing-based technologies

Ghia Euskirchen¹, Joel Rozowsky¹, Chia-Lin Wei², Wah Heng Lee², Zhengdong D. Zhang¹, Stephen Hartman¹, Olof Emanuelsson¹, Viktor Stolc³, Sherman M. Weissman¹, Mark Gerstein¹, Yijun Ruan², Michael Snyder¹•Institutions (3)

Yale University¹, Agency for Science, Technology and Research², Ames Research Center³

01 Jun 2007-Genome Research

TL;DR: It is found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method.

Abstract: Recent progress in mapping transcription factor (TF) binding regions can largely be credited to chromatin immunoprecipitation (ChIP) technologies. We compared strategies for mapping TF binding regions in mammalian cells using two different ChIP schemes: ChIP with DNA microarray analysis (ChIP-chip) and ChIP with DNA sequencing (ChIP-PET). We first investigated parameters central to obtaining robust ChIP-chip data sets by analyzing STAT1 targets in the ENCODE regions of the human genome, and then compared ChIP-chip to ChIP-PET. We devised methods for scoring and comparing results among various tiling arrays and examined parameters such as DNA microarray format, oligonucleotide length, hybridization conditions, and the use of competitor Cot-1 DNA. The best performance was achieved with high-density oligonucleotide arrays, oligonucleotides ≥50 bases (b), the presence of competitor Cot-1 DNA and hybridizations conducted in microfluidics stations. When target identification was evaluated as a function of array number, 80%-86% of targets were identified with three or more arrays. Comparison of ChIP-chip with ChIP-PET revealed strong agreement for the highest ranked targets with less overlap for the low ranked targets. With advantages and disadvantages unique to each approach, we found that ChIP-chip and ChIP-PET are frequently complementary in their relative abilities to detect STAT1 targets for the lower ranked targets; each method detected validated targets that were missed by the other method. The most comprehensive list of STAT1 binding regions is obtained by merging results from ChIP-chip and ChIP-sequencing. Overall, this study provides information for robust identification, scoring, and validation of TF targets using ChIP-based technologies.

Journal Article•DOI•

Structured RNAs in the ENCODE selected regions of the human genome

01 Jun 2007-Genome Research

TL;DR: In this paper, the authors presented a computational study to detect functional RNA structures within the ENCODE regions of the human genome using three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures.

Abstract: Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack characteristic signals in primary sequence, comparative approaches evaluating evolutionary conservation of structures are most promising. We have used three recently introduced programs based on either phylogenetic-stochastic context-free grammar (EvoFold) or energy directed folding (RNAz and AlifoldZ), yielding several thousand candidate structures (corresponding to approximately 2.7% of the ENCODE regions). EvoFold has its highest sensitivity in highly conserved and relatively AU-rich regions, while RNAz favors slightly GC-rich regions, resulting in a relatively small overlap between methods. Comparison with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70% many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz and EvoFold, and an additional 239 RNAz or EvoFold predictions are supported by the (more stringent) AlifoldZ algorithm. Five hundred seventy RNAz structure predictions fall into regions that show signs of selection pressure also on the sequence level (i.e., conserved elements). More than 700 predictions overlap with noncoding transcripts detected by oligonucleotide tiling arrays. One hundred seventy-five selected candidates were tested by RT-PCR in six tissues, and expression could be verified in 43 cases (24.6%).

Journal Article•DOI•

Systematic prediction and validation of breakpoints associated with copy-number variants in the human genome.

Jan O. Korbel¹, Alexander E. Urban, Fabian Grubert, Jiang Du, Thomas Royce, Peter Starr, Guoneng Zhong, Beverly S. Emanuel, Sherman M. Weissman, Michael Snyder, Mark Gerstein•Institutions (1)

Yale University¹

12 Jun 2007-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: BreakPtr, an approach for fine-mapping CNV breakpoints, integrates sequence characteristics and high-resolution comparative genome hybridization data in a bivariate hidden Markov model, with an iterative, “active” training scheme and a parameterization that collapses from a full model of 2,503 parameters to a core one of only 10; its predictive resolution of ≈300 bp allows the study of CNV population frequencies.

Abstract: Copy-number variants (CNVs) are an abundant form of genetic variation in humans. However, approaches for determining exact CNV breakpoint sequences (physical deletion or duplication boundaries) across individuals, crucial for associating genotype to phenotype, have been lacking so far, and the vast majority of CNVs have been reported with approximate genomic coordinates only. Here, we report an approach, called BreakPtr, for fine-mapping CNVs (available from http://breakptr.gersteinlab.org). We statistically integrate both sequence characteristics and data from high-resolution comparative genome hybridization experiments in a discrete-valued, bivariate hidden Markov model. Incorporation of nucleotide-sequence information allows us to take into account the fact that recently duplicated sequences (e.g., segmental duplications) often coincide with breakpoints. In anticipation of an upcoming increase in CNV data, we developed an iterative, “active” approach to initially scoring with a preliminary model, performing targeted validations, retraining the model, and then rescoring, and a flexible parameterization system that intuitively collapses from a full model of 2,503 parameters to a core one of only 10. Using our approach, we accurately mapped >400 breakpoints on chromosome 22 and a region of chromosome 11, refining the boundaries of many previously approximately mapped CNVs. Four predicted breakpoints flanked known disease-associated deletions. We validated an additional four predicted CNV breakpoints by sequencing. Overall, our results suggest a predictive resolution of ≈300bp. This level of resolution enables more precise correlations between CNVs and across individuals than previously possible, allowing the study of CNV population frequencies. Further, it enabled us to demonstrate a clear Mendelian pattern of inheritance for one of the CNVs.

Journal Article•DOI•

Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions

Zhengdong D. Zhang¹, Alberto Paccanaro², Yutao Fu³, Sherman M. Weissman¹, Zhiping Weng³, Joseph T. Chang¹, Michael Snyder¹, Mark Gerstein¹•Institutions (3)

Yale University¹, Royal Holloway, University of London², Boston University³

01 Jun 2007-Genome Research

TL;DR: This study developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP-chip experiments on a 10∼100-kb scale and shows that regulatory elements are associated with the location of known genes.

Abstract: The comprehensive inventory of functional elements in 44 human genomic regions carried out by the ENCODE Project Consortium enables for the first time a global analysis of the genomic distribution of transcriptional regulatory elements. In this study we developed an intuitive and yet powerful approach to analyze the distribution of regulatory elements found in many different ChIP–chip experiments on a 10∼100-kb scale. First, we focus on the overall chromosomal distribution of regulatory elements in the ENCODE regions and show that it is highly nonuniform. We demonstrate, in fact, that regulatory elements are associated with the location of known genes. Further examination on a local, single-gene scale shows an enrichment of regulatory elements near both transcription start and end sites. Our results indicate that overall these elements are clustered into regulatory rich “islands” and poor “deserts.” Next, we examine how consistent the nonuniform distribution is between different transcription factors. We perform on all the factors a multivariate analysis in the framework of a biplot, which enhances biological signals in the experiments. This groups transcription factors into sequence-specific and sequence-nonspecific clusters. Moreover, with experimental variation carefully controlled, detailed correlations show that the distribution of sites was generally reproducible for a specific factor between different laboratories and microarray platforms. Data sets associated with histone modifications have particularly strong correlations. Finally, we show how the correlations between factors change when only regulatory elements far from the transcription start sites are considered.

Journal Article•DOI•

Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE).

Akshay Bhinge¹, Jonghwan Kim, Ghia Euskirchen², Michael Snyder², Vishwanath R. Iyer•Institutions (2)

University of Texas at Austin¹, Yale University²

01 Jun 2007-Genome Research

TL;DR: STAGE identified several previously unknown STAT1 target genes, many of which are involved in mediating the response to interferon-gamma signaling, and is a viable method for identifying the chromosomal targets of transcription factors and generating meaningful biological hypotheses that further the understanding of transcriptional regulatory networks.

Abstract: Identifying the genome-wide binding sites of transcription factors is important in deciphering transcriptional regulatory networks. ChIP-chip (Chromatin immunoprecipitation combined with microarrays) has been widely used to map transcription factor binding sites in the human genome. However, whole genome ChIP-chip analysis is still technically challenging in vertebrates. We recently developed STAGE as an unbiased method for identifying transcription factor binding sites in the genome. STAGE is conceptually based on SAGE, except that the input is ChIP-enriched DNA. In this study, we implemented an improved sequencing strategy and analysis methods and applied STAGE to map the genomic binding profile of the transcription factor STAT1 after interferon treatment. STAT1 is mainly responsible for mediating the cellular responses to interferons, such as cell proliferation, apoptosis, immune surveillance, and immune responses. We present novel algorithms for STAGE tag analysis to identify enriched loci with high specificity, as verified by quantitative ChIP. STAGE identified several previously unknown STAT1 target genes, many of which are involved in mediating the response to interferon-gamma signaling. STAGE is thus a viable method for identifying the chromosomal targets of transcription factors and generating meaningful biological hypotheses that further our understanding of transcriptional regulatory networks.

Journal Article•DOI•

Tilescope: online analysis pipeline for high-density tiling microarray data

Zhengdong D. Zhang¹, Joel Rozowsky¹, Hugo Y. K. Lam¹, Jiang Du¹, Michael Snyder¹, Mark Gerstein¹•Institutions (1)

Yale University¹

14 May 2007-Genome Biology

TL;DR: Tilescope is a fully integrated data processing pipeline for analyzing high-density tiling-array data, designed with a modular, three-tiered architecture, facilitating parallelism, and a graphic user-friendly interface.

Abstract: We developed Tilescope, a fully integrated data processing pipeline for analyzing high-density tiling-array data (http://tilescope.gersteinlab.org). In a completely automated fashion, Tilescope will normalize signals between channels and across arrays, combine replicate experiments, score each array element, and identify genomic features. The program is designed with a modular, three-tiered architecture, facilitating parallelism, and a graphic user-friendly interface, presenting results in an organized web page, downloadable for further analysis.

Journal Article•DOI•

Arabidopsis protein microarrays for the high-throughput identification of protein-protein interactions.

Sorina C. Popescu¹, Michael Snyder¹, Savithramma P. Dinesh-Kumar¹•Institutions (1)

Yale University¹

01 Sep 2007-Plant Signaling & Behavior

TL;DR: This study demonstrated that Arabidopsis functional protein microarrays can be generated and employed to characterize the function of plant proteins and provided new testable hypotheses in the area of CaM/Ca2+-regulated processes.

Abstract: Protein microarray technology has emerged as a powerful new approach for the study of thousands of proteins simultaneously. Protein microarrays have been used for a wide variety of applications for the human and yeast systems. In a recent study, we demonstrated that Arabidopsis functional protein microarrays can be generated and employed to characterize the function of plant proteins. The arrayed proteins were produced using an optimized large-scale plant-based expression system. In a proof-of-concept study, 173 known and novel potential substrates of calmodulin (CaM) and calmodulin-like proteins (CML) were identified in an unbiased and high-throughput manner. The information documented here on novel potential CaM targets provides new testable hypotheses in the area of CaM/Ca2+-regulated processes and represents a resource of functional information for the scientific community.

Journal Article•DOI•

Integrated analysis of experimental data sets reveals many novel promoters in 1% of the human genome

Nathan D. Trinklein¹, Ulas Karaoz², Jia Qian Wu³, Anason S. Halees², Shelley Force Aldred¹, Patrick J. Collins¹, Deyou Zheng³, Zhengdong D. Zhang³, Mark Gerstein³, Michael Snyder³, Richard M. Myers¹, Zhiping Weng²•Institutions (3)

Stanford University¹, Boston University², Yale University³

01 Jun 2007-Genome Research

TL;DR: The authors' results suggest that there are at least 35% more functional promoters in the human genome than currently annotated, and that some of them might regulate anti-sense transcription.

Abstract: The regulation of transcriptional initiation in the human genome is a critical component of global gene regulation, but a complete catalog of human promoters currently does not exist. In order to identify regulatory regions, we developed four computational methods to integrate 129 sets of ENCODE-wide chromatin immunoprecipitation data. They collectively predicted 1393 regions. Roughly 47% of the regions were unique to one method, as each method makes different assumptions about the data. Overall, predicted regions tend to localize to highly conserved, DNase I hypersensitive, and actively transcribed regions in the genome. Interestingly, a significant portion of the regions overlaps with annotated 3′-UTRs, suggesting that some of them might regulate anti-sense transcription. The majority of the predicted regions are >2 kb away from the 5′-ends of previously annotated human cDNAs and hence are novel. These novel regions may regulate unannotated transcripts or may represent new alternative transcription start sites of known genes. We tested 163 such regions for promoter activity in four cell lines using transient transfection assays, and 25% of them showed transcriptional activity above background in at least one cell line. We also performed 5′-RACE experiments on 62 novel regions, and 76% of the regions were associated with the 5′-ends of at least two RACE products. Our results suggest that there are at least 35% more functional promoters in the human genome than currently annotated.

Journal Article•DOI•

Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome

Olof Emanuelsson¹, Ugrappa Nagalakshmi¹, Deyou Zheng¹, Joel Rozowsky¹, Alexander E. Urban¹, Jiang Du¹, Zheng Lian¹, Viktor Stolc², Sherman M. Weissman¹, Michael Snyder¹, Mark Gerstein¹•Institutions (2)

Yale University¹, Ames Research Center²

01 Jun 2007-Genome Research

TL;DR: Overall, the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches.

Abstract: Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments.

Journal Article•DOI•

The DART classification of unannotated transcription within the ENCODE regions: Associating transcription with known and novel loci

Joel Rozowsky¹, Daniel E. Newburger¹, Fred Sayward¹, Jia Qian Wu¹, Greg Jordan¹, Jan O. Korbel¹, Ugrappa Nagalakshmi¹, Jin Yang¹, Deyou Zheng¹, Roderic Guigó², Thomas R. Gingeras³, Sherman M. Weissman¹, Perry L. Miller¹, Michael Snyder¹, Mark Gerstein¹•Institutions (3)

Yale University¹, Pompeu Fabra University², Thermo Fisher Scientific³

01 Jun 2007-Genome Research

TL;DR: This work uses a number of disparate features to classify the 6988 novel TARs: array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes.

Abstract: For the ∼1% of the human genome in the ENCODE regions, only about half of the transcriptionally active regions (TARs) identified with tiling microarrays correspond to annotated exons. Here we categorize this large amount of “unannotated transcription.” We use a number of disparate features to classify the 6988 novel TARs—array expression profiles across cell lines and conditions, sequence composition, phylogenetic profiles (presence/absence of syntenic conservation across 17 species), and locations relative to genes. In the classification, we first filter out TARs with unusual sequence composition and those likely resulting from cross-hybridization. We then associate some of those remaining with proximal exons having correlated expression profiles. Finally, we cluster unclassified TARs into putative novel loci, based on similar expression and phylogenetic profiles. To encapsulate our classification, we construct a Database of Active Regions and Tools (DART.gersteinlab.org). DART has special facilities for rapidly handling and comparing many sets of TARs and their heterogeneous features, synchronizing across builds, and interfacing with other resources. Overall, we find that ∼14% of the novel TARs can be associated with known genes, while ∼21% can be clustered into ∼200 novel loci. We observe that TARs associated with genes are enriched in the potential to form structural RNAs and many novel TAR clusters are associated with nearby promoters. To benchmark our classification, we design a set of experiments for testing the connectivity of novel TARs. Overall, we find that 18 of the 46 connections tested validate by RT-PCR and four of five sequenced PCR products confirm connectivity unambiguously.

Journal Article•DOI•

Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms.

Anthony R. Borneman¹, Zhengdong D. Zhang¹, Joel Rozowsky¹, Michael Seringhaus¹, Mark Gerstein¹, Michael Snyder¹•Institutions (1)

Yale University¹

19 Jul 2007-Functional & Integrative Genomics

TL;DR: The HDO array platform provides a far more robust array system by all measures than PCR-based arrays, all of which is directly attributable to the large number of probes available.

Abstract: In recent years, techniques have been developed to map transcription factor binding sites using chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip). Initially, polymerase chain reaction (PCR)-based DNA arrays were used for the ChIP-chip procedure; however, high-density oligonucleotide (HDO) arrays, which allow for the production of thousands more features per array, have emerged as a competing array platform. To compare the two platforms, data from ChIP-chip analysis performed for three factors (Tec1, Ste12, and Sok2) using both HDO and PCR arrays under identical experimental conditions were compared. HDO arrays provided increased reproducibility and sensitivity, detecting approximately three times more binding events than the PCR arrays while also showing increased accuracy. The increased resolution provided by the HDO arrays also allowed for the identification of multiple binding peaks in close proximity and of novel binding events such as binding within ORFs. The HDO array platform provides a far more robust array system by all measures than PCR-based arrays, all of which is directly attributable to the large number of probes available.

Patent•

Early diagnosis of congenital abnormalities in the offspring of diabetic mothers

Joseph A. Madri¹, Anjali K. Nath¹, Michael Krauthammer¹, Michael Snyder¹, Eugene Davidov¹•Institutions (1)

Yale University¹

21 Mar 2007

TL;DR: This patent relates to the identification of a series of biomarkers, the detection of which is prognostic for women at risk of becoming hyperglycemic during pregnancy and/or fetuses at risk of developing congenital anomalies as a result of maternal hyperglycemia.

Abstract: The present invention relates to the identification of a series of biomarkers, the detection of which is prognostic for women at risk of becoming hyperglycemic during pregnancy and/or fetuses at risk of developing congenital anomalies as a result of maternal hyperglycemia.

Book Chapter•DOI•

14 Yeast Protein Microarrays

Jason Ptacek¹, Michael Snyder¹•Institutions (1)

Yale University¹

01 Jan 2007-Methods in Microbiology

TL;DR: This chapter discusses many challenges and options that exist in designing a yeast protein array and the many questions that have been addressed using this technology, predominantly in the form of functional protein microarrays.

Abstract: Protein microarrays are arrays of proteins, or in the case of yeast, nearly the entire proteome, which will expedite the study of the proteome by providing a platform to elucidate a protein's function and the way it relates to other proteins on a global scale. This chapter discusses many challenges and options that exist in designing a yeast protein array and the many questions that have been addressed using this technology, predominantly in the form of functional protein microarrays. All technologies involving proteins are challenged by the large scale of the proteome and the difficulty in working with proteins, given that their chemistry and solubility are much more variable. The goals of proteomics and a sample of the common technologies applied to each are listed in tabulated form in the chapter. Many techniques have been used to address different aspects of these goals. Mass spectrometry has been used to identify protein complexes, the components of the yeast nuclear pore complex, and to catalogue 1484 proteins from yeast in log-phase. By using this technique, it is difficult, but not impossible, to determine if a protein interaction is a direct, or binary, interaction or if it is an indirect interaction, mediated by other components of the complex.

Book Chapter•DOI•

Kinase Substrate Identification Using Yeast Protein Microarrays

Michael Snyder, Geeta Devgan

30 Mar 2007

Journal Article•DOI•

Chromatin Structure and Transcription of the Human alpha Globin Locus in Erythroid and Non-Erythroid Environments.

Milind Mahajan¹, Ghia Euskirchen¹, Jin Lian¹, Adam S. Raefski¹, Michael Snyder¹, Sherman M. Weissman¹•Institutions (1)

Yale University¹

16 Nov 2007-Blood

TL;DR: A comparative analysis of the chromatin structure of the alpha globin locus, recruitment of transcription factors, and the transcriptional activity of the locus in erythroid and non-erythroid cells finds that a strong HS40 enhancer formed by virtue of the recruitment of the enhancer factors can overcome blocking by the downstream flanking CTCF site and may be mediated by specific interactions between upstream and downstream insulators.

New insights into Acinetobacter baumannii pathogenesis revealed by high-density pyrosequencing and transposon mutagenesis

Michael Snyder, Michael G. Smith, Tara A. Gianoulis, Stefan Pukatzki, John J. Mekalanos, L. Nicholas Ornston, Mark Gerstein

01 Jan 2007