scispace - formally typeset
Author

Matthew G. Endrizzi

Bio: Matthew G. Endrizzi is an academic researcher from Harvard University. The author has contributed to research in the topics of genomes and comparative genomics. The author has an h-index of 8 and has co-authored 8 publications receiving 4,542 citations. Previous affiliations of Matthew G. Endrizzi include the Howard Hughes Medical Institute and the J. Craig Venter Institute.

Papers
Journal Article
15 May 2003-Nature
TL;DR: A comparative analysis of the yeast Saccharomyces cerevisiae, based on high-quality draft sequences of three related species, revised the yeast gene catalogue and identified 72 genome-wide motifs, inferring a putative function for most of them and providing insights into their combinatorial interactions.
Abstract: Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.

1,837 citations
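The motif analysis builds on the premise that functional elements are preserved across related genomes. As a minimal sketch of that premise (not the authors' scoring method), the Python below assumes gap-free, pre-aligned sequences of equal length and counts how many occurrences of a motif in S. cerevisiae are matched exactly at the same aligned columns in all other species. The function name `motif_conservation`, the exact-match criterion, and the toy sequences are hypothetical.

```python
from typing import Dict, Tuple


def motif_conservation(alignment: Dict[str, str], reference: str, motif: str) -> Tuple[int, int]:
    """Return (total, conserved) counts of motif occurrences in the reference.

    Hypothetical illustration: an occurrence counts as conserved when the
    identical motif sits at the same aligned columns in every species.
    Assumes all sequences are already aligned, gap-free, and equal length.
    """
    ref_seq = alignment[reference]
    k = len(motif)
    total = conserved = 0
    for i in range(len(ref_seq) - k + 1):
        if ref_seq[i:i + k] == motif:
            total += 1
            if all(seq[i:i + k] == motif for seq in alignment.values()):
                conserved += 1
    return total, conserved


# Toy alignment (invented sequences, not real yeast data).
toy = {
    "S.cerevisiae": "ACGCGTTTACGCGTAA",
    "S.paradoxus": "ACGCGTTTACGAGTAA",
    "S.mikatae": "ACGCGTATACGCGTAA",
    "S.bayanus": "ACGCGTTTTCGCGTAA",
}
total, conserved = motif_conservation(toy, "S.cerevisiae", "ACGCGT")
print(total, conserved, conserved / total)  # 2 1 0.5
```

On the toy alignment the motif occurs twice in S. cerevisiae and is conserved in all four species once, a conservation rate of 0.5. A genome-scale version would likely also need to handle alignment gaps and compare the observed rate against a background of control motifs.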

Journal Article
24 Apr 2003-Nature
TL;DR: A high-quality draft sequence of the N. crassa genome is reported. Genome analysis suggests that repeat-induced point mutation (RIP) has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.
Abstract: Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase genome encodes about 10,000 protein-coding genes, more than twice as many as in the fission yeast Schizosaccharomyces pombe and only about 25% fewer than in the fruitfly Drosophila melanogaster. Analysis of the gene set yields insights into unexpected aspects of Neurospora biology including the identification of genes potentially associated with red light photobiology, genes implicated in secondary metabolism, and important differences in Ca2+ signalling as compared with plants and animals. Neurospora possesses the widest array of genome defence mechanisms known for any eukaryotic organism, including a process unique to fungi called repeat-induced point mutation (RIP). Genome analysis suggests that RIP has had a profound impact on genome evolution, greatly slowing the creation of new genes through genomic duplication and resulting in a genome with an unusually low proportion of closely related genes.

1,659 citations
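RIP mutates duplicated DNA by introducing C:G to T:A transitions, preferentially at CpA dinucleotides, so RIP-affected sequence tends to be depleted of CpA/TpG and enriched in TpA. A common way to screen for this signature (a general heuristic, not something this abstract describes) uses two dinucleotide ratios: TpA/ApT, which rises with RIP, and (CpA + TpG)/(ApC + GpT), which falls. A minimal Python sketch; the function names and toy sequences are hypothetical:

```python
from collections import Counter
from typing import Tuple


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def rip_indices(seq: str) -> Tuple[float, float]:
    """Return (TpA/ApT, (CpA + TpG)/(ApC + GpT)) for a sequence.

    RIP converts C:G to T:A, mostly at CpA sites, so RIP-affected DNA tends
    to show a higher first ratio and a lower second ratio.
    Raises ZeroDivisionError if a denominator dinucleotide is absent.
    """
    c = dinucleotide_counts(seq)
    product_index = c["TA"] / c["AT"]
    substrate_index = (c["CA"] + c["TG"]) / (c["AC"] + c["GT"])
    return product_index, substrate_index


before = "ATCAGTAC"  # toy sequence containing one CpA
after = "ATTAGTAC"   # same sequence after a C-to-T change at that CpA
print(rip_indices(before))  # (1.0, 0.5)
print(rip_indices(after))   # (2.0, 0.0)
```

Mutating the single CpA in the toy sequence moves both indices in the RIP direction. In practice these ratios are only informative over windows long enough to contain the denominator dinucleotides in reasonable numbers.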

Journal Article
TL;DR: The complete genome sequence of an acetate-utilizing methanogen, Methanosarcina acetivorans C2A, is reported. The sequence indicates the likelihood of undiscovered natural energy sources for methanogenesis, whereas the presence of single-subunit carbon monoxide dehydrogenases raises the possibility of nonmethanogenic growth.
Abstract: The Archaea remain the most poorly understood domain of life despite their importance to the biosphere. Methanogenesis, which plays a pivotal role in the global carbon cycle, is unique to the Archaea. Each year, an estimated 900 million metric tons of methane are biologically produced, representing the major global source for this greenhouse gas and contributing significantly to global warming (Schlesinger 1997). Methanogenesis is critical to the waste-treatment industry, and biologically produced methane also represents an important alternative fuel source. At least two-thirds of the methane in nature is derived from acetate, although only two genera of methanogens are known to be capable of utilizing this substrate. We report here the first complete genome sequence of an acetate-utilizing (acetoclastic) methanogen, Methanosarcina acetivorans C2A. The Methanosarcineae are metabolically and physiologically the most versatile methanogens. Only Methanosarcina species possess all three known pathways for methanogenesis (Fig. 1) and are capable of utilizing no fewer than nine methanogenic substrates, including acetate. In contrast, all other orders of methanogens possess a single pathway for methanogenesis, and many utilize no more than two substrates. Among methanogens, the Methanosarcineae also display extensive environmental diversity. Individual species of Methanosarcina have been found in freshwater and marine sediments, decaying leaves and garden soils, oil wells, sewage and animal waste digesters and lagoons, thermophilic digesters, feces of herbivorous animals, and the rumens of ungulates (Zinder 1993).
Figure 1. Three pathways for methanogenesis. Methanogenesis is a form of anaerobic respiration using a variety of one-carbon (C-1) compounds or acetic acid as a terminal electron acceptor. All three pathways converge on the reduction of methyl-CoM to methane (CH ...
The Methanosarcineae are unique among the Archaea in forming complex multicellular structures during different phases of growth and in response to environmental change (Fig. 2). Within the Methanosarcineae, a number of distinct morphological forms have been characterized, including single cells with and without a cell envelope, as well as multicellular packets and lamina (Macario and Conway de Macario 2001). Packets and lamina display internal morphological heterogeneity, suggesting the possibility of cellular differentiation. Moreover, it has been suggested that cells within lamina may display differential production of extracellular material, a potential form of cellular specialization (Macario and Conway de Macario 2001). The formation of multicellular structures has been proposed to act as an adaptation to stress and likely plays a role in the ability of Methanosarcina species to colonize diverse environments.
Figure 2. Different morphological forms of Methanosarcina acetivorans. Thin-section electron micrographs showing M. acetivorans growing as both single cells (center of micrograph) and within multicellular aggregates (top left, bottom right). Cells were harvested ...
Significantly, powerful methods for genetic analysis exist for Methanosarcina species. These tools include plasmid shuttle vectors (Metcalf et al. 1997), very high efficiency transformation (Metcalf et al. 1997), random in vivo transposon mutagenesis (Zhang et al. 2000), directed mutagenesis of specific genes (Zhang et al. 2000), multiple selectable markers (Boccazzi et al. 2000), reporter gene fusions (M. Pritchett and W. Metcalf, unpubl.), integration vectors (Conway de Macario et al. 1996), and anaerobic incubators for large-scale growth of methanogens on solid media (Metcalf et al. 1998).
Furthermore, and in contrast to other known methanogens, genetic analysis can be used to study the process of methanogenesis: because Methanosarcina species are able to utilize each of the three known methanogenic pathways, mutants in a single pathway are viable (M. Pritchett and W. Metcalf, unpubl.). The availability of genetic methods allowing immediate exploitation of genomic sequence, coupled with the genetic, physiological, and environmental diversity of M. acetivorans, makes this species an outstanding model organism for the study of archaeal biology. For these reasons, we set out to study the genome of M. acetivorans.

626 citations

Journal Article
TL;DR: It is speculated that Naip5 is a functional mammalian homolog of plant "resistance" proteins that monitor for, and initiate host response to, the presence of secreted bacterial virulence proteins.

262 citations

Journal Article
TL;DR: A multi-copy microsatellite marker that is deleted in more than 90% of type I SMA chromosomes is embedded in an intron of the novel transcript H4F5, indicating that H4F5 is also highly deleted in type I SMA chromosomes and is thus a candidate phenotypic modifier for SMA.
Abstract: Spinal muscular atrophy (SMA) is a common recessive disorder characterized by the loss of lower motor neurons in the spinal cord. The disease has been classified into three types based on age of onset and severity (ref. 1). SMA I-III all map to chromosome 5q13 (refs 2,3), and nearly all patients display deletions or gene conversions of the survival motor neuron (SMN1) gene (refs 4-7). Some correlation has been established between SMN protein levels and disease course (refs 8-10); nevertheless, the genetic basis for SMA phenotypic variability remains unclear, and it has been postulated that the loss of an additional modifying factor contributes to the severity of type I SMA. Using comparative genomics to screen for such a factor among evolutionarily conserved sequences between mouse and human, we have identified a novel transcript, H4F5, which lies closer to SMN1 than any previously identified gene in the region. A multi-copy microsatellite marker that is deleted in more than 90% of type I SMA chromosomes is embedded in an intron of this gene, indicating that H4F5 is also highly deleted in type I SMA chromosomes, and thus is a candidate phenotypic modifier for SMA.

147 citations


Cited by
Journal Article
16 Oct 2003-Nature
TL;DR: The construction and analysis of a collection of yeast strains expressing full-length, chromosomally tagged green fluorescent protein fusion proteins helps reveal the logic of transcriptional co-regulation, and provides a comprehensive view of interactions within and between organelles in eukaryotic cells.
Abstract: A fundamental goal of cell biology is to define the functions of proteins in the context of compartments that organize them in the cellular environment. Here we describe the construction and analysis of a collection of yeast strains expressing full-length, chromosomally tagged green fluorescent protein fusion proteins. We classify these proteins, representing 75% of the yeast proteome, into 22 distinct subcellular localization categories, and provide localization information for 70% of previously unlocalized proteins. Analysis of this high-resolution, high-coverage localization data set in the context of transcriptional, genetic, and protein-protein interaction data helps reveal the logic of transcriptional co-regulation, and provides a comprehensive view of interactions within and between organelles in eukaryotic cells.

4,310 citations

Journal Article
21 Oct 2004-Nature
TL;DR: The current human genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps and is accurate to an error rate of approximately 1 event per 100,000 bases.
Abstract: The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers approximately 99% of the euchromatic genome and is accurate to an error rate of approximately 1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

3,989 citations
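To make the reported accuracy concrete, the arithmetic below converts the error rate into an expected number of errors across the finished sequence. It is a back-of-the-envelope sketch that assumes errors occur at a uniform rate, which the abstract does not state; the constant and function names are illustrative.

```python
# Figures quoted in the Build 35 abstract.
TOTAL_BASES = 2_850_000_000  # 2.85 billion nucleotides
BASES_PER_ERROR = 100_000    # "approximately 1 event per 100,000 bases"


def expected_errors(total_bases: int, bases_per_error: int) -> float:
    """Expected error count, assuming errors occur at a uniform rate."""
    return total_bases / bases_per_error


print(f"~{expected_errors(TOTAL_BASES, BASES_PER_ERROR):,.0f} errors")  # ~28,500 errors
```

In other words, an error rate that sounds tiny per base still corresponds to tens of thousands of residual errors genome-wide.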

Journal Article
16 Oct 2003-Nature
TL;DR: A Saccharomyces cerevisiae fusion library is created where each open reading frame is tagged with a high-affinity epitope and expressed from its natural chromosomal location, and it is found that about 80% of the proteome is expressed during normal growth conditions.
Abstract: The availability of complete genomic sequences and technologies that allow comprehensive analysis of global expression profiles of messenger RNA have greatly expanded our ability to monitor the internal state of a cell. Yet biological systems ultimately need to be explained in terms of the activity, regulation and modification of proteins, and the ubiquitous occurrence of post-transcriptional regulation makes mRNA an imperfect proxy for such information. To facilitate global protein analyses, we have created a Saccharomyces cerevisiae fusion library where each open reading frame is tagged with a high-affinity epitope and expressed from its natural chromosomal location. Through immunodetection of the common tag, we obtain a census of proteins expressed during log-phase growth and measurements of their absolute levels. We find that about 80% of the proteome is expressed during normal growth conditions, and, using additional sequence information, we systematically identify misannotated genes. The abundance of proteins ranges from fewer than 50 to more than 10^6 molecules per cell. Many of these molecules, including essential proteins and most transcription factors, are present at levels that are not readily detectable by other proteomic techniques nor predictable by mRNA levels or codon bias measurements.

3,894 citations
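Because the abstract gives only bounds (fewer than 50 and more than 10^6 molecules per cell), the quoted spread is a lower bound: the ratio of the two bounds is 2 × 10^4, or roughly 4.3 orders of magnitude. A small Python check of that arithmetic; the function name is illustrative:

```python
import math


def dynamic_range_orders(low: float, high: float) -> float:
    """Orders of magnitude spanned between two positive abundances."""
    return math.log10(high / low)


# Bounds quoted in the abstract: <50 and >10^6 molecules per cell.
span = dynamic_range_orders(50, 10**6)
print(f"at least {span:.1f} orders of magnitude")  # at least 4.3 orders of magnitude
```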

Journal Article
TL;DR: A comprehensive search for conserved elements in vertebrate genomes is conducted using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes) and a two-state phylogenetic hidden Markov model (phylo-HMM).
Abstract: We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.

3,719 citations
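phastCons labels alignment columns as conserved or non-conserved using a two-state phylo-HMM. The Python sketch below keeps only the two-state hidden Markov structure and decodes the most likely state path with the Viterbi algorithm. It assumes per-column log-likelihoods under each state are already available; in phastCons these come from phylogenetic models fitted by maximum likelihood, which this sketch does not implement. The parameter names `p_c_to_n` and `p_n_to_c` and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence


def viterbi_two_state(
    loglik_cons: Sequence[float],
    loglik_noncons: Sequence[float],
    p_c_to_n: float,
    p_n_to_c: float,
    p_start_cons: float = 0.5,
) -> List[str]:
    """Most likely conserved ('C') / non-conserved ('N') path.

    loglik_cons[i] and loglik_noncons[i] are precomputed log-likelihoods of
    column i under each state (assumed inputs for this sketch).
    """
    log_t = {
        ("C", "C"): math.log(1 - p_c_to_n), ("C", "N"): math.log(p_c_to_n),
        ("N", "N"): math.log(1 - p_n_to_c), ("N", "C"): math.log(p_n_to_c),
    }
    emit = {"C": loglik_cons, "N": loglik_noncons}
    states = ("C", "N")
    score = {"C": math.log(p_start_cons) + emit["C"][0],
             "N": math.log(1 - p_start_cons) + emit["N"][0]}
    back: List[dict] = []  # back[i - 1][s] = best predecessor of state s at column i
    for i in range(1, len(loglik_cons)):
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_t[(p, s)])
            new_score[s] = score[prev] + log_t[(prev, s)] + emit[s][i]
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best-scoring final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


cons = [-3.0, -3.0, -0.5, -0.5, -0.5, -3.0, -3.0]
noncons = [-1.0, -1.0, -2.0, -2.0, -2.0, -1.0, -1.0]
path = viterbi_two_state(cons, noncons, p_c_to_n=0.1, p_n_to_c=0.1)
print("".join(path))  # NNCCCNN
```

The transition probabilities act as a smoothing penalty: small switching probabilities favour long runs of one state, which is why conserved calls come out as contiguous elements rather than isolated columns.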

Journal Article
06 Jun 2008-Science
TL;DR: A quantitative sequencing-based method is developed for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome, and it is demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed.
Abstract: The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.

2,506 citations
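The headline figure (74.5% of nonrepetitive sequence transcribed) is a coverage fraction. In its simplest form, assumed here rather than taken from the paper, it is the share of non-repetitive positions overlapped by at least one mapped read. A minimal Python sketch, assuming reads are already mapped and given as half-open (start, end) intervals on one chromosome, with a set of repetitive positions to exclude; the function name and toy data are hypothetical:

```python
from typing import AbstractSet, Iterable, Tuple


def transcribed_fraction(
    genome_length: int,
    reads: Iterable[Tuple[int, int]],
    repeat_positions: AbstractSet[int] = frozenset(),
) -> float:
    """Fraction of non-repetitive positions covered by at least one read.

    Reads are half-open (start, end) intervals on a single chromosome.
    A difference array records each read in O(1); one sweep then yields
    per-position depth. Raises ZeroDivisionError if every position is masked.
    """
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    covered = eligible = depth = 0
    for pos in range(genome_length):
        depth += diff[pos]
        if pos in repeat_positions:
            continue  # repetitive positions are excluded from the denominator
        eligible += 1
        if depth > 0:
            covered += 1
    return covered / eligible


reads = [(0, 5), (3, 8), (12, 15)]  # toy mapped reads
repeats = set(range(16, 20))        # toy repeat mask
print(transcribed_fraction(20, reads, repeats))  # 0.6875
```

Positions 0-7 and 12-14 are covered, giving 11 of 16 eligible positions. A real analysis would typically also apply a minimum-depth or mapping-quality threshold before counting a position as transcribed.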