Author
Lee Murphy
Other affiliations: Wellcome Trust, Western General Hospital, Wellcome Trust Sanger Institute
Bio: Lee Murphy is an academic researcher from University of Edinburgh. The author has contributed to research in topics: Genome & Gene. The author has an h-index of 40 and has co-authored 72 publications receiving 30,828 citations. Previous affiliations of Lee Murphy include Wellcome Trust & Western General Hospital.
Topics: Genome, Gene, Medicine, DNA methylation, Population
Papers
••
TL;DR: The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve the understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions.
Abstract: Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.
7,779 citations
••
Mark Raymond Adams, Susan E. Celniker, Robert A. Holt, Cheryl A. Evans and 191 more authors (23 institutions)
TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.
Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.
6,180 citations
••
TL;DR: The 8,667,507 base pair linear chromosome of Streptomyces coelicolor is reported, containing the largest number of genes so far discovered in a bacterium.
Abstract: Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.
3,077 citations
••
Institutions: Wellcome Trust Sanger Institute, London Research Institute, Katholieke Universiteit Leuven, Max Planck Society, GATC Biotech, Université catholique de Louvain, Centre national de la recherche scientifique, University of Exeter, Institut national agronomique Paris Grignon, University of Málaga, Pablo de Olavide University, University of Salamanca, University of Sussex, Salk Institute for Biological Studies, Stanford University, Cold Spring Harbor Laboratory, TigerLogic, Rosalind Franklin University of Medicine and Science, Russian Academy of Sciences, Technical University of Denmark
TL;DR: The genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote, is sequenced, and highly conserved genes important for eukaryotic cell organization, including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing, are identified.
Abstract: We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.
1,686 citations
••
TL;DR: Comparing the 3.27-megabase genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis provides clear explanations for these properties and reveals an extreme case of reductive evolution.
Abstract: Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.
1,620 citations
Cited by
••
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.
35,225 citations
••
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
14,075 citations
••
TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems.
Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
12,098 citations
••
TL;DR: This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
8,742 citations
••
TL;DR: The 2017 update of the Heart Disease and Stroke Statistics report, prepared by a writing group on behalf of the American Heart Association Statistics Committee and Stroke Statistics Subcommittee.
Abstract: WRITING GROUP MEMBERS Emelia J. Benjamin, MD, SCM, FAHA Michael J. Blaha, MD, MPH Stephanie E. Chiuve, ScD Mary Cushman, MD, MSc, FAHA Sandeep R. Das, MD, MPH, FAHA Rajat Deo, MD, MTR Sarah D. de Ferranti, MD, MPH James Floyd, MD, MS Myriam Fornage, PhD, FAHA Cathleen Gillespie, MS Carmen R. Isasi, MD, PhD, FAHA Monik C. Jiménez, ScD, SM Lori Chaffin Jordan, MD, PhD Suzanne E. Judd, PhD Daniel Lackland, DrPH, FAHA Judith H. Lichtman, PhD, MPH, FAHA Lynda Lisabeth, PhD, MPH, FAHA Simin Liu, MD, ScD, FAHA Chris T. Longenecker, MD Rachel H. Mackey, PhD, MPH, FAHA Kunihiro Matsushita, MD, PhD, FAHA Dariush Mozaffarian, MD, DrPH, FAHA Michael E. Mussolino, PhD, FAHA Khurram Nasir, MD, MPH, FAHA Robert W. Neumar, MD, PhD, FAHA Latha Palaniappan, MD, MS, FAHA Dilip K. Pandey, MBBS, MS, PhD, FAHA Ravi R. Thiagarajan, MD, MPH Mathew J. Reeves, PhD Matthew Ritchey, PT, DPT, OCS, MPH Carlos J. Rodriguez, MD, MPH, FAHA Gregory A. Roth, MD, MPH Wayne D. Rosamond, PhD, FAHA Comilla Sasson, MD, PhD, FAHA Amytis Towfighi, MD Connie W. Tsao, MD, MPH Melanie B. Turner, MPH Salim S. Virani, MD, PhD, FAHA Jenifer H. Voeks, PhD Joshua Z. Willey, MD, MS John T. Wilkins, MD Jason HY. Wu, MSc, PhD, FAHA Heather M. Alger, PhD Sally S. Wong, PhD, RD, CDN, FAHA Paul Muntner, PhD, MHSc On behalf of the American Heart Association Statistics Committee and Stroke Statistics Subcommittee Heart Disease and Stroke Statistics—2017 Update
7,190 citations