Author

David Harris

Bio: David Harris is an academic researcher at the Wellcome Trust Sanger Institute. The author has contributed to research on the topics Genome and Gene. The author has an h-index of 42 and has co-authored 59 publications that have received 27,537 citations. Previous affiliations of David Harris include the Wellcome Trust and John Radcliffe Hospital.


Papers
Journal Article
11 Jun 1998-Nature
TL;DR: The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve the understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions.
Abstract: Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.

7,779 citations

Journal Article
09 May 2002-Nature
TL;DR: The 8,667,507 base pair linear chromosome of Streptomyces coelicolor is reported, containing the largest number of genes so far discovered in a bacterium.
Abstract: Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

3,077 citations

Journal Article
Valerie Wood, R. Gwilliam, Marie-Adèle Rajandream, M. Lyne, Rachel Lyne, A. Stewart, J. Sgouros, N. Peat, Jacqueline Hayles, Stephen Baker, D. Basham, Sharen Bowman, Karen Brooks, D. Brown, Steve D.M. Brown, Tracey Chillingworth, Carol Churcher, Mark O. Collins, R. Connor, Ann Cronin, P. Davis, Theresa Feltwell, Andrew G. Fraser, S. Gentles, Arlette Goble, N. Hamlin, David Harris, J. Hidalgo, Geoffrey M. Hodgson, S. Holroyd, T. Hornsby, S. Howarth, Elizabeth J. Huckle, Sarah E. Hunt, Kay Jagels, Kylie R. James, L. Jones, Matthew Jones, S. Leather, S. McDonald, J. McLean, P. Mooney, Sharon Moule, Karen Mungall, Lee Murphy, D. Niblett, C. Odell, Karen Oliver, Susan O'Neil, D. Pearson, Michael A. Quail, Ester Rabbinowitsch, Kim Rutherford, Simon Rutter, David L. Saunders, Kathy Seeger, Sarah Sharp, Jason Skelton, Mark Simmonds, R. Squares, S. Squares, K. Stevens, K. Taylor, Ruth Taylor, Adrian Tivey, S. Walsh, T. Warren, S. Whitehead, John Woodward, Guido Volckaert, Rita Aert, Johan Robben, B. Grymonprez, I. Weltjens, E. Vanstreels, Michael A. Rieger, M. Schafer, S. Muller-Auer, C. Gabel, M. Fuchs, C. Fritzc, E. Holzer, D. Moestl, H. Hilbert, K. Borzym, I. Langer, Alfred Beck, Hans Lehrach, Richard Reinhardt, Thomas M. Pohl, P. Eger, Wolfgang Zimmermann, H. Wedler, R. Wambutt, Bénédicte Purnelle, André Goffeau, Edouard Cadieu, Stéphane Dréano, Stéphanie Gloux, Valerie Lelaure, Stéphanie Mottier, Francis Galibert, Stephen J. Aves, Z. Xiang, Cherryl Hunt, Karen Moore, S. M. Hurst, M. Lucas, M. Rochet, Claude Gaillardin, Victor A. Tallada, Andrés Garzón, G. Thode, Rafael R. Daga, L. Cruzado, Juan Jimenez, Miguel del Nogal Sánchez, F. del Rey, J. Benito, Angel Domínguez, José L. Revuelta, Sergio Moreno, John Armstrong, Susan L. Forsburg, L. Cerrutti, Todd M. Lowe, W. R. McCombie, Ian T. Paulsen, Judith A. Potashkin, G. V. Shpakovski, David W. Ussery, Bart Barrell, Paul Nurse
21 Feb 2002-Nature
TL;DR: The genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote, is sequenced, and highly conserved genes important for eukaryotic cell organization are identified, including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing.
Abstract: We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.

1,686 citations

Journal Article
22 Feb 2001-Nature
TL;DR: Comparing the 3.27-megabase genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis provides clear explanations for the bacillus's exceptionally long doubling time and its resistance to laboratory culture, and reveals an extreme case of reductive evolution.
Abstract: Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.

1,620 citations

Journal Article
Alasdair Ivens, Christopher S. Peacock, Elizabeth A. Worthey, Lee Murphy, Gautam Aggarwal, Matthew Berriman, Ellen Sisk, Marie-Adèle Rajandream, Ellen Adlem, Rita Aert, Atashi Anupama, Zina Apostolou, Philip Attipoe, Nathalie Bason, Christopher Bauser, Alfred Beck, Stephen M. Beverley, Gabriella Bianchettin, K. Borzym, G. Bothe, Carlo V. Bruschi, Matt Collins, Eithon Cadag, Laura Ciarloni, Christine Clayton, Richard M.R. Coulson, Ann Cronin, Angela K. Cruz, Robert L. Davies, Javier G. De Gaudenzi, Deborah E. Dobson, Andreas Duesterhoeft, Gholam Fazelina, Nigel Fosker, Alberto C.C. Frasch, Audrey Fraser, Monika Fuchs, Claudia Gabel, Arlette Goble, André Goffeau, David Harris, Christiane Hertz-Fowler, Helmut Hilbert, David Horn, Yiting Huang, Sven Klages, Andrew J Knights, Michael Kube, Natasha Larke, Lyudmila Litvin, Angela Lord, Tin Louie, Marco A. Marra, David Masuy, Keith R. Matthews, Shulamit Michaeli, Jeremy C. Mottram, Silke Müller-Auer, Heather Munden, Siri Nelson, Halina Norbertczak, Karen Oliver, Susan O'Neil, Martin Pentony, Thomas M. Pohl, Claire Price, Bénédicte Purnelle, Michael A. Quail, Ester Rabbinowitsch, Richard Reinhardt, Michael A. Rieger, Joel Rinta, Johan Robben, Laura Robertson, Jeronimo C. Ruiz, Simon Rutter, David L. Saunders, Melanie Schäfer, Jacquie Schein, David C. Schwartz, Kathy Seeger, Amber Seyler, Sarah Sharp, Heesun Shin, Dhileep Sivam, Rob Squares, Steve Squares, Valentina Tosato, Christy Vogt, Guido Volckaert, Rolf Wambutt, T. Warren, Holger Wedler, John Woodward, Shiguo Zhou, Wolfgang Zimmermann, Deborah F. Smith, Jenefer M. Blackwell, Kenneth Stuart, Bart Barrell, Peter J. Myler
15 Jul 2005-Science
TL;DR: The organization of protein-coding genes into long, strand-specific, polycistronic clusters and the lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling.
Abstract: Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.

1,357 citations


Cited by
Journal Article
19 Nov 2014-PLOS ONE
TL;DR: Pilon is a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions, which is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains.
Abstract: Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired-end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3-5 kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants, including duplications, and to resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.

5,659 citations

01 Jan 2016

5,249 citations

Journal Article
TL;DR: A new greedy alignment algorithm is introduced with particularly good performance and it is shown that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data.
Abstract: For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

4,628 citations

Journal Article
03 Oct 2002-Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,312 citations

Journal Article
31 Aug 2000-Nature
TL;DR: It is proposed that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.
Abstract: Pseudomonas aeruginosa is a ubiquitous environmental bacterium that is one of the top three causes of opportunistic human infections. A major factor in its prominence as a pathogen is its intrinsic resistance to antibiotics and disinfectants. Here we report the complete sequence of P. aeruginosa strain PAO1. At 6.3 million base pairs, this is the largest bacterial genome sequenced, and the sequence provides insights into the basis of the versatility and intrinsic drug resistance of P. aeruginosa. Consistent with its larger genome size and environmental adaptability, P. aeruginosa contains the highest proportion of regulatory genes observed for a bacterial genome and a large number of genes involved in the catabolism, transport and efflux of organic compounds as well as four potential chemotaxis systems. We propose that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.

4,220 citations