Showing papers on "Genomics" published in 2002


Journal ArticleDOI
Robert H. Waterston, Kerstin Lindblad-Toh, Ewan Birney, Jane Rogers, +219 more (26 institutions)
05 Dec 2002-Nature
TL;DR: The results of an international collaboration to produce a high-quality draft sequence of the mouse genome are reported, and an initial comparative analysis of the mouse and human genomes is presented, describing some of the insights that can be gleaned from the two sequences.
Abstract: The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

6,643 citations


Journal ArticleDOI
25 Jul 2002-Nature
TL;DR: It is shown that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions.
Abstract: Determining the effect of gene deletion is a fundamental approach to understanding gene function. Conventional genetic screens exhibit biases, and genes contributing to a phenotype are often missed. We systematically constructed a nearly complete collection of gene-deletion mutants (96% of annotated open reading frames, or ORFs) of the yeast Saccharomyces cerevisiae. DNA sequences dubbed 'molecular bar codes' uniquely identify each strain, enabling their growth to be analysed in parallel and the fitness contribution of each gene to be quantitatively assessed by hybridization to high-density oligonucleotide arrays. We show that previously known and new genes are necessary for optimal growth under six well-studied conditions: high salt, sorbitol, galactose, pH 8, minimal medium and nystatin treatment. Less than 7% of genes that exhibit a significant increase in messenger RNA expression are also required for optimal growth in four of the tested conditions. Our results validate the yeast gene-deletion collection as a valuable resource for functional genomics.

4,328 citations


Journal ArticleDOI
03 Oct 2002-Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,312 citations


Journal ArticleDOI
05 Apr 2002-Science
TL;DR: A draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, is produced by whole-genome shotgun sequencing; the large proportion of rice genes with no recognizable homologs is attributed to a gradient in the GC content of rice coding sequences.
Abstract: We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC-content of rice coding sequences.

4,064 citations


Journal ArticleDOI
TL;DR: This work presents a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families based on precomputed sequence similarity information that has been rigorously tested and validated on a number of very large databases.
Abstract: Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.

3,468 citations
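
The Markov cluster (MCL) step named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TRIBE-MCL implementation: the function names, the inflation value of 2.0, the fixed iteration count, and the toy similarity matrix are all hypothetical choices, and real inputs would be precomputed sequence similarity scores rather than hand-written weights.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(similarity, inflation=2.0, iterations=40):
    """Cluster nodes of a symmetric similarity matrix with a basic MCL loop.

    Assumption: self-loops of weight 1.0 are added before normalizing,
    which is a common convention but not something the abstract specifies.
    """
    n = len(similarity)
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep non-negligible mass act as attractors; the columns
    # they cover form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy input: two tight groups of three "proteins" joined by one weak hit.
toy = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
print(mcl(toy))
```

Expansion spreads flow along paths and inflation strengthens strong flows at the expense of weak ones, so the weak link between the two groups fades out while flow inside each group concentrates.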


Journal ArticleDOI
09 May 2002-Nature
TL;DR: The 8,667,507 base pair linear chromosome of Streptomyces coelicolor is reported, containing the largest number of genes so far discovered in a bacterium.
Abstract: Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.

3,077 citations


Journal ArticleDOI
Yasushi Okazaki, Masaaki Furuno, Takeya Kasukawa, Jun Adachi, Hidemasa Bono, S. Kondo, Itoshi Nikaido, Naoki Osato, Rintaro Saito, Harukazu Suzuki, Itaru Yamanaka, H. Kiyosawa, Ken Yagi, Yasuhiro Tomaru, Yuki Hasegawa, A. Nogami, Christian Schönbach, Takashi Gojobori, Richard M. Baldarelli, David P. Hill, Carol J. Bult, David A. Hume, John Quackenbush, Lynn M. Schriml, Alexander Kanapin, Hideo Matsuda, Serge Batalov, Kirk W. Beisel, Judith A. Blake, Dirck W. Bradt, Vladimir Brusic, Cyrus Chothia, Lori E. Corbani, S. Cousins, Emiliano Dalla, Tommaso A. Dragani, Colin F. Fletcher, Alistair R. R. Forrest, K. S. Frazer, Terry Gaasterland, Manuela Gariboldi, Carmela Gissi, Adam Godzik, Julian Gough, Sean M. Grimmond, Stefano Gustincich, Nobutaka Hirokawa, Ian J. Jackson, Erich D. Jarvis, Akio Kanai, Hideya Kawaji, Yuka Imamura Kawasawa, Rafal M. Kedzierski, Benjamin L. King, Akihiko Konagaya, Igor V. Kurochkin, Yong-Hwan Lee, Boris Lenhard, Paul A. Lyons, Donna Maglott, Lois J. Maltais, Luigi Marchionni, Louise M. McKenzie, Harukata Miki, Takeshi Nagashima, Koji Numata, Toshihisa Okido, William J. Pavan, Geo Pertea, Graziano Pesole, Nikolai Petrovsky, Ramesh S. Pillai, Joan Pontius, D. Qi, Sridhar Ramachandran, Timothy Ravasi, Jonathan C. Reed, Deborah J. Reed, Jeffrey G. Reid, Brian Z. Ring, M. Ringwald, Albin Sandelin, Claudio Schneider, Colin A. Semple, Mitsutoshi Setou, K. Shimada, Razvan Sultana, Yoichi Takenaka, Martin S. Taylor, Rohan D. Teasdale, Masaru Tomita, Roberto Verardo, Lukas Wagner, Claes Wahlestedt, Y. Wang, Yoshiki Watanabe, Christine A. Wells, Laurens G. Wilming, Anthony Wynshaw-Boris, Masashi Yanagisawa, Ivana V. Yang, L. Yang, Zheng Yuan, Mihaela Zavolan, Yunhui Zhu, Anne M. Zimmer, Piero Carninci, N. Hayatsu, Tomoko Hirozane-Kishikawa, Hideaki Konno, M. Nakamura, Naoko Sakazume, K. Sato, Toshiyuki Shiraki, Kazunori Waki, Jun Kawai, Katsunori Aizawa, Takahiro Arakawa, S. Fukuda, A. Hara, W. Hashizume, K. Imotani, Y. Ishii, Masayoshi Itoh, Ikuko Kagawa, A. Miyazaki, K. Sakai, D. Sasaki, K. Shibata, Akira Shinagawa, Ayako Yasunishi, Masayasu Yoshino, Robert H. Waterston, Eric S. Lander, Jane Rogers, Ewan Birney, Yoshihide Hayashizaki
05 Dec 2002-Nature
TL;DR: The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Abstract: Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.

1,663 citations


Journal ArticleDOI
TL;DR: The recent availability of large quantities of genomic sequence has led to a shift from the genetic characterization of single elements to genome-wide analysis of enormous transposable-element populations, particularly in plants.
Abstract: Transposable elements are the single largest component of the genetic material of most eukaryotes. The recent availability of large quantities of genomic sequence has led to a shift from the genetic characterization of single elements to genome-wide analysis of enormous transposable-element populations. Nowhere is this shift more evident than in plants, in which transposable elements were first discovered and where they are still actively reshaping genomes.

923 citations


Journal ArticleDOI
TL;DR: MUMmer 2 is a suffix-tree-based system that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory.
Abstract: We describe a suffix-tree algorithm that can align the entire genome sequences of eukaryotic and prokaryotic organisms with minimal use of computer time and memory. The new system, MUMmer 2, runs three times faster while using one-third as much memory as the original MUMmer system. It has been used successfully to align the entire human and mouse genomes to each other, and to align numerous smaller eukaryotic and prokaryotic genomes. A new module permits the alignment of multiple DNA sequence fragments, which has proven valuable in the comparison of incomplete genome sequences. We also describe a method to align more distantly related genomes by detecting protein sequence homology. This extension to MUMmer aligns two genomes after translating the sequence in all six reading frames, extracts all matching protein sequences and then clusters together matches. This method has been applied to both incomplete and complete genome sequences in order to detect regions of conserved synteny, in which multiple proteins from one organism are found in the same order and orientation in another. The system code is being made freely available by the authors.

897 citations
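
The six-frame translation mentioned in the abstract above (translating a DNA sequence in all six reading frames before matching proteins) can be illustrated with a short sketch. This is not MUMmer code: the function names are hypothetical, the standard genetic code is assumed, and codons containing anything other than A, C, G, or T are rendered as "X" as a simplification.

```python
# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Map frame labels (+1..+3 forward, -1..-3 reverse) to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for label, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(label, protein)
```

Comparing translated frames rather than raw nucleotides lets matches survive synonymous changes, which is why a protein-level comparison can help with more distantly related genomes.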


Journal ArticleDOI
31 Jan 2002-Nature
TL;DR: The complete genome sequence and its analysis of strain GMI1000 suggests that bacterial plant pathogens and animal pathogens harbour distinct arrays of specialized type III-dependent effectors.
Abstract: Ralstonia solanacearum is a devastating, soil-borne plant pathogen with a global distribution and an unusually wide host range. It is a model system for the dissection of molecular determinants governing pathogenicity. We present here the complete genome sequence and its analysis of strain GMI1000. The 5.8-megabase (Mb) genome is organized into two replicons: a 3.7-Mb chromosome and a 2.1-Mb megaplasmid. Both replicons have a mosaic structure providing evidence for the acquisition of genes through horizontal gene transfer. Regions containing genetically mobile elements associated with the percentage of G+C bias may have an important function in genome evolution. The genome encodes many proteins potentially associated with a role in pathogenicity. In particular, many putative attachment factors were identified. The complete repertoire of type III secreted effector proteins can be studied. Over 40 candidates were identified. Comparison with other genomes suggests that bacterial plant pathogens and animal pathogens harbour distinct arrays of specialized type III-dependent effectors.

887 citations


Journal ArticleDOI
TL;DR: This review article provides definitions of terms commonly used in genetics, delineates the distinction between genetics and genomics, and supplies examples of the ways in which genetic information can be used in the day-to-day care of patients.
Abstract: This review article launches our series on genomic medicine. It provides definitions of terms commonly used in genetics, delineates the distinction between genetics and genomics, and supplies examples of the ways in which genetic information can be used in the day-to-day care of patients. The mechanisms leading to the availability of more than 100,000 proteins from only approximately 30,000 genes are described. The various common types of mutations are identified and defined, and modes of inheritance — from simple mendelian to complex to mitochondrial — are detailed.

Journal ArticleDOI
TL;DR: The complete re-annotation of the genome sequence of Mycobacterium tuberculosis strain H37Rv is presented almost 4 years after the first submission; eighty-two new protein-coding sequences (CDS) are included, 22 of which have a predicted function.
Abstract: Original genome annotations need to be regularly updated if the information they contain is to remain accurate and relevant. Here the complete re-annotation of the genome sequence of Mycobacterium tuberculosis strain H37Rv is presented almost 4 years after the first submission. Eighty-two new protein-coding sequences (CDS) have been included and 22 of these have a predicted function. The majority were identified by manual or automated re-analysis of the genome and most of them were shorter than the 100 codon cut-off used in the initial genome analysis. The functional classification of 643 CDS has been changed based principally on recent sequence comparisons and new experimental data from the literature. More than 300 gene names and over 1000 targeted citations have been added and the lengths of 60 genes have been modified. Presently, it is possible to assign a function to 2058 proteins (52% of the 3995 proteins predicted) and only 376 putative proteins share no homology with known proteins and thus could be unique to M. tuberculosis.

Journal ArticleDOI
TL;DR: The approach presented here simplifies the production of proteins from a wide variety of organisms for genomics-based studies and automates the design of oligonucleotides for gene synthesis.
Abstract: The availability of sequences of entire genomes has dramatically increased the number of protein targets, many of which will need to be overexpressed in cells other than the original source of DNA. Gene synthesis often provides a fast and economically efficient approach. The synthetic gene can be optimized for expression and constructed for easy mutational manipulation without regard to the parent genome. Yet design and construction of synthetic genes, especially those coding for large proteins, can be a slow, difficult and confusing process. We have written a computer program that automates the design of oligonucleotides for gene synthesis. Our program requires simple input information, i.e. amino acid sequence of the target protein and melting temperature (needed for the gene assembly) of synthetic oligonucleotides. The program outputs a series of oligonucleotide sequences with codons optimized for expression in an organism of choice. Those oligonucleotides are characterized by highly homogeneous melting temperatures and a minimized tendency for hairpin formation. With the help of this program and a two-step PCR method, we have successfully constructed numerous synthetic genes, ranging from 139 to 1042 bp. The approach presented here simplifies the production of proteins from a wide variety of organisms for genomics-based studies.
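
The two inputs named in the abstract above, an amino acid sequence and a target melting temperature, suggest a simple sketch of the general idea. Everything here is an assumption for illustration and not the authors' program: the codon table is a placeholder rather than a validated usage table for any organism, the melting temperature uses the rough Wallace rule (2 °C per A/T, 4 °C per G/C, meant for short oligos), and the greedy split yields non-overlapping segments, whereas real gene assembly needs overlapping oligos and hairpin checks.

```python
# Placeholder "preferred" codon per amino acid (hypothetical, illustration only).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein):
    """Turn a protein string into DNA using one fixed codon per residue."""
    return "".join(PREFERRED_CODON[residue] for residue in protein.upper())


def wallace_tm(oligo):
    """Rough melting temperature in degrees C by the Wallace rule."""
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    return 2 * at + 4 * gc


def split_by_tm(dna, target_tm):
    """Greedily cut DNA into segments that each just reach target_tm.

    The final segment may fall short of the target if the sequence runs out.
    """
    segments, current = [], ""
    for base in dna:
        current += base
        if wallace_tm(current) >= target_tm:
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments


dna = back_translate("MAK")
print(dna, wallace_tm(dna), split_by_tm(dna, 10))
```

Cutting by melting temperature rather than by fixed length is what keeps the segment temperatures close to one another, which is the property the abstract highlights.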

Journal ArticleDOI
02 Aug 2002-Science
TL;DR: By combining advances in computational fluorescence microscopy with multiplex probe design, this work devised technology in which the expression of many genes can be visualized simultaneously inside single cells with high spatial and temporal resolution.
Abstract: A key goal of biology is to relate the expression of specific genes to a particular cellular phenotype. However, current assays for gene expression destroy the structural context. By combining advances in computational fluorescence microscopy with multiplex probe design, we devised technology in which the expression of many genes can be visualized simultaneously inside single cells with high spatial and temporal resolution. Analysis of 11 genes in serum-stimulated cultured cells revealed unique patterns of gene expression within individual cells. Using the nucleus as the substrate for parallel gene analysis, we provide a platform for the fusion of genomics and cell biology: "cellular genomics."

Journal ArticleDOI
TL;DR: A genomics approach that uses hmmer, a computational search tool based on hidden Markov models, in combination with blast identifies 28 new human and 43 new mouse β-defensin genes in five syntenic chromosomal regions, demonstrating the value of a genomewide search strategy to identify genes with conserved structural motifs.
Abstract: The innate immune system includes antimicrobial peptides that protect multicellular organisms from a diverse spectrum of microorganisms. β-Defensins comprise one important family of mammalian antimicrobial peptides. The annotation of the human genome fails to reveal the expected diversity, and a recent query of the draft sequence with the blast search engine found only one new β-defensin gene (DEFB3). To define better the β-defensin gene family, we adopted a genomics approach that uses hmmer, a computational search tool based on hidden Markov models, in combination with blast. This strategy identified 28 new human and 43 new mouse β-defensin genes in five syntenic chromosomal regions. Within each syntenic cluster, the gene sequences and organization were similar, suggesting each cluster pair arose from a common ancestor and was retained because of conserved functions. Preliminary analysis indicates that at least 26 of the predicted genes are transcribed. These results demonstrate the value of a genomewide search strategy to identify genes with conserved structural motifs. Discovery of these genes represents a new starting point for exploring the role of β-defensins in innate immunity.

Journal ArticleDOI
11 Jan 2002-Cell
TL;DR: Comparative genomics addresses, for the first time at whole-genome resolution, a set of fundamental biological questions related to populations, starting with: what is the structure of the global phage population?

Journal ArticleDOI
TL;DR: The WGS strategy can efficiently produce a high-quality sequence of a metazoan genome while generating the reagents required for sequence finishing; however, the initial method of repeat assembly was flawed.
Abstract: Background: The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to the accepted standard for finished sequence? We are now in a position to answer these questions.

Journal ArticleDOI
14 Jun 2002-Science
TL;DR: Comparison of the whole-genome sequence of Bacillus anthracis isolated from a victim of a recent bioterrorist anthrax attack with a reference reveals 60 new markers that include single nucleotide polymorphisms, inserted or deleted sequences, and tandem repeats.
Abstract: Comparison of the whole-genome sequence of Bacillus anthracis isolated from a victim of a recent bioterrorist anthrax attack with a reference reveals 60 new markers that include single nucleotide polymorphisms (SNPs), inserted or deleted sequences, and tandem repeats. Genome comparison detected four high-quality SNPs between the two sequenced B. anthracis chromosomes and seven differences among different preparations of the reference genome. These markers have been tested on a collection of anthrax isolates and were found to divide these samples into distinct families. These results demonstrate that genome-based analysis of microbial pathogens will provide a powerful new tool for investigation of infectious disease outbreaks.

Journal ArticleDOI
TL;DR: It is found that functional cis-regulatory variation is widespread in the human genome and that the consequent variation in gene expression is twofold or greater for 63% of the genes surveyed, and the distinctive consequences of cis-regulatory variation for the genotype-phenotype relationship are outlined.
Abstract: Changes in gene expression and regulation--due in particular to the evolution of cis-regulatory DNA sequences--may underlie many evolutionary changes in phenotypes, yet little is known about the distribution of such variation in populations. We present in this study the first survey of experimentally validated functional cis-regulatory polymorphism. These data are derived from more than 140 polymorphisms involved in the regulation of 107 genes in Homo sapiens, the eukaryote species with the most available data. We find that functional cis-regulatory variation is widespread in the human genome and that the consequent variation in gene expression is twofold or greater for 63% of the genes surveyed. Transcription factor-DNA interactions are highly polymorphic, and regulatory interactions have been gained and lost within human populations. On average, humans are heterozygous at more functional cis-regulatory sites (>16,000) than at amino acid positions (<13,000), in part because of an overrepresentation among the former in multiallelic tandem repeat variation, especially (AC)(n) dinucleotide microsatellites. The role of microsatellites in gene expression variation may provide a larger store of heritable phenotypic variation, and a more rapid mutational input of such variation, than has been realized. Finally, we outline the distinctive consequences of cis-regulatory variation for the genotype-phenotype relationship, including ubiquitous epistasis and genotype-by-environment interactions, as well as underappreciated modes of pleiotropy and overdominance. Ordinary small-scale mutations contribute to pervasive variation in transcription rates and consequently to patterns of human phenotypic variation.

Journal ArticleDOI
31 May 2002-Science
TL;DR: Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22.
Abstract: The high degree of similarity between the mouse and human genomes is demonstrated through analysis of the sequence of mouse chromosome 16 (Mmu 16), which was obtained as part of a whole-genome shotgun assembly of the mouse genome. The mouse genome is about 10% smaller than the human genome, owing to a lower repetitive DNA content. Comparison of the structure and protein-coding potential of Mmu 16 with that of the homologous segments of the human genome identifies regions of conserved synteny with human chromosomes (Hsa) 3, 8, 12, 16, 21, and 22. Gene content and order are highly conserved between Mmu 16 and the syntenic blocks of the human genome. Of the 731 predicted genes on Mmu 16, 509 align with orthologs on the corresponding portions of the human genome, 44 are likely paralogous to these genes, and 164 genes have homologs elsewhere in the human genome; there are 14 genes for which we could find no human counterpart.

Journal ArticleDOI
TL;DR: Toxicogenomics is defined as “the study of the relationship between the structure and activity of the genome (the cellular complement of genes) and the adverse biological effects of exogenous agents”, and there are powerful new methods for protein analysis and for analysis of cellular small molecules.
Abstract: The unprecedented advances in molecular biology during the last two decades have resulted in a dramatic increase in knowledge about gene structure and function, an immense database of genetic sequence information, and an impressive set of efficient new technologies for monitoring genetic sequences, genetic variation, and global functional gene expression. These advances have led to a new sub-discipline of toxicology: “toxicogenomics”. We define toxicogenomics as “the study of the relationship between the structure and activity of the genome (the cellular complement of genes) and the adverse biological effects of exogenous agents.” This broad definition encompasses most of the variations in the current usage of this term, and in its broadest sense includes studies of the cellular products controlled by the genome (messenger RNAs, proteins, metabolites, etc.). The new “global” methods of measuring families of cellular molecules, such as RNA, proteins, and intermediary metabolites have been termed “-omic” technologies, based on their ability to characterize all, or most, members of a family of molecules in a single analysis. With these new tools, we can now obtain complete assessments of the functional activity of biochemical pathways, and of the structural genetic (sequence) differences among individuals and species, that were previously unattainable. These powerful new methods of high-throughput and multi-endpoint analysis include gene expression arrays that will soon permit the simultaneous measurement of the expression of all human genes on a single “chip”. Likewise, there are powerful new methods for protein analysis (proteomics: the study of the complement of proteins in the cell) and for analysis of cellular small molecules (metabonomics: the study of the cellular metabolites formed and degraded under genetic control). This will likely be extended in the near future to other important classes of biomolecules such as lipids, carbohydrates, etc. These assays provide a general capability for global assessment of many classes of cellular molecules, providing new approaches to assessing functional cellular alterations. These new methods have already facilitated significant advances in our understanding of the molecular responses to cell and tissue damage, and of perturbations in functional cellular systems.
This article has been reproduced from Mutation Research, Vol 499, 2002, pp 13–25, Aardema & MacGregor, by the permission of Elsevier Science, Ltd.

Journal ArticleDOI
TL;DR: This work identifies potential new members of many existing functional categories including 285 candidate proteins involved in transcription, processing and transport of non-coding RNA molecules and presents experimental validation confirming the involvement of several of these proteins in ribosomal RNA processing.
Abstract: Genome sequencing has led to the discovery of tens of thousands of potential new genes. Six years after the sequencing of the well-studied yeast Saccharomyces cerevisiae and the discovery that its genome encodes ∼6,000 predicted proteins, more than 2,000 have not yet been characterized experimentally, and determining their functions seems far from a trivial task. One crucial constraint is the generation of useful hypotheses about protein function. Using a new approach to interpret microarray data, we assign likely cellular functions with confidence values to these new yeast proteins. We perform extensive genome-wide validations of our predictions and offer visualization methods for exploration of the large numbers of functional predictions. We identify potential new members of many existing functional categories including 285 candidate proteins involved in transcription, processing and transport of non-coding RNA molecules. We present experimental validation confirming the involvement of several of these proteins in ribosomal RNA processing. Our methodology can be applied to a variety of genomics data types and organisms.

Journal ArticleDOI
TL;DR: The entire genome of a thermophilic unicellular cyanobacterium, Thermosynechococcus elongatus BP-1, was sequenced, and the presence of 28 copies of group II introns, 8 of which contained a presumptive gene for maturase/reverse transcriptase was observed.
Abstract: The entire genome of a thermophilic unicellular cyanobacterium, Thermosynechococcus elongatus BP-1, was sequenced. The genome consisted of a circular chromosome 2,593,857 bp long, and no plasmid was detected. A total of 2475 potential protein-encoding genes, one set of rRNA genes, 42 tRNA genes representing 42 tRNA species and 4 genes for small structural RNAs were assigned to the chromosome by similarity search and computer prediction. The translated products of 56% of the potential protein-encoding genes showed sequence similarity to experimentally identified and predicted proteins of known function, and the products of 34% of these genes showed sequence similarity to the translated products of hypothetical genes. The remaining 10% lacked significant similarity to genes for predicted proteins in the public DNA databases. Sixty-three percent of the T. elongatus genes showed significant sequence similarity to those of both Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120, while 22% of the genes were unique to this species, indicating a high degree of divergence of the gene information among cyanobacterial strains. The lack of genes for typical fatty acid desaturases and the presence of more genes for heat-shock proteins in comparison with other mesophilic cyanobacteria may be genomic features of thermophilic strains. A remarkable feature of the genome is the presence of 28 copies of group II introns, 8 of which contained a presumptive gene for maturase/reverse transcriptase. A trace of genome rearrangement mediated by the group II introns was also observed.

Journal ArticleDOI
TL;DR: Nucleotide diversity is also being used to discover the function of genes through the mapping of quantitative trait loci in structured populations, the positional cloning of strong QTL, and association mapping.

Journal ArticleDOI
TL;DR: The availability of new resources, such as a bacterial artificial chromosome library and a huge collection of expressed sequence tags, has opened the gateway to promising functional analyses on a genomic scale.

Journal ArticleDOI
TL;DR: Higher nucleotide diversity in the avian genome could be due to the relatively older age of flycatcher populations, compared with humans, and/or a higher long‐term effective population size.
Abstract: As a case study for single-nucleotide polymorphism (SNP) identification in species for which little or no sequence information is available, we investigated several approaches to identifying SNPs in two passerine bird species: pied and collared flycatchers (Ficedula hypoleuca and F. albicollis). All approaches were successful in identifying sequence polymorphism and over 50 candidate SNPs per species were identified from ≈ 9.1 kb of sequence. In addition, 17 sites were identified in which the frequency of alternative bases differed by > 50% between species (termed interspecific SNPs). Interestingly, polymorphism of microsatellite/intron loci in the source species appeared to be a positive predictor of nucleotide diversity in homologous flycatcher sequences. The overall nucleotide diversity of flycatchers was 2.3–2.7 × 10⁻³, which is ≈ 3–6 times higher than observed in recent studies of human SNPs. Higher nucleotide diversity in the avian genome could be due to the relatively older age of flycatcher populations, compared with humans, and/or a higher long-term effective population size.


Journal ArticleDOI
04 Apr 2002-Nature
TL;DR: In a comparison of amino-acid replacements among species of the mustard weed Arabidopsis with those among species of the fruitfly Drosophila, evidence is found for predominantly beneficial gene substitutions in Drosophila but predominantly detrimental substitutions in Arabidopsis, corroborating a prediction of population genetics theory that species with a high frequency of inbreeding are less efficient in eliminating deleterious mutations.
Abstract: Population geneticists have long sought to estimate the distribution of selection intensities among genes of diverse function across the genome. Only recently have DNA sequencing and analytical techniques converged to make this possible. Important advances have come from comparing genetic variation within species (polymorphism) with fixed differences between species (divergence). These approaches have been used to examine individual genes for evidence of selection. Here we use the fact that the time since species divergence allows combination of data across genes. In a comparison of amino-acid replacements among species of the mustard weed Arabidopsis with those among species of the fruitfly Drosophila, we find evidence for predominantly beneficial gene substitutions in Drosophila but predominantly detrimental substitutions in Arabidopsis. We attribute this difference to the Arabidopsis mating system of partial self-fertilization, which corroborates a prediction of population genetics theory that species with a high frequency of inbreeding are less efficient in eliminating deleterious mutations owing to their reduced effective population size.

Book
30 Sep 2002
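TL;DR: Covers the history of genomics, DNA array formats and readout methods, gene expression profiling experiments, statistical analysis of array data, a survey of current DNA array applications, and an overview of regulatory, metabolic and signaling networks in systems biology.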
Abstract: Preface; 1. A brief history of genomics; 2. DNA array formats; 3. DNA array readout methods; 4. Gene expression profiling experiments: problems, pitfalls and solutions; 5. Statistical analysis of array data: inferring changes; 6. Statistical analysis of array data: dimensionality reduction, clustering, and regulatory regions; 7. Survey of current DNA array applications; 8. Systems biology: overview of regulatory, metabolic and signaling networks.

Journal ArticleDOI
TL;DR: To accelerate the molecular analysis of behavior in the honey bee (Apis mellifera), expressed sequence tag (EST) and cDNA microarray resources for the bee brain were created; in addition, over 100 Apis transcript sequences conserved with other organisms appear to have been lost from the Drosophila genome.
Abstract: The honey bee (Apis mellifera) is an important model for studies of neural and behavioral plasticity, particularly with respect to social behavior, learning, and memory (Fahrbach and Robinson 1995; Robinson 1998; Menzel 2001; Maleszka et al. 2000). The neuroanatomy, neurophysiology, and neurochemistry of the honey bee brain have been studied extensively, and several functions have been mapped to particular brain regions (e.g., Menzel 2001; Fahrbach and Robinson 1995). Honey bees also have been used extensively to study the genetic underpinnings of behavior (Rothenbuhler 1967; Page and Robinson 1991). In the past few years, these lines of inquiry have been extended to the discovery of quantitative trait loci (Hunt et al. 1995, 1998) and analyses of expression levels of genes in the brain (Kucharski et al. 1998, 2000; Fiala et al. 1999; Toma et al. 2000; Shapira et al. 2001; Kucharski and Maleszka 2002). One strong advantage of working with honey bees is that it is possible to study behavior under both laboratory and natural conditions. The natural social life of honey bees, though arguably as complex as in many vertebrate societies, can be extensively manipulated with precision. Insights gained from both lab and field studies ultimately will enable information on genes influencing neural and behavioral plasticity to be interpreted from ecological and evolutionary perspectives, contributing to a more comprehensive understanding of genes, brain, and behavior (Robinson 1999). Molecular analyses in the honey bee have been constrained by the high investment required to identify and clone individual genes and the need to have an a priori hypothesis about each gene. The public databases contained only about 101 complete or near-complete A. mellifera gene sequences (nonredundant entries in SWISS-PROT and TrEMBL, as of December 2001) and, prior to this study, a total of 800 nucleotide sequences, most of them expressed sequence tags (ESTs) from antennae (H.M.R., unpubl.) or larvae (Evans and Wheeler 2001). The value of studying many genes simultaneously in the honey bee was demonstrated by Evans and Wheeler (2001) who identified gene expression profiles that were characteristic for worker/queen caste differentiation. This study involved the initial identification of 158 candidate clones using subtractive methods, and was thus limited by the small number of genes analyzed. Current DNA microarray technologies allow expression studies of many thousands of genes at the same time (Schena et al. 1995; DeRisi et al. 1997). ESTs provide an economical approach to identifying large numbers of genes that can be used in gene expression and other genomic studies (reviewed by Gerhold and Caskey 1996; see also Dimopoulos et al. 2000 and Porcel et al. 2000). Here, we describe a collection of more than 20,000 ESTs generated from the A. mellifera brain, putatively representing 8912 different transcripts after sequence assembly. To facilitate gene identification and functional genomic studies in the honey bee, the brain EST set has been annotated using the structured vocabulary provided by the Gene Ontology Consortium (2001), based on molecular studies of gene function in Drosophila melanogaster. We describe a DNA microarray resource composed of over 7000 EST cDNA clones putatively representing different transcripts. We demonstrate the utility of this resource by reporting on gene expression measured in single honey bee brains. Additionally, comparative genomics approaches were used to predict or improve predictions for 122 genes in Drosophila, as well as to identify 126 genes conserved between Apis and other organisms that apparently have been lost from the Drosophila genome.