Showing papers on "Genomics published in 2003"

Journal Article

The International HapMap Project

John W. Belmont¹, Paul Hardenbol, Thomas D. Willis, Fuli Yu¹, Huanming Yang², Lan Yang Ch'Ang, Wei Huang³, Bin Liu², Yan Shen³, Paul K.H. Tam⁴, Lap-Chee Tsui⁴, Mary M.Y. Waye⁵, Jeffrey Tze Fei Wong⁶, Changqing Zeng², Qingrun Zhang², Mark S. Chee⁷, Luana Galver⁷, Semyon Kruglyak⁷, Sarah S. Murray⁷, Arnold Oliphant⁷, Alexandre Montpetit⁸, Fanny Chagnon⁸, Vincent Ferretti⁸, Martin Leboeuf⁸, Michael S. Phillips⁸, Andrei Verner⁸, Shenghui Duan⁹, Denise L. Lind¹⁰, Raymond D. Miller⁹, John P. Rice⁹, Nancy L. Saccone⁹, Patricia Taillon-Miller⁹, Ming Xiao¹⁰, Akihiro Sekine, Koki Sorimachi, Yoichi Tanaka, Tatsuhiko Tsunoda, Eiji Yoshino, David R. Bentley¹¹, Sarah E. Hunt¹¹, Don Powell¹¹, Houcan Zhang¹², Ichiro Matsuda¹³, Yoshimitsu Fukushima¹⁴, Darryl Macer¹⁵, Eiko Suda¹⁵, Charles N. Rotimi¹⁶, Clement Adebamowo¹⁷, Toyin Aniagwu¹⁷, Patricia A. Marshall¹⁸, Olayemi Matthew¹⁷, Chibuzor Nkwodimmah¹⁷, Charmaine D.M. Royal¹⁶, Mark Leppert¹⁹, Missy Dixon¹⁹, Fiona Cunningham²⁰, Ardavan Kanani²⁰, Gudmundur A. Thorisson²⁰, Peter E. Chen²¹, David J. Cutler²¹, Carl S. Kashuk²¹, Peter Donnelly²², Jonathan Marchini²², Gilean McVean²², Simon Myers²², Lon R. Cardon²², Andrew P. Morris²², Bruce S. Weir²³, James C. Mullikin²⁴, Michael Feolo²⁴, Mark J. Daly²⁵, Renzong Qiu²⁶, Alastair Kent, Georgia M. Dunston¹⁶, Kazuto Kato²⁷, Norio Niikawa²⁸, Jessica Watkin²⁹, Richard A. Gibbs¹, Erica Sodergren¹, George M. Weinstock¹, Richard K. Wilson⁹, Lucinda Fulton⁹, Jane Rogers¹¹, Bruce W. Birren²⁵, Hua Han², Hongguang Wang, Martin Godbout³⁰, John C. Wallenburg⁸, Paul L'Archevêque, Guy Bellemare, Kazuo Todani, Takashi Fujita, Satoshi Tanaka, Arthur L. Holden, Francis S. Collins²⁴, Lisa D. Brooks²⁴, Jean E. McEwen²⁴, Mark S. Guyer²⁴, Elke Jordan³¹, Jane Peterson²⁴, Jack Spiegel²⁴, Lawrence M. Sung³², Lynn F. Zacharia²⁴, Karen Kennedy²⁹, Michael Dunn²⁹, Richard Seabrook²⁹, Mark Shillito, Barbara Skene²⁹, John Stewart²⁹, David Valle²¹, Ellen Wright Clayton³³, Lynn B. Jorde¹⁹, Aravinda Chakravarti²¹, Mildred K. Cho³⁴, Troy Duster³⁵, Troy Duster³⁶, Morris W. Foster³⁷, Maria Jasperse³⁸, Bartha Maria Knoppers³⁹, Pui-Yan Kwok¹⁰, Julio Licinio⁴⁰, Jeffrey C. Long⁴¹, Pilar N. Ossorio⁴², Vivian Ota Wang³³, Charles N. Rotimi¹⁶, Patricia Spallone²⁹, Patricia Spallone⁴³, Sharon F. Terry⁴⁴, Eric S. Lander²⁵, Eric H. Lai⁴⁵, Deborah A. Nickerson⁴⁶, Gonçalo R. Abecasis⁴¹, David Altshuler⁴⁷, Michael Boehnke⁴¹, Panos Deloukas¹¹, Julie A. Douglas⁴¹, Stacey Gabriel²⁵, Richard R. Hudson⁴⁸, Thomas J. Hudson⁸, Leonid Kruglyak⁴⁹, Yusuke Nakamura⁵⁰, Robert L. Nussbaum²⁴, Stephen F. Schaffner²⁵, Stephen T. Sherry²⁴, Lincoln Stein²⁰, Toshihiro Tanaka

Baylor College of Medicine¹, Chinese Academy of Sciences², Chinese National Human Genome Center³, University of Hong Kong⁴, The Chinese University of Hong Kong⁵, Hong Kong University of Science and Technology⁶, Illumina⁷, McGill University⁸, Washington University in St. Louis⁹, University of California, San Francisco¹⁰, Wellcome Trust Sanger Institute¹¹, Beijing Normal University¹², Health Sciences University of Hokkaido¹³, Shinshu University¹⁴, University of Tsukuba¹⁵, Howard University¹⁶, University of Ibadan¹⁷, Case Western Reserve University¹⁸, University of Utah¹⁹, Cold Spring Harbor Laboratory²⁰, Johns Hopkins University²¹, University of Oxford²², North Carolina State University²³, National Institutes of Health²⁴, Massachusetts Institute of Technology²⁵, Chinese Academy of Social Sciences²⁶, Kyoto University²⁷, Nagasaki University²⁸, Wellcome Trust²⁹, Genome Canada³⁰, Foundation for the National Institutes of Health³¹, University of Maryland, Baltimore³², Vanderbilt University³³, Stanford University³⁴, New York University³⁵, University of California, Berkeley³⁶, University of Oklahoma³⁷, University of New Mexico³⁸, Université de Montréal³⁹, University of California, Los Angeles⁴⁰, University of Michigan⁴¹, University of Wisconsin-Madison⁴², London School of Economics and Political Science⁴³, Genetic Alliance⁴⁴, GlaxoSmithKline⁴⁵, University of Washington⁴⁶, Harvard University⁴⁷, University of Chicago⁴⁸, Fred Hutchinson Cancer Research Center⁴⁹, University of Tokyo⁵⁰

18 Dec 2003-Nature

TL;DR: The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance the ability to choose targets for therapeutic intervention.

Abstract: The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.

5,926 citations

Journal Article

Genome-Wide Insertional Mutagenesis of Arabidopsis thaliana

Jose M. Alonso¹, Anna Stepanova¹, Thomas J. Leisse¹, Christopher J. Kim¹, Huaming Chen¹, Paul Shinn¹, Denise K. Stevenson¹, Justin Zimmerman¹, Pascual Barajas¹, Rosa Cheuk¹, Carmelita Gadrinab¹, Collen Heller¹, Albert Jeske¹, Eric Koesema¹, Cristina C. Meyers¹, Holly Parker¹, Lance Prednis¹, Yasser Ansari¹, Nathan Choy¹, Hashim Deen¹, Michael Geralt¹, Nisha Hazari¹, Emily Hom¹, Meagan Karnes¹, Celene Mulholland¹, Ral Ndubaku¹, Ian Thomas Schmidt¹, Plinio Guzmán¹, Laura Aguilar-Henonin¹, Markus Schmid¹, Detlef Weigel¹, David E. Carter², Trudy Marchand², Eddy Risseeuw², Debra Brogden², Albana Zeko², William L. Crosby², Charles C. Berry³, Joseph R. Ecker¹

Salk Institute for Biological Studies¹, National Research Council², University of California, San Diego³

01 Aug 2003-Science

TL;DR: Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels, and insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.

Abstract: Over 225,000 independent Agrobacterium transferred DNA (T-DNA) insertion events in the genome of the reference plant Arabidopsis thaliana have been created that represent near saturation of the gene space. The precise locations were determined for more than 88,000 T-DNA insertions, which resulted in the identification of mutations in more than 21,700 of the approximately 29,454 predicted Arabidopsis genes. Genome-wide analysis of the distribution of integration events revealed the existence of a large integration site bias at both the chromosome and gene levels. Insertion mutations were identified in genes that are regulated in response to the plant hormone ethylene.

5,227 citations

Journal Article

Systematic functional analysis of the Caenorhabditis elegans genome using RNAi

Ravi S. Kamath¹, Andrew G. Fraser², Andrew G. Fraser¹, Yan Dong¹, Gino B. Poulin¹, Richard Durbin², Monica Gotta¹, Alexander Kanapin³, Nathalie Le Bot¹, Sergio Moreno¹, Sergio Moreno⁴, Marc Sohrmann², David P. Welchman¹, Peder Zipperlen¹, Julie Ahringer¹

University of Cambridge¹, Wellcome Trust Sanger Institute², European Bioinformatics Institute³, Spanish National Research Council⁴

16 Jan 2003-Nature

TL;DR: Genes of similar function are clustered in distinct, multi-megabase regions of individual C. elegans chromosomes, and genes in these regions tend to share transcriptional profiles.

Abstract: A principal challenge currently facing biologists is how to connect the complete DNA sequence of an organism to its development and behaviour. Large-scale targeted-deletions have been successful in defining gene functions in the single-celled yeast Saccharomyces cerevisiae, but comparable analyses have yet to be performed in an animal. Here we describe the use of RNA interference to inhibit the function of ∼86% of the 19,427 predicted genes of C. elegans. We identified mutant phenotypes for 1,722 genes, about two-thirds of which were not previously associated with a phenotype. We find that genes of similar functions are clustered in distinct, multi-megabase regions of individual chromosomes; genes in these regions tend to share transcriptional profiles. Our resulting data set and reusable RNAi library of 16,757 bacterial clones will facilitate systematic analyses of the connections among gene sequence, chromosomal location and gene function in C. elegans.

3,529 citations

Journal Article

Sequencing and comparison of yeast species to identify genes and regulatory elements

Manolis Kellis¹, Nick Patterson¹, Matthew G. Endrizzi¹, Bruce W. Birren¹, Eric S. Lander¹

Massachusetts Institute of Technology¹

15 May 2003-Nature

TL;DR: A comparative analysis of the yeast Saccharomyces cerevisiae, based on high-quality draft sequences of three related species, identified genes and regulatory motifs, inferred a putative function for most of the motifs, and provided insights into their combinatorial interactions.

Abstract: Identifying the functional elements encoded in a genome is one of the principal challenges in modern biology. Comparative genomics should offer a powerful, general approach. Here, we present a comparative analysis of the yeast Saccharomyces cerevisiae based on high-quality draft sequences of three related species (S. paradoxus, S. mikatae and S. bayanus). We first aligned the genomes and characterized their evolution, defining the regions and mechanisms of change. We then developed methods for direct identification of genes and regulatory motifs. The gene analysis yielded a major revision to the yeast gene catalogue, affecting approximately 15% of all genes and reducing the total count by about 500 genes. The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. We inferred a putative function for most of these motifs, and provided insights into their combinatorial interactions. The results have implications for genome analysis of diverse organisms, including the human.

1,837 citations

Journal Article

A vision for the future of genomics research

Francis S. Collins¹, Eric D. Green¹, Alan E. Guttmacher¹, Mark S. Guyer¹

National Institutes of Health¹

24 Apr 2003-Nature

TL;DR: With a high-quality, comprehensive sequence of the human genome completed in the fiftieth anniversary year of the discovery of the double-helical structure of DNA, the authors set out a vision for the future of genomics research.

Abstract: The completion of a high-quality, comprehensive sequence of the human genome, in this fiftieth anniversary year of the discovery of the double-helical structure of DNA, is a landmark event. The genomic era is now a reality. In contemplating a vision for the future of genomics research, it is appropriate to consider the remarkable path that has brought us here. The rollfold (Figure 1) shows a timeline of landmark accomplishments in genetics and genomics, beginning with Gregor Mendel’s discovery of the laws of heredity and their rediscovery in the early days of the twentieth century. Recognition of DNA as the hereditary material, determination of its structure, elucidation of the genetic code, development of recombinant DNA technologies, and establishment of increasingly automatable methods for DNA sequencing set the stage for the Human Genome Project (HGP) to begin in 1990 (see also www.nature.com/nature/DNA50). Thanks to the vision of the original planners, and the creativity and determination of a legion of talented scientists who decided to make this project their overarching focus, all of the initial objectives of the HGP have now been achieved at least two years ahead of expectation, and a revolution in biological research has begun. The project’s new research strategies and experimental technologies have generated a steady stream of ever-larger and more complex genomic data sets that have poured into public databases and have transformed the study of virtually all life processes. The genomic approach of technology development and large-scale generation of community resource data sets has introduced an important new dimension into biological and biomedical research. Interwoven advances in genetics, comparative genomics, high-throughput biochemistry and bioinformatics

1,704 citations

Journal Article

The Transcriptome of the Intraerythrocytic Developmental Cycle of Plasmodium falciparum

Zbynek Bozdech¹, Manuel Llinás¹, Brian Pulliam¹, Edith D. Wong¹, Jingchun Zhu¹, Joseph L. DeRisi¹

University of California, San Francisco¹

18 Aug 2003-PLOS Biology

TL;DR: Analysis of the complete asexual intraerythrocytic developmental cycle (IDC) transcriptome of the HB3 strain of P. falciparum demonstrates that this parasite has evolved an extremely specialized mode of transcriptional regulation that produces a continuous cascade of gene expression, beginning with genes corresponding to general cellular processes, such as protein synthesis, and ending with Plasmodium-specific functionalities.

Abstract: Plasmodium falciparum is the causative agent of the most burdensome form of human malaria, affecting 200–300 million individuals per year worldwide. The recently sequenced genome of P. falciparum revealed over 5,400 genes, of which 60% encode proteins of unknown function. Insights into the biochemical function and regulation of these genes will provide the foundation for future drug and vaccine development efforts toward eradication of this disease. By analyzing the complete asexual intraerythrocytic developmental cycle (IDC) transcriptome of the HB3 strain of P. falciparum, we demonstrate that at least 60% of the genome is transcriptionally active during this stage. Our data demonstrate that this parasite has evolved an extremely specialized mode of transcriptional regulation that produces a continuous cascade of gene expression, beginning with genes corresponding to general cellular processes, such as protein synthesis, and ending with Plasmodium-specific functionalities, such as genes involved in erythrocyte invasion. The data reveal that genes contiguous along the chromosomes are rarely coregulated, while transcription from the plastid genome is highly coregulated and likely polycistronic. Comparative genomic hybridization between HB3 and the reference genome strain (3D7) was used to distinguish between genes not expressed during the IDC and genes not detected because of possible sequence variations. Genomic differences between these strains were found almost exclusively in the highly antigenic subtelomeric regions of chromosomes. The simple cascade of gene regulation that directs the asexual development of P. falciparum is unprecedented in eukaryotic biology. The transcriptome of the IDC resembles a “just-in-time” manufacturing process whereby induction of any given gene occurs once per cycle and only at a time when it is required. 
These data provide to our knowledge the first comprehensive view of the timing of transcription throughout the intraerythrocytic development of P. falciparum and provide a resource for the identification of new chemotherapeutic and vaccine candidates.

1,598 citations

Journal Article

Genetics of gene expression surveyed in maize, mouse and man

Eric E. Schadt, Stephanie A. Monks¹, Thomas A. Drake², Aldons J. Lusis², Nam Che², Veronica Colinayo², Thomas G. Ruff³, Stephen B. Milligan, John Lamb, Guy Cavet, Peter S. Linsley, Mao Mao, Roland Stoughton, Stephen H. Friend⁴

University of Washington¹, University of California, Los Angeles², Monsanto³, Merck & Co.⁴

20 Mar 2003-Nature

TL;DR: In this paper, the authors describe comprehensive genetic screens of mouse, plant and human transcriptomes by considering gene expression values as quantitative traits and identify a gene expression pattern strongly associated with obesity in a murine cross and observe two distinct obesity subtypes.

Abstract: Treating messenger RNA transcript abundances as quantitative traits and mapping gene expression quantitative trait loci for these traits has been pursued in gene-specific ways. Transcript abundances often serve as a surrogate for classical quantitative traits in that the levels of expression are significantly correlated with the classical traits across members of a segregating population. The correlation structure between transcript abundances and classical traits has been used to identify susceptibility loci for complex diseases such as diabetes and allergic asthma. One study recently completed the first comprehensive dissection of transcriptional regulation in budding yeast, giving a detailed glimpse of a genome-wide survey of the genetics of gene expression. Unlike classical quantitative traits, which often represent gross clinical measurements that may be far removed from the biological processes giving rise to them, the genetic linkages associated with transcript abundance afford a closer look at cellular biochemical processes. Here we describe comprehensive genetic screens of mouse, plant and human transcriptomes by considering gene expression values as quantitative traits. We identify a gene expression pattern strongly associated with obesity in a murine cross, and observe two distinct obesity subtypes. Furthermore, we find that these obesity subtypes are under the control of different loci.

1,539 citations

Journal Article

Genome-scale approaches to resolving incongruence in molecular phylogenies

Antonis Rokas¹, Barry L. Williams¹, Nicole King¹, Sean B. Carroll¹

University of Wisconsin-Madison¹

23 Oct 2003-Nature

TL;DR: The results suggest that data sets consisting of single or a small number of concatenated genes have a significant probability of supporting conflicting topologies, and have important implications for resolving branches of the tree of life.

Abstract: One of the most pervasive challenges in molecular phylogenetics is the incongruence between phylogenies obtained using different data sets, such as individual genes. To systematically investigate the degree of incongruence, and potential methods for resolving it, we screened the genome sequences of eight yeast species and selected 106 widely distributed orthologous genes for phylogenetic analyses, singly and by concatenation. Our results suggest that data sets consisting of single or a small number of concatenated genes have a significant probability of supporting conflicting topologies. By contrast, analyses of the entire data set of concatenated genes yielded a single, fully resolved species tree with maximum support. Comparable results were obtained with a concatenation of a minimum of 20 genes; substantially more genes than commonly used but a small fraction of any genome. These results have important implications for resolving branches of the tree of life.

1,490 citations

Journal Article

The power and promise of population genomics: from genotyping to genome typing

Gordon Luikart¹, Phillip R. England¹, David A. Tallmon¹, Steve Jordan¹, Pierre Taberlet¹

Joseph Fourier University¹

01 Dec 2003-Nature Reviews Genetics

TL;DR: The most useful contribution of the genomics model to population genetics will be improving inferences about population demography and evolutionary history.

Abstract: Population genomics has the potential to improve studies of evolutionary genetics, molecular ecology and conservation biology, by facilitating the identification of adaptive molecular variation and by improving the estimation of important parameters such as population size, migration rates and phylogenetic relationships. There has been much excitement in the recent literature about the identification of adaptive molecular variation using the population-genomic approach. However, the most useful contribution of the genomics model to population genetics will be improving inferences about population demography and evolutionary history.

1,276 citations

Journal Article

The genome sequence of Caenorhabditis briggsae: A platform for comparative genomics

Lincoln Stein¹, Zhirong Bao², Zhirong Bao³, Darin Blasiar², Thomas Blumenthal⁴, Michael R. Brent², Nansheng Chen¹, Asif T. Chinwalla², Laura Clarke⁵, Chris Clee⁵, Avril Coghlan⁶, Alan Coulson⁷, Alan Coulson⁵, Peter D'Eustachio¹, Peter D'Eustachio⁸, David H. A. Fitch⁸, Lucinda Fulton², Robert E. Fulton², Sam Griffiths-Jones⁵, Todd W. Harris¹, LaDeana W. Hillier², LaDeana W. Hillier³, Ravi Kamath⁵, Patricia E. Kuwabara⁵, Elaine R. Mardis², Marco A. Marra⁹, Marco A. Marra², Tracie L. Miner², Patrick Minx², James C. Mullikin¹⁰, James C. Mullikin⁵, Robert W. Plumb⁵, Jane Rogers⁵, Jacqueline E. Schein², Jacqueline E. Schein⁹, Marc Sohrmann⁵, John Spieth², Jason E. Stajich¹¹, Chaochun Wei², David Willey⁵, Richard K. Wilson², Richard Durbin⁵, Robert H. Waterston², Robert H. Waterston³

Cold Spring Harbor Laboratory¹, Washington University in St. Louis², University of Washington³, University of Colorado Denver⁴, Wellcome Trust Sanger Institute⁵, Trinity College, Dublin⁶, Laboratory of Molecular Biology⁷, New York University⁸, BC Cancer Agency⁹, National Institutes of Health¹⁰, Duke University¹¹

17 Nov 2003-PLOS Biology

TL;DR: The C. briggsae and C. elegans genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers; comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.

Abstract: The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.

954 citations

Journal Article

Empirical analysis of transcriptional activity in the Arabidopsis genome.

Kayoko Yamada, Jun Lim¹, Joseph M. Dale, Huaming Chen¹, Huaming Chen², Paul Shinn², Paul Shinn¹, Curtis J. Palm³, Audrey Southwick³, Hank C. Wu, Christopher Kim², Christopher Kim¹, Michelle Nguyen³, Paul K. Pham, Rosa Cheuk¹, Rosa Cheuk², George Karlin-Newmann³, Shirley X. Liu, Bao Lam³, Hitomi Sakano, Troy Wu³, Guixia Yu, Molly Miranda³, Hong L. Quach, Matthew Tripp³, Charlie H. Chang, Jeong M. Lee, Mitsue J. Toriumi, Marie M. H. Chan, Carolyn C. Tang, Courtney Onodera, Justine M. Deng, Kenji Akiyama, Yasser Ansari¹, Takahiro Arakawa, Jenny Banh, Fumika Banno, Leah Bowser³, Shelise Brooks², Piero Carninci, Qimin Chao², Nathan Choy¹, Akiko Enju, Andrew D. Goldsmith, Mani Gurjal³, Nancy F. Hansen³, Yoshihide Hayashizaki, Chanda Johnson-Hopson², Vickie Hsuan, Kei Iida, Meagan Karnes¹, Shehnaz Khan², Eric Koesema¹, Junko Ishida, Paul X. Jiang, Ted Jones³, Jun Kawai, Asako Kamiya, Cristina C. Meyers¹, Maiko Nakajima, Mari Narusaka, Motoaki Seki, Tetsuya Sakurai, Masakazu Satou, Racquel Tamse³, Maria Vaysberg, Erika Wallender, Cecilia Wong, Yuki Yamamura, Shiaulou Yuan, Kazuo Shinozaki, Ronald W. Davis³, Athanasios Theologis, Joseph R. Ecker², Joseph R. Ecker¹

Salk Institute for Biological Studies¹, University of Pennsylvania², Stanford University³

31 Oct 2003-Science

TL;DR: In this paper, a dual experimental strategy was used to verify and correct the initial genome sequence annotation of the reference plant Arabidopsis and identified 5817 novel transcription units including a substantial amount of antisense gene transcription, and 40 genes within the genetically defined centromeres.

Abstract: Functional analysis of a genome requires accurate gene structure information and a complete gene inventory. A dual experimental strategy was used to verify and correct the initial genome sequence annotation of the reference plant Arabidopsis. Sequencing full-length cDNAs and hybridizations using RNA populations from various tissues to a set of high-density oligonucleotide arrays spanning the entire genome allowed the accurate annotation of thousands of gene structures. We identified 5817 novel transcription units, including a substantial amount of antisense gene transcription, and 40 genes within the genetically defined centromeres. This approach resulted in completion of approximately 30% of the Arabidopsis ORFeome as a resource for global functional experimentation of the plant proteome.

Journal Article

Bacteriophage T4 Genome

Eric S. Miller¹, Elizabeth Kutter², Gisela Mosig³, Fumio Arisaka⁴, Takashi Kunisawa⁵, Wolfgang Rüger⁶

North Carolina State University¹, The Evergreen State College², Vanderbilt University³, Tokyo Institute of Technology⁴, University of Tokyo⁵, Ruhr University Bochum⁶

01 Mar 2003-Microbiology and Molecular Biology Reviews

TL;DR: T4 functional genomics will aid in the interpretation of these newly sequenced T4-related genomes and in broadening the understanding of the complex evolution and ecology of phages—the most abundant and among the most ancient biological entities on Earth.

Abstract: Phage T4 has provided countless contributions to the paradigms of genetics and biochemistry. Its complete genome sequence of 168,903 bp encodes about 300 gene products. T4 biology and its genomic sequence provide the best-understood model for modern functional genomics and proteomics. Variations on gene expression, including overlapping genes, internal translation initiation, spliced genes, translational bypassing, and RNA processing, alert us to the caveats of purely computational methods. The T4 transcriptional pattern reflects its dependence on the host RNA polymerase and the use of phage-encoded proteins that sequentially modify RNA polymerase; transcriptional activator proteins, a phage sigma factor, anti-sigma, and sigma decoy proteins also act to specify early, middle, and late promoter recognition. Posttranscriptional controls by T4 provide excellent systems for the study of RNA-dependent processes, particularly at the structural level. The redundancy of DNA replication and recombination systems of T4 reveals how phage and other genomes are stably replicated and repaired in different environments, providing insight into genome evolution and adaptations to new hosts and growth environments. Moreover, genomic sequence analysis has provided new insights into tail fiber variation, lysis, gene duplications, and membrane localization of proteins, while high-resolution structural determination of the "cell-puncturing device," combined with the three-dimensional image reconstruction of the baseplate, has revealed the mechanism of penetration during infection. Despite these advances, nearly 130 potential T4 genes remain uncharacterized. Current phage-sequencing initiatives are now revealing the similarities and differences among members of the T4 family, including those that infect bacteria other than Escherichia coli. 
T4 functional genomics will aid in the interpretation of these newly sequenced T4-related genomes and in broadening our understanding of the complex evolution and ecology of phages, the most abundant and among the most ancient biological entities on Earth.

Journal Article

Retrovirus-mediated gene transfer and expression cloning: powerful tools in functional genomics

Toshio Kitamura¹, Yuko Koshino¹, Fumi Shibata¹, Toshihiko Oki¹, Hideaki Nakajima¹, Tetsuya Nosaka¹, Hidetoshi Kumagai¹

University of Tokyo¹

01 Nov 2003-Experimental Hematology

TL;DR: In this review, retrovirus-mediated strategies used for investigation of gene functions and function-based screening strategies are described.

Journal Article

Comparative analyses of multi-species sequences from targeted genomic regions

James W. Thomas¹, James W. Thomas², Jeffrey W. Touchman, Robert W. Blakesley², Gerry Bouffard², Stephen M. Beckstrom-Sternberg², Elliott H. Margulies², Mathieu Blanchette³, Adam Siepel³, Pamela J. Thomas², Jenny McDowell², Baishali Maskeri², Nancy F. Hansen², M. S. Schwartz³, R. J. Weber³, W. J. Kent³, Donna Karolchik³, T. C. Bruen³, R. Bevan³, David J. Cutler⁴, Scott Schwartz⁵, Laura Elnitski⁵, Jacquelyn R. Idol², Arjun B. Prasad², Shih-Queen Lee-Lin², Valerie Maduro², Tyrone J. Summers², Matthew E. Portnoy², Nicole Dietrich², N. Akhter², K. Ayele², Betty Benjamin², K. Cariaga², Charles P. Brinkley², Shelise Brooks², S. Granite², Xin-Yuan Guan, Jyoti Gupta², P. Haghighi², S. L. Ho², M. C. Huang², Eric Karlins², P. L. Laric², Richelle Legaspi², M. J. Lim², Quino Maduro², Cathy Masiello², Stephen D. Mastrian², J. C. McCloskey², R. Pearson², Sirintorn Stantripop², Emmanuelle Tiongson², J. T. Tran², C. Tsurgeon², Jennifer Vogt², M. A. Walker², Keith Wetherby², L. S. Wiggins², Alice C. Young², L. H. Zhang², Kazutoyo Osoegawa⁶, Baoli Zhu⁶, B. Zhao⁶, C. L. Shu⁶, P. J. De Jong⁶, Charles E. Lawrence⁷, Arian F.A. Smit⁸, Aravinda Chakravarti⁴, David Haussler³, Philip Green⁹, Webb Miller⁵, Eric D. Green²

Emory University¹, National Institutes of Health², University of California, Santa Cruz³, Johns Hopkins University⁴, Pennsylvania State University⁵, Children's Hospital Oakland Research Institute⁶, Wadsworth Center⁷, Institute for Systems Biology⁸, University of Washington⁹

14 Aug 2003-Nature

TL;DR: The generation and analysis of over 12 megabases of sequence from 12 species, all derived from the genomic region orthologous to a segment of about 1.8 Mb on human chromosome 7 containing ten genes, show conservation reflecting both functional constraints and the neutral mutational events that shaped this genomic region.

Abstract: The systematic comparison of genomic sequences from different organisms represents a central focus of contemporary genome analysis. Comparative analyses of vertebrate sequences can identify coding and conserved non-coding regions, including regulatory elements, and provide insight into the forces that have rendered modern-day genomes. As a complement to whole-genome sequencing efforts, we are sequencing and comparing targeted genomic regions in multiple, evolutionarily diverse vertebrates. Here we report the generation and analysis of over 12 megabases (Mb) of sequence from 12 species, all derived from the genomic region orthologous to a segment of about 1.8 Mb on human chromosome 7 containing ten genes, including the gene mutated in cystic fibrosis. These sequences show conservation reflecting both functional constraints and the neutral mutational events that shaped this genomic region. In particular, we identify substantial numbers of conserved non-coding segments beyond those previously identified experimentally, most of which are not detectable by pair-wise sequence comparisons alone. Analysis of transposable element insertions highlights the variation in genome dynamics among these species and confirms the placement of rodents as a sister group to the primates.

Journal Article•DOI•

Genetic content and evolution of adenoviruses.

Andrew J. Davison, Mária Benkő¹, Balázs Harrach¹•Institutions (1)

Hungarian Academy of Sciences¹

01 Nov 2003-Journal of General Virology

TL;DR: The antiquity of the pre-vertebrate lineages that ultimately gave rise to the Adenoviridae is illustrated by morphological similarities between adenoviruses and bacteriophages, and by use of a protein-primed DNA replication strategy by adenoviruses, certain bacteria and bacteriophages, and linear plasmids of fungi and plants.

Abstract: This review provides an update of the genetic content, phylogeny and evolution of the family Adenoviridae. An appraisal of the condition of adenovirus genomics highlights the need to ensure that public sequence information is interpreted accurately. To this end, all complete genome sequences available have been reannotated. Adenoviruses fall into four recognized genera, plus possibly a fifth, which have apparently evolved with their vertebrate hosts, but have also engaged in a number of interspecies transmission events. Genes inherited by all modern adenoviruses from their common ancestor are located centrally in the genome and are involved in replication and packaging of viral DNA and formation and structure of the virion. Additional niche-specific genes have accumulated in each lineage, mostly near the genome termini. Capture and duplication of genes in the setting of a ‘leader–exon structure’, which results from widespread use of splicing, appear to have been central to adenovirus evolution. The antiquity of the pre-vertebrate lineages that ultimately gave rise to the Adenoviridae is illustrated by morphological similarities between adenoviruses and bacteriophages, and by use of a protein-primed DNA replication strategy by adenoviruses, certain bacteria and bacteriophages, and linear plasmids of fungi and plants.

Journal Article•DOI•

Comparative genomics, minimal gene-sets and the last universal common ancestor

Eugene V. Koonin¹•Institutions (1)

National Institutes of Health¹

01 Nov 2003-Nature Reviews Microbiology

TL;DR: Reconstruction of ancestral life-forms based on the principle of evolutionary parsimony currently suggests a simple last universal common ancestor with only 500–600 genes.

Abstract: Comparative genomics, using computational and experimental methods, enables the identification of a minimal set of genes that is necessary and sufficient for sustaining a functional cell. For most essential cellular functions, two or more unrelated or distantly related proteins have evolved; only about 60 proteins, primarily those involved in translation, are common to all cellular life. The reconstruction of ancestral life-forms is based on the principle of evolutionary parsimony, but the size and composition of the reconstructed ancestral gene-repertoires depend on relative rates of gene loss and horizontal gene-transfer. The present estimate suggests a simple last universal common ancestor with only 500-600 genes.

Journal Article•DOI•

The genetics and genomics of cancer

Allan Balmain¹, Joe W. Gray¹, Bruce A.J. Ponder•Institutions (1)

University of California, San Francisco¹

01 Mar 2003-Nature Genetics

TL;DR: A deeper understanding of this disease will require new statistical and computational approaches for analysis of the genetic and signaling networks that orchestrate individual cancer susceptibility and tumor behavior.

Abstract: The past decade has seen great strides in our understanding of the genetic basis of human disease. Arguably, the most profound impact has been in the area of cancer genetics, where the explosion of genomic sequence and molecular profiling data has illustrated the complexity of human malignancies. In a tumor cell, dozens of different genes may be aberrant in structure or copy number, and hundreds or thousands of genes may be differentially expressed. A number of familial cancer genes with high-penetrance mutations have been identified, but the contribution of low-penetrance genetic variants or polymorphisms to the risk of sporadic cancer development remains unclear. Studies of the complex somatic genetic events that take place in the emerging cancer cell may aid the search for the more elusive germline variants that confer increased susceptibility. Insights into the molecular pathogenesis of cancer have provided new strategies for treatment, but a deeper understanding of this disease will require new statistical and computational approaches for analysis of the genetic and signaling networks that orchestrate individual cancer susceptibility and tumor behavior.

Journal Article•DOI•

Mitochondrial genomes: anything goes.

Gertraud Burger¹, Michael W. Gray¹, B. Franz Lang¹•Institutions (1)

Canadian Institute for Advanced Research¹

01 Dec 2003-Trends in Genetics

TL;DR: In addition to outlining the extraordinary diversity of mtDNA, this review highlights the divergent trends in mitochondrial genome evolution in the various eukaryotic lineages, and examines the relationship between mitochondrial and nuclear genome evolution in a given organism.

Journal Article•DOI•

The Corynebacterium glutamicum genome: features and impacts on biotechnological processes

Masato Ikeda, S. Nakagawa

13 May 2003-Applied Microbiology and Biotechnology

TL;DR: A novel methodology that merges genomics with classical strain improvement has been developed and applied to the reconstruction of classically derived production strains, and the path from genomics to biotechnological processes is presented.

Abstract: Corynebacterium glutamicum has played a principal role in the progress of the amino acid fermentation industry. The complete genome sequence of the representative wild-type strain of C. glutamicum, ATCC 13032, has been determined and analyzed to improve our understanding of the molecular biology and physiology of this organism, and to advance the development of more efficient production strains. Genome annotation has helped in elucidation of the gene repertoire defining a desired pathway, which is accelerating pathway engineering. Post genome technologies such as DNA arrays and proteomics are currently undergoing rapid development in C. glutamicum. Such progress has already exposed new regulatory networks and functions that had so far been unidentified in this microbe. The next goal of these studies is to integrate the fruits of genomics into strain development technology. A novel methodology that merges genomics with classical strain improvement has been developed and applied for the reconstruction of classically derived production strains. How can traditional fermentation benefit from the C. glutamicum genomic data? The path from genomics to biotechnological processes is presented.

Journal Article•DOI•

Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss

Barmak Modrek¹, Christopher Lee¹•Institutions (1)

University of California, Los Angeles¹

01 Jun 2003-Nature Genetics

TL;DR: An analysis of 9,434 orthologous genes in human and mouse indicates that alternative splicing is associated with a large increase in frequency of recent exon creation and/or loss.

Abstract: One of the most interesting opportunities in comparative genomics is to compare not only genome sequences but additional phenomena, such as alternative splicing, using orthologous genes in different genomes to find similarities and differences between organisms. Recently, genomics studies have suggested that 40-60% of human genes are alternatively spliced and have catalogued up to 30,000 alternative splice relationships in human genes. Here we report an analysis of 9,434 orthologous genes in human and mouse, which indicates that alternative splicing is associated with a large increase in frequency of recent exon creation and/or loss. Whereas most exons in the mouse and human genomes are strongly conserved in both genomes, exons that are only included in alternative splice forms (as opposed to the constitutive or major transcript form) are mostly not conserved and thus are the product of recent exon creation or loss events. A similar comparison of orthologous exons in rat and human validates this pattern. Although this says nothing about the complex question of adaptive benefit, it does indicate that alternative splicing in these genomes has been associated with increased evolutionary change.

Journal Article•DOI•

Phage as agents of lateral gene transfer.

Carlos Canchaya¹, Ghislain Fournous¹, Sandra Chibani-Chennoufi¹, Marie Lise Dillmann¹, Harald Brüssow¹•Institutions (1)

Nestlé¹

01 Aug 2003-Current Opinion in Microbiology

TL;DR: Prophages constitute in many bacteria a substantial part of laterally acquired DNA and contribute lysogenic conversion genes that are of selective advantage to the bacterial host.

Journal Article•DOI•

From Gene Trees to Organismal Phylogeny in Prokaryotes: The Case of the γ-Proteobacteria

Emmanuelle Lerat¹, Vincent Daubin¹, Nancy A. Moran¹•Institutions (1)

University of Arizona¹

15 Sep 2003-PLOS Biology

TL;DR: The analysis indicates that single-copy orthologous genes are resistant to horizontal transfer, even in ancient bacterial groups subject to high rates of LGT, thus establishing a foundation for reconstructing the evolutionary transitions that underlie diversity in genome content and organization.

Abstract: The rapid increase in published genomic sequences for bacteria presents the first opportunity to reconstruct evolutionary events on the scale of entire genomes. However, extensive lateral gene transfer (LGT) may thwart this goal by preventing the establishment of organismal relationships based on individual gene phylogenies. The group for which cases of LGT are most frequently documented and for which the greatest density of complete genome sequences is available is the γ-Proteobacteria, an ecologically diverse and ancient group including free-living species as well as pathogens and intracellular symbionts of plants and animals. We propose an approach to multigene phylogeny using complete genomes and apply it to the case of the γ-Proteobacteria. We first applied stringent criteria to identify a set of likely gene orthologs and then tested the compatibilities of the resulting protein alignments with several phylogenetic hypotheses. Our results demonstrate phylogenetic concordance among virtually all (203 of 205) of the selected gene families, with each of the exceptions consistent with a single LGT event. The concatenated sequences of the concordant families yield a fully resolved phylogeny. This topology also received strong support in analyses aimed at excluding effects of heterogeneity in nucleotide base composition across lineages. Our analysis indicates that single-copy orthologous genes are resistant to horizontal transfer, even in ancient bacterial groups subject to high rates of LGT. This gene set can be identified and used to yield robust hypotheses for organismal phylogenies, thus establishing a foundation for reconstructing the evolutionary transitions, such as gene transfer, that underlie diversity in genome content and organization.

Journal Article•DOI•

Integrating 'omic' information: a bridge between genomics and systems biology.

Hui Ge¹, Albertha J.M. Walhout¹,², Marc Vidal¹•Institutions (2)

Harvard University¹, University of Massachusetts Medical School²

01 Oct 2003-Trends in Genetics

TL;DR: To increase the reliability of gene function annotation, multiple independent datasets need to be integrated; recently developed strategies for such integration are reviewed, and it is argued that these will be important for a systems approach to modular biology.

Journal Article•DOI•

Comparative DNA Sequence Analysis of Wheat and Rice Genomes

Mark E. Sorrells¹, Mauricio La Rota¹, Catherine E. Bermudez-Kandianis¹, Robert A. Greene¹, Ramesh V. Kantety¹, Jesse David Munkvold¹, Miftahudin², A. A. Mahmoud², Xue-Feng Ma², Perry Gustafson², Lili Qi³, B. Echalier³, Bikram S. Gill³, David E. Matthews⁴, Gerard R. Lazo⁴, Shiaoman Chao⁴, Olin D. Anderson⁴, Hugh Edwards⁵, A. M. Linkiewicz⁵, Jorge Dubcovsky⁵, Eduard Akhunov⁵, Jan Dvorak⁵, Deshui Zhang⁶, Henry T. Nguyen², Junhua Peng⁷, Nora L. V. Lapitan⁷, Jose L. Gonzalez-Hernandez⁸, James A. Anderson, Khwaja Hossain⁹, Venu Kalavacharla⁹, Shahryar F. Kianian, D. W. Choi¹⁰, Timothy J. Close¹⁰, Muharrem Dilbirligi¹¹, Kulvinder S. Gill¹¹, Camille M. Steber¹¹, M. K. Walker-Simmons⁴, Patrick E. McGuire⁵, Calvin O. Qualset⁵•Institutions (11)

Cornell University¹, University of Missouri², Kansas State University³, United States Department of Agriculture⁴, University of California, Davis⁵, Texas Tech University⁶, Colorado State University⁷, University of Minnesota⁸, North Dakota State University⁹, University of California, Riverside¹⁰, Washington State University¹¹

01 Aug 2003-Genome Research

TL;DR: A rice genome view of homologous wheat genome locations based on comparative sequence analysis revealed numerous chromosomal rearrangements that will significantly complicate the use of rice as a model for cross-species transfer of information in nonconserved regions.

Abstract: The use of DNA sequence-based comparative genomics for evolutionary studies and for transferring information from model species to crop species has revolutionized molecular genetics and crop improvement strategies. This study compared 4485 expressed sequence tags (ESTs) that were physically mapped in wheat chromosome bins, to the public rice genome sequence data from 2251 ordered BAC/PAC clones using BLAST. A rice genome view of homologous wheat genome locations based on comparative sequence analysis revealed numerous chromosomal rearrangements that will significantly complicate the use of rice as a model for cross-species transfer of information in nonconserved regions.

Journal Article•DOI•

Chloroplast research in the genomic age.

Dario Leister¹•Institutions (1)

Max Planck Society¹

01 Jan 2003-Trends in Genetics

TL;DR: Recent advances in transcriptomics and proteomics of the chloroplast make this organelle one of the best understood of all plant cell compartments.

Journal Article•DOI•

mreps: efficient and flexible detection of tandem repeats in DNA

Roman Kolpakov¹, Ghizlane Bana, Gregory Kucherov•Institutions (1)

Moscow State University¹

01 Jul 2003-Nucleic Acids Research

TL;DR: mreps is a software tool for fast identification of tandemly repeated structures in DNA sequences; it can identify all types of tandem repeats within a single run on a whole genomic sequence.

Abstract: The presence of repeated sequences is a fundamental feature of genomes. Tandemly repeated DNA appears in both eukaryotic and prokaryotic genomes; it is associated with various regulatory mechanisms and plays an important role in genomic fingerprinting. In this paper, we describe mreps, a powerful software tool for fast identification of tandemly repeated structures in DNA sequences. mreps is able to identify all types of tandem repeats within a single run on a whole genomic sequence. It has a resolution parameter that allows the program to identify 'fuzzy' repeats. We introduce the main algorithmic solutions behind mreps, describe its usage, give some execution time benchmarks and present several case studies to illustrate its capabilities. The mreps web interface is accessible through http://www.loria.fr/mreps/.
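
To make the notion of a tandem repeat concrete, the following is a minimal sketch of a naive scan for exact tandem repeats. It is not the mreps algorithm, which the abstract does not detail, and it does not handle 'fuzzy' repeats. The function names and the parameters `min_period`, `max_period` and `min_copies` are hypothetical, chosen only for illustration.

```python
def is_primitive(unit):
    """True if `unit` is not itself a repetition of a shorter unit."""
    return unit not in (unit + unit)[1:-1]


def find_tandem_repeats(seq, min_period=1, max_period=6, min_copies=2):
    """Naive scan for maximal exact tandem repeats (illustrative only).

    Returns a sorted list of (start, period, copies, unit) tuples, where
    `copies` counts whole copies of `unit` in the periodic run.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + 2 * period <= n:
            # Extend while the sequence matches itself shifted by `period`.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            run_length = (j - i) + period  # bases covered by the periodic run
            copies = run_length // period
            unit = seq[i:i + period]
            if copies >= min_copies and is_primitive(unit):
                hits.append((i, period, copies, unit))
            # Position j is a mismatch, so the next run can start at j + 1.
            i = j + 1
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_tandem_repeats("ACGTACGTACGTTT"):
        print(hit)
```

The primitive-unit check keeps a run such as AAAA from being reported a second time with the unit AA. A scan like this costs time proportional to sequence length for each period tried, so it only suits short sequences and small period ranges.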

Journal Article•DOI•

Similarities and Differences in Genome-Wide Expression Data of Six Organisms

Sven Bergmann¹, Jan Ihmels¹, Naama Barkai¹•Institutions (1)

Weizmann Institute of Science¹

15 Dec 2003-PLOS Biology

TL;DR: A comparative study of large datasets of expression profiles from six evolutionarily distant organisms finds that for all organisms the connectivity distribution follows a power-law, highly connected genes tend to be essential and conserved, and the expression program is highly modular.

Abstract: Comparing genomic properties of different organisms is of fundamental importance in the study of biological and evolutionary principles. Although differences among organisms are often attributed to differential gene expression, genome-wide comparative analysis thus far has been based primarily on genomic sequence information. We present a comparative study of large datasets of expression profiles from six evolutionarily distant organisms: S. cerevisiae, C. elegans, E. coli, A. thaliana, D. melanogaster, and H. sapiens. We use genomic sequence information to connect these data and compare global and modular properties of the transcription programs. Linking genes whose expression profiles are similar, we find that for all organisms the connectivity distribution follows a power-law, highly connected genes tend to be essential and conserved, and the expression program is highly modular. We reveal the modular structure by decomposing each set of expression data into coexpressed modules. Functionally related sets of genes are frequently coexpressed in multiple organisms. Yet their relative importance to the transcription program and their regulatory relationships vary among organisms. Our results demonstrate the potential of combining sequence and expression data for improving functional gene annotation and expanding our understanding of how gene expression and diversity evolved.
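
The idea of linking genes with similar expression profiles and then examining the connectivity distribution can be sketched as follows. This is a toy illustration under stated assumptions: the abstract does not say which similarity measure or cutoff the authors used, so Pearson correlation and the `threshold` value here are assumptions, and the gene names and profiles are invented.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def connectivity(profiles, threshold=0.9):
    """Link gene pairs whose correlation is >= threshold; return each gene's degree."""
    degree = {gene: 0 for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            degree[g1] += 1
            degree[g2] += 1
    return degree


def degree_distribution(degree):
    """Map each connectivity value k to the number of genes with that connectivity."""
    return dict(sorted(Counter(degree.values()).items()))


if __name__ == "__main__":
    # Hypothetical expression profiles: gene -> values across four conditions.
    toy_profiles = {
        "g1": [1, 2, 3, 4],
        "g2": [2, 4, 6, 8],
        "g3": [1, 2, 3, 5],
        "g4": [4, 3, 2, 1],
    }
    print(degree_distribution(connectivity(toy_profiles)))
```

On real data, one would plot the number of genes against connectivity k on log-log axes and look for an approximately straight line, which is the usual visual signature of a power-law distribution.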

Journal Article•DOI•

Identification and Characterization of Multi-Species Conserved Sequences

Elliott H. Margulies¹, Mathieu Blanchette², NISC Comparative Sequencing Program¹, David Haussler², Eric D. Green¹•Institutions (2)

National Institutes of Health¹, University of California, Santa Cruz²

01 Dec 2003-Genome Research

TL;DR: Two strategies for MCS identification are reported, demonstrating their ability to detect virtually all known actively conserved sequences but very little neutrally evolving sequence (specifically, ancestral repeats).

Abstract: A key component of genomics research beyond the Human Genome Project will be the rigorous interpretation of the recently finished human genome sequence (Collins et al. 2003). Central to these efforts will be the identification of all functional elements in the human genome. Recent comparative analyses of the human and mouse genome sequences suggest that ∼5% of the mammalian genome is under active selection and thus likely serves a functional role (International Mouse Genome Sequencing Consortium 2002; Roskin et al. 2003). Within this functional subset is an estimated 1% to 2% of the genome that encodes protein (International Mouse Genome Sequencing Consortium 2002). The prospects for comprehensive identification of these coding sequences are quite good, especially in light of the availability of data sets that are complementary to the genomic sequence (e.g., ESTs [Boguski et al. 1994; also see http://www.ncbi.nlm.nih.gov/dbEST] and full-length cDNA sequences [Strausberg et al. 2002; also see http://mgc.nci.nih.gov]) and ever-improving computational methods for gene prediction (Kulp et al. 1996; Burge and Karlin 1997; Rogic et al. 2001; Solovyev 2001; Flicek et al. 2003). The complete identification and characterization of the remaining 3% to 4% of the mammalian genome that likely corresponds to functional non-coding sequence will be profoundly more challenging, due to the lack of complementary data sets, the absence of robust tools for computational predictions, and the incomplete insight about the nature of such sequence. In short, the generation of a comprehensive “parts list” of functional elements in the human genome remains an immense and important challenge. The comparison of orthologous genomic sequences has emerged as a powerful approach for identifying functional elements in the genome (Dermitzakis et al. 2002; DeSilva et al. 2002). 
The premise of this approach is that sequences conserved across millions of years of evolution are likely to have a functional role (Pennacchio and Rubin 2001). Comparative sequence analyses have been shown to facilitate the identification of both coding (Batzoglou et al. 2000; Korf et al. 2001; Pennacchio et al. 2001; Alexandersson et al. 2003; Flicek et al. 2003) and functional non-coding (Stojanovic et al. 1999; Dubchak et al. 2000; Gottgens et al. 2000; Loots et al. 2000, 2002; Wasserman et al. 2000; Dehal et al. 2001; Elnitski et al. 2003; Kellis et al. 2003) sequences. Among the latter are elements that regulate the spatial and temporal patterns of gene expression (Hardison 2000). When the generation of alignments between related sequences is not possible, motif-finding techniques have also been used to identify functional sequences, in particular for detecting transcription factor–binding sites (Bailey and Elkan 1995; Roth et al. 1998; Hertz and Stormo 1999; McCue et al. 2001; Blanchette and Tompa 2002). Recent efforts have produced whole-genome sequences for several vertebrates, including human (International Human Genome Sequencing Consortium 2001), mouse (International Mouse Genome Sequencing Consortium 2002), rat (http://genome.ucsc.edu/cgi-bin/hgGateway?org=rat), and pufferfish (Aparicio et al. 2002), with the sequencing of additional vertebrate genomes well underway. Increasingly, methods for visualizing (Kent et al. 2002; Clamp et al. 2003; Karolchik et al. 2003) and comparing (Stojanovic et al. 1999; Mayor et al. 2000; Blanchette and Tompa 2002; Loots et al. 2002; Giardine et al. 2003; Schwartz et al. 2003a) genomic sequences from multiple species are emerging. As a complement to these efforts, we are generating the sequence of targeted genomic regions in multiple, phylogenetically diverse vertebrates (Thomas et al. 2003) and developing computational approaches for identifying the subset of sequences that confers function. 
In particular, we have focused on developing algorithms for detecting sequences that are highly conserved across multiple species, which we call Multi-species Conserved Sequences (or MCSs); such sequences represent candidates for being functionally important. Here we report the development and testing of methods for MCS detection, including analyses of MCSs identified using a recently generated set of orthologous sequences from 11 non-human vertebrates (Thomas et al. 2003).

Journal Article•DOI•

On the origin of mitochondria: a genomics perspective

Siv G. E. Andersson¹, Olof Karlberg¹, Björn Canbäck, Charles G. Kurland•Institutions (1)

Uppsala University¹

29 Jan 2003-Philosophical Transactions of the Royal Society B

TL;DR: The strong relationship with alpha-proteobacterial genes observed for some mitochondrial genes, combined with the lack of such a relationship for others, indicates that the modern mitochondrial proteome is the product of both reductive and expansive processes.

Abstract: The availability of complete genome sequence data from both bacteria and eukaryotes provides information about the contribution of bacterial genes to the origin and evolution of mitochondria. Phylogenetic analyses based on genes located in the mitochondrial genome indicate that these genes originated from within the alpha-proteobacteria. A number of ancestral bacterial genes have also been transferred from the mitochondrial to the nuclear genome, as evidenced by the presence of orthologous genes in the mitochondrial genome in some species and in the nuclear genome of other species. However, a multitude of mitochondrial proteins encoded in the nucleus display no homology to bacterial proteins, indicating that these originated within the eukaryotic cell subsequent to the acquisition of the endosymbiont. An analysis of the expression patterns of yeast nuclear genes coding for mitochondrial proteins has shown that genes predicted to be of eukaryotic origin are mainly translated on polysomes that are free in the cytosol whereas those of putative bacterial origin are translated on polysomes attached to the mitochondrion. The strong relationship with alpha-proteobacterial genes observed for some mitochondrial genes, combined with the lack of such a relationship for others, indicates that the modern mitochondrial proteome is the product of both reductive and expansive processes.

Journal Article•DOI•

A genomics-guided approach for discovering and expressing cryptic metabolic pathways.

Emmanuel Zazopoulos, Kexue Huang, Alfredo Staffa, Wen Liu¹, Brian O. Bachmann, Koichi Nonaka¹, Joachim Ahlert¹, Jon S. Thorson¹, Ben Shen¹, Chris M. Farnet•Institutions (1)

University of Wisconsin-Madison¹

21 Jan 2003-Nature Biotechnology

TL;DR: It is shown that selective growth conditions can induce the expression of gene clusters involved in natural-product biosynthesis, suggesting that the range of enediyne natural products may be much greater than previously thought.

Abstract: Genome analysis of actinomycetes has revealed the presence of numerous cryptic gene clusters encoding putative natural products. These loci remain dormant until appropriate chemical or physical signals induce their expression. Here we demonstrate the use of a high-throughput genome scanning method to detect and analyze gene clusters involved in natural-product biosynthesis. This method was applied to uncover biosynthetic pathways encoding enediyne antitumor antibiotics in a variety of actinomycetes. Comparative analysis of five biosynthetic loci representative of the major structural classes of enediynes reveals the presence of a conserved cassette of five genes that includes a novel family of polyketide synthase (PKS). The enediyne PKS (PKSE) is proposed to be involved in the formation of the highly reactive chromophore ring structure (or "warhead") found in all enediynes. Genome scanning analysis indicates that the enediyne warhead cassette is widely dispersed among actinomycetes. We show that selective growth conditions can induce the expression of these loci, suggesting that the range of enediyne natural products may be much greater than previously thought. This technology can be used to increase the scope and diversity of natural-product discovery.
