Topic

Genome size

About: Genome size is a research topic. Over the lifetime, 5164 publications have been published within this topic receiving 253817 citations.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Journal Article•DOI•

The sequence of the human genome.

[...]

J. Craig Venter¹, Mark Raymond Adams¹, Eugene W. Myers¹, Peter W. Li¹ +269 more•Institutions (12)

16 Feb 2001-Science

TL;DR: Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems are indicated.

...read moreread less

Abstract: A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.

...read moreread less

12,098 citations

Journal Article•DOI•

The genome sequence of Drosophila melanogaster

[...]

Mark Raymond Adams¹, Susan E. Celniker², Robert A. Holt¹, Cheryl A. Evans¹ +191 more•Institutions (23)

24 Mar 2000-Science

TL;DR: The nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome is determined using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map.

...read moreread less

Abstract: The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

...read moreread less

6,180 citations

Journal Article•DOI•

Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies.

[...]

Seok Hwan Yoon¹, Sung-Min Ha¹, Soonjae Kwon¹, Jeongmin Lim¹, Yeseul Kim¹, Hyungseok Seo¹, Jongsik Chun¹ - Show less +3 more•Institutions (1)

Seoul National University¹

30 May 2017-International Journal of Systematic and Evolutionary Microbiology

TL;DR: An integrated database, called EzBioCloud, that holds the taxonomic hierarchy of the Bacteria and Archaea, which is represented by quality-controlled 16S rRNA gene and genome sequences, with accompanying bioinformatics tools.

...read moreread less

Abstract: The recent advent of DNA sequencing technologies facilitates the use of genome sequencing data that provide means for more informative and precise classification and identification of members of the Bacteria and Archaea. Because the current species definition is based on the comparison of genome sequences between type and other strains in a given species, building a genome database with correct taxonomic information is of paramount need to enhance our efforts in exploring prokaryotic diversity and discovering novel species as well as for routine identifications. Here we introduce an integrated database, called EzBioCloud, that holds the taxonomic hierarchy of the Bacteria and Archaea, which is represented by quality-controlled 16S rRNA gene and genome sequences. Whole-genome assemblies in the NCBI Assembly Database were screened for low quality and subjected to a composite identification bioinformatics pipeline that employs gene-based searches followed by the calculation of average nucleotide identity. As a result, the database is made of 61 700 species/phylotypes, including 13 132 with validly published names, and 62 362 whole-genome assemblies that were identified taxonomically at the genus, species and subspecies levels. Genomic properties, such as genome size and DNA G+C content, and the occurrence in human microbiome data were calculated for each genus or higher taxa. This united database of taxonomy, 16S rRNA gene and genome sequences, with accompanying bioinformatics tools, should accelerate genome-based classification and identification of members of the Bacteria and Archaea. The database and related search tools are available at www.ezbiocloud.net/.

...read moreread less

5,027 citations

Journal Article•DOI•

Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.

[...]

Charles K. Stover, X. Q. Pham¹, A. L. Erwin, S. D. Mizoguchi, Paul Warrener, Mark J. Hickey, Fiona S. L. Brinkman², W. O. Hufnagle, D. J. Kowalik, Lagrou Mj, R. L. Garber, L. Goltry, E. Tolentino, S. Westbrock-Wadman, Ying Yuan, L. L. Brody, S. N. Coulter, K. R. Folger, Arnold Kas¹, K. Larbig³, R. Lim¹, Kelly D. Smith¹, David H. Spencer¹, Gane Ka-Shu Wong¹, Z. Wu¹, Ian T. Paulsen⁴, Ian T. Paulsen⁵, Jonathan Reizer⁴, Milton H. Saier⁴, Robert E. W. Hancock², Stephen Lory¹, Maynard V. Olson¹ - Show less +28 more•Institutions (5)

University of Washington¹, University of British Columbia², Hochschule Hannover³, University of California, San Diego⁴, Research Medical Center⁵

31 Aug 2000-Nature

TL;DR: It is proposed that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.

...read moreread less

Abstract: Pseudomonas aeruginosa is a ubiquitous environmental bacterium that is one of the top three causes of opportunistic human infections. A major factor in its prominence as a pathogen is its intrinsic resistance to antibiotics and disinfectants. Here we report the complete sequence of P. aeruginosa strain PAO1. At 6.3 million base pairs, this is the largest bacterial genome sequenced, and the sequence provides insights into the basis of the versatility and intrinsic drug resistance of P. aeruginosa. Consistent with its larger genome size and environmental adaptability, P. aeruginosa contains the highest proportion of regulatory genes observed for a bacterial genome and a large number of genes involved in the catabolism, transport and efflux of organic compounds as well as four potential chemotaxis systems. We propose that the size and complexity of the P. aeruginosa genome reflect an evolutionary adaptation permitting it to thrive in diverse environments and resist the effects of a variety of antimicrobial substances.

...read moreread less

4,220 citations

Journal Article•DOI•

Genome sequence of the palaeopolyploid soybean

[...]

Jeremy Schmutz, Steven B. Cannon¹, Jessica A. Schlueter², Jessica A. Schlueter³, Jianxin Ma², Therese Mitros⁴, William Nelson⁵, David L. Hyten¹, Qijian Song⁶, Qijian Song¹, Jay J. Thelen⁷, Jianlin Cheng⁷, Dong Xu⁷, Uffe Hellsten⁸, Gregory D. May⁹, Yeisoo Yu⁵, Tetsuya Sakurai, Taishi Umezawa, Madan K. Bhattacharyya¹⁰, Devinder Sandhu¹¹, Babu Valliyodan⁷, Erika Lindquist⁸, Myron Peto¹, David Grant¹, Shengqiang Shu⁸, David Goodstein⁸, Kerrie Barry⁸, Montona Futrell-Griggs², Brian Abernathy², Jianchang Du², Zhixi Tian², Liucun Zhu², Navdeep Gill², Trupti Joshi⁷, Marc Libault⁷, Ananad Sethuraman, Xue-Cheng Zhang⁷, Kazuo Shinozaki, Henry T. Nguyen⁷, Rod A. Wing⁵, Perry B. Cregan¹, James E. Specht¹², Jane Grimwood⁸, Daniel S. Rokhsar⁸, Gary Stacey⁷, Randy C. Shoemaker¹, Scott A. Jackson² - Show less +43 more•Institutions (12)

Agricultural Research Service¹, Purdue University², University of North Carolina at Charlotte³, University of California, Berkeley⁴, University of Arizona⁵, University of Maryland, College Park⁶, University of Missouri⁷, Joint Genome Institute⁸, National Center for Genome Resources⁹, Iowa State University¹⁰, University of Wisconsin–Stevens Point¹¹, University of Nebraska–Lincoln¹²

14 Jan 2010-Nature

TL;DR: An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

...read moreread less

Abstract: Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

...read moreread less

3,743 citations

Collapse

Network Information

Performance

Metrics

5,532

Papers

289,364

Citations

No. of papers in the topic in previous years
Year	Papers
2023	124
2022	247
2021	435
2020	422
2019	408
2018	338

Genome size

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics