scispace - formally typeset

Showing papers on "Genomics published in 2006"


Journal Article
23 Nov 2006-Nature
TL;DR: A first-generation CNV map of the human genome is constructed through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia, underscoring the importance of CNV in genetic diversity and evolution and the utility of this resource for genetic disease studies.
Abstract: Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.

4,275 citations
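The abstract defines a CNVR as a region that can encompass overlapping or adjacent gains or losses. Below is a minimal sketch of that merging step. It assumes that CNV calls are half-open (chromosome, start, end) intervals and that "adjacent" means directly touching. The coordinates are invented for illustration. The last line checks the reported coverage figure.

```python
from typing import List, Tuple

# A CNV call: (chromosome, start, end) in half-open base-pair coordinates.
Interval = Tuple[str, int, int]


def merge_into_cnvrs(calls: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge overlapping or adjacent CNV calls into copy number variable regions.

    Two calls on the same chromosome are merged when the gap between them is
    at most max_gap bp; max_gap=0 (an assumption) merges only overlapping or
    touching calls.
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted(calls):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            _, region_start, region_end = merged[-1]
            merged[-1] = (chrom, region_start, max(region_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(regions: List[Interval]) -> int:
    """Total number of bases spanned by non-overlapping regions."""
    return sum(end - start for _, start, end in regions)


# Hypothetical gains and losses pooled from several individuals.
calls = [
    ("chr1", 1_000, 5_000),   # loss in one individual
    ("chr1", 4_000, 9_000),   # gain in another, overlapping the first call
    ("chr1", 9_000, 12_000),  # directly adjacent to the previous call
    ("chr2", 500, 1_500),
]
cnvrs = merge_into_cnvrs(calls)
print(cnvrs)                # [('chr1', 1000, 12000), ('chr2', 500, 1500)]
print(total_length(cnvrs))  # 12000

# Sanity check on the abstract's figures: 360 Mb is 12% of the genome,
# which implies a genome of about 3,000 Mb.
print(360 * 100 / 12)       # 3000.0
```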


Journal Article
02 Jun 2006-Science
TL;DR: Metabolic function analyses of genes identified in the human gut microbiome, compared against the average content of previously sequenced microbial genomes, indicate that humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes.
Abstract: The human intestinal microbiota is composed of 10^13 to 10^14 microorganisms whose collective genome ("microbiome") contains at least 100 times as many genes as our own genome. We analyzed approximately 78 million base pairs of unique DNA sequence and 2062 polymerase chain reaction-amplified 16S ribosomal DNA sequences obtained from the fecal DNAs of two healthy adults. Using metabolic function analyses of identified genes, we compared our human genome with the average content of previously sequenced microbial genomes. Our microbiome has significantly enriched metabolism of glycans, amino acids, and xenobiotics; methanogenesis; and 2-methyl-D-erythritol 4-phosphate pathway-mediated biosynthesis of vitamins and isoprenoids. Thus, humans are superorganisms whose metabolism represents an amalgamation of microbial and human attributes.

4,111 citations


Journal Article
26 Oct 2006-Nature
TL;DR: The genome sequence of the honeybee Apis mellifera is reported; population genetics suggests a novel African origin for the species A. mellifera and offers insights into whether Africanized bees spread throughout the New World via hybridization or displacement.
Abstract: Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and provides insights into whether Africanized bees spread throughout the New World via hybridization or displacement.

1,673 citations
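The abstract reports that the A. mellifera genome has high A+T and CpG contents. Two common ways to quantify these properties are the A+T fraction and the observed/expected CpG ratio, where the ratio is CpG count × length / (C count × G count). The abstract does not say which statistics the authors used. The sketch below is therefore only an illustration of the quantities, computed on a made-up sequence.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A or T among unambiguous bases (A, C, G, T)."""
    acgt = [base for base in seq.upper() if base in "ACGT"]
    if not acgt:
        return 0.0
    return sum(base in "AT" for base in acgt) / len(acgt)


def cpg_observed_expected(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)


example = "ATATCGATTACGAATTTA"  # made-up, A+T-rich toy sequence
print(round(at_fraction(example), 3))  # 0.778
print(cpg_observed_expected(example))  # 9.0
```

Small counts dominate the o/e ratio on a sequence this short. In practice the ratio is computed over long windows or whole genomes.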


Journal Article
TL;DR: The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms; it is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data.
Abstract: The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/.

1,332 citations
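To give a rough idea of what an in silico PCR search does, the sketch below works on one strand of a sequence. It finds exact matches of a forward primer and of the reverse complement of a reverse primer, and reports predicted products up to a maximum length. This is a simplified exact-match toy, not the UCSC In Silico PCR algorithm; that tool searches entire genomes and is not limited to this scheme. The primers and template are made up.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> List[int]:
    """Start positions of every (possibly overlapping) exact occurrence."""
    hits, i = [], haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 4000) -> List[str]:
    """Predicted amplicons on the given strand: fwd ... reverse_complement(rev)."""
    template, fwd = template.upper(), fwd.upper()
    rev_rc = reverse_complement(rev.upper())
    products = []
    for f in find_all(template, fwd):
        for r in find_all(template, rev_rc):
            end = r + len(rev_rc)
            # Reverse-primer site must lie downstream of the forward primer.
            if r >= f + len(fwd) and end - f <= max_len:
                products.append(template[f:end])
    return products


template = "TTATGCCAGTCAGTTGACCGGTAA"
print(in_silico_pcr(template, fwd="ATGCC", rev="CCGGTC"))  # ['ATGCCAGTCAGTTGACCGG']
```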


Journal Article
TL;DR: This work bridges structural genomics to structural biology with a procedure for determining protein complexes of previously unknown function from any organism with a sequenced genome; the entire procedure can be scaled to a genome-wide level.
Abstract: The developing science called structural genomics has focused to date mainly on high-throughput expression of individual proteins, followed by their purification and structure determination. In contrast, the term structural biology is used to denote the determination of structures, often complexes of several macromolecules, that illuminate aspects of biological function. Here we bridge structural genomics to structural biology with a procedure for determining protein complexes of previously unknown function from any organism with a sequenced genome. From computational genomic analysis, we identify functionally linked proteins and verify their interaction in vitro by coexpression/copurification. We illustrate this procedure by the structural determination of a previously unknown complex between a PE and PPE protein from the Mycobacterium tuberculosis genome, members of protein families that constitute ≈10% of the coding capacity of this genome. The predicted complex was readily expressed, purified, and crystallized, although we had previously failed in expressing individual PE and PPE proteins on their own. The reason for the failure is clear from the structure, which shows that the PE and PPE proteins mate along an extended apolar interface to form a four-α-helical bundle, where two of the α-helices are contributed by the PE protein and two by the PPE protein. Our entire procedure for the identification, characterization, and structural determination of protein complexes can be scaled to a genome-wide level.

719 citations
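The abstract does not say which computational methods were used to identify functionally linked proteins. One widely used approach is phylogenetic profiling, which predicts a functional link between proteins whose homologs are present or absent together across many genomes. The sketch below shows that idea under two assumptions: binary presence/absence profiles over six hypothetical genomes, and a Hamming-distance cutoff. The protein names are placeholders, not the PE/PPE proteins.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: List[int], b: List[int]) -> int:
    """Number of genomes in which two presence/absence profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles: Dict[str, List[int]], max_diff: int = 0) -> List[Tuple[str, str]]:
    """Protein pairs whose profiles differ in at most max_diff genomes."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_diff
    ]


# Hypothetical profiles: 1 = homolog present in that genome, 0 = absent.
profiles = {
    "protA": [1, 1, 0, 1, 0, 1],
    "protB": [1, 1, 0, 1, 0, 1],  # co-occurs exactly with protA
    "protC": [1, 0, 1, 0, 1, 0],
    "protD": [1, 1, 1, 1, 1, 1],  # present everywhere, so weakly informative
}
print(linked_pairs(profiles))  # [('protA', 'protB')]
```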


Journal Article
TL;DR: Softberry's software and underlying methods for identifying genes, pseudogenes and promoters are reviewed, and it is demonstrated that they can be effectively used for initial automatic annotation of the eukaryotic genome.
Abstract: The ENCODE gene prediction workshop (EGASP) has been organized to evaluate how well state-of-the-art automatic gene finding methods are able to reproduce the manual and experimental gene annotation of the human genome. We have used Softberry gene finding software to predict genes, pseudogenes and promoters in 44 selected ENCODE sequences representing approximately 1% (30 Mb) of the human genome. Predictions of gene finding programs were evaluated in terms of their ability to reproduce the ENCODE-HAVANA annotation. The Fgenesh++ gene prediction pipeline can identify 91% of coding nucleotides with a specificity of 90%. Our automatic pseudogene finder (PSF program) found 90% of the manually annotated pseudogenes and some new ones. The Fprom promoter prediction program identifies 80% of TATA promoter sequences with one false positive prediction per 2,000 base-pairs (bp) and 50% of TATA-less promoters with one false positive prediction per 650 bp. It can be used to identify transcription start sites upstream of annotated coding parts of genes found by gene prediction software. We review our software and underlying methods for identifying these three important structural and functional genome components and discuss the accuracy of predictions, recent advances and open problems in annotating genomic sequences. We have demonstrated that our methods can be effectively used for initial automatic annotation of the eukaryotic genome.

716 citations
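The figures reported above (91% of coding nucleotides found at 90% specificity) are nucleotide-level accuracy measures. The sketch assumes the convention common in gene-prediction evaluations:

- Sensitivity is TP/(TP+FN).
- "Specificity" is TP/(TP+FP), the fraction of predicted coding nucleotides that are annotated as coding.

The exon coordinates are invented for illustration.

```python
from typing import Iterable, Set, Tuple

Exon = Tuple[int, int]  # half-open (start, end) coordinates


def covered(exons: Iterable[Exon]) -> Set[int]:
    """Set of nucleotide positions covered by the given exons."""
    return {pos for start, end in exons for pos in range(start, end)}


def nucleotide_accuracy(annotated: Iterable[Exon], predicted: Iterable[Exon]) -> Tuple[float, float]:
    truth, pred = covered(annotated), covered(predicted)
    tp = len(truth & pred)
    sensitivity = tp / len(truth)  # share of annotated coding bases recovered
    specificity = tp / len(pred)   # share of predicted coding bases that are annotated
    return sensitivity, specificity


annotated = [(100, 200), (300, 400)]  # 200 annotated coding nt
predicted = [(120, 200), (300, 420)]  # 80 + 120 = 200 predicted coding nt
sn, sp = nucleotide_accuracy(annotated, predicted)
print(sn, sp)  # 0.9 0.9
```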


Journal Article
TL;DR: The macronuclear genome of T. thermophila is sequenced and analyzed, with genomic evidence supporting the hypothesis that excision of DNA from the micronucleus (MIC) to generate the macronucleus (MAC) specifically targets foreign DNA as a form of genome self-defense; the genome sequence makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.
Abstract: The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. 
The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.

715 citations
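In T. thermophila, UGA is the only stop codon. UAA and UAG, which are stops in the standard code, are read as glutamine; NCBI catalogues this as translation table 6, the ciliate nuclear code. The sketch builds the standard table in the usual TCAG codon order and then reassigns those two codons. It deliberately ignores the context-dependent reading of UGA as selenocysteine, which a plain codon table cannot express. The example open reading frame is made up.

```python
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code in TCAG x TCAG x TCAG codon order ('*' = stop).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(ciliate: bool = False) -> Dict[str, str]:
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}
    if ciliate:
        # Ciliate nuclear code: TAA and TAG encode glutamine; only TGA stays a stop.
        table["TAA"] = table["TAG"] = "Q"
    return table


def translate(dna: str, table: Dict[str, str]) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf = "ATGTAAGGCTAGTGA"
print(translate(orf, codon_table()))              # M
print(translate(orf, codon_table(ciliate=True)))  # MQGQ
```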


Journal Article
TL;DR: A cartographic network based on correlation analysis is generated; it reveals metabolic associations that are either linked to or independent of whole-plant phenotype, including links with metabolites of nutritional and organoleptic importance.
Abstract: Tomato represents an important source of fiber and nutrients in the human diet and is a central model for the study of fruit biology. To identify components of fruit metabolic composition, here we have phenotyped tomato introgression lines (ILs) containing chromosome segments of a wild species in the genetic background of a cultivated variety. Using this high-diversity population, we identify 889 quantitative fruit metabolic loci and 326 loci that modify yield-associated traits. The mapping analysis indicates that at least 50% of the metabolic loci are associated with quantitative trait loci (QTLs) that modify whole-plant yield-associated traits. We generate a cartographic network based on correlation analysis that reveals whole-plant phenotype associated and independent metabolic associations, including links with metabolites of nutritional and organoleptic importance. The results of our genomic survey illustrate the power of genome-wide metabolic profiling and detailed morphological analysis for uncovering traits with potential for crop breeding.

707 citations
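The "cartographic network" is built from correlation analysis. Below is a minimal sketch of the general idea under two assumptions: Pearson correlation across introgression lines, and an arbitrary |r| ≥ 0.8 cutoff for drawing an edge. The trait names and values are invented. The paper's actual statistic and threshold may differ.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(
    traits: Dict[str, List[float]], cutoff: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Edges (trait1, trait2, r) for every trait pair with |r| >= cutoff."""
    edges = []
    for a, b in combinations(sorted(traits), 2):
        r = pearson(traits[a], traits[b])
        if abs(r) >= cutoff:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical measurements, one value per introgression line.
traits = {
    "fruit_yield": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sugar":       [2.1, 3.9, 6.2, 8.1, 9.8],
    "malate":      [5.0, 4.1, 2.9, 2.2, 1.0],
    "volatile_x":  [3.0, 1.0, 4.0, 1.0, 5.0],  # weakly related to the others
}
for edge in correlation_network(traits):
    print(edge)
```

Negative correlations are kept as edges too, because links to a phenotype can run in either direction.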


Journal Article
TL;DR: A new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions is reported, which will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.
Abstract: Recent work has shown that copy number polymorphism is an important class of genetic variation in human genomes. Here we report a new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions. We applied this method to data from the International HapMap Project to produce the first high-resolution population surveys of deletion polymorphism. Approximately 100 of these deletions have been experimentally validated using comparative genome hybridization on tiling-resolution oligonucleotide microarrays. Our analysis identifies a total of 586 distinct regions that harbor deletion polymorphisms in one or more of the families. Notably, we estimate that typical individuals are hemizygous for roughly 30-50 deletions larger than 5 kb, totaling around 550-750 kb of euchromatic sequence across their genomes. The detected deletions span a total of 267 known and predicted genes. Overall, however, the deleted regions are relatively gene-poor, consistent with the action of purifying selection against deletions. Deletion polymorphisms may well have an important role in the genetics of complex traits; however, they are not directly observed in most current gene mapping studies. Our new method will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.

702 citations
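The abstract does not spell out the method, but the underlying signal is easy to state. A parent who is hemizygous for a deletion transmits no allele at SNPs inside it. The child's genotype calls there can then look inconsistent with Mendelian inheritance. For example, with father AA and mother BB, a child who inherited the mother's deletion is called AA rather than AB.

The toy heuristic below flags runs of consecutive Mendelian-inconsistent SNPs as candidate deletions. It assumes genotypes coded as "AA", "AB" or "BB". The run-length threshold and the genotypes are invented, and this heuristic is not the published method.

```python
from typing import List, Tuple

Trio = Tuple[str, str, str]  # (father, mother, child) genotypes, e.g. ("AA", "BB", "AB")


def mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(sorted(f + m) == sorted(child) for f in father for m in mother)


def candidate_deletions(trios: List[Trio], min_run: int = 2) -> List[Tuple[int, int]]:
    """(first, last) SNP indices of runs of >= min_run consecutive Mendelian errors."""
    runs, start = [], None
    for i, (f, m, c) in enumerate(trios + [("AA", "AA", "AA")]):  # sentinel closes a final run
        if not mendelian_consistent(f, m, c):
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    return runs


snps = [
    ("AB", "AB", "AA"),  # 0 consistent
    ("AA", "BB", "AA"),  # 1 child shows only the father's allele
    ("BB", "AA", "BB"),  # 2 again only the father's allele
    ("AA", "BB", "AA"),  # 3 again only the father's allele
    ("AB", "AA", "AB"),  # 4 consistent
    ("AA", "BB", "AA"),  # 5 isolated error, e.g. a genotyping artifact
    ("AB", "AB", "BB"),  # 6 consistent
]
print(candidate_deletions(snps))  # [(1, 3)]
```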


Journal Article
TL;DR: Genome-wide expression profiles can partition large tumor cohorts into subgroups that are enriched for specific genetic alterations that may assist ultimately in the selection of patients for future clinical trials of molecular targeted therapies.
Abstract: Purpose: Traditional genetic approaches to identify gene mutations in cancer are expensive and laborious. Nonetheless, if we are to avoid rejecting effective molecular targeted therapies, we must test these drugs in patients whose tumors harbor mutations in the drug target. We hypothesized that gene expression profiling might be a more rapid and cost-effective method of identifying tumors that contain specific genetic abnormalities. Materials and Methods: Gene expression profiles of 46 samples of medulloblastoma were generated using the U133av2 Affymetrix oligonucleotide array and validated using real-time reverse transcriptase polymerase chain reaction (RT-PCR) and immunohistochemistry. Genetic abnormalities were confirmed using fluorescence in situ hybridization (FISH) and direct sequencing. Results: Unsupervised analysis of gene expression profiles partitioned medulloblastomas into five distinct subgroups (subgroups A to E). Gene expression signatures that distinguished these subgroups predicted the prese...

653 citations


Journal Article
TL;DR: New sequence data being gathered from genes in which polymorphisms are known to be maintained by selection can be interpreted in conjunction with results from population genetics models that include recombination between selected sites and nearby neutral marker variants.
Abstract: Our understanding of balancing selection is currently becoming greatly clarified by new sequence data being gathered from genes in which polymorphisms are known to be maintained by selection. The data can be interpreted in conjunction with results from population genetics models that include recombination between selected sites and nearby neutral marker variants. This understanding is making possible tests for balancing selection using molecular evolutionary approaches. Such tests do not necessarily require knowledge of the functional types of the different alleles at a locus, but such information, as well as information about the geographic distribution of alleles and markers near the genes, can potentially help towards understanding what form of balancing selection is acting, and how long alleles have been maintained.
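The abstract mentions tests for balancing selection based on molecular evolutionary approaches but does not name a specific test. Tajima's D is a classic example and is used here only as an illustration. Balancing selection keeps old alleles at intermediate frequency, which tends to make the mean pairwise difference π exceed the value expected from the number of segregating sites S, giving a positive D. The sketch implements Tajima's (1989) formula for aligned sequences of equal length. The haplotypes are made up.

```python
from itertools import combinations
from math import sqrt
from typing import List


def tajimas_d(seqs: List[str]) -> float:
    """Tajima's D for equal-length aligned sequences (n >= 2)."""
    n, length = len(seqs), len(seqs[0])
    # S: alignment columns showing more than one base.
    s = sum(len({seq[i] for seq in seqs}) > 1 for i in range(length))
    if s == 0:
        return 0.0  # D is undefined without polymorphism; 0 by convention in this sketch
    # pi: mean number of differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Two divergent allele classes at 50% frequency (pattern expected under balancing selection).
balanced = ["AAAAAAAA", "AAAAAAAA", "TTTTTTTT", "TTTTTTTT"]
# Only singleton variants (pattern more typical of recent expansion or purifying selection).
singletons = ["AAAA", "TAAA", "ATAA", "AATA"]
print(tajimas_d(balanced) > 0)    # True
print(tajimas_d(singletons) < 0)  # True
```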

Journal Article
TL;DR: A snapshot analysis based on the most recent genome sequences of two E. coli K-12 strains allows comparison of their genotypes and the mutant status of alleles.
Abstract: The goal of this group project has been to coordinate and bring up-to-date information on all genes of Escherichia coli K-12. Annotation of the genome of an organism entails identification of genes, the boundaries of genes in terms of precise start and end sites, and description of the gene products. Known and predicted functions were assigned to each gene product on the basis of experimental evidence or sequence analysis. Since both kinds of evidence are constantly expanding, no annotation is complete at any moment in time. This is a snapshot analysis based on the most recent genome sequences of two E. coli K-12 strains. An accurate and up-to-date description of E. coli K-12 genes is of particular importance to the scientific community because experimentally determined properties of its gene products provide fundamental information for annotation of innumerable genes of other organisms. Availability of the complete genome sequence of two K-12 strains allows comparison of their genotypes and mutant status of alleles.

Journal Article
TL;DR: The physical location of each STR locus in the human genome is delineated and allele ranges and variants observed in human populations are summarized as are mutation rates observed from parentage testing.
Abstract: Over the past decade, the human identity testing community has settled on a set of core short tandem repeat (STR) loci that are widely used for DNA typing applications. A variety of commercial kits enable robust amplification of these core STR loci. A brief history is presented regarding the selection of core autosomal and Y-chromosomal STR markers. The physical location of each STR locus in the human genome is delineated and allele ranges and variants observed in human populations are summarized as are mutation rates observed from parentage testing. Internet resources for additional information on core STR loci are reviewed. Additional topics are also discussed, including potential linkage of STR loci to genetic disease-causing genes, probabilistic predictions of sample ethnicity, and desirable characteristics for additional STR loci that may be added in the future to the current core loci. These core STR loci, which form the basis for DNA databases worldwide, will continue to play an important role in forensic science for many years to come.
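STR alleles are conventionally named by their number of repeat units. Partial repeats are written as decimals; for example, 9.3 means nine full units plus three bases. The sketch below reports the longest uninterrupted run of a given motif in a read. The motif and read are illustrative only. Forensic kits call alleles by fragment sizing against allelic ladders, not with a function like this.

```python
def longest_repeat_run(seq: str, motif: str) -> int:
    """Largest number of consecutive, uninterrupted copies of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    k, best = len(motif), 0
    for start in range(len(seq)):
        count, i = 0, start
        while seq[i:i + k] == motif:  # slicing past the end returns a shorter string
            count += 1
            i += k
        best = max(best, count)
    return best


# Hypothetical read: 11 TCTA units, an interruption, then 2 more units.
read = "GGAC" + "TCTA" * 11 + "TCA" + "TCTA" * 2 + "GGT"
print(longest_repeat_run(read, "TCTA"))  # 11
```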

Journal Article
TL;DR: Detailed analyses of capsid structures provide the best evidence for a common origin of the three groups of herpesvirus, and the structure of the capsid shell protein further suggests an element of common origin between herpesviruses and tailed DNA bacteriophages.

Journal Article
TL;DR: The status of a much-needed coherent view that integrates studies on protein evolution with biochemistry and functional and structural genomics is discussed.
Abstract: Why do proteins evolve at different rates? Advances in systems biology and genomics have facilitated a move from studying individual proteins to characterizing global cellular factors. Systematic surveys indicate that protein evolution is not determined exclusively by selection on protein structure and function, but is also affected by the genomic position of the encoding genes, their expression patterns, their position in biological networks and possibly their robustness to mistranslation. Recent work has allowed insights into the relative importance of these factors. We discuss the status of a much-needed coherent view that integrates studies on protein evolution with biochemistry and functional and structural genomics.

Journal Article
TL;DR: This review summarizes recent advances in the understanding of phylogenetics, polyploidization and comparative genomics in the family Brassicaceae and integrates several of these findings into a simple system of 24 conserved chromosomal blocks (labeled A-X).

Journal Article
24 Mar 2006-Science
TL;DR: Observations provide support for the hypothesis that the fundamental features of genome evolution are largely defined by the relative power of two nonadaptive forces: random genetic drift and mutation pressure.
Abstract: The nuclear genomes of multicellular animals and plants contain large amounts of noncoding DNA, the disadvantages of which can be too weak to be effectively countered by selection in lineages with reduced effective population sizes. In contrast, the organelle genomes of these two lineages evolved to opposite ends of the spectrum of genomic complexity, despite similar effective population sizes. This pattern and other puzzling aspects of organelle evolution appear to be consequences of differences in organelle mutation rates. These observations provide support for the hypothesis that the fundamental features of genome evolution are largely defined by the relative power of two nonadaptive forces: random genetic drift and mutation pressure.
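The argument rests on how efficient selection is relative to drift. Kimura's diffusion approximation makes this concrete; it is not stated in the abstract. For a new mutation with selection coefficient s in a population of effective size Ne, taking census size equal to Ne for simplicity, the fixation probability is u = (1 - e^(-2s)) / (1 - e^(-4·Ne·s)). The neutral value is 1/(2Ne). The value of s below is a hypothetical small cost of carrying extra noncoding DNA.

```python
from math import expm1


def fixation_probability(s: float, ne: int) -> float:
    """Kimura's diffusion approximation for a single new mutant.

    Assumes census size N equals effective size Ne, so the initial frequency
    is 1/(2Ne) and u = (1 - e^(-2s)) / (1 - e^(-4 Ne s)). Values of -4*Ne*s
    above roughly 700 overflow expm1, so keep inputs modest.
    """
    if s == 0:
        return 1 / (2 * ne)  # neutral case
    return expm1(-2 * s) / expm1(-4 * ne * s)


def relative_to_neutral(s: float, ne: int) -> float:
    """Fixation probability divided by the neutral value 1/(2Ne)."""
    return fixation_probability(s, ne) * 2 * ne


s = -1e-5  # hypothetical, mildly deleterious insertion
print(relative_to_neutral(s, 10_000))     # about 0.81: 4*Ne*|s| = 0.4, drift dominates
print(relative_to_neutral(s, 1_000_000))  # on the order of 1e-16: 4*Ne*|s| = 40, selection purges it
```

The same insertion behaves almost neutrally in the smaller population and is essentially never fixed in the larger one. The product Ne·s decides the outcome, not s alone.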

Journal Article
TL;DR: Tests for balancing selection in the current generation, the recent past, and the distant past provide a comprehensive approach for evaluating selective impacts and provide new ways to evaluate the long-term impact of selection on particular genes and the overall genome in natural populations.
Abstract: The selective mechanisms for maintaining polymorphism in natural populations have been the subject of theory, experiments, and review over the past half century. Advances in molecular genetic techniques have provided new insight into many examples of balancing selection. In addition, new theoretical developments demonstrate how diversifying selection over environments may maintain polymorphism. Tests for balancing selection in the current generation, the recent past, and the distant past provide a comprehensive approach for evaluating selective impacts. In particular, sequence-based tests provide new ways to evaluate the long-term impact of selection on particular genes and the overall genome in natural populations. Overall, there appear to be many loci exhibiting the signal of adaptive directional selection from genomic scans, but the present evidence suggests that the proportion of loci where polymorphism is maintained by environmental heterogeneity is low. However, as more molecular genetic details become available, more examples of polymorphism maintained by selection in heterogeneous environments may be found.

Journal Article
TL;DR: The results show the efficiency of this technique in characterizing viral DNA components of several geminiviruses from experimental and natural host plant sources and will considerably accelerate genomics of at least gemini-, nano- and circoviruses in the future.

Journal Article
TL;DR: MaGe's main features are the integration of annotation data from bacterial genomes, enhanced by a gene coding re-annotation process using accurate gene models; the integration of results obtained with a wide range of bioinformatics methods; and an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions.
Abstract: Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface to carry out genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes enhanced by a gene coding re-annotation process using accurate gene models, (ii) integration of results obtained with a wide range of bioinformatics methods, including exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways, and (iii) an advanced web interface allowing multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. Our system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in the context of several new microbial genome annotation projects. In addition, MaGe allows for annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at http://www.genoscope.cns.fr/agc/mage.
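One of MaGe's features is the exploration of gene context through searches for conserved synteny. The abstract does not describe MaGe's algorithm. The sketch below shows one simple version of the idea, assuming genes are listed in genome order and that an ortholog map to a second genome is already known. It reports maximal blocks of consecutive query genes whose orthologs are also consecutive in the reference, in either orientation. Gene identifiers are hypothetical.

```python
from typing import Dict, List, Optional


def synteny_blocks(
    query: List[str],
    reference: List[str],
    orthologs: Dict[str, str],
    min_size: int = 2,
) -> List[List[str]]:
    """Maximal runs of consecutive query genes whose orthologs are also
    consecutive in the reference genome, in either orientation."""
    ref_pos = {gene: i for i, gene in enumerate(reference)}
    blocks: List[List[str]] = []
    current: List[str] = []
    prev: Optional[int] = None  # reference position of the last gene in `current`
    step = 0                    # +1 (same order) or -1 (inverted) once known
    for gene in query:
        ortholog = orthologs.get(gene)
        pos = ref_pos.get(ortholog) if ortholog is not None else None
        extends = (
            pos is not None
            and prev is not None
            and abs(pos - prev) == 1
            and (step == 0 or pos - prev == step)
        )
        if extends:
            step = pos - prev
            current.append(gene)
        else:
            if len(current) >= min_size:
                blocks.append(current)
            current = [gene] if pos is not None else []
            step = 0
        prev = pos
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


query = ["q1", "q2", "q3", "q4", "q5", "q6"]
reference = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
orthologs = {"q1": "r5", "q2": "r6", "q3": "r7", "q4": "r3", "q5": "r2"}  # q6 has no ortholog
print(synteny_blocks(query, reference, orthologs))  # [['q1', 'q2', 'q3'], ['q4', 'q5']]
```

The second block is inverted relative to the reference. This is why the orientation is fixed per block rather than assumed to be forward.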

Journal Article
TL;DR: The Human Mitochondrial Genome Database (mtDB) has provided a comprehensive database of complete human mitochondrial genomes since early 2000; it contains 2104 sequences and a complete list of mitochondrial polymorphisms of special interest to medical researchers and population geneticists evaluating specific positions.
Abstract: The mitochondrial genome, contained in the subcellular mitochondrial network, encodes a small number of peptides pivotal for cellular energy production. Mitochondrial genes are highly polymorphic and cataloguing existing variation is of interest for medical scientists involved in the identification of mutations causing mitochondrial dysfunction, as well as for population genetics studies. Human Mitochondrial Genome Database (mtDB) (http://www.genpat.uu.se/mtDB) has provided a comprehensive database of complete human mitochondrial genomes since early 2000. At this time, owing to an increase in the number of published complete human mitochondrial genome sequences, it became necessary to provide a web-based database of human whole genome and complete coding region sequences. As of August 2005 this database contains 2104 sequences (1544 complete genome and 560 coding region) available to download or search for specific polymorphisms. Of special interest to medical researchers and population geneticists evaluating specific positions is a complete list of (currently 3311) mitochondrial polymorphisms among these sequences. Recent expansions in the capabilities of mtDB include a haplotype search function and the ability to identify and download sequences carrying particular variants.
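mtDB supports a haplotype search and can identify sequences that carry particular variants. The internals of those web functions are not described here. As a sketch of one possible underlying data structure, assume each sequence is stored as a set of (position, base) differences from a reference sequence; for human mtDNA this is usually the revised Cambridge Reference Sequence. A haplotype query then reduces to a subset test. The sequence IDs and variant sets are illustrative only.

```python
from typing import Dict, FrozenSet, List, Tuple

Variant = Tuple[int, str]  # (position, alternate base)


def parse(variants: str) -> FrozenSet[Variant]:
    """Parse space-separated variants written as position + base, e.g. '73G 263G'."""
    return frozenset((int(v[:-1]), v[-1]) for v in variants.split())


def carriers(db: Dict[str, FrozenSet[Variant]], query: str) -> List[str]:
    """IDs of sequences carrying every variant in the query."""
    wanted = parse(query)
    return sorted(seq_id for seq_id, variants in db.items() if wanted <= variants)


db = {
    "seq001": parse("73G 263G 16519C"),
    "seq002": parse("263G"),
    "seq003": parse("73G 263G"),
}
print(carriers(db, "73G 263G"))  # ['seq001', 'seq003']
print(carriers(db, "16519C"))    # ['seq001']
```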

Journal Article
TL;DR: This review covers how ESTs have been used in molecular ecology research in the last several years: providing sequence data for the design of molecular markers, genome-wide studies of gene expression and selection, the identification of candidate genes underlying adaptation, and the basis for studies of gene family and genome evolution.
Abstract: Genomics and bioinformatics have great potential to help address numerous topics in ecology and evolution. Expressed sequence tags (ESTs) can bridge genomics and molecular ecology because they can provide a means of accessing the gene space of almost any organism. We review how ESTs have been used in molecular ecology research in the last several years by providing sequence data for the design of molecular markers, genome-wide studies of gene expression and selection, the identification of candidate genes underlying adaptation, and the basis for studies of gene family and genome evolution. Given the tremendous recent advances in inexpensive sequencing technologies, we predict that molecular ecologists will increasingly be developing and using EST collections in the years to come. With this in mind, we close our review by discussing aspects of EST resource development of particular relevance for molecular ecologists.

Journal Article
TL;DR: The RCSB Protein Data Bank (PDB) offers online tools, summary reports and target information related to the worldwide structural genomics initiatives from its portal at http://sg.pdb.org.
Abstract: The RCSB Protein Data Bank (PDB) offers online tools, summary reports and target information related to the worldwide structural genomics initiatives from its portal at http://sg.pdb.org. There are currently three components to this site: Structural Genomics Initiatives contains information and links on each structural genomics site, including progress reports, target lists, target status, targets in the PDB and level of sequence redundancy; Targets provides combined target information, protocols and other data associated with protein structure determination; and Structures offers an assessment of the progress of structural genomics based on the functional coverage of the human genome by PDB structures, structural genomics targets and homology models. Functional coverage can be examined according to enzyme classification, gene ontology (biological process, cell component and molecular function) and disease.

Journal Article
TL;DR: In this article, the authors used comparative genomics to report a genome-wide map of nucleosome positioning sequences (NPSs) located in the vicinity of all Saccharomyces cerevisiae genes.
Abstract: DNA sequence has long been recognized as an important contributor to nucleosome positioning, which has the potential to regulate access to genes. The extent to which the nucleosomal architecture at promoters is delineated by the underlying sequence is now being worked out. Here we use comparative genomics to report a genome-wide map of nucleosome positioning sequences (NPSs) located in the vicinity of all Saccharomyces cerevisiae genes. We find that the underlying DNA sequence provides a very good predictor of nucleosome locations that have been experimentally mapped to a small fraction of the genome. Notably, distinct classes of genes possess characteristic arrangements of NPSs that may be important for their regulation. In particular, genes that have a relatively compact NPS arrangement over the promoter region tend to have a TATA box buried in an NPS and tend to be highly regulated by chromatin modifying and remodeling factors.

Journal Article
TL;DR: The genome of Burkholderia xenovorans LB400, a polychlorinated biphenyl degrader and the first nonpathogenic Burkholderia isolate to be sequenced, is analyzed, revealing significant differences in functional specialization between its three replicons.
Abstract: Burkholderia xenovorans LB400 (LB400), a well studied, effective polychlorinated biphenyl-degrader, has one of the two largest known bacterial genomes and is the first nonpathogenic Burkholderia isolate sequenced. From an evolutionary perspective, we find significant differences in functional specialization between the three replicons of LB400, as well as a more relaxed selective pressure for genes located on the two smaller vs. the largest replicon. High genomic plasticity, diversity, and specialization within the Burkholderia genus are exemplified by the conservation of only 44% of the genes between LB400 and Burkholderia cepacia complex strain 383. Even among four B. xenovorans strains, genome size varies from 7.4 to 9.73 Mbp. The latter is largely explained by our findings that >20% of the LB400 sequence was recently acquired by means of lateral gene transfer. Although a range of genetic factors associated with in vivo survival and intercellular interactions are present, these genetic factors are likely related to niche breadth rather than determinants of pathogenicity. The presence of at least eleven “central aromatic” and twenty “peripheral aromatic” pathways in LB400, among the highest in any sequenced bacterial genome, supports this hypothesis. Finally, in addition to the experimentally observed redundancy in benzoate degradation and formaldehyde oxidation pathways, the fact that 17.6% of proteins have a better LB400 paralog than an ortholog in a different genome highlights the importance of gene duplication and repeated acquirement, which, coupled with their divergence, raises questions regarding the role of paralogs and potential functional redundancies in large-genome microbes.
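The 17.6% figure counts proteins whose best match is a paralog within LB400 rather than an ortholog in another genome. The abstract does not give the exact criteria. The sketch below shows one way to make that kind of count. It assumes precomputed best-hit similarity scores (for example BLAST bit scores) for each protein: one within its own genome, excluding self-hits, and one against other genomes. The best external hit stands in for the ortholog here. Scores and protein IDs are invented.

```python
from typing import Dict, Optional


def fraction_with_better_paralog(
    best_paralog: Dict[str, Optional[float]],
    best_external: Dict[str, Optional[float]],
) -> float:
    """Share of proteins whose best within-genome (non-self) hit outscores
    their best hit in any other genome. Missing hits count as score 0."""
    proteins = set(best_paralog) | set(best_external)
    better = sum(
        1
        for p in proteins
        if (best_paralog.get(p) or 0.0) > (best_external.get(p) or 0.0)
    )
    return better / len(proteins)


# Hypothetical bit scores for four proteins.
best_paralog = {"p1": 850.0, "p2": None, "p3": 120.0, "p4": 300.0}
best_external = {"p1": 400.0, "p2": 600.0, "p3": 500.0, "p4": 310.0}
print(fraction_with_better_paralog(best_paralog, best_external))  # 0.25
```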

Journal Article
TL;DR: It is found that Alu recombination-mediated genomic deletion has had a much higher impact than was inferred from previously identified isolated events and that it continues to contribute to the dynamic nature of the human genome.
Abstract: Recombination between Alu elements results in genomic deletions associated with many human genetic disorders. Here, we compare the reference human and chimpanzee genomes to determine the magnitude of this recombination process in the human lineage since the human-chimpanzee divergence ∼6 million years ago. Combining computational data mining and wet-bench experimental verification, we identified 492 human-specific deletions (for a total of ∼400 kb) attributable to this process, a significant component of the insertion/deletion spectrum of the human genome. The majority of the deletions (295 of 492) coincide with known or predicted genes (including 3 that deleted functional exons, as compared with orthologous chimpanzee genes), which implicates this process in creating a substantial portion of the genomic differences between humans and chimpanzees. Overall, we found that Alu recombination-mediated genomic deletion has had a much higher impact than was inferred from previously identified isolated events and that it continues to contribute to the dynamic nature of the human genome.
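The reported totals imply an average deletion size and the share of deletions that coincide with genes. This worked arithmetic uses the rounded ∼400 kb total, so the average is approximate:

```latex
\bar{L}_{\mathrm{del}} \approx \frac{400\ \mathrm{kb}}{492} \approx 0.81\ \mathrm{kb\ per\ deletion},
\qquad
\frac{295}{492} \approx 0.60\ (\text{about } 60\%\ \text{of deletions coincide with known or predicted genes})
```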

Journal Article
TL;DR: This study evaluated the utility and cost-effectiveness of a hybrid sequencing approach that uses 3730xl Sanger data and 454 data to generate higher-quality, lower-cost assemblies of microbial genomes compared with current Sanger sequencing strategies alone.
Abstract: Since its introduction a decade ago, whole-genome shotgun sequencing (WGS) has been the main approach for producing cost-effective and high-quality genome sequence data. Until now, the Sanger sequencing technology that has served as a platform for WGS has not been truly challenged by emerging technologies. The recent introduction of the pyrosequencing-based 454 sequencing platform (454 Life Sciences, Branford, CT) offers a very promising alternative sequencing technology for incorporation in WGS. In this study, we evaluated the utility and cost-effectiveness of a hybrid sequencing approach that uses 3730xl Sanger data and 454 data to generate higher-quality, lower-cost assemblies of microbial genomes compared with current Sanger sequencing strategies alone.

Journal Article
TL;DR: The current study highlights recent trends in the application of molecular markers in insect studies and explores technological advances in molecular marker tools and modern high-throughput genotyping methods that may be applied in entomological research to better understand insect ecology at the molecular level.
Abstract: Insects make up the largest number of species in the animal kingdom and possess vast, largely undiscovered genetic diversity and a gene pool that can be better explored using molecular marker techniques. Current trends in the application of DNA marker techniques across diverse areas of insect ecology show that mitochondrial DNA (mtDNA), microsatellites, random amplified polymorphic DNA (RAPD), expressed sequence tags (EST) and amplified fragment length polymorphism (AFLP) markers have contributed significantly to understanding the genetic basis of insect diversity and to mapping medically and agriculturally important genes and quantitative trait loci in insect pests. Apart from these popular marker systems, other novel approaches, including transposon display, sequence-specific amplification polymorphism (S-SAP) and repeat-associated polymerase chain reaction (PCR) markers, have been identified as alternative marker systems in insect studies. In addition, whole-genome microarray and single nucleotide polymorphism (SNP) assays are becoming more popular for screening genome-wide polymorphisms in a fast and cost-effective manner. However, such methodologies have not gained widespread popularity in entomological studies. The current study highlights recent trends in the application of molecular markers in insect studies and explores technological advances in molecular marker tools and modern high-throughput genotyping methods that may be applied in entomological research to better understand insect ecology at the molecular level.

Journal Article
Jianping Xu
TL;DR: In the last 20 years, the application of genomics tools has revolutionized microbial ecological studies and drastically expanded our view of the previously underappreciated microbial world.
Abstract: Microbial ecology examines the diversity and activity of micro-organisms in Earth’s biosphere. In the last 20 years, the application of genomics tools has revolutionized microbial ecological studies and drastically expanded our view of the previously underappreciated microbial world. This review first introduces the basic concepts in microbial ecology and the main genomics methods that have been used to examine natural microbial populations and communities. In the ensuing three specific sections, the applications of genomics in microbial ecological research are highlighted. The first describes the widespread application of multilocus sequence typing and representational difference analysis in studying genetic variation within microbial species. Such investigations have shown that migration, horizontal gene transfer and recombination are common in natural microbial populations and that microbial strains can be highly variable in genome size and gene content. The second section highlights and summarizes the use of four specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA–DNA reassociation kinetics, metagenomics, and micro-arrays) in analysing the diversity and potential activity of microbial populations and communities from a variety of terrestrial and aquatic environments. Such analyses have identified many unexpected phylogenetic lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of environmental DNA have also revealed highly prevalent, but previously unknown, metabolic processes in natural microbial communities. In the third section, the ecological implications of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic genomic sequences suggest the importance of ecology in determining microbial genome size and gene content. The significant variability in genome size and gene content among strains and species of prokaryotes indicates the highly fluid nature of prokaryotic genomes, a result consistent with those from multilocus sequence typing and representational difference analyses. The integration of various levels of ecological analyses, coupled with the application and further development of high-throughput technologies, is accelerating the pace of discovery in microbial ecology.
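The review cites multilocus sequence typing as a tool for studying variation within a species. This method reduces each isolate to a profile of allele numbers at a fixed set of loci. The following toy Python sketch shows that bookkeeping, assuming exact-sequence matching and allele numbers assigned in order of first appearance. The `MlstTyper` class and the locus names are hypothetical. The sketch does not model real MLST schemes, which typically use several housekeeping genes and curated public allele databases.

```python
from typing import Dict, List, Tuple


class MlstTyper:
    """Toy MLST registry, for illustration only.

    Each distinct sequence at a locus gets an allele number; each distinct
    combination of allele numbers (the allelic profile) gets a sequence type (ST).
    """

    def __init__(self, loci: List[str]) -> None:
        self.loci = list(loci)
        # locus -> {normalized sequence -> allele number}
        self.alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in self.loci}
        # allelic profile -> sequence type number
        self.sequence_types: Dict[Tuple[int, ...], int] = {}

    def allele_number(self, locus: str, sequence: str) -> int:
        table = self.alleles[locus]
        seq = sequence.strip().upper()
        if seq not in table:
            table[seq] = len(table) + 1  # new alleles numbered in order of first appearance
        return table[seq]

    def type_isolate(self, locus_sequences: Dict[str, str]) -> Tuple[Tuple[int, ...], int]:
        """Return (allelic profile, sequence type) for one isolate."""
        profile = tuple(self.allele_number(locus, locus_sequences[locus]) for locus in self.loci)
        if profile not in self.sequence_types:
            self.sequence_types[profile] = len(self.sequence_types) + 1
        return profile, self.sequence_types[profile]


if __name__ == "__main__":
    typer = MlstTyper(["locA", "locB"])  # hypothetical locus names
    print(typer.type_isolate({"locA": "ACGTTG", "locB": "GGCCTA"}))  # ((1, 1), 1)
    print(typer.type_isolate({"locA": "ACGTTG", "locB": "GGCCTT"}))  # ((1, 2), 2)
    print(typer.type_isolate({"locA": "acgttg", "locB": "GGCCTA"}))  # ((1, 1), 1)
```

Collapsing sequences to allele numbers means that any sequence difference at a locus, whether one substitution or many, counts as a single allelic change. This design is part of what makes MLST profiles easy to compare across laboratories.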

Journal Article
TL;DR: The AgBase database is the first database dedicated to functional genomics and systems biology analysis for agriculturally important species and their pathogens, and it uses experimental data to improve structural annotation of genomes and to functionally characterize gene products.
Abstract: Background: Many agricultural species and their pathogens have sequenced genomes and more are in progress. Agricultural species provide food, fiber, xenotransplant tissues, biopharmaceuticals and biomedical models. Moreover, many agricultural microorganisms are human zoonoses. However, systems biology from functional genomics data is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation and agricultural research communities are smaller with limited funding compared to many model organism communities. Description: To facilitate systems biology in these traditionally agricultural species we have established "AgBase", a curated, web-accessible, public resource http://www.agbase.msstate.edu for structural and functional annotation of agricultural genomes. The AgBase database includes a suite of computational tools to use GO annotations. We use standardized nomenclature following the Human Genome Organization Gene Nomenclature guidelines and are currently functionally annotating chicken, cow and sheep gene products using the Gene Ontology (GO). The computational tools we have developed accept and batch process data derived from different public databases (with different accession codes), return all existing GO annotations, provide a list of products without GO annotation, identify potential orthologs, model functional genomics data using GO and assist proteomics analysis of ESTs and EST assemblies. Our journal database helps prevent redundant manual GO curation. We encourage and publicly acknowledge GO annotations from researchers and provide a service for researchers interested in GO and analysis of functional genomics data. Conclusion: The AgBase database is the first database dedicated to functional genomics and systems biology analysis for agriculturally important species and their pathogens. We use experimental data to improve structural annotation of genomes and to functionally characterize gene products. AgBase is also directly relevant for researchers in fields as diverse as agricultural production, cancer biology, biopharmaceuticals, human health and evolutionary biology. Moreover, the experimental methods and bioinformatics tools we provide are widely applicable to many other species including model organisms.
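The abstract describes tools that accept accessions from different public databases, return existing GO annotations and list products that lack them. The following Python sketch illustrates that batch step, assuming a precomputed cross-reference table and an in-memory annotation table. The function `batch_go_lookup`, the accession strings and the table layout are hypothetical and do not represent AgBase's actual interface. The GO term IDs are real terms, used only as example values.

```python
from typing import Dict, Iterable, List, Set, Tuple


def batch_go_lookup(
    accessions: Iterable[str],
    id_map: Dict[str, str],
    go_annotations: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Batch-resolve mixed-source accessions to GO annotations.

    id_map: hypothetical cross-reference table, source accession -> canonical product ID.
    go_annotations: canonical product ID -> set of GO term IDs.
    Returns (annotated, unannotated): GO terms per input accession, and the
    accessions that could not be mapped to any annotated product.
    """
    annotated: Dict[str, Set[str]] = {}
    unannotated: List[str] = []
    for raw in accessions:
        acc = raw.strip()
        canonical = id_map.get(acc, acc)  # fall back to the accession itself
        terms = go_annotations.get(canonical)
        if terms:
            annotated[acc] = set(terms)
        else:
            unannotated.append(acc)
    return annotated, unannotated


if __name__ == "__main__":
    # Hypothetical accessions; two source IDs map to the same product.
    id_map = {"est_001": "prodA", "mrna_001": "prodA", "est_002": "prodB"}
    go = {"prodA": {"GO:0005515", "GO:0008150"}}
    annotated, missing = batch_go_lookup(["est_001", "mrna_001", "est_002", "unknown_9"], id_map, go)
    print(sorted(annotated))  # ['est_001', 'mrna_001']
    print(missing)            # ['est_002', 'unknown_9']
```

Returning the unannotated list alongside the annotations mirrors the abstract's point that researchers need to know which products still lack GO annotation. The sketch does not model ortholog identification or GO-based modeling of functional genomics data.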