Showing papers in "Genome Biology in 2009"


Journal ArticleDOI
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches; multiple processor cores can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

20,335 citations
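
As a rough illustration of the Burrows-Wheeler indexing mentioned in the abstract above, the following minimal sketch builds a toy index and runs exact-match backward search. It is not Bowtie's implementation: it omits the quality-aware backtracking that permits mismatches, it stores the full suffix array and occurrence table without compression, and the function names `build_bwt_index` and `find_exact` are hypothetical.

```python
def build_bwt_index(text):
    """Build a toy Burrows-Wheeler index (suffix array, C table, occurrence table)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    # Naive suffix array construction; fine for short illustrative strings only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] is the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c]: number of characters in text that are strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, first, occ


def find_exact(pattern, index):
    """Return sorted start positions of exact matches of pattern in the indexed text."""
    sa, first, occ = index
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_bwt_index("GATTACATTAC")
print(find_exact("TTAC", index))
```

Each pattern character narrows the range of matching suffix array rows, so lookup time depends on the read length rather than on the genome length, which is the property that makes this family of indexes attractive for short-read alignment.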


Journal ArticleDOI
TL;DR: BioGPS http://biogps.gnf.org is introduced, a centralized gene portal for aggregating distributed gene annotation resources, and embraces the principle of community intelligence, enabling any user to easily and directly contribute to the BioGPS platform.
Abstract: Online gene annotation resources are indispensable for analysis of genomics data. However, the landscape of these online resources is highly fragmented, and scientists often visit dozens of these sites for each gene in a candidate gene list. Here, we introduce BioGPS http://biogps.gnf.org, a centralized gene portal for aggregating distributed gene annotation resources. Moreover, BioGPS embraces the principle of community intelligence, enabling any user to easily and directly contribute to the BioGPS platform.

1,364 citations


Journal ArticleDOI
TL;DR: By using independent mapping data and conserved synteny between the cow and human genomes, this work was able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes.
Abstract: Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion base pairs that has multiple improvements over previous assemblies: it is more complete, covering more of the genome; thousands of gaps have been closed; many erroneous inversions, deletions, and translocations have been corrected; and thousands of single-nucleotide errors have been corrected. Our evaluation using independent metrics demonstrates that the resulting assembly is substantially more accurate and complete than alternative versions. Conclusions: By using independent mapping data and conserved synteny between the cow and human genomes, we were able to construct an assembly with excellent large-scale contiguity in which a large majority (approximately 91%) of the genome has been placed onto the 30 B. taurus chromosomes. We constructed a new cow-human synteny map that expands upon previous maps. We also identified for the first time a portion of the B. taurus Y chromosome.

1,097 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the mean expression value outperforms the current normalization strategy in terms of better reduction of technical variation and more accurate appreciation of biological changes.
Abstract: Gene expression analysis of microRNA molecules is becoming increasingly important. In this study we assess the use of the mean expression value of all expressed microRNAs in a given sample as a normalization factor for microRNA real-time quantitative PCR data and compare its performance to the currently adopted approach. We demonstrate that the mean expression value outperforms the current normalization strategy in terms of better reduction of technical variation and more accurate appreciation of biological changes.

952 citations
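
The normalization idea in the abstract above can be sketched as follows, assuming quantification cycle (Cq) values from one sample. The detection cutoff of 35 cycles used to decide which microRNAs count as "expressed" is an assumption made for illustration, not a value given in the abstract, and the function name `mean_center_cq` is hypothetical.

```python
def mean_center_cq(cq_by_mirna, detection_cutoff=35.0):
    """Subtract the mean Cq of all expressed microRNAs in one sample from each Cq.

    cq_by_mirna maps a microRNA name to its Cq value (None if undetected).
    detection_cutoff is an assumed threshold: a Cq at or above it is treated
    as not expressed and is left out of both the mean and the output.
    """
    expressed = {
        name: cq
        for name, cq in cq_by_mirna.items()
        if cq is not None and cq < detection_cutoff
    }
    if not expressed:
        raise ValueError("no expressed microRNAs in this sample")
    mean_cq = sum(expressed.values()) / len(expressed)
    # Normalized value: difference from the sample-wide mean of expressed microRNAs.
    return {name: cq - mean_cq for name, cq in expressed.items()}


sample = {"miR-a": 20.0, "miR-b": 24.0, "miR-c": 28.0, "miR-d": 38.0}
print(mean_center_cq(sample))
```

Because the normalization factor is computed from the sample itself, no separate reference gene is needed, at the cost of requiring enough expressed microRNAs per sample for the mean to be stable.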


Journal ArticleDOI
TL;DR: This study analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals to provide important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies.
Abstract: Background: Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

673 citations


Journal ArticleDOI
TL;DR: It is found that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities and genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.
Abstract: Background: Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome evolution and potential application in classification of metagenomic sequence fragments. However, little is known regarding the distribution of genome signatures in natural microbial communities or the extent to which environmental factors shape them. Results: We analyzed metagenomic sequence data from two acidophilic biofilm communities, including composite genomes reconstructed for nine archaea, three bacteria, and numerous associated viruses, as well as thousands of unassigned fragments from strain variants and low-abundance organisms. Genome signatures, in the form of tetranucleotide frequencies analyzed by emergent self-organizing maps, segregated sequences from all known populations sharing < 50 to 60% average amino acid identity and revealed previously unknown genomic clusters corresponding to low-abundance organisms and a putative plasmid. Signatures were pervasive genome-wide. Clusters were resolved because intra-genome differences resulting from translational selection or protein adaptation to the intracellular (pH ~5) versus extracellular (pH ~1) environment were small relative to inter-genome differences. We found that these genome signatures stem from multiple influences but are primarily manifested through codon composition, which we propose is the result of genome-specific mutational biases. Conclusions: An important conclusion is that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities. Thus, genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.

535 citations
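
The tetranucleotide frequencies used as genome signatures in the abstract above can be computed along these lines. This is a minimal sketch under stated assumptions: it counts overlapping 4-mers on one strand only, skips windows containing ambiguous bases, and does not attempt the emergent self-organizing map step. The function name `tetranucleotide_frequencies` is hypothetical, and the authors' actual preprocessing may differ.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def tetranucleotide_frequencies(sequence):
    """Return relative frequencies of overlapping 4-mers in a DNA sequence."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 3):
        kmer = sequence[i:i + 4]
        # Skip windows with ambiguous bases such as N.
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


print(tetranucleotide_frequencies("ACGTACGT"))
```

Each fragment then becomes a vector of up to 256 frequencies, and fragments from the same genome are expected to lie close together in that space, which is what allows a clustering method to group them.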


Journal ArticleDOI
TL;DR: Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp; it analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85.
Abstract: As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from http://bowtie-bio.sourceforge.net/crossbow/.

514 citations


Journal ArticleDOI
TL;DR: Although many aspects of the mechanisms and functions of H3K4me appear to be conserved among all three kingdoms, significant differences are observed in the relationship between H3K4me and transcription or other epigenetic pathways in plants and mammals.
Abstract: Background: Post-translational modifications of histones play important roles in maintaining normal transcription patterns by directly or indirectly affecting the structural properties of the chromatin. In plants, methylation of histone H3 lysine 4 (H3K4me) is associated with genes and required for normal plant development. Results: We have characterized the genome-wide distribution patterns of mono-, di- and trimethylation of H3K4 (H3K4me1, H3K4me2 and H3K4me3, respectively) in Arabidopsis thaliana seedlings using chromatin immunoprecipitation and high-resolution whole-genome tiling microarrays (ChIP-chip). All three types of H3K4me are found to be almost exclusively genic, and two-thirds of Arabidopsis genes contain at least one type of H3K4me. H3K4me2 and H3K4me3 accumulate predominantly in promoters and 5' genic regions, whereas H3K4me1 is distributed within transcribed regions. In addition, H3K4me3-containing genes are highly expressed with low levels of tissue specificity, but H3K4me1 or H3K4me2 may not be directly involved in transcriptional activation. Furthermore, the preferential co-localization of H3K4me3 and H3K27me3 found in mammals does not appear to occur in plants at a genome-wide level, but H3K4me2 and H3K27me3 co-localize at a higher-than-expected frequency. Finally, we found that H3K4me2/3 and DNA methylation appear to be mutually exclusive, but surprisingly, H3K4me1 is highly correlated with CG DNA methylation in the transcribed regions of genes. Conclusions: H3K4me plays widespread roles in regulating gene expression in plants. Although many aspects of the mechanisms and functions of H3K4me appear to be conserved among all three kingdoms, we observed significant differences in the relationship between H3K4me and transcription or other epigenetic pathways in plants and mammals.

514 citations


Journal ArticleDOI
TL;DR: This study provides genetic markers for the identification of 027 strains and offers a unique opportunity to explain the recent emergence of a hypervirulent bacterium.
Abstract: Background: The continued rise of Clostridium difficile infections worldwide has been accompanied by the rapid emergence of a highly virulent clone designated PCR-ribotype 027. To understand more about the evolution of this virulent clone, we made a three-way genomic and phenotypic comparison of an 'historic' non-epidemic 027 C. difficile (CD196), a recent epidemic and hypervirulent 027 (R20291) and a previously sequenced PCR-ribotype 012 strain (630).

466 citations


Journal ArticleDOI
TL;DR: A 1001 Genomes project for Arabidopsis thaliana, the workhorse of plant genetics, will provide an enormous boost for plant research with a modest financial investment.
Abstract: We advocate here a 1001 Genomes project for Arabidopsis thaliana, the workhorse of plant genetics, which will provide an enormous boost for plant research with a modest financial investment.

463 citations


Journal ArticleDOI
TL;DR: A large group of PIN proteins, including the most ancient members known from mosses, localize to the endoplasmic reticulum and they regulate the subcellular compartmentalization of auxin and thus auxin metabolism.
Abstract: The PIN-FORMED (PIN) proteins are secondary transporters acting in the efflux of the plant signal molecule auxin from cells. They are asymmetrically localized within cells and their polarity determines the directionality of intercellular auxin flow. PIN genes are found exclusively in the genomes of multicellular plants and play an important role in regulating asymmetric auxin distribution in multiple developmental processes, including embryogenesis, organogenesis, tissue differentiation and tropic responses. All PIN proteins have a similar structure with amino- and carboxy-terminal hydrophobic, membrane-spanning domains separated by a central hydrophilic domain. The structure of the hydrophobic domains is well conserved. The hydrophilic domain is more divergent and it determines eight groups within the protein family. The activity of PIN proteins is regulated at multiple levels, including transcription, protein stability, subcellular localization and transport activity. Different endogenous and environmental signals can modulate PIN activity and thus modulate auxin-distribution-dependent development. A large group of PIN proteins, including the most ancient members known from mosses, localize to the endoplasmic reticulum and they regulate the subcellular compartmentalization of auxin and thus auxin metabolism. Further work is needed to establish the physiological importance of this unexpected mode of auxin homeostasis regulation. Furthermore, the evolution of PIN-based transport, PIN protein structure and more detailed biochemical characterization of the transport function are important topics for further studies.

Journal ArticleDOI
TL;DR: A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species.
Abstract: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species. Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome. P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.

Journal ArticleDOI
TL;DR: Members of the family of serine/arginine (SR)-rich proteins are critical components of the machineries carrying out these essential processing events, highlighting their importance in maintaining efficient gene expression.
Abstract: Summary: The processing of pre-mRNAs is a fundamental step required for the expression of most metazoan genes. Members of the family of serine/arginine (SR)-rich proteins are critical components of the machineries carrying out these essential processing events, highlighting their importance in maintaining efficient gene expression. SR proteins are characterized by their ability to interact simultaneously with RNA and other protein components via an RNA recognition motif (RRM) and through a domain rich in arginine and serine residues, the RS domain. Their functional roles in gene expression are surprisingly diverse, ranging from their classical involvement in constitutive and alternative pre-mRNA splicing to various post-splicing activities, including mRNA nuclear export, nonsense-mediated decay, and mRNA translation. These activities point up the importance of SR proteins during the regulation of mRNA metabolism.

Journal ArticleDOI
TL;DR: It is suggested that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.
Abstract: We previously established that six sequence-specific transcription factors that initiate anterior/posterior patterning in Drosophila bind to overlapping sets of thousands of genomic regions in blastoderm embryos. While regions bound at high levels include known and probable functional targets, more poorly bound regions are preferentially associated with housekeeping genes and/or genes not transcribed in the blastoderm, and are frequently found in protein coding sequences or in less conserved non-coding DNA, suggesting that many are likely non-functional. Here we show that an additional 15 transcription factors that regulate other aspects of embryo patterning show a similar quantitative continuum of function and binding to thousands of genomic regions in vivo. Collectively, the 21 regulators show a surprisingly high overlap in the regions they bind given that they belong to 11 DNA binding domain families, specify distinct developmental fates, and can act via different cis-regulatory modules. We demonstrate, however, that quantitative differences in relative levels of binding to shared targets correlate with the known biological and transcriptional regulatory specificities of these factors. It is likely that the overlap in binding of biochemically and functionally unrelated transcription factors arises from the high concentrations of these proteins in nuclei, which, coupled with their broad DNA binding specificities, directs them to regions of open chromatin. We suggest that most animal transcription factors will be found to show a similar broad overlapping pattern of binding in vivo, with specificity achieved by modulating the amount, rather than the identity, of bound factor.

Journal ArticleDOI
TL;DR: Ibis (Improved base identification system), an accurate, fast and easy-to-use base caller that significantly reduces the error rate and increases the output of usable reads is presented.
Abstract: The Illumina Genome Analyzer generates millions of short sequencing reads. We present Ibis (Improved base identification system), an accurate, fast and easy-to-use base caller that significantly reduces the error rate and increases the output of usable reads. Ibis is faster and more robust with respect to chemistry and technology than other publicly available packages. Ibis is freely available under the GPL from http://bioinf.eva.mpg.de/Ibis/.

Journal ArticleDOI
TL;DR: The analysis of WOX gene expression and function shows that WOX family members fulfill specialized functions in key developmental processes in plants, such as embryonic patterning, stem-cell maintenance and organ formation.
Abstract: The WOX genes form a plant-specific subclade of the eukaryotic homeobox transcription factor superfamily, which is characterized by the presence of a conserved DNA-binding homeodomain. The analysis of WOX gene expression and function shows that WOX family members fulfill specialized functions in key developmental processes in plants, such as embryonic patterning, stem-cell maintenance and organ formation. These functions can be related to either promotion of cell division activity and/or prevention of premature cell differentiation. The phylogenetic tree of the plant WOX proteins can be divided into three clades, termed the WUS, intermediate and ancient clade. WOX proteins of the WUS clade appear to some extent able to functionally complement other members. The specific function of individual WOX-family proteins is most probably determined by their spatiotemporal expression pattern and probably also by their interaction with other proteins, which may repress their transcriptional activity. The prototypic WOX-family member WUS has recently been shown to act as a bifunctional transcription factor, functioning as repressor in stem-cell regulation and as activator in floral patterning. Past research has mainly focused on part of the WOX protein family in some model flowering plants, such as Arabidopsis thaliana (thale cress) or Oryza sativa (rice). Future research, including so-far neglected clades and non-flowering plants, is expected to reveal how these master switches of plant differentiation and embryonic patterning evolved and how they fulfill their function.

Journal ArticleDOI
TL;DR: GenomeMapper supports simultaneous mapping of short reads against multiple genomes by integrating related genomes into a single graph structure and introduces representations for alignments against complex structures.
Abstract: Genome resequencing with short reads generally relies on alignments against a single reference. GenomeMapper supports simultaneous mapping of short reads against multiple genomes by integrating related genomes (e.g., individuals of the same species) into a single graph structure. It constitutes the first approach for handling multiple references and introduces representations for alignments against complex structures. Demonstrated benefits include access to polymorphisms that cannot be identified by alignments against the reference alone. Download GenomeMapper at http://1001genomes.org.

Journal ArticleDOI
TL;DR: Simulations demonstrated high structural variant reconstruction efficiency for the coverage-adjusted multi-cutoff scoring strategy of Paired-End Mapper (PEMer) and showed its relative insensitivity to base-calling errors.
Abstract: Personal-genomics endeavors, such as the 1000 Genomes project, are generating maps of genomic structural variants by analyzing ends of massively sequenced genome fragments. To process these we developed Paired-End Mapper (PEMer; http://sv.gersteinlab.org/pemer). This comprises an analysis pipeline, compatible with several next-generation sequencing platforms; simulation-based error models, yielding confidence-values for each structural variant; and a back-end database. The simulations demonstrated high structural variant reconstruction efficiency for PEMer's coverage-adjusted multi-cutoff scoring-strategy and showed its relative insensitivity to base-calling errors.

Journal ArticleDOI
TL;DR: Significant indications are provided that higher-order complex formation is a general and essential molecular mechanism for plant MADS box protein functioning and attribute a pivotal role to the SEP3 'glue' protein in mediating multimerization.
Abstract: Plant MADS box proteins play important roles in a plethora of developmental processes. In order to regulate specific sets of target genes, MADS box proteins dimerize and are thought to assemble into multimeric complexes. In this study a large-scale yeast three-hybrid screen is utilized to provide insight into the higher-order complex formation capacity of the Arabidopsis MADS box family. SEPALLATA3 (SEP3) has been shown to mediate complex formation and, therefore, special attention is paid to this factor in this study. In total, 106 multimeric complexes were identified; in more than half of these at least one SEP protein was present. Besides the known complexes involved in determining floral organ identity, various complexes consisting of combinations of proteins known to play a role in floral organ identity specification, and flowering time determination were discovered. The capacity to form this latter type of complex suggests that homeotic factors play essential roles in down-regulation of the MADS box genes involved in floral timing in the flower via negative auto-regulatory loops. Furthermore, various novel complexes were identified that may be important for the direct regulation of the floral transition process. A subsequent detailed analysis of the APETALA3, PISTILLATA, and SEP3 proteins in living plant cells suggests the formation of a multimeric complex in vivo. Overall, these results provide strong indications that higher-order complex formation is a general and essential molecular mechanism for plant MADS box protein functioning and attribute a pivotal role to the SEP3 'glue' protein in mediating multimerization.

Journal ArticleDOI
TL;DR: Six high-resolution genome-wide maps of Saccharomyces cerevisiae nucleosome positions from multiple labs and detection platforms are compiled, and new insights are reported.
Abstract: Nucleosomes have position-specific functions in controlling gene expression. A complete systematic genome-wide reference map of absolute and relative nucleosome positions is needed to minimize potential confusion when referring to the function of individual nucleosomes (or nucleosome-free regions) across datasets. We compiled six high-resolution genome-wide maps of Saccharomyces cerevisiae nucleosome positions from multiple labs and detection platforms, and report new insights. Data downloads, reference position assignment software, queries, and a visualization browser are available online http://atlas.bx.psu.edu/.

Journal ArticleDOI
TL;DR: Comparisons between prototherian and therian mammals provide strong support for the host defence hypothesis and show that the platypus has significantly fewer repeats of certain classes in the regions of the genome that have become imprinted in therian mammals.
Abstract: Background: Genomic imprinting is an epigenetic phenomenon that results in monoallelic gene expression. Many hypotheses have been advanced to explain why genomic imprinting evolved in mammals, but few have examined how it arose. The host defence hypothesis suggests that imprinting evolved from existing mechanisms within the cell that act to silence foreign DNA elements that insert into the genome. However, the changes to the mammalian genome that accompanied the evolution of imprinting have been hard to define due to the absence of large scale genomic resources between all extant classes. The recent release of the platypus genome has provided the first opportunity to perform comparisons between prototherian (monotreme; which appear to lack imprinting) and therian (marsupial and eutherian; which have imprinting) mammals. Results: We compared the distribution of repeat elements known to attract epigenetic silencing across the entire genome from monotremes and therian mammals, particularly focusing on the orthologous imprinted regions. There is a significant accumulation of certain repeat elements within imprinted regions of therian mammals compared to the platypus. Conclusions: Our analyses show that the platypus has significantly fewer repeats of certain classes in the regions of the genome that have become imprinted in therian mammals. The accumulation of repeats, especially long terminal repeats and DNA elements, in therian imprinted genes and gene clusters is coincident with, and may have been a potential driving force in, the development of mammalian genomic imprinting. These data provide strong support for the host defence hypothesis.

Journal ArticleDOI
TL;DR: These analyses reveal lincRNA and macroRNA exon sequences to be subject to the same relatively low degree of sequence constraint, and indicate that each of the two ncRNA catalogues unevenly and lightly samples the true, much larger, ncRNA repertoire of the mouse.
Abstract: Background: Despite increasing interest in the noncoding fraction of transcriptomes, the number, species-conservation and functions, if any, of many non-protein-coding transcripts remain to be discovered. Two extensive long intergenic noncoding RNA (ncRNA) transcript catalogues are now available for mouse: over 3,000 macroRNAs identified by cDNA sequencing, and 1,600 long intergenic noncoding RNA (lincRNA) intervals that are predicted from chromatin-state maps. Previously we showed that macroRNAs tend to be more highly conserved than putatively neutral sequence, although only 5% of bases are predicted as constrained. By contrast, over a thousand lincRNAs were reported as being highly conserved. This apparent difference may account for the surprisingly small fraction (11%) of transcripts that are represented in both catalogues. Here we sought to resolve the reported discrepancy between the evolutionary rates for these two sets.

Journal ArticleDOI
Bolan Linghu, Evan S. Snitkin, Zhenjun Hu, Yu Xia, Charles DeLisi
TL;DR: The functional-linkage network is used to prioritize candidate genes for 110 diseases, and to reliably disclose hidden associations between disease pairs having dissimilar phenotypes, such as hypercholesterolemia and Alzheimer's disease.
Abstract: We integrate 16 genomic features to construct an evidence-weighted functional-linkage network comprising 21,657 human genes. The functional-linkage network is used to prioritize candidate genes for 110 diseases, and to reliably disclose hidden associations between disease pairs having dissimilar phenotypes, such as hypercholesterolemia and Alzheimer's disease. Many of these disease-disease associations are supported by epidemiology, but with no previous genetic basis. Such associations can drive novel hypotheses on molecular mechanisms of diseases and therapies.

Journal ArticleDOI
TL;DR: This work describes a method for automatic detection of absolute segmental copy numbers and genotype status in complex cancer genome profiles measured with single-nucleotide polymorphism (SNP) arrays based on pattern recognition of segmented and smoothed copy number and allelic imbalance profiles.
Abstract: We describe a method for automatic detection of absolute segmental copy numbers and genotype status in complex cancer genome profiles measured with single-nucleotide polymorphism (SNP) arrays. The method is based on pattern recognition of segmented and smoothed copy number and allelic imbalance profiles. Assignments were verified by DNA indexes of primary tumors and karyotypes of cell lines. The method performs well even for poor-quality data, low tumor content, and highly rearranged tumor genomes.

Journal ArticleDOI
TL;DR: The 82 base-pair deletion found in the rph-pyrE operon of many endpoints may function to relieve a pyrimidine biosynthesis defect present in MG1655, suggesting flexibility in overcoming regulatory challenges in the adaptation.
Abstract: Background: Short-term laboratory evolution of bacteria followed by genomic sequencing provides insight into the mechanism of adaptive evolution, such as the number of mutations needed for adaptation, genotype-phenotype relationships, and the reproducibility of adaptive outcomes. Results: In the present study, we describe the genome sequencing of 11 endpoints of Escherichia coli that underwent 60-day laboratory adaptive evolution under growth rate selection pressure in lactate minimal media. Two to eight mutations were identified per endpoint. Generally, each endpoint acquired mutations to different genes. The most notable exception was an 82 base-pair deletion in the rph-pyrE operon that appeared in 7 of the 11 adapted strains. This mutation conferred an approximately 15% increase to the growth rate when experimentally introduced to the wild-type background and resulted in an approximately 30% increase to growth rate when introduced to a background already harboring two adaptive mutations. Additionally, most endpoints had a mutation in a regulatory gene (crp or relA, for example) or the RNA polymerase. Conclusions: The 82 base-pair deletion found in the rph-pyrE operon of many endpoints may function to relieve a pyrimidine biosynthesis defect present in MG1655. In contrast, a variety of regulators acquire mutations in the different endpoints, suggesting flexibility in overcoming regulatory challenges in the adaptation.

Journal ArticleDOI
TL;DR: It appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution.
Abstract: Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference to understand polyploidy-related crop genome evolution. The high degree of sequence identity and remarkably conserved genome structure between Arabidopsis and Brassica genomes enables comparative tiling sequencing using Arabidopsis sequences as references to select the counterpart regions in B. rapa, which is a strong challenge of structural and comparative crop genomics. We assembled 65.8 megabase-pairs of non-redundant euchromatic sequence of B. rapa and compared this sequence to the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. The triplicated B. rapa genome contains only approximately twice the number of genes as in Arabidopsis because of genome shrinkage. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of recent whole genome triplication followed by a unique diploidization process. A lack of the most recent whole genome duplication (3R) event in the B. rapa genome, atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago. This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution.

Journal ArticleDOI
TL;DR: Progress in the automated prediction of protein function from protein sequence and structure is reviewed, drawing on tools developed by the BioSapiens Network.
Abstract: With many genomes now sequenced, computational annotation methods to characterize genes and proteins from their sequence are increasingly important. The BioSapiens Network has developed tools to address all stages of this process, and here we review progress in the automated prediction of protein function based on protein sequence and structure.

Journal ArticleDOI
TL;DR: Targeted RNA-Seq produces an enhanced view of the molecular state of a set of "high interest" genes by combining next-generation sequencing with capture of sequences from a relevant subset of a transcriptome.
Abstract: Targeted RNA-Seq combines next-generation sequencing with capture of sequences from a relevant subset of a transcriptome. When tested by capturing sequences from a tumor cDNA library by hybridization to oligonucleotide probes specific for 467 cancer-related genes, this method showed high selectivity, improved mutation detection enabling discovery of novel chimeric transcripts, and provided RNA expression data. Thus, targeted RNA-Seq produces an enhanced view of the molecular state of a set of "high interest" genes.

Journal ArticleDOI
TL;DR: The findings support the essentiality of milk to the survival of mammalian neonates and the establishment of milk secretory mechanisms more than 160 million years ago, and suggest that the diversity of milk protein composition across species is primarily due to mechanisms other than copy number and sequence variation.
Abstract: The newly assembled Bos taurus genome sequence enables the linkage of bovine milk and lactation data with other mammalian genomes. Using publicly available milk proteome data and mammary expressed sequence tags, 197 milk protein genes and over 6,000 mammary genes were identified in the bovine genome. Intersection of these genes with 238 milk production quantitative trait loci curated from the literature decreased the search space for milk trait effectors by more than an order of magnitude. Genome location analysis revealed a tendency for milk protein genes to be clustered with other mammary genes. Using the genomes of a monotreme (platypus), a marsupial (opossum), and five placental mammals (bovine, human, dog, mice, rat), gene loss and duplication, phylogeny, sequence conservation, and evolution were examined. Compared with other genes in the bovine genome, milk and mammary genes are: more likely to be present in all mammals; more likely to be duplicated in therians; more highly conserved across Mammalia; and evolving more slowly along the bovine lineage. The most divergent proteins in milk were associated with nutritional and immunological components of milk, whereas highly conserved proteins were associated with secretory processes. Although both copy number and sequence variation contribute to the diversity of milk protein composition across species, our results suggest that this diversity is primarily due to other mechanisms. Our findings support the essentiality of milk to the survival of mammalian neonates and the establishment of milk secretory mechanisms more than 160 million years ago.
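The intersection of genes with quantitative trait loci described above can be illustrated with a simple interval-overlap check. This is only a minimal sketch and not the authors' pipeline: the `Interval` type, the `genes_in_qtl` function, the half-open coordinate convention, and all gene and QTL names and coordinates are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive (assumed convention)
    name: str


def genes_in_qtl(genes: Iterable[Interval], qtls: Iterable[Interval]) -> List[str]:
    """Return names of genes that overlap at least one QTL on the same chromosome."""
    # Group QTL intervals by chromosome so each gene is only compared
    # against QTL on its own chromosome.
    by_chrom = defaultdict(list)
    for q in qtls:
        by_chrom[q.chrom].append(q)

    hits = []
    for g in genes:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(g.start < q.end and q.start < g.end for q in by_chrom.get(g.chrom, ())):
            hits.append(g.name)
    return hits


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper.
    genes = [
        Interval("chr6", 100, 200, "geneA"),
        Interval("chr6", 500, 600, "geneB"),
        Interval("chr14", 100, 200, "geneC"),
    ]
    qtls = [Interval("chr6", 150, 300, "qtl1")]
    print(genes_in_qtl(genes, qtls))
```

A linear scan per chromosome is sufficient at the scale mentioned in the abstract (thousands of genes, hundreds of QTL); an interval tree would only matter for much larger inputs.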

Journal ArticleDOI
TL;DR: Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds.
Abstract: We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).
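The "fraction of 10-kb stretches that were perfect" metric can be sketched as follows. The abstract does not give the exact procedure, so this is a minimal illustration under stated assumptions: the genome is cut into full, non-overlapping windows, assembly errors are already known as 0-based reference positions, and the function name `perfect_window_fraction` is hypothetical.

```python
from typing import Iterable


def perfect_window_fraction(
    genome_length: int,
    error_positions: Iterable[int],
    window: int = 10_000,
) -> float:
    """Fraction of full, non-overlapping windows that contain no error position."""
    n_windows = genome_length // window
    if n_windows == 0:
        raise ValueError("genome is shorter than one window")

    # Only positions inside a full window count; a trailing partial window is ignored.
    covered = n_windows * window
    bad_windows = {pos // window for pos in error_positions if 0 <= pos < covered}
    return (n_windows - len(bad_windows)) / n_windows


if __name__ == "__main__":
    # Toy example: a 100-kb genome with errors falling in two of its ten windows.
    print(perfect_window_fraction(100_000, [5, 120, 35_000]))  # 0.8
```

Multiple errors inside one window count once, which is why the metric rewards assemblers whose errors are rare rather than merely clustered.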