Showing papers in "BMC Genomics in 2008"

Open Access

Journal Article•DOI•

The RAST Server: Rapid Annotations using Subsystems Technology

Ramy K. Aziz¹,², Daniela Bartels³, Aaron A. Best⁴, Matthew DeJongh⁴, Terrence Disz³,⁵, Robert Edwards⁵, Kevin Formsma⁴, Svetlana Gerdes, Elizabeth M. Glass⁵, Michael Kubal³, Folker Meyer³,⁵, Gary J. Olsen⁵,⁶, Robert Olson³,⁵, Andrei L. Osterman⁷, Ross Overbeek, Leslie Klis McNeil⁶, Daniel Paarmann³, Tobias Paczian³, Bruce Parrello, Gordon D. Pusch³, Claudia I. Reich⁶, Rick Stevens³,⁵, Olga Vassieva, Veronika Vonstein, Andreas Wilke³, Olga Zagnitko•Institutions (7)

Cairo University¹, University of Tennessee Health Science Center², University of Chicago³, Hope College⁴, Argonne National Laboratory⁵, University of Illinois at Urbana–Champaign⁶, Sanford-Burnham Institute for Medical Research⁷

08 Feb 2008-BMC Genomics

TL;DR: A fully automated service for annotating bacterial and archaeal genomes that identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user.

Abstract: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

9,397 citations

Journal Article•DOI•

BioVenn : a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams

Tim Hulsen¹, Jacob de Vlieg¹,², Wynand Alkema²•Institutions (2)

Radboud University Nijmegen Medical Centre¹, Schering-Plough²

16 Oct 2008-BMC Genomics

TL;DR: BioVenn is an easy-to-use web application to generate area-proportional Venn diagrams from lists of biological identifiers, which supports a wide range of identifiers from the most used biological databases currently available.

Abstract: In many genomics projects, numerous lists containing biological identifiers are produced. Often it is useful to see the overlap between different lists, enabling researchers to quickly observe similarities and differences between the data sets they are analyzing. One of the most popular methods to visualize the overlap and differences between data sets is the Venn diagram: a diagram consisting of two or more circles in which each circle corresponds to a data set, and the overlap between the circles corresponds to the overlap between the data sets. Venn diagrams are especially useful when they are 'area-proportional' i.e. the sizes of the circles and the overlaps correspond to the sizes of the data sets. Currently there are no programs available that can create area-proportional Venn diagrams connected to a wide range of biological databases. We designed a web application named BioVenn to summarize the overlap between two or three lists of identifiers, using area-proportional Venn diagrams. The user only needs to input these lists of identifiers in the textboxes and push the submit button. Parameters like colors and text size can be adjusted easily through the web interface. The position of the text can be adjusted by 'drag-and-drop' principle. The output Venn diagram can be shown as an SVG or PNG image embedded in the web application, or as a standalone SVG or PNG image. The latter option is useful for batch queries. Besides the Venn diagram, BioVenn outputs lists of identifiers for each of the resulting subsets. If an identifier is recognized as belonging to one of the supported biological databases, the output is linked to that database. Finally, BioVenn can map Affymetrix and EntrezGene identifiers to Ensembl genes. BioVenn is an easy-to-use web application to generate area-proportional Venn diagrams from lists of biological identifiers. It supports a wide range of identifiers from the most used biological databases currently available. 
Its implementation on the World Wide Web makes it available for use on any computer with internet connection, independent of operating system and without the need to install programs locally. BioVenn is freely accessible at http://www.cmbi.ru.nl/cdd/biovenn/ .
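
The overlap computation described in the abstract can be illustrated with plain set operations. This is a minimal sketch, not BioVenn's own code: the function name `venn_subsets`, the region labels, and the example identifiers are all hypothetical, and the drawing of area-proportional circles is not attempted here.

```python
def venn_subsets(x, y, z):
    """Split three identifier lists into the seven regions of a Venn diagram.

    Hypothetical helper for illustration; duplicates within a list are ignored.
    """
    x, y, z = set(x), set(y), set(z)
    return {
        "x_only": x - y - z,
        "y_only": y - x - z,
        "z_only": z - x - y,
        "xy_only": (x & y) - z,
        "xz_only": (x & z) - y,
        "yz_only": (y & z) - x,
        "xyz": x & y & z,
    }


# Made-up identifiers, used only to exercise the function.
list_x = ["GENE1", "GENE2", "GENE3", "GENE4"]
list_y = ["GENE3", "GENE4", "GENE5"]
list_z = ["GENE4", "GENE5", "GENE6"]

subsets = venn_subsets(list_x, list_y, list_z)
sizes = {name: len(ids) for name, ids in subsets.items()}
print(sizes)
```

The region sizes are the numbers an area-proportional diagram would have to represent; the seven regions are disjoint, so their sizes add up to the size of the union of the three lists.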

1,323 citations

Journal Article•DOI•

LEA (Late Embryogenesis Abundant) proteins and their encoding genes in Arabidopsis thaliana

Michaela Hundertmark¹, Dirk K. Hincha¹•Institutions (1)

Max Planck Society¹

04 Mar 2008-BMC Genomics

TL;DR: A genome-wide analysis of LEA proteins and their encoding genes in Arabidopsis thaliana indicates a wide range of sequence diversity, intracellular localizations, and expression patterns and indicates that they confer an evolutionary advantage for an organism under varying stressful environmental conditions.

Abstract: LEA (late embryogenesis abundant) proteins were first described about 25 years ago as accumulating late in plant seed development. They were later found in vegetative plant tissues following environmental stress and also in desiccation tolerant bacteria and invertebrates. Although they are widely assumed to play crucial roles in cellular dehydration tolerance, their physiological and biochemical functions are largely unknown. We present a genome-wide analysis of LEA proteins and their encoding genes in Arabidopsis thaliana. We identified 51 LEA protein encoding genes in the Arabidopsis genome that could be classified into nine distinct groups. Expression studies were performed on all genes at different developmental stages, in different plant organs and under different stress and hormone treatments using quantitative RT-PCR. We found evidence of expression for all 51 genes. There was little overlap between genes expressed in vegetative tissues and in seeds, and expression levels were generally higher in seeds. Most genes encoding LEA proteins had abscisic acid response (ABRE) and/or low temperature response (LTRE) elements in their promoters and many genes containing the respective promoter elements were induced by abscisic acid, cold or drought. We also found that 33% of all Arabidopsis LEA protein encoding genes are arranged in tandem repeats and that 43% are part of homeologous pairs. The majority of LEA proteins were predicted to be highly hydrophilic and natively unstructured, but some were predicted to be folded. The analyses indicate a wide range of sequence diversity, intracellular localizations, and expression patterns. The high fraction of retained duplicate genes and the inferred functional diversification indicate that they confer an evolutionary advantage for an organism under varying stressful environmental conditions. This comprehensive analysis will be an important starting point for future efforts to elucidate the functional role of these enigmatic proteins.

838 citations

Journal Article•DOI•

The unfoldomics decade: an update on intrinsically disordered proteins

A. Keith Dunker¹, Christopher J. Oldfield¹, Jingwei Meng¹, Pedro Romero¹, Jack Y. Yang¹, Jessica Walton Chen¹, Vladimir Vacic¹, Zoran Obradovic², Vladimir N. Uversky¹,³•Institutions (3)

Indiana University¹, Temple University², Russian Academy of Sciences³

16 Sep 2008-BMC Genomics

TL;DR: The goal is to review the key discoveries and to weave these discoveries together to support novel approaches for understanding sequence-function relationships.

Abstract: Our first predictor of protein disorder was published just over a decade ago in the Proceedings of the IEEE International Conference on Neural Networks (Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK (1997) Identifying disordered regions in proteins from amino acid sequence. Proceedings of the IEEE International Conference on Neural Networks, 1: 90–95). By now more than twenty other laboratory groups have joined the efforts to improve the prediction of protein disorder. While the various prediction methodologies used for protein intrinsic disorder resemble those methodologies used for secondary structure prediction, the two types of structures are entirely different. For example, the two structural classes have very different dynamic properties, with the irregular secondary structure class being much less mobile than the disorder class. The prediction of secondary structure has been useful. On the other hand, the prediction of intrinsic disorder has been revolutionary, leading to major modifications of the more than 100 year-old views relating protein structure and function. Experimentalists have been providing evidence over many decades that some proteins lack fixed structure or are disordered (or unfolded) under physiological conditions. In addition, experimentalists are also showing that, for many proteins, their functions depend on the unstructured rather than structured state; such results are in marked contrast to the greater than hundred year old views such as the lock and key hypothesis. Despite extensive data on many important examples, including disease-associated proteins, the importance of disorder for protein function has been largely ignored. Indeed, to our knowledge, current biochemistry books don't present even one acknowledged example of a disorder-dependent function, even though some reports of disorder-dependent functions are more than 50 years old. 
The results from genome-wide predictions of intrinsic disorder and the results from other bioinformatics studies of intrinsic disorder are demanding attention for these proteins. Disorder prediction has been important for showing that the relatively few experimentally characterized examples are members of a very large collection of related disordered proteins that are wide-spread over all three domains of life. Many significant biological functions are now known to depend directly on, or are importantly associated with, the unfolded or partially folded state. Here our goal is to review the key discoveries and to weave these discoveries together to support novel approaches for understanding sequence-function relationships. Intrinsically disordered protein is common across the three domains of life, but especially common among the eukaryotic proteomes. Signaling sequences and sites of posttranslational modifications are frequently, or very likely most often, located within regions of intrinsic disorder. Disorder-to-order transitions are coupled with the adoption of different structures with different partners. Also, the flexibility of intrinsic disorder helps different disordered regions to bind to a common binding site on a common partner. Such capacity for binding diversity plays important roles in both protein-protein interaction networks and likely also in gene regulation networks. Such disorder-based signaling is further modulated in multicellular eukaryotes by alternative splicing, for which such splicing events map to regions of disorder much more often than to regions of structure. Associating alternative splicing with disorder rather than structure alleviates theoretical and experimentally observed problems associated with the folding of different length, isomeric amino acid sequences. 
The combination of disorder and alternative splicing is proposed to provide a mechanism for easily "trying out" different signaling pathways, thereby providing the mechanism for generating signaling diversity and enabling the evolution of cell differentiation and multicellularity. Finally, several recent small molecules of interest as potential drugs have been shown to act by blocking protein-protein interactions based on intrinsic disorder of one of the partners. Study of these examples has led to a new approach for drug discovery, and bioinformatics analysis of the human proteome suggests that various disease-associated proteins are very rich in such disorder-based drug discovery targets.

643 citations

Journal Article•DOI•

Gene networks driving bovine milk fat synthesis during the lactation cycle

Massimo Bionaz¹, Juan J. Loor¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

31 Jul 2008-BMC Genomics

TL;DR: Results challenge the proposal that SREBF1 is central for milk fat synthesis regulation and highlight a pivotal role for a concerted action among PPARG, PPARGC1A, and INSIG1.

Abstract: The molecular events associated with regulation of milk fat synthesis in the bovine mammary gland remain largely unknown. Our objective was to study mammary tissue mRNA expression via quantitative PCR of 45 genes associated with lipid synthesis (triacylglycerol and phospholipids) and secretion from the late pre-partum/non-lactating period through the end of subsequent lactation. mRNA expression was coupled with milk fatty acid (FA) composition and calculated indexes of FA desaturation and de novo synthesis by the mammary gland. Marked up-regulation and/or % relative mRNA abundance during lactation were observed for genes associated with mammary FA uptake from blood (LPL, CD36), intracellular FA trafficking (FABP3), long-chain (ACSL1) and short-chain (ACSS2) intracellular FA activation, de novo FA synthesis (ACACA, FASN), desaturation (SCD, FADS1), triacylglycerol synthesis (AGPAT6, GPAM, LPIN1), lipid droplet formation (BTN1A1, XDH), ketone body utilization (BDH1), and transcription regulation (INSIG1, PPARG, PPARGC1A). Change in SREBF1 mRNA expression during lactation, thought to be central for milk fat synthesis regulation, was ≤2-fold in magnitude, while expression of INSIG1, which negatively regulates SREBP activation, was >12-fold and had a parallel pattern of expression to PPARGC1A. Genes involved in phospholipid synthesis had moderate up-regulation in expression and % relative mRNA abundance. The mRNA abundance and up-regulation in expression of ABCG2 during lactation was markedly high, suggesting a biological role of this gene in milk synthesis/secretion. Weak correlations were observed between both milk FA composition and desaturase indexes (i.e., apparent SCD activity) with mRNA expression pattern of genes measured. A network of genes participates in coordinating milk fat synthesis and secretion. 
Results challenge the proposal that SREBF1 is central for milk fat synthesis regulation and highlight a pivotal role for a concerted action among PPARG, PPARGC1A, and INSIG1. Expression of SCD, the most abundant gene measured, appears to be key during milk fat synthesis. The lack of correlation between gene expression and calculated desaturase indexes does not support their use to infer mRNA expression or enzyme activity (e.g., SCD). Longitudinal mRNA expression allowed development of transcriptional regulation networks and an updated model of milk fat synthesis regulation.

634 citations

Journal Article•DOI•

A universal DNA mini-barcode for biodiversity analysis

Isabelle Meusnier¹, Gregory A. C. Singer², Jean-François Landry³, Donal A. Hickey⁴, Paul D. N. Hebert¹, Mehrdad Hajibabaei¹•Institutions (4)

University of Guelph¹, Ohio State University², Agriculture and Agri-Food Canada³, Concordia University⁴

12 May 2008-BMC Genomics

TL;DR: A novel approach based on a much shorter barcode sequence is established and its effectiveness is demonstrated in archival specimens; the approach is expected to significantly broaden the application of DNA barcoding in biodiversity studies.

Abstract: The goal of DNA barcoding is to develop a species-specific sequence library for all eukaryotes. A 650 bp fragment of the cytochrome c oxidase 1 (CO1) gene has been used successfully for species-level identification in several animal groups. It may be difficult in practice, however, to retrieve a 650 bp fragment from archival specimens (because of DNA degradation) or from environmental samples (where universal primers are needed). We performed a bioinformatics analysis of all CO1 barcode sequences from GenBank and calculated the probability of having species-specific barcodes for fragments of varied size. This analysis established the potential of much smaller fragments, mini-barcodes, for identifying unknown specimens. We then developed a universal primer set for the amplification of mini-barcodes. We further successfully tested the utility of this primer set on a comprehensive set of taxa from all major eukaryotic groups as well as archival specimens. In this study we address the important issue of minimum amount of sequence information required for identifying species in DNA barcoding. We establish a novel approach based on a much shorter barcode sequence and demonstrate its effectiveness in archival specimens. This approach will significantly broaden the application of DNA barcoding in biodiversity studies.
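
The idea of asking how often a fragment of a given length is still species-specific can be sketched as below. This is an illustrative simplification, not the authors' analysis: the function name, the fixed `start` offset, the exact-match criterion, and the toy sequences are all assumptions made for the example.

```python
from collections import defaultdict


def fraction_species_specific(barcodes, length, start=0):
    """Fraction of species whose fragment [start, start+length) is shared with no other species.

    `barcodes` is a list of (species, sequence) pairs; hypothetical helper
    that uses exact string matching rather than a distance threshold.
    """
    species_by_fragment = defaultdict(set)
    for species, seq in barcodes:
        species_by_fragment[seq[start:start + length]].add(species)

    all_species = {species for species, _ in barcodes}
    ambiguous = set()
    for owners in species_by_fragment.values():
        if len(owners) > 1:  # the same fragment occurs in several species
            ambiguous |= owners
    return (len(all_species) - len(ambiguous)) / len(all_species)


# Toy sequences, invented for the example (real barcodes are ~650 bp).
toy = [
    ("sp_a", "ACGTACGTAC"),
    ("sp_b", "ACGTACCTAC"),
    ("sp_c", "ACGAACGTAC"),
    ("sp_a", "ACGTACGTAC"),
]

for n in (3, 4, 8):
    print(n, fraction_species_specific(toy, n))
```

In the toy data the fraction of uniquely identified species rises as the fragment gets longer, which is the kind of trade-off between fragment size and resolution that the abstract describes.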

586 citations

Journal Article•DOI•

Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners.

Christopher J. Oldfield¹, Jingwei Meng¹, Jack Y. Yang¹, Mary Qu Yang¹, Vladimir N. Uversky¹,², A. Keith Dunker¹•Institutions (2)

Indiana University¹, Russian Academy of Sciences²

20 Mar 2008-BMC Genomics

TL;DR: Detailed examination of two divergent examples of hub proteins supports the conjecture that hub proteins often utilize intrinsic disorder to bind to multiple partners and provides detailed information about induced fit in structured regions.

Abstract: Proteins are involved in many interactions with other proteins leading to networks that regulate and control a wide variety of physiological processes. Some of these proteins, called hub proteins or hubs, bind to many different protein partners. Protein intrinsic disorder, via diversity arising from structural plasticity or flexibility, provides a means for hubs to associate with many partners (Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN: Flexible Nets: The roles of intrinsic disorder in protein interaction networks. FEBS J 2005, 272:5129-5148). Here we present a detailed examination of two divergent examples: 1) p53, which uses different disordered regions to bind to different partners and which also has several individual disordered regions that each bind to multiple partners, and 2) 14-3-3, which is a structured protein that associates with many different intrinsically disordered partners. For both examples, three-dimensional structures of multiple complexes reveal that the flexibility and plasticity of intrinsically disordered protein regions as well as induced-fit changes in the structured regions are both important for binding diversity. These data support the conjecture that hub proteins often utilize intrinsic disorder to bind to multiple partners and provide detailed information about induced fit in structured regions.

582 citations

Journal Article•DOI•

Acidithiobacillus ferrooxidans metabolism: from genome sequence to industrial applications

Jorge Valdés¹, Inti Pedroso¹, Raquel Quatrini¹, Robert J. Dodson², Hervé Tettelin²,³, Robert C. Blake⁴, Jonathan A. Eisen²,⁵, David S. Holmes¹•Institutions (5)

Andrés Bello National University¹, J. Craig Venter Institute², University of Maryland, Baltimore³, Xavier University⁴, University of California, Davis⁵

11 Dec 2008-BMC Genomics

TL;DR: Bioinformatics analysis provides a valuable platform for gene discovery and functional prediction that helps explain the activity of A. ferrooxidans in industrial bioleaching and its role as a primary producer in acidic environments.

Abstract: Acidithiobacillus ferrooxidans is a major participant in consortia of microorganisms used for the industrial recovery of copper (bioleaching or biomining). It is a chemolithoautotrophic γ-proteobacterium using energy from the oxidation of iron- and sulfur-containing minerals for growth. It thrives at extremely low pH (pH 1–2) and fixes both carbon and nitrogen from the atmosphere. It solubilizes copper and other metals from rocks and plays an important role in nutrient and metal biogeochemical cycling in acid environments. The lack of a well-developed system for genetic manipulation has prevented thorough exploration of its physiology. Also, confusion has been caused by prior metabolic models constructed based upon the examination of multiple, and sometimes distantly related, strains of the microorganism. The genome of the type strain A. ferrooxidans ATCC 23270 was sequenced and annotated to identify general features and provide a framework for in silico metabolic reconstruction. Earlier models of iron and sulfur oxidation, biofilm formation, quorum sensing, inorganic ion uptake, and amino acid metabolism are confirmed and extended. Initial models are presented for central carbon metabolism, anaerobic metabolism (including sulfur reduction, hydrogen metabolism and nitrogen fixation), stress responses, DNA repair, and metal and toxic compound fluxes. Bioinformatics analysis provides a valuable platform for gene discovery and functional prediction that helps explain the activity of A. ferrooxidans in industrial bioleaching and its role as a primary producer in acidic environments. An analysis of the genome of the type strain provides a coherent view of its gene content and metabolic potential.

489 citations

Journal Article•DOI•

Protein abundance profiling of the Escherichia coli cytosol

Yasushi Ishihama¹,², Thorsten Schmidt³, Juri Rappsilber¹,⁴, Matthias Mann¹,⁵, F. Ulrich Hartl⁵, Michael J. Kerner⁶, Dmitrij Frishman³•Institutions (6)

University of Southern Denmark¹, Keio University², Technische Universität München³, University of Edinburgh⁴, Max Planck Society⁵, Technical University of Denmark⁶

27 Feb 2008-BMC Genomics

TL;DR: Abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell so far and show significant associations between the abundance of a protein and its properties and functions in the cell.

Abstract: Knowledge about the abundance of molecular components is an important prerequisite for building quantitative predictive models of cellular behavior. Proteins are central components of these models, since they carry out most of the fundamental processes in the cell. Thus far, protein concentrations have been difficult to measure on a large scale, but proteomic technologies have now advanced to a stage where this information becomes readily accessible. Here, we describe an experimental scheme to maximize the coverage of proteins identified by mass spectrometry of a complex biological sample. Using a combination of LC-MS/MS approaches with protein and peptide fractionation steps we identified 1103 proteins from the cytosolic fraction of the Escherichia coli strain MC4100. A measure of abundance is presented for each of the identified proteins, based on the recently developed emPAI approach which takes into account the number of sequenced peptides per protein. The values of abundance are within a broad range and accurately reflect independently measured copy numbers per cell. As expected, the most abundant proteins were those involved in protein synthesis, most notably ribosomal proteins. Proteins involved in energy metabolism as well as those with binding function were also found in high copy number while proteins annotated with the terms metabolism, transcription, transport, and cellular organization were rare. The barrel-sandwich fold was found to be the structural fold with the highest abundance. Highly abundant proteins are predicted to be less prone to aggregation based on their length, pI values, and occurrence patterns of hydrophobic stretches. We also find that abundant proteins tend to be predominantly essential. Additionally we observe a significant correlation between protein and mRNA abundance in E. coli cells. Abundance measurements for more than 1000 E. coli proteins presented in this work represent the most complete study of protein abundance in a bacterial cell so far. We show significant associations between the abundance of a protein and its properties and functions in the cell. In this way, we provide both data and novel insights into the role of protein concentration in this model organism.
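
The emPAI index mentioned in the abstract is commonly defined as 10^(N_observed / N_observable) − 1, where N_observed is the number of peptides identified for a protein and N_observable is the number of peptides that could in principle be detected for it. A minimal sketch under that definition follows; the protein names and peptide counts are invented, and the helper names are hypothetical rather than taken from the paper.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**(observed/observable) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def molar_percent(empai_values):
    """Convert emPAI values to each protein's share of the total, in mol %."""
    total = sum(empai_values.values())
    return {name: 100 * value / total for name, value in empai_values.items()}


# Invented (observed, observable) peptide counts for three hypothetical proteins.
peptide_counts = {"protA": (10, 10), "protB": (5, 10), "protC": (0, 8)}

empai_values = {name: empai(obs, possible)
                for name, (obs, possible) in peptide_counts.items()}
shares = molar_percent(empai_values)
print(empai_values)
print(shares)
```

A protein with no observed peptides gets an emPAI of 0, and the exponential form means that abundance grows much faster than the raw peptide fraction.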

484 citations

Journal Article•DOI•

High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

Evandro Novaes¹, Derek R. Drost¹, William G. Farmerie¹, Georgios J. Pappas²,³, Dario Grattapaglia²,³, Ronald R. Sederoff⁴, Matias Kirst¹•Institutions (4)

University of Florida¹, Empresa Brasileira de Pesquisa Agropecuária², Universidade Católica de Brasília³, North Carolina State University⁴

30 Jun 2008-BMC Genomics

TL;DR: It is demonstrated that SNPs sampled in large-scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.

Abstract: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effectively the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation. With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. Non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous research.
In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition we demonstrated that SNPs sampled in large-scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.
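
For orientation, the standard (unmodified) Watterson estimator of θ per site can be computed as below. This is only the textbook baseline: the abstract states that the authors used a modified θ adapted to pooled cDNA sequencing, and that modification is not reproduced here. The function name and the example numbers are assumptions.

```python
def watterson_theta(segregating_sites, n_sequences, n_sites):
    """Standard Watterson estimator per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * n_sites)


# Invented example: 11 SNPs in a 600 bp contig sampled from 4 sequences.
theta = watterson_theta(segregating_sites=11, n_sequences=4, n_sites=600)
print(round(theta, 4))
```

The correction factor a_n grows with sample depth, which is why the number of sampled sequences has to be known or estimated before SNP counts can be compared across contigs.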

476 citations

Journal Article•DOI•

Genome-wide analysis of the interaction between the endosymbiotic bacterium Wolbachia and its Drosophila host

Zhiyong Xi¹,², Laurent Gavotte¹,³, Yan Xie¹, Stephen L. Dobson¹•Institutions (3)

University of Kentucky¹, Michigan State University², University of Montpellier³

02 Jan 2008-BMC Genomics

TL;DR: In vivo characterization of differentially-expressed products in gonads demonstrates that Angiotensin Converting Enzyme varies between Wolbachia infected and uninfected flies and that the variation occurs in a sex-specific manner, which supports the use of Wolbachia infected cell cultures as an appropriate model for predicting in vivo host/Wolbachia interactions.

Abstract: Intracellular Wolbachia bacteria are obligate, maternally-inherited, endosymbionts found frequently in insects and other invertebrates. The success of Wolbachia can be attributed in part to an ability to alter host reproduction via mechanisms including cytoplasmic incompatibility (CI), parthenogenesis, feminization and male killing. Despite substantial scientific effort, the molecular mechanisms underlying the Wolbachia/host interaction are unknown. Here, an in vitro Wolbachia infection was generated in the Drosophila S2 cell line, and transcription profiles of infected and uninfected cells were compared by microarray. Differentially-expressed patterns related to reproduction, immune response and heat stress response are observed, including multiple genes that have been previously reported to be involved in the Wolbachia/host interaction. Subsequent in vivo characterization of differentially-expressed products in gonads demonstrates that Angiotensin Converting Enzyme (Ance) varies between Wolbachia infected and uninfected flies and that the variation occurs in a sex-specific manner. Consistent with expectations for the conserved CI mechanism, the observed Ance expression pattern is repeatable in different Drosophila species and with different Wolbachia types. To examine Ance involvement in the CI phenotype, compatible and incompatible crosses of Ance mutant flies were conducted. Significant differences are observed in the egg hatch rate resulting from incompatible crosses, providing support for additional experiments examining for an interaction of Ance with the CI mechanism. Wolbachia infection is shown to affect the expression of multiple host genes, including Ance. Evidence for potential Ance involvement in the CI mechanism is described, including the prior report of Ance in spermatid differentiation, Wolbachia-induced sex-specific effects on Ance expression and an Ance mutation effect on CI levels. 
The results support the use of Wolbachia infected cell cultures as an appropriate model for predicting in vivo host/Wolbachia interactions.

Journal Article•DOI•

Predicting prognosis using molecular profiling in estrogen receptor-positive breast cancer treated with tamoxifen.

Sherene Loi, Benjamin Haibe-Kains¹, Christine Desmedt, Pratyaksha Wirapati², Françoise Lallemand, Andrew Tutt³, Cheryl Gillet³, Paul Ellis³, Kenneth Ryder³, James F. Reid⁴, Maria Grazia Daidone⁴, M. A. Pierotti⁴, Els M.J.J. Berns⁵, Maurice P.H.M. Jansen⁵, John A. Foekens⁵, Mauro Delorenzi², Gianluca Bontempi¹, Martine Piccart, Christos Sotiriou

Université libre de Bruxelles¹, Swiss Institute of Bioinformatics², Guy's Hospital³, University of Milan⁴, Erasmus University Rotterdam⁵

22 May 2008-BMC Genomics

TL;DR: A gene classifier that can predict clinical outcome in tamoxifen-treated ER+ BC patients is developed, and other genes and pathways that may elucidate further mechanisms that influence clinical outcome and prediction of response to tamoxifen are proposed.

Abstract: Estrogen receptor positive (ER+) breast cancers (BC) are heterogeneous with regard to their clinical behavior and response to therapies. The ER is currently the best predictor of response to the anti-estrogen agent tamoxifen, yet up to 30–40% of ER+BC will relapse despite tamoxifen treatment. New prognostic biomarkers and further biological understanding of tamoxifen resistance are required. We used gene expression profiling to develop an outcome-based predictor using a training set of 255 ER+ BC samples from women treated with adjuvant tamoxifen monotherapy. We used clusters of highly correlated genes to develop our predictor to facilitate both signature stability and biological interpretation. Independent validation was performed using 362 tamoxifen-treated ER+ BC samples obtained from multiple institutions and treated with tamoxifen only in the adjuvant and metastatic settings. We developed a gene classifier consisting of 181 genes belonging to 13 biological clusters. In the independent set of adjuvantly-treated samples, it was able to define two distinct prognostic groups (HR 2.01 95%CI: 1.29–3.13; p = 0.002). Six of the 13 gene clusters represented pathways involved in cell cycle and proliferation. In 112 metastatic breast cancer patients treated with tamoxifen, one of the classifier components suggesting a cellular inflammatory mechanism was significantly predictive of response. We have developed a gene classifier that can predict clinical outcome in tamoxifen-treated ER+ BC patients. Whilst our study emphasizes the important role of proliferation genes in prognosis, our approach proposes other genes and pathways that may elucidate further mechanisms that influence clinical outcome and prediction of response to tamoxifen.

Journal Article•DOI•

Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A

Steven L. Salzberg¹, Daniel D. Sommer¹, Michael C. Schatz¹, Adam M. Phillippy¹, Pablo D. Rabinowicz², Seiji Tsuge³, Ayako Furutani³, Hirokazu Ochiai, Arthur L. Delcher¹, David R. Kelley¹, Ramana Madupu, Daniela Puiu¹, Diana Radune, Martin Shumway, Cole Trapnell¹, Gudlur Aparna⁴, Gopaljee Jha⁴, Alok Pandey⁴, Prabhu B. Patil⁴, Hiromichi Ishihara⁵, Damien F. Meyer⁶, Boris Szurek, Valérie Verdier, Ralf Koebnik, J. Maxwell Dow⁷, Robert P. Ryan⁷, Hisae Hirata⁸, Shinji Tsuyumu⁷, Sang Won Lee⁹, Pamela C. Ronald⁹, Ramesh V. Sonti⁴, Marie-Anne Van Sluys⁴, Jan E. Leach⁴, Frank F. White¹⁰, Adam J. Bogdanove⁶

University of Maryland, College Park¹, University of Maryland, Baltimore², Kyoto Prefectural University³, Council of Scientific and Industrial Research⁴, Colorado State University⁵, Iowa State University⁶, University College Cork⁷, Shizuoka University⁸, University of California, Davis⁹, Kansas State University¹⁰

01 May 2008-BMC Genomics

TL;DR: The complete genome sequence of strain PXO99A is reported and compared with two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another; the comparisons point to sources of genomic variation and candidates for strain-specific adaptations of this pathogen.

Abstract: Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We report here on the complete genome sequence of strain PXO99A and its comparison to two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another. The PXO99A genome is a single circular chromosome of 5,240,075 bp, considerably longer than the genomes of the other strains (4,941,439 bp and 4,940,217 bp, respectively), and it contains 5083 protein-coding genes, including 87 not found in KACC10331 or MAFF311018. PXO99A contains a greater number of virulence-associated transcription activator-like effector genes and has at least ten major chromosomal rearrangements relative to KACC10331 and MAFF311018. PXO99A contains numerous copies of diverse insertion sequence elements, members of which are associated with 7 out of 10 of the major rearrangements. A rapidly-evolving CRISPR (clustered regularly interspersed short palindromic repeats) region contains evidence of dozens of phage infections unique to the PXO99A lineage. PXO99A also contains a unique, near-perfect tandem repeat of 212 kilobases close to the replication terminus. Our results provide striking evidence of genome plasticity and rapid evolution within Xanthomonas oryzae pv. oryzae. The comparisons point to sources of genomic variation and candidates for strain-specific adaptations of this pathogen that help to explain the extraordinary diversity of Xanthomonas oryzae pv. oryzae genotypes and races that have been isolated from around the world.

Journal Article•DOI•

Novel and nodulation-regulated microRNAs in soybean roots

Senthil Subramanian¹, Yan-Yan Fu¹, Ramanjulu Sunkar², W. Brad Barbazuk¹, Jian-Kang Zhu³, Oliver Yu¹

Donald Danforth Plant Science Center¹, Oklahoma State University–Stillwater², University of California, Riverside³

10 Apr 2008-BMC Genomics

TL;DR: Construction and analysis of a small RNA library led to the identification of 20 conserved and 35 novel miRNA families in soybean, enabling investigation of the role of miRNAs in rhizobial symbiosis.

Abstract: Small RNAs regulate a number of developmental processes in plants and animals. However, the role of small RNAs in legume-rhizobial symbiosis is largely unexplored. Symbiosis between legumes (e.g. soybean) and rhizobia bacteria (e.g. Bradyrhizobium japonicum) results in root nodules where the majority of biological nitrogen fixation occurs. We sought to identify microRNAs (miRNAs) regulated during soybean-B. japonicum symbiosis. We sequenced ~350000 small RNAs from soybean roots inoculated with B. japonicum and identified conserved miRNAs based on similarity to miRNAs known in other plant species and new miRNAs based on potential hairpin-forming precursors within soybean EST and shotgun genomic sequences. These bioinformatics analyses identified 55 families of miRNAs of which 35 were novel. A subset of these miRNAs were validated by Northern analysis and miRNAs differentially responding to B. japonicum inoculation were identified. We also identified putative target genes of the identified miRNAs and verified in vivo cleavage of a subset of these targets by 5'-RACE analysis. Using conserved miRNAs as internal control, we estimated that our analysis identified ~50% of miRNAs in soybean roots. Construction and analysis of a small RNA library led to the identification of 20 conserved and 35 novel miRNA families in soybean. The availability of complete and assembled genome sequence information will enable identification of many other miRNAs. The conserved miRNA loci and novel miRNAs identified in this study enable investigation of the role of miRNAs in rhizobial symbiosis.

Journal Article•DOI•

The adaptive evolution of the mammalian mitochondrial genome.

Rute R. da Fonseca¹, Rute R. da Fonseca², Warren E. Johnson³, Stephen J. O'Brien³, Maria J. Ramos², Agostinho Antunes², Agostinho Antunes³

University of Copenhagen¹, University of Porto², National Institutes of Health³

04 Mar 2008-BMC Genomics

TL;DR: This study provides insight into the adaptive evolution of the mtDNA genome in mammals and its implications for the molecular mechanism of oxidative phosphorylation, and presents a framework for future experimental characterization of the impact of specific mutations in the function, physiology, and interactions of the mtDNA encoded proteins involved in oxidative phosphorylation.

Abstract: The mitochondria produce up to 95% of a eukaryotic cell's energy through oxidative phosphorylation. The proteins involved in this vital process are under high functional constraints. However, metabolic requirements vary across species, potentially modifying selective pressures. We evaluate the adaptive evolution of 12 protein-coding mitochondrial genes in 41 placental mammalian species by assessing amino acid sequence variation and exploring the functional implications of observed variation in secondary and tertiary protein structures. Wide variation in the properties of amino acids was observed at functionally important regions of cytochrome b in species with more-specialized metabolic requirements (such as adaptation to low energy diet or large body size, such as in elephant, dugong, sloth, and pangolin, and adaptation to unusual oxygen requirements, for example diving in cetaceans, flying in bats, and living at high altitudes in alpacas). Signatures of adaptive variation in the NADH dehydrogenase complex were restricted to the loop regions of the transmembrane units which likely function as proton pumps. Evidence of adaptive variation in the cytochrome c oxidase complex was observed mostly at the interface between the mitochondrial and nuclear-encoded subunits, perhaps evidence of co-evolution. The ATP8 subunit, which has an important role in the assembly of F0, exhibited the highest signal of adaptive variation. ATP6, which has an essential role in rotor performance, showed a high adaptive variation in predicted loop areas. Our study provides insight into the adaptive evolution of the mtDNA genome in mammals and its implications for the molecular mechanism of oxidative phosphorylation. We present a framework for future experimental characterization of the impact of specific mutations in the function, physiology, and interactions of the mtDNA encoded proteins involved in oxidative phosphorylation.

Journal Article•DOI•

Phylogenetic and genomic diversity of human bacteremic Escherichia coli strains

Françoise Jauréguy¹, Françoise Jauréguy², Luce Landraud², Virginie Passet³, Laure Diancourt³, Eric Frapy², Ghislaine Guigon³, Etienne Carbonnelle², Olivier Lortholary⁴, Olivier Lortholary³, Olivier Clermont⁵, Erick Denamur⁵, Bertrand Picard¹, Xavier Nassif², Sylvain Brisse³

University of Paris¹, Paris Descartes University², Pasteur Institute³, Necker-Enfants Malades Hospital⁴, Paris Diderot University⁵

26 Nov 2008-BMC Genomics

TL;DR: It is demonstrated that human bacteremia strains distribute over the entire span of E. coli phylogenetic diversity and that CCs represent important phylogenetic units for pathogenesis and comparative genomics.

Abstract: Extraintestinal pathogenic Escherichia coli (ExPEC) strains represent a huge public health burden. Knowledge of their clonal diversity and of the association of clones with genomic content and clinical features is a prerequisite to recognize strains with a high invasive potential. In order to provide an unbiased view of the diversity of E. coli strains responsible for bacteremia, we studied 161 consecutive isolates from patients with positive blood culture obtained during one year in two French university hospitals. We collected precise clinical information, multilocus sequence typing (MLST) data and virulence gene content for all isolates. A subset representative of the clonal diversity was subjected to comparative genomic hybridization (CGH) using 2,324 amplicons from the flexible gene pool of E. coli. Recombination-insensitive phylogenetic analysis of MLST data in combination with the ECOR collection revealed that bacteremic E. coli isolates were highly diverse and distributed into five major lineages, corresponding to the classical E. coli phylogroups (A+B1, B2, D and E) and group F, which comprises strains previously assigned to D. Compared to other strains of phylogenetic group B2, strains belonging to MLST-derived clonal complexes (CCs) CC1 and CC4 were associated (P < 0.05) with a urinary origin. In contrast, no CC appeared associated with severe sepsis or unfavorable outcome of the bacteremia. CGH analysis revealed genomic characteristics of the distinct CCs and identified genomic regions associated with CC1 and/or CC4. Our results demonstrate that human bacteremia strains distribute over the entire span of E. coli phylogenetic diversity and that CCs represent important phylogenetic units for pathogenesis and comparative genomics.

Journal Article•DOI•

Transcriptomic dissection of tongue squamous cell carcinoma

Hui Ye¹, Tianwei Yu², Stéphane Temam³, Stéphane Temam⁴, Barry L. Ziober⁵, Jianguang Wang⁶, Joel L. Schwartz¹, Li Mao³, David T.W. Wong⁷, Xiaofeng Zhou⁶, Xiaofeng Zhou¹

University of Illinois at Chicago¹, Emory University², University of Texas MD Anderson Cancer Center³, Institut Gustave Roussy⁴, University of Pennsylvania⁵, Sun Yat-sen University⁶, University of California, Los Angeles⁷

06 Feb 2008-BMC Genomics

TL;DR: This study provided a transcriptomic signature for OTSCC that may lead to a diagnostic or screening tool and provide the foundation for further functional validation of these specific candidate genes for OTSCC.

Abstract: The head and neck/oral squamous cell carcinoma (HNOSCC) is a diverse group of cancers, which develop from many different anatomic sites and are associated with different risk factors and genetic characteristics. The oral tongue squamous cell carcinoma (OTSCC) is one of the most common types of HNOSCC. It is significantly more aggressive than other forms of HNOSCC, in terms of local invasion and spread. In this study, we aim to identify specific transcriptomic signatures that are associated with OTSCC. Genome-wide transcriptomic profiles were obtained for 53 primary OTSCCs and 22 matching normal tissues. Genes that exhibit statistically significant differences in expression between OTSCCs and normal tissues were identified. These include up-regulated genes (MMP1, MMP10, MMP3, MMP12, PTHLH, INHBA, LAMC2, IL8, KRT17, COL1A2, IFI6, ISG15, PLAU, GREM1, MMP9, IFI44, CXCL1), and down-regulated genes (KRT4, MAL, CRNN, SCEL, CRISP3, SPINK5, CLCA4, ADH1B, P11, TGM3, RHCG, PPP1R3C, CEACAM7, HPGD, CFD, ABCA8, CLU, CYP3A5). The expression differences of IL8 and MMP9 were further validated by real-time quantitative RT-PCR and immunohistochemistry. The Gene Ontology analysis suggested a number of altered biological processes in OTSCCs, including enhancements in phosphate transport, collagen catabolism, I-kappaB kinase/NF-kappaB signaling cascade, extracellular matrix organization and biogenesis, and chemotaxis, as well as suppressions of superoxide release, hydrogen peroxide metabolism, cellular response to hydrogen peroxide, keratinization, and keratinocyte differentiation. In summary, our study provided a transcriptomic signature for OTSCC that may lead to a diagnostic or screening tool and provide the foundation for further functional validation of these specific candidate genes for OTSCC.

Journal Article•DOI•

Expression of the cytochrome P450s, CYP6P3 and CYP6M2 are significantly elevated in multiple pyrethroid resistant populations of Anopheles gambiae s.s. from Southern Benin and Nigeria.

Rousseau Djouaka¹, Rousseau Djouaka², Adekunle A. Bakare¹, Ousmane Coulibaly², Martin Akogbeto, Hilary Ranson³, Janet Hemingway³, Clare Strode³

University of Ibadan¹, International Institute of Tropical Agriculture², Liverpool School of Tropical Medicine³

13 Nov 2008-BMC Genomics

TL;DR: The discovery that mosquitoes collected from different types of breeding sites display differing profiles of metabolic genes at the adult stage may reflect the influence of a range of xenobiotics on selecting for resistance in mosquitoes.

Abstract: Insecticide resistance in Anopheles mosquitoes is threatening the success of malaria control programmes. This is particularly true in Benin where pyrethroid resistance has been linked to the failure of insecticide treated bed nets. The role of mutations in the insecticide target sites in conferring resistance has been clearly established. In this study, the contribution of other potential resistance mechanisms was investigated in Anopheles gambiae s.s. from a number of localities in Southern Benin and Nigeria. The mosquitoes were sampled from a variety of breeding sites in a preliminary attempt to investigate the role of contamination of mosquito breeding sites in selecting for resistance in adult mosquitoes. All mosquitoes sampled belonged to the M form of An. gambiae s.s. There were high levels of permethrin resistance in an agricultural area (Akron) and an urban area (Gbedjromede), low levels of resistance in mosquito samples from an oil contaminated site (Ojoo) and complete susceptibility in the rural Orogun location. The target site mutation kdrW was detected at high levels in two of the populations (Akron f = 0.86 and Gbedjromede f = 0.84) but was not detected in Ojoo or Orogun. Microarray analysis using the Anopheles gambiae detox chip identified two P450s, CYP6P3 and CYP6M2 up regulated in all three populations, the former was expressed at particularly high levels in the Akron (12.4-fold) and Ojoo (7.4-fold) populations compared to the susceptible population. Additional detoxification and redox genes were also over expressed in one or more populations including two cuticular pre-cursor genes which were elevated in two of the three resistant populations. Multiple resistance mechanisms incurred in the different breeding sites contribute to resistance to permethrin in Benin. The cytochrome P450 genes, CYP6P3 and CYP6M2 are upregulated in all three resistant populations analysed. 
Several additional potential resistance mechanisms were also identified that warrant further investigation. Metabolic genes were over expressed irrespective of the presence of kdr, the latter resistance mechanism being absent in one resistant population. The discovery that mosquitoes collected from different types of breeding sites display differing profiles of metabolic genes at the adult stage may reflect the influence of a range of xenobiotics on selecting for resistance in mosquitoes.

Journal Article•DOI•

Evolution of the chicken Toll-like receptor gene family: A story of gene gain and gene loss

Nicholas D Temperley¹, Sofia Berlin², Sofia Berlin¹, Ian R. Paton¹, Darren K. Griffin³, David W. Burt¹

The Roslin Institute¹, Uppsala University², University of Kent³

01 Feb 2008-BMC Genomics

TL;DR: Comparative phylogenetic analysis of vertebrate TLR genes provides insight into their patterns and processes of gene evolution, with examples of both gene gain and gene loss.

Abstract: Toll-like receptors (TLRs) perform a vital role in disease resistance through their recognition of pathogen associated molecular patterns (PAMPs). Recent advances in genomics allow comparison of TLR genes within and between many species. This study takes advantage of the recently sequenced chicken genome to determine the complete chicken TLR repertoire and place it in context of vertebrate genomic evolution. The chicken TLR repertoire consists of ten genes. Phylogenetic analyses show that six of these genes have orthologs in mammals and fish, while one is only shared by fish and three appear to be unique to birds. Furthermore the phylogeny shows that TLR1-like genes arose independently in fish, birds and mammals from an ancestral gene also shared by TLR6 and TLR10. All other TLRs were already present prior to the divergence of major vertebrate lineages 550 Mya (million years ago) and have since been lost in certain lineages. Phylogenetic analysis shows the absence of TLRs 8 and 9 in chicken to be the result of gene loss. The notable exception to the tendency of gene loss in TLR evolution is found in chicken TLRs 1 and 2, each of which underwent gene duplication about 147 and 65 Mya, respectively. Comparative phylogenetic analysis of vertebrate TLR genes provides insight into their patterns and processes of gene evolution, with examples of both gene gain and gene loss. In addition, these comparisons clarify the nomenclature of TLR genes in vertebrates.

Journal Article•DOI•

Hidden layers of human small RNAs.

Hideya Kawaji, Mari M. Nakamura, Yukari Takahashi, Albin Sandelin¹, Shintaro Katayama, Shiro Fukuda, Carsten O. Daub, Chikatoshi Kai, Jun Kawai, Jun Yasuda, Piero Carninci, Yoshihide Hayashizaki

University of Copenhagen¹

10 Apr 2008-BMC Genomics

TL;DR: Well-characterized non-coding RNAs, such as tRNA, snoRNA, and snRNA, are cleaved at sites specific to the class of ncRNA, and small RNAs mapped to regions transcribed in both directions by bidirectional promoters appear to be products of dsRNA formation and subsequent cleavage.

Abstract: Small RNA attracts increasing interest based on the discovery of RNA silencing and the rapid progress of our understanding of these phenomena. Although recent studies suggest the possible existence of yet undiscovered types of small RNAs in higher organisms, many studies to profile small RNA have focused on miRNA and/or siRNA rather than on the exploration of additional classes of RNAs. Here, we explored human small RNAs by unbiased sequencing of RNAs with sizes of 19–40 nt. We provide substantial evidences for the existence of independent classes of small RNAs. Our data shows that well-characterized non-coding RNA, such as tRNA, snoRNA, and snRNA are cleaved at sites specific to the class of ncRNA. In particular, tRNA cleavage is regulated depending on tRNA type and tissue expression. We also found small RNAs mapped to genomic regions that are transcribed in both directions by bidirectional promoters, indicating that the small RNAs are a product of dsRNA formation and their subsequent cleavage. Their partial similarity with ribosomal RNAs (rRNAs) suggests unrevealed functions of ribosomal DNA or interstitial rRNA. Further examination revealed six novel miRNAs. Our results underscore the complexity of the small RNA world and the biogenesis of small RNAs.

Journal Article•DOI•

Viral genome sequencing by random priming methods

Appolinaire Djikeng¹, Rebecca A. Halpin¹, Ryan Kuzmickas¹, Jay V. DePasse², Jeremy I. Feldblyum¹, Naomi Sengamalay¹, Claudio L. Afonso³, Xinsheng Zhang⁴, Norman G. Anderson, Elodie Ghedin², David J. Spiro¹

J. Craig Venter Institute¹, University of Pittsburgh², United States Department of Agriculture³, Ohio Agricultural Research and Development Center⁴

07 Jan 2008-BMC Genomics

TL;DR: The SISPA methodology is adapted to genome sequencing of RNA and DNA viruses and is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, and previously uncharacterized viruses, and in more fully describing mixed viral infections.

Abstract: Most emerging health threats are of zoonotic origin. For the overwhelming majority, their causative agents are RNA viruses which include but are not limited to HIV, Influenza, SARS, Ebola, Dengue, and Hantavirus. Of increasing importance therefore is a better understanding of global viral diversity to enable better surveillance and prediction of pandemic threats; this will require rapid and flexible methods for complete viral genome sequencing. We have adapted the SISPA methodology [1–3] to genome sequencing of RNA and DNA viruses. We have demonstrated the utility of the method on various types and sources of viruses, obtaining near complete genome sequence of viruses ranging in size from 3,000–15,000 bp with a median depth of coverage of 14.33. We used this technique to generate full viral genome sequence in the presence of host contaminants, using viral preparations from cell culture supernatant, allantoic fluid and fecal matter. The method described is of great utility in generating whole genome assemblies for viruses with little or no available sequence information, viruses from greatly divergent families, and previously uncharacterized viruses, and in more fully describing mixed viral infections.

Journal Article•DOI•

Genome-wide and expression analysis of protein phosphatase 2C in rice and Arabidopsis

Tongtong Xue¹, Dong Wang², Dong Wang¹, Shizhong Zhang¹, Juergen Ehlting³, Fei Ni¹, Stephen Jakab⁴, Chengchao Zheng¹, Yuan-Fu Zhong¹, Yuan-Fu Zhong⁴

Shandong Agricultural University¹, Fudan University², University of Victoria³, Millersville University of Pennsylvania⁴

20 Nov 2008-BMC Genomics

TL;DR: This comparative genome-wide overview of the PP2C family in Arabidopsis and rice provides insights into the functions and regulatory mechanisms, as well as the evolution and divergence of the PP2C genes in dicots and monocots.

Abstract: The protein phosphatase 2Cs (PP2Cs) from various organisms have been implicated to act as negative modulators of protein kinase pathways involved in diverse environmental stress responses and developmental processes. A genome-wide overview of the PP2C gene family in plants is not yet available. A comprehensive computational analysis identified 80 and 78 PP2C genes in Arabidopsis thaliana (AtPP2Cs) and Oryza sativa (OsPP2Cs), respectively, which denotes the PP2C gene family as one of the largest families identified in plants. Phylogenic analysis divided PP2Cs in Arabidopsis and rice into 13 and 11 subfamilies, respectively, which are supported by the analyses of gene structures and protein motifs. Comparative analysis between the PP2C genes in Arabidopsis and rice identified common and lineage-specific subfamilies and potential 'gene birth-and-death' events. Gene duplication analysis reveals that whole genome and chromosomal segment duplications mainly contributed to the expansion of both OsPP2Cs and AtPP2Cs, but tandem or local duplication occurred less frequently in Arabidopsis than rice. Some protein motifs are widespread among the PP2C proteins, whereas some other motifs are specific to only one or two subfamilies. Expression pattern analysis suggests that 1) most PP2C genes play functional roles in multiple tissues in both species, 2) the induced expression of most genes in subfamily A by diverse stimuli indicates their primary role in stress tolerance, especially ABA response, and 3) the expression pattern of subfamily D members suggests that they may constitute positive regulators in ABA-mediated signaling pathways. The analyses of putative upstream regulatory elements by two approaches further support the functions of subfamily A in ABA signaling, and provide insights into the shared and different transcriptional regulation machineries in dicots and monocots. 
This comparative genome-wide overview of the PP2C family in Arabidopsis and rice provides insights into the functions and regulatory mechanisms, as well as the evolution and divergence of the PP2C genes in dicots and monocots. Bioinformatics analyses suggest that plant PP2C proteins from different subfamilies participate in distinct signaling pathways. Our results have established a solid foundation for future studies on the functional divergence in different PP2C subfamilies.

Journal Article•DOI•

Heat stress-responsive transcriptome analysis in heat susceptible and tolerant wheat (Triticum aestivum L.) by using Wheat Genome Array

Dandan Qin¹, Haiyan Wu¹, Huiru Peng¹, Yingyin Yao¹, Zhongfu Ni¹, Zhenxing Li¹, Chunlei Zhou¹, Qixin Sun¹

China Agricultural University¹

22 Sep 2008-BMC Genomics

TL;DR: The heat stress responsive genes identified in this study will facilitate the understanding of the molecular basis for heat tolerance in different wheat genotypes and future improvement of heat tolerance in wheat and other cereals.

Abstract: Wheat is a major crop in the world, and high temperature stress can reduce the yield of wheat by as much as 15%. The molecular changes in response to heat stress are poorly understood. Using GeneChip® Wheat Genome Array, we analyzed genome-wide gene expression profiles in the leaves of two wheat genotypes, namely, heat susceptible 'Chinese Spring' (CS) and heat tolerant 'TAM107' (TAM). A total of 6560 (~10.7%) probe sets displayed 2-fold or more changes in expression in at least one heat treatment (false discovery rate, FDR, α = 0.001). Besides heat shock protein (HSP) and heat shock factor (HSF) genes, these putative heat responsive genes encode transcription factors and proteins involved in phytohormone biosynthesis/signaling, calcium and sugar signal pathways, RNA metabolism, ribosomal proteins, primary and secondary metabolisms, as well as proteins related to other stresses. A total of 313 probe sets were differentially expressed between the two genotypes, which could be responsible for the difference in heat tolerance of the two genotypes. Moreover, 1314 were differentially expressed between the heat treatments with and without pre-acclimation, and 4533 were differentially expressed between short and prolonged heat treatments. The differences in heat tolerance in different wheat genotypes may be associated with multiple processes and mechanisms involving HSPs, transcription factors, and other stress related genes. Heat acclimation has little effect on gene expression under prolonged treatments but affects gene expression in wheat under short-term heat stress. The heat stress responsive genes identified in this study will facilitate our understanding of the molecular basis for heat tolerance in different wheat genotypes and future improvement of heat tolerance in wheat and other cereals.

Journal Article•DOI•

PlantPAN: Plant promoter analysis navigator, for identifying combinatorial cis-regulatory elements with distance constraint in plant gene groups

Wen Chi Chang¹, Tzong-Yi Lee², Hsien Da Huang², His Yuan Huang², Rong Long Pan¹

National Tsing Hua University¹, National Chiao Tung University²

26 Nov 2008-BMC Genomics

TL;DR: A database-assisted system, PlantPAN (Plant Promoter Analysis Navigator), for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes and enables other regulatory features in a plant promoter, such as CpG/CpNpG islands and tandem repeats, to be displayed.

Abstract: The elucidation of transcriptional regulation in plant genes is an important area of research for plant scientists, following the mapping of various plant genomes, such as A. thaliana, O. sativa and Z. mays. A variety of bioinformatic servers or databases of plant promoters have been established, although most have been focused only on annotating transcription factor binding sites in a single gene and have neglected some important regulatory elements (tandem repeats and CpG/CpNpG islands) in promoter regions. Additionally, the combinatorial interaction of transcription factors (TFs) is important in regulating the gene group that is associated with the same expression pattern. Therefore, a tool for detecting the co-regulation of transcription factors in a group of gene promoters is required. This study develops a database-assisted system, PlantPAN (Plant Promoter Analysis Navigator), for recognizing combinatorial cis-regulatory elements with a distance constraint in sets of plant genes. The system collects the plant transcription factor binding profiles from PLACE, TRANSFAC (public release 7.0), AGRIS, and JASPAR databases and allows users to input a group of gene IDs or promoter sequences, enabling the co-occurrence of combinatorial transcription factor binding sites (TFBSs) within a defined distance (20 bp to 200 bp) to be identified. Furthermore, the new resource enables other regulatory features in a plant promoter, such as CpG/CpNpG islands and tandem repeats, to be displayed. The regulatory elements in the conserved regions of the promoters across homologous genes are detected and presented. In addition to providing a user-friendly input/output interface, PlantPAN has numerous advantages in the analysis of a plant promoter. Several case studies have established the effectiveness of PlantPAN. This novel analytical resource is now freely available at http://PlantPAN.mbc.nctu.edu.tw.

Journal Article•DOI•

High-throughput sequencing of Medicago truncatula short RNAs identifies eight new miRNA families

György Szittya¹, Simon Moxon¹, Dulce M. Santos, Runchun Jing¹, Manuel Pedro Salema Fevereiro, Vincent Moulton¹, Tamas Dalmay¹•Institutions (1)

University of East Anglia¹

09 Dec 2008-BMC Genomics

TL;DR: Deep sequencing of short RNAs from M. truncatula leaves identified eight new miRNAs, indicating that specific miRNAs exist in legume species, as well as 26 novel miRNA candidates that were potentially generated from 32 loci.

Abstract: High-throughput sequencing technology is capable of identifying novel short RNAs in plant species. We used Solexa sequencing to find new microRNAs in one of the model legume species, barrel medic (Medicago truncatula). 3,948,871 reads were obtained from two separate short RNA libraries generated from total RNA extracted from M. truncatula leaves, representing 1,563,959 distinct sequences. 2,168,937 reads were mapped to the available M. truncatula genome, corresponding to 619,175 distinct sequences. 174,504 reads representing 25 conserved miRNA families showed perfect matches to known miRNAs. We also identified 26 novel miRNA candidates that were potentially generated from 32 loci. Nine of these loci produced eight distinct sequences, for which the miRNA* sequences were also sequenced. These sequences were not described in other plant species and accumulation of these eight novel miRNAs was confirmed by Northern blot analysis. Potential target genes were predicted for most conserved and novel miRNAs. Deep sequencing of short RNAs from M. truncatula leaves identified eight new miRNAs, indicating that specific miRNAs exist in legume species.

Journal Article•DOI•

Genome-wide identification, organization and phylogenetic analysis of Dicer-like, Argonaute and RNA-dependent RNA Polymerase gene families and their expression analysis during reproductive development and stress in rice.

Meenu Kapoor¹, R. Arora², Tenisha Lama¹, Aashima Nijhawan², Jitendra P. Khurana², Akhilesh K. Tyagi², Sanjay Kapoor²•Institutions (2)

Guru Gobind Singh Indraprastha University¹, University of Delhi²

01 Oct 2008-BMC Genomics

TL;DR: This investigation has identified 23 rice genes belonging to DCL, Argonaute and RDR gene families that could potentially be involved in reproductive development-specific gene regulatory mechanisms, and provides a basis for further, more detailed investigations aimed at understanding the contribution of individual components of the RNA silencing machinery during the reproductive phase of plant development.

Abstract: Important developmental processes in both plants and animals are partly regulated by genes whose expression is modulated at the post-transcriptional level by processes such as RNA interference (RNAi). Dicers, Argonautes and RNA-dependent RNA polymerases (RDR) form the core components that facilitate gene silencing and have been implicated in the initiation and maintenance of the trigger RNA molecules, central to the process of RNAi. Investigations in eukaryotes have revealed that these proteins are encoded by a variable number of genes, with plants showing a relatively higher number in each gene family. To date, no systematic expression profiling of these genes in any of the organisms has been reported. In this study, we provide a complete analysis of rice Dicer-like, Argonaute and RDR gene families including gene structure, genomic localization and phylogenetic relatedness among gene family members. We also present microarray-based expression profiling of these genes during 14 stages of reproductive and 5 stages of vegetative development and in response to cold, salt and dehydration stress. We have identified 8 Dicer-like (OsDCLs), 19 Argonaute (OsAGOs) and 5 RNA-dependent RNA polymerase (OsRDRs) genes in rice. Based on phylogeny, each of these gene families has been categorized into four subgroups. Although most of the genes are expressed in both vegetative and reproductive organs, 2 OsDCLs, 14 OsAGOs and 3 OsRDRs were found to be expressed specifically/preferentially during stages of reproductive development. Of these, 2 OsAGOs exhibited preferential up-regulation in seeds. One of the Argonautes (OsAGO2) also showed specific up-regulation in response to cold, salt and dehydration stress. This investigation has identified 23 rice genes belonging to DCL, Argonaute and RDR gene families that could potentially be involved in reproductive development-specific gene regulatory mechanisms. These data provide an insight into probable domains of activity of these genes and a basis for further, more detailed investigations aimed at understanding the contribution of individual components of the RNA silencing machinery during the reproductive phase of plant development.

Journal Article•DOI•

The genome of Aeromonas salmonicida subsp. salmonicida A449: insights into the evolution of a fish pathogen

Michael Reith¹, Rama K. Singh¹, Bruce A. Curtis¹, Bruce A. Curtis², Jessica M. Boyd¹, Anne B. Bouevitch, Jennifer Kimball¹, Janet Munholland¹, Colleen A. Murphy¹, Darren Sarty¹, Jason Williams¹, John H. E. Nash, Stewart C. Johnson¹, Laura L. Brown¹•Institutions (2)

Halifax¹, Dalhousie University²

18 Sep 2008-BMC Genomics

TL;DR: The genome sequence of A. salmonicida was determined to provide a better understanding of the virulence factors used by this pathogen to infect fish and provide insights into the mechanisms used by the bacterium for infection and avoidance of host defence systems.

Abstract: Aeromonas salmonicida subsp. salmonicida is a Gram-negative bacterium that is the causative agent of furunculosis, a bacterial septicaemia of salmonid fish. While other species of Aeromonas are opportunistic pathogens or are found in commensal or symbiotic relationships with animal hosts, A. salmonicida subsp. salmonicida causes disease in healthy fish. The genome sequence of A. salmonicida was determined to provide a better understanding of the virulence factors used by this pathogen to infect fish. The nucleotide sequences of the A. salmonicida subsp. salmonicida A449 chromosome and two large plasmids are characterized. The chromosome is 4,702,402 bp and encodes 4388 genes, while the two large plasmids are 166,749 and 155,098 bp with 178 and 164 genes, respectively. Notable features are a large inversion in the chromosome and, in one of the large plasmids, the presence of a Tn21 composite transposon containing mercury resistance genes and an In2 integron encoding genes for resistance to streptomycin/spectinomycin, quaternary ammonium compounds, sulphonamides and chloramphenicol. A large number of genes encoding potential virulence factors were identified; however, many appear to be pseudogenes since they contain insertion sequences, frameshifts or in-frame stop codons. A total of 170 pseudogenes and 88 insertion sequences (of ten different types) are found in the A. salmonicida genome. Comparison with the A. hydrophila ATCC 7966T genome reveals multiple large inversions in the chromosome as well as an approximately 9% difference in gene content, indicating instances of single gene or operon loss or gain. A limited number of the pseudogenes found in A. salmonicida A449 were investigated in other Aeromonas strains and species. While nearly all the pseudogenes tested are present in A. salmonicida subsp. salmonicida strains, only about 25% were found in other A. salmonicida subspecies and none were detected in other Aeromonas species. Relative to the A. hydrophila ATCC 7966T genome, the A. salmonicida subsp. salmonicida genome has acquired multiple mobile genetic elements, undergone substantial rearrangement and developed a significant number of pseudogenes. These changes appear to be a consequence of adaptation to a specific host, salmonid fish, and provide insights into the mechanisms used by the bacterium for infection and avoidance of host defence systems.

Journal Article•DOI•

A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

Stefan Kurtz¹, Apurva Narechania², Apurva Narechania³, Joshua C. Stein³, Doreen Ware³•Institutions (3)

University of Hamburg¹, American Museum of Natural History², Cold Spring Harbor Laboratory³

31 Oct 2008-BMC Genomics

TL;DR: The Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets, is introduced; it is based on enhanced suffix arrays, which give much greater flexibility in the choice of the k-mer size.

Abstract: The challenges of accurate gene prediction and enumeration are further aggravated in large genomes that contain highly repetitive transposable elements (TEs). Yet TEs play a substantial role in genome evolution and are themselves an important subject of study. Repeat annotation, based on counting occurrences of k-mers, has been previously used to distinguish TEs from low-copy genic regions, but currently available software solutions are impractical due to high memory requirements or specialization for specific user tasks. Here we introduce the Tallymer software, a flexible and memory-efficient collection of programs for k-mer counting and indexing of large sequence sets. Unlike previous methods, Tallymer is based on enhanced suffix arrays. This gives much greater flexibility in the choice of the k-mer size. Tallymer can process large data sizes of several billion bases. We used it in a variety of applications to study the genomes of maize and other plant species. In particular, Tallymer was used to index a set of whole genome shotgun sequences from maize (B73) (total size 10⁹ bp). We analyzed k-mer frequencies for a wide range of k. At this low genome coverage (≈ 0.45×) highly repetitive 20-mers constituted 44% of the genome but represented only 1% of all possible k-mers. Similar low complexity was seen in the repeat fractions of sorghum and rice. When applying our method to other maize data sets, High-C0t derived sequences showed the greatest enrichment for low-copy sequences. Among annotated TEs, the most highly repetitive were of the Ty3/gypsy class of retrotransposons, followed by the Ty1/copia class, and DNA transposons. Among expressed sequence tags (EST), a notable fraction contained high-copy k-mers, suggesting that transposons are still active in maize. Retrotransposons in Mo17 and McC cultivars were readily detected using the B73 20-mer frequency index, indicating their conservation despite extensive rearrangement across cultivars. Among one hundred annotated bacterial artificial chromosomes (BACs), k-mer frequency could be used to detect transposon-encoded genes with 92% sensitivity, compared to 96% using alignment-based repeat masking, while both methods showed 92% specificity. The Tallymer software was effective in a variety of applications to aid genome annotation in maize, despite limitations imposed by the relatively low coverage of sequence available. For more information on the software, see http://www.zbh.uni-hamburg.de/Tallymer .
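
To make the idea of k-mer frequency annotation concrete, here is a minimal Python sketch. It deliberately swaps in a plain hash-table counter for the enhanced suffix arrays that Tallymer uses, so it shows only what is being counted, not how Tallymer achieves its memory efficiency. The function names are hypothetical, and counting on the forward strand only is a simplifying assumption.

```python
from collections import Counter

DNA_ALPHABET = frozenset("ACGT")


def count_kmers(sequences, k):
    """Count every k-mer over A/C/G/T in an iterable of sequences.

    Windows containing other symbols (for example N) are skipped.
    Only the forward strand is counted, which is a simplification.
    """
    counts = Counter()
    for sequence in sequences:
        sequence = sequence.upper()
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            if DNA_ALPHABET.issuperset(kmer):
                counts[kmer] += 1
    return counts


def frequency_profile(sequence, counts, k):
    """Annotate each position with the index frequency of the k-mer starting there.

    High values along a stretch suggest repetitive sequence; zeros mean the
    k-mer was never seen in the indexed set.
    """
    sequence = sequence.upper()
    return [
        counts.get(sequence[start:start + k], 0)
        for start in range(len(sequence) - k + 1)
    ]
```

In this sketch an index built once with `count_kmers` over a read set can be reused by `frequency_profile` to annotate any other sequence, which mirrors the abstract's use of a B73 20-mer index to examine other cultivars. A dictionary of this kind grows with the number of distinct k-mers and would not scale to the data sizes the abstract describes.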

Journal Article•DOI•

A comprehensive collection of experimentally validated primers for Polymerase Chain Reaction quantitation of murine transcript abundance

Athanasia Spandidos¹, Xiaowei Wang², Xiaowei Wang¹, Huajun Wang¹, Stefan Dragnev¹, Tara K. Thurber¹, Brian Seed¹•Institutions (2)

Harvard University¹, Washington University in St. Louis²

24 Dec 2008-BMC Genomics

TL;DR: An experimentally validated collection of murine primer pairs for PCR and QPCR which can be used under a common PCR thermal profile, allowing the evaluation of transcript abundance of a large number of genes in parallel.

Abstract: Quantitative polymerase chain reaction (QPCR) is a widely applied analytical method for the accurate determination of transcript abundance. Primers for QPCR have been designed on a genomic scale but non-specific amplification of non-target genes has frequently been a problem. Although several online databases have been created for the storage and retrieval of experimentally validated primers, only a few thousand primer pairs are currently present in existing databases and the primers are not designed for use under a common PCR thermal profile. We previously reported the implementation of an algorithm to predict PCR primers for most known human and mouse genes. We now report the use of that resource to identify 17483 pairs of primers that have been experimentally verified to amplify unique sequences corresponding to distinct murine transcripts. The primer pairs have been validated by gel electrophoresis, DNA sequence analysis and thermal denaturation profile. In addition to the validation studies, we have determined the uniformity of amplification using the primers and the technical reproducibility of the QPCR reaction using the popular and inexpensive SYBR Green I detection method. We have identified an experimentally validated collection of murine primer pairs for PCR and QPCR which can be used under a common PCR thermal profile, allowing the evaluation of transcript abundance of a large number of genes in parallel. This feature is increasingly attractive for confirming and/or making more precise data trends observed from experiments performed with DNA microarrays.

Journal Article•DOI•

A comparative study of different machine learning methods on microarray gene expression data.

Mehdi Pirooznia¹, Jack Y. Yang², Mary Qu Yang³, Youping Deng¹•Institutions (3)

University of Southern Mississippi¹, Harvard University², United States Department of Health and Human Services³

20 Mar 2008-BMC Genomics

TL;DR: The results reveal the importance of feature selection in accurately classifying new samples and show how an integrated feature selection and classification algorithm performs and identifies significant genes.

Abstract: Several classification and feature selection methods have been studied for the identification of differentially expressed genes in microarray data. Classification methods such as SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods have been used in recent studies. The accuracy of these methods has been calculated with validation methods such as v-fold validation. However, there is a lack of comparison between these methods to find a better framework for classification, clustering and analysis of microarray gene expression results. In this study, we compared the efficiency of classification methods including SVM, RBF Neural Nets, MLP Neural Nets, Bayesian, Decision Tree and Random Forest methods. The v-fold cross validation was used to calculate the accuracy of the classifiers. Some of the common clustering methods including K-means, DBC, and EM clustering were applied to the datasets and the efficiency of these methods has been analysed. Further, the efficiency of the feature selection methods including support vector machine recursive feature elimination (SVM-RFE), Chi Squared, and CSF was compared. In each case these methods were applied to eight different binary (two class) microarray datasets. We evaluated the class prediction efficiency of each gene list in training and test cross-validation using supervised classifiers. We presented a study in which we compared some of the commonly used classification, clustering, and feature selection methods. We applied these methods to eight publicly available datasets, and compared how these methods performed in class prediction of test datasets. We reported that the choice of feature selection methods, the number of genes in the gene list, and the number of cases (samples) substantially influence classification success. Based on features chosen by these methods, error rates and accuracy of several classification algorithms were obtained. Results revealed the importance of feature selection in accurately classifying new samples and how an integrated feature selection and classification algorithm performs and is capable of identifying significant genes.
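
The v-fold cross-validation mentioned in this abstract can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, and `fit` and `predict` stand for any classifier supplied by the caller. The samples are shuffled with a fixed seed and dealt into v folds; each fold is held out once while the model is trained on the rest.

```python
import random


def v_fold_indices(n_samples, v, seed=0):
    """Shuffle sample indices and deal them into v nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::v] for fold in range(v)]


def cross_validated_accuracy(samples, labels, fit, predict, v=10, seed=0):
    """Estimate accuracy by training on v-1 folds and testing on the held-out fold.

    `fit(train_samples, train_labels)` returns a model and
    `predict(model, sample)` returns a label; both are caller-supplied.
    """
    correct = 0
    for held_out in v_fold_indices(len(samples), v, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(samples)) if i not in held_out_set]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)
```

If feature selection is part of the workflow, performing it inside `fit` keeps the held-out fold unseen during gene selection; selecting genes on the full dataset first tends to give optimistic accuracy estimates.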
