
Showing papers in "Genome Biology in 2008"


Journal ArticleDOI
TL;DR: This work presents Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer, and uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions.
Abstract: We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.

13,008 citations
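The MACS abstract above mentions a "dynamic Poisson distribution" that captures local biases. The abstract does not say how the local rate is chosen, so the following is only a minimal sketch under one stated assumption: the expected tag count for a candidate peak is taken as the maximum of a genome-wide background rate and rates estimated from control windows of several sizes. The function names, window sizes, and numbers are hypothetical and are not taken from the MACS software.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by summing the CDF directly.

    Adequate for an illustration; very small tail probabilities lose
    precision because they are computed as 1 - CDF.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def local_lambda(genome_rate, control_counts, window_sizes, peak_width):
    """Assumed rule: use the largest expected count among all backgrounds.

    genome_rate    -- tags per base pair across the whole genome
    control_counts -- control tags in windows centred on the peak
    window_sizes   -- widths (bp) of those windows, same order
    peak_width     -- width (bp) of the candidate peak
    """
    candidates = [genome_rate * peak_width]
    for count, size in zip(control_counts, window_sizes):
        candidates.append(count * peak_width / size)
    return max(candidates)


# Hypothetical numbers: a 200 bp candidate peak holding 15 ChIP tags.
lam = local_lambda(0.01, [30, 60], [1000, 5000], 200)
p_local = poisson_sf(15, lam)
p_global = poisson_sf(15, 0.01 * 200)
```

Under this assumption a region with a locally elevated background receives a larger expected count, and therefore a less significant p-value, than it would under a single genome-wide rate.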


Journal ArticleDOI
TL;DR: EVidenceModeler (EVM), an automated eukaryotic gene structure annotation tool, reports gene structures as a weighted consensus of all available evidence; combined with the Program to Assemble Spliced Alignments (PASA), it yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms.
Abstract: EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

1,996 citations


Journal ArticleDOI
TL;DR: A fast heuristic algorithm, derived from ridge regression, to integrate multiple functional association networks and predict gene function from a single process-specific network using label propagation, that is efficient enough to be deployed on a modern webserver and as accurate as the leading methods on the MouseFunc I benchmark and a new yeast function prediction benchmark.
Abstract: Background: Most successful computational approaches for protein function prediction integrate multiple genomics and proteomics data sources to make inferences about the function of unknown proteins. The most accurate of these algorithms have long running times, making them unsuitable for real-time protein function prediction in large genomes. As a result, the predictions of these algorithms are stored in static databases that can easily become outdated. We propose a new algorithm, GeneMANIA, that is as accurate as the leading methods, while capable of predicting protein function in real-time. Results: We use a fast heuristic algorithm, derived from ridge regression, to integrate multiple functional association networks and predict gene function from a single process-specific network using label propagation. Our algorithm is efficient enough to be deployed on a modern webserver and is as accurate as, or more so than, the leading methods on the MouseFunc I benchmark and a new yeast function prediction benchmark; it is robust to redundant and irrelevant data and requires, on average, less than ten seconds of computation time on tasks from these benchmarks. Conclusion: GeneMANIA is fast enough to predict gene function on-the-fly while achieving state-of-the-art accuracy. A prototype version of a GeneMANIA-based webserver is available at http://morrislab.med.utoronto.ca/prototype.

880 citations


Journal ArticleDOI
TL;DR: It is suggested that transcript abundance of roughly one-third of expressed A. thaliana genes is circadian regulated, and that further exploration of the links between the clock and clock-regulated pathways will lead to a better understanding of how the circadian clock affects plant growth and leads to improved fitness.
Abstract: As nonmotile organisms, plants must rapidly adapt to ever-changing environmental conditions, including those caused by daily light/dark cycles. One important mechanism for anticipating and preparing for such predictable changes is the circadian clock. Nearly all organisms have circadian oscillators that, when they are in phase with the Earth's rotation, provide a competitive advantage. In order to understand how circadian clocks benefit plants, it is necessary to identify the pathways and processes that are clock controlled. We have integrated information from multiple circadian microarray experiments performed on Arabidopsis thaliana in order to better estimate the fraction of the plant transcriptome that is circadian regulated. Analyzing the promoters of clock-controlled genes, we identified circadian clock regulatory elements correlated with phase-specific transcript accumulation. We have also identified several physiological pathways enriched for clock-regulated changes in transcript abundance, suggesting they may be modulated by the circadian clock. Our analysis suggests that transcript abundance of roughly one-third of expressed A. thaliana genes is circadian regulated. We found four promoter elements, enriched in the promoters of genes with four discrete phases, which may contribute to the time-of-day specific changes in the transcript abundance of these genes. Clock-regulated genes are over-represented among all of the classical plant hormone and multiple stress response pathways, suggesting that all of these pathways are influenced by the circadian clock. Further exploration of the links between the clock and these pathways will lead to a better understanding of how the circadian clock affects plant growth and leads to improved fitness.

671 citations


Journal ArticleDOI
TL;DR: This study seeks conserved genetic signatures for LN-DCs and in vitro derived granulocyte-macrophage colony stimulating factor (GM-CSF) DCs through the analysis of a compendium of genome-wide expression profiles of mouse or human leukocytes and identifies a large gene expression program shared between mouse and human pDCs, and smaller conserved profiles shared between mouse and human LN-cDC subsets.
Abstract: Background: Dendritic cells (DCs) are a complex group of cells that play a critical role in vertebrate immunity. Lymph-node resident DCs (LN-DCs) are subdivided into conventional DC (cDC) subsets (CD11b and CD8α in mouse; BDCA1 and BDCA3 in human) and plasmacytoid DCs (pDCs). It is currently unclear if these various DC populations belong to a unique hematopoietic lineage and if the subsets identified in the mouse and human systems are evolutionary homologs. To gain novel insights into these questions, we sought conserved genetic signatures for LN-DCs and in vitro derived granulocyte-macrophage colony stimulating factor (GM-CSF) DCs through the analysis of a compendium of genome-wide expression profiles of mouse or human leukocytes.

535 citations


Journal ArticleDOI
TL;DR: Argonaute proteins are evolutionarily conserved and can be phylogenetically subdivided into the Ago subfamily and the Piwi subfamily, which bind to siRNAs or miRNAs to guide post-transcriptional gene silencing either by destabilization of the mRNA or by translational repression.
Abstract: Argonaute proteins were first discovered genetically, and extensive research in the past few years has revealed that members of the Argonaute protein family are key players in gene-silencing pathways guided by small RNAs. Small RNAs such as short interfering RNAs (siRNAs), microRNAs (miRNAs) or Piwi-interacting RNAs (piRNAs) are anchored into specific binding pockets and guide Argonaute proteins to target mRNA molecules for silencing or destruction. Various classes of small RNAs and Argonaute proteins are found in all higher eukaryotes and have important functions in processes as diverse as embryonic development, cell differentiation and transposon silencing. Argonaute proteins are evolutionarily conserved and can be phylogenetically subdivided into the Ago subfamily and the Piwi subfamily. Ago proteins are ubiquitously expressed and bind to siRNAs or miRNAs to guide post-transcriptional gene silencing either by destabilization of the mRNA or by translational repression. The expression of Piwi proteins is mostly restricted to the germ line and Piwi proteins associate with piRNAs to facilitate silencing of mobile genetic elements. Although various aspects of Argonaute function have been identified, many Argonaute proteins are still poorly characterized. Therefore, it is very likely that as yet unknown functions of the Argonaute protein family will be elucidated in the future.

516 citations


Journal ArticleDOI
TL;DR: An automated pipeline for phylogenomic analysis (AMPHORA) is developed that overcomes the existing bottlenecks limiting large-scale protein phylogenetic inference and demonstrates its high throughput capabilities and high quality results.
Abstract: The explosive growth of genomic data provides an opportunity to make increased use of protein markers for phylogenetic inference. We have developed an automated pipeline for phylogenomic analysis (AMPHORA) that overcomes the existing bottlenecks limiting large-scale protein phylogenetic inference. We demonstrated its high throughput capabilities and high quality results by constructing a genome tree of 578 bacterial species and by assigning phylotypes to 18,607 protein markers identified in metagenomic data collected from the Sargasso Sea.

512 citations


Journal ArticleDOI
TL;DR: Although both Tribolium and C. elegans show a robust systemic RNAi response, the genome-wide survey reveals significant differences between the RNAi mechanisms of these organisms.
Abstract: Background: RNA interference (RNAi) is a highly conserved cellular mechanism. In some organisms, such as Caenorhabditis elegans, the RNAi response can be transmitted systemically. Some insects also exhibit a systemic RNAi response. However, Drosophila, the leading insect model organism, does not show a robust systemic RNAi response, necessitating another model system to study the molecular mechanism of systemic RNAi in insects. Results: We used Tribolium, which exhibits robust systemic RNAi, as an alternative model system. We have identified the core RNAi genes, as well as genes potentially involved in systemic RNAi, from the Tribolium genome. Both phylogenetic and functional analyses suggest that Tribolium has a somewhat larger inventory of core component genes than Drosophila, perhaps allowing a more sensitive response to double-stranded RNA (dsRNA). We also identified three Tribolium homologs of C. elegans sid-1, which encodes a possible dsRNA channel. However, detailed sequence analysis has revealed that these Tribolium homologs share more identity with another C. elegans gene, tag-130. We analyzed tag-130 mutants, and found that this gene does not have a function in systemic RNAi in C. elegans. Likewise, the Tribolium sid-like genes do not seem to be required for systemic RNAi. These results suggest that insect sid-1-like genes have a different function than dsRNA uptake. Moreover, Tribolium lacks homologs of several genes important for RNAi in C. elegans. Conclusion: Although both Tribolium and C. elegans show a robust systemic RNAi response, our genome-wide survey reveals significant differences between the RNAi mechanisms of these organisms. Thus, insects may use an alternative mechanism for the systemic RNAi response. Understanding this process would assist with rendering other insects amenable to systemic RNAi, and may influence pest control approaches.

501 citations


Journal ArticleDOI
TL;DR: The AID/APOBECs, a group of cytidine deaminases, represent a somewhat unusual protein family that can insert mutations in DNA and RNA as a result of their ability to deaminate cytidine to uridine.
Abstract: The AID/APOBECs, a group of cytidine deaminases, represent a somewhat unusual protein family that can insert mutations in DNA and RNA as a result of their ability to deaminate cytidine to uridine. The ancestral AID/APOBECs originated from a branch of the zinc-dependent deaminase superfamily at the beginning of the vertebrate radiation. Other members of the family have arisen in mammals and present a history of complex gene duplications and positive selection. All AID/APOBECs have a characteristic zinc-coordination motif, which forms the core of the catalytic site. The crystal structure of human APOBEC2 shows remarkable similarities to that of the bacterial tRNA-editing enzyme TadA, which suggests a conserved mechanism by which polynucleotides are recognized and deaminated. The AID/APOBECs seem to have diverse roles. AID and the APOBEC3s are DNA mutators, acting in antigen-driven antibody diversification processes and in an innate defense system against retroviruses, respectively. APOBEC1 edits the mRNA for apolipoprotein B, a protein involved in lipid transport. A detailed understanding of the biological roles of the family is still some way off, however, and the functions of some members of the family are completely unknown. Given their ability to mutate DNA, a role for the AID/APOBECs in the onset of cancer has been proposed.

499 citations


Journal ArticleDOI
TL;DR: The panoply of antimicrobial drug resistance genes and mobile genetic elements found suggests that the organism can act as a reservoir of antimicrobial drug resistance determinants in a clinical environment, which is an issue of considerable concern.
Abstract: Background: Stenotrophomonas maltophilia is a nosocomial opportunistic pathogen of the Xanthomonadaceae. The organism has been isolated from both clinical and soil environments in addition to the sputum of cystic fibrosis patients and the immunocompromised. Whilst relatively distant phylogenetically, the closest sequenced relatives of S. maltophilia are the plant pathogenic xanthomonads.

486 citations


Journal ArticleDOI
TL;DR: It is demonstrated that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
Abstract: Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
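The F scores quoted in the abstract above are the standard harmonic-mean combination of precision and recall. As a reminder of how such a score is computed, here is a small sketch; the precision and recall values in the usage lines are made up for illustration and are not results from the BioCreative II evaluation, and the function name is hypothetical.

```python
def f_score(precision, recall, beta=1.0):
    """Return the F-beta score; beta=1 gives F1, the harmonic mean.

    Returns 0.0 when both precision and recall are zero, which avoids
    a division by zero for a system that finds nothing correct.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denom = beta * beta * precision + recall
    if denom == 0:
        return 0.0
    return (1 + beta * beta) * precision * recall / denom


# Illustrative values only (not taken from the workshop results).
balanced = f_score(0.5, 0.5)
skewed = f_score(0.9, 0.6)
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a system cannot reach a high F1 by maximizing precision or recall alone.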

Journal ArticleDOI
TL;DR: Tissue and cellular investigations, driven by the analysis of transcriptional interactions, revealed an increased amount of interstitial fibrosis in obese WAT, associated with an infiltration of different types of inflammatory cells, and suggest that phenotypic alterations of human pre-adipocytes, induced by a pro-inflammatory environment, may lead to an excessive synthesis of ECM components.
Abstract: Background: Investigations performed in mice and humans have acknowledged obesity as a low-grade inflammatory disease. Several molecular mechanisms have been convincingly shown to be involved in activating inflammatory processes and altering cell composition in white adipose tissue (WAT). However, the overall importance of these alterations, and their long-term impact on the metabolic functions of the WAT and on its morphology, remain unclear. Results: Here, we analyzed the transcriptomic signature of the subcutaneous WAT in obese human subjects, in stable weight conditions and after weight loss following bariatric surgery. An original integrative functional genomics approach was applied to quantify relations between relevant structural and functional themes annotating differentially expressed genes in order to construct a comprehensive map of transcriptional interactions defining the obese WAT. These analyses highlighted a significant up-regulation of genes and biological themes related to extracellular matrix (ECM) constituents, including members of the integrin family, and suggested that these elements could play a major mediating role in a chain of interactions that connect local inflammatory phenomena to the alteration of WAT metabolic functions in obese subjects. Tissue and cellular investigations, driven by the analysis of transcriptional interactions, revealed an increased amount of interstitial fibrosis in obese WAT, associated with an infiltration of different types of inflammatory cells, and suggest that phenotypic alterations of human pre-adipocytes,

Journal ArticleDOI
TL;DR: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to human experts, at over 90% agreement, which show promise as tools to link the literature with biological databases.
Abstract: Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated

Journal ArticleDOI
TL;DR: The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.
Abstract: Background: The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results: We present a 10X draft sequence of the P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved

Journal ArticleDOI
TL;DR: The generality of molecular signatures for environmental adaptation of extreme salt-loving organisms, demonstrated in the present study, advocates the convergent evolution of halophilic species towards specific genome and amino acid composition, irrespective of their varying GC-bias and widely disparate taxonomic positions.
Abstract: Halophilic prokaryotes are adapted to thrive in extreme conditions of salinity. Identification and analysis of distinct macromolecular characteristics of halophiles provide insight into the factors responsible for their adaptation to high-salt environments. The current report presents an extensive and systematic comparative analysis of genome and proteome composition of halophilic and non-halophilic microorganisms, with a view to identify such macromolecular signatures of haloadaptation. Comparative analysis of the genomes and proteomes of halophiles and non-halophiles reveals some common trends in halophiles that transcend the boundary of phylogenetic relationship and the genomic GC-content of the species. At the protein level, halophilic species are characterized by low hydrophobicity, over-representation of acidic residues, especially Asp, under-representation of Cys, lower propensities for helix formation and higher propensities for coil structure. At the DNA level, the dinucleotide abundance profiles of halophilic genomes bear some common characteristics, which are quite distinct from those of non-halophiles, and hence may be regarded as specific genomic signatures for salt-adaptation. The synonymous codon usage in halophiles also exhibits similar patterns regardless of their long-term evolutionary history. The generality of molecular signatures for environmental adaptation of extreme salt-loving organisms, demonstrated in the present study, advocates the convergent evolution of halophilic species towards specific genome and amino acid composition, irrespective of their varying GC-bias and widely disparate taxonomic positions. The adapted features of halophiles seem to be related to physical principles governing DNA and protein stability, in response to the extreme environmental conditions under which they thrive.
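The protein-level trends listed in the abstract above (over-representation of acidic residues, especially Asp, and under-representation of Cys) can be checked on any sequence by counting residues. The sketch below is a minimal illustration of that bookkeeping, not the authors' analysis pipeline; the function name and the toy sequence are hypothetical.

```python
from collections import Counter

ACIDIC = {"D", "E"}  # Asp and Glu in one-letter amino acid codes


def composition_summary(protein):
    """Return the fractions of acidic, Asp, and Cys residues in a sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    return {
        "acidic": sum(counts[a] for a in ACIDIC) / n,
        "asp": counts["D"] / n,
        "cys": counts["C"] / n,
    }


# Toy four-residue sequence, for illustration only.
summary = composition_summary("DDEC")
```

Comparing such fractions between proteomes of halophiles and non-halophiles is the kind of comparison the abstract describes, although the study's actual statistics are not given here.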

Journal ArticleDOI
TL;DR: The miR-17-5p microRNA is able to act as both an oncogene and a tumor suppressor in different cellular contexts; the model of competing positive and negative signals can explain both of these activities.
Abstract: Background: MicroRNAs are modifiers of gene expression, acting to reduce translation through either translational repression or mRNA cleavage. Recently, it has been shown that some microRNAs can act to promote or suppress cell transformation, with miR-17-92 described as the first oncogenic microRNA. The association of miR-17-92 encoded microRNAs with a surprisingly broad range of cancers not only underlines the clinical significance of this locus, but also suggests that miR-17-92 may regulate fundamental biological processes, and for these reasons miR-17-92 has been considered as a therapeutic target. Results: In this study, we show that miR-17-92 is a cell cycle regulated locus, and ectopic expression of a single microRNA (miR-17-5p) is sufficient to drive a proliferative signal in HEK293T cells. For the first time, we reveal the mechanism behind this response - miR-17-5p acts specifically at the G1/S-phase cell cycle boundary, by targeting more than 20 genes involved in the transition between these phases. While both pro- and anti-proliferative genes are targeted by miR-17-5p, pro-proliferative mRNAs are specifically up-regulated by secondary and/or tertiary effects in HEK293T cells. Conclusion: The miR-17-5p microRNA is able to act as both an oncogene and a tumor suppressor in different cellular contexts; our model of competing positive and negative signals can explain both of these activities. The coordinated suppression of proliferation-inhibitors allows miR-17-5p to efficiently de-couple negative regulators of the MAPK (mitogen activated protein kinase) signaling cascade, promoting growth in HEK293T cells. Additionally, we have demonstrated the utility of a systems biology approach as a unique and rapid approach to uncover microRNA function.

Journal ArticleDOI
TL;DR: A method for the comparison of mRNA expression levels of most human genes across 9,783 Affymetrix gene expression array experiments representing 43 normal human tissue types, 68 cancer types, and 64 other diseases is developed.
Abstract: Our knowledge on tissue- and disease-specific functions of human genes is rather limited and highly context-specific. Here, we have developed a method for the comparison of mRNA expression levels of most human genes across 9,783 Affymetrix gene expression array experiments representing 43 normal human tissue types, 68 cancer types, and 64 other diseases. This database of gene expression patterns in normal human tissues and pathological conditions covers 113 million datapoints and is available from the GeneSapiens website.

Journal ArticleDOI
TL;DR: This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in the automated validation pipeline, called amosvalidate, which is demonstrated in both bacterial and eukaryotic genome assemblies.
Abstract: We present the first collection of tools aimed at automated genome assembly validation. This work formalizes several mechanisms for detecting mis-assemblies, and describes their implementation in our automated validation pipeline, called amosvalidate. We demonstrate the application of our pipeline in both bacterial and eukaryotic genome assemblies, and highlight several assembly errors in both draft and finished genomes. The software described is compatible with common assembly formats and is released, open-source, at http://amos.sourceforge.net.

Journal ArticleDOI
TL;DR: The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific for each of the basic steps of the PPI extraction pipeline, and challenges identified range from problems in full-text format conversion of articles to difficulties in detecting interactor protein pairs and then linking them to their database records.
Abstract: Background: The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques as well as with manual curation was missing.

Journal ArticleDOI
TL;DR: It is proposed that while members of a given marine Synechococcus lineage may have the same broad geographical distribution, local niche occupancy is facilitated by lateral gene transfers, a process in which genomic islands play a key role as a repository for transferred genes.
Abstract: Background: The picocyanobacterial genus Synechococcus occurs over wide oceanic expanses, having colonized most available niches in the photic zone. Large scale distribution patterns of the different Synechococcus clades (based on 16S rRNA gene markers) suggest the occurrence of two major lifestyles ('opportunists'/'specialists'), corresponding to two distinct broad habitats ('coastal'/'open ocean'). Yet, the genetic basis of niche partitioning is still poorly understood in this ecologically important group.

Journal ArticleDOI
TL;DR: This analysis shows that protein secretion by L. donovani is a heterogeneous process that is unlikely to be determined by a classical amino-terminal secretion signal; as an alternative, L. donovani appears to use multiple nonclassical secretion pathways, including the release of exosome-like microvesicles.
Abstract: Background: Leishmania and other intracellular pathogens have evolved strategies that support invasion and persistence within host target cells. In some cases the underlying mechanisms involve the export of virulence factors into the host cell cytosol. Previous work from our laboratory identified one such candidate leishmania effector, namely elongation factor-1α, to be present in conditioned medium of infectious leishmania as well as within macrophage cytosol after infection. To investigate secretion of potential effectors more broadly, we used quantitative mass spectrometry to analyze the protein content of conditioned medium collected from cultures of stationary-phase promastigotes of Leishmania donovani, an agent of visceral leishmaniasis. Results: Analysis of leishmania conditioned medium resulted in the identification of 151 proteins apparently secreted by L. donovani. Ratios reflecting the relative amounts of each leishmania protein secreted, as compared to that remaining cell associated, revealed a hierarchy of protein secretion, with some proteins secreted to a greater extent than others. Comparison with an in silico approach defining proteins potentially exported along the classic eukaryotic secretion pathway suggested that few leishmania proteins are targeted for export using a classic eukaryotic amino-terminal secretion signal peptide. Unexpectedly, a large majority of known eukaryotic exosomal proteins was detected in leishmania conditioned medium, suggesting a vesicle-based secretion system. Conclusion: This analysis shows that protein secretion by L. donovani is a heterogeneous process that is unlikely to be determined by a classical amino-terminal secretion signal. As an alternative, L. donovani appears to use multiple nonclassical secretion pathways, including the release of exosome-like microvesicles.

Journal ArticleDOI
TL;DR: OG1RF's effects in experimental models suggest that mediators of virulence may be diverse between different E. faecalis strains and that virulence is not dependent on the presence of mobile genetic elements.
Abstract: Background: Enterococcus faecalis has emerged as a major hospital pathogen. To explore its diversity, we sequenced E. faecalis strain OG1RF, which is commonly used for molecular manipulation and virulence studies.

Journal ArticleDOI
TL;DR: G-Mo.R-Se (Gene Modelling using RNA-Seq), an approach aimed at building gene models directly from RNA-Seq, is presented and its utility is demonstrated on the grapevine genome.
Abstract: Next generation technologies enable massive-scale cDNA sequencing (so-called RNA-Seq). Mainly because of the difficulty of aligning short reads on exon-exon junctions, no attempts have been made so far to use RNA-Seq for building gene models de novo, that is, in the absence of a set of known genes and/or splicing events. We present G-Mo.R-Se (Gene Modelling using RNA-Seq), an approach aimed at building gene models directly from RNA-Seq and demonstrate its utility on the grapevine genome.

Journal ArticleDOI
TL;DR: The results show that currently available data for mammals allows predictions with both breadth and accuracy, and many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
Abstract: Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.

Journal ArticleDOI
TL;DR: The AAA+ superfamily is a large and functionally diverse superfamily of NTPases that are characterized by a conserved nucleotide-binding and catalytic module, the AAA+ module.
Abstract: The AAA+ superfamily is a large and functionally diverse superfamily of NTPases that are characterized by a conserved nucleotide-binding and catalytic module, the AAA+ module. Members are involved in an astonishing range of different cellular processes, attaining this functional diversity through additions of structural motifs and modifications to the core AAA+ module.

Journal ArticleDOI
TL;DR: This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications.
Abstract: Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet http://zope.bioinfo.cnio.es/bionlp_tools/.

Journal ArticleDOI
TL;DR: High expression of a subset of MYCN/c-MYC target genes identifies a patient subtype with poor overall survival independent of the established risk markers amplified MYCN, disease stage, and age at diagnosis.
Abstract: Amplified MYCN oncogene resulting in deregulated MYCN transcriptional activity is observed in 20% of neuroblastomas and identifies a highly aggressive subtype. In MYCN single-copy neuroblastomas, elevated MYCN mRNA and protein levels are paradoxically associated with a more favorable clinical phenotype, including disseminated tumors that subsequently regress spontaneously (stage 4s-non-amplified). In this study, we asked whether distinct transcriptional MYCN or c-MYC activities are associated with specific neuroblastoma phenotypes. We defined a core set of direct MYCN/c-MYC target genes by applying gene expression profiling and chromatin immunoprecipitation (ChIP, ChIP-chip) in neuroblastoma cells that allow conditional regulation of MYCN and c-MYC. Their transcript levels were analyzed in 251 primary neuroblastomas. Compared to localized-non-amplified neuroblastomas, MYCN/c-MYC target gene expression gradually increases from stage 4s-non-amplified through stage 4-non-amplified to MYCN amplified tumors. This was associated with MYCN activation in stage 4s-non-amplified and predominantly c-MYC activation in stage 4-non-amplified tumors. A defined set of MYCN/c-MYC target genes was induced in stage 4-non-amplified but not in stage 4s-non-amplified neuroblastomas. In line with this, high expression of a subset of MYCN/c-MYC target genes identifies a patient subtype with poor overall survival independent of the established risk markers amplified MYCN, disease stage, and age at diagnosis. High MYCN/c-MYC target gene expression is a hallmark of malignant neuroblastoma progression, which is predominantly driven by c-MYC in stage 4-non-amplified tumors. In contrast, moderate MYCN function gain in stage 4s-non-amplified tumors induces only a restricted set of target genes that is still compatible with spontaneous regression.

Journal ArticleDOI
TL;DR: An expression signature of 581 genes whose levels are significantly different in prostate cancer stem cells is described, which identified the JAK-STAT pathway and focal adhesion signaling as key processes in the biology of cancer stem cells.
Abstract: Background: The tumor-initiating capacity of many cancers is considered to reside in a small subpopulation of cells (cancer stem cells). We have previously shown that rare prostate epithelial cells with a CD133+/α2β1hi phenotype have the properties of prostate cancer stem cells. We have compared gene expression in these cells relative to their normal and differentiated (CD133-/α2β1low) counterparts, resulting in an informative cancer stem cell gene-expression signature.

Journal ArticleDOI
TL;DR: Gene-by-gene phylogenetic analysis showed that in C. globosum and M. grisea, the evolution of these ACE1-like clusters is characterized by successive complex duplication events including tandem duplication within the M. grisea cluster, and phylogenetic trees present evidence that at least five of the six genes in the homologous ACE1 gene cluster in A. clavatus originated by horizontal transfer from a donor closely related to M. grisea.
Abstract: Background: Filamentous fungi synthesize many secondary metabolites and are rich in genes encoding proteins involved in their biosynthesis. Genes from the same pathway are often clustered and co-expressed in particular conditions. Such secondary metabolism gene clusters evolve rapidly through multiple rearrangements, duplications and losses. It has long been suspected that clusters can be transferred horizontally between species, but few concrete examples have been described so far.

Journal ArticleDOI
TL;DR: Reliable orthology prediction is central to comparative genomics, and although orthology is defined by phylogenetic criteria, most automated prediction methods are based on pairwise sequence comparisons.
Abstract: Reliable orthology prediction is central to comparative genomics. Although orthology is defined by phylogenetic criteria, most automated prediction methods are based on pairwise sequence comparisons. Recently, automated phylogeny-based orthology prediction has emerged as a feasible alternative for genome-wide studies.