Showing papers in "Genome Biology in 2006"


Journal Article
TL;DR: CellProfiler, the first free, open-source system designed for flexible, high-throughput cell image analysis, is described; it can address a variety of biological questions quantitatively.
Abstract: Biologists can now prepare and image thousands of samples per day using automation, enabling chemical screens and functional genomics (for example, using RNA interference). Here we describe the first free, open-source system designed for flexible, high-throughput cell image analysis, CellProfiler. CellProfiler can address a variety of biological questions quantitatively, including standard assays (for example, cell count, size, per-cell protein levels) and complex morphological assays (for example, cell/organelle shape or subcellular patterns of DNA or protein staining).

4,578 citations


Journal Article
TL;DR: The majority of disease resistance genes in plants encode nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins, and their precise role in recognition is unknown; however, they are thought to monitor the status of plant proteins that are targeted by pathogen effectors.
Abstract: The majority of disease resistance genes in plants encode nucleotide-binding site leucine-rich repeat (NBS-LRR) proteins. This large family is encoded by hundreds of diverse genes per genome and can be subdivided into the functionally distinct TIR-domain-containing (TNL) and CC-domain-containing (CNL) subfamilies. Their precise role in recognition is unknown; however, they are thought to monitor the status of plant proteins that are targeted by pathogen effectors.

881 citations


Journal Article
TL;DR: Microtubule-associated proteins of the MAP2/Tau family include the vertebrate proteins MAP2, MAP4, and Tau and homologs in other animals and are best known for their microtubule-stabilizing activity and for proposed roles regulating microtubule networks in the axons and dendrites of neurons.
Abstract: Microtubule-associated proteins (MAPs) of the MAP2/Tau family include the vertebrate proteins MAP2, MAP4, and Tau and homologs in other animals. All three vertebrate members of the family have alternative splice forms; all isoforms share a conserved carboxy-terminal domain containing microtubule-binding repeats, and an amino-terminal projection domain of varying size. MAP2 and Tau are found in neurons, whereas MAP4 is present in many other tissues but is generally absent from neurons. Members of the family are best known for their microtubule-stabilizing activity and for proposed roles regulating microtubule networks in the axons and dendrites of neurons. Contrary to this simple, traditional view, accumulating evidence suggests a much broader range of functions, such as binding to filamentous (F) actin, recruitment of signaling proteins, and regulation of microtubule-mediated transport. Tau is also implicated in Alzheimer's disease and other dementias. The ability of MAP2 to interact with both microtubules and F-actin might be critical for neuromorphogenic processes, such as neurite initiation, during which networks of microtubules and F-actin are reorganized in a coordinated manner. Various upstream kinases and interacting proteins have been identified that regulate the microtubule-stabilizing activity of MAP2/Tau family proteins.

855 citations


Journal Article
TL;DR: The software and underlying methods for identifying these three important structural and functional genome components are reviewed and it is demonstrated that they can be effectively used for initial automatic annotation of the eukaryotic genome.
Abstract: The ENCODE gene prediction workshop (EGASP) has been organized to evaluate how well state-of-the-art automatic gene finding methods are able to reproduce the manual and experimental gene annotation of the human genome. We have used Softberry gene finding software to predict genes, pseudogenes and promoters in 44 selected ENCODE sequences representing approximately 1% (30 Mb) of the human genome. Predictions of gene finding programs were evaluated in terms of their ability to reproduce the ENCODE-HAVANA annotation. The Fgenesh++ gene prediction pipeline can identify 91% of coding nucleotides with a specificity of 90%. Our automatic pseudogene finder (PSF program) found 90% of the manually annotated pseudogenes and some new ones. The Fprom promoter prediction program identifies 80% of TATA promoter sequences with one false positive prediction per 2,000 base-pairs (bp) and 50% of TATA-less promoters with one false positive prediction per 650 bp. It can be used to identify transcription start sites upstream of annotated coding parts of genes found by gene prediction software. We review our software and underlying methods for identifying these three important structural and functional genome components and discuss the accuracy of predictions, recent advances and open problems in annotating genomic sequences. We have demonstrated that our methods can be effectively used for initial automatic annotation of the eukaryotic genome.

716 citations
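
Note on the accuracy figures quoted in the entry above: in gene-prediction benchmarks, nucleotide-level "sensitivity" is commonly the fraction of annotated coding nucleotides that were predicted, and "specificity" is commonly the fraction of predicted coding nucleotides that are annotated (what other fields call precision). The sketch below assumes those conventions; the function names and the toy intervals are hypothetical and are not taken from the Softberry software or the EGASP evaluation.

```python
def positions(intervals):
    """Expand half-open (start, end) intervals into a set of nucleotide positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed gene-finding convention:
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    tp = len(annotated & predicted)   # coding in both annotation and prediction
    fn = len(annotated - predicted)   # annotated coding, missed by the prediction
    fp = len(predicted - annotated)   # predicted coding, absent from the annotation
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity


# Toy example (hypothetical coordinates): 200 annotated and 200 predicted coding nucleotides.
annotated = positions([(100, 200), (300, 400)])
predicted = positions([(110, 200), (300, 390), (500, 520)])
sn, sp = nucleotide_accuracy(annotated, predicted)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Under these assumptions, a program can score high on one measure and low on the other, which is why the abstracts in this listing report both.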


Journal Article
TL;DR: The driving principles of AceView are described, along with how, by performing hand-supervised automatic annotation, it solves the combinatorial splicing problem and summarizes all of GenBank, dbEST and RefSeq into a genome-wide non-redundant but comprehensive cDNA-supported transcriptome.
Abstract: Regions covering one percent of the genome, selected by ENCODE for extensive analysis, were annotated by the HAVANA/Gencode group with high quality transcripts, thus defining a benchmark. The ENCODE Genome Annotation Assessment Project (EGASP) competition aimed at reproducing Gencode and finding new genes. The organizers evaluated the protein predictions in depth. We present a complementary analysis of the mRNAs, including alternative transcript variants. We evaluate 25 gene tracks from the University of California Santa Cruz (UCSC) genome browser. We either distinguish or collapse the alternative splice variants, and compare the genomic coordinates of exons, introns and nucleotides. Whole mRNA models, seen as chains of introns, are sorted to find the best matching pairs, and compared so that each mRNA is used only once. At the mRNA level, AceView is by far the closest to Gencode: the vast majority of transcripts of the two methods, including alternative variants, are identical. At the protein level, however, due to a lack of experimental data, our predictions differ: Gencode annotates proteins in only 41% of the mRNAs whereas AceView does so in virtually all. We describe the driving principles of AceView, and how, by performing hand-supervised automatic annotation, we solve the combinatorial splicing problem and summarize all of GenBank, dbEST and RefSeq into a genome-wide non-redundant but comprehensive cDNA-supported transcriptome. AceView accuracy is now validated by Gencode. Relative to a consensus mRNA catalog constructed from all evidence-based annotations, Gencode and AceView have 81% and 84% sensitivity, and 74% and 73% specificity, respectively. This close agreement validates a richer view of the human transcriptome, with three to five times more transcripts than in UCSC Known Genes (sensitivity 28%), RefSeq (sensitivity 21%) or Ensembl (sensitivity 19%).

657 citations


Journal Article
TL;DR: The analysis provides a high-confidence set of proteins present in human urinary proteome and provides a useful reference for comparing datasets obtained using different methodologies and may prove useful in biomarker discovery in the future.
Abstract: Urine is a desirable material for the diagnosis and classification of diseases because of the convenience of its collection in large amounts; however, all of the urinary proteome catalogs currently being generated have limitations in their depth and confidence of identification. Our laboratory has developed methods for the in-depth characterization of body fluids; these involve a linear ion trap-Fourier transform (LTQ-FT) and a linear ion trap-orbitrap (LTQ-Orbitrap) mass spectrometer. Here we applied these methods to the analysis of the human urinary proteome. We employed one-dimensional sodium dodecyl sulfate polyacrylamide gel electrophoresis and reverse phase high-performance liquid chromatography for protein separation and fractionation. Fractionated proteins were digested in-gel or in-solution, and digests were analyzed with the LTQ-FT and LTQ-Orbitrap at parts per million accuracy and with two consecutive stages of mass spectrometric fragmentation. We identified 1543 proteins in urine obtained from ten healthy donors, while essentially eliminating false-positive identifications. Surprisingly, nearly half of the annotated proteins were membrane proteins according to Gene Ontology (GO) analysis. Furthermore, extracellular, lysosomal, and plasma membrane proteins were enriched in the urine compared with all GO entries. Plasma membrane proteins are probably present in urine by secretion in exosomes. Our analysis provides a high-confidence set of proteins present in human urinary proteome and provides a useful reference for comparing datasets obtained using different methodologies. The urinary proteome is unexpectedly complex and may prove useful in biomarker discovery in the future.

647 citations


Journal Article
TL;DR: The comprehensiveness of the GENCODE annotation was assessed by attempting to validate all the predicted exon boundaries outside the GENCODE annotation, which showed only 40% of GENCODE exons are contained within the two sets, which is a reflection of the high number of alternative splice forms with unique exons annotated.
Abstract: Background: The GENCODE consortium was formed to identify and map all protein-coding genes within the ENCODE regions. This was achieved by a combination of initial manual annotation by the HAVANA team, experimental validation by the GENCODE consortium and a refinement of the annotation based on these experimental results.

619 citations


Journal Article
TL;DR: Serpins are a broadly distributed family of protease inhibitors that use a conformational change to inhibit target enzymes; they are central in controlling many important proteolytic cascades, including the mammalian coagulation pathways.
Abstract: Serpins are a broadly distributed family of protease inhibitors that use a conformational change to inhibit target enzymes. They are central in controlling many important proteolytic cascades, including the mammalian coagulation pathways. Serpins are conformationally labile and many of the disease-linked mutations of serpins result in misfolding or in pathogenic, inactive polymers.

601 citations


Journal Article
TL;DR: The genomic organization of the chemokine ligand genes and a comparison of their sequences between species shows that tandem gene duplication has taken place independently in the mouse and human lineages of some chemokine families.
Abstract: The human chemokine superfamily currently includes at least 46 ligands, which bind to 18 functionally signaling G-protein-coupled receptors and two decoy or scavenger receptors. The chemokine ligands probably comprise one of the first completely known molecular superfamilies. The genomic organization of the chemokine ligand genes and a comparison of their sequences between species shows that tandem gene duplication has taken place independently in the mouse and human lineages of some chemokine families. This means that care needs to be taken when extrapolating experimental results on some chemokines from mouse to human.

582 citations


Journal Article
TL;DR: It is proposed that virulence in this organism is both multifactorial and combinatorial, the result of a pool of pathogenicity-related genes that interact in various combinations in different genetic backgrounds.
Abstract: Background: Pseudomonas aeruginosa is a ubiquitous environmental bacterium and an important opportunistic human pathogen. Generally, the acquisition of genes in the form of pathogenicity islands distinguishes pathogenic isolates from nonpathogens. We therefore sequenced a highly virulent strain of P. aeruginosa, PA14, and compared it with a previously sequenced (and less pathogenic) strain, PAO1, to identify novel virulence genes. Results: The PA14 and PAO1 genomes are remarkably similar, although PA14 has a slightly larger genome (6.5 megabases [Mb]) than does PAO1 (6.3 Mb). We identified 58 PA14 gene clusters that are absent in PAO1 to determine which of these genes, if any, contribute to its enhanced virulence in a Caenorhabditis elegans pathogenicity model. First, we tested 18 additional diverse strains in the C. elegans model and observed a wide range of pathogenic potential; however, genotyping these strains using a custom microarray showed that the presence of PA14 genes that are absent in PAO1 did not correlate with the virulence of these strains. Second, we utilized a full-genome nonredundant mutant library of PA14 to identify five genes (absent in PAO1) required for C. elegans killing. Surprisingly, although these five genes are present in many other P. aeruginosa strains, they do not correlate with virulence in C. elegans. Conclusion: Genes required for pathogenicity in one strain of P. aeruginosa are neither required for nor predictive of virulence in other strains. We therefore propose that virulence in this organism is both multifactorial and combinatorial, the result of a pool of pathogenicity-related genes that interact in various combinations in different genetic backgrounds.

552 citations


Journal Article
TL;DR: The Inferelator uses regression and variable selection to identify transcriptional influences on genes based on the integration of genome annotation and expression data, and successfully predicted Halobacterium's global expression under novel perturbations with predictive power similar to that seen over training data.
Abstract: We present a method (the Inferelator) for deriving genome-wide transcriptional regulatory interactions, and apply the method to predict a large portion of the regulatory network of the archaeon Halobacterium NRC-1. The Inferelator uses regression and variable selection to identify transcriptional influences on genes based on the integration of genome annotation and expression data. The learned network successfully predicted Halobacterium's global expression under novel perturbations with predictive power similar to that seen over training data. Several specific regulatory predictions were experimentally tested and verified.

Journal Article
TL;DR: The genome of R. leguminosarum can be considered to have two main components: a 'core', which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an 'accessory' component, which is sporadic in distribution, lower in G+C, and located on the plasmids and chromosomal islands.
Abstract: Rhizobium leguminosarum is an α-proteobacterial N2-fixing symbiont of legumes that has been the subject of more than a thousand publications. Genes for the symbiotic interaction with plants are well studied, but the adaptations that allow survival and growth in the soil environment are poorly understood. We have sequenced the genome of R. leguminosarum biovar viciae strain 3841. The 7.75 Mb genome comprises a circular chromosome and six circular plasmids, with 61% G+C overall. All three rRNA operons and 52 tRNA genes are on the chromosome; essential protein-encoding genes are largely chromosomal, but most functional classes occur on plasmids as well. Of the 7,263 protein-encoding genes, 2,056 had orthologs in each of three related genomes (Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti), and these genes were over-represented in the chromosome and had above average G+C. Most supported the rRNA-based phylogeny, confirming A. tumefaciens to be the closest among these relatives, but 347 genes were incompatible with this phylogeny; these were scattered throughout the genome but were over-represented on the plasmids. An unexpectedly large number of genes were shared by all three rhizobia but were missing from A. tumefaciens. Overall, the genome can be considered to have two main components: a 'core', which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an 'accessory' component, which is sporadic in distribution, lower in G+C, and located on the plasmids and chromosomal islands. The accessory genome has a different nucleotide composition from the core despite a long history of coexistence.

Journal Article
TL;DR: The full yeast protein-protein interaction network is estimated to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively.
Abstract: We estimate the full yeast protein-protein interaction network to contain 37,800-75,500 interactions and the human network 154,000-369,000, but owing to a high false-positive rate, current maps are roughly only 50% and 10% complete, respectively. Paradoxically, releasing raw, unfiltered assay data might help separate true from false interactions.

Journal Article
TL;DR: Data suggest radiation of genes encoding remodeling and virulence factors from a small number of loci in a common Plasmodium ancestor, and imply a closer phylogenetic relationship between the P. vivax and P. falciparum lineages than previously believed.
Abstract: The apicomplexan parasite Plasmodium falciparum causes the most severe form of malaria in humans. After invasion into erythrocytes, asexual parasite stages drastically alter their host cell and export remodeling and virulence proteins. Previously, we have reported identification and functional analysis of a short motif necessary for export of proteins out of the parasite and into the red blood cell. We have developed software for the prediction of exported proteins in the genus Plasmodium, and identified exported proteins conserved between malaria parasites infecting rodents and the two major causes of human malaria, P. falciparum and P. vivax. This conserved 'exportome' is confined to a few subtelomeric chromosomal regions in P. falciparum and the synteny of these and surrounding regions is conserved in P. vivax. We have identified a novel gene family PHIST (for Plasmodium helical interspersed subtelomeric family) that shares a unique domain with 72 paralogs in P. falciparum and 39 in P. vivax; however, there is only one member in each of the three species studied from the P. berghei lineage. These data suggest radiation of genes encoding remodeling and virulence factors from a small number of loci in a common Plasmodium ancestor, and imply a closer phylogenetic relationship between the P. vivax and P. falciparum lineages than previously believed. The presence of a conserved 'exportome' in the genus Plasmodium has important implications for our understanding of both common mechanisms and species-specific differences in host-parasite interactions, and may be crucial in developing novel antimalarial drugs to this infectious disease.

Journal Article
TL;DR: Insect PGRPs activate the Toll or immune deficiency (Imd) signal transduction pathways or induce proteolytic cascades that generate antimicrobial products, induce phagocytosis, hydrolyze peptidoglycan, and protect insects against infections.
Abstract: Peptidoglycan recognition proteins (PGRPs) are innate immunity molecules present in insects, mollusks, echinoderms, and vertebrates, but not in nematodes or plants. PGRPs have at least one carboxy-terminal PGRP domain (approximately 165 amino acids long), which is homologous to bacteriophage and bacterial type 2 amidases. Insects have up to 19 PGRPs, classified into short (S) and long (L) forms. The short forms are present in the hemolymph, cuticle, and fat-body cells, and sometimes in epidermal cells in the gut and hemocytes, whereas the long forms are mainly expressed in hemocytes. The expression of insect PGRPs is often upregulated by exposure to bacteria. Insect PGRPs activate the Toll or immune deficiency (Imd) signal transduction pathways or induce proteolytic cascades that generate antimicrobial products, induce phagocytosis, hydrolyze peptidoglycan, and protect insects against infections. Mammals have four PGRPs, which are secreted; it is not clear whether any are directly orthologous to the insect PGRPs. One mammalian PGRP, PGLYRP-2, is an N-acetylmuramoyl-L-alanine amidase that hydrolyzes bacterial peptidoglycan and reduces its proinflammatory activity; PGLYRP-2 is secreted from the liver into the blood and is also induced by bacteria in epithelial cells. The three remaining mammalian PGRPs are bactericidal proteins that are secreted as disulfide-linked homo- and hetero-dimers. PGLYRP-1 is expressed primarily in polymorphonuclear leukocyte granules and PGLYRP-3 and PGLYRP-4 are expressed in the skin, eyes, salivary glands, throat, tongue, esophagus, stomach, and intestine. These three proteins kill bacteria by interacting with cell wall peptidoglycan, rather than permeabilizing bacterial membranes as other antibacterial peptides do. Direct bactericidal activity of these PGRPs either evolved in the vertebrate (or mammalian) lineage or is yet to be discovered in insects.

Journal Article
TL;DR: The results indicate that multiple and repeated domains are enriched in hub proteins and, further, that long disordered regions, which are common in date hubs, are particularly important for flexible binding.
Abstract: Most proteins interact with only a few other proteins while a small number of proteins (hubs) have many interaction partners. Hub proteins and non-hub proteins differ in several respects; however, understanding is not complete about what properties characterize the hubs and set them apart from proteins of low connectivity. Therefore, we have investigated what differentiates hubs from non-hubs and static hubs (party hubs) from dynamic hubs (date hubs) in the protein-protein interaction network of Saccharomyces cerevisiae. The many interactions of hub proteins can only partly be explained by bindings to similar proteins or domains. It is evident that domain repeats, which are associated with binding, are enriched in hubs. Moreover, there is an over representation of multi-domain proteins and long proteins among the hubs. In addition, there are clear differences between party hubs and date hubs. Fewer of the party hubs contain long disordered regions compared to date hubs, indicating that these regions are important for flexible binding but less so for static interactions. Furthermore, party hubs interact to a large extent with each other, supporting the idea of party hubs as the cores of highly clustered functional modules. In addition, hub proteins, and in particular party hubs, are more often ancient. Finally, the more recent paralogs of party hubs are underrepresented. Our results indicate that multiple and repeated domains are enriched in hub proteins and, further, that long disordered regions, which are common in date hubs, are particularly important for flexible binding.

Journal Article
TL;DR: This high-confidence characterization of seminal plasma content provides an inventory of proteins with potential roles in fertilization and should be useful for studies of fertilization, male infertility, and prostatic and testicular cancers.
Abstract: Background: The development of mass spectrometric (MS) techniques now allows the investigation of very complex protein mixtures ranging from subcellular structures to tissues. Body fluids are also popular targets of proteomic analysis because of their potential for biomarker discovery. Seminal plasma has not yet received much attention from the proteomics community but its characterization could provide a future reference for virtually all studies involving human sperm. The fluid is essential for the survival of spermatozoa and their successful journey through the female reproductive tract. Results: Here we report the high-confidence identification of 923 proteins in seminal fluid from a single individual. Fourier transform MS enabled parts per million mass accuracy, and two consecutive stages of MS fragmentation allowed confident identification of proteins even by single peptides. Analysis with GoMiner annotated two-thirds of the seminal fluid proteome and revealed a large number of extracellular proteins including many proteases. Other proteins originated from male accessory glands and have important roles in spermatozoan survival. Conclusion: This high-confidence characterization of seminal plasma content provides an inventory of proteins with potential roles in fertilization. When combined with quantitative proteomics methodologies, it should be useful for studies of fertilization, male infertility, and prostatic and testicular cancers.

Journal Article
TL;DR: Evidence is provided that massive gene duplication (probably as a consequence of entire genome duplications) at the dawn of vertebrate evolution might have been particularly important for the evolution of complex vertebrates.
Abstract: Background: Gene duplication is assumed to have played a crucial role in the evolution of vertebrate organisms. Apart from a continuous mode of duplication, two or three whole genome duplication events have been proposed during the evolution of vertebrates, one or two at the dawn of vertebrate evolution, and an additional one in the fish lineage, not shared with land vertebrates. Here, we have studied gene gain and loss in seven different vertebrate genomes, spanning an evolutionary period of about 600 million years. Results: We show that: first, the majority of duplicated genes in extant vertebrate genomes are ancient and were created at times that coincide with proposed whole genome duplication events; second, there exist significant differences in gene retention for different functional categories of genes between fishes and land vertebrates; third, there seems to be a considerable bias in gene retention of regulatory genes towards the mode of gene duplication (whole genome duplication events compared to smaller-scale events), which is in accordance with the so-called gene balance hypothesis; and fourth, that ancient duplicates that have survived for many hundreds of millions of years can still be lost. Conclusion: Based on phylogenetic analyses, we show that both the mode of duplication and the functional class the duplicated genes belong to have been of major importance for the evolution of the vertebrates. In particular, we provide evidence that massive gene duplication (probably as a consequence of entire genome duplications) at the dawn of vertebrate evolution might have been particularly important for the evolution of complex vertebrates.

Journal Article
TL;DR: Interplay between proteases and protease inhibitors, and between oxidative reactions, is an important feature of the ocular environment and identification of a large set of proteins participating in these reactions may allow discovery of molecular markers of disease conditions of the eye.
Abstract: The tear film is a thin layer of fluid that covers the ocular surface and is involved in lubrication and protection of the eye. Little is known about the protein composition of tear fluid but its deregulation is associated with disease states, such as diabetic dry eyes. This makes this body fluid an interesting candidate for in-depth proteomic analysis. In this study, we employ state-of-the-art mass spectrometric identification, using both a hybrid linear ion trap-Fourier transform (LTQ-FT) and a linear ion trap-Orbitrap (LTQ-Orbitrap) mass spectrometer, and high confidence identification by two consecutive stages of peptide fragmentation (MS/MS/MS or MS3), to characterize the protein content of the tear fluid. Low microliter amounts of tear fluid samples were either pre-fractionated with one-dimensional SDS-PAGE and digested in situ with trypsin, or digested in solution. Five times more proteins were detected after gel electrophoresis compared to in solution digestion (320 versus 63 proteins). Ontology classification revealed that 64 of the identified proteins are proteases or protease inhibitors. Of these, only 24 have previously been described as components of the tear fluid. We also identified 18 anti-oxidant enzymes, which protect the eye from harmful consequences of its exposure to oxygen. Only two proteins with this activity have been previously described in the literature. Interplay between proteases and protease inhibitors, and between oxidative reactions, is an important feature of the ocular environment. Identification of a large set of proteins participating in these reactions may allow discovery of molecular markers of disease conditions of the eye.

Journal Article
TL;DR: The results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
Abstract: Background: We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a ‘reference set’ of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of

Journal Article
TL;DR: The results indicate that RNA editing increases the diversity of miRNAs and their targets, and hence may modulate miRNA function.
Abstract: Background: MicroRNAs (miRNAs) are short RNAs of around 22 nucleotides that regulate gene expression. The primary transcripts of miRNAs contain double-stranded RNA and are therefore potential substrates for adenosine to inosine (A-to-I) RNA editing. Results: We have conducted a survey of RNA editing of miRNAs from ten human tissues by sequence comparison of PCR products derived from matched genomic DNA and total cDNA from the same individual. Six out of 99 (6%) miRNA transcripts from which data were obtained were subject to A-to-I editing in at least one tissue. Four out of seven edited adenosines were in the mature miRNA and were predicted to change the target sites in 3' untranslated regions. For a further six miRNAs, we identified A-to-I editing of transcripts derived from the opposite strand of the genome to the annotated miRNA. These miRNAs may have been annotated to the wrong genomic strand. Conclusion: Our results indicate that RNA editing increases the diversity of miRNAs and their targets, and hence may modulate miRNA function.

Journal Article
TL;DR: Rapid diurnal changes in transcript levels are integrated over time to generate quasi-stable changes across large sectors of metabolism, which implies that correlations between metabolites and transcripts are due to regulation of gene expression by metabolites, rather than metabolites being changed as a consequence of a change in gene expression.
Abstract: Background: Genome-wide transcript profiling and analyses of enzyme activities from central carbon and nitrogen metabolism show that transcript levels undergo marked and rapid changes during diurnal cycles and after transfer to darkness, whereas changes in activities are smaller and delayed. In the starchless pgm mutant, where sugars are depleted every night, there are accentuated diurnal changes in transcript levels. Enzyme activities in this mutant do not show larger diurnal changes; instead, they shift towards the levels found in the wild type after several days of darkness. This indicates that enzyme activities change slowly, integrating the changes in transcript levels over several diurnal cycles.

Journal ArticleDOI
TL;DR: Two significant evolutionary processes are fundamentally not tree-like in nature - lateral gene transfer among prokaryotes and endosymbiotic gene transfer (from organelles) among eukaryotes - and biologists need to depart from the preconceived notion that all genomes are related by a single bifurcating tree.
Abstract: Two significant evolutionary processes are fundamentally not tree-like in nature - lateral gene transfer among prokaryotes and endosymbiotic gene transfer (from organelles) among eukaryotes. To incorporate such processes into the bigger picture of early evolution, biologists need to depart from the preconceived notion that all genomes are related by a single bifurcating tree.

Journal ArticleDOI
TL;DR: Analysis techniques for generating high-confidence quantitative epistasis scores from measurements made using synthetic genetic array and epistatic miniarray profile (E-MAP) technology are presented, as well as several tools for higher-level analysis of the resulting data that are greatly enhanced by the quantitative score and detection of alleviating interactions.
Abstract: Recently, approaches have been developed for high-throughput identification of synthetic sick/lethal gene pairs. However, these are only a specific example of the broader phenomenon of epistasis, wherein the presence of one mutation modulates the phenotype of another. We present analysis techniques for generating high-confidence quantitative epistasis scores from measurements made using synthetic genetic array and epistatic miniarray profile (E-MAP) technology, as well as several tools for higher-level analysis of the resulting data that are greatly enhanced by the quantitative score and detection of alleviating interactions.

Journal ArticleDOI
TL;DR: A method that integrates all steps to generate a scored phenotype list from raw data is described that is useful for the analysis and documentation of individual RNAi screens and is a prerequisite for the integration of multiple experiments.
Abstract: RNA interference (RNAi) screening is a powerful technology for functional characterization of biological pathways. Interpretation of RNAi screens requires computational and statistical analysis techniques. We describe a method that integrates all steps to generate a scored phenotype list from raw data. It is implemented in an open-source Bioconductor/R package, cellHTS (http://www.dkfz.de/signaling/cellHTS). The method is useful for the analysis and documentation of individual RNAi screens. Moreover, it is a prerequisite for the integration of multiple experiments.

Journal ArticleDOI
TL;DR: AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools and is very flexible because it can take information from several sources simultaneously into consideration.
Abstract: Background: A large number of gene prediction programs for the human genome exist. These annotation tools use a variety of methods and data sources. In the recent ENCODE genome annotation assessment project (EGASP), some of the most commonly used and recently developed gene-prediction programs were systematically evaluated and compared on test data from the human genome. AUGUSTUS was among the tools that were tested in this project. Results: AUGUSTUS can be used as an ab initio program, that is, as a program that uses only one single genomic sequence as input information. In addition, it is able to combine information from the genomic sequence under study with external hints from various sources of information. For EGASP, we used genomic sequence alignments as well as alignments to expressed sequence tags (ESTs) and protein sequences as additional sources of information. Within the category of ab initio programs AUGUSTUS predicted significantly more genes correctly than any other ab initio program. At the same time it predicted the smallest number of false positive genes and the smallest number of false positive exons among all ab initio programs. The accuracy of AUGUSTUS could be further improved when additional extrinsic data, such as alignments to EST, protein and/or genomic sequences, was taken into account. Conclusions: AUGUSTUS turned out to be the most accurate ab initio gene finder among the tested tools. Moreover it is very flexible because it can take information from several sources simultaneously into consideration.

Journal ArticleDOI
TL;DR: Advanced mass spectrometry methods can unambiguously identify more than 2,000 proteins in a single proteome, including very low abundant ones, and substantially increased coverage of the yeast proteome appears feasible with further development in software and instrumentation.
Abstract: Mass spectrometry has become a powerful tool for the analysis of large numbers of proteins in complex samples, enabling much of proteomics. Due to various analytical challenges, so far no proteome has been sequenced completely. O'Shea, Weissman and co-workers have recently determined the copy number of yeast proteins, making this proteome an excellent model system to study factors affecting coverage. To probe the yeast proteome in depth and determine factors currently preventing complete analysis, we grew yeast cells, extracted proteins and separated them by one-dimensional gel electrophoresis. Peptides resulting from trypsin digestion were analyzed by liquid chromatography mass spectrometry on a linear ion trap-Fourier transform mass spectrometer with very high mass accuracy and sequencing speed. We achieved unambiguous identification of more than 2,000 proteins, including very low abundant ones. Effective dynamic range was limited to about 1,000 and effective sensitivity to about 500 femtomoles, far from the subfemtomole sensitivity possible with single proteins. We used SILAC (stable isotope labeling by amino acids in cell culture) to generate one-to-one pairs of true peptide signals and investigated if sensitivity, sequencing speed or dynamic range were limiting the analysis. Advanced mass spectrometry methods can unambiguously identify more than 2,000 proteins in a single proteome. Complex mixture analysis is not limited by sensitivity but by a combination of dynamic range (high abundance peptides preventing sequencing of low abundance ones) and by effective sequencing speed. Substantially increased coverage of the yeast proteome appears feasible with further development in software and instrumentation.

Journal ArticleDOI
TL;DR: In this article, the authors show that organisms with CENs resembling those in S. cerevisiae are very closely related and that all contain a set of 11 kinetochore proteins not found in organisms with complex CENs.
Abstract: Background: Kinetochores are large multi-protein structures that assemble on centromeric DNA (CEN DNA) and mediate the binding of chromosomes to microtubules. Comprising 125 base-pairs of CEN DNA and 70 or more protein components, Saccharomyces cerevisiae kinetochores are among the best understood. In contrast, most fungal, plant and animal cells assemble kinetochores on CENs that are longer and more complex, raising the question of whether kinetochore architecture has been conserved through evolution, despite considerable divergence in CEN sequence. Results: Using computational approaches, ranging from sequence similarity searches to hidden Markov model-based modeling, we show that organisms with CENs resembling those in S. cerevisiae (point CENs) are very closely related and that all contain a set of 11 kinetochore proteins not found in organisms with complex CENs. Conversely, organisms with complex CENs (regional CENs) contain proteins seemingly absent from point-CEN organisms. However, at least three quarters of known kinetochore proteins are present in all fungi regardless of CEN organization. At least six of these proteins have previously unidentified human orthologs. When fungi and metazoa are compared, almost all have kinetochores constructed around Spc105 and three conserved multiprotein linker complexes (MIND, COMA, and the NDC80 complex). Conclusion: Our data suggest that critical structural features of kinetochores have been well conserved from yeast to man. Surprisingly, phylogenetic analysis reveals that human kinetochore proteins are as similar in sequence to their yeast counterparts as to presumptive Drosophila melanogaster or Caenorhabditis elegans orthologs. This finding is consistent with evidence that kinetochore proteins have evolved very rapidly relative to components of other complex cellular

Journal ArticleDOI
TL;DR: Structural information is available only for free (unbound) vertebrate arrestins, and shows that the conserved overall fold is elongated and composed of two domains, with the core of each domain consisting of a seven-stranded β-sandwich.
Abstract: In vertebrates, the arrestins are a family of four proteins that regulate the signaling and trafficking of hundreds of different G-protein-coupled receptors (GPCRs). Arrestin homologs are also found in insects, protochordates and nematodes. Fungi and protists have related proteins but do not have true arrestins. Structural information is available only for free (unbound) vertebrate arrestins, and shows that the conserved overall fold is elongated and composed of two domains, with the core of each domain consisting of a seven-stranded β-sandwich. Two main intramolecular interactions keep the two domains in the correct relative orientation, but both of these interactions are destabilized in the process of receptor binding, suggesting that the conformation of bound arrestin is quite different. As well as binding to hundreds of GPCR subtypes, arrestins interact with other classes of membrane receptors and more than 20 surprisingly diverse types of soluble signaling protein. Arrestins thus serve as ubiquitous signaling regulators in the cytoplasm and nucleus.

Journal ArticleDOI
TL;DR: Heterochromatin Protein 1 proteins are amenable to posttranslational modifications that probably regulate these distinct functions, thereby creating a subcode within the context of the 'histone code' of histone posttranslational modifications.
Abstract: Heterochromatin Protein 1 (HP1) was first discovered in Drosophila as a dominant suppressor of position-effect variegation and a major component of heterochromatin. The HP1 family is evolutionarily conserved, with members in fungi, plants and animals but not prokaryotes, and there are multiple members within the same species. The amino-terminal chromodomain binds methylated lysine 9 of histone H3, causing transcriptional repression. The highly conserved carboxy-terminal chromoshadow domain enables dimerization and also serves as a docking site for proteins involved in a wide variety of nuclear functions, from transcription to nuclear architecture. In addition to heterochromatin packaging, it is becoming increasingly clear that HP1 proteins have diverse roles in the nucleus, including the regulation of euchromatic genes. HP1 proteins are amenable to posttranslational modifications that probably regulate these distinct functions, thereby creating a subcode within the context of the 'histone code' of histone posttranslational modifications.