Topic

Annotation

About: Annotation is a research topic. Over its lifetime, 6,719 publications have been published on this topic, receiving 203,463 citations. The topic is also known as: note & markup.


Papers
Journal Article
TL;DR: A unique, manually curated annotation of Bacillus subtilis strain 168 is presented, essentially based on experimental data, which reveals how this bacterium lives in a plant niche, while carrying a paleome operating system common to Firmicutes and Tenericutes.
Abstract: Genome annotation is, nowadays, performed via automatic pipelines that cannot discriminate between right and wrong annotations. Given their importance in increasing the accuracy of the genome annotations of other organisms, it is critical that the annotations of model organisms reflect the current annotation gold standard. The genome of Bacillus subtilis strain 168 was sequenced twenty years ago. Using a combination of inductive, deductive and abductive reasoning, we present a unique, manually curated annotation, essentially based on experimental data. This reveals how this bacterium lives in a plant niche, while carrying a paleome operating system common to Firmicutes and Tenericutes. Dozens of new genomic objects and an extensive literature survey have been included for the sequence available at the INSDC (AccNum AL009126.3). We also propose an extension to Demerec's nomenclature rules that will help investigators connect to this type of curated annotation via the use of common gene names.

78 citations

Journal Article
TL;DR: Some of the ways in which GO can change should be carefully considered by all users of GO as they may have a significant impact on the resulting gene product annotations, and therefore the functional description of the gene product, or the interpretation of analyses performed on GO datasets.
Abstract: The Gene Ontology Consortium (GOC) is a major bioinformatics project that provides structured controlled vocabularies to classify gene product function and location. GOC members create annotations to gene products using the Gene Ontology (GO) vocabularies, thus providing an extensive, publicly available resource. The GO and its annotations to gene products are now an integral part of functional analysis, and statistical tests using GO data are becoming routine for researchers to include when publishing functional information. While many helpful articles about the GOC are available, there are certain updates to the ontology and annotation sets that sometimes go unobserved. Here we describe some of the ways in which GO can change that should be carefully considered by all users of GO as they may have a significant impact on the resulting gene product annotations, and therefore the functional description of the gene product, or the interpretation of analyses performed on GO datasets. GO annotations for gene products change for many reasons, and while these changes generally improve the accuracy of the representation of the underlying biology, they do not necessarily imply that previous annotations were incorrect. We additionally describe the quality assurance mechanisms we employ to improve the accuracy of annotations, which necessarily changes the composition of the annotation sets we provide. We use the Universal Protein Resource (UniProt) for illustrative purposes of how the GO Consortium, as a whole, manages these changes.

78 citations

Proceedings Article
10 May 2005
TL;DR: The Rich News system, that can automatically annotate radio and television news with the aid of resources retrieved from the World Wide Web, is described, and an evaluation shows that the system operates with high precision, and with a moderate level of recall.
Abstract: The Rich News system, which can automatically annotate radio and television news with the aid of resources retrieved from the World Wide Web, is described. Automatic speech recognition gives a temporally precise but conceptually inaccurate annotation model. Information extraction from related web news sites gives the opposite: conceptual accuracy but no temporal data. Our approach combines the two for temporally accurate conceptual semantic annotation of broadcast news. First, low-quality transcripts of the broadcasts are produced using speech recognition, and these are then automatically divided into sections corresponding to individual news stories. A key-phrase extraction component finds key phrases for each story and uses these to search for web pages reporting the same event. The text and metadata of the web pages are then used to create index documents for the stories in the original broadcasts, which are semantically annotated using the KIM knowledge management platform. A web interface then allows conceptual search and browsing of news stories, and playing of the parts of the media files corresponding to each news story. The use of material from the World Wide Web allows much higher quality textual descriptions and semantic annotations to be produced than would have been possible using the ASR transcript directly. The semantic annotations can form a part of the Semantic Web, and an evaluation shows that the system operates with high precision and a moderate level of recall.

78 citations
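The Rich News abstract says only that a key-phrase extraction component picks phrases for each story segment and uses them to search the web; it does not say how phrases are scored. As a rough illustration, the minimal Python sketch below assumes TF-IDF ranking of single words as a stand-in (not necessarily the method Rich News uses), scoring each story segment against the other segments from the same broadcast. The stopword list and the function names `tokenize`, `key_terms` and `search_query` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "is", "of",
             "on", "the", "to", "was", "with"}


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def key_terms(stories: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest-scoring TF-IDF terms for each story segment.

    Assumption: TF-IDF is a stand-in scoring scheme; the Rich News
    abstract does not specify how key phrases are chosen.
    """
    docs = [tokenize(s) for s in stories]
    n = len(docs)
    # Document frequency: number of story segments containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        if not doc:  # segment with no usable words
            result.append([])
            continue
        tf = Counter(doc)
        # Term frequency within the segment times inverse document frequency.
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


def search_query(terms: list[str]) -> str:
    """Hypothetical: join a story's key terms into a web-search query string."""
    return " ".join(terms)


if __name__ == "__main__":
    segments = [
        "flood flood river rescue",
        "budget tax tax vote",
        "river vote",
    ]
    for terms in key_terms(segments, k=2):
        print(search_query(terms))  # flood rescue / tax budget / river vote
```

A word that occurs in every segment of a broadcast gets an IDF of zero, so it sinks to the bottom of that segment's ranking; a production system would likely also score multi-word phrases rather than single words.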

Journal Article
TL;DR: Psa mutants displayed pleiotropic phenotypes that are reminiscent of alterations observed after the replacement of choline (Ch) by ethanolamine (EA) in the cell wall of pneumococcus and the absence of LytA (phenotype iv) could itself explain phenotypes i and iii.
Abstract: Recently, Novak et al. (1998, Mol Microbiol 29: 1285–1296) reported their investigation of the phenomenon of penicillin tolerance in Streptococcus pneumoniae. A library of mutants in pneumococcal surface proteins was screened for the ability to survive in the presence of 10× the minimum inhibitory concentration of antibiotic. A mutant harbouring an insertion in the known gene psaA was isolated among 10 candidate tolerance mutants. Inactivation of psaA was previously shown to result in reduced virulence of S. pneumoniae (as judged by intranasal or intraperitoneal challenge of mice) and in reduced adherence to A549 cells (type II pneumocytes), leading to the suggestion that PsaA was an adhesin (Berry and Paton, 1996, Infect Immun 64: 5255–5262). This gene is part of the psa locus (Fig. 1) that encodes an ATP-binding cassette (ABC) permease belonging to cluster 9, a family of ABC metal permeases (Dintilhac et al., 1997, Mol Microbiol 25: 727–740). Novak et al. (1998, Mol Microbiol 29: 1285–1296) reported that psa mutants displayed pleiotropic phenotypes: (i) reduced sensitivity to the lytic and killing effects of penicillin; (ii) growth in chains of 40–50 (psaC) to 200–300 (psaD) cells; (iii) autolysis defect and loss of sensitivity to low concentrations of deoxycholate (DOC), a species-characteristic trait; (iv) absence of LytA, the major autolytic amidase; (v) almost complete loss of choline-binding proteins (ChBPs) (psaC and psaD) and absence of CbpA; (vi) loss of transformability (except psaA); and (vii) manganese (Mn) requirement for growth in a chemically defined medium. Because penicillin tolerance was first associated with an autolysis defect (Tomasz et al., 1970, Nature 227: 138–140), the absence of LytA (phenotype iv) could itself explain phenotypes i and iii. Dysregulation of lytA could not be investigated because, according to Novak et al. (1998, Mol Microbiol 29: 1285–1296), the difficulty in lysing psa mutant cells prohibited Northern analysis, although lysates of the psa mutants could be obtained for immunoblot analysis of LytA and of RecA and for Southern confirmation of the psa mutations. Nevertheless, because expression of the lytA gene has been shown to be driven by three different promoters, including Pb, which is the recA basal promoter (Mortier-Barrière et al., 1998, Mol Microbiol 27: 159–170), and because wild-type levels of RecA were detected in the psa mutants (Novak et al., 1998, Mol Microbiol 29: 1285–1296), it seems difficult to account for the complete absence of LytA on the basis of altered expression. On the other hand, phenotypes i–iv are reminiscent of alterations observed after the replacement of choline (Ch) by ethanolamine (EA) in the cell wall of pneumococcus (Tomasz, 1968, Proc Natl Acad Sci USA 59: 86–93). Similar phenotypes were also displayed by Ch-independent mutants of S. pneumoniae (Severin et al., 1997, Microb Drug Res 3: 391–400; Yother et al., 1998, J Bacteriol 180: 2093–2101). S. pneumoniae has a nutritional requirement for Ch, which is incorporated by covalent bonds into the cell wall teichoic acids (TA) and into the membrane-bound lipoteichoic acid (LTA). Ch residues bound to TA (ChTA) were shown to be absolutely required for LytA activity (Holtje and Tomasz, 1975, J Biol Chem 250: 6072–6076). The action of LytA has long been thought to be restricted to pneumococcal cell walls because of this requirement. However, recent reports suggest that ChTA is required …
Molecular Microbiology (1999) 32(4), 881–891

78 citations

Journal Article
TL;DR: The establishment of OrchidBase will provide researchers with a high-quality genetic resource for data mining and facilitate efficient experimental studies on orchid biology and biotechnology.
Abstract: Orchids are among the most ecologically and evolutionarily significant plants, and the Orchidaceae is one of the most abundant families of the angiosperms. Genetic databases will be useful not only for gene discovery but also for future genomic annotation. For this purpose, OrchidBase was established from 37,979,342 sequence reads collected from 11 in-house Phalaenopsis orchid cDNA libraries. Among them, 41,310 expressed sequence tags (ESTs) were obtained by using Sanger sequencing, whereas 37,908,032 reads were obtained by using next-generation sequencing (NGS) including both Roche 454 and Solexa Illumina sequencers. These reads were assembled into 8,501 contigs and 76,116 singletons, resulting in 84,617 non-redundant transcribed sequences with an average length of 459 bp. The analysis pipeline of the database is an automated system written in Perl and C#, and consists of the following components: automatic pre-processing of EST reads, assembly of raw sequences, annotation of the assembled sequences, and storage of the analyzed information in SQL databases. A web application was implemented with HTML and a Microsoft .NET Framework C# program for browsing and querying the database, creating dynamic web pages on the client side, analyzing gene ontology (GO) and mapping annotated enzymes to KEGG pathways. The online resources for putative annotation can be searched either by text or by using BLAST, and the results can be explored on the website and downloaded. Consequently, the establishment of OrchidBase will provide researchers with a high-quality genetic resource for data mining and facilitate efficient experimental studies on orchid biology and biotechnology. The OrchidBase database is freely available at http://lab.fhes.tn.edu.tw/est.

78 citations


Network Information
Related Topics (5)
Inference: 36.8K papers, 1.3M citations (81% related)
Deep learning: 79.8K papers, 2.1M citations (80% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (80% related)
Unsupervised learning: 22.7K papers, 1M citations (79% related)
Cluster analysis: 146.5K papers, 2.9M citations (78% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    1,461
2022    3,073
2021    305
2020    401
2019    383
2018    373