
Showing papers on "Annotation published in 2013"


Journal ArticleDOI
Judith A. Blake, Mary E. Dolan, H. Drabkin, David P. Hill, Li N, D. Sitnikov, Susan M. Bridges1, Shane C. Burgess1, Teresia Buza1, Fiona M. McCarthy1, Divyaswetha Peddinti1, Lakshmi Pillai1, Seth Carbon2, Heiko Dietze2, Amelia Ireland2, Suzanna E. Lewis2, Christopher J. Mungall2, Pascale Gaudet3, Chrisholm Rl3, Petra Fey3, Warren A. Kibbe3, S. Basu3, Deborah A. Siegele4, B. K. McIntosh4, Daniel P. Renfro4, Adrienne E. Zweifel4, James C. Hu4, Nicholas H. Brown5, Susan Tweedie5, Yasmin Alam-Faruque6, Rolf Apweiler6, A. Auchinchloss6, Kristian B. Axelsen6, Benoit Bely6, M. C. Blatter6, Bonilla C6, Bouguerleret L6, Emmanuel Boutet6, Lionel Breuza6, Alan Bridge6, W. M. Chan6, Gayatri Chavali6, Elisabeth Coudert6, E. Dimmer6, Anne Estreicher6, L Famiglietti6, Marc Feuermann6, Arnaud Gos6, Nadine Gruaz-Gumowski6, Hieta R6, Hinz C6, Chantal Hulo6, Rachael P. Huntley6, J. James6, Florence Jungo6, Guillaume Keller6, Kati Laiho6, Duncan Legge6, P. Lemercier6, Damien Lieberherr6, Michele Magrane6, Maria Jesus Martin6, Patrick Masson6, Mutowo-Muellenet P6, Claire O'Donovan6, Ivo Pedruzzi6, Klemens Pichler6, Diego Poggioli6, Porras Millán P6, Sylvain Poux6, Catherine Rivoire6, Bernd Roechert6, Tony Sawford6, Michel Schneider6, Andre Stutz6, Shyamala Sundaram6, Michael Tognolli6, Ioannis Xenarios6, Foulgar R, Jane Lomax, Paola Roncaglia, Varsha K. Khodiyar7, Ruth C. Lovering7, Philippa J. Talmud7, Marcus C. Chibucos8, Giglio Mg9, Hsin-Yu Chang9, Sarah Hunter9, Craig McAnulla9, Alex L. Mitchell9, Sangrador A9, Stephan R, Midori A. Harris5, Stephen G. Oliver5, Kim Rutherford5, Wood7, Jürg Bähler7, Antonia Lock7, Paul J. Kersey9, McDowall Dm9, Daniel M. Staines9, Melinda R. Dwinell10, Mary Shimoyama10, Stan Laulederkind10, Tom Hayman10, Shur-Jen Wang10, Timothy F. Lowry10, P D'Eustachio11, Lisa Matthews11, Rama Balakrishnan12, Gail Binkley12, J. M. Cherry12, Maria C. Costanzo12, Selina S. Dwight12, Engel12, Dianna G. Fisk12, Benjamin C. Hitz12, Eurie L. 
Hong12, Kalpana Karra12, Miyasato12, Robert S. Nash12, Julie Park12, Marek S. Skrzypek12, Shuai Weng12, Edith D. Wong12, Tanya Z. Berardini13, Eva Huala13, Huaiyu Mi14, Paul Thomas14, Juancarlos Chan15, Ranjana Kishore15, Paul W. Sternberg15, Van Auken K15, Doug Howe16, Monte Westerfield16 
TL;DR: The Gene Ontology (GO) Consortium is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies and has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology.
Abstract: The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new 'phylogenetic annotation' process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself have increased through the use of automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.

492 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This is the first attempt to create a tool suitable for annotating massive facial databases; the tool is used to create annotations for the MultiPIE, XM2VTS, AR, and FRGC Ver. 2 databases.
Abstract: Developing powerful deformable face models requires massive, annotated face databases on which techniques can be trained, validated and tested. Manual annotation of each facial image in terms of landmarks requires a trained expert and the workload is usually enormous. Fatigue is one of the reasons that in some cases annotations are inaccurate. This is why the majority of existing facial databases provide annotations for a relatively small subset of the training images. Furthermore, there is hardly any correspondence between the annotated landmarks across different databases. These problems make cross-database experiments almost infeasible. To overcome these difficulties, we propose a semi-automatic annotation methodology for annotating massive face datasets. This is the first attempt to create a tool suitable for annotating massive facial databases. We employed our tool for creating annotations for MultiPIE, XM2VTS, AR, and FRGC Ver. 2 databases. The annotations will be made publicly available from http://ibug.doc.ic.ac.uk/resources/facial-point-annotations/. Finally, we present experiments which verify the accuracy of produced annotations.

407 citations


Journal ArticleDOI
Joel Nothman1, Nicky Ringland1, Will Radford1, Tara Murphy1, James Curran1 
TL;DR: The approach outperforms other approaches to automatic NE annotation; competes with gold-standard training when tested on an evaluation corpus from a different source; and performs 10% better than newswire-trained models on manually-annotated Wikipedia text.

338 citations


Journal ArticleDOI
TL;DR: The GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs, strictly follow the annotation graph approach, offering a unified graph-based representation.
Abstract: Genome annotations are often published as plain text files describing genomic features and their subcomponents by an implicit annotation graph. In this paper, we present the GenomeTools, a convenient and efficient software library and associated software tools for developing bioinformatics software intended to create, process or convert annotation graphs. The GenomeTools strictly follow the annotation graph approach, offering a unified graph-based representation. This gives the developer intuitive and immediate access to genomic features and tools for their manipulation. To process large annotation sets with low memory overhead, we have designed and implemented an efficient pull-based approach for sequential processing of annotations. This makes it possible to handle even the largest annotation sets, such as a complete catalogue of human variations. Our object-oriented C-based software library enables a developer to conveniently implement their own functionality on annotation graphs and to integrate it into larger workflows, simultaneously accessing compressed sequence data if required. The careful C implementation of the GenomeTools not only ensures a light-weight memory footprint while allowing full sequential as well as random access to the annotation graph, but also facilitates the creation of bindings to a variety of script programming languages (like Python and Ruby) sharing the same interface.

330 citations
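
The pull-based, sequential processing described in the GenomeTools abstract above can be illustrated with a small sketch. This is not the GenomeTools API or its Python bindings: the names `Feature`, `read_features`, `only_kind` and `total_length` are hypothetical, and the input is assumed to be simplified tab-separated GFF3-style text. Each stage pulls one feature at a time from the stage before it, so memory use does not grow with the size of the annotation set.

```python
import io
from typing import Dict, Iterable, Iterator, NamedTuple, TextIO


class Feature(NamedTuple):
    # Hypothetical minimal feature record (not a GenomeTools type).
    seqid: str
    kind: str
    start: int
    end: int
    attributes: Dict[str, str]


def read_features(stream: TextIO) -> Iterator[Feature]:
    """Lazily yield one feature per data line; nothing is held in memory."""
    for line in stream:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines in this simplified sketch
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), attrs)


def only_kind(features: Iterable[Feature], wanted: str) -> Iterator[Feature]:
    """Filter stage: pulls from the upstream iterator on demand."""
    return (f for f in features if f.kind == wanted)


def total_length(features: Iterable[Feature]) -> int:
    """Sink stage: consumes the stream, keeping only a running sum."""
    return sum(f.end - f.start + 1 for f in features)


# Made-up sample input for illustration only.
SAMPLE = (
    "##gff-version 3\n"
    "chr1\t.\tgene\t100\t500\t.\t+\t.\tID=gene1\n"
    "chr1\t.\texon\t100\t200\t.\t+\t.\tID=exon1;Parent=gene1\n"
    "chr1\t.\texon\t300\t500\t.\t+\t.\tID=exon2;Parent=gene1\n"
)

exon_bases = total_length(only_kind(read_features(io.StringIO(SAMPLE)), "exon"))
```

Because every stage is an iterator, the same chain could be pointed at an open file handle of any size without loading it whole.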


Proceedings Article
01 Aug 2013
TL;DR: This paper presents smatch, a metric that calculates the degree of overlap between two semantic feature structures, and gives an efficient algorithm to compute the metric and shows the results of an inter-annotator agreement study.
Abstract: The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely-used metric to evaluate whole-sentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study.

327 citations
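
To make the idea of "degree of overlap between two semantic feature structures" in the smatch entry above more concrete, here is a minimal sketch. It assumes each structure is given as a set of `(relation, arg1, arg2)` triples plus a list of its variable names, and it scores the best one-to-one variable mapping by F1 over matching triples. The exhaustive search over mappings is a stand-in chosen for clarity; it is not the efficient algorithm the paper describes, and the function name `best_overlap_f1` is hypothetical.

```python
from itertools import permutations


def best_overlap_f1(triples_a, vars_a, triples_b, vars_b):
    """F1 of triple overlap under the best injective variable mapping.

    Brute force: only practical for very small structures.
    Assumes variable names never collide with constant names.
    """
    # Map the smaller variable set into the larger one (F1 is symmetric).
    if len(vars_a) > len(vars_b):
        return best_overlap_f1(triples_b, vars_b, triples_a, vars_a)
    set_a, set_b = set(triples_a), set(triples_b)
    if not set_a or not set_b:
        return 0.0
    best = 0
    for image in permutations(vars_b, len(vars_a)):
        mapping = dict(zip(vars_a, image))
        # Rename A's variables; constants pass through unchanged.
        renamed = {
            (rel, mapping.get(x, x), mapping.get(y, y)) for rel, x, y in set_a
        }
        best = max(best, len(renamed & set_b))
    if best == 0:
        return 0.0
    precision = best / len(set_a)
    recall = best / len(set_b)
    return 2 * precision * recall / (precision + recall)


# Made-up example: "the boy wants" vs. "the boy wants to go".
a = [("instance", "a", "want"), ("instance", "b", "boy"), ("ARG0", "a", "b")]
b = [
    ("instance", "x", "want"),
    ("instance", "y", "boy"),
    ("instance", "z", "go"),
    ("ARG0", "x", "y"),
    ("ARG1", "x", "z"),
]
score = best_overlap_f1(a, ["a", "b"], b, ["x", "y", "z"])
```

In this example the best mapping sends `a` to `x` and `b` to `y`, matching all three triples of the first structure against five in the second, so precision is 1.0 and recall is 0.6.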


Proceedings Article
01 Aug 2013
TL;DR: WebAnno offers annotation project management, freely configurable tagsets and the management of users in different roles, and the architecture design allows adding additional modes of visualization and editing, when new kinds of annotations are to be supported.
Abstract: We present WebAnno, a general purpose web-based annotation tool for a wide range of linguistic annotations. WebAnno offers annotation project management, freely configurable tagsets and the management of users in different roles. WebAnno uses modern web technology for visualizing and editing annotations in a web browser. It supports arbitrarily large documents, pluggable import/export filters, the curation of annotations across various users, and an interface to farming out annotations to a crowdsourcing platform. Currently WebAnno allows part-of-speech, named entity, dependency parsing and co-reference chain annotations. The architecture design allows adding additional modes of visualization and editing, when new kinds of annotations are to be supported.

205 citations


Proceedings Article
01 Aug 2013
TL;DR: SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts and offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a SEMantic simILarity Annotation Tool).
Abstract: We present in this paper SEMILAR, the SEMantic simILARity toolkit. SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts. It is available as a Java library and as a Java standalone application offering GUI-based access to the implemented semantic similarity methods. Furthermore, it offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a SEMantic simILarity Annotation Tool).

126 citations


Proceedings Article
01 Aug 2013
TL;DR: Novel techniques for learning from the outputs of multiple annotators while accounting for annotator-specific behaviour are presented, which use multi-task Gaussian Processes to learn jointly a series of annotator- and metadata-specific models.
Abstract: Annotating linguistic data is often a complex, time-consuming and expensive endeavour. Even with strict annotation guidelines, human subjects often deviate in their analyses, each bringing different biases, interpretations of the task and levels of consistency. We present novel techniques for learning from the outputs of multiple annotators while accounting for annotator-specific behaviour. These techniques use multi-task Gaussian Processes to learn jointly a series of annotator- and metadata-specific models, while explicitly representing correlations between models which can be learned directly from data. Our experiments on two machine translation quality estimation datasets show uniform significant accuracy gains from multi-task learning, and consistently outperform strong baselines.

125 citations


Proceedings Article
01 Aug 2013
TL;DR: This work proposes a simple yet novel framework that combines a passage retrieval model using coarse features into a state-of-the-art relation extractor using multi-instance learning with fine features, and adapts the information retrieval technique of pseudo-relevance feedback to expand knowledge bases.
Abstract: Distant supervision has attracted recent interest for training information extraction systems because it does not require any human annotation but rather employs existing knowledge bases to heuristically label a training corpus. However, previous work has failed to address the problem of false negative training examples mislabeled due to the incompleteness of knowledge bases. To tackle this problem, we propose a simple yet novel framework that combines a passage retrieval model using coarse features into a state-of-the-art relation extractor using multi-instance learning with fine features. We adapt the information retrieval technique of pseudo-relevance feedback to expand knowledge bases, assuming entity pairs in top-ranked passages are more likely to express a relation. Our proposed technique significantly improves the quality of distantly supervised relation extraction, boosting recall from 47.7% to 61.2% with a consistently high level of precision of around 93% in the experiments.

111 citations


Proceedings Article
01 Aug 2013
TL;DR: An automated annotation scheme learning system is introduced, which derives task-specific event rules and constraints from the training data, and uses these to automatically adapt the system for new corpora with no additional programming required.
Abstract: We participate in the BioNLP 2013 Shared Task with Turku Event Extraction System (TEES) version 2.1. TEES is a support vector machine (SVM) based text mining system for the extraction of events and relations from natural language texts. In version 2.1 we introduce an automated annotation scheme learning system, which derives task-specific event rules and constraints from the training data, and uses these to automatically adapt the system for new corpora with no additional programming required. TEES 2.1 is shown to have good generalizability and good performance across the BioNLP 2013 task corpora, achieving first place in four out of eight tasks.

106 citations


Journal ArticleDOI
01 Dec 2013
TL;DR: GATE Teamware enables users to carry out complex corpus annotation projects, involving distributed annotator teams, and has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects.
Abstract: This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects, involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated, thus making them more efficient. The user interface is simple to learn, aimed at non-experts, and runs in an ordinary web browser, without need of additional software installation. GATE Teamware has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects. It is available as an on-demand service on GateCloud.net, as well as open-source for self-installation.

Proceedings Article
01 Aug 2013
TL;DR: This article presented a case study of a difficult and important categorical annotation task (word sense) to demonstrate a probabilistic annotation model applied to crowdsourced data and argued that standard (chance-adjusted) agreement levels are neither necessary nor sufficient to ensure high quality gold standard labels.
Abstract: This paper presents a case study of a difficult and important categorical annotation task (word sense) to demonstrate a probabilistic annotation model applied to crowdsourced data. It is argued that standard (chance-adjusted) agreement levels are neither necessary nor sufficient to ensure high quality gold standard labels. Compared to conventional agreement measures, application of an annotation model to instances with crowdsourced labels yields higher quality labels at lower cost.
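
For readers unfamiliar with the "standard (chance-adjusted) agreement levels" this abstract refers to, the following sketch computes Cohen's kappa for two annotators, one common measure of that kind. The abstract does not say which agreement coefficient the authors used, so the choice of kappa here is an assumption, and the function name `cohen_kappa` and the sense labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators always used the same single label
    return (observed - expected) / (1.0 - expected)


# Made-up word-sense labels from two annotators on four instances.
annotator_1 = ["sense1", "sense1", "sense2", "sense2"]
annotator_2 = ["sense1", "sense2", "sense2", "sense2"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 3 of 4 items (observed 0.75) while chance agreement is 0.5, giving kappa = 0.5. The abstract's point is that such a coefficient alone does not determine label quality, which is why it argues for a probabilistic annotation model instead.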

Posted Content
TL;DR: It is suggested that micropublications, generated by useful software tools supporting such activities as writing, editing, reviewing, and discussion, will be of great value in improving the quality and tractability of biomedical communications.
Abstract: The Micropublications semantic model for scientific claims, evidence, argumentation and annotation in biomedical publications, is a metadata model of scientific argumentation, designed to support several key requirements for exchange and value-addition of semantic metadata across the biomedical publications ecosystem. Micropublications allow formalizing the argument structure of scientific publications so that (a) their internal structure is semantically clear and computable; (b) citation networks can be easily constructed across large corpora; (c) statements can be formalized in multiple useful abstraction models; (d) statements in one work may cite statements in another, individually; (e) support, similarity and challenge of assertions can be modelled across corpora; (f) scientific assertions, particularly in review articles, may be transitively closed to supporting evidence and methods. The model supports natural language statements; data; methods and materials specifications; discussion and commentary; as well as challenge and disagreement. A detailed analysis of nine use cases is provided, along with an implementation in OWL 2 and SWRL, with several example instantiations in RDF.

Patent
31 Dec 2013
TL;DR: Features and techniques for interacting with paginated digital content, including a multi-purpose tool and an annotation mode, are disclosed; the multi-purpose tool can provide access to multiple modes (e.g., copy, define, note, and/or highlight modes) that a user can invoke.
Abstract: Features and techniques are disclosed for interacting with paginated digital content, including a multi-purpose tool and an annotation mode. The multi-purpose tool, which may be represented by a graphic (e.g., a movable interactive graphic), can provide access to multiple modes (e.g., copy, define, note, and/or highlight modes) that a user can invoke. The mode invoked determines the functions performed by the tool when interacting with the paginated digital content. The annotation mode, which may be invoked using the multi-purpose tool or independently thereof, can allow a user to create and edit annotations, such as highlights and notes (e.g., sticky notes, margin notes, and/or highlight notes), for paginated digital content. Editing the annotations may include selecting a desired color for the annotation, for example. The annotation mode may also allow a user to intuitively merge and delete annotations previously added to paginated digital content.

08 Feb 2013
TL;DR: The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web.
Abstract: The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource. An Annotation is considered to be a set of connected resources, typically including a body and target, where the body is somehow about the target. The full model supports additional functionality, enabling semantic annotations, embedding content, selecting segments of resources, choosing the appropriate representation of a resource and providing styling hints for consuming clients.
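
The body-and-target structure described above, for the common case of attaching a piece of text to a single web resource, might be sketched as a plain Python dictionary serialized to JSON. The key names (`hasBody`, `hasTarget`, `oa:Annotation`, `cnt:ContentAsText`, `chars`) follow the vocabulary as best recalled from the 2013 specification and should be treated as assumptions to check against the specification itself; the helper `make_annotation` and the example URIs are hypothetical.

```python
import json


def make_annotation(annotation_id, body_text, target_uri):
    """Build a minimal annotation: a text body that is 'about' one target."""
    return {
        "@id": annotation_id,
        "@type": "oa:Annotation",  # assumed class name
        "hasBody": {  # assumed property name for the body
            "@type": "cnt:ContentAsText",  # assumed type for embedded text
            "chars": body_text,
        },
        "hasTarget": target_uri,  # assumed property name for the target
    }


# Made-up identifiers for illustration only.
annotation = make_annotation(
    "http://example.org/anno/1",
    "This paragraph needs a citation.",
    "http://example.org/page.html",
)
serialized = json.dumps(annotation, indent=2, sort_keys=True)
```

Keeping the annotation as its own identified resource, separate from both body and target, is what lets it be shared between platforms without modifying the annotated page.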

Journal ArticleDOI
TL;DR: The findings suggested that students participated actively in the collaborative learning activity and were engaged in a variety of behaviors including self-reflection, elaboration, internalization, and showing support.
Abstract: The purpose of the study was to understand student interaction and learning supported by a collaborative social annotation tool, Diigo. The researcher examined through a case study how students participated and interacted when learning an online text with Diigo, and how they perceived their experience. The findings suggested that students participated actively in the collaborative learning activity and were engaged in a variety of behaviors including self-reflection, elaboration, internalization, and showing support. Although students generally had a moderately positive attitude toward using the social annotation tool for collaborative learning, a few problems were identified. In particular, students found it distracting to navigate through a large amount of annotation while reading the text. The study has implications for future research on using or developing social annotation tools for educational purposes.

Journal ArticleDOI
TL;DR: It is demonstrated that graded WSsim and Usim ratings can be used to analyze existing coarse-grained sense groupings to identify sense groups that may not match intuitions of untrained native speakers, and that the WSsim ratings are not subsumed by any static sense grouping.
Abstract: Word sense disambiguation (WSD) is an old and important task in computational linguistics that still remains challenging, to machines as well as to human annotators. Recently there have been several proposals for representing word meaning in context that diverge from the traditional use of a single best sense for each occurrence. They represent word meaning in context through multiple paraphrases, as points in vector space, or as distributions over latent senses. New methods of evaluating and comparing these different representations are needed. In this paper we propose two novel annotation schemes that characterize word meaning in context in a graded fashion. In WSsim annotation, the applicability of each dictionary sense is rated on an ordinal scale. Usim annotation directly rates the similarity of pairs of usages of the same lemma, again on a scale. We find that the novel annotation schemes show good inter-annotator agreement, as well as a strong correlation with traditional single-sense annotation and ...

Proceedings Article
01 Aug 2013
TL;DR: The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language.
Abstract: The paper addresses the challenge of converting MIDT, an existing dependency-based Italian treebank resulting from the harmonization and merging of smaller resources, into the Stanford Dependencies annotation formalism, with the final aim of constructing a standard-compliant resource for the Italian language. Achieved results include a methodology for converting treebank annotations belonging to the same dependency-based family, the Italian Stanford Dependency Treebank (ISDT), and an Italian localization of the Stanford Dependency scheme.

Journal ArticleDOI
TL;DR: This paper presents an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics.
Abstract: An increasing number of databases have become web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then, for each group we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.

Proceedings ArticleDOI
22 Jul 2013
TL;DR: This work transforms the problem into a tag recommendation problem with a controlled tag library, and proposes two variants of an algorithm for recommending tags, and develops algorithms for automatic annotation of metadata.
Abstract: The increasing complexity and advancement of ecological and environmental sciences encourage scientists across the world to collect data from multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data increase not only in amount, but also in the diversity of the data sources spread around the world. This poses a huge challenge for scientists who have to manually search for information. To alleviate such problems, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple repositories and makes it searchable. However, harvested metadata records are sometimes poorly annotated or lack meaningful keywords, which could affect effective retrieval. Here, we develop algorithms for automatic annotation of metadata. We transform the problem into a tag recommendation problem with a controlled tag library, and propose two variants of an algorithm for recommending tags. Our experiments on four datasets of environmental science metadata records not only show great promise in the performance of our method, but also shed light on the different natures of the datasets.

Proceedings Article
01 Jun 2013
TL;DR: This work proposes three new annotation methodologies for gathering word senses where untrained annotators are allowed to use multiple labels and weight the senses, showing that given the appropriate annotation task, untrained workers can obtain at least as high agreement as annotators in a controlled setting and in aggregate generate an equally good sense labeling.
Abstract: Word sense disambiguation aims to identify which meaning of a word is present in a given usage. Gathering word sense annotations is a laborious and difficult task. Several methods have been proposed to gather sense annotations using large numbers of untrained annotators, with mixed results. We propose three new annotation methodologies for gathering word senses where untrained annotators are allowed to use multiple labels and weight the senses. Our findings show that given the appropriate annotation task, untrained workers can obtain at least as high agreement as annotators in a controlled setting, and in aggregate generate an equally good sense labeling.

Posted Content
TL;DR: The Open Annotation Core Data Model is an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web.
Abstract: The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource. This paper presents the W3C Open Annotation Community Group specification and the rationale behind the scoping and technical decisions that were made. It also motivates interoperable Annotations via use cases, and provides a brief analysis of the advantages over previous specifications.

Journal ArticleDOI
TL;DR: A novel web-based tool that integrates the server-side capabilities for data analysis with browser-based technology for data visualization, INVEX has two key features: flexible differential expression analysis for a wide variety of experimental designs and interactive visualization within the context of metadata and biological annotations.
Abstract: Summary: Gene expression or metabolomics data generated from clinical settings are often associated with multiple metadata (i.e. diagnosis, genotype, gender, etc.). It is of great interest to analyze and to visualize the data in these contexts. Here, we introduce INVEX, a novel web-based tool that integrates the server-side capabilities for data analysis with browser-based technology for data visualization. INVEX has two key features: (i) flexible differential expression analysis for a wide variety of experimental designs; and (ii) interactive visualization within the context of metadata and biological annotations. INVEX has built-in support for gene/metabolite annotation and a fully functional heatmap builder. Availability and implementation: Freely available at http://www.invex.ca. Contact: bob@hancocklab.ubc.ca

Journal ArticleDOI
TL;DR: The Microbial Genomic context Viewer (MGcV), an interactive, web-based application tailored to strengthen the practice of manual comparative genome context analysis for bacteria, advances the manual comparative analysis of genes and regulatory elements by providing fast and flexible integration of gene related data combined with straightforward data retrieval.
Abstract: Conserved gene context is used in many types of comparative genome analyses. It is used to provide leads on gene function, to guide the discovery of regulatory sequences, but also to aid in the reconstruction of metabolic networks. We present the Microbial Genomic context Viewer (MGcV), an interactive, web-based application tailored to strengthen the practice of manual comparative genome context analysis for bacteria. MGcV is a versatile, easy-to-use tool that renders a visualization of the genomic context of any set of selected genes, genes within a phylogenetic tree, genomic segments, or regulatory elements. It is tailored to facilitate laborious tasks such as the interactive annotation of gene function, the discovery of regulatory elements, or the sequence-based reconstruction of gene regulatory networks. We illustrate that MGcV can be used in gene function annotation by visually integrating information on prokaryotic genes, like their annotation as available from NCBI with other annotation data such as Pfam domains, sub-cellular location predictions and gene-sequence characteristics such as GC content. We also illustrate the usefulness of the interactive features that allow the graphical selection of genes to facilitate data gathering (e.g. upstream regions, IDs or annotation), in the analysis and reconstruction of transcription regulation. Moreover, putative regulatory elements and their corresponding scores or data from RNA-seq and microarray experiments can be uploaded, visualized and interpreted in (ranked-) comparative context maps. The ranked maps allow the interpretation of predicted regulatory elements and experimental data in light of each other. MGcV advances the manual comparative analysis of genes and regulatory elements by providing fast and flexible integration of gene related data combined with straightforward data retrieval. MGcV is available at http://mgcv.cmbi.ru.nl.

Journal ArticleDOI
TL;DR: The results indicate that, for reading comprehension, the in-text format led to the lowest performance among all types of annotation, including the control condition, and among the 3 annotation formats, the glossary type was considered the least preferred type by participants.
Abstract: This study extends current knowledge by exploring the effect of different annotation formats, namely in-text annotation, glossary annotation, and pop-up annotation, on hypertext reading comprehension in a foreign language and vocabulary acquisition across student proficiencies. User attitudes toward the annotation presentation were also investigated. Data were collected from 83 non-English-major university students in Taiwan in a 4-week period. Each week participants read 3 passages, each with a different annotation format as a treatment condition, and one passage without annotation as a control condition. Posttests of reading comprehension and vocabulary recognition followed each passage. The results indicate that, for reading comprehension, the in-text format led to the lowest performance among all types of annotation, including the control condition. The best performance was observed in the condition where annotations were presented in the pop-up format. No interaction effect between format and proficiency was detected. For vocabulary acquisition, reading passages with hypermedia annotations significantly benefited vocabulary learning for participants of medium and high proficiencies compared with the control condition. No significant differences were found among the 3 formats. The beneficial effect, however, did not extend to low-proficiency participants. Participant feedback revealed a positive attitude toward annotations. Among the 3 annotation formats, the glossary type was the least preferred by participants. Findings of the research provide insights on the design and instruction for online reading.

Proceedings ArticleDOI
01 Aug 2013
TL;DR: This work addresses the problem of transferring an SRL model from one language to another using a shared feature representation and demonstrates competitive performance compared with a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline.
Abstract: Semantic Role Labeling (SRL) has become one of the standard tasks of natural language processing and proven useful as a source of information for a number of other applications. We address the problem of transferring an SRL model from one language to another using a shared feature representation. This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline. We also consider the contribution of different aspects of the feature representation to the performance of the model and discuss practical applicability of this method.

Patent
17 Jan 2013
TL;DR: A user provides an annotation, such as text or graphics, in relation to a resource available on a computer network, and the annotation is automatically stored and/or retrieved without requiring separate action from the user to accomplish the storage or retrieval.
Abstract: A user provides an annotation, such as text or graphics, in relation to a resource available on a computer network. The annotation is automatically stored and/or retrieved without requiring separate action from the user to accomplish the storage or retrieval. An annotation interface may receive the annotation from the user. The annotation is then stored in association with the user and the network address of the resource. The user's annotation may be later retrieved and displayed to the user based on the network address of the resource. In one specific embodiment, a browser toolbar receives and displays user annotations associated with Web sites or Web pages to which the user has navigated. Preferably, the annotation interface remains available to the user throughout the time in which the resource is provided. Further controls may enable the user to make an annotation publicly available to others, and to receive annotations from others.
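
The storage and retrieval scheme this abstract describes (annotations stored in association with the user and the network address of the resource, then looked up by that address) can be sketched as a small keyed store. This is a minimal illustration under stated assumptions, not the patented implementation: the class `AnnotationStore` and its method names are hypothetical, and an in-memory dictionary stands in for whatever storage the patent uses.

```python
from collections import defaultdict


class AnnotationStore:
    """Hypothetical in-memory store; all names here are illustrative."""

    def __init__(self):
        # (user, url) -> annotations that user made on that resource
        self._by_user_and_url = defaultdict(list)
        # url -> (user, annotation) pairs the author chose to share
        self._public_by_url = defaultdict(list)

    def save(self, user, url, text, public=False):
        """Store an annotation keyed by the user and the resource address."""
        self._by_user_and_url[(user, url)].append(text)
        if public:
            self._public_by_url[url].append((user, text))

    def for_user(self, user, url):
        """Return the user's own annotations for the resource at url."""
        return list(self._by_user_and_url.get((user, url), []))

    def from_others(self, user, url):
        """Return publicly shared annotations on url made by other users."""
        return [(u, t) for u, t in self._public_by_url.get(url, []) if u != user]
```

In a browser-toolbar setting such as the one the abstract mentions, `for_user` would be called with the address of the page the user has just navigated to, so no separate retrieval action is needed.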

DOI
30 Aug 2013
TL;DR: It is shown that a very large number of relations carry signals that identify them as such, and thus the detailed, extensive analysis of signals in the corpus will aid research in the automatic parsing of discourse relations.
Abstract: We present an annotation effort that involves adding a new layer of annotation to an existing corpus. We are interested in how rhetorical relations are signalled in discourse, and thus begin with a corpus already annotated for rhetorical relations, to which we add signalling information. We show that a very large number of relations carry signals that identify them as such. The detailed, extensive analysis of signals in the corpus will aid research in the automatic parsing of discourse relations.

Proceedings Article
01 Jan 2013
TL;DR: Results indicate that more complex methods of Croatian-to-Serbian annotation projection are not required on such dataset sizes for these particular tasks.
Abstract: We investigate state-of-the-art statistical models for lemmatization and morphosyntactic tagging of Croatian and Serbian. The models stem from a new manually annotated SETIMES.HR corpus of Croatian, based on the SETimes parallel corpus. We train models on Croatian text and evaluate them on samples of Croatian and Serbian from the SETimes corpus and the two Wikipedias. Lemmatization accuracy for the two languages reaches 97.87% and 96.30%, while full morphosyntactic tagging accuracy using a 600-tag tagset peaks at 87.72% and 85.56%, respectively. Part of speech tagging accuracies reach 97.13% and 96.46%. Results indicate that more complex methods of Croatian-to-Serbian annotation projection are not required on such dataset sizes for these particular tasks. The SETIMES.HR corpus, its resulting models and test sets are all made freely available.

Proceedings ArticleDOI
02 May 2013
TL;DR: This paper presents the W3C Open Annotation Community Group specification and the rationale behind the scoping and technical decisions that were made, and motivates interoperable Annotations via use cases, and provides a brief analysis of the advantages over previous specifications.
Abstract: The Open Annotation Core Data Model specifies an interoperable framework for creating associations between related resources, called annotations, using a methodology that conforms to the Architecture of the World Wide Web. Open Annotations can easily be shared between platforms, with sufficient richness of expression to satisfy complex requirements while remaining simple enough to also allow for the most common use cases, such as attaching a piece of text to a single web resource. This paper presents the W3C Open Annotation Community Group specification and the rationale behind the scoping and technical decisions that were made. It also motivates interoperable Annotations via use cases, and provides a brief analysis of the advantages over previous specifications.
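
The most common use case named in this abstract, attaching a piece of text to a single web resource, might be expressed roughly as follows. This is a hedged sketch rather than an excerpt from the specification: it assumes the `oa:` vocabulary (`oa:Annotation`, `oa:hasBody`, `oa:hasTarget`) and the Content in RDF terms (`cnt:ContentAsText`, `cnt:chars`) serialized as JSON-LD, and the helper `make_annotation` is a hypothetical name.

```python
import json

# Namespace IRIs assumed for the Open Annotation and Content in RDF vocabularies.
OA = "http://www.w3.org/ns/oa#"
CNT = "http://www.w3.org/2011/content#"


def make_annotation(comment, target_url):
    """Build a JSON-LD style dict linking a text body to one web resource."""
    return {
        "@context": {"oa": OA, "cnt": CNT},
        "@type": "oa:Annotation",
        # The body carries the comment text inline.
        "oa:hasBody": {
            "@type": "cnt:ContentAsText",
            "cnt:chars": comment,
        },
        # The target is the resource being annotated, identified by its IRI.
        "oa:hasTarget": {"@id": target_url},
    }


if __name__ == "__main__":
    annotation = make_annotation("Useful overview.", "http://example.org/page")
    print(json.dumps(annotation, indent=2))
```

Because the annotation is an association between a body and a target, each identified or described independently, the same structure can be exchanged between platforms without either side knowing how the other stores it.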