Journal ArticleDOI

Annotea: an open RDF infrastructure for shared Web annotations

TL;DR: The paper presents the overall design of Annotea and describes some of the issues the project has faced and how it has solved them, including combining RDF with XPointer, XLink, and HTTP.
About: This article was published in Computer Networks on 2002-08-05 and is currently open access. It has received 565 citations to date. The article focuses on the topics: Annotea & RDF/XML.

Summary (3 min read)

1. INTRODUCTION

  • One of the basic milestones on the road to a Semantic Web [22] is the association of metadata with content.
  • Metadata allows the Web to describe properties about some given content, even if the medium of this content does not directly provide the necessary means to do so.
  • An interesting side effect is that the same piece of metadata can be used not only to describe content but also to organize and classify it, thus setting up other properties the authors had not thought about at first.
  • One can use it to search for photos of a given location taken at a given time.
  • In order to reach this goal, the authors have been W3C Fellow from Elisa Communications.

2. DESIGN

  • In this section the authors describe the architecture of the Annotea system and the RDF annotation schema.
  • The authors start with a discussion of the requirements that motivate some of the aspects of their design.

2.1 Requirements

  • Since the early design of Annotea, the authors decided to build an infrastructure that was based on generic RDF, with annotations being one possible instantiation of the infrastructure.
  • This choice has allowed the authors to concentrate more on the infrastructure than on the application itself.
  • While an annotation can be seen as metadata related to an annotated document, annotations themselves can have distinct properties.
  • Different users have different views and needs.
  • Annotations are stored in generic RDF databases.

2.2 Annotea and its operation

  • In Annotea, annotations are described with a dedicated RDF schema and are stored in annotation servers (Fig. 1).
  • Section 2.3 describes the annotation schema in further detail.
  • The user then publishes the annotation to a given annotation server.
  • For each list of annotations that it receives, the browser parses the metadata of each annotation, resolves the XPointer of the annotation and, if successful, highlights the annotated text.
  • The motivation for this choice is to reduce the amount of data that is being sent back to the browser.
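
The client-side step described above (parse each annotation's metadata, resolve its XPointer, highlight on success) can be sketched roughly as follows. This is a minimal illustration, not Amaya's implementation: it assumes annotations arrive as plain dictionaries with a hypothetical `context` key, and it only understands pointers of the form `id("name")`, a small subset of XPointer. Annotations whose pointer cannot be resolved are set aside, which matches what the text later calls orphan annotations.

```python
import re
import xml.etree.ElementTree as ET

# Only pointers of the form id("name") are handled in this sketch.
ID_POINTER = re.compile(r'^id\("([^"]+)"\)$')


def resolve_pointer(root, pointer):
    """Return the element addressed by an id("name") pointer, or None."""
    match = ID_POINTER.match(pointer)
    if match is None:
        return None
    for element in root.iter():
        if element.get("id") == match.group(1):
            return element
    return None


def attach_annotations(root, annotations):
    """Split annotations into (annotation, target) pairs and orphans."""
    attached, orphans = [], []
    for annotation in annotations:
        # "context" is a hypothetical key holding the pointer string.
        target = resolve_pointer(root, annotation["context"])
        if target is None:
            orphans.append(annotation)
        else:
            attached.append((annotation, target))
    return attached, orphans
```

A browser would highlight each resolved target and list the orphans separately so the user can reposition or delete them.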

2.3 RDF Schema for Annotations

  • The most important feature of an annotation is that it supports the evolving needs of the collaborating groups.
  • At its simplest level, RDF provides (resource, property, value) triples (Fig. 3).
  • Annotation: a superclass describing the common features of annotations.
  • Pointing to finer details after the ID can be done by other XPointer means, such as using text matching.
  • Other metadata can be added to the annotation when the working group needs it.
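
To make the (resource, property, value) model concrete, here is a small sketch that represents one annotation as a list of triples. The namespace `http://example.org/annotation-ns#` and the property names `annotates`, `context`, `author`, and `body` are placeholders chosen for illustration; the actual Annotea schema defines its own namespace and vocabulary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    resource: str
    prop: str
    value: str


# Hypothetical namespace, used only for this illustration.
A = "http://example.org/annotation-ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_annotation(annotation_uri, document_uri, xpointer, author, body_uri):
    """Return the triples that describe a single annotation."""
    return [
        Triple(annotation_uri, RDF_TYPE, A + "Annotation"),
        Triple(annotation_uri, A + "annotates", document_uri),
        Triple(annotation_uri, A + "context", f"{document_uri}#{xpointer}"),
        Triple(annotation_uri, A + "author", author),
        Triple(annotation_uri, A + "body", body_uri),
    ]


def values_of(triples, resource, prop):
    """Look up every value stored for a (resource, property) pair."""
    return [t.value for t in triples if t.resource == resource and t.prop == prop]
```

Because each fact is a separate triple, a working group can attach further metadata to an annotation by appending triples, without changing the ones already there.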

3. ANNOTATIONS IN AMAYA

  • Since the beginning of the project, the authors have been implementing both a client and a server prototype.
  • Amaya [1] is a full-featured web browser and editor developed by W3C for experimenting and validating web specifications at an early stage of their development.
  • annotates: the relation between an annotation resource and the resource to which the annotation applies.
  • It is also possible to specify additional annotation types as an RDF schema that can be downloaded at runtime.
  • Amaya will use the namespace name to try to retrieve an RDF schema from the Web, or the schema content can be cached in a local file and specified with the same startup configuration.

3.1 Creating an annotation

  • The user has three choices for creating an annotation: annotate a whole document, annotate the position where the caret is, annotate the current selection.
  • The annotation window shows the metadata of the annotation, as defined in Section 2.3, inside a box and the body of the annotation.
  • If the user clicks on the Source document field, Amaya will scroll to the annotated text and highlight it if it is a selection.
  • Users can cut and paste fragments from other documents, add links to other documents, and so on.
  • When a user creates an annotation, it is considered a local one and will be stored in the user's Amaya directory.

3.2 Browsing annotations

  • By means of a setup menu, the user can specify the URIs of the annotation servers he wants to query, as well as the local annotation repository.
  • Moreover, if the user does not open all the annotations, the authors save time by not downloading the body.
  • When the user clicks once on the A-element, Amaya highlights the target of the annotation.
  • Instead, the authors place it at the beginning of the Math expression.
  • The user may then open the orphan annotation and reposition its XPointer or delete it.
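
The time saving from not downloading unopened bodies can be illustrated with a lazy-loading sketch. The class name `AnnotationEntry` and the injected `fetch` callable are assumptions made for this example; they do not describe Amaya's internal structures.

```python
from typing import Callable, Optional


class AnnotationEntry:
    """Holds annotation metadata; the body is fetched only when opened."""

    def __init__(self, metadata: dict, body_uri: str, fetch: Callable[[str], str]):
        self.metadata = metadata
        self.body_uri = body_uri
        self._fetch = fetch          # hypothetical downloader, injected by the caller
        self._body: Optional[str] = None

    def open(self) -> str:
        # Download the body on first open and reuse it afterwards.
        if self._body is None:
            self._body = self._fetch(self.body_uri)
        return self._body
```

With this shape, listing or highlighting many annotations touches only their metadata, and a network request for a body happens at most once per annotation the user actually opens.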

3.3 Filtering annotations

  • For a heavily annotated document, seeing the A-element icon can make reading the document bothersome.
  • This filter menu does not have any effect on the Links view.
  • As an alternative to hiding annotations, the user can also temporarily disable some annotation servers using the configuration menu.
  • The Algae language is derived from Algernon [4].
  • This customized query interface makes it possible to start filtering the annotations on the server side, for example, by only requesting those done in the past week by a given author and belonging to a given annotation type.
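
The example filter mentioned above (past week, given author, given annotation type) can be expressed as a simple predicate. The sketch below is plain Python standing in for a server-side query; it is not Algae syntax, and the dictionary keys `author`, `type`, and `created` are assumed names for the annotation metadata.

```python
from datetime import datetime, timedelta


def filter_annotations(annotations, author, annotation_type, now,
                       max_age=timedelta(days=7)):
    """Keep annotations by `author`, of `annotation_type`, created within `max_age` of `now`."""
    return [
        a for a in annotations
        if a["author"] == author
        and a["type"] == annotation_type
        and timedelta(0) <= now - a["created"] <= max_age
    ]
```

Running such a filter on the server means only the matching metadata travels to the browser, rather than the full list being downloaded and then hidden on the client.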

5. CONCLUSIONS AND FUTURE PLANS

  • Being able to associate metadata with Web resources is an important milestone for building a Semantic Web.
  • Annotea provides a simple infrastructure for associating annotations with Web documents, without having to modify these documents.
  • Users can extend it by defining their own annotation types or by adding other annotation properties.
  • All the source code is freely available too.
  • If a user edits an annotated document, in some cases, the XPointer of an annotation may point to the wrong place and thus become a misleading annotation.


Citations
01 Jan 2014
TL;DR: This survey article reinterprets the evolution of NLP research as the intersection of three overlapping curves-namely Syntactics, Semantics, and Pragmatics Curves which will eventually lead NLPResearch to evolve into natural language understanding.

768 citations

Journal ArticleDOI
Atanas Kiryakov1, Borislav Popov1, Ivan Terziev1, Dimitar Manov1, Damyan Ognyanoff1 
TL;DR: This paper presents a semantically enhanced information extraction system, which provides automatic semantic annotation with references to classes in the ontology and to instances and argues that such large-scale, fully automatic methods are essential for the transformation of the current largely textual web into a Semantic Web.

651 citations

Journal ArticleDOI
TL;DR: This analysis shows that, while there is still some way to go before semantic annotation tools will be able to address fully all the knowledge management needs, research in the area is active and making good progress.

605 citations

Journal ArticleDOI
TL;DR: This article reinterpreted the evolution of NLP research as the intersection of three overlapping curves-namely Syntactics, Semantics, and Pragmatics Curves-which will eventually lead NLP to evolve into natural language understanding.
Abstract: Natural language processing (NLP) is a theory-motivated range of computational techniques for the automatic analysis and representation of human language. NLP research has evolved from the era of punch cards and batch processing (in which the analysis of a sentence could take up to 7 minutes) to the era of Google and the likes of it (in which millions of webpages can be processed in less than a second). This review paper draws on recent developments in NLP research to look at the past, present, and future of NLP technology in a new light. Borrowing the paradigm of 'jumping curves' from the field of business management and marketing prediction, this survey article reinterprets the evolution of NLP research as the intersection of three overlapping curves (namely the Syntactics, Semantics, and Pragmatics Curves), which will eventually lead NLP research to evolve into natural language understanding.

553 citations

Proceedings ArticleDOI
20 May 2003
TL;DR: It is argued that automated large scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
Abstract: This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.

527 citations

References
Book
14 Aug 2002
TL;DR: The Document Object Model: Processing Structured Documents will help you flatten your learning curve, standardize programming, reuse code, and reduce development time.
Abstract: From the Publisher: Here's a practical guide to using the W3C's standardized DOM interfaces to process XML and HTML documents Learn the concepts, design, theory, and origins of the DOM Use the DOM to inspect, navigate, and manipulate a document's nodes and content; then learn to build useful applications that can easily be ported to any DOM-compliant implementation without re-coding Get easy-to-follow advice on using the DOM in real-world scenarios such as manipulating document content, creating user interfaces, and offloading processing to the client side The Document Object Model: Processing Structured Documents will help you flatten your learning curve, standardize programming, reuse code, and reduce development time

483 citations

Proceedings Article
12 Apr 2000
TL;DR: A new Web annotation tool is presented which uses the Document Object Model Level 2 and Dynamic HTML to deliver a system where speed and privacy are important issues and preliminary results show that annotations can be used to produce user-directed document clustering and classification.
Abstract: With bookmark programs, current Web browsers provide a limited support to personalize the Web. We present a new Web annotation tool which uses the Document Object Model Level 2 and Dynamic HTML to deliver a system where speed and privacy are important issues. We report on several experiments showing how annotations improve document access and retrieval by providing user-directed document summaries. Preliminary results also show that annotations can be used to produce user-directed document clustering and classification.

123 citations

Journal ArticleDOI
30 Apr 1995
TL;DR: The primary concern addressed in this paper is how to ensure that the feature is scalable, and it is argued that the solution is no less scalable than the Web itself.
Abstract: NCSA is adding support for group and public annotations to its HTTP server and Mosaic client. The primary concern addressed in this paper is how to ensure that the feature is scalable. Our solution requires each document server to tell the client where to get public annotations for a document, whereas the user tells the client where to get group annotations for a document. We argue that our solution is no less scalable than the Web itself. Finally, we address the problem of finding out what is new.

115 citations

01 May 1991
TL;DR: Access-Limited Logic is a language for knowledge representation which formalizes the access limitations inherent in a network structured knowledge-base and is used to build several non-trivial systems, including versions of Qualitative Process Theory and Pearl's probability networks.
Abstract: Access-Limited Logic (ALL) is a language for knowledge representation which formalizes the access limitations inherent in a network structured knowledge-base. Where a deductive method such as resolution would retrieve all assertions that satisfy a given pattern, an access-limited logic retrieves all assertions reachable by following an available access path. The time complexity of inference is thus a polynomial function of the size of the accessible portion of the knowledge-base, rather than the size of the entire knowledge-base. Access-Limited Logic, though incomplete, still has a well defined semantics and a weakened form of completeness, Socratic Completeness, which guarantees that for any query which is a logical consequence of the knowledge-base, there exists a series of queries after which the original query will succeed. We have implemented ALL in Lisp and it has been used to build several non-trivial systems, including versions of Qualitative Process Theory and Pearl's probability networks. ALL is a step toward providing the properties--clean semantics, efficient inference, expressive power--which will be necessary to build large, effective knowledge bases.

29 citations

Journal ArticleDOI
01 Jan 2000
TL;DR: Nous montrons that le Document Object Model and Dynamic HTML sont necessaires pour construire des outils d'annotation performants, and comparons ensuite des algorithmes de classification automatique utilisant d'une part les annotations, and d'autre part le texte integral des documents.
Abstract: L'usager du web se retrouve perdu dans son propre espace d'information, materialise en general par des signets ou bookmarks. Une classification automatique des documents semble a cet egard interessante. Nous proposons un outil d'annotation permettant a l'utilisateur de personnaliser les documents. Nous montrons que le Document Object Model et Dynamic HTML sont necessaires pour construire des outils d'annotation performants. Nous comparons ensuite des algorithmes de classification automatique utilisant d'une part les annotations, et d'autre part le texte integral des documents. Nos resultats montrent que les classifications basees sur les annotations sont a la fois plus rapides et plus justes que celles basees sur le texte integral des documents.

9 citations

Frequently Asked Questions (11)
Q1. What are the contributions mentioned in the paper "Annotea: an open rdf infrastructure for shared web annotations" ?

One of the goals of this project has been to re-use as much existing W3C technology as possible. The paper presents the overall design of Annotea and describes some of the issues the authors have faced and how they have solved them. 

Their wish list for future work on Annotea includes: Shared bookmarks. The authors would like to experiment with other ways for displaying annotations. The authors plan to expand the schema so that the author is defined by another RDF schema and use this property in the Annotation schema. It will then be easy to search for the metadata of the author and, for example, substitute the pencil icon with the photo of the author.

The browser queries each of the annotation servers, requesting via an HTTP GET method the annotation metadata that is associated with the document's URI. 
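
The query step can be sketched as building one HTTP GET request per configured annotation server, with the document's URI passed as a query parameter. The parameter name `annotates`, the `Accept` header value, and the server URL in the usage note are assumptions for illustration; the actual parameter names are defined by the Annotea protocol and are not given in this summary. The sketch only constructs the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_annotation_queries(server_urls, document_uri, param="annotates"):
    """Build one GET request per annotation server for the given document URI."""
    requests = []
    for server in server_urls:
        # urlencode percent-escapes the document URI so it is safe in a query string.
        query = urlencode({param: document_uri})
        requests.append(
            Request(
                f"{server}?{query}",
                headers={"Accept": "application/rdf+xml"},  # assumed media type
                method="GET",
            )
        )
    return requests
```

For example, calling `build_annotation_queries(["http://annotations.example.org/query"], "http://example.com/doc.html")` yields a single request whose URL carries the escaped document URI; each request could then be dispatched asynchronously.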

Once a document is downloaded, the annotations metadata is downloaded asynchronously, just like images, and merged into the document. 

When the user decides to post it to an annotation server, the local annotation will be deleted and subsequent saves will be sent to the server. 

It codes the annotations into an extended URI format and uses local files similar to bookmark files to store and retrieve the annotations.

To prevent unnecessary loss of pointers, the authors can search for the nearest ID on a parent of the object and use it as the starting point for the XPointer path.
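
One way to picture the nearest-ID strategy is the sketch below, which walks up from the target element to the closest ancestor (or the element itself) carrying an `id` attribute and then records the child positions leading back down. The output format `xpointer(id("...")/*[n]...)` is an illustrative choice, and the function name is hypothetical; the summary does not specify how Amaya builds its pointers.

```python
import xml.etree.ElementTree as ET


def pointer_from_nearest_id(root, target):
    """Build a pointer that starts at the nearest ancestor-or-self with an id."""
    # ElementTree has no parent links, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node.get("id") is None:
        parent = parents.get(node)
        if parent is None:
            break  # reached the document root without finding an id
        steps.append(list(parent).index(node) + 1)  # 1-based child position
        node = parent
    anchor = node.get("id")
    start = f'id("{anchor}")' if anchor is not None else "/*[1]"
    path = "".join(f"/*[{step}]" for step in reversed(steps))
    return f"xpointer({start}{path})"
```

Anchoring at an ID keeps the pointer valid when content outside the identified element changes, since only the short path below the ID depends on element positions.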

An interesting possibility for presenting the annotations on a Web page is to use internal DOM [14] events without actually changing the mark-up of the page. 

As an alternative to hiding annotations, the user can also temporarily disable some annotation servers using the configuration menu.

The authors can also easily add new properties to the annotation classes, for instance, the authors could add a property that defines an annotation set.

A better XPointer expression would be one that is more tolerant of document changes, but robust enough to prevent misleading annotations.