ACM international conference on Digital libraries
About: ACM international conference on Digital libraries is an academic conference. It publishes mainly in the areas of Digital library and Metadata. Over its lifetime, the conference has published 496 publications, which have received 14,269 citations.
Papers published on a yearly basis
01 Jun 2000
TL;DR: This paper develops a scalable evaluation methodology and metrics for the task, and presents a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
Abstract: Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, that in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
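The loop described in this abstract (seed tuples, then patterns, then new tuples, keeping only reliable patterns) can be illustrated with a toy sketch. This is not the Snowball implementation: the function name `bootstrap`, the literal-string patterns, the single-word capitalised entities, and the agreement-ratio confidence score are all assumptions made for illustration.

```python
import re


def bootstrap(sentences, seeds, iterations=2, min_confidence=0.5):
    """Toy bootstrapping: known (organization, location) pairs -> patterns -> new pairs."""
    known = dict(seeds)  # organization -> location
    for _ in range(iterations):
        # Step 1: the text found between a known pair becomes a candidate pattern.
        patterns = set()
        for sentence in sentences:
            for org, loc in known.items():
                m = re.search(re.escape(org) + r"(.{1,30}?)" + re.escape(loc), sentence)
                if m:
                    patterns.add(m.group(1))

        # Step 2: apply each pattern and score it against the known pairs.
        accepted = {}
        for pattern in patterns:
            regex = re.compile(r"([A-Z]\w+)" + re.escape(pattern) + r"([A-Z]\w+)")
            matches = [m.groups() for s in sentences for m in regex.finditer(s)]
            positive = sum(1 for o, l in matches if known.get(o) == l)
            negative = sum(1 for o, l in matches if o in known and known[o] != l)
            if positive + negative == 0:
                continue  # no evidence either way, so the pattern is skipped
            confidence = positive / (positive + negative)  # assumed scoring rule
            if confidence >= min_confidence:
                for o, l in matches:
                    accepted.setdefault(o, l)

        # Step 3: only pairs produced by reliable patterns feed the next iteration.
        known.update({o: l for o, l in accepted.items() if o not in known})
    return known


sentences = [
    "Microsoft, based in Redmond, announced earnings.",
    "Boeing, based in Seattle, hired new staff.",
    "Exxon is located in Irving.",
]
result = bootstrap(sentences, {"Microsoft": "Redmond"})
```

With the single seed pair, the sketch learns the context ", based in " and uses it to pick up the Boeing pair; the Exxon sentence is phrased differently and is not matched.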
01 Jun 2000
TL;DR: This work describes a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization, and reports initial experimental results demonstrating that this approach can produce accurate recommendations.
Abstract: Recommender systems improve access to relevant products and information by making personalized suggestions based on previous examples of a user's likes and dislikes. Most existing recommender systems use collaborative filtering methods that base recommendations on other users' preferences. By contrast, content-based methods use information about an item itself to make suggestions. This approach has the advantage of being able to recommend previously unrated items to users with unique interests and to provide explanations for its recommendations. We describe a content-based book recommending system that utilizes information extraction and a machine-learning algorithm for text categorization. Initial experimental results demonstrate that this approach can produce accurate recommendations.
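A content-based recommender of this kind can be sketched as a text categorizer trained on descriptions of items the user has rated. The abstract does not name the learning algorithm, so the bag-of-words naive Bayes classifier below is an assumption chosen for illustration, as are the function names `train` and `like_score` and the toy book descriptions.

```python
import math
from collections import Counter


def train(examples):
    """Build word counts per class from (description, liked) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, liked in examples:
        class_counts[liked] += 1
        word_counts[liked].update(text.lower().split())
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocab


def like_score(model, text):
    """Log-odds that the user likes the item; a positive value suggests recommending it."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_prob = {}
    for label in (True, False):
        n_words = sum(word_counts[label].values())
        # Smoothed class prior, then Laplace-smoothed word likelihoods.
        lp = math.log((class_counts[label] + 1) / (total + 2))
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                lp += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_prob[label] = lp
    return log_prob[True] - log_prob[False]


ratings = [
    ("space robots adventure", True),
    ("galaxy space war", True),
    ("romance love story", False),
    ("love letters romance", False),
]
model = train(ratings)
```

Because the score depends only on the item's own description, it can be computed for a book that no other user has rated, which is the advantage over collaborative filtering that the abstract points to.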
01 Aug 1999
Abstract: Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate keyphrases using lexical methods, calculates feature values for each candidate, and uses a machine-learning algorithm to predict which candidates are good keyphrases. The machine learning scheme first builds a prediction model using training documents with known keyphrases, and then uses the model to find keyphrases in new documents. We use a large test corpus to evaluate Kea’s effectiveness in terms of how many author-assigned keyphrases are correctly identified. The system is simple, robust, and publicly available.
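The first two stages named in this abstract, candidate identification and feature calculation, can be sketched as follows. The abstract does not say which features are used, so the two computed here (a TF×IDF score and the relative position of the first occurrence) are assumptions of this sketch, as are the stopword list, the three-word phrase limit, and the function names. The learning step that would consume these features is omitted.

```python
import math
import re

# Small illustrative stopword list; a real system would use a much longer one.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "for", "to", "is", "from", "that"}


def candidates(text, max_len=3):
    """Return ({phrase: [count, first_position]}, word_count) for a document.

    Candidates are word n-grams that neither start nor end with a stopword.
    """
    words = re.findall(r"[a-z]+", text.lower())
    found = {}
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) < n or gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            entry = found.setdefault(" ".join(gram), [0, i])
            entry[0] += 1
    return found, len(words)


def features(text, corpus):
    """Map each candidate phrase to (tfidf, relative_first_occurrence)."""
    found, n_words = candidates(text)
    doc_phrases = [set(candidates(doc)[0]) for doc in corpus]
    feats = {}
    for phrase, (count, first) in found.items():
        df = sum(phrase in phrases for phrases in doc_phrases)
        tfidf = (count / n_words) * math.log((len(corpus) + 1) / (df + 1))
        feats[phrase] = (tfidf, first / n_words)
    return feats


corpus = ["Metadata describes documents.", "Search engines index documents."]
text = "Digital libraries store metadata. Digital libraries index documents."
feats = features(text, corpus)
```

In this toy input, "digital libraries" occurs twice in the document and in no corpus document, so it scores higher than "documents", which appears in every corpus document.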
11 May 1998
TL;DR: CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations.
Abstract: We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g. Postscript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the body of articles. CiteSeer provides most of the advantages of traditional (manually constructed) citation indexes (e.g. the ISI citation indexes), including: literature retrieval by following citation links (e.g. by providing a list of papers that cite a given paper), the evaluation and ranking of papers, authors, journals, etc. based on the number of citations, and the identification of research trends. CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations. Given a particular paper of interest, CiteSeer can display the context of how the paper is cited in subsequent publications. This context may contain a brief summary of the paper, another author’s response to the paper, or subsequent work which builds upon the original article. CiteSeer allows the location of papers by keyword search or by citation links. Papers related to a given paper can be located using common citation information or word vector similarity. CiteSeer will soon be available for public use.
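The abstract mentions two ways of locating related papers: common citation information and word vector similarity. A minimal sketch of both measures is given below. It is not CiteSeer's code: the function names, the whitespace tokenisation, the raw word-count vectors, and the ranking order (shared citations first, then cosine similarity) are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def shared_citations(refs_a, refs_b):
    """Number of cited works that two papers have in common."""
    return len(set(refs_a) & set(refs_b))


def related(target, papers):
    """Rank other papers by shared citations, breaking ties by text similarity."""
    t = papers[target]
    scored = [
        (shared_citations(t["refs"], p["refs"]), cosine(t["text"], p["text"]), pid)
        for pid, p in papers.items()
        if pid != target
    ]
    return [pid for _, _, pid in sorted(scored, reverse=True)]


# Hypothetical paper records; the identifiers and texts are made up.
papers = {
    "A": {"text": "autonomous citation indexing", "refs": ["r1", "r2"]},
    "B": {"text": "citation indexing on the web", "refs": ["r2", "r3"]},
    "C": {"text": "keyphrase extraction", "refs": ["r4"]},
}
ranking = related("A", papers)
```

Here paper B shares one reference and two words with paper A, so it ranks ahead of paper C, which shares neither.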
01 Jul 1997
TL;DR: This paper examines the practice of annotation in a particular situation: the markings students make in university-level textbooks, and the status of those markings within a community of fellow textbook readers.
Abstract: Readers annotate paper books as a routine part of their engagement with the materials; it is a useful practice, manifested through a wide variety of markings made in service of very different purposes. This paper examines the practice of annotation in a particular situation: the markings students make in university-level textbooks. The study focuses on the form and function of these annotations, and their status within a community of fellow textbook readers. Using this study as a basis, I discuss issues and implications for the design of annotation tools for a digital library setting.
Related Conferences (5)
International ACM SIGIR Conference on Research and Development in Information Retrieval
6.4K papers, 316.1K citations
Conference on Information and Knowledge Management
7K papers, 191.8K citations
The Web Conference
6.8K papers, 423.6K citations
7.7K papers, 212.1K citations
International Conference on Data Engineering
6.2K papers, 232.8K citations