scispace - formally typeset
Search or ask a question
JournalISSN: 1432-1300

International Journal on Digital Libraries 

Springer Science+Business Media
About: International Journal on Digital Libraries is an academic journal published by Springer Science+Business Media. The journal publishes majorly in the area(s): Digital library & Computer science. It has an ISSN identifier of 1432-1300. Over the lifetime, 509 publications have been published receiving 13889 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: The main novelties of the Lorel language are the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user.
Abstract: language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user. Lorel also includes a declarative update language. Lorel is implemented as the query language of the Lore prototype database management system at Stanford. Information about Lore can be found at http://www-db.stanford.edu/lore. In addition to presenting the Lorel language in full, this paper briefly describes the Lore system and query processor. We also briefly discuss a second implementation of Lorel on top of a conventional object-oriented database management system, the O2 system.

1,257 citations

Journal ArticleDOI
TL;DR: This paper presents a domain-independent method for the automatic extraction of multi-word terms, from machine-readable special language corpora, using C-value/NC-value, which enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type ofMulti- word terms, the nested terms.
Abstract: Technical terms (henceforth called terms ), are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms, from machine-readable special language corpora. The method, (C-value/NC-value ), combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word terms, the nested terms. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms); 2) the incorporation of information from term context words to the extraction of terms.

849 citations

Journal ArticleDOI
TL;DR: Several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
Abstract: In the last 16 years, more than 200 research articles were published about research-paper recommender systems. We reviewed these articles and present some descriptive statistics in this paper, as well as a discussion about the major advancements and shortcomings and an overview of the most common recommendation concepts and approaches. We found that more than half of the recommendation approaches applied content-based filtering (55 %). Collaborative filtering was applied by only 18 % of the reviewed approaches, and graph-based recommendations by 16 %. Other recommendation concepts included stereotyping, item-centric recommendations, and hybrid recommendations. The content-based filtering approaches mainly utilized papers that the users had authored, tagged, browsed, or downloaded. TF-IDF was the most frequently applied weighting scheme. In addition to simple terms, n-grams, topics, and citations were utilized to model users' information needs. Our review revealed some shortcomings of the current research. First, it remains unclear which recommendation concepts and approaches are the most promising. For instance, researchers reported different results on the performance of content-based and collaborative filtering. Sometimes content-based filtering performed better than collaborative filtering and sometimes it performed worse. We identified three potential reasons for the ambiguity of the results. (A) Several evaluations had limitations. They were based on strongly pruned datasets, few participants in user studies, or did not use appropriate baselines. (B) Some authors provided little information about their algorithms, which makes it difficult to re-implement the approaches. Consequently, researchers use different implementations of the same recommendations approaches, which might lead to variations in the results. (C) We speculated that minor variations in datasets, algorithms, or user populations inevitably lead to strong variations in the performance of the approaches. Hence, finding the most promising approaches is a challenge. As a second limitation, we noted that many authors neglected to take into account factors other than accuracy, for example overall user satisfaction. In addition, most approaches (81 %) neglected the user-modeling process and did not infer information automatically but let users provide keywords, text snippets, or a single paper as input. Information on runtime was provided for 10 % of the approaches. Finally, few research papers had an impact on research-paper recommender systems in practice. We also identified a lack of authority and long-term research interest in the field: 73 % of the authors published no more than one paper on research-paper recommender systems, and there was little cooperation among different co-author groups. We concluded that several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.

648 citations

Journal ArticleDOI
TL;DR: This paper proposes a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries.
Abstract: The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depends on sending information retrieval requests to "index servers". One problem with this is that these queries cannot exploit the structure and topology of the document network. The authors propose a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries. They give a formal semantics for WebSQL using a calculus based on a novel "virtual graph" model of a document network. They propose a new theory of query cost based on the idea of "query locality," that is, how much of the network must be visited to answer a particular query. Finally, they describe a prototype implementation of WebSQL written in Java.

402 citations

Journal ArticleDOI
TL;DR: WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases, which performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms.
Abstract: This paper describes WBIIS (Wavelet-Based Image Indexing and Searching), a new image indexing and retrieval algorithm with partial sketch image searching capability for large image databases. The algorithm characterizes the color variations over the spatial extent of the image in a manner that provides semantically meaningful image comparisons. The indexing algorithm applies a Daubechies' wavelet transform for each of the three opponent color components. The wavelet coefficients in the lowest few frequency bands, and their variances, are stored as feature vectors. To speed up retrieval, a two-step procedure is used that first does a crude selection based on the variances, and then refines the search by performing a feature vector match between the selected images and the query. For better accuracy in searching, two-level multiresolution matching may also be used. Masks are used for partial-sketch queries. This technique performs much better in capturing coherence of image, object granularity, local color/texture, and bias avoidance than traditional color layout algorithms. WBIIS is much faster and more accurate than traditional algorithms. When tested on a database of more than 10 000 general-purpose images, the best 100 matches were found in 3.3 seconds.

346 citations

Performance
Metrics
No. of papers from the Journal in previous years
YearPapers
202331
202225
202138
202024
201925
201827