Author

Rogier Brussee

Other affiliations: Novay
Bio: Rogier Brussee is an academic researcher from Utrecht University. The author has contributed to research in topics: Semantics & Thesaurus (information retrieval). The author has an h-index of 13 and has co-authored 24 publications receiving 640 citations. Previous affiliations of Rogier Brussee include Novay.

Papers
Proceedings ArticleDOI
01 Sep 2008
TL;DR: Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles, and a newly proposed term distribution taking co-occurrence of terms into account gives best results.
Abstract: We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term distribution taking co-occurrence of terms into account gives best results.

144 citations
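
The comparison in the abstract above, between a Jensen-Shannon based distance and cosine similarity, can be illustrated with a minimal sketch. It assumes plain relative term frequencies as the probability distributions; the co-occurrence based term distribution proposed in the paper is not reproduced here, and the function names are hypothetical.

```python
import math
from collections import Counter


def term_distribution(tokens):
    """Relative term frequencies (an assumed, simple choice of distribution)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint distributions."""
    divergence = 0.0
    for term in set(p) | set(q):
        p_i = p.get(term, 0.0)
        q_i = q.get(term, 0.0)
        m_i = 0.5 * (p_i + q_i)  # mixture of the two distributions
        if p_i > 0:
            divergence += 0.5 * p_i * math.log2(p_i / m_i)
        if q_i > 0:
            divergence += 0.5 * q_i * math.log2(q_i / m_i)
    return divergence


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(value * q.get(term, 0.0) for term, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


doc_a = term_distribution(["topic", "cluster", "keyword", "topic"])
doc_b = term_distribution(["keyword", "cluster", "wikipedia"])
print(jensen_shannon_divergence(doc_a, doc_b), cosine_similarity(doc_a, doc_b))
```

With base-2 logarithms the divergence is bounded between 0 and 1, so it can be used directly as a dissimilarity in a clustering step, whereas cosine similarity has to be inverted first.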

Proceedings ArticleDOI
26 Aug 2003
TL;DR: This paper presents the results of the first phase of the Topia project, which explored generating a discourse structure derived from generic processing of the underlying domain semantics, transforming this to a structured progression and then using this to steer the choice of hypermedia communicative devices used to convey the actual information in the resulting presentation.
Abstract: Generating hypermedia presentations requires processing constituent material into coherent, unified presentations. One large challenge is creating a generic process for producing hypermedia presentations from the semantics of potentially unfamiliar domains. The resulting presentations must both respect the underlying semantics and appear as coherent, plausible and, if possible, pleasant to the user. Among the related unsolved problems is the inclusion of discourse knowledge in the generation process. One potential approach is generating a discourse structure derived from generic processing of the underlying domain semantics, transforming this to a structured progression and then using this to steer the choice of hypermedia communicative devices used to convey the actual information in the resulting presentation. This paper presents the results of the first phase of the Topia project, which explored this approach. These results include an architecture for this more domain-independent processing of semantics and discourse into hypermedia presentations. We demonstrate this architecture with an implementation using Web standards and freely available technologies.

77 citations

Proceedings ArticleDOI
30 Aug 2010
TL;DR: The results show that using word co-occurrence information can improve precision and recall over tf.idf, and some alternative relevance measures that do use relations between words are studied.
Abstract: A common strategy to assign keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as keyword is its relevance for the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these distributions with the document and the corpus distribution. We then evaluate keyword extraction algorithms defined by selecting different relevance measures. For two corpora of abstracts with manually assigned keywords, we compare manually extracted keywords with different automatically extracted ones. The results show that using word co-occurrence information can improve precision and recall over tf.idf.

74 citations
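
The tf.idf baseline mentioned in the abstract above can be sketched as follows. This is only the standard tf.idf score in one common variant (relative term frequency times the natural log of inverse document frequency), not the co-occurrence based relevance measures studied in the paper; the function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(doc_tokens, corpus):
    """Score each term of one document; corpus is a list of token lists that includes the document."""
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for term, count in term_counts.items():
        # Document frequency: number of documents containing the term.
        doc_freq = sum(1 for other in corpus if term in other)
        idf = math.log(n_docs / doc_freq) if doc_freq else 0.0
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def top_keywords(doc_tokens, corpus, k=3):
    """Return the k highest scoring terms, ties broken alphabetically."""
    scores = tf_idf_scores(doc_tokens, corpus)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]


corpus = [
    ["topic", "detection", "keywords", "topic"],
    ["keywords", "extraction"],
    ["keywords", "tagging"],
]
print(top_keywords(corpus[0], corpus, k=2))
```

A term that occurs in every document, such as "keywords" in this toy corpus, gets an idf of zero and is never selected, which is the sense in which tf.idf scores each word in isolation and ignores relations between words.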

01 Mar 2007
TL;DR: The overall goal of the project is to explore different users' characteristics and personalize users' museum experiences within the Rijksmuseum virtual and physical collections.
Abstract: This paper describes ongoing work exploring aspects of personalized access to and presentation of virtual museum collections. The project demonstrator illustrates an interactive approach to collecting data about museum visitors in terms of their interests in and preferences about artefacts from the Rijksmuseum collection. This data is stored in user profiles used further to recommend routes through the museum and to guide the users towards artefacts related to their interests and preferences. The overall goal of the project is to explore different users' characteristics and personalize users' museum experiences within the Rijksmuseum virtual and physical collections.

60 citations

Proceedings ArticleDOI
30 Nov 2009
TL;DR: Second order co-occurrence and a related distance measure for tag similarities, robust against the variation in tags, are introduced; from this distance measure, methods to analyze user interest and compute recommendations can be derived.
Abstract: Tagging with free form tags is becoming an increasingly important indexing mechanism. However, free form tags have characteristics that require special treatment when used for searching or recommendation because they show much more variation than controlled keywords. In this paper we present a method that puts this large variation to good use. We introduce second order co-occurrence and a related distance measure for tag similarities that is robust against the variation in tags. From this distance measure it is straightforward to derive methods to analyze user interest and compute recommendations. We evaluate the use of tag based recommendation on the Movielens dataset and a dataset of tagged books.

48 citations


Cited by
Book ChapterDOI
01 Jan 2002
TL;DR: Knowledge management systems, Agricultural Information Technology and Information Services Center
Abstract: Knowledge management systems, Agricultural Information Technology and Information Services Center

416 citations

Journal ArticleDOI
TL;DR: Gradient Field HOG is described: an adapted form of the HOG descriptor suitable for Sketch Based Image Retrieval (SBIR). It is incorporated into a Bag of Visual Words retrieval framework and shown to consistently outperform SIFT, multi-resolution HOG, Self Similarity, Shape Context and Structure Tensor in retrieval.

363 citations

Journal ArticleDOI
TL;DR: Investigating the influence of transparency on user trust in and acceptance of content-based recommender systems in the cultural heritage domain shows that explaining to the user why a recommendation was made increased acceptance of the recommendations, but trust in the system itself was not improved by transparency.
Abstract: The increasing availability of (digital) cultural heritage artefacts offers great potential for increased access to art content, but also necessitates tools to help users deal with such abundance of information. User-adaptive art recommender systems aim to present their users with art content tailored to their interests. These systems try to adapt to the user based on feedback from the user on which artworks he or she finds interesting. Users need to be able to depend on the system to competently adapt to their feedback and find the artworks that are most interesting to them. This paper investigates the influence of transparency on user trust in and acceptance of content-based recommender systems. A between-subject experiment (N = 60) evaluated interaction with three versions of a content-based art recommender in the cultural heritage domain. This recommender system provides users with artworks that are of interest to them, based on their ratings of other artworks. Version 1 was not transparent, version 2 explained to the user why a recommendation had been made and version 3 showed a rating of how certain the system was that a recommendation would be of interest to the user. Results show that explaining to the user why a recommendation was made increased acceptance of the recommendations. Trust in the system itself was not improved by transparency. Showing how certain the system was of a recommendation did not influence trust and acceptance. A number of guidelines for design of recommender systems in the cultural heritage domain have been derived from the study's results.

360 citations

Book ChapterDOI
TL;DR: An overview of crowdfunding literature, classified in terms of the main actors (i.e., capital seekers, capital providers, and intermediaries) is provided in this article, where the authors present important research questions for future research.
Abstract: Crowdfunding has become important in recent years. However, there is no comprehensive overview of the economic literature on this topic. This paper provides an overview of crowdfunding literature, classified in terms of the main actors (capital seekers, capital providers, and intermediaries), and presents important research questions for future research.

220 citations

Journal ArticleDOI
TL;DR: This paper compares and contrasts a variety of test searches in PubMed and Google Scholar to gain a better understanding of Google Scholar's searching capabilities.
Abstract: Google Scholar has been met with both enthusiasm and criticism since its introduction in 2004. This search engine provides a simple way to access “peer-reviewed papers, theses, books, abstracts, and articles from academic publishers' sites, professional societies, preprint repositories, universities and other scholarly organizations” [1]. An obvious strength of Google Scholar is its intuitive interface, as the main search engine interface consists of a simple query box. In contrast, databases, such as PubMed, utilize search interfaces that offer a greater variety of advanced features. These additional features, while powerful, often lead to a complexity that may require a substantial investment of time to master. It has been observed that Google Scholar may allow searchers to “find some resources they can use rather than be frustrated by a database's search screen” [2]. Some even feel that “Google Scholar's simplicity may eventually consume PubMed” [3]. Along with ease of use, Google Scholar carries the familiar “Google” brand name. As Kennedy and Price so aptly stated, “College students AND professors might not know that library databases exist, but they sure know Google” [4]. The familiarity of Google may allow librarians and educators to ease students into the scholarly searching process by starting with Google Scholar and eventually moving to more complex systems. Felter noted that “as researchers work with Google Scholar and reach limitations of searching capabilities and options, they may become more receptive to other products” [5]. Google Scholar is also thought to provide increased access to gray literature [2], as it retrieves more than journal articles and includes preprint archives, conference proceedings, and institutional repositories [6]. Google Scholar also includes links to the online collections of some academic libraries. 
Including these access points in Google Scholar retrieval sets may ultimately help more users reach more of their own institution's subscriptions [7]. While its advantages are substantial, Google Scholar is not without flaws. The shortcomings of the system and its search interface have been well documented in the literature and include lack of reliable advanced search functions, lack of controlled vocabulary, and issues regarding scope of coverage and currency. Table 1 summarizes some of the reported criticisms of Google Scholar. Vine found that while Google Scholar pulls in data from PubMed, many PubMed records are missing [20], and that Google Scholar also lacks features available in MEDLINE [12]. Others have noted that Google Scholar should not be the first or sole choice when searching for patient care information, clinical trials, or literature reviews [23,24]. Thorough review and testing of Google Scholar, being an approach similar to that used to evaluate licensed resources, is necessary to better understand its strengths and limitations. As Jacso states, “professional searchers must do sample test searches and correctly interpret the results to corroborate claims and get factual information about databases” [18]. This paper compares and contrasts a variety of test searches in PubMed and Google Scholar to gain a better understanding of Google Scholar's searching capabilities.

215 citations