
Showing papers in "Information Processing and Management in 2013"


Journal ArticleDOI
TL;DR: The multidimensional User Engagement Scale (UES) was administered in an exploratory search environment to assess users' perceptions of the Perceived Usability, Aesthetics, Novelty, Felt Involvement, and Endurability aspects of the experience.
Abstract: The user experience is an integral component of interactive information retrieval (IIR). However, there is a twofold problem in its measurement. Firstly, while many IIR studies have relied on a single dimension of user feedback, that of satisfaction, experience is a much more complex concept. IIR in general, and exploratory search more specifically, are dynamic, multifaceted experiences that evoke pragmatic and hedonic needs, expectations, and outcomes that are not adequately captured by user satisfaction. Secondly, questionnaires, which are typically the means by which users' attitudes and perceptions are measured, are seldom subjected to rigorous reliability and validity testing. To address these issues, we administered the multidimensional User Engagement Scale (UES) in an exploratory search environment to assess users' perceptions of the Perceived Usability (PUs), Aesthetics (AE), Novelty (NO), Felt Involvement (FI), Focused Attention (FA), and Endurability (EN) aspects of the experience. In a typical laboratory-style study, 381 participants performed three relatively complex search tasks using a novel search interface, and responded to the UES immediately upon completion. We used Principal Axis Factor Analysis and Multiple Regression to examine the factor structure of UES items and the relationships amongst factors. Results showed that three of the six sub-scales (PUs, AE, FA) were stable, while NO, FI and EN merged to form a single factor. We discuss recommendations for revising and validating the UES in light of these findings.

159 citations


Journal ArticleDOI
TL;DR: Bibliometric maps cannot be expected ever to be fully equivalent to scholarly taxonomies, but they are valuable tools for assisting users to orient themselves to the information ecology.
Abstract: Knowledge organization (KO) and bibliometrics have traditionally been seen as separate subfields of library and information science, but bibliometric techniques make it possible to identify candidate terms for thesauri and to organize knowledge by relating scientific papers and authors to each other and thereby indicating kinds of relatedness and semantic distance. It is therefore important to view bibliometric techniques as a family of approaches to KO in order to illustrate their relative strengths and weaknesses. The subfield of bibliometrics concerned with citation analysis forms a distinct approach to KO which is characterized by its social, historical and dynamic nature, its close dependence on scholarly literature and its explicit kind of literary warrant. The two main methods, co-citation analysis and bibliographic coupling, represent different things and thus neither can be considered superior for all purposes. The main difference between traditional knowledge organization systems (KOSs) and maps based on citation analysis is that the first group represents intellectual KOSs, whereas the second represents social KOSs. For this reason bibliometric maps cannot be expected ever to be fully equivalent to scholarly taxonomies, but they are – along with other forms of KOSs – valuable tools for assisting users to orient themselves to the information ecology. Like other KOSs, citation-based maps cannot be neutral but will always be based on researchers' decisions, which tend to favor certain interests and views at the expense of others.

137 citations
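
The distinction drawn above between co-citation analysis and bibliographic coupling can be made concrete with a small matrix computation. Below is a minimal sketch (not from the paper) over a toy binary citation matrix C, where C[i, j] = 1 if paper i cites paper j: bibliographic coupling counts shared references (C·Cᵀ), while co-citation counts shared citing papers (Cᵀ·C).

```python
import numpy as np

# Toy citation matrix: C[i, j] = 1 if paper i cites paper j (papers P0..P4, hypothetical).
C = np.array([
    [0, 0, 1, 1, 0],   # P0 cites P2, P3
    [0, 0, 1, 1, 1],   # P1 cites P2, P3, P4
    [0, 0, 0, 0, 1],   # P2 cites P4
    [0, 0, 0, 0, 1],   # P3 cites P4
    [0, 0, 0, 0, 0],   # P4 cites nothing
])

# Bibliographic coupling: papers are related if they cite the same works.
coupling = C @ C.T          # coupling[i, j] = number of references shared by i and j

# Co-citation: papers are related if they are cited together by the same works.
cocitation = C.T @ C        # cocitation[i, j] = number of papers citing both i and j

np.fill_diagonal(coupling, 0)
np.fill_diagonal(cocitation, 0)

print("Bibliographic coupling:\n", coupling)
print("Co-citation:\n", cocitation)
# P0 and P1 are strongly coupled (they share references P2 and P3),
# while P2 and P3 are strongly co-cited (both are cited by P0 and P1).
```

The two matrices relate different pairs of papers, which is why the abstract stresses that neither method can be considered superior for all purposes.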


Journal ArticleDOI
TL;DR: A content-based technique to automatically generate a semantic representation of the user's musical preferences directly from audio, starting from an explicit set of music tracks provided by the user as evidence of his/her preferences is proposed.
Abstract: Preference elicitation is a challenging fundamental problem when designing recommender systems. In the present work we propose a content-based technique to automatically generate a semantic representation of the user's musical preferences directly from audio. Starting from an explicit set of music tracks provided by the user as evidence of his/her preferences, we infer high-level semantic descriptors for each track obtaining a user model. To prove the benefits of our proposal, we present two applications of our technique. In the first one, we consider three approaches to music recommendation, two of them based on a semantic music similarity measure, and one based on a semantic probabilistic model. In the second application, we address the visualization of the user's musical preferences by creating a humanoid cartoon-like character - the Musical Avatar - automatically inferred from the semantic representation. We conducted a preliminary evaluation of the proposed technique in the context of these applications with 12 subjects. The results are promising: the recommendations were positively evaluated and close to those coming from state-of-the-art metadata-based systems, and the subjects judged the generated visualizations to capture their core preferences. Finally, we highlight the advantages of the proposed semantic user model for enhancing the user interfaces of information filtering systems.

111 citations
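
A minimal sketch of the general idea described above: high-level semantic descriptors of the user's chosen tracks are averaged into a user model, and candidate tracks are ranked by semantic similarity to that model. The descriptor names and values below are hypothetical; in the paper the descriptors are inferred automatically from audio rather than hard-coded.

```python
import numpy as np

# Hypothetical high-level semantic descriptors per track
# (e.g. danceability, acousticness, "party" mood, "sad" mood).
library = {
    "track_a": np.array([0.9, 0.1, 0.8, 0.1]),
    "track_b": np.array([0.2, 0.9, 0.1, 0.7]),
    "track_c": np.array([0.8, 0.2, 0.9, 0.2]),
    "track_d": np.array([0.7, 0.3, 0.6, 0.3]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# User model: average the descriptors of the tracks the user provided as preference evidence.
preferred = ["track_a", "track_c"]
user_model = np.mean([library[t] for t in preferred], axis=0)

# Recommend the remaining tracks by semantic similarity to the user model.
candidates = [t for t in library if t not in preferred]
ranked = sorted(candidates, key=lambda t: cosine(user_model, library[t]), reverse=True)
print(ranked)   # track_d (also upbeat and danceable) ranks above track_b
```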


Journal ArticleDOI
TL;DR: A new variable-length encoding scheme for sequences of integers, Directly Addressable Codes (DACs), which enables direct access to any element of the encoded sequence without the need of any sampling method is presented.
Abstract: We present a new variable-length encoding scheme for sequences of integers, Directly Addressable Codes (DACs), which enables direct access to any element of the encoded sequence without the need of any sampling method. Our proposal is a kind of implicit data structure that introduces synchronism in the encoded sequence without using asymptotically any extra space. We show some experiments demonstrating that the technique is not only simple, but also competitive in time and space with existing solutions in several applications, such as the representation of LCP arrays or high-order entropy-compressed sequences.

109 citations
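
A compact Python sketch of the idea behind DACs (not the authors' implementation): each integer is split into fixed-width chunks stored level by level, a bitmap marks whether a value continues into the next level, and rank queries over that bitmap give direct access to any element. Here rank is simulated with plain prefix-sum arrays instead of a compressed bitmap, so the asymptotic space guarantees of the paper are not reproduced.

```python
class DACs:
    """Sketch of Directly Addressable Codes with b-bit chunks per level."""

    def __init__(self, values, b=2):
        self.b = b
        self.levels = []          # each level: (chunks, bits, prefix-sum ranks)
        current = list(values)
        while current:
            chunks, bits, nxt = [], [], []
            for v in current:
                chunks.append(v & ((1 << b) - 1))   # lowest b bits of the value
                rest = v >> b
                bits.append(1 if rest else 0)       # 1 = value continues in next level
                if rest:
                    nxt.append(rest)
            ranks = [0]
            for bit in bits:                        # prefix sums stand in for rank1()
                ranks.append(ranks[-1] + bit)
            self.levels.append((chunks, bits, ranks))
            current = nxt

    def access(self, i):
        """Decode the i-th value directly, without sampling or a sequential scan."""
        value, shift, pos = 0, 0, i
        for chunks, bits, ranks in self.levels:
            value |= chunks[pos] << shift
            shift += self.b
            if not bits[pos]:
                break
            pos = ranks[pos + 1] - 1    # rank1(bitmap, pos) - 1 = index in next level
        return value


seq = [5, 1, 300, 7, 42]
dacs = DACs(seq, b=2)
print([dacs.access(i) for i in range(len(seq))])
assert [dacs.access(i) for i in range(len(seq))] == seq
```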


Journal ArticleDOI
TL;DR: The facet-analytic paradigm is probably the most distinct approach to knowledge organization within Library and Information Science, and in many ways it has dominated what has been termed "modern classification theory".
Abstract: The facet-analytic paradigm is probably the most distinct approach to knowledge organization within Library and Information Science, and in many ways it has dominated what has been termed "modern classification theory". It was mainly developed by S.R. Ranganathan and the British Classification Research Group, but it is mostly based on principles of logical division developed more than two millennia ago. Colon Classification (CC) and Bliss 2 (BC2) are among the most important systems developed on this theoretical basis, but it has also influenced the development of other systems, such as the Dewey Decimal Classification (DDC), and is also applied in many websites. It still has a strong position in the field and it is the most explicit and "pure" theoretical approach to knowledge organization (KO), though not necessarily the most important one. The strength of this approach is its logical principles and the way it provides structures in knowledge organization systems (KOS). The main weaknesses are (1) its lack of empirical basis and (2) its speculative ordering of knowledge without basis in the development or influence of theories and socio-historical studies. It seems to be based on the problematic assumption that relations between concepts are a priori and not established by the development of models, theories and laws.

91 citations


Journal ArticleDOI
TL;DR: A specific way to integrate interactive visualization and personalized search is proposed, and an adaptive visualization based search system, Adaptive VIBE, that implements it is introduced; it can improve the precision and the productivity of the personalized search system while helping users to discover more diverse sets of information.
Abstract: As the volume and breadth of online information is rapidly increasing, ad hoc search systems become less and less efficient at answering the information needs of modern users. To support the growing complexity of search tasks, researchers in the field of information retrieval developed and explored a range of approaches that extend the traditional ad hoc retrieval paradigm. Among these approaches, personalized search systems and exploratory search systems attracted many followers. Personalized search explored the power of artificial intelligence techniques to provide tailored search results according to different user interests, contexts, and tasks. In contrast, exploratory search capitalized on the power of human intelligence by providing users with more powerful interfaces to support the search process. As these approaches are not contradictory, we believe that they can reinforce each other. We argue that the effectiveness of personalized search systems may be increased by allowing users to interact with the system and learn/investigate the problem in order to reach the final goal. We also suggest that an interactive visualization approach could offer a good ground to combine the strong sides of personalized and exploratory search approaches. This paper proposes a specific way to integrate interactive visualization and personalized search and introduces an adaptive visualization based search system, Adaptive VIBE, that implements it. We tested the effectiveness of Adaptive VIBE and investigated its strengths and weaknesses by conducting a full-scale user study. The results show that Adaptive VIBE can improve the precision and the productivity of the personalized search system while helping users to discover more diverse sets of information.

85 citations
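
The VIBE-style spatial metaphor behind the system can be illustrated in a few lines: each document is placed at the similarity-weighted average of the positions of points of interest (POIs, e.g. query terms or user-model terms). This is a generic sketch of that layout idea, not the Adaptive VIBE code; the similarity values below are hypothetical.

```python
import numpy as np

# POI anchor positions on a 2-D canvas (e.g. query keywords on one side,
# user-model keywords on the other, as in an adaptive layout).
poi_positions = np.array([
    [0.0, 0.0],    # POI "query:python"
    [1.0, 0.0],    # POI "profile:visualization"
    [0.5, 1.0],    # POI "profile:search"
])

# Hypothetical document-to-POI similarity scores (rows = documents).
similarities = np.array([
    [0.9, 0.1, 0.2],
    [0.1, 0.8, 0.7],
    [0.4, 0.4, 0.4],
])

# Each document sits at the similarity-weighted centroid of the POIs,
# so documents drift toward the POIs (and hence the interests) they match best.
weights = similarities / similarities.sum(axis=1, keepdims=True)
doc_positions = weights @ poi_positions
print(doc_positions)
```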


Journal ArticleDOI
TL;DR: It is shown that a user's level of domain knowledge can be inferred from their interactive search behaviors without considering the content of queries or documents, and exploratory regression models are constructed that suggest it is possible to build models that can make predictions of the user's level of knowledge based on real-time measurements of eye movement patterns during a task session.
Abstract: The acquisition of information and the search interaction process is influenced strongly by a person's use of their knowledge of the domain and the task. In this paper we show that a user's level of domain knowledge can be inferred from their interactive search behaviors without considering the content of queries or documents. A technique is presented to model a user's information acquisition process during search using only measurements of eye movement patterns. In a user study (n=40) of search in the domain of genomics, a representation of the participant's domain knowledge was constructed using self-ratings of knowledge of genomics-related terms (n=409). Cognitive effort features associated with reading eye movement patterns were calculated for each reading instance during the search tasks. The results show correlations between the cognitive effort due to reading and an individual's level of domain knowledge. We construct exploratory regression models that suggest it is possible to build models that can make predictions of the user's level of knowledge based on real-time measurements of eye movement patterns during a task session.

84 citations


Journal ArticleDOI
TL;DR: This paper evaluated recommendations of learning resources generated by different well known memory-based CF algorithms using two databases (with implicit and explicit ratings) gathered from the popular MERLOT repository and compared several existing endorsement mechanisms of the repository to explore possible relations among them.
Abstract: Collaborative filtering (CF) algorithms are techniques used by recommender systems to predict the utility of items for users based on the similarity among their preferences and the preferences of other users. The enormous growth of learning objects on the internet and the availability of preferences of usage by the community of users in the existing learning object repositories (LORs) have opened the possibility of testing the efficiency of CF algorithms on recommending learning materials to the users of these communities. In this paper we evaluated recommendations of learning resources generated by different well known memory-based CF algorithms using two databases (with implicit and explicit ratings) gathered from the popular MERLOT repository. We have also contrasted the results of the generated recommendations with several existing endorsement mechanisms of the repository to explore possible relations among them. Finally, the recommendations generated by the different algorithms were compared in order to evaluate whether or not they were overlapping. The results found here can be used as a starting point for future studies that account for the specific context of learning object repositories and the different aspects of preference in learning resource selection.

81 citations
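
For readers unfamiliar with the memory-based algorithms being compared, here is a minimal user-based collaborative filtering sketch with cosine similarity over co-rated items. This is a generic formulation, not the exact MERLOT experimental setup; the ratings matrix is hypothetical.

```python
import numpy as np

# Rows = users, columns = learning objects; 0 = no rating (hypothetical data).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    mask = (a > 0) & (b > 0)                 # compare only co-rated items
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask] /
                 (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask]) + 1e-12))

def predict(R, user, item, k=2):
    """Predict a rating as the similarity-weighted average of the k nearest raters."""
    sims = [(cosine_sim(R[user], R[v]), v)
            for v in range(R.shape[0]) if v != user and R[v, item] > 0]
    top = sorted(sims, reverse=True)[:k]
    num = sum(s * R[v, item] for s, v in top)
    den = sum(abs(s) for s, _ in top)
    return num / den if den else 0.0

print(predict(R, user=0, item=2))   # predicted rating of user 0 for learning object 2
```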


Journal ArticleDOI
TL;DR: The proposed approach has been extended to develop a question dependent approach that considers the relevance of historical questions to the target question in deriving user domain knowledge, reputation and authority.
Abstract: Question answering websites are becoming an ever more popular knowledge sharing platform. On such websites, people may ask any type of question and then wait for someone else to answer the question. However, in this manner, askers may not obtain correct answers from appropriate experts. Recently, various approaches have been proposed to automatically find experts in question answering websites. In this paper, we propose a novel hybrid approach to effectively find experts for the category of the target question in question answering websites. Our approach considers user subject relevance, user reputation and authority of a category in finding experts. A user's subject relevance denotes the relevance of a user's domain knowledge to the target question. A user's reputation is derived from the user's historical question-answering records, while user authority is derived from link analysis. Moreover, our proposed approach has been extended to develop a question dependent approach that considers the relevance of historical questions to the target question in deriving user domain knowledge, reputation and authority. We used a dataset obtained from Yahoo! Answer Taiwan to evaluate our approach. Our experiment results show that our proposed methods outperform other conventional methods.

80 citations
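
A rough sketch of how the three signals described above (subject relevance, reputation from historical question-answering records, and link-based authority) could be combined into a single expert score. The toy data, the multiplicative combination and the use of PageRank for authority are illustrative assumptions, not the paper's exact formulation.

```python
import networkx as nx

# Hypothetical Q&A history for one category: (asker, answerer, answer_was_best, answer_text).
records = [
    ("u1", "u2", True,  "python list comprehension syntax"),
    ("u3", "u2", True,  "python generators and iterators"),
    ("u1", "u4", False, "java streams map filter"),
    ("u2", "u4", True,  "python decorators explained"),
]

target_question = "how do python iterators work"

def subject_relevance(user):
    """Word overlap between the user's answered questions and the target question."""
    answered = " ".join(text for _, a, _, text in records if a == user).split()
    target = set(target_question.split())
    return len(target & set(answered)) / len(target) if answered else 0.0

def reputation(user):
    """Fraction of the user's answers that were selected as best answers."""
    answers = [best for _, a, best, _ in records if a == user]
    return sum(answers) / len(answers) if answers else 0.0

# Authority from link analysis: an asker "endorses" the user who answered them.
graph = nx.DiGraph()
graph.add_edges_from((asker, answerer) for asker, answerer, _, _ in records)
authority = nx.pagerank(graph)

candidates = {answerer for _, answerer, _, _ in records}
scores = {u: subject_relevance(u) * reputation(u) * authority.get(u, 0.0)
          for u in candidates}
print(max(scores, key=scores.get))   # the most promising expert for the target question
```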


Journal ArticleDOI
TL;DR: GroupReM is introduced which makes movie recommendations appealing to members of a group by employing a merging strategy to explore individual group members' interests in movies and creating a profile that reflects the preferences of the group on movies, and using word-correlation factors to find movies similar in content.
Abstract: People are gregarious by nature, which explains why group activities, from colleagues sharing a meal to friends attending a book club event together, are the social norm. Online group recommenders identify items of interest, such as restaurants, movies, and books, that satisfy the collective needs of a group (rather than the interests of individual group members). With a number of new movies being released every week, online recommenders play a significant role in suggesting movies for family members or groups of friends/people to watch, either at home or at movie theaters. Making group recommendations relevant to the joint interests of a group, however, is not a trivial task due to the diversity in preferences among group members. To address this issue, we introduce GroupReM which makes movie recommendations appealing (to a certain degree) to members of a group by (i) employing a merging strategy to explore individual group members' interests in movies and create a profile that reflects the preferences of the group on movies, (ii) using word-correlation factors to find movies similar in content, and (iii) considering the popularity of movies at a movie website. Unlike existing group recommenders based on collaborative filtering (CF) which consider ratings of movies to perform the recommendation task, GroupReM primarily employs (personal) tags for capturing the contents of movies considered for recommendation and group members' interests. The design of GroupReM, which is simple and domain-independent, can easily be extended to make group recommendations on items other than movies. Empirical studies conducted using more than 3000 groups of different users in the MovieLens dataset, which vary in size and in movie preferences, show that GroupReM is highly effective and efficient in recommending movies appealing to a group. Experimental results also verify that GroupReM outperforms popular CF-based recommenders in making group recommendations.

75 citations
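
A toy sketch of the merging idea: individual members' tag profiles are pooled into a weighted group profile, and candidate movies are ranked by tag overlap with that profile, blended with popularity. Word-correlation factors and the MovieLens data are simplified away; all names and numbers here are hypothetical.

```python
from collections import Counter

# Hypothetical personal tag profiles of three group members.
members = {
    "alice": ["sci-fi", "space", "thriller"],
    "bob":   ["sci-fi", "comedy", "space"],
    "carol": ["drama", "space", "thriller"],
}

# Candidate movies described by (personal) tags, plus a popularity score.
movies = {
    "movie_x": ({"sci-fi", "space", "aliens"},  0.9),
    "movie_y": ({"comedy", "romance"},          0.7),
    "movie_z": ({"thriller", "space", "heist"}, 0.5),
}

# Merging strategy: pool the members' tags into a single weighted group profile.
group_profile = Counter(tag for tags in members.values() for tag in tags)

def score(movie_tags, popularity, alpha=0.8):
    """Tag-overlap score blended with movie popularity."""
    overlap = sum(group_profile[t] for t in movie_tags)
    return alpha * overlap + (1 - alpha) * popularity

ranking = sorted(movies, key=lambda m: score(*movies[m]), reverse=True)
print(ranking)   # movies matching the pooled group tags rank first
```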


Journal ArticleDOI
TL;DR: This research demonstrates that the LDA-based classification scheme tends to outperform the Delta rule and the χ2 distance, two classical approaches in authorship attribution based on a restricted number of terms.
Abstract: This paper describes, evaluates and compares the use of Latent Dirichlet allocation (LDA) as an approach to authorship attribution. Based on this generative probabilistic topic model, we can model each document as a mixture of topic distributions with each topic specifying a distribution over words. Based on author profiles (aggregation of all texts written by the same writer) we suggest computing the distance with a disputed text to determine its possible writer. This distance is based on the difference between the two topic distributions. To evaluate different attribution schemes, we carried out an experiment based on 5408 newspaper articles (Glasgow Herald) written by 20 distinct authors. To complement this experiment, we used 4326 articles extracted from the Italian newspaper La Stampa and written by 20 journalists. This research demonstrates that the LDA-based classification scheme tends to outperform the Delta rule and the χ2 distance, two classical approaches in authorship attribution based on a restricted number of terms. Compared to the Kullback-Leibler divergence, the LDA-based scheme can provide better effectiveness when considering a larger number of terms.
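
To make the attribution step concrete: each author profile and the disputed text are represented as topic distributions, and the text is assigned to the author with the smallest distributional distance. The sketch below assumes the topic distributions have already been inferred by an LDA model and uses the Jensen-Shannon distance as one possible choice; the paper evaluates several distance functions, and the numbers here are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical topic distributions (e.g. from a trained LDA model with 5 topics).
author_profiles = {
    "author_A": np.array([0.50, 0.20, 0.10, 0.10, 0.10]),
    "author_B": np.array([0.05, 0.10, 0.40, 0.35, 0.10]),
}
disputed_text = np.array([0.45, 0.25, 0.10, 0.10, 0.10])

def attribute(text_topics, profiles):
    """Assign the disputed text to the author with the closest topic distribution."""
    distances = {a: jensenshannon(text_topics, p) for a, p in profiles.items()}
    return min(distances, key=distances.get), distances

author, distances = attribute(disputed_text, author_profiles)
print(author, distances)    # author_A is the closer profile here
```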

Journal ArticleDOI
TL;DR: The author offers a comprehensive survey of feasible algorithms for ranking users in social networks, examines their vulnerabilities to linking malpractice in such networks, and suggests an objective criterion against which to compare such algorithms.
Abstract: Micro-blogging services such as Twitter allow anyone to publish anything, anytime. Needless to say, much of the available content can be dismissed as babble or spam. However, given the number and diversity of users, some valuable pieces of information should arise from the stream of tweets. Thus, such services can develop into valuable sources of up-to-date information (the so-called real-time web) provided a way to find the most relevant/trustworthy/authoritative users is available. Hence, finding such users becomes a highly pertinent question, one for which graph centrality methods can provide an answer. In this paper the author offers a comprehensive survey of feasible algorithms for ranking users in social networks, examines their vulnerabilities to linking malpractice in such networks, and suggests an objective criterion against which to compare such algorithms. Additionally, he suggests a first step towards “desensitizing” prestige algorithms against cheating by spammers and other abusive users.
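
As a concrete example of the kind of graph centrality method surveyed, the snippet below ranks the users of a toy follower graph with PageRank via networkx. The graph is hypothetical, and the paper goes further by analysing how such scores can be manipulated through link spam.

```python
import networkx as nx

# Directed follower graph: an edge u -> v means "u follows v",
# so incoming edges confer prestige (hypothetical toy data).
follows = [
    ("alice", "expert"), ("bob", "expert"), ("carol", "expert"),
    ("expert", "news_outlet"), ("bob", "spammer"),
    ("spammer", "spammer_sock1"), ("spammer_sock1", "spammer"),
]

graph = nx.DiGraph(follows)
prestige = nx.pagerank(graph, alpha=0.85)

for user, score in sorted(prestige.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user:15s} {score:.3f}")
# Note how the mutually-following pair (spammer / sock puppet) recycles rank and
# inflates its own scores: exactly the kind of linking malpractice the paper examines.
```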

Journal ArticleDOI
TL;DR: This paper presents what, to the best of the authors' knowledge, is currently the most comprehensive study of the relative quality of textual features in social media, based on an extensive characterization of data crawled from four popular applications.
Abstract: Social media is increasingly becoming a significant fraction of the content retrieved daily by Web users. However, the potential lack of quality of user generated content poses a challenge to information retrieval services, which rely mostly on textual features generated by users (particularly tags) commonly associated with the multimedia objects. This paper presents what, to the best of our knowledge, is currently the most comprehensive study of the relative quality of textual features in social media. We analyze four different features, namely, title, tags, description and comments posted by users, in four popular applications, namely, YouTube, Yahoo! Video, LastFM and CiteULike. Our study is based on an extensive characterization of data crawled from the four applications with respect to usage, amount and semantics of content, descriptive and discriminative power as well as content and information diversity across features. It also includes a series of object classification and tag recommendation experiments as case studies of two important information retrieval tasks, aiming at analyzing how these tasks are affected by the quality of the textual features. Classification and recommendation effectiveness is analyzed in light of our characterization results. Our findings provide valuable insights for future research and design of Web 2.0 applications and services.
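
One way to appreciate the "discriminative power" comparison is a small classification case study: train the same classifier on different textual fields of the same objects and compare accuracy per field. The sketch below uses scikit-learn on a handful of made-up objects; the paper's experiments are of course run on large crawls of YouTube, Yahoo! Video, LastFM and CiteULike.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical objects with two textual features (title, tags) and a category label.
objects = [
    {"title": "funny cat compilation", "tags": "cat pet humor",    "label": "pets"},
    {"title": "cute kittens playing",  "tags": "kitten cat cute",  "label": "pets"},
    {"title": "guitar solo lesson",    "tags": "music guitar rock","label": "music"},
    {"title": "live rock concert",     "tags": "music live band",  "label": "music"},
    {"title": "puppy training tips",   "tags": "dog pet training", "label": "pets"},
    {"title": "piano chords tutorial", "tags": "music piano",      "label": "music"},
]
labels = [o["label"] for o in objects]

# Compare how well each field alone predicts the category (a proxy for discriminative power).
for feature in ("title", "tags"):
    texts = [o[feature] for o in objects]
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(clf, texts, labels, cv=3)
    print(f"{feature}: mean accuracy {scores.mean():.2f}")
```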

Journal ArticleDOI
TL;DR: This study is the first study that encompasses both the antecedents of simultaneous cooperative and competitive behaviors and the mechanisms through which simultaneous cooperation and competition influence knowledge sharing behaviors.
Abstract: We present and empirically validate a Coopetitive Model of Knowledge Sharing that helps understand the forces underlying High-Quality Knowledge Sharing in multiparty software development teams. More specifically, we integrate the Coopetitive Model of Knowledge Sharing and Social Interdependence Theory to explain the forces behind High-Quality Knowledge Sharing in cross-functional software development teams. Based on the analysis of data collected from 115 software development project managers, we explore the mechanisms through which simultaneous cooperative and competitive behaviors drive High-Quality Knowledge Sharing among cross-functional team members. We also show how multiple interdependencies that are simultaneously set in motion engender cooperative and competitive behaviors. This study is the first study that encompasses both the antecedents of simultaneous cooperative and competitive behaviors and the mechanisms through which simultaneous cooperation and competition influence knowledge sharing behaviors. The model adds to the emerging contingency perspective pertaining to the study of cooperation and competition in system development teams.

Journal ArticleDOI
TL;DR: The article focuses on social workers' workarounds, i.e., their own alternative strategies for overcoming the various types of obstacles to information interaction in a client information system (CIS).
Abstract: The article focuses on social workers' workarounds, i.e., their own alternative strategies for overcoming the various types of obstacles to information interaction in a client information system (CIS). The data consist of semi-structured interviews and observations of social workers, together with their verbal accounts, while they used the CIS in their daily work. The workarounds were analyzed from a process perspective, taking into account antecedent conditions, the actual workarounds and their consequences. Design flaws and external demands of the work generated the workarounds. The social workers used small-scale tricks within the CIS to maintain continuity in a client's trajectory; they relied on shadow systems to manage their whole clientele; and they took shortcuts in the production of statistical information. The workarounds offered a better grip on information and saved time. However, some of the workarounds were a source of tension in a child protection context. The analysis of workarounds provided valuable secondary design suggestions for remedying the CIS.

Journal ArticleDOI
TL;DR: The Recommendation System of Pedagogical Patterns (RSPP) as discussed by the authors is a system that allows lecturers to define their best teaching strategies for use in the context of a specific class, defined by: the specific characteristics of the subject being treated, the specific objectives that are expected to be achieved in the classroom session, the profile of the students on the course, the dominant characteristics of a teacher, and the classroom environment for each session.
Abstract: To carry out effective teaching/learning processes, lecturers in a variety of educational institutions frequently need support. They therefore resort to advice from more experienced lecturers, to formal training processes such as specializations, master or doctoral degrees, or to self-training. High costs in time and money are invariably involved in the processes of formal training, while self-training and advice each bring their own specific risks (e.g. of following new trends that are not fully evaluated or the risk of applying techniques that are inappropriate in specific contexts). This paper presents a system that allows lecturers to define their best teaching strategies for use in the context of a specific class. The context is defined by: the specific characteristics of the subject being treated, the specific objectives that are expected to be achieved in the classroom session, the profile of the students on the course, the dominant characteristics of the teacher, and the classroom environment for each session, among others. The system presented is the Recommendation System of Pedagogical Patterns (RSPP). To construct the RSPP, an ontology representing the pedagogical patterns and their interaction with the fundamentals of the educational process was defined. A web information system was also defined to record information on courses, students, lecturers, etc.; an option based on a unified hybrid model (for content and collaborative filtering) of recommendations for pedagogical patterns was further added to the system. RSPP features a minable view, a tabular structure that summarizes and organizes the information registered in the rest of the system as well as facilitating the task of recommendation. The data recorded in the minable view is taken to a latent space, where noise is reduced and the essence of the information contained in the structure is distilled. This process makes use of Singular Value Decomposition (SVD), commonly used by information retrieval and recommendation systems. Satisfactory results both in the accuracy of the recommendations and in the use of the general application open the door for further research and expand the role of recommender systems in educational teacher support processes.
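
The SVD step mentioned at the end can be illustrated in a few lines: project a (hypothetical) minable-view matrix into a low-rank latent space and reconstruct it to obtain denoised scores from which recommendations can be ranked. This is a generic truncated-SVD sketch, not the RSPP code.

```python
import numpy as np

# Hypothetical minable view: rows = teaching contexts, columns = pedagogical patterns,
# values = observed suitability scores (0 = unknown).
M = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Truncated SVD: keep the k strongest latent factors to reduce noise.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
M_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The reconstructed scores fill in the unobserved cells and can be ranked
# to recommend patterns for a given teaching context.
context = 1
recommended = np.argsort(-M_hat[context])
print(np.round(M_hat[context], 2), "->", recommended)
```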

Journal ArticleDOI
TL;DR: This paper presents a hybrid approach that contains two steps that can invoke each other: the first step discovers informative content using Decision Tree Learning as an appropriate machine learning method and creates rules from the results of this learning method.
Abstract: Eliminating noisy information and extracting informative content have become important issues for web mining, search and accessibility. This extraction process can employ automatic techniques and hand-crafted rules. Automatic extraction techniques focus on various machine learning methods, but implementing these techniques increases time complexity of the extraction process. Conversely, extraction through hand-crafted rules is an efficient technique that uses string manipulation functions, but preparing these rules is difficult and cumbersome for users. In this paper, we present a hybrid approach that contains two steps that can invoke each other. The first step discovers informative content using Decision Tree Learning as an appropriate machine learning method and creates rules from the results of this learning method. The second step extracts informative content using rules obtained from the first step. However, if the second step does not return an extraction result, the first step gets invoked. In our experiments, the first step achieves high accuracy with 95.76% in extraction of the informative content. Moreover, 71.92% of the rules can be used in the extraction process, and it is approximately 240 times faster than the first step.
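
A compact sketch of the two ingredients described above: learn a decision tree over simple block-level features (here, hypothetical link density and text length) to separate informative content from noise, then read the learned tree back as human-inspectable rules that a fast rule-based extractor could apply. The features and data are illustrative; the paper derives its rules from its own feature set.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical blocks of a web page: [link_density, text_length_in_chars]
X = [
    [0.90,  120],   # navigation menu
    [0.85,   80],   # footer links
    [0.05, 1500],   # article body
    [0.10,  900],   # article body
    [0.70,   60],   # ad block
    [0.02, 2000],   # article body
]
y = ["noise", "noise", "informative", "informative", "noise", "informative"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Turn the learned tree into rules for a lightweight, hand-crafted-style extractor.
print(export_text(tree, feature_names=["link_density", "text_length"]))
print(tree.predict([[0.08, 1100]]))   # -> ['informative']
```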

Journal ArticleDOI
TL;DR: A method to automatically define a personal ontology via a knowledge extraction process from the general purpose ontology YAGO is presented; starting from a set of keywords, the process aims to define a structured and semantically coherent representation of the user's topical interests.
Abstract: Personalized search is aimed at tailoring the search outcome to users; to this aim user profiles play an important role: the more faithfully a user profile represents the user's interests and preferences, the higher the probability of improving the search process. In the approaches proposed in the literature, user profiles are formally represented as bags of words, as vectors, or as conceptual taxonomies, generally defined based on external knowledge resources (such as WordNet and the ODP - Open Directory Project). Ontologies have been more recently considered as a powerful expressive means for knowledge representation. The advantage offered by ontological languages is that they allow a more structured and expressive knowledge representation with respect to the above-mentioned approaches. A challenging research activity consists in defining user profiles by a knowledge extraction process from an existing ontology, with the main aim of producing a semantically rich representation of the user's interests. In this paper a method to automatically define a personal ontology via a knowledge extraction process from the general purpose ontology YAGO is presented; starting from a set of keywords, which are representative of the user's interests, the process aims to define a structured and semantically coherent representation of the user's topical interests. In the paper the proposed method is described, as well as some evaluations that show its effectiveness.
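
A toy illustration of the extraction idea: starting from user keywords, match concepts in a tiny, hand-made set of subclass triples and pull in their ancestors to obtain a small, connected taxonomy of the user's interests. Real YAGO access and the paper's coherence criteria are not modeled here; the triples are hypothetical.

```python
# Tiny hypothetical fragment of an ontology: (subject, "subClassOf", object).
triples = [
    ("jazz",          "subClassOf", "music_genre"),
    ("music_genre",   "subClassOf", "art"),
    ("rock_climbing", "subClassOf", "sport"),
    ("sport",         "subClassOf", "activity"),
    ("photography",   "subClassOf", "art"),
]
parents = {s: o for s, p, o in triples if p == "subClassOf"}

def personal_ontology(keywords):
    """Collect the keywords' concepts plus all their ancestors as taxonomy edges."""
    edges = set()
    for kw in keywords:
        node = kw
        while node in parents:                 # walk up the subclass hierarchy
            edges.add((node, parents[node]))
            node = parents[node]
    return edges

user_keywords = ["jazz", "photography"]
for child, parent in sorted(personal_ontology(user_keywords)):
    print(f"{child} subClassOf {parent}")
# jazz and photography end up connected under the shared ancestor "art",
# giving a structured view of the user's topical interests.
```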

Journal ArticleDOI
TL;DR: A novel privacy-preserving collaborative filtering scheme based on bisecting k-means clustering is proposed, in which two preprocessing methods are applied to relieve scalability and augment accuracy significantly.
Abstract: Privacy-preserving collaborative filtering is an emerging web-adaptation tool to cope with the information overload problem without jeopardizing individuals' privacy. However, collaborative filtering schemes with privacy commonly suffer from scalability and sparseness problems as the content in the domain proliferates. Moreover, applying privacy measures causes a distortion in the collected data, which in turn degrades the accuracy of such systems. In this work, we propose a novel privacy-preserving collaborative filtering scheme based on bisecting k-means clustering in which we apply two preprocessing methods. The first preprocessing scheme deals with the scalability problem by constructing a binary decision tree through a bisecting k-means clustering approach, while the second produces clones of users by inserting pseudo-self-predictions into original user profiles to boost the accuracy of the scalability-enhanced structure. The sparse nature of the collections is handled by transforming ratings into item feature-based profiles. After analyzing our scheme with respect to privacy and supplementary costs, we perform experiments on benchmark data sets to evaluate it in terms of accuracy and online performance. Our empirical outcomes verify that the combined effects of the proposed preprocessing schemes relieve scalability and augment accuracy significantly.
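
A brief sketch of the bisecting k-means preprocessing idea: recursively split the user set into two clusters with k-means until a size threshold is met, yielding a binary tree that narrows down the neighborhood to search at prediction time. The privacy perturbation, pseudo-self-predictions and feature-based profiles are not shown; the data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def bisect(user_vectors, indices, max_leaf=2, depth=0):
    """Recursively bisect users with 2-means, printing the resulting binary tree."""
    print("  " * depth + f"node with users {indices.tolist()}")
    if len(indices) <= max_leaf:
        return
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(user_vectors[indices])
    for side in (0, 1):
        child = indices[km.labels_ == side]
        if len(child) and len(child) < len(indices):   # guard against degenerate splits
            bisect(user_vectors, child, max_leaf, depth + 1)

# Hypothetical user-item rating matrix (rows = users).
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
    [5, 5, 0, 1],
    [0, 0, 5, 5],
], dtype=float)

bisect(R, np.arange(len(R)))
```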

Journal ArticleDOI
TL;DR: This paper provides a group recommendation similarity metric and demonstrates the advantage of performing the aggregation of the group's users in the similarity metric stage of the collaborative filtering process.
Abstract: In collaborative filtering recommender systems, recommendations can be made to groups of users. There are four basic stages of the collaborative filtering algorithm at which the individual group members' data can be aggregated into data for the group: the similarity metric, establishing the neighborhood, the prediction phase, and the determination of recommended items. In this paper we perform aggregation experiments in each of the four stages and reach two fundamental conclusions: (1) the system accuracy does not vary significantly according to the stage where the aggregation is performed, (2) the system performance improves notably when the aggregation is performed in an earlier stage of the collaborative filtering process. This paper provides a group recommendation similarity metric and demonstrates the advantage of performing the aggregation of the group's users in the similarity metric itself.
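
A minimal sketch of what aggregating in the similarity metric stage means: the group members' rating vectors are merged (here by averaging the known ratings) before computing a single Pearson similarity between the group and each candidate neighbour. The data and the averaging choice are illustrative, not the paper's metric.

```python
import numpy as np

# Hypothetical ratings matrix (rows = users, 0 = unrated).
R = np.array([
    [5, 3, 0, 1],
    [4, 4, 0, 2],
    [1, 0, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

group = [0, 1]            # the group we recommend to
others = [2, 3]           # candidate neighbours

# Stage-1 aggregation: build one "pseudo-user" by averaging the members' known ratings.
counts = (R[group] > 0).sum(axis=0)
group_profile = np.where(counts > 0, R[group].sum(axis=0) / np.maximum(counts, 1), 0.0)

def pearson(a, b):
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2:
        return 0.0
    return float(np.corrcoef(a[mask], b[mask])[0, 1])

similarities = {v: pearson(group_profile, R[v]) for v in others}
print(group_profile, similarities)
# The rest of the CF pipeline (neighbourhood, prediction, top-N) then runs unchanged,
# treating the group profile as an ordinary user.
```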

Journal ArticleDOI
TL;DR: Findings from the research show that the gender-orientation of the key phrase is a significant determinant in predicting behaviors and performance, with statistically different consumer behaviors for all attributes as the probability of a male or female keyword phrase changes.
Abstract: In this research, we evaluate the effect of gender targeted advertising on the performance of sponsored search advertising. We analyze nearly 7,000,000 records spanning 33 consecutive months of a keyword advertising campaign from a major US retailer. In order to determine the effect of demographic targeting, we classify the campaign's key phrases by a probability of being targeted for a specific gender, and we then compare the key performance indicators among these groupings using the critical sponsored search metrics of impressions, clicks, cost-per-click, sales revenue, orders, items, and return on advertising. Findings from our research show that the gender-orientation of the key phrase is a significant determinant in predicting behaviors and performance, with statistically different consumer behaviors for all attributes as the probability of a male or female keyword phrase changes. However, gender-neutral phrases perform the best overall, generating 20 times the return on advertising of any gender-targeted category. Insight from this research could result in sponsored advertising efforts being more effectively targeted to searchers and potential consumers.

Journal ArticleDOI
TL;DR: An overview of the frameworks developed to characterize such a multi-faceted concept is presented and the most common quality-related problems affecting metadata both during the creation and the aggregation phase are discussed.
Abstract: In this work, we elaborate on the meaning of metadata quality by surveying efforts and experiences matured in the digital library domain. In particular, an overview of the frameworks developed to characterize such a multi-faceted concept is presented. Moreover, the most common quality-related problems affecting metadata both during the creation and the aggregation phase are discussed together with the approaches, technologies and tools developed to mitigate them. This survey on digital library developments is expected to contribute to the ongoing discussion on data and metadata quality occurring in the emerging yet more general framework of data infrastructures.

Journal ArticleDOI
TL;DR: It is argued that collaboration is an important aspect of human-centered IR, and that the work provides interesting insights into people doing information seeking/retrieval in collaboration.
Abstract: Communication is considered to be one of the most essential components of collaboration, but our understanding of which form of communication provides the best cost-benefit balance is severely lacking. To help investigate the effects of various communication channels on a collaborative project, we conducted a user study with 30 pairs (60 participants) in three different conditions - co-located, remotely located with text chat, and remotely located with text as well as audio chat - in an exploratory search task. Using both quantitative and qualitative data analysis, we found that teams with remotely located participants were more effective in terms of being able to explore more diverse information. Adding audio support for remote collaboration helped participants to lower their cognitive load as well as negative emotions compared to those working in the same space. We also show how these findings could help design more effective systems for collaborative information seeking tasks using adequate and appropriate communication. We argue that collaboration is an important aspect of human-centered IR, and that our work provides interesting insights into people doing information seeking/retrieval in collaboration.

Journal ArticleDOI
TL;DR: The theory of communication and uncertainty management is reviewed and nine principles based on that theoretical work that can be used to influence IR system design are offered, reflecting a view of uncertainty as a multi-faceted and dynamic experience.
Abstract: Uncertainty is an important idea in information-retrieval (IR) research, but the concept has yet to be fully elaborated and explored. Common assumptions about uncertainty are (a) that it is a negative (anxiety-producing) state and (b) that it will be reduced through information search and retrieval. Research in the domain of uncertainty in illness, however, has demonstrated that uncertainty is a complex phenomenon that shares a complicated relationship with information. Past research on people living with HIV and individuals who have tested positive for genetic risk for different illnesses has revealed that information and the reduction of uncertainty can, in fact, produce anxiety, and that maintaining uncertainty can be associated with optimism and hope. We review the theory of communication and uncertainty management and offer nine principles based on that theoretical work that can be used to influence IR system design. The principles reflect a view of uncertainty as a multi-faceted and dynamic experience, one subject to ongoing appraisal and management efforts that include interaction with and use of information in a variety of forms.

Journal ArticleDOI
TL;DR: The basic theories of human development are described to explain the specifics of young users, i.e., their cognitive skills, fine motor skills, knowledge, memory and emotional states in so far as they differ from those of adults.
Abstract: In this paper, we present the state of the art in the field of information retrieval that is relevant for understanding how to design information retrieval systems for children. We describe basic theories of human development to explain the specifics of young users, i.e., their cognitive skills, fine motor skills, knowledge, memory and emotional states in so far as they differ from those of adults. We derive the implications these differences have for the design of information retrieval systems for children. Furthermore, we summarize the main findings about children's search behavior from multiple user studies. These findings are important to understand children's information needs, their search strategies and usage of information retrieval systems. We also identify several weaknesses of previous user studies about children's information-seeking behavior. Guided by the findings of these user studies, we describe challenges for the design of information retrieval systems for young users. We give an overview of algorithms and user interface concepts. We also describe existing information retrieval systems for children, specifically web search engines and digital libraries. We conclude with a discussion of open issues and directions for further research. The survey provided in this paper is important both for designers of information retrieval systems for young users as well as for researchers who start working in this field.

Journal ArticleDOI
TL;DR: This paper investigates the effects of different SRs on NER tasks, and proposes a feature generation method using multiple SRs that allows a model to exploit not only highly discriminative features of complex SRs but also robust features of simple SRs against the data sparseness problem.
Abstract: Named entity recognition (NER) is mostly formalized as a sequence labeling problem in which segments of named entities are represented by label sequences. Although a considerable effort has been made to investigate sophisticated features that encode textual characteristics of named entities (e.g. PEOPLE, LOCATION, etc.), little attention has been paid to segment representations (SRs) for multi-token named entities (e.g. the IOB2 notation). In this paper, we investigate the effects of different SRs on NER tasks, and propose a feature generation method using multiple SRs. The proposed method allows a model to exploit not only highly discriminative features of complex SRs but also robust features of simple SRs against the data sparseness problem. Since it incorporates different SRs as feature functions of Conditional Random Fields (CRFs), we can use the well-established procedure for training. In addition, the tagging speed of a model integrating multiple SRs can be accelerated equivalent to that of a model using only the most complex SR of the integrated model. Experimental results demonstrate that incorporating multiple SRs into a single model improves the performance and the stability of NER. We also provide the detailed analysis of the results.
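
To make "segment representation" concrete, the snippet below converts the same entity annotation into two common SRs, IOB2 and IOE2; the paper's method generates features from several such representations simultaneously within a single CRF. The sentence and spans here are illustrative.

```python
def to_iob2(tokens, spans):
    """spans: list of (start, end_exclusive, type). IOB2: B- starts every entity."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return tags

def to_ioe2(tokens, spans):
    """IOE2: E- marks the last token of every entity instead."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        for i in range(start, end - 1):
            tags[i] = f"I-{etype}"
        tags[end - 1] = f"E-{etype}"
    return tags

tokens = ["World", "Health", "Organization", "meets", "in", "New", "York"]
spans = [(0, 3, "ORG"), (5, 7, "LOC")]

print(list(zip(tokens, to_iob2(tokens, spans))))
print(list(zip(tokens, to_ioe2(tokens, spans))))
# Different SRs expose different boundary cues; combining them as CRF feature
# functions is the core idea of the proposed method.
```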

Journal ArticleDOI
TL;DR: A novel approach called "profile expansion", based on the query expansion techniques used in Information Retrieval, is proposed and evaluated; experiments show that both item-global and user-local techniques offer outstanding improvements in precision, up to 100%.
Abstract: Collaborative Filtering techniques have become very popular in recent years as an effective method to provide personalized recommendations. They generally obtain much better accuracy than other techniques such as content-based filtering, because they are based on the opinions of users with tastes or interests similar to the user they are recommending to. However, this is precisely the reason for one of their main limitations: the cold-start problem. That is, how to recommend new items, not yet rated, or how to offer good recommendations to users about whom there is no information, for example because they have recently joined the system. In fact, the new user problem is particularly serious, because an unsatisfied user may stop using the system before it can even collect enough information to generate good recommendations. In this article we tackle this problem with a novel approach called "profile expansion", based on the query expansion techniques used in Information Retrieval. In particular, we propose and evaluate three kinds of techniques: item-global, item-local and user-local. The experiments we have performed show that both item-global and user-local offer outstanding improvements in precision, up to 100%. Moreover, the improvements are statistically significant and consistent across different movie recommendation datasets and several training conditions.
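
A rough sketch of the item-global flavour of profile expansion: before running standard CF for a cold-start user, the short profile is expanded with the items that most frequently co-occur with its items across all users (the CF analogue of global query expansion). The data and scoring are illustrative simplifications of the paper's techniques.

```python
import numpy as np

# Binary user-item interaction matrix (rows = users), hypothetical.
R = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
], dtype=float)

cooccurrence = R.T @ R                 # item-item co-occurrence counts (global statistics)
np.fill_diagonal(cooccurrence, 0)

def expand_profile(profile_items, n_expansions=2):
    """Add the items that co-occur most with the user's (few) known items."""
    scores = cooccurrence[profile_items].sum(axis=0)
    scores[profile_items] = -np.inf                      # do not re-add known items
    extra = np.argsort(-scores)[:n_expansions]
    return list(profile_items) + [int(i) for i in extra]

cold_start_profile = [0]               # a new user who only interacted with item 0
print(expand_profile(cold_start_profile))
# The expanded profile then feeds the usual neighbourhood-based CF computation.
```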

Journal ArticleDOI
TL;DR: This work proposes a novel system that leverages redundancy in tweets by conducting two-stage NER for multiple similar tweets, which first pre-labels each tweet using a sequential labeler based on the linear Conditional Random Fields (CRFs) model.
Abstract: One main challenge of Named Entities Recognition (NER) for tweets is the insufficient information in a single tweet, owing to the noisy and short nature of tweets. We propose a novel system to tackle this challenge, which leverages redundancy in tweets by conducting two-stage NER for multiple similar tweets. Particularly, it first pre-labels each tweet using a sequential labeler based on the linear Conditional Random Fields (CRFs) model. Then it clusters tweets to put tweets with similar content into the same group. Finally, for each cluster it refines the labels of each tweet using an enhanced CRF model that incorporates the cluster level information, i.e., the labels of the current word and its neighboring words across all tweets in the cluster. We evaluate our method on a manually annotated dataset, and show that our method boosts the F1 of the baseline without collectively labeling from 75.4% to 82.5%.
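
The cluster-level refinement stage can be illustrated without a full CRF: below, tweets are grouped by word overlap and the pre-labels of each word are reconciled by majority vote across the cluster. This is a deliberately simplified stand-in for the enhanced CRF described above, with hypothetical pre-labels and an arbitrary similarity threshold.

```python
from collections import Counter, defaultdict

# Hypothetical tweets with noisy pre-labels from a first-stage sequential labeler.
tweets = [
    [("obama", "PER"), ("visits", "O"), ("berlin", "LOC")],
    [("obama", "O"),   ("lands", "O"),  ("in", "O"), ("berlin", "LOC")],
    [("obama", "PER"), ("speech", "O"), ("in", "O"), ("berlin", "LOC")],
    [("great", "O"),   ("concert", "O")],
]

def jaccard(a, b):
    wa, wb = {w for w, _ in a}, {w for w, _ in b}
    return len(wa & wb) / len(wa | wb)

# Stage 2a: cluster tweets with sufficient word overlap (greedy single pass).
clusters, assigned = [], [False] * len(tweets)
for i, t in enumerate(tweets):
    if assigned[i]:
        continue
    cluster, assigned[i] = [i], True
    for j in range(i + 1, len(tweets)):
        if not assigned[j] and jaccard(t, tweets[j]) >= 0.3:
            cluster.append(j)
            assigned[j] = True
    clusters.append(cluster)

# Stage 2b: within each cluster, relabel every word by majority vote over its pre-labels.
for cluster in clusters:
    votes = defaultdict(Counter)
    for idx in cluster:
        for word, label in tweets[idx]:
            votes[word][label] += 1
    for idx in cluster:
        tweets[idx] = [(w, votes[w].most_common(1)[0][0]) for w, _ in tweets[idx]]

print(tweets)   # "obama" in the second tweet is now consistently labeled PER
```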

Journal ArticleDOI
TL;DR: It is concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels and that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them.
Abstract: This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving an error reduction of 59% and 21%, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also discovered that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.

Journal ArticleDOI
TL;DR: This paper proposes an adaptation of the Relevance Modelling framework to effectively suggest recommendations to a user and proposes a probabilistic clustering technique to perform the neighbour selection process as a way to achieve a better approximation of the set of relevant items in the pseudo relevance feedback process.
Abstract: Relevance-Based Language Models, commonly known as Relevance Models, are successful approaches to explicitly introduce the concept of relevance in the statistical Language Modelling framework of Information Retrieval. These models achieve state-of-the-art retrieval performance in the pseudo relevance feedback task. On the other hand, the field of recommender systems is a fertile research area where users are provided with personalised recommendations in several applications. In this paper, we propose an adaptation of the Relevance Modelling framework to effectively suggest recommendations to a user. We also propose a probabilistic clustering technique to perform the neighbour selection process as a way to achieve a better approximation of the set of relevant items in the pseudo relevance feedback process. These techniques, although well known in the Information Retrieval field, have not been applied yet to recommender systems, and, as the empirical evaluation results show, both proposals outperform individually several baseline methods. Furthermore, by combining both approaches even larger effectiveness improvements are achieved.
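
A simplified sketch of estimating relevance-model item scores for a user: the user's rated items play the role of the query, neighbours play the role of documents, and item probabilities are smoothed language-model estimates. This is an RM1-style approximation under several assumptions (uniform neighbour prior, Jelinek-Mercer smoothing, hand-picked neighbours), not the paper's exact estimation or its probabilistic clustering.

```python
import numpy as np

# Hypothetical ratings matrix (rows = users, 0 = unrated).
R = np.array([
    [4, 5, 0, 0, 1],
    [5, 4, 3, 0, 0],
    [4, 4, 4, 5, 0],
    [0, 1, 5, 4, 4],
], dtype=float)

def p_item_given_user(v, lam=0.1):
    """Smoothed probability of each item under neighbour v's 'language model'."""
    collection = R.sum(axis=0) / R.sum()
    user = R[v] / R[v].sum()
    return (1 - lam) * user + lam * collection

def relevance_model_scores(u, neighbours):
    """RM1-style: p(i|R_u) ~ sum_v p(v) * p(i|v) * prod_{j rated by u} p(j|v)."""
    rated = np.nonzero(R[u])[0]
    scores = np.zeros(R.shape[1])
    for v in neighbours:
        p_iv = p_item_given_user(v)
        query_likelihood = np.prod(p_iv[rated])      # how well v "explains" u's items
        scores += query_likelihood * p_iv            # uniform neighbour prior p(v)
    scores[rated] = 0.0                              # recommend only unseen items
    return scores

u = 0
neighbours = [1, 2, 3]                               # in the paper, chosen by clustering
print(np.argsort(-relevance_model_scores(u, neighbours)))
```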