
Showing papers in "Information Retrieval in 2017"


Journal ArticleDOI
TL;DR: This paper lays out an experimental configuration framework upon which to identify and analyse specific statistical biases arising in the adaptation of Information Retrieval metrics to recommendation tasks, namely sparsity and popularity biases.
Abstract: There is an increasing consensus in the Recommender Systems community that the dominant error-based evaluation metrics are insufficient, and mostly inadequate, to properly assess the practical effectiveness of recommendations. Seeking to evaluate recommendation rankings—which largely determine the effective accuracy in matching user needs—rather than predicted rating values, Information Retrieval metrics have started to be applied for the evaluation of recommender systems. In this paper we analyse the main issues and potential divergences in the application of Information Retrieval methodologies to recommender system evaluation, and provide a systematic characterisation of experimental design alternatives for this adaptation. We lay out an experimental configuration framework upon which we identify and analyse specific statistical biases arising in the adaptation of Information Retrieval metrics to recommendation tasks, namely sparsity and popularity biases. These biases considerably distort the empirical measurements, hindering the interpretation and comparison of results across experiments. We develop a formal characterisation and analysis of the biases upon which we analyse their causes and main factors, as well as their impact on evaluation metrics under different experimental configurations, illustrating the theoretical findings with empirical evidence. We propose two experimental design approaches that effectively neutralise such biases to a large extent. We report experiments validating our proposed experimental variants, and comparing them to alternative approaches and metrics that have been defined in the literature with similar or related purposes.

108 citations
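
As one concrete illustration of how the experimental configuration shapes these biases, the sketch below evaluates precision@k under a candidate-sampling design, where each user's held-out items are ranked against a fixed number of sampled unrated items rather than the whole catalogue. This is a minimal sketch of one common configuration from this literature, not the authors' exact framework; all names are ours.

```python
import random

def precision_at_k(model_scores, test_items, train_items, all_items,
                   n_candidates=100, k=10):
    """P@k for one user under a candidate-sampling design: held-out items
    are ranked against n_candidates unrated items (assumes at least that
    many unrated items exist). Sampling the candidate set, rather than
    treating every unrated item as a miss, dampens the popularity and
    sparsity effects the paper analyses."""
    unrated = [i for i in all_items if i not in train_items and i not in test_items]
    candidates = set(test_items) | set(random.sample(unrated, n_candidates))
    ranking = sorted(candidates,
                     key=lambda i: model_scores.get(i, float("-inf")),
                     reverse=True)
    return sum(1 for i in ranking[:k] if i in test_items) / k
```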


Journal ArticleDOI
TL;DR: This work proposes a probabilistic modeling approach called Neural Semantic Personalized Ranking (NSPR) to unify the strengths of deep neural networks and pairwise learning and demonstrates NSPR's versatility to integrate various pairwise probability functions.
Abstract: Recommender systems help users deal with information overload and enjoy a personalized experience on the Web. One of the main challenges in these systems is the item cold-start problem, which is very common in practice since modern online platforms have thousands of new items published every day. Furthermore, in many real-world scenarios, item recommendation tasks are based on users' implicit preference feedback, such as whether a user has interacted with an item. To address these challenges, we propose a probabilistic modeling approach called Neural Semantic Personalized Ranking (NSPR) to unify the strengths of deep neural networks and pairwise learning. Specifically, NSPR tightly couples a latent factor model with a deep neural network to learn a robust feature representation from both implicit feedback and item content, consequently allowing our model to generalize to unseen items. We demonstrate NSPR's versatility to integrate various pairwise probability functions and propose two variants based on the Logistic and Probit functions. We conduct a comprehensive set of experiments on two real-world public datasets and demonstrate that NSPR significantly outperforms the state-of-the-art baselines.

38 citations
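
A minimal sketch of the general recipe NSPR's Logistic variant follows (coupling a content-based item network with user latent factors under a pairwise logistic loss). This is our reconstruction of the idea, not the authors' code, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseContentRanker(nn.Module):
    """Items are scored from their content through a small network, so unseen
    (cold-start) items can still be represented; training uses (user, positive
    item, negative item) triples with a pairwise logistic loss."""
    def __init__(self, n_users, content_dim, latent_dim=32):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, latent_dim)
        self.item_net = nn.Sequential(
            nn.Linear(content_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def score(self, users, item_content):
        return (self.user_factors(users) * self.item_net(item_content)).sum(-1)

    def loss(self, users, pos_content, neg_content):
        diff = self.score(users, pos_content) - self.score(users, neg_content)
        return F.softplus(-diff).mean()  # equals -log sigmoid(diff)
```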


Journal ArticleDOI
TL;DR: An RWR-CR (random walk with restart-based collaborator recommendation) algorithm over a heterogeneous bibliographic network is proposed for this problem, and its promising performance in collaborator prediction is demonstrated.
Abstract: The growing popularity of collaboration among researchers and the increasing information overload in big scholarly data make it imperative to develop a collaborator recommendation system that helps researchers find potential partners. Existing works usually study this task as a link prediction problem in a homogeneous network with a single object type (i.e., author) and a single link type (i.e., co-authorship). However, a real-world academic social network often involves several object types, e.g., papers, terms, and venues, as well as multiple relationships among different objects. This paper proposes an RWR-CR (random walk with restart-based collaborator recommendation) algorithm over a heterogeneous bibliographic network to address this problem. First, we construct a heterogeneous network with multiple types of nodes and links, simplifying its structure by removing citing-paper nodes. Then, two importance measures are used to weight edges in the network, which bias a random walker's behavior. Finally, we employ a random walk with restart to retrieve relevant authors and output an ordered recommendation list in terms of ranking scores. Experimental results on DBLP and hep-th datasets demonstrate the effectiveness of our methodology and its promising performance in collaborator prediction.

32 citations
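
The core ranking step, random walk with restart over a weighted adjacency matrix, can be sketched as a short power iteration. The edge weighting and network construction are where the paper's specific contributions lie, so the code below only shows the generic RWR machinery.

```python
import numpy as np

def random_walk_with_restart(W, seed, restart=0.15, tol=1e-8, max_iter=200):
    """Generic RWR scores over a weighted adjacency matrix W (heterogeneous
    node types simply occupy different rows/columns). Column-normalising W
    gives the transition matrix; edge weights bias the walker's moves."""
    P = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)  # column-stochastic
    e = np.zeros(W.shape[0])
    e[seed] = 1.0                       # restart at the target researcher's node
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - restart) * (P @ r) + restart * e
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r  # rank author nodes by these scores to build the recommendation list
```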


Journal ArticleDOI
TL;DR: An understanding of youths’ conceptions of Google can enable educators to better tailor their digital literacy instruction efforts and can inform search engine developers and search engine interface designers in making the inner workings of the engine more transparent and their output more trustworthy to young users.
Abstract: Although youth are increasingly going online to fulfill their needs for information, many youth struggle with information and digital literacy skills, such as the abilities to conduct a search and assess the credibility of online information. Ideally, these skills encompass an accurate and comprehensive understanding of the ways in which a system, such as a Web search engine, functions. In order to investigate youths' conceptions of the Google search engine, a drawing activity was conducted with 26 HackHealth after-school program participants to elicit their mental models of Google. The findings revealed that many participants personified Google and emphasized anthropomorphic elements, computing equipment, and/or connections (such as cables, satellites and antennas) in their drawings. Far fewer participants focused their drawings on the actual Google interface or on computer code. Overall, their drawings suggest a limited understanding of Google and the ways in which it actually works. However, an understanding of youths' conceptions of Google can enable educators to better tailor their digital literacy instruction efforts and can inform search engine developers and search engine interface designers in making the inner workings of the engine more transparent and their output more trustworthy to young users. With a better understanding of how Google works, young users will be better able to construct effective queries, assess search results, and ultimately find relevant and trustworthy information that will be of use to them.

27 citations


Journal ArticleDOI
TL;DR: The experimental results show that significantly better precision can be obtained using the enriched textual information rather than the math expressions’ own textual information, indicating that the enrichment of textual information for each math expression using dependency relationships enhances the math search system.
Abstract: Current mathematical search systems allow math expressions within a document to be queried using math expressions and keywords. To accept such queries, math search systems must index both math expressions and textual information in documents. Each indexed math expression is usually associated with all the words in its surrounding context within a given window size. However, we found that this context is often ineffective for explaining math expressions in scientific papers. The meaning of an expression is usually defined in the early part of a document, and the meaning of each symbol contained in the expression can be useful for explaining the entire expression. This explanation may not be captured within the context of a math expression, unless we set the context to have a very wide window size. However, widening the window size also increases the proportion of words that are unrelated to the expression. This paper proposes the use of dependency relationships between math expressions to enrich the textual information of each expression. We examine the influence of this enrichment in a math search system. The experimental results show that significantly better precision can be obtained using the enriched textual information rather than the math expressions' own textual information. This indicates that the enrichment of textual information for each math expression using dependency relationships enhances the math search system.

24 citations
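
The enrichment idea can be pictured in a few lines: instead of widening an expression's context window, the indexed text for an expression also pulls in the context of the expressions it depends on (e.g. where its symbols are defined). This is a hypothetical illustration; the identifiers and the dependency extraction are ours, not the paper's.

```python
def enrich_expression_context(expr_id, context_words, deps):
    """context_words maps an expression id to the words in its (narrow) local
    window; deps maps an expression id to the ids of expressions that define
    its symbols. The enriched text is the union of both, keeping the window
    small without losing the symbols' definitions."""
    enriched = list(context_words[expr_id])
    for dep_id in deps.get(expr_id, []):
        enriched.extend(context_words[dep_id])
    return enriched
```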


Journal ArticleDOI
TL;DR: It is demonstrated that selective search can be tuned to avoid bottlenecks, and thus maximize usage of the underlying computer hardware.
Abstract: Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. In this paper we extend the study of selective search into new areas using a fine-grained simulation, examining the difference in efficiency when term-based and sample-based resource selection algorithms are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search architectures and can handle higher query rates. Furthermore, we demonstrate that selective search can be tuned to avoid bottlenecks, and thus maximize usage of the underlying computer hardware.

23 citations
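
For readers unfamiliar with sample-based resource selection (one of the two algorithm families compared), the sketch below shows a ReDDE-style shard ranker; the helper names and parameters are illustrative, not the paper's simulator.

```python
def select_shards(query, csi_search, doc_to_shard, sample_rate, top_n=3):
    """ReDDE-style selection: run the query against a small central sample
    index (CSI); each retrieved sample document votes for its source shard,
    scaled by the inverse sampling rate; search only the top-ranked shards."""
    votes = {}
    for doc in csi_search(query, k=1000):   # top documents from the sample index
        shard = doc_to_shard[doc]
        votes[shard] = votes.get(shard, 0.0) + 1.0 / sample_rate
    return sorted(votes, key=votes.get, reverse=True)[:top_n]
```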


Journal ArticleDOI
TL;DR: This study designs a two-stage approach which consists of question selection and question diversification to help customers who want to quickly capture the main idea of a lengthy product review before they read the details.
Abstract: Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users would know what questions a given review can address, and they may further read the review only if they have similar questions about the product. Specifically, we design a two-stage approach which consists of question selection and question diversification. In the question selection phase, we first employ probabilistic retrieval models to locate candidate questions that are relevant to a given review. A Recurrent Neural Network Encoder-Decoder is utilized to measure the "answerability" of questions to a review. We then design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which results in an efficient greedy algorithm of submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews.

19 citations
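
The diversification stage relies on the standard greedy algorithm for monotone submodular maximisation, sketched below; the particular marginal-gain formula here (relevance minus a redundancy penalty, MMR-style) is illustrative rather than the paper's exact set function.

```python
def greedy_select(candidates, relevance, similarity, k=5, trade_off=0.5):
    """Greedily build a question set: each step adds the candidate with the
    best marginal gain, i.e. its relevance/answerability score minus a
    penalty for similarity to questions already selected. For monotone
    submodular objectives this greedy scheme carries a (1 - 1/e) guarantee."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def gain(q):
            redundancy = max((similarity(q, s) for s in selected), default=0.0)
            return relevance[q] - trade_off * redundancy
        best = max(pool, key=gain)
        selected.append(best)
        pool.remove(best)
    return selected
```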


Journal ArticleDOI
TL;DR: This paper proposes to incorporate mouse movement information into existing click models to enhance the estimation of examination, showing a better ability to predict both user clicks and document relevance than the original models.
Abstract: User interactions in Web search, in particular clicks, provide valuable hints on document relevance, but the signals are very noisy. In order to better understand user click behaviors and to infer the implied relevance, various click models have been proposed, each relying on some hypotheses and involving different hidden events (e.g. examination). In almost all existing click models, it is assumed that clicks are the only observable evidence and the examination of documents is deduced from them. However, with an increasing number of embedded heterogeneous components (e.g. verticals) on Search Engine Result Pages, click information is not sufficient to draw a complete picture of the user examination process, especially in federated search scenarios. In practice, we can also collect mouse movement information, which has proven to have a strong correlation with examination. In this paper, we propose to incorporate mouse movement information into existing click models to enhance the estimation of examination. The enhanced click models are shown to have a better ability to predict both user clicks and document relevance than the original models. The collection of mouse movement information has been implemented in a commercial search engine, showing the feasibility of the approach in practice.

14 citations
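
Under the examination hypothesis that most click models share, P(click) = P(examined) x P(relevant); the toy function below shows one way hover evidence could sharpen the examination term. It is our illustration of the general idea, not a specific model from the paper.

```python
def click_probability(rank, hovered, exam_prob_by_rank, relevance, hover_boost=1.5):
    """Examination hypothesis with a mouse signal: a recorded hover over a
    result is strong evidence that the user actually examined it, so it
    raises the (otherwise position-based) examination estimate."""
    p_exam = exam_prob_by_rank[rank]
    if hovered:
        p_exam = min(1.0, p_exam * hover_boost)
    return p_exam * relevance
```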


Journal ArticleDOI
TL;DR: Two novel ideas are developed, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problems of document listing, top-k document retrieval, and document counting; it is also shown that a classical data structure supporting the latter query becomes highly compressible on repetitive data.
Abstract: Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problems of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple tf-idf model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.

11 citations
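
As a baseline for what these compressed indexes compute, here is a naive document-listing routine over a plain suffix array (Python 3.10+ for the `key` argument to `bisect`); the paper's structures answer the same query in compressed space.

```python
from bisect import bisect_left, bisect_right

def document_listing(text, suffix_array, doc_starts, pattern):
    """Find all documents containing `pattern`. Binary search locates the
    suffix-array range whose suffixes start with the pattern; mapping each
    suffix position to its document (doc_starts is the sorted list of
    document start offsets in `text`) and de-duplicating gives the answer."""
    prefix = lambda p: text[p:p + len(pattern)]
    lo = bisect_left(suffix_array, pattern, key=prefix)
    hi = bisect_right(suffix_array, pattern, key=prefix)
    docs = {bisect_right(doc_starts, suffix_array[i]) - 1 for i in range(lo, hi)}
    return sorted(docs)
```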


Journal ArticleDOI
TL;DR: The paper describes a novel ‘searching to learn’ perspective on collaborative information seeking (CIS), which motivated empirical work to ‘orchestrate’ a CIS searching-to-learn session, and foregrounds orchestration as a design approach for researching and implementing CIS as a ‘searching to learn’ context.
Abstract: The paper describes our novel perspective on ‘searching to learn’ through collaborative information seeking (CIS). We describe this perspective, which motivated empirical work to ‘orchestrate’ a CIS searching-to-learn session. The work is described through the lens of orchestration, an approach that foregrounds how background context (including practical classroom constraints and theoretical perspective), actors (including the educators, researchers, and technologies), and the activities to be completed are brought into alignment. The orchestration is exemplified through the description of research work designed to explore a pedagogically salient construct (epistemic cognition) in a particular institutional setting. Evaluation of the session indicated student satisfaction with the orchestration, with written feedback indicating reflection on its features. We foreground this approach to demonstrate the potential of orchestration as a design approach for researching and implementing CIS as a ‘searching to learn’ context.

11 citations


Journal ArticleDOI
TL;DR: Across four web test collections, it is found that the highest query evaluation speed is achieved by simply leaving the postings lists uncompressed, although the performance advantage over a state-of-the-art compression scheme is relatively small and the index is considerably larger.
Abstract: This paper explores the performance of top-k document retrieval with score-at-a-time query evaluation on impact-ordered indexes in main memory. To better understand execution efficiency in the context of modern processor architectures, we examine the role of index compression on query evaluation latency. Experiments include compressing postings with variable byte encoding, Simple-8b, variants of the QMX compression scheme, as well as a condition that is less often considered: no compression. Across four web test collections, we find that the highest query evaluation speed is achieved by simply leaving the postings lists uncompressed, although the performance advantage over a state-of-the-art compression scheme is relatively small and the index is considerably larger. We explain this finding in terms of the design of modern processor architectures: Index segments with high impact scores are usually short and inherently benefit from cache locality. Index segments with lower impact scores may be quite long, but modern architectures have sufficient memory bandwidth (coupled with prefetching) to "keep up" with the processor. Our results highlight the importance of "architecture affinity" when designing high-performance search engines.
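
Variable byte coding, the simplest of the compared schemes, fits in a dozen lines; this sketch uses the common convention where the high bit marks the final byte of each integer (byte order and flag conventions vary between implementations).

```python
def vbyte_encode(numbers):
    """Split each integer into 7-bit groups, least significant first;
    the high bit flags the last byte of a number."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)

def vbyte_decode(data):
    numbers, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:            # last byte of this integer
            numbers.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return numbers
```

The paper's headline result is that skipping this step entirely can be fastest in main memory: decoding costs CPU cycles, while modern memory bandwidth and prefetching often keep uncompressed postings flowing quickly enough.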

Journal ArticleDOI
TL;DR: A static cache that acts simultaneously as a list and intersection cache offers a more efficient way of handling cache space, combined with a query resolution strategy that takes advantage of this cache to reorder the query execution sequence.
Abstract: Modern information retrieval systems use several levels of caching to speed up computation by exploiting frequent, recent or costly data used in the past. Previous studies show that the use of caching techniques is crucial in search engines, as it helps reduce query response times and processing workloads on search servers. In this work we propose and evaluate a static cache that acts simultaneously as a list and intersection cache, offering a more efficient way of handling cache space. We also use a query resolution strategy that takes advantage of the existence of this cache to reorder the query execution sequence. In addition, we propose effective strategies to select the term pairs that should populate the cache. We also represent the data in the cache in both raw and compressed forms and evaluate the differences between them using different configurations of cache sizes. The results show that the proposed Integrated Cache outperforms the standard posting list cache in most cases, taking advantage not only of the intersection cache but also of the query resolution strategy.
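
A sketch of how an integrated list-plus-intersection cache might drive query resolution: if a cached term-pair intersection exists, start from it and intersect the remaining postings, shortest first. The structure and names are our illustration, not the paper's implementation.

```python
def resolve_query(terms, list_cache, pair_cache, fetch_postings, intersect):
    """Conjunctive query resolution that reorders execution around the cache:
    a precomputed pair intersection seeds the result; cached or fetched
    posting lists cover the remaining terms, processed shortest-first to
    keep intermediate results small."""
    result, remaining = None, set(terms)
    for (a, b), cached in pair_cache.items():
        if a in remaining and b in remaining:
            result = cached
            remaining -= {a, b}
            break
    postings = [list_cache[t] if t in list_cache else fetch_postings(t)
                for t in remaining]
    for plist in sorted(postings, key=len):
        result = plist if result is None else intersect(result, plist)
    return result
```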

Journal ArticleDOI
TL;DR: This paper proposes a dynamic Bayesian network, referred to as the Query Utility Model (QUM), to capture query utility by simultaneously modeling users' reformulation and click behaviors, and shows that, by recommending high-utility queries, this approach is far more effective in helping users find relevant search results and thus satisfy their information needs.
Abstract: Query recommendation has long been considered a key feature of search engines, which can improve users' search experience by providing useful query suggestions for their search tasks. Most existing approaches to query recommendation aim to recommend relevant queries, i.e., alternative queries similar to a user's initial query. However, the ultimate goal of query recommendation is to assist users in reformulating queries so that they can accomplish their search task successfully and quickly. Considering only relevance in query recommendation does not directly serve this goal. In this paper, we argue that it is more important to directly recommend queries with high utility, i.e., queries that can better satisfy users' information needs. For this purpose, we attempt to infer query utility from users' sequential search behaviors recorded in their search sessions. Specifically, we propose a dynamic Bayesian network, referred to as the Query Utility Model (QUM), to capture query utility by simultaneously modeling users' reformulation and click behaviors. We then recommend queries with high utility to help users better accomplish their search tasks. We empirically evaluated the performance of our approach on a publicly released query log by comparing it with state-of-the-art methods. The experimental results show that, by recommending high-utility queries, our approach is far more effective in helping users find relevant search results and thus satisfy their information needs.
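
The QUM itself is a dynamic Bayesian network with latent variables, which would take considerably more code; the toy estimator below only conveys the underlying intuition, approximating a query's utility by how often issuing it led to clicks in recorded sessions. Everything here is our simplification, not the paper's model.

```python
def empirical_query_utility(sessions):
    """sessions: iterable of sessions, each a list of (query, n_clicks) pairs
    in issue order. A query's utility is crudely estimated as the fraction
    of its occurrences that produced at least one click."""
    issued, satisfied = {}, {}
    for session in sessions:
        for query, n_clicks in session:
            issued[query] = issued.get(query, 0) + 1
            satisfied[query] = satisfied.get(query, 0) + (n_clicks > 0)
    return {q: satisfied[q] / issued[q] for q in issued}
```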

Journal ArticleDOI
TL;DR: Efficiency in indexing and searching email and documents in a multi-tenant cloud is important and difficult to achieve, but when the individual enterprise search applications are small in scale, the investment of programmer time to achieve gains in efficiency can soon pay for itself in reduced server hosting costs.
Abstract: The efficiency of information retrieval (IR) algorithms has always been of interest to researchers at the computer science end of the IR field, and index compression techniques, intersection and ranking algorithms, and pruning mechanisms have been a constant feature of IR conferences and journals over many years. Efficiency is also of serious economic concern to operators of commercial web search engines, where a cluster of a thousand or more computers might participate in processing a single query, and where such clusters of machines might be replicated hundreds of times to handle the query load (Dean 2009). In this environment even relatively small improvements in query processing efficiency could potentially save tens of millions of dollars per year in terms of hardware and energy costs, and at the same time significantly reduce greenhouse gas emissions. In commercial data centres, query processing is by no means the only big IR consumer of server processing cycles. Crawling, indexing, format conversion, PageRank calculation, ranker training, deep learning, knowledge graph generation and processing, social network analysis, query classification, natural language processing, speech processing, question answering, query auto-completion, related search mechanisms, navigation systems and ad targeting are also computationally expensive, and potentially capable of being made more efficient. Data centres running such services are replicated across the world, and their operations provide everyday input to the lives of billions of people. Information retrieval algorithms also run at large scale in cloud-based services and in social media sites such as Facebook and Twitter. Efficiency in indexing and searching email and documents in a multi-tenant cloud is important, and difficult to achieve. Even so, when the individual enterprise search applications are small in scale, the investment of programmer time to achieve gains in efficiency can soon pay for itself in reduced server hosting costs.

Journal ArticleDOI
TL;DR: To understand if and how much a user click on a result document implies true relevance, one has to take into account the different factors (usually called behavior biases), beyond relevance itself, that may affect user click behavior.
Abstract: Search has reached a level at which a good understanding of user interactions may significantly impact its quality. Among all kinds of user interactions, click-through behavior on search results is an important one that has attracted much attention. Clicking a certain result (or advertisement, or query suggestion, etc.) is usually regarded as an implicit feedback signal for its relevance, which is, however, very noisy. To understand if and how much a user click on a result document implies true relevance, one has to take into account the different factors (usually called behavior biases), in addition to relevance itself, that may affect user click behavior. Joachims et al. (2005) worked on extracting reliable implicit feedback from user behaviors, and concluded that click logs are informative yet biased. Previous studies revealed several bias aspects such as "position" (Craswell et al. 2008), "trust" (O'Brien and Keane 2006) and "presentation" (Wang et al. 2013) factors. Recently, we have also witnessed the rise of ranking models which rely on click-through data as a biased, noisy information source for training purposes (Wang et al. 2016; Joachims et al. 2017). Several click models (e.g. Dupret and Piwowarski 2008; Chapelle and Zhang 2009; Guo et al. 2009) have been proposed, which usually involve additional events (e.g. examination) and different assumptions. These models are designed to eliminate the effects of various behavior biases (e.g. position bias, presentation bias, trust bias, etc.) to provide a better estimation of result relevance. Many of these efforts have been adopted to generate useful ranking signals for production rankers of commercial search engines.

Journal ArticleDOI
TL;DR: The underlying hypothesis behind this paper is that given the previously clicked documents, a user tends to choose documents which provide novel relevant information to satisfy her information need, rather than redundant relevant information.
Abstract: Query logs contain rich feedback information from users interacting with search engines. Therefore, various click models have been developed to interpret users' search behavior and to extract useful knowledge from query logs. However, most existing models are not designed to consider novelty bias in click behavior. The underlying hypothesis behind this paper is that, given the previously clicked documents, a user tends to choose documents which provide novel relevant information to satisfy her information need, rather than redundant relevant information. Moreover, prior click models have been mainly tested on frequently occurring queries, thus leaving a large proportion of sparse queries uncovered. In this paper, we propose to predict users' click behavior from the perspective of utility theory (i.e., utility and marginal utility). In particular, as a complement to the examination hypothesis, we introduce a new hypothesis called the marginal utility hypothesis to characterize the effect of novelty bias on users' click behavior by exploring the semantic divergence among documents in a result list. Moreover, to cope with sparse or unseen queries that have not been observed in the training set, we use a set of descriptive features to quantify the probability of a document being relevant and the probability of a document providing marginally (novel) useful information. Finally, a series of experiments is conducted on a real-world data set to validate the effectiveness of the proposed methods. The experimental results verify the effectiveness of interpreting users' click behavior based on the marginal utility hypothesis, especially when query sessions contain sparse queries or unseen query-document pairs.
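
The marginal utility hypothesis can be illustrated with a small function: a document's attractiveness is discounted by its semantic similarity to documents the user already clicked, so redundant relevant results receive a lower click probability than novel ones. This is our construction following the stated hypothesis, not the paper's estimator.

```python
import numpy as np

def marginal_click_probability(doc_vec, clicked_vecs, relevance_prob):
    """Discount relevance by redundancy: novelty is one minus the cosine
    similarity to the closest previously clicked document."""
    if not clicked_vecs:
        return relevance_prob
    sims = [float(np.dot(doc_vec, v) /
                  (np.linalg.norm(doc_vec) * np.linalg.norm(v)))
            for v in clicked_vecs]
    novelty = max(0.0, 1.0 - max(sims))
    return relevance_prob * novelty
```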

Journal ArticleDOI
TL;DR: This paper introduces a novel time-aware metric, "sellability", defined as the time duration for a used item to be traded, to quantify its value, and proposes a combined Poisson regression and listwise ranking model.
Abstract: A number of online marketplaces enable customers to buy or sell used products, which raises the need for ranking tools to help them find desirable items among a huge pool of choices. To the best of our knowledge, no prior work in the literature has investigated the task of used product ranking, which has unique characteristics compared with regular product ranking. While there exist a few ranking metrics (e.g., price, conversion probability) that measure the "goodness" of a product, they do not consider the time factor, which is crucial in used product trading due to the fact that each used product is often unique while new products are usually abundant in supply or quantity. In this paper, we introduce a novel time-aware metric, "sellability", which is defined as the time duration for a used item to be traded, to quantify the value of the item. In order to estimate the "sellability" values for newly generated used products and to present users with a ranked list of the most relevant results, we propose a combined Poisson regression and listwise ranking model. The model has a good property in fitting the distribution of "sellability". In addition, the model is designed to optimize loss functions for regression and ranking simultaneously, which is different from previous approaches that are conventionally learned with a single cost function, i.e., regression or ranking. We evaluate our approach in the domain of used vehicles. Experimental results show that the proposed model can improve both regression and ranking performance compared with non-machine learning and machine learning baselines.
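
One way to picture the combined objective is a weighted sum of a Poisson negative log-likelihood on the "sellability" durations and a ListNet-style listwise cross-entropy that prefers ranking faster-selling items higher; the exact forms and the weighting are our assumptions, not the paper's published loss.

```python
import torch

def combined_loss(scores, durations, alpha=0.5):
    """scores: model outputs for the items of one result list, interpreted as
    negative log expected duration, so higher score = expected faster sale;
    durations: observed times-to-sale for the same items. The Poisson term
    fits durations with rate exp(-score); the listwise term pushes items
    with shorter durations toward the top of the ranking."""
    poisson_nll = ((-scores).exp() + durations * scores).mean()
    target = torch.softmax(-durations.log(), dim=0)   # faster sale => higher weight
    listwise = -(target * torch.log_softmax(scores, dim=0)).sum()
    return alpha * poisson_nll + (1 - alpha) * listwise
```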