
Papers presented at the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2013


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper defines a new problem, time-aware POI recommendation, which recommends POIs for a given user at a specified time of day, and develops a collaborative recommendation model that is able to incorporate temporal information.
Abstract: The availability of large volumes of user check-in data from rapidly growing location-based social networks (LBSNs) enables many important location-aware services. Point-of-interest (POI) recommendation is one such service, which recommends places that users have not visited before. Several techniques have recently been proposed for this recommendation service. However, no existing work has considered temporal information for POI recommendation in LBSNs. We believe that time plays an important role in POI recommendation because most users tend to visit different places at different times of the day, e.g., a restaurant at noon and a bar at night. In this paper, we define a new problem, time-aware POI recommendation: recommending POIs for a given user at a specified time of day. To solve the problem, we develop a collaborative recommendation model that is able to incorporate temporal information. Moreover, based on the observation that users tend to visit nearby POIs, we further enhance the recommendation model with geographical information. Our experimental results on two real-world datasets show that the proposed approach substantially outperforms state-of-the-art POI recommendation methods.

731 citations
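To make the temporal idea concrete, here is a minimal sketch of time-aware, user-based collaborative filtering in Python. The (user, hour, POI) check-in array, the hourly granularity, and cosine similarity are illustrative assumptions, not the paper's exact model (which additionally exploits geographical influence):

```python
import numpy as np

def time_aware_poi_scores(checkins, user, hour):
    """Score unvisited POIs for `user` at a given `hour` of day.

    checkins: array of shape (n_users, 24, n_pois) holding check-in counts.
    Users are compared on their full hour-by-POI profiles, so two users are
    similar only if they visit similar places at similar times of day.
    """
    profiles = checkins.reshape(checkins.shape[0], -1).astype(float)
    unit = profiles / (np.linalg.norm(profiles, axis=1, keepdims=True) + 1e-12)
    sims = unit @ unit[user]          # cosine similarity to every user
    sims[user] = 0.0                  # exclude the target user itself
    # Aggregate similar users' check-ins at the queried hour.
    scores = sims @ checkins[:, hour, :].astype(float)
    scores[checkins[user].sum(axis=0) > 0] = -np.inf  # keep unvisited POIs only
    return scores                     # rank descending for top-k recommendation
```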


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper empirically establishes that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a range of pooling schemes.
Abstract: Twitter, the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In this paper, we investigate methods to improve topics learned from Twitter content without modifying the basic machinery of LDA; we achieve this through various pooling schemes that aggregate tweets in a data preprocessing step for LDA. We empirically establish that a novel method of tweet pooling by hashtags leads to a vast improvement in a variety of measures for topic coherence across three diverse Twitter datasets in comparison to an unmodified LDA baseline and a variety of pooling schemes. An additional contribution of automatic hashtag labeling further improves on the hashtag pooling results for a subset of metrics. Overall, these two novel schemes lead to significantly improved LDA topic models on Twitter content.

475 citations
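The pooling step itself is easy to reproduce. Below is a hedged sketch using scikit-learn's LDA; how multi-hashtag and hashtag-free tweets are handled here is an assumption of this sketch, not necessarily the paper's exact scheme:

```python
from collections import defaultdict
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def hashtag_pooled_lda(tweets, n_topics=20):
    """Pool tweets sharing a hashtag into one pseudo-document, then fit LDA."""
    pools = defaultdict(list)
    for t in tweets:
        tags = [w.lower() for w in t.split() if w.startswith('#')]
        for tag in tags or [id(t)]:   # hashtag-free tweets stay singletons
            pools[tag].append(t)
    docs = [' '.join(ts) for ts in pools.values()]
    X = CountVectorizer(stop_words='english').fit_transform(docs)
    return LatentDirichletAllocation(n_components=n_topics,
                                     random_state=0).fit(X)
```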


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper describes a method that accounts for nascent information culled from Twitter to provide relevant recommendations in such cold-start situations and significantly outperforms other state-of-the-art recommendation techniques by up to 33%.
Abstract: As a tremendous number of mobile applications (apps) are readily available, users have difficulty in identifying apps that are relevant to their interests. Recommender systems that depend on previous user ratings (i.e., collaborative filtering, or CF) can address this problem for apps that have sufficient ratings from past users. But for apps that are newly released, CF does not have any user ratings to base recommendations on, which leads to the cold-start problem. In this paper, we describe a method that accounts for nascent information culled from Twitter to provide relevant recommendations in such cold-start situations. We use Twitter handles to access an app's Twitter account and extract the IDs of its Twitter followers. We create pseudo-documents that contain the IDs of Twitter users interested in an app and then apply latent Dirichlet allocation to generate latent groups. At test time, a target user seeking recommendations is mapped to these latent groups. By using the transitive relationship of latent groups to apps, we estimate the probability of the user liking the app. We show that by incorporating information from Twitter, our approach overcomes the difficulty of cold-start app recommendation and significantly outperforms other state-of-the-art recommendation techniques by up to 33%.

195 citations


Proceedings ArticleDOI
Milad Shokouhi
28 Jul 2013
TL;DR: The results suggest that supervised rankers enhanced by personalization features can significantly outperform existing popularity-based baselines in terms of mean reciprocal rank (MRR), by up to 9%.
Abstract: Query auto-completion (QAC) is one of the most prominent features of modern search engines. The list of query candidates is generated according to the prefix entered by the user in the search box and is updated on each new keystroke. Query prefixes tend to be short and ambiguous, and existing models mostly rely on the past popularity of matching candidates for ranking. However, the popularity of certain queries may vary drastically across different demographics and users. For instance, while instagram and imdb have comparable popularities overall and are both legitimate candidates to show for the prefix i, the former is noticeably more popular among young female users, while the latter is more likely to be issued by men. In this paper, we present a supervised framework for personalizing auto-completion ranking. We introduce a novel labelling strategy for generating offline training labels that can be used for learning personalized rankers. We compare the effectiveness of several user-specific and demographic-based features and show that among them, the user's long-term search history and location are the most effective for personalizing auto-completion rankers. We perform our experiments on the publicly available AOL query logs, and also on the larger-scale logs of Bing. The results suggest that supervised rankers enhanced by personalization features can significantly outperform the existing popularity-based baselines in terms of mean reciprocal rank (MRR), by up to 9%.

195 citations


Proceedings ArticleDOI
Hao Ma
28 Jul 2013
TL;DR: This study conducts comprehensive experimental analysis on three recommendation datasets and indicates that implicit user and item social information, including similar and dissimilar relationships, can be employed to improve traditional recommendation methods.
Abstract: Social recommendation problems have drawn a lot of attention recently due to the prevalence of social networking sites. The experiments in previous literature suggest that social information is very effective in improving traditional recommendation algorithms. However, explicit social information is not always available in most recommender systems, which limits the impact of social recommendation techniques. In this paper, we study the following two research problems: (1) In systems without explicit social information, can we still improve recommender systems using implicit social information? (2) In systems with explicit social information, can the performance of using implicit social information outperform that of using explicit social information? To answer these two questions, we conduct a comprehensive experimental analysis on three recommendation datasets. The results indicate that: (1) Implicit user and item social information, including similar and dissimilar relationships, can be employed to improve traditional recommendation methods. (2) When comparing implicit social information with explicit social information, the performance of using implicit information is slightly worse. This study provides additional insights into social recommendation techniques, and also greatly widens the utility and impact of previous and upcoming social recommendation approaches.

191 citations


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes to dynamically choose negative training samples from the ranked list produced by the current prediction model and iteratively update the model, showing that this approach not only reduces the training time, but also leads to significant performance gains.
Abstract: Collaborative filtering techniques rely on aggregated user preference data to make personalized predictions. In many cases, users are reluctant to explicitly express their preferences and many recommender systems have to infer them from implicit user behaviors, such as clicking a link in a webpage or playing a music track. The clicks and the plays are good for indicating the items a user liked (i.e., positive training examples), but the items a user did not like (negative training examples) are not directly observed. Previous approaches either randomly pick negative training samples from unseen items or incorporate some heuristics into the learning model, leading to a biased solution and a prolonged training period. In this paper, we propose to dynamically choose negative training samples from the ranked list produced by the current prediction model and iteratively update our model. The experiments conducted on three large-scale datasets show that our approach not only reduces the training time, but also leads to significant performance gains.

186 citations
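As a minimal sketch of the idea: a matrix-factorization ranker whose negative example at each step is the top-ranked unseen item under the current model (the hardest negative), updated with a BPR-style pairwise step. The paper's exact sampling schedule and loss may differ:

```python
import numpy as np

def train_dynamic_negatives(pos, n_users, n_items, k=32,
                            epochs=10, lr=0.05, reg=0.01):
    """pos: dict user -> set of items with observed positive feedback."""
    rng = np.random.default_rng(0)
    U = rng.normal(0, 0.1, (n_users, k))
    V = rng.normal(0, 0.1, (n_items, k))
    for _ in range(epochs):
        for u, items in pos.items():
            for i in items:
                scores = V @ U[u]
                scores[list(items)] = -np.inf      # restrict to unseen items
                j = int(np.argmax(scores))         # dynamic (hard) negative
                g = 1.0 / (1.0 + np.exp(U[u] @ (V[i] - V[j])))  # sigmoid(-x)
                u_old = U[u].copy()
                U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
                V[i] += lr * (g * u_old - reg * V[i])
                V[j] += lr * (-g * u_old - reg * V[j])
    return U, V
```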


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This article proposes a novel TF-IDF term weighting scheme that employs two different within-document term frequency normalizations to capture two different aspects of term saliency.
Abstract: Term weighting schemes are central to the study of information retrieval systems. This article proposes a novel TF-IDF term weighting scheme that employs two different within-document term frequency normalizations to capture two different aspects of term saliency. One component of the term frequency is effective for short queries, while the other performs better on long queries. The final weight is then measured by taking a weighted combination of these components, which is determined on the basis of the length of the corresponding query. Experiments conducted on a large number of TREC news and web collections demonstrate that the proposed scheme almost always outperforms five state-of-the-art retrieval models with remarkable significance and consistency. The experimental results also show that the proposed model achieves significantly better precision than the existing models.

172 citations
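A sketch of the combination scheme. The two term-frequency normalizations and the query-length-dependent mixing weight below are stand-in forms chosen to match the description, not necessarily the paper's exact formulas:

```python
import math

def dual_tf_idf(tf, avg_tf_in_doc, doc_len, avg_doc_len,
                df, n_docs, query_len):
    """Combine two TF normalizations, weighted by query length."""
    tf_rel = math.log2(1 + tf) / math.log2(1 + avg_tf_in_doc)   # short queries
    tf_len = tf * math.log2(1 + avg_doc_len / doc_len)          # long queries
    alpha = 2.0 / (1.0 + math.log2(1 + query_len))  # 1 for one-word queries
    idf = math.log((n_docs + 1) / df)
    return (alpha * tf_rel + (1 - alpha) * tf_len) * idf
```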


Proceedings ArticleDOI
Ryen W. White
28 Jul 2013
TL;DR: Targeting yes-no questions in the critical domain of health search, it is shown that Web searchers exhibit their own biases and are also subject to bias from the search engine, and that search engines strongly favor a particular, usually positive, perspective, irrespective of the truth.
Abstract: People's beliefs, and the unconscious biases that arise from those beliefs, influence their judgment, decision making, and actions, as is commonly accepted among psychologists. Biases can be observed in information retrieval in situations where searchers seek or are presented with information that significantly deviates from the truth. There is little understanding of the impact of such biases in search. In this paper we study search-related biases via multiple probes: an exploratory retrospective survey, human labeling of the captions and results returned by a Web search engine, and a large-scale log analysis of search behavior on that engine. Targeting yes-no questions in the critical domain of health search, we show that Web searchers exhibit their own biases and are also subject to bias from the search engine. We clearly observe searchers favoring positive information over negative, more often than expected given base rates derived from consensus answers from physicians. We also show that search engines strongly favor a particular, usually positive, perspective, irrespective of the truth. Importantly, we show that these biases can be counterproductive and affect search outcomes; in our study, around half of the answers that searchers settled on were actually incorrect. Our findings have implications for search engine design, including the development of ranking algorithms that consider the desire to satisfy searchers (by validating their beliefs), the provision of accurate answers, and proper consideration of base rates. Incorporating likelihood information into search is particularly important for consequential tasks, such as those with a medical focus.

160 citations


Proceedings ArticleDOI
28 Jul 2013
TL;DR: It is argued that a system should not only recommend the most relevant item, but also recommend at the right time, and a new opportunity model is proposed to explicitly incorporate time in an e-commerce recommender system.
Abstract: Most existing e-commerce recommender systems aim to recommend the right product to a user, based on whether the user is likely to purchase or like a product. On the other hand, the effectiveness of recommendations also depends on the time of the recommendation. Take a user who just purchased a laptop as an example. She may purchase a replacement battery in two years (assuming that the laptop's original battery often fails to work around that time) and purchase a new laptop in another two years. In this case, it is not a good idea to recommend a new laptop or a replacement battery right after the user purchased the new laptop. It could hurt the user's satisfaction with the recommender system if she receives a potentially right product recommendation at the wrong time. We argue that a system should not only recommend the most relevant item, but also recommend it at the right time. This paper studies a new problem: how to recommend the right product at the right time? We adapt the proportional hazards modeling approach from survival analysis to the recommendation field and propose a new opportunity model to explicitly incorporate time in an e-commerce recommender system. The new model estimates the joint probability of a user making a follow-up purchase of a particular product at a particular time. This joint purchase probability can be leveraged by recommender systems in various scenarios, including the zero-query pull-based recommendation scenario (e.g. recommendation on an e-commerce web site) and a proactive push-based promotion scenario (e.g. email or text message based marketing). We evaluate the opportunity modeling approach with multiple metrics. Experimental results on data collected from a real-world e-commerce website (shop.com) show that it can predict a user's follow-up purchase behavior at a particular time with decent accuracy. In addition, the opportunity model significantly improves the conversion rate in pull-based systems and the user satisfaction/utility in push-based systems.

158 citations
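The proportional-hazards core can be sketched compactly. The Weibull baseline hazard and the covariate form below are illustrative assumptions; the paper's opportunity model fits its own baseline and covariates:

```python
import numpy as np

def followup_purchase_density(t, x, beta, lam=0.002, k=1.5):
    """Density of a follow-up purchase at time t (e.g. days) given covariates x.

    h(t|x) = h0(t) * exp(beta . x) with a Weibull baseline h0; the returned
    value h(t|x) * S(t|x) is the probability density that the purchase
    happens exactly at t, having not happened before.
    """
    risk = np.exp(x @ beta)
    h = lam * k * (lam * t) ** (k - 1) * risk     # hazard at t
    S = np.exp(-((lam * t) ** k) * risk)          # survival (no purchase yet)
    return h * S
```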


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams, and develops a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations.
Abstract: With the explosive growth of microblogging services, short-text messages (also known as tweets) are being created and shared at an unprecedented rate. Tweets in their raw form can be incredibly informative, but also overwhelming. For both end users and data analysts, it is a nightmare to plow through millions of tweets that contain enormous noise and redundancy. In this paper, we study continuous tweet summarization as a solution to this problem. While traditional document summarization methods focus on static and small-scale data, we aim to deal with dynamic, quickly arriving, and large-scale tweet streams. We propose a novel prototype called Sumblr (SUMmarization By stream cLusteRing) for tweet streams. We first propose an online tweet stream clustering algorithm to cluster tweets and maintain distilled statistics called Tweet Cluster Vectors. Then we develop a TCV-Rank summarization technique for generating online summaries and historical summaries of arbitrary time durations. Finally, we describe a topic evolvement detection method, which consumes online and historical summaries to produce timelines automatically from tweet streams. Our experiments on large-scale real tweets demonstrate the efficiency and effectiveness of our approach.

132 citations


Proceedings ArticleDOI
28 Jul 2013
TL;DR: A novel scheme to crawl the relevant messages related to the designated organization by monitoring multiple aspects of microblog content, an incremental clustering framework to detect new topics, and a range of content and temporal features to help in promptly detecting hot emerging topics are designed.
Abstract: Microblog services have emerged as an essential way to strengthen communications among individuals and organizations. These services promote timely and active discussions of and comments on products and markets as well as public events, and have attracted a lot of attention from organizations. In particular, emerging topics are of immediate concern to organizations, since they signal the current concerns of, and feedback from, their users. Two challenges must be tackled for effective emerging topic detection. One is the problem of real-time relevant data collection, and the other is the ability to model the emerging characteristics of detected topics and identify them before they become hot topics. To tackle these challenges, we first design a novel scheme to crawl the messages related to a designated organization by monitoring multiple aspects of microblog content, including users, evolving keywords, and their temporal sequence. We then develop an incremental clustering framework to detect new topics, and employ a range of content and temporal features to help in promptly detecting hot emerging topics. Extensive evaluations on a representative real-world dataset based on Twitter data demonstrate that our scheme is able to characterize emerging topics well and detect them before they become hot topics.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy).
Abstract: A number of key Information Access tasks -- Document Retrieval, Clustering, Filtering, and their combinations -- can be seen as instances of a generic document organization problem that establishes priority and relatedness relationships between documents (in other words, a problem of forming and ranking clusters). As far as we know, no analysis has yet been made of the evaluation of these tasks from a global perspective. In this paper we propose two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task, which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy). In addition to being the first measures that can be applied to any mixture of ranking, clustering, and filtering tasks, Reliability and Sensitivity satisfy more formal constraints than previously existing evaluation metrics for each of the subsumed tasks. Besides their formal properties, their most salient feature from an empirical point of view is their strictness: a high score according to the harmonic mean of Reliability and Sensitivity ensures a high score with any of the most popular evaluation metrics in all the Document Retrieval, Clustering, and Filtering datasets used in our experiments.

Journal ArticleDOI
21 Jan 2013
TL;DR: A novel interleaved comparison method, called probabilistic interleave, that allows unbiased comparisons of search engine result rankings, and methods for learning quickly and effectively from the resulting relative feedback are developed.
Abstract: The amount of digital data we produce every day far surpasses our ability to process this data, and finding useful information in this constant flow of data has become one of the major challenges of the 21st century. Search engines are one way of accessing large data collections. Their algorithms have evolved far beyond simply matching search queries to sets of documents. Today’s most sophisticated search engines combine hundreds of relevance signals to provide the best possible results for each searcher. Current approaches for tuning the parameters of search engines can be highly effective. However, they typically require considerable expertise and manual effort. They rely on supervised learning to rank, meaning that they learn from manually annotated examples of relevant documents for given queries. Obtaining large quantities of sufficiently accurate manual annotations is becoming increasingly difficult, especially for personalized search, access to sensitive data, or search in settings that change over time. In this thesis, I develop new online learning to rank techniques, based on insights from reinforcement learning. In contrast to supervised approaches, these methods allow search engines to learn directly from users’ interactions. User interactions can typically be observed easily and cheaply, and reflect the preferences of real users. Interpreting user interactions and learning from them is challenging, because they can be biased and noisy. The contributions of this thesis include a novel interleaved comparison method, called probabilistic interleave, that allows unbiased comparisons of search engine result rankings, and methods for learning quickly and effectively from the resulting relative feedback. The obtained analytical and experimental results show how search engines can effectively learn from user interactions. In the future, these and similar techniques can open up new ways for gaining useful information from ever larger amounts of data.
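The heart of probabilistic interleave is small enough to sketch: each ranker is softened into a distribution over documents with p(d) ∝ 1/rank^τ, and a fair coin decides which ranker contributes each position. Credit assignment from observed clicks is omitted here, and τ = 3 is treated as a tunable constant:

```python
import random

def probabilistic_interleave(ranking_a, ranking_b, length=10, tau=3.0, seed=0):
    """Interleave two rankings by sampling from softened rank distributions."""
    rng = random.Random(seed)

    def sample(ranking, used):
        cands = [(d, 1.0 / (r + 1) ** tau)
                 for r, d in enumerate(ranking) if d not in used]
        x = rng.uniform(0, sum(w for _, w in cands))
        for d, w in cands:
            x -= w
            if x <= 0:
                return d
        return cands[-1][0]

    result, used = [], set()
    pool = set(ranking_a) | set(ranking_b)
    while len(result) < length and pool - used:
        ranking = ranking_a if rng.random() < 0.5 else ranking_b
        if all(d in used for d in ranking):            # fall back to the other
            ranking = ranking_b if ranking is ranking_a else ranking_a
        d = sample(ranking, used)
        used.add(d)
        result.append(d)
    return result
```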

Proceedings ArticleDOI
28 Jul 2013
TL;DR: Within a learning-to-rank framework, a wide range of features is explored, such as retweet history, follower status, follower active time, and follower interests, finding that followers who frequently retweeted or mentioned the author's tweets before and who have common interests are more likely to be retweeters.
Abstract: An important aspect of communication in Twitter (and other social networks) is message propagation -- people creating posts for others to share. Although there has been work on modelling how tweets in Twitter are propagated (retweeted), an untackled problem has been who will retweet a message. Here we consider the task of finding who will retweet a message posted on Twitter. Within a learning-to-rank framework, we explore a wide range of features, such as retweet history, follower status, follower active time, and follower interests. We find that followers who frequently retweeted or mentioned the author's tweets before and who have common interests are more likely to be retweeters.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This work develops a new, generalized Language Modeling approach for IR by adopting the probabilistic framework of QT; it is the first practical application of quantum probability to show significant improvements over a robust bag-of-words baseline, and it achieves better performance on a stronger non-bag-of-words baseline.
Abstract: Traditional information retrieval (IR) models use bag-of-words as the basic representation and assume that some form of independence holds between terms. Representing term dependencies and defining a scoring function capable of integrating such additional evidence is theoretically and practically challenging. Recently, Quantum Theory (QT) has been proposed as a possible, more general framework for IR. However, only a limited number of investigations have been made and the potential of QT has not been fully explored and tested. We develop a new, generalized Language Modeling approach for IR by adopting the probabilistic framework of QT. In particular, quantum probability can account for both single and compound terms at once without having to extend the term space artificially as in previous studies. This naturally allows us to avoid the weight-normalization problem, which arises in the current practice of mixing scores from matching compound terms and from matching single terms. Our model is the first practical application of quantum probability to show significant improvements over a robust bag-of-words baseline, and it achieves better performance on a stronger non-bag-of-words baseline.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: The authors' L2R method was trained to learn the answer rating, based on the feedback users give to answers in Q&A forums, and was able to outperform a state-of-the-art baseline with gains of up to 21% in NDCG, a metric used to evaluate rankings.
Abstract: Collaborative web sites, such as collaborative encyclopedias, blogs, and forums, are characterized by a loose edit control, which allows anyone to freely edit their content. As a consequence, the quality of this content raises much concern. To deal with this, many sites adopt manual quality control mechanisms. However, given their size and change rate, manual assessment strategies do not scale and content that is new or unpopular is seldom reviewed. This has a negative impact on the many services provided, such as ranking and recommendation. To tackle this problem, we propose a learning to rank (L2R) approach for ranking answers in Q&A forums. Our method was trained to learn the answer rating, based on the feedback users give to answers, and was able to outperform a state-of-the-art baseline with gains of up to 21% in NDCG. We also conducted a comprehensive study of the features, showing that review and user features are the most important in the Q&A setting, and that the best set of new features we proposed was able to yield the best quality rankings.

Proceedings ArticleDOI
Chao Wang, Yiqun Liu, Min Zhang, Shaoping Ma, Meihong Zheng, Jing Qian, Kuo Zhang
28 Jul 2013
TL;DR: Experimental results show that the new Vertical-aware Click Model (VCM) is better at interpreting user click behavior on federated searches in terms of both log-likelihood and perplexity than existing models.
Abstract: In modern search engines, an increasing number of search result pages (SERPs) are federated from multiple specialized search engines (called verticals, such as Image or Video). As an effective approach to interpreting users' click-through behavior as feedback information, most click models were designed to reduce the position bias and improve the ranking performance of ordinary search results, which have homogeneous appearances. However, when vertical results are combined with ordinary ones, significant differences in presentation may lead to user behavior biases and thus to failure of state-of-the-art click models. With the help of a popular commercial search engine in China, we collected a large-scale log dataset which contains behavior information on both vertical and ordinary results. We also performed eye-tracking analysis to study users' real-world examination behavior. Based on these analyses, we found that different result appearances may cause different behavior biases both for vertical results (local effect) and for the whole result list (global effect). These biases include: examination bias for vertical results (especially those with multimedia components), trust bias for result lists with vertical results, and a higher probability of result revisitation for vertical results. Based on these findings, a novel click model considering these biases besides position bias was constructed to describe interaction with SERPs containing verticals. Experimental results show that the new Vertical-aware Click Model (VCM) is better at interpreting user click behavior on federated searches, in terms of both log-likelihood and perplexity, than existing models.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: A novel query change retrieval model (QCM), which utilizes syntactic editing changes between adjacent queries as well as the relationship between query change and previously retrieved documents to enhance session search.
Abstract: Session search is the Information Retrieval (IR) task that performs document retrieval for a search session. During a session, a user constantly modifies queries in order to find relevant documents that fulfill the information need. This paper proposes a novel query change retrieval model (QCM), which utilizes syntactic editing changes between adjacent queries as well as the relationship between query change and previously retrieved documents to enhance session search. We propose to model session search as a Markov Decision Process (MDP). We consider two agents in this MDP: the user agent and the search engine agent. The user agent's actions are query changes that we observe and the search agent's actions are proposed in this paper. Experiments show that our approach is highly effective and outperforms top session search systems in TREC 2011 and 2012.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper examines a multi-stage retrieval architecture consisting of a candidate generation stage, a feature extraction stage, and a reranking stage using machine-learned models and shows that conjunctive WAND is the best overall candidate generation strategy.
Abstract: This paper examines a multi-stage retrieval architecture consisting of a candidate generation stage, a feature extraction stage, and a reranking stage using machine-learned models. Given a fixed set of features and a learning-to-rank model, we explore effectiveness/efficiency tradeoffs with three candidate generation approaches: postings intersection with SvS, conjunctive query evaluation with WAND, and disjunctive query evaluation with WAND. We find no significant differences in end-to-end effectiveness as measured by NDCG between conjunctive and disjunctive WAND, but conjunctive query evaluation is substantially faster. Postings intersection with SvS, while fast, yields substantially lower end-to-end effectiveness, suggesting that document and term frequencies remain important in the initial ranking stage. These findings show that conjunctive WAND is the best overall candidate generation strategy of those we examined.
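For reference, the SvS (small-versus-small) candidate generation the paper benchmarks is straightforward to sketch: intersect the two shortest postings lists first, probing longer lists by binary search. This is a generic rendering, not the paper's optimized implementation:

```python
from bisect import bisect_left

def svs_intersect(postings):
    """Intersect sorted postings lists, smallest first."""
    postings = sorted(postings, key=len)
    result = postings[0]
    for plist in postings[1:]:
        out = []
        for doc in result:
            i = bisect_left(plist, doc)   # binary-search probe
            if i < len(plist) and plist[i] == doc:
                out.append(doc)
        result = out
        if not result:
            break
    return result

# e.g. svs_intersect([[1, 4, 9], [1, 2, 4, 8, 9], [0, 1, 4, 9, 12]]) -> [1, 4, 9]
```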

Proceedings ArticleDOI
28 Jul 2013
TL;DR: A clearer picture is provided on how to further improve current voice search systems by evaluating the impacts of typical voice input errors on users' search progress and the effectiveness of different reformulation strategies on handling these errors.
Abstract: Voice search offers users a new search experience: instead of typing, users can vocalize their search queries. However, due to voice input errors (such as speech recognition errors and improper system interruptions), users need to frequently reformulate queries to handle incorrectly recognized queries. We conducted user experiments with native English speakers on their query reformulation behaviors in voice search and found that users often reformulate queries with both lexical and phonetic changes to previous queries. In this paper, we first characterize and analyze typical voice input errors in voice search and users' corresponding reformulation strategies. Then, we evaluate the impact of typical voice input errors on users' search progress and the effectiveness of different reformulation strategies in handling these errors. This study provides a clearer picture of how to further improve current voice search systems.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: A sentiment-aware nearest neighbor model (SANN) that makes use of user comments is proposed for multimedia recommendations over TED talks; it significantly outperforms several competitive baselines, by more than 25% on unseen data.
Abstract: User-generated texts such as reviews, comments, or discussions are valuable indicators of users' preferences. Unlike previous works, which focus on labeled data from user-contributed reviews, we focus here on user comments that are not accompanied by explicit rating labels. We investigate their utility for a one-class collaborative filtering task such as bookmarking, where only the user actions are given as ground truth. We propose a sentiment-aware nearest neighbor model (SANN) for multimedia recommendations over TED talks, which makes use of user comments. The model significantly outperforms several competitive baselines, by more than 25% on unseen data.

Proceedings ArticleDOI
Qi Guo, Haojian Jin, Dmitry Lagun, Shuai Yuan, Eugene Agichtein
28 Jul 2013
TL;DR: This paper evaluates a variety of touch interactions on a smart phone as implicit relevance feedback, and compares them with the corresponding fine-grained interaction on a desktop computer with mouse and keyboard as the primary input devices, and demonstrates significant improvements to search ranking quality by mining touch interaction data.
Abstract: Fine-grained search interactions in the desktop setting, such as mouse cursor movements and scrolling, have been shown valuable for understanding user intent, attention, and their preferences for Web search results. As web search on smart phones and tablets becomes increasingly popular, previously validated desktop interaction models have to be adapted for the available touch interactions such as pinching and swiping, and for the different device form factors. In this paper, we present, to our knowledge, the first in-depth study of modeling interactions on touch-enabled devices for improving Web search ranking. In particular, we evaluate a variety of touch interactions on a smart phone as implicit relevance feedback, and compare them with the corresponding fine-grained interactions on a desktop computer with mouse and keyboard as the primary input devices. Our experiments are based on a dataset collected from two user studies with 56 users in total, using a specially instrumented version of a popular mobile browser to capture the interaction data. We report a detailed analysis of the similarities and differences of fine-grained search interactions between the desktop and the smart phone modalities, and identify novel patterns of touch interactions indicative of result relevance. Finally, we demonstrate significant improvements to search ranking quality by mining touch interaction data.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes Explicit Localized Semantic Analysis (ELSA), an ESA-based topical representation of documents that stresses only the topics relevant to a given location, whereas all topics are equally important in ESA.
Abstract: The interest of users in handheld devices is strongly related to their location. Therefore, the user's location is important, as user context, for news article recommendation in a mobile environment. This paper proposes a novel news article recommendation approach that reflects the geographical context of the user. For this purpose, we propose Explicit Localized Semantic Analysis (ELSA), an ESA-based topical representation of documents. Every location has its own geographical topics, which can be captured from the geo-tagged documents related to the location. Thus, not only news articles but locations, too, are represented as topic vectors. The main advantage of ELSA is that it stresses only the topics that are relevant to a given location, whereas all topics are equally important in ESA. As a result, geographical topics have different importance according to the user location in ELSA, even if they come from the same article. Another advantage of ELSA is that it allows a simple comparison of the user location and news articles, because it projects both locations and articles onto an identical space composed of Wikipedia topics. In an evaluation of ELSA on the New York Times corpus, it outperformed two simple baselines, Bag-Of-Words and LDA, as well as two ESA-based methods. Rt10 of ELSA improved by up to 46.25% over the other methods, and its NDCG@k was always higher than those of the others regardless of k.
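Once locations and articles live in the same Wikipedia-topic space, matching reduces to vector similarity. A minimal sketch, assuming the topic vectors have already been built (vector construction is where ELSA does its real work):

```python
import numpy as np

def elsa_match(location_topics, article_topics):
    """Cosine similarity between a location and an article topic vector."""
    a = np.asarray(location_topics, dtype=float)
    b = np.asarray(article_topics, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```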

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This work identifies and addresses three key problems for ID retrieval: identifying authentic examples of ID tasks from post-hoc analysis of behavioral signals in search logs; learning to identify initiator queries that mark the start of an ID search task; and given an initiator query, predicting which content to prefetch and rank.
Abstract: Current research on web search has focused on optimizing and evaluating single queries. However, a significant fraction of user queries are part of more complex tasks [20] which span multiple queries across one or more search sessions [26,24]. An ideal search engine would not only retrieve relevant results for a user's particular query but also be able to identify when the user is engaged in a more complex task and aid the user in completing that task [29,1]. Toward optimizing whole-session or task relevance, we characterize and address the problem of intrinsic diversity (ID) in retrieval [30], a type of complex task that requires multiple interactions with current search engines. Unlike existing work on extrinsic diversity [30] that deals with ambiguity in intent across multiple users, ID queries often have little ambiguity in intent but seek content covering a variety of aspects on a shared theme. In such scenarios, the underlying needs are typically exploratory, comparative, or breadth-oriented in nature. We identify and address three key problems for ID retrieval: identifying authentic examples of ID tasks from post-hoc analysis of behavioral signals in search logs; learning to identify initiator queries that mark the start of an ID search task; and given an initiator query, predicting which content to prefetch and rank.

Proceedings ArticleDOI
Tetsuya Sakai, Zhicheng Dou
28 Jul 2013
TL;DR: A general information access evaluation framework that can potentially handle summaries, ranked document lists and even multi query sessions seamlessly, and is useful for understanding the user's search behaviour and for comparison across different information access styles.
Abstract: We introduce a general information access evaluation framework that can potentially handle summaries, ranked document lists and even multi query sessions seamlessly. Our framework first builds a trailtext which represents a concatenation of all the texts read by the user during a search session, and then computes an evaluation metric called U-measure over the trailtext. Instead of discounting the value of a retrieved piece of information based on ranks, U-measure discounts it based on its position within the trailtext. U-measure takes the document length into account just like Time-Biased Gain (TBG), and has the diminishing return property. It is therefore more realistic than rank-based metrics. Furthermore, it is arguably more flexible than TBG, as it is free from the linear traversal assumption (i.e., that the user scans the ranked list from top to bottom), and can handle information access tasks other than ad hoc retrieval. This paper demonstrates the validity and versatility of the U-measure framework. Our main conclusions are: (a) For ad hoc retrieval, U-measure is at least as reliable as TBG in terms of rank correlations with traditional metrics and discriminative power; (b) For diversified search, our diversity versions of U-measure are highly correlated with state-of-the-art diversity metrics; (c) For multi-query sessions, U-measure is highly correlated with Session nDCG; and (d) Unlike rank-based metrics such as DCG, U-measure can quantify the differences between linear and nonlinear traversals in sessions. We argue that our new framework is useful for understanding the user's search behaviour and for comparison across different information access styles (e.g. examining a direct answer vs. examining a ranked list of web pages).
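The metric itself is compact enough to sketch: gains are discounted by character offset within the trailtext rather than by rank. The linear discount form follows the paper; the cutoff L is treated here as a tunable constant (the paper derives its value from reading-speed considerations):

```python
def u_measure(read_segments, L=132000):
    """read_segments: (text_length_in_chars, gain) pairs in reading order."""
    pos, total = 0, 0.0
    for length, gain in read_segments:
        pos += length                            # offset at end of this segment
        total += gain * max(0.0, 1.0 - pos / L)  # position-based discount
    return total
```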

Proceedings ArticleDOI
28 Jul 2013
TL;DR: Results show that subjects who used the Structured interface submitted significantly fewer queries, spent more time on search results pages, examined significantly more documents per query, and went to greater depths in the search results list.
Abstract: affects how users interact with a search system. Microeconomic theory is used to generate the cost-interaction hypothesis that states as the cost of querying increases, users will pose fewer queries and examine more documents per query. A between-subjects laboratory study with 36 undergraduate subjects was conducted, where subjects were randomly assigned to use one of three search interfaces that varied according to the amount of physical cost required to query: Structured (high cost), Standard (medium cost) and Query Suggestion (low cost). Results show that subjects who used the Structured interface submitted significantly fewer queries, spent more time on search results pages, examined significantly more documents per query, and went to greater depths in the search results list. Results also showed that these subjects spent longer generating their initial queries, saved more relevant documents and rated their queries as more successful. These findings have implications for the usefulness of microeconomic theory as a way to model and explain search interaction, as well as for the design of query facilities.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: A time-aware user behavior model is proposed, the Tweet Propagation Model (TPM), in which dynamic probabilistic distributions over interests and topics are inferred and an iterative optimization algorithm for selecting tweets is proposed.
Abstract: We focus on the problem of selecting meaningful tweets given a user's interests; the dynamic nature of user interests, the sheer volume, and the sparseness of individual messages make this a challenging problem. Specifically, we consider the task of time-aware tweet summarization, based on a user's history and collaborative social influences from "social circles." We propose a time-aware user behavior model, the Tweet Propagation Model (TPM), in which we infer dynamic probabilistic distributions over interests and topics. We then explicitly consider novelty, coverage, and diversity to arrive at an iterative optimization algorithm for selecting tweets. Experimental results validate the effectiveness of our personalized time-aware tweet summarization method based on TPM.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes a general ranking model adaptation framework for personalized search that quickly learns to apply a series of linear transformations over the parameters of the given global ranking model such that the adapted model can better fit each individual user's search preferences.
Abstract: Search engines train and apply a single ranking model across all users, but searchers' information needs are diverse and cover a broad range of topics. Hence, a single user-independent ranking model is insufficient to satisfy different users' result preferences. Conventional personalization methods learn separate models of user interests and use those to re-rank the results from the generic model. Those methods require significant user history information to learn user preferences, have low coverage in the case of memory-based methods that learn direct associations between query-URL pairs, and have limited opportunity to markedly affect the ranking given that they only re-order top-ranked items. In this paper, we propose a general ranking model adaptation framework for personalized search. Using a given user-independent ranking model trained offline and limited number of adaptation queries from individual users, the framework quickly learns to apply a series of linear transformations, e.g., scaling and shifting, over the parameters of the given global ranking model such that the adapted model can better fit each individual user's search preferences. Extensive experimentation based on a large set of search logs from a major commercial Web search engine confirms the effectiveness of the proposed method compared to several state-of-the-art ranking model adaptation methods.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: It is shown that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when the authors have incomplete relevance judgements.
Abstract: In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together and propose a common approach to converting any click model into an evaluation metric. We then put the resulting model-based metrics as well as traditional metrics (like DCG or Precision) into a common evaluation framework and compare them along a number of dimensions. One of the dimensions we are particularly interested in is the agreement between offline and online experimental outcomes. It is widely believed, especially in an industrial setting, that online A/B-testing and interleaving experiments are generally better at capturing system quality than offline measurements. We show that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when we have incomplete relevance judgements.
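Applied to the simplest (position-based) click model, the paper's recipe amounts to weighting each document's gain by the model's probability of the user examining that rank. A minimal sketch; richer click models would condition examination on the documents above:

```python
def model_based_metric(relevances, p_exam):
    """Expected gain under a click model's examination probabilities.

    relevances: graded relevance of documents at ranks 1..n.
    p_exam:     per-rank examination probabilities from the click model.
    """
    return sum(r * e for r, e in zip(relevances, p_exam))

# e.g. model_based_metric([3, 1, 0], [1.0, 0.6, 0.4]) -> 3.6
```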

Proceedings ArticleDOI
28 Jul 2013
TL;DR: An extension to the Latent Dirichlet Allocation (LDA) based document classification algorithm based on the combination of Expectation-Maximization (EM) algorithm and a naive Bayes classifier is presented.
Abstract: In this paper, we propose a Latent Dirichlet Allocation (LDA) [1] based document classification algorithm that does not require any labeled dataset. In our algorithm, we construct a topic model using LDA, assign each topic to one of the class labels, aggregate all the topics of the same class label into a single topic using the aggregation property of the Dirichlet distribution, and then automatically assign a class label to each unlabeled document depending on its "closeness" to one of the aggregated topics. We present an extension to our algorithm based on the combination of the Expectation-Maximization (EM) algorithm and a naive Bayes classifier. We show the effectiveness of our algorithm on three real-world datasets.
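A hedged sketch of the pipeline using scikit-learn's LDA, with the topic-to-label mapping supplied by hand (the aggregation step simply sums same-label topic proportions) and the EM/naive-Bayes extension omitted:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def dataless_classify(docs, n_topics, topic_to_label, labels):
    """Assign each unlabeled document the label whose aggregated topic
    proportion is largest. topic_to_label: dict topic index -> label."""
    X = CountVectorizer(stop_words='english').fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(X)               # document-topic proportions
    agg = np.zeros((len(docs), len(labels)))
    for t, lab in topic_to_label.items():
        agg[:, labels.index(lab)] += theta[:, t]
    return [labels[i] for i in agg.argmax(axis=1)]
```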