
Showing papers on "Document retrieval" published in 2013


Proceedings ArticleDOI
18 May 2013
TL;DR: A recommender (called Refoqus) based on machine learning is proposed, which is trained with a sample of queries and relevant results and automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query.
Abstract: There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems written in Java and C++ and is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines, and its recommendations led to query performance improvement or preservation in 84% of the cases (on average).

215 citations


Proceedings Article
11 Aug 2013
TL;DR: A type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This enables an efficient pretraining algorithm and a state initialization scheme for fast inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

123 citations


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy).
Abstract: A number of key Information Access tasks -- Document Retrieval, Clustering, Filtering, and their combinations -- can be seen as instances of a generic document organization problem that establishes priority and relatedness relationships between documents (in other words, a problem of forming and ranking clusters). As far as we know, no analysis has yet been made of the evaluation of these tasks from a global perspective. In this paper we propose two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task, which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy). In addition to being the first measures that can be applied to any mixture of ranking, clustering and filtering tasks, Reliability and Sensitivity satisfy more formal constraints than previously existing evaluation metrics for each of the subsumed tasks. Beyond their formal properties, their most salient feature from an empirical point of view is their strictness: a high score according to the harmonic mean of Reliability and Sensitivity ensures a high score with any of the most popular evaluation metrics in all the Document Retrieval, Clustering and Filtering datasets used in our experiments.

112 citations


Posted Content
TL;DR: A Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

108 citations


Proceedings ArticleDOI
28 Jul 2013
TL;DR: A novel query change retrieval model (QCM), which utilizes syntactic editing changes between adjacent queries as well as the relationship between query change and previously retrieved documents to enhance session search.
Abstract: Session search is the Information Retrieval (IR) task that performs document retrieval for a search session. During a session, a user constantly modifies queries in order to find relevant documents that fulfill the information need. This paper proposes a novel query change retrieval model (QCM), which utilizes syntactic editing changes between adjacent queries as well as the relationship between query change and previously retrieved documents to enhance session search. We propose to model session search as a Markov Decision Process (MDP). We consider two agents in this MDP: the user agent and the search engine agent. The user agent's actions are query changes that we observe and the search agent's actions are proposed in this paper. Experiments show that our approach is highly effective and outperforms top session search systems in TREC 2011 and 2012.

96 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: An approach for the text-to-image retrieval problem based on textual content present in images, where the retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
Abstract: We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.

83 citations


Journal ArticleDOI
TL;DR: A research project on the effective retrieval of relevant historical cases from a case library using text mining techniques concludes that natural language-based case document retrieval is superior to case-based reasoning over a structured case collection and is more practical for implementation in a construction management information system.

80 citations


Book
21 Mar 2013
TL;DR: This work focuses on the development of Topic-Based Language Models for Distributed Retrieval and their applications in the context of distributed information retrieval systems.
Abstract: Preface. Contributing Authors. 1. Combining Approaches to Information Retrieval W. Bruce Croft. 2. The Use of Exploratory Data Analysis in Information Retrieval Research W.R. Greiff. 3. Language Models for Relevance Feedback J.M. Ponte. 4. Topic Detection and Tracking: Event Clustering as a Basis for First Story Detection R. Papka, J. Allan. 5. Distributed Information Retrieval J. Callan. 6. Topic-Based Language Models for Distributed Retrieval J. Xu, B. Croft. 7. The Effect of Collection Organization and Query Locality on Information Retrieval System Performance Z. Lu, K.S. McKinley. 8. Cross-Language Retrieval via Transitive Translation L.A. Ballesteros. 9. Building, Testing, and Applying Concept Hierarchies M. Sanderson, D. Lawrie. 10. Appearance-Based Global Similarity Retrieval of Images S. Ravela, C. Luo. Index.

70 citations


Journal ArticleDOI
TL;DR: It is concluded that the proposed content-based text mining approach provides a promising solution to the difficulty currently encountered in the retrieval and reuse of vast numbers of CAD documents in the construction industry.

65 citations


Proceedings ArticleDOI
04 Feb 2013
TL;DR: This paper studies new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9], by implementing and comparing Block- Max oriented algorithms based on the well-known Maxscore and WAND approaches.
Abstract: Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing efficiency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9,7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index.In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9,7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.
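The skipping idea at the heart of Block-Max indexes can be sketched compactly. The toy example below (illustrative data and function names; not the paper's actual algorithms, which combine block maxima with Maxscore/WAND-style pivoting over compressed multi-term indexes) skips any block whose stored maximum score cannot beat the current top-k threshold:

```python
import heapq

# Toy posting list split into blocks, each carrying the maximum score of its
# postings. Real Block-Max indexes store these maxima alongside compressed
# doc-id blocks; the data and names here are illustrative only.
blocks = [
    {"max": 0.9, "postings": [(1, 0.2), (4, 0.9), (7, 0.1)]},
    {"max": 0.15, "postings": [(9, 0.15), (12, 0.1), (15, 0.05)]},
    {"max": 0.8, "postings": [(20, 0.8), (23, 0.5), (30, 0.4)]},
]

def topk_with_block_skipping(blocks, k):
    heap = []              # min-heap of (score, doc_id): current top-k
    scored = skipped = 0
    for block in blocks:
        # Entry threshold: the k-th best score seen so far.
        threshold = heap[0][0] if len(heap) == k else 0.0
        if block["max"] <= threshold:
            skipped += 1   # no posting in this block can enter the top-k
            continue
        for doc_id, score in block["postings"]:
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), scored, skipped

top, scored, skipped = topk_with_block_skipping(blocks, k=2)
print(top, "scored:", scored, "skipped:", skipped)
```

On this data the middle block is skipped entirely: its maximum (0.15) is below the current second-best score, so none of its postings are decompressed or scored.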

63 citations


Journal ArticleDOI
TL;DR: This paper proposes a new approach to collaborative profile recommendation using a hierarchical structure for user modeling: in an information retrieval system, a hierarchical user profile, used to personalize the document retrieval process, is recommended to a new user based on the profiles of other, similar users.
Abstract: This paper proposes a new approach to collaborative profile recommendation using a hierarchical structure for user modeling. In an information retrieval system a hierarchical user profile, used to personalize the document retrieval process, is being recommended to a new user based on profiles of other, similar users. Using methodology from the Knowledge Integration domain, four criteria are defined and analyzed to complete the aim of recommendation: Reliability is required for maintaining the correct structure of the profile, O1 and O2 Optimality postulates are required to calculate the best output profile by minimizing distances to other profiles, and Conflict Solution is used to better represent situations inherent to profile recommendation. Based on those criteria, four algorithms are proposed: O1 and O2 algorithms and modified O1 and O2 algorithms. These algorithms are further analyzed to check if they provide good recommendation.

Journal ArticleDOI
TL;DR: This work gives new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences, and gives new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits.

Journal ArticleDOI
Xiaodong He1, Li Deng1
07 Mar 2013
TL;DR: This paper presents an optimization-oriented statistical framework for the overall system design where the interactions between the subsystems in tandem are fully incorporated and where design consistency is established between the optimization objectives and the end-to-end system performance metrics.
Abstract: Automatic speech recognition (ASR) is a central and common component of voice-driven information processing systems in human language technology, including spoken language translation (SLT), spoken language understanding (SLU), voice search, spoken document retrieval, and so on. Interfacing ASR with its downstream text-based processing tasks of translation, understanding, and information retrieval (IR) creates both challenges and opportunities in optimal design of the combined, speech-enabled systems. We present an optimization-oriented statistical framework for the overall system design where the interactions between the subsystems in tandem are fully incorporated and where design consistency is established between the optimization objectives and the end-to-end system performance metrics. Techniques for optimizing such objectives in both the decoding and learning phases of the speech-centric information processing (SCIP) system design are described, in which the uncertainty in speech recognition subsystem's outputs is fully considered and marginalized. This paper provides an overview of the past and current work in this area. Future challenges and new opportunities are also discussed and analyzed.

Book ChapterDOI
24 Jun 2013
TL;DR: The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure that can be computed using statistical results from a web search engine; experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.
Abstract: In this work several semantic approaches to concept-based query expansion and re-ranking schemes are studied and compared with different ontology-based expansion methods in web document search and retrieval. In particular, we focus on concept-based query expansion schemes where, in order to effectively increase the precision of web document retrieval and to decrease the users’ browsing time, the main goal is to quickly provide users with the most suitable query expansion. Two key tasks for query expansion in web document retrieval are to find the expansion candidates, as the closest concepts in web document domain, and to rank the expanded queries properly. The approach we propose aims at improving the expansion phase for better web document retrieval and precision. The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure, i.e. a measure which can be computed using statistical results from a web search engine. Experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.
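The paper describes PMING as computable from search-engine statistics; its exact formula is in the cited work. As an illustration of this family of measures, the sketch below computes two standard web-based proximity ingredients, pointwise mutual information (PMI) and the Normalized Google Distance (NGD), from hypothetical page-hit counts:

```python
from math import log

# f_x = pages containing x, f_y = pages containing y,
# f_xy = pages containing both, N = total indexed pages.
# The counts below are invented for illustration.
def pmi(f_x, f_y, f_xy, N):
    # Pointwise mutual information: log of P(x, y) / (P(x) * P(y)).
    return log((f_xy / N) / ((f_x / N) * (f_y / N)))

def ngd(f_x, f_y, f_xy, N):
    # Normalized Google Distance (Cilibrasi & Vitanyi).
    num = max(log(f_x), log(f_y)) - log(f_xy)
    den = log(N) - min(log(f_x), log(f_y))
    return num / den

print(pmi(1000, 800, 400, 10**6))  # strongly associated terms: positive PMI
print(ngd(1000, 800, 400, 10**6))  # and a small NGD
```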

DOI
15 May 2013
TL;DR: The results show the effectiveness of the approach to model the implicit knowledge in medical records search, whereby the infAP retrieval performance is significantly improved up to 14.43% over an effective concept-based representation baseline.
Abstract: Medical records search is challenging because of the inherent implicit knowledge within medical records and queries. Such knowledge is known to medical practitioners but may be hidden from a search system. For example, when searching for the medical records of patients with a heart disease, medical practitioners commonly know that the medical records of patients taking the amiodarone medicine are relevant, since this drug is used to combat heart disease. In this paper, we argue that leveraging such implicit knowledge improves retrieval effectiveness, since it provides new evidence to infer the relevance of medical records towards a query. Specifically, using a novel concept-based representation for both medical records and queries, we expand the queries by inferring additional conceptual relationships from domain-specific resources as well as by extracting informative concepts from the top-ranked medical records. We evaluate the retrieval effectiveness of our proposed approach in the context of the TREC 2011 and 2012 Medical Records track. Our results show the effectiveness of our approach to model the implicit knowledge in medical records search, whereby the infAP retrieval performance is significantly improved by up to 14.43% over an effective concept-based representation baseline. Moreover, our proposed approach achieves retrieval effectiveness comparable to the performance of the best TREC 2011 and 2012 systems.
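One half of this expansion scheme, pulling informative terms from the top-ranked records, follows the familiar pseudo-relevance feedback pattern. A minimal sketch with toy data (the paper operates on medical concepts and domain-specific resources rather than the raw terms used here):

```python
from collections import Counter

# Minimal pseudo-relevance-feedback sketch: expand a query with the most
# frequent terms of the top-ranked documents. Toy documents, invented for
# illustration; a real system would filter stopwords and weight terms.
top_ranked = [
    "patient with heart disease prescribed amiodarone",
    "amiodarone therapy for arrhythmia in heart disease",
    "heart disease patient on amiodarone and beta blockers",
]
query = ["heart", "disease"]

def expand(query, top_ranked, n_terms=2):
    counts = Counter()
    for doc in top_ranked:
        counts.update(t for t in doc.split() if t not in query)
    return query + [t for t, _ in counts.most_common(n_terms)]

print(expand(query, top_ranked))
```

Here "amiodarone" is promoted into the query because it dominates the top-ranked records, mirroring the heart-disease example in the abstract.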

Journal ArticleDOI
TL;DR: In this scheme, the domain ontology is first constructed using GRAONTO, the group's graph-based approach to automated domain ontology construction; query semantic extension and retrieval are then adopted for semantic-based knowledge retrieval.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This work presents a novel cluster ranking approach that utilizes Markov Random Fields (MRFs), and shows that it significantly outperforms state-of-the-art cluster ranking methods and can be used to improve the performance of results-diversification methods.
Abstract: An important challenge in cluster-based document retrieval is ranking document clusters by their relevance to the query. We present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). MRFs enable the integration of various types of cluster-relevance evidence; e.g., the query-similarity values of the cluster's documents and query-independent measures of the cluster. We use our method to re-rank an initially retrieved document list by ranking clusters that are created from the documents most highly ranked in the list. The resultant retrieval effectiveness is substantially better than that of the initial list for several lists that are produced by effective retrieval methods. Furthermore, our cluster ranking approach significantly outperforms state-of-the-art cluster ranking methods. We also show that our method can be used to improve the performance of (state-of-the-art) results-diversification methods.

Journal ArticleDOI
TL;DR: This paper proposes a unified formulation under a common notational framework for memory-based collaborative filtering, and a technique to use any text retrieval weighting function with collaborative filtering preference data, and confirms the rationale of the framework.
Abstract: When speaking of information retrieval, we often mean text retrieval. But there exist many other forms of information retrieval applications. A typical example is collaborative filtering, which suggests interesting items to a user by taking into account other users' preferences or tastes. Due to the uniqueness of the problem, it has been modeled and studied differently in the past, mainly from the preference prediction and machine learning viewpoint. A few attempts have been made to bring collaborative filtering back to information (text) retrieval modeling, and new collaborative filtering techniques have subsequently been derived. In this paper, we show that from the algorithmic viewpoint, there is an even closer relationship between collaborative filtering and text retrieval. Specifically, major collaborative filtering algorithms, such as the memory-based ones, essentially calculate the dot product between the user vector (as the query vector in text retrieval) and the item rating vector (as the document vector in text retrieval). Thus, if we properly structure user preference data and employ the target user's ratings as query input, major text retrieval algorithms and systems can be directly used without any modification. In this regard, we propose a unified formulation under a common notational framework for memory-based collaborative filtering, and a technique to use any text retrieval weighting function with collaborative filtering preference data. Besides confirming the rationale of the framework, our preliminary experimental results have also demonstrated the effectiveness of the approach in using text retrieval models and systems to perform item ranking tasks in collaborative filtering.
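The dot-product correspondence described in this abstract is easy to make concrete. A minimal sketch with a toy rating matrix (illustrative only; the paper's framework supports arbitrary text retrieval weighting functions on top of this):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def rank_items(u, R):
    """Memory-based CF cast as retrieval: the target user's rating row is
    the query vector; user-user similarities are dot products with that
    query, and an item's score aggregates other users' ratings for it,
    weighted by their similarity to the target user."""
    sims = R @ R[u]              # dot product of every user row with the query
    sims[u] = 0.0                # exclude the target user
    scores = sims @ R            # similarity-weighted rating sums per item
    scores[R[u] > 0] = -np.inf   # only recommend items the user has not rated
    return np.argsort(-scores)

print(rank_items(0, R))          # ranked item indices for user 0
```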

Book
01 Jan 2013
TL;DR: The Query Complexity of Finding a Hidden Permutation, Bounds for Scheduling Jobs on Grid Processors, and a Survey of Algorithms and Models for List Update are presented.
Abstract: The Query Complexity of Finding a Hidden Permutation -- Bounds for Scheduling Jobs on Grid Processors -- Quake Heaps: A Simple Alternative to Fibonacci Heaps -- Variations on Instant Insanity -- A Simple Linear-Space Data Structure for Constant-Time Range -- Closing a Long-Standing Complexity Gap for Selection: V3(42) = 50 -- Frugal Streaming for Estimating Quantiles -- From Time to Space: Fast Algorithms That Yield Small and Fast Data -- Computing (and life) Is all About Tradeoffs: A Small Sample of Some Computational Tradeoffs -- A History of Distribution-Sensitive Data Structures -- A Survey on Priority Queues -- On Generalized Comparison-Based Sorting Problems -- A Survey of the Game “Lights Out!” -- Random Access to High-Order Entropy Compressed Text -- Succinct and Implicit Data Structures for Computational Geometry -- In Pursuit of the Dynamic Optimality Conjecture -- A Survey of Algorithms and Models for List Update -- Orthogonal Range Searching for Text Indexing -- A Survey of Data Structures in the Bitprobe Model -- Succinct Representations of Ordinal Trees -- Array Range Queries -- Indexes for Document Retrieval with Relevance.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: A new system that assists people's reading activity by combining a wearable eye tracker, a see-through head mounted display, and an image based document retrieval engine is presented.
Abstract: We present a new system that assists people's reading activity by combining a wearable eye tracker, a see-through head-mounted display, and an image-based document retrieval engine. The image-based document retrieval engine is used for identification of the reading document, whereas the eye tracker is used to detect which part of the document the reader is currently reading. The reader can refer to the glossary of the most recently viewed keyword by looking at the see-through head-mounted display. This novel document reading assist application, which integrates a document retrieval system into an everyday reading scenario for the first time, enriches people's reading life. In this paper, we i) investigate the performance of a state-of-the-art image-based document retrieval method using a wearable camera, ii) propose a method for identification of the word to which the reader is attending, and iii) conduct pilot studies to evaluate the system in this reading context. The results show the potential of a document retrieval system in combination with a gaze-based user-oriented system.

Book ChapterDOI
17 Jun 2013
TL;DR: This paper shows how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document listing, and develops a new document listing technique for general collections that is of independent interest.
Abstract: Many document collections consist largely of repeated material, and several indexes have been designed to take advantage of this. There has been only preliminary work, however, on document retrieval for repetitive collections. In this paper we show how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document listing. In our experiments, our additional structures on top of the RLCSA can reduce the query time for document listing by an order of magnitude while still using total space that is only a fraction of the raw collection size. As a byproduct, we develop a new document listing technique for general collections that is of independent interest.

Journal ArticleDOI
TL;DR: This paper describes a Japanese spoken term detection method for spoken documents that robustly considers OOV words and mis-recognition, and proposes a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice.

Journal ArticleDOI
TL;DR: An index for top-k most frequent document retrieval whose space is |CSA| bits and whose query time is O(log k log^(2+ε) n) per reported document, improving over previous results for this problem.

Book ChapterDOI
01 Jan 2013
TL;DR: This work surveys recent and relevant past work on array range queries, which are of current interest in the field of data structures and have connections to computational geometry, compressed and succinct data structures, and other areas of computer science.
Abstract: Array range queries are of current interest in the field of data structures. Given an array of numbers or arbitrary elements, the general array range query problem is to build a data structure that can efficiently answer queries of a given type stated in terms of an interval of the indices. The specific query type might be for the minimum element in the range, the most frequently occurring element, or any of many other possibilities. In addition to being interesting in themselves, array range queries have connections to computational geometry, compressed and succinct data structures, and other areas of computer science. We survey recent and relevant past work on this class of problems.
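As a concrete instance of this problem class, the classic sparse-table structure answers range-minimum queries in O(1) time after O(n log n) preprocessing (a standard textbook solution, included as an illustration rather than anything specific to this survey):

```python
from math import floor, log2

# Sparse table for range-minimum queries (RMQ):
# table[j][i] holds the minimum of a[i : i + 2**j].
def build_sparse_table(a):
    n = len(a)
    table = [a[:]]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        # Combine two overlapping half-width intervals.
        table.append([min(prev[i], prev[i + (1 << (j - 1))])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table

def range_min(table, lo, hi):
    """Minimum of a[lo:hi] (half-open interval, hi > lo)."""
    j = floor(log2(hi - lo))
    # Two power-of-two intervals covering [lo, hi); overlap is harmless
    # because min is idempotent.
    return min(table[j][lo], table[j][hi - (1 << j)])

a = [9, 3, 7, 1, 8, 12, 10, 20]
t = build_sparse_table(a)
print(range_min(t, 0, 4))   # min of [9, 3, 7, 1]
print(range_min(t, 4, 8))   # min of [8, 12, 10, 20]
```

The same skeleton works for any idempotent operation (min, max, gcd), since the two covering intervals may overlap.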

Book ChapterDOI
09 Dec 2013
TL;DR: It is argued that users retain a number of snippets in an “active band” that shifts down the result page, and that reading and clicking activity tends to take place within the band in a manner that is not strictly sequential.
Abstract: Search engine result pages – the ten blue links – are a staple of document retrieval services. The usual presumption is that users read these one-by-one from the top, making judgments about the usefulness of documents based on the snippets presented, accessing the underlying document when a snippet seems attractive, and then moving on to the next snippet. In this paper we re-examine this assumption, and present the results of a user experiment in which gaze-tracking is combined with click analysis. We conclude that in very general terms, users do indeed read from the top, but that at a detailed level there are complex behaviors evident, suggesting that a more sophisticated model of user interaction might be appropriate. In particular, we argue that users retain a number of snippets in an “active band” that shifts down the result page, and that reading and clicking activity tends to take place within the band in a manner that is not strictly sequential.

Journal ArticleDOI
TL;DR: A model for user profile tuning in document retrieval systems is considered, and methods for tuning the user profile based on analysis of user preference dynamics are experimentally evaluated to check whether, as the history of user activity grows, the created user profile converges to the user's preferences.
Abstract: Modeling users' preferences and needs is one of the most important personalization tasks in the information retrieval domain. In this paper a model for user profile tuning in document retrieval systems is considered. Methods for tuning the user profile based on analysis of user preference dynamics are experimentally evaluated to check whether, as the history of user activity grows, the created user profile converges to the user's preferences. As the statistical analysis of a series of simulations has shown, the proposed methods of user profile actualization are effective in the sense that the distance between the user's preferences and his profile decreases over time.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes to use query change as a new form of relevance feedback for better session search and shows that query change is a highly effective form of feedback as compared with existing relevance feedback methods.
Abstract: Session search is the Information Retrieval (IR) task that performs document retrieval for an entire session. During a session, users often change queries to explore and investigate the information needs. In this paper, we propose to use query change as a new form of relevance feedback for better session search. Evaluation conducted over TREC 2012 Session Track shows that query change is a highly effective form of feedback as compared with existing relevance feedback methods. The proposed method outperforms the state-of-the-art relevance feedback methods for the TREC 2012 Session Track by a significant improvement of >25%.

Journal ArticleDOI
TL;DR: A new concept-based retrieval model is introduced that can effectively discriminate between terms that are unimportant to sentence semantics and terms that represent the concepts capturing the sentence meaning.
Abstract: Most of the common techniques in text retrieval are based on statistical analysis of terms (words or phrases). Statistical analysis of term frequency captures the importance of a term within a document only. Thus, to achieve a more accurate analysis, the underlying model should indicate terms that capture the semantics of the text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document. In this paper, a new concept-based retrieval model is introduced. The proposed model consists of a conceptual ontological graph (COG) representation and a concept-based weighting scheme. The COG representation captures the semantic structure of each term within a sentence. All the terms are then placed in the COG representation according to their contribution to the meaning of the sentence. The concept-based weighting analyzes terms at both the sentence and document levels. This is different from the classical approach of analyzing terms at the document level only. The weighted terms are then ranked, and the top concepts are used to build a concept-based document index for text retrieval. The concept-based retrieval model can effectively discriminate between terms that are unimportant to sentence semantics and terms that represent the concepts capturing the sentence meaning. Experiments using the proposed concept-based retrieval model are conducted on different text retrieval data sets, comparing traditional approaches with the combined approach of the conceptual ontological graph and the concept-based weighting scheme. The evaluation of results is performed using three quality measures: the preference measure (bpref), precision at 10 documents retrieved (P(10)), and the mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based retrieval model is used, confirming that the model enhances the quality of text retrieval.

Proceedings Article
01 Oct 2013
TL;DR: A keyword extraction process, based on the PageRank algorithm, to reduce noise of input data for measuring semantic similarity and experimental results showed significantly improved document retrieval performance with this extraction process in place.
Abstract: This paper proposes a keyword extraction process, based on the PageRank algorithm, to reduce noise of input data for measuring semantic similarity. This paper will introduce several features related to implementation and discuss their effects. It will also discuss experimental results which showed significantly improved document retrieval performance with this extraction process in place.
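The abstract leaves the implementation features unspecified; the sketch below is a TextRank-style illustration of the general recipe (the co-occurrence window, damping factor, and absence of stopword filtering are all assumptions of this toy version, not details from the paper):

```python
from collections import defaultdict

# TextRank-style keyword extraction: words become graph nodes, words
# co-occurring within a sliding window are linked, PageRank scores the
# nodes, and the top-scoring words are kept as keywords. Real systems
# also filter stopwords and restrict candidates by part of speech.
def extract_keywords(tokens, window=3, damping=0.85, iters=50, top_n=3):
    neighbors = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                neighbors[tokens[i]].add(tokens[j])
                neighbors[tokens[j]].add(tokens[i])
    # Power iteration of PageRank on the undirected co-occurrence graph.
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        rank = {w: (1 - damping) + damping * sum(
                    rank[v] / len(neighbors[v]) for v in neighbors[w])
                for w in neighbors}
    return [w for w, _ in sorted(rank.items(), key=lambda kv: -kv[1])[:top_n]]

tokens = ("document retrieval systems rank document collections "
          "for retrieval queries").split()
print(extract_keywords(tokens))
```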

Proceedings ArticleDOI
16 Nov 2013
TL;DR: Experimental evaluation on real Google search snippets shows that this approach outperforms the traditional BOW method and gives good performance, and can be easily implemented with low cost.
Abstract: Short text classification is a difficult and challenging task in information retrieval systems since the text data is short, sparse and multidimensional. In this paper, we represent short text with Wikipedia concepts for classification. Short document text is mapped to Wikipedia concepts, and the concepts are then used to represent the document for text categorization. Traditional classification methods such as SVM can then be applied to the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional BOW method and gives good performance. Although it is not better than the state-of-the-art classifier (see e.g. Phan et al., WWW '08), our method can be easily implemented at low cost.