
Showing papers on "Document retrieval" published in 2013


Proceedings ArticleDOI
18 May 2013
TL;DR: A recommender (called Refoqus) based on machine learning is proposed, which is trained with a sample of queries and relevant results and automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query.
Abstract: There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems written in Java and C++ and is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines, and its recommendations led to query performance improvement or preservation in 84% of the cases (on average).

215 citations


Proceedings Article
11 Aug 2013
TL;DR: A type of Deep Boltzmann Machine that is suitable for extracting distributed semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a type of Deep Boltzmann Machine (DBM) that is suitable for extracting distributed semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This enables an efficient pretraining algorithm and a state initialization scheme for fast inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

123 citations


Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy).
Abstract: A number of key Information Access tasks -- Document Retrieval, Clustering, Filtering, and their combinations -- can be seen as instances of a generic document organization problem that establishes priority and relatedness relationships between documents (in other words, a problem of forming and ranking clusters). As far as we know, no analysis has yet been made of the evaluation of these tasks from a global perspective. In this paper we propose two complementary evaluation measures -- Reliability and Sensitivity -- for the generic Document Organization task, which are derived from a proposed set of formal constraints (properties that any suitable measure must satisfy). In addition to being the first measures that can be applied to any mixture of ranking, clustering and filtering tasks, Reliability and Sensitivity satisfy more formal constraints than previously existing evaluation metrics for each of the subsumed tasks. Beyond their formal properties, their most salient feature from an empirical point of view is their strictness: a high score according to the harmonic mean of Reliability and Sensitivity ensures a high score with any of the most popular evaluation metrics in all the Document Retrieval, Clustering and Filtering datasets used in our experiments.

112 citations


Posted Content
TL;DR: A Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents is introduced and it is shown that the model assigns better log probability to unseen data than the Replicated Softmax model.
Abstract: We introduce a Deep Boltzmann Machine model suitable for modeling and extracting latent semantic representations from a large unstructured collection of documents. We overcome the apparent difficulty of training a DBM with judicious parameter tying. This parameter tying enables an efficient pretraining algorithm and a state initialization scheme that aids inference. The model can be trained just as efficiently as a standard Restricted Boltzmann Machine. Our experiments show that the model assigns better log probability to unseen data than the Replicated Softmax model. Features extracted from our model outperform LDA, Replicated Softmax, and DocNADE models on document retrieval and document classification tasks.

108 citations


Proceedings ArticleDOI
28 Jul 2013
TL;DR: A novel query change retrieval model (QCM), which utilizes syntactic editing changes between adjacent queries as well as the relationship between query change and previously retrieved documents to enhance session search.
Abstract: Session search is the Information Retrieval (IR) task that performs document retrieval for a search session. During a session, a user constantly modifies queries in order to find relevant documents that fulfill the information need. This paper proposes a novel query change retrieval model (QCM), which utilizes syntactic editing changes between adjacent queries as well as the relationship between query change and previously retrieved documents to enhance session search. We propose to model session search as a Markov Decision Process (MDP). We consider two agents in this MDP: the user agent and the search engine agent. The user agent's actions are query changes that we observe and the search agent's actions are proposed in this paper. Experiments show that our approach is highly effective and outperforms top session search systems in TREC 2011 and 2012.

96 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: An approach for the text-to-image retrieval problem based on textual content present in images, where the retrieval performance is evaluated on public scene text datasets as well as three large datasets, namely IIIT scene text retrieval, Sports-10K and TV series-1M.
Abstract: We present an approach for the text-to-image retrieval problem based on textual content present in images. Given the recent developments in understanding text in images, an appealing approach to address this problem is to localize and recognize the text, and then query the database, as in a text retrieval problem. We show that such an approach, despite being based on state-of-the-art methods, is insufficient, and propose a method where we do not rely on an exact localization and recognition pipeline. We take a query-driven search approach, where we find approximate locations of characters in the text query, and then impose spatial constraints to generate a ranked list of images in the database. The retrieval performance is evaluated on public scene text datasets as well as on three large datasets that we introduce, namely IIIT scene text retrieval, Sports-10K and TV series-1M.

83 citations


Journal ArticleDOI
TL;DR: A research project on the effective retrieval of relevant historical cases from a case library using text mining techniques concludes that natural language-based case document retrieval is superior to case-based reasoning over a structured case collection and is more practical for implementation in a construction management information system.

80 citations


Book
21 Mar 2013
TL;DR: This work focuses on the development of Topic-Based Language Models for Distributed Retrieval and their applications in the context of distributed information retrieval systems.
Abstract: Preface. Contributing Authors. 1. Combining Approaches to Information Retrieval W. Bruce Croft. 2. The Use of Exploratory Data Analysis in Information Retrieval Research W.R. Greiff. 3. Language Models for Relevance Feedback J.M. Ponte. 4. Topic Detection and Tracking: Event Clustering as a Basis for First Story Detection R. Papka, J. Allan. 5. Distributed Information Retrieval J. Callan. 6. Topic-Based Language Models for Distributed Retrieval J. Xu, B. Croft. 7. The Effect of Collection Organization and Query Locality on Information Retrieval System Performance Z. Lu, K.S. McKinley. 8. Cross-Language Retrieval via Transitive Translation L.A. Ballesteros. 9. Building, Testing, and Applying Concept Hierarchies M. Sanderson, D. Lawrie. 10. Appearance-Based Global Similarity Retrieval of Images S. Ravela, C. Luo. Index.

70 citations


Journal ArticleDOI
TL;DR: It is concluded that the proposed content-based text mining approach provides a promising solution to the difficulty currently encountered in the retrieval and reuse of vast numbers of CAD documents in the construction industry.

65 citations


Proceedings ArticleDOI
04 Feb 2013
TL;DR: This paper studies new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9], by implementing and comparing Block- Max oriented algorithms based on the well-known Maxscore and WAND approaches.
Abstract: Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing efficiency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9,7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index.In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9,7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.
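The skipping idea at the heart of Block-Max indexes can be sketched compactly. The toy example below (illustrative data and function names; not the paper's actual algorithms, which combine block maxima with Maxscore/WAND-style pivoting over compressed multi-term indexes) skips any block whose stored maximum score cannot beat the current top-k threshold:

```python
import heapq

# Toy posting list split into blocks, each carrying the maximum score of its
# postings. Real Block-Max indexes store these maxima alongside compressed
# doc-id blocks; the data and names here are illustrative only.
blocks = [
    {"max": 0.9, "postings": [(1, 0.2), (4, 0.9), (7, 0.1)]},
    {"max": 0.15, "postings": [(9, 0.15), (12, 0.1), (15, 0.05)]},
    {"max": 0.8, "postings": [(20, 0.8), (23, 0.5), (30, 0.4)]},
]

def topk_with_block_skipping(blocks, k):
    heap = []              # min-heap of (score, doc_id): current top-k
    scored = skipped = 0
    for block in blocks:
        # Entry threshold: the k-th best score seen so far.
        threshold = heap[0][0] if len(heap) == k else 0.0
        if block["max"] <= threshold:
            skipped += 1   # no posting in this block can enter the top-k
            continue
        for doc_id, score in block["postings"]:
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), scored, skipped

top, scored, skipped = topk_with_block_skipping(blocks, k=2)
print(top, "scored:", scored, "skipped:", skipped)
```

On this data the middle block is skipped entirely: its maximum (0.15) is below the current second-best score, so none of its postings are decompressed or scored.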

63 citations


Journal ArticleDOI
TL;DR: This paper proposes a new approach to collaborative profile recommendation using a hierarchical structure for user modeling: in an information retrieval system, a hierarchical user profile, used to personalize the document retrieval process, is recommended to a new user based on the profiles of other, similar users.
Abstract: This paper proposes a new approach to collaborative profile recommendation using a hierarchical structure for user modeling. In an information retrieval system a hierarchical user profile, used to personalize the document retrieval process, is being recommended to a new user based on profiles of other, similar users. Using methodology from the Knowledge Integration domain, four criteria are defined and analyzed to complete the aim of recommendation: Reliability is required for maintaining the correct structure of the profile, O1 and O2 Optimality postulates are required to calculate the best output profile by minimizing distances to other profiles, and Conflict Solution is used to better represent situations inherent to profile recommendation. Based on those criteria, four algorithms are proposed: O1 and O2 algorithms and modified O1 and O2 algorithms. These algorithms are further analyzed to check if they provide good recommendation.

Journal ArticleDOI
TL;DR: This work gives new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences, and gives new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits.

Journal ArticleDOI
Xiaodong He1, Li Deng1
07 Mar 2013
TL;DR: This paper presents an optimization-oriented statistical framework for the overall system design where the interactions between the subsystems in tandem are fully incorporated and where design consistency is established between the optimization objectives and the end-to-end system performance metrics.
Abstract: Automatic speech recognition (ASR) is a central and common component of voice-driven information processing systems in human language technology, including spoken language translation (SLT), spoken language understanding (SLU), voice search, spoken document retrieval, and so on. Interfacing ASR with its downstream text-based processing tasks of translation, understanding, and information retrieval (IR) creates both challenges and opportunities in optimal design of the combined, speech-enabled systems. We present an optimization-oriented statistical framework for the overall system design where the interactions between the subsystems in tandem are fully incorporated and where design consistency is established between the optimization objectives and the end-to-end system performance metrics. Techniques for optimizing such objectives in both the decoding and learning phases of the speech-centric information processing (SCIP) system design are described, in which the uncertainty in speech recognition subsystem's outputs is fully considered and marginalized. This paper provides an overview of the past and current work in this area. Future challenges and new opportunities are also discussed and analyzed.

Book ChapterDOI
24 Jun 2013
TL;DR: The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure that can be computed using statistical results from a web search engine; experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.
Abstract: In this work several semantic approaches to concept-based query expansion and re-ranking schemes are studied and compared with different ontology-based expansion methods in web document search and retrieval. In particular, we focus on concept-based query expansion schemes where, in order to effectively increase the precision of web document retrieval and to decrease the users’ browsing time, the main goal is to quickly provide users with the most suitable query expansion. Two key tasks for query expansion in web document retrieval are to find the expansion candidates, as the closest concepts in web document domain, and to rank the expanded queries properly. The approach we propose aims at improving the expansion phase for better web document retrieval and precision. The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure, i.e. a measure which can be computed using statistical results from a web search engine. Experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.
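The paper describes PMING as computable from search-engine statistics; its exact formula is in the cited work. As an illustration of this family of measures, the sketch below computes two standard web-based proximity ingredients, pointwise mutual information (PMI) and the Normalized Google Distance (NGD), from hypothetical page-hit counts:

```python
from math import log

# f_x = pages containing x, f_y = pages containing y,
# f_xy = pages containing both, N = total indexed pages.
# The counts below are invented for illustration.
def pmi(f_x, f_y, f_xy, N):
    # Pointwise mutual information: log of P(x, y) / (P(x) * P(y)).
    return log((f_xy / N) / ((f_x / N) * (f_y / N)))

def ngd(f_x, f_y, f_xy, N):
    # Normalized Google Distance (Cilibrasi & Vitanyi).
    num = max(log(f_x), log(f_y)) - log(f_xy)
    den = log(N) - min(log(f_x), log(f_y))
    return num / den

print(pmi(1000, 800, 400, 10**6))  # strongly associated terms: positive PMI
print(ngd(1000, 800, 400, 10**6))  # and a small NGD
```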

DOI
15 May 2013
TL;DR: The results show the effectiveness of the approach to model the implicit knowledge in medical records search, whereby the infAP retrieval performance is significantly improved up to 14.43% over an effective concept-based representation baseline.
Abstract: Medical records search is challenging because of the inherent implicit knowledge within medical records and queries. Such knowledge is known to medical practitioners but may be hidden from a search system. For example, when searching for the medical records of patients with a heart disease, medical practitioners commonly know that the medical records of patients taking the amiodarone medicine are relevant, since this drug is used to combat heart disease. In this paper, we argue that leveraging such implicit knowledge improves retrieval effectiveness, since it provides new evidence to infer the relevance of medical records towards a query. Specifically, using a novel concept-based representation for both medical records and queries, we expand the queries by inferring additional conceptual relationships from domain-specific resources as well as by extracting informative concepts from the top-ranked medical records. We evaluate the retrieval effectiveness of our proposed approach in the context of the TREC 2011 and 2012 Medical Records track. Our results show the effectiveness of our approach to model the implicit knowledge in medical records search, whereby the infAP retrieval performance is significantly improved by up to 14.43% over an effective concept-based representation baseline. Moreover, our proposed approach achieves retrieval effectiveness comparable to the performance of the best TREC 2011 and 2012 systems.
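One half of this expansion scheme, pulling informative terms from the top-ranked records, follows the familiar pseudo-relevance feedback pattern. A minimal sketch with toy data (the paper operates on medical concepts and domain-specific resources rather than the raw terms used here):

```python
from collections import Counter

# Minimal pseudo-relevance-feedback sketch: expand a query with the most
# frequent terms of the top-ranked documents. Toy documents, invented for
# illustration; a real system would filter stopwords and weight terms.
top_ranked = [
    "patient with heart disease prescribed amiodarone",
    "amiodarone therapy for arrhythmia in heart disease",
    "heart disease patient on amiodarone and beta blockers",
]
query = ["heart", "disease"]

def expand(query, top_ranked, n_terms=2):
    counts = Counter()
    for doc in top_ranked:
        counts.update(t for t in doc.split() if t not in query)
    return query + [t for t, _ in counts.most_common(n_terms)]

print(expand(query, top_ranked))
```

Here "amiodarone" is promoted into the query because it dominates the top-ranked records, mirroring the heart-disease example in the abstract.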

Journal ArticleDOI
TL;DR: In this scheme, the domain ontology is first constructed using GRAONTO, the group's graph-based approach to automated domain ontology construction; query semantic extension and retrieval are then adopted for semantic-based knowledge retrieval.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This work presents a novel cluster ranking approach that utilizes Markov Random Fields (MRFs), and shows that it significantly outperforms state-of-the-art cluster ranking methods and can be used to improve the performance of results-diversification methods.
Abstract: An important challenge in cluster-based document retrieval is ranking document clusters by their relevance to the query. We present a novel cluster ranking approach that utilizes Markov Random Fields (MRFs). MRFs enable the integration of various types of cluster-relevance evidence; e.g., the query-similarity values of the cluster's documents and query-independent measures of the cluster. We use our method to re-rank an initially retrieved document list by ranking clusters that are created from the documents most highly ranked in the list. The resultant retrieval effectiveness is substantially better than that of the initial list for several lists that are produced by effective retrieval methods. Furthermore, our cluster ranking approach significantly outperforms state-of-the-art cluster ranking methods. We also show that our method can be used to improve the performance of (state-of-the-art) results-diversification methods.

Journal ArticleDOI
TL;DR: This paper proposes a unified formulation under a common notational framework for memory-based collaborative filtering, and a technique to use any text retrieval weighting function with collaborative filtering preference data, and confirms the rationale of the framework.
Abstract: When speaking of information retrieval, we often mean text retrieval. But there exist many other forms of information retrieval applications. A typical example is collaborative filtering, which suggests interesting items to a user by taking into account other users' preferences or tastes. Due to the uniqueness of the problem, it has been modeled and studied differently in the past, mainly from the preference prediction and machine learning viewpoint. A few attempts have been made to bring collaborative filtering back to information (text) retrieval modeling, and new collaborative filtering techniques have subsequently been derived. In this paper, we show that from the algorithmic viewpoint, there is an even closer relationship between collaborative filtering and text retrieval. Specifically, major collaborative filtering algorithms, such as the memory-based ones, essentially calculate the dot product between the user vector (as the query vector in text retrieval) and the item rating vector (as the document vector in text retrieval). Thus, if we properly structure user preference data and employ the target user's ratings as query input, major text retrieval algorithms and systems can be directly used without any modification. In this regard, we propose a unified formulation under a common notational framework for memory-based collaborative filtering, and a technique to use any text retrieval weighting function with collaborative filtering preference data. Besides confirming the rationale of the framework, our preliminary experimental results have also demonstrated the effectiveness of the approach in using text retrieval models and systems to perform item ranking tasks in collaborative filtering.
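The dot-product correspondence described in this abstract is easy to make concrete. A minimal sketch with a toy rating matrix (illustrative only; the paper's framework supports arbitrary text retrieval weighting functions on top of this):

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 = unrated.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def rank_items(u, R):
    """Memory-based CF cast as retrieval: the target user's rating row is
    the query vector; user-user similarities are dot products with that
    query, and an item's score aggregates other users' ratings for it,
    weighted by their similarity to the target user."""
    sims = R @ R[u]              # dot product of every user row with the query
    sims[u] = 0.0                # exclude the target user
    scores = sims @ R            # similarity-weighted rating sums per item
    scores[R[u] > 0] = -np.inf   # only recommend items the user has not rated
    return np.argsort(-scores)

print(rank_items(0, R))          # ranked item indices for user 0
```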

Book
01 Jan 2013
TL;DR: The Query Complexity of Finding a Hidden Permutation, Bounds for Scheduling Jobs on Grid Processors, and a Survey of Algorithms and Models for List Update are presented.
Abstract: The Query Complexity of Finding a Hidden Permutation -- Bounds for Scheduling Jobs on Grid Processors -- Quake Heaps: A Simple Alternative to Fibonacci Heaps -- Variations on Instant Insanity -- A Simple Linear-Space Data Structure for Constant-Time Range -- Closing a Long-Standing Complexity Gap for Selection: V3(42) = 50 -- Frugal Streaming for Estimating Quantiles -- From Time to Space: Fast Algorithms That Yield Small and Fast Data -- Computing (and life) Is all About Tradeoffs: A Small Sample of Some Computational Tradeoffs -- A History of Distribution-Sensitive Data Structures -- A Survey on Priority Queues -- On Generalized Comparison-Based Sorting Problems -- A Survey of the Game “Lights Out!” -- Random Access to High-Order Entropy Compressed Text -- Succinct and Implicit Data Structures for Computational Geometry -- In Pursuit of the Dynamic Optimality Conjecture -- A Survey of Algorithms and Models for List Update -- Orthogonal Range Searching for Text Indexing -- A Survey of Data Structures in the Bitprobe Model -- Succinct Representations of Ordinal Trees -- Array Range Queries -- Indexes for Document Retrieval with Relevance.

Proceedings ArticleDOI
25 Aug 2013
TL;DR: A new system that assists people's reading activity by combining a wearable eye tracker, a see-through head mounted display, and an image based document retrieval engine is presented.
Abstract: We present a new system that assists people's reading activity by combining a wearable eye tracker, a see-through head-mounted display, and an image-based document retrieval engine. The image-based document retrieval engine is used for identification of the reading document, whereas the eye tracker is used to detect which part of the document the reader is currently reading. The reader can refer to the glossary of the most recently viewed keyword by looking at the see-through head-mounted display. This novel document reading assist application, which integrates a document retrieval system into an everyday reading scenario for the first time, enriches people's reading life. In this paper, we i) investigate the performance of a state-of-the-art image-based document retrieval method using a wearable camera, ii) propose a method for identification of the word to which the reader is attending, and iii) conduct pilot studies to evaluate the system in this reading context. The results show the potential of a document retrieval system in combination with a gaze-based user-oriented system.

Book ChapterDOI
17 Jun 2013
TL;DR: This paper shows how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document listing, and develops a new document listing technique for general collections that is of independent interest.
Abstract: Many document collections consist largely of repeated material, and several indexes have been designed to take advantage of this. There has been only preliminary work, however, on document retrieval for repetitive collections. In this paper we show how one of those indexes, the run-length compressed suffix array (RLCSA), can be extended to support document listing. In our experiments, our additional structures on top of the RLCSA can reduce the query time for document listing by an order of magnitude while still using total space that is only a fraction of the raw collection size. As a byproduct, we develop a new document listing technique for general collections that is of independent interest.

Journal ArticleDOI
TL;DR: This paper describes a Japanese spoken term detection method for spoken documents that robustly considers OOV words and mis-recognition, and proposes a distant n-gram indexing/retrieval method that incorporates a distance metric in a syllable lattice.

Journal ArticleDOI
TL;DR: An index for top-k most frequent document retrieval whose space is |CSA| bits and whose query time is O(log k log^(2+ε) n) per reported document, improving over previous results for this problem.

Book ChapterDOI
01 Jan 2013
TL;DR: This work surveys recent and relevant past work on array range queries, which are of current interest in the field of data structures and have connections to computational geometry, compressed and succinct data structures, and other areas of computer science.
Abstract: Array range queries are of current interest in the field of data structures. Given an array of numbers or arbitrary elements, the general array range query problem is to build a data structure that can efficiently answer queries of a given type stated in terms of an interval of the indices. The specific query type might be for the minimum element in the range, the most frequently occurring element, or any of many other possibilities. In addition to being interesting in themselves, array range queries have connections to computational geometry, compressed and succinct data structures, and other areas of computer science. We survey recent and relevant past work on this class of problems.
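As a concrete instance of this problem class, the classic sparse-table structure answers range-minimum queries in O(1) time after O(n log n) preprocessing (a standard textbook solution, included as an illustration rather than anything specific to this survey):

```python
from math import floor, log2

# Sparse table for range-minimum queries (RMQ):
# table[j][i] holds the minimum of a[i : i + 2**j].
def build_sparse_table(a):
    n = len(a)
    table = [a[:]]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        # Combine two overlapping half-width intervals.
        table.append([min(prev[i], prev[i + (1 << (j - 1))])
                      for i in range(n - (1 << j) + 1)])
        j += 1
    return table

def range_min(table, lo, hi):
    """Minimum of a[lo:hi] (half-open interval, hi > lo)."""
    j = floor(log2(hi - lo))
    # Two power-of-two intervals covering [lo, hi); overlap is harmless
    # because min is idempotent.
    return min(table[j][lo], table[j][hi - (1 << j)])

a = [9, 3, 7, 1, 8, 12, 10, 20]
t = build_sparse_table(a)
print(range_min(t, 0, 4))   # min of [9, 3, 7, 1]
print(range_min(t, 4, 8))   # min of [8, 12, 10, 20]
```

The same skeleton works for any idempotent operation (min, max, gcd), since the two covering intervals may overlap.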

Book ChapterDOI
09 Dec 2013
TL;DR: It is argued that users retain a number of snippets in an “active band” that shifts down the result page, and that reading and clicking activity tends to take place within the band in a manner that is not strictly sequential.
Abstract: Search engine result pages – the ten blue links – are a staple of document retrieval services. The usual presumption is that users read these one-by-one from the top, making judgments about the usefulness of documents based on the snippets presented, accessing the underlying document when a snippet seems attractive, and then moving on to the next snippet. In this paper we re-examine this assumption, and present the results of a user experiment in which gaze-tracking is combined with click analysis. We conclude that in very general terms, users do indeed read from the top, but that at a detailed level there are complex behaviors evident, suggesting that a more sophisticated model of user interaction might be appropriate. In particular, we argue that users retain a number of snippets in an “active band” that shifts down the result page, and that reading and clicking activity tends to take place within the band in a manner that is not strictly sequential.

Journal ArticleDOI
TL;DR: A model for user profile tuning in document retrieval systems is considered, and methods for tuning the user profile based on analysis of user preference dynamics are experimentally evaluated to check whether, as the history of user activity grows, the created user profile converges to the user's preferences.
Abstract: Modeling users' preferences and needs is one of the most important personalization tasks in the information retrieval domain. In this paper a model for user profile tuning in document retrieval systems is considered. Methods for tuning the user profile based on analysis of user preference dynamics are experimentally evaluated to check whether, as the history of user activity grows, the created user profile converges to the user's preferences. As the statistical analysis of a series of simulations has shown, the proposed methods of user profile actualization are effective in the sense that the distance between the user's preferences and his profile decreases over time.

Proceedings ArticleDOI
28 Jul 2013
TL;DR: This paper proposes to use query change as a new form of relevance feedback for better session search and shows that query change is a highly effective form of feedback as compared with existing relevance feedback methods.
Abstract: Session search is the Information Retrieval (IR) task that performs document retrieval for an entire session. During a session, users often change queries to explore and investigate the information needs. In this paper, we propose to use query change as a new form of relevance feedback for better session search. Evaluation conducted over TREC 2012 Session Track shows that query change is a highly effective form of feedback as compared with existing relevance feedback methods. The proposed method outperforms the state-of-the-art relevance feedback methods for the TREC 2012 Session Track by a significant improvement of >25%.

Journal ArticleDOI
TL;DR: A new concept-based retrieval model is introduced that can effectively discriminate between terms that are unimportant to sentence semantics and terms that represent the concepts capturing the sentence meaning.
Abstract: Most of the common techniques in text retrieval are based on statistical analysis of terms (words or phrases). Statistical analysis of term frequency captures the importance of a term within a document only. Thus, to achieve a more accurate analysis, the underlying model should indicate terms that capture the semantics of the text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document. In this paper, a new concept-based retrieval model is introduced. The proposed model consists of a conceptual ontological graph (COG) representation and a concept-based weighting scheme. The COG representation captures the semantic structure of each term within a sentence. All the terms are then placed in the COG representation according to their contribution to the meaning of the sentence. The concept-based weighting analyzes terms at both the sentence and document levels. This is different from the classical approach of analyzing terms at the document level only. The weighted terms are then ranked, and the top concepts are used to build a concept-based document index for text retrieval. The concept-based retrieval model can effectively discriminate between terms that are unimportant to sentence semantics and terms that represent the concepts capturing the sentence meaning. Experiments using the proposed concept-based retrieval model are conducted on different text retrieval data sets, comparing traditional approaches with the combined approach of the conceptual ontological graph and the concept-based weighting scheme. The evaluation of results is performed using three quality measures: the preference measure (bpref), precision at 10 documents retrieved (P(10)), and the mean uninterpolated average precision (MAP). All of these quality measures improve when the newly developed concept-based retrieval model is used, confirming that the model enhances the quality of text retrieval.

Proceedings Article
01 Oct 2013
TL;DR: A keyword extraction process, based on the PageRank algorithm, to reduce noise of input data for measuring semantic similarity and experimental results showed significantly improved document retrieval performance with this extraction process in place.
Abstract: This paper proposes a keyword extraction process, based on the PageRank algorithm, to reduce noise of input data for measuring semantic similarity. This paper will introduce several features related to implementation and discuss their effects. It will also discuss experimental results which showed significantly improved document retrieval performance with this extraction process in place.
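The abstract leaves the implementation features unspecified; the sketch below is a TextRank-style illustration of the general recipe (the co-occurrence window, damping factor, and absence of stopword filtering are all assumptions of this toy version, not details from the paper):

```python
from collections import defaultdict

# TextRank-style keyword extraction: words become graph nodes, words
# co-occurring within a sliding window are linked, PageRank scores the
# nodes, and the top-scoring words are kept as keywords. Real systems
# also filter stopwords and restrict candidates by part of speech.
def extract_keywords(tokens, window=3, damping=0.85, iters=50, top_n=3):
    neighbors = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                neighbors[tokens[i]].add(tokens[j])
                neighbors[tokens[j]].add(tokens[i])
    # Power iteration of PageRank on the undirected co-occurrence graph.
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        rank = {w: (1 - damping) + damping * sum(
                    rank[v] / len(neighbors[v]) for v in neighbors[w])
                for w in neighbors}
    return [w for w, _ in sorted(rank.items(), key=lambda kv: -kv[1])[:top_n]]

tokens = ("document retrieval systems rank document collections "
          "for retrieval queries").split()
print(extract_keywords(tokens))
```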

Proceedings ArticleDOI
16 Nov 2013
TL;DR: Experimental evaluation on real Google search snippets shows that this approach outperforms the traditional BOW method and gives good performance, and can be easily implemented with low cost.
Abstract: Short text classification is a difficult and challenging task in information retrieval systems since the text data is short, sparse and multidimensional. In this paper, we represent short text with Wikipedia concepts for classification. Short document text is mapped to Wikipedia concepts, and the concepts are then used to represent the document for text categorization. Traditional classification methods such as SVM can then be applied to the Wikipedia concept document representation. Experimental evaluation on real Google search snippets shows that our approach outperforms the traditional BOW method and gives good performance. Although it is not better than the state-of-the-art classifier (see e.g. Phan et al., WWW '08), our method can be easily implemented at low cost.