
Showing papers on "Human–computer information retrieval published in 2018"


Journal Article
TL;DR: This special issue on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL) was compiled after the first joint BIRNDL workshop that was held at the joint conference on digital libraries 2016 in Newark, New Jersey, USA.
Abstract: The large scale of scholarly publications poses a challenge for scholars in information seeking and sensemaking. Bibliometric, information retrieval (IR), text mining, and natural language processing techniques can help address this challenge but have yet to be widely used in digital libraries (DL). This special issue on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL) was compiled after the first joint BIRNDL workshop that was held at the joint conference on digital libraries (JCDL 2016) in Newark, New Jersey, USA. It brought together IR and DL researchers and professionals to elaborate on new approaches in natural language processing, information retrieval, scientometric, and recommendation techniques that can advance the state of the art in scholarly document understanding, analysis, and retrieval at scale. This special issue includes 14 papers: four extended papers originating from the first BIRNDL workshop 2016 and the BIR workshop at ECIR 2016, four extended system reports of the CL-SciSumm Shared Task 2016, and six original research papers submitted via the open call for papers.

35 citations


Journal Article
TL;DR: This work presents RETRIEVAL, a Web-based integrated information retrieval performance evaluation platform that offers a number of metrics that are popular within the scientific community so as to compose an efficient framework for implementing performance evaluation.
Abstract: Performance evaluation is one of the main research topics in information retrieval. Evaluation metrics are used to quantify various performance aspects of a retrieval method. These metrics help identify the optimal method for a specific retrieval challenge and also allow its parameters to be fine-tuned to achieve robust operation for a given requirements specification. In this work, we present RETRIEVAL, a Web-based integrated information retrieval performance evaluation platform. It offers a number of metrics that are popular within the scientific community, forming an efficient framework for performance evaluation. We discuss the functionality of RETRIEVAL by describing important aspects such as the data input approaches, the user-level parameterization of performance metrics, the evaluation scenarios, the interactive plots, and the performance reports repository, which offers both archiving and download functionality.

19 citations
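The RETRIEVAL entry above lists popular evaluation metrics without defining them. As a minimal sketch, and not RETRIEVAL's own code (the abstract does not describe its implementation), three standard metrics for a single ranked list can be computed as follows. The function names and toy document IDs are illustrative only.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Sum of precision@i at each rank i holding a relevant document, divided by
    the total number of relevant documents (unretrieved ones contribute 0)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


if __name__ == "__main__":
    ranked = ["d3", "d1", "d7", "d2", "d9"]  # toy system output, best first
    relevant = {"d1", "d2", "d5"}            # toy relevance judgments
    print(precision_at_k(ranked, relevant, 5))  # 0.4
    print(recall_at_k(ranked, relevant, 5))     # 0.6666666666666666
    print(average_precision(ranked, relevant))  # 0.3333333333333333
```

Averaging `average_precision` over a set of queries gives mean average precision (MAP), one of the metrics a platform like this would typically report.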


Journal Article
TL;DR: This commentary reminds readers of factors that call into question the appropriateness of default reliance on database searches particularly as systematic review is adapted for use in new and lower consensus fields.
Abstract: Despite recognition that database search alone is inadequate even within the health sciences, it appears that reviewers in fields that have adopted systematic review are choosing to rely primarily, or only, on database search for information retrieval. This commentary reminds readers of factors that call into question the appropriateness of default reliance on database searches particularly as systematic review is adapted for use in new and lower consensus fields. It then discusses alternative methods for information retrieval that require development, formalisation, and evaluation. Our goals are to encourage reviewers to reflect critically and transparently on their choice of information retrieval methods and to encourage investment in research on alternatives.

18 citations


Proceedings Article
01 Mar 2018
TL;DR: In this paper, a lab-based user study involving 25 participants interacting with a set of four different interactive multilingual search user interfaces was conducted to understand and support multilingual user abilities and preferences.
Abstract: While the number of polyglot Web users across the globe has increased dramatically, little human-centered research has been conducted to better understand and support multilingual user abilities and preferences. In particular, in the fields of cross-language and multilingual search, the majority of research has focused primarily on improving retrieval and translation accuracy, while paying comparatively less attention to multilingual user interaction aspects. By contrast, this paper specifically focuses on multilingual search user interface preferences and behaviors, through a lab-based user study involving 25 participants interacting with a set of four different interactive multilingual search user interfaces. User preference results confirm that multilingual search users generally have strong preferences towards interfaces that provide clear language separation, and that the traditional approach of interleaving results, as typically used in prior research, is least preferred. In addition, an analysis of user interaction behaviors shows that multilingual users make significant use of each of their languages, and that there are several interaction behavior differences depending on interface and task type.

9 citations


Journal Article
TL;DR: Two separate user studies using a total of 5 different collaborative search interfaces and 3 information access scenarios found that being able to easily identify different team members and their actions is important for users in Multi-Level Collaborative Information Retrieval (MLCIR).
Abstract: Although there has been a great deal of research into Collaborative Information Retrieval (CIR) and Collaborative Information Seeking (CIS), the majority has assumed that team members have the same level of unrestricted access to underlying information. However, observations from different domains (e.g., healthcare and business) have suggested that collaboration sometimes involves people with differing levels of access to underlying information. This type of scenario has been referred to as Multi-Level Collaborative Information Retrieval (MLCIR). To the best of our knowledge, no studies have been conducted to investigate the effect of awareness, an existing CIR/CIS concept, on MLCIR. To address this gap in current knowledge, we conducted two separate user studies using a total of 5 different collaborative search interfaces and 3 information access scenarios. A number of Information Retrieval (IR), CIS, and CIR evaluation metrics, as well as questionnaires, were used to compare the interfaces. Design interviews were also conducted after evaluations to obtain qualitative feedback from participants. Results suggested that query properties such as time spent on a query, query popularity, and query effectiveness could allow users to obtain information about the team's search performance and implicitly suggest better queries without disclosing sensitive data. In addition, having access to a history of intersecting viewed, relevant, and bookmarked documents could provide a positive effect similar to that of query properties. It was also found that being able to easily identify different team members and their actions is important for users in MLCIR. Based on our findings, we provide important design recommendations to help develop new CIR and MLCIR interfaces.

7 citations


Book Chapter
01 Jan 2018
TL;DR: This tutorial gives an overview of information retrieval models which are based on query expansion along with practical details and description on methods of implementation.
Abstract: The most successful information retrieval techniques include those that can expand the original query with additional terms that best represent the actual user need. This tutorial gives an overview of information retrieval models based on query expansion, along with practical details and a description of implementation methods. Toy examples with data are provided to help the reader grasp the main idea behind query expansion (QE) techniques such as Kullback-Leibler Divergence (KLD) and candidate expansion terms based on WordNet. The tutorial also uses spectral analysis, one of the recent information retrieval techniques that considers term proximity.

7 citations
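The query expansion tutorial above mentions KLD-based term selection. A minimal sketch of one common formulation (Carpineto-style pseudo-relevance feedback) follows; the tutorial's exact variant may differ. It assumes the top-ranked documents for the original query serve as the feedback set and are drawn from the collection, and scores each candidate term t as p_R(t) * log(p_R(t) / p_C(t)), where p_R and p_C are its relative frequencies in the feedback set and the whole collection. The function name and toy corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List


def kld_expansion_terms(
    feedback_docs: List[List[str]],
    collection: List[List[str]],
    query: List[str],
    n_terms: int = 3,
) -> List[str]:
    """Rank candidate expansion terms by p_R(t) * log(p_R(t) / p_C(t))."""
    fb_counts = Counter(t for doc in feedback_docs for t in doc)
    coll_counts = Counter(t for doc in collection for t in doc)
    fb_total = sum(fb_counts.values())
    coll_total = sum(coll_counts.values())
    query_terms = set(query)

    scores = {}
    for term, count in fb_counts.items():
        if term in query_terms or coll_counts[term] == 0:
            continue  # skip original query terms and terms unseen in the collection
        p_r = count / fb_total
        p_c = coll_counts[term] / coll_total
        scores[term] = p_r * math.log(p_r / p_c)

    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]


if __name__ == "__main__":
    collection = [
        "apple fruit juice fresh".split(),
        "apple fruit pie".split(),
        "computer apple laptop".split(),
        "computer keyboard laptop".split(),
        "fresh juice orange".split(),
    ]
    feedback = collection[:2]  # assume these two were ranked top for "apple"
    print(kld_expansion_terms(feedback, collection, ["apple"], n_terms=2))
    # ['fruit', 'pie']
```

Terms that are frequent in the feedback documents but rare in the collection score highest, which is why "fruit" and "pie" are preferred over "juice" and "fresh", which also occur outside the feedback set.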


Journal Article
01 Jun 2018
TL;DR: This work proposes a keyword-based linked data information retrieval framework that can incorporate temporal features and give more concise results and the evaluation of the system performance indicates that it is promising.
Abstract: Temporal features, such as an explicit date and time or a time-specific event, carry concise semantics for any kind of information retrieval. Therefore, temporal features should be suitable for linked data information retrieval. However, we have found that most linked data information retrieval techniques pay little attention to the power of including temporal features. We propose a keyword-based linked data information retrieval framework that can incorporate temporal features and give more concise results. The evaluation of our system's performance indicates that it is promising.

6 citations


Posted Content
TL;DR: A model of IR is presented demonstrating why some types of data concerning searcher and system behavior are important and are at least necessary, if not necessarily sufficient, for meaningful evaluation of personalization of IR.
Abstract: Two key, but usually ignored, issues for the evaluation of methods of personalization for information retrieval are: that such evaluation must be of a search session as a whole; and, that people, during the course of an information search session, engage in a variety of activities, intended to accomplish different goals or intentions. Taking serious account of these factors has major implications for not only evaluation methods and metrics, but also for the nature of the data that is necessary both for understanding and modeling information search, and for evaluation of personalized support for information retrieval (IR). In this position paper, we: present a model of IR demonstrating why these factors are important; identify some implications of accepting their validity; and, on the basis of a series of studies in interactive IR, identify some types of data concerning searcher and system behavior that we claim are, at least, necessary, if not necessarily sufficient, for meaningful evaluation of personalization of IR.

4 citations


Book Chapter
01 Jan 2018
TL;DR: This study proposes an Arabic semantic-based search approach that is based on the Vector Space Model (VSM), which uses the Universal WordNet (UWN) ontology to build a rich index of concepts, Concept-Space (CS), which replaces the traditional index of terms, Term-Space (TS), and enhances the Semantic VSM capability.
Abstract: One of the main reasons for adopting Semantic Web technology in search systems is to enhance the performance of the retrieval process. A semantic-based search finds content that is semantically associated with the concepts of the query, rather than content that exactly matches the query's keywords. There is growing worldwide interest in searching Arabic content due to its importance for culture, religion, and economics. However, the Arabic language, across all of its linguistic levels, is morphologically and syntactically rich, which makes effective search of its content challenging. In this study, we propose an Arabic semantic-based search approach based on the Vector Space Model (VSM). VSM has proved successful, and many studies have focused on refining its classic version. Our proposed approach uses the Universal WordNet (UWN) ontology to build a rich index of concepts, the Concept-Space (CS), which replaces the traditional index of terms, the Term-Space (TS), and enhances the capability of the semantic VSM. Accordingly, we propose a new incidence indicator to calculate the Significance Level of a Concept (SLC) in a document. The new indicator is used to evaluate the performance of the retrieval process semantically, instead of the traditional syntactic retrieval based on the traditional incidence indicator, Term Frequency (TF). This new indicator motivated us to develop a new formula to calculate the Semantic Weight of the Concept (SWC), which is necessary for determining the Semantic Distance (SD) between two vectors. As a proof of concept, a prototype is applied to a full dump of the Arabic Wikipedia. Since documents are indexed by their concepts and hence classified semantically, we were able to search Arabic documents efficiently. The experimental results for Precision, Recall, and F-measure showed a noticeable improvement in performance.

3 citations
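The Arabic semantic search entry above contrasts a Term-Space index with a Concept-Space index. The sketch below illustrates only that contrast under VSM cosine similarity; it is not the authors' method. The abstract does not give the SLC or SWC formulas, so plain concept frequencies are used instead, and the hypothetical `CONCEPT_OF` dictionary stands in for a UWN lookup. English tokens are used for readability.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical term -> concept mapping standing in for a UWN lookup.
# A real system would map (Arabic) surface forms to UWN concept IDs.
CONCEPT_OF: Dict[str, str] = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "road": "road",
    "highway": "road",
}


def term_vector(tokens: List[str]) -> Counter:
    """Term-Space (TS) representation: raw term frequencies."""
    return Counter(tokens)


def concept_vector(tokens: List[str]) -> Counter:
    """Concept-Space (CS) representation: each term is replaced by its concept;
    terms with no known concept are kept as-is."""
    return Counter(CONCEPT_OF.get(t, t) for t in tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = ["automobile"]
    doc = ["car", "highway"]
    print(cosine(term_vector(query), term_vector(doc)))        # 0.0
    print(cosine(concept_vector(query), concept_vector(doc)))  # about 0.707
```

In term space the query and document share no terms and score zero. In concept space "automobile" and "car" collapse to the same concept, so the document becomes retrievable. This is the effect the Concept-Space index is meant to achieve.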


Book
01 Jan 2018
TL;DR: Density is improved by combining a few documents in one line of the matrix to reduce the filter size and to address the problem of document removal in Matrix Bloom filters.
Abstract: Data leak prevention systems have become a must-have component of enterprise information security. To minimize communication delay, these systems require fast mechanisms for comparing massive numbers of documents. Bloom filters have proven to be a fast tool for membership checks. Taking into account the specific needs of fast text comparison, this chapter proposes modifications to Matrix Bloom filters. The proposed approach improves the density of Matrix Bloom filters by using a special index to track documents uploaded into the system. Density is improved by combining a few documents in one line of the matrix, which reduces the filter size and addresses the problem of document removal. Special attention is paid to the negative impact of filter-to-filter comparison in matrix filters. A theoretical evaluation of the threshold for false positive results is provided. The experiment presented in the chapter outlines the advantages and applicability of the proposed approach.

2 citations
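The data leak prevention entry above builds on Bloom filters. The sketch below shows a plain Bloom filter over word shingles together with the standard false-positive approximation (1 - e^(-kn/m))^k; it does not reproduce the chapter's density or document-removal modifications. In a Matrix Bloom filter, filters like this one are presumably stacked as matrix rows, one per stored document. The shingle size, the SHA-256 double-hashing scheme, and all names are assumptions chosen for illustration.

```python
import hashlib
import math
from typing import Iterator, Set


class BloomFilter:
    """Standard Bloom filter with m bits and k hash functions."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k after n insertions."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


def shingles(text: str, w: int = 3) -> Set[str]:
    """Overlapping w-word shingles of a lowercased, whitespace-tokenized text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}


if __name__ == "__main__":
    protected = BloomFilter(m_bits=1024, k_hashes=7)
    for s in shingles("the quarterly revenue report is confidential and internal"):
        protected.add(s)

    outgoing = shingles("please note the quarterly revenue report is attached")
    hits = sum(1 for s in outgoing if s in protected)
    print(f"{hits}/{len(outgoing)} outgoing shingles match the protected document")
    print(false_positive_rate(1024, 7, 6))
```

Three of the outgoing shingles genuinely overlap the protected text, so at least three matches are reported. Any extra match would be a false positive, whose probability the formula estimates.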


Proceedings Article
30 Mar 2018
TL;DR: Effective enterprise search remains a challenge for researchers and commercial companies, but it is recognized that a solution would deliver enormous benefits.
Abstract: Efficient retrieval of relevant information is a critical success factor for many enterprises. Despite all the advances in web search technology, enterprise search still faces many challenges and problems. The boundaries of enterprise search are broad and user expectations are quite high; in addition to these challenges, one of the major problems is the difference between the nature of web search and enterprise search. Many solutions have been proposed and techniques devised to improve enterprise search, but effective enterprise search is still a challenge for researchers and commercial companies, although it is recognized that a solution would deliver enormous benefits.

Book Chapter
TL;DR: This article will attempt to show some of the aspects of human intelligence, as related to information retrieval, by the device of a semi-imaginary Oracle.
Abstract: What is "intelligent" information retrieval? Essentially, this asks what intelligence is. In this article I will attempt to show some of the aspects of human intelligence as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle: someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service.

Keywords: Personal Digital Assistant; Supervised Topic Models

Journal Article
TL;DR: This work has developed a novel learning framework for retrieving precise information blocks from Web pages given a query, which may contain some search terms and prior information such as the layout format of the data.
Abstract: In contrast to traditional Web information retrieval methods, which can only return a ranked list of Web pages and only allow search terms in the query, we have developed a novel learning framework for retrieving precise information blocks from Web pages given a query, which may contain search terms and prior information such as the layout format of the data. This problem has two challenging sub-tasks. The first is information block detection, in which a Web page is automatically segmented into blocks. The second is finding the information blocks relevant to the query. Existing page segmentation methods, which use only visual layout information or only content information, do not consider the query information, leading to solutions that conflict with the information need expressed by the query. Our framework models the query and the block features to capture both keyword information and prior information via a probabilistic graphical model. A Fisher kernel, which can effectively incorporate the graphical model, is then employed to accomplish the two sub-tasks in a unified manner, optimizing the final goal of block retrieval performance. We have conducted experiments on benchmark datasets and real-world data, and compared our framework with existing methods to evaluate its effectiveness.