Showing papers on "Ranking (information retrieval) published in 2021"
••
11 Jul 2021TL;DR: Pyserini as mentioned in this paper is a Python toolkit for reproducible information retrieval research with sparse and dense representations, which aims to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture.
Abstract: Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. Around this toolkit, our group has built a culture of reproducibility through shared norms and tools that enable rigorous automated testing.
165 citations
••
11 Jul 2021TL;DR: Zhang et al. as mentioned in this paper investigated different training strategies for dense retrieval models and tried to explain why hard negative sampling performs better than random sampling, and proposed two training strategies named a stable training algorithm for dense retrieval (STAR) and a query-side training Algorithm for Directly Optimizing Ranking pErformance (ADORE), respectively.
Abstract: Ranking has always been one of the top concerns in information retrieval researches. For decades, the lexical matching signal has dominated the ad-hoc retrieval process, but solely using this signal in retrieval may cause the vocabulary mismatch problem. In recent years, with the development of representation learning techniques, many researchers turn to Dense Retrieval (DR) models for better ranking performance. Although several existing DR models have already obtained promising results, their performance improvement heavily relies on the sampling of training examples. Many effective sampling strategies are not efficient enough for practical usage, and for most of them, there still lacks theoretical analysis in how and why performance improvement happens. To shed light on these research questions, we theoretically investigate different training strategies for DR models and try to explain why hard negative sampling performs better than random sampling. Through the analysis, we also find that there are many potential risks in static hard negative sampling, which is employed by many existing training methods. Therefore, we propose two training strategies named a Stable Training Algorithm for dense Retrieval (STAR) and a query-side training Algorithm for Directly Optimizing Ranking pErformance (ADORE), respectively. STAR improves the stability of DR training process by introducing random negatives. ADORE replaces the widely-adopted static hard negative sampling method with a dynamic one to directly optimize the ranking performance. Experimental results on two publicly available retrieval benchmark datasets show that either strategy gains significant improvements over existing competitive baselines and a combination of them leads to the best performance.
136 citations
••
11 Jul 2021TL;DR: In this article, a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights is proposed, leading to highly sparse representations.
Abstract: In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries, that could inherit from the desirable properties of bag-of-words models such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple, trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency, by controlling the contribution of the sparsity regularization.
108 citations
••
TL;DR: The current landscape of semantic retrieval models from three major paradigms, paying special attention to recent neural-based methods, is described in this article, where the authors review the benchmark datasets, optimization methods and evaluation metrics, and summarize the state-of-theart models.
Abstract: Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval is to return a subset of candidate documents, and the latter stages attempt to re-rank those candidates. Unlike the re-ranking stages going through quick technique shifts during the past decades, the first-stage retrieval has long been dominated by classical term-based models. Unfortunately, these models suffer from the vocabulary mismatch problem, which may block the re-ranking stages from relevant documents at the very beginning. Therefore, it has been a long-term desire to build semantic models for the first-stage retrieval that can achieve high recall efficiently. Recently, we have witnessed an explosive growth of research interests on the first-stage semantic retrieval models. We believe it is the right time to survey the current status, learn from existing methods, and gain some insights for future development. In this paper, we describe the current landscape of semantic retrieval models from three major paradigms, paying special attention to recent neural-based methods. We review the benchmark datasets, optimization methods and evaluation metrics, and summarize the state-of-the-art models. We also discuss the unresolved challenges and suggest potentially promising directions for future work.
52 citations
••
19 Apr 2021TL;DR: In this paper, a neural architecture called GraFRank is proposed to learn expressive user representations from multiple feature modalities and user-user interactions, which can be used for friend recommendation on large-scale social platforms.
Abstract: Graph Neural Networks (GNNs) have recently enabled substantial advances in graph learning. Despite their rich representational capacity, GNNs remain under-explored for large-scale social modeling applications. One such industrially ubiquitous application is friend suggestion: recommending users other candidate users to befriend, to improve user connectivity, retention and engagement. However, modeling such user-user interactions on large-scale social platforms poses unique challenges: such graphs often have heavy-tailed degree distributions, where a significant fraction of users are inactive and have limited structural and engagement information. Moreover, users interact with different functionalities, communicate with diverse groups, and have multifaceted interaction patterns. We study the application of GNNs for friend suggestion, providing the first investigation of GNN design for this task, to our knowledge. To leverage the rich knowledge of in-platform actions, we formulate friend suggestion as multi-faceted friend ranking with multi-modal user features and link communication features. We design a neural architecture GraFRank to learn expressive user representations from multiple feature modalities and user-user interactions. Specifically, GraFRank employs modality-specific neighbor aggregators and cross-modality attentions to learn multi-faceted user representations. We conduct experiments on two multi-million user datasets from Snapchat, a leading mobile social platform, where GraFRank outperforms several state-of-the-art approaches on candidate retrieval (by 30% MRR) and ranking (by 20% MRR) tasks. Moreover, our qualitative analysis indicates notable gains for critical populations of less-active and low-degree users.
50 citations
••
TL;DR: This paper proposes several new distance and similarity measures for the SVNS model, and it is proven that the proposed similarity measures produced the most consistent ranking results compared to other existing similarity measures.
Abstract: The single-valued neutrosophic set (SVNS) is a well-known model for handling uncertain and indeterminate information. Information measures such as distance measures, similarity measures and entropy measures are very useful tools to be used in many applications such as multi-criteria decision making (MCDM), medical diagnosis, pattern recognition and clustering problems. A lot of such information measures have been proposed for the SVNS model. However, many of these measures have inherent problems that prevent them from producing reasonable or consistent results to the decision makers. In this paper, we propose several new distance and similarity measures for the SVNS model. The proposed measures have been verified and proven to comply with the axiomatic definition of the distance and similarity measure for the SVNS model. A detailed and comprehensive comparative analysis between the proposed similarity measures and other well-known existing similarity measures has been done. Based on the comparison results, it is clearly proven that the proposed similarity measures are able to overcome the shortcomings that are inherent in existing similarity measures. Finally, an extensive set of numerical examples, related to pattern recognition and medical diagnosis, is given to demonstrate the practical applicability of the proposed similarity measures. In all numerical examples, it is proven that the proposed similarity measures are able to produce accurate and reasonable results. To further verify the superiority of the suggested similarity measures, the Spearman’s rank correlation coefficient test is performed on the ranking results that were obtained from the numerical examples, and it was again proven that the proposed similarity measures produced the most consistent ranking results compared to other existing similarity measures.
45 citations
•
TL;DR: This work proposes Variance of Gradients (VOG) as a proxy metric for detecting outliers in the data distribution and provides quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing.
Abstract: In machine learning, a question of great interest is understanding what examples are challenging for a model to classify. Identifying atypical examples helps inform safe deployment of models, isolates examples that require further human inspection, and provides interpretability into model behavior. In this work, we propose the Variance of Gradients (VOG) as a valuable and efficient proxy metric for detecting outliers in the data distribution. We provide quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. Data points with high VOG scores are more difficult for the model to learn and over-index on examples that require memorization.
44 citations
••
11 Jul 2021TL;DR: In this article, the authors propose the Term Independent Likelihood moDEl (TILDE) model, which ranks documents by both query and document likelihood at query time, which does not require the inference step of deep language models based retrieval approaches.
Abstract: Deep language models (deep LMs) are increasingly being used for full text retrieval or within cascade retrieval pipelines as later-stage re-rankers. A problem with using deep LMs is that, at query time, a slow inference step needs to be performed -- this hinders the practical adoption of these powerful retrieval models, or limits sensibly how many documents can be considered for re-ranking. We propose the novel, BERT-based, Term Independent Likelihood moDEl (TILDE), which ranks documents by both query and document likelihood. At query time, our model does not require the inference step of deep language models based retrieval approaches, thus providing consistent time-savings, as the prediction of query terms' likelihood can be pre-computed and stored during index creation. This is achieved by relaxing the term dependence assumption made by the deep LMs. In addition, we have devised a novel bi-directional training loss which allows TILDE to maximise both query and document likelihood at the same time during training. At query time, TILDE can rely on its query likelihood component (TILDE-QL) solely, or the combination of TILDE-QL and its document likelihood component (TILDE-DL), thus providing a flexible trade-off between efficiency and effectiveness. Exploiting both components provide the highest effectiveness at a higher computational cost while relying only on TILDE-QL trades off effectiveness for faster response time due to no inference being required. TILDE is evaluated on the MS MARCO and TREC Deep Learning 2019 and 2020 passage ranking datasets. Empirical results show that, compared to other approaches that aim to make deep language models viable operationally, TILDE achieves competitive effectiveness coupled with low query latency.
43 citations
••
11 Jul 2021TL;DR: In this article, the authors proposed to use pseudo-relevance feedback to enhance the performance of dense retrieval by extracting representative feedback embeddings from the pseudo-relevant set of documents identified using a first pass dense retrieval.
Abstract: Pseudo-relevance feedback mechanisms, from Rocchio to the relevance models, have shown the usefulness of expanding and reweighting the users' initial queries using information occurring in an initial set of retrieved documents, known as the pseudo-relevant set. Recently, dense retrieval -- through the use of neural contextual language models such as BERT for analysing the documents' and queries' contents and computing their relevance scores -- has shown a promising performance on several information retrieval tasks still relying on the traditional inverted index for identifying documents relevant to a query. Two different dense retrieval families have emerged: the use of single embedded representations for each passage and query (e.g. using BERT's [CLS] token), or via multiple representations (e.g. using an embedding for each token of the query and document). In this work, we conduct the first study into the potential for multiple representation dense retrieval to be enhanced using pseudo-relevance feedback. In particular, based on the pseudo-relevant set of documents identified using a first-pass dense retrieval, we extract representative feedback embeddings (using KMeans clustering) -- while ensuring that these embeddings discriminate among passages (based on IDF) -- which are then added to the query representation. These additional feedback embeddings are shown to both enhance the effectiveness of a reranking as well as an additional dense retrieval operation. Indeed, experiments on the MSMARCO passage ranking dataset show that MAP can be improved by upto 26% on the TREC 2019 query set and 10% on the TREC 2020 query set by the application of our proposed ColBERT-PRF method on a ColBERT dense retrieval approach.
41 citations
••
26 Oct 2021TL;DR: Joint optimization of query encoding and product quantization (JPQ) as discussed by the authors trains the query encoder and PQ index jointly in an end-to-end manner based on three optimization strategies, namely ranking-oriented loss, PQ centroid optimization, and end to end negative sampling.
Abstract: Recently, Information Retrieval community has witnessed fast-paced advances in Dense Retrieval (DR), which performs first-stage retrieval with embedding-based search. Despite the impressive ranking performance, previous studies usually adopt brute-force search to acquire candidates, which is prohibitive in practical Web search scenarios due to its tremendous memory usage and time cost. To overcome these problems, vector compression methods have been adopted in many practical embedding-based retrieval applications. One of the most popular methods is Product Quantization (PQ). However, although existing vector compression methods including PQ can help improve the efficiency of DR, they incur severely decayed retrieval performance due to the separation between encoding and compression. To tackle this problem, we present JPQ, which stands for Joint optimization of query encoding and Product Quantization. It trains the query encoder and PQ index jointly in an end-to-end manner based on three optimization strategies, namely ranking-oriented loss, PQ centroid optimization, and end-to-end negative sampling. We evaluate JPQ on two publicly available retrieval benchmarks. Experimental results show that JPQ significantly outperforms popular vector compression methods. Compared with previous DR models that use brute-force search, JPQ almost matches the best retrieval performance with 30x compression on index size. The compressed index further brings 10x speedup on CPU and 2x speedup on GPU in query latency.
41 citations
••
TL;DR: A survey of text ranking with neural network architectures known as transformers can be found in this paper, where the authors provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who desire to pursue work in this area.
Abstract: The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
••
01 Jun 2021TL;DR: SPARTA achieves new state-of-the-art results across a variety of open-domain question answering tasks in both English and Chinese datasets, including open SQuAD, CMRC and etc.
Abstract: We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering. Unlike many neural ranking methods that use dense vector nearest neighbor search, SPARTA learns a sparse representation that can be efficiently implemented as an Inverted Index. The resulting representation enables scalable neural retrieval that does not require expensive approximate vector search and leads to better performance than its dense counterpart. We validated our approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks. SPARTA achieves new state-of-the-art results across a variety of open-domain question answering tasks in both English and Chinese datasets, including open SQuAD, CMRC and etc. Analysis also confirms that the proposed method creates human interpretable representation and allows flexible control over the trade-off between performance and efficiency.
••
01 Feb 2021TL;DR: It is found that the information-based score function can overcome the drawbacks of the existing ranking methods and can rank the IVIFSs well.
Abstract: The score functions are often used to rank the interval-valued intuitionistic fuzzy sets (IVIFSs) in multiattribute decision making (MADM). The purpose of this paper is to develop an information-based score function of the IVIFS and apply it to MADM. Considering the information amount, the reliability, the certainty information, and the relative closeness degree, we propose an information-based score function of the IVIFS. Comparing the information-based score function with existing ranking methods, we find that the information-based score function can overcome the drawbacks of the existing ranking methods and can rank the IVIFSs well. Three illustrative examples of MADM with linear programming are examined to demonstrate the applicability and feasibility of the information-based score function. It is shown that the information-based score function is well defined and can be applied to MADM.
••
01 Aug 2021
TL;DR: The authors distill knowledge from a bi-encoder teacher to a student by distilling knowledge from ColBERT's expressive MaxSim operator into a simple dot product, enabling richer interactions between teacher and student models.
Abstract: We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model. Specifically, we propose to transfer the knowledge from a bi-encoder teacher to a student by distilling knowledge from ColBERT’s expressive MaxSim operator into a simple dot product. The advantage of the bi-encoder teacher–student setup is that we can efficiently add in-batch negatives during knowledge distillation, enabling richer interactions between teacher and student models. In addition, using ColBERT as the teacher reduces training cost compared to a full cross-encoder. Experiments on the MS MARCO passage and document ranking tasks and data from the TREC 2019 Deep Learning Track demonstrate that our approach helps models learn robust representations for dense retrieval effectively and efficiently.
••
TL;DR: In this paper, the authors proposed to use pseudo-relevance feedback to enhance the performance of dense retrieval by extracting representative feedback embeddings from the pseudo-relevant set of documents identified using a first pass dense retrieval.
Abstract: Pseudo-relevance feedback mechanisms, from Rocchio to the relevance models, have shown the usefulness of expanding and reweighting the users' initial queries using information occurring in an initial set of retrieved documents, known as the pseudo-relevant set. Recently, dense retrieval -- through the use of neural contextual language models such as BERT for analysing the documents' and queries' contents and computing their relevance scores -- has shown a promising performance on several information retrieval tasks still relying on the traditional inverted index for identifying documents relevant to a query. Two different dense retrieval families have emerged: the use of single embedded representations for each passage and query (e.g. using BERT's [CLS] token), or via multiple representations (e.g. using an embedding for each token of the query and document). In this work, we conduct the first study into the potential for multiple representation dense retrieval to be enhanced using pseudo-relevance feedback. In particular, based on the pseudo-relevant set of documents identified using a first-pass dense retrieval, we extract representative feedback embeddings (using KMeans clustering) -- while ensuring that these embeddings discriminate among passages (based on IDF) -- which are then added to the query representation. These additional feedback embeddings are shown to both enhance the effectiveness of a reranking as well as an additional dense retrieval operation. Indeed, experiments on the MSMARCO passage ranking dataset show that MAP can be improved by upto 26% on the TREC 2019 query set and 10% on the TREC 2020 query set by the application of our proposed ColBERT-PRF method on a ColBERT dense retrieval approach.
••
TL;DR: In this paper, a homogeneous Pythagorean fuzzy framework was proposed for distributing the COVID-19 vaccine dose by integrating a new formulation of the fuzzy-weighted zero-inconsistency (PFWZIC) and PFDOSM methods.
••
TL;DR: This research proposed an in-text citation sentiment analysis-based approach for binary classification which effectively enhanced the results of the state-of-the-art on the benchmark dataset.
••
11 Jul 2021
TL;DR: In this paper, the authors use the MS MARCO and TREC Deep Learning Track as their case study, comparing it to the case of TREC ad hoc ranking in the 1990s.
Abstract: Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboard such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field. However, the goal is not simply to identify which run is "best", achieving the top score. The goal is to move the field forward by developing new robust techniques, that work in many different settings, and are adopted in research and practice. This paper uses the MS MARCO and TREC Deep Learning Track as our case study, comparing it to the case of TREC ad hoc ranking in the 1990s. We show how the design of the evaluation effort can encourage or discourage certain outcomes, and raising questions about internal and external validity of results. We provide some analysis of certain pitfalls, and a statement of best practices for avoiding such pitfalls. We summarize the progress of the effort so far, and describe our desired end state of "robust usefulness", along with steps that might be required to get us there.
••
01 Dec 2021TL;DR: In this article, the authors proposed a new multi-attribute decision-making (MADM) method, named as R-method, for ranking of Pareto-optimal solutions and selecting the best solution in multi-and many-objective optimization problems.
Abstract: This paper presents a new multi-attribute decision-making (MADM) method, named as R-method, for ranking of Pareto-optimal solutions and selecting the best solution in multi- and many-objective optimization problems. The compromise among the optimization objectives is different for each Pareto-optimal solution and, hence, the solution that has the best compromise among the objectives can be considered as the best solution. The proposed R-method is used to identify such best compromise solution. The method ranks the objectives based on their importance for the given optimization problem and ranks the alternative solutions (i.e. Pareto-optimal solutions) based on their data corresponding to the objectives. The ranks assigned to the objectives and the ranks assigned to the alternative solutions with respect to each of the objectives are converted to appropriate weights and the final composite scores of the alternative solutions are computed using these weights. The final ranking of alternative solutions is done based on the composite scores. The steps of the proposed method are described along with a pseudocode. Three examples are considered to demonstrate and validate the proposed method. The first example contains 4-objectives and 50 alternative solutions, the second example contains 6-objectives and 30 alternative solutions, and the third example contains 3-objectives and 25 alternative solutions. The results of the proposed method are compared with those of the other widely used MADM methods for the three examples considered. Also, the proposed method is compared with four well-known ranking methods to demonstrate its rationality in assigning weights to the ranks of the objectives and the alternative solutions. The proposed method is comparatively easier, more logical, and can be used for choosing the best compromise solution in multi- and many-objective optimization problems.
••
TL;DR: In this paper, a comprehensive approach for multi-criteria decision analysis (MCDA) based on alternative methods capable of assessing different aspects of supplier selection uncertainty in terms of utility functions and criteria related to efficiency is presented.
Abstract: The focus of this paper is on selecting suppliers in the Oil and Gas (O&G) industry by developing a comprehensive approach for Multi-Criteria Decision Analysis (MCDA) based on alternative methods capable of assessing different aspects of supplier selection uncertainty in terms of utility functions and criteria related to efficiency. The O&G industry has a key role in the public sector of various countries such as Iran with its revenues being of prime importance to develop infrastructure facilities such as for healthcare, education, and transportation. This comprehensive approach walks through various stages for selecting Critical Success Factors (CSFs), ranking suppliers, and for setting partial weighting alternatives. While CSFs are selected using a traditional Delphi approach, the partial supplier rankings are defined based on Complex Proportional Assessment (COPRAS) utility functions together with criteria weights derived from Step-wise Weight Assessment Ratio Analysis (SWARA) for each CSF. As it concerns information reliability of utility and efficiency functions of both methods obtained via expert preferences or perceptions, Z-numbers are used to address the intrinsic fuzziness level inherent to each analytical stage. Iran's economy depends on revenues from oil and other related production, which means that by earning more income from this industry, most of its economic indicators such as GDP and employment rate should increase significantly, thus leading to economic growth. Various countries put plans in place related to production for increasing their social economics. One of these plans is focused on suppliers since they have a high impact on providing essential items such as equipment, HR, and transportation, so by choosing the best suppliers in all fields, costs will decrease and consequently revenue will increase. This research points out how to rank O&G industry suppliers using MCDA methods in an uncertain environment. 
An example based on actual data from an Iranian O&G company is provided to show the applicability of the approach proposed. Results suggest that the complexity of O&G operations on selecting suppliers can be adequately handled by information reliability techniques applied to traditional economic concepts such as utility- and efficiency-related factors, particularly in business environments characterized by a trade embargo.
••
TL;DR: An integrated approach to solve the decision-making problem under the probabilistic hesitant fuzzy information features, which is an extension of the hesitant fuzzy set, and an algorithm for finding some missing values in the preference information is presented.
Abstract: The paper presents an integrated approach to solving decision-making problems under probabilistic hesitant fuzzy information (PHFI), an extension of the hesitant fuzzy set. PHFI not only allows multiple opinions but also associates an occurrence probability with each opinion, which increases the reliability of the information. Motivated by these features, an approach is presented for solving decision problems with only partially known attribute and expert weights. In addition, an algorithm for finding missing values in the preference information is presented and its properties are stated. Afterward, the Hamy mean operator is used to aggregate the collective information into a single value, and a COPRAS method is presented for ranking the alternatives under PHFI. The approach is demonstrated through a case study of cloud vendor selection, and its validity is shown by comparing its results with those of several existing algorithms.
••
08 Mar 2021TL;DR: This article, based on a forthcoming book, provides an overview of text ranking with neural network architectures known as transformers and synthesizes existing work as a single point of entry for both researchers and practitioners.
Abstract: The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This tutorial, based on a forthcoming book, provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. We provide a synthesis of existing work as a single point of entry for both researchers and practitioners. Our coverage is grouped into two categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that perform ranking directly. Two themes pervade our treatment: techniques for handling long documents and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of their application are well understood. Nevertheless, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, we also attempt to prognosticate the future.
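The multi-stage architecture covered by the tutorial, a cheap first-stage retriever followed by an expensive reranker, can be sketched end-to-end in a few lines. Both scoring functions below are stand-ins (set overlap instead of BM25, substring matching instead of a BERT cross-encoder):

```python
# Toy multi-stage ranking pipeline: a first stage narrows the corpus to k
# candidates, then a (pretend) cross-encoder reorders only those candidates.

def first_stage(query, corpus, k):
    """Cheap retrieval: rank documents by bag-of-words overlap, keep top k."""
    q = set(query.lower().split())
    scored = [(len(q & set(doc.lower().split())), i) for i, doc in enumerate(corpus)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

def rerank(query, corpus, candidates, score_fn):
    """Expensive stage: rescore only the surviving candidates."""
    return sorted(candidates, key=lambda i: score_fn(query, corpus[i]), reverse=True)

corpus = ["neural text ranking with transformers",
          "cooking pasta at home",
          "transformers for information retrieval"]
top = first_stage("transformers ranking", corpus, k=2)
# stand-in for a cross-encoder: count query words appearing in the document
order = rerank("transformers ranking", corpus, top,
               lambda q, d: sum(w in d for w in q.split()))
print(order)
```

The key efficiency/effectiveness trade-off the tutorial discusses lives in `k`: the reranker's cost grows with `k`, while recall of the final list is capped by what the first stage lets through.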
••
11 Jul 2021TL;DR: This paper proposes AdvBert, a ranking model achieved by adapting adversarial bias mitigation for IR, which jointly learns to predict relevance and remove protected attributes, and investigates the trade-off between fairness and utility.
Abstract: Societal biases resonate in the retrieved contents of information retrieval (IR) systems, reinforcing existing stereotypes. Approaching this issue requires established measures of fairness with respect to the representation of various social groups in retrieval results, as well as methods to mitigate such biases, particularly in light of the advances in deep ranking models. In this work, we first provide a novel framework to measure the fairness of the retrieved text contents of ranking models. By introducing a ranker-agnostic measurement, the framework also enables disentangling the collection's effect on fairness from that of the rankers. To mitigate these biases, we propose AdvBert, a ranking model achieved by adapting adversarial bias mitigation for IR, which jointly learns to predict relevance and remove protected attributes. We conduct experiments on two passage retrieval collections (MSMARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking), which we extend with fairness annotations of a selected subset of queries regarding gender attributes. Our results on the MSMARCO benchmark show that (1) all ranking models are less fair in comparison with ranker-agnostic baselines, and (2) the fairness of Bert rankers significantly improves when using the proposed AdvBert models. Lastly, we investigate the trade-off between fairness and utility, showing that we can maintain the significant improvements in fairness without any significant loss in utility.
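Adversarial bias mitigation of this kind is commonly implemented with a gradient reversal layer: the protected-attribute adversary is trained normally, but its gradients reach the shared encoder with flipped sign, pushing the encoder to discard attribute information. The sketch below (PyTorch) illustrates only that mechanism; the tiny linear "encoder", the heads, the shapes, and the placeholder losses are all stand-ins for the BERT-based setup in the paper:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; sign-flipped, scaled gradient backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None   # None: no gradient for lamb

encoder = torch.nn.Linear(16, 8)          # stand-in for the BERT encoder
relevance_head = torch.nn.Linear(8, 1)    # predicts query-document relevance
adversary_head = torch.nn.Linear(8, 2)    # predicts the protected attribute

x = torch.randn(4, 16)
h = encoder(x)
rel = relevance_head(h)
adv = adversary_head(GradReverse.apply(h, 1.0))  # reversed gradients flow back
# placeholder losses for illustration only
loss = rel.pow(2).mean() + torch.nn.functional.cross_entropy(
    adv, torch.tensor([0, 1, 0, 1]))
loss.backward()  # encoder receives relevance gradients minus adversary gradients
```

With `lamb > 0`, improving the adversary's loss actively degrades the encoder's ability to encode the protected attribute, which is the joint objective the abstract describes.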
••
TL;DR: This article proposes a similarity-based ranking (SR) strategy inspired by a density-based clustering algorithm, introduces structural similarity (SSIM) to measure the relationships between bands, and demonstrates that SR-SSIM outperforms competing methods.
Abstract: Band selection (BS) is a commonly used dimension reduction technique for hyperspectral images. In this article, we propose a similarity-based ranking (SR) strategy inspired by a density-based clustering algorithm. The representativeness of a band is evaluated according to its ability to become a cluster center. We introduce structural similarity (SSIM) to measure the relationships between the bands; hence our ranking-based BS method is called SR-SSIM. We picked state-of-the-art BS methods as competitors and carried out classification experiments on different data sets. The results showed that SR-SSIM outperformed the other methods. This article demonstrates that SSIM is more suitable for hyperspectral BS than the Euclidean distance, since SSIM can mine the spatial information contained in the band images. Furthermore, we discuss the application of BS methods to deep learning classifiers. We found that proper preprocessing by a BS method can effectively eliminate redundant information and avoid overfitting.
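The SSIM measure at the core of the method, and a much-simplified similarity-based ranking, can be sketched as below. The actual SR-SSIM score combines density-peak style local density and separation; here bands are merely ranked by how dissimilar they are to the rest, as an illustration of why SSIM distinguishes redundant bands:

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM between two images in [0, 1];
    the standard method applies this over sliding windows."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def rank_bands(cube):
    """cube: (bands, H, W). Returns band indices, least redundant first."""
    b = cube.shape[0]
    sim = np.array([[ssim(cube[i], cube[j]) for j in range(b)] for i in range(b)])
    redundancy = (sim.sum(axis=1) - 1.0) / (b - 1)   # mean similarity to others
    return list(np.argsort(redundancy))

rng = np.random.default_rng(0)
base = rng.random((8, 8))
# bands 0 and 1 are near-duplicates; band 2 is independent
cube = np.stack([base, base + 0.01 * rng.random((8, 8)), rng.random((8, 8))])
print(rank_bands(cube))
```

The independent band ends up ranked first because its average SSIM to the others is low, while the near-duplicate pair is flagged as redundant.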
••
TL;DR: The evaluation findings show that the ranking results are robust and the CO2 emission criterion is found to be the dominant criterion in the multi-criteria decision-making model proposed in this paper.
••
TL;DR: In this article, a two-stage PLS-SEM-artificial-neural-network (ANN) predictive analytic approach was adopted to analyze the collected data: PLS-SEM was first applied to test the hypotheses, followed by the ANN technique to detect nonlinear effects in the model.
••
11 Jul 2021TL;DR: This paper adopts an intra-document cascading strategy, which prunes passages of a candidate document using a less expensive model, called ESM, before running a more expensive and effective scoring model, called ETM.
Abstract: An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers. A major drawback of this approach is high query latency due to the cost of evaluating every passage in the document with BERT. To make matters worse, this high inference cost and latency varies based on the length of the document, with longer documents requiring more time and computation. To address this challenge, we adopt an intra-document cascading strategy, which prunes passages of a candidate document using a less expensive model, called ESM, before running a scoring model that is more expensive and effective, called ETM. We found it best to train ESM (short for Efficient Student Model) via knowledge distillation from the ETM (short for Effective Teacher Model) e.g., BERT. This pruning allows us to only run the ETM model on a smaller set of passages whose size does not vary by document length. Our experiments on the MS MARCO and TREC Deep Learning Track benchmarks suggest that the proposed Intra-Document Cascaded Ranking Model (IDCM) leads to over 400% lower query latency by providing essentially the same effectiveness as the state-of-the-art BERT-based document ranking models.
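The cascade itself is simple to sketch: a cheap ESM-like scorer keeps a fixed budget of `k` passages regardless of document length, and only those survivors reach the expensive ETM-like scorer. Both scorers below are stand-ins (term overlap and term frequency), not the distilled student and BERT teacher of the paper:

```python
# Toy intra-document cascade: prune a document's passages with a cheap
# model, then score only the survivors with the expensive model and
# aggregate (max-pooling is one common aggregation choice).

def idcm_score(query, passages, esm, etm, k=4):
    pruned = sorted(passages, key=lambda p: esm(query, p), reverse=True)[:k]
    return max(etm(query, p) for p in pruned)

# stand-in scorers: set overlap (cheap) and query-term frequency (expensive)
esm = lambda q, p: len(set(q.split()) & set(p.split()))
etm = lambda q, p: sum(p.split().count(w) for w in q.split())

doc = ["deep ranking models", "weather report today",
       "ranking ranking with bert", "unrelated passage text"]
print(idcm_score("ranking bert", doc, esm, etm, k=2))
```

Because the expensive model only ever sees `k` passages, its cost, and hence query latency, no longer grows with document length, which is the efficiency argument of the abstract.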
••
TL;DR: This work models the sequential interactions between the CI environment and a test case prioritization agent as an RL problem, using three alternative ranking models, and shows that the best RL solutions provide a significant accuracy improvement over previous RL-based work, with prioritization strategies getting close to optimal.
Abstract: Continuous Integration (CI) significantly reduces integration problems, speeds up development time, and shortens release time. However, it also introduces new challenges for quality assurance activities, including regression testing, which is the focus of this work. Though various approaches for test case prioritization have shown to be very promising in the context of regression testing, specific techniques must be designed to deal with the dynamic nature and timing constraints of CI. Recently, Reinforcement Learning (RL) has shown great potential in various challenging scenarios that require continuous adaptation, such as game playing, real-time ads bidding, and recommender systems. Inspired by this line of work and building on initial efforts in supporting test case prioritization with RL techniques, we perform here a comprehensive investigation of RL-based test case prioritization in a CI context. To this end, taking test case prioritization as a ranking problem, we model the sequential interactions between the CI environment and a test case prioritization agent as an RL problem, using three alternative ranking models. We then rely on carefully selected and tailored state-of-the-art RL techniques to automatically and continuously learn a test case prioritization strategy, whose objective is to be as close as possible to the optimal one. Our extensive experimental analysis shows that the best RL solutions provide a significant accuracy improvement over previous RL-based work, with prioritization strategies getting close to being optimal, thus paving the way for using RL to prioritize test cases in a CI context.
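The interaction loop the paper formalizes can be illustrated with a deliberately tiny bandit-style agent: each CI cycle it ranks tests by a running failure estimate, observes the verdicts as reward, and updates. The real work uses richer state features, three ranking formulations, and proper RL algorithms; this sketch only shows the environment/agent loop:

```python
# Toy prioritization-as-RL loop: the "policy" is a per-test failure
# estimate; failing tests ranked early are exactly what the reward favors.

def prioritize(estimates):
    """Rank tests by estimated failure likelihood, highest first."""
    return sorted(estimates, key=estimates.get, reverse=True)

def run_cycle(estimates, verdicts, lr=0.3):
    """One CI cycle: rank, observe verdicts, update estimates toward them."""
    order = prioritize(estimates)
    for t in order:
        reward = 1.0 if verdicts[t] else 0.0
        estimates[t] += lr * (reward - estimates[t])
    return order

estimates = {"t1": 0.5, "t2": 0.5, "t3": 0.5}
for _ in range(20):  # t2 fails in every cycle of this toy history
    run_cycle(estimates, {"t1": False, "t2": True, "t3": False})
print(prioritize(estimates))
```

After a few cycles the chronically failing test is ranked first, mirroring how the learned strategy adapts continuously as new verdicts arrive, which is the property that makes RL attractive under CI's dynamic constraints.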
••
TL;DR: An extensive study of 10 popularly used filter ranking methods, providing a comparison among them that helps researchers make an informed choice of an appropriate filter method for their work.
Abstract: DNA microarray experiments generate thousands of gene expression values that provide information about the state of cells and tissues. Though these expression values are useful in disease classification, only a few genes contribute towards it. In this context, feature selection algorithms can be beneficial, as their main goal is to identify the relevant features (here, genes) efficiently. In the recent past, many feature selection algorithms have been proposed in the literature that measure the relevancy and redundancy of features using various evaluation criteria. An important type of feature selection technique is feature ranking, which does not use any learning algorithm but instead assigns an importance value or weight to each feature. In this paper, we provide an extensive study of 10 popularly used filter ranking methods. We have applied the methods to 10 microarray datasets (both binary-class and multi-class) and tested the accuracies using three well-known classifiers, namely Multi-layer Perceptron (MLP), Support Vector Machine (SVM), and K-Nearest Neighbor (KNN). We have conducted a wide variety of tests to assess the strengths and weaknesses of the various filter methods. This study provides a comparison among different filter methods, helping researchers make an informed choice of an appropriate filter method for their work. Three categories of filter methods are tested: entropy-based, similarity-based, and statistics-based. The experiments show that, of all the methods, Mutual Information (MI) gives the best results (also best among the entropy-based methods). Among the similarity-based methods ReliefF performs best, and Chi-square performs best among the statistics-based methods. For binary-class datasets, Chi-square would be the better choice, while for multi-class datasets MI gives better results.
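Filter ranking by mutual information, the best-performing method in the study, can be sketched on discretised toy data as below. Real microarray pipelines would first discretise the continuous expression values; the two "genes" here are illustrative:

```python
import math
from collections import Counter

# Filter-style feature ranking: score each feature by its mutual
# information with the class labels, no learning algorithm involved.

def mutual_info(x, y):
    """MI (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def rank_features(X, y):
    """X: list of feature columns. Returns indices, most informative first."""
    scores = [mutual_info(col, y) for col in X]
    return sorted(range(len(X)), key=lambda i: scores[i], reverse=True)

labels = [0, 0, 0, 1, 1, 1]
gene_a = [0, 0, 0, 1, 1, 1]   # perfectly tracks the class
gene_b = [0, 1, 0, 1, 0, 1]   # uninformative about the class
print(rank_features([gene_a, gene_b], labels))
```

The class-tracking gene is ranked first; in a real pipeline one would keep the top-k ranked genes and feed only those to the MLP/SVM/KNN classifiers evaluated in the paper.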