Topic
Confusion matrix
About: Confusion matrix is a research topic. Over its lifetime, 1,969 publications have been published within this topic, receiving 32,645 citations. The topic is also known as: error matrix & contingency table.
[Chart: Papers published on a yearly basis]
Papers
TL;DR: This paper presents a systematic analysis of twenty-four performance measures used across the complete spectrum of machine learning classification tasks, i.e., binary, multi-class, multi-labelled, and hierarchical, producing a measure-invariance taxonomy with respect to all relevant label distribution changes in a classification problem.
Abstract: This paper presents a systematic analysis of twenty-four performance measures used in the complete spectrum of Machine Learning classification tasks, i.e., binary, multi-class, multi-labelled, and hierarchical. For each classification task, the study relates a set of changes in a confusion matrix to specific characteristics of data. The analysis then concentrates on the types of changes to a confusion matrix that do not change a measure and therefore preserve a classifier's evaluation (measure invariance). The result is a measure-invariance taxonomy with respect to all relevant label distribution changes in a classification problem. This formal analysis is supported by examples of applications where invariance properties of measures lead to a more reliable evaluation of classifiers. Text classification supplements the discussion with several case studies.
3,945 citations
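To make the invariance analysis concrete, here is a minimal Python sketch (not from the paper; the counts are illustrative) that perturbs the true-negative cell of a binary confusion matrix and reports which standard measures stay fixed, which is exactly the kind of property the taxonomy catalogues.

```python
# Minimal sketch of the invariance analysis described above: we perturb the
# true-negative count of a binary confusion matrix and observe which measures
# change. Metric definitions are standard; the example values are illustrative.

def metrics(tp, fn, fp, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

base = metrics(tp=40, fn=10, fp=5, tn=45)
shifted = metrics(tp=40, fn=10, fp=5, tn=450)  # 10x more true negatives

for name in base:
    invariant = abs(base[name] - shifted[name]) < 1e-12
    print(f"{name:9s} base={base[name]:.3f} shifted={shifted[name]:.3f} "
          f"invariant_to_TN={invariant}")
```

Precision, recall, and F1 never touch the true-negative cell, so they come out invariant; accuracy does not, which is why the taxonomy treats them differently under label distribution changes.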
TL;DR: It is unlikely that a single standardized method of accuracy assessment and reporting can be identified, but some possible directions for future research that may facilitate accuracy assessment are highlighted.
Abstract: The production of thematic maps, such as those depicting land cover, using an image classification is one of the most common applications of remote sensing. Considerable research has been directed at the various components of the mapping process, including the assessment of accuracy. This paper briefly reviews the background and methods of classification accuracy assessment that are commonly used and recommended in the research literature. It is, however, evident that the research community does not universally adopt the approaches that are often recommended to it, perhaps a reflection of the problems associated with accuracy assessment, and typically fails to achieve the accuracy targets commonly specified. The community often tends to use, unquestioningly, techniques based on the confusion matrix for which the correct application and interpretation requires the satisfaction of often untenable assumptions (e.g., perfect coregistration of data sets) and the provision of rarely conveyed information (e.g., sampling design for ground data acquisition). Eight broad problem areas that currently limit the ability to appropriately assess, document, and use the accuracy of thematic maps derived from remote sensing are explored. The implications of these problems are that it is unlikely that a single standardized method of accuracy assessment and reporting can be identified, but some possible directions for future research that may facilitate accuracy assessment are highlighted.
3,800 citations
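As an illustration of the confusion-matrix techniques this review discusses, the following Python sketch computes the summaries conventionally reported for thematic maps: overall accuracy, per-class producer's and user's accuracy, and the kappa coefficient. The 3x3 matrix is invented for the example; it is not data from the paper.

```python
# A sketch of the standard confusion-matrix summaries used in thematic-map
# accuracy assessment: overall accuracy, per-class producer's and user's
# accuracy, and Cohen's kappa. The 3x3 matrix below is made up.
import numpy as np

cm = np.array([[50,  3,  2],   # rows: reference (ground) classes
               [ 4, 40,  6],   # cols: map (classified) classes
               [ 1,  5, 39]])

n = cm.sum()
overall = np.trace(cm) / n
producers = np.diag(cm) / cm.sum(axis=1)  # complement of omission error
users = np.diag(cm) / cm.sum(axis=0)      # complement of commission error

expected = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2  # chance agreement
kappa = (overall - expected) / (1 - expected)

print(f"overall accuracy = {overall:.3f}, kappa = {kappa:.3f}")
print("producer's accuracy:", np.round(producers, 3))
print("user's accuracy:    ", np.round(users, 3))
```

Note that these summaries inherit the assumptions the paper flags (perfect coregistration, a known sampling design for the ground data); the arithmetic alone says nothing about whether those assumptions hold.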
TL;DR: This article shows how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, first by explaining its mathematical properties and then by demonstrating the advantages of MCC in six synthetic use cases and in a real genomics scenario.
Abstract: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure. Accuracy and F1 score computed on confusion matrices have been (and still are) among the most popular metrics in binary classification tasks. However, these statistical measures can dangerously show overoptimistic, inflated results, especially on imbalanced datasets. The Matthews correlation coefficient (MCC), instead, is a more reliable statistical rate which produces a high score only if the prediction obtained good results in all four confusion matrix categories (true positives, false negatives, true negatives, and false positives), proportionally both to the size of positive elements and the size of negative elements in the dataset. In this article, we show how MCC produces a more informative and truthful score in evaluating binary classifications than accuracy and F1 score, first by explaining its mathematical properties and then by demonstrating the advantages of MCC in six synthetic use cases and in a real genomics scenario. We believe that the Matthews correlation coefficient should be preferred to accuracy and F1 score in evaluating binary classification tasks by all scientific communities.
2,358 citations
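The article's core claim is easy to reproduce numerically. The sketch below (illustrative counts, not the paper's data) scores a near-trivial classifier on an imbalanced set of 100 samples: accuracy and F1 look excellent while MCC is close to zero, because the classifier never produces a true negative.

```python
# Sketch of the article's central point: on an imbalanced dataset, a
# classifier that only handles the majority class well can score high
# accuracy and F1 yet near-zero MCC. Counts are illustrative.
from math import sqrt

def scores(tp, fn, fp, tn):
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc

# 95 positives, 5 negatives; predict almost everything positive.
acc, f1, mcc = scores(tp=94, fn=1, fp=5, tn=0)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
```

Here accuracy is 0.94 and F1 about 0.97, while MCC comes out slightly negative, reflecting that the classifier carries essentially no information about the minority class.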
Proceedings Article
01 Dec 1997
TL;DR: A new general framework, called Diverse Density, is described, which is applied to learn a simple description of a person from a series of images containing that person, to a stock selection problem, and to the drug activity prediction problem.
Abstract: Multiple-instance learning is a variation on supervised learning, where the task is to learn a concept given positive and negative bags of instances. Each bag may contain many instances, but a bag is labeled positive even if only one of the instances in it falls within the concept. A bag is labeled negative only if all the instances in it are negative. We describe a new general framework, called Diverse Density, for solving multiple-instance learning problems. We apply this framework to learn a simple description of a person from a series of images (bags) containing that person, to a stock selection problem, and to the drug activity prediction problem.
1,314 citations
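A rough sketch of the noisy-or formulation of Diverse Density follows, under strong simplifying assumptions not stated in the abstract: one-dimensional instances, a fixed Gaussian-like concept scale, and a brute-force grid search in place of gradient-based maximization. It illustrates the core idea that the learned concept point should be close to at least one instance of every positive bag and far from all instances of every negative bag.

```python
# A simplified sketch of a noisy-or Diverse Density objective for
# multiple-instance learning: 1-D instances, fixed scale, grid search.
import numpy as np

def pr_concept(t, bag, scale=1.0):
    # Pr(t is the concept | bag) under a noisy-or model: the bag "fires"
    # if at least one of its instances is near the candidate point t.
    return 1.0 - np.prod(1.0 - np.exp(-scale * (np.asarray(bag) - t) ** 2))

def diverse_density(t, pos_bags, neg_bags):
    dd = 1.0
    for bag in pos_bags:          # every positive bag should fire
        dd *= pr_concept(t, bag)
    for bag in neg_bags:          # no negative bag should fire
        dd *= 1.0 - pr_concept(t, bag)
    return dd

pos_bags = [[0.1, 5.0, 9.3], [4.8, 7.7], [5.2, 1.1]]  # share instances near 5
neg_bags = [[0.3, 9.0], [2.0, 8.5]]

grid = np.linspace(0, 10, 1001)
best = max(grid, key=lambda t: diverse_density(t, pos_bags, neg_bags))
print(f"concept point maximizing diverse density: {best:.2f}")
```

The maximum lands near 5, the only region that appears in every positive bag and in no negative bag, which is what "diverse density" measures.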
TL;DR: A detailed overview of classification assessment measures is introduced, with the aim of providing the basics of these measures, showing how each works, and serving as a comprehensive source for researchers interested in this field.
Abstract: Classification techniques have been applied to many applications in various fields of science. There are several ways of evaluating classification algorithms. Such metrics and their significance must be interpreted correctly when evaluating different learning algorithms. Most of these measures are scalar metrics and some of them are graphical methods. This paper introduces a detailed overview of the classification assessment measures with the aim of providing the basics of these measures and showing how each works, serving as a comprehensive source for researchers who are interested in this field. This overview starts by highlighting the definition of the confusion matrix in binary and multi-class classification problems. Many classification measures are also explained in detail, and the influence of balanced and imbalanced data on each metric is presented. An illustrative example is introduced to show (1) how to calculate these measures in binary and multi-class classification problems, and (2) the robustness of some measures against balanced and imbalanced data. Moreover, some graphical measures, such as receiver operating characteristic (ROC), precision-recall (PR), and detection error trade-off (DET) curves, are presented in detail. Additionally, in a step-by-step approach, different numerical examples are demonstrated to explain the preprocessing steps of plotting ROC, PR, and DET curves.
1,147 citations
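Since the overview walks through plotting ROC curves step by step, here is a compatible Python sketch (toy data, standard definitions, not taken from the paper) that sweeps the decision threshold over the sorted scores and derives each (FPR, TPR) point from the confusion matrix induced at that threshold.

```python
# Sketch of the threshold-sweeping procedure behind a ROC curve: at each
# threshold, form the induced confusion matrix and record (FPR, TPR).
# The labels and scores below are made up for illustration.
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])

pos, neg = (y_true == 1).sum(), (y_true == 0).sum()
points = []
for thr in np.r_[np.inf, np.sort(y_score)[::-1]]:   # high to low thresholds
    pred = y_score >= thr
    tpr = (pred & (y_true == 1)).sum() / pos        # recall / sensitivity
    fpr = (pred & (y_true == 0)).sum() / neg        # fall-out
    points.append((fpr, tpr))

for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Plotting the recorded points in order traces the ROC curve from (0, 0) to (1, 1); the PR and DET curves described in the paper come from the same threshold sweep with different coordinates read off each confusion matrix.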