Data and text mining
Precrec: fast and accurate precision–recall and ROC curve calculations in R
Takaya Saito 1,* and Marc Rehmsmeier 1,2

1 Computational Biology Unit, Department of Informatics, University of Bergen, N-5020 Bergen, Norway and
2 Integrated Research Institute (IRI) for the Life Sciences and Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany

*To whom correspondence should be addressed.
Associate Editor: Jonathan Wren
Received on April 11, 2016; revised on August 24, 2016; accepted on August 26, 2016
Abstract
Summary: The precision–recall plot is more informative than the ROC plot when evaluating classifiers on imbalanced datasets, but fast and accurate curve calculation tools for precision–recall plots are currently not available. We have developed Precrec, an R library that aims to overcome this limitation of the plot. Our tool provides fast and accurate precision–recall calculations together with multiple functionalities that work efficiently under different conditions.
Availability and Implementation: Precrec is licensed under GPL-3 and freely available from CRAN (https://cran.r-project.org/package=precrec). It is implemented in R with C++.
Contact: takaya.saito@ii.uib.no
Supplementary information: Supplementary data are available at Bioinformatics online.
1 Introduction
Recent rapid advances in molecular technologies have increased the importance of developing efficient and robust algorithms to handle large amounts of data in various fields of bioinformatics. Binary classifiers are mathematical and computational models that have successfully solved a wide range of life science problems with the huge volumes of data produced by high-throughput experiments (Saito and Rehmsmeier, 2015). The Receiver Operating Characteristic (ROC) plot is the most popular performance measure for the evaluation of binary classification models. Its popularity comes from several well-studied characteristics, such as the intuitive visual interpretation of the curve, easy comparison of multiple models, and the Area Under the Curve (AUC) as a single-value quantity (Fawcett, 2006). Nonetheless, the intuitive visual interpretation can be misleading and can result in inaccurate conclusions through a wrong interpretation of specificity when the datasets are imbalanced. Imbalanced data occur naturally in the life sciences. For instance, the majority of datasets from genome-wide studies, such as microRNA gene discovery, are heavily imbalanced (Saito and Rehmsmeier, 2015). The precision–recall plot is an alternative to the ROC plot and can be used to avoid this potential pitfall (He and Garcia, 2009; Saito and Rehmsmeier, 2015).
Although some performance evaluation tools offer the calculation of precision–recall curves, they tend to neglect several important aspects. One of these aspects is that any point on an ROC curve has a one-to-one relationship with a corresponding point on a precision–recall curve. To satisfy this relationship, precision–recall curves require non-linear interpolation to connect two adjacent points, unlike the simple linear interpolation of ROC curves (Davis and Goadrich, 2006). This non-linear interpolation has been developed further in closely connected areas, such as the calculation of AUC scores and of confidence interval bands (Boyd et al., 2013; Keilwagen et al., 2014). Nonetheless, only a limited number of tools can produce non-linear interpolations of precision–recall curves (Davis and Goadrich, 2006; Grau et al., 2015), and they usually come with high computational demands. Moreover, tools that are specific to precision–recall calculations tend to lack support for pre- and post-processing, such as the handling of tied scores and the calculation of confidence interval bands, whereas some ROC-specific tools provide such functionalities (Robin et al., 2011). We have developed Precrec, a tool that offers fast and accurate precision–recall calculations with several additional functionalities. Our comparison tests show that Precrec is the only tool that performs fast and accurate precision–recall calculations under various conditions.

2 Implementation
We separated Precrec into several modules according to their functions, and optimized each module with respect to processing time and accuracy. Specifically, we focused on the following six aspects to achieve high accuracy and multiple functionalities:

1. Calculation of correct non-linear interpolations.
2. Estimation of the first point, which is necessary when the precision value becomes undefined due to no positive predictions.
3. Use of score-wise threshold values instead of fixed bins.
4. Integration of other evaluation measures, such as ROC and basic measures from the confusion matrix.
5. Handling of multiple models and multiple test sets.
6. Addition of pre- and post-processing functions for simple data preparation and curve analysis.
Aspects 1–3 relate to correct curve calculations. The remaining aspects pertain to the other evaluation measures and features that Precrec offers. Precrec concurrently calculates ROC and precision–recall curves together with their AUCs. It can also calculate several basic evaluation measures, such as error rate, accuracy, specificity, sensitivity and positive predictive value. Moreover, Precrec can directly accept multiple models and multiple test sets. For instance, it automatically calculates the average curve and the confidence interval bands when multiple test sets are specified. Precrec also has powerful features for data preparation. For instance, it offers several options for handling tied scores and missing values.
To speed up calculations in the Precrec modules, we first tried to optimize in R alone. We replaced some R code with C++ code when performance issues were difficult to solve in R.
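
A minimal usage sketch follows, assuming the CRAN release of precrec; evalmod() is the package's main entry point, and P10N10 is, to our knowledge, a small sample dataset shipped with it:

## Minimal usage sketch (assumes the CRAN release of precrec;
## P10N10 is a small sample dataset shipped with the package)
library(precrec)

## Concurrently calculate ROC and precision-recall curves
sscurves <- evalmod(scores = P10N10$scores, labels = P10N10$labels)

## Retrieve the AUCs of both curve types
auc(sscurves)

## Plot both curves
plot(sscurves)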
3 Results
For the evaluation of Precrec, we have developed prcbench, an R library that serves as a compact testing workbench for the evaluation of precision–recall curves (available on CRAN). We have also compared our tool with four other tools that can calculate precision–recall curves: ROCR (Sing et al., 2005), AUCCalculator (Davis and Goadrich, 2006), PerfMeas (available on CRAN) and PRROC (Grau et al., 2015). The workbench provides two types of test results: the accuracy of the curves (Fig. 1) and the benchmarking of processing time (Table 1).
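
The following sketch shows how such tests might be driven from prcbench; the function names and test-set identifiers below are recalled from the package documentation and should be treated as assumptions rather than a verified interface:

## Hedged sketch of the prcbench workbench; function and set names
## are assumptions based on our reading of the package documentation.
library(prcbench)

## Curve-accuracy tests on a predefined test set
testset <- create_testset("curve", "c2")
toolset <- create_toolset("Precrec")
run_evalcurve(testset, toolset)

## Processing-time benchmark on randomly generated data
btestset <- create_testset("bench", "b10")
run_benchmark(btestset, toolset)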
3.1 Precrec calculates accurate precision–recall curves
Figure 1A shows the base points of three test sets C1, C2 and C3. The tests are based on these pre-calculated points, through which correctly calculated curves must pass. Each test set contains three categories. SE checks the correct elongation of the curve to the start and end points. Ip checks correct curve calculation at the intermediate points and correct interpolation. Rg checks the x and y ranges; it is less important than the other two categories, but incorrect ranges may cause graph plotting issues. The results show that ROCR, AUCCalculator and PerfMeas (Fig. 1B–D) have inconsistent starting points. Of these three, only AUCCalculator applies non-linear interpolation. Both PRROC and Precrec (Fig. 1E, F) calculate correct curves on C2 and C3, but only Precrec calculates a correct curve for C1, whereas PRROC fails on this set by producing several precision values that are larger than 1 by around 1E-15 in our test environment (indicated by a dotted curve in Fig. 1E; see Supplementary methods and results for details).
Fig. 1. Results of evaluating precision–recall curves calculated by five different tools for three test sets C1, C2 and C3. (A) Manually calculated points for C1 (red), C2 (green) and C3 (blue). Each test set contains three different test categories: SE (start and end positions), Ip (intermediate position and interpolation) and Rg (x and y ranges). In addition, each category has 3–5 individual test items. The remaining plots show the calculated curves with successes/total per category for (B) ROCR, (C) AUCCalculator, (D) PerfMeas, (E) PRROC and (F) Precrec.

Table 1. Benchmarking results of the five tools

Tool              Curve  AUC  NL   100   1000    1 million
ROCR              Yes    No   No   5.4   6.8     (2.6 s)
AUCCalculator     Yes    Yes  Yes  105   216     (33 min)
PerfMeas          Yes    Yes  No   0.2   0.4     763
PRROC             Yes    Yes  Yes  348   (74 s)  (123 days)(a)
PRROC (step = 1)  Yes    Yes  No   7.9   96      (6.3 h)(a)
PRROC (AUC)       No     Yes  Yes  23.7  236     (4 min)
Precrec           Yes    Yes  Yes  6.4   6.8     463

Tool: we performed PRROC (step = 1) with minStepSize = 1 and PRROC (AUC) without curve calculation. Curve: curve calculation. AUC: AUC calculation. NL: non-linear interpolation. 100, 1000, 1 million: test dataset size. We tested each case 10 times and calculated the average (mean) processing time. The measurement unit is milliseconds unless indicated otherwise. (a) We tested these cases only once.

3.2 Precrec uses additional support points for non-linear interpolation and confidence intervals
Precrec relies on additional support points for non-linear interpolation between two adjacent points and offers an option (x_bins) that controls the number of support points for the whole curve, with a default value of 1000. The distances between adjacent support points are uniform: 0.5 when x_bins is 2, and 0.001 when x_bins is 1000. Precrec performs linear interpolation when x_bins is 1. Moreover, this approach enables the calculation of the average curve with confidence interval bands when multiple test datasets are specified.
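
As a hedged sketch of this functionality: the x_bins option is described above, mmdata(), join_scores() and join_labels() are precrec helpers for bundling multiple score and label sets, and the exact plotting behavior with multiple test sets is an assumption on our part:

## Hedged sketch: one model scored on three test sets; with multiple dsids,
## evalmod() is expected to compute average curves with confidence bands.
library(precrec)
set.seed(1)
scores <- join_scores(rnorm(100), rnorm(100), rnorm(100))
labels <- join_labels(sample(c(0, 1), 100, replace = TRUE),
                      sample(c(0, 1), 100, replace = TRUE),
                      sample(c(0, 1), 100, replace = TRUE))
mdat <- mmdata(scores, labels, modnames = rep("m1", 3), dsids = 1:3)
mscurves <- evalmod(mdat, x_bins = 100)  ## coarser grid than the default 1000
plot(mscurves)                           ## average curves with confidence bands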
3.3 Precrec provides fast calculations regardless of dataset size
Table 1 shows the benchmarking results of processing time for the five tools. All tools perform reasonably well on small (100 items) and medium (1000 items) datasets, but only Precrec appears to be practically useful for calculating accurate non-linear interpolations (NL: Yes) on large (1 million items) datasets (see Supplementary methods and results for details).
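
A rough way to reproduce the flavor of this benchmark on one's own machine (a sketch with synthetic data, not the paper's benchmark datasets):

## Rough timing sketch on a synthetic dataset of 1 million items
## (random scores and balanced random labels; not the paper's benchmark data)
library(precrec)
set.seed(1)
scores <- rnorm(1e6)
labels <- sample(c(0, 1), 1e6, replace = TRUE)
system.time(evalmod(scores = scores, labels = labels))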
3.4 Precrec calculates AUCs with high accuracy
Precrec uses the trapezoidal rule to calculate AUC scores. If a different number of support points is specified, the score changes accordingly. We also analyzed the accuracy of AUC scores by using randomly generated datasets. AUC scores appear to be very similar across the tools, especially for large datasets. PerfMeas calculates AUC scores that are slightly different from the others, but the differences are small (see Supplementary methods and results for details). The results also show that there are only small differences between linear and non-linear AUCs. Nonetheless, correct non-linear interpolation can be useful when a dataset contains distantly separated adjacent points.
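
Since the text states that Precrec uses the trapezoidal rule, the following self-contained R function illustrates that rule on a set of curve points; it is illustrative only, not precrec's internal code:

## Illustrative trapezoidal-rule AUC over (x, y) curve points;
## not precrec's internal implementation.
trapezoid_auc <- function(x, y) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

## Example: a coarse three-point ROC curve
trapezoid_auc(c(0, 0.2, 1), c(0, 0.8, 1))  ## 0.8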
3.5 Datasets with class imbalance and tied scores may require non-linear interpolation
Non-linear interpolation is important when two adjacent points are distantly separated. Such a separation usually occurs when the dataset size is small. Nonetheless, it may even occur for large datasets, for instance when a dataset is heavily imbalanced or contains a large number of tied scores (see Supplementary methods and results for details). Hence, it is useful to provide non-linear calculations regardless of the dataset size.
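
For reference, the non-linear interpolation of Davis and Goadrich (2006) between two adjacent precision–recall points A and B can be sketched as follows; the notation (true positive counts TP_A, TP_B; false positive counts FP_A, FP_B; total positives P) is adapted here for illustration:

% Intermediate points between A and B, for x = 1, ..., TP_B - TP_A - 1:
\mathrm{recall}(x) = \frac{TP_A + x}{P},
\qquad
\mathrm{precision}(x) = \frac{TP_A + x}{TP_A + x + FP_A + \dfrac{FP_B - FP_A}{TP_B - TP_A}\, x}

Connecting two distant points with a straight line in precision–recall space instead of following this curve generally overestimates precision, which is why linear interpolation inflates AUC scores in such cases.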
3.6 Precrec concurrently calculates ROC curves
ROC and precision–recall curves have a number of aspects in common, and it is often necessary to analyze both. Precrec calculates both curves and their AUCs by default.
4 Summary
The precision–recall plot is more informative than the ROC plot when evaluating classifiers on imbalanced datasets. Nevertheless, most performance evaluation tools focus mainly on the ROC plot. We have developed a performance evaluation library that works efficiently with various types of datasets and evaluation measures. In summary, Precrec is a powerful tool that provides fast and accurate precision–recall and ROC calculations with various functionalities.
Conflict of Interest: none declared.
References
Boyd,K. et al. (2013) Area under the precision–recall curve: point estimates and confidence intervals. In: Blockeel,H. et al. (eds) Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23–27, 2013, Proceedings, Part III. Springer, Berlin, Heidelberg, pp. 451–466.
Davis,J. and Goadrich,M. (2006) The relationship between precision–recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240.
Fawcett,T. (2006) An introduction to ROC analysis. Pattern Recognit. Lett., 27, 861–874.
Grau,J. et al. (2015) PRROC: computing and visualizing precision–recall and receiver operating characteristic curves in R. Bioinformatics, 31, 2595–2597.
He,H. and Garcia,E. (2009) Learning from imbalanced data. IEEE Trans. Knowl. Data Eng., 21, 1263–1284.
Keilwagen,J. et al. (2014) Area under precision–recall curves for weighted and unweighted data. PLoS One, 9, e92209.
Robin,X. et al. (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics, 12, 77.
Saito,T. and Rehmsmeier,M. (2015) The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One, 10, e0118432.
Sing,T. et al. (2005) ROCR: visualizing classifier performance in R. Bioinformatics, 21, 3940–3941.