
Showing papers on "False positive paradox published in 2016"


Journal ArticleDOI
TL;DR: It is shown that the proposed multi-view ConvNets are highly suited for false positive reduction in a CAD system.
Abstract: We propose a novel Computer-Aided Detection (CAD) system for pulmonary nodules using multi-view convolutional networks (ConvNets), for which discriminative features are automatically learnt from the training data. The network is fed with nodule candidates obtained by combining three candidate detectors specifically designed for solid, subsolid, and large nodules. For each candidate, a set of 2-D patches from differently oriented planes is extracted. The proposed architecture comprises multiple streams of 2-D ConvNets, for which the outputs are combined using a dedicated fusion method to get the final classification. Data augmentation and dropout are applied to avoid overfitting. On 888 scans of the publicly available LIDC-IDRI dataset, our method reaches high detection sensitivities of 85.4% and 90.1% at 1 and 4 false positives per scan, respectively. An additional evaluation on independent datasets from the ANODE09 challenge and DLCST is performed. We show that the proposed multi-view ConvNets are highly suited for false positive reduction in a CAD system.
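
A minimal late-fusion sketch of the multi-stream 2-D ConvNet idea described above, assuming PyTorch; the patch size, layer widths, and number of views are illustrative placeholders, not the architecture from the paper:

```python
import torch
import torch.nn as nn

class Stream2D(nn.Module):
    """One 2-D ConvNet stream for patches from a single plane orientation."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 24, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(24, 32, 3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * 14 * 14, 16)   # sized for 64x64 input patches

    def forward(self, x):
        return torch.relu(self.fc(self.features(x).flatten(1)))

class MultiViewNet(nn.Module):
    """Late fusion of several 2-D streams into one nodule/non-nodule score."""
    def __init__(self, n_views=9):
        super().__init__()
        self.streams = nn.ModuleList(Stream2D() for _ in range(n_views))
        self.classifier = nn.Linear(16 * n_views, 2)

    def forward(self, views):                    # views: (batch, n_views, 1, 64, 64)
        feats = [s(views[:, i]) for i, s in enumerate(self.streams)]
        return self.classifier(torch.cat(feats, dim=1))

model = MultiViewNet()
dummy = torch.randn(4, 9, 1, 64, 64)
print(model(dummy).shape)                        # torch.Size([4, 2])
```

Each stream sees one plane orientation of the same candidate, and the concatenated stream features feed a single candidate classifier, which is the late-fusion variant of the idea the abstract describes.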

1,030 citations


Journal ArticleDOI
28 Mar 2016-RNA
TL;DR: For future RNA-seq experiments, the results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performance makes edgeR and DESeq2 the leading tools.
Abstract: RNA-seq is now the technology of choice for genome-wide differential gene expression experiments, but it is not clear how many biological replicates are needed to ensure valid biological interpretation of the results or which statistical tools are best for analyzing the data. An RNA-seq experiment with 48 biological replicates in each of two conditions was performed to answer these questions and provide guidelines for experimental design. With three biological replicates, nine of the 11 tools evaluated found only 20%–40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates. This rises to >85% for the subset of SDE genes changing in expression by more than fourfold. To achieve >85% for all SDE genes regardless of fold change requires more than 20 biological replicates. The same nine tools successfully control their false discovery rate at ≲5% for all numbers of replicates, while the remaining two tools fail to control their FDR adequately, particularly for low numbers of replicates. For future RNA-seq experiments, these results suggest that at least six biological replicates should be used, rising to at least 12 when it is important to identify SDE genes for all fold changes. If fewer than 12 replicates are used, a superior combination of true positive and false positive performances makes edgeR and DESeq2 the leading tools. For higher replicate numbers, minimizing false positives is more important and DESeq marginally outperforms the other tools.
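
The ≲5% FDR control reported above is typically delivered by Benjamini-Hochberg-style adjustment of per-gene p-values. A self-contained sketch of that procedure on made-up p-values (not data from the study):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m     # step-up thresholds i/m * alpha
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()               # largest i with p_(i) <= i/m * alpha
        reject[order[:k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74]
print(benjamini_hochberg(pvals, alpha=0.05))
```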

605 citations


Journal ArticleDOI
TL;DR: In this article, the authors present astrophysical false positive probability calculations for every Kepler Object of Interest (KOI) using vespa, a publicly available Python package that can easily be applied to any transiting exoplanet candidate.
Abstract: We present astrophysical false positive probability calculations for every Kepler Object of Interest (KOI)—the first large-scale demonstration of a fully automated transiting planet validation procedure. Out of 7056 KOIs, we determine that 1935 have probabilities <1% of being astrophysical false positives, and thus may be considered validated planets. Of these, 1284 have not yet been validated or confirmed by other methods. In addition, we identify 428 KOIs that are likely to be false positives, but have not yet been identified as such, though some of these may be a result of unidentified transit timing variations. A side product of these calculations is full stellar property posterior samplings for every host star, modeled as single, binary, and triple systems. These calculations use vespa, a publicly available Python package that can easily be applied to any transiting exoplanet candidate.
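
vespa's actual interface is more involved; the sketch below only illustrates, with a hypothetical helper function and made-up scenario priors and likelihoods, how a false positive probability combines evidence for the planet scenario against eclipsing-binary-type scenarios:

```python
def false_positive_probability(priors, likelihoods, planet_key="planet"):
    """FPP = 1 - P(planet | data), with P computed by Bayes' rule over the
    candidate scenarios (planet, eclipsing binary, background EB, ...)."""
    evidence = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(evidence.values())
    return 1.0 - evidence[planet_key] / total

# hypothetical numbers for one candidate, purely for illustration
priors = {"planet": 0.4, "eb": 0.05, "beb": 0.02, "heb": 0.01}
likes  = {"planet": 1e-3, "eb": 1e-6, "beb": 5e-5, "heb": 2e-6}
print(f"FPP = {false_positive_probability(priors, likes):.4f}")
```

A KOI whose FPP under such a calculation falls below 1% is the kind of candidate the paper counts as validated.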

465 citations


Journal ArticleDOI
TL;DR: The possible physiological origins of fNIRS hemodynamic responses that are not due to neurovascular coupling are summarized and ways to avoid and remove them are suggested.
Abstract: We highlight a significant problem that needs to be considered and addressed when performing functional near-infrared spectroscopy (fNIRS) studies, namely the possibility of inadvertently measuring fNIRS hemodynamic responses that are not due to neurovascular coupling. These can be misinterpreted as brain activity, i.e., "false positives" (errors caused by wrongly assigning a detected hemodynamic response to functional brain activity), or mask brain activity, i.e., "false negatives" (errors caused by wrongly concluding that there is no functional brain activity because the corresponding hemodynamic response is not observed). Here, we summarize the possible physiological origins of these issues and suggest ways to avoid and remove them.

415 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present astrophysical false positive probability calculations for every Kepler Object of Interest (KOI), providing the first large-scale demonstration of a fully automated transiting planet validation procedure.
Abstract: We present astrophysical false positive probability calculations for every Kepler Object of Interest (KOI)---the first large-scale demonstration of a fully automated transiting planet validation procedure. Out of 7056 KOIs, we determine that 1935 have probabilities <1% of being astrophysical false positives, and thus may be considered validated planets. Of these, 1284 have not yet been validated or confirmed by other methods. In addition, we identify 428 KOIs likely to be false positives that have not yet been identified as such, though some of these may be a result of unidentified transit timing variations. A side product of these calculations is full stellar property posterior samplings for every host star, modeled as single, binary, and triple systems. These calculations use 'vespa', a publicly available Python package that can easily be applied to any transiting exoplanet candidate.

413 citations


Posted Content
TL;DR: By incorporating information about dependence ignored in classical multiple testing procedures, such as the Bonferroni and Holm corrections, the bootstrap-based procedure has much greater ability to detect truly false null hypotheses.
Abstract: Empiricism in the sciences allows us to test theories, formulate optimal policies, and learn how the world works. In this manner, it is critical that our empirical work provides accurate conclusions about underlying data patterns. False positives represent an especially important problem, as vast public and private resources can be misguided if we base decisions on false discovery. This study explores one especially pernicious influence on false positives—multiple hypothesis testing (MHT). While MHT potentially affects all types of empirical work, we consider three common scenarios where MHT influences inference within experimental economics: jointly identifying treatment effects for a set of outcomes, estimating heterogeneous treatment effects through subgroup analysis, and conducting hypothesis testing for multiple treatment conditions. Building upon the work of Romano and Wolf (2010), we present a correction procedure that incorporates the three scenarios, and illustrate the improvement in power by comparing our results with those obtained by the classic studies due to Bonferroni (1935) and Holm (1979). Importantly, under weak assumptions, our testing procedure asymptotically controls the familywise error rate – the probability of at least one false rejection – and is asymptotically balanced. We showcase our approach by revisiting the data reported in Karlan and List (2007), to deepen our understanding of why people give to charitable causes.
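
The sketch below is not the authors' stepdown procedure, but a simplified single-step max-t bootstrap on synthetic data; it shows the key ingredient, resampling whole observations so that dependence across outcomes is preserved when computing adjusted p-values:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_max_t_pvalues(data, n_boot=2000):
    """Single-step max-t bootstrap adjusted p-values for H0: column mean == 0.
    Resampling whole rows preserves dependence across the columns (outcomes),
    unlike Bonferroni/Holm which ignore it."""
    n, _ = data.shape
    t_obs = data.mean(0) / (data.std(0, ddof=1) / np.sqrt(n))
    centered = data - data.mean(0)                       # impose the null
    max_t = np.empty(n_boot)
    for b in range(n_boot):
        res = centered[rng.integers(0, n, n)]
        max_t[b] = np.max(np.abs(res.mean(0) / (res.std(0, ddof=1) / np.sqrt(n))))
    return np.array([(max_t >= abs(t)).mean() for t in t_obs])

x = rng.normal(0.0, 1.0, (60, 5))                        # 5 outcomes, 60 observations
x[:, 0] += 0.6                                           # one truly false null
print(bootstrap_max_t_pvalues(x).round(3))
```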

292 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: In this article, a generic iterative framework for fine-grained categorization and dataset bootstrapping is proposed, which uses deep metric learning with humans in the loop, and learns a low dimensional feature embedding with anchor points on manifolds for each category.
Abstract: Existing fine-grained visual categorization methods often suffer from three challenges: lack of training data, large number of fine-grained categories, and high intra-class vs. low inter-class variance. In this work we propose a generic iterative framework for fine-grained categorization and dataset bootstrapping that handles these three challenges. Using deep metric learning with humans in the loop, we learn a low dimensional feature embedding with anchor points on manifolds for each category. These anchor points capture intra-class variances and remain discriminative between classes. In each round, images with high confidence scores from our model are sent to humans for labeling. By comparing with exemplar images, labelers mark each candidate image as either a "true positive" or a "false positive." True positives are added into our current dataset and false positives are regarded as "hard negatives" for our metric learning model. Then the model is retrained with an expanded dataset and hard negatives for the next round. To demonstrate the effectiveness of the proposed framework, we bootstrap a fine-grained flower dataset with 620 categories from Instagram images. The proposed deep metric learning scheme is evaluated on both our dataset and the CUB-200-2011 Birds dataset. Experimental evaluations show significant performance gain using dataset bootstrapping and demonstrate state-of-the-art results achieved by the proposed deep metric learning methods.
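
A generic triplet-loss sketch (PyTorch) of the metric-learning step, in which human-rejected false positives act as hard negatives; the embedding dimension, margin, and batch layout are illustrative, and the paper's anchor-point formulation differs in detail:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull same-class embeddings together and push hard negatives
    (e.g., human-rejected false positives) at least `margin` further away."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# toy batch: 10 anchors, 10 positives, 10 hard negatives, 64-d embeddings
emb = F.normalize(torch.randn(30, 64), dim=1)
loss = triplet_loss(emb[:10], emb[10:20], emb[20:30])
print(loss.item())
```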

215 citations


Journal ArticleDOI
TL;DR: A systematic review of 73 studies of contextual cuing, a popular implicit learning paradigm, involving 181 statistical analyses of awareness tests, reveals how underpowered studies can lead to failure to reject a false null hypothesis and challenges a widespread and theoretically important claim about the extent of unconscious human cognition.
Abstract: The scientific community has witnessed growing concern about the high rate of false positives and unreliable results within the psychological literature, but the harmful impact of false negatives has been largely ignored. False negatives are particularly concerning in research areas where demonstrating the absence of an effect is crucial, such as studies of unconscious or implicit processing. Research on implicit processes seeks evidence of above-chance performance on some implicit behavioral measure at the same time as chance-level performance (that is, a null result) on an explicit measure of awareness. A systematic review of 73 studies of contextual cuing, a popular implicit learning paradigm, involving 181 statistical analyses of awareness tests, reveals how underpowered studies can lead to failure to reject a false null hypothesis. Among the studies that reported sufficient information, the meta-analytic effect size across awareness tests was dz = 0.31 (95% CI 0.24-0.37), showing that participants' learning in these experiments was conscious. The unusually large number of positive results in this literature cannot be explained by selective publication. Instead, our analyses demonstrate that these tests are typically insensitive and underpowered to detect medium to small, but true, effects in awareness tests. These findings challenge a widespread and theoretically important claim about the extent of unconscious human cognition.
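
A quick illustration of the power problem using statsmodels: the power of a one-sample/paired t-test to detect dz = 0.31, at sample sizes chosen here purely for illustration (they are not taken from the review):

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
for n in (16, 24, 50, 84):
    power = analysis.power(effect_size=0.31, nobs=n, alpha=0.05)
    print(f"n = {n:3d}  power = {power:.2f}")

# participants needed for 80% power to detect dz = 0.31 (two-sided, alpha = 0.05)
print("n for 80% power:", round(analysis.solve_power(effect_size=0.31, power=0.8, alpha=0.05)))
```

With the small awareness-test samples typical of this literature, a true effect of dz = 0.31 is far more likely to yield a non-significant result than a significant one, which is exactly the false-negative trap the abstract describes.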

189 citations


Journal ArticleDOI
TL;DR: The importance of controlling for false detection from early steps of eDNA analyses (laboratory, bioinformatics), to improve the quality of results and allow an efficient use of the site occupancy‐detection modelling (SODM) framework for limiting false presences in eDNA analysis is discussed.
Abstract: Environmental DNA (eDNA) and metabarcoding are boosting our ability to acquire data on species distribution in a variety of ecosystems. Nevertheless, as with most sampling approaches, eDNA is not perfect. It can fail to detect species that are actually present, and even false positives are possible: a species may be apparently detected in areas where it is actually absent. Controlling false positives remains a main challenge for eDNA analyses: in this issue of Molecular Ecology Resources, Lahoz-Monfort et al. () test the performance of multiple statistical modelling approaches to estimate the rate of detection and false positives from eDNA data. Here, we discuss the importance of controlling for false detection from early steps of eDNA analyses (laboratory, bioinformatics), to improve the quality of results and allow an efficient use of the site occupancy-detection modelling (SODM) framework for limiting false presences in eDNA analysis.

161 citations


Journal ArticleDOI
TL;DR: A new malware detection method, named ICCDetector, that detects and classifies malwares into five newly defined malware categories, which help understand the relationship between malicious behaviors and ICC characteristics, and provides a systemic analysis of ICC patterns of benign apps and malWares.
Abstract: Most existing mobile malware detection methods (e.g., Kirin and DroidMat) are designed based on the resources required by malwares (e.g., permissions, application programming interface (API) calls, and system calls). These methods capture the interactions between mobile apps and the Android system, but ignore the communications among components within or across application boundaries. As a consequence, the majority of the existing methods are less effective in identifying many typical malwares, which require few or no suspicious resources, but leverage the inter-component communication (ICC) mechanism when launching stealthy attacks. To address this challenge, we propose a new malware detection method, named ICCDetector. ICCDetector outputs a detection model after training with a set of benign apps and a set of malwares, and employs the trained model for malware detection. The performance of ICCDetector is evaluated with 5264 malwares and 12026 benign apps. Compared with our benchmark, which is a permission-based method proposed by Peng et al. in 2012 with an accuracy up to 88.2%, ICCDetector achieves an accuracy of 97.4%, roughly 10% higher than the benchmark, with a lower false positive rate of 0.67%, which is only about half that of the benchmark. After manually analyzing false positives, we discover 43 new malwares from the benign data set, and reduce the number of false positives to seven. More importantly, ICCDetector discovers 1708 more advanced malwares than the benchmark, while it misses 220 obvious malwares, which can be easily detected by the benchmark. For the detected malwares, ICCDetector further classifies them into five newly defined malware categories, which help understand the relationship between malicious behaviors and ICC characteristics. We also provide a systemic analysis of ICC patterns of benign apps and malwares.

160 citations


Journal ArticleDOI
01 Aug 2016-BMJ Open
TL;DR: The role of trial sequential analysis (TSA) in assessing the reliability of conclusions in underpowered meta-analyses was explored; the findings suggest that the true proportion of false positives in meta-analyses is probably higher than the 7% observed.
Abstract: Objective Many published meta-analyses are underpowered. We explored the role of trial sequential analysis (TSA) in assessing the reliability of conclusions in underpowered meta-analyses. Methods We screened The Cochrane Database of Systematic Reviews and selected 100 meta-analyses with a binary outcome, a negative result and sufficient power. We defined a negative result as one where the 95% CI for the effect included 1.00, a positive result as one where the 95% CI did not include 1.00, and sufficient power as the required information size for 80% power, 5% type 1 error, relative risk reduction of 10% or number needed to treat of 100, and control event proportion and heterogeneity taken from the included studies. We re-conducted the meta-analyses, using conventional cumulative techniques, to measure how many false positives would have occurred if these meta-analyses had been updated after each new trial. For each false positive, we performed TSA, using three different approaches. Results We screened 4736 systematic reviews to find 100 meta-analyses that fulfilled our inclusion criteria. Using conventional cumulative meta-analysis, false positives were present in seven of the meta-analyses (7%, 95% CI 3% to 14%), occurring more than once in three of them. The total number of false positives was 14 and TSA prevented 13 of these (93%, 95% CI 68% to 98%). In a post hoc analysis, we found that Cochrane meta-analyses that are negative are 1.67 times more likely to be updated (95% CI 0.92 to 2.68) than those that are positive. Conclusions We found false positives in 7% (95% CI 3% to 14%) of the included meta-analyses. Owing to limitations of external validity and to the decreased likelihood of updating positive meta-analyses, the true proportion of false positives in meta-analysis is probably higher. TSA prevented 93% of the false positives (95% CI 68% to 98%).
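
A sketch of the required-information-size calculation referred to above, as a two-sided comparison of two proportions; it omits the heterogeneity (diversity) adjustment that TSA normally applies, and the control event proportion used here is illustrative:

```python
from math import ceil
from scipy.stats import norm

def required_information_size(p_control, rrr=0.10, alpha=0.05, power=0.80):
    """Total participants needed to detect a relative risk reduction `rrr`
    from a control event proportion `p_control` (two-sided test)."""
    p_exp = p_control * (1 - rrr)
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    variance = p_control * (1 - p_control) + p_exp * (1 - p_exp)
    n_per_group = (z_a + z_b) ** 2 * variance / (p_control - p_exp) ** 2
    return 2 * ceil(n_per_group)

# e.g., a 20% control event proportion and a 10% relative risk reduction
print(required_information_size(p_control=0.20))
```

A meta-analysis whose accrued participants fall well short of this number is the kind of underpowered analysis in which the study found spurious early "positive" results.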

Journal ArticleDOI
TL;DR: This work advocates alternative approaches to account for false‐positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors.
Abstract: Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies.
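
A minimal maximum-likelihood sketch of a site-occupancy model with false positives, in the spirit of the models discussed above; the data are simulated and the parameterisation (psi, p11, p10) is a common convention rather than the exact model fitted in the paper:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import binom

def neg_log_lik(params, y, K):
    """psi = occupancy prob, p11 = detection prob if present,
    p10 = false-positive detection prob if absent; y = detections per site."""
    psi, p11, p10 = expit(params)                 # keep probabilities in (0, 1)
    lik = psi * binom.pmf(y, K, p11) + (1 - psi) * binom.pmf(y, K, p10)
    return -np.sum(np.log(lik + 1e-300))

rng = np.random.default_rng(1)
K, S = 6, 200                                     # 6 PCR replicates at 200 sites
occupied = rng.random(S) < 0.4
y = np.where(occupied, rng.binomial(K, 0.7, S), rng.binomial(K, 0.05, S))

# starting values chosen so that p11 starts above p10 (helps identifiability)
fit = minimize(neg_log_lik, x0=np.array([0.0, 1.0, -2.0]), args=(y, K), method="Nelder-Mead")
print("psi, p11, p10 =", expit(fit.x).round(3))
```

In practice an ordering constraint (p11 > p10), informative priors, or ancillary unambiguous detections are used to keep the two detection probabilities from swapping, which is one of the points the paper makes about relying on prior information or extra data.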

Journal ArticleDOI
TL;DR: Large tumor-only targeted panels are sufficient for most somatic variant identification and mutational load prediction if paired with expanded germline analysis strategies and molecular pathologist review.
Abstract: The diversity of clinical tumor profiling approaches (small panels to whole exomes with matched or unmatched germline analysis) may engender uncertainty about their benefits and liabilities, particularly in light of reported germline false positives in tumor-only profiling and use of global mutational and/or neoantigen data. The goal of this study was to determine the impact of genomic analysis strategies on error rates and data interpretation across contexts and ancestries. We modeled common tumor profiling modalities—large (n = 300 genes), medium (n = 48 genes), and small (n = 15 genes) panels—using clinical whole exomes (WES) from 157 patients with lung or colon adenocarcinoma. We created a tumor-only analysis algorithm to assess germline false positive rates, the impact of patient ancestry on tumor-only results, and neoantigen detection. After optimizing a germline filtering strategy, the germline false positive rate with tumor-only large panel sequencing was 14 % (144/1012 variants). For patients whose tumor-only results underwent molecular pathologist review (n = 91), 50/54 (93 %) false positives were correctly interpreted as uncertain variants. Increased germline false positives were observed in tumor-only sequencing of non-European compared with European ancestry patients (p < 0.001; Fisher’s exact) when basic germline filtering approaches were used; however, the ExAC database (60,706 germline exomes) mitigated this disparity (p = 0.53). Matched and unmatched large panel mutational load correlated with WES mutational load (r2 = 0.99 and 0.93, respectively; p < 0.001). Neoantigen load also correlated (r2 = 0.80; p < 0.001), though WES identified a broader spectrum of neoantigens. Small panels did not predict mutational or neoantigen load. Large tumor-only targeted panels are sufficient for most somatic variant identification and mutational load prediction if paired with expanded germline analysis strategies and molecular pathologist review. Paired germline sequencing reduced overall false positive mutation calls and WES provided the most neoantigens. Without patient-matched germline data, large germline databases are needed to minimize false positive mutation calling and mitigate ethnic disparities.
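
A toy sketch of the expanded germline-filtering idea: without a matched normal, variants that are common in large population databases are presumed germline. The variant records, field names, and allele-frequency cutoff below are hypothetical:

```python
# Hypothetical variant records; in practice population allele frequencies (pop_af)
# would come from a large germline database such as ExAC/gnomAD.
variants = [
    {"id": "TP53 p.R273H", "pop_af": 0.0,   "in_germline_db": False},
    {"id": "KRAS p.G12D",  "pop_af": 0.0,   "in_germline_db": False},
    {"id": "APC p.I1307K", "pop_af": 0.006, "in_germline_db": True},
]

MAX_POP_AF = 0.001   # variants more common than this are presumed germline

def likely_somatic(variant):
    """Tumor-only filter: drop anything common in population germline data."""
    return (not variant["in_germline_db"]) and variant["pop_af"] <= MAX_POP_AF

for v in variants:
    status = "somatic candidate" if likely_somatic(v) else "filtered as likely germline"
    print(v["id"], "->", status)
```

The study's point is that such filtering works far better when the reference database is large and ancestrally diverse; with a small or Eurocentric database, real germline polymorphisms in non-European patients slip through as false-positive "somatic" calls.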

Journal ArticleDOI
TL;DR: This review selectively surveys some of the most important changes in DSM-5, including structural/organizational changes, modifications of diagnostic criteria, and newly introduced categories, and analyzes why these changes led to such heated controversies.
Abstract: The fifth revision of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) was the most controversial in the manual's history. This review selectively surveys some of the most important changes in DSM-5, including structural/organizational changes, modifications of diagnostic criteria, and newly introduced categories. It analyzes why these changes led to such heated controversies, which included objections to the revision's process, its goals, and the content of altered criteria and new categories. The central focus is on disputes concerning the false positives problem of setting a valid boundary between disorder and normal variation. Finally, this review highlights key problems and issues that currently remain unresolved and need to be addressed in the future, including systematically identifying false positive weaknesses in criteria, distinguishing risk from disorder, including context in diagnostic criteria, clarifying how to handle fuzzy boundaries, and improving the guidelines for "other specified" diagnosis.

Journal ArticleDOI
TL;DR: In this article, the problem of multiple testing within a Geographically Weighted Regression framework is described and a possible solution to the problem which is based on a family-wise error rate for dependent processes is presented.
Abstract: This article describes the problem of multiple testing within a Geographically Weighted Regression framework and presents a possible solution to the problem which is based on a family-wise error rate for dependent processes. We compare the solution presented here to other solutions such as the Bonferroni correction and the Byrne, Charlton, and Fotheringham proposal which is based on the Benjamini and Hochberg False Discovery Rate. We conclude that our proposed correction is superior to others and that generally some correction in the conventional t-test is necessary to avoid false positives in GWR.

Journal ArticleDOI
TL;DR: Clinicians may consider awaiting confirmatory testing in retreatment patients with CT > 30; however, most false positives fall below this cut-point, and Xpert can detect DNA from nonviable, nonintact bacilli.
Abstract: Background Patients with previous tuberculosis may have residual DNA in sputum that confounds nucleic acid amplification tests such as Xpert MTB/RIF. Little is known about the frequency of Xpert-positive, culture-negative ("false positive") results in retreatment patients, whether these are distinguishable from true positives, and whether Xpert's automated filter-based wash step reduces false positivity by removing residual DNA associated with nonintact cells. Methods Pretreatment patients (n = 2889) with symptoms of tuberculosis from Cape Town, South Africa, underwent a sputum-based liquid culture and Xpert. We also compared Xpert results from dilutions of intact or heat-lysed and mechanically lysed bacilli. Results Retreatment cases were more likely to be Xpert false-positive (45/321 Xpert-positive retreatment cases were false-positive) than new cases (40/461) (14% [95% confidence interval (CI), 10%-18%] vs 8% [95% CI, 6%-12%]; P = .018). Fewer years since treatment completion (adjusted odds ratio [aOR], 0.85 [95% CI, .73-.99]), less mycobacterial DNA (aOR, 1.14 [95% CI, 1.03-1.27] per cycle threshold [CT]), and a chest radiograph not suggestive of active tuberculosis (aOR, 0.22 [95% CI, .06-.82]) were associated with false positivity. CT had suboptimal accuracy for false positivity: 46% of Xpert-positives with CT > 30 would be false positive, although 70% of false positives would be missed. CT's predictive ability (area under the curve, 0.83 [95% CI, .76-.90]) was not improved by additional variables. Xpert detected nonviable, nonintact bacilli without a change in CT vs controls. Conclusions One in 7 Xpert-positive retreatment patients were culture negative and potentially false positive. False positivity was associated with recent previous tuberculosis, high CT, and a chest radiograph not suggestive of active tuberculosis. Clinicians may consider awaiting confirmatory testing in retreatment patients with CT > 30; however, most false positives fall below this cut-point. Xpert can detect DNA from nonviable, nonintact bacilli.

Proceedings ArticleDOI
11 Jul 2016
TL;DR: A model for reducing false positives using data mining techniques by combining support vector machines (SVM), decision trees, and Naïve Bayes is proposed.
Abstract: Intrusion detection systems monitor network or host packets in an attempt to detect malicious activities on a system. Anomaly detection systems have success in exposing new attacks, commonly referred to as ‘zero-day’ attacks, yet have high false positive rates. False positive events occur when an activity is flagged for investigation yet is determined to be benign upon analysis. Computational power and valuable resources are wasted when irrelevant data is processed, flagged, escalated to an analyst, and finally disregarded. In an effort to make intrusion detection systems more efficient, the false positive rate must be reduced. This paper proposes a model for reducing false positives using data mining techniques by combining support vector machines (SVM), decision trees, and Naive Bayes.
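
A minimal sketch of such a combination using scikit-learn's VotingClassifier on synthetic, imbalanced data; the paper's feature set and datasets are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# synthetic traffic-like data: 10% "attack" class
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("tree", DecisionTreeClassifier(max_depth=5)),
                ("nb", GaussianNB())],
    voting="soft",                      # average predicted probabilities
)
ensemble.fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, ensemble.predict(X_te)).ravel()
print(f"false positive rate = {fp / (fp + tn):.3f}, detection rate = {tp / (tp + fn):.3f}")
```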

Journal ArticleDOI
TL;DR: The purpose of this study was to determine the likelihood that analyzing smooth 1D data with a 0D model of variance will produce false positives; random field theory (RFT) was used to predict the probability of false positives in 0D analyses.
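
An illustrative simulation (not the study's data or its random field theory machinery): testing smooth 1D data pointwise against a 0D critical t-threshold inflates the family-wise false positive rate well above the nominal 5%:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.stats import t as t_dist

rng = np.random.default_rng(0)
n_subj, n_nodes, smooth_sigma, n_sims = 10, 101, 10.0, 2000
t_crit_0d = t_dist.ppf(0.975, df=n_subj - 1)      # pointwise (0D) two-sided threshold

false_positives = 0
for _ in range(n_sims):
    # smooth 1D Gaussian noise per subject; the true mean is zero everywhere
    y = gaussian_filter1d(rng.normal(size=(n_subj, n_nodes)), smooth_sigma, axis=1)
    t = y.mean(0) / (y.std(0, ddof=1) / np.sqrt(n_subj))
    false_positives += np.any(np.abs(t) > t_crit_0d)   # any node crossing counts once

print(f"family-wise false positive rate ~ {false_positives / n_sims:.2f} (nominal 0.05)")
```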

Journal ArticleDOI
TL;DR: For early warning indicators to be effective tools for preventative management of ecosystem change, it is recommended that multivariate approaches drawing on a suite of potential indicators be adopted, incorporating analyses of anthropogenic drivers and process-based understanding.
Abstract: 1. Anthropogenic pressures, including climate change, are causing nonlinear changes in ecosystems globally. The development of reliable early warning indicators (EWIs) to predict these changes is vital for the adaptive management of ecosystems and the protection of biodiversity, natural capital and ecosystem services. Increased variance and autocorrelation are potential early warning indicators and can be readily estimated from ecological time series. Here, we undertook a comprehensive test of the consistency between early warning indicators and nonlinear abundance change across species, trophic levels and ecosystem types. 2. We tested whether long-term abundance time series of 55 taxa (126 data sets) across multiple trophic levels in marine and freshwater ecosystems showed (i) significant nonlinear change in abundance ‘turning points’ and (ii) significant increases in variance and autocorrelation (‘early warning indicators’). For each data set, we then quantified the prevalence of three cases: true positives (early warning indicators and associated turning point), false negatives (turning point but no associated early warning indicators) and false positives (early warning indicators but no turning point). 3. True positives were rare, representing only 9% (16 of 170) of cases using variance, and 13% (19 of 152) of cases using autocorrelation. False positives were more prevalent than false negatives (53% vs. 38% for variance; 47% vs. 40% for autocorrelation). False results were found in every decade and across all trophic levels and ecosystems. 4. Time series that contained true positives were uncommon (8% for variance; 6% for autocorrelation), with all but one time series also containing false classifications. Coherence between the types of early warning indicators was generally low with 43% of time series categorized differently based on variance compared to autocorrelation. 5. Synthesis and applications. Conservation management requires effective early warnings of ecosystem change using readily available data, and variance and autocorrelation in abundance data have been suggested as candidates. However, our study shows that they consistently fail to predict nonlinear change. For early warning indicators to be effective tools for preventative management of ecosystem change, we recommend that multivariate approaches of a suite of potential indicators are adopted, incorporating analyses of anthropogenic drivers and process-based understanding.
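
A minimal sketch of computing the two candidate early warning indicators, rolling variance and lag-1 autocorrelation, on a synthetic abundance series; real analyses would detrend first and test indicator trends formally (for example with Kendall's tau) rather than use the crude flag below:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# synthetic abundance series whose variability grows toward the end
abundance = 100 + np.cumsum(rng.normal(0, 1 + np.linspace(0, 2, 120)))
ts = pd.Series(abundance)

window = 30
rolling_var = ts.rolling(window).var()
rolling_ac1 = ts.rolling(window).apply(lambda w: w.autocorr(lag=1), raw=False)

# a crude 'early warning' flag: both indicators trending upward over the last window
warning = (rolling_var.diff().tail(window).mean() > 0) and \
          (rolling_ac1.diff().tail(window).mean() > 0)
print("variance (last 3 windows):", rolling_var.tail(3).round(1).tolist())
print("lag-1 autocorrelation (last 3 windows):", rolling_ac1.tail(3).round(2).tolist())
print("early-warning flag:", warning)
```

The study's finding is precisely that such univariate flags, taken alone, produce far more false positives and false negatives than true warnings.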

Proceedings ArticleDOI
16 May 2016
TL;DR: This paper provides evidence for the value of multimodal execution monitoring and the use of a detection threshold that varies based on the progress of execution and evaluates the approach with haptic, visual, auditory, and kinematic sensing during a variety of manipulation tasks performed by a PR2 robot.
Abstract: Online detection of anomalous execution can be valuable for robot manipulation, enabling robots to operate more safely, determine when a behavior is inappropriate, and otherwise exhibit more common sense. By using multiple complementary sensory modalities, robots could potentially detect a wider variety of anomalies, such as anomalous contact or a loud utterance by a human. However, task variability and the potential for false positives make online anomaly detection challenging, especially for long-duration manipulation behaviors. In this paper, we provide evidence for the value of multimodal execution monitoring and the use of a detection threshold that varies based on the progress of execution. Using a data-driven approach, we train an execution monitor that runs in parallel to a manipulation behavior. Like previous methods for anomaly detection, our method trains a hidden Markov model (HMM) using multimodal observations from non-anomalous executions. In contrast to prior work, our system also uses a detection threshold that changes based on the execution progress. We evaluated our approach with haptic, visual, auditory, and kinematic sensing during a variety of manipulation tasks performed by a PR2 robot. The tasks included pushing doors closed, operating switches, and assisting able-bodied participants with eating yogurt. In our evaluations, our anomaly detection method performed substantially better with multimodal monitoring than single modality monitoring. It also resulted in more desirable ROC curves when compared with other detection threshold methods from the literature, obtaining higher true positive rates for comparable false positive rates.
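
A simplified analogue of the approach, assuming the hmmlearn package is available: a Gaussian HMM is trained on non-anomalous executions, and the detection threshold on the running log-likelihood varies with execution progress. The synthetic data, state count, and three-sigma margin are illustrative, not the paper's settings:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(3)
# multimodal observations (e.g., force, sound level, joint error) from 20 normal runs
normal_runs = [rng.normal(0, 1, size=(100, 3)) + np.linspace(0, 1, 100)[:, None]
               for _ in range(20)]
X = np.vstack(normal_runs)
lengths = [len(r) for r in normal_runs]

model = GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
model.fit(X, lengths)

# progress-varying threshold: mean log-likelihood of the training runs up to
# each time step, minus three standard deviations
ll = np.array([[model.score(r[: t + 1]) for t in range(100)] for r in normal_runs])
threshold = ll.mean(0) - 3 * ll.std(0)

test = rng.normal(0, 1, size=(100, 3)) + np.linspace(0, 1, 100)[:, None]
test[60:] += 4.0                                   # inject an anomaly at step 60
scores = np.array([model.score(test[: t + 1]) for t in range(100)])
print("first step flagged as anomalous:", int(np.argmax(scores < threshold)))
```

Because the threshold tracks how the log-likelihood normally evolves over the task, early execution steps are not held to the same bar as late ones, which is the mechanism the paper credits for fewer false positives at comparable detection rates.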

Proceedings ArticleDOI
14 Mar 2016
TL;DR: This work proposes a preliminary classification of such false positives with the aim of facilitating a better understanding of the effects of anti-patterns and code smells in practice. The authors hope that the development and further refinement of such a classification can support researchers and tool vendors in their endeavour to develop more pragmatic, context-relevant detection and analysis tools for anti-patterns and code smells.
Abstract: Anti-patterns and code smells are archetypes used for describing software design shortcomings that can negatively affect software quality, in particular maintainability. Tools, metrics and methodologies have been developed to identify these archetypes, based on the assumption that they can point at problematic code. However, recent empirical studies have shown that some of these archetypes are ubiquitous in real world programs, and many of them are found not to be as detrimental to quality as previously conjectured. We are therefore interested in revisiting common anti-patterns and code smells, and building a catalogue of cases that constitute candidates for "false positives". We propose a preliminary classification of such false positives with the aim of facilitating a better understanding of the effects of anti-patterns and code smells in practice. We hope that the development and further refinement of such a classification can support researchers and tool vendors in their endeavour to develop more pragmatic, context-relevant detection and analysis tools for anti-patterns and code smells.

Journal ArticleDOI
TL;DR: This paper found that about 40% of studies fail to fully report all experimental conditions and about 70% do not report all outcome variables included in the questionnaires and data sets of psychology experiments.
Abstract: Many scholars have raised concerns about the credibility of empirical findings in psychology, arguing that the proportion of false positives reported in the published literature dramatically exceeds the rate implied by standard significance levels. A major contributor to false positives is the practice of reporting a subset of the potentially relevant statistical analyses pertaining to a research project. This study is the first to provide direct evidence of selective underreporting in psychology experiments. To overcome the problem that the complete experimental design and full set of measured variables are not accessible for most published research, we identify a population of published psychology experiments from a competitive grant program for which questionnaires and data are made publicly available because of an institutional rule. We find that about 40% of studies fail to fully report all experimental conditions and about 70% of studies do not report all outcome variables included in the questionnaires.

Journal ArticleDOI
TL;DR: It is observed that security training makes a noticeable difference in a user's ability to detect deception attempts, with one of the most important features being the time since last self-study, while formal security education through lectures appears to be much less useful as a predictor.
Abstract: Semantic social engineering attacks are a pervasive threat to computer and communication systems. By employing deception rather than by exploiting technical vulnerabilities, spear-phishing, obfuscated URLs, drive-by downloads, spoofed websites, scareware, and other attacks are able to circumvent traditional technical security controls and target the user directly. Our aim is to explore the feasibility of predicting user susceptibility to deception-based attacks through attributes that can be measured, preferably in real-time and in an automated manner. Toward this goal, we have conducted two experiments, the first on 4333 users recruited on the Internet, allowing us to identify useful high-level features through association rule mining, and the second on a smaller group of 315 users, allowing us to study these features in more detail. In both experiments, participants were presented with attack and non-attack exhibits and were tested in terms of their ability to distinguish between the two. Using the data collected, we have determined practical predictors of users’ susceptibility against semantic attacks to produce and evaluate a logistic regression and a random forest prediction model, with accuracy rates of .68 and .71, respectively. We have observed that security training makes a noticeable difference in a user’s ability to detect deception attempts, with one of the most important features being the time since last self-study, while formal security education through lectures appears to be much less useful as a predictor. Other important features were computer literacy, familiarity, and frequency of access to a specific platform. Depending on an organisation’s preferences, the models learned can be configured to minimise false positives or false negatives or maximise accuracy, based on a probability threshold. For both models, a threshold choice of 0.55 would keep both false positives and false negatives below 0.2.
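
A small sketch of the threshold trade-off described above, using a random forest on synthetic data; the study's 0.55 operating point and feature set are not reproduced here:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# sweep the decision threshold and report the resulting error trade-off
for threshold in (0.35, 0.50, 0.55, 0.70):
    pred = proba >= threshold
    fpr = np.mean(pred[y_te == 0])          # false positive rate
    fnr = np.mean(~pred[y_te == 1])         # false negative rate
    print(f"threshold {threshold:.2f}: FPR = {fpr:.2f}, FNR = {fnr:.2f}")
```

Raising the threshold trades false positives for false negatives; an organisation picks the point on this curve that matches its tolerance for each error type.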

Proceedings ArticleDOI
11 Mar 2016
TL;DR: A comparison of the proposed fractal-based method with a traditional Euclidean-based machine learning algorithm (k-NN) shows that the proposed method significantly outperforms the traditional approach by reducing false positive and false negative rates, simultaneously, while improving the overall classification rates.
Abstract: Advanced Persistent Threats (APTs) are a new breed of internet-based smart threats, which can go undetected with the existing state-of-the-art internet traffic monitoring and protection systems. With the evolution of the internet and cloud computing, a new generation of smart APT attacks has also evolved and signature-based threat detection systems are proving to be futile and insufficient. One of the essential strategies in detecting APTs is to continuously monitor and analyze various features of a TCP/IP connection, such as the number of transferred packets, the total count of the bytes exchanged, the duration of the TCP/IP connections, and details of the number of packet flows. The current threat detection approaches make extensive use of machine learning algorithms that utilize statistical and behavioral knowledge of the traffic. However, the performance of these algorithms is far from satisfactory in terms of reducing false negatives and false positives simultaneously. Mostly, current algorithms focus on reducing false positives only. This paper presents a fractal-based anomaly classification mechanism, with the goal of reducing both false positives and false negatives, simultaneously. A comparison of the proposed fractal-based method with a traditional Euclidean-based machine learning algorithm (k-NN) shows that the proposed method significantly outperforms the traditional approach by reducing false positive and false negative rates, simultaneously, while improving the overall classification rates.

Journal ArticleDOI
TL;DR: Non‐invasive prenatal testing (NIPT) has been widely used to screen for common aneuploidies since 2011, but false positive results can occur and positive results should be confirmed with invasive testing before any irreversible procedure is performed.
Abstract: Non-invasive prenatal testing (NIPT) has been widely used to screen for common aneuploidies since 2011. While NIPT is highly sensitive and specific, false positive results can occur. One important cause of false positive results is confined placental mosaicism (CPM). This can occur through a mitotic nondisjunction event or through aneuploidy rescue. CPM is usually associated with normal fetal outcomes, but has been associated with intrauterine growth restriction, pregnancy loss, or perinatal death in some cases. CPM may also be a marker for uniparental disomy. Given that NIPT can result in false positives, positive results should be confirmed with invasive testing before any irreversible procedure is performed. Whether to perform CVS or amniocentesis to confirm a positive NIPT result is controversial. While CVS can be performed earlier than amniocentesis, CPM can also cause false positive results. Our practice is to proceed with CVS, and to examine all cell lines using both an uncultured sample using fluorescence in situ hybridization (FISH) or short-term culture, as well as long-term culture of the sample. If the results all show aneuploidy, the results are reported to the patient. Otherwise, if the results are also mosaic, amniocentesis is recommended and analyzed by both FISH and karyotype. © 2016 Wiley Periodicals, Inc.

Patent
26 Aug 2016
TL;DR: In this paper, a method and a system are presented for efficient, model-checker-based elimination of false positives from static analysis warnings generated during static analysis of application code. Redundant assertion verification calls are identified and skipped to make the elimination efficient.
Abstract: A method and a system are disclosed herein for efficient, model-checker-based elimination of false positives from static analysis warnings generated during static analysis of application code. The system computes complete-range non-deterministic value variables (cnv variables) based on data flow analysis or static approximation of execution paths by control flow paths. During computation of cnv variables, an over-approximation (may-cnv variables) and an under-approximation (must-cnv variables) of the set of cnv variables at a program point are identified. The computed cnv variables are used to check whether an assertion variable is a cnv variable and whether the corresponding assertion verification call is therefore redundant. The identified redundant calls are then skipped to make false positive elimination more efficient, and the model checker is invoked only for the non-redundant assertion verification calls.

Book ChapterDOI
24 Oct 2016
TL;DR: This paper proposes an anomaly detection approach that incorporates perspectives that go beyond the control flow, such as, time and resources (i.e., to detect contextual anomalies), and is capable of dealing with unexpected process model execution events.
Abstract: Ensuring anomaly-free process model executions is crucial in order to prevent fraud and security breaches. Existing anomaly detection approaches focus on the control flow, point anomalies, and struggle with false positives in the case of unexpected events. By contrast, this paper proposes an anomaly detection approach that incorporates perspectives that go beyond the control flow, such as time and resources (i.e., to detect contextual anomalies). In addition, it is capable of dealing with unexpected process model execution events: not every unexpected event is immediately flagged as anomalous; instead, events are assessed based on a certain likelihood of occurrence, hence reducing the number of false positives. Finally, multiple events are analyzed in a combined manner in order to detect collective anomalies. The performance and applicability of the overall approach are evaluated by means of a prototypical implementation and real-life process execution logs from multiple domains.

Journal ArticleDOI
08 Sep 2016-Tellus A
TL;DR: The importance of statistical multiplicity has not been appreciated, and this becomes particularly dangerous when many experiments are compared together; the authors show that the naive application of Student's t-test generates too many false positives (i.e. false rejections of the null hypothesis).
Abstract: The impact of developments in weather forecasting is measured using forecast verification, but many developments, though useful, have impacts of less than 0.5% on medium-range forecast scores. Chaotic variability in the quality of individual forecasts is so large that it can be hard to achieve statistical significance when comparing these ‘smaller’ developments to a control. For example, with 60 separate forecasts and requiring a 95% confidence level, a change in quality of the day-5 forecast needs to be larger than 1% to be statistically significant using a Student’s t-test. The first aim of this study is simply to illustrate the importance of significance testing in forecast verification, and to point out the surprisingly large sample sizes that are required to attain significance. The second aim is to see how reliable are current approaches to significance testing, following suspicion that apparently significant results may actually have been generated by chaotic variability. An independent realisation of the null hypothesis can be created using a forecast experiment containing a purely numerical perturbation, and comparing it to a control. With 1885 paired differences from about 2.5 yr of testing, an alternative significance test can be constructed that makes no statistical assumptions about the data. This is used to experimentally test the validity of the normal statistical framework for forecast scores, and it shows that the naive application of Student’s t-test does generate too many false positives (i.e. false rejections of the null hypothesis). A known issue is temporal autocorrelation in forecast scores, which can be corrected by an inflation in the size of confidence range, but typical inflation factors, such as those based on an AR(1) model, are not big enough and they are affected by sampling uncertainty. Further, the importance of statistical multiplicity has not been appreciated, and this becomes particularly dangerous when many experiments are compared together. For example, across three forecast experiments, there could be roughly a 1 in 2 chance of getting a false positive. However, if correctly adjusted for autocorrelation, and when the effects of multiplicity are properly treated using a Sidak correction, the t-test is a reliable way of finding the significance of changes in forecast scores. Keywords: statistical significance, forecast verification, Student’s t, temporal autocorrelation, paired differences, multiple comparisons (Published: 8 September 2016) Citation: Tellus A 2016, 68, 30229, http://dx.doi.org/10.3402/tellusa.v68.30229
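
Two of the corrections discussed above, sketched with illustrative numbers rather than the paper's data: an AR(1)-style effective sample size to account for temporal autocorrelation in paired score differences, and the Sidak-adjusted per-comparison significance level for multiple experiments:

```python
import numpy as np

def effective_sample_size(n, lag1_autocorr):
    """AR(1) adjustment: positively autocorrelated paired differences carry
    less information than n independent ones."""
    r = lag1_autocorr
    return n * (1 - r) / (1 + r)

def sidak_alpha(alpha_family, n_comparisons):
    """Per-comparison significance level that keeps the family-wise error
    rate at alpha_family for independent comparisons."""
    return 1 - (1 - alpha_family) ** (1 / n_comparisons)

print(round(effective_sample_size(1885, 0.1)))   # fewer effectively independent forecasts
print(round(sidak_alpha(0.05, 3), 4))            # ~0.017 per experiment when testing 3
```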

Journal ArticleDOI
TL;DR: For conditions with very low prevalence, small reductions in specificity greatly increase false positives, and this inescapable test characteristic governs the predictive value of genomic sequencing in the general population.
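
A worked example of this test characteristic: the positive predictive value implied by Bayes' rule for a rare condition, with illustrative prevalence, sensitivity, and specificity values (not figures from the paper):

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(condition | positive test) from Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# a condition affecting 1 in 10,000 people, screened with a perfect-sensitivity test
for spec in (0.9999, 0.999, 0.99):
    ppv = positive_predictive_value(prevalence=1e-4, sensitivity=1.0, specificity=spec)
    print(f"specificity {spec}: PPV = {ppv:.2%}")
```

Even a drop from 99.99% to 99% specificity pushes the share of true positives among positive calls from roughly one in two down to about one in a hundred, which is the "inescapable test characteristic" the TL;DR refers to.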

Book ChapterDOI
TL;DR: This chapter covers the basics of FDR, its application in proteomics, and methods to estimate FDR, the metric for global confidence assessment of a large-scale proteomics dataset.
Abstract: With the advancement in proteomics separation techniques and improvements in mass analyzers, the data generated in a mass-spectrometry based proteomics experiment is rising exponentially. Such voluminous datasets necessitate automated computational tools for high-throughput data analysis and appropriate statistical control. The data is searched using one or more of the several popular database search algorithms. The matches assigned by these tools can have false positives and statistical validation of these false matches is necessary before making any biological interpretations. Without such procedures, the biological inferences do not hold true and may be outright misleading. There is a considerable overlap between true and false positives. To control the false positives amongst a set of accepted matches, there is a need for some statistical estimate that can reflect the amount of false positives present in the data processed. False discovery rate (FDR) is the metric for global confidence assessment of a large-scale proteomics dataset. This chapter covers the basics of FDR, its application in proteomics, and methods to estimate FDR.
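
A minimal sketch of target-decoy FDR estimation, one common way this FDR metric is computed in proteomics database searching for peptide-spectrum matches; the scores below are made up for illustration:

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR among accepted matches: decoys passing the score cutoff
    approximate the number of false target matches at that cutoff."""
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    return n_decoys / max(n_targets, 1)

targets = [12.1, 9.8, 8.7, 7.4, 6.9, 6.1, 5.2, 4.8, 3.9, 2.5]
decoys = [6.3, 4.1, 3.8, 3.2, 2.9, 2.1, 1.8, 1.5, 1.1, 0.7]

for cutoff in (6.0, 4.0, 2.0):
    print(f"score >= {cutoff}: estimated FDR = {target_decoy_fdr(targets, decoys, cutoff):.2f}")
```

Lowering the score cutoff accepts more matches but lets in proportionally more decoys, and the decoy count is what turns that overlap between true and false positives into the global confidence estimate the chapter describes.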