Author
Scott Dick
Other affiliations: University of South Florida
Bio: Scott Dick is an academic researcher at the University of Alberta whose work centers on fuzzy logic and fuzzy classification. He has an h-index of 24 and has co-authored 80 publications receiving 2,307 citations. His previous affiliations include the University of South Florida.
Papers
TL;DR: An in-depth investigation of multi-label classification algorithms for disaggregating appliances in a power signal shows that this class of algorithms has received little attention in the literature, but is arguably a more natural fit to the disaggregation problem than the traditional single-label classifiers used to date.
Abstract: Demand-side management technology is a key element of the proposed smart grid, which will help utilities make more efficient use of their generation assets by reducing consumers’ energy demand during peak load periods. However, although some modern appliances can respond to price signals from the utility companies, there is a vast stock of older appliances that cannot. For such appliances, utilities must infer what appliances are operating in a home, given only the power signals on the main feeder to the home (i.e., the home’s power consumption must be disaggregated into individual appliances). We report on an in-depth investigation of multi-label classification algorithms for disaggregating appliances in a power signal. A systematic review of this research topic shows that this class of algorithms has received little attention in the literature, even though it is arguably a more natural fit to the disaggregation problem than the traditional single-label classifiers used to date. We examine a multi-label meta-classification framework (RAkEL) and a bespoke multi-label classification algorithm (ML-kNN), employing both time-domain and wavelet-domain feature sets. We test these classifiers on two real houses from the Reference Energy Disaggregation Dataset, and find that the multi-label algorithms are effective and competitive with published results on those datasets.
196 citations
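As a rough illustration of the setup this paper describes, the sketch below applies RAkEL and ML-kNN to a synthetic stand-in for windowed power-signal features. It assumes the scikit-multilearn package; the feature matrix and appliance labels are hypothetical placeholders, not the REDD data used in the paper.

```python
# Hedged sketch of multi-label appliance disaggregation with RAkEL and ML-kNN.
# Assumes scikit-multilearn; random data stands in for the time-domain or
# wavelet-domain features extracted from each power-signal window.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from skmultilearn.ensemble import RakelD
from skmultilearn.adapt import MLkNN

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                 # one feature vector per window
Y = (rng.random((500, 5)) < 0.3).astype(int)   # 5 appliances: on/off per window

# RAkEL: an ensemble of label-powerset classifiers over random k-labelsets
rakel = RakelD(base_classifier=GaussianNB(),
               base_classifier_require_dense=[True, True],
               labelset_size=3)
rakel.fit(X, Y)

# ML-kNN: adapts k-nearest-neighbours directly to multi-label prediction
mlknn = MLkNN(k=10)
mlknn.fit(X, Y)

print(rakel.predict(X[:3]).toarray())
print(mlknn.predict(X[:3]).toarray())
```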
TL;DR: An ensemble classification method and a compact feature-based sequence representation that improve prediction accuracy for the four main structural classes compared to competing methods, and provide highly accurate predictions for sequences of widely varying homologies.
Abstract: Structural class characterizes the overall folding type of a protein or its domain. A number of computational methods have been proposed to predict structural class from primary sequences; however, the accuracy of these methods is strongly affected by sequence homology. This paper proposes an ensemble classification method and a compact feature-based sequence representation. The method improves prediction accuracy for the four main structural classes compared to competing methods, and provides highly accurate predictions for sequences of widely varying homologies. The experimental evaluation shows superior results across the entire homology spectrum, ranging from 25% to 90% homology. Error rates were reduced by over 20% compared with individual prediction methods and the most commonly used composition-vector representation of protein sequences. Comparisons with competing methods on three large benchmark datasets consistently show the superiority of the proposed method.
162 citations
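The sketch below illustrates the general idea of pairing a compact, composition-based sequence representation with an ensemble classifier. It is not the paper's exact pipeline: the feature set, base learners, and toy sequences are illustrative assumptions.

```python
# Illustrative sketch: a 20-dim amino-acid composition vector per sequence,
# classified by a soft-voting ensemble. Sequences and labels are toy stand-ins.
from collections import Counter
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(seq: str) -> np.ndarray:
    """Relative frequency of each of the 20 standard residues."""
    counts = Counter(seq)
    return np.array([counts.get(a, 0) / len(seq) for a in AMINO_ACIDS])

# Hypothetical toy data: one of the four main structural classes per sequence
seqs = ["ACDEFGHIKL" * 3, "MNPQRSTVWY" * 3, "ACACACACAC" * 3, "KLKLKLKLKL" * 3] * 10
labels = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"] * 10

X = np.vstack([composition_vector(s) for s in seqs])
ensemble = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
    ("svm", SVC(probability=True)),
], voting="soft")
ensemble.fit(X, labels)
print(ensemble.predict(X[:4]))
```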
TL;DR: An important assertion from a previous paper, that only the modulus of a complex fuzzy membership should be considered in set-theoretic (or logical) operations, is examined, along with the impact of this property on the form of complex fuzzy logic operations.
Abstract: Complex fuzzy logic is a postulated logic system that is isomorphic to the complex fuzzy sets recently described in a previous paper. This concept is analogous to the many-valued logics that are isomorphic to type-1 fuzzy sets, commonly known as fuzzy logic. As with fuzzy logics, a complex fuzzy logic would be defined by particular choices of the conjunction, disjunction and complement operators. In this paper, an important assertion from a previous paper, that only the modulus of a complex fuzzy membership should be considered in set theoretic (or logical) operations, is examined. A more general mathematical formulation (the property of rotational invariance) is proposed for this assertion, and the impact of this property on the form of complex fuzzy logic operations is examined. All complex fuzzy logics based on the modulus of a vector are shown to be rotationally invariant. The case of complex fuzzy logics that are not rotationally invariant is examined using the framework of vector logic. A candidate conjunction operator was identified, and the existence of a dual disjunction was proven. Finally, a discussion on the possible applications of complex fuzzy logic focuses on the phenomenon of regularity as a possible fuzzification of stationarity.
144 citations
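A small numerical check can make the rotational-invariance claim concrete. The sketch below uses one plausible modulus-based conjunction (select the operand of smaller magnitude) and verifies that rotating both inputs by a common angle rotates the output by that same angle; this operator choice is an illustrative assumption, not the paper's definitive operator set.

```python
# Hedged numerical illustration of rotational invariance: under this reading,
# f(a*e^{j*theta}, b*e^{j*theta}) == e^{j*theta} * f(a, b) for any theta.
import numpy as np

def conj_min_modulus(a: complex, b: complex) -> complex:
    """Modulus-based conjunction: return the operand with smaller magnitude."""
    return a if abs(a) <= abs(b) else b

a, b = 0.8 * np.exp(1j * 0.5), 0.4 * np.exp(1j * 2.1)
for theta in (0.0, 0.7, np.pi / 3, 2.0):
    rot = np.exp(1j * theta)
    lhs = conj_min_modulus(a * rot, b * rot)   # rotate the inputs first
    rhs = rot * conj_min_modulus(a, b)         # rotate the output instead
    assert np.isclose(lhs, rhs)
print("min-modulus conjunction is rotationally invariant on these samples")
```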
24 Jun 2007
TL;DR: This work examines stratification, a widely used technique for learning from unbalanced data that has received little attention in software defect prediction, and finds an improvement of at least 23% in the average geometric mean classification accuracy on four benchmark datasets.
Abstract: Due to the tremendous complexity and sophistication of software, improving software reliability is an enormously difficult task. We study the software defect prediction problem, which focuses on predicting which modules will experience a failure during operation. Numerous studies have applied machine learning to software defect prediction; however, skewness in defect-prediction datasets usually undermines the learning algorithms, and the resulting classifiers often never predict the faulty minority class at all. This problem is well known in machine learning and is often referred to as learning from unbalanced datasets. We examine stratification, a widely used technique for learning from unbalanced data that has received little attention in software defect prediction. Our experiments focus on the SMOTE technique, which over-samples minority-class examples, and our goal is to determine whether SMOTE can improve recognition of defect-prone modules, and at what cost. Our experiments demonstrate that SMOTE resampling yields a more balanced classification, with an improvement of at least 23% in the average geometric mean classification accuracy on four benchmark datasets.
125 citations
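A minimal sketch of this kind of experiment appears below, assuming the imbalanced-learn package; the defect data is a synthetic stand-in for the benchmark datasets, and the geometric mean is computed from per-class recalls as in the paper's evaluation.

```python
# Hedged sketch: compare a classifier trained with and without SMOTE
# oversampling on a skewed, synthetic stand-in for a defect dataset.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

# Skewed two-class problem: roughly 10% "defect-prone" modules (label 1)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def g_mean(y_true, y_pred):
    # geometric mean of per-class recalls: a skew-robust accuracy measure
    r_faulty = recall_score(y_true, y_pred, pos_label=1)
    r_clean = recall_score(y_true, y_pred, pos_label=0)
    return np.sqrt(r_faulty * r_clean)

base = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample minority
smoted = DecisionTreeClassifier(random_state=0).fit(X_sm, y_sm)

print("g-mean without SMOTE:", g_mean(y_te, base.predict(X_te)))
print("g-mean with SMOTE:   ", g_mean(y_te, smoted.predict(X_te)))
```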
TL;DR: A meta-study of the empirical literature on trust in e-commerce systems is conducted, and a qualitative model incorporating the various factors that have been empirically found to influence consumer trust in e-commerce is proposed.
Abstract: Trust is at once an elusive, imprecise concept, and a critical attribute that must be engineered into e-commerce systems. Trust conveys a vast number of meanings, and is deeply dependent upon context. The literature on engineering trust into e-commerce systems reflects these ambiguous meanings; there are a large number of articles, but there is as yet no clear theoretical framework for the investigation of trust in e-commerce. E-commerce, however, is predicated on trust; indeed, any e-commerce vendor that fails to establish a trusting relationship with their customers is doomed. There is a very clear need for specific guidance on e-commerce system attributes and business operations that will effectively promote consumer trust. To address this need, we have conducted a meta-study of the empirical literature on trust in e-commerce systems. This area of research is still immature, and hence our meta-analysis is qualitative rather than quantitative. We identify the major theoretical frameworks that have been proposed in the literature, and propose a qualitative model incorporating the various factors that have been empirically found to influence consumer trust in e-commerce. As this model is too complex to be of practical use, we explore subsets of this model that have the strongest support in the literature, and discuss the implications of this model for Web site design. Finally, we outline key conceptual and methodological needs for future work on this topic.
112 citations
Cited by
01 Mar 1995
TL;DR: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series; results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages.
Abstract: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series. Two approaches to feature selection are used. First, a subset enumeration method is used to determine which financial indicators are most useful for aiding prediction of the S&P 500 futures daily price. The candidate indicators evaluated include RSI, Stochastics, and several moving averages. Results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages. The second approach to feature selection is calculation of individual saliency metrics. A new decision-boundary-based individual saliency metric and a classifier-independent saliency metric are developed and tested. Ruck's saliency metric, the decision-boundary-based saliency metric, and the classifier-independent saliency metric are compared on a data set consisting of the RSI and Stochastics indicators as well as delayed closing price values. The decision-boundary-based metric and the Ruck metric give similar results, but the classifier-independent metric agrees with neither of the other two. The nine most salient features, as determined by the decision-boundary-based metric, are used to train a neural network, and the results are presented and compared to other published results.
1,429 citations
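For readers unfamiliar with the indicator families the thesis found most useful, the sketch below computes 14-day RSI and Stochastics (%K) features from a synthetic price series; in a study like this one, such columns would feed the neural-network feature-selection step. The window lengths and data are illustrative assumptions.

```python
# Illustrative computation of RSI and Stochastics features; prices are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
close = pd.Series(100 + np.cumsum(rng.normal(0, 1, 300)), name="close")

def rsi(prices: pd.Series, n: int = 14) -> pd.Series:
    """Relative Strength Index: 100 - 100 / (1 + avg gain / avg loss) over n days."""
    delta = prices.diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    return 100 - 100 / (1 + gain / loss)

def stochastic_k(prices: pd.Series, n: int = 14) -> pd.Series:
    """Stochastic oscillator %K: position of the close within the n-day range."""
    low, high = prices.rolling(n).min(), prices.rolling(n).max()
    return 100 * (prices - low) / (high - low)

features = pd.DataFrame({"rsi_14": rsi(close), "stoch_k_14": stochastic_k(close)})
print(features.dropna().head())
```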
Book
30 Apr 1996
TL;DR: Technical foundations: introduction; software reliability and system reliability; the operational profile; software reliability modelling survey; model evaluation and recalibration techniques. Practices and experiences: best current practice of SRE; software reliability measurement experience.
Abstract: Technical foundations: introduction; software reliability and system reliability; the operational profile; software reliability modelling survey; model evaluation and recalibration techniques. Practices and experiences: best current practice of SRE; software reliability measurement experience; measurement-based analysis of software reliability; software fault and failure classification techniques; trend analysis in validation and maintenance; software reliability and field data analysis; software reliability process assessment. Emerging techniques: software reliability prediction metrics; software reliability and testing; fault-tolerant SRE; software reliability using fault trees; software reliability process simulation; neural networks and software reliability. Appendices: software reliability tools; software failure data set repository.
1,039 citations
TL;DR: Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and report their context, methodology, and performance comprehensively.
Abstract: Background: The accurate prediction of where faults are likely to occur in code can help direct test effort, reduce costs, and improve the quality of software. Objective: We investigate how the context of models, the independent variables used, and the modeling techniques applied influence the performance of fault prediction models. Method: We used a systematic literature review to identify 208 fault prediction studies published from January 2000 to December 2010. We synthesize the quantitative and qualitative results of 36 studies which report sufficient contextual and methodological information according to the criteria we develop and apply. Results: The models that perform well tend to be based on simple modeling techniques such as Naive Bayes or Logistic Regression. Combinations of independent variables have been used by models that perform well, and feature selection has been applied to these combinations when models perform particularly well. Conclusion: The methodology used to build models appears to influence predictive performance. Although there is a set of fault prediction studies in which confidence is possible, more studies are needed that use a reliable methodology and report their context, methodology, and performance comprehensively.
844 citations
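In the spirit of the review's finding that simple learners often perform well, the sketch below cross-validates Naive Bayes and Logistic Regression on hypothetical static code metrics; the metric names and data are placeholders, not any benchmark from the reviewed studies.

```python
# Hedged baseline sketch: simple learners on stand-in static code metrics.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical per-module metrics: lines of code, cyclomatic complexity, churn
X = pd.DataFrame({
    "loc": rng.integers(10, 2000, 500),
    "cyclomatic": rng.integers(1, 50, 500),
    "churn": rng.integers(0, 100, 500),
})
y = (rng.random(500) < 0.15).astype(int)  # ~15% of modules labelled faulty

for name, model in [("NaiveBayes", GaussianNB()),
                    ("LogReg", LogisticRegression(max_iter=1000))]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.2f}")
```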