
Showing papers on "Test data" published in 2018


Posted Content
Jie Chen1, Tengfei Ma1, Cao Xiao1
TL;DR: Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference, and is orders of magnitude more efficient while predictions remain comparably accurate.
Abstract: The graph convolutional networks (GCN) recently proposed by Kipf and Welling are an effective graph model for semi-supervised learning. This model, however, was originally designed to be learned with the presence of both training and test data. Moreover, the recursive neighborhood expansion across layers poses time and memory challenges for training with large, dense graphs. To relax the requirement of simultaneous availability of test data, we interpret graph convolutions as integral transforms of embedding functions under probability measures. Such an interpretation allows for the use of Monte Carlo approaches to consistently estimate the integrals, which in turn leads to a batched training scheme as we propose in this work---FastGCN. Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference. We show a comprehensive set of experiments to demonstrate its effectiveness compared with GCN and related models. In particular, training is orders of magnitude more efficient while predictions remain comparably accurate.
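
As a rough illustration of the importance-sampled layer propagation described above, the following is a minimal NumPy sketch (not the authors' implementation): the product with the normalized adjacency matrix is replaced by a Monte Carlo estimate over nodes sampled in proportion to their squared column norms.

```python
import numpy as np

def fastgcn_layer(A_hat, H, W, num_samples, rng=np.random.default_rng(0)):
    """Monte Carlo estimate of one graph-convolution layer, relu(A_hat @ H @ W),
    using importance sampling over nodes (a sketch of the FastGCN idea)."""
    n = A_hat.shape[0]
    q = np.linalg.norm(A_hat, axis=0) ** 2
    q = q / q.sum()                               # importance distribution over nodes
    idx = rng.choice(n, size=num_samples, p=q)    # sampled "batch" of neighbors
    scale = 1.0 / (num_samples * q[idx])          # unbiasedness correction
    H_est = (A_hat[:, idx] * scale) @ H[idx]      # estimate of A_hat @ H
    return np.maximum(H_est @ W, 0.0)             # ReLU

# Toy usage with random data (shapes only; not a real graph)
A_hat = np.random.rand(6, 6); H = np.random.rand(6, 4); W = np.random.rand(4, 2)
out = fastgcn_layer(A_hat, H, W, num_samples=3)
```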

786 citations


Proceedings ArticleDOI
16 Nov 2018
TL;DR: This paper proposes a mutation testing framework specialized for DL systems to measure the quality of test data, and designs a set of model-level mutation operators that directly inject faults into DL models without a training process.
Abstract: Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance to gain confidence of the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, by sharing the same spirit of mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults to the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data could be evaluated from the analysis on to what extent the injected faults could be detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models.
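
To make the idea concrete, here is a hedged Python sketch of a model-level mutation operator (Gaussian weight fuzzing) and a mutation score computed from test data; `predict` is a hypothetical user-supplied callable, and the paper's actual operator set is considerably richer.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_fuzz(weights, sigma=0.05):
    """Model-level mutation operator sketch: inject a fault by perturbing the
    trained weight matrices with Gaussian noise (no retraining needed)."""
    return [w + rng.normal(0.0, sigma, size=w.shape) for w in weights]

def mutation_score(predict, original, mutants, x_test, y_test):
    """Fraction of mutants 'killed' by the test data: a mutant counts as killed
    when it misclassifies some test input that the original model gets right."""
    correct = predict(original, x_test) == y_test
    killed = sum(np.any(predict(m, x_test)[correct] != y_test[correct])
                 for m in mutants)
    return killed / len(mutants)
```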

252 citations


Proceedings Article
Jie Chen1, Tengfei Ma1, Cao Xiao1
30 Jan 2018
TL;DR: FastGCN as mentioned in this paper proposes to interpret graph convolutions as integral transforms of embedding functions under probability measures, which allows for the use of Monte Carlo approaches to consistently estimate the integrals.
Abstract: The graph convolutional networks (GCN) recently proposed by Kipf and Welling are an effective graph model for semi-supervised learning. This model, however, was originally designed to be learned with the presence of both training and test data. Moreover, the recursive neighborhood expansion across layers poses time and memory challenges for training with large, dense graphs. To relax the requirement of simultaneous availability of test data, we interpret graph convolutions as integral transforms of embedding functions under probability measures. Such an interpretation allows for the use of Monte Carlo approaches to consistently estimate the integrals, which in turn leads to a batched training scheme as we propose in this work---FastGCN. Enhanced with importance sampling, FastGCN not only is efficient for training but also generalizes well for inference. We show a comprehensive set of experiments to demonstrate its effectiveness compared with GCN and related models. In particular, training is orders of magnitude more efficient while predictions remain comparably accurate.

238 citations


Posted Content
TL;DR: An introduction to transfer learners and domain-adaptive classifiers is presented, guided by the question: when and how can a classifier generalize from a source to a target domain?
Abstract: In machine learning, if the training data is an unbiased sample of an underlying distribution, then the learned classification function will make accurate predictions for new samples. However, if the training data is not an unbiased sample, then there will be differences between how the training data is distributed and how the test data is distributed. Standard classifiers cannot cope with changes in data distributions between training and test phases, and will not perform well. Domain adaptation and transfer learning are sub-fields within machine learning that are concerned with accounting for these types of changes. Here, we present an introduction to these fields, guided by the question: when and how can a classifier generalize from a source to a target domain? We will start with a brief introduction into risk minimization, and how transfer learning and domain adaptation expand upon this framework. Following that, we discuss three special cases of data set shift, namely prior, covariate and concept shift. For more complex domain shifts, there are a wide variety of approaches. These are categorized into: importance-weighting, subspace mapping, domain-invariant spaces, feature augmentation, minimax estimators and robust algorithms. A number of points will arise, which we will discuss in the last section. We conclude with the remark that many open questions will have to be addressed before transfer learners and domain-adaptive classifiers become practical.
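
One of the approach families mentioned above, importance weighting, can be sketched in a few lines: a probabilistic domain classifier estimates the density ratio p_target(x)/p_source(x), which then reweights the source-domain training loss. This is a generic illustration under the covariate-shift assumption, not a specific method from the survey.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_source, X_target):
    """Estimate w(x) = p_target(x) / p_source(x) by training a probabilistic
    classifier to distinguish source (label 0) from target (label 1) samples."""
    X = np.vstack([X_source, X_target])
    d = np.r_[np.zeros(len(X_source)), np.ones(len(X_target))]
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_source)[:, 1]
    return p / np.clip(1.0 - p, 1e-6, None)

# Usage sketch: weight the source-domain training loss, e.g.
#   w = importance_weights(X_source, X_target)
#   LogisticRegression().fit(X_source, y_source, sample_weight=w)
```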

199 citations


Journal ArticleDOI
TL;DR: In this article, a machine learning framework is developed to estimate ocean-wave conditions; by supervised training of machine learning models on many thousands of iterations of a physics-based wave model, accurate representations of significant wave heights and periods are obtained and used to predict ocean conditions.

194 citations


Journal ArticleDOI
01 Mar 2018
TL;DR: The prognostics framework proposed in this paper provides a structured way for monitoring the state of health (SoH) of a battery by maintaining satisfactory prediction accuracy.
Abstract: In this paper, a method for the estimation of remaining useful lifetime (RUL) of lithium-ion batteries has been presented based on a combination of its capacity degradation and internal resistance growth models. The capacity degradation model is developed recently based on battery capacity test data. An empirical model for internal resistance growth is also developed based on electrochemical-impedance spectroscopy (EIS) test data. The obtained models are used in a particle filtering (PF) framework for making end-of-lifetime (EOL) predictions at various phases of its lifecycle. Further, the above two models were fused together to obtain a new degradation model for RUL estimation. It has been observed that the fused degradation model has improved the standard deviation of prediction as compared to the individual degradation models by maintaining satisfactory prediction accuracy. The effect of parameter variations on the performance of the PF algorithm has also been studied. Finally, the predictions are validated with experimental data. From the results it can be observed that with the availability of longer volume of data, the prediction accuracy gradually improves. The prognostics framework proposed in this paper provides a structured way for monitoring the state of health (SoH) of a battery.
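
A minimal particle-filter sketch of the RUL idea follows; it assumes a single exponential capacity-fade model with illustrative noise levels, whereas the paper fuses capacity-degradation and internal-resistance growth models.

```python
import numpy as np

def particle_filter_rul(capacity_obs, n_particles=2000, eol_frac=0.8,
                        rng=np.random.default_rng(1)):
    """SIR particle filter sketch for remaining-useful-life estimation.
    Assumed degradation model: C_k = a * exp(-b * k); noise levels are made up."""
    a = rng.normal(1.0, 0.05, n_particles)               # initial capacity parameter
    b = rng.normal(2e-3, 1e-3, n_particles).clip(1e-5)   # fade-rate parameter
    for k, c_obs in enumerate(capacity_obs):
        pred = a * np.exp(-b * k)
        w = np.exp(-0.5 * ((c_obs - pred) / 0.01) ** 2)  # Gaussian likelihood
        w /= w.sum()
        idx = rng.choice(n_particles, n_particles, p=w)  # resample
        a = a[idx] + rng.normal(0, 1e-3, n_particles)    # jitter to keep diversity
        b = (b[idx] + rng.normal(0, 1e-5, n_particles)).clip(1e-5)
    k_now = len(capacity_obs) - 1
    k_eol = np.log(a / eol_frac) / b                     # solve a*exp(-b*k) = eol_frac
    return np.median(k_eol - k_now)                      # median predicted RUL (cycles)
```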

187 citations


Journal ArticleDOI
TL;DR: A method of human activity recognition with high throughput from raw accelerometer data applying a deep recurrent neural network (DRNN) is proposed, and various architectures and their combinations are investigated to find the best parameter values.
Abstract: In this paper, we propose a method of human activity recognition with high throughput from raw accelerometer data applying a deep recurrent neural network (DRNN), and investigate various architectures and its combination to find the best parameter values. The “high throughput” refers to short time at a time of recognition. We investigated various parameters and architectures of the DRNN by using the training dataset of 432 trials with 6 activity classes from 7 people. The maximum recognition rate was 95.42% and 83.43% against the test data of 108 segmented trials each of which has single activity class and 18 multiple sequential trials, respectively. Here, the maximum recognition rates by traditional methods were 71.65% and 54.97% for each. In addition, the efficiency of the found parameters was evaluated using additional dataset. Further, as for throughput of the recognition per unit time, the constructed DRNN was requiring only 1.347 ms, while the best traditional method required 11.031 ms which includes 11.027 ms for feature calculation. These advantages are caused by the compact and small architecture of the constructed real time oriented DRNN.

184 citations


Journal ArticleDOI
TL;DR: This research advocates that the development of automated classification systems which can identify fish from underwater video imagery is feasible and a cost-effective alternative to manual identification by humans.
Abstract: There is a need for automatic systems that can reliably detect, track and classify fish and other marine species in underwater videos without human intervention. Conventional computer vision techniques do not perform well in underwater conditions where the background is complex and the shape and textural features of fish are subtle. Data-driven classification models like neural networks require a huge amount of labelled data, otherwise they tend to over-fit to the training data and fail on unseen test data which is not involved in training. We present a state-of-the-art computer vision method for fine-grained fish species classification based on deep learning techniques. A cross-layer pooling algorithm using a pre-trained Convolutional Neural Network as a generalized feature detector is proposed, thus avoiding the need for a large amount of training data. Classification on test data is performed by a SVM on the features computed through the proposed method, resulting in classification accuracy of 94.3% for fish species from typical underwater video imagery captured off the coast of Western Australia. This research advocates that the development of automated classification systems which can identify fish from underwater video imagery is feasible and a cost-effective alternative to manual identification by humans.
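
The final classification stage described above amounts to a linear SVM on CNN-derived features. The sketch below assumes the features have already been extracted (the cross-layer pooling step itself is not reproduced), and the L2 normalization is an assumed preprocessing choice.

```python
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

def classify_species(train_feats, train_labels, test_feats):
    """Linear SVM on pre-extracted CNN features; returns predicted species labels."""
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(normalize(train_feats), train_labels)
    return clf.predict(normalize(test_feats))
```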

151 citations


Book ChapterDOI
08 Sep 2018
TL;DR: A new test dataset for semantic and instance segmentation in the automotive domain is presented, along with a new benchmark evaluation method that uses the meta-information to calculate the robustness of a given algorithm with respect to the individual hazards.
Abstract: Test datasets should contain many different challenging aspects so that the robustness and real-world applicability of algorithms can be assessed. In this work, we present a new test dataset for semantic and instance segmentation for the automotive domain. We have conducted a thorough risk analysis to identify situations and aspects that can reduce the output performance for these tasks. Based on this analysis we have designed our new dataset. Meta-information is supplied to mark which individual visual hazards are present in each test case. Furthermore, a new benchmark evaluation method is presented that uses the meta-information to calculate the robustness of a given algorithm with respect to the individual hazards. We show how this new approach allows for a more expressive characterization of algorithm robustness by comparing three baseline algorithms.

144 citations


Journal ArticleDOI
TL;DR: It is shown that exploiting unlabeled data consistently leads to better emotion recognition performance across all emotional dimensions, and the effect of adversarial training on the feature representation across the proposed deep learning architecture is visualized.
Abstract: The performance of speech emotion recognition is affected by the differences in data distributions between train (source domain) and test (target domain) sets used to build and evaluate the models. This is a common problem, as multiple studies have shown that the performance of emotional classifiers drops when they are exposed to data that do not match the distribution used to build the emotion classifiers. The difference in data distributions becomes very clear when the training and testing data come from different domains, causing a large performance gap between development and testing performance. Due to the high cost of annotating new data and the abundance of unlabeled data, it is crucial to extract as much useful information as possible from the available unlabeled data. This study looks into the use of adversarial multitask training to extract a common representation between train and test domains. The primary task is to predict emotional-attribute-based descriptors for arousal, valence, or dominance. The secondary task is to learn a common representation, where the train and test domains cannot be distinguished. By using a gradient reversal layer, the gradients coming from the domain classifier are used to bring the source and target domain representations closer. We show that exploiting unlabeled data consistently leads to better emotion recognition performance across all emotional dimensions. We visualize the effect of adversarial training on the feature representation across the proposed deep learning architecture. The analysis shows that the data representations for the train and test domains converge as the data are passed to deeper layers of the network. We also evaluate the difference in performance when we use a shallow neural network versus a deep neural network and the effect of the number of shared layers used by the task and domain classifiers.
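
The gradient reversal layer mentioned above is a small, well-known construct; a PyTorch sketch is shown below. The surrounding encoder and task heads are placeholders, not the authors' architecture.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the
    backward pass, so the encoder is trained to confuse the domain classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Inside a model's forward pass (sketch; encoder/heads are hypothetical modules):
#   shared = encoder(x)
#   emotion = emotion_head(shared)               # primary task (arousal/valence/dominance)
#   domain = domain_head(grad_reverse(shared))   # adversarial domain classifier
```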

135 citations


Posted Content
TL;DR: Zhang et al. as mentioned in this paper proposed a mutation testing framework for DL systems to measure the quality of test data, by defining a set of source-level mutation operators to inject faults to the source of DL (i.e., training data and training programs).
Abstract: Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance to gain confidence of the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, by sharing the same spirit of mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults to the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data could be evaluated from the analysis on to what extent the injected faults could be detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models.

Journal ArticleDOI
TL;DR: The proposed discriminative block-diagonal low-rank representation (BDLRR) method for recognition not only shows superior potential on image recognition but also outperforms the state-of-the-art methods.
Abstract: Existing block-diagonal representation studies mainly focus on casting block-diagonal regularization on training data, while little attention is dedicated to concurrently learning both block-diagonal representations of training and test data. In this paper, we propose a discriminative block-diagonal low-rank representation (BDLRR) method for recognition. In particular, the elaborate BDLRR is formulated as a joint optimization problem of shrinking the unfavorable representation from off-block-diagonal elements and strengthening the compact block-diagonal representation under the semisupervised framework of LRR. To this end, we first impose penalty constraints on the negative representation to eliminate the correlation between different classes such that the incoherence criterion of the extra-class representation is boosted. Moreover, a constructed subspace model is developed to enhance the self-expressive power of training samples and further build the representation bridge between the training and test samples, such that the coherence of the learned intraclass representation is consistently heightened. Finally, the resulting optimization problem is solved elegantly by employing an alternative optimization strategy, and a simple recognition algorithm on the learned representation is utilized for final prediction. Extensive experimental results demonstrate that the proposed method achieves superb recognition results on four face image data sets, three character data sets, and the 15 scene multicategories data set. It not only shows superior potential on image recognition but also outperforms the state-of-the-art methods.

Proceedings ArticleDOI
15 Jun 2018
TL;DR: In this paper, a deep network with multiple domain-specific classifiers, each associated with a source domain, is designed; at test time, the probabilities that a target sample belongs to each source domain are estimated and exploited to optimally fuse the classifiers' predictions.
Abstract: A long standing problem in visual object categorization is the ability of algorithms to generalize across different testing conditions. The problem has been formalized as a covariate shift among the probability distributions generating the training data (source) and the test data (target) and several domain adaptation methods have been proposed to address this issue. While these approaches have considered the single source-single target scenario, it is plausible to have multiple sources and require adaptation to any possible target domain. This last scenario, named Domain Generalization (DG), is the focus of our work. Differently from previous DG methods which learn domain invariant representations from source data, we design a deep network with multiple domain-specific classifiers, each associated to a source domain. At test time we estimate the probabilities that a target sample belongs to each source domain and exploit them to optimally fuse the classifiers predictions. To further improve the generalization ability of our model, we also introduced a domain agnostic component supporting the final classifier. Experiments on two public benchmarks demonstrate the power of our approach.
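
The fusion step described above (weighting each source-domain classifier by the estimated probability that the target sample comes from that domain) reduces to a weighted average of class probabilities; below is a NumPy sketch, with the array shapes as assumptions.

```python
import numpy as np

def fuse_predictions(class_probs_per_domain, domain_probs):
    """class_probs_per_domain: (n_domains, n_samples, n_classes) classifier outputs;
    domain_probs: (n_samples, n_domains) estimated source-domain membership.
    Returns fused class probabilities of shape (n_samples, n_classes)."""
    return np.einsum('dnc,nd->nc', class_probs_per_domain, domain_probs)
```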

Journal ArticleDOI
01 Apr 2018-Sensors
TL;DR: This paper proposes a one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs divide-and-conquer-based classifier learning coupled with test data sharpening, and shows that it outperforms both the two-stage 1D CNN-only method and other state-of-the-art approaches.
Abstract: Human Activity Recognition (HAR) aims to identify the actions performed by humans using signals collected from various sensors embedded in mobile devices. In recent years, deep learning techniques have further improved HAR performance on several benchmark datasets. In this paper, we propose one-dimensional Convolutional Neural Network (1D CNN) for HAR that employs a divide and conquer-based classifier learning coupled with test data sharpening. Our approach leverages a two-stage learning of multiple 1D CNN models; we first build a binary classifier for recognizing abstract activities, and then build two multi-class 1D CNN models for recognizing individual activities. We then introduce test data sharpening during prediction phase to further improve the activity recognition accuracy. While there have been numerous researches exploring the benefits of activity signal denoising for HAR, few researches have examined the effect of test data sharpening for HAR. We evaluate the effectiveness of our approach on two popular HAR benchmark datasets, and show that our approach outperforms both the two-stage 1D CNN-only method and other state of the art approaches.
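
"Test data sharpening" can be read as an unsharp-masking style enhancement of the raw sensor signal before prediction; the sketch below is one plausible interpretation, with alpha and sigma as assumed parameters rather than the paper's values.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def sharpen(signal, alpha=1.0, sigma=2.0):
    """Amplify the high-frequency detail of a 1D accelerometer signal:
    sharpened = signal + alpha * (signal - blurred)."""
    signal = np.asarray(signal, dtype=float)
    blurred = gaussian_filter1d(signal, sigma=sigma)
    return signal + alpha * (signal - blurred)
```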

Journal ArticleDOI
30 May 2018
TL;DR: This paper serves as a survey and empirical evaluation of the state-of-the-art in activity recognition methods using accelerometers, particularly focused on long-term activity recognition in real-world settings.
Abstract: This paper serves as a survey and empirical evaluation of the state-of-the-art in activity recognition methods using accelerometers. The paper is particularly focused on long-term activity recognition in real-world settings. In these environments, data collection is not a trivial matter; thus, there are performance trade-offs between prediction accuracy, which is not the sole system objective, and keeping the maintenance overhead at minimum levels. We examine research that has focused on the selection of activities, the features that are extracted from the accelerometer data, the segmentation of the time-series data, the locations of accelerometers, the selection and configuration trade-offs, the test/retest reliability, and the generalisation performance. Furthermore, we study these questions from an experimental platform and show, somewhat surprisingly, that many disparate experimental configurations yield comparable predictive performance on testing data. Our understanding of these results is that the experimental setup directly and indirectly defines a pathway for context to be delivered to the classifier, and that, in some settings, certain configurations are more optimal than alternatives. We conclude by identifying how the main results of this work can be used in practice, specifically in experimental configurations in challenging experimental conditions.

Journal ArticleDOI
TL;DR: This work explores the application of Bayesian analysis to determine the spatially varying soil properties in a slope from multiple sources of test data and demonstrates that the method is able to efficiently learn the random field and its parameters, as well as update the resulting reliability of the slope.

Book ChapterDOI
11 Jun 2018
TL;DR: An inverse classification approach whose principle consists in determining the minimal changes needed to alter a prediction: in an instance-based framework, given a data point whose classification must be explained, the proposed method consists in identifying a close neighbor classified differently, where the closeness definition integrates a sparsity constraint.
Abstract: In the context of post-hoc interpretability, this paper addresses the task of explaining the prediction of a classifier, considering the case where no information is available, neither on the classifier itself, nor on the processed data (neither the training nor the test data). It proposes an inverse classification approach whose principle consists in determining the minimal changes needed to alter a prediction: in an instance-based framework, given a data point whose classification must be explained, the proposed method consists in identifying a close neighbor classified differently, where the closeness definition integrates a sparsity constraint. This principle is implemented using observation generation in the Growing Spheres algorithm. Experimental results on two datasets illustrate the relevance of the proposed approach that can be used to gain knowledge about the classifier.
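
A simplified sketch of the inverse-classification idea follows: sample candidate points in spheres of growing radius around the query and return the closest one that the black-box classifier labels differently. The sparsity-enforcing post-processing from the paper is omitted, and all parameters are assumptions.

```python
import numpy as np

def growing_spheres_explanation(predict, x, step=0.1, n_draws=500,
                                max_radius=10.0, rng=np.random.default_rng(0)):
    """Return a nearby point classified differently from x by `predict`
    (a black-box classifier returning labels), or None if none is found."""
    y0 = predict(x[None])[0]
    radius = step
    while radius <= max_radius:
        candidates = x + rng.normal(size=(n_draws, x.size)) * radius
        labels = predict(candidates)
        flipped = candidates[labels != y0]
        if len(flipped):
            return flipped[np.argmin(np.linalg.norm(flipped - x, axis=1))]
        radius += step
    return None
```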

Journal ArticleDOI
TL;DR: This paper proposes a new and accurate spatio-temporal prediction method to replace missing values in remote sensing data sets and accompanies the open-source R package gapfill, which provides a flexible, fast, and ready-to-use implementation of the method.
Abstract: Continuous, consistent, and long time-series from remote sensing are essential to monitoring changes on Earth’s surface. However, analyzing such data sets is often challenging due to missing values introduced by cloud cover, missing orbits, sensor geometry artifacts, and so on. We propose a new and accurate spatio-temporal prediction method to replace missing values in remote sensing data sets. The method exploits the spatial coherence and temporal seasonal regularity that are inherent in many data sets. The key parts of the method are: 1) the adaptively chosen spatio-temporal subsets around missing values; 2) the ranking of images within the subsets based on a scoring algorithm; 3) the estimation of empirical quantiles characterizing the missing values; and 4) the prediction of missing values through quantile regression. One advantage of quantile regression is the robustness to outliers, which enables more accurate parameter retrieval in the analysis of remote sensing data sets. In addition, we provide bootstrap-based quantification of prediction uncertainties. The proposed prediction method was applied to a Normalized Difference Vegetation Index data set from the Moderate Resolution Imaging Spectroradiometer and assessed with realistic test data sets featuring between 20% and 50% missing values. Validation against established methods showed that the proposed method has a good performance in terms of the root-mean-squared prediction error and significantly outperforms its competitors. This paper is accompanied by the open-source R package gapfill , which provides a flexible, fast, and ready-to-use implementation of the method.

Journal ArticleDOI
TL;DR: An ensemble transfer learning framework is designed to improve classification accuracy when the training data are insufficient, and a weighted-resampling method for transfer learning, named TrResampling, is proposed.
Abstract: Transfer learning and ensemble learning are the new trends for solving the problem that training data and test data have different distributions. In this paper, we design an ensemble transfer learning framework to improve the classification accuracy when the training data are insufficient. First, a weighted-resampling method for transfer learning is proposed, which is named TrResampling. In each iteration, the data with heavy weights in the source domain are resampled, and the TrAdaBoost algorithm is used to adjust the weights of the source data and target data. Second, three classic machine learning algorithms, namely, naive Bayes, decision tree, and SVM, are used as the base learners of TrResampling, where the base learner with the best performance is chosen for transfer learning. To illustrate the performance of TrResampling, the TrAdaBoost and decision tree are used for evaluation and comparison on 15 UCI data sets, TrAdaBoost, ARTL, and SVM are used for evaluation and comparison on five text data sets. According to the experimental results, our proposed TrResampling is superior to the state-of-the-art learning methods on UCI data sets and text data sets. In addition, TrResampling, bagging-based transfer learning algorithm, and MultiBoosting-based transfer learning algorithm (TrMultiBoosting) are assembled in the framework, and we compare the three ensemble transfer learning algorithms with TrAdaBoost to illustrate the framework’s effective transfer ability.

Journal ArticleDOI
TL;DR: A hybrid algorithm based on transfer learning, Online Sequential Extreme Learning Machine with Kernels (OS-ELMK), and ensemble learning is proposed, along with its precise mathematical derivation, constructing a groundbreaking algorithm framework for transfer learning in time series forecasting.
Abstract: Recently, many excellent algorithms for time series prediction issues have been proposed, most of which are developed based on the assumption that sufficient training data and testing data under the same distribution are available. However, in reality, time-series data usually exhibit some kind of time-varying characteristic, which may lead to a wide variability between old data and new data. Hence, how to transfer knowledge over a long time span, when addressing time series prediction issues, poses serious challenges. To solve this problem, in this paper, a hybrid algorithm based on transfer learning, Online Sequential Extreme Learning Machine with Kernels (OS-ELMK), and ensemble learning, abbreviated as TrEnOS-ELMK, is proposed, along with its precise mathematic derivation. It aims to make the most of, rather than discard, the adequate long-ago data, and constructs an algorithm framework for transfer learning in time series forecasting, which is groundbreaking. Inspired by the preferable performance of models ensemble, ensemble learning scheme is also incorporated into our proposed algorithm, where the weights of the constituent models are adaptively updated according to their performances on fresh samples. Compared to many existing time series prediction methods, the newly proposed algorithm takes long-ago data into consideration and can effectively leverage the latent knowledge implied in these data for current prediction. In addition, TrEnOS-ELMK naturally inherits merits of both OS-ELMK and ensemble learning due to its incorporation of the two techniques. Experimental results on three synthetic and six real-world datasets demonstrate the effectiveness of the proposed algorithm.

Journal ArticleDOI
TL;DR: A new structural damage detection and localization method is proposed, using a density peaks-based fast clustering algorithm modified into an unsupervised machine learning method; it shows satisfactory performance with regard to damage localization under various damage scenarios as compared to a traditional approach.
Abstract: Within machine learning, several structural damage detection and localization methods based on clustering and novelty detection methods have been proposed in the recent years in order to monitor mechanical and civil structures. In order to train a machine learning model, an unsupervised mode is preferred because it only requires sufficient normal data from the intact states of a structure for training, and the testing abnormal data from various damage states are generally quite rare. With an unsupervised training mode, the capability of detecting structural damage mainly depends on the identification of abnormal data from the testing data. This identification process is termed unsupervised novelty detection. The premise of unsupervised novelty detection is that a large volume of a normal data set is available first to train a normal model that is established by machine learning algorithms. Then, the trained normal model can be used to identify abnormal data from future testing data. In this article, a new...

Journal ArticleDOI
TL;DR: A validation framework for accuracy assessment of multi-temporal built-up land layers, using integrated public parcel and building records as validation data, is developed; it shows very encouraging accuracy measures that vary across study areas, generally improve over time, but exhibit very distinct patterns across the rural-urban trajectories.

Journal ArticleDOI
Zizhao Zhang1, Haojie Lin1, Xibin Zhao1, Rongrong Ji2, Yue Gao1 
TL;DR: An inductive multi-hypergraph learning algorithm, which targets on learning an optimal projection for the multi-modal training data and has the potential to be applied in other applications in practice.
Abstract: The wide 3D applications have led to increasing amount of 3D object data, and thus effective 3D object classification technique has become an urgent requirement. One important and challenging task for 3D object classification is how to formulate the 3D data correlation and exploit it. Most of the previous works focus on learning optimal pairwise distance metric for object comparison, which may lose the global correlation among 3D objects. Recently, a transductive hypergraph learning has been investigated for classification, which can jointly explore the correlation among multiple objects, including both the labeled and unlabeled data. Although these methods have shown better performance, they are still limited due to 1) a considerable amount of testing data may not be available in practice and 2) the high computational cost to test new coming data. To handle this problem, considering the multi-modal representations of 3D objects in practice, we propose an inductive multi-hypergraph learning algorithm, which targets on learning an optimal projection for the multi-modal training data. In this method, all the training data are formulated in multi-hypergraph based on the features, and the inductive learning is conducted to learn the projection matrices and the optimal multi-hypergraph combination weights simultaneously. Different from the transductive learning on hypergraph, the high cost training process is off-line, and the testing process is very efficient for the inductive learning on hypergraph. We have conducted experiments on two 3D benchmarks, i.e. , the NTU and the ModelNet40 data sets, and compared the proposed algorithm with the state-of-the-art methods and traditional transductive multi-hypergraph learning methods. Experimental results have demonstrated that the proposed method can achieve effective and efficient classification performance. We also note that the proposed method is a general framework and has the potential to be applied in other applications in practice.

Journal ArticleDOI
19 Sep 2018
TL;DR: This study categorized the quality of predictions for the test set or true external set into three groups (good, moderate, and bad) based on absolute prediction errors and found that using the most frequently appearing weighting scheme 0.5–0–0.5, the composite score-based categorization showed concordance with absolute prediction error- based categorization for more than 80% test data points.
Abstract: Quantitative structure-activity relationship (QSAR) models have long been used for making predictions and data gap filling in diverse fields including medicinal chemistry, predictive toxicology, environmental fate modeling, materials science, agricultural science, nanoscience, food science, and so forth. Usually a QSAR model is developed based on chemical information of a properly designed training set and corresponding experimental response data while the model is validated using one or more test set(s) for which the experimental response data are available. However, it is interesting to estimate the reliability of predictions when the model is applied to a completely new data set (true external set) even when the new data points are within applicability domain (AD) of the developed model. In the present study, we have categorized the quality of predictions for the test set or true external set into three groups (good, moderate, and bad) based on absolute prediction errors. Then, we have used three criteria [(a) mean absolute error of leave-one-out predictions for 10 most close training compounds for each query molecule; (b) AD in terms of similarity based on the standardization approach; and (c) proximity of the predicted value of the query compound to the mean training response] in different weighting schemes for making a composite score of predictions. It was found that using the most frequently appearing weighting scheme 0.5-0-0.5, the composite score-based categorization showed concordance with absolute prediction error-based categorization for more than 80% test data points while working with 5 different datasets with 15 models for each set derived in three different splitting techniques. These observations were also confirmed with true external sets for another four endpoints suggesting applicability of the scheme to judge the reliability of predictions for new datasets. The scheme has been implemented in a tool "Prediction Reliability Indicator" available at http://dtclab.webs.com/software-tools and http://teqip.jdvu.ac.in/QSAR_Tools/DTCLab/, and the tool is presently valid for multiple linear regression models only.

Journal ArticleDOI
TL;DR: The maximum entropy model (Maxent) was adopted as the base classifier in the transfer learning model, and experimental results showed that good classification performance was obtained based on the transfer learning model.

Journal ArticleDOI
TL;DR: This work designs a convolutional neural network to differentiate between different morphology classes using sources from the Radio Galaxy Zoo (RGZ) citizen science project, and explores the factors that affect the performance of such neural networks, such as the amount of training data, number and nature of layers, and the hyperparameters.
Abstract: Machine learning techniques have been increasingly useful in astronomical applications over the last few years, for example in the morphological classification of galaxies. Convolutional neural networks have proven to be highly effective in classifying objects in image data. In the context of radio-interferometric imaging in astronomy, we looked for ways to identify multiple components of individual sources. To this effect, we design a convolutional neural network to differentiate between different morphology classes using sources from the Radio Galaxy Zoo (RGZ) citizen science project. In this first step, we focus on exploring the factors that affect the performance of such neural networks, such as the amount of training data, number and nature of layers, and the hyperparameters. We begin with a simple experiment in which we only differentiate between two extreme morphologies, using compact and multiple-component extended sources. We found that a three-convolutional layer architecture yielded very good results, achieving a classification accuracy of 97.4 per cent on a test data set. The same architecture was then tested on a four-class problem where we let the network classify sources into compact and three classes of extended sources, achieving a test accuracy of 93.5 per cent. The best-performing convolutional neural network set-up has been verified against RGZ Data Release 1 where a final test accuracy of 94.8 per cent was obtained, using both original and augmented images. The use of sigma clipping does not offer a significant benefit overall, except in cases with a small number of training images.
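
For orientation, a three-convolutional-layer Keras network of the kind described above might look like the sketch below; the filter counts, image size, and pooling choices are assumptions, not the architecture reported in the paper.

```python
import tensorflow as tf

def build_morphology_cnn(input_shape=(128, 128, 1), n_classes=4):
    """Three-conv-layer classifier sketch for compact vs. extended radio sources."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(128, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# model = build_morphology_cnn()
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```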

Journal ArticleDOI
TL;DR: A diagnosis system for a power plant gas turbine has been developed to detect the deterioration of engine performance, and an artificial neural network model was built to predict the deteriorated characteristics of the gas turbine.

Posted Content
TL;DR: The in-depth evaluation of the proposed DeepGauge testing criteria is demonstrated on two well-known datasets, five DL systems, with four state-of-the-art adversarial data generation techniques, which sheds light on the construction of robust DL systems.
Abstract: Deep learning defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neuron network through a set of training data. Deep learning (DL) has been widely adopted in many safety-critical scenarios. However, a plethora of studies have shown that the state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the robustness of a DL system against adversarial attacks is usually measured by the accuracy of test data. Considering the limitation of accessible test data, good performance on test data can hardly guarantee the robustness and generality of DL systems. Different from traditional software systems which have clear and controllable logic and functionality, a DL system is trained with data and lacks thorough understanding. This makes it difficult for system analysis and defect detection, which could potentially hinder its real-world deployment without safety guarantees. In this paper, we propose DeepGauge, a comprehensive and multi-granularity testing criteria for DL systems, which renders a complete and multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, with four state-of-the-art adversarial data generation techniques. The effectiveness of DeepGauge sheds light on the construction of robust DL systems.
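
As a toy illustration of coverage-style test criteria, the snippet below computes plain neuron coverage from recorded activations; DeepGauge's actual criteria are multi-granularity (e.g., k-multisection and boundary coverage) and are not reproduced here.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    """activations: (n_test_inputs, n_neurons) post-activation values recorded on
    the test set. Returns the fraction of neurons activated above the threshold
    by at least one test input (threshold is an assumed parameter)."""
    covered = (np.asarray(activations) > threshold).any(axis=0)
    return covered.mean()
```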

Journal ArticleDOI
TL;DR: Extensive experimental results show that the DAFTL can identify the bearing fault accurately under variable working conditions and outperforms other competitive approaches.
Abstract: Bearings, as universal components, have been widely used in the important position of rotating machinery. However, due to the distribution divergence between training data and test data caused by variable working conditions, such as different rotation speeds and load conditions, most of the fault diagnosis models built during the training stage are not applicable for the detection in the test stage. The models dramatically lead to the performance degradation for fault classification. In this paper, a novel bearing fault diagnosis method, domain adaptation by using feature transfer learning (DAFTL) under variable working conditions, is proposed to solve this performance degradation issue. The dataset of normal bearings and faulty bearings are obtained via the fast Fourier transformation of raw vibration signals, under different motor speeds and load conditions. Then, the marginal and conditional distributions are reduced simultaneously between training data and test data by refining pseudo test labels based on the maximum mean discrepancy and domain invariant clustering in a common space. Ultimately, a transferable feature representation for training data and test data is achieved. With the help of the nearest-neighbor classifier built on the transferable features, bearing faults are identified in this common space. Extensive experimental results show that the DAFTL can identify the bearing fault accurately under variable working conditions and outperforms other competitive approaches.
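
The maximum mean discrepancy used above to align training- and test-domain features can be sketched directly; the RBF kernel and bandwidth below are assumptions, and the clustering and pseudo-labeling parts of the method are omitted.

```python
import numpy as np

def mmd2_rbf(Xs, Xt, gamma=1.0):
    """Biased estimate of squared MMD between source features Xs and target
    features Xt under an RBF kernel; smaller values mean better-aligned domains."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2.0 * k(Xs, Xt).mean()
```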

Posted Content
TL;DR: There is a high risk that a statistically significant difference in this type of evaluation is not due to a superior learning approach but is instead due to chance.
Abstract: Developing state-of-the-art approaches for specific tasks is a major driving force in our research community. Depending on the prestige of the task, publishing it can come along with a lot of visibility. The question arises how reliable are our evaluation methodologies to compare approaches? One common methodology to identify the state-of-the-art is to partition data into a train, a development and a test set. Researchers can train and tune their approach on some part of the dataset and then select the model that worked best on the development set for a final evaluation on unseen test data. Test scores from different approaches are compared, and performance differences are tested for statistical significance. In this publication, we show that there is a high risk that a statistical significance in this type of evaluation is not due to a superior learning approach. Instead, there is a high risk that the difference is due to chance. For example for the CoNLL 2003 NER dataset we observed in up to 26% of the cases type I errors (false positives) with a threshold of p < 0.05, i.e., falsely concluding a statistically significant difference between two identical approaches. We prove that this evaluation setup is unsuitable to compare learning approaches. We formalize alternative evaluation setups based on score distributions.
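
The recommendation to compare score distributions rather than single selected scores can be illustrated with a small simulation; the numbers below are made up and only show the mechanics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two "approaches" that are actually identical: runs differ only through training
# noise (seeds, tuning). Comparing one cherry-picked score per approach invites
# false positives; comparing many runs per approach is the safer setup.
scores_a = rng.normal(loc=0.90, scale=0.01, size=20)  # 20 runs of approach A
scores_b = rng.normal(loc=0.90, scale=0.01, size=20)  # 20 runs of the same approach
print("p-value over score distributions:",
      stats.ttest_ind(scores_a, scores_b).pvalue)
```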