
Showing papers on "Test data" published in 2016


Journal ArticleDOI
TL;DR: This paper discusses the modular approach to atomistic machine learning through the development of the open-source Atomistic Machine-learning Package (Amp), which allows for representations of both the total and atom-centered potential energy surface, in both periodic and non-periodic systems.

302 citations


Journal ArticleDOI
TL;DR: The algorithm described in this paper can be used to add final-state radiation to events generated by external software using a selected event record format, and can also be run on a sample of events loaded from a data file.

272 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated a support vector machine based fault-type and distance estimation scheme for a long transmission line using the post-fault single-cycle current waveform; pre-processing of the samples is done by wavelet packet transform.

131 citations


Journal ArticleDOI
TL;DR: The Monte Carlo simulation method is utilized to compute the DFT model with consideration of the system replacement policy, and the results show that this integrated approach is more flexible and effective for assessing the reliability of complex dynamic systems.

116 citations


Journal ArticleDOI
TL;DR: The experiment showed that the variable selection methods used in data mining could improve the performance of clinical prediction models.

113 citations


Journal ArticleDOI
TL;DR: In this article, a new crack size quantification method based on in-situ Lamb wave testing and a Bayesian method is presented, which uses coupon tests to develop a baseline quantification model between the crack size and damage-sensitive features.

109 citations


Journal ArticleDOI
TL;DR: In this article, the potential of using free-of-charge Sentinel-1 Synthetic Aperture Radar (SAR) imagery for land cover mapping in urban areas is investigated.
Abstract: In this paper, the potential of using free-of-charge Sentinel-1 Synthetic Aperture Radar (SAR) imagery for land cover mapping in urban areas is investigated. To this aim, we use dual-pol (VV+VH) Interferometric Wide swath mode (IW) data collected on 16 September 2015 along a descending orbit over the Istanbul megacity, Turkey. Data have been calibrated, terrain corrected, and filtered by a 5x5 kernel using the gamma map approach. During terrain correction using a 25 m resolution SRTM DEM, the SAR data have been resampled to a pixel spacing of 20 m. The Support Vector Machine (SVM) method has been implemented as a supervised pixel-based image classifier. During the classification, different scenarios have been applied to assess the performance of Sentinel-1 data. The training and test data have been collected from high-resolution Google Earth imagery. Different combinations of VV and VH polarizations have been analysed and the resulting classified images have been assessed using overall classification accuracy and the Kappa coefficient. Results demonstrate that, by suitably combining the dual-polarization data, the overall accuracy increases up to 93.28%, against 73.85% and 70.74% when using the individual VV and VH polarizations, respectively. Our preliminary analysis points out that dual-polarimetric Sentinel-1 SAR data can be effectively exploited for producing accurate land cover maps, with relevant advantages for urban planning and management of large cities.
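The classification step above follows a standard supervised pixel-based workflow. A minimal scikit-learn sketch, assuming the speckle-filtered VV/VH backscatter is already available as raster arrays (the arrays and class labels below are synthetic placeholders, not the Istanbul dataset):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)

# Stand-ins for calibrated, terrain-corrected, speckle-filtered backscatter (dB)
vv = rng.normal(-12.0, 3.0, size=(60, 60))
vh = rng.normal(-18.0, 3.0, size=(60, 60))

# Stack the polarizations as per-pixel features: one row per pixel
X = np.column_stack([vv.ravel(), vh.ravel()])
y = rng.integers(0, 4, size=X.shape[0])   # placeholder land-cover labels

# Split the labeled pixels (the paper samples these from Google Earth imagery)
n_train = X.shape[0] // 2
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X[:n_train], y[:n_train])

pred = clf.predict(X[n_train:])
print("overall accuracy:", accuracy_score(y[n_train:], pred))
print("kappa:", cohen_kappa_score(y[n_train:], pred))
```

Dropping one of the two feature columns reproduces the single-polarization scenarios compared in the paper.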

104 citations


Journal ArticleDOI
TL;DR: In this paper, the ability of artificial neural network (ANN), adaptive neuro fuzzy inference system (ANFIS), multivariate adaptive regression splines (MARS) and M5 Model Tree (M5Tree) techniques to predict ultimate conditions of fiber-reinforced polymer (FRP)-confined concrete was studied.
Abstract: This paper studies the ability of artificial neural network (ANN), adaptive neuro fuzzy inference system (ANFIS), multivariate adaptive regression splines (MARS) and M5 Model Tree (M5Tree) techniques to predict the ultimate conditions of fiber-reinforced polymer (FRP)-confined concrete. A large experimental database consisting of over 1000 axial compression test results of FRP-confined concrete specimens assembled from the published literature was used to train, test, and validate the models. The modeling results show that the ANN, ANFIS, MARS and M5Tree models fit the experimental test data well. The M5Tree model performs better than the remaining models in predicting the hoop strain reduction factor and strength enhancement ratio, whereas the ANN model provides the most accurate estimates of the strain enhancement ratio. The performances of the proposed models are also compared with those of existing conventional and evolutionary algorithm models; the comparison indicates that the proposed ANN, ANFIS, MARS and M5Tree models exhibit improved accuracy over the existing models. The predictions of each proposed model are subsequently used to establish the interdependence of critical parameters and their influence on the behavior of FRP-confined concrete, which is discussed in the paper.
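As a rough illustration of this kind of model comparison, the sketch below trains a neural network and a tree regressor on synthetic data. scikit-learn has no ANFIS, MARS or M5Tree implementation, so DecisionTreeRegressor stands in for the tree-based model, and the inputs and target are invented placeholders rather than the paper's 1000-test database:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 1000
# Placeholder inputs: unconfined strength, FRP stiffness, jacket thickness
X = rng.uniform([20, 50, 0.1], [60, 300, 2.0], size=(n, 3))
# Placeholder target: a strength enhancement ratio with noise
y = 1 + 3.5 * X[:, 1] * X[:, 2] / (X[:, 0] * 100) + rng.normal(0, 0.05, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
              DecisionTreeRegressor(max_depth=6, random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "test MAE:",
          round(mean_absolute_error(y_te, model.predict(X_te)), 4))
```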

97 citations


Journal ArticleDOI
TL;DR: In this article, a meta-learning based framework, termed Building Energy Model Recommendation System (BEMR), is proposed to forecast the building energy profiles based on building characteristics, including physical features as well as statistical and time series meta-features extracted from the operational data and energy consumption data.

84 citations


Journal ArticleDOI
TL;DR: Experimental results show that RTBDG not only uses the limited energy of network nodes efficiently, but also balances the energy consumption of all nodes, and will be widely applied to risk analysis in different industrial operations.
Abstract: The era of big data has begun and an enormous amount of real-time data is used for the risk analysis of various industrial applications. However, a technical challenge exists in gathering real-time big data in a complex indoor industrial environment. Indoor wireless sensor network (WSN) technology can overcome this limitation by collecting the big data generated from source nodes and transmitting them to the data center in real time. In this study, typical residence, office, and manufacturing environments were chosen. The signal transmission characteristics of an indoor WSN were obtained by analyzing the test data. According to these characteristics, a real-time big data gathering (RTBDG) algorithm based on an indoor WSN is proposed for the risk analysis of industrial operations. In this algorithm, sensor nodes can screen the data collected from the environment and equipment according to the requirements of risk analysis. A clustering data transmission structure is then established on the basis of the received signal strength indicator (RSSI) and residual energy information. Experimental results show that RTBDG not only uses the limited energy of network nodes efficiently, but also balances the energy consumption of all nodes. In the near future, the algorithm will be widely applied to risk analysis in different industrial operations.
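The abstract does not give the exact RTBDG formulas, so the following is only a hedged sketch of the cluster-head election idea it describes: each candidate node is scored on normalized RSSI and residual energy, with assumed weights rather than the paper's.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    rssi_dbm: float   # received signal strength toward the sink
    energy_j: float   # residual battery energy

def elect_cluster_head(nodes, w_rssi=0.4, w_energy=0.6):
    """Pick the node with the best weighted, min-max normalized score."""
    rssi_vals = [n.rssi_dbm for n in nodes]
    e_vals = [n.energy_j for n in nodes]
    r_min, r_max = min(rssi_vals), max(rssi_vals)
    e_min, e_max = min(e_vals), max(e_vals)

    def score(n):
        # 'or 1.0' guards against a zero range when all values are equal
        r = (n.rssi_dbm - r_min) / ((r_max - r_min) or 1.0)
        e = (n.energy_j - e_min) / ((e_max - e_min) or 1.0)
        return w_rssi * r + w_energy * e

    return max(nodes, key=score)

cluster = [Node(1, -70, 3.2), Node(2, -55, 1.1), Node(3, -60, 2.8)]
print("cluster head:", elect_cluster_head(cluster).node_id)
```

Weighting residual energy more heavily is one plausible way to balance consumption across nodes, which is the behavior the experiments report.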

83 citations


Proceedings ArticleDOI
16 May 2016
TL;DR: A novel algorithm to produce descriptive online 3D occupancy maps using Gaussian processes, which may serve both as an improved-accuracy classifier, and as a predictive tool to support autonomous navigation.
Abstract: We present a novel algorithm to produce descriptive online 3D occupancy maps using Gaussian processes (GPs). GP regression and classification have met with recent success in their application to robot mapping, as GPs are capable of expressing rich correlation among map cells and sensor data. However, their cubic computational complexity has limited their application to large-scale mapping and online use. In this paper we address this issue first by proposing test-data octrees, octrees within blocks of the map that prune away nodes of the same state, condensing the number of test data used in a regression, in addition to allowing fast data retrieval. We also propose a nested Bayesian committee machine which, after new sensor data is partitioned among several GP regressions, fuses the result and updates the map with greatly reduced complexity. Finally, by adjusting the range of influence of the training data and tuning a variance threshold implemented in our method's binary classification step, we are able to control the richness of inference achieved by GPs - and its tradeoff with classification accuracy. The performance of the proposed approach is evaluated with both simulated and real data, demonstrating that the method may serve both as an improved-accuracy classifier, and as a predictive tool to support autonomous navigation.
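A toy scikit-learn sketch of the underlying regress-then-threshold idea (the paper's test-data octrees and nested Bayesian committee machine are omitted, and the variance threshold below is an assumption):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Training data: sensed cells with occupancy in {-1 free, +1 occupied}
X_train = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0], [4.5, 4.0]])
y_train = np.array([-1.0, -1.0, 1.0, 1.0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1)
gp.fit(X_train, y_train)

# Query (test) cells on a grid; predictive variance gates the classification
xx, yy = np.meshgrid(np.linspace(0, 5, 6), np.linspace(0, 5, 6))
X_test = np.column_stack([xx.ravel(), yy.ravel()])
mean, std = gp.predict(X_test, return_std=True)

VAR_THRESH = 0.8   # assumed threshold; tuning it trades richness vs. accuracy
labels = np.where(std > VAR_THRESH, 0, np.sign(mean))   # 0 = unknown
print(labels.reshape(6, 6))
```

Cells far from any observation keep high predictive variance and stay unknown, which mirrors the richness/accuracy tradeoff the paper controls with its threshold.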

Journal ArticleDOI
TL;DR: The Bayesian network has a better performance than the other algorithms in terms of the percentage of correctly identified instances and Kappa values for both the training data and test data, in the sense that the Bayesian network is relatively efficient and generalizable in the context of GPS data imputation.
Abstract: Global Positioning System (GPS) technologies have been increasingly considered as an alternative to traditional travel survey methods for collecting activity-travel data. Algorithms applied to extract activity-travel patterns vary from informal ad-hoc decision rules to advanced machine learning methods and differ in accuracy. This paper systematically compares the relative performance of different algorithms for the detection of transportation modes and activity episodes. In particular, naive Bayesian, Bayesian network, logistic regression, multilayer perceptron, support vector machine, decision table, and C4.5 algorithms are selected and compared on the same data according to their overall error rates and hit ratios. Results show that the Bayesian network performs better than the other algorithms in terms of the percentage of correctly identified instances and Kappa values for both the training data and test data, in the sense that the Bayesian network is relatively efficient and generalizable in the context of GPS data imputation.
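A compact sketch of this comparison protocol with scikit-learn. Bayesian networks, decision tables and C4.5 have no direct scikit-learn equivalents, so only a subset of the algorithms is shown, and the features and mode labels are synthetic stand-ins for GPS-derived attributes:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 4))                    # e.g., speed, acceleration, heading change
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # placeholder mode labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = [GaussianNB(), LogisticRegression(max_iter=1000),
          MLPClassifier(max_iter=2000, random_state=0),
          SVC(), DecisionTreeClassifier(random_state=0)]
for m in models:
    m.fit(X_tr, y_tr)
    p = m.predict(X_te)
    print(f"{type(m).__name__:24s} acc={accuracy_score(y_te, p):.3f} "
          f"kappa={cohen_kappa_score(y_te, p):.3f}")
```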

Journal ArticleDOI
TL;DR: In this article, a Markov chain Monte Carlo technique with adaptive random-walk steps is proposed to draw the samples for model parameter uncertainty quantification, and an iterated improved reduced system technique is employed to update the prediction error as well as to calculate the likelihood function in the sampling process.

Posted Content
TL;DR: The BookTest is proposed, a new dataset similar to the popular Children's Book Test but more than 60 times larger; training on this data improves the accuracy of the Attention-Sum Reader model on the original CBT test data by a much larger margin than recent attempts to improve the model architecture.
Abstract: There is a practically unlimited amount of natural language data available. Still, recent work in text comprehension has focused on datasets which are small relative to current computing possibilities. This article makes a case for the community to move to larger data and, as a step in that direction, proposes the BookTest, a new dataset similar to the popular Children's Book Test (CBT) but more than 60 times larger. We show that training on the new data improves the accuracy of our Attention-Sum Reader model on the original CBT test data by a much larger margin than many recent attempts to improve the model architecture. On one version of the dataset our ensemble even exceeds the human baseline provided by Facebook. We then show in our own human study that there is still space for further improvement.
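For reference, the attention-sum mechanism at the heart of the Attention-Sum Reader can be written in a few lines of numpy: a candidate's score is the sum of attention weights over all positions where it occurs in the context. The embeddings below are random placeholders rather than trained encoder states:

```python
import numpy as np

rng = np.random.default_rng(3)
context = ["the", "cat", "sat", "on", "the", "mat"]
candidates = ["cat", "mat"]

d = 8
token_states = rng.normal(size=(len(context), d))   # contextual embeddings
query_state = rng.normal(size=d)                    # question embedding

# Softmax attention of the query over the context positions
logits = token_states @ query_state
att = np.exp(logits - logits.max())
att /= att.sum()

# Attention-sum: aggregate weights across repeated occurrences of a candidate
scores = {c: att[[i for i, t in enumerate(context) if t == c]].sum()
          for c in candidates}
print(max(scores, key=scores.get), scores)
```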

Journal ArticleDOI
TL;DR: Experimental results show that the DCAE outperforms typical drift correction algorithms and autoencoder-based transfer learning methods and can improve the robustness of e-nose systems and greatly enhance their performance in real-world applications.
Abstract: Electronic noses (e-noses) are instruments that can be used to measure gas samples conveniently. Based on the measured signal, the type and concentration of the gas can be predicted by pattern recognition algorithms. However, e-noses are often affected by influential factors such as instrumental variation and time-varying drift. From the viewpoint of pattern recognition, these factors make the posterior distribution of the test data drift from that of the training data and thus degrade the accuracy of the prediction models. In this paper, we propose the drift correction autoencoder (DCAE) to address this problem. DCAE learns to model and correct the influential factors explicitly with the help of transfer samples. It generates drift-corrected and discriminative representations of the original data, which can then be applied to various prediction algorithms. We evaluate DCAE on data sets with instrumental variation and complex time-varying drift. Prediction models are trained on samples collected with one device or in the initial time period, then tested on other devices or time periods. Experimental results show that DCAE outperforms typical drift correction algorithms and autoencoder-based transfer learning methods. It can improve the robustness of e-nose systems and greatly enhance their performance in real-world applications.
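A loose PyTorch sketch of the transfer-sample idea: an autoencoder is trained on paired transfer samples to map drifted responses back to their reference-condition counterparts. This is a simplified stand-in, not the paper's exact DCAE architecture or loss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, dim = 128, 16
reference = torch.randn(n, dim)            # responses on the master device
drift = 0.3 * torch.randn(1, dim) + 0.1    # placeholder systematic drift
drifted = reference + drift                # same transfer samples seen after drift

model = nn.Sequential(
    nn.Linear(dim, 8), nn.ReLU(),          # encoder
    nn.Linear(8, dim),                     # decoder
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(500):
    opt.zero_grad()
    loss = loss_fn(model(drifted), reference)   # learn the correction mapping
    loss.backward()
    opt.step()

print("final reconstruction MSE:", loss.item())
```

Once trained, the same mapping would be applied to unseen drifted measurements before running the downstream classifier, which is the role the corrected representation plays in the paper.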

Journal ArticleDOI
TL;DR: Bayesian approaches for probabilistic characterization of the ISV of φ′ are developed using indirect test data and prior knowledge, and the suitability of MCMCS in Bayesian probabilistic characterization of soil properties is highlighted.

Journal ArticleDOI
01 Dec 2016
TL;DR: Meta-heuristics can be used effectively for hard problems with large search spaces, and the approximation level + branch distance fitness function generally guides the algorithms accurately.
Highlights: This paper applies meta-heuristic algorithms to the software test data generation problem. Different meta-heuristics were employed to analyze their performance on test data generation. A control parameter sensitivity analysis was performed on the algorithms. Various fitness functions based on different coverage approaches were compared.

Abstract: The cost of testing activities is a major portion of the total cost of software. In testing, generating test data is very important because the efficiency of testing is highly dependent on the data used in this phase. In search-based software testing, soft computing algorithms explore test data in order to maximize a coverage metric, which can be considered an optimization problem. In this paper, we employed several meta-heuristics (Artificial Bee Colony, Particle Swarm Optimization, Differential Evolution and Firefly algorithms) and the Random Search algorithm to solve this optimization problem. First, the dependency of the algorithms on the values of the control parameters was analyzed and suitable values for the control parameters were recommended. The algorithms were then compared using various fitness functions (path-based, dissimilarity-based and approximation level + branch distance), because the fitness function affects the behaviour of the algorithms in the search space. Results showed that meta-heuristics can be used effectively for hard problems with large search spaces, and that the approximation level + branch distance fitness function generally guides the algorithms accurately.
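The branch-distance component of the third fitness function can be illustrated with a textbook example; the program under test and the search budget below are assumptions, with plain random search as the simplest baseline from the paper's comparison:

```python
import random

def program_under_test(a, b):
    if a == b:                 # target branch: cover the 'true' edge
        return "equal"
    return "different"

def branch_distance(a, b):
    # Classic distance for the predicate a == b: zero when satisfied,
    # otherwise it grows with |a - b|, giving the search a gradient.
    return abs(a - b)

def random_search(budget=10_000, lo=-1000, hi=1000):
    best, best_fit = None, float("inf")
    for _ in range(budget):
        a, b = random.randint(lo, hi), random.randint(lo, hi)
        fit = branch_distance(a, b)
        if fit < best_fit:
            best, best_fit = (a, b), fit
        if fit == 0:           # branch covered; stop searching
            break
    return best, best_fit

random.seed(4)
inputs, fit = random_search()
print("inputs:", inputs, "fitness:", fit, "->", program_under_test(*inputs))
```

A meta-heuristic such as ABC or PSO would replace the uniform sampling with guided moves over the same fitness landscape; the approximation level term additionally counts how many control-flow decisions away the execution diverged from the target.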

Journal ArticleDOI
TL;DR: A model is proposed here that allows for variable working speed, and an illustration of the model using the Amsterdam Chess Test data is provided.
Abstract: With computerized testing, it is possible to record both the responses of test takers to test questions (i.e., items) and the amount of time spent by a test taker in responding to each question. Various models have been proposed that take into account both test-taker ability and working speed, with many models assuming a constant working speed throughout the test. The constant working speed assumption may be inappropriate for various reasons. For example, a test taker may need to adjust the pace due to time mismanagement, or a test taker who started out working too fast may reduce the working speed to improve accuracy. A model is proposed here that allows for variable working speed. An illustration of the model using the Amsterdam Chess Test data is provided.
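A small simulation sketch of the idea, assuming van der Linden's lognormal response-time model with a linear trend added to the speed parameter; all parameter values are illustrative, not estimates from the Amsterdam Chess Test data:

```python
import numpy as np

rng = np.random.default_rng(5)
n_items = 40
beta = rng.normal(4.0, 0.3, n_items)    # item time intensities (log seconds)
alpha = 1.5                             # common time-discrimination parameter

tau0, trend = 0.0, 0.02                 # initial speed and per-item change
tau = tau0 + trend * np.arange(n_items) # variable working speed over the test

# log T_j ~ Normal(beta_j - tau_j, 1 / alpha^2): higher speed, shorter times
log_t = rng.normal(beta - tau, 1.0 / alpha)
times = np.exp(log_t)

print("mean RT, first 10 items:", round(float(times[:10].mean()), 1), "s")
print("mean RT, last 10 items :", round(float(times[-10:].mean()), 1), "s")
```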

Journal ArticleDOI
TL;DR: In this paper, a Bayesian sequential updating (BSU) approach is proposed for probabilistic characterization of geotechnical parameters based on multi-source information, including the information available prior to the project, referred to as prior knowledge in the Bayesian framework, and the results of different types of tests that might be performed at different locations in a soil layer within a specific site.

Journal ArticleDOI
TL;DR: An effective and innovative intelligent optimization model based on a nonlinear support vector machine and a hard penalty function is proposed to forecast solar radiation, converting the support vector machine into a regularization problem with a ridge penalty and using the glowworm swarm optimization algorithm to determine the optimal parameters of the model.

Journal ArticleDOI
15 May 2016 - Energy
TL;DR: In this paper, an ANN (artificial neural network) model for a solid desiccant - vapor compression hybrid air-conditioning system is developed to predict the cooling capacity, power input and COP (coefficient of performance) of the system.

Journal ArticleDOI
TL;DR: In this article, an early stopping mechanism is used for small training data sets, because it reliably overcomes overfitting problems, and two strategies for selecting suitable input variables are demonstrated.

Journal ArticleDOI
TL;DR: A freely available MATLAB code for the simulation of electron transport in arbitrary gas mixtures in the presence of uniform electric fields, allowing the tracing and visualization of the spatiotemporal evolution of electron swarms and the temporal development of the mean energy and the electron number due to attachment and/or ionization processes.

Journal ArticleDOI
TL;DR: This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
Abstract: Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
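The experimental design can be reproduced in outline as below; scikit-learn's DecisionTreeRegressor (with max_leaf_nodes as a rough proxy for the number of rules) stands in for the Cubist-style regression tree, and the data are synthetic rather than Landsat/MODIS NDVI:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(2000, 4))             # stand-in spectral bands
y = 200 * (0.6 * X[:, 0] + 0.4 * X[:, 1] ** 2)    # stand-in scaled NDVI (0-200)

# Sweep training fraction and model complexity, tracking MAD on both splits
for frac in (0.6, 0.7, 0.8):
    for rules in (2, 6, 12):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=frac, random_state=0)
        tree = DecisionTreeRegressor(max_leaf_nodes=rules, random_state=0)
        tree.fit(X_tr, y_tr)
        mad_tr = mean_absolute_error(y_tr, tree.predict(X_tr))
        mad_te = mean_absolute_error(y_te, tree.predict(X_te))
        print(f"train={frac:.0%} rules={rules:2d} "
              f"MAD_training={mad_tr:5.2f} MAD_testing={mad_te:5.2f}")
```

Randomizing the split over many replications, as the paper's Python procedure does, would additionally expose the variability of the MAD for each configuration.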

Journal ArticleDOI
01 Oct 2016
TL;DR: A discrete wavelet transform (DWT) is employed for reducing the amount of long-term reservoir pressure data obtained for eight different reservoir models, and a multi-layer perceptron neural network (MLPNN) is developed to recognize reservoir models using the reduced pressure data.
Highlights: Various reservoir models have been detected using a novel coupled MLP-DWT model. The proposed model detected reservoir models from the training set with a TCA of 95.37%, recognized reservoir models from the test set with a TCA of 94.34%, and has been validated using synthetic data and actual noisy field data.

Abstract: Well testing analysis is performed for detecting the oil and gas reservoir model and estimating its associated parameters from pressure transient data, which are often recorded by pressure down-hole gauges (PDGs). PDGs can record a huge amount of bottom-hole pressure data; limited computer resources for the analysis and handling of these noisy data are among the challenging problems of PDG monitoring. Therefore, reducing the number of recorded data points to a manageable size is an important step in well test analysis. In the present study, a discrete wavelet transform (DWT) is employed for reducing the amount of long-term reservoir pressure data obtained for eight different reservoir models. Then, a multi-layer perceptron neural network (MLPNN) is developed to recognize reservoir models using the reduced pressure data. The developed algorithm has four steps: (1) generating pressure over time data; (2) converting the generated data to log-log pressure derivative (PD) graphs; (3) calculating the multi-level discrete wavelet coefficients (DWC) of the PD graphs; and (4) using the approximation wavelet coefficients as the inputs of an MLPNN classifier. Sensitivity analysis confirms that the most accurate reservoir model predictions are obtained by the MLPNN with 17 hidden neurons. The proposed method has been validated using simulated test data and actual field information. The results show that the suggested algorithm is able to identify the correct reservoir models for the training and test data sets with total classification accuracies (TCA) of 95.37% and 94.34%, respectively.
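Steps (3) and (4) can be sketched with PyWavelets and scikit-learn; the two toy "reservoir model" curves below are synthetic stand-ins for the paper's eight models, and the 17-neuron hidden layer follows the paper's sensitivity analysis:

```python
import numpy as np
import pywt
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)

def synthetic_curve(kind, n=1024):
    t = np.linspace(0.1, 10, n)
    base = np.log(t) if kind == 0 else np.sqrt(t)   # two toy "reservoir models"
    return base + rng.normal(0, 0.05, n)

X, y = [], []
for label in (0, 1):
    for _ in range(50):
        curve = synthetic_curve(label)
        coeffs = pywt.wavedec(curve, "db4", level=5)   # multi-level DWT
        X.append(coeffs[0])        # keep only the approximation coefficients
        y.append(label)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(17,), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

The approximation coefficients compress the 1024-sample curve to a few dozen values, which is exactly the data-reduction role the DWT plays ahead of the classifier.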

Journal ArticleDOI
TL;DR: Condor, an open-source simulation tool to predict X-ray scattering amplitudes for flash X-ray imaging experiments, is introduced.
Abstract: Flash X-ray imaging has the potential to determine structures down to molecular resolution without the need for crystallization. The ability to accurately predict the diffraction signal and to identify the optimal experimental configuration within the limits of the instrument is important for successful data collection. This article introduces Condor, an open-source simulation tool to predict X-ray far-field scattering amplitudes of isolated particles for customized experimental designs and samples, which the user defines by an atomic or a refractive index model. The software enables researchers to test whether their envisaged imaging experiment is feasible, and to optimize critical parameters for reaching the best possible result. It also aims to support researchers who intend to create or advance reconstruction algorithms by simulating realistic test data. Condor is designed to be easy to use and can be either installed as a Python package or used from its web interface (http://lmb.icm.uu.se/condor). X-ray free-electron lasers have high running costs and beam time at these facilities is precious. Data quality can be substantially improved by using simulations to guide the experimental design and simplify data analysis.
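To avoid guessing at Condor's actual API, the sketch below only illustrates the underlying physics it simulates: within the projection approximation, the far-field scattering amplitude of an isolated particle is proportional to the Fourier transform of its projected density, and the detector records the squared modulus.

```python
import numpy as np

n = 256
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
# Projected density of a toy spherical particle (a uniform disk in projection)
density = (xx ** 2 + yy ** 2 <= 20 ** 2).astype(float)

# Far-field amplitude ~ centered 2D Fourier transform of the projected density
amplitude = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(density)))
intensity = np.abs(amplitude) ** 2   # ideal, noise-free diffraction pattern

print("central speckle intensity:", intensity[n // 2, n // 2])
```

Condor itself layers calibrated experiment geometry, material refractive indices and photon statistics on top of this basic relation.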

Journal ArticleDOI
TL;DR: In this paper, a multistage testing procedure for a truck tyre was presented, covering quasi-static, dynamic and strongly dynamic tests: a radial deflection test, a bounce test, and a blast loading test.

Journal ArticleDOI
TL;DR: In this paper, the authors applied principal component analysis (PCA) and Fisher discriminant analysis (FDA) for fault detection and fault isolation in Pakistan Research Reactor-2 (PARR-2) for the known faults of control rod withdrawal and external reactivity insertion.

Journal ArticleDOI
TL;DR: An R-matrix Fortran package for solving coupled-channel problems in nuclear physics with the Lagrange-mesh method, which deals with open and closed channels simultaneously, without the numerical instability associated with closed channels.

Journal ArticleDOI
TL;DR: This paper proposes the Evolving Domain Adaptation (EDA) method, which first finds a new feature space in which the source domain and the current target domain are approximately indistinguishable, and then uses a semi-supervised classification method to utilize both the unlabeled data of the target domain and the labeled data of the source domain.
Abstract: Almost all of the existing domain adaptation methods assume that all test data belong to a single stationary target distribution. However, in many real world applications, data arrive sequentially and the data distribution is continuously evolving. In this paper, we tackle the problem of adaptation to a continuously evolving target domain that has been recently introduced. We assume that the available data for the source domain are labeled but the examples of the target domain can be unlabeled and arrive sequentially. Moreover, the distribution of the target domain can evolve continuously over time. We propose the Evolving Domain Adaptation (EDA) method that first finds a new feature space in which the source domain and the current target domain are approximately indistinguishable. Therefore, source and target domain data are similarly distributed in the new feature space and we use a semi-supervised classification method to utilize both the unlabeled data of the target domain and the labeled data of the source domain. Since test data arrives sequentially, we propose an incremental approach both for finding the new feature space and for semi-supervised classification. Experiments on several real datasets demonstrate the superiority of our proposed method in comparison to the other recent methods.