scispace - formally typeset

Showing papers on "Test data" published in 2015


Journal ArticleDOI
Pierre Hirel1
TL;DR: Atomsk is a unified program for generating, converting, and transforming atomic systems for the purposes of ab initio calculations, classical atomistic simulations, or visualization in computational physics and chemistry.

867 citations


Journal ArticleDOI
TL;DR: This paper clarifies some apparent confusion over the use of the coefficient of determination, R², as a measure of model fit and predictive power in QSAR and QSPR modeling, and recommends a clearer and simpler alternative method to characterize model predictivity.
Abstract: The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R2, as a measure of model fit and predictive power in QSAR and QSPR modeling. R2 (or r2) has been used in various contexts in the literature in conjunction with training and test data for both ordinary linear regression and regression through the origin as well as with linear and nonlinear regression models. We analyze the widely adopted model fit criteria suggested by Golbraikh and Tropsha (J. Mol. Graphics Modell. 2002, 20, 269−276) in a strict statistical manner. Shortcomings in these criteria are identified, and a clearer and simpler alternative method to characterize model predictivity is provided. The intent is not to repeat the well-documented arguments for model validation using test data but rather to guide the ap...
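The paper's central distinction can be made concrete with a hedged sketch (synthetic data, not from the paper): a test-set R² computed as a squared correlation can look perfect while the prediction-oriented R² = 1 − SS_res/SS_tot is negative for the same biased predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=50)
y_pred = y_true + 1.5          # perfectly correlated but systematically offset

# R^2 as squared Pearson correlation: blind to the offset
r2_corr = np.corrcoef(y_true, y_pred)[0, 1] ** 2

# Prediction-oriented R^2: penalizes the offset and can go negative
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_pred = 1 - ss_res / ss_tot
```

Here `r2_corr` is essentially 1 while `r2_pred` is negative, which is one reason correlation-style R² overstates external predictivity.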

478 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used the digital image correlation (DIC) technique to obtain strain measurements for biaxial and planar specimens and to provide stress-strain input data for Abaqus®.
Abstract: The aim of this research work is to characterize a hyperelastic material and to determine a suitable strain energy function (SEF) for an indigenously developed rubber to be used in a flexible joint for thrust vectoring of a solid rocket motor. In order to evaluate an appropriate SEF, uniaxial and volumetric tests along with equi-biaxial and planar shear tests were conducted. The digital image correlation (DIC) technique was utilized to obtain strain measurements for biaxial and planar specimens in order to input stress-strain data into Abaqus®. The Yeoh model seems to be the right choice among the available material models because of its ability to match experimental stress-strain data at small and large strain values. A quadlap specimen test was performed to validate the material model fitted from test data. FE simulations were carried out to verify the behavior predicted by the Yeoh model, and the results are found to be in good agreement with the experimental data.
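As a rough illustration of fitting the Yeoh model to uniaxial data, the sketch below does a linear least-squares fit of the three Yeoh coefficients using the standard incompressible uniaxial nominal-stress expression. The stretch/stress values are invented placeholders, not the paper's test data.

```python
import numpy as np

# Invented placeholder uniaxial data: stretch ratio and nominal stress (MPa)
lam = np.array([1.1, 1.3, 1.6, 2.0, 2.5, 3.0])
P_exp = np.array([0.35, 0.85, 1.40, 2.10, 3.20, 4.80])

# Incompressible Yeoh model: W = C10*(I1-3) + C20*(I1-3)**2 + C30*(I1-3)**3
# Uniaxial nominal stress:
#   P = 2*(lam - lam**-2) * (C10 + 2*C20*(I1-3) + 3*C30*(I1-3)**2)
I1 = lam**2 + 2.0 / lam
f = 2.0 * (lam - lam**-2)                                # kinematic factor
A = np.column_stack([f, 2 * f * (I1 - 3), 3 * f * (I1 - 3) ** 2])

# P is linear in (C10, C20, C30), so ordinary least squares suffices
coeffs, *_ = np.linalg.lstsq(A, P_exp, rcond=None)
C10, C20, C30 = coeffs
P_fit = A @ coeffs
```

Biaxial and planar-shear data would add rows to `A` with their own kinematic factors, constraining the coefficients further, which is why the paper runs all four test types.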

200 citations


Proceedings ArticleDOI
01 Dec 2015
TL;DR: A new beamformer front-end for Automatic Speech Recognition leverages a bi-directional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step, achieving a 53% relative reduction of the word error rate over the best baseline enhancement system on the relevant test data set.
Abstract: We present a new beamformer front-end for Automatic Speech Recognition and apply it to the 3rd-CHiME Speech Separation and Recognition Challenge. Without any further modification of the back-end, we achieve a 53% relative reduction of the word error rate over the best baseline enhancement system for the relevant test data set. Our approach leverages the power of a bi-directional Long Short-Term Memory network to robustly estimate soft masks for a subsequent beamforming step. The utilized Generalized Eigenvalue beamforming operation with an optional Blind Analytic Normalization does not rely on a Direction-of-Arrival estimate and can cope with multi-path sound propagation, while at the same time only introducing very limited speech distortions. Our quite simple setup exploits the possibilities provided by simulated training data while still being able to generalize well to the fairly different real data. Finally, combining our front-end with data augmentation and another language model nearly yields a 64 % reduction of the word error rate on the real data test set.
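The mask-then-beamform pipeline can be sketched for a single frequency bin as below. The data and mask are random stand-ins (in the paper the mask comes from the BLSTM), but the Generalized Eigenvalue step itself is the standard principal generalized eigenvector of the speech and noise spatial covariances.

```python
import numpy as np
from scipy.linalg import eigh

# Random stand-ins for one frequency bin: C microphones, T STFT frames
rng = np.random.default_rng(0)
C, T = 4, 200
Y = rng.standard_normal((C, T)) + 1j * rng.standard_normal((C, T))
mask = rng.uniform(0, 1, T)   # per-frame speech presence (BLSTM output in the paper)

# Mask-weighted spatial covariance matrices for speech and noise
Phi_xx = (mask * Y) @ Y.conj().T / mask.sum()
Phi_nn = ((1 - mask) * Y) @ Y.conj().T / (1 - mask).sum()

# GEV beamformer: principal generalized eigenvector of (Phi_xx, Phi_nn)
vals, vecs = eigh(Phi_xx, Phi_nn)
w = vecs[:, -1]               # eigenvector of the largest eigenvalue
X_hat = w.conj() @ Y          # enhanced single-channel signal for this bin
```

Note this needs no Direction-of-Arrival estimate: the spatial information enters only through the mask-weighted covariances, which is what lets the method cope with multi-path propagation.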

151 citations


Patent
04 Mar 2015
TL;DR: In this patent, a fast learning model is used to classify data using machine learning that may be incrementally refined based on expert input, and a further confidence value is generated and associated with the classification of the data by the fast learning model.
Abstract: Embodiments are directed towards classifying data using machine learning that may be incrementally refined based on expert input. Data is provided to a deep learning model that may be trained based on a plurality of classifiers and sets of training and/or testing data. If the number of classification errors exceeds a defined threshold, classifiers may be modified based on data corresponding to observed classification errors. A fast learning model may be trained based on the modified classifiers, the data, and the data corresponding to the observed classification errors. Another confidence value may be generated and associated with the classification of the data by the fast learning model. Report information may be generated based on a comparison of the confidence value associated with the fast learning model and the confidence value associated with the deep learning model.

148 citations


Journal ArticleDOI
TL;DR: In this article, three types of equivalent circuit models for ultracapacitors were examined and compared by measuring the model complexity, accuracy, and robustness against unseen data collected in the Dynamic Stress Test (DST) and a self-designed pulse test (SDP).

131 citations


Journal ArticleDOI
TL;DR: A computer program and underlying model are presented to calculate the electric susceptibility of a gas, which is essential to predict its absorptive and dispersive properties, and a suite of fitting methods are provided to allow user supplied experimental data to be fit to the theory, thereby allowing experimental parameters to be extracted.

130 citations


Journal ArticleDOI
TL;DR: This work describes the methodology that won the ECBDL'14 big data challenge for a bioinformatics big data problem, named ROSEFW-RF, which is based on several MapReduce approaches to balance the class distribution through random oversampling and detect the most relevant features via an evolutionary feature weighting process.
Abstract: The application of data mining and machine learning techniques to biological and biomedical data continues to be a ubiquitous research theme in current bioinformatics. The rapid advances in biotechnology allow us to obtain and store large quantities of data about cells, proteins, genes, etc., that should be processed. Moreover, in many of these problems, such as contact map prediction, the problem tackled in this paper, it is difficult to collect representative positive examples. Learning under these circumstances, known as imbalanced big data classification, may not be straightforward for most of the standard machine learning methods. In this work we describe the methodology that won the ECBDL'14 big data challenge for a bioinformatics big data problem. This algorithm, named ROSEFW-RF, is based on several MapReduce approaches to (1) balance the class distribution through random oversampling, (2) detect the most relevant features via an evolutionary feature weighting process and a threshold to choose them, (3) build an appropriate Random Forest model from the pre-processed data, and finally (4) classify the test data. Throughout the paper, we detail and analyze the decisions made during the competition, showing an extensive experimental study that characterizes how our methodology works. From this analysis we conclude that this approach is very suitable for tackling large-scale bioinformatics classification problems.
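The first step of the pipeline, random oversampling to balance the class distribution, can be sketched in plain NumPy; the paper implements it with MapReduce for big data, and the toy data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 5))
y = np.array([0] * 90 + [1] * 10)           # imbalanced: 90 negatives, 10 positives

minority = np.where(y == 1)[0]
n_extra = (y == 0).sum() - len(minority)    # copies needed to reach balance
extra = rng.choice(minority, size=n_extra, replace=True)

# Append random duplicates of minority instances until classes are balanced
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
```

After this step both classes have 90 instances, so a downstream Random Forest no longer sees the skewed prior that makes contact-map prediction hard.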

126 citations


Journal ArticleDOI
TL;DR: In the field test verification, the posterior marginal PDFs conditional on two model classes are obtained by the proposed MCMC algorithm, which provide valuable information about the identifiability of different model parameters.

123 citations


Journal ArticleDOI
Duksan Ryu1, Jong-In Jang1, Jongmoon Baik1
TL;DR: A Hybrid Instance Selection Using Nearest-Neighbor (HISNN) method that performs a hybrid classification selectively learning local knowledge and global knowledge (via naïve Bayes) and the experimental results show that HISNN produces high overall performance as well as high PD and low PF.
Abstract: Software defect prediction (SDP) is an active research field in software engineering to identify defect-prone modules. Thanks to SDP, limited testing resources can be effectively allocated to defect-prone modules. Although SDP requires sufficient local data within a company, there are cases where local data are not available, e.g., pilot projects. Companies without local data can employ cross-project defect prediction (CPDP) using external data to build classifiers. The major challenge of CPDP is the different distributions between training and test data. To tackle this, instances of source data similar to target data are selected to build classifiers. Software datasets have a class imbalance problem, meaning the ratio of the defective class to the clean class is very low. It usually lowers the performance of classifiers. We propose a Hybrid Instance Selection Using Nearest-Neighbor (HISNN) method that performs a hybrid classification selectively learning local knowledge (via k-nearest neighbor) and global knowledge (via naïve Bayes). Instances having strong local knowledge are identified via nearest-neighbors with the same class label. Previous studies showed low PD (probability of detection) or high PF (probability of false alarm), which is impractical to use. The experimental results show that HISNN produces high overall performance as well as high PD and low PF.
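The "strong local knowledge" idea can be sketched as follows: keep an instance only when its k nearest neighbours all share its class label. This is a simplified stand-in for HISNN's selection rule, run here on synthetic two-cluster data rather than real defect datasets.

```python
import numpy as np

# Synthetic two-cluster data standing in for source-project instances
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

k = 3
keep = []
for i in range(len(X)):
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                       # exclude the instance itself
    nearest = np.argsort(d)[:k]
    if np.all(y[nearest] == y[i]):      # neighbours agree: strong local knowledge
        keep.append(i)

X_sel, y_sel = X[keep], y[keep]
```

Instances failing the test sit in regions where the labels disagree locally; in HISNN those cases fall back to the global naïve Bayes model instead.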

95 citations


Journal ArticleDOI
TL;DR: In this paper, an effective calibration method is presented to determine the material parameters for this model as functions of the uniaxial compression strength and the maximum aggregate size of concrete, according to formulas from the CEB-FIP code and concrete test data from other published literature.

Journal ArticleDOI
TL;DR: Results show that the O-ESN outperforms the classical feature selection method, least angle regression (LAR), while having a simpler architecture.
Abstract: The echo state network (ESN) is a novel and powerful method for the temporal processing of recurrent neural networks. It has tremendous potential for solving a variety of problems, especially real-valued, time-series modeling tasks. However, its complicated topologies and random reservoirs are difficult to implement in practice. For instance, the reservoir must be large enough to capture all data features given that the reservoir is generated randomly. To reduce network complexity and to improve generalization ability, we present a novel optimized ESN (O-ESN) based on binary particle swarm optimization (BPSO). Because the optimization of output-weight connection structures is a feature selection problem, and PSO has been a promising method for feature selection problems, BPSO is employed to determine the optimal connection structures for output weights in the O-ESN. First, we establish and train an ESN with sufficient internal units using training data. The connection structure of the output weights, i.e., connection or disconnection, is then optimized through BPSO with validation data. Finally, the performance of the O-ESN is evaluated on test data. This performance is demonstrated on three different types of problems, namely a system identification task and two time-series benchmark tasks. Results show that the O-ESN outperforms the classical feature selection method, least angle regression (LAR), while having a simpler architecture.
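A minimal ESN, assuming the standard formulation (fixed random reservoir rescaled below unit spectral radius, with a ridge-regression readout as the only trained part), can be sketched as below; the O-ESN's BPSO step would then prune entries of the readout vector.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 500                        # reservoir size, sequence length
u = np.sin(0.1 * np.arange(T + 1))     # toy 1-D series; task: predict next value

Win = rng.uniform(-0.5, 0.5, N)        # input weights (fixed, random)
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9

# Drive the fixed random reservoir and collect its states
x = np.zeros(N)
states = np.empty((T, N))
for t in range(T):
    x = np.tanh(W @ x + Win * u[t])
    states[t] = x

# Ridge-regression readout: the only trained part of an ESN
targets = u[1:T + 1]
Wout = np.linalg.solve(states.T @ states + 1e-6 * np.eye(N), states.T @ targets)
pred = states @ Wout
```

Because only `Wout` is trained, selecting which of its N connections to keep is naturally a binary feature selection problem, which is why BPSO fits the O-ESN design.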

Journal ArticleDOI
TL;DR: It is demonstrated that results from existing gene signatures which rely on normalizing test data may be irreproducible when the patient population changes composition or size using a set of curated, publicly available breast cancer microarray experiments.
Abstract: Motivation: Prior to applying genomic predictors to clinical samples, the genomic data must be properly normalized to ensure that the test set data are comparable to the data upon which the predictor was trained. The most effective normalization methods depend on data from multiple patients. From a biomedical perspective, this implies that predictions for a single patient may change depending on which other patient samples they are normalized with. This test set bias will occur whenever any cross-sample normalization is used before clinical prediction. Results: We demonstrate, using a set of curated, publicly available breast cancer microarray experiments, that results from existing gene signatures which rely on normalizing test data may be irreproducible when the patient population changes composition or size. As an alternative, we examine the use of gene signatures that rely on ranks from the data and show why signatures using rank-based features can avoid test set bias while maintaining highly accurate classification, even across platforms. Availability and implementation: The code, data and instructions necessary to reproduce our entire analysis are available at https://github.com/prpatil/testsetbias. Contact: moc.liamg@keeltj or ac.hcraesernhu@akebiahb Supplementary information: Supplementary data are available at Bioinformatics online.
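Why rank-based features avoid test set bias can be demonstrated directly: each sample's features depend only on its own values, so adding another patient cannot change existing predictions. A small synthetic sketch:

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
expr = rng.lognormal(size=(5, 8))            # 5 patients x 8 genes (synthetic)

# Within-sample gene ranks: each row depends only on that patient's values
ranks = rankdata(expr, axis=1)

# Adding a new patient leaves the existing patients' features untouched,
# unlike cross-sample normalization (e.g. quantile normalization)
expr_plus = np.vstack([expr, rng.lognormal(size=8)])
ranks_plus = rankdata(expr_plus, axis=1)
```

`ranks_plus[:5]` equals `ranks` exactly; a cross-sample normalization applied to `expr_plus` would not have that property, which is the test set bias the paper documents.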

Journal ArticleDOI
TL;DR: The results indicate that using an advanced computational model informed by in situ SHM data leads to accurate prediction of the damage zone formation, damage progression, and eventual failure of the structure.
Author(s): Bazilevs, Y; Deng, X; Korobenko, A; Di Scalea, FL; Todd, MD; Taylor, SG
Abstract: In this paper, we combine recent developments in modeling of fatigue-damage, isogeometric analysis (IGA) of thin-shell structures, and structural health monitoring (SHM) to develop a computational steering framework for fatigue-damage prediction in full-scale laminated composite structures. The main constituents of the proposed framework are described in detail, and the framework is deployed in the context of an actual fatigue test of a full-scale wind-turbine blade structure. The results indicate that using an advanced computational model informed by in situ SHM data leads to accurate prediction of the damage zone formation, damage progression, and eventual failure of the structure. Although the blade fatigue simulation was driven by test data obtained prior to the computation, the proposed computational steering framework may be deployed concurrently with structures undergoing fatigue loading.

Journal ArticleDOI
TL;DR: AQUAgpusph has been designed to provide researchers and engineers with a valuable tool for testing and applying the SPH method; the modifications shown improve the solver speed and the quality of the results, and allow for wider areas of application.

Journal ArticleDOI
TL;DR: It is found that, on one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase), however, on the other hand, there was no measurable improvement in the number of bugs actually found by developers.
Abstract: Work on automated test generation has produced several tools capable of generating test data which achieves high structural coverage over a program. In the absence of a specification, developers are expected to manually construct or verify the test oracle for each test input. Nevertheless, it is assumed that these generated tests ease the task of testing for the developer, as testing is reduced to checking the results of tests. While this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. However, the limited adoption in industry indicates this assumption may not be correct, and calls into question the practical value of test generation tools. To investigate this issue, we performed two controlled experiments comparing a total of 97 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to 300% increase). However, on the other hand, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.

Journal ArticleDOI
TL;DR: In this article, a constitutive axial stress-strain material model of CFRP-confined concrete under generalized loading is developed, which is composed of a monotonic envelope response and a cyclic response, and assumes a more simplified approach than existing models available in the literature.
Abstract: Experimental results of the axial stress-strain response of eighteen carbon-fiber-reinforced polymer (CFRP) confined circular, square, and rectangular column specimens subjected to cyclic axial compression are presented and discussed. Guided by these test results and other test data reported in the technical literature, a constitutive axial stress-strain material model of CFRP-confined concrete under generalized loading is developed. The proposed model, which is composed of a monotonic envelope response and a cyclic response, accounts for a wide range of test parameters and assumes a more simplified approach than existing models available in the literature. The model covers all important parameters in a unified manner, and predicts both ascending and descending postpeak responses. In addition to its simplicity in application, despite slight discrepancies, the model was able to reproduce the test results generated in the experimental part of this investigation and other test data reported in the...

Book ChapterDOI
01 Jan 2015
TL;DR: This paper addresses the challenge of building robust classification models with support vector machines (SVMs) that are built from time series data and investigates the impact of parameter tuning of SVMs with grid search on the classification performance and its effect on preventing over-fitting.
Abstract: In this paper we describe our submission to the IJCRS’15 Data Mining Competition, which is concerned with the prediction of dangerous concentrations of methane in longwalls of a Polish coal mine. We address the challenge of building robust classification models with support vector machines (SVMs) built from time series data. Moreover, we investigate the impact of parameter tuning of SVMs with grid search on the classification performance and its effect on preventing over-fitting. Our results show improvements of predictive performance with proper parameter tuning, but also improved stability of the classification models even when the test data come from a different time period and class distribution. By applying the proposed method we were able to build a classification model that predicts unseen test data even better than the training data, thus highlighting the non-over-fitting properties of the model. The submitted solution was about 2% behind the winning solution.
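The tuning procedure can be sketched with scikit-learn's grid search; the data below are synthetic stand-ins (not the competition's methane series) and the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (not the competition's methane time series)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Illustrative grid over the RBF-SVM's regularization and kernel width
grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
search.fit(X_tr, y_tr)

held_out_acc = search.score(X_te, y_te)      # accuracy on unseen test data
```

Cross-validated selection of `C` and `gamma` is the mechanism the paper credits with keeping the model stable when the test period's class distribution shifts; for true time series, a time-aware CV split would replace the random folds used here.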

Journal ArticleDOI
TL;DR: In this article, a method to obtain crack initiation, location and width in concrete structures subjected to bending and instrumented with an optical backscattered reflectometer (OBR) system is proposed.
Abstract: In this paper, a method to obtain crack initiation, location and width in concrete structures subjected to bending and instrumented with an optical backscattered reflectometer (OBR) system is proposed. Continuous strain data with high spatial resolution and accuracy are the main advantages of the OBR system. These characteristics make this structural health monitoring technique a useful tool for early damage detection in important structural problems. In the specific case of reinforced concrete structures, which exhibit cracks even under in-service loading, the possibility of obtaining strain data with high spatial resolution is a main issue. This information is of paramount importance concerning the durability, long-term performance and management of concrete structures. The proposed method is based on the results of a test up to failure carried out on a reinforced concrete slab. Using test data and different crack modeling criteria for concrete structures, simple nonlinear finite element models were elaborated to validate the method's use in the localization and appraisal of the crack width in the test slab.

Journal ArticleDOI
TL;DR: The Phonon Transport Simulator (PhonTS), a Fortran90, fully parallel code to perform thermal conductivity calculations in crystal solids from the level of the interatomic interactions, is introduced.

Journal ArticleDOI
TL;DR: In this article, a multiresponse structural parameter estimation method for the automated finite element (FE) model updating using data obtained from a set of nondestructive tests conducted on a laboratory bridge model is presented.

Journal ArticleDOI
TL;DR: Results provide insight into the magnitude of errors expected when up-scaling field spectral models to airborne or satellite image data, and indicate the continued relevance of field spectroscopy studies for better understanding the spectral models critical for vegetation quality assessment.

Journal ArticleDOI
TL;DR: In this paper, support vector machines (SVMs), based on statistical learning theory (SLT) and the principles of structural risk minimization (SRM) and empirical risk minimization (ERM), use an analytical approach to classification and regression.

Journal ArticleDOI
TL;DR: The sdcMicro package is an R package that implements SDC methods to evaluate and anonymize confidential micro-data sets; it includes all popular disclosure risk and perturbation methods and recalculates frequency counts, individual and global risk measures, and information loss and data utility statistics after each anonymization step.
Abstract: The demand for data from surveys, censuses or registers containing sensitive information on people or enterprises has increased significantly over recent years. However, before data can be provided to the public or to researchers, confidentiality has to be respected for any data set possibly containing sensitive information about individual units. Confidentiality can be achieved by applying statistical disclosure control (SDC) methods to the data in order to decrease the disclosure risk of the data. The R package sdcMicro serves as an easy-to-handle, object-oriented S4 class implementation of SDC methods to evaluate and anonymize confidential micro-data sets. It includes all popular disclosure risk and perturbation methods. The package performs automated recalculation of frequency counts, individual and global risk measures, information loss and data utility statistics after each anonymization step. All methods are highly optimized in terms of computational costs to be able to work with large data sets. Reporting facilities that summarize the anonymization process can also be easily used by practitioners. We describe the package and demonstrate its functionality with a complex household survey test data set that has been distributed by the International Household Survey Network.

Proceedings ArticleDOI
01 Sep 2015
TL;DR: It is shown that both translation of the source training data into the target language and translation of the target test data into the source language have a detrimental effect on the accuracy of predicting author traits, which supports the need for personal and personality-aware machine translation models.
Abstract: Language use is known to be influenced by personality traits as well as by sociodemographic characteristics such as age or mother tongue. As a result, it is possible to automatically identify these traits of an author from her texts. It has recently been shown that knowledge of such dimensions can improve performance in NLP tasks such as topic and sentiment modeling. We posit that machine translation is another application that should be personalized. To motivate this, we explore whether translation preserves demographic and psychometric traits. We show that, largely, both translation of the source training data into the target language and translation of the target test data into the source language have a detrimental effect on the accuracy of predicting author traits. We argue that this supports the need for personal and personality-aware machine translation models.

Journal ArticleDOI
TL;DR: The basic ACO algorithm is reformed into a discrete version to generate test data for structural testing; the experimental results show that it outperforms the existing simulated annealing and genetic algorithms in the quality of test data and stability, and is comparable to a particle swarm optimization-based method.
Abstract: In general, software testing has been viewed as an effective way to improve software quality and reliability. However, the quality of test data has a significant impact on the fault-revealing ability of the software testing activity. Recently, search-based test data generation has been treated as an operational approach to address this difficulty. In this paper, the basic ACO algorithm is reformed into a discrete version so as to generate test data for structural testing. First, the technical roadmap for combining the adapted ACO algorithm with the test process is introduced. In order to improve the algorithm's searching ability and generate more diverse test inputs, strategies such as local transfer, global transfer and pheromone update are defined and applied. The coverage of program elements is a special optimization objective, so the customized fitness function is constructed in our approach by comprehensively considering the nesting level and predicate type of each branch. To validate the effectiveness of our ACO-based test data generation method, eight well-known programs are utilized to perform a comparative analysis. The experimental results show that our approach outperforms the existing simulated annealing and genetic algorithms in the quality of test data and stability, and is comparable to the particle swarm optimization-based method. In addition, a sensitivity analysis on algorithm parameters is employed to recommend reasonable parameter settings for practical applications.
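The core idea of search-based test data generation, minimizing a branch-distance fitness until a target branch is covered, can be sketched with a simple hill climb standing in for the paper's discrete ACO; the program under test and the fitness are toy examples.

```python
# Toy target branch to cover: (x == 10) and (y > 20).
# The branch distance is 0 exactly when an input takes the branch.
def branch_distance(x, y):
    return abs(x - 10) + max(0, 21 - y)

# A simple hill climb stands in for the discrete ACO search
x, y = 0, 0
fit = branch_distance(x, y)
while fit > 0:
    moved = False
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        if branch_distance(x + dx, y + dy) < fit:
            x, y = x + dx, y + dy
            fit = branch_distance(x, y)
            moved = True
            break
    if not moved:      # stuck in a local optimum (cannot happen for this fitness)
        break
```

The paper's customized fitness additionally weights the nesting level and predicate type of each branch; ACO's pheromone updates then play the role that the greedy neighbour step plays in this sketch.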

Journal ArticleDOI
TL;DR: In this paper, a 3D meso-scale model of a concrete specimen with consideration of cement mortar and aggregates is developed to simulate spall tests and investigate the behavior of concrete material under high strain rates.
Abstract: Tensile strength is one of the key factors of concrete material that need be accurately defined in analysis of concrete structures subjected to high-speed impact loads. Dynamic tensile strength of concrete material is usually obtained by conducting laboratory tests such as direct tensile test, Brazilian splitting test and spall test. Concrete is a heterogeneous material with different components, but is conventionally assumed to be homogeneous, i.e., cement mortar only, in most previous experimental or numerical studies. The aggregates in concrete material are usually neglected owing to testing limitation and numerical simplification. It has been well acknowledged that neglecting coarse aggregates might not necessarily give accurate concrete dynamic material properties. In the present study, a 3D meso-scale model of concrete specimen with consideration of cement mortar and aggregates is developed to simulate spall tests and investigate the behaviour of concrete material under high strain rate. The commercial software LS-DYNA is used to perform the numerical simulations of spall tests. The mesh size sensitivity is examined by conducting mesh convergence tests. The reliability of the numerical model in simulating the spall tests is verified by comparing the numerical results with the experimental data from the literature. The influence of coarse aggregates on the experimental test results is studied. The wave attenuation in concrete specimen is analysed, and empirical equations are proposed for quick assessment of the test data to determine the true dynamic tensile strength of concrete material. The contributions of aggregates to dynamic strength in spall tests are quantified for modifying the test results based on mortar material in the literature.

Journal ArticleDOI
TL;DR: In this paper, numerical modeling of a large-scale decoupled underground explosion test with 10 tons of TNT in Alvdalen, Sweden is performed by combining DEM and FEM with codes UDEC and AUTODYN.
Abstract: In this study, numerical modeling of a large-scale decoupled underground explosion test with 10 tons of TNT in Alvdalen, Sweden is performed by combining DEM and FEM with codes UDEC and AUTODYN. AUTODYN is adopted to model the explosion process, blast wave generation, and its action on the explosion chamber surfaces, while UDEC modeling is focused on shock wave propagation in jointed rock masses surrounding the explosion chamber. The numerical modeling results with the hybrid AUTODYN–UDEC method are compared with empirical estimations, purely AUTODYN modeling results, and the field test data. It is found that in terms of peak particle velocity, empirical estimations are much smaller than the measured data, while purely AUTODYN modeling results are larger than the test data. The UDEC–AUTODYN numerical modeling results agree well with the test data. Therefore, the UDEC–AUTODYN method is appropriate in modeling a large-scale explosive detonation in a closed space and the following wave propagation in jointed rock masses. It should be noted that joint mechanical and spatial properties adopted in UDEC–AUTODYN modeling are determined with empirical equations and available geological data, and they may not be sufficiently accurate.

Journal ArticleDOI
TL;DR: Petry and Beyer present a new data set (Petry and Beyer 2014a; DOI:10.5281/zenodo.8443) on six wall tests, which is publicly ava...
Abstract: Previous test data on unreinforced masonry walls focused on the global response of the wall. A new data set (Petry and Beyer 2014a; DOI:10.5281/zenodo.8443) on six wall tests, which is publicly ava...

Journal ArticleDOI
TL;DR: In this paper, an analysis of experimental results on the behavior under impulse currents of various grounding electrodes: rod and horizontal electrodes, a ground grid, and tower footings, is presented.
Abstract: In this paper, an analysis of experimental results on the behavior under impulse currents of various grounding electrodes: rod and horizontal electrodes, a ground grid, and tower footings, is presented. The parameters used for analyzing transient performance are reviewed, and the differences are highlighted based on test data. The analysis is extended to the following: 1) to quantify the effect of impulse shape; 2) to quantify the effect of current magnitude; 3) to compare low-frequency and impulse performances; 4) to compare impulse and high-frequency performances; and 5) to examine the effects of the test setup on measured results, e.g., in the case of field tests, the effect of current return leads, and proximity and extent of return electrodes. A generalized impulse index is introduced to help elucidate the differences between different parameters used for the analysis of transient test results on ground electrodes. It is found that the analysis of test data based on different parameters may lead to different assessments of impulse performance. The results also show that the impulse parameters used for the analysis of test data can be influenced by several factors such as electrode length, impulse current rise time, and experimental setup. In addition, variable-frequency test results are analyzed by introducing a “harmonic coefficient” which quantifies the deviations of the harmonic impedance from the low-frequency resistance over different frequency ranges covering the entire lightning frequency spectrum. Significant variations of the harmonic coefficient with frequency were observed, highlighting the importance of taking the frequency dependence of soil properties into account when modeling the impulse and high-frequency behavior of grounding systems.