Showing papers in "ACM Transactions on Software Engineering and Methodology in 2018"

PDF

Open Access

Journal Article•DOI•

How Far We Have Progressed in the Journey? An Examination of Cross-Project Defect Prediction

[...]

Yuming Zhou¹, Yibiao Yang¹, Hongmin Lu¹, Lin Chen¹, Yanhui Li¹, Yangyang Zhao¹, Junyan Qian², Baowen Xu¹ - Show less +4 more•Institutions (2)

Nanjing University¹, Guilin University of Electronic Technology²

16 Apr 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: The results caution us that, if the prediction performance is the goal, the real progress in CPDP is not being achieved as it might have been envisaged, and recommend that future studies should include ManualDown/ManualUp as the baseline models for comparison when developing new C PDP models to predict defects in a complete target project.

...read moreread less

Abstract: Background. Recent years have seen an increasing interest in cross-project defect prediction (CPDP), which aims to apply defect prediction models built on source projects to a target project. Currently, a variety of (complex) CPDP models have been proposed with a promising prediction performance.Problem. Most, if not all, of the existing CPDP models are not compared against those simple module size models that are easy to implement and have shown a good performance in defect prediction in the literature.Objective. We aim to investigate how far we have really progressed in the journey by comparing the performance in defect prediction between the existing CPDP models and simple module size models.Method. We first use module size in the target project to build two simple defect prediction models, ManualDown and ManualUp, which do not require any training data from source projects. ManualDown considers a larger module as more defect-prone, while ManualUp considers a smaller module as more defect-prone. Then, we take the following measures to ensure a fair comparison on the performance in defect prediction between the existing CPDP models and the simple module size models: using the same publicly available data sets, using the same performance indicators, and using the prediction performance reported in the original cross-project defect prediction studies.Result. The simple module size models have a prediction performance comparable or even superior to most of the existing CPDP models in the literature, including many newly proposed models.Conclusion. The results caution us that, if the prediction performance is the goal, the real progress in CPDP is not being achieved as it might have been envisaged. We hence recommend that future studies should include ManualDown/ManualUp as the baseline models for comparison when developing new CPDP models to predict defects in a complete target project.

...read moreread less

142 citations

Journal Article•DOI•

The ABC of Software Engineering Research

[...]

Klaas-Jan Stol¹, Brian Fitzgerald²•Institutions (2)

University College Cork¹, University of Limerick²

17 Sep 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: A taxonomy from the social sciences is adopted, termed here the ABC framework for SE research, which offers a holistic view of eight archetypal research strategies, and six ways in which the framework can advance SE research.

...read moreread less

Abstract: A variety of research methods and techniques are available to SE researchers, and while several overviews exist, there is consistency neither in the research methods covered nor in the terminology used. Furthermore, research is sometimes critically reviewed for characteristics inherent to the methods. We adopt a taxonomy from the social sciences, termed here the ABC framework for SE research, which offers a holistic view of eight archetypal research strategies. ABC refers to the research goal that strives for generalizability over Actors (A) and precise measurement of their Behavior (B), in a realistic Context (C). The ABC framework uses two dimensions widely considered to be key in research design: the level of obtrusiveness of the research and the generalizability of research findings. We discuss metaphors for each strategy and their inherent limitations and potential strengths. We illustrate these research strategies in two key SE domains, global software engineering and requirements engineering, and apply the framework on a sample of 75 articles. Finally, we discuss six ways in which the framework can advance SE research.

...read moreread less

138 citations

Journal Article•DOI•

Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware

[...]

Joshua Garcia¹, Mahmoud Hammad¹, Sam Malek¹•Institutions (1)

University of California, Irvine¹

12 Jan 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: This work presents a novel machine-learning-based Android malware detection and family identification approach, RevealDroid, that operates without the need to perform complex program analyses or to extract large sets of features and demonstrates its superiority against state-of-the-art approaches.

...read moreread less

Abstract: The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, and so on. To determine such behaviors, a security analyst can significantly benefit from identifying the family to which an Android malware belongs rather than only detecting if an app is malicious. Techniques for detecting Android malware, and determining their families, lack the ability to handle certain obfuscations that aim to thwart detection. Moreover, some prior techniques face scalability issues, preventing them from detecting malware in a timely manner.To address these challenges, we present a novel machine-learning-based Android malware detection and family identification approach, RevealDroid, that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, our selected features leverage categorized Android API usage, reflection-based features, and features from native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset consisting of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% in detection of malware and an accuracy of 95% in determination of their families. We further demonstrate RevealDroid’s superiority against state-of-the-art approaches.

...read moreread less

125 citations

Journal Article•DOI•

FEMOSAA: Feature-Guided and Knee-Driven Multi-Objective Optimization for Self-Adaptive Software

[...]

Tao Chen¹, Ke Li², Rami Bahsoon¹, Xin Yao³•Institutions (3)

University of Birmingham¹, University of Electronic Science and Technology of China², Southern University of Science and Technology³

29 Jun 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: A novel framework that automatically synergizes the feature model and Multi-Objective Evolutionary Algorithm (MOEA) to optimize SAS at runtime, and a new method to search for the knee solutions, which can achieve a balanced tradeoff.

...read moreread less

Abstract: Self-Adaptive Software (SAS) can reconfigure itself to adapt to the changing environment at runtime, aiming to continually optimize conflicted nonfunctional objectives (e.g., response time, energy consumption, throughput, cost, etc.). In this article, we present Feature-guided and knEe-driven Multi-Objective optimization for Self-Adaptive softwAre (FEMOSAA), a novel framework that automatically synergizes the feature model and Multi-Objective Evolutionary Algorithm (MOEA) to optimize SAS at runtime. FEMOSAA operates in two phases: at design time, FEMOSAA automatically transposes the engineers’ design of SAS, expressed as a feature model, to fit the MOEA, creating new chromosome representation and reproduction operators. At runtime, FEMOSAA utilizes the feature model as domain knowledge to guide the search and further extend the MOEA, providing a larger chance for finding better solutions. In addition, we have designed a new method to search for the knee solutions, which can achieve a balanced tradeoff. We comprehensively evaluated FEMOSAA on two running SAS: One is a highly complex SAS with various adaptable real-world software under the realistic workload trace; another is a service-oriented SAS that can be dynamically composed from services. In particular, we compared the effectiveness and overhead of FEMOSAA against four of its variants and three other search-based frameworks for SAS under various scenarios, including three commonly applied MOEAs, two workload patterns, and diverse conflicting quality objectives. The results reveal the effectiveness of FEMOSAA and its superiority over the others with high statistical significance and nontrivial effect sizes.

...read moreread less

63 citations

Journal Article•DOI•

Configuring Software Product Lines by Combining Many-Objective Optimization and SAT Solvers

[...]

Yi Xiang¹, Yuren Zhou¹, Zibin Zheng¹, Miqing Li²•Institutions (2)

Sun Yat-sen University¹, University of Birmingham²

20 Feb 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: A new approach named SATVaEA for handling the optimal feature selection problem, which is simplified with the number of both features and constraints being reduced greatly and significantly outperforms other state-of-the-art algorithms.

...read moreread less

Abstract: A feature model (FM) is a compact representation of the information of all possible products from software product lines. The optimal feature selection involves the simultaneous optimization of multiple (usually more than three) objectives in a large and highly constrained search space. By combining our previous work on many-objective evolutionary algorithm (i.e., VaEA) with two different satisfiability (SAT) solvers, this article proposes a new approach named SATVaEA for handling the optimal feature selection problem. In SATVaEA, an FM is simplified with the number of both features and constraints being reduced greatly. We enhance the search of VaEA by using two SAT solvers: one is a stochastic local search--based SAT solver that can quickly repair infeasible configurations, whereas the other is a conflict-driven clause-learning SAT solver that is introduced to generate diversified products. We evaluate SATVaEA on 21 FMs with up to 62,482 features, including two models with realistic values for feature attributes. The experimental results are promising, with SATVaEA returning 100% valid products on almost all FMs. For models with more than 10,000 features, the search in SATVaEA takes only a few minutes. Concerning both effectiveness and efficiency, SATVaEA significantly outperforms other state-of-the-art algorithms.

...read moreread less

59 citations

Journal Article•DOI•

Variability-Aware Static Analysis at Scale: An Empirical Study

[...]

Alexander von Rhein, Jörg Liebig, Andreas Janker¹, Christian Kästner², Sven Apel³ - Show less +1 more•Institutions (3)

Capgemini¹, Carnegie Mellon University², University of Passau³

16 Nov 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: Overall, it is found that variability-aware analysis outperforms most sample-based static-analysis techniques with respect to efficiency and effectiveness.

...read moreread less

Abstract: The advent of variability management and generator technology enables users to derive individual system variants from a configurable code base by selecting desired configuration options. This approach gives rise to the generation of possibly billions of variants, which, however, cannot be efficiently analyzed for bugs and other properties with classic analysis techniques. To address this issue, researchers and practitioners have developed sampling heuristics and, recently, variability-aware analysis techniques. While sampling reduces the analysis effort significantly, the information obtained is necessarily incomplete, and it is unknown whether state-of-the-art sampling techniques scale to billions of variants. Variability-aware analysis techniques process the configurable code base directly, exploiting similarities among individual variants with the goal of reducing analysis effort. However, while being promising, so far, variability-aware analysis techniques have been applied mostly only to small academic examples. To learn about the mutual strengths and weaknesses of variability-aware and sample-based static-analysis techniques, we compared the two by means of seven concrete control-flow and data-flow analyses, applied to five real-world subject systems: Busybox, OpenSSL, SQLite, the x86 Linux kernel, and uClibc. In particular, we compare the efficiency (analysis execution time) of the static analyses and their effectiveness (potential bugs found). Overall, we found that variability-aware analysis outperforms most sample-based static-analysis techniques with respect to efficiency and effectiveness. For example, checking all variants of OpenSSL with a variability-aware static analysis is faster than checking even only two variants with an analysis that does not exploit similarities among variants.

...read moreread less

46 citations

Journal Article•DOI•

Linear Programming as a Baseline for Software Effort Estimation

[...]

Federica Sarro¹, Alessio Petrozziello²•Institutions (2)

University College London¹, University of Portsmouth²

17 Sep 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: It is suggested that using LP4EE as a baseline can help reduce conclusion instability and be more accurate and robust than ATLM against different data splits and cross-validation methods for 44% of the cases.

...read moreread less

Abstract: Software effort estimation studies still suffer from discordant empirical results (ie, conclusion instability) mainly due to the lack of rigorous benchmarking methods So far only one baseline model, namely, Automatically Transformed Linear Model (ATLM), has been proposed yet it has not been extensively assessed In this article, we propose a novel method based on Linear Programming (dubbed as Linear Programming for Effort Estimation, LP4EE) and carry out a thorough empirical study to evaluate the effectiveness of both LP4EE and ATLM for benchmarking widely used effort estimation techniques The results of our study confirm the need to benchmark every other proposal against accurate and robust baselines They also reveal that LP4EE is more accurate than ATLM for 17% of the experiments and more robust than ATLM against different data splits and cross-validation methods for 44% of the cases These results suggest that using LP4EE as a baseline can help reduce conclusion instability We make publicly available an open-source implementation of LP4EE in order to facilitate its adoption in future studies

...read moreread less

44 citations

Journal Article•DOI•

Variability Bugs in Highly Configurable Systems: A Qualitative Analysis

[...]

Iago Abal¹, Jean Melo¹, Stefan Stanciulescu¹, Claus Brabrand¹, Márcio Ribeiro², Andrzej Wąsowski¹ - Show less +2 more•Institutions (2)

University of Copenhagen¹, Federal University of Alagoas²

12 Jan 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: This study provides insights into the nature and occurrence of variability bugs in four highly-configurable systems implemented in C/C++, and shows in what ways variability hinders comprehension and the uncovering of software bugs.

...read moreread less

Abstract: Variability-sensitive verification pursues effective analysis of the exponentially many variants of a program family. Several variability-aware techniques have been proposed, but researchers still lack examples of concrete bugs induced by variability, occurring in real large-scale systems. A collection of real world bugs is needed to evaluate tool implementations of variability-sensitive analyses by testing them on real bugs. We present a qualitative study of 98 diverse variability bugs (i.e., bugs that occur in some variants and not in others) collected from bug-fixing commits in the Linux, Apache, BusyBox, and Marlin repositories. We analyze each of the bugs, and record the results in a database. For each bug, we create a self-contained simplified version and a simplified patch, in order to help researchers who are not experts on these subject studies to understand them, so that they can use these bugs for evaluation of their tools. In addition, we provide single-function versions of the bugs, which are useful for evaluating intra-procedural analyses. A web-based user interface for the database allows to conveniently browse and visualize the collection of bugs. Our study provides insights into the nature and occurrence of variability bugs in four highly-configurable systems implemented in C/C++, and shows in what ways variability hinders comprehension and the uncovering of software bugs.

...read moreread less

43 citations

Journal Article•DOI•

An Empirical Study of Meta- and Hyper-Heuristic Search for Multi-Objective Release Planning

[...]

Yuanyuan Zhang¹, Mark Harman¹, Gabriela Ochoa², Guenther Ruhe³, Sjaak Brinkkemper⁴ - Show less +1 more•Institutions (4)

University College London¹, University of Stirling², University of Calgary³, Utrecht University⁴

05 Jun 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: In this paper, the authors present an empirical study of global, local, and hybrid meta- and hyper-heuristic search-based algorithms on 10 real-world datasets and find that the hyperheuristics are particularly effective.

...read moreread less

Abstract: A variety of meta-heuristic search algorithms have been introduced for optimising software release planning. However, there has been no comprehensive empirical study of different search algorithms across multiple different real-world datasets. In this article, we present an empirical study of global, local, and hybrid meta- and hyper-heuristic search-based algorithms on 10 real-world datasets. We find that the hyper-heuristics are particularly effective. For example, the hyper-heuristic genetic algorithm significantly outperformed the other six approaches (and with high effect size) for solution quality 85% of the time, and was also faster than all others 70% of the time. Furthermore, correlation analysis reveals that it scales well as the number of requirements increases.

...read moreread less

37 citations

Journal Article•DOI•

Test-Equivalence Analysis for Automatic Patch Generation

[...]

Sergey Mechtaev¹, Xiang Gao¹, Shin Hwei Tan¹, Abhik Roychoudhury¹•Institutions (1)

National University of Singapore¹

22 Oct 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: This work proposes two test-Equivalence relations based on runtime values and dependencies, respectively, and presents an algorithm that performs on-the-fly partitioning of patches into test-equivalence classes.

...read moreread less

Abstract: Automated program repair is a problem of finding a transformation (called a patch) of a given incorrect program that eliminates the observable failures. It has important applications such as providing debugging aids, automatically grading student assignments, and patching security vulnerabilities. A common challenge faced by existing repair techniques is scalability to large patch spaces, since there are many candidate patches that these techniques explicitly or implicitly consider.The correctness criteria for program repair is often given as a suite of tests. Current repair techniques do not scale due to the large number of test executions performed by the underlying search algorithms. In this work, we address this problem by introducing a methodology of patch generation based on a test-equivalence relation (if two programs are “test-equivalent” for a given test, they produce indistinguishable results on this test). We propose two test-equivalence relations based on runtime values and dependencies, respectively, and present an algorithm that performs on-the-fly partitioning of patches into test-equivalence classes.Our experiments on real-world programs reveal that the proposed methodology drastically reduces the number of test executions and therefore provides an order of magnitude efficiency improvement over existing repair techniques, without sacrificing patch quality.

...read moreread less

34 citations

Journal Article•DOI•

Spectrum-Based Fault Localization in Model Transformations

[...]

Javier Troya¹, Sergio Segura¹, José Antonio Parejo¹, Antonio Ruiz-Cortés¹•Institutions (1)

University of Seville¹

17 Sep 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: Evaluation results revealed that the best techniques, namely Kulcynski2, Mountford, Ochiai, and Zoltar, lead the debugger to inspect a maximum of three rules to locate the bug in around 74% of the cases.

...read moreread less

Abstract: Model transformations play a cornerstone role in Model-Driven Engineering (MDE), as they provide the essential mechanisms for manipulating and transforming models. The correctness of software built using MDE techniques greatly relies on the correctness of model transformations. However, it is challenging and error prone to debug them, and the situation gets more critical as the size and complexity of model transformations grow, where manual debugging is no longer possible.Spectrum-Based Fault Localization (SBFL) uses the results of test cases and their corresponding code coverage information to estimate the likelihood of each program component (e.g., statements) of being faulty. In this article we present an approach to apply SBFL for locating the faulty rules in model transformations. We evaluate the feasibility and accuracy of the approach by comparing the effectiveness of 18 different state-of-the-art SBFL techniques at locating faults in model transformations. Evaluation results revealed that the best techniques, namely Kulcynski2, Mountford, Ochiai, and Zoltar, lead the debugger to inspect a maximum of three rules to locate the bug in around 74% of the cases. Furthermore, we compare our approach with a static approach for fault localization in model transformations, observing a clear superiority of the proposed SBFL-based method.

...read moreread less

Journal Article•DOI•

STADS: Software Testing as Species Discovery

[...]

Marcel Böhme¹•Institutions (1)

National University of Singapore¹

27 Jun 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: In this paper, the authors introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS) to address the fundamental extrapolation challenge for automated test generation.

...read moreread less

Abstract: A fundamental challenge of software testing is the statistically well-grounded extrapolation from program behaviors observed during testing. For instance, a security researcher who has run the fuzzer for a week has currently no means (1) to estimate the total number of feasible program branches, given that only a fraction has been covered so far; (2) to estimate the additional time required to cover 10% more branches (or to estimate the coverage achieved in one more day, respectively); or (3) to assess the residual risk that a vulnerability exists when no vulnerability has been discovered. Failing to discover a vulnerability does not mean that none exists—even if the fuzzer was run for a week (or a year). Hence, testing provides no formal correctness guarantees.In this article, I establish an unexpected connection with the otherwise unrelated scientific field of ecology and introduce a statistical framework that models Software Testing and Analysis as Discovery of Species (STADS). For instance, in order to study the species diversity of arthropods in a tropical rain forest, ecologists would first sample a large number of individuals from that forest, determine their species, and extrapolate from the properties observed in the sample to properties of the whole forest. The estimations (1) of the total number of species, (2) of the additional sampling effort required to discover 10% more species, or (3) of the probability to discover a new species are classical problems in ecology. The STADS framework draws from over three decades of research in ecological biostatistics to address the fundamental extrapolation challenge for automated test generation. Our preliminary empirical study demonstrates a good estimator performance even for a fuzzer with adaptive sampling bias—AFL, a state-of-the-art vulnerability detection tool. The STADS framework provides statistical correctness guarantees with quantifiable accuracy.

...read moreread less

Journal Article•DOI•

Detecting the Behavior of Design Patterns through Model Checking and Dynamic Analysis

[...]

Andrea De Lucia¹, Vincenzo Deufemia¹, Carmine Gravino¹, Michele Risi¹•Institutions (1)

University of Salerno¹

12 Feb 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: The results reveal that ePAD outperforms other approaches by recovering more actual instances and achieves better results in terms of correctness and completeness.

...read moreread less

Abstract: We present a method and tool (ePAD) for the detection of design pattern instances in source code. The approach combines static analysis, based on visual language parsing and model checking, and dynamic analysis, based on source code instrumentation. Visual language parsing and static source code analysis identify candidate instances satisfying the structural properties of design patterns. Successively, model checking statically verifies the behavioral aspects of the candidates recovered in the previous phase. We encode the sequence of messages characterizing the correct behaviour of a pattern as Linear Temporal Logic (LTL) formulae and the sequence diagram representing the possible interaction traces among the objects involved in the candidates as Promela specifications. The model checker SPIN verifies that candidates satisfy the LTL formulae. Dynamic analysis is then performed on the obtained candidates by instrumenting the source code and monitoring those instances at runtime through the execution of test cases automatically generated using a search-based approach. The effectiveness of ePAD has been evaluated by detecting instances of 12 creational and behavioral patterns from six publicly available systems. The results reveal that ePAD outperforms other approaches by recovering more actual instances. Furthermore, on average ePAD achieves better results in terms of correctness and completeness.

...read moreread less

Journal Article•DOI•

The State of Empirical Evaluation in Static Feature Location

[...]

Abdul Razzaq¹, Asanka Wasala¹, Chris Exton¹, Jim Buckley¹•Institutions (1)

University of Limerick¹

05 Dec 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: Feature location (FL) is the task of finding the source code that implements a specific, user-observable functionality in a software system as mentioned in this paper and it plays a key role in many software maintenance tasks and a wide variety of Feature Location Techniques (FLTs), which rely on source code structure or textual analysis have been proposed by researchers.

...read moreread less

Abstract: Feature location (FL) is the task of finding the source code that implements a specific, user-observable functionality in a software system. It plays a key role in many software maintenance tasks and a wide variety of Feature Location Techniques (FLTs), which rely on source code structure or textual analysis, have been proposed by researchers. As FLTs evolve and more novel FLTs are introduced, it is important to perform comparison studies to investigate “Which are the best FLTs?” However, an initial reading of the literature suggests that performing such comparisons would be an arduous process, based on the large number of techniques to be compared, the heterogeneous nature of the empirical designs, and the lack of transparency in the literature. This article presents a systematic review of 170 FLT articles, published between the years 2000 and 2015. Results of the systematic review indicate that 95% of the articles studied are directed towards novelty, in that they propose a novel FLT. Sixty-nine percent of these novel FLTs are evaluated through standard empirical methods but, of those, only 9% use baseline technique(s) in their evaluations to allow cross comparison with other techniques. The heterogeneity of empirical evaluation is also clearly apparent: altogether, over 60 different FLT evaluation metrics are used across the 170 articles, 272 subject systems have been used, and 235 different benchmarks employed. The review also identifies numerous user input formats as contributing to the heterogeneity. Analysis of the existing research also suggests that only 27% of the FLTs presented might be reproduced from the published material. These findings suggest that comparison across the existing body of FLT evaluations is very difficult. We conclude by providing guidelines for empirical evaluation of FLTs that may ultimately help to standardise empirical research in the field, cognisant of FLTs with different goals, leveraging common practices in existing empirical evaluations and allied with rationalisations. This is seen as a step towards standardising evaluation in the field, thus facilitating comparison across FLTs.

...read moreread less

Journal Article•DOI•

Multi-Objective Optimization of Energy Consumption of GUIs in Android Apps

[...]

Mario Linares-Vasquez¹, Gabriele Bavota², Carlos Bernal-Cardenas³, Massimiliano Di Penta⁴, Rocco Oliveto⁵, Denys Poshyvanyk³ - Show less +2 more•Institutions (5)

University of Los Andes¹, University of Lugano², College of William & Mary³, University of Sannio⁴, University of Molise⁵

25 Sep 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: GEMMA, a tool aimed at optimizing the colors used by Android apps, is described, with the goal of reducing the energy consumption on (AM)OLED displays while keeping the user interface visually attractive for end-users.

...read moreread less

Abstract: The number of mobile devices sold worldwide has exponentially increased in recent years, surpassing that of personal computers in 2011. Such devices daily download and run millions of apps that take advantage of modern hardware features (e.g., multi-core processors, large Organic Light-Emitting Diode—OLED—screens, etc.) to offer exciting user experiences. Clearly, there is a cost to pay in terms of energy consumption and, in particular, of reduced battery life. This has pushed researchers to investigate how to reduce the energy consumption of apps, for example, by optimizing the color palette used in the app’s GUI. Whilst past research in this area aimed at optimizing energy while keeping an acceptable level of contrast, this article proposes an approach, named Gui Energy Multi-objective optiMization for Android apps (GEMMA), for generating color palettes using a multi-objective optimization technique, which produces color solutions optimizing energy consumption and contrast while using consistent colors with respect to the original color palette. The empirical evaluation demonstrates (i) substantial improvements in terms of the three different objectives, (ii) a concrete reduction of the energy consumption as assessed by a hardware power monitor, (iii) the attractiveness of the generated color compositions for apps’ users, and (iv) the suitability of GEMMA to be adopted in industrial contexts.

...read moreread less

Journal Article•DOI•

Inferring Extended Probabilistic Finite-State Automaton Models from Software Executions

[...]

Seyedeh Sepideh Emam¹, James Miller¹•Institutions (1)

University of Alberta¹

05 Jun 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: This article applies a hybrid technique to use both reinforcement learning and stochastic modeling to generate an extended probabilistic finite state automaton from software traces and indicates that ReHMM outperforms other inference algorithms.

...read moreread less

Abstract: Behavioral models are useful tools in understanding how programs work. Although several inference approaches have been introduced to generate extended finite-state automatons from software execution traces, they suffer from accuracy, flexibility, and decidability issues. In this article, we apply a hybrid technique to use both reinforcement learning and stochastic modeling to generate an extended probabilistic finite state automaton from software traces. Our approach—ReHMM (Reinforcement learning-based Hidden Markov Modelling)—is able to address the problems of inflexibility and un-decidability reported in other state-of-the-art approaches. Experimental results indicate that ReHMM outperforms other inference algorithms.

...read moreread less

Journal Article•DOI•

Shadow Symbolic Execution for Testing Software Patches

[...]

Tomasz Kuchta¹, Hristina Palikareva¹, Cristian Cadar¹•Institutions (1)

Imperial College London¹

25 Sep 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: A symbolic execution-based technique that is designed to generate test inputs that cover the new program behaviours introduced by a patch and evaluated on the Coreutils patches from the CoREBench suite of regression bugs shows that it is able to generatetest inputs that exercise newly added behaviours and expose some of the regression bugs.

...read moreread less

Abstract: While developers are aware of the importance of comprehensively testing patches, the large effort involved in coming up with relevant test cases means that such testing rarely happens in practice. Furthermore, even when test cases are written to cover the patch, they often exercise the same behaviour in the old and the new version of the code. In this article, we present a symbolic execution-based technique that is designed to generate test inputs that cover the new program behaviours introduced by a patch. The technique works by executing both the old and the new version in the same symbolic execution instance, with the old version shadowing the new one. During this combined shadow execution, whenever a branch point is reached where the old and the new version diverge, we generate a test input exercising the divergence and comprehensively test the new behaviours of the new version. We evaluate our technique on the Coreutils patches from the CoREBench suite of regression bugs, and show that it is able to generate test inputs that exercise newly added behaviours and expose some of the regression bugs.

...read moreread less

Journal Article•DOI•

Refactoring Multi-Level Models

[...]

Juan de Lara¹, Esther Guerra¹•Institutions (1)

Autonomous University of Madrid¹

16 Nov 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: A catalogue of 17 novel refactorings specific to multi-level models is proposed to help designers in rearranging elements across and within meta-levels and exploring the consequences.

...read moreread less

Abstract: Multi-level modelling promotes flexibility in modelling by enabling the use of several meta-levels instead of just two, as is the case in mainstream two-level modelling approaches. While this approach leads to simpler models for some scenarios, it introduces an additional degree of freedom as designers can decide the meta-level where an element should reside, having to ascertain the suitability of such decisions.In this respect, model refactorings have been successfully applied in the context of two-level modelling to rearrange the elements of a model while preserving its meaning. Following this idea, we propose a catalogue of 17 novel refactorings specific to multi-level models. Their objective is to help designers in rearranging elements across and within meta-levels and exploring the consequences. In this article, we detail each refactoring in the catalogue, show a classification across different dimensions, and describe the support we provide in our MetaDepth tool. We present two experiments to assess two aspects of our refactorings. The first one validates the predicted semantic side effects of the refactorings on the basis of more than 210.000 refactoring applications. The second one measures the impact of refactorings on three quality attributes of multi-level models.

...read moreread less

Journal Article•DOI•

Assessing the Refactoring of Brain Methods

[...]

Santiago Vidal¹, Iñaki berra, Santiago Zulliani, Claudia Marcos, J. Andrés Díaz Pace¹ - Show less +1 more•Institutions (1)

National Scientific and Technical Research Council¹

16 Apr 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: Bandago is presented, an automated approach to fix a specific type of code smell called Brain Method, which centralizes the intelligence of a class and manifests itself as a long and complex method that is difficult to understand and maintain by developers.

...read moreread less

Abstract: Code smells are a popular mechanism for identifying structural design problems in software systems. Several tools have emerged to support the detection of code smells and propose some refactorings. However, existing tools do not guarantee that a smell will be automatically fixed by means of refactorings. This article presents Bandago, an automated approach to fix a specific type of code smell called Brain Method. A Brain Method centralizes the intelligence of a class and manifests itself as a long and complex method that is difficult to understand and maintain by developers. For each Brain Method, Bandago recommends several refactoring solutions to remove the smell using a search strategy based on simulated annealing. Our approach has been evaluated with several open-source Java applications, and the results show that Bandago can automatically fix more than 60% of Brain Methods. Furthermore, we conducted a survey with 35 industrial developers that showed evidence about the usefulness of the refactorings proposed by Bandago. Also, we compared the performance of the Bandago against that of a third-party refactoring tool.

...read moreread less

Journal Article•DOI•

Oracles for Testing Software Timeliness with Uncertainty

[...]

Chunhui Wang¹, Fabrizio Pastore¹, Lionel C. Briand¹•Institutions (1)

University of Luxembourg¹

16 Nov 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: Stochastic Testing with Unique Input Output Sequences is proposed, an approach for the automated generation of stochastic oracles that verify the capability of a software system to fulfill timing constraints in the presence of time uncertainty.

...read moreread less

Abstract: Uncertainty in timing properties (e.g., detection time of external events) is a common occurrence in embedded software systems, since these systems interact with complex physical environments.Such time uncertainty leads to non-determinism. For example, time-triggered operations may either generate different valid outputs across different executions or experience failures (e.g., results not being generated in the expected time window) that occur only occasionally over many executions. For these reasons, time uncertainty makes the generation of effective test oracles for timing requirements a challenging task.To address the above challenge, we propose Stochastic Testing with Unique Input Output Sequences, an approach for the automated generation of stochastic oracles that verify the capability of a software system to fulfill timing constraints in the presence of time uncertainty. Such stochastic oracles entail the statistical analysis of repeated test case executions based on test output probabilities predicted by means of statistical model checking. Results from two industrial case studies in the automotive domain demonstrate that this approach improves the fault detection effectiveness of tests suites derived from timed automata compared to traditional approaches.

...read moreread less

Journal Article•DOI•

Prove ite Inferring Formal Proof Scripts from CafeOBJ Proof Scores

[...]

Adrián Riesco¹, Kazuhiro Ogata²•Institutions (2)

Complutense University of Madrid¹, Japan Advanced Institute of Science and Technology²

19 Jul 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: The CiM PA and CiMPG are presented, detailing the behavior of the CiMPA and the algorithm underlying the Ci MPG and illustrating the power of the approach by using the QLOCK protocol.

...read moreread less

Abstract: CafeOBJ is a language for writing formal specifications for a wide variety of software and hardware systems and for verifying their properties. CafeOBJ makes it possible to verify properties by using either proof scores, which consists of reducing goal-related terms in user-defined modules, or by using theorem proving. While the former is more flexible, it lacks the formal support to ensure that a property has been really proven. On the other hand, theorem proving might be too strict, since only a predefined set of commands can be applied to the current goal; hence, it hardens the verification of properties.In order to take advantage of the benefits of both techniques, we have extended CafeInMaude, a CafeOBJ interpreter implemented in Maude, with the CafeInMaude Proof Assistant (CiMPA) and the CafeInMaude Proof Generator (CiMPG). CiMPA is a proof assistant for proving inductive properties on CafeOBJ specifications that uses Maude metalevel features to allow programmers to create and manipulate CiMPA proofs. On the other hand, CiMPG provides a minimal set of annotations for identifying proof scores and generating CiMPA scripts for these proof scores. In this article, we present the CiMPA and CiMPG, detailing the behavior of the CiMPA and the algorithm underlying the CiMPG and illustrating the power of the approach by using the QLOCK protocol. Finally, we present some benchmarks that give us confidence in the matureness and usefulness of these tools.

...read moreread less

Journal Article•DOI•

Maintaining Architecture-Implementation Conformance to Support Architecture Centrality: From Single System to Product Line Development

[...]

Yongjie Zheng¹, Cuong Cu¹, Richard N. Taylor²•Institutions (2)

University of Missouri–Kansas City¹, University of California, Irvine²

27 Jun 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: The result shows that the approaches presented are applicable to the implementation of a real software system and are capable of maintaining architecture-implementation conformance during system evolution.

...read moreread less

Abstract: Architecture-centric development addresses the increasing complexity and variability of software systems by focusing on architectural models, which are generally easier to understand and manipulate than source code. It requires a mechanism that can maintain architecture-implementation conformance during architectural development and evolution. The challenge is twofold. There is an abstraction gap between software architecture and implementation, and both may evolve. Existing approaches are deficient in support for both change mapping and product line architecture. This article presents a novel approach named 1.x-way mapping and its extension, 1.x-line mapping to support architecture-implementation mapping in single system development and in product line development, respectively. They specifically address mapping architecture changes to code, maintaining variability conformance between product line architecture and code, and tracing architectural implementation. We built software tools named xMapper and xLineMapper to realize the two approaches, and conducted case studies with two existing open-source systems to evaluate the approaches. The result shows that our approaches are applicable to the implementation of a real software system and are capable of maintaining architecture-implementation conformance during system evolution.

...read moreread less

Journal Article•DOI•

Recommending Who to Follow in the Software Engineering Twitter Space

[...]

Abhishek Sharma¹, Yuan Tian¹, Agus Sulistya¹, Dinusha Wijedasa¹, David Lo¹ - Show less +1 more•Institutions (1)

Singapore Management University¹

22 Oct 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: The approach first extracts different kinds of features that characterize a Twitter user and then employs a two-stage classification approach to generate a discriminative model, which can differentiate specialized software gurus in a particular domain from other Twitter users that generate domain-related tweets.

...read moreread less

Abstract: With the advent of social media, developers are increasingly using it in their software development activities. Twitter is one of the popular social mediums used by developers. A recent study by Singer et al. found that software developers use Twitter to “keep up with the fast-paced development landscape.” Unfortunately, due to the general-purpose nature of Twitter, it’s challenging for developers to use Twitter for their development activities. Our survey with 36 developers who use Twitter in their development activities highlights that developers are interested in following specialized software gurus who share relevant technical tweets.To help developers perform this task, in this work we propose a recommendation system to identify specialized software gurus. Our approach first extracts different kinds of features that characterize a Twitter user and then employs a two-stage classification approach to generate a discriminative model, which can differentiate specialized software gurus in a particular domain from other Twitter users that generate domain-related tweets (aka domain-related Twitter users). We have investigated the effectiveness of our approach in finding specialized software gurus for four different domains (JavaScript, Android, Python, and Linux) on a dataset of 86,824 Twitter users who generate 5,517,878 tweets over 1 month. Our approach can differentiate specialized software experts from other domain-related Twitter users with an F-Measure of up to 0.820. Compared with existing Twitter domain expert recommendation approaches, our proposed approach can outperform their F-Measure by at least 7.63%.

...read moreread less

Journal Article•DOI•

Understanding and Combating Memory Bloat in Managed Data-Intensive Systems

[...]

Khanh Nguyen¹, Kai Wang¹, Yingyi Bu¹, Lu Fang¹, Guoqing Xu¹ - Show less +1 more•Institutions (1)

University of California, Irvine¹

03 Jan 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: A novel compiler framework, called Facade, is designed that can generate highly efficient data manipulation code by automatically transforming the data path of an existing data-intensive application, leading to significantly reduced memory management cost and improved scalability.

...read moreread less

Abstract: The past decade has witnessed increasing demands on data-driven business intelligence that led to the proliferation of data-intensive applications. A managed object-oriented programming language such as Java is often the developer’s choice for implementing such applications, due to its quick development cycle and rich suite of libraries and frameworks. While the use of such languages makes programming easier, their automated memory management comes at a cost. When the managed runtime meets large volumes of input data, memory bloat is significantly magnified and becomes a scalability-prohibiting bottleneck.This article first studies, analytically and empirically, the impact of bloat on the performance and scalability of large-scale, real-world data-intensive systems. To combat bloat, we design a novel compiler framework, called Facade, that can generate highly efficient data manipulation code by automatically transforming the data path of an existing data-intensive application. The key treatment is that in the generated code, the number of runtime heap objects created for data classes in each thread is (almost) statically bounded, leading to significantly reduced memory management cost and improved scalability. We have implemented Facade and used it to transform seven common applications on three real-world, already well-optimized data processing frameworks: GraphChi, Hyracks, and GPS. Our experimental results are very positive: the generated programs have (1) achieved a 3% to 48% execution time reduction and an up to 88× GC time reduction, (2) consumed up to 50% less memory, and (3) scaled to much larger datasets.

...read moreread less

Journal Article•DOI•

Global and Local Deadlock Freedom in BIP

[...]

Paul C. Attie¹, Saddek Bensalem², Marius Bozga², Mohamad Y. Jaber¹, Joseph Sifakis³, Fadi A. Zaraket¹ - Show less +2 more•Institutions (3)

American University of Beirut¹, Centre national de la recherche scientifique², École Polytechnique³

03 Jan 2018-ACM Transactions on Software Engineering and Methodology

TL;DR: A criterion for checking local and global deadlock freedom of finite state systems expressed in BIP: a component-based framework for constructing complex distributed systems, which certifies freedom from local deadlock, in which a subsystem is deadlocked while the rest of the system executes.

...read moreread less

Abstract: We present a criterion for checking local and global deadlock freedom of finite state systems expressed in BIP: a component-based framework for constructing complex distributed systems. Our criterion is evaluated by model-checking a set of subsystems of the overall large system. If satisfied in small subsystems, it implies deadlock-freedom of the overall system. If not satisfied, then we re-evaluate over larger subsystems, which improves the accuracy of the check. When the subsystem being checked becomes the entire system, our criterion becomes complete for deadlock-freedom. Hence our criterion only fails to decide deadlock freedom because of computational limitations: state-space explosion sets in when the subsystems become too large. Our method thus combines the possibility of fast response together with theoretical completeness. Other criteria for deadlock freedom, in contrast, are incomplete in principle, and so may fail to decide deadlock freedom even if unlimited computational resources are available. Also, our criterion certifies freedom from local deadlock, in which a subsystem is deadlocked while the rest of the system executes. Other criteria only certify freedom from global deadlock. We present experimental results for dining philosophers and for a multi-token-based resource allocation system, which subsumes several data arbiters and schedulers, including Milner’s token-based scheduler.

...read moreread less