Showing papers in "arXiv: Software Engineering in 2015"

Posted Content•

The Use of Machine Learning Algorithms in Recommender Systems: A Systematic Review

Ivens Portugal¹, Paulo S. C. Alencar¹, Donald D. Cowan¹

17 Nov 2015-arXiv: Software Engineering

TL;DR: In this paper, the authors present a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for software engineering research, and conclude that Bayesian and decision tree algorithms are widely used in recommendation systems because of their relative simplicity and that requirement and design phases of recommender system development appear to offer opportunities for further research.

Abstract: Recommender systems use algorithms to provide users with product or service recommendations. Recently, these systems have been using machine learning algorithms from the field of artificial intelligence. However, choosing a suitable machine learning algorithm for a recommender system is difficult because of the number of algorithms described in the literature. Researchers and practitioners developing recommender systems are left with little information about the current approaches in algorithm usage. Moreover, the development of a recommender system using a machine learning algorithm often has problems and open questions that must be evaluated, so software engineers know where to focus research efforts. This paper presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for software engineering research. The study concludes that Bayesian and decision tree algorithms are widely used in recommender systems because of their relative simplicity, and that requirement and design phases of recommender system development appear to offer opportunities for further research.

354 citations

Proceedings Article•DOI•

SourcererCC: Scaling Code Clone Detection to Big Code

Hitesh Sajnani¹, Vaibhav Saini¹, Jeffrey Svajlenko², Chanchal K. Roy², Cristina V. Lopes¹

University of California, Irvine¹, University of Saskatchewan²

20 Dec 2015-arXiv: Software Engineering

TL;DR: This paper presents SourcererCC, a token-based clone detector that can detect both exact and near-miss clones in large inter-project repositories using a standard workstation, evaluates its scalability, execution time, recall, and precision, and compares it to four publicly available, state-of-the-art tools.

Abstract: Despite a decade of active research, there is a marked lack of clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, and the number of token comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall, and precision of SourcererCC, and compare it to four publicly available, state-of-the-art tools. To measure recall, we use two recent benchmarks: (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find that SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250 MLOC) using a standard workstation.
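
The core idea of querying an inverted index for candidate clones can be illustrated with a minimal sketch. This is not the SourcererCC implementation: the tokenizer, the function names `tokenize` and `find_clones`, the threshold `theta`, and the rule "shared tokens must reach `theta` times the size of the longer block" are all assumptions chosen for illustration, and the token-ordering filtering heuristics mentioned in the abstract are omitted.

```python
import re
from collections import Counter, defaultdict


def tokenize(code):
    """Crude tokenizer (an assumption): identifiers, integers, single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def find_clones(blocks, theta=0.8):
    """Return (earlier, later) name pairs whose token overlap meets the threshold."""
    bags = {name: Counter(tokenize(src)) for name, src in blocks.items()}
    index = defaultdict(set)  # inverted index: token -> names of blocks containing it
    pairs = []
    for name, bag in bags.items():
        # Only blocks that share at least one token with this block are candidates.
        candidates = set()
        for tok in bag:
            candidates |= index[tok]
        for other in sorted(candidates):
            shared = sum((bag & bags[other]).values())  # multiset intersection size
            longest = max(sum(bag.values()), sum(bags[other].values()))
            if shared >= theta * longest:
                pairs.append((other, name))
        # Index this block only after querying, so each pair is reported once.
        for tok in bag:
            index[tok].add(name)
    return pairs


# Hypothetical code blocks: the first two differ only in the function name.
blocks = {
    "sum_ints": "int sum(int x, int y) { return x + y; }",
    "add_ints": "int add(int x, int y) { return x + y; }",
    "loop": "while (true) { print(1); }",
}
clone_pairs = find_clones(blocks)
```

Because candidates are drawn only from index entries for tokens the query block actually contains, blocks with no tokens in common are never compared; the heuristics described in the abstract reduce the index and the comparison count further.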

259 citations

Posted Content•

Enabling High-Level Application Development for the Internet of Things

Pankesh Patel, Damien Cassou¹

French Institute for Research in Computer Science and Automation¹

21 Jan 2015-arXiv: Software Engineering

TL;DR: This paper proposes a development methodology that separates IoT application development into different concerns and provides a conceptual framework for developing an application, together with a development framework that implements the methodology to support the actions of stakeholders.

Abstract: Application development in the Internet of Things (IoT) is challenging because it involves dealing with a wide range of related issues, such as a lack of separation of concerns and a lack of high-level abstractions to address both the large scale and the heterogeneity. Moreover, stakeholders involved in application development have to address issues that can be attributed to different life-cycle phases. First, the application logic has to be analyzed and then separated into a set of distributed tasks for an underlying network. Then, the tasks have to be implemented for the specific hardware. Apart from handling these issues, they have to deal with other aspects of the life-cycle, such as changes in application requirements and deployed devices. Several approaches have been proposed in the closely related fields of wireless sensor networks, ubiquitous and pervasive computing, and software engineering in general to address the above challenges. However, existing approaches cover only limited subsets of the above-mentioned challenges when applied to the IoT. This paper proposes an integrated approach for addressing these challenges. The main contributions of this paper are: (1) a development methodology that separates IoT application development into different concerns and provides a conceptual framework to develop an application, and (2) a development framework that implements the development methodology to support the actions of stakeholders. The development framework provides a set of modeling languages to specify each development concern and abstracts the scale- and heterogeneity-related complexity. It integrates code generation, task-mapping, and linking techniques to provide automation. Code generation supports the application development phase by producing a programming framework that allows stakeholders to focus on the application logic, while our mapping and linking techniques together support the deployment phase by producing device-specific code, resulting in a distributed system collaboratively hosted by individual devices. Our evaluation based on two realistic scenarios shows that the use of our approach improves the productivity of stakeholders involved in application development.

168 citations

Posted Content•

On the "Naturalness" of Buggy Code

Baishakhi Ray¹, Vincent J. Hellendoorn², Saheel Godhane², Zhaopeng Tu³, Alberto Bacchelli⁴, Premkumar Devanbu²

University of Virginia¹, University of California, Davis², Huawei³, Delft University of Technology⁴

03 Jun 2015-arXiv: Software Engineering

TL;DR: It is found that code with bugs tends to be more entropic (i.e. unnatural), becoming less so as bugs are fixed, suggesting that entropy may be a valid, simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault-localization and searching for fixes.

Abstract: Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be "natural", like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturalness of software through statistical models and used them to good effect in suggestion engines, porting tools, coding standards checkers, and idiom miners. This suggests that code that appears improbable, or surprising, to a good statistical language model is "unnatural" in some sense, and thus possibly suspicious. In this paper, we investigate this hypothesis. We consider a large corpus of bug fix commits (ca. 8,296), from 10 different Java projects, and we focus on its language statistics, evaluating the naturalness of buggy code and the corresponding fixes. We find that code with bugs tends to be more entropic (i.e., unnatural), becoming less so as bugs are fixed. Focusing on highly entropic lines is similar in cost-effectiveness to some well-known static bug finders (PMD, FindBugs) and ordering warnings from these bug finders using an entropy measure improves the cost-effectiveness of inspecting code implicated in warnings. This suggests that entropy may be a valid language-independent and simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault-localization and searching for fixes.
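
To make "entropic" concrete, the sketch below scores a tokenized line by its average cross-entropy in bits per token under a language model trained on other code. The bigram model with add-one smoothing is a stand-in chosen for brevity, not the model used in the paper, and the names `train_bigram` and `line_entropy` as well as the toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigram(token_lines):
    """Count bigrams over tokenized lines; '<s>' marks the start of each line."""
    context, bigram, vocab = Counter(), Counter(), set()
    for toks in token_lines:
        seq = ["<s>"] + toks
        vocab.update(toks)
        for prev, cur in zip(seq, seq[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    # +1 reserves one vocabulary slot for tokens never seen in training.
    return context, bigram, len(vocab) + 1


def line_entropy(toks, model):
    """Average bits per token of a line under the add-one-smoothed bigram model."""
    context, bigram, vocab_size = model
    seq = ["<s>"] + toks
    bits = 0.0
    for prev, cur in zip(seq, seq[1:]):
        p = (bigram[(prev, cur)] + 1) / (context[prev] + vocab_size)
        bits -= math.log2(p)
    return bits / max(len(toks), 1)


# Toy corpus (an assumption): the same increment statement seen three times.
corpus = [["x", "=", "x", "+", "1"]] * 3
model = train_bigram(corpus)
familiar = line_entropy(["x", "=", "x", "+", "1"], model)
surprising = line_entropy(["1", "+", "=", "x", "x"], model)
```

Here `surprising` comes out higher than `familiar` because most of its token transitions never occur in the training corpus. Ranking lines by such a score is the kind of ordering the abstract compares against static bug finders.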

150 citations

Posted Content•

Migrating to Cloud-Native Architectures Using Microservices: An Experience Report

Armin Balalaie¹, Abbas Heydarnoori¹, Pooyan Jamshidi²

Sharif University of Technology¹, Imperial College London²

29 Jul 2015-arXiv: Software Engineering

TL;DR: In this article, the authors report their experience and lessons learned in an ongoing project on migrating a monolithic on-premise software architecture to microservices, and conclude that microservices are not a one-size-fits-all solution: they introduce new complexities to the system, and many factors, such as distribution complexities, should be considered before adopting this style.

Abstract: Migration to the cloud has been a popular topic in industry and academia in recent years. Despite the many benefits that the cloud presents, such as high availability and scalability, most on-premise application architectures are not ready to fully exploit the benefits of this environment, and adapting them to this environment is a non-trivial task. Microservices have appeared recently as a novel architectural style that is native to the cloud. These cloud-native architectures can facilitate migrating on-premise architectures to fully benefit from cloud environments because non-functional attributes, like scalability, are inherent in this style. Existing approaches to cloud migration mostly do not treat cloud-native architectures as first-class citizens. As a result, the final product may not meet its primary drivers for migration. In this paper, we report our experience and lessons learned in an ongoing project on migrating a monolithic on-premise software architecture to microservices. We concluded that microservices are not a one-size-fits-all solution, as they introduce new complexities to the system, and many factors, such as distribution complexities, should be considered before adopting this style. However, if adopted in a context that needs high flexibility in terms of scalability and availability, the style can deliver its promised benefits.

119 citations

Posted Content•

Clustering-Based Predictive Process Monitoring

Chiara Di Francescomarino, Marlon Dumas¹, Fabrizio Maria Maggi¹, Irene Teinemaa¹

University of Tartu¹

03 Jun 2015-arXiv: Software Engineering

TL;DR: This paper proposes a predictive process monitoring framework for estimating the probability that a given predicate will be fulfilled upon completion of a running case, taking into account both the sequence of events observed in the current trace and the data attributes associated with these events.

Abstract: Business process enactment is generally supported by information systems that record data about process executions, which can be extracted as event logs. Predictive process monitoring is concerned with exploiting such event logs to predict how running (uncompleted) cases will unfold up to their completion. In this paper, we propose a predictive process monitoring framework for estimating the probability that a given predicate will be fulfilled upon completion of a running case. The predicate can be, for example, a temporal logic constraint or a time constraint, or any predicate that can be evaluated over a completed trace. The framework takes into account both the sequence of events observed in the current trace and the data attributes associated with these events. The prediction problem is approached in two phases. First, prefixes of previous traces are clustered according to control flow information. Second, a classifier is built for each cluster using event data to discriminate between fulfillments and violations. At runtime, a prediction is made on a running case by mapping it to a cluster and applying the corresponding classifier. The framework has been implemented in the ProM toolset and validated on a log pertaining to the treatment of cancer patients in a large hospital.

100 citations

Posted Content•

Untangling Fine-Grained Code Changes

Martín Dias¹, Alberto Bacchelli², Georgios Gousios³, Damien Cassou¹, Stéphane Ducasse¹

French Institute for Research in Computer Science and Automation¹, Delft University of Technology², Radboud University Nijmegen³

24 Feb 2015-arXiv: Software Engineering

TL;DR: This paper contributes a publicly available dataset of untangled code changes, created with the help of two developers who accurately split their code changes into self-contained tasks over a period of four months, and a novel approach to help developers share untangled commits (a.k.a. atomic commits) by using fine-grained code change information.

Abstract: After working for some time, developers commit their code changes to a version control system. When doing so, they often bundle unrelated changes (e.g., bug fix and refactoring) in a single commit, thus creating a so-called tangled commit. Sharing tangled commits is problematic because it makes review, reversion, and integration of these commits harder and historical analyses of the project less reliable. Researchers have worked at untangling existing commits, i.e., finding which part of a commit relates to which task. In this paper, we contribute to this line of work in two ways: (1) a publicly available dataset of untangled code changes, created with the help of two developers who accurately split their code changes into self-contained tasks over a period of four months; (2) a novel approach, EpiceaUntangler, to help developers share untangled commits (a.k.a. atomic commits) by using fine-grained code change information. EpiceaUntangler is based and tested on the publicly available dataset, and further evaluated by deploying it to 7 developers, who used it for 2 weeks. We recorded a median success rate of 91% and an average of 75% in automatically creating clusters of untangled fine-grained code changes.

84 citations

Posted Content•

YesWorkflow: A User-Oriented, Language-Independent Tool for Recovering Workflow Information from Scripts

09 Feb 2015-arXiv: Software Engineering

TL;DR: YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts, and represents the scripts in terms of entities based on the typical scientific workflow model.

Abstract: Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems due to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB), and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow also will allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.
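
The comment-annotation idea can be sketched as follows. The abstract does not name the annotation keywords; the `@begin`, `@end`, `@in`, and `@out` tags used here are assumed from the YesWorkflow project's conventions, and the parser below (`extract_blocks`, `dataflow_edges`) is a hypothetical, much-reduced illustration rather than the tool's own extractor.

```python
import re

# Matches an annotation inside a '#' comment, e.g. "# @in raw_file".
TAG = re.compile(r"#\s*@(begin|end|in|out)\s+(\w+)")


def extract_blocks(script):
    """Collect annotated blocks with their declared inputs and outputs."""
    blocks, stack = [], []
    for line in script.splitlines():
        match = TAG.search(line)
        if not match:
            continue
        tag, name = match.groups()
        if tag == "begin":
            block = {"name": name, "in": [], "out": []}
            blocks.append(block)
            stack.append(block)
        elif tag == "end":
            if stack:
                stack.pop()
        elif stack:
            # @in / @out attach to the innermost block that is still open.
            stack[-1][tag].append(name)
    return blocks


def dataflow_edges(blocks):
    """Link a block's output to every other block that declares it as input."""
    edges = []
    for producer in blocks:
        for consumer in blocks:
            if producer is consumer:
                continue
            for item in producer["out"]:
                if item in consumer["in"]:
                    edges.append((producer["name"], item, consumer["name"]))
    return edges


# Hypothetical annotated script; it is only parsed here, never executed.
SCRIPT = """
# @begin load_data
# @in raw_file
# @out table
table = open("raw.csv").read()
# @end load_data

# @begin summarize
# @in table
# @out report
report = len(table)
# @end summarize
"""

workflow_blocks = extract_blocks(SCRIPT)
edges = dataflow_edges(workflow_blocks)
```

The resulting edge list is the workflow-like view that a graphical renderer could draw. Because only comments are read, the same approach works for any scripting language whose comment marker the regular expression is adapted to.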

75 citations

Proceedings Article•DOI•

SWIM: Synthesizing What I Mean

Mukund Raghothaman, Yi Wei, Youssef Hamadi

26 Nov 2015-arXiv: Software Engineering

TL;DR: This paper describes SWIM, a tool that suggests code snippets given API-related natural language queries such as "generate md5 hash code" and translates user queries into the APIs of interest using clickthrough data from the Bing search engine.

Abstract: Modern programming frameworks come with large libraries, with diverse applications such as for matching regular expressions, parsing XML files and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool which suggests code snippets given API-related natural language queries such as "generate md5 hash code". We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce structured call sequences to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis. We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.

72 citations

Posted Content•

An Extensive Systematic Review on Model-Driven Development of Secure Systems

Phu H. Nguyen, Max E. Kramer, Jacques Klein, Yves Le Traon

25 May 2015-arXiv: Software Engineering

TL;DR: The results suggest the need for addressing multiple security concerns more systematically and simultaneously, for tool chains supporting the MDS development cycle, and for more empirical studies on the application of MDS methodologies.

Abstract: Context: Model-Driven Security (MDS) is a specialised Model-Driven Engineering research area for supporting the development of secure systems. Over a decade of research on MDS has resulted in a large number of publications. Objective: To provide a detailed analysis of the state of the art in MDS, a systematic literature review (SLR) is essential. Method: We conducted an extensive SLR on MDS. Derived from our research questions, we designed a rigorous, extensive search and selection process to identify a set of primary MDS studies that is as complete as possible. Our three-pronged search process consists of automatic searching, manual searching, and snowballing. After discovering and considering more than a thousand relevant papers, we identified, strictly selected, and reviewed 108 MDS publications. Results: The results of our SLR show the overall status of the key artefacts of MDS, and the identified primary MDS studies. For example, regarding security modelling artefacts, we found that developing domain-specific languages plays a key role in many MDS approaches. The current limitations in each MDS artefact are pointed out and corresponding potential research directions are suggested. Moreover, we categorise the identified primary MDS studies into 5 principal MDS studies, and other emerging or less common MDS studies. Finally, some trend analyses of MDS research are given. Conclusion: Our results suggest the need for addressing multiple security concerns more systematically and simultaneously, for tool chains supporting the MDS development cycle, and for more empirical studies on the application of MDS methodologies. To the best of our knowledge, this SLR is the first in the field of Software Engineering that combines a snowballing strategy with database searching. This combination has delivered an extensive literature study on MDS.

59 citations

Posted Content•

Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset

Thomas Durieux¹, Matias Martinez¹, Martin Monperrus¹, Romain Sommerard¹, Jifeng Xuan²

Lille University of Science and Technology¹, Wuhan University²

26 May 2015-arXiv: Software Engineering

TL;DR: An experiment on automatically repairing 224 bugs of a real-world and publicly available bug dataset, Defects4J, finds that only 8 patches are undoubtedly correct, a novel piece of evidence that there is large room for improvement in the area of test suite based repair.

Abstract: Automatic software repair aims to reduce human effort for fixing bugs. Various automatic repair approaches have emerged in recent years. In this paper, we report on an experiment on automatically repairing 224 bugs of a real-world and publicly available bug dataset, Defects4J. We investigate the results of three repair methods: GenProg (repair via random search), Kali (repair via exhaustive search), and Nopol (repair via constraint-based search). We conduct our investigation with five research questions: fixability, patch correctness, ill-defined bugs, performance, and fault localizability. Our implementations of GenProg, Kali, and Nopol together fix 41 out of 224 (18%) bugs with 59 different patches. This can be viewed as a baseline for future usage of Defects4J for automatic repair research. In addition, manual analysis of a sample of 42 of the 59 generated patches shows that only 8 patches are undoubtedly correct. This is a novel piece of evidence that there is large room for improvement in the area of test-suite-based repair.

Posted Content•

Applied Metamodelling: A Foundation for Language Driven Development (Third Edition)

Tony Clark, Paul Sammut, James Willans

01 May 2015-arXiv: Software Engineering

TL;DR: In this book, the authors propose that the problems facing modern system developers share a common foundation for their resolution: languages, which are the primary way in which system developers communicate, design, and implement systems, and which provide abstractions that can encapsulate complexity, embrace the diversity of technologies and design abstractions, and unite modern and legacy systems.

Abstract: Modern day system developers have some serious problems to contend with. The systems they develop are becoming increasingly complex as customers demand richer functionality delivered in ever shorter timescales. They have to manage a huge diversity of implementation technologies, design techniques and development processes: everything from scripting languages to web-services to the latest 'silver bullet' design abstraction. To add to that, nothing stays still: today's 'must have' technology rapidly becomes tomorrow's legacy problem that must be managed along with everything else. How can these problems be dealt with? In this book we propose that there is a common foundation to their resolution: languages. Languages are the primary way in which system developers communicate, design and implement systems. Languages provide abstractions that can encapsulate complexity, embrace the diversity of technologies and design abstractions, and unite modern and legacy systems.

Posted Content•

On End-to-End Program Generation from User Intention by Deep Neural Networks

Lili Mou, Rui Men, Ge Li, Lu Zhang, Zhi Jin

25 Oct 2015-arXiv: Software Engineering

TL;DR: This paper envisions an end-to-end program generation scenario using recurrent neural networks (RNNs): users express their intention in natural language, and an RNN then automatically generates the corresponding code in a character-by-character fashion.

Abstract: This paper envisions an end-to-end program generation scenario using recurrent neural networks (RNNs): users express their intention in natural language, and an RNN then automatically generates the corresponding code in a character-by-character fashion. We demonstrate its feasibility through a case study and empirical analysis. To make such a technique fully useful in practice, we also point out several cross-disciplinary challenges, including modeling user intention, providing datasets, and improving model architectures. Although much long-term research remains to be done in this new field, we believe end-to-end program generation will become a reality in future decades, and we look forward to its practice.

Posted Content•

Performance-oriented DevOps: A Research Agenda

Andreas Brunnert, André van Hoorn, Felix Willnecker, Alexandru Danciu, Wilhelm Hasselbring, Christoph Heger, Nikolas Herbst, Pooyan Jamshidi, Reiner Jung, Jóakim von Kistowski, Anne Koziolek, Johannes Kroß, Simon Spinner, Christian Vögele, Jürgen Walter, Alexander Wert

18 Aug 2015-arXiv: Software Engineering

TL;DR: In this article, the authors focus on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance, which describes system properties concerning its timeliness and use of resources.

Abstract: DevOps is a trend towards a tighter integration between development (Dev) and operations (Ops) teams. The need for such an integration is driven by the requirement to continuously adapt enterprise applications (EAs) to changes in the business environment. As of today, DevOps concepts have been primarily introduced to ensure a constant flow of features and bug fixes into new releases from a functional perspective. In order to integrate a non-functional perspective into these DevOps concepts this report focuses on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance. Performance describes system properties concerning its timeliness and use of resources. Common metrics are response time, throughput, and resource utilization. Performance goals for EAs are typically defined by setting upper and/or lower bounds for these metrics and specific business transactions. In order to ensure that such performance goals can be met, several activities are required during development and operation of these systems as well as during the transition from Dev to Ops. Activities during development are typically summarized by the term Software Performance Engineering (SPE), whereas activities during operations are called Application Performance Management (APM). SPE and APM were historically tackled independently from each other, but the newly emerging DevOps concepts require and enable a tighter integration between both activity streams. This report presents existing solutions to support this integration as well as open research challenges in this area.

Posted Content•

Safety-Constrained Reinforcement Learning for MDPs

Sebastian Junges¹, Nils Jansen², Christian Dehnert¹, Ufuk Topcu², Joost-Pieter Katoen¹

RWTH Aachen University¹, University of Texas at Austin²

20 Oct 2015-arXiv: Software Engineering

TL;DR: This paper considers controller synthesis for stochastic and partially unknown environments in which safety is essential, abstracting the problem as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space.

Abstract: We consider controller synthesis for stochastic and partially unknown environments in which safety is essential. Specifically, we abstract the problem as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space. Standard learning approaches synthesize cost-optimal strategies without guaranteeing safety properties. To remedy this, we first compute safe, permissive strategies. Then, exploration is constrained to these strategies and thereby meets the imposed safety requirements. Exploiting an iterative learning procedure, the resulting policy is safety-constrained and optimal. We show correctness and completeness of the method and discuss the use of several heuristics to increase its scalability. Finally, we demonstrate the applicability by means of a prototype implementation.

Proceedings Article•DOI•

Understanding the Affect of Developers: Theoretical Background and Guidelines for Psychoempirical Software Engineering

Daniel Graziotin¹, Xiaofeng Wang¹, Pekka Abrahamsson²

Free University of Bozen-Bolzano¹, Norwegian University of Science and Technology²

14 Jul 2015-arXiv: Software Engineering

TL;DR: This paper highlights the challenges of conducting affect-related studies with psychology, provides a comprehensive literature review in affect theory, and proposes guidelines for conducting psychoempirical software engineering.

Abstract: Affects (emotions and moods) have an impact on cognitive processing activities and the working performance of individuals. It has been established that software development tasks are undertaken through cognitive processing activities. Therefore, we have proposed to employ psychology theory and measurements in software engineering (SE) research. We have called it "psychoempirical software engineering". However, we found that existing SE research has often fallen into misconceptions about the affect of developers, lacking background theory and guidance on how to successfully employ psychological measurements in studies. The contribution of this paper is threefold: (1) it highlights the challenges of conducting proper affect-related studies with psychology; (2) it provides a comprehensive literature review in affect theory; and (3) it proposes guidelines for conducting psychoempirical software engineering.

Posted Content•

The xSAP Safety Analysis Platform

Benjamin Bittner¹, Marco Bozzano¹, Roberto Cavada¹, Alessandro Cimatti¹, Marco Gario¹, Alberto Griggio¹, Cristian Mattarei¹, Andrea Micheli¹, Gianni Zampedri¹

Fondazione Bruno Kessler¹

28 Apr 2015-arXiv: Software Engineering

TL;DR: The xSAP safety analysis platform provides several model-based safety analysis features for finite- and infinite-state synchronous transition systems, including probabilistic evaluation of Fault Trees, failure propagation analysis using Timed Failure Propagation Graphs (TFPGs), and Common Cause Analysis (CCA).

Abstract: This paper describes the xSAP safety analysis platform. xSAP provides several model-based safety analysis features for finite- and infinite-state synchronous transition systems. In particular, it supports library-based definition of fault modes, an automatic model extension facility, and generation of safety analysis artifacts such as Dynamic Fault Trees (DFTs) and Failure Mode and Effects Analysis (FMEA) tables. Moreover, it supports probabilistic evaluation of Fault Trees, failure propagation analysis using Timed Failure Propagation Graphs (TFPGs), and Common Cause Analysis (CCA). xSAP has been used in several industrial projects as a verification back-end, and is currently being evaluated in a joint R&D project involving FBK and The Boeing Company.

Posted Content•

A Survey of Linguistic Structures for Application-level Fault-Tolerance

Vincenzo De Florio¹, Chris Blondia¹•Institutions (1)

University of Antwerp¹

13 Apr 2015-arXiv: Software Engineering

TL;DR: A “base” of structural attributes is defined with which application-level fault-tolerance structures can be qualitatively assessed and compared with each other and with respect to the need for simple, coherent, and effective structures; it is then used to provide an elaborated survey of the state of the art of application-level fault-tolerance structures.

Abstract: The structures for the expression of fault-tolerance provisions into the application software are the central topic of this paper. Structuring techniques answer the questions "How to incorporate fault-tolerance in the application layer of a computer program" and "How to manage the fault-tolerant code". As such, they provide means to control complexity, the latter being a relevant factor for the introduction of design faults. This fact and the ever increasing complexity of today's distributed software justify the need for simple, coherent, and effective structures for the expression of fault-tolerance in the application software. In this text we first define a "base" of structural attributes with which application-level fault-tolerance structures can be qualitatively assessed and compared with each other and with respect to the above mentioned needs. This result is then used to provide an elaborated survey of the state-of-the-art of application-level fault-tolerance structures.

Posted Content•

Integration of Heterogeneous Modeling Languages via Extensible and Composable Language Components

Arne Haber, Markus Look, Antonio Navarro Perez, Bernhard Rumpe, Steven Völkel, Andreas Wortmann

15 Sep 2015-arXiv: Software Engineering

TL;DR: This contribution presents a method for the engineering of grammar-based language components that can be independently developed, are syntactically composable, and ultimately reusable, and allows the agile employment of modeling languages efficiently tailored for individual software projects.

Abstract: Effective model-driven engineering of complex systems requires appropriately describing different specific system aspects. To this end, efficient integration of different heterogeneous modeling languages is essential. Modeling language integration is onerous and requires in-depth conceptual and technical knowledge and effort. Traditional modeling language integration approaches require language engineers to compose monolithic language aggregates for a specific task or project. Adapting these aggregates to different contexts requires vast effort and makes them hardly reusable. This contribution presents a method for the engineering of grammar-based language components that can be independently developed, are syntactically composable, and ultimately reusable. To this end, it introduces the concepts of language aggregation, language embedding, and language inheritance, as well as their realization in the language workbench MontiCore. The result is a generalizable, systematic, and efficient syntax-oriented composition of languages that allows the agile employment of modeling languages efficiently tailored for individual software projects.

Posted Content•

SAT-based Analysis of Large Real-world Feature Models is Easy

Jia Hui Liang, Vijay Ganesh, Venkatesh Raman, Krzysztof Czarnecki

17 Jun 2015-arXiv: Software Engineering

TL;DR: This work discovered that a key reason why large real-world FMs are easy-to-analyze is that the vast majority of the variables in these models are unrestricted, i.e., the models are satisfiable for both true and false assignments to such variables under the current partial assignment.

Abstract: Modern conflict-driven clause-learning (CDCL) Boolean SAT solvers provide efficient automatic analysis of real-world feature models (FMs) of systems ranging from cars to operating systems. It is well known that solver-based analysis of real-world FMs scales very well even though SAT instances obtained from such FMs are large, and the corresponding analysis problems are known to be NP-complete. To better understand why SAT solvers are so effective, we systematically studied many syntactic and semantic characteristics of a representative set of large real-world FMs. We discovered that a key reason why large real-world FMs are easy to analyze is that the vast majority of the variables in these models are unrestricted, i.e., the models are satisfiable for both true and false assignments to such variables under the current partial assignment. Given this discovery and our understanding of CDCL SAT solvers, we show that solvers can easily find satisfying assignments for such models without too many backtracks relative to the model size, explaining why solvers scale so well. Further analysis showed that the presence of unrestricted variables in these real-world models can be attributed to their high degree of variability. Additionally, we experimented with a series of well-known non-backtracking simplifications that are particularly effective in solving FMs. The remaining variables/clauses after simplifications, called the core, are so few that they are easily solved even with backtracking, further strengthening our conclusions.

Book Chapter•DOI•

Modeling Styles in Business Process Modeling

Jakob Pinggera¹, Pnina Soffer², Stefan Zugal¹, Barbara Weber¹, Matthias Weidlich³, Dirk Fahland³, Hajo A. Reijers⁴, Jan Mendling⁵•Institutions (5)

University of Innsbruck¹, University of Haifa², Eindhoven University of Technology³, Technion – Israel Institute of Technology⁴, Vienna University of Economics and Business⁵

11 Nov 2015-arXiv: Software Engineering

TL;DR: In this article, the authors observed 115 students engaged in the act of modeling, recording all their interactions with the modeling environment using a specialized tool, and the recordings of process modeling were subsequently clustered.

Abstract: Research on quality issues of business process models has recently begun to explore the process of creating process models. As a consequence, the question arises whether different ways of creating process models exist. In this vein, we observed 115 students engaged in the act of modeling, recording all their interactions with the modeling environment using a specialized tool. The recordings of process modeling were subsequently clustered. Results presented in this paper suggest the existence of three distinct modeling styles, exhibiting significantly different characteristics. We believe that this finding constitutes another building block toward a more comprehensive understanding of the process of process modeling that will ultimately enable us to support modelers in creating better business process models.

Posted Content•

Evolutionary Trends of Developer Coordination: A Network Approach

Mitchell Joblin¹, Sven Apel², Wolfgang Mauerer•Institutions (2)

Siemens¹, University of Passau²

23 Oct 2015-arXiv: Software Engineering

TL;DR: In this article, the authors examined and discussed the evolutionary principles that govern the coordination of developers in 18 large open-source projects and found that the implicit and self-organizing structure of developer coordination is ubiquitously described by non-random organizational principles.

Abstract: Software evolution is a fundamental process that transcends the realm of technical artifacts and permeates the entire organizational structure of a software project. By means of a longitudinal empirical study of 18 large open-source projects, we examine and discuss the evolutionary principles that govern the coordination of developers. By applying a network-analytic approach, we found that the implicit and self-organizing structure of developer coordination is ubiquitously described by non-random organizational principles that defy conventional software-engineering wisdom. In particular, we found that: (a) developers form scale-free networks, in which the majority of coordination requirements arise among an extremely small number of developers, (b) developers tend to accumulate coordination requirements with more and more developers over time, presumably limited by an upper bound, and (c) initially developers are hierarchically arranged, but over time, form a hybrid structure, in which core developers are hierarchically arranged and peripheral developers are not. Our results suggest that the organizational structure of large projects is constrained to evolve towards a state that balances the costs and benefits of developer coordination, and the mechanisms used to achieve this state depend on the project's scale.

Proceedings Article•DOI•

Continuous integration in a social-coding world: Empirical evidence from GITHUB (updated version with corrections)

Bogdan Vasilescu, Stef van Schuylenburg, Jules Wulms, Alexander Serebrenik, Mark van den Brand

06 Dec 2015-arXiv: Software Engineering

TL;DR: In this paper, the authors explore how GITHUB developers use continuous integration as well as whether the contribution type (direct versus indirect) and different project characteristics (e.g., main programming language, or project age) are associated with the success of the automatic builds.

Abstract: Continuous integration is a software engineering practice of frequently merging all developer working copies with a shared main branch, e.g., several times a day. With the advent of GITHUB, a platform well known for its "social coding" features that aid collaboration and sharing, and currently the largest code host in the open source world, collaborative software development has never been more prominent. In GITHUB development one can distinguish between two types of developer contributions to a project: direct ones, coming from a typically small group of developers with write access to the main project repository, and indirect ones, coming from developers who fork the main repository, update their copies locally, and submit pull requests for review and merger. In this paper we explore how GITHUB developers use continuous integration as well as whether the contribution type (direct versus indirect) and different project characteristics (e.g., main programming language, or project age) are associated with the success of the automatic builds.

Posted Content•DOI•

Soft Skills and Software Development: A Reflection from the Software Industry

Faheem Ahmed¹, Luiz Fernando Capretz, Salah Bouktif, Piers Campbell•Institutions (1)

University of Western Ontario¹

24 Jul 2015-arXiv: Software Engineering

TL;DR: It is found that although the software industry currently pays attention to soft skills to some extent when hiring, there is a need to further acknowledge the role of these skills in software development.

Abstract: We review the literature relating to soft skills and the software engineering and information systems domain before describing a study based on 650 job advertisements posted on well-known recruitment sites from a range of geographical locations including North America, Europe, Asia and Australia. The study makes use of nine defined soft skills to assess the level of demand for each of these skills related to individual job roles within the software industry. This work reports some of the vital statistics from industry about the requirements of soft skills in various roles of software development phases. The work also highlights the variation in the types of skills required for each of the roles. We found that although the software industry currently pays attention to soft skills to some extent when hiring, there is a need to further acknowledge the role of these skills in software development. The objective of this paper is to analyze the software industry's soft skills requirements for various software development positions, such as system analyst, designer, programmer, and tester. We pose two research questions, namely, (1) What soft skills are appropriate to different software development lifecycle roles? and (2) To what extent does the software industry consider soft skills when hiring an employee? The study suggests that there is a further need for acknowledgment of the significance of soft skills by employers in the software industry.

Posted Content•

Benchmarking Machine Learning Technologies for Software Defect Detection

Saiqa Aleem, Luiz Fernando Capretz, Faheem Ahmed

24 Jun 2015-arXiv: Software Engineering

TL;DR: This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction; results showed that most of the machine learning methods performed well on software bug datasets.

Abstract: Machine learning approaches are good at solving problems for which little information is available. In most cases, software domain problems can be characterized as a learning process that depends on various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches to classify software modules into defective and non-defective ones. Machine learning techniques help developers to retrieve useful information after the classification and enable them to analyse data from different perspectives. Machine learning techniques have proven to be useful for software bug prediction. This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed that most of the machine learning methods performed well on software bug datasets.

Journal Article•DOI•

Design patterns for self adaptive systems engineering

Yousef Abuseta, Khaled Swesi

06 Aug 2015-arXiv: Software Engineering

TL;DR: A set of design patterns for modeling and designing self adaptive software systems based on MAPE-K is proposed and a case study is presented to illustrate the applicability of the proposed design patterns.

Abstract: Self adaptation has been proposed to overcome the complexity of today's software systems which results from the uncertainty issue. Aspects of uncertainty include changing system goals, changing resource availability and dynamic operating conditions. Feedback control loops have been recognized as vital elements for engineering self-adaptive systems. However, despite their importance, there is still a lack of a systematic way to design the interactions between the different components comprising one particular feedback control loop as well as the interactions between components from different control loops. Most existing approaches are either domain specific or too abstract to be useful. In addition, the issue of multiple control loops is often neglected and consequently self adaptive systems are often designed around a single loop. In this paper we propose a set of design patterns for modeling and designing self adaptive software systems based on the MAPE-K control loop of the IBM architecture blueprint, which takes into account the multiple control loops issue. A case study is presented to illustrate the applicability of the proposed design patterns.

Journal Article•DOI•

Cross platform app: a comparative study

Paulo Roberto Martins de Andrade, Adriano Bessa Albuquerque, Otávio F. Frota, Robson V. Silveira, Fátima Aguiar da Silva

11 Mar 2015-arXiv: Software Engineering

TL;DR: This paper shows how hybrid development can be an alternative for companies to provide their services with a low investment and still offer a great service to their clients.

Abstract: The use of mobile applications is now so common that users expect the companies whose services they consume to already offer an application or a mobile version of their site for these services, but this is not always simple or cheap to do. Thus, hybrid development has emerged as a potential alternative to meet this need. The evolution of this new paradigm has drawn the attention of researchers and companies as a viable alternative for mobile development. This paper shows how hybrid development can be an alternative for companies to provide their services with a low investment and still offer a great service to their clients.

Proceedings Article•DOI•

Behavioral Compatibility of Simulink Models for Product Line Maintenance and Evolution

Bernhard Rumpe¹, Christoph Schulze¹, Michael von Wenckstern¹, Jan Oliver Ringert², Peter Manhart³•Institutions (3)

RWTH Aachen University¹, Tel Aviv University², Daimler AG³

17 Nov 2015-arXiv: Software Engineering

TL;DR: This paper presents a model checking approach to determine behavioral compatibility of Simulink models, obtained from different component variants or during evolution, and a prototype for automated compatibility checking demonstrates its feasibility.

Abstract: Embedded software systems, e.g. automotive, robotic or automation systems, are highly configurable and consist of many software components that are available in different variants and versions. To identify the degree of reusability between these different occurrences of a component, it is necessary to determine the functional backward and forward compatibility between them. Based on this information it is possible to identify in which system context a component can be replaced safely by another version, e.g. exchanging an older component, or variant, e.g. introducing new features, to achieve the same functionality. This paper presents a model checking approach to determine behavioral compatibility of Simulink models, obtained from different component variants or during evolution. A prototype for automated compatibility checking demonstrates its feasibility. In addition, implemented optimizations make the analysis more efficient when the compared variants or versions are structurally similar. A case study on a driver assistance system provided by Daimler AG shows the effectiveness of the approach to automatically compare Simulink components.

Journal Article•DOI•

Project risk management model based on prince2 and scrum frameworks

Martin Tomanek, Jan Juricek

12 Feb 2015-arXiv: Software Engineering

TL;DR: Enrichment of Scrum with selected practices from the heavy-weight project management methodology PRINCE2 promises better results in delivering software products especially in global development projects.

Abstract: There is a lack of formal risk management techniques in the agile software development method Scrum. The need to manage risks in agile project management has also been identified by various authors. The authors of this paper conducted a survey to find out the current practices in agile project management. Furthermore, the authors discuss a new integrated framework of Scrum and PRINCE2 with a focus on risk management. Enrichment of Scrum with selected practices from the heavy-weight project management framework PRINCE2 promises better results in delivering software products, especially in global development projects.

Posted Content•

Human Factors in Software Reliability Engineering

Maria Spichkova, Huai Liu, Mohsen Laali, Heinz W. Schmidt¹•Institutions (1)

RMIT University¹

12 Mar 2015-arXiv: Software Engineering

TL;DR: The vision of the integration of human factors engineering into the software development process is presented to improve the quality of software and to deal with human errors in a systematic way.

Abstract: In this paper, we present our vision of the integration of human factors engineering into the software development process. The aim of this approach is to improve the quality of software and to deal with human errors in a systematic way.
