Showing papers in "arXiv: Software Engineering in 2015"


Posted Content
TL;DR: In this paper, the authors present a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for software engineering research, and conclude that Bayesian and decision tree algorithms are widely used in recommendation systems because of their relative simplicity and that requirement and design phases of recommender system development appear to offer opportunities for further research.
Abstract: Recommender systems use algorithms to provide users with product or service recommendations. Recently, these systems have been using machine learning algorithms from the field of artificial intelligence. However, choosing a suitable machine learning algorithm for a recommender system is difficult because of the number of algorithms described in the literature. Researchers and practitioners developing recommender systems are left with little information about the current approaches in algorithm usage. Moreover, the development of a recommender system using a machine learning algorithm often has problems and open questions that must be evaluated, so software engineers know where to focus research efforts. This paper presents a systematic review of the literature that analyzes the use of machine learning algorithms in recommender systems and identifies research opportunities for software engineering research. The study concludes that Bayesian and decision tree algorithms are widely used in recommender systems because of their relative simplicity, and that requirement and design phases of recommender system development appear to offer opportunities for further research.

354 citations


Proceedings ArticleDOI
TL;DR: This paper presents a token-based clone detector, SourcererCC, that can detect both exact and near-miss clones from large inter-project repositories using a standard workstation, and evaluates the scalability, execution time, recall and precision, and compares it to four publicly available and state-of-the-art tools.
Abstract: Despite a decade of active research, there is a marked lack in clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.

259 citations
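
The index-based candidate lookup described in the SourcererCC abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the `find_clones` function, the 0.8 threshold, and the rule that two blocks are clones when their shared tokens reach a fraction of the larger block are all assumptions made for illustration, and the token-ordering filtering heuristics are omitted.

```python
from collections import Counter, defaultdict
import math
import re


def tokenize(code):
    """Split a code block into identifier, number, and single-symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def overlap(a, b):
    """Size of the multiset intersection of two token bags."""
    return sum((a & b).values())


def find_clones(blocks, threshold=0.8):
    """Return sorted pairs of block names whose token overlap meets the threshold.

    blocks: dict mapping a block name to its source text (hypothetical input shape).
    """
    bags = {name: Counter(tokenize(src)) for name, src in blocks.items()}

    # Inverted index: token -> set of block names containing that token.
    index = defaultdict(set)
    for name, bag in bags.items():
        for token in bag:
            index[token].add(name)

    clones = set()
    for name, bag in bags.items():
        # Only blocks sharing at least one token are ever compared.
        candidates = set()
        for token in bag:
            candidates |= index[token]
        candidates.discard(name)

        for other in candidates:
            size = max(sum(bag.values()), sum(bags[other].values()))
            needed = math.ceil(threshold * size)
            if overlap(bag, bags[other]) >= needed:
                clones.add(tuple(sorted((name, other))))
    return sorted(clones)
```

For example, `find_clones({"a": "int sum(int a, int b) { return a + b; }", "b": "int add(int a, int b) { return a + b; }"})` reports the pair `("a", "b")`, since only the function name differs. The index keeps unrelated blocks from being compared at all, which is the property the abstract credits for scalability.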


Posted Content
TL;DR: This paper proposes a development methodology that separates IoT application development into different concerns and provides a conceptual framework for developing an application, together with a development framework that implements the methodology to support the actions of stakeholders.
Abstract: Application development in the Internet of Things (IoT) is challenging because it involves dealing with a wide range of related issues, such as a lack of separation of concerns and a lack of high-level abstractions to address both the large scale and the heterogeneity. Moreover, stakeholders involved in the application development have to address issues that can be attributed to different life-cycle phases when developing applications. First, the application logic has to be analyzed and then separated into a set of distributed tasks for an underlying network. Then, the tasks have to be implemented for the specific hardware. Apart from handling these issues, they have to deal with other aspects of the life-cycle, such as changes in application requirements and deployed devices. Several approaches have been proposed in the closely related fields of wireless sensor networks, ubiquitous and pervasive computing, and software engineering in general to address the above challenges. However, existing approaches only cover limited subsets of the above-mentioned challenges when applied to the IoT. This paper proposes an integrated approach for addressing the above-mentioned challenges. The main contributions of this paper are: (1) a development methodology that separates IoT application development into different concerns and provides a conceptual framework to develop an application, and (2) a development framework that implements the development methodology to support actions of stakeholders. The development framework provides a set of modeling languages to specify each development concern and abstracts the scale and heterogeneity related complexity. It integrates code generation, task-mapping, and linking techniques to provide automation. Code generation supports the application development phase by producing a programming framework that allows stakeholders to focus on the application logic, while our mapping and linking techniques together support the deployment phase by producing device-specific code to result in a distributed system collaboratively hosted by individual devices. Our evaluation based on two realistic scenarios shows that the use of our approach improves the productivity of stakeholders involved in the application development.

168 citations


Posted Content
TL;DR: It is found that code with bugs tends to be more entropic (i.e. unnatural), becoming less so as bugs are fixed, suggesting that entropy may be a valid, simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault-localization and searching for fixes.
Abstract: Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be "natural", like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturalness of software through statistical models and used them to good effect in suggestion engines, porting tools, coding standards checkers, and idiom miners. This suggests that code that appears improbable, or surprising, to a good statistical language model is "unnatural" in some sense, and thus possibly suspicious. In this paper, we investigate this hypothesis. We consider a large corpus of bug fix commits (ca. 8,296), from 10 different Java projects, and we focus on its language statistics, evaluating the naturalness of buggy code and the corresponding fixes. We find that code with bugs tends to be more entropic (i.e., unnatural), becoming less so as bugs are fixed. Focusing on highly entropic lines is similar in cost-effectiveness to some well-known static bug finders (PMD, FindBugs) and ordering warnings from these bug finders using an entropy measure improves the cost-effectiveness of inspecting code implicated in warnings. This suggests that entropy may be a valid language-independent and simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault-localization and searching for fixes.

150 citations
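
To make the notion of line-level entropy in the abstract above concrete, here is a minimal sketch that scores a line by its average negative log-probability under a language model. The abstract only says "a good statistical language model"; the bigram model with add-one smoothing, the whitespace tokenization, and the function names below are assumptions chosen for brevity, not the model used in the paper.

```python
from collections import Counter
import math


def train_bigram(lines):
    """Collect context counts, bigram counts, and vocabulary size from token lines."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for line in lines:
        tokens = ["<s>"] + line.split()
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])  # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    # One extra vocabulary slot stands in for tokens never seen in training.
    return contexts, bigrams, len(vocab) + 1


def line_entropy(line, model):
    """Average negative log2 probability per token, using add-one smoothing."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + line.split()
    if len(tokens) == 1:
        return 0.0
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total -= math.log2(p)
    return total / (len(tokens) - 1)
```

A line made of token sequences the model has seen often, such as `i = i + 1` in a corpus full of increments, scores lower than a reordering like `1 + = i i`. Ranking lines by this score, highest first, is the kind of ordering the abstract compares against static bug finders.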


Posted Content
TL;DR: In this article, the authors report their experience and lessons learned in an ongoing project on migrating a monolithic on-premise software architecture to microservices, and conclude that microservices are not a one-size-fits-all solution: they introduce new complexities to the system, and many factors, such as distribution complexities, should be considered before adopting this style.
Abstract: Migration to the cloud has been a popular topic in industry and academia in recent years. Despite many benefits that the cloud presents, such as high availability and scalability, most of the on-premise application architectures are not ready to fully exploit the benefits of this environment, and adapting them to this environment is a non-trivial task. Microservices have appeared recently as a novel architectural style that is native to the cloud. These cloud-native architectures can facilitate migrating on-premise architectures to fully benefit from the cloud environments because non-functional attributes, like scalability, are inherent in this style. The existing approaches to cloud migration mostly do not consider cloud-native architectures as first-class citizens. As a result, the final product may not meet its primary drivers for migration. In this paper, we intend to report our experience and lessons learned in an ongoing project on migrating a monolithic on-premise software architecture to microservices. We concluded that microservices are not a one-size-fits-all solution, as they introduce new complexities to the system, and many factors, such as distribution complexities, should be considered before adopting this style. However, if adopted in a context that needs high flexibility in terms of scalability and availability, they can deliver their promised benefits.

119 citations


Posted Content
TL;DR: This paper proposes a predictive process monitoring framework for estimating the probability that a given predicate will be fulfilled upon completion of a running case and takes into account both the sequence of events observed in the current trace, as well as data attributes associated to these events.
Abstract: Business process enactment is generally supported by information systems that record data about process executions, which can be extracted as event logs. Predictive process monitoring is concerned with exploiting such event logs to predict how running (uncompleted) cases will unfold up to their completion. In this paper, we propose a predictive process monitoring framework for estimating the probability that a given predicate will be fulfilled upon completion of a running case. The predicate can be, for example, a temporal logic constraint or a time constraint, or any predicate that can be evaluated over a completed trace. The framework takes into account both the sequence of events observed in the current trace, as well as data attributes associated to these events. The prediction problem is approached in two phases. First, prefixes of previous traces are clustered according to control flow information. Secondly, a classifier is built for each cluster using event data to discriminate between fulfillments and violations. At runtime, a prediction is made on a running case by mapping it to a cluster and applying the corresponding classifier. The framework has been implemented in the ProM toolset and validated on a log pertaining to the treatment of cancer patients in a large hospital.

100 citations


Posted Content
TL;DR: This paper contributes a publicly available dataset of untangled code changes, created with the help of two developers who accurately split their code changes into self-contained tasks over a period of four months, and a novel approach to help developers share untangled commits (a.k.a. atomic commits) by using fine-grained code change information.
Abstract: After working for some time, developers commit their code changes to a version control system. When doing so, they often bundle unrelated changes (e.g., bug fix and refactoring) in a single commit, thus creating a so-called tangled commit. Sharing tangled commits is problematic because it makes review, reversion, and integration of these commits harder and historical analyses of the project less reliable. Researchers have worked at untangling existing commits, i.e., finding which part of a commit relates to which task. In this paper, we contribute to this line of work in two ways: (1) a publicly available dataset of untangled code changes, created with the help of two developers who accurately split their code changes into self-contained tasks over a period of four months; (2) a novel approach, EpiceaUntangler, to help developers share untangled commits (a.k.a. atomic commits) by using fine-grained code change information. EpiceaUntangler is based and tested on the publicly available dataset, and further evaluated by deploying it to 7 developers, who used it for 2 weeks. We recorded a median success rate of 91% and an average of 75% in automatically creating clusters of untangled fine-grained code changes.

84 citations


Posted Content
TL;DR: YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts, and represents the scripts in terms of entities based on the typical scientific workflow model.
Abstract: Scientific workflow management systems offer features for composing complex computational pipelines from modular building blocks, for executing the resulting automated workflows, and for recording the provenance of data products resulting from workflow runs. Despite the advantages such features provide, many automated workflows continue to be implemented and executed outside of scientific workflow systems due to the convenience and familiarity of scripting languages (such as Perl, Python, R, and MATLAB), and to the high productivity many scientists experience when using these languages. YesWorkflow is a set of software tools that aim to provide such users of scripting languages with many of the benefits of scientific workflow systems. YesWorkflow requires neither the use of a workflow engine nor the overhead of adapting code to run effectively in such a system. Instead, YesWorkflow enables scientists to annotate existing scripts with special comments that reveal the computational modules and dataflows otherwise implicit in these scripts. YesWorkflow tools extract and analyze these comments, represent the scripts in terms of entities based on the typical scientific workflow model, and provide graphical renderings of this workflow-like view of the scripts. Future versions of YesWorkflow also will allow the prospective provenance of the data products of these scripts to be queried in ways similar to those available to users of scientific workflow systems.

75 citations


Proceedings ArticleDOI
TL;DR: This paper describes SWIM, a tool that suggests code snippets for API-related natural language queries such as "generate md5 hash code" and translates user queries into the APIs of interest using clickthrough data from the Bing search engine.
Abstract: Modern programming frameworks come with large libraries, with diverse applications such as for matching regular expressions, parsing XML files and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool which suggests code snippets given API-related natural language queries such as "generate md5 hash code". We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce structured call sequences to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis. We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.

72 citations


Posted Content
TL;DR: The results suggest the need for addressing multiple security concerns more systematically and simultaneously, for tool chains supporting the MDS development cycle, and for more empirical studies on the application of MDS methodologies.
Abstract: Context: Model-Driven Security (MDS) is a specialised Model-Driven Engineering research area for supporting the development of secure systems. Over a decade of research on MDS has resulted in a large number of publications. Objective: To provide a detailed analysis of the state of the art in MDS, a systematic literature review (SLR) is essential. Method: We conducted an extensive SLR on MDS. Derived from our research questions, we designed a rigorous, extensive search and selection process to identify a set of primary MDS studies that is as complete as possible. Our three-pronged search process consists of automatic searching, manual searching, and snowballing. After discovering and considering more than a thousand relevant papers, we identified, strictly selected, and reviewed 108 MDS publications. Results: The results of our SLR show the overall status of the key artefacts of MDS, and the identified primary MDS studies. For example, regarding security modelling artefacts, we found that developing domain-specific languages plays a key role in many MDS approaches. The current limitations in each MDS artefact are pointed out and corresponding potential research directions are suggested. Moreover, we categorise the identified primary MDS studies into 5 principal MDS studies, and other emerging or less common MDS studies. Finally, some trend analyses of MDS research are given. Conclusion: Our results suggest the need for addressing multiple security concerns more systematically and simultaneously, for tool chains supporting the MDS development cycle, and for more empirical studies on the application of MDS methodologies. To the best of our knowledge, this SLR is the first in the field of Software Engineering that combines a snowballing strategy with database searching. This combination has delivered an extensive literature study on MDS.

59 citations


Posted Content
TL;DR: An experiment on automatically repairing 224 bugs of a real-world and publicly available bug dataset, Defects4J, finds that only 8 patches are undoubtedly correct, a novel piece of evidence that there is large room for improvement in the area of test suite based repair.
Abstract: Automatic software repair aims to reduce human effort for fixing bugs. Various automatic repair approaches have emerged in recent years. In this paper, we report on an experiment on automatically repairing 224 bugs of a real-world and publicly available bug dataset, Defects4J. We investigate the results of three repair methods, GenProg (repair via random search), Kali (repair via exhaustive search), and Nopol (repair via constraint based search). We conduct our investigation with five research questions: fixability, patch correctness, ill-defined bugs, performance, and fault localizability. Our implementations of GenProg, Kali, and Nopol together fix 41 out of 224 (18%) bugs with 59 different patches. This can be viewed as a baseline for future usage of Defects4J for automatic repair research. In addition, manual analysis of a sample of 42 of the 59 generated patches shows that only 8 patches are undoubtedly correct. This is a novel piece of evidence that there is large room for improvement in the area of test suite based repair.

Posted Content
TL;DR: In this book, the authors propose that the problems facing modern system developers share a common foundation for their resolution: languages, which are the primary way in which system developers communicate, design and implement systems, and which provide abstractions that can encapsulate complexity, embrace the diversity of technologies and design abstractions, and unite modern and legacy systems.
Abstract: Modern day system developers have some serious problems to contend with. The systems they develop are becoming increasingly complex as customers demand richer functionality delivered in ever shorter timescales. They have to manage a huge diversity of implementation technologies, design techniques and development processes: everything from scripting languages to web-services to the latest 'silver bullet' design abstraction. To add to that, nothing stays still: today's 'must have' technology rapidly becomes tomorrow's legacy problem that must be managed along with everything else. How can these problems be dealt with? In this book we propose that there is a common foundation to their resolution: languages. Languages are the primary way in which system developers communicate, design and implement systems. Languages provide abstractions that can encapsulate complexity, embrace the diversity of technologies and design abstractions, and unite modern and legacy systems.

Posted Content
Lili Mou, Rui Men, Ge Li, Lu Zhang, Zhi Jin 
TL;DR: This paper envisions an end-to-end program generation scenario using recurrent neural networks (RNNs): Users can express their intention in natural language; an RNN then automatically generates corresponding code in a character-by-character fashion.
Abstract: This paper envisions an end-to-end program generation scenario using recurrent neural networks (RNNs): Users can express their intention in natural language; an RNN then automatically generates corresponding code in a character-by-character fashion. We demonstrate its feasibility through a case study and empirical analysis. To fully make such technique useful in practice, we also point out several cross-disciplinary challenges, including modeling user intention, providing datasets, improving model architectures, etc. Although much long-term research shall be addressed in this new field, we believe end-to-end program generation would become a reality in future decades, and we are looking forward to its practice.

Posted Content
TL;DR: In this article, the authors focus on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance, which describes system properties concerning its timeliness and use of resources.
Abstract: DevOps is a trend towards a tighter integration between development (Dev) and operations (Ops) teams. The need for such an integration is driven by the requirement to continuously adapt enterprise applications (EAs) to changes in the business environment. As of today, DevOps concepts have been primarily introduced to ensure a constant flow of features and bug fixes into new releases from a functional perspective. In order to integrate a non-functional perspective into these DevOps concepts this report focuses on tools, activities, and processes to ensure one of the most important quality attributes of a software system, namely performance. Performance describes system properties concerning its timeliness and use of resources. Common metrics are response time, throughput, and resource utilization. Performance goals for EAs are typically defined by setting upper and/or lower bounds for these metrics and specific business transactions. In order to ensure that such performance goals can be met, several activities are required during development and operation of these systems as well as during the transition from Dev to Ops. Activities during development are typically summarized by the term Software Performance Engineering (SPE), whereas activities during operations are called Application Performance Management (APM). SPE and APM were historically tackled independently from each other, but the newly emerging DevOps concepts require and enable a tighter integration between both activity streams. This report presents existing solutions to support this integration as well as open research challenges in this area.

Posted Content
TL;DR: In this paper, controller synthesis for stochastic and partially unknown environments is abstracted as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space, and exploration is constrained to safe, permissive strategies computed beforehand.
Abstract: We consider controller synthesis for stochastic and partially unknown environments in which safety is essential. Specifically, we abstract the problem as a Markov decision process in which the expected performance is measured using a cost function that is unknown prior to run-time exploration of the state space. Standard learning approaches synthesize cost-optimal strategies without guaranteeing safety properties. To remedy this, we first compute safe, permissive strategies. Then, exploration is constrained to these strategies and thereby meets the imposed safety requirements. Exploiting an iterative learning procedure, the resulting policy is safety-constrained and optimal. We show correctness and completeness of the method and discuss the use of several heuristics to increase its scalability. Finally, we demonstrate the applicability by means of a prototype implementation.
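
The "compute safe, permissive strategies first, then explore only within them" idea can be sketched under a strong simplification. The version below treats safety as never reaching an unsafe state under any possible outcome, ignores transition probabilities entirely, and uses a hypothetical dictionary encoding of the model; the paper's own notion of safety and its algorithm may differ.

```python
def permissive_safe_strategy(transitions, unsafe):
    """Return {state: set of actions} that can never lead to an unsafe state.

    transitions: {state: {action: set of possible successor states}}
                 (assumed encoding; every successor must also be a key)
    unsafe:      set of states that must be avoided
    """
    safe = set(transitions) - set(unsafe)
    while True:
        # Keep an action only if all of its possible successors are still safe.
        allowed = {
            s: {a for a, succ in transitions[s].items() if succ <= safe}
            for s in safe
        }
        # A state with no remaining action cannot stay safe, so drop it and repeat.
        stuck = {s for s, actions in allowed.items() if not actions}
        if not stuck:
            return allowed
        safe -= stuck
```

A learner would then pick actions only from `allowed[state]` while estimating costs, so that every explored trajectory stays inside the safe region. The result is permissive in the sense that it keeps every action compatible with safety rather than committing to a single one.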

Proceedings ArticleDOI
TL;DR: This paper highlights the challenges of conducting affect-related studies with psychology, provides a comprehensive literature review in affect theory, and proposes guidelines for conducting psychoempirical software engineering.
Abstract: Affects (emotions and moods) have an impact on cognitive processing activities and the working performance of individuals. It has been established that software development tasks are undertaken through cognitive processing activities. Therefore, we have proposed to employ psychology theory and measurements in software engineering (SE) research. We have called it "psychoempirical software engineering". However, we found out that existing SE research has often fallen into misconceptions about the affect of developers, lacking in background theory and how to successfully employ psychological measurements in studies. The contribution of this paper is threefold. (1) It highlights the challenges of conducting proper affect-related studies with psychology; (2) it provides a comprehensive literature review in affect theory; and (3) it proposes guidelines for conducting psychoempirical software engineering.

Posted Content
TL;DR: The xSAP safety analysis platform provides several model-based safety analysis features for finite- and infinite-state synchronous transition systems, including probabilistic evaluation of Fault Trees, failure propagation analysis using Timed Failure Propagation Graphs (TFPGs), and Common Cause Analysis (CCA).
Abstract: This paper describes the xSAP safety analysis platform. xSAP provides several model-based safety analysis features for finite- and infinite-state synchronous transition systems. In particular, it supports library-based definition of fault modes, an automatic model extension facility, and generation of safety analysis artifacts such as Dynamic Fault Trees (DFTs) and Failure Mode and Effects Analysis (FMEA) tables. Moreover, it supports probabilistic evaluation of Fault Trees, failure propagation analysis using Timed Failure Propagation Graphs (TFPGs), and Common Cause Analysis (CCA). xSAP has been used in several industrial projects as verification back-end, and is currently being evaluated in a joint R&D Project involving FBK and The Boeing Company.

Posted Content
TL;DR: A "base" of structural attributes with which application-level fault-tolerance structures can be qualitatively assessed and compared with each other and with respect to the aforementioned needs is defined and used to provide an elaborated survey of the state-of-the-art of application-level fault-tolerance structures.
Abstract: The structures for the expression of fault-tolerance provisions into the application software are the central topic of this paper. Structuring techniques answer the questions "How to incorporate fault-tolerance in the application layer of a computer program" and "How to manage the fault-tolerant code". As such, they provide means to control complexity, the latter being a relevant factor for the introduction of design faults. This fact and the ever increasing complexity of today's distributed software justify the need for simple, coherent, and effective structures for the expression of fault-tolerance in the application software. In this text we first define a "base" of structural attributes with which application-level fault-tolerance structures can be qualitatively assessed and compared with each other and with respect to the above mentioned needs. This result is then used to provide an elaborated survey of the state-of-the-art of application-level fault-tolerance structures.

Posted Content
TL;DR: This contribution presents a method for the engineering of grammar-based language components that can be independently developed, are syntactically composable, and ultimately reusable, and allows the agile employment of modeling languages efficiently tailored for individual software projects.
Abstract: Effective model-driven engineering of complex systems requires appropriately describing different specific system aspects. To this end, efficient integration of different heterogeneous modeling languages is essential. Modeling language integration is onerous and requires in-depth conceptual and technical knowledge and effort. Traditional modeling language integration approaches require language engineers to compose monolithic language aggregates for a specific task or project. Adapting these aggregates to different contexts requires vast effort and makes them hardly reusable. This contribution presents a method for the engineering of grammar-based language components that can be independently developed, are syntactically composable, and ultimately reusable. To this end, it introduces the concepts of language aggregation, language embedding, and language inheritance, as well as their realization in the language workbench MontiCore. The result is a generalizable, systematic, and efficient syntax-oriented composition of languages that allows the agile employment of modeling languages efficiently tailored for individual software projects.

Posted Content
TL;DR: This work discovered that a key reason why large real-world FMs are easy-to-analyze is that the vast majority of the variables in these models are unrestricted, i.e., the models are satisfiable for both true and false assignments to such variables under the current partial assignment.
Abstract: Modern conflict-driven clause-learning (CDCL) Boolean SAT solvers provide efficient automatic analysis of real-world feature models (FM) of systems ranging from cars to operating systems. It is well-known that solver-based analysis of real-world FMs scales very well even though SAT instances obtained from such FMs are large, and the corresponding analysis problems are known to be NP-complete. To better understand why SAT solvers are so effective, we systematically studied many syntactic and semantic characteristics of a representative set of large real-world FMs. We discovered that a key reason why large real-world FMs are easy-to-analyze is that the vast majority of the variables in these models are unrestricted, i.e., the models are satisfiable for both true and false assignments to such variables under the current partial assignment. Given this discovery and our understanding of CDCL SAT solvers, we show that solvers can easily find satisfying assignments for such models without too many backtracks relative to the model size, explaining why solvers scale so well. Further analysis showed that the presence of unrestricted variables in these real-world models can be attributed to their high-degree of variability. Additionally, we experimented with a series of well-known non-backtracking simplifications that are particularly effective in solving FMs. The remaining variables/clauses after simplifications, called the core, are so few that they are easily solved even with backtracking, further strengthening our conclusions.
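
The definition of an unrestricted variable given in the abstract above can be checked directly on a small formula. The sketch below is only an illustration of that definition: it uses brute-force enumeration rather than a CDCL solver, and the clause encoding (lists of signed integers, as in the DIMACS convention) and function names are assumptions.

```python
from itertools import product


def satisfiable(clauses, assignment, variables):
    """Brute force: can `assignment` be extended to satisfy every clause?

    clauses: list of clauses, each a list of non-zero ints; a positive int is the
             variable itself and a negative int is its negation.
    """
    free = [v for v in variables if v not in assignment]
    for values in product([False, True], repeat=len(free)):
        full = {**assignment, **dict(zip(free, values))}
        if all(any(full[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return True
    return False


def unrestricted_variables(clauses, partial=None):
    """Variables that can be set to either value under the partial assignment."""
    partial = dict(partial or {})
    variables = sorted({abs(lit) for clause in clauses for lit in clause})
    return [
        v
        for v in variables
        if v not in partial
        and all(
            satisfiable(clauses, {**partial, v: value}, variables)
            for value in (False, True)
        )
    ]
```

As a usage example, take a toy feature model with a mandatory root (variable 1) and two optional children (variables 2 and 3), encoded as `[[1], [-2, 1], [-3, 1]]`. `unrestricted_variables` returns `[2, 3]`: the root is forced to true, while either optional feature can be switched on or off freely. The enumeration is exponential in the number of free variables, so this is suitable only for tiny examples.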

Book ChapterDOI
TL;DR: In this article, the authors observed 115 students engaged in the act of modeling, recording all their interactions with the modeling environment using a specialized tool, and the recordings of process modeling were subsequently clustered.
Abstract: Research on quality issues of business process models has recently begun to explore the process of creating process models. As a consequence, the question arises whether different ways of creating process models exist. In this vein, we observed 115 students engaged in the act of modeling, recording all their interactions with the modeling environment using a specialized tool. The recordings of process modeling were subsequently clustered. Results presented in this paper suggest the existence of three distinct modeling styles, exhibiting significantly different characteristics. We believe that this finding constitutes another building block toward a more comprehensive understanding of the process of process modeling that will ultimately enable us to support modelers in creating better business process models.

Posted Content
TL;DR: In this article, the authors examined and discussed the evolutionary principles that govern the coordination of developers in 18 large open-source projects and found that the implicit and self-organizing structure of developer coordination is ubiquitously described by non-random organizational principles.
Abstract: Software evolution is a fundamental process that transcends the realm of technical artifacts and permeates the entire organizational structure of a software project. By means of a longitudinal empirical study of 18 large open-source projects, we examine and discuss the evolutionary principles that govern the coordination of developers. By applying a network-analytic approach, we found that the implicit and self-organizing structure of developer coordination is ubiquitously described by non-random organizational principles that defy conventional software-engineering wisdom. In particular, we found that: (a) developers form scale-free networks, in which the majority of coordination requirements arise among an extremely small number of developers, (b) developers tend to accumulate coordination requirements with more and more developers over time, presumably limited by an upper bound, and (c) initially developers are hierarchically arranged, but over time, form a hybrid structure, in which core developers are hierarchically arranged and peripheral developers are not. Our results suggest that the organizational structure of large projects is constrained to evolve towards a state that balances the costs and benefits of developer coordination, and the mechanisms used to achieve this state depend on the project's scale.

Proceedings ArticleDOI
TL;DR: In this paper, the authors explore how GITHUB developers use continuous integration as well as whether the contribution type (direct versus indirect) and different project characteristics (e.g., main programming language, or project age) are associated with the success of the automatic builds.
Abstract: Continuous integration is a software engineering practice of frequently merging all developer working copies with a shared main branch, e.g., several times a day. With the advent of GITHUB, a platform well known for its "social coding" features that aid collaboration and sharing, and currently the largest code host in the open source world, collaborative software development has never been more prominent. In GITHUB development one can distinguish between two types of developer contributions to a project: direct ones, coming from a typically small group of developers with write access to the main project repository, and indirect ones, coming from developers who fork the main repository, update their copies locally, and submit pull requests for review and merger. In this paper we explore how GITHUB developers use continuous integration as well as whether the contribution type (direct versus indirect) and different project characteristics (e.g., main programming language, or project age) are associated with the success of the automatic builds.

Posted ContentDOI
TL;DR: It is found that although the software industry currently pays attention to soft skills to some extent while hiring, there is a need to further acknowledge the role of these skills in software development.
Abstract: We review the literature relating to soft skills and the software engineering and information systems domain before describing a study based on 650 job advertisements posted on well-known recruitment sites from a range of geographical locations including North America, Europe, Asia and Australia. The study makes use of nine defined soft skills to assess the level of demand for each of these skills related to individual job roles within the software industry. This work reports some of the vital statistics from industry about the requirements of soft skills in various roles of software development phases. The work also highlights the variation in the types of skills required for each of the roles. We found that although the software industry currently pays attention to soft skills to some extent while hiring, there is a need to further acknowledge the role of these skills in software development. The objective of this paper is to analyze the software industry's soft skills requirements for various software development positions, such as system analyst, designer, programmer, and tester. We pose two research questions, namely, (1) What soft skills are appropriate to different software development lifecycle roles, and (2) To what extent does the software industry consider soft skills when hiring an employee. The study suggests that there is a further need for employers in the software industry to acknowledge the significance of soft skills.

Posted Content
TL;DR: This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction; results showed that most of the machine learning methods performed well on software bug datasets.
Abstract: Machine learning approaches are good at solving problems for which little information is available. In most cases, software domain problems can be characterized as a learning process that depends on the various circumstances and changes accordingly. A predictive model is constructed using machine learning approaches and classifies modules into defective and non-defective ones. Machine learning techniques help developers to retrieve useful information after the classification and enable them to analyse data from different perspectives. Machine learning techniques have proven to be useful for software bug prediction. This study used publicly available data sets of software modules and provides a comparative performance analysis of different machine learning techniques for software bug prediction. Results showed that most of the machine learning methods performed well on software bug datasets.

Journal ArticleDOI
TL;DR: A set of design patterns for modeling and designing self adaptive software systems based on MAPE-K is proposed and a case study is presented to illustrate the applicability of the proposed design patterns.
Abstract: Self adaptation has been proposed to overcome the complexity of today's software systems which results from the uncertainty issue. Aspects of uncertainty include changing system goals, changing resource availability and dynamic operating conditions. Feedback control loops have been recognized as vital elements for engineering self-adaptive systems. However, despite their importance, there is still no systematic way to design the interactions between the different components comprising one particular feedback control loop, or the interactions between components from different control loops. Most existing approaches are either domain specific or too abstract to be useful. In addition, the issue of multiple control loops is often neglected and consequently self adaptive systems are often designed around a single loop. In this paper we propose a set of design patterns for modeling and designing self adaptive software systems based on the MAPE-K control loop of the IBM architecture blueprint, which take into account the multiple control loops issue. A case study is presented to illustrate the applicability of the proposed design patterns.

Journal ArticleDOI
TL;DR: This paper shows how hybrid development can be an alternative for companies to provide their services with a low investment and still offer a great service to their clients.
Abstract: The use of mobile applications is now so common that users expect the companies whose services they consume to already have an application providing these services, or a mobile version of their site, but this is not always simple or cheap to do. Thus, hybrid development has emerged as a potential alternative to meet this need. The evolution of this new paradigm has drawn the attention of researchers and companies as a viable alternative for mobile development. This paper shows how hybrid development can be an alternative for companies to provide their services with a low investment and still offer a great service to their clients.

Proceedings ArticleDOI
TL;DR: This paper presents a model checking approach to determine behavioral compatibility of Simulink models, obtained from different component variants or during evolution, and a prototype for automated compatibility checking demonstrates its feasibility.
Abstract: Embedded software systems, e.g. automotive, robotic or automation systems, are highly configurable and consist of many software components that are available in different variants and versions. To identify the degree of reusability between these different occurrences of a component, it is necessary to determine the functional backward and forward compatibility between them. Based on this information it is possible to identify in which system context a component can be replaced safely by another version, e.g. exchanging an older component, or variant, e.g. introducing new features, to achieve the same functionality. This paper presents a model checking approach to determine behavioral compatibility of Simulink models, obtained from different component variants or during evolution. A prototype for automated compatibility checking demonstrates its feasibility. In addition, implemented optimizations make the analysis more efficient when the compared variants or versions are structurally similar. A case study on a driver assistance system provided by Daimler AG shows the effectiveness of the approach to automatically compare Simulink components.

Journal ArticleDOI
TL;DR: Enrichment of Scrum with selected practices from the heavy-weight project management methodology PRINCE2 promises better results in delivering software products especially in global development projects.
Abstract: There is a lack of formal risk management techniques in the agile software development method Scrum. The need to manage risks in agile project management has also been identified by various authors. The authors of this paper conducted a survey to find out the current practices in agile project management. Furthermore, the authors discuss a new integrated framework of Scrum and PRINCE2 with a focus on risk management. Enrichment of Scrum with selected practices from the heavy-weight project management framework PRINCE2 promises better results in delivering software products, especially in global development projects.

Posted Content
TL;DR: The vision of the integration of human factors engineering into the software development process is presented to improve the quality of software and to deal with human errors in a systematic way.
Abstract: In this paper, we present our vision of the integration of human factors engineering into the software development process. The aim of this approach is to improve the quality of software and to deal with human errors in a systematic way.