
Showing papers in "Information & Software Technology in 2020"


Journal ArticleDOI
TL;DR: A systematic literature review of unsupervised machine learners in software defect prediction found a worrying prevalence of demonstrably erroneous experimental results, undemanding benchmarks, and incomplete reporting; the meta-analysis shows that unsupervised models are comparable with supervised models for both within-project and cross-project prediction.
Abstract: Background Unsupervised machine learners have been increasingly applied to software defect prediction. It is an approach that may be valuable for software practitioners because it reduces the need for labeled training data. Objective Investigate the use and performance of unsupervised learning techniques in software defect prediction. Method We conducted a systematic literature review that identified 49 studies, published between January 2000 and March 2018, containing 2456 individual experimental results that satisfied our inclusion criteria. In order to compare prediction performance across these studies in a consistent way, we (re-)computed the confusion matrices and employed the Matthews Correlation Coefficient (MCC) as our main performance measure. Results Our meta-analysis shows that unsupervised models are comparable with supervised models for both within-project and cross-project prediction. Among the 14 families of unsupervised models, Fuzzy C-Means (FCM) and Fuzzy SOMs (FSOMs) perform best. In addition, where we were able to check, we found that almost 11% (262/2456) of published results (contained in 16 papers) were internally inconsistent and a further 33% (823/2456) provided insufficient details for us to check. Conclusion Although many factors impact the performance of a classifier, e.g., dataset characteristics, broadly speaking, unsupervised classifiers do not seem to perform worse than the supervised classifiers in our review. However, we note a worrying prevalence of (i) demonstrably erroneous experimental results, (ii) undemanding benchmarks and (iii) incomplete reporting. We therefore encourage researchers to be comprehensive in their reporting.
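
Since the review standardizes on MCC computed from (re-)constructed confusion matrices, a minimal sketch of that calculation may be helpful; the counts below are invented for illustration and are not taken from any reviewed study.

import math

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical defect predictor: 40 of 50 defective modules flagged,
# with 30 false alarms among 450 clean modules.
print(round(mcc(tp=40, fp=30, tn=420, fn=10), 3))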

123 citations


Journal ArticleDOI
TL;DR: A systematic review of studies related to the prediction of maintainability of object-oriented software systems using machine learning techniques from January 1991 to July 2018 indicates that there is relatively little activity in the area of software maintainability prediction compared with other software quality attributes.
Abstract: Context Software maintainability is one of the fundamental quality attributes of software engineering. The accurate prediction of software maintainability is a significant challenge for the effective management of the software maintenance process. Objective The major aim of this paper is to present a systematic review of studies related to the prediction of maintainability of object-oriented software systems using machine learning techniques. This review identifies and investigates a number of research questions to comprehensively summarize, analyse and discuss various viewpoints concerning software maintainability measurements, metrics, datasets, evaluation measures, individual models and ensemble models. Method The review uses the standard systematic literature review method applied to the most common computer science digital database libraries from January 1991 to July 2018. Results We survey 56 relevant studies in 35 journals and 21 conference proceedings. The results indicate that there is relatively little activity in the area of software maintainability prediction compared with other software quality attributes. CHANGE maintenance effort and the maintainability index were the most commonly used software measurements (dependent variables) employed in the selected primary studies, and most made use of class-level product metrics as the independent variables. Several private datasets were used in the selected studies, and there is a growing demand to publish datasets publicly. Most studies focused on regression problems and performed k-fold cross-validation. Individual prediction models were employed in the majority of studies, while ensemble models were used relatively rarely. Conclusion Based on the findings obtained in this systematic literature review, ensemble models demonstrated increased prediction accuracy over individual models, and have been shown to be useful models in predicting software maintainability. However, their application is relatively rare and there is a need to apply these and other models to an extensive variety of datasets with the aim of improving the accuracy and consistency of results.

66 citations


Journal ArticleDOI
TL;DR: The proposed approach for testing autonomous driving takes ontologies describing the environment of autonomous vehicles and, relying on combinatorial testing, automatically converts them into test cases that are executed in a simulation environment to verify automated driving functions.
Abstract: Context: Ontologies are known as a formal and explicit conceptualization of entities, their interfaces, behaviors, and relationships. They have been applied in various application domains such as autonomous driving, where ontologies are used for decision making, traffic description, auto-pilot, etc. It has always been a challenge to test the corresponding safety-critical software systems in autonomous driving, which have been playing an increasingly important role in our daily routines. Objective: Failures in these systems potentially cause not only great financial loss but also the loss of lives. Therefore, it is vital to obtain and cover as many critical driving scenarios as possible during autonomous-driving testing to ensure that the system can always reach a fail-safe state under different circumstances. Method: We outline a general framework for testing, verification, and validation of automated and autonomous driving functions. The introduced method makes use of ontologies for describing the environment of autonomous vehicles and converts them to input models for combinatorial testing. The combinatorial test suite comprises abstract test cases that are mapped to concrete test cases that can be executed using simulation environments. Results: We discuss in detail how to automatically convert ontologies to the corresponding combinatorial testing input models. Specifically, we present two conversion algorithms and compare their applicability using ontologies of different sizes. We also carried out a case study to further demonstrate the practical value of applying ontology-based test generation in industrial settings. Conclusion: The proposed approach for testing autonomous driving takes ontologies describing the environment of autonomous vehicles and automatically converts them to test cases that are used in a simulation environment to verify automated driving functions. The conversion relies on combinatorial testing. The first experimental results, relying on an example from the automotive industry, indicate that the approach can be used in practice.
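
To make the conversion idea concrete, here is a minimal sketch under assumed inputs: the environment parameters and values below are hypothetical stand-ins for what an ontology might yield, and the sketch enumerates every combination rather than the t-way covering arrays a real combinatorial-testing tool would compute.

from itertools import product

# Hypothetical parameters an ontology of the driving environment might yield.
parameters = {
    "weather":    ["clear", "rain", "fog"],
    "road_type":  ["urban", "highway"],
    "pedestrian": ["absent", "crossing"],
}

# Abstract test cases: here simply every combination of parameter values.
abstract_tests = [dict(zip(parameters, values))
                  for values in product(*parameters.values())]

for test in abstract_tests:
    print(test)  # each abstract case would be mapped to a concrete simulation scenario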

63 citations


Journal ArticleDOI
TL;DR: The development and maintenance of large-scale ML-based systems in industrial settings introduce new challenges specific to ML, and the known challenges characteristic of these types of systems require new methods to overcome them.
Abstract: Background: Developing and maintaining large-scale machine learning (ML) based software systems in an industrial setting is challenging. There are no well-established development guidelines, but the literature contains reports on how companies develop and maintain deployed ML-based software systems. Objective: This study aims to survey the literature related to development and maintenance of large-scale ML-based systems in industrial settings in order to provide a synthesis of the challenges that practitioners face. In addition, we identify solutions used to address some of these challenges. Method: A systematic literature review was conducted and we identified 72 papers related to development and maintenance of large-scale ML-based software systems in industrial settings. The selected articles were qualitatively analyzed by extracting challenges and solutions. The challenges and solutions were thematically synthesized into four quality attributes: adaptability, scalability, safety, and privacy. The analysis was done in relation to the ML workflow, i.e., data acquisition, training, evaluation, and deployment. Results: We identified a total of 23 challenges and 8 solutions related to development and maintenance of large-scale ML-based software systems in industrial settings, spanning six different domains. Challenges were most often reported in relation to adaptability and scalability. Safety and privacy challenges had the fewest reported solutions. Conclusion: The development and maintenance of large-scale ML-based systems in industrial settings introduce new challenges specific to ML, and the known challenges characteristic of these types of systems require new methods to overcome them. The identified challenges highlight important concerns in ML system development practice, and the lack of solutions points to directions for future research.

61 citations


Journal ArticleDOI
TL;DR: Using a hybrid search strategy that combines a representative digital library with parallel or sequential snowballing tends to be an appropriate alternative when searching for evidence in SLRs.
Abstract: Context When conducting a Systematic Literature Review (SLR), researchers usually face the challenge of designing a search strategy that appropriately balances result quality and review effort. Using digital library (or database) searches or snowballing alone may not be enough to achieve high-quality results. On the other hand, using both digital library searches and snowballing together may increase the overall review effort. Objective The goal of this research is to propose and evaluate hybrid search strategies that selectively combine database searches with snowballing. Method We propose four hybrid search strategies combining database searches in digital libraries with iterative, parallel, or sequential backward and forward snowballing. We simulated the strategies over three existing SLRs in SE that adopted both database searches and snowballing. We compared the outcome of digital library searches, snowballing, and hybrid strategies using precision, recall, and F-measure to investigate the performance of each strategy. Results Our results show that, for the analyzed SLRs, combining database searches from the Scopus digital library with parallel or sequential snowballing achieved the most appropriate balance of precision and recall. Conclusion We put forward that, depending on the goals of the SLR and the available resources, using a hybrid search strategy involving a representative digital library and parallel or sequential snowballing tends to represent an appropriate alternative to be used when searching for evidence in SLRs.
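
The strategies are compared using precision, recall, and F-measure against the set of known relevant studies; a minimal sketch of that computation follows, with made-up study identifiers rather than data from the simulated SLRs.

def search_performance(retrieved, relevant):
    """Precision, recall and F-measure of one search strategy for one SLR."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f_measure

# Hypothetical example: 40 of the 200 retrieved studies are among the 50 known relevant ones.
print(search_performance(retrieved=range(200), relevant=range(160, 210)))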

60 citations


Journal ArticleDOI
TL;DR: Although MFL is a growing concern, existing solutions in the field remain immature, and concrete solutions that reduce MFL debugging time and cost by adopting an approach such as the MBA debugging approach are scarce, requiring more attention from the research community.
Abstract: Context Multiple fault localization (MFL) is the act of identifying the locations of multiple faults (more than one fault) in a faulty software program. This is known to be more complicated, tedious, and costly in comparison to the traditional practice of presuming that a program contains a single fault. Due to the increasing interest in MFL by the research community, a broad spectrum of MFL debugging approaches and solutions have been proposed and developed. Objective The aim of this study is to systematically review existing research on MFL in the software fault localization (SFL) domain. This study also aims to identify, categorize, and synthesize relevant studies in the research domain. Method Using an evidence-based systematic methodology, we identified 55 studies relevant to four research questions. The methodology provides a rigorous and repeatable evidence-based process for selecting and evaluating studies. Result The result of the systematic review shows that research on MFL is gaining momentum, with stable growth in the last 5 years. Three prominent MFL debugging approaches were identified, i.e., the one-bug-at-a-time debugging approach (OBA), the parallel debugging approach, and the multiple-bug-at-a-time debugging approach (MBA), with the OBA debugging approach being utilized the most. Conclusion The study concludes with some identified research challenges and suggestions for future research. Although MFL is a growing concern, existing solutions in the field remain immature. Studies utilizing real faults in their experiments are scarce. Concrete solutions that reduce MFL debugging time and cost by adopting an approach such as the MBA debugging approach are also scarce and require more attention from the research community.

49 citations


Journal ArticleDOI
TL;DR: A quality model composed of seven desirable aspects of automated log-abstraction techniques is built; it supports software engineers in understanding the advantages and limitations of existing techniques and in choosing the technique best suited to their unique use cases.
Abstract: Context: Logs are often the first and only information available to software engineers to understand and debug their systems. Automated log-analysis techniques help software engineers gain insights into large log data. These techniques have several steps, among which log abstraction is the most important because it transforms raw log data into high-level information. Thus, log abstraction allows software engineers to perform further analyses. Existing log-abstraction techniques vary significantly in their designs and performances. To the best of our knowledge, there is no study that examines the performances of these techniques with respect to the following seven quality aspects concurrently: mode, coverage, delimiter independence, efficiency, scalability, system knowledge independence, and parameter tuning effort. Objectives: We want (1) to build a quality model for evaluating automated log-abstraction techniques and (2) to evaluate and recommend existing automated log-abstraction techniques using this quality model. Method: We perform a systematic literature review (SLR) of automated log-abstraction techniques. We review 89 research papers out of 2,864 initial papers. Results: Through this SLR, we (1) identify 17 automated log-abstraction techniques, (2) build a quality model composed of seven desirable aspects: mode, coverage, delimiter independence, efficiency, scalability, system knowledge independence, and parameter tuning effort, and (3) make recommendations for researchers on future research directions. Conclusion: Our quality model and recommendations help researchers learn about the state-of-the-art automated log-abstraction techniques, identify research gaps to enhance existing techniques, and develop new ones. We also support software engineers in understanding the advantages and limitations of existing techniques and in choosing the technique best suited to their unique use cases.

48 citations


Journal ArticleDOI
TL;DR: This study identified research gaps, such as the need for more tools and guidelines, lightweight QR management strategies that fit short iteration cycles, investigations of the link between QR challenges and technical debt, and extension of empirical validation of existing strategies to a wider context.
Abstract: Context Quality requirements (QRs) describe the desired quality of software, and they play an important role in the success of software projects. In agile software development (ASD), QRs are often ill-defined and not well addressed due to the focus on quickly delivering functionality. Rapid software development (RSD) approaches (e.g., continuous delivery and continuous deployment), which shorten delivery times, are more prone to neglect QRs. Despite the significance of QRs in both ASD and RSD, there is limited synthesized knowledge on their management in those approaches. Objective This study aims to synthesize state-of-the-art knowledge about QR management in ASD and RSD, focusing on three aspects: bibliometric, strategies, and challenges. Research method Using a systematic mapping study with a snowballing search strategy, we identified and structured the literature on QR management in ASD and RSD. Results We found 156 primary studies: 106 are empirical studies, 16 are experience reports, and 34 are theoretical studies. Security and performance were the most commonly reported QR types. We identified various QR management strategies: 74 practices, 43 methods, 13 models, 12 frameworks, 11 advices, 10 tools, and 7 guidelines. Additionally, we identified 18 categories and 4 non-recurring challenges of managing QRs. The limited ability of ASD to handle QRs, time constraints due to short iteration cycles, limitations regarding the testing of QRs and neglect of QRs were the top categories of challenges. Conclusion Management of QRs is significant in ASD and is becoming important in RSD. This study identified research gaps, such as the need for more tools and guidelines, lightweight QR management strategies that fit short iteration cycles, investigations of the link between QRs challenges and technical debt, and extension of empirical validation of existing strategies to a wider context. It also synthesizes QR management strategies and challenges, which may be useful for practitioners.

47 citations


Journal ArticleDOI
TL;DR: A data-driven approach measures the level of maintenance activity of GitHub projects to alert users about the risks of using unmaintained projects and possibly motivate other developers to assume the maintenance of such projects.
Abstract: Context GitHub hosts an impressive number of high-quality OSS projects. However, selecting “the right tool for the job” is a challenging task, because we do not have precise information about those high-quality projects. Objective In this paper, we propose a data-driven approach to measure the level of maintenance activity of GitHub projects. Our goal is to alert users about the risks of using unmaintained projects and possibly motivate other developers to assume the maintenance of such projects. Method We train machine learning models to define a metric to express the level of maintenance activity of GitHub projects. Next, we analyze the historical evolution of 2927 active projects in the time frame of one year. Results Of the 2927 active projects, 16% became unmaintained within one year. We also found that Objective-C projects tend to have lower maintenance activity than projects implemented in other languages. Finally, software tools—such as compilers and editors—have the highest maintenance activity over time. Conclusions A metric about the level of maintenance activity of GitHub projects can help developers to select open source projects.
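
As a rough illustration of the idea (not the paper's actual model or features), the sketch below trains a classifier on a few invented activity features and uses its predicted probability as a maintenance-activity score.

from sklearn.ensemble import RandomForestClassifier

# Hypothetical features: [commits_last_year, closed_issues_last_year, releases_last_year]
X_train = [
    [250, 120, 6],
    [3,   0,   0],
    [80,  40,  2],
    [0,   1,   0],
]
y_train = [1, 0, 1, 0]  # 1 = maintained, 0 = unmaintained

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
candidate = [[12, 2, 0]]
print(model.predict_proba(candidate)[0][1])  # maintenance-activity score in [0, 1]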

44 citations


Journal ArticleDOI
TL;DR: A high-accuracy behavior-based malware detection technique is proposed that processes runtime features in a new way, integrating them into the training feature set used to develop malware classifiers with machine learning algorithms.
Abstract: Malicious software deliberately affects the computer systems. Malware are analyzed using static or dynamic analysis techniques. Using these techniques, unique patterns are extracted to detect malware correctly. In this paper, a behavior-based malware detection technique is proposed. Various runtime features are extracted by setting up a dynamic analysis environment using the Cuckoo sandbox. Three primary features are processed for developing malware classifier. Firstly, printable strings are processed word by word using text mining techniques which produced a very high dimension matrix of the string features. Then we apply the singular value decomposition technique for reducing dimensions of string features. Secondly, Shannon entropy is computed over the printable strings and API calls to consider the randomness of API and PSI features. In addition to these features, behavioral features regarding file operations, registry key modification and network activities are used in malware detection. Finally, all features are integrated in the training feature set to develop the malware classifiers using the machine learning algorithms. The proposed technique is validated with 16489 malware and 8422 benign files. Our experimental results show the accuracy of 99.54% in malware detection using ensemble machine learning algorithms. Moreover, it aims to develop a behavior-based malware detection technique of high accuracy by processing the runtime features in a new way.
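
Two of the feature-processing steps, Shannon entropy over printable strings and dimensionality reduction of the high-dimensional string matrix, can be sketched as follows; the string samples are invented and scikit-learn's TruncatedSVD stands in for the authors' exact SVD setup.

import math
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

def shannon_entropy(text):
    """Character-level Shannon entropy, a rough randomness measure for a string."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical printable-string dumps from two samples.
samples = ["GetProcAddress LoadLibraryA kernel32.dll",
           "xK9#qwv zz0$ LoadLibraryA pQ72!"]
print([round(shannon_entropy(s), 2) for s in samples])

# Reduce the bag-of-words string matrix to a lower-dimensional representation.
matrix = CountVectorizer().fit_transform(samples)
reduced = TruncatedSVD(n_components=1).fit_transform(matrix)
print(reduced.shape)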

39 citations


Journal ArticleDOI
TL;DR: This work presents the results of a systematic mapping study on Test Case Prioritization in Continuous Integration environments (TCPCI) that reports the main characteristics of TCPCI approaches and their evaluation aspects, and observes a growing number of studies in the field.
Abstract: Context: Continuous Integration (CI) environments allow frequent integration of software changes, making software evolution more rapid and cost-effective. In such environments, the regression test plays an important role, as well as the use of Test Case Prioritization (TCP) techniques. Such techniques attempt to identify the test case order that maximizes certain goals, such as early fault detection. This research subject has been raising interest because some new challenges are faced in the CI context, as TCP techniques need to consider time constraints of the CI environments. Objective: This work presents the results of a systematic mapping study on Test Case Prioritization in Continuous Integration environments (TCPCI) that reports the main characteristics of TCPCI approaches and their evaluation aspects. Method: The mapping was conducted following a plan that includes the definition of research questions, selection criteria and search string, and the selection of search engines. The search returned 35 primary studies classified based on the goal and kind of used TCP technique, addressed CI particularities and testing problems, and adopted evaluation measures. Results: The results show a growing interest in this research subject. Most studies have been published in the last four years. 80% of the approaches are history-based, that is, are based on the failure and test execution history. The great majority of studies report evaluation results by comparing prioritization techniques. The preferred measures are Time and number/percentage of Faults Detected. Few studies address CI testing problems and characteristics, such as parallel execution and test case volatility. Conclusions: We observed a growing number of studies in the field. Future work should explore other information sources such as models and requirements, as well as CI particularities and testing problems, such as test case volatility, time constraint, and flaky tests, to solve existing challenges and offer cost-effective approaches to the software industry.
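
Since most surveyed approaches are history-based, a toy sketch of that family may help: it orders test cases by recent failures and packs them into the CI time budget. The data and selection rule are invented and do not correspond to any specific study.

tests = [
    {"name": "t1", "recent_failures": 0, "duration": 30},
    {"name": "t2", "recent_failures": 3, "duration": 45},
    {"name": "t3", "recent_failures": 1, "duration": 10},
]

budget = 60  # seconds available in this CI cycle
ranked = sorted(tests, key=lambda t: t["recent_failures"], reverse=True)

selected, used = [], 0
for test in ranked:
    if used + test["duration"] <= budget:
        selected.append(test["name"])
        used += test["duration"]

print(selected)  # ['t2', 't3'] -- failure-prone tests run first within the budget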

Journal ArticleDOI
TL;DR: In this article, the authors present several guidelines supporting those willing to carry out survey-based studies in software engineering research.
Abstract: Context: Over the past decade Software Engineering research has seen a steady increase in survey-based studies, and there are several guidelines providing support for those willing to carry out sur ...

Journal ArticleDOI
TL;DR: Overall, although the topic area is up-and-coming, for many areas of application it is still in its infancy, so there is a need for more empirical studies and there is a plethora of new opportunities for researchers.
Abstract: CONTEXT: Intelligent Software Engineering (ISE) refers to the application of intelligent techniques to software engineering. We define an “intelligent technique” as a technique that explores data (from digital artifacts or domain experts) for knowledge discovery, reasoning, learning, planning, natural language processing, perception or supporting decision-making. OBJECTIVE: The purpose of this study is to synthesize and analyze the state of the art of the field of applying intelligent techniques to Agile Software Development (ASD). Furthermore, we assess its maturity and identify adoption risks. METHOD: Using a systematic literature review, we identified 104 primary studies, resulting in 93 unique studies. RESULTS: We identified that there is a positive trend in the number of studies applying intelligent techniques to ASD. Also, we determined that reasoning under uncertainty (mainly, Bayesian network), search-based solutions, and machine learning are the most popular intelligent techniques in the context of ASD. In terms of purposes, the most popular ones are effort estimation, requirements prioritization, resource allocation, requirements selection, and requirements management. Furthermore, we discovered that the primary goal of applying intelligent techniques is to support decision making. As a consequence, the adoption risks in terms of the safety of the current solutions are low. Finally, we highlight the trend of using explainable intelligent techniques. CONCLUSION: Overall, although the topic area is up-and-coming, for many areas of application, it is still in its infancy. So, this means that there is a need for more empirical studies, and there are a plethora of new opportunities for researchers.

Journal ArticleDOI
TL;DR: The SEG-M2 has been designed for software producing organizations to assess their ecosystem governance practices, set a goal for improvement, and execute an improvement plan; it is evaluated by both researchers and practitioners as a useful collection of practices that enables decision making about software ecosystem governance.
Abstract: Context Increasingly, software companies are realizing that they can no longer compete through product excellence alone. The ecosystems that surround platforms, such as operating systems, enterprise applications, and even social networks, are undeniably responsible for a large part of a platform’s success. With this realization, software producing organizations need to devise tools and strategies to improve their ecosystems and reinvent tools that others have invented many times before. Objective In this article, the software ecosystem governance maturity model (SEG-M2) is presented, which has been designed along the principles of a focus area maturity model. The SEG-M2 has been designed for software producing organizations to assess their ecosystem governance practices, set a goal for improvement, and execute an improvement plan. Method The model has been created following an established focus area maturity model design method. The model has been evaluated in six evaluation case studies with practitioners, first by applying the model to their organizations and second by evaluating with the practitioners whether the evaluation and improvement advice from the model is valid, useful, and effective. Results The model is extensively described and illustrated using six desk studies and six case studies. Conclusions The model is evaluated by both researchers and practitioners as a useful collection of practices that enables decision making about software ecosystem governance. We find that maturity models are an effective tool in disseminating a large collection of knowledge, but that research and creation tooling for maturity models is limited.

Journal ArticleDOI
TL;DR: Analyzing both the diffuseness of TD items and the accuracy of the remediation time estimated by SonarQube to fix TD items on a set of 21 open-source Java projects shows that the remediation time estimated by the tool is inaccurate and, compared to the actual time spent to fix TD items, is in most cases overestimated.
Abstract: Context. Among the static analysis tools available, SonarQube is one of the most used. SonarQube detects Technical Debt (TD) items—i.e., violations of coding rules—and then estimates TD as the time needed to remedy TD items. However, practitioners are still skeptical about the accuracy of remediation time estimated by the tool. Objective. In this paper, we analyze both diffuseness of TD items and accuracy of remediation time, estimated by SonarQube, to fix TD items on a set of 21 open-source Java projects. Method. We designed and conducted a case study where we asked 81 junior developers to fix TD items and reduce the TD of 21 projects. Results. We observed that TD items are diffused in the analyzed projects and most items are code smells. Moreover, the results point out that the remediation time estimated by SonarQube is inaccurate and, as compared to the actual time spent to fix TD items, is in most cases overestimated. Conclusions. The results of our study are promising for practitioners and researchers. The former can make more aware decisions during project execution and resource management, the latter can use this study as a starting point for improving TD estimation models.

Journal ArticleDOI
TL;DR: Systematic Literature Reviews (SLRs) have been adopted within Software Engineering (SE) for more than a decade to provide meaningful summaries of evidence on several topics.
Abstract: Context: Systematic Literature Reviews (SLRs) have been adopted within Software Engineering (SE) for more than a decade to provide meaningful summaries of evidence on several topics. Many of these ...

Journal ArticleDOI
TL;DR: How benign Android apps execute differently from malware over time is studied, in terms of their execution structures measured by the distribution and interaction among functionality scopes, app components, and callbacks.
Abstract: Context The constant evolution of the Android platform and its applications have imposed significant challenges both to understanding and securing the Android ecosystem. Yet, despite the growing body of relevant research, it remains unclear how Android apps evolve in terms of their run-time behaviors in ways that impede our gaining consistent empirical knowledge about the workings of the ecosystem and developing effective technical solutions to defending it against security threats. Intuitively, an essential step towards addressing these challenges is to first understand the evolution itself. Among others, one avenue to examining a program’s run-time behavior is to dissect the program’s execution in terms of its syntactic and semantic structure. Objective In this paper, we study how benign Android apps execute differently from malware over time, in terms of their execution structures measured by the distribution and interaction among functionality scopes, app components, and callbacks. In doing so, we attempt to reveal how relevant app execution structure is to app security orientation (i.e., benign or malicious). Method By tracing the method calls and inter-component communications (ICCs) of 15,451 benign apps and 15,183 malware developed during eight years (2010–2017), we systematically characterized the execution structure of malware versus benign apps and revealed similarities and disparities between them that are not previously known. Results Our results show, among other findings, that (1) despite their similarity in execution distribution over functionality scopes, malware accessed framework functionalities mainly through third-party libraries, while benign apps were dominated by calls within the framework; (2) use of Activity component had been rising in malware while benign apps saw continuous drop in such uses; (3) malware invoked significantly more Services but less Content Providers than benign apps during the evolution of both groups; (4) malware carried ICC data significantly less often via standard data fields than benign apps, albeit both groups did not carry any data in most ICCs; and (5) newer malware tended to have more even distribution of callbacks among event-handler categories, while the distribution remained constant in benign apps over time. Conclusion We discussed how these findings inform understanding app behaviors, optimizing static and dynamic code analysis of Android apps, and developing sustainable app security defense solutions.

Journal ArticleDOI
TL;DR: This paper shows how information found in stack traces (a more formal source of data containing the history of function calls to the function in which the crash happened) can be used to automatically predict the severity of bugs.
Abstract: Context The severity of a bug is often used as an indicator of how a bug negatively affects system functionality. It is used by developers to prioritize bugs which need to be fixed. The problem is that, for various reasons, bug submitters often enter the incorrect severity level, delaying the bug resolution process. Techniques that can automatically predict the severity of a bug can significantly reduce the bug triaging overhead. In our previous work, we showed that the accuracy of description-based severity prediction techniques could be significantly improved by using stack traces as a source of information. Objective In this study, we expand our previous work by exploring the effect of using categorical features, in addition to stack traces, to predict the severity of bugs. These categorical features include faulty product, faulty component, and operating system. We experimented with other features and observed that they do not improve the severity prediction accuracy. A software system is composed of many products; each has a set of components. Components interact with each other to provide the functionality of the product. The operating system field refers to the operating system on which the software was running during the crash. Method The proposed approach uses a linear combination of stack-trace and categorical-feature similarity to predict the severity. We adopted a cost-sensitive K-Nearest-Neighbor approach to overcome the imbalanced label distribution problem and improve the classifier accuracy. Results Our experiments on bug reports of Eclipse submitted between 2001 and 2015 and Gnome submitted between 1999 and 2015 show that the accuracy of our severity prediction approach can be improved by 5% to 20% by considering categorical features in addition to stack traces. Conclusion The accuracy of predicting the severity of bugs is higher when combining stack traces and three categorical features: product, component, and operating system.
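
A minimal sketch of the general idea follows; the similarity measures, weights, and costs are illustrative choices, not the paper's exact formulation. It combines stack-trace similarity with categorical-feature similarity and takes a cost-weighted K-NN vote.

from collections import defaultdict

def similarity(a, b, alpha=0.7):
    """Linear combination of stack-trace and categorical-feature similarity."""
    frames_a, frames_b = set(a["frames"]), set(b["frames"])
    trace_sim = len(frames_a & frames_b) / len(frames_a | frames_b)
    cat_sim = sum(a[f] == b[f] for f in ("product", "component", "os")) / 3
    return alpha * trace_sim + (1 - alpha) * cat_sim

def predict_severity(new_report, history, k=3, cost={"major": 2.0, "minor": 1.0}):
    neighbours = sorted(history, key=lambda r: similarity(new_report, r), reverse=True)[:k]
    votes = defaultdict(float)
    for r in neighbours:
        votes[r["severity"]] += cost[r["severity"]]  # cost-sensitive vote
    return max(votes, key=votes.get)

history = [
    {"frames": ["a", "b"], "product": "p1", "component": "ui", "os": "linux", "severity": "major"},
    {"frames": ["b", "c"], "product": "p1", "component": "ui", "os": "linux", "severity": "major"},
    {"frames": ["x", "y"], "product": "p2", "component": "net", "os": "win", "severity": "minor"},
]
new = {"frames": ["a", "b", "c"], "product": "p1", "component": "ui", "os": "linux"}
print(predict_severity(new, history))  # 'major'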

Journal ArticleDOI
TL;DR: This paper aims to investigate the current state of research on SoS architecting by synthesizing the demographic data, assessing the quality and the coverage of architecting activities and software quality attributes by the research, and distilling a concept map that reflects a community-wide understanding of the concept of SoS.
Abstract: Context: The term System of Systems (SoS) has increasingly been used in a wide variety of domains to describe those systems composed of independent constituent systems that collaborate towards a mission that they could not accomplish on their own. There is a significant volume of research by the software architecture community that aims to overcome the challenges involved in architecting SoS, as evidenced by the number of secondary studies in the field published so far. However, the boundaries of such research do not seem to be well defined, at least partially, due to the emergence of SoS-adjacent areas of interest like the Internet of Things.Objective: This paper aims to investigate the current state of research on SoS architecting by synthesizing the demographic data, assessing the quality and the coverage of architecting activities and software quality attributes by the research, and distilling a concept map that reflects a community-wide understanding of the concept of SoS. Method: We conduct what is, to the best of our understanding, the first tertiary study on SoS architecting. Such tertiary study was based on five research questions, and was performed by following the guidelines of Kitchenham et al. In all, 19 secondary studies were evaluated, which is comparable to other tertiary studies. Results: The study illustrates a state of disconnection in the research community, with research gaps in the coverage of particular phases and quality attributes. Furthermore, a more effective approach in classifying systems as SoS is required, as the means of resolving conceptual and terminological overlaps with the related domains. Conclusions: Despite the amount of research in the area of SoS architecting, more coordinated and systematic targeted efforts are required in order to address the identified issues with the current state of research.

Journal ArticleDOI
TL;DR: An automated approach to aid the development of SPLs from existing system variants that exploits UML design models; it is therefore independent of the programming language and supports the re-engineering process at the design level, allowing practitioners to have a broader view of the SPL.
Abstract: Context Software Product Lines (SPLs) are families of related products developed for specific domains. SPLs commonly emerge from existing variants when their individual maintenance and/or evolution become complex. Even though there exists a vast research literature on SPL extraction, the majority of the approaches have only focused on source code, are partially automated, or do not reflect domain constraints. Such limitations can make the extraction, management, documentation, and generation of some important SPL artifacts, such as the product line architecture, more difficult, which can negatively impact the evolution and maintenance of SPLs. Objective To tackle these limitations, this work presents ModelVars2SPL (Model Variants to SPL Core Assets), an automated approach to aid the development of SPLs from existing system variants. Method The input for ModelVars2SPL is a set of Unified Modeling Language (UML) class diagrams and the list of features they implement. The approach extracts two main assets: (i) a Feature Model (FM), which represents the combinations of features, and (ii) a Product Line Architecture (PLA), which represents a global structure of the variants. ModelVars2SPL is composed of four automated steps. We conducted a thorough evaluation of ModelVars2SPL to analyze the artifacts it generates and its performance. Results The results show that the FMs represent the feature organization well, providing useful information to define and manage commonalities and variabilities. The PLAs show a global structure of current variants, facilitating the understanding of existing implementations of all variants. Conclusions An advantage of ModelVars2SPL is that it exploits UML design models, that is, it is independent of the programming language and supports the re-engineering process at the design level, allowing practitioners to have a broader view of the SPL.

Journal ArticleDOI
TL;DR: This study finds that the performance of a team not only depends on the team personality composition, but also on the interactive effects of team climate, and investigates the relationship between team personality and team climate.
Abstract: Context: Previous research found that the performance of a team depends not only on the team personality composition, but also on the interactive effects of team climate. Although investigation on p ...

Journal ArticleDOI
TL;DR: PostFinder is introduced, an approach that analyzes the project under development to extract suitable context and allows developers to retrieve Stack Overflow posts relevant to the API function calls that have already been invoked.
Abstract: Context – During the development of complex software systems, programmers look for external resources to understand better how to use specific APIs and to get advice related to their current tasks. Stack Overflow provides developers with a broader insight into API usage as well as useful code examples. Given the circumstances, tools and techniques for mining Stack Overflow are highly desirable. Objective – In this paper, we introduce PostFinder, an approach that analyzes the project under development to extract suitable context and allows developers to retrieve Stack Overflow posts relevant to the API function calls that have already been invoked. Method – PostFinder augments posts with additional data to make them more exposed to queries. On the client side, it boosts the context code with various factors to construct a query containing the information needed for matching against the stored indexes. Multiple facets of the available data are used to optimize the search process, with the ultimate aim of recommending highly relevant SO posts. Results – The approach has been validated in a user study involving a group of 12 developers evaluating 500 posts for 50 contexts. Experimental results indicate the suitability of PostFinder to recommend relevant Stack Overflow posts and concurrently show that the tool outperforms a well-established baseline. Conclusions – We conclude that PostFinder can be deployed to assist developers in selecting relevant Stack Overflow posts while they are programming, as well as to replace the module for searching posts in a code-to-code search engine.
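
As an illustration of the retrieval step only (not PostFinder's actual indexing and boosting pipeline), the sketch below ranks a few invented post titles against a query assembled from API calls found in the editor context, using plain TF-IDF similarity.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "How to read a file line by line with BufferedReader in Java",
    "Parsing JSON with Jackson ObjectMapper readValue",
    "Sorting a list with Collections.sort and a custom Comparator",
]

# Query assembled from API calls already invoked in the code under development.
query = "ObjectMapper readValue parse JSON string"

vectorizer = TfidfVectorizer().fit(posts + [query])
scores = cosine_similarity(vectorizer.transform([query]), vectorizer.transform(posts))[0]

best = max(range(len(posts)), key=lambda i: scores[i])
print(posts[best])  # expected: the Jackson/ObjectMapper post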

Journal ArticleDOI
TL;DR: A program slice-based binary code vulnerability intelligent detection system called BVDetector is proposed, which can effectively detect vulnerabilities related to library/API function calls in binary programs and reduce the false positive rate and false negative rate of vulnerability detection.
Abstract: Context Software vulnerability detection is essential to ensure cybersecurity. Currently, most software is published in binary form, so researchers can only detect vulnerabilities in such software by analysing binary programs. Although existing research approaches have made a substantial contribution to binary vulnerability detection, there are still many deficiencies, such as high false positive rates, coarse-grained detection, and dependence on expert experience. Objective The goal of this study is to perform fine-grained intelligent detection of the vulnerabilities in binary programs. This leads us to propose a fine-grained representation of binary programs and introduce deep learning techniques to intelligently detect the vulnerabilities. Method We use program slices of library/API function calls to represent binary programs. Additionally, we design and construct a Binary Gated Recurrent Unit (BGRU) network model to intelligently learn vulnerability patterns and automatically detect vulnerabilities in binary programs. Results This approach yields the design and implementation of a program slice-based binary code vulnerability intelligent detection system called BVDetector. We show that BVDetector can effectively detect vulnerabilities related to library/API function calls in binary programs, which reduces the false positive rate and false negative rate of vulnerability detection. Conclusion This paper proposes a program slice-based binary code vulnerability intelligent detection system called BVDetector. The experimental results show that BVDetector can effectively reduce the false negative rate and false positive rate of binary vulnerability detection.
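
For readers unfamiliar with gated recurrent units, the sketch below shows a generic bidirectional GRU classifier over tokenized program slices in Keras; it is not the BGRU architecture from the paper, and the vocabulary size, sequence length, and layer sizes are placeholders.

import tensorflow as tf

vocab_size, seq_len = 5000, 200  # assumed token vocabulary and slice length

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,)),
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # vulnerable vs. non-vulnerable
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_slices, y_labels, ...) would then train on vectorized slice corpora.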

Journal ArticleDOI
TL;DR: While the most popular viewpoints for UML-based software architecture modeling are the functional (96%) and information (99%) viewpoints, the least popular is the operational viewpoint, considered by 26% of the practitioners.
Abstract: Context Software architecture viewpoints modularize the software architectures in terms of different viewpoints that each address a different concern. Unified Modeling Language (UML) is very popular among practitioners for modeling software architectures from different viewpoints. Objective In this paper, we aimed at understanding the practitioners’ UML usage for the modeling of software architectures from different viewpoints. Method To this end, 109 practitioners with diverse profiles have been surveyed to understand practitioners’ UML usage for six different viewpoints: functional, information, concurrency, development, deployment, and operational. Each viewpoint has been considered in terms of a set of software models that can be created in that viewpoint. Results The survey includes 35 questions for different viewpoint models, and the results lead to interesting findings. While the most popular viewpoints for UML-based software architecture modeling are the functional (96%) and information (99%) viewpoints, the least popular is the operational viewpoint, considered by 26% of the practitioners. The most popular UML modeling tool is Enterprise Architect, regardless of the viewpoint considered. Concerning the software models that can be created in each viewpoint, UML’s class diagram is practitioners’ top choice for the functional structure (71%), data structure (85%), concurrency structure (75%), software code structure (34%), system installation (39%), and system support (16%) models; UML’s sequence diagram is the top choice for the data lifecycle models (47%); UML’s deployment diagram for the physical structure (71%), mapping between the functional and physical components (53%), and system migration (21%) models; UML’s activity diagram for the data flow (65%), software build and release processes (20–22%), and system administration (36%) models; UML’s component diagram for the mapping between the functional and concurrent components (35%), software module structure (47%), and system configuration (21%) models; and UML’s package diagram for the software module structure (47%) models.

Journal ArticleDOI
Abstract: Context To reduce manual effort of extracting test cases from natural-language requirements, many approaches based on Natural Language Processing (NLP) have been proposed in the literature. Given the large amount of approaches in this area, and since many practitioners are eager to utilize such techniques, it is important to synthesize and provide an overview of the state-of-the-art in this area. Objective Our objective is to summarize the state-of-the-art in NLP-assisted software testing which could benefit practitioners to potentially utilize those NLP-based techniques. Moreover, this can benefit researchers in providing an overview of the research landscape. Method To address the above need, we conducted a survey in the form of a systematic literature mapping (classification). After compiling an initial pool of 95 papers, we conducted a systematic voting, and our final pool included 67 technical papers. Results This review paper provides an overview of the contribution types presented in the papers, types of NLP approaches used to assist software testing, types of required input requirements, and a review of tool support in this area. Some key results we have detected are: (1) only four of the 38 tools (11%) presented in the papers are available for download; (2) a larger ratio of the papers (30 of 67) provided a shallow exposure to the NLP aspects (almost no details). Conclusion This paper would benefit both practitioners and researchers by serving as an “index” to the body of knowledge in this area. The results could help practitioners utilizing the existing NLP-based techniques; this in turn reduces the cost of test-case design and decreases the amount of human resources spent on test activities. After sharing this review with some of our industrial collaborators, initial insights show that this review can indeed be useful and beneficial to practitioners.

Journal ArticleDOI
TL;DR: A framework to automatically mine API usage scenarios from Stack Overflow, supported by three novel algorithms, is proposed, implemented, and deployed in the proof-of-concept online tool Opiner.
Abstract: Context APIs play a central role in software development. The seminal research of Carroll et al. [15] on minimal manual and subsequent studies by Shull et al. [79] showed that developers prefer task-based API documentation instead of traditional hierarchical official documentation (e.g., Javadoc). The Q&A format in Stack Overflow offers developers an interface to ask and answer questions related to their development tasks. Objective With a view to produce API documentation, we study automated techniques to mine API usage scenarios from Stack Overflow. Method We propose a framework to mine API usage scenarios from Stack Overflow. Each task consists of a code example, the task description, and the reactions of developers towards the code example. First, we present an algorithm to automatically link a code example in a forum post to an API mentioned in the textual contents of the forum post. Second, we generate a natural language description of the task by summarizing the discussions around the code example. Third, we automatically associate developers reactions (i.e., positive and negative opinions) towards the code example to offer information about code quality. Results We evaluate the algorithms using three benchmarks. We compared the algorithms against seven baselines. Our algorithms outperformed each baseline. We developed an online tool by automatically mining API usage scenarios from Stack Overflow. A user study of 31 software developers shows that the participants preferred the mined usage scenarios in Opiner over API official documentation. The tool is available online at: http://opiner.polymtl.ca/ . Conclusion With a view to produce API documentation, we propose a framework to automatically mine API usage scenarios from Stack Overflow, supported by three novel algorithms. We evaluated the algorithms against a total of eight state of the art baselines. We implement and deploy the framework in our proof-of-concept online tool, Opiner.

Journal ArticleDOI
TL;DR: There is a need for more context- and domain-sensitive evaluations of code smells and anti-patterns, better guidelines for making trade-offs when applying design patterns or eliminating smells/anti-patterns in industry, and a unified, constantly updated catalog of smells and anti-patterns.
Abstract: Context: In this paper, we investigate how developers discuss code smells and anti-patterns across three technical Stack Exchange sites. Understanding developers’ perceptions of these issues is important to inform and align future research efforts and direct tool vendors to design tailored tools that best suit developers. Method: We mined three Stack Exchange sites and used quantitative and qualitative methods to analyse more than 4000 posts that discuss code smells and anti-patterns. Results: The results showed that developers often asked their peers to smell their code, thus utilising those sites as an informal, crowd-based code smell/anti-pattern detector. The majority of questions (556) asked were focused on smells like Duplicated Code, Spaghetti Code, God and Data Classes. In terms of languages, most of the discussions centred around popular languages such as C# (772 posts), JavaScript (720) and Java (699); however, greater support is available for Java compared to other languages (especially modern languages such as Swift and Kotlin). We also found that developers often discuss the downsides of implementing specific design patterns and ‘flag’ them as potential anti-patterns to be avoided. Some well-defined smells and anti-patterns are discussed as potentially being acceptable practice in certain scenarios. In general, developers actively seek to consider trade-offs to decide whether to use a design pattern, an anti-pattern or not. Conclusion: Our results suggest that there is a need for: 1) more context- and domain-sensitive evaluations of code smells and anti-patterns, 2) better guidelines for making trade-offs when applying design patterns or eliminating smells/anti-patterns in industry, and 3) a unified, constantly updated catalog of smells and anti-patterns. We conjecture that the crowd-based detection approach considers contextual factors and thus tends to be more trusted by developers than automated detection tools.

Journal ArticleDOI
TL;DR: The conclusions indicate that programmers' personality traits and knowledge collection behavior play a key role in shaping their intention to be creative and should be given due attention during the hiring process of creativity-oriented software companies.
Abstract: Context: Creativity is one of the essential ingredients in successful software engineering. However, the majority of the work related to creativity in software engineering has focused on creativity in requirements engineering. Furthermore, there are very few studies that examine programmer creativity and the impact of individual and contextual factors on it. Objective: The objective of the study is to analyze the impact of the big five personality traits, including extraversion, agreeableness, conscientiousness, neuroticism and openness to experience, as well as knowledge collection behavior, on a programmer's creativity intention. Method: A quantitative survey was conducted and data from 294 programmers, working in offshore software development projects, was collected. The data was later analyzed using Smart-PLS (3.0). Results and Conclusions: The results indicated that openness to experience, extraversion, conscientiousness and knowledge collection behavior positively predicted a programmer's creativity intention. On the other hand, neuroticism negatively predicted the creativity intention of the programmer. The study also concluded that all of the independent variables, except the agreeableness trait, significantly predict creativity intention, which in turn significantly predicts creativity. As a result, our conclusions indicate that programmers' personality traits and knowledge collection behavior play a key role in shaping their intention to be creative. Hence, personality traits and knowledge collection behavior should be given due attention during the hiring process of creativity-oriented software companies.

Journal ArticleDOI
TL;DR: A microservice composition approach in which the big picture of the composition is described as a BPMN model that is split into fragments and executed through an event-based choreography, providing the high degree of decoupling among microservices demanded in this type of architecture.
Abstract: Context Microservices must be composed to provide users with complex and elaborated functionalities. It seems that the decentralized nature of microservices makes a choreography style more appropriate to achieve such cooperation, where lighter solutions based on asynchronous events are generally used. However, a microservice composition based on choreography distributes the flow logic of the composition among microservices, making further analysis and updating difficult, i.e., there is not a big picture of the composition that facilitates these tasks. Business Process Model and Notation (BPMN) is the OMG standard developed to represent Business Processes (BPs), being widely used to define the big picture of such compositions. However, BPMN is usually considered in orchestration-based solutions, and orchestration can be a drawback to achieving the decoupling pursued by a microservice architecture. Objective Defining a microservice composition approach that allows us to create a composition in a BPMN model, which facilitates further analysis for taking engineering decisions, and to execute it through an event-based choreography in order to have a high degree of decoupling and independence among microservices. Method We followed a research methodology for information systems that consists of a 5-step process: awareness of the problem, suggestion, development, evaluation, and conclusion. Results We presented a microservice composition approach based on the choreography of BPMN fragments. On the one hand, we propose to describe the big picture of the composition with a BPMN model, providing a valuable mechanism to analyse it when engineering decisions need to be taken. On the other hand, this model is split into fragments in order to be executed through an event-based choreography, providing the high degree of decoupling among microservices demanded in this type of architecture. This composition approach is supported by a microservice architecture defined so that both descriptions of a composition (the big picture and the split one) coexist. A realization of this architecture in Java/Spring technology is also presented. Conclusions The evaluation of our work allows us to conclude that the proposed approach for composing microservices is more efficient than solutions based on ad-hoc development.
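
A toy sketch of the event-based choreography style (not the paper's Java/Spring realization): each service fragment reacts to an event and publishes the next one, so no central orchestrator drives the flow. The event names and services are invented.

from collections import defaultdict

subscribers = defaultdict(list)

def subscribe(event, handler):
    subscribers[event].append(handler)

def publish(event, payload):
    for handler in subscribers[event]:
        handler(payload)

# Fragment owned by the Orders service: starts the flow and emits an event.
def orders_service(payload):
    print("orders: order placed", payload)
    publish("OrderPlaced", payload)

# Fragment owned by the Payments service, triggered only by the event.
def payments_service(payload):
    print("payments: charging customer", payload)
    publish("PaymentAccepted", payload)

# Fragment owned by the Shipping service.
def shipping_service(payload):
    print("shipping: preparing shipment", payload)

subscribe("OrderPlaced", payments_service)
subscribe("PaymentAccepted", shipping_service)

orders_service({"order_id": 42})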

Journal ArticleDOI
TL;DR: This work has identified the technological challenges of designing human-in-the-loop CPSs and provided a method that helps designers to identify and specify how the human and the system should work together, focusing on the control strategies and interactions required.
Abstract: Context: Cyber-Physical Systems (CPSs) are gradually and widely introducing autonomous capabilities into everything. However, human participation is required to accomplish tasks that are better performed with humans (often called human-in-the-loop). In this way, human-in-the-loop solutions have the potential to handle complex tasks in unstructured environments, by combining the cognitive skills of humans with autonomous systems behaviors. Objective: The objective of this paper is to provide appropriate techniques and methods to help designers analyze and design human-in-the-loop solutions. These solutions require interactions that engage the human, provide natural and understandable collaboration, and avoid disturbing the human in order to improve human experience. Method: We have analyzed several works that identified different requirements and critical factors that are relevant to the design of human-in-the-loop solutions. Based on these works, we have defined a set of design principles that are used to build our proposal. Fast-prototyping techniques have been applied to simulate the designed human-in-the-loop solutions and validate them. Results: We have identified the technological challenges of designing human-in-the-loop CPSs and have provided a method that helps designers to identify and specify how the human and the system should work together, focusing on the control strategies and interactions required. Conclusions: The use of our approach facilitates the design of human-in-the-loop solutions. Our method is practical at earlier stages of the software life cycle since it allows domain experts to focus on the problem and not on the solution.