
Showing papers in "International Journal of Software Engineering and Knowledge Engineering in 2015"


Journal ArticleDOI
Jihun Park1, Dongwon Seo1, Gwangui Hong1, Donghwan Shin1, Jimin Hwa1, Doo-Hwan Bae1 
TL;DR: Software planning is very important for the success of a software project: even if the same developers work on the same project, the time span of the project and the quality of the software may change.
Abstract: Software planning is very important for the success of a software project. Even if the same developers work on the same project, the time span of the project and the quality of software may change ...

21 citations


Journal ArticleDOI
TL;DR: The dynamic fault tree (DFT) formalism is adopted, and it is shown how cost-effective software rejuvenation schedules can be created to keep the system reliability consistently above a predefined critical level.
Abstract: Correctly measuring the reliability and availability of a cloud-based system is critical for evaluating its performance. Due to the promised high reliability of the physical facilities provided for cloud services, software faults have become one of the major factors in the failures of cloud-based systems. In this paper, we focus on the software aging phenomenon, where system performance may be progressively degraded due to exhaustion of system resources, fragmentation, and accumulation of errors. We use a proactive technique, called software rejuvenation, to counteract the software aging problem. The dynamic fault tree (DFT) formalism is adopted to model the system reliability before and during a software rejuvenation process in an aging cloud-based system. A novel analytical approach is presented to derive the reliability function of a cloud-based Hot SPare (HSP) gate, whose correctness is further verified using Continuous Time Markov Chains (CTMC). We use a case study of a cloud-based system to illustrate the validity of our approach. Based on the reliability analysis results, we show how cost-effective software rejuvenation schedules can be created to keep the system reliability consistently above a predefined critical level.

17 citations
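As a rough illustration of the kind of analysis described above (not the paper's DFT/HSP-gate model), the sketch below solves a three-state continuous-time Markov chain for an aging system, with hypothetical aging, failure and rejuvenation rates, and compares reliability with and without rejuvenation.

```python
# Minimal CTMC sketch of software aging with rejuvenation; all rates are
# hypothetical and the model is far simpler than the paper's DFT/HSP approach.
import numpy as np
from scipy.linalg import expm

def reliability(t, aging_rate=0.05, fail_rate=0.02, rejuv_rate=0.0):
    # States: 0 = robust, 1 = aged/degraded, 2 = failed (absorbing).
    # Rejuvenation returns the system from the degraded state to the robust one.
    Q = np.array([
        [-aging_rate,  aging_rate,                0.0],
        [ rejuv_rate, -(rejuv_rate + fail_rate),  fail_rate],
        [ 0.0,         0.0,                       0.0],
    ])
    p0 = np.array([1.0, 0.0, 0.0])     # start in the robust state
    pt = p0 @ expm(Q * t)              # transient state probabilities at time t
    return 1.0 - pt[2]                 # reliability = P(not failed by t)

# Reliability at t = 100 time units, without and with a rejuvenation schedule.
print(reliability(100, rejuv_rate=0.0), reliability(100, rejuv_rate=0.1))
```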


Journal ArticleDOI
TL;DR: A survey of EFSM-based test case generation techniques from the last two decades is provided, and several possible future research directions are presented.
Abstract: Model-based testing has been intensively and extensively studied in the past decades. The Extended Finite State Machine (EFSM) is a widely used model in software testing, in both academia and industry. This paper provides a survey of EFSM-based test case generation techniques from the last two decades. Techniques for EFSM-based test case generation are classified into three parts: test sequence generation, test data generation, and test oracle construction. The key challenges in EFSM-based test case generation, such as coverage criteria and feasibility analysis, are discussed. Finally, we summarize the research work and present several possible future research directions.

16 citations
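To make the survey's terminology concrete, here is a small hedged sketch of test sequence generation on a toy EFSM of our own (a vending-machine example, not taken from any surveyed paper): a breadth-first search over states and context variables that only follows transitions whose guards are feasible.

```python
# A toy vending-machine EFSM (hypothetical, not from the surveyed papers):
# transitions carry guards and updates over a context variable "coins".
from collections import deque

TRANSITIONS = [
    # (name, source, target, guard(ctx), update(ctx))
    ("insert_coin", "idle",  "ready", lambda c: True,            lambda c: {**c, "coins": c["coins"] + 1}),
    ("select",      "ready", "serve", lambda c: c["coins"] >= 1, lambda c: {**c, "coins": c["coins"] - 1}),
    ("dispense",    "serve", "idle",  lambda c: True,            lambda c: c),
]

def cover_transition(target_name, init_state="idle", init_ctx={"coins": 0}):
    """Breadth-first search for a feasible transition sequence that ends by firing target_name."""
    queue = deque([(init_state, dict(init_ctx), [])])
    seen = set()
    while queue:
        state, ctx, path = queue.popleft()
        key = (state, tuple(sorted(ctx.items())))
        if key in seen:
            continue
        seen.add(key)
        for name, src, dst, guard, update in TRANSITIONS:
            if src == state and guard(ctx):        # feasibility check on the guard
                if name == target_name:
                    return path + [name]           # a feasible test sequence
                queue.append((dst, update(ctx), path + [name]))
    return None                                    # target transition is infeasible

print(cover_transition("select"))   # ['insert_coin', 'select']
```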


Journal ArticleDOI
TL;DR: End-user development (EUD) is drawing increasing attention due to the need of users to frequently extend and personalize their applications.
Abstract: End-user development (EUD) is drawing increasing attention due to the need of users to frequently extend and personalize their applications. In particular, EUD in the context of the Web (EUDWeb...

15 citations


Journal ArticleDOI
TL;DR: Results showed that inspection ability does not depend on educational background or technical knowledge, and that Learning Styles (LSs) can aid software managers in creating high-performance inspection teams and managing software quality.
Abstract: Inspections of software artifacts during early software development aid managers in detecting faults early that may be hard to find and fix later. This paper presents the results from an industrial empirical study, wherein the Learning Styles (LSs, i.e. the ability to perceive and process information) of individual inspectors were manipulated to measure their impact on the fault detection effectiveness of inspection teams. Using inspection data from professional developers, we developed virtual teams with varying LSs of individual inspectors and analyzed the team performance. The results show that inspection ability does not depend on educational background or technical knowledge, and that teams of inspectors with diverse LSs are significantly more effective at detecting faults than teams of inspectors with similar LSs. Therefore, LSs can aid software managers in creating high-performance inspection teams and managing software quality.

13 citations


Journal ArticleDOI
TL;DR: The idea that outliers not only indicate the need for system redesign but explicitly point to problematic design spots is illustrated; this extends the applicability of linear algebra spectral methods to Modularity Matrices, at higher software abstraction levels than previously shown.
Abstract: Modularity Matrices for software systems can be put in block-diagonal form, where blocks are higher-level software modules, in a hierarchy of modules. But the exact module boundaries are often blurred by uncertainty about whether given matrix elements are module members or outliers. This paper provides an algorithm to determine module sizes. As a consequence, the algorithm also decides which matrix elements are outliers. Matrix elements are weighted by their Affinity, an exponential function of the off-diagonality. The module size is given by the positive consecutive elements of the eigenvectors corresponding to the largest eigenvalues of this weighted symmetrized Modularity Matrix. By means of case studies, we illustrate the idea that outliers not only indicate the need for system redesign, but explicitly point to problematic design spots. This work extends the applicability of linear algebra spectral methods to Modularity Matrices, at higher software abstraction levels than previously shown.

12 citations
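A hedged sketch of the spectral step as we read it, assuming a square, block-ordered Modularity Matrix and an assumed exponential affinity decay; the paper's exact weighting and outlier handling are richer than this.

```python
import numpy as np

def first_module_size(M, decay=1.0):
    """Sketch: weight a square, block-ordered Modularity Matrix by an exponential
    affinity of off-diagonality, symmetrize it, and read the first module's size
    from the leading run of positive entries in the top eigenvector."""
    n = M.shape[0]
    i, j = np.indices((n, n))
    W = M * np.exp(-decay * np.abs(i - j))   # affinity-weighted matrix
    S = (W + W.T) / 2.0                      # symmetrized
    vals, vecs = np.linalg.eigh(S)
    v = vecs[:, np.argmax(vals)]             # eigenvector of the largest eigenvalue
    v = v if v[0] >= 0 else -v               # fix the arbitrary sign
    size = 0
    for x in v:                              # leading consecutive positive entries
        if x > 1e-8:
            size += 1
        else:
            break
    return size

# Hypothetical matrix with a strong 2x2 block followed by a weaker one.
M = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.4],
              [0.0, 0.0, 0.4, 0.5]])
print(first_module_size(M))   # 2
```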


Journal ArticleDOI
TL;DR: National happiness has been actively studied throughout the past years and the factors used in this work include both physical needs an...
Abstract: National happiness has been actively studied throughout the past years. The happiness factor varies due to different human perspectives. The factors used in this work include both physical needs an...

9 citations


Journal ArticleDOI
TL;DR: This work proposes a gamification of Scrum, taking advantage of the gamification trend to make its use more engaging, together with an evaluation of the proposed approach in a case study at a software house.
Abstract: Software development is sometimes considered a boring task. To counter this, we propose an approach based on incorporating game mechanics into the Scrum framework, in order to make its use more engaging by taking advantage of the gamification trend. Gamification applies game mechanics to non-game applications and processes to encourage people to adopt them. This work presents a proposal for Scrum gamification together with an evaluation of the proposed approach in a case study at a software house. The use of this concept can help the software industry to increase team productivity in a natural way.

9 citations


Journal ArticleDOI
TL;DR: Fine-grained techniques are still required to support ever-present and complex model comparison tasks during the evolution of design models.
Abstract: Context: Model comparison plays a central role in many software engineering activities. However, a comprehensive understanding about the state-of-the-art is still required. Goal: This paper aims at classifying and performing a thematic analysis of the current literature. Method: For this, we have followed well-established empirical guidelines to define and perform a systematic mapping study. Results: Some studies (14 out of 40) provide generic model comparison techniques, rather than specific ones for UML diagrams. Conclusion: Fine-grained techniques are still required to support ever-present and complex model comparison tasks during the evolution of design models.

9 citations


Journal ArticleDOI
TL;DR: This paper proposes PSTMiner, which considers the nature of data streams and provides an efficient classifier for predicting the class label of real data streams, and proposes a compact novel tree structure called PSTree (Prefix Streaming Tree) for storing data.
Abstract: Data stream associative classification poses many challenges to the data mining community. In this paper, we address four major challenges, namely infinite length, extraction of knowledge with a single scan, processing time, and accuracy. Since data streams are infinite in length, it is impractical to store and use all the historical data for training. Mining such streaming data for knowledge acquisition is both a unique opportunity and a tough task. A streaming algorithm must scan the data once and extract knowledge. When mining data streams, processing time and accuracy are two important aspects. In this paper, we propose PSTMiner, which considers the nature of data streams and provides an efficient classifier for predicting the class label of real data streams. It has greater potential when compared with many existing classification techniques. Additionally, we propose a compact novel tree structure called PSTree (Prefix Streaming Tree) for storing data. Extensive experiments conducted on 24 real datasets from the UCI repository and synthetic datasets from MOA (Massive Online Analysis) show that PSTMiner is consistent. Empirical results show that the performance of PSTMiner is highly competitive in terms of accuracy and processing time when compared with other approaches under the windowed streaming model.

8 citations


Journal ArticleDOI
TL;DR: Results demonstrate that ReliefF (RF) is the most stable feature selection method and that wrapper-based feature subset selection shows the least stability; as the overlap of partitions increased, the stability of the feature selection strategies increased.
Abstract: Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in current under-development code. This has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is the problem of high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction results, so selecting the subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature ranking, which uses statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one which maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of the feature selection methods. We evaluate the stability of feature selection methods on pairs of subsamples generated by our fixed-overlap partitions algorithm. Four different levels of overlap are considered in this study. Thirteen software metric datasets from two real-world software projects are used. Results demonstrate that ReliefF (RF) is the most stable feature selection method and wrapper-based feature subset selection shows the least stability. In addition, as the overlap of partitions increased, the stability of the feature selection strategies increased.
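The Tanimoto index between two feature subsets is the standard set-overlap measure |A ∩ B| / |A ∪ B|; the sketch below averages it over all pairs of selected subsets, as a hedged reading of APTI (the paper's exact definition may differ).

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) index between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def average_pairwise_tanimoto(subsets):
    """Average Tanimoto index over all pairs of selected subsets -- a sketch of
    the stability measure; the paper's exact APTI definition may differ."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Feature subsets selected from overlapping subsamples (hypothetical metrics).
subs = [{"loc", "cbo", "wmc"}, {"loc", "cbo", "rfc"}, {"loc", "wmc", "rfc"}]
print(average_pairwise_tanimoto(subs))
```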

Journal ArticleDOI
TL;DR: The proposed extension of the EVM technique, which integrates historical cost performance data of processes to improve the project's cost predictability, was more accurate and more precise than the traditional technique for calculating the Cost Performance Index (CPI) and Estimate at Completion (EAC).
Abstract: Although the Earned Value Management (EVM) technique has been used by companies in various industrial sectors (software development, construction, aerospace, and aeronautics, among others) for over 35 years to predict time and cost outcomes, many studies have found vulnerabilities, including: (i) cost performance data do not always have a normal distribution, which makes reliable projections difficult; (ii) cost performance indexes are unstable during the execution of projects; and (iii) cost performance indexes tend to worsen as the project approaches termination. This paper proposes an extension of the EVM technique that integrates historical cost performance data of processes as a means to improve the project's cost predictability. The proposed technique was evaluated through an empirical study of its implementation in 22 software development projects. It was applied in real projects with the aim of evaluating its accuracy and variation compared to the traditional technique. Hypothesis tests with a 95% confidence level were performed, and the proposed technique was more accurate and more precise than the traditional technique for calculating the Cost Performance Index (CPI) and Estimate at Completion (EAC).
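For reference, the traditional EVM quantities the paper extends are computed as below; the project figures are hypothetical, and the paper's contribution is to replace the index used in the estimate with one derived from historical process performance data.

```python
def cpi(earned_value, actual_cost):
    """Cost Performance Index of the traditional EVM technique."""
    return earned_value / actual_cost

def eac(budget_at_completion, earned_value, actual_cost):
    """Estimate at Completion assuming the current CPI persists (traditional EVM);
    the paper's extension derives the index from historical process data instead."""
    return budget_at_completion / cpi(earned_value, actual_cost)

# Hypothetical project: 40% of a 100k budget earned at an actual cost of 50k.
print(cpi(40_000, 50_000))             # 0.8
print(eac(100_000, 40_000, 50_000))    # 125000.0
```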

Journal ArticleDOI
TL;DR: WDCG-CA makes full use of the structural and quantitative information of the WDCG, avoids wrong compositions and arbitrary partitions when reconstructing software architecture, and outperforms the comparative approaches in most cases in terms of the four metrics.
Abstract: Software architecture reconstruction plays an important role in software reuse, evolution and maintenance. Clustering is a promising technique for software architecture reconstruction. However, the representation of software, which serves as the clustering input, and the clustering algorithm need to be improved for real applications. The representation should contain appropriate and adequate information about the software. Furthermore, the clustering algorithm should be well adapted to the particular demands of software architecture reconstruction. In this paper, we first extract a Weighted Directed Class Graph (WDCG) to represent object-oriented software. The WDCG is a structural and quantitative representation of software, which contains not only the static information of the software source code but also the dynamic information of software execution. Then we propose a WDCG-based Clustering Algorithm (WDCG-CA) to reconstruct high-level software architecture. WDCG-CA makes full use of the structural and quantitative information of the WDCG, and successfully avoids wrong compositions and arbitrary partitions in the process of reconstructing software architecture. We introduce four metrics to evaluate the performance of WDCG-CA. The results of the comparative experiments show that WDCG-CA outperforms the comparative approaches in most cases in terms of the four metrics.

Journal ArticleDOI
TL;DR: The Method for the Assessment of eXperience (MAX) uses cards and a board to assist software engineers in gathering UX data while motivating users to report their experience; pilot studies showed that the method is useful for evaluating the UX of finished/prototyped applications from the point of view of users and software engineers.
Abstract: User Experience (UX) is an important attribute for the success and quality of a software application. UX explores how an application is used and the emotional and behavioral consequences of such use. Although several UX evaluation methods allow understanding the reasons for a poor UX, some of them are tedious or too intrusive, making the evaluation unpleasant. This paper presents the Method for the Assessment of eXperience (MAX), which through cards and a board assists software engineers in gathering UX data while motivating users to report their experience. We conducted two pilot studies to verify the feasibility of MAX, which showed that the method is useful for evaluating the UX of finished/prototyped applications from the point of view of users and software engineers.

Journal ArticleDOI
TL;DR: Three data preprocessing approaches, in which feature selection is combined with data sampling, are investigated to overcome high dimensionality and class imbalance in the context of software quality estimation.
Abstract: Defect prediction is an important process activity frequently used for improving the quality and reliability of software products. Defect prediction results provide a list of fault-prone modules which are necessary in helping project managers better utilize valuable project resources. In the software quality modeling process, high dimensionality and class imbalance are the two potential problems that may exist in data repositories. In this study, we investigate three data preprocessing approaches, in which feature selection is combined with data sampling, to overcome these problems in the context of software quality estimation. These three approaches are: Approach 1 — sampling performed prior to feature selection, but retaining the unsampled data instances; Approach 2 — sampling performed prior to feature selection, retaining the sampled data instances; and Approach 3 — sampling performed after feature selection. A comparative investigation is presented for evaluating the three approaches. In the experiments, we employed three sampling methods (random undersampling, random oversampling, and synthetic minority oversampling), each combined with a filter-based feature subset selection technique called correlation-based feature selection. We built the defect prediction models using five common classification algorithms. The case study was based on software metrics and defect data collected from multiple releases of a real-world software system. The results demonstrated that the type of sampling methods used in data preprocessing significantly affected the performance of the combination approaches. It was found that when the random undersampling technique was used, Approach 1 performed better than the other two approaches. However, when the feature selection technique was used in conjunction with an oversampling method (random oversampling or synthetic minority oversampling), we strongly recommended Approach 3.
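A minimal sketch of "Approach 3" (feature selection first, sampling afterwards) on synthetic data. Correlation-based feature selection is not available in scikit-learn, so a univariate ANOVA filter stands in for it, and random undersampling stands in for the paper's three sampling methods; the dataset, subset size, and classifier are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from imblearn.under_sampling import RandomUnderSampler

# Imbalanced synthetic "software metrics" dataset (hypothetical).
X, y = make_classification(n_samples=500, n_features=40, weights=[0.9, 0.1], random_state=0)

selector = SelectKBest(f_classif, k=8).fit(X, y)    # 1. select metrics on the unsampled data
X_sel = selector.transform(X)

X_bal, y_bal = RandomUnderSampler(random_state=0).fit_resample(X_sel, y)   # 2. balance afterwards

model = GaussianNB().fit(X_bal, y_bal)              # 3. train the defect predictor
print(model.score(X_sel, y))                        # naive resubstitution check, for illustration only
```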

Journal ArticleDOI
TL;DR: This paper investigated thirty wrapper-based feature selection methods to remove irrelevant and redundant software metrics used for building defect predictors and demonstrated that Best Arithmetic Mean is the best performance metric used within the wrapper.
Abstract: The basic measurements for software quality control and management are the various project and software metrics collected at various stages of the software development life cycle. The software metrics may not all be relevant for predicting the fault proneness of software components, modules, or releases, thus creating the need for feature (software metric) selection. The goal of feature selection is to find a minimum subset of attributes that can characterize the underlying data with results as good as, or even better than, the original data when all available features are considered. As an example of inter-disciplinary research (between data science and software engineering), this study is unique in presenting a large comparative study of wrapper-based feature (or attribute) selection techniques for building defect predictors. In this paper, we investigated thirty wrapper-based feature selection methods for removing irrelevant and redundant software metrics used for building defect predictors. These thirty wrappers vary based on the choice of search method (Best First or Greedy Stepwise), learner (Naive Bayes, Support Vector Machine, and Logistic Regression), and performance metric (Overall Accuracy, Area Under the ROC (Receiver Operating Characteristic) Curve, Area Under the Precision-Recall Curve, Best Geometric Mean, and Best Arithmetic Mean) used in the defect prediction model evaluation process. The models are trained using the three learners and evaluated using the five performance metrics. The case study is based on software metrics and defect data collected from a real-world software project. The results demonstrate that Best Arithmetic Mean is the best performance metric to use within the wrapper. Naive Bayes performed significantly better than Logistic Regression and Support Vector Machine as a wrapper learner on slightly and less imbalanced datasets. We also recommend Greedy Stepwise as a search method for wrappers. Moreover, compared to models built with full datasets, the performance of defect prediction models can be improved when metric subsets are selected through a wrapper subset selector.
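A hedged analogue of one of the thirty wrappers, built with scikit-learn rather than the paper's tooling: greedy forward selection stands in for Greedy Stepwise, Naive Bayes is the learner, and AUC is the performance metric; the data and subset size are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import GaussianNB

# Imbalanced synthetic defect data (hypothetical).
X, y = make_classification(n_samples=400, n_features=30, weights=[0.85, 0.15], random_state=1)

wrapper = SequentialFeatureSelector(
    GaussianNB(),                 # learner evaluated inside the wrapper
    n_features_to_select=6,       # hypothetical subset size
    direction="forward",          # greedy forward search (stands in for Greedy Stepwise)
    scoring="roc_auc",            # performance metric guiding the search
    cv=5,
)
wrapper.fit(X, y)
print(wrapper.get_support(indices=True))   # indices of the selected software metrics
```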

Journal ArticleDOI
TL;DR: The experimental results demonstrate that feature selection is important and needed prior to the learning process, and that the ensemble feature ranking method generally performs better than or similarly to the average of the base ranking techniques; more importantly, the ensemble method exhibits better robustness than most base ranking techniques.
Abstract: Defect prediction is very challenging in software development practice. Classification models are useful tools that can help with such prediction. Classification models can classify program modules into quality-based classes, e.g. fault-prone (fp) or not-fault-prone (nfp). This facilitates the allocation of limited project resources. For example, more resources are assigned to program modules that are of poor quality or likely to have a high number of faults based on the classification. However, two main problems, high dimensionality and class imbalance, affect the quality of training datasets and therefore of classification models. Feature selection and data sampling are often used to overcome these problems. Feature selection is a process of choosing the most important attributes from the original dataset. Data sampling alters the dataset to change its balance level. Another technique, called boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), has also been found to be effective for resolving the class imbalance problem. In this study, we investigate an approach for combining feature selection with this ensemble learning (boosting) process. We focus on two different scenarios: feature selection performed prior to the boosting process and feature selection performed inside the boosting process. Ten individual base feature ranking techniques, as well as an ensemble ranker based on the ten, are examined and compared over the two scenarios. We also employ the boosting algorithm to construct classification models without performing feature selection and use the results as the baseline for further comparison. The experimental results demonstrate that feature selection is important and needed prior to the learning process. In addition, the ensemble feature ranking method generally performs better than or similarly to the average of the base ranking techniques, and more importantly, the ensemble method exhibits better robustness than most base ranking techniques. As for the two scenarios, the results show that applying feature selection inside boosting performs better than using feature selection prior to boosting.
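As a small illustration of an ensemble feature ranker (not the paper's ten base rankers), the sketch below combines three common filter scores by mean rank; the data and the choice of base rankers are assumptions.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, f_classif, mutual_info_classif

X, y = make_classification(n_samples=400, n_features=20, random_state=2)
X = X - X.min(axis=0)                       # chi2 requires non-negative features

# Three base rankers (hypothetical stand-ins), combined into an ensemble ranking.
scores = [f_classif(X, y)[0], chi2(X, y)[0], mutual_info_classif(X, y, random_state=2)]
ranks = np.mean([rankdata(-s) for s in scores], axis=0)   # lower mean rank = more relevant
print(np.argsort(ranks)[:5])                # top 5 metrics by the ensemble ranking
```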

Journal ArticleDOI
TL;DR: A neural network approach is proposed to first investigate the relationship between available system resources and system workload and then to forecast future available system resources under the real-world situation where the workload changes dynamically over time.
Abstract: Software aging refers to the phenomenon that software systems show progressive performance degradation or a sudden crash after longtime execution. It has been reported that this phenomenon is closely related to the exhaustion of system resources. This paper quantitatively studies available system resources under the real-world situation where workload changes dynamically over time. We propose a neural network approach to first investigate the relationship between available system resources and system workload and then to forecast future available system resources. Experimental results on data sets collected from real-world computer systems demonstrate that the proposed approach is effective.
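A hedged sketch of the forecasting idea on synthetic data: lagged workload and free-memory observations feed a small feed-forward network that predicts the next available-memory value. The series, window size, and network shape are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.arange(2000)
workload = 50 + 10 * np.sin(t / 50) + rng.normal(0, 2, t.size)          # synthetic workload
free_mem = 4000 - 0.5 * t - 5 * workload + rng.normal(0, 20, t.size)    # slow aging trend

WINDOW = 10
X, y = [], []
for i in range(len(t) - WINDOW):
    # Features: the last WINDOW workload and free-memory observations.
    X.append(np.concatenate([workload[i:i + WINDOW], free_mem[i:i + WINDOW]]))
    y.append(free_mem[i + WINDOW])          # target: the next available-memory value
X, y = np.array(X), np.array(y)

split = int(0.8 * len(X))                   # train on the past, test on the most recent part
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X[:split], y[:split])
print("held-out R^2:", model.score(X[split:], y[split:]))
```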

Journal ArticleDOI
TL;DR: A feature-level CIA approach using Formal Concept Analysis (FCA) applied to SPL evolution is proposed and the effectiveness of the technique is shown in terms of the most commonly used metrics on the subject.
Abstract: Software Product Line Engineering (SPLE) is a systematic reuse approach for developing quality products with a short time to market; the resulting set of products is called a Software Product Line (SPL). Usually, an SPL is not developed from scratch; it is developed by reusing features (and their implementing source code elements) of existing similar systems previously developed with ad-hoc reuse techniques. The reused feature implementations may be changed when developing new products (the SPL) using SPLE. Any code element can be a part of (shared by) different feature implementations; modifying one feature's implementation can thus impact others. Therefore, feature-level Change Impact Analysis (CIA) is important for predicting affected features for change management purposes. In this paper, we propose a feature-level CIA approach using Formal Concept Analysis (FCA) applied to SPL evolution. In an experimental evaluation using three case studies of different domains and sizes, we show the effectiveness of our technique in terms of the most commonly used metrics on the subject.
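A hedged sketch of the feature-level impact query that underlies such an approach: the paper organizes the feature-to-code-element relation with Formal Concept Analysis, whereas this snippet only shows the basic "which features share the changed code elements" step on a hypothetical mapping.

```python
from collections import defaultdict

# Hypothetical mapping from features to the source-code elements implementing them.
FEATURE_TO_ELEMENTS = {
    "Payment":   {"Cart.checkout", "Invoice.total", "Tax.rate"},
    "Discounts": {"Invoice.total", "Coupon.apply"},
    "Reporting": {"Invoice.total", "Report.render"},
}

def impacted_features(changed_elements):
    """Return the features whose implementation shares any changed code element."""
    index = defaultdict(set)
    for feature, elements in FEATURE_TO_ELEMENTS.items():
        for element in elements:
            index[element].add(feature)
    impacted = set()
    for element in changed_elements:
        impacted |= index.get(element, set())
    return impacted

print(impacted_features({"Invoice.total"}))   # {'Payment', 'Discounts', 'Reporting'}
```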

Journal ArticleDOI
TL;DR: CoCoMEP, a platform for supporting collaboration in empirical research on software evolution through shared knowledge, is presented, along with lessons learned from applying the platform in a large research programme.
Abstract: Methods for supporting evolution of software-intensive systems are a competitive edge in software engineering as software is often operated over decades. Empirical research is useful to validate the effectiveness of these methods. However, empirical studies on software evolution are rarely comprehensive and hardly replicable. Collaboration may prevent these shortcomings. We designed CoCoMEP — a platform for supporting collaboration in empirical research on software evolution by shared knowledge. We report lessons learned from the application of the platform in a large research programme.

Journal ArticleDOI
TL;DR: This paper presents three reusable solutions at detailed design and programming level in order to effectively implement the Abort Operation, Progress Feedback and Preferences usability functionalities in web applications.
Abstract: Usability is a software system quality attribute. Although software engineers originally considered usability to be related exclusively to the user interface, it was later found to affect the core functionality of software applications. Since then, proposals for addressing usability at different stages of the software development cycle have been researched. The objective of this paper is to present three reusable solutions, at the detailed design and programming level, for effectively implementing the Abort Operation, Progress Feedback and Preferences usability functionalities in web applications. To do this, an inductive research method was applied. We developed three web applications including the above usability functionalities as case studies. We looked for commonalities across the implementations in order to induce a general solution. The elements common to all three developed applications include application scenarios, functionalities, responsibilities, classes, methods, attributes and code snippets. The findings were specified as an implementation-oriented design pattern and as programming patterns in three languages. Additional case studies were conducted to validate the proposed solution, in which independent developers used the patterns to implement different applications. As a result, we found that solutions specified as patterns can be reused to develop web applications.

Journal ArticleDOI
TL;DR: This paper uses the self-join operation in a relational database (RDB) to realize Web service clustering; the semantic reasoning relationship between concepts and the concept status path are used in the similarity calculation, which improves its accuracy.
Abstract: In the era of service-oriented software engineering (SOSE), service clustering is used to organize Web services, and it can help to enhance the efficiency and accuracy of service discovery. In order to improve the efficiency and accuracy of service clustering, this paper uses the self-join operation in a relational database (RDB) to realize Web service clustering. After storing the service information, it performs the self-join operation on the Input, Output, Precondition, Effect (IOPE) tables of the Web services, which enhances the efficiency of computing service similarity. The semantic reasoning relationship between concepts and the concept status path are used in the calculation, which improves its accuracy. Finally, we use experiments to validate the effectiveness of the proposed methods.
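A hedged sketch of the idea: store each service's output concepts in a table and use a self-join to pair services that share concepts. The schema and the similarity computation are assumptions; the paper joins full IOPE tables and weights matches by semantic reasoning rather than exact concept equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service_output (service TEXT, concept TEXT);
INSERT INTO service_output VALUES
  ('BookFlight',  'Ticket'), ('BookFlight',  'Price'),
  ('BookTrain',   'Ticket'), ('BookTrain',   'Price'),
  ('WeatherInfo', 'Forecast');
""")

-- = hypothetical data above; below, the self-join pairs distinct services by shared concepts.
rows = conn.execute("""
SELECT a.service, b.service, COUNT(*) AS shared_concepts
FROM service_output a
JOIN service_output b
  ON a.concept = b.concept AND a.service < b.service
GROUP BY a.service, b.service
ORDER BY shared_concepts DESC;
""").fetchall()

for left, right, shared in rows:
    print(left, right, shared)    # e.g. BookFlight BookTrain 2
```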

Journal ArticleDOI
TL;DR: A bibliometric-based approach is presented and implemented to quantitatively review the progress in global cloud computing research with the related literature during 2007–2013 from the databases of Science Citation Index Expanded (SCI-E), Conference Proceedings Citation Index–Science (CPCI-S), and IEEEXplore.
Abstract: Cloud computing has become a mainstream solution for the processing and storage of mass data, as well as an exciting area for research. As a novel business model, cloud computing has dramatically changed the provision of services and IT capacity by means of advanced techniques. In recent years, with increasing research interest and the rapid growth of publications, some review papers have provided detailed analyses of the cloud computing area. In this paper, a bibliometric-based approach is presented and implemented to quantitatively review the progress of global cloud computing research, using the related literature from 2007–2013 in the Science Citation Index Expanded (SCI-E), Conference Proceedings Citation Index–Science (CPCI-S), and IEEE Xplore databases. Our work is motivated by the purpose of tracing global advancement in terms of research content, geographic distribution and issue time of the related publications, rather than a specific technological area in cloud computing research. By investigating the characteristics of publications such as keywords, output, geographic distribution and affiliation, we draw some valuable conclusions to guide further research. The experimental results show that the top 5 active research points of cloud computing concentrate on virtualization, security, mobile cloud, distributed computing, and scheduling. From the location-time perspective, China, the USA, and India have published most of the papers, dominate cloud computing research, and maintain a high level of international research cooperation, and there has been a large increase in publication output, especially in China and the USA. Meanwhile, the analysis results show that the top 3 highly cited research institutes in cloud computing research are the University of Melbourne, the University of California, Berkeley, and the University of Vienna. The mobile cloud will be a future research hotspot and a promising application field.

Journal ArticleDOI
TL;DR: This paper proposes a reliable and secure distributed cloud data storage schema using Reed-Solomon codes that relies on multiple cloud service providers (CSP), and protects users’ cloud data from the client side, and demonstrates the feasibility of the approach.
Abstract: Despite the popularity and many advantages of using cloud data storage, there are still major concerns about the data stored in the cloud, such as security, reliability and confidentiality. In this paper, we propose a reliable and secure distributed cloud data storage schema using Reed-Solomon codes. Different from existing approaches to achieving data reliability with redundancy at the server side, our proposed mechanism relies on multiple cloud service providers (CSP), and protects users’ cloud data from the client side. In our approach, we view multiple cloud-based storage services as virtual independent disks for storing redundant data encoded with erasure codes. Since each CSP has no access to a user’s complete data, the data stored in the cloud would not be easily compromised. Furthermore, the failure or disconnection of a CSP will not result in the loss of a user’s data as the missing data pieces can be readily recovered. To demonstrate the feasibility of our approach, we developed a prototype distributed cloud data storage application using three major CSPs. The experimental results show that, besides the reliability and security related benefits of our approach, the application outperforms each individual CSP for uploading and downloading files.
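A simplified stand-in for the scheme described above (a single XOR parity stripe instead of Reed-Solomon, so only one lost provider can be tolerated): stripe the data across k hypothetical providers plus one parity provider, and rebuild a missing stripe from the survivors.

```python
# Single-parity striping as a stand-in for Reed-Solomon erasure coding;
# provider count and data are hypothetical, for illustration only.
from functools import reduce

def split_with_parity(data: bytes, k: int = 3):
    """Return k data stripes plus one XOR parity stripe (all equal length)."""
    pad = (-len(data)) % k
    data = data + b"\0" * pad
    stripes = [data[i::k] for i in range(k)]                 # round-robin striping
    parity = bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*stripes))
    return stripes + [parity]

def recover(stripes, missing_index):
    """Rebuild the stripe lost with one provider by XOR-ing the surviving ones."""
    survivors = [s for i, s in enumerate(stripes) if i != missing_index]
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*survivors))

pieces = split_with_parity(b"confidential report", k=3)      # 3 CSPs + 1 parity CSP
lost = 1                                                      # e.g. the second CSP goes offline
rebuilt = recover(pieces, lost)
assert rebuilt == pieces[lost]                                # the missing piece is recovered
```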

Journal ArticleDOI
TL;DR: Experiments demonstrate that the CFG-Match approach outperforms the comparative approaches in the detection of Java program plagiarism and is more accurate and robust against semantics-preserving transformations.
Abstract: Measuring program similarity plays an important role in solving many problems in software engineering. However, because programs are instruction sequences with complex structures and semantic functions, and may furthermore be deliberately obfuscated through semantics-preserving transformations, measuring program similarity is a difficult task that has not been adequately addressed. In this paper, we propose a new approach to measuring Java program similarity. The approach first measures the low-level similarity between basic blocks according to the bytecode instruction sequences and the structural properties of the basic blocks. Then, an error-tolerant graph matching algorithm that can combat structure transformations is used to match the Control Flow Graphs (CFG) based on the basic block similarity. The high-level similarity between Java programs is subsequently calculated on the matched pairs of independent paths extracted from the optimal CFG matching. The proposed CFG-Match approach is compared with a string-based approach, a tree-based approach and a graph-based approach. Experimental results show that the CFG-Match approach is more accurate and robust against semantics-preserving transformations. The CFG-Match approach is used to detect Java program plagiarism. Experiments on a collection of benchmark program pairs drawn from students' submissions of project assignments demonstrate that the CFG-Match approach outperforms the comparative approaches in the detection of Java program plagiarism.
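A hedged sketch of the lowest-level step only: similarity between two basic blocks computed from their bytecode opcode sequences (the opcode lists are hypothetical). The paper additionally weighs structural properties of the blocks and performs error-tolerant CFG matching on top of such scores.

```python
from difflib import SequenceMatcher

def block_similarity(block_a, block_b):
    """Similarity in [0, 1] between two instruction (opcode) sequences."""
    return SequenceMatcher(None, block_a, block_b).ratio()

# Hypothetical original and lightly obfuscated basic blocks.
original   = ["iload_1", "iload_2", "iadd", "istore_3", "return"]
obfuscated = ["iload_2", "iload_1", "iadd", "nop", "istore_3", "return"]
print(block_similarity(original, obfuscated))   # ~0.73 despite reordering and the inserted nop
```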

Journal ArticleDOI
TL;DR: This paper presents an architecture-based model of software reliability and performance that explicitly considers a two-stage fault recovery mechanism implementing component restarts and application-level retries and suggests that the model can be used to quantify the impact of software fault recovery and correlated component failures on application reliability andperformance.
Abstract: High reliability and performance are essential attributes of software systems designed for critical real-time applications. To improve the reliability and performance of software, many systems incorporate some form of fault recovery mechanism. However, contemporary models of software reliability and performance rarely consider these fault recovery mechanisms. Another notable shortcoming of many software models is that they make the simplifying assumption that component failures are statistically independent, which disagrees with several experimental studies that have shown that the failures of software components can exhibit correlation. This paper presents an architecture-based model of software reliability and performance that explicitly considers a two-stage fault recovery mechanism implementing component restarts and application-level retries. The application architecture is characterized by a Discrete Time Markov Chain (DTMC) to represent the dynamic branching behavior of control between the components of the application. Correlations between the component failures are computed with an efficient numerical algorithm for a multivariate Bernoulli (MVB) distribution. We illustrate the utility of the model through a case study of an embedded software application. The results suggest that the model can be used to quantify the impact of software fault recovery and correlated component failures on application reliability and performance.
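As a baseline for what the paper improves on, here is a hedged Cheung-style sketch of architecture-based reliability with a DTMC and independent component failures; the transition probabilities and component reliabilities are hypothetical, and the paper's model adds correlated failures and the two-stage recovery mechanism on top of such a base.

```python
import numpy as np

P = np.array([          # DTMC of control transfer between 3 components (hypothetical)
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],    # component 3 terminates the application
])
R = np.array([0.999, 0.995, 0.990])   # per-component reliabilities (independent failures)

Q_hat = R[:, None] * P                # a transfer succeeds only if the component does
S = np.linalg.inv(np.eye(3) - Q_hat)  # sums path products over all execution paths
app_reliability = S[0, 2] * R[2]      # reach component 3 from component 1, then it succeeds
print(app_reliability)
```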

Journal ArticleDOI
TL;DR: Featherweight Visual Scenarios (FVS), a declarative and graphical language based on scenarios, is introduced as a possible alternative for specifying behavioral properties; FVS specifications are shown to be better suited for validation tasks.
Abstract: Property specification is still one of the most challenging tasks for the transference of software verification technology. The use of patterns has been proposed in order to hide the complicated handling of formal languages from the developer. However, this goal is not entirely satisfied: when validating the desired property, the developer may have to deal with the pattern representation in some particular formalism. For this reason, we identify four desirable quality attributes for the underlying specification language: succinctness, comparability, complementariness, and modifiability. We show that typical formalisms such as temporal logics or automata fail to some extent to support these features. Given this context, we introduce Featherweight Visual Scenarios (FVS), a declarative and graphical language based on scenarios, as a possible alternative for specifying behavioral properties. We illustrate the applicability of FVS by modeling all the specification patterns, and we thoroughly compare FVS to other known approaches, showing that FVS specifications are better suited for validation tasks. In addition, we augment pattern specification by introducing the concept of violating behavior. Finally, we characterize the type of properties that can be written in FVS and we formally introduce its syntax and semantics.

Journal ArticleDOI
TL;DR: The translation rules that translate the multi-agent model to an executable PROMELA model are formally defined, and the translation with an example is demonstrated.
Abstract: This paper presents a methodology for analyzing multi-agent systems modeled in nested predicate transition nets. The objective is to automate the model analysis for complex systems, and provide a foundation for tool development. We formally define the translation rules that translate the multi-agent model to an executable PROMELA model, and demonstrate the translation with an example.

Journal ArticleDOI
TL;DR: The main contributions of this paper are the formalization of an IPR model that shortens the activities needed for IPR resolution and avoids the assignment of conflicting rights/permissions during IPR model formalization, and thus during licensing.
Abstract: Multimedia services of cultural institutions need to be supported by content, metadata and workflow management systems to efficiently manage huge amounts of content items and metadata production. Online digital libraries and cultural heritage institutions, as well as publishers' portals, need an integrated multimedia back office in order to aggregate content collections and provide them to national and international aggregators while respecting Intellectual Property Rights (IPR). The aim of this paper is to formalize and discuss the requirements, modeling, design and validation of an institutional aggregator for metadata and content, coping with IPR models for conditional access and providing content to Europeana, the European international aggregator. This paper presents the identification of the Content Aggregator requirements for content management and IPR, and thus the definition and realization of a corresponding distributed architecture and workflow solution satisfying them. The main contributions of this paper are the formalization of an IPR model that shortens the activities needed for IPR resolution and avoids the assignment of conflicting rights/permissions during IPR model formalization, and thus during licensing. The proposed solution, models and tools have been validated in the case of the ECLAP service, and the results are reported in the paper. The ECLAP Content Aggregator has been established by the European Commission to serve Europeana for the thematic area of performing arts institutions.

Journal ArticleDOI
TL;DR: To the best of our knowledge, this is the first work to learn specifications from object-oriented programs dynamically based on probabilistic models; it learns specifications in an online mode, which can refine existing models continuously.
Abstract: Class temporal specifications are an important kind of program specification, especially for object-oriented programs; they specify that the interface methods of a class should be called in a particular sequence. Currently, most existing approaches mine this kind of specification based on finite state automata. However, finite state automata are deterministic models that cannot tolerate noise. In this paper, we propose to mine class temporal specifications relying on a probabilistic model that extends the Markov chain. To the best of our knowledge, this is the first work to learn specifications from object-oriented programs dynamically based on probabilistic models. Unlike similar works, our technique does not require annotating programs. Additionally, it learns specifications in an online mode, which can refine existing models continuously. We also discuss problems regarding noise and the connectivity of mined models, and propose a strategy for computing thresholds to resolve them. To investigate the technique's feasibility and effectiveness, we implemented it in a prototype tool, ISpecMiner, and used the tool to conduct several experiments. Results of the experiments show that our technique can deal with noise effectively and that useful specifications can be learned. Furthermore, our method of computing thresholds provides a strong assurance that mined models are connected.
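A minimal sketch of the underlying idea, not ISpecMiner itself: learn a Markov-chain-like model of method-call order from execution traces in an online fashion, and treat low-probability transitions as candidates for noise pruning. The traces and the pruning comment are hypothetical; the tool's actual model and threshold strategy are more elaborate.

```python
from collections import defaultdict

class CallSequenceModel:
    """Online Markov-chain-style model of method-call order for one class."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, trace):
        """Update transition counts from one observed method-call trace (online mode)."""
        for a, b in zip(trace, trace[1:]):
            self.counts[a][b] += 1

    def probability(self, a, b):
        total = sum(self.counts[a].values())
        return self.counts[a][b] / total if total else 0.0

model = CallSequenceModel()
for trace in (["open", "read", "close"], ["open", "read", "read", "close"],
              ["open", "close"], ["read", "close"]):      # the last trace contains noise
    model.observe(trace)

print(model.probability("open", "read"))    # ~0.67: likely part of the specification
print(model.probability("open", "close"))   # ~0.33: pruned if below the chosen threshold
```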