
Showing papers in "IEEE Transactions on Services Computing in 2016"


Journal ArticleDOI
TL;DR: A novel adaptive filtering technique to determine the best way to combine direct trust and indirect trust dynamically to minimize convergence time and trust estimation bias in the presence of malicious nodes performing opportunistic service and collusion attacks is developed.
Abstract: A future Internet of Things (IoT) system will connect the physical world into cyberspace everywhere and everything via billions of smart objects. On the one hand, IoT devices are physically connected via communication networks. The service oriented architecture (SOA) can provide interoperability among heterogeneous IoT devices in physical networks. On the other hand, IoT devices are virtually connected via social networks. In this paper we propose adaptive and scalable trust management to support service composition applications in SOA-based IoT systems. We develop a technique based on distributed collaborative filtering to select feedback using similarity rating of friendship, social contact, and community of interest relationships as the filter. Further we develop a novel adaptive filtering technique to determine the best way to combine direct trust and indirect trust dynamically to minimize convergence time and trust estimation bias in the presence of malicious nodes performing opportunistic service and collusion attacks. For scalability, we consider a design by which a capacity-limited node only keeps trust information of a subset of nodes of interest and performs minimum computation to update trust. We demonstrate the effectiveness of our proposed trust management through service composition application scenarios with a comparative performance analysis against EigenTrust and PeerTrust.
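
At the heart of the adaptive filtering step is a weighted combination of first-hand (direct) trust and similarity-filtered recommendations (indirect trust), with the weight steered toward whichever component has recently predicted actual service outcomes better. The sketch below illustrates that idea in Python; the Jaccard-style similarity over friendship, social-contact, and community-of-interest sets, the filter threshold, and the step-wise weight adaptation are illustrative assumptions rather than the paper's exact protocol.

```python
# Hedged sketch: similarity-filtered indirect trust plus adaptive weighting of
# direct vs. indirect trust. Parameter names and the adaptation rule are
# illustrative assumptions, not the exact scheme of the paper.

def similarity(profile_a, profile_b):
    """Jaccard similarity over friendship / social-contact / community-of-interest sets."""
    inter = sum(len(profile_a[k] & profile_b[k]) for k in profile_a)
    union = sum(len(profile_a[k] | profile_b[k]) for k in profile_a)
    return inter / union if union else 0.0

def indirect_trust(reports, my_profile, profiles, min_sim=0.3):
    """Similarity-weighted average of recommenders' trust reports (collaborative filtering)."""
    weighted = total = 0.0
    for recommender, reported in reports.items():
        w = similarity(my_profile, profiles[recommender])
        if w >= min_sim:                      # drop dissimilar (possibly colluding) recommenders
            weighted += w * reported
            total += w
    return weighted / total if total else None

def combine_trust(direct, indirect, observed, alpha, step=0.05):
    """Blend direct and indirect trust; adapt alpha toward the better predictor.

    observed is the latest first-hand service experience in [0, 1], used as
    evidence for deciding which component to favour next time.
    """
    if indirect is None:
        return direct, alpha
    if abs(direct - observed) <= abs(indirect - observed):
        alpha = min(1.0, alpha + step)        # direct trust tracked the outcome better
    else:
        alpha = max(0.0, alpha - step)        # recommendations were closer to reality
    return alpha * direct + (1.0 - alpha) * indirect, alpha
```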

362 citations


Journal ArticleDOI
TL;DR: This paper has demonstrated that CCAF multi-layered security can protect data in real-time and it has three layers of security: 1) firewall and access control; 2) identity management and intrusion prevention and 3) convergent encryption.
Abstract: Offering real-time data security for petabytes of data is important for cloud computing. A recent survey on cloud security states that the security of users' data has the highest priority as well as concern. We believe this can only be achieved with an approach that is systematic, adoptable and well-structured. Therefore, this paper has developed a framework known as Cloud Computing Adoption Framework (CCAF) which has been customized for securing cloud data. This paper explains the overview, rationale and components in the CCAF to protect data security. CCAF is illustrated by the system design based on the requirements and the implementation demonstrated by the CCAF multi-layered security. Since our Data Center has 10 petabytes of data, providing real-time protection and quarantine is a huge task. We use Business Process Modeling Notation (BPMN) to simulate how data is in use. The use of BPMN simulation allows us to evaluate the chosen security performances before actual implementation. Results show that taking control of a security breach can take between 50 and 125 hours. This means that additional security is required to ensure all data is well-protected in the crucial 125 hours. This paper has also demonstrated that CCAF multi-layered security can protect data in real-time and it has three layers of security: 1) firewall and access control; 2) identity management and intrusion prevention; and 3) convergent encryption. To validate CCAF, this paper has undertaken two sets of ethical-hacking experiments involving penetration testing with 10,000 trojans and viruses. The CCAF multi-layered security can block 9,919 viruses and trojans, which can be destroyed in seconds, and the remaining ones can be quarantined or isolated. The experiments show that although the percentage of blocking can decrease for continuous injection of viruses and trojans, 97.43 percent of them can be quarantined. Our CCAF multi-layered security has an average of 20 percent better performance than the single-layered approach, which could only block 7,438 viruses and trojans. CCAF can be more effective when combined with BPMN simulation to evaluate the security process and penetration testing results.

253 citations


Journal ArticleDOI
TL;DR: A novel lightweight feature selection method is proposed, designed particularly for mining streaming data on the fly, using an accelerated particle swarm optimization (APSO) type of swarm search that achieves enhanced analytical accuracy within reasonable processing time.
Abstract: Although Big Data is a much-hyped term, it brings with it many technical challenges that confront both academic research communities and commercial IT deployment; its root sources are data streams and the curse of dimensionality. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms infeasible for real-time data mining. Feature selection has been popularly used to lighten the processing load in inducing a data mining model. However, when it comes to mining over high-dimensional data, the search space from which an optimal feature subset is derived grows exponentially in size, leading to an intractable demand in computation. In order to tackle this problem, which stems mainly from the high dimensionality and streaming format of data feeds in Big Data, a novel lightweight feature selection method is proposed. The method is designed particularly for mining streaming data on the fly, using an accelerated particle swarm optimization (APSO) type of swarm search that achieves enhanced analytical accuracy within reasonable processing time. In this paper, a collection of Big Data sets with exceptionally high dimensionality is used to test our new feature selection algorithm for performance evaluation.
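
As a concrete illustration of the swarm-search idea, the sketch below runs an accelerated-PSO-style search over binary feature masks on a window of streaming data, using a cheap nearest-centroid classifier as the wrapper fitness. The simplified APSO update (global-best attraction plus a random kick, with no per-particle velocities) and the fitness penalty for large subsets are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch: APSO-style feature subset search on a data window. The fitness
# function (nearest-centroid accuracy) and the binarisation of the position
# update are illustrative choices, not the paper's exact setup.
import numpy as np

def fitness(X, y, mask):
    """Wrapper fitness: nearest-class-centroid accuracy on the selected features."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    classes = np.unique(y)
    centroids = np.array([Xs[y == c].mean(axis=0) for c in classes])
    d = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[d.argmin(axis=1)]
    return (pred == y).mean() - 0.01 * mask.mean()   # small penalty for large subsets

def apso_select(X, y, n_particles=20, iters=30, beta=0.5, alpha=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    swarm = rng.random((n_particles, n_features))     # particle positions in [0, 1]
    best_mask, best_fit = np.zeros(n_features, dtype=int), -np.inf
    for _ in range(iters):
        masks = (swarm > 0.5).astype(int)
        fits = np.array([fitness(X, y, m) for m in masks])
        if fits.max() > best_fit:
            best_fit, best_mask = fits.max(), masks[fits.argmax()].copy()
        # accelerated PSO: pull every particle toward the global best, add a random kick
        swarm = (1 - beta) * swarm + beta * best_mask + alpha * rng.normal(0.0, 0.1, swarm.shape)
        swarm = np.clip(swarm, 0.0, 1.0)
    return best_mask, best_fit
```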

202 citations


Journal ArticleDOI
TL;DR: This paper designs, implements, and evaluates a time series analysis approach that is able to decompose large scale mobile traffic into regularity and randomness components, and reveals that high predictability of the regularity component can be achieved, while demonstrating that the prediction of the randomness component of mobile traffic data is impossible.
Abstract: Understanding and forecasting mobile traffic of large scale cellular networks is extremely valuable for service providers to control and manage the explosive mobile data, such as network planning, load balancing, and data pricing mechanisms. This paper aims at extracting and modeling the traffic patterns of 9,000 cellular towers deployed in a metropolitan city. To achieve this goal, we design, implement, and evaluate a time series analysis approach that is able to decompose large scale mobile traffic into regularity and randomness components. Then, we use time series prediction to forecast the traffic patterns based on the regularity components. Our study verifies the effectiveness of our utilized time series decomposition method, and shows the geographical distribution of the regularity and randomness components. Moreover, we reveal that high predictability of the regularity component can be achieved, and demonstrate that the prediction of the randomness component of mobile traffic data is impossible.
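
The decomposition can be pictured with a simple stand-in: treat each tower's average weekly profile as the regularity component and the residual as randomness, then forecast from the regular part alone. The sketch below assumes hourly traffic and a weekly period; the paper's actual time series decomposition and prediction models are more sophisticated.

```python
# Hedged sketch: split an hourly traffic series into a periodic "regularity"
# component and a residual "randomness" component, and forecast from the regular
# part only. The mean-profile decomposition is an illustrative stand-in for the
# decomposition used in the paper.
import numpy as np

def decompose(traffic, period=24 * 7):
    """traffic: 1-D array of hourly traffic volumes for one cell tower."""
    n = (len(traffic) // period) * period
    trimmed = traffic[:n].reshape(-1, period)
    regularity_profile = trimmed.mean(axis=0)                 # average weekly shape
    regularity = np.tile(regularity_profile, n // period)
    randomness = traffic[:n] - regularity
    return regularity_profile, regularity, randomness

def forecast(regularity_profile, horizon, start_phase=0):
    """Predict future traffic purely from the periodic regularity component."""
    idx = (start_phase + np.arange(horizon)) % len(regularity_profile)
    return regularity_profile[idx]

# Example: the residual's variance indicates how much of this tower's traffic is
# inherently unpredictable.
# profile, reg, rnd = decompose(hourly_traffic)
# print("unexplained variance ratio:", rnd.var() / hourly_traffic[:len(rnd)].var())
```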

188 citations


Journal ArticleDOI
TL;DR: A feasible and truthful incentive mechanism (TIM), to coordinate the resource auction between mobile devices as service users (buyers) and cloudlets as service providers (sellers) is proposed and extended to a more efficient design of auction (EDA).
Abstract: Mobile cloud computing offers an appealing paradigm to relieve the pressure of soaring data demands and augment energy efficiency for future green networks. Cloudlets can provide available resources to nearby mobile devices with lower access overhead and energy consumption. To stimulate service provisioning by cloudlets and improve resource utilization, a feasible and efficient incentive mechanism is required to charge mobile users and reward cloudlets. Although auctions have been considered a promising form of incentive, it is challenging to design an auction mechanism that holds certain desirable properties for the cloudlet scenario. Truthfulness and system efficiency are two crucial properties in addition to computational efficiency, individual rationality and budget balance. In this paper, we first propose a feasible and truthful incentive mechanism (TIM) to coordinate the resource auction between mobile devices as service users (buyers) and cloudlets as service providers (sellers). Further, TIM is extended to a more efficient design of auction (EDA). TIM guarantees strong truthfulness for both buyers and sellers, while EDA achieves a fairly high system efficiency but only satisfies strong truthfulness for sellers. We also show the difficulty for buyers of manipulating the resource auction in EDA and the high expected utility of truthful bidding.
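
For readers unfamiliar with truthful double auctions, the classic McAfee mechanism below illustrates how truthfulness, individual rationality, and (weak) budget balance can be obtained by sacrificing at most one efficient trade. It is shown only as a generic reference point, not as the TIM or EDA design proposed in the paper.

```python
# Hedged sketch: McAfee's truthful double auction as a generic reference design,
# not the TIM/EDA mechanisms of the paper. Buyers are mobile devices bidding for
# cloudlet resources; sellers are cloudlets asking a price.

def mcafee_double_auction(bids, asks):
    """bids: list of (buyer_id, bid); asks: list of (seller_id, ask).
    Returns (trades, buyer_price, seller_price)."""
    buyers = sorted(bids, key=lambda x: -x[1])        # descending bids
    sellers = sorted(asks, key=lambda x: x[1])        # ascending asks
    k = 0
    while k < min(len(buyers), len(sellers)) and buyers[k][1] >= sellers[k][1]:
        k += 1                                        # k = number of efficient trades
    if k == 0:
        return [], None, None
    if k < min(len(buyers), len(sellers)):
        p = (buyers[k][1] + sellers[k][1]) / 2.0      # price taken from the first excluded pair
        if sellers[k - 1][1] <= p <= buyers[k - 1][1]:
            pairs = zip(buyers[:k], sellers[:k])
            return [(b[0], s[0]) for b, s in pairs], p, p   # fully budget balanced
    # otherwise: drop the k-th trade; buyers pay b_k, sellers receive s_k
    pairs = zip(buyers[:k - 1], sellers[:k - 1])
    return [(b[0], s[0]) for b, s in pairs], buyers[k - 1][1], sellers[k - 1][1]
```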

144 citations


Journal ArticleDOI
TL;DR: A composition framework is defined by means of integration with fine-grained I/O service discovery that enables the generation of a graph-based composition which contains the set of services that are semantically relevant for an input-output request.
Abstract: In this paper we present a theoretical analysis of graph-based service composition in terms of its dependency with service discovery. Driven by this analysis we define a composition framework by means of integration with fine-grained I/O service discovery that enables the generation of a graph-based composition which contains the set of services that are semantically relevant for an input-output request. The proposed framework also includes an optimal composition search algorithm to extract the best composition from the graph minimising the length and the number of services, and different graph optimisations to improve the scalability of the system. A practical implementation used for the empirical analysis is also provided. This analysis proves the scalability and flexibility of our proposal and provides insights on how integrated composition systems can be designed in order to achieve good performance in real scenarios for the web.
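
The sketch below illustrates the two stages in miniature: forward chaining builds a layered composition graph of services whose inputs are already satisfiable, and a backward pass keeps only the services that contribute to the requested outputs, approximating the minimisation of composition length and number of services. Representing semantic I/O matching as plain set containment is an illustrative simplification of the fine-grained discovery used in the paper.

```python
# Hedged sketch: layered graph-based composition over I/O-matched services.
# Services are modelled as plain sets of input/output concepts, a simplification
# of semantic matching.

def build_composition_graph(services, request_inputs, request_outputs):
    """services: dict name -> (set_of_inputs, set_of_outputs)."""
    known = set(request_inputs)
    layers, used = [], set()
    while not set(request_outputs) <= known:
        # next layer: unused services that are invocable and produce something new
        layer = [name for name, (ins, outs) in services.items()
                 if name not in used and ins <= known and not outs <= known]
        if not layer:
            return None                       # request not satisfiable with these services
        layers.append(layer)
        for name in layer:
            known |= services[name][1]
            used.add(name)
    return layers

def extract_composition(layers, services, request_outputs):
    """Backward pass: keep only services that contribute to the requested outputs."""
    needed, chosen = set(request_outputs), []
    for layer in reversed(layers or []):
        picked = [n for n in layer if services[n][1] & needed]
        for n in picked:
            needed |= services[n][0]          # their inputs must in turn be produced
        chosen.insert(0, picked)
    return chosen
```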

134 citations


Journal ArticleDOI
TL;DR: A novel method of service selection, called the correlation-aware service pruning (CASP) method, which manages QoS correlations by accounting for all services that may be integrated into optimal composite services and prunes services that are not the optimal candidate services.
Abstract: QoS as an important criterion has attracted more and more attention in the service selection process. Various QoS-aware service selection methods have been proposed in recent years. However, few of them take into account the QoS correlations between services, causing several performance issues. QoS correlations mean that some QoS attributes of a service are not only dependent on the service itself but are also correlated with other services. Since such correlations affect QoS values, it is important to study how to select appropriate candidate services while taking QoS correlations into account when generating composite services with optimal QoS values. To this end, we propose a novel method of service selection, called the correlation-aware service pruning (CASP) method. It manages QoS correlations by accounting for all services that may be integrated into optimal composite services and prunes services that are not optimal candidate services. Our experiments show that this method can manage complicated correlations between services and significantly improve the QoS values of the generated composite services.

132 citations


Journal ArticleDOI
TL;DR: The experimental results indicate that the proposed location-aware personalized CF method improves the QoS prediction accuracy and computational efficiency significantly, compared to previous CF-based methods.
Abstract: Collaborative Filtering (CF) is widely employed for making Web service recommendations. CF-based Web service recommendation aims to predict missing QoS (Quality-of-Service) values of Web services. Although several CF-based Web service QoS prediction methods have been proposed in recent years, their performance still needs significant improvement. First, existing QoS prediction methods seldom consider the personalized influence of users and services when measuring the similarity between users and between services. Second, Web service QoS factors, such as response time and throughput, usually depend on the locations of Web services and users. However, existing Web service QoS prediction methods seldom take this observation into consideration. In this paper, we propose a location-aware personalized CF method for Web service recommendation. The proposed method leverages the locations of both users and Web services when selecting similar neighbors for the target user or service. The method also includes an enhanced similarity measurement for users and Web services, taking their personalized influence into account. To evaluate the performance of our proposed method, we conduct a set of comprehensive experiments using a real-world Web service dataset. The experimental results indicate that our approach improves the QoS prediction accuracy and computational efficiency significantly, compared to previous CF-based methods.
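
A minimal version of the idea: restrict candidate neighbours to users in the same region as the target user, then predict the missing QoS value as a similarity-weighted average of those neighbours' observations. The same-region filter and the plain Pearson similarity below are illustrative stand-ins for the paper's enhanced, personalised similarity measure.

```python
# Hedged sketch: location-aware user-based CF for QoS prediction. The same-region
# neighbour filter and plain Pearson similarity are illustrative simplifications.
import numpy as np

def pearson(u, v):
    """Similarity over services both users have invoked (NaN marks missing QoS)."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if mask.sum() < 2:
        return 0.0
    a, b = u[mask] - u[mask].mean(), v[mask] - v[mask].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def predict_qos(target_user, service, qos, regions, top_k=10):
    """qos: users x services matrix with NaN for missing values; regions: user -> region id."""
    sims = []
    for u in range(qos.shape[0]):
        if u == target_user or np.isnan(qos[u, service]):
            continue
        if regions[u] != regions[target_user]:        # location-aware neighbour filter
            continue
        sims.append((pearson(qos[target_user], qos[u]), u))
    sims = sorted(sims, reverse=True)[:top_k]
    num = sum(s * qos[u, service] for s, u in sims if s > 0)
    den = sum(s for s, _ in sims if s > 0)
    return num / den if den else float(np.nanmean(qos[target_user]))
```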

131 citations


Journal ArticleDOI
TL;DR: A deep computation model for feature learning on big data, which uses a tensor to model the complex correlations of heterogeneous data and is efficient to perform feature learning when evaluated using the STL-10, CUAVE, SANE and INEX datasets.
Abstract: Deep learning has been successfully applied to feature learning in speech recognition, image classification and language processing. However, current deep learning models work in the vector space, resulting in the failure to learn features for big data since a vector cannot model the highly non-linear distribution of big data, especially heterogeneous data. This paper proposes a deep computation model for feature learning on big data, which uses a tensor to model the complex correlations of heterogeneous data. To fully learn the underlying data distribution, the proposed model uses the tensor distance as the average sum-of-squares error term of the reconstruction error in the output layer. To train the parameters of the proposed model, the paper designs a high-order back-propagation algorithm (HBP) by extending the conventional back-propagation algorithm from the vector space to the high-order tensor space. To evaluate the performance of the proposed model, we carried out experiments on four representative datasets, comparing against stacked auto-encoders and multimodal deep learning models. Experimental results clearly demonstrate that the proposed model performs feature learning efficiently when evaluated on the STL-10, CUAVE, SANE and INEX datasets.

129 citations


Journal ArticleDOI
Liming Nie, He Jiang, Zhilei Ren, Zeyi Sun, Xiaochen Li
TL;DR: QECK, as mentioned in this paper, identifies software-specific expansion words from high quality pseudo relevance feedback question and answer pairs on Stack Overflow to automatically generate expansion queries; it is then incorporated into the classic Rocchio model to yield the QECK-based code search method QECKRocchio.
Abstract: As code search is a frequent developer activity in software development practices, improving the performance of code search is a critical task. In the text retrieval based search techniques employed in code search, the term mismatch problem is a critical language issue for retrieval effectiveness. By reformulating the queries, query expansion provides effective ways to solve the term mismatch problem. In this paper, we propose Query Expansion based on Crowd Knowledge (QECK), a novel technique to improve the performance of code search algorithms. QECK identifies software-specific expansion words from the high quality pseudo relevance feedback question and answer pairs on Stack Overflow to automatically generate the expansion queries. Furthermore, we incorporate QECK into the classic Rocchio model, and propose the QECK-based code search method QECKRocchio. We conduct three experiments to evaluate our QECK technique and investigate QECKRocchio in a large-scale corpus containing real-world code snippets and a question and answer pair collection. The results show that QECK improves the performance of three code search algorithms by up to 64 percent in Precision, and 35 percent in NDCG. Meanwhile, compared with the state-of-the-art query expansion method, the improvement of QECKRocchio is 22 percent in Precision, and 16 percent in NDCG.
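
The Rocchio part of the pipeline can be sketched compactly: move the query vector toward the centroid of the pseudo-relevance-feedback documents (here, imagined top-ranked Stack Overflow Q&A pairs) and keep the heaviest new terms as expansion words. The alpha/beta values and the TF-IDF weighting below are standard Rocchio defaults, not the paper's tuned QECK settings.

```python
# Hedged sketch: Rocchio-style query expansion over pseudo relevance feedback
# (PRF) documents. Parameter values and weighting are generic defaults, not the
# paper's QECK configuration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def expand_query(query, prf_docs, n_expansion_terms=5, alpha=1.0, beta=0.75):
    vec = TfidfVectorizer(stop_words="english")
    doc_matrix = vec.fit_transform(prf_docs).toarray()
    query_vec = vec.transform([query]).toarray()[0]
    centroid = doc_matrix.mean(axis=0)                 # centroid of PRF documents
    rocchio = alpha * query_vec + beta * centroid      # modified query vector
    terms = np.array(vec.get_feature_names_out())
    candidates = [(w, t) for w, t in zip(rocchio, terms) if t not in query.lower().split()]
    expansion = [t for _, t in sorted(candidates, reverse=True)[:n_expansion_terms]]
    return query + " " + " ".join(expansion)

# Example (hypothetical PRF documents):
# expand_query("parse json string",
#              ["How do I parse a JSON string in Java using Jackson ObjectMapper ...",
#               "Gson vs Jackson for deserializing nested JSON objects ..."])
```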

121 citations


Journal ArticleDOI
TL;DR: This paper proposes a group-centric recommender system in the CPSS domain, which consists of activity-oriented group discovery, the revision of rating data for improved accuracy, and group preference modeling that supports sufficient context mining from multiple sources.
Abstract: In recent years, an extensive integration of cyber, physical and social spaces has been occurring. Cyber-Physical-Social Systems (CPSSs) have become the basic paradigm of evolution in the information industry, through which traditional computer science will evolve into cyber-physical-social computational science. Intelligent recommender systems, which are an important fundamental research topic in the CPSS field and one of the key techniques for the implementation of personalized and intelligent computing, have great significance in CPSS development. This paper proposes a group-centric recommender system in the CPSS domain, which consists of activity-oriented group discovery, the revision of rating data for improved accuracy, and group preference modeling that supports sufficient context mining from multiple sources. Through experiments, it is verified that the proposed recommender system is efficient, objective and accurate, thereby providing a strong foundation for personalized computing in the CPSS paradigm.

Journal ArticleDOI
TL;DR: The proposed BGM-BLA algorithm performs relatively well in terms of the Pareto sets obtained and computational time in comparison with two optimization algorithms, i.e., Non-dominated Sorting Genetic Algorithm (NSGA-II) and binary graph matching-based common-coding algorithm.
Abstract: Cloud computing is becoming ever more prevalent, and finding ways to reduce the cost of a cloud computing platform through the migration of virtual machines (VMs) is a pressing issue. In this paper, the problem of dynamic migration of VMs (DM-VM) in the cloud computing platform (or simply the cloud) is investigated. A triple-objective optimization model for DM-VM is established, which takes energy consumption, communication between VMs, and migration cost into account under the situation that the platform works normally. The DM-VM problem is divided into two parts: (i) forming VMs into groups, and (ii) determining the best way to place the groups onto certain physical nodes. A binary graph matching-based bucket-code learning algorithm (BGM-BLA) is designed for solving the DM-VM problem. In BGM-BLA, bucket coding and learning are employed for finding optimal solutions, and binary graph matching is used for evaluating the candidate solutions. The computational results demonstrate that the proposed BGM-BLA algorithm performs relatively well in terms of the Pareto sets obtained and computational time in comparison with two optimization algorithms, i.e., the Non-dominated Sorting Genetic Algorithm (NSGA-II) and a binary graph matching-based common-coding algorithm.

Journal ArticleDOI
TL;DR: This paper determines some important characteristics of objective QoS datasets that have never been found before, and proposes a prediction algorithm to realize these characteristics, allowing the unknown QoS values to be predicted accurately.
Abstract: Quality of service (QoS) guarantee is an important component of service recommendation. Generally, some QoS values of a service are unknown to users who have never invoked it before, and therefore the accurate prediction of unknown QoS values is significant for the successful deployment of web service-based applications. Collaborative filtering is an important method for predicting missing values, and has thus been widely adopted in the prediction of unknown QoS values. However, collaborative filtering originated from the processing of subjective data, such as movie scores. The QoS data of web services are usually objective, meaning that existing collaborative filtering-based approaches are not always applicable to unknown QoS values. Based on real-world web service QoS data and a number of experiments, in this paper we determine some important characteristics of objective QoS datasets that have never been found before. We propose a prediction algorithm that exploits these characteristics, allowing unknown QoS values to be predicted accurately. Experimental results show that the proposed algorithm predicts unknown web service QoS values more accurately than other existing approaches.

Journal ArticleDOI
TL;DR: A game based services price decision (GSPD) model, which depicts the process of price decisions, is proposed; it can explain price dynamics in the real world and can substantially help decision makers under various scenarios.
Abstract: In cyber-physical systems (CPS), service organizers (SOs) aim to collect services from service entities at lower prices and provide better combined services to users. However, each entity receives payoffs when providing services, which leads to competition between SOs and service entities or within internal service entities. In this paper, we first formulate the price competition model of SOs, in which the SOs dynamically increase and decrease their service prices periodically according to the number of services collected from entities. A game based services price decision (GSPD) model, which depicts the process of price decisions, is proposed in this paper. In the GSPD model, entities play a game with other entities under the rule of “survival of the fittest” and calculate payoffs according to their own payoff matrix, which leads to a Pareto-optimal equilibrium point. Numerous experiments demonstrate that the GSPD model can explain price dynamics in the real world and can substantially help decision makers under various scenarios.

Journal ArticleDOI
TL;DR: A probabilistic framework called TICRec that utilizes temporal influence correlations of both weekdays and weekends for time-aware location recommendations, and estimates a time probability density of a user visiting a new location without splitting the continuous time into discrete time slots to avoid the time information loss.
Abstract: In location-based social networks (LBSNs), time significantly affects users’ check-in behaviors; for example, people usually visit different places at different times of weekdays and weekends, e.g., restaurants at noon on weekdays and bars at midnight on weekends. Current studies use the temporal influence to recommend locations through dividing users’ check-in locations into time slots based on their check-in time and learning their preferences to locations in each time slot separately. Unfortunately, these studies generally suffer from two major limitations: (1) the loss of time information because of dividing a day into time slots and (2) the lack of temporal influence correlations due to modeling users’ preferences to locations for each time slot separately. In this paper, we propose a probabilistic framework called TICRec that utilizes temporal influence correlations (TIC) of both weekdays and weekends for time-aware location recommendations. TICRec not only recommends locations to users, but it also suggests when a user should visit a recommended location. In TICRec, we estimate a time probability density of a user visiting a new location without splitting the continuous time into discrete time slots to avoid the time information loss. To leverage the TIC, TICRec considers both user-based TIC (i.e., different users’ check-in behaviors to the same location at different times) and location-based TIC (i.e., the same user's check-in behaviors to different locations at different times). Finally, we conduct a comprehensive performance evaluation for TICRec using two real data sets collected from Foursquare and Gowalla. Experimental results show that TICRec achieves significantly superior location recommendations compared to other state-of-the-art recommendation techniques with temporal influence.
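
The "no time slots" idea can be illustrated with a circular kernel density over check-in hours, so that 23:30 and 00:30 are treated as close and any continuous time can be scored. The Gaussian kernel and bandwidth below are illustrative assumptions; TICRec's actual density estimation and its user- and location-based temporal correlations are more elaborate.

```python
# Hedged sketch: a continuous, circular kernel density over check-in hours,
# avoiding discrete time slots. Kernel and bandwidth are illustrative choices.
import numpy as np

def time_density(check_in_hours, bandwidth=1.5):
    """Return p(t): a 24-hour-periodic density estimated from observed check-in hours."""
    hours = np.asarray(check_in_hours, dtype=float)

    def p(t):
        diff = np.abs(t - hours) % 24.0
        diff = np.minimum(diff, 24.0 - diff)            # circular distance on the clock
        kernels = np.exp(-0.5 * (diff / bandwidth) ** 2)
        return kernels.mean() / (bandwidth * np.sqrt(2 * np.pi))
    return p

# Example: a user who checks into bars late on weekend nights
# p = time_density([22.5, 23.0, 23.75, 0.5, 1.0])
# best_hour = max(np.arange(0, 24, 0.25), key=p)   # suggested visiting time
```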

Journal ArticleDOI
TL;DR: A novel large-scale, context-aware recommender system that provides accurate recommendations, scalability to a large number of diverse users and items, differential services, and does not suffer from “cold start” problems is proposed.
Abstract: In this paper, we propose a novel large-scale, context-aware recommender system that provides accurate recommendations, scalability to a large number of diverse users and items, differential services, and does not suffer from “cold start” problems. Our proposed recommendation system relies on a novel algorithm which learns online the item preferences of users based on their click behavior, and constructs online item-cluster trees. The recommendations are then made by choosing an item-cluster level and then selecting an item within that cluster as a recommendation for the user. This approach is able to significantly improve the learning speed when the number of users and items is large, while still providing high recommendation accuracy. Each time a user arrives at the website, the system makes a recommendation based on the estimations of item payoffs by exploiting past context arrivals in a neighborhood of the current user's context. It exploits the similarity of contexts to learn how to make better recommendations even when the number and diversity of users and items is large. This also addresses the cold start problem by using the information gained from similar users and items to make recommendations for new users and items. We theoretically prove that the proposed algorithm for item recommendations converges to the optimal item recommendations in the long-run. We also bound the probability of making a suboptimal item recommendation for each user arriving to the system while the system is learning. Experimental results show that our approach outperforms the state-of-the-art algorithms by over 20 percent in terms of click through rates.

Journal ArticleDOI
TL;DR: A cloud service composition framework that selects the optimal composition based on an end user's long-term Quality of Service (QoS) requirements is proposed that uses QoS time series' inter correlations and performs a novel time series group similarity approach on the predicted QoS values.
Abstract: We propose a cloud service composition framework that selects the optimal composition based on an end user's long-term Quality of Service (QoS) requirements. In a typical cloud environment, existing solutions are not suitable when service providers fail to provide the long-term QoS provision advertisements. The proposed framework uses a new multivariate QoS analysis to predict the long-term QoS provisions from service providers’ historical QoS data and short-term advertisements represented as time series. The quality of the QoS prediction is improved by incorporating QoS attributes’ intra correlations into the multivariate analysis. To select the optimal service composition, the proposed framework uses QoS time series’ inter correlations and performs a novel time series group similarity approach on the predicted QoS values. Experiments are conducted on a real QoS dataset, and the results prove the efficiency of the proposed approach.

Journal ArticleDOI
TL;DR: An algorithm is developed by combining particle swarm optimization and k-means clustering and runs in parallel using MapReduce in the Hadoop platform, obtaining the optimum service composition in significantly less time than alternative algorithms.
Abstract: The proliferation of mobile computing and smartphone technologies has resulted in an increasing number and range of services from myriad service providers. These mobile service providers support numerous emerging services with differing quality metrics but similar functionality. Facilitating an automated service workflow requires fast selection and composition of services from the services pool. The mobile environment is ambient and dynamic in nature, requiring more efficient techniques to deliver the required service composition promptly to users. Selecting the optimum required services in a minimal time from the numerous sets of dynamic services is a challenge. This work addresses the challenge as an optimization problem. An algorithm is developed by combining particle swarm optimization and k-means clustering. It runs in parallel using MapReduce in the Hadoop platform. By using parallel processing, the optimum service composition is obtained in significantly less time than alternative algorithms. This is essential for handling large amounts of heterogeneous data and services from various sources in the mobile environment. The suitability of this proposed approach for big data-driven service composition is validated through modeling and simulation.

Journal ArticleDOI
TL;DR: A mobility model, a mobility-aware QoS computation rule, and a Mobility-enabled selection algorithm with teaching-learning-based optimization are proposed that can obtain near-optimal solutions and has a nearly linear algorithmic complexity with respect to the problem size.
Abstract: Mobile business is becoming a reality due to ubiquitous Internet connectivity, popular mobile devices, and widely available cloud services. However, characteristics of the mobile environment, such as mobility, unpredictability, and variation of mobile network's signal strength, present challenges in selecting optimal services for composition. Traditional QoS-aware methods that select individual services with the best QoS may not always result in the best composite service because constant mobility makes the performance of service invocation unpredictable and location-based. This paper discusses the challenges of this problem and defines it in a formal way. To solve this new research problem, we propose a mobility model, a mobility-aware QoS computation rule, and a mobility-enabled selection algorithm with teaching-learning-based optimization. The experimental simulation results demonstrate that our approach can obtain better solutions than current standard composition methods in mobile environments. The approach can obtain near-optimal solutions and has a nearly linear algorithmic complexity with respect to the problem size.

Journal ArticleDOI
TL;DR: The experimental results show the agility and accuracy of the proposed crawler framework, SmartCrawler, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
Abstract: As the deep web grows at a very fast pace, there has been increased interest in techniques that help efficiently locate deep-web interfaces. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving wide coverage and high efficiency is a challenging issue. We propose a two-stage framework, namely SmartCrawler, for efficiently harvesting deep-web interfaces. In the first stage, SmartCrawler performs site-based searching for center pages with the help of search engines, avoiding visiting a large number of pages. To achieve more accurate results for a focused crawl, SmartCrawler ranks websites to prioritize highly relevant ones for a given topic. In the second stage, SmartCrawler achieves fast in-site searching by excavating the most relevant links with adaptive link-ranking. To eliminate bias on visiting some highly relevant links in hidden web directories, we design a link tree data structure to achieve wider coverage for a website. Our experimental results on a set of representative domains show the agility and accuracy of our proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.

Journal ArticleDOI
TL;DR: This paper proposes a novel web service recommendation approach incorporating a user's potential QoS preferences and diversity feature of user interests on web services, and presents an innovative diversity-aware web service ranking algorithm to rank theweb service candidates based on their scores, and diversity degrees derived from the web service graph.
Abstract: The last decade has witnessed a tremendous growth of web services as a major technology for sharing data, computing resources, and programs on the web. With the increasing adoption and presence of web services, the design of novel approaches for effective web service recommendation that satisfy users’ potential requirements has become of paramount importance. Existing web service recommendation approaches mainly focus on predicting the missing QoS values of web service candidates that are of interest to a user, using a collaborative filtering approach, a content-based approach, or a hybrid of the two. These recommendation approaches assume that recommended web services are independent of each other, which sometimes may not be true. As a result, many similar or redundant web services may exist in a recommendation list. In this paper, we propose a novel web service recommendation approach incorporating a user's potential QoS preferences and the diversity feature of user interests on web services. A user's interests and QoS preferences on web services are first mined by exploring the web service usage history. Then we compute scores of web service candidates by measuring their relevance to historical and potential user interests, and their QoS utility. We also construct a web service graph based on the functional similarity between web services. Finally, we present an innovative diversity-aware web service ranking algorithm to rank the web service candidates based on their scores and the diversity degrees derived from the web service graph. Extensive experiments are conducted based on a real world web service dataset, indicating that our proposed web service recommendation approach significantly improves the quality of the recommendation results compared with existing methods.
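
The diversity-aware ranking step is conceptually close to greedy MMR-style re-ranking: each round picks the candidate that best trades off its relevance/QoS score against its similarity to services already selected. The sketch below uses a fixed trade-off parameter and a pairwise functional-similarity lookup as illustrative assumptions, not the paper's exact ranking algorithm.

```python
# Hedged sketch: greedy diversity-aware re-ranking of web service candidates
# (MMR-style). The lambda trade-off and the pairwise functional-similarity
# lookup are illustrative assumptions.

def diversity_rank(candidates, relevance, similarity, k=10, lam=0.7):
    """candidates: list of service ids; relevance: id -> score from QoS/interest matching;
    similarity: dict (id, id) -> functional similarity in [0, 1]."""
    selected, remaining = [], set(candidates)
    while remaining and len(selected) < k:
        def mmr(c):
            redundancy = max((similarity[(c, s)] for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)       # most relevant, least redundant candidate
        selected.append(best)
        remaining.remove(best)
    return selected
```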

Journal ArticleDOI
TL;DR: This paper investigates a multiple-view-aware approach to trace clustering, based on a co-training strategy, and shows that the presented algorithm is able to discover a clustering pattern of the log such that related traces are appropriately clustered.
Abstract: Process mining refers to the discovery, conformance, and enhancement of process models from event logs currently produced by several information systems (e.g. workflow management systems). By tightly coupling event logs and process models, process mining makes it possible to detect deviations, predict delays, support decision making, and recommend process redesigns. Event logs are data sets containing the executions (called traces) of a business process. Several process mining algorithms have been defined to mine event logs and deliver valuable models (e.g. Petri nets) of how logged processes are being executed. However, they often generate spaghetti-like process models, which can be hard to understand. This is caused by the inherent complexity of real-life processes, which tend to be less structured and more flexible than what the stakeholders typically expect. In particular, spaghetti-like process models are discovered when all possible behaviors are shown in a single model as a result of considering the set of traces in the event log all at once. To minimize this problem, trace clustering can be used as a preprocessing step. It splits up an event log into clusters of similar traces, so as to handle variability in the recorded behavior and facilitate process model discovery. In this paper, we investigate a multiple-view-aware approach to trace clustering, based on a co-training strategy. In an assessment using benchmark event logs, we show that the presented algorithm is able to discover a clustering pattern of the log such that related traces are appropriately clustered. We evaluate the significance of the formed clusters using established machine learning and process mining metrics.

Journal ArticleDOI
TL;DR: The state of the art of proxy re-encryption is reviewed by investigating the design philosophy, examining the security models and comparing the efficiency and security proofs of existing schemes.
Abstract: Never before has data sharing been more convenient, thanks to the rapid development and wide adoption of cloud computing. However, ensuring the security of cloud users’ data has become one of the main obstacles that hinder cloud computing from extensive adoption. Proxy re-encryption serves as a promising solution to secure data sharing in cloud computing. It enables a data owner to encrypt shared data in the cloud under its own public key, which is further transformed by a semitrusted cloud server into an encryption intended for the legitimate recipient for access control. This paper gives a solid and inspiring survey of proxy re-encryption from different perspectives to offer a better understanding of this primitive. In particular, we review the state of the art of proxy re-encryption by investigating the design philosophy, examining the security models and comparing the efficiency and security proofs of existing schemes. Furthermore, the potential applications and extensions of proxy re-encryption are also discussed. Finally, this paper is concluded with a summary of possible future work.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed heuristics can reduce the service renting cost by up to 24 percent compared with existing algorithms on the test benchmarks for non-shareable services.
Abstract: In XaaS clouds, resources as services (e.g., infrastructure, platform and software as a service) are sold to applications such as scientific and big data analysis workflows. Candidate services with various configurations (CPU type, memory size, number of machines and so on) for the same task may have different execution time and cost. Further, some services are priced and rented by intervals, which can be shared among tasks of the same workflow to save service rental cost. Establishing a task-mode (service) mapping (to get a balance between time and cost) and tabling tasks on rented service instances are crucial for minimizing the client-oriented cost of renting services for the whole workflow. In this paper, a multiple complete critical-path based heuristic (CPIS) is developed for the task-mode mapping problem. A list based heuristic (LHCM) concerning the task processing cost and task-slot matching is developed for tabling tasks on service instances based on the result of task-mode mapping. Then, the effectiveness of the proposed CPIS is compared with that of the previously proposed CPIL and the existing state-of-the-art heuristics including PCP, SC-PCP (an extension to PCP), DET, and CPLEX. The effectiveness of the proposed LHCM is evaluated with its use with different task-mode mapping algorithms. Experimental results show that the proposed heuristics can reduce the service renting cost by up to 24 percent compared with these algorithms on the test benchmarks for non-shareable services. In addition, half of the service renting cost could be saved when LHCM is applied to consolidate tasks on rented service instances.

Journal ArticleDOI
TL;DR: This paper proposes a novel approach that can introduce branch structures into composite solutions to cope with uncertainty in the service composition process and employs and extends the original form of Graphplan to tackle this problem.
Abstract: By arranging multiple existing web services into workflows to create value-added services, automatic web service composition has received much attention in service-oriented computing. A large number of methods have been proposed for it although most of them are merely based on the matching of input-output parameters of services. Besides these parameters, some other elements can affect the execution of services and their composition, such as the preconditions and service execution results. In particular, the execution effects of some services are often uncertain because of the complex and dynamically changing application environments in the real world, and this can cause the emergence of nondeterministic choices in the workflows of composite services. However, the previous methods for automatic service composition mainly rely on sequential structures, which make them difficult to take into account uncertain effects during service composition. In this paper, Graphplan is employed and extended to tackle this problem. In order to model services with uncertain effects, we first extend the original form of Graphplan. Then, we propose a novel approach that can introduce branch structures into composite solutions to cope with such uncertainty in the service composition process. Extensive experiments are performed to evaluate and analyze the proposed methodology.

Journal ArticleDOI
TL;DR: A multilevel index model for large-scale service repositories, which can be used to reduce the execution time of service discovery and composition and validate that the proposed model is more efficient than the existing structures, i.e., sequential and inverted index ones.
Abstract: The number of web services has grown drastically. How to manage them efficiently in a service repository has therefore become an important issue to address. Given a special field, there often exists an efficient data structure for a class of objects, e.g., Google's Bigtable is very suitable for webpage storage and management. Based on the theory of equivalence relations and quotient sets, this work proposes a multilevel index model for large-scale service repositories, which can be used to reduce the execution time of service discovery and composition. Its novel use of keys, inspired by keys in relational databases, can effectively remove the redundancy of the commonly used inverted index. Its four function-based operations are for the first time proposed to manage and maintain services in a repository. The experiments validate that the proposed model is more efficient than the existing structures, i.e., sequential and inverted index ones.
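
The core idea, that services with identical keyword sets form one equivalence class and the index points at classes rather than at individual services, can be sketched as follows. The keyword-set keys and the operation names below are illustrative; they are not the paper's exact four function-based operations.

```python
# Hedged sketch: an equivalence-class (quotient-set) index for a service
# repository. Services sharing the same keyword set share one key, so the
# inverted index points at classes instead of at every individual service.
from collections import defaultdict

class MultilevelIndex:
    def __init__(self):
        self.classes = defaultdict(set)      # key (frozenset of keywords) -> service ids
        self.inverted = defaultdict(set)     # keyword -> keys containing it

    def add(self, service_id, keywords):
        key = frozenset(keywords)
        self.classes[key].add(service_id)
        for kw in key:
            self.inverted[kw].add(key)

    def remove(self, service_id, keywords):
        key = frozenset(keywords)
        self.classes[key].discard(service_id)
        if not self.classes[key]:            # drop empty equivalence classes
            del self.classes[key]
            for kw in key:
                self.inverted[kw].discard(key)

    def search(self, required_keywords):
        """Return all services whose keyword set covers the requested keywords."""
        required = set(required_keywords)
        candidate_keys = set.intersection(*(self.inverted[kw] for kw in required)) \
            if required else set(self.classes)
        return {sid for key in candidate_keys for sid in self.classes[key]}
```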

Journal ArticleDOI
Zhaohui Wu, Jianwei Yin, Shuiguang Deng, Jian Wu, Ying Li, Liang Chen
TL;DR: A technical framework is proposed that facilitates addressing the scientific issues and technical challenges of crossover service realization, and it is illustrated how a new cloud-based middleware platform, named JTang++, supports the realization of crossover services.
Abstract: Modern service industry (MSI) is an information and knowledge intensive service industry, relying on information technology and modern management philosophy. The development of MSI is of high significance for promoting the rapid growth of the global economy, accelerating social progress, and building an innovation-oriented society and harmonious realm. This paper first presents the history and trends of MSI in China in terms of the worldwide MSI development status. It then proposes the concept of Crossover Services based upon a survey of 62 MSI-related listed firms in China, whose business models, products, and services evidently exploit the concept. After elaborating on the crossover, convergence, and complex characteristics of crossover services, it proposes a technical framework that facilitates addressing the scientific issues and technical challenges of crossover service realization. Finally, it illustrates how a new cloud-based middleware platform, named JTang++, supports the realization of crossover services.

Journal ArticleDOI
TL;DR: This paper presents Map-Reduce implementations of two well-known process mining algorithms to take advantage of the scalability of the Map-Reduce approach and presents the design of a series of mappers and reducers to compute the log-based ordering relations from distributed event logs.
Abstract: Process discovery is an approach to extract process models from event logs. Given the distributed nature of modern information systems, event logs are likely to be distributed across different physical machines. Map-Reduce is a scalable approach for efficient computations on distributed data. In this paper we present Map-Reduce implementations of two well-known process mining algorithms to take advantage of the scalability of the Map-Reduce approach. We present the design of a series of mappers and reducers to compute the log-based ordering relations from distributed event logs. These can then be used to discover a process model. We provide experimental results that show the performance and scalability of our implementations.
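
The mapper/reducer design can be illustrated with the "directly follows" relation, the raw material of alpha-style discovery: each mapper emits ((a, b), 1) for consecutive activities in a trace, and the reducer sums the counts, from which causal and parallel relations are derived. The sketch below simulates this locally in Python; a real deployment would run the same functions as Hadoop jobs, and the relation set shown is a simplification.

```python
# Hedged sketch: mapper/reducer pair extracting the "directly follows" ordering
# relation from event-log traces. Simulated locally, not an actual Hadoop job.
from collections import defaultdict
from itertools import chain

def map_trace(trace):
    """trace: list of activity names in execution order -> ((a, b), 1) pairs."""
    return [((a, b), 1) for a, b in zip(trace, trace[1:])]

def reduce_counts(mapped_pairs):
    counts = defaultdict(int)
    for (a, b), one in mapped_pairs:
        counts[(a, b)] += one
    return counts

def ordering_relations(directly_follows):
    """Derive causal (a -> b) and parallel (a || b) relations from the counts."""
    causal, parallel = set(), set()
    for (a, b) in directly_follows:
        if (b, a) in directly_follows:
            parallel.add(frozenset((a, b)))
        else:
            causal.add((a, b))
    return causal, parallel

# Local simulation over two "partitions" of a distributed log:
# partition1 = [["a", "b", "c"], ["a", "c", "b"]]
# partition2 = [["a", "b", "c", "d"]]
# counts = reduce_counts(chain.from_iterable(map_trace(t) for t in partition1 + partition2))
# print(ordering_relations(counts))
```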

Journal ArticleDOI
TL;DR: This work proposes an integrated skyline query processing method to discover qualified services and compose them with guaranteed quality of service (QoS) over multiple clouds, shortening the skyline process threefold compared with two state-of-the-art methods.
Abstract: A cloud mashup is composed of multiple services with shared datasets and integrated functionalities. For example, the elastic compute cloud (EC2) provided by Amazon Web Service (AWS), the authentication and authorization services provided by Facebook, and the Map service provided by Google can all be mashed up to deliver real-time, personalized driving route recommendation service. To discover qualified services and compose them with guaranteed quality of service (QoS), we propose an integrated skyline query processing method for building up cloud mashup applications. We use a similarity test to achieve an optimal localized skyline. This mashup method scales well with the growing number of cloud sites involved in the mashup applications. Faster skyline selection, reduced composition time, dataset sharing, and resources integration assure the QoS over multiple clouds. We experiment with the quality of web service (QWS) benchmark over 10,000 web services along six QoS dimensions. By utilizing block-elimination, data-space partitioning, and service similarity pruning, the skyline process is shortened threefold when compared with two state-of-the-art methods.
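
The skyline primitive the framework builds on is plain Pareto dominance over QoS vectors: a service survives only if no other service is at least as good in every dimension and strictly better in one. The block-nested-loop sketch below assumes all QoS dimensions are normalised so that lower is better; the paper's block-elimination, data-space partitioning, and similarity-pruning optimisations are omitted.

```python
# Hedged sketch: block-nested-loop skyline over service QoS vectors. Assumes all
# QoS dimensions are normalised so that lower is better (e.g. response time,
# cost, 1 - availability).

def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(services):
    """services: dict name -> tuple of QoS values (lower is better)."""
    result = {}
    for name, qos in services.items():
        if any(dominates(other, qos) for other in result.values()):
            continue                              # dominated by a current skyline point
        result = {n: q for n, q in result.items() if not dominates(qos, q)}
        result[name] = qos
    return result

# Example with hypothetical (response_time_ms, cost, 1 - availability) vectors:
# skyline({"s1": (120, 0.9, 0.01), "s2": (200, 0.5, 0.02), "s3": (250, 1.1, 0.03)})
# -> s3 is dominated by s1 and removed; s1 and s2 remain on the skyline.
```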

Journal ArticleDOI
TL;DR: This paper proposes a novel technique which leverages activity dependences in traces to discover processes with concurrency even if the logs fail to meet the completeness criteria, and calls for a weaker notion of completeness.
Abstract: Process mining, especially process discovery, has been utilized to extract process models from event logs. One challenge faced by process discovery is to identify concurrency effectively. State-of-the-art approaches employ activity orders in traces to undertake process discovery and they require stringent completeness notions of event logs. Thus, they may fail to extract appropriate processes when event logs cannot meet the completeness criteria. To address this problem, we propose in this paper a novel technique which leverages activity dependences in traces. Based on the observation that activities with no dependencies can be executed in parallel, our technique is in a position to discover processes with concurrencies even if the logs fail to meet the completeness criteria. That is, our technique calls for a weaker notion of completeness. We evaluate our technique through experiments on both real-world and synthetic event logs, and the conformance checking results demonstrate the effectiveness of our technique and its relative advantages compared with state-of-the-art approaches.