
Showing papers in "International Journal of Grid and Distributed Computing in 2013"


Journal ArticleDOI
TL;DR: This work compares many employed and proposed pricing models and techniques, highlights the pros and cons of each, and finds that most approaches are theoretical and not implemented in the real market, although their simulation results are very promising.
Abstract: Cloud computing is emerging as a promising field offering a variety of computing services to end users. These services are offered at different prices using various pricing schemes and techniques. End users will favor the service provider offering the best QoS at the lowest price. Therefore, applying a fair pricing model will attract more customers and achieve higher revenues for service providers. This work focuses on comparing many employed and proposed pricing models and techniques and highlights the pros and cons of each. The comparison is based on many aspects, such as fairness, pricing approach, and utilization period. Such an approach provides a solid ground for designing better models in the future. We have found that most approaches are theoretical and not implemented in the real market, although their simulation results are very promising. Moreover, most of these approaches are biased toward the service provider.

227 citations


Journal ArticleDOI
TL;DR: The basic principles of standard PSO are elaborated and the existing work on the convergence analysis of PSO in the literature is thoroughly surveyed, which plays an important role in establishing a solid theoretical foundation for the PSO algorithm.
Abstract: Particle swarm optimization (PSO) is a population-based stochastic optimization technique originating from artificial life and evolutionary computation. PSO is motivated by the social behavior of organisms, such as bird flocking, fish schooling and human social relations. Its weak constraints on the continuity of the objective function and its ability to adapt to dynamic environments make PSO one of the most important swarm intelligence algorithms. However, compared to the many modified versions of PSO and their applications in many domains, there has been very little research on PSO's convergence analysis. The current paper therefore begins by elaborating the basic principles of standard PSO. Then the existing work on the convergence analysis of PSO in the literature is thoroughly surveyed, which plays an important role in establishing a solid theoretical foundation for the PSO algorithm. Finally, some important conclusions and possible directions for future research on PSO are proposed.
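
As a point of reference for the update rules whose convergence such analyses study, here is a minimal sketch of standard (inertia-weight) PSO. The objective, bounds, and the parameter values w, c1, c2 are common textbook defaults, not values taken from this survey.

```python
# Minimal standard PSO sketch; w, c1, c2 are common textbook defaults.
import numpy as np

def pso(objective, dim=2, n_particles=30, iters=100,
        w=0.729, c1=1.49445, c2=1.49445, bounds=(-5.0, 5.0)):
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                    # velocities
    pbest = x.copy()                                    # personal bests
    pbest_val = np.apply_along_axis(objective, 1, x)
    gbest = pbest[pbest_val.argmin()].copy()            # global best

    for _ in range(iters):
        r1, r2 = np.random.rand(2, n_particles, dim)
        # Canonical velocity and position update rules of standard PSO.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(objective, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

best, val = pso(lambda p: np.sum(p ** 2))  # sphere function demo
```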

52 citations


Journal ArticleDOI
TL;DR: Key techniques to reduce energy consumption and CO2 emissions, which can cause severe health issues, are discussed, and various existing green cloud architectures are reviewed with their pros and cons.
Abstract: Cloud computing provides computing power and resources as a service to users across the globe. This scheme was introduced as a means to an end for customers worldwide, providing high performance at a cheaper cost compared to dedicated high-performance computing machines. This provision requires huge data centers tightly coupled with the system, whose increasing use yields heavy energy consumption and huge CO2 emissions. Since energy has been a prime concern of late, this issue gave rise to green cloud computing, which provides techniques and algorithms to reduce energy wastage by incorporating its reuse. In this survey we discuss key techniques to reduce energy consumption and CO2 emissions, which can cause severe health issues. We begin with a discussion of green metrics appropriate for data centers and then throw light on green scheduling algorithms that facilitate reductions in energy consumption and CO2 emission levels in existing systems. The various existing architectures related to the green cloud are also discussed in this paper, with their pros and cons.

47 citations


Journal ArticleDOI
TL;DR: This paper presents a comprehensive literature review of MCC and its security issues and challenges, and concludes that much work remains to be done to produce a secure MCC environment.
Abstract: Cloud computing is proving itself an emerging technology in the IT world, providing a novel business model for organizations to utilize software, applications and hardware resources without any upfront investment. With the broad development of mobile applications and advancements in cloud computing, a new expansion is expected in the form of mobile cloud computing (MCC). MCC provides a platform where mobile users make use of cloud services on mobile devices. The use of MCC mitigates the performance, compatibility, and resource-scarcity issues of the mobile computing environment. Despite the astonishing advancements achieved by MCC, its adoption is still below expectations because of the associated risks in terms of security and privacy. These risks play an important role in preventing organizations from adopting the MCC environment. A significant amount of research is in progress to reduce these security concerns, but much work remains to be done to produce a secure MCC environment. This paper presents a comprehensive literature review of MCC and its security issues and challenges.

35 citations


Journal ArticleDOI
Tao Gu, Chuang Zuo, Qun Liao, Yulu Yang, Tao Li 
TL;DR: Theoretical analysis proves that the proposal effectively reduces data transmission overhead, and experimental results on real applications show that the data prefetching mechanism can reduce data transmission time by up to 94%.
Abstract: MapReduce is an effective programming model for large-scale data-intensive computing applications. Hadoop, an open-source implementation of MapReduce, has been widely used. The communication overhead of transmitting large data sets greatly affects Hadoop's performance. In consideration of data locality, Hadoop preferentially schedules tasks to nodes near the data locations to decrease data transmission overhead, which works well in homogeneous and dedicated MapReduce environments. However, due to practical considerations about cost and resource utilization, it is common to maintain heterogeneous clusters or to share resources among multiple users. Unfortunately, it is difficult to take advantage of data locality in these heterogeneous or shared environments. To improve the performance of MapReduce in heterogeneous or shared environments, a data prefetching mechanism is proposed in this paper, which can fetch the data to the corresponding compute nodes in advance. Theoretical analysis proves that the proposal reduces data transmission overhead effectively. The mechanism is implemented and evaluated on Hadoop-1.0.4. Experimental results on real applications show that the data prefetching mechanism can reduce data transmission time by up to 94%.
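
The core idea lends itself to a small illustration. The sketch below is not the authors' implementation; it only shows how starting a background copy of a non-local input split lets the transfer overlap with earlier computation. All names and timings are hypothetical.

```python
# Illustrative sketch: prefetch non-local splits in the background so data
# transfer overlaps with the processing of earlier, already-local splits.
import threading, time

def prefetch(split_id, done_events):
    time.sleep(0.1)                     # stands in for an HDFS block copy
    done_events[split_id].set()

def run_tasks(task_splits, local_splits):
    done = {s: threading.Event() for s in task_splits}
    for s in task_splits:
        if s in local_splits:
            done[s].set()               # data-local: nothing to fetch
        else:                           # remote: start copying now
            threading.Thread(target=prefetch, args=(s, done)).start()
    for s in task_splits:
        done[s].wait()                  # split is on the node by now
        print(f"processing split {s}")

run_tasks(["s1", "s2", "s3"], local_splits={"s1"})
```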

15 citations


Journal Article
TL;DR: Three strategies are presented for building Bloom filters over large datasets using MapReduce, with cost models characterized in order to find ways of improving the performance of two-way and multi-way joins.
Abstract: The MapReduce framework has been widely used to process and analyze large-scale datasets over large clusters. As an essential operation, the join among large datasets has attracted more and more attention in recent years due to the adoption of MapReduce. Many strategies have been proposed to improve the efficiency of distributed joins, among which the Bloom filter is a successful one. However, the Bloom filter's potential has not yet been fully exploited, especially in the MapReduce environment. In this paper, three strategies are presented to build the Bloom filter for large datasets using MapReduce. Based on these strategies, we design two algorithms for two-way joins and one algorithm for multi-way joins. The experimental results show that our algorithms can significantly improve the efficiency of current join algorithms. Moreover, cost models of these algorithms are characterized in order to find ways of improving the performance of two-way and multi-way joins.
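
A minimal sketch of the underlying Bloom-filter join idea, assuming illustrative sizes and a SHA-256-based hash family: the join keys of the smaller relation are hashed into a bit array, which the map phase then uses to discard tuples of the larger relation that cannot join.

```python
# Build a Bloom filter over the small relation's join keys, then filter
# the large relation before the expensive join. Sizes are illustrative.
import hashlib

class BloomFilter:
    def __init__(self, m=1 << 16, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _hashes(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for pos in self._hashes(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._hashes(key))

# Build phase over the small relation R, filter phase over the large S.
bf = BloomFilter()
for key, _ in [("k1", "r1"), ("k2", "r2")]:                   # relation R
    bf.add(key)
survivors = [(k, v) for k, v in [("k1", "s1"), ("k9", "s9")]  # relation S
             if bf.might_contain(k)]                          # "k9" dropped
```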

15 citations


Journal ArticleDOI
TL;DR: The model introduces the concept of an exclusive managerial role, which extends access control from static protection of resources to dynamic authorization of managerial roles, and systematically describes the approach of role permission activation.
Abstract: In view of malicious insider attacks on cloud computing environments, a new Context-Aware Access Control Model for cloud computing (CAACM) is presented. According to the characteristics of cloud computing, we take spatial state, temporal state and platform trust level as context. The model establishes mechanisms of authorization from cloud management roles to objects, which enables dynamic activation of role permissions by associating cloud management roles with context. It also achieves fine-grained access control on cloud objects by supervising the permissions of management roles over their full life cycle. Moreover, it introduces the concept of an exclusive managerial role, which extends access control from static protection of resources to dynamic authorization of managerial roles. Further, it systematically describes the approach of role permission activation. CAACM is formally proven to be safe, which lays the groundwork for its deployment in cloud computing systems.
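
A hypothetical sketch of the kind of context check CAACM describes, with all role names, zones and thresholds invented for illustration: a management role's permission is activated only while the spatial state, temporal state and platform trust level all satisfy the constraints bound to that role.

```python
# Hypothetical context-aware permission activation; not the paper's model.
from datetime import datetime, time

ROLE_CONTEXT = {
    "storage_admin": {
        "allowed_zones": {"datacenter-A"},        # spatial constraint
        "work_hours": (time(8, 0), time(18, 0)),  # temporal constraint
        "min_trust_level": 3,                     # platform trust level
    }
}

def activate_permission(role, zone, now, trust_level):
    ctx = ROLE_CONTEXT.get(role)
    if ctx is None:
        return False
    start, end = ctx["work_hours"]
    # Permission activates only while all three context conditions hold.
    return (zone in ctx["allowed_zones"]
            and start <= now.time() <= end
            and trust_level >= ctx["min_trust_level"])

print(activate_permission("storage_admin", "datacenter-A",
                          datetime(2013, 5, 6, 10, 30), trust_level=4))
```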

14 citations


Journal ArticleDOI
TL;DR: The experimental results show that the RRK algorithm, compared with the MTN algorithm and the Weight algorithm, can not only ensure that tasks are preferentially assigned to virtual resources of high relevance, but also reduce the tasks' average waiting time for assignment to virtual resource groups.
Abstract: Load balancing is one of the key techniques in virtual cluster systems. In view of the fact that resource relevance has not been considered in load balancing algorithms under current virtual cluster application environments, this paper proposes a load balancing algorithm with key resource relevance (RRK). First, virtual resources are divided into groups by category. Then, considering the relevance between user tasks and each virtual resource group as well as the integrated load of each virtual resource group, the priorities of the tasks assigned to each virtual resource group are dynamically calculated, so that tasks can be assigned to the corresponding virtual resource group based on these priorities; within a group, tasks are distributed depending on the load values of the virtual resources and the weights of the resources they need to consume. The experimental results show that the RRK algorithm, compared with the MTN algorithm and the Weight algorithm, can not only ensure that tasks are preferentially assigned to virtual resources of high relevance, but also reduce the tasks' average waiting time for assignment to virtual resource groups.
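
A rough sketch of the group-selection step, under the assumption that relevance is measured as resource overlap and combined linearly with the integrated load; the paper's exact formulas are not reproduced here.

```python
# Rank virtual resource groups by combining task-to-group relevance with
# the group's integrated load; alpha and the relevance measure are assumed.
def pick_group(task_needs, groups, alpha=0.7):
    def relevance(g):        # overlap between needed and offered resources
        return len(task_needs & g["resources"]) / len(task_needs)
    def priority(g):         # high relevance, low load -> high priority
        return alpha * relevance(g) - (1 - alpha) * g["load"]
    return max(groups, key=priority)

groups = [
    {"name": "gpu-group", "resources": {"gpu", "cpu"}, "load": 0.8},
    {"name": "cpu-group", "resources": {"cpu", "mem"}, "load": 0.3},
]
print(pick_group({"cpu", "mem"}, groups)["name"])   # -> cpu-group
```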

14 citations


Journal ArticleDOI
TL;DR: This paper surveys the state-of-the-art routing protocols and compares them with respect to the important challenging issues, highlighting the performance issues of each routing technique.
Abstract: Directional antennas have the potential to provide a fundamental breakthrough in ad hoc network capacity. The omni-directional nature of transmission restricts network capacity: distributing energy in all directions other than that of the intended destination node not only generates unnecessary interference at neighboring nodes but also decreases the potential transmission range. Directional antenna systems are increasingly being recognized as a powerful way of increasing the capacity, connectivity, and covertness of MANETs. In this paper, we survey the state-of-the-art routing protocols and compare them with respect to the important challenging issues. We study the advantages and disadvantages of routing protocols using directional antennas and also highlight the performance issues of each routing technique. Finally, an explicit comparison table is presented and discussed.

11 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel hybrid distributed energy-efficient heterogeneous clustered protocol for wireless sensor networks (HDEEHC), which achieves a longer lifetime and more stability than HEED clustering protocols in heterogeneous environments.
Abstract: Clustering is a key technique used to prolong the lifetime of a sensor network by reducing energy consumption; it can also improve scalability. In this paper, we propose a novel hybrid distributed energy-efficient heterogeneous clustered protocol for wireless sensor networks (HDEEHC). The HDEEHC protocol periodically selects cluster heads according to a hybrid of a primary parameter and a secondary parameter. The residual energy and the type of a node form the primary parameter in the election of a cluster head, and the proximity to its neighbors or the node degree is the secondary one. Nodes with high initial and residual energy have a higher chance of becoming cluster heads than low-energy nodes. The clustering does not depend on the network topology or size. Finally, the simulation results show that HDEEHC achieves a longer lifetime and more stability than the HEED clustering protocol in heterogeneous environments.
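
A sketch of the hybrid election idea under stated assumptions: the probability of becoming a cluster head grows with initial and residual energy (the primary parameter), and node degree acts as the secondary parameter. The constants and the tie-breaking rule are illustrative, not the paper's.

```python
# Illustrative hybrid cluster-head election: energy-weighted probability
# first, node degree as the secondary criterion. Constants are assumed.
import random

def elect_cluster_heads(nodes, p_opt=0.1):
    heads = []
    for n in nodes:
        # Advanced (high initial energy) nodes get a higher base probability.
        base = p_opt * (2.0 if n["type"] == "advanced" else 1.0)
        prob = base * n["residual"] / n["initial"]
        if random.random() < prob:
            heads.append(n)
    # Secondary parameter: prefer well-connected candidates.
    return sorted(heads, key=lambda n: -n["degree"])

nodes = [{"id": i, "type": "advanced" if i < 2 else "normal",
          "initial": 2.0 if i < 2 else 1.0,
          "residual": random.uniform(0.2, 1.0),
          "degree": random.randint(2, 8)} for i in range(20)]
print([n["id"] for n in elect_cluster_heads(nodes)])
```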

9 citations


Journal ArticleDOI
TL;DR: Two robust watermarking algorithms for Chinese text images are presented: one embeds the watermark by modulating the character spacings, and the other is based on the relative heights of characters in the same text line.
Abstract: Digital text watermarking has been a popular way to discourage illicit reproduction of documents by embedding copyright information into them. This study presents two robust watermarking algorithms for Chinese text images. One embeds the watermark by modulating the character spacings, and the other is based on the relative heights of characters in the same text line. In the embedding process, the characters are first segmented by projecting the image horizontally and then vertically, and the rough segmentation is refined according to the peculiarities of Chinese characters. On the basis of this character segmentation algorithm, watermark embedding is achieved by shifting characters up or down (or left or right). In the extraction process, pre-processing operations such as binarization and image deskewing are performed first to reduce the impact of the print-scan operation. Then the messages are extracted by comparing the character spacings or the relative heights of characters. Experimental results show that the proposed methods maintain high extraction accuracy under the tampering introduced by print-scan operations.

Journal ArticleDOI
TL;DR: An active islanding detection method based on frequency-reactive power feedback is proposed, which breaks the stability point that would otherwise persist under power balance and detects the system frequency deviation in real time.
Abstract: Islanding detection is one of the essential features of photovoltaic grid-connected systems; detection performance is directly related to the safe operation of the equipment. After summarizing existing islanding detection methods, an active method based on frequency-reactive power feedback is proposed. Once an islanding condition occurs, the method introduces a component associated with the load's frequency-reactive power characteristic as the input of a reactive power perturbation, breaking the stability point that would otherwise persist when power is balanced. It then detects the system frequency deviation in real time and feeds it back into the reactive perturbation function, forming a positive feedback loop. The frequency drift is thus accelerated until the frequency of the output voltage of the photovoltaic grid-connected inverter exceeds a preset threshold value, at which point the islanding is detected. Simulation shows that the proposed method has no detection dead zone, is fast, and has little influence on power quality.

Journal ArticleDOI
TL;DR: New trends in the present-day business environment are analyzed alongside the hardware and software industry developments that led to the SaaS model, and the characteristics and features that a storage pattern for a multi-tenant SaaS system needs to possess in order to put this concept into practice are examined.
Abstract: Heading into the next decade, a major paradigm shift has been observed in the way software services are provided to enterprises and the corporate sector. Corporations and enterprises are switching to web-hosted applications offered as a service by software vendors, while on-premises LOB (Line of Business) applications are receding. SaaS (Software as a Service) is the new concept. Adoption of SaaS, however, requires that the applications provided as a service be generalized for users or groups of users, and this requires a vast storage space to be allocated to each user or user group. The users or user groups ordinarily correspond to a company or group of companies/businesses and are termed tenants. In this regard, the architecture of SaaS applications needs to be customized to support certain characteristics, e.g., configurability, maintainability and scalability, and to support high-volume storage for hosting resources made available to a diverse number of users. This paper first analyzes new trends in the present-day business environment alongside the hardware and software industry developments that led to the SaaS model, and then looks into the characteristics and features that a storage pattern for a multi-tenant SaaS system needs to possess in order to put this concept into practice.
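
One widely used storage pattern for multi-tenant SaaS, shown here as a minimal sketch with an invented schema, is the shared-schema approach: all tenants share one table, and every row and query is scoped by a tenant_id.

```python
# Shared-schema multi-tenant storage sketch; table and names are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE orders (
                tenant_id TEXT NOT NULL,
                order_id  INTEGER,
                amount    REAL,
                PRIMARY KEY (tenant_id, order_id))""")
db.execute("INSERT INTO orders VALUES ('acme', 1, 99.5)")
db.execute("INSERT INTO orders VALUES ('globex', 1, 10.0)")

def orders_for(tenant):
    # Scoping every query by tenant_id is what isolates tenants here.
    return db.execute("SELECT order_id, amount FROM orders "
                      "WHERE tenant_id = ?", (tenant,)).fetchall()

print(orders_for("acme"))   # only acme's rows
```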

Journal Article
TL;DR: This paper defines cloud service access control as a process and extends the cloud client's related information into a fuzzy set used as the authentication condition for the exchange, according to the information's security level.
Abstract: Cloud computing provides elastic, scalable, on-demand IT services for individuals and organizations. In cloud computing, everything is offered as a service. Cloud clients enjoy convenient and efficient service, but at the same time encounter new issues. Cloud clients need to provide authentication information to access a service, and this information often contains a lot of sensitive data. The services provided by clouds are dynamic, diverse, and context-related. Traditional identity authentication methods, which implement coarse-grained decisions to allow or prohibit access, are no longer suited to service-oriented cloud computing. In this paper, we propose a service-oriented identity authentication privacy protection method. In this method, we define cloud service access control as a process and extend the cloud client's related information into a fuzzy set used as the authentication condition for the exchange. According to the information's security level, the method dynamically opens the corresponding service access control and provides fine-grained, service-oriented identity authentication, guaranteeing globally minimal disclosure of sensitive information and maximally protecting individual privacy.

Journal ArticleDOI
TL;DR: A new heuristic algorithm for scheduling meta-tasks in grid computing systems is presented, which considers the execution time and the machine state simultaneously through a mapping function.
Abstract: Grid computing is a promising technology for future computing platforms and is expected to provide easier access to remote computational resources that are usually locally limited. Scheduling is one of the active research topics in grid environments. The goal of grid task scheduling is to achieve high system throughput and to allocate various computing resources to applications. The complexity of the scheduling problem increases with the size of the grid, and it becomes highly difficult to solve effectively. Many different methods have been proposed to solve this problem. Some of these methods are based on heuristic techniques that provide an optimal or near-optimal solution for large grids. In this paper, a new heuristic algorithm for scheduling meta-tasks in grid computing systems is presented, which considers the execution time and the machine state simultaneously through a mapping function. According to the experimental results, the proposed algorithm confidently demonstrates its competitiveness with previously proposed algorithms.
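
For illustration, here is a min-min-style mapping heuristic in the spirit the abstract describes, where the machine state enters as each machine's ready time; the paper's actual mapping function is not reproduced.

```python
# Min-min style meta-task mapping: score each (task, machine) pair by
# expected completion time, where ready[m] stands in for machine state.
def schedule(etc, ready):
    """etc[t][m]: estimated execution time of task t on machine m;
    ready[m]: time at which machine m becomes free (its current state)."""
    assignment = {}
    tasks = set(range(len(etc)))
    while tasks:
        # Pick the task whose best completion time is smallest (min-min).
        t, m = min(((t, min(range(len(ready)),
                            key=lambda m: ready[m] + etc[t][m]))
                    for t in tasks),
                   key=lambda tm: ready[tm[1]] + etc[tm[0]][tm[1]])
        assignment[t] = m
        ready[m] += etc[t][m]   # update the machine's state
        tasks.remove(t)
    return assignment

print(schedule([[3, 5], [2, 4], [6, 1]], ready=[0.0, 0.0]))
```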

Journal ArticleDOI
TL;DR: PolyS, an improved version of Hamsa, is a network-based automated signature generation scheme to thwart zero-day polymorphic worms, built on a novel architecture that reduces the noise in the suspicious traffic pool and thus enhances the accuracy of worm signatures.
Abstract: With the growing sophistication of computer worms, it is very important to detect and prevent them quickly and accurately in the early phase of infection. Traditional signature-based IDSs, though effective for known attacks, fail to handle zero-day attacks promptly. Recent work on polymorphic worms does not guarantee accurate signatures in the presence of noise in suspicious flow samples. In this paper we propose PolyS, an improved version of Hamsa, a network-based automated signature generation scheme to thwart zero-day polymorphic worms. We contribute a novel architecture that reduces the noise in the suspicious traffic pool, thus enhancing the accuracy of worm signatures. We also propose a signature generation algorithm that matches polymorphic worm payloads with higher speed and memory efficiency. Analysis shows that our system is fast, accurate, attack-resilient and capable of generating quality signatures with low false positive and false negative rates.

Journal ArticleDOI
TL;DR: Evaluating the sensing performance of an energy detector in multihop networks over Nakagami-n fading channels emphasizes that multi-branch networking (cooperation) is an important tool to combat the effect of fading and to improve the detection probability.
Abstract: Spectrum sensing is an important activity of cognitive radios over fading channels. Proper sensing performance depends upon the fading margin and the number of relays within a wireless link. This paper evaluates the sensing performance of an energy detector in multihop networks over Nakagami-n fading channels. Decode-and-forward relays are considered for the analysis because of their best performance characteristics. Further, digital communication is aided by digital relaying techniques to achieve better performance results. The results yield the optimum value of the average SNR based on the fading margin and the number of relays. The results emphasize that multi-branch networking (cooperation) is an important tool to combat the effect of fading and to improve the detection probability.
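
For reference, these are the standard closed-form energy-detector expressions that analyses of this kind build on (following Urkowitz and Digham et al., not formulas quoted from this paper), where u is the time-bandwidth product, \lambda the detection threshold and \gamma the SNR:

```latex
% False-alarm and detection probability of an energy detector over AWGN;
% over a fading channel, P_d is averaged over the SNR distribution.
P_f = \frac{\Gamma\left(u, \lambda/2\right)}{\Gamma(u)}, \qquad
P_d = Q_u\left(\sqrt{2\gamma}, \sqrt{\lambda}\right), \qquad
\bar{P}_d = \int_0^{\infty} Q_u\left(\sqrt{2\gamma}, \sqrt{\lambda}\right) f_{\gamma}(\gamma)\, d\gamma
```

Here Q_u is the generalized Marcum Q-function and f_gamma is the SNR density induced by the fading model under study.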

Journal ArticleDOI
TL;DR: This paper analyzes the indicator of power loss and takes it as the objective function, using the TSPSO (Tabu Search mechanism Particle Swarm Optimization) algorithm to study the problem of the positions, capacities and number of DGs.
Abstract: When distributed generation (DG) is added, the power system structure changes from a traditional open network to a complicated new model with distributed generators. The voltage and power losses of the traditional network are influenced by the DG connection location, the reactive power, the active power and the number of DG units. The purpose of connecting DG is to improve the reliability of the system, reduce network losses and reduce costs. To achieve this goal, this paper analyzes the indicator of power loss and takes it as the objective function. Considering the superior properties of the particle swarm optimization algorithm in solving discrete-valued problems, the algorithm is improved with a tabu search mechanism; we use the resulting TSPSO (Tabu Search mechanism Particle Swarm Optimization) algorithm to study the problem of the positions, capacities and number of DGs. Finally, the validity of the method is verified by simulation experiments.

Journal ArticleDOI
TL;DR: Simulation results and comparisons demonstrate that the proposed autonomic, power-aware, SLA-oriented two-tier cloud resource orchestration architecture significantly surpasses previous approaches in terms of total energy consumption, while maintaining web applications' SLA objectives despite dynamic workload scenarios.
Abstract: The illusion of endless resource provisioning is the mainstay of the cloud computing paradigm. However, the unpredictable volatility of web applications' workload demand can severely hinder the performance of cloud computing platforms and even expose cloud resources to possible devastation. Accordingly, this work proposes an autonomic, power-aware, SLA-oriented two-tier architecture for cloud resource orchestration. Despite the complexity and uncertainty of workload fluctuations, the proposed architecture is geared toward improving cloud resource utilization and ensuring explicit guarantees on web applications' responsiveness obligations, while also minimizing power consumption. The architecture consolidates heuristic methodologies with control theory approaches in a hierarchical resource orchestration structure. First, an autonomic global controller is presented, which exploits a heuristic methodology for mapping virtual machines (VMs) to the appropriate cloud resources according to a heuristic placement strategy with multidimensional objectives. Second, a proactive fuzzy-logic-based local controller is proposed, which confronts sustained workload fluctuations via proactive amendment of the placement and provisioning schedules. Furthermore, the local controller maintains an active power management policy, especially during transient usage peaks, thereby mitigating overall costs and extending resource capacity and performance capabilities. Simulation results and comparisons demonstrate that the proposed architecture significantly surpasses previous approaches in terms of total energy consumption, while maintaining web applications' SLA objectives despite dynamic workload scenarios.

Journal ArticleDOI
TL;DR: The proposed routing mechanism applies a stochastic service model to calculate the latency guarantee for any given network link, and can provide better QoS performance for latency-sensitive traffic with improved energy efficiency.
Abstract: With the rapid development of Internet technology and growing QoS requirements, network energy consumption has attracted more and more attention due to the over-provisioning of network resources. Generally, energy saving can be achieved by sacrificing some performance. However, many popular applications require real-time or soft real-time QoS performance to attract potential users, and existing technologies can hardly obtain a satisfying tradeoff between energy consumption and performance. In this paper, a novel energy-aware routing mechanism is presented, aimed at reducing network energy consumption while maintaining satisfying QoS performance for latency-sensitive applications. The proposed routing mechanism applies a stochastic service model to calculate the latency guarantee for any given network link. Based on such a quantitative latency guarantee, we further propose a technique to decide whether a link should be powered down and how long it should be kept in power-saving mode. Extensive experiments are conducted to evaluate the effectiveness of the proposed mechanism, and the results indicate that it can provide better QoS performance for latency-sensitive traffic with improved energy efficiency.

Journal ArticleDOI
TL;DR: In the new model for a similarity metric of web queries using user logs, not only word form but also semantic information is taken into account, and improved recall and precision are shown.
Abstract: The similarity of web queries plays an important role in capturing frequently asked questions and the most popular topics of a search engine, and in automatic query expansion. Accurate measurement of the similarity between queries is crucial. This paper presents a new model for a similarity metric of web queries using user logs and applies it to information retrieval for query expansion. Unlike previous work, the new model takes into account not only word form but also semantic information. Experiments show that using the new model in query expansion improved recall by 8.1 percent and precision by 9.2 percent, which indicates good performance.
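
A toy sketch of the two-signal idea, assuming Jaccard overlap for both signals and an invented mixing weight: word-form similarity comes from the query terms, while the semantic signal comes from the user log (here, the sets of URLs clicked for each query).

```python
# Query similarity mixing surface word overlap with a log-derived semantic
# signal (shared clicked URLs); beta and both measures are assumptions.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def query_similarity(q1, q2, clicks, beta=0.5):
    form = jaccard(set(q1.split()), set(q2.split()))
    semantic = jaccard(clicks.get(q1, set()), clicks.get(q2, set()))
    return beta * form + (1 - beta) * semantic

clicks = {"cheap flights": {"u1", "u2"},
          "low cost airfare": {"u1", "u2", "u3"}}
print(query_similarity("cheap flights", "low cost airfare", clicks))
# No shared words, but shared clicks -> nonzero similarity.
```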

Journal ArticleDOI
TL;DR: The experimental results show that the proposed method can improve the efficiency and precision of image segmentation, and provides an important assisting method for the study of concrete meso-structure CT images in engineering projects.
Abstract: A fast image segmentation method based on multi-scale belief propagation is proposed to solve the concrete CT image segmentation problem. First, building on the characteristics of the belief propagation algorithm, a multi-scale belief propagation (MBP) scheme is proposed. Then, to address the computational complexity of the belief message propagation process, a method to reduce the amount of computation is proposed. Finally, standard images are used to validate the accuracy and speed of the method, which is then applied to concrete CT image segmentation. The experimental results show that the proposed method can improve the efficiency and precision of image segmentation, and provides an important assisting method for the study of concrete meso-structure CT images in engineering projects.

Journal ArticleDOI
TL;DR: A DNA sticker algorithm is proposed for parallel reduction over the finite field GF(2^n), suitable for specific finite fields defined by trinomials or pentanomials.
Abstract: This paper proposes a DNA sticker algorithm for parallel reduction over the finite field GF(2^n). The algorithm is suitable for some specific finite fields defined by trinomials or pentanomials. We use the binary finite field GF(2^163), which is recommended by the National Institute of Standards and Technology (NIST), to describe the details of our algorithm. The solution space of 2^325 cases can be worked through within 3059 DNA steps. This work also presents clear evidence of the ability of DNA computing to perform the complicated mathematical operations required by elliptic curve cryptosystems over finite fields GF(2^n).
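
As a conventional-computing illustration of the operation the DNA algorithm parallelizes, the sketch below reduces a GF(2) polynomial modulo the NIST B-163 pentanomial f(x) = x^163 + x^7 + x^6 + x^3 + 1, encoding polynomials as integer bitmasks.

```python
# Bitwise reduction over GF(2) modulo the NIST B-163 pentanomial.
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # reduction polynomial

def reduce_gf2(c, f=F, n=163):
    # Cancel the leading terms one at a time: if bit i (i >= n) is set,
    # XOR in f shifted so its leading term aligns with x^i.
    for i in range(c.bit_length() - 1, n - 1, -1):
        if c >> i & 1:
            c ^= f << (i - n)
    return c

# Example: reduce x^325, the highest-degree term a product of two
# degree-162 polynomials can produce.
r = reduce_gf2(1 << 325)
assert r.bit_length() <= 163
print(hex(r))
```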

Journal ArticleDOI
TL;DR: This paper exploits both workload dispatching and service provisioning to address the total electricity cost minimization problem, formulated as a hierarchical capacitated median model based on the mixed integer linear programming (MILP) technique.
Abstract: As the demand for online services and cloud computing has kept increasing in recent years, the power usage and cost associated with cloud data centers' operation have risen significantly. Most existing research focuses on reducing the power consumption of data centers. However, the ultimate goal of cloud service operators is to reduce the total operating cost of data centers while guaranteeing quality of service, such as service delay, to end users. This paper exploits both workload dispatching and service provisioning to address the total electricity cost minimization problem. The problem is formulated as a hierarchical capacitated median model based on the mixed integer linear programming (MILP) technique. Extensive evaluations based on real-life electricity price data for multiple data centers show the efficiency and efficacy of our approach.

Journal ArticleDOI
TL;DR: An Energy-aware Iterative Sampling Framework (EISF) for data gathering is proposed to reduce the total number of transmissions by exploiting the correlation among the readings of nearby sensor nodes.
Abstract: Large numbers of nodes are often densely deployed to deliver the desired environmental attributes to the sink in Wireless Sensor Networks (WSNs), so there is a high spatial correlation among the readings of nearby sensor nodes. Given a certain accuracy requirement, only a subset of the sensor nodes should be required to transport data to the sink. We propose an Energy-aware Iterative Sampling Framework (EISF) for data gathering that reduces the total number of transmissions by exploiting this correlation. In our method, all nodes in the WSN compete to become reporting nodes with an energy-related probability, and in each epoch every non-reporting node autonomously determines whether its own readings are redundant by utilizing the overheard packets transmitted by nearby reporting nodes. The redundant nodes are put into sleep mode. After a limited number of iterations, our algorithm selects a set of sampling nodes to transport data with accuracy guarantees. The results of simulation experiments using real data demonstrate that our proposed approach is effective in prolonging the network lifetime.

Journal ArticleDOI
TL;DR: This paper proposes a new parallel version of Agrawal's Apriori algorithm, a core algorithm of association rule mining, and carries out a detailed evaluation of the parallelization techniques and of the impact of combining different types of parallelism (task, data and pipeline) on the effectiveness of the system.
Abstract: Like all other fields of data processing, modern information systems have integrated the results of the advanced technologies of the last decades. These systems contain implicit data that must be extracted and exploited using data mining techniques. Mining association rules, which aims to find interesting association or correlation relationships among large amounts of data, is one of these techniques. It is a two-step process: the first step finds all frequent itemsets, and the second step constructs association rules from these frequent sets. The overall performance of mining association rules is determined by the first step, which therefore becomes the focal problem; it is expensive, with high demands on computation and data access. Parallel computing has a natural role to play here, since parallel computers provide scalability. In this paper, we examine the issue of mining association rules among items in large transaction databases using the Apriori algorithm proposed by Agrawal, and we propose a new parallel version of this algorithm. Our objective is to obtain an efficient parallel execution time, which requires a delicate balance between program granularity and the communication latency (synchronization overhead) between the different granules. Unlike previous work on the parallelization of specific data mining algorithms, our approach is to identify the different granularity levels of parallelism and their impact on performance. In this paper we focus on task and data parallelism (a hybrid approach) under distributed memory. In particular, if communication latency is minimal, then fine-grained partitioning yields the best performance; this is the case when data parallelism is used. If communication latency is large (as in a loosely coupled system), then coarse-grained partitioning is more appropriate. For the target architecture used in this work (distributed shared memory), the problem of load balancing among the nodes becomes a more critical issue in attempts to yield high performance. We have carried out a detailed evaluation of the parallelization techniques and of the impact of combining different types of parallelism (task, data and pipeline) on the effectiveness of the system.
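
To make the data-parallel side concrete, here is a sketch of the count-distribution flavor of parallel Apriori over an invented toy dataset: each worker counts the candidate itemsets over its own partition of the transactions, and the partial counts are summed before pruning by minimum support. It illustrates the general technique, not the authors' exact scheme.

```python
# Count-distribution parallel Apriori sketch: data-parallel candidate
# counting over transaction partitions, then a global support check.
from itertools import combinations
from multiprocessing import Pool
from collections import Counter

def count_partition(args):
    transactions, candidates = args
    c = Counter()
    for t in transactions:
        for cand in candidates:
            if cand <= t:          # candidate itemset appears in t
                c[cand] += 1
    return c

def parallel_counts(partitions, candidates, workers=2):
    with Pool(workers) as pool:
        partials = pool.map(count_partition,
                            [(p, candidates) for p in partitions])
    total = Counter()
    for c in partials:             # sum the per-partition counts
        total.update(c)
    return total

if __name__ == "__main__":
    data = [frozenset("abc"), frozenset("ac"),
            frozenset("bc"), frozenset("ab")]
    cands = [frozenset(c) for c in combinations("abc", 2)]
    counts = parallel_counts([data[:2], data[2:]], cands)
    min_support = 2
    print([set(c) for c in cands if counts[c] >= min_support])
```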

Journal ArticleDOI
TL;DR: The experimental results show that the Adaptive Trust Sampling strategy can adapt to dynamic changes in sample size, effectively reduce the total sample size, mitigate the consumption of system resources to some extent, and achieve the purpose of P2P traffic sampling.
Abstract: This paper focuses on sampling-based Deep Packet Inspection for the traffic of P2P file sharing systems, especially BitTorrent, and proposes a logarithm-based Adaptive Trust Sampling (ATS) strategy for P2P traffic identification. Throughout the sampling-based identification process, the sampling ratio of the current node in a P2P network automatically adjusts and dynamically varies according to the estimate of the P2P traffic ratio over historical cycles. The experimental results show that the Adaptive Trust Sampling strategy can adapt to dynamic changes in sample size, effectively reduce the total sample size, mitigate the consumption of system resources to some extent, and achieve the purpose of P2P traffic sampling.
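
The exact ATS formula is not given in the abstract, so the following is only a hypothetical sketch of a logarithm-based adjustment: the next sampling ratio is derived from an exponentially weighted estimate of the P2P share over historical cycles, mapped through a logarithmic curve so it reacts quickly at low shares and saturates at high ones.

```python
# Hypothetical adaptive sampling-ratio update; not the paper's ATS formula.
import math

def next_sampling_ratio(p2p_share_history, r_min=0.01, r_max=0.5):
    # Exponentially weighted estimate of the recent P2P traffic share.
    est, w = 0.0, 0.0
    for age, share in enumerate(reversed(p2p_share_history)):
        weight = 0.5 ** age
        est += weight * share
        w += weight
    est /= max(w, 1e-9)
    # Logarithmic mapping of the estimate onto [r_min, r_max].
    ratio = r_min + (r_max - r_min) * math.log1p(est) / math.log(2)
    return min(max(ratio, r_min), r_max)

print(next_sampling_ratio([0.10, 0.25, 0.40]))  # recent cycles' estimates
```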

Journal ArticleDOI
TL;DR: Experimental results show that the proposed approach to test suite reduction with test requirement partition helps generate a reduced test suite that tests all the test requirements sufficiently, compared with existing methods, at the cost of a moderate loss in fault detection capability.
Abstract: Test suite reduction aims at improving the effectiveness of testing and cutting down the test cost with the fewest test cases, under the condition of satisfying all testing objectives. This paper proposes a new method for test suite reduction with test requirement partition. First, it partitions the set of all available test cases based on the test requirements. After that, a test suite is generated from the partition, and a smaller test suite is then obtained by further reduction according to the relationship between test requirements and test suites. Finally, the experimental results show that the proposed approach helps generate a reduced test suite that tests all the test requirements sufficiently, compared with existing methods, at the cost of a moderate loss in fault detection capability.
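
A compact sketch that mirrors the outline of the method under stated assumptions (equivalence partition by covered requirements, then a greedy pass over representatives); the paper's precise reduction rules are not reproduced.

```python
# Requirement-partition-based test suite reduction sketch.
def reduce_suite(coverage):
    """coverage: {test_case: frozenset(requirements covered)}"""
    # Partition: tests covering exactly the same requirements are equivalent.
    partition = {}
    for test, reqs in coverage.items():
        partition.setdefault(reqs, test)       # keep one representative
    # Greedy selection over representatives, largest contribution first.
    reduced, covered = [], set()
    for reqs, test in sorted(partition.items(),
                             key=lambda kv: -len(kv[0])):
        if not reqs <= covered:                # adds uncovered requirements
            reduced.append(test)
            covered |= reqs
    return reduced

suite = {"t1": frozenset({"r1", "r2"}), "t2": frozenset({"r1", "r2"}),
         "t3": frozenset({"r3"}), "t4": frozenset({"r2"})}
print(reduce_suite(suite))   # t2 is redundant with t1; t4 adds nothing
```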

Journal ArticleDOI
TL;DR: An enhanced data collection protocol (ECTP) based on the Collection Tree Protocol (CTP) is presented, which introduces congestion detection and congestion avoidance into CTP and improves the efficiency and performance of data collection tasks in WSNs.
Abstract: In wireless sensor networks (WSNs), the communication radius of a single sensor node is constrained. Thus, many-to-one and multi-hop routing protocols are designed to relay collected data back to the sink node. One of the challenges faced by present routing protocols is to prevent or reduce traffic congestion, which inevitably causes a high packet drop rate, low energy efficiency, and long end-to-end delay. This paper presents an enhanced data collection protocol (ECTP) based on the Collection Tree Protocol (CTP), which introduces congestion detection and congestion avoidance into CTP. We have run test-bed experiments with 10 TelosB motes and compared the results of our ECTP with the original CTP protocol and with another enhanced CTP that adopts the main mechanisms of ECODA. According to the experimental results, our ECTP improves the efficiency and performance of data collection tasks in the WSN compared to the other two.

Journal Article
TL;DR: In this paper, the authors propose a new preshuffling strategy in Hadoop to reduce the high network loads imposed by shuffle-intensive applications, in which a 2-stage pipeline is incorporated with the preshuffling scheme.
Abstract: MapReduce has become an important distributed processing model for large-scale data-intensive applications like data mining and web indexing. Hadoop, an open-source implementation of MapReduce, is widely used for short jobs requiring low response time. In this paper, we propose a new preshuffling strategy in Hadoop to reduce the high network loads imposed by shuffle-intensive applications. Designing new shuffling strategies is very appealing for Hadoop clusters where the network interconnect becomes the performance bottleneck when the cluster is shared among a large number of applications; the interconnect is likely to become a scarce resource when many shuffle-intensive applications share a Hadoop cluster. We implemented the push model and the preshuffling scheme in the Hadoop system, incorporating a 2-stage pipeline with the preshuffling scheme. Using two Hadoop benchmarks running on a 10-node cluster, we conducted experiments showing that preshuffling-enabled Hadoop clusters are faster than native Hadoop clusters. For example, the push model and the preshuffling scheme powered by the 2-stage pipeline shorten the execution times of the WordCount and Sort Hadoop applications by an average of 10% and 14%, respectively.