
Showing papers by "Albert Y. Zomaya published in 2015"


Journal ArticleDOI
TL;DR: A brief overview is given of Big Data and data-intensive problems, including the analysis of remote sensing (RS) Big Data, Big Data challenges, and current techniques and research efforts for processing RS Big Data.

460 citations


Journal ArticleDOI
TL;DR: A novel offloading system to design robust offloading decisions for mobile services is proposed and its approach considers the dependency relations among component services and aims to optimize execution time and energy consumption of executing mobile services.
Abstract: The development of cloud computing and virtualization techniques enables mobile devices to overcome their severe resource constraints by offloading computation and migrating several computation parts of an application to powerful cloud servers. A mobile device should judiciously determine whether to offload computation as well as what portion of an application should be offloaded to the cloud. This paper considers a mobile computation offloading problem where multiple mobile services in workflows can be invoked to fulfill their complex requirements, and decides whether the services of a workflow should be offloaded. Due to the mobility of portable devices, unstable connectivity of mobile networks can impact the offloading decision. To address this issue, we propose a novel offloading system to design robust offloading decisions for mobile services. Our approach considers the dependency relations among component services and aims to optimize the execution time and energy consumption of executing mobile services. To this end, we also introduce a mobility model and a trade-off fault-tolerance mechanism for the offloading system. A genetic algorithm (GA) based offloading method is then designed and implemented after carefully modifying parts of a generic GA to match our special needs for the stated problem. Experimental results are promising and show near-optimal solutions for all of our studied cases with almost linear algorithmic complexity with respect to the problem size.
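The abstract above describes a GA-based offloading method without giving implementation details. As a rough illustration of the general idea only (all cost figures, weights, and parameter names below are hypothetical, not taken from the paper), a minimal genetic algorithm over binary offload plans might look like:

```python
import random

# Hypothetical per-service costs (illustrative only, not from the paper):
# execution time and energy for running each service locally vs. offloaded.
LOCAL_TIME   = [4.0, 3.0, 5.0, 2.0]
CLOUD_TIME   = [1.0, 1.5, 2.0, 3.5]   # includes network transfer overhead
LOCAL_ENERGY = [2.0, 1.5, 2.5, 1.0]
CLOUD_ENERGY = [0.5, 0.6, 0.8, 1.2]   # radio energy spent while offloading
W_TIME, W_ENERGY = 0.5, 0.5           # time/energy trade-off weights

def fitness(plan):
    """Lower is better: weighted time + energy of a 0/1 offload plan."""
    t = sum(CLOUD_TIME[i] if bit else LOCAL_TIME[i] for i, bit in enumerate(plan))
    e = sum(CLOUD_ENERGY[i] if bit else LOCAL_ENERGY[i] for i, bit in enumerate(plan))
    return W_TIME * t + W_ENERGY * e

def evolve(pop_size=20, generations=50, mutation_rate=0.1, seed=1):
    rng = random.Random(seed)
    n = len(LOCAL_TIME)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 2]           # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:   # occasional bit-flip mutation
                j = rng.randrange(n)
                child[j] ^= 1
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

best = evolve()  # a 0/1 offload decision per component service
```

The paper's actual method additionally models inter-service dependencies, mobility, and fault tolerance; this sketch only shows the GA skeleton over offload decisions.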

261 citations


Journal ArticleDOI
01 Mar 2015
TL;DR: This paper studies data replication in cloud computing data centers, considering both energy efficiency and bandwidth consumption of the system, in addition to the improved quality of service (QoS) obtained as a result of the reduced communication delays.
Abstract: Cloud computing is an emerging paradigm that provides computing, communication and storage resources as a service over a network. Communication resources often become a bottleneck in service provisioning for many cloud applications. Therefore, data replication, which brings data (e.g., databases) closer to data consumers (e.g., cloud applications), is seen as a promising solution. It allows minimizing network delays and bandwidth usage. In this paper we study data replication in cloud computing data centers. Unlike other approaches available in the literature, we consider both energy efficiency and bandwidth consumption of the system. This is in addition to the improved quality of service (QoS) obtained as a result of the reduced communication delays. The evaluation results, obtained from both a mathematical model and extensive simulations, help to unveil performance and energy efficiency tradeoffs as well as guide the design of future data replication solutions.

208 citations


Journal ArticleDOI
TL;DR: Some of the main challenges of privacy in the IoT as well as opportunities for research and innovation are discussed and some of the ongoing research efforts that address IoT privacy issues are introduced.
Abstract: Over the last few years, we've seen a plethora of Internet of Things (IoT) solutions, products, and services make their way into the industry's marketplace. All such solutions will capture large amounts of data pertaining to the environment as well as their users. The IoT's objective is to learn more and better serve system users. Some IoT solutions might store data locally on devices ("things"), whereas others might store it in the cloud. The real value of collecting data comes through data processing and aggregation on a large scale, where new knowledge can be extracted. However, such procedures can lead to user privacy issues. This article discusses some of the main challenges of privacy in the IoT as well as opportunities for research and innovation. The authors also introduce some of the ongoing research efforts that address IoT privacy issues.

192 citations


Journal ArticleDOI
TL;DR: This survey investigates the similarities and differences of remote data auditing frameworks on the basis of a thematic taxonomy, to diagnose significant issues and explore major outstanding challenges in the domain of distributed clouds.
Abstract: Cloud computing has emerged as a long-dreamt vision of the utility computing paradigm that provides reliable and resilient infrastructure for users to remotely store data and use on-demand applications and services. Currently, many individuals and organizations mitigate the burden of local data storage and reduce the maintenance cost by outsourcing data to the cloud. However, the outsourced data is not always trustworthy due to the loss of physical control and possession over the data. As a result, many scholars have concentrated on relieving the security threats of the outsourced data by designing the Remote Data Auditing (RDA) technique as a new concept to enable public auditability for the stored data in the cloud. The RDA is a useful technique to check the reliability and integrity of data outsourced to single or distributed servers. However, RDA techniques for single cloud servers are unable to support data recovery; such techniques must be complemented with redundant storage mechanisms. This article reviews techniques of remote data auditing more comprehensively in the domain of distributed clouds, in conjunction with a classification of ongoing developments within this specified area. The thematic taxonomy of distributed storage auditing is presented based on significant parameters, such as scheme nature, security pattern, objective functions, auditing mode, update mode, cryptography model, and dynamic data structure. The more recent remote auditing approaches, which have not gained considerable attention in distributed cloud environments, are also critically analyzed and further categorized into three different classes, namely, replication based, erasure coding based, and network coding based, to present a taxonomy. This survey also aims to investigate the similarities and differences of such frameworks on the basis of the thematic taxonomy, to diagnose significant issues and explore major outstanding challenges.

110 citations


Journal ArticleDOI
TL;DR: This survey investigates the recent advancements in the field of text analysis and covers two basic approaches of text mining, namely classification and clustering, which are widely used for the exploration of the unstructured text available on the Web.
Abstract: In this survey, we review different text mining techniques to discover various textual patterns from social networking sites. Social network applications create opportunities to establish interaction among people, leading to mutual learning and the sharing of valuable knowledge through media such as chat, comments, and discussion boards. Data in social networking websites is inherently unstructured and fuzzy in nature. In everyday conversations, people do not care about spelling or the accurate grammatical construction of a sentence, which may lead to different types of ambiguities: lexical, syntactic, and semantic. Therefore, analyzing and extracting information patterns from such data sets is more complex. Several surveys have been conducted to analyze different methods of information extraction. Most of these surveys emphasized the application of different text mining techniques to unstructured data sets residing in the form of text documents, but did not specifically target data sets in social networking websites. This survey attempts to provide a thorough understanding of different text mining techniques as well as the application of these techniques to social networking websites. It investigates the recent advancements in the field of text analysis and covers two basic approaches of text mining, namely classification and clustering, which are widely used for the exploration of the unstructured text available on the Web.

100 citations


Journal ArticleDOI
TL;DR: The Maximum Effective Reduction algorithm is presented, a resource efficiency solution that optimizes the resource usage of a workflow schedule generated by any particular scheduling algorithm and can be applied to any environments that deal with the execution of (scientific) workflows of many precedence-constrained tasks.
Abstract: Workflow applications in science and engineering have steadily increased in variety and scale. Coinciding with this increase has been the relentless effort to improve the performance of these applications by exploiting the abundance of resources in hyper-scale clouds, with little attention to resource efficiency. The inefficient use of resources when executing scientific workflows results from both the excessive amount of resources provisioned and the wastage from unused resources among task runs. In this paper, we address the problem of resource-efficient workflow scheduling. To this end, we present the Maximum Effective Reduction (MER) algorithm, a resource efficiency solution that optimizes the resource usage of a workflow schedule generated by any particular scheduling algorithm. MER trades a minimal makespan increase for a maximal resource usage reduction by consolidating tasks, exploiting the resource inefficiency in the original workflow schedule. The main novelty of MER lies in its identification of a "near-optimal" trade-off point between makespan increase and resource usage reduction. Finding such a point is of great practical importance and can lead to: (1) improvements in resource utilization, (2) reductions in resource provisioning, and (3) savings in energy consumption. Another significant contribution of this work is MER's broad applicability. In essence, MER can be applied to any environment that deals with the execution of (scientific) workflows of many precedence-constrained tasks, although it best suits the IaaS cloud model. Based on results obtained from our extensive simulations using scientific workflow traces, we demonstrate MER is capable of reducing the amount of actual resources used by 54% with an average makespan increase of less than 10%.
The efficacy of MER is further verified by results (from a comprehensive set of experiments with varying makespan delay limits) that show the resource usage reduction, makespan increase and the trade-off between them for various workflow applications.

91 citations


Journal ArticleDOI
TL;DR: A profiling-based server consolidation framework which minimizes the number of physical machines (PMs) used in data centers while maintaining satisfactory performance of various workloads is proposed.
Abstract: Improving energy efficiency of data centers has become increasingly important nowadays due to the significant amounts of power needed to operate these centers. An important method for achieving energy efficiency is server consolidation supported by virtualization. However, server consolidation may incur significant degradation to workload performance due to virtual machine (VM) co-location and migration. How to reduce such performance degradation becomes a critical issue to address. In this paper, we propose a profiling-based server consolidation framework which minimizes the number of physical machines (PMs) used in data centers while maintaining satisfactory performance of various workloads. Inside this framework, we first profile the performance losses of various workloads under two situations: running in co-location and experiencing migrations. We then design two modules: (1) a consolidation planning module which, given a set of workloads, minimizes the number of PMs by an integer programming model, and (2) a migration planning module which, given a source VM placement scenario and a target VM placement scenario, minimizes the number of VM migrations by a polynomial time algorithm. Also, based on the workload performance profiles, both modules can guarantee that the performance losses of various workloads remain below configurable thresholds. Our experiments for workload profiling are conducted with real data center workloads, and our experiments on our two modules validate the integer programming model and the polynomial time algorithm.
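The consolidation planning described above is an integer program; as a much simpler stand-in that conveys the same packing intuition (capacities and demands below are invented for illustration, and the paper's model also accounts for co-location performance loss), a first-fit-decreasing heuristic can place VMs onto a minimal number of PMs:

```python
def first_fit_decreasing(vm_demands, pm_capacity):
    """Greedy stand-in for the paper's integer program: pack VM demands
    (integer CPU shares here) onto as few PMs as possible."""
    pms = []          # remaining capacity of each powered-on PM
    placement = {}    # vm name -> PM index
    for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(pms):
            if demand <= free:
                pms[i] -= demand
                placement[vm] = i
                break
        else:             # no existing PM fits: power on a new one
            pms.append(pm_capacity - demand)
            placement[vm] = len(pms) - 1
    return placement, len(pms)

# Invented demands: five VMs packed onto PMs with capacity 100 CPU shares.
demands = {"vm1": 60, "vm2": 50, "vm3": 40, "vm4": 30, "vm5": 20}
placement, n_pms = first_fit_decreasing(demands, pm_capacity=100)
```

Here the 200 total shares fit on two PMs (60+40 and 50+30+20); an exact integer program would be needed to guarantee optimality and to respect the profiled performance thresholds.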

89 citations


Journal ArticleDOI
TL;DR: This paper proposes an improved online dictionary learning method based on Particle Swarm Optimization (PSO), which improves performance for large-scale remote sensing images and has a better noise-suppression effect.
Abstract: Dictionary learning, which is based on sparse coding, has been frequently applied to many tasks related to remote sensing processes. Recently, many new non-analytic dictionary-learning algorithms have been proposed. Some are based on online learning, in which data can be sequentially incorporated into the computation process. Therefore, these algorithms can train dictionaries using large-scale remote sensing images. However, their accuracy is decreased for two reasons: on one hand, they update all atoms at once; on the other, the direction of optimization, such as the gradient, is not well estimated because of the complexity of the data and the model. In this paper, we propose an improved online dictionary learning method based on Particle Swarm Optimization (PSO). In our iterations, we select particular atoms within the dictionary and then introduce PSO into the atom-updating stage of the dictionary-learning model. Furthermore, to guide the direction of the optimization, prior reference data are introduced into the PSO model. As a result, the movement dimension of the particles is reasonably limited, and the accuracy and effectiveness of the dictionary are improved without heavy computational burdens. Experiments confirm that our proposed algorithm improves performance for large-scale remote sensing images and also has a better effect on noise suppression.
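The atom-updating stage described above builds on Particle Swarm Optimization. A generic PSO loop on a toy objective (all hyperparameters are common illustrative defaults, not the paper's settings, and the sphere function merely stands in for a reconstruction-error objective) sketches the velocity and position updates involved:

```python
import random

def pso(objective, dim, n_particles=15, iters=200, seed=0,
        w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Canonical PSO update: v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                   # each particle's best position
    pbest_f = [objective(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]     # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            f = objective(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f

# Toy objective standing in for the dictionary-atom reconstruction error.
sphere = lambda x: sum(v * v for v in x)
best, best_f = pso(sphere, dim=3)
```

The paper's contribution lies in restricting which atoms and dimensions the particles move along and in biasing the search with prior reference data; none of that is reproduced here.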

86 citations


Journal ArticleDOI
TL;DR: Three genetic algorithms as energy-aware grid schedulers are developed and empirically evaluated in three grid size scenarios in static and dynamic modes, to address independent batch scheduling in computational grids as a bi-objective global minimization problem with makespan and energy consumption as the main criteria.
Abstract: In today's highly parametrized distributed computational environments, such as green grid clusters and clouds, the growing power and cooling rates are becoming the dominant part of the users' and system managers' budgets. Computational grids, owing to their sheer sizes, still require advanced methodologies and strategies for supporting the scheduling of the users' tasks and applications to the distributed resources. The efficient resource allocation becomes even more challenging when energy utilization, beyond the conventional scheduling criteria such as makespan, is treated as a first-class additional scheduling objective. In this paper, we address the independent batch scheduling in computational grids as a bi-objective global minimization problem with makespan and energy consumption as the main criteria. We apply the dynamic voltage and frequency scaling (DVFS) model for the management of the cumulative energy utilized by the grid resources. We develop three genetic algorithms as energy-aware grid schedulers, which are empirically evaluated in three grid size scenarios in static and dynamic modes. The simulation results confirmed the effectiveness of the proposed genetic algorithm-based schedulers in the reduction of the energy consumed by the whole system and in dynamic load balancing of the resources in grid clusters, which is sufficient to maintain the desired quality levels. Copyright © 2012 John Wiley & Sons, Ltd.
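The dynamic voltage and frequency scaling model mentioned above trades execution time against power, since dynamic CMOS power grows roughly cubically with frequency. A toy sketch of that trade-off (the constants and function name are invented for illustration, not the paper's model):

```python
def dvfs_energy(load_cycles, freq, k=1.0, p_static=0.1):
    """Toy DVFS energy model (constants invented for illustration):
    dynamic power ~ k * f^3, so halving f cuts power by ~8x while only
    doubling execution time -> lower total energy, higher makespan."""
    exec_time = load_cycles / freq        # slower clock, longer run
    power = k * freq ** 3 + p_static      # cubic dynamic + static power
    return power * exec_time

fast = dvfs_energy(load_cycles=1.0, freq=1.0)   # full speed: energy 1.1
slow = dvfs_energy(load_cycles=1.0, freq=0.5)   # half speed: energy 0.45
```

This is exactly why makespan and energy form a bi-objective problem: the frequency that minimizes energy is not the one that minimizes completion time.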

77 citations


Journal ArticleDOI
TL;DR: An RS data object-based parallel file system for remote sensing applications is proposed and implemented with the OrangeFS file system; it provides application-aware data layout policies for efficient support of various data access patterns of RS applications from the server side.
Abstract: Remote sensing applications in Digital Earth are overwhelmed with vast quantities of remote sensing (RS) image data. The intolerable I/O burden introduced by the massive amounts of RS data and the irregular RS data access patterns has made traditional cluster-based parallel I/O systems no longer applicable. We propose an RS data object-based parallel file system for remote sensing applications and implement it with the OrangeFS file system. It provides application-aware data layout policies, together with RS data object based data I/O interfaces, for efficient support of various data access patterns of RS applications from the server side. With prior knowledge of the desired RS data access patterns, the proposed system (HPGFS) can offer relevant space-filling curves to organize the sliced 3-D data bricks and distribute them over I/O servers. In this way, data layouts consistent with expected data access patterns can be created to exploit data locality and achieve performance improvement. Moreover, multi-band RS data with complex structured geographical metadata can be accessed and managed as a single data object. Through experiments on remote sensing applications with different access patterns, we have achieved performance improvements of about 30 percent for I/O and 20 percent overall.
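Space-filling curves, as used above to organize data bricks across I/O servers, map multi-dimensional brick coordinates to a one-dimensional order that preserves locality. A common example (illustrative; the paper does not specify which curves HPGFS employs) is the Z-order/Morton encoding:

```python
def morton2(x, y, bits=16):
    """Z-order (Morton) key: interleave the bits of 2-D brick coordinates,
    so bricks that are close in space tend to be close in the 1-D key order."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)        # x bits -> even positions
        key |= ((y >> b) & 1) << (2 * b + 1)    # y bits -> odd positions
    return key

# Keys for the 2x2 block of bricks at the origin: 0, 1, 2, 3 stay contiguous.
corner = [morton2(x, y) for y in (0, 1) for x in (0, 1)]
```

Laying bricks out on servers in Morton-key order means a rectangular read touches mostly consecutive key ranges, which is the locality property the paper exploits.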

Journal ArticleDOI
19 Aug 2015-PLOS ONE
TL;DR: It is argued that the pagerank-index is an inherently fairer and more nuanced metric to quantify the publication records of scientists compared to existing measures.
Abstract: Quantifying and comparing the scientific output of researchers has become critical for governments, funding agencies and universities. Comparison by reputation and direct assessment of contributions to the field is no longer possible, as the number of scientists increases and traditional definitions about scientific fields become blurred. The h-index is often used for comparing scientists, but has several well-documented shortcomings. In this paper, we introduce a new index for measuring and comparing the publication records of scientists: the pagerank-index (symbolised as π). The index uses a version of the pagerank algorithm and the citation networks of papers in its computation, and is fundamentally different from the existing variants of the h-index because it considers not only the number of citations but also the actual impact of each citation. We adopt two approaches to demonstrate the utility of the new index. Firstly, we use a simulation model of a community of authors, whereby we create various ‘groups’ of authors which differ from each other in inherent publication habits, to show that the pagerank-index is fairer than the existing indices in three distinct scenarios: (i) when authors try to ‘massage’ their index by publishing papers in low-quality outlets primarily to self-cite other papers; (ii) when authors collaborate in large groups in order to obtain more authorships; and (iii) when authors spend most of their time producing genuine but low-quality publications that would massage their index. Secondly, we undertake two real-world case studies: (i) the evolving author community of quantum game theory, as defined by Google Scholar, and (ii) a snapshot of the high energy physics (HEP) theory research community in arXiv. In both case studies, we find that the lists of top authors vary significantly when the h-index and pagerank-index are used for comparison.
We show that in both cases, authors who have collaborated in large groups and/or published less impactful papers tend to be comparatively favoured by the h-index, whereas the pagerank-index highlights authors who have made a relatively small number of definitive contributions, or written papers which served to highlight the link between diverse disciplines, or typically worked in smaller groups. Thus, we argue that the pagerank-index is an inherently fairer and more nuanced metric to quantify the publication records of scientists compared to existing measures.
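The pagerank-index builds on the PageRank algorithm over citation networks. A minimal power-iteration PageRank on a toy citation graph (the graph and names are illustrative; this is not the paper's full π computation, which also apportions a paper's score among its co-authors) might look like:

```python
def pagerank(links, damping=0.85, iters=100):
    """Power-iteration PageRank. `links` maps each paper to the papers it cites,
    so rank flows from citing papers to cited papers."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            cited = links.get(u, [])
            if cited:
                share = damping * rank[u] / len(cited)
                for v in cited:
                    nxt[v] += share
            else:  # dangling paper: spread its rank uniformly
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
        rank = nxt
    return rank

# Tiny hypothetical citation graph: p1 and p2 both cite p3.
ranks = pagerank({"p1": ["p3"], "p2": ["p3"]})
```

Because rank flows along citations, a citation from a highly-ranked paper is worth more than one from an obscure paper, which is precisely the "actual impact of each citation" property the abstract emphasises.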

Journal ArticleDOI
TL;DR: A new dynamic programming algorithm is proposed that inserts the minimum number of FRTUs satisfying the detection rate constraint and can perform FRTU insertion for a large scale power system.
Abstract: In the modern smart home and community, smart meters have been massively deployed as replacements for traditional analog meters. Although this significantly reduces the cost of data collection, as the meter readings are wirelessly transmitted, a smart meter is not tamper-resistant. As a consequence, the smart grid infrastructure is under threat of energy theft, by means of attacking a smart meter so that it undercounts the electricity usage. Deployment of feeder remote terminal units (FRTUs) helps narrow the search zone of energy theft in the smart home and community. However, due to budgetary limits, utility companies can only afford to insert the minimum number of FRTUs. This imposes a significant challenge: deploying the minimum number of FRTUs while each smart meter is still effectively monitored. To the best of our knowledge, the only work addressing this problem is [1], which uses stochastic optimization methods. Their algorithm is not very practical as it cannot handle large distribution networks because of the scalability issue. Due to its inherent heuristic and non-deterministic nature, there is no guarantee on the solution quality either. Thus, a high-performance technique is still needed for this energy theft detection problem. In order to resolve this challenge, we propose a new dynamic programming algorithm that inserts the minimum number of FRTUs satisfying the detection rate constraint. It evaluates every candidate solution in a bottom-up fashion using an innovative pruning technique. As a deterministic polynomial time algorithm, it is able to handle large distribution networks. In contrast to [1], which can only handle small systems, our technique can perform FRTU insertion for a large-scale power system. Our experimental results demonstrate that the average number of FRTUs required is only 26% of the number of smart meters in the community.
Compared with the previous work, the number of FRTUs is reduced by 18.8% while the solution quality in terms of anomaly coverage index metric is still improved.

Journal ArticleDOI
TL;DR: A tutorial on the development of the smart controller to schedule household appliances, which is also known as smart home scheduling, is presented and results demonstrate that it can reduce the electricity bill by 30.11% while still improving peak-to-average ratio (PAR) in the power grid.
Abstract: The smart home infrastructure features the automatic control of various household appliances in the advanced metering infrastructure, which enables the connection of individual smart home systems to a smart grid. In such an infrastructure, each smart meter receives electricity prices from utilities and uses a smart controller to schedule the household appliances accordingly. This helps shift the heavy energy load from peak hours to nonpeak hours. Such an architecture significantly improves the reliability of the power grid through reducing the peak energy usage, while benefiting the customers through reducing electricity bills. This paper presents a tutorial on the development of the smart controller to schedule household appliances, which is also known as smart home scheduling. For each individual user, a dynamic programming-based algorithm that schedules household appliances with discrete power levels is introduced. Based on it, a game theoretic framework is designed for multi-user smart home scheduling to mitigate the accumulated energy usage during the peak hours. The simulation results demonstrate that it can reduce the electricity bill by 30.11% while still improving the peak-to-average ratio (PAR) in the power grid. Furthermore, the deployment of smart home scheduling techniques in a big city is discussed. In such a context, parallel computation is explored to tackle the large computational complexity, a machine assignment approximation algorithm is proposed to accelerate the smart home scheduling, and a new hierarchical framework is proposed to reduce the communication overhead. The simulation results on large test cases demonstrate that the city level hierarchical smart home scheduling can achieve a bill reduction of 43.04% and a PAR reduction of 47.50% on average.
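The per-user scheduler described above is a dynamic program over discrete power levels. A simplified single-appliance sketch (the prices, energy demand, and levels below are invented for illustration, and the paper's tutorial also covers the multi-user game) shows the recurrence over (time slot, remaining energy):

```python
def schedule_appliance(prices, total_energy, levels):
    """DP over (time slot, remaining energy): pick one discrete power level
    per slot so the appliance consumes `total_energy` units at minimum cost."""
    T = len(prices)
    INF = float("inf")
    # best[e] = min cost to deliver e remaining units using slots t..T-1
    best = [INF] * (total_energy + 1)
    best[0] = 0.0
    choice = []
    for t in range(T - 1, -1, -1):
        nxt = [INF] * (total_energy + 1)
        pick = [0] * (total_energy + 1)
        for e in range(total_energy + 1):
            for p in levels:
                if p <= e and best[e - p] + prices[t] * p < nxt[e]:
                    nxt[e] = best[e - p] + prices[t] * p
                    pick[e] = p
        best = nxt
        choice = [pick] + choice
    plan, e = [], total_energy   # recover the per-slot power levels
    for t in range(T):
        plan.append(choice[t][e])
        e -= choice[t][e]
    return plan, best[total_energy]

# Four hourly prices; the appliance must draw 4 units at levels 0/1/2 per hour.
plan, cost = schedule_appliance(prices=[5, 1, 3, 1], total_energy=4, levels=[0, 1, 2])
```

With these prices the DP concentrates consumption in the two cheap hours, which is exactly the load-shifting behaviour the tutorial describes.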

Journal ArticleDOI
TL;DR: The simulation results indicate that the “adaptability” in individual behaviors has a significant influence on the evacuation procedure, and the proposed approach's capability to sustain complex scenarios involving a huge crowd consisting of tens of thousands of individuals.
Abstract: Simulation study on evacuation scenarios has gained tremendous attention in recent years. Two major research challenges remain along this direction: (1) how to portray the effect of individuals’ adaptive behaviors under various situations in the evacuation procedures and (2) how to simulate complex evacuation scenarios involving huge crowds at the individual level due to the ultrahigh complexity of these scenarios. In this study, a simulation framework for general evacuation scenarios has been developed. Each individual in the scenario is modeled as an adaptable and autonomous agent driven by a weight-based decision-making mechanism. The simulation is intended to characterize the individuals’ adaptable behaviors, the interactions among individuals, among small groups of individuals, and between the individuals and the environment. To handle the second challenge, this study adopts GPGPU to sustain massively parallel modeling and simulation of an evacuation scenario. An efficient scheme has been proposed to minimize the overhead to access the global system state of the simulation process maintained by the GPU platform. The simulation results indicate that the “adaptability” in individual behaviors has a significant influence on the evacuation procedure. The experimental results also exhibit the proposed approach's capability to sustain complex scenarios involving a huge crowd consisting of tens of thousands of individuals.

Journal ArticleDOI
TL;DR: A dynamic task rearrangement and rescheduling algorithm that exploits the scheduling flexibility from precedence constraints among tasks, and optimizes resource allocation among multiple workflows, and it often stops the influence of delayed execution passing to subsequent tasks.
Abstract: Large-scale distributed computing systems like grids and, more recently, clouds are a platform of choice for many resource-intensive applications. Workflow applications account for the majority of these applications, particularly in science and engineering. A workflow application consists of multiple precedence-constrained tasks with data dependencies. Since resources in those systems are shared by many users and the applications deployed there are very diverse, scheduling is complicated. Often, the actual execution of applications differs from the original schedule following delays such as those caused by resource contention and other issues in performance prediction. These delays have further impact when running multiple workflow applications due to inter-task dependencies. In this paper, we investigate the problem of scheduling multiple workflow applications concurrently, explicitly taking into account scheduling robustness. We present a dynamic task rearrangement and rescheduling algorithm that exploits the scheduling flexibility from precedence constraints among tasks. The algorithm optimizes resource allocation among multiple workflows, and it often stops the influence of delayed execution passing to subsequent tasks. The experimental results demonstrate that our approach can significantly improve performance in multiple-workflow scheduling.

Journal ArticleDOI
TL;DR: A decentralized WSAN virtualization model based on information fusion is proposed, termed Olympus, which seeks to leverage the CoS paradigm to make the best use of both the cloud and physical WSAN environments.
Abstract: The cloud of sensors (CoS) paradigm emerged from the broader concept of the cloud of things. CoS infrastructures are built on the concept of wireless sensor and actuator network (WSAN) virtualization and have the potential to leverage the benefits of both cloud computing and WSANs. However, WSAN virtualization is still in its infancy within the CoS paradigm. Existing approaches are centralized CoS infrastructures that consider smart sensors only as passive data providers, which don't take full advantage of these devices. The authors propose a decentralized WSAN virtualization model based on information fusion. The model, termed Olympus, seeks to leverage the CoS paradigm to make the best use of both the cloud and physical WSAN environments.

Journal ArticleDOI
TL;DR: This paper introduces an adaptive data-aware scheduling (ADAS) strategy for workflow applications that consists of a set-up stage, which builds the clusters for the workflow tasks and datasets, and a run-time stage, which overlaps the execution of the workflows.

Journal ArticleDOI
TL;DR: This study explores the feasibility of utilizing contemporary general-purpose computing on the graphics processing unit (GPGPU) to accelerate the CKDB-tree; with the support of only a single Kepler GPU, it provides real-time filtering of streaming data at 2.5M data tuples per second.
Abstract: More and more real-time applications need to handle dynamic continuous queries over streaming data of high density. Conventional data and query indexing approaches generally do not apply, due to excessive costs in either maintenance or space. Aiming at these problems, this study first proposes a new indexing structure that fuses an adaptive cell and a KDB-tree, namely the CKDB-tree. A cell-tree indexing approach has been developed on the basis of the CKDB-tree that supports dynamic continuous queries. The approach significantly reduces the space costs and scales well with increasing data size. Towards providing a scalable solution for filtering massive streaming data, this study has explored the feasibility of utilizing contemporary general-purpose computing on the graphics processing unit (GPGPU). The CKDB-tree-based approach has been extended to operate on both the CPU (host) and the GPU (device). The GPGPU-aided approach performs query indexing on the host while performing streaming data filtering on the device in a massively parallel manner. The two heterogeneous tasks execute in parallel, and the latency of streaming data transfer between the host and the device is hidden. The experimental results indicate that (1) the CKDB-tree can reduce the space cost compared to the cell-based indexing structure by 60 percent on average, (2) the approach upon the CKDB-tree outperforms the traditional counterparts upon the KDB-tree by 66, 75 and 79 percent on average for uniform, skewed and hyper-skewed data in terms of update costs, and (3) the GPGPU-aided approach greatly improves the approach upon the CKDB-tree with the support of only a single Kepler GPU, providing real-time filtering of streaming data at 2.5M data tuples per second. The massively parallel computing technology exhibits great potential in streaming data monitoring.
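The CKDB-tree fuses an adaptive cell structure with a KDB-tree. As a far simpler illustration of the cell-based half of that idea (this is not the paper's actual structure; all names here are invented), a uniform grid index already shows why a range query only needs to touch the cells it overlaps:

```python
from collections import defaultdict

class CellIndex:
    """Uniform-grid stand-in for cell-based spatial indexing: points hash
    into fixed-size cells, and a range query scans only overlapping cells."""
    def __init__(self, cell=1.0):
        self.cell = cell
        self.cells = defaultdict(list)

    def _key(self, x, y):
        return (int(x // self.cell), int(y // self.cell))

    def insert(self, pid, x, y):
        self.cells[self._key(x, y)].append((pid, x, y))

    def query(self, x0, y0, x1, y1):
        hits = []
        kx0, ky0 = self._key(x0, y0)
        kx1, ky1 = self._key(x1, y1)
        for kx in range(kx0, kx1 + 1):          # visit only overlapped cells
            for ky in range(ky0, ky1 + 1):
                for pid, x, y in self.cells.get((kx, ky), []):
                    if x0 <= x <= x1 and y0 <= y <= y1:
                        hits.append(pid)
        return hits

idx = CellIndex(cell=1.0)
idx.insert("a", 0.5, 0.5)
idx.insert("b", 2.5, 2.5)
near = idx.query(0, 0, 1, 1)   # only the cells around the origin are scanned
```

A fixed grid wastes space under skew, which is precisely what the CKDB-tree's adaptive cells and KDB-tree partitioning are designed to avoid.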

BookDOI
01 Jan 2015
TL;DR: This handbook offers a comprehensive review of the state-of-the-art research achievements in the field of data centers, and is intended for those seeking to gain a stronger grasp on data center networks: the fundamental protocol used by the applications and the network, the typical network technologies, and their design aspects.
Abstract: This handbook offers a comprehensive review of the state-of-the-art research achievements in the field of data centers. Contributions from international, leading researchers and scholars offer topics in cloud computing, virtualization in data centers, energy efficient data centers, and next generation data center architecture. It also comprises current research trends in emerging areas, such as data security, data protection management, and network resource management in data centers. Specific attention is devoted to industry needs associated with the challenges faced by data centers, such as various power, cooling, floor space, and associated environmental health and safety issues, while still working to support growth without disrupting quality of service. The contributions cut across various IT data technology domains as a single source to discuss the interdependencies that need to be supported to enable a virtualized, next-generation, energy efficient, economical, and environmentally friendly data center. This book appeals to a broad spectrum of readers, including server, storage, networking, database, and applications analysts, administrators, and architects. It is intended for those seeking to gain a stronger grasp on data center networks: the fundamental protocol used by the applications and the network, the typical network technologies, and their design aspects. The Handbook of Data Centers is a leading reference on design and implementation for planning, implementing, and operating data center networks.

Journal ArticleDOI
TL;DR: The exponential growth in the size of the aforementioned health-related raw data sets has widened this integration gap, which is severely limiting the potential benefits of having large datasets and HIS/CDSS for medical decision-making processes.

Journal ArticleDOI
TL;DR: A new Gateway based Multi-hop Routing algorithm (GMR) is proposed to enhance the routing management capability of the network and is the first of its kind that considers both the capability of gateway-based management and the requirements of high-bandwidth applications.
Abstract: Wireless community networks (WCNs) have emerged as a cost-effective ubiquitous broadband connectivity solution, offering a wide range of services in a given geographical area. QoS-aware multicast over WCNs is among the most challenging issues and has attracted a lot of attention in recent times. The existing multicast schemes in WCNs suffer in several key performance metrics, such as latency, jitter and throughput, particularly in large-scale networks. Consequently, these schemes cannot deliver the desired performance levels, especially when dealing with high-bandwidth applications that require efficient gateway-based management. To fill this gap, a new strategy for supporting QoS-aware multicast in large-scale WCNs is proposed in this paper. Specifically, a new Gateway based Multi-hop Routing algorithm (GMR) is first proposed to enhance the routing management capability of the network. Built upon GMR, a new Multicast Gateway Multi-hop Routing algorithm (MGMR) is devised to cope with high-bandwidth applications in WCNs. The MGMR is the first of its kind to consider both the capability of gateway-based management and the requirements of high-bandwidth applications. Extensive simulation experiments and performance results demonstrate the superiority of both GMR and MGMR when compared to other methods under various operating conditions.
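The gateway-based routing idea can be illustrated with a toy assignment step: each node is attached to the gateway reachable in the fewest hops, computed by breadth-first search from every gateway. This is a hypothetical simplification in the spirit of GMR, not the paper's algorithm; the topology and function names are invented.

```python
# Hypothetical sketch of gateway-based multi-hop assignment: attach each
# node to its hop-closest gateway via BFS from every gateway.
from collections import deque

def hop_counts(adj, src):
    """BFS hop distance from src to every reachable node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def assign_gateways(adj, gateways):
    """Map each node to the closest gateway in hop count."""
    best = {}
    for g in gateways:
        for node, d in hop_counts(adj, g).items():
            if node not in best or d < best[node][1]:
                best[node] = (g, d)
    return {n: g for n, (g, _) in best.items()}

topology = {"g1": ["a"], "a": ["g1", "b"], "b": ["a", "g2"], "g2": ["b"]}
print(assign_gateways(topology, ["g1", "g2"]))  # a -> g1, b -> g2
```

A real scheme would additionally weigh link quality, load and QoS requirements rather than hop count alone.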

BookDOI
TL;DR: This work aims at overcoming inefficiency by designing a distributed parallel system architecture that improves the performance of SPARQL endpoints by incorporating two functionalities: a queuing system to avoid bottlenecks during the execution of SParQL queries; and an intelligent relaxation of the queries submitted to the endpoint at hand whenever the relaxation itself and the consequently lowered complexity of the query are beneficial for the overall performance of the system.
Abstract: The Web of Data is widely considered as one of the major global repositories populated with countless interconnected and structured data prompting these linked datasets to be continuously and sharply increasing. In this context the so-called SPARQL Protocol and RDF Query Language is commonly used to retrieve and manage stored data by means of SPARQL endpoints, a query processing service especially designed to get access to these databases. Nevertheless, due to the large amount of data tackled by such endpoints and their structural complexity, these services usually suffer from severe performance issues, including inadmissible processing times. This work aims at overcoming this noted inefficiency by designing a distributed parallel system architecture that improves the performance of SPARQL endpoints by incorporating two functionalities: (1) a queuing system to avoid bottlenecks during the execution of SPARQL queries; and (2) an intelligent relaxation of the queries submitted to the endpoint at hand whenever the relaxation itself and the consequently lowered complexity of the query are beneficial for the overall performance of the system. To this end the system relies on a two-fold optimization criterion: the minimization of the query running time, as predicted by a supervised learning model; and the maximization of the quality of the results of the query as quantified by a measure of similarity. These two conflicting optimization criteria are efficiently balanced by two bi-objective heuristic algorithms sequentially executed over groups of SPARQL queries. The approach is validated on a prototype through several experiments that evince the applicability of the proposed scheme.
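The two-fold optimization criterion can be sketched as a bi-objective filter: each relaxation candidate of a query carries a predicted running time (to minimize) and a similarity to the original query (to maximize), and only non-dominated candidates are retained. The candidate names and figures below are illustrative assumptions, not the paper's data.

```python
# Illustrative bi-objective (Pareto) filter over relaxation candidates:
# minimize predicted runtime, maximize similarity to the original query.

def pareto_front(candidates):
    """Keep candidates not dominated in (runtime down, similarity up)."""
    front = []
    for c in candidates:
        dominated = any(o["runtime"] <= c["runtime"]
                        and o["similarity"] >= c["similarity"]
                        and o != c
                        for o in candidates)
        if not dominated:
            front.append(c)
    return front

cands = [
    {"name": "original", "runtime": 9.0, "similarity": 1.0},
    {"name": "relax-1",  "runtime": 4.0, "similarity": 0.8},
    {"name": "relax-2",  "runtime": 5.0, "similarity": 0.7},  # dominated by relax-1
]
print([c["name"] for c in pareto_front(cands)])  # ['original', 'relax-1']
```

In the described system the runtime values would come from a supervised learning predictor rather than being given constants.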

Proceedings ArticleDOI
04 Nov 2015
TL;DR: This paper proposes an intelligent framework that can reinforce an effective resource selection scheme by taking into account the components, such as resource/data freshness, that impact the performance of the replicated system in such an environment.
Abstract: This paper addresses the problem of implementing an asynchronous replication scheme in a utility-based computing environment. The problem needs special attention, as most of the existing replication schemes in this computing system either implicitly support synchronous replication or only consider read-only jobs. Therefore, we propose an intelligent framework that can reinforce an effective resource selection scheme by taking into account the components that impact the performance of the replicated system in such an environment, such as resource/data freshness. We exploit an Update Ordering (UO) approach and reconcile these components in designing the framework. Important issues such as job propagation delay and job propagation rules are especially addressed. Our experiments show that the proposed framework can serve as the platform of an effective resource selection scheme and achieves good system performance compared to existing algorithms.
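A minimal sketch of freshness-aware resource selection may help fix the idea: in an asynchronous scheme, replicas lag behind the master by different amounts, so a selector can trade a replica's data freshness against its current load. This is not the paper's UO framework; the scoring rule and weights are invented for exposition.

```python
# Illustrative freshness/load trade-off for replica selection in an
# asynchronously replicated system (weights are assumptions).

def select_resource(replicas, freshness_weight=0.7):
    """Pick the replica with the best freshness/load trade-off.

    freshness: fraction of updates already applied (0..1, higher better)
    load:      current utilisation (0..1, lower better)
    """
    def score(r):
        return (freshness_weight * r["freshness"]
                - (1 - freshness_weight) * r["load"])
    return max(replicas, key=score)["name"]

replicas = [
    {"name": "r1", "freshness": 0.95, "load": 0.9},  # fresh but busy
    {"name": "r2", "freshness": 0.80, "load": 0.2},  # slightly stale, idle
]
print(select_resource(replicas))  # r2
```

Raising `freshness_weight` toward 1.0 makes the selector behave more like a synchronous scheme that always prefers the freshest copy.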

Journal ArticleDOI
TL;DR: The editor in chief of the IEEE Transactions on Computers says that the TC remains the perfect incubator for all of new ideas and opportunities and can be the initiator of many more.
Abstract: It is with mixed feelings that I am writing this farewell note to the readership of the IEEE Transactions on Computers (TC). I have enjoyed my tenure at the helm of the TC for the last four years and I am grateful for the opportunity to lead such a prestigious and highly respected journal in the field of computing. Another privilege was the chance to work with many excellent associate editors and the wonderful staff at the IEEE Computer Society (CS). Without their help, the enormity of the task would have surpassed my capabilities. The start of my term (in 2011) coincided with the 60th anniversary of the TC, which was marked by a number of celebrations by the CS and TC during that year. It was a great time during which we reflected on the history of the TC and the impact that it had on the career of many computer scientists and engineers. Over the years, the TC maintained its esteemed status through a rigorous review process, which I further enforced during my term. Of course, that was only made possible by the tireless efforts of many of the present and past associate editors and a loyal legion of reviewers. The TC maintained its relevance by engaging the right expertise and supporting new and emerging themes in computing. I supported special issues in many emerging areas, such as cloud computing, energy-efficient computing, multicore systems, and many more. This, of course, emphasized the need to continue selecting visible and high caliber researchers (from academia, industry and research laboratories) to serve on the editorial board who are able to cover the full spectrum of computing research today, even though that might not be an easy task. During my term, I nearly doubled the number of associate editors while maintaining a gender balance and a mix of regional representations. I also appointed two associate editors in chief to assist with the different operations of the TC.
The TC will continue its mission to remain the lead publication in the field. The number of submissions has grown tremendously. We currently handle close to 1,000 submissions every year. We also transitioned in January 2013 from paper printed form to the OnlinePlus publication model, which was another development that allowed us to deal with papers more rapidly. The TC remains the perfect incubator for new ideas and opportunities and can be the initiator of many more. I would like to welcome Professor Paolo Montuschi, who has been selected as the editor in chief of the TC. Paolo worked with me as an associate editor and associate editor in chief. He is a distinguished researcher with interests in computer arithmetic, computer graphics, computer architectures, and electronic publishing. Paolo is very energetic and he has played many roles with the CS. He is the first editor in chief from continental Europe, which is a reflection of the international and diverse nature of the TC. I am sure he will further lead the rapid growth of TC in the next few years and will take it to new heights. I am also positive that all of you will support Paolo in his new role as you supported me. Finally, I would like to thank the readers of the TC for their support and encouragement during my time as editor in chief. I look forward to seeing TC going from strength to strength and I am certain the future holds more fruitful years for this great publication. Once more, thank you for the great times and the wonderful memories!

Journal ArticleDOI
01 Apr 2015
TL;DR: This special issue compiles recent advances in Autonomic Provisioning of Big Data Applications on Clouds to make cloud-hosted Big Data applications operate more efficiently, with reduced financial and environmental costs, reduced under-utilisation of resources, and better performance at times of unpredictable workload.
Abstract: Cloud computing assembles large networks of virtualised ICT services such as hardware resources (such as CPU, storage, and network), software resources (such as databases, application servers, and web servers) and applications. Big Data applications have become a common phenomenon in the domains of science, engineering, and commerce. Large-scale, heterogeneous, and uncertain Big Data applications are becoming increasingly common, yet current cloud resource provisioning methods do not scale well, nor do they perform well under highly unpredictable conditions (data volume, data variety, data arrival rate, etc.). Much research effort has been devoted to the fundamental understanding, technologies, and concepts related to autonomic provisioning of cloud resources for Big Data applications, to make cloud-hosted Big Data applications operate more efficiently, with reduced financial and environmental costs, reduced under-utilisation of resources, and better performance at times of unpredictable workload. Targeting the aforementioned research challenges, this special issue compiles recent advances in Autonomic Provisioning of Big Data Applications on Clouds. The special issue articles are briefly summarized.

Proceedings ArticleDOI
01 Sep 2015
TL;DR: A new pulling based workflow execution system with a profiling-based resource provisioning strategy that can achieve an 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles.
Abstract: Scientists in different fields, such as high energy physics, earth science, and astronomy, are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for the entire scientific analysis. As a workflow ensemble usually contains many sub-workflows, in each of which hundreds or thousands of jobs exist with precedence constraints, the execution of such a workflow ensemble raises a great cost concern even when using elastic, pay-as-you-go cloud resources. In this paper, we address two main challenges in executing large-scale workflow ensembles in public clouds with both cost and deadline constraints: (1) execution coordination, and (2) resource provisioning. To this end, we develop a new pulling based workflow execution system with a profiling-based resource provisioning strategy. The idea is that homogeneity in both scientific workflows and cloud resources can be exploited to remove scheduling overhead (in execution coordination) and to minimize cost while meeting the deadline. Our results show that our system can achieve an 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. Besides, our evaluation using Montage (an astronomical image mosaic engine) workflow ensembles on around 1000-core Amazon EC2 clusters has demonstrated the efficacy of our resource provisioning strategy in terms of cost effectiveness within the deadline.
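The pulling model can be sketched as workers taking ready jobs from a shared queue instead of waiting for a central scheduler to push assignments, which is where the scheduling overhead is removed. The single-process simulation below is an illustrative assumption, not the paper's system.

```python
# Illustrative pull-based execution of a workflow with precedence
# constraints: jobs become "ready" once all prerequisites are done,
# and a pool of workers pulls ready jobs each round.
from collections import deque

def run_ensemble(jobs, deps, workers=3):
    """jobs: list of job ids; deps: job -> set of prerequisite jobs."""
    done, order = set(), []
    ready = deque(j for j in jobs if not deps.get(j))
    while ready:
        batch = [ready.popleft() for _ in range(min(workers, len(ready)))]
        for j in batch:           # each worker "pulls" one ready job
            done.add(j)
            order.append(j)
        for j in jobs:            # newly unblocked jobs become ready
            if j not in done and j not in ready and deps.get(j, set()) <= done:
                ready.append(j)
    return order

order = run_ensemble(["a", "b", "c", "d"],
                     {"c": {"a", "b"}, "d": {"c"}})
print(order)  # ['a', 'b', 'c', 'd']
```

With homogeneous jobs and homogeneous cloud instances, profiling one sub-workflow lets the system size the worker pool for the whole ensemble, which is the intuition behind the profiling-based provisioning strategy.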

Journal ArticleDOI
TL;DR: The issue of selecting resource configurations across multiple layers of a cloud computing stack is discussed by considering deployment of a real-time stock recommendation big data application over an Amazon Web Services public datacenter.
Abstract: Cloud computing has transformed people's perception of how Internet-based applications can be deployed in datacenters and offered to users in a pay-as-you-go model. Despite the growing adoption of cloud datacenters, challenges related to big data application management still exist. One important research challenge is selecting configurations of resources as infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS) layers such that big data application-specific service-level agreement goals (such as minimizing event-detection and decision-making delays, maximizing application and data availability, and maximizing the number of alerts sent per second) are constantly achieved for big data applications. This article discusses the issue of selecting resource configurations across multiple layers of a cloud computing stack by considering deployment of a real-time stock recommendation big data application over an Amazon Web Services public datacenter.
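The cross-layer selection problem can be illustrated as a search over combinations of IaaS instance types and PaaS settings: keep the combinations that meet the SLA goal (here, an end-to-end latency bound) and pick the cheapest. All figures below are made-up illustrations, not AWS pricing or the article's data.

```python
# Hypothetical cross-layer (IaaS x PaaS) configuration selection against
# an SLA latency goal; returns the cheapest feasible combination.
from itertools import product

IAAS = [{"type": "small", "cost": 1.0, "latency_ms": 120},
        {"type": "large", "cost": 4.0, "latency_ms": 40}]
PAAS = [{"workers": 2, "cost": 0.5, "speedup": 1.0},
        {"workers": 8, "cost": 2.0, "speedup": 2.0}]

def select_config(sla_latency_ms):
    feasible = []
    for vm, plat in product(IAAS, PAAS):
        latency = vm["latency_ms"] / plat["speedup"]
        if latency <= sla_latency_ms:
            feasible.append((vm["cost"] + plat["cost"],
                             vm["type"], plat["workers"]))
    return min(feasible) if feasible else None

print(select_config(60))  # (3.0, 'small', 8): cheapest feasible config
```

Real SLA goals from the article (event-detection delay, availability, alerts per second) would add further constraints to the feasibility test rather than change the structure of the search.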

Proceedings ArticleDOI
08 Jun 2015
TL;DR: In this paper, the authors propose a novel solution, which implements tight cooperation with the cellular network to optimize traffic offloading, aiming at optimizing the balance between user application requirements and availability of network resources.
Abstract: Data traffic from mobile devices experiences unprecedented growth, which current cellular network capacities cannot sustain. Traffic offloading to other types of networks, such as WiFi, can be used to reduce the load on cellular networks. In this paper, we propose a novel solution, which unlike other existing methodologies, implements tight cooperation with the cellular network to optimize traffic offloading. The cellular network provides information about channel usage statistics, user mobility patterns, available resources and other parameters. The offloading decisions aim at optimizing the balance between user application requirements and availability of network resources. The validation results, obtained from NS-3 simulations, confirm the effectiveness of the proposed solution in balancing cellular traffic load while ensuring QoS.
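A single offloading decision of this kind can be reduced to a rule that balances the application's QoS requirement against the information the cellular network shares. The rule and thresholds below are a deliberately simplified assumption, not the paper's decision algorithm.

```python
# Illustrative offloading rule: move a flow from cellular to WiFi only
# when WiFi can satisfy the app's bandwidth need and the cell is loaded.

def should_offload(app_bw_mbps, wifi_free_mbps, cell_load,
                   load_threshold=0.7):
    """Return True if the flow should be moved from cellular to WiFi.

    cell_load is the network-reported utilisation in [0, 1];
    load_threshold is an assumed operator policy knob.
    """
    return wifi_free_mbps >= app_bw_mbps and cell_load >= load_threshold

print(should_offload(5.0, 20.0, 0.85))  # True: WiFi has room, cell loaded
print(should_offload(5.0, 2.0, 0.85))   # False: WiFi cannot meet QoS
```

The cooperation described in the abstract would additionally feed mobility predictions into the decision, e.g. to avoid offloading a user about to leave WiFi coverage.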

Proceedings ArticleDOI
08 Jun 2015
TL;DR: An energy efficient replication strategy based on the proposed models for energy consumption and bandwidth demand of database access in cloud computing datacenter is proposed, which results in improved Quality of Service (QoS) with reduced communication delays.
Abstract: Cloud computing is a computing model where users access ICT services and resources without regard to where the services are hosted. Communication resources often become a bottleneck in service provisioning for many cloud applications. Therefore, data replication, which brings data (e.g., databases) closer to data consumers (e.g., cloud applications), is seen as a promising solution. In this paper, we present models for energy consumption and bandwidth demand of database access in a cloud computing datacenter. In addition, we propose an energy efficient replication strategy based on the proposed models, which results in improved Quality of Service (QoS) with reduced communication delays. The evaluation results obtained with extensive simulations help to unveil performance and energy efficiency tradeoffs and guide the design of future data replication solutions.
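A toy version of such an energy/bandwidth model helps show how replica placement follows from it: the energy of serving a database from a candidate site is a transfer term, proportional to consumer demand and hop distance, plus a fixed replica-maintenance term. The coefficients and site names are invented illustrations, not the paper's model.

```python
# Toy energy model for replica placement in a datacenter topology:
# energy(site) = sum over consumers of (GB * hops * energy-per-GB-hop)
#                + a fixed replica-maintenance term.

E_PER_GB_HOP = 0.2   # assumed Wh per GB per network hop
E_MAINTAIN = 5.0     # assumed Wh per placement interval

def placement_energy(site, demands, hops):
    """demands: consumer -> GB requested; hops: (consumer, site) -> hops."""
    transfer = sum(gb * hops[(c, site)] * E_PER_GB_HOP
                   for c, gb in demands.items())
    return transfer + E_MAINTAIN

def best_site(sites, demands, hops):
    """Choose the candidate site with the lowest modeled energy."""
    return min(sites, key=lambda s: placement_energy(s, demands, hops))

demands = {"app1": 100.0, "app2": 50.0}
hops = {("app1", "core"): 4, ("app2", "core"): 4,
        ("app1", "rack"): 1, ("app2", "rack"): 2}
print(best_site(["core", "rack"], demands, hops))  # 'rack'
```

Moving the replica from the core to a rack-level site cuts the hop count to the consumers, which is exactly the QoS/delay improvement the replication strategy targets.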