
Showing papers by "Albert Y. Zomaya published in 2014"


Journal ArticleDOI
TL;DR: This paper introduces concepts and algorithms related to clustering, provides a concise survey of existing clustering algorithms, and compares them from both a theoretical and an empirical perspective.
Abstract: Clustering algorithms have emerged as an alternative powerful meta-learning tool to accurately analyze the massive volumes of data generated by modern applications. In particular, their main goal is to categorize data into clusters such that objects are grouped in the same cluster when they are similar according to specific metrics. There is a vast body of knowledge in the area of clustering, and there have been attempts to analyze and categorize these algorithms for a larger number of applications. However, one of the major issues in using clustering algorithms for big data that causes confusion amongst practitioners is the lack of consensus in the definition of their properties as well as a lack of formal categorization. With the intention of alleviating these problems, this paper introduces concepts and algorithms related to clustering, provides a concise survey of existing clustering algorithms, and offers a comparison from both a theoretical and an empirical perspective. From a theoretical perspective, we developed a categorizing framework based on the main properties pointed out in previous studies. Empirically, we conducted extensive experiments in which we compared the most representative algorithm from each of the categories using a large number of real (big) data sets. The effectiveness of the candidate clustering algorithms is measured through a number of internal and external validity metrics, stability, runtime, and scalability tests. In addition, we highlight the set of clustering algorithms that perform best for big data.

833 citations
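The survey's empirical comparison scores each candidate algorithm with internal and external validity metrics. A minimal sketch of that methodology, assuming an illustrative dataset and a single representative algorithm (the paper's actual benchmarks and algorithm set are far larger):

```python
# Sketch of the evaluation methodology only (not the survey's code):
# score one clustering algorithm with an internal validity metric
# (silhouette) and an external one (adjusted Rand index vs. labels).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

X, y_true = make_blobs(n_samples=600, centers=3, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

internal = silhouette_score(X, labels)          # needs no ground truth
external = adjusted_rand_score(y_true, labels)  # needs true labels

print(f"silhouette={internal:.2f}, ARI={external:.2f}")
```

Internal metrics let the comparison run on unlabeled big data, while external metrics validate against known structure where labels exist.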


Journal ArticleDOI
TL;DR: It is argued that a middleware platform is required to manage heterogeneous WSNs and efficiently share their resources while satisfying user needs in the emergent scenarios of WoT.

95 citations


Journal ArticleDOI
TL;DR: Next-generation datacenters (DCs) built on virtualization technologies are pivotal to the effective implementation of the cloud computing paradigm and face major reliability and robustness challenges.
Abstract: Next-generation datacenters (DCs) built on virtualization technologies are pivotal to the effective implementation of the cloud computing paradigm. To deliver the necessary services and quality of service, cloud DCs face major reliability and robustness challenges.

93 citations


Book ChapterDOI
TL;DR: The nature of distributed computation has long been a topic of interest in complex systems science, physics, artificial life and bioinformatics and has been postulated to be associated with the capability to support universal computation.
Abstract: The nature of distributed computation has long been a topic of interest in complex systems science, physics, artificial life and bioinformatics. In particular, emergent complex behavior has often been described from the perspective of computation within the system (Mitchell 1998b,a) and has been postulated to be associated with the capability to support universal computation (Langton 1990; Wolfram 1984c; Casti 1991).

82 citations


Journal ArticleDOI
TL;DR: A novel cloud-based recommendation framework, OmniSuggest, is proposed that utilizes: 1) ant colony algorithms, 2) social filtering, and 3) hub and authority scores to generate optimal venue recommendations; it offers more effective recommendations than many state-of-the-art schemes.
Abstract: The evolution of mobile social networks and the availability of online check-in services, such as Foursquare and Gowalla, have initiated a new wave of research in the area of venue recommendation systems. Such systems recommend places to users closely related to their preferences. Although venue recommendation systems have been studied in recent literature, the existing approaches, mostly based on collaborative filtering, suffer from various issues, such as: 1) data sparseness, 2) cold start, and 3) scalability. Moreover, many existing schemes are limited in functionality, as the generated recommendations do not consider group of “friends” type situations. Furthermore, the traditional systems do not take into account the effect of real-time physical factors (e.g., distance from venue, traffic, and weather conditions) on recommendations. To address the aforementioned issues, this paper proposes a novel cloud-based recommendation framework OmniSuggest that utilizes: 1) Ant colony algorithms, 2) social filtering, and 3) hub and authority scores, to generate optimal venue recommendations. Unlike existing work, our approach suggests venues at a finer granularity for an individual or a “group” of friends with similar interest. Comprehensive experiments are conducted with a large-scale real dataset collected from Foursquare. The results confirm that our method offers more effective recommendations than many state of the art schemes.

81 citations


Journal ArticleDOI
01 Jun 2014
TL;DR: In this article, the authors explore and discuss existing resource allocation mechanisms for resource allocation problems employed in Grid systems and compare them based on their common features such as time complexity, searching mechanism, allocation strategy, optimality, operational environment and objective function they adopt for solving computing and data-intensive applications.
Abstract: Grid is a distributed high performance computing paradigm that offers various types of resources (like computing, storage, communication) to resource-intensive user tasks. These tasks are scheduled to allocate available Grid resources efficiently to achieve high system throughput and to satisfy user requirements. The task scheduling problem has become more complex with the ever-increasing size of Grid systems. Even though selecting an efficient resource allocation strategy for a particular task helps in obtaining a desired level of service, researchers still face difficulties in choosing a suitable technique from a plethora of existing methods in the literature. In this paper, we explore and discuss existing resource allocation mechanisms for resource allocation problems employed in Grid systems. The work comprehensively surveys Grid resource allocation mechanisms for different architectures (centralized, distributed, static or dynamic). The paper also compares these resource allocation mechanisms based on their common features such as time complexity, searching mechanism, allocation strategy, optimality, operational environment and objective function they adopt for solving computing- and data-intensive applications. The comprehensive analysis of cutting-edge research in the Grid domain presented in this work provides readers with an understanding of essential concepts of resource allocation mechanisms in Grid systems and helps them identify important and outstanding issues for further investigation. It also helps readers to choose the most appropriate mechanism for a given system/application.

75 citations


Journal Article
TL;DR: The comprehensive analysis of cutting-edge research in the Grid domain presented in this work provides readers with an understanding of essential concepts of resource allocation mechanisms in Grid systems and helps them identify important and outstanding issues for further investigation and helps readers to choose the most appropriate mechanism for a given system/application.
Abstract: Grid is a distributed high performance computing paradigm that offers various types of resources (like computing, storage, communication) to resource-intensive user tasks. These tasks are scheduled to allocate available Grid resources efficiently to achieve high system throughput and to satisfy user requirements. The task scheduling problem has become more complex with the ever-increasing size of Grid systems. Even though selecting an efficient resource allocation strategy for a particular task helps in obtaining a desired level of service, researchers still face difficulties in choosing a suitable technique from a plethora of existing methods in the literature. In this paper, we explore and discuss existing resource allocation mechanisms for resource allocation problems employed in Grid systems. The work comprehensively surveys Grid resource allocation mechanisms for different architectures (centralized, distributed, static or dynamic). The paper also compares these resource allocation mechanisms based on their common features such as time complexity, searching mechanism, allocation strategy, optimality, operational environment and objective function they adopt for solving computing- and data-intensive applications. The comprehensive analysis of cutting-edge research in the Grid domain presented in this work provides readers with an understanding of essential concepts of resource allocation mechanisms in Grid systems and helps them identify important and outstanding issues for further investigation. It also helps readers to choose the most appropriate mechanism for a given system/application.

71 citations


Journal ArticleDOI
TL;DR: The application of SSO techniques to imbalanced and ensemble learning problems, respectively, is described, and the utilities and advantages of the proposed techniques are demonstrated on a variety of bioinformatics applications where class imbalance, small sample size, and noisy data are prevalent.
Abstract: Data sampling is a widely used technique in a broad range of machine learning problems. Traditional sampling approaches generally rely on random resampling from a given dataset. However, these approaches do not take into consideration additional information, such as sample quality and usefulness. We recently proposed a data sampling technique, called sample subset optimization (SSO). The SSO technique relies on a cross-validation procedure for identifying and selecting the most useful samples as subsets. In this paper, we describe the application of SSO techniques to imbalanced and ensemble learning problems, respectively. For imbalanced learning, the SSO technique is employed as an under-sampling technique for identifying a subset of highly discriminative samples in the majority class. In ensemble learning, the SSO technique is utilized as a generic ensemble technique where multiple optimized subsets of samples from each class are selected for building an ensemble classifier. We demonstrate the utilities and advantages of the proposed techniques on a variety of bioinformatics applications where class imbalance, small sample size, and noisy data are prevalent.

69 citations
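The under-sampling use of SSO keeps only the most useful majority-class samples. A crude stand-in for that idea, under our own simplifying assumption (the paper's SSO scores samples via cross-validation; here we merely keep the majority points nearest the minority class, with synthetic data):

```python
# Toy illustration of informed under-sampling (NOT the paper's SSO
# procedure): keep the k majority-class points closest to the minority
# class centroid, yielding a balanced training subset.
import numpy as np

rng = np.random.default_rng(0)
X_min = rng.normal(loc=0.0, scale=1.0, size=(30, 2))    # minority class
X_maj = rng.normal(loc=3.0, scale=1.0, size=(300, 2))   # majority class

k = len(X_min)  # under-sample majority down to the minority size
centroid = X_min.mean(axis=0)
dist = np.linalg.norm(X_maj - centroid, axis=1)
keep = np.argsort(dist)[:k]          # most boundary-relevant majority points
X_maj_sub = X_maj[keep]

print(X_maj_sub.shape)  # balanced: (30, 2)
```

The point of selection-based under-sampling, as opposed to random resampling, is exactly this: the retained subset is chosen by a usefulness criterion rather than by chance.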


Journal ArticleDOI
TL;DR: This paper defines the independent batch scheduling in Computational Grid as a three-objective global optimization problem with makespan, flowtime and energy consumption as the main scheduling criteria minimized according to different security constraints, and develops six genetic-based single- and multi-population meta-heuristics for solving the considered optimization problem.

67 citations


Journal ArticleDOI
TL;DR: A task-tree based mosaicking method for large-scale remote sensed imagery with dynamic DAG scheduling is proposed; it expresses large-scale mosaicking as a data-driven task tree with minimal height.
Abstract: Remote sensed imagery mosaicking at large scale has been receiving increasing attention in regional-to-global research. However, when scaling to large areas, image mosaicking becomes extremely challenging because of the dependency relationships among a large collection of tasks, which give rise to ordering constraints, the demand for significant processing capabilities, and the difficulties inherent in organizing these enormous tasks and RS image data. We propose a task-tree based mosaicking approach for remote sensed imagery at large scale with dynamic DAG scheduling. It expresses large-scale mosaicking as a data-driven task tree with minimal height. A critical-path based dynamic DAG scheduling solution with a status queue, named CPDS-SQ, is also provided to produce an optimized schedule on a multi-core cluster with minimal completion time. All the individual dependent tasks are run by a core parallel mosaicking program implemented with MPI, which performs mosaicking on different pairs of images. Eventually, an effective yet simple approach is offered to improve large-scale processing capability by decoupling the dependency relationships among tasks from the complex parallel processing procedure. Through experiments on large-scale mosaicking, we confirmed that our approach is efficient and scalable.

63 citations
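Critical-path based scheduling of a task DAG, as in CPDS-SQ, starts from the longest cost-weighted path through the dependency graph. A minimal sketch with hypothetical task costs (the paper's mosaicking trees and cost model are of course much richer):

```python
# Minimal critical-path length computation over a task DAG via a
# topological sweep; tasks and costs are invented for illustration.
from collections import defaultdict, deque

cost = {"A": 3, "B": 2, "C": 4, "D": 1}
edges = [("A", "C"), ("B", "C"), ("C", "D")]  # A,B -> C -> D

succ = defaultdict(list)
indeg = defaultdict(int)
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1

# earliest finish time per task, propagated in topological order
finish = {t: cost[t] for t in cost}
q = deque(t for t in cost if indeg[t] == 0)
while q:
    u = q.popleft()
    for v in succ[u]:
        finish[v] = max(finish[v], finish[u] + cost[v])
        indeg[v] -= 1
        if indeg[v] == 0:
            q.append(v)

print(max(finish.values()))  # critical-path length: 3 + 4 + 1 = 8
```

Tasks on the critical path are the ones a scheduler must prioritize, since any delay to them delays the whole mosaicking job.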


Journal ArticleDOI
TL;DR: This paper introduces EcoHealth (Ecosystem of Health Care Devices), a Web middleware platform for connecting doctors and patients using attached body sensors, thus aiming to provide improved health monitoring and diagnosis for patients.

Journal ArticleDOI
TL;DR: Experimental results show that the evolutionary approach compared with existing methods, such as Monte Carlo and Blind Pick, can achieve higher overall average scheduling performance in real-world applications with dynamic workloads and an optimal computing budget allocating method that smartly allocates computing cycles to the most promising schedules.
Abstract: Scheduling of dynamic and multitasking workloads for big-data analytics is a challenging issue, as it requires a significant amount of parameter sweeping and iterations. Therefore, real-time scheduling becomes essential to increase the throughput of many-task computing. The difficulty lies in obtaining a series of optimal yet responsive schedules. In dynamic scenarios, such as virtual clusters in cloud, scheduling must be processed fast enough to keep pace with the unpredictable fluctuations in the workloads to optimize the overall system performance. In this paper, ordinal optimization using rough models and fast simulation is introduced to obtain suboptimal solutions in a much shorter timeframe. While the scheduling solution for each period may not be the best, ordinal optimization can be processed fast in an iterative and evolutionary way to capture the details of big-data workload dynamism. Experimental results show that our evolutionary approach compared with existing methods, such as Monte Carlo and Blind Pick, can achieve higher overall average scheduling performance, such as throughput, in real-world applications with dynamic workloads. Furthermore, performance improvement is seen by implementing an optimal computing budget allocating method that smartly allocates computing cycles to the most promising schedules.

Journal ArticleDOI
TL;DR: PANDA is presented, a framework for static scheduling BoT applications across resources in both private and public clouds that incorporates a fully polynomial-time approximation scheme (FPTAS) as a novel scheduling algorithm, which generates schedules with the best trade-off point between cost and performance; hence Pareto-optimality.
Abstract: Large-scale Bag-of-Tasks (BoT) applications are characterized by their massively parallel, yet independent operations. The use of resources in public clouds to dynamically expand the capacity of a private computer system might be an appealing alternative to cope with such massive parallelism. To fully realize the benefit of this 'cloud bursting', the performance to cost ratio (or cost efficiency) must be thoroughly studied and incorporated into scheduling and resource allocation strategies. In this paper, we present PANDA, a framework for static scheduling of BoT applications across resources in both private and public clouds. At its core, the framework incorporates a fully polynomial-time approximation scheme (FPTAS) as a novel scheduling algorithm, which generates schedules with the best trade-off point between cost and performance; hence Pareto-optimality. We have theoretically discussed the complexity and correctness of our algorithms, and experimentally verified their efficacy and practicality using ISOMAP, a widely used nonlinear manifold method, as a real-world BoT application. Our evaluation, conducted in a 'multi-cloud' environment of our 40-core private system and the Amazon EC2 public cloud, demonstrates that the scheduling quality of PANDA is guaranteed to be within a measurable distance from the optimal solution. Results obtained from our experiments show that such quality is within 8 percent of the optimum. We also show the sensitivity and robustness of our scheduling solutions against performance errors in both resources and applications.

Journal ArticleDOI
01 Jul 2014
TL;DR: The proposed GOA first combines multiple well-known feature selection (FS) techniques to yield possible optimal feature subsets across different traffic datasets; an adaptive threshold based on entropy is then proposed to extract the stable features.
Abstract: There is significant interest in the network management community about the need to identify the most optimal and stable features for network traffic data. In practice, feature selection techniques are used as a pre-processing step to eliminate meaningless features, and also as a tool to reveal the set of optimal features. Unfortunately, such techniques are often sensitive to small variations in the traffic data. Thus, obtaining a stable feature set is crucial in enhancing the confidence of network operators. This paper proposes a robust approach, called the Global Optimization Approach (GOA), to identify both optimal and stable features, relying on a multi-criterion fusion-based feature selection technique and an information-theoretic method. The proposed GOA first combines multiple well-known FS techniques to yield possible optimal feature subsets across different traffic datasets; an adaptive threshold based on entropy is then used to extract the stable features. A new goodness measure is proposed within a Random Forest framework to estimate the final optimum feature subset. Experimental studies on network traffic data in spatial and temporal domains show that the proposed GOA approach outperforms the commonly used feature selection techniques for the traffic classification task.
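The fusion-then-stability idea can be sketched as follows. This is our own simplified construction, not GOA itself: two stock selection criteria stand in for the paper's FS techniques, bootstrap resampling stands in for "different traffic datasets", and a fixed vote fraction stands in for the entropy-based adaptive threshold:

```python
# Simplified multi-criterion, stability-oriented feature selection:
# fuse two scoring criteria over resampled datasets and keep features
# that are selected in a large fraction of runs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           shuffle=False, random_state=0)  # informative first

rng = np.random.default_rng(0)
votes = np.zeros(X.shape[1])
for _ in range(10):                                   # perturb the dataset
    idx = rng.choice(len(X), size=len(X), replace=True)
    Xb, yb = X[idx], y[idx]
    for scorer in (lambda a, b: f_classif(a, b)[0],
                   lambda a, b: mutual_info_classif(a, b, random_state=0)):
        top = np.argsort(scorer(Xb, yb))[-5:]         # top-5 per criterion
        votes[top] += 1

stable = np.flatnonzero(votes >= 0.8 * votes.max())   # stability threshold
print(stable)
```

Features that survive both criteria across perturbations are the "stable" ones a network operator can trust.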

Journal ArticleDOI
TL;DR: A two-level strategy for scheduling large workloads of parallel applications in multicore distributed systems, taking into account the minimization of both the total computation time and the energy consumption of solutions is presented.

Journal ArticleDOI
TL;DR: Theoretical analysis and simulation results show that when the synchronization period is less than 100 s, the error of 2LTSP is within 0.6 ms, no matter how large the size of the network is, even for large-scale and long-term running networks.

Journal ArticleDOI
TL;DR: A localized algorithm supported by multilevel information fusion techniques to enable detection, localization and extent determination of damage sites using the resource constrained environment of a wireless sensor network is proposed.

Journal ArticleDOI
01 Jan 2014
TL;DR: CONDE is presented, a decentralised CONtrol and DEcision-making system for smart building applications using WSANs and shows gains in terms of the following: response time; system efficiency; and energy savings from the network and the building.
Abstract: A research field that makes use of information and communication technologies to provide solutions to contemporary environmental challenges such as greenhouse gas emissions and global warming is the 'smart building' field. The use of wireless sensor and actuator networks (WSANs) emerges as an alternative for applying information and communication technologies in smart buildings. However, most smart building applications use centralised architectures, with sensing nodes transmitting messages to a base station where, effectively, the control and decision processes happen. In this context, we present CONDE, a decentralised CONtrol and DEcision-making system for smart building applications using WSANs. CONDE's main contributions are as follows: (i) the decentralisation of the control and decision-making processes among WSAN nodes, saving energy in both the WSAN and the building; (ii) the integration of applications through sharing the sensed data and chaining decisions between applications within the WSAN, also saving energy in both the WSAN and the building; and (iii) the provision of a consensual multilevel decision that takes into account the cooperation among nodes to obtain a broader view of the monitored building. The experiments performed have shown CONDE's gains in terms of the following: (i) response time; (ii) system efficiency; and (iii) energy savings for the network and the building. Copyright © 2014 John Wiley & Sons, Ltd.

Proceedings ArticleDOI
15 Dec 2014
TL;DR: This paper presents VPTCA as an energy-efficient data center network planning solution that collectively deals with virtual machine placement and communication traffic configuration, and outperforms existing algorithms in providing DCN more transmission capacity with less energy consumption.
Abstract: The Data Center (DC), the underlying infrastructure of cloud computing, has become startlingly large, with ever more powerful computing and communication capability to satisfy a wide spectrum of composite applications. In a large-scale DC, a great number of switches connect servers into one complex network. The energy consumption of this communication network has skyrocketed and is now in the same league as the computing servers' costs. More than one-third of the total energy in DCs is consumed by communication links, switching, and aggregation elements. Saving Data Center Network (DCN) energy to improve data center efficiency (power usage effectiveness, or PUE) has become a key technique in green computing. In this paper, we present VPTCA, an energy-efficient data center network planning solution that collectively deals with virtual machine placement and communication traffic configuration. VPTCA aims to reduce the DCN's energy consumption. In particular, interrelated VMs are assigned to the same server or pod, which effectively helps to reduce the amount of transmission load. At the traffic layer, VPTCA optimally uses switch ports and link bandwidth to balance the load and avoid congestion, enabling the DCN to increase its transmission capacity and saving a significant amount of network energy. In our evaluation via NS-2 simulations, the performance of VPTCA is measured and compared with two well-known DCN management algorithms, Global First Fit and Elastic Tree. Based on our experimental results, VPTCA outperforms existing algorithms in providing the DCN with more transmission capacity at less energy consumption.

Journal ArticleDOI
TL;DR: Comparisons with other algorithms reveal that EDF achieves a better balance among priority classes where high priority requests are favored while preventing lower priority requests from overstarvation.
Abstract: This paper presents a queueing theoretic performance model for a multi-priority preemptive M/G/1/./EDF system. Existing models on EDF scheduling consider them to be M/M/1 queues or non-preemptive M/G/1 queues. The proposed model approximates the mean waiting time for a given class based on the higher and lower priority tasks receiving service prior to the target and the mean residual service time experienced. Additional time caused by preemptions is estimated as part of mean request completion time for a given class and as part of the mean delay experienced due to jobs in execution, on an arrival. The model is evaluated analytically and by simulation. Results confirm its accuracy, with the difference being a factor of two on average in high loads. Comparisons with other algorithms (such as First-Come-First-Served, Round-Robin and Non-Preemptive Priority Ordered) reveal that EDF achieves a better balance among priority classes where high priority requests are favoured while preventing lower priority requests from over-starvation. EDF achieves best waiting times for higher priorities in lower to moderate loads (0.2 - 0.6) and while only being 6.5 times more than static priority algorithms in high loads (0.9). However for the lowest priority classes it achieves comparable waiting times to Round-Robin and First-Come-First-Served in low to moderate loads and achieves waiting times only twice the amount of Round-Robin in high system loads.
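As background for queueing models of this kind: the single-class, non-priority building block is the Pollaczek-Khinchine mean waiting time for an M/G/1 queue. The paper's multi-priority preemptive EDF model extends well beyond this, but the basic formula is a useful sanity check:

```python
# Background formula only (not the paper's multi-class EDF model):
# Pollaczek-Khinchine mean waiting time for an M/G/1 queue,
#   W = lam * E[S^2] / (2 * (1 - rho)),  where rho = lam * E[S].
def mg1_wait(lam, es, es2):
    rho = lam * es
    assert rho < 1, "queue must be stable"
    return lam * es2 / (2 * (1 - rho))

# Sanity check against M/M/1: exponential service with rate mu has
# E[S] = 1/mu and E[S^2] = 2/mu^2, giving W = rho / (mu - lam).
lam, mu = 0.5, 1.0
w = mg1_wait(lam, 1 / mu, 2 / mu**2)
print(w)  # 1.0 for lam=0.5, mu=1.0
```

Per-class priority models replace the single residual-service term with sums over higher- and lower-priority classes, which is exactly the structure the paper's approximation refines for preemptive EDF.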

Journal ArticleDOI
TL;DR: A Particle Swarm Optimization (PSO)-based approach is proposed, called here PSO-ParFnt, to find the relevant Pareto frontier, and the results are promising and provide new insights into this complex problem.

Journal ArticleDOI
TL;DR: A wavelet transform is used to represent remote sensing big data that are large scale in the space domain, correlated in the spectral domain, and continuous in the time domain and it is found that the scale features of different textures for the big data set are obviously reflected in the probability density function and GMM parameters of the wavelet coefficients.
Abstract: Since it is difficult to deal with big data using traditional models and algorithms, predicting and estimating the characteristics of big data is very important. Remote sensing big data consist of many large-scale images that are extremely complex in terms of their structural, spectral, and textural features. Based on multiresolution analysis theory, most natural images are sparse and show obvious clustering and persistence characteristics when they are transformed into another domain by a group of basic special functions. In this paper, we use a wavelet transform to represent remote sensing big data that are large scale in the space domain, correlated in the spectral domain, and continuous in the time domain. We decompose the big data set into approximation and multiscale detail coefficients based on a wavelet transform. In order to determine whether the density function of wavelet coefficients in a big data set is peaked at zero and has a heavy-tailed shape, a two-component Gaussian mixture model (GMM) is employed. For the first time, we use the expectation-maximization likelihood method to estimate the model parameters for the remote sensing big data set in the wavelet domain. The variance of the GMM across bands, time, and scale is comprehensively analyzed. The statistical characteristics of different textures are also compared. We find that the cluster characteristics of the wavelet coefficients are still obvious in the remote sensing big data set for different bands and different scales. However, the GMM is not always precise when modeling long-term sequence data sets. We also find that the scale features of different textures in the big data set are clearly reflected in the probability density function and GMM parameters of the wavelet coefficients.
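A toy version of that pipeline, under our own simplifications (a 1-D synthetic signal stands in for remote sensing imagery, and a single-level Haar transform stands in for the full multiscale decomposition):

```python
# Toy sketch: one-level Haar detail coefficients of a signal, then a
# two-component Gaussian mixture fitted by EM (sklearn) to capture the
# peaky-at-zero, heavy-tailed shape of the coefficient distribution.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=4096))          # stand-in for image data

pairs = signal.reshape(-1, 2)
detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # Haar detail coefficients

gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(detail.reshape(-1, 1))
print(gmm.weights_, gmm.covariances_.ravel())
```

The two components play the roles described in the abstract: a narrow-variance component models the peak at zero (smooth regions), and a wide-variance component models the heavy tails (edges and texture).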

Journal ArticleDOI
TL;DR: This paper explores how resources in the hybrid-cloud environment should be used to run Bag-of-Tasks applications and introduces a simple yet effective objective function that approximates the optimal solution with a little scheduling overhead.
Abstract: Using the virtually unlimited resource capacity of the public cloud, dynamic scaling out of large-scale applications is facilitated. A critical practical question that arises here is how to run such applications effectively in terms of both cost and performance. In this paper, we explore how resources in the hybrid-cloud environment should be used to run Bag-of-Tasks applications. Having introduced a simple yet effective objective function, our algorithm helps the user make a better decision for the realization of his/her goal. We then cope with the problem in two different cases of "known" and "unknown" running time of the available tasks. A solution to approximate the optimal value of the user's objective function is provided for each case. Specifically, a fully polynomial-time randomized approximation scheme based on a Monte Carlo sampling method is presented for the case of unknown running time. The experimental results confirm that our algorithm approximates the optimal solution with little scheduling overhead.
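The flavor of the unknown-running-time case can be shown with plain Monte Carlo sampling. Everything here is our own toy setup, not the paper's scheme: we assume i.i.d. exponential task durations and estimate the expected makespan of a fully parallel bag by sampling, then check against the known closed form:

```python
# Monte Carlo estimate of the expected makespan of a bag of tasks with
# random (unknown) durations, assumed i.i.d. Exp(1) for illustration.
import numpy as np

rng = np.random.default_rng(42)
N_TASKS, N_SAMPLES = 8, 20000

draws = rng.exponential(scale=1.0, size=(N_SAMPLES, N_TASKS))
makespans = draws.max(axis=1)          # all tasks run in parallel
estimate = makespans.mean()

# analytic check: E[max of n i.i.d. Exp(1)] = H_n, the n-th harmonic number
analytic = sum(1 / k for k in range(1, N_TASKS + 1))
print(round(estimate, 2), round(analytic, 2))
```

Sampling a candidate schedule's objective like this is what lets a randomized approximation scheme compare schedules without ever knowing the actual running times.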

Posted Content
29 Dec 2014
TL;DR: Some of the main challenges of privacy in IoT, and opportunities for research and innovation are discussed; some of the ongoing research efforts that address IoT privacy issues are introduced.
Abstract: Over the last few years, we have seen a plethora of Internet of Things (IoT) solutions, products, and services making their way into the industry's marketplace. All such solutions capture a large amount of data pertaining to the environment as well as to their users. The objective of the IoT is to learn more about system users and to serve them better. Some of these solutions may store the data locally on the devices ('things'), while others may store it in the Cloud. The real value of collecting data comes through large-scale data processing and aggregation, where new knowledge can be extracted. However, such procedures can also lead to user privacy issues. This article discusses some of the main challenges of privacy in the IoT, as well as opportunities for research and innovation. We also introduce some of the ongoing research efforts that address IoT privacy issues.

Proceedings ArticleDOI
15 Dec 2014
TL;DR: Two online algorithms to schedule multiple workflows under deadline and privacy constraints, while considering the dynamic nature of hybrid cloud environment are presented.
Abstract: Organizations overcome resource shortages by utilizing the services of multiple cloud providers. This leads to sharing resources among various public and private clouds in order to improve performance while executing the organization's complex workflow systems. Executing multiple workflows in such a hybrid environment needs an effective mapping between a workflow's tasks and cloud resources that considers the trade-off between budget and time. There is also a challenge when organizations are forced to deploy a workflow's tasks on public resources to execute the tasks before their requested deadlines without violating customers' privacy. In recent years, several online and static approaches were presented to schedule single or multiple workflows considering deadline and budget in cloud environments. However, these studies neglect the privacy constraint along with other SLAs such as deadline and budget. In this paper, we present two online algorithms to schedule multiple workflows under deadline and privacy constraints, while considering the dynamic nature of the hybrid cloud environment. The proposed algorithms were evaluated with a series of simulations as well as real experiments using real-life privacy-constrained healthcare workflows. Our two algorithms use different methods to rank the tasks: one utilises a novel technique for ranking, while the other uses an approach similar to existing studies. Results show that the novel approach outperforms the existing ranking methods.

Proceedings Article
07 Jul 2014
TL;DR: This work presents an adaptation of well-known MDFs to deal with multiple applications simultaneously in the SSANs context, and is validated through simulations and tests on real nodes in the domain of smart grid applications.
Abstract: Recent years have witnessed the emergence of the Shared Sensor and Actuator Networks (SSANs), which instead of assuming an application-specific design, allow the sensing and communication infrastructure to be shared among multiple applications. With an increasing number of sharing applications, a growing amount of sensor-generated data will be produced, from which useful information can be extracted. However, wireless sensors and actuators commonly rely on batteries as their energy sources, whose replacement is undesirable or unfeasible. Therefore, in order to reduce the amount of data to be transmitted in the wireless channel, thus saving energy, Multisensor Data Fusion Methods (MDF) can be employed. MDF can also enhance data accuracy in the SSAN scenario and make inferences that are not feasible from a single sensor or data source. Existing MDFs are currently utilized following an application-specific design for the network. We present an adaptation of well-known MDFs to deal with multiple applications simultaneously in the SSANs context. Our proposal is validated through simulations and tests on real nodes in the domain of smart grid applications.

Proceedings ArticleDOI
08 Dec 2014
TL;DR: This paper develops a workflow visualization toolkit to synthesize resource consumption and data transfer patterns and to identify the bottleneck of the workflow being studied, and addresses the optimization of scientific workflow execution in clouds by exploiting multi-core systems to parallelize bottleneck tasks.
Abstract: As scientific workflows are increasingly deployed in clouds, a myriad of studies have been conducted, including the development of workflow execution systems and scheduling/resource-management algorithms, for optimizing the execution of these workflows. However, the efficacy of most, if not all, of these previous works is limited by the original design and structure of the workflow, i.e., sequential code and a few bottleneck tasks. In this paper, we address the optimization of scientific workflow execution in clouds by exploiting multi-core systems to parallelize bottleneck tasks. To this end, we develop a workflow visualization toolkit to synthesize resource consumption and data transfer patterns, as well as to identify the bottleneck of the workflow being studied. Parallelization techniques are then applied to the module identified as the bottleneck in order to take full advantage of the underlying multi-core computing environment. Testing results with a 6.0-degree Montage example on Amazon EC2 with various configurations show that our optimization of workflows (bottleneck tasks in particular) reduces completion time (or makespan) by 21% to 43%, depending on the instance type used to run the workflow, without any impact on cost.
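The parallelization step can be sketched as follows: the bottleneck module's input is split into independent pieces that are processed concurrently across the cores of a cloud instance. This is a minimal illustration of the general technique, not the paper's toolkit; `reproject_tile` is a hypothetical stand-in for a CPU-bound Montage-style operation.

```python
from concurrent.futures import ProcessPoolExecutor

def reproject_tile(tile):
    # Hypothetical stand-in for a CPU-bound operation on one image tile.
    return sum(x * x for x in tile)

def run_bottleneck(tiles, workers=4):
    """Run the bottleneck module's independent tiles in parallel,
    using one worker process per core to sidestep Python's GIL."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reproject_tile, tiles))
```

On an instance with N cores, a perfectly divisible bottleneck of this shape can approach an N-fold speedup, which is the effect the 21% to 43% makespan reductions reflect at the whole-workflow level.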

Journal ArticleDOI
TL;DR: A special issue devoted to Localized Algorithms for Information Fusion in Resource-Constrained Networks, which deals with the challenge of working with partial views, or incomplete data, to provide accurate results at reduced cost.

Journal ArticleDOI
01 Jan 2014
TL;DR: The p-index (pagerank-index) is introduced, which is computed from the underlying citation network of papers using the PageRank algorithm, and it is demonstrated that the metric aids in fairer ranking of scientists compared to the h-index and its variants.
Abstract: The indices currently used by scholarly databases, such as Google Scholar, to rank scientists do not attach weights to citations, nor is the underlying network structure of citations considered in computing these metrics. As a result, scientists cited by well-recognized journals are not rewarded, and the metrics are open to misuse if documents are created purely to cite others. In this paper we introduce a new ranking metric, the p-index (pagerank-index), which is computed from the underlying citation network of papers using the PageRank algorithm. The index is a percentile score, can potentially be implemented in public databases such as Google Scholar, and can be applied at many levels of abstraction. We demonstrate that the metric enables fairer ranking of scientists compared to the h-index and its variants. We do this by simulating a realistic model of the evolution of citation and collaboration networks in a particular field and comparing the h-index and p-index of scientists under a number of scenarios. Our results show that the p-index is immune to author behaviors that can artificially inflate h-index values.
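The building blocks of such a metric, PageRank over a citation graph followed by a percentile score, can be sketched as below. This is an illustrative simplification, not the exact p-index definition from the paper; the percentile rule shown (fraction of papers ranked below the author's best paper) is an assumption made for the example.

```python
def pagerank(citations, d=0.85, iters=50):
    """Power-iteration PageRank over a citation graph given as
    {paper: [cited papers]}; every paper must appear as a key."""
    papers = list(citations)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in papers}
        for p, cited in citations.items():
            if cited:
                share = d * rank[p] / len(cited)
                for q in cited:
                    new[q] += share
            else:
                # Dangling paper (cites nothing): redistribute uniformly.
                for q in papers:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

def percentile_score(author_papers, rank):
    """Illustrative percentile: share of papers whose PageRank falls
    below the author's best-ranked paper."""
    scores = list(rank.values())
    best = max(rank[p] for p in author_papers)
    below = sum(1 for s in scores if s < best)
    return 100.0 * below / len(scores)
```

In a toy graph where papers A and C both cite B, PageRank puts B above A and C, so a citation from a highly ranked paper is worth more than one from an obscure paper, which is exactly the weighting that raw citation counts miss.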

Journal ArticleDOI
TL;DR: The Multi-Application Requirements Aware and Energy Efficiency algorithm is presented as a new resource allocation heuristic for multi-functional WSN systems to maximize system lifetime subject to various application requirements.
Abstract: The multi-functional wireless sensor network (WSN) system is a new design trend for WSNs, which are evolving from dedicated application-specific systems into an integrated infrastructure that supports the execution of multiple concurrent applications. Such a system offers inherent advantages in terms of cost and flexibility because it allows the effective utilization of available sensors and resource sharing among multiple applications. However, sensor nodes are very constrained in resources, particularly energy. Therefore, the usage of these resources needs to be carefully managed, and sharing them among several applications imposes new challenges in achieving energy efficiency in these networks. In order to exploit the full potential of multi-functional WSN systems, it is crucial to design mechanisms that effectively allocate tasks onto sensors so that the entire system lifetime is maximized while meeting various application requirements. However, it is likely that the requirements of different applications cannot all be met simultaneously. In this paper, we present the Multi-Application Requirements Aware and Energy Efficiency algorithm, a new resource allocation heuristic for multi-functional WSN systems that maximizes system lifetime subject to various application requirements. The heuristic effectively handles different, possibly conflicting, quality-of-service parameters, trading them off against each other and exploiting the heterogeneity of multiple WSNs. Copyright © 2013 John Wiley & Sons, Ltd.
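The flavor of such a lifetime-maximizing allocation can be shown with a minimal greedy sketch: each task goes to the capable sensor with the most residual energy, an energy-balancing proxy for system lifetime. This is an illustrative heuristic, not the paper's algorithm; the task and sensor representations are hypothetical.

```python
def allocate(tasks, sensors):
    """Greedy energy-balancing allocation (illustrative sketch).

    tasks:   list of (name, energy_cost, required_capability)
    sensors: dict name -> {"energy": residual energy,
                           "caps": set of supported capabilities}
    Assigns each task to the capable sensor with the most residual
    energy, spreading load to delay the first node death.
    """
    assignment = {}
    for name, cost, cap in tasks:
        candidates = [s for s, info in sensors.items()
                      if cap in info["caps"] and info["energy"] >= cost]
        if not candidates:
            continue  # this application requirement cannot be met
        best = max(candidates, key=lambda s: sensors[s]["energy"])
        sensors[best]["energy"] -= cost
        assignment[name] = best
    return assignment
```

Because the residual energies are updated after each assignment, successive tasks naturally spread across sensors instead of draining a single node, which is the intuition behind lifetime maximization in a shared WSN.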