
Showing papers by "Albert Y. Zomaya published in 2014"


Journal ArticleDOI
TL;DR: This paper introduces concepts and algorithms related to clustering, provides a concise survey of existing clustering algorithms, and compares them from both a theoretical and an empirical perspective.
Abstract: Clustering algorithms have emerged as an alternative powerful meta-learning tool to accurately analyze the massive volumes of data generated by modern applications. In particular, their main goal is to categorize data into clusters such that objects are grouped in the same cluster when they are similar according to specific metrics. There is a vast body of knowledge in the area of clustering, and there have been attempts to analyze and categorize these algorithms for a larger number of applications. However, one of the major issues in using clustering algorithms for big data that causes confusion amongst practitioners is the lack of consensus in the definition of their properties as well as a lack of formal categorization. With the intention of alleviating these problems, this paper introduces concepts and algorithms related to clustering, provides a concise survey of existing clustering algorithms, and offers a comparison from both a theoretical and an empirical perspective. From a theoretical perspective, we developed a categorizing framework based on the main properties pointed out in previous studies. Empirically, we conducted extensive experiments in which we compared the most representative algorithm from each of the categories using a large number of real (big) data sets. The effectiveness of the candidate clustering algorithms is measured through a number of internal and external validity metrics, stability, runtime, and scalability tests. In addition, we highlight the set of clustering algorithms that perform best for big data.

833 citations
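The survey's empirical comparison scores each candidate algorithm with internal and external validity metrics. A minimal sketch of that methodology, assuming an illustrative dataset and a single representative algorithm (the paper's actual benchmarks and algorithm set are far larger):

```python
# Sketch of the evaluation methodology only (not the survey's code):
# score one clustering algorithm with an internal validity metric
# (silhouette) and an external one (adjusted Rand index vs. labels).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

X, y_true = make_blobs(n_samples=600, centers=3, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

internal = silhouette_score(X, labels)          # needs no ground truth
external = adjusted_rand_score(y_true, labels)  # needs true labels

print(f"silhouette={internal:.2f}, ARI={external:.2f}")
```

Internal metrics let the comparison run on unlabeled big data, while external metrics validate against known structure where labels exist.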


Journal ArticleDOI
TL;DR: It is argued that a middleware platform is required to manage heterogeneous WSNs and efficiently share their resources while satisfying user needs in the emergent scenarios of WoT.

95 citations


Journal ArticleDOI
TL;DR: Next-generation datacenters (DCs) built on virtualization technologies are pivotal to the effective implementation of the cloud computing paradigm and face major reliability and robustness challenges.
Abstract: Next-generation datacenters (DCs) built on virtualization technologies are pivotal to the effective implementation of the cloud computing paradigm. To deliver the necessary services and quality of service, cloud DCs face major reliability and robustness challenges.

93 citations


Book ChapterDOI
TL;DR: The nature of distributed computation has long been a topic of interest in complex systems science, physics, artificial life and bioinformatics and has been postulated to be associated with the capability to support universal computation.
Abstract: The nature of distributed computation has long been a topic of interest in complex systems science, physics, artificial life and bioinformatics. In particular, emergent complex behavior has often been described from the perspective of computation within the system (Mitchell 1998b,a) and has been postulated to be associated with the capability to support universal computation (Langton 1990; Wolfram 1984c; Casti 1991).

82 citations


Journal ArticleDOI
TL;DR: A novel cloud-based recommendation framework, OmniSuggest, is proposed that utilizes: 1) ant colony algorithms, 2) social filtering, and 3) hub and authority scores to generate optimal venue recommendations; it offers more effective recommendations than many state-of-the-art schemes.
Abstract: The evolution of mobile social networks and the availability of online check-in services, such as Foursquare and Gowalla, have initiated a new wave of research in the area of venue recommendation systems. Such systems recommend places to users closely related to their preferences. Although venue recommendation systems have been studied in recent literature, the existing approaches, mostly based on collaborative filtering, suffer from various issues, such as: 1) data sparseness, 2) cold start, and 3) scalability. Moreover, many existing schemes are limited in functionality, as the generated recommendations do not consider group of “friends” type situations. Furthermore, the traditional systems do not take into account the effect of real-time physical factors (e.g., distance from venue, traffic, and weather conditions) on recommendations. To address the aforementioned issues, this paper proposes a novel cloud-based recommendation framework OmniSuggest that utilizes: 1) Ant colony algorithms, 2) social filtering, and 3) hub and authority scores, to generate optimal venue recommendations. Unlike existing work, our approach suggests venues at a finer granularity for an individual or a “group” of friends with similar interest. Comprehensive experiments are conducted with a large-scale real dataset collected from Foursquare. The results confirm that our method offers more effective recommendations than many state of the art schemes.

81 citations


Journal ArticleDOI
01 Jun 2014
TL;DR: In this article, the authors explore and discuss existing resource allocation mechanisms for resource allocation problems employed in Grid systems and compare them based on their common features such as time complexity, searching mechanism, allocation strategy, optimality, operational environment and objective function they adopt for solving computing and data-intensive applications.
Abstract: Grid is a distributed high performance computing paradigm that offers various types of resources (like computing, storage, communication) to resource-intensive user tasks. These tasks are scheduled to allocate available Grid resources efficiently to achieve high system throughput and to satisfy user requirements. The task scheduling problem has become more complex with the ever-increasing size of Grid systems. Even though selecting an efficient resource allocation strategy for a particular task helps in obtaining a desired level of service, researchers still face difficulties in choosing a suitable technique from a plethora of existing methods in the literature. In this paper, we explore and discuss existing resource allocation mechanisms for resource allocation problems employed in Grid systems. The work comprehensively surveys Grid resource allocation mechanisms for different architectures (centralized, distributed, static or dynamic). The paper also compares these resource allocation mechanisms based on their common features such as time complexity, searching mechanism, allocation strategy, optimality, operational environment and objective function they adopt for solving computing- and data-intensive applications. The comprehensive analysis of cutting-edge research in the Grid domain presented in this work provides readers with an understanding of essential concepts of resource allocation mechanisms in Grid systems and helps them identify important and outstanding issues for further investigation. It also helps readers to choose the most appropriate mechanism for a given system/application.

75 citations


Journal Article
TL;DR: The comprehensive analysis of cutting-edge research in the Grid domain presented in this work provides readers with an understanding of essential concepts of resource allocation mechanisms in Grid systems and helps them identify important and outstanding issues for further investigation and helps readers to choose the most appropriate mechanism for a given system/application.
Abstract: Grid is a distributed high performance computing paradigm that offers various types of resources (like computing, storage, communication) to resource-intensive user tasks. These tasks are scheduled to allocate available Grid resources efficiently to achieve high system throughput and to satisfy user requirements. The task scheduling problem has become more complex with the ever-increasing size of Grid systems. Even though selecting an efficient resource allocation strategy for a particular task helps in obtaining a desired level of service, researchers still face difficulties in choosing a suitable technique from a plethora of existing methods in the literature. In this paper, we explore and discuss existing resource allocation mechanisms for resource allocation problems employed in Grid systems. The work comprehensively surveys Grid resource allocation mechanisms for different architectures (centralized, distributed, static or dynamic). The paper also compares these resource allocation mechanisms based on their common features such as time complexity, searching mechanism, allocation strategy, optimality, operational environment and objective function they adopt for solving computing- and data-intensive applications. The comprehensive analysis of cutting-edge research in the Grid domain presented in this work provides readers with an understanding of essential concepts of resource allocation mechanisms in Grid systems and helps them identify important and outstanding issues for further investigation. It also helps readers to choose the most appropriate mechanism for a given system/application.

71 citations


Journal ArticleDOI
TL;DR: The application of SSO techniques to imbalanced and ensemble learning problems, respectively, is described, and the utilities and advantages of the proposed techniques are demonstrated on a variety of bioinformatics applications where class imbalance, small sample size, and noisy data are prevalent.
Abstract: Data sampling is a widely used technique in a broad range of machine learning problems. Traditional sampling approaches generally rely on random resampling from a given dataset. However, these approaches do not take into consideration additional information, such as sample quality and usefulness. We recently proposed a data sampling technique, called sample subset optimization (SSO). The SSO technique relies on a cross-validation procedure for identifying and selecting the most useful samples as subsets. In this paper, we describe the application of SSO techniques to imbalanced and ensemble learning problems, respectively. For imbalanced learning, the SSO technique is employed as an under-sampling technique for identifying a subset of highly discriminative samples in the majority class. In ensemble learning, the SSO technique is utilized as a generic ensemble technique where multiple optimized subsets of samples from each class are selected for building an ensemble classifier. We demonstrate the utilities and advantages of the proposed techniques on a variety of bioinformatics applications where class imbalance, small sample size, and noisy data are prevalent.

69 citations
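The under-sampling use of SSO keeps only the most useful majority-class samples. A crude stand-in for that idea, under our own simplifying assumption (the paper's SSO scores samples via cross-validation; here we merely keep the majority points nearest the minority class, with synthetic data):

```python
# Toy illustration of informed under-sampling (NOT the paper's SSO
# procedure): keep the k majority-class points closest to the minority
# class centroid, yielding a balanced training subset.
import numpy as np

rng = np.random.default_rng(0)
X_min = rng.normal(loc=0.0, scale=1.0, size=(30, 2))    # minority class
X_maj = rng.normal(loc=3.0, scale=1.0, size=(300, 2))   # majority class

k = len(X_min)  # under-sample majority down to the minority size
centroid = X_min.mean(axis=0)
dist = np.linalg.norm(X_maj - centroid, axis=1)
keep = np.argsort(dist)[:k]          # most boundary-relevant majority points
X_maj_sub = X_maj[keep]

print(X_maj_sub.shape)  # balanced: (30, 2)
```

The point of selection-based under-sampling, as opposed to random resampling, is exactly this: the retained subset is chosen by a usefulness criterion rather than by chance.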


Journal ArticleDOI
TL;DR: This paper defines the independent batch scheduling in Computational Grid as a three-objective global optimization problem with makespan, flowtime and energy consumption as the main scheduling criteria minimized according to different security constraints, and develops six genetic-based single- and multi-population meta-heuristics for solving the considered optimization problem.

67 citations


Journal ArticleDOI
TL;DR: A task-tree based mosaicking method for large-scale remote sensed imagery with dynamic DAG scheduling is proposed; it expresses large-scale mosaicking as a data-driven task tree with minimal height.
Abstract: Remote sensed imagery mosaicking at large scale has been receiving increasing attention in regional-to-global research. However, when scaling to large areas, image mosaicking becomes extremely challenging because of the dependency relationships among a large collection of tasks, which give rise to ordering constraints, the demand for significant processing capabilities, and the difficulties inherent in organizing these enormous tasks and RS image data. We propose a task-tree based mosaicking approach for remote sensed imagery at large scale with dynamic DAG scheduling. It expresses large-scale mosaicking as a data-driven task tree with minimal height. A critical-path based dynamic DAG scheduling solution with a status queue, named CPDS-SQ, is also provided to produce an optimized schedule on a multi-core cluster with minimal completion time. All the individual dependent tasks are run by a core parallel mosaicking program implemented with MPI, which performs mosaicking on different pairs of images. Eventually, an effective yet simple approach is offered to improve large-scale processing capability by decoupling the dependency relationships among tasks from the complex parallel processing procedure. Through experiments on large-scale mosaicking, we confirmed that our approach is efficient and scalable.

63 citations
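Critical-path based scheduling of a task DAG, as in CPDS-SQ, starts from the longest cost-weighted path through the dependency graph. A minimal sketch with hypothetical task costs (the paper's mosaicking trees and cost model are of course much richer):

```python
# Minimal critical-path length computation over a task DAG via a
# topological sweep; tasks and costs are invented for illustration.
from collections import defaultdict, deque

cost = {"A": 3, "B": 2, "C": 4, "D": 1}
edges = [("A", "C"), ("B", "C"), ("C", "D")]  # A,B -> C -> D

succ = defaultdict(list)
indeg = defaultdict(int)
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1

# earliest finish time per task, propagated in topological order
finish = {t: cost[t] for t in cost}
q = deque(t for t in cost if indeg[t] == 0)
while q:
    u = q.popleft()
    for v in succ[u]:
        finish[v] = max(finish[v], finish[u] + cost[v])
        indeg[v] -= 1
        if indeg[v] == 0:
            q.append(v)

print(max(finish.values()))  # critical-path length: 3 + 4 + 1 = 8
```

Tasks on the critical path are the ones a scheduler must prioritize, since any delay to them delays the whole mosaicking job.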


Journal ArticleDOI
TL;DR: This paper introduces EcoHealth (Ecosystem of Health Care Devices), a Web middleware platform for connecting doctors and patients using attached body sensors, thus aiming to provide improved health monitoring and diagnosis for patients.

Journal ArticleDOI
TL;DR: Experimental results show that the evolutionary approach compared with existing methods, such as Monte Carlo and Blind Pick, can achieve higher overall average scheduling performance in real-world applications with dynamic workloads and an optimal computing budget allocating method that smartly allocates computing cycles to the most promising schedules.
Abstract: Scheduling of dynamic and multitasking workloads for big-data analytics is a challenging issue, as it requires a significant amount of parameter sweeping and iterations. Therefore, real-time scheduling becomes essential to increase the throughput of many-task computing. The difficulty lies in obtaining a series of optimal yet responsive schedules. In dynamic scenarios, such as virtual clusters in cloud, scheduling must be processed fast enough to keep pace with the unpredictable fluctuations in the workloads to optimize the overall system performance. In this paper, ordinal optimization using rough models and fast simulation is introduced to obtain suboptimal solutions in a much shorter timeframe. While the scheduling solution for each period may not be the best, ordinal optimization can be processed fast in an iterative and evolutionary way to capture the details of big-data workload dynamism. Experimental results show that our evolutionary approach compared with existing methods, such as Monte Carlo and Blind Pick, can achieve higher overall average scheduling performance, such as throughput, in real-world applications with dynamic workloads. Furthermore, performance improvement is seen by implementing an optimal computing budget allocating method that smartly allocates computing cycles to the most promising schedules.

Journal ArticleDOI
TL;DR: PANDA is presented, a framework for static scheduling BoT applications across resources in both private and public clouds that incorporates a fully polynomial-time approximation scheme (FPTAS) as a novel scheduling algorithm, which generates schedules with the best trade-off point between cost and performance; hence Pareto-optimality.
Abstract: Large-scale Bag-of-Tasks (BoT) applications are characterized by their massively parallel, yet independent operations. The use of resources in public clouds to dynamically expand the capacity of a private computer system might be an appealing alternative to cope with such massive parallelism. To fully realize the benefit of this 'cloud bursting', the performance to cost ratio (or cost efficiency) must be thoroughly studied and incorporated into scheduling and resource allocation strategies. In this paper, we present PANDA, a framework for static scheduling of BoT applications across resources in both private and public clouds. At its core, the framework incorporates a fully polynomial-time approximation scheme (FPTAS) as a novel scheduling algorithm, which generates schedules with the best trade-off point between cost and performance; hence Pareto-optimality. We have theoretically discussed the complexity and correctness of our algorithms, and experimentally verified their efficacy and practicality using ISOMAP, a widely used nonlinear manifold method, as a real-world BoT application. Our evaluation, conducted in a 'multi-cloud' environment of our 40-core private system and the Amazon EC2 public cloud, demonstrates that the scheduling quality of PANDA is guaranteed to be within a measurable distance from the optimal solution. Results obtained from our experiments show that such quality is within 8 percent of the optimum. We also show the sensitivity and robustness of our scheduling solutions against performance errors in both resources and applications.

Journal ArticleDOI
01 Jul 2014
TL;DR: The proposed GOA first combines multiple well-known feature selection (FS) techniques to yield possible optimal feature subsets across different traffic datasets; an adaptive threshold based on entropy is then proposed to extract the stable features.
Abstract: There is significant interest in the network management community about the need to identify the most optimal and stable features for network traffic data. In practice, feature selection techniques are used as a pre-processing step to eliminate meaningless features, and also as a tool to reveal the set of optimal features. Unfortunately, such techniques are often sensitive to small variations in the traffic data. Thus, obtaining a stable feature set is crucial in enhancing the confidence of network operators. This paper proposes a robust approach, called the Global Optimization Approach (GOA), to identify both optimal and stable features, relying on a multi-criterion fusion-based feature selection technique and an information-theoretic method. The proposed GOA first combines multiple well-known FS techniques to yield possible optimal feature subsets across different traffic datasets; an adaptive threshold based on entropy is then used to extract the stable features. A new goodness measure is proposed within a Random Forest framework to estimate the final optimum feature subset. Experimental studies on network traffic data in spatial and temporal domains show that the proposed GOA approach outperforms the commonly used feature selection techniques for the traffic classification task.
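The fusion-then-stability idea can be sketched as follows. This is our own simplified construction, not GOA itself: two stock selection criteria stand in for the paper's FS techniques, bootstrap resampling stands in for "different traffic datasets", and a fixed vote fraction stands in for the entropy-based adaptive threshold:

```python
# Simplified multi-criterion, stability-oriented feature selection:
# fuse two scoring criteria over resampled datasets and keep features
# that are selected in a large fraction of runs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           shuffle=False, random_state=0)  # informative first

rng = np.random.default_rng(0)
votes = np.zeros(X.shape[1])
for _ in range(10):                                   # perturb the dataset
    idx = rng.choice(len(X), size=len(X), replace=True)
    Xb, yb = X[idx], y[idx]
    for scorer in (lambda a, b: f_classif(a, b)[0],
                   lambda a, b: mutual_info_classif(a, b, random_state=0)):
        top = np.argsort(scorer(Xb, yb))[-5:]         # top-5 per criterion
        votes[top] += 1

stable = np.flatnonzero(votes >= 0.8 * votes.max())   # stability threshold
print(stable)
```

Features that survive both criteria across perturbations are the "stable" ones a network operator can trust.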

Journal ArticleDOI
TL;DR: A two-level strategy for scheduling large workloads of parallel applications in multicore distributed systems, taking into account the minimization of both the total computation time and the energy consumption of solutions is presented.

Journal ArticleDOI
TL;DR: Theoretical analysis and simulation results show that when the synchronization period is less than 100 s, the error of 2LTSP is within 0.6 ms, no matter how large the size of the network is, even for large-scale and long-term running networks.

Journal ArticleDOI
TL;DR: A localized algorithm supported by multilevel information fusion techniques to enable detection, localization and extent determination of damage sites using the resource constrained environment of a wireless sensor network is proposed.

Journal ArticleDOI
01 Jan 2014
TL;DR: CONDE is presented, a decentralised CONtrol and DEcision-making system for smart building applications using WSANs and shows gains in terms of the following: response time; system efficiency; and energy savings from the network and the building.
Abstract: A research field that makes use of information and communication technologies to provide solutions to contemporary environmental challenges such as greenhouse gas emissions and global warming is the 'smart building' field. The use of wireless sensor and actuator networks (WSANs) emerges as an alternative for applying information and communication technologies in smart buildings. However, most smart building applications use centralised architectures, with sensing nodes transmitting messages to a base station where, effectively, the control and decision processes happen. In this context, we present CONDE, a decentralised CONtrol and DEcision-making system for smart building applications using WSANs. CONDE's main contributions are as follows: (i) the decentralisation of the control and decision-making processes among WSAN nodes, saving energy in both the WSAN and the building; (ii) the integration of applications through sharing the sensed data and chaining decisions between applications within the WSAN, also saving energy in both the WSAN and the building; and (iii) the provision of a consensual multilevel decision that takes into account the cooperation among nodes to obtain a broader view of the monitored building. The experiments performed have shown CONDE's gains in terms of the following: (i) response time; (ii) system efficiency; and (iii) energy savings for the network and the building. Copyright © 2014 John Wiley & Sons, Ltd.

Proceedings ArticleDOI
15 Dec 2014
TL;DR: This paper presents VPTCA as an energy-efficient data center network planning solution that collectively deals with virtual machine placement and communication traffic configuration, and outperforms existing algorithms in providing DCN more transmission capacity with less energy consumption.
Abstract: The Data Center (DC), the underlying infrastructure of cloud computing, has become startlingly large, with ever more powerful computing and communication capability to satisfy a wide spectrum of composite applications. In a large-scale DC, a great number of switches connect servers into one complex network. The energy consumption of this communication network has skyrocketed and is now in the same league as the computing servers' costs. More than one-third of the total energy in DCs is consumed by communication links, switching, and aggregation elements. Saving Data Center Network (DCN) energy to improve data center efficiency (power usage effectiveness, or PUE) has become a key technique in green computing. In this paper, we present VPTCA, an energy-efficient data center network planning solution that collectively deals with virtual machine placement and communication traffic configuration. VPTCA aims to reduce the DCN's energy consumption. In particular, interrelated VMs are assigned to the same server or pod, which effectively helps to reduce the amount of transmission load. At the traffic layer, VPTCA optimally uses switch ports and link bandwidth to balance the load and avoid congestion, enabling the DCN to increase its transmission capacity and saving a significant amount of network energy. In our evaluation via NS-2 simulations, the performance of VPTCA is measured and compared with two well-known DCN management algorithms, Global First Fit and Elastic Tree. Based on our experimental results, VPTCA outperforms existing algorithms in providing the DCN with more transmission capacity at less energy consumption.

Journal ArticleDOI
TL;DR: Comparisons with other algorithms reveal that EDF achieves a better balance among priority classes where high priority requests are favored while preventing lower priority requests from overstarvation.
Abstract: This paper presents a queueing theoretic performance model for a multi-priority preemptive M/G/1/./EDF system. Existing models on EDF scheduling consider them to be M/M/1 queues or non-preemptive M/G/1 queues. The proposed model approximates the mean waiting time for a given class based on the higher and lower priority tasks receiving service prior to the target and the mean residual service time experienced. Additional time caused by preemptions is estimated as part of mean request completion time for a given class and as part of the mean delay experienced due to jobs in execution, on an arrival. The model is evaluated analytically and by simulation. Results confirm its accuracy, with the difference being a factor of two on average in high loads. Comparisons with other algorithms (such as First-Come-First-Served, Round-Robin and Non-Preemptive Priority Ordered) reveal that EDF achieves a better balance among priority classes where high priority requests are favoured while preventing lower priority requests from over-starvation. EDF achieves best waiting times for higher priorities in lower to moderate loads (0.2 - 0.6) and while only being 6.5 times more than static priority algorithms in high loads (0.9). However for the lowest priority classes it achieves comparable waiting times to Round-Robin and First-Come-First-Served in low to moderate loads and achieves waiting times only twice the amount of Round-Robin in high system loads.
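As background for queueing models of this kind: the single-class, non-priority building block is the Pollaczek-Khinchine mean waiting time for an M/G/1 queue. The paper's multi-priority preemptive EDF model extends well beyond this, but the basic formula is a useful sanity check:

```python
# Background formula only (not the paper's multi-class EDF model):
# Pollaczek-Khinchine mean waiting time for an M/G/1 queue,
#   W = lam * E[S^2] / (2 * (1 - rho)),  where rho = lam * E[S].
def mg1_wait(lam, es, es2):
    rho = lam * es
    assert rho < 1, "queue must be stable"
    return lam * es2 / (2 * (1 - rho))

# Sanity check against M/M/1: exponential service with rate mu has
# E[S] = 1/mu and E[S^2] = 2/mu^2, giving W = rho / (mu - lam).
lam, mu = 0.5, 1.0
w = mg1_wait(lam, 1 / mu, 2 / mu**2)
print(w)  # 1.0 for lam=0.5, mu=1.0
```

Per-class priority models replace the single residual-service term with sums over higher- and lower-priority classes, which is exactly the structure the paper's approximation refines for preemptive EDF.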

Journal ArticleDOI
TL;DR: A Particle Swarm Optimization (PSO)-based approach is proposed, called here PSO-ParFnt, to find the relevant Pareto frontier, and the results are promising and provide new insights into this complex problem.

Journal ArticleDOI
TL;DR: A wavelet transform is used to represent remote sensing big data that are large scale in the space domain, correlated in the spectral domain, and continuous in the time domain and it is found that the scale features of different textures for the big data set are obviously reflected in the probability density function and GMM parameters of the wavelet coefficients.
Abstract: Since it is difficult to deal with big data using traditional models and algorithms, predicting and estimating the characteristics of big data is very important. Remote sensing big data consist of many large-scale images that are extremely complex in terms of their structural, spectral, and textural features. Based on multiresolution analysis theory, most natural images are sparse and show obvious clustering and persistence characteristics when they are transformed into another domain by a group of basic special functions. In this paper, we use a wavelet transform to represent remote sensing big data that are large scale in the space domain, correlated in the spectral domain, and continuous in the time domain. We decompose the big data set into approximation and multiscale detail coefficients based on a wavelet transform. In order to determine whether the density function of wavelet coefficients in a big data set is peaked at zero and has a heavy-tailed shape, a two-component Gaussian mixture model (GMM) is employed. For the first time, we use the expectation-maximization likelihood method to estimate the model parameters for the remote sensing big data set in the wavelet domain. The variance of the GMM across bands, time, and scale is comprehensively analyzed. The statistical characteristics of different textures are also compared. We find that the cluster characteristics of the wavelet coefficients are still obvious in the remote sensing big data set for different bands and different scales. However, the GMM is not always precise when modeling long-term sequence data sets. We also find that the scale features of different textures in the big data set are clearly reflected in the probability density function and GMM parameters of the wavelet coefficients.
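A toy version of that pipeline, under our own simplifications (a 1-D synthetic signal stands in for remote sensing imagery, and a single-level Haar transform stands in for the full multiscale decomposition):

```python
# Toy sketch: one-level Haar detail coefficients of a signal, then a
# two-component Gaussian mixture fitted by EM (sklearn) to capture the
# peaky-at-zero, heavy-tailed shape of the coefficient distribution.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=4096))          # stand-in for image data

pairs = signal.reshape(-1, 2)
detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # Haar detail coefficients

gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(detail.reshape(-1, 1))
print(gmm.weights_, gmm.covariances_.ravel())
```

The two components play the roles described in the abstract: a narrow-variance component models the peak at zero (smooth regions), and a wide-variance component models the heavy tails (edges and texture).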

Journal ArticleDOI
TL;DR: This paper explores how resources in the hybrid-cloud environment should be used to run Bag-of-Tasks applications and introduces a simple yet effective objective function that approximates the optimal solution with a little scheduling overhead.
Abstract: Using the virtually unlimited resource capacity of the public cloud, dynamic scaling out of large-scale applications is facilitated. A critical practical question that arises here is how to run such applications effectively in terms of both cost and performance. In this paper, we explore how resources in the hybrid-cloud environment should be used to run Bag-of-Tasks applications. Having introduced a simple yet effective objective function, our algorithm helps the user make a better decision for the realization of his/her goal. We then cope with the problem in two different cases of "known" and "unknown" running time of the available tasks. A solution to approximate the optimal value of the user's objective function is provided for each case. Specifically, a fully polynomial-time randomized approximation scheme based on a Monte Carlo sampling method is presented for the case of unknown running time. The experimental results confirm that our algorithm approximates the optimal solution with little scheduling overhead.
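The flavor of the unknown-running-time case can be shown with plain Monte Carlo sampling. Everything here is our own toy setup, not the paper's scheme: we assume i.i.d. exponential task durations and estimate the expected makespan of a fully parallel bag by sampling, then check against the known closed form:

```python
# Monte Carlo estimate of the expected makespan of a bag of tasks with
# random (unknown) durations, assumed i.i.d. Exp(1) for illustration.
import numpy as np

rng = np.random.default_rng(42)
N_TASKS, N_SAMPLES = 8, 20000

draws = rng.exponential(scale=1.0, size=(N_SAMPLES, N_TASKS))
makespans = draws.max(axis=1)          # all tasks run in parallel
estimate = makespans.mean()

# analytic check: E[max of n i.i.d. Exp(1)] = H_n, the n-th harmonic number
analytic = sum(1 / k for k in range(1, N_TASKS + 1))
print(round(estimate, 2), round(analytic, 2))
```

Sampling a candidate schedule's objective like this is what lets a randomized approximation scheme compare schedules without ever knowing the actual running times.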

Posted Content
29 Dec 2014
TL;DR: Some of the main challenges of privacy in IoT, and opportunities for research and innovation are discussed; some of the ongoing research efforts that address IoT privacy issues are introduced.
Abstract: Over the last few years, we have seen a plethora of Internet of Things (IoT) solutions, products, and services making their way into the industry's marketplace. All such solutions capture a large amount of data pertaining to the environment as well as to their users. The objective of the IoT is to learn more about system users and to serve them better. Some of these solutions may store the data locally on the devices ('things'), while others may store it in the Cloud. The real value of collecting data comes through large-scale data processing and aggregation, where new knowledge can be extracted. However, such procedures can also lead to user privacy issues. This article discusses some of the main challenges of privacy in the IoT, as well as opportunities for research and innovation. We also introduce some of the ongoing research efforts that address IoT privacy issues.

Proceedings ArticleDOI
15 Dec 2014
TL;DR: Two online algorithms to schedule multiple workflows under deadline and privacy constraints, while considering the dynamic nature of hybrid cloud environment are presented.
Abstract: Organizations overcome resource shortages by utilizing the services of multiple cloud providers. This leads to sharing resources among various public and private clouds in order to improve performance while executing the organization's complex workflow systems. Executing multiple workflows in such a hybrid environment needs an effective mapping between a workflow's tasks and cloud resources that considers the trade-off between budget and time. There is also a challenge when organizations are forced to deploy a workflow's tasks on public resources to execute the tasks before their requested deadlines without violating customers' privacy. In recent years, several online and static approaches were presented to schedule single or multiple workflows considering deadline and budget in cloud environments. However, these studies neglect the privacy constraint along with other SLAs such as deadline and budget. In this paper, we present two online algorithms to schedule multiple workflows under deadline and privacy constraints, while considering the dynamic nature of the hybrid cloud environment. The proposed algorithms were evaluated with a series of simulations as well as real experiments using real-life privacy-constrained healthcare workflows. Our two algorithms use different methods to rank the tasks: one utilises a novel technique for ranking, while the other uses an approach similar to existing studies. Results show that the novel approach outperforms the existing ranking methods.

Proceedings Article
07 Jul 2014
TL;DR: This work presents an adaptation of well-known MDFs to deal with multiple applications simultaneously in the SSANs context, and is validated through simulations and tests on real nodes in the domain of smart grid applications.
Abstract: Recent years have witnessed the emergence of the Shared Sensor and Actuator Networks (SSANs), which instead of assuming an application-specific design, allow the sensing and communication infrastructure to be shared among multiple applications. With an increasing number of sharing applications, a growing amount of sensor-generated data will be produced, from which useful information can be extracted. However, wireless sensors and actuators commonly rely on batteries as their energy sources, whose replacement is undesirable or unfeasible. Therefore, in order to reduce the amount of data to be transmitted in the wireless channel, thus saving energy, Multisensor Data Fusion Methods (MDF) can be employed. MDF can also enhance data accuracy in the SSAN scenario and make inferences that are not feasible from a single sensor or data source. Existing MDFs are currently utilized following an application-specific design for the network. We present an adaptation of well-known MDFs to deal with multiple applications simultaneously in the SSANs context. Our proposal is validated through simulations and tests on real nodes in the domain of smart grid applications.

Proceedings ArticleDOI
08 Dec 2014
TL;DR: This paper develops a workflow visualization toolkit to synthesize resource consumption and data transfer patterns and to identify the bottleneck of the workflow being studied, and addresses the optimization of scientific workflow execution in clouds by exploiting multi-core systems to parallelize bottleneck tasks.
Abstract: As scientific workflows are increasingly deployed in clouds, a myriad of studies have been conducted, including the development of workflow execution systems and scheduling/resource-management algorithms, for optimizing the execution of these workflows. However, the efficacy of most, if not all, of these previous works is limited by the original design and structure of the workflow, i.e., sequential code and a few bottleneck tasks. In this paper, we address the optimization of scientific workflow execution in clouds by exploiting multi-core systems to parallelize bottleneck tasks. To this end, we develop a workflow visualization toolkit to synthesize resource consumption and data transfer patterns, as well as to identify the bottleneck of the workflow being studied. Parallelization techniques are then applied to the module identified as the bottleneck in order to take full advantage of the underlying multi-core computing environment. Testing results with a 6.0-degree Montage example on Amazon EC2 with various configurations show that our optimization of workflows (bottleneck tasks in particular) reduces completion time (or makespan) by 21% to 43%, depending on the instance type used to run the workflow, without any impact on cost.
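The parallelization step can be sketched as follows: the bottleneck module's input is split into independent pieces that are processed concurrently across the cores of a cloud instance. This is a minimal illustration of the general technique, not the paper's toolkit; `reproject_tile` is a hypothetical stand-in for a CPU-bound Montage-style operation.

```python
from concurrent.futures import ProcessPoolExecutor

def reproject_tile(tile):
    # Hypothetical stand-in for a CPU-bound operation on one image tile.
    return sum(x * x for x in tile)

def run_bottleneck(tiles, workers=4):
    """Run the bottleneck module's independent tiles in parallel,
    using one worker process per core to sidestep Python's GIL."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reproject_tile, tiles))
```

On an instance with N cores, a perfectly divisible bottleneck of this shape can approach an N-fold speedup, which is the effect the 21% to 43% makespan reductions reflect at the whole-workflow level.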

Journal ArticleDOI
TL;DR: A special issue devoted to Localized Algorithms for Information Fusion in Resource-Constrained Networks, which deals with the challenge of working with partial views, or incomplete data, to provide accurate results at reduced cost.

Journal ArticleDOI
01 Jan 2014
TL;DR: The p-index (pagerank-index) is introduced, which is computed from the underlying citation network of papers using the PageRank algorithm, and it is demonstrated that the metric aids in fairer ranking of scientists compared to the h-index and its variants.
Abstract: The indices currently used by scholarly databases, such as Google Scholar, to rank scientists do not attach weights to citations, nor is the underlying network structure of citations considered in computing these metrics. As a result, scientists cited by well-recognized journals are not rewarded, and the metrics are open to misuse if documents are created purely to cite others. In this paper we introduce a new ranking metric, the p-index (pagerank-index), which is computed from the underlying citation network of papers using the PageRank algorithm. The index is a percentile score, can potentially be implemented in public databases such as Google Scholar, and can be applied at many levels of abstraction. We demonstrate that the metric enables fairer ranking of scientists compared to the h-index and its variants. We do this by simulating a realistic model of the evolution of citation and collaboration networks in a particular field and comparing the h-index and p-index of scientists under a number of scenarios. Our results show that the p-index is immune to author behaviors that can artificially inflate h-index values.
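The building blocks of such a metric, PageRank over a citation graph followed by a percentile score, can be sketched as below. This is an illustrative simplification, not the exact p-index definition from the paper; the percentile rule shown (fraction of papers ranked below the author's best paper) is an assumption made for the example.

```python
def pagerank(citations, d=0.85, iters=50):
    """Power-iteration PageRank over a citation graph given as
    {paper: [cited papers]}; every paper must appear as a key."""
    papers = list(citations)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in papers}
        for p, cited in citations.items():
            if cited:
                share = d * rank[p] / len(cited)
                for q in cited:
                    new[q] += share
            else:
                # Dangling paper (cites nothing): redistribute uniformly.
                for q in papers:
                    new[q] += d * rank[p] / n
        rank = new
    return rank

def percentile_score(author_papers, rank):
    """Illustrative percentile: share of papers whose PageRank falls
    below the author's best-ranked paper."""
    scores = list(rank.values())
    best = max(rank[p] for p in author_papers)
    below = sum(1 for s in scores if s < best)
    return 100.0 * below / len(scores)
```

In a toy graph where papers A and C both cite B, PageRank puts B above A and C, so a citation from a highly ranked paper is worth more than one from an obscure paper, which is exactly the weighting that raw citation counts miss.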

Journal ArticleDOI
TL;DR: The Multi-Application Requirements Aware and Energy Efficiency algorithm is presented as a new resource allocation heuristic for multi-functional WSN systems to maximize system lifetime subject to various application requirements.
Abstract: The multi-functional wireless sensor network (WSN) system is a new design trend for WSNs, which are evolving from dedicated application-specific systems into an integrated infrastructure that supports the execution of multiple concurrent applications. Such a system offers inherent advantages in terms of cost and flexibility because it allows the effective utilization of available sensors and resource sharing among multiple applications. However, sensor nodes are very constrained in resources, particularly energy. Therefore, the usage of these resources needs to be carefully managed, and sharing them among several applications imposes new challenges in achieving energy efficiency in these networks. In order to exploit the full potential of multi-functional WSN systems, it is crucial to design mechanisms that effectively allocate tasks onto sensors so that the entire system lifetime is maximized while meeting various application requirements. However, it is likely that the requirements of different applications cannot all be met simultaneously. In this paper, we present the Multi-Application Requirements Aware and Energy Efficiency algorithm, a new resource allocation heuristic for multi-functional WSN systems that maximizes system lifetime subject to various application requirements. The heuristic effectively handles different, possibly conflicting, quality-of-service parameters, trading them off against each other and exploiting the heterogeneity of multiple WSNs. Copyright © 2013 John Wiley & Sons, Ltd.
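The flavor of such a lifetime-maximizing allocation can be shown with a minimal greedy sketch: each task goes to the capable sensor with the most residual energy, an energy-balancing proxy for system lifetime. This is an illustrative heuristic, not the paper's algorithm; the task and sensor representations are hypothetical.

```python
def allocate(tasks, sensors):
    """Greedy energy-balancing allocation (illustrative sketch).

    tasks:   list of (name, energy_cost, required_capability)
    sensors: dict name -> {"energy": residual energy,
                           "caps": set of supported capabilities}
    Assigns each task to the capable sensor with the most residual
    energy, spreading load to delay the first node death.
    """
    assignment = {}
    for name, cost, cap in tasks:
        candidates = [s for s, info in sensors.items()
                      if cap in info["caps"] and info["energy"] >= cost]
        if not candidates:
            continue  # this application requirement cannot be met
        best = max(candidates, key=lambda s: sensors[s]["energy"])
        sensors[best]["energy"] -= cost
        assignment[name] = best
    return assignment
```

Because the residual energies are updated after each assignment, successive tasks naturally spread across sensors instead of draining a single node, which is the intuition behind lifetime maximization in a shared WSN.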