
Showing papers presented at "Parallel and Distributed Computing: Applications and Technologies" in 2019


Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper proposes a hybrid deep learning architecture that combines convolutional neural networks (CNN) and long short-term memory (LSTM) networks for human activity recognition (HAR) and demonstrates its effectiveness.
Abstract: Traditional methods of recognizing human activities rely on typical machine learning (ML) algorithms that use hand-engineered features. Human activities are dynamic in nature and are encoded as sequences of actions. ML methods can perform activity recognition tasks but may not exploit the temporal correlations of the input data. Therefore, in this paper, we propose and show the effectiveness of employing a new combination of deep learning (DL) methods for human activity recognition (HAR). DL methods are capable of extracting discriminative features automatically from raw sensor data. Specifically, we propose a hybrid architecture that combines convolutional neural networks (CNN) and long short-term memory (LSTM) networks for the HAR task. The model is tested on the UCI HAR dataset, a benchmark dataset comprising accelerometer and gyroscope data obtained from a smartphone. Our experimental results show that the proposed method outperforms recent results obtained with pure LSTM and bidirectional LSTM networks on the same dataset.

26 citations
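As a rough illustration of the kind of CNN-LSTM hybrid the abstract describes, the sketch below stacks 1D convolutions in front of an LSTM layer for the standard UCI HAR input of 128-step windows over 9 inertial channels and 6 activity classes; the layer sizes and hyperparameters are assumptions, not the authors' exact architecture.

```python
# Hedged sketch of a CNN-LSTM hybrid for UCI HAR (layer sizes are illustrative).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 9)),                          # 2.56 s window, 9 sensor channels
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),   # local feature extraction
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.LSTM(128),                                      # temporal modelling of CNN features
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(6, activation="softmax"),                 # 6 activity classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```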


Proceedings ArticleDOI
01 Dec 2019
TL;DR: Proof-of-concept (PoC) evaluation results show the potential of the CloudSim extension in terms of execution efficiency and simulation realism.
Abstract: The performance of Functions-as-a-Service (FaaS) would be significantly improved by organizing cloud servers into a hierarchical distributed architecture, resulting in low-latency access and faster responses compared to a centralized cloud. However, the distributed organization introduces a new type of decision-making problem: placing and executing functions on a specific cloud server. In order to handle this problem, we extended a well-known cloud computing simulator, CloudSim. The extended CloudSim enables users to define FaaS functions with various characteristics and service level objectives (SLOs), place them across geo-distributed cloud servers, and evaluate per-function performance. Proof-of-concept (PoC) evaluation results show the potential of our CloudSim extension in terms of execution efficiency and simulation realism.

15 citations
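The placement decision the abstract refers to can be pictured with a toy rule like the one below: choose the lowest-latency server that still satisfies a function's memory demand and latency SLO. The class and field names are hypothetical illustrations, not the extended CloudSim API.

```python
# Toy latency- and SLO-aware placement rule (hypothetical names, not the CloudSim extension's API).
from dataclasses import dataclass

@dataclass
class FaaSFunction:
    name: str
    mem_mb: int
    latency_slo_ms: float        # service level objective on response latency

@dataclass
class CloudServer:
    name: str
    free_mem_mb: int
    rtt_ms: float                # network round-trip time from the request origin

def place(fn, servers):
    """Pick the lowest-latency server that meets the memory demand and the SLO, or None."""
    feasible = [s for s in servers if s.free_mem_mb >= fn.mem_mb and s.rtt_ms < fn.latency_slo_ms]
    return min(feasible, key=lambda s: s.rtt_ms) if feasible else None

servers = [CloudServer("edge-1", 512, 5.0), CloudServer("region-1", 8192, 40.0)]
print(place(FaaSFunction("thumbnail", 256, 30.0), servers).name)    # -> edge-1
```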


Proceedings ArticleDOI
01 Dec 2019
TL;DR: The paper discusses dark web sites through which users can access jihadist services and anonymous markets, the associated safety precautions, and how law enforcement agencies track down users exhibiting terror-related behaviours and activities.
Abstract: Cybercriminals make wide use of the dark web and its illegal functionalities, contributing to a worldwide crisis. More than half of criminal and terror activities are conducted through the dark web, including cryptocurrency-based payments, the sale of human organs, red rooms, child pornography, arms deals, drug deals, the hiring of assassins and hackers, and the sale of hacking software and malware. Law enforcement agencies such as the FBI, NSA, Interpol, Mossad and FSB constantly conduct surveillance programs on the dark web to trace criminals and terrorists and to stop crimes and terror activities. This paper is about dark web marketing and surveillance programs. The research discusses how to access the dark web securely and how law enforcement agencies track down users exhibiting terror-related behaviours and activities. Moreover, the paper discusses dark web sites through which users can access jihadist services and anonymous markets, including safety precautions.

14 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: The benefits of using the OmpSs concurrent clause when performing reductions, specifically dot product (DOT) operations, are presented; the use of the concurrent clause can improve performance.
Abstract: In this paper, we present the benefits of using the concurrent clause of OmpSs when performing reductions, more specifically when applied to dot product (DOT) operations. We analyze its benefits through the implementation of different versions of the Conjugate Gradient (CG) method. We start from a parallel version of the code based on tasks and dependencies; later, we introduce the use of the concurrent clause, which allows overlapping the execution of tasks that have data dependencies among them. In this way, we want to show the benefits of the concurrent clause, which might be included in the OpenMP standard as previously done with other OmpSs features. Our tests, performed on a single node of the (Intel-based) Marenostrum 4 supercomputer and a single socket of the (ARM-based) Dibona cluster, show that the use of the concurrent clause may improve performance by around 37% and 23%, respectively, with respect to the version where only tasks and dependencies are used.

9 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: A cluster-based under-sampling algorithm (CUS) is proposed, exploiting the fact that support vector machine (SVM) classification relies on the support vectors; its improved clustering step overcomes a limitation of the original clustering algorithm.
Abstract: The imbalanced classification problem is a hot issue in data mining and machine learning. Traditional classification algorithms are designed under an implicit assumption of balanced class distributions and aim mainly to improve overall classification performance, so it is difficult to obtain ideal classification results when handling imbalanced datasets. In order to improve classification performance on imbalanced datasets, this paper proposes a cluster-based under-sampling algorithm (CUS) that exploits the important characteristic that support vector machine (SVM) classification relies on the support vectors. Firstly, the majority class is divided into clusters using an improved clustering by fast search and find of density peaks (CFSFDP) algorithm. The improved clustering algorithm selects cluster centers automatically, overcoming a limitation of the original algorithm. Then the minority class and each cluster of the majority class are used to construct a training set, and the support vectors of each cluster are obtained with a support vector machine. The support vectors of each cluster are retained and the non-support vectors deleted, producing a new set of majority-class samples and a relatively balanced dataset. Finally, the new datasets are classified by support vector machines and the performance is evaluated by cross-validation. The experimental results show that the CUS algorithm is effective.

9 citations
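A rough scikit-learn sketch of the CUS idea is shown below; KMeans stands in for the paper's improved CFSFDP clustering, so the cluster count and kernel choice are assumptions made only for illustration.

```python
# Sketch of cluster-based under-sampling (CUS): cluster the majority class, train an SVM
# on (one cluster + minority class), and keep only that cluster's support vectors.
import numpy as np
from sklearn.cluster import KMeans          # stand-in for the improved CFSFDP clustering
from sklearn.svm import SVC

def cluster_undersample(X_majority, X_minority, n_clusters=5):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X_majority)
    kept = []
    for c in range(n_clusters):
        X_c = X_majority[labels == c]
        X = np.vstack([X_c, X_minority])
        y = np.r_[np.zeros(len(X_c)), np.ones(len(X_minority))]
        svm = SVC(kernel="rbf").fit(X, y)
        sv = svm.support_[svm.support_ < len(X_c)]       # support vectors that belong to this cluster
        kept.append(X_c[sv])
    return np.vstack(kept)                               # reduced, roughly balanced majority class
```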


Proceedings ArticleDOI
01 Dec 2019
TL;DR: A novel event context word extension technique based on background knowledge is proposed and a feed-forward neural network based approach to detect event causality from tweets is developed.
Abstract: Twitter has become a great source of user-generated information about events. Very often people report causal relationships between events in their tweets. Automatic detection of causality information in these events might play an important role in prescriptive event analytics. Existing approaches include both rule-based and data-driven supervised methods. However, it is challenging to identify event causality accurately using linguistic rules due to the unstructured nature and grammatical incorrectness of social media short text such as tweets. Also, it is difficult to develop a data-driven supervised method for event causality detection in tweets due to insufficient contextual information. This paper proposes a novel event context word extension technique based on background knowledge. To demonstrate the effectiveness of our event context word extension technique, we develop a feed-forward neural network based approach to detect event causality from tweets. Extensive experiments demonstrate the superiority of our approach.

8 citations
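The context word extension step can be pictured with the toy sketch below, where each tweet token is expanded with related terms from a background knowledge source; the hand-made dictionary and the expansion limit are assumptions, and the paper's knowledge base and feed-forward classifier are not reproduced.

```python
# Toy illustration of event context word extension from background knowledge.
BACKGROUND = {                      # hypothetical background-knowledge lookup
    "flood": ["rain", "storm", "overflow"],
    "outage": ["blackout", "power", "grid"],
}

def extend_context(tokens, background=BACKGROUND, max_extra=3):
    extended = list(tokens)
    for t in tokens:
        extended.extend(background.get(t.lower(), [])[:max_extra])   # add related terms
    return extended

print(extend_context(["Flood", "caused", "outage"]))
```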


Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper proposes a new general high-level approach to track and reconstruct states in the scope of heuristic optimization systems on GPUs; the approach has considerably lower memory consumption than traditional approaches and scales well with the complexity of the optimization problem.
Abstract: Modern heuristic optimization systems leverage the parallel processing power of Graphics Processing Units (GPUs). Many states are maintained and evaluated in parallel to improve runtime by orders of magnitude in comparison to purely CPU-based approaches. A well-known example is parallel Monte Carlo tree search, which nowadays is often used in combination with more advanced machine-learning methods. However, all approaches require different optimization states in memory to update or manipulate variables and observe their behavior over time. Large real-world problems often require a large number of states, which is typically limited by the amount of available memory. This is particularly challenging in cases in which older states (that are not currently being evaluated) are still required for backtracking purposes. In this paper, we propose a new general high-level approach to track and reconstruct states in the scope of heuristic optimization systems on GPUs. Our method has considerably lower memory consumption than traditional approaches and scales well with the complexity of the optimization problem.

6 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: Performance evaluation results clearly demonstrate that, with a moderate programming effort, the proposed framework can express the collaboration between a vector host and a vector engine so as to make good use of both processors.
Abstract: This paper presents an OpenCL-like offload programming framework for NEC SX-Aurora TSUBASA (SX-Aurora). Unlike traditional vector systems, one node of an SX-Aurora system consists of a host processor and some vector processors on PCI-Express cards, which are called the vector host and vector engines, respectively. Since the standard OpenCL execution model does not naturally fit the vector engine, this paper discusses how to adapt the OpenCL specification to SX-Aurora while considering the trade-off between performance and code portability. Performance evaluation results clearly demonstrate that, with a moderate programming effort, the proposed framework can express the collaboration between a vector host and a vector engine so as to make good use of both processors. By delegating the right task to the right processor, an OpenCL-like program can fully exploit the performance of SX-Aurora.

5 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper presents "DOCKERANALYZER" a software module that detects and identifies execution problems in microservices context that is based on threshold and demonstrates the effectiveness of the proposed solution.
Abstract: This article deals with anomaly detection for microservices-based applications during elastic treatment. In elastic treatment, scaling up resources is based on thresholds. Many studies consider that threshold exceedances are caused by an increase in the number of requests. However, an exceedance may be caused by other problems, such as specific requests requiring a lot of resources or issues related to VMs and containers. Therefore, when thresholds are exceeded, we propose to apply an analysis treatment that detects and identifies the root cause of the exceedance, whether it is a problem such as a specific request, a VM issue, or a container issue, or a normal increase in the number of requests. This paper presents DOCKERANALYZER, a software module that detects and identifies execution problems in a microservices context. Experimental measurements have been conducted on an IoT platform as a real use case presenting realistic problems and demonstrating the effectiveness of our proposed solution.

5 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: An anomaly detection method is proposed that uses a fixed-size user-centric subgraph, extracted from the whole graph of all transactions, to prevent the growth of execution time and to accelerate anomaly detection.
Abstract: Blockchain is a distributed ledger system built on a P2P network, proposed as an electronic cash system that can transfer money without a trusted third party. Blockchain achieves high tamper resistance through a structure in which no one, including the creator of a transaction, can modify it. However, this also means that a blockchain system cannot modify a fraudulent transaction that has already been approved, so once an illegal transaction occurs, the damage expands. To limit the damage, it is necessary to detect such transactions by anomaly detection and correct them before approval. However, existing anomaly detection methods for blockchain need to process all past transactions in the blockchain, and their execution time exceeds the approval interval of the major blockchain system Ethereum. In this paper, we propose an anomaly detection method using a fixed-size user-centric subgraph, extracted from the whole graph built from all transactions, to prevent the growth of execution time. Furthermore, to accelerate anomaly detection, we propose a subgraph structure suitable for GPU processing, so that subgraph construction, feature extraction, and anomaly detection are all performed on the GPU. When the number of transactions is 300 million, our proposed method is 195 times faster than the existing GPU-based method, and its execution time is shorter than the approval interval of Ethereum. In terms of accuracy, the true positive rate is significantly higher than that of the existing method for small-scale transactions, because local anomalies can be detected by the localized subgraph; for large-scale transactions, the true positive rate and the false positive rate are close to those of the existing method.

5 citations
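A CPU-side sketch of the user-centric subgraph idea is given below using networkx: extract a fixed-radius ego graph around the user of a new transaction and compute a few simple features on it. The node budget and feature set are assumptions, and the GPU-friendly layout and anomaly model from the paper are not reproduced.

```python
# Sketch: fixed-size user-centric subgraph extraction and simple feature computation.
import networkx as nx

def user_features(G, user, radius=2, max_nodes=256):
    """G: nx.DiGraph of transactions with an 'amount' edge attribute."""
    sub = nx.ego_graph(G, user, radius=radius, undirected=True)
    if sub.number_of_nodes() > max_nodes:                 # keep the subgraph at a fixed size
        keep = set(sorted(sub.nodes, key=sub.degree, reverse=True)[:max_nodes]) | {user}
        sub = sub.subgraph(keep)
    total_out = sum(d.get("amount", 0) for _, _, d in sub.out_edges(user, data=True))
    return {
        "n_nodes": sub.number_of_nodes(),
        "n_edges": sub.number_of_edges(),
        "out_degree": sub.out_degree(user),
        "total_out_amount": total_out,
    }
```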


Proceedings ArticleDOI
01 Dec 2019
TL;DR: A new multi-hop routing method is proposed that takes advantage of periodic hello messages to build routing tables and select an efficient node for rebroadcasting messages; results indicate that removing the contention phase for selecting the rebroadcast node yields better end-to-end delay and reduces redundant packets.
Abstract: Vehicular Ad hoc Networks (VANETs) are a promising technology in the Internet of Things (IoT) that enables vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. VANETs attract great attention from various stakeholders such as automakers, universities, and traffic police. There are some challenges to be addressed; the main challenge is routing, due to the high-speed movement of vehicles. Several methods have been proposed to tackle the problem of end-to-end delay in routing, but there are still issues with the delay of selecting the next-hop relay node, and in dense environments this delay leads to the broadcast storm problem. Therefore, we propose a new multi-hop routing method that takes advantage of periodic hello messages to build routing tables and select an efficient node for rebroadcasting messages. In order to show the performance of the proposed method, extensive simulations were carried out using the Network Simulator (NS2), comparing it with other related routing methods. The simulation results indicate that removing the contention phase for selecting the rebroadcast node yields better end-to-end delay and reduces redundant packets.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This work analyzes the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems and proposes three different methods for solving the DGEMM operation on tiled matrices.
Abstract: In this work, we analyze the implications and results of implementing dynamic parallelism, concurrent kernels and CUDA Graphs to solve task-oriented problems. As a benchmark, we propose three different methods for solving the DGEMM operation on tiled matrices, which might be the most popular benchmark for performance analysis. For the algorithms that we study, we present significant differences in terms of data dependencies, synchronization and granularity. The main contribution of this work is determining which of these approaches works best for running multiple tasks concurrently on a single GPU, as well as stating the main limitations and benefits of every technique. Using dynamic parallelism and CUDA Streams we were able to achieve speedups of up to 30%, and with the CUDA Graph API up to 25x acceleration, outperforming state-of-the-art results.

Proceedings ArticleDOI
01 Jan 2019
TL;DR: The proposed method provides a recommender system based on information diffusion and popularity in social networks; by incorporating popularity, similarity and user trust, it improves on the issues and shortcomings of previous methods.
Abstract: With the rapid advancement of the World Wide Web, people can share their knowledge and information via online tools such as sharing systems and e-commerce applications. Many approaches have been proposed to process and organize information. Recommender systems are successful examples of such tools in providing personalized suggestions. The main purpose of a recommender system is to identify and introduce desired items to a user among many other options (e.g., music, movies, books, and news). The goal of our proposed method is to provide a recommender system based on information diffusion and popularity in social networks. By incorporating popularity, similarity and user trust, a more efficient system is proposed. This approach improves on the issues and shortcomings of previous methods, such as prediction accuracy and coverage. The evaluation of the simulated proposed method on the MovieLens and Epinions datasets shows that it provides more accurate recommendations in comparison to other approaches.
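A purely illustrative scoring rule in the spirit of the abstract is sketched below, blending neighbours' ratings weighted by similarity and trust with item popularity; the weights and the diffusion step are assumptions, not the paper's formula.

```python
# Toy hybrid recommendation score: diffusion of neighbours' ratings plus item popularity.
import numpy as np

def rank_items(ratings, popularity, similarity, trust, user, w=(0.5, 0.3, 0.2)):
    """ratings: users x items (0 = unrated); similarity, trust: users x users; popularity: items."""
    influence = w[0] * similarity[user] + w[1] * trust[user]     # how much each neighbour is listened to
    diffused = influence @ ratings / (np.abs(influence).sum() + 1e-9)
    scores = diffused + w[2] * popularity
    scores[ratings[user] > 0] = -np.inf                          # do not re-recommend rated items
    return np.argsort(scores)[::-1]                              # item indices, best first
```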

Proceedings ArticleDOI
01 Dec 2019
TL;DR: In this paper, the authors use the D-Wave 2000Q to monitor how the anneal solution evolves during the process of annealing and obtain an approximate distribution of solutions.
Abstract: Commercial adiabatic quantum annealers have the potential to solve important NP-hard optimization problems efficiently. The newest generation of those machines additionally allows the user to customize the anneal schedule, that is, the schedule with which the anneal fraction is changed from the start to the end of the annealing. In this work we use the aforementioned feature of the D-Wave 2000Q to attempt to monitor how the anneal solution evolves during the anneal process. This process we call slicing: at each time slice during the anneal, we are able to obtain an approximate distribution of anneal solutions. We use our technique to obtain a variety of insights into the D-Wave 2000Q. For example, we observe when individual bits flip during the anneal process and when they stabilize, which allows us to determine the freeze-out point for each qubit individually. We highlight our results using both random QUBO (quadratic unconstrained binary optimization) instances and, for better visualization, instances which we specifically optimize (using our own genetic algorithm) to exhibit a pronounced evolution of their solutions during the anneal.
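One simple way to picture a "slice" is a custom anneal schedule that follows the baseline linear ramp up to the slice time and then quenches as fast as possible, so the read-out approximates the state at that slice. The sketch below only builds such (time, anneal-fraction) pairs; the concrete numbers, and the assumption that this matches the paper's slicing protocol, are illustrative.

```python
# Build a sliced anneal schedule: linear ramp to t_slice, then a fast quench to s = 1.
def sliced_schedule(t_slice_us, total_us=20.0, quench_us=1.0):
    s_at_slice = t_slice_us / total_us          # anneal fraction reached at the slice time
    return [
        [0.0, 0.0],
        [t_slice_us, s_at_slice],
        [t_slice_us + quench_us, 1.0],          # quench to freeze the current state
    ]

for t in (5.0, 10.0, 15.0):
    print(t, sliced_schedule(t))                # list of [time, anneal fraction] pairs per slice
```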

Proceedings ArticleDOI
01 Dec 2019
TL;DR: The capability of graphics processing units (GPU), specifically Nvidia's CUDA platform, is exploited to accelerate the genetic algorithm by modifying the evolutionary operations to fit the hardware architecture.
Abstract: When a deterministic search approach is too costly, such as for non-deterministic polynomial-hard problems, finding near-optimal solutions with approximation algorithms, such as the genetic algorithm, is the only practical approach to reduce the execution time. In this paper, we exploit the capability of graphics processing units (GPU), specifically Nvidia's CUDA platform, to accelerate the genetic algorithm by modifying the evolutionary operations to fit the hardware architecture. This has allowed us to achieve significant computational speedups compared to the non-GPU counterparts.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper investigates D2D-assisted computation offloading for mobile edge computing systems with energy harvesting and proposes a low-complexity online algorithm, which stems from the Lyapunov Optimization-based Dynamic Computation Offloading (LODCO) algorithm, to solve the problem.
Abstract: In mobile edge computing (MEC) systems with energy harvesting, the mobile devices are powered by energy harvested from renewable energy sources. On the other hand, mobile devices can offload their computation-intensive tasks to the MEC server to further save energy and reduce task execution latency. However, the harvested energy is unstable and the mobile devices have to make sure that the energy does not run out. Moreover, the wireless channel condition between the mobile device and the MEC server changes dynamically, leading to unstable communication delay. Considering the energy constraints and unstable communication delay, the benefit of computation offloading is limited. In this paper, we investigate D2D-assisted computation offloading for mobile edge computing systems with energy harvesting. In our method, the mobile device is allowed to offload its tasks to the MEC server with the help of a neighbor node. More specifically, the neighbor node acts as a relay to help the mobile device communicate with the MEC server. Our goal is to minimize the average task execution time by selecting an optimal execution strategy for each task, i.e., whether to execute the task locally, offload it to the MEC server directly, offload it to the MEC server with the help of the most suitable neighbor node, or drop it. We propose a low-complexity online algorithm, which stems from the Lyapunov Optimization-based Dynamic Computation Offloading (LODCO) algorithm, to solve this problem. Extensive simulations verified the effectiveness of the proposed algorithm: the average task execution time is reduced by around 50% compared to that of the original LODCO algorithm.
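The per-task decision can be pictured with the highly simplified rule below, which chooses among local execution, direct offloading, relay-assisted offloading, and dropping by estimated completion time under a crude energy check; this is an illustrative sketch, not the Lyapunov-based LODCO variant from the paper, and transmission energy is ignored.

```python
# Toy per-task execution-mode selection (not the paper's LODCO-based algorithm).
def choose_mode(task_cycles, task_bits, battery_j,
                f_local_hz, e_per_cycle_j,
                rate_direct_bps, rate_relay_bps, f_mec_hz):
    options = {}
    if task_cycles * e_per_cycle_j <= battery_j:                       # crude energy feasibility check
        options["local"] = task_cycles / f_local_hz
    if rate_direct_bps > 0:
        options["offload_direct"] = task_bits / rate_direct_bps + task_cycles / f_mec_hz
    if rate_relay_bps > 0:
        options["offload_via_relay"] = task_bits / rate_relay_bps + task_cycles / f_mec_hz
    if not options:
        return "drop", float("inf")
    mode = min(options, key=options.get)
    return mode, options[mode]

# Example: a 0.5-gigacycle task of 2 Mb with a weak direct link but a good relay link.
print(choose_mode(5e8, 2e6, 0.5, 1e9, 1e-9, 2e6, 5e6, 1e10))           # -> ('offload_via_relay', 0.45)
```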

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This work studies how different resource allocation heuristics affect integral job-flow scheduling characteristics in a dedicated simulation environment and proposes a special dynamic programming scheme to select resources depending on how they fit a particular job execution duration.
Abstract: In this work, we consider heuristic algorithms for parallel job execution and efficient resource allocation in heterogeneous computing environments. The features and realities of modern job-flow execution impose many restrictions on resource allocation procedures. Emerging virtual organizations and incorporated economic scheduling models allow users and resource owners to compete for suitable allocations based on market principles and fair scheduling policies. Subject to these features, a special dynamic programming scheme is proposed to select resources depending on how they fit a particular job execution duration. A hindsight approach makes it possible to select among several different scenarios obtained with the same base scheduling procedure. Based on a conservative backfilling scheduling procedure, we study how different resource allocation heuristics affect integral job-flow scheduling characteristics in a dedicated simulation environment.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A new consensus algorithm for permissioned blockchains, based on consistent hashing and the pseudo-randomness of the hash operation, is presented; it is applicable to blockchain systems containing Byzantine nodes and offers high throughput, low delay and many other advantages.
Abstract: Blockchain, the concept from Bitcoin created by Satoshi Nakamoto, has the potential to decentralise traditionally centralised systems. Blockchain is a distributed ledger for recording information, stored by many nodes without a central organization through distributed systems and cryptography. The consensus algorithm is a protocol that guarantees the consistency of all data in a blockchain system. It is key to building a blockchain system and an important factor affecting its performance. In this paper, we first compare the usage scenarios, advantages and disadvantages of different consensus algorithms. After that, we present a new consensus algorithm for permissioned blockchains based on consistent hashing. For blockchain system construction, we propose a new design of the hash ring. The pseudo-randomness of the hash operation is used to ensure the randomness of the elected leader node in the blockchain system, which avoids the security risk of a fixed leader node model. Our algorithm is applicable to blockchain systems containing Byzantine nodes and has high throughput, low delay and many other advantages. Its communication complexity is O(n), significantly better than that of the practical Byzantine fault tolerance algorithm, whose communication complexity is O(n²).
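A minimal consistent-hash ring is sketched below to make the pseudo-random leader rotation concrete: nodes are placed on a ring by hashing their IDs, and the leader for a given block height is the first node clockwise from the hash of that height. This only illustrates the ring construction, not the Byzantine-tolerant protocol itself, and the use of the block height as the lookup key is an assumption.

```python
# Minimal consistent-hash ring for pseudo-random leader selection.
import bisect
import hashlib

def h(value: str) -> int:
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)        # (point on ring, node id)
        self.points = [p for p, _ in self.ring]

    def leader_for(self, block_height: int) -> str:
        idx = bisect.bisect(self.points, h(str(block_height))) % len(self.ring)
        return self.ring[idx][1]                             # first node clockwise from the key

ring = HashRing(["node-A", "node-B", "node-C", "node-D"])
for height in range(5):
    print(height, ring.leader_for(height))                   # leadership rotates pseudo-randomly
```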

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper proposes a distributed GA with adaptive parameter controls for solving ordered problems such as the travelling salesman problem, capacitated vehicle routing problem (CVRP) and the job-shop scheduling problem (JSSP).
Abstract: Maintaining population diversity is critical to the performance of a Genetic Algorithm (GA). Applying appropriate strategies for measuring population diversity is important in order to ensure that the mechanisms for controlling population diversity are provided with accurate feedback. Sequence-wise approaches to measuring population diversity have demonstrated their effectiveness in assisting with maintaining population diversity for ordered problems; however, these processes increase the computational cost of solving ordered problems. Research on distributed GAs has demonstrated how applying different distribution models can affect a GA's ability to scale and effectively search the solution space. This paper proposes a distributed GA with adaptive parameter controls for solving ordered problems such as the travelling salesman problem (TSP), the capacitated vehicle routing problem (CVRP) and the job-shop scheduling problem (JSSP). Extensive experimental results demonstrate the superiority of the proposed approach.
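To make the "ordered problem" representation concrete, the toy single-population GA below solves a TSP instance with order crossover and swap mutation; the distributed model, the diversity measurement and the adaptive parameter controls from the paper are not shown, and all parameter values are illustrative.

```python
# Toy GA for the TSP with order crossover (OX) and swap mutation.
import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def order_crossover(p1, p2):
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]                                    # copy a slice from parent 1
    fill = [c for c in p2 if c not in child]                # keep parent 2's relative order elsewhere
    for i in range(len(child)):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

def evolve(dist, pop_size=50, generations=200, mutation_rate=0.2):
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, dist))
        survivors = pop[: pop_size // 2]                    # simple truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            c = order_crossover(*random.sample(survivors, 2))
            if random.random() < mutation_rate:
                i, j = random.sample(range(n), 2)
                c[i], c[j] = c[j], c[i]                     # swap mutation keeps the tour a valid permutation
            children.append(c)
        pop = survivors + children
    return min(pop, key=lambda t: tour_length(t, dist))
```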

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A thorough systematic categorization and rationalization of security issues is presented, covering the security landscape of Internet of Things/fog computing systems and contributing to the discussion on the aspects of fog computing security and state-of-the-art solutions.
Abstract: The overarching connectivity of "things" in the Internet of Things presents an appealing environment for innovation and business ventures, but also brings a certain set of security challenges. Engineering secure Internet of Things systems requires addressing the peculiar circumstances under which they operate: constraints due to limited resources, high node churn, decentralized decision making, direct interfacing with end users etc. Thus, techniques and methodologies for building secure and robust Internet of Things systems should support these conditions. In this paper, we are presenting a description of the CAAVI-RICS framework, a novel security review methodology tightly coupled with distributed, Internet of Things and fog computing systems. With CAAVI-RICS we are exploring credibility, authentication, authorization, verification, and integrity (CAAVI) through explaining the rationale, influence, concerns and security solutions (RICS) that accompany them. Our contribution is a thorough systematic categorization and rationalization of security issues, covering the security landscape of Internet of Things/fog computing systems, as well as contributing to the discussion on the aspects of fog computing security and state-of-the-art solutions. Specifically, in this paper we explore the Authentication in Internet of Things systems through the RICS review methodology.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A new dynamic power allocation strategy is proposed that utilizes power-performance models derived from offline data for reallocating power from running jobs to newly arrived jobs to increase overall system utilization and productivity.
Abstract: In the exascale era, HPC systems are expected to operate under different system-wide power constraints. For such power-constrained systems, improving per-job flops-per-watt may not be sufficient to improve total HPC productivity, as more scientific applications with different compute intensities are migrating to HPC systems. To measure HPC productivity for such applications, we associate with each application a monotonically decreasing, time-dependent value function, called the job-value. A job-value function represents the value of completing a job for an organization. We begin by exploring the trade-off between two commonly used static power allocation strategies (uniform and greedy) in a power-constrained oversubscribed system. We simulate a large-scale system and demonstrate that, at the tightest power constraint, the greedy allocation can lead to 30% higher productivity than the uniform allocation, whereas the uniform allocation can gain up to 6% higher productivity at the relaxed power constraint. We then propose a new dynamic power allocation strategy that utilizes power-performance models derived from offline data. We use these models to reallocate power from running jobs to newly arrived jobs to increase overall system utilization and productivity. In our simulation study, we show that, compared to static allocation, the dynamic power allocation policy improves node utilization and job completion rates by 20% and 9%, respectively, at the tightest power constraint. Our dynamic approach consistently earns up to 8% higher productivity compared to the best performing static strategy under different power constraints.
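The sketch below illustrates the two ingredients of the abstract in miniature: a monotonically decreasing job-value function and a greedy rule that reclaims power from running jobs (down to a per-job minimum) to admit a newly arrived job. The linear decay form, the field names and the admission rule are assumptions for illustration only.

```python
# Illustrative job-value decay and greedy power reallocation.
def job_value(v0, decay_per_s, waited_s):
    """Value of completing a job after it has waited `waited_s` seconds (never negative)."""
    return max(0.0, v0 - decay_per_s * waited_s)

def try_admit(new_job, running_jobs, free_power_w):
    """Reclaim power above each running job's minimum allocation to admit `new_job`."""
    needed = new_job["p_min"] - free_power_w
    if needed <= 0:
        return True
    if needed > sum(j["p_alloc"] - j["p_min"] for j in running_jobs):
        return False                                   # not enough reclaimable power; job must wait
    for j in sorted(running_jobs, key=lambda j: j["p_alloc"] - j["p_min"], reverse=True):
        take = min(needed, j["p_alloc"] - j["p_min"])
        j["p_alloc"] -= take
        needed -= take
        if needed <= 0:
            return True
    return False

running = [{"p_alloc": 300.0, "p_min": 200.0}, {"p_alloc": 250.0, "p_min": 220.0}]
print(try_admit({"p_min": 120.0}, running, free_power_w=10.0), running)
```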

Proceedings ArticleDOI
01 Dec 2019
TL;DR: Results show that correct fix operations can be obtained and fault-tolerant logical Hadamard gates can be realized as expected, and a universal fault-tolerant gate set is achieved in a single RMQC.
Abstract: We investigate how to implement fault-tolerant logical Hadamard gates in Reed-Muller quantum codes (RMQCs) using the gauge-fixing method. During the realization, we consider the influence of random single-qubit errors by performing error-detecting measurements. Moreover, some error-detecting stabilizers are simplified using the existing syndromes. We then identify the errors, modify the syndromes, refer to the modified syndromes to select the fix operations, and finally perform the error-correcting and fix operations together. Further, we establish a graph model for the RMQCs and exhibit a procedure for finding the fix operations for the unsatisfied stabilizers. We simulate the process of finding the corresponding fix operations for 31-qubit and 63-qubit RMQCs, and the whole process of realizing a logical Hadamard gate with random single-qubit errors for 15-qubit and 31-qubit RMQCs. The results show that correct fix operations can be obtained and fault-tolerant logical Hadamard gates can be realized as expected. With the implementation of the logical Hadamard gate, a universal fault-tolerant gate set is achieved in a single RMQC.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper reviews the state-of-the-art in the overlapping community detection of complex networks, and briefly summarizes the advantages and applications of each algorithm.
Abstract: It is well established that networks are ubiquitous. Social platforms, academic systems, and other systems all exist in the form of networks, which often reflect the connections between different individuals in the real world. Effective community detection algorithms can explore the hidden community structure in a network, which has a great positive impact on people's daily life. At present, community detection has been widely applied in online public opinion monitoring, personalized recommendation, advertising and other fields. As network structures tend to become more complicated, the detection of community structure in complex networks has become a hot topic of current research. This paper reviews the state-of-the-art in overlapping community detection for complex networks and briefly summarizes the advantages and applications of each algorithm. Furthermore, the current challenges in overlapping community detection for complex networks are illustrated and some suggestions for future research are proposed.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: On the basis of accumulated job logs on a high-performance computing cluster, machine learning based performance analysis and prediction methods for parallel jobs are examined and the optimal prediction model for different users is selected.
Abstract: There are many medium- and small-scale high-performance computing clusters at universities, research institutes, and other organizations. Large volumes of job logs have been accumulated after many years of operation. In this paper, on the basis of job logs accumulated on a high-performance computing cluster, we examine and analyze the job logs. We then study machine learning based performance analysis and prediction methods for parallel jobs. Various machine learning methods, such as multivariate linear fitting and artificial neural networks, are used to build performance prediction models. We compare the errors of each model and select the optimal prediction model for different users. The experimental results show that we can obtain reasonable prediction accuracy using the selected machine learning algorithms.
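The per-user model selection step can be sketched with scikit-learn as below: fit a multivariate linear model and a small neural network on job-log features and keep whichever has the lower validation error. The feature set, model sizes and error metric are assumptions; the paper does not prescribe these exact choices.

```python
# Sketch: fit two candidate runtime-prediction models and keep the one with the lower error.
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

def best_runtime_model(X, y):
    """X: job features (e.g. requested cores, requested walltime, input size); y: observed runtime."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
    candidates = {
        "linear": LinearRegression(),
        "mlp": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
    }
    errors = {}
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        errors[name] = mean_absolute_error(y_val, model.predict(X_val))
    best = min(errors, key=errors.get)
    return candidates[best], errors
```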

Proceedings ArticleDOI
01 Dec 2019
TL;DR: An adaptive multi-layer clustering networking strategy based on capability weights is proposed, which can effectively reduce the load of key nodes such as the cluster head and improves network performance metrics such as average transmission delay, average transmission hop count and load balancing.
Abstract: With the rapid development of Internet of Things (IoT) technology, the number of nodes in wireless sensor networks (WSNs) is increasing explosively, and the scale of the network is growing gradually. Traditional single-layer, non-clustered networks are no longer suitable for current WSNs, resulting in high maintenance cost and fast deterioration of network performance. By analyzing the impact of existing static and dynamic clustering schemes on network performance, it is concluded that additional factors need to be considered to improve the overall performance of the network, such as the residual energy of nodes, the number of neighbor nodes and load balancing. Therefore, an adaptive multi-layer clustering networking strategy based on capability weights is proposed. Based on the real-time changes of each cluster's density, node load and residual energy, the node capability weights are updated dynamically according to the actual network performance, and the cluster heads are then renewed adaptively. Comparison of the performance metrics in the experiments shows that the proposed strategy can effectively reduce the load of key nodes such as the cluster head, and improves network performance metrics such as average transmission delay, average transmission hop count and load balancing.
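A minimal sketch of the capability-weight idea follows: each node's weight grows with residual energy and neighbor count and shrinks with load, and the highest-weight node in a cluster becomes the cluster head. The weighting coefficients and the linear form are assumptions, not the paper's exact formula.

```python
# Toy capability weight and cluster-head election (coefficients are illustrative).
def capability_weight(residual_energy, n_neighbors, load, a=0.5, b=0.3, c=0.2):
    return a * residual_energy + b * n_neighbors - c * load

def elect_cluster_head(cluster):
    """cluster: list of dicts with 'energy', 'neighbors' and 'load' fields."""
    return max(cluster, key=lambda n: capability_weight(n["energy"], n["neighbors"], n["load"]))

cluster = [
    {"id": 1, "energy": 0.9, "neighbors": 4, "load": 0.2},
    {"id": 2, "energy": 0.6, "neighbors": 7, "load": 0.8},
]
print(elect_cluster_head(cluster)["id"])
```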

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A circle drawing function is developed for an HLS-oriented game software library based on Michener's algorithm, which is suitable for hardware implementation as a circle drawing algorithm because it consists only of simple integer operations.
Abstract: We are developing a high-level synthesizable software game library to realize high-performance, low-power mobile terminals executing game applications. High-level synthesis (HLS) is a technology that automatically converts software into hardware. Games developed with the HLS-oriented game software library are executed by high-speed, low-power hardware reconfigured on reconfigurable devices in mobile terminals. This paper develops circle drawing as one of the functions in the HLS-oriented game software library. We employ Michener's algorithm as the circle drawing algorithm because it consists only of simple integer operations and is therefore suitable for hardware implementation. We also show how the program should be described so that the HLS tool can convert the circle drawing based on Michener's algorithm into an efficient hardware module. The experiments evaluate the execution time and the amount of hardware of the generated hardware module.
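For reference, Michener's integer-only circle rasterization (as it is commonly presented, with a decision variable and eight-way symmetry) can be sketched in a few lines; this is the general algorithm, not the paper's HLS-oriented coding style.

```python
# Integer-only circle rasterization with eight-way symmetry (Michener / midpoint style).
def circle_points(cx, cy, r):
    pts = []
    x, y = 0, r
    d = 3 - 2 * r                                            # integer decision variable
    while x <= y:
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            pts.append((cx + px, cy + py))                   # plot one point per octant
        if d < 0:
            d += 4 * x + 6
        else:
            d += 4 * (x - y) + 10
            y -= 1
        x += 1
    return pts

print(sorted(set(circle_points(0, 0, 3))))
```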

Proceedings ArticleDOI
Wenting Wei, Kun Wang, Kexin Wang, Shengjun Guo, Huaxi Gu
01 Dec 2019
TL;DR: This paper presents a joint bin-packing heuristic and genetic algorithm to reduce the time complexity while obtaining an approximate optimal solution to balance multi-dimensional resource usage and maximize the service rate.
Abstract: The servers in the data center networks have multi-dimensional physical resources, and there is a lot of diversity in resource consumption among tasks. When virtual machines carrying different user requests are deployed on the same server at the same time, it is very likely that there is an imbalanced usage of multi-dimensional resources, resulting in the waste of physical resources. In this paper, we focus on virtual machine placement in data centers aiming to balance multi-dimensional resource usage and maximize the service rate. To solve such a bi-objective optimization problem, we present a joint bin-packing heuristic and genetic algorithm to reduce the time complexity while obtaining an approximate optimal solution.
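The bin-packing half of the approach can be pictured with a toy placement rule that, among servers able to host a VM, picks the one whose post-placement CPU and memory utilizations are most even; the two-dimensional resource model is an assumption, and the genetic-algorithm half of the paper's method is not shown.

```python
# Toy multi-dimensional placement: prefer the feasible server with the most balanced usage.
def place_vm(vm_cpu, vm_mem, servers):
    """servers: list of dicts with cpu_cap, mem_cap, cpu_used, mem_used."""
    best, best_skew = None, float("inf")
    for s in servers:
        cpu_u = (s["cpu_used"] + vm_cpu) / s["cpu_cap"]
        mem_u = (s["mem_used"] + vm_mem) / s["mem_cap"]
        if cpu_u > 1.0 or mem_u > 1.0:
            continue                                   # VM does not fit on this server
        skew = abs(cpu_u - mem_u)                      # imbalance between the two dimensions
        if skew < best_skew:
            best, best_skew = s, skew
    if best is not None:
        best["cpu_used"] += vm_cpu
        best["mem_used"] += vm_mem
    return best
```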

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper proposes a strategy based on control theory for managing the performance of several I/O requests, such as mean response times and read/write throughput in a consolidated environment where multiple virtual services can share access to a storage system.
Abstract: With the increasing popularity of virtual machine monitoring (VMM) technologies, performance variability among collocated virtual machines (VMs) can easily become a severe scalability issue. In particular, it becomes necessary for the administrative team to control the level of performance degradation in a shared environment when multiple I/O-intensive applications request their I/O operations simultaneously [1]. Nevertheless, adding several logical layers between the running applications and the physical storage system, as seen in contemporary virtualized storage devices, makes it considerably difficult to build a low-overhead control mechanism for such systems (while each VM may be running a separate operating system instance) [2]. In this paper, we propose a strategy based on control theory for managing the performance of several I/O requests, such as mean response times and read/write throughput, in a consolidated environment where multiple virtual services can share access to a storage system. This scheme uses an approach for characterizing the read/write performance attributes of each virtual service and also takes into account the run-time quality-of-service enforcement levels they request. This is formulated as an optimization problem in which a reward function is defined to reduce the overall QoS violation incidents among all consolidated virtual services. Performance evaluation is carried out by comparing the proposed solution with the default embedded Linux controller across a range of emulated application workloads in scenarios with multiple consolidated virtual containers. The results confirm that the proposed solution can reduce the overall QoS violation incident rate, compared to the default policy of the LXC engine, in scenarios in which the platform operates at a significant traffic load.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A new dataflow platform, DFC, is demonstrated, which can handle successive dataflow computing passes with tagged data; it was verified that DFC achieves a reasonable speedup for large-scale computing with thread counts up to 512.
Abstract: In this paper, we demonstrate a new dataflow platform, DFC, which can handle successive dataflow computing passes with tagged data. By implementing matrix multiplication in DFC, we show that DFC can exploit parallelism automatically with a very simple dataflow graph constructed from DFC's DF functions. Unlike other dataflow execution platforms, DFC supports multiple worker threads for one dataflow node (DF function). By running the DFC matrix multiplication program on the Kunlun system, it was verified that DFC achieves a reasonable speedup for large-scale computing with thread counts up to 512.

Proceedings ArticleDOI
01 Dec 2019
TL;DR: An empirical model that classifies programs according to their power consumption using performance counter statistics is presented; the model can predict the power group membership of a code with an accuracy of more than 96.5%.
Abstract: This paper presents an empirical model to classify programs according to their power consumption by using performance counter statistics. Programs with similar power consumption are put into the same group, with a difference of 5 watts in power between two adjacent groups. A power model is generated based on the performance data that the program generates. Discriminant analysis is adopted to build the power consumption model from the performance counter statistics; we use it to determine the power category (i.e., the group number) from the independent variables. By using the performance counter variables as input to the power model, we can predict the level of power consumption of a code, that is, the group that the code belongs to. The experimental results in modeling and validation show that this power model can predict the power group membership of a code with an accuracy of more than 96.5%, with the difference between the original and predicted group numbers being smaller than 2.
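The classification step can be sketched with scikit-learn's linear discriminant analysis, as below: performance-counter vectors are labelled with 5-watt-wide power groups and the fitted discriminant model predicts the group of unseen code. The synthetic data, counter choices and bin edges are assumptions made purely for illustration.

```python
# Sketch: discriminant analysis over performance counters to predict a 5 W power group.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                       # e.g. IPC, cache misses, branch misses, memory BW
power_watts = 60 + 6 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=1.0, size=300)
groups = np.digitize(power_watts, bins=[50, 55, 60, 65, 70])   # 5-watt-wide power groups

lda = LinearDiscriminantAnalysis().fit(X, groups)
pred = lda.predict(X)
within_one = np.mean(np.abs(pred - groups) <= 1)    # tolerate being off by one adjacent group
print(f"exact: {lda.score(X, groups):.2%}, within one group: {within_one:.2%}")
```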