
Showing papers presented at "Parallel and Distributed Computing: Applications and Technologies" in 2020


Book ChapterDOI
28 Dec 2020
TL;DR: In this article, a short-term load forecasting model based on Convolutional Neural Networks and Bidirectional Long Short-Term Memory (CNN-BiLSTM) with Bayesian Optimization (BO) and Attention Mechanism (AM) was proposed.
Abstract: Short-term power load forecasting is quite vital in maintaining the balance between power production and power consumption of the power grid. Prediction accuracy not only affects the power grid construction, but also influences the economic development of the power grid. This paper proposes a short-term load forecasting model based on Convolutional Neural Networks and Bidirectional Long Short-Term Memory (CNN-BiLSTM) with Bayesian Optimization (BO) and Attention Mechanism (AM). The BiLSTM is good at time series forecasting, and the Attention Mechanism can help the model focus on the important parts of the BiLSTM output. To make the forecasting performance of the model as good as possible, Bayesian Optimization is used to tune the hyperparameters of the model. The inputs of the model are historical load, time slot, and meteorological factors. To eliminate seasonal influence, the data set is divided into four subsets with respect to the four seasons. The performance of the proposed model is compared with other forecasting models by MAE, RMSE, MAPE, and \(R^2\) score. The experimental results show that the proposed model fits the actual values best and has the best forecasting performance among the contrast models.
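
To make the architecture concrete, here is a minimal sketch of a CNN-BiLSTM forecaster with additive attention in tensorflow.keras. The window length, feature count, and layer sizes are illustrative assumptions rather than values from the paper; the paper's Bayesian Optimization step would tune such hyperparameters on top of a model-building function like this.

```python
# Hypothetical CNN-BiLSTM-attention sketch; all sizes are assumptions, not the paper's.
import tensorflow as tf
from tensorflow.keras import layers

WINDOW, FEATURES = 24, 5  # assumed: 24 past time slots, 5 features (load, time, weather)

inputs = tf.keras.Input(shape=(WINDOW, FEATURES))
x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Additive attention: score each time step, softmax over time, weighted sum.
scores = layers.Dense(1)(h)                    # (batch, WINDOW, 1)
weights = layers.Softmax(axis=1)(scores)       # attention weights over time steps
weighted = layers.Multiply()([h, weights])
context = layers.Lambda(lambda t: tf.reduce_sum(t, axis=1))(weighted)

outputs = layers.Dense(1)(context)             # predicted load for the next slot
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
```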

5 citations


Book ChapterDOI
28 Dec 2020
TL;DR: LightLayers as discussed by the authors is a method for reducing the number of trainable parameters in DNNs; it consists of LightDense and LightConv2D layers that are as efficient as regular Dense and Conv2D layers but use fewer parameters.
Abstract: Deep Neural Networks (DNNs) have become the de-facto standard in computer vision, as well as in many other pattern recognition tasks. A key drawback of DNNs is that the training phase can be very computationally expensive. Organizations or individuals that cannot afford purchasing state-of-the-art hardware or tapping into cloud-hosted infrastructures may face a long waiting time before the training completes or might not be able to train a model at all. Investigating novel ways to reduce the training time could be a potential solution to alleviate this drawback, thus enabling more rapid development of new algorithms and models. In this paper, we propose LightLayers, a method for reducing the number of trainable parameters in DNNs. The proposed LightLayers consist of LightDense and LightConv2D layers that are as efficient as regular Conv2D and Dense layers but use fewer parameters. We resort to matrix factorization to reduce the complexity of the DNN models, resulting in lightweight models that require less computational power without much loss in accuracy. We have tested LightLayers on the MNIST, Fashion MNIST, CIFAR-10, and CIFAR-100 datasets. Promising results are obtained for MNIST, Fashion MNIST, and CIFAR-10, whereas CIFAR-100 shows acceptable performance while using fewer parameters.
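
The factorization idea behind a layer like LightDense is easy to sketch: replace a Dense layer's weight matrix W (d x units) with two thin factors U (d x k) and V (k x units), cutting the parameter count from d*units to k*(d + units). The layer below is an illustrative reconstruction under that assumption, not the paper's exact definition.

```python
# Hypothetical low-rank Dense layer via matrix factorization (W ~= U @ V).
import tensorflow as tf

class LightDense(tf.keras.layers.Layer):
    def __init__(self, units, k=8, **kwargs):
        super().__init__(**kwargs)
        self.units, self.k = units, k

    def build(self, input_shape):
        d = int(input_shape[-1])
        self.u = self.add_weight(shape=(d, self.k), initializer="glorot_uniform")
        self.v = self.add_weight(shape=(self.k, self.units), initializer="glorot_uniform")
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, x):
        return tf.matmul(tf.matmul(x, self.u), self.v) + self.b

# For a 784 -> 128 layer: a full Dense holds 784*128 + 128 = 100,480 parameters,
# while k = 8 needs only 8*(784 + 128) + 128 = 7,424.
```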

5 citations


Book ChapterDOI
28 Dec 2020
TL;DR: In this article, a distributed model based on the real distributed processing framework Pregel is presented; truss decomposition and truss maintenance algorithms are proposed on top of it, and extensive experiments over both real-world and synthetic graphs are conducted.
Abstract: Cohesive subgraphs are applied in various fields. Mining cohesive components such as k-trusses has attracted a lot of effort to improve time efficiency in large-scale graphs. The k-truss is a subgraph where each edge is contained in at least \(k-2\) triangles, and the problem of truss decomposition is computing the k-trusses of a graph for all k. However, most graphs in real scenarios change over time. Previous studies take static graphs as input, and truss maintenance in dynamic graphs has received little attention. This paper focuses on distributed algorithms for truss maintenance. We present a distributed model based on the real distributed processing framework Pregel. On top of this model, we propose truss decomposition and truss maintenance algorithms. To confirm the effectiveness and efficiency of the proposed algorithms, we conduct extensive experiments over both real-world and synthetic graphs.
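
For reference, the definition can be turned into a short sequential peeling routine: repeatedly delete edges supported by fewer than k-2 triangles, for increasing k. This single-machine sketch (using networkx) only illustrates what the paper computes; the paper's contribution is doing this, and maintaining it under updates, in a Pregel-style distributed model.

```python
# Sequential truss decomposition by edge peeling (illustrative, not distributed).
import networkx as nx

def truss_decomposition(G):
    """Trussness of each edge: the largest k such that the edge is in the k-truss."""
    H = G.copy()
    trussness = {}
    k = 3  # every edge is trivially in the 2-truss
    while H.number_of_edges() > 0:
        while True:
            # support(u, v) = number of triangles containing the edge
            weak = [(u, v) for u, v in H.edges()
                    if len(list(nx.common_neighbors(H, u, v))) < k - 2]
            if not weak:
                break
            for u, v in weak:
                trussness[frozenset((u, v))] = k - 1  # in the (k-1)-truss, not the k-truss
                H.remove_edge(u, v)
        k += 1  # surviving edges form the k-truss; raise the bar
    return trussness

t = truss_decomposition(nx.karate_club_graph())
print(max(t.values()))  # largest k for which the k-truss is non-empty
```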

5 citations


Book ChapterDOI
28 Dec 2020
TL;DR: In this article, a UAV path planning approach for multi-building inspection is presented, which first generates helix paths for single-building inspection and defines the possible points for collecting inspection data with reasonable time slots.
Abstract: This paper presents a UAV path planning approach for multi-building inspection, which is a new application for UAV path planning. It first generates helix paths for single-building inspection and defines the possible points for collecting inspection data with reasonable time slots. After inspecting one building, the UAV flies to another building along a trajectory based on a cost matrix and a visited vector defined in this algorithm. The planning of the entire inspection path is evaluated considering several factors, such as distance, time, and altitude. The proposed algorithm is applied to the historical giant communal homes of Fujian Tulou, consisting of five buildings.
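
The single-building stage reduces to generating waypoints on a helix around the building. A minimal sketch, with radius, pitch, and point spacing as assumed parameters:

```python
# Illustrative helix waypoint generator; all parameters are assumptions.
import math

def helix_waypoints(cx, cy, radius, z_start, z_end, turns, pts_per_turn=12):
    """Waypoints (x, y, z) spiraling around center (cx, cy) from z_start to z_end."""
    n = turns * pts_per_turn
    pts = []
    for i in range(n + 1):
        theta = 2 * math.pi * i / pts_per_turn       # angle advances per point
        z = z_start + (z_end - z_start) * i / n      # altitude rises linearly
        pts.append((cx + radius * math.cos(theta),
                    cy + radius * math.sin(theta), z))
    return pts

path = helix_waypoints(cx=0.0, cy=0.0, radius=15.0, z_start=5.0, z_end=30.0, turns=4)
print(len(path), path[0], path[-1])
```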

3 citations


Book ChapterDOI
28 Dec 2020
TL;DR: In this paper, the issue of memory access optimization for the key stencil computations of a high-order CFD program on an NVIDIA GPU is studied, and two methods are used to optimize the performance.
Abstract: Stencil computations are a class of computations commonly found in scientific and engineering applications. They have relatively low arithmetic intensity, so their performance is strongly affected by memory access. This paper studies the issue of memory access optimization for the key stencil computations of a high-order CFD program on an NVIDIA GPU. Two methods are used to optimize the performance. First, we use registers to cache the data used by the stencil computations in the kernel: we use the CUDA warp shuffle functions to exchange data between neighboring grid points and adjust the thread computation granularity to increase data reuse. Second, we use shared memory to buffer the grid data used by the stencil computations in the kernel and utilize loop tiling to reduce redundant accesses to global memory. Performance evaluation is done on an NVIDIA Tesla K80 GPU. The results show that, compared to the original implementation that only uses global memory, the register-based implementation achieves maximum speedups of 2.59 and 2.79 for 15M and 60M grids, respectively, and the shared-memory implementation achieves maximum speedups of 3.51 and 3.36 for 15M and 60M grids, respectively.
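
The second optimization (shared-memory tiling) can be sketched in a few lines with Numba's CUDA support: each block stages a tile of the grid plus halo cells in shared memory, so neighboring threads reuse staged values instead of re-reading global memory. This generic 1D five-point stencil is only an illustration of the technique, not the paper's CFD kernel.

```python
# Shared-memory tiling for a 1D stencil (illustrative; not the paper's kernel).
import numpy as np
from numba import cuda, float32

RADIUS, TPB = 2, 128  # stencil radius and threads per block

@cuda.jit
def stencil1d(inp, out):
    smem = cuda.shared.array(TPB + 2 * RADIUS, dtype=float32)
    gid = cuda.grid(1)
    lid = cuda.threadIdx.x + RADIUS
    if gid < inp.size:
        smem[lid] = inp[gid]
        if cuda.threadIdx.x < RADIUS:  # first RADIUS threads also load the halos
            smem[lid - RADIUS] = inp[gid - RADIUS] if gid >= RADIUS else 0.0
            right = gid + TPB
            smem[lid + TPB] = inp[right] if right < inp.size else 0.0
    cuda.syncthreads()
    if RADIUS <= gid < inp.size - RADIUS:
        acc = 0.0
        for k in range(-RADIUS, RADIUS + 1):  # all reads now hit shared memory
            acc += smem[lid + k]
        out[gid] = acc / (2 * RADIUS + 1)

x = np.arange(1 << 20, dtype=np.float32)
y = np.zeros_like(x)
stencil1d[(x.size + TPB - 1) // TPB, TPB](x, y)
```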

2 citations


Book ChapterDOI
28 Dec 2020
TL;DR: OnDisco as discussed by the authors combines reinforcement learning and heuristic methods to minimize the average completion time of co-tasks; because it can learn the inherent characteristics of the environment without any prior knowledge, OnDisco is well adapted to varying environments.
Abstract: Edge computing is a promising cloud computing paradigm that reduces computing latency by deploying edge servers near data sources and users, which is of great importance for implementing delay-sensitive applications like AR, cloud gaming, and autonomous driving. Due to the limited resources of edge servers, task dispatching and function configuration are the keys to fully utilizing them. Moreover, a typical task request in edge computing (called a co-task) consists of a set of subtasks, where the task completion time is determined by the latest completed subtask. In this work, we propose a scheme named OnDisco, which combines reinforcement learning and heuristic methods to minimize the average completion time of co-tasks. Compared with heuristic algorithms, deep reinforcement learning can learn the inherent characteristics of the environment without any prior knowledge, and OnDisco is therefore well adapted to varying environments. Simulations on Alibaba traces show that OnDisco reduces the average task completion time by \(58\%\) and \(76\%\) compared with the heuristic and random algorithms, respectively. Moreover, OnDisco outperforms the baselines consistently in various data environments and parameter settings.

2 citations


Book ChapterDOI
28 Dec 2020
TL;DR: In this article, a streaming model of maximizing monotone lattice submodular functions with a cardinality constraint on the integer lattice was considered, and the Sieve-Streaming algorithm combined with a modified binary search subroutine was proposed.
Abstract: In this paper, we consider a streaming model of maximizing a monotone lattice submodular function with a cardinality constraint on the integer lattice. As (lattice) submodularity does not imply the diminishing-return property on the integer lattice, we introduce the Sieve-Streaming algorithm combined with a modified binary search subroutine to solve the problem. We also show that it achieves an approximation ratio of \(1/2-\epsilon \), with memory complexity \(O( \epsilon ^{-1} k\log k)\) and query complexity \(O( \epsilon ^{-2}\log ^2 k )\) per element.
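
The set-function version of Sieve-Streaming is short enough to sketch: maintain a geometric grid of guesses for the optimum, and let each guess greedily accept an element whose marginal gain clears its threshold. The lattice version in the paper adds a binary search over how many copies of an element to take; that subroutine is omitted here.

```python
# Sieve-Streaming sketch for the plain set case (illustrative).
import math

def sieve_streaming(stream, f, k, eps=0.1):
    """f: monotone submodular function on frozensets; k: cardinality constraint."""
    m = 0.0      # largest singleton value seen so far
    sols = {}    # threshold v -> greedy solution kept for that guess
    for e in stream:
        m = max(m, f(frozenset([e])))
        # thresholds v = (1+eps)^i kept while m <= v <= 2km
        i_lo = math.ceil(math.log(m, 1 + eps)) if m > 0 else 0
        i_hi = math.floor(math.log(2 * k * m, 1 + eps)) if m > 0 else 0
        live = {(1 + eps) ** i for i in range(i_lo, i_hi + 1)}
        sols = {v: S for v, S in sols.items() if v in live}
        for v in live:
            S = sols.setdefault(v, frozenset())
            if len(S) < k and f(S | {e}) - f(S) >= (v / 2 - f(S)) / (k - len(S)):
                sols[v] = S | {e}
    return max(sols.values(), key=f, default=frozenset())

# toy coverage instance with k = 2
sets = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
f = lambda S: len(set().union(*[sets[e] for e in S]))
print(sieve_streaming(iter(["a", "b", "c"]), f, k=2))
```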

2 citations


Book ChapterDOI
28 Dec 2020
TL;DR: In this paper, the authors proposed an algorithm called Maximizing Group Coverage (MGC), which greedily selects the best node based on evaluating the contribution of nodes to the groups, ensuring the success of approximating the maximization of the number of activated groups.
Abstract: Groups play a crucial role in the decision-making of social networks, since individual decision-making is often influenced by groups. This gives rise to the Group Influence Maximization (GIM) problem, which aims to maximize the expected number of activated groups by finding k seed nodes. The GIM problem has been proved NP-hard, while computing the objective function is \(\#P\)-hard under the Independent Cascade (IC) model. We propose an algorithm called Maximizing Group Coverage (MGC), which greedily selects the best node based on evaluating the contribution of nodes to the groups, ensuring the success of approximating the maximization of the number of activated groups. Finally, we compare the MGC algorithm with the baseline algorithm called Maximum Coverage (MC) through experiments, demonstrating that MGC outperforms MC under the IC model regarding the average number of activated groups.
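
The greedy skeleton of an algorithm like MGC is shown below, with influence simplified to direct group membership so the sketch stays self-contained; the paper instead evaluates each node's contribution to groups under the Independent Cascade model.

```python
# Greedy group-coverage sketch (membership stands in for IC-model influence).
def greedy_group_coverage(nodes, groups, k):
    """groups: dict group_id -> set of member nodes; returns k seed nodes."""
    seeds, covered = set(), set()
    for _ in range(k):
        best, best_gain = None, -1
        for u in nodes - seeds:
            gain = sum(1 for g, members in groups.items()
                       if g not in covered and u in members)
            if gain > best_gain:
                best, best_gain = u, gain
        seeds.add(best)
        covered |= {g for g, members in groups.items() if best in members}
    return seeds

groups = {"g1": {1, 2}, "g2": {2, 3}, "g3": {4}}
print(greedy_group_coverage({1, 2, 3, 4}, groups, k=2))  # {2, 4}
```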

1 citation


Book ChapterDOI
28 Dec 2020
TL;DR: In this article, a data aggregation aware routing problem is formulated as a mixed-integer non-linear programming problem and proved to be NP-hard; a greedy routing algorithm is then proposed to solve it by transmitting the data to the closest aggregation node, reducing the network overhead.
Abstract: In distributed training, the communication overhead of parameter synchronization is heavy in the network. Data aggregation can efficiently alleviate network overheads. However, existing works on data aggregation are based on streaming message data, which cannot adapt well to the discrete communication of parameter synchronization. This paper formulates a data aggregation aware routing problem with the objective of minimizing the training finishing time of the global model under a cache capacity constraint. The problem is formulated as a mixed-integer non-linear programming problem and proved to be NP-hard. We then propose a data aggregation aware routing algorithm that solves the formulated problem by greedily transmitting the data to the closest aggregation node to reduce the network overhead. Simulation results show that the proposed algorithm can reduce the average training finishing time by \(74\%\) and the network overhead by \(33\%\) on average, compared with the shortest path algorithm.
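
The greedy rule at the heart of the proposed routing, sending each worker's data toward the nearest aggregation node, can be sketched with shortest paths (using networkx); the cache-capacity constraint and the full MINLP formulation are not represented here.

```python
# Greedy "route to the closest aggregation node" sketch (illustrative).
import networkx as nx

def route_to_nearest_aggregator(G, workers, aggregators):
    """Return worker -> (chosen aggregation node, path to it)."""
    plan = {}
    for w in workers:
        dist = {a: nx.shortest_path_length(G, w, a) for a in aggregators}
        best = min(dist, key=dist.get)           # closest aggregation node
        plan[w] = (best, nx.shortest_path(G, w, best))
    return plan

G = nx.grid_2d_graph(4, 4)                        # toy stand-in for a fabric topology
plan = route_to_nearest_aggregator(G, workers=[(0, 0), (3, 3)],
                                   aggregators=[(1, 1), (2, 2)])
print(plan[(0, 0)][0])                            # (1, 1)
```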

1 citation


Book ChapterDOI
28 Dec 2020
TL;DR: In this article, a Feature Selecting Mechanism (FSM) was proposed to find the most suitable ratio of feature dimension from RGB image and point cloud (converted from depth image) to predict the 6D pose more effectively.
Abstract: In the task of 6D pose estimation from RGB-D images, the crucial problem is how to make the most of the two types of features from the RGB and depth inputs. As far as we know, prior approaches treat those two sources equally, which may overlook that different combinations of the two properties could have varying degrees of impact. Therefore, we propose a Feature Selecting Mechanism (FSM) in this paper to find the most suitable ratio of feature dimensions from the RGB image and the point cloud (converted from the depth image) to predict the 6D pose more effectively. We first conduct artificial selection in our Feature Selecting Mechanism (FSM) to prove the potential of weighting the RGB-D features. Afterward, a neural network is deployed in our FSM to adaptively pick out features from the RGB-D input. Through experiments on the LINEMOD dataset, the YCB-Video dataset, and our multi-pose synthetic image dataset, we show an improvement of up to 2% in accuracy by utilizing our FSM, compared to the state-of-the-art method.

1 citation


Book ChapterDOI
28 Dec 2020
TL;DR: In this article, the authors propose an online algorithm to minimize the total transfer and caching costs of the system, and prove that the competitive ratio of the algorithm is a constant.
Abstract: With the tremendously increased scale of networks, data transfer plays a very important role in high performance computing. Traditionally, when a task is completed, the data associated with the task is released to free space for the following tasks. However, a following task may require the same data, which then has to be fetched back, hurting efficiency. In this paper, we implement a data caching method to improve the performance of the system: when a task is completed, its data is stored for a longer time, and if the previously used data is requested before release, no transfer cost is incurred. With the help of data caching, the performance of the network in handling real-time tasks is significantly improved while traffic congestion is also decreased. We extend the previous single-data model to multiple types of data among servers. Tasks arrive in an online fashion, and when handling a task on some network server, we have no information about future tasks. Assuming the storage cost is submodular, which is well adopted in reality, we propose an online algorithm to minimize the total transfer and caching costs of the system. Via theoretical analysis, the competitive ratio of the algorithm is proved to be a constant.
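
The single-item core of this rent-or-buy trade-off is the classic ski-rental rule: keep paying storage until the accumulated storage cost reaches the transfer (re-fetch) cost, then release. The sketch below shows only that intuition; the paper's algorithm covers multiple data types across servers and a submodular storage cost.

```python
# Ski-rental-style caching rule for one data item (illustrative).
def total_cost(requests, transfer_cost, store_cost_per_slot):
    total, cached, idle = 0.0, False, 0
    for requested in requests:          # one boolean per time slot
        if requested:
            if not cached:
                total += transfer_cost  # fetch the data from a remote server
                cached = True
            idle = 0
        elif cached:
            total += store_cost_per_slot
            idle += 1
            if idle * store_cost_per_slot >= transfer_cost:
                cached = False          # storage spent matches a re-fetch: release
    return total

# two bursts of requests separated by a long idle gap
print(total_cost([1, 1, 0, 0, 0, 0, 0, 0, 1], transfer_cost=4.0,
                 store_cost_per_slot=1.0))  # 12.0
```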

Book ChapterDOI
28 Dec 2020
TL;DR: In this paper, a system-level routing method based on TDM is proposed to minimize the maximum TDM ratio under the strict ratio constraint; the experimental results show that the method not only solves all problems but also achieves a good TDM ratio.
Abstract: Multi-FPGA prototyping is widely used to verify modern VLSI circuits, but the limited number of connections between FPGAs in a multi-FPGA system may cause routing failure. Therefore, time-division multiplexing (TDM) technology is used to transmit multiple signals through the same routing channel to improve utilization. However, the performance of this type of system depends on the routing quality within the FPGAs due to the signal delay between FPGA pairs. In this paper, we propose a system-level routing method based on TDM to minimize the maximum TDM ratio while satisfying the strict ratio constraint. First, we weight the edges and use two methods to build approximate minimum Steiner trees (MST) to route the nets. Then we propose a ratio assignment method based on edge demand which satisfies the TDM ratio constraint. We tested our method on the benchmarks provided by the 2019 CAD Contest at ICCAD and compared it with the top two teams. The experimental results show that our method not only solves all problems but also achieves a good TDM ratio.

Book ChapterDOI
28 Dec 2020
TL;DR: This paper studies constant approximation algorithms for some variations of the general cluster routing problem on an edge-weighted complete undirected graph G, whose vertex set is partitioned into clusters \(C_{1},\dots,C_{m}\).
Abstract: The graph routing problem (GRP) and its generalizations have been extensively studied because of their broad applications in the real world. In this paper, we study a variant of the GRP called the general cluster routing problem (GCRP). Let \(G=\left( V,\,E\right) \) be a complete undirected graph with edge weight \(c\left( e\right) \) satisfying the triangle inequality. The vertex set is partitioned into a family of clusters \(C_{1},\dots,C_{m}\). We are given a required vertex subset \(\overline{V} \subseteq V\) and a required edge subset \(\overline{E} \subseteq E\). The aim is to compute a minimum cost tour that visits each vertex in \(\overline{V}\) exactly once and traverses each edge in \(\overline{E}\) at least once, while the vertices and edges of each cluster are visited consecutively. When the starting and ending vertices of each cluster are specified, we devise an approximation algorithm by incorporating Christofides' algorithm with minimum weight bipartite matching, achieving an approximation ratio that equals the best approximation ratio of path TSP. Then, for the case where there are no specified starting and ending vertices for each cluster, we propose a more complicated algorithm with a compromised ratio of 2.67 by employing a directed spanning tree on a designated auxiliary graph. Both approximation ratios improve on the state-of-the-art ratios, which are 2.4 and 3.25, respectively.

Book ChapterDOI
28 Dec 2020
TL;DR: Wang et al. as discussed by the authors introduced a novel contact prediction method based on fully convolutional neural networks and evolutionary features extensively extracted from multiple sequence alignments, which is very effective in interresidue contact prediction.
Abstract: Predicting the corresponding 3D structure from a protein's sequence is one of the most challenging tasks in computational biology, and a confident interresidue contact map serves as the main driver towards ab initio protein structure prediction. Benefiting from ever-increasing sequence databases, residue contact prediction has been revolutionized recently by the introduction of direct coupling analysis and deep learning techniques. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive. Here, we introduce a novel contact prediction method based on fully convolutional neural networks and evolutionary features extensively extracted from multiple sequence alignments. The results show that our deep learning model, based on a highly optimized feature extraction mechanism, is very effective in interresidue contact prediction.

Book ChapterDOI
28 Dec 2020
TL;DR: Wang et al. as discussed by the authors proposed the Dynamic Fee Threshold Mechanism to counter the DDoS attack on Bitcoin's memory pool, which uses several characteristics to distinguish the normal transactions in Bitcoin from the malicious spam transactions.
Abstract: Blockchain is a well-known distributed technology that can be regarded as a decentralized database combining a peer-to-peer architecture, cryptography, and a consensus mechanism. As the first and most famous application of blockchain and cryptocurrencies, Bitcoin also suffers from different kinds of attacks, and several studies have been done to analyze and solve them. In this paper, we study the Distributed Denial-of-Service (DDoS) attack on Bitcoin's memory pool (Mempool). We analyze the feasibility of this kind of attack based on the current mining and relay process of Bitcoin, and discuss the possible effects of this attack on some novel aspects of Bitcoin, including confirmation time, market price, and legitimate users. We further present the Dynamic Fee Threshold Mechanism to counter this attack. Our method uses several characteristics to distinguish normal Bitcoin transactions from malicious spam transactions. We also analyze and experiment on the method to demonstrate its effectiveness and accuracy in detecting malicious transactions.
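
One way to picture a dynamic fee threshold is as a relay-fee floor that rises with Mempool congestion, so that floods of minimal-fee spam stop propagating. The rule below (quadratic back-off, sat/vB units) is an invented illustration; the paper's mechanism classifies transactions using several characteristics beyond the fee rate.

```python
# Illustrative dynamic fee-threshold filter; the back-off curve is an assumption.
def fee_threshold(base_fee_rate, mempool_bytes, soft_cap_bytes):
    """Minimum fee rate accepted for relay, rising with congestion."""
    load = mempool_bytes / soft_cap_bytes
    return base_fee_rate * max(1.0, load ** 2)

def accept(tx_fee_rate, base_fee_rate, mempool_bytes, soft_cap_bytes):
    return tx_fee_rate >= fee_threshold(base_fee_rate, mempool_bytes, soft_cap_bytes)

print(accept(1.0, 1.0, 50_000_000, 300_000_000))   # True: uncongested, floor stays at base
print(accept(5.0, 1.0, 900_000_000, 300_000_000))  # False: 3x congestion -> floor is 9x base
```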

Book ChapterDOI
28 Dec 2020
TL;DR: Wang et al. as mentioned in this paper proposed a new CNN structure including layers which can incorporate first-order information into the covariance matrix, use eigenvalue vectors to measure the importance of feature channels, reduce the bilinear dimensionality of the parameter matrix, and perform Cholesky decomposition on the positive definite matrix.
Abstract: In the past few years, Convolutional Neural Networks (CNNs) have been successfully applied to many computer vision tasks. Most of these networks can only extract first-order information from input images. Second-order statistical information refers to the second-order correlations obtained by computing the covariance matrix, the Fisher information matrix, or the vector outer product over local feature groups organized by channel. It has been shown that using second-order information on facial expression datasets can better capture the distortion of facial area features, while at the same time generating many more parameters, which may incur much higher computational cost. In this article we propose a new CNN structure including layers which can (i) incorporate first-order information into the covariance matrix; (ii) use eigenvalue vectors to measure the importance of feature channels; (iii) reduce the bilinear dimensionality of the parameter matrix; and (iv) perform Cholesky decomposition on the positive definite matrix to complete the compression of the second-order information matrix. Due to the incorporation of both first-order and second-order information and the Cholesky compression strategy, our proposed method reduces the number of parameters by half compared with the SPDNet model, and simultaneously achieves better results in facial expression classification tasks than the corresponding first-order model and the reference second-order model.
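
Two of the listed ingredients, covariance pooling (second-order information) and Cholesky-based compression of the resulting SPD matrix, are sketched below in numpy. How the paper injects first-order features and weights channels by eigenvalues is not shown.

```python
# Covariance pooling + Cholesky compression sketch (illustrative).
import numpy as np

def covariance_pool(feat):
    """feat: (C, H, W) feature maps -> (C, C) channel covariance across positions."""
    C = feat.shape[0]
    X = feat.reshape(C, -1)                # C channels x N spatial positions
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / (X.shape[1] - 1)
    return cov + 1e-5 * np.eye(C)          # jitter keeps the matrix positive definite

feat = np.random.randn(64, 7, 7)
cov = covariance_pool(feat)
L = np.linalg.cholesky(cov)                # cov == L @ L.T
compressed = L[np.tril_indices_from(L)]    # C*(C+1)/2 numbers instead of C*C
print(cov.shape, compressed.shape)         # (64, 64) (2080,)
```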

Book ChapterDOI
Jing Zhang, Jinrong Li, Zheng Du, Jiwu Shu, Qiuming Luo
28 Dec 2020
TL;DR: The DFC runtime library constructs the DAG of the dataflow graph and handles the firing of DFC tasks and the synchronization between tasks of successive passes, showing ideal performance compared with DSPatch.
Abstract: In this paper, we introduce the DFC dataflow language and its runtime environment. The DFC runtime library is in charge of constructing the DAG of the dataflow graph, firing the DFC tasks, and handling the synchronization between tasks of successive passes. Based on an elaborately implemented thread pool and queued Active Data, the DFC runtime shows ideal performance compared with DSPatch. The experiment on a simple dataflow graph shows that DFC has better performance in cases where the parallelism is below the core count, while DSPatch shows better scalability in cases where the parallelism exceeds the core count. As DFC is still a prototype language, it lacks the ability to construct the DAG dynamically, which leads to low efficiency when coding for a well-structured dataflow graph.

Book ChapterDOI
28 Dec 2020
TL;DR: In this article, the prize-collecting k-Steiner tree problem (PC k-ST) was studied and two approximation algorithms were proposed for the problem with ratios of 5.9672 and 5.5672.
Abstract: In this paper, we study the prize-collecting k-Steiner tree problem (PC k-ST), which is an interesting generalization of both the k-Steiner tree problem (k-ST) and the prize-collecting Steiner tree problem (PCST). In the PC k-ST, we are given an undirected connected graph \(G =(V, E)\), a subset \(R \subseteq V\) of terminals, a root vertex \(r \in V\), and an integer k. Every edge has a non-negative cost and every vertex has a non-negative penalty cost. We wish to find an r-rooted tree F that spans at least k vertices in R, so as to minimize the total edge cost of F plus the penalty costs of the vertices not in F. As our main contribution, we propose two approximation algorithms for the PC k-ST, with ratios of 5.9672 and 5. The first algorithm is based on an observation about the solutions of the k-ST and the PCST, and the second is based on the primal-dual technique.

Book ChapterDOI
28 Dec 2020
TL;DR: In this paper, an optimal polynomial-time algorithm is presented for computing the isolated toughness of interval graphs, a subclass of co-comparability graphs.
Abstract: Factors and fractional factors are widely used in many fields related to computer science. The isolated toughness of an incomplete graph G is defined as \(i\tau (G)=\min \{\frac{|S|}{i(G-S)}:S\in C(G), i(G-S)>1\}\); otherwise, we set \(i\tau (G)=\infty \) if G is complete. This parameter is closely related to the existence of factors and fractional factors in graphs. In this paper, we focus on the computational complexity of isolated toughness and present an optimal polynomial-time algorithm to compute the isolated toughness of interval graphs, a subclass of co-comparability graphs.
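
For small graphs the definition can be checked directly by brute force, which makes the parameter concrete (the paper's point is that interval graphs admit a polynomial-time algorithm, whereas this sketch is exponential). Here i(G-S) is the number of isolated vertices left after deleting S; requires networkx.

```python
# Brute-force isolated toughness for small graphs (illustrative, exponential).
from itertools import combinations
import networkx as nx

def isolated_toughness(G):
    best = float("inf")
    nodes = list(G.nodes())
    for r in range(1, len(nodes)):
        for S in combinations(nodes, r):
            H = G.copy()
            H.remove_nodes_from(S)
            iso = sum(1 for v in H if H.degree(v) == 0)
            if iso > 1:                       # definition requires i(G-S) > 1
                best = min(best, len(S) / iso)
    return best                               # inf when G is complete

print(isolated_toughness(nx.path_graph(5)))   # 2/3: S = {1, 3} isolates 0, 2, 4
```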

Book ChapterDOI
28 Dec 2020
TL;DR: The model parameter-transfer method is used to alleviate the scarcity of labeled leukocyte data for training, and deformable convolution is introduced into the main network of the detector to improve the accuracy of the object detection model.
Abstract: Currently, leukocyte detection suffers from a scarcity of labeled samples, so a focal dataset must be expanded by merging multiple datasets. At the same time, given differences in staining methods, staining time, and collection techniques, some datasets are of the same origin but have different distributions, and the effect of direct training after merging datasets is not satisfactory. Moreover, the morphology of the leukocyte types is variable and stain contamination occurs, leading to misjudgments by traditional convolutional networks. Therefore, in this paper, the model parameter-transfer method is used to alleviate the scarcity of labeled leukocyte data for training, and deformable convolution is introduced into the main network to improve the accuracy of the object detection model. First, numerous leukocyte datasets are used to train a binary-classification leukocyte detection network, whose model parameters are then transferred to a multi-classification leukocyte detection network. This method makes better use of datasets of the same origin and different distributions, relieving the scarcity of labeled leukocyte data. Finally, the multi-classification detection network is trained quickly, and accurate leukocyte detection results are obtained through fine-tuning. The experimental results show that, compared with the traditional Faster RCNN object detection algorithm, our method improves \({mAP}_{0.5}\) by 0.056 and \({mAP}_{0.7}\) by 0.119, with 4% higher recall and 5% better accuracy. Thus, the method proposed in this paper can achieve highly accurate leukocyte detection.

Book ChapterDOI
28 Dec 2020
TL;DR: In this article, the authors proposed relaxed sparse locality preserving projection (RSLPP), which introduces two different projection matrices to better accomplish the two tasks of locality preservation and dimension reduction.
Abstract: Locality Preserving Projection (LPP) is a dimensionality reduction method that has been widely used in various fields. Traditional LPP uses a single projection matrix both to reduce the dimension and to preserve the locality structure of the data, and this single matrix may not handle the two tasks well at the same time. Therefore, in this paper, we propose relaxed sparse locality preserving projection (RSLPP), which introduces two different projection matrices to better accomplish the two tasks. The additional projection matrix gives the original projection matrix more freedom to select the appropriate features for preserving the local structure of the data. The experimental results on two data sets prove the effectiveness of the method.

Book ChapterDOI
28 Dec 2020
TL;DR: Wang et al. as discussed by the authors proposed a Dynamic Heterogeneous Software Effort Estimation (DHSEE) model, which leverages the adversarial auto-encoder and convolutional neural network techniques.
Abstract: In Software Effort Estimation (SEE) practice, the data drought problem has been plaguing researchers and practitioners. Leveraging heterogeneous SEE data collected by other companies is a feasible solution to relieve the data drought problem. However, how to make full use of heterogeneous effort data to conduct SEE, called Heterogeneous Software Effort Estimation (HSEE), has not been well studied. In this paper, we propose an HSEE model, called Dynamic Heterogeneous Software Effort Estimation (DHSEE), which leverages adversarial auto-encoder and convolutional neural network techniques. Meanwhile, we investigate the scenario of conducting HSEE with dynamically increasing effort data. Experiments on ten public datasets indicate that our approach can significantly outperform the state-of-the-art HSEE method and other competing methods in both static and dynamic SEE scenarios.

Book ChapterDOI
28 Dec 2020
TL;DR: Wang et al. as mentioned in this paper proposed a secure outsourcing scheme for Fully Homomorphic Encryption using Hidden Ideal Lattice (FHEHIL), in which the time-consuming operations (including modular exponentiation and polynomial multiplication) are outsourced.
Abstract: The efficiency of homomorphic encryption has always affected its practicality. With the dawn of the Internet of Things, the demand for computation and encryption on lightweight devices is increasing. Complex cryptographic computing is a significant burden for lightweight devices, but outsourcing provides great convenience for them. In this paper, based on blockchain, we propose a secure outsourcing scheme for Fully Homomorphic Encryption using Hidden Ideal Lattice (FHEHIL), in which the time-consuming operations (including modular exponentiation and polynomial multiplication) are outsourced. For polynomial multiplication, we propose a secure outsourcing algorithm that reduces the local computation cost to \(O(n)\). Previous work based on the Fast Fourier Transform can only achieve \(O(n\log n)\) for the local cost. Through security analysis, our scheme achieves the goals of privacy protection against passive attackers and cheating detection against active attackers. Experiments also demonstrate that our scheme is more efficient compared with the non-outsourcing FHEHIL.

Book ChapterDOI
28 Dec 2020
TL;DR: Wang et al. as mentioned in this paper proposed a framework to combine the tasks of image super-resolution and colorization together, which can obtain motion information of objects in the video by predicting surrounding frames with the current frame.
Abstract: Image super-resolution and colorization are two important research fields in computer vision. In previous studies, they have been considered separately as two unrelated tasks. However, for the task of restoring gray video to high-definition color video, when the network learns to abstract features from low-resolution images and map them to high-resolution images, the network's abstract understanding of the images is also useful for the colorization task. Treating them as two unrelated tasks requires constructing two different models, which costs more time and resources. In this paper, we propose a framework that combines the tasks of image super-resolution and colorization. We design a new network model to directly map low-resolution gray images into high-resolution color images. Moreover, this model can obtain motion information of objects in the video by predicting surrounding frames from the current frame; thus, video super-resolution and colorization can be realized together. To support studying super-resolution and colorization together, we build a video dataset containing three scenes. As far as we know, this is the first dataset for this combination of tasks.

Book ChapterDOI
28 Dec 2020
TL;DR: In this paper, a decentralized stochastic proximal gradient method, termed DSPG, is proposed, in which the nonergodic (last) iterate acts as the output, ensuring that the structural properties of the output are preserved.
Abstract: This paper considers distributed "nonsmooth+nonsmooth" composite optimization problems in which n agents collaboratively minimize the sum of their local objective functions over directed networks. In particular, we focus on scenarios where the sought solutions are desired to possess some structural property, e.g., sparsity. However, to ensure convergence, most existing methods produce an ergodic solution via averaging schemes as the output, which destroys the desired structural properties of the output. To address this issue, we develop a new decentralized stochastic proximal gradient method, termed DSPG, in which the nonergodic (last) iterate acts as the output. We also show that the DSPG method achieves a nonergodic convergence rate of \(O(\log (T)/\sqrt{T})\) for generally convex objective functions and \(O(\log (T)/T)\) for strongly convex objective functions. When the structure-enhancing regularization is absent and the simple and suffix averaging schemes are used, the convergence rates of DSPG reach \(O(1/\sqrt{T})\) for generally convex objective functions and \(O(1/T)\) for strongly convex objective functions, improving on the rates \(O(\log (T)/\sqrt{T})\) and \(O(\log (T)/T)\) provided by existing methods. Simulation examples further illustrate the effectiveness of the proposed method.
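
The building block whose output the averaging schemes would spoil is the proximal gradient step: a gradient step on the smooth part followed by the prox of the regularizer, which for an l1 term is soft-thresholding and is exactly what keeps the last iterate sparse. The centralized, deterministic sketch below illustrates that step only; the paper's distributed, stochastic, directed-network machinery is omitted.

```python
# Proximal gradient with l1 soft-thresholding (centralized sketch).
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient(grad_f, x0, lam, step, iters):
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - step * grad_f(x), step * lam)  # prox of lam*||.||_1
    return x

# lasso example: min 0.5*||Ax - b||^2 + lam*||x||_1
rng = np.random.default_rng(0)
A, b = rng.standard_normal((40, 100)), rng.standard_normal(40)
x = proximal_gradient(lambda x: A.T @ (A @ x - b), np.zeros(100),
                      lam=0.5, step=1e-3, iters=500)
print(np.count_nonzero(x), "nonzeros out of 100")  # the last iterate stays sparse
```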

Book ChapterDOI
28 Dec 2020
TL;DR: The authors proposed a distributed reinforcement learning framework to generate creative and meaningful poetry, which consists of two stages of training, one at word-level and another at sentence-level to learn how connotation is developed in human literary art works.
Abstract: Poetry generation has recently become a classic natural language generation task. But so far, methods for this task mainly imitate and reproduce the poems in the training data set, meaning they either carry little connotation or overfit so much that they plagiarize existing poems. To solve this problem, unlike previous work that tunes the trade-off between connotation and innovation, we propose a distributed reinforcement learning framework, consisting of two stages of training, to generate creative and meaningful poetry. In the first stage, we train a model in parallel on a large poetry corpus at the word level to master how poets write poems. In the second stage, we train the model with a distributed architecture to learn how connotation is developed in human literary works at the sentence level, and force the model to imitate itself when it composes 'good poems' to further improve performance. Experiments on generating classical Chinese poetry demonstrate that the proposed model achieves better performance and higher training efficiency compared to the state-of-the-art.

Book ChapterDOI
28 Dec 2020
TL;DR: In this paper, the authors propose a consolidation strategy to pack as many applications as possible in a (cloud-centric) cluster with limited number of working nodes, however, collocated applications can negatively compete with each other, for obtaining the resource capacity in a shared platform that, in turn, the result may lead to a severe performance degradation among all running applications.
Abstract: The concept of stream data processing is becoming challenging in most business sectors where try to improve their operational efficiency by deriving valuable information from unstructured, yet, contentiously generated high volume raw data in an expected time spans. A modern streamlined data processing platform is required to execute analytical pipelines over a continues flow of data-items that might arrive in a high rate. In most cases, the platform is also expected to dynamically adapt to dynamic characteristics of the incoming traffic rates and the ever-changing condition of underlying computational resources while fulfill the tight latency constraints imposed by the end-users. Apache Storm has emerged as an important open source technology for performing stream processing with very tight latency constraints over a cluster of computing nodes. To increase the overall resource utilization, however, the service provider might be tempted to use a consolidation strategy to pack as many applications as possible in a (cloud-centric) cluster with limited number of working nodes. However, collocated applications can negatively compete with each other, for obtaining the resource capacity in a shared platform that, in turn, the result may lead to a severe performance degradation among all running applications.

Book ChapterDOI
28 Dec 2020
TL;DR: In this article, an adaptive tensor-train decomposition (ATTD) algorithm was proposed to solve the problem of finding the optimal decomposition rank for neural networks on mobile devices.
Abstract: It is difficult and costly to directly deploy complex deep neural networks on mobile devices with limited computing and battery capabilities. This paper aims to solve this problem by improving the compactness of the model and the efficiency of computing. On the basis of MobileNet, a mainstream lightweight neural network, we propose an Adaptive Tensor-Train Decomposition (ATTD) algorithm to solve the cumbersome problem of finding the optimal decomposition rank. Because forward acceleration on the GPU side is not obvious, our strategy of choosing lower decomposition dimensions and a moderate decomposition rank, together with dynamic programming, effectively reduces the number of parameters and the amount of computation. We also set up a real-time target detection network for mobile devices. Supported by extensive experimental results, the method proposed in this paper greatly reduces the number of parameters and the amount of computation, improving the model's inference speed on mobile devices.
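
The classical tensor-train factorization that ATTD builds on can be sketched with sequential SVDs; the adaptive rank selection that gives ATTD its name is replaced here by a fixed cap, so this is background rather than the paper's algorithm.

```python
# TT-SVD sketch: factor a tensor into chained 3-way cores (fixed rank cap).
import numpy as np

def tt_svd(T, max_rank):
    """Return TT cores G_i of shape (r_prev, n_i, r_next) with ranks <= max_rank."""
    shape, d = T.shape, T.ndim
    cores, r = [], 1
    M = T.reshape(shape[0], -1)
    for i in range(d - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rk = min(max_rank, s.size)
        cores.append(U[:, :rk].reshape(r, shape[i], rk))
        M = (s[:rk, None] * Vt[:rk]).reshape(rk * shape[i + 1], -1)
        r = rk
    cores.append(M.reshape(r, shape[-1], 1))
    return cores

T = np.random.randn(4, 5, 6)
cores = tt_svd(T, max_rank=6)          # cap not binding here, so reconstruction is exact
R = cores[0]
for G in cores[1:]:
    R = np.tensordot(R, G, axes=([R.ndim - 1], [0]))
print(np.allclose(R.squeeze(), T))     # True
```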

Book ChapterDOI
28 Dec 2020
TL;DR: In this article, the working principle of the DFC compiler is introduced; DFC is a grammatical extension of standard C with a special DF function type that describes the dependences of the computing DAG.
Abstract: The working principle of the DFC compiler is introduced in this article. DFC is a grammatical extension of standard C, with a special DF function type that describes the dependences of the computing DAG. The DFC compiler, dfcc, converts DFC source code to standard C code with the assistance of a multi-threaded library. Lexical and grammatical rules are used to set up the AST of the DFC code, which is then converted to a standard C AST without DF function nodes. The derived AST is printed into a text file, a normal C file, which is finally processed by GCC to obtain the executable. The experiment demonstrates the details of this conversion, and the memory footprint is studied to show ideal scalability.

Book ChapterDOI
28 Dec 2020
TL;DR: In this article, a relaxed balanced concurrent binary search tree using a single-word compare-and-swap primitive is presented, which separates balancing actions from update operations and includes a lock-free balancing mechanism in addition to the insert, search, and relaxed delete operations.
Abstract: This paper presents a new relaxed balanced concurrent binary search tree using a single-word compare-and-swap primitive, in which all operations are lock-free. Our design separates balancing actions from update operations and includes a lock-free balancing mechanism in addition to the insert, search, and relaxed delete operations. Search in our design is not affected by ongoing concurrent update operations or by the movement of nodes during tree restructuring. Our experiments show that our algorithm performs better than other state-of-the-art concurrent BSTs.