
Showing papers on "Scalability published in 2015"


Proceedings ArticleDOI
18 May 2015
TL;DR: A novel network embedding method called the ``LINE,'' which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted, and optimizes a carefully designed objective function that preserves both the local and global network structures.
Abstract: This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called the ``LINE,'' which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitations of classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments demonstrate the effectiveness of the LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient: it is able to learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of the LINE is available online\footnote{\url{https://github.com/tangjianpku/LINE}}.
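The edge-sampling idea above can be illustrated with a small sketch. The following is a minimal first-order-proximity variant with negative sampling, written as a rough illustration of the approach rather than the authors' released implementation; the graph format, embedding dimension, and hyperparameters are assumptions.

```python
# Hedged sketch of LINE-style edge sampling for first-order proximity.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_line_first_order(edges, weights, n_nodes, dim=8,
                           n_samples=5000, n_neg=5, lr=0.025, seed=0):
    rng = np.random.default_rng(seed)
    emb = rng.normal(scale=0.01, size=(n_nodes, dim))
    # Sample edges with probability proportional to weight, so gradients do not
    # have to be scaled by (possibly very different) edge weights.
    p = np.asarray(weights, dtype=float)
    p /= p.sum()
    for _ in range(n_samples):
        u, v = edges[rng.choice(len(edges), p=p)]
        # One positive target (the observed neighbour) plus a few random negatives.
        targets = [(v, 1.0)] + [(int(rng.integers(n_nodes)), 0.0) for _ in range(n_neg)]
        grad_u = np.zeros(dim)
        for t, label in targets:
            g = lr * (label - sigmoid(emb[u] @ emb[t]))
            grad_u += g * emb[t]
            emb[t] += g * emb[u]
        emb[u] += grad_u
    return emb

# Tiny weighted triangle graph, just to exercise the function.
emb = train_line_first_order(edges=[(0, 1), (1, 2), (2, 0)],
                             weights=[1.0, 2.0, 1.0], n_nodes=3)
```

Sampling edges in proportion to their weights, instead of multiplying gradients by those weights, is what keeps the SGD updates well conditioned when edge weights vary widely.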

3,492 citations


Journal ArticleDOI
TL;DR: The definition, characteristics, and classification of big data along with some discussions on cloud computing are introduced, and research challenges are investigated, with a focus on scalability, availability, data integrity, data transformation, data quality, data heterogeneity, privacy, legal and regulatory issues, and governance.

2,141 citations


Proceedings ArticleDOI
22 Feb 2015
TL;DR: This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS under a 100 MHz working frequency, which outperforms previous approaches significantly.
Abstract: Convolutional neural network (CNN) has been widely employed for image recognition because it can achieve high accuracy by emulating behavior of optic nerves in living creatures. Recently, the rapid growth of modern applications based on deep learning algorithms has further spurred research and implementations. In particular, various accelerators for deep CNN have been proposed on FPGA platforms because of their advantages of high performance, reconfigurability, and fast development cycles. Although current FPGA accelerators have demonstrated better performance over generic processors, the accelerator design space has not been well exploited. One critical problem is that the computation throughput may not well match the memory bandwidth provided by an FPGA platform. Consequently, existing approaches cannot achieve the best performance due to under-utilization of either logic resources or memory bandwidth. At the same time, the increasing complexity and scalability of deep learning applications aggravate this problem. To overcome this problem, we propose an analytical design scheme using the roofline model. For any solution of a CNN design, we quantitatively analyze its computing throughput and required memory bandwidth using various optimization techniques, such as loop tiling and transformation. Then, with the help of the roofline model, we can identify the solution with the best performance and lowest FPGA resource requirement. As a case study, we implement a CNN accelerator on a VC707 FPGA board and compare it to previous approaches. Our implementation achieves a peak performance of 61.62 GFLOPS under a 100 MHz working frequency, which outperforms previous approaches significantly.
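For readers unfamiliar with the roofline analysis the paper builds on, the sketch below shows the basic calculation: a design's attainable throughput is capped both by the platform's compute roof and by its computation-to-communication ratio times the available memory bandwidth. All numbers here are invented for illustration and are not taken from the paper.

```python
def attainable_gflops(compute_roof_gflops, ctc_flops_per_byte, bandwidth_gb_s):
    # Roofline bound: min(compute roof, arithmetic intensity x memory bandwidth).
    return min(compute_roof_gflops, ctc_flops_per_byte * bandwidth_gb_s)

# Two hypothetical loop-tiling choices for the same hypothetical FPGA platform.
designs = {
    "small tiles":  dict(compute_roof_gflops=80.0, ctc_flops_per_byte=4.0, bandwidth_gb_s=4.5),
    "larger tiles": dict(compute_roof_gflops=80.0, ctc_flops_per_byte=20.0, bandwidth_gb_s=4.5),
}
for name, d in designs.items():
    print(f"{name}: {attainable_gflops(**d):.1f} GFLOPS")
# small tiles are bandwidth-bound (18.0 GFLOPS); larger tiles hit the compute roof (80.0 GFLOPS).
```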

1,893 citations


Book
10 May 2015
TL;DR: Big Data describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team, an approach that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data.
Abstract: Services like social networks, web analytics, and intelligent e-commerce often need to manage data at a scale too big for a traditional database. As scale and demand increase, so does complexity. Fortunately, scalability and simplicity are not mutually exclusive; rather than using some trendy technology, a different approach is needed. Big data systems use many machines working in parallel to store and process data, which introduces fundamental challenges unfamiliar to most developers. Big Data shows how to build these systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to use them in practice, and how to deploy and operate them once they're built. Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

926 citations


Proceedings ArticleDOI
13 Jun 2015
TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Abstract: The explosion of digital data and the ever-growing need for fast data analysis have made in-memory big-data processing in computer systems increasingly important. In particular, large-scale graph processing is gaining attention due to its broad applicability from social science to machine learning. However, scalable hardware design that can efficiently process large graphs in main memory is still an open problem. Ideally, cost-effective and scalable graph processing systems can be realized by building a system whose performance increases proportionally with the sizes of graphs that can be stored in the system, which is extremely challenging in conventional systems due to severe memory bandwidth limitations. In this work, we argue that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve such an objective. The key modern enabler for PIM is the recent advancement of the 3D integration technology that facilitates stacking logic and memory dies in a single package, which was not available when the PIM concept was originally examined. In order to take advantage of such a new technology to enable memory-capacity-proportional performance, we design a programmable PIM accelerator for large-scale graph processing called Tesseract. Tesseract is composed of (1) a new hardware architecture that fully utilizes the available memory bandwidth, (2) an efficient method of communication between different memory partitions, and (3) a programming interface that reflects and exploits the unique hardware design. It also includes two hardware prefetchers specialized for memory access patterns of graph processing, which operate based on the hints provided by our programming model. Our comprehensive evaluations using five state-of-the-art graph processing workloads with large real-world graphs show that the proposed architecture improves average system performance by a factor of ten and achieves 87% average energy reduction over conventional systems.

718 citations


Proceedings ArticleDOI
18 May 2015
TL;DR: This work proposes a content-based recommendation system to address both the recommendation quality and the system scalability, and proposes to use a rich feature set to represent users, according to their web browsing history and search queries, using a Deep Learning approach.
Abstract: Recent online services rely heavily on automatic personalization to recommend relevant content to a large number of users. This requires systems to scale promptly to accommodate the stream of new users visiting the online services for the first time. In this work, we propose a content-based recommendation system to address both the recommendation quality and the system scalability. We propose to use a rich feature set to represent users, according to their web browsing history and search queries. We use a Deep Learning approach to map users and items to a latent space where the similarity between users and their preferred items is maximized. We extend the model to jointly learn from features of items from different domains and user features by introducing a multi-view Deep Learning model. We show how to make this rich-feature based user representation scalable by reducing the dimension of the inputs and the amount of training data. The rich user feature representation allows the model to learn relevant user behavior patterns and give useful recommendations for users who do not have any interaction with the service, given that they have adequate search and browsing history. The combination of different domains into a single model for learning helps improve the recommendation quality across all the domains, as well as yielding a more compact and semantically richer user latent feature vector. We experiment with our approach on three real-world recommendation systems acquired from different sources of Microsoft products: Windows Apps recommendation, News recommendation, and Movie/TV recommendation. Results indicate that our approach is significantly better than the state-of-the-art algorithms (up to 49% enhancement on existing users and 115% enhancement on new users). In addition, experiments on a publicly available data set also indicate the superiority of our method in comparison with traditional generative topic models, for modeling cross-domain recommender systems. Scalability analysis shows that our multi-view DNN model can easily scale to encompass millions of users and billions of item entries. Experimental results also confirm that combining features from all domains produces much better performance than building separate models for each domain.
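As a rough sketch of the two-tower idea described above (not the paper's actual architecture or feature set), the snippet below maps user features and item features into a shared latent space and scores candidates by cosine similarity, with a softmax over one preferred item and a few sampled negatives. Shapes, layer sizes, and the single hidden layer are assumptions, and only the forward pass is shown; training would backpropagate the loss into the tower weights.

```python
import numpy as np

def tower(x, W1, W2):
    h = np.tanh(x @ W1)                                     # nonlinear projection of raw features
    z = h @ W2
    return z / np.linalg.norm(z, axis=-1, keepdims=True)    # unit-length latent embedding

def init(shape, rng):
    return rng.normal(scale=0.01, size=shape)

rng = np.random.default_rng(0)
d_user, d_item, d_hidden, d_latent = 1000, 500, 64, 32
Wu1, Wu2 = init((d_user, d_hidden), rng), init((d_hidden, d_latent), rng)
Wi1, Wi2 = init((d_item, d_hidden), rng), init((d_hidden, d_latent), rng)

user = tower(rng.normal(size=(1, d_user)), Wu1, Wu2)        # one user
items = tower(rng.normal(size=(5, d_item)), Wi1, Wi2)       # 1 preferred item + 4 negatives
scores = (user @ items.T).ravel()                           # cosine similarities
probs = np.exp(scores) / np.exp(scores).sum()               # softmax over candidates
loss = -np.log(probs[0])                                    # maximize similarity to the preferred item
```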

650 citations


Proceedings ArticleDOI
10 Aug 2015
TL;DR: It is demonstrated that the rich content and linkage information in a heterogeneous network can be captured by a multi-resolution deep embedding function, so that similarities among cross-modal data can be measured directly in a common embedding space.
Abstract: Data embedding is used in many machine learning applications to create low-dimensional feature representations, which preserves the structure of data points in their original space. In this paper, we examine the scenario of a heterogeneous network with nodes and content of various types. Such networks are notoriously difficult to mine because of the bewildering combination of heterogeneous contents and structures. The creation of a multidimensional embedding of such data opens the door to the use of a wide variety of off-the-shelf mining techniques for multidimensional data. Despite the importance of this problem, limited efforts have been made on embedding a network of scalable, dynamic and heterogeneous data. In such cases, both the content and linkage structure provide important cues for creating a unified feature representation of the underlying network. In this paper, we design a deep embedding algorithm for networked data. A highly nonlinear multi-layered embedding function is used to capture the complex interactions between the heterogeneous data in a network. Our goal is to create a multi-resolution deep embedding function, that reflects both the local and global network structures, and makes the resulting embedding useful for a variety of data mining tasks. In particular, we demonstrate that the rich content and linkage information in a heterogeneous network can be captured by such an approach, so that similarities among cross-modal data can be measured directly in a common embedding space. Once this goal has been achieved, a wide variety of data mining problems can be solved by applying off-the-shelf algorithms designed for handling vector representations. Our experiments on real-world network datasets show the effectiveness and scalability of the proposed algorithm as compared to the state-of-the-art embedding methods.

594 citations


Journal ArticleDOI
TL;DR: This paper provides a list of criteria for selecting machine learning tools for big data, analyzes the advantages and drawbacks of three different processing paradigms, and compares engines that implement them, including MapReduce, Spark, Flink, Storm, and H2O.
Abstract: With an ever-increasing amount of options, the task of selecting machine learning tools for big data can be difficult. The available tools have advantages and drawbacks, and many have overlapping uses. The world’s data is growing rapidly, and traditional tools for machine learning are becoming insufficient as we move towards distributed and real-time processing. This paper is intended to aid the researcher or professional who understands machine learning but is inexperienced with big data. In order to evaluate tools, one should have a thorough understanding of what to look for. To that end, this paper provides a list of criteria for making selections along with an analysis of the advantages and drawbacks of each. We do this by starting from the beginning, and looking at what exactly the term “big data” means. From there, we go on to the Hadoop ecosystem for a look at many of the projects that are part of a typical machine learning architecture and an understanding of how everything might fit together. We discuss the advantages and disadvantages of three different processing paradigms along with a comparison of engines that implement them, including MapReduce, Spark, Flink, Storm, and H2O. We then look at machine learning libraries and frameworks including Mahout, MLlib, SAMOA, and evaluate them based on criteria such as scalability, ease of use, and extensibility. There is no single toolkit that truly embodies a one-size-fits-all solution, so this paper aims to help make decisions smoother by providing as much information as possible and quantifying what the tradeoffs will be. Additionally, throughout this paper, we review recent research in the field using these tools and talk about possible future directions for toolkit-based learning.

379 citations


Posted Content
TL;DR: A new structured kernel interpolation (SKI) framework is introduced, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs) and naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability.
Abstract: We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, interpolation strategy, and GP covariance kernel. SKI also provides a mechanism to create new scalable kernel methods, through choosing different kernel interpolation strategies. Using SKI, with local cubic kernel interpolation, we introduce KISS-GP, which is 1) more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling.
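A small sketch may help make the SKI approximation concrete: the kernel matrix over the data is approximated as K(X,X) ~ W K(U,U) W^T, where U is a regular grid of inducing points and W holds sparse interpolation weights. For brevity the sketch uses linear rather than the paper's local cubic interpolation, and the grid size, kernel, and lengthscale are arbitrary choices.

```python
# Hedged sketch of structured kernel interpolation on a 1-D regular grid.
import numpy as np

def rbf(a, b, ls=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, size=200))     # training inputs
u = np.linspace(0, 1, 50)                    # regular inducing-point grid
h = u[1] - u[0]

# Sparse interpolation weights: each x_i only touches its two neighbouring grid points.
W = np.zeros((len(x), len(u)))
idx = np.clip(((x - u[0]) / h).astype(int), 0, len(u) - 2)
frac = (x - u[idx]) / h
W[np.arange(len(x)), idx] = 1.0 - frac
W[np.arange(len(x)), idx + 1] = frac

K_exact = rbf(x, x)
K_ski = W @ rbf(u, u) @ W.T                  # Toeplitz K(U,U) is what enables fast MVMs
print("max abs error:", np.abs(K_exact - K_ski).max())
```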

358 citations


Proceedings ArticleDOI
11 May 2015
TL;DR: A new VANET architecture called FSDN is proposed which combines two emerging computing and networking paradigms, Software Defined Networking (SDN) and Fog Computing, as a prospective solution that provides flexibility, scalability, programmability, and global knowledge.
Abstract: Vehicular Ad hoc Networks (VANETs) have attracted a lot of research in recent years. Although VANETs are deployed in reality and offer several services, the current architecture faces many difficulties in deployment and management because of poor connectivity and limited scalability, flexibility, and intelligence. We propose a new VANET architecture called FSDN which combines two emerging computing and networking paradigms, Software Defined Networking (SDN) and Fog Computing, as a prospective solution. The SDN-based architecture provides flexibility, scalability, programmability, and global knowledge, while Fog Computing offers delay-sensitive and location-aware services that could satisfy the demands of future VANET scenarios. We identify all the SDN-based VANET components as well as their functionality in the system. We also consider the system's basic operations, in which Fog Computing is leveraged to support surveillance services by taking into account resource manager and Fog orchestration models. The proposed architecture could resolve the main challenges in VANETs by augmenting Vehicle-to-Vehicle (V2V), Vehicle-to-Infrastructure (V2I), and Vehicle-to-Base Station communications and SDN centralized control, while optimizing resource utilization and reducing latency by integrating Fog Computing. Two use cases for a non-safety service (data streaming) and a safety service (lane-change assistance) are also presented to illustrate the benefits of the proposed architecture.

358 citations


Proceedings ArticleDOI
04 Oct 2015
TL;DR: This paper presents a scalable and application-agnostic scheduling framework for packet processing, and compares its performance to current approaches.
Abstract: By moving network appliance functionality from proprietary hardware to software, Network Function Virtualization promises to bring the advantages of cloud computing to network packet processing. However, the evolution of cloud computing (particularly for data analytics) has greatly benefited from application-independent methods for scaling and placement that achieve high efficiency while relieving programmers of these burdens. NFV has no such general management solutions. In this paper, we present a scalable and application-agnostic scheduling framework for packet processing, and compare its performance to current approaches.

Journal ArticleDOI
TL;DR: This work develops a data collection protocol called EDAL, which stands for Energy-efficient Delay-aware Lifetime-balancing data collection, and proposes both a centralized heuristic to reduce its computational overhead and a distributed heuristic to make the algorithm scalable for large-scale network operations.
Abstract: Our work in this paper stems from our insight that recent research efforts on open vehicle routing (OVR) problems, an active area in operations research, are based on assumptions and constraints similar to those of sensor networks. Therefore, it may be feasible to adapt these techniques so that they provide valuable solutions to certain tricky problems in the wireless sensor network (WSN) domain. To demonstrate that this approach is feasible, we develop a data collection protocol called EDAL, which stands for Energy-efficient Delay-aware Lifetime-balancing data collection. The algorithm design of EDAL leverages one result from OVR to prove that the problem formulation is inherently NP-hard. Therefore, we propose both a centralized heuristic to reduce its computational overhead and a distributed heuristic to make the algorithm scalable for large-scale network operations. We also develop EDAL to be closely integrated with compressive sensing, an emerging technique that promises considerable reduction in total traffic cost for collecting sensor readings under loose delay bounds. Finally, we systematically evaluate EDAL to compare its performance to related protocols in both simulations and a hardware testbed.

Journal ArticleDOI
TL;DR: This work introduces a micrometer-scale flip-chip process that enables scalable integration of superconducting nanowire single-photon detectors on a range of photonic circuits and demonstrates the first on-chip photon correlation measurements of non-classical light.
Abstract: United States. Army Research Office (Defense Advanced Research Projects Agency. Information in a Photon (InPho) Program Grant W911NF-10-1-0416)

Journal ArticleDOI
01 Jul 2015
TL;DR: GraphMat is a single-node multicore graph framework written in C++ that achieves better multicore scalability than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms.
Abstract: Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix operations in the backend. We thus get the productivity benefits of a vertex programming framework without sacrificing performance. GraphMat is a single-node multicore graph framework written in C++ which has enabled us to write a diverse set of graph algorithms with the same effort compared to other vertex programming frameworks. GraphMat performs 1.1-7X faster than high performance frameworks such as GraphLab, CombBLAS and Galois. GraphMat also matches the performance of MapGraph, a GPU-based graph framework, despite running on a CPU platform with significantly lower compute and bandwidth resources. It achieves better multicore scalability (13-15X on 24 cores) than other frameworks and is 1.2X off native, hand-optimized code on a variety of graph algorithms. Since GraphMat performance depends mainly on a few scalable and well-understood sparse matrix operations, GraphMat can naturally benefit from the trend of increasing parallelism in future hardware.
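The core mapping GraphMat relies on can be illustrated generically (this is plain Python/SciPy, not GraphMat's C++ backend): a vertex program such as PageRank reduces to repeated sparse matrix-vector products over the graph's adjacency structure. The toy graph and damping factor below are arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]        # src -> dst
n = 4
src, dst = zip(*edges)
out_deg = np.bincount(src, minlength=n).astype(float)
# Column-stochastic adjacency: A[d, s] = 1/out_deg(s) for each edge s -> d.
A = csr_matrix((1.0 / out_deg[list(src)], (dst, src)), shape=(n, n))

d, rank = 0.85, np.full(n, 1.0 / n)
for _ in range(50):
    rank = (1 - d) / n + d * (A @ rank)                  # the "vertex program" as one SpMV per iteration
print(rank)
```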

Proceedings Article
18 May 2015
TL;DR: This work surveys measurements of data-parallel systems recently reported in SOSP and OSDI, and finds that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
Abstract: We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system's scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads. We survey measurements of data-parallel systems recently reported in SOSP and OSDI, and find that many systems have either a surprisingly large COST, often hundreds of cores, or simply underperform one thread for all of their reported configurations.
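The metric itself is easy to compute once a single-threaded baseline exists; the toy numbers below are invented purely to show the definition.

```python
single_thread_secs = 300
system_secs = {1: 2400, 4: 700, 16: 310, 64: 240, 128: 160}   # cores -> runtime (s), made up

# COST = smallest configuration that beats the single-threaded implementation.
cost = min((cores for cores, t in system_secs.items() if t < single_thread_secs),
           default=None)
print("COST =", cost, "cores")   # -> 64; None would mean the system never outperforms one thread
```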

Proceedings ArticleDOI
17 Dec 2015
TL;DR: This paper proposes a Distributed Dataflow (DDF) programming model for the IoT that utilises computing infrastructures across the Fog and the Cloud and demonstrates that this approach eases the development process and can be used to build a variety of IoT applications that work efficiently in the Fog.
Abstract: In this paper we examine the development of IoT applications from the perspective of the Fog Computing paradigm, where computing infrastructure at the network edge, in devices and gateways, is leveraged for efficiency and timeliness. Due to the intrinsic nature of the IoT (heterogeneous devices and resources, a tightly coupled perception-action cycle, and widely distributed devices and processing), application development in the Fog can be challenging. To address these challenges, we propose a Distributed Dataflow (DDF) programming model for the IoT that utilises computing infrastructures across the Fog and the Cloud. We evaluate our proposal by implementing a DDF framework based on Node-RED (Distributed Node-RED or D-NR), a visual programming tool that uses a flow-based model for building IoT applications. Via demonstrations, we show that our approach eases the development process and can be used to build a variety of IoT applications that work efficiently in the Fog.

Proceedings ArticleDOI
17 Aug 2015
TL;DR: This work presents Everflow, a packet-level network telemetry system for large DCNs, and presents experiments that demonstrate Everflow's scalability, and shares experiences of troubleshooting network faults gathered from running it for over 6 months in Microsoft's DCNs.
Abstract: Debugging faults in complex networks often requires capturing and analyzing traffic at the packet level. In this task, datacenter networks (DCNs) present unique challenges with their scale, traffic volume, and diversity of faults. To troubleshoot faults in a timely manner, DCN administrators must a) identify affected packets inside large volume of traffic; b) track them across multiple network components; c) analyze traffic traces for fault patterns; and d) test or confirm potential causes. To our knowledge, no tool today can achieve both the specificity and scale required for this task. We present Everflow, a packet-level network telemetry system for large DCNs. Everflow traces specific packets by implementing a powerful packet filter on top of "match and mirror" functionality of commodity switches. It shuffles captured packets to multiple analysis servers using load balancers built on switch ASICs, and it sends "guided probes" to test or confirm potential faults. We present experiments that demonstrate Everflow's scalability, and share experiences of troubleshooting network faults gathered from running it for over 6 months in Microsoft's DCNs.

Journal ArticleDOI
TL;DR: A new software-defined architecture, called SoftAir, for next generation (5G) wireless systems, is introduced, where the novel ideas of network function cloudification and network virtualization are exploited to provide a scalable, flexible and resilient network architecture.

Journal ArticleDOI
TL;DR: In this survey, the vertex-centric approach to graph processing is overviewed, TLAV frameworks are deconstructed into four main components and respectively analyzed, and TLAV implementations are reviewed and categorized.
Abstract: The vertex-centric programming model is an established computational paradigm recently incorporated into distributed processing frameworks to address challenges in large-scale graph processing. Billion-node graphs that exceed the memory capacity of commodity machines are not well supported by popular Big Data tools like MapReduce, which are notoriously poor performing for iterative graph algorithms such as PageRank. In response, a new type of framework challenges one to “think like a vertex” (TLAV) and implements user-defined programs from the perspective of a vertex rather than a graph. Such an approach improves locality, demonstrates linear scalability, and provides a natural way to express and compute many iterative graph algorithms. These frameworks are simple to program and widely applicable but, like an operating system, are composed of several intricate, interdependent components, of which a thorough understanding is necessary in order to elicit top performance at scale. To this end, the first comprehensive survey of TLAV frameworks is presented. In this survey, the vertex-centric approach to graph processing is overviewed, TLAV frameworks are deconstructed into four main components and respectively analyzed, and TLAV implementations are reviewed and categorized.
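As a generic illustration of the vertex-centric model the survey covers (not any particular framework's API), the sketch below runs single-source shortest paths as synchronous supersteps in which each vertex sees only its own state and its incoming messages. The toy graph and the omission of partitioning and scheduling are deliberate simplifications.

```python
import math

graph = {0: [(1, 2.0), (2, 5.0)], 1: [(2, 1.0), (3, 4.0)], 2: [(3, 1.0)], 3: []}
dist = {v: math.inf for v in graph}
messages = {0: [0.0]}                              # seed the source vertex

while messages:                                    # one loop iteration = one superstep
    new_messages = {}
    for v, inbox in messages.items():              # each vertex's compute() over its inbox
        candidate = min(inbox)
        if candidate < dist[v]:                    # improved distance: adopt it and propagate
            dist[v] = candidate
            for nbr, w in graph[v]:
                new_messages.setdefault(nbr, []).append(candidate + w)
    messages = new_messages                        # vote to halt when no messages remain

print(dist)   # {0: 0.0, 1: 2.0, 2: 3.0, 3: 4.0}
```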

Journal ArticleDOI
Anders Rantzer1
TL;DR: In this paper, a method for synthesis of distributed controllers based on linear Lyapunov functions and storage functions instead of quadratic ones was developed for analysis and design of large scale control systems.

Proceedings ArticleDOI
18 Oct 2015
TL;DR: This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks, and shows that it is critical to optimize software frameworks for spatial locality as it leads to 2.9x efficiency improvements for NDP.
Abstract: The end of Dennard scaling has made all systems energy-constrained. For data-intensive applications with limited temporal locality, the major energy bottleneck is data movement between processor chips and main memory modules. For such workloads, the best way to optimize energy is to place processing near the data in main memory. Advances in 3D integration provide an opportunity to implement near-data processing (NDP) without the technology problems that similar efforts had in the past. This paper develops the hardware and software of an NDP architecture for in-memory analytics frameworks, including MapReduce, graph processing, and deep neural networks. We develop simple but scalable hardware support for coherence, communication, and synchronization, and a runtime system that is sufficient to support analytics frameworks with complex data patterns while hiding all the details of the NDP hardware. Our NDP architecture provides up to 16x performance and energy advantage over conventional approaches, and 2.5x over recently-proposed NDP systems. We also investigate the balance between processing and memory throughput, as well as the scalability and physical and logical organization of the memory system. Finally, we show that it is critical to optimize software frameworks for spatial locality, as it leads to 2.9x efficiency improvements for NDP.

Posted Content
TL;DR: The robust Bayesian Committee Machine is introduced, a practical and scalable product-of-experts model for large-scale distributed GP regression and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters.
Abstract: To scale Gaussian processes (GPs) to large data sets we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. Unlike state-of-the-art sparse GP approximations, the rBCM is conceptually simple and does not rely on inducing or variational parameters. The key idea is to recursively distribute computations to independent computational units and, subsequently, recombine them to form an overall result. Efficient closed-form inference allows for straightforward parallelisation and distributed computations with a small memory footprint. The rBCM is independent of the computational graph and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters. With sufficient computing resources our distributed GP model can handle arbitrarily large data sets.
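The aggregation step can be sketched concisely. The snippet below is a paraphrase of the standard rBCM combination rule (entropy-based weights, zero prior mean assumed), not the authors' code, and the expert predictions are made up.

```python
import numpy as np

def rbcm_combine(means, variances, prior_var):
    """Merge per-expert GP predictions (mean m_k, variance v_k) at one test point."""
    means, variances = np.asarray(means), np.asarray(variances)
    beta = 0.5 * (np.log(prior_var) - np.log(variances))        # confidence weight of each expert
    precision = np.sum(beta / variances) + (1.0 - beta.sum()) / prior_var
    var = 1.0 / precision
    mean = var * np.sum(beta * means / variances)                # assumes a zero prior mean
    return mean, var

# Example: three hypothetical GP experts predicting at the same test input.
m, v = rbcm_combine(means=[1.1, 0.9, 1.3], variances=[0.2, 0.5, 0.3], prior_var=1.0)
print(m, v)
```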

Journal ArticleDOI
TL;DR: This paper presents the 5Vs characteristics of big data and the techniques and technologies used to handle it, covering a wide variety of scalable database tools and techniques.

Journal ArticleDOI
TL;DR: The goal is to elaborate on the role of WSNs for smart grid applications and to provide an overview of the most recent advances in MAC and routing protocols for WSNs in this timely and exciting field.

Proceedings ArticleDOI
12 Oct 2015
TL;DR: It is shown that current scalability measures adopted by Bitcoin are at odds with the security of the system, and that an adversary can exploit these measures in order to effectively delay the propagation of transactions and blocks to specific nodes for a considerable amount of time---without causing a network partitioning in the system.
Abstract: Given the increasing adoption of Bitcoin, the number of transactions and the block sizes within the system are only expected to increase. To sustain its correct operation in spite of its ever-increasing use, Bitcoin implements a number of necessary optimizations and scalability measures. These measures limit the amount of information broadcast in the system to the minimum necessary. In this paper, we show that current scalability measures adopted by Bitcoin are at odds with the security of the system. More specifically, we show that an adversary can exploit these measures in order to effectively delay the propagation of transactions and blocks to specific nodes for a considerable amount of time---without causing a network partitioning in the system. Notice that this attack alters the information received by Bitcoin nodes, and modifies their views of the ledger state. Namely, we show that this allows the adversary to considerably increase its mining advantage in the network, and to double-spend transactions in spite of the current countermeasures adopted by Bitcoin. Based on our results, we propose a number of countermeasures in order to enhance the security of Bitcoin without deteriorating its scalability.

Journal ArticleDOI
01 Apr 2015
TL;DR: The main idea of the framework, called "Smart-Frame," is to build a hierarchical structure of cloud computing centers to provide different types of computing services for information management and big data analysis in smart grids.
Abstract: Smart grid is a technological innovation that improves efficiency, reliability, economics, and sustainability of electricity services. It plays a crucial role in modern energy infrastructure. The main challenges of smart grids, however, are how to manage different types of front-end intelligent devices such as power assets and smart meters efficiently; and how to process a huge amount of data received from these devices. Cloud computing, a technology that provides computational resources on demands, is a good candidate to address these challenges since it has several good properties such as energy saving, cost saving, agility, scalability, and flexibility. In this paper, we propose a secure cloud computing based framework for big data information management in smart grids, which we call “Smart-Frame.” The main idea of our framework is to build a hierarchical structure of cloud computing centers to provide different types of computing services for information management and big data analysis. In addition to this structural framework, we present a security solution based on identity-based encryption, signature and proxy re-encryption to address critical security issues of the proposed framework.

Journal ArticleDOI
TL;DR: This work proposes a framework to mine requirements from a closed-loop model of an industrial-scale control system, such as one specified in Simulink, and demonstrates the scalability and utility of the technique on three complex case studies in the domain of automotive powertrain systems.
Abstract: Formal verification of a control system can be performed by checking if a model of its dynamical behavior conforms to temporal requirements. Unfortunately, adoption of formal verification in an industrial setting is a formidable challenge as design requirements are often vague, nonmodular, evolving, or sometimes simply unknown. We propose a framework to mine requirements from a closed-loop model of an industrial-scale control system, such as one specified in Simulink. The input to our algorithm is a requirement template expressed in parametric signal temporal logic: a logical formula in which concrete signal or time values are replaced with parameters. Given a set of simulation traces of the model, our method infers values for the template parameters to obtain the strongest candidate requirement satisfied by the traces. It then tries to falsify the candidate requirement using a falsification tool. If a counterexample is found, it is added to the existing set of traces and these steps are repeated; otherwise, it terminates with the synthesized requirement. Requirement mining has several usage scenarios: mined requirements can be used to formally validate future modifications of the model, they can be used to gain better understanding of legacy models or code, and can also help enhancing the process of bug finding through simulations. We demonstrate the scalability and utility of our technique on three complex case studies in the domain of automotive powertrain systems: a simple automatic transmission controller, an air-fuel controller with a mean-value model of the engine dynamics, and an industrial-size prototype airpath controller for a diesel engine. We include results on a bug found in the prototype controller by our method.
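The mine-then-falsify loop can be sketched on a toy example. Below, the "model" is a simple function, the template is the basic parametric STL formula G(y <= pi) ("the output never exceeds pi"), and the falsifier is naive random search; the paper instead targets Simulink models, richer PSTL templates, and a dedicated falsification tool, so everything named here is an illustrative stand-in.

```python
import random

rng = random.Random(0)

def simulate(u):                        # toy closed-loop "model": input u -> output trace
    return [u * (1 - 0.5 ** k) for k in range(50)]

def mine_parameter(traces):             # tightest pi such that every trace satisfies G(y <= pi)
    return max(max(tr) for tr in traces)

def falsify(pi, tries=1000):            # crude random-search stand-in for a falsification tool
    for _ in range(tries):
        tr = simulate(rng.uniform(0.0, 10.0))
        if max(tr) > pi:
            return tr                   # counterexample trace violating G(y <= pi)
    return None

traces = [simulate(u) for u in (1.0, 3.0, 5.0)]     # initial simulation traces
while True:
    pi = mine_parameter(traces)
    cex = falsify(pi)
    if cex is None:                     # no counterexample found: keep the candidate requirement
        break
    traces.append(cex)                  # otherwise add the counterexample and re-mine
print(f"mined requirement: G(y <= {pi:.3f})")
```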

Proceedings ArticleDOI
12 Oct 2015
TL;DR: It is shown that many popular code bases such as Apache and Nginx use coding practices that create flexibility in their intended control flow graph (CFG) even when a strong static analyzer is used to construct the CFG, which allows an attacker to gain control of the execution while strictly adhering to a fine-grained CFI.
Abstract: Control flow integrity (CFI) has been proposed as an approach to defend against control-hijacking memory corruption attacks. CFI works by assigning tags to indirect branch targets statically and checking them at runtime. Coarse-grained enforcements of CFI that use a small number of tags to improve the performance overhead have been shown to be ineffective. As a result, a number of recent efforts have focused on fine-grained enforcement of CFI as it was originally proposed. In this work, we show that even a fine-grained form of CFI with unlimited number of tags and a shadow stack (to check calls and returns) is ineffective in protecting against malicious attacks. We show that many popular code bases such as Apache and Nginx use coding practices that create flexibility in their intended control flow graph (CFG) even when a strong static analyzer is used to construct the CFG. These flexibilities allow an attacker to gain control of the execution while strictly adhering to a fine-grained CFI. We then construct two proof-of-concept exploits that attack an unlimited tag CFI system with a shadow stack. We also evaluate the difficulties of generating a precise CFG using scalable static analysis for real-world applications. Finally, we perform an analysis on a number of popular applications that highlights the availability of such attacks.

Posted Content
TL;DR: In this paper, the authors present FireCaffe, which scales deep neural network training across a cluster of GPUs by selecting network hardware that achieves high bandwidth between GPU servers and using reduction trees to reduce communication overhead.
Abstract: Long training times for high-accuracy deep neural networks (DNNs) impede research into new DNN architectures and slow the development of high-accuracy DNNs. In this paper we present FireCaffe, which successfully scales deep neural network training across a cluster of GPUs. We also present a number of best practices to aid in comparing advancements in methods for scaling and accelerating the training of deep neural networks. The speed and scalability of distributed algorithms is almost always limited by the overhead of communicating between servers; DNN training is not an exception to this rule. Therefore, the key consideration here is to reduce communication overhead wherever possible, while not degrading the accuracy of the DNN models that we train. Our approach has three key pillars. First, we select network hardware that achieves high bandwidth between GPU servers -- Infiniband or Cray interconnects are ideal for this. Second, we consider a number of communication algorithms, and we find that reduction trees are more efficient and scalable than the traditional parameter server approach. Third, we optionally increase the batch size to reduce the total quantity of communication during DNN training, and we identify hyperparameters that allow us to reproduce the small-batch accuracy while training with large batch sizes. When training GoogLeNet and Network-in-Network on ImageNet, we achieve a 47x and 39x speedup, respectively, when training on a cluster of 128 GPUs.
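A toy illustration of why the abstract prefers reduction trees over a central parameter server for gradient aggregation: with P workers, a parameter server serializes O(P) transfers through one node, while a binary reduction tree needs only O(log2 P) sequential steps (plus a similar broadcast back down). The numbers and the simplified cost model below are assumptions, not measurements from the paper.

```python
import numpy as np

def tree_allreduce(grads):
    """Sum gradients pairwise up a binary tree; return the total and the tree depth."""
    level = [g.copy() for g in grads]
    steps = 0
    while len(level) > 1:                           # each loop iteration = one parallel comm step
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps

P = 128
grads = [np.full(4, 1.0) for _ in range(P)]         # 128 workers, tiny gradient vectors
total, depth = tree_allreduce(grads)
print(total)                                        # [128. 128. 128. 128.]
print("tree depth:", depth, "vs parameter-server steps:", P)   # 7 vs 128
```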

Journal ArticleDOI
TL;DR: A novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification that enables prototype reduction algorithms to be applied over big data classification problems without significant accuracy loss and is a suitable tool to enhance the performance of the nearest neighbor classifier with big data.