Journal ArticleDOI

Distributed Parallel PCA for Modeling and Monitoring of Large-Scale Plant-Wide Processes With Big Data

26 Jan 2017-IEEE Transactions on Industrial Informatics (IEEE)-Vol. 13, Iss: 4, pp 1877-1885
TL;DR: A systematic fault detection and isolation scheme is designed so that the whole large-scale process can be hierarchically monitored at the plant-wide level, the unit block level, and the variable level; the effectiveness of the proposed method is evaluated on the Tennessee Eastman benchmark process.
Abstract: In order to deal with the modeling and monitoring issue of large-scale industrial processes with big data, a distributed and parallel designed principal component analysis approach is proposed. To handle the high-dimensional process variables, the large-scale process is first decomposed into distributed blocks with a priori process knowledge. Afterward, in order to solve the modeling issue with large-scale data chunks in each block, a distributed and parallel data processing strategy is proposed based on the framework of MapReduce and then principal components are further extracted for each distributed block. With all these steps, statistical modeling of large-scale processes with big data can be established. Finally, a systematic fault detection and isolation scheme is designed so that the whole large-scale process can be hierarchically monitored from the plant-wide level, unit block level, and variable level. The effectiveness of the proposed method is evaluated through the Tennessee Eastman benchmark process.
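The pipeline described above can be summarized in a short sketch. The following Python snippet is a minimal illustration, not the paper's implementation: the three-block variable grouping, the stand-in data, and the fixed control limits are all invented (real limits would be derived from F- and chi-squared approximations). It fits one PCA model per distributed block and fuses the block-level T² and SPE statistics into a plant-wide alarm:

```python
import numpy as np

def pca_model(X, var_keep=0.90):
    """Fit a PCA monitoring model on one block of training data."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd
    w, V = np.linalg.eigh(np.cov(Xs, rowvar=False))
    w, V = w[::-1], V[:, ::-1]                      # descending eigenvalues
    k = int(np.searchsorted(np.cumsum(w) / w.sum(), var_keep)) + 1
    return {"mu": mu, "sd": sd, "P": V[:, :k], "lam": w[:k]}

def t2_spe(m, x):
    """Block-level T^2 and SPE statistics for one new sample."""
    xs = (x - m["mu"]) / m["sd"]
    t = m["P"].T @ xs
    return float(t @ (t / m["lam"])), float(xs @ xs - t @ t)

# hypothetical decomposition of 12 variables into three unit blocks
blocks = {"reactor": [0, 1, 2, 3],
          "separator": [4, 5, 6, 7],
          "stripper": [8, 9, 10, 11]}

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 12))                # stand-in training data
models = {b: pca_model(X_train[:, v]) for b, v in blocks.items()}

# hierarchical monitoring: variable blocks -> block statistics -> plant alarm
x_new = rng.normal(size=12)
stats = {b: t2_spe(models[b], x_new[v]) for b, v in blocks.items()}
plant_alarm = any(t2 > 20.0 or spe > 15.0 for t2, spe in stats.values())
```

The MapReduce side of the method, i.e., computing each block's statistics from distributed data chunks, is sketched under the MapReduce reference below.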
Citations
Journal ArticleDOI
Zhiqiang Ge
TL;DR: A systematic review of data-driven modeling and monitoring for plant-wide processes is presented, in which the authors provide an overview of state-of-the-art data processing and modeling procedures for plant-wide process monitoring.

462 citations

Journal ArticleDOI
Le Yao, Zhiqiang Ge
TL;DR: The proposed semisupervised HELM method is applied in a high–low transformer to estimate the carbon monoxide content, which shows a significant improvement of the prediction accuracy, compared to traditional methods.
Abstract: Data-driven soft sensors have been widely used in industrial processes to estimate critical quality variables that are difficult to measure online with physical devices. Because of the low sampling rate of quality variables, most soft sensors are developed on a small number of labeled samples, while the large amount of unlabeled process data is discarded. This loss of information greatly limits the achievable prediction accuracy, so a central issue in data-driven soft sensing is to fully exploit the information contained in all available process data. This paper proposes a semisupervised deep learning model for soft sensor development based on the hierarchical extreme learning machine (HELM). First, a deep network of autoencoders performs unsupervised feature extraction on all the process samples. Then, an extreme learning machine carries out the regression by appending the quality variable. Meanwhile, manifold regularization is introduced for semisupervised model training. The new method not only extracts deep information from the data but also learns from the extra unlabeled samples. The proposed semisupervised HELM method is applied to a high-low transformer to estimate the carbon monoxide content and shows a significant improvement in prediction accuracy compared with traditional methods.
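A compact sketch of the idea, under loose assumptions: the snippet below uses fixed random-feature layers as a stand-in for HELM's trained ELM autoencoders, and a Laplacian-regularized ELM output layer in the spirit of semisupervised ELM training. All sizes, data, and hyperparameters (C, lam, k) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_layer(X, n_hidden):
    """One fixed random-feature layer (stand-in for a trained ELM autoencoder)."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return np.tanh(X @ W + b)

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetric k-NN graph over all samples."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]          # skip self (column 0)
    A = np.zeros_like(D2)
    A[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    A = np.maximum(A, A.T)                           # symmetrize
    return np.diag(A.sum(axis=1)) - A

# toy data: n process samples, only the first n_lab have quality labels
n, n_lab, d = 200, 30, 10
X = rng.normal(size=(n, d))
y = X[:n_lab] @ rng.normal(size=d)                   # hypothetical quality variable

# 1) unsupervised deep feature extraction on ALL samples
H = elm_layer(elm_layer(X, 64), 32)

# 2) Laplacian-regularized ELM output layer (semisupervised training)
J = np.diag((np.arange(n) < n_lab).astype(float))    # labeled-sample indicator
Y = np.zeros((n, 1)); Y[:n_lab, 0] = y
L = knn_laplacian(X)
C, lam = 1.0, 0.01                                   # illustrative hyperparameters
beta = np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ J @ H + lam * H.T @ L @ H,
                       H.T @ J @ Y)
y_hat = H @ beta                                     # predictions for all samples
```

The Laplacian term penalizes predictions that differ between neighboring samples, which is how the unlabeled data shapes the regression.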

222 citations

Journal ArticleDOI
TL;DR: The key idea of DMSPPM is to first decompose a plant-wide process into multiple subprocesses and then establish a data-driven model for monitoring the process; the process variable decomposition is important for guaranteeing the monitoring performance.
Abstract: Process monitoring is crucial for maintaining favorable operating conditions and has received considerable attention in recent decades. A modern plant-wide process generally consists of multiple operational units and a large number of measured variables, and the complex correlation among these variables and units makes monitoring such plant-wide processes imperative but challenging. With the rapid advancement of industrial sensing techniques, process data carrying meaningful process information are now routinely collected, and data-driven multivariate statistical plant-wide process monitoring (DMSPPM) has become popular. The key idea of DMSPPM is to first decompose a plant-wide process into multiple subprocesses and then establish a data-driven model for monitoring the process, in which the process variable decomposition is important for guaranteeing the monitoring performance. In the current review, we first introduce the basics of multivariate statistical process monitoring and highlight the necessity of des...

206 citations
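Variable-level isolation, the last level of such hierarchical schemes, is commonly done with contribution analysis. A minimal sketch (toy data, an injected bias fault, SPE contributions only; T² contributions and proper control limits are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                 # training data for one subprocess
mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd((X - mu) / sd, full_matrices=False)
P = Vt[:3].T                                  # loadings, 3 PCs retained

x = rng.normal(size=8)
x[5] += 6.0                                   # inject a bias fault on variable 5
xs = (x - mu) / sd
e = xs - P @ (P.T @ xs)                       # residual (SPE) part of the sample
contrib = e ** 2                              # per-variable SPE contributions
suspect = int(np.argmax(contrib))             # typically flags variable 5
```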

Journal ArticleDOI
Zhiqiang Ge
TL;DR: A tutorial review of probabilistic latent variable models (PLVMs) for process data analytics, with detailed illustrations of the basic kinds of PLVMs and their research status.
Abstract: Dimensionality reduction is important given the high-dimensional nature of data in the process industry, which has made latent variable modeling methods popular in recent years. By projecting high-di...

185 citations
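As a concrete example of a basic PLVM, probabilistic PCA has a closed-form maximum-likelihood solution (Tipping and Bishop). A minimal numpy sketch with toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                   # toy high-dimensional data
q = 3                                            # latent dimension

Xc = X - X.mean(axis=0)
lam, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
lam, U = lam[::-1], U[:, ::-1]                   # descending eigenvalues

# Tipping & Bishop closed-form ML estimates for probabilistic PCA
sigma2 = lam[q:].mean()                          # noise variance = mean discarded variance
W = U[:, :q] * np.sqrt(lam[:q] - sigma2)         # loading matrix (rotation R = I)

# posterior latent means E[z | x] = M^{-1} W^T (x - mu)
M = W.T @ W + sigma2 * np.eye(q)
Z = np.linalg.solve(M, W.T @ Xc.T).T
```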

Journal ArticleDOI
TL;DR: A systematic review of state-of-the-art data preprocessing techniques and robust principal component analysis methods for process understanding and monitoring applications; big data perspectives on potential challenges and opportunities are also highlighted.

176 citations

References
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
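In the context of the citing paper, the map/reduce pattern is a natural fit for PCA: mappers emit the sufficient statistics (n, Σx, XᵀX) of their data chunks, and an associative reduce merges them, after which the covariance and principal components follow from a single eigendecomposition. A minimal single-machine sketch of that pattern (toy data; a real deployment would distribute the chunks across a cluster):

```python
import numpy as np
from functools import reduce

def map_stats(chunk):
    """map: one data chunk -> partial sufficient statistics (n, sum x, X^T X)."""
    X = np.asarray(chunk)
    return X.shape[0], X.sum(axis=0), X.T @ X

def reduce_stats(a, b):
    """reduce: merge partial statistics (associative and commutative)."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

# toy "distributed" dataset: four chunks of a 1000 x 5 data matrix
rng = np.random.default_rng(1)
chunks = np.array_split(rng.normal(size=(1000, 5)), 4)

n, s, G = reduce(reduce_stats, map(map_stats, chunks))
mean = s / n
cov = (G - n * np.outer(mean, mean)) / (n - 1)    # pooled sample covariance

# principal components then follow from one eigendecomposition of `cov`
eigvals, eigvecs = np.linalg.eigh(cov)
```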

20,309 citations

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Book
29 May 2009
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Abstract: Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop's database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data, whether it's gigabytes or petabytes, Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

3,797 citations


"Distributed Parallel PCA for Modeli..." refers methods in this paper

  • ...Meanwhile, the runtime system based on HDFS is designed with implicit mechanisms and can automatically deal with data splitting, parallel task scheduling/monitoring, and parallel compute-node communication management, while also providing data redundancy and fault-tolerance mechanisms [25]....

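For reference, a Hadoop Streaming job expressed as two small Python scripts illustrates the mapper/reducer contract the book describes (word count; Hadoop sorts the mapper output by key before it reaches the reducer). The file names and invocation are illustrative:

```python
#!/usr/bin/env python3
# mapper.py -- emits one (word, 1) pair per token on stdout
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- input arrives grouped/sorted by key; sum the counts per word
import sys

current, count = None, 0
for line in sys.stdin:
    word, n = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")
```

With Hadoop installed, such a job is typically launched through the streaming jar (its path varies by distribution), along the lines of: hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <in> -output <out>.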

Journal ArticleDOI
TL;DR: This article presents a model of an industrial chemical process for the purpose of developing, studying, and evaluating process control technology; the model is well suited to a wide variety of studies, including both plant-wide control and multivariable control problems.

2,603 citations


"Distributed Parallel PCA for Modeli..." refers background or methods in this paper

  • ...In this section, the effectiveness of the proposed method is investigated on the plant-wide TE process [27]....


  • ...The process working flowchart can be found in the corresponding literature [27]....


Book
23 Feb 2008
TL;DR: This book introduces basic model-based FDI schemes, advanced analysis and design algorithms, and the necessary mathematical and control theory tools, at a level suitable for graduate students and researchers as well as for engineers.
Abstract: A critical and important issue surrounding the design of automatic control systems of steadily increasing complexity is guaranteeing high system performance over a wide operating range while meeting requirements on system reliability and dependability. As one of the key technologies for solving this problem, advanced fault detection and identification (FDI) technology is receiving considerable attention. The objective of this book is to introduce basic model-based FDI schemes, advanced analysis and design algorithms, and the needed mathematical and control theory tools, at a level suitable for graduate students and researchers as well as for engineers.

2,088 citations


"Distributed Parallel PCA for Modeli..." refers methods in this paper

  • ...For small-scale systems with explicit process details, traditional model-based first-principles methods could be preferred [4]....

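The core of such model-based FDI schemes is observer-based residual generation: an observer predicts the plant output, and the residual between measured and predicted output is thresholded. A minimal sketch on an invented two-state plant with a sensor bias fault (matrices, gain, and threshold are all illustrative):

```python
import numpy as np

# discrete-time plant x+ = A x + B u, y = C x, with a sensor bias fault at k = 60
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])
Lg = np.array([0.5, 0.3])          # observer gain, chosen so A - Lg C is stable

rng = np.random.default_rng(0)
x = np.zeros(2)                    # true plant state
xh = np.zeros(2)                   # observer state estimate
residuals = []
for k in range(100):
    u = np.sin(0.1 * k)
    y = float(C @ x) + 0.01 * rng.normal()
    if k >= 60:
        y += 2.0                   # additive sensor fault
    r = y - float(C @ xh)          # residual: measured minus predicted output
    residuals.append(r)
    x = A @ x + B * u
    xh = A @ xh + B * u + Lg * r   # Luenberger observer update

alarms = [abs(r) > 0.2 for r in residuals]   # illustrative fixed threshold
```

Fault-free, the residual stays at noise level; after k = 60 it jumps and settles at an elevated value, which the threshold test flags.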