Journal ArticleDOI

Distributed Parallel PCA for Modeling and Monitoring of Large-Scale Plant-Wide Processes With Big Data

26 Jan 2017-IEEE Transactions on Industrial Informatics (IEEE)-Vol. 13, Iss: 4, pp 1877-1885
TL;DR: A systematic fault detection and isolation scheme is designed so that the whole large-scale process can be hierarchically monitored at the plant-wide level, the unit block level, and the variable level; the effectiveness of the proposed method is evaluated on the Tennessee Eastman benchmark process.
Abstract: In order to deal with the modeling and monitoring issue of large-scale industrial processes with big data, a distributed and parallel designed principal component analysis approach is proposed. To handle the high-dimensional process variables, the large-scale process is first decomposed into distributed blocks with a priori process knowledge. Afterward, in order to solve the modeling issue with large-scale data chunks in each block, a distributed and parallel data processing strategy is proposed based on the framework of MapReduce and then principal components are further extracted for each distributed block. With all these steps, statistical modeling of large-scale processes with big data can be established. Finally, a systematic fault detection and isolation scheme is designed so that the whole large-scale process can be hierarchically monitored from the plant-wide level, unit block level, and variable level. The effectiveness of the proposed method is evaluated through the Tennessee Eastman benchmark process.
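The pipeline described above can be summarized in a short sketch. The following Python snippet is a minimal illustration, not the paper's implementation: the three-block variable grouping, the stand-in data, and the fixed control limits are all invented (real limits would be derived from F- and chi-squared approximations). It fits one PCA model per distributed block and fuses the block-level T² and SPE statistics into a plant-wide alarm:

```python
import numpy as np

def pca_model(X, var_keep=0.90):
    """Fit a PCA monitoring model on one block of training data."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd
    w, V = np.linalg.eigh(np.cov(Xs, rowvar=False))
    w, V = w[::-1], V[:, ::-1]                      # descending eigenvalues
    k = int(np.searchsorted(np.cumsum(w) / w.sum(), var_keep)) + 1
    return {"mu": mu, "sd": sd, "P": V[:, :k], "lam": w[:k]}

def t2_spe(m, x):
    """Block-level T^2 and SPE statistics for one new sample."""
    xs = (x - m["mu"]) / m["sd"]
    t = m["P"].T @ xs
    return float(t @ (t / m["lam"])), float(xs @ xs - t @ t)

# hypothetical decomposition of 12 variables into three unit blocks
blocks = {"reactor": [0, 1, 2, 3],
          "separator": [4, 5, 6, 7],
          "stripper": [8, 9, 10, 11]}

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 12))                # stand-in training data
models = {b: pca_model(X_train[:, v]) for b, v in blocks.items()}

# hierarchical monitoring: variable blocks -> block statistics -> plant alarm
x_new = rng.normal(size=12)
stats = {b: t2_spe(models[b], x_new[v]) for b, v in blocks.items()}
plant_alarm = any(t2 > 20.0 or spe > 15.0 for t2, spe in stats.values())
```

The MapReduce side of the method, i.e., computing each block's statistics from distributed data chunks, is sketched under the MapReduce reference below.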
Citations
Journal ArticleDOI
Zhiqiang Ge
TL;DR: A systematic review of data-driven modeling and monitoring for plant-wide processes is presented, in which the authors provide an overview of state-of-the-art data processing and modeling procedures for plant-wide process monitoring.

462 citations

Journal ArticleDOI
Le Yao, Zhiqiang Ge
TL;DR: The proposed semisupervised HELM method is applied in a high–low transformer to estimate the carbon monoxide content, which shows a significant improvement of the prediction accuracy, compared to traditional methods.
Abstract: Data-driven soft sensors have been widely used in industrial processes to estimate critical quality variables that are difficult to measure online with physical devices. Because of the low sampling rate of quality variables, most soft sensors are developed on a small number of labeled samples, while the large amount of unlabeled process data is discarded. This loss of information greatly limits the achievable prediction accuracy, so a central issue in data-driven soft sensing is to fully exploit the information contained in all available process data. This paper proposes a semisupervised deep learning model for soft sensor development based on the hierarchical extreme learning machine (HELM). First, a deep network of autoencoders performs unsupervised feature extraction on all the process samples. Then, an extreme learning machine carries out the regression by appending the quality variable. Meanwhile, manifold regularization is introduced for semisupervised model training. The new method not only extracts deep information from the data but also learns from the extra unlabeled samples. The proposed semisupervised HELM method is applied to a high-low transformer to estimate the carbon monoxide content and shows a significant improvement in prediction accuracy compared with traditional methods.
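A compact sketch of the idea, under loose assumptions: the snippet below uses fixed random-feature layers as a stand-in for HELM's trained ELM autoencoders, and a Laplacian-regularized ELM output layer in the spirit of semisupervised ELM training. All sizes, data, and hyperparameters (C, lam, k) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_layer(X, n_hidden):
    """One fixed random-feature layer (stand-in for a trained ELM autoencoder)."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return np.tanh(X @ W + b)

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetric k-NN graph over all samples."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]          # skip self (column 0)
    A = np.zeros_like(D2)
    A[np.repeat(np.arange(len(X)), k), nn.ravel()] = 1.0
    A = np.maximum(A, A.T)                           # symmetrize
    return np.diag(A.sum(axis=1)) - A

# toy data: n process samples, only the first n_lab have quality labels
n, n_lab, d = 200, 30, 10
X = rng.normal(size=(n, d))
y = X[:n_lab] @ rng.normal(size=d)                   # hypothetical quality variable

# 1) unsupervised deep feature extraction on ALL samples
H = elm_layer(elm_layer(X, 64), 32)

# 2) Laplacian-regularized ELM output layer (semisupervised training)
J = np.diag((np.arange(n) < n_lab).astype(float))    # labeled-sample indicator
Y = np.zeros((n, 1)); Y[:n_lab, 0] = y
L = knn_laplacian(X)
C, lam = 1.0, 0.01                                   # illustrative hyperparameters
beta = np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ J @ H + lam * H.T @ L @ H,
                       H.T @ J @ Y)
y_hat = H @ beta                                     # predictions for all samples
```

The Laplacian term penalizes predictions that differ between neighboring samples, which is how the unlabeled data shapes the regression.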

222 citations

Journal ArticleDOI
TL;DR: The key idea of DMSPPM is to first decompose a plant-wide process into multiple subprocesses and then establish a data-driven model for monitoring the process; the process variable decomposition is important for guaranteeing the monitoring performance.
Abstract: Process monitoring is crucial for maintaining favorable operating conditions and has received considerable attention in recent decades. A modern plant-wide process generally consists of multiple operational units and a large number of measured variables, and the complex correlation among these variables and units makes monitoring such plant-wide processes imperative but challenging. With the rapid advancement of industrial sensing techniques, process data carrying meaningful process information are now routinely collected, and data-driven multivariate statistical plant-wide process monitoring (DMSPPM) has become popular. The key idea of DMSPPM is to first decompose a plant-wide process into multiple subprocesses and then establish a data-driven model for monitoring the process, in which the process variable decomposition is important for guaranteeing the monitoring performance. In the current review, we first introduce the basics of multivariate statistical process monitoring and highlight the necessity of des...

206 citations
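Variable-level isolation, the last level of such hierarchical schemes, is commonly done with contribution analysis. A minimal sketch (toy data, an injected bias fault, SPE contributions only; T² contributions and proper control limits are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                 # training data for one subprocess
mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd((X - mu) / sd, full_matrices=False)
P = Vt[:3].T                                  # loadings, 3 PCs retained

x = rng.normal(size=8)
x[5] += 6.0                                   # inject a bias fault on variable 5
xs = (x - mu) / sd
e = xs - P @ (P.T @ xs)                       # residual (SPE) part of the sample
contrib = e ** 2                              # per-variable SPE contributions
suspect = int(np.argmax(contrib))             # typically flags variable 5
```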

Journal ArticleDOI
Zhiqiang Ge
TL;DR: A tutorial review of probabilistic latent variable models (PLVMs) for process data analytics, with detailed illustrations of the basic kinds of PLVMs and their research status.
Abstract: Dimensionality reduction is important given the high-dimensional nature of data in the process industry, which has made latent variable modeling methods popular in recent years. By projecting high-di...

185 citations
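As a concrete example of a basic PLVM, probabilistic PCA has a closed-form maximum-likelihood solution (Tipping and Bishop). A minimal numpy sketch with toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                   # toy high-dimensional data
q = 3                                            # latent dimension

Xc = X - X.mean(axis=0)
lam, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
lam, U = lam[::-1], U[:, ::-1]                   # descending eigenvalues

# Tipping & Bishop closed-form ML estimates for probabilistic PCA
sigma2 = lam[q:].mean()                          # noise variance = mean discarded variance
W = U[:, :q] * np.sqrt(lam[:q] - sigma2)         # loading matrix (rotation R = I)

# posterior latent means E[z | x] = M^{-1} W^T (x - mu)
M = W.T @ W + sigma2 * np.eye(q)
Z = np.linalg.solve(M, W.T @ Xc.T).T
```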

Journal ArticleDOI
TL;DR: A systematic review of state-of-the-art data preprocessing techniques and robust principal component analysis methods for process understanding and monitoring applications; big data perspectives on potential challenges and opportunities are also highlighted.

176 citations

References
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets, which runs on large clusters of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
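In the context of the citing paper, the map/reduce pattern is a natural fit for PCA: mappers emit the sufficient statistics (n, Σx, XᵀX) of their data chunks, and an associative reduce merges them, after which the covariance and principal components follow from a single eigendecomposition. A minimal single-machine sketch of that pattern (toy data; a real deployment would distribute the chunks across a cluster):

```python
import numpy as np
from functools import reduce

def map_stats(chunk):
    """map: one data chunk -> partial sufficient statistics (n, sum x, X^T X)."""
    X = np.asarray(chunk)
    return X.shape[0], X.sum(axis=0), X.T @ X

def reduce_stats(a, b):
    """reduce: merge partial statistics (associative and commutative)."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

# toy "distributed" dataset: four chunks of a 1000 x 5 data matrix
rng = np.random.default_rng(1)
chunks = np.array_split(rng.normal(size=(1000, 5)), 4)

n, s, G = reduce(reduce_stats, map(map_stats, chunks))
mean = s / n
cov = (G - n * np.outer(mean, mean)) / (n - 1)    # pooled sample covariance

# principal components then follow from one eigendecomposition of `cov`
eigvals, eigvecs = np.linalg.eigh(cov)
```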

20,309 citations

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Book
29 May 2009
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Abstract: Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop's database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data, whether it's gigabytes or petabytes, Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master, not only of the technology but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

3,797 citations


"Distributed Parallel PCA for Modeli..." refers methods in this paper

  • ...Meanwhile, the runtime system based on HDFS is designed with implicit mechanisms and can automatically deal with data splitting, parallel task scheduling/monitoring, and parallel compute-node communication management, while also providing data redundancy and fault-tolerance mechanisms [25]....

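For reference, a Hadoop Streaming job expressed as two small Python scripts illustrates the mapper/reducer contract the book describes (word count; Hadoop sorts the mapper output by key before it reaches the reducer). The file names and invocation are illustrative:

```python
#!/usr/bin/env python3
# mapper.py -- emits one (word, 1) pair per token on stdout
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- input arrives grouped/sorted by key; sum the counts per word
import sys

current, count = None, 0
for line in sys.stdin:
    word, n = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")
```

With Hadoop installed, such a job is typically launched through the streaming jar (its path varies by distribution), along the lines of: hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <in> -output <out>.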

Journal ArticleDOI
TL;DR: This article presents a model of an industrial chemical process for the purpose of developing, studying, and evaluating process control technology; the model is well suited to a wide variety of studies, including both plant-wide control and multivariable control problems.

2,603 citations


"Distributed Parallel PCA for Modeli..." refers background or methods in this paper

  • ...In this section, the effectiveness of the proposed method is investigated on the plant-wide TE process [27]....


  • ...The process working flowchart can be found in the corresponding literature [27]....


Book
23 Feb 2008
TL;DR: This book introduces basic model-based FDI schemes, advanced analysis and design algorithms, and the necessary mathematical and control theory tools, at a level suitable for graduate students and researchers as well as for engineers.
Abstract: A critical and important issue surrounding the design of automatic control systems of steadily increasing complexity is guaranteeing high system performance over a wide operating range while meeting requirements on system reliability and dependability. As one of the key technologies for solving this problem, advanced fault detection and identification (FDI) technology is receiving considerable attention. The objective of this book is to introduce basic model-based FDI schemes, advanced analysis and design algorithms, and the needed mathematical and control theory tools, at a level suitable for graduate students and researchers as well as for engineers.

2,088 citations


"Distributed Parallel PCA for Modeli..." refers methods in this paper

  • ...For small-scale systems with explicit process details, traditional model-based first-principles methods could be preferred [4]....

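The core of such model-based FDI schemes is observer-based residual generation: an observer predicts the plant output, and the residual between measured and predicted output is thresholded. A minimal sketch on an invented two-state plant with a sensor bias fault (matrices, gain, and threshold are all illustrative):

```python
import numpy as np

# discrete-time plant x+ = A x + B u, y = C x, with a sensor bias fault at k = 60
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])
Lg = np.array([0.5, 0.3])          # observer gain, chosen so A - Lg C is stable

rng = np.random.default_rng(0)
x = np.zeros(2)                    # true plant state
xh = np.zeros(2)                   # observer state estimate
residuals = []
for k in range(100):
    u = np.sin(0.1 * k)
    y = float(C @ x) + 0.01 * rng.normal()
    if k >= 60:
        y += 2.0                   # additive sensor fault
    r = y - float(C @ xh)          # residual: measured minus predicted output
    residuals.append(r)
    x = A @ x + B * u
    xh = A @ xh + B * u + Lg * r   # Luenberger observer update

alarms = [abs(r) > 0.2 for r in residuals]   # illustrative fixed threshold
```

Fault-free, the residual stays at noise level; after k = 60 it jumps and settles at an elevated value, which the threshold test flags.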