Journal ArticleDOI

Identification source of variation on regional impact of air quality pattern using chemometric

TL;DR: In this paper, hierarchical agglomerative cluster analysis (HACA), discriminant analysis (DA), principal component analysis (PCA), factor analysis (FA), and multiple linear regression (MLR) were applied to assess air quality data and to recognize air pollution source patterns.
Abstract: This study shows the effectiveness of hierarchical agglomerative cluster analysis (HACA), discriminant analysis (DA), principal component analysis (PCA), factor analysis (FA) and multiple linear regression (MLR) for assessing air quality data and recognizing air pollution source patterns. Air quality data for 12 months (January–December) of 2007 from 14 stations around Peninsular Malaysia, covering 14 parameters (168 datasets), were used. HACA generated three significant clusters: a low pollution source (LPS) region, a moderate pollution source (MPS) region, and a slightly high pollution source (SHPS) region. Forward stepwise DA discriminated 8 variables, whereas backward stepwise DA discriminated 9 of the 14 variables. PCA and FA identified 8 pollutants each in the LPS and SHPS regions and 11 pollutants in the MPS region, most of which are expected to derive from industrial activities, transportation and agricultural systems. Four MLR models show that PM10 is the primary pollutant in Malaysia. The study indicates that chemometric techniques can disclose meaningful information on the spatial variability of large and complex air quality data sets. This can give a clearer view of air quality and support a novel design of the air quality monitoring network for better management of air pollution.

Citations
Journal ArticleDOI
TL;DR: The results showed that clustering based on topological features via the improved HACA approach grouped the months with severe haze correctly, whereas clustering without such features did not, and these results were consistent for all three locations.
Abstract: Severe haze episodes have periodically occurred in Southeast Asia, with Malaysia in particular suffering adverse effects. Cluster analysis has been used to analyze these occurrences. Traditional cluster analysis, in particular hierarchical agglomerative cluster analysis (HACA), is applied directly to the data sets, which may contain hidden patterns that can be explored. In this paper, this underlying information was captured via persistent homology, a topological data analysis (TDA) tool that extracts topological features including components, holes, and cavities in the data sets. In particular, an improved version of HACA was proposed by combining HACA and persistent homology. Additionally, a comparative study between traditional HACA and improved HACA was done using particulate matter data, particulate matter being the major pollutant found during haze episodes by the Klang, Petaling Jaya, and Shah Alam air quality monitoring stations. The effectiveness of the two clustering approaches was evaluated based on their ability to cluster the months according to the haze condition. The results showed that clustering based on topological features via the improved HACA approach grouped the months with severe haze correctly, whereas clustering without such features did not, and these results were consistent for all three locations.

7 citations

Journal ArticleDOI
TL;DR: In this article, the spatial distribution of the Social Support Index (SSI) among drug-abuse inmates throughout Peninsular Malaysia was identified using Factor Analysis (FA) and Discriminant Analysis (DA).
Abstract: This study aimed to identify the spatial distribution of the Social Support Index (SSI) among drug-abuse inmates throughout Peninsular Malaysia. Factor Analysis (FA) and Discriminant Analysis (DA) were applied to analyse the level of social support (SS) among drug-abuse inmates and to develop the spatial model using a Geographic Information System (GIS). FA generated five significant index categories (excellent, good, moderate, low and poor Quality of Life Index (QoLi)), and the nine SS variables are expected to derive from family, friends and other social factors. DA showed that each category differed from the others in composition; the backward and forward stepwise modes gave 99.75% correct classification. GIS analysis shows that the SSI for the family and friends factors was moderate across the areas the prisoners came from. In addition, Perlis was classified as a low-level index and Melaka as a high-level index for the other social factor. The distribution model of SSI at the moderate level showed Jelebu, Sungai Petani, Pengkalan Chepa and Simpang Renggang as having a better SS factor for quality of life than Penor, Pahang. The FA, DA and GIS procedures used in this study established the source apportionment of SS and QoLi among drug-abuse inmates in Peninsular Malaysian prisons.

7 citations


Cites background from "Identification source of variation ..."

  • ...49) are considered as “weak” factor loadings [39]....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the differences between the topological features of months with and those without haze episodes observed at air quality monitoring stations located in the areas of Jerantut, Klang, Petaling Jaya and Shah Alam were investigated.
Abstract: Haze is one of the major environmental issues that have continuously vexed countries worldwide, including Malaysia, for the last three decades. Therefore, this study aims to investigate the differences between the topological features of months with and those without haze episodes observed at air quality monitoring stations located in the areas of Jerantut, Klang, Petaling Jaya and Shah Alam. We employ persistent homology, which is a method of topological data analysis (TDA) that focuses on connected components and holes in the data, to characterize the local particulate matter (PM10). The summary statistics reveal drastic changes in the lifetimes of the topological data from every station during haze episodes, highlighting the possibility of developing an early detection system for haze based on our approach.

7 citations


Cites methods from "Identification source of variation ..."

  • ...Similarly, other methods like chemometric analysis (Azid et al., 2015),...

    [...]

  • ...Similarly, other methods like chemometric analysis (Azid et al., 2015), fuzzy comprehensive evaluation method (Zhao et al., 2010) and chaotic approach (Hamid and Noorani, 2014) have also been utilized in assessing data on air pollutants....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the authors identified the significant variables and verified the best statistical method for determining the effect of indoor air quality (IAQ) at 7 different locations in Universiti Sultan Zainal Abidin, Terengganu, Malaysia.
Abstract: The objectives of this study are to identify the significant variables and to verify the best statistical method for determining the effect of indoor air quality (IAQ) at 7 different locations in Universiti Sultan Zainal Abidin, Terengganu, Malaysia. The IAQ data were collected using in-situ measurement. Principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), linear discriminant analysis (LDA), and agglomerative hierarchical clustering (AHC) were used to classify the significant variables as well as to compare the best method for determining IAQ levels. PCA identifies only 4 of the 9 parameters (PM10, PM2.5, PM1.0, and O3) as the significant variables in IAQ. The PLS-DA model correctly classifies 89.05% of the IAQ variables in each station, compared with only 66.67% for LDA. AHC identifies three cluster groups: highly polluted concentration (HPC), moderately polluted concentration (MPC), and low-polluted concentration (LPC) areas. PLS-DA verifies the groups produced by AHC by identifying the variables that affect the quality at each station without being affected by redundancy. In conclusion, PLS-DA is a promising procedure for differentiating the group classes and determining the correct percentage of variables for IAQ.

6 citations


Cites methods from "Identification source of variation ..."

  • ...AHC were performed based on the normal distribution of datasets through Ward’s method by means of Euclidean distances, as a measure of the connection between the datasets or variables [24]....

    [...]
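The Euclidean distance mentioned in the excerpt above can be sketched in a few lines. This is only an illustration, not the cited authors' code: the function names `euclidean` and `distance_matrix` and the two-row sample are hypothetical, and the rows stand in for (ideally standardized) station measurements.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two equal-length measurement vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(rows):
    """Symmetric matrix of pairwise Euclidean distances between rows."""
    n = len(rows)
    return [[euclidean(rows[i], rows[j]) for j in range(n)] for i in range(n)]

# Hypothetical sample: two observations with two parameters each.
sample = [(0.0, 0.0), (3.0, 4.0)]
matrix = distance_matrix(sample)
```

A hierarchical clustering routine would take such a matrix (or the raw rows) as its starting point.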

Journal ArticleDOI
TL;DR: In this paper, the authors examined the effects of population growth on economic development in a developed and a developing country, namely Singapore and Malaysia, and reported a strong relationship between the two, with population growth depending on economic growth.
Abstract: This research examines the effects of population growth on economic development in a developed and a developing country, namely Singapore and Malaysia. Many previous studies have sought to gauge the effects or impact of population growth on economic development. It has been said that there is a strong relationship between population growth and economic development, in that the growth of the population depends on economic growth. Singapore is well known worldwide as a highly developed free-market economy. The economy of Singapore has been ranked as the most open in the world and the most pro-business. The population of the country was recently estimated at 5.5 million. Malaysia is known as one of the most competitive developing countries and is ranked as the 5th largest in South Asia. The population of Malaysia is estimated at 31.63 million.

5 citations

References
Journal ArticleDOI
TL;DR: In this paper, a procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical.
Abstract: A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical. Given n sets, this procedure permits their reduction to n − 1 mutually exclusive sets by considering the union of all possible n(n − 1)/2 pairs and selecting a union having a maximal value for the functional relation, or objective function, that reflects the criterion chosen by the investigator. By repeating this process until only one group remains, the complete hierarchical structure and a quantitative estimate of the loss associated with each stage in the grouping can be obtained. A general flowchart helpful in computer programming and a numerical example are included.

17,405 citations


"Identification source of variation ..." refers methods in this paper

  • ...Analysis of variance (ANOVA) is used to analyse the distances between clusters in Ward’s method, which is established to minimize the total of squares of any two achievable clusters at every step (Ward, 1963)....

    [...]
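The stepwise merging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Ward's original program or the citing paper's implementation: it assumes the error-sum-of-squares objective, so the cost of merging clusters A and B is |A||B|/(|A|+|B|) times the squared distance between their centroids, and the names `centroid`, `ward_cost` and `ward_merges` are hypothetical.

```python
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_merges(points):
    """Greedy agglomeration; returns a list of (members_a, members_b, cost)."""
    clusters = [[i] for i in range(len(points))]  # clusters hold point indices
    history = []

    def cost_of(pair):
        i, j = pair
        return ward_cost([points[k] for k in clusters[i]],
                         [points[k] for k in clusters[j]])

    while len(clusters) > 1:
        # Examine every pair of current clusters and keep the cheapest merge.
        i, j = min(combinations(range(len(clusters)), 2), key=cost_of)
        history.append((clusters[i], clusters[j], cost_of((i, j))))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return history

# Hypothetical data: two tight pairs of points far apart from each other.
merges = ward_merges([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)])
```

Each entry of `merges` records one stage of the hierarchy together with the loss incurred at that stage, which is the information a dendrogram displays.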

Book
01 Jan 1982
TL;DR: In this article, the authors present an overview of the basic concepts of multivariate analysis, including matrix algebra and random vectors, as well as a strategy for analyzing multivariate models.
Abstract: (NOTE: Each chapter begins with an Introduction, and concludes with Exercises and References.) I. GETTING STARTED. 1. Aspects of Multivariate Analysis. Applications of Multivariate Techniques. The Organization of Data. Data Displays and Pictorial Representations. Distance. Final Comments. 2. Matrix Algebra and Random Vectors. Some Basics of Matrix and Vector Algebra. Positive Definite Matrices. A Square-Root Matrix. Random Vectors and Matrices. Mean Vectors and Covariance Matrices. Matrix Inequalities and Maximization. Supplement 2A Vectors and Matrices: Basic Concepts. 3. Sample Geometry and Random Sampling. The Geometry of the Sample. Random Samples and the Expected Values of the Sample Mean and Covariance Matrix. Generalized Variance. Sample Mean, Covariance, and Correlation as Matrix Operations. Sample Values of Linear Combinations of Variables. 4. The Multivariate Normal Distribution. The Multivariate Normal Density and Its Properties. Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation. The Sampling Distribution of X̄ and S. Large-Sample Behavior of X̄ and S. Assessing the Assumption of Normality. Detecting Outliers and Data Cleaning. Transformations to Near Normality. II. INFERENCES ABOUT MULTIVARIATE MEANS AND LINEAR MODELS. 5. Inferences About a Mean Vector. The Plausibility of μ0 as a Value for a Normal Population Mean. Hotelling's T² and Likelihood Ratio Tests. Confidence Regions and Simultaneous Comparisons of Component Means. Large Sample Inferences about a Population Mean Vector. Multivariate Quality Control Charts. Inferences about Mean Vectors When Some Observations Are Missing. Difficulties Due To Time Dependence in Multivariate Observations. Supplement 5A Simultaneous Confidence Intervals and Ellipses as Shadows of the p-Dimensional Ellipsoids. 6. Comparisons of Several Multivariate Means. Paired Comparisons and a Repeated Measures Design. Comparing Mean Vectors from Two Populations. Comparison of Several Multivariate Population Means (One-Way MANOVA). Simultaneous Confidence Intervals for Treatment Effects. Two-Way Multivariate Analysis of Variance. Profile Analysis. Repeated Measures Designs and Growth Curves. Perspectives and a Strategy for Analyzing Multivariate Models. 7. Multivariate Linear Regression Models. The Classical Linear Regression Model. Least Squares Estimation. Inferences About the Regression Model. Inferences from the Estimated Regression Function. Model Checking and Other Aspects of Regression. Multivariate Multiple Regression. The Concept of Linear Regression. Comparing the Two Formulations of the Regression Model. Multiple Regression Models with Time Dependent Errors. Supplement 7A The Distribution of the Likelihood Ratio for the Multivariate Regression Model. III. ANALYSIS OF A COVARIANCE STRUCTURE. 8. Principal Components. Population Principal Components. Summarizing Sample Variation by Principal Components. Graphing the Principal Components. Large-Sample Inferences. Monitoring Quality with Principal Components. Supplement 8A The Geometry of the Sample Principal Component Approximation. 9. Factor Analysis and Inference for Structured Covariance Matrices. The Orthogonal Factor Model. Methods of Estimation. Factor Rotation. Factor Scores. Perspectives and a Strategy for Factor Analysis. Structural Equation Models. Supplement 9A Some Computational Details for Maximum Likelihood Estimation. 10. Canonical Correlation Analysis. Canonical Variates and Canonical Correlations. Interpreting the Population Canonical Variables. The Sample Canonical Variates and Sample Canonical Correlations. Additional Sample Descriptive Measures. Large Sample Inferences. IV. CLASSIFICATION AND GROUPING TECHNIQUES. 11. Discrimination and Classification. Separation and Classification for Two Populations. Classifications with Two Multivariate Normal Populations. Evaluating Classification Functions. Fisher's Discriminant Function: Separation of Populations. Classification with Several Populations. Fisher's Method for Discriminating among Several Populations. Final Comments. 12. Clustering, Distance Methods and Ordination. Similarity Measures. Hierarchical Clustering Methods. Nonhierarchical Clustering Methods. Multidimensional Scaling. Correspondence Analysis. Biplots for Viewing Sample Units and Variables. Procrustes Analysis: A Method for Comparing Configurations. Appendix. Standard Normal Probabilities. Student's t-Distribution Percentage Points. χ² Distribution Percentage Points. F-Distribution Percentage Points. F-Distribution Percentage Points (α = .10). F-Distribution Percentage Points (α = .05). F-Distribution Percentage Points (α = .01). Data Index. Subject Index.

11,697 citations


"Identification source of variation ..." refers background in this paper

  • ...For every cluster, it creates a discriminant function (DF) (Johnson and Wichern 1992)....

    [...]

Journal ArticleDOI
TL;DR: This study illustrates the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.
Abstract: Multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), were applied for the evaluation of temporal/spatial variations and the interpretation of a large complex water quality data set of the Fuji river basin, generated during 8 years (1995–2002) monitoring of 12 parameters at 13 different sites (14 976 observations). Hierarchical cluster analysis grouped 13 sampling sites into three clusters, i.e., relatively less polluted (LP), medium polluted (MP) and highly polluted (HP) sites, based on the similarity of water quality characteristics. Factor analysis/principal component analysis, applied to the data sets of the three different groups obtained from cluster analysis, resulted in five, five and three latent factors explaining 73.18, 77.61 and 65.39% of the total variance in water quality data sets of LP, MP and HP areas, respectively. The varifactors obtained from factor analysis indicate that the parameters responsible for water quality variations are mainly related to discharge and temperature (natural), organic pollution (point source: domestic wastewater) in relatively less polluted areas; organic pollution (point source: domestic wastewater) and nutrients (non-point sources: agriculture and orchard plantations) in medium polluted areas; and organic pollution and nutrients (point sources: domestic wastewater, wastewater treatment plants and industries) in highly polluted areas in the basin. Discriminant analysis gave the best results for both spatial and temporal analysis. It provided an important data reduction as it uses only six parameters (discharge, temperature, dissolved oxygen, biochemical oxygen demand, electrical conductivity and nitrate nitrogen), affording more than 85% correct assignations in temporal analysis, and seven parameters (discharge, temperature, biochemical oxygen demand, pH, electrical conductivity, nitrate nitrogen and ammoniacal nitrogen), affording more than 81% correct assignations in spatial analysis, of three different sampling sites of the basin. Therefore, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. Thus, this study illustrates the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.

1,481 citations


"Identification source of variation ..." refers methods in this paper

  • ...The measure will be multiplied by 100 as a way to standardize the linkage distance signified by the y-axis (Shrestha and Kazama, 2007)....

    [...]
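The rescaling in the excerpt above can be sketched as follows. The excerpt does not say which measure is multiplied by 100; this sketch assumes it is the ratio of each linkage distance to the largest linkage distance in the dendrogram, a common convention in this literature. The function name `standardize_linkage` and the sample distances are hypothetical.

```python
def standardize_linkage(distances):
    """Rescale linkage distances to a 0-100 axis: 100 * d / max(d).

    Assumption: the standardizing measure is d divided by the largest distance.
    """
    d_max = max(distances)
    if d_max <= 0:
        raise ValueError("the largest linkage distance must be positive")
    return [100.0 * d / d_max for d in distances]

# Hypothetical linkage distances from three successive merges.
scaled = standardize_linkage([2.0, 5.0, 8.0])
```

On this scale the final merge always sits at 100, so dendrograms from data sets with different raw distance ranges can be read on a common y-axis.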

Journal ArticleDOI
TL;DR: This study presents necessity and usefulness of multivariate statistical techniques for evaluation and interpretation of large complex data sets with a view to get better information about the water quality and design of monitoring network for effective management of water resources.

1,429 citations


"Identification source of variation ..." refers background in this paper

  • ...It presents the details on the most significant variables due to spatial and temporal variations, by putting them from the less significant variables with minimum loss of the original information (Singh et al., 2004; 2005; Azid et al., 2015)....

    [...]
