Journal Article

Identification source of variation on regional impact of air quality pattern using chemometric

TL;DR: In this paper, the effectiveness of hierarchical agglomerative cluster analysis (HACA), discriminant analysis (DA), principal component analysis (PCA), factor analysis (FA), and multiple linear regression (MLR) for assessing air quality data and recognizing air pollution source patterns was evaluated.
Abstract: This study aims to demonstrate the effectiveness of hierarchical agglomerative cluster analysis (HACA), discriminant analysis (DA), principal component analysis (PCA), factor analysis (FA), and multiple linear regression (MLR) for assessing air quality data and recognizing air pollution source patterns. Air quality data for 12 months (January–December 2007) from 14 stations across Peninsular Malaysia, covering 14 parameters (168 datasets, i.e., 14 stations × 12 months), were used. HACA generated three significant clusters: a low pollution source (LPS) region, a moderate pollution source (MPS) region, and a slightly high pollution source (SHPS) region. Forward stepwise DA discriminated 8 of the 14 variables, whereas backward stepwise DA discriminated 9. PCA and FA identified 8 pollutants in each of the LPS and SHPS regions and 11 pollutants in the MPS region, most of which are expected to derive from industrial activities, transportation, and agriculture. Four MLR models show that PM10 is the primary pollutant in Malaysia. The study indicates that chemometric techniques can reveal meaningful information on the spatial variability of large, complex air quality datasets, supporting a clearer overview of air quality and a novel design of the air quality monitoring network for better management of air pollution.
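The abstract reports four MLR models but does not give their form. As a minimal sketch, assuming an ordinary least-squares fit of one response on a few pollutant concentrations, the snippet below solves the normal equations in pure Python and reports R². The function names (`solve`, `fit_mlr`, `r_squared`) and the synthetic data are hypothetical and are not the authors' models or results.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mlr(X: List[List[float]], y: List[float]) -> List[float]:
    """Return [b0, b1, ..., bp] for y ~ b0 + b1*x1 + ... + bp*xp (least squares)."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X'X) b = X'y


def r_squared(X: List[List[float]], y: List[float], coef: List[float]) -> float:
    """Coefficient of determination of a fitted linear model."""
    pred = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic rows of two hypothetical predictors (e.g. PM10 and CO levels).
    X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
    y = [6, 8.5, 13.5, 15.5, 21]  # generated as 2 + 3*x1 + 0.5*x2
    coef = fit_mlr(X, y)
    print(coef, r_squared(X, y, coef))
```

Because the synthetic response was generated without noise, the fit recovers the generating coefficients and R² is 1; real monitoring data would of course leave residual error.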


Citations
Journal Article
TL;DR: In this paper, four multivariate receptor models (Unmix, Positive Matrix Factorization, Principal Component Analysis, and Multiple Curve Regression) are used for source apportionment of road dust at Vellore City, India.
Abstract: Road dust is one of the biggest contributors to airborne particulate matter (PM) in many urban regions. Because road dust is inherently heterogeneous, it is important that its sources are identified and mitigated. Multivariate receptor models are used for source apportionment of PM in many cities, and in recent years they have found more applications outside the scope of PM source apportionment. In this study, four multivariate receptor models (Unmix, Positive Matrix Factorization, Principal Component Analysis, and Multiple Curve Regression) are used for source apportionment of road dust in Vellore City, India. The elemental composition of road dust samples from 18 locations over three seasons (summer, winter, and monsoon) was measured using acid digestion followed by Inductively Coupled Plasma–Optical Emission Spectroscopy. Irrespective of the model, the results showed that crustal material (100–68%) and resuspended road dust (82–15%) are the biggest contributors to road dust in the study region. Brake wear, tire wear, biomass combustion, vehicular emission, and industrial sources are among the other sources identified by the receptor models. The receptor modeling performance of the MCR and PCA models was unsatisfactory, whereas the PMF and Unmix models gave acceptable results. Comparing the performance characteristics, Unmix was found to be the best-suited receptor model for this dataset. This research clarifies the constraints of different receptor models, and the source apportionment information obtained is critical for the development of future policy and regulation.

4 citations

Journal Article
TL;DR: This article aims to identify the spatial-temporal relationship of particulate matter (PM10), to determine the characteristics of each location, and to classify the locations hierarchically according to their impact on PM10 concentration in the Klang Valley.
Abstract: Urbanization in the Klang Valley, Peninsular Malaysia, over the last decades has increased the risk of atmospheric pollution, with negative impacts on the environment. The aims of this paper are to identify the spatial-temporal relationship of particulate matter (PM10), to determine the characteristics of each location, and to classify the locations hierarchically according to their impact on PM10 concentration in the Klang Valley. The Spearman correlation test indicated a strong, significant relationship between most pairs of locations (rs > 0.7; p < 0.001) and a moderate relationship between Petaling Jaya-Kajang and Kajang-Shah Alam (rs < 0.7; p < 0.001). Principal component analysis (PCA) showed that all four locations were affected by PM10, which was identified as one of the pollutants degrading air quality. Cluster analysis (CA) classified the PM10 pattern into three classes by location: Class 1 (Klang), Class 2 (Petaling Jaya and Kajang), and Class 3 (Shah Alam). Further CA could classify the PM10 classes into groups according to their dissimilarity, so that possible periods of extreme air quality degradation could be identified. Statistical and environmetric techniques therefore demonstrated the influence of the different locations on increasing PM10 concentrations.
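Spearman's coefficient is Pearson's correlation computed on ranks rather than raw values, which makes it robust to the skewed distributions typical of PM10 series. A minimal pure-Python sketch, assuming tied values receive the average of their ranks; the function names and the station series in the usage lines are invented for illustration, and the significance test behind the reported p-values is not shown.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented daily PM10 readings (ug/m3) at two hypothetical stations.
    station_a = [45, 60, 52, 80, 71, 38]
    station_b = [50, 66, 49, 85, 90, 40]
    print(round(spearman_rho(station_a, station_b), 3))
```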

4 citations


Cites methods from "Identification source of variation ..."

  • ...CA is applied to the normally distributed dataset using Ward's method, with Euclidean distance as the measure of the relationship [11, 12, 16]....


  • ...Cluster analysis (CA): CA is an unsupervised pattern recognition method used to split a large group into smaller ones [12] based on data homogeneity....

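The excerpts describe CA with Ward's linkage on Euclidean distances. Below is a naive pure-Python sketch of Ward's agglomerative procedure, assuming the input rows have already been standardized (that preprocessing step is assumed, not shown). It recomputes every pairwise merge cost at each step, which is slow but adequate for a network of 14 stations; a production analysis would more likely call `scipy.cluster.hierarchy.linkage(data, method="ward")`. The helper names and station coordinates are hypothetical.

```python
from typing import List, Tuple

Point = List[float]
Merge = Tuple[List[int], List[int], float]


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def ward_cost(a: List[Point], b: List[Point]) -> float:
    """Increase in within-cluster sum of squares caused by merging clusters a and b."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(centroid(a), centroid(b)))  # squared Euclidean
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_haca(points: List[Point]) -> List[Merge]:
    """Bottom-up clustering; returns (members_a, members_b, cost) for each merge in order."""
    clusters = [[i] for i in range(len(points))]
    history: List[Merge] = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[k] for k in clusters[i]],
                                 [points[k] for k in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        history.append((clusters[i], clusters[j], cost))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


def cut(history: List[Merge], n_points: int, n_clusters: int) -> List[List[int]]:
    """Replay the first n_points - n_clusters merges to obtain n_clusters groups."""
    clusters = [[i] for i in range(n_points)]
    for a, b, _ in history[: n_points - n_clusters]:
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical stations described by two standardized variables.
    stations = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [0.5, 0.5]]
    print(cut(ward_haca(stations), len(stations), 2))
```

Cutting the merge history at three clusters would mirror the LPS/MPS/SHPS grouping described in the abstract, though the sketch itself makes no claim about which stations fall where.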

Journal Article
TL;DR: In this paper, the extent of implementation of the end-of-life vehicle (ELV) policy and the social readiness for environmentally friendly ELV disposal in Malaysia were determined. However, no proper rule oversees the disposal of ELV waste in Malaysia.
Abstract: Effective management of end-of-life vehicles (ELVs) is a sound strategy for mitigating global climate change. ELVs are contaminants that pollute water, air, soil, and landscape. This waste stream must be adequately treated, but no proper rule oversees the disposal of ELV waste in Malaysia. This study aims to determine the extent of implementation of the ELV policy and the social readiness for environmentally friendly ELV disposal in Malaysia. The questionnaire seeks public input on critical ELV concerns such as public perception of the phenomenon, environmental and safety standards, and recycling and treatment facilities. The research uses a cross-sectional design with 448 survey respondents. Model fit in structural equation modeling is evaluated using a variety of goodness-of-fit indicators to test the hypotheses. The study's advantages include representative samples, which allow comparable conclusions that generalize to larger communities throughout Malaysia. Personal experience was found to be significantly correlated with social readiness. Knowledge of the causes of ELVs was a vital mediator, along with knowledge of recycling costs. Thus, knowledge of ELV management costs is the most decisive mediating variable for predicting public acceptance. The recommended strategy for reducing resentment toward and rejection of the ELV policy is to disseminate information about the negative impact of ELVs on environmental and social sustainability.

3 citations

Journal Article
19 Dec 2019
TL;DR: In this paper, the main objective is to classify the level of PM10 in selected locations in Peninsular Malaysia using discriminant analysis, considering two groups of factors: meteorological factors and pollutant factors.
Abstract: Particulate matter (PM) comprises a complex mixture of small solid or liquid particles of organic and inorganic elements that float freely in air. PM10 is defined as particulate matter with an aerodynamic diameter of 10 µm or less. The main objective of this paper is to classify the level of PM10 in selected locations in Peninsular Malaysia using discriminant analysis. Two groups of factors were considered in this study: meteorological factors and pollutant factors. The meteorological factors comprise wind speed, wind direction, humidity, and temperature, while the pollutant factors consist of carbon monoxide (CO), sulphur dioxide (SO2), nitrogen dioxide (NO2), and ozone (O3). PM10 concentrations were classified as high or low based on the Malaysia Ambient Air Quality Guideline (MAAQG). The findings indicate that the classification equation differs from location to location because of differences in PM10 concentrations, monitoring station locations, and the factors affecting air pollution at each location. The simulation data also verified that the PM10 classification closely matched the actual conditions in Klang in October 2015.

3 citations


Cites background from "Identification source of variation ..."

  • ...Klang and Shah Alam are located near main roads and industrial and residential areas, and thus experienced a high density of vehicles, which contributed to high concentrations of these two pollutants (Azid et al., 2015)....


Journal Article
01 Dec 2019

3 citations


Cites background from "Identification source of variation ..."

  • ...8 and it is a classification technique [45] on observations performed [46] by researchers at the early stage of study [47]....


References
Journal Article
TL;DR: In this paper, a procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical.
Abstract: A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical. Given n sets, this procedure permits their reduction to n − 1 mutually exclusive sets by considering the union of all possible n(n − 1)/2 pairs and selecting a union having a maximal value for the functional relation, or objective function, that reflects the criterion chosen by the investigator. By repeating this process until only one group remains, the complete hierarchical structure and a quantitative estimate of the loss associated with each stage in the grouping can be obtained. A general flowchart helpful in computer programming and a numerical example are included.

17,405 citations


"Identification source of variation ..." refers methods in this paper

  • ...Analysis of variance (ANOVA) is used to analyse the distances between clusters in Ward's method, which attempts to minimize the sum of squares of any two clusters that can be formed at each step (Ward, 1963)....

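The "sum of squares" in the excerpt is the within-cluster error sum of squares E. The standard derivation below (a textbook result, not reproduced from the paper) shows why each Ward step reduces to comparing centroids: the cost of merging clusters A and B, of sizes n_A and n_B with centroids x̄_A and x̄_B, depends only on the two sizes and the squared Euclidean distance between the centroids.

```latex
\begin{aligned}
E(C) &= \sum_{x \in C} \lVert x - \bar{x}_C \rVert^2,
\qquad \bar{x}_{AB} = \frac{n_A \bar{x}_A + n_B \bar{x}_B}{n_A + n_B} \\
E(A \cup B) &= E(A) + E(B) + n_A \lVert \bar{x}_A - \bar{x}_{AB} \rVert^2 + n_B \lVert \bar{x}_B - \bar{x}_{AB} \rVert^2 \\
\bar{x}_A - \bar{x}_{AB} &= \tfrac{n_B}{n_A + n_B}(\bar{x}_A - \bar{x}_B),
\qquad \bar{x}_B - \bar{x}_{AB} = \tfrac{n_A}{n_A + n_B}(\bar{x}_B - \bar{x}_A) \\
\Delta(A, B) &= E(A \cup B) - E(A) - E(B)
= \frac{n_A n_B^2 + n_B n_A^2}{(n_A + n_B)^2} \lVert \bar{x}_A - \bar{x}_B \rVert^2
= \frac{n_A n_B}{n_A + n_B} \lVert \bar{x}_A - \bar{x}_B \rVert^2
\end{aligned}
```

At every stage, Ward's procedure merges the pair (A, B) with the smallest Δ(A, B), so the total within-cluster sum of squares grows as little as possible.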

Book
01 Jan 1982
TL;DR: In this article, the authors present an overview of the basic concepts of multivariate analysis, including matrix algebra and random vectors, as well as a strategy for analyzing multivariate models.
Abstract: (NOTE: Each chapter begins with an Introduction, and concludes with Exercises and References.) I. GETTING STARTED. 1. Aspects of Multivariate Analysis. Applications of Multivariate Techniques. The Organization of Data. Data Displays and Pictorial Representations. Distance. Final Comments. 2. Matrix Algebra and Random Vectors. Some Basics of Matrix and Vector Algebra. Positive Definite Matrices. A Square-Root Matrix. Random Vectors and Matrices. Mean Vectors and Covariance Matrices. Matrix Inequalities and Maximization. Supplement 2A Vectors and Matrices: Basic Concepts. 3. Sample Geometry and Random Sampling. The Geometry of the Sample. Random Samples and the Expected Values of the Sample Mean and Covariance Matrix. Generalized Variance. Sample Mean, Covariance, and Correlation as Matrix Operations. Sample Values of Linear Combinations of Variables. 4. The Multivariate Normal Distribution. The Multivariate Normal Density and Its Properties. Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation. The Sampling Distribution of X̄ and S. Large-Sample Behavior of X̄ and S. Assessing the Assumption of Normality. Detecting Outliers and Data Cleaning. Transformations to Near Normality. II. INFERENCES ABOUT MULTIVARIATE MEANS AND LINEAR MODELS. 5. Inferences About a Mean Vector. The Plausibility of μ0 as a Value for a Normal Population Mean. Hotelling's T² and Likelihood Ratio Tests. Confidence Regions and Simultaneous Comparisons of Component Means. Large Sample Inferences about a Population Mean Vector. Multivariate Quality Control Charts. Inferences about Mean Vectors When Some Observations Are Missing. Difficulties Due To Time Dependence in Multivariate Observations. Supplement 5A Simultaneous Confidence Intervals and Ellipses as Shadows of the p-Dimensional Ellipsoids. 6. Comparisons of Several Multivariate Means. Paired Comparisons and a Repeated Measures Design. Comparing Mean Vectors from Two Populations. Comparison of Several Multivariate Population Means (One-Way MANOVA). Simultaneous Confidence Intervals for Treatment Effects. Two-Way Multivariate Analysis of Variance. Profile Analysis. Repeated Measures Designs and Growth Curves. Perspectives and a Strategy for Analyzing Multivariate Models. 7. Multivariate Linear Regression Models. The Classical Linear Regression Model. Least Squares Estimation. Inferences About the Regression Model. Inferences from the Estimated Regression Function. Model Checking and Other Aspects of Regression. Multivariate Multiple Regression. The Concept of Linear Regression. Comparing the Two Formulations of the Regression Model. Multiple Regression Models with Time-Dependent Errors. Supplement 7A The Distribution of the Likelihood Ratio for the Multivariate Regression Model. III. ANALYSIS OF A COVARIANCE STRUCTURE. 8. Principal Components. Population Principal Components. Summarizing Sample Variation by Principal Components. Graphing the Principal Components. Large-Sample Inferences. Monitoring Quality with Principal Components. Supplement 8A The Geometry of the Sample Principal Component Approximation. 9. Factor Analysis and Inference for Structured Covariance Matrices. The Orthogonal Factor Model. Methods of Estimation. Factor Rotation. Factor Scores. Perspectives and a Strategy for Factor Analysis. Structural Equation Models. Supplement 9A Some Computational Details for Maximum Likelihood Estimation. 10. Canonical Correlation Analysis. Canonical Variates and Canonical Correlations. Interpreting the Population Canonical Variables. The Sample Canonical Variates and Sample Canonical Correlations. Additional Sample Descriptive Measures. Large Sample Inferences. IV. CLASSIFICATION AND GROUPING TECHNIQUES. 11. Discrimination and Classification. Separation and Classification for Two Populations. Classification with Two Multivariate Normal Populations. Evaluating Classification Functions. Fisher's Discriminant Function: Separation of Populations. Classification with Several Populations. Fisher's Method for Discriminating among Several Populations. Final Comments. 12. Clustering, Distance Methods and Ordination. Similarity Measures. Hierarchical Clustering Methods. Nonhierarchical Clustering Methods. Multidimensional Scaling. Correspondence Analysis. Biplots for Viewing Sample Units and Variables. Procrustes Analysis: A Method for Comparing Configurations. Appendix. Standard Normal Probabilities. Student's t-Distribution Percentage Points. χ² Distribution Percentage Points. F-Distribution Percentage Points. F-Distribution Percentage Points (α = .10). F-Distribution Percentage Points (α = .05). F-Distribution Percentage Points (α = .01). Data Index. Subject Index.

11,697 citations


"Identification source of variation ..." refers background in this paper

  • ...For every cluster, it creates a discriminant function (DF) (Johnson and Wichern 1992)....

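The excerpt says DA builds a discriminant function for every cluster. A minimal sketch of that idea, assuming linear classification functions of the form k_g + Σ_j w_gj x_j computed from a pooled within-group covariance matrix (the classical linear discriminant rule); an observation is assigned to the group with the highest score. The forward and backward stepwise variable selection mentioned in the abstract is not shown. The group names reuse LPS/MPS/SHPS from the abstract, but the data and function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

ClassFn = Tuple[float, List[float]]  # (constant k_g, weights w_g)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(X: List[List[float]], labels: List[str]) -> Dict[str, ClassFn]:
    """Linear classification functions f_g(x) = k_g + w_g . x with a pooled covariance."""
    groups = sorted(set(labels))
    n, p = len(X), len(X[0])
    means = {}
    for g in groups:
        rows = [x for x, lab in zip(X, labels) if lab == g]
        means[g] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    # Pooled within-group covariance matrix, divided by n - (number of groups).
    S = [[0.0] * p for _ in range(p)]
    for x, lab in zip(X, labels):
        d = [x[j] - means[lab][j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j]
    S = [[v / (n - len(groups)) for v in row] for row in S]
    funcs = {}
    for g in groups:
        w = solve(S, means[g])  # w_g = S^-1 mu_g
        prior = labels.count(g) / n
        k = math.log(prior) - 0.5 * sum(m * wi for m, wi in zip(means[g], w))
        funcs[g] = (k, w)
    return funcs


def classify(funcs: Dict[str, ClassFn], x: Sequence[float]) -> str:
    """Assign x to the group whose classification function scores highest."""
    return max(funcs, key=lambda g: funcs[g][0] + sum(wi * xi for wi, xi in zip(funcs[g][1], x)))


if __name__ == "__main__":
    X = [[1, 2], [2, 1], [1.5, 1.8], [2.2, 2.1],
         [5, 5], [6, 5.5], [5.5, 6.2], [4.8, 5.9],
         [9, 9], [10, 8.5], [9.5, 10], [8.7, 9.6]]
    labels = ["LPS"] * 4 + ["MPS"] * 4 + ["SHPS"] * 4
    print(classify(fit_lda(X, labels), [5.4, 5.6]))
```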

Journal Article
TL;DR: In this article, the authors present an overview of the basic concepts of multivariate analysis, including matrix algebra and random vectors, as well as a strategy for analyzing multivariate models.

10,148 citations

Journal Article
TL;DR: This study illustrates the usefulness of multivariate statistical techniques for analyzing and interpreting complex data sets, assessing water quality, identifying pollution sources/factors, and understanding temporal/spatial variations in water quality for effective river water quality management.
Abstract: Multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), were applied for the evaluation of temporal/spatial variations and the interpretation of a large complex water quality data set of the Fuji river basin, generated during 8 years (1995–2002) monitoring of 12 parameters at 13 different sites (14 976 observations). Hierarchical cluster analysis grouped 13 sampling sites into three clusters, i.e., relatively less polluted (LP), medium polluted (MP) and highly polluted (HP) sites, based on the similarity of water quality characteristics. Factor analysis/principal component analysis, applied to the data sets of the three different groups obtained from cluster analysis, resulted in five, five and three latent factors explaining 73.18, 77.61 and 65.39% of the total variance in water quality data sets of LP, MP and HP areas, respectively. The varifactors obtained from factor analysis indicate that the parameters responsible for water quality variations are mainly related to discharge and temperature (natural), organic pollution (point source: domestic wastewater) in relatively less polluted areas; organic pollution (point source: domestic wastewater) and nutrients (non-point sources: agriculture and orchard plantations) in medium polluted areas; and organic pollution and nutrients (point sources: domestic wastewater, wastewater treatment plants and industries) in highly polluted areas in the basin. Discriminant analysis gave the best results for both spatial and temporal analysis. 
It provided an important data reduction, as it uses only six parameters (discharge, temperature, dissolved oxygen, biochemical oxygen demand, electrical conductivity and nitrate nitrogen), affording more than 85% correct assignations in temporal analysis, and seven parameters (discharge, temperature, biochemical oxygen demand, pH, electrical conductivity, nitrate nitrogen and ammoniacal nitrogen), affording more than 81% correct assignations in spatial analysis, of three different sampling sites of the basin. Therefore, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. Thus, this study illustrates the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.
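The "correct assignations" quoted above are the share of observations that the discriminant functions place in their actual group. A minimal sketch of that bookkeeping, assuming the observed and DA-predicted labels are already available; the LP/MP/HP labels echo the abstract, but the data and function names are invented. The abstract does not say whether its percentages come from resubstitution or cross-validation, and this sketch does not address that choice.

```python
from collections import Counter
from typing import Dict, List, Tuple


def correct_assignations(observed: List[str], predicted: List[str]) -> Tuple[float, Dict[str, float]]:
    """Overall and per-group percentage of observations assigned to their true group."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty label lists of equal length")
    totals = Counter(observed)
    hits = Counter(o for o, p in zip(observed, predicted) if o == p)
    per_group = {g: 100.0 * hits[g] / totals[g] for g in sorted(totals)}
    overall = 100.0 * sum(hits.values()) / len(observed)
    return overall, per_group


def confusion_matrix(observed: List[str], predicted: List[str]) -> Dict[Tuple[str, str], int]:
    """Counts of (observed, predicted) label pairs."""
    return dict(Counter(zip(observed, predicted)))


if __name__ == "__main__":
    obs = ["LP", "LP", "MP", "MP", "HP", "HP", "HP", "LP"]
    pred = ["LP", "MP", "MP", "MP", "HP", "HP", "MP", "LP"]
    print(correct_assignations(obs, pred))
    print(confusion_matrix(obs, pred))
```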

1,481 citations


"Identification source of variation ..." refers methods in this paper

  • ...The measure is multiplied by 100 to standardize the linkage distance represented on the y-axis (Shrestha and Kazama, 2007)....

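In the clustering literature that follows Shrestha and Kazama (2007), the standardized linkage distance is usually reported as (Dlink/Dmax) × 100, where Dlink is the distance at which a merge occurs and Dmax is the largest linkage distance; the excerpt above describes the multiplication by 100. A small sketch, assuming merge heights are non-decreasing (which holds for Ward's method) and using hypothetical function names and heights:

```python
from typing import List


def standardize_linkage(heights: List[float]) -> List[float]:
    """Rescale merge heights to (D_link / D_max) * 100, so the final merge sits at 100."""
    d_max = max(heights)
    return [100.0 * h / d_max for h in heights]


def clusters_at_cut(heights: List[float], n_points: int, cut: float) -> int:
    """Clusters remaining when the dendrogram is cut at a standardized height.

    Assumes heights are non-decreasing, as Ward's merge costs are, so every
    merge at or below the cut has already happened.
    """
    merged = sum(1 for h in standardize_linkage(heights) if h <= cut)
    return n_points - merged


if __name__ == "__main__":
    heights = [1.0, 2.0, 5.0, 20.0]  # hypothetical merge distances for 5 items
    print(standardize_linkage(heights))        # [5.0, 10.0, 25.0, 100.0]
    print(clusters_at_cut(heights, 5, 30.0))   # 2
```

Rescaling to a 0 to 100 axis makes dendrograms from different datasets comparable, so a cut line can be drawn at the same standardized height across studies.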

Journal Article
TL;DR: This study presents the necessity and usefulness of multivariate statistical techniques for evaluating and interpreting large, complex data sets, with a view to obtaining better information about water quality and designing a monitoring network for effective management of water resources.

1,429 citations


"Identification source of variation ..." refers background in this paper

  • ...It presents details on the most significant variables with respect to spatial and temporal variations, separating them from the less significant variables with minimum loss of the original information (Singh et al., 2004; 2005; Azid et al., 2015)....

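The excerpt describes a data-reduction step that retains the most informative variables with minimum loss of the original information, the role usually attributed to PCA in the works it cites. A pure-Python sketch, assuming PCA on the correlation matrix (standardized variables) with eigenpairs found by the cyclic Jacobi method, plus the common eigenvalue-greater-than-one rule for choosing components. The varimax rotation that turns components into varifactors in FA is not shown, and the names and data are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def correlation_matrix(X: Matrix) -> Matrix:
    """Pearson correlation matrix of the columns of X (rows = observations)."""
    n, p = len(X), len(X[0])
    z = []
    for j in range(p):  # z-score each variable (columns must not be constant)
        c = [row[j] for row in X]
        m = sum(c) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
        z.append([(v - m) / sd for v in c])
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]


def jacobi_eigen(A: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigenvalues (descending) and eigenvectors (as columns) of a symmetric matrix."""
    n = len(A)
    a = [list(row) for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for i in order] for k in range(n)]
    return values, vectors


def pca_summary(X: Matrix) -> Tuple[List[float], List[float], List[int]]:
    """Eigenvalues, % variance explained, and components kept by eigenvalue > 1."""
    values, _ = jacobi_eigen(correlation_matrix(X))
    total = sum(values)  # equals the number of variables for a correlation matrix
    explained = [100.0 * lam / total for lam in values]
    kept = [i for i, lam in enumerate(values) if lam > 1.0]
    return values, explained, kept


if __name__ == "__main__":
    # Hypothetical monthly readings of three variables at one station.
    X = [[1, 3, 2], [2, 5, 1], [3, 7, 4], [4, 9, 3], [5, 11, 6], [6, 13, 5]]
    print(pca_summary(X))
```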
