
Journal Article

Identification source of variation on regional impact of air quality pattern using chemometric

03 Aug 2015, Aerosol and Air Quality Research (Taiwan Association for Aerosol Research), Vol. 15, Iss. 4, pp. 1545-1558

Abstract: This study demonstrates the effectiveness of hierarchical agglomerative cluster analysis (HACA), discriminant analysis (DA), principal component analysis (PCA), factor analysis (FA) and multiple linear regression (MLR) for assessing air quality data and recognizing air pollution source patterns. Air quality data for 12 months (January–December 2007) from 14 stations across Peninsular Malaysia, covering 14 parameters (168 data sets), were analyzed. HACA generated three significant clusters: a low pollution source (LPS) region, a moderate pollution source (MPS) region, and a slightly high pollution source (SHPS) region. Forward stepwise DA discriminated 8 of the 14 variables, whereas backward stepwise DA discriminated 9. PCA and FA identified 8 pollutants in each of the LPS and SHPS regions and 11 pollutants in the MPS region, most of which are expected to derive from industrial activities, transportation and agriculture. Four MLR models show that PM10 is the primary pollutant in Malaysia. The study indicates that chemometric techniques can disclose meaningful information on the spatial variability of large and complex air quality data sets, giving a clearer picture of air quality and supporting a novel design of the air quality monitoring network for better management of air pollution.




Citations

Journal Article
Abstract: According to the World Health Organization, 9 out of 10 people breathe polluted air, and ambient air pollution accounts for nearly 4.2 million early deaths worldwide. There is an urgent need for scientific management of urban air systems. Mathematical modeling of air quality helps researchers and urban authorities devise scientific management plans to mitigate the associated impacts. We present an organized review of the broad aspects of urban air quality modeling, such as urban microclimate, geospatial data, chemical transport models, computational fluid dynamics (CFD) models, and the integration of CFD and mesoscale models. The paper also discusses the influence of urban landscape features on air quality, the accuracy of emission inventories, and model validation methods. The review provides a vantage point for researchers in the emerging field of high-resolution urban air quality modeling for devising location-specific mitigation plans for the scientific management of clean air.

24 citations


Journal Article
Abstract: Water ecosystem deterioration can be driven by various factors, from the natural environment to physical changes in the river basin. Observations were made during the dry season (April 2017) and the wet season (December 2017) at 21 sampling stations selected along the Kenyir Lake Basin. Overall, the water quality status under the NWQS is categorized as Class I in the dry season and Class II in the wet season. The major pollutants in Kenyir Lake are total suspended solids (TSS), chemical oxygen demand (COD), dissolved oxygen and pH, which are contributed largely by untreated or partially treated sewage from tourism development and construction activities around the basin. Sedimentation in the Kenyir Lake Basin is not yet at a critical stage, but the water flow rate and land-use activities (development around the basin) will contribute to increasing sedimentation. Good site management, such as proper site practices to control and treat run-off prior to discharge, will ensure that construction works do not affect the quality and quantity of the receiving waters or have a significant impact on them.

14 citations


Cites background from "Identification source of variation ..."

  • ...The main sources of pollutants were possibly waste products and effluents from development and activities in the construction, tourism and agricultural areas, and inorganic wastes, which ultimately contaminated the river basin [31]....



Journal Article
TL;DR: It is suggested that ANN was an effective tool to compute the MWQ in mangrove estuarine zone and a powerful alternative prediction model as compared to the other modelling methods.
Abstract: Prediction models of the MWQI in mangrove and estuarine zones were constructed. The 2011–2015 data employed in this study comprised 13 parameters from six monitoring stations in West Malaysia. Spatial discriminant analysis (SDA) recommended seven significant parameters for developing the MWQI: DO, TSS, O&G, PO4, Cd, Cr and Zn. These selected parameters were then used to develop prediction models for the MWQI using an artificial neural network (ANN) and multiple linear regression (MLR). The SDA-ANN model had higher R2 values for training (0.9044) and validation (0.7113) than the SDA-MLR model and was chosen as the best model for the mangrove estuarine zone. The SDA-ANN model also had a lower RMSE (5.224) than the SDA-MLR model (12.7755). In summary, this work suggests that ANN is an effective tool for computing the MWQ in the mangrove estuarine zone and a powerful alternative prediction model to other modelling methods.

14 citations


Journal Article
Abstract: This study investigates metals in PM1.0 and PM2.5 collected with a micro-orifice uniform deposition impactor (MOUDI) sampler in the YanShuei area of southern Taiwan during a beehive firework display. The results indicate that during the display, the ratios of metal concentrations in PM2.5 (D) to the background level (B) at the leeward sampling site were 1,828 for Ba, 702 for K, 534 for Sr, 473 for Cu, 104 for Mg, 121 for Al, and 98 for Pb. The corresponding ratios for PM1.0 were 3,036, 838, 550, 676, 594, 190, and 126, respectively. According to the metal composition ratios, principal component analysis (PCA), and upper continental crust (UCC) analyses, the concentrations of particle-bound Al, Ba, Cu, K, Mg, Pb, and Sr increased during the beehive firework displays, suggesting that firework-display aerosols contained abundant Al, Ba, Cu, K, Mg, Pb, and Sr. Before (background), during the trial, during, and after the beehive firework display, Ba, K, Cu, Mg, Pb, and Sr (commonly regarded as firework-display indicator elements) accounted for 0.520, 2.45, 26.4, and 0.849% of PM1 mass, respectively, while for PM2.5 the corresponding values were 0.777, 2.32, 23.8, and 0.776%.
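The enrichment ratios above are quotients of a metal's concentration during the display (D) to its background level (B), and the indicator-element share is the summed indicator mass divided by the PM mass. A minimal Python sketch of both calculations; the concentrations in the usage lines are hypothetical placeholders, not data from the study.

```python
# Indicator elements named in the abstract (commonly used as firework tracers)
INDICATORS = ("Ba", "K", "Cu", "Mg", "Pb", "Sr")


def enrichment_ratios(during: dict[str, float], background: dict[str, float]) -> dict[str, float]:
    """Return D/B for every metal that has a positive background concentration."""
    return {
        metal: conc / background[metal]
        for metal, conc in during.items()
        if background.get(metal, 0.0) > 0.0  # skip metals without a usable background value
    }


def indicator_mass_percent(metals: dict[str, float], pm_mass: float) -> float:
    """Percentage of PM mass accounted for by the indicator elements present in `metals`."""
    indicator_total = sum(metals.get(element, 0.0) for element in INDICATORS)
    return indicator_total / pm_mass * 100.0


# Hypothetical concentrations (one common unit throughout), for illustration only
during = {"Ba": 9.14, "K": 3.51, "Al": 0.605}
background = {"Ba": 0.005, "K": 0.005, "Al": 0.005}
print(enrichment_ratios(during, background))
print(indicator_mass_percent({"Ba": 1.0, "K": 2.0, "Cu": 0.5, "Al": 4.0}, pm_mass=100.0))
```

Al is deliberately left out of `INDICATORS` to follow the abstract's list of indicator elements, so it contributes to the ratios but not to the mass share.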

11 citations


Cites background from "Identification source of variation ..."

  • ...0) can be classified into several groups by their sources (Allen et al., 2001; Marcazzan et al., 2001; Manoli et al., 2002; AlMomani, 2003; Azid et al., 2015; Chen et al., 2015; Fang et al., 2015; Liang et al., 2015)....


  • ...…characteristic values of over 1 in Principal Component Analysis (PCA) (SPSS v.12.0) can be classified into several groups by their sources (Allen et al., 2001; Marcazzan et al., 2001; Manoli et al., 2002; AlMomani, 2003; Azid et al., 2015; Chen et al., 2015; Fang et al., 2015; Liang et al., 2015)....

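The "characteristic values of over 1" in these snippets are eigenvalues greater than 1 (the Kaiser criterion). A minimal NumPy sketch, assuming PCA on the correlation matrix of a samples × variables array; the cited work used SPSS, so this illustrates the rule rather than that software's procedure, and the data are synthetic.

```python
import numpy as np


def pca_kaiser(data: np.ndarray):
    """PCA on the correlation matrix of a (samples x variables) array.

    Returns all eigenvalues (descending), the loadings of the components whose
    eigenvalue exceeds 1 (Kaiser criterion), and the % of variance per component.
    """
    corr = np.corrcoef(data, rowvar=False)      # variables are columns
    eigvals, eigvecs = np.linalg.eigh(corr)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    # Loadings = eigenvector * sqrt(eigenvalue): correlation of each variable with each PC
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained_pct = eigvals / eigvals.sum() * 100.0
    return eigvals, loadings, explained_pct


# Hypothetical data: two independent "sources", each driving two measured variables
rng = np.random.default_rng(0)
source_a, source_b = rng.normal(size=(2, 200))
noise = rng.normal(scale=0.1, size=(200, 4))
data = np.column_stack([source_a, source_a, source_b, source_b]) + noise

eigvals, loadings, explained = pca_kaiser(data)
print(np.round(eigvals, 2), loadings.shape)
```

With two underlying sources, two components pass the eigenvalue-above-1 cut. A factor analysis would typically follow with a rotation such as varimax, which this sketch does not perform.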


Journal Article
Abstract: Comprehensive particulate matter studies are needed to predict future haze occurrences in Malaysia. This paper presents the application of artificial neural networks (ANN) and multiple linear regression (MLR), coupled with sensitivity analysis (SA), to identify the relationships between pollutants and particulate matter (PM10) in the eastern region. Eight monitoring studies were used, involving 14 input parameters, including meteorological factors, as independent variables. To investigate the efficiency of ANN and MLR, two different weather conditions were selected: haze and non-haze. Performance was evaluated in two steps. First, two models, denoted full models, were developed with ANN and MLR using all 14 parameters as input; SA was used as an additional step to rank the parameters contributing most to PM10 variations in both situations. Next, selected models were developed using only the significant variables as input. Three mathematical indices (R2, RMSE and SSE) were used to compare the two techniques. ANN performed better in both the full and selected models, and both models showed significant results during hazy and non-hazy days. In addition, UVb and carbon monoxide were the variables identified in common by ANN and MLR during hazy and non-hazy days, respectively. Precise predictions are needed to help the relevant agencies focus on the pollutants that contribute most to PM10 variations, especially during haze periods.
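The two-step comparison described above (a full model with all inputs, then a selected model with only the significant inputs, scored by R2, RMSE and SSE) can be sketched for the MLR side with NumPy least squares. This is a minimal illustration under stated assumptions: the data are synthetic, the "significant" columns are assumed rather than derived, R2 is taken as 1 − SSE/SST, and neither the ANN nor the sensitivity analysis is reproduced.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares MLR with an intercept; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    return coef[0] + X @ coef[1:]


def evaluate(y: np.ndarray, y_hat: np.ndarray) -> dict[str, float]:
    """R2 (coefficient of determination), RMSE and SSE of a set of predictions."""
    residuals = y - y_hat
    sse = float(np.sum(residuals ** 2))
    sst = float(np.sum((y - y.mean()) ** 2))
    return {"R2": 1.0 - sse / sst, "RMSE": float(np.sqrt(sse / len(y))), "SSE": sse}


# Hypothetical inputs: 14 candidate predictors, only columns 0 and 3 drive the response
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 14))
y = 5.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=120)

full = fit_mlr(X, y)                    # "full model": all 14 inputs
selected_cols = [0, 3]                  # assumed outcome of a prior significance screen
selected = fit_mlr(X[:, selected_cols], y)

print("full:", evaluate(y, predict(full, X)))
print("selected:", evaluate(y, predict(selected, X[:, selected_cols])))
```

On the fitting data the full model can never have a larger SSE than a nested selected model, so a fair comparison of the two should be made on held-out validation data.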

10 citations


Cites background or methods from "Identification source of variation ..."

  • ...Thus, serious attention is needed from all parties, not only from the government sector but, even more, through individual responsibility (Azid et al. 2015a)....


  • ...MLR is a traditional method for examining a dependent variable by identifying its relationship with each independent variable (Azid et al. 2015b; Azid et al. 2015c)....



References

Journal Article
Abstract: A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for use in large-scale (n > 100) studies when a precise optimal solution for a specified number of groups is not practical. Given n sets, this procedure permits their reduction to n − 1 mutually exclusive sets by considering the union of all possible n(n − 1)/2 pairs and selecting a union having a maximal value for the functional relation, or objective function, that reflects the criterion chosen by the investigator. By repeating this process until only one group remains, the complete hierarchical structure and a quantitative estimate of the loss associated with each stage in the grouping can be obtained. A general flowchart helpful in computer programming and a numerical example are included.
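The procedure reduces n groups to n − 1 by evaluating all n(n − 1)/2 candidate unions and keeping the best, repeating until one group remains. A minimal pure-Python sketch, assuming the objective is the increase in total error sum of squares (ESS), the criterion most often meant by "Ward's method" (the paper leaves the objective to the investigator). The brute-force pair search is for illustration, not for large n.

```python
from itertools import combinations

Point = tuple[float, ...]


def ess(cluster: list[Point]) -> float:
    """Error sum of squares: squared distances of the members to their centroid."""
    dim = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in cluster)


def ward_hierarchy(points: list[Point]) -> list[tuple[tuple[int, ...], float]]:
    """Merge singletons until one group remains; at each stage pick, among all
    pairs of current groups, the union with the smallest increase in total ESS.
    Returns (members of the new group, ESS increase) for every stage."""
    groups = [[k] for k in range(len(points))]
    history = []
    while len(groups) > 1:
        best = None
        for i, j in combinations(range(len(groups)), 2):  # all n(n-1)/2 pairs
            a = [points[k] for k in groups[i]]
            b = [points[k] for k in groups[j]]
            delta = ess(a + b) - ess(a) - ess(b)
            if best is None or delta < best[0]:
                best = (delta, i, j)
        delta, i, j = best
        merged = sorted(groups[i] + groups[j])
        del groups[j]          # j > i, so index i is still valid afterwards
        groups[i] = merged
        history.append((tuple(merged), delta))
    return history


for members, cost in ward_hierarchy([(0.0,), (1.0,), (10.0,), (11.0,)]):
    print(members, cost)
# (0, 1) 0.5
# (2, 3) 0.5
# (0, 1, 2, 3) 100.0
```

The per-stage ESS increase is the "quantitative estimate of the loss" the abstract mentions; a large jump (here 0.5 to 100.0) suggests stopping before that merge.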

15,609 citations


"Identification source of variation ..." refers methods in this paper

  • ...Ward's method uses analysis of variance (ANOVA) to evaluate the distances between clusters, merging at every step the two clusters whose union gives the smallest increase in the total sum of squares (Ward, 1963)....



Book
01 Jan 1982
Abstract: (NOTE: Each chapter begins with an Introduction, and concludes with Exercises and References.) I. GETTING STARTED. 1. Aspects of Multivariate Analysis. Applications of Multivariate Techniques. The Organization of Data. Data Displays and Pictorial Representations. Distance. Final Comments. 2. Matrix Algebra and Random Vectors. Some Basics of Matrix and Vector Algebra. Positive Definite Matrices. A Square-Root Matrix. Random Vectors and Matrices. Mean Vectors and Covariance Matrices. Matrix Inequalities and Maximization. Supplement 2A Vectors and Matrices: Basic Concepts. 3. Sample Geometry and Random Sampling. The Geometry of the Sample. Random Samples and the Expected Values of the Sample Mean and Covariance Matrix. Generalized Variance. Sample Mean, Covariance, and Correlation as Matrix Operations. Sample Values of Linear Combinations of Variables. 4. The Multivariate Normal Distribution. The Multivariate Normal Density and Its Properties. Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation. The Sampling Distribution of X̄ and S. Large-Sample Behavior of X̄ and S. Assessing the Assumption of Normality. Detecting Outliers and Data Cleaning. Transformations to Near Normality. II. INFERENCES ABOUT MULTIVARIATE MEANS AND LINEAR MODELS. 5. Inferences About a Mean Vector. The Plausibility of μ₀ as a Value for a Normal Population Mean. Hotelling's T² and Likelihood Ratio Tests. Confidence Regions and Simultaneous Comparisons of Component Means. Large Sample Inferences about a Population Mean Vector. Multivariate Quality Control Charts. Inferences about Mean Vectors When Some Observations Are Missing. Difficulties Due To Time Dependence in Multivariate Observations. Supplement 5A Simultaneous Confidence Intervals and Ellipses as Shadows of the p-Dimensional Ellipsoids. 6. Comparisons of Several Multivariate Means. Paired Comparisons and a Repeated Measures Design. Comparing Mean Vectors from Two Populations.
Comparison of Several Multivariate Population Means (One-Way MANOVA). Simultaneous Confidence Intervals for Treatment Effects. Two-Way Multivariate Analysis of Variance. Profile Analysis. Repeated Measures Designs and Growth Curves. Perspectives and a Strategy for Analyzing Multivariate Models. 7. Multivariate Linear Regression Models. The Classical Linear Regression Model. Least Squares Estimation. Inferences About the Regression Model. Inferences from the Estimated Regression Function. Model Checking and Other Aspects of Regression. Multivariate Multiple Regression. The Concept of Linear Regression. Comparing the Two Formulations of the Regression Model. Multiple Regression Models with Time Dependent Errors. Supplement 7A The Distribution of the Likelihood Ratio for the Multivariate Regression Model. III. ANALYSIS OF A COVARIANCE STRUCTURE. 8. Principal Components. Population Principal Components. Summarizing Sample Variation by Principal Components. Graphing the Principal Components. Large-Sample Inferences. Monitoring Quality with Principal Components. Supplement 8A The Geometry of the Sample Principal Component Approximation. 9. Factor Analysis and Inference for Structured Covariance Matrices. The Orthogonal Factor Model. Methods of Estimation. Factor Rotation. Factor Scores. Perspectives and a Strategy for Factor Analysis. Structural Equation Models. Supplement 9A Some Computational Details for Maximum Likelihood Estimation. 10. Canonical Correlation Analysis. Canonical Variates and Canonical Correlations. Interpreting the Population Canonical Variables. The Sample Canonical Variates and Sample Canonical Correlations. Additional Sample Descriptive Measures. Large Sample Inferences. IV. CLASSIFICATION AND GROUPING TECHNIQUES. 11. Discrimination and Classification. Separation and Classification for Two Populations. Classifications with Two Multivariate Normal Populations. Evaluating Classification Functions.
Fisher's Discriminant Function: Separation of Populations. Classification with Several Populations. Fisher's Method for Discriminating among Several Populations. Final Comments. 12. Clustering, Distance Methods and Ordination. Similarity Measures. Hierarchical Clustering Methods. Nonhierarchical Clustering Methods. Multidimensional Scaling. Correspondence Analysis. Biplots for Viewing Sample Units and Variables. Procrustes Analysis: A Method for Comparing Configurations. Appendix. Standard Normal Probabilities. Student's t-Distribution Percentage Points. χ² Distribution Percentage Points. F-Distribution Percentage Points. F-Distribution Percentage Points (α = .10). F-Distribution Percentage Points (α = .05). F-Distribution Percentage Points (α = .01). Data Index. Subject Index.

11,666 citations


"Identification source of variation ..." refers background in this paper

  • ...For every cluster, it creates a discriminant function (DF) (Johnson and Wichern 1992)....

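Discriminant functions for several groups can be sketched as linear classification scores built from the group means and a pooled within-group covariance matrix. Each observation is assigned to the group with the highest score, and the share of correct assignments gives the "% correct assignations" quoted by the DA studies in this list. A minimal NumPy sketch, assuming equal group covariances, priors equal to group proportions, and at least two variables. It does not implement the forward or backward stepwise variable selection used in the paper, and the cluster labels and data are hypothetical.

```python
import numpy as np


def fit_discriminant_functions(X: np.ndarray, labels: np.ndarray):
    """Linear classification functions d_k(x) = m_k' S^-1 x - 0.5 m_k' S^-1 m_k + ln p_k."""
    classes = np.unique(labels)
    n = len(X)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    # Pooled within-group covariance: group covariances weighted by their degrees of freedom
    pooled = sum(
        (np.sum(labels == c) - 1) * np.cov(X[labels == c], rowvar=False) for c in classes
    ) / (n - len(classes))
    coefs = means @ np.linalg.inv(pooled)             # row k holds m_k' S^-1
    priors = np.array([np.mean(labels == c) for c in classes])
    consts = -0.5 * np.sum(coefs * means, axis=1) + np.log(priors)
    return classes, coefs, consts


def classify(X: np.ndarray, classes: np.ndarray, coefs: np.ndarray, consts: np.ndarray) -> np.ndarray:
    scores = X @ coefs.T + consts                     # one score column per group
    return classes[np.argmax(scores, axis=1)]


# Hypothetical two-variable data for three clusters of stations
rng = np.random.default_rng(1)
centres = {"LPS": (0.0, 0.0), "MPS": (5.0, 0.0), "SHPS": (0.0, 5.0)}
X = np.vstack([rng.normal(loc=c, size=(30, 2)) for c in centres.values()])
labels = np.repeat(list(centres), 30)

model = fit_discriminant_functions(X, labels)
predicted = classify(X, *model)
print(f"correct assignations: {np.mean(predicted == labels) * 100:.1f}%")
```

Scoring the same data used to fit the functions overstates accuracy, so a leave-one-out or held-out check is the usual next step.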


Journal Article

10,147 citations


Journal Article
TL;DR: This study illustrates the usefulness of multivariate statistical techniques for analyzing and interpreting complex data sets, assessing water quality, identifying pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.
Abstract: Multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), were applied to evaluate temporal/spatial variations and interpret a large, complex water quality data set of the Fuji river basin, generated during 8 years (1995–2002) of monitoring 12 parameters at 13 different sites (14,976 observations). Hierarchical cluster analysis grouped the 13 sampling sites into three clusters, i.e., relatively less polluted (LP), medium polluted (MP) and highly polluted (HP) sites, based on the similarity of water quality characteristics. Factor analysis/principal component analysis, applied to the data sets of the three groups obtained from cluster analysis, resulted in five, five and three latent factors explaining 73.18, 77.61 and 65.39% of the total variance in the water quality data sets of the LP, MP and HP areas, respectively. The varifactors obtained from factor analysis indicate that the parameters responsible for water quality variations are mainly related to discharge and temperature (natural) and organic pollution (point source: domestic wastewater) in relatively less polluted areas; organic pollution (point source: domestic wastewater) and nutrients (non-point sources: agriculture and orchard plantations) in medium polluted areas; and organic pollution and nutrients (point sources: domestic wastewater, wastewater treatment plants and industries) in highly polluted areas of the basin. Discriminant analysis gave the best results for both spatial and temporal analysis.
It provided substantial data reduction, using only six parameters (discharge, temperature, dissolved oxygen, biochemical oxygen demand, electrical conductivity and nitrate nitrogen) to afford more than 85% correct assignations in temporal analysis, and seven parameters (discharge, temperature, biochemical oxygen demand, pH, electrical conductivity, nitrate nitrogen and ammoniacal nitrogen) to afford more than 81% correct assignations in spatial analysis of three different sampling sites of the basin. Therefore, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. Thus, this study illustrates the usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.

1,292 citations


"Identification source of variation ..." refers methods in this paper

  • ...The measure is multiplied by 100 to standardize the linkage distance represented on the y-axis (Shrestha and Kazama, 2007)....

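In HACA dendrograms of this kind, the linkage distance is commonly reported as (Dlink/Dmax) × 100: the distance at which two clusters merge, divided by the largest merge distance, then multiplied by 100. That this is "the measure" in the snippet is an assumption. A minimal sketch with SciPy's `linkage` (Ward's method on z-scored data, also an assumption) and `fcluster` to cut the tree into three clusters, mirroring the three regions in the abstract. The station data are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def rescaled_ward_linkage(data: np.ndarray):
    """Ward linkage on z-scored columns, plus a copy with distances as (Dlink/Dmax) x 100."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    Z = linkage(z, method="ward")                 # column 2 holds the merge distances
    Z_scaled = Z.copy()
    Z_scaled[:, 2] = Z[:, 2] / Z[:, 2].max() * 100.0
    return Z, Z_scaled


# Hypothetical matrix: 12 stations x 3 parameters, built from three groups of 4 stations
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=level, scale=0.5, size=(4, 3)) for level in (0.0, 10.0, 20.0)])

Z, Z_scaled = rescaled_ward_linkage(data)
clusters = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into at most 3 clusters
print(clusters)
print(Z_scaled[:, 2].round(1))
```

Rescaling only changes the y-axis units, so cutting the tree on `Z` or `Z_scaled` gives the same clusters.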


Journal Article
TL;DR: This study shows the necessity and usefulness of multivariate statistical techniques for evaluating and interpreting large, complex data sets to obtain better information about water quality and to design monitoring networks for effective management of water resources.
Abstract: This case study reports different multivariate statistical techniques applied to evaluate temporal/spatial variations and interpret a large, complex water-quality data set obtained while monitoring the Gomti River in northern India. Water quality of the Gomti River, a major tributary of the Ganga River, was monitored regularly at eight sites in relatively low, moderate and high pollution regions over 5 years (1994–1998) for 24 parameters. The complex data matrix (17,790 observations) was treated with multivariate techniques such as cluster analysis, factor analysis/principal component analysis (FA/PCA) and discriminant analysis (DA). Cluster analysis (CA) gave good results, rendering three groups of similar sampling sites that reflect the different water-quality parameters of the river system. FA/PCA identified six factors responsible for the data structure, explaining 71% of the total variance of the data set; it allowed the selected parameters to be grouped according to common features and the incidence of each group on the overall variation in water quality to be evaluated. However, significant data reduction was not achieved, as 14 parameters were needed to explain 71% of both the temporal and spatial changes in water quality. Discriminant analysis gave the best results for data reduction and pattern recognition in both temporal and spatial analysis: five parameters (pH, temperature, conductivity, total alkalinity and magnesium) afforded more than 88% correct assignations in temporal analysis, while nine parameters (pH, temperature, alkalinity, Ca-hardness, DO, BOD, chloride, sulfate and TKN) afforded 91% correct assignations in spatial analysis of three different regions in the basin.
Thus, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. This study shows the necessity and usefulness of multivariate statistical techniques for evaluating and interpreting large, complex data sets to obtain better information about water quality and to design monitoring networks for effective management of water resources.

1,251 citations


"Identification source of variation ..." refers background in this paper

  • ...It presents details on the most significant variables underlying spatial and temporal variations, separating them from the less significant variables with minimal loss of the original information (Singh et al., 2004; 2005; Azid et al., 2015)....
