
Showing papers on "Correspondence analysis" published in 2016


Journal ArticleDOI
TL;DR: The missMDA package presented in this paper performs principal component methods on incomplete data sets, aiming to obtain scores, loadings and graphical representations despite missing values, and can be used to perform single imputation to complete data involving continuous, categorical and mixed variables.
Abstract: We present the R package missMDA which performs principal component methods on incomplete data sets, aiming to obtain scores, loadings and graphical representations despite missing values. Package methods include principal component analysis for continuous variables, multiple correspondence analysis for categorical variables, factorial analysis on mixed data for both continuous and categorical variables, and multiple factor analysis for multi-table data. Furthermore, missMDA can be used to perform single imputation to complete data involving continuous, categorical and mixed variables. A multiple imputation method is also available. In the principal component analysis framework, variability across different imputations is represented by confidence areas around the row and column positions on the graphical outputs. This allows assessment of the credibility of results obtained from incomplete data sets.

758 citations


01 Jan 2016
TL;DR: All the ordination methods that are commonly used, for example, Principal Components Analysis and all variants of Correspondence Analysis as well as standard cluster analyses such as Ward's method and group average clustering, are inappropriate when using AD data and the application of ordinal clustering and scaling methods to traditional phytosociological data is advocated.
Abstract: This article investigates whether the Braun-Blanquet abundance/dominance (AD) scores that commonly appear in phytosociological tables can properly be analysed by conventional multivariate analysis methods such as Principal Components Analysis and Correspondence Analysis. The answer is a definite NO. The source of problems is that the AD values express species performance on a scale, namely the ordinal scale, on which differences are not interpretable. There are several arguments suggesting that no matter which methods have been preferred in contemporary numerical syntaxonomy and why, ordinal data should be treated in an ordinal way. In addition to the inadmissibility of arithmetic operations with the AD scores, these arguments include interpretability of dissimilarities derived from ordinal data, consistency of all steps throughout the analysis and universality of the method which enables simultaneous treatment of various measurement scales. All the ordination methods that are commonly used, for example, Principal Components Analysis and all variants of Correspondence Analysis as well as standard cluster analyses such as Ward's method and group average clustering, are inappropriate when using AD data. Therefore, the application of ordinal clustering and scaling methods to traditional phytosociological data is advocated. Dissimilarities between relevés should be calculated using ordinal measures of resemblance, and ordination and clustering algorithms should also be ordinal in nature. A good ordination example is Nonmetric Multidimensional Scaling (NMDS) as long as it is calculated from an ordinal dissimilarity measure such as the Goodman & Kruskal γ coefficient, and for clustering the new OrdClAn-H and OrdClAn-N methods.

124 citations
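
The abstract above names the Goodman & Kruskal γ coefficient as an ordinal measure of resemblance. As a minimal sketch (not the article's own code), γ can be computed from concordant and discordant pairs using order comparisons only, so no arithmetic is performed on the AD scores themselves. The function name and the integer coding of the scores are assumptions made for illustration; how the article converts γ into a dissimilarity is not stated in the abstract and is not attempted here.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman & Kruskal gamma for two equally long sequences of ordinal codes.

    Only order comparisons are used; pairs tied on either variable are ignored.
    """
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign_x = (x1 > x2) - (x1 < x2)  # -1, 0 or +1
        sign_y = (y1 > y2) - (y1 < y2)
        if sign_x * sign_y > 0:
            concordant += 1
        elif sign_x * sign_y < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical ordinal codes of four species in two relevés.
releve_a = [1, 2, 3, 4]
releve_b = [1, 3, 2, 4]
gamma_ab = goodman_kruskal_gamma(releve_a, releve_b)
```

Here five of the six species pairs are ordered the same way in both relevés and one pair is reversed, so `gamma_ab` equals (5 - 1) / 6.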


Journal ArticleDOI
TL;DR: The history of multiple correspondence analysis (MCA) is a curious one: in about 80 years, it has been invented and re-invented by different authors independently of each other. After a brief historical account, the article compares the techniques provided by two main schools: the French and the Dutch.
Abstract: The history of multiple correspondence analysis (MCA) is a curious one: in about 80 years, it has been invented and re-invented by different authors independently of each other. After a brief historical account of MCA, the present article compares the various techniques based on the multiple correspondence analysis systems provided by two main schools: the French and the Dutch.

93 citations


Journal ArticleDOI
TL;DR: In this article, correspondence analysis is applied to a subset of response categories from a questionnaire survey (e.g., undecided responses, or the responses for a particular category across several questions) while maintaining the original relative frequencies of the categories rather than re-expressing them relative to totals within the subset.
Abstract: This study shows how correspondence analysis may be applied to a subset of response categories from a questionnaire survey (e.g., the subset of undecided responses or the subset of responses for a particular category across several questions). The idea is to maintain the original relative frequencies of the categories and not reexpress them relative to totals within the subset, as would normally be done in a regular correspondence analysis of the subset. Furthermore, the masses and chi-square distances assigned to the subset of categories are the same as those in the correspondence analysis of the whole data set, which leads to a decomposition of total variance into parts if the whole data set is subdivided into disjoint subsets. This variant of the method, called subset correspondence analysis, is illustrated on data from the International Social Survey Programme’s Family and Changing Gender Roles survey.

48 citations
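
The decomposition property described in the abstract above can be illustrated numerically. The following is a minimal sketch, assuming that subset correspondence analysis amounts to a singular value decomposition of the selected columns of the standardized residual matrix computed from the whole table, so that masses and chi-square distances are those of the full analysis. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
import numpy as np


def standardized_residuals(table):
    """Standardized residuals of a two-way table, using full-table masses."""
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()            # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)
    return (P - expected) / np.sqrt(expected)


def subset_inertias(table, columns):
    """Principal inertias of the analysis restricted to the given columns."""
    S = standardized_residuals(table)
    return np.linalg.svd(S[:, columns], compute_uv=False) ** 2


# Toy contingency table: 3 respondent groups by 4 response categories.
counts = np.array([[30, 12, 8, 10],
                   [15, 25, 10, 20],
                   [5, 13, 22, 30]])

total = subset_inertias(counts, [0, 1, 2, 3]).sum()
part_a = subset_inertias(counts, [0, 1]).sum()
part_b = subset_inertias(counts, [2, 3]).sum()
```

Because the residuals are never re-expressed relative to subset totals, the inertias of the two disjoint column subsets (`part_a` and `part_b`) add up to the total inertia of the whole table.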


Journal ArticleDOI
TL;DR: In this paper, the authors proposed the use of the Dirichlet distribution and presented a new approach to perform age-at-death multivariate graphical comparisons using domestic sheep/goat dental remains from 10 Cardial sites (Early Neolithic) located in South France and the Iberian Peninsula.

28 citations


Journal ArticleDOI
TL;DR: This work formally defines and illustrates, in a tutorial format, how partial least squares correspondence analysis extends to various types of data and design problems that are particularly relevant for psychological research that include genetic data.
Abstract: For nearly a century, detecting the genetic contributions to cognitive and behavioral phenomena has been a core interest for psychological research. Recently, this interest has been reinvigorated by the availability of genotyping technologies (e.g., microarrays) that provide new genetic data, such as single nucleotide polymorphisms (SNPs). These SNPs, which represent pairs of nucleotide letters (e.g., AA, AG, or GG) found at specific positions on human chromosomes, are best considered as categorical variables, but this coding scheme can make difficult the multivariate analysis of their relationships with behavioral measurements, because most multivariate techniques developed for the analysis between sets of variables are designed for quantitative variables. To palliate this problem, we present a generalization of partial least squares (a technique used to extract the information common to 2 different data tables measured on the same observations), called partial least squares correspondence analysis, that is specifically tailored for the analysis of categorical and mixed ("heterogeneous") data types. Here, we formally define and illustrate, in a tutorial format, how partial least squares correspondence analysis extends to various types of data and design problems that are particularly relevant for psychological research that include genetic data. We illustrate partial least squares correspondence analysis with genetic, behavioral, and neuroimaging data from the Alzheimer's Disease Neuroimaging Initiative. R code is available on the Comprehensive R Archive Network and via the authors' websites.

23 citations


Journal ArticleDOI
TL;DR: Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns; it constructs linear combinations of variables, known as factors.
Abstract: Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations.
Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly by the ability of relating individual patterns with their corresponding characteristic variables.

22 citations
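
The description of CA in the abstract above (a two-way table reduced to a few orthogonal factors) can be sketched with the standard textbook computation: a singular value decomposition of the standardized residuals of the table. This is a minimal illustration under that assumption, not the review's own code; the function name and the toy table are hypothetical.

```python
import numpy as np


def correspondence_analysis(table):
    """Simple CA of a two-way table of non-negative counts.

    Returns principal coordinates of rows and columns and the principal
    inertias (squared singular values) of each factor.
    """
    counts = np.asarray(table, dtype=float)
    P = counts / counts.sum()                      # correspondence matrix
    r = P.sum(axis=1)                              # row masses
    c = P.sum(axis=0)                              # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)         # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]    # rows in principal coordinates
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2


# Toy 3 x 3 contingency table.
table = np.array([[20, 10, 5],
                  [10, 20, 10],
                  [5, 10, 30]])
row_coords, col_coords, inertia = correspondence_analysis(table)
```

The principal inertias sum to the Pearson chi-square statistic of the table divided by the grand total, which is the "measure of association between rows and columns" that the factors decompose.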


Journal ArticleDOI
TL;DR: In this paper, the authors presented a method for mapping habitat indices across networks using semi-quantitative data and a geostatistical technique called regression kriging, which consists of the derivation of habitat indices using multivariate statistical techniques that are regressed on map-based covariates such as altitude, slope and geology.

21 citations


01 Jan 2016
TL;DR: This correspondence analysis and data coding with java and r helps people to read a good book with a cup of coffee in the afternoon, instead they are facing with some harmful virus inside their desktop computer.
Abstract: Thank you for downloading correspondence analysis and data coding with java and r. Maybe you have knowledge that, people have search numerous times for their chosen books like this correspondence analysis and data coding with java and r, but end up in harmful downloads. Rather than reading a good book with a cup of coffee in the afternoon, instead they are facing with some harmful virus inside their desktop computer.

14 citations


Journal ArticleDOI
TL;DR: In this paper, the authors apply mixed techniques to integrate qualitative and quantitative data to analyze cultural dimensionality. In a sequential research design, the data obtained in qualitative interviews, free-listing, and focus groups are validated by using the triangulation method and further formalized to create survey items.
Abstract: While most research on cultural models isolates qualitative and quantitative methodologies, this study applies mixed techniques to integrate qualitative and quantitative data to analyze cultural dimensionality. In a sequential research design, the data obtained in qualitative interviews, free-listing, and focus groups are validated by using triangulation method and further formalized to create survey items. Correspondence analysis of multi-item scales is then employed to explore the Swedish cultural model of prosocial cooperation and to represent the dynamic relationship among its facets and different categories of informants. Besides its contents-rich structured output, the advantages of this technique include increased reliability of qualitative findings. The study illustrates applicability of mixed methods to research the patterns of stability and change in cultural models.

13 citations


Book ChapterDOI
01 Jan 2016

Journal ArticleDOI
TL;DR: This paper provides R functions that elaborate more fully on the code presented in Beh and Lombardo (2014) and provide the flexibility for constructing either a classical correspondence plot or a biplot graphical display.
Abstract: This paper presents the R package CAvariants (Lombardo and Beh, 2017). The package performs six variants of correspondence analysis on a two-way contingency table. The main function, which shares the same name as the package (CAvariants), allows the user to choose (via a series of input parameters) from six different correspondence analysis procedures. These include the classical approach to (symmetrical) correspondence analysis, singly ordered correspondence analysis, doubly ordered correspondence analysis, non symmetrical correspondence analysis, singly ordered non symmetrical correspondence analysis and doubly ordered non symmetrical correspondence analysis. The code provides the flexibility for constructing either a classical correspondence plot or a biplot graphical display. It also allows the user to consider other important features that help assess the reliability of the graphical representations, such as the inclusion of algebraically derived elliptical confidence regions. This paper provides R functions that elaborate more fully on the code presented in Beh and Lombardo (2014).

Book ChapterDOI
01 Jan 2016
TL;DR: This work introduces a formulation in which the quantified data matrix is approximated by a lower-rank matrix using the quantification technique proposed by Murakami et al. (Non-metric principal component analysis for categorical variables with multiple quantifications, 1999).
Abstract: Multiple correspondence analysis (MCA) is a widely used technique to analyze categorical data and aims to reduce large sets of variables into smaller sets of components that summarize the information contained in the data. The purpose of MCA is the same as that of principal component analysis (PCA), and MCA can be regarded as an adaptation to the categorical data of PCA (Jolliffe, Principal Component Analysis, 2002). There are various approaches to formulate an MCA. We introduce a formulation in which the quantified data matrix is approximated by a lower-rank matrix using the quantification technique proposed by Murakami et al. (Non-metric principal component analysis for categorical variables with multiple quantifications, 1999).

Journal ArticleDOI
TL;DR: A new multi-step approach, “Standardized Factor Analysis”, is proposed; it relies on geometric analysis and uses linear regression in a second stage in order to uncover structural effects in the original space, and the paper raises a more general set of questions about causality.
Abstract: Since their introduction in the late 1960s, the “moderate”, and moreover “metrological” and “hypermetrological” uses of regression models quickly became the dominant quantitative approach in the Anglo-Saxon social sciences. This “sociology of the variables” has been the subject of many critical insights, with little impact on its dominance. By contrast, the French situation is quite different, mainly because of the strong association between Pierre Bourdieu’s research program and the correspondence analysis methods. In this context, the relationship between geometric data analysis and regression models has turned into a “dialogue of the deaf”. Complementarity is sometimes emphasized, correspondence analysis being associated with exploration and description of the data, and regressions being used to explain, reject or confirm assumptions. But regression models may also be used in order to analyze structural effects within a framework of geometrical data analysis, e.g. by visualizing graphically the results of a regression (Rouanet et al. 2002; Lebaron 2013). We propose a new multi-step approach, “Standardized Factor Analysis”, which relies on geometric analysis and uses linear regression in a second stage in order to uncover structural effects in the original space. We illustrate it with data about tastes for cinema in France. We conclude by raising a more general set of questions about causality: social determinisms, even well established, are partial in the sense that they produce their effects only when associated with each other.

Book ChapterDOI
01 Jan 2016
TL;DR: This work proposes an extension of correspondence analysis to multiple levels, incorporating multiple relations and attributes, and shows how results serve as an exploratory stepping-stone for generating hypotheses to be tested in a more focused manner using confirmatory techniques such as p*/ERGM.
Abstract: Social actors are often nested within multiple levels that share several members, giving rise to multimodal data. Such data are complex if the actor-nesting is not mutually exclusive. We use affiliation networks to represent teams and individuals, with links representing team membership; social relations between individuals are represented using one-mode networks. We propose an extension of correspondence analysis to multiple levels, incorporating multiple relations and attributes, and demonstrate it with two illustrative examples. We also show how results serve as an exploratory stepping-stone for generating hypotheses to be tested in a more focused manner using confirmatory techniques such as p*/ERGM.

Journal ArticleDOI
TL;DR: In this article, the reconstruction of Burt's table, Joint Correspondence Analysis (JCA), and Gower & Hand's (1996) Extended Matching Coefficient (EMC) are compared to Multiple Correspondence Analysis (MCA) in order to check the quality of the methods.
Abstract: In this work, the reconstruction of the Burt's table, Greenacre's (1988) Joint Correspondence Analysis (JCA), and Gower & Hand's (1996) Extended Matching Coefficient (EMC) are compared to Multiple Correspondence Analysis (MCA) in order to check the quality of the methods. In particular, the ability to reconstruct the whole table is considered separately for the diagonal and the off-diagonal tables, that is, the ability to describe each character's distribution, the interaction between pairs of characters, or both. The theoretical aspects are discussed first, and finally the results obtained in an application are shown and discussed.


01 Jan 2016
TL;DR: The overall results from simulation study show that the smoothed location model performed better when the binary extraction is done by JCA rather than the Indicator MCA in terms of misclassification rate and computational efficiency.
Abstract: The non-parametric smoothed location model is another powerful approach which can be used to discriminate objects that contain both continuous and binary variables. However, the smoothed location model is infeasible for estimating parameters when a large number of binary variables are involved in the study. To handle this issue, a combination of two variable extraction techniques, namely principal component analysis (PCA) and multiple correspondence analysis (MCA), is carried out before the construction of the smoothed location model. In fact, there are four types of MCA, but only Indicator MCA and joint correspondence analysis (JCA) will be discussed in this article. Thus, the performance of the smoothed location model together with the combination of PCA and two types of MCA, i.e. Indicator MCA and JCA, will be compared and evaluated. The overall results from the simulation study show that the smoothed location model performed better when the binary extraction is done by JCA rather than the Indicator MCA in terms of misclassification rate and computational efficiency.

Journal ArticleDOI
TL;DR: This paper will provide researchers with the theoretical and practical foundations for understanding and applying correspondence analysis to their own research agendas with a focus on categorical data.
Abstract: Correspondence analysis is a statistical method that allows researchers to explore relationships among complex categorical variables. This paper will provide researchers with the theoretical and practical foundations for understanding and applying correspondence analysis to their own research agendas. Problem: Technical communicators use a variety of research methods and collect a variety of types of data. Of particular interest to technical communicators is categorical data, or data that are not traditionally quantitative. For instance, technical communicators often collect and analyze language data from a variety of texts. Analyzing this type of data can be difficult using traditional statistical methods. Key concepts: Variable types, a priori versus exploratory research designs, contingency tables, and data visualization are central to understanding the foundations of correspondence analysis. Key lessons: To conduct correspondence analysis, a researcher must walk through a series of steps including: (1) determining whether correspondence analysis is appropriate, (2) choosing a statistical software package, (3) running the correspondence analysis, and (4) interpreting and applying the results. Implications for practice: While correspondence analysis provides many useful insights into categorical data, a researcher must consider several things when deciding to use correspondence analysis. These include the potential to misinterpret and misapply the results of a correspondence analysis.

Journal ArticleDOI
TL;DR: In this paper, all the features of quantitative biplots are found in qualitative bi-plots, but calibrated interpolation axes become labeled category-level points and calibrated prediction axes become prediction regions.
Abstract: A previous paper, Biplots: Quantitative data, dealt exclusively with biplots for quantitative data. This paper is mainly concerned with qualitative data or data in the form of counts. Qualitative data can be nominal or ordinal, and it is usually reported in a coded numerical form. In the analysis of qualitative data, many methods can be grouped as quantification methods e.g., categorical principal component analysis, correspondence analysis, multiple correspondence analysis, homogeneity analysis: transforming qualities into quantitative values that may then be treated with quantitative methods. All the features of quantitative biplots are found in qualitative biplots, but calibrated interpolation axes become labeled category-level points and calibrated prediction axes become prediction regions. Interpretation remains in terms of distance, inner products, and sometimes area. WIREs Comput Stat 2016, 8:82-111. doi: 10.1002/wics.1377

Journal Article
TL;DR: The compatibility analysis was performed according to purchase frequency and the preferred status in the order cycle of these determined products, and the explanation ratio of the first two dimensions of the corresponding variable was found to be 97.2%.
Abstract: Correspondence analysis is an analysis method that facilitates interpretation of categorical variables in cross tables (correspondence table, contingency table) as well as the similarities, divergences and associations between the row and column variables, and represents these associations graphically in a lower-dimensional space. It is a method quite popular especially in fields that require analysis of categorical data such as medicine, health sciences, biometrics, economics, marketing and social sciences. Correspondence analysis has some similarities to other multivariate methods such as principal component analysis, log-linear analysis and multi-dimensional scaling. The five items most purchased by public institutions were determined in order to demonstrate the purchase profile of the "Hospital Equipment, Furnishings and Equipment" product line in Turkey. These are wheelchairs, companion seats, emergency stretchers, wheeled patient nightstands and examination tables. The compatibility analysis was performed according to purchase frequency and the preferred status in the order cycle of these determined products. As a result of the multiple correspondence analyses, the explanation ratio of the first two dimensions of the corresponding variable was found to be 97.2%. It is observed that the preferred state of the product type is primary with a ratio of 87% and the number of product orders is secondary with a ratio of 10.2%.

01 Jan 2016
TL;DR: In this article, principal component analysis is extended to allow for multi-state discrete characters as well as continuous characters, and a taxonomic example is given to illustrate the method in practice. However, taxonomic structure may be better inferred directly from the ordination, rather than by constructing a taxonomic distance measure.
Abstract: It is shown how principal component analysis can be extended to allow for multi-state discrete characters as well as continuous characters. When all the characters are discrete, the proposed extension reduces to correspondence analysis. A taxonomic example is given to illustrate the method in practice. The technique allows the estimation of a taxonomic distance between objects which have been scored for multi-state characters. However, taxonomic structure may be better inferred directly from the ordination, rather than by constructing a taxonomic distance measure.

01 Jan 2016
TL;DR: In this paper, the ability of two methods, TWINSPAN and COCKTAIL, to produce similar classifications of wet meadows (Calthion, incl. Filipendulenion) for Germany (7909 relevés) and the Czech Republic (1287 relevés) was tested.
Abstract: In European phytosociology, national classifications of corresponding vegetation types show considerable differences even between neighbouring countries. Therefore, the European Vegetation Survey project urgently needs numerical classification methods for large data sets that are able to produce compatible classifications using data sets from different countries. We tested the ability of two methods, TWINSPAN and COCKTAIL, to produce similar classifications of wet meadows (Calthion, incl. Filipendulenion) for Germany (7909 relevés) and the Czech Republic (1287 relevés) in this respect. In TWINSPAN, the indicator ordination option was used for classification of two national data sets, and the extracted assignment criteria (indicator species) were applied crosswise from one to the other national data set. Although the data sets presumably contained similar community types, TWINSPAN revealed almost no correspondence between the groups derived from the proper classification of the national data set and the groups defined by the assignment criteria taken from the other national data set. The reason is probably the difference in structure between the national data sets, which is a typical, but hardly avoidable, feature of any pair of phytosociological data sets. As a result, the first axis of the correspondence analysis, and consequently the first TWINSPAN division, are associated with different environmental gradients; the difference in the first division is transferred and multiplied further down the hierarchy. COCKTAIL is a method which produces relevé groups on the basis of statistically formed species groups. The user determines the starting points for the formation of species groups, and groups already found in one data set can be tested for existence in the other data set. The correspondence between the national classifications produced by COCKTAIL was fairly good.
For some relevé groups, the lack of correspondence to groups in the other national data set could be explained by the absence of the corresponding vegetation types in one of the countries, rather than by methodological problems.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: This work considers multivariate time series data and proposes an unsupervised learning technique to identify the top-k discriminative features, reporting increased accuracy and efficiency compared to major existing techniques as well as improved classification accuracy.
Abstract: Feature selection is important for dimensionality reduction, analysis, and pattern discovery applications. We consider multivariate time series data and propose an unsupervised learning technique to identify the top-k discriminative features. The proposed technique uses statistics drawn from the Principal Component Analysis (PCA) of the input data to leverage the relative importance of the principal components along with the coefficients within the principal directions of the data to uncover the ranking of the features. We conduct numerous experiments using various benchmark datasets to study the performance of the proposed technique in terms of the discriminant power of the selected features and its ability to minimize the original data reconstruction error. Compared to major existing techniques, our results indicate increased accuracy and efficiency. We also show that our technique yields improved classification accuracy.
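
The abstract above describes ranking features from the relative importance of the principal components together with the coefficients within the principal directions, without giving the exact formula. The following is one possible reading, offered only as a sketch: each feature is scored by the sum of its absolute loadings weighted by the explained-variance ratio of each component. The scoring rule, the function name, and the treatment of rows as time points are all assumptions for illustration and are not the paper's method.

```python
import numpy as np


def rank_features_by_pca(X, top_k):
    """Rank the columns of X by variance-weighted absolute PCA loadings.

    ASSUMED scoring rule (not taken from the paper):
        score_j = sum_k explained_ratio_k * |loading_jk|
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions; sv**2 is proportional to variance.
    _, sv, Vt = np.linalg.svd(centered, full_matrices=False)
    explained_ratio = sv ** 2 / np.sum(sv ** 2)
    scores = np.abs(Vt).T @ explained_ratio
    order = np.argsort(scores)[::-1]      # highest score first
    return order[:top_k], scores


# Synthetic series: 200 time points, 3 features with very different spreads.
rng = np.random.default_rng(0)
series = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 0.1])
top_features, feature_scores = rank_features_by_pca(series, top_k=2)
```

In this synthetic example the first feature carries almost all of the variance, so it receives the highest score; on real data the columns would normally be standardized first if their units differ.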

Proceedings ArticleDOI
09 Apr 2016
TL;DR: The kernel principal component analysis method is used to improve on the original approach, construct the artist evaluation model, and improve the accuracy.
Abstract: Principal component analysis is a multivariate statistical method that transforms a number of related indexes into a few unrelated indicators; it is often used in data compression and feature extraction and is widely used in industry, agriculture, economy, biology, medicine, astronomy, geography and other fields. In classical principal component analysis, each training datum plays the same role in constructing the principal components. However, in many practical problems, the significance and effect of the training data differ, and usually some of the data are more important than others. The important data deserve more attention and should play a greater role in the construction of the principal components, while data that may not be credible, such as abnormal data, should have their role limited. In this paper, each training datum is given a confidence weight, so that the training data are treated as fuzzy points in the sample space, and on this basis principal component analysis and kernel principal component analysis of fuzzy point data are studied. An analysis method based on principal component analysis with objective weights is presented, and the method is applied to the evaluation of the value of the artist's creation. The paper analyses kernel principal component analysis (KPCA), in which the input data space X is mapped onto a new high-dimensional feature space H; although nonlinear feature extraction can then be achieved, outliers are still present. In the feature space, we can reduce the effect of outliers and avoid this drawback of traditional principal component analysis, so the method has the advantages of being robust and nonlinear. We show that the fuzzy membership degree is improved, and the accuracy rate is 84%. We use the kernel principal component analysis method to improve on the original approach, construct the artist evaluation model, and improve the accuracy.

Journal ArticleDOI
TL;DR: In this article, the authors used correspondence analysis to diagnose the coexistence of category variables in antecedents of innovativeness, with the positions of the respondents representing various medical professions in hospitals.
Abstract: The aim of the study presented in this article is to show correspondence analysis as a method useful in the diagnosis of coexistence of category variables in antecedents of innovativeness, with the positions of the respondents representing various medical professions in hospitals. Primary data obtained in the course of empirical research, carried out using a questionnaire study on a sample of 459 respondents representing 8 public hospitals in Poland, is used to this aim. To follow up on the achievements of the analysis, literature on the issue of innovativeness and its antecedents was also used. The results of the correspondence analysis allow one to confirm the thesis of the different opinions of doctors, nurses/midwives and managers regarding the level of significance of antecedents of innovativeness, where for doctors and managers in this context the most important is financial optimization, and for nurses the improvement of the quality of medical services. The results may provide an important clue to the chief executives of hospitals in the context of further changes and innovativeness necessary to achieve the desired efficiency of these organizations.

Posted Content
TL;DR: In this paper, the authors use low level statistical tools based on purposefully written functions that search for unobservable patterns (clusters) that may not be apparent at higher levels of aggregation in publishable data.
Abstract: Availability of large datasets of (often) sensitive data at the level of Statistics Department imposes the obligation to verify their quality and numerical consistency. However, such circumstances offer a chance to use the low level statistical tools based on purposefully written functions that search for unobservable patterns (clusters) that may not be apparent at higher levels of aggregation in publishable data. The versatile statistical tools created in R environment, relying in principle on e.g. cluster analysis and correspondence analysis, may serve in many fields as a link between micro and macro level analysis, with additional possibility to create the automatic documentation of the R code and results. The working example covers analysis of the saving behaviours based on household budget surveys.

Journal ArticleDOI
TL;DR: In this paper, the authors provide an overview of the main scoring schemes focusing on the advantages and the statistical properties; they pay special attention to the impact of the chosen scores on the C statistic of CATANOVA and the graphical representations of doubly ordered non-symmetrical correspondence analysis.
Abstract: In the context of categorical data analysis, the CATegorical ANalysis Of Variance (CATANOVA) has been proposed to analyse the scheme variable-factor, both for nominal and ordinal variables. This method is based on the C statistic and allows to test the statistical significance of the tau index using its relationship with the C statistic. Through Emerson orthogonal polynomials (EOP) a useful decomposition of C statistic into bivariate moments (location, dispersion and higher order components) has been developed. In the construction of EOP the categories are replaced by scores, typically natural scores. In the paper, we provide an overview of the main scoring schemes focusing on the advantages and the statistical properties; we pay special attention to the impact of the chosen scores on the C statistic of CATANOVA and the graphical representations of doubly ordered non-symmetrical correspondence analysis. Through a real data example, we show the impact of the scoring schemes and we consider the RV a...