
Showing papers in "Advances in Latent Variables - Methods, Models and Applications in 2013"


Journal Article
TL;DR: This paper analyzes a selection of publications from the international statistical literature, proposing an approach that identifies sophisticated topic descriptors and examining the links between topics and their temporal evolution.
Abstract: In this paper the international statistical literature of the last thirteen years is analyzed. Our aim is to understand, on the one hand, which topics are most common and, on the other, where and by whom they are most developed. We also want to know how the topics are interconnected and how they have evolved. For this purpose we use Scopus as the bibliometric database, focusing on the papers published in 16 journals that are representative of the statistical literature. For the analysis, we apply a topic model approach.

41 citations
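
As a rough illustration of the workflow described above (not the authors' actual pipeline or corpus), the following Python sketch fits a small topic model with scikit-learn and extracts crude topic descriptors; the toy abstracts and the number of topics are assumptions.

```python
# Illustrative topic-model workflow on a toy corpus of abstracts
# (scikit-learn's LDA; the paper's own pipeline and corpus differ).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "latent variable models for longitudinal data",
    "bayesian estimation of item response theory models",
    "dynamic factor models for large panels of time series",
]

# Bag-of-words representation of the corpus.
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)

# Fit LDA with a small number of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Top words per topic serve as crude "topic descriptors".
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-3:][::-1]]
    print(f"topic {k}: {top}")
```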


Journal Article
TL;DR: An approach to describing change over time of the latent process underlying multiple longitudinal outcomes of different types (binary, ordinal, quantitative) that relies on random-effect models and handles individually varying and outcome-specific measurement times.
Abstract: We introduce a flexible latent process model to jointly describe multivariate longitudinal scales measuring the same underlying latent process. The main asset of this model is that it handles multiple types of longitudinal outcomes (quantitative, bounded quantitative and ordinal) and corrects for their metrological properties (ceiling/floor effects but also curvilinearity, i.e., varying sensitivity to change). Specifically, we combine the random-effect approach and the latent variable approach: the latent process trajectory is described by a (structural) linear mixed model, while measurement models combine outcome-specific threshold models for ordinal outcomes and models based on a series of flexible parameterised nonlinear families of transformations for quantitative outcomes. The assets of the flexible latent process model are highlighted through several applications from a large population-based cognitive ageing study.

38 citations


Journal Article
TL;DR: In this paper, the authors used the second wave of SHARE (Survey of Health, Ageing and Retirement in Europe) to assess to what extent the outcomes of well-being analyses based on multidimensional indicators can be affected by the adoption of alternative weighting schemes.
Abstract: There is general consensus that the concept of well-being cannot be comprehensively captured by any traditional unidimensional indicator focusing on a single aspect of individuals' condition, such as income. However, despite this consensus on the necessity of using multidimensional indicators, there is an open debate on how the different dimensions should be weighted within a single well-being indicator. In this paper we use the second wave of SHARE (Survey of Health, Ageing and Retirement in Europe) to assess to what extent the outcomes of well-being analyses based on multidimensional indicators can be affected by the adoption of alternative weighting schemes.

35 citations
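
A minimal sketch of the issue at stake, assuming a toy achievement matrix rather than SHARE data: two alternative weighting schemes applied to the same multidimensional indicator can produce different well-being rankings.

```python
import numpy as np

# Toy achievement matrix: rows = individuals, columns = dimensions
# (e.g. health, income, social ties), all already scaled to [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 3))

# Scheme 1: equal weights across dimensions.
w_equal = np.full(3, 1 / 3)

# Scheme 2: data-driven weights, e.g. proportional to the inverse of each
# dimension's mean achievement (rewarding scarce attainments).
w_inv = 1 / X.mean(axis=0)
w_inv /= w_inv.sum()

for name, w in [("equal", w_equal), ("inverse-mean", w_inv)]:
    print(name, "ranking:", np.argsort(-(X @ w)))  # best to worst
```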


Journal Article
TL;DR: In this article, Anderson and Deistler obtained one-sided representations for the Generalized Dynamic Factor Model (GDFM) without assuming finite dimension of the factor space and constructed corresponding estimators.
Abstract: Factor model methods have recently become extremely popular in the theory and practice of large panels of time series data. Those methods rely on various factor models, all of which are particular cases of the Generalized Dynamic Factor Model (GDFM) introduced by Forni, Hallin, Lippi and Reichlin (2000). That paper, however, relies on Brillinger's dynamic principal components. The corresponding estimators are two-sided filters whose performance at the end of the observation period or for forecasting purposes is rather poor. No such problem arises with estimators based on standard principal components, which have been dominant in this literature. On the other hand, those estimators require the assumption that the space spanned by the factors has finite dimension. In the present paper, we argue that such an assumption is extremely restrictive and potentially quite harmful. Elaborating upon recent results published by Anderson and Deistler in 2008 on singular stationary processes with rational spectrum, we obtain one-sided representations for the GDFM without assuming finite dimension of the factor space. Construction of the corresponding estimators is also briefly outlined.

23 citations
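
For orientation, the sketch below implements only the standard static principal-components estimator, i.e., the finite-dimensional special case whose restrictiveness the paper argues against; the one-sided GDFM representations themselves are not reproduced. The toy one-factor panel is an assumption.

```python
import numpy as np

# Toy panel: T observations of N series driven by one common factor.
rng = np.random.default_rng(1)
T, N = 200, 20
f = rng.standard_normal(T)                        # unobserved common factor
lam = rng.standard_normal(N)                      # loadings
X = np.outer(f, lam) + 0.5 * rng.standard_normal((T, N))

# Standard (static) principal components via the SVD of the centred panel.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
factor_hat = U[:, 0] * s[0]                       # first estimated factor

print("|corr(f, estimate)| =", round(abs(np.corrcoef(f, factor_hat)[0, 1]), 3))
```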


Journal Article
TL;DR: The authors show that ignoring the matching variables in cohort studies produces a certain population causal effect, and that this argument does not carry over to effect estimation in matched case-control studies although it does carry over to null-hypothesis testing.
Abstract: In observational studies of the effect of an exposure on an outcome, the exposure-outcome association is usually confounded by other causes of the outcome. One common method to increase efficiency is to match the study on potential confounders. Matched case-control studies are relatively common and well covered by the literature. Matched cohort studies are less common but increasingly recommended. It is commonly asserted that it is valid to ignore the matching variables in the analysis of matched cohort data. In this paper we provide analyses delineating the scope and limits of this assertion. We show that ignoring the matching variables in cohort studies produces a certain population causal effect. We discuss why the argument does not carry over to effect estimation in matched case-control studies, although it does carry over to null-hypothesis testing. We also show that it does not extend to matched cohort studies when one adjusts for additional confounders.

19 citations
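
A toy simulation (not from the paper) illustrating the central claim: in a cohort matched on a confounder C, the crude estimate that ignores C coincides with the standardized (population) effect over the matched sample.

```python
import numpy as np

# Toy matched-cohort simulation: each exposed subject is paired with an
# unexposed subject sharing the confounder C.
rng = np.random.default_rng(2)
n = 50_000
C = rng.binomial(1, 0.5, n)                       # confounder of each pair
expit = lambda z: 1 / (1 + np.exp(-z))
y_exp = rng.binomial(1, expit(-1.0 + C))          # exposed outcome
y_unexp = rng.binomial(1, expit(-2.0 + C))        # matched control outcome

# Crude risk difference, ignoring the matching variable C:
print("crude RD:       ", y_exp.mean() - y_unexp.mean())

# Standardized RD over the matched population: identical by construction,
# which is why ignoring C is valid here.
w = np.array([(C == 0).mean(), (C == 1).mean()])
rd = np.array([y_exp[C == c].mean() - y_unexp[C == c].mean() for c in (0, 1)])
print("standardized RD:", w @ rd)
```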


Journal Article
TL;DR: In this paper, the authors propose semiparametric methods based on kernel methods and penalized spline models to estimate dose-response functions of a continuous treatment, under the unconfoundedness assumption.
Abstract: We propose semiparametric methods based on kernel methods and penalized spline models to estimate dose-response functions of a continuous treatment, under the unconfoundedness assumption. The dose-response functions are estimated after adjusting for covariate imbalance by using the generalized propensity score.

17 citations
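
As a simplified stand-in for the paper's kernel and penalized-spline estimators, the sketch below follows the standard parametric generalized-propensity-score recipe in the spirit of Hirano and Imbens (model the treatment given covariates, include GPS terms in the outcome regression, then average); all data and polynomial terms are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.standard_normal(n)                        # confounder
t = 0.8 * x + rng.standard_normal(n)              # continuous treatment
y = t + t**2 + x + rng.standard_normal(n)         # outcome

# Step 1: model T | X as normal and evaluate its density (the GPS).
slope_int = np.polyfit(x, t, 1)
resid = t - np.polyval(slope_int, x)
s2 = resid.var()
dens = lambda r: np.exp(-r**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
gps = dens(resid)

# Step 2: flexible outcome regression on (T, GPS) terms.
Z = np.column_stack([np.ones(n), t, t**2, gps, gps**2, t * gps])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Step 3: the dose-response at t0 averages the fit over the GPS at t0.
for t0 in (-1.0, 0.0, 1.0):
    g0 = dens(t0 - np.polyval(slope_int, x))
    Z0 = np.column_stack([np.ones(n), np.full(n, t0), np.full(n, t0**2),
                          g0, g0**2, t0 * g0])
    print(f"mu({t0:+.0f}) =", round((Z0 @ beta).mean(), 2))
```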


Journal Article
TL;DR: In this paper, a modified version of the three-step latent class approach is used to estimate a latent Markov model with individual covariates and possible dropout; the approach represents a useful estimation tool when a large number of observed variables and covariates occurs in the model.
Abstract: We illustrate the use of a modified version of the three-step latent class approach to estimate a latent Markov model with individual covariates and possible dropout. This approach represents a useful estimation tool when a large number of observed variables and covariates occurs in the model. Motivated by a study on the health status of elderly people hosted in Italian nursing homes, we address the problem of dealing with informative missing responses and dropout due to the death of the patient. The proposed model allows us to account for both these types of missingness. We also consider a model in which time-constant and time-varying covariates affect the initial and transition probabilities of the latent process, through a suitable parametrization. The aim of the study is to estimate the effect of each nursing home on the probability of transition between latent states, corresponding to different levels of the health status of the patients, and on the probability of dropout.

15 citations


Journal Article
TL;DR: In this article, Bayesian Markov Switching Generalized Autoregressive Conditional Heteroscedasticity (MS-GARCH) models are proposed for determining the time-varying Minimum Variance hedge ratio in energy futures markets.
Abstract: We propose Bayesian Markov Switching Generalized Autoregressive Conditional Heteroscedasticity (MS-GARCH) models for determining the time-varying Minimum Variance (MV) hedge ratio in energy futures markets. We apply an efficient simulation-based technique for inference and suggest a robust hedging strategy which accounts for model parameter uncertainty. The hedging model is further applied to crude oil and gasoline spot and futures markets.

12 citations
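
The quantity being tracked is the minimum-variance hedge ratio h_t = Cov(dS_t, dF_t) / Var(dF_t). The sketch below computes it with a simple rolling window in place of the paper's Bayesian MS-GARCH conditional covariance estimates, purely to show the mechanics; the simulated returns are assumptions.

```python
import numpy as np
import pandas as pd

# Simulated spot/futures returns; a 60-day rolling window stands in for
# the paper's Bayesian MS-GARCH conditional covariance estimates.
rng = np.random.default_rng(4)
n = 500
fut = rng.standard_normal(n) * 0.02               # futures returns
spot = 0.9 * fut + rng.standard_normal(n) * 0.01  # correlated spot returns
r = pd.DataFrame({"spot": spot, "fut": fut})

# h_t = Cov(spot, futures) / Var(futures), recomputed each day.
window = 60
hedge_ratio = r["spot"].rolling(window).cov(r["fut"]) / r["fut"].rolling(window).var()
print(hedge_ratio.dropna().tail())
```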


Journal Article
TL;DR: In this article, the authors propose a new reliability measurement for polytomous ordinal items, which builds upon the theoretical framework of the classic polychoric correlation coefficient but relaxes its fundamental assumption that the ordinal variables have an underlying multivariate normal distribution.
Abstract: We propose a new reliability measurement for polytomous ordinal items. Conventionally, reliability coefficients, such as Cronbach's Alpha, are calculated using the Pearson correlation matrix. We suggest a modification of the classical Cronbach's Alpha for ordinal variables, using the polychoric correlation coefficient. In particular, we consider the extension of the polychoric correlation coefficient via a copula approach. It builds upon the theoretical framework of the classic polychoric correlation coefficient, but relaxes its fundamental assumption that the ordinal variables have an underlying multivariate normal distribution. A simulation study is conducted in order to compare the proposed index to classical reliability measures.

8 citations
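
A minimal sketch of the mechanics, with hypothetical correlation matrices: the standardized Cronbach alpha formula applied to a polychoric rather than a Pearson matrix gives the ordinal variant. The copula-based polychoric estimation itself, which is the paper's contribution, is not reproduced.

```python
import numpy as np

def standardized_alpha(R):
    """Cronbach's alpha from a correlation matrix R; feeding in a polychoric
    rather than a Pearson matrix yields the ordinal variant."""
    k = R.shape[0]
    r_bar = (R.sum() - k) / (k * (k - 1))         # mean off-diagonal correlation
    return k * r_bar / (1 + (k - 1) * r_bar)

# Hypothetical 4-item matrices: Pearson correlations of coarse ordinal
# items are typically attenuated relative to polychoric ones.
R_pearson = np.full((4, 4), 0.45); np.fill_diagonal(R_pearson, 1.0)
R_polychoric = np.full((4, 4), 0.60); np.fill_diagonal(R_polychoric, 1.0)

print("alpha (Pearson):   ", round(standardized_alpha(R_pearson), 3))
print("alpha (polychoric):", round(standardized_alpha(R_polychoric), 3))
```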


Journal Article
TL;DR: In this article, a Cox regression model is used to evaluate the insolvency risk caused by credits that enter into default, based on the conditional distribution function of the time to default.
Abstract: Credit risk models are used to evaluate the insolvency risk caused by credits that enter into default. Many models for credit risk have been developed over the past few decades. In this paper, we focus on those models that can be formulated in terms of the probability of default by using survival analysis techniques. To write the default probability in terms of the conditional distribution function of the time to default, we use the Cox regression model. We compare, in terms of cross-validation, the results of the survival model with classical models employed in the literature, such as generalised linear models based on logistic regression and nonparametric techniques based on classification trees. An empirical study, based on real data, illustrates the performance of each model.

6 citations
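
A minimal survival-model sketch of the kind described, using the lifelines library on a hypothetical loan table (all column names and values are assumptions, and the cross-validation comparison is omitted):

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical loan table: follow-up time, default indicator, covariate.
loans = pd.DataFrame({
    "months_observed": [12, 40, 7, 60, 25, 33, 18, 50],
    "defaulted":       [1, 0, 1, 0, 1, 0, 1, 0],
    "loan_to_income":  [0.9, 0.3, 1.2, 0.2, 0.8, 0.4, 1.0, 0.3],
})

# Fit the Cox proportional hazards model for time to default.
cph = CoxPHFitter()
cph.fit(loans, duration_col="months_observed", event_col="defaulted")
cph.print_summary()

# S(t | x) = P(no default by t); 1 - S(t | x) is the default probability.
print(cph.predict_survival_function(loans.iloc[:2]))
```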


Journal Article
TL;DR: The authors compare the performance of the HOPIT model with the non-parametric estimators put forward by King et al. (2004) and King and Wand (2007) using data relating to the health domains of mobility and memory from the Survey of Health, Ageing and Retirement in Europe.
Abstract: This paper compares the use of parametric and non-parametric approaches to adjust for heterogeneity in self-reported data. Despite the growing popularity of the HOPIT model to account for reporting heterogeneity when dealing with self-reported categorical data, recent evidence has questioned the validity of this heavily parametric approach. We compare the performance of the HOPIT model with the non-parametric estimators put forward by King et al. (2004) and King and Wand (2007). Using data relating to the health domains of mobility and memory from the Survey of Health, Ageing and Retirement in Europe (SHARE), we perform pairwise country comparisons of self-reported health, objective measures of health, and measures of health adjusted for the presence of reporting heterogeneity. Our study design focuses on comparisons of countries where there exists a discrepancy between the distribution of self-reported data and objective measures of health, and assesses whether vignettes are able to reconcile this difference. Comparisons of distributions are based on first-order stochastic dominance. In general, HOPIT and non-parametric estimation produce similar results in terms of first-order stochastic dominance for the domains of both mobility and memory. Neither method consistently explains discrepancies across countries between self-reported and objective measures of health mobility and memory.

Journal Article
TL;DR: In this article, a nonparametric Item Response Theory model for dichotomously scored items in a Bayesian framework is proposed, in which partitions of the items are defined on the basis of inequality constraints among the latent class success probabilities.
Abstract: We propose a nonparametric Item Response Theory model for dichotomously scored items in a Bayesian framework. Partitions of the items are defined on the basis of inequality constraints among the latent class success probabilities. A Reversible Jump type algorithm is described for sampling from the posterior distribution. A consequence is the possibility to make inference on the number of dimensions (i.e., the number of groups of items measuring the same latent trait) and to cluster items when unidimensionality is violated.

Journal Article
TL;DR: In this article, the authors investigated the role of unobserved frailty in the estimation of mortality differentials from age 50 on by education level, and found that the models without frailty estimated a smaller educational gradient than the models with frailty.
Abstract: Background. This study investigated the role of unobserved frailty in the estimation of mortality differentials from age 50 on by education level. Data. We used data from a 36-year follow-up of the Turin Longitudinal Study, covering 391,170 men and 456,216 women. Methods. As Turin underwent strong immigration flows during the post-war industrialization, the macro-region of birth was also controlled for. We fitted survival analysis models with and without the unobserved heterogeneity component, controlling for mortality improvement from both cohort and period perspectives. Results. We found that, in the majority of cases, the models without frailty estimated a smaller educational gradient than the models with frailty. Conclusions. The results draw attention to the potential underestimation of mortality inequalities by socioeconomic level in survival models that do not control for frailty.
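
A toy simulation (not the Turin data) of the mechanism behind the conclusion: generating survival times with a mean-one gamma frailty and fitting a Cox model that ignores it attenuates the estimated education gradient toward zero. The effect size and frailty variance are assumptions.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Survival times carry an unobserved mean-one gamma frailty.
rng = np.random.default_rng(5)
n = 20_000
edu = rng.binomial(1, 0.5, n)                     # 1 = high education
frailty = rng.gamma(shape=2.0, scale=0.5, size=n) # unobserved, mean one
true_beta = -0.5                                  # protective log-hazard ratio
times = rng.exponential(1 / (frailty * np.exp(true_beta * edu)))

df = pd.DataFrame({"time": times, "event": 1, "edu": edu})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print("true log-HR:", true_beta)
print("estimate ignoring frailty:", round(cph.params_["edu"], 3))  # attenuated
```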

Journal Article
Arthur Tenenhaus
TL;DR: In this article, the authors compare the performance of RGCCA and Partial Least Squares Path Modeling (PLS-PM) for finding relationships between J sets of variables observed on the same set of individuals, taking into account a graph of connections between blocks.
Abstract: Regularized Generalized Canonical Correlation Analysis (RGCCA) and Partial Least Squares Path Modeling (PLS-PM) have been proposed for studying relationships between J sets of variables observed on the same set of individuals, taking into account a graph of connections between blocks. The main goal of this communication is to compare the various options of PLS-PM and RGCCA. Initial comparisons show very similar behavior of the two approaches.

Journal Article
TL;DR: An overview of taxicab correspondence analysis (TCA) of different kinds of datasets, for which the first TCA factor can be interpreted as a simple sum score statistic, is presented.
Abstract: We present an overview of taxicab correspondence analysis (TCA) of different kinds of datasets, for which the first TCA factor can be interpreted as a simple sum score statistic.

Journal Article
TL;DR: This paper proposes two adjustments to the aggregate association index (AAI) that overcome its susceptibility to sample size, and uses Fisher's twin criminal data to demonstrate the application of the AAI and its adjustments, since the true nature of the association between the variables is otherwise at risk of being masked by the magnitude of the sample size.
Abstract: Recently, the aggregate association index (or AAI) was proposed to quantify the strength of the association between two dichotomous variables given only the marginal, or aggregate, data from a 2x2 contingency table. One feature of this index is that it is susceptible to changes in the sample size; as the sample size increases, so too does the AAI, even when the relative distribution of the aggregate data remains unchanged. Therefore the true nature of the association between the variables is at great risk of being masked by the magnitude of the sample size. This paper proposes two adjustments to the AAI that overcome this problem. We consider a simple example using Fisher's twin criminal data to demonstrate the application of the AAI and its adjustments.

Journal Article
TL;DR: A methodology to deal with heterogeneity in modelling when its sources are unknown, presented for PLS-PM latent variable modelling, together with an application to an alumni satisfaction survey.
Abstract: We present in this paper a methodology to deal with heterogeneity in modelling when its sources are unknown. Although the approach is general, we present it for PLS-PM latent variable modelling. We call this approach PATHMOX. The idea behind PATHMOX is to build a tree of path models with a binary-decision-tree-like structure, with models for different segments in each of its nodes. The split criterion consists of an F statistic for comparing structural models, based on testing the equality of the path coefficients. We emphasize the rationale of the approach and its limitations. Finally, we present an application to an alumni satisfaction survey.
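
For intuition, the sketch below implements a plain Chow-type F test for the equality of regression coefficients across two segments; the actual PATHMOX criterion applies this idea to the structural (path) equations of a PLS-PM model. All data are simulated assumptions.

```python
import numpy as np

def chow_f(X1, y1, X2, y2):
    """Chow-type F statistic for equality of regression coefficients
    across two segments (the flavour of split criterion used by PATHMOX
    on the structural path equations)."""
    def rss(X, y):
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        e = y - X @ b
        return e @ e
    Xp, yp = np.vstack([X1, X2]), np.concatenate([y1, y2])
    k, n = X1.shape[1], len(yp)
    r_pooled, r_split = rss(Xp, yp), rss(X1, y1) + rss(X2, y2)
    return ((r_pooled - r_split) / k) / (r_split / (n - 2 * k))

# Toy segments whose first path coefficient differs (1.0 vs 0.2).
rng = np.random.default_rng(6)
X1, X2 = rng.standard_normal((100, 2)), rng.standard_normal((100, 2))
y1 = X1 @ np.array([1.0, 0.5]) + 0.3 * rng.standard_normal(100)
y2 = X2 @ np.array([0.2, 0.5]) + 0.3 * rng.standard_normal(100)
print("F =", round(chow_f(X1, y1, X2, y2), 2))
```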

Journal Article
TL;DR: This work investigates the performance of computationally simple indirect estimators based on auxiliary models that do not require the Kalman filter implementation and considers a large latent factor model, in which the volatilities of common and idiosyncratic factors are conditionally heteroskedastic.
Abstract: A large latent factor model, in which the volatilities of common and idiosyncratic factors are conditionally heteroskedastic, is considered. We investigate the performance of computationally simple indirect estimators based on auxiliary models that do not require the Kalman filter implementation.

Journal Article
TL;DR: In this article, a class of nonseparable space-time correlation functions, termed as quasi-tapers, is proposed, which can be compactly supported over space or time.
Abstract: Covariance tapering is a well known technique used to avoid, or mitigate, the computational burdens required for estimating and/or predicting the parameters of the covariance function, for which it is required to work with large covariance matrices arising from irregularly spaced spatial data. We propose a class of nonseparable space-time correlation functions, termed here quasi-tapers, to mean that such correlations can be compactly supported over space or time.
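
A purely spatial sketch of the underlying tapering idea, under assumed parameter values: an exponential covariance is multiplied elementwise by a compactly supported Wendland-type taper, producing exact zeros beyond the taper range. The nonseparable space-time quasi-tapers proposed in the paper are not reproduced.

```python
import numpy as np

def exp_cov(h, rho=1.0):
    """Exponential covariance of distance h."""
    return np.exp(-h / rho)

def wendland_taper(h, theta=2.0):
    """Compactly supported Wendland-type taper, zero beyond range theta."""
    return np.clip(1 - h / theta, 0, None) ** 4 * (1 + 4 * h / theta)

# Irregularly spaced toy sites and their distance matrix.
rng = np.random.default_rng(7)
coords = rng.uniform(0, 10, size=(200, 2))
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

# Tapering = elementwise (Schur) product; distant pairs become exact zeros,
# so sparse-matrix algorithms apply.
C_tapered = exp_cov(h) * wendland_taper(h)
print("share of exact zeros:", float((C_tapered == 0).mean().round(2)))
```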

Journal Article
TL;DR: Latent class models are applied to five repeated measures of voter turnout in the Dutch Parliamentary elections of 2006 and 2010 obtained from a probability sample of 9510 citizens, giving insight into the classification errors present in survey questions about voting.
Abstract: This paper demonstrates methods of detecting local dependence in binary-data latent class models. Latent class models are applied to five repeated measures of voter turnout in the Dutch Parliamentary elections of 2006 and 2010, obtained from a probability sample of 9510 citizens. Modeling substantive local dependence as separate discrete latent variables, while modeling nuisance dependencies as direct effects, yields an interpretable model, giving insight into the classification errors present in survey questions about voting. The procedure followed stands in contrast to the “standard” procedure of increasing the number of latent classes until information criteria are satisfactory.

Journal Article
TL;DR: An approach to beanplot data analysis by PCA on the parameters of the models is proposed, building syntheses of multiple beanplot time series as indicators, which can have relevant applications in finance, risk management and other disciplines.
Abstract: Advances in computer technology have made large data ubiquitous and have created the need to handle these data accordingly. In particular, these data need to be aggregated using some function, but this process can lead to a loss of information. Beanplot series in this context can represent a solution in terms of special symbolic data: the parameters of density models based on a mixture of distributions can represent the original data faithfully and help solve the problem of data storage. In this work we propose an approach to beanplot data analysis by PCA on the parameters of the models. The aim is to build syntheses of multiple beanplot time series as indicators, which can have relevant applications in finance, risk management and other disciplines.

Journal Article
TL;DR: Two methods for clustering time series on the basis of their patterns over time are applied and developed to incorporate spatial correlation, and stopping criteria are investigated to identify an appropriate number of clusters.
Abstract: Environmental time series observed over hundreds of monitoring locations usually possess some spatial structure in terms of common patterns throughout time, commonly described as temporal coherence. This paper will apply, develop and compare two methods for clustering time series on the basis of their patterns over time. The first approach treats the time series as functional data and applies hierarchical clustering, while the second uses a state-space model based clustering approach. Both methods are developed to incorporate spatial correlation, and stopping criteria are investigated to identify an appropriate number of clusters. The methods are applied to Total Organic Carbon data from river sites across Scotland.
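
A toy version of the first approach (hierarchical clustering with a correlation-based distance), without the paper's spatial-correlation adjustment or the state-space alternative; the two latent patterns are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Ten toy "site" series drawn around two assumed latent patterns.
rng = np.random.default_rng(8)
t = np.linspace(0, 4 * np.pi, 120)
patterns = [np.sin(t)] * 5 + [np.cos(t)] * 5
series = np.array([p + 0.3 * rng.standard_normal(t.size) for p in patterns])

# Correlation-based distance, then average-linkage hierarchical clustering.
d = pdist(series, metric="correlation")          # 1 - correlation
Z = linkage(d, method="average")
print(fcluster(Z, t=2, criterion="maxclust"))    # recovered cluster labels
```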

Journal Article
TL;DR: This paper explores students’ strategies in tackling two undergraduate programmes, as a latent variable in a fuzzy states Markov chains approach, with respect to two former faculties of the University of Milano-Bicocca, with the aim of sketching academic settings where the circular agreement between undergraduates' strategies and the academic offer leads to a more favorable educational outcome.
Abstract: The remarkable length of stay of Italian students in the university system, compared with their European peers, is a widely recognized problem, to the point that governmental action has been taken to curb the phenomenon of long-term students. Long left in the shadow, the creeping phenomenon of students prematurely leaving their university paths, so-called student retention, has quickly drawn attention as an additional growing theme; at the academic governance level, the focus is still mainly on students dropping out during their first curricular year, rather than subsequent ones, owing to the present public university funding scheme. In the domestic educational system, the completion of a year, in terms of attainment of all the credits in the syllabus, is not a prerequisite for administrative registration to the next year. This paper explores students' strategies in tackling two undergraduate programmes, as a latent variable in a fuzzy-states Markov chain approach, with respect to two former faculties of the University of Milano-Bicocca, with the aim of sketching academic settings where the circular agreement between undergraduates' strategies and the academic offer leads to a more favorable educational outcome.

Journal Article
TL;DR: In this paper, the authors propose a new procedure for decomposing the multiple tau index, starting from a two-way contingency table with four categorical variables obtained by concatenating a predictor variable with another.
Abstract: Non-Symmetric Correspondence Analysis (NSCA; D'Ambra & Lauro, 1989) is a useful technique for analyzing a two-way contingency table. The key difference between the symmetrical and non-symmetrical versions of correspondence analysis rests on the measure of association used to quantify the relationship between the variables. For a two-way, or multi-way, contingency table, the Pearson chi-squared statistic is commonly used when it can be assumed that the categorical variables are symmetrically related. However, for a two-way table, it may be that one variable can be treated as a predictor variable and the second as a response variable. For such a variable structure, the Pearson chi-squared statistic is not an appropriate measure of association. Instead, one may consider the Goodman-Kruskal tau index. When there are more than two cross-classified variables, multivariate versions of the Goodman-Kruskal tau index can be considered, including Marcotorchino's index (Marcotorchino, 1985) and the Gray-Williams index (Gray & Williams, 1975). In the present paper, we propose a new procedure for decomposing the multiple tau index, starting from a two-way contingency table with four categorical variables obtained by concatenating a predictor variable with another. Multiple Non-Symmetric Correspondence Analysis (MNSCA; Gray & Williams, 1975), along with the decomposition of the tau index into main effects and interactions (D'Ambra et al., 2012; D'Ambra & Crisci, in press), is used to evaluate overall passenger satisfaction with a public transport service.
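
For reference, the Goodman-Kruskal tau index mentioned above can be computed from a two-way table as follows; the table is hypothetical and the multiple-tau decomposition itself is not reproduced.

```python
import numpy as np

def goodman_kruskal_tau(N):
    """Goodman-Kruskal tau for a two-way table N, predictor on the rows
    and response on the columns."""
    P = N / N.sum()
    p_row = P.sum(axis=1)                         # predictor margins
    p_col = P.sum(axis=0)                         # response margins
    num = (P**2 / p_row[:, None]).sum() - (p_col**2).sum()
    den = 1 - (p_col**2).sum()
    return num / den

# Hypothetical predictor-by-response table.
N = np.array([[30, 10, 5],
              [ 8, 25, 12],
              [ 4,  9, 27]])
print("tau =", round(goodman_kruskal_tau(N), 3))
```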

Journal Article
TL;DR: The simulation results showed that the effect of measurement model misspecification is much larger on the ML-SEM parameter estimates than on the PLS-PM estimates, and the inter-correlation level among formative MVs and the magnitude of the variance of the disturbance in the formative block have evident effects on the bias and the variability of the estimates.
Abstract: A common misunderstanding found in the literature is that only PLS-PM allows the estimation of SEM including formative blocks. However, if certain model specification rules are followed, the model is identified, and it is possible to estimate a Covariance-Based SEM with formative blocks. Due to the complexity of both SEM estimation techniques, we study, in the framework of the same simulation design, their relative performance, analysing the bias and the variability of the estimates. We find that both PLS-PM and ML-SEM perform particularly well in terms of bias and efficiency of the parameter estimates when the variance of the disturbance in the formative block is small. As we increase the variance of the disturbance, the bias of the inner PLS estimates grows significantly, while their variability holds steady at a very low value. On the contrary, the inner ML estimates present a minor degree of bias, but their variability grows drastically. Nevertheless, the two approaches behave almost equally in the formative outer block.

Journal Article
TL;DR: In this paper, the authors use Bayesian methods to estimate a multi-factor linear asset pricing model characterized by structural instability in factor loadings, idiosyncratic variances, and factor risk premia.
Abstract: We use Bayesian methods to estimate a multi-factor linear asset pricing model characterized by structural instability in factor loadings, idiosyncratic variances, and factor risk premia. We use this framework to investigate the key differences in the pricing mechanism that applies to residential vs. non-residential (such as office space, industrial buildings, retail property) REITs. Under the assumption that the subprime crisis had its epicentre in the housing/residential sector, we interpret any differential dynamics as indicative of the propagation mechanism of the crisis towards business-oriented segments of the US real estate market. We find important differences in the structure as well as the dynamic evolution of risk factor exposures across residential vs. non-residential REITs. An analysis of cross-sectional mispricings reveals that only retail, residential, and mortgage-specialized REITs were over-priced over the initial part of our sample, i.e., 1999-2006. However, the strongest mispricings occurred, and may still persist, in the office and regional-mall-specialized REIT subsectors.

Journal Article
TL;DR: In this article, the authors assess the influence of gender attitudes on the partners' allocation of both paid and unpaid work hours when they experience the transition to parenthood, using a cluster-based classification of couples according to gender attitudes.
Abstract: The aim of this study is to assess the influence of gender attitudes on the partners' allocation of both paid and unpaid work hours when they experience the transition to parenthood. To this purpose, we use a cluster-based classification of couples according to gender attitudes. The classification gives dummies that are added as explanatory variables in a Difference-in-Differences estimation procedure. We adjust for the endogeneity of fertility with respect to working activities by introducing a specific instrumental variable. For the empirical analysis we use data on Italian married and cohabiting couples provided by the Istat Multipurpose Panel Survey for the years 2003 and 2007.
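
A bare-bones sketch of the estimating equation, with simulated data and hypothetical variable names (the instrumental-variable step for fertility endogeneity is omitted): the attitude-cluster dummy shifts the treatment effect through an interaction with the treatment-by-period term.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated toy data; all variable names are assumptions.
rng = np.random.default_rng(9)
n = 4000
df = pd.DataFrame({
    "parent": rng.binomial(1, 0.5, n),           # transition to parenthood
    "post": rng.binomial(1, 0.5, n),             # 2007 wave vs 2003 wave
    "traditional": rng.binomial(1, 0.4, n),      # gender-attitude cluster dummy
})
df["paid_hours"] = (38 - 4 * df["parent"] * df["post"]
                    - 3 * df["parent"] * df["post"] * df["traditional"]
                    + rng.standard_normal(n))

# DiD term plus its interaction with the attitude-cluster dummy.
m = smf.ols("paid_hours ~ parent * post + parent:post:traditional", df).fit()
print(m.params.filter(like=":"))                 # recovers -4 and -3
```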

Journal Article
TL;DR: Power priors are introduced as informative priors at item parameter and ability sampling steps within a Gibbs sampler scheme and the efficiency of this approach is demonstrated in terms of measurement precision with small samples.
Abstract: In this paper, we propose the introduction of power priors in the Bayesian estimation of item response theory models. Within this approach, information coming from historical data can be used for the estimation of model parameters based on current data. In the literature, power priors have been discussed for generalized linear models. In this work, power priors are introduced as informative priors at item parameter and ability sampling steps within a Gibbs sampler scheme. By using data from the Hospital Anxiety and Depression Scale (HADS), the efficiency of this approach is demonstrated in terms of measurement precision with small samples.
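
The power-prior mechanism, posterior proportional to L(current) x L(historical)^a0 x prior, is easiest to see in a conjugate toy case rather than in the paper's Gibbs sampler for IRT models; the sketch below uses an assumed beta-binomial setup where the discount a0 simply scales the historical pseudo-counts.

```python
import numpy as np
from scipy import stats

# Assumed beta-binomial setup: Beta(1, 1) initial prior, historical
# likelihood raised to the discount a0 (so historical counts enter
# as a0-weighted pseudo-counts), then the current likelihood.
y_hist, n_hist = 30, 50                           # historical data
y_curr, n_curr = 8, 20                            # current data
a0 = 0.5                                          # power-prior discount

a_post = 1 + a0 * y_hist + y_curr
b_post = 1 + a0 * (n_hist - y_hist) + (n_curr - y_curr)
post = stats.beta(a_post, b_post)
print("posterior mean:", round(post.mean(), 3))
print("95% interval:", np.round(post.interval(0.95), 3))
```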

Journal Article
TL;DR: In this paper, a comparison is made between three methods of structural equation model identification for composite variables with one outcome variable; the results suggest that the Canonical Correlation Analysis (CCA) model performs best, in terms of recovery bias, for the designed SEM population.
Abstract: Composite variables (CV), i.e. latent variables with “causal” or “formative” indicators, must influence two or more distinct outcomes for meaningful fitting via Structural Equation Models (SEM). This constraint determines “interpretational confounding”. A comparison is undertaken between three methods of SEM identification of a CV with one outcome variable. Specifically, Multiple Indicators Multiple Causes (MIMIC), Principal Component Analysis (PCA), and Canonical Correlation Analysis (CCA) model specifications were proposed to operationalize a CV, and a Monte Carlo simulation comparison was performed. The results suggest that, in terms of recovery bias, the CCA model performs best for the designed SEM population.

Journal Article
TL;DR: This paper addresses the word-sense disambiguation problem not in the usual pre-processing step but during the analysis, in order to propose some statistical lexical resources useful in the particular domain of business information.
Abstract: Assuming that language can be modelled as a network of words, it is difficult to mine knowledge in textual databases, due to their high dimensionality and the ambiguity that characterises words and their use. From a methodological viewpoint, we propose here a strategy for stressing the differences between the manifest relations emerging from Network Analysis (NA) and the latent relations obtained by lexical Correspondence Analysis (CA). The aim of this paper is to deal with the word-sense disambiguation problem, not in the usual pre-processing step, but during the analysis. The results of an application to the analysis of a management commentary are presented, in order to propose some statistical lexical resources useful in the particular domain of business information.