Showing papers by "Francesco Bartolucci" published in 2014


Journal Article
21 Aug 2014-Test
TL;DR: A comprehensive overview of latent Markov (LM) models for the analysis of longitudinal categorical data is provided and methods for selecting the number of states and for path prediction are outlined.
Abstract: We provide a comprehensive overview of latent Markov (LM) models for the analysis of longitudinal categorical data. We illustrate the general version of the LM model which includes individual covariates, and several constrained versions. Constraints make the model more parsimonious and allow us to consider and test hypotheses of interest. These constraints may be put on the conditional distribution of the response variables given the latent process (measurement model) or on the distribution of the latent process (latent model). We also illustrate in detail maximum likelihood estimation through the Expectation–Maximization algorithm, which may be efficiently implemented by recursions taken from the hidden Markov literature. We outline methods for obtaining standard errors for the parameter estimates. We also illustrate methods for selecting the number of states and for path prediction. Finally, we mention issues related to Bayesian inference of LM models. Possibilities for further developments are given among the concluding remarks.

61 citations
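
The recursions from the hidden Markov literature mentioned in the abstract can be illustrated with a minimal sketch. The Python below computes the log-likelihood of a single response sequence under a basic LM model without covariates, using a rescaled forward recursion. The function name, the argument layout, and the toy parameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def lm_log_likelihood(y, init, trans, emis):
    """Log-likelihood of one categorical sequence y under a basic LM model.

    init[u]     : probability of starting in latent state u
    trans[u][v] : probability of moving from state u to state v
    emis[u][c]  : probability of response category c given state u
    (Names and parameter layout are assumptions made for this sketch.)
    """
    k = len(init)
    # Forward probabilities at the first occasion
    alpha = [init[u] * emis[u][y[0]] for u in range(k)]
    log_lik = 0.0
    for t in range(len(y)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the response
            alpha = [
                sum(alpha[u] * trans[u][v] for u in range(k)) * emis[v][y[t]]
                for v in range(k)
            ]
        # Rescale at every occasion to avoid underflow; the scales multiply
        # to the likelihood, so their logs sum to the log-likelihood
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Toy two-state, two-category example (made-up values)
init = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.7, 0.3], [0.1, 0.9]]
print(lm_log_likelihood([0, 1, 1], init, trans, emis))
```

In an EM algorithm, a forward pass of this kind would be paired with a backward pass to obtain the posterior state probabilities needed in the E-step; that part is omitted here.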


Journal Article
TL;DR: In this paper, a class of multidimensional item response theory models for polytomously-scored items with ordinal response categories was proposed, which allows for different parameterizations for the conditional distribution of the response variables given the latent traits, which depend on the type of link function and the constraints imposed on the item parameters.
Abstract: We propose a class of multidimensional Item Response Theory models for polytomously-scored items with ordinal response categories. This class extends an existing class of multidimensional models for dichotomously-scored items in which the latent abilities are represented by a random vector assumed to have a discrete distribution, with support points corresponding to different latent classes in the population. In the proposed approach, we allow for different parameterizations for the conditional distribution of the response variables given the latent traits, which depend on the type of link function and the constraints imposed on the item parameters. Moreover, we suggest a strategy for model selection that is based on a series of steps consisting of selecting specific features, such as the dimension of the model (number of latent traits), the number of latent classes, and the specific parameterization. In order to illustrate the proposed approach, we analyze a dataset from a study on anxiety and depression...

48 citations


Journal Article
TL;DR: In this paper, a latent process is modelled by a mixture of auto-regressive AR(1) processes with different means and correlation coefficients, but with equal variances.
Abstract: Motivated by an application to a longitudinal data set coming from the Health and Retirement Study about self-reported health status, we propose a model for longitudinal data which is based on a latent process to account for the unobserved heterogeneity between sample units in a dynamic fashion. The latent process is modelled by a mixture of auto-regressive AR(1) processes with different means and correlation coefficients, but with equal variances. We show how to perform maximum likelihood estimation of the proposed model by the joint use of an expectation–maximization algorithm and a Newton–Raphson algorithm, implemented by means of recursions developed in the hidden Markov model literature. We also introduce a simple method to obtain standard errors for the parameter estimates and suggest a strategy to choose the number of mixture components. In the application the response variable is ordinal; however, the approach may also be applied in other settings. Moreover, the application to the self-reported health status data set allows us to show that the model proposed is more flexible than other models for longitudinal data based on a continuous latent process. The model also achieves a goodness of fit that is similar to that of models based on a discrete latent process following a Markov chain, while retaining a reduced number of parameters. The effect of different formulations of the latent structure of the model is evaluated in terms of estimates of the regression parameters for the covariates.

44 citations


Journal Article
TL;DR: In this article, a class of Item Response Theory (IRT) models for binary and ordinal polytomous items is illustrated and an R package for dealing with these models, named MultiLCIRT, is described.

39 citations


Journal Article
TL;DR: In this paper, the authors investigated the relationship between self-reported health and the employment status in Italy using the Survey on Household Income and Wealth (SHIW), and found that temporary workers, first-job seekers and unemployed individuals are worse off than permanent employees, especially males, young workers, and those living in the center and south of Italy.
Abstract: The considerable increase of non-standard labor contracts, unemployment and inactivity rates raises the question of whether job insecurity and the lack of job opportunities affect physical and mental well-being differently from being employed with an open-ended contract. In this paper we offer evidence on the relationship between self-reported health and the employment status in Italy using the Survey on Household Income and Wealth (SHIW); another aim is to investigate whether these potential inequalities have changed with the recent economic downturn (time period 2006-2010). We estimate an ordered logit model with self-reported health status (SRHS) as response variable based on a fixed-effects approach which has certain advantages with respect to the random-effects formulation: the fixed-effects nature of the model also allows us to solve the problems of incidental parameters and non-random selection of individuals into different labor market categories. We find that temporary workers, first-job seekers and unemployed individuals are worse off than permanent employees, especially males, young workers, and those living in the center and south of Italy. Health inequalities between permanent workers and job seekers widen over time for male and young workers, and arise in the north of the country as well.

36 citations
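
To make the ordered logit response model in the abstract concrete, the following sketch computes category probabilities from a linear predictor and a set of cutpoints. It assumes the parameterization P(Y <= j) = 1 / (1 + exp(-(c_j - eta))), which is one common convention and may differ in sign from the one used in the paper. The function name and the example values are hypothetical, and the fixed-effects estimation step itself is not shown.

```python
import math


def ordered_logit_probs(eta, cutpoints):
    """Category probabilities P(Y = j) for j = 0, ..., len(cutpoints).

    Assumes P(Y <= j) = 1 / (1 + exp(-(cutpoints[j] - eta))) with cutpoints
    in increasing order; eta is the linear predictor (covariate effects plus,
    in a fixed-effects setting, the individual intercept).
    """
    # Cumulative probabilities; the last category closes the scale at 1
    cum = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    # Category probabilities are differences of adjacent cumulative ones
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


# Five ordered health categories with made-up cutpoints
print(ordered_logit_probs(0.3, [-2.0, -1.0, 0.5, 2.0]))
```

Under this convention a larger linear predictor shifts probability mass towards the higher categories.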


Posted Content
TL;DR: Evidence is offered on the relationship between self-reported health and the employment status in Italy using the Survey on Household Income and Wealth (SHIW), which finds that temporary workers, first-job seekers and unemployed individuals are worse off than permanent employees.
Abstract: The considerable increase of non-standard labor contracts, unemployment and inactivity rates raises the question of whether job insecurity and the lack of job opportunities affect physical and mental well-being differently from being employed with an open-ended contract. In this paper we offer evidence on the relationship between Self Reported Health Status (SRHS) and the employment status in Italy using the Survey on Household Income and Wealth; another aim is to investigate whether these potential inequalities have changed with the recent economic downturn (time period 2006-2010). We estimate an ordered logit model with SRHS as response variable based on a fixed-effects approach which has certain advantages with respect to the random-effects formulation and has not been applied before with SRHS data. The fixed-effects nature of the model also allows us to solve the problems of incidental parameters and non-random selection of individuals into different labor market categories. We find that temporary workers, unemployed and inactive individuals are worse off than permanent employees, especially males, young workers, and those living in the center and south of Italy. Health inequalities between unemployed/inactive and permanent workers widen over time for males and young workers, and arise in the north of the country as well.

32 citations


Journal Article
TL;DR: A generalized multiple-try version of the Reversible Jump algorithm, based on drawing several proposals at each step and randomly choosing one of them on the basis of weights that may be arbitrarily chosen, which leads to a gain in efficiency and computational effort.

17 citations


Posted Content
TL;DR: In this article, an approach is presented that relies on an extended class of multidimensional latent class IRT models characterized by latent traits defined at student level and at school level, latent traits represented through random vectors with a discrete distribution, and a two-parameter logistic parametrization for the conditional probability of a correct response given the ability.
Abstract: Within the educational context, a key goal is to assess students' acquired skills and to cluster students according to their ability level. In this regard, a relevant element to be accounted for is the possible effect of the school students come from. For this aim, we provide a methodological tool which takes into account the multilevel structure of the data (i.e., students in schools) in a suitable way. This approach allows us to cluster both students and schools into homogeneous classes of ability and effectiveness, and to assess the effect of certain student and school characteristics on the probability of belonging to such classes. The approach relies on an extended class of multidimensional latent class IRT models characterized by: (i) latent traits defined at student level and at school level, (ii) latent traits represented through random vectors with a discrete distribution, (iii) the inclusion of covariates at student level and at school level, and (iv) a two-parameter logistic parametrization for the conditional probability of a correct response given the ability. The approach is applied for the analysis of data collected by two national tests administered in Italy to middle school students in June 2009: the INVALSI Italian Test and Mathematics Test. Results allow us to study the relationships between observed characteristics and latent trait standing within each latent class at the different levels of the hierarchy. They show that examinee and school expected observed scores, at a given latent trait level, are dependent on both unobserved (latent class) group membership and observed first and second level covariates.

17 citations


Journal Article
TL;DR: The birthweight of second-borns is significantly higher than that of first-borns, and statistically significant effects are related to a longer gestational age, an increased number of visits during the pregnancy, and the gender of infants.
Abstract: Objectives: We investigate the differences in birthweight between first- and second-borns, evaluating the impact of changes in pregnancy (e.g., gestational age), demographic (e.g., age), and social (e.g., education level, marital status) maternal characteristics. Data and Methods: All analyses are performed on data collected in Umbria (Italy) taking into account a set of 792 women who delivered twice from 2005 to 2008. Firstly, we use a univariate paired t-test for the comparison between weights of first- and second-borns. Secondly, we use linear and nonlinear regression approaches in order to: (i) evaluate the effect of demographic and social maternal characteristics and (ii) predict the odds-ratio of low and high birthweight infants, respectively. Results: We find that the birthweight of second-borns is significantly higher than that of first-borns. Statistically significant effects are related to a longer gestational age, an increased number of visits during the pregnancy, and the gender of infants. On the other hand, we do not observe any significant effect related to mother’s age or to other characteristics of interest.

16 citations
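
As a small illustration of the paired t-test used in the first step of the analysis, the sketch below computes the test statistic and its degrees of freedom from two matched lists of measurements. The function name is hypothetical and the birthweights in the usage example are invented toy numbers, not data from the study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Paired t statistic and degrees of freedom for second minus first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Work on within-pair differences (here: second-born minus first-born)
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # Mean difference divided by its estimated standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Invented birthweights in grams for four mothers (toy values only)
first_born = [3100, 3250, 2980, 3400]
second_born = [3220, 3300, 3150, 3390]
t_stat, df = paired_t_statistic(first_born, second_born)
print(t_stat, df)
```

The statistic would then be compared with a Student t distribution with the returned degrees of freedom.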


Posted Content
01 Jan 2014
TL;DR: In this paper, the authors investigated the differences in birthweight between first- and second-borns, evaluating the impact of changes in pregnancy (e.g., gestational age), demographic and social maternal characteristics.
Abstract: Objectives: We investigate the differences in birthweight between first- and second-borns, evaluating the impact of changes in pregnancy (e.g., gestational age), demographic (e.g., age), and social (e.g., education level, marital status) maternal characteristics. Data and Methods: All analyses are performed on data collected in Umbria (Italy) taking into account a set of 792 women who delivered twice from 2005 to 2008. Firstly, we use a univariate paired t-test for the comparison between weights of first- and second-borns. Secondly, we use linear and nonlinear regression approaches in order to: (i) evaluate the effect of demographic and social maternal characteristics and (ii) predict the odds-ratio of low and high birthweight infants, respectively. Results: We find that the birthweight of second-borns is significantly higher than that of first-borns. Statistically significant effects are related to a longer gestational age, an increased number of visits during the pregnancy, and the gender of infants. On the other hand, we do not observe any significant effect related to mother’s age or to other characteristics of interest.

13 citations



Posted Content
TL;DR: In this article, the authors analyse job satisfaction among Russian young workers by using the data collected for four items, the first of which concerns the general satisfaction about the job; the other three items concern specific aspects of job satisfaction with respect to work condition, earning, and opportunity for professional growth.
Abstract: A growing economic literature regards the analysis of job satisfaction; however, as for young people the investigations are still scarce. In this paper we analyse job satisfaction among Russian young workers by using the data collected for four items, the first of which concerns the general satisfaction about the job; the other three items concern specific aspects of job satisfaction with respect to work condition, earning, and opportunity for professional growth. The corresponding response variables are categorical with five ordered categories, from “absolutely unsatisfied” to “absolutely satisfied”. The longitudinal dataset also contains personal information about the respondents (gender, age, marital status, number of children, educational level, etc.). We estimate ordered logit models of job satisfaction with individual fixed effects for a panel data of Russian young workers, carrying out separate analyses for the general job satisfaction variable and three variables on specific aspects of job satisfaction. If wages adjusted to fully compensate workplace disamenities, we would expect that differences in job satisfaction across individuals would not be systematically related to wage differentials, ceteris paribus. But this is not the case for our panel: for all but one of the samples considered there is at least one job satisfaction variable with a significantly positive wage effect. We, therefore, interpret this result as a failure of the theory of compensating wage differentials in the Russian youth labour market. There is the interesting exception, though, that compensating wage differentials do seem at work among the older subjects in the panel. Our estimates also show strong gender and location effects.

Journal Article
TL;DR: The recursion for hidden Markov (HM) models proposed by Bartolucci and Besag (2002) is developed and it is shown how it may be used to implement an estimation algorithm for these models that requires an amount of memory not depending on the length of the observed series of data.
Abstract: We develop the recursion for hidden Markov (HM) models proposed by Bartolucci and Besag (2002), and we show how it may be used to implement an estimation algorithm for these models that requires an amount of memory not depending on the length of the observed series of data. This recursion allows us to obtain the conditional distribution of the latent state at every occasion, given the previous state and the observed data. With respect to the estimation algorithm based on the well-known Baum-Welch recursions, which requires an amount of memory that increases with the sample size, the proposed algorithm also has the advantage of not requiring dummy renormalizations to avoid numerical problems. Moreover, it directly allows us to perform global decoding of the latent sequence of states, without the need of a Viterbi method and with a consistent reduction of the memory requirement with respect to the latter. The proposed approach is compared, in terms of computing time and memory requirement, with the...

Journal Article
21 Aug 2014-Test
TL;DR: The Discussants' contributions strengthen the original message about the mathematical and analytical flexibility of the proposed Latent Markov (LM) framework, and especially about the usefulness in many areas of application in which LM models can be seen to arise as observation-driven models.
Abstract: First of all, we thank all the Discussants and the four Referees for their contribution. They helped us to clarify some of the features of the methodology and to answer questions we had in writing the paper; they also stimulated us to consider new interesting issues. All Discussants mention additional important applications, including interesting examples in areas like language processing, psychology, and finance. Their contributions strengthen our original message about the mathematical and analytical flexibility of the proposed Latent Markov (LM) framework, and especially about the usefulness in many areas of application in which LM models can be seen to arise as observation-driven models (Cox, 1981). Furthermore, their comments confirm that generalizations of current LM models can often be directly obtained with minor adjustments to inference. In the following we comment on some of the issues raised by the Discussants. Böckenholt and McShane mention a very interesting and parsimonious extension of the first-order model, which can also include non-memoryless holding times. This is potentially very interesting, as in many cases persistency in latent states is stronger than the model predicts, and it is related to the literature on semi-Markov models and hidden semi-Markov models, which are also mentioned by Visser and Speekenbrink.

Book Chapter
01 Jan 2014
TL;DR: In this article, a class of hidden Markov models for longitudinal data is presented, where unobserved individual characteristics of interest are represented by a sequence of discrete latent variables, which follows a Markov chain.
Abstract: I review a class of models for longitudinal data, showing how it may be applied in a meaningful way for the analysis of data collected by the administration of a series of items finalized to educational or psychological measurement. In this class of models, the unobserved individual characteristics of interest are represented by a sequence of discrete latent variables, which follows a Markov chain. Inferential problems involved in the application of these models are discussed considering, in particular, maximum likelihood estimation based on the Expectation-Maximization algorithm, model selection, and hypothesis testing. Most of these problems are common to hidden Markov models for time-series data. The approach is illustrated by different applications in education and psychology.

Posted Content
TL;DR: In this article, a structural equation model is proposed for the analysis of binary item responses with non-ignorable missingness, which is driven by two sets of latent variables: one describing the propensity to respond and the other referred to the abilities measured by the test items.
Abstract: We propose a structural equation model, which reduces to a multidimensional latent class item response theory model, for the analysis of binary item responses with non-ignorable missingness. The missingness mechanism is driven by two sets of latent variables: one describing the propensity to respond and the other referred to the abilities measured by the test items. These latent variables are assumed to have a discrete distribution, so as to reduce the number of parametric assumptions regarding the latent structure of the model. Individual covariates may also be included through a multinomial logistic parametrization of the probabilities of each support point of the distribution of the latent variables. Given the discrete nature of this distribution, the proposed model is efficiently estimated by the Expectation-Maximization algorithm. A simulation study is performed to evaluate the finite sample properties of the parameter estimates. Moreover, an application is illustrated to data coming from a Students' Entry Test for the admission to some university courses.

Journal Article
TL;DR: In this article, the null hypothesis of symmetry may be formulated in terms of a normal mixture model, with weights about the center of symmetry constrained to be equal to one another, and the resulting model is nested in a more general unconstrained one, with the same number of mixture components and free weights.

Journal Article
TL;DR: In implementing the EFFBS algorithm, the author finds a numerical problem that limits its applicability, providing some possible explanations of the causes of the error, together with two illustrative examples.

Posted Content
TL;DR: In this article, a modified version of the three-step estimation method for the latent class model with covariates is proposed, which may be used to estimate latent Markov models for longitudinal data.
Abstract: We propose a modified version of the three-step estimation method for the latent class model with covariates, which may be used to estimate latent Markov models for longitudinal data. The three-step estimation approach we propose is based on a preliminary clustering of sample units on the basis of the time-specific responses only. This approach represents a useful estimation tool when a large number of response variables are observed at each time occasion. In such a context, full maximum likelihood estimation, which is typically based on the Expectation-Maximization algorithm, may have some drawbacks, essentially due to the presence of many local maxima of the model likelihood. Moreover, the EM algorithm may be particularly slow to converge, and may become unstable with complex LM models. We prove the consistency of the proposed three-step estimator when the number of response variables tends to infinity. We also show the results of a simulation study aimed at evaluating the performance of the proposed alternative approach with respect to the full likelihood method. We finally illustrate an application to a real dataset on the health status of elderly people hosted in Italian nursing homes.



Posted Content
TL;DR: This paper shows the application of an item selection algorithm to real data collected within a project, named ULISSE, on the quality-of-life of elderly patients hosted in Italian nursing homes and finds the subset of items which provides the best clustering according to the Bayesian Information Criterion.
Abstract: The evaluation of nursing homes is usually based on the administration of questionnaires made of a large number of polytomous items. In such a context, the Latent Class (LC) model represents a useful tool for clustering subjects in homogeneous groups corresponding to different degrees of impairment of the health conditions. It is known that the performance of model-based clustering and the accuracy of the choice of the number of latent classes may be affected by the presence of irrelevant or noise variables. In this paper, we show the application of an item selection algorithm to real data collected within a project, named ULISSE, on the quality-of-life of elderly patients hosted in Italian nursing homes. This algorithm, which is closely related to that proposed by Dean and Raftery in 2010, is aimed at finding the subset of items which provides the best clustering according to the Bayesian Information Criterion. At the same time, it allows us to select the optimal number of latent classes. Given the complexity of the ULISSE study, we perform a validation of the results by means of a sensitivity analysis to different specifications of the initial subset of items and of a resampling procedure.
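
The Bayesian Information Criterion on which the selection is based can be sketched in a few lines. The Python below counts the free parameters of a basic LC model without covariates and computes BIC = -2 log L + (number of parameters) x log(n). The function names are hypothetical, the log-likelihood values in the usage example are made up, and the item selection algorithm itself is not reproduced.

```python
import math


def lc_num_parameters(n_classes, categories_per_item):
    """Free parameters of a basic latent class model without covariates."""
    # Class weights sum to one, so one of them is not free
    class_weights = n_classes - 1
    # For each class and item, response probabilities sum to one
    response_probs = n_classes * sum(c - 1 for c in categories_per_item)
    return class_weights + response_probs


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion; smaller values are preferred."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Compare two candidate numbers of classes for three 4-category items,
# using made-up maximized log-likelihoods and a made-up sample size
items = [4, 4, 4]
for k, log_lik in [(2, -1510.0), (3, -1495.0)]:
    print(k, bic(log_lik, lc_num_parameters(k, items), n_obs=500))
```

The same criterion can be evaluated over candidate item subsets as well as over candidate numbers of classes, which is the role it plays in the abstract.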