scispace - formally typeset

Showing papers by "Francesco Bartolucci" published in 2012


Book
29 Oct 2012
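TL;DR: This book covers latent Markov modeling, from background on latent variable and Markov chain models to the basic, constrained, covariate-augmented, random-effects/multilevel, and Bayesian formulations of latent Markov models, including maximum likelihood estimation via the Expectation-Maximization algorithm, model selection and hypothesis testing, and Bayesian inference via reversible jump.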
Abstract:
Overview on Latent Markov Modeling (Introduction; Literature review on latent Markov models; Alternative approaches; Example datasets)
Background on Latent Variable and Markov Chain Models (Introduction; Latent variable models; Expectation-Maximization algorithm; Standard errors; Latent class model; Selection of the number of latent classes; Applications; Markov chain model for longitudinal data; Applications)
Basic Latent Markov Model (Introduction; Univariate formulation; Multivariate formulation; Model identifiability; Maximum likelihood estimation; Selection of the number of latent states; Applications)
Constrained Latent Markov Models (Introduction; Constraints on the measurement model; Constraints on the latent model; Maximum likelihood estimation; Model selection and hypothesis testing; Applications)
Including Individual Covariates and Relaxing Basic Model Assumptions (Introduction; Notation; Covariates in the measurement model; Covariates in the latent model; Interpretation of the resulting models; Maximum likelihood estimation; Observed information matrix, identifiability, and standard errors; Relaxing local independence; Higher order extensions; Applications)
Including Random Effects and Extension to Multilevel Data (Introduction; Random-effects formulation; Maximum likelihood estimation; Multilevel formulation; Application to the student math achievement dataset)
Advanced Topics about Latent Markov Modeling (Introduction; Dealing with continuous response variables; Dealing with missing responses; Additional computational issues; Decoding and forecasting; Selection of the number of latent states)
Bayesian Latent Markov Models (Introduction; Prior distributions; Bayesian inference via reversible jump; Alternative sampling; Application to the labor market dataset)
Appendix: Software
List of Main Symbols
Bibliography
Index

181 citations


Posted Content
TL;DR: In this article, a comprehensive overview of latent Markov (LM) models for the analysis of longitudinal data is provided, including constrained versions of the general LM model, which make the model more parsimonious and allow hypotheses of interest to be formulated and tested.
Abstract: We provide a comprehensive overview of latent Markov (LM) models for the analysis of longitudinal data. The main assumption behind these models is that the response variables are conditionally independent given a latent process which follows a first-order Markov chain. We first illustrate the more general version of the LM model which includes individual covariates. We then illustrate several constrained versions of the general LM model, which make the model more parsimonious and allow us to consider and test hypotheses of interest. These constraints may be put on the conditional distribution of the response variables given the latent process (measurement model) or on the distribution of the latent process (latent model). For the general version of the model we also illustrate in detail maximum likelihood estimation through the Expectation-Maximization algorithm, which may be efficiently implemented by recursions known in the hidden Markov literature. We discuss model identifiability and outline methods for obtaining standard errors for the parameter estimates. We also illustrate methods for selecting the number of states and for path prediction. Finally, we illustrate Bayesian estimation methods. Models and related inference are illustrated by the description of relevant socio-economic applications available in the literature.

44 citations
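
The forward recursion mentioned above, and the path prediction step, can be sketched for the simplest case: a basic LM model with one categorical response per occasion, no covariates, and known parameters. This is a minimal illustration under those assumptions, not the authors' implementation; the function names and the parameter layout (`pi`, `Pi`, `phi`) are hypothetical. The forward variables are rescaled at each step to avoid numerical underflow, and the Viterbi recursion returns the most likely sequence of latent states.

```python
import numpy as np

def lm_forward_loglik(y, pi, Pi, phi):
    """Log-likelihood of one categorical sequence under a basic LM model.

    y   : observed categories, length T
    pi  : initial probabilities of the k latent states, shape (k,)
    Pi  : transition matrix, Pi[u, v] = p(state v at t | state u at t-1)
    phi : measurement probabilities, phi[u, c] = p(response c | state u)
    """
    alpha = pi * phi[:, y[0]]            # forward variables at the first occasion
    scale = alpha.sum()
    loglik = np.log(scale)
    alpha /= scale                       # rescale to avoid underflow
    for t in range(1, len(y)):
        alpha = (alpha @ Pi) * phi[:, y[t]]
        scale = alpha.sum()
        loglik += np.log(scale)          # accumulate the log scaling factors
        alpha /= scale
    return loglik

def lm_viterbi(y, pi, Pi, phi):
    """Most likely sequence of latent states (global decoding)."""
    T, k = len(y), len(pi)
    delta = np.log(pi) + np.log(phi[:, y[0]])
    back = np.zeros((T, k), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + np.log(Pi)    # cand[u, v]: from state u to state v
        back[t] = cand.argmax(axis=0)         # best predecessor of each state v
        delta = cand.max(axis=0) + np.log(phi[:, y[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):             # backtrack through stored predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With many subjects, the log-likelihood is the sum of `lm_forward_loglik` over subjects; the backward recursion, needed for the posterior probabilities in the E-step, follows the same pattern in reverse time.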


Posted Content
TL;DR: A class of Item Response Theory models for binary and ordinal polytomous items is illustrated and an R package for dealing with these models, named MultiLCIRT, is described, allowing for multidimensionality and discreteness of latent traits.
Abstract: We illustrate a class of Item Response Theory (IRT) models for binary and ordinal polytomous items and we describe an R package for dealing with these models, which is named MultiLCIRT. The models at issue extend traditional IRT models allowing for (i) multidimensionality and (ii) discreteness of latent traits. This class of models also allows for different parameterizations for the conditional distribution of the response variables given the latent traits, depending on both the type of link function and the constraints imposed on the discriminating and the difficulty item parameters. We illustrate how the proposed class of models may be estimated by the maximum likelihood approach via an Expectation-Maximization algorithm, which is implemented in the MultiLCIRT package, and we discuss in detail issues related to model selection. In order to illustrate this package, we analyze two datasets: one concerning binary items for the measurement of ability in mathematics and the other one coming from the administration of ordinal polytomous items for the assessment of anxiety and depression. In the first application, we illustrate how to aggregate items into homogeneous groups through a model-based hierarchical clustering procedure, which is implemented in the proposed package. In the second application, we describe the steps to select a specific model having the best fit in our class of IRT models.

38 citations
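
As a point of reference for the Expectation-Maximization estimation described above, the following sketch fits an unconstrained latent class model with binary items, the simplest member of this family. It is a simplification written for illustration: it does not impose the IRT parameterizations, the multidimensional structure, or the ordinal links handled by MultiLCIRT, and it does not reproduce that package's interface; `lc_em` and its arguments are hypothetical.

```python
import numpy as np

def lc_em(Y, k, n_iter=200, seed=0):
    """EM for an unconstrained latent class model with binary items.

    Y : (n, J) array of 0/1 responses.
    Returns class weights (k,), item success probabilities (J, k),
    and the log-likelihood recorded at the start of every iteration.
    """
    rng = np.random.default_rng(seed)
    n, J = Y.shape
    w = np.full(k, 1.0 / k)
    lam = rng.uniform(0.25, 0.75, size=(J, k))
    trace = []
    for _ in range(n_iter):
        # E-step: log p(y_i | class c) for every subject and class
        logf = Y @ np.log(lam) + (1 - Y) @ np.log(1 - lam)      # (n, k)
        logjoint = logf + np.log(w)
        m = logjoint.max(axis=1, keepdims=True)
        lik = np.exp(logjoint - m)
        trace.append(float(np.sum(m[:, 0] + np.log(lik.sum(axis=1)))))
        post = lik / lik.sum(axis=1, keepdims=True)            # posterior class probabilities
        # M-step: closed-form updates of weights and item probabilities
        w = post.mean(axis=0)
        lam = (Y.T @ post) / post.sum(axis=0)
        lam = np.clip(lam, 1e-6, 1 - 1e-6)                      # keep the logs finite
    return w, lam, trace
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a useful check on any EM implementation.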


Journal ArticleDOI
TL;DR: It is concluded that all individual socioeconomic factors are strongly associated with the outcomes at birth, apart from the deprivation index, which does not affect these outcomes, showing the proper implementation of the Health System.
Abstract: We examine the effects of mother’s characteristics and socioeconomic condition on weight at birth and preterm delivery in an Italian region (Umbria). The study concerns all live-born singleton infants in 2007 with a gestational age of at least 22 weeks. Information derived from the Standard Certificate of Live Birth was linked to information from census statistics, so as to obtain a deprivation index. On the basis of the fitting of two separate logistic regression models, we conclude that all individual socioeconomic factors are strongly associated with the outcomes at birth, apart from the deprivation index. Older and less educated mothers, and those with lower occupational level, have a higher probability of preterm delivery than other mothers. The relative risk ratios for low birth weight are significantly higher for older, non-European, and unmarried mothers. Lower birth weights are found in infants from complicated pregnancies and non-spontaneous conceptions. Effects of mother’s characteristics on weight at birth and weeks of gestation are confirmed. The deprivation index does not affect these outcomes, showing the proper implementation of the Health System.

34 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that the dynamic logit model for binary panel data may be approximated by a quadratic exponential model, for which simple sufficient statistics exist for the subject-specific parameters introduced to capture the unobserved heterogeneity between subjects.

30 citations
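
The key property in the TL;DR, that a simple statistic is sufficient for the subject-specific parameter, can be checked numerically. The sketch below assumes a simplified quadratic exponential form for one subject, p(y) proportional to exp(alpha*y+ + beta*sum_t x_t*y_t + gamma*sum_t y_{t-1}*y_t) with y_0 given, chosen for illustration; the specification in the paper may include further terms, and the function names are hypothetical. Because alpha multiplies only the total score y+, the distribution of the responses given y+ does not depend on alpha.

```python
import itertools
import numpy as np

def qe_probs(alpha, beta, gamma, x, y0):
    """All 2^T binary configurations and their probabilities under the
    simplified quadratic exponential form described above (one subject)."""
    T = len(x)
    ys = np.array(list(itertools.product([0, 1], repeat=T)))
    prev = np.column_stack([np.full(len(ys), y0), ys[:, :-1]])   # lagged responses y_{t-1}
    score = alpha * ys.sum(1) + beta * ys @ x + gamma * (prev * ys).sum(1)
    p = np.exp(score - score.max())
    return ys, p / p.sum()

def conditional_given_total(ys, p, total):
    """p(y | sum(y) = total): alpha cancels because it multiplies sum(y) only."""
    keep = ys.sum(1) == total
    return p[keep] / p[keep].sum()

x = np.array([0.5, -1.0, 0.3, 1.2])   # hypothetical covariate values
cond = [conditional_given_total(*qe_probs(a, 0.8, 0.6, x, 1), 2) for a in (-2.0, 0.0, 3.0)]
```

The three conditional distributions in `cond` coincide even though alpha changes, so inference on beta and gamma can condition on the total score and bypass the subject-specific parameter.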


Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach is developed for selecting the model that is most supported by the data within a class of marginal models for categorical variables, which are formulated through equality and/or inequality constraints on generalized logits (local, global, continuation, or reverse continuation).

14 citations


Journal ArticleDOI
TL;DR: This work investigates two relevant issues: dimensionality of the latent structure and discriminating power of the items composing the questionnaire, based on a multidimensional item response theory model, which assumes a two-parameter logistic parameterization for the response probabilities.
Abstract: With reference to a questionnaire aimed at assessing the performance of Italian nursing homes on the basis of the health conditions of their patients, we investigate two relevant issues: dimensionality of the latent structure and discriminating power of the items composing the questionnaire. The approach is based on a multidimensional item response theory model, which assumes a two-parameter logistic parameterization for the response probabilities. This model represents the health status of a patient by latent variables having a discrete distribution and, therefore, it may be seen as a constrained version of the latent class model. On the basis of the adopted model, we implement a hierarchical clustering algorithm aimed at assessing the actual number of dimensions measured by the questionnaire. These dimensions correspond to disjoint groups of items. Once the number of dimensions is selected, we also study the discriminating power of every item, so that it is possible to select the subset of these items which is able to provide an amount of information close to that of the full set. We illustrate the proposed approach on the basis of the data collected on 1,051 elderly people hosted in a sample of Italian nursing homes.

13 citations
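
The statement that this 2PL model with discretely distributed latent variables is a constrained latent class model can be made concrete: the class-conditional success probabilities are not free but follow the logistic form gamma_j*(theta_c - beta_j). The sketch below is unidimensional for brevity and uses hypothetical names and parameter values; a larger gamma_j makes the probabilities differ more across classes, which is one way to read an item's discriminating power.

```python
import numpy as np

def twopl_class_probs(theta, gamma, beta):
    """lam[j, c] = P(Y_j = 1 | class c) = logistic(gamma_j * (theta_c - beta_j))."""
    eta = gamma[:, None] * (theta[None, :] - beta[:, None])
    return 1.0 / (1.0 + np.exp(-eta))

def pattern_prob(y, w, lam):
    """Marginal probability of a binary response pattern y: a latent class
    mixture whose class-conditional probabilities follow the 2PL form."""
    y = np.asarray(y)[:, None]
    cond = np.prod(lam ** y * (1 - lam) ** (1 - y), axis=0)   # p(y | class c)
    return float(w @ cond)

theta = np.array([-1.0, 0.5, 2.0])    # support points (latent classes)
w = np.array([0.3, 0.5, 0.2])         # class weights
gamma = np.array([1.0, 2.5, 0.4])     # discrimination parameters
beta = np.array([0.0, 0.5, -1.0])     # difficulty parameters
lam = twopl_class_probs(theta, gamma, beta)
```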


Posted Content
TL;DR: Model selection is performed by using Bayes factors estimated through an importance sampling method, and the approach is illustrated by three applications based on different datasets, which also include explanatory variables.
Abstract: We develop a Bayesian approach for selecting the model which is the most supported by the data within a class of marginal models for categorical variables formulated through equality and/or inequality constraints on generalised logits (local, global, continuation or reverse continuation), generalised log-odds ratios and similar higher-order interactions. For each constrained model, the prior distribution of the model parameters is formulated following the encompassing prior approach. Then, model selection is performed by using Bayes factors which are estimated by an importance sampling method. The approach is illustrated through three applications involving some datasets, which also include explanatory variables. In connection with one of these examples, a sensitivity analysis to the prior specification is also considered.

10 citations
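
Under the encompassing prior approach, the prior of an inequality-constrained model is the encompassing prior truncated to the constrained region, and its Bayes factor against the encompassing model reduces to the ratio between the posterior and the prior probability of that region. The sketch below illustrates this with two binomial proportions, independent Beta(1, 1) priors, and the hypothetical constraint p1 < p2; it estimates both probabilities by plain Monte Carlo rather than by the importance sampling method used in the paper, and it does not involve generalised logits. The function name is hypothetical.

```python
import numpy as np

def encompassing_bf(x1, n1, x2, n2, draws=200_000, seed=0):
    """Bayes factor of 'p1 < p2' against the unconstrained (encompassing) model,
    with independent Beta(1, 1) priors: P(constraint | data) / P(constraint)."""
    rng = np.random.default_rng(seed)
    prior = np.mean(rng.beta(1, 1, draws) < rng.beta(1, 1, draws))
    post = np.mean(rng.beta(1 + x1, 1 + n1 - x1, draws)
                   < rng.beta(1 + x2, 1 + n2 - x2, draws))   # conjugate Beta posteriors
    return post / prior

bf_support = encompassing_bf(10, 50, 30, 50)   # data favour p1 < p2
bf_against = encompassing_bf(30, 50, 10, 50)   # data contradict p1 < p2
```

Because the prior probability of p1 < p2 is 0.5, this Bayes factor cannot exceed 2, however strongly the data support the constraint.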



Journal ArticleDOI
TL;DR: In this paper, modified profile likelihood methods are applied to estimate the structural parameters of econometric models for panel data, with a remarkable reduction of bias with respect to the ordinary likelihood methods.
Abstract: We show how modified profile likelihood methods, developed in the statistical literature, may be effectively applied to estimate the structural parameters of econometric models for panel data, with a remarkable reduction of bias compared with the ordinary likelihood methods. The implementation of these methods is illustrated in detail for certain static and dynamic models which are commonly used in economic applications. We consider, in particular, the truncated linear regression model, the first order autoregressive model, the (static and dynamic) logit model, and the (static and dynamic) probit model. Unlike static models, dynamic models include the lagged response variable among the regressors. For each of these models, we report the results of simulation studies showing the good behaviour of the proposed estimation methods, even with respect to an ideal, although infeasible, procedure. The methods are made available through an R package.

7 citations
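
The bias that likelihood adjustments of this kind remove is easiest to see in the textbook Neyman-Scott case, a linear panel model y_it = alpha_i + e_it with e_it ~ N(0, sigma^2), which is not one of the models treated in the paper. Profiling out the alpha_i gives sigma^2_hat = RSS/(nT), which tends to sigma^2 (T - 1)/T when n grows with T fixed. The Cox-Reid adjustment, a close relative of the modified profile likelihood, adds (n/2) log sigma^2 to the profile log-likelihood and yields RSS/(n(T - 1)). The sketch below uses these closed forms directly instead of numerical maximization; the function name and the simulated design are hypothetical.

```python
import numpy as np

def variance_estimates(y):
    """y: (n, T) array, one row per individual with its own intercept alpha_i.
    Returns (profile ML estimate, adjusted profile estimate) of sigma^2."""
    n, T = y.shape
    # residual sum of squares after profiling out each alpha_i (the row mean)
    rss = np.sum((y - y.mean(axis=1, keepdims=True)) ** 2)
    ml = rss / (n * T)                # tends to sigma^2 * (T - 1) / T
    adjusted = rss / (n * (T - 1))    # maximizer of the adjusted profile likelihood
    return ml, adjusted

rng = np.random.default_rng(0)
n, T = 2000, 3
alpha = 3.0 * rng.normal(size=(n, 1))      # incidental parameters
y = alpha + rng.normal(size=(n, T))        # true sigma^2 = 1
ml, adjusted = variance_estimates(y)
```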


Posted Content
TL;DR: In this article, a causal analysis of the mother's educational level on the health status of the newborn, in terms of gestational weeks and weight, is presented, based on a finite mixture structural equation model, the parameters of which have a causal interpretation.
Abstract: We propose a causal analysis of the effect of the mother’s educational level on the health status of the newborn, in terms of gestational weeks and weight. The analysis is based on a finite mixture structural equation model, the parameters of which have a causal interpretation. The model is applied to a dataset of almost ten thousand deliveries collected in an Italian region. The analysis confirms that standard regression overestimates the impact of education on child health. With respect to the current economic literature, our findings indicate that only high education has positive consequences on child health, implying that policy efforts in education should have benefits for welfare.

Posted Content
TL;DR: In this paper, a multidimensional latent class Rasch model is described, together with its application to data about the measurement of some aspects of health-related quality of life and anxiety and depression in oncological patients.
Abstract: The work describes a multidimensional latent class Rasch model and its application to data about the measurement of some aspects of Health-related Quality of Life and Anxiety and Depression in oncological patients.

Journal ArticleDOI
TL;DR: An algorithm for item selection, which is aimed at finding the smallest subset of items which provides an amount of information close to that of the initial set of items, is proposed.
Abstract: The evaluation of nursing homes and the assessment of the quality of the health care provided to their patients are usually based on the administration of questionnaires made of a large number of polytomous items. In applications involving data collected by questionnaires of this type, the Latent Class (LC) model represents a useful tool for classifying subjects in homogeneous groups. In this paper, we propose an algorithm for item selection, which is based on the LC model. The proposed algorithm is aimed at finding the smallest subset of items which provides an amount of information close to that of the initial set. The method sequentially eliminates the items that do not significantly change the classification of the subjects in the sample with respect to the classification based on the full set of items. The LC model, and then the item selection algorithm, may also be used with missing responses that are dealt with assuming a form of latent ignorability. The potential of the proposed approach is illustrated through an application to a nursing home dataset collected within the ULISSE project, which concerns the quality-of-life of elderly patients hosted in Italian nursing homes. The dataset presents several issues, such as missing responses and a very large number of items included in the questionnaire.
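
A greedy version of the elimination step can be sketched under loud simplifications: binary items, latent class parameters held fixed at their full-model values instead of being re-estimated, no missing responses, and agreement measured as the share of subjects assigned to the same class as with the full item set. The 0.95 threshold and all function names are hypothetical choices, not the criterion used in the paper.

```python
import numpy as np

def classify(Y, w, lam, items):
    """Modal posterior class of every subject using only the listed items."""
    L = lam[items]                                   # (len(items), k)
    logpost = np.log(w) + Y[:, items] @ np.log(L) + (1 - Y[:, items]) @ np.log(1 - L)
    return logpost.argmax(axis=1)

def backward_item_selection(Y, w, lam, threshold=0.95):
    """Repeatedly drop the item whose removal best preserves the full-item
    classification, as long as the agreement stays at or above `threshold`."""
    items = list(range(Y.shape[1]))
    full = classify(Y, w, lam, items)
    while len(items) > 1:
        scores = [np.mean(classify(Y, w, lam, [i for i in items if i != j]) == full)
                  for j in items]
        best = int(np.argmax(scores))
        if scores[best] < threshold:
            break                                    # every removal changes too much
        items.pop(best)
    return items
```

In a toy case where one item dominates the classification and another has the same success probability in every class, the procedure keeps only the dominant item.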

Posted Content
TL;DR: It is shown how the null hypothesis of symmetry may be formulated in terms of a normal mixture model, with weights about the center of symmetry constrained to be equal to one another; this model is nested in a more general unconstrained one.
Abstract: Given a random sample of observations, mixtures of normal densities are often used to estimate the unknown continuous distribution from which the data come. Here we propose the use of this semiparametric framework for testing symmetry about an unknown value. More precisely, we show how the null hypothesis of symmetry may be formulated in terms of a normal mixture model, with weights about the centre of symmetry constrained to be equal to one another. The resulting model is nested in a more general unconstrained one, with the same number of mixture components and free weights. Therefore, after maximising the constrained and unconstrained log-likelihoods by means of a suitable algorithm, such as the Expectation-Maximisation algorithm, symmetry is tested against skewness through a likelihood ratio statistic. The performance of the proposed mixture-based test is illustrated through a Monte Carlo simulation study, where we compare two versions of the test, based on different criteria to select the number of mixture components, with the traditional one based on the third standardised moment. An illustrative example based on real data is also given.
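
The mechanics of the likelihood ratio test can be sketched with a deliberately simplified mixture: the centre is fixed at the sample median, the 2m + 1 component means sit on a fixed symmetric grid with a common known standard deviation, and EM estimates only the weights. Under the null hypothesis, weights of components placed symmetrically about the centre are forced to be equal, giving m equality constraints, and the statistic is referred, as an assumption, to a chi-square distribution with m degrees of freedom. The paper also estimates the centre and the component parameters and selects the number of components, so this illustrates the nesting idea only; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, norm

def em_weights(x, means, sd, w, pairs=None, n_iter=500):
    """EM over the mixture weights only; means and sd stay fixed. If `pairs`
    is given, weights of components symmetric about the centre are tied."""
    dens = norm.pdf(x[:, None], loc=means[None, :], scale=sd)   # (n, K)
    for _ in range(n_iter):
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)        # E-step: responsibilities
        w = r.mean(axis=0)                       # M-step without constraints
        if pairs is not None:
            for a, b in pairs:                   # M-step under symmetry
                w[a] = w[b] = 0.5 * (w[a] + w[b])
    return w, float(np.sum(np.log(dens @ w)))

def symmetry_lr_test(x, m=2):
    """LR test of symmetry about the sample median with 2m + 1 fixed components."""
    centre, h = np.median(x), np.std(x)
    means = centre + h * np.arange(-m, m + 1)
    pairs = [(i, 2 * m - i) for i in range(m)]
    start = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    w0, l0 = em_weights(x, means, h, start, pairs)   # constrained fit (H0)
    _, l1 = em_weights(x, means, h, w0.copy())       # unconstrained, started at H0 fit
    lr = 2.0 * (l1 - l0)
    return lr, chi2.sf(lr, df=m)
```

Starting the unconstrained EM from the constrained estimate guarantees, through the monotonicity of EM, that the statistic is non-negative.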

Posted Content
TL;DR: A class of multidimensional Item Response Theory models for polytomously-scored items with ordinal response categories is proposed and a strategy for model selection that is based on a series of steps consisting of selecting specific features, such as the dimension of the model, the number of latent classes, and the specific parameterization is suggested.
Abstract: We propose a class of Item Response Theory models for items with ordinal polytomous responses, which extends an existing class of multidimensional models for dichotomously-scored items measuring more than one latent trait. In the proposed approach, the random vector used to represent the latent traits is assumed to have a discrete distribution with support points corresponding to different latent classes in the population. We also allow for different parameterizations for the conditional distribution of the response variables given the latent traits - such as those adopted in the Graded Response model, in the Partial Credit model, and in the Rating Scale model - depending on both the type of link function and the constraints imposed on the item parameters. For the proposed models we outline how to perform maximum likelihood estimation via the Expectation-Maximization algorithm. Moreover, we suggest a strategy for model selection which is based on a series of steps consisting of selecting specific features, such as the number of latent dimensions, the number of latent classes, and the specific parametrization. In order to illustrate the proposed approach, we analyze data deriving from a study on anxiety and depression as perceived by oncological patients.
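
The parameterizations mentioned above differ in the logits they model. A minimal sketch for one item and one value of the latent trait (in this class of models, theta ranges over the support points of the discrete latent distribution): the graded response form uses global (cumulative) logits, while the partial credit form uses local (adjacent-category) logits; the rating scale model would further write each step parameter as an item effect plus a category effect shared across items. Function names and argument layout are hypothetical.

```python
import numpy as np

def grm_probs(theta, gamma, thresholds):
    """Graded response (global logits): P(Y >= y) = logistic(gamma * (theta - b_y)).
    `thresholds` must be increasing; returns P(Y = 0), ..., P(Y = m - 1)."""
    b = np.asarray(thresholds)
    surv = 1 / (1 + np.exp(-gamma * (theta - b)))      # P(Y >= 1), ..., P(Y >= m - 1)
    cum = np.concatenate(([1.0], surv, [0.0]))
    return cum[:-1] - cum[1:]                          # differences of cumulative probabilities

def pcm_probs(theta, steps):
    """Partial credit (local logits): log[P(Y = y) / P(Y = y - 1)] = theta - d_y."""
    logits = np.concatenate(([0.0], np.cumsum(theta - np.asarray(steps))))
    p = np.exp(logits - logits.max())
    return p / p.sum()

p_grm = grm_probs(0.3, 1.5, [-1.0, 0.0, 1.2])   # four ordered categories
p_pcm = pcm_probs(0.3, [-1.0, 0.0, 1.2])
```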

Posted Content
TL;DR: In this paper, the authors propose an approach based on nested hidden Markov chains, which are associated with every sample unit and every cluster, and make inference on the proposed model through a composite likelihood function based on all the possible pairs of subjects within every cluster.
Abstract: In the context of multilevel longitudinal data, where sample units are collected in clusters, an important aspect that should be accounted for is the unobserved heterogeneity between sample units and between clusters. To this end, we propose an approach based on nested hidden (latent) Markov chains, which are associated with every sample unit and every cluster. The approach allows us to account for the mentioned forms of unobserved heterogeneity in a dynamic fashion; it also allows us to account for the correlation which may arise between the responses provided by the units belonging to the same cluster. Given the complexity in computing the manifest distribution of these response variables, we make inference on the proposed model through a composite likelihood function based on all the possible pairs of subjects within every cluster. The proposed approach is illustrated through an application to a dataset concerning a sample of Italian workers in which a binary response variable for the worker receiving an illness benefit was repeatedly observed.
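
The pairwise composite likelihood can be sketched on a much simpler latent structure than the nested hidden Markov chains of the paper: a single discrete latent class per cluster, shared by all units in the cluster, with binary responses conditionally independent given that class. Only the construction carries over, namely a sum, over clusters and over all pairs of units within each cluster, of the log of the bivariate marginal; the names and the latent structure are hypothetical simplifications.

```python
import itertools
import numpy as np

def unit_lik(y, lam):
    """p(y_i | cluster class c) for a binary response vector y_i, for every class c.
    lam[t, c] = P(response t equals 1 | class c)."""
    y = np.asarray(y)[:, None]
    return np.prod(lam ** y * (1 - lam) ** (1 - y), axis=0)

def pairwise_cl(clusters, w, lam):
    """Pairwise composite log-likelihood: for every cluster and every pair of
    its units, add log p(y_i, y_j) = log sum_c w_c p(y_i | c) p(y_j | c)."""
    total = 0.0
    for units in clusters:
        for yi, yj in itertools.combinations(units, 2):
            total += np.log(w @ (unit_lik(yi, lam) * unit_lik(yj, lam)))
    return float(total)

w = np.array([0.4, 0.6])                              # cluster-level class weights
lam = np.array([[0.8, 0.3], [0.6, 0.1], [0.7, 0.4]])  # three occasions, two classes
a, b = [1, 0, 1], [1, 1, 0]                           # responses of two units
```

For a cluster with only two units the composite and full log-likelihoods coincide; with larger clusters only bivariate marginals are ever needed, which is what makes the approach feasible when the full manifest distribution is expensive.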

23 May 2012
TL;DR: In this article, a multidimensional latent class IRT model is proposed, in which the missingness mechanism is driven by a latent variable (propensity to answer) correlated with the latent variable for the ability (or abilities) measured by the test items.
Abstract: A relevant problem in applications of Item Response Theory (IRT) models is the presence of nonignorable missing responses. We propose a multidimensional latent class IRT model in which the missingness mechanism is driven by a latent variable (propensity to answer) correlated with the latent variable for the ability (or abilities) measured by the test items. These two latent variables are assumed to have a joint discrete distribution. This assumption is convenient both from the computational point of view and for the decisional process, since individuals are classified in homogeneous latent classes which may be associated with the same treatment. Moreover, this assumption avoids parametric formulations for the distribution of the latent variables, giving rise to a semiparametric model. The proposed approach is illustrated through an application to data coming from a Students’ Entry Test for the admission to the courses in Economics in an Italian University.
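
How a shared discrete latent distribution makes missingness nonignorable can be shown through the probability of an observed pattern. In the sketch below each latent class c carries both a success probability for every item and a probability of answering it, so an omitted item still carries information about class membership and therefore about ability. The array layout, names, and values are hypothetical, and the logistic parameterization of the model is left out: `lam` and `q` are taken as given.

```python
import numpy as np

def observed_pattern_prob(y, w, lam, q):
    """P(observed answers and response indicators) when ability and propensity
    to answer share one discrete latent distribution with class weights w.
    y: 0/1 for answered items, None for omitted ones.
    lam[j, c] = P(correct answer to item j | c); q[j, c] = P(item j answered | c)."""
    cond = np.ones(len(w))
    for j, yj in enumerate(y):
        if yj is None:
            cond = cond * (1 - q[j])                                   # item omitted
        else:
            cond = cond * q[j] * (lam[j] if yj == 1 else 1 - lam[j])   # item answered
    return float(w @ cond)

w = np.array([0.5, 0.3, 0.2])
lam = np.array([[0.8, 0.5, 0.2], [0.7, 0.4, 0.1]])   # classes ordered by ability
q = np.array([[0.95, 0.85, 0.6], [0.9, 0.7, 0.4]])   # lower ability, more omissions
```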

Posted Content
08 Oct 2012
TL;DR: In this article, the causal effect of a sequential binary treatment (typically corresponding to a policy or a subsidy in the economic context) on a final outcome, when the treatment assignment at a given occasion depends on the sequence of previous assignments as well as on time-varying confounders, is considered.
Abstract: We consider estimation of the causal effect of a sequential binary treatment (typically corresponding to a policy or a subsidy in the economic context) on a final outcome, when the treatment assignment at a given occasion depends on the sequence of previous assignments as well as on time-varying confounders. In this case, a popular modeling strategy is represented by Marginal Structural Models; within this approach, the causal effect of the treatment is estimated by the Inverse Probability Weighting (IPW) estimator, which is consistent provided that all the confounders are observed (sequential ignorability). To alleviate this serious limitation, we propose a new estimator, called Latent Class Inverse Probability Weighting (LC-IPW), which is based on two steps: first, a finite mixture model is fitted in order to compute latent-class-specific weights; then, these weights are used to fit the Marginal Structural Model of interest. A simulation study shows that the LC-IPW estimator outperforms the IPW estimator for all the considered configurations, even in cases of no unobserved confounding. The proposed approach is applied to the estimation of the causal effect of wage subsidies on employment, using a dataset of Finnish firms observed for eight years. The LC-IPW estimate confirms the existence of a positive effect, but its magnitude is nearly halved with respect to the IPW estimate, pointing out the substantial role of unobserved confounding in this setting.
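
The standard IPW step that the LC-IPW estimator builds on can be sketched as follows: each subject is weighted by the inverse of the probability of the treatment sequence actually received, and the Marginal Structural Model is then fitted by weighted least squares. The example assumes a linear MSM in the cumulative number of treated occasions and uses the true treatment probabilities of a hypothetical simulated design in which the confounder is observed; the LC-IPW estimator would instead rely on latent-class-specific weights obtained from a finite mixture model, which is not implemented here.

```python
import numpy as np

def ipw_msm(y, A, p_treat):
    """IPW fit of the MSM E[Y^a] = b0 + b1 * sum_t a_t.
    A: (n, T) binary treatments; p_treat: (n, T) values of P(A_t = 1 | past)."""
    p_obs = np.where(A == 1, p_treat, 1 - p_treat)   # prob. of the treatment received
    wts = 1.0 / p_obs.prod(axis=1)                   # inverse probability weights
    X = np.column_stack([np.ones(len(y)), A.sum(axis=1)])
    XtW = X.T * wts                                  # weighted least squares
    return np.linalg.solve(XtW @ X, XtW @ y)

# Hypothetical design: a confounder U raises both treatment uptake and the outcome;
# treatment at each occasion also depends on the previous assignment.
rng = np.random.default_rng(4)
n = 20000
U = rng.integers(0, 2, size=n)
A = np.zeros((n, 2), dtype=int)
P = np.zeros((n, 2))
for t in range(2):
    prev = A[:, t - 1] if t > 0 else 0
    P[:, t] = 0.2 + 0.4 * U + 0.2 * prev
    A[:, t] = rng.random(n) < P[:, t]
y = A.sum(axis=1) + 2 * U + rng.normal(size=n)     # true causal slope is 1
b_ipw = ipw_msm(y, A, P)
b_naive = ipw_msm(y, A, np.full((n, 2), 0.5))      # constant weights: ordinary least squares
```

With constant weights the fit reduces to ordinary least squares, which in this design overstates the effect because U is left uncorrected.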

15 May 2012
TL;DR: In this article, the Inverse Probability Weighted (IPW) estimator, through which a Marginal Structural Model for the causal effect of a sequential treatment can be fitted, is extended to account for unobserved pre-treatment confounders, which are represented by a discrete latent variable.
Abstract: The causal effect of a sequential treatment can be assessed via a Marginal Structural Model fitted by the Inverse Probability Weighted (IPW) estimator. We extend the IPW estimator to account for unobserved pre-treatment confounders, representing them by a discrete latent variable. Therefore, we estimate the probability of receiving the treatment by a latent class model, deriving a Latent Class Inverse Probability Weighted (LC-IPW) estimator. In a simulation study the new estimator outperforms the standard estimator even when there is no unobserved confounding. The proposed approach is applied to the estimation of causal effects of wage subsidies on employment, using a dataset of Finnish firms observed for eight years.

Posted Content
TL;DR: An approach based on nested hidden (latent) Markov chains, associated with every sample unit and every cluster, is proposed; it accounts for unobserved heterogeneity in a dynamic fashion and for the correlation which may arise between the responses provided by the units belonging to the same cluster.
Abstract: In the context of multilevel longitudinal data, where sample units are collected in clusters, an important aspect that should be accounted for is the unobserved heterogeneity between sample units and between clusters. To this end, we propose an approach based on nested hidden (latent) Markov chains, which are associated with every sample unit and every cluster. The approach allows us to account for the mentioned forms of unobserved heterogeneity in a dynamic fashion; it also allows us to account for the correlation which may arise between the responses provided by the units belonging to the same cluster. Given the complexity in computing the manifest distribution of these response variables, we make inference on the proposed model through a composite likelihood function based on all the possible pairs of subjects within every cluster. The proposed approach is illustrated through an application to a dataset concerning a sample of Italian workers in which a binary response variable for the worker receiving an illness benefit was repeatedly observed.

Posted Content
01 Jan 2012
TL;DR: In this article, the authors proposed an algorithm for item selection based on the Latent Class (LC) model, which is aimed at finding the smallest subset of items which provides an amount of information close to that of the initial set.
Abstract: The evaluation of nursing homes and the assessment of the quality of the health care provided to their patients are usually based on the administration of questionnaires made of a large number of polytomous items. In applications involving data collected by questionnaires of this type, the Latent Class (LC) model represents a useful tool for classifying subjects in homogeneous groups. In this paper, we propose an algorithm for item selection, which is based on the LC model. The proposed algorithm is aimed at finding the smallest subset of items which provides an amount of information close to that of the initial set. The method sequentially eliminates the items that do not significantly change the classification of the subjects in the sample with respect to the classification based on the full set of items. The LC model, and then the item selection algorithm, may also be used with missing responses that are dealt with assuming a form of latent ignorability. The potential of the proposed approach is illustrated through an application to a nursing home dataset collected within the ULISSE project, which concerns the quality-of-life of elderly patients hosted in Italian nursing homes. The dataset presents several issues, such as missing responses and a very large number of items included in the questionnaire.

Posted Content
TL;DR: In this article, a causal analysis of the mother's educational level on the health status of the newborn, in terms of gestational weeks and weight, is presented, based on a finite mixture structural equation model, the parameters of which have a causal interpretation.
Abstract: We propose a causal analysis of the effect of the mother’s educational level on the health status of the newborn, in terms of gestational weeks and weight. The analysis is based on a finite mixture structural equation model, the parameters of which have a causal interpretation. The model is applied to a dataset of almost ten thousand deliveries collected in an Italian region. The analysis confirms that standard regression overestimates the impact of education on child health. With respect to the current economic literature, our findings indicate that only high education has positive consequences on child health, implying that policy efforts in education should have benefits for welfare.

Posted Content
TL;DR: In this article, the causal effect of a sequential binary treatment on a final outcome was investigated, where the treatment assignment at a given time depends on the sequence of previous assignments as well as on time-varying confounders.
Abstract: We consider estimation of the causal effect of a sequential binary treatment (typically corresponding to a policy or a subsidy in the economic context) on a final outcome, when the treatment assignment at a given occasion depends on the sequence of previous assignments as well as on time-varying confounders. In this case, a popular modeling strategy is represented by Marginal Structural Models; within this approach, the causal effect of the treatment is estimated by the Inverse Probability Weighting (IPW) estimator, which is consistent provided that all the confounders are observed (sequential ignorability). To alleviate this serious limitation, we propose a new estimator, called Latent Class Inverse Probability Weighting (LC-IPW), which is based on two steps: first, a finite mixture model is fitted in order to compute latent-class-specific weights; then, these weights are used to fit the Marginal Structural Model of interest. A simulation study shows that the LC-IPW estimator outperforms the IPW estimator for all the considered configurations, even in cases of no unobserved confounding. The proposed approach is applied to the estimation of the causal effect of wage subsidies on employment, using a dataset of Finnish firms observed for eight years. The LC-IPW estimate confirms the existence of a positive effect, but its magnitude is nearly halved with respect to the IPW estimate, pointing out the substantial role of unobserved confounding in this setting.

Posted Content
TL;DR: The observed information matrix of hidden Markov models is derived by applying the identity of Oakes (1999); the method requires only the first derivative of the forward-backward recursions of Baum and Welch (1970), instead of the second derivative of the forward recursion required within the approach of Lystig and Hughes (2002).
Abstract: We derive the observed information matrix of hidden Markov models by applying the identity of Oakes (1999). The method only requires the first derivative of the forward-backward recursions of Baum and Welch (1970), instead of the second derivative of the forward recursion, which is required within the approach of Lystig and Hughes (2002). The method is illustrated by an example based on the analysis of a longitudinal dataset which is well known in sociology.
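
Oakes' identity expresses the second derivative of the observed log-likelihood through the EM function Q(theta' | theta): l''(theta) = [d2Q/dtheta'^2 + d2Q/dtheta' dtheta] evaluated at theta' = theta. The sketch below checks it numerically on the simplest missing-data model, the weight p of a two-component normal mixture with known components, rather than on a hidden Markov model, where the paper obtains the required derivatives through the forward-backward recursions. The data and names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def loglik(p, f1, f2):
    """Observed log-likelihood of the mixture weight p."""
    return np.sum(np.log(p * f1 + (1 - p) * f2))

def oakes_info(p, f1, f2):
    """Observed information -l''(p) via Oakes' identity, at p' = p."""
    m = p * f1 + (1 - p) * f2
    r = p * f1 / m                                   # E-step posterior probabilities
    d2q_pp = -np.sum(r / p**2 + (1 - r) / (1 - p) ** 2)
    dr_dp = f1 * f2 / m**2                           # derivative of r w.r.t. the current p
    d2q_cross = np.sum(dr_dp * (1 / p + 1 / (1 - p)))
    return -(d2q_pp + d2q_cross)

rng = np.random.default_rng(5)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 200)])
f1, f2 = norm.pdf(x, 0, 1), norm.pdf(x, 3, 1)        # known component densities
p, h = 0.55, 1e-4
info_oakes = oakes_info(p, f1, f2)
# central finite-difference approximation of -l''(p) for comparison
info_numeric = -(loglik(p + h, f1, f2) - 2 * loglik(p, f1, f2) + loglik(p - h, f1, f2)) / h**2
```

The identity holds at any value of p, not only at the maximum likelihood estimate, so the comparison does not require running EM first.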

Journal ArticleDOI
TL;DR: In this article, a causal analysis of the mother's educational level on the health status of the newborn, in terms of gestational weeks and weight, is presented, based on a finite mixture structural equation model, the parameters of which have a causal interpretation.
Abstract: We propose a causal analysis of the effect of the mother’s educational level on the health status of the newborn, in terms of gestational weeks and weight. The analysis is based on a finite mixture structural equation model, the parameters of which have a causal interpretation. The model is applied to a dataset of almost ten thousand deliveries collected in an Italian region. The analysis confirms that standard regression overestimates the impact of education on child health. With respect to the current economic literature, our findings indicate that only high education has positive consequences on child health, implying that policy efforts in education should have benefits for welfare.

Posted Content
TL;DR: This paper investigates whether the assumptions of unidimensionality and absence of Differential Item Functioning hold for two national tests administered in Italy to middle school students in June 2009: the Italian Test and the Mathematics Test.
Abstract: Within the educational context, students’ assessment tests are routinely validated through Item Response Theory (IRT) models which assume unidimensionality and absence of Differential Item Functioning (DIF). In this paper, we investigate whether such assumptions hold for two national tests administered in Italy to middle school students in June 2009: the Italian Test and the Mathematics Test. To this end, we rely on an extended class of multidimensional latent class IRT models characterised by: (i) a two-parameter logistic parameterisation for the conditional probability of a correct response, (ii) latent traits represented through a random vector with a discrete distribution, and (iii) the inclusion of (uniform) DIF to account for students’ gender and geographical area. A classification of the items into unidimensional groups is also proposed and represented by a dendrogram, which is obtained from a hierarchical clustering algorithm. The results provide evidence for DIF effects for both Tests. Moreover, the assumption of unidimensionality is strongly rejected for the Italian Test, whereas it is reasonable for the Mathematics Test.
Keywords: EM algorithm; Hierarchical clustering; Item Response Theory; Multidimensional latent variable models; Two-parameter logistic parameterisation.
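
Uniform DIF can be illustrated in the two-parameter logistic parameterisation by adding a group effect to the logit that does not interact with the latent trait. The sketch below uses a single binary grouping variable and hypothetical parameter values (the paper uses covariates for gender and geographical area); it shows that the log odds ratio between groups is then the same at every support point of the discrete latent trait, which is what makes the DIF uniform.

```python
import numpy as np

def prob_correct(theta, gamma, beta, delta, group):
    """2PL with uniform DIF (hypothetical parameterisation):
    logit P(Y = 1 | theta, group) = gamma * (theta - beta) + delta * group,
    with group 0 (reference) or 1 (focal)."""
    return 1 / (1 + np.exp(-(gamma * (theta - beta) + delta * group)))

theta = np.array([-1.5, 0.0, 1.0, 2.5])        # support points of the latent trait
p_ref = prob_correct(theta, 1.2, 0.3, -0.7, 0)
p_foc = prob_correct(theta, 1.2, 0.3, -0.7, 1)
# log odds ratio between groups: equal to delta at every ability level
log_or = np.log(p_foc / (1 - p_foc)) - np.log(p_ref / (1 - p_ref))
```

A non-uniform DIF effect would instead let the group modify the discrimination, so the log odds ratio would change with theta.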

Posted Content
TL;DR: In this paper, the causal effect of a treatment in two-arm experimental studies with possible non-compliance in both treatment and control arms is estimated by a two-step estimator: at the first step the probability that a subject belongs to one of the three subpopulations is estimated on the basis of the available covariates; at the second step the causal effects are estimated through a conditional logistic method, the implementation of which depends on the results from the first stage.
Abstract: Motivated by a study about prompt coronary angiography in myocardial infarction, we propose a method to estimate the causal effect of a treatment in two-arm experimental studies with possible non-compliance in both treatment and control arms. The method is based on a causal model for repeated binary outcomes (before and after the treatment), which includes individual covariates and latent variables for the unobserved heterogeneity between subjects. Moreover, given the type of non-compliance, the model assumes the existence of three subpopulations of subjects: compliers, never-takers, and always-takers. The model is estimated by a two-step estimator: at the first step the probability that a subject belongs to one of the three subpopulations is estimated on the basis of the available covariates; at the second step the causal effects are estimated through a conditional logistic method, the implementation of which depends on the results from the first step. Standard errors for this estimator are computed on the basis of a sandwich formula. The application shows that prompt coronary angiography in patients with myocardial infarction may significantly decrease the risk of other events within the next two years, with an effect of about -2 on the log-odds scale. Given that non-compliance is significant for patients who are given the treatment because of high-risk conditions, classical estimators fail to detect, or at least underestimate, this effect.
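The first step of the estimator is needed because stratum membership is only partly observed. The sketch below makes that concrete. It lists which of the three subpopulations are compatible with each combination of assigned arm (Z) and treatment actually received (D). It assumes the standard principal-stratification setup with no defiers, which is implied by the abstract's restriction to compliers, never-takers, and always-takers. The names are illustrative.

```python
from enum import Enum
from typing import Set


class Stratum(Enum):
    COMPLIER = "complier"
    NEVER_TAKER = "never-taker"
    ALWAYS_TAKER = "always-taker"


def compatible_strata(assigned: int, received: int) -> Set[Stratum]:
    """Strata consistent with assigned arm Z and received treatment D (no defiers)."""
    if assigned == 1 and received == 1:
        # Took the treatment when assigned to it: complier or always-taker.
        return {Stratum.COMPLIER, Stratum.ALWAYS_TAKER}
    if assigned == 1 and received == 0:
        # Refused the treatment when assigned to it: only a never-taker.
        return {Stratum.NEVER_TAKER}
    if assigned == 0 and received == 1:
        # Took the treatment when assigned to control: only an always-taker.
        return {Stratum.ALWAYS_TAKER}
    if assigned == 0 and received == 0:
        # No treatment in the control arm: complier or never-taker.
        return {Stratum.COMPLIER, Stratum.NEVER_TAKER}
    raise ValueError("assigned and received must be 0 or 1")
```

Two of the four observed cells mix two strata, so membership has to be modelled probabilistically, for example from covariates as in the first step described above. As a reading aid, an effect of -2 on the log-odds scale corresponds to multiplying the odds by exp(-2), which is about 0.14.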

Posted Content
TL;DR: A decomposition of the h-index is introduced, which is nowadays the leading criterion to assess the relevance of a scientist in his/her research field, and is illustrated by an application based on data concerning a group of top level economists.
Abstract: I introduce a decomposition of the h-index, which is nowadays the leading criterion to assess the relevance of a scientist in his/her research field. According to the proposed decomposition, the h-index is the product of two indicators: the first measures the impact of the scientist on the research community, and the second may be seen as a measure of how concentrated the citations are on a small number of papers. The decomposition is illustrated by an application based on data concerning a group of top level economists.
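The abstract does not spell out the two indicators, so the decomposition itself is not reproduced here. The sketch below only computes the quantity being decomposed. It uses the standard definition of the h-index: the largest h such that h papers have at least h citations each.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h
```

For example, a record with citation counts `[10, 8, 5, 4, 3]` has h = 4. Four papers have at least 4 citations, but the fifth has only 3.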

Posted Content
TL;DR: In this article, the authors show that the h-index and the g-index may be seen as concentration indices and propose transformations that make them always range between two known limits, which correspond to the situation of null concentration and to that of high concentration.
Abstract: We show that the h-index and the g-index, which are commonly used to measure the research productivity of a scientist, may be seen as concentration indices. For these indices we also propose transformations that make them always range between two known limits, which correspond to the situation of null concentration and to that of high concentration. The approach is illustrated by an application to data from the banking sector in the USA.
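The abstract does not give the proposed normalising transformations, so they are not reproduced here. For reference, the sketch below computes the g-index with its usual definition: the largest g such that the g most-cited papers together have at least g² citations. One assumption: this version caps g at the number of papers, whereas some definitions allow g to exceed it by padding with zero-citation papers.

```python
from typing import Sequence


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers have at least g**2 citations in total."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    g = 0
    cumulative = 0
    for rank, count in enumerate(ranked, start=1):
        cumulative += count
        if cumulative >= rank * rank:
            g = rank
    return g
```

For citation counts `[10, 8, 5, 4, 3]`, the cumulative totals are 10, 18, 23, 27, 30, and all five reach the corresponding square (1, 4, 9, 16, 25). So g = 5, compared with h = 4 for the same record.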

Posted Content
TL;DR: In this paper, an algorithm for item selection based on the Latent Class (LC) model was proposed to find the smallest subset of items which provides an amount of information close to that of the initial set.
Abstract: The evaluation of nursing homes and the assessment of the quality of the health care provided to their patients are usually based on the administration of questionnaires made of a large number of polytomous items. In applications involving data collected by questionnaires of this type, the Latent Class (LC) model represents a useful tool for classifying subjects into homogeneous groups. In this paper, we propose an algorithm for item selection, which is based on the LC model. The proposed algorithm is aimed at finding the smallest subset of items which provides an amount of information close to that of the initial set. The method sequentially eliminates the items that do not significantly change the classification of the subjects in the sample with respect to the classification based on the full set of items. The LC model, and therefore the item selection algorithm, can also be used with missing responses, which are handled by assuming a form of latent ignorability. The potential of the proposed approach is illustrated through an application to a nursing home dataset collected within the ULISSE project, which concerns the quality of life of elderly patients hosted in Italian nursing homes. The dataset presents several issues, such as missing responses and a very large number of items included in the questionnaire.
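The elimination loop described above can be sketched as a greedy backward search. At each round it drops the item whose removal leaves the classification closest to the full-set classification. It stops when every possible removal would change that classification too much. Three things are simplified assumptions and not the paper's method. First, "significant change" is replaced by a plain agreement threshold. Second, the classifier is passed in as a function. Third, `toy_sum_score_classifier` is a hypothetical stand-in for a fitted LC model, used only so the sketch runs.

```python
from typing import Callable, List, Sequence

# A classifier maps (responses, indices of the items to use) to one class label
# per subject. In the paper this role is played by a fitted LC model.
Classifier = Callable[[Sequence[Sequence[int]], List[int]], List[int]]


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of subjects assigned to the same class in both classifications."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def backward_item_selection(
    responses: Sequence[Sequence[int]],
    classify: Classifier,
    threshold: float = 0.95,
) -> List[int]:
    """Greedily drop items while the classification stays close to the full-set one."""
    selected = list(range(len(responses[0])))
    reference = classify(responses, selected)  # classification with all items
    while len(selected) > 1:
        best_item, best_agree = None, -1.0
        for item in selected:
            candidate = [j for j in selected if j != item]
            agree = agreement(reference, classify(responses, candidate))
            if agree > best_agree:
                best_item, best_agree = item, agree
        if best_agree < threshold:  # every removal changes the classification too much
            break
        selected.remove(best_item)
    return selected


def toy_sum_score_classifier(responses: Sequence[Sequence[int]], items: List[int]) -> List[int]:
    """Hypothetical stand-in for an LC model: class 1 if mean score on items >= 0.5."""
    return [int(sum(row[j] for j in items) / len(items) >= 0.5) for row in responses]
```

With `responses = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]`, the toy classifier keeps only item 1. That single item reproduces the full-set classification exactly. With a real LC model, class labels would also need to be matched across fits (label switching) before agreement is computed.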