
Showing papers on "Ordinal regression" published in 2001


Book ChapterDOI
05 Sep 2001
TL;DR: This paper presents a simple method that enables standard classification algorithms to make use of ordering information in class attributes and shows that it outperforms the naive approach, which treats the class values as an unordered set.
Abstract: Machine learning methods for classification problems commonly assume that the class values are unordered. However, in many practical applications the class values do exhibit a natural order--for example, when learning how to grade. The standard approach to ordinal classification converts the class value into a numeric quantity and applies a regression learner to the transformed data, translating the output back into a discrete class value in a post-processing step. A disadvantage of this method is that it can only be applied in conjunction with a regression scheme. In this paper we present a simple method that enables standard classification algorithms to make use of ordering information in class attributes. By applying it in conjunction with a decision tree learner we show that it outperforms the naive approach, which treats the class values as an unordered set. Compared to special-purpose algorithms for ordinal classification our method has the advantage that it can be applied without any modification to the underlying learning scheme.

565 citations
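
The abstract above does not spell out the method. A minimal sketch, assuming the commonly described decomposition of a K-class ordinal problem into K-1 binary "is the class greater than k?" problems, is given here. The function names and the example probabilities are hypothetical, and any probabilistic binary classifier could supply the P(y > k) estimates.

```python
def binary_targets(labels, k):
    """Relabel ordinal classes (integers 0..K-1) for the binary task 'class > k'."""
    return [1 if y > k else 0 for y in labels]


def ordinal_class_probs(p_greater):
    """Combine estimates P(y > k), k = 0..K-2, into K class probabilities.

    p_greater[k] is the output of the k-th binary model. Because the binary
    models are trained independently, a difference can come out slightly
    negative in practice; this sketch does not correct for that.
    """
    probs = [1.0 - p_greater[0]]                        # lowest class
    for k in range(1, len(p_greater)):
        probs.append(p_greater[k - 1] - p_greater[k])   # middle classes
    probs.append(p_greater[-1])                         # highest class
    return probs


def predict_ordinal(p_greater):
    """Return the index of the class with the largest combined probability."""
    probs = ordinal_class_probs(p_greater)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of three binary models for a four-class problem.
    print(ordinal_class_probs([0.9, 0.6, 0.2]))
    print(predict_ordinal([0.9, 0.6, 0.2]))
```

The underlying learner is untouched in this scheme: only the class labels are transformed before training and the outputs combined afterwards, which matches the abstract's claim that no modification of the learning scheme is needed.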


Journal ArticleDOI
TL;DR: Four approaches to factor analysis of ordinal variables that take proper account of ordinality are described, three of them are compared with respect to parameter estimates and fit, and the issue of how to test the model and measure model fit is discussed.
Abstract: Theory and methodology for exploratory factor analysis have been well developed for continuous variables. In practice, observed or measured variables are often ordinal. However, ordinality is most often ignored and numbers such as 1, 2, 3, 4, representing ordered categories, are treated as numbers having metric properties, a procedure which is incorrect in several ways. In this article we describe four approaches to factor analysis of ordinal variables which take proper account of ordinality and compare three of them with respect to parameter estimates and fit. The comparison is made both in terms of their relative methodological advantages and in terms of an empirical data example and two generated data examples. In particular, we discuss the issue of how to test the model and to measure model fit.

380 citations


Journal ArticleDOI
TL;DR: This article shows how item response theory can be used to rescale ordinal data, in which the differences among scale values are unequal and permit only a rank ordering of scores, to an interval scale.
Abstract: Many statistical procedures used in educational research are described as requiring that dependent variables follow a normal distribution, implying an interval scale of measurement. Despite the desirability of interval scales, many dependent variables possess an ordinal scale of measurement in which the differences among values composing the scale are unequal in terms of what is being measured, permitting only a rank ordering of scores. This means that data possessing an ordinal scale will not satisfy the assumption of normality needed in many statistical procedures and may produce biased statistical results that threaten the validity of inferences. This article shows how the measurement technique known as item response theory can be used to rescale ordinal data to an interval scale. The authors provide examples of rescaling using student performance data and argue that educational researchers should routinely consider rescaling ordinal data using item response theory.

163 citations


Book ChapterDOI
01 Jan 2001
TL;DR: The proportional odds model allows for a continuous response and generalizes the Wilcoxon-Mann-Whitney rank test, which makes the semiparametric proportional odds model a direct competitor of ordinary linear models.
Abstract: Many medical and epidemiologic studies incorporate an ordinal response variable. In some cases an ordinal response Y represents levels of a standard measurement scale such as severity of pain (none, mild, moderate, severe). In other cases, ordinal responses are constructed by specifying a hierarchy of separate endpoints. For example, clinicians may specify an ordering of the severity of several component events and assign patients to the worst event present from among none, heart attack, disabling stroke, and death. Still another use of ordinal response methods is the application of rank-based methods to continuous responses so as to obtain robust inferences. For example, the proportional odds model described later allows for a continuous Y and is really a generalization of the Wilcoxon–Mann–Whitney rank test. Thus the semiparametric proportional odds model is a direct competitor of ordinary linear models.

132 citations
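
As an illustration of the proportional odds model named in the abstract above, the following sketch computes category probabilities for a four-level response such as pain severity (none, mild, moderate, severe). It assumes one common parametrization, P(Y <= j | x) = logistic(alpha_j - eta) with increasing cutpoints alpha_j and linear predictor eta; sign conventions differ between texts and software, and the cutpoint values used here are made up.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def proportional_odds_probs(cutpoints, eta):
    """Category probabilities under P(Y <= j) = logistic(alpha_j - eta).

    cutpoints: increasing alpha_1 < ... < alpha_{K-1} (assumed values).
    eta:       linear predictor x'beta, shared by every cutpoint, which is
               the proportional odds assumption.
    """
    cumulative = [logistic(a - eta) for a in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


if __name__ == "__main__":
    levels = ["none", "mild", "moderate", "severe"]
    for eta in (0.0, 2.0):
        probs = proportional_odds_probs([-1.0, 0.0, 1.0], eta)
        print(eta, dict(zip(levels, (round(p, 3) for p in probs))))
```

Under this convention a larger eta moves probability mass toward the higher categories, while the log odds of every cumulative split shift by the same amount.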


Book ChapterDOI
30 Sep 2001
TL;DR: Variants of the S-CART tree induction algorithm for ordinal classification are compared on benchmark data sets to study the trade-off between categorical classification accuracy (hit rate) and minimum distance-based error; preliminary results indicate that this is a promising avenue towards algorithms that combine aspects of classification and regression.
Abstract: This paper is devoted to the problem of learning to predict ordinal (i.e., ordered discrete) classes using classification and regression trees. We start with S-CART, a tree induction algorithm, and study various ways of transforming it into a learner for ordinal classification tasks. These algorithm variants are compared on a number of benchmark data sets to verify the relative strengths and weaknesses of the strategies and to study the trade-off between optimal categorical classification accuracy (hit rate) and minimum distance-based error. Preliminary results indicate that this is a promising avenue towards algorithms that combine aspects of classification and regression.

126 citations


Journal ArticleDOI
TL;DR: In this paper, Markov chain Monte Carlo (MCMC) algorithms based on the approach of Albert and Chib (1993, Journal of the American Statistical Association 88, 669-679) are developed for fitting sequential ordinal models.
Abstract: Summary. This paper considers the class of sequential ordinal models in relation to other models for ordinal response data. Markov chain Monte Carlo (MCMC) algorithms, based on the approach of Albert and Chib (1993, Journal of the American Statistical Association 88, 669–679), are developed for the fitting of these models. The ideas and methods are illustrated in detail with a real data example on the length of hospital stay for patients undergoing heart surgery. A notable aspect of this analysis is the comparison, based on marginal likelihoods and training sample priors, of several nonnested models, such as the sequential model, the cumulative ordinal model, and Weibull and log-logistic models.

96 citations
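
The abstract above does not define the sequential ordinal model. As a rough sketch, assuming the usual stage-wise formulation in which a subject either stops in category j or continues past it, with P(Y = j | Y >= j) = F(gamma_j - eta), the category probabilities can be computed as follows. The probit link, the threshold values, and the function names are assumptions made for illustration; the MCMC fitting described in the abstract is not attempted here.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, used here as an assumed probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sequential_probs(thresholds, eta, cdf=normal_cdf):
    """Category probabilities for a sequential (stage-wise) ordinal model.

    thresholds: one gamma_j per non-final category (assumed values).
    eta:        linear predictor x'beta.
    """
    probs = []
    still_going = 1.0                  # probability of passing all earlier stages
    for gamma in thresholds:
        stop_here = cdf(gamma - eta)   # P(Y = j | Y >= j)
        probs.append(still_going * stop_here)
        still_going *= 1.0 - stop_here
    probs.append(still_going)          # the final category takes what remains
    return probs


if __name__ == "__main__":
    print(sequential_probs([0.0, 0.0], 0.0))  # [0.5, 0.25, 0.25]
```

In contrast to a cumulative model, each stage here has its own conditional probability, which is why this formulation is a natural fit for duration-like outcomes such as length of hospital stay.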


Journal ArticleDOI
TL;DR: Practical guidelines for the design and analysis of trials with HRQoL measures as outcomes are presented, noting that conventional parametric methods of estimation and hypothesis testing may not be appropriate for such ordinal outcomes.
Abstract: Health Related Quality of Life (HRQoL) measures are becoming more frequently used in clinical trials, as both primary and secondary endpoints. Investigators are now asking statisticians for advice on how to plan (e.g., sample size) and analyze studies using HRQoL measures. HRQoL measures such as the SF-36 are usually measured on an ordered categorical (ordinal) scale. In the designing stages and when analyzing, the scales are often scored and the scores treated as if they were continuous and normally distributed. However the ordinal scaling of HRQoL measures leads to problems in determining sample size, and conventional parametric methods of estimation and hypothesis testing may not be appropriate for such outcomes. We present practical guidelines for the design and analysis of trials with HRQoL measures as outcomes. We used conventional statistical methods (i.e., t-tests and multiple regression), various ordinal regression models (proportional odds, continuation ratio, polytomous and stereotype) and bootstrap methods to analyze an HRQoL dataset. To illustrate the various methods we used HRQoL data on the SF-36 Role Limitations Emotional dimension for two groups of patients with leg ulcers. The bootstrap, t-test, and multiple regression methods gave similar results. The various ordinal regression models also gave similar results. If the HRQoL measure has a large number of ordered categories, most of which are occupied, and the underlying scale really is continuous but measured imperfectly by an instrument with a limited number of discrete values, then an informal rule of thumb is that this discrete scale should be treated as continuous if it has seven or more categories and as ordinal otherwise.

72 citations


Proceedings ArticleDOI
05 Oct 2001
TL;DR: Ordinal association rules, an extension of Boolean association rules that incorporates ordinal relationships among data items, are introduced, and a method that finds these rules and uses them to identify potential errors in data is proposed.
Abstract: A new extension of the Boolean association rules, ordinal association rules, that incorporates ordinal relationships among data items, is introduced. One use for ordinal rules is to identify possible errors in data. A method that finds these rules and identifies potential errors in data is proposed.

47 citations


Journal ArticleDOI
TL;DR: This study shows that NN-DSS models perform significantly better than the Naive, MDA, and OLGR models on the ECM criteria, and provide better results than MDA and OLGR on other criteria, although not always significantly better.
Abstract: Many accounting and finance problems require ordinal multi-state classification decisions, (e.g., control risk, bond rating, financial distress, etc.), yet few decision support systems are available to aid decision makers in such tasks. In this study, we develop a Neural Network based decision support system (NN-DSS) to classify firms in four ordinal states of financial condition namely healthy, dividend reduction, debt default and bankrupt. The classification results of the NN-DSS model are compared with those of a Naive model, a Multiple Discriminant Analysis (MDA) model, and an Ordinal Logistic Regression (OLGR) model. Four different evaluation criteria are used to compare the models, namely, simple classification accuracy, distance-weighted classification accuracy, expected cost of misclassification (ECM) and ranked probability score. Our study shows that NN-DSS models perform significantly better than the Naive, MDA, and OLGR models on the ECM criteria, and provide better results than MDA and OLGR on other criteria, although not always significantly better. The effect of the proportion of firms of each state in the training set is also studied. A balanced training set leads to more uniform (less skewed) classification across all four states, whereas an unbalanced training set biases the classification results in favor of the state with the largest number of observations.

26 citations
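
One of the four evaluation criteria named in the abstract above, the ranked probability score, can be sketched as below. The abstract gives no formula, so this assumes the usual definition for a single case: the sum of squared differences between the cumulative predicted and cumulative observed distributions over the ordered states (some authors additionally divide by K - 1). The example forecasts are invented.

```python
from itertools import accumulate


def ranked_probability_score(pred_probs, true_index):
    """Ranked probability score for one case; lower is better.

    pred_probs: predicted probabilities over the ordered states.
    true_index: index of the observed state.
    """
    cum_pred = list(accumulate(pred_probs))
    cum_obs = [1.0 if k >= true_index else 0.0 for k in range(len(pred_probs))]
    return sum((p - o) ** 2 for p, o in zip(cum_pred, cum_obs))


if __name__ == "__main__":
    states = ["healthy", "dividend reduction", "debt default", "bankrupt"]
    truth = states.index("bankrupt")
    near_miss = [0.0, 0.0, 1.0, 0.0]   # all mass on "debt default"
    far_miss = [1.0, 0.0, 0.0, 0.0]    # all mass on "healthy"
    print(ranked_probability_score(near_miss, truth))
    print(ranked_probability_score(far_miss, truth))
```

Unlike simple classification accuracy, this score penalizes a prediction more the further it lies from the observed state, which is the property that makes it suitable for ordinal outcomes.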


Posted Content
TL;DR: This insert describes the Stereotype Ordinal Regression (SOR) model, an alternative form of ordinal regression model that can be thought of as imposing ordering constraints on a multinomial model.
Abstract: There are a number of reasonable approaches to analysing an ordinal outcome variable. One common approach, known as the Proportional Odds (PO) Model, is implemented in Stata as ologit. If the assumptions of the PO model are not satisfied, an alternative is to treat the outcome as categorical, rather than ordinal, and use multinomial logistic regression (mlogit) in Stata. This insert describes an alternative form of ordinal regression model, the Stereotype Ordinal Regression (SOR) Model, which can be thought of as imposing ordering constraints on a multinomial model. The multinomial model provides the best possible fit to the data, at the cost of a large number of parameters which can be difficult to interpret. Stereotype regression aims to reduce the number of parameters by imposing constraints, without reducing the adequacy of the fit.

20 citations
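
To make the idea of "ordering constraints on a multinomial model" in the abstract above concrete, here is a minimal sketch. It assumes a one-dimensional stereotype parametrization in which category j has score alpha_j + phi_j * eta, with a single linear predictor eta shared by all categories and monotone scale values phi_j; an unconstrained multinomial model would instead give each category its own coefficient vector. The parametrization, sign convention, and numbers are assumptions, and this is not the Stata implementation.

```python
import math


def stereotype_probs(alphas, phis, eta):
    """Category probabilities with scores alpha_j + phi_j * eta.

    alphas: category intercepts (assumed values).
    phis:   non-decreasing scale values; the ordering constraint is what
            separates this from an unconstrained multinomial model.
    eta:    single linear predictor x'beta shared by every category.
    """
    if any(later < earlier for earlier, later in zip(phis, phis[1:])):
        raise ValueError("phis must be non-decreasing to respect the ordering")
    scores = [a + phi * eta for a, phi in zip(alphas, phis)]
    top = max(scores)                      # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(stereotype_probs([0.0, 0.0, 0.0], [0.0, 0.5, 1.0], 2.0))
```

Two adjacent categories with nearly equal phi values are barely distinguished by the predictors, which is one way such a model can reveal which levels of the outcome the covariates actually separate.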


Journal ArticleDOI
TL;DR: The ordinal hierarchical classes model is shown to subsume Coombs and Kao's model for nonmetric factor analysis and an algorithm is described to fit the model to a given data set and is subsequently evaluated in an extensive simulation study.
Abstract: This paper proposes an ordinal generalization of the hierarchical classes model originally proposed by De Boeck and Rosenberg (1988). Any hierarchical classes model implies a decomposition of a two-way two-mode binary array M into two component matrices, called bundle matrices, which represent the association relation and the set-theoretical relations among the elements of both modes in M. Whereas the original model restricts the bundle matrices to be binary, the ordinal hierarchical classes model assumes that the bundles are ordinal variables with a prespecified number of values. This generalization results in a classification model with classes ordered along ordinal dimensions. The ordinal hierarchical classes model is shown to subsume Coombs and Kao's (1955) model for nonmetric factor analysis. An algorithm is described to fit the model to a given data set and is subsequently evaluated in an extensive simulation study. An application of the model to student housing data is discussed.

01 Jan 2001
TL;DR: A novel approach to shape similarity estimation based on ordinal correlation is presented, suitable for use in CBIR and produced encouraging results when applied on the MPEG-7 test data.
Abstract: In this paper we present a novel approach to shape similarity estimation based on ordinal correlation. The proposed method operates in three steps: object alignment, contour to multilevel image transformation and similarity evaluation. This approach is suitable for use in CBIR. The proposed technique produced encouraging results when applied on the MPEG-7 test data.

Journal ArticleDOI
TL;DR: The authors propose a test that modifies the Pearson chi-squared statistic to incorporate ordinal information; unlike tests based on scorings of the ordinal levels, the proposed statistic does not depend on the choice of scores.
Abstract: The Pearson chi-squared statistic for testing the equality of two multinomial populations when the categories are nominal is much less appropriate for ordinal categories. Test statistics typically used in this context are based on scorings of the ordinal levels, but the results of these tests are highly dependent on the choice of scores. The authors propose a test which naturally modifies the Pearson chi-squared statistic to incorporate the ordinal information. The proposed test statistic does not depend on the scores and, under the null hypothesis of equality of populations, is asymptotically equivalent to the likelihood ratio test against the alternative of two-sided likelihood ratio ordering.

Journal ArticleDOI
TL;DR: A random effect ordinal regression model was applied to data sets from two randomized controlled intervention trials that measured graded-scale, non-independent responses; in the second trial, a borderline significant increase in role-functioning scale scores was observed over the follow-up period.
Abstract: Cluster randomization is often used in intervention trials, yet when individuals nested within clusters are considered as the units of analysis for outcome evaluation, it cannot be assumed that the observations are statistically independent. Observations that are not statistically independent also result when repeated measures are taken over time for the same individual. Ignoring clustered observations when performing data analysis can lead to the erroneous conclusion that the intervention under study had a statistically significant effect. Moreover, individual responses are often collected on ordinal scales; thus models for continuous or categorical data are usually not appropriate. We applied a random effect ordinal regression model to data sets from two randomized controlled intervention trials that measured graded scale non-independent responses. The first trial compared two school programmes for AIDS prevention in terms of impact (i.e., changes in the frequency of condom use). The second trial used the MOS-HIV questionnaire to measure the quality of life of new AIDS cases four times over a one-year follow-up period (only results of the role-functioning scale are reported). Regarding the first data set, the effect of the intervention was not significant, and the post-intervention frequency of condom use was mainly attributable to the pre-intervention frequency (p < 0.01), with no differences among schools. Regarding the second data set, a borderline significant increase in the role-functioning scale scores was observed over the follow-up period; the results differed only slightly by intervention group; a significant (p < 0.01) intra-individual correlation of 0.4 was found.

Book ChapterDOI
19 Sep 2001
TL;DR: This paper addresses the problem of ranking alternatives in a multiple criteria decision making problem by the use of a compensatory aggregation operator, where scores are given on a finite ordinal scale.
Abstract: We present in this paper an attempt to deal with ordinal information in a strict ordinal framework. We address the problem of ranking alternatives in a multiple criteria decision making problem by the use of a compensatory aggregation operator, where scores are given on a finite ordinal scale. Necessary and sufficient conditions for the existence of a representation are given.

Journal ArticleDOI
TL;DR: In this article, the authors compare power curves between 3- and 7-category ordinal logistic regression models in terms of the probability of detecting the treatment effect, assuming a symmetric distribution or skewed distributions for the placebo group.
Abstract: For clinical trials on neurodegenerative diseases such as Parkinson's or Alzheimer's, the distributions of psychometric measures for both placebo and treatment groups are generally skewed because of the characteristics of the diseases. Through an analytical, but computationally intensive, algorithm, we specifically compare power curves between 3- and 7-category ordinal logistic regression models in terms of the probability of detecting the treatment effect, assuming a symmetric distribution or skewed distributions for the placebo group. The proportional odds assumption under the ordinal logistic regression model plays an important role in these comparisons. The results indicate that there is no significant difference in the power curves between 3-category and 7-category response models where a symmetric distribution is assumed for the placebo group. However, when the skewness becomes more extreme for the placebo group, the loss of power can be substantial.


Book ChapterDOI
13 Jun 2001
TL;DR: The ordinal regression problem, or ordination, is formulated from the viewpoint of a recently defined learning architecture based on support vectors, the K-SVCR learning machine, specially developed to handle multiple classes.
Abstract: The ordinal regression problem, or ordination, has mixed features of both the classification and the regression problems, so it can be seen as an independent problem class. The particular behaviour of this sort of problem should be explicitly considered by the learning machines working on it. In this paper the ordination problem is formulated from the viewpoint of a recently defined learning architecture based on support vectors, the K-SVCR learning machine, specially developed to handle multiple classes. In this study its definition is compared to other existing results in the literature.

Book ChapterDOI
01 Jan 2001
TL;DR: The purpose of this paper is to examine some of the possibilities of constructing and manipulating models with ordinal variables and the kinds of things that can and cannot legitimately be accomplished.
Abstract: Ever since Fisher and Pareto discovered at the end of the nineteenth century that increasing transformations of utility functions have no impact on the consumer demand functions derived from them, and hence that utility could be understood as being ordinal in character, economists have been familiar with ordinal measurement. Today, many other variables like product quality and effort, to name but two, find ordinal expression in economic models. Often, however, these variables have been treated as if they were cardinal variables, and this, in turn, has been shown to create a significant potential for errors and misconceptions (Katzner [5 (this volume, Essay 8)]). Models containing ordinally measured variables really do need to be constructed and manipulated keeping the presence of that ordinality clearly in mind. For such models frequently cannot maintain the same meaning, significance, and explanatory power as those whose variables are all cardinally or ratio calibrated, and the ability to manipulate the ordinal variables in them is often severely limited. Thus it is natural to ask about the kinds of things that can and cannot legitimately be accomplished when constructing and manipulating models with ordinal variables. The purpose of this paper is to examine some of the possibilities.

Journal ArticleDOI
01 Jan 2001
TL;DR: This paper focuses on methods for ordinal categorical data with repeated measures that can be implemented using SAS, and compares the strengths and weaknesses of these different methods.
Abstract: Recent advances in statistical software made possible by the rapid development of computer technology in the past decade have made many new procedures available to data analysts. We focus in this paper on methods for ordinal categorical data with repeated measures that can be implemented using SAS. These procedures are illustrated using data from an animal health experiment. The responses, measured as severity of symptoms on an ordinal scale, are recorded for test animals over time. The experiment was designed to estimate treatment and time effects on the severity of symptoms. The data were analyzed with various approaches using PROC MIXED, PROC NLMIXED, PROC GENMOD, and the GLIMMIX macro. In this paper, we compare the strengths and weaknesses of these different methods.

Book ChapterDOI
28 Nov 2001

Journal ArticleDOI
TL;DR: A new regression model is developed for the analysis of scored ordinal data that enables one to capture and identify nonlinear aspects of the relationship between an ordinal clinical measurement and risk factors.
Abstract: In this paper, we develop new regression models for the analysis of scored ordinal data (i.e. ordinal outcomes where the categories are assigned numeric values). The novel feature of these models is that they enable one to capture and identify nonlinear aspects of the relationship between an ordinal clinical measurement (used for disease diagnosis) and risk factors. These nonlinearities may be useful in generating hypotheses about the risk factor's role in the etiologic process as well as suggesting how to design future studies of the risk factor. We apply our model to study the effects of race, gender, and family history on alcohol dependence among a cohort of lifetime drinkers from the 1992 National Longitudinal Alcohol Epidemiologic Survey.

Journal ArticleDOI
TL;DR: In this article, a latent variable model is considered for the analysis of twin data with an ordinal response, where the underlying latent multivariate normally distributed variable is expressed in terms of genetic and environmental effects, and the variance components associated with these effects are estimated.
Abstract: A latent variable model is considered for the analysis of twin data with an ordinal response. The underlying latent multivariate normally distributed variable is expressed in terms of genetic and environmental effects, and the variance components associated with these effects are estimated. We illustrate this approach with analysis of the NHLBI Twin Study. Model assessment is ascertained by proposing a goodness-of-fit test for ordered categorical data. Extensions of this approach for the investigation of how genetic effects vary over time are discussed.


Journal ArticleDOI
TL;DR: In this paper, some new indices for ordinal data are introduced, which measure the degree of concentration on the "small" or the "large" values of a variable whose level of measurement is ordinal.
Abstract: In this paper, some new indices for ordinal data are introduced. These indices have been developed so as to measure the degree of concentration on the "small" or the "large" values of a variable whose level of measurement is ordinal. Their advantage in relation to other approaches is that they ascribe unequal weights to each class of values. Although they constitute a useful tool in various fields of application, the focus here is on their use in sample surveys and more specifically in situations where one is interested in taking into account the "distance" of the responses from the "neutral" category in a given question. The properties of these indices are examined and methods for constructing confidence intervals for their actual values are discussed. The performance of these methods is evaluated through an extensive simulation study.

Journal ArticleDOI
TL;DR: Three common factor models are proposed for the analysis of k x k ordinal data arising from test validity or reliability situations, which represent an extension of the polychoric correlation model and item response theory.
Abstract: Three common factor models are proposed for the analysis of k x k ordinal data arising from test validity or reliability situations. These models represent an extension of the polychoric correlation model and item response theory. Identification is complete in the most usual reliability situation, where data from only two indicators (raters) are available. Full maximum likelihood estimation is available together with associated informative deviance tests and goodness-of-fit tests, examples of which are provided.

Posted Content
01 Jan 2001
TL;DR: Cumulative odds, continuation ratio, multinomial, and stereotype models were fitted to a radiographic dataset to predict the severity of joint damage; the assumptions of the cumulative odds and continuation ratio models were not satisfied, while a highly constrained stereotype model provided a good fit.
Abstract: There are a number of methods of analyzing data that consists of several distinct categories, with the categories ordered in some manner. Analysis of such data is commonly based on a generalized linear model of the cumulative response probability, either the cumulative odds model (ologit) or the continuation ratio model (ocratio). However, these models assume a particular relationship between the predictor variables and the outcome. If these assumptions are not met, a multinomial model, which does not make such assumptions, can be fitted instead. This effectively ignores the ordering of the categories. It has the disadvantage that it requires more parameters than the above models, which makes it more difficult to interpret. An alternative model for ordinal data is the stereotype model. This has been little used in the past, as it is quite difficult to fit. It can be thought of as a constrained multinomial model, although some of the constraints applied are nonlinear. An ado-file to fit this model in Stata has recently been developed. I will present analyses of a radiographic dataset, where the aim was to predict the severity of joint damage. All four of the above models were fitted to the data. The assumptions of the cumulative odds and continuation ratio models were not satisfied. A highly constrained stereotype model provided a good fit. Importantly, it showed that different variables were important for discriminating between different levels of the outcome variable.