
Showing papers on "Ordinal regression published in 2006"


BookDOI
01 Jan 2006
TL;DR: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas.
Abstract: Regression models are frequently used to develop diagnostic, prognostic, and health resource utilization models in clinical, health services, outcomes, pharmacoeconomic, and epidemiologic research, and in a multitude of non-health-related areas. Regression models are also used to adjust for patient heterogeneity in randomized clinical trials, to obtain tests that are more powerful and valid than unadjusted treatment comparisons.

4,211 citations


Journal ArticleDOI
TL;DR: gologit2, as discussed by the authors, is a program for fitting generalized ordered logit models, inspired by Vincent Fu's gologit routine (Stata Technical Bulletin Reprints 8: 160-164).
Abstract: This article describes the gologit2 program for generalized ordered logit models. gologit2 is inspired by Vincent Fu's gologit routine (Stata Technical Bulletin Reprints 8: 160–164) and is backward...

1,805 citations


Journal ArticleDOI
TL;DR: The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection.
Abstract: Introduction:We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic reg

261 citations


Proceedings Article
04 Dec 2006
TL;DR: This framework not only allows the design of good ordinal regression algorithms based on well-tuned binary classification approaches, but also yields new generalization bounds for ordinal regression from known bounds for binary classification.
Abstract: We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier would then bound the mislabeling cost of the ranking rule. Our framework not only allows us to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also lets us derive new generalization bounds for ordinal regression from known bounds for binary classification. In addition, our framework unifies many existing ordinal regression algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms, which demonstrates the usefulness of our framework.
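The three-step reduction can be sketched compactly. The encoding below — appending a one-hot threshold indicator to each feature vector, with a hypothetical `binary_predict` callable standing in for any trained binary classifier — is one illustrative choice, not the paper's exact construction.

```python
import numpy as np

def make_extended(X, y, K):
    """Step 1: build one extended example (x, k) per threshold k = 1..K-1,
    with binary label [y > k]."""
    rows, labels = [], []
    for x, yi in zip(X, y):
        for k in range(1, K):
            # append a one-hot encoding of the threshold index to the features
            rows.append(np.concatenate([x, np.eye(K - 1)[k - 1]]))
            labels.append(1 if yi > k else 0)
    return np.array(rows), np.array(labels)

def rank_rule(binary_predict, x, K):
    """Step 3: predicted rank = 1 + number of thresholds the binary
    classifier believes x exceeds."""
    return 1 + sum(binary_predict(np.concatenate([x, np.eye(K - 1)[k - 1]]))
                   for k in range(1, K))
```

Step 2 is any binary learner trained on the output of `make_extended`; the framework's guarantee is that a weighted 0/1 loss of that learner bounds the mislabeling cost of `rank_rule`.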

254 citations


Journal ArticleDOI
TL;DR: Estimation of conditional logistic regression models for the Health Utilities Index Mark 2 and the SF-6D using ordinal preference data indicates that ordinal data have the potential to provide useful insights into community health state preferences.

131 citations


Journal ArticleDOI
TL;DR: This paper investigates whether the Braun-Blanquet abundance/dominance (AD) scores that commonly appear in phytosociological tables can properly be analysed by conventional multivariate analysis methods such as Principal Components Analysis and Correspondence Analysis.
Abstract: This article investigates whether the Braun-Blanquet abundance/dominance (AD) scores that commonly appear in phytosociological tables can properly be analysed by conventional multivariate analysis methods such as Principal Components Analysis and Correspondence Analysis. The answer is a definite NO. The source of problems is that the AD values express species performance on a scale, namely the ordinal scale, on which differences are not interpretable. There are several arguments suggesting that no matter which methods have been preferred in contemporary numerical syntaxonomy and why, ordinal data should be treated in an ordinal way. In addition to the inadmissibility of arithmetic operations with the AD scores, these arguments include interpretability of dissimilarities derived from ordinal data, consistency of all steps throughout the analysis and universality of the method which enables simultaneous treatment of various measurement scales. All the ordination methods that are commonly used, for ...

101 citations


Journal ArticleDOI
TL;DR: In this paper, the authors employ a Monte Carlo study to investigate the effects of coarse categorization of dependent variables on power to detect true effects using three classes of regression models: OLS regression, ordinal logistic regression, and ordinal probit regression.
Abstract: Variables that have been coarsely categorized into a small number of ordered categories are often modeled as outcome variables in psychological research. The authors employ a Monte Carlo study to investigate the effects of this coarse categorization of dependent variables on power to detect true effects using three classes of regression models: ordinary least squares (OLS) regression, ordinal logistic regression, and ordinal probit regression. Both the loss of power and the increase in required sample size to regain the lost power are estimated. The loss of power and required sample size increase were substantial under conditions in which the coarsely categorized variable is highly skewed, has few categories (e.g., 2, 3), or both. Ordinal logistic and ordinal probit regression protect marginally better against power loss than does OLS regression.
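A stripped-down version of such a Monte Carlo comparison (OLS only, normal data, equal-probability category cuts — all simplifying assumptions, not the authors' full design) can be sketched as:

```python
import numpy as np

def slope_t(x, y):
    # t statistic for the OLS slope of y on x
    x, y = x - x.mean(), y - y.mean()
    b = (x @ y) / (x @ x)
    resid = y - b * x
    se = np.sqrt((resid @ resid) / (len(x) - 2) / (x @ x))
    return b / se

def power(n=100, beta=0.3, reps=500, categories=None, seed=1):
    """Fraction of simulated data sets in which the slope test rejects at
    the 5% level, optionally after coarsely categorizing the outcome."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        if categories:
            # equal-probability cutpoints: one (assumed) categorization scheme
            cuts = np.quantile(y, np.linspace(0, 1, categories + 1)[1:-1])
            y = np.searchsorted(cuts, y).astype(float)
        hits += abs(slope_t(x, y)) > 1.96
    return hits / reps
```

With n = 100 and a modest slope, the dichotomized outcome (`categories=2`) recovers noticeably less power than the continuous one, matching the pattern the study reports.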

92 citations


Journal ArticleDOI
TL;DR: A combination of analysis results from both of these models (adjusted SAQ scores and odds ratios) provides the most comprehensive interpretation of the data.

84 citations


Book
11 Dec 2006
TL;DR: A book-length guide to preparing, describing, and modeling data in SPSS, covering hypothesis testing, linear and loglinear regression models, discriminant analysis, binary logistic regression, ordinal regression, factor analysis, cluster analysis, the general linear model, and reliability analysis.
Abstract: Contents at a glance: Preparing data for analysis: introduction to SPSS; the data file; defining the data; creating new variables; transforming existing variables; checking data definitions; cleaning data. Describing data: tables; graphs; OLAP cubes; measures of central tendency and dispersion; standard scores; the normal distribution; measures of association. Testing simple hypotheses: basics of hypothesis testing; t-tests; oneway analysis of variance; multiple comparisons; nonparametric tests; chi-square tests; correlation; partial correlation. Building models: bivariate and multiple linear regression; loglinear models; discriminant analysis; binary logistic regression; ordinal regression; factor analysis; cluster analysis. Using the General Linear Model: univariate models; multivariate models; repeated measures. Analyzing scales: reliability analysis.

74 citations


Journal ArticleDOI
TL;DR: In this article, a mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for the analysis of multivariate ordinal outcomes in longitudinal studies.
Abstract: Summary. A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss–Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.

71 citations


Journal ArticleDOI
TL;DR: In this paper, current approaches to regression for ordinal data are reviewed and a new proposal is described which has the advantage of not assuming any latent continuous variable underlying the dependent ordinal variable.

Proceedings ArticleDOI
25 Jun 2006
TL;DR: Collaborative ordinal regression explores the dependency between ranking functions through a hierarchical Bayesian model, assigning a common Gaussian Process prior to all individual functions; empirical studies show that the collaborative model outperforms its individual counterpart in preference learning applications.
Abstract: Ordinal regression has become an effective way of learning user preferences, but most research focuses on single regression problems. In this paper we introduce collaborative ordinal regression, where multiple ordinal regression tasks are handled simultaneously. Rather than modeling each task individually, we explore the dependency between ranking functions through a hierarchical Bayesian model and assign a common Gaussian Process (GP) prior to all individual functions. Empirical studies show that our collaborative model outperforms the individual counterpart in preference learning applications.

Journal ArticleDOI
TL;DR: In this article a 1-v-1 tri-class Support Vector Machine (SVM) is presented and it is demonstrated that the final machine proposed allows ordinal regression as a form of decomposition procedure.
Abstract: The standard form for dealing with multi-class classification problems when bi-classifiers are used is to consider a two-phase (decomposition, reconstruction) training scheme. The most popular decomposition procedures are pairwise coupling (one versus one, 1-v-1), which considers a learning machine for each pair of classes, and the one-versus-all scheme (one versus all, 1-v-r), which takes into consideration each class versus the remaining classes. In this article a 1-v-1 tri-class Support Vector Machine (SVM) is presented. The expansion of the architecture of this machine into three categories specifically addresses the decomposition problem of how to prevent the loss of information which occurs in the usual 1-v-1 training procedure. The proposed machine, by means of a third class, allows all the information to be incorporated into the remaining training patterns when a multi-class problem is considered in the form of a 1-v-1 decomposition. Three general structures are presented where each improves some features from the precedent structure. In order to deal with multi-classification problems, it is demonstrated that the final machine proposed allows ordinal regression as a form of decomposition procedure. Examples and experimental results are presented which illustrate the performance of the new tri-class SV machine.

Journal ArticleDOI
TL;DR: Employing binary data generation is shown to be an effective method for simulating ordinal variates for a broad range of given marginals and pairwise associations.
Abstract: A method is described for simulating multivariate ordinal variates with specified marginal distributions and correlation structure. The method relies on simulating correlated binary variates as an intermediate step. After collapsing the ordinal levels to the binary ones, it is straightforward to obtain binary means. Corresponding binary correlations are computed via simulation in a way to ensure that re-conversion to the ordinal scale delivers the original distributional properties. Employing binary data generation is shown to be an effective method for simulating ordinal variates for a broad range of given marginals and pairwise associations.

Journal ArticleDOI
TL;DR: The problem of parameter estimation is solved through a simple pseudolikelihood, called pairwise likelihood, and this inferential methodology is successfully applied to the class of autoregressive ordered probit models.

Journal Article
TL;DR: A thresholded ensemble model for ordinal regression problems and two novel boosting approaches for constructing thresholded ensembles that have comparable performance to SVM-based algorithms, but enjoy the benefit of faster training.
Abstract: We propose a thresholded ensemble model for ordinal regression problems. The model consists of a weighted ensemble of confidence functions and an ordered vector of thresholds. We derive novel large-margin bounds of common error functions, such as the classification error and the absolute error. In addition to some existing algorithms, we also study two novel boosting approaches for constructing thresholded ensembles. Both our approaches not only are simpler than existing algorithms, but also have a stronger connection to the large-margin bounds. In addition, they have comparable performance to SVM-based algorithms, but enjoy the benefit of faster training. Experimental results on benchmark datasets demonstrate the usefulness of our boosting approaches.
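The prediction rule of a thresholded ensemble is compact; the confidence functions and weights below are placeholders for whatever a boosting procedure would produce:

```python
def ensemble_rank(x, confidence_fns, weights, thresholds):
    """Thresholded ensemble: score x with a weighted sum of confidence
    functions, then read off the rank from an ordered threshold vector."""
    score = sum(w * h(x) for w, h in zip(weights, confidence_fns))
    # predicted rank = 1 + number of thresholds the ensemble score exceeds
    return 1 + sum(score > t for t in thresholds)
```

The boosting approaches in the paper learn the weights and thresholds jointly so that the margins between ranks (distances from the score to the nearest thresholds) are large.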

Journal ArticleDOI
TL;DR: A new measure of dispersion is described as an indication of consensus and dissention that utilizes a probability distribution and the ordered ranking of categories in an ordinal scale distribution to yield a value confined to the unit interval.
Abstract: This article describes a new measure of dispersion as an indication of consensus and dissention. Building on the generally accepted Shannon entropy, this measure utilizes a probability distribution and the ordered ranking of categories in an ordinal scale distribution to yield a value confined to the unit interval. Unlike other measures that need to be normalized, this measure is always in the interval 0 to 1. The measure is typically applied to the Likert scale to determine degrees of agreement among ordinal-ranked categories when one is dealing with data collection and analysis, although other scales are possible. Using this measure, investigators can easily determine the proximity of ordinal data to consensus (agreement) or dissention. Consensus and dissention are defined relative to the degree of proximity of values constituting a frequency distribution on the ordinal scale measure. The authors identify a set of criteria that a measure must satisfy in order to be an acceptable indicator of consensus and show how the consensus measure satisfies all the criteria.
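Assuming the measure follows the Tastle–Wierman entropy-based form (an assumption; the article's exact notation is not reproduced here), a minimal sketch:

```python
import numpy as np

def consensus(p, x):
    """Entropy-style consensus on an ordinal scale.

    p: probability of each ordered category; x: the category scores.
    Returns 1 for total agreement, 0 for a 50/50 split on the extremes,
    and a value strictly between 0 and 1 otherwise.
    """
    p, x = np.asarray(p, float), np.asarray(x, float)
    mu = np.dot(p, x)                 # mean of the ordinal distribution
    width = x.max() - x.min()         # range of the scale
    nz = p > 0                        # convention: 0 * log(0) = 0
    return 1.0 + np.dot(p[nz], np.log2(1.0 - np.abs(x[nz] - mu) / width))
```

On a 5-point Likert distribution, all mass on one category gives consensus 1, while mass split evenly between the two extreme categories gives 0, with no post-hoc normalization needed.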

Journal ArticleDOI
TL;DR: Logistic regression is a method used to model data where the output is binary, nominal or ordinal and its use in modeling data from a business process involving customer feedback is demonstrated.
Abstract: Variation exists in all processes. Significant work has been done to identify and remove sources of variation in manufacturing processes resulting in large returns for companies. However, business process optimization is an area that has a large potential return for a company. Business processes can be difficult to optimize due to the nature of the output variables associated with them. Business processes tend to have output variables that are binary, nominal or ordinal. Examples of these types of output include whether a particular event occurred, a customer's color preference for a new product and survey questions that assess the extent of the survey respondent's agreement with a particular statement. Output variables that are binary, nominal or ordinal cannot be modeled using ordinary least-squares regression. Logistic regression is a method used to model data where the output is binary, nominal or ordinal. This article provides a review of logistic regression and demonstrates its use in modeling data from a business process involving customer feedback. Copyright © 2006 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper suggests an exact method to determine the finite-sample distribution of maximally selected chi-square statistics in this context and applies this method to a new data set describing pregnancy and birth for 811 babies.
Abstract: The association between a binary variable Y and a variable X having an at least ordinal measurement scale might be examined by selecting a cutpoint in the range of X and then performing an association test for the obtained 2 x 2 contingency table using the chi-square statistic. The distribution of the maximally selected chi-square statistic (i.e. the maximal chi-square statistic over all possible cutpoints) under the null-hypothesis of no association between X and Y is different from the known chi-square distribution. In the last decades, this topic has been extensively studied for continuous X variables, but not for non-continuous variables of at least ordinal measurement scale (which include e.g. classical ordinal or discretized continuous variables). In this paper, we suggest an exact method to determine the finite-sample distribution of maximally selected chi-square statistics in this context. This novel approach can be seen as a method to measure the association between a binary variable and variables having an at least ordinal scale of different types (ordinal, discretized continuous, etc). As an illustration, this method is applied to a new data set describing pregnancy and birth for 811 babies.
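The maximization itself is easy to sketch (helper names here are illustrative); the paper's contribution is the exact finite-sample null distribution of this statistic, which is not chi-square with 1 degree of freedom:

```python
import numpy as np

def chi2_2x2(a, b, c, d):
    # Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0

def max_selected_chi2(x, y):
    """Maximal chi-square over all cutpoints of an (at least) ordinal x,
    for a binary y."""
    best = 0.0
    for cut in np.unique(x)[:-1]:          # every possible dichotomization
        left = x <= cut
        a = np.sum(left & (y == 1)); b = np.sum(left & (y == 0))
        c = np.sum(~left & (y == 1)); d = np.sum(~left & (y == 0))
        best = max(best, chi2_2x2(a, b, c, d))
    return best
```

Comparing this maximum against the usual chi-square(1) quantiles would inflate the type I error, which is exactly why the exact distribution is needed.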

Journal ArticleDOI
TL;DR: In this paper, the authors present and justify R2O, an explained variation measure for ordinal response models, which is based on a recent ordinal dispersion measure, and compare it with other ordinal R2 measures.
Abstract: No explained variation (R2) measure for ordinal response models enjoys wide use, and such measures have in fact received little specific attention or evaluation. To remedy this gap, the author presents and justifies R2O, an explained variation measure for ordinal response models, which is based on a recent ordinal dispersion measure. The use and function of R2O is illustrated, and the performance of it and other ordinal R2 measures is compared via a series of simulated sampling and variable selection experiments. The R2O and a bias-adjusted version of it perform well in the simulation experiments and have a number of other advantages that make them attractive as measures of fit for any ordinal response model.

Book ChapterDOI
09 Sep 2006
TL;DR: The ordinal regression, or preference learning, implements a kernel-defined feature space and an optimization technique by which the margin between rank boundaries is maximized, illustrated on some classical numerical optimization functions using an evolution strategy.
Abstract: Surrogate ranking in evolutionary computation using ordinal regression is introduced. The fitness of individual points is indirectly estimated by modeling their rank. The aim is to reduce the number of costly fitness evaluations needed for evolution. The ordinal regression, or preference learning, implements a kernel-defined feature space and an optimization technique by which the margin between rank boundaries is maximized. The technique is illustrated on some classical numerical optimization functions using an evolution strategy. The benefits of surrogate ranking, compared to surrogates that model the fitness function directly, are discussed.

Journal ArticleDOI
TL;DR: Adjustment for health behaviors and health status appreciably reduced SES influence on SRH, but adjustment for negative emotions did not, and detection of SRH determinants was sensitive to binary versus ordinal SRH definitions.
Abstract: This study evaluated whether negative emotions explain socioeconomic status (SES) stratification of self-rated health (SRH) and whether this putative relation is independent of established SRH determinants. Mood disorders, trait negative affect and health status indices were assessed in a representative cross-sectional survey of 3032 adults in the National Survey of Midlife Development in the United States (MIDUS). Adjustment for health behaviors and health status appreciably reduced SES influence on SRH, but adjustment for negative emotions did not. However, both psychological resources (e.g. social support, extraversion) and negative emotions independently predicted SRH. Detection of SRH determinants was sensitive to binary versus ordinal SRH definitions.

01 Jan 2006
TL;DR: The basic idea of factor analysis, as mentioned in this paper, is the following: for a given set of manifest variables x1, . . . , xp, one wants to find a set of latent variables ξ1, . . . , ξk, fewer in number than the manifest variables, that contain essentially the same information.
Abstract: The basic idea of factor analysis is the following. For a given set of manifest variables x1, . . . , xp one wants to find a set of latent variables ξ1, . . . , ξk, fewer in number than the manifest variables, that contain essentially the same information. The latent variables are supposed to account for the dependencies among the manifest variables in the sense that if the latent variables are held fixed, the manifest variables would be independent. Classical factor analysis assumes that both the manifest and the latent variables are continuous variables and is usually carried out by factor analyzing the sample covariance or correlation matrix of the manifest variables. There is a long history of methods for fitting factor models to a correlation or covariance matrix, see e.g., Joreskog (2006)

Journal ArticleDOI
TL;DR: In this paper, generalized estimating equations for correlated repeated ordinal score data are developed assuming a proportional odds model and a working correlation structure based on a first-order autoregressive process, and a new algorithm for the joint estimation of the model regression parameters and the correlation coefficient is developed.
Abstract: Generalized estimating equations for correlated repeated ordinal score data are developed assuming a proportional odds model and a working correlation structure based on a first-order autoregressive process. Repeated ordinal scores on the same experimental units, not necessarily with equally spaced time intervals, are assumed and a new algorithm for the joint estimation of the model regression parameters and the correlation coefficient is developed. Approximate standard errors for the estimated correlation coefficient are developed and a simulation study is used to compare the new methodology with existing methodology. The work was part of a project on post-harvest quality of pot-plants and the generalized estimating equation model is used to analyse data on poinsettia and begonia pot-plant quality deterioration over time. The relationship between the key attributes of plant quality and the quality and longevity of ornamental pot-plants during shelf and after-sales life is explored.

Journal ArticleDOI
16 Aug 2006-Heredity
TL;DR: A multivariate model for ordinal trait analysis is developed and an EM algorithm for parameter estimation is implemented, which turns out to be extremely similar to formulae seen in standard linear model analysis.
Abstract: Many economically important characteristics of agricultural crops are measured as ordinal traits. Statistical analysis of the genetic basis of ordinal traits appears to be quite different from regular quantitative traits. The generalized linear model methodology implemented via the Newton–Raphson algorithm offers improved efficiency in the analysis of such data, but does not take full advantage of the extensive theory developed in the linear model arena. Instead, we develop a multivariate model for ordinal trait analysis and implement an EM algorithm for parameter estimation. We also propose a method for calculating the variance-covariance matrix of the estimated parameters. The EM equations turn out to be extremely similar to formulae seen in standard linear model analysis. Computer simulations are performed to validate the EM algorithm. A real data set is analyzed to demonstrate the application of the method. The advantages of the EM algorithm over other methods are addressed. Application of the method to QTL mapping for ordinal traits is demonstrated using a simulated backcross (BC) population.

Book ChapterDOI
14 Jul 2006
TL;DR: The proposed method is able to handle non-linear ordering on the class and attribute values of classified objects and lies on the boundary between ordinal classification trees, classification trees with monotonicity constraints and multi-relational classification trees.
Abstract: Classification methods commonly assume unordered class values. In many practical applications – for example grading – there is a natural ordering between class values. Furthermore, some attribute values of classified objects can be ordered, too. The standard approach in this case is to convert the ordered values into a numeric quantity and apply a regression learner to the transformed data. This approach can be used only in the case of linear ordering. The proposed method for such a classification lies on the boundary between ordinal classification trees, classification trees with monotonicity constraints and multi-relational classification trees. The advantage of the proposed method is that it is able to handle non-linear ordering on the class and attribute values. For better understanding, we use a toy example from the semantic web environment – prediction of rules for the user's evaluation of hotels.

Journal ArticleDOI
TL;DR: In this article, the ordinal regression method was used to model the relationship between the behavioural outcome variable: consumer overall satisfaction in the food-marketing context and the most discussed marketing constructs such as perceived quality and perceived value.
Abstract: The ordinal regression method was used to model the relationship between the behavioural outcome variable: consumer overall satisfaction in the food-marketing context and the most discussed marketing constructs such as perceived quality and perceived value. Two alternative models were developed in order to lead to a better understanding of consumer satisfaction in the food-marketing context. Two new marketing constructs in the food-marketing literature (perceived technological risk and perceived environmental friendliness) were also included in the alternative models. The research results showed that consumer satisfaction items are better predicted by the ‘third model (III)’. We believe that the final findings of our research can advance retailers’ strategic efforts regarding consumer strategy management at the store level.

01 Jan 2006
TL;DR: An exact expression is derived for the volume under the ROC surface (VUS) spanned by the true positive rates for each class and its interpretation is shown as the probability that a randomly drawn sequence with one object of each class is correctly ranked.
Abstract: Ordinal regression learning has characteristics of both multi-class classification and metric regression because labels take ordered, discrete values. In applications of ordinal regression, the misclassification cost among the classes often differs and with different misclassification costs the common performance measures are not appropriate. Therefore we extend ROC analysis principles to ordinal regression. We derive an exact expression for the volume under the ROC surface (VUS) spanned by the true positive rates for each class and show its interpretation as the probability that a randomly drawn sequence with one object of each class is correctly ranked. Because the computation of VUS has a huge time complexity, we also propose three approximations to this measure. Furthermore, the properties of VUS and its relationship with the approximations are analyzed by simulation. The results demonstrate that optimizing various measures will lead to different models.
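The probabilistic interpretation can be checked with a naive Monte Carlo sketch (strict ordering, ties ignored — a simplification of the exact definition):

```python
import numpy as np

def vus_estimate(scores, labels, n_draws=2000, seed=0):
    """Monte Carlo estimate of the volume under the ROC surface: the
    probability that one randomly drawn object per class is placed in
    the correct class order by the model's scores."""
    rng = np.random.default_rng(seed)
    scores, labels = np.asarray(scores), np.asarray(labels)
    classes = np.sort(np.unique(labels))
    pools = [scores[labels == c] for c in classes]   # scores per class
    hits = 0
    for _ in range(n_draws):
        draw = [rng.choice(pool) for pool in pools]  # one object per class
        hits += all(draw[i] < draw[i + 1] for i in range(len(draw) - 1))
    return hits / n_draws
```

Because this sampling estimator is slow to make precise, the closed-form expression and the three cheaper approximations derived in the paper are what one would use in practice.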

Journal ArticleDOI
TL;DR: In this paper, it is shown that using conventional multivariate procedures for evaluating ordinal data should imply a shift from a metric space to a topological data space; as such the use of ordinal datasets does not represent a serious methodological error, provided that results are interpreted accordingly.
Abstract: In a recent Forum paper, it is argued that, in most studies, ordinal data such as the Braun-Blanquet abundance/dominance scale are not properly treated by multivariate methods. This is because conventional multivariate methods are generally adequate for ratio-scale variables only, while for ordinal variables differences between states and their ratios are not interpreted. Conversely, in this paper it is shown that using conventional multivariate procedures for evaluating ordinal data should imply a shift from a metric space to a topological data space; as such the use of ordinal data does not represent a serious methodological error, provided that results are interpreted accordingly.

Journal ArticleDOI
TL;DR: This work uses a prior to select a limited number of candidate variables to enter the model, applying a popular method with selection indicators, and shows that this approach can induce posterior estimates of the regression functions that consistently estimate the truth, if the true regression model is sparse.
Abstract: Modern data mining and bioinformatics have presented an important playground for statistical learning techniques, where the number of input variables is possibly much larger than the sample size of the training data. In supervised learning, logistic regression or probit regression can be used to model a binary output and form perceptron classification rules based on Bayesian inference. We use a prior to select a limited number of candidate variables to enter the model, applying a popular method with selection indicators. We show that this approach can induce posterior estimates of the regression functions that consistently estimate the truth, if the true regression model is sparse in the sense that the aggregated size of the regression coefficients is bounded. The estimated regression functions therefore can also produce consistent classifiers that are asymptotically optimal for predicting future binary outputs. These provide theoretical justifications for some recent empirical successes in microarray data analysis.