
Showing papers on "Ordinal regression published in 2005"


Journal Article
TL;DR: A probabilistic kernel approach to ordinal regression based on Gaussian processes is presented, where a threshold model that generalizes the probit function is used as the likelihood function for ordinal variables.
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
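A common way to write the threshold likelihood described here (a sketch under assumed notation: r ordinal levels, thresholds b_0 = −∞ < b_1 ≤ … ≤ b_{r−1} < b_r = +∞, Gaussian noise scale σ, and latent function f drawn from a Gaussian process):

```latex
P(y = j \mid f(x)) \;=\; \Phi\!\left(\frac{b_j - f(x)}{\sigma}\right) - \Phi\!\left(\frac{b_{j-1} - f(x)}{\sigma}\right),
\qquad j = 1, \dots, r,
```

where Φ is the standard normal CDF. With r = 2 this reduces to an ordinary probit likelihood, which is the sense in which the threshold model generalizes the probit function.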

475 citations


Proceedings ArticleDOI
07 Aug 2005
TL;DR: Two new support vector approaches for ordinal regression are proposed, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales and guarantee that the thresholds are properly ordered at the optimal solution.
Abstract: In this paper, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The SMO algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on benchmark datasets verify the usefulness of these approaches.
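One variant of such a formulation can be sketched as follows (assumed notation: r ordinal scales, thresholds b_1 ≤ … ≤ b_{r−1}, feature map φ; here each threshold b_j separates the adjacent classes j and j+1 and the ordering is imposed explicitly):

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\|w\|^2 + C \sum_{j=1}^{r-1}\Big(\sum_{i:\,y_i=j}\xi_{ij} + \sum_{i:\,y_i=j+1}\xi^*_{ij}\Big) \\
\text{s.t.}\quad & w^\top\phi(x_i) - b_j \le -1 + \xi_{ij} \quad (y_i = j), \\
 & w^\top\phi(x_i) - b_j \ge 1 - \xi^*_{ij} \quad (y_i = j+1), \\
 & \xi_{ij},\ \xi^*_{ij} \ge 0, \qquad b_1 \le b_2 \le \dots \le b_{r-1}.
\end{aligned}
```

All thresholds share the single direction w, so the discriminant hyperplanes are parallel, and the constraints on the b_j keep the thresholds properly ordered at the optimum.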

302 citations


Journal ArticleDOI
01 Jun 2005-Test
TL;DR: In this paper, a review of methods used for analyzing ordered categorical (ordinal) response variables is presented, with the main emphasis on maximum likelihood inference, although some models (e.g., marginal models, multi-level models) are computationally difficult.
Abstract: This article reviews methodologies used for analyzing ordered categorical (ordinal) response variables. We begin by surveying models for data with a single ordinal response variable. We also survey recently proposed strategies for modeling ordinal response variables when the data have some type of clustering or when repeated measurement occurs at various occasions for each subject, such as in longitudinal studies. Primary models in that case include marginal models and cluster-specific (conditional) models for which effects apply conditionally at the cluster level. Related discussion refers to multi-level and transitional models. The main emphasis is on maximum likelihood inference, although we indicate certain models (e.g., marginal models, multi-level models) for which this can be computationally difficult. The Bayesian approach has also received considerable attention for categorical data in the past decade, and we survey recent Bayesian approaches to modeling ordinal response variables. Alternative, non-model-based, approaches are also available for certain types of inference.
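For concreteness, the single-response case usually starts from the cumulative logit (proportional odds) model; a standard statement, with response categories j = 1, …, c and covariates x:

```latex
\operatorname{logit} P(Y \le j \mid x) \;=\; \alpha_j - \beta^\top x,
\qquad j = 1, \dots, c-1, \qquad \alpha_1 < \alpha_2 < \dots < \alpha_{c-1},
```

where the common slope β across categories is the proportional odds assumption. Cluster-specific models add random effects to the linear predictor, while marginal models specify this relationship for each margin and handle the within-cluster association separately.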

261 citations


Book
11 Jul 2005
TL;DR: A book-length treatment of Bayesian models for categorical data, covering model comparison and choice, regression for metric, binary, and count outcomes, ordinal regression, and models for discrete spatial, time series, panel, and missing data.
Abstract: Preface.
Chapter 1 Principles of Bayesian Inference. 1.1 Bayesian updating. 1.2 MCMC techniques. 1.3 The basis for MCMC. 1.4 MCMC sampling algorithms. 1.5 MCMC convergence. 1.6 Competing models. 1.7 Setting priors. 1.8 The normal linear model and generalized linear models. 1.9 Data augmentation. 1.10 Identifiability. 1.11 Robustness and sensitivity. 1.12 Chapter themes. References.
Chapter 2 Model Comparison and Choice. 2.1 Introduction: formal methods, predictive methods and penalized deviance criteria. 2.2 Formal Bayes model choice. 2.3 Marginal likelihood and Bayes factor approximations. 2.4 Predictive model choice and checking. 2.5 Posterior predictive checks. 2.6 Out-of-sample cross-validation. 2.7 Penalized deviances from a Bayes perspective. 2.8 Multimodel perspectives via parallel sampling. 2.9 Model probability estimates from parallel sampling. 2.10 Worked example. References.
Chapter 3 Regression for Metric Outcomes. 3.1 Introduction: priors for the linear regression model. 3.2 Regression model choice and averaging based on predictor selection. 3.3 Robust regression methods: models for outliers. 3.4 Robust regression methods: models for skewness and heteroscedasticity. 3.5 Robustness via discrete mixture models. 3.6 Non-linear regression effects via splines and other basis functions. 3.7 Dynamic linear models and their application in non-parametric regression. Exercises. References.
Chapter 4 Models for Binary and Count Outcomes. 4.1 Introduction: discrete model likelihoods vs. data augmentation. 4.2 Estimation by data augmentation: the Albert-Chib method. 4.3 Model assessment: outlier detection and model checks. 4.4 Predictor selection in binary and count regression. 4.5 Contingency tables. 4.6 Semi-parametric and general additive models for binomial and count responses. Exercises. References.
Chapter 5 Further Questions in Binomial and Count Regression. 5.1 Generalizing the Poisson and binomial: overdispersion and robustness. 5.2 Continuous mixture models. 5.3 Discrete mixtures. 5.4 Hurdle and zero-inflated models. 5.5 Modelling the link function. 5.6 Multivariate outcomes. Exercises. References.
Chapter 6 Random Effect and Latent Variable Models for Multicategory Outcomes. 6.1 Multicategory data: level of observation and relations between categories. 6.2 Multinomial models for individual data: modelling choices. 6.3 Multinomial models for aggregated data: modelling contingency tables. 6.4 The multinomial probit. 6.5 Non-linear predictor effects. 6.6 Heterogeneity via the mixed logit. 6.7 Aggregate multicategory data: the multinomial-Dirichlet model and extensions. 6.8 Multinomial extra variation. 6.9 Latent class analysis. Exercises. References.
Chapter 7 Ordinal Regression. 7.1 Aspects and assumptions of ordinal data models. 7.2 Latent scale and data augmentation. 7.3 Assessing model assumptions: non-parametric ordinal regression and assessing ordinality. 7.4 Location-scale ordinal regression. 7.5 Structural interpretations with aggregated ordinal data. 7.6 Log-linear models for contingency tables with ordered categories. 7.7 Multivariate ordered outcomes. Exercises. References.
Chapter 8 Discrete Spatial Data. 8.1 Introduction. 8.2 Univariate responses: the mixed ICAR model and extensions. 8.3 Spatial robustness. 8.4 Multivariate spatial priors. 8.5 Varying predictor effect models. Exercises. References.
Chapter 9 Time Series Models for Discrete Variables. 9.1 Introduction: time dependence in observations and latent data. 9.2 Observation-driven dependence. 9.3 Parameter-driven dependence via DLMs. 9.4 Parameter-driven dependence via autocorrelated error models. 9.5 Integer autoregressive models. 9.6 Hidden Markov models. Exercises. References.
Chapter 10 Hierarchical and Panel Data Models. 10.1 Introduction: clustered data and general linear mixed models. 10.2 Hierarchical models for metric outcomes. 10.3 Hierarchical generalized linear models. 10.4 Random effects for crossed factors. 10.5 The general linear mixed model for panel data. 10.6 Conjugate panel models. 10.7 Growth curve analysis. 10.8 Multivariate panel data. 10.9 Robustness in panel and clustered data analysis. 10.10 APC and spatio-temporal models. 10.11 Space-time and spatial APC models. Exercises. References.
Chapter 11 Missing-Data Models. 11.1 Introduction: types of missing data. 11.2 Density mechanisms for missing data. 11.3 Auxiliary variables. 11.4 Predictors with missing values. 11.5 Multiple imputation. 11.6 Several responses with missing values. 11.7 Non-ignorable non-response models for survey tabulations. 11.8 Recent developments. Exercises. References.
Index.

172 citations


Journal ArticleDOI
TL;DR: The paper explores the possibility of using a method known as ordinal regression to model the probability of correctly classifying a new project to a cost category; the method is validated with respect to its fitting and predictive accuracy.
Abstract: In the area of software cost estimation, various methods have been proposed to predict the effort or the productivity of a software project. Although most of the proposed methods produce point estimates, in practice it is more realistic and useful for a method to provide interval predictions. In this paper, we explore the possibility of using such a method, known as ordinal regression, to model the probability of correctly classifying a new project to a cost category. The proposed method is applied to three data sets and is validated with respect to its fitting and predictive accuracy.
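A minimal sketch of the interval-prediction idea, using statsmodels' OrderedModel as a stand-in; the predictors kloc and team_exp, the three cost categories, and the simulated data are all hypothetical, not the paper's data sets:

```python
# Fit an ordinal (proportional-odds) regression and report the probability
# that a new software project falls in each cost category.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    "kloc": rng.gamma(2.0, 20.0, n),        # project size, hypothetical predictor
    "team_exp": rng.integers(1, 10, n),     # team experience, hypothetical predictor
})
# Hypothetical ordered cost categories derived from a latent effort score.
latent = 0.03 * X["kloc"] - 0.3 * X["team_exp"] + rng.logistic(size=n)
y = pd.cut(latent, bins=[-np.inf, -1, 1, np.inf],
           labels=["low", "medium", "high"])   # ordered categorical response

model = OrderedModel(y, X, distr="logit")      # cumulative logit fit
res = model.fit(method="bfgs", disp=False)

# Probability of each cost category for a new project.
new_project = pd.DataFrame({"kloc": [55.0], "team_exp": [4]})
print(res.predict(new_project))                # one row: P(low), P(medium), P(high)
```

The predicted row sums to one, so a project can be reported with a probability for each cost category rather than a single point estimate.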

126 citations


Journal ArticleDOI
TL;DR: A method for extracting the whole ordinal information from non-linear time series on the basis of counting ordinal patterns and the concept of permutation entropy is presented.
Abstract: In order to develop fast and robust methods for extracting qualitative information from non-linear time series, Bandt and Pompe have proposed to consider time series from the pure ordinal viewpoint. On the basis of counting ordinal patterns, which describe the up-and-down patterns in a time series, they have introduced the concept of permutation entropy for quantifying the complexity of a system behind a time series. The permutation entropy only provides one detail of the ordinal structure of a time series. Here we present a method for extracting the whole ordinal information.
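A minimal sketch of the counting step and the resulting permutation entropy; the pattern order d = 3 and the toy series are assumptions for illustration:

```python
# Count ordinal patterns (argsort of each length-d window) and compute the
# permutation entropy of their distribution, in the spirit of Bandt & Pompe.
from collections import Counter
import math

def ordinal_patterns(x, d=3):
    """Map each window of length d to the permutation that sorts it."""
    pats = []
    for i in range(len(x) - d + 1):
        window = x[i:i + d]
        pats.append(tuple(sorted(range(d), key=lambda k: window[k])))
    return Counter(pats)

def permutation_entropy(x, d=3, normalize=True):
    counts = ordinal_patterns(x, d)
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(math.factorial(d)) if normalize else h

series = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]   # toy series
print(permutation_entropy(series, d=3))
```

The full distribution of pattern counts returned by ordinal_patterns is the "whole ordinal information" that the permutation entropy compresses into a single number.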

121 citations


Journal ArticleDOI
TL;DR: In this paper, a mixture of normals prior replaces the usual single multivariate normal model for the latent variables, allowing for varying local dependence structure across the contingency table, and removing the problems related to the choice and resampling of cutoffs defined for these latent variables.
Abstract: This article proposes a probability model for k-dimensional ordinal outcomes, that is, it considers inference for data recorded in k-dimensional contingency tables with ordinal factors. The proposed approach is based on full posterior inference, assuming a flexible underlying prior probability model for the contingency table cell probabilities. We use a variation of the traditional multivariate probit model, with latent scores that determine the observed data. In our model, a mixture of normals prior replaces the usual single multivariate normal model for the latent variables. By augmenting the prior model to a mixture of normals we generalize inference in two important ways. First, we allow for varying local dependence structure across the contingency table. Second, inference in ordinal multivariate probit models is plagued by problems related to the choice and resampling of cutoffs defined for these latent variables. We show how the proposed mixture model approach entirely removes these problems. We ill...
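In symbols, the latent-score structure described here can be sketched as follows (notation assumed for this presentation: latent vector z_i, fixed cutoffs γ, and a K-component mixture):

```latex
z_i \;\sim\; \sum_{k=1}^{K} w_k \, \mathrm{N}(\mu_k, \Sigma_k),
\qquad
y_{ij} = c \;\iff\; \gamma_{c-1} < z_{ij} \le \gamma_c ,
```

so the flexibility that a single multivariate normal would need from carefully chosen and resampled cutoffs is instead supplied by the mixture, which is why cutoff choice ceases to be a problem.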

114 citations


Book
01 Jan 2005
TL;DR: Covers data, models, and multidimensional scaling (MDS) analysis: the nature, measurement level, shape, and conditionality of data analyzed in MDS; missing and multivariate data; classical MDS and the Euclidean model; details of CMDS; replicated MDS; weighted MDS and the geometry, algebra, and matrix algebra of the weighted Euclidean model; the weirdness index; and flattened weights.
Abstract:
1. Model Selection Loglinear Analysis. Loglinear Modeling Basics. A Two-Way Table. The Saturated Model. Main Effects. Interactions. Examining Parameters in a Saturated Model. Calculating the Missing Parameter Estimates. Testing Hypotheses about Parameters. Fitting an Independence Model. Specifying the Model. Checking Convergence. Chi-Square Goodness-of-Fit Tests. Hierarchical Models. Generating Classes. Selecting a Model. Evaluating Interactions. Testing Individual Terms in the Model. Model Selection Using Backward Elimination.
2. Logit Loglinear Analysis. Dichotomous Logit Model. Loglinear Representation. Logit Model. Specifying the Model. Parameter Estimates for the Saturated Logit Model. Unsaturated Logit Model. Specifying the Analysis. Goodness-of-Fit Statistics. Observed and Expected Cell Counts. Parameter Estimates. Measures of Dispersion and Association. Polychotomous Logit Model. Specifying the Model. Goodness of Fit of the Model. Interpreting Parameter Estimates. Examining Residuals. Covariates. Other Logit Models.
3. Multinomial Logistic Regression. The Logit Model. Baseline Logit Example. Specifying the Model. Parameter Estimates. Likelihood-Ratio Test for Individual Effects. Likelihood-Ratio Test for the Overall Model. Evaluating the Model. Calculating Predicted Probabilities and Expected Frequencies. Classification Table. Goodness-of-Fit Tests. Examining the Residuals. Pseudo-R-square Measures. Correcting for Overdispersion. Automated Variable Selection. Hierarchical Variable Entry. Specifying the Analysis. Step Output. Likelihood-Ratio Tests for Individual Effects. Matched Case-Control Studies. The Model. Creating the Difference Variables. The Data File. Specifying the Analysis. Examining the Results.
4. Ordinal Regression. Fitting an Ordinal Logit Model. Modeling Cumulative Counts. Specifying the Analysis. Parameter Estimates. Testing Parallel Lines. Does the Model Fit? Comparing Observed and Expected Counts. Including Additional Predictor Variables. Overall Model Test. Measuring Strength of Association. Classifying Cases. Generalized Linear Models. Link Function. Fitting a Heteroscedastic Probit Model. Modeling Signal Detection. Fitting a Location-Only Model. Fitting a Scale Parameter. Parameter Estimates. Model-Fitting Information.
5. Probit Regression. Probit and Logit Response Models. Evaluating Insecticides. Confidence Intervals for Expected Dosages. Comparing Several Groups. Comparing Relative Potencies of the Agents. Estimates of Relative Median Potency. Estimating the Natural Response Rate. More than One Stimulus Variable.
6. Kaplan-Meier Survival Analysis. SPSS Procedures for Survival Data. Background. Calculating Length of Time. Estimating the Survival Function. Estimating the Conditional Probability. Estimating the Cumulative Probability of Survival. The SPSS Kaplan-Meier Table. Plotting Survival Functions. Comparing Survival Functions. Specifying the Analysis. Comparing Groups. Stratified Comparisons of Survival Functions.
7. Life Tables. Background. Studying Employment Longevity. The Body of a Life Table. Calculating Survival Probabilities. Assumptions Needed to Use the Life Table. Lost to Follow-up. Plotting Survival Functions. Comparing Survival Functions.
8. Cox Regression. The Cox Regression Model. The Hazard Function. Proportional Hazards Assumption. Modeling Survival Times. Coding Categorical Variables. Specifying the Analysis. Testing Hypotheses about the Age Coefficient. Interpreting the Regression Coefficient. Baseline Hazard and Cumulative Survival Rates. Including Multiple Covariates. The Model with Three Covariates. Global Tests of the Model. Plotting the Estimated Functions. Checking the Proportional Hazards Assumption. Stratification. Log-Minus-Log Survival Plot. Identifying Influential Cases. Examining Residuals. Partial (Schoenfeld) Residuals. Martingale Residuals. Selecting Predictor Variables. Variable Selection Methods. An Example of Forward Selection. Omnibus Test of the Model At Each Step. Time-Dependent Covariates. Examining the Data. Specifying a Time-Dependent Covariate. Calculating Segmented Time-Dependent Covariates. Testing the Proportional Hazard Assumption with a Time-Dependent Covariate. Fitting a Conditional Logistic Regression Model. The Data File Structure. Specifying the Analysis. Parameter Estimates.
9. Variance Components. Examples. Factors, Effects, and Models. Types of Factors. Types of Effects. Types of Models. Model for One-Way Classification. Estimation Methods. Negative Variance Estimates. Nested Design Model for Two-Way Classification. Univariate Repeated Measures Analysis Using a Mixed Model Approach. Background Information. Model. Distribution Assumptions. Estimation Methods.
10. Linear Mixed Models. The Linear Mixed Model. Background.
11. Nonlinear Regression. Examples. What Is a Nonlinear Model? Transforming Nonlinear Models. Intrinsically Nonlinear Models. Fitting a Logistic Population Growth Model. Estimating a Nonlinear Model. Finding Starting Values. Specifying the Analysis. Approximate Confidence Intervals for the Parameters. Bootstrap Estimates. Estimating Starting Values. Use Starting Values from Previous Analysis. Look for a Linear Approximation. Use Properties of the Nonlinear Model. Solve a System of Equations. Computational Issues. Additional Nonlinear Regression Options. Nonlinear Regression Common Models. Specifying a Segmented Model.
12. Two-Stage Least-Squares Regression. Artichoke Data. Demand-Price-Income Economic Model. Estimation with Ordinary Least Squares. Feedback and Correlated Errors. Two-Stage Least Squares. Strategy. Stage 1: Estimating Price. Stage 2: Estimating the Model. 2-Stage Least Squares Procedure.
13. Weighted Least-Squares Regression. Diagnosing the Problem. Estimating the Weights. Estimating Weights as Powers. Specifying the Analysis. Examining the Log-Likelihood Functions. WLS Solutions. Estimating Weights from Replicates. Diagnostics from the Linear Regression Procedure.
14. Multidimensional Scaling. Data, Models, and Analysis of Multidimensional Scaling. Example: Flying Mileages. The Nature of Data Analyzed in MDS. The Measurement Level of Data. The Shape of Data. The Conditionality of Data. Missing Data. Multivariate Data. Classical MDS. Example: Flying Mileages Revisited. The Euclidean Model. Details of CMDS. Example: Ranked Flying Mileages. Repeated CMDS. Replicated MDS. Details of RMDS. Example: Perceived Body-Part Structure. Weighted MDS. Geometry of the Weighted Euclidean Model. Algebra of the Weighted Euclidean Model. Matrix Algebra of the Weighted Euclidean Model. Details of WMDS. Example: Perceived Body-Part Structure. The Weirdness Index. Flattened Weights.

114 citations


Journal ArticleDOI
TL;DR: Evaluation of the various steps of exploratory data analysis of ordinal ecological data shows that consistency of methodology throughout the study is of primary importance; the multivariate procedures most commonly applied in numerical ecology do not satisfy these requirements and are therefore not recommended.
Abstract: Questions: Are ordinal data appropriately treated by multivariate methods in numerical ecology? If not, what are the most common mistakes? Which dissimilarity coefficients, ordination and classification methods are best suited to ordinal data? Should we worry about such problems at all? Methods: A new classification model family, OrdClAn (Ordinal Cluster Analysis), is suggested for hierarchical and non-hierarchical classifications from ordinal ecological data, e.g. the abundance/dominance scores that are commonly recorded in relevés. During the clustering process, the objects are grouped so as to minimize a measure calculated from the ranks of within-cluster and between-cluster distances or dissimilarities. Results and Conclusions: Evaluation of the various steps of exploratory data analysis of ordinal ecological data shows that consistency of methodology throughout the study is of primary importance. In an optimal situation, each methodological step is order invariant. This property ensures that...
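A rough sketch of the kind of rank-based objective such a clustering minimizes; the Manhattan distance on ordinal scores and the mean-rank criterion are assumptions for illustration, not the authors' exact OrdClAn formulation:

```python
# Score a partition by the mean rank of its within-cluster dissimilarities
# among the ranks of all pairwise dissimilarities (lower is better).
import numpy as np
from scipy.stats import rankdata

def rank_criterion(D, labels):
    iu = np.triu_indices_from(D, k=1)
    ranks = rankdata(D[iu])                      # ranks of all pairwise distances
    within = labels[iu[0]] == labels[iu[1]]      # pairs in the same cluster
    return ranks[within].mean()

# Toy abundance/dominance scores for 6 releves on 4 species (ordinal 0-5).
scores = np.array([[5, 4, 0, 1], [4, 5, 1, 0], [5, 5, 0, 0],
                   [0, 1, 4, 5], [1, 0, 5, 4], [0, 0, 5, 5]])
D = np.abs(scores[:, None, :] - scores[None, :, :]).sum(axis=2)  # Manhattan
good = np.array([0, 0, 0, 1, 1, 1])
bad = np.array([0, 1, 0, 1, 0, 1])
print(rank_criterion(D, good), rank_criterion(D, bad))
```

Because the criterion depends on the dissimilarities only through their ranks, any monotone transformation of the dissimilarities leaves it unchanged, which is exactly the order-invariance property the paper asks of each methodological step.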

83 citations


Proceedings ArticleDOI
10 May 2005
TL;DR: Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform the baseline methods of using heuristic rules or employing the conventional information retrieval method of Okapi, indicating that generic models for definition ranking can be constructed.
Abstract: This paper is concerned with the problem of definition search. Specifically, given a term, we are to retrieve definitional excerpts of the term and rank the extracted excerpts according to their likelihood of being good definitions. This is in contrast to the traditional approaches of either generating a single combined definition or simply outputting all retrieved definitions. Definition ranking is essential for the task. Methods for performing definition ranking are proposed in this paper, which formalize the problem as either classification or ordinal regression. A specification for judging the goodness of a definition is given. We employ SVM as the classification model and Ranking SVM as the ordinal regression model respectively, such that they rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined. An enterprise search system based on this method has been developed and has been put into practical use. Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform the baseline methods of using heuristic rules or employing the conventional information retrieval method of Okapi. This is true both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
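The Ranking SVM component can be sketched as the usual pairwise formulation (notation assumed here: feature vectors φ(d) for definition candidates d, and preference pairs d_i ≻ d_j derived from the goodness judgments):

```latex
\min_{w,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{(i,j)} \xi_{ij}
\quad \text{s.t.} \quad
w^\top\big(\phi(d_i) - \phi(d_j)\big) \ge 1 - \xi_{ij}, \qquad \xi_{ij} \ge 0,
```

so candidates are ranked by the learned score w^⊤φ(d); the SVM classification variant instead separates "good" from "bad" definitions and ranks candidates by signed distance to the hyperplane.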

60 citations


Book
30 Dec 2005
TL;DR: A companion to the advanced statistical procedures in SPSS 14.0, covering loglinear and logit loglinear analysis, multinomial logistic and ordinal regression, probit regression, survival analysis (Kaplan-Meier, life tables, Cox regression), variance components, linear mixed models, nonlinear regression, two-stage and weighted least squares, and multidimensional scaling.
Abstract: SPSS 14.0 Advanced Statistical Procedures Companion: Chapters
1. Model Selection in Loglinear Analysis: model formulation; parameters in saturated models; hypothesis testing; convergence; goodness-of-fit tests; hierarchical models; generating classes; model selection with backward elimination.
2. Logit Loglinear Analysis: dichotomous logit model; loglinear representation; parameter estimates; goodness-of-fit statistics; measures of dispersion and association; polychotomous logit model; interpreting parameters; examining residuals; introducing covariates.
3. Multinomial Logistic Regression: baseline logits; likelihood-ratio tests for models and individual effects; evaluating the model; calculating predicted probabilities; the classification table; goodness-of-fit tests; residuals; pseudo R-square measures; overdispersion; model selection; matched case-control studies.
4. Ordinal Regression: modeling cumulative counts; parameter estimates; testing for parallel lines; model fit; observed and expected counts; measures of strength of association; classifying cases; link functions; fitting a heteroscedastic probit model; fitting location and scale parameters.
5. Probit Regression: probit and logit response models; confidence intervals for effective dosages; comparing groups; comparing relative potencies; estimating the natural response rate; multiple stimuli.
6. Kaplan-Meier Survival Analysis: calculating survival time; estimating the survival function, the conditional probability of survival, and the cumulative probability of survival; plotting survival functions; comparing survival functions; stratified comparisons.
7. Life Tables: calculating survival probabilities; assumptions; observations lost to follow-up; plotting survival functions; comparing survival functions.
8. Cox Regression: the model; proportional hazards assumption; coding categorical variables; interpreting the regression coefficients; baseline hazard and cumulative survival rates; global tests of the model; checking the proportional hazards assumption; stratification; log-minus-log survival plot; identifying influential cases; examining residuals; partial (Schoenfeld) residuals; martingale residuals; variable-selection methods; time-dependent covariates; specifying a time-dependent covariate; calculating segmented time-dependent covariates; testing the proportional hazards assumption with a time-dependent covariate; fitting a conditional logistic regression model.
9. Variance Components: factors, effects, and models; model for one-way classification; estimation methods; negative variance estimates; nested design model for two-way classification; univariate repeated measures analysis using a mixed models approach; distribution assumptions; estimation methods.
10. Linear Mixed Models: background; unconditional random-effects models; hierarchical models; random-coefficient model; model with school-level and individual-level covariates; three-level hierarchical model; repeated measurements; selecting a residual covariance structure.
11. Nonlinear Regression: the nonlinear model; transforming nonlinear models; intrinsically nonlinear models; fitting a logistic population growth model; finding starting values; approximate confidence intervals for the parameters; bootstrapped estimates; starting values from previous analysis; linear approximation; computational issues; common models for nonlinear regression; specifying a segmented model.
12. Two-Stage Least-Squares Regression: demand-price-income economic model; estimation with ordinary least squares; feedback and correlated errors; estimation with two-stage least squares.
13. Weighted Least-Squares Regression: diagnosing the problem; estimating weights; examining the log-likelihood function; the WLS solution; estimating weights from replicates; diagnostics from the linear regression procedure.
14. Multidimensional Scaling: data, models, and multidimensional scaling analysis; nature of data analyzed in MDS; measurement level of data; shape of data; conditionality of data; missing data; multivariate data; classical MDS; Euclidean model; details of CMDS; replicated MDS; weighted MDS; geometry of the weighted Euclidean model; algebra of the weighted Euclidean model; matrix algebra of the weighted Euclidean model; weirdness index; flattened weights.

Journal ArticleDOI
TL;DR: In this article, an approach is presented for correcting for interobserver measurement error in an ordinal logistic regression model, while also taking into account the variability of the estimated correction terms.
Abstract: We present an approach for correcting for interobserver measurement error in an ordinal logistic regression model, taking into account also the variability of the estimated correction terms. The different scoring behaviour of the 16 examiners complicated the identification of a geographical trend in a recent study on caries experience in Flemish children (Belgium) who were 7 years old. Since the measurement error is on the response, the factor 'examiner' could be included in the regression model to correct for its confounding effect. However, controlling for examiner largely removed the geographical east-west trend. Instead, we suggest a (Bayesian) ordinal logistic model which corrects for the scoring error (compared with a gold standard) using a calibration data set. The marginal posterior distribution of the regression parameters of interest is obtained by integrating out the correction terms pertaining to the calibration data set. This is done by processing two Markov chains sequentially, whereby one Markov chain samples the correction terms. The sampled correction term is imputed in the Markov chain pertaining to the regression parameters. The model was fitted to the oral health data of the Signal-Tandmobiel® study. A WinBUGS program was written to perform the analysis.
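The integration step can be sketched as follows (θ denoting the examiner correction terms and y_cal the calibration data; the Monte Carlo form is a generic sketch of the two-chain scheme, not the authors' exact estimator):

```latex
p(\beta \mid y) \;=\; \int p(\beta \mid y, \theta)\, p(\theta \mid y_{\mathrm{cal}})\, d\theta
\;\approx\; \frac{1}{M} \sum_{m=1}^{M} p\big(\beta \mid y, \theta^{(m)}\big),
\qquad \theta^{(m)} \sim p(\theta \mid y_{\mathrm{cal}}),
```

which is what processing the two Markov chains sequentially achieves: each sampled correction term θ^(m) is imputed into the chain for the regression parameters β.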

Proceedings Article
05 Dec 2005
TL;DR: Experiments indicate that the proposed algorithm for learning ranking functions from order constraints between sets—i.e. classes—of training samples is at least as accurate as the current state-of-the-art and several orders of magnitude faster than current methods.
Abstract: We propose efficient algorithms for learning ranking functions from order constraints between sets—i.e. classes—of training samples. Our algorithms may be used for maximizing the generalized Wilcoxon Mann Whitney statistic that accounts for the partial ordering of the classes: special cases include maximizing the area under the ROC curve for binary classification and its generalization for ordinal regression. Experiments on public benchmarks indicate that: (a) the proposed algorithm is at least as accurate as the current state-of-the-art; (b) computationally, it is several orders of magnitude faster and—unlike current methods—it is easily able to handle even large datasets with over 20,000 samples.
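The statistic being maximized can be sketched, for a scoring function f and ordered classes of sizes n_k, as (a generic form of the generalized Wilcoxon Mann Whitney statistic; the normalization is an assumption of this presentation):

```latex
W(f) \;=\; \frac{\displaystyle\sum_{k<l}\; \sum_{i:\,y_i = l}\; \sum_{j:\,y_j = k} \mathbf{1}\big[f(x_i) > f(x_j)\big]}
{\displaystyle\sum_{k<l} n_l \, n_k},
```

which reduces to the area under the ROC curve when there are only two classes, and gives the ordinal-regression generalization when every pair of ordered classes contributes.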

Journal ArticleDOI
TL;DR: In this article, a distance for mixed nominal, ordinal and continuous data is developed by applying the Kullback-Leibler divergence to the general mixed-data model, an extension of the general location model that allows for ordinal variables to be incorporated in the model.
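The symmetrized Kullback-Leibler (J) divergence that such a distance typically builds on, for densities f and g, is shown below as a generic definition; how the paper applies it to the general mixed-data model is not reproduced here:

```latex
J(f, g) \;=\; \mathrm{KL}(f \,\|\, g) + \mathrm{KL}(g \,\|\, f)
\;=\; \int \big(f(u) - g(u)\big)\, \log \frac{f(u)}{g(u)}\, du .
```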

Journal ArticleDOI
TL;DR: In this article, a new approach based on the use of a new sample scale obtained by ordering the original variable sample space according to some specific "dominance criteria" fixed on the basis of the monitored process characteristics is presented.
Abstract: The paper presents a new method for statistical process control when ordinal variables are involved. This is the case of a quality characteristic evaluated by an ordinal scale. The method allows a statistical analysis without exploiting an arbitrary numerical conversion of scale levels and without using the traditional sample synthesis operators (sample mean and variance). It consists of a different approach based on the use of a new sample scale obtained by ordering the original variable sample space according to some specific ‘dominance criteria’ fixed on the basis of the monitored process characteristics. Samples are directly reported on the chart and no distributional shape is assumed for the population (universe) of evaluations. Finally, a practical application of the method in the health sector is provided.
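A toy sketch of ordering an ordinal sample space by a dominance criterion; the four-level scale, the sample size of 3, and the cumulative-count ordering used here are illustrative assumptions, not the paper's specific 'dominance criteria':

```python
# Order all possible samples of N ordinal scores by a dominance criterion and
# report where an observed sample falls; a control chart would plot this rank.
from itertools import combinations_with_replacement

LEVELS = [1, 2, 3, 4]   # ordinal quality levels (hypothetical scale)
N = 3                   # sample size (hypothetical)

def dominance_key(sample):
    # Cumulative counts of scores at or below each level: componentwise
    # smaller counts of low scores mean a 'better' sample.
    return tuple(sum(1 for s in sample if s <= lev) for lev in LEVELS[:-1])

space = sorted(combinations_with_replacement(LEVELS, N), key=dominance_key)
observed = (2, 3, 4)
rank = space.index(tuple(sorted(observed)))
print(f"sample {observed} has rank {rank} of {len(space)} (0 = best)")
```

No numeric conversion of the scale levels and no sample mean or variance is involved; the chart works directly on the ordered sample scale.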

Journal ArticleDOI
TL;DR: In this paper, a simple model for repeated observations of an ordered categorical response variable which is isotonic over time is introduced, where the measurements represent an irreversible process such that the response at time t is never lower than the response observed at the previous time point t-1.
Abstract: The paper introduces a simple model for repeated observations of an ordered categorical response variable which is isotonic over time. It is assumed that the measurements represent an irreversible process such that the response at time t is never lower than the response observed at the previous time point t-1. Observations of this type occur for example in treatment studies when improvement is measured on an ordinal scale. Since the response at time t depends on the previous outcome, the number of ordered response categories depends on the previous outcome leading to severe problems when simple threshold models for ordered data are used. In order to avoid these problems the isotonic sequential model is introduced. It accounts for the irreversible process by considering the binary transitions to higher scores and allows a parsimonious parameterization. It is shown how the model may easily be estimated by using existing software. Moreover, the model is extended to a random effects version which explicitly takes heterogeneity of individuals and potential correlations into account.
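The binary-transition construction can be sketched as follows (F a response distribution function such as the logistic; notation assumed for this presentation, with k ordered categories and previous outcome s):

```latex
P\big(Y_t > r \,\big|\, Y_t \ge r,\; Y_{t-1} = s,\; x\big) \;=\; F\big(\theta_r + x^\top \beta\big),
\qquad r = s, s+1, \dots, k-1,
```

with Y_t ≥ Y_{t−1} = s built in, so each observation contributes a sequence of ordinary binary responses and the model can be fitted with standard software for binary regression.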

Journal Article
TL;DR: In this article, an alternative, algorithmic rather than an arithmetic, approach is described, which involves the fitting of distributions to the observed survey data using the c 2 statistic, compared to the approach that assumes an interval level scale, and the rescaling approach that uses correspondence analysis.
Abstract: The debate over the appropriate analysis of ordinal level survey data has lasted for many decades. One school of thought maintains that the data can generally be regarded as interval level, whereas another asserts that the data should be rescaled before subjecting it to statistical analysis. In this study, an alternative, algorithmic rather than an arithmetic, approach is described, which involves the fitting of distributions to the observed survey data using the c 2 statistic. This distribution-fitting approach is compared to the approach that assumes an interval level scale, and the rescaling approach that uses correspondence analysis. Using a bootstrap resampling methodology, the analysis confirms that survey results may be flawed if ordinal level scales are assumed to be interval level, or if the correspondence analysis approach is applied inappropriately. The distribution-fitting approach is found to have accuracy and validity that is superior to the alternative approaches.
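A minimal sketch of the distribution-fitting step; the Likert counts and the candidate distributions below are hypothetical, and the paper's algorithm for choosing candidates is not reproduced:

```python
# Among candidate discrete distributions over 5 Likert levels, pick the one
# whose expected counts best match the observed counts by the chi-squared statistic.
import numpy as np
from scipy.stats import binom, randint, chisquare

observed = np.array([8, 15, 30, 32, 15])   # hypothetical Likert counts, levels 1-5
n = observed.sum()
levels = np.arange(5)

candidates = {
    "uniform": randint(0, 5).pmf(levels),
    "binomial(p=0.6)": binom(4, 0.6).pmf(levels),
    "binomial(p=0.5)": binom(4, 0.5).pmf(levels),
}
for name, pmf in candidates.items():
    stat, p = chisquare(observed, f_exp=n * pmf)
    print(f"{name}: chi2={stat:.2f}, p={p:.3f}")
```

The best-fitting candidate, rather than an assumed interval-level scale or a correspondence-analysis rescaling, then drives the subsequent analysis.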

Reference EntryDOI
15 Jul 2005
TL;DR: The authors provided an overview of generalized Mantel-Haenszel (MH) methods for the analysis of categorical data from factor-response and repeated measures study designs, using data from two different clinical research studies, investigating treatment differences within several different sets of 2 × 2 tables in a clinical trial, and within-subject differences in an ordinal response across ordinal factor levels within a repeated measures design.
Abstract: This article provides an overview of generalized Mantel–Haenszel (MH) methods for the analysis of categorical data from factor–response and repeated measures study designs. These methods are illustrated using data from two different clinical research studies, investigating treatment differences within several different sets of 2 × 2 tables in a clinical trial, and within-subject differences in an ordinal response across ordinal factor levels within a repeated measures design. The underlying multiple hypergeometric probability structure, based on a randomization model framework for hypothesis testing, is summarized for testing alternative hypotheses of 1) general association; 2) mean responses differ; and 3) linear trend in mean responses. These generalized MH methods can all be implemented directly within SAS and StatXact, with appropriate stratification and choice of scores. Keywords: categorical data; ordinal scores; randomization model; hypergeometric probability; repeated measures
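The three alternative hypotheses correspond to different score matrices in the generalized Cochran-Mantel-Haenszel statistic, which can be sketched as (n_h the vector of cell counts in stratum h, m_h its null expectation under the multiple hypergeometric distribution, V_G the null covariance of G, and B_h the chosen score matrices):

```latex
Q \;=\; G^\top V_G^{-1}\, G, \qquad G \;=\; \sum_h B_h \,(n_h - m_h),
```

where different choices of the B_h yield the tests of general association, differences in mean responses, and linear trend in mean responses.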

Journal ArticleDOI
TL;DR: The proposed Monte Carlo methods for computing a single marginal likelihood or several marginal likelihoods for the purpose of Bayesian model comparisons are motivated by Bayesian variable selection.
Abstract: In this article, we propose new Monte Carlo methods for computing a single marginal likelihood or several marginal likelihoods for the purpose of Bayesian model comparisons. The methods are motivated by Bayesian variable selection, in which the marginal likelihoods for all subset variable models must be computed. The proposed estimators use only a single Markov chain Monte Carlo (MCMC) output from the joint posterior distribution and do not require the specific structure or form of the MCMC sampling algorithm used to generate the sample to be known. The theoretical properties of the proposed method are examined in detail. The applicability and usefulness of the proposed method are demonstrated via ordinal data probit regression models. A real dataset involving ordinal outcomes is used to further illustrate the proposed methodology.
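These estimators build on the basic marginal-likelihood identity, shown here in its generic form; the paper's specific single-chain estimators are not reproduced:

```latex
m(y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(\theta \mid y)}
\qquad \text{for any } \theta,
```

where estimating the right-hand side from posterior output is the crux, and is what the proposed methods do using a single MCMC sample from the joint posterior.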

Journal ArticleDOI
TL;DR: In this paper, the authors compare complete case analysis of ordinal data with multivariate normal imputation and show that the imputation methods were not as good as using only complete cases.
Abstract: Simulations were used to compare complete case analysis of ordinal data with multivariate normal (MVN) imputation. MVN methods of imputation were not as good as using only complete cases. Bias and standard errors were measured against coefficients estimated from logistic regression and a standard data set.

Reference EntryDOI
15 Oct 2005
TL;DR: One of the most frequently used ordinal regression models is the ordinal logistic model, a member of the family of generalized linear models, which is based upon the cumulative probabilities for the categories of the response variable.
Abstract: Regression models for ordinal data have been developed based upon the cumulative probabilities for the categories of the response variable. One of the most frequently used ordinal regression models is the ordinal logistic model, a member of the family of generalized linear models. Keywords: logistic regression; ordinal variable; regression coefficient; odds ratio

Journal ArticleDOI
TL;DR: If there are sufficient outcome levels and/or predictor variables, there may be a number of stereotype models of differing dimension, and this method is illustrated with an example of prediction of damage to joints in rheumatoid arthritis.
Abstract: There are a number of regression models which are widely used to predict ordinal outcomes. The commonly used models assume that all predictor variables have a similar effect at all levels of the outcome variable. If this is not the case, for example if some variables predict susceptibility to a disease and others predict the severity of the disease, then a more complex model is required. One possibility is the multinomial logistic regression model, which assumes that the predictor variables have different effects at all levels of the outcome variable. An alternative is to use the stereotype family of regression models. A one-dimensional stereotype model makes the assumption that the effect of each predictor is the same at all outcome levels. However, it is possible to fit stereotype models with more than one dimension, up to a maximum of min(k-1, p) where k is the number of outcome categories and p is the number of predictor variables. A stereotype model of this maximum dimension is equivalent to a multinomial logistic regression model, in that it will produce the same predicted values and log-likelihood. If there are sufficient outcome levels and/or predictor variables, there may be a number of stereotype models of differing dimension. The method is illustrated with an example of prediction of damage to joints in rheumatoid arthritis.
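A one-dimensional stereotype model can be sketched as follows (reference category k and scaling parameters φ_j; the multidimensional version replaces the single product with a sum over dimensions):

```latex
\log \frac{P(Y = j \mid x)}{P(Y = k \mid x)} \;=\; \alpha_j + \phi_j\, \beta^\top x,
\qquad 1 = \phi_1 > \phi_2 > \dots > \phi_k = 0,
```

and a d-dimensional model uses α_j + Σ_{m=1}^{d} φ_{jm} β_m^⊤ x. At d = min(k−1, p) the terms span everything a multinomial logistic model can express, which is why the maximal stereotype model and the multinomial logistic model give identical predicted values and log-likelihood.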

Proceedings ArticleDOI
Fabio Aiolli1
27 Nov 2005
TL;DR: The preference model introduced in this paper gives a natural framework and a principled solution for a broad class of supervised learning problems with structured predictions, such as predicting orders and instance ranking, and predicting rates.
Abstract: The preference model introduced in this paper gives a natural framework and a principled solution for a broad class of supervised learning problems with structured predictions, such as predicting orders (label and instance ranking), and predicting rates (classification and ordinal regression). We show how all these problems can be cast as linear problems in an augmented space, and we propose an on-line method to efficiently solve them. Experiments on an ordinal regression task confirm the effectiveness of the approach.
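The reduction can be sketched as follows (a generic pairwise-preference form; the embedding ψ and the update rule are assumptions of this presentation, not the paper's exact construction):

```latex
a \succ b \;\Longrightarrow\; w^\top \psi(a) > w^\top \psi(b)
\;\Longleftrightarrow\; w^\top \big(\psi(a) - \psi(b)\big) > 0,
```

so every supervised constraint, whether from a label ranking, an instance ranking, a classification, or an ordinal rating, becomes a linear constraint in the augmented space, and an on-line learner can process violated constraints one at a time with perceptron-style updates.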

Journal ArticleDOI
TL;DR: Rochon's method of sample-size estimation with a repeated binary response is extended to the ordinal case, based on an analysis with generalized estimating equations (GEE) and inference with the Wald test.
Abstract: Correlated ordinal response data often arise in public health studies. Sample-size (power) calculations are a crucial step in designing such studies to ensure an adequate sample to detect a significant effect. Here we extend Rochon's method of sample-size estimation with a repeated binary response to the ordinal case. The proposed sample-size calculations are based on an analysis with generalized estimating equations (GEE) and inference with the Wald test. Simulation results demonstrate the merit of the proposed power calculations. Analysis of an arthritis clinical trial is used for illustration.
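In the one-degree-of-freedom case, Wald-test-based sample size calculations of this kind reduce to the familiar form (a generic sketch, with δ the effect of interest and σ² the GEE variance of its estimator for a single subject, not the paper's full multi-parameter formula):

```latex
n \;=\; \frac{\big(z_{1-\alpha/2} + z_{1-\beta}\big)^2 \, \sigma^2}{\delta^2},
```

where the correlation structure of the repeated ordinal responses enters through σ², computed from the GEE working covariance.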

Posted Content
TL;DR: In this paper, the behavioural change framework of Ajzen and Fishbein is used to explore whether attitudes towards organic farming, the perceived social pressure of the environment and the perceived feasibility of organic farming standards on the farm determine the willingness of farmers to convert to organic farming methods.
Abstract: In this paper the behavioural change framework of Ajzen and Fishbein is used to explore whether attitudes towards organic farming, the perceived social pressure of the environment and the perceived feasibility of organic farming standards on the farm determine the willingness of farmers to convert to organic farming methods. These variables together with the business and personal objectives and the organic farming information seeking behaviour of the farmer were used in an ordinal regression procedure to predict the intended organic farming conversion behaviour of conventional farmers.

Reference EntryDOI
15 Jul 2005
TL;DR: The proportional odds model is one member of the family of cumulative logistic regression models, designed for studying the effect of covariates on an ordinal response variable; its relationships with other members of this family are described, emphasizing the effect of response aggregation.
Abstract: The proportional odds model is one member of the family of cumulative logistic regression models, designed for studying the effect of covariates on an ordinal response variable. Relationships with other members of this family are described, emphasizing the effect of response aggregation. The connection with latent variable models, dispersion models, continuation-ratio models, and log-linear models is also discussed. Keywords: canonical regression model; extreme-value distribution; latent variable; logistic regression model; proportional-hazards model
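The latent variable connection mentioned here can be sketched as follows (Z a latent continuous response, cutpoints α_0 = −∞ < α_1 < … < α_{k−1} < α_k = +∞):

```latex
Y = j \;\iff\; \alpha_{j-1} < Z \le \alpha_j, \qquad Z = \beta^\top x + \varepsilon,
```

with a standard logistic ε giving the proportional odds model, logit P(Y ≤ j | x) = α_j − β^⊤x, and an extreme-value ε giving the proportional-hazards (complementary log-log) member of the family.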

Journal ArticleDOI
TL;DR: In this paper, the authors examined discharge-related anxiety in a group of 65 patients resident in five medium secure units located in the South of England and found that the main predictors of a general dischargerelated anxiety scale were low self-esteem and perceived absence of social support, although high trait anxiety also exerted a significant independent effect.
Abstract: This study examines discharge-related anxiety in a group of 65 patients resident in five medium secure units located in the South of England. The study is part of a larger investigation of non-compliance within medium secure unit environments. Participants completed standardised questionnaire measures of self-efficacy, self-esteem, anxiety and locus of control, together with a newly constructed questionnaire investigating anxiety relating to discharge. Results of ordinal regression procedures indicated that the main predictors of a general discharge-related anxiety scale were low self-esteem and perceived absence of social support, although on univariate analysis high trait anxiety also exerted a significant independent effect. The clinical implications of the findings are discussed.

Posted Content
TL;DR: -gologit2- is a user-written program that estimates generalized logistic regression models for ordinal dependent variables; the actual values taken on by the dependent variable are irrelevant, except that larger values are assumed to correspond to "higher" outcomes.
Abstract: -gologit2- is a user-written program that estimates generalized logistic regression models for ordinal dependent variables. The actual values taken on by the dependent variable are irrelevant except that larger values are assumed to correspond to "higher" outcomes. A major strength of -gologit2- is that it can also estimate two special cases of the generalized model: the proportional odds model and the partial proportional odds model. Hence, -gologit2- can estimate models that are less restrictive than the proportional odds/parallel lines models estimated by -ologit- (whose assumptions are often violated) but more parsimonious and interpretable than those estimated by a non-ordinal method, such as multinomial logistic regression. The -autofit- option greatly simplifies the process of identifying partial proportional odds models that fit the data. Two alternative but equivalent parameterizations of the model that have appeared in the literature are both supported. Other key advantages of -gologit2- include support for linear constraints, Stata 8.2 survey data (svy) estimation, and the computation of estimated probabilities via the -predict- command. -gologit2- is inspired by Vincent Fu's -gologit- program and is backward compatible with it but offers several additional powerful options.
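The generalized and partial proportional odds models that -gologit2- fits can be sketched as follows (one common parameterization; j indexes the k−1 cutpoints):

```latex
P(Y > j \mid x) \;=\; \frac{\exp(\alpha_j + x^\top \beta_j)}{1 + \exp(\alpha_j + x^\top \beta_j)},
\qquad j = 1, \dots, k-1,
```

where the proportional odds model constrains β_j = β for all j, and a partial proportional odds model relaxes that constraint only for the covariates that violate it, which is what the -autofit- option searches for.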

Posted Content
TL;DR: gologit2 estimates generalized ordered logit models for ordinal dependent variables and can estimate models that are less restrictive than the proportional odds/parallel lines models estimated by ologit but more parsimonious and interpretable than those estimated by a non-ordinal method, such as multinomial logistic regression.
Abstract: gologit2 estimates generalized ordered logit models for ordinal dependent variables. A major strength of gologit2 is that it can also estimate three special cases of the generalized model: the proportional odds/parallel lines model, the partial proportional odds model, and the logistic regression model. Hence, gologit2 can estimate models that are less restrictive than the proportional odds /parallel lines models estimated by ologit (whose assumptions are often violated) but more parsimonious and interpretable than those estimated by a non-ordinal method, such as multinomial logistic regression (i.e. mlogit). The svy: prefix, as well as factor variables and post-estimation commands such as margins, are supported. Other key strengths of gologit2 include options for linear constraints, alternative model parameterizations, automated model fitting, alternative link functions (logit, probit, complementary log-log, log-log & cauchit), and the computation of estimated probabilities via the predict command. gologit2 works under Stata 11.2 or higher. Those with older versions of Stata should use gologit29 instead. gologit2 is inspired by Vincent Fu's gologit program and is backward compatible with both it and gologit29 but offers several additional powerful options.

Proceedings ArticleDOI
28 Nov 2005
TL;DR: This paper proposes a multi-class classification algorithm based on an ordinal regression algorithm using 3-class classification; it is similar to algorithm K-SVCR and algorithm nu-K-SVCR but includes fewer parameters.
Abstract: Multi-class classification is an important and ongoing research subject in machine learning. In this paper, we propose a multi-class classification algorithm based on an ordinal regression algorithm using 3-class classification. This algorithm is similar to algorithm K-SVCR and algorithm nu-K-SVCR, but it includes fewer parameters. Another advantage of our algorithm is that, for the K-class classification problem, our algorithm can be extended to using p-class classification with 2 ≤ p ≤ K. Numerical experiments on artificial data sets and benchmark data sets show that the algorithm is reasonable and effective.
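A schematic sketch of the one-vs-one-vs-rest decomposition that K-SVCR-style algorithms build on: every class pair gets a 3-way learner (class i mapped to −1, class j to +1, all other classes to 0) and predictions are combined by voting. A plain multiclass SVC stands in for the ordinal 3-class machine here; the real algorithms additionally constrain the "rest" class to the margin region, which this sketch does not do.

```python
# One-vs-one-vs-rest decomposition with voting; class labels are assumed to
# be 0..K-1 so they can index the vote array directly.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

machines = {}
for i, j in combinations(classes, 2):
    t = np.where(y == i, -1, np.where(y == j, 1, 0))   # 3-way relabeling
    machines[(i, j)] = SVC(kernel="rbf", gamma="scale").fit(X, t)

def predict(x):
    votes = np.zeros(len(classes))
    for (i, j), m in machines.items():
        out = m.predict(x.reshape(1, -1))[0]
        if out == -1:
            votes[i] += 1
        elif out == 1:
            votes[j] += 1        # an output of 0 votes for neither class
    return int(np.argmax(votes))

print(predict(X[0]), y[0])
```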