
Showing papers on "Ordinal regression published in 2005"


Journal Article
TL;DR: A probabilistic kernel approach to ordinal regression based on Gaussian processes is presented, where a threshold model that generalizes the probit function is used as the likelihood function for ordinal variables.
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
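A common way to write the threshold likelihood described here (a sketch under assumed notation: r ordinal levels, thresholds b_0 = −∞ < b_1 ≤ … ≤ b_{r−1} < b_r = +∞, Gaussian noise scale σ, and latent function f drawn from a Gaussian process):

```latex
P(y = j \mid f(x)) \;=\; \Phi\!\left(\frac{b_j - f(x)}{\sigma}\right) - \Phi\!\left(\frac{b_{j-1} - f(x)}{\sigma}\right),
\qquad j = 1, \dots, r,
```

where Φ is the standard normal CDF. With r = 2 this reduces to an ordinary probit likelihood, which is the sense in which the threshold model generalizes the probit function.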

475 citations


Proceedings ArticleDOI
07 Aug 2005
TL;DR: Two new support vector approaches for ordinal regression are proposed, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales and guarantee that the thresholds are properly ordered at the optimal solution.
Abstract: In this paper, we propose two new support vector approaches for ordinal regression, which optimize multiple thresholds to define parallel discriminant hyperplanes for the ordinal scales. Both approaches guarantee that the thresholds are properly ordered at the optimal solution. The size of these optimization problems is linear in the number of training samples. The SMO algorithm is adapted for the resulting optimization problems; it is extremely easy to implement and scales efficiently as a quadratic function of the number of examples. The results of numerical experiments on benchmark datasets verify the usefulness of these approaches.
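One variant of such a formulation can be sketched as follows (assumed notation: r ordinal scales, thresholds b_1 ≤ … ≤ b_{r−1}, feature map φ; here each threshold b_j separates the adjacent classes j and j+1 and the ordering is imposed explicitly):

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\|w\|^2 + C \sum_{j=1}^{r-1}\Big(\sum_{i:\,y_i=j}\xi_{ij} + \sum_{i:\,y_i=j+1}\xi^*_{ij}\Big) \\
\text{s.t.}\quad & w^\top\phi(x_i) - b_j \le -1 + \xi_{ij} \quad (y_i = j), \\
 & w^\top\phi(x_i) - b_j \ge 1 - \xi^*_{ij} \quad (y_i = j+1), \\
 & \xi_{ij},\ \xi^*_{ij} \ge 0, \qquad b_1 \le b_2 \le \dots \le b_{r-1}.
\end{aligned}
```

All thresholds share the single direction w, so the discriminant hyperplanes are parallel, and the constraints on the b_j keep the thresholds properly ordered at the optimum.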

302 citations


Journal ArticleDOI
01 Jun 2005-Test
TL;DR: In this paper, a review of methods used for analyzing ordered categorical (ordinal) response variables is presented, with the main emphasis on maximum likelihood inference, although some models (e.g., marginal models, multi-level models) are computationally difficult.
Abstract: This article reviews methodologies used for analyzing ordered categorical (ordinal) response variables. We begin by surveying models for data with a single ordinal response variable. We also survey recently proposed strategies for modeling ordinal response variables when the data have some type of clustering or when repeated measurement occurs at various occasions for each subject, such as in longitudinal studies. Primary models in that case include marginal models and cluster-specific (conditional) models for which effects apply conditionally at the cluster level. Related discussion refers to multi-level and transitional models. The main emphasis is on maximum likelihood inference, although we indicate certain models (e.g., marginal models, multi-level models) for which this can be computationally difficult. The Bayesian approach has also received considerable attention for categorical data in the past decade, and we survey recent Bayesian approaches to modeling ordinal response variables. Alternative, non-model-based, approaches are also available for certain types of inference.
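For concreteness, the single-response case usually starts from the cumulative logit (proportional odds) model; a standard statement, with response categories j = 1, …, c and covariates x:

```latex
\operatorname{logit} P(Y \le j \mid x) \;=\; \alpha_j - \beta^\top x,
\qquad j = 1, \dots, c-1, \qquad \alpha_1 < \alpha_2 < \dots < \alpha_{c-1},
```

where the common slope β across categories is the proportional odds assumption. Cluster-specific models add random effects to the linear predictor, while marginal models specify this relationship for each margin and handle the within-cluster association separately.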

261 citations


Book
11 Jul 2005
TL;DR: A book-length treatment of Bayesian models for categorical data, covering model comparison and choice, regression for metric, binary, and count outcomes, ordinal regression, and models for discrete spatial, time series, panel, and missing data.
Abstract: Preface.
Chapter 1 Principles of Bayesian Inference. 1.1 Bayesian updating. 1.2 MCMC techniques. 1.3 The basis for MCMC. 1.4 MCMC sampling algorithms. 1.5 MCMC convergence. 1.6 Competing models. 1.7 Setting priors. 1.8 The normal linear model and generalized linear models. 1.9 Data augmentation. 1.10 Identifiability. 1.11 Robustness and sensitivity. 1.12 Chapter themes. References.
Chapter 2 Model Comparison and Choice. 2.1 Introduction: formal methods, predictive methods and penalized deviance criteria. 2.2 Formal Bayes model choice. 2.3 Marginal likelihood and Bayes factor approximations. 2.4 Predictive model choice and checking. 2.5 Posterior predictive checks. 2.6 Out-of-sample cross-validation. 2.7 Penalized deviances from a Bayes perspective. 2.8 Multimodel perspectives via parallel sampling. 2.9 Model probability estimates from parallel sampling. 2.10 Worked example. References.
Chapter 3 Regression for Metric Outcomes. 3.1 Introduction: priors for the linear regression model. 3.2 Regression model choice and averaging based on predictor selection. 3.3 Robust regression methods: models for outliers. 3.4 Robust regression methods: models for skewness and heteroscedasticity. 3.5 Robustness via discrete mixture models. 3.6 Non-linear regression effects via splines and other basis functions. 3.7 Dynamic linear models and their application in non-parametric regression. Exercises. References.
Chapter 4 Models for Binary and Count Outcomes. 4.1 Introduction: discrete model likelihoods vs. data augmentation. 4.2 Estimation by data augmentation: the Albert-Chib method. 4.3 Model assessment: outlier detection and model checks. 4.4 Predictor selection in binary and count regression. 4.5 Contingency tables. 4.6 Semi-parametric and general additive models for binomial and count responses. Exercises. References.
Chapter 5 Further Questions in Binomial and Count Regression. 5.1 Generalizing the Poisson and binomial: overdispersion and robustness. 5.2 Continuous mixture models. 5.3 Discrete mixtures. 5.4 Hurdle and zero-inflated models. 5.5 Modelling the link function. 5.6 Multivariate outcomes. Exercises. References.
Chapter 6 Random Effect and Latent Variable Models for Multicategory Outcomes. 6.1 Multicategory data: level of observation and relations between categories. 6.2 Multinomial models for individual data: modelling choices. 6.3 Multinomial models for aggregated data: modelling contingency tables. 6.4 The multinomial probit. 6.5 Non-linear predictor effects. 6.6 Heterogeneity via the mixed logit. 6.7 Aggregate multicategory data: the multinomial-Dirichlet model and extensions. 6.8 Multinomial extra variation. 6.9 Latent class analysis. Exercises. References.
Chapter 7 Ordinal Regression. 7.1 Aspects and assumptions of ordinal data models. 7.2 Latent scale and data augmentation. 7.3 Assessing model assumptions: non-parametric ordinal regression and assessing ordinality. 7.4 Location-scale ordinal regression. 7.5 Structural interpretations with aggregated ordinal data. 7.6 Log-linear models for contingency tables with ordered categories. 7.7 Multivariate ordered outcomes. Exercises. References.
Chapter 8 Discrete Spatial Data. 8.1 Introduction. 8.2 Univariate responses: the mixed ICAR model and extensions. 8.3 Spatial robustness. 8.4 Multivariate spatial priors. 8.5 Varying predictor effect models. Exercises. References.
Chapter 9 Time Series Models for Discrete Variables. 9.1 Introduction: time dependence in observations and latent data. 9.2 Observation-driven dependence. 9.3 Parameter-driven dependence via DLMs. 9.4 Parameter-driven dependence via autocorrelated error models. 9.5 Integer autoregressive models. 9.6 Hidden Markov models. Exercises. References.
Chapter 10 Hierarchical and Panel Data Models. 10.1 Introduction: clustered data and general linear mixed models. 10.2 Hierarchical models for metric outcomes. 10.3 Hierarchical generalized linear models. 10.4 Random effects for crossed factors. 10.5 The general linear mixed model for panel data. 10.6 Conjugate panel models. 10.7 Growth curve analysis. 10.8 Multivariate panel data. 10.9 Robustness in panel and clustered data analysis. 10.10 APC and spatio-temporal models. 10.11 Space-time and spatial APC models. Exercises. References.
Chapter 11 Missing-Data Models. 11.1 Introduction: types of missing data. 11.2 Density mechanisms for missing data. 11.3 Auxiliary variables. 11.4 Predictors with missing values. 11.5 Multiple imputation. 11.6 Several responses with missing values. 11.7 Non-ignorable non-response models for survey tabulations. 11.8 Recent developments. Exercises. References.
Index.

172 citations


Journal ArticleDOI
TL;DR: The paper explores the possibility of using a method known as ordinal regression to model the probability of correctly classifying a new project to a cost category; the method is validated with respect to its fitting and predictive accuracy.
Abstract: In the area of software cost estimation, various methods have been proposed to predict the effort or the productivity of a software project. Although most of the proposed methods produce point estimates, in practice it is more realistic and useful for a method to provide interval predictions. In this paper, we explore the possibility of using such a method, known as ordinal regression, to model the probability of correctly classifying a new project to a cost category. The proposed method is applied to three data sets and is validated with respect to its fitting and predictive accuracy.
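A minimal sketch of the interval-prediction idea, using statsmodels' OrderedModel as a stand-in; the predictors kloc and team_exp, the three cost categories, and the simulated data are all hypothetical, not the paper's data sets:

```python
# Fit an ordinal (proportional-odds) regression and report the probability
# that a new software project falls in each cost category.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    "kloc": rng.gamma(2.0, 20.0, n),        # project size, hypothetical predictor
    "team_exp": rng.integers(1, 10, n),     # team experience, hypothetical predictor
})
# Hypothetical ordered cost categories derived from a latent effort score.
latent = 0.03 * X["kloc"] - 0.3 * X["team_exp"] + rng.logistic(size=n)
y = pd.cut(latent, bins=[-np.inf, -1, 1, np.inf],
           labels=["low", "medium", "high"])   # ordered categorical response

model = OrderedModel(y, X, distr="logit")      # cumulative logit fit
res = model.fit(method="bfgs", disp=False)

# Probability of each cost category for a new project.
new_project = pd.DataFrame({"kloc": [55.0], "team_exp": [4]})
print(res.predict(new_project))                # one row: P(low), P(medium), P(high)
```

The predicted row sums to one, so a project can be reported with a probability for each cost category rather than a single point estimate.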

126 citations


Journal ArticleDOI
TL;DR: A method for extracting the whole ordinal information from non-linear time series on the basis of counting ordinal patterns and the concept of permutation entropy is presented.
Abstract: In order to develop fast and robust methods for extracting qualitative information from non-linear time series, Bandt and Pompe have proposed to consider time series from the pure ordinal viewpoint. On the basis of counting ordinal patterns, which describe the up-and-down patterns in a time series, they have introduced the concept of permutation entropy for quantifying the complexity of a system behind a time series. The permutation entropy only provides one detail of the ordinal structure of a time series. Here we present a method for extracting the whole ordinal information.
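A minimal sketch of the counting step and the resulting permutation entropy; the pattern order d = 3 and the toy series are assumptions for illustration:

```python
# Count ordinal patterns (argsort of each length-d window) and compute the
# permutation entropy of their distribution, in the spirit of Bandt & Pompe.
from collections import Counter
import math

def ordinal_patterns(x, d=3):
    """Map each window of length d to the permutation that sorts it."""
    pats = []
    for i in range(len(x) - d + 1):
        window = x[i:i + d]
        pats.append(tuple(sorted(range(d), key=lambda k: window[k])))
    return Counter(pats)

def permutation_entropy(x, d=3, normalize=True):
    counts = ordinal_patterns(x, d)
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(math.factorial(d)) if normalize else h

series = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]   # toy series
print(permutation_entropy(series, d=3))
```

The full distribution of pattern counts returned by ordinal_patterns is the "whole ordinal information" that the permutation entropy compresses into a single number.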

121 citations


Journal ArticleDOI
TL;DR: In this paper, a mixture of normals prior replaces the usual single multivariate normal model for the latent variables, allowing for varying local dependence structure across the contingency table, and removing the problems related to the choice and resampling of cutoffs defined for these latent variables.
Abstract: This article proposes a probability model for k-dimensional ordinal outcomes, that is, it considers inference for data recorded in k-dimensional contingency tables with ordinal factors. The proposed approach is based on full posterior inference, assuming a flexible underlying prior probability model for the contingency table cell probabilities. We use a variation of the traditional multivariate probit model, with latent scores that determine the observed data. In our model, a mixture of normals prior replaces the usual single multivariate normal model for the latent variables. By augmenting the prior model to a mixture of normals we generalize inference in two important ways. First, we allow for varying local dependence structure across the contingency table. Second, inference in ordinal multivariate probit models is plagued by problems related to the choice and resampling of cutoffs defined for these latent variables. We show how the proposed mixture model approach entirely removes these problems. We ill...
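In symbols, the latent-score structure described here can be sketched as follows (notation assumed for this presentation: latent vector z_i, fixed cutoffs γ, and a K-component mixture):

```latex
z_i \;\sim\; \sum_{k=1}^{K} w_k \, \mathrm{N}(\mu_k, \Sigma_k),
\qquad
y_{ij} = c \;\iff\; \gamma_{c-1} < z_{ij} \le \gamma_c ,
```

so the flexibility that a single multivariate normal would need from carefully chosen and resampled cutoffs is instead supplied by the mixture, which is why cutoff choice ceases to be a problem.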

114 citations


Book
01 Jan 2005
TL;DR: Covers data, models, and multidimensional scaling (MDS) analysis: the nature, measurement level, shape, and conditionality of data analyzed in MDS; missing and multivariate data; classical MDS and the Euclidean model; details of CMDS; replicated MDS; weighted MDS and the geometry, algebra, and matrix algebra of the weighted Euclidean model; the weirdness index; and flattened weights.
Abstract:
1. Model Selection Loglinear Analysis. Loglinear Modeling Basics. A Two-Way Table. The Saturated Model. Main Effects. Interactions. Examining Parameters in a Saturated Model. Calculating the Missing Parameter Estimates. Testing Hypotheses about Parameters. Fitting an Independence Model. Specifying the Model. Checking Convergence. Chi-Square Goodness-of-Fit Tests. Hierarchical Models. Generating Classes. Selecting a Model. Evaluating Interactions. Testing Individual Terms in the Model. Model Selection Using Backward Elimination.
2. Logit Loglinear Analysis. Dichotomous Logit Model. Loglinear Representation. Logit Model. Specifying the Model. Parameter Estimates for the Saturated Logit Model. Unsaturated Logit Model. Specifying the Analysis. Goodness-of-Fit Statistics. Observed and Expected Cell Counts. Parameter Estimates. Measures of Dispersion and Association. Polychotomous Logit Model. Specifying the Model. Goodness of Fit of the Model. Interpreting Parameter Estimates. Examining Residuals. Covariates. Other Logit Models.
3. Multinomial Logistic Regression. The Logit Model. Baseline Logit Example. Specifying the Model. Parameter Estimates. Likelihood-Ratio Test for Individual Effects. Likelihood-Ratio Test for the Overall Model. Evaluating the Model. Calculating Predicted Probabilities and Expected Frequencies. Classification Table. Goodness-of-Fit Tests. Examining the Residuals. Pseudo-R-square Measures. Correcting for Overdispersion. Automated Variable Selection. Hierarchical Variable Entry. Specifying the Analysis. Step Output. Likelihood-Ratio Tests for Individual Effects. Matched Case-Control Studies. The Model. Creating the Difference Variables. The Data File. Specifying the Analysis. Examining the Results.
4. Ordinal Regression. Fitting an Ordinal Logit Model. Modeling Cumulative Counts. Specifying the Analysis. Parameter Estimates. Testing Parallel Lines. Does the Model Fit? Comparing Observed and Expected Counts. Including Additional Predictor Variables. Overall Model Test. Measuring Strength of Association. Classifying Cases. Generalized Linear Models. Link Function. Fitting a Heteroscedastic Probit Model. Modeling Signal Detection. Fitting a Location-Only Model. Fitting a Scale Parameter. Parameter Estimates. Model-Fitting Information.
5. Probit Regression. Probit and Logit Response Models. Evaluating Insecticides. Confidence Intervals for Expected Dosages. Comparing Several Groups. Comparing Relative Potencies of the Agents. Estimates of Relative Median Potency. Estimating the Natural Response Rate. More than One Stimulus Variable.
6. Kaplan-Meier Survival Analysis. SPSS Procedures for Survival Data. Background. Calculating Length of Time. Estimating the Survival Function. Estimating the Conditional Probability. Estimating the Cumulative Probability of Survival. The SPSS Kaplan-Meier Table. Plotting Survival Functions. Comparing Survival Functions. Specifying the Analysis. Comparing Groups. Stratified Comparisons of Survival Functions.
7. Life Tables. Background. Studying Employment Longevity. The Body of a Life Table. Calculating Survival Probabilities. Assumptions Needed to Use the Life Table. Lost to Follow-up. Plotting Survival Functions. Comparing Survival Functions.
8. Cox Regression. The Cox Regression Model. The Hazard Function. Proportional Hazards Assumption. Modeling Survival Times. Coding Categorical Variables. Specifying the Analysis. Testing Hypotheses about the Age Coefficient. Interpreting the Regression Coefficient. Baseline Hazard and Cumulative Survival Rates. Including Multiple Covariates. The Model with Three Covariates. Global Tests of the Model. Plotting the Estimated Functions. Checking the Proportional Hazards Assumption. Stratification. Log-Minus-Log Survival Plot. Identifying Influential Cases. Examining Residuals. Partial (Schoenfeld) Residuals. Martingale Residuals. Selecting Predictor Variables. Variable Selection Methods. An Example of Forward Selection. Omnibus Test of the Model At Each Step. Time-Dependent Covariates. Examining the Data. Specifying a Time-Dependent Covariate. Calculating Segmented Time-Dependent Covariates. Testing the Proportional Hazard Assumption with a Time-Dependent Covariate. Fitting a Conditional Logistic Regression Model. The Data File Structure. Specifying the Analysis. Parameter Estimates.
9. Variance Components. Examples. Factors, Effects, and Models. Types of Factors. Types of Effects. Types of Models. Model for One-Way Classification. Estimation Methods. Negative Variance Estimates. Nested Design Model for Two-Way Classification. Univariate Repeated Measures Analysis Using a Mixed Model Approach. Background Information. Model. Distribution Assumptions. Estimation Methods.
10. Linear Mixed Models. The Linear Mixed Model. Background.
11. Nonlinear Regression. Examples. What Is a Nonlinear Model? Transforming Nonlinear Models. Intrinsically Nonlinear Models. Fitting a Logistic Population Growth Model. Estimating a Nonlinear Model. Finding Starting Values. Specifying the Analysis. Approximate Confidence Intervals for the Parameters. Bootstrap Estimates. Estimating Starting Values. Use Starting Values from Previous Analysis. Look for a Linear Approximation. Use Properties of the Nonlinear Model. Solve a System of Equations. Computational Issues. Additional Nonlinear Regression Options. Nonlinear Regression Common Models. Specifying a Segmented Model.
12. Two-Stage Least-Squares Regression. Artichoke Data. Demand-Price-Income Economic Model. Estimation with Ordinary Least Squares. Feedback and Correlated Errors. Two-Stage Least Squares. Strategy. Stage 1: Estimating Price. Stage 2: Estimating the Model. 2-Stage Least Squares Procedure.
13. Weighted Least-Squares Regression. Diagnosing the Problem. Estimating the Weights. Estimating Weights as Powers. Specifying the Analysis. Examining the Log-Likelihood Functions. WLS Solutions. Estimating Weights from Replicates. Diagnostics from the Linear Regression Procedure.
14. Multidimensional Scaling. Data, Models, and Analysis of Multidimensional Scaling. Example: Flying Mileages. The Nature of Data Analyzed in MDS. The Measurement Level of Data. The Shape of Data. The Conditionality of Data. Missing Data. Multivariate Data. Classical MDS. Example: Flying Mileages Revisited. The Euclidean Model. Details of CMDS. Example: Ranked Flying Mileages. Repeated CMDS. Replicated MDS. Details of RMDS. Example: Perceived Body-Part Structure. Weighted MDS. Geometry of the Weighted Euclidean Model. Algebra of the Weighted Euclidean Model. Matrix Algebra of the Weighted Euclidean Model. Details of WMDS. Example: Perceived Body-Part Structure. The Weirdness Index. Flattened Weights.

114 citations


Journal ArticleDOI
TL;DR: Evaluation of the various steps of exploratory data analysis of ordinal ecological data shows that consistency of methodology throughout the study is of primary importance; the multivariate procedures most commonly applied in numerical ecology do not satisfy these requirements and are therefore not recommended.
Abstract: Questions: Are ordinal data appropriately treated by multivariate methods in numerical ecology? If not, what are the most common mistakes? Which dissimilarity coefficients, ordination and classification methods are best suited to ordinal data? Should we worry about such problems at all? Methods: A new classification model family, OrdClAn (Ordinal Cluster Analysis), is suggested for hierarchical and non-hierarchical classifications from ordinal ecological data, e.g. the abundance/dominance scores that are commonly recorded in relevés. During the clustering process, the objects are grouped so as to minimize a measure calculated from the ranks of within-cluster and between-cluster distances or dissimilarities. Results and Conclusions: Evaluation of the various steps of exploratory data analysis of ordinal ecological data shows that consistency of methodology throughout the study is of primary importance. In an optimal situation, each methodological step is order invariant. This property ensures that...
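A rough sketch of the kind of rank-based objective such a clustering minimizes; the Manhattan distance on ordinal scores and the mean-rank criterion are assumptions for illustration, not the authors' exact OrdClAn formulation:

```python
# Score a partition by the mean rank of its within-cluster dissimilarities
# among the ranks of all pairwise dissimilarities (lower is better).
import numpy as np
from scipy.stats import rankdata

def rank_criterion(D, labels):
    iu = np.triu_indices_from(D, k=1)
    ranks = rankdata(D[iu])                      # ranks of all pairwise distances
    within = labels[iu[0]] == labels[iu[1]]      # pairs in the same cluster
    return ranks[within].mean()

# Toy abundance/dominance scores for 6 releves on 4 species (ordinal 0-5).
scores = np.array([[5, 4, 0, 1], [4, 5, 1, 0], [5, 5, 0, 0],
                   [0, 1, 4, 5], [1, 0, 5, 4], [0, 0, 5, 5]])
D = np.abs(scores[:, None, :] - scores[None, :, :]).sum(axis=2)  # Manhattan
good = np.array([0, 0, 0, 1, 1, 1])
bad = np.array([0, 1, 0, 1, 0, 1])
print(rank_criterion(D, good), rank_criterion(D, bad))
```

Because the criterion depends on the dissimilarities only through their ranks, any monotone transformation of the dissimilarities leaves it unchanged, which is exactly the order-invariance property the paper asks of each methodological step.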

83 citations


Proceedings ArticleDOI
10 May 2005
TL;DR: Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform the baseline methods of using heuristic rules or employing the conventional information retrieval method of Okapi, indicating that generic models for definition ranking can be constructed.
Abstract: This paper is concerned with the problem of definition search. Specifically, given a term, we are to retrieve definitional excerpts of the term and rank the extracted excerpts according to their likelihood of being good definitions. This is in contrast to the traditional approaches of either generating a single combined definition or simply outputting all retrieved definitions. Definition ranking is essential for the task. Methods for performing definition ranking are proposed in this paper, which formalize the problem as either classification or ordinal regression. A specification for judging the goodness of a definition is given. We employ SVM as the classification model and Ranking SVM as the ordinal regression model respectively, such that they rank definition candidates according to their likelihood of being good definitions. Features for constructing the SVM and Ranking SVM models are defined. An enterprise search system based on this method has been developed and has been put into practical use. Experimental results indicate that the use of SVM and Ranking SVM can significantly outperform the baseline methods of using heuristic rules or employing the conventional information retrieval method of Okapi. This is true both when the answers are paragraphs and when they are sentences. Experimental results also show that SVM or Ranking SVM models trained in one domain can be adapted to another domain, indicating that generic models for definition ranking can be constructed.
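The Ranking SVM component can be sketched as the usual pairwise formulation (notation assumed here: feature vectors φ(d) for definition candidates d, and preference pairs d_i ≻ d_j derived from the goodness judgments):

```latex
\min_{w,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{(i,j)} \xi_{ij}
\quad \text{s.t.} \quad
w^\top\big(\phi(d_i) - \phi(d_j)\big) \ge 1 - \xi_{ij}, \qquad \xi_{ij} \ge 0,
```

so candidates are ranked by the learned score w^⊤φ(d); the SVM classification variant instead separates "good" from "bad" definitions and ranks candidates by signed distance to the hyperplane.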

60 citations


Book
30 Dec 2005
TL;DR: A companion to the advanced statistical procedures in SPSS 14.0, covering loglinear and logit loglinear analysis, multinomial logistic and ordinal regression, probit regression, survival analysis (Kaplan-Meier, life tables, Cox regression), variance components, linear mixed models, nonlinear regression, two-stage and weighted least squares, and multidimensional scaling.
Abstract: SPSS 14.0 Advanced Statistical Procedures Companion: Chapters
1. Model Selection in Loglinear Analysis: model formulation; parameters in saturated models; hypothesis testing; convergence; goodness-of-fit tests; hierarchical models; generating classes; model selection with backward elimination.
2. Logit Loglinear Analysis: dichotomous logit model; loglinear representation; parameter estimates; goodness-of-fit statistics; measures of dispersion and association; polychotomous logit model; interpreting parameters; examining residuals; introducing covariates.
3. Multinomial Logistic Regression: baseline logits; likelihood-ratio tests for models and individual effects; evaluating the model; calculating predicted probabilities; the classification table; goodness-of-fit tests; residuals; pseudo R-square measures; overdispersion; model selection; matched case-control studies.
4. Ordinal Regression: modeling cumulative counts; parameter estimates; testing for parallel lines; model fit; observed and expected counts; measures of strength of association; classifying cases; link functions; fitting a heteroscedastic probit model; fitting location and scale parameters.
5. Probit Regression: probit and logit response models; confidence intervals for effective dosages; comparing groups; comparing relative potencies; estimating the natural response rate; multiple stimuli.
6. Kaplan-Meier Survival Analysis: calculating survival time; estimating the survival function, the conditional probability of survival, and the cumulative probability of survival; plotting survival functions; comparing survival functions; stratified comparisons.
7. Life Tables: calculating survival probabilities; assumptions; observations lost to follow-up; plotting survival functions; comparing survival functions.
8. Cox Regression: the model; proportional hazards assumption; coding categorical variables; interpreting the regression coefficients; baseline hazard and cumulative survival rates; global tests of the model; checking the proportional hazards assumption; stratification; log-minus-log survival plot; identifying influential cases; examining residuals; partial (Schoenfeld) residuals; martingale residuals; variable-selection methods; time-dependent covariates; specifying a time-dependent covariate; calculating segmented time-dependent covariates; testing the proportional hazards assumption with a time-dependent covariate; fitting a conditional logistic regression model.
9. Variance Components: factors, effects, and models; model for one-way classification; estimation methods; negative variance estimates; nested design model for two-way classification; univariate repeated measures analysis using a mixed models approach; distribution assumptions; estimation methods.
10. Linear Mixed Models: background; unconditional random-effects models; hierarchical models; random-coefficient model; model with school-level and individual-level covariates; three-level hierarchical model; repeated measurements; selecting a residual covariance structure.
11. Nonlinear Regression: the nonlinear model; transforming nonlinear models; intrinsically nonlinear models; fitting a logistic population growth model; finding starting values; approximate confidence intervals for the parameters; bootstrapped estimates; starting values from previous analysis; linear approximation; computational issues; common models for nonlinear regression; specifying a segmented model.
12. Two-Stage Least-Squares Regression: demand-price-income economic model; estimation with ordinary least squares; feedback and correlated errors; estimation with two-stage least squares.
13. Weighted Least-Squares Regression: diagnosing the problem; estimating weights; examining the log-likelihood function; the WLS solution; estimating weights from replicates; diagnostics from the linear regression procedure.
14. Multidimensional Scaling: data, models, and multidimensional scaling analysis; nature of data analyzed in MDS; measurement level of data; shape of data; conditionality of data; missing data; multivariate data; classical MDS; Euclidean model; details of CMDS; replicated MDS; weighted MDS; geometry of the weighted Euclidean model; algebra of the weighted Euclidean model; matrix algebra of the weighted Euclidean model; weirdness index; flattened weights.

Journal ArticleDOI
TL;DR: In this article, an approach is presented for correcting for interobserver measurement error in an ordinal logistic regression model, while also taking into account the variability of the estimated correction terms.
Abstract: We present an approach for correcting for interobserver measurement error in an ordinal logistic regression model, taking into account also the variability of the estimated correction terms. The different scoring behaviour of the 16 examiners complicated the identification of a geographical trend in a recent study on caries experience in Flemish children (Belgium) who were 7 years old. Since the measurement error is on the response, the factor 'examiner' could be included in the regression model to correct for its confounding effect. However, controlling for examiner largely removed the geographical east-west trend. Instead, we suggest a (Bayesian) ordinal logistic model which corrects for the scoring error (compared with a gold standard) using a calibration data set. The marginal posterior distribution of the regression parameters of interest is obtained by integrating out the correction terms pertaining to the calibration data set. This is done by processing two Markov chains sequentially, whereby one Markov chain samples the correction terms. The sampled correction term is imputed in the Markov chain pertaining to the regression parameters. The model was fitted to the oral health data of the Signal-Tandmobiel® study. A WinBUGS program was written to perform the analysis.
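The integration step can be sketched as follows (θ denoting the examiner correction terms and y_cal the calibration data; the Monte Carlo form is a generic sketch of the two-chain scheme, not the authors' exact estimator):

```latex
p(\beta \mid y) \;=\; \int p(\beta \mid y, \theta)\, p(\theta \mid y_{\mathrm{cal}})\, d\theta
\;\approx\; \frac{1}{M} \sum_{m=1}^{M} p\big(\beta \mid y, \theta^{(m)}\big),
\qquad \theta^{(m)} \sim p(\theta \mid y_{\mathrm{cal}}),
```

which is what processing the two Markov chains sequentially achieves: each sampled correction term θ^(m) is imputed into the chain for the regression parameters β.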

Proceedings Article
05 Dec 2005
TL;DR: Experiments indicate that the proposed algorithm for learning ranking functions from order constraints between sets—i.e. classes—of training samples is at least as accurate as the current state-of-the-art and several orders of magnitude faster than current methods.
Abstract: We propose efficient algorithms for learning ranking functions from order constraints between sets—i.e. classes—of training samples. Our algorithms may be used for maximizing the generalized Wilcoxon Mann Whitney statistic that accounts for the partial ordering of the classes: special cases include maximizing the area under the ROC curve for binary classification and its generalization for ordinal regression. Experiments on public benchmarks indicate that: (a) the proposed algorithm is at least as accurate as the current state-of-the-art; (b) computationally, it is several orders of magnitude faster and—unlike current methods—it is easily able to handle even large datasets with over 20,000 samples.
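The statistic being maximized can be sketched, for a scoring function f and ordered classes of sizes n_k, as (a generic form of the generalized Wilcoxon Mann Whitney statistic; the normalization is an assumption of this presentation):

```latex
W(f) \;=\; \frac{\displaystyle\sum_{k<l}\; \sum_{i:\,y_i = l}\; \sum_{j:\,y_j = k} \mathbf{1}\big[f(x_i) > f(x_j)\big]}
{\displaystyle\sum_{k<l} n_l \, n_k},
```

which reduces to the area under the ROC curve when there are only two classes, and gives the ordinal-regression generalization when every pair of ordered classes contributes.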

Journal ArticleDOI
TL;DR: In this article, a distance for mixed nominal, ordinal and continuous data is developed by applying the Kullback-Leibler divergence to the general mixed-data model, an extension of the general location model that allows for ordinal variables to be incorporated in the model.
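The symmetrized Kullback-Leibler (J) divergence that such a distance typically builds on, for densities f and g, is shown below as a generic definition; how the paper applies it to the general mixed-data model is not reproduced here:

```latex
J(f, g) \;=\; \mathrm{KL}(f \,\|\, g) + \mathrm{KL}(g \,\|\, f)
\;=\; \int \big(f(u) - g(u)\big)\, \log \frac{f(u)}{g(u)}\, du .
```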

Journal ArticleDOI
TL;DR: In this article, a new approach based on the use of a new sample scale obtained by ordering the original variable sample space according to some specific "dominance criteria" fixed on the basis of the monitored process characteristics is presented.
Abstract: The paper presents a new method for statistical process control when ordinal variables are involved. This is the case of a quality characteristic evaluated by an ordinal scale. The method allows a statistical analysis without exploiting an arbitrary numerical conversion of scale levels and without using the traditional sample synthesis operators (sample mean and variance). It consists of a different approach based on the use of a new sample scale obtained by ordering the original variable sample space according to some specific ‘dominance criteria’ fixed on the basis of the monitored process characteristics. Samples are directly reported on the chart and no distributional shape is assumed for the population (universe) of evaluations. Finally, a practical application of the method in the health sector is provided.
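A toy sketch of ordering an ordinal sample space by a dominance criterion; the four-level scale, the sample size of 3, and the cumulative-count ordering used here are illustrative assumptions, not the paper's specific 'dominance criteria':

```python
# Order all possible samples of N ordinal scores by a dominance criterion and
# report where an observed sample falls; a control chart would plot this rank.
from itertools import combinations_with_replacement

LEVELS = [1, 2, 3, 4]   # ordinal quality levels (hypothetical scale)
N = 3                   # sample size (hypothetical)

def dominance_key(sample):
    # Cumulative counts of scores at or below each level: componentwise
    # smaller counts of low scores mean a 'better' sample.
    return tuple(sum(1 for s in sample if s <= lev) for lev in LEVELS[:-1])

space = sorted(combinations_with_replacement(LEVELS, N), key=dominance_key)
observed = (2, 3, 4)
rank = space.index(tuple(sorted(observed)))
print(f"sample {observed} has rank {rank} of {len(space)} (0 = best)")
```

No numeric conversion of the scale levels and no sample mean or variance is involved; the chart works directly on the ordered sample scale.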

Journal ArticleDOI
TL;DR: In this paper, a simple model for repeated observations of an ordered categorical response variable which is isotonic over time is introduced, where the measurements represent an irreversible process such that the response at time t is never lower than the response observed at the previous time point t-1.
Abstract: The paper introduces a simple model for repeated observations of an ordered categorical response variable which is isotonic over time. It is assumed that the measurements represent an irreversible process such that the response at time t is never lower than the response observed at the previous time point t-1. Observations of this type occur for example in treatment studies when improvement is measured on an ordinal scale. Since the response at time t depends on the previous outcome, the number of ordered response categories depends on the previous outcome leading to severe problems when simple threshold models for ordered data are used. In order to avoid these problems the isotonic sequential model is introduced. It accounts for the irreversible process by considering the binary transitions to higher scores and allows a parsimonious parameterization. It is shown how the model may easily be estimated by using existing software. Moreover, the model is extended to a random effects version which explicitly takes heterogeneity of individuals and potential correlations into account.
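The binary-transition construction can be sketched as follows (F a response distribution function such as the logistic; notation assumed for this presentation, with k ordered categories and previous outcome s):

```latex
P\big(Y_t > r \,\big|\, Y_t \ge r,\; Y_{t-1} = s,\; x\big) \;=\; F\big(\theta_r + x^\top \beta\big),
\qquad r = s, s+1, \dots, k-1,
```

with Y_t ≥ Y_{t−1} = s built in, so each observation contributes a sequence of ordinary binary responses and the model can be fitted with standard software for binary regression.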

Journal Article
TL;DR: In this article, an alternative, algorithmic rather than an arithmetic, approach is described, which involves the fitting of distributions to the observed survey data using the c 2 statistic, compared to the approach that assumes an interval level scale, and the rescaling approach that uses correspondence analysis.
Abstract: The debate over the appropriate analysis of ordinal level survey data has lasted for many decades. One school of thought maintains that the data can generally be regarded as interval level, whereas another asserts that the data should be rescaled before subjecting it to statistical analysis. In this study, an alternative, algorithmic rather than an arithmetic, approach is described, which involves the fitting of distributions to the observed survey data using the c 2 statistic. This distribution-fitting approach is compared to the approach that assumes an interval level scale, and the rescaling approach that uses correspondence analysis. Using a bootstrap resampling methodology, the analysis confirms that survey results may be flawed if ordinal level scales are assumed to be interval level, or if the correspondence analysis approach is applied inappropriately. The distribution-fitting approach is found to have accuracy and validity that is superior to the alternative approaches.
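A minimal sketch of the distribution-fitting step; the Likert counts and the candidate distributions below are hypothetical, and the paper's algorithm for choosing candidates is not reproduced:

```python
# Among candidate discrete distributions over 5 Likert levels, pick the one
# whose expected counts best match the observed counts by the chi-squared statistic.
import numpy as np
from scipy.stats import binom, randint, chisquare

observed = np.array([8, 15, 30, 32, 15])   # hypothetical Likert counts, levels 1-5
n = observed.sum()
levels = np.arange(5)

candidates = {
    "uniform": randint(0, 5).pmf(levels),
    "binomial(p=0.6)": binom(4, 0.6).pmf(levels),
    "binomial(p=0.5)": binom(4, 0.5).pmf(levels),
}
for name, pmf in candidates.items():
    stat, p = chisquare(observed, f_exp=n * pmf)
    print(f"{name}: chi2={stat:.2f}, p={p:.3f}")
```

The best-fitting candidate, rather than an assumed interval-level scale or a correspondence-analysis rescaling, then drives the subsequent analysis.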

Reference EntryDOI
15 Jul 2005
TL;DR: The authors provided an overview of generalized Mantel-Haenszel (MH) methods for the analysis of categorical data from factor-response and repeated measures study designs, using data from two different clinical research studies, investigating treatment differences within several different sets of 2 × 2 tables in a clinical trial, and within-subject differences in an ordinal response across ordinal factor levels within a repeated measures design.
Abstract: This article provides an overview of generalized Mantel–Haenszel (MH) methods for the analysis of categorical data from factor–response and repeated measures study designs. These methods are illustrated using data from two different clinical research studies, investigating treatment differences within several different sets of 2 × 2 tables in a clinical trial, and within-subject differences in an ordinal response across ordinal factor levels within a repeated measures design. The underlying multiple hypergeometric probability structure, based on a randomization model framework for hypothesis testing, is summarized for testing alternative hypotheses of 1) general association; 2) mean responses differ; and 3) linear trend in mean responses. These generalized MH methods can all be implemented directly within SAS and StatXact, with appropriate stratification and choice of scores. Keywords: categorical data; ordinal scores; randomization model; hypergeometric probability; repeated measures
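The three alternative hypotheses correspond to different score matrices in the generalized Cochran-Mantel-Haenszel statistic, which can be sketched as (n_h the vector of cell counts in stratum h, m_h its null expectation under the multiple hypergeometric distribution, V_G the null covariance of G, and B_h the chosen score matrices):

```latex
Q \;=\; G^\top V_G^{-1}\, G, \qquad G \;=\; \sum_h B_h \,(n_h - m_h),
```

where different choices of the B_h yield the tests of general association, differences in mean responses, and linear trend in mean responses.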

Journal ArticleDOI
TL;DR: The proposed Monte Carlo methods for computing a single marginal likelihood or several marginal likelihoods for the purpose of Bayesian model comparisons are motivated by Bayesian variable selection.
Abstract: In this article, we propose new Monte Carlo methods for computing a single marginal likelihood or several marginal likelihoods for the purpose of Bayesian model comparisons. The methods are motivated by Bayesian variable selection, in which the marginal likelihoods for all subset variable models must be computed. The proposed estimators use only a single Markov chain Monte Carlo (MCMC) output from the joint posterior distribution and do not require the specific structure or form of the MCMC sampling algorithm used to generate the sample to be known. The theoretical properties of the proposed method are examined in detail. The applicability and usefulness of the proposed method are demonstrated via ordinal data probit regression models. A real dataset involving ordinal outcomes is used to further illustrate the proposed methodology.
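These estimators build on the basic marginal-likelihood identity, shown here in its generic form; the paper's specific single-chain estimators are not reproduced:

```latex
m(y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(\theta \mid y)}
\qquad \text{for any } \theta,
```

where estimating the right-hand side from posterior output is the crux, and is what the proposed methods do using a single MCMC sample from the joint posterior.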

Journal ArticleDOI
TL;DR: In this paper, the authors compare complete case analysis of ordinal data with multivariate normal imputation and show that the imputation methods were not as good as using only complete cases.
Abstract: Simulations were used to compare complete case analysis of ordinal data with multivariate normal (MVN) imputation. MVN methods of imputation were not as good as using only complete cases. Bias and standard errors were measured against coefficients estimated from logistic regression and a standard data set.

Reference EntryDOI
15 Oct 2005
TL;DR: One of the most frequently used ordinal regression models is the ordinal logistic model, a member of the family of generalized linear models, which is based upon the cumulative probabilities for the categories of the response variable.
Abstract: Regression models for ordinal data have been developed based upon the cumulative probabilities for the categories of the response variable. One of the most frequently used ordinal regression models is the ordinal logistic model, a member of the family of generalized linear models. Keywords: logistic regression; ordinal variable; regression coefficient; odds ratio

Journal ArticleDOI
TL;DR: If there are sufficient outcome levels and/or predictor variables, there may be a number of stereotype models of differing dimension, and this method is illustrated with an example of prediction of damage to joints in rheumatoid arthritis.
Abstract: There are a number of regression models which are widely used to predict ordinal outcomes. The commonly used models assume that all predictor variables have a similar effect at all levels of the outcome variable. If this is not the case, for example if some variables predict susceptibility to a disease and others predict the severity of the disease, then a more complex model is required. One possibility is the multinomial logistic regression model, which assumes that the predictor variables have different effects at all levels of the outcome variable. An alternative is to use the stereotype family of regression models. A one-dimensional stereotype model makes the assumption that the effect of each predictor is the same at all outcome levels. However, it is possible to fit stereotype models with more than one dimension, up to a maximum of min(k-1, p) where k is the number of outcome categories and p is the number of predictor variables. A stereotype model of this maximum dimension is equivalent to a multinomial logistic regression model, in that it will produce the same predicted values and log-likelihood. If there are sufficient outcome levels and/or predictor variables, there may be a number of stereotype models of differing dimension. The method is illustrated with an example of prediction of damage to joints in rheumatoid arthritis.
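A one-dimensional stereotype model can be sketched as follows (reference category k and scaling parameters φ_j; the multidimensional version replaces the single product with a sum over dimensions):

```latex
\log \frac{P(Y = j \mid x)}{P(Y = k \mid x)} \;=\; \alpha_j + \phi_j\, \beta^\top x,
\qquad 1 = \phi_1 > \phi_2 > \dots > \phi_k = 0,
```

and a d-dimensional model uses α_j + Σ_{m=1}^{d} φ_{jm} β_m^⊤ x. At d = min(k−1, p) the terms span everything a multinomial logistic model can express, which is why the maximal stereotype model and the multinomial logistic model give identical predicted values and log-likelihood.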

Proceedings ArticleDOI
Fabio Aiolli1
27 Nov 2005
TL;DR: The preference model introduced in this paper gives a natural framework and a principled solution for a broad class of supervised learning problems with structured predictions, such as predicting orders and instance ranking, and predicting rates.
Abstract: The preference model introduced in this paper gives a natural framework and a principled solution for a broad class of supervised learning problems with structured predictions, such as predicting orders (label and instance ranking), and predicting rates (classification and ordinal regression). We show how all these problems can be cast as linear problems in an augmented space, and we propose an on-line method to efficiently solve them. Experiments on an ordinal regression task confirm the effectiveness of the approach.
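The reduction can be sketched as follows (a generic pairwise-preference form; the embedding ψ and the update rule are assumptions of this presentation, not the paper's exact construction):

```latex
a \succ b \;\Longrightarrow\; w^\top \psi(a) > w^\top \psi(b)
\;\Longleftrightarrow\; w^\top \big(\psi(a) - \psi(b)\big) > 0,
```

so every supervised constraint, whether from a label ranking, an instance ranking, a classification, or an ordinal rating, becomes a linear constraint in the augmented space, and an on-line learner can process violated constraints one at a time with perceptron-style updates.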

Journal ArticleDOI
TL;DR: Rochon's method of sample-size estimation with a repeated binary response is extended to the ordinal case, based on an analysis with generalized estimating equations (GEE) and inference with the Wald test.
Abstract: Correlated ordinal response data often arise in public health studies. Sample-size (power) calculations are a crucial step in designing such studies to ensure an adequate sample to detect a significant effect. Here we extend Rochon's method of sample-size estimation with a repeated binary response to the ordinal case. The proposed sample-size calculations are based on an analysis with generalized estimating equations (GEE) and inference with the Wald test. Simulation results demonstrate the merit of the proposed power calculations. Analysis of an arthritis clinical trial is used for illustration.
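In the one-degree-of-freedom case, Wald-test-based sample size calculations of this kind reduce to the familiar form (a generic sketch, with δ the effect of interest and σ² the GEE variance of its estimator for a single subject, not the paper's full multi-parameter formula):

```latex
n \;=\; \frac{\big(z_{1-\alpha/2} + z_{1-\beta}\big)^2 \, \sigma^2}{\delta^2},
```

where the correlation structure of the repeated ordinal responses enters through σ², computed from the GEE working covariance.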

Posted Content
TL;DR: In this paper, the behavioural change framework of Ajzen and Fishbein is used to explore whether attitudes towards organic farming, the perceived social pressure of the environment and the perceived feasibility of organic farming standards on the farm determine the willingness of farmers to convert to organic farming methods.
Abstract: In this paper the behavioural change framework of Ajzen and Fishbein is used to explore whether attitudes towards organic farming, the perceived social pressure of the environment and the perceived feasibility of organic farming standards on the farm determine the willingness of farmers to convert to organic farming methods. These variables together with the business and personal objectives and the organic farming information seeking behaviour of the farmer were used in an ordinal regression procedure to predict the intended organic farming conversion behaviour of conventional farmers.

Reference EntryDOI
15 Jul 2005
TL;DR: The proportional odds model is one member of the family of cumulative logistic regression models, designed for studying the effect of covariates on an ordinal response variable; its relationships with other members of this family are described, emphasizing the effect of response aggregation.
Abstract: The proportional odds model is one member of the family of cumulative logistic regression models, designed for studying the effect of covariates on an ordinal response variable. Relationships with other members of this family are described, emphasizing the effect of response aggregation. The connection with latent variable models, dispersion models, continuation-ratio models, and log-linear models is also discussed. Keywords: canonical regression model; extreme-value distribution; latent variable; logistic regression model; proportional-hazards model
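The latent variable connection mentioned here can be sketched as follows (Z a latent continuous response, cutpoints α_0 = −∞ < α_1 < … < α_{k−1} < α_k = +∞):

```latex
Y = j \;\iff\; \alpha_{j-1} < Z \le \alpha_j, \qquad Z = \beta^\top x + \varepsilon,
```

with a standard logistic ε giving the proportional odds model, logit P(Y ≤ j | x) = α_j − β^⊤x, and an extreme-value ε giving the proportional-hazards (complementary log-log) member of the family.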

Journal ArticleDOI
TL;DR: In this paper, the authors examined discharge-related anxiety in a group of 65 patients resident in five medium secure units located in the South of England and found that the main predictors of a general dischargerelated anxiety scale were low self-esteem and perceived absence of social support, although high trait anxiety also exerted a significant independent effect.
Abstract: This study examines discharge-related anxiety in a group of 65 patients resident in five medium secure units located in the South of England. The study is part of a larger investigation of non-compliance within medium secure unit environments. Participants completed standardised questionnaire measures of self-efficacy, self-esteem, anxiety and locus of control, together with a newly constructed questionnaire investigating anxiety relating to discharge. Results of ordinal regression procedures indicated that the main predictors of a general discharge-related anxiety scale were low self-esteem and perceived absence of social support, although on univariate analysis high trait anxiety also exerted a significant independent effect. The clinical implications of the findings are discussed.

Posted Content
TL;DR: -gologit2- is a user-written program that estimates generalized logistic regression models for ordinal dependent variables; the actual values taken on by the dependent variable are irrelevant, except that larger values are assumed to correspond to "higher" outcomes.
Abstract: -gologit2- is a user-written program that estimates generalized logistic regression models for ordinal dependent variables. The actual values taken on by the dependent variable are irrelevant except that larger values are assumed to correspond to "higher" outcomes. A major strength of -gologit2- is that it can also estimate two special cases of the generalized model: the proportional odds model and the partial proportional odds model. Hence, -gologit2- can estimate models that are less restrictive than the proportional odds/parallel lines models estimated by -ologit- (whose assumptions are often violated) but more parsimonious and interpretable than those estimated by a non-ordinal method, such as multinomial logistic regression. The -autofit- option greatly simplifies the process of identifying partial proportional odds models that fit the data. Two alternative but equivalent parameterizations of the model that have appeared in the literature are both supported. Other key advantages of -gologit2- include support for linear constraints, Stata 8.2 survey data (svy) estimation, and the computation of estimated probabilities via the -predict- command. -gologit2- is inspired by Vincent Fu's -gologit- program and is backward compatible with it but offers several additional powerful options.
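The generalized and partial proportional odds models that -gologit2- fits can be sketched as follows (one common parameterization; j indexes the k−1 cutpoints):

```latex
P(Y > j \mid x) \;=\; \frac{\exp(\alpha_j + x^\top \beta_j)}{1 + \exp(\alpha_j + x^\top \beta_j)},
\qquad j = 1, \dots, k-1,
```

where the proportional odds model constrains β_j = β for all j, and a partial proportional odds model relaxes that constraint only for the covariates that violate it, which is what the -autofit- option searches for.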

Posted Content
TL;DR: gologit2 estimates generalized ordered logit models for ordinal dependent variables and can estimate models that are less restrictive than the proportional odds/parallel lines models estimated by ologit but more parsimonious and interpretable than those estimated by a non-ordinal method, such as multinomial logistic regression.
Abstract: gologit2 estimates generalized ordered logit models for ordinal dependent variables. A major strength of gologit2 is that it can also estimate three special cases of the generalized model: the proportional odds/parallel lines model, the partial proportional odds model, and the logistic regression model. Hence, gologit2 can estimate models that are less restrictive than the proportional odds /parallel lines models estimated by ologit (whose assumptions are often violated) but more parsimonious and interpretable than those estimated by a non-ordinal method, such as multinomial logistic regression (i.e. mlogit). The svy: prefix, as well as factor variables and post-estimation commands such as margins, are supported. Other key strengths of gologit2 include options for linear constraints, alternative model parameterizations, automated model fitting, alternative link functions (logit, probit, complementary log-log, log-log & cauchit), and the computation of estimated probabilities via the predict command. gologit2 works under Stata 11.2 or higher. Those with older versions of Stata should use gologit29 instead. gologit2 is inspired by Vincent Fu's gologit program and is backward compatible with both it and gologit29 but offers several additional powerful options.

Proceedings ArticleDOI
28 Nov 2005
TL;DR: This paper proposes a multi-class classification algorithm based on an ordinal regression algorithm using 3-class classification; it is similar to algorithm K-SVCR and algorithm nu-K-SVCR but includes fewer parameters.
Abstract: Multi-class classification is an important and ongoing research subject in machine learning. In this paper, we propose a multi-class classification algorithm based on an ordinal regression algorithm using 3-class classification. This algorithm is similar to algorithm K-SVCR and algorithm nu-K-SVCR, but it includes fewer parameters. Another advantage of our algorithm is that, for the K-class classification problem, our algorithm can be extended to using p-class classification with 2 ≤ p ≤ K. Numerical experiments on artificial data sets and benchmark data sets show that the algorithm is reasonable and effective.
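A schematic sketch of the one-vs-one-vs-rest decomposition that K-SVCR-style algorithms build on: every class pair gets a 3-way learner (class i mapped to −1, class j to +1, all other classes to 0) and predictions are combined by voting. A plain multiclass SVC stands in for the ordinal 3-class machine here; the real algorithms additionally constrain the "rest" class to the margin region, which this sketch does not do.

```python
# One-vs-one-vs-rest decomposition with voting; class labels are assumed to
# be 0..K-1 so they can index the vote array directly.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

machines = {}
for i, j in combinations(classes, 2):
    t = np.where(y == i, -1, np.where(y == j, 1, 0))   # 3-way relabeling
    machines[(i, j)] = SVC(kernel="rbf", gamma="scale").fit(X, t)

def predict(x):
    votes = np.zeros(len(classes))
    for (i, j), m in machines.items():
        out = m.predict(x.reshape(1, -1))[0]
        if out == -1:
            votes[i] += 1
        elif out == 1:
            votes[j] += 1        # an output of 0 votes for neither class
    return int(np.argmax(votes))

print(predict(X[0]), y[0])
```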