Journal ArticleDOI

Applied Multivariate Data Analysis, Volume I: Regression and Experimental Design

01 Nov 1992-Technometrics (Taylor & Francis Group)-Vol. 34, Iss: 4, pp 490-491
About: This article was published in Technometrics on 1992-11-01 and has received 98 citations to date. The article focuses on the topics: Multivariate statistics & Bayesian multivariate linear regression.
Citations
Journal ArticleDOI
TL;DR: The paper empirically identified distinct stages in the Internet adoption process, linked them with organizational factors, and established that the size of the company has no effect on that process.
Abstract: Purpose – To explore the factors that affect the implementation of Internet technologies and to what extent the size of the company, as an organizational factor, influences that process. Design/methodology/approach – According to the innovation adoption theory, it was found that Internet adoption in firms is a process with different stages where a company is in one of a number of development stages depending on some variables related to organizational factors, such as the availability of technology resources, organizational structure, and managerial capabilities. The paper identified empirically different stages in the Internet adoption process and linked them with those factors. It analyzed questionnaire‐based data from 280 companies, applying factor and clustering analysis. Findings – Four main groups of companies were found according to their stage in the adoption of Internet technologies. The paper established that, contrary to the literature suggestions, the size of the company does not have any effect...
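The clustering step described in the abstract can be illustrated with a minimal k-means sketch. The variables, data values, and distance choice below are invented for illustration; the paper's actual questionnaire items and clustering procedure are not specified here, only its finding of four groups among 280 firms is echoed:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its members; repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster emptied out
                centroids[j] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, clusters

# Hypothetical firm profiles: (technology resources, managerial capability),
# both scaled to [0, 1]; a real questionnaire would yield many more items.
firms = [(0.10, 0.20), (0.15, 0.10), (0.30, 0.35), (0.45, 0.55),
         (0.50, 0.50), (0.70, 0.75), (0.80, 0.90), (0.90, 0.85)]
centroids, clusters = kmeans(firms, k=4)
```

In practice a factor analysis would first reduce the questionnaire items to a few dimensions, and the clustering would run on those factor scores.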

226 citations


Cites methods from "Applied Multivariate Data Analysis,..."

  • ...…information obtained from the sample – specifically to determine groups of organizations associated with particular phases in Internet technology implementation, as well as the organizational factors affecting them – we applied a cluster analysis (Cliff, 1987; Jobson, 1991a, b; Greenacre, 1993)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a methodological framework for analyzing the relationship between state sequences and covariates is defined, including generalized simple and multi-factor discrepancy-based methods to test for differences between groups, a pseudo R2 for measuring the strength of sequence-covariate associations, a generalized Levene statistic for testing differences in the within-group discrepancies, tools and plots for studying how the differences evolve along the timeframe, and a regression tree method for discovering the most significant discriminant covariates.
Abstract: In this article we define a methodological framework for analyzing the relationship between state sequences and covariates. Inspired by the ANOVA principles, our approach looks at how the covariates explain the discrepancy of the sequences. We use the pairwise dissimilarities between sequences to determine the discrepancy which makes it then possible to develop a series of statistical-significance-based analysis tools. We introduce generalized simple and multi-factor discrepancy-based methods to test for differences between groups, a pseudo R 2 for measuring the strength of sequence‐covariate associations, a generalized Levene statistic for testing differences in the within-group discrepancies, as well as tools and plots for studying the evolution of the differences along the timeframe and a regression tree method for discovering the most significant discriminant covariates and their interactions. In addition, we extend all methods to account for case weights. The scope of the proposed methodological framework is illustrated using a real-world sequence dataset.
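In its simplest form, the discrepancy framework sketched in the abstract reduces to sums of squared pairwise dissimilarities. A minimal sketch of the pseudo R2 idea follows; the function names and toy dissimilarity matrix are illustrative, not the authors' implementation:

```python
def sum_of_squares(d, idx):
    """Sum of squared pairwise dissimilarities among the cases in idx,
    divided by the number of cases (the 'discrepancy' sum of squares)."""
    pairs = sum(d[i][j] ** 2
                for a, i in enumerate(idx) for j in idx[a + 1:])
    return pairs / len(idx)

def pseudo_r2(d, groups):
    """Share of the total discrepancy explained by group membership:
    1 - SS_within / SS_total, analogous to R^2 in ANOVA."""
    everyone = [i for g in groups for i in g]
    ss_total = sum_of_squares(d, everyone)
    ss_within = sum(sum_of_squares(d, g) for g in groups)
    return 1 - ss_within / ss_total

# Toy 4x4 dissimilarity matrix: cases 0-1 and 2-3 form two tight groups.
d = [[0, 1, 5, 5],
     [1, 0, 5, 5],
     [5, 5, 0, 1],
     [5, 5, 1, 0]]
groups = [[0, 1], [2, 3]]
r2 = pseudo_r2(d, groups)  # close to 1: grouping explains most discrepancy
```

When the dissimilarity is squared Euclidean distance, this reduces exactly to the classical ANOVA decomposition; the paper's contribution is making the same machinery work for arbitrary sequence dissimilarities.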

165 citations


Cites methods from "Applied Multivariate Data Analysis,..."

  • ...Studer et al. (2010) use a generalization of the Bartlett T (Bartlett 1937; Jobson 1991) for testing the homogeneity of the within-group discrepancies....

    [...]

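The Bartlett T cited above tests the homogeneity of within-group variability; a minimal sketch of the classical variance-based statistic follows (the cited work generalizes it to dissimilarity-based discrepancies, which is not shown here):

```python
import math

def bartlett_stat(groups):
    """Bartlett's test statistic for homogeneity of variances across groups.
    Returns the corrected statistic, approximately chi-square with k-1 df
    under the null of equal variances."""
    k = len(groups)
    ns = [len(g) for g in groups]
    N = sum(ns)
    variances = []
    for g in groups:
        m = sum(g) / len(g)
        variances.append(sum((x - m) ** 2 for x in g) / (len(g) - 1))
    # Pooled variance, then the log-ratio statistic and its correction factor.
    sp2 = sum((n - 1) * v for n, v in zip(ns, variances)) / (N - k)
    t = (N - k) * math.log(sp2) - sum((n - 1) * math.log(v)
                                      for n, v in zip(ns, variances))
    c = 1 + (sum(1 / (n - 1) for n in ns) - 1 / (N - k)) / (3 * (k - 1))
    return t / c
```

Groups with identical sample variances give a statistic of zero; the statistic grows as the within-group variances diverge.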

Journal ArticleDOI
TL;DR: It is found that by changing the parameterization of each model its correlation with actual expression levels can be somewhat improved, although both indices are fairly insensitive to the exact way they are parameterized, which indicates a consistent codon bias amongst highly expressed genes.
Abstract: Highly expressed genes in many bacteria and small eukaryotes often have a strong compositional bias, in terms of codon usage. Two widely used numerical indices, the codon adaptation index (CAI) and the codon usage, use this bias to predict the expression level of genes. When these indices were first introduced, they were based on fairly simple assumptions about which genes are most highly expressed: the CAI was originally based on the codon composition of a set of only 24 highly expressed genes, and the codon usage on assumptions about which functional classes of genes are highly expressed in fast-growing bacteria. Given the recent advent of genome-wide expression data, we should be able to improve on these assumptions. Here, we measure, in yeast, the degree to which consideration of the current genome-wide expression data sets improves the performance of both numerical indices. Indeed, we find that by changing the parameterization of each model its correlation with actual expression levels can be somewhat improved, although both indices are fairly insensitive to the exact way they are parameterized. This insensitivity indicates a consistent codon bias amongst highly expressed genes. We also attempt direct linear regression of codon composition against genome-wide expression levels (and protein abundance data). This has some similarity with the CAI formalism and yields an alternative model for the prediction of expression levels based on the coding sequences of genes. More information is available at http://bioinfo.mbb.yale.edu/expression/codons.
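The CAI formalism discussed above scores a gene as the geometric mean of each codon's relative adaptiveness, where adaptiveness is the codon's frequency in a reference set of highly expressed genes divided by the frequency of its most common synonym. A minimal sketch with an invented two-amino-acid reference table (real tables cover all sense codons):

```python
import math

# Hypothetical codon counts from a reference set of highly expressed genes.
reference_counts = {
    "GAA": 90, "GAG": 10,   # Glu
    "AAA": 80, "AAG": 20,   # Lys
}
synonyms = {"GAA": ["GAA", "GAG"], "GAG": ["GAA", "GAG"],
            "AAA": ["AAA", "AAG"], "AAG": ["AAA", "AAG"]}

# Relative adaptiveness w: each codon's count divided by the count of the
# most frequent synonymous codon for the same amino acid.
w = {c: reference_counts[c] / max(reference_counts[s] for s in synonyms[c])
     for c in reference_counts}

def cai(codons):
    """CAI = geometric mean of the relative adaptiveness of a gene's codons
    (computed in log space for numerical stability)."""
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))

# A gene built only from preferred codons scores 1.0; rare codons pull the
# score down, which is what makes CAI a proxy for expression level.
```

The paper's reparameterization amounts to replacing the hand-picked reference set behind `reference_counts` with the most highly expressed genes from genome-wide expression data.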

159 citations

References
Book
01 Jan 1974
TL;DR: Applied Linear Statistical Models 5e is the long-established, authoritative text and reference on statistical modeling; it opens with brief introductory and review material, then covers regression and modeling in the first half and ANOVA and experimental design in the second.
Abstract: Applied Linear Statistical Models 5e is the long established leading authoritative text and reference on statistical modeling. The text includes brief introductory and review material, and then proceeds through regression and modeling for the first half, and through ANOVA and Experimental Design in the second half. All topics are presented in a precise and clear style supported with solved examples, numbered formulae, graphic illustrations, and "Notes" to provide depth and statistical accuracy and precision. The Fifth edition provides an increased use of computing and graphical analysis throughout, without sacrificing concepts or rigor. In general, the 5e uses larger data sets in examples and exercises, and where methods can be automated within software without loss of understanding, it is so done.

10,747 citations

Book
01 Jan 1978
TL;DR: This textbook develops applied regression analysis from straight-line regression and the correlation coefficient (including what r does and does not measure) through multiple regression, ANOVA, and logistic and Poisson regression, with chapters on regression diagnostics, model selection, and correlated data.
Abstract: 1. CONCEPTS AND EXAMPLES OF RESEARCH. Concepts. Examples. Concluding Remarks. References. 2. CLASSIFICATION OF VARIABLES AND THE CHOICE OF ANALYSIS. Classification of Variables. Overlapping of Classification Schemes. Choice of Analysis. References. 3. BASIC STATISTICS: A REVIEW. Preview. Descriptive Statistics. Random Variables and Distributions. Sampling Distributions of t, χ2, and F. Statistical Inference: Estimation. Statistical Inference: Hypothesis Testing. Error Rate, Power, and Sample Size. Problems. References. 4. INTRODUCTION TO REGRESSION ANALYSIS. Preview. Association versus Causality. Statistical versus Deterministic Models. Concluding Remarks. References. 5. STRAIGHT-LINE REGRESSION ANALYSIS. Preview. Regression with a Single Independent Variable. Mathematical Properties of a Straight Line. Statistical Assumptions for a Straight-line Model. Determining the Best-fitting Straight Line. Measure of the Quality of the Straight-line Fit and Estimate σ2. Inferences About the Slope and Intercept. Interpretations of Tests for Slope and Intercept. Inferences About the Regression Line μY|X = β0 + β1X. Prediction of a New Value of Y at X0. Problems. References. 6. THE CORRELATION COEFFICIENT AND STRAIGHT-LINE REGRESSION ANALYSIS. Definition of r. r as a Measure of Association. The Bivariate Normal Distribution. r and the Strength of the Straight-line Relationship. What r Does Not Measure. Tests of Hypotheses and Confidence Intervals for the Correlation Coefficient. Testing for the Equality of Two Correlations. Problems. References. 7. THE ANALYSIS-OF-VARIANCE TABLE. Preview. The ANOVA Table for Straight-line Regression. Problems. 8. MULTIPLE REGRESSION ANALYSIS: GENERAL CONSIDERATIONS. Preview. Multiple Regression Models. Graphical Look at the Problem. Assumptions of Multiple Regression. Determining the Best Estimate of the Multiple Regression Equation. The ANOVA Table for Multiple Regression. Numerical Examples. Problems. References. 9. 
TESTING HYPOTHESES IN MULTIPLE REGRESSION. Preview. Test for Significant Overall Regression. Partial F Test. Multiple Partial F Test. Strategies for Using Partial F Tests. Tests Involving the Intercept. Problems. References. 10. CORRELATIONS: MULTIPLE, PARTIAL, AND MULTIPLE PARTIAL. Preview. Correlation Matrix. Multiple Correlation Coefficient. Relationship of RY|X1, X2, …, Xk to the Multivariate Normal Distribution. Partial Correlation Coefficient. Alternative Representation of the Regression Model. Multiple Partial Correlation. Concluding Remarks. Problems. References. 11. CONFOUNDING AND INTERACTION IN REGRESSION. Preview. Overview. Interaction in Regression. Confounding in Regression. Summary and Conclusions. Problems. References. 12. DUMMY VARIABLES IN REGRESSION. Preview. Definitions. Rule for Defining Dummy Variables. Comparing Two Straight-line Regression Equations: An Example. Questions for Comparing Two Straight Lines. Methods of Comparing Two Straight Lines. Method I: Using Separate Regression Fits to Compare Two Straight Lines. Method II: Using a Single Regression Equation to Compare Two Straight Lines. Comparison of Methods I and II. Testing Strategies and Interpretation: Comparing Two Straight Lines. Other Dummy Variable Models. Comparing Four Regression Equations. Comparing Several Regression Equations Involving Two Nominal Variables. Problems. References. 13. ANALYSIS OF COVARIANCE AND OTHER METHODS FOR ADJUSTING CONTINUOUS DATA. Preview. Adjustment Problem. Analysis of Covariance. Assumption of Parallelism: A Potential Drawback. Analysis of Covariance: Several Groups and Several Covariates. Comments and Cautions. Summary Problems. Reference. 14. REGRESSION DIAGNOSTICS. Preview. Simple Approaches to Diagnosing Problems in Data. Residual Analysis: Detecting Outliers and Violations of Model Assumptions. Strategies of Analysis. Collinearity. Scaling Problems. Diagnostics Example. An Important Caution. Problems. References. 15. POLYNOMIAL REGRESSION. 
Preview. Polynomial Models. Least-squares Procedure for Fitting a Parabola. ANOVA Table for Second-order Polynomial Regression. Inferences Associated with Second-order Polynomial Regression. Example Requiring a Second-order Model. Fitting and Testing Higher-order Model. Lack-of-fit Tests. Orthogonal Polynomials. Strategies for Choosing a Polynomial Model. Problems. 16. SELECTING THE BEST REGRESSION EQUATION. Preview. Steps in Selecting the Best Regression Equation. Step 1: Specifying the Maximum Model. Step 2: Specifying a Criterion for Selecting a Model. Step 3: Specifying a Strategy for Selecting Variables. Step 4: Conducting the Analysis. Step 5: Evaluating Reliability with Split Samples. Example Analysis of Actual Data. Issues in Selecting the Most Valid Model. Problems. References. 17. ONE-WAY ANALYSIS OF VARIANCE. Preview. One-way ANOVA: The Problem, Assumptions, and Data Configuration. Methodology for One-way Fixed-effects ANOVA. Regression Model for Fixed-effects One-way ANOVA. Fixed-effects Model for One-way ANOVA. Random-effects Model for One-way ANOVA. Multiple-comparison Procedures for Fixed-effects One-way ANOVA. Choosing a Multiple-comparison Technique. Orthogonal Contrasts and Partitioning an ANOVA Sum of Squares. Problems. References. 18. RANDOMIZED BLOCKS: SPECIAL CASE OF TWO-WAY ANOVA. Preview. Equivalent Analysis of a Matched-pairs Experiment. Principle of Blocking. Analysis of a Randomized-blocks Experiment. ANOVA Table for a Randomized-blocks Experiment. Models for a Randomized-blocks Experiment. Fixed-effects ANOVA Model for a Randomized-blocks Experiment. Problems. References. 19. TWO-WAY ANOVA WITH EQUAL CELL NUMBERS. Preview. Using a Table of Cell Means. General Methodology. F Tests for Two-way ANOVA. Regression Model for Fixed-effects Two-way ANOVA. Interactions in Two-way ANOVA. Random- and Mixed-effects Two-way ANOVA Models. Problems. References. 20. TWO-WAY ANOVA WITH UNEQUAL CELL NUMBERS. Preview. Problem with Unequal Cell Numbers: Nonorthogonality. 
Regression Approach for Unequal Cell Sample Sizes. Higher-way ANOVA. Problems. References. 21. THE METHOD OF MAXIMUM LIKELIHOOD. Preview. The Principle of Maximum Likelihood. Statistical Inference Using Maximum Likelihood. Summary. Problems. 22. LOGISTIC REGRESSION ANALYSIS. Preview. The Logistic Model. Estimating the Odds Ratio Using Logistic Regression. A Numerical Example of Logistic Regression. Theoretical Considerations. An Example of Conditional ML Estimation Involving Pair-matched Data with Unmatched Covariates. Summary. Problems. References. 23. POLYTOMOUS AND ORDINAL LOGISTIC REGRESSION. Preview. Why Not Use Binary Regression? An Example of Polytomous Logistic Regression: One Predictor, Three Outcome Categories. An Example: Extending the Polytomous Logistic Model to Several Predictors. Ordinal Logistic Regression: Overview. A "Simple" Hypothetical Example: Three Ordinal Categories and One Dichotomous Exposure Variable. Ordinal Logistic Regression Example Using Real Data with Four Ordinal Categories and Three Predictor Variables. Summary. Problems. References. 24. POISSON REGRESSION ANALYSIS. Preview. The Poisson Distribution. Example of Poisson Regression. Poisson Regression: General Considerations. Measures of Goodness of Fit. Continuation of Skin Cancer Data Example. A Second Illustration of Poisson Regression Analysis. Summary. Problems. References. 25. ANALYSIS OF CORRELATED DATA PART 1: THE GENERAL LINEAR MIXED MODEL. Preview. Examples. General Linear Mixed Model Approach. Example: Study of Effects of an Air Pollution Episode on FEV1 Levels. Summary: Analysis of Correlated Data, Part 1. Problems. References. 26. ANALYSIS OF CORRELATED DATA PART 2: RANDOM EFFECTS AND OTHER ISSUES. Preview. Random Effects Revisited. Results for Random Effects Models Applied to Air Pollution Study Data. Second Example: Analysis of Posture Measurement Data. Recommendations about Choice of Correlation Structure. Analysis of Data for Discrete Outcomes. Problems. References. 
27. SAMPLE SIZE PLANNING FOR LINEAR AND LOGISTIC REGRESSION AND ANALYSIS OF VARIANCE. Preview. Review: Sample Size Calculations for Comparisons of Means and Proportions. Sample Size Planning for Linear Regression. Sample Size Planning for Logistic Regression. Power and Sample Size Determination for Linear Models: A General Approach. Sample Size Determination for Matched Case-control Studies with a Dichotomous Outcome. Practical Considerations and Cautions. Problems. References. Appendix A. Appendix B. Appendix C. Solutions to Exercises. Index.

9,433 citations

Journal ArticleDOI
TL;DR: This work proposes the use of the normal probability plot and the cumulative sum plots of the recursive residuals to check the model assumptions of normality and homoscedasticity, and other aspects of model misfits such as change of regime, outliers, and omitted predictors, in place of plots based on ordinary residuals.
Abstract: Recursive residuals are independently and identically distributed and, unlike ordinary residuals, do not have the problem of deficiencies in one part of the data being smeared over all the residuals. In addition, recursive residuals may be interpreted as showing the effect of successively deleting observations from the data set. We propose the use of the normal probability plot and the cumulative sum plots of the recursive residuals, and of the square roots of the absolute values of the recursive residuals to check the model assumptions of normality and homoscedasticity, and other aspects of model misfits such as change of regime, outliers, and omitted predictors, in place of plots based on ordinary residuals. A further advantage of recursive residuals is that they are open to formal statistical testing, so that these plots can be automated and in fact produced only when a model misfit has been detected.
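For the simplest case, an intercept-only model, the recursive residual at step t is the deviation of y_t from the mean of the preceding observations, standardized so the residuals share a common variance. A minimal sketch of that case follows (the general regression formula replaces the running mean with the prediction from a model fitted to the first t-1 observations; this simplification is ours, not the paper's):

```python
import math

def recursive_residuals(y):
    """Recursive residuals for the intercept-only model y_t = mu + e_t:
    w_t = (y_t - mean of previous observations) / sqrt(1 + 1/(t-1)).
    Under the model they are i.i.d. N(0, sigma^2), so their cumulative
    sum can be plotted to flag outliers or a change of regime."""
    out = []
    running_sum = y[0]
    for t in range(1, len(y)):
        prev_mean = running_sum / t          # mean of y[0..t-1]
        out.append((y[t] - prev_mean) / math.sqrt(1 + 1 / t))
        running_sum += y[t]
    return out
```

A level shift mid-series, for example y = [1, 1, 1, 1, 5, 5, 5, 5], produces a run of large positive recursive residuals starting at the break, which is exactly what the proposed cumulative sum plots are designed to expose.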

55 citations