Journal Article•DOI•

Matrix Differential Calculus with Applications in Statistics and Econometrics

01 Sep 1991 - The Statistician (John Wiley & Sons, Ltd) - Vol. 40, Iss. 3, p. 349
About: This article was published in The Statistician on 1 September 1991. It has received 1,677 citations to date.
Citations
Christopher M. Bishop•
01 Jan 2006
TL;DR: A textbook treatment of machine learning covering probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal Article•DOI•
TL;DR: In this paper, the authors study the properties of the quasi-maximum likelihood estimator and related test statistics in dynamic models that jointly parameterize conditional means and conditional covariances, when a normal log-likelihood is maximized but the assumption of normality is violated.
Abstract: We study the properties of the quasi-maximum likelihood estimator (QMLE) and related test statistics in dynamic models that jointly parameterize conditional means and conditional covariances, when a normal log-likelihood is maximized but the assumption of normality is violated. Because the score of the normal log-likelihood has the martingale difference property when the first two conditional moments are correctly specified, the QMLE is generally consistent and has a limiting normal distribution. We provide easily computable formulas for asymptotic standard errors that are valid under nonnormality. Further, we show how robust LM tests for the adequacy of the jointly parameterized mean and variance can be computed from simple auxiliary regressions. An appealing feature of these robust inference procedures is that only first derivatives of the conditional mean and variance functions are needed. A Monte Carlo study indicates that the asymptotic results carry over to finite samples. Estimation of several AR a...

3,512 citations
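The robust standard errors described in this abstract take the familiar sandwich form. As an illustration of the idea (a minimal sketch, not the paper's own formulas), the snippet below treats OLS as the Gaussian QMLE of a linear mean model and computes heteroskedasticity-robust standard errors; all names are illustrative.

```python
import numpy as np

def qmle_ols_robust_se(X, y):
    """Gaussian QMLE of a linear mean model (i.e., OLS) with sandwich
    standard errors that remain valid when normality is violated.
    A minimal sketch; names are illustrative."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)    # QMLE = OLS estimate
    resid = y - X @ beta
    A_inv = np.linalg.inv(X.T @ X)              # inverse "Hessian" term
    B = X.T @ (X * resid[:, None] ** 2)         # outer product of scores
    cov = A_inv @ B @ A_inv                     # sandwich covariance
    return beta, np.sqrt(np.diag(cov))

# Toy usage: heteroskedastic errors, so robust SEs differ from classical ones.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=500) * (1 + np.abs(X[:, 1]))
beta, se = qmle_ols_robust_se(X, y)
print(beta, se)
```

Even when the errors are nonnormal or heteroskedastic, the point estimate remains consistent and the sandwich covariance gives valid asymptotic standard errors, which is the pattern the abstract describes for the dynamic case.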

Journal Article•DOI•
TL;DR: In this article, a two-stage approach based on the unstructured mean and covariance estimates obtained by the EM-algorithm is proposed to deal with missing data in social and behavioral sciences, and the asymptotic efficiencies of different estimators are compared under various assump...
Abstract: Survey and longitudinal studies in the social and behavioral sciences generally contain missing data. Mean and covariance structure models play an important role in analyzing such data. Two promising methods for dealing with missing data are direct maximum likelihood and a two-stage approach based on the unstructured mean and covariance estimates obtained by the EM algorithm. Typical assumptions under these two methods are ignorable nonresponse and normality of data. However, data sets in the social and behavioral sciences are seldom normal, and experience with these procedures indicates that normal-theory-based methods for nonnormal data very often lead to incorrect model evaluations. By dropping the normal distribution assumption, we develop more accurate procedures for model inference. Based on the theory of generalized estimating equations, a way to obtain consistent standard errors of the two-stage estimates is given. The asymptotic efficiencies of different estimators are compared under various assump...

1,412 citations
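The two-stage approach begins with the unstructured mean and covariance estimated by EM under ignorable (MAR) nonresponse. As a hedged sketch of that first stage (standard EM for a multivariate normal with missing entries, not the paper's own code), with NaN marking missing values:

```python
import numpy as np

def em_mvn_missing(Y, n_iter=100):
    """EM for the unstructured mean and covariance of multivariate normal
    data with ignorable missingness. A minimal sketch; NaN marks missing."""
    n, p = Y.shape
    mu = np.nanmean(Y, axis=0)
    sigma = np.diag(np.nanvar(Y, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            o = ~np.isnan(Y[i])              # observed coordinates
            m = ~o                           # missing coordinates
            x_hat = Y[i].copy()
            C = np.zeros((p, p))             # conditional covariance term
            if m.any():
                S_oo_inv = np.linalg.inv(sigma[np.ix_(o, o)])
                reg = sigma[np.ix_(m, o)] @ S_oo_inv
                x_hat[m] = mu[m] + reg @ (Y[i, o] - mu[o])
                C[np.ix_(m, m)] = sigma[np.ix_(m, m)] - reg @ sigma[np.ix_(o, m)]
            sum_x += x_hat                   # E-step sufficient statistics
            sum_xx += np.outer(x_hat, x_hat) + C
        mu = sum_x / n                       # M-step updates
        sigma = sum_xx / n - np.outer(mu, mu)
    return mu, sigma
```

The second stage, not shown, fits the structured model to these estimates; the paper's contribution is sandwich-type standard errors for that stage that remain consistent without the normality assumption.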

Journal Article•DOI•
TL;DR: An overview of the majorization-minimization (MM) algorithmic framework, which can provide guidance in deriving problem-driven algorithms with low computational cost and is elaborated by a wide range of applications in signal processing, communications, and machine learning.
Abstract: This paper gives an overview of the majorization-minimization (MM) algorithmic framework, which can provide guidance in deriving problem-driven algorithms with low computational cost. A general introduction to MM is presented, including a description of the basic principle and its convergence results. Extensions, acceleration schemes, and connections to other algorithmic frameworks are also covered. To bridge the gap between theory and practice, upper bounds for a large number of basic functions, derived from the Taylor expansion, convexity, and special inequalities, are provided as ingredients for constructing surrogate functions. With these prerequisites established, the application of MM to specific problems is illustrated through a wide range of examples in signal processing, communications, and machine learning.

1,073 citations
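The core MM idea is to replace a hard objective with an easily minimized surrogate that majorizes it and touches it at the current iterate. A minimal, self-contained example (a textbook instance, not drawn from the paper): computing a median by majorizing each absolute-value term with a quadratic upper bound.

```python
import numpy as np

def mm_median(x, n_iter=50, eps=1e-8):
    """Majorization-minimization sketch: minimize f(t) = sum_i |x_i - t|
    by majorizing each |x_i - t| with the quadratic surrogate
    (x_i - t)**2 / (2|x_i - t_k|) + |x_i - t_k| / 2, which equals f at t_k.
    Minimizing the surrogate gives a simple weighted-mean update."""
    t = np.mean(x)                                  # any starting point works
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(x - t), eps)    # surrogate weights
        t = np.sum(w * x) / np.sum(w)               # minimize the majorizer
    return t

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
print(mm_median(x), np.median(x))   # both approach 3.0
```

Because the surrogate lies above the objective and matches it at the current iterate, each update can only decrease the objective, which is the descent property underlying MM's convergence results.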

Journal Article•
TL;DR: A novel probabilistic interpretation of principal component analysis (PCA) that is based on a Gaussian process latent variable model (GP-LVM), and related to popular spectral techniques such as kernel PCA and multidimensional scaling.
Abstract: Summarising a high dimensional data set with a low dimensional embedding is a standard approach for exploring its structure. In this paper we provide an overview of some existing techniques for discovering such embeddings. We then introduce a novel probabilistic interpretation of principal component analysis (PCA) that we term dual probabilistic PCA (DPPCA). The DPPCA model has the additional advantage that the linear mappings from the embedded space can easily be non-linearised through Gaussian processes. We refer to this model as a Gaussian process latent variable model (GP-LVM). Through analysis of the GP-LVM objective function, we relate the model to popular spectral techniques such as kernel PCA and multidimensional scaling. We then review a practical algorithm for GP-LVMs in the context of large data sets and develop it to also handle discrete valued data and missing attributes. We demonstrate the model on a range of real-world and artificially generated data sets.

1,065 citations
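Under a linear kernel, the GP-LVM's maximum-likelihood solution coincides with dual probabilistic PCA, whose latent positions come from the top eigenvectors of the scaled inner-product matrix Y Yᵀ/D. A minimal sketch under those assumptions (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def dual_ppca_embedding(Y, q, noise_var=0.0):
    """Maximum-likelihood latent positions under dual probabilistic PCA,
    the linear-kernel special case of the GP-LVM. A minimal sketch under
    the stated assumptions. Y is an (N, D) centred data matrix; returns
    an (N, q) embedding (up to an arbitrary rotation)."""
    N, D = Y.shape
    S = Y @ Y.T / D                         # scaled inner-product matrix
    eigval, eigvec = np.linalg.eigh(S)      # ascending eigenvalues
    idx = np.argsort(eigval)[::-1][:q]      # keep the top-q directions
    lam = np.maximum(eigval[idx] - noise_var, 0.0)
    return eigvec[:, idx] * np.sqrt(lam)    # X = U_q (Lambda_q - sigma^2 I)^{1/2}

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))  # rank-2 data in 5-D
Y -= Y.mean(axis=0)
X = dual_ppca_embedding(Y, q=2)
print(X.shape)  # (100, 2)
```

This recovers (scaled) classical PCA scores, which is why the paper can relate the GP-LVM to spectral methods such as kernel PCA and multidimensional scaling.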
