Showing papers on "Conditional probability distribution" published in 2001


Journal ArticleDOI
TL;DR: The proposed method is developed in the context of MCMC chains produced by the Metropolis–Hastings algorithm, whose building blocks are used both for sampling and marginal likelihood estimation, thus economizing on prerun tuning effort and programming.
Abstract: This article provides a framework for estimating the marginal likelihood for the purpose of Bayesian model comparisons. The approach extends and completes the method presented in Chib (1995) by overcoming the problems associated with the presence of intractable full conditional densities. The proposed method is developed in the context of MCMC chains produced by the Metropolis–Hastings algorithm, whose building blocks are used both for sampling and marginal likelihood estimation, thus economizing on prerun tuning effort and programming. Experiments involving the logit model for binary data, hierarchical random effects model for clustered Gaussian data, Poisson regression model for clustered count data, and the multivariate probit model for correlated binary data, are used to illustrate the performance and implementation of the method. These examples demonstrate that the method is practical and widely applicable.

1,106 citations
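
As a hedged aside on the marginal likelihood entry above: the Chib (1995) method that the abstract extends rests on rearranging Bayes' theorem at a single fixed point. The symbols below ($y$ for the data, $\theta$ for the parameters, $\theta^{*}$ for a chosen high-density point, $m$, $f$, $\pi$ for the marginal likelihood, likelihood, and prior/posterior densities) are notation assumed here, not taken from the listing.

```latex
\log m(y) \;=\; \log f(y \mid \theta^{*}) \;+\; \log \pi(\theta^{*}) \;-\; \log \pi(\theta^{*} \mid y)
```

The first two terms are available in closed form for most models; the difficulty the abstract refers to is estimating the posterior ordinate $\pi(\theta^{*} \mid y)$ when full conditional densities are intractable, which is where the Metropolis–Hastings output is reused.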


Journal ArticleDOI
TL;DR: In this paper, the authors developed a new test of a parametric model of a conditional mean function against a nonparametric alternative, which adapts to the unknown smoothness of the alternative model and is uniformly consistent against alternatives whose distance from the parametric model converges to zero at the fastest possible rate.
Abstract: We develop a new test of a parametric model of a conditional mean function against a nonparametric alternative. The test adapts to the unknown smoothness of the alternative model and is uniformly consistent against alternatives whose distance from the parametric model converges to zero at the fastest possible rate. This rate is slower than $n^{-1/2}$. Some existing tests have nontrivial power against restricted classes of alternatives whose distance from the parametric model decreases at the rate $n^{-1/2}$. There are, however, sequences of alternatives against which these tests are inconsistent and ours is consistent. As a consequence, there are alternative models for which the finite-sample power of our test greatly exceeds that of existing tests. This conclusion is illustrated by the results of some Monte Carlo experiments.

371 citations


Book ChapterDOI
TL;DR: This paper used an instrumental variables estimator for quantile regression on a sample of twins to estimate an entire family of returns to education at different quantiles of the conditional distribution of wages while addressing simultaneity and measurement error biases.
Abstract: Considerable effort has been exercised in estimating mean returns to education while carefully considering biases arising from unmeasured ability and measurement error. Recent work has investigated whether there are variations from the “mean” return to education across the population with mixed results. We use an instrumental variables estimator for quantile regression on a sample of twins to estimate an entire family of returns to education at different quantiles of the conditional distribution of wages while addressing simultaneity and measurement error biases. We test whether there is individual heterogeneity in returns to education and find that: more able individuals obtain more schooling perhaps due to lower marginal costs and/or higher marginal benefits of schooling and that higher ability individuals (those further to the right in the conditional distribution of wages) have higher returns to schooling consistent with a non-trivial interaction between schooling and unobserved abilities in the generation of earnings. The estimated returns are never lower than 9 percent and can be as high as 13 percent at the top of the conditional distribution of wages but they vary significantly only along the lower to middle quantiles. Our findings may have meaningful implications for the design of educational policies.

237 citations


Journal ArticleDOI
TL;DR: In this article, several bandwidth selection methods are derived ranging from fast rules-of-thumb which assume the underlying densities are known to relatively slow procedures which use the bootstrap, and a practical bandwidth selection strategy which combines the methods is proposed.

233 citations


Proceedings Article
03 Jan 2001
TL;DR: An equivalence is derived between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels.
Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.

198 citations
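
A minimal sketch of the contrast drawn in the AdaBoost entry above, assuming binary labels in {-1, +1} and a real-valued score f(x). The function names and the exact parameterization are illustrative assumptions, not the paper's notation: the exponential loss works on the raw margin, while the likelihood-based loss first normalizes the score into a conditional probability over labels.

```python
import math


def exp_loss(margin):
    """Exponential loss exp(-margin), where margin = y * f(x)."""
    return math.exp(-margin)


def logistic_loss(margin):
    """Negative log-likelihood of the normalized model: log(1 + exp(-margin))."""
    return math.log1p(math.exp(-margin))


def conditional_prob(score):
    """Normalize a score into P(y = +1 | x) via the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # Print both losses on a few margins to compare their growth.
    for m in (-2.0, 0.0, 2.0):
        print(m, exp_loss(m), logistic_loss(m))
```

Both losses decrease in the margin, but the exponential loss grows much faster on badly misclassified points, which is one common way to describe the practical difference between the two objectives.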


Journal ArticleDOI
TL;DR: The authors investigate the effects of dynamic heteroskedasticity on statistical factor analysis and show that identification problems are alleviated when variation in factor variances is accounted for. Their results apply to dynamic APT models and other structural models.

188 citations


Journal ArticleDOI
Andrew J. Patton
TL;DR: In this article, the authors make use of a theorem due to Sklar (1959) which shows that an n-dimensional distribution function may be decomposed into its n marginal distributions, and a copula which completely describes the dependence between the n variables.
Abstract: Linear correlation is only an adequate means of describing the dependence between two random variables when they are jointly elliptically distributed. When the joint distribution of two or more variables is not elliptical the linear correlation coefficient becomes just one of many possible ways of summarising the dependence structure between the variables. In this paper we make use of a theorem due to Sklar (1959), which shows that an n-dimensional distribution function may be decomposed into its n marginal distributions, and a copula, which completely describes the dependence between the n variables. We verify that Sklar's theorem may be extended to conditional distributions, and apply it to the modelling of the time-varying joint distribution of the Deutsche mark - U.S. dollar and Yen - U.S. dollar exchange rate returns. We find evidence that the conditional dependence between these exchange rates is time-varying, and that it is asymmetric: dependence is greater during appreciations of the U.S. dollar against the mark and the yen than during depreciations of the U.S. dollar. We also find strong evidence of a structural break in the conditional copula following the introduction of the euro.

185 citations
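
For reference, the decomposition described in the copula entry above can be written as follows. The notation ($F$ for the joint distribution function, $F_i$ for the marginals, $C$ for the copula, $W$ for the conditioning information) is assumed here for illustration; the second line is one natural reading of the conditional extension the abstract mentions, with the same conditioning set used for the marginals and the copula.

```latex
F(x_1, \ldots, x_n) = C\bigl(F_1(x_1), \ldots, F_n(x_n)\bigr)

F(x, y \mid W) = C\bigl(F_X(x \mid W),\, F_Y(y \mid W) \,\big|\, W\bigr)
```

The marginals capture each variable's own behaviour, and everything about their dependence is carried by $C$, which is why the copula can be modelled as time-varying separately from the marginal distributions.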


Journal ArticleDOI
TL;DR: In this article, the authors investigated the local robustness properties of generalized method of moments (GMM) estimators and of a broad class of GMM-based tests in a unified framework.

180 citations


Journal ArticleDOI
Sheng Yue
TL;DR: In this paper, the applicability of a bivariate gamma model with five parameters for describing the joint probability behavior of multivariate flood events was investigated by using the method of moments.
Abstract: A gamma distribution is one of the most frequently selected distribution types for hydrological frequency analysis. The bivariate gamma distribution with gamma marginals may be useful for analysing multivariate hydrological events. This study investigates the applicability of a bivariate gamma model with five parameters for describing the joint probability behavior of multivariate flood events. The parameters are proposed to be estimated from the marginal distributions by the method of moments. The joint distribution, the conditional distribution, and the associated return periods are derived from marginals. The usefulness of the model is demonstrated by representing the joint probabilistic behaviour between correlated flood peak and flood volume and between correlated flood volume and flood duration in the Madawask River basin in the province of Quebec, Canada. Copyright © 2001 John Wiley & Sons, Ltd.

172 citations


Journal ArticleDOI
TL;DR: In this paper, a nonparametric estimation theory in a nonstationary environment, more precisely in the framework of null recurrent Markov chains, is developed, which makes it possible to decompose the time series under consideration into independent and identical parts.
Abstract: We develop a nonparametric estimation theory in a nonstationary environment, more precisely in the framework of null recurrent Markov chains. An essential tool is the split chain, which makes it possible to decompose the time series under consideration into independent and identical parts. A tail condition on the distribution of the recurrence time is introduced. This condition makes it possible to prove weak convergence results for sums of functions of the process depending on a smoothing parameter. These limit results are subsequently used to obtain consistency and asymptotic normality for local density estimators and for estimators of the conditional mean and the conditional variance. In contradistinction to the parametric case, the convergence rate is slower than in the stationary case, and it is directly linked to the tail behavior of the recurrence time. Applications to econometric, and in particular to cointegration models, are indicated.

170 citations


Journal ArticleDOI
TL;DR: The MAR-ARCH models are applied to two real datasets and appear to capture features of the data better than the competing models.
Abstract: We propose a mixture autoregressive conditional heteroscedastic (MAR-ARCH) model for modeling nonlinear time series. The models consist of a mixture of K autoregressive components with autoregressive conditional heteroscedasticity; that is, the conditional mean of the process variable follows a mixture AR (MAR) process, whereas the conditional variance of the process variable follows a mixture ARCH process. In addition to the advantage of better description of the conditional distributions from the MAR model, the MAR-ARCH model allows a more flexible squared autocorrelation structure. The stationarity conditions, autocorrelation function, and squared autocorrelation function are derived. Construction of multiple step predictive distributions is discussed. The estimation can be easily done through a simple EM algorithm, and the model selection problem is addressed. The shape-changing feature of the conditional distributions makes these models capable of modeling time series with multimodal conditional distr...

Journal ArticleDOI
TL;DR: A Monte Carlo expectation–conditional maximization algorithm is proposed for finding maximum likelihood estimates of the mixed model itself, extending and accelerating an algorithm for models with binary responses.
Abstract: A difficulty in joint modeling of continuous and discrete response variables is the lack of a natural multivariate distribution. For joint modeling of clustered observations on binary and continuous responses, we study a correlated probit model that has an underlying normal latent variable for the binary responses. Catalano and Ryan have factored the model into a marginal and a conditional component and used generalized estimating equations methodology to estimate the effects. We propose a Monte Carlo expectation–conditional maximization algorithm for finding maximum likelihood estimates of the mixed model itself, extending and accelerating an algorithm for models with binary responses. We demonstrate the methodology with a developmental toxicity study measuring fetal weight and a binary malformation status for several litters of mice. A simulation study suggests that efficiency gains of joint fittings over separate fittings of the response variables occur mainly for small datasets with strong correlation...

Journal ArticleDOI
TL;DR: A self-organizing map (SOM) is computed in the new metric to explore financial statements of enterprises and represents the (local) directions in which the probability of bankruptcy changes the most.
Abstract: We introduce a method for deriving a metric, locally based on the Fisher information matrix, into the data space. A self-organizing map (SOM) is computed in the new metric to explore financial statements of enterprises. The metric measures local distances in terms of changes in the distribution of an auxiliary random variable that reflects what is important in the data. In this paper the variable indicates bankruptcy within the next few years. The conditional density of the auxiliary variable is first estimated, and the change in the estimate resulting from local displacements in the primary data space is measured using the Fisher information matrix. When a self-organizing map is computed in the new metric it still visualizes the data space in a topology-preserving fashion, but represents the (local) directions in which the probability of bankruptcy changes the most.

Journal ArticleDOI
TL;DR: In this paper, the authors extend Robins' theory of causal inference for complex longitudinal data to the case of continuously varying covariates and treatments, and establish versions of the key results of the discrete theory: the g-computation formula and a collection of powerful characterizations of the g-null hypothesis of no treatment effect.
Abstract: We extend Robins’ theory of causal inference for complex longitudinal data to the case of continuously varying as opposed to discrete covariates and treatments. In particular we establish versions of the key results of the discrete theory: the g-computation formula and a collection of powerful characterizations of the g-null hypothesis of no treatment effect. This is accomplished under natural continuity hypotheses concerning the conditional distributions of the outcome variable and of the covariates given the past. We also show that our assumptions concerning counterfactual variables place no restriction on the joint distribution of the observed variables: thus in a precise sense, these assumptions are ‘for free’, or if you prefer, harmless.

Book ChapterDOI
TL;DR: In this article, the authors consider flexible conditional (regression) measures of market risk and cast value-at-risk modeling in terms of the quantile regression function, the inverse of the conditional distribution function.
Abstract: This paper considers flexible conditional (regression) measures of market risk. Value-at-Risk modeling is cast in terms of the quantile regression function – the inverse of the conditional distribution function. A basic specification analysis relates its functional forms to the benchmark models of returns and asset pricing. We stress important aspects of measuring the extremal and intermediate conditional risk. An empirical application characterizes the key economic determinants of various levels of conditional risk.
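
A minimal sketch of the idea that a quantile is the minimizer of an asymmetric "check" loss, which is the building block behind the quantile regression view of Value-at-Risk in the entry above. This is an unconditional, toy version under stated assumptions: the function names and the sample of returns are hypothetical, and a conditional model would replace the constant `q` with a function of covariates.

```python
def check_loss(u, tau):
    """Quantile-regression check loss: tau*u if u >= 0, (tau - 1)*u otherwise."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_quantile(returns, tau):
    """Return the sample point that minimizes the total check loss at level tau."""
    return min(returns, key=lambda q: sum(check_loss(r - q, tau) for r in returns))


if __name__ == "__main__":
    # Hypothetical return sample, used only for illustration.
    sample = [-0.05, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
    q25 = empirical_quantile(sample, 0.25)
    # Under this convention, Value-at-Risk is the negated lower quantile.
    print("25% quantile:", q25, "VaR:", -q25)
```

Restricting the search to sample points is enough here because the total check loss is piecewise linear in `q`, so a minimizer always exists among the observations.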

Journal ArticleDOI
TL;DR: This article surveys research on characterizing bivariate distributions through their conditional distributions, covering compatibility and near compatibility of conditional specifications and characterizations in which the conditional distributions are members of prescribed parametric families.
Abstract: A bivariate distribution can sometimes be characterized completely by properties of its conditional distributions. The present article surveys available research in this area. Questions of compatibility of conditional specifications are addressed as are characterizations of distributions based on their having conditional distributions that are members of prescribed parametric families of distributions. The topics of compatibility and near compatibility of conditional distributions are discussed. Estimation strategies for conditionally specified distributions are summarized. Additionally, certain conditionally specified densities are shown to provide convenient flexible conjugate prior families in certain multiparameter Bayesian settings.

Journal ArticleDOI
TL;DR: In this paper, a bivariate Bayes method is proposed for estimating the mortality rates of a single disease for a given population, using additional information from a second disease, where the information on the two diseases is assumed to be from the same population groups or areas.
Abstract: A bivariate Bayes method is proposed for estimating the mortality rates of a single disease for a given population, using additional information from a second disease. The information on the two diseases is assumed to be from the same population groups or areas. The joint frequencies of deaths for the two diseases for given populations are assumed to have a bivariate Poisson distribution with joint means proportional to the population sizes. The relationship between the mortality rates of the two different diseases is formulated through the twofold conditional autoregressive (CAR) model, where spatial effects as well as indexes of spatial dependence are introduced to capture the structured clusterings among areas. This procedure is compared to a univariate hierarchical Bayes procedure that uses information from one disease only. Comparisons of two procedures are made by the optimal property, a Monte Carlo study, real data, and the Bayes factor. All of the methods that we consider demonstrate a substantial...

Journal ArticleDOI
TL;DR: In this article, a nonlinear filtering approach to the estimation of asset price volatility is proposed, which is suitable for high frequency data and is based on a marked point process model.
Abstract: In this paper we consider a nonlinear filtering approach to the estimation of asset price volatility. We are particularly interested in models which are suitable for high frequency data. In order to describe some of the typical features of high frequency data we consider marked point process models for the asset price dynamics. Both jump-intensity and jump-size distribution of this marked point process depend on a hidden state variable which is closely related to asset price volatility. In our setup volatility estimation can therefore be viewed as a nonlinear filtering problem with marked point process observations. We develop efficient recursive methods to compute approximations to the conditional distribution of this state variable using the so-called reference probability approach to nonlinear filtering.

Journal ArticleDOI
Jushan Bai, Serena Ng
TL;DR: In this paper, a procedure for testing conditional symmetry is proposed, which does not require the data to be stationary or i.i.d., and the dimension of the conditional variables can be infinite.

Journal ArticleDOI
TL;DR: A novel EM algorithm for maximum likelihood estimation is proposed, with standard errors derived using Louis's formula, and a real data set involving a melanoma cancer clinical trial is examined in detail to demonstrate the methodology.
Abstract: We propose maximum likelihood methods for parameter estimation for a novel class of semiparametric survival models with a cure fraction, in which the covariates are allowed to be missing. We allow the covariates to be either categorical or continuous and specify a parametric distribution for the covariates that is written as a sequence of one-dimensional conditional distributions. We propose a novel EM algorithm for maximum likelihood estimation and derive standard errors by using Louis's formula (Louis, 1982, Journal of the Royal Statistical Society, Series B 44, 226-233). Computational techniques using the Monte Carlo EM algorithm are discussed and implemented. A real data set involving a melanoma cancer clinical trial is examined in detail to demonstrate the methodology.

01 Jan 2001
TL;DR: In this article, the compatibility and near compatibility of conditional distributions of bivariate distributions are surveyed, along with estimation strategies for conditionally specified distributions and their use as conjugate prior families in multiparameter Bayesian settings.
Abstract: A bivariate distribution can sometimes be characterized completely by properties of its conditional distributions. The present article surveys available research in this area. Questions of compatibility of conditional specifications are addressed as are characterizations of distributions based on their having conditional distributions that are members of prescribed parametric families of distributions. The topics of compatibility and near compatibility of conditional distributions are discussed. Estimation strategies for conditionally specified distributions are summarized. Additionally, certain conditionally specified densities are shown to provide convenient flexible conjugate prior families in certain multiparameter Bayesian settings.

Journal ArticleDOI
TL;DR: In this paper, the authors consider a market model based on Wiener space with two agents on different information levels: a regular agent whose information is contained in the natural filtration of the Wiener process W, and an insider who possesses some extra information from the beginning of the trading interval, given by a random variable L which contains information from a whole time interval.

Book ChapterDOI
01 Jun 2001
TL;DR: The main result is that UMDA transforms the discrete optimization problem into a continuous one defined by the average fitness $W(p_1, \ldots, p_n)$ as a function of the univariate marginal distributions $p_i$.
Abstract: First we show that all genetic algorithms can be approximated by an algorithm which keeps the population in linkage equilibrium. Here the genetic population is given as a product of univariate marginal distributions. We describe a simple algorithm which keeps the population in linkage equilibrium. It is called the univariate marginal distribution algorithm (UMDA). Our main result is that UMDA transforms the discrete optimization problem into a continuous one defined by the average fitness $W(p_1, \ldots, p_n)$ as a function of the univariate marginal distributions $p_i$. For proportionate selection UMDA performs gradient ascent in the landscape defined by $W(p)$. We derive a difference equation for $p_i$ which has already been proposed by Wright in population genetics. We show that UMDA solves difficult multimodal optimization problems. For functions with highly correlated variables UMDA has to be extended. The factorized distribution algorithm (FDA) uses a factorization into marginal and conditional distributions. For decomposable functions the optimal factorization can be explicitly computed. In general it has to be computed from the data. This is done by LFDA. It uses a Bayesian network to represent the distribution. Computing the network structure from the data is called learning in Bayesian network theory. The problem of finding a minimal structure which explains the data is discussed in detail. It is shown that the Bayesian information criterion is a good score for this problem.
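
A minimal sketch of a univariate marginal distribution algorithm on bit strings, to make the "product of univariate marginals" idea in the entry above concrete. Several choices here are assumptions for illustration only: the OneMax fitness, the parameter values, and the use of truncation selection (the abstract's gradient-ascent result is stated for proportionate selection, which this sketch does not implement).

```python
import random


def onemax(bits):
    """Hypothetical toy fitness: the number of ones in the bit string."""
    return sum(bits)


def umda(n_bits=20, pop_size=100, n_select=50, generations=50, seed=0):
    """Evolve the vector of marginal probabilities p_i = P(bit i = 1)."""
    rng = random.Random(seed)
    p = [0.5] * n_bits  # start from the uniform product distribution
    for _ in range(generations):
        # Sample each bit independently from its current marginal.
        population = [
            [1 if rng.random() < p_i else 0 for p_i in p]
            for _ in range(pop_size)
        ]
        # Truncation selection (an assumption): keep the fittest individuals.
        population.sort(key=onemax, reverse=True)
        selected = population[:n_select]
        # Re-estimate each marginal from the selected individuals.
        p = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]
    return p


if __name__ == "__main__":
    print(umda())
```

Because the model stores only one probability per bit, it cannot represent dependence between variables, which is the limitation that motivates the factorized extension (FDA) described in the abstract.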

Proceedings Article
02 Aug 2001
TL;DR: The algorithm is based on Lauritzen's algorithm, and is exact in a similar sense: it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to inaccuracies resulting from numerical integration used within the algorithm.
Abstract: Many real life domains contain a mixture of discrete and continuous variables and can be modeled as hybrid Bayesian Networks (BNs). An important subclass of hybrid BNs are conditional linear Gaussian (CLG) networks, where the conditional distribution of the continuous variables given an assignment to the discrete variables is a multivariate Gaussian. Lauritzen's extension to the clique tree algorithm can be used for exact inference in CLG networks. However, many domains include discrete variables that depend on continuous ones, and CLG networks do not allow such dependencies to be represented. In this paper, we propose the first "exact" inference algorithm for augmented CLG networks -- CLG networks augmented by allowing discrete children of continuous parents. Our algorithm is based on Lauritzen's algorithm, and is exact in a similar sense: it computes the exact distributions over the discrete nodes, and the exact first and second moments of the continuous ones, up to inaccuracies resulting from numerical integration used within the algorithm. In the special case of softmax CPDs, we show that integration can often be done efficiently, and that using the first two moments leads to a particularly accurate approximation. We show empirically that our algorithm achieves substantially higher accuracy at lower cost than previous algorithms for this task.

Journal ArticleDOI
TL;DR: In this paper, the authors examined the effects of pooling multiple probability judgments regarding unique binary events and showed that the average probability estimate is asymptotically perfectly diagnostic of the true event state as the number of estimates pooled goes to infinity.
Abstract: Wallsten et al. (1997) developed a general framework for assessing the quality of aggregated probability judgments. Within this framework they presented a theorem regarding the effects of pooling multiple probability judgments regarding unique binary events. The theorem states that under reasonable conditions, and assuming conditional pairwise independence of the judgments, the average probability estimate is asymptotically perfectly diagnostic of the true event state as the number of estimates pooled goes to infinity. The purpose of the present study was to examine by simulation (1) the rate of convergence of averaged judgments to perfect diagnostic value under various conditions and (2) the robustness of the theorem to violations of its assumption that the covert probability judgments are conditionally pairwise independent. The results suggest that while the rate of convergence is sensitive to violations of the conditional pairwise independence, the asymptotic properties remain relatively robust under a large variety of conditions. The practical implications of these results are discussed. Copyright © 2001 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This article presents new characterizations of the integer-valued moving average model and gives moments and probability generating functions for four model variants.
Abstract: The paper presents new characterizations of the integer-valued moving average model. For four model variants, we give moments and probability generating functions. Yule-Walker and conditional least ...

01 Jan 2001
TL;DR: This article adapted the ARMA-Power GARCH model for the conditional mean and variance to analyze time series data showing asymmetry, and proposed a general dynamic model for skewness as measured by the odds ratio of having the next observation greater than the conditional mode.
Abstract: We show how the ARMA-Power GARCH model for the conditional mean and variance can be adapted to analyze time series data showing asymmetry. Dynamics is introduced in the location and the dispersion parameters of skewed location-scale distributions using the same type of structure found in the conditional mean and in the conditional variance in the ARMA-APARCH model. We also propose a general dynamic model for skewness as measured by the odds ratio of having the next observation greater than the conditional mode. This general tool is illustrated by the analysis of the DEM-USD exchange rate over the 1980-1996 period.

Journal ArticleDOI
TL;DR: This paper notes that the main advantage of longitudinal studies is that they can distinguish changes over time within individuals (longitudinal effects) from differences among subjects at the start of the study (cross-sectional effects); in observational studies, however, longitudinal changes need to be studied after correction for potential important cross-sectional differences between subjects.
Abstract: The main advantage of longitudinal studies is that they can distinguish changes over time within individuals (longitudinal effects) from differences among subjects at the start of the study (cross-sectional effects). In observational studies, however, longitudinal changes need to be studied after correction for potential important cross-sectional differences between subjects. It will be shown that, in the context of linear mixed models, the estimation of longitudinal effects may be highly influenced by the assumptions about cross-sectional effects. Furthermore, aspects from conditional and mixture inference will be combined, yielding so-called conditional linear mixed models that allow estimation of longitudinal effects (average trends as well as subject-specific trends), independent of any cross-sectional assumptions. These models will be introduced and justified, and extensively illustrated in the analysis of longitudinal data from 680 participants in the Baltimore Longitudinal Study of Aging.

Journal ArticleDOI
TL;DR: In this paper, a change of measure at each step of the simulation is used to reduce the variance arising from the possibility of a barrier crossing at each monitoring date. When the one-step conditional distributions are unavailable, the authors introduce algorithms that combine change of measure and estimation of conditional probabilities simultaneously.
Abstract: Pricing financial options often requires Monte Carlo methods. One particular case is that of barrier options, whose payoff may be zero depending on whether or not an underlying asset crosses a barrier during the life of the option. This paper develops variance reduction techniques that take advantage of the special structure of barrier options, and are appropriate for general simulation problems with similar structure. We use a change of measure at each step of the simulation to reduce the variance arising from the possibility of a barrier crossing at each monitoring date. The paper details the theoretical underpinnings of this method, and evaluates alternative implementations when exact distributions conditional on one-step survival are available and when not available. When these one-step conditional distributions are unavailable, we introduce algorithms that combine change of measure and estimation of conditional probabilities simultaneously. The methods proposed are more generally applicable to terminal reward problems on Markov processes with absorbing states.