
Showing papers in "Journal of the American Statistical Association in 1996"


Journal ArticleDOI
TL;DR: This work characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links that connect them.
Abstract: Social network analysis (SNA) is the process of investigating social structures through the use of networks and graph theory. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties, edges, or links (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network ...

12,634 citations
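As a minimal illustration of the node-and-tie representation described above (my own toy example using the third-party networkx package, not material from the book):

```python
# Toy illustration (not from the source): a social network as nodes (actors)
# and edges (ties), plus a simple centrality computation.
import networkx as nx

# Hypothetical friendship ties among five actors
ties = [("Ana", "Ben"), ("Ana", "Cara"), ("Ben", "Cara"),
        ("Cara", "Dee"), ("Dee", "Eli")]

G = nx.Graph()
G.add_edges_from(ties)

print(G.number_of_nodes(), G.number_of_edges())  # 5 nodes, 5 ties
print(nx.degree_centrality(G))                   # share of possible ties per actor
```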


BookDOI
TL;DR: This book develops weak convergence theory for empirical processes, including maximal inequalities, symmetrization, Glivenko-Cantelli and Donsker theorems, and statistical applications such as M- and Z-estimators, rates of convergence, the bootstrap, and the delta method.
Abstract: 1.1. Introduction.- 1.2. Outer Integrals and Measurable Majorants.- 1.3. Weak Convergence.- 1.4. Product Spaces.- 1.5. Spaces of Bounded Functions.- 1.6. Spaces of Locally Bounded Functions.- 1.7. The Ball Sigma-Field and Measurability of Suprema.- 1.8. Hilbert Spaces.- 1.9. Convergence: Almost Surely and in Probability.- 1.10. Convergence: Weak, Almost Uniform, and in Probability.- 1.11. Refinements.- 1.12. Uniformity and Metrization.- 2.1. Introduction.- 2.2. Maximal Inequalities and Covering Numbers.- 2.3. Symmetrization and Measurability.- 2.4. Glivenko-Cantelli Theorems.- 2.5. Donsker Theorems.- 2.6. Uniform Entropy Numbers.- 2.7. Bracketing Numbers.- 2.8. Uniformity in the Underlying Distribution.- 2.9. Multiplier Central Limit Theorems.- 2.10. Permanence of the Donsker Property.- 2.11. The Central Limit Theorem for Processes.- 2.12. Partial-Sum Processes.- 2.13. Other Donsker Classes.- 2.14. Tail Bounds.- 3.1. Introduction.- 3.2. M-Estimators.- 3.3. Z-Estimators.- 3.4. Rates of Convergence.- 3.5. Random Sample Size, Poissonization and Kac Processes.- 3.6. The Bootstrap.- 3.7. The Two-Sample Problem.- 3.8. Independence Empirical Processes.- 3.9. The Delta-Method.- 3.10. Contiguity.- 3.11. Convolution and Minimax Theorems.- A. Appendix.- A.1. Inequalities.- A.2. Gaussian Processes.- A.2.1. Inequalities and Gaussian Comparison.- A.2.2. Exponential Bounds.- A.2.3. Majorizing Measures.- A.2.4. Further Results.- A.3. Rademacher Processes.- A.4. Isoperimetric Inequalities for Product Measures.- A.5. Some Limit Theorems.- A.6. More Inequalities.- A.6.1. Binomial Random Variables.- A.6.2. Multinomial Random Vectors.- A.6.3. Rademacher Sums.- Notes.- References.- Author Index.- List of Symbols.

4,600 citations
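For orientation, the Glivenko-Cantelli and Donsker properties that organize the middle part of the book can be stated as follows (standard definitions, not quoted from the text): a class of functions F is Glivenko-Cantelli or Donsker when, respectively,

```latex
\[
\sup_{f\in\mathcal{F}}\bigl|\mathbb{P}_n f - P f\bigr| \xrightarrow{\text{a.s.}} 0,
\qquad
\mathbb{G}_n = \sqrt{n}\,(\mathbb{P}_n - P) \rightsquigarrow \mathbb{G}
\ \text{in}\ \ell^{\infty}(\mathcal{F}),
\]
```

where P_n is the empirical measure of an i.i.d. sample from P and G is a tight Gaussian limit process.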


Journal ArticleDOI
TL;DR: It is shown that the instrumental variables (IV) estimand can be embedded within the Rubin Causal Model (RCM) and that under some simple and easily interpretable assumptions, the IV estimand is the average causal effect for a subgroup of units, the compliers.
Abstract: We outline a framework for causal inference in settings where assignment to a binary treatment is ignorable, but compliance with the assignment is not perfect so that the receipt of treatment is nonignorable. To address the problems associated with comparing subjects by the ignorable assignment—an “intention-to-treat analysis”—we make use of instrumental variables, which have long been used by economists in the context of regression models with constant treatment effects. We show that the instrumental variables (IV) estimand can be embedded within the Rubin Causal Model (RCM) and that under some simple and easily interpretable assumptions, the IV estimand is the average causal effect for a subgroup of units, the compliers. Without these assumptions, the IV estimand is simply the ratio of intention-to-treat causal estimands with no interpretation as an average causal effect. The advantages of embedding the IV approach in the RCM are that it clarifies the nature of critical assumptions needed for a causal interpretation, and moreover allows us to consider sensitivity of the results to deviations from key assumptions in a straightforward manner. We apply our analysis to estimate the effect of veteran status in the Vietnam era on mortality, using the lottery number that assigned priority for the draft as an instrument, and we use our results to investigate the sensitivity of the conclusions to critical assumptions.

3,978 citations
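As a concrete reading of "the ratio of intention-to-treat causal estimands" (standard notation, not quoted from the article): with binary assignment Z, treatment received D, and outcome Y, the IV (Wald) estimand is

```latex
\[
\beta_{IV} \;=\;
\frac{E[\,Y \mid Z=1\,] - E[\,Y \mid Z=0\,]}
     {E[\,D \mid Z=1\,] - E[\,D \mid Z=0\,]},
\]
```

and under the article's assumptions (randomized assignment, exclusion, and no defiers) this equals the average causal effect of D on Y for the compliers.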


Journal ArticleDOI
TL;DR: A description of the assumed context and objectives of multiple imputation is provided, and the multiple imputation framework and its standard results are reviewed.
Abstract: Multiple imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation and objective, I believe that multiple imputation by the data-base constructor is the method of choice. This article first provides a description of the assumed context and objectives, and second, reviews the multiple imputation framework and its standard results. These preliminary discussions are especially important because some recent commentaries on multiple imputation have reflected either misunderstandings of the practical objectives of multiple imputation or misunderstandings of fundamental theoretical results. Then, criticisms of multiple imputation are considered, and, finally, comparisons are made to alt...

3,495 citations
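The "standard results" referred to include the rules for combining m completed-data analyses (standard formulas, stated here for reference): with complete-data estimates Q̂_ℓ and variances U_ℓ,

```latex
\[
\bar{Q}_m=\frac{1}{m}\sum_{\ell=1}^{m}\hat{Q}_\ell,\qquad
\bar{U}_m=\frac{1}{m}\sum_{\ell=1}^{m}U_\ell,\qquad
B_m=\frac{1}{m-1}\sum_{\ell=1}^{m}\bigl(\hat{Q}_\ell-\bar{Q}_m\bigr)^{2},\qquad
T_m=\bar{U}_m+\Bigl(1+\tfrac{1}{m}\Bigr)B_m,
\]
```

where T_m is the total variance on which ultimate users base interval estimates.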


Journal ArticleDOI
TL;DR: All of the methods in this work can fail to detect the sorts of convergence failure that they were designed to identify, so a combination of strategies aimed at evaluating and accelerating MCMC sampler convergence is recommended.
Abstract: A critical issue for users of Markov chain Monte Carlo (MCMC) methods in applications is how to determine when it is safe to stop sampling and use the samples to estimate characteristics of the distribution of interest. Research into methods of computing theoretical convergence bounds holds promise for the future but to date has yielded relatively little of practical use in applied work. Consequently, most MCMC users address the convergence problem by applying diagnostic tools to the output produced by running their samplers. After giving a brief overview of the area, we provide an expository review of 13 convergence diagnostics, describing the theoretical basis and practical implementation of each. We then compare their performance in two simple models and conclude that all of the methods can fail to detect the sorts of convergence failure that they were designed to identify. We thus recommend a combination of strategies aimed at evaluating and accelerating MCMC sampler convergence, including ap...

1,860 citations
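One diagnostic of the kind reviewed is the Gelman-Rubin potential scale reduction factor; the sketch below (my own NumPy code, not the authors') computes it for a scalar quantity monitored across several parallel chains.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for an (m, n) array:
    m parallel chains, n iterations each, one scalar quantity."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)                # values near 1 suggest convergence

# toy check: 4 chains of length 1000 drawn from the same target
rng = np.random.default_rng(0)
print(gelman_rubin(rng.normal(size=(4, 1000))))
```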


BookDOI
TL;DR: Balanced Incomplete Block Designs and t-Designs; 2-(v,k,l) Designs of Small Order; BIBDs with Small Block Size; t-Designs, t = 3; Steiner Systems; Symmetric Designs; Resolvable and Near Resolvable Designs; Latin Squares, MOLS, and Orthogonal Arrays; Latin Squares; Mutually Orthogonal Latin Squares (MOLS); Incomplete MOLS; Orthogonal Arrays of Index More Than One; Orthogonal Arrays of Strength More Than Two; Pairwise Balanced Designs
Abstract: Balanced Incomplete Block Designs and t-Designs; 2-(v,k,l) Designs of Small Order; BIBDs with Small Block Size; t-Designs, t = 3; Steiner Systems; Symmetric Designs; Resolvable and Near Resolvable Designs; Latin Squares, MOLS, and Orthogonal Arrays; Latin Squares; Mutually Orthogonal Latin Squares (MOLS); Incomplete MOLS; Orthogonal Arrays of Index More Than One; Orthogonal Arrays of Strength More Than Two; Pairwise Balanced Designs; PBDs and GDDs: The Basics; PBDs: Recursive Constructions; PBD-Closure; Pairwise Balanced Designs as Linear Spaces; PBDs and GDDs of Higher Index; PBDs, Frames, and Resolvability; Other Combinatorial Designs; Association Schemes; Balanced (Part) Ternary Designs; Balanced Tournament Designs; Bhaskar Rao Designs; Complete Mappings and Sequencings of Finite Groups; Configurations; Costas Arrays; Coverings; Cycle Systems; Difference Families; Difference Matrices; Difference Sets: Abelian; Difference Sets: Nonabelian; Difference Triangle Sets; Directed Designs; D-Optimal Matrices; Embedding Partial Quasigroups; Equidistant Permutation Arrays; Factorial Designs; Frequency Squares; Generalized Quadrangles; Graph Decompositions and Designs; Graphical Designs; Hadamard Matrices and Designs; Hall Triple Systems; Howell Designs; Maximal Sets of MOLS; Mendelsohn Designs; The Oberwolfach Problem; Ordered Designs and Perpendicular Arrays; Orthogonal Designs; Orthogonal Main Effect Plans; Packings; Partial Geometries; Partially Balanced Incomplete Block Designs; Quasigroups; Quasi-Symmetric Designs; (r,l)-Designs; Room Squares; Self-Orthogonal Latin Squares (SOLS); SOLS with a Symmetric Orthogonal Mate (SOLSSOM); Sequences with Zero Autocorrelation; Skolem Sequences; Spherical t-Designs; Starters; Trades and Defining Sets; (t,m,s)-Nets; Tuscan Squares; t-Wise Balanced Designs; Uniformly Resolvable Designs; Vector Space Designs; Weighing Matrices and Conference Matrices; Whist Tournaments; Youden Designs, Generalized; Youden Squares; Applications; Codes; Computer Science: Selected Applications; Applications of Designs to Cryptography; Derandomization; Optimality and Efficiency: Comparing Block Designs; Group Testing; Scheduling a Tournament; Winning the Lottery; Related Mathematics and Computational Methods; Finite Groups and Designs; Number Theory and Finite Fields; Graphs and Multigraphs; Factorizations of Graphs; Strongly Regular Graphs; Two-Graphs; Classical Geometries; Projective Planes, Nondesarguesian; Computational Methods in Design Theory; Index

1,664 citations


Journal ArticleDOI
TL;DR: In this article, the authors recommend a "solve-the-equation" plug-in bandwidth selector as being most reliable in terms of overall performance for kernel density estimation.
Abstract: There has been major progress in recent years in data-based bandwidth selection for kernel density estimation. Some “second generation” methods, including plug-in and smoothed bootstrap techniques, have been developed that are far superior to well-known “first generation” methods, such as rules of thumb, least squares cross-validation, and biased cross-validation. We recommend a “solve-the-equation” plug-in bandwidth selector as being most reliable in terms of overall performance. This article is intended to provide easy accessibility to the main ideas for nonexperts.

1,340 citations
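For context on what a plug-in selector estimates (a standard kernel-smoothing result, not a quotation from the article): the bandwidth minimizing the asymptotic mean integrated squared error is

```latex
\[
h_{\mathrm{AMISE}}
=\left[\frac{R(K)}{\mu_2(K)^{2}\,R(f'')\,n}\right]^{1/5},
\qquad
R(g)=\int g(x)^{2}\,dx,\quad \mu_2(K)=\int x^{2}K(x)\,dx .
\]
```

Plug-in selectors replace the unknown roughness R(f'') with an estimate; in the "solve-the-equation" variant, the pilot bandwidth used for that estimate is itself expressed as a function of h, and the resulting equation is solved numerically.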


Journal ArticleDOI
TL;DR: In this paper, a review of techniques for constructing non-informative priors is presented and some of the practical and philosophical issues that arise when they are used are discussed.
Abstract: Subjectivism has become the dominant philosophical foundation for Bayesian inference. Yet in practice, most Bayesian analyses are performed with so-called “noninformative” priors, that is, priors constructed by some formal rule. We review the plethora of techniques for constructing such priors and discuss some of the practical and philosophical issues that arise when they are used. We give special emphasis to Jeffreys's rules and discuss the evolution of his viewpoint about the interpretation of priors, away from unique representation of ignorance toward the notion that they should be chosen by convention. We conclude that the problems raised by the research on priors chosen by formal rules are serious and may not be dismissed lightly: When sample sizes are small (relative to the number of parameters being estimated), it is dangerous to put faith in any “default” solution; but when asymptotics take over, Jeffreys's rules and their variants remain reasonable choices. We also provide an annotated b...

1,243 citations
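The best known of the formal rules discussed is Jeffreys's general rule (standard form):

```latex
\[
\pi_J(\theta)\;\propto\;\sqrt{\det I(\theta)},
\qquad
I(\theta)_{jk} = -\,E_{\theta}\!\left[\frac{\partial^{2}\log p(X\mid\theta)}{\partial\theta_j\,\partial\theta_k}\right],
\]
```

which is invariant under reparameterization of θ.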


Journal ArticleDOI
TL;DR: This article introduces a new criterion called the intrinsic Bayes factor, which is fully automatic in the sense of requiring only standard noninformative priors for its computation and yet seems to correspond to very reasonable actual Bayes factors.
Abstract: In the Bayesian approach to model selection or hypothesis testing with models or hypotheses of differing dimensions, it is typically not possible to utilize standard noninformative (or default) prior distributions. This has led Bayesians to use conventional proper prior distributions or crude approximations to Bayes factors. In this article we introduce a new criterion called the intrinsic Bayes factor, which is fully automatic in the sense of requiring only standard noninformative priors for its computation and yet seems to correspond to very reasonable actual Bayes factors. The criterion can be used for nested or nonnested models and for multiple model comparison and prediction. From another perspective, the development suggests a general definition of a “reference prior” for model comparison.

993 citations
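As a sketch of the construction (the arithmetic form of the intrinsic Bayes factor; the notation is mine, not quoted from the article): Bayes factors B^N computed from noninformative priors are rescaled using minimal training samples x(ℓ), ℓ = 1, ..., L,

```latex
\[
B_{ji}^{\mathrm{AI}}
= B_{ji}^{N}(x)\;\cdot\;\frac{1}{L}\sum_{\ell=1}^{L} B_{ij}^{N}\bigl(x(\ell)\bigr),
\]
```

so that the arbitrary normalizing constants of the noninformative priors cancel.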


BookDOI
TL;DR: This edited volume covers advanced structural equation modeling techniques, including nonlinear models with interaction effects (the Kenny-Judd model), multilevel models, cross-domain analyses of change over time combining growth modeling and covariance structure analysis, structural time series models, bootstrapping for mean and covariance structures, a limited-information estimator for LISREL models with or without heteroscedastic errors, full-information estimation with incomplete data, equivalent models, and incremental fit indices.
Abstract: Contents: G.A. Marcoulides, R.E. Schumacker, Introduction. W. Wothke, Models for Multitrait-Multimethod Matrix Analysis. K.G. Joreskog, F. Yang, Nonlinear Structural Equation Models: The Kenny-Judd Model With Interaction Effects. J.J. McArdle, F. Hamagami, Multilevel Models From a Multiple Group Structural Equation Perspective. J.B. Willett, A.G. Sayer, Cross-Domain Analyses of Change Over Time: Combining Growth Modeling and Covariance Structure Analysis. S.L. Hershberger, P.C.M. Molenaar, S.E. Corneal, A Hierarchy of Univariate and Multivariate Structural Time Series Models. Y-F. Yung, P.M. Bentler, Bootstrapping Techniques in Analysis of Mean and Covariance Structures. K.A. Bollen, A Limited-Information Estimator for LISREL Models With or Without Heteroscedastic Errors. J.L. Arbuckle, Full Information Estimation in the Presence of Incomplete Data. L.J. Williams, H. Bozdogan, L. Aiman-Smith, Inference Problems With Equivalent Models. H.W. Marsh, J.R. Balla, K-T. Hau, An Evaluation of Incremental Fit Indices: A Clarification of Mathematical and Empirical Properties.

925 citations



Journal ArticleDOI
Y. Vardi
TL;DR: In this article, the problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is formulated and discussed under Poisson assumptions and two types of traffic-routing regimens: deterministic (a fixed known path between each directed pair of nodes) and Markovian (a random path between a pair of vertices, determined according to a known Markov chain fixed for that pair).
Abstract: The problem of estimating the node-to-node traffic intensity from repeated measurements of traffic on the links of a network is formulated and discussed under Poisson assumptions and two types of traffic-routing regimens: deterministic (a fixed known path between each directed pair of nodes) and Markovian (a random path between each directed pair of nodes, determined according to a known Markov chain fixed for that pair). Maximum likelihood estimation and related approximations are discussed, and computational difficulties are pointed out. A detailed methodology is presented for estimates based on the method of moments. The estimates are derived algorithmically, taking advantage of the fact that the first and second moment equations give rise to a linear inverse problem with positivity restrictions that can be approached by an EM algorithm, resulting in a particularly simple solution to a hard problem. A small simulation study is carried out.
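A minimal sketch of the kind of EM iteration the abstract alludes to for the linear inverse problem with positivity constraints (the routing matrix and counts below are a made-up toy example, not the article's):

```python
import numpy as np

def em_poisson_inverse(A, y, n_iter=500):
    """EM iterations for y_i ~ Poisson((A @ lam)_i) with lam >= 0:
    A is a known link/route incidence matrix, lam the OD intensities."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = np.full(A.shape[1], y.mean() + 1e-6)   # strictly positive start
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        fitted = A @ lam                          # expected link counts
        lam = lam / col_sums * (A.T @ (y / fitted))
    return lam

# toy network: 3 links, 2 origin-destination pairs; the third link carries both
A = np.array([[1, 0],
              [0, 1],
              [1, 1]])
y = np.array([10.0, 5.0, 16.0])    # observed (averaged) link counts
print(em_poisson_inverse(A, y))    # estimated OD traffic intensities
```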

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a methodology for estimating the usual intake distributions that allows for varying degrees of departure from normality and recognizes the measurement error associated with one-day dietary intakes.
Abstract: The distribution of usual intakes of dietary components is important to individuals formulating food policy and to persons designing nutrition education programs. The usual intake of a dietary component for a person is the long-run average of daily intakes of that component for that person. Because it is impossible to directly observe usual intake for an individual, it is necessary to develop an estimator of the distribution of usual intakes based on a sample of individuals with a small number of daily observations on a subsample of the individuals. Daily intake data for individuals are nonnegative and often very skewed. Also, there is large day-to-day variation relative to the individual-to-individual variation, and the within-individual variance is correlated with the individual means. We suggest a methodology for estimating usual intake distributions that allows for varying degrees of departure from normality and recognizes the measurement error associated with one-day dietary intakes. The est...

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the impact of the normality assumption for random effects on their estimates in the linear mixed-effects model and showed that if the distribution of random effects is a finite mixture of normal distributions, then the random effects may be badly estimated if normality is assumed.
Abstract: This article investigates the impact of the normality assumption for random effects on their estimates in the linear mixed-effects model. It shows that if the distribution of random effects is a finite mixture of normal distributions, then the random effects may be badly estimated if normality is assumed, and the current methods for inspecting the appropriateness of the model assumptions are not sound. Further, it is argued that a better way to detect the components of the mixture is to build this assumption in the model and then “compare” the fitted model with the Gaussian model. All of this is illustrated on two practical examples.

Journal ArticleDOI
TL;DR: In this article, the authors derived general formulas for the asymptotic bias in regression coefficients and variance components estimated by penalized quasi-likelihood (PQL) in generalized linear mixed models with canonical link function and multiple sets of independent random effects.
Abstract: General formulas are derived for the asymptotic bias in regression coefficients and variance components estimated by penalized quasi-likelihood (PQL) in generalized linear mixed models with canonical link function and multiple sets of independent random effects. Easily computed correction matrices result in variance component estimates that have satisfactory asymptotic behavior for small values of the variance components and significantly reduce bias for larger values. Both first-order and second-order correction procedures are developed for regression coefficients estimated by PQL. The methods are illustrated through an analysis of an experiment on salamander matings involving crossed male and female random effects, and their properties are evaluated in a simulation study.

Journal ArticleDOI
TL;DR: Because the Gibbs sampler can be used to explore a posterior distribution without propriety of the posterior ever having been established, an example is given showing that the output from a Gibbs chain corresponding to an improper posterior may appear perfectly reasonable.
Abstract: Often, either from a lack of prior information or simply for convenience, variance components are modeled with improper priors in hierarchical linear mixed models. Although the posterior distributions for these models are rarely available in closed form, the usual conjugate structure of the prior specification allows for painless calculation of the Gibbs conditionals. Thus the Gibbs sampler may be used to explore the posterior distribution without ever having established propriety of the posterior. An example is given showing that the output from a Gibbs chain corresponding to an improper posterior may appear perfectly reasonable. Thus one cannot expect the Gibbs output to provide a “red flag,” informing the user that the posterior is improper. The user must demonstrate propriety before a Markov chain Monte Carlo technique is used. A theorem is given that classifies improper priors according to the propriety of the resulting posteriors. Applications concerning Bayesian analysis of animal breeding...

Journal ArticleDOI
TL;DR: This article presents a general review of the major trends in the conceptualization, development, and success of case-control methods for the study of disease causation and prevention.
Abstract: Statisticians have contributed enormously to the conceptualization, development, and success of case-control methods for the study of disease causation and prevention. This article reviews the major developments. It starts with Cornfield's demonstration of odds ratio invariance under cohort versus case-control sampling, proceeds through the still-popular Mantel—Haenszel procedure and its extensions for dependent data, and highlights (conditional) likelihood methods for relative risk regression. Recent work on nested case-control, case-cohort, and two-stage case-control designs demonstrates the continuing impact of statistical thinking on epidemiology. The influence of R. A. Fisher's work on these developments is mentioned wherever possible. His objections to the drawing of causal conclusions from observational data on cigarette smoking and lung cancer are used to introduce the problems of measurement error and confounding bias. The resolution of such difficulties, whether by further development a...

Journal ArticleDOI
TL;DR: In this paper, an extension of the concept of quantiles in multidimensions that uses the geometry of multivariate data clouds is considered, based on blending and generalizing the key ideas used in the construction of the spatial median and regression quantiles, both of which have been extensively studied in the literature.
Abstract: An extension of the concept of quantiles in multidimensions that uses the geometry of multivariate data clouds has been considered. The approach is based on blending as well as generalization of the key ideas used in the construction of the spatial median and regression quantiles, both of which have been extensively studied in the literature. These geometric quantiles are potentially useful in constructing trimmed multivariate means as well as many other L estimates of multivariate location, and they lead to a directional notion of central and extreme points in a multidimensional setup. Such quantiles can be defined as meaningful and natural objects even in infinite-dimensional Hilbert and Banach spaces, and they yield an effective generalization of quantile regression in multiresponse linear model problems. Desirable equivariance properties are shown to hold for these multivariate quantiles, and issues related to their computation for data in finite-dimensional spaces are discussed. n^(1/2) consistenc...
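One standard way to write the geometric quantile indexed by a vector u in the open unit ball (notation is mine; a centering term is usually added to guarantee the expectation is finite):

```latex
\[
Q(u)\;=\;\arg\min_{q\in\mathbb{R}^{d}}\;
E\bigl\{\lVert X-q\rVert+\langle u,\,X-q\rangle\bigr\},
\qquad \lVert u\rVert<1,
\]
```

so u = 0 recovers the spatial median, while ||u|| near 1 points toward extreme observations in the direction of u.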

Journal ArticleDOI
TL;DR: In this article, the Weibull distribution is embedded in a larger family obtained by introducing an additional shape parameter, which allows for a broader class of monotone hazard rates and is analytically tractable and computationally manageable.
Abstract: The Weibull distribution, which is frequently used for modeling survival data, is embedded in a larger family obtained by introducing an additional shape parameter. This generalized family not only contains distributions with unimodal and bathtub hazard shapes, but also allows for a broader class of monotone hazard rates. Furthermore, the distributions in this family are analytically tractable and computationally manageable. The modeling and analysis of survival data using this family is discussed and illustrated in terms of a lifetime dataset and the results of a two-arm clinical trial.
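One common way to write the generalized family described here is the exponentiated-Weibull form (parameter names are mine):

```latex
\[
F(t)=\Bigl[\,1-\exp\!\bigl\{-(t/\sigma)^{\alpha}\bigr\}\Bigr]^{\theta},
\qquad t>0,\ \ \alpha,\theta,\sigma>0,
\]
```

which reduces to the ordinary Weibull when θ = 1 and accommodates unimodal and bathtub-shaped hazards for other combinations of the two shape parameters α and θ.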

Journal ArticleDOI
TL;DR: A general approach using Bayesian analysis for the estimation of parameters in physiological pharmacokinetic models is described, which includes hierarchical population modeling and informative prior distributions for population parameters.
Abstract: We describe a general approach using Bayesian analysis for the estimation of parameters in physiological pharmacokinetic models. The chief statistical difficulty in estimation with these models is that any physiological model that is even approximately realistic will have a large number of parameters, often comparable to the number of observations in a typical pharmacokinetic experiment (e.g., 28 measurements and 15 parameters for each subject). In addition, the parameters are generally poorly identified, akin to the well-known ill-conditioned problem of estimating a mixture of declining exponentials. Our modeling includes (a) hierarchical population modeling, which allows partial pooling of information among different experimental subjects; (b) a pharmacokinetic model including compartments for well-perfused tissues, poorly perfused tissues, fat, and the liver; and (c) informative prior distributions for population parameters, which is possible because the parameters represent real physiological...

Journal ArticleDOI
TL;DR: The question of what levels of contamination can be detected by this algorithm as a function of dimension, computation time, sample size, contamination fraction, and distance of the contamination from the main body of data is investigated.
Abstract: New insights are given into why the problem of detecting multivariate outliers can be difficult and why the difficulty increases with the dimension of the data. Significant improvements in methods for detecting outliers are described, and extensive simulation experiments demonstrate that a hybrid method extends the practical boundaries of outlier detection capabilities. Based on simulation results and examples from the literature, the question of what levels of contamination can be detected by this algorithm as a function of dimension, computation time, sample size, contamination fraction, and distance of the contamination from the main body of data is investigated. Software to implement the methods is available from the authors and STATLIB.

Journal ArticleDOI
TL;DR: In this article, a least squares regression analysis using either of these two missing-data approaches is performed, and the exact biases of the estimators for the regression coefficients and the residual variance are derived and reported.
Abstract: The statistical literature and folklore contain many methods for handling missing explanatory variable data in multiple linear regression. One such approach is to incorporate into the regression model an indicator variable for whether an explanatory variable is observed. Another approach is to stratify the model based on the range of values for an explanatory variable, with a separate stratum for those individuals in which the explanatory variable is missing. For a least squares regression analysis using either of these two missing-data approaches, the exact biases of the estimators for the regression coefficients and the residual variance are derived and reported. The complete-case analysis, in which individuals with any missing data are omitted, is also investigated theoretically and is found to be free of bias in many situations, though often wasteful of information. A numerical evaluation of the bias of two missing-indicator methods and the complete-case analysis is reported. The missing-indi...
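A small simulation (my own, not from the article) illustrating the bias of the missing-indicator method and the unbiasedness of the complete-case analysis when an explanatory variable is missing completely at random:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # correlated explanatory variables
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

miss = rng.random(n) < 0.3                # 30% of x2 missing completely at random

# missing-indicator method: zero-fill x2 and add an indicator column
x2_fill = np.where(miss, 0.0, x2)
X_ind = np.column_stack([np.ones(n), x1, x2_fill, miss.astype(float)])
beta_ind, *_ = np.linalg.lstsq(X_ind, y, rcond=None)

# complete-case analysis: drop rows with missing x2
keep = ~miss
X_cc = np.column_stack([np.ones(keep.sum()), x1[keep], x2[keep]])
beta_cc, *_ = np.linalg.lstsq(X_cc, y[keep], rcond=None)

print("missing-indicator x1, x2:", beta_ind[1:3])  # x1 coefficient is biased upward
print("complete-case    x1, x2:", beta_cc[1:])     # close to the true values (1, 2)
```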

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of combining a collection of general regression fit vectors to obtain a better predictive model and develop a general framework for this problem and examine a cross-validation-based proposal called "model mix" or "stacking" in this context.
Abstract: We consider the problem of how to combine a collection of general regression fit vectors to obtain a better predictive model. The individual fits may be from subset linear regression, ridge regression, or something more complex like a neural network. We develop a general framework for this problem and examine a cross-validation—based proposal called “model mix” or “stacking” in this context. We also derive combination methods based on the bootstrap and analytic methods and compare them in examples. Finally, we apply these ideas to classification problems where the estimated combination weights can yield insight into the structure of the problem.
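A minimal sketch of the "model mix"/stacking idea (my own toy code: two simple base fits, cross-validated predictions, then nonnegative combination weights; the nonnegativity constraint is one common choice, not necessarily the authors' exact proposal):

```python
import numpy as np
from scipy.optimize import nnls

def cv_predictions(X, y, fit_predict, k=5, seed=0):
    """Out-of-fold predictions for one base learner under k-fold CV."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    pred = np.empty(n)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        pred[test] = fit_predict(X[train], y[train], X[test])
    return pred

def ols(Xtr, ytr, Xte):                       # base learner 1: least squares
    beta, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xte @ beta

def ridge(Xtr, ytr, Xte, lam=10.0):           # base learner 2: ridge fit
    p = Xtr.shape[1]
    beta = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ ytr)
    return Xte @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.0, 0.5, 0.0, -1.0]) + rng.normal(size=200)

# stack: regress y on the matrix of cross-validated fits with weights >= 0
Z = np.column_stack([cv_predictions(X, y, ols), cv_predictions(X, y, ridge)])
weights, _ = nnls(Z, y)
print(weights)     # combination weights for the two base fits
```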

Journal ArticleDOI
TL;DR: In this paper, the Neyman truncation and wavelet thresholding were proposed to detect sharp peaks and high frequency alternations, while maintaining the same capability in detecting smooth alternative densities as the traditional tests.
Abstract: Traditional nonparametric tests, such as the Kolmogorov-Smirnov test and the Cramer-von Mises test, are based on the empirical distribution functions. Although these tests possess root-n consistency, they effectively use only information contained in the low frequencies. This leads to low power in detecting fine features such as sharp and short aberrations as well as global features such as high-frequency alternations. The drawback can be repaired via smoothing-based test statistics. In this article we propose two such test statistics, based on wavelet thresholding and the Neyman truncation. We provide extensive evidence to demonstrate that the proposed tests have higher power in detecting sharp peaks and high-frequency alternations, while maintaining the same capability in detecting smooth alternative densities as the traditional tests. Similar conclusions can be made for two-sample nonparametric tests of distribution functions. In that case, the traditional linear rank tests such as th...
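For orientation, Neyman's smooth-test statistic, which the "Neyman truncation" approach adapts by choosing the truncation point from the data (standard form; notation is mine): with U_i = F_0(X_i) and orthonormal functions φ_j on [0, 1],

```latex
\[
N_k=\sum_{j=1}^{k} b_j^{2},
\qquad
b_j=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\varphi_j(U_i),
\]
```

which is approximately chi-squared with k degrees of freedom under the null; large coefficients at higher frequencies j pick up exactly the sharp, high-frequency departures that distribution-function-based tests tend to miss.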

Journal ArticleDOI
TL;DR: This book develops the decision-theoretic foundations of Bayesian statistical inference, covering prior distributions, Bayesian point estimation, tests and confidence regions, admissibility and complete classes, invariance and equivariant estimation, hierarchical and empirical Bayes extensions, and Bayesian calculations.
Abstract: Contents: Introduction.- Decision-Theoretic Foundations of Statistical Inference.- From Prior Information to Prior Distributions.- Bayesian Point Estimation.- Tests and Confidence Regions.- Admissibility and Complete Classes.- Invariance, Haar Measures, and Equivariant Estimators.- Hierarchical and Empirical Bayes Extensions.- Bayesian Calculations.- A Defense of the Bayesian Choice.

Journal ArticleDOI
TL;DR: In this article, a weighted least squares algorithm is used to estimate the slope at the right upper tail of a Pareto quantile plot, based on classical ideas on regression diagnostics, algorithms can be constructed searching for that order statistic to the right of which one obtains an optimal linear fit of the quantile plots.
Abstract: Successful application of extreme value statistics for estimating the Pareto tail index relies heavily on the choice of the number of extreme values taken into account. It is shown that these tail index estimators can be considered estimates of the slope at the right upper tail of a Pareto quantile plot, obtained using a weighted least squares algorithm. From this viewpoint, based on classical ideas on regression diagnostics, algorithms can be constructed searching for that order statistic to the right of which one obtains an optimal linear fit of the quantile plot.
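A minimal sketch of the underlying idea (my own illustration, not the authors' algorithm): on the Pareto quantile plot the points (log((n+1)/j), log x_(n-j+1)) become linear in the right upper tail, and the Hill estimator can be read as one simple slope estimate based on the k largest observations.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the extreme value index (1/alpha for a Pareto-type tail)
    from the k largest order statistics."""
    xs = np.sort(np.asarray(x, dtype=float))
    top = xs[-k:]                              # k largest observations
    return np.mean(np.log(top)) - np.log(xs[-k - 1])

rng = np.random.default_rng(2)
x = rng.pareto(a=2.0, size=5000) + 1.0         # Pareto tail with index alpha = 2
print(hill_estimator(x, k=200))                # should be near 1/alpha = 0.5
```

The choice of k here is exactly the "number of extreme values taken into account" that the regression-diagnostic algorithms in the article aim to select.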

Journal ArticleDOI
TL;DR: It is shown that for bootstrapping pairs (response, explanatory variable), far fewer than n bootstrap observations (the customary bootstrap sampling plan) should be drawn, because otherwise the probability of selecting the optimal subset of variables does not converge to 1 as n → ∞.
Abstract: In a regression problem, typically there are p explanatory variables possibly related to a response variable, and we wish to select a subset of the p explanatory variables to fit a model between these variables and the response. A bootstrap variable/model selection procedure is to select the subset of variables by minimizing bootstrap estimates of the prediction error, where the bootstrap estimates are constructed based on a data set of size n. Although the bootstrap estimates have good properties, this bootstrap selection procedure is inconsistent in the sense that the probability of selecting the optimal subset of variables does not converge to 1 as n → ∞. This inconsistency can be rectified by modifying the sampling method used in drawing bootstrap observations. For bootstrapping pairs (response, explanatory variable), it is found that instead of drawing n bootstrap observations (a customary bootstrap sampling plan), far fewer bootstrap observations should be sampled: The bootstrap selection p...

Journal ArticleDOI
TL;DR: In this article, the authors developed a noniterative, easily computed estimator of β for models in which some components of X are discrete, which is n^(1/2) consistent and asymptotically normal.
Abstract: Others have developed average derivative estimators of the parameter β in the model E(Y|X = x) = G(xβ), where G is an unknown function and X is a random vector. These estimators are noniterative and easy to compute but require that X be continuously distributed. This article develops a noniterative, easily computed estimator of β for models in which some components of X are discrete. The estimator is n^(1/2) consistent and asymptotically normal. An application to data on product innovation by German manufacturers illustrates the estimator's usefulness.
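The identification idea behind average derivative estimators (a standard argument, included for orientation): when X is continuously distributed,

```latex
\[
\delta \;=\; E\!\left[\left.\frac{\partial}{\partial x}\,E(Y\mid X=x)\right|_{x=X}\right]
\;=\; E\bigl[G'(X\beta)\bigr]\,\beta ,
\]
```

so the average derivative δ is proportional to β and identifies it up to scale; the article's contribution is an analogous, easily computed estimator when some components of X are discrete, where this derivative argument does not apply directly.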

Journal ArticleDOI
TL;DR: In this article, conditional means priors are extended to generalized linear models and data augmentation priors where the prior is of the same form as the likelihood are also considered, and the prior distribution on regression coefficients is induced from this specification.
Abstract: This article deals with specifications of informative prior distributions for generalized linear models. Our emphasis is on specifying distributions for selected points on the regression surface; the prior distribution on regression coefficients is induced from this specification. We believe that it is inherently easier to think about conditional means of observables given the regression variables than it is to think about model-dependent regression coefficients. Previous use of conditional means priors seems to be restricted to logistic regression with one predictor variable and to normal theory regression. We expand on the idea of conditional means priors and extend these to arbitrary generalized linear models. We also consider data augmentation priors where the prior is of the same form as the likelihood. We show that data augmentation priors are special cases of conditional means priors. With current Monte Carlo methodology, such as importance sampling and Gibbs sampling, our priors result in...