Journal Article

A reliable data-based bandwidth selection method for kernel density estimation

Simon J. Sheather, M. C. Jones
01 Jul 1991 - Journal of the Royal Statistical Society, Series B (Methodological) (John Wiley & Sons, Ltd) - Vol. 53, Iss. 3, pp. 683-690
TL;DR: The key to the success of the current procedure is the reintroduction of a non-stochastic term which was previously omitted, together with use of the bandwidth to reduce bias in estimation without inflating variance.
Abstract: We present a new method for data-based selection of the bandwidth in kernel density estimation which has excellent properties. It improves on a recent procedure of Park and Marron (which itself is a good method) in various ways. First, the new method has superior theoretical performance; second, it also has a computational advantage; third, the new method has reliably good performance for smooth densities in simulations, performance that is second to none in the existing literature. These methods are based on choosing the bandwidth to (approximately) minimize good quality estimates of the mean integrated squared error. The key to the success of the current procedure is the reintroduction of a non-stochastic term which was previously omitted, together with use of the bandwidth to reduce bias in estimation without inflating variance.
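As a rough illustration of what "choosing the bandwidth to (approximately) minimize estimates of the mean integrated squared error" means, the sketch below evaluates the standard asymptotic formula h = [R(K) / (n σ_K^4 R(f''))]^(1/5) for a Gaussian kernel. It is not the Sheather and Jones procedure: here R(f'') is replaced by its value under a normal reference density (the normal scale rule), whereas the paper estimates it from the data with a kernel-based estimator. The function names and the notation R(K), σ_K are assumptions introduced for this example.

```python
import math

# R(K) for the standard Gaussian kernel: integral of K(u)^2 du = 1 / (2 sqrt(pi)).
GAUSSIAN_KERNEL_ROUGHNESS = 1.0 / (2.0 * math.sqrt(math.pi))


def amise_bandwidth(n, roughness_f2, kernel_roughness=GAUSSIAN_KERNEL_ROUGHNESS,
                    kernel_variance=1.0):
    """Bandwidth minimizing the asymptotic MISE, given a value for R(f'')."""
    return (kernel_roughness / (n * kernel_variance ** 2 * roughness_f2)) ** 0.2


def normal_scale_roughness_f2(sigma):
    """R(f'') when f is a normal density with standard deviation sigma."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)


def normal_scale_bandwidth(data):
    """Normal scale (rule-of-thumb) bandwidth; a stand-in, not the paper's method."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return amise_bandwidth(n, normal_scale_roughness_f2(sigma))
```

A data-based plug-in rule in the spirit of the abstract would call `amise_bandwidth` with an estimate of R(f'') computed from the sample instead of `normal_scale_roughness_f2`.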
Citations
Journal Article
TL;DR: The convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function is proved, establishing its utility in detecting the modes of the density.
Abstract: A general non-parametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure: the mean shift. For discrete data, we prove the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and, thus, its utility in detecting the modes of the density. The relation of the mean shift procedure to the Nadaraya-Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms, the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.

11,727 citations


Cites methods from "A reliable data-based bandwidth sel..."

  • ...For the univariate case a reliable method for bandwidth selection is the plug-in rule [53], which was proven to be superior to least squares cross-validation and biased cross-validation [42], [55, p....

    [...]

Journal Article
TL;DR: Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer.
Abstract: Purpose To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression–based “intrinsic” subtypes luminal A, luminal B, HER2-enriched, and basal-like. Methods A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen. Results The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for t...

3,913 citations

Journal Article
TL;DR: In this paper, a coherent data-generating process (DGP) is described for nonparametric estimates of productive efficiency on environmental variables in two-stage procedures to account for exogenous factors that might affect firms’ performance.

2,915 citations


Cites methods from "A reliable data-based bandwidth sel..."

  • ...…in Table 1, while the second column corresponds to the truncated regression estimates obtained with Algorithm #1 and 9The kernel density estimates were obtained using an Epanechnikov kernel and with bandwidths chosen by the two-stage plug-in procedure proposed by Sheather and Jones (1991)....

    [...]

  • ...Note that one can test whether a particular variable is an input, or an output, using the methods described in Simar and Wilson (2001a). (4)Coelli et al....

    [...]

  • ...The kernel density estimates were obtained using an Epanechnikov kernel and with bandwidths chosen by the two-stage plug-in procedure proposed by Sheather and Jones (1991). L. Simar, P.W. Wilson / Journal of Econometrics 136 (2007) 31–64 46...

    [...]

  • ...Various assumptions regarding P are possible; we adopt those of Shephard (1970) and Färe (1988):...

    [...]

01 Jan 2002

2,894 citations


Cites background from "A reliable data-based bandwidth sel..."

  • ...Nevertheless, the McCulloch–Pitts model has been extremely influential in the development of artificial neural networks. Feed-forward neural networks can equally be seen as a way to parametrize a fairly general non-linear function. Such networks are rather general: Cybenko (1989), Funahashi (1989), Hornik, Stinchcombe and White (1989) and later authors have shown that neural networks with linear output units can approximate any continuous function f uniformly on compact sets, by increasing the size of the hidden layer....

    [...]

  • ...Missing values: S-PLUS has no support for missing values in character vectors; factors should be used instead. There is a class "string" described in Chambers (1998) but not fully implemented in S-PLUS....

    [...]

Journal Article
TL;DR: In this paper, a semiparametric procedure is presented to analyze the effects of institutional and labor market factors on recent changes in the U.S. distribution of wages, including de-unionization and supply and demand shocks.
Abstract: This paper presents a semiparametric procedure to analyze the effects of institutional and labor market factors on recent changes in the U.S. distribution of wages. The effects of these factors are estimated by applying kernel density methods to appropriately weighted samples. The procedure provides a visually clear representation of where in the density of wages these various factors exert the greatest impact. Using data from the Current Population Survey, we find, as in previous research, that de-unionization and supply and demand shocks were important factors in explaining the rise in wage inequality from 1979 to 1988. We find also compelling visual and quantitative evidence that the decline in the real value of the minimum wage explains a substantial proportion of this increase in wage inequality, particularly for women. We conclude that labor market institutions are as important as supply and demand considerations in explaining changes in the U.S. distribution of wages from 1979 to 1988.

2,677 citations

References
Book
01 Jan 1986
TL;DR: The Kernel Method for Multivariate Data: Three Important Methods and Density Estimation in Action.
Abstract: Introduction. Survey of Existing Methods. The Kernel Method for Univariate Data. The Kernel Method for Multivariate Data. Three Important Methods. Density Estimation in Action.

15,499 citations


"A reliable data-based bandwidth sel..." refers methods in this paper

  • ...Silverman (1986), section 3....

    [...]

  • ...This is particularly so for nonparametric probability density function estimation by the kernel method, which is described in Silverman (1986), for example....

    [...]

Journal Article
TL;DR: In this article, the authors compared several promising data-driven methods for selecting the bandwidth of a kernel density estimator, including least squares cross-validation, biased cross-validation and a plug-in rule.
Abstract: This article compares several promising data-driven methods for selecting the bandwidth of a kernel density estimator. The methods compared are least squares cross-validation, biased cross-validation, and a plug-in rule. The comparison is done by asymptotic rate of convergence to the optimum and a simulation study. It is seen that the plug-in bandwidth is usually most efficient when the underlying density is sufficiently smooth, but is less robust when there is not enough smoothness present. We believe the plug-in rule is the best of those currently available, but there is still room for improvement.

456 citations
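
Of the three selectors compared in this abstract, least squares cross-validation is the simplest to write down. The sketch below assumes a Gaussian kernel and a user-supplied grid of candidate bandwidths; the function names are hypothetical and the code is not taken from the article. It evaluates the usual criterion, the integral of the squared density estimate minus twice the average leave-one-out density at the data points, and picks the grid value with the smallest score.

```python
import math


def lscv_score(data, h):
    """Least squares cross-validation criterion for a Gaussian-kernel estimate."""
    n = len(data)
    conv_sum = 0.0  # sum over all pairs of (K*K)(u), the N(0, 2) density
    loo_sum = 0.0   # sum over pairs with i != j of K(u), the N(0, 1) density
    for i, xi in enumerate(data):
        for j, xj in enumerate(data):
            u = (xi - xj) / h
            conv_sum += math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))
            if i != j:
                loo_sum += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    integral_of_square = conv_sum / (n * n * h)
    leave_one_out_term = 2.0 * loo_sum / (n * (n - 1) * h)
    return integral_of_square - leave_one_out_term


def lscv_bandwidth(data, grid):
    """Candidate bandwidth from `grid` with the smallest LSCV score."""
    return min(grid, key=lambda h: lscv_score(data, h))
```

The double loop costs O(n^2) per candidate bandwidth, which is acceptable for a sketch but not for large samples.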

01 Jan 1988
TL;DR: This article compares several promising data-driven methods for selecting the bandwidth of a kernel density estimator and concludes that the plug-in rule is the best of those currently available, although there is still room for improvement.

454 citations

Journal Article
TL;DR: In this paper, kernel density estimators are used for the estimation of integrals of various squared derivatives of a probability density, and rates of convergence in mean squared error are calculated, which show that appropriate values of the smoothing parameter are much smaller than those for ordinary density estimation.

257 citations

Journal Article
TL;DR: Improved kernel-based estimates of integrated squared density derivatives are obtained by reinstating non-stochastic terms that have previously been omitted, and using the bandwidth to (approximately) cancel these positive quantities with the leading smoothing bias terms which are negative.

154 citations


"A reliable data-based bandwidth sel..." refers background or methods in this paper

  • ...For this estimate the bandwidths a and b are given by a normal scale model estimate of equation (9) and of the corresponding formula for estimating $R(f''')$ in Jones and Sheather (1991) respectively to be a = 0....

    [...]

  • ...To obtain the necessary sufficiently small bias, it turns out that we must estimate $R(f''')$ by some $T$ such that $T = R(f''') + o_p(n^{-1/14})$ (Jones and Sheather, 1991)....

    [...]