Topic

Bernoulli sampling

About: Bernoulli sampling is a research topic. To date, 354 publications on this topic have received 10,927 citations.


Papers
Proceedings ArticleDOI
01 May 2007
TL;DR: The problem of non-parametric estimation of network flow characteristics, namely packet lengths and byte sizes, based on sampled flow data is considered, and two different approaches to deal with the problem are proposed.
Abstract: In this paper, we consider the problem of non-parametric estimation of network flow characteristics, namely packet lengths and byte sizes, based on sampled flow data. We propose two different approaches to deal with the problem at hand. The first one is based on single stage Bernoulli sampling of packets and their corresponding byte sizes. Subsequently, the flow length distribution is estimated by an adaptive expectation-maximization (EM) algorithm that in addition provides an estimate for the number of active flows. The estimation of the flow sizes (in bytes) is accomplished through a random effects regression model that utilizes the flow length information previously obtained. A variation of this approach, particularly suited for mixture distributions that appear in real network traces, is also considered. The second approach relies on a two-stage sampling procedure, which in the first stage samples flows amongst the active ones, while in the second stage samples packets from the sampled flows. Subsequently, the flow length distribution is estimated using another EM algorithm and the flow byte sizes based on a regression model. The proposed approaches are illustrated and compared on a number of synthetic and real data sets.

74 citations

Proceedings Article
01 Jul 1989
TL;DR: This work considers the design and analysis of algorithms to retrieve simple random samples from databases, and examines simple random sampling from B+ tree files, and considers both iterative and batch sampling methods.
Abstract: We consider the design and analysis of algorithms to retrieve simple random samples from databases. Specifically, we examine simple random sampling from B+ tree files. Existing methods of sampling from B+ trees require the use of auxiliary rank information in the nodes of the tree. Such modified B+ tree files are called “ranked B+ trees”. We compare sampling from ranked B+ tree files with new acceptance/rejection (A/R) sampling methods which sample directly from standard B+ trees. Our new A/R sampling algorithm can easily be retrofitted to existing DBMSs and does not require the overhead of maintaining rank information. We consider both iterative and batch sampling methods.

73 citations
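
The abstract above names acceptance/rejection sampling from standard B+ trees but does not spell out the procedure. The following is a minimal sketch of one way such a scheme can work, not the paper's algorithm as published: walk from the root to a leaf choosing a child uniformly at each node, then accept the record with probability equal to the product of (actual fanout / maximum fanout) over the nodes visited. With all leaves at the same depth, every record is then returned with equal probability per attempt. The nested-list tree layout and the names `try_sample`, `sample_one`, and `max_fanout` are assumptions made for illustration.

```python
import random


def try_sample(root, max_fanout, rng):
    """One acceptance/rejection attempt; returns a record, or None if rejected.

    Assumed layout: an internal node is a list of child lists, a leaf is a
    list of records, and records are neither lists nor None.
    """
    node = root
    accept_prob = 1.0
    while True:
        # Each node visited scales the acceptance probability by its
        # actual fanout relative to the maximum possible fanout.
        accept_prob *= len(node) / max_fanout
        choice = node[rng.randrange(len(node))]
        if not isinstance(choice, list):
            break  # reached a record stored in a leaf
        node = choice
    return choice if rng.random() < accept_prob else None


def sample_one(root, max_fanout, rng):
    """Repeat attempts until one is accepted (iterative sampling)."""
    while True:
        record = try_sample(root, max_fanout, rng)
        if record is not None:
            return record


if __name__ == "__main__":
    tree = [[1, 2, 3], [4]]  # two leaves with different fanout
    print(sample_one(tree, 3, random.Random(0)))
```

No rank information is stored in the nodes here; the cost is that sparsely filled nodes lead to more rejected attempts.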

Book ChapterDOI
10 Apr 2006

62 citations

Journal ArticleDOI
TL;DR: Adaptive cluster sampling (ACS) is an adaptive sampling scheme which operates under the rule that when the observed value of an initially selected sampling unit satisfies some condition of interest, C, other additional units in some pre-defined accompanying neighborhood are also added to the sample as discussed by the authors.
Abstract: Adaptive cluster sampling (ACS) is an adaptive sampling scheme which operates under the rule that when the observed value of an initially selected sampling unit satisfies some condition of interest, C, other additional units in some pre-defined accompanying neighborhood are also added to the sample. In turn, if any of these additional units satisfy C, then their corresponding unit neighborhoods are added to the sample as well, and so on. This process stops when no additional units satisfying C are encountered. This paper will provide a review of the major developments and issues in ACS since its introduction by Thompson (1990) [Journal of the American Statistical Association, 85, 1050–1059].

61 citations
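
The expansion rule described in the abstract above can be sketched as a breadth-first search. This is only an illustration under stated assumptions: the population is a rectangular grid, the neighborhood of a unit is its four adjacent cells, and the initial units are passed in explicitly. The function name `adaptive_cluster_sample` and its parameters are hypothetical.

```python
from collections import deque


def adaptive_cluster_sample(grid, initial, condition):
    """Return the set of (row, col) units in the final sample.

    grid      -- list of equal-length rows of observed values
    initial   -- iterable of (row, col) units selected at the start
    condition -- function returning True when a value satisfies C
    """
    rows, cols = len(grid), len(grid[0])
    sampled = set(initial)
    # Only units whose value satisfies C trigger neighborhood expansion.
    queue = deque(u for u in sampled if condition(grid[u[0]][u[1]]))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                sampled.add((nr, nc))
                if condition(grid[nr][nc]):
                    queue.append((nr, nc))
    # The loop ends when no newly added unit satisfies C.
    return sampled


if __name__ == "__main__":
    grid = [
        [0, 0, 0, 0],
        [0, 5, 3, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(sorted(adaptive_cluster_sample(grid, [(1, 1)], lambda y: y > 0)))
```

In practice the initial units would come from a probability sample, for example `random.sample` over all grid cells.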

Posted Content
TL;DR: In this paper, a general procedure is proposed for fitting semiparametric models with estimated weights to two-phase data, placing in context earlier results for Cox regression with stratified case-cohort studies, other complex survey designs, and missing data problems.
Abstract: Weighted likelihood, in which one solves Horvitz-Thompson or inverse probability weighted (IPW) versions of the likelihood equations, offers a simple and robust method for fitting models to two phase stratified samples. We consider semiparametric models for which solution of infinite dimensional estimating equations leads to $\sqrt{N}$ consistent and asymptotically Gaussian estimators of both Euclidean and nonparametric parameters. If the phase two sample is selected via Bernoulli (i.i.d.) sampling with known sampling probabilities, standard estimating equation theory shows that the influence function for the weighted likelihood estimator of the Euclidean parameter is the IPW version of the ordinary influence function. By proving weak convergence of the IPW empirical process, and borrowing results on weighted bootstrap empirical processes, we derive a parallel asymptotic expansion for finite population stratified sampling. Whereas the asymptotic variance for Bernoulli sampling involves the within strata second moments of the influence function, for finite population stratified sampling it involves only the within strata variances. The latter asymptotic variance also arises when the observed sampling fractions are used as estimates of those known a priori. A general procedure is proposed for fitting semiparametric models with estimated weights to two phase data. Several of our key results have already been derived for the special case of Cox regression with stratified case-cohort studies, other complex survey designs and missing data problems more generally. This paper is intended to help place this previous work in appropriate context and to pave the way for applications to other models.

59 citations
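
As a much simpler illustration of the inverse probability weighting idea in the abstract above, the sketch below draws a Bernoulli sample with known inclusion probabilities and forms the Horvitz-Thompson estimate of a population total. It does not implement the paper's semiparametric procedure; the `(value, probability)` pair layout and the function names are assumptions made for the example.

```python
import random


def bernoulli_sample(units, rng):
    """Include each (value, p) unit independently with its own probability p."""
    return [(y, p) for y, p in units if rng.random() < p]


def horvitz_thompson_total(sample):
    """Estimate the population total by weighting each value by 1 / p."""
    return sum(y / p for y, p in sample)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population: two strata with different known probabilities.
    population = [(10.0, 0.5)] * 100 + [(2.0, 0.1)] * 400
    sample = bernoulli_sample(population, rng)
    print(len(sample), horvitz_thompson_total(sample))
```

Because each unit is included independently, the sample size is itself random, which is the feature that distinguishes Bernoulli sampling from fixed-size finite population sampling.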


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations, 78% related
Markov chain: 51.9K papers, 1.3M citations, 76% related
Statistical hypothesis testing: 19.5K papers, 1M citations, 74% related
Sample size determination: 21.3K papers, 961.4K citations, 74% related
Linear model: 19K papers, 1M citations, 73% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2021  8
2020  4
2019  10
2018  9
2017  9
2016  15