Author

Lalitha Sanathanan

Bio: Lalitha Sanathanan is an academic researcher. The author has contributed to research in topics: Type (model theory) & Population. The author has an h-index of 1 and has co-authored 1 publication receiving 183 citations.

Papers
Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the number of trials of a multinomial distribution, from an incomplete observation of the cell totals, under constraints on the cell probabilities, is investigated.
Abstract: This paper deals with the problem of estimating the number of trials of a multinomial distribution, from an incomplete observation of the cell totals, under constraints on the cell probabilities. More specifically let $(n_1, \cdots, n_k)$ be distributed according to the multinomial law $M(N; p_1, \cdots, p_k)$ where $N$ is the number of trials and the $p_i$'s are the cell probabilities, $\sum^k_{i=1}p_i$ being equal to 1. Suppose that only a proper subset of $(n_1, \cdots, n_k)$ is observable, that $N, p_1, \cdots, p_k$ are unknown and that $N$ is to be estimated. Without loss of generality, $(n_1, \cdots, n_{l-1}), l \leqq k$ may be taken to be the observable random vector. For fixed $N, (n_1, \cdots, n_{l-1}, N - n)$ has the multinomial distribution $M(N; p_1, \cdots, p_l)$ where $n$ denotes $\sum^{l-1}_{i=1}n_i$ and $p_l$ denotes $1 - \sum^{l-1}_{i=1}p_i$. If the parameter space is such that $N$ can take any nonnegative integral value and each $p_i$ can take any value between 0 and 1, such that $\sum^{l-1}_{i=1}p_i n$. In specific situations, it might, however, be possible to postulate constraints of the type \begin{equation*}\tag{1.1} p_i = f_i(\theta),\quad i = 1, \cdots, l\end{equation*} where $\theta = (\theta_1, \cdots, \theta_r)$ is a vector of $r$ independent parameters and $f_i$ are known functions. This may lead to estimability of $N$. The problem of estimating $N$ in such a situation is studied here. The present investigation is motivated by the following problem. Experiments in particle physics often involve visual scanning of film containing photographs of particles (occurring, for instance, inside a bubble chamber). The scanning is done with a view to counting the number $N$ of particles of a predetermined type (these particles will be referred to as events). 
But owing to poor visibility caused by such characteristics as low momentum, the distribution and configuration of nearby track patterns, etc., some events are likely to be missed during the scanning process. The question, then, is: How does one get an estimate of $N$? The usual procedure of estimating $N$ is as follows. Film containing the $N$ (unknown) events is scanned separately by $w$ scanners (ordered in some specific way) using the same instructions. For each event $E$ let a $w$-vector $Z(E)$ be defined, such that the $j$th component $Z_j$ of $Z(E)$ is 1 if $E$ is detected by the $j$th scanner and is 0 otherwise. Let $\mathscr{J}$ be the set of $2^w$ $w$-vectors of 1's and 0's and let $I_0$ be the vector of 0's. Let $x_I$ be the number of events $E$ whose $Z(E) = I$. For $I \in \mathscr{J} - \{I_0\}$, the $x_I$'s are observed. A probability model is assumed for the results of the scanning process. That is, it is assumed that there is a probability $p_I$ that $Z(E)$ assumes the value $I$ and that these $p_I$'s are constrained by equations of the type (1.1). (These constraints vary according to the assumptions made about the scanners and events, thus giving rise to different models. An example of $p_I(\theta)$ would be $E(v^{\sum^w_{j=1}I_j}(1 - v)^{w-\sum^w_{j=1}I_j})$ where $I_j$ is the $j$th component of $I$ and expectation is taken with respect to the two-parameter beta density for $v$. This is the result of assuming that all scanners are equally efficient in detecting events, that the probability $v$ that an event is seen by any scanner is a random variable and that the results of the different scans are locally independent. For a discussion of various models, see Sanathanan (1969), Chapter III.) $N$ is then estimated using the observed $x_I$'s and the constraints on the $p_I$'s, provided certain conditions (e.g., the minimum number of scans required) are met.
The following formulation of the problem of estimating $N$, however, leads to some systematic study including a development of the relevant asymptotic distribution theory for the estimators. The $Z(E)$'s may be regarded as realizations of $N$ independent identically distributed random variables whose common distribution is discrete with probabilities $p_I$ at $I$. (In particle counting problems, it is usually true that the particles of interest are sparsely distributed throughout the film on account of their Poisson distribution with low intensity. Thus in spite of the factors affecting their visibility outlined earlier, the events can be assumed to be independent.) The joint distribution of the $x_I$'s is, then, multinomial $M(N; p_I, I \in \mathscr{J})$. The problem of estimating $N$ is now in the form stated at the beginning of this section. Since the estimate depends on the constraints provided for the $p_I$'s, it is important to test the "fit" of the model selected. The conditional distribution of the $x_I$'s $(I \neq I_0)$ given $x$ is multinomial $M(x; p_I/p\ (I \neq I_0))$ where $x$ is defined as $\sum_{I \neq I_0} x_I$ and $p$ as $\sum_{I \neq I_0} p_I$. The corresponding $\chi^2$ goodness of fit test may therefore be used to test the adequacy of a model in question. Various estimators of $N$ are considered in this paper and among them is, of course, the maximum likelihood estimator of $N$. Asymptotic theory for maximum likelihood estimation of the parameters of a multinomial distribution has been developed before for the case where $N$ is known but not for the case where $N$ is unknown. Asymptotic theory related to the latter case is developed in Section 4. The result on the asymptotic joint distribution of the relevant maximum likelihood estimators is stated in Theorem 2. A second method of estimation considered is that of maximizing the likelihood based on the conditional probability of observing $(n_1,\cdots, n_{l-1})$, given $n$.
This method is called the conditional maximum likelihood (C.M.L.) method. The C.M.L. estimator of $N$ is shown (Theorem 2) to be asymptotically equivalent to the maximum likelihood estimator. Section 5 contains an extension of these results to the situation involving several multinomial distributions. This situation arises in the particle scanning context when the detected events are classified into groups based on some factor, such as momentum, which is related to the visibility of an event, and a separate scanning record is available for each group. A third method of estimation considered is that of equating certain linear combinations of the cell totals (presumably chosen on the basis of some criterion) to their respective expected values. Asymptotic theory for this method is given in Section 6. This discussion is motivated by a particular case which is applicable to some models in the particle scanning problem, using a criterion based on the method of moments for the unobservable random variable given by the number of scanners detecting an event. (A discussion of the particular case can be found in Sanathanan (1969), Chapter III.) In the next section we give some definitions and a preliminary lemma.
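
Illustration (not part of the abstract): the conditional maximum likelihood idea can be sketched for the simplest scanning model one might assume, namely $w$ independent scanners sharing a single fixed detection probability $p$. This is a simplification of the beta-mixture example in the abstract, and the function names, the golden-section search, and the example counts are all hypothetical choices made for this sketch. Given the number $n$ of detected events, the counts of events seen by exactly $s$ scanners follow a zero-truncated binomial law, so $p$ is estimated from that conditional likelihood and $N$ is then estimated as $n / (1 - (1-p)^w)$.

```python
import math


def conditional_log_lik(p, counts, w):
    """Conditional log-likelihood of p given the detected events (up to a constant).

    counts[s-1] is the number of events detected by exactly s of the w scanners.
    Assumes independent scanners with one common detection probability p.
    """
    n = sum(counts)
    total_hits = sum(s * c for s, c in enumerate(counts, start=1))
    total_misses = n * w - total_hits
    return (total_hits * math.log(p)
            + total_misses * math.log(1.0 - p)
            - n * math.log(1.0 - (1.0 - p) ** w))


def cml_estimate(counts, w, tol=1e-10):
    """Return (p_hat, n_hat): maximize the conditional likelihood, then scale n."""
    lo, hi = 1e-9, 1.0 - 1e-9
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    # Golden-section search; the truncated-binomial log-likelihood is unimodal in p.
    while hi - lo > tol:
        a = hi - invphi * (hi - lo)
        b = lo + invphi * (hi - lo)
        if conditional_log_lik(a, counts, w) < conditional_log_lik(b, counts, w):
            lo = a
        else:
            hi = b
    p_hat = (lo + hi) / 2.0
    n = sum(counts)
    # Divide the detected total by the estimated probability of being seen at all.
    n_hat = n / (1.0 - (1.0 - p_hat) ** w)
    return p_hat, n_hat


if __name__ == "__main__":
    # Two scanners: 40 events seen by exactly one scanner, 30 seen by both.
    p_hat, n_hat = cml_estimate([40, 30], w=2)
    print(round(p_hat, 4), round(n_hat, 2))
```

The sketch returns a real-valued estimate of $N$; rounding it to an integer, and checking the fit with the conditional $\chi^2$ test mentioned in the abstract, are left out for brevity.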

195 citations


Cited by
Journal ArticleDOI
TL;DR: Simulation studies show that this estimator compares well with maximum likelihood estimators (i.e., empirical Bayes estimators from the Bayesian viewpoint) for which an iterative numerical procedure is needed and may be infeasible.
Abstract: Consider a stochastic abundance model in which the species arrive in the sample according to independent Poisson processes, where the abundance parameters of the processes follow a gamma distribution. We propose a new estimator of the number of species for this model. The estimator takes the form of the number of duplicated species (i.e., species represented by two or more individuals) divided by an estimated duplication fraction. The duplication fraction is estimated from all frequencies including singleton information. The new estimator is closely related to the sample coverage estimator presented by Chao and Lee (1992, Journal of the American Statistical Association 87, 210-217). We illustrate the procedure using the Malayan butterfly data discussed by Fisher, Corbet, and Williams (1943, Journal of Animal Ecology 12, 42-58) and a 1989 Christmas Bird Count dataset collected in Florida, U.S.A. Simulation studies show that this estimator compares well with maximum likelihood estimators (i.e., empirical Bayes estimators from the Bayesian viewpoint) for which an iterative numerical procedure is needed and may be infeasible.

455 citations

Journal ArticleDOI
TL;DR: In this paper, the authors developed a model that uses repeated observations of a biological community to estimate the number and composition of species in the community and suggested extensions of their model to estimate maps of occurrence of individual species and to compute inferences related to the temporal and spatial dynamics of biological communities.
Abstract: We develop a model that uses repeated observations of a biological community to estimate the number and composition of species in the community. Estimators of community-level attributes are constructed from model-based estimators of occurrence of individual species that incorporate imperfect detection of individuals. Data from the North American Breeding Bird Survey are analyzed to illustrate the variety of ecologically important quantities that are easily constructed and estimated using our model-based estimators of species occurrence. In particular, we compute site-specific estimates of species richness that honor classical notions of species-area relationships. We suggest extensions of our model to estimate maps of occurrence of individual species and to compute inferences related to the temporal and spatial dynamics of biological communities.

432 citations

Journal ArticleDOI
TL;DR: A formal modelling framework for analysis of data obtained using the robust design of the Jolly-Seber method is provided and likelihood functions for the complete data structure under a variety of models are developed and examined.
Abstract: The Jolly-Seber method has been the traditional approach to the estimation of demographic parameters in long-term capture-recapture studies of wildlife and fish species. This method involves restrictive assumptions about capture probabilities that can lead to biased estimates, especially of population size and recruitment. Pollock (1982, Journal of Wildlife Management 46, 752-757) proposed a sampling scheme in which a series of closely spaced samples were separated by longer intervals such as a year. For this "robust design," Pollock suggested a flexible ad hoc approach that combines the Jolly-Seber estimators with closed population estimators, to reduce bias caused by unequal catchability, and to provide estimates for parameters that are unidentifiable by the Jolly-Seber method alone. In this paper we provide a formal modelling framework for analysis of data obtained using the robust design. We develop likelihood functions for the complete data structure under a variety of models and examine the relationship among the models. We compute maximum likelihood estimates for the parameters by applying a conditional argument, and compare their performance against those of ad hoc and Jolly-Seber approaches using simulation.

376 citations

Journal ArticleDOI
TL;DR: Even with very large samples, the analyst will not be able to distinguish among reasonable models of heterogeneity, even though these yield quite distinct inferences about population size; the problem is illustrated with models for closed and open populations.
Abstract: Heterogeneity in detection probabilities has long been recognized as problematic in mark-recapture studies, and numerous models developed to accommodate its effects. Individual heterogeneity is especially problematic, in that reasonable alternative models may predict essentially identical observations from populations of substantially different sizes. Thus even with very large samples, the analyst will not be able to distinguish among reasonable models of heterogeneity, even though these yield quite distinct inferences about population size. The problem is illustrated with models for closed and open populations.

306 citations

Journal ArticleDOI
TL;DR: A methodology is proposed for obtaining short-term projections of the acquired immunodeficiency syndrome (AIDS) epidemic by projecting the number of cases from those already infected with the AIDS virus, which is a lower bound on the size of the epidemic.
Abstract: A methodology is proposed for obtaining short-term projections of the acquired immunodeficiency syndrome (AIDS) epidemic by projecting the number of cases from those already infected with the AIDS virus. This is a lower bound on the size of the AIDS epidemic, because even if future infections could be prevented, one could still anticipate this number of cases. The methodology is novel in that no assumptions are required about either the number of infected individuals in the population or the probability of an infected individual eventually developing AIDS. The methodology presupposes knowledge of the incubation distribution, however, among those destined to develop AIDS. Although the method does not account for new infections, it may produce accurate short-term projections because of the relatively long incubation period from infection to clinically diagnosed AIDS. The estimation procedure “back-calculates” from AIDS incidence data to numbers previously infected. The number of cases diagnosed in ...
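
Illustration (not part of the abstract): the projection step described above amounts to a discrete convolution of past infection counts with the incubation distribution. The sketch below assumes the infection counts have already been back-calculated and that the incubation distribution is given as a probability mass function over whole periods. The function name, argument layout, and numbers are hypothetical, and the back-calculation step itself is not shown.

```python
def project_cases(infections, incubation_pmf, horizon):
    """Expected diagnoses in each of the next `horizon` periods from past infections.

    infections[s]     : estimated number infected in past period s (s = 0 .. T-1)
    incubation_pmf[d] : probability that diagnosis occurs d periods after infection
    Only people already infected are counted, so the result is a lower bound
    on future cases under these assumptions.
    """
    T = len(infections)
    projected = []
    for t in range(T, T + horizon):
        expected = 0.0
        for s, g in enumerate(infections):
            d = t - s  # periods elapsed between infection and diagnosis
            if d < len(incubation_pmf):
                expected += g * incubation_pmf[d]
        projected.append(expected)
    return projected


if __name__ == "__main__":
    # Made-up inputs: two past periods of infections, a four-point incubation pmf.
    print(project_cases([100, 200], [0.0, 0.1, 0.2, 0.3], horizon=2))
```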

304 citations