# On the change-point problem

TL;DR: This article considers the problem of testing for a shift in the level of a process occurring at an unknown time point, given an initial sample of fixed size from the original unchanged process.

Abstract: We consider the problem of testing for a shift in the level of a process occurring at an unknown time point, given an initial sample of fixed size from the original unchanged process. Various large sample results for the proposed test are formulated and examined.


##### Citations


••

TL;DR: In this paper, a binary search procedure was proposed to detect the changepoints in the sequence of the ratios of probabilities and obtain the maximum likelihood estimators of two multinomial probability vectors under the assumption that the probability ratio sequence has a changepoint.

Abstract: This article studies the problem of testing and locating changepoints in likelihood ratios of two multinomial probability vectors. We propose a binary search procedure to detect the changepoints in the sequence of the ratios of probabilities and obtain the maximum likelihood estimators of two multinomial probability vectors under the assumption that the probability ratio sequence has a changepoint. We also give a strongly consistent estimator for the changepoint location. An information theoretic approach is used to test the equality of two discrete probability distributions against the alternative that their ratios have a changepoint. Approximate critical values of the test statistics are provided by simulation for several choices of model parameters. Finally, we examine a real life data set pertaining to average daily insulin dose from the Boston Collaborative Drug Surveillance Program and locate the changepoints in the probability ratios.

4 citations

••

TL;DR: Shoutir Kishore Chatterjee (SKC) was the National Lecturer in Statistics (1985–1986) of the University Grants Commission, India, the President of the Section of Statistics of the Indian Science Congress (1989) and an Emeritus Scientist (1997–2000) of the Council of Scientific and Industrial Research, India.

Abstract: Shoutir Kishore Chatterjee was born in Ranchi, a small hill station in India, on November 6, 1934. He received his B.Sc. in statistics from the Presidency College, Calcutta, in 1954, and M.Sc. and Ph.D. degrees in statistics from the University of Calcutta in 1956 and 1962, respectively. He was appointed a lecturer in the Department of Statistics, University of Calcutta, in 1960 and was a member of its faculty until his retirement as a professor in 1997. Indeed, from the 1970s he steered the teaching and research activities of the department for the next three decades. Professor Chatterjee was the National Lecturer in Statistics (1985–1986) of the University Grants Commission, India, the President of the Section of Statistics of the Indian Science Congress (1989) and an Emeritus Scientist (1997–2000) of the Council of Scientific and Industrial Research, India. Professor Chatterjee, affectionately known as SKC to his students and admirers, is a truly exceptional person who embodies the spirit of eternal India. He firmly believes that “fulfillment in man’s life does not come from amassing a lot of money, after the threshold of what is required for achieving a decent living is crossed. It does not come even from peer recognition for intellectual achievements. Of course, one has to work and toil a lot before one realizes these facts.”

1 citation

##### References


••

233 citations

••

TL;DR: In this paper, the authors studied the properties of the test statistic $T_n$, which was proposed by H. Chernoff and S. Zacks to detect shifts in a parameter of a distribution function, occurring at unknown time points between consecutively taken observations.

Abstract: The present study is concerned with the properties of a test statistic proposed by H. Chernoff and S. Zacks [1] to detect shifts in a parameter of a distribution function, occurring at unknown time points between consecutively taken observations. The testing problem we study is confined to a fixed sample size situation, and can be described as follows: Given observations on independent random variables $X_1, \cdots, X_n$ (taken at consecutive time points), which are distributed according to $F(X; \theta_i)$, $\theta_i \in \Omega$, for all $i = 1, \cdots, n$, one has to test the simple hypothesis $H_0 : \theta_1 = \cdots = \theta_n = \theta_0$ ($\theta_0$ is known) against the composite alternative $H_1 : \theta_1 = \cdots = \theta_m = \theta_0,\ \theta_{m + 1} = \cdots = \theta_n = \theta_0 + \delta,\ \delta > 0$, where both the point of change, $m$, and the size of the change, $\delta$, are unknown ($m = 1, \cdots, n - 1$; $0 < \delta < \infty$). A Bayesian approach led Chernoff and Zacks in [1] to propose the test statistic $T_n = \sum^{n - 1}_{i = 1} iX_{i + 1}$, for the case of normally distributed random variables. A generalization for random variables whose distributions belong to the one parameter exponential family, i.e., whose density can be represented as $f(x; \theta) = h(x) \exp \lbrack\psi_1(\theta)U(x) + \psi_2(\theta)\rbrack$, $\theta \in \Omega$, where $\psi_1(\theta)$ is monotone, yields the test statistic $T_n = \sum^{n - 1}_{i = 1} iU(x_{i + 1})$. In the present paper we study the operating characteristics of the test statistic $T_n$. General conditions are given for the convergence of the distribution of $T_n$, as the sample size grows, to a normal distribution. The rate of convergence is also studied.
It was found that the closeness of the distribution function of $T_n$ to the corresponding normal distribution is not satisfactory for the purposes of determining test criteria and values of power functions, in cases of small samples from non-normal distributions. The normal approximation can be improved by considering the first four terms in Edgeworth's asymptotic expansion of the distribution function of $T_n$ (see H. Cramér [2] p. 227). Such an approximation involves the normal distribution, its derivatives and the semi-invariants of $T_n$. The goodness of the approximations based on such an expansion, and that of the simple normal approximation, for small sample situations, were studied for cases where the observed random variables are binomially or exponentially distributed. In order to compare the exact distribution functions of $T_n$ to the approximations, the exact forms of the distributions of $T_n$ in the binomial and exponential cases were derived. As seen in Section 4, these distribution functions are quite involved, especially under the alternative hypothesis. Tables of coefficients are given for assisting the determination of these distributions, under the null hypothesis assumption, in situations of samples whose size is $2 \leqq n \leqq 10$. For samples of size $n \geqq 10$ one can use the normal approximation to the test criterion and obtain good results. The power functions of the test statistic $T_n$, for the binomial and exponential cases, are given in Section 5. The comparison with the values of the power function obtained by the normal approximation is also given. As was shown by Chernoff and Zacks in [1], when $X$ is binomially distributed the power function values of $T_n$ are higher than those of a test statistic proposed by E. S. Page [5], for most of the $m$ values (points of shift) and $\delta$ values (size of shift).
A comparative study in which the effectiveness of test procedures based on $T_n$ relative to those based on Page's and other procedures will be given elsewhere for the exponential case, and other distributions of practical interest.
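The statistic $T_n = \sum_{i=1}^{n-1} i X_{i+1}$ and its simple normal approximation can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the observations are assumed independent with known null mean `theta0` and known standard deviation `sigma`, and the null moments follow from $\sum i = n(n-1)/2$ and $\sum i^2 = (n-1)n(2n-1)/6$.

```python
import math


def chernoff_zacks_statistic(x):
    """T_n = sum_{i=1}^{n-1} i * X_{i+1}; x[0] holds X_1, so X_{i+1} is x[i]."""
    return sum(i * x[i] for i in range(1, len(x)))


def null_moments(n, theta0, sigma):
    """Mean and standard deviation of T_n under H_0.

    Assumes independent observations with mean theta0 and known
    standard deviation sigma (an assumption of this sketch).
    """
    mean = theta0 * n * (n - 1) / 2
    variance = sigma ** 2 * (n - 1) * n * (2 * n - 1) / 6
    return mean, math.sqrt(variance)


def normal_approx_p_value(x, theta0, sigma):
    """Upper-tail p-value for an upward shift, using the normal approximation."""
    mean, sd = null_moments(len(x), theta0, sigma)
    z = (chernoff_zacks_statistic(x) - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for standard normal Z
```

Later observations receive larger weights, so a shift that persists to the end of the sample pushes $T_n$ upward. As the abstract notes, the plain normal approximation is unreliable for small samples from non-normal distributions, so a sketch like this is best read as applying to the normal case or to $n \geqq 10$.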

144 citations

••

TL;DR: In this article, a nonparametric approach to the problem of testing for a shift in the level of a process occurring at an unknown time point when a fixed number of observations are drawn consecutively in time is presented.

Abstract: This work is an investigation of a nonparametric approach to the problem of testing for a shift in the level of a process occurring at an unknown time point when a fixed number of observations are drawn consecutively in time. We observe successively the independent random variables $X_1, X_2, \cdots, X_N$ which are distributed according to the continuous cdf $F_i, i = 1, 2, \cdots, N$. An upward shift in the level shall be interpreted to mean that the random variables after the change are stochastically larger than those before. Two versions of the testing problem are studied. The first deals with the case when the initial process level is known and the second when it is unknown. In the first case, we make the simplifying assumption that the distributions $F_i$ are symmetric before the shift and introduce the known initial level by saying that the point of symmetry $\gamma_0$ is known. Without loss of generality, we set $\gamma_0 = 0$. Defining a class of cdf's $\mathscr{F}_0 = \{F : F \text{ continuous},\ F \text{ symmetric about origin}\}$, the problem of detecting an upward shift becomes that of testing the null hypothesis $H_0:F_0 = F_1 = \cdots = F_N,\quad\text{some}\quad F_0 \in \mathscr{F}_0,$ against the alternative $H_1:F_0 = F_1 = \cdots = F_m > F_{m + 1} = \cdots = F_N,\quad\text{some}\quad F_0 \in \mathscr{F}_0$, where $m$ ($0 \leqq m \leqq N - 1$) is unknown and the notation $F_m > F_{m + 1}$ indicates that $X_{m + 1}$ is stochastically larger than $X_m$. For the second situation with unknown initial level, the problem becomes that of testing the null hypothesis $H_0^\ast:F_1 = \cdots = F_N$, against the alternatives $H_1^\ast: F_1 = \cdots = F_m > F_{m + 1} = \cdots = F_N$, where $m$ ($1 \leqq m \leqq N - 1$) is unknown. Here the distributions are not assumed to be symmetric. The testing problem in the case of known initial level has been considered by Page [11], Chernoff and Zacks [2] and Kander and Zacks [7].
Assuming that the observations are initially from a symmetric distribution with known mean $\gamma_0$, Page proposes a test based on the variables $\operatorname{sgn} (X_i - \gamma_0)$. Chernoff and Zacks assume that the $F_i$ are normal cdf's with constant known variance and they derive a test for shift in the mean through a Bayesian argument. Their approach is extended to the one parameter exponential family of distributions by Kander and Zacks. Except for the test based on signs, all the previous work lies within the framework of parametric statistics. The second formulation of the testing problem, the case of unknown initial level, has not been treated in detail. The only test proposed thus far is the one derived by Chernoff and Zacks for normal distributions with constant known variance. In both problems, our approach generally is to find optimal invariant tests for certain local shift alternatives and then to examine their properties. Our optimality criterion is the maximization of local average power where the average is over the space of the nuisance parameter $m$ with respect to an arbitrary weighting $\{q_i, i = 1, 2, \cdots, N: q_i \geqq 0, \sum^N_{i = 1} q_i = 1\}$. From the Bayesian viewpoint, $q_i$ may be interpreted as the prior probability that $X_i$ is the first shifted variate. Invariant tests with maximum local average power are derived for the case of known initial level in Section 2 and for the case of unknown initial level in Section 3. In both cases, the tests are distribution-free and they are unbiased for general classes of shift alternatives. They all depend upon the weight function $\{q_i\}$. With uniform weights, certain tests in Section 3 reduce to the standard tests for trend while a degenerate weight function leads to the usual two sample tests. In Section 4, we obtain the asymptotic distributions of the test statistics under local translation alternatives and investigate their Pitman efficiencies.
Some small sample powers for normal alternatives are given in Section 5.

103 citations

••

TL;DR: In this paper, a nonparametric control chart based on partial weighted sums of sequential ranks is proposed to detect the unknown change-point quickly without too many false alarms, and it is shown that the appropriately scaled and linearly interpolated graph of partial rank sums converges to a Brownian motion.

Abstract: We consider sequential observation of independent random variables $X_1,\cdots, X_N$ whose distribution changes from $F$ to $G$ after the first $\lbrack N\theta \rbrack$ variables. The object is to detect the unknown change-point quickly without too many false alarms. A nonparametric control chart based on partial weighted sums of sequential ranks is proposed. It is shown that if the change from $F$ to $G$ is small, then as $N \rightarrow \infty$, the appropriately scaled and linearly interpolated graph of partial rank sums converges to a Brownian motion on which a drift sets in at time $\theta$. Using this, the asymptotic performance of the one-sided control chart is compared with one based on partial sums of the $X$'s. Location change, scale change and contamination are considered. It is found that for distributions with heavy tails, the control chart based on ranks stops more frequently and faster than the one based on the $X$'s. The performance of the two procedures is also tested on simulated data and the outcomes are compatible with the theoretical results.
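To make the idea of a chart built on sequential ranks concrete, here is a rough sketch. The abstract does not give the paper's weights or stopping rule, so everything specific below is an assumption: each sequential rank $R_i$ (the rank of $X_i$ among $X_1, \cdots, X_i$) is standardized by its null mean $(i+1)/2$ and null variance $(i^2-1)/12$, which hold for independent observations from a common continuous distribution, and the partial sums are scaled by $\sqrt{N}$. The function names and the `threshold` parameter are hypothetical.

```python
import math


def sequential_ranks(x):
    """R_i = number of X_j with j <= i and X_j <= X_i (1-based rank among the first i)."""
    return [sum(1 for j in range(i + 1) if x[j] <= x[i]) for i in range(len(x))]


def scaled_partial_rank_sums(x):
    """Partial sums of standardized sequential ranks, divided by sqrt(N).

    The standardization used here is an assumption of this sketch,
    not the weighting from the paper.
    """
    n = len(x)
    ranks = sequential_ranks(x)
    sums, total = [], 0.0
    for i in range(2, n + 1):  # R_1 is always 1, so it carries no information
        r = ranks[i - 1]
        total += (r - (i + 1) / 2) / math.sqrt((i * i - 1) / 12)
        sums.append(total / math.sqrt(n))
    return sums


def first_alarm(x, threshold):
    """1-based index of the first observation at which the one-sided chart exceeds threshold."""
    for k, s in enumerate(scaled_partial_rank_sums(x), start=2):
        if s > threshold:
            return k
    return None  # no alarm raised within the sample
```

Because only ranks enter the statistic, the chart's null behaviour does not depend on the underlying continuous distribution, which is consistent with the abstract's point about heavy-tailed data.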

79 citations

••

TL;DR: In this paper, a two-sided procedure proposed by E. S. Page for detecting a change in the location of the distribution of a sequence of independent observations which are ordered in time is studied.

Abstract: We study a two-sided procedure proposed by E. S. Page for detecting a change in the location of the distribution of a sequence of independent observations which are ordered in time. We approximate the null distribution of Page's statistic and the power of his test for finite sequences. When the procedure is applied to an infinite sequence we approximate the average run length. In order to obtain these approximations we find the distribution function of the range of a Wiener process with drift and the Laplace transform of the time at which the range first exceeds some given value.

35 citations