Showing papers in "Annals of Mathematical Statistics in 1956"


Journal ArticleDOI
TL;DR: In this article, some aspects of the estimation of the density function of a univariate probability distribution are discussed, and the asymptotic mean square error of a particular class of estimates is evaluated.
Abstract: This note discusses some aspects of the estimation of the density function of a univariate probability distribution. All estimates of the density function satisfying relatively mild conditions are shown to be biased. The asymptotic mean square error of a particular class of estimates is evaluated.

4,284 citations
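
The abstract above does not say which class of estimates is evaluated. As a minimal sketch of a density estimate, assuming the simplest familiar choice (a symmetric difference quotient of the empirical distribution function), the following may help; the function names and the bandwidth `h` are illustrative and do not come from the paper.

```python
from bisect import bisect_right


def empirical_cdf(sorted_sample, x):
    """Fraction of observations less than or equal to x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def density_estimate(sample, x, h):
    """Difference-quotient estimate [F_n(x + h) - F_n(x - h)] / (2h).

    Assumption: this particular estimate is chosen for illustration only.
    """
    s = sorted(sample)
    return (empirical_cdf(s, x + h) - empirical_cdf(s, x - h)) / (2.0 * h)
```

For a fixed `h` the expected value of this quantity is an average of the density over the window around `x`, not the density at `x` itself, which gives some intuition for the bias the abstract mentions.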


Journal ArticleDOI
TL;DR: In this paper, a measure of the information provided by an experiment is introduced; it is derived from the work of Shannon and involves the knowledge prior to performing the experiment, expressed through a prior probability distribution over the parameter space.
Abstract: A measure is introduced of the information provided by an experiment. The measure is derived from the work of Shannon [10] and involves the knowledge prior to performing the experiment, expressed through a prior probability distribution over the parameter space. The measure is used to compare some pairs of experiments without reference to prior distributions; this method of comparison is contrasted with the methods discussed by Blackwell. Finally, the measure is applied to provide a solution to some problems of experimental design, where the object of experimentation is not to reach decisions but rather to gain knowledge about the world.

1,449 citations
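
A Shannon-based measure of this kind can be sketched for a finite parameter space and a finite set of outcomes as the expected reduction in entropy from prior to posterior, which equals the mutual information between parameter and observation. The discrete setting and the names `prior` and `likelihood` are assumptions made for this illustration.

```python
import math


def expected_information(prior, likelihood):
    """Expected information gain of an experiment, in nats.

    prior[i]         = p(theta_i)
    likelihood[i][k] = p(x_k | theta_i)
    Assumption: finite parameter and outcome spaces, for illustration only.
    """
    n_outcomes = len(likelihood[0])
    # Marginal probability of each outcome under the prior.
    marginal = [
        sum(prior[i] * likelihood[i][k] for i in range(len(prior)))
        for k in range(n_outcomes)
    ]
    total = 0.0
    for i, p_theta in enumerate(prior):
        for k in range(n_outcomes):
            joint = p_theta * likelihood[i][k]
            if joint > 0.0:
                total += joint * math.log(likelihood[i][k] / marginal[k])
    return total
```

An experiment that identifies one of two equally likely parameter values exactly yields log 2 nats, while an experiment whose outcome distribution does not depend on the parameter yields zero.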


Journal ArticleDOI
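TL;DR: In this paper, the authors consider the problem of consistently estimating the structural parameter $\theta$ (as $n \to \infty$) in the presence of incidental parameters $\alpha_i$; the chance variables and the parameters may be vectors but are assumed, for simplicity of exposition, to be scalars.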
Abstract: $\theta$ and $\alpha_i$. The parameter $\theta$, upon which all the distributions depend, is called "structural"; the parameters $\{\alpha_i\}$ are called "incidental". Throughout this paper we shall assume that the $X_{ij}$ are independently distributed when $\theta, \alpha_1, \cdots, \alpha_n$ are given, and shall consider the problem of consistently estimating $\theta$ (as $n \to \infty$). The chance variables $\{X_{ij}\}$ and the parameters $\theta$ and $\{\alpha_i\}$ may be vectors. However, for simplicity of exposition we shall throughout this paper, except in Example 2, assume that they are scalars. Obvious changes will suffice to treat the vector case. Very many interesting problems are subsumed under the above formulation. Among these is the following:

1,060 citations


Journal ArticleDOI
TL;DR: In this article, the authors prove the asymptotic minimax character of the sample distribution function (d.f.) for estimating an unknown d.f. for a wide variety of weight functions, including functions of the maximum deviation between the estimating and true d.f. and integrated weight functions.
Abstract: This paper is devoted, in the main, to proving the asymptotic minimax character of the sample distribution function (d.f.) for estimating an unknown d.f. in $\mathscr{F}$ or $\mathscr{F}_c$ (defined in Section 1) for a wide variety of weight functions. Section 1 contains definitions and a discussion of measurability considerations. Lemma 2 of Section 2 is an essential tool in our proofs and seems to be of interest per se; for example, it implies the convergence of the moment generating function of $G_n$ to that of $G$ (definitions in (2.1)). In Section 3 the asymptotic minimax character is proved for a fundamental class of weight functions which are functions of the maximum deviation between estimating and true d.f. In Section 4 a device (of more general applicability in decision theory) is employed which yields the asymptotic minimax result for a wide class of weight functions of this character as a consequence of the results of Section 3 for weight functions of the fundamental class. In Section 5 the asymptotic minimax character is proved for a class of integrated weight functions. A more general class of weight functions for which the asymptotic minimax character holds is discussed in Section 6. This includes weight functions for which the risk function of the sample d.f. is not a constant over $\mathscr{F}_c.$ Most weight functions of practical interest are included in the considerations of Sections 3 to 6. Section 6 also includes a discussion of multinomial estimation problems for which the asymptotic minimax character of the classical estimator is contained in our results. Finally, Section 7 includes a general discussion of minimization of symmetric convex or monotone functionals of symmetric random elements, with special consideration of the "tied-down" Wiener process, and with a heuristic proof of the results of Sections 3, 4, 5, and much of Section 6.

1,000 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that the asymptotic Pitman efficiency of the two-sample Wilcoxon test, relative to the t-test, never falls below 0.864, and that the same result holds for the Kruskal-Wallis test compared with the F-test and for testing the location parameter of a single symmetric distribution.
Abstract: Consider samples from continuous distributions F(x) and F(x - θ). We may test the hypothesis θ = 0 by using the two-sample Wilcoxon test. We show in Section 1 that its asymptotic Pitman efficiency, relative to the t-test, never falls below 0.864. This result also holds for the Kruskal-Wallis test compared with the F-test, and for testing the location parameter of a single symmetric distribution.

534 citations
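
For readers unfamiliar with the two-sample Wilcoxon test, its statistic can be sketched as follows. The sketch uses the equivalent Mann-Whitney count of pairs, with ties counted as one half; the function names are illustrative and are not taken from the paper.

```python
def mann_whitney_u(x, y):
    """Number of pairs (x_i, y_j) with x_i < y_j, counting ties as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def wilcoxon_rank_sum(x, y):
    """Sum of the (mid)ranks of the y-sample in the pooled sample.

    Uses the identity W = U + m(m + 1)/2, where m = len(y).
    """
    m = len(y)
    return mann_whitney_u(x, y) + m * (m + 1) / 2.0
```

Large values of either statistic suggest that the second sample is shifted to the right of the first, which is the alternative θ > 0 in the abstract's notation.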



Book ChapterDOI
TL;DR: In this paper, the author considers the problem of finding the maximum and the minimum of Eg(S), the expected value of a given real-valued function of the number of successes S in n independent trials, when ES = np is fixed, and shows that the variability in the number of successes is highest when the successes are equally probable.
Abstract: Let $S$ be the number of successes in $n$ independent trials, and let $p_j$ denote the probability of success in the $j$th trial, $j = 1, 2, \ldots, n$ (Poisson trials). We consider the problem of finding the maximum and the minimum of $Eg(S)$, the expected value of a given real-valued function of $S$, when $ES = np$ is fixed. It is well known that the maximum of the variance of $S$ is attained when $p_1 = p_2 = \cdots = p_n = p$. This can be interpreted as showing that the variability in the number of successes is highest when the successes are equally probable (Bernoulli trials). This interpretation is further supported by the following two theorems, proved in this paper. If $b$ and $c$ are two integers, $0 \leqq b \leqq np \leqq c \leqq n$, the probability $P(b \leqq S \leqq c)$ attains its minimum if and only if $p_1 = p_2 = \cdots = p_n = p$, unless $b = 0$ and $c = n$ (Theorem 5, a corollary of Theorem 4, which gives the maximum and the minimum of $P(S \leqq c)$). If $g$ is a strictly convex function, $Eg(S)$ attains its maximum if and only if $p_1 = p_2 = \cdots = p_n = p$ (Theorem 3). These results are obtained with the help of two theorems concerning the extrema of the expected value of an arbitrary function $g(S)$ under the condition $ES = np$. Theorem 1 gives necessary conditions for the maximum and the minimum of $Eg(S)$. Theorem 2 gives a partial characterization of the set of points at which an extremum is attained. Corollary 2.1 states that the maximum and the minimum are attained when $p_1, p_2, \ldots, p_n$ take on, at most, three different values, only one of which is distinct from 0 and 1. Applications of Theorems 3 and 5 to problems of estimation and testing are pointed out in Section 5.

377 citations
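
The variance statement and the interval statement in this abstract can be checked numerically on small cases. The sketch below computes the exact distribution of S for given success probabilities by convolution; the function names are illustrative and are not taken from the paper.

```python
def success_distribution(probs):
    """Exact distribution of the number of successes in independent trials.

    probs[j] is the success probability of trial j; the result dist[k] is P(S = k).
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # trial fails
            nxt[k + 1] += mass * p       # trial succeeds
        dist = nxt
    return dist


def variance(dist):
    """Variance of a distribution on 0, 1, ..., len(dist) - 1."""
    mean = sum(k * m for k, m in enumerate(dist))
    return sum((k - mean) ** 2 * m for k, m in enumerate(dist))


def interval_probability(dist, b, c):
    """P(b <= S <= c)."""
    return sum(dist[b:c + 1])
```

With n = 2 and ES = 1, the probabilities (0.5, 0.5) give variance 0.5 and P(S = 1) = 0.5, while (0.2, 0.8) give variance 0.32 and P(S = 1) = 0.68. This agrees with the equal-probability case having the largest variance and the smallest probability for the interval b = c = 1.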


Journal ArticleDOI
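TL;DR: In this paper, conditions on the distribution of the signs attached to the ordered absolute values of a sample are found that are useful in finding good tests of such null hypotheses as "$X_1, \cdots, X_N$ are independently and identically distributed symmetrically about zero" against such alternatives as slippage to the right; the Wilcoxon one-sample signed rank test is a typical nonparametric procedure used under these conditions.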
Abstract: The one-sample problem is considered using techniques developed earlier [2], [3]. Let $Z = (Z_1, \cdots, Z_N)$ be a random vector with $Z_i = 1(0)$ if the $i$th smallest in absolute value in a sample of $N$ from the density $f(x)$ is positive (negative). Then $$P(Z = z) = N! \int \cdots \int_{0 \leqq y_1 \leqq \cdots \leqq y_N \leqq \infty} \prod_{i=1}^N \lbrack f^{1-z_i}(-y_i) f^{z_i}(y_i) \, dy_i \rbrack.$$ Conditions are found implying $P(Z = z) > P(Z = z')$ where $z$ is derived from $z'$ by replacing a 0 by a 1, or interchanging a 0 and 1 in $z'$ by moving the 1 to the right. These conditions are met by the normal and other distributions. The results are useful in finding good tests of such null hypotheses as $X_1, \cdots, X_N$ are independently and identically distributed symmetrically about zero against such alternatives as slippage to the right. The Wilcoxon one sample signed rank test is a typical nonparametric procedure used under these conditions [4].

302 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present formulas for crossed and nested classifications, based on a model of sufficient generality and flexibility that the necessary assumptions concern only the selection of the levels of the factors and not the behavior of what is being experimented upon.
Abstract: 1. Summary. The assumptions appropriate to the application of analysis of variance to specific examples, and the effects of these assumptions on the resulting interpretations, are today a matter of very active discussion. Formulas for average values of mean squares play a central role in this problem, as do assumptions about interactions. This paper presents formulas for crossed (and, incidentally, for nested and for non-interacting completely randomized) classifications, based on a model of sufficient generality and flexibility that the necessary assumptions concern only the selection of the levels of the factors and not the behavior of what is being experimented upon. (This means, in particular, that the average response is an arbitrary function of the factors.) These formulas are not very complex, and specialize to the classical results for crossed and nested classifications, when appropriate restrictions are made. Complete randomization is only discussed for the elementary case of "no interactions with experimental units" and randomized blocks are not discussed. In discussion and proof, we give most space to the two-way classification with replication, basing our direct proof more closely on the proof independently obtained by Cornfield [17], than on the earlier proof by Tukey [20]. We also treat the three-way classification in detail. Results for the general factorial are also stated and proved. The relation of this paper to other recent work, published and unpublished, is discussed in Section 4 (average values of mean squares) and in Section 11 (various types of linear models). INITIAL DISCUSSION 2. Introduction. During the last years of the last decade it was relatively easy to believe that the analysis of variance was well understood. 
Eisenhart's summary article of 1947 [5], when combined with the work of Pitman [13] and Welch [15] on the randomization approach (work published in 1937-1938, which ever since has been far too much neglected), seemed to provide a simple, easily understandable account of the foundations. But as the years have passed, both statisticians and users of analysis of variance have gradually become aware of a number of areas in which we needed to deepen our understanding. One of these is the relation of formulas for average values of mean squares to assumptions. These are of central importance, since the choice of an "error term" as a basis for either

283 citations


Journal ArticleDOI
TL;DR: In this paper, Monte Carlo techniques are introduced, using stochastic models which are Markov processes, which are proved to converge with probability 1, and thus to yield direct statistical estimates of the solution to the $N$-dimensional Dirichlet problem.
Abstract: Monte Carlo techniques are introduced, using stochastic models which are Markov processes. This material includes the $N$-dimensional Spherical, General Spherical, and General Dirichlet Domain processes. These processes are proved to converge with probability 1, and thus to yield direct statistical estimates of the solution to the $N$-dimensional Dirichlet problem. The results are obtained without requiring any further restrictions on the boundary or the function defined on the boundary, in addition to those required for the existence and uniqueness of the solution to the Dirichlet problem. A detailed study is made for the $N$-dimensional Spherical process; this includes a study of the order of the average number of steps required for convergence. Asymptotic confidence intervals are obtained. When computing effort is measured in terms of the order of the average number of steps required for convergence, the often-made conjecture that the computing effort of a Monte Carlo procedure should be a linear function of the dimensionality of the problem is shown to be true for the cases considered. Comments are included regarding the application of these processes on digital computers, and truncation methods are suggested.

280 citations
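
A minimal sketch of a Monte Carlo estimate for the Dirichlet problem follows, assuming the Spherical process is the scheme now usually called walk on spheres. The domain (the unit disk in two dimensions), the stopping tolerance `eps`, and all function names are assumptions made for this illustration; `eps` plays the role of the truncation the abstract alludes to.

```python
import math
import random


def walk_on_spheres(start, boundary_value, rng, eps=1e-3):
    """One walk in the unit disk; returns the boundary value where it stops.

    From the current point, jump to a uniformly chosen point on the largest
    circle centred there that fits inside the disk. Stop once the point is
    within eps of the boundary (a truncation, assumed here).
    """
    x, y = start
    while True:
        r = 1.0 - math.hypot(x, y)          # distance to the boundary
        if r < eps:
            norm = math.hypot(x, y)
            return boundary_value(x / norm, y / norm)  # project onto boundary
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(angle)
        y += r * math.sin(angle)


def estimate_solution(start, boundary_value, n_walks=5000, seed=0):
    """Average of boundary values over independent walks from `start`."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(start, boundary_value, rng) for _ in range(n_walks))
    return total / n_walks
```

With boundary data g(x, y) = x the harmonic solution in the disk is u(x, y) = x, so `estimate_solution((0.3, 0.2), lambda bx, by: bx)` should come out close to 0.3, up to Monte Carlo error and the small bias introduced by `eps`.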


Journal ArticleDOI
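TL;DR: This paper unifies, extends, and strengthens earlier results on statistical decision problems for which the densities have a monotone likelihood ratio, including the result that, for one-sided composite testing in the exponential family, an admissible minimax procedure must be of the form: choose action 1 (accept the hypothesis) if $x < x_0$ and action 2 if $x > x_0$, with randomization possibly required if $x = x_0$.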
Abstract: In many statistical decision problems, the observations can be summarized in a single sufficient statistic such that the likelihood ratio for any two distributions in the family under consideration is a monotone function of that statistic. This paper assumes, accordingly, that the statistician's decision is to be based upon a single observation of a random variable $X$, whose distribution is given by (1) and satisfies the inequality (2) in Section 1. As examples of this family of distributions, we have the exponential family such as the normal, binomial, and Poisson. Other kinds of examples are given in Section 1. In connection with the ordinary testing problem, Allen [1] showed that for the composite testing problem of the one-sided type for the special case of the exponential family of distributions, an admissible minimax procedure must be of the form: choose action 1 (accept the hypothesis) if $x < x_0$ and action 2 if $x > x_0$. If $x = x_0$, randomization may be required. Sobel [2] and Chernoff obtained partial results for the same class of distributions when the set of decisions is finite. This paper unifies, extends, and strengthens these results and treats of a wide variety of statistical decision problems for which the densities have a monotone likelihood ratio. In Section 1 the fundamental definition and preliminaries are introduced. In particular, the conditions imposed on the loss functions and the densities are delimited and some simple properties of these quantities are developed. In Section 2 we establish some of the basic lemmas. Noteworthy are Lemmas 1 and 2 which express the variation of sign diminishing properties of the densities which possess a monotone likelihood ratio. The essential completeness of the set of all monotone strategies (see Section 3 for the definition) in the class of all statistical procedures is demonstrated in Section 3 for the case of a finite number of actions. 
Section 4 deals with the problem of determining the form of all Bayes strategies for the statistician. The important problem of admissibility is studied in detail in Section 5. In the next section a study of the Bayes strategies for nature is made for the case of two actions. In Section 7 the complete class theory is carried through for the case of an infinite number of actions. This is accomplished by employing an argument involving a limiting procedure from the case of finite actions as treated in Section 3. The eighth section presents an analysis of the nature of the Bayes strategies for the case of an infinite number of actions. The final section entails a brief discussion of the connection of invariance theory and the conditions of monotonicity as are required throughout this paper. Further extensions of these ideas in a different direction, which involves relaxing the conditions on the loss functions and strengthening the requirements on the densities, can be found in [3].




Journal ArticleDOI
TL;DR: In this article, the authors studied the general queueing process for the case where a limiting distribution exists and gave necessary and sufficient conditions for the finiteness of various moments in the process.
Abstract: The authors continue the study (initiated in [1]) of the general queueing process (arbitrary distributions of service time and time between successive arrivals, many servers) for the case $(\rho < 1)$ where a limiting distribution exists. They discuss convergence with probability one of the mean waiting time, mean queue length, mean busy period, etc. Necessary and sufficient conditions for the finiteness of various moments are given. These results have consequences for the theory of random walk, some of which are pointed out. This paper is self-contained and may be read independently of [1]; the necessary results of [1] are quoted. No previous knowledge of the theory of queues is required for reading either [1] or the present paper.

Journal ArticleDOI
TL;DR: In this article, basic expected values of order statistics and of products of order statistics are given to 10 decimal places (DP) for samples of size twenty and less, and certain other functions are tabulated to 25 DP to facilitate extension to larger sample sizes.
Abstract: Tables of the means, variances, and covariances, to five decimal places, of order statistics from samples of size ten or less have been given by Godwin [3]. In this paper basic expected values are given of order statistics and products of order statistics, for samples of size twenty and less to 10 decimal places (DP). In addition, certain other functions are tabulated to 25 DP to facilitate extension to larger sample sizes.


Journal ArticleDOI
TL;DR: In this article, a hypothesis on the structure of the probabilities in the different cells or categories of a multi-way contingency table is put forward, and a large sample test of this hypothesis in terms of $\chi^2$ is offered.
Abstract: In a situation in which the observations are frequencies in a multi-way contingency table such that the observations are supposed to be independent and it is only the total number that is supposed to be fixed from sample to sample, a hypothesis on the structure of the probabilities in the different cells or categories is put forward. This hypothesis, by a certain analogy with the customary terminology of analysis of variance, is defined to be the hypothesis of "no interaction" and a large sample test of this hypothesis in terms of $\chi^2$ is offered. Bartlett's results [1] for the case of a $2 \times 2 \times 2$ table and Norton's results [5] for the case of a $2 \times 2 \times t$ table formally turn out to be special cases of the results given here with these differences; (i) Bartlett's and Norton's results refer to "analysis of variance" situations, with marginal frequencies along at least two ways of the table being fixed, while in this paper, for reasons explained elsewhere [7], it is only the total $n$ that is held fixed. (ii) Bartlett's and Norton's papers do not give any indication of the mechanism behind the formulae for the hypothesis of "no interaction," while this paper attempts to give a definite mathematical (and perhaps also physical) mechanism behind the formulae.


Journal ArticleDOI
TL;DR: In this paper, a mixed model is proposed in which the problem of the appropriate assumptions to make about the joint distribution of the random main effects and interactions is solved by letting this joint distribution follow from more basic and "natural" assumptions about the cell means.
Abstract: 1. Summary. A "mixed model" is proposed in which the problem of the appropriate assumptions to make about the joint distribution of the random main effects and interactions is solved by letting this joint distribution follow from more basic and "natural" assumptions about the cell means. The expectations of the mean squares ordinarily calculated turn out, with suitable definition of the variance components, to have the same values as those usually found in more restrictive models, and some of the customary tests and confidence intervals are justified, but some aspects appear to be novel. For example, the over-all test found for the fixed main effects and the associated multiple-comparison method require Hotelling's $T^2$. 2. Introduction. We consider $K$ replications of a two-way layout with $I$ rows and $J$ columns ($I > 1$, $J > 1$, $K \geqq 1$), the rows corresponding to levels of a "Model I" [4] factor $A$, whose effects we wish to regard as fixed effects, and the columns corresponding to the levels of a "Model II" factor $B$, whose effects we wish to regard as random effects. We let $y_{ijk}$ denote the $k$th measurement in

Journal ArticleDOI
TL;DR: In this paper, a theorem based on a method of A. Birnbaum and E. Lehmann concerning the admissibility of certain tests of simple hypotheses in multivariate exponential families was proved.
Abstract: In Section 3 we shall prove a theorem based on a method of A. Birnbaum [1] and E. Lehmann concerning the admissibility of certain tests of simple hypotheses in multivariate exponential families. In Section 4 we compute the supporting hyperplanes of the convex acceptance region in some of the most common applications of Hotelling's $T^2$-test and show that the theorem of Section 3 implies the admissibility of this test. In Section 5 we point out some of the limitations of the method of this paper.

Journal ArticleDOI
TL;DR: In this article, a table is given to simplify the estimation of the parameters of an incomplete gamma or type-III distribution, and a new procedure is suggested for estimating the parameters of a truncated gamma distribution.
Abstract: A table is given to simplify the estimation of the parameters of an incomplete Gamma or type-III distribution. A new procedure is suggested for estimating the parameters of a truncated gamma distribution. The method is considered applicable for doubly truncated gamma distributions, singly or doubly truncated normal distributions, and a beta distribution with known range, either truncated or not. The technique is also considered useful in estimating the parameters of the normal curve for the case of systematic gaps in the observations. The method does not appear to be satisfactory for distributions with finite but unknown ranges.

Journal ArticleDOI
TL;DR: In this article, the problem of finding an exact solution for the probability distribution of the waiting-line length as a function of time is reduced to the solution of an integral equation of the Volterra type.
Abstract: Summary. Waiting-line or queuing processes of the Markov type are studied, the incoming traffic being of Poisson type and having negative-exponential holding time. The parameters are allowed to depend on time. The problem of finding an exact solution for the probability distribution of the waiting-line length as a function of time is reduced to the solution of an integral equation of the Volterra type. When the ratio of the parameters for the incoming and outgoing traffic is constant, this equation can be solved explicitly and the required distribution obtained. Using this solution, the behavior of the process for large values of $t$ is studied, particularly for the unstable case with traffic intensity > 1. Statement of the problem. We shall consider a Markov process $n(t)$ taking values in the discrete space of nonnegative integers $0, 1, 2, \cdots$, for which there exist nonnegative continuous functions $\lambda(t)$ and $\mu(t)$ satisfying

Journal ArticleDOI
TL;DR: The terminology of random interactions is defined and illustrated in Section 1. A little historical background not very familiar to statisticians is sketched in Section 2. In Section 3 some difficulties about the formulation of random interactions are discussed, and Section 4 deals with models reflecting a randomization in the experiment to assign the treatment combinations to finite populations of experimental units.
Abstract: The terminology is defined and illustrated in Section 1. A little historical background not very familiar to statisticians is sketched in Section 2. In Section 3 some difficulties about the formulation of random interactions are discussed. Section 4 deals with models reflecting a randomization in the experiment to assign the treatment combinations to finite populations of experimental units.

Journal ArticleDOI
TL;DR: In this article, the authors derived the asymptotic expansion of a percentage point of Hotelling's generalized $T^2_0$ distribution in terms of the corresponding percentage points of a $\chi^2$ distribution.
Abstract: In this paper the asymptotic expansion of a percentage point of Hotelling's generalized $T^2_0$ distribution is derived in terms of the corresponding percentage point of a $\chi^2$ distribution. Our result generalizes Hotelling's and Frankel's asymptotic expansion for the generalized Student $T$ [3], [4]. The technique used in this paper for obtaining the asymptotic expansion of $T^2_0$ is an extension of the previous methods of Welch [8] and of James [5], [6], who used them to solve the distribution problem of various statistics in connection with the Behrens-Fisher problem. An asymptotic formula for the cumulative distribution function (c.d.f.) of $T^2_0$ is also given together with an upper bound for the error committed when all but the first few terms are omitted in the series. This formula is a sort of multivariate analogue of Hartley's formula of "Studentization" [2].

Journal ArticleDOI
TL;DR: In this paper, a set of sufficient conditions for uniform convergence with probability one is given, and the results are applied to some statistical problems, e.g., to deduce the asymptotic behavior of certain estimates.
Abstract: asymptotic behavior we may deduce the asymptotic behavior of certain estimates. In many of these cases, it is sufficient to demonstrate uniform convergence with probability one of these functions. In this paper, a set of sufficient conditions for this is given, and we show how these results may be applied to some statistical problems. 1. Statement of the theorem. Let $X_1, \cdots, X_n, \cdots$ be a sequence of independent and identically distributed variables with values in an arbitrary space $X$. Let $T$ be a compact topological space, and let $f$ be a complex-valued function on $T \times X$, measurable in $x$ for each $t \in T$. Let $P$ be the common distribution of the $X_i$.

Journal ArticleDOI
TL;DR: In this article, the author investigates properties of the joint distribution of linearly transformed random variables and the power of several correlation tests against such alternatives, and finds that the efficiency of the medial correlation test depends strongly on local properties of the densities, which should invite caution.
Abstract: Let $F_{\lambda^0}$ denote the joint distribution of two independent random variables $Y_{\lambda^0}$ and $Z_{\lambda^0}$. The paper investigates properties of the joint distribution $F_\lambda$ of the linearly transformed random variables $Y_\lambda$ and $Z_\lambda$. Let $\Im_0$ be the Spearman rank correlation test, $\Im_1$ the difference sign correlation test, $\Im_2$ the unbiased grade correlation test (which is asymptotically equivalent to $\Im_0$), $\Im_3$ the medial correlation test, and $\mathcal{R}$ the ordinary (parametric) correlation test. (Whenever discussing $\mathcal{R}$ we assume existence of fourth moments.) Properties of the power of these tests are found for alternatives of the above-mentioned form, particularly for alternatives "close" to the hypothesis of independence and for large samples. Against these alternatives the efficiency of $\Im_3$ is found to depend strongly on local properties of the densities of $Y_{\lambda^0}$ and $Z_{\lambda^0}$, which should invite caution; and the efficiency of $\Im_1$ with respect to $\Im_0$ is often unity. Incidentally, Pitman's result on efficiency is extended in several directions.

Journal ArticleDOI
TL;DR: For some problems involving a parameter of interest and a nuisance parameter, it is possible to define a statistic sufficient for the parameter of interest; the definition has a number of applications in nonparametric theory, and it is used to prove that the sign test is a uniformly most powerful test for the nonparametric form of the single sample problem of location.
Abstract: For some problems involving a parameter of interest and a nuisance parameter, it is possible to define a statistic sufficient for the parameter of interest. The definition has a number of applications in nonparametric theory. Two theorems are derived and used by way of illustration to prove that the sign test is a uniformly most powerful test for the nonparametric form of the single sample problem of location.


Journal ArticleDOI
TL;DR: In this article, the authors extended Neyman's method to a more general case in which random vectors are dealt with and showed that under certain regularity conditions, the regular and consistent estimates obtained are asymptotically normal as the number of random vectors tends to infinity.
Abstract: This study was initiated in connection with estimating parameters involved in a certain stochastic process of population growth. Because of the nature of distribution functions arising in such studies, the usual methods of estimation result in formulas which are so complex that it is difficult, if not impossible, to obtain explicit solutions for the estimates of the parameters. Investigation of the problem led to an extension of the method of best asymptotically normal estimates developed by Neyman [1]. The estimates derived are termed regular best asymptotically normal estimates (RBAN estimates). This extension can be applied to other problems. In [1], Neyman considers a whole class of estimates which possess the properties of consistency, of asymptotic normality, and of asymptotic efficiency, and he provides estimates having these asymptotic properties for the case of multinomial distributions. His method is extended in the present paper to a more general case in which random vectors are dealt with. Such an extension was considered by Barankin and Gurland [2], who studied a large class of estimates and showed that if the distributions involved are members of Koopman's family, it is still possible to reach the Cramer-Rao lower bound. The purposes of the present paper are to discuss a subclass of the estimates considered by Barankin and Gurland and to present simple methods of generating such estimates. The estimates discussed are based on a number of independent random vectors whose distribution functions are not specified. It is proved that under certain regularity conditions, the regular and consistent estimates obtained are asymptotically normal as the number of random vectors tends to infinity. A necessary and sufficient condition for a regular and consistent estimate to have a "minimal" asymptotic covariance matrix is given. An expression is derived for the "minimal" asymptotic covariance matrix. 
It is also proved that if a function $\mathbf{f}$ satisfies certain conditions, then in order that $\mathbf{f}(\tilde\theta)$ be an RBAN estimate of $\mathbf{f}(\theta)$ at $\mathbf{f}(\theta^0)$, where $\theta^0$ is the true value of the parameter point $\theta$, it is necessary and sufficient that the argument $\tilde\theta$ be an RBAN estimate of $\theta$ at $\theta^0$. Methods of generating RBAN estimates are given. For simplicity of presentation, matrix notation is used throughout this paper. By derivatives of a matrix with respect to a vector (or with respect to a second matrix) is meant the derivatives of the matrix simultaneously with respect to all the components of the vector (or all the elements of the second matrix). The usual rules of differentiation with respect to vectors are used.