
Showing papers on "Mathematical statistics" published in 2019


Book
04 Dec 2019
TL;DR: This textbook introduces statistical methods for geography, covering descriptive statistics, probability, inferential statistics and regression analysis, together with a review of some probability theory.
Abstract: Contents: Preface to the Second Edition; Preface to the Third Edition; Introduction to Statistical Methods for Geography; Descriptive Statistics; Probability and Discrete Probability Distributions; Continuous Probability Distributions and Probability Models; Inferential Statistics: Confidence Intervals, Hypothesis Testing and Sampling; Analysis of Variance; Correlation; Introduction to Regression Analysis; More on Regression; Spatial Patterns; Some Spatial Aspects of Regression Analysis; Data Reduction: Factor Analysis and Cluster Analysis; Epilogue. Appendix A: Statistical Tables (Table A.1 Random Digits; Table A.2 Normal Distribution; Table A.3 Student's t-Distribution; Table A.4 Cumulative t-Distribution; Table A.5 F-Distribution; Table A.6 χ2 Distribution). Appendix B: Mathematical Conventions and Notation (B.1 Mathematical Conventions; B.2 Mathematical Notation). Appendix C: Review and Extension of Some Probability Theory (C.1 Expected Values; C.2 Variation of a Random Variable; C.3 Covariance of Random Variables).

171 citations



Book
01 Aug 2019
TL;DR: This book is a readable, digestible introduction to exponential families, encompassing statistical models based on the most useful distributions in statistical theory, including the normal, gamma, binomial, Poisson, and negative binomial.
Abstract: This book is a readable, digestible introduction to exponential families, encompassing statistical models based on the most useful distributions in statistical theory, including the normal, gamma, binomial, Poisson, and negative binomial. Strongly motivated by applications, it presents the essential theory and then demonstrates the theory's practical potential by connecting it with developments in areas like item response analysis, social network models, conditional independence and latent variable structures, and point process models. Extensions to incomplete data models and generalized linear models are also included. In addition, the author gives a concise account of the philosophy of Per Martin-Lof in order to connect statistical modelling with ideas in statistical physics, including Boltzmann's law. Written for graduate students and researchers with a background in basic statistical inference, the book includes a vast set of examples demonstrating models for applications and exercises embedded within the text as well as at the ends of chapters.

30 citations


01 Jan 2019
TL;DR: A bibliographic record of a paper on using the methods of mathematical statistics in sports and educational research, by N. Byshevets, L. Denysova, O. Shynkaruk, K. Serhiyenko, V. Usychenko, O. Stepanenko and I. Syvash.
Abstract: Using the methods of mathematical statistics in sports and educational research / N. Byshevets, L. Denysova, O. Shynkaruk, K. Serhiyenko, V. Usychenko, O. Stepanenko, I. Syvash // Journal of Physical Education and Sport (JPES). - 2019. - Vol. 19 (Supp. iss. 3), art. 148. - P. 1030-1034.

19 citations


Journal ArticleDOI
01 Jan 2019
TL;DR: In this article, a new compound continuous distribution, the Gompertz-Fréchet distribution, which extends the Fréchet distribution, was developed, and its various statistical properties were analyzed.
Abstract: In this paper, a new compound continuous distribution named the Gompertz-Fréchet distribution, which extends the Fréchet distribution, was developed. Its various statistical properties were a...

14 citations


Book
31 Jan 2019
TL;DR: In this paper, the authors present an up-to-date, comprehensive coverage of stochastic dominance and its related concepts in a unified framework, including inferential methods and applications, citing and summarizing various empirical studies in order to relate the econometric methods with real applications.
Abstract: This book offers an up-to-date, comprehensive coverage of stochastic dominance and its related concepts in a unified framework. A method for ordering probability distributions, stochastic dominance has grown in importance recently as a way to measure comparisons in welfare economics, inequality studies, health economics, insurance, wages, and trade patterns. Whang pays particular attention to inferential methods and applications, citing and summarizing various empirical studies in order to relate the econometric methods with real applications and using computer codes to enable the practical implementation of these methods. Intuitive explanations throughout the book ensure that readers understand the basic technical tools of stochastic dominance.

13 citations


Journal ArticleDOI
Ambrose Lo1
TL;DR: Calculating the expected values of different types of random variables is a central topic in mathematical statistics; this article is targeted toward students and instructors in both introductory probability and statistics courses.
Abstract: Calculating the expected values of different types of random variables is a central topic in mathematical statistics. Targeted toward students and instructors in both introductory probability and s...

11 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied geometric properties of the maximum likelihood estimator (MLE) for weighted samples, and showed that every regular subdivision arises in the MLE for some set of weights with positive probability, but coarser subdivisions appear to be more likely to arise than finer ones.
Abstract: Shape-constrained density estimation is an important topic in mathematical statistics. We focus on densities on $$\mathbb {R}^d$$ that are log-concave, and we study geometric properties of the maximum likelihood estimator (MLE) for weighted samples. Cule, Samworth, and Stewart showed that the logarithm of the optimal log-concave density is piecewise linear and supported on a regular subdivision of the samples. This defines a map from the space of weights to the set of regular subdivisions of the samples, i.e. the face poset of their secondary polytope. We prove that this map is surjective. In fact, every regular subdivision arises in the MLE for some set of weights with positive probability, but coarser subdivisions appear to be more likely to arise than finer ones. To quantify these results, we introduce a continuous version of the secondary polytope, whose dual we name the Samworth body. This article establishes a new link between geometric combinatorics and nonparametric statistics, and it suggests numerous open problems.

11 citations


Journal ArticleDOI
TL;DR: A weighted configuration model graph is introduced, where edge weights correspond to the probability of infection in an epidemic on the graph, and copulas can produce results that are similar to those of the empirical degree distributions, indicating that in some cases a copula is a viable alternative to using the full empirical data.
Abstract: We introduce a weighted configuration model graph, where edge weights correspond to the probability of infection in an epidemic on the graph. On these graphs, we study the development of a Susceptible–Infectious–Recovered epidemic using both Reed–Frost and Markovian settings. For the special case of having two different edge types, we determine the basic reproduction number R0, the probability of a major outbreak, and the relative final size of a major outbreak. Results are compared with those for a calibrated unweighted graph. The degree distributions are based on both theoretical constructs and empirical network data. In addition, bivariate standard normal copulas are used to model the dependence between the degrees of the two edge types, allowing for modeling the correlation between edge types over a wide range. Among the results are that the weighted graph produces much richer results than the unweighted graph. Also, while R0 always increases with increasing correlation between the two degrees, this is not necessarily true for the probability of a major outbreak nor for the relative final size of a major outbreak. When using copulas we see that these can produce results that are similar to those of the empirical degree distributions, indicating that in some cases a copula is a viable alternative to using the full empirical data.
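The copula step described in this abstract, coupling the degrees of two edge types through a bivariate standard normal (Gaussian) copula, can be sketched in a few lines. This is a minimal illustration, not the paper's calibration: the Poisson marginals, their means and the function name are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import norm, poisson

def correlated_degrees(n, rho, mu1=3.0, mu2=5.0, seed=0):
    """Draw n pairs of node degrees for two edge types whose dependence
    is modeled by a bivariate standard normal (Gaussian) copula.
    Marginals are Poisson(mu1), Poisson(mu2): illustrative choices only."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    u = norm.cdf(z)                     # copula step: correlated normals -> uniforms
    d1 = poisson.ppf(u[:, 0], mu1)     # uniforms -> degree marginals
    d2 = poisson.ppf(u[:, 1], mu2)
    return d1.astype(int), d2.astype(int)

d1, d2 = correlated_degrees(20000, rho=0.8)
print(np.corrcoef(d1, d2)[0, 1])       # strongly positive, somewhat below 0.8
```

Varying `rho` across its full range moves the empirical degree correlation with it, which is the "wide range of correlation" the abstract refers to.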

11 citations




Journal ArticleDOI
TL;DR: The main indicator of the quality of an aircraft pilotage system is flight safety. In accordance with the acceptable-risk concept, safety requirements are stated in probabilistic form, and the corresponding acceptable risk values can be very small, on the order of magnitude 10−6–10−8.
Abstract: The main indicator of the quality of functioning of an aircraft pilotage system is flight safety. Thereby, in accordance with the acceptable-risk concept, safety requirements are stated in probabilistic form, and the corresponding acceptable risk values can be very small, on the order of magnitude 10−6–10−8. Such requirements on the precision characteristics of vertical separation and automatic landing of civil aircraft apply to flights in all weather conditions, and they must be confirmed during aircraft certification. The only feasible way to confirm requirements involving such small risks is statistical mathematical modeling over a vast range of disturbing factors. Mathematical modeling is an inseparable part of aircraft piloting system development and is accepted as legitimate after appropriate verification, which guarantees adequate modeling of aircraft movement. The most complete information about the probabilistic characteristics of aircraft movement is contained in their probability distribution laws. Classical mathematical statistics offers no investigation of very small and very large probabilities, so new methods need to be developed for safety problems that require examining the "tails" of distributions; one such method is presented in this paper. In statistical modeling of aircraft movement over a vast range of disturbing factors, a "breach" typically appears: an alteration of the distribution law of the analyzed parameters. The cause of the breach is the realization, over a large number of modeling runs, of unlikely extreme values of the disturbing random factors and their combinations, which drive the corresponding systems into regions of nonlinearity.
It is noted that the breach effect is characteristic not only of aircraft piloting problems but also of a vast range of problems in other applied areas, such as ecology and quality management (the Six Sigma methodology).

Journal ArticleDOI
TL;DR: A statistics-Python programming convergence curriculum was developed and applied, based on the idea that convergence education combining information and mathematics, programming and statistical literacy, is needed in line with current trends.
Abstract: The Ministry of Education (2015) announced the "2015 Revised Curriculum for Elementary and Secondary Schools", stating that SW (software) training to develop Computational Thinking in elementary and junior high school students would be introduced gradually from 2018. In addition, 'problem solving' and 'programming' have become important areas, and the ability to analyze and utilize big data is increasingly emphasized. We developed and applied a statistics-Python programming convergence curriculum based on the idea that convergence education combining information and mathematics, programming and statistical literacy, is needed in line with these trends. Before and after the experiment, a problem-solving ability test and programming/mathematical interest tests were administered and compared using a paired-samples t-test. According to the analysis results, there were significant differences between the pre- and post-tests in problem-solving ability, programming interest and mathematical interest at the significance level of 0.05.
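The pre/post comparison described in this abstract can be reproduced in miniature with a paired-samples t-test in Python. The scores below are made up for illustration; the study's actual data are not given in the abstract.

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post problem-solving scores for the same ten students
# (invented numbers standing in for the study's data).
pre  = np.array([62, 71, 55, 68, 74, 60, 66, 59, 70, 63])
post = np.array([70, 78, 61, 75, 80, 65, 73, 64, 77, 69])

t, p = stats.ttest_rel(post, pre)   # paired-samples t-test on the differences
print(f"t = {t:.2f}, p = {p:.4f}")
if p < 0.05:
    print("significant difference at the 0.05 level")
```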



Journal ArticleDOI
TL;DR: In this article, two new goodness-of-fit tests for the composite hypothesis of belonging to the logistic family with unknown location parameter against the general alternatives have been proposed: the integral and the Kolmogorov-type.
Abstract: The logistic family of distributions belongs to the class of important families in the theory of probability and mathematical statistics. However, the goodness-of-fit tests for the composite hypothesis of belonging to the logistic family with unknown location parameter against the general alternatives have not been sufficiently explored. We propose two new goodness-of-fit tests: the integral and the Kolmogorov-type, based on the recent characterization of the logistic family by Hua and Lin. Here we discuss asymptotic properties of new tests and calculate their Bahadur efficiency for common alternatives.

Proceedings ArticleDOI
17 Dec 2019
TL;DR: The article addresses the problem of reducing the acoustic pollution produced by rail transport, proposing to ensure acoustic rail safety through physical modeling of the dynamics of railway acoustic pollution, followed by a mathematical description of this process based on the methods of probability theory and mathematical statistics.
Abstract: The article is devoted to solving a pressing problem: reducing the acoustic pollution produced by rail transport. The purpose is to ensure acoustic rail safety through physical modeling of the dynamics of acoustic pollution of the railway process environment, followed by a mathematical description of this process on the basis of methods of probability theory and mathematical statistics. The research methods were based on analytical generalization of scientific and technical results of dynamic-systems research, on the basic provisions of systems analysis, theory and simulation of technical systems, and on the methods of mathematical statistics and probability theory. A sound wave changes its physical characteristics at each stage of its dynamic interaction with physical objects. Therefore, the essence of the physical dynamics of the acoustic pollution caused by rail can be expressed in a mathematical relationship characterizing the probability of the process's realization. The resulting mathematical description of the physical nature of the dynamics of railway acoustic pollution provides a scientific foundation for technical solutions aimed at reducing acoustic discomfort in populated areas adjacent to railways.

Journal ArticleDOI
TL;DR: Applying the methods of mathematical statistics, as a way of implementing the principle of "minimum initial information, maximum justified generalizations" when analyzing the results of supervisory and control measures in land management, makes it possible to reach non-trivial conclusions.
Abstract: Applying the methods of mathematical statistics, as a way of implementing the principle of "minimum initial information, maximum justified generalizations" when analyzing the results of supervisory and control measures in land management, makes it possible to reach non-trivial conclusions. Significant correlations were identified between indicators of state land supervision and general characteristics of the subjects of the Siberian Federal District (area, population and population density). A high number of eliminated violations is possible only where a large number of violations is revealed, yet in 33% of the areas with revealed violations only 35–45% of them are eliminated. The amount of fines imposed correlates closely with the amount of fines collected (r = 0.95), though with a low share, 65%, of the money actually collected. Fines for violating land legislation do not play a decisive role in forming the budgets of the Siberian Federal District's subjects. Indicators of state land supervision are closely connected with the area of settlement lands; these data are system-forming in the elimination of land-law violations, and revenues to local budgets are likewise formed in proportion to the availability of these lands. To increase the effectiveness of state land supervision, in terms of the number of eliminated violations, the area of land with eliminated violations and the amount of fines collected, it is necessary to focus on bringing the case for each violation to a logical conclusion.

Journal ArticleDOI
31 Jul 2019-Symmetry
TL;DR: A new consecutive nonparametric method of adaptive pendular truncation is suggested for outlier detection and selection in sodar data and is implemented in a censoring algorithm.
Abstract: Statistical analysis of the results of minisodar measurements of vertical profiles of wind velocity components in a 5–200 m layer of the atmosphere shows that this problem belongs to the class of robust nonparametric problems of mathematical statistics. In this work, a new consecutive nonparametric method of adaptive pendular truncation is suggested for outlier detection and selection in sodar data. The method is implemented in a censoring algorithm. The efficiency of the suggested algorithm is tested in numerical experiments. The algorithm has been used to calculate statistical characteristics of wind velocity components, including vertical profiles of the first four moments, the correlation coefficient, and the autocorrelation and structure functions of wind velocity components. The results obtained are compared with classical sample estimates.

01 Jan 2019
TL;DR: This thesis provides the first general framework for establishing the optimal tradeoff between false positives and false negatives and proves some of the first non-asymptotic results available on this topic – initially in the context of Gaussian-like test statistics.
Abstract: Author(s): Rabinovich, Maxim | Advisor(s): Jordan, Michael I | Abstract: With the greater adoption of statistical and machine learning methods across science and industry, a greater awareness of the need to align statistical theory ever more closely with the demands of applications is developing. One recurring theme within this process is the re-examination of basic questions and core assumptions through the lens of modern mathematical statistics. This thesis targets two such basic questions in two different contexts: posterior simulation using Markov chain Monte Carlo (MCMC), on the one hand; and multiple hypothesis testing, on the other. For MCMC, we analyze convergence in terms of the expectations of a limited number of query functions, rather than the entire posterior. We show both theoretically and via simulations that the resultant theory predicts the required chain length more sharply than global convergence criteria. Furthermore, we provide matching lower bounds that show our bounds are essentially optimal in their dependence on chain parameters and target accuracy. For multiple testing, we provide the first general framework for establishing the optimal tradeoff between false positives (measured by the False Discovery Rate, or FDR) and false negatives (measured by the False Non-discovery rate, or FNR). We begin by proving some of the first non-asymptotic results available on this topic – initially in the context of Gaussian-like test statistics. We then go on to develop the more general framework. The latter applies to test statistics with essentially arbitrary analytic forms and dependence, recovers previous results as special cases, and yields numerically simulable lower bounds that can be evaluated in almost any model of interest.

Journal ArticleDOI
TL;DR: In this paper, the authors study the asymptotic properties of the standard statistical test (the t-test for the correlation coefficient) for verifying the significance of the Pearson correlation coefficient between random variables x and y.
Abstract: This paper is devoted to studying the asymptotic properties of the standard statistical test (sometimes called the t-test for the correlation coefficient) for verifying the hypothesis that the Pearson correlation coefficient between random variables x and y is significant. Although this test has been substantiated only under the assumption of a Gaussian joint distribution of x and y, it is very widely used and incorporated in most statistical packages. The Gaussian assumption usually fails in practice, however, so the problem arises of describing the applicability region of the t-test for large sample sizes. It is proven in this work that the test is asymptotically exact for independent x and y when certain additional conditions are met, whereas a simple lack of correlation may be insufficient for this property. In addition, an asymptotically exact and consistent test is constructed for the case without independence, and computational experiments argue for its applicability in practice. Moreover, these results are extended, with corresponding modifications, to the partial correlation coefficient.
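The test under discussion is the classical t-statistic for a Pearson correlation, t = r·sqrt(n-2)/sqrt(1-r²), referred to a t distribution with n-2 degrees of freedom. A minimal sketch follows; the data-generating setup is an illustrative assumption, not the paper's.

```python
import numpy as np
from scipy import stats

def corr_t_test(x, y):
    """Classical t-test for H0: Pearson correlation = 0.
    Exact only under joint Gaussianity; the paper studies when it
    remains asymptotically valid without that assumption."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt(n - 2) / np.sqrt(1.0 - r * r)
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)   # two-sided p-value
    return r, t, p

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(size=500)           # genuinely correlated pair
r, t, p = corr_t_test(x, y)
print(f"r = {r:.3f}, t = {t:.2f}, p = {p:.2e}")
```

For Gaussian data this reproduces the p-value that `scipy.stats.pearsonr` reports; the paper's point is what happens when the Gaussian assumption is dropped.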

Journal ArticleDOI
TL;DR: In this article, the authors estimate the derivative of the regression function in fixed-design nonparametric regression and establish the almost sure convergence as well as the asymptotic normality of their estimate.
Abstract: This paper is devoted to the estimation of the derivative of the regression function in fixed-design nonparametric regression. We establish the almost sure convergence as well as the asymptotic normality of our estimate. We also provide concentration inequalities which are useful for small sample sizes. Numerical experiments on simulated data show that our nonparametric statistical procedure performs very well. We also illustrate our approach on high-frequency environmental data for the study of marine pollution.
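As a rough illustration of derivative estimation in fixed-design nonparametric regression, a weighted local linear fit can be used: the fitted slope at a point estimates the derivative there. This generic textbook estimator, its kernel and its bandwidth are assumptions for the sketch, not the paper's procedure.

```python
import numpy as np

def local_linear_derivative(x, y, x0, h):
    """Estimate m'(x0) for y_i = m(x_i) + noise on a fixed design via a
    kernel-weighted local linear fit; the slope coefficient is the estimate."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # intercept + local slope
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[1]                                  # slope = derivative estimate

x = np.linspace(0.0, 1.0, 400)                      # fixed design
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=400)
est = local_linear_derivative(x, y, x0=0.5, h=0.05)
print(est)   # true derivative at 0.5 is 2*pi*cos(pi) = -2*pi ≈ -6.28
```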

Journal ArticleDOI
29 Jan 2019
TL;DR: In this article, the authors argue that the classical course in probability theory and mathematical statistics for the "Mathematics" and "Informatics" specialties should be supplemented by a computer workshop in which students master tools for statistical information processing by means of computer technologies; such tools greatly facilitate and accelerate the calculation of statistical indicators, the compilation of statistical tables and plotting, and expand the possibilities for analysis and visual representation of statistical data.
Abstract: The article shows the need to supplement the classical course in probability theory and mathematical statistics for the "Mathematics" and "Informatics" specialties with a computer workshop aimed at students' mastering tools for statistical information processing by means of computer technologies. The use of such tools greatly facilitates and accelerates the calculation of statistical indicators, the compilation of statistical tables and plotting, and also expands the possibilities for analysis and visual representation of statistical data. Using one laboratory work as an example, the article describes possible ways for students to acquire practical skills and experience in this direction.


Journal ArticleDOI
TL;DR: In this paper, a method of refined point estimates of the parameters of a probability distribution from a limited amount of statistical data is developed; it identifies the most informative data channel and obtains an accurate estimate from it, the selection being performed by a so-called mechanism of reducers of degrees of freedom.
Abstract: A method of refined point estimates of a parameter of the probability distribution of a random variable based on a limited amount of statistical data has been developed; it identifies the most informative data channel and obtains an accurate estimate from it. Data analysis and processing are carried out with well-known methods of probability theory and mathematical statistics, in which considerable theoretical and practical experience has been accumulated. The mathematical model describing the state of an object, process or phenomenon is presented as point estimates of the parameter of the probability distribution of a random variable, whose values are obtained from a small sample. The traditional approach identifies the most informative data channel about the state of the object or the progress of the process or phenomenon and cuts off the other, less reliable, channels. This is done by the so-called mechanism of reducers of degrees of freedom, whose main drawback is that the cut-off communication channels may carry useful information that then does not participate in forming an agreed solution. It is therefore necessary to introduce mechanisms of discriminators of degrees of freedom, which allow all data channels to participate in preparing the solution with a significance corresponding to the level of their informativeness in the current situation. An illustrative example of the considered data-averaging methods is given, showing per-iteration calculation results for mechanisms implemented both as reducers and as discriminators of degrees of freedom. These mechanisms reflect the peculiarities of implementing iterative algorithms both for the methods of mathematical statistics and for a synergistic system of data averaging.

Journal Article
TL;DR: In this article, the dimensions of 100 coffee beans of the Arabica and Robusta varieties were determined by measuring their length, width and thickness, and the results of the measurements were processed by the methods of mathematical statistics.
Abstract: The dimensions of 100 randomly selected coffee beans of the Arabica and Robusta varieties were determined by measuring their length (l), width (b) and thickness (h). The measurement results were processed by the methods of mathematical statistics, and the distribution parameters of the individual dimensions, treated as random variables, were determined. Based on the values of the coefficient of variation, the normal (Gaussian) density function was adopted as a model for the individual bean dimensions. Models of the two-dimensional distributions of the bean dimensions as independent random variables are presented. The correlation coefficients between the geometric dimensions of the beans were calculated; their values indicate that the dimensions should be treated as dependent random variables. Mathematical models of the geometric dimensions as dependent random variables, in the form of their joint normal density functions, are therefore proposed. Using the sums of squared deviations as a fitting criterion, it was established that the models treating the dimensions as dependent random variables approximate the data better than the models treating them as independent.
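The model comparison described in this abstract, a dependent (full-covariance) versus an independent (diagonal-covariance) bivariate normal, can be sketched on synthetic data. The numbers below are invented stand-ins for the bean measurements, and the comparison here uses log-likelihood rather than the paper's sums of squared deviations.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic "length x width" data with positive correlation, standing in
# for the measured bean dimensions (the paper's raw data are not given).
rng = np.random.default_rng(2)
lw = rng.multivariate_normal([10.0, 7.0], [[0.25, 0.15], [0.15, 0.16]], size=100)

mean = lw.mean(axis=0)
cov_dep = np.cov(lw, rowvar=False)        # dependent model: full covariance
cov_ind = np.diag(np.diag(cov_dep))       # independent model: correlation zeroed

ll_dep = multivariate_normal(mean, cov_dep).logpdf(lw).sum()
ll_ind = multivariate_normal(mean, cov_ind).logpdf(lw).sum()
r = cov_dep[0, 1] / np.sqrt(cov_dep[0, 0] * cov_dep[1, 1])
print(f"r = {r:.2f}, logL dependent = {ll_dep:.1f}, independent = {ll_ind:.1f}")
```

Whenever the sample correlation is nonzero, the dependent model fits at least as well, which mirrors the abstract's conclusion that the joint (dependent) normal models approximate the data better.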

Book ChapterDOI
01 Jan 2019
TL;DR: The focus in this survey is on nonparametric approaches for estimation and inference, with particular emphasis on inference since nothing can be learned from estimation without inference.
Abstract: A rich literature on the analysis of efficiency in production has developed since pioneering work of Tjalling Koopmans and George Debreu in the 1950s. This literature includes work by researchers in economics, econometrics, management science, operations research, mathematical statistics and other fields. The focus in this survey is on nonparametric approaches for estimation and inference, with particular emphasis on inference since nothing can be learned from estimation without inference. The statistical problem amounts to estimating the support of a multivariate random variable, subject to some shape constraints, in multiple dimensions. New results that enable inference about mean efficiency as well as tests of convexity, returns to scale, differences in mean efficiency and the separability condition described by Simar and Wilson (Journal of Econometrics 136:31–64, 2007) are discussed. The well-known curse of dimensionality presents additional challenges, but recent work indicates that reducing dimensionality using eigensystem techniques may improve estimation accuracy. Remaining challenges and open issues are also discussed.

Journal ArticleDOI
TL;DR: A separation between the academic subjects statistics and mathematical statistics has existed in Sweden almost as long as there have been statistics professors as discussed by the authors. But the same distinction has not been maintained in other countries.
Abstract: A separation between the academic subjects statistics and mathematical statistics has existed in Sweden almost as long as there have been statistics professors. The same distinction has not been maintained in other countries. Why has it been kept for so long in Sweden, and what consequences may it have had? In May 2015, it was 100 years since Mathematical Statistics was formally established as an academic discipline at a Swedish university where Statistics had existed since the turn of the century. We give an account of the debate in Lund and elsewhere about this division during the first decades after 1900 and present two of its leading personalities. The Lund University astronomer (and mathematical statistician) C. V. L. Charlier was a leading proponent for a position in mathematical statistics at the university. Charlier's adversary in the debate was Pontus Fahlbeck, professor in political science and statistics, who reserved the word statistics for 'statistics as a social science'. Charlier not only secured the first academic position in Sweden in mathematical statistics for his former PhD student Sven Wicksell but also demonstrated that a mathematical statistician can be influential in matters of state, finance as well as in different natural sciences. Fahlbeck saw mathematical statistics as a set of tools that sometimes could be useful in his brand of statistics. After a summary of the organisational, educational and scientific growth of the statistical sciences in Sweden that has taken place during the last 50 years, we discuss what effects the Charlier-Fahlbeck divergence might have had on this development.

Book ChapterDOI
10 Jun 2019
TL;DR: This paper presents a simplified approach producing a set of clear enough equations and indicators, which are helpful for engineers during preliminary estimation of the potential increase of accuracy from the known interconnections between the measured quantities.
Abstract: Increasing measurement accuracy is always a relevant goal. Applied to cyberphysical systems, it helps to improve their operation and the quality of decision-making. One way to achieve this is to use relations between the quantities to be measured, if such relationships exist and are known at least approximately. At present, few published articles describe metrological applications that use this kind of information about the measured quantities to obtain better accuracy. The small number of practical realizations seems due to the need for rather sophisticated mathematical approaches based on probability theory and mathematical statistics, along with numerous simulations, to draw a conclusion about the potential increase of accuracy. This paper presents a simplified approach producing a set of sufficiently clear equations and indicators that help engineers make a preliminary estimate of the potential increase of accuracy from known interconnections between the measured quantities. A case of linear dependency between the measured quantities is analyzed to show how the approach works.
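A minimal sketch of the underlying idea: if two measured quantities obey a known linear relation y = a·x and both are read with independent noise, inverse-variance fusion of the two readings estimates x with lower variance than either reading alone. All numbers and the linear relation are assumptions for the example, not taken from the paper.

```python
import numpy as np

# Two instruments: one reads x directly, the other reads y = a * x.
a, sigma1, sigma2 = 2.0, 0.10, 0.12
x_true, n = 5.0, 100_000
rng = np.random.default_rng(3)

m1 = x_true + sigma1 * rng.normal(size=n)        # direct reading of x
m2 = a * x_true + sigma2 * rng.normal(size=n)    # reading of y = a*x

# Inverse-variance weighting: m2/a is an estimate of x with std sigma2/a.
w1 = 1.0 / sigma1**2
w2 = 1.0 / (sigma2 / a) ** 2
x_fused = (w1 * m1 + w2 * (m2 / a)) / (w1 + w2)

print(m1.std(), (m2 / a).std(), x_fused.std())   # fused spread is the smallest
```

The fused standard deviation is 1/sqrt(w1 + w2), always below that of either single reading, which is the kind of accuracy gain from known interconnections that the paper quantifies.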

Journal ArticleDOI
TL;DR: In this article, it was shown that under some conditions the sum of squares of these centered and normalized random variables converges in distribution to a χ2 random variable with K degrees of freedom.
Abstract: We consider random variables that are the numbers of particles in the first K cells of a non-homogeneous scheme of allocating distinguishable particles to different cells, where K is a fixed number. It is proved that, under some conditions, the sum of squares of these centered and normalized random variables converges in distribution to a χ2 random variable with K degrees of freedom, and that sums of these random variables, centered and normalized, converge in distribution to a Gaussian random variable with mean 0 and variance 1. The method of proof is based on Kolchin's representation of the scheme of allocating distinguishable particles to different cells. We give applications of these results to mathematical statistics: we consider an analogue of the χ2 test and an S-criterion.
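The flavor of the χ2 limit described above can be checked by Monte Carlo simulation. The sketch below uses equal cell probabilities for simplicity (the homogeneous case), whereas the paper treats the non-homogeneous scheme; all constants are illustrative choices.

```python
import numpy as np

# Allocate n particles into N cells, look at the first K cell counts,
# and form the centered/normalized sum of squares. Repeated many times,
# its distribution should be close to chi-square with K degrees of freedom.
n, N, K, reps = 100_000, 1_000, 5, 2_000
rng = np.random.default_rng(4)
lam = n / N                                   # expected count per cell

S = np.empty(reps)
for i in range(reps):
    counts = rng.multinomial(n, np.full(N, 1.0 / N))[:K]
    S[i] = np.sum((counts - lam) ** 2 / lam)

print(S.mean())   # near K = 5, the mean of a chi-square with K d.o.f.
```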