Showing papers in "arXiv: Data Analysis, Statistics and Probability in 2002"


Journal ArticleDOI
TL;DR: In this article, the authors developed a method for the multifractal characterization of nonstationary time series, which is based on a generalization of the detrended fluctuation analysis (DFA).
Abstract: We develop a method for the multifractal characterization of nonstationary time series, which is based on a generalization of the detrended fluctuation analysis (DFA). We relate our multifractal DFA method to the standard partition function-based multifractal formalism, and prove that both approaches are equivalent for stationary signals with compact support. By analyzing several examples we show that the new method can reliably determine the multifractal scaling behavior of time series. By comparing the multifractal DFA results for original series to those for shuffled series we can distinguish multifractality due to long-range correlations from multifractality due to a broad probability density function. We also compare our results with the wavelet transform modulus maxima (WTMM) method, and show that the results are equivalent.

1,891 citations
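
The abstract above describes multifractal DFA only in words. As a rough illustration, the sketch below computes a q-th order fluctuation function in the form commonly used for this kind of analysis: integrate the mean-subtracted series into a profile, cut it into non-overlapping windows of length s, remove a least-squares trend from each window, and take a q-dependent average of the window variances. The linear detrending, the single forward pass over windows, and all function names are assumptions made for brevity, not details taken from this listing.

```python
import math
import random


def profile(x):
    """Cumulative sum of the mean-subtracted series."""
    mean = sum(x) / len(x)
    out, total = [], 0.0
    for value in x:
        total += value - mean
        out.append(total)
    return out


def detrended_variance(segment):
    """Mean squared residual after removing a least-squares straight line."""
    n = len(segment)
    t_mean = (n - 1) / 2.0
    y_mean = sum(segment) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment))
    slope = sxy / sxx
    return sum((y - y_mean - slope * (t - t_mean)) ** 2
               for t, y in enumerate(segment)) / n


def fluctuation(x, s, q):
    """q-th order fluctuation function at window length s (use s >= 3)."""
    y = profile(x)
    n_seg = len(y) // s
    f2 = [detrended_variance(y[v * s:(v + 1) * s]) for v in range(n_seg)]
    if q == 0:
        # q = 0 is the logarithmic (geometric-mean) limit of the general formula.
        return math.exp(sum(math.log(f) for f in f2) / (2.0 * n_seg))
    return (sum(f ** (q / 2.0) for f in f2) / n_seg) ** (1.0 / q)


# Illustrative input (assumption): uncorrelated Gaussian noise.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
slope_q2 = (math.log(fluctuation(noise, 256, 2) / fluctuation(noise, 16, 2))
            / math.log(256 / 16))
print(slope_q2)
```

If the fluctuation function scales as a power of s, the exponent can be read off a log-log fit for each q; an exponent that changes with q indicates multifractality. Running the same code on a copy reordered with `random.shuffle` gives the shuffled-series comparison mentioned in the abstract, since shuffling removes correlations but keeps the value distribution.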


Journal ArticleDOI
TL;DR: In this article, the authors extend the method of recurrence plots to cross recurrence plots (CRPs), which enable a nonlinear analysis of bivariate data.
Abstract: We use the extension of the method of recurrence plots to cross recurrence plots (CRPs), which enables a nonlinear analysis of bivariate data. To quantify CRPs, we further develop three measures of complexity based mainly on diagonal structures in CRPs. The CRP analysis of prototypical model systems with nonlinear interactions demonstrates that this technique can find these nonlinear interrelations from bivariate time series, whereas linear correlation tests do not. Applying the CRP analysis to climatological data, we find a complex relationship between rainfall and El Niño data.

46 citations
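
To make the notion of a cross recurrence plot concrete, here is a minimal sketch under simplifying assumptions: the two series are scalar (no delay embedding), closeness is an absolute difference below a threshold `eps`, and the helper that measures diagonal line lengths is a hypothetical stand-in for the diagonal-based complexity measures the abstract refers to, not a reproduction of them.

```python
import math


def cross_recurrence(x, y, eps):
    """Binary matrix with entry (i, j) = 1 when x[i] and y[j] are within eps."""
    return [[1 if abs(a - b) <= eps else 0 for b in y] for a in x]


def diagonal_line_lengths(crp, offset=0):
    """Lengths of unbroken runs of 1s along the diagonal j = i + offset."""
    lengths, run = [], 0
    n_cols = len(crp[0])
    for i in range(len(crp)):
        j = i + offset
        if 0 <= j < n_cols and crp[i][j]:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


# Illustrative input (assumption): a sine wave and a nonlinear function of it.
x = [math.sin(0.3 * i) for i in range(200)]
y = [v ** 3 for v in x]
crp = cross_recurrence(x, y, eps=0.1)
print(diagonal_line_lengths(crp, offset=0))
```

Long diagonal lines mean the two trajectories stay close to each other over a stretch of time, which is why diagonal structures are a natural place to look for interrelations between the series.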


Posted Content
TL;DR: It is proved for Markov sequences that NSRPS together with suitable codings of the substitutions and of the substitute series does not lead to a code length increase, in the limit of infinite sequence length.
Abstract: We argue that Non-sequential Recursive Pair Substitution (NSRPS) as suggested by Jiménez-Montaño and Ebeling can indeed be used as a basis for an optimal data compression algorithm. In particular, we prove for Markov sequences that NSRPS together with suitable codings of the substitutions and of the substitute series does not lead to a code length increase, in the limit of infinite sequence length. When applied to written English, NSRPS gives entropy estimates which are very close to those obtained by other methods. Using ca. 135 GB of input data from Project Gutenberg, we estimate the effective entropy to be $\approx 1.82$ bit/character. Extrapolating to infinitely long input, the true value of the entropy is estimated as $\approx 0.8$ bit/character.

30 citations
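
The substitution step behind NSRPS can be sketched as follows. This is an illustration under stated assumptions: the most frequent adjacent pair is found by counting overlapping pairs, occurrences are replaced left to right without overlap, and the new symbols are named `<0>`, `<1>`, and so on. The coding of the substitutions and of the substitute series, which the abstract treats as essential for compression, is not attempted here.

```python
from collections import Counter


def substitute_most_frequent_pair(seq, new_symbol):
    """Replace the most frequent adjacent pair in seq by new_symbol."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return list(seq), None
    pair = pairs.most_common(1)[0][0]
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out, pair


def nsrps(seq, rounds):
    """Apply the pair substitution repeatedly, recording each rule used."""
    seq = list(seq)
    rules = []
    for k in range(rounds):
        new_symbol = "<%d>" % k
        seq, pair = substitute_most_frequent_pair(seq, new_symbol)
        if pair is None:
            break
        rules.append((new_symbol, pair))
    return seq, rules


compressed, rules = nsrps("abababab", rounds=2)
print(compressed, rules)
```

Each round shortens the sequence while enlarging the alphabet; the entropy estimates mentioned in the abstract come from how the code length of the substitute series behaves as the rounds proceed.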


Proceedings ArticleDOI
TL;DR: This paper re-introduces the family of entropic priors as minimizers of mutual information between the data and the parameters, as in [rodriguez98b], but with a small change and a correction.
Abstract: The ongoing, unprecedented exponential explosion of available computing power has radically transformed the methods of statistical inference. What used to be the position of a small minority of statisticians, advocating the use of priors and strict adherence to Bayes' theorem, is now becoming the norm across disciplines. The evolutionary direction is now clear. The trend is towards more realistic, flexible and complex likelihoods characterized by an ever increasing number of parameters. This gives the old question "What should the prior be?" a new central importance in the modern Bayesian theory of inference. Entropic priors provide one answer to the problem of prior selection. The general definition of an entropic prior has existed since 1988, but it was not until 1998 that it was found that they provide a new notion of complete ignorance. This paper re-introduces the family of entropic priors as minimizers of mutual information between the data and the parameters, as in [rodriguez98b], but with a small change and a correction. The general formalism is then applied to two large classes of models: discrete probabilistic networks and univariate finite mixtures of Gaussians. It is also shown how to perform inference by efficiently sampling the corresponding posterior distributions.

30 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the NSB estimator of entropies of severely undersampled discrete variables and devised a procedure for calculating the involved integrals, and discovered that the output of the estimator has a well defined limit for large cardinalities of the variables being studied.
Abstract: We examine the recently introduced NSB estimator of entropies of severely undersampled discrete variables and devise a procedure for calculating the involved integrals. We discover that the output of the estimator has a well defined limit for large cardinalities of the variables being studied. Thus one can estimate entropies with no a priori assumptions about these cardinalities, and a closed form solution for such estimates is given.

26 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider nearest neighbor spacing distributions of composite ensembles of levels and show that Bayesian inference is superior to fitting the distribution to the histogram of level spacings, and that unfolding short sequences generically leads to an overestimate of the chaoticity parameter.
Abstract: We consider nearest neighbor spacing distributions of composite ensembles of levels. These are obtained by combining independently unfolded sequences of levels containing only few levels each. Two problems arise in the spectral analysis of such data. One problem lies in fitting the nearest neighbor spacing distribution to the histogram of level spacings obtained from the data. We show that the method of Bayesian inference is superior to this procedure. The second problem occurs when one unfolds such short sequences. We show that the unfolding procedure generically leads to an overestimate of the chaoticity parameter. This trend is absent in the presence of long-range level correlations. Thus, composite ensembles of levels from a system with long-range spectral stiffness yield reliable information about the chaotic behavior of the system.

24 citations


Journal ArticleDOI
TL;DR: In this paper, maximum-likelihood methods are applied to the problem of absorption tomography; the reconstruction is done with an iterative algorithm, and the statistics of the illuminating beam can be incorporated into it.
Abstract: Maximum-likelihood methods are applied to the problem of absorption tomography. The reconstruction is done with the help of an iterative algorithm. We show how the statistics of the illuminating beam can be incorporated into the reconstruction. The proposed reconstruction method can be considered as a useful alternative in the extreme cases where the standard ill-posed direct-inversion methods fail.

9 citations


Posted Content
TL;DR: In this article, the authors examine the relationship between the Bayesian and information-theoretic formulations of source separation algorithms, and make use of the connection between the work of Claude E. Shannon and the "Recent Contributions" by Warren Weaver (Shannon & Weaver 1949) as clarified by Richard T. Cox (1979) and expounded upon by Robert L. Fry (1996) as a duality between a logic of assertions and the logic of questions.
Abstract: We examine the relationship between the Bayesian and information-theoretic formulations of source separation algorithms. This work makes use of the relationship between the work of Claude E. Shannon and the "Recent Contributions" by Warren Weaver (Shannon & Weaver 1949) as clarified by Richard T. Cox (1979) and expounded upon by Robert L. Fry (1996) as a duality between a logic of assertions and a logic of questions. Working with the logic of assertions requires the use of probability as a measure of degree of implication. This leads to a Bayesian formulation of the problem. Working with the logic of questions, by contrast, requires the use of entropy as a measure of the bearing of a question on an issue, leading to an information-theoretic formulation of the problem.

7 citations


Posted Content
TL;DR: In this paper, a Bayesian approach is proposed to solve the blind source separation problem by implementing a Monte Carlo Markov Chain (MCMC) procedure, which provides, in the stationary regime, samples drawn from the posterior distributions of all the variables involved in the problem, leading to flexibility in the choice of cost function.
Abstract: In this contribution, we consider the problem of the blind separation of noisy instantaneously mixed images. The images are modeled by hidden Markov fields with unknown parameters. Given the observed images, we give a Bayesian formulation and we propose to solve the resulting data augmentation problem by implementing a Monte Carlo Markov Chain (MCMC) procedure. We separate the unknown variables into two categories: (1) the parameters of interest, which are the mixing matrix, the noise covariance and the parameters of the source distributions; and (2) the hidden variables, which are the unobserved sources and the unobserved pixel classification labels. The proposed algorithm provides, in the stationary regime, samples drawn from the posterior distributions of all the variables involved in the problem, leading to flexibility in the choice of cost function. We discuss and characterize some problems of non-identifiability and degeneracies of the parameter likelihood and the behavior of the MCMC algorithm in this case. Finally, we show the results for both synthetic and real data to illustrate the feasibility of the proposed solution. Keywords: MCMC, blind source separation, hidden Markov fields, segmentation, Bayesian approach.

7 citations


Proceedings ArticleDOI
TL;DR: In this paper, the problem of blind signal and image separation using a sparse representation of the images in the wavelet domain is considered in a Bayesian estimation framework using the fact that the distribution of wavelet coefficients of real world images can naturally be modeled by an exponential power probability density function.
Abstract: In this paper, we consider the problem of blind signal and image separation using a sparse representation of the images in the wavelet domain. We consider the problem in a Bayesian estimation framework using the fact that the distribution of the wavelet coefficients of real world images can naturally be modeled by an exponential power probability density function. The Bayesian approach, which has been used with success in blind source separation, also gives the possibility of including any prior information we may have on the mixing matrix elements as well as on the hyperparameters (parameters of the prior laws of the noise and the sources). We consider two cases: first the case where the wavelet coefficients are assumed to be i.i.d. and second the case where we model the correlation between the coefficients of two adjacent scales by a first order Markov chain. This paper only reports on the first case; the second case results will be reported in the near future. The estimation computations are done via a Monte Carlo Markov Chain (MCMC) procedure. Some simulations show the performance of the proposed method. Keywords: Blind source separation, wavelets, Bayesian estimation, MCMC Hastings-Metropolis algorithm.

6 citations


Posted Content
TL;DR: In this paper, an efficient method for the reconstruction of a probability density function from the knowledge of its infinite sequence of ordinary moments is presented, using the maximum entropy technique under the constraint of some fractional moments.
Abstract: We outline an efficient method for the reconstruction of a probability density function from the knowledge of its infinite sequence of ordinary moments. The approximate density is obtained by resorting to the maximum entropy technique, under the constraint of some fractional moments. The latter are obtained explicitly in terms of the infinite sequence of given ordinary moments. It is proved that the approximate density converges in entropy to the underlying density, so that it proves useful for calculating expected values.

Posted Content
TL;DR: In this article, a very fast but approximate technique is presented for estimating the position of a single pair of complex conjugate poles, using the Hilbert transform to reconstruct the analytic signal.
Abstract: Many physical systems can be adequately modelled using a second order approximation. The problem of plant identification reduces to the problem of estimating the position of a single pair of complex conjugate poles. One approach to the problem is to apply the method of least squares to the time domain data. This type of computation is best carried out in "batch" mode and applies to an entire data set. Another approach would be to design an adaptive filter and to use autoregressive (AR) techniques. This would be well suited to continuous real-time data and could track slow changes in the underlying plant. In this paper we present a very fast but approximate technique for the estimation of the position of a single pair of complex conjugate poles, using the Hilbert transform to reconstruct the analytic signal.

Posted Content
TL;DR: In this paper, a root density estimator with optimal asymptotic behavior was proposed to solve the statistical inverse problem of quantum mechanics, namely, estimating the psi function on the basis of the results of mutually complementing experiments.
Abstract: A fundamental problem of statistical data analysis, the estimation of a distribution density from experimental data, is considered. A new method with optimal asymptotic behavior, the root density estimator, is developed. The proposed method may be applied to its full extent to solve the statistical inverse problem of quantum mechanics, namely, estimating the psi function on the basis of the results of mutually complementing experiments.

Posted Content
TL;DR: The overall design of the Integrated Spectral Analysis Workbench (ISAW), being developed at Argonne, provides for an extensible, highly interactive, collaborating set of viewers for neutron scattering data.
Abstract: The overall design of the Integrated Spectral Analysis Workbench (ISAW), being developed at Argonne, provides for an extensible, highly interactive, collaborating set of viewers for neutron scattering data. Large arbitrary collections of spectra from multiple detectors can be viewed as an image, a scrolled list of individual graphs, or using a 3D representation of the instrument showing the detector positions. Data from an area detector can be displayed using a contour or intensity map as well as an interactive table. Selected spectra can be displayed in tables or on a conventional graph. A unique characteristic of these viewers is their interactivity and coordination. The position "pointed at" by the user in one viewer is sent to other viewers of the same DataSet so they can track the position and display relevant information. Specialized viewers for single crystal neutron diffractometers are being developed. A "proof-of-concept" viewer that directly displays the 3D reciprocal lattice from a complete series of runs on a single crystal diffractometer has been implemented.

Posted Content
TL;DR: In this paper, the distribution of dropped calls rates for different wireless (cellular) carriers in different markets was studied, and an equation was derived for the most probable dropped calls rate for a particular carrier in a particular market, which depends on the number of dropped calls observed, the total number of calls and the parameters of the lognormal distribution.
Abstract: We study the distributions of dropped calls rates for different wireless (cellular) carriers in different markets. Our statistics comprise over 700 different market/carrier combinations. We find that the dropped calls rates distribution is very close to lognormal. We derive an equation for the most probable dropped calls rate for a particular carrier in a particular market, which depends on the number of dropped calls observed, the total number of calls and the parameters of the lognormal distribution. We apply this analysis to blocked and "no service" calls as well.

Posted Content
TL;DR: In this article, a theory of systems with long-range correlations based on the consideration of binary N-step Markov chains is developed, where the conditional probability that the i-th symbol in the chain equals zero (or unity) is a linear function of the number of unities among the preceding N symbols.
Abstract: A theory of systems with long-range correlations based on the consideration of binary N-step Markov chains is developed. In our model, the conditional probability that the i-th symbol in the chain equals zero (or unity) is a linear function of the number of unities among the preceding N symbols. The model allows exact analytical treatment. The correlation and distribution functions as well as the variance of the number of symbols in words of arbitrary length L are obtained analytically and numerically. A self-similarity of the studied stochastic process is revealed and the similarity transformation of the chain parameters is presented. The diffusion equation governing the distribution function of the L-words is explored. If the persistent correlations are not extremely strong, the distribution function is shown to be Gaussian, with the variance depending nonlinearly on L. The applicability of the developed theory to coarse-grained written and DNA texts is discussed.
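
A chain of the kind described in this abstract can be simulated in a few lines. The sketch below is an illustration under assumptions: the abstract only says the conditional probability is linear in the number of unities k among the preceding N symbols, so the specific form P(1 | k) = 1/2 + mu (2k/N - 1), the parameter name `mu` (with |mu| < 1/2), the fair-coin start-up window, and the use of non-overlapping words are all choices made here.

```python
import random


def generate_chain(length, memory, mu, rng):
    """Binary chain whose next-symbol probability is linear in the number
    of ones among the preceding `memory` symbols (assumed parametrization)."""
    chain = [rng.randint(0, 1) for _ in range(memory)]  # arbitrary start-up window
    ones = sum(chain)
    while len(chain) < length:
        p_one = 0.5 + mu * (2.0 * ones / memory - 1.0)
        symbol = 1 if rng.random() < p_one else 0
        chain.append(symbol)
        # Slide the window: add the new symbol, drop the one that left it.
        ones += symbol - chain[-memory - 1]
    return chain[:length]


def word_count_variance(chain, word_length):
    """Variance of the number of ones in non-overlapping words of given length."""
    counts = [sum(chain[i:i + word_length])
              for i in range(0, len(chain) - word_length + 1, word_length)]
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)


rng = random.Random(1)
persistent = generate_chain(20000, memory=10, mu=0.4, rng=rng)
print(word_count_variance(persistent, 50))
```

For an uncorrelated fair coin the variance of the count in words of length L is L/4; comparing the printed value against that baseline gives a quick feel for how persistence (mu > 0) inflates the fluctuations.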

Posted Content
TL;DR: This paper was withdrawn by the authors due to significant new findings; a new paper on the same topic has been submitted as physics/0310159.
Abstract: This paper was withdrawn by the authors due to significant new findings. A new paper on the same topic has been submitted as physics/0310159.

Posted Content
TL;DR: In this article, the scalar Minkowski valuations are extended to vector- and tensor-valued measures for describing the geometry and connectivity of spatial patterns, and the properties of these extensions are described in detail.
Abstract: Higher-rank Minkowski valuations are efficient means for describing the geometry and connectivity of spatial patterns. We show how to extend the framework of the scalar Minkowski valuations to vector- and tensor-valued measures. The properties of these extensions are described in detail. We show the versatility of these measures by using simple toy models as well as real data. Our applications cover the morphology of galaxy clusters, the structure of spiral galaxies, and the geometry of molecules. Furthermore, we consider a physical ansatz closely related to higher-rank Minkowski valuations, the Rosenfeld functional known from density functional theory.

Posted Content
TL;DR: In this article, the authors introduce the concept of statistical errors of confidence limits and argue that not only should limits be calculated but also their errors in order to represent the results of the analysis to the fullest.
Abstract: Confidence limits are commonplace in physics analysis. Great care must be taken in their calculation and use, especially in cases of limited statistics, when one-sided limits are often quoted. In order to estimate the stability of the confidence levels to the addition of more data and/or a change of cuts, we argue that the variance of their sampling distributions be calculated in addition to the limit itself. The square root of the variance of their sampling distribution can be thought of as a statistical error on the limit. We thus introduce the concept of statistical errors of confidence limits and argue that not only should limits be calculated but also their errors in order to represent the results of the analysis to the fullest. We show that comparison of two different limits from two different experiments becomes easier when their errors are also quoted. Use of errors of confidence limits will lead to abatement of the debate on which method is best suited to calculate confidence limits.

Posted Content
Rajendran Raja
TL;DR: In this article, Bayes' theorem and likelihood ratios are used to obtain both the fitted quantities and a measure of the goodness of fit for unbinned likelihood fits, and an ansatz is provided for determining Bayesian a priori probabilities.
Abstract: Maximum likelihood fits to data can be done using binned data (histograms) and unbinned data. With binned data, one gets not only the fitted parameters but also a measure of the goodness of fit. With unbinned data, currently, the fitted parameters are obtained but no measure of goodness of fit is available. This remains, to date, an unsolved problem in statistics. Using Bayes' theorem and likelihood ratios, we provide a method by which both the fitted quantities and a measure of the goodness of fit are obtained for unbinned likelihood fits, as well as errors in the fitted quantities. We provide an ansatz for determining Bayesian a priori probabilities.

Posted Content
TL;DR: In this article, it is shown that, although the likelihood function is normalizable with respect to the data, the same need not hold with respect to the model parameters, which may lead to singularities in their expectation value integrals; the problem may be solved by obeying the correct Riemannian metric imposed by the likelihood.
Abstract: Although the likelihood function is normalizable with respect to the data, there is no guarantee that the same holds with respect to the model parameters. This may lead to singularities in the expectation value integral of these parameters, especially if the prior information is not sufficient to guarantee finite integral values. However, the problem may be solved by obeying the correct Riemannian metric imposed by the likelihood. This will be demonstrated for the example of the electron temperature evaluation in hydrogen plasmas.

Posted Content
TL;DR: In this article, an algorithm is presented which generates pairs of oscillatory random time series which have identical periodograms but differ in the number of oscillations, indicating the intrinsic limitations of spectral methods when it comes to the task of measuring frequencies.
Abstract: An algorithm is presented which generates pairs of oscillatory random time series which have identical periodograms but differ in the number of oscillations. This result indicates the intrinsic limitations of spectral methods when it comes to the task of measuring frequencies. Other examples, one from medicine and one from bifurcation theory, are given, which also exhibit these limitations of spectral methods. For two methods of spectral estimation it is verified that the particular way end points are treated, which is specific to each method, is, for long enough time series, not relevant for the main result.

Posted Content
TL;DR: In this paper, it is shown that prior-free predictive inference regarding any future datum is generated directly from the datum of a location measurement. Such inference turns out as if obtained from a certain pdf ("fiducial") indirectly associated with the parameter, but this pdf is inappropriate in the analysis of combined measurements unless they all are location measurements of the same parameter.
Abstract: Motivation: This version is based solely on the calculus of probability, excluding any statistical principle. "Location measurement" means the pdf of the error is known. When the datum is obtained, intuition suggests something like a pdf for the parameter; here we attempt a critical examination of its meaning. Summary: In default of prior probability the parameter is not defined as a random variable, hence there can be no genuine prior-free parametric inference. Nevertheless prior-free predictive inference regarding any future datum is generated directly from the datum of a location measurement. Such inference turns out as if obtained from a certain pdf ("fiducial") indirectly associated with the parameter. This false pdf can expedite predictive inference, but is inappropriate in the analysis of combined measurements (unless they all are location measurements of the same parameter). Also it has the same distribution as the ostensible Bayesian posterior from a uniform "prior". However, if any of these spurious entities is admitted in the analysis, inconsistent results follow. When we combine measurements, we find that the quantisation errors, inevitable in data recording, must be taken into consideration. These errors cannot be folded into predictive inference in an exact sense; that is, we cannot render a predictive distribution of a future datum except as an approximation. Keywords: location measurement; combination of observations; parametric inference; predictive inference; prior-free inference; quantisation error; digitisation; frequentist interpretation; the fiducial argument; fiducial probability; pivotal inference; intuitive assessment; prior-free assessment

Posted Content
TL;DR: In this paper, the authors introduce the theory of marked point processes and define universal test quantities applicable to realizations of a marked point process, and show their power using concrete data sets in analyzing the luminosity-dependence of the galaxy clustering, the alignment of dark matter halos in gravitational $N$-body simulations, the morphology- and diameter-dependence of the Martian crater distribution and the size correlations of pores in sandstone.
Abstract: Mark correlations provide a systematic approach to look at objects both distributed in space and bearing intrinsic information, for instance on physical properties. The interplay of the objects' properties (marks) with the spatial clustering is of vivid interest for many applications; are, e.g., galaxies with high luminosities more strongly clustered than dim ones? Do neighboring pores in a sandstone have similar sizes? How does the shape of impact craters on a planet depend on the geological surface properties? In this article, we give an introduction to the appropriate mathematical framework to deal with such questions, i.e. the theory of marked point processes. After having clarified the notion of segregation effects, we define universal test quantities applicable to realizations of a marked point process. We show their power using concrete data sets in analyzing the luminosity-dependence of the galaxy clustering, the alignment of dark matter halos in gravitational $N$-body simulations, the morphology- and diameter-dependence of the Martian crater distribution and the size correlations of pores in sandstone. In order to understand our data in more detail, we discuss the Boolean depletion model, the random field model and the Cox random field model. The first model describes depletion effects in the distribution of Martian craters and pores in sandstone, whereas the last one accounts at least qualitatively for the observed luminosity-dependence of the galaxy clustering.

Journal ArticleDOI
TL;DR: In this paper, the phase synchronization between atmospheric variables such as daily mean temperature and daily precipitation records was studied and significant phase synchronization was found between records of Oxford and Vienna as well as between the records of precipitation and temperature in each city.
Abstract: We study phase synchronization between atmospheric variables such as daily mean temperature and daily precipitation records. We find significant phase synchronization between records of Oxford and Vienna as well as between the records of precipitation and temperature in each city. To find the time delay in the synchronization between the records we study the time lag phase synchronization when the records are shifted by a variable time interval of days. We also compare the results of the method with the classical cross-correlation method and find that in certain cases the phase synchronization yields more significant results.

Posted Content
TL;DR: Diffuse, multiple-scattered waves can be very efficient for information transfer through disordered media, provided that antenna arrays are used for both transmission and reception of signals.
Abstract: Diffuse, multiple-scattered waves can be very efficient for information transfer through disordered media, provided that antenna arrays are used for both transmission and reception of signals. Information capacity C of a communication channel between two identical linear arrays of n equally-spaced antennas, placed in a disordered medium with diffuse scattering, grows linearly with n and can attain considerable values, if antenna spacing a > lambda/2, where lambda is the wavelength. Decrease of a below lambda/2 makes the signals received by different antennas partially correlated, thus introducing redundancy and reducing capacity of the communication system. When the size of antenna arrays is well below lambda/2, the scaling of C with n becomes logarithmic and capacity is low.
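
The two scaling regimes in this abstract can be illustrated with the widely used equal-power capacity expression for an antenna-array channel, C = log2 det(I + (snr / n) H H^dagger). That formula, the `snr` parameter, and the two idealized channel matrices below are assumptions brought in for illustration; the abstract itself does not state which expression or noise model the authors use.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det


def capacity_bits(h, snr):
    """log2 det(I + (snr / n_tx) H H^dagger) for a channel matrix h (assumed formula)."""
    n_rx, n_tx = len(h), len(h[0])
    m = [[(1.0 if i == j else 0.0)
          + (snr / n_tx) * sum(h[i][k] * complex(h[j][k]).conjugate()
                               for k in range(n_tx))
          for j in range(n_rx)]
         for i in range(n_rx)]
    return math.log2(complex(determinant(m)).real)


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def all_ones(n):
    return [[1.0] * n for _ in range(n)]


for n in (2, 4, 8, 16):
    # Uncorrelated paths (identity) versus fully correlated paths (all ones).
    print(n, capacity_bits(identity(n), 10.0), capacity_bits(all_ones(n), 10.0))
```

With the identity matrix the expression reduces to n log2(1 + snr / n), a sum over n independent sub-channels; with the all-ones matrix it collapses to log2(1 + n snr), which grows only logarithmically in n. These are idealized limits of the uncorrelated and fully correlated cases, not a model of diffuse scattering.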

Posted Content
TL;DR: The NeXus-API has been redesigned and expanded to cover both HDF versions 4 and 5, and a new NeXus-API for IDL based on the IDL C interface has been implemented.
Abstract: NeXus is a joint effort of both the synchrotron and neutron scattering communities to develop a common data exchange format based on HDF. In order to simplify access to NeXus files, a NeXus-API is provided. This NeXus-API has been redesigned and expanded to cover both HDF versions 4 and 5. Only small changes to the API were necessary to accomplish this. A new NeXus-API for IDL based on the IDL C interface has been implemented.

Posted Content
TL;DR: The authors present a GUI for reflectometry data analysis and reduction written in Tcl/Tk and Octave, with underlying C code for the numerically intensive portions.
Abstract: For instruments with many occasional users, it is important to have easy-to-use software. To support the frequent users it is important to be flexible. Using a scripting language to design a GUI and exposing it to the user allows us to do both. We present our work on a GUI for reflectometry data analysis and reduction written in Tcl/Tk and Octave, with underlying C code for the numerically intensive portions. As well as making it easier to train new users, the new software allows existing users to do in minutes what used to take hours.

Posted Content
TL;DR: In this paper, a method for the decomposition of mass spectra of mixture gases using Bayesian probability theory is presented, which works without any calibration measurement and therefore applies also to the analysis of spectra containing unstable species.
Abstract: We present a method for the decomposition of mass spectra of mixture gases using Bayesian probability theory. The method works without any calibration measurement and therefore applies also to the analysis of spectra containing unstable species. For the example of mixtures of three different hydrocarbon gases the algorithm provides concentrations and cracking coefficients of each mixture component as well as their confidence intervals. The amount of information needed to obtain reliable results and its relation to the accuracy of our analysis are discussed.

Proceedings ArticleDOI
TL;DR: In this article, the authors explore inductive logic and its role in science, touching on both experimental design and analysis of experimental results, and demonstrate that the duality between the logic of assertions and the logic of questions has important consequences.
Abstract: In celebration of the work of Richard Threlkeld Cox, we explore inductive logic and its role in science touching on both experimental design and analysis of experimental results. In this exploration we demonstrate that the duality between the logic of assertions and the logic of questions has important consequences. We discuss the conjecture that the relevance or bearing, b, of a question on an issue can be expressed in terms of the probabilities, p, of the assertions that answer the question via the entropy. In its application to the scientific method, the logic of questions, inductive inquiry, can be applied to design an experiment that most effectively addresses a scientific issue. This is performed by maximizing the relevance of the experimental question to the scientific issue to be resolved. It is shown that these results are related to the mutual information between the experiment and the scientific issue, and that experimental design is akin to designing a communication channel that most efficiently communicates information relevant to the scientific issue to the experimenter. Application of the logic of assertions, inductive inference (Bayesian inference) completes the experimental process by allowing the researcher to make inferences based on the information obtained from the experiment.
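
The link the abstract draws between experimental design and mutual information can be made concrete with a toy calculation. In this sketch each candidate experiment is summarized by a joint probability table over (state of the scientific issue, experimental outcome), and the experiment with the largest mutual information is preferred. The table layout, the function names and the two example experiments are hypothetical, and the sketch does not attempt the relevance measure of the logic of questions itself.

```python
import math


def mutual_information_bits(joint):
    """Mutual information of a joint table joint[i][j] = P(issue=i, outcome=j)."""
    p_issue = [sum(row) for row in joint]
    p_outcome = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:
                total += p * math.log2(p / (p_issue[i] * p_outcome[j]))
    return total


def most_informative(experiments):
    """Name of the candidate experiment with the largest mutual information."""
    return max(experiments, key=lambda name: mutual_information_bits(experiments[name]))


candidates = {
    # Outcome determines the issue exactly.
    "sharp": [[0.5, 0.0], [0.0, 0.5]],
    # Outcome is independent of the issue.
    "blind": [[0.25, 0.25], [0.25, 0.25]],
}
print(most_informative(candidates))
```

In the communication-channel picture from the abstract, the "sharp" experiment is a noiseless channel between the issue and the experimenter, while the "blind" one transmits nothing.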