
Showing papers on "Sampling distribution published in 2021"


Journal ArticleDOI
TL;DR: In this paper, a joint cosmological analysis of weak gravitational lensing from the fourth data release of the ESO Kilo-Degree Survey (KiDS-1000) and galaxy clustering from the partially overlapping Baryon Oscillation Spectroscopic Survey (BOSS) and the 2-degree Field Lensing Survey (2dFLenS) is presented.
Abstract: We present the methodology for a joint cosmological analysis of weak gravitational lensing from the fourth data release of the ESO Kilo-Degree Survey (KiDS-1000) and galaxy clustering from the partially overlapping Baryon Oscillation Spectroscopic Survey (BOSS) and the 2-degree Field Lensing Survey (2dFLenS). Cross-correlations between BOSS and 2dFLenS galaxy positions and source galaxy ellipticities have been incorporated into the analysis, necessitating the development of a hybrid model of non-linear scales that blends perturbative and non-perturbative approaches, and an assessment of signal contributions by astrophysical effects. All weak lensing signals were measured consistently via Fourier-space statistics that are insensitive to the survey mask and display low levels of mode mixing. The calibration of photometric redshift distributions and multiplicative gravitational shear bias has been updated, and a more complete tally of residual calibration uncertainties was propagated into the likelihood. A dedicated suite of more than 20 000 mocks was used to assess the performance of covariance models and to quantify the impact of survey geometry and spatial variations of survey depth on signals and their errors. The sampling distributions for the likelihood and the $\chi^2$ goodness-of-fit statistic have been validated, with proposed changes for calculating the effective number of degrees of freedom. The prior volume was explicitly mapped, and a more conservative, wide top-hat prior on the key structure growth parameter $S_8 = \sigma_8 (\Omega_{\rm m}/0.3)^{1/2}$ was introduced. The prevalent custom of reporting $S_8$ weak lensing constraints via point estimates derived from its marginal posterior is highlighted to be easily misinterpreted as yielding systematically low values of $S_8$, and an alternative estimator and associated credible interval are proposed. Known systematic effects pertaining to weak lensing modelling and inference are shown to bias $S_8$ by no more than 0.1 standard deviations, with the caveat that no conclusive validation data exist for models of intrinsic galaxy alignments. Compared to the previous KiDS analyses, $S_8$ constraints are expected to improve by 20% for weak lensing alone and by 29% for the joint analysis.

66 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the accuracy of p values obtained using the asymptotic mean and variance (MV) correction to the distribution of the sample standardized root mean squared residual (SRMR) proposed by Maydeu-Olivares to assess the exact fit of SEM models.
Abstract: We examine the accuracy of p values obtained using the asymptotic mean and variance (MV) correction to the distribution of the sample standardized root mean squared residual (SRMR) proposed by Maydeu-Olivares to assess the exact fit of SEM models. In a simulation study, we found that under normality, the MV-corrected SRMR statistic provides reasonably accurate Type I errors even in small samples and for large models, clearly outperforming the current standard, that is, the likelihood ratio (LR) test. When data show excess kurtosis, MV-corrected SRMR p values are only accurate in small models (p = 10), or in medium-sized models (p = 30) if no skewness is present and sample sizes are at least 500. Overall, when data are not normal, the MV-corrected LR test seems to outperform the MV-corrected SRMR. We elaborate on these findings by showing that the asymptotic approximation to the mean of the SRMR sampling distribution is quite accurate, while the asymptotic approximation to the standard deviation is not.
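
For reference, the SRMR itself is just the root mean square of the standardized covariance residuals. The NumPy sketch below computes that uncorrected sample statistic for a hypothetical model-implied matrix; it is not the mean-and-variance correction studied in the paper, and exact SRMR conventions (e.g., whether the diagonal is included) vary slightly across software.

```python
import numpy as np

def srmr(sample_cov: np.ndarray, implied_cov: np.ndarray) -> float:
    """Standardized root mean squared residual between a sample covariance matrix
    and a model-implied covariance matrix (lower triangle including the diagonal)."""
    d = np.sqrt(np.diag(sample_cov))            # sample standard deviations
    std_resid = (sample_cov - implied_cov) / np.outer(d, d)   # residuals in correlation metric
    idx = np.tril_indices_from(std_resid)       # unique elements only
    return float(np.sqrt(np.mean(std_resid[idx] ** 2)))

# toy usage: residuals of an (assumed) independence model for 3 variables
S = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
Sigma_hat = np.eye(3)                           # hypothetical model-implied matrix
print(round(srmr(S, Sigma_hat), 4))
```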

43 citations


Journal ArticleDOI
TL;DR: It is demonstrated that bootEGA is a robust approach for assessing the stability and robustness of dimensionality in multivariate data.

41 citations


Journal ArticleDOI
TL;DR: A cluster-based histogram, called equal-intensity k-means space partitioning (EI-kMeans), is proposed, and a heuristic method to improve the sensitivity of drift detection is introduced.
Abstract: The data stream poses additional challenges to statistical classification tasks because distributions of the training and target samples may differ as time passes. Such a distribution change in streaming data is called concept drift. Numerous histogram-based distribution change detection methods have been proposed to detect drift. Most histograms are built on grid-based or tree-based space partitioning algorithms, which makes the space partitions arbitrary and unexplainable and may cause drift blind spots. There is a need to improve the drift detection accuracy of histogram-based methods in the unsupervised setting. To address this problem, we propose a cluster-based histogram, called equal-intensity k-means space partitioning (EI-kMeans). In addition, a heuristic method to improve the sensitivity of drift detection is introduced. The fundamental idea of improving the sensitivity is to minimize the risk of creating partitions in distribution offset regions. Pearson’s chi-square test is used as the statistical hypothesis test so that the test statistics remain independent of the sample distribution. The number of bins and their shapes, which strongly influence the ability to detect drift, are determined dynamically from the sample based on an asymptotic constraint in the chi-square test. Accordingly, three algorithms are developed to implement concept drift detection, including a greedy centroids initialization algorithm, a cluster amplify–shrink algorithm, and a drift detection algorithm. For drift adaptation, we recommend retraining the learner if a drift is detected. The results of experiments on the synthetic and real-world datasets demonstrate the advantages of EI-kMeans and show its efficacy in detecting concept drift.
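
As a rough illustration of the idea (not the authors' EI-kMeans algorithm, which chooses the number and shape of the partitions adaptively), the sketch below builds a k-means "histogram" on a reference window and applies Pearson's chi-square test to the bin counts of a new window; scikit-learn and SciPy are assumed to be available, and the cluster count and significance level are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.cluster import KMeans

def detect_drift(reference, window, k=8, alpha=0.05, seed=0):
    """Cluster-based histogram drift test: partition the space with k-means fitted on
    the reference data, then compare bin counts of the two samples with a chi-square test."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(reference)
    ref_counts = np.bincount(km.predict(reference), minlength=k)
    win_counts = np.bincount(km.predict(window), minlength=k)
    _, p_value, _, _ = chi2_contingency(np.vstack([ref_counts, win_counts]))
    return p_value < alpha, p_value

rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=(2000, 2))
drifted   = rng.normal(0.6, 1.0, size=(2000, 2))   # mean shift -> concept drift
print(detect_drift(reference, drifted))
```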

39 citations


Journal ArticleDOI
TL;DR: In this article, an alternative parameterization is developed for a large class of exchangeable random graphs, in which the nodes are independent random vectors in a linear space equipped with an indefinite inner product, and the edge probability between two nodes equals the inner product of the corresponding node vectors.
Abstract: Exchangeable random graphs serve as an important probabilistic framework for the statistical analysis of network data. In this work, we develop an alternative parameterization for a large class of exchangeable random graphs, where the nodes are independent random vectors in a linear space equipped with an indefinite inner product, and the edge probability between two nodes equals the inner product of the corresponding node vectors. Therefore, the distribution of exchangeable random graphs in this subclass can be represented by a node sampling distribution on this linear space, which we call the graph root distribution. We study existence and identifiability of such representations, the topological relationship between the graph root distribution and the exchangeable random graph sampling distribution, and estimation of graph root distributions.
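
A toy generator under this parameterization might look as follows. It is only a sketch: the node sampling distribution, the signature (p, q) of the indefinite inner product, and the clipping of inner products to [0, 1] are all assumptions for illustration, and nothing here reproduces the paper's identifiability or estimation results.

```python
import numpy as np

def sample_graph(n, p=2, q=1, seed=0):
    """Sample an exchangeable random graph from i.i.d. node vectors in R^(p+q)
    equipped with the indefinite inner product diag(+1,...,+1,-1,...,-1)."""
    rng = np.random.default_rng(seed)
    J = np.diag(np.concatenate([np.ones(p), -np.ones(q)]))
    X = rng.uniform(0.0, 0.7, size=(n, p + q))        # assumed node sampling distribution
    P = np.clip(X @ J @ X.T, 0.0, 1.0)                # edge probabilities from inner products
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1); A = A + A.T                    # symmetric adjacency, no self-loops
    return X, A

X, A = sample_graph(200)
print(A.sum() // 2, "edges among", len(A), "nodes")
```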

23 citations


Journal ArticleDOI
07 Jun 2021
TL;DR: In this article, the exact sampling distribution of the estimated optimal portfolio weights and their characteristics is derived via its stochastic representation.
Abstract: Optimal portfolio selection problems are determined by the (unknown) parameters of the data generating process. If an investor wants to realize the position suggested by the optimal portfolios, he/she needs to estimate the unknown parameters and to account for the parameter uncertainty in the decision process. Most often, the parameters of interest are the population mean vector and the population covariance matrix of the asset return distribution. In this paper, we characterize the exact sampling distribution of the estimated optimal portfolio weights and their characteristics. This is done by deriving their sampling distribution via its stochastic representation. This approach possesses several advantages, e.g. (i) it determines the sampling distribution of the estimated optimal portfolio weights by expressions that can be used to draw samples from this distribution efficiently; (ii) the application of the derived stochastic representation provides an easy way to obtain the asymptotic approximation of the sampling distribution. The latter property is used to show that the high-dimensional asymptotic distribution of optimal portfolio weights is a multivariate normal and to determine its parameters. Moreover, a consistent estimator of optimal portfolio weights and their characteristics is derived under high-dimensional settings. Via an extensive simulation study, we investigate the finite-sample performance of the derived asymptotic approximation and study its robustness to the violation of the model assumptions used in the derivation of the theoretical results.
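
For intuition about what "sampling distribution of estimated weights" means in practice, the sketch below uses brute-force Monte Carlo (not the paper's exact stochastic representation) to look at the variability of estimated global minimum variance weights w = Σ⁻¹1 / (1ᵀΣ⁻¹1) under an assumed multivariate normal return model; the number of assets, sample size, and parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, reps = 5, 120, 5000                       # assets, sample size, Monte Carlo draws

mu = np.full(k, 0.05)
A = rng.normal(size=(k, k))
Sigma = A @ A.T / k + 0.05 * np.eye(k)          # assumed population covariance matrix
ones = np.ones(k)

def gmv_weights(cov):
    """Global minimum variance portfolio weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    w = np.linalg.solve(cov, ones)
    return w / w.sum()

true_w = gmv_weights(Sigma)
est_w = np.empty((reps, k))
for r in range(reps):
    returns = rng.multivariate_normal(mu, Sigma, size=n)
    est_w[r] = gmv_weights(np.cov(returns, rowvar=False))   # plug-in estimate from the sample

print("true weights   :", np.round(true_w, 3))
print("mean estimate  :", np.round(est_w.mean(axis=0), 3))
print("std of estimate:", np.round(est_w.std(axis=0), 3))
```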

15 citations


Posted Content
TL;DR: In this article, a new approach using extreme value theory is proposed to describe the distribution of the loudest template's detection statistic in an arbitrary template bank, which automatically generalizes to a wider class of detection statistics, including (but not limited to) line-robust statistics and transient continuous-wave signal hypotheses, and improves the estimation of the expected maximum detection statistic at a negligible computing cost.
Abstract: Searches for gravitational-wave signals are often based on maximizing a detection statistic over a bank of waveform templates, covering a given parameter space with a variable level of correlation. Results are often evaluated using a noise-hypothesis test, where the background is characterized by the sampling distribution of the loudest template. In the context of continuous gravitational-wave searches, properly describing said distribution is an open problem: current approaches focus on a particular detection statistic and neglect template-bank correlations. We introduce a new approach using extreme value theory to describe the distribution of the loudest template's detection statistic in an arbitrary template bank. Our new proposal automatically generalizes to a wider class of detection statistics, including (but not limited to) line-robust statistics and transient continuous-wave signal hypotheses, and improves the estimation of the expected maximum detection statistic at a negligible computing cost. The performance of our proposal is demonstrated on simulated data as well as by applying it to different kinds of (transient) continuous-wave searches using O2 Advanced LIGO data. We release an accompanying Python software package, distromax, implementing our new developments.
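
The underlying extreme-value idea can be sketched in a few lines: collect the loudest detection statistic from many simulated noise realizations of a template bank, fit a Gumbel distribution, and read off the expected maximum. This is only a toy with an assumed chi-squared detection statistic and independent templates; it does not reproduce the distromax package, its batching of maxima, or its treatment of template-bank correlations.

```python
import numpy as np
from scipy.stats import chi2, gumbel_r

rng = np.random.default_rng(42)
n_templates, n_realizations = 5000, 400

# loudest statistic per noise realization (assumed independent chi-squared templates, 4 dof)
stats = chi2.rvs(df=4, size=(n_realizations, n_templates), random_state=rng)
loudest = stats.max(axis=1)

loc, scale = gumbel_r.fit(loudest)              # extreme value (Gumbel) fit to the maxima
expected_max = gumbel_r.mean(loc=loc, scale=scale)
print(f"Gumbel fit: loc={loc:.2f}, scale={scale:.2f}, expected loudest={expected_max:.2f}")
```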

11 citations


Journal ArticleDOI
TL;DR: In this article, a novel Mahalanobis distance-based automatic threshold selection method is proposed, which uses a proposed transformation to map a Generalized Pareto distributed random variable (depicting peaks over tentative thresholds) from the original space to a standard exponential (Exp(1)) distributed random variable in a nondimensional space.
Abstract: An unresolved problem in statistical analysis of hydrological extremes (e.g., storms, floods) using the POT model is identification of the optimal threshold. There are various issues affecting the performance of different methods available for threshold selection (TS). To overcome those issues, this study contributes a novel Mahalanobis distance-based automatic TS method. It involves the use of a proposed transformation to map a Generalized Pareto distributed random variable (depicting peaks over tentative thresholds) from the original space to a standard exponential (Exp(1)) distributed random variable in a nondimensional space. The optimal threshold is identified as the one that minimizes the Mahalanobis distance between the L-moments of the transformed random variable and those of the population (i.e., the Exp(1) distribution) in the nondimensional space. Its effectiveness is demonstrated over four existing automatic TS methods through Monte Carlo simulation experiments and case studies over rainfall and streamflow data sets chosen from India, the United Kingdom, and Australia. The four methods include three based on goodness-of-fit (GoF) test statistics (of Anderson-Darling and two nonparametric tests), and a recent one based on the L-moment ratio diagram whose potential is unexplored in hydrology. This study further provides insight into the properties and effectiveness of the four TS methods, which is scant in the literature. Results indicate that there is inconsistency in the performance of GoF test-based methods across data sets exhibiting fat- and thin-tailed behavior, owing to their theoretical assumptions and the uncertainty associated with the sampling distribution of the test statistics. Issues affecting the performance of the L-moment ratio diagram-based TS method are also identified. The proposed method overcomes those issues and appears promising for hydrologic applications.
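
A stripped-down version of such an automatic threshold search is sketched below: for each tentative threshold, exceedances are fitted with a Generalized Pareto distribution, transformed to an approximately Exp(1) variable via the probability integral transform, and compared with the Exp(1) population L-moment ratios (1/2, 1/3). The Euclidean distance used here is a hedged stand-in for the paper's Mahalanobis distance, and the candidate-threshold grid, synthetic data, and minimum exceedance count are assumptions.

```python
import numpy as np
from scipy.stats import genpareto

def lmoment_ratios(x):
    """Sample L-CV (t2) and L-skewness (t3) from probability-weighted moments."""
    x = np.sort(np.asarray(x)); n = len(x); i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l2 / l1, l3 / l2

def select_threshold(data, candidates, min_exceedances=50):
    """Pick the threshold whose transformed exceedances look most Exp(1)-like."""
    target = np.array([0.5, 1.0 / 3.0])            # Exp(1) population (t2, t3)
    best_u, best_d = None, np.inf
    for u in candidates:
        y = data[data > u] - u
        if len(y) < min_exceedances:
            continue
        c, _, scale = genpareto.fit(y, floc=0)     # GPD fit to peaks over threshold u
        z = y / scale if abs(c) < 1e-6 else np.log1p(c * y / scale) / c   # GPD -> Exp(1)
        d = np.linalg.norm(np.array(lmoment_ratios(z)) - target)
        if d < best_d:
            best_u, best_d = u, d
    return best_u

rng = np.random.default_rng(0)
data = rng.gumbel(loc=10.0, scale=2.0, size=3000)              # assumed block-maxima-like series
grid = np.quantile(data, np.linspace(0.80, 0.98, 19))
print("selected threshold:", round(select_threshold(data, grid), 2))
```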

11 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that sampling designs for mapping are better compared on the basis of the distribution of the map quality indices over repeated selection of the calibration sample, rather than on a single sample selected per design.
Abstract: If a map is constructed through prediction with a statistical or non-statistical model, the sampling design used for selecting the sample on which the model is fitted plays a key role in the final map accuracy. Several sampling designs are available for selecting these calibration samples. Commonly, sampling designs for mapping are compared in real-world case studies by selecting just one sample for each of the sampling designs under study. In this study, we show that sampling designs for mapping are better compared on the basis of the distribution of the map quality indices over repeated selection of the calibration sample. In practice this is only feasible by subsampling a large dataset representing the population of interest, or by selecting calibration samples from a map depicting the study variable. This is illustrated with two real-world case studies. In the first case study a quantitative variable, soil organic carbon, is mapped by kriging with an external drift in France, whereas in the second case a categorical variable, land cover, is mapped by random forest in a region in France. The performance of two sampling designs for mapping are compared: simple random sampling and conditioned Latin hypercube sampling, at various sample sizes. We show that in both case studies the sampling distributions of map quality indices obtained with the two sampling design types, for a given sample size, show large variation and largely overlap. This shows that when comparing sampling designs for mapping on the basis of a single sample selected per design, there is a serious risk of an incidental result. Highlights: We provide a method to compare sampling designs for mapping. Random designs for selecting calibration samples should be compared on the basis of the sampling distribution of the map quality indices.
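
The core recommendation, comparing designs through the distribution of a map quality index over many repeated calibration samples, can be sketched as follows. This is a toy with a synthetic population, a random forest, and only simple random versus covariate-stratified sampling; conditioned Latin hypercube sampling and kriging with an external drift are not reproduced, and all sizes are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N, n, reps = 20000, 100, 50                       # population, calibration sample, repetitions

X = rng.uniform(0, 1, size=(N, 3))                # synthetic covariate "maps"
y = 10 * X[:, 0] + 5 * np.sin(6 * X[:, 1]) + rng.normal(0, 1, N)   # study variable

def rmse_for(idx):
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[idx], y[idx])
    return float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))     # population-level map error

def srs():                                        # simple random sampling
    return rng.choice(N, size=n, replace=False)

def stratified():                                 # equal allocation over deciles of one covariate
    bins = np.digitize(X[:, 0], np.quantile(X[:, 0], np.linspace(0, 1, 11)[1:-1]))
    return np.concatenate([rng.choice(np.where(bins == b)[0], size=n // 10, replace=False)
                           for b in range(10)])

for name, design in [("simple random", srs), ("stratified", stratified)]:
    errs = [rmse_for(design()) for _ in range(reps)]
    print(f"{name:14s} RMSE: mean={np.mean(errs):.3f}  "
          f"5-95% = ({np.percentile(errs, 5):.3f}, {np.percentile(errs, 95):.3f})")
```

The point of the exercise is the spread of the two RMSE distributions: a single calibration sample per design can easily land anywhere in that range, which is exactly the "incidental result" risk the paper warns about.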

11 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that the logarithm of the Bayes factor attains scatter of order unity, increasing significantly with stronger tension between the models under comparison, and they develop an approximate procedure that quantifies the sampling distribution of the evidence at a small additional computational cost.
Abstract: Summary statistics of the likelihood, such as the Bayesian evidence, offer a principled way of comparing models and assessing tension between, or within, the results of physical experiments. Noisy realisations of the data induce scatter in these model comparison statistics. For a realistic case of cosmological inference from large-scale structure, we show that the logarithm of the Bayes factor attains scatter of order unity, increasing significantly with stronger tension between the models under comparison. We develop an approximate procedure that quantifies the sampling distribution of the evidence at a small additional computational cost and apply it to real data to demonstrate the impact of the scatter, which acts to reduce the significance of any model discrepancies. Data compression is highlighted as a potential avenue to suppressing noise in the evidence to negligible levels, with a proof of concept demonstrated using Planck cosmic microwave background data.
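
The effect is easy to reproduce in a toy setting where the evidence is analytic. Below, data are drawn repeatedly from a unit-variance Gaussian and the log Bayes factor between "mean fixed at zero" and "mean free with a Gaussian prior" is computed for each realization; the model, prior width, sample size, and degree of tension are all assumptions chosen only to illustrate the noise-induced scatter, not the cosmological setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, tau2, reps = 100, 1.0, 2000          # sample size, prior variance of the mean, realizations
true_mean = 0.15                        # mild tension with the simpler model M0: mu = 0

def log_bayes_factor(y, tau2):
    """ln[Z(M1)/Z(M0)] for y_i ~ N(mu, 1), with M0: mu = 0 and M1: mu ~ N(0, tau2)."""
    n, ybar = len(y), y.mean()
    return -0.5 * np.log(1 + n * tau2) + (n * ybar) ** 2 / (2 * (n + 1 / tau2))

lnBF = np.array([log_bayes_factor(rng.normal(true_mean, 1.0, n), tau2) for _ in range(reps)])
print(f"mean ln BF = {lnBF.mean():.2f}, scatter (std) over data realizations = {lnBF.std():.2f}")
```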

10 citations


Proceedings ArticleDOI
17 Oct 2021
TL;DR: Zhang et al. propose a general distribution-based metric to depict the pairwise distance between images, where each image is characterized by its random augmentations, which can be viewed as samples from the corresponding latent semantic distribution.
Abstract: The majority of deep unsupervised hashing methods usually first construct pairwise semantic similarity information and then learn to map images into compact hash codes while preserving the similarity structure, which implies that the quality of hash codes highly depends on the constructed semantic similarity structure. However, since the features of images for each kind of semantics usually scatter in high-dimensional space with unknown distribution, previous methods could introduce a large number of false positives and negatives for boundary points of distributions in the local semantic structure based on pairwise cosine distances. To address this limitation, we propose a general distribution-based metric to depict the pairwise distance between images. Specifically, each image is characterized by its random augmentations that can be viewed as samples from the corresponding latent semantic distribution. Then we estimate the distances between images by calculating the sample distribution divergence of their semantics. By applying this new metric to deep unsupervised hashing, we come up with Distribution-based similArity sTructure rEconstruction (DATE). DATE can generate more accurate semantic similarity information by using non-parametric ball divergence. Moreover, DATE explores both semantic-preserving learning and contrastive learning to obtain high-quality hash codes. Extensive experiments on several widely-used datasets validate the superiority of our DATE.

Journal ArticleDOI
TL;DR: An adjustable inspection scheme based on one-sided capability indices was developed by adding a limit on the total sampling times to the conventional RGSP, and an example is provided to illustrate the applicability of the proposed plan.

Proceedings ArticleDOI
28 Jan 2021
TL;DR: A method to determine symmetric and unimodal pair overbounding distributions that are key to the determination of strict Gaussian bounds used in GNSS integrity is described.
Abstract: We describe a method to determine symmetric and unimodal pair overbounding distributions that are key to the determination of strict Gaussian bounds used in GNSS integrity. The method works by casting the search for the symmetric and unimodal bounding distribution as a linear program. We then use the proposed method to compute a set of Gaussian bounds with varying bias (for a given sample distribution) and to determine the approximately optimal choice for a given application. As an application, we apply the method to determine optimal Gaussian bounds for GPS clock and ephemeris errors.
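
A brute-force version of the zero-bias case is sketched below: grid-search for the smallest sigma whose Gaussian CDF overbounds a given error CDF in each half-line, evaluated over a finite range (CDF overbounding in the usual GNSS sense). The paper's linear-program search over symmetric unimodal bounds and its bias sweep are not reproduced, and the Gaussian-mixture error model and evaluation grid are assumptions.

```python
import numpy as np
from scipy.stats import norm

# assumed heavy-tailed error model: zero-mean Gaussian mixture (narrow core + broad component)
def error_cdf(x):
    return 0.9 * norm.cdf(x, scale=1.0) + 0.1 * norm.cdf(x, scale=3.0)

x = np.linspace(-8, 8, 4001)                     # finite evaluation grid (an assumption)
F = error_cdf(x)

def is_cdf_overbound(sigma):
    """Zero-mean Gaussian CDF overbound: G(x) >= F(x) for x <= 0 and G(x) <= F(x) for x >= 0,
    i.e. the bounding distribution places at least as much probability in each tail."""
    G = norm.cdf(x, scale=sigma)
    return np.all(G[x <= 0] >= F[x <= 0]) and np.all(G[x >= 0] <= F[x >= 0])

sigmas = np.arange(1.0, 4.005, 0.01)
sigma_star = next(s for s in sigmas if is_cdf_overbound(s))
print(f"smallest zero-bias Gaussian overbound on this grid: sigma ~ {sigma_star:.2f}")
```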

Journal ArticleDOI
TL;DR: Zhang et al. proposed an object-oriented sampling approach by segmenting image blocks expanded from systematically distributed seeds and carried out a rigorous comparison of seven sampling strategies, including random sampling, systematic sampling, stratified sampling, and manual sampling, to explore the impact of training sample distribution on the accuracy of land cover classification when the samples are limited.
Abstract: High-quality training samples are essential for accurate land cover classification. Due to the difficulties in collecting a large number of training samples, it is of great significance to collect a high-quality sample dataset with a limited sample size but effective sample distribution. In this paper, we proposed an object-oriented sampling approach by segmenting image blocks expanded from systematically distributed seeds and carried out a rigorous comparison of seven sampling strategies, namely random sampling, systematic sampling, three variants of stratified sampling (stratified sampling with strata of land cover classes based on an existing classification product, Latin hypercube sampling, and spatial Latin hypercube sampling), object-oriented sampling, and manual sampling, to explore the impact of training sample distribution on the accuracy of land cover classification when the samples are limited. Five study areas from different climate zones were selected along the China–Mongolia border. Our research identified the proposed object-oriented sampling approach as the first-choice sampling strategy for collecting training samples. This approach improved the diversity and completeness of the training sample set. Stratified sampling with strata defined by the combination of different attributes and stratified sampling with the strata of land cover classes had their limitations, and they performed well in specific situations when we have enough prior knowledge or a high-accuracy product. Manual sampling was greatly influenced by the experience of the interpreters. All these sampling strategies mentioned above outperformed random sampling and systematic sampling in this study. The results indicate that the sampling strategies of training datasets do have great impacts on the land cover classification accuracies when the sample size is limited. This paper will provide guidance for efficient training sample collection to increase classification accuracies.

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of how to define empirical estimates of exceedance probabilities and return periods associated with an ordered sample of observations, and derive some new results about the size of the confidence intervals for exceedance probability and return period.

Journal ArticleDOI
01 Jun 2021
TL;DR: In this article, a metric of consensus for Likert-type scales is proposed. The statistic measures the level of agreement for any given number of response options as the percentage of consensus among respondents, and is obtained as the relative weight of the distance from the point containing the proportions of observations that fall in each category to the centre of a regular polygon with as many vertices as categories, which corresponds to the point of maximum dissent.
Abstract: In this study, we present a metric of consensus for Likert-type scales. The statistic provides the level of agreement for any given number of response options as the percentage of consensus among respondents. With this aim, we use a geometric framework that allows us to analytically derive a positional indicator. The statistic is obtained as the relative weight of the distance from the point containing the proportions of observations that fall in each category to the centre of a regular polygon with as many vertices as categories, which corresponds to the point of maximum dissent. The polygon can be regarded as the area that encompasses all possible answering combinations. In order to assess the performance of the proposed metric of consensus, we conduct an iterated forecasting experiment to test whether the inclusion of the degree of agreement in households’ expectations improves out-of-sample forecast accuracy of the unemployment rate in seven European countries and the Euro Area. We find evidence that the level of consensus among households contains useful information to predict unemployment rates in all cases. This result shows the potential of agreement metrics to track the evolution of economic variables. Finally, we design a simulation experiment in which we compare the sampling distribution of the proposed metric for three- and five-response alternatives, finding that the distribution of the former shows a higher level of granularity and dispersion.

Journal ArticleDOI
18 Jun 2021
TL;DR: In this paper, new estimators of the finite population distribution function (DF) are proposed using supplementary information on the DF of the auxiliary variable under simple random sampling, and a comparative study is conducted to compare, theoretically and numerically, the adapted distribution function estimators of Cochran (1940), Murthy (1967), Bahl and Tuteja (1991), Rao (1991), Singh et al. (2009), and Grover and Kaur (2014) with the proposed estimators.
Abstract: In this paper, new estimators of the finite population distribution function (DF) are proposed using supplementary information on the DF of the auxiliary variable under simple random sampling. A comparative study is conducted to compare, theoretically and numerically, the adapted distribution function estimators of Cochran (1940), Murthy (1967), Bahl and Tuteja (1991), Rao (1991), Singh et al. (2009), and Grover and Kaur (2014) with the proposed estimators. It is found that the proposed estimators always perform better than the adapted estimators in terms of MSE and percentage relative efficiency.
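
As a concrete illustration of the general idea (a Cochran-type ratio adaptation only, not the specific estimators proposed in the paper), the sketch below compares the naive and ratio estimators of the population DF under repeated simple random sampling from a simulated finite population, exploiting the known DF of an auxiliary variable; the population model and evaluation points are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, reps = 10000, 200, 2000

x = rng.gamma(shape=2.0, scale=2.0, size=N)        # auxiliary variable with known population DF
y = 1.5 * x + rng.normal(0, 2.0, N)                # study variable, correlated with x

t_y, t_x = np.median(y), np.median(x)              # evaluation points of the DFs
F_y, F_x = np.mean(y <= t_y), np.mean(x <= t_x)    # true finite-population DF values

naive, ratio = [], []
for _ in range(reps):
    s = rng.choice(N, size=n, replace=False)       # simple random sample without replacement
    Fy_hat, Fx_hat = np.mean(y[s] <= t_y), np.mean(x[s] <= t_x)
    naive.append(Fy_hat)
    ratio.append(Fy_hat * F_x / Fx_hat)            # ratio estimator using the known F_x

for name, est in [("naive", naive), ("ratio", ratio)]:
    mse = np.mean((np.array(est) - F_y) ** 2)
    print(f"{name:5s} estimator: MSE = {mse:.2e}")
```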

Journal ArticleDOI
TL;DR: The energy test of multivariate normality is an affine-invariant test based on a characterization of equal distributions by energy distance; the test statistic is a degenerate kernel V-statistic, which asymptotically has a sampling distribution that is a Gaussian quadratic form under the null hypothesis of normality.
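
A Monte Carlo rendition of the energy statistic can be written compactly, as below. The expectations are approximated by simulation rather than the closed-form expressions used in the R energy package, and the parametric-bootstrap p value is an assumed stand-in for the asymptotic Gaussian-quadratic-form null (it also ignores the effect of estimating the mean and covariance), so this is a sketch of the idea only.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

rng = np.random.default_rng(0)

def energy_stat(y, n_mc=4000):
    """n * E-statistic between the sample and a standard d-variate normal:
    n * (2*mean||y - Z|| - E||Z - Z'|| - mean||y - y'||), expectations by Monte Carlo."""
    n, d = y.shape
    z, z2 = rng.standard_normal((n_mc, d)), rng.standard_normal((n_mc, d))
    term_yz = cdist(y, z).mean()
    term_zz = np.linalg.norm(z - z2, axis=1).mean()
    term_yy = 2.0 * pdist(y).sum() / n**2          # mean over all ordered pairs
    return n * (2 * term_yz - term_zz - term_yy)

def test_normality(y, n_boot=200):
    """Standardize the sample, then compare its statistic with normal-sample statistics."""
    L = np.linalg.cholesky(np.cov(y, rowvar=False))
    y_std = (y - y.mean(axis=0)) @ np.linalg.inv(L).T
    obs = energy_stat(y_std)
    null = [energy_stat(rng.standard_normal(y.shape)) for _ in range(n_boot)]
    return obs, float(np.mean(np.array(null) >= obs))

y = rng.standard_t(df=3, size=(150, 2))            # heavy-tailed, clearly non-normal sample
print(test_normality(y))
```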

Proceedings Article
01 Jan 2021
TL;DR: Zhang et al. proposed an end-to-end deep model embedded with the cross-entropy method (CEM) for unsupervised 3D point cloud registration.
Abstract: In this paper, by modeling the point cloud registration task as a Markov decision process, we propose an end-to-end deep model embedded with the cross-entropy method (CEM) for unsupervised 3D registration. Our model consists of a sampling network module and a differentiable CEM module. In our sampling network module, given a pair of point clouds, the sampling network learns a prior sampling distribution over the transformation space. The learned sampling distribution can be used as a "good" initialization of the differentiable CEM module. In our differentiable CEM module, we first propose a maximum consensus criterion based alignment metric as the reward function for the point cloud registration task. Based on the reward function, for each state, we then construct a fused score function to evaluate the sampled transformations, where we weight the current and future rewards of the transformations. Particularly, the future rewards of the sampled transformations are obtained by performing the iterative closest point (ICP) algorithm on the transformed state. By selecting the top-k transformations with the highest scores, we iteratively update the sampling distribution. Furthermore, in order to make the CEM differentiable, we use the sparsemax function to replace the hard top-k selection. Finally, we formulate a Geman-McClure estimator-based loss to train our end-to-end registration model. Extensive experimental results demonstrate the good registration performance of our method on benchmark datasets.
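
Stripped of the learned sampling network, the sparsemax relaxation, and the ICP look-ahead, the core cross-entropy loop can be illustrated on a toy 2D rigid registration problem (rotation angle plus translation). The smooth nearest-neighbour reward below is a hedged stand-in for the paper's maximum-consensus criterion, and the point clouds, sample counts, and elite fraction are all assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

def transform(points, theta, t):
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[c, -s], [s, c]]).T + t

def reward(src, tgt_tree, params):
    """Alignment reward: negative mean nearest-neighbour distance to the target cloud."""
    d, _ = tgt_tree.query(transform(src, params[0], params[1:]))
    return -d.mean()

# toy problem: the target is a rotated and translated copy of the source
src = np.vstack([rng.normal([0.0, 0.0], 0.10, size=(150, 2)),     # small dense blob
                 rng.normal([0.8, 0.3], 0.30, size=(150, 2))])    # larger diffuse blob
true_params = np.array([0.6, 0.3, -0.2])                          # (theta, tx, ty)
tgt = transform(src, true_params[0], true_params[1:]) + rng.normal(0, 0.01, size=src.shape)
tree = cKDTree(tgt)

# cross-entropy method: sample transformations, keep the top-k, refit the sampling distribution
mu, sigma = np.zeros(3), np.ones(3)
for _ in range(30):
    cand = mu + sigma * rng.standard_normal((200, 3))
    scores = np.array([reward(src, tree, c) for c in cand])
    elite = cand[np.argsort(scores)[-20:]]                        # top-k transformations
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
print("estimated (theta, tx, ty):", np.round(mu, 3), "  true:", true_params)
```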

Journal ArticleDOI
TL;DR: In this article, an online Wang-Mendel fuzzy inference model is proposed to address the modeling of continuous production processes with dynamic and nonlinear characteristics; the model extracts fuzzy rules from the raw data without prior knowledge.

Posted Content
TL;DR: A penalized empirical likelihood (PEL) estimation is proposed and it is shown that it achieves the oracle property under which the invalid moments can be consistently detected.
Abstract: Models defined by moment conditions are at the center of structural econometric estimation, but economic theory is mostly silent about moment selection. A large pool of valid moments can potentially improve estimation efficiency, whereas a few invalid ones may undermine consistency. This paper investigates the empirical likelihood estimation of these moment-defined models in high-dimensional settings. We propose a penalized empirical likelihood (PEL) estimation and show that it achieves the oracle property under which the invalid moments can be consistently detected. While the PEL estimator is asymptotically normally distributed, a projected PEL procedure can further eliminate its asymptotic bias and provide a more accurate normal approximation to the finite-sample distribution. Simulation exercises are carried out to demonstrate the excellent numerical performance of these methods in estimation and inference.

Journal ArticleDOI
Adam Loy
TL;DR: An overview of how the lineup protocol for visual inference can be used to build understanding of key statistical topics throughout the statistics curriculum is provided.
Abstract: In the classroom, we traditionally visualize inferential concepts using static graphics or interactive apps. For example, there is a long history of using apps to visualize sampling distributions. ...
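
A minimal lineup for a residual plot can be generated as below (matplotlib assumed; the data-generating model, panel count, and layout are illustrative rather than taken from the article): the real residual plot is hidden among null panels simulated from the fitted model, and a viewer who can pick it out has informal evidence of model misfit.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n, panels = 100, 20

x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + 0.08 * (x - 5) ** 2 + rng.normal(0, 1, n)   # mild curvature the line misses

beta = np.polyfit(x, y, 1)                                       # fit the (wrong) straight line
resid = y - np.polyval(beta, x)
sigma_hat = resid.std(ddof=2)

true_panel = rng.integers(panels)                                # position of the real plot
fig, axes = plt.subplots(4, 5, figsize=(12, 8), sharex=True, sharey=True)
for k, ax in enumerate(axes.ravel()):
    if k == true_panel:
        r = resid                                                # observed residuals
    else:
        y_null = np.polyval(beta, x) + rng.normal(0, sigma_hat, n)   # simulate under the model
        r = y_null - np.polyval(np.polyfit(x, y_null, 1), x)
    ax.scatter(x, r, s=8)
    ax.axhline(0.0, linewidth=0.8)
    ax.set_title(str(k + 1), fontsize=9)
fig.suptitle("Lineup: which residual plot looks different? (data plot: %d)" % (true_panel + 1))
plt.tight_layout()
plt.show()
```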

Journal ArticleDOI
TL;DR: This work derives the unconditional asymptotic joint sampling distribution of multiple correlation coefficients under the null and arbitrary alternatives, and uses it to construct multiple contrast tests and simultaneous confidence intervals for rank correlation measures in general multivariate factorial designs.

Journal ArticleDOI
TL;DR: In this article, the sampling distribution of the multiple coherence estimate between one periodic, deterministic signal and a set of N other Gaussian-distributed ones has been derived based on a relationship between the estimate and Hotelling's $T^2$ statistic extended to complex variables.

Journal ArticleDOI
02 Jun 2021
TL;DR: In this article, a sampling distribution and an optimal sampling interpolation algorithm are presented for mapping the antenna near-field values on a planar surface, based on the theoretical background related to a non-redundant sampling representation of the electromagnetic field.
Abstract: A convenient sampling distribution and an optimal sampling interpolation algorithm are presented here for mapping the antenna near-field values on a planar surface. The mapping procedure is based on the theoretical background related to a non-redundant sampling representation of the electromagnetic field and uses an unconventional arrangement of the sampling points. Their positions result from the standard plane-rectangular allocation by properly increasing the distance between neighbouring sampling points as their distance from the antenna grows. This makes it possible to map the values in a given area from the knowledge of a reduced number of samples and, therefore, conveniently shortens the measurement time. Numerical simulations and antenna measurements in an anechoic chamber have been carried out to test the effectiveness of the mapping procedure, whose accuracy has been proved by means of maximum and mean-square reconstruction errors at given output points.

Journal ArticleDOI
TL;DR: The presented study indicates that SESCA_bayes estimates the secondary structure composition with a significantly smaller uncertainty than its predecessor, SESCA_deconv, which is based on spectrum deconvolution, and provides more accurate estimates for circular dichroism spectra that contain considerable non-SS contributions.

Journal ArticleDOI
TL;DR: In this paper, the authors used the backpropagation neural network (BP) and random forest (RF) methods to test NPV cover extraction from Landsat 8 OLI images in the Mu Us Sandy Land.

Journal ArticleDOI
TL;DR: In this paper, the authors show that the noninvariance of the Wald test to such reparameterizations stems from the application of a Taylor series expansion to approximate the restriction's sampling distribution.
Abstract: Distinguishing substantively meaningful spillover effects from correlated residuals is of great importance in cross-sectional studies. Both forms of spatial dependence not only hold different implications for the choice of an unbiased estimator but also for the validity of inferences. To guide model specification, different empirical strategies involve the estimation of an unrestricted spatial Durbin model and subsequently use the Wald test to scrutinize the nonlinear restriction of common factors implied by pure error dependence. However, the Wald test’s sensitivity to algebraically equivalent formulations of the null hypothesis receives scant attention in the context of cross-sectional analyses. This article shows analytically that the noninvariance of the Wald test to such reparameterizations stems from the application of a Taylor series expansion to approximate the restriction’s sampling distribution. While the test is asymptotically valid, Monte Carlo simulations reveal that alternative formulations of the common factor restriction frequently produce conflicting conclusions in finite samples. An empirical example illustrates the substantive implications of this problem. Consequently, researchers should either base inferences on bootstrap critical values for the Wald statistic or use the likelihood ratio test, which is invariant to such reparameterizations, when deciding on the model specification that adequately reflects the spatial process generating the data.
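
The phenomenon is easy to reproduce outside the spatial setting with a textbook example: in a linear model, test the algebraically equivalent restrictions b1*b2 = 1 and b1 - 1/b2 = 0 with delta-method Wald statistics and compare finite-sample rejection rates. All simulation settings below are assumptions for illustration and have nothing to do with the spatial Durbin model itself.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, reps = 50, 5000
crit = chi2.ppf(0.95, df=1)
beta_true = np.array([2.0, 0.5])                          # satisfies b1 * b2 = 1 (null is true)

def wald(b, V, g, grad):
    """Delta-method Wald statistic for the scalar restriction g(beta) = 0."""
    G = np.atleast_2d(grad(b))
    return float(g(b) ** 2 / (G @ V @ G.T))

rej = np.zeros(2)
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta_true + rng.normal(size=n)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    V = resid @ resid / (n - 2) * np.linalg.inv(X.T @ X)  # estimated covariance of b
    # two algebraically equivalent formulations of the same null hypothesis
    w1 = wald(b, V, lambda b: b[0] * b[1] - 1, lambda b: np.array([b[1], b[0]]))
    w2 = wald(b, V, lambda b: b[0] - 1 / b[1], lambda b: np.array([1.0, 1 / b[1] ** 2]))
    rej += np.array([w1 > crit, w2 > crit])
print("finite-sample rejection rates (nominal 5%):", rej / reps)
```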

Journal ArticleDOI
TL;DR: In this paper, a Monte Carlo-based program (MC-Flux) is developed that repeatedly subsamples high-resolution CO2 flux datasets to assess how sampling strategy and sample density affect the accuracy of total flux estimates obtained by measuring CO2 flux at numerous points over a large area and applying statistics or geostatistical interpolation.
Abstract: Accurately locating and quantifying carbon dioxide (CO2) leakage to the atmosphere is important for diffuse degassing studies in volcanic / geothermal areas and for safety monitoring and/or carbon credit auditing of Carbon Capture and Storage (CCS) sites. This is typically conducted by measuring CO2 flux at numerous points over a large area and applying statistics or geostatistical interpolation. Accuracy of the results will depend on many factors related to survey and data-processing choices and site characteristics, and thus uncertainties can be difficult to quantify. To address this issue, we have developed a Monte Carlo-based program (MC-Flux) that repeatedly subsamples a high-resolution synthetic or real dataset using a choice of different sampling strategies (one random and four grid types) at multiple user-defined sample densities. The program keeps track of the anomalies found and estimates total flux using two statistical and two geostatistical approaches from the literature. This paper describes the use of MC-Flux to assess the potential impact of various sampling and interpretation decisions on the accuracy of the final results. Simulations show that an offset grid sample distribution yields the best results; however, relatively dense sampling is required to obtain a high probability of an accurate flux estimate. For the test dataset used, ordinary kriging interpolation produces a range of flux estimates that are centered on the true value, while sequential Gaussian simulation tends to slightly overestimate values at intermediate sample spacings and is sensitive to input parameters. These results point to the need for developing new approaches that decrease uncertainty, such as integration with high-resolution co-kriging datasets that complement the more accurate point flux measurements.
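
The program's basic logic, repeatedly subsampling a dense flux map at different grid spacings and recording the distribution of total-flux estimates, can be sketched as follows. A synthetic field, a plain regular grid with random offsets, and a simple mean-times-area estimator stand in for MC-Flux's five sampling strategies and its statistical and geostatistical estimators, and all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic high-resolution CO2 flux field: low background plus one leakage anomaly
nx = ny = 200                                   # 1 m cells over a 200 m x 200 m site
xx, yy = np.meshgrid(np.arange(nx), np.arange(ny))
field = 1.0 + 80.0 * np.exp(-(((xx - 60) ** 2 + (yy - 140) ** 2) / (2 * 8.0 ** 2)))
true_total = field.sum()                        # total flux (arbitrary units per cell)

def grid_estimate(spacing, rng):
    """Sample the field on a regular grid with a random offset and estimate the
    total flux as (mean sampled flux) x (site area in cells)."""
    ox, oy = rng.integers(spacing, size=2)
    samples = field[oy::spacing, ox::spacing]
    return samples.mean() * field.size

for spacing in (5, 10, 20, 40):
    est = np.array([grid_estimate(spacing, rng) for _ in range(500)])
    err = 100 * (est - true_total) / true_total
    print(f"spacing {spacing:2d} m: relative error 5-95% = "
          f"({np.percentile(err, 5):+.1f}%, {np.percentile(err, 95):+.1f}%)")
```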