
Showing papers on "Cumulative distribution function published in 2005"


Journal ArticleDOI
TL;DR: In this paper, a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration is proposed, which is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest.
Abstract: Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.

1,537 citations
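The probability integral transform (PIT) check described in the abstract can be sketched numerically: if forecasts are probabilistically calibrated, evaluating each predictive CDF at its realized outcome yields uniform values. All distributions and constants below are illustrative, not from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate a calibrated forecaster: each observation is drawn from the
# same Gaussian predictive distribution that the forecaster issues.
n = 5000
mu = rng.normal(size=n)            # forecast means vary from case to case
y = rng.normal(loc=mu, scale=1.0)  # observations drawn from the predictive law

# Probability integral transform: each predictive CDF evaluated at its outcome.
pit = stats.norm.cdf(y, loc=mu, scale=1.0)

# Under calibration the PIT values are i.i.d. uniform on [0, 1]; a flat
# histogram (roughly equal bin counts) indicates probabilistic calibration.
counts, _ = np.histogram(pit, bins=10, range=(0.0, 1.0))
```

A miscalibrated forecaster (e.g. one issuing overconfident, too-narrow predictive distributions) would instead produce a U-shaped PIT histogram.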


Journal ArticleDOI
TL;DR: This is a very useful handbook for engineers, especially those working in signal processing; it provides real-data bootstrap applications to illustrate the theory covered in the earlier chapters.
Abstract: Bootstrap has found many applications in the engineering field, including artificial neural networks, biomedical engineering, environmental engineering, image processing, and radar and sonar signal processing. Basic concepts of the bootstrap are summarized in each section as a step-by-step algorithm for ease of implementation. Most of the applications are taken from the signal processing literature. The principles of the bootstrap are introduced in Chapter 2. Both the nonparametric and parametric bootstrap procedures are explained. Babu and Singh (1984) have demonstrated that in general, these two procedures behave similarly for pivotal (Studentized) statistics. The fact that the bootstrap is not the solution for all problems has been known to the statistics community for a long time; however, this fact is rarely touched on in the manuscripts meant for practitioners. It was first observed by Babu (1984) that the bootstrap does not work in the infinite variance case. Bootstrap Techniques for Signal Processing explains the limitations of the bootstrap method with an example. I especially liked the presentation style. The basic results are stated without proofs; however, the application of each result is presented as a simple step-by-step process, easy for nonstatisticians to follow. The bootstrap procedures, such as the moving block bootstrap for dependent data, along with applications to autoregressive models and to estimation of power spectral density, are also presented in Chapter 2. Signal detection in the presence of noise is generally formulated as a testing of hypothesis problem. Chapter 3 introduces principles of bootstrap hypothesis testing. The topics are introduced with interesting real-life examples. Flow charts, typical in engineering literature, are used to aid explanations of the bootstrap hypothesis testing procedures.
The bootstrap leads to second-order correction due to pivoting; this improvement in the results due to pivoting is also explained. In the second part of Chapter 3, signal processing is treated as a regression problem. The performance of the bootstrap for matched filters as well as constant false-alarm rate matched filters is also illustrated. Chapters 2 and 3 focus on estimation problems. Chapter 4 introduces bootstrap methods used in model selection. Due to the inherent structure of the subject matter, this chapter may be difficult for nonstatisticians to follow. Chapter 5 is the most impressive chapter in the book, especially from the standpoint of statisticians. It provides real data bootstrap applications to illustrate the theory covered in the earlier chapters. These include applications to optimal sensor placement for knock detection and land-mine detection. The authors also provide a MATLAB toolbox comprising frequently used routines. Overall, this is a very useful handbook for engineers, especially those working in signal processing.

1,292 citations
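The basic nonparametric bootstrap the book builds on can be sketched in a few lines: resample the data with replacement, recompute the statistic, and read a percentile confidence interval off the bootstrap distribution. Data, seed, and interval level below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=200)  # skewed sample, true mean 2.0

# Nonparametric bootstrap: resample with replacement, recompute the statistic.
B = 2000
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(B)
])

# Percentile confidence interval from the bootstrap distribution.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
```

The parametric variant differs only in the resampling step: draws come from a fitted model rather than from the empirical distribution of the data.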


Book
08 Feb 2005

256 citations


Journal ArticleDOI
TL;DR: In this article, a new radiation scheme is proposed that uses the correlated-k distribution (CKD) method, and the definition of the k-distribution function, the transformation between frequency space and k space and the upper limit of the absorption coefficient in cumulative probability space (CPS) are discussed.
Abstract: A new radiation scheme is proposed that uses the correlated-k distribution (CKD) method. The definition of the k-distribution function, the transformation between frequency space and k space, and the upper limit of the absorption coefficient in cumulative probability space (CPS) are discussed. The corresponding relation between each interval in CPS and the heating rate profile provides a method for determining the width of intervals in CPS. Three schemes are discussed for handling the spectral overlap of gases. Method 1 rearranges the appropriate combination of gaseous absorption coefficients when the spectral overlap of two gases is extensive. Method 2 applies to most overlapping gases and addresses the most important aspects of each gas’s spectrum in each interval of CPS. Method 3 applies to weak gases only and seeks to adjust the main absorption coefficients in order that radiative forcing at the surface and the top of the atmosphere is correct. This model is quite efficient because 1) relativ...

240 citations


Journal ArticleDOI
TL;DR: A theoretical framework for a class of multivariate Weibull distributions, originated from Gaussian random processes, is introduced and analyzed, and novel analytical expressions for the joint probability density function, moment-generating function, and cumulative distribution function are derived for this class of distributions.
Abstract: Ascertaining the suitability of the Weibull distribution for modeling fading channels, a theoretical framework for a class of multivariate Weibull distributions, originated from Gaussian random processes, is introduced and analyzed. Novel analytical expressions for the joint probability density function (pdf), moment-generating function (mgf), and cumulative distribution function (cdf) are derived for the bivariate distribution of this class with not necessarily identical fading parameters and average powers. Two specific distributions with an arbitrary number of correlated variates are considered and studied: with exponential and with constant correlation, for which the pdfs are introduced. Both cases assume equal average fading powers, but not necessarily identical fading parameters. For the multivariate Weibull distribution with exponential correlation, useful corresponding formulas, as for the bivariate case, are derived. The presented theoretical results are applied to analyze the performance of several diversity receivers employing selection, equal-gain, and maximal-ratio combining (MRC) techniques operating over correlated Weibull fading channels. For these diversity receivers, several useful performance criteria such as the moments of the output signal-to-noise ratio (SNR) (including average output SNR and amount of fading) and outage probability are analytically derived. Moreover, the average symbol error probability for several coherent and noncoherent modulation schemes is studied using the mgf approach. The proposed mathematical analysis is complemented by various evaluation results, showing the effects of the fading severity as well as the fading correlation on the diversity receivers' performance.

240 citations


Journal ArticleDOI
TL;DR: It is proved that the CS algorithm is equivalent to a scheduling algorithm that regards the user rates as independent and identically distributed, and the average throughput of a user is independent of the probability distribution of other users.
Abstract: In this paper, we present a new wireless scheduling algorithm based on the cumulative distribution function (cdf) and a simple modification of it that limits the maximum starving time. This cdf-based scheduling (CS) algorithm selects the user for transmission based on the cdf of user rates, in such a way that the user whose rate is high enough, but least probable to become higher, is selected first. We prove that the CS algorithm is equivalent to a scheduling algorithm that regards the user rates as independent and identically distributed, and that the average throughput of a user is independent of the probability distributions of the other users. Thus, we can evaluate a user's exact throughput from that user's own distribution alone, which is a distinctive feature of the proposed algorithm. In addition, we propose a modification of the CS algorithm that limits the maximum starving time, and prove that the modification does not affect the average interservice time. This CS with starving-time limitation (CS-STL) algorithm turns out to limit the maximum starving time at the cost of a negligible throughput loss.

144 citations
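The CS selection rule can be sketched directly: each user's instantaneous rate is mapped through that user's own cdf, and the user with the largest cdf value wins the slot. Since F_i(R_i) is uniform on [0, 1] for every user, each user is served with equal probability regardless of its rate distribution. The rate distributions and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-user rate distributions (exponential with different means); the
# scheduler only needs each user's own CDF.
means = np.array([1.0, 2.0, 4.0])

def exp_cdf(r, m):
    """Exponential CDF: F(r) = 1 - exp(-r / m)."""
    return 1.0 - np.exp(-r / m)

served = np.zeros(3, dtype=int)
for _ in range(30000):
    rates = rng.exponential(means)        # instantaneous user rates
    # CS rule: pick the user whose current rate is least likely to get
    # higher, i.e. the one with the largest CDF value of its own rate.
    k = int(np.argmax(exp_cdf(rates, means)))
    served[k] += 1

share = served / served.sum()             # converges to 1/3 per user
```

This illustrates the fairness-in-selection property; the paper's throughput analysis and the starving-time limitation are beyond this sketch.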


Journal ArticleDOI
01 Jan 2005
TL;DR: In this article, an uncertainty analysis method is proposed with the purpose of accurately and efficiently estimating the cumulative distribution function (CDF), probability density function (PDF), and statistical moments of a response given the distributions of input variables.
Abstract: Uncertainty analysis, which assesses the impact of the uncertainty of input variables on responses, is an indispensable component in engineering design under uncertainty, such as reliability-based design and robust design. However, uncertainty analysis is an unaffordable computational burden in many engineering problems. In this paper, an uncertainty analysis method is proposed with the purpose of accurately and efficiently estimating the cumulative distribution function (CDF), probability density function (PDF), and statistical moments of a response given the distributions of input variables. The bivariate dimension reduction method and numerical integration are used to calculate the moments of the response; then saddlepoint approximations are employed to estimate the CDF and PDF of the response. The proposed method requires neither the derivatives of the response nor the search of the most probable point, which is needed in the commonly used first and second order reliability methods (FORM and SORM) and the recently developed first order saddlepoint approximation. The efficiency and accuracy of the proposed method are illustrated with three example problems. With the same computational cost, this method is more accurate for reliability assessment and much more efficient for estimating the full range of the distribution of a response than FORM and SORM. This method provides results as accurate as Monte Carlo simulation, with significantly reduced computational effort.

140 citations


Journal ArticleDOI
01 Jun 2005
TL;DR: It is shown that the lower envelope of a set of probabilities bounded by cumulative probability distributions is a belief function and that warming estimates on this basis can generate very imprecise uncertainty models.
Abstract: We apply belief functions to an analysis of future climate change. It is shown that the lower envelope of a set of probabilities bounded by cumulative probability distributions is a belief function. The large uncertainty about natural and socio-economic factors influencing estimates of future climate change is quantified in terms of bounds on cumulative probability. This information is used to construct a belief function for a simple climate change model, which then is projected onto an estimate of global mean warming in the 21st century. Results show that warming estimates on this basis can generate very imprecise uncertainty models.

115 citations


Journal ArticleDOI
TL;DR: The cumulative distribution function, the probability density function and the moment generating function of the MRT output signal-to-noise ratio (SNR) with imperfect CSI are derived, enabling the evaluation of some useful performance metrics such as the average error rate and the outage performance.
Abstract: Maximal ratio transmission (MRT) is designed assuming the availability of perfect channel state information (CSI) at both the transmitter and the receiver. However, perfect CSI is not available in practice. This paper investigates the impact of Gaussian estimation errors on the MRT performance in independently and identically distributed (i.i.d.) Rayleigh fading channels. We derive the cumulative distribution function (cdf), the probability density function (pdf) and the moment generating function (mgf) of the MRT output signal-to-noise ratio (SNR) with imperfect CSI, enabling the evaluation of some useful performance metrics such as the average error rate and the outage performance. Numerical and simulation results are provided to show the impact of imperfect CSI on the MRT performance.

113 citations


01 Jan 2005
TL;DR: In this article, a kernel type estimator for the conditional cumulative distribution function (cond-cdf) is introduced, an estimate of the quantiles is derived by inverting this estimated cond-cdf, and asymptotic properties are stated.
Abstract: This paper deals with a scalar response conditioned by a functional random variable. The main goal is to estimate nonparametrically the quantiles of such a conditional distribution when the sample is considered as an α-mixing sequence. Firstly, a kernel type estimator for the conditional cumulative distribution function (cond-cdf) is introduced. Afterwards, we derive an estimate of the quantiles by inverting this estimated cond-cdf, and asymptotic properties are stated. This approach can be applied in time series analysis. For that, the whole observed time series has to be split into a set of functional data, and the functional conditional quantile approach can be used both to forecast and to build confidence prediction bands. The El Niño time series illustrates this. AMS (2000) subject classification. Primary 62G05, 62G99; secondary 62M10.

100 citations
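A scalar-covariate sketch of the kernel cond-cdf estimator and its inversion (the paper treats a functional covariate; the Gaussian kernel, bandwidth, data model, and grids here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Data with Y | X = x ~ N(x, 1); the true conditional median at x is x.
n = 4000
X = rng.uniform(-1, 1, size=n)
Y = X + rng.normal(size=n)

def cond_cdf(y, x, h=0.2):
    """Kernel (Nadaraya-Watson type) estimate of F(y | x)."""
    w = np.exp(-0.5 * ((X - x) / h) ** 2)   # Gaussian kernel weights
    return np.sum(w * (Y <= y)) / np.sum(w)

def cond_quantile(p, x):
    """Invert the estimated cond-cdf on a grid to get the p-quantile."""
    grid = np.linspace(-4, 4, 801)
    vals = np.array([cond_cdf(y, x) for y in grid])  # nondecreasing in y
    return grid[np.searchsorted(vals, p)]

med = cond_quantile(0.5, 0.0)   # estimated conditional median at x = 0
```

In the paper's time-series use, X would be a functional datum (a past trajectory segment) and the kernel would act on a semi-metric between curves rather than on |X - x|.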


Journal ArticleDOI
TL;DR: In this article, two strategies are explored to tackle this flaw of the ranked probability skill score (RPSS) for ensemble systems with small ensemble size, and it is shown that the RPSS_{L=1}, based on the absolute rather than the squared difference between forecasted and observed cumulative probability distributions, is unbiased.
Abstract: The ranked probability skill score (RPSS) is a widely used measure to quantify the skill of ensemble forecasts. The underlying score is defined by the quadratic norm and is comparable to the mean squared error (mse), but it is applied in probability space. It is sensitive to the shape and the shift of the predicted probability distributions. However, as recently shown, the RPSS exhibits a negative bias for ensemble systems with small ensemble size. Here, two strategies are explored to tackle this flaw of the RPSS. First, the RPSS is examined for different norms L (RPSS_L). It is shown that the RPSS_{L=1}, based on the absolute rather than the squared difference between forecasted and observed cumulative probability distribution, is unbiased; RPSS_L defined with higher-order norms shows a negative bias. However, the RPSS_{L=1} is not strictly proper in a statistical sense. A second approach is then investigated, which is based on the quadratic norm but with sampling errors in climatological probabilities conside...
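The underlying ranked probability score (which the RPSS normalizes against a reference forecast) is the squared distance between forecast and observed cumulative category probabilities. A minimal three-category example:

```python
import numpy as np

def rps(forecast_probs, obs_category):
    """Ranked probability score: sum of squared differences between the
    forecast and observed cumulative category probabilities (lower is better)."""
    F = np.cumsum(forecast_probs)   # forecast cumulative distribution
    # Observed "cumulative distribution": a step function at the outcome.
    O = (np.arange(len(forecast_probs)) >= obs_category).astype(float)
    return np.sum((F - O) ** 2)

# Three categories (e.g. below / near / above normal), outcome in category 0.
sharp = rps([0.8, 0.1, 0.1], 0)       # confident, correct forecast
flat = rps([1 / 3, 1 / 3, 1 / 3], 0)  # climatological forecast
```

Here sharp = 0.2² + 0.1² = 0.05 beats flat = (2/3)² + (1/3)² = 5/9, matching the sharpness-rewards intuition; the small-ensemble bias discussed in the abstract appears when the forecast probabilities are themselves estimated from few ensemble members.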

Journal ArticleDOI
TL;DR: It is shown that the presented infinite-series expressions converge rapidly, and can be efficiently used to study several performance criteria for dual-diversity receivers operating over correlated Rician fading channels.
Abstract: Analytical expressions for the evaluation of the bivariate Rician cumulative distribution function (CDF), the covariance, and the characteristic function (CHF) are not known, despite their usefulness in wireless communications systems analysis. In this letter, motivated by the ability of the Rician model to describe fading in wireless communications, we derive infinite-series representations for the probability density function, the CDF, the covariance, and the CHF of two correlated Rician random variables. It is shown that the presented infinite-series expressions converge rapidly, and can be efficiently used to study several performance criteria for dual-diversity receivers operating over correlated Rician fading channels.

Journal ArticleDOI
TL;DR: New infinite series representations for the joint probability density function and the joint cumulative distribution function of three and four correlated Rayleigh RVs are derived and bounds on the error resulting from truncating the infinite series are derived.
Abstract: Few theoretical results are known about the joint distribution of three or more arbitrarily correlated Rayleigh random variables (RVs). Consequently, theoretical performance results are unknown for three- and four-branch equal gain combining (EGC), selection combining (SC), and generalized SC (GSC) in correlated Rayleigh fading. This paper redresses this gap by deriving new infinite series representations for the joint probability density function (pdf) and the joint cumulative distribution function (cdf) of three and four correlated Rayleigh RVs. Bounds on the error resulting from truncating the infinite series are derived. A classical approach, due to Miller, is used to derive our results. Unfortunately, Miller's approach cannot be extended to more than four variates and, in fact, the quadrivariate case considered in this paper appears to be the most general result possible. For brevity, we treat only a limited number of applications in this paper. The new pdf and cdf expressions are used to derive the outage probability of three-branch SC, the moments of the EGC output signal-to-noise ratio (SNR), and the moment generating function of the GSC(2,3) output SNR in arbitrarily correlated Rayleigh fading. A novel application of Bonferroni's inequalities allows new outage bounds for multibranch SC in arbitrarily correlated Rayleigh channels.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a method to assess the cycle distribution and the fatigue damage in stationary broad-band non-Gaussian processes; the method is a further development of an existing procedure proposed for Gaussian processes.

Patent
30 Jun 2005
TL;DR: In this paper, a computer is programmed to fit exponential models to upper percentile subsets of observed measurements for performance metrics collected as attributes of a computer system, defined from sets chosen to reduce model bias due to expected variations in system performance, e.g. those resulting from temporal usage patterns induced by end users and/or workload scheduling.
Abstract: A computer is programmed to fit exponential models to upper percentile subsets of observed measurements for performance metrics collected as attributes of a computer system. The subsets are defined from sets chosen to reduce model bias due to expected variations in system performance, e.g. those resulting from temporal usage patterns induced by end users and/or workload scheduling. Measurement levels corresponding to high cumulative probability, indicative of likely performance anomalies, are extrapolated from the fitted models generated from measurements of lower cumulative probability. These levels are used to establish and to automatically set warning and alert thresholds which signal to (human) administrators when performance anomalies are observed.
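The tail-extrapolation idea can be sketched with a simple exponential exceedance model fitted to an upper-percentile subset: estimate the tail scale from exceedances over a percentile, then solve for the level at a high cumulative probability. All distributions, percentiles, and constants below are illustrative assumptions, not the patent's method.

```python
import numpy as np

rng = np.random.default_rng(4)

# Observed metric samples (e.g. response times) with an exponential-ish tail.
x = rng.exponential(scale=10.0, size=20000)

# Fit an exponential model to the upper-percentile subset: exceedances
# over the 90th percentile, with the MLE of the exponential scale.
u = np.percentile(x, 90)
exceed = x[x > u] - u
beta = exceed.mean()

# Extrapolate the level with cumulative probability 0.999 under the
# fitted tail: P(X > u) * exp(-(t - u) / beta) = 0.001
# =>  t = u + beta * ln(P(X > u) / 0.001)
p_exceed = 0.10
alert = u + beta * np.log(p_exceed / 0.001)
```

Measurements would then be compared against `alert` to flag likely performance anomalies; the threshold is set from the bulk of the data, not from the rare extreme observations themselves.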

Journal ArticleDOI
TL;DR: In this paper, an alternative derivation of the asymptotic properties of the Maximum Product of Spacings estimator is proposed, based on the comparison between its objective function and that of the Maximum Likelihood estimator, for a scalar continuously distributed random variable x with probability density function (p.d.f.) f(x) ≡ f(x, θ).
Abstract: In the statistics literature, asymptotic properties of the Maximum Product of Spacings estimator are derived from first principles. We propose an alternative derivation based on the comparison between its objective function and that of the Maximum Likelihood estimator. 1. MOTIVATION. Consider a scalar continuously distributed random variable x with probability density function (p.d.f.) f(x) ≡ f(x, θ) and corresponding cumulative distribution function (c.d.f.) F(x) ≡ F(x, θ), known up to a possibly multidimensional parameter θ ∈ Θ. Denote the true value of the parameter by θ0. Suppose that a random sample x1, ..., xn is available. To estimate θ0, one usually uses the maximum likelihood (ML) estimator
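The Maximum Product of Spacings objective itself is easy to state: maximize the sum of log spacings of the fitted CDF evaluated at the order statistics. A minimal sketch for an exponential scale parameter, using a grid search purely for illustration (a real implementation would use a proper optimizer):

```python
import numpy as np

rng = np.random.default_rng(9)

# Sample from Exp(scale = 3.0); we estimate the scale theta by MPS.
x = np.sort(rng.exponential(scale=3.0, size=400))

def neg_log_spacings(theta):
    """Negative MPS objective for Exp(theta), CDF F(x) = 1 - exp(-x/theta):
    spacings are F-gaps between consecutive order statistics (plus end gaps)."""
    F = 1.0 - np.exp(-x / theta)
    spacings = np.diff(np.concatenate(([0.0], F, [1.0])))
    return -np.sum(np.log(spacings + 1e-300))  # guard against zero spacings

grid = np.linspace(0.5, 10.0, 2000)
theta_hat = grid[np.argmin([neg_log_spacings(t) for t in grid])]
```

For well-behaved models the MPS estimate is close to the ML estimate, which is the comparison the paper's derivation exploits; MPS remains usable in cases (e.g. unbounded likelihoods) where ML fails.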

Proceedings ArticleDOI
29 Jun 2005
TL;DR: Algorithms traditionally used for curve approximation are applied to reduce the size of a multidimensional tabulated Cumulative Distribution Function (CDF) by one to three orders of magnitude without compromising its fidelity.
Abstract: As image-based surface reflectance and illumination gain wider use in physically-based rendering systems, it is becoming more critical to provide representations that allow sampling light paths according to the distribution of energy in these high-dimensional measured functions. In this paper, we apply algorithms traditionally used for curve approximation to reduce the size of a multidimensional tabulated Cumulative Distribution Function (CDF) by one to three orders of magnitude without compromising its fidelity. These adaptive representations enable new algorithms for sampling environment maps according to the local orientation of the surface and for multiple importance sampling of image-based lighting and measured BRDFs.
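The table lookup that such compact representations accelerate is ordinary inverse-CDF sampling from a tabulated distribution. A toy 1-D version (the array below is an illustrative stand-in for, say, one row of an environment map, not the paper's adaptive representation):

```python
import numpy as np

rng = np.random.default_rng(5)

# A tabulated (unnormalized) 1-D energy distribution.
energy = np.array([0.1, 0.4, 0.2, 2.0, 0.8, 0.5])
cdf = np.cumsum(energy)
cdf /= cdf[-1]                 # normalized tabulated CDF

# Inverse-CDF sampling: draw u ~ U(0, 1), locate its bin in the table.
u = rng.random(100000)
idx = np.searchsorted(cdf, u)  # each bin drawn in proportion to its energy

freq = np.bincount(idx, minlength=energy.size) / u.size
target = energy / energy.sum()
```

The paper's contribution is compressing the tabulated `cdf` (via curve-approximation algorithms) so this lookup stays cheap for high-dimensional measured functions; the sampling step itself is unchanged.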

Proceedings ArticleDOI
25 May 2005
TL;DR: An EM algorithm to treat uncertainty zones around points of R^p is developed in order to estimate the parameters of a mixture model defined on R^p and obtain a fuzzy clustering or partition.
Abstract: This paper addresses the problem of fitting mixture densities to uncertain data using the EM algorithm. Uncertain data are modelled by multivariate uncertainty zones, which constitute a generalization of multivariate interval-valued data. We develop an EM algorithm to treat uncertainty zones around points of R^p in order to estimate the parameters of a mixture model defined on R^p and obtain a fuzzy clustering or partition. This EM algorithm requires the evaluation of multidimensional integrals over each uncertainty zone at each iteration. In the diagonal Gaussian mixture model case, these integrals can be computed by simply using the one-dimensional normal cumulative distribution function. Results on simulated data indicate that the proposed algorithm can estimate the true underlying density better than the classical EM algorithm applied to the imprecise data, especially when the imprecision degree is high.

Journal ArticleDOI
TL;DR: This paper describes a methodology and an accompanying computer program for estimating a vector of indicators by simple indicator cokriging and produces a variance-covariance matrix of the estimated vectors of indicators which is used to fit a model to the estimated local cdf by logistic regression.

Journal ArticleDOI
01 Oct 2005
TL;DR: This paper first fit marginal distributions of power price series to two special classes of distributions defined by quantile functions (termed Class I and Class II distributions), then uses a theoretical correlation structure to fit the empirical autocorrelation structure.
Abstract: We propose a class of stochastic mean-reverting models for electricity prices with Lévy process-driven Ornstein-Uhlenbeck (OU) processes being the building blocks. We first fit marginal distributions of power price series to two special classes of distributions defined by quantile functions (termed Class I and Class II distributions). A theoretical correlation structure is then used to fit the empirical autocorrelation structure. Lastly, based on results from the first two steps, we construct a stochastic process by superposing two OU processes. The focus of this paper is on fitting the marginal distribution. A Class I distribution has closed-form formulas for probability density, cumulative distribution function, and quantile function, while a Class II distribution may have extremely unbalanced tails. Both classes of distributions admit realistic modelling of the marginal distribution of electricity prices. This approach effectively captures not only the anomalous tail behaviors but also the correlation structure present in the electricity price series.

Journal ArticleDOI
TL;DR: This letter presents a closed-form union upper-bound for the cumulative distribution function of the weighted sum of N independent Rayleigh fading envelopes, and computer simulation results verify the tightness of the proposed bound.
Abstract: The problem of finding the distribution of the sum of more than two Rayleigh fading envelopes has never been solved in terms of tabulated functions. In this letter, we present a closed-form union upper-bound for the cumulative distribution function of the weighted sum of N independent Rayleigh fading envelopes. Computer simulation results verify the tightness of the proposed bound for several values of N. The proposed bound can be efficiently applied in various wireless applications, such as satellite communications, equal-gain receivers, and radars.
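Monte Carlo reference curves of the kind used to verify such bounds can be generated directly; this sketch produces the empirical CDF of a weighted Rayleigh sum (weights, scale, and sample size are illustrative; this is not the letter's closed-form bound):

```python
import numpy as np

rng = np.random.default_rng(6)

# Weighted sum of N independent Rayleigh envelopes (e.g. an equal-gain
# combiner with unequal branch weights); sigma and w are assumed values.
N, sigma = 4, 1.0
w = np.array([1.0, 0.8, 0.6, 0.4])
samples = (w * rng.rayleigh(sigma, size=(200000, N))).sum(axis=1)

def ecdf(t):
    """Empirical CDF of the weighted Rayleigh sum at threshold t."""
    return np.mean(samples <= t)

p_low, p_high = ecdf(1.0), ecdf(10.0)
```

Plotting `ecdf` over a grid of thresholds against an analytical upper bound would reproduce the kind of tightness comparison reported in the letter.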

Journal ArticleDOI
TL;DR: In this article, the authors prove existence of the conservation law with respect to a propagation-of-chaos result for systems of interacting particles with fixed intensity of jumps related to ν.
Abstract: We are interested in the one-dimensional scalar conservation law ∂_t u(t,x) = ν D^α u(t,x) − ∂_x A(u(t,x)), with fractional viscosity operator D^α v(x) = F^{-1}(|ξ|^α F(v)(ξ))(x), when the initial condition is the cumulative distribution function of a signed measure on R. We associate a nonlinear martingale problem with the Fokker-Planck equation obtained by spatial differentiation of the conservation law. After checking uniqueness for both the conservation law and the martingale problem, we prove existence thanks to a propagation-of-chaos result for systems of interacting particles with fixed intensity of jumps related to ν. The empirical cumulative distribution functions of the particles converge to the solution of the conservation law. As a consequence, it is possible to approximate this solution numerically by simulating the stochastic differential equation which gives the evolution of particles. Finally, when the intensity of jumps vanishes (ν→0) as the number of particles tends to +∞, we obtain that the empirical cumulative distribution functions converge to the unique entropy solution of the inviscid (ν=0) conservation law.

Journal ArticleDOI
TL;DR: An analytical framework to study search strategies in large-scale decentralized unstructured peer-to-peer (P2P) networks is presented, and it is shown how to derive the cumulative distribution function (CDF) of the time required by a peer to positively reply to a query.

Proceedings ArticleDOI
16 May 2005
TL;DR: In this article, the log shifted gamma (LSG) approximation was proposed to model the sum of M lognormal distributed random variables and the closed-form probability density function (PDF) of the resulting LSG random variable (RV) was presented.
Abstract: This paper proposes the log shifted gamma (LSG) approximation to model the sum of M lognormal distributed random variables. The closed-form probability density function (PDF) of the resulting LSG random variable (RV) is presented and its parameters are derived from those of the M individual lognormal RVs by using an iterative moment-matching technique. Simulation results on the cumulative distribution function (CDF) of the sum of M lognormal random variables in different conditions are used as reference curves to compare various approximation techniques. The LSG approximation is found to provide better accuracy over a wide CDF range, especially for large M and/or standard deviation.

Journal ArticleDOI
TL;DR: The crossing statistics of phase processes and random frequency modulation (FM) noise are studied for Nakagami-q fading channels and the derived analytical results are in excellent agreement with those obtained by computer simulations.
Abstract: The crossing statistics of phase processes and random frequency modulation (FM) noise are studied for Nakagami-q fading channels. Closed-form expressions are first derived for the probability density function (PDF) and the cumulative distribution function (CDF) of random FM noise. The crossing rate of the phase process is then obtained for any crossing level of the phase. Moreover, the conditional PDF of random FM noise and envelope processes-conditioned on the crossings of an arbitrary level of the phase-are investigated. Since the Rayleigh fading channel is a special case of the Nakagami-q fading channel, the derived expressions are verified by comparison with results known for Rayleigh fading channels. In addition, it is shown that the derived analytical results are in excellent agreement with those obtained by computer simulations. The presented results are useful, for example, for studying the statistics of noise spikes occurring in limiter-discriminator FM receivers and for investigating the cycle slipping phenomenon in phase-locked-loop schemes when considering the transmission over Nakagami-q mobile fading channels.

Journal ArticleDOI
TL;DR: In this paper, the authors used a statistical approach (cumulative probability density functions) obtained from five thousand Monte Carlo runs to investigate the impact of measurement noise and gross errors in harmonic state estimation.
Abstract: The effectiveness of harmonic state estimation (HSE) in identifying the location and magnitude of harmonic sources is largely dependent on the accuracy of the measurements. Measurement errors (or bad data) can be classified into two groups: measurement noise and gross errors. This paper uses a statistical approach (cumulative probability density functions) obtained from five thousand Monte Carlo runs to investigate the impact of measurement noise and gross errors in harmonic state estimation. The Lower South Island of the New Zealand system is used as the test system, and the results are probability curves containing the statistics of the estimation error. The effect of additional measurements on an over-determined system to filter noise is also discussed.

Journal ArticleDOI
TL;DR: In this paper, a graph theoretical approach is employed to describe the support set of the nonparametric maximum likelihood estimator for the cumulative distribution function given interval-censored and left-truncated data.
Abstract: Summary. A graph theoretical approach is employed to describe the support set of the nonparametric maximum likelihood estimator for the cumulative distribution function given interval-censored and left-truncated data. A necessary and sufficient condition for the existence of a nonparametric maximum likelihood estimator is then derived. Two previously analysed data sets are revisited.

01 Jan 2005
TL;DR: In this article, continuous, nonnegative random variables with a Schur-constant joint survival function are studied and the authors show that these distributions are characterized by having an Archimedean survival copula, determine the distributions of certain functions of the random variables, and examine dependence properties and correlation coefficients.
Abstract: We study continuous, nonnegative random variables with a Schur-constant joint survival function. We show that these distributions are characterized by having an Archimedean survival copula, determine the distributions of certain functions of the random variables, and examine dependence properties and correlation coefficients for random variables with Schur-constant survival functions.

Proceedings ArticleDOI
Luís Torgo1
21 Aug 2005
TL;DR: The notion of REC surfaces is presented, how to use them to compare the performance of models is described, and their use is illustrated with an important practical class of applications: the prediction of rare extreme values.
Abstract: This paper presents a generalization of Regression Error Characteristic (REC) curves. REC curves describe the cumulative distribution function of the prediction error of models and can be seen as a generalization of ROC curves to regression problems. REC curves provide useful information for analyzing the performance of models, particularly when compared to error statistics like for instance the Mean Squared Error. In this paper we present Regression Error Characteristic (REC) surfaces that introduce a further degree of detail by plotting the cumulative distribution function of the errors across the distribution of the target variable, i.e. the joint cumulative distribution function of the errors and the target variable. This provides a more detailed analysis of the performance of models when compared to REC curves. This extra detail is particularly relevant in applications with non-uniform error costs, where it is important to study the performance of models for specific ranges of the target variable. In this paper we present the notion of REC surfaces, describe how to use them to compare the performance of models, and illustrate their use with an important practical class of applications: the prediction of rare extreme values.
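The REC curves that the paper's surfaces generalize are just empirical CDFs of model errors plotted against an error tolerance. A minimal two-model sketch (error distributions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Two hypothetical regression models evaluated on the same test set.
err_a = np.abs(rng.normal(scale=0.5, size=2000))  # model A: smaller errors
err_b = np.abs(rng.normal(scale=1.0, size=2000))  # model B: larger errors

def rec(errors, tolerances):
    """REC curve: fraction of cases with error within each tolerance,
    i.e. the empirical CDF of the prediction errors."""
    return np.array([(errors <= t).mean() for t in tolerances])

tol = np.linspace(0.0, 3.0, 31)
rec_a, rec_b = rec(err_a, tol), rec(err_b, tol)  # A dominates B
```

A REC surface would additionally condition this CDF on the value of the target variable, giving the joint CDF of errors and targets described in the abstract.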

Journal ArticleDOI
TL;DR: In this article, two guesses at the density function of a continuous variable are compared: the density of a specified distribution with fitted parameters, and the same density evaluated at quantiles corresponding to plotting positions associated with the sample's order statistics.
Abstract: Density probability plots show two guesses at the density function of a continuous variable, given a data sample. The first guess is the density function of a specified distribution (e.g., normal, exponential, gamma, etc.) with appropriate parameter values plugged in. The second guess is the same density function evaluated at quantiles corresponding to plotting positions associated with the sample's order statistics. If the specified distribution fits well, the two guesses will be close. Such plots, suggested by Jones and Daly in 1995, are explained and discussed with examples from simulated and real data. Comparisons are made with histograms, kernel density estimation, and quantile-quantile plots.
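The two "guesses" can be computed in a few lines for a normal model; sample size, parameters, and the (i − 0.5)/n plotting positions below are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = np.sort(rng.normal(loc=5.0, scale=2.0, size=500))

# First guess: the hypothesized density with fitted parameter values,
# evaluated at the ordered data.
mu, sd = x.mean(), x.std(ddof=1)
f_fitted = stats.norm.pdf(x, mu, sd)

# Second guess: the same density evaluated at quantiles matching the
# plotting positions of the order statistics, p_i = (i - 0.5) / n.
n = x.size
p = (np.arange(1, n + 1) - 0.5) / n
f_plotpos = stats.norm.pdf(stats.norm.ppf(p, mu, sd), mu, sd)

# If the normal model fits, the two curves nearly coincide.
gap = np.abs(f_fitted - f_plotpos).max()
```

Plotting `f_fitted` and `f_plotpos` against `x` gives the density probability plot; systematic separation between the curves signals lack of fit, much as departure from the diagonal does in a quantile-quantile plot.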