
Showing papers on "Nonparametric statistics published in 2007"


Book
19 Jan 2007
TL;DR: This handbook provides you with everything you need to know about parametric and nonparametric statistical procedures, and helps you choose the best test for your data, interpret the results, and better evaluate the research of others.
Abstract: With more than 500 pages of new material, the Handbook of Parametric and Nonparametric Statistical Procedures, Fourth Edition carries on the esteemed tradition of the previous editions, providing up-to-date, in-depth coverage of now more than 160 statistical procedures. The book also discusses both theoretical and practical statistical topics, such as experimental design, experimental control, and statistical analysis. New to the Fourth Edition: multivariate statistics, including matrix algebra, multiple regression, Hotelling's T2, MANOVA, MANCOVA, discriminant function analysis, canonical correlation, logistic regression, and principal components/factor analysis; clinical trials, survival analysis, tests of equivalence, analysis of censored data, and analytical procedures for crossover designs; regression diagnostics, including the Durbin-Watson test; log-linear analysis of contingency tables, Mantel-Haenszel analysis of multiple 2 × 2 contingency tables, trend analysis, and analysis of variance for a Latin square design; the Levene and Brown-Forsythe tests for evaluating homogeneity of variance, the Jarque-Bera test of normality, and the extreme studentized deviate test for identifying outliers; confidence intervals for the population median and for the difference between two population medians; and the relationship between the exponential and Poisson distributions. Eliminating the need to search across numerous books, this handbook provides everything you need to know about parametric and nonparametric statistical procedures. It helps you choose the best test for your data, interpret the results, and better evaluate the research of others.
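Several of the procedures named above have standard open-source implementations. A minimal sketch of how a few of them might be run in Python with scipy.stats follows; the two simulated samples and the particular tests chosen are purely illustrative, and this is not code from the handbook itself.

```python
# Illustrative run of a few procedures named above, using scipy.stats.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=50)
b = rng.normal(loc=11.0, scale=3.0, size=60)

# Levene test for homogeneity of variance
# (center="median" gives the Brown-Forsythe variant)
print(stats.levene(a, b, center="median"))

# Jarque-Bera test of normality
print(stats.jarque_bera(a))

# A classic nonparametric two-sample test (Mann-Whitney U)
print(stats.mannwhitneyu(a, b, alternative="two-sided"))
```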

5,097 citations


Journal ArticleDOI
TL;DR: A unified approach is proposed that makes it possible for researchers to preprocess data with matching and then apply the best parametric techniques they would have used anyway; this procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
Abstract: Although published works rarely include causal estimates from more than a few model specifications, authors usually choose the presented estimates from numerous trial runs readers never see. Given the often large variation in estimates across choices of control variables, functional forms, and other modeling assumptions, how can researchers ensure that the few estimates presented are accurate or representative? How do readers know that publications are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis? And how do we evaluate or even define statistical properties like unbiasedness or mean squared error when no unique model or estimator even exists? Matching methods, which offer the promise of causal inference with fewer assumptions, constitute one possible way forward, but crucial results in this fast-growing methodological literature are often grossly misinterpreted. We explain how to avoid these misinterpretations and propose a unified approach that makes it possible for researchers to preprocess data with matching (such as with the easy-to-use software we offer) and then to apply the best parametric techniques they would have used anyway. This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences.
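A minimal sketch of the preprocess-then-analyze idea: match on an estimated propensity score, then run the parametric model one would have used anyway on the matched sample. The simulated data, the logit propensity model, and the simple 1:1 nearest-neighbor rule are illustrative assumptions, not the authors' own software.

```python
# Sketch: 1:1 nearest-neighbor propensity-score matching as a preprocessing
# step, followed by the parametric model one would have used anyway.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 2))                      # confounders
t = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))  # treatment assignment
y = 1.0 * t + x @ np.array([0.5, -0.3]) + rng.normal(size=n)

# Step 1: estimate propensity scores with a logit model
ps = sm.Logit(t, sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))

# Step 2: match each treated unit to its nearest control on the propensity score
treated = np.where(t == 1)[0]
controls = np.where(t == 0)[0]
matched = [controls[np.argmin(np.abs(ps[controls] - ps[i]))] for i in treated]
keep = np.concatenate([treated, np.array(matched)])

# Step 3: run the parametric analysis on the matched (preprocessed) sample
X = sm.add_constant(np.column_stack([t[keep], x[keep]]))
print(sm.OLS(y[keep], X).fit().params)
```

Because the matching step only selects or reweights observations, the subsequent parametric analysis stays exactly as it would otherwise have been; the point of the preprocessing is that its conclusions become less sensitive to that model's exact specification.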

3,601 citations


Journal ArticleDOI
TL;DR: A coherent data-generating process (DGP) is described for two-stage procedures in which nonparametric estimates of productive efficiency are regressed on environmental variables to account for exogenous factors that might affect firms' performance.

2,915 citations


Book Chapter
01 Dec 2007
TL;DR: This paper proposes a nonparametric method that directly produces resampling weights without distribution estimation by matching distributions between training and testing sets in feature space; experimental results demonstrate that the method works well in practice.
Abstract: We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
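A hedged sketch of this idea in the spirit of kernel mean matching: choose training-set weights so that the weighted training mean in an RBF feature space matches the test-set mean. The kernel width, the weight bound, the equality constraint used in place of a slack constraint, and the SLSQP solver are all illustrative simplifications, not the paper's exact formulation.

```python
# Sketch: kernel-mean-matching-style resampling weights that align the
# training sample with the test sample in an RBF feature space.
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, C, sigma):
    d2 = ((A[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(2)
x_tr = rng.normal(0.0, 1.0, size=(150, 1))   # training sample (biased)
x_te = rng.normal(0.5, 1.0, size=(150, 1))   # test sample

sigma, b_max = 1.0, 10.0
n_tr, n_te = len(x_tr), len(x_te)
K = rbf_kernel(x_tr, x_tr, sigma)
kappa = (n_tr / n_te) * rbf_kernel(x_tr, x_te, sigma).sum(axis=1)

objective = lambda beta: 0.5 * beta @ K @ beta - kappa @ beta
constraint = {"type": "eq", "fun": lambda beta: beta.sum() - n_tr}  # weights average to one
res = minimize(objective, np.ones(n_tr), method="SLSQP",
               bounds=[(0.0, b_max)] * n_tr, constraints=[constraint])
beta = res.x                                  # importance weights for the training points
print(np.round(beta[:5], 3))
```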

1,227 citations


Posted Content
TL;DR: In this paper, the authors proposed a regression method to estimate the impact of explanatory variables on quantiles of the unconditional (marginal) distribution of an outcome variable, which consists of running a regression of the (recentered) influence function (RIF) on the explanatory variables.
Abstract: We propose a new regression method to estimate the impact of explanatory variables on quantiles of the unconditional (marginal) distribution of an outcome variable. The proposed method consists of running a regression of the (recentered) influence function (RIF) of the unconditional quantile on the explanatory variables. The influence function is a widely used tool in robust estimation that can easily be computed for each quantile of interest. We show how standard partial effects, as well as policy effects, can be estimated using our regression approach. We propose three different regression estimators based on a standard OLS regression (RIF-OLS), a logit regression (RIF-Logit), and a nonparametric logit regression. We also discuss how our approach can be generalized to other distributional statistics besides quantiles.
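A minimal sketch of the RIF-OLS variant for a single quantile: compute the recentered influence function of the tau-th unconditional quantile, then regress it on the covariates by OLS. The simulated data and the Gaussian kernel density estimate of the density at the quantile are illustrative assumptions.

```python
# Sketch of RIF-OLS for one unconditional quantile: build the recentered
# influence function of the tau-th quantile and regress it on covariates.
import numpy as np
import statsmodels.api as sm
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([0.8, -0.5]) + rng.normal(size=n)

tau = 0.5
q_tau = np.quantile(y, tau)
f_q = gaussian_kde(y)(q_tau)[0]            # density of y at the quantile
rif = q_tau + (tau - (y <= q_tau)) / f_q   # recentered influence function

# RIF-OLS: effect of each covariate on the tau-th unconditional quantile
print(sm.OLS(rif, sm.add_constant(x)).fit().params)
```

The fitted coefficients are then read as effects on the tau-th quantile of the marginal distribution of the outcome, not on the conditional quantile.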

957 citations


Journal ArticleDOI
TL;DR: The present book applies kernel regression techniques to functional data problems such as functional regression and classification, where the predictor is a function; nonparametric statisticians should feel very much at home with the approach taken in the book.
Abstract: (2007). Nonparametric Functional Data Analysis: Theory And Practice. Technometrics: Vol. 49, No. 2, pp. 226-226.
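A rough sketch of the kind of estimator the book is concerned with: Nadaraya-Watson regression in which the predictor is a curve and closeness between curves is measured by a semi-metric. The simulated curves, the simple L2 semi-metric on a common grid, the Gaussian kernel, and the bandwidth are illustrative assumptions.

```python
# Sketch: kernel regression with a functional (curve) predictor, using an
# L2 semi-metric between discretized curves.
import numpy as np

rng = np.random.default_rng(4)
t_grid = np.linspace(0.0, 1.0, 50)
dt = t_grid[1] - t_grid[0]
curves = rng.normal(size=(100, 50)).cumsum(axis=1)      # 100 observed predictor curves
y = curves.mean(axis=1) + 0.1 * rng.normal(size=100)    # scalar responses

def semi_metric(c1, c2):
    """Approximate L2 distance between two discretized curves."""
    return np.sqrt((((c1 - c2) ** 2) * dt).sum())

def predict(new_curve, h=2.0):
    d = np.array([semi_metric(new_curve, c) for c in curves])
    w = np.exp(-0.5 * (d / h) ** 2)                      # Gaussian kernel weights
    return (w @ y) / w.sum()

print(round(predict(curves[0]), 3))
```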

805 citations


Posted Content
Xiaohong Chen1
TL;DR: The method of sieves as discussed by the authors can be used to estimate semi-nonparametric econometric models with various constraints, such as monotonicity, convexity, additivity, multiplicity, exclusion and nonnegativity.
Abstract: Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite-dimensional parameter spaces that may not be compact and the optimization problem may no longer be well-posed. The method of sieves provides one way to tackle such difficulties by optimizing an empirical criterion over a sequence of approximating parameter spaces (i.e., sieves); the sieves are less complex but are dense in the original space and the resulting optimization problem becomes well-posed. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated semi-nonparametric models with (or without) endogeneity and latent heterogeneity. It can easily incorporate prior information and constraints, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and nonnegativity. It can simultaneously estimate the parametric and nonparametric parts in semi-nonparametric models, typically with optimal convergence rates for both parts. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite-dimensional parameters. Examples are used to illustrate the general results.
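A minimal sketch of sieve estimation in its simplest form: approximate an unknown regression function by least squares on a finite-dimensional basis (the sieve) whose dimension grows slowly with the sample size. The polynomial basis and the K ~ n^(1/3) growth rule are illustrative choices, not prescriptions from the chapter.

```python
# Sketch: sieve least-squares estimation of an unknown regression function
# with a polynomial basis whose dimension grows with the sample size.
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + 0.3 * rng.normal(size=n)    # unknown true function sin(3x)

K = int(np.ceil(n ** (1 / 3)))                   # sieve dimension (number of basis terms)
basis = np.vander(x, K + 1, increasing=True)     # 1, x, x^2, ..., x^K
coef, *_ = np.linalg.lstsq(basis, y, rcond=None)

x_new = np.linspace(-1, 1, 5)
fitted = np.vander(x_new, K + 1, increasing=True) @ coef
print(np.round(fitted, 3))
```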

654 citations


Journal ArticleDOI
TL;DR: This article developed estimators for quantile treatment effects under the identifying restriction that selection to treatment is based on observable characteristics, without requiring computation of the conditional quantiles of the potential outcomes.
Abstract: This paper develops estimators for quantile treatment effects under the identifying restriction that selection to treatment is based on observable characteristics. Identification is achieved without requiring computation of the conditional quantiles of the potential outcomes. Instead, the identification results for the marginal quantiles lead to an estimation procedure for the quantile treatment effect parameters that has two steps: nonparametric estimation of the propensity score and computation of the difference between the solutions of two separate minimization problems. Root-N consistency, asymptotic normality, and achievement of the semiparametric efficiency bound are shown for that estimator. A consistent estimation procedure for the variance is also presented. Finally, the method developed here is applied to evaluation of a job training program and to a Monte Carlo exercise. Results from the empirical application indicate that the method works relatively well even for a data set with limited overlap between treated and controls in the support of covariates. The Monte Carlo study shows that, for a relatively small sample size, the method produces estimates with good precision and low bias, especially for middle quantiles.
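A hedged sketch of the two-step idea: estimate the propensity score, then obtain each marginal quantile of the potential outcomes by minimizing a weighted check function, with the quantile treatment effect given by the difference. A parametric logit stands in for the paper's nonparametric first step, and the data are simulated.

```python
# Sketch of a two-step quantile treatment effect estimator:
# (1) estimate the propensity score, (2) solve two weighted check-function
# minimizations and take the difference of the resulting quantiles.
import numpy as np
import statsmodels.api as sm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
n = 2000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 1.0 * t + x + rng.normal(size=n)            # true effect = 1 at all quantiles

ps = sm.Logit(t, sm.add_constant(x)).fit(disp=0).predict(sm.add_constant(x))
tau = 0.5

def check(u, tau):
    return u * (tau - (u < 0))                  # quantile check (pinball) loss

def weighted_quantile(w):
    obj = lambda q: np.sum(w * check(y - q, tau))
    return minimize_scalar(obj, bounds=(y.min(), y.max()), method="bounded").x

q1 = weighted_quantile(t / ps)                  # tau-quantile of Y(1)
q0 = weighted_quantile((1 - t) / (1 - ps))      # tau-quantile of Y(0)
print("QTE at tau=0.5:", round(q1 - q0, 3))
```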

543 citations


Journal ArticleDOI
TL;DR: This book deals with probability distributions, discrete and continuous densities, distribution functions, bivariate distributions, means, variances, covariance, correlation, and some random process material.
Abstract: Chapter 3 deals with probability distributions, discrete and continuous densities, distribution functions, bivariate distributions, means, variances, covariance, correlation, and some random process material. Chapter 4 is a detailed study of the concept of utility including the psychological aspects, risk, attributes, rules for utilities, multidimensional utility, and normal form of analysis. Chapter 5 treats games and optimization, linear optimization, and mixed strategies. Entropy is the topic of Chapter 6 with sections devoted to entropy, disorder, information, Shannon’s theorem, demon’s roulette, Maxwell– Boltzmann distribution, Schrodinger’s nutshell, maximum entropy probability distributions, blackbodies, and Bose–Einstein distribution. Chapter 7 is standard statistical fare including transformations of random variables, characteristic functions, generating functions, and the classic limit theorems such as the central limit theorem and the laws of large numbers. Chapter 8 is about exchangeability and inference with sections on Bayesian techniques and classical inference. Partial exchangeability is also treated. Chapter 9 considers such things as order statistics, extreme value, intensity, hazard functions, and Poisson processes. Chapter 10 covers basic elements of risk and reliability, while Chapter 11 is devoted to curve fitting, regression, and Monte Carlo simulation. There is an ample number of exercises at the ends of the chapters with answers or comments on many of them in an appendix in the back of the book. Other appendices are on the common discrete and continuous distributions and mathematical aspects of integration.

539 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a test for comparing the out-of-sample accuracy of competing density forecasts of a variable, which is valid under general conditions: the data can be heterogeneous and the forecasts can be based on (nested or non-nested) parametric models or produced by semiparametric, nonparametric, or Bayesian estimation techniques.
Abstract: We propose a test for comparing the out-of-sample accuracy of competing density forecasts of a variable. The test is valid under general conditions: The data can be heterogeneous and the forecasts can be based on (nested or nonnested) parametric models or produced by semiparametric, nonparametric, or Bayesian estimation techniques. The evaluation is based on scoring rules, which are loss functions defined over the density forecast and the realizations of the variable. We restrict attention to the logarithmic scoring rule and propose an out-of-sample “weighted likelihood ratio” test that compares weighted averages of the scores for the competing forecasts. The user-defined weights are a way to focus attention on different regions of the distribution of the variable. For a uniform weight function, the test can be interpreted as an extension of Vuong's likelihood ratio test to time series data and to an out-of-sample testing framework. We apply the tests to evaluate density forecasts of U.S. inflation produc...
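A minimal sketch of the weighted likelihood ratio idea under a uniform weight function: compute log scores of two competing density forecasts over the evaluation sample, average the weighted score differences, and studentize with a Newey-West long-run variance. The two fixed Gaussian forecasts and the truncation lag are illustrative assumptions.

```python
# Sketch: out-of-sample comparison of two density forecasts via weighted
# log-score differences and a Newey-West t-statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T = 300
y = rng.normal(size=T)                                 # realized values

# two competing one-step-ahead density forecasts (here: fixed densities)
logscore_f = stats.norm.logpdf(y, loc=0.0, scale=1.0)  # forecast A
logscore_g = stats.norm.logpdf(y, loc=0.2, scale=1.5)  # forecast B
w = np.ones(T)                                         # uniform weight function

d = w * (logscore_f - logscore_g)                      # weighted score differences

def newey_west_var(d, lags=5):
    """Newey-West long-run variance of the score differences."""
    d = d - d.mean()
    v = np.mean(d * d)
    for k in range(1, lags + 1):
        v += 2 * (1 - k / (lags + 1)) * np.mean(d[k:] * d[:-k])
    return v

t_stat = np.sqrt(T) * d.mean() / np.sqrt(newey_west_var(d))
print("weighted likelihood ratio test statistic:", round(t_stat, 3))
```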

535 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the sparsity oracle properties of l1-penalized least squares in nonparametric regression with random design and showed that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of nonzero components of the oracle vector.
Abstract: This paper studies oracle properties of l1-penalized least squares in a nonparametric regression setting with random design. We show that the penalized least squares estimator satisfies sparsity oracle inequalities, i.e., bounds in terms of the number of non-zero components of the oracle vector. The results are valid even when the dimension of the model is (much) larger than the sample size and the regression matrix is not positive definite. They can be applied to high-dimensional linear regression, to nonparametric adaptive regression estimation and to the problem of aggregation of arbitrary estimators.
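A small sketch of the setting these oracle inequalities cover: l1-penalized least squares over a large dictionary of basis functions, with the dictionary size exceeding the sample size. The cosine dictionary and the penalty level are illustrative choices, and scikit-learn's Lasso is used only as a convenient solver.

```python
# Sketch: l1-penalized least squares over a dictionary of basis functions
# whose size exceeds the sample size.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(0, 1, size=n)
y = np.cos(2 * np.pi * x) + 0.2 * rng.normal(size=n)   # sparse in the cosine dictionary

M = 500                                                 # dictionary size >> sample size
D = np.column_stack([np.cos(np.pi * k * x) for k in range(1, M + 1)])

fit = Lasso(alpha=0.05, max_iter=20000).fit(D, y)
print("non-zero dictionary elements:", np.flatnonzero(fit.coef_)[:10])
```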

Journal ArticleDOI
TL;DR: In this paper, a recursive extension of the two-step PML, called nested pseudo likelihood (NPL), is proposed to deal with the indeterminacy problem associated with the existence of multiple equilibria and the computational burden in the solution of the game.
Abstract: This paper studies the estimation of dynamic discrete games of incomplete information. Two main econometric issues appear in the estimation of these models: the indeterminacy problem associated with the existence of multiple equilibria and the computational burden in the solution of the game. We propose a class of pseudo maximum likelihood (PML) estimators that deals with these problems, and we study the asymptotic and finite sample properties of several estimators in this class. We first focus on two-step PML estimators, which, although they are attractive for their computational simplicity, have some important limitations: they are seriously biased in small samples; they require consistent nonparametric estimators of players' choice probabilities in the first step, which are not always available; and they are asymptotically inefficient. Second, we show that a recursive extension of the two-step PML, which we call nested pseudo likelihood (NPL), addresses those drawbacks at a relatively small additional computational cost. The NPL estimator is particularly useful in applications where consistent nonparametric estimates of choice probabilities either are not available or are very imprecise, e.g., models with permanent unobserved heterogeneity. Finally, we illustrate these methods in Monte Carlo experiments and in an empirical application to a model of firm entry and exit in oligopoly markets using Chilean data from several retail industries.



Book
02 Feb 2007
TL;DR: This book presents methods for event history analysis, including nonparametric descriptive methods, exponential transition rate models (piecewise constant and with time-dependent covariates), parametric models of time-dependence, methods to check parametric assumptions, and semiparametric transition rate models.
Abstract: Contents: Preface. Introduction. Event History Data Structures. Nonparametric Descriptive Methods. Exponential Transition Rate Models. Piecewise Constant Exponential Models. Exponential Models With Time-Dependent Covariates. Parametric Models of Time-Dependence. Methods to Check Parametric Assumptions. Semiparametric Transition Rate Models. Problems of Model Specification.



Journal ArticleDOI
TL;DR: A nonparametric estimator of local average treatment effects with covariates is suggested that is root-n asymptotically normal and efficient.

Journal ArticleDOI
TL;DR: It is shown that nonparametric statistical tests provide convincing and elegant solutions for both problems and make it possible to incorporate biophysically motivated constraints in the test statistic, which may drastically increase the sensitivity of the test.

Journal ArticleDOI
TL;DR: A nonparametric efficiency analysis based on robust estimation of partial frontiers in a complete multivariate setup (multiple inputs and multiple outputs) is proposed; the resulting estimators achieve strong consistency and asymptotic normality.

Journal ArticleDOI
TL;DR: A class of semiparametric models for the covariance function that imposes a parametric correlation structure while allowing a nonparametric variance function is proposed, and a kernel estimator is developed.
Abstract: Improving efficiency for regression coefficients and predicting trajectories of individuals are two important aspects in the analysis of longitudinal data. Both involve estimation of the covariance function. Yet challenges arise in estimating the covariance function of longitudinal data collected at irregular time points. A class of semiparametric models for the covariance function that imposes a parametric correlation structure while allowing a nonparametric variance function is proposed. A kernel estimator for estimating the nonparametric variance function is developed. Two methods for estimating parameters in the correlation structure—a quasi-likelihood approach and a minimum generalized variance method—are proposed. A semiparametric varying coefficient partially linear model for longitudinal data is introduced, and an estimation procedure for model coefficients using a profile weighted least squares approach is proposed. Sampling properties of the proposed estimation procedures are studied, and asy...
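A minimal sketch of the semiparametric covariance idea: estimate the variance function nonparametrically by kernel smoothing of squared residuals over observation times, and combine it with a parametric correlation structure. The Gaussian kernel, the bandwidth, and the exponential correlation family are illustrative assumptions, not the authors' exact choices.

```python
# Sketch: nonparametric variance function (kernel-smoothed squared residuals)
# combined with a parametric correlation structure.
import numpy as np

rng = np.random.default_rng(9)
n_obs = 2000
t = rng.uniform(0, 1, size=n_obs)                  # irregular observation times
resid = np.sqrt(0.5 + t) * rng.normal(size=n_obs)  # residuals after the mean fit

def var_fn(t0, h=0.1):
    """Kernel (Nadaraya-Watson) estimate of the variance function at time t0."""
    w = np.exp(-0.5 * ((t - t0) / h) ** 2)
    return np.sum(w * resid ** 2) / np.sum(w)

def cov_fn(s, u, rho=0.7):
    """Semiparametric covariance: nonparametric variances, parametric correlation."""
    return np.sqrt(var_fn(s) * var_fn(u)) * rho ** abs(s - u)

print(round(var_fn(0.5), 3), round(cov_fn(0.2, 0.8), 3))
```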

Journal ArticleDOI
TL;DR: A nonparametric stochastic frontier (SF) model based on local maximum likelihood (LML) is proposed for estimating the efficiency of a production process.

Book
16 Apr 2007
TL;DR: This book covers practical statistics for measurement data, from planning data collection and basic statistical calculations through data analysis techniques, managing and presenting sets of data, proportions, survival and time series data, and selected topics including nonparametric tests, extreme value data, and control charts.
Abstract: WHAT ARE DATA? Definition of Data Kinds of Data Variability Populations and Samples Importance of Reliability Metrology Computer Assisted Statistical Analyses Exercises References
OBTAINING MEANINGFUL DATA: Data Production Must Be Planned The Experimental Method Data Quality Indicators Data Quality Objectives Systematic Measurement Quality Assurance Importance of Peer Review Exercises References
GENERAL PRINCIPLES: Introduction Kinds of Statistics Decisions Error and Uncertainty Kinds of Data Accuracy, Precision, and Bias Statistical Control Distributions Tests for Normality Basic Requirements for Statistical Analysis Validity MINITAB Exercises References
STATISTICAL CALCULATIONS: Introduction The Mean, Variance, and Standard Deviation Degrees of Freedom Using Duplicate Measurements to Estimate a Standard Deviation Using the Range to Estimate the Standard Deviation Pooled Statistical Estimates Simple Analysis of Variance Log Normal Statistics Minimum Reporting Statistics Computations One Last Thing to Remember Exercises References
DATA ANALYSIS TECHNIQUES: Introduction One Sample Topics Two Sample Topics Propagation of Error in a Derived or Calculated Value Exercises References
MANAGING SETS OF DATA: Introduction Outliers Combining Data Sets Statistics of Interlaboratory Collaborative Testing Random Numbers Exercises References
PRESENTING DATA: Tables Charts Graphs Mathematical Expressions Exercises References
PROPORTIONS, SURVIVAL DATA AND TIME SERIES DATA: Introduction Proportions Survival Data Time Series Data Exercises References
SELECTED TOPICS: Basic Probability Concepts Measures of Location Tests for Nonrandomness Comparing Several Averages Type I Errors, Type II Errors and Statistical Power Critical Values and P Values Correlation Coefficient The Best Two Out of Three Comparing a Frequency Distribution with a Normal Distribution Confidence for a Fitted Line Joint Confidence Region for the Constants of a Fitted Line Shortcut Procedures Nonparametric Tests Extreme Value Data Statistics of Control Charts Simulation and Macros Exercises References
CONCLUSION: Summary
APPENDICES: Statistical Tables Glossary Answers to Numerical Exercises Index

Journal ArticleDOI
TL;DR: An approximate spatial correlation model for clustered multiple-input multiple-output (MIMO) channels is proposed, and a new performance metric is used to show that the proposed model is a good fit to existing parametric models for low angle spreads (i.e., smaller than 10°).
Abstract: An approximate spatial correlation model for clustered multiple-input multiple-output (MIMO) channels is proposed in this paper. The two ingredients for the model are an approximation for uniform linear and circular arrays to avoid numerical integrals and a closed-form expression for the correlation coefficients that is derived for the Laplacian azimuth angle distribution. A new performance metric to compare parametric and nonparametric channel models is proposed and used to show that the proposed model is a good fit to the existing parametric models for low angle spreads (i.e., smaller than 10°). A computational-complexity analysis shows that the proposed method is a numerically efficient way of generating spatially correlated MIMO channels.
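A hedged sketch of the kind of computation the model is meant to replace: spatial correlation coefficients for a uniform linear array under a Laplacian azimuth distribution obtained by brute-force numerical integration (the paper's contribution is a closed-form approximation that avoids exactly this step), followed by Kronecker-style generation of a receive-correlated channel. The array size, angle spread, and mean angle of arrival are illustrative assumptions.

```python
# Sketch: ULA spatial correlation under a truncated Laplacian azimuth
# distribution via numerical integration, then a receive-correlated channel.
import numpy as np

def laplacian_corr(d_over_lambda, mean_aoa_deg, angle_spread_deg, n_grid=4001):
    """Correlation for antenna separation d/lambda under a Laplacian AoA density."""
    theta0 = np.deg2rad(mean_aoa_deg)
    sigma = np.deg2rad(angle_spread_deg)
    theta = np.linspace(theta0 - np.pi, theta0 + np.pi, n_grid)
    pdf = np.exp(-np.sqrt(2.0) * np.abs(theta - theta0) / sigma)
    pdf /= pdf.sum()                                   # normalize on the grid
    return np.sum(pdf * np.exp(1j * 2.0 * np.pi * d_over_lambda * np.sin(theta)))

# Receive-side correlation matrix for a 4-element ULA with half-wavelength spacing
n_rx = 4
R_rx = np.array([[laplacian_corr(0.5 * (m - n), 30.0, 5.0) for n in range(n_rx)]
                 for m in range(n_rx)])

# Correlate an i.i.d. Rayleigh channel on the receive side (Kronecker-style)
rng = np.random.default_rng(10)
H_w = (rng.normal(size=(n_rx, 2)) + 1j * rng.normal(size=(n_rx, 2))) / np.sqrt(2.0)
eigval, V = np.linalg.eigh(R_rx)
R_sqrt = V @ np.diag(np.sqrt(np.clip(eigval, 0.0, None))) @ V.conj().T
H = R_sqrt @ H_w
print(np.round(np.abs(R_rx), 3))
```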

Book
01 Jan 2007

Journal ArticleDOI
TL;DR: The authors consider nonparametric identification and estimation of a model that is monotonic in a nonseparable scalar disturbance, where the disturbance is independent of the instruments.

Journal ArticleDOI
Abstract: The paper examines various tests for assessing whether a time series model requires a slope component. We first consider the simple t-test on the mean of first differences and show that it achieves high power against the alternative hypothesis of a stochastic nonstationary slope as well as against a purely deterministic slope. The test may be modified, parametrically or nonparametrically to deal with serial correlation. Using both local limiting power arguments and finite sample Monte Carlo results, we compare the t-test with the nonparametric tests of Vogelsang (1998) and with a modified stationarity test. Overall the t-test seems a good choice, particularly if it is implemented by fitting a parametric model to the data. When standardized by the square root of the sample size, the simple t-statistic, with no correction for serial correlation, has a limiting distribution if the slope is stochastic. We investigate whether it is a viable test for the null hypothesis of a stochastic slope and conclude that its value may be limited by an inability to reject a small deterministic slope. Empirical illustrations are provided using series of relative prices in the euro-area and data on global temperature.
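A minimal sketch of the simplest version of the test discussed above: a t-test on the mean of first differences of the series, with no correction for serial correlation. The simulated random walk with drift stands in for a real series and is purely illustrative.

```python
# Sketch: t-test on the mean of first differences as a test for a slope component.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
T = 200
y = np.cumsum(0.05 + rng.normal(size=T))   # random walk with a small drift (slope)

dy = np.diff(y)
res = stats.ttest_1samp(dy, popmean=0.0)
print("t =", round(res.statistic, 3), "p =", round(res.pvalue, 4))

# The statistic standardized by the square root of the sample size, as discussed above
print("standardized:", round(res.statistic / np.sqrt(len(dy)), 4))
```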

Journal ArticleDOI
TL;DR: This paper describes how to compute sampling-based policies, that is, policies that are computed based only on observed samples of the demands without any access to, or assumptions on, the true demand distributions.
Abstract: In this paper, we consider two fundamental inventory models, the single-period newsvendor problem and its multiperiod extension, but under the assumption that the explicit demand distributions are not known and that the only information available is a set of independent samples drawn from the true distributions. Under the assumption that the demand distributions are given explicitly, these models are well studied and relatively straightforward to solve. However, in most real-life scenarios, the true demand distributions are not available, or they are too complex to work with. Thus, a sampling-driven algorithmic framework is very attractive, both in practice and in theory. We shall describe how to compute sampling-based policies, that is, policies that are computed based only on observed samples of the demands without any access to, or assumptions on, the true demand distributions. Moreover, we establish bounds on the number of samples required to guarantee that, with high probability, the expected cost of the sampling-based policies is arbitrarily close (i.e., with arbitrarily small relative error) compared to the expected cost of the optimal policies, which have full access to the demand distributions. The bounds that we develop are general, easy to compute, and do not depend at all on the specific demand distributions.
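For the single-period case, the sampling-based policy reduces to an empirical-quantile rule, sketched below; the demand samples, the underage cost b, and the overage cost h are illustrative assumptions.

```python
# Sketch: sampling-based newsvendor policy. With per-unit underage cost b and
# overage cost h, order the empirical b/(b+h) quantile of the demand samples.
import numpy as np

rng = np.random.default_rng(12)
demand_samples = rng.gamma(shape=4.0, scale=25.0, size=500)  # observed demands only

b, h = 4.0, 1.0                        # underage and overage costs
critical_ratio = b / (b + h)
order_qty = np.quantile(demand_samples, critical_ratio)
print("sampling-based order quantity:", round(order_qty, 1))
```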

ReportDOI
01 May 2007
TL;DR: It is argued that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
Abstract: This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
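A small sketch of interval descriptive statistics of the kind reviewed in the report: with each measurement recorded as an interval, bounds on the mean and median follow directly from the interval endpoints. The example intervals are illustrative; other statistics, such as the variance, generally require the more elaborate algorithms whose computability the report summarizes.

```python
# Sketch: bounds on the mean and median for interval-valued measurements.
import numpy as np

# each measurement is an interval [lo, hi]
intervals = np.array([[1.0, 1.4], [2.1, 2.3], [0.8, 1.9], [1.5, 1.6]])
lo, hi = intervals[:, 0], intervals[:, 1]

mean_bounds = (lo.mean(), hi.mean())          # tightest possible bounds on the mean
median_bounds = (np.median(lo), np.median(hi))
print("mean in", mean_bounds, "median in", median_bounds)
```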

Proceedings Article
03 Dec 2007
TL;DR: A statistical analysis of the properties of SpAM and empirical results on synthetic and real data show that SpAM can be effective in fitting sparse nonparametric models in high dimensional data.
Abstract: We present a new class of models for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive a method for fitting the models that is effective even when the number of covariates is larger than the sample size. A statistical analysis of the properties of SpAM is given together with empirical results on synthetic and real data, showing that SpAM can be effective in fitting sparse nonparametric models in high-dimensional data.
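A rough sketch of a SpAM-style fit via sparse backfitting: smooth the partial residual for each covariate, then soft-threshold the entire component function, which zeroes out irrelevant covariates. The Nadaraya-Watson smoother, the bandwidth, the penalty level, and the number of sweeps are illustrative assumptions rather than the authors' exact algorithm.

```python
# Sketch: sparse backfitting for an additive model with component-wise
# soft-thresholding, in the spirit of SpAM.
import numpy as np

rng = np.random.default_rng(13)
n, p = 300, 10
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.2 * rng.normal(size=n)  # 2 active covariates

def smooth(x, r, h=0.2):
    """Nadaraya-Watson smooth of residual r against covariate x at the sample points."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ r) / w.sum(axis=1)

lam = 0.1
f = np.zeros((n, p))
for _ in range(20):                                    # backfitting sweeps
    for j in range(p):
        r_j = y - y.mean() - f.sum(axis=1) + f[:, j]   # partial residual for covariate j
        p_j = smooth(X[:, j], r_j)
        s_j = np.sqrt(np.mean(p_j ** 2))
        f[:, j] = max(0.0, 1 - lam / s_j) * p_j if s_j > 0 else 0.0  # soft-threshold
        f[:, j] -= f[:, j].mean()                      # center each component

active = [j for j in range(p) if np.sqrt(np.mean(f[:, j] ** 2)) > 1e-8]
print("selected components:", active)
```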