Journal ArticleDOI

Machine Learning: An Applied Econometric Approach

01 May 2017 - Journal of Economic Perspectives (American Economic Association) - Vol. 31, Iss. 2, pp. 87-106
TL;DR: This work presents a way of thinking about machine learning that gives it its own place in the econometric toolbox, aiming to make these algorithms conceptually easier to use by providing a crisper understanding of how they work, where they excel, and where they can stumble.
Abstract: Machines are increasingly doing “intelligent” things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the pre...
Citations
Journal ArticleDOI
TL;DR: The algorithms of machine learning, which can sift through vast numbers of variables looking for combinations that reliably predict outcomes, will improve prognosis, displace much of the work of radiologists and anatomical pathologists, and improve diagnostic accuracy.

1,804 citations

Journal ArticleDOI
TL;DR: This research offers significant and timely insight into AI technology and its impact on the future of industry and society in general, whilst recognising the societal and industrial influence on the pace and direction of AI development.

808 citations

Posted Content
TL;DR: It is argued that it is often preferable to treat similarly risky people similarly, based on the most statistically accurate estimates of risk that one can produce, rather than requiring that algorithms satisfy popular mathematical formalizations of fairness.
Abstract: The nascent field of fair machine learning aims to ensure that decisions guided by algorithms are equitable. Over the last several years, three formal definitions of fairness have gained prominence: (1) anti-classification, meaning that protected attributes (like race, gender, and their proxies) are not explicitly used to make decisions; (2) classification parity, meaning that common measures of predictive performance (e.g., false positive and false negative rates) are equal across groups defined by the protected attributes; and (3) calibration, meaning that conditional on risk estimates, outcomes are independent of protected attributes. Here we show that all three of these fairness definitions suffer from significant statistical limitations. Requiring anti-classification or classification parity can, perversely, harm the very groups they were designed to protect; and calibration, though generally desirable, provides little guarantee that decisions are equitable. In contrast to these formal fairness criteria, we argue that it is often preferable to treat similarly risky people similarly, based on the most statistically accurate estimates of risk that one can produce. Such a strategy, while not universally applicable, often aligns well with policy objectives; notably, this strategy will typically violate both anti-classification and classification parity. In practice, it requires significant effort to construct suitable risk estimates. One must carefully define and measure the targets of prediction to avoid retrenching biases in the data. But, importantly, one cannot generally address these difficulties by requiring that algorithms satisfy popular mathematical formalizations of fairness. By highlighting these challenges in the foundation of fair machine learning, we hope to help researchers and practitioners productively advance the area.

685 citations


Cites background from "Machine Learning: An Applied Econom..."

  • ...Nevertheless, recent work at the intersection of machine learning and causal inference (Hill, 2011; Jung et al., 2018; Mullainathan and Spiess, 2017) offers hope for gains....

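The three formal fairness definitions in the abstract above can be made concrete in a few lines of code. Below is a minimal Python sketch on synthetic data (the variable names, score distributions, and threshold rule are illustrative assumptions, not from the paper) that checks classification parity via group-wise false positive rates and spot-checks calibration within one score bin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: a protected group label, risk scores whose
# distribution differs by group, outcomes drawn from the true risk, and a
# simple threshold decision rule.
n = 10_000
group = rng.integers(0, 2, size=n)               # 0/1 protected attribute
risk = rng.beta(2 + group, 5 - group, size=n)    # score distributions differ by group
outcome = rng.binomial(1, risk)                  # outcomes follow the true risk
decision = (risk >= 0.5).astype(int)             # treat scores >= 0.5 as "high risk"

def false_positive_rate(y, d):
    # Share of true negatives that nevertheless received a positive decision.
    return d[y == 0].mean()

# Classification parity: are error rates equal across groups?
for g in (0, 1):
    m = group == g
    print(f"group {g}: FPR = {false_positive_rate(outcome[m], decision[m]):.3f}")

# Calibration: conditional on the score, outcomes should not depend on group.
in_bin = (risk >= 0.5) & (risk < 0.6)
for g in (0, 1):
    m = (group == g) & in_bin
    print(f"group {g}: P(outcome | score in [0.5, 0.6)) = {outcome[m].mean():.3f}")
```

Because the outcomes here are drawn from the scores themselves, the scores are calibrated by construction, yet the group-wise false positive rates differ; this illustrates the paper's point that the formal criteria cannot in general hold simultaneously.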

Journal ArticleDOI
TL;DR: A comprehensive literature review of deep learning studies for financial time series forecasting, grouping implementations by their model choices, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Long Short-Term Memory (LSTM) networks.

504 citations


Cites background from "Machine Learning: An Applied Econom..."

  • ...Mullainathan and Spiess [9] surveyed the prediction process in general from an econometric perspective....

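For readers unfamiliar with the model families the review above groups papers by, the following minimal sketch shows the general shape of an LSTM forecaster, assuming TensorFlow/Keras is available; the synthetic series, window length, and layer sizes are illustrative choices of our own, not taken from the survey.

```python
import numpy as np
import tensorflow as tf

window = 20                                    # look-back length in time steps
series = np.cumsum(np.random.randn(1_000))     # random walk as a stand-in for a price series

# Build supervised pairs: predict the next value from the previous `window` values.
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., None]                               # shape (samples, window, 1)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(window, 1)),
    tf.keras.layers.Dense(1),                  # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

next_value = model.predict(X[-1:], verbose=0)  # forecast the next point in the series
```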

Journal ArticleDOI
TL;DR: While machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals.
Abstract: Presented on October 24, 2016 at 10:00 a.m. in the Klaus Advanced Computing Building, room 1116

493 citations

References
Journal ArticleDOI
TL;DR: The Elements of Statistical Learning is a widely used reference covering data mining, inference, and prediction.
Abstract: (2004). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Journal of the American Statistical Association, Vol. 99, No. 466, p. 567.

10,549 citations

Journal ArticleDOI
TL;DR: In this article, the use of instruments that explain little of the variation in the endogenous explanatory variables can lead to large inconsistencies in the IV estimates even if only a weak relationship exists between the instruments and the error in the structural equation.
Abstract: We draw attention to two problems associated with the use of instrumental variables (IV), the importance of which for empirical work has not been fully appreciated. First, the use of instruments that explain little of the variation in the endogenous explanatory variables can lead to large inconsistencies in the IV estimates even if only a weak relationship exists between the instruments and the error in the structural equation. Second, in finite samples, IV estimates are biased in the same direction as ordinary least squares (OLS) estimates. The magnitude of the bias of IV estimates approaches that of OLS estimates as the R 2 between the instruments and the endogenous explanatory variable approaches 0. To illustrate these problems, we reexamine the results of a recent paper by Angrist and Krueger, who used large samples from the U.S. Census to estimate wage equations in which quarter of birth is used as an instrument for educational attainment. We find evidence that, despite huge sample sizes, th...

4,219 citations
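The paper's two warnings, inconsistency under weak instruments and finite-sample bias toward OLS, are easy to reproduce in simulation. The sketch below uses a toy data-generating process of our own (not the paper's Census application): as the first-stage coefficient pi shrinks, the IV estimate drifts away from the true coefficient toward the biased OLS estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 50_000, 1.0

for pi in (1.0, 0.1, 0.01):                    # first-stage strength of the instrument
    z = rng.normal(size=n)                     # instrument
    u = rng.normal(size=n)                     # structural error
    x = pi * z + u + rng.normal(size=n)        # regressor is endogenous (correlated with u)
    y = beta * x + u                           # outcome

    b_ols = (x @ y) / (x @ x)                  # OLS: inconsistent, biased upward by u
    b_iv = (z @ y) / (z @ x)                   # IV: fine for strong pi, but erratic and
                                               # biased toward OLS as pi approaches 0
    print(f"pi = {pi:5.2f}   OLS = {b_ols:.3f}   IV = {b_iv:.3f}")
```

With pi = 1 the IV estimate sits near the true beta = 1 while OLS is biased toward 1.5; with pi = 0.01 even this large sample leaves the IV estimate close to OLS, matching the paper's finding for the quarter-of-birth instrument.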

Journal Article
TL;DR: It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.
Abstract: Sparsity or parsimony of statistical models is crucial for their proper interpretations, as in sciences and social sciences. Model selection is a commonly used method to find such models, but usually involves a computationally heavy combinatorial search. Lasso (Tibshirani, 1996) is now being used as a computationally feasible alternative to model selection. Therefore it is important to study Lasso for model selection purposes. In this paper, we prove that a single condition, which we call the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large. Based on these results, sufficient conditions that are verifiable in practice are given to relate to previous works and help applications of Lasso for feature selection and sparse representation. This Irrepresentable Condition, which depends mainly on the covariance of the predictor variables, states that Lasso selects the true model consistently if and (almost) only if the predictors that are not in the true model are "irrepresentable" (in a sense to be clarified) by predictors that are in the true model. Furthermore, simulations are carried out to provide insights and understanding of this result.

2,803 citations


"Machine Learning: An Applied Econom..." refers background in this paper

  • ...This is seen clearly in Zhao and Yu (2006) who establish asymptotic model-selection consistency for the LASSO....

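The Irrepresentable Condition can be checked numerically on a toy design. In the sketch below (a hypothetical three-predictor Gaussian design of our own choosing, fit with scikit-learn's Lasso), the irrelevant predictor x3 is dropped when the condition's key quantity stays below 1 and tends to be selected when it exceeds 1.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n = 5_000

def lasso_support(rho):
    # x1, x2 are the true predictors; the irrelevant x3 is correlated rho with both.
    cov = np.array([[1.0, 0.0, rho],
                    [0.0, 1.0, rho],
                    [rho, rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    y = 2 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=n)

    # Irrepresentable quantity |C21 @ inv(C11) @ sign(beta_S)| for the irrelevant x3.
    irr = abs(cov[2, :2] @ np.linalg.inv(cov[:2, :2]) @ np.sign([2.0, 2.0]))
    coefs = Lasso(alpha=0.1).fit(X, y).coef_
    print(f"rho = {rho:.1f}   irrepresentable = {irr:.2f}   lasso coefs = {np.round(coefs, 2)}")

lasso_support(0.3)   # quantity 0.6 < 1: x3's coefficient is set to zero
lasso_support(0.6)   # quantity 1.2 > 1: Lasso tends to give x3 a nonzero coefficient
```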

Journal ArticleDOI
31 Mar 1989 - Science
TL;DR: Research comparing these two approaches to decision-making shows the actuarial method to be superior, factors underlying the greater accuracy of actuarial methods, sources of resistance to the scientific findings, and the benefits of increased reliance on actuarial approaches are discussed.
Abstract: Professionals are frequently consulted to diagnose and predict human behavior; optimal treatment and planning often hinge on the consultant's judgmental accuracy. The consultant may rely on one of two contrasting approaches to decision-making--the clinical and actuarial methods. Research comparing these two approaches shows the actuarial method to be superior. Factors underlying the greater accuracy of actuarial methods, sources of resistance to the scientific findings, and the benefits of increased reliance on actuarial approaches are discussed.

2,102 citations


"Machine Learning: An Applied Econom..." refers background in this paper

  • ...Even when an algorithm can help, we must understand the factors that determine adoption of these tools (Dawes, Faust, and Meehl 1989; Dietvorst, Simmons, and Massey 2015; Yeomans, Shah, Mullainathan, and Kleinberg 2016)....


Book
16 Apr 2013
TL;DR: A monograph on nonparametric regression, covering how to construct partitioning, kernel, and k-NN estimates, lower bounds, splitting the sample, cross-validation, and uniform laws of large numbers.
Abstract: Why is Nonparametric Regression Important? * How to Construct Nonparametric Regression Estimates * Lower Bounds * Partitioning Estimates * Kernel Estimates * k-NN Estimates * Splitting the Sample * Cross Validation * Uniform Laws of Large Numbers * Least Squares Estimates I: Consistency * Least Squares Estimates II: Rate of Convergence * Least Squares Estimates III: Complexity Regularization * Consistency of Data-Dependent Partitioning Estimates * Univariate Least Squares Spline Estimates * Multivariate Least Squares Spline Estimates * Neural Networks Estimates * Radial Basis Function Networks * Orthogonal Series Estimates * Advanced Techniques from Empirical Process Theory * Penalized Least Squares Estimates I: Consistency * Penalized Least Squares Estimates II: Rate of Convergence * Dimension Reduction Techniques * Strong Consistency of Local Averaging Estimates * Semi-Recursive Estimates * Recursive Estimates * Censored Observations * Dependent Observations

1,931 citations
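Several of the chapter topics listed above (k-NN estimates, splitting the sample, cross-validation) fit in a short sketch. The toy example below is our own, not from the book: a k-NN regression estimate whose k is chosen by cross-validation on a training split and then evaluated on held-out data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(500, 1))             # one-dimensional design
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=500)  # smooth regression function plus noise

# Splitting the sample, then choosing k by 5-fold cross-validation on the
# training half, mirrors the "Splitting the Sample" and "Cross Validation" chapters.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
search = GridSearchCV(KNeighborsRegressor(), {"n_neighbors": list(range(1, 31))}, cv=5)
search.fit(X_train, y_train)

print("selected k:", search.best_params_["n_neighbors"])
print("held-out R^2:", round(search.score(X_test, y_test), 3))
```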