XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Carlos Guestrin
TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
Abstract:
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
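For readers who want to see the system in action, a minimal training run with the open-source xgboost Python package might look like the sketch below; the toy dataset and parameter values are illustrative placeholders, not recommendations from the paper.

```python
# Minimal sketch of training a gradient-boosted tree model with the
# xgboost Python package; data and parameter values are illustrative.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                          # toy feature matrix
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)    # toy binary labels

dtrain = xgb.DMatrix(X, label=y)          # XGBoost's internal data structure
params = {
    "objective": "binary:logistic",       # logistic loss for binary labels
    "max_depth": 6,
    "eta": 0.1,                           # learning rate (shrinkage)
}
bst = xgb.train(params, dtrain, num_boost_round=100)
preds = bst.predict(dtrain)               # predicted probabilities
```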
Citations
Proceedings Article
LightGBM: a highly efficient gradient boosting decision tree
TL;DR: It is proved that, since data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain from a much smaller data sample; the resulting system is called LightGBM.
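The GOSS procedure summarized above can be sketched in a few lines of numpy: keep the instances with the largest gradients, randomly subsample the rest, and up-weight the sampled small-gradient instances so the information-gain estimate stays roughly unbiased. The function below is a rough sketch of that idea, not LightGBM's actual implementation; a and b stand for the paper's top-rate and other-rate parameters.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Sketch of Gradient-based One-Side Sampling (GOSS): keep the top
    a fraction of instances by |gradient|, randomly sample a b fraction
    of the rest, and up-weight the sampled small-gradient instances by
    (1 - a) / b to compensate for the subsampling."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))     # descending by |gradient|
    top_k = int(a * n)
    top_idx = order[:top_k]                    # large-gradient instances
    rest = order[top_k:]
    sampled = rng.choice(rest, size=int(b * n), replace=False)
    weights = np.ones(n)
    weights[sampled] = (1 - a) / b             # bias-correcting weights
    keep = np.concatenate([top_idx, sampled])
    return keep, weights[keep]
```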
Posted Content
CatBoost: unbiased boosting with categorical features
Liudmila Ostroumova Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, Andrey Gulin
TL;DR: CatBoost is a new gradient boosting toolkit that uses ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features.
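CatBoost's ordered treatment of categorical features can be illustrated with a small sketch: under a random permutation, each example's category is encoded from target statistics of the examples that precede it, so an example's own label never leaks into its encoding. The smoothing below is a simplification of CatBoost's actual scheme.

```python
import numpy as np

def ordered_target_encoding(categories, targets, prior=0.5, rng=None):
    """Sketch of ordered target statistics: encode each example's
    category using only the labels of earlier examples in a random
    permutation, avoiding target leakage from the example itself."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(categories)
    perm = rng.permutation(n)
    sums, counts = {}, {}
    encoded = np.empty(n)
    for i in perm:                            # visit in permutation order
        c = categories[i]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + prior) / (k + 1)    # smoothed running target mean
        sums[c] = s + targets[i]              # update stats AFTER encoding
        counts[c] = k + 1
    return encoded
```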
Journal Article
The rise of deep learning in drug discovery.
TL;DR: The first wave of applications of deep learning in pharmaceutical research has emerged in recent years; its utility goes beyond bioactivity prediction and shows promise in addressing diverse problems in drug discovery.
Journal Article
On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice
Li Yang, Abdallah Shami
TL;DR: This survey helps industrial users, data analysts, and researchers develop better machine learning models by introducing several state-of-the-art optimization techniques for identifying proper hyper-parameter configurations effectively.
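In the context of this page's paper, a common instance of such tuning is searching over XGBoost's own hyper-parameters. The sketch below uses scikit-learn's RandomizedSearchCV; the search space, budget, and toy data are illustrative assumptions, not recommendations from the survey.

```python
# Sketch: random search over a few XGBoost hyper-parameters with
# scikit-learn; search space and budget here are illustrative only.
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)          # toy binary labels

search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions={
        "max_depth": [3, 4, 6, 8],
        "learning_rate": [0.01, 0.05, 0.1, 0.3],
        "subsample": [0.6, 0.8, 1.0],
        "n_estimators": [100, 300, 500],
    },
    n_iter=20, cv=3, scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_)             # best configuration found
```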
Posted Content
MoleculeNet: A Benchmark for Molecular Machine Learning
Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay S. Pande
TL;DR: MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance; however, this result comes with caveats.
References
Journal Article
Random Forests
TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
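The "internal estimates" in this summary are the out-of-bag (OOB) estimates obtained from the bootstrap sampling inside the forest. scikit-learn exposes them via the oob_score option, as in this brief sketch on toy data:

```python
# Sketch: the paper's "internal estimates" correspond to out-of-bag
# (OOB) error, exposed in scikit-learn via oob_score; data is a toy set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, oob_score=True).fit(X, y)
print(rf.oob_score_)   # accuracy estimated on out-of-bag samples
```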
Journal Article
Scikit-learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Journal Article
Greedy function approximation: A gradient boosting machine.
TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
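For squared-error loss, Friedman's paradigm reduces to repeatedly fitting a base learner to the current residuals (the negative gradient) and adding a shrunken copy to the additive model. The following bare-bones sketch, using small scikit-learn trees as base learners, illustrates that loop; it is a didactic toy, not Friedman's full algorithm.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_rounds=100, lr=0.1, max_depth=2):
    """Bare-bones least-squares gradient boosting: each round fits a
    small tree to the residuals (negative gradient of squared loss)
    and adds a shrunken copy of it to the additive model."""
    init = y.mean()                      # constant initial model
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred             # negative gradient for L2 loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += lr * tree.predict(X)     # shrinkage (learning rate) step
        trees.append(tree)
    return init, trees

def predict_gbm(init, trees, X, lr=0.1):
    """Sum the initial constant and the shrunken tree predictions."""
    return init + lr * sum(t.predict(X) for t in trees)
```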
Journal Article
Additive Logistic Regression: A Statistical View of Boosting
TL;DR: This work shows that the seemingly mysterious phenomenon of boosting can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood; it also develops more direct approximations that exhibit nearly identical results to boosting.
Journal Article
Stochastic gradient boosting
TL;DR: It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure.
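This row-subsampling idea survives directly in modern libraries; for example, XGBoost exposes it as the subsample parameter (with analogous column subsampling via colsample_bytree). The parameter values in this brief sketch are illustrative only:

```python
# Sketch: Friedman's stochastic gradient boosting as exposed by
# XGBoost's `subsample` parameter; the values are illustrative.
params = {
    "objective": "reg:squarederror",
    "eta": 0.1,
    "subsample": 0.5,          # each tree sees a random 50% of the rows
    "colsample_bytree": 0.8,   # optional column subsampling as well
}
```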