
Showing papers on "Random forest" published in 2010


Journal ArticleDOI
TL;DR: The Boruta package provides a convenient interface to the Boruta algorithm, implementing a novel feature selection algorithm for finding all relevant variables.
Abstract: This article describes an R package, Boruta, implementing a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features which are shown by a statistical test to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. A short description of the algorithm and examples of its application are presented.

2,832 citations
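
A minimal sketch of the shadow-feature idea behind Boruta, written in Python with scikit-learn rather than the package's own R implementation; the number of rounds and the decision cutoff are illustrative assumptions (Boruta itself uses a formal statistical test):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def boruta_style_selection(X, y, n_rounds=20, seed=0):
    """Keep features whose importance repeatedly beats the best shadow probe."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        shadows = rng.permuted(X, axis=0)          # permute each column: random probes
        rf = RandomForestClassifier(n_estimators=300, random_state=seed)
        rf.fit(np.hstack([X, shadows]), y)
        imp = rf.feature_importances_
        real, shadow = imp[:X.shape[1]], imp[X.shape[1]:]
        hits += (real > shadow.max()).astype(int)
    # Boruta uses a binomial test here; a fixed cutoff keeps the sketch short.
    return hits > 0.6 * n_rounds
```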


Journal ArticleDOI
TL;DR: Focusing on random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001, this paper investigates two classical issues of variable selection and proposes a strategy that ranks explanatory variables by the random forests importance score and then introduces them through a stepwise ascending procedure.

1,766 citations
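
A rough Python sketch of the two-step strategy summarized above, ranking variables by forest importance and then introducing them in ascending fashion; the paper's actual thresholds and its separate interpretation and prediction steps are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_then_stepwise(X, y, seed=0):
    """Rank variables by importance, then grow nested models and keep the
    subset with the lowest out-of-bag error (a simplified ascending search)."""
    rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=seed)
    rf.fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]   # most important first
    oob_errors = []
    for k in range(1, X.shape[1] + 1):
        sub = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=seed)
        sub.fit(X[:, order[:k]], y)
        oob_errors.append(1.0 - sub.oob_score_)          # 1 - OOB R^2 as an error proxy
    best_k = int(np.argmin(oob_errors)) + 1
    return order[:best_k], oob_errors
```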


Journal ArticleDOI
TL;DR: The authors examine the performance of various CART‐based propensity score models using simulated data and suggest that ensemble methods, especially boosted CART, may be useful for propensity score weighting.
Abstract: Machine learning techniques such as classification and regression trees (CART) have been suggested as promising alternatives to logistic regression for the estimation of propensity scores. The authors examined the performance of various CART-based propensity score models using simulated data. Hypothetical studies of varying sample sizes (n=500, 1000, 2000) with a binary exposure, continuous outcome, and 10 covariates were simulated under seven scenarios differing by degree of non-linear and non-additive associations between covariates and the exposure. Propensity score weights were estimated using logistic regression (all main effects), CART, pruned CART, and the ensemble methods of bagged CART, random forests, and boosted CART. Performance metrics included covariate balance, standard error, per cent absolute bias, and 95 per cent confidence interval (CI) coverage. All methods displayed generally acceptable performance under conditions of either non-linearity or non-additivity alone. However, under conditions of both moderate non-additivity and moderate non-linearity, logistic regression had subpar performance, whereas ensemble methods provided substantially better bias reduction and more consistent 95 per cent CI coverage. The results suggest that ensemble methods, especially boosted CART, may be useful for propensity score weighting.

713 citations
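
A hedged scikit-learn sketch of boosted-tree propensity score weighting with a simple balance check; the simulation settings, tuning values, and the exact boosting implementation used in the paper are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def ipw_weights(X, treat):
    """Propensity scores from boosted trees, turned into inverse-probability weights.
    The clipping bounds and boosting settings are illustrative choices."""
    gbm = GradientBoostingClassifier(n_estimators=500, max_depth=3, learning_rate=0.01)
    ps = gbm.fit(X, treat).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)                       # guard against extreme weights
    return np.where(treat == 1, 1.0 / ps, 1.0 / (1.0 - ps))

def standardized_diff(x, treat, w):
    """Weighted standardized mean difference of one covariate, a common balance metric."""
    m1 = np.average(x[treat == 1], weights=w[treat == 1])
    m0 = np.average(x[treat == 0], weights=w[treat == 0])
    pooled_sd = np.sqrt((x[treat == 1].var() + x[treat == 0].var()) / 2.0)
    return (m1 - m0) / pooled_sd
```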


Posted Content
TL;DR: In this paper, an in-depth analysis of a random forests model suggested by Breiman in the early 2000s is presented, showing that the procedure is consistent and adapts to sparsity, and that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.
Abstract: Random forests are a scheme proposed by Leo Breiman in the 2000s for building a predictor ensemble with a set of decision trees that grow in randomly selected subspaces of data. Despite growing interest and practical use, there has been little exploration of the statistical properties of random forests, and little is known about the mathematical forces driving the algorithm. In this paper, we offer an in-depth analysis of a random forests model suggested by Breiman in [Bre04], which is very close to the original algorithm. We show in particular that the procedure is consistent and adapts to sparsity, in the sense that its rate of convergence depends only on the number of strong features and not on how many noise variables are present.

667 citations


Book
24 Feb 2010
TL;DR: IS reveals classic ensemble methods -- bagging, random forests, and boosting -- to be special cases of a single algorithm, thereby showing how to improve their accuracy and speed, and explains the paradox of how ensembles achieve greater accuracy on new data despite their (apparently much greater) complexity.
Abstract: Ensemble methods have been called the most influential development in Data Mining and Machine Learning in the past decade. They combine multiple models into one usually more accurate than the best of its components. Ensembles can provide a critical boost to industrial challenges -- from investment timing to drug discovery, and fraud detection to recommendation systems -- where predictive accuracy is more vital than model interpretability. Ensembles are useful with all modeling algorithms, but this book focuses on decision trees to explain them most clearly. After describing trees and their strengths and weaknesses, the authors provide an overview of regularization -- today understood to be a key reason for the superior performance of modern ensembling algorithms. The book continues with a clear description of two recent developments: Importance Sampling (IS) and Rule Ensembles (RE). IS reveals classic ensemble methods -- bagging, random forests, and boosting -- to be special cases of a single algorithm, thereby showing how to improve their accuracy and speed. REs are linear rule models derived from decision tree ensembles. They are the most interpretable version of ensembles, which is essential to applications such as credit scoring and fault diagnosis. Lastly, the authors explain the paradox of how ensembles achieve greater accuracy on new data despite their (apparently much greater) complexity. This book is aimed at novice and advanced analytic researchers and practitioners -- especially in Engineering, Statistics, and Computer Science. Those with little exposure to ensembles will learn why and how to employ this breakthrough method, and advanced practitioners will gain insight into building even more powerful models. Throughout, snippets of code in R are provided to illustrate the algorithms described and to encourage the reader to try the techniques. (edited by author)

471 citations


Book ChapterDOI
20 Sep 2010
TL;DR: This paper introduces a new, continuous parametrization of the anatomy localization task which is effectively addressed by regression forests; this is shown to be a more natural approach than classification.
Abstract: This paper proposes multi-class random regression forests as an algorithm for the efficient, automatic detection and localization of anatomical structures within three-dimensional CT scans. Regression forests are similar to the more popular classification forests, but trained to predict continuous outputs. We introduce a new, continuous parametrization of the anatomy localization task which is effectively addressed by regression forests. This is shown to be a more natural approach than classification. A single pass of our probabilistic algorithm enables the direct mapping from voxels to organ location and size, with training focusing on maximizing the confidence of output predictions. As a by-product, our method produces salient anatomical landmarks, i.e. automatically selected "anchor" regions which help localize organs of interest with high confidence. Quantitative validation is performed on a database of 100 highly variable CT scans. Localization errors are shown to be lower (and more stable) than those from global affine registration approaches. The regressor's parallelism and the simplicity of its context-rich visual features yield typical runtimes of only 1 s. Applications include semantic visual navigation, image tagging for retrieval, and initializing organ-specific processing.

343 citations
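
To illustrate the regression-forest idea of mapping voxels to continuous organ locations, here is a toy Python sketch; the features, the offset target (organ centre rather than bounding-box faces), and the confidence measure are simplifications, not the paper's algorithm:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
organ_centre = np.array([60.0, 40.0, 30.0])
voxel_pos = rng.uniform(0, 100, size=(5000, 3))                  # toy voxel coordinates
features = np.hstack([voxel_pos, rng.normal(size=(5000, 5))])    # stand-in context features
offsets = organ_centre - voxel_pos                               # continuous regression target

forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=25, random_state=0)
forest.fit(features, offsets)                                    # voxels -> offsets

# Each voxel votes for the organ location; the spread of votes is a crude
# confidence proxy for the probabilistic output described in the abstract.
votes = voxel_pos + forest.predict(features)
print("estimated centre:", votes.mean(axis=0), "spread:", votes.std(axis=0))
```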


Book ChapterDOI
20 Sep 2010
TL;DR: A new variant of bagging is proposed, called leveraging bagging, which combines the simplicity of bagging with added randomization of the input and output of the classifiers.
Abstract: Bagging, boosting and Random Forests are classical ensemble methods used to improve the performance of single classifiers. They obtain superior performance by increasing the accuracy and diversity of the single classifiers. Attempts have been made to reproduce these methods in the more challenging context of evolving data streams. In this paper, we propose a new variant of bagging, called leveraging bagging. This method combines the simplicity of bagging with added randomization of the input and output of the classifiers. We test our method by performing an evaluation study on synthetic and real-world datasets comprising up to ten million examples.

305 citations
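
A simplified Python sketch of the input-randomization half of leveraging bagging: each streaming example is weighted, for every ensemble member, by a Poisson draw with mean larger than 1. The output randomization and drift handling from the paper are omitted, and the base learner and lambda value are assumptions:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

class LeveragingBaggingSketch:
    def __init__(self, n_models=10, lam=6.0, classes=(0, 1)):
        self.lam = lam
        self.classes = np.array(classes)
        self.models = [SGDClassifier() for _ in range(n_models)]   # incremental base learners

    def learn_one(self, x, y):
        x = x.reshape(1, -1)
        for m in self.models:
            k = rng.poisson(self.lam)                 # Poisson(lam) weight, lam > 1
            if k > 0:
                m.partial_fit(x, [y], classes=self.classes, sample_weight=[float(k)])

    def predict_one(self, x):
        votes = [int(m.predict(x.reshape(1, -1))[0]) for m in self.models]
        return np.bincount(votes).argmax()            # plain majority vote

# Toy stream usage
X = rng.normal(size=(1000, 5)); y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LeveragingBaggingSketch()
for xi, yi in zip(X, y):
    clf.learn_one(xi, yi)
print(clf.predict_one(X[0]), y[0])
```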


Journal ArticleDOI
TL;DR: Experimental results clearly demonstrate that the generation of an SVM-based classifier system with RFS significantly improves overall classification accuracy as well as producer's and user's accuracies.
Abstract: The accuracy of supervised land cover classifications depends on factors such as the chosen classification algorithm, adequate training data, the input data characteristics, and the selection of features. Hyperspectral imaging provides more detailed spectral and spatial information on the land cover than other remote sensing resources. Over the past ten years, traditional and formerly widely accepted statistical classification methods have been superseded by more recent machine learning algorithms, e.g., support vector machines (SVMs), or by multiple classifier systems (MCS). This can be explained by limitations of statistical approaches with regard to high-dimensional data, multimodal classes, and often limited availability of training data. In the presented study, MCSs based on SVM and random feature selection (RFS) are applied to explore the potential of a synergetic use of the two concepts. We investigated how the number of selected features and the size of the MCS influence classification accuracy using two hyperspectral data sets, from different environmental settings. In addition, experiments were conducted with a varying number of training samples. Accuracies are compared with regular SVM and random forests. Experimental results clearly demonstrate that the generation of an SVM-based classifier system with RFS significantly improves overall classification accuracy as well as producer's and user's accuracies. In addition, the ensemble strategy results in smoother, i.e., more realistic, classification maps than those from stand-alone SVM. Findings from the experiments were successfully transferred onto an additional hyperspectral data set.

294 citations
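
A small Python sketch of the SVM-plus-random-feature-selection ensemble described above, using synthetic data in place of hyperspectral bands; the ensemble size, feature-subset size, and kernel settings are illustrative, whereas the study tunes them:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=100, n_informative=20, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

n_members, n_feats = 25, 15                       # illustrative MCS size / feature-subset size
members = []
for _ in range(n_members):
    idx = rng.choice(X.shape[1], size=n_feats, replace=False)   # random feature selection
    members.append((idx, SVC(kernel="rbf").fit(Xtr[:, idx], ytr)))

votes = np.array([svm.predict(Xte[:, idx]) for idx, svm in members])
pred = (votes.mean(axis=0) > 0.5).astype(int)     # majority vote of the SVM members
print("ensemble accuracy:", (pred == yte).mean())
```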


Journal ArticleDOI
TL;DR: In this article, the authors proposed a semi-ITC method for forest inventory, which overcomes the main problems related to ITC by imputing ground truth data within crown segments from the nearest neighboring segment.

246 citations


Journal ArticleDOI
TL;DR: It is found that RS with support vector machines (SVM) as the base classifier outperformed single classifiers as well as some of the most widely used classifier ensembles such as bagging, AdaBoost, random forest, and rotation forest.
Abstract: Classification of brain images obtained through functional magnetic resonance imaging (fMRI) poses a serious challenge to pattern recognition and machine learning due to the extremely large feature-to-instance ratio. This calls for revision and adaptation of the current state-of-the-art classification methods. We investigate the suitability of the random subspace (RS) ensemble method for fMRI classification. RS samples from the original feature set and builds one (base) classifier on each subset. The ensemble assigns a class label by either majority voting or averaging of output probabilities. Looking for guidelines for setting the two parameters of the method, ensemble size and feature sample size, we introduce three criteria calculated through these parameters: usability of the selected feature sets, coverage of the set of "important" features, and feature set diversity. Optimized together, these criteria work toward producing accurate and diverse individual classifiers. RS was tested on three fMRI datasets from single-subject experiments: the Haxby data (Haxby, 2001) and two datasets collected in-house. We found that RS with support vector machines (SVM) as the base classifier outperformed single classifiers as well as some of the most widely used classifier ensembles such as bagging, AdaBoost, random forest, and rotation forest. The closest rivals were the single SVM and bagging of SVM classifiers. We use kappa-error diagrams to understand the success of RS.

222 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used the k-Most Similar Neighbor (k-MSN) and the Random Forest (RF) methods for the simultaneous estimation of species, diameter at breast height (DBH), height and stem volume using airborne laser scanning (ALS) data.

Journal ArticleDOI
01 Oct 2010 - Forestry
TL;DR: In a mixed temperate forest landscape in southwestern Germany, multiple remote sensing variables from aerial orthoimages, Thematic Mapper data and small footprint light detection and ranging (LiDAR) were used for plot-level nonparametric predictions of total volume and biomass, using three distance measures (Euclidean, Mahalanobis and Most Similar Neighbour) as well as a regression tree-based classifier (Random Forest).
Abstract: In a mixed temperate forest landscape in southwestern Germany, multiple remote sensing variables from aerial orthoimages, Thematic Mapper data and small footprint light detection and ranging (LiDAR) were used for plot-level nonparametric predictions of the total volume and biomass using three distance measures of Euclidean, Mahalanobis and Most Similar Neighbour as well as a regression tree-based classifier (Random Forest). The performances of nearest neighbour (NN) approaches were examined by means of relative bias and root mean squared error. The original high-dimensional dataset was pruned using an evolutionary genetic algorithm search with a NN classification scenario, as well as by a stepwise selection. The genetic algorithm (GA)-selected variables showed improved performance when applying Euclidean and Mahalanobis distances for predictions, whereas the Most Similar Neighbour and Random Forests worked more precisely with the full dataset. The GA search proved to be unstable in multiple runs because of intercorrelations among the high-dimensional predictors. The selected datasets are dominated by LiDAR height metrics. Furthermore, the LiDAR-based metrics showed major relevance in predicting both response variables examined here. The Random Forest proved to be superior to the other examined NN methods and was eventually used for a wall-to-wall mapping of predictions on a grid of 20 × 20 m spatial resolution.

Journal ArticleDOI
TL;DR: In this paper, a novel way of incorporating spatial dependence in a heterogeneous region is tested using an ensemble learning technique called random forests and a measure of local spatial dependence called the Getis statistic.
Abstract: Land-cover characterization of large heterogeneous landscapes is challenging because of the confusion caused by high intra-class variability and heterogeneous landscape artefacts. Neighbourhood context can be used to supplement spectral information, and a novel way of incorporating spatial dependence in a heterogeneous region is tested here using an ensemble learning technique called random forests and a measure of local spatial dependence called the Getis statistic. The overall Kappa accuracy of the random forest classifier that used a combination of spectral and local spatial (Getis) variables at three different neighbourhood sizes (3 × 3, 7 × 7, and 11 × 11) ranged from 0.85 to 0.92. This accuracy was higher than that of a non-spatial random forest classifier having an overall Kappa accuracy of 0.78, which was run using the spectral variables only. This study demonstrated that the use of the Getis statistic with different neighbourhood sizes leads to a substantial increase in per-class classification accuracy.
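
The pipeline can be pictured with a short Python sketch: neighbourhood statistics at several window sizes are stacked with the spectral bands and fed to a random forest. A plain focal mean is used below as a crude stand-in for the local Getis statistic, and the toy image, labels, and window sizes are assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
bands = rng.random((4, 120, 120))                     # toy image with 4 spectral bands
labels = (bands[0] + bands[1] > 1.0).astype(int)      # toy land-cover classes

# Neighbourhood layers at 3x3, 7x7 and 11x11 windows (focal mean as a simple
# surrogate for the Getis statistic used in the paper).
spatial = [uniform_filter(b, size=w) for b in bands for w in (3, 7, 11)]
stack = np.stack(list(bands) + spatial)
X = stack.reshape(stack.shape[0], -1).T               # one row of features per pixel
y = labels.ravel()

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy, spectral + spatial features:", rf.oob_score_)
```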

Book ChapterDOI
05 Sep 2010
TL;DR: The results show that, using only dense depth information, this framework for semantic scene parsing and object recognition based on dense depth maps achieves more accurate overall segmentation and recognition than sparse 3D features or appearance, advancing state-of-the-art performance.
Abstract: In this paper we present a framework for semantic scene parsing and object recognition based on dense depth maps. Five view-independent 3D features that vary with object class are extracted from dense depth maps at a superpixel level for training a classifier using the randomized decision forest technique. Our formulation integrates multiple features in a Markov Random Field (MRF) framework to segment and recognize different object classes in query street scene images. We evaluate our method both quantitatively and qualitatively on the challenging Cambridge-driving Labeled Video Database (CamVid). The results show that, using only dense depth information, we can achieve more accurate overall segmentation and recognition than from sparse 3D features or appearance, or even the combination of sparse 3D features and appearance, advancing state-of-the-art performance. Furthermore, by aligning 3D dense depth based features into a unified coordinate frame, our algorithm can handle the special case of view changes between training and testing scenarios. Preliminary evaluation in cross training and testing shows promising results.

Book
14 Jul 2010
TL;DR: A practical guide to tree-based analysis covering tree construction, classification trees for a binary response, random and deterministic forests, survival trees for censored data, and regression trees and adaptive splines for a continuous response.
Abstract: Contents: A Practical Guide to Tree Construction; Logistic Regression; Classification Trees for a Binary Response; Examples Using Tree-Based Analysis; Random and Deterministic Forests; Analysis of Censored Data: Examples; Analysis of Censored Data: Concepts and Classical Methods; Analysis of Censored Data: Survival Trees and Random Forests; Regression Trees and Adaptive Splines for a Continuous Response; Analysis of Longitudinal Data; Analysis of Multiple Discrete Responses.

Journal ArticleDOI
TL;DR: For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data, suggesting that ensemble models may be more robust than individual species-environment matching models for risk analysis.
Abstract: Ensemble species distribution models combine the strengths of several species-environment matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because such species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and the maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California; and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis.
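
A minimal Python sketch of the ensemble idea, averaging habitat-suitability predictions from several individual models; only three of the five model types are shown (MARS and Maxent have no direct scikit-learn equivalent), and the toy data and unweighted average are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Toy presence/absence records with environmental predictors.
X, y = make_classification(n_samples=500, n_features=12, n_informative=6, random_state=1)

models = [
    LogisticRegression(max_iter=1000),
    GradientBoostingClassifier(),                 # stands in for boosted regression trees
    RandomForestClassifier(n_estimators=300, random_state=1),
]
probs = np.column_stack([m.fit(X, y).predict_proba(X)[:, 1] for m in models])
ensemble_suitability = probs.mean(axis=1)         # simple unweighted ensemble prediction
```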

Proceedings Article
01 Jan 2010
TL;DR: The team combined the results of its student sub-teams by regularized linear regression and won first prize in both tracks (all teams and student teams) of KDD Cup 2010.
Abstract: KDD Cup 2010 is an educational data mining competition. Participants are asked to learn a model from students' past behavior and then predict their future performance. At National Taiwan University, we organized a course for this competition. Most student sub-teams expanded features by various binarization and discretization techniques. The resulting sparse feature sets were trained by logistic regression (using LIBLINEAR). One sub-team considered condensed features using simple statistical techniques and applied Random Forest (through Weka) for training. Initial development was conducted on an internal split of training data for training and validation. We identified some useful feature combinations to improve performance. For the final submission, we combined results of student sub-teams by regularized linear regression. Our team is the first prize winner of both tracks (all teams and student teams) of KDD Cup 2010.
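
A toy Python sketch of the final combination step, blending sub-team predictions with regularized linear regression; the validation matrix is simulated and the ridge penalty is an illustrative value, not the team's actual setup:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=2000)                               # correct/incorrect answers
team_preds = np.clip(truth[:, None] * 0.6 + rng.normal(0.2, 0.3, (2000, 5)), 0, 1)

blender = Ridge(alpha=1.0).fit(team_preds, truth)                   # learn how to weight sub-teams
blended = np.clip(blender.predict(team_preds), 0, 1)                # combined probability estimates
```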

Journal ArticleDOI
TL;DR: In this paper, an empirical method based on Random Forests is proposed to estimate the photometric redshift of each galaxy by building a set of optimal decision trees on subsets of the available spectroscopic sample.
Abstract: The main challenge today in photometric redshift estimation is not in the accuracy but in understanding the uncertainties. We introduce an empirical method based on Random Forests to address these issues. The training algorithm builds a set of optimal decision trees on subsets of the available spectroscopic sample, which provide independent constraints on the redshift of each galaxy. The combined forest estimates have intriguing statistical properties, notable among which are Gaussian errors. We demonstrate the power of our approach on multi-color measurements of the Sloan Digital Sky Survey.
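
A toy Python sketch of the approach: a random forest regresses redshift on photometry, and the scatter of the individual tree predictions serves as an empirical, per-galaxy error estimate. The simulated colours and the simple mean/standard-deviation summary are assumptions, not the paper's analysis:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
colours = rng.normal(size=(3000, 5))                                    # toy photometry
z_spec = 0.3 + 0.1 * colours[:, 0] - 0.05 * colours[:, 1] + rng.normal(0, 0.02, 3000)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(colours, z_spec)

# Each tree was grown on a different bootstrap subset of the spectroscopic
# sample, so the per-tree spread gives an error estimate for every galaxy.
per_tree = np.stack([tree.predict(colours) for tree in forest.estimators_])
z_phot, z_err = per_tree.mean(axis=0), per_tree.std(axis=0)
```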

Book ChapterDOI
05 Sep 2010
TL;DR: A novel multiple-instance learning algorithm for randomized trees called MIForests, which achieves state-of-the-art results while being faster than previous approaches and being able to inherently solve multi-class problems.
Abstract: Multiple-instance learning (MIL) allows for training classifiers from ambiguously labeled data. In computer vision, this learning paradigm has been recently used in many applications such as object classification, detection and tracking. This paper presents a novel multiple-instance learning algorithm for randomized trees called MIForests. Randomized trees are fast, inherently parallel and multi-class and are thus increasingly popular in computer vision. MIForests combine the advantages of these classifiers with the flexibility of multiple instance learning. In order to leverage the randomized trees for MIL, we define the hidden class labels inside target bags as random variables. These random variables are optimized by training random forests and using a fast iterative homotopy method for solving the non-convex optimization problem. Additionally, most previously proposed MIL approaches operate in batch or off-line mode and thus assume access to the entire training set. This limits their applicability in scenarios where the data arrives sequentially and in dynamic environments. We show that MIForests are not limited to off-line problems and present an on-line extension of our approach. In the experiments, we evaluate MIForests on standard visual MIL benchmark datasets where we achieve state-of-the-art results while being faster than previous approaches and being able to inherently solve multi-class problems. The on-line version of MIForests is evaluated on visual object tracking where we outperform the state-of-the-art method based on boosting.

Journal ArticleDOI
TL;DR: This paper reviews techniques to accelerate concept classification, showing the trade-off between computational efficiency and accuracy; the results lead to a 7-fold speed increase without accuracy loss, and a 70-fold speed increase with 3% accuracy loss.
Abstract: As datasets grow increasingly large in content-based image and video retrieval, computational efficiency of concept classification is important. This paper reviews techniques to accelerate concept classification, where we show the trade-off between computational efficiency and accuracy. As a basis, we use the Bag-of-Words algorithm that led to the best performance scores in the 2008 benchmarks of TRECVID and PASCAL. We divide the evaluation in three steps: 1) Descriptor Extraction, where we evaluate SIFT, SURF, DAISY, and Semantic Textons. 2) Visual Word Assignment, where we compare a k-means visual vocabulary with a Random Forest and evaluate subsampling, dimension reduction with PCA, and division strategies of the Spatial Pyramid. 3) Classification, where we evaluate the χ2, RBF, and Fast Histogram Intersection kernel for the SVM. Apart from the evaluation, we accelerate the calculation of densely sampled SIFT and SURF, accelerate nearest neighbor assignment, and improve accuracy of the Histogram Intersection kernel. We conclude by discussing whether further acceleration of the Bag-of-Words pipeline is possible. Our results lead to a 7-fold speed increase without accuracy loss, and a 70-fold speed increase with 3% accuracy loss. The latter system does classification in real-time, which opens up new applications for automatic concept classification. For example, this system permits five standard desktop PCs to automatically tag all images currently uploaded to Flickr for 20 classes.

Journal ArticleDOI
TL;DR: In this article, the authors prove uniform consistency of Random Survival Forests (RSF) under general splitting rules, bootstrapping, and random selection of variables, and show that the forest ensemble survival function converges uniformly to the true population survival function.

Journal ArticleDOI
TL;DR: Characteristics such as time effort, classifier comprehensibility and method intricacy are evaluated—aspects that determine the success of a classification technique among ecologists and conservation biologists as well as for the communication with managers and decision makers.

Posted Content
TL;DR: In this article, the use of Random Forest as a potential technique for residential estate mass appraisal has been attempted for the first time and the method performed better than such techniques as CHAID, CART, KNN, multiple regression analysis, Artificial Neural Networks (MLP and RBF) and Boosted Trees.
Abstract: To the best of the authors' knowledge, the use of Random Forest as a potential technique for residential estate mass appraisal has been attempted here for the first time. In an empirical study using data on residential apartments, the method performed better than such techniques as CHAID, CART, KNN, multiple regression analysis, Artificial Neural Networks (MLP and RBF) and Boosted Trees. An approach is introduced for automatically detecting segments where a model significantly underperforms and segments with systematically under- or overestimated predictions. This segmentational approach is applicable to various expert systems including, but not limited to, those used for mass appraisal.
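
A rough Python sketch of the two ingredients described above, a random forest price model plus per-segment diagnostics that flag systematic under- or overestimation; the toy apartment data and the choice of district as the segmentation variable are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
flats = pd.DataFrame({
    "area": rng.uniform(30, 120, n),
    "rooms": rng.integers(1, 5, n),
    "floor": rng.integers(1, 20, n),
    "district": rng.integers(0, 10, n),
})
price = 1000 * flats["area"] + 5000 * flats["rooms"] + rng.normal(0, 20000, n)

pred = cross_val_predict(RandomForestRegressor(n_estimators=300, random_state=0), flats, price, cv=5)
resid = pred - price                                   # positive = overestimated price

# Segment-level report: mean residual (systematic bias) and mean absolute error.
report = (pd.DataFrame({"district": flats["district"], "resid": resid})
          .groupby("district")["resid"]
          .agg(bias="mean", mae=lambda r: r.abs().mean()))
print(report.sort_values("mae", ascending=False).head())
```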

Journal ArticleDOI
TL;DR: A multiple classifier system based on a "forest" of fuzzy decision trees, i.e., a fuzzy random forest, is proposed; it exhibits good classification accuracy, comparable to that of the best classifiers, when tested on conventional data sets.

Journal ArticleDOI
Alan Smith1
TL;DR: In this article, an approach is described to using the Random Forest classification algorithm to quantitatively evaluate a range of potential image segmentation scales in order to identify the segmentation scale(s) that best predict land cover classes of interest.
Abstract: This paper describes an approach to using the Random Forest classification algorithm to quantitatively evaluate a range of potential image segmentation scale alternatives in order to identify the segmentation scale(s) that best predict land cover classes of interest. The image segmentation scale selection process was used to identify three critical image object scales that when combined produced an optimal level of land cover classification accuracy. Following segmentation scale optimization, the Random Forest classifier was then used to assign land cover classes to 11 scenes of SPOT satellite imagery in North and South Dakota with an average overall accuracy of 85.2 percent.
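
The scale-selection step can be sketched in a few lines of Python: train a random forest on the object features produced at each candidate segmentation scale and compare out-of-bag accuracies. The per-scale feature matrices and labels below are placeholders for the object-based features in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=400)                          # toy land-cover labels
candidate_scales = {10: rng.normal(size=(400, 8)),             # object features per scale
                    25: rng.normal(size=(400, 8)),
                    50: rng.normal(size=(400, 8))}

oob = {}
for scale, feats in candidate_scales.items():
    rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0).fit(feats, labels)
    oob[scale] = rf.oob_score_                                  # how well this scale predicts classes

best_scale = max(oob, key=oob.get)
```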

Journal ArticleDOI
TL;DR: The method takes advantage of the random forest algorithm and offers a structure for hybrid random forest-based lung nodule classification aided by clustering; a high receiver operating characteristic (ROC) A(z) of 0.9786 has been achieved.

Journal ArticleDOI
TL;DR: The water index and Ratio975 had the best ability to assay the water status of S. noctilio-infested trees, thus making it possible to remotely predict and quantify the severity of damage caused by the wasp.

01 Oct 2010
TL;DR: Simulation results show that the proposed MERF method provides substantial improvements over standard RF when the random effects are non-negligible.
Abstract: This paper presents an extension of the random forest (RF) method to the case of clustered data. The proposed 'mixed-effects random forest' (MERF) is implemented using a standard RF algorithm within the framework of the expectation–maximization algorithm. Simulation results show that the proposed MERF method provides substantial improvements over standard RF when the random effects are non-negligible. The use of the method is illustrated by predicting the first-week box office revenues of movies.
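
A heavily simplified Python sketch of the alternating scheme behind a mixed-effects random forest: fit a forest to the data with the current random intercepts removed, then re-estimate an intercept per cluster from the residuals. The full MERF uses an EM-style update with estimated variance components, which this sketch does not implement:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def merf_sketch(X, y, groups, n_iter=10):
    b = {g: 0.0 for g in np.unique(groups)}               # random intercept per cluster
    rf = RandomForestRegressor(n_estimators=300, random_state=0)
    for _ in range(n_iter):
        offset = np.array([b[g] for g in groups])
        rf.fit(X, y - offset)                              # fixed-effects (forest) part
        resid = y - rf.predict(X)
        for g in b:
            b[g] = resid[groups == g].mean()               # crude random-effect update
    return rf, b

# Toy usage: clusters share an intercept on top of a common signal.
rng = np.random.default_rng(0)
groups = rng.integers(0, 20, 1000)
X = rng.normal(size=(1000, 4))
y = X[:, 0] + 0.5 * groups + rng.normal(0, 0.1, 1000)
model, intercepts = merf_sketch(X, y, groups)
```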

Journal ArticleDOI
TL;DR: This letter presents a computer-aided diagnosis technique for the early detection of Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification, based on a partial least squares (PLS) regression model and a random forest (RF) predictor.
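
A compact Python sketch of the two-stage idea, partial least squares for feature extraction followed by a random forest classifier; the simulated voxel data, the number of PLS components, and the forest size are illustrative assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5000))                      # toy SPECT voxel intensities per subject
y = rng.integers(0, 2, size=80)                      # 1 = AD, 0 = control (simulated)

pls = PLSRegression(n_components=5).fit(X, y)        # supervised dimensionality reduction
scores = pls.transform(X)
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(scores, y)
```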

Journal ArticleDOI
TL;DR: This work investigates how illuminant estimation techniques can be improved by taking into account intrinsic, low-level properties of the images, and shows how these properties can be used to drive the selection of the best algorithm for a given image.