Showing papers on "AdaBoost" published in 2015

Journal Article•DOI•

A rapid learning algorithm for vehicle classification

Xuezhi Wen¹, Ling Shao², Yu Xue¹, Wei Fang¹•Institutions (2)

Nanjing University of Information Science and Technology¹, Northumbria University²

20 Feb 2015-Information Sciences

TL;DR: Experimental results demonstrate that the proposed approaches not only speed up the training and incremental learning processes of AdaBoost, but also yield better or competitive vehicle classification accuracies compared with several state-of-the-art methods, showing their potential for real-time applications.

460 citations

Journal Article•DOI•

Evaluating multiple classifiers for stock price direction prediction

Michel Ballings¹, Dirk Van den Poel², Nathalie Hespeels², Ruben Gryp²•Institutions (2)

University of Tennessee¹, Ghent University²

15 Nov 2015-Expert Systems With Applications

TL;DR: The results indicate that Random Forest is the top algorithm followed by Support Vector Machines, Kernel Factory, AdaBoost, Neural Networks, K-Nearest Neighbors and Logistic Regression in the domain of stock price direction prediction.

Abstract: We predict long term stock price direction. We benchmark three ensemble methods against four single classifiers. We use five times twofold cross-validation and AUC as a performance measure. Random Forest is the top algorithm. This study is the first to make such an extensive benchmark in this domain. Stock price direction prediction is an important issue in the financial world. Even small improvements in predictive performance can be very profitable. The purpose of this paper is to benchmark ensemble methods (Random Forest, AdaBoost and Kernel Factory) against single classifier models (Neural Networks, Logistic Regression, Support Vector Machines and K-Nearest Neighbor). We gathered data from 5767 publicly listed European companies and used the area under the receiver operating characteristic curve (AUC) as a performance measure. Our predictions are one year ahead. The results indicate that Random Forest is the top algorithm followed by Support Vector Machines, Kernel Factory, AdaBoost, Neural Networks, K-Nearest Neighbors and Logistic Regression. This study contributes to literature in that it is, to the best of our knowledge, the first to make such an extensive benchmark. The results clearly suggest that novel studies in the domain of stock price direction prediction should include ensembles in their sets of algorithms. Our extensive literature review evidently indicates that this is currently not the case.

368 citations

Journal Article•DOI•

A comparison of machine learning techniques for customer churn prediction

Thanasis Vafeiadis, Konstantinos I. Diamantaras, George Sarigiannidis, K. Ch. Chatzisavvas

01 Jun 2015-Simulation Modelling Practice and Theory

TL;DR: A comparative study of the most popular machine learning methods applied to the challenging problem of customer churn prediction in the telecommunications industry demonstrates clear superiority of the boosted versions of the models over the plain (non-boosted) versions.

256 citations

Journal Article•DOI•

Automatic Segmentation of Liver Tumor in CT Images with Deep Convolutional Neural Networks

Li Wen, Fucang Jia, Qingmao Hu

19 Nov 2015-Journal of Computational Chemistry

TL;DR: The CNN method performs better than popular machine learning algorithms (AdaBoost, Random Forests, and support vector machine) and is promising for liver tumor segmentation.

Abstract: Liver tumor segmentation from computed tomography (CT) images is an essential task for the diagnosis and treatment of liver cancer. However, it is difficult owing to the variability of appearances, fuzzy boundaries, heterogeneous densities, shapes and sizes of lesions. In this paper, an automatic method based on convolutional neural networks (CNNs) is presented to segment lesions from CT images. A CNN is a deep learning model whose convolutional filters can learn hierarchical features from data. We compared the CNN model to popular machine learning algorithms: AdaBoost, Random Forests (RF), and support vector machine (SVM). These classifiers were trained on handcrafted features containing mean, variance, and contextual features. Experimental evaluation was performed on 30 portal phase enhanced CT images using leave-one-out cross validation. The average Dice Similarity Coefficient (DSC), precision, and recall achieved were 80.06% ± 1.63%, 82.67% ± 1.43%, and 84.34% ± 1.61%, respectively. The results show that the CNN method has better performance than other methods and is promising in liver tumor segmentation.

220 citations
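
The abstract above reports Dice Similarity Coefficient, precision, and recall. As a minimal sketch of how these three scores are commonly computed for a binary segmentation, assuming flattened 0/1 masks (the function name `segmentation_scores` and the toy masks are illustrative and not taken from the paper):

```python
def segmentation_scores(pred, truth):
    """Return (dice, precision, recall) for two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count true positives, false positives and false negatives voxel by voxel.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    # Dice = 2|A ∩ B| / (|A| + |B|); two empty masks are treated as a perfect match.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall


# Toy example (hypothetical data): 2 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0]
dice, precision, recall = segmentation_scores(pred, truth)
```

How the paper averages these scores over its 30 CT images is not stated here, so the sketch covers a single pair of masks only.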

Journal Article•DOI•

Dynamic cattle behavioural classification using supervised ensemble classifiers

Ritaban Dutta¹, Daniel Smith¹, Richard Rawnsley², Greg Bishop-Hurley³, James Hills², Greg Timms¹, DA Henry³•Institutions (3)

Hobart Corporation¹, University of Tasmania², Commonwealth Scientific and Industrial Research Organisation³

01 Feb 2015-Computers and Electronics in Agriculture

TL;DR: This study shows that cattle behaviours can be classified with high accuracy using supervised machine learning techniques, which offers significant potential as a mechanism for the early detection and quantitative assessment of animal health issues.

147 citations

Journal Article•DOI•

Geometric mean based boosting algorithm with over-sampling to resolve data imbalance problem for bankruptcy prediction

Myoung-Jong Kim¹, Dae-Ki Kang², Hong Bae Kim²•Institutions (2)

Pusan National University¹, Dongseo University²

15 Feb 2015-Expert Systems With Applications

TL;DR: The results and their comparative analysis with AdaBoost and cost-sensitive boosting indicate that GMBoost has the advantages of high prediction power and robust learning capability in imbalanced data as well as balanced data distribution.

Abstract: We propose a geometric mean based boosting algorithm (GMBoost). We propose GMBoost to resolve the data imbalance problem. GMBoost considers the geometric mean of the error rates of the majority and minority classes. We experiment with GMBoost, AdaBoost and cost-sensitive boosting on bankruptcy prediction. The comparative results show that GMBoost outperforms on both imbalanced and balanced data. In classification or prediction tasks, the data imbalance problem is frequently observed when most instances belong to one majority class. The data imbalance problem has received considerable attention in the machine learning community because it is one of the main causes that degrade the performance of classifiers or predictors. In this paper, we propose a geometric mean based boosting algorithm (GMBoost) to resolve the data imbalance problem. GMBoost enables learning with consideration of both majority and minority classes because it uses the geometric mean of both classes in error rate and accuracy calculation. To evaluate the performance of GMBoost, we have applied GMBoost to a bankruptcy prediction task. The results and their comparative analysis with AdaBoost and cost-sensitive boosting indicate that GMBoost has the advantages of high prediction power and robust learning capability in imbalanced data as well as balanced data distribution.

145 citations
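
The abstract above says GMBoost uses the geometric mean of both classes in its error rate and accuracy calculation. The sketch below only illustrates that idea on unweighted predictions and is not the paper's algorithm: the boosting loop and sample weights are omitted, and the function name `geometric_mean_rates`, the label convention (1 = minority), and the toy data are assumptions made for this example.

```python
import math


def geometric_mean_rates(y_true, y_pred, minority_label=1):
    """Return (gm_error, gm_accuracy) over the minority and majority classes."""
    errors = {True: 0, False: 0}  # keyed by "is this a minority-class sample?"
    counts = {True: 0, False: 0}
    for t, p in zip(y_true, y_pred):
        is_minority = (t == minority_label)
        counts[is_minority] += 1
        errors[is_minority] += (t != p)
    if 0 in counts.values():
        raise ValueError("both classes must be present")
    e_min = errors[True] / counts[True]    # minority-class error rate
    e_maj = errors[False] / counts[False]  # majority-class error rate
    gm_error = math.sqrt(e_min * e_maj)
    gm_accuracy = math.sqrt((1 - e_min) * (1 - e_maj))
    return gm_error, gm_accuracy


# Hypothetical imbalanced data: 90 majority (0) and 10 minority (1) samples.
y_true = [0] * 90 + [1] * 10
# 9 of 90 majority samples and 5 of 10 minority samples are misclassified.
y_pred = [0] * 81 + [1] * 9 + [0] * 5 + [1] * 5
overall_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
gm_error, gm_accuracy = geometric_mean_rates(y_true, y_pred)
```

In this toy case the overall error is 0.14 even though half of the minority class is misclassified, while the geometric-mean accuracy (about 0.67) reflects the weak minority performance. That gap is the motivation the abstract gives for class-aware measures.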

Journal Article•DOI•

A hybrid data mining model of feature selection algorithms and ensemble learning classifiers for credit scoring

Fatemeh Nemati Koutanaei¹, Hedieh Sajedi², Mohammad Khanbabaei¹•Institutions (2)

Islamic Azad University¹, University of Tehran²

01 Nov 2015-Journal of Retailing and Consumer Services

TL;DR: A hybrid data mining model of feature selection and ensemble learning classification algorithms on the basis of three stages is developed and the hybrid model is verified and proposed as an operative and strong model for performing credit scoring.

136 citations

Journal Article•DOI•

An empirical evaluation of the performance of binary classifiers in the prediction of credit ratings changes

Stewart Jones¹, David Johnstone¹, Roy Wilson•Institutions (1)

University of Sydney¹

01 Jul 2015-Journal of Banking and Finance

TL;DR: This study examines the predictive performance of a wide class of binary classifiers using a large sample of international credit ratings changes from the period 1983–2013 to conclude that simpler classifiers can be viable alternatives to more sophisticated approaches, particularly if interpretability is an important objective of the modelling exercise.

Abstract: In this study, we examine the predictive performance of a wide class of binary classifiers using a large sample of international credit ratings changes from the period 1983–2013. Using a number of financial, market, corporate governance, macro-economic and other indicators as explanatory variables, we compare classifiers ranging from conventional techniques (such as logit/probit and LDA) to fully nonlinear classifiers, including neural networks, support vector machines and more recent statistical learning techniques such as generalised boosting, AdaBoost and random forests. We find that the newer classifiers significantly outperform all other classifiers on both the cross sectional and longitudinal test samples; and prove remarkably robust to different data structures and assumptions. Simple linear classifiers such as logit/probit and LDA are found nonetheless to predict quite accurately on the test samples, in some cases performing comparably well to more flexible model structures. We conclude that simpler classifiers can be viable alternatives to more sophisticated approaches, particularly if interpretability is an important objective of the modelling exercise. We also suggest effective ways to enhance the predictive performance of many of the binary classifiers examined in this study.

133 citations

Journal Article•DOI•

Efficient Feature Selection and Classification for Vehicle Detection

Xuezhi Wen¹, Ling Shao², Wei Fang¹, Yu Xue¹•Institutions (2)

Nanjing University of Information Science and Technology¹, Northumbria University²

01 Mar 2015-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: Experimental results demonstrate that the proposed approaches not only speed up the feature selection process with AdaBoost, but also yield better detection performance than the state-of-the-art methods.

Abstract: The focus of this paper is on the problem of Haar-like feature selection and classification for vehicle detection. Haar-like features are particularly attractive for vehicle detection because they form a compact representation, encode edge and structural information, capture information from multiple scales, and especially can be computed efficiently. Due to the large-scale nature of the Haar-like feature pool, we present a rapid and effective feature selection method via AdaBoost by combining a sample’s feature value with its class label. Our approach is analyzed theoretically and empirically to show its efficiency. Then, an improved normalization algorithm for the selected feature values is designed to reduce the intra-class difference, while increasing the inter-class variability. Experimental results demonstrate that the proposed approaches not only speed up the feature selection process with AdaBoost, but also yield better detection performance than the state-of-the-art methods.

132 citations
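
The abstract above notes that Haar-like features can be computed efficiently. The usual reason is the integral image (summed-area table), which turns any rectangle sum into four lookups. The following is a generic sketch of that standard technique, not the paper's feature selection or normalization method; the function names and the toy image are illustrative.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above and to the left of (y, x), inclusive.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half: a simple vertical-edge Haar-like feature."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy 3x4 image (hypothetical) with a bright left half and a dark right half.
img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
edge_response = two_rect_feature(ii, 0, 0, 3, 4)
```

Once the table is built, each feature costs a constant number of lookups regardless of rectangle size, which is what makes a large Haar-like feature pool tractable to evaluate.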

Journal Article•DOI•

Learning to Detect Vehicles by Clustering Appearance Patterns

Eshed Ohn-Bar¹, Mohan M. Trivedi¹•Institutions (1)

University of California, San Diego¹

23 Mar 2015-IEEE Transactions on Intelligent Transportation Systems

TL;DR: The analysis provides insight into the design of a robust vehicle detection system, showing promise in terms of detection performance and orientation estimation accuracy.

Abstract: This paper studies efficient means of dealing with intracategory diversity in object detection. Strategies for occlusion and orientation handling are explored by learning an ensemble of detection models from visual and geometrical clusters of object instances. An AdaBoost detection scheme is employed with pixel lookup features for fast detection. The analysis provides insight into the design of a robust vehicle detection system, showing promise in terms of detection performance and orientation estimation accuracy.

131 citations

Journal Article•

Risk bounds for the majority vote: from a PAC-Bayesian analysis to a learning algorithm

Pascal Germain¹, Alexandre Lacasse¹, François Laviolette¹, Mario Marchand¹, Jean-Francis Roy¹•Institutions (1)

Laval University¹

01 Jan 2015-Journal of Machine Learning Research

TL;DR: In this article, an extensive analysis of the behavior of majority votes in binary classification is presented, where the authors introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement.

Abstract: We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis intends to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of the analysis, we propose the MinCq learning algorithm that basically minimizes the C-bound. MinCq reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.
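
The abstract above describes the C-bound as depending on the average quality of the voters and their average disagreement. A minimal sketch of the form in which this bound is commonly stated, assuming a Gibbs risk R below 1/2 and an expected disagreement d below 1/2, is C = 1 - (1 - 2R)^2 / (1 - 2d). The function name `c_bound` and the example values are illustrative, and the paper's own notation and estimation procedure are not reproduced here.

```python
def c_bound(gibbs_risk, disagreement):
    """Upper bound on the majority-vote risk from first and second margin moments.

    gibbs_risk:   average risk of a voter drawn from the weighting (must be < 0.5)
    disagreement: probability that two independently drawn voters disagree (must be < 0.5)
    """
    if not 0.0 <= gibbs_risk < 0.5:
        raise ValueError("gibbs_risk must be in [0, 0.5)")
    if not 0.0 <= disagreement < 0.5:
        raise ValueError("disagreement must be in [0, 0.5)")
    first_moment = 1.0 - 2.0 * gibbs_risk      # expected margin
    second_moment = 1.0 - 2.0 * disagreement   # expected squared margin
    return 1.0 - first_moment ** 2 / second_moment


# Same average voter quality, different diversity (hypothetical values).
low_diversity = c_bound(0.3, 0.2)   # about 0.733
high_diversity = c_bound(0.3, 0.4)  # about 0.2
```

With voter quality held fixed, higher disagreement gives a smaller bound, which matches the abstract's point that the bound accounts for both quality and disagreement.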

Proceedings Article•DOI•

A comparative study on machine learning algorithms for indoor positioning

Sinem Bozkurt¹, Gulin Elibol¹, Serkan Gunal², Ugur Yayan•Institutions (2)

Eskişehir Osmangazi University¹, Anadolu University²

28 Sep 2015

TL;DR: Experimental results reveal that the k-Nearest Neighbor (k-NN) algorithm is the most suitable for indoor positioning, and that ensemble algorithms such as AdaBoost and Bagging improve the decision tree classifier's performance to nearly the same level as k-NN.

Abstract: Fingerprinting based positioning is commonly used for indoor positioning. In this method, initially a radio map is created using Received Signal Strength (RSS) values that are measured from predefined reference points. During the positioning, the best match between the observed RSS values and existing RSS values in the radio map is established as the predicted position. In the positioning literature, machine learning algorithms have widespread usage in estimating positions. One of the main problems in indoor positioning systems is to find an appropriate machine learning algorithm. In this paper, selected machine learning algorithms are compared in terms of positioning accuracy and computation time. In the experiments, the UJIIndoorLoc indoor positioning database is used. Experimental results reveal that the k-Nearest Neighbor (k-NN) algorithm is the most suitable one for positioning. Additionally, ensemble algorithms such as AdaBoost and Bagging are applied to improve the decision tree classifier's performance to nearly the same level as k-NN, which proved to be the best classifier for indoor positioning.

Journal Article•DOI•

Situation assessment and decision making for lane change assistance using ensemble learning methods

Yi Hou¹, Praveen Edara¹, Carlos Sun¹•Institutions (1)

University of Missouri¹

15 May 2015-Expert Systems With Applications

TL;DR: This paper investigated two ensemble learning methods, random forest and AdaBoost, for developing a lane change assistance system and showed that both ensemble learning methods produced higher classification accuracy and lower false positive rates than the Bayes/Decision tree classifier used in the literature.

Abstract: Lane change assistance system was developed using ensemble learning methods. Proposed system has potential to prevent lane change crashes and thus reducing injuries and fatalities. Random forest and AdaBoost outperformed Bayes/Decision tree classifier. Higher classification accuracy and lower false positive rates achieved. Accuracies of 99.1% and 98.7% were achieved for lane keeping. Lane change maneuvers contribute to a significant number of road traffic accidents. Advanced driver assistance systems (ADAS) that can assess a traffic situation and warn drivers of unsafe lane changes can offer additional safety and convenience. In addition, ADAS can be extended for use in automatic lane changing in driverless vehicles. This paper investigated two ensemble learning methods, random forest, and AdaBoost, for developing a lane change assistance system. The focus on increasing the accuracy of safety critical lane change events has a significant impact on lowering the occurrence of crashes. This is the first study to explore ensemble learning methods for modeling lane changes using a comprehensive set of variables. Detailed vehicle trajectory data from the Next Generation Simulation (NGSIM) dataset in the US were used for model development and testing. The results showed that both ensemble learning methods produced higher classification accuracy and lower false positive rates than the Bayes/Decision tree classifier used in the literature. The impact of misclassification of lane changing events was also studied. A sensitivity analysis performed by varying the accuracy of lane changing showed that the lane keeping accuracy can be increased to as high as 99.1% for the AdaBoost system and 98.7% for the random forest system. The corresponding true positive rates were 96.3% and 94.6%. High accuracy of lane keeping and high true positive rates are desirable due to their safety implications.

Journal Article•DOI•

Use of acceleration data for transportation mode prediction

Muhammad Awais Shafique¹,², Eiji Hato²•Institutions (2)

University of Engineering and Technology, Lahore¹, University of Tokyo²

01 Jan 2015-Transportation

TL;DR: A comparison is made between changes in pre-processing, selection methods for generating training data, and classifiers, using the accelerometer data collected from three cities in Japan to suggest that using a 125-point moving average during pre-processing and selecting training data proportionally for all modes will maximise prediction accuracy.

Abstract: Most smartphones today are equipped with an accelerometer, in addition to other sensors. Any data recorded by the accelerometer can be successfully utilised to determine the mode of transportation in use, which will provide an alternative to conventional household travel surveys and make it possible to implement customer-oriented advertising programmes. In this study, a comparison is made between changes in pre-processing, selection methods for generating training data, and classifiers, using the accelerometer data collected from three cities in Japan. The classifiers used were support vector machines (SVM), adaptive boosting (AdaBoost), decision tree and random forests. The results of this exercise suggest that using a 125-point moving average during pre-processing and selecting training data proportionally for all modes will maximise prediction accuracy. Moreover, random forests outperformed all other classifiers by yielding an overall prediction accuracy of 99.8 % for all three cities.
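
The abstract above recommends a 125-point moving average during pre-processing. A minimal sketch of such a smoothing step follows. It assumes a simple unweighted window that only emits values where the full window fits. Whether the paper uses a centred or trailing window, and how it treats the edges, is not stated here, and the function name `moving_average` and the sample values are illustrative.

```python
def moving_average(samples, window=125):
    """Unweighted moving average over full windows only (output is shorter than input)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(samples) < window:
        return []
    # Keep a running sum so each output value costs O(1) instead of O(window).
    total = sum(samples[:window])
    smoothed = [total / window]
    for i in range(window, len(samples)):
        total += samples[i] - samples[i - window]
        smoothed.append(total / window)
    return smoothed


# Small window on hypothetical accelerometer readings, for readability.
readings = [1, 2, 3, 4, 5]
smoothed = moving_average(readings, window=3)
```

For a recording of n samples this returns n - window + 1 values, so with the 125-point window any trace shorter than 125 samples yields an empty list.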

Proceedings Article•DOI•

Optimal Action Extraction for Random Forests and Boosted Trees

Zhicheng Cui¹, Wenlin Chen¹, Yujie He¹, Yixin Chen¹•Institutions (1)

Washington University in St. Louis¹

10 Aug 2015

TL;DR: The NP-hardness of the optimal action extraction problem for additive tree models (ATMs) is proved, and the problem is formulated as an integer linear program that can be efficiently solved by existing packages.

Abstract: Additive tree models (ATMs) are widely used for data mining and machine learning. Important examples of ATMs include random forest, AdaBoost (with decision trees as weak learners), and gradient boosted trees, and they are often referred to as the best off-the-shelf classifiers. Though capable of attaining high accuracy, ATMs are not well interpretable in the sense that they do not provide actionable knowledge for a given instance. This greatly limits the potential of ATMs on many applications such as medical prediction and business intelligence, where practitioners need suggestions on actions that can lead to desirable outcomes with minimum costs. To address this problem, we present a novel framework to post-process any ATM classifier to extract an optimal actionable plan that can change a given input to a desired class with a minimum cost. In particular, we prove the NP-hardness of the optimal action extraction problem for ATMs and formulate this problem in an integer linear programming formulation which can be efficiently solved by existing packages. We also empirically demonstrate the effectiveness of the proposed framework by conducting comprehensive experiments on challenging real-world datasets.

Journal Article•DOI•

Wrapper Feature Subset Selection for Dimension Reduction Based on Ensemble Learning Algorithm

Rattanawadee Panthong¹, Anongnart Srivihok¹•Institutions (1)

Kasetsart University¹

01 Jan 2015-Procedia Computer Science

TL;DR: This study shows that the search technique based on the bagging algorithm with a Decision Tree achieved better average accuracy than other methods, along with an increased accuracy rate and reduced run-time when searching multimedia data consisting of a large number of multidimensional datasets.

Journal Article•DOI•

EEG signal classification for BCI applications by wavelets and interval type-2 fuzzy logic systems

Thanh Nguyen¹, Abbas Khosravi¹, Douglas Creighton¹, Saeid Nahavandi¹•Institutions (1)

Deakin University¹

01 Jun 2015-Expert Systems With Applications

TL;DR: The proposed wavelet-IT2FLS method considerably dominates the comparable classifiers on both datasets, and outperforms the best performance on the Ia and Ib datasets reported in the BCI competition II by 1.40% and 2.27% respectively.

Abstract: Propose Haar wavelet transformation and ROC curve for EEG signal feature extraction. Combine wavelets and interval type-2 fuzzy logic system for EEG signal classification. Benchmark datasets downloaded from the BCI competition II are used for experiments. Proposed wavelet-IT2FLS outperforms the winner methods of the BCI competition II. IT2FLS dominates competing classifiers: FFNN, SVM, kNN, AdaBoost and ANFIS. The nonlinear, noisy and outlier characteristics of electroencephalography (EEG) signals inspire the employment of fuzzy logic due to its power to handle uncertainty. This paper introduces an approach to classify motor imagery EEG signals using an interval type-2 fuzzy logic system (IT2FLS) in combination with wavelet transformation. Wavelet coefficients are ranked based on the statistics of the receiver operating characteristic curve criterion. The most informative coefficients serve as inputs to the IT2FLS for the classification task. Two benchmark datasets, named Ia and Ib, downloaded from the brain-computer interface (BCI) competition II, are employed for the experiments. Classification performance is evaluated using accuracy, sensitivity, specificity and F-measure. Widely-used classifiers, including feedforward neural network, support vector machine, k-nearest neighbours, AdaBoost and adaptive neuro-fuzzy inference system, are also implemented for comparisons. The wavelet-IT2FLS method considerably dominates the comparable classifiers on both datasets, and outperforms the best performance on the Ia and Ib datasets reported in the BCI competition II by 1.40% and 2.27% respectively. The proposed approach yields great accuracy and requires low computational cost, which can be applied to a real-time BCI system for motor imagery data analysis.

Proceedings Article•DOI•

A new lithography hotspot detection framework based on AdaBoost classifier and simplified feature extraction

Tetsuaki Matsunawa¹, Jhih-Rong Gao², Bei Yu², David Z. Pan²•Institutions (2)

Toshiba¹, University of Texas at Austin²

18 Mar 2015-Proceedings of SPIE

TL;DR: A highly accurate and low-false-alarm hotspot detection framework that outperforms other works in the 2012 ICCAD contest in terms of both accuracy and false alarm.

Abstract: Under the low-k1 lithography process, lithography hotspot detection and elimination in the physical verification phase have become much more important for reducing the process optimization cost and improving manufacturing yield. This paper proposes a highly accurate and low-false-alarm hotspot detection framework. To define an appropriate and simplified layout feature for classification model training, we propose a novel feature space evaluation index. Furthermore, by applying a robust classifier based on the probability distribution function of layout features, our framework can achieve very high accuracy and almost zero false alarm. The experimental results demonstrate the effectiveness of the proposed method in that our detector outperforms other works in the 2012 ICCAD contest in terms of both accuracy and false alarm.

Proceedings Article•DOI•

Real time finger tracking and contour detection for gesture recognition using OpenCV

Ruchi Manish Gurav, Premanand K. Kadbe

28 May 2015

TL;DR: This paper proposes a context-free grammar based method that gives effective real-time performance with great accuracy and robustness for more than four hand gestures, and implements an alternate representation for the same gestures, i.e. fingertip detection using the convex hull algorithm.

Abstract: Gestures are important for communicating information among humans. Nowadays new technologies of Human Computer Interaction (HCI) are being developed to deliver users' commands to robots. Users can interact with machines through hand, head, facial expressions, voice and touch. The objective of this paper is to use one of the important modes of interaction, i.e. hand gestures, to control a robot or for office and household applications. Hand gesture detection algorithms are based on various machine learning methods such as neural networks, support vector machine, and Adaptive Boosting (AdaBoost). Among these methods, AdaBoost based hand-pose detectors are trained with a reduced Haar-like feature set to make the detector robust. The corresponding context-free grammar based proposed method gives effective real-time performance with great accuracy and robustness for more than four hand gestures. Because rectangles cause some problems, we have also implemented an alternate representation for the same gestures, i.e. fingertip detection using the convex hull algorithm.

Journal Article•DOI•

Real-time image smoke detection using staircase searching-based dual threshold AdaBoost and dynamic analysis

Feiniu Yuan, Zhijun Fang¹, Shiqian Wu², Yong Yang, Yuming Fang•Institutions (2)

Shanghai University¹, Wuhan University²

17 Sep 2015-IET Image Processing

TL;DR: The authors combine dual threshold AdaBoost with staircase searching technique to propose and implement an image smoke detection method that has a good robustness in terms of early smoke detection and low false alarm rate and can detect smoke from videos with size of 320 × 240 in real time.

Abstract: It is very challenging to accurately detect smoke from images because of large variances of smoke colour, textures, shapes and occlusions. To improve performance, the authors combine dual threshold AdaBoost with staircase searching technique to propose and implement an image smoke detection method. First, extended Haar-like features and statistical features are efficiently extracted from integral images from both intensity and saturation components of RGB images. Then, a dual threshold AdaBoost algorithm with a staircase searching technique is proposed to classify the features of smoke for smoke detection. The staircase searching technique aims at keeping consistency of training and classifying as far as possible. Finally, dynamic analysis is proposed to further validate the existence of smoke. Experimental results demonstrate that the proposed system has a good robustness in terms of early smoke detection and low false alarm rate, and it can detect smoke from videos with size of 320 × 240 in real time.

Journal Article•DOI•

Social Event Classification via Boosted Multimodal Supervised Latent Dirichlet Allocation

Shengsheng Qian¹, Tianzhu Zhang¹, Changsheng Xu¹, M. Shamim Hossain²•Institutions (2)

Chinese Academy of Sciences¹, King Saud University²

07 Jan 2015-ACM Transactions on Multimedia Computing, Communications, and Applications

TL;DR: The proposed BMM-SLDA can effectively exploit the multimodality and the multiclass property of social events jointly, makes use of the supervised category label information to classify multiclass social events directly, and is suitable for large-scale data analysis because it uses a boosting weighted sampling strategy to iteratively select a small subset of data to efficiently train the corresponding topic models.

Abstract: With the rapidly increasing popularity of social media sites (e.g., Flickr, YouTube, and Facebook), it is convenient for users to share their own comments on many social events, which successfully facilitates social event generation, sharing and propagation and results in a large amount of user-contributed media data (e.g., images, videos, and text) for a wide variety of real-world events of different types and scales. As a consequence, it has become more and more difficult to find exactly the interesting events in massive social media data, a capability that is useful for users or governments to browse, search and monitor social events. To deal with these issues, we propose a novel boosted multimodal supervised Latent Dirichlet Allocation (BMM-SLDA) for social event classification by integrating a supervised topic model, denoted as multi-modal supervised Latent Dirichlet Allocation (mm-SLDA), in the boosting framework. Our proposed BMM-SLDA has a number of advantages. (1) Our mm-SLDA can effectively exploit the multimodality and the multiclass property of social events jointly, and make use of the supervised category label information to classify multiclass social events directly. (2) It is suitable for large-scale data analysis by utilizing a boosting weighted sampling strategy to iteratively select a small subset of data to efficiently train the corresponding topic models. (3) It effectively exploits social event structure by the document weight distribution with classification error and can iteratively learn new topic models to correct the previously misclassified event documents. We evaluate our BMM-SLDA on a real world dataset and show extensive experimental results, which demonstrate that our model outperforms state-of-the-art methods.

Posted Content•

Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm

Pascal Germain¹, Alexandre Lacasse¹, François Laviolette¹, Mario Marchand¹, Jean-Francis Roy¹•Institutions (1)

Laval University¹

28 Mar 2015-arXiv: Machine Learning

TL;DR: An extensive analysis of the behavior of majority votes in binary classification is proposed and a risk bound for majority votes, called the C-bound, is introduced that takes into account the average quality of the voters and their average disagreement.
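As a small illustration of how such a bound combines the two quantities, the sketch below evaluates the C-bound in its commonly stated form, 1 − (1 − 2R)² / (1 − 2d), where R is the average voter (Gibbs) risk and d is the expected disagreement between voters. The function name `c_bound` and its argument names are hypothetical, and the formula is given here from the general PAC-Bayesian literature rather than quoted from this listing.

```python
def c_bound(gibbs_risk: float, disagreement: float) -> float:
    """Upper bound on the majority-vote risk (commonly stated C-bound form).

    gibbs_risk   -- average risk of the individual voters, must be below 0.5
    disagreement -- expected disagreement between two voters, must be below 0.5
    """
    if not 0.0 <= gibbs_risk < 0.5:
        raise ValueError("gibbs_risk must lie in [0, 0.5)")
    if not 0.0 <= disagreement < 0.5:
        raise ValueError("disagreement must lie in [0, 0.5)")
    first_moment = 1.0 - 2.0 * gibbs_risk      # first moment of the margin
    second_moment = 1.0 - 2.0 * disagreement   # second moment of the margin
    return 1.0 - first_moment ** 2 / second_moment
```

For voters that are individually weak (R = 0.4) but disagree often (d = 0.45), the bound evaluates to about 0.6, while the same voters with no disagreement give 0.96. This shows how disagreement tightens the bound on the vote.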

Journal Article•DOI•

Hyperspectral Image Classification With Limited Labeled Training Samples Using Enhanced Ensemble Learning and Conditional Random Fields

Fan Li¹, Linlin Xu¹, Parthipan Siva¹, Alexander Wong¹, David A. Clausi¹•Institutions (1)

University of Waterloo¹

08 Apr 2015-IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

TL;DR: Experimental results show that the classification accuracy by MBRF as well as its integration with CRF consistently outperforms the other referenced state-of-the-art classification methods when limited labeled samples are available for training.

Abstract: Classification of hyperspectral imagery using few labeled samples is a challenging problem, considering the high dimensionality of hyperspectral imagery. Classifiers trained on limited samples with abundant spectral bands tend to overfit, leading to weak generalization capability. To address this problem, we have developed an enhanced ensemble method called multiclass boosted rotation forest (MBRF), which combines the rotation forest algorithm and a multiclass AdaBoost algorithm. The benefit of this combination can be explained by bias-variance analysis, especially in the situation of inadequate training samples and high dimensionality. Furthermore, MBRF innately produces posterior probabilities inherited from AdaBoost, which serve as the unary potentials of the conditional random field (CRF) model to incorporate spatial context information. Experimental results show that the classification accuracy of MBRF, as well as of its integration with CRF, consistently exceeds that of the other referenced state-of-the-art classification methods when limited labeled samples are available for training.

Journal Article•DOI•

DELTA: A Distal Enhancer Locating Tool Based on AdaBoost Algorithm and Shape Features of Chromatin Modifications.

Yiming Lu, Wubin Qu, Guangyu Shan, Chenggang Zhang

19 Jun 2015-PLOS ONE

TL;DR: This study developed an enhancer predicting method, DELTA (Distal Enhancer Locating Tool based on AdaBoost), which significantly outperforms current enhancer prediction methods in prediction accuracy on different datasets and can predict enhancers in one cell type using models trained in other cell types without loss of accuracy.

Abstract: Accurate identification of DNA regulatory elements has become an urgent need in the post-genomic era. Recent genome-wide chromatin state mapping efforts revealed that DNA elements are associated with characteristic chromatin modification signatures, based on which several approaches have been developed to predict transcriptional enhancers. However, their practical application is limited by incomplete extraction of chromatin features and model inconsistency for predicting enhancers across different cell types. To address these issues, we define a set of non-redundant shape features of histone modifications, which show high consistency across cell types and can greatly reduce the dimensionality of feature vectors. Integrating shape features with the machine-learning algorithm AdaBoost, we developed an enhancer predicting method, DELTA (Distal Enhancer Locating Tool based on AdaBoost). We show that DELTA significantly outperforms current enhancer prediction methods in prediction accuracy on different datasets and can predict enhancers in one cell type using models trained in other cell types without loss of accuracy. Overall, our study presents a novel framework for accurately identifying enhancers from epigenetic data across multiple cell types.

Journal Article•DOI•

Supervised geochemical anomaly detection by pattern recognition

Arman Mohammadi Gonbadi¹, Seyed Hasan Tabatabaei¹, Emmanuel John M. Carranza²•Institutions (2)

Isfahan University of Technology¹, James Cook University²

01 Oct 2015-Journal of Geochemical Exploration

TL;DR: The applied classification algorithms outperform Gaussian linear discriminant analysis (GLDA) and provide more accurate, robust and reliable results; feature selection algorithms could play an important role in increasing the accuracy and generalization ability of the classifiers used.

Journal Article•DOI•

Identification of VoIP encrypted traffic using a machine learning approach

Riyad Alshammari¹, A. Nur Zincir-Heywood²•Institutions (2)

King Saud bin Abdulaziz University for Health Sciences¹, Dalhousie University²

01 Jan 2015-Journal of King Saud University - Computer and Information Sciences archive

TL;DR: This work investigates the performance of three different machine learning algorithms, namely C5.0, AdaBoost and Genetic programming (GP), to generate robust classifiers for identifying VoIP encrypted traffic, and shows that finding and employing the most suitable sampling and machine learning technique can improve the performance of classifying VoIP significantly.

Abstract: We investigate the performance of three different machine learning algorithms, namely C5.0, AdaBoost and Genetic programming (GP), to generate robust classifiers for identifying VoIP encrypted traffic. To this end, a novel approach (Alshammari and Zincir-Heywood, 2011) based on machine learning is employed to generate robust signatures for classifying VoIP encrypted traffic. We apply statistical calculations to network flows to extract a feature set that includes neither payload information nor information based on source and destination port numbers and IP addresses. Our results show that finding and employing the most suitable sampling and machine learning technique can improve the performance of classifying VoIP significantly.

Journal Article•DOI•

Intelligent Travel Recommendation System by Mining Attributes from Community Contributed Photos

V. Subramaniyaswamy¹, V. Vijayakumar², R. Logesh¹, V. Indragandhi¹•Institutions (2)

Shanmugha Arts, Science, Technology & Research Academy¹, VIT University²

01 Jan 2015-Procedia Computer Science

TL;DR: This paper presents a system that helps users find tourist locations they might like to visit, based on user-contributed photos of those places available on photo-sharing websites.

Proceedings Article•DOI•

A comparative study of various classifiers for automated sleep apnea screening based on single-lead electrocardiogram

Ahnaf Rashik Hassan¹•Institutions (1)

Bangladesh University of Engineering and Technology¹

01 Nov 2015

TL;DR: This study suggests that ELM is a promising classification model for automated sleep apnea detection by employing statistical moment based features and Empirical Mode Decomposition in a feature extraction scheme.

Abstract: Computerized sleep apnea detection is necessary to relieve physicians of the burden of analyzing a high volume of data. The overall performance of an automated apnea detection scheme greatly depends on the choice of classifier. Most of the existing works focus on the feature extraction part. The effect of various classification models is poorly studied. In the present work, we employ statistical moment based features and Empirical Mode Decomposition to devise a feature extraction scheme. Furthermore, we study the performance of nine well-known classifiers for this feature extraction scheme: naive Bayes, kNN, neural network, AdaBoost, Bagging, random forest, extreme learning machine (ELM), discriminant analysis and restricted Boltzmann machine. The optimal choice of parameters for each of the classifiers is also studied. This study suggests that ELM is a promising classification model for automated sleep apnea detection.

Journal Article•DOI•

Automatic Detection of Polyp Using Hessian Filter and HOG Features

Yuji Iwahori¹, Akira Hattori¹, Yoshinori Adachi¹, Manas Kamal Bhuyan², Robert J. Woodham³, Kunio Kasugai⁴•Institutions (4)

Chubu University¹, Indian Institute of Technology Guwahati², University of British Columbia³, Aichi Medical University⁴

01 Jan 2015-Procedia Computer Science

TL;DR: A new approach for the automatic detection of polyp regions in an endoscope image using a Hessian Filter and machine learning approaches is proposed and K-means++ is introduced to integrate the detection results in the classification.

Journal Article•DOI•

A new pedestrian detection method based on combined HOG and LSS features

Shihong Yao¹, Shaoming Pan¹, Tao Wang¹, Chun-Hou Zheng², Weiming Shen¹, Yanwen Chong¹•Institutions (2)

Wuhan University¹, Anhui University²

03 Mar 2015-Neurocomputing

TL;DR: Experimental results show that the new combination features describe pedestrians better than either single feature, and that the HOG–LSS combined feature has the strongest description ability.

