Author

Marina Skurichina

Bio: Marina Skurichina is an academic researcher from Delft University of Technology. The author has contributed to research on the topics of linear discriminant analysis and boosting (machine learning). The author has an h-index of 17 and has co-authored 25 publications receiving 1594 citations.

Papers
Journal ArticleDOI
TL;DR: Simulation studies show that the performance of the combining techniques is strongly affected by the small sample size properties of the base classifier: boosting is useful for large training sample sizes, while bagging and the random subspace method are useful for critical training sample sizes.
Abstract: Recently bagging, boosting and the random subspace method have become popular combining techniques for improving weak classifiers. These techniques are designed for, and usually applied to, decision trees. In this paper, in contrast to a common opinion, we demonstrate that they may also be useful in linear discriminant analysis. Simulation studies, carried out for several artificial and real data sets, show that the performance of the combining techniques is strongly affected by the small sample size properties of the base classifier: boosting is useful for large training sample sizes, while bagging and the random subspace method are useful for critical training sample sizes. Finally, a table describing the possible usefulness of the combining techniques for linear classifiers is presented.

449 citations

Journal ArticleDOI
TL;DR: Diversity measures indicated that Boosting succeeds in inducing diversity even for stable classifiers whereas Bagging does not, confirming in a quantitative way the intuitive explanation behind the success of Boosting for linear classifiers for increasing training sizes, and the poor performance of Bagging.

165 citations

Journal ArticleDOI
TL;DR: The contributions of diffuse reflectance and autofluorescence spectroscopy to diagnostic performance are determined in the present study.
Abstract: Background and Objectives: Autofluorescence and diffuse reflectance spectroscopy have been used separately and combined for tissue diagnostics. Previously, we assessed the value of autofluorescence spectroscopy for the classification of oral (pre-)malignancies. In the present study, we want to determine the contributions of diffuse reflectance and autofluorescence spectroscopy to diagnostic performance. Study Design/Materials and Methods: Autofluorescence and diffuse reflectance spectra were recorded from 172 oral lesions and 70 healthy volunteers. Autofluorescence spectra were corrected in first order for blood absorption effects using diffuse reflectance spectra. Principal Components Analysis (PCA) with various classifiers was applied to distinguish (1) cancer and (2) all lesions from healthy oral mucosa, and (3) dysplastic and malignant lesions from benign lesions. Autofluorescence and diffuse reflectance spectra were evaluated separately and combined. Results: The classification of cancer versus healthy mucosa gave excellent results for diffuse reflectance as well as corrected autofluorescence (Receiver Operator Characteristic (ROC) areas up to 0.98). For both autofluorescence and diffuse reflectance spectra, the classification of lesions versus healthy mucosa was successful (ROC areas up to 0.90). However, the classification of benign and (pre-)malignant lesions was not successful for raw or corrected autofluorescence spectra (ROC areas <0.70). For diffuse reflectance spectra, the results were slightly better (ROC areas up to 0.77). Conclusions: The results for plain and corrected autofluorescence as well as diffuse reflectance spectra were similar. The relevant information for distinguishing lesions from healthy oral mucosa is probably sufficiently contained in blood absorption and scattering information, as well as in corrected autofluorescence. 
However, neither type of information is capable of distinguishing benign from dysplastic and malignant lesions. Combining autofluorescence and reflectance only slightly improved the results. Lasers Surg. Med. 36:356–364, 2005. © 2005 Wiley-Liss, Inc.

141 citations

Journal ArticleDOI
TL;DR: Bagging (bootstrapping and aggregating) is studied for linear classifiers and it is shown experimentally that bagging might improve the performance of the classifier only for very unstable situations.
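The bootstrap-then-aggregate idea named in this summary can be sketched as follows. This is a minimal illustration only, not the procedure used in the paper: the nearest mean base classifier, the majority-vote aggregation, the toy data, and all function names are assumptions chosen for brevity.

```python
import random


def nearest_mean_fit(samples):
    """Fit a two-class nearest mean classifier (a simple linear classifier).

    samples: list of (features, label) pairs with label in {0, 1}.
    Returns a dict mapping each label to its class mean vector.
    """
    means = {}
    for label in (0, 1):
        rows = [x for x, y in samples if y == label]
        dim = len(rows[0])
        means[label] = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    return means


def nearest_mean_predict(means, x):
    """Return the label whose class mean is closest to x."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda label: sq_dist(means[label]))


def bagging_fit(samples, n_models=25, seed=0):
    """Train one base classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        # Bootstrap: draw len(samples) items with replacement.
        boot = [rng.choice(samples) for _ in samples]
        # Assumption: replicates that miss a class are simply redrawn.
        if {y for _, y in boot} == {0, 1}:
            models.append(nearest_mean_fit(boot))
    return models


def bagging_predict(models, x):
    """Aggregate by majority vote over the bootstrap classifiers."""
    votes = sum(nearest_mean_predict(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0


# Hypothetical toy data: two well-separated classes in 2D.
train = [((0.0, 0.0), 0), ((1.0, 0.5), 0), ((0.5, 1.0), 0),
         ((4.0, 4.0), 1), ((5.0, 4.5), 1), ((4.5, 5.0), 1)]
models = bagging_fit(train)
```

With so small a training set the bootstrap replicates differ noticeably from one another, which is the kind of unstable situation the summary refers to; on large, well-sampled data the replicates (and hence the base classifiers) would be nearly identical and voting would change little.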

126 citations

Journal ArticleDOI
TL;DR: The influence of anatomical location on healthy mucosa autofluorescence is investigated; the reliability of autofluorescence spectroscopy for oral cancer detection might be improved by using a reference database of spectra from healthy mucosa.
Abstract: Background and Objectives: Autofluorescence spectroscopy is a promising tool for oral cancer detection. Its reliability might be improved by using a reference database of spectra from healthy mucosa. We investigated the influence of anatomical location on healthy mucosa autofluorescence. Study Design/Materials and Methods: Spectra were recorded from 97 volunteers using seven excitation wavelengths (350-450 nm), 455-867 nm emission. We studied intensity and applied principal component analysis (PCA) with classification algorithms. Class overlap estimates were calculated. Results: We observed differences in fluorescence intensity between locations. These were significant but small compared to standard deviations (SD). Normalized spectra looked similar for locations, except for the dorsal side of the tongue (DST) and the vermilion border (VB). Porphyrin-like fluorescence was observed frequently, especially at DST. PCA and classification confirmed VB and DST to be spectrally distinct. The remaining locations showed large class overlaps. Conclusions: No relevant systematic spectral differences have been observed between most locations, allowing the use of one large reference database. For DST and VB separate databases are required.

91 citations


Cited by
Journal ArticleDOI
TL;DR: A common theoretical framework for combining classifiers which use distinct pattern representations is developed and it is shown that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision.
Abstract: We develop a common theoretical framework for combining classifiers which use distinct pattern representations and show that many existing schemes can be considered as special cases of compound classification where all the pattern representations are used jointly to make a decision. An experimental comparison of various classifier combination schemes demonstrates that the combination rule developed under the most restrictive assumptions, the sum rule, outperforms other classifier combination schemes. A sensitivity analysis of the various schemes to estimation errors is carried out to show that this finding can be justified theoretically.
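To make the sum rule concrete, the sketch below compares it with the product rule on posterior estimates from several classifiers. It is an illustration under stated assumptions, not the paper's formulation: equal class priors are assumed (so the sum rule reduces to picking the class with the largest summed posterior), and the function names and numbers are hypothetical.

```python
from math import prod


def sum_rule(posteriors):
    """Pick the class with the largest sum of posterior estimates.

    posteriors: one list per classifier, each holding that classifier's
    estimated posterior probability for every class.
    Assumes equal class priors.
    """
    n_classes = len(posteriors[0])
    scores = [sum(p[j] for p in posteriors) for j in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


def product_rule(posteriors):
    """Pick the class with the largest product of posterior estimates."""
    n_classes = len(posteriors[0])
    scores = [prod(p[j] for p in posteriors) for j in range(n_classes)]
    return max(range(n_classes), key=scores.__getitem__)


# Hypothetical estimates: two classifiers favour class 0, while a third
# gives class 0 a near-zero posterior (for example, an estimation error).
estimates = [[0.9, 0.1], [0.9, 0.1], [0.001, 0.999]]
sum_choice = sum_rule(estimates)
product_choice = product_rule(estimates)
```

In this toy case the sum rule still selects class 0, whereas the single near-zero estimate drives the product for class 0 below that of class 1. This is one intuition for why a rule based on sums can be less sensitive to estimation errors than one based on products.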

5,670 citations

MonographDOI
02 Jul 2004

2,667 citations

Journal ArticleDOI
TL;DR: This paper reviews existing ensemble techniques and can serve as a tutorial for practitioners who are interested in building ensemble-based systems.
Abstract: The idea of ensemble methodology is to build a predictive model by integrating multiple models. It is well known that ensemble methods can be used for improving prediction performance. Researchers from various disciplines such as statistics and AI have considered the use of ensemble methodology. This paper reviews existing ensemble techniques and can serve as a tutorial for practitioners who are interested in building ensemble-based systems.

2,273 citations

Journal ArticleDOI
TL;DR: In this article, the authors evaluated four statistical models, Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS), for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
Abstract: The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models—Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS)—for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. To test, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service’s Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate. 
We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.

1,879 citations

Journal ArticleDOI
TL;DR: This work examined the rotation forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with bagging, AdaBoost, and random forest; the favorable results prompted an investigation into the diversity-accuracy landscape of the ensemble models.
Abstract: We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and principal component analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest". Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the rotation forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with bagging, AdaBoost, and random forest. The results were favorable to rotation forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that rotation forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and random forest, and more diverse than those in bagging, sometimes more accurate as well.

1,708 citations