Journal Article

Investigations on Impact of Feature Normalization Techniques on Classifier's Performance in Breast Tumor Classification

22 Apr 2015 - International Journal of Computer Applications (Foundation of Computer Science (FCS)) - Vol. 116, Iss. 19, pp. 11-15
TL;DR: This paper investigates and evaluates some popular feature normalization techniques and studies their impact on classifier performance, with application to breast tumor classification using ultrasound images, and shows that normalization of features has a significant effect on classification accuracy.
Abstract: Feature extraction and feature normalization are important preprocessing steps, usually employed before classification. Feature normalization restricts the values of all features to predetermined ranges. However, the choice of normalization technique and normalization range is an important issue, since normalizing the input can change the structure of the data and thereby affect the outcome of multivariate analysis and calibration used in data mining and pattern recognition problems. This paper investigates and evaluates some popular feature normalization techniques and studies their impact on classifier performance, with application to breast tumor classification using ultrasound images. For evaluating the feature normalization techniques, back-propagation artificial neural network (BPANN) and support vector machine (SVM) classifier models are used. Results show that normalization of features has a significant effect on classification accuracy.
General Terms: Pattern Recognition, Medical Image Processing.
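To make "restricting feature values within predetermined ranges" concrete, here is a minimal sketch of two widely used normalizations, min-max scaling and z-score standardization. The abstract does not name the techniques the paper evaluates, so both are assumptions chosen for illustration; the function names and the sample feature column are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(column, lo=0.0, hi=1.0):
    """Linearly rescale one feature column into the range [lo, hi]."""
    c_min, c_max = min(column), max(column)
    span = c_max - c_min
    if span == 0:
        # A constant feature carries no information; map it to the lower bound.
        return [lo for _ in column]
    return [lo + (x - c_min) * (hi - lo) / span for x in column]


def z_score(column):
    """Shift one feature column to zero mean and scale it to unit variance."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical feature column (made-up values, not data from the paper).
areas = [120.0, 300.0, 210.0, 480.0]
scaled = min_max_scale(areas)    # values now lie in [0, 1]
standardized = z_score(areas)    # values now have mean 0
```

In a classification experiment the minimum, maximum, mean and standard deviation would normally be estimated on the training split only and then reused unchanged for the test split, so that no information leaks from test data into preprocessing.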


Citations
Journal Article
TL;DR: The results reveal the potential in using Random Forest (RF) or Support Vector Machine (SVM) techniques for estimating the presence of PD with a very high accuracy.

93 citations

Journal Article
TL;DR: This work uses machine learning to create models for the early prediction of students’ performance in solving LMS assignments, by just analyzing the LMS log files generated up to the moment of prediction, and detects at-risk, fail and excellent students in the early stages of the course.
Abstract: The early prediction of students' performance is a valuable resource to improve their learning. If we are able to detect at-risk students in the initial stages of the course, we will have more time to improve their performance. Likewise, excellent students could be motivated with customized additional activities. This is why there are research works aimed at the early detection of students' performance. Some of them try to achieve it by analyzing LMS log files, which store information about student interaction with the LMS. Many works create predictive models with the log files generated for the whole course, but those models are not useful for early prediction because the log information used for predicting differs from the information used to train the models. Other works do create predictive models with the log information retrieved at the early stages of courses, but they focus on a particular type of course. In this work, we use machine learning to create models for the early prediction of students' performance in solving LMS assignments, by analyzing only the LMS log files generated up to the moment of prediction. Moreover, our models are course agnostic, because the datasets are created with all the University of Oviedo courses for one academic year. We predict students' performance at 10%, 25%, 33% and 50% of the course length. Our objective is not to predict the exact student's mark in LMS assignments, but to detect at-risk, fail and excellent students in the early stages of the course. That is why we create different classification models for each of those three student groups. Decision tree, naïve Bayes, logistic regression, multilayer perceptron (MLP) neural network, and support vector machine models are created and evaluated. Accuracies of all the models grow as the moment of prediction increases. Although all the algorithms but naïve Bayes show accuracy differences lower than 5%, MLP obtains the best performance: from 80.1% accuracy when 10% of the course has been delivered to 90.1% when half of it has taken place. We also discuss the LMS log entries that most influence the students' performance. By using a clustering algorithm, we detect six different clusters of students regarding their interaction with the LMS. Analyzing the interaction patterns of each cluster, we find that those patterns are repeated in all the early stages of the course. Finally, we show how four out of those six student-LMS interaction patterns have a strong correlation with students' performance.

55 citations


Cites methods from "Investigations on Impact of Feature..."

  • ...We normalize all the feature values between 0 and 1, since some classifiers such as MLP and SVN show better performance with normalized features [38]....


Journal Article
TL;DR: Results indicate that the DWTLSTM outperforms conventional methods such as k-nearest neighbor (k-NN), linear discriminant analysis (LDA), support vector machine/support vector regression (SVM/SVR), multilayer perceptron (MLP), and even standard long-short term memory (LSTM).
Abstract: A smart packaging system is needed to continuously monitor beef quality and microbial population, for both the meat industry and end consumers. Several feasibility studies of the electronic nose (e-nose) for rapid beef quality assessment have also been conducted in recent years. The e-nose is fast, cheap, and easy to use, which makes it suitable and scalable for beef quality monitoring applications. It also has the potential to be integrated with consumer electronics such as refrigerators and meat chillers. However, the inevitable challenge is how to handle time-series data that is contaminated with noise. In this paper, a combination of discrete wavelet transform and long short-term memory (DWTLSTM) is proposed to handle noise-contaminated e-nose signals in beef quality monitoring. In the beef quality classification task, the proposed method performs favorably, with 94.83% average accuracy and 85.05% average F-measure. Moreover, it performs satisfactorily in the prediction of microbial population (RMSE = 0.0515 and R² = 0.9712). These results indicate that the DWTLSTM outperforms conventional methods such as k-nearest neighbor (k-NN), linear discriminant analysis (LDA), support vector machine/support vector regression (SVM/SVR), multilayer perceptron (MLP), and even standard long short-term memory (LSTM).

54 citations

Journal Article
TL;DR: The empirical results suggest that eliminating doubtful training examples can improve the decision-making performance of expert systems; the proposed approach shows promising results and needs further evaluation in other applications of expert and intelligent systems.
Abstract: A new classification approach is developed. The proposed approach eliminates ambiguous and doubtful samples from the training dataset. The proposed approach is used to classify breast tumors in ultrasound images. The proposed classifier outperforms conventional ones. The proposed method achieves high classification accuracy. The performance of supervised classification algorithms is highly dependent on the quality of training data. Ambiguous training patterns may misguide the classifier, leading to poor classification performance. Further, the manual exploration of class labels is an expensive and time-consuming process. An automatic method is needed to identify noisy samples in the training data to improve the decision-making process. This article presents a new classification technique that combines an unsupervised learning technique (fuzzy c-means clustering, FCM) and a supervised learning technique (back-propagation artificial neural network, BPANN) to categorize benign and malignant tumors in breast ultrasound images. Unsupervised learning is employed to identify ambiguous examples in the training data. Experiments were conducted on 178 B-mode breast ultrasound images containing 88 benign and 90 malignant cases on the MATLAB software platform. A total of 457 features were extracted from the ultrasound images, followed by feature selection to determine the most significant features. Accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC) and Matthews correlation coefficient (MCC) were used to assess the performance of different classifiers. The results show that the proposed approach achieves a classification accuracy of 95.862% when all 457 features are used for classification. However, the accuracy drops to 94.138% when only the 19 most relevant features, selected by a multi-criterion feature selection approach, are used. The results are discussed in light of some recently reported studies. The empirical results suggest that eliminating doubtful training examples can improve the decision-making performance of expert systems. The proposed approach shows promising results and needs further evaluation in other applications of expert and intelligent systems.

49 citations

Journal Article
R. Krithiga, P. Geetha
TL;DR: This study starts with an overview of tissue preparation, analysis of stained images, and a prognosis for cancer patients, and the performance of the machine learning and deep learning techniques applied to predict breast cancer recurrence rates is evaluated.
Abstract: Digital pathology represents a major evolution in modern medicine. Pathological examinations constitute the standard in medical protocols and the law, and call for specific action in the diagnostic process. Advances in digital pathology have made it possible for image analysis to take advantage of the information analysis from hematoxylin and eosin stained images. In spite of concern, it is recorded in the majority of breast cancer datasets, which makes research more difficult in prediction. The objective of our work is to evaluate the performance of the machine learning and deep learning techniques applied to predict breast cancer recurrence rates. This study starts with an overview of tissue preparation, analysis of stained images, and a prognosis for cancer patients. The high accuracy results recorded are compromised in terms of sensitivity and specificity. The missing loss function and class imbalance problems are rarely addressed, and most often the chosen performance measures are context-inappropriate. The challenge that presents itself is to analyse whole slide images for the content imaging required with diagnostic biomarkers, and prognosis support backed by digital pathology.

47 citations

References
Journal Article
01 Nov 1973
TL;DR: These results indicate that the easily computable textural features based on gray-tone spatial dependencies probably have a general applicability for a wide variety of image-classification applications.
Abstract: Texture is one of the important characteristics used in identifying objects or regions of interest in an image, whether the image be a photomicrograph, an aerial photograph, or a satellite image. This paper describes some easily computable textural features based on gray-tone spatial dependencies, and illustrates their application in category-identification tasks of three different kinds of image data: photomicrographs of five kinds of sandstones, 1:20 000 panchromatic aerial photographs of eight land-use categories, and Earth Resources Technology Satellite (ERTS) multispectral imagery containing seven land-use categories. We use two kinds of decision rules: one for which the decision regions are convex polyhedra (a piecewise linear decision rule), and one for which the decision regions are rectangular parallelepipeds (a min-max decision rule). In each experiment the data set was divided into two parts, a training set and a test set. Test set identification accuracy is 89 percent for the photomicrographs, 82 percent for the aerial photographic imagery, and 83 percent for the satellite imagery. These results indicate that the easily computable textural features probably have a general applicability for a wide variety of image-classification applications.
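Features based on gray-tone spatial dependencies are typically computed from a gray-level co-occurrence matrix, which counts how often two gray levels occur at a fixed pixel offset. The following is a minimal sketch under stated assumptions: the 4x4 toy image, the horizontal offset, the symmetric counting and the two features shown (contrast and angular second moment) are illustrative choices not taken from the abstract, and the function names are hypothetical.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Normalized, symmetric co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weights each co-occurrence by the squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def angular_second_moment(p):
    """Sum of squared matrix entries; larger for more homogeneous textures."""
    return sum(v * v for row in p for v in row)


# Toy image with four gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image)
```

In practice the matrix is usually computed for several offsets and directions, and the resulting features are averaged or concatenated before being passed to a classifier.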

20,442 citations


"Investigations on Impact of Feature..." refers background in this paper

  • ...Summary of texture and shape features used in classification of breast tumor [10-23]...


Book
01 Dec 2003
TL;DR: 1. Fundamentals of Image Processing, 2. Intensity Transformations and Spatial Filtering, and 3. Frequency Domain Processing.
Abstract: 1. Introduction. 2. Fundamentals. 3. Intensity Transformations and Spatial Filtering. 4. Frequency Domain Processing. 5. Image Restoration. 6. Color Image Processing. 7. Wavelets. 8. Image Compression. 9. Morphological Image Processing. 10. Image Segmentation. 11. Representation and Description. 12. Object Recognition.

6,306 citations

Journal Article
Robert M. Haralick
01 Jan 1979
TL;DR: This survey reviews the image processing literature on the various approaches and models investigators have used for texture, including statistical approaches of autocorrelation function, optical transforms, digital transforms, textural edgeness, structural element, gray tone cooccurrence, run lengths, and autoregressive models.
Abstract: In this survey we review the image processing literature on the various approaches and models investigators have used for texture. These include statistical approaches of autocorrelation function, optical transforms, digital transforms, textural edgeness, structural element, gray tone cooccurrence, run lengths, and autoregressive models. We discuss and generalize some structural approaches to texture based on more complex primitives than gray tone. We conclude with some structural-statistical generalizations which apply the statistical techniques to the structural primitives.

5,112 citations

01 Mar 1975
TL;DR: Three standard approaches to automatic texture classification make use of features based on the Fourier power spectrum, on second-order gray level statistics, and on first-order statistics of gray level differences, respectively; it was found that the Fourier features generally performed more poorly, while the other feature sets all performed comparably.
Abstract: Three standard approaches to automatic texture classification make use of features based on the Fourier power spectrum, on second-order gray level statistics, and on first-order statistics of gray level differences, respectively. Feature sets of these types, all designed analogously, were used to classify two sets of terrain samples. It was found that the Fourier features generally performed more poorly, while the other feature sets all performed comparably.
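As an illustration of "first-order statistics of gray level differences", the sketch below builds the histogram of absolute gray-level differences at a fixed pixel offset and summarizes it with three statistics. The particular statistics (mean, contrast, angular second moment), the offset, the toy image and the function name are assumptions made for this example and are not stated in the abstract.

```python
from collections import Counter


def gray_level_difference_stats(image, dx=1, dy=0):
    """Statistics of |I(r, c) - I(r + dy, c + dx)| over all valid pixel pairs."""
    rows, cols = len(image), len(image[0])
    diffs = []
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                diffs.append(abs(image[r][c] - image[r2][c2]))
    n = len(diffs)
    # Normalized histogram: probability of each absolute difference value.
    hist = {d: k / n for d, k in Counter(diffs).items()}
    return {
        "mean": sum(d * prob for d, prob in hist.items()),
        "contrast": sum(d * d * prob for d, prob in hist.items()),
        "asm": sum(prob * prob for prob in hist.values()),
    }


# Toy image with four gray levels (made-up data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
stats = gray_level_difference_stats(image)
```

Because only a one-dimensional histogram is needed, these statistics are cheaper to compute than features derived from a full two-dimensional co-occurrence matrix.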

1,526 citations


"Investigations on Impact of Feature..." refers background in this paper

  • ...Summary of texture and shape features used in classification of breast tumor [10-23]...


Journal Article
01 Apr 1976
TL;DR: In this paper, three standard approaches to automatic texture classification make use of features based on the Fourier power spectrum, on second-order gray level statistics, and on first-order statistics of gray level differences, respectively.
Abstract: Three standard approaches to automatic texture classification make use of features based on the Fourier power spectrum, on second-order gray level statistics, and on first-order statistics of gray level differences, respectively. Feature sets of these types, all designed analogously, were used to classify two sets of terrain samples. It was found that the Fourier features generally performed more poorly, while the other feature sets all performed comparably.

1,379 citations