Author

B. S. Harish

Bio: B. S. Harish is an academic researcher from Sri Jayachamarajendra College of Engineering. The author has contributed to research in topics: Cluster analysis & Feature selection. The author has an h-index of 13 and has co-authored 66 publications receiving 605 citations. Previous affiliations of B. S. Harish include MVJ College of Engineering & University of Mysore.


Papers
Journal Article
TL;DR: This paper presents various text representation schemes, compares different classifiers used to assign text documents to predefined classes, and contrasts the existing methods based on qualitative parameters.
Abstract: Text classification is one of the important research issues in the field of text mining, where the documents are classified with supervised knowledge. In the literature we can find many text representation schemes and classifiers/learning algorithms used to classify text documents to the predefined categories. In this paper, we present various text representation schemes and compare different classifiers used to classify text documents to the predefined classes. The existing methods are compared and contrasted based on qualitative parameters viz., criteria used for classification, algorithms adopted and classification time complexities. General Terms: Pattern Recognition, Text Mining, Algorithms.

103 citations

Journal ArticleDOI
TL;DR: This work shows that the use of Hybrid features obtained by concatenating Machine Learning features (TF, TF-IDF) with Lexicon features (Positive-Negative word count, Connotation) gives better results when tested against classifiers like SVM, Naive Bayes, KNN and Maximum Entropy.
Abstract: Social networking sites have become popular and common places for sharing a wide range of emotions through short texts. These emotions include happiness, sadness, anxiety, fear, etc. Analyzing short texts helps in identifying the sentiment expressed by the crowd. Sentiment Analysis on IMDb movie reviews identifies the overall sentiment or opinion expressed by a reviewer towards a movie. Many researchers are working on pruning the sentiment analysis model that clearly identifies and distinguishes between a positive review and a negative review. In the proposed work, we show that the use of Hybrid features obtained by concatenating Machine Learning features (TF, TF-IDF) with Lexicon features (Positive-Negative word count, Connotation) gives better results both in terms of accuracy and complexity when tested against classifiers like SVM, Naive Bayes, KNN and Maximum Entropy. The proposed model clearly differentiates between a positive review and a negative review. Since understanding the context of the reviews plays an important role in classification, using hybrid features helps in capturing the context of the movie reviews and hence increases the accuracy of classification.
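The idea of concatenating TF-IDF features with lexicon counts can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the tiny `POSITIVE` and `NEGATIVE` word sets, the helper names, and the particular TF-IDF weighting are not taken from the paper, and the connotation feature is omitted.

```python
import math
from collections import Counter

# Hypothetical toy lexicons; a real system would load a published sentiment lexicon.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "boring", "awful"}


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # One common IDF variant (log ratio plus one); other weightings exist.
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # avoid division by zero on empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def lexicon_features(doc):
    """Count positive and negative lexicon words in one tokenized document."""
    return [sum(tok in POSITIVE for tok in doc), sum(tok in NEGATIVE for tok in doc)]


def hybrid_features(docs):
    """Concatenate each TF-IDF vector with the two lexicon counts."""
    vocab, tfidf = tfidf_vectors(docs)
    return vocab, [vec + lexicon_features(doc) for vec, doc in zip(tfidf, docs)]


reviews = [text.lower().split() for text in
           ["A great and excellent film", "A boring and awful film"]]
vocab, features = hybrid_features(reviews)
```

Each row of `features` could then be passed to any of the classifiers mentioned in the abstract; the last two columns carry the lexicon signal.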

58 citations

Journal Article
TL;DR: The main objective of this paper is to review the various machine learning approaches for diagnosing Myocardial Infarction and differentiating Arrhythmias (heart beat variation), Hypertrophy (increased thickness of the heart muscle) and Enlargement of Heart.
Abstract: Electrocardiogram (ECG) is a P, QRS and T wave demonstrating the electrical activity of the heart. Feature extraction and segmentation in ECG play a significant role in diagnosing most cardiac diseases. The main objective of this paper is to review the various machine learning approaches for diagnosing Myocardial Infarction (heart attack) and differentiating Arrhythmias (heart beat variation), Hypertrophy (increased thickness of the heart muscle) and Enlargement of Heart. Further, we also present various machine learning approaches and compare different methods and results used to analyze the ECG. The existing methods are compared and contrasted based on qualitative and quantitative parameters viz., purpose of the work, algorithms adopted and results obtained.

50 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper compares various classification algorithms such as Random Forest, Gradient Boosting, Decision Tree, Adaptive Boost, Logistic Regression and Gaussian Naïve Bayes to detect sarcasm in tweets from the Twitter Streaming API and chooses the best classifier to provide the best possible accuracy.
Abstract: The growth of social media has been exponential in recent years. An immense amount of data is being put out onto the public domain through social media. This huge publicly available data can be used for research and a variety of applications. The objective of this paper is to counter problems with the social media dataset, namely: short text nature (the limited quantity of text data, 140 to 160 characters), continuous streaming nature, usage of short forms and modern slang, and increasing use of sarcasm in messages and posts. Sarcastic tweets can mislead data mining activities and result in wrong classification. This paper compares various classification algorithms such as Random Forest, Gradient Boosting, Decision Tree, Adaptive Boost, Logistic Regression and Gaussian Naive Bayes to detect sarcasm in tweets from the Twitter Streaming API. The best classifier is chosen and paired with various pre-processing and filtering techniques using emoji and slang dictionary mapping to provide the best possible accuracy. The emoji and slang dictionary is the novel idea introduced in this paper. The obtained results can be used as input to other research and applications.
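Dictionary-based pre-processing of this kind can be sketched as follows. The `SLANG` and `EMOJI` mappings and the `normalize_tweet` helper are hypothetical; the abstract does not list the dictionaries or the exact filtering steps the authors used.

```python
# Hypothetical mappings for illustration only; the paper's dictionaries are not given.
SLANG = {"gr8": "great", "u": "you", "idk": "i do not know"}
EMOJI = {"🙄": "eye_roll", "😂": "laughing", ":)": "smile"}


def normalize_tweet(text):
    """Replace emoji with word tokens, lowercase, and expand known slang."""
    # Swap each emoji or emoticon for a named token, padded so it splits cleanly.
    for symbol, name in EMOJI.items():
        text = text.replace(symbol, f" {name} ")
    normalized = []
    for token in text.lower().split():
        core = token.strip(".,!?")  # drop surrounding punctuation
        if core:
            normalized.append(SLANG.get(core, core))
    return " ".join(normalized)


normalized = normalize_tweet("Oh gr8, another Monday 🙄")
```

Mapping emoji to word tokens keeps a cue such as an eye roll available to a bag-of-words classifier instead of discarding it as noise.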

47 citations

Proceedings ArticleDOI
TL;DR: The aim of this competition was to record the recent developments in sclera segmentation and eye recognition in the visible spectrum (using iris, sclera and peri-ocular, and their fusion), and also to gain the attention of researchers on this subject.
Abstract: This paper summarises the results of the Sclera Segmentation and Eye Recognition Benchmarking Competition (SSERBC 2017). It was organised in the context of the International Joint Conference on Biometrics (IJCB 2017). The aim of this competition was to record the recent developments in sclera segmentation and eye recognition in the visible spectrum (using iris, sclera and peri-ocular, and their fusion), and also to gain the attention of researchers on this subject. In this regard, we have used the Multi-Angle Sclera Dataset (MASD version 1). It is comprised of 2624 images taken from both the eyes of 82 identities. Therefore, it consists of images of 164 (82×2) eyes. A manual segmentation mask of these images was created to baseline both tasks. Precision and recall based statistical measures were employed to evaluate the effectiveness of the segmentation and the ranks of the segmentation task. Recognition accuracy measure has been employed to measure the recognition task. Manually segmented sclera, iris and peri-ocular regions were used in the recognition task. Sixteen teams registered for the competition, and among them, six teams submitted their algorithms or systems for the segmentation task and two of them submitted their recognition algorithm or systems. The results produced by these algorithms or systems reflect current developments in the literature of sclera segmentation and eye recognition, employing cutting edge techniques. The MASD version 1 dataset with some of the ground truth will be freely available for research purposes. The success of the competition also demonstrates the recent interests of researchers from academia as well as industry on this subject.
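Pixel-wise precision and recall against a manual mask can be computed as in the sketch below. This is a generic illustration assuming binary masks stored as nested lists; the competition's exact evaluation protocol is not described in the abstract, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, truth):
    """Pixel-wise precision and recall for two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # pixel correctly labelled as sclera
            elif p and not t:
                fp += 1  # predicted sclera, but not in the manual mask
            elif t and not p:
                fn += 1  # sclera pixel the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy 2x3 masks: 1 marks a sclera pixel, 0 marks background.
truth_mask = [[0, 1, 1],
              [0, 1, 1]]
predicted_mask = [[0, 1, 0],
                  [1, 1, 1]]
precision, recall = precision_recall(predicted_mask, truth_mask)
```

Here three predicted pixels are correct, one is spurious, and one true pixel is missed, so both measures come out to 0.75.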

40 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI
TL;DR: An extensive analysis and comparison between different ML techniques using a case study from Algeria is undertaken, noting that tree-based ensemble algorithms achieve excellent results compared to other machine learning algorithms and that the Random Forest algorithm offers robust performance for accurate landslide susceptibility mapping with only a small number of adjustments required before training the model.

362 citations

Journal ArticleDOI
TL;DR: This paper transforms a document using three document representation methods: term frequency–inverse document frequency (TF–IDF) based on the bag-of-words scheme, topic distribution based on latent Dirichlet allocation (LDA), and neural-network-based document embedding known as document to vector (Doc2Vec).

270 citations

Journal ArticleDOI
TL;DR: This paper introduces text classification and its process, gives an overview of the classifiers, and compares some existing classifiers on the basis of a few criteria such as time complexity, principle and performance.
Abstract: As most information (over 80%) is stored as text, text mining is believed to have a high commercial potential value. Knowledge may be discovered from many sources of information; yet, unstructured texts remain the largest readily available source of knowledge. Text classification classifies documents according to predefined categories. In this paper we try to give an introduction to text classification and its process, as well as an overview of the classifiers, and we try to compare some existing classifiers on the basis of a few criteria such as time complexity, principle and performance.

238 citations

Journal Article
TL;DR: In English, I'm going to argue that evaluation is about assessing different possibilities for meaning and drawing conclusions.
Abstract: Evaluation is hard. It's one of those command words that comes up in a range of contexts. In Math it is to do with substituting variables in order to solve an equation. In many subjects it's about looking at the pros and cons of a situation in order to make a decision. In English, I'm going to argue that it is about assessing different possibilities for meaning and drawing conclusions. In all cases, it's a process.

224 citations