Showing papers by "Saptarsi Goswami" published in 2019


Proceedings Article
01 Oct 2019
TL;DR: This research work uses variations of recurrent neural networks, such as simple RNN, GRU, LSTM and Bidirectional LSTM, to find out which model performs the best in multi-class classification of sentiment.
Abstract: Sentiment Analysis is a major element in Artificial Intelligence. Its applications include machine translation, text analysis, computational linguistics, etc. In most cases, classification of sentiment is done into two or three classes. But in some situations, for example rating a product from Amazon, there are multiple classes. One major challenge in such tasks is the class imbalance which reduces the accuracy by making the model biased. To deal with this problem, we use oversampling to reduce the class imbalance of the dataset before training the model. In this research work, first we use variations of recurrent neural networks, such as simple RNN, GRU, LSTM and Bidirectional LSTM, to find out which model performs the best in multi-class classification of sentiment. Then, we use that model to understand the effect of oversampling a dataset before using it to train a model.

15 citations
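
The oversampling step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which oversampling technique the authors used, so plain random oversampling with replacement is an assumption here, and the function name `random_oversample` and the toy reviews are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy, made-up product reviews with star ratings as class labels.
reviews = ["great", "good", "fine", "ok", "bad"]
ratings = [5, 5, 5, 3, 1]
bal_x, bal_y = random_oversample(reviews, ratings)
print(Counter(bal_y))
```

In practice this kind of resampling is usually applied to the training split only, so that duplicated samples do not leak into the evaluation data.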


Journal Article
TL;DR: A feature selection algorithm, which represents the dataset as a graph and then uses maximal independent sets and minimal vertex covers to improve traditional hill climbing search and produces statistically significant improvement over standard search-based methods.
Abstract: Search-based methods that use matrix- or vector-based representations of the dataset are commonly employed to solve the problem of feature selection. These methods are more generalized and easy to apply. Recently, a set of algorithms have started using graph-based representation of the dataset instead of the traditional representations. These methods require additional modelling as the dataset needs to be represented as a graph. However, graph-based methods help in visualizing inter-feature relationship based on which graph-theoretic principles can be applied to identify good-quality feature subsets. A combination of the graph-based representation with traditional search techniques has the potential to increase model performance as well as interpretability. As per literature study, there is hardly any method which combines these approaches. In this paper, we have proposed a feature selection algorithm, which represents the dataset as a graph and then uses maximal independent sets and minimal vertex covers to improve traditional hill climbing search. The proposed method produces statistically significant improvement over (i) hill climbing, (ii) standard search-based methods and (iii) pure graph-based methods.

13 citations
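
The two graph-theoretic objects named in the abstract above can be sketched as follows. The abstract does not state how the feature graph is built or how the sets are fed into hill climbing, so the edge list (read here as "these two features are strongly related") and the greedy lowest-degree-first construction are assumptions made for illustration only.

```python
def maximal_independent_set(n_nodes, edges):
    """Greedily build a maximal independent set: a set of nodes with no edge
    between any two of them, to which no further node can be added."""
    adj = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    chosen, blocked = set(), set()
    # Visit low-degree nodes first (a common heuristic, not a requirement).
    for v in sorted(adj, key=lambda node: len(adj[node])):
        if v not in blocked:
            chosen.add(v)
            blocked |= adj[v] | {v}
    return chosen


# Hypothetical feature graph: nodes are features, edges join related features.
n_features = 5
edges = [(0, 1), (1, 2), (3, 4)]

mis = maximal_independent_set(n_features, edges)
# The complement of a maximal independent set is a minimal vertex cover.
cover = set(range(n_features)) - mis
print(sorted(mis), sorted(cover))
```

Under this reading, the independent set is a group of mutually unrelated features, which makes it a natural candidate starting point for a subset search.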


Book Chapter
01 Jan 2019
TL;DR: A comparison has been done with genetic algorithm, which shows the effectiveness of hill climbing methods in the context of feature selection.
Abstract: Feature selection remains one of the most important steps for usability of a model for both supervised and unsupervised classification. For a dataset with n features, the number of possible feature subsets is 2^n. Even for a moderate size of n, there is a combinatorial explosion in the search space. Feature selection is an NP-hard problem; hence finding the optimal solution is not feasible. Typically various kinds of intelligent and metaheuristic search techniques can be employed for this purpose. Hill climbing is arguably the simplest of such techniques. It has many variants based on (a) trade-off between greediness and randomness, (b) direction of the search, and (c) size of the neighborhood. Consequently it might not be trivial for the practitioner to choose a suitable method for the task at hand. In this paper, we have attempted to address this issue in the context of feature selection. The descriptions of the methods are followed by an extensive empirical study over 20 publicly available datasets. Finally a comparison has been done with genetic algorithm, which shows the effectiveness of hill climbing methods in the context of feature selection.

12 citations
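
One of the simplest variants covered by the abstract above, steepest-ascent hill climbing with a neighborhood of single-feature flips, can be sketched as below. This is an illustrative sketch, not the chapter's implementation: the scoring function `toy_score` and the set `RELEVANT` are hypothetical stand-ins for a real subset-quality measure such as cross-validated accuracy.

```python
def hill_climb(n_features, score, start=()):
    """Steepest-ascent hill climbing over feature subsets.

    The neighborhood of a subset is every subset that differs from it in
    exactly one feature (one feature added or one removed)."""
    current = frozenset(start)
    best = score(current)
    while True:
        neighbours = [current ^ {f} for f in range(n_features)]
        candidate = max(neighbours, key=score)
        candidate_score = score(candidate)
        if candidate_score <= best:
            # No neighbour improves the score: a local optimum is reached.
            return current, best
        current, best = candidate, candidate_score


# Hypothetical score: reward relevant features, penalise irrelevant ones.
RELEVANT = {0, 2, 5}


def toy_score(subset):
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


subset, value = hill_climb(6, toy_score)
print(sorted(subset), value)
```

With 6 features the full search space has 2^6 = 64 subsets; the climb above evaluates only a handful of them, which is the point of using a local search, at the price of possibly stopping at a local optimum.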


Proceedings Article
01 Sep 2019
TL;DR: Jeffries-Matusita (JM) distance improves Bhattacharyya distance by normalizing it between 0 and 2, which can provide a good intuition on how good a dataset is for classification and point out the need of or lack of further feature collection.
Abstract: Feature selection is one of the most important preprocessing steps in Machine Learning. This can be broadly divided into search-based methods and ranking-based methods. The ranking-based methods are very popular because they need much less computational power. There can be many different ways to rank the features. One of the ways to measure effectiveness of a feature is by evaluating its ability to separate the classes involved. These interclass separability-based measures can be directly used as a feature ranking tool for binary classification problems. Bhattacharyya distance, which is the most popular among them, has been used mainly in a recursive setup to select good-quality feature subsets. Jeffries-Matusita (JM) distance improves Bhattacharyya distance by normalizing it between 0 and 2. In this paper, we have ranked the features based on JM distance. The results are comparable with mutual information, Relief and Chi Squared based measures as per experiments conducted over 24 public datasets but in much less time. JM distance also provides some intuition about the dataset prior to any feature selection or machine learning algorithm. A comparison has been done on classification accuracy and JM scores of these datasets, which can provide a good intuition on how good a dataset is for classification and point out the need of or lack of further feature collection.

9 citations
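
The normalization described in the abstract above is commonly written as JM = 2(1 - e^(-B)), where B is the Bhattacharyya distance between the two class distributions of a feature. The sketch below ranks features for a binary problem on that basis. It assumes each class is univariate Gaussian per feature, which the abstract does not state, and the data values and function names are made up for illustration.

```python
import math
from statistics import mean, pvariance


def jm_distance(class_a, class_b):
    """JM distance between two classes for one feature, in [0, 2].

    Assumes both classes are univariate Gaussian with non-zero variance."""
    m1, m2 = mean(class_a), mean(class_b)
    v1, v2 = pvariance(class_a), pvariance(class_b)
    # Bhattacharyya distance between two univariate Gaussians.
    b = (0.25 * (m1 - m2) ** 2 / (v1 + v2)
         + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))
    return 2 * (1 - math.exp(-b))


def rank_features(columns_a, columns_b):
    """Return feature indices sorted from most to least separable."""
    scores = [jm_distance(a, b) for a, b in zip(columns_a, columns_b)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Made-up data: one list of values per feature, split by class.
class_a = [[1.0, 1.2, 0.9, 1.1], [1.0, 1.2, 0.9, 1.1]]
class_b = [[1.1, 1.0, 1.2, 0.8], [5.0, 5.2, 4.9, 5.1]]

ranking = rank_features(class_a, class_b)
print(ranking)
```

A score near 2 indicates that the two classes are almost perfectly separable on that feature, while a score near 0 indicates heavy overlap.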


Proceedings Article
01 Sep 2019
TL;DR: This paper has attempted to apply machine learning and deep learning techniques on a publicly available dataset and an accuracy of 97.32% is achieved, which indicates suitability of deep learning techniques over traditional machine learning techniques for the task of human activity recognition using mobile sensor data.
Abstract: The smartphone has become quite ubiquitous and an indispensable part of our lives in the modern day. It has many sensors which capture several minute details pertaining to our activities. So, it is but inevitable that human desire creeps in to augment and improve one's own actions by studying such behaviour captured through the instrumentalities of the smartphone. In this context, the study of data on human activities captured through accelerometer and gyroscope gains prime significance. In this paper, we have attempted to apply machine learning and deep learning techniques on a publicly available dataset. Initially, classification algorithms like K-Nearest Neighbours and Random Forest are applied. The classification accuracies observed are 90.46% and 92.97% respectively. Using benchmark feature selection and dimensionality reduction techniques does not improve the model accuracies to a large extent - with reported accuracies of 91.48% and 92.56% respectively. However, on employing deep neural network techniques, an accuracy of 97.32% is achieved, which indicates suitability of deep learning techniques over traditional machine learning techniques for the task of human activity recognition using mobile sensor data.

8 citations


Proceedings Article
01 Oct 2019
TL;DR: A comparative study of the stability of several well-known filter-based feature selection algorithms, producing ranked feature subsets, has been done and shows that the JMD-based feature selection algorithm exhibits more stability irrespective of the type of stability measure.
Abstract: Feature selection is an important step prior to the classification stage of machine learning, pattern recognition and data mining problems for addressing the high dimensionality of the data. It removes irrelevant and redundant features, which simplifies the classification process and improves accuracy. Several feature selection algorithms have been proposed so far and the quality of the selected feature subset varies from algorithm to algorithm. One of the measures for assessing the quality of a feature selection algorithm is its stability. Stability refers to the robustness of the selected feature set to small changes in the training set or in the parameters of the algorithm. In this work, a comparative study of the stability of several well-known filter-based feature selection algorithms, producing ranked feature subsets, has been done. Fifteen benchmark datasets from the UCI repository have been used for simulation experiments. Three types of stability measures, index-based, rank-based and weight-based, are used to evaluate the stability of feature selection algorithms. Simulation results demonstrate that for most of the datasets, the JMD-based feature selection algorithm exhibits more stability irrespective of the type of stability measure. It is also observed that Relief shows the least stability.

2 citations
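
An index-based stability measure of the kind mentioned in the abstract above can be sketched as the average pairwise Jaccard similarity between the feature subsets selected on perturbed versions of the training set. The abstract does not name the specific measures used, so the choice of the Jaccard index is an assumption, and the selected subsets below are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature-index collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all pairs of selected subsets.

    1.0 means every run selected the same features; values near 0 mean the
    selection changes almost completely from run to run."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up top-3 selections from three resampled training sets.
runs = [[0, 1, 2], [0, 1, 3], [0, 1, 2]]
stability = average_pairwise_stability(runs)
print(round(stability, 3))
```

Rank-based and weight-based measures follow the same pattern but compare the full rankings or the feature weights across runs instead of the selected index sets.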


Book Chapter
TL;DR: The user engagement statistics, popular post types, trends in traffic updates of Kolkata, and sentiments of the public are analyzed to provide practical, useful insights for the users, as well as for the community working on policing frameworks.
Abstract: With the increasing social media buzz, connectivity, interaction and power, it is highly important to utilize the insights generated from these sites for better performance, expansion, and opportunities. Police departments are leveraging these online platforms to spread awareness, connect with the public, and learn their concerns and opinions, so that they can use the platforms' potential for better public satisfaction. We have considered the official Facebook page maintained by Kolkata Traffic Police Department for analysis. The empirical study consists of 773 page owner posts and 595 visitors’ posts, 301 reviews, and comments on page owner posts. We analyze the user engagement statistics, popular post types, and trends in traffic updates of Kolkata, and extract the sentiments of the public. Our results provide practical, useful insights for the users, as well as for the community working on policing frameworks.

1 citation