Showing papers by "Saptarsi Goswami" published in 2019

Open Access

Proceedings Article•DOI•

Utilization of Oversampling for multiclass sentiment analysis on Amazon Review Dataset

Anirban Mukherjee¹, Sabyasachi Mukhopadhyay, Prasanta K. Panigrahi², Saptarsi Goswami³•Institutions (3)

University of Engineering & Management¹, Indian Institute of Science Education and Research, Kolkata², University of Calcutta³

01 Oct 2019

TL;DR: This research work uses variations of recurrent neural networks, such as simple RNN, GRU, LSTM and Bidirectional LSTM, to find out which model performs the best in multi-class classification of sentiment.

Abstract: Sentiment Analysis is a major element in Artificial Intelligence. Its applications include machine translation, text analysis, computational linguistics, etc. In most cases, classification of sentiment is done into two or three classes. But in some situations, for example rating a product from Amazon, there are multiple classes. One major challenge in such tasks is the class imbalance which reduces the accuracy by making the model biased. To deal with this problem, we use oversampling to reduce the class imbalance of the dataset before training the model. In this research work, first we use variations of recurrent neural networks, such as simple RNN, GRU, LSTM and Bidirectional LSTM, to find out which model performs the best in multi-class classification of sentiment. Then, we use that model to understand the effect of oversampling a dataset before using it to train a model.
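The oversampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which oversampling technique was used, so this example assumes plain random oversampling (duplicating minority-class samples with replacement). The function name `random_oversample` and the toy reviews are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one.

    Assumption: simple random oversampling; the paper may use another variant.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)

    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Hypothetical toy reviews with star ratings as class labels.
reviews = ["great", "love it", "works well", "fine", "okay", "broke fast"]
stars = [5, 5, 5, 3, 3, 1]
balanced_reviews, balanced_stars = random_oversample(reviews, stars)
print(Counter(balanced_stars))
```

In practice, oversampling is normally applied to the training split only, so that duplicated samples do not leak into validation or test data.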

15 citations

Journal Article•DOI•

An approach of feature selection using graph-theoretic heuristic and hill climbing

Saptarsi Goswami¹, Amit Kumar Das¹, Priyanka Guha, Arunabha Tarafdar, Sanjay Chakraborty, Amlan Chakrabarti¹, Basabi Chakraborty²•Institutions (2)

Information Technology University¹, Iwate Prefectural University²

01 May 2019-Pattern Analysis and Applications

TL;DR: Proposes a feature selection algorithm that represents the dataset as a graph and uses maximal independent sets and minimal vertex covers to improve traditional hill climbing search, producing a statistically significant improvement over standard search-based methods.

Abstract: Search-based methods that use matrix- or vector-based representations of the dataset are commonly employed to solve the problem of feature selection. These methods are more generalized and easy to apply. Recently, a set of algorithms have started using graph-based representation of the dataset instead of the traditional representations. These methods require additional modelling as the dataset needs to be represented as a graph. However, graph-based methods help in visualizing inter-feature relationship based on which graph-theoretic principles can be applied to identify good-quality feature subsets. A combination of the graph-based representation with traditional search techniques has the potential to increase model performance as well as interpretability. As per literature study, there is hardly any method which combines these approaches. In this paper, we have proposed a feature selection algorithm, which represents the dataset as a graph and then uses maximal independent sets and minimal vertex covers to improve traditional hill climbing search. The proposed method produces statistically significant improvement over (i) hill climbing, (ii) standard search-based methods and (iii) pure graph-based methods.
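To make the graph-theoretic terms in the abstract concrete, here is a minimal sketch of a greedy maximal independent set and its complementary vertex cover. It assumes, hypothetically, that vertices are features and an edge links two strongly related features. The abstract does not describe how the authors build the graph or combine these sets with hill climbing, so this is not their algorithm.

```python
def greedy_maximal_independent_set(adjacency):
    """Greedily pick low-degree vertices, skipping neighbours of picked ones."""
    chosen, blocked = set(), set()
    for vertex in sorted(adjacency, key=lambda v: (len(adjacency[v]), v)):
        if vertex not in blocked:
            chosen.add(vertex)
            blocked |= adjacency[vertex]
    return chosen


# Hypothetical feature graph: an edge means two features are strongly related.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
independent = greedy_maximal_independent_set(adjacency)
# The complement of a maximal independent set is a minimal vertex cover.
cover = set(adjacency) - independent
print(sorted(independent), sorted(cover))
```

Under this assumed graph, the independent set is a group of mutually unrelated features, which is one plausible way such a set could seed a search for a low-redundancy subset.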

13 citations

Book Chapter•DOI•

Filter-Based Feature Selection Methods Using Hill Climbing Approach

Saptarsi Goswami¹, Sanjay Chakraborty², Priyanka Guha, Arunabha Tarafdar, Aman Kedia•Institutions (2)

University of Calcutta¹, Techno India²

01 Jan 2019

TL;DR: A comparison has been done with genetic algorithm, which shows the effectiveness of hill climbing methods in the context of feature selection.

Abstract: Feature selection remains one of the most important steps for usability of a model for both supervised and unsupervised classification. For a dataset with n features, the number of possible feature subsets is 2^n. Even for a moderate size of n, there is a combinatorial explosion in the search space. Feature selection is an NP-hard problem; hence finding the optimal solution is not feasible. Typically various kinds of intelligent and metaheuristic search techniques can be employed for this purpose. Hill climbing is arguably the simplest of such techniques. It has many variants based on (a) trade-off between greediness and randomness, (b) direction of the search, and (c) size of the neighborhood. Consequently it might not be trivial for the practitioner to choose a suitable method for the task at hand. In this paper, we have attempted to address this issue in the context of feature selection. The descriptions of the methods are followed by an extensive empirical study over 20 publicly available datasets. Finally a comparison has been done with genetic algorithm, which shows the effectiveness of hill climbing methods in the context of feature selection.
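As a rough illustration of the search described in the abstract, the sketch below shows one hill climbing variant: steepest ascent over a one-feature bit-flip neighbourhood, starting from the empty subset. The scoring function, relevance values, and redundancy penalty are hypothetical stand-ins for a filter criterion; the chapter's own variants and criteria are not given in the abstract.

```python
def hill_climb(n_features, score, start=None):
    """Steepest-ascent hill climbing over feature subsets."""
    current = frozenset(start or ())
    current_score = score(current)
    while True:
        # Each neighbour adds or removes exactly one feature.
        neighbours = [current ^ {f} for f in range(n_features)]
        best = max(neighbours, key=score)
        best_score = score(best)
        if best_score <= current_score:
            return current, current_score  # local optimum reached
        current, current_score = best, best_score


# Hypothetical filter criterion: reward relevance, penalise redundancy and size.
RELEVANCE = [0.9, 0.8, 0.1, 0.5]
REDUNDANT_PAIRS = {frozenset({0, 1}): 0.7}


def toy_score(subset):
    gain = sum(RELEVANCE[f] for f in subset)
    penalty = sum(c for pair, c in REDUNDANT_PAIRS.items() if pair <= subset)
    return gain - penalty - 0.2 * len(subset)


selected, value = hill_climb(len(RELEVANCE), toy_score)
print(sorted(selected), round(value, 3))
```

Each iteration evaluates only n neighbours rather than all 2^n subsets, which is why the search is cheap but can stop at a local optimum.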

12 citations

Proceedings Article•DOI•

Jeffries-Matusita distance as a tool for feature selection

Rikta Sen¹, Saptarsi Goswami², Basabi Chakraborty¹•Institutions (2)

Iwate Prefectural University¹, University of Calcutta²

01 Sep 2019

TL;DR: Jeffries-Matusita (JM) distance improves Bhattacharyya distance by normalizing it between 0 and 2, which can provide a good intuition on how good a dataset is for classification and indicate whether further feature collection is needed.

Abstract: Feature selection is one of the most important preprocessing steps in Machine Learning. This can be broadly divided into search based methods and ranking based methods. The ranking based methods are very popular because they need much less computational power. There are many different ways to rank the features. One way to measure the effectiveness of a feature is by evaluating its ability to separate the classes involved. These interclass separability based measures can be directly used as a feature ranking tool for binary classification problems. Bhattacharyya distance, which is the most popular among them, has been used mainly in a recursive setup to select good quality feature subsets. Jeffries-Matusita (JM) distance improves Bhattacharyya distance by normalizing it between 0 and 2. In this paper, we have ranked the features based on JM distance. The results are comparable with mutual information, Relief and Chi Squared based measures as per experiments conducted over 24 public datasets but in much less time. JM distance also provides some intuition about the dataset prior to any feature selection or machine learning algorithm. A comparison has been done on classification accuracy and JM scores of these datasets, which can provide a good intuition on how good a dataset is for classification and indicate whether further feature collection is needed.
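The ranking idea can be sketched as follows. With Bhattacharyya distance B, a JM distance bounded between 0 and 2 is commonly written as JM = 2(1 - e^(-B)). This example assumes each feature is roughly Gaussian within each class (an assumption of this sketch, not stated in the abstract) and uses the closed-form B for two univariate Gaussians. The feature names and values are hypothetical, and both classes must have non-zero variance.

```python
import math
from statistics import fmean, pvariance


def jm_distance(class_a, class_b):
    """JM distance between two classes for one feature (Gaussian assumption)."""
    mean_a, mean_b = fmean(class_a), fmean(class_b)
    var_a, var_b = pvariance(class_a), pvariance(class_b)
    pooled = (var_a + var_b) / 2
    # Bhattacharyya distance between two univariate Gaussians.
    bhattacharyya = (
        (mean_a - mean_b) ** 2 / (8 * pooled)
        + 0.5 * math.log(pooled / math.sqrt(var_a * var_b))
    )
    return 2 * (1 - math.exp(-bhattacharyya))


# Hypothetical binary dataset: feature name -> (class A values, class B values).
features = {
    "f_overlap": ([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]),
    "f_sep": ([1.0, 1.2, 0.8, 1.1], [5.0, 5.3, 4.9, 5.1]),
}
scores = {name: jm_distance(a, b) for name, (a, b) in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A score near 2 indicates well-separated classes for that feature, while a score near 0 indicates heavy overlap.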

9 citations

Proceedings Article•DOI•

Human Activity Recognition using Deep Neural Network

Piyush Mishra¹, Sourankana Dey², Suvro Shankar Ghosh, Dibyendu Bikash Seal³, Saptarsi Goswami³•Institutions (3)

International Institute of Information Technology¹, Institute of Technology and Marine Engineering², University of Calcutta³

01 Sep 2019

TL;DR: This paper has attempted to apply machine learning and deep learning techniques on a publicly available dataset and an accuracy of 97.32% is achieved, which indicates suitability of deep learning techniques over traditional machine learning techniques for the task of human activity recognition using mobile sensor data.

Abstract: The smartphone has become quite ubiquitous and an indispensable part of our lives in the modern day. It has many sensors which capture several minute details pertaining to our activities. So, it is but inevitable that human desire creeps in to augment and improve one's own actions by studying such behaviour captured through the instrumentalities of the smartphone. In this context, the study of data on human activities captured through accelerometer and gyroscope gains primary significance. In this paper, we have attempted to apply machine learning and deep learning techniques on a publicly available dataset. Initially, classification algorithms like K-Nearest Neighbours and Random Forest are applied. The classification accuracies observed are 90.46% and 92.97% respectively. Using benchmark feature selection and dimensionality reduction techniques does not improve the model accuracies to a large extent, with reported accuracies of 91.48% and 92.56% respectively. However, on employing deep neural network techniques, an accuracy of 97.32% is achieved, which indicates suitability of deep learning techniques over traditional machine learning techniques for the task of human activity recognition using mobile sensor data.

8 citations

Proceedings Article•DOI•

A Comparative Study of the Stability of Filter based Feature Selection Algorithms

Rikta Sen¹, Ashis Kumar Mandal¹, Saptarsi Goswami², Basabi Chakraborty¹•Institutions (2)

Iwate Prefectural University¹, University of Calcutta²

01 Oct 2019

TL;DR: A comparative study of the stability of several well-known filter based feature selection algorithms that produce ranked feature subsets has been done and shows that the JMD-based feature selection algorithm exhibits more stability irrespective of the stability measure used.

Abstract: Feature selection is an important step prior to the classification stage of machine learning, pattern recognition and data mining problems for addressing the high dimensionality of the data. It removes irrelevant and redundant features, which simplifies the classification process and improves accuracy. Several feature selection algorithms have been proposed so far and the quality of the selected feature subset varies from algorithm to algorithm. One of the measures for assessing the quality of a feature selection algorithm is its stability. Stability refers to the robustness of the selected feature set to small changes in the training set or in the parameters of the algorithm. In this work, a comparative study of the stability of several well-known filter based feature selection algorithms that produce ranked feature subsets has been done. Fifteen benchmark datasets from the UCI repository have been used for simulation experiments. Three types of stability measures, index-based, rank-based and weight-based, are used to evaluate the stability of feature selection algorithms. Simulation results demonstrate that for most of the datasets, the JMD-based feature selection algorithm exhibits more stability irrespective of the stability measure used. It is also observed that Relief shows the least stability.
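One way to picture an index-based stability measure is the average pairwise Jaccard similarity between the feature subsets selected on perturbed versions of the training set. The abstract does not name the specific measures used, so the choice of Jaccard here is an assumption, and the function names and feature sets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def index_stability(subsets):
    """Mean pairwise Jaccard similarity over all selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 features chosen on three resampled training sets.
runs = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
stability = index_stability(runs)
print(round(stability, 3))
```

A value of 1.0 means the same subset was selected in every run; values near 0 mean the selection changes almost completely under small perturbations of the data.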

2 citations

Book Chapter•DOI•

Study of Social Media Activity of Local Traffic Police Department: Their Posting Nature, Interaction, and Reviews of the Public

Mohammed Allama Hossain, Rashika Daga, Saptarsi Goswami¹, S. Chakrabarti•Institutions (1)

University of Calcutta¹

01 Jan 2019-Advances in intelligent systems and computing

TL;DR: The user engagement statistics, popular post types, trends in traffic updates of Kolkata, and sentiments of the public are analyzed to provide practically useful insights for users as well as for the community working on policing frameworks.

Abstract: With the increasing social media buzz, connectivity, interaction and power, it is highly important to utilize the insights generated from these sites for better performance, expansion, and opportunities. Police departments are leveraging these online platforms to spread awareness, connect with the public, and learn their concerns and opinions so as to direct their efforts accordingly for better satisfaction. We have considered the official Facebook page maintained by Kolkata Traffic Police Department for analysis. The empirical study consists of 773 page owner posts and 595 visitors’ posts, 301 reviews, and comments on page owner posts. We analyze the user engagement statistics, popular post types, and trends in traffic updates of Kolkata, and extract the sentiments of the public. Our results provide practically useful insights for users as well as for the community working on policing frameworks.

1 citation