SciSpace (formerly Typeset)
Author

Saptarsi Goswami

Bio: Saptarsi Goswami is an academic researcher from Bangabasi College. The author has contributed to research on feature selection and cluster analysis. The author has an h-index of 11 and has co-authored 62 publications receiving 475 citations. Previous affiliations of Saptarsi Goswami include Information Technology University and University of Calcutta.


Papers
Journal Article
TL;DR: An extensive, in-depth literature study of current techniques for disaster prediction, detection and management has been carried out, and the results are summarized by type of disaster.

120 citations

Journal Article
TL;DR: A two-step feature selection method is proposed, based first on univariate feature selection and then on feature clustering; the algorithm is shown to outperform other traditional methods such as greedy-search-based wrappers or CFS.
Abstract: With the proliferation of unstructured data, text classification (or text categorization) has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. Many classification algorithms are available, and naive Bayes remains one of the oldest and most popular. Naive Bayes is simple to implement and also requires relatively little training data. However, the literature shows that naive Bayes performs poorly compared with other classifiers in text classification, which makes it unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: a univariate feature selection method first reduces the search space, and feature clustering is then applied to select relatively independent feature sets. We demonstrate the effectiveness of our method through a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naive Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods such as greedy-search-based wrappers or CFS.

57 citations
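The two-step scheme described above (a univariate filter to shrink the search space, then feature clustering to keep relatively independent features) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the univariate score nor the clustering method, so absolute Pearson correlation with the label and a simple leader-style grouping are hypothetical stand-ins, and `two_step_select`, `top_k` and `threshold` are invented names.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def two_step_select(X: List[List[float]], y: List[float],
                    top_k: int, threshold: float) -> List[int]:
    """Return the indices of the selected feature columns of X (rows = samples)."""
    cols = list(zip(*X))  # column-major view: one tuple per feature
    # Step 1 (univariate filter): score each feature on its own against the label.
    # |Pearson correlation| is a hypothetical stand-in for chi-square, information gain, etc.
    scores = [abs(pearson(c, y)) for c in cols]
    ranked = sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)[:top_k]
    # Step 2 (feature clustering): leader-style grouping in descending score order.
    # A feature whose correlation with an existing representative reaches `threshold`
    # falls into that representative's cluster and is dropped; otherwise it founds
    # a new cluster and becomes its representative.
    reps: List[int] = []
    for j in ranked:
        if all(abs(pearson(cols[j], cols[r])) < threshold for r in reps):
            reps.append(j)
    return reps


# Toy data: feature 1 is a rescaled copy of feature 0, feature 3 is constant.
X = [[0, 0, 0, 1], [1, 2, 1, 1], [0, 0, 1, 1],
     [1, 2, 1, 1], [0, 0, 0, 1], [1, 2, 0, 1]]
y = [0, 1, 0, 1, 0, 1]
print(two_step_select(X, y, top_k=3, threshold=0.9))  # [0, 2]
```

On the toy data, the redundant copy (feature 1) is absorbed into feature 0's cluster, and the uninformative constant feature never survives the univariate step.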

Journal Article
TL;DR: A hybrid feature selection algorithm using a graph-based technique has been proposed; it uses the concept of a Feature Association Map as its underlying foundation and applies the graph-theoretic principles of minimal vertex cover and maximal independent set to derive a feature subset.
Abstract: Feature selection, for supervised as well as unsupervised classification, is a relevant problem that researchers have pursued for decades. There are multiple benchmark algorithms based on filter, wrapper and hybrid methods. These algorithms adopt different techniques, ranging from traditional search-based techniques to more advanced techniques based on nature-inspired algorithms. In this paper, a hybrid feature selection algorithm using a graph-based technique is proposed. The proposed algorithm uses the concept of a Feature Association Map (FAM) as its underlying foundation and applies the graph-theoretic principles of minimal vertex cover and maximal independent set to derive a feature subset. The algorithm applies to both supervised and unsupervised classification. Its performance has been compared with several benchmark supervised and unsupervised feature selection algorithms and found to be better than theirs. The proposed algorithm is also less computationally expensive and hence took less execution time on the publicly available datasets used in the experiments, which include high-dimensional datasets.

51 citations
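The graph-theoretic ingredients named above can be illustrated with a small sketch. It assumes, hypothetically, that the association map is a graph with an edge between two features whose pairwise association reaches a threshold; the abstract does not say how the FAM is built or how the two principles are combined, so `association_graph`, `greedy_maximal_independent_set` and `complement_cover` are invented helpers showing only the standard constructions.

```python
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def association_graph(assoc: List[List[float]], threshold: float) -> Graph:
    """Hypothetical stand-in for a Feature Association Map: one vertex per feature,
    with an edge i--j whenever |assoc[i][j]| >= threshold."""
    graph: Graph = {i: set() for i in range(len(assoc))}
    for i, j in combinations(range(len(assoc)), 2):
        if abs(assoc[i][j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_maximal_independent_set(graph: Graph) -> List[int]:
    """Visit vertices by ascending degree (ties by index) and keep each vertex with
    no neighbour already kept. The result is maximal (no vertex can be added)
    but not necessarily maximum."""
    kept: Set[int] = set()
    for v in sorted(graph, key=lambda u: (len(graph[u]), u)):
        if not graph[v] & kept:
            kept.add(v)
    return sorted(kept)


def complement_cover(graph: Graph, independent: List[int]) -> List[int]:
    """The complement of any independent set is a vertex cover; because every vertex
    outside a maximal independent set has a neighbour inside it, that cover is
    also minimal (no vertex can be removed)."""
    inside = set(independent)
    return sorted(v for v in graph if v not in inside)


# Toy association matrix: features 0-1 and 1-2 are strongly associated.
assoc = [[1.0, 0.95, 0.1, 0.0],
         [0.95, 1.0, 0.9, 0.1],
         [0.1, 0.9, 1.0, 0.2],
         [0.0, 0.1, 0.2, 1.0]]
g = association_graph(assoc, threshold=0.8)
mis = greedy_maximal_independent_set(g)
print(mis, complement_cover(g, mis))  # [0, 2, 3] [1]
```

The independent set gives mutually non-associated features, while its complement is the minimal vertex cover touching every strong association; how the published algorithm chooses between or merges these sets is not stated in the abstract.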

Book Chapter
01 Jan 2020
TL;DR: A concise description of the existing types of clustering approaches is given, followed by a survey of the fields where clustering analytics has been effectively employed in pattern recognition and knowledge discovery.
Abstract: In the modern world, we have to deal with huge volumes of data, including images, video, text and web documents, DNA, microarray gene data, etc. Organizing such data into rational groups is a critical first step to draw inferences. Data clustering analysis has emerged as an effective technique for accurately categorizing data into sensible groups. Clustering is closely associated with research in various scientific domains. One of the most popular clustering algorithms, the k-means algorithm, was proposed as early as 1957. Since then, many clustering algorithms have been developed and used to group data in commercial and non-commercial sectors alike. In this paper, we give a concise description of the existing types of clustering approaches, followed by a survey of the fields where clustering analytics has been effectively employed in pattern recognition and knowledge discovery.

46 citations
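Since the chapter singles out k-means, a minimal sketch of the classic Lloyd iteration may help: assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when assignments no longer change. Seeding the centroids with the first k points is a simplifying assumption made here for determinism; practical implementations typically use random or k-means++ initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm for k-means. Centroids start at the first k points
    (a simplifying assumption; practical code uses random or k-means++ seeding)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
centroids, labels = kmeans(pts, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Each iteration can only lower (or keep) the within-cluster sum of squared distances, which is why the loop converges, though possibly to a local optimum that depends on the initial centroids.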

Journal Article
TL;DR: A nearly comprehensive list of problems that have been solved using feature selection across technical and commercial domains is produced and can serve as a valuable tool for practitioners across industry and academia.
Abstract: Feature selection is one of the most important preprocessing steps in data mining and knowledge engineering. In this short review paper, apart from giving a brief taxonomy of current feature selection methods, we review the feature selection methods being used in practice. Subsequently, we produce a nearly comprehensive list of problems that have been solved using feature selection across technical and commercial domains. This can serve as a valuable tool for practitioners across industry and academia. We also present empirical results of filter-based methods on various datasets. The empirical study covers the tasks of classification, regression, text classification and clustering. We also compare filter-based ranking methods using rank correlation.

44 citations
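Comparing filter-based rankings "using rank correlation" can be illustrated with Spearman's rho, one common rank correlation (the abstract does not say which coefficient was used, so this choice is an assumption). The sketch below assigns average ranks to tied scores and then takes the Pearson correlation of the rank vectors; the two score lists are invented toy values standing in for the outputs of two hypothetical filters, and the function assumes neither score vector is constant.

```python
import math
from typing import List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with scores[order[i]].
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the two average-rank vectors.
    Assumes neither input is constant (otherwise the denominator is zero)."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


# Invented scores for five features under two hypothetical filter methods.
chi2_scores = [0.9, 0.5, 0.1, 0.7, 0.3]
mi_scores = [0.8, 0.6, 0.2, 0.9, 0.1]
print(spearman(chi2_scores, mi_scores))  # 0.8
```

A value near 1 means the two filters order the features almost identically; values near 0 or below indicate that the choice of filter materially changes which features are ranked highest.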


Cited by
Journal Article
TL;DR: It is found that Turkish tweets carried slightly more positive sentiments towards Syrians and refugees than neutral or negative sentiments; nevertheless, the sentiments of the tweets were almost evenly distributed among the three major categories.

221 citations

Journal Article
TL;DR: In this paper, the authors conceptualized an original theoretical model to show, using the competing value model (CVM), how big data analytics capability (BDAC), under the moderating influence of organizational culture, affects swift trust (ST) and collaborative performance (CP).

214 citations

Journal Article
TL;DR: This study examines big data in DM to present main contributions, gaps, challenges and future research agenda, and shows a classification of publications, an analysis of the trends and the impact of published research in the DM context.
Abstract: The era of big data and analytics is opening up new possibilities for disaster management (DM). Because of its ability to visualize, analyze and predict disasters, big data is dramatically changing humanitarian operations and crisis management. Yet the relevant literature is diverse and fragmented, which calls for a review to ascertain its development. A number of publications have dealt with big data and its applications for minimizing disasters. Based on a systematic literature review, this study examines big data in DM to present the main contributions, gaps, challenges and future research agenda. The study presents the findings in terms of yearly distribution, main journals, and most cited papers. The findings also show a classification of publications and an analysis of the trends and impact of published research in the DM context. Overall, the study contributes to a better understanding of the importance of big data in disaster management.

211 citations

Journal Article
TL;DR: This work considers Bayesian network classifiers to perform sentiment analysis on two datasets in Spanish: the 2010 Chilean earthquake and the 2017 Catalan independence referendum, and adopts a Bayes factor approach, yielding more realistic networks.

178 citations