Author

Kiran Bhowmick

Bio: Kiran Bhowmick is an academic researcher from Dwarkadas J. Sanghvi College of Engineering. The author has contributed to research in topics: Statistical classification & Support vector machine. The author has an h-index of 6 and has co-authored 18 publications receiving 236 citations.

Papers
Journal ArticleDOI
TL;DR: A survey is presented that covers the problem of sentiment analysis and the techniques and methods used to address it; the major challenge lies in analyzing sentiments and identifying emotions expressed in texts.
Abstract: A huge amount of online information and rich web resources is highly unstructured, and such natural language cannot be processed directly by machines. The increased demand to capture the opinions of the general public about social events, campaigns and product sales has led to the study of opinion mining and sentiment analysis. Opinion mining refers to the extraction of lines in raw data that express an opinion. Sentiment analysis identifies the polarity of the extracted opinions. The major challenge lies in analyzing the sentiments and identifying emotions expressed in texts. This paper presents a survey that covers the problem of sentiment analysis and the techniques and methods used to address it.

170 citations

Journal ArticleDOI
TL;DR: A new technique of identifying virus infected files by using Fisher Score and applying them as input to the neural network is proposed.
Abstract: A virus is defined as a program that spreads or replicates by copying itself, and generally has malicious effects. The antivirus systems used today mainly detect malware on the basis of known virus patterns, making detection of a new virus very difficult. This deficiency can be overcome by training an artificial neural network with the inputs from Portable Executable (PE) Structure of executable files, as they learn from the training data and will be able to identify unknown virus patterns. PE Structure contains various fields by which one can identify virus infected executable files from the legitimate ones without executing them, and Fisher Score can be used to select the most relevant features (fields) to speed up the analysis. A new technique of identifying virus infected files by using Fisher Score and applying them as input to the neural network is proposed.
General Terms: Virus, Program, Patterns, Executable files

18 citations
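
The Fisher Score feature selection mentioned in the virus-detection abstract above can be illustrated with a minimal sketch. It assumes the common definition of the score (between-class scatter of a feature divided by its within-class scatter); the paper's exact formulation is not given in this listing. The function names `fisher_scores` and `top_k_features` and the toy data are hypothetical and do not come from the paper.

```python
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Return one Fisher score per feature column.

    Assumed definition (not confirmed by the source):
    score_j = sum_k n_k * (mean_kj - mean_j)^2 / sum_k n_k * var_kj
    """
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        overall_mean = fmean(column)
        between = 0.0  # how far the class means sit from the overall mean
        within = 0.0   # how much each class spreads around its own mean
        for c in classes:
            values = [row[j] for row, y in zip(rows, labels) if y == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero spread inside every class: perfectly separating or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(rows, labels, k):
    """Indices of the k highest-scoring features, best first."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: two numeric fields per file, label 1 = infected.
rows = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.8], [3.2, 1.2]]
labels = [0, 0, 1, 1]
print(top_k_features(rows, labels, k=1))
```

In a pipeline like the one the abstract describes, only the selected columns would then be passed to the neural network as inputs.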

Book ChapterDOI
01 Jan 2018
TL;DR: A potential use of outlier detection to identify irregular events that cause traffic congestion is proposed and a future research direction is discussed.
Abstract: With the advent of the Global Positioning System (GPS) and the extensive use of smartphones, trajectory data for moving objects is easily available at lower cost. Moreover, GPS devices in vehicles now make it possible to keep track of moving vehicles on the road. It is also possible to identify anomalous behavior of a vehicle with this trajectory data. In the field of trajectory mining, outlier detection has become one of the important topics and can be used to detect anomalies in trajectories. In this paper, certain existing issues and challenges of trajectory data are identified and a future research direction is discussed. This paper proposes a potential use of outlier detection to identify irregular events that cause traffic congestion.

13 citations

Proceedings ArticleDOI
17 Mar 2017
TL;DR: Results show that K-means helps in balancing the data and hence the accuracy and time taken to classify balanced dataset is much better than simply classifying the imbalanced dataset.
Abstract: The task of accurately predicting the target class for each case in the data is called classification of data in data mining. Classification of balanced data set is fairly simple and easy to perform but it becomes difficult when the data is not balanced. Class Imbalance problem is the problem in machine learning where the total number of a class of data (positive) is far less than the total number of another class of data (negative). In this paper, we have used K-Means algorithm to balance the imbalanced dataset and then use SVM to classify the balanced dataset. We have compared the accuracy, precision, recall and time taken in classifying balanced as well as imbalanced datasets and results show that K-means helps in balancing the data and hence the accuracy and time taken to classify balanced dataset is much better than simply classifying the imbalanced dataset.

10 citations
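
The abstract above does not say how K-Means is used to balance the data. One plausible reading, sketched here purely as an assumption, is cluster-centroid undersampling: the majority class is clustered into as many groups as there are minority samples, and the centroids stand in for the majority class. The names `kmeans` and `balance_by_centroids` are hypothetical, the initialization is a simple deterministic one chosen for readability, and the SVM step is omitted.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic init (an assumption): k points spread evenly through the list.
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def balance_by_centroids(majority, minority):
    """Shrink the majority class to len(minority) centroids."""
    reduced_majority = kmeans(majority, k=len(minority))
    return reduced_majority, minority


# Hypothetical toy data: six majority samples, two minority samples.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
minority = [(5.0, 5.0), (6.0, 6.0)]
reduced, kept = balance_by_centroids(majority, minority)
print(len(reduced), len(kept))
```

The balanced pair of lists would then be labeled and handed to a classifier such as an SVM, which is the second stage the abstract describes.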

Journal ArticleDOI
TL;DR: This paper tries to parallelize the FP-Growth algorithm on multicore machines by partitioning the large database across the available cores and using their combined strength to achieve maximum performance.

9 citations


Cited by
Journal ArticleDOI
TL;DR: An overview of text classification algorithms is presented, covering different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods.
Abstract: In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is presented. This overview covers different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application to real-world problems are discussed.

612 citations

Posted Content
TL;DR: In this article, the authors investigated the research development, current trends and intellectual structure of topic modeling based on Latent Dirichlet Allocation (LDA), summarized challenges, and introduced well-known tools and datasets for topic modeling based on LDA.
Abstract: Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents. Researchers have published many articles in the field of topic modeling and applied it in various fields such as software engineering, political science, medical and linguistic science, etc. There are various methods for topic modeling, of which Latent Dirichlet Allocation (LDA) is one of the most popular. Researchers have proposed various models based on LDA in topic modeling. Building on this previous work, this paper can be useful for introducing LDA approaches in topic modeling. In this paper, we investigated scholarly articles (from 2003 to 2016) highly related to topic modeling based on LDA to discover the research development, current trends and intellectual structure of topic modeling. We also summarize challenges and introduce well-known tools and datasets in topic modeling based on LDA.

546 citations

Journal ArticleDOI
TL;DR: The results of this study show that worldwide energy crises can be managed by integrating renewable energy sources in the power generation and the lack of public awareness is a major barrier to the acceptance of renewable energy technologies.
Abstract: The use of renewable energy resources, such as solar, wind, and biomass will not diminish their availability. Sunlight being a constant source of energy is used to meet the ever-increasing energy need. This review discusses the world's energy needs, renewable energy technologies for domestic use, and highlights public opinions on renewable energy. A systematic review of the literature was conducted from 2009 to 2018. During this process, more than 300 articles were classified and 42 papers were filtered for critical review. The literature analysis showed that despite serious efforts at all levels to reduce reliance on fossil fuels by promoting renewable energy as its alternative, fossil fuels continue to contribute 73.5% to the worldwide electricity production in 2017. Conversely, renewable sources contributed only 26.5%. Furthermore, this study highlights that the lack of public awareness is a major barrier to the acceptance of renewable energy technologies. The results of this study show that worldwide energy crises can be managed by integrating renewable energy sources in the power generation. Moreover, in order to facilitate the development of renewable energy technologies, this systematic review has highlighted the importance of public opinion and performed a real-time analysis of public tweets. This example of tweet analysis is a relatively novel initiative in a review study that will seek to direct the attention of future researchers and policymakers toward public opinion and recommend the implications to both academia and industries.

426 citations

Proceedings ArticleDOI
13 May 2013
TL;DR: This work investigates whether the signals in social media can potentially help sentiment analysis by providing a unified way to model two main categories of emotional signals, i.e., emotion indication and emotion correlation and incorporates the signals into an unsupervised learning framework for sentiment analysis.
Abstract: The explosion of social media services presents a great opportunity to understand the sentiment of the public via analyzing its large-scale and opinion-rich data. In social media, it is easy to amass vast quantities of unlabeled data, but very costly to obtain sentiment labels, which makes unsupervised sentiment analysis essential for various applications. It is challenging for traditional lexicon-based unsupervised methods due to the fact that expressions in social media are unstructured, informal, and fast-evolving. Emoticons and product ratings are examples of emotional signals that are associated with sentiments expressed in posts or words. Inspired by the wide availability of emotional signals in social media, we propose to study the problem of unsupervised sentiment analysis with emotional signals. In particular, we investigate whether the signals can potentially help sentiment analysis by providing a unified way to model two main categories of emotional signals, i.e., emotion indication and emotion correlation. We further incorporate the signals into an unsupervised learning framework for sentiment analysis. In the experiment, we compare the proposed framework with the state-of-the-art methods on two Twitter datasets and empirically evaluate our proposed framework to gain a deep understanding of the effects of emotional signals.

374 citations

Journal ArticleDOI
TL;DR: The thesis is that multimodal sentiment analysis holds a significant untapped potential with the arrival of complementary data streams for improving and going beyond text-based sentiment analysis.

357 citations