
Showing papers by "Saptarsi Goswami published in 2021"


Journal ArticleDOI
01 Apr 2021
TL;DR: An empirical investigation based on data from three solar stations from two climatic zones of India over two seasons finds that the number of nodes in an LSTM network, as well as batch size, is influenced by the variability of the input data.
Abstract: Long short-term memory (LSTM) models based on specialized deep neural network-based architecture have emerged as an important model for forecasting time-series. However, the literature does not provide clear guidelines for design choices, which affect forecasting performance. Such choices include the need for pre-processing techniques such as deseasonalization, ordering of the input data, network size, batch size, and forecasting horizon. We detail this in the context of short-term forecasting of global horizontal irradiance, an accepted proxy for solar energy. Short-term forecasting is particularly critical because cloud conditions change at a sub-hourly timescale, with large impacts on incident solar radiation. We conduct an empirical investigation based on data from three solar stations from two climatic zones of India over two seasons. From an application perspective, it may be noted that despite the thrust given to solar energy generation in India, the literature contains few instances of robust studies across climatic zones and seasons. The model thus obtained subsequently outperformed three recent benchmark methods based on random forest, recurrent neural network, and LSTM, respectively, in terms of forecasting accuracy. Our findings underscore the importance of considering the temporal order of the data, the lack of any discernible benefit from data pre-processing, and the effect of making the LSTM model stateful. It is also found that the number of nodes in an LSTM network, as well as the batch size, is influenced by the variability of the input data.
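
A minimal sketch of the kind of stateful, temporally ordered LSTM setup the abstract discusses, written with the tf.keras 2.x API; the window length, layer width, batch size, and file name are illustrative assumptions rather than the paper's configuration:

```python
# Minimal sketch of a stateful LSTM for short-term GHI forecasting (tf.keras 2.x API).
# Window length, layer width, batch size, and data file are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(series, lags=12):
    """Turn a 1-D GHI series into (samples, lags, 1) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - lags):
        X.append(series[i:i + lags])
        y.append(series[i + lags])
    return np.array(X)[..., None], np.array(y)

ghi = np.loadtxt("ghi_station.csv")           # hypothetical 1-D GHI series
X, y = make_windows(ghi, lags=12)

batch_size = 32                               # the abstract notes batch size matters
model = Sequential([
    LSTM(50, stateful=True, batch_input_shape=(batch_size, X.shape[1], 1)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Keep temporal order: shuffle=False, and reset states between epochs.
n = (len(X) // batch_size) * batch_size       # stateful LSTMs need full batches
for epoch in range(10):
    model.fit(X[:n], y[:n], batch_size=batch_size, epochs=1, shuffle=False, verbose=0)
    model.reset_states()
```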

20 citations


Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the authors used Auto-Regressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA) models to predict Global Horizontal Irradiance (GHI), with SARIMA introduced because ARIMA could not capture the seasonality of the data.
Abstract: A precise understanding of solar energy generation is important for many reasons, such as storage, delivery, and integration. Global Horizontal Irradiance (GHI) is the strongest predictor of actual generation. Hence, the solar energy prediction problem can be attempted by predicting GHI. Auto-Regressive Integrated Moving Average (ARIMA) is one of the fundamental models for time series prediction. India is a country with significant solar energy possibilities and with extremely high weather variability across climatic zones. However, rigorous studies across different climatic zones appear to be lacking in the literature. In this paper, 90 solar stations have been considered from the 5 different climatic zones of India, and an ARIMA model has been used for prediction for the month of August, the month with the most variability in GHI. The predictions of the models have also been analyzed in terms of Root Mean Square Error (RMSE). The components of the AR models have also been investigated critically for all climatic zones. In this study, some issues were observed for the ARIMA model, which is not able to capture the seasonality present in the data. Hence, a Seasonal ARIMA (SARIMA) model has also been used, as it is more capable with seasonal data, and the GHI data exhibit a strong seasonal pattern because irradiance is available only during the daytime. Lastly, a comparison has also been done between the two models in terms of RMSE and 7-day-ahead prediction.
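
A minimal sketch of such an ARIMA vs. SARIMA comparison using statsmodels; the model orders, the 24-hour seasonal period, the 7-day hold-out, and the file name are assumptions chosen for illustration, not the chapter's settings:

```python
# Sketch of an ARIMA vs. SARIMA comparison on an hourly GHI series with statsmodels;
# the (p, d, q) and seasonal orders are assumptions reflecting daily (24 h) seasonality.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

ghi = np.loadtxt("ghi_august.csv")            # hypothetical hourly GHI series for August
train, test = ghi[:-168], ghi[-168:]          # hold out the last 7 days

arima = SARIMAX(train, order=(2, 1, 2)).fit(disp=False)
sarima = SARIMAX(train, order=(2, 1, 2),
                 seasonal_order=(1, 1, 1, 24)).fit(disp=False)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

for name, model in [("ARIMA", arima), ("SARIMA", sarima)]:
    forecast = model.forecast(steps=168)      # 7-day-ahead prediction
    print(name, "RMSE:", rmse(test, forecast))
```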

5 citations


Journal ArticleDOI
TL;DR: A unique set of rules has been formulated to extract aspect-opinion phrases and yields enhanced performance and efficiency compared to other state-of-the-art methods.
Abstract: Aspect-Based Sentiment Analysis (ABSA) has become a trending research domain due to its ability to transform lives as well as the technical challenges involved in it. In this paper, a unique set of rules has been formulated to extract aspect-opinion phrases. It helps to reduce the average sentence length by 84% and the complexity of the text by 50%. A modified rank-based version of Term Frequency - Inverse Document Frequency (TF-IDF) has been proposed to identify significant aspects. An innovative word representation technique has been applied for aspect categorization, which identifies both the local and the global context of a word. For sentiment classification, pre-trained Bidirectional Encoder Representations from Transformers (BERT) has been applied, as it helps to capture long-term dependencies and reduces the overhead of training the model from scratch. However, BERT has drawbacks such as a quadratic drop in efficiency with increasing sequence length, which is limited to 512 tokens. The proposed methodology mitigates these drawbacks of a typical BERT classifier, accompanied by a rise in efficiency along with an improvement of 8% in its accuracy. Furthermore, it yields enhanced performance and efficiency compared to other state-of-the-art methods. The assertions have been established through extensive analysis on the movie review and Sentihood datasets.
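
For orientation only, a sketch of the standard TF-IDF ranking that such a pipeline builds on, using scikit-learn; the paper's modified rank-based TF-IDF, its rule-based phrase extraction, and its BERT stage are not reproduced here, and the example phrases are hypothetical:

```python
# Illustrative aspect-term ranking with plain TF-IDF (scikit-learn). The paper proposes
# a modified, rank-based TF-IDF whose exact formulation is not given in the abstract;
# this only shows the standard baseline such a method builds on.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

phrases = [
    "battery life excellent",       # hypothetical extracted aspect-opinion phrases
    "camera quality poor",
    "battery drains fast",
    "screen resolution sharp",
]

vec = TfidfVectorizer()
tfidf = vec.fit_transform(phrases)
scores = np.asarray(tfidf.mean(axis=0)).ravel()   # average TF-IDF weight per term
terms = vec.get_feature_names_out()

for term, score in sorted(zip(terms, scores), key=lambda t: -t[1])[:5]:
    print(f"{term:12s} {score:.3f}")
```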

4 citations


Book ChapterDOI
01 Jan 2021
TL;DR: Understanding the sentiment of a text in this way is a boon for customer management systems and can easily be applied to social media sites such as Twitter or e-commerce websites such as Amazon to collect and analyze customer reviews.
Abstract: Text classification is a very important problem in the artificial intelligence domain and covers a large portion of natural language processing; a prominent instance is sentiment analysis. Sentiment analysis is basically the extraction of the tone or emotion of the writer by understanding the text sequence. Understanding the sentiment of a text in this way is a boon for customer management systems and can easily be applied to social media sites, such as Twitter, or e-commerce websites, such as Amazon, to collect and analyze customer reviews. Sentiment analysis can be binary or multiclass; in our approach, we consider both by carrying out a comparative study of long short-term memory (LSTM), random forest, support vector machine (SVM), and XGBoost, to check whether the classical models can be as good as LSTM in any case. Also, since we observe a class distribution problem in our datasets, we apply oversampling to stabilize the distribution.
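
A minimal sketch of the classical side of such a comparison (random forest, SVM, XGBoost) with random oversampling to balance the classes; the LSTM branch and the actual review datasets are omitted, and the synthetic features below stand in for vectorized text:

```python
# Classical classifiers with random oversampling (scikit-learn, imbalanced-learn, xgboost).
# Synthetic, imbalanced data is a placeholder for vectorized review text.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from imblearn.over_sampling import RandomOverSampler

X, y = make_classification(n_samples=2000, n_features=50, weights=[0.9, 0.1],
                           random_state=0)          # imbalanced, like the review labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_tr, y_tr = RandomOverSampler(random_state=0).fit_resample(X_tr, y_tr)

for name, clf in [("random forest", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC()),
                  ("XGBoost", XGBClassifier(eval_metric="logloss"))]:
    clf.fit(X_tr, y_tr)
    print(name, "F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```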

4 citations


Book ChapterDOI
01 Jan 2021
TL;DR: This paper reduces classical benchmark feature evaluation functions like mRMR, JMI, and FCBF to a QUBO formulation, enabling the use of quantum annealing based optimization for feature selection; empirical results confirm that SQA produces feature subsets that are at most as large as, and often smaller than, those produced by SA.
Abstract: Feature selection is one of the important preprocessing steps in the machine learning and data mining domain. However, finding the best feature subsets for large datasets is a computationally expensive task. Meanwhile, quantum computing has emerged as a new computational model that is able to speed up many classically expensive problems. An annealing-based quantum model, for example, finds the lowest energy state of an Ising model Hamiltonian, which is the formalism for Quadratic Unconstrained Binary Optimization (QUBO). Due to its capability to produce quality solutions to hard combinatorial optimization problems with less computational effort, quantum annealing has potential for feature subset selection. Although several hard optimization problems have been solved using quantum annealing, not much work has been done on quantum annealing based feature subset selection. Though the reported approaches have good theoretical foundations, they usually lack the required empirical rigor. In this paper, we attempt to reduce classical benchmark feature evaluation functions like mRMR, JMI, and FCBF to a QUBO formulation, enabling the use of quantum annealing based optimization for feature selection. We then apply the QUBO formulations to ten datasets using both Simulated Annealing (SA) and Simulated Quantum Annealing (SQA) and compare the results. Our empirical results confirm that, for seven of the ten datasets, SQA produces feature subsets that are at most as large as, and often smaller than, those produced by SA. SQA also results in stable feature subsets for all datasets.
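
A sketch of the general idea of casting a relevance-minus-redundancy criterion as a QUBO and solving it with plain simulated annealing; the chapter's exact reductions of mRMR, JMI, and FCBF and its simulated quantum annealing step are not reproduced, and the penalty weight, dataset, and redundancy proxy below are assumptions:

```python
# mRMR-style QUBO built from mutual information (relevance) and absolute correlation
# (redundancy proxy), minimized with a simple simulated annealing loop.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
d = X.shape[1]

relevance = mutual_info_classif(X, y, random_state=0)         # feature-target term
redundancy = np.abs(np.corrcoef(X, rowvar=False))              # feature-feature proxy
np.fill_diagonal(redundancy, 0.0)

alpha = 0.5                                                     # illustrative trade-off
Q = alpha * redundancy - np.diag(relevance)                     # minimize z^T Q z, z in {0,1}^d

def energy(z):
    return z @ Q @ z

rng = np.random.default_rng(0)
z = rng.integers(0, 2, d)
best, best_e = z.copy(), energy(z)
T = 1.0
for step in range(5000):                                        # simulated annealing
    cand = z.copy()
    cand[rng.integers(d)] ^= 1                                  # flip one bit
    dE = energy(cand) - energy(z)
    if dE < 0 or rng.random() < np.exp(-dE / T):
        z = cand
        if energy(z) < best_e:
            best, best_e = z.copy(), energy(z)
    T *= 0.999                                                  # geometric cooling

print("selected features:", np.flatnonzero(best))
```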

4 citations


Journal ArticleDOI
23 Aug 2021-Symmetry
TL;DR: In this article, the authors developed and compared univariate and several multivariate LSTM models that can predict global horizontal irradiance (GHI) in Guntur, India on a very short-term basis.
Abstract: Accurate global horizontal irradiance (GHI) forecasting is crucial for efficient management and forecasting of the output power of photovoltaic power plants. However, developing a reliable GHI forecasting model is challenging because GHI varies over time, and its variation is affected by changes in weather patterns. Recently, the long short-term memory (LSTM) deep learning network has become a powerful tool for modeling complex time series problems. This work aims to develop and compare univariate and several multivariate LSTM models that can predict GHI in Guntur, India on a very short-term basis. To build the multivariate time series models, we considered all possible combinations of temperature, humidity, and wind direction variables along with GHI as inputs and developed seven multivariate models, while in the univariate model, we considered only GHI variability. We collected the meteorological data for Guntur from 1 January 2016 to 31 December 2016 and built 12 datasets, each containing the variability of GHI, temperature, humidity, and wind direction for one month. We then constructed the models, each of which forecasts GHI up to 2 h ahead. Finally, to measure the symmetry among the models, we evaluated the performances of the prediction models using root mean square error (RMSE) and mean absolute error (MAE). The results indicate that, compared to the univariate method, each multivariate LSTM performs better in the very short-term GHI prediction task. Moreover, among the multivariate LSTM models, the model that incorporates the temperature variable with GHI as input has outperformed the others, achieving average RMSE values of 0.74 W/m2–1.5 W/m2.
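
A sketch of how one multivariate input combination (GHI plus temperature, the combination the paper reports as best) might be shaped for a Keras LSTM that forecasts GHI a few steps ahead; the window length, horizon, network size, and file name are illustrative assumptions:

```python
# Shaping a two-variable (GHI, temperature) series into windows for an LSTM forecaster.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def make_windows(data, target_col=0, lags=12, horizon=8):
    """data: (time, features); returns windows and the GHI value `horizon` steps ahead."""
    X, y = [], []
    for i in range(len(data) - lags - horizon):
        X.append(data[i:i + lags])
        y.append(data[i + lags + horizon - 1, target_col])
    return np.array(X), np.array(y)

series = np.loadtxt("guntur_ghi_temp.csv", delimiter=",")   # hypothetical (time, 2) array
X, y = make_windows(series)

model = Sequential([LSTM(64, input_shape=X.shape[1:]), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=20, batch_size=32, verbose=0)
```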

3 citations


Book ChapterDOI
01 Jan 2021
TL;DR: This experiment proves that the new box-plot-based method can provide greater accuracy and efficiency than the conventional ranking-based measures for feature selection such as the Chi-Square, Symmetric Uncertainty and Information Gain.
Abstract: Feature selection, one of the most important preprocessing steps in machine learning, is the process of automatically or manually selecting those features that contribute most to the prediction variable or output of interest. This subset of features has some very important benefits: it reduces the computational complexity of learning algorithms, saves time, improves accuracy, and the selected features can be insightful for the people involved in the problem domain. Among the different ways of performing feature selection, such as filter, wrapper and hybrid, filter-based separability methods can be used as a feature ranking tool in binary classification problems, the most popular being the Bhattacharyya distance and the Jeffries–Matusita (JM) distance. However, these measures are parametric and their computation requires knowledge of the distribution from which the samples are drawn. In real life, we often come across instances where it is difficult to have an idea about the distribution of the observations. In this paper, we have presented a new non-parametric approach for performing feature selection called the 'Non-Parametric Distance Measure'. The experiment with the new measure is performed over nine datasets and the results are compared with those of other ranking-based methods for feature selection on the same datasets. This experiment proves that the new box-plot-based method can provide greater accuracy and efficiency than conventional ranking-based measures for feature selection such as Chi-Square, Symmetric Uncertainty and Information Gain.
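
Purely as an illustration of what a box-plot-based, non-parametric separability score could look like (scoring features by how little the two classes' interquartile boxes overlap); this is an assumption for orientation only, not the chapter's actual measure, which is not specified in the abstract:

```python
# Hypothetical box-plot-overlap score for ranking features in a binary problem.
# NOT the chapter's 'Non-Parametric Distance Measure'; only an illustrative guess.
import numpy as np

def boxplot_separability(feature, labels):
    a = feature[labels == 0]
    b = feature[labels == 1]
    q1a, q3a = np.percentile(a, [25, 75])
    q1b, q3b = np.percentile(b, [25, 75])
    overlap = max(0.0, min(q3a, q3b) - max(q1a, q1b))
    span = max(q3a, q3b) - min(q1a, q1b)
    return 1.0 - overlap / span if span > 0 else 0.0   # 1.0 means disjoint boxes

# Rank the columns of X (n_samples, n_features) for binary labels y:
# scores = [boxplot_separability(X[:, j], y) for j in range(X.shape[1])]
```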

1 citation


Journal ArticleDOI
TL;DR: A novel heuristic approach for finding the optimum feature subset from JM distance based ranked feature lists for multiclass problems has been developed without explicitly using any specific search technique.
Abstract: The Jeffries-Matusita (JM) distance, a transformation of the Bhattacharyya distance, is a widely used measure of the spectral separability between two class density functions and is generally used as a class separability measure. It can be considered to have good potential for evaluating the effectiveness of a feature in discriminating two classes. The capability of the JM distance as a ranking based feature selection technique for binary classification problems has been verified in some research works as well as in our earlier work. It was found by our simulation experiments with benchmark data sets that the JM distance works equally well compared to other popular feature ranking methods based on mutual information, information gain or Relief. Extensions of the JM distance measure for feature ranking in multiclass problems have also been reported in the literature. But all of them are basically rank based approaches which deliver the ranking of the features and do not automatically produce the final optimal feature subset. In this work, a novel heuristic approach for finding the optimum feature subset from JM distance based ranked feature lists for multiclass problems has been developed without explicitly using any specific search technique. The proposed approach integrates the extension of the JM measure for multiclass problems and the selection of the final optimal feature subset in a unified process. The performance of the proposed algorithm has been evaluated by simulation experiments with benchmark data sets in comparison with two other previously developed rank based feature selection algorithms with multiclass JM distance measures (weighted average JM distance and another multiclass extension equivalent to the Bhattacharyya bound) and some other popular filter based feature ranking algorithms. It is found that the proposed algorithm performs better in terms of classification accuracy, F-measure, and AUC, with a reduced set of features and lower computational cost.
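
A sketch of the per-feature JM distance for one pair of classes under a univariate Gaussian assumption, JM = 2(1 - e^{-B}) with B the Bhattacharyya distance; the paper's heuristic for turning multiclass JM rankings into a final subset is not reproduced here:

```python
# Univariate JM distance between two classes for a single feature (Gaussian assumption).
import numpy as np

def jm_distance(feature, labels, c1, c2):
    a, b = feature[labels == c1], feature[labels == c2]
    m1, m2 = a.mean(), b.mean()
    v1, v2 = a.var() + 1e-12, b.var() + 1e-12          # small constant avoids divide-by-zero
    bhatt = (0.25 * (m1 - m2) ** 2 / (v1 + v2)
             + 0.5 * np.log((v1 + v2) / (2 * np.sqrt(v1 * v2))))
    return 2.0 * (1.0 - np.exp(-bhatt))                # ranges from 0 to 2

# A feature could then be ranked by, e.g., its average JM distance over all class pairs.
```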

1 citation


Book ChapterDOI
01 Jan 2021
TL;DR: Analysis of the tweets of people based on cyclone Fani, which struck the Eastern region of India and adjoining regions in the month of May 2019, has been presented and an analysis of tweet and retweet counts belonging to credible Twitter handles has been showcased.
Abstract: The advent of social media has contributed to faster as well as wider propagation of information and emotions of people. In times of emergencies and natural disasters, social media becomes an important tool for communication, spreading alerts and knowing the needs and feelings of people in crisis. In this paper, an analysis of the tweets of people concerning cyclone Fani, which struck the eastern region of India and adjoining regions in May 2019, is presented. The study has been divided into three phases: the onset of cyclone Fani, during cyclone Fani, and the aftermath of cyclone Fani. As part of the primary analysis, Word Cloud representations have been used to depict the most frequent words in the tweets during all three phases of cyclone Fani. After that, word embedding using Word2Vec has been carried out using both the Skip-Gram and Continuous-Bag-of-Words approaches. Using Principal Component Analysis, the results have been presented as bubble plots. Then, sentiment analysis using a Naive Bayes classifier has been performed, and the tweets were classified based on both polarity and subjectivity. The results have been presented using graphical plots, and the accuracy of the results has been analyzed. Finally, an analysis of tweet and retweet counts belonging to credible Twitter handles has been showcased.
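
A short sketch of the Word2Vec plus PCA step on a tokenized tweet corpus using gensim and scikit-learn; the cyclone Fani tweets are replaced by hypothetical tokens, and the Naive Bayes sentiment stage and retweet analysis are not reproduced:

```python
# Word2Vec (Skip-Gram and CBOW) embeddings of tweet tokens, projected to 2-D with PCA.
from gensim.models import Word2Vec
from sklearn.decomposition import PCA

tweets = [["cyclone", "fani", "landfall", "odisha"],          # hypothetical tokenized tweets
          ["stay", "safe", "evacuation", "shelters"],
          ["relief", "work", "started", "odisha"]]

skipgram = Word2Vec(sentences=tweets, vector_size=50, window=3, min_count=1, sg=1)
cbow = Word2Vec(sentences=tweets, vector_size=50, window=3, min_count=1, sg=0)

words = list(skipgram.wv.index_to_key)
coords = PCA(n_components=2).fit_transform(skipgram.wv[words])  # 2-D bubble-plot coordinates
for w, (x, y) in zip(words, coords):
    print(f"{w:12s} {x:+.2f} {y:+.2f}")
```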

1 citation


Book ChapterDOI
01 Jan 2021
TL;DR: A novel approach for machine learning classification employing the concepts of center of mass and force of gravity is proposed; the efficiency of the proposed method can be perceived when its output is compared with that of other standard and popular classification methods.
Abstract: The level of imitation of nature is astonishingly high in the domains of Evolutionary Computation and Computational Intelligence, and attempts are constantly being made to develop more algorithms that completely or partially imitate nature and the activities that occur in a particular natural phenomenon. The same can also be employed in various aspects of machine learning, such as classification methodologies. This paper puts forward a novel approach for machine learning classification by employing the concepts of center of mass and force of gravity. The performance of the proposed force of gravity (Fg)-based classification technique has been evaluated on a number of standard and popular datasets. The efficiency of the proposed method of classification can be perceived when the output obtained is compared with other standard and popular methods of classification such as Logistic Regression, KNN, Decision Tree, and SVM.
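
One illustrative reading of a "force of gravity" classifier, assuming each class is summarized by its center of mass and a test point is assigned to the class pulling on it with the largest force F = m / r^2 (mass taken as class size); this is an assumption based only on the abstract, not necessarily the chapter's exact formulation:

```python
# Hypothetical center-of-mass / gravity classifier: assign each point to the class
# exerting the largest F = m / r^2, with m = class size and r = distance to the
# class center of mass. An illustrative interpretation, not the chapter's definition.
import numpy as np

class GravityClassifier:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centers_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.masses_ = np.array([(y == c).sum() for c in self.classes_], dtype=float)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            r2 = ((self.centers_ - x) ** 2).sum(axis=1) + 1e-12   # squared distances
            preds.append(self.classes_[np.argmax(self.masses_ / r2)])
        return np.array(preds)
```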

1 citation


Book ChapterDOI
01 Jan 2021
TL;DR: A brief review of techniques developed by researchers across the world, with a focus on some of the state-of-the-art works done in the domain of sentiment analysis.
Abstract: Natural language processing (NLP) is a booming field in this era of data, where almost all businesses and organizations have access to many review sites, social media, and e-commerce websites. Recently, deep learning models have shown state-of-the-art results in NLP tasks. With the help of complex models like long short-term memory (LSTM), problems such as the vanishing gradient problem have been mitigated, and newer ideas such as attention models or aspect embeddings increase accuracy. These have made a drastic change in the field of sentiment analysis and made it more business-oriented: most large business organizations, for example Amazon and Flipkart, use it for analyzing their customer reviews. Some researchers have shown that, even without complex models like LSTM, one can do as well or better by adding a gating mechanism to the well-known CNN. In light of all this, we present a brief review of techniques developed by researchers across the world and focus on some of the state-of-the-art works done in the domain of sentiment analysis.

Book ChapterDOI
01 Jan 2021
TL;DR: A comparison has been done with XGBoost and Random forest classifiers, which shows the effectiveness of the ensemble methods used for classification.
Abstract: Ensemble methods are algorithms that combine various models together to give higher accuracy than individual models. The ensemble methods used here are majority voting, XGBoost, and random forest. Several decision trees are combined using a voting classifier, random forest, and XGBoost. These are considered among the best general-purpose models and are used here to compare accuracies with other models. The datasets are split randomly 9, 18, and 27 times, respectively. The decision tree model is applied and later combined with a voting classifier. The descriptions of the methods are followed by an extensive empirical study over 10 publicly available datasets. An ensemble model with five classifiers is also implemented, which gives us the accuracy of the model, and all the accuracies are then compared. Finally, a comparison has been done with the XGBoost and random forest classifiers, which shows the effectiveness of the ensemble methods used for classification.
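
A minimal sketch of a hard-voting ensemble of decision trees compared against random forest and XGBoost with scikit-learn; the repeated random splits (9, 18, 27) and the ten benchmark datasets from the chapter are not reproduced, and the tree depths and dataset here are illustrative:

```python
# Hard voting over several decision trees vs. random forest and XGBoost baselines.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)

voting = VotingClassifier(
    estimators=[(f"tree{i}", DecisionTreeClassifier(max_depth=i + 2, random_state=i))
                for i in range(5)],                 # five trees of different depths
    voting="hard",
)

for name, clf in [("voting", voting),
                  ("random forest", RandomForestClassifier(random_state=0)),
                  ("XGBoost", XGBClassifier(eval_metric="logloss"))]:
    print(name, "accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```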

Book ChapterDOI
01 Jan 2021
TL;DR: A hybrid graph-based approach for selecting a subset of features in the context of supervised learning, utilising the graph-theoretic concepts of degree centrality and eigenvector centrality to assess the relative importance of features.
Abstract: Graph-based approaches to feature selection have found application in diverse areas owing to their efficacy in detecting potential connections among the features. In this paper, we propose a hybrid graph-based approach for selecting a subset of features in the context of supervised learning. The novelty of this approach lies in the utilisation of the graph-theoretic concepts of degree centrality as well as eigenvector centrality to assess the relative importance of features. Additionally, entropy-based measures have been used. The performance of the proposed approach is compared to that of existing benchmark algorithms using publicly available datasets. The results have come out to be quite encouraging.
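
A sketch of the underlying graph-centrality idea: build a feature graph whose edge weights encode absolute pairwise correlation and score features by degree and eigenvector centrality with networkx. The chapter's entropy-based terms, its exact graph construction, and its hybrid scoring are not reproduced; the dataset and the combined score are assumptions:

```python
# Feature graph from absolute pairwise correlations, scored by weighted degree and
# eigenvector centrality (networkx). Illustrative only; not the chapter's hybrid method.
import numpy as np
import networkx as nx
from sklearn.datasets import load_wine

X, _ = load_wine(return_X_y=True)
corr = np.abs(np.corrcoef(X, rowvar=False))
np.fill_diagonal(corr, 0.0)

G = nx.from_numpy_array(corr)                       # complete weighted feature graph
deg = dict(G.degree(weight="weight"))               # weighted degree of each feature node
eig = nx.eigenvector_centrality_numpy(G, weight="weight")

# Simple combined score (normalized sum); the actual hybrid criterion would differ.
scores = {n: deg[n] / max(deg.values()) + eig[n] / max(eig.values()) for n in G}
print("top-ranked features:", sorted(scores, key=scores.get, reverse=True)[:5])
```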