Author

Kedar Potdar

Bio: Kedar Potdar is an academic researcher. The author has contributed to research in the topics of Index of Industrial Production and Wholesale Price Index, has an h-index of 2, and has co-authored 2 publications receiving 180 citations.

Papers
Journal ArticleDOI
TL;DR: Results show that data encoded with the Sum Coding and Backward Difference Coding techniques gives the highest accuracy compared to data pre-processed with the other techniques.
Abstract: In classification analysis, the dependent variable is frequently influenced not only by ratio scale variables, but also by qualitative (nominal scale) variables. Machine Learning algorithms accept only numerical inputs, so it is necessary to encode these categorical variables into numerical values using encoding techniques. This paper presents a comparative study of seven categorical variable encoding techniques used for classification with Artificial Neural Networks on a categorical dataset. The Car Evaluation dataset provided by UCI is used for training. Results show that data encoded with the Sum Coding and Backward Difference Coding techniques gives the highest accuracy compared to data pre-processed with the other techniques.
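As a minimal sketch of what sum (deviation) coding does, the following pure-Python function (the function name and toy "safety" levels are illustrative, not from the paper) maps k category levels to k-1 contrast columns:

```python
def sum_code(values, levels):
    """Sum (deviation) coding: k levels -> k-1 contrast columns.
    Each non-reference level gets a 1 in its own column; the
    reference (last) level is encoded as -1 in every column."""
    k = len(levels)
    index = {lvl: i for i, lvl in enumerate(levels)}
    encoded = []
    for v in values:
        i = index[v]
        if i < k - 1:
            row = [1.0 if j == i else 0.0 for j in range(k - 1)]
        else:
            row = [-1.0] * (k - 1)
        encoded.append(row)
    return encoded

# Encode a toy three-level attribute (e.g. a Car Evaluation column)
rows = sum_code(["low", "med", "high"], levels=["low", "med", "high"])
# rows -> [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
```

In practice, libraries such as category_encoders provide ready-made SumEncoder and BackwardDifferenceEncoder transformers for pandas DataFrames.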

332 citations

Proceedings ArticleDOI
14 Jul 2017
TL;DR: In this article, an artificial neural network (ANN) was applied to forecast India's Index of Industrial Production (IIP) using F.Y. 2004-05 to F.Y. 2013-14 data on Gross Domestic Product (GDP), Consumer Price Index (CPI), Wholesale Price Index (WPI), and the Index of the Eight Core Industries (Electricity, Steel, Refinery Products, Crude Oil, Coal, Cement, Natural Gas and Fertilizers).
Abstract: For a developing country such as India, to make the best use of resources, public planning requires good forecasts of future trends. India's Index of Industrial Production (IIP) is an index which conveys the status of production in the industrial sector of the economy. In this study, an artificial neural network (ANN) was applied to forecast the IIP. The inputs to the ANN consisted of data spanning F.Y. 2004–05 to F.Y. 2013–14 on Gross Domestic Product (GDP), Consumer Price Index (CPI), Wholesale Price Index (WPI) and the Index of the Eight Core Industries (Electricity, Steel, Refinery Products, Crude Oil, Coal, Cement, Natural Gas and Fertilizers). A forecasting methodology was developed using Nonlinear Autoregressive (NAR) and Nonlinear Autoregressive with exogenous inputs (NARX) neural network models. Several network structures were tested for forecasting, and the results were compared in terms of forecasting error. The NARX network with 11 hidden layers and 1 delay line provided the best results, with a Mean Square Error (MSE) of 2.168. Thus, ANNs can be used for accurate forecasting of industrial production.
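The core of a NARX setup is turning the series into a supervised problem: each target value is predicted from lagged values of the target itself plus lagged exogenous inputs. A minimal sketch (function name, toy data, and shapes are illustrative assumptions; the paper used MATLAB-style NAR/NARX networks):

```python
def narx_design(y, x, delay=1):
    """Build a NARX-style regression problem: predict y[t] from the
    previous `delay` values of the target and of the exogenous inputs.
    y: target series (e.g. IIP); x: list of rows of exogenous values
    (e.g. GDP, CPI, WPI, core-industry indices)."""
    features, targets = [], []
    for t in range(delay, len(y)):
        lagged_y = list(y[t - delay:t])                        # autoregressive part
        lagged_x = [v for row in x[t - delay:t] for v in row]  # exogenous part
        features.append(lagged_y + lagged_x)
        targets.append(y[t])
    return features, targets

# Toy series: 5 time steps of a target with 2 exogenous indicators each
F, tgt = narx_design([0.0, 1.0, 2.0, 3.0, 4.0],
                     [[1.0, 1.0]] * 5, delay=1)
# F[0] -> [0.0, 1.0, 1.0]; tgt -> [1.0, 2.0, 3.0, 4.0]
```

Any regression model (here, the paper's neural networks) can then be trained on the (features, targets) pairs.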

10 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules.
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
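The mail-filtering example above can be sketched as a tiny Naive Bayes classifier learned from examples of accepted and rejected messages (all names, the toy corpus, and the two labels are illustrative assumptions, not from the abstract):

```python
from collections import Counter
import math

def train(messages):
    """messages: list of (text, label) with label 'spam' or 'ok'.
    Learns per-label word counts and per-label message counts."""
    counts = {"spam": Counter(), "ok": Counter()}
    totals = Counter()
    for text, label in messages:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the label maximizing log P(label) + sum of
    Laplace-smoothed log P(word | label)."""
    vocab = set(counts["spam"]) | set(counts["ok"])
    best, best_score = None, float("-inf")
    for label in ("spam", "ok"):
        n = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))
        for w in text.lower().split():
            score += math.log((counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

mail = [("win money now", "spam"), ("cheap money offer", "spam"),
        ("meeting agenda attached", "ok"), ("lunch tomorrow", "ok")]
counts, totals = train(mail)
classify("win cheap money", counts, totals)  # -> 'spam'
```

Retraining on each new rejected message is what keeps the learned rules up to date without a programmer's intervention.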

13,246 citations

Journal ArticleDOI
TL;DR: This study provides a starting point for determining which techniques for preparing qualitative data for neural networks work best, and is the first in-depth look at techniques for working with categorical data in neural networks.
Abstract: This survey investigates current techniques for representing qualitative data for use as input to neural networks. Techniques for using qualitative data in neural networks are well known. However, researchers continue to discover new variations or entirely new methods for working with categorical data in neural networks. Our primary contribution is to cover these representation techniques in a single work. Practitioners working with big data often have a need to encode categorical values in their datasets in order to leverage machine learning algorithms. Moreover, the size of data sets we consider as big data may cause one to reject some encoding techniques as impractical, due to their running time complexity. Neural networks take vectors of real numbers as inputs. One must use a technique to map qualitative values to numerical values before using them as input to a neural network. These techniques are known as embeddings, encodings, representations, or distributed representations. Another contribution this work makes is to provide references for the source code of various techniques, where we are able to verify the authenticity of the source code. We cover recent research in several domains where researchers use categorical data in neural networks. Some of these domains are natural language processing, fraud detection, and clinical document automation. This study provides a starting point for research in determining which techniques for preparing qualitative data for use with neural networks are best. It is our intention that the reader should use these implementations as a starting point to design experiments to evaluate various techniques for working with qualitative data in neural networks. The third contribution we make in this work is a new perspective on techniques for using categorical data in neural networks. We organize techniques for using categorical data in neural networks into three categories. 
We find three distinct patterns, classifying each technique as determined, algorithmic, or automated. The fourth contribution we make is to identify several opportunities for future research. The form of the data one uses as input to a neural network is crucial for using neural networks effectively. This work is a tool for researchers to find the most effective technique for working with categorical data in neural networks in big data settings. To the best of our knowledge, this is the first in-depth look at techniques for working with categorical data in neural networks.
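At the "automated" end of the spectrum the survey describes, each categorical level is mapped to a dense vector that the network learns during training. A framework-free sketch of the lookup (function names and dimensions are illustrative; in practice this is, e.g., a Keras Embedding layer trained by backpropagation):

```python
import random

def make_embedding(levels, dim, seed=0):
    """Assign each categorical level a dense vector. Random
    initialization here; in a real network these vectors are
    trainable parameters updated by backpropagation."""
    rng = random.Random(seed)
    return {lvl: [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for lvl in levels}

def embed(values, table):
    """Replace each categorical value with its learned vector."""
    return [table[v] for v in values]

table = make_embedding(["vhigh", "high", "med", "low"], dim=3)
X = embed(["low", "vhigh"], table)  # 2 rows of 3 real numbers each
```

Unlike one-hot style encodings, the output width is a chosen hyperparameter (dim), not the number of levels, which matters at big-data vocabulary sizes.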

217 citations

Journal ArticleDOI
20 Jul 2019 - Genes
TL;DR: The theoretical foundations of DL are described, and generic code is provided that can be easily modified to suit specific needs and is easily implemented using the public Keras and TensorFlow software.
Abstract: Deep learning (DL) has emerged as a powerful tool to make accurate predictions from complex data such as image, text, or video. However, its ability to predict phenotypic values from molecular data is less well studied. Here, we describe the theoretical foundations of DL and provide a generic code that can be easily modified to suit specific needs. DL comprises a wide variety of algorithms which depend on numerous hyperparameters. Careful optimization of hyperparameter values is critical to avoid overfitting. Among the DL architectures currently tested in genomic prediction, convolutional neural networks (CNNs) seem more promising than multilayer perceptrons (MLPs). A limitation of DL is in interpreting the results. This may not be relevant for genomic prediction in plant or animal breeding but can be critical when deciding the genetic risk to a disease. Although DL technologies are not “plug-and-play”, they are easily implemented using Keras and TensorFlow public software. To illustrate the principles described here, we implemented a Keras-based code in GitHub.
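The MLP architectures the paper compares can be reduced to a stack of weighted layers with nonlinear hidden units. A dependency-free sketch of the forward pass (layer widths and function names are illustrative assumptions; the paper's actual code uses Keras):

```python
import math, random

def init_mlp(sizes, seed=0):
    """sizes, e.g. [n_markers, 64, 32, 1]: widths of an MLP's layers.
    Returns per-layer (weights, biases); widths are hyperparameters
    of the kind the paper tunes to avoid overfitting."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        W = [[rng.gauss(0, 1 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers

def predict(x, layers):
    """Forward pass: ReLU hidden layers, linear output (regression,
    as in phenotype prediction)."""
    for i, (W, b) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + bi
             for row, bi in zip(W, b)]
        if i < len(layers) - 1:          # hidden layers: ReLU
            x = [max(0.0, v) for v in x]
    return x
```

Training (loss, optimizer, dropout, etc.) is where frameworks like Keras take over; this sketch only shows what the tuned hyperparameters parameterize.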

83 citations

Journal ArticleDOI
01 May 2021
TL;DR: This research provides an alternative approach to building deep learning models through an automatic hyperparameter optimization process that combines grid search and random search; the resulting models outperform previous approaches on multiclass classification performance metrics.
Abstract: A network intrusion detection system (NIDS) is a solution that mitigates the threat of attacks on a network. The success of a NIDS depends on the success of its algorithm and the performance of its method in recognizing attacks. We propose a deep learning intrusion detection system (IDS) using a pretraining approach with deep autoencoder (PTDAE) combined with a deep neural network (DNN). Models were developed using hyperparameter optimization procedures. This research provides an alternative solution to deep learning structure models through an automatic hyperparameter optimization process that combines grid search and random search techniques. The automated hyperparameter optimization process helps determine the value of hyperparameters and the best categorical hyperparameter configuration to improve detection performance. The proposed model was tested on the NSL-KDD, and CSE-CIC-ID2018 datasets. In the pretraining phase, we present the results of applying our technique to three feature extraction methods: deep autoencoder (DAE), autoencoder (AE), and stack autoencoder (SAE). The best results are obtained for the DAE method. These performance results also successfully outperform previous approaches in terms of performance metrics in multiclass classification.
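The grid-plus-random hyperparameter optimization the abstract describes can be sketched generically as follows (function names, the search space, and the scoring function are illustrative assumptions, not the paper's implementation):

```python
import itertools, random

def grid_search(space, score):
    """Exhaustively score every combination in `space`
    (a dict of hyperparameter name -> list of candidate values)."""
    names = list(space)
    best = max(itertools.product(*space.values()),
               key=lambda combo: score(dict(zip(names, combo))))
    return dict(zip(names, best))

def random_search(space, score, n_iter=20, seed=0):
    """Score n_iter random combinations; cheaper than the full
    grid when the space is large."""
    rng = random.Random(seed)
    names = list(space)
    best = max((tuple(rng.choice(space[n]) for n in names)
                for _ in range(n_iter)),
               key=lambda combo: score(dict(zip(names, combo))))
    return dict(zip(names, best))

# Toy space; in the paper, `score` would be detection performance
# of the DAE/AE/SAE-pretrained DNN on a validation split.
space = {"lr": [0.1, 0.01, 0.001], "units": [32, 64, 128]}
```

A common pattern is to use random search to locate a promising region and a finer grid search around it; the paper combines the two techniques in its automated pipeline.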

68 citations