Author

Jaya Sharma

Bio: Jaya Sharma is an academic researcher from the Indian Institutes of Information Technology. The author has contributed to research on the topics of Test case and Random forest, has an h-index of 1, and has co-authored 1 publication receiving 3 citations.

Papers
Book ChapterDOI
17 Dec 2019
TL;DR: A dynamic weighting scheme between test samples and decision trees in RF is proposed using the exponential distribution, and is rigorously tested over benchmark datasets from the UCI repository for both classification and regression tasks.
Abstract: Random forest (RF) is a supervised, non-parametric, ensemble-based machine learning method used for classification and regression tasks. It is easy to implement and scalable, and hence attracts many researchers. Being an ensemble-based method, it assigns equal weights/votes to all atomic units, i.e. decision trees. However, this may not always be appropriate for varying test cases. Hence, the correlation between decision trees and data samples has been explored in the recent past to address such issues. In this paper, a dynamic weighting scheme between test samples and decision trees in RF is proposed. The correlation is defined in terms of the similarity between the test case and the decision tree using the exponential distribution; hence, the proposed method is named Exponentially Weighted Random Forest (EWRF). The performance of the proposed method is rigorously tested over benchmark datasets from the UCI repository for both classification and regression tasks.

6 citations
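The weighting idea in the abstract above, trees voting with weights that decay exponentially in a sample-tree dissimilarity, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dissimilarity proxy (distance from the test point to the mean of the training points in the tree's leaf) and the rate `lam` are assumptions made here for concreteness.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xtr, ytr)

def ewrf_predict(rf, Xtr, Xte, lam=1.0):
    """Combine tree votes with weights exp(-lam * d), where d is a
    per-tree dissimilarity between the test point and the tree
    (here: distance to the mean of the training points in its leaf)."""
    scores = np.zeros((len(Xte), rf.n_classes_))
    for tree in rf.estimators_:
        leaves_tr = tree.apply(Xtr)          # leaf id of each training point
        leaves_te = tree.apply(Xte)          # leaf id of each test point
        proba = tree.predict_proba(Xte)
        for i, leaf in enumerate(leaves_te):
            mask = leaves_tr == leaf
            d = np.linalg.norm(Xte[i] - Xtr[mask].mean(axis=0))
            scores[i] += np.exp(-lam * d) * proba[i]
    return scores.argmax(axis=1)

acc = (ewrf_predict(rf, Xtr, Xte) == yte).mean()
```

With a plain unweighted vote this reduces to the standard RF prediction; the exponential factor simply down-weights trees whose relevant leaf region lies far from the test sample.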


Cited by
Journal ArticleDOI
TL;DR: In this paper, a method based on learning automata is presented, through which the adaptive capabilities of the problem space, as well as the independence of the data domain, are added to the random forest to increase its efficiency.
Abstract: The goal of aggregating base classifiers is to achieve an aggregated classifier with a higher resolution than the individual classifiers. Random forest is one of the ensemble learning methods that has received more attention than others due to its simple structure, ease of understanding, and higher efficiency than similar methods. The ability and efficiency of classical methods are always influenced by the data. Independence from the data domain and the ability to adapt to the conditions of the problem space are the most challenging issues for different types of classifiers. In this paper, a method based on learning automata is presented, through which adaptability to the problem space and independence from the data domain are added to the random forest to increase its efficiency. Using the idea of reinforcement learning in the random forest makes it possible to address data with dynamic behaviour, i.e. variability in the behaviour of a data sample across different domains. Therefore, to evaluate the proposed method and to create an environment with dynamic behaviour, different domains of data have been considered. In the proposed method, the idea is added to the random forest using learning automata, chosen for their simple structure and compatibility with the problem space. The evaluation results confirm the improvement in random forest efficiency.

22 citations

Journal ArticleDOI
TL;DR: This work proposes to use multiple features at a node for splitting the data, as in the axis-parallel method, and empirically shows that the performance of MaRF improves due to the improvement in the strength of the M-ary trees.
Abstract: Random Forest (RF) is composed of decision trees as base classifiers. In general, a decision tree recursively partitions the feature space into two disjoint subspaces using a single feature as an axis-parallel split at each internal node. The oblique decision tree instead uses a linear combination of features (forming a hyperplane) to partition the feature space into two subspaces; computing the best-suited hyperplane for the latter approach is NP-hard. In this work, we propose to use multiple features at a node for splitting the data, as in the axis-parallel method. Each feature independently divides the space into two subspaces, and this is done by multiple features at one node. Hence, the given space is divided into multiple subspaces simultaneously, which in turn constructs M-ary trees, and the resulting forest is named the M-ary Random Forest (MaRF). To measure task performance in MaRF, we extend the notion of tree strength from regression trees. We empirically show that the performance of MaRF improves due to the improvement in the strength of the M-ary trees. We demonstrate the performance on a wide range of datasets, including UCI datasets, a Hyperspectral dataset, MNIST, Caltech 101, and Caltech 256. The efficiency of the MaRF approach is found to be satisfactory compared to state-of-the-art methods.

7 citations
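The multi-feature node split described above can be illustrated with a toy routing function: each of m features contributes one threshold comparison, and the resulting bit pattern selects one of 2^m children, so the space is divided into multiple subspaces at a single node. The features and thresholds here are arbitrary placeholders; MaRF's actual split selection and tree-strength machinery are not reproduced.

```python
import numpy as np

def mary_split(X, features, thresholds):
    """Route each row of X to one of 2**m children of an M-ary node.
    The child index is the bit pattern of the per-feature threshold
    comparisons (bit k set iff X[:, features[k]] > thresholds[k])."""
    bits = (X[:, features] > thresholds).astype(int)      # shape (n, m)
    return bits @ (1 << np.arange(len(features)))         # child index per row

X = np.array([[0.2, 0.9],
              [0.8, 0.1],
              [0.8, 0.9]])
# Two features, so each node fans out 4 ways instead of 2.
children = mary_split(X, features=[0, 1], thresholds=np.array([0.5, 0.5]))
```

A binary axis-parallel split is the special case m = 1; increasing m lets one node do the work of several stacked binary splits.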

Journal ArticleDOI
TL;DR: An adaptive deep belief network framework (A-DBNF) is proposed that can adapt to different datasets with minimal human labor; its performance is validated on several benchmark datasets, comparing regression and classification accuracy with state-of-the-art methods.
Abstract: Many machine learning methods and models have been proposed for multivariate data regression and classification in recent years. Most of them are supervised learning methods, which require large amounts of labeled data. Moreover, current methods need extensive human labor and supervision to fine-tune the model hyperparameters. In this paper, we propose an adaptive deep belief network framework (A-DBNF) that can adapt to different datasets with minimal human labor. The proposed framework employs a deep belief network (DBN) to extract representative features of the datasets in the unsupervised learning phase and then fine-tunes the network parameters using a few labeled samples in the supervised learning phase. We integrate the DBN model with a genetic algorithm (GA) to select and optimize the model hyperparameters and further improve network performance. We validate the performance of the proposed framework on several benchmark datasets, comparing the regression and classification accuracy with state-of-the-art methods. A-DBNF showed a noticeable performance improvement on three regression tasks using only 40–50% of the labeled data, and outperformed most of the related methods in classification tasks using 23–48% of the labeled data.

6 citations

Proceedings ArticleDOI
07 Jun 2022
TL;DR: In this paper, a prediction method based on random forests is proposed that predicts the harvested energy for various locations in different scenarios with high accuracy while requiring only limited resources; the predictor executes in 22.2 μs and requires 2.60 μJ to generate a prediction.
Abstract: Indoor energy harvesting has recently enabled long-term deployments of sustainable IoT sensor nodes. The performance of such systems operating in an energy-neutral manner can be optimized by exploiting energy prediction models. Numerous prediction algorithms have been developed, yet they are primarily intended for outdoor (solar) energy harvesting. Indoor environments are much more challenging to predict since the primary energy source is highly variable. We propose a prediction method based on random forests that is capable of capturing and predicting this variability. It estimates the harvested energy for various locations in different scenarios with high accuracy while requiring only limited resources. We deploy the predictor on a dual-processor platform powered by indoor lighting with various sensors, including indoor air quality sensors. The predictor executes in 22.2 μs and requires 2.60 μJ to generate a prediction. Furthermore, the predictor continuously learns from the system's local environment. The proposed online learning is resource-efficient and requires only limited data, enabling it to run on the harvesting-based system. Over time, online learning reduces the energy required to generate a prediction by up to 77% while maintaining high prediction accuracy.

3 citations

TL;DR: In this paper, a comparative analysis of machine learning algorithms with a combination of preprocessing stages is performed to predict the potential loss of bank customers; dimensionality reduction and feature selection are applied using the Variance threshold and Correlation coefficient methods.
Abstract: Customers are one of the most valuable assets of a banking business. They are the spearhead of product users who bring profit to banks, especially through credit card products. This study aims to find out which customers are likely to leave a bank's credit card service. No previous study has conducted a comparative analysis of machine learning algorithms with various preprocessing stages to predict the potential loss of bank customers. This study performs such a comparative analysis, which is important for selecting the most suitable algorithm for this prediction task. At the preprocessing stage, dimensionality reduction and feature selection are applied using the Variance threshold and Correlation coefficient methods. The classification methods used are the Logistic regression (LR), Decision tree (DT), and Naïve Bayes (NB) algorithms. Of the three, the Decision tree performs best, achieving an F1 score of 96% and an accuracy of 93%; Logistic regression and Naïve Bayes rank second and third, respectively. It was also found that the presence or absence of the data preprocessing stages did not have a significant effect on the F1 score and accuracy.
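The pipeline this abstract describes, variance-threshold and correlation-based feature filtering followed by a decision tree scored with F1, can be sketched with scikit-learn. The dataset below is a public stand-in (the paper's bank-customer data is not available here), and the variance and correlation cutoffs are assumed values, not the paper's.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the bank data
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) Dimensionality reduction: drop near-constant features.
vt = VarianceThreshold(threshold=0.01)
Xtr_v, Xte_v = vt.fit_transform(Xtr), vt.transform(Xte)

# 2) Feature selection: for each highly correlated pair (|r| > 0.95),
#    keep the first feature and drop the second.
corr = np.abs(np.corrcoef(Xtr_v, rowvar=False))
upper = np.triu(corr, k=1)                  # pairwise |r| above the diagonal
keep = ~(upper > 0.95).any(axis=0)
Xtr_f, Xte_f = Xtr_v[:, keep], Xte_v[:, keep]

# 3) Classify with a decision tree and score with F1, as in the study.
clf = DecisionTreeClassifier(random_state=0).fit(Xtr_f, ytr)
f1 = f1_score(yte, clf.predict(Xte_f))
```

The abstract's finding that preprocessing barely moves the scores can be checked by rerunning step 3 on the raw `Xtr`/`Xte` and comparing the two F1 values.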