Author

Vrushali Kulkarni

Bio: Vrushali Kulkarni is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics: Random forest & Decision tree. The author has an h-index of 7 and has co-authored 35 publications receiving 248 citations. Previous affiliations of Vrushali Kulkarni include Maharashtra Institute of Technology.

Papers
Proceedings ArticleDOI
18 Jul 2012
TL;DR: A systematic survey of pruning efforts for the Random Forest classifier is presented along with the required theoretical background, and a comparison chart is generated from relevant parameters.
Abstract: Random Forest is an ensemble supervised machine learning technique. Based on bagging and random feature selection, a number of decision trees (base classifiers) are generated, and majority voting is used for classification. For effective learning and classification, there is a need to reduce the number of trees in a Random Forest (pruning). We present a systematic survey of pruning efforts for the Random Forest classifier along with the required theoretical background. Most pruning work takes a static approach, while dynamic pruning has been targeted more recently. We have also generated a comparison chart from relevant parameters. There is research scope for analyzing the behavior of Random Forest, generating accurate and diverse base decision trees, developing a truly dynamic pruning algorithm for the Random Forest classifier, and generating an optimal subset of a Random Forest.
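
The three ingredients named in the abstract (bagging, random feature selection, and majority voting) can be illustrated with a minimal Python sketch. This is not the paper's code; the function names `bootstrap_sample`, `random_feature_subset`, and `majority_vote` are hypothetical, and tree induction itself is left out.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Bagging step: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Random feature selection: pick k distinct feature indices for a split."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(tree_predictions):
    """Return the label predicted by the most trees.

    Ties are resolved in favor of the label seen first (an assumption of
    this sketch, not something the abstract specifies).
    """
    return Counter(tree_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [("sunny", "no"), ("rainy", "yes"), ("cloudy", "yes")]
    print(bootstrap_sample(rows, rng))
    print(random_feature_subset(n_features=10, k=3, rng=rng))
    print(majority_vote(["yes", "no", "yes"]))
```

In this framing, pruning amounts to voting over a subset of the trees' predictions rather than all of them.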

107 citations

Journal ArticleDOI
TL;DR: This paper explores the various domains where blockchain has had an impact and where future implementations may be expected, and brings together all the key developments so far in putting blockchain into practice.
Abstract: Blockchain is being termed the fifth disruptive innovation in computing. In the simplest terms, it is a distributed ledger of records that is immutable and verifiable. Since its advent in 2008, blockchain as a concept has been used in various ways. The largest impact or application is seen in the multitude of cryptocurrencies that have sprung up. However, with time, it has become clear that blockchain as a technology is likely to have an impact much wider than just the cryptocurrency domain and much deeper than simple distributed ledger storage. This detailed survey intends to bring together all the key developments so far in terms of putting blockchain into practice. While the most common adoption of blockchain is in the finance and banking domain, experiments are being conducted by many big players in various other domains. This paper explores the various domains where blockchain has had an impact and where future implementations may be expected.
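
To make "immutable and verifiable" concrete, here is a minimal hash-chain sketch in Python. It is only an illustration under simplifying assumptions: a single in-memory list, no networking, and no consensus mechanism. The names `block_hash`, `append_block`, and `is_valid` are hypothetical and do not come from the survey.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a record that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append(
        {
            "index": index,
            "data": data,
            "prev": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
    )


def is_valid(chain):
    """Verify every stored hash and every link to the previous block."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each block stores the hash of its predecessor, changing an earlier record invalidates the stored hashes from that point on, which is what makes tampering detectable.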

59 citations

01 Jun 2014
TL;DR: An attempt is made to improve the performance of Random Forest classifiers in terms of accuracy and the time required for learning and classification; to achieve this, five new approaches are proposed.
Abstract: Random Forest is a supervised machine learning algorithm. In the data mining domain, machine learning algorithms are used extensively to analyze data and generate predictions based on this data. Being an ensemble algorithm, Random Forest generates multiple decision trees as base classifiers and applies majority voting to combine the outcomes of the base trees. The strength of the individual decision trees and the correlation among the base trees are the key factors that determine the generalization error of Random Forest classifiers. In terms of accuracy, Random Forest classifiers are on par with existing ensemble techniques such as bagging and boosting. In this research work, an attempt is made to improve the performance of Random Forest classifiers in terms of accuracy and the time required for learning and classification. To achieve this, five new approaches are proposed. The empirical analysis and the outcomes of the experiments carried out in this research work lead to effective learning and classification using the Random Forest algorithm.

32 citations

Journal ArticleDOI
TL;DR: A new hybrid decision tree model for the random forest classifier is proposed, augmented by weighted voting based on the strength of each individual tree; it has shown a notable increase in the accuracy of random forest.
Abstract: Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random Forest uses a decision tree as the base classifier. In decision tree induction, an attribute split (evaluation) measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two-step process. First, a theoretical study of five selected split measures is carried out, and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are verified by empirical analysis, in which a random forest is generated using each of the five selected split measures, one at a time (i.e., a random forest using information gain, a random forest using gain ratio, etc.). Next, based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed. In this model, the individual decision trees in the Random Forest are generated using different split measures. The model is augmented by weighted voting based on the strength of each individual tree. The new approach has shown a notable increase in the accuracy of random forest.
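
Two of the split measures named in the abstract, information gain and gain ratio, can be sketched in a few lines of Python using their standard textbook definitions for a categorical attribute. The helper names below are hypothetical, and the abstract does not list the other three measures, so they are not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the split information of the attribute."""
    split_info = entropy(values)
    if split_info == 0:  # attribute takes a single value: no usable split
        return 0.0
    return information_gain(values, labels) / split_info
```

For example, an attribute whose values separate the classes perfectly gets the maximum information gain for that label distribution, while an attribute whose values are independent of the class gets a gain of zero.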

29 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A methodology is proposed that classifies grape images into ripe and unripe categories, motivated by the increasing productivity of grapes and the need to estimate their ripeness at the correct time.
Abstract: India is known worldwide as a significant exporter of fruit. Global food security is essential for the durable production of fruits as well as for a remarkable reduction in pre- and post-harvest waste. Harvesting fruit and estimating its ripeness by hand is an expensive, laborious, and time-consuming task. Ripeness estimation has been carried out on single fruits such as orange, apple, tomato, banana, and papaya using color features. Given the increasing productivity of grapes, there is a need to focus on estimating grape ripeness at the correct time. In this paper, we propose a methodology that classifies grape images into ripe and unripe categories. A local breed of grape, ‘Sonaka’, was examined during the harvest season from January to March 2019. The images were separated into two ripeness categories, unripe and ripe, according to the color and shape of the grapes. The images were then passed to two classification models, a Convolutional Neural Network (CNN) and a support vector machine (SVM). Color features such as RGB and HSV and morphological features such as the shape of the grapes were chosen as features for classification. The validation results show that the CNN model achieves higher classification accuracy (79.49%) than the SVM classifier (69%).
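
As a rough illustration of what an RGB/HSV color feature could look like, the sketch below averages the pixels of an image region and converts the mean color to HSV with Python's standard `colorsys` module. The abstract does not say how the authors computed their features, so the function name `mean_color_features` and the choice of a single mean color per region are assumptions made for this example.

```python
import colorsys


def mean_color_features(pixels):
    """Return (r, g, b, h, s, v), each in [0, 1], for a list of 8-bit RGB pixels.

    The HSV values are those of the mean RGB color. This sidesteps averaging
    hue directly, which is awkward because hue is an angle.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    n = len(pixels)
    r = sum(p[0] for p in pixels) / (255 * n)
    g = sum(p[1] for p in pixels) / (255 * n)
    b = sum(p[2] for p in pixels) / (255 * n)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (r, g, b, h, s, v)
```

A feature vector like this could be fed to a classical classifier such as an SVM, whereas a CNN would typically consume the raw pixels.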

23 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: This review has revealed that RF classifier can successfully handle high data dimensionality and multicollinearity, being both fast and insensitive to overfitting.
Abstract: A random forest (RF) classifier is an ensemble classifier that produces multiple decision trees, using a randomly selected subset of training samples and variables. This classifier has become popular within the remote sensing community due to the accuracy of its classifications. The overall objective of this work was to review the utilization of RF classifier in remote sensing. This review has revealed that RF classifier can successfully handle high data dimensionality and multicollinearity, being both fast and insensitive to overfitting. It is, however, sensitive to the sampling design. The variable importance (VI) measurement provided by the RF classifier has been extensively exploited in different scenarios, for example to reduce the number of dimensions of hyperspectral data, to identify the most relevant multisource remote sensing and geographic data, and to select the most suitable season to classify particular target classes. Further investigations are required into less commonly exploited uses of this classifier, such as for sample proximity analysis to detect and remove outliers in the training samples.

3,244 citations

Journal ArticleDOI
TL;DR: This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling.
Abstract: Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods. The theoretical information is complemented with many descriptive examples and illustrations which cover all the stages of the gradient boosting model design. Considerations on handling the model complexity are discussed. A set of practical examples of gradient boosting applications is presented and comprehensively analyzed.
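
The core loop of gradient boosting can be sketched briefly. The Python example below is a minimal illustration, not the article's code: it assumes squared-error loss (so the negative gradient is simply the residual), one-dimensional inputs, and regression stumps as base learners. All function names and the default hyperparameters are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump that minimizes squared error."""
    best = None
    # The largest x is skipped as a threshold: it would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all xs identical: fall back to a constant prediction
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.5):
    """Start from the mean of ys, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - p)^2 the negative gradient w.r.t. p is y - p.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * stump_predict(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return {"base": base, "learning_rate": learning_rate, "stumps": stumps}


def predict(model, x):
    correction = sum(stump_predict(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * correction
```

Swapping in a different loss function changes only how the residuals (negative gradients) are computed, which is the kind of customization the abstract refers to.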

1,463 citations

Journal ArticleDOI
TL;DR: This work presents the state-of-the-art methods and proposes the following contributions: a taxonomy of sentiment analysis; a survey on polarity classification methods and resources, especially those related to emotion mining; a complete survey on emotion theories and emotion-mining research; and some useful resources, including lexicons and datasets.
Abstract: Sentiment analysis from text consists of extracting information about opinions, sentiments, and even emotions conveyed by writers towards topics of interest. It is often equated to opinion mining, but it should also encompass emotion mining. Opinion mining involves the use of natural language processing and machine learning to determine the attitude of a writer towards a subject. Emotion mining uses similar technologies but is concerned with detecting and classifying writers' emotions toward events or topics. Textual emotion-mining methods have various applications, including gaining information about customer satisfaction, helping in selecting teaching materials in e-learning, recommending products based on users' emotions, and even predicting mental-health disorders. In surveys on sentiment analysis, which are often old or incomplete, the strong link between opinion mining and emotion mining is understated. This motivates the need for a different and new perspective on the literature on sentiment analysis, with a focus on emotion mining. We present the state-of-the-art methods and propose the following contributions: (1) a taxonomy of sentiment analysis; (2) a survey on polarity classification methods and resources, especially those related to emotion mining; (3) a complete survey on emotion theories and emotion-mining research; and (4) some useful resources, including lexicons and datasets.

331 citations