Bio: Nalini Nagendran is an academic researcher from VIT University. The author has contributed to research in the topics Block (data storage) and Data stream. The author has an h-index of 1 and has co-authored 2 publications receiving 3 citations.
01 Jan 2019
TL;DR: A survey of various ensemble classifiers for learning in data stream mining is provided, and their accuracy, memory use, and time on synthetic and real datasets with different drift scenarios are compared.
Abstract: Data stream mining plays a vital role in Big Data analytics. Traffic management, sensor networks and monitoring, and weblog analysis are applications in dynamic environments that generate streaming data. In a dynamic environment, data arrives at high speed, and the algorithms that process it must satisfy constraints on limited memory, computation time, and a single scan of the incoming data. A significant challenge in data stream mining is that the data distribution changes over time, a phenomenon called concept drift. A learning model therefore needs to detect these changes and adapt to them. By nature, ensemble classifiers adapt well to change and handle concept drift well. Three ensemble-based approaches are used to handle concept drift: online, block-based, and hybrid approaches. We provide a survey of various ensemble classifiers for learning in data stream mining. Finally, we compare their accuracy, memory use, and time on synthetic and real datasets with different drift scenarios.
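The block-based idea can be illustrated with a minimal Python sketch. It assumes a toy nearest-centroid base learner and a simple weighting rule: every existing member is re-weighted by its accuracy on the newest block, and the member trained on that block gets weight 1.0. `NearestCentroid`, `BlockEnsemble`, and `max_members` are hypothetical names, and the sketch is not the method of any specific surveyed algorithm.

```python
import math
from collections import Counter, defaultdict


class NearestCentroid:
    """Toy base learner (assumption): predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = defaultdict(lambda: [0.0] * len(X[0]))
        for x, label in zip(X, y):
            for i, v in enumerate(x):
                sums[label][i] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))


class BlockEnsemble:
    """Hypothetical block-based ensemble: one new member per block of data."""

    def __init__(self, max_members=5):
        self.max_members = max_members
        self.members = []  # list of [classifier, weight] pairs

    @staticmethod
    def _accuracy(clf, X, y):
        return sum(clf.predict(x) == t for x, t in zip(X, y)) / len(y)

    def process_block(self, X, y):
        # 1. Re-weight existing members by their accuracy on the newest block.
        for member in self.members:
            member[1] = self._accuracy(member[0], X, y)
        # 2. Train a new member on the newest block (assumed weight 1.0).
        self.members.append([NearestCentroid().fit(X, y), 1.0])
        # 3. Keep only the max_members highest-weighted members.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]

    def predict(self, x):
        # Weighted majority vote over all members.
        votes = defaultdict(float)
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)
```

Re-weighting on the newest block is what lets the ensemble react to drift: if a block arrives whose labels are swapped relative to the previous one (an abrupt drift), the older member scores 0 on it, loses its vote, and the ensemble follows the new member.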
01 Jan 2021
TL;DR: In this article, the authors discuss the importance of preprocessing big data in terms of analysis time, percentage of resources used, storage, efficiency of the analyzed data, and the information gained from the output.
Abstract: Big data is a trending term in industry and academia that refers to the huge flood of collected data, which is very complex in nature. As a term, big data describes many concepts related to data, with both technological and cultural meanings. In the big data community, big data analytics is used to discover the hidden patterns and values that give an accurate representation of the data. Big data preprocessing is considered an important step in the analysis process. It is key to the success of the analysis process in terms of analysis time, percentage of resources used, storage, efficiency of the analyzed data, and the information gained from the output. Preprocessing involves dealing with concept drift and data streams, which are considered significant challenges.
TL;DR: In this article, a hybrid block-based ensemble framework is proposed for multi-class classification in evolving data streams; it integrates the main advantages of an online drift detector for a k-class problem and concept block-based weighting in order to react to different types of drift.
Abstract: Data stream mining is an important research topic that has received increasing attention because of its use in a wide range of applications, such as sensor networks, banking, and telecommunication. The phenomenon of data streams evolving over time is known as concept drift. In addition, the presence of multiple classes aggravates the loss in performance during drift detection in data streams. Several drift detectors and ensemble approaches have been widely employed; however, they either incur a high cost in terms of memory consumption and run time or, in the case of ensemble approaches, may respond slowly because they train classifiers on outdated blocks. Motivated by this, we propose a hybrid block-based ensemble, a framework for multi-class classification in evolving data streams. The multi-class framework aims to integrate the main advantages of an online drift detector for a k-class problem and concept block-based weighting in order to react to different types of drift. Experimental evaluations on well-known synthetic and real-world datasets, including a comprehensive comparison with eleven drift detectors and five ensemble approaches, show that the proposed algorithms perform significantly better than the other drift detectors and ensemble approaches.
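The online drift-detector component of such a hybrid scheme can be sketched in Python. The abstract does not say which detector is used, so this sketch swaps in a simplified form of the classic Drift Detection Method (DDM, Gama et al., 2004), which monitors a single overall error rate rather than k per-class statistics; `DDM`, `min_samples`, and `drift_level` are illustrative names. In a hybrid design, a drift signal could, for instance, close the current block early so that a new ensemble member is trained on post-drift data; that wiring is an assumption, not part of the paper.

```python
import math


class DDM:
    """Simplified Drift Detection Method (after Gama et al., 2004).

    Tracks the online error rate p and its standard deviation
    s = sqrt(p * (1 - p) / n), remembers where p + s was lowest,
    and signals drift when p + s > p_min + drift_level * s_min.
    The warning level of the original method is omitted for brevity.
    """

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0
        self.s = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Feed 1 for a misclassified instance, 0 otherwise; returns True on drift."""
        self.n += 1
        self.p += (error - self.p) / self.n  # running mean of the error indicator
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return False  # too few samples for reliable statistics
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start fresh statistics for the new concept
            return True
        return False
```

On a stream with a steady 10% error rate followed by a run of errors, the detector stays silent during the stable phase and fires shortly after the error rate jumps. Because DDM keeps only a handful of counters, its memory cost is constant, which is relevant to the memory and run-time concerns raised in the abstract.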
01 Jan 2021
TL;DR: This paper discusses data stream classification algorithms and simulates them on real and synthetic datasets to understand the performance parameters of the discussed algorithms.
Abstract: Data stream mining has emerged as a new field of research in the past few years. It has recently gained a lot of attention because of its challenging characteristics, such as its dynamic nature, huge data size, continuous flow, and temporal nature. Processing and classifying this type of data raises many issues in terms of both storage and analysis. Moreover, existing traditional classification algorithms do not fit data streams well, as they operate on data that is stored in memory once and for all. If mined, data streams can yield crucial information about the non-stationary systems that generate them. Storing data streams is also not feasible, as storage cost increases with data size. Instead, algorithms designed for data streams should be incremental and multi-pass, so that they can deal with new data while analyzing existing data at the same time. Data stream classification aims to label data, which is nearly impossible to do in real life because the characteristics of the data pose challenges. Traditional data mining algorithms fit a limited number of instances, and such models do not work with data streams. In this paper, we discuss data stream classification algorithms and simulate them with real and synthetic datasets to understand the performance parameters of the discussed algorithms.
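Incremental learning in the sense above can be sketched in Python with a classifier that updates per-class running means one instance at a time, so memory does not grow with the length of the stream and no instance has to be stored. This nearest-centroid learner and the names `IncrementalCentroid`, `learn_one`, and `predict_one` are illustrative assumptions, not algorithms from the paper.

```python
import math
from collections import defaultdict


class IncrementalCentroid:
    """Nearest-centroid classifier updated one instance at a time (illustrative).

    Memory is O(classes * features), independent of how many instances
    have been seen; each instance updates the model and can then be discarded.
    """

    def __init__(self):
        self.counts = defaultdict(int)  # instances seen per class
        self.means = {}                 # running mean vector per class

    def learn_one(self, x, y):
        self.counts[y] += 1
        n = self.counts[y]
        if y not in self.means:
            self.means[y] = list(x)
        else:
            m = self.means[y]
            for i, v in enumerate(x):
                m[i] += (v - m[i]) / n  # incremental mean update

    def predict_one(self, x):
        if not self.means:
            return None  # nothing learned yet
        return min(self.means, key=lambda c: math.dist(x, self.means[c]))
```

The running-mean update gives the same centroid as averaging all stored instances, but it needs only the current mean and a count per class, which is what makes it suitable for a continuous flow of data.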
TL;DR: This survey explores, summarizes, and categorizes work within the domain of stream classification and identifies the core research threads of the past few years; it is structured around the stream classification process to facilitate coordination within this complex topic.
Abstract: Due to the rise of continuous data-generating applications, analyzing data streams has gained increasing attention over the past decades. A core research area in stream data is stream classification, which categorizes or detects data points within an evolving stream of observations. Application areas of stream classification are diverse, ranging, e.g., from monitoring sensor data to analyzing a wide range of (social) media applications. Research in stream classification is concerned with developing methods that adapt to the changing and potentially volatile data stream. It focuses on individual aspects of the stream classification pipeline, e.g., designing suitable algorithm architectures, efficient training and testing procedures, or detecting so-called concept drifts. Because of the many different research questions and strands, the field is challenging to grasp, especially for beginners. This survey explores, summarizes, and categorizes work within the domain of stream classification and identifies core research threads over the past few years. It is structured around the stream classification process to facilitate coordination within this complex topic, including common application scenarios and benchmarking data sets. Thus, both newcomers to the field and experts who want to widen their scope can gain (additional) insight into this research area and find starting points and pointers to more in-depth literature on specific issues and research directions in the field.
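One widely used train-and-test procedure in stream classification is prequential (interleaved test-then-train) evaluation. The Python sketch below assumes a toy majority-class learner and reports accuracy over a sliding window of recent predictions; `MajorityClass`, `prequential`, and `window` are hypothetical names chosen for illustration, not taken from the survey.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy learner (assumption): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] += 1


def prequential(model, stream, window=100):
    """Interleaved test-then-train evaluation.

    Each (x, y) pair is first used to test the current model and only then
    to train it. Returns, after every instance, the accuracy over the last
    `window` predictions.
    """
    recent = deque(maxlen=window)  # 1/0 outcomes of the most recent predictions
    curve = []
    for x, y in stream:
        recent.append(model.predict_one(x) == y)  # test first
        model.learn_one(x, y)                      # then train
        curve.append(sum(recent) / len(recent))
    return curve
```

Testing each instance before training on it means every instance serves as unseen test data exactly once, and the windowed accuracy makes the effect of a concept drift visible: on a stream whose label switches from "a" to "b", the windowed accuracy falls to 0 shortly after the switch and recovers only once the learner's counts favor "b".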