
Concept drift

About: Concept drift is a research topic. Over its lifetime, 2,304 publications have been published within this topic, receiving 53,287 citations. The topic is also known as: data drift.


Papers
Journal Article
TL;DR: In this article, an adaptive learning algorithm-based framework, Twitter Sentiment Drift Analysis-Bidirectional Encoder Representations from Transformers (TSDA-BERT), is proposed to detect and handle sentiment drifts in real-time data streams.
Abstract: Handling sentiment drifts in real-time Twitter data streams is a challenging task in sentiment classification because the sentiments of Twitter users change over time. The growing volume of tweets with sentiment drifts has led to the need for an adaptive approach that detects and handles this drift in real time. This work proposes an adaptive learning algorithm-based framework, Twitter Sentiment Drift Analysis-Bidirectional Encoder Representations from Transformers (TSDA-BERT), which introduces a sentiment drift measure to detect drifts and a domain impact score to adaptively retrain the classification model with domain-relevant data in real time. The framework also works on static data by converting them to data streams using the Kafka tool. Experiments conducted on real-time and simulated tweets on sports, health care and financial topics show that the proposed system is able to detect sentiment drifts and maintain the performance of the classification model, with accuracies of 91%, 87% and 90%, respectively. Though results are provided for only a few topics, as a proof of concept, the framework can be applied to detect sentiment drifts and perform sentiment classification on real-time data streams of any topic.
Book Chapter
01 Jan 2022
TL;DR: In this paper, the authors analyzed wind energy generation data extracted from the Sistema de Información del Operador del Sistema (ESIOS) of the Spanish power grid and performed a study to evaluate detecting concept drifts to retrain models and thus improve the quality of forecasting.
Abstract: Most of the current data sources generate large amounts of data over time. Renewable energy generation is one example of such data sources. Machine learning is often applied to forecast time series. Since data flows are usually large, trends in data may change and learned patterns might not be optimal for the most recent data. In this paper, we analyse wind energy generation data extracted from the Sistema de Información del Operador del Sistema (ESIOS) of the Spanish power grid. We perform a study to evaluate detecting concept drifts to retrain models and thus improve the quality of forecasting. To this end, we compare the performance of a linear regression model when it is retrained randomly and when a concept drift is detected, respectively. Our experiments show that a concept drift approach improves forecasting by between 7.88% and 33.97%, depending on the concept drift technique applied.
Keywords: Machine learning; Concept drift detection; Data streaming; Time series; Wind energy forecasting
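The drift-triggered retraining described in the abstract above needs a detector that watches the forecast error and signals when its level shifts. The abstract does not name the detectors that were compared, so the following is only a minimal sketch that assumes a Page-Hinkley test applied to absolute forecast errors. The class name, the helper `first_drift_index`, and the parameter values are illustrative choices, not taken from the paper.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    Assumption: the stream holds absolute forecast errors, so an upward
    shift means the model has started to fit the data worse.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0     # running mean of the values seen so far
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one value; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_drift_index(errors, detector):
    """Return the index of the first alarm, or None if there is none."""
    for i, e in enumerate(errors):
        if detector.update(e):
            return i
    return None


if __name__ == "__main__":
    # Synthetic errors: small for 100 steps, then a sustained jump.
    errors = [0.1] * 100 + [2.0] * 50
    print(first_drift_index(errors, PageHinkley()))
```

In a retraining loop, an alarm would be the point at which the model is refitted on recent data and the detector is reset with `reset()`; the random-retraining baseline would instead refit at arbitrary steps.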
Journal Article
TL;DR: In this paper, an ensemble learning framework called Ensemble-based Streaming Outlier Detection (ESOD) is presented to detect outliers over streaming data using a sliding window technique that is updated in response to the incoming events from the data streaming environment.
Abstract: In the last few years, data streams have drawn considerable attention from researchers due to their various applications, such as healthcare monitoring systems, fraud and intrusion detection, the internet of things (IoT), and financial market applications. A data stream is an unbounded sequence of data continually generated over time and is prone to evolution. Outliers in streaming data are the elements that deviate significantly from the majority of elements and must therefore be detected, as they may be error values or events of interest. Detection of outliers is a challenging issue in streaming data and is one of the most crucial tasks in data stream mining. Existing outlier detection methods for static data are unsuitable for use in data stream settings due to the unique characteristics of streaming data such as unpredictability, uncertainty, high-dimensionality, and changes in data distribution. Thus, in this paper, a novel ensemble learning framework called Ensemble-based Streaming Outlier Detection (ESOD) is presented to perfectly detect outliers over streaming data using a sliding window technique that is updated in response to the incoming events from the data streaming environment to overcome the concept evolution nature of streaming data. The proposed framework has three phases, namely the training phase, testing/offline phase, and outlier detection/online phase. A detection weighted vote technique is used to determine the final decisions for potential outliers. In the extensive experimental study, which was conducted on 11 real-world benchmark datasets, the proposed framework was assessed using many accuracy metrics. The experimental results showed that the proposed framework beats many other state-of-the-art methods.
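The combination of a sliding window and a weighted vote mentioned in the abstract above can be illustrated with a small sketch. This is not the ESOD framework itself: the two base detectors (a z-score rule and a median-absolute-deviation rule), their weights, and the function names are assumptions chosen only to show the mechanism on a one-dimensional stream.

```python
from collections import deque
from statistics import mean, median, pstdev


def zscore_vote(window, x, k=3.0):
    """Vote 'outlier' if x is more than k standard deviations from the mean."""
    sd = pstdev(window)
    return sd > 0 and abs(x - mean(window)) > k * sd


def mad_vote(window, x, k=3.5):
    """Vote 'outlier' using the median absolute deviation (robust scale)."""
    med = median(window)
    mad = median(abs(v - med) for v in window)
    return mad > 0 and abs(x - med) > k * 1.4826 * mad


def detect_outliers(stream, window_size=20,
                    detectors=((zscore_vote, 0.6), (mad_vote, 0.4)),
                    vote_threshold=0.5):
    """Return indices whose weighted vote reaches vote_threshold."""
    window = deque(maxlen=window_size)
    flagged = []
    for i, x in enumerate(stream):
        if len(window) == window_size:
            # Sum the weights of the detectors that vote 'outlier'.
            score = sum(w for det, w in detectors if det(window, x))
            if score >= vote_threshold:
                flagged.append(i)
                continue  # keep flagged points out of the reference window
        window.append(x)  # the window slides as normal points arrive
    return flagged


if __name__ == "__main__":
    stream = [10.0 + 0.1 * (i % 5) for i in range(50)]
    stream[30] = 50.0  # injected outlier
    print(detect_outliers(stream))
```

Because the window holds only recent, unflagged points, the reference statistics follow gradual changes in the stream instead of staying fixed to the initial data.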
Posted Content
26 May 2023
TL;DR: The authors propose three dimensions of linguistic dataset drift: vocabulary, structural, and semantic drift, which correspond to content word frequency divergences, syntactic divergences, and meaning changes not captured by word frequencies (e.g. lexical semantic change).
Abstract: NLP models often degrade in performance when real world data distributions differ markedly from training data. However, existing dataset drift metrics in NLP have generally not considered specific dimensions of linguistic drift that affect model performance, and they have not been validated in their ability to predict model performance at the individual example level, where such metrics are often used in practice. In this paper, we propose three dimensions of linguistic dataset drift: vocabulary, structural, and semantic drift. These dimensions correspond to content word frequency divergences, syntactic divergences, and meaning changes not captured by word frequencies (e.g. lexical semantic change). We propose interpretable metrics for all three drift dimensions, and we modify past performance prediction methods to predict model performance at both the example and dataset level for English sentiment classification and natural language inference. We find that our drift metrics are more effective than previous metrics at predicting out-of-domain model accuracies (mean 16.8% root mean square error decrease), particularly when compared to popular fine-tuned embedding distances (mean 47.7% error decrease). Fine-tuned embedding distances are much more effective at ranking individual examples by expected performance, but decomposing into vocabulary, structural, and semantic drift produces the best example rankings of all considered model-agnostic drift metrics (mean 6.7% ROC AUC increase).
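As a rough illustration of the vocabulary dimension described above (divergence between word frequency distributions), the sketch below compares two text collections with the Jensen-Shannon divergence. The abstract does not specify the exact metric here, so the choice of divergence, the whitespace tokenization, the lack of content-word filtering, and the function names are all assumptions.

```python
import math
from collections import Counter


def word_distribution(texts):
    """Relative word frequencies over a non-empty list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if disjoint."""
    vocab = set(p) | set(q)
    # Mixture distribution halfway between p and q.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(pa * math.log2(pa / m[w]) for w, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


if __name__ == "__main__":
    train = ["the movie was great", "great acting and a great plot"]
    new = ["the battery life was poor", "poor screen and a weak battery"]
    drift = js_divergence(word_distribution(train), word_distribution(new))
    print(round(drift, 3))
```

A value near 0 suggests the new text uses words much as the training text does, while a value near 1 suggests almost no shared vocabulary.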

Network Information
Related Topics (5)
Support vector machine: 73.6K papers, 1.7M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 85% related
Fuzzy logic: 151.2K papers, 2.3M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    140
2022    313
2021    276
2020    323
2019    246
2018    209