Web Traffic Time Series Forecasting Using LSTM Neural Networks with Distributed Asynchronous Training

doi:10.3390/MATH9040421

Open Access, Journal Article

Roberto Casado-Vara, +4 more

- Vol. 9, Iss: 4, pp 421

TLDR

An architecture is introduced that collects source data and forecasts the time series of Wikipedia page views in a supervised way, representing a significant step forward in the field of time series prediction for web traffic forecasting.

Abstract:

Evaluating web traffic on a web server is critical for web service providers: without a proper demand forecast, customers may face lengthy waiting times and abandon the website. However, this is a challenging task, since it requires making reliable predictions about arbitrary human behavior. We introduce an architecture that collects source data and forecasts the page-view time series in a supervised way. Starting from the Wikipedia page views dataset released for a Kaggle competition in 2017, we created an updated version covering the years 2018–2020. This dataset is processed, and the features and hidden patterns in the data are extracted to inform the design of a Long Short-Term Memory (LSTM) network, an advanced type of recurrent neural network. This AI model is trained in a distributed fashion, following the data parallelism paradigm and using the Downpour training strategy. Predictions for the seven dominant languages in the dataset are accurate, with the loss function and the measurement error in reasonable ranges. Although the analyzed time series show fairly weak patterns of seasonality and trend, the predictions are quite good, which indicates that analyzing the hidden patterns and extracting features before designing the AI model enhances its accuracy. In addition, distributed training improves the model's accuracy remarkably. Since predicting web traffic as precisely as possible requires large datasets, we designed the forecasting system to be accurate despite the limited data available. We tested the proposed model on the new Wikipedia page views dataset we created and obtained highly accurate predictions: on average, the mean absolute error of the predictions with respect to the original data is below 30. This represents a significant step forward in the field of time series prediction for web traffic forecasting.
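
To make the training scheme named in the abstract more concrete, the following is a minimal sketch of data-parallel training in the Downpour style: each worker holds its own data shard and a possibly stale copy of the parameters, periodically pulls fresh parameters from a parameter server, and periodically pushes accumulated gradients back. It is not the authors' implementation. A toy linear model stands in for the LSTM, asynchrony is only simulated by randomly interleaving workers in a single thread, and all names and values (`ParameterServer`, `Worker`, `n_fetch`, `n_push`, the learning rate, the synthetic data) are assumptions chosen for illustration. The mean absolute error at the end mirrors the metric reported in the abstract, computed here on the toy data only.

```python
import random


def make_shards(n_workers, n_points, seed=0):
    """Toy data y = 3x + 2 + noise, split across workers (data parallelism)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + 2.0 + rng.gauss(0.0, 0.05)))
    return [data[i::n_workers] for i in range(n_workers)]


class ParameterServer:
    """Holds the shared parameters and applies gradients as they arrive."""

    def __init__(self, lr):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def pull(self):
        return self.w, self.b

    def push(self, grad_w, grad_b):
        self.w -= self.lr * grad_w
        self.b -= self.lr * grad_b


class Worker:
    """Model replica with its own shard and a possibly stale parameter copy."""

    def __init__(self, shard, server, n_fetch, n_push):
        self.shard, self.server = shard, server
        self.n_fetch, self.n_push = n_fetch, n_push
        self.step_count = 0
        self.w, self.b = server.pull()
        self.acc_w = self.acc_b = 0.0

    def step(self, rng):
        # Refresh the local parameters only every n_fetch steps.
        if self.step_count % self.n_fetch == 0:
            self.w, self.b = self.server.pull()
        x, y = rng.choice(self.shard)
        err = (self.w * x + self.b) - y  # gradient of 0.5 * err**2
        self.acc_w += err * x
        self.acc_b += err
        self.step_count += 1
        # Send the accumulated gradients only every n_push steps.
        if self.step_count % self.n_push == 0:
            self.server.push(self.acc_w, self.acc_b)
            self.acc_w = self.acc_b = 0.0


def train(n_workers=4, n_steps=4000, lr=0.02, n_fetch=5, n_push=5, seed=0):
    rng = random.Random(seed + 1)
    shards = make_shards(n_workers, 400, seed)
    server = ParameterServer(lr)
    workers = [Worker(s, server, n_fetch, n_push) for s in shards]
    for _ in range(n_steps):
        # Random interleaving stands in for truly asynchronous workers.
        rng.choice(workers).step(rng)
    return server, shards


def mean_absolute_error(server, shards):
    w, b = server.pull()
    errors = [abs(w * x + b - y) for shard in shards for x, y in shard]
    return sum(errors) / len(errors)


server, shards = train()
mae = mean_absolute_error(server, shards)
print(f"w={server.w:.3f} b={server.b:.3f} MAE={mae:.4f}")
```

Because workers compute gradients against parameters that other workers may already have changed, the updates are slightly stale. The sketch keeps the learning rate small so that this staleness does not destabilize training. In a real deployment the workers would run as separate processes or machines and the model would be the LSTM described above.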
