Journal ArticleDOI

Boreholes Data Analysis Architecture Based on Clustering and Prediction Models for Enhancing Underground Safety Verification

25 May 2021-IEEE Access (Institute of Electrical and Electronics Engineers (IEEE))-Vol. 9, pp 78428-78451
TL;DR: In this article, the authors present a new solution for processing and analyzing borehole data to monitor digging operations and identify borehole shortcomings, and develop a bi-directional long short-term memory (BD-LSTM) model to predict borehole depth and thereby minimize the cost and time of digging operations.
Abstract: During the last decade, substantial resources have been invested to exploit the massive amounts of borehole data collected through groundwater extraction. Furthermore, borehole depth can be considered one of the crucial factors in borehole digging efficiency. Therefore, a new solution is needed to process and analyze borehole data to monitor digging operations and identify borehole shortcomings. This research study presents a borehole data analysis architecture based on data and predictive analysis models to improve borehole efficiency, underground safety verification, and risk evaluation. The proposed architecture aims to process and analyze borehole data based on different hydrogeological characteristics, using data and predictive analytics to enhance underground safety verification and the planning of borehole resources. The architecture comprises two modules: a descriptive data analysis module and a predictive analysis module. The descriptive analysis utilizes data and clustering analysis techniques to process and extract hidden hydrogeological characteristics from borehole history data. The predictive analysis develops a bi-directional long short-term memory (BD-LSTM) network to predict borehole depth and thereby minimize the cost and time of digging operations. Furthermore, different performance measures are utilized to evaluate the proposed clustering and regression models, and the proposed BD-LSTM model is compared with conventional machine learning (ML) regression models. The $R^{2}$ score of the proposed BD-LSTM is 0.989, which indicates that the proposed model predicts borehole depth more accurately and precisely than the conventional regression models. The experimental and comparative analysis results reveal the significance and effectiveness of the proposed borehole data analysis architecture and will improve underground safety management and the efficiency of boreholes for future wells.
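To make the prediction step concrete, the following is a minimal sketch of a bidirectional LSTM regressor for borehole depth written with Keras; the layer sizes, sequence length, feature count, and training data are illustrative assumptions, not the configuration reported in the article.

```python
# Hypothetical sketch of a bidirectional LSTM (BD-LSTM) depth regressor.
# Input shape, layer sizes, and the random training data are assumptions,
# not the authors' actual preprocessing or hyperparameters.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

TIMESTEPS, FEATURES = 30, 8   # e.g. 30 past drilling records, 8 hydrogeological features (assumed)

model = Sequential([
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(TIMESTEPS, FEATURES)),
    Bidirectional(LSTM(32)),
    Dense(16, activation="relu"),
    Dense(1),                 # predicted borehole depth
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Dummy data standing in for the preprocessed borehole sequences.
X = np.random.rand(256, TIMESTEPS, FEATURES).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```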


Citations
Journal ArticleDOI
24 Aug 2021
TL;DR: In this paper, a robust LSTM-based model for predicting bubble point pressure (Pb) is developed using a global data set of 760 data points collected from different fields worldwide.
Abstract: The bubble point pressure (Pb) is a crucial pressure-volume-temperature (PVT) property and a primary input needed for many petroleum engineering calculations, such as reservoir simulation. The industrial practice for determining Pb is direct measurement from PVT tests or prediction using empirical correlations. The main problems encountered with the published empirical correlations are their lack of accuracy and the non-comprehensive data sets used to develop them. In addition, most of the published correlations have not proven the relationships between the inputs and outputs as part of the validation process (i.e., no trend analysis was conducted). Nowadays, deep learning techniques such as long short-term memory (LSTM) networks have begun to replace empirical correlations because of their high accuracy. This study therefore presents a robust LSTM-based model for predicting Pb, built from a global data set of 760 data points collected from different fields worldwide. The developed model was validated by applying trend analysis, to ensure that the model follows the correct relationships between the inputs and outputs, and by performing statistical analysis in comparison with the most widely published correlations. The robustness and accuracy of the model were further verified through various statistical analyses and with additional data that was not part of the data set used to develop the model. The trend analysis results prove that the proposed LSTM-based model follows the correct relationships, indicating the model's reliability. Furthermore, the statistical analysis shows that the model yields the lowest average absolute percent relative error (AAPRE), 8.422%, and the highest correlation coefficient, 0.99; these values are much better than those given by the most accurate models in the literature.

11 citations
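For reference, the two statistics quoted in this abstract can be computed as in the snippet below; the pressure values are made up for illustration and are not the paper's data set.

```python
# Illustrative computation of AAPRE and the correlation coefficient.
# The arrays are placeholders, not measured bubble point pressures.
import numpy as np

def aapre(measured, predicted):
    """Average absolute percent relative error, in percent."""
    measured, predicted = np.asarray(measured), np.asarray(predicted)
    return float(np.mean(np.abs((measured - predicted) / measured)) * 100)

measured_pb  = np.array([1850.0, 2210.0, 1975.0, 2430.0])   # psia, hypothetical values
predicted_pb = np.array([1802.0, 2290.0, 1940.0, 2500.0])

print("AAPRE: %.3f%%" % aapre(measured_pb, predicted_pb))
print("correlation coefficient: %.4f" % np.corrcoef(measured_pb, predicted_pb)[0, 1])
```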

Journal ArticleDOI
21 Oct 2021-Sensors
TL;DR: In this article, an enhanced PDR-BLE compensation mechanism for improving indoor localization is presented, which exploits the fact that an unscented Kalman filter (UKF) does not require linearization of the system around its current state and uses the UKF together with a Kalman filter (KF) to smooth RSSI values.
Abstract: This paper presents an enhanced PDR-BLE compensation mechanism for improving indoor localization that is considerably resilient against varying uncertainties. The proposed enhanced PDR-BLE compensation mechanism (EPBCM) takes advantage of the fact that an unscented Kalman filter (UKF) does not require linearization of the system around its current state, and uses the UKF and a Kalman filter (KF) to smooth received signal strength indicator (RSSI) values. The approach fuses conflicting information and detects the activity of an object in an indoor environment by considering the varying magnitude of accelerometer values through a hidden Markov model (HMM). On the estimated orientation, the proposed approach compensates for inadvertent body acceleration and magnetic distortion in the sensor data. Moreover, EPBCM can precisely calculate velocity and position by reducing position drift, which otherwise gives rise to zero-velocity faults and heading errors. The developed EPBCM localization algorithm using Bluetooth low energy (BLE) beacons was applied and analyzed in an indoor environment. The experiments, conducted in an indoor scenario covering various activities performed by the object, show that the approach achieves better orientation estimation, zero-velocity measurements, and higher position accuracy than other methods in the literature.

4 citations
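As a rough illustration of the RSSI-smoothing step mentioned above, a one-dimensional Kalman filter can be sketched as follows; the process and measurement noise values are assumptions, and the full EPBCM (PDR fusion, UKF, HMM-based activity detection) is not reproduced here.

```python
# Minimal 1-D Kalman filter smoothing noisy RSSI samples.
# Noise parameters are illustrative assumptions, not values from the paper.
import numpy as np

def kalman_smooth_rssi(rssi, process_var=0.05, meas_var=4.0):
    x, p = rssi[0], 1.0           # initial state estimate and covariance
    smoothed = []
    for z in rssi:
        p += process_var          # predict: RSSI assumed locally constant
        k = p / (p + meas_var)    # Kalman gain
        x += k * (z - x)          # update with the new measurement
        p *= (1.0 - k)
        smoothed.append(x)
    return np.array(smoothed)

raw = np.array([-71, -69, -75, -68, -74, -70, -66, -72], dtype=float)  # dBm
print(kalman_smooth_rssi(raw).round(2))
```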

Journal ArticleDOI
TL;DR: In this article, an L2-weighted K-means clustering algorithm is proposed to estimate the drilling time and depth for different soil materials and land layers, and the proposed clustering scheme is evaluated using widely used metrics such as the Dunn index, Davies-Bouldin index (DBI), silhouette coefficient (SC), and Calinski-Harabasz index (CHI).
Abstract: Recently, groundwater scarcity has accelerated drilling operations worldwide, as drilled boreholes are essential for meeting the need for safe drinking water and achieving long-term sustainable development goals. However, the quest for optimal drilling efficiency is ever continuing. This paper aims to provide valuable insights into borehole drilling data by exploiting the potential of advanced analytics, employing several enhanced cluster analysis techniques to propel drilling efficiency optimization and knowledge discovery. The study proposes an L2-weighted K-means clustering algorithm in which the mean is computed from a transformed, weighted feature space. To verify the effectiveness of the proposed L2-weighted K-means algorithm, a comparative analysis with traditional clustering algorithms was performed to estimate the digging time and depth for different soil materials and land layers. The proposed clustering scheme is evaluated using widely used evaluation metrics such as the Dunn index, Davies-Bouldin index (DBI), silhouette coefficient (SC), and Calinski-Harabasz index (CHI). The study results highlight the significance of the proposed clustering algorithm, as it achieved better clustering results than conventional clustering approaches. Moreover, to facilitate subsequent learning and achieve reliable classification and generalization, feature extraction was performed based on the time interval of the drilling process according to soil material and land layer. The solution groups the extracted features into six different blocks, each corresponding to various characteristics of soil materials and land layers. The extracted features are examined and visualized in point cloud space to analyze water level patterns, depth, and the days required to complete the drilling operations.

3 citations
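Three of the four evaluation metrics named above are available directly in scikit-learn; the sketch below applies them to a stand-in K-means run on synthetic data. The Dunn index and the paper's L2-weighted K-means variant would require custom implementations and are not shown.

```python
# Hedged sketch of the clustering evaluation: silhouette, Davies-Bouldin, and
# Calinski-Harabasz scores on a plain K-means run over placeholder data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             davies_bouldin_score,
                             calinski_harabasz_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))            # placeholder for drilling-log features

labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)

print("Silhouette coefficient: ", silhouette_score(X, labels))
print("Davies-Bouldin index:   ", davies_bouldin_score(X, labels))
print("Calinski-Harabasz index:", calinski_harabasz_score(X, labels))
```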

Journal ArticleDOI
TL;DR: In this paper , an ensemble architecture of machine learning and deep learning is proposed to detect click fraud in online advertisement campaigns, which consists of a Convolutional Neural Network (CNN), and a Bidirectional Long Short-Term Memory network (BiLSTM), while the Random Forest (RF) is used for classification.
Abstract: With the rapid development of online advertising, click fraud has become a serious issue for the internet market. Click fraud is a dishonest attempt to improve a website's profit or deplete an advertiser's budget by clicking on pay-per-click advertisements. For an extended period, this illegal act has posed a threat to industrial sectors. As a result, these businesses hesitate to advertise their items on mobile apps and websites, as numerous groups attempt to take advantage of them. To safely advertise services and products online, a robust mechanism is needed for efficient click fraud detection. To tackle this issue, an ensemble architecture of machine learning and deep learning is proposed to detect click fraud in online advertisement campaigns. The proposed ensemble architecture consists of a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM), which extract hidden features, while a Random Forest (RF) is used for classification. The main objective of the proposed research study is to develop a hybrid DL model for automatic feature extraction from click data, which is then processed by an RF classifier into two classes: fraudulent and non-fraudulent clicks. Furthermore, a preprocessing module is developed that deals with categorical attributes and imbalanced data to enhance the reliability and consistency of the click data. In addition, different evaluation criteria are used to evaluate and compare the performance of the proposed CNN-BiLSTM-RF with ensemble and standalone models. The experimental results indicate that the ensemble architecture achieved an accuracy of 99.19 ± 0.08%, precision of 99.89 ± 0.03%, sensitivity of 98.50 ± 0.11%, F1-score of 99.19 ± 0.08%, and specificity of 99.89 ± 0.03%. Furthermore, the proposed architecture produced superior results compared to other ensemble and conventional models, and it can be used as a safeguard against click fraud in pay-per-click advertising, facilitating the safe and reliable promotion of products.

1 citations
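A hypothetical sketch of the described pipeline, with a CNN-BiLSTM feature extractor feeding a random forest classifier, is given below; layer sizes, sequence length, and feature count are assumptions rather than the paper's settings, and the extractor is left untrained for brevity.

```python
# Hypothetical CNN + BiLSTM feature extractor with a random forest classifier.
# Dimensions and the random click data are assumptions for illustration only.
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Bidirectional, LSTM, Dense
from sklearn.ensemble import RandomForestClassifier

SEQ_LEN, N_FEATURES = 20, 10

inp = Input(shape=(SEQ_LEN, N_FEATURES))
x = Conv1D(32, kernel_size=3, activation="relu")(inp)
x = MaxPooling1D(2)(x)
x = Bidirectional(LSTM(32))(x)
features = Dense(16, activation="relu")(x)       # learned representation
extractor = Model(inp, features)

# Placeholder click data: 0 = non-fraudulent, 1 = fraudulent.
X = np.random.rand(500, SEQ_LEN, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=500)

# Untrained extractor here; a real pipeline would train the deep model first.
deep_features = extractor.predict(X, verbose=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(deep_features, y)
print("Training accuracy:", rf.score(deep_features, y))
```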

References
Journal ArticleDOI
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.

72,897 citations
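The gating and constant-error-carousel mechanism summarized above corresponds to the standard LSTM cell update; the equations below use the modern formulation with a forget gate, which the original 1997 paper did not yet include.

```latex
% Standard (modern) LSTM cell equations; the 1997 formulation omits the forget gate.
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(constant error carousel)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```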

Journal ArticleDOI
TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Abstract: The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

11,201 citations

Journal ArticleDOI
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

9,091 citations
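A minimal sketch of the hybrid DNN-HMM idea described above, assuming illustrative dimensions: a feed-forward network maps a window of acoustic coefficient frames to posterior probabilities over HMM states.

```python
# Feed-forward DNN producing posteriors over HMM states from a context window
# of acoustic coefficients. All dimensions are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

CONTEXT_FRAMES, COEFFS_PER_FRAME, N_HMM_STATES = 11, 40, 2000

dnn = Sequential([
    Dense(1024, activation="relu", input_shape=(CONTEXT_FRAMES * COEFFS_PER_FRAME,)),
    Dense(1024, activation="relu"),
    Dense(N_HMM_STATES, activation="softmax"),   # posterior over HMM states
])
dnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

window = np.random.rand(1, CONTEXT_FRAMES * COEFFS_PER_FRAME).astype("float32")
posteriors = dnn.predict(window, verbose=0)      # shape (1, N_HMM_STATES), sums to 1
print(posteriors.sum())
```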

Journal ArticleDOI
TL;DR: It is suggested that reporting discrimination and calibration will always be important for a prediction model and decision-analytic measures should be reported if the predictive model is to be used for clinical decisions.
Abstract: The performance of prediction models can be assessed using a variety of methods and metrics. Traditional measures for binary and survival outcomes include the Brier score to indicate overall model performance, the concordance (or c) statistic for discriminative ability (or area under the receiver operating characteristic [ROC] curve), and goodness-of-fit statistics for calibration.Several new measures have recently been proposed that can be seen as refinements of discrimination measures, including variants of the c statistic for survival, reclassification tables, net reclassification improvement (NRI), and integrated discrimination improvement (IDI). Moreover, decision-analytic measures have been proposed, including decision curves to plot the net benefit achieved by making decisions based on model predictions.We aimed to define the role of these relatively novel approaches in the evaluation of the performance of prediction models. For illustration, we present a case study of predicting the presence of residual tumor versus benign tissue in patients with testicular cancer (n = 544 for model development, n = 273 for external validation).We suggest that reporting discrimination and calibration will always be important for a prediction model. Decision-analytic measures should be reported if the predictive model is to be used for clinical decisions. Other measures of performance may be warranted in specific applications, such as reclassification metrics to gain insight into the value of adding a novel predictor to an established model.

3,473 citations
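Two of the traditional measures named here, the Brier score for overall performance and the c statistic (area under the ROC curve) for discrimination, can be computed directly with scikit-learn; the labels and predicted risks below are made-up values, not the testicular cancer case study data.

```python
# Overall performance (Brier score) and discrimination (c statistic / ROC AUC)
# for a binary prediction model, computed on placeholder data.
from sklearn.metrics import brier_score_loss, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]                    # observed outcomes (hypothetical)
y_prob = [0.1, 0.3, 0.8, 0.6, 0.2, 0.9, 0.4, 0.7]    # predicted probabilities (hypothetical)

print("Brier score:      ", brier_score_loss(y_true, y_prob))
print("c statistic (AUC):", roc_auc_score(y_true, y_prob))
```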

Journal ArticleDOI
TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted, as is the need to devise new tools for predictive analytics for structured big data.

2,962 citations