Journal ArticleDOI

Automation of the process of selecting hyperparameters for artificial neural networks for processing retrospective text information

01 Sep 2020 - Vol. 577, Iss. 1, p. 012012
About: The article was published on 2020-09-01 and is currently open access. It has received 11 citations to date. The article focuses on the topic: Artificial neural network.
Citations
DOI
10 Mar 2022
TL;DR: A thermospheric neutral mass density model with robust and reliable uncertainty estimates is developed based on the Space Environment Technologies (SET) High Accuracy Satellite Drag Model (HASDM) density database, and a storm‐time comparison shows that HASDM‐ML also supplies meaningful uncertainty estimates during extreme geomagnetic events.
Abstract: A thermospheric neutral mass density model with robust and reliable uncertainty estimates is developed based on the Space Environment Technologies (SET) High Accuracy Satellite Drag Model (HASDM) density database. This database, created by SET, contains 20 years of outputs from the U.S. Space Force's HASDM, which currently represents the state of the art for density and drag modeling. We utilize principal component analysis for dimensionality reduction, which creates the coefficients upon which nonlinear machine‐learned (ML) regression models are trained. These models use three unique loss functions: Mean square error (MSE), negative logarithm of predictive density (NLPD), and continuous ranked probability score. Three input sets are also tested, showing improved performance when introducing time histories for geomagnetic indices. These models leverage Monte Carlo dropout to provide uncertainty estimates, and the use of the NLPD loss function results in well‐calibrated uncertainty estimates while only increasing error by 0.25% (<10% mean absolute error) relative to MSE. By comparing the best HASDM‐ML model to the HASDM database along satellite orbits, we found that the model provides robust and reliable density uncertainties over diverse space weather conditions. A storm‐time comparison shows that HASDM‐ML also supplies meaningful uncertainty estimates during extreme geomagnetic events.

13 citations
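A minimal sketch of the pipeline this entry describes may help: principal component analysis compresses the density database into a few coefficients, a small feed-forward network regresses those coefficients from space-weather drivers, and Monte Carlo dropout supplies an uncertainty estimate. The layer sizes, dropout rate, synthetic data, and number of stochastic passes below are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch (assumed shapes and sizes): PCA coefficients as targets,
# a dropout-equipped regressor, and Monte Carlo dropout at prediction time.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

densities = np.random.rand(1000, 500)            # placeholder gridded density snapshots
drivers = np.random.rand(1000, 8)                # placeholder space-weather drivers
coeffs = PCA(n_components=10).fit_transform(densities)   # regression targets

model = nn.Sequential(
    nn.Linear(drivers.shape[1], 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, coeffs.shape[1]),
)
# ... training with an MSE or NLPD-style loss would go here ...

def mc_dropout_predict(x, n_samples=100):
    """Average many stochastic forward passes with dropout left active."""
    model.train()                                # keeps the dropout layers stochastic
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean(dim=0), draws.std(dim=0)   # predictive mean and spread

mean, sigma = mc_dropout_predict(torch.tensor(drivers[:5], dtype=torch.float32))
```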

Journal ArticleDOI
TL;DR: This study applied seven popular machine learning and deep learning algorithms, including Naïve Bayes, Support Vector Machine, Random Forest, XGBoost, Multilayer Perceptron, Transformer Neural Network, and stacking and voting ensemble models to build a customized classification model for vaping-related tweets.
Abstract: There are increasingly strict regulations surrounding the purchase and use of combustible tobacco products (i.e., cigarettes); simultaneously, the use of other tobacco products, including e-cigarettes (i.e., vaping products), has dramatically increased. However, public attitudes toward vaping vary widely, and the health effects of vaping are still largely unknown. As a popular social media platform, Twitter contains rich information shared by users about their behaviors and experiences, including opinions on vaping. Manually identifying vaping-related tweets to source useful information is very challenging. In the current study, we proposed to develop a detection model to accurately identify vaping-related tweets using machine learning and deep learning methods. Specifically, we applied seven popular machine learning and deep learning algorithms, including Naïve Bayes, Support Vector Machine, Random Forest, XGBoost, Multilayer Perceptron, Transformer Neural Network, and stacking and voting ensemble models to build our customized classification model. We extracted a set of sample tweets during an outbreak of e-cigarette or vaping-related lung injury (EVALI) in 2019 and created an annotated corpus to train and evaluate these models. After comparing the performance of each model, we found that the stacking ensemble model achieved the highest performance with an F1-score of 0.97. All models could achieve an F1-score of 0.90 or higher after hyperparameter tuning. The ensemble learning model has the best average performance. Our study findings provide informative guidelines and practical implications for the automated detection of themed social media data for public opinions and health surveillance purposes.

2 citations
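As one way to picture the stacking ensemble described above, the sketch below combines a few scikit-learn base learners over TF-IDF features with a logistic-regression meta-learner. The toy tweets, the choice of base learners, and the feature extraction are assumptions made for illustration; the study's full setup also included XGBoost, a multilayer perceptron, and a Transformer network.

```python
# Illustrative stacking ensemble for vaping-related tweet detection (assumed
# features and base learners; the toy corpus exists only to make it runnable).
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "trying to quit vaping this month",
    "new vape juice flavors just dropped",
    "traffic was terrible on the way home",
    "watching the game with friends tonight",
]
labels = [1, 1, 0, 0]                      # 1 = vaping-related, 0 = unrelated

base_learners = [
    ("nb", MultinomialNB()),
    ("svm", LinearSVC()),
    ("rf", RandomForestClassifier(n_estimators=200)),
]

# TF-IDF features feed the base learners; a logistic-regression meta-learner
# combines their outputs, which is the stacking step of the ensemble.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression(), cv=2),
)
clf.fit(tweets, labels)
print(clf.predict(["anyone else switch to e-cigs?"]))
```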

Journal ArticleDOI
01 Jun 2021
TL;DR: This study uses 2D core CT scan image slices to train a convolutional neural network that automatically predicts the lithology of a well on the Norwegian continental shelf, and then identifies and merges similar lithofacies classes through ad hoc analysis that considers the degree of confusion in the prediction confusion matrix, aided by porosity–permeability cross-plot relationships.
Abstract: X-ray computerized tomography (CT) images as digital representations of whole cores can provide valuable information on the composition and internal structure of cores extracted from wells. Incorporation of millimeter-scale core CT data into lithology classification workflows can result in high-resolution lithology description. In this study, we use 2D core CT scan image slices to train a convolutional neural network (CNN) whose purpose is to automatically predict the lithology of a well on the Norwegian continental shelf. The images are preprocessed prior to training, i.e., undesired artefacts are automatically flagged and removed from further analysis. The training data include expert-derived lithofacies classes obtained by manual core description. The trained classifier is used to predict lithofacies on a set of test images that are unseen by the classifier. The prediction results reveal that distinct classes are predicted with high recall (up to 92%). However, there are misclassification rates associated with similarities in gray-scale values and transport properties. To postprocess the acquired results, we identified and merged similar lithofacies classes through ad hoc analysis considering the degree of confusion from the prediction confusion matrix and aided by porosity–permeability cross-plot relationships. Based on this analysis, the lithofacies classes are merged into four rock classes. Another CNN classifier trained on the resulting rock classes generalizes well, with higher pixel-wise precision when detecting thin layers and bed boundaries compared to the manual core description. Thus, the classifier provides additional and complementary information to the already existing rock type description.

2 citations
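A minimal sketch of a CNN of the kind this abstract describes is shown below: grayscale core-CT slices in, class scores for the merged rock classes out. The input resolution, channel counts, and number of classes are assumptions used only to make the example self-contained.

```python
# Illustrative CNN classifier for 2D core-CT slices (assumed architecture).
import torch
import torch.nn as nn

NUM_CLASSES = 4          # the merged rock classes mentioned in the abstract

class CoreCTClassifier(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                      # x: (batch, 1, H, W) grayscale CT slices
        return self.classifier(self.features(x).flatten(1))

model = CoreCTClassifier()
logits = model(torch.randn(8, 1, 128, 128))    # a batch of dummy 128x128 slices
print(logits.shape)                            # torch.Size([8, 4])
```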

DOI
TL;DR: This work uses a combination of environmental and experimental data, such as atmospheric pressure, gas temperature, and the flux of incident particles, as inputs to a sequential Neural Network to recommend a high voltage setting and the corresponding calibration constants in order to maintain consistent gain and optimal resolution throughout the experiment.
Abstract: The AI for Experimental Controls project is developing an AI system to control and calibrate detector systems located at Jefferson Laboratory. Currently, calibrations are performed offline and require significant time and attention from experts. This work would reduce the amount of data and the amount of time spent calibrating in an offline setting. The first use case involves the Central Drift Chamber (CDC) located inside the GlueX spectrometer in Hall D. We use a combination of environmental and experimental data, such as atmospheric pressure, gas temperature, and the flux of incident particles, as inputs to a sequential Neural Network (NN) to recommend a high voltage setting and the corresponding calibration constants in order to maintain consistent gain and optimal resolution throughout the experiment. Utilizing AI in this manner represents an initial shift from offline calibration towards near real-time calibrations performed at Jefferson Laboratory.
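A rough sketch of the kind of regression network the abstract describes follows: environmental and experimental readings in, a recommended high-voltage setting and calibration constant out. The feature list, layer sizes, synthetic data, and paired outputs are assumptions, not the project's actual configuration.

```python
# Illustrative sequential network mapping detector conditions to a recommended
# high-voltage setting and a calibration constant (all values synthetic).
import torch
import torch.nn as nn

# features: [atmospheric pressure, gas temperature, incident-particle flux]
inputs = torch.randn(256, 3)
targets = torch.randn(256, 2)          # [high-voltage setting, calibration constant]

model = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                   # toy training loop on synthetic data
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

recommended = model(torch.tensor([[1013.2, 24.5, 0.7]]))   # one new set of readings
```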
References
Journal ArticleDOI
28 May 2015 - Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Posted Content
TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs with CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment, from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

12,531 citations

Journal Article
TL;DR: A novel algorithm, Hyperband, is introduced for hyperparameter optimization, formulated as a pure-exploration non-stochastic infinite-armed bandit problem in which a predefined resource, such as iterations, data samples, or features, is allocated to randomly sampled configurations.
Abstract: Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration nonstochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.

683 citations
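Since the present article concerns automated hyperparameter selection, a compact sketch of the Hyperband procedure summarized above may be useful. The callables `sample_config` and `run_then_return_loss` are placeholder names loosely following the paper's pseudocode and must be supplied by the caller; the toy objective at the end is purely illustrative, while `R` and `eta` follow the paper's notation.

```python
# Sketch of Hyperband: brackets of successive halving that trade off the number
# of configurations against the budget each one receives. Placeholder callables
# and the toy objective are assumptions; R and eta follow the paper's notation.
import math
import random


def hyperband(sample_config, run_then_return_loss, R=81, eta=3):
    """Return the best (loss, config) pair found across all brackets."""
    s_max = int(math.log(R, eta) + 1e-9)                     # floor of log_eta(R)
    best = (float("inf"), None)
    for s in range(s_max, -1, -1):                            # one bracket per value of s
        n = int(math.ceil((s_max + 1) * eta ** s / (s + 1)))  # initial number of configs
        r = R * eta ** (-s)                                   # initial budget per config
        configs = [sample_config() for _ in range(n)]
        for i in range(s + 1):                                # successive-halving rungs
            n_i = int(n * eta ** (-i))
            r_i = r * eta ** i
            losses = [run_then_return_loss(c, r_i) for c in configs]
            scored = sorted(zip(losses, configs), key=lambda t: t[0])
            if scored[0][0] < best[0]:
                best = scored[0]
            configs = [c for _, c in scored[: max(1, n_i // eta)]]  # keep the top 1/eta
    return best


# Toy usage: tune a single "learning rate" against a synthetic objective.
best_loss, best_cfg = hyperband(
    sample_config=lambda: {"lr": 10 ** random.uniform(-4, -1)},
    run_then_return_loss=lambda cfg, budget: (cfg["lr"] - 0.01) ** 2 + 1.0 / budget,
)
print(best_loss, best_cfg)
```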

Journal ArticleDOI
TL;DR: In this article, the LSTM-RNN model was used for sentence embedding in a web search engine, and the results showed that the proposed method significantly outperformed the Paragraph Vector method for the web document retrieval task.
Abstract: This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long-term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate the unimportant words and detect the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state-of-the-art methods. We emphasize that the proposed model generates sentence embedding vectors that are especially useful for web document retrieval tasks. A comparison with a well-known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method in this paper significantly outperforms the Paragraph Vector method for the web document retrieval task.

659 citations
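To make the mechanism concrete, the sketch below feeds token ids through an LSTM and takes the final hidden state as the sentence embedding, so query/document similarity reduces to cosine similarity between vectors. The vocabulary size, dimensions, and random toy inputs are assumptions; the original model was trained in a weakly supervised manner on click-through data.

```python
# Illustrative sentence embedding: the LSTM's last hidden state represents the
# whole sentence, and retrieval compares embeddings with cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10_000, 128, 256

embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)

def sentence_embedding(token_ids):
    """Run the tokens through the LSTM and return the last hidden state."""
    x = embedding(token_ids)                   # (batch, seq_len, EMBED_DIM)
    _, (h_n, _) = lstm(x)                      # h_n: (1, batch, HIDDEN_DIM)
    return h_n.squeeze(0)                      # (batch, HIDDEN_DIM)

query = torch.randint(0, VOCAB_SIZE, (1, 6))   # toy token ids for a query
doc = torch.randint(0, VOCAB_SIZE, (1, 12))    # toy token ids for a document
similarity = F.cosine_similarity(sentence_embedding(query), sentence_embedding(doc))
print(similarity.item())
```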