Journal ArticleDOI

Automation of the process of selecting hyperparameters for artificial neural networks for processing retrospective text information

01 Sep 2020 - Vol. 577, Iss. 1, p. 012012
About: The article was published on 2020-09-01 and is currently open access. It has received 11 citations to date. The article focuses on the topic: Artificial neural network.
Citations
DOI
10 Mar 2022
TL;DR: A thermospheric neutral mass density model with robust and reliable uncertainty estimates is developed based on the Space Environment Technologies (SET) High Accuracy Satellite Drag Model (HASDM) density database, and a storm‐time comparison shows that HASDM‐ML also supplies meaningful uncertainty estimates during extreme geomagnetic events.
Abstract: A thermospheric neutral mass density model with robust and reliable uncertainty estimates is developed based on the Space Environment Technologies (SET) High Accuracy Satellite Drag Model (HASDM) density database. This database, created by SET, contains 20 years of outputs from the U.S. Space Force's HASDM, which currently represents the state of the art for density and drag modeling. We utilize principal component analysis for dimensionality reduction, which creates the coefficients upon which nonlinear machine‐learned (ML) regression models are trained. These models use three unique loss functions: Mean square error (MSE), negative logarithm of predictive density (NLPD), and continuous ranked probability score. Three input sets are also tested, showing improved performance when introducing time histories for geomagnetic indices. These models leverage Monte Carlo dropout to provide uncertainty estimates, and the use of the NLPD loss function results in well‐calibrated uncertainty estimates while only increasing error by 0.25% (<10% mean absolute error) relative to MSE. By comparing the best HASDM‐ML model to the HASDM database along satellite orbits, we found that the model provides robust and reliable density uncertainties over diverse space weather conditions. A storm‐time comparison shows that HASDM‐ML also supplies meaningful uncertainty estimates during extreme geomagnetic events.

13 citations
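A minimal sketch of the pipeline this entry describes may help: principal component analysis compresses the density database into a few coefficients, a small feed-forward network regresses those coefficients from space-weather drivers, and Monte Carlo dropout supplies an uncertainty estimate. The layer sizes, dropout rate, synthetic data, and number of stochastic passes below are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch (assumed shapes and sizes): PCA coefficients as targets,
# a dropout-equipped regressor, and Monte Carlo dropout at prediction time.
import numpy as np
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

densities = np.random.rand(1000, 500)            # placeholder gridded density snapshots
drivers = np.random.rand(1000, 8)                # placeholder space-weather drivers
coeffs = PCA(n_components=10).fit_transform(densities)   # regression targets

model = nn.Sequential(
    nn.Linear(drivers.shape[1], 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, coeffs.shape[1]),
)
# ... training with an MSE or NLPD-style loss would go here ...

def mc_dropout_predict(x, n_samples=100):
    """Average many stochastic forward passes with dropout left active."""
    model.train()                                # keeps the dropout layers stochastic
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean(dim=0), draws.std(dim=0)   # predictive mean and spread

mean, sigma = mc_dropout_predict(torch.tensor(drivers[:5], dtype=torch.float32))
```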

Journal ArticleDOI
TL;DR: This study applied seven popular machine learning and deep learning algorithms, including Naïve Bayes, Support Vector Machine, Random Forest, XGBoost, Multilayer Perceptron, Transformer Neural Network, and stacking and voting ensemble models to build a customized classification model for vaping-related tweets.
Abstract: There are increasingly strict regulations surrounding the purchase and use of combustible tobacco products (i.e., cigarettes); simultaneously, the use of other tobacco products, including e-cigarettes (i.e., vaping products), has dramatically increased. However, public attitudes toward vaping vary widely, and the health effects of vaping are still largely unknown. As a popular social media platform, Twitter contains rich information shared by users about their behaviors and experiences, including opinions on vaping. Manually identifying vaping-related tweets to source useful information is very challenging. In the current study, we proposed to develop a detection model to accurately identify vaping-related tweets using machine learning and deep learning methods. Specifically, we applied seven popular machine learning and deep learning algorithms, including Naïve Bayes, Support Vector Machine, Random Forest, XGBoost, Multilayer Perceptron, Transformer Neural Network, and stacking and voting ensemble models to build our customized classification model. We extracted a set of sample tweets during an outbreak of e-cigarette or vaping-related lung injury (EVALI) in 2019 and created an annotated corpus to train and evaluate these models. After comparing the performance of each model, we found that the stacking ensemble model achieved the highest performance with an F1-score of 0.97. All models could achieve an F1-score of 0.90 or higher after hyperparameter tuning. The ensemble learning model has the best average performance. Our study findings provide informative guidelines and practical implications for the automated detection of themed social media data for public opinions and health surveillance purposes.

2 citations
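As one way to picture the stacking ensemble described above, the sketch below combines a few scikit-learn base learners over TF-IDF features with a logistic-regression meta-learner. The toy tweets, the choice of base learners, and the feature extraction are assumptions made for illustration; the study's full setup also included XGBoost, a multilayer perceptron, and a Transformer network.

```python
# Illustrative stacking ensemble for vaping-related tweet detection (assumed
# features and base learners; the toy corpus exists only to make it runnable).
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "trying to quit vaping this month",
    "new vape juice flavors just dropped",
    "traffic was terrible on the way home",
    "watching the game with friends tonight",
]
labels = [1, 1, 0, 0]                      # 1 = vaping-related, 0 = unrelated

base_learners = [
    ("nb", MultinomialNB()),
    ("svm", LinearSVC()),
    ("rf", RandomForestClassifier(n_estimators=200)),
]

# TF-IDF features feed the base learners; a logistic-regression meta-learner
# combines their outputs, which is the stacking step of the ensemble.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    StackingClassifier(estimators=base_learners, final_estimator=LogisticRegression(), cv=2),
)
clf.fit(tweets, labels)
print(clf.predict(["anyone else switch to e-cigs?"]))
```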

Journal ArticleDOI
01 Jun 2021
TL;DR: This study uses 2D core CT scan image slices to train a convolutional neural network that automatically predicts the lithology of a well on the Norwegian continental shelf, and then identifies and merges similar lithofacies classes through ad hoc analysis that considers the degree of confusion in the prediction confusion matrix, aided by porosity–permeability cross-plot relationships.
Abstract: X-ray computerized tomography (CT) images as digital representations of whole cores can provide valuable information on the composition and internal structure of cores extracted from wells. Incorporation of millimeter-scale core CT data into lithology classification workflows can result in high-resolution lithology description. In this study, we use 2D core CT scan image slices to train a convolutional neural network (CNN) whose purpose is to automatically predict the lithology of a well on the Norwegian continental shelf. The images are preprocessed prior to training, i.e., undesired artefacts are automatically flagged and removed from further analysis. The training data include expert-derived lithofacies classes obtained by manual core description. The trained classifier is used to predict lithofacies on a set of test images that are unseen by the classifier. The prediction results reveal that distinct classes are predicted with high recall (up to 92%). However, there are misclassification rates associated with similarities in gray-scale values and transport properties. To postprocess the acquired results, we identified and merged similar lithofacies classes through ad hoc analysis considering the degree of confusion from the prediction confusion matrix and aided by porosity–permeability cross-plot relationships. Based on this analysis, the lithofacies classes are merged into four rock classes. Another CNN classifier trained on the resulting rock classes generalizes well, with higher pixel-wise precision when detecting thin layers and bed boundaries compared to the manual core description. Thus, the classifier provides additional and complementary information to the already existing rock type description.

2 citations
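A minimal sketch of a CNN of the kind this abstract describes is shown below: grayscale core-CT slices in, class scores for the merged rock classes out. The input resolution, channel counts, and number of classes are assumptions used only to make the example self-contained.

```python
# Illustrative CNN classifier for 2D core-CT slices (assumed architecture).
import torch
import torch.nn as nn

NUM_CLASSES = 4          # the merged rock classes mentioned in the abstract

class CoreCTClassifier(nn.Module):
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                      # x: (batch, 1, H, W) grayscale CT slices
        return self.classifier(self.features(x).flatten(1))

model = CoreCTClassifier()
logits = model(torch.randn(8, 1, 128, 128))    # a batch of dummy 128x128 slices
print(logits.shape)                            # torch.Size([8, 4])
```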

DOI
TL;DR: This work uses a combination of environmental and experimental data, such as atmospheric pressure, gas temperature, and the flux of incident particles, as inputs to a sequential Neural Network to recommend a high voltage setting and the corresponding calibration constants in order to maintain consistent gain and optimal resolution throughout the experiment.
Abstract: The AI for Experimental Controls project is developing an AI system to control and calibrate detector systems located at Jefferson Laboratory. Currently, calibrations are performed offline and require significant time and attention from experts. This work would reduce the amount of data and the amount of time spent calibrating in an offline setting. The first use case involves the Central Drift Chamber (CDC) located inside the GlueX spectrometer in Hall D. We use a combination of environmental and experimental data, such as atmospheric pressure, gas temperature, and the flux of incident particles, as inputs to a sequential Neural Network (NN) to recommend a high voltage setting and the corresponding calibration constants in order to maintain consistent gain and optimal resolution throughout the experiment. Utilizing AI in this manner represents an initial shift from offline calibration towards near real-time calibrations performed at Jefferson Laboratory.
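A rough sketch of the kind of regression network the abstract describes follows: environmental and experimental readings in, a recommended high-voltage setting and calibration constant out. The feature list, layer sizes, synthetic data, and paired outputs are assumptions, not the project's actual configuration.

```python
# Illustrative sequential network mapping detector conditions to a recommended
# high-voltage setting and a calibration constant (all values synthetic).
import torch
import torch.nn as nn

# features: [atmospheric pressure, gas temperature, incident-particle flux]
inputs = torch.randn(256, 3)
targets = torch.randn(256, 2)          # [high-voltage setting, calibration constant]

model = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                   # toy training loop on synthetic data
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()

recommended = model(torch.tensor([[1013.2, 24.5, 0.7]]))   # one new set of readings
```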
References
Journal ArticleDOI
28 May 2015 - Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Posted Content
TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs with CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (≈ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment, from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

12,531 citations

Journal Article
TL;DR: A novel algorithm, Hyperband, is introduced for hyperparameter optimization, formulated as a pure-exploration non-stochastic infinite-armed bandit problem in which a predefined resource, such as iterations, data samples, or features, is allocated to randomly sampled configurations.
Abstract: Performance of machine learning algorithms depends critically on identifying a good set of hyperparameters. While recent approaches use Bayesian optimization to adaptively select configurations, we focus on speeding up random search through adaptive resource allocation and early-stopping. We formulate hyperparameter optimization as a pure-exploration nonstochastic infinite-armed bandit problem where a predefined resource like iterations, data samples, or features is allocated to randomly sampled configurations. We introduce a novel algorithm, Hyperband, for this framework and analyze its theoretical properties, providing several desirable guarantees. Furthermore, we compare Hyperband with popular Bayesian optimization methods on a suite of hyperparameter optimization problems. We observe that Hyperband can provide over an order-of-magnitude speedup over our competitor set on a variety of deep-learning and kernel-based learning problems.

683 citations
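Since the present article concerns automated hyperparameter selection, a compact sketch of the Hyperband procedure summarized above may be useful. The callables `sample_config` and `run_then_return_loss` are placeholder names loosely following the paper's pseudocode and must be supplied by the caller; the toy objective at the end is purely illustrative, while `R` and `eta` follow the paper's notation.

```python
# Sketch of Hyperband: brackets of successive halving that trade off the number
# of configurations against the budget each one receives. Placeholder callables
# and the toy objective are assumptions; R and eta follow the paper's notation.
import math
import random


def hyperband(sample_config, run_then_return_loss, R=81, eta=3):
    """Return the best (loss, config) pair found across all brackets."""
    s_max = int(math.log(R, eta) + 1e-9)                     # floor of log_eta(R)
    best = (float("inf"), None)
    for s in range(s_max, -1, -1):                            # one bracket per value of s
        n = int(math.ceil((s_max + 1) * eta ** s / (s + 1)))  # initial number of configs
        r = R * eta ** (-s)                                   # initial budget per config
        configs = [sample_config() for _ in range(n)]
        for i in range(s + 1):                                # successive-halving rungs
            n_i = int(n * eta ** (-i))
            r_i = r * eta ** i
            losses = [run_then_return_loss(c, r_i) for c in configs]
            scored = sorted(zip(losses, configs), key=lambda t: t[0])
            if scored[0][0] < best[0]:
                best = scored[0]
            configs = [c for _, c in scored[: max(1, n_i // eta)]]  # keep the top 1/eta
    return best


# Toy usage: tune a single "learning rate" against a synthetic objective.
best_loss, best_cfg = hyperband(
    sample_config=lambda: {"lr": 10 ** random.uniform(-4, -1)},
    run_then_return_loss=lambda cfg, budget: (cfg["lr"] - 0.01) ** 2 + 1.0 / budget,
)
print(best_loss, best_cfg)
```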

Journal ArticleDOI
TL;DR: In this article, the LSTM-RNN model was used for sentence embedding in a web search engine, and the results showed that the proposed method significantly outperformed the Paragraph Vector method for the web document retrieval task.
Abstract: This paper develops a model that addresses sentence embedding, a hot topic in current natural language processing research, using recurrent neural networks (RNN) with Long Short-Term Memory (LSTM) cells. The proposed LSTM-RNN model sequentially takes each word in a sentence, extracts its information, and embeds it into a semantic vector. Due to its ability to capture long-term memory, the LSTM-RNN accumulates increasingly richer information as it goes through the sentence, and when it reaches the last word, the hidden layer of the network provides a semantic representation of the whole sentence. In this paper, the LSTM-RNN is trained in a weakly supervised manner on user click-through data logged by a commercial web search engine. Visualization and analysis are performed to understand how the embedding process works. The model is found to automatically attenuate the unimportant words and detect the salient keywords in the sentence. Furthermore, these detected keywords are found to automatically activate different cells of the LSTM-RNN, where words belonging to a similar topic activate the same cell. As a semantic representation of the sentence, the embedding vector can be used in many different applications. These automatic keyword detection and topic allocation abilities enabled by the LSTM-RNN allow the network to perform document retrieval, a difficult language processing task, where the similarity between the query and documents can be measured by the distance between their corresponding sentence embedding vectors computed by the LSTM-RNN. On a web search task, the LSTM-RNN embedding is shown to significantly outperform several existing state-of-the-art methods. We emphasize that the proposed model generates sentence embedding vectors that are especially useful for web document retrieval tasks. A comparison with a well-known general sentence embedding method, the Paragraph Vector, is performed. The results show that the proposed method in this paper significantly outperforms the Paragraph Vector method for the web document retrieval task.

659 citations
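To make the mechanism concrete, the sketch below feeds token ids through an LSTM and takes the final hidden state as the sentence embedding, so query/document similarity reduces to cosine similarity between vectors. The vocabulary size, dimensions, and random toy inputs are assumptions; the original model was trained in a weakly supervised manner on click-through data.

```python
# Illustrative sentence embedding: the LSTM's last hidden state represents the
# whole sentence, and retrieval compares embeddings with cosine similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10_000, 128, 256

embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)

def sentence_embedding(token_ids):
    """Run the tokens through the LSTM and return the last hidden state."""
    x = embedding(token_ids)                   # (batch, seq_len, EMBED_DIM)
    _, (h_n, _) = lstm(x)                      # h_n: (1, batch, HIDDEN_DIM)
    return h_n.squeeze(0)                      # (batch, HIDDEN_DIM)

query = torch.randint(0, VOCAB_SIZE, (1, 6))   # toy token ids for a query
doc = torch.randint(0, VOCAB_SIZE, (1, 12))    # toy token ids for a document
similarity = F.cosine_similarity(sentence_embedding(query), sentence_embedding(doc))
print(similarity.item())
```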