Journal Article

Machine learning for data-driven discovery in solid Earth geoscience

TLDR
Solid Earth geoscience has very large sets of observations that are well suited to analysis with machine-learning methods; this article reviews how these methods can be applied to solid Earth datasets.
Abstract
BACKGROUND The solid Earth, oceans, and atmosphere together form a complex interacting geosystem. Processes relevant to understanding Earth’s geosystem behavior range in spatial scale from the atomic to the planetary, and in temporal scale from milliseconds to billions of years. Physical, chemical, and biological processes interact and have substantial influence on this complex geosystem, and humans interact with it in ways that are increasingly consequential to the future of both the natural world and civilization as the finiteness of Earth becomes increasingly apparent and limits on available energy, mineral resources, and fresh water increasingly affect the human condition. Earth is subject to a variety of geohazards that are poorly understood, yet increasingly impactful as our exposure grows through increasing urbanization, particularly in hazard-prone areas. We have a fundamental need to develop the best possible predictive understanding of how the geosystem works, and that understanding must be informed by both the present and the deep past. This understanding will come through the analysis of increasingly large geo-datasets and from computationally intensive simulations, often connected through inverse problems. Geoscientists are faced with the challenge of extracting as much useful information as possible and gaining new insights from these data, simulations, and the interplay between the two. Techniques from the rapidly evolving field of machine learning (ML) will play a key role in this effort.

ADVANCES The confluence of ultrafast computers with large memory, rapid progress in ML algorithms, and the ready availability of large datasets place geoscience at the threshold of dramatic progress. We anticipate that this progress will come from the application of ML across three categories of research effort: (i) automation to perform a complex prediction task that cannot easily be described by a set of explicit commands; (ii) modeling and inverse problems to create a representation that approximates numerical simulations or captures relationships; and (iii) discovery to reveal new and often unanticipated patterns, structures, or relationships. Examples of automation include geologic mapping using remote-sensing data, characterizing the topology of fracture systems to model subsurface transport, and classifying volcanic ash particles to infer eruptive mechanism. Examples of modeling include approximating the viscoelastic response for complex rheology, determining wave speed models directly from tomographic data, and classifying diverse seismic events. Examples of discovery include predicting laboratory slip events using observations of acoustic emissions, detecting weak earthquake signals using similarity search, and determining the connectivity of subsurface reservoirs using groundwater tracer observations.

OUTLOOK The use of ML in solid Earth geosciences is growing rapidly, but is still in its early stages and making uneven progress. Much remains to be done with existing datasets from long-standing data sources, which in many cases are largely unexplored. Newer, unconventional data sources such as light detection and ranging (LiDAR), fiber-optic sensing, and crowd-sourced measurements may demand new approaches through both the volume and the character of information that they present. Practical steps could accelerate and broaden the use of ML in the geosciences. Wider adoption of open-science principles such as open source code, open data, and open access will better position the solid Earth community to take advantage of rapid developments in ML and artificial intelligence. Benchmark datasets and challenge problems have played an important role in driving progress in artificial intelligence research by enabling rigorous performance comparison and could play a similar role in the geosciences. Testing on high-quality datasets produces better models, and benchmark datasets make these data widely available to the research community. They also help recruit expertise from allied disciplines. Close collaboration between geoscientists and ML researchers will aid in making quick progress in ML geoscience applications. Extracting maximum value from geoscientific data will require new approaches for combining data-driven methods, physical modeling, and algorithms capable of learning with limited, weak, or biased labels. Funding opportunities that target the intersection of these disciplines, as well as a greater component of data science and ML education in the geosciences, could help bring this effort to fruition.

The list of author affiliations is available in the full article online.
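One of the discovery examples named in the abstract, detecting weak earthquake signals using similarity search, can be illustrated with a small synthetic experiment. The following is a minimal sketch, not the method of any cited work: it assumes a known template waveform and uses brute-force normalized cross-correlation rather than the scalable search schemes applied to real continuous data. The chirp template, noise level, threshold, and the function names `normalized_cross_correlation` and `pick_peaks` are all illustrative assumptions.

```python
import numpy as np


def normalized_cross_correlation(trace, template):
    """Correlation coefficient between `template` and every window of `trace`."""
    n = len(template)
    # Standardize the template once so each dot product yields a correlation.
    t = (template - template.mean()) / (template.std() * n)
    windows = np.lib.stride_tricks.sliding_window_view(trace, n)
    w_mean = windows.mean(axis=1, keepdims=True)
    w_std = windows.std(axis=1)
    w_std[w_std == 0] = np.inf  # flat windows get zero correlation
    return ((windows - w_mean) @ t) / w_std


def pick_peaks(cc, threshold, min_separation):
    """Keep the strongest above-threshold samples that are well separated."""
    candidates = np.flatnonzero(cc > threshold)
    picks = []
    # Visit candidates from strongest to weakest correlation.
    for i in candidates[np.argsort(cc[candidates])[::-1]]:
        if all(abs(i - p) >= min_separation for p in picks):
            picks.append(int(i))
    return sorted(picks)


# Synthetic template (assumption): a tapered chirp sweeping 0.02 to 0.2 cycles/sample.
n = 400
k = np.arange(n)
phase = 2 * np.pi * (0.02 * k + 0.5 * (0.2 - 0.02) * k**2 / n)
template = np.sin(phase) * np.hanning(n)

# Synthetic trace (assumption): unit-variance noise with two buried copies.
rng = np.random.default_rng(0)
trace = rng.normal(0.0, 1.0, 10_000)
true_onsets = [2500, 7000]
for onset in true_onsets:
    trace[onset:onset + n] += 1.5 * template

cc = normalized_cross_correlation(trace, template)
picks = pick_peaks(cc, threshold=0.3, min_separation=n)
print("true onsets:", true_onsets)
print("picked onsets:", picks)
```

The buried copies have a peak amplitude of only 1.5 times the noise standard deviation, so they are hard to see in the raw trace, yet correlating over the full 400-sample window recovers their onsets. The brute-force window matrix used here is practical only for short traces.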


Citations

Characterization of Acoustic Emissions From Analogue Rocks Using Sparse Regression‐DMDc

TL;DR: In this paper, an unsupervised sparse regression model, Dynamic Mode Decomposition with control, was used to characterize the acoustic signals recorded during the drying of porous analogue rock samples fabricated with ordinary Portland cement, with and without clay.

Morphology Decoder: Untangling Heterogeneous Texture and Determining Permeability with Machine Learning 3D Vision

TL;DR: The Morphology Decoder is a parallel and serial flow reconstruction of machine learning-driven, semantically segmented heterogeneous rock texture images from 3D X-ray micro-computed tomography and nuclear magnetic resonance; it introduces controllable-measurable-volume as a new form of supervised semantic segmentation for 3D vision.

Characterization of Subsurface Hydrogeological Structures With Convolutional Conditional Neural Processes on Limited Training Data

TL;DR: In this paper, a 3D convolutional conditional neural process (ConvCNP) is proposed to reconstruct the entire spatial structures of subsurface hydrological attributes and channels from a limited amount of conditioning data.
Journal Article

Application of the transfer learning method in multisource geophysical data fusion

TL;DR: Zhang et al. proposed a transfer learning method to extract the features of multisource images, which can further improve computational efficiency and fusion accuracy; the fused image is obtained using fusion rules that are designed based on the current state.
Posted Content

Predicting Fault Slip via Transfer Learning

TL;DR: In this article, a transfer learning approach is described that uses numerical simulations to train a convolutional encoder-decoder to predict fault-slip behavior in laboratory experiments. The model learns a mapping between acoustic emission histories and fault slip from numerical simulations, and generalizes to produce accurate results using laboratory data.
References
Journal Article

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: State-of-the-art performance was achieved by a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.
Journal Article

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal Article

Scikit-learn: Machine Learning in Python

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.