Machine learning for data-driven discovery in solid Earth geoscience

doi:10.1126/SCIENCE.AAU0323

Journal Article

Karianne J. Bergen, +4 more

Science, Vol. 363, Iss. 6433, 22 Mar 2019

TLDR

Solid Earth geoscience has very large sets of observations that are well suited to analysis with machine-learning methods. This article reviews how these methods can be applied to solid Earth datasets.

Abstract:

BACKGROUND The solid Earth, oceans, and atmosphere together form a complex interacting geosystem. Processes relevant to understanding Earth’s geosystem behavior range in spatial scale from the atomic to the planetary, and in temporal scale from milliseconds to billions of years. Physical, chemical, and biological processes interact and have substantial influence on this complex geosystem, and humans interact with it in ways that are increasingly consequential to the future of both the natural world and civilization as the finiteness of Earth becomes increasingly apparent and limits on available energy, mineral resources, and fresh water increasingly affect the human condition. Earth is subject to a variety of geohazards that are poorly understood, yet increasingly impactful as our exposure grows through increasing urbanization, particularly in hazard-prone areas. We have a fundamental need to develop the best possible predictive understanding of how the geosystem works, and that understanding must be informed by both the present and the deep past. This understanding will come through the analysis of increasingly large geo-datasets and from computationally intensive simulations, often connected through inverse problems. Geoscientists are faced with the challenge of extracting as much useful information as possible and gaining new insights from these data, simulations, and the interplay between the two. Techniques from the rapidly evolving field of machine learning (ML) will play a key role in this effort.

ADVANCES The confluence of ultrafast computers with large memory, rapid progress in ML algorithms, and the ready availability of large datasets places geoscience at the threshold of dramatic progress. We anticipate that this progress will come from the application of ML across three categories of research effort: (i) automation to perform a complex prediction task that cannot easily be described by a set of explicit commands; (ii) modeling and inverse problems to create a representation that approximates numerical simulations or captures relationships; and (iii) discovery to reveal new and often unanticipated patterns, structures, or relationships. Examples of automation include geologic mapping using remote-sensing data, characterizing the topology of fracture systems to model subsurface transport, and classifying volcanic ash particles to infer eruptive mechanism. Examples of modeling include approximating the viscoelastic response for complex rheology, determining wave speed models directly from tomographic data, and classifying diverse seismic events. Examples of discovery include predicting laboratory slip events using observations of acoustic emissions, detecting weak earthquake signals using similarity search, and determining the connectivity of subsurface reservoirs using groundwater tracer observations.

OUTLOOK The use of ML in solid Earth geosciences is growing rapidly, but is still in its early stages and making uneven progress. Much remains to be done with existing datasets from long-standing data sources, which in many cases are largely unexplored. Newer, unconventional data sources such as light detection and ranging (LiDAR), fiber-optic sensing, and crowd-sourced measurements may demand new approaches through both the volume and the character of information that they present. Practical steps could accelerate and broaden the use of ML in the geosciences. Wider adoption of open-science principles such as open source code, open data, and open access will better position the solid Earth community to take advantage of rapid developments in ML and artificial intelligence. Benchmark datasets and challenge problems have played an important role in driving progress in artificial intelligence research by enabling rigorous performance comparison and could play a similar role in the geosciences. Testing on high-quality datasets produces better models, and benchmark datasets make these data widely available to the research community. They also help recruit expertise from allied disciplines. Close collaboration between geoscientists and ML researchers will aid in making quick progress in ML geoscience applications. Extracting maximum value from geoscientific data will require new approaches for combining data-driven methods, physical modeling, and algorithms capable of learning with limited, weak, or biased labels. Funding opportunities that target the intersection of these disciplines, as well as a greater component of data science and ML education in the geosciences, could help bring this effort to fruition.

The list of author affiliations is available in the full article online.
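One of the discovery examples named in the abstract is detecting earthquake signals using similarity search. The sketch below illustrates the general idea on synthetic data only: a known waveform (the template) is slid along a noisy trace, and windows whose normalized cross-correlation with the template exceeds a threshold are flagged. This is a toy illustration under stated assumptions, not the method used in any work the article reviews. The function names, the damped-sine template, the noise level, and the 0.7 threshold are all hypothetical choices made for this example.

```python
import math
import random


def normalized_cross_correlation(template, window):
    """Pearson correlation between two equal-length sequences (range -1 to 1)."""
    n = len(template)
    mean_t = sum(template) / n
    mean_w = sum(window) / n
    num = sum((t - mean_t) * (w - mean_w) for t, w in zip(template, window))
    den_t = math.sqrt(sum((t - mean_t) ** 2 for t in template))
    den_w = math.sqrt(sum((w - mean_w) ** 2 for w in window))
    if den_t == 0.0 or den_w == 0.0:
        return 0.0  # a flat window carries no shape information
    return num / (den_t * den_w)


def similarity_search(trace, template, threshold=0.7):
    """Slide the template along the trace.

    Returns a list of (start_index, score) for every window whose
    correlation with the template is at least `threshold`.
    """
    m = len(template)
    hits = []
    for start in range(len(trace) - m + 1):
        score = normalized_cross_correlation(template, trace[start:start + m])
        if score >= threshold:
            hits.append((start, score))
    return hits


# Synthetic data (assumption): a damped sine stands in for a repeating event.
rng = random.Random(0)
template = [math.sin(2 * math.pi * k / 10) * math.exp(-k / 15) for k in range(40)]

# Background noise with two hidden copies of the template added at known onsets.
trace = [rng.gauss(0.0, 0.1) for _ in range(500)]
onsets = (120, 330)
for onset in onsets:
    for k, value in enumerate(template):
        trace[onset + k] += value

hits = similarity_search(trace, template)
for start, score in hits:
    print(f"candidate at sample {start}: correlation {score:.2f}")
```

Because the template is nearly periodic, windows shifted by about one period from a true onset can also exceed the threshold, so the flagged indices tend to appear in small clusters around each event. A practical detector would typically keep only the best-scoring window in each cluster and would need a faster search strategy than this brute-force loop for long continuous records.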
