Journal Article

What Role Does Hydrological Science Play in the Age of Machine Learning?

TL;DR: This commentary is a call to action for the hydrology community to focus on developing a quantitative understanding of where and when hydrological process understanding is valuable in a modeling discipline increasingly dominated by machine learning.
Abstract: We suggest that there is a potential danger to the hydrological sciences community in not recognizing how transformative machine learning will be for the future of hydrological modeling. Given the recent success of machine learning applied to modeling problems, it is unclear what the role of hydrological theory might be in the future. We suggest that a central challenge in hydrology right now should be to clearly delineate where and when hydrological theory adds value to prediction systems. Lessons learned from the history of hydrological modeling motivate several clear next steps toward integrating machine learning into hydrological modeling workflows.
Citations
Journal Article
TL;DR: In this paper, the authors describe how recent breakthroughs in artificial intelligence (AI), and particularly in deep learning (DL), have created tremendous excitement and opportunities in the earth and environmental sciences communitie...
Abstract: Recent breakthroughs in artificial intelligence (AI), and particularly in deep learning (DL), have created tremendous excitement and opportunities in the earth and environmental sciences communitie...

64 citations

Journal Article
TL;DR: MIKA-SHA-induced optimal models outperform the lumped models used in this study in terms of efficiency values while benefitting hydrologists with more meaningful hydrological inferences about the runoff dynamics of the Rappahannock River basin.
Abstract: Despite showing great success of applications in many commercial fields, machine learning and data science models generally show limited success in many scientific fields, including hydrology (Karpatne et al., 2017). The approach is often criticized for its lack of interpretability and physical consistency. This has led to the emergence of new modelling paradigms, such as theory-guided data science (TGDS) and physics-informed machine learning. The motivation behind such approaches is to improve the physical meaningfulness of machine learning models by blending existing scientific knowledge with learning algorithms. Following the same principles in our prior work (Chadalawada et al., 2020), a new model induction framework was founded on genetic programming (GP), namely the Machine Learning Rainfall–Runoff Model Induction (ML-RR-MI) toolkit. ML-RR-MI is capable of developing fully fledged lumped conceptual rainfall–runoff models for a watershed of interest using the building blocks of two flexible rainfall–runoff modelling frameworks. In this study, we extend ML-RR-MI towards inducing semi-distributed rainfall–runoff models. The meaningfulness and reliability of hydrological inferences gained from lumped models may tend to deteriorate within large catchments where the spatial heterogeneity of forcing variables and watershed properties is significant. This was the motivation behind developing our machine learning approach for distributed rainfall–runoff modelling titled Machine Induction Knowledge Augmented – System Hydrologique Asiatique (MIKA-SHA). MIKA-SHA captures spatial variabilities and automatically induces rainfall–runoff models for the catchment of interest without any explicit user selections. Currently, MIKA-SHA learns models utilizing the model building components of two flexible modelling frameworks. However, the proposed framework can be coupled with any internally coherent collection of building blocks.
MIKA-SHA's model induction capabilities have been tested on the Rappahannock River basin near Fredericksburg, Virginia, USA. MIKA-SHA builds and tests many model configurations using the model building components of the two flexible modelling frameworks and quantitatively identifies the optimal model for the watershed of concern. In this study, MIKA-SHA is utilized to identify two optimal models (one from each flexible modelling framework) to capture the runoff dynamics of the Rappahannock River basin. Both optimal models achieve high-efficiency values in hydrograph predictions (both at catchment and subcatchment outlets) and good visual matches with the observed runoff response of the catchment. Furthermore, the resulting model architectures are compatible with previously reported research findings and fieldwork insights of the watershed and are readily interpretable by hydrologists. MIKA-SHA-induced semi-distributed model performances were compared against existing lumped model performances for the same basin. MIKA-SHA-induced optimal models outperform the lumped models used in this study in terms of efficiency values while benefitting hydrologists with more meaningful hydrological inferences about the runoff dynamics of the Rappahannock River basin.

43 citations

Journal Article
TL;DR: In this paper, three Long Short-Term Memory (LSTM) daily streamflow prediction models (deep learning networks) are built for 531 basins across the contiguous United States (CONUS) and compared, including an LSTM post-processor trained with U.S. National Water Model (NWM) outputs as a target variable.
Abstract: We build three Long Short-Term Memory (LSTM) daily streamflow prediction models (deep learning networks) for 531 basins across the contiguous United States (CONUS), and compare their performance: (1) an LSTM post-processor trained on the U.S. National Water Model (NWM) outputs (LSTM_PP) as a target variable, (2) an LSTM post-processor trained on the NWM outputs and using atmospheric forcings (LSTM_PPA), and (3) an LSTM model trained on USGS average daily streamflow data and using atmospheric forcing (LSTM_A). We trained the LSTMs for the period 2004-2014 and evaluated on 1994-2002, and compared several performance metrics to the NWM reanalysis. Overall performance of the three LSTMs is similar, with median NSE scores of 0.73 (LSTM_PP), 0.75 (LSTM_PPA), and 0.74 (LSTM_A), and all three LSTMs outperform the NWM validation score of 0.62. Additionally, LSTM_A outperforms LSTM_PP and LSTM_PPA in ungauged basins. While an LSTM post-processor improves NWM predictions substantially, we achieved comparable performance with the LSTM trained without the NWM outputs (LSTM_A). Finally, we performed a sensitivity analysis to diagnose the land surface component of the NWM as the source of mass bias error and the channel router as a source of simulation timing error. This indicates that the NWM routing scheme should be considered a priority for NWM improvement.

41 citations
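The LSTM studies listed here summarize skill as a median Nash–Sutcliffe efficiency (NSE) across basins. As a reminder of what that number measures, the minimal Python sketch below implements the standard formula, NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², and a median over basins. The function names and toy series are illustrative assumptions, not the evaluation code used by any of the studies.

```python
from statistics import mean, median


def nse(observed, simulated):
    """Nash–Sutcliffe efficiency of a simulated series against observations."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("series must be non-empty and equal in length")
    obs_mean = mean(observed)
    # Sum of squared model errors.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations about their own mean.
    sst = sum((o - obs_mean) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / sst


def median_nse(basins):
    """Median NSE over a mapping of basin id -> (observed, simulated)."""
    return median(nse(obs, sim) for obs, sim in basins.values())


# Toy daily series (hypothetical values, arbitrary units).
obs = [1.0, 2.0, 3.0, 4.0]
perfect = nse(obs, obs)            # 1.0: simulation matches observations exactly
climatology = nse(obs, [2.5] * 4)  # 0.0: no better than predicting the observed mean
```

An NSE of 1 indicates a perfect fit, 0 means the model is no better than the mean of the observations, and negative values mean it is worse; taking the median across basins keeps a few very poor basins from dominating the summary.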

Journal Article
TL;DR: In this article, two LSTM-based models, a standard LSTM and an Entity Aware LSTM (EA LSTM), trained across 669 catchments in Great Britain simulate discharge with median Nash–Sutcliffe efficiency (NSE) scores of 0.88 and 0.86, respectively.
Abstract: Long short-term memory (LSTM) models are recurrent neural networks from the field of deep learning (DL) which have shown promise for time series modelling, especially in conditions when data are abundant. Previous studies have demonstrated the applicability of LSTM-based models for rainfall–runoff modelling; however, LSTMs have not been tested on catchments in Great Britain (GB). Moreover, opportunities exist to use spatial and seasonal patterns in model performances to improve our understanding of hydrological processes and to examine the advantages and disadvantages of LSTM-based models for hydrological simulation. By training two LSTM architectures across a large sample of 669 catchments in GB, we demonstrate that the LSTM and the Entity Aware LSTM (EA LSTM) models simulate discharge with median Nash–Sutcliffe efficiency (NSE) scores of 0.88 and 0.86, respectively. We find that the LSTM-based models outperform a suite of benchmark conceptual models, suggesting an opportunity to use additional data to refine conceptual models. In summary, the LSTM-based models show the largest performance improvements in the north-east of Scotland and in the south-east of England. The south-east of England remained difficult to model, however, in part due to the inability of the LSTMs configured in this study to learn groundwater processes, human abstractions and complex percolation properties from the hydro-meteorological variables typically employed for hydrological modelling.

36 citations

References
Journal Article
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.

72,897 citations

"What Role Does Hydrological Science..." refers to methods in this paper

  • ...It is worth noting that the DL models used by Kratzert et al. (2019) were invented around the same time (Hochreiter, 1991; Hochreiter & Schmidhuber, 1997) as some of the earliest shallow neural network applications in hydrology (e.g., Hsu et al., 1995)....

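The LSTM abstract's central idea, the constant error carousel, can be illustrated with a single memory cell whose state feeds back to itself with a fixed weight of 1.0 while multiplicative input and output gates control access to it. The Python sketch below is a forward pass only, assuming a scalar cell, tanh squashing functions, and hypothetical weight names; it follows the original design without a forget gate (forget gates were a later extension) and omits the paper's truncated-gradient training.

```python
import math


def sigmoid(x):
    """Logistic function used for the multiplicative gates."""
    return 1.0 / (1.0 + math.exp(-x))


def cec_step(c_prev, h_prev, x, w):
    """One forward step of a single LSTM memory cell without a forget gate.

    `w` maps hypothetical parameter names to scalar weights and biases.
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wc"] * x + w["uc"] * h_prev + w["bc"])  # squashed cell input
    c = 1.0 * c_prev + i * g  # constant error carousel: self-weight fixed at 1.0
    h = o * math.tanh(c)      # gated cell output
    return c, h


def run(xs, w, c0=0.0, h0=0.0):
    """Apply cec_step over an input sequence and return the final (c, h)."""
    c, h = c0, h0
    for x in xs:
        c, h = cec_step(c, h, x, w)
    return c, h


# Hypothetical weights: input gate biased shut, so new inputs barely reach the cell.
closed = dict(wi=0.0, ui=0.0, bi=-50.0, wo=0.0, uo=0.0, bo=0.0,
              wc=1.0, uc=0.0, bc=0.0)
# Same cell with the input gate biased open and a constant candidate input.
opened = dict(closed, bi=50.0, wc=0.0, bc=0.1)

kept, _ = run([1.0] * 1000, closed, c0=0.5)   # state held at 0.5 for 1000 steps
accumulated, h_last = run([0.0] * 10, opened)  # state accumulates 10 * tanh(0.1)
```

With the input gate closed, the stored value survives 1000 steps unchanged; along the self-connection (the path the paper's truncated gradient keeps), ∂c_t/∂c_{t−1} = 1, so error flowing back through the cell neither decays nor explodes.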

Journal Article
TL;DR: The Global Land Data Assimilation System (GLDAS) is an uncoupled land surface modeling system that drives multiple models, integrates a huge quantity of observation-based data, runs globally at high resolution (0.25°), and produces results in near-real time (typically within 48 h of the present).
Abstract: A Global Land Data Assimilation System (GLDAS) has been developed. Its purpose is to ingest satellite- and ground-based observational data products, using advanced land surface modeling and data assimilation techniques, in order to generate optimal fields of land surface states and fluxes. GLDAS is unique in that it is an uncoupled land surface modeling system that drives multiple models, integrates a huge quantity of observation-based data, runs globally at high resolution (0.25°), and produces results in near–real time (typically within 48 h of the present). GLDAS is also a test bed for innovative modeling and assimilation capabilities. A vegetation-based “tiling” approach is used to simulate subgrid-scale variability, with a 1-km global vegetation dataset as its basis. Soil and elevation parameters are based on high-resolution global datasets. Observation-based precipitation and downward radiation and output fields from the best available global coupled atmospheric data assimilation systems are employe...

3,857 citations
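The vegetation-based tiling described in the GLDAS abstract can be pictured as an area-weighted average: the 1-km vegetation pixels inside a grid cell define the area fraction of each vegetation tile, a land surface model is run per tile, and the cell value is the fraction-weighted sum of the tile outputs. The Python below is a minimal illustration under those assumptions. The class names, output values, and function names are hypothetical. It is not GLDAS code and ignores details such as how an operational system may handle very small tiles.

```python
from collections import Counter


def tile_fractions(veg_pixels):
    """Fraction of a grid cell covered by each vegetation class.

    `veg_pixels` lists the vegetation class of every 1-km pixel in the cell.
    """
    if not veg_pixels:
        raise ValueError("grid cell has no vegetation pixels")
    counts = Counter(veg_pixels)
    total = len(veg_pixels)
    return {cls: n / total for cls, n in counts.items()}


def grid_cell_value(veg_pixels, tile_output):
    """Area-weighted mean of per-tile model outputs for one grid cell.

    `tile_output` maps vegetation class -> output of a hypothetical per-tile model run.
    """
    fractions = tile_fractions(veg_pixels)
    return sum(frac * tile_output[cls] for cls, frac in fractions.items())


# Hypothetical cell: 60% forest, 30% grassland, 10% bare soil.
pixels = ["forest"] * 6 + ["grassland"] * 3 + ["bare"] * 1
evaporation = {"forest": 3.0, "grassland": 2.0, "bare": 0.5}  # illustrative, arbitrary units
cell_mean = grid_cell_value(pixels, evaporation)  # 0.6*3.0 + 0.3*2.0 + 0.1*0.5, about 2.45
```

Keeping tiles separate lets each vegetation type respond with its own parameters before aggregation, which is how subgrid-scale variability is represented without refining the grid itself.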

Journal Article
TL;DR: The FLUXNET project is a global network of micrometeorological flux measurement sites that measure the exchanges of carbon dioxide, water vapor, and energy between the biosphere and atmosphere.
Abstract: FLUXNET is a global network of micrometeorological flux measurement sites that measure the exchanges of carbon dioxide, water vapor, and energy between the biosphere and atmosphere. At present over 140 sites are operating on a long-term and continuous basis. Vegetation under study includes temperate conifer and broadleaved (deciduous and evergreen) forests, tropical and boreal forests, crops, grasslands, chaparral, wetlands, and tundra. Sites exist on five continents and their latitudinal distribution ranges from 70°N to 30°S. FLUXNET has several primary functions. First, it provides infrastructure for compiling, archiving, and distributing carbon, water, and energy flux measurement, and meteorological, plant, and soil data to the science community. (Data and site information are available online at the FLUXNET Web site, http://www-eosdis.ornl.gov/FLUXNET/.) Second, the project supports calibration and flux intercomparison activities. This activity ensures that data from the regional networks are intercomparable. And third, FLUXNET supports the synthesis, discussion, and communication of ideas and data by supporting project scientists, workshops, and visiting scientists. The overarching goal is to provide information for validating computations of net primary productivity, evaporation, and energy absorption that are being generated by sensors mounted on the NASA Terra satellite. Data being compiled by FLUXNET are being used to quantify and compare magnitudes and dynamics of annual ecosystem carbon and water balances, to quantify the response of stand-scale carbon dioxide and water vapor flux densities to controlling biotic and abiotic factors, and to validate a hierarchy of soil–plant–atmosphere trace gas exchange models. 
Findings so far include 1) net CO₂ exchange of temperate broadleaved forests increases by about 5.7 g C m⁻² day⁻¹ for each additional day that the growing season is extended; 2) the sensitivity of net ecosystem CO₂ exchange to sunlight doubles if the sky is cloudy rather than clear; 3) the spectrum of CO₂ flux density exhibits peaks at timescales of days, weeks, and years, and a spectral gap exists at the month timescale; 4) the optimal temperature of net CO₂ exchange varies with mean summer temperature; and 5) stand age affects carbon dioxide and water vapor flux densities.

3,162 citations
