
Most computational hydrology is not reproducible, so is it really science?

TLDR
It is recommended that re-useable code and formal workflows, which unambiguously reproduce published scientific results, are made available for the community alongside data, so that the community can verify previous findings and build directly from previous work.

Hutton, C., Wagener, T., Freer, J., Han, D., Duffy, C., & Arheimer, B. (2016). Most computational hydrology is not reproducible, so is it really science? Water Resources Research, 52(10), 7548–7555. https://doi.org/10.1002/2016WR019285

Most Computational Hydrology is not Reproducible, so is it Really Science?

Christopher Hutton¹, Thorsten Wagener¹, Jim Freer², Dawei Han¹, Chris Duffy³, Berit Arheimer⁴

¹Department of Civil Engineering, University of Bristol, Bristol, UK.
²School of Geographical Sciences, University of Bristol, Bristol, UK.
³Department of Civil Engineering, The Pennsylvania State University, University Park, Pennsylvania, USA.
⁴Swedish Meteorological and Hydrological Institute, Norrköping, Sweden.

Corresponding author: Christopher Hutton (chutton294@gmail.com)
Key points
- Articles that rely on computational work do not provide sufficient information to allow published scientific findings to be reproduced.
- We argue for open re-useable code, data, and formal workflows, allowing published findings to be verified.
- Reproducible computational hydrology will provide a more robust foundation for scientific advancement and policy support.
Abstract
Reproducibility is a foundational principle in scientific research. Yet in computational hydrology, the code and data that actually produce published results are not regularly made available, inhibiting the ability of the community to reproduce and verify previous findings. In order to overcome this problem we recommend that re-useable code and formal workflows, which unambiguously reproduce published scientific results, are made available for the community alongside data, so that we can verify previous findings, and build directly from previous work. In cases where reproducing large-scale hydrologic studies is computationally very expensive and time-consuming, new processes are required to ensure scientific rigour. Such changes will strongly improve the transparency of hydrological research, and thus provide a more credible foundation for scientific advancement and policy support.
Index Terms
Computational Hydrology; Modeling; Metadata; Software re-use; Workflow

Keywords
Hydrology; Reproducibility; Software; Code; Verification; Workflows
Main Text
Upon observing order of magnitude differences in Darcy-Weisbach friction factors estimated from hillslope surface properties in two previous studies [Weltz et al. 1992; Abrahams et al. 1994], Parsons et al. [1994] conducted additional experiments to identify factors controlling hillslope overland flow in semi-arid environments, and identified that the experimental set-up was the main factor controlling the difference between the previous experimental results. Whilst exact reproducibility is impossible in open hydrological systems, attempting to reproduce the main scientific finding within an acceptable margin of error is a core principle of scientific research [Popper 1959]. As illustrated, independent observation helps to verify the legitimacy of individual findings. In turn, this helps us to build upon sound observations so that we can evolve hypotheses (and models) of how catchments function [McGlynn et al. 2002], and move them from specific circumstances to more general theory [Wagener et al., 2007].
As in Parsons et al. [1994], attempts at reproducibility have failed in a number of disciplines, leading to increased focus on the topic in the broader scientific literature [Begley & Ellis 2012; Prinz et al. 2011; Ioannidis et al. 2001; Nosek 2012]. Such failures have occurred not just because of differences in experimental setup, but because of scientific misconduct [Yong 2012; Collins & Tabak 2014; Fang et al. 2012], poor application of statistics to achieve apparently significant results [Ioannidis 2005; Hutton 2014], and, importantly, insufficient reporting of methodologies and data quality in journals to enable reproducibility to be assessed by the community. An oft-cited underlying reason for such failures is the present reward system in scientific publication, which prioritises the publication of innovative and seemingly statistically significant results over the publication of both null results [Franco et al 2014; Jennions & Møller, 2002; cf. Freer et al 2003] and reproduced experiments. Such a system provides few incentives to adopt open science practices that support and enable verification [Nosek et al 2015].
The prominence of computational research across scientific disciplines, from big data analysis in genomic research to computational modelling in climate science, has brought increased focus on the reproducibility issue. This is because the full code and workflow used to produce published scientific findings are typically not made available, thus inhibiting attempts to verify the provenance of published results [Buckheit & Donoho 1995; Mesirov 2010]. Given the extent to which this lack of transparency is considered a problem for reproducibility more broadly in the scientific literature [Donoho et al. 2009], to what extent is reproducibility, or a lack thereof, also a problem in computational hydrology? Computational analysis has grown rapidly in hydrology over the past 30 years, transforming the process of scientific discovery. Whilst code is most obviously used for hydrological modelling [e.g. Clark et al. 2008; Wrede et al. 2014; Duan et al. 2006], some form of code is used to produce the vast majority of hydrological research papers, from data processing and quality analysis [Teegavarapu 2009; McMillan et al. 2012; Coxon et al. 2015], through regionalisation and large-scale statistical analysis across catchments [Blöschl et al. 2013; Berghuijs et al. 2016], all the way to figure preparation. However, as in other disciplines, the full code that produces presented results is typically not made available alongside the publication to document their provenance, which inhibits attempts to reproduce published findings.
In order to advance scientific progress, reproducibility is required in computational hydrology for several key reasons. First, the reliability of scientific computer code is often unclear. From our own experience it is often very difficult to spot errors in code unless they manifest themselves as obvious errors in model outputs. Thus, code needs to be transparent to allow the legitimacy of published results to be verified. Second, the complexity of many hydrologic models and data analysis codes used today makes it simply infeasible to report all settings that can be adjusted (e.g. initial conditions, parameters, etc.) in publications, a point recognised recently in a joint editorial published in five hydrology journals [Blöschl et al. 2014]. Transparency across hydrology is especially important given that research builds on previous research. For example, being able to evaluate how "tidied up" datasets have been created, by explicitly showing all of the assumptions made, will lead to benefits in interpreting where and why subsequent models that are built upon such datasets fail. Finally, the complexity and diversity of catchment systems means that we need to be able to reproduce exact methodologies applied in specific settings more broadly across a range of catchment environments, so that we can robustly evaluate competing hypotheses of hydrologic behaviour across scales and locations [Clark et al 2016]. Our current inability to achieve this hinders both the ability of the broader community to learn from, and build on, previous work, and, importantly, to verify previous findings. So what material should be provided, and therefore what is required to reproduce computational hydrology?
The necessary information that leads to, and therefore documents the provenance of, the final research paper has been termed the research compendium [Gentleman & Lang 2004]. In the context of computational hydrology this includes the original data used; all analysis/modelling code; and the workflow that ties together the code and data to produce the published results. Although these components are not routinely published alongside journal articles, current practices in hydrology do facilitate reproducibility to varying extents. For example, initiatives are relatively well developed in hydrology for opening up and sharing data from individual catchments and cross-catchment datasets [McKee & Druliner 1998; Renard et al. 2008; Kirby et al. 1991; Newman et al. 2015; Duan et al. 2006], including (quite recently) the development of infrastructures and standards for sharing open water data [Emmett et al 2014; Leonard & Duffy 2013; Tarboton et al. 2009; Taylor, 2012; Tarboton et al 2014]. In addition, different code packages have been made available by developers. Prominent examples include hydrologic models such as Topmodel [Beven & Kirkby, 1979], VIC [Wood et al., 1992], FUSE [Clark et al., 2008], and HYPE [Lindström et al., 2010], open-source groundwater models including MODFLOW [Harbaugh, 2005] and PFLOTRAN, and codes linked to modelling, including optimization/uncertainty algorithms such as SCE [Duan et al., 1993], SCEM [Vrugt et al., 2003] and GLUE [Beven & Binley, 1992]. By being made open, such code has helped spread new ideas and concepts to advance hydrology, and made reproducing each other's work easier. However, whilst sharing data and code are important first steps, sharing alone does not provide the critical detail on implementation contained within a workflow that is required to reproduce published results.
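To make the compendium idea concrete, the sketch below shows how a single script can tie raw data, analysis code, and a published result into one rerunnable chain. It is a minimal illustration only: the file names, CSV layout, and toy analysis are our assumptions, not material from this paper.

```python
"""Minimal sketch of a research-compendium workflow script.

Assumes a hypothetical compendium layout:
  data/daily_discharge.csv  (columns: date, discharge_m3s)
  results/summary.txt       (the 'published' number, regenerated on each run)
"""
import csv
import statistics
from pathlib import Path

DATA_FILE = Path("data/daily_discharge.csv")
OUT_FILE = Path("results/summary.txt")


def load_discharge(path):
    """Read daily discharge values (m3/s) from a two-column CSV."""
    with open(path, newline="") as f:
        return [float(row["discharge_m3s"]) for row in csv.DictReader(f)]


def mean_discharge(values):
    """The toy 'scientific result': mean daily discharge."""
    return statistics.mean(values)


if __name__ == "__main__":
    discharge = load_discharge(DATA_FILE)
    OUT_FILE.parent.mkdir(exist_ok=True)
    OUT_FILE.write_text(f"mean daily discharge = {mean_discharge(discharge):.3f} m3/s\n")
```

Because every step from raw data to reported number is encoded, a reader who reruns the script either reproduces the result or sees exactly where the chain breaks.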
We argue that in order to advance and make more robust the process of knowledge creation and hypothesis testing within the computational hydrological community, we need to adopt common standards and infrastructures to: [1] make code readable and re-useable; [2] create well documented workflows that combine re-useable code together with data to enable published scientific findings to be reproduced; [3] make code and workflows available and easy to find through use of code repositories and creation of code metadata; [4] use unique persistent identifiers (e.g. DOIs) to reference re-useable code and workflows, thereby clearly showing the provenance of published scientific findings (Figure 1).

Figure 1. Schematic figure of steps required leading to reproducible and re-useable hydrological publications.
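Steps [3] and [4] can be made tangible with a small provenance record. The sketch below writes a JSON sidecar that pins a published figure to the archived code and data behind it; the DOIs and file names are placeholders we have invented for illustration, not real identifiers.

```python
"""Sketch: record the provenance of a published result.

The DOIs below are placeholders (not real deposits); in practice they
would point to archived snapshots of the code and data on a repository
that mints persistent identifiers.
"""
import hashlib
import json


def sha256(path):
    """Content hash of a file, so readers can verify they hold the same bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


provenance = {
    "result": "figure_2.png",              # hypothetical published figure
    "code_doi": "10.5281/zenodo.0000000",  # placeholder DOI for archived code
    "data_doi": "10.5281/zenodo.1111111",  # placeholder DOI for archived data
    "inputs": {p: sha256(p) for p in ["data/daily_discharge.csv"]},
}

with open("figure_2_provenance.json", "w") as f:
    json.dump(provenance, f, indent=2)
```

Such a record makes the provenance chain checkable: a reader can resolve the identifiers, hash the retrieved files, and confirm they match what produced the figure.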
The first step towards more open, reproducible science is to adopt common standards that facilitate code readability and re-use. As most researchers in hydrology are scientists first and programmers second, setting high standards for code re-use may be counter-productive to broad adoption of reproducible practices. Yet long, poorly documented scripts are not re-useable, and are certainly difficult to reproduce if their ability to do the intended job cannot be verified. As a minimum standard we therefore recommend that code should come with an example workflow, as commonly adopted [e.g. Pianosi et al., 2015], and where possible, also be packaged with input and output data to provide a means to ensure correct implementation of a method prior to application. Implementing code correctly, however, is not enough to make it re-useable; sufficient information is required to understand what the code does, and, to be reproducible, whether it does this correctly. Therefore, code should be modularised into functions and classes that may be re-useable by the wider community, with comments that explain what the code does.
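As one illustration of that minimum standard, the sketch below packages a small, documented function with an example workflow and known input/output data; the choice of metric and the test values are our own illustrative assumptions.

```python
"""Sketch of the recommended minimum standard: a documented, re-useable
function shipped with an example workflow and packaged test data."""


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency of simulated vs. observed flows.

    Returns 1.0 for a perfect match; values below 0 mean the model
    predicts worse than the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


# Example workflow with packaged input and expected output, so a re-user
# can confirm correct implementation before applying the method.
if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    sim = [1.1, 1.9, 3.2, 3.8]
    nse = nash_sutcliffe(obs, sim)
    assert abs(nse - 0.98) < 1e-9, "packaged example no longer reproduces"
    print(f"NSE on packaged example data = {nse:.3f}")
```

Shipping the expected output alongside the code turns the example workflow into a lightweight test: if the assertion fails on another machine, the re-user knows the implementation, not the science, is the problem.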

Citations

Twenty-three unsolved problems in hydrology (UPH) – a community perspective
Günter Blöschl, +212 more
TL;DR: This article describes a community initiative to identify major unsolved scientific problems in hydrology, motivated by a need for stronger harmonisation of research efforts. Despite the diversity of the participants (230 scientists in total), the process revealed much about community priorities and the state of our science: a preference for continuity in research questions rather than radical departures or redirections from past and current work.

The Variable Infiltration Capacity model version 5 (VIC-5): infrastructure improvements for new applications and reproducibility
TL;DR: The development and release of VIC-5 represents a significant step forward for the VIC user community in terms of support for existing and new model applications, reproducibility, and scientific robustness.

A Comparison of Methods for Streamflow Uncertainty Estimation
TL;DR: The authors compared uncertainty estimates and stage-discharge rating curves from seven methods at three locations of varying hydraulic complexity, and found that full-width 95% uncertainties for the different methods ranged from 3 to 17% for median flows.

Pysteps: an open-source Python library for probabilistic precipitation nowcasting (v1.0)
TL;DR: Pysteps is an open-source and community-driven Python library for probabilistic precipitation nowcasting, that is, very-short-range forecasting (0–6 h), which has the potential to become an important component of integrated early warning systems for severe weather.
References

A physically based, variable contributing area model of basin hydrology
Mike Kirkby, +1 more
TL;DR: A hydrological forecasting model is presented that attempts to combine the important distributed effects of channel network topology and dynamic contributing areas with the advantages of simple lumped parameter basin models.

Why Most Published Research Findings Are False
TL;DR: The author discusses the implications of these problems for the conduct and interpretation of research, and concludes that the probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and the ratio of true to no relationships among the relationships probed in each scientific field.

The future of distributed models: model calibration and uncertainty prediction.
TL;DR: The GLUE procedure works with multiple sets of parameter values and allows that, within the limitations of a given model structure and errors in boundary conditions and field observations, different sets of values may be equally likely as simulators of a catchment.