Modeling for Understanding v. Modeling for Numbers

Edward B. Rastetter
01 Mar 2017, Vol. 20, Iss. 2, pp. 215-221
Abstract
I draw a distinction between Modeling for Numbers, which aims to address how much, when, and where questions, and Modeling for Understanding, which aims to address how and why questions. For-numbers models are often empirical, which can be more accurate than their mechanistic analogues as long as they are well calibrated and predictions are made within the domain of the calibration data. To extrapolate beyond the domain of available system-level data, for-numbers models should be mechanistic, relying on the ability to calibrate to the system components even if it is not possible to calibrate to the system itself. However, development of a mechanistic model that is reliable depends on an adequate understanding of the system. This understanding is best advanced using a for-understanding modeling approach. To address how and why questions, for-understanding models have to be mechanistic. The best of these for-understanding models are focused on specific questions, stripped of extraneous detail, and elegantly simple. Once the mechanisms are well understood, one can then decide if the benefits of incorporating the mechanism in a for-numbers model are worth the added complexity and the uncertainty associated with estimating the additional model parameters.

(In Press at Ecosystems)
The Ecosystems Center, Marine Biological Laboratory, Woods Hole, MA 02543
Introduction. I draw a distinction between two types of modeling that actually represent
extremes on a continuum. The first I call Modeling for Numbers. The questions addressed using
these models can be summarized as: How much, where, and when? For example, how much
carbon will be sequestered or released, by which parts of the biosphere, on what time course over
the next 100 years (e.g., Cramer and others 2001)? The use of these models is clearly important;
they address pressing environmental issues and attract a large amount of research money and
effort. The second type of modeling I call Modeling for Understanding. The questions
addressed with these models can be summarized as: How and why? For example, why can there
be only one species per limiting factor (Levin 1970)? These for-understanding questions are
more qualitative than the for-numbers questions. The emphasis of modeling for understanding is
to understand underlying mechanisms, often by stripping away extraneous detail and thereby
sacrificing quantitative accuracy. Modeling for understanding is at least as important as
modeling for numbers (Ågren and Bosatta 1990), although the application to pressing ecological
issues might be less direct.
Modeling for Numbers. There is no inherent reason why a for-numbers model has to be
mechanistic. Answers to how much, where, and when can frequently be found based on past
experience using purely empirical or statistical models. Such models have been used for
thousands of years, for example, to know when to sow crops (e.g., after the Nile flood; Janick
2002). Modern science relies on non-mechanistic models in many ways. For example, to assess
the medical risk of smoking, LaCroix and others (1991) followed 11,000 individuals, 65 years of
age or older, for five years to quantify the relationship between mortality rates and smoking
(Table 1). The resulting tabular model is purely correlative and therefore cannot address the how
and why connecting smoking to mortality, but it has diagnostic and predictive value. The push
for "big data" approaches in ecology aims to capitalize on analogous analyses of large
ecological databases (e.g., Hampton and others 2013).
Empirical models are common in ecology. For example, biomass allometric equations
(Yanai and others 2010), stand self-thinning relationships (Vanclay and Sands 2009), and
degree-day sum phenology models (Richardson and others 2006) are all empirical models.
Although various mechanisms might be hypothesized based on an examination with these
models (e.g., West and Brown 2005), the models themselves have no underlying mechanism and
therefore describe, rather than explain, the relationship.
Empirical models like the ones listed above have obvious value. I would further argue that
in terms of producing quantitative predictions, empirical models in Biology are often, perhaps
usually, more accurate than mechanistic models. For example, I cannot conceive of a
mechanistic model doing as well predicting increased mortality rates with smoking as the
LaCroix and others (1991) tabular model (Table 1); there is simply too much uncertainty
associated with any hypothesized causal mechanism. Even if the underlying mechanism is well
understood, error in estimating the parameters needed to implement a mechanistic model adds
uncertainty that might overwhelm any benefit of a mechanistic approach (O'Neill 1973). I think
that most empirical models are more accurate than their mechanistic analogues, with two
caveats: (1) that there are enough data available to adequately calibrate the empirical model and
(2) that the predictions are interpolated within the domain of the data used to calibrate the
empirical model.
The weakness of empirical models is in extrapolation. Outside the domain of the calibration
data there is just no way to know how well the empirical model will work, and in many cases the
extrapolation is known to be poor (e.g., Richardson and others 2006). I suspect that most
empirically based ecological models will not do well if extrapolated to the warmer, high-CO₂
conditions of the future; the new conditions will change productivity, allometry, competition,
phenology, and many other ecosystem characteristics and thereby alter the relationships
underlying these models.
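The contrast between interpolation and extrapolation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the "real" system is a hypothetical saturating response (`true_response`), the empirical model is a straight line fitted by least squares, and the calibration domain is x from 1 to 5.

```python
def true_response(x, vmax=1.0, k=10.0):
    """Hypothetical saturating process standing in for the real system."""
    return vmax * x / (k + x)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Calibration data cover only the domain x = 1..5 (an assumed range).
calib_x = [1.0, 2.0, 3.0, 4.0, 5.0]
calib_y = [true_response(x) for x in calib_x]
intercept, slope = fit_line(calib_x, calib_y)


def empirical(x):
    """Purely descriptive model: no mechanism, just the fitted line."""
    return intercept + slope * x


def abs_error(x):
    return abs(empirical(x) - true_response(x))


interp_error = abs_error(3.5)   # inside the calibration domain
extrap_error = abs_error(50.0)  # far outside the calibration domain

print(f"error at x = 3.5 (interpolation): {interp_error:.4f}")
print(f"error at x = 50  (extrapolation): {extrap_error:.4f}")
```

Within the calibration domain the line tracks the saturating curve closely; outside it the line keeps rising while the underlying response levels off, so the error grows without any warning from the calibration statistics.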
Of course, many of the most pressing environmental issues involve extrapolation into
conditions for which there is little or no data (e.g., under future CO₂ concentrations). Because of
the long response times of most ecosystems, experimental approaches cannot generate the data
needed to develop empirical models quickly enough to be of practical use. The only alternative
is to use mechanistic models.
Table 1. Relative mortality rates in relation to smoking for men and women over 65.
Numbers indicate the factor (and 95% confidence limit) by which mortality rates increase
relative to individuals of the same sex that have never smoked. Source: LaCroix and others
(1991).

                  Men                Women
Current smoker    2.1 (1.7 - 2.7)    1.8 (1.4 - 2.4)
Former smoker     1.5 (1.2 - 1.9)    1.1 (0.8 - 1.5)

But why should mechanistic models be better for extrapolation than empirical models? The
main reason is that it is often possible to empirically constrain mathematical representations of
the components of a system even when it is not possible to similarly constrain an empirical
representation of the whole system. Mechanistic models take advantage of the hierarchical
structure of ecosystems (O'Neill and others 1986) and tie system-level behaviors to the
characteristics of and interactions among the components of that system (Rastetter and Vallino
2015). Because of this hierarchical structure, the system components have to be smaller and
respond more quickly than the system itself (O'Neill and others 1986), which makes them more
tractable for experimental and observational study than the whole system. For example, the
long-term, whole-system question to be addressed might be: Will forests sequester carbon over
the next 200 years of elevated CO₂ and warming? To address this question empirically at the
ecosystem scale would require replicated experiments on whole forests that last 200 years. With
that data one might then derive empirical relationships between initial stand biomass and soil
properties and the magnitude of carbon sequestration or loss. However, such an approach is not
of much use for predicting those responses for the next 200 years because the experiment takes
too long. The mechanistic alternative is to instead conduct short-term experiments (<10 years)
on individual trees of different ages and different species, and on soils with different
characteristics and then piece that information and any other available information together in a
mechanistic model of the ecosystem to try to predict the long-term, whole-system rate of carbon
sequestration.
This mechanistic approach also has its caveats (Rastetter 1996). Although short-term
experiments can constrain representations of system components, they do not yield information
about feedbacks acting at a system level when those components are linked together. Thus there
is no way to know if the slow-responding system feedbacks that might dominate long-term
responses are adequately represented in the model. The model might therefore be corroborated
with existing short-term data even though it is inadequate for making long-term projections.
Conversely, the system-level predictions of the model might be falsified with short-term,
high-frequency data even though the long-term, slow-responding feedbacks that dominate long-term
responses, and will eventually override the short-term, high-frequency responses, are in fact
adequately represented in the model. Confidence in such models should therefore be tempered with
caution (Stroeve and others 2007), but should build slowly over time with the iterative process of
model development and testing and the accumulation of many independent sources of
corroborating evidence.
A key objective of modeling for numbers is often prediction accuracy. O'Neill (1973)
postulated a tradeoff between model complexity and errors associated with estimating the
parameters needed to represent that complexity in the model. As more process detail is
incorporated into the model, prediction of system-level dynamics should improve because more
of the processes determining those dynamics are included in the model. However, the added
model complexity comes at the price of having to estimate more parameters. Error in estimating
those parameters will propagate through the model and, as more parameters are added, prediction
accuracy will deteriorate. Thus as the model becomes more complex there should be a tradeoff
between errors associated with lack of mechanistic detail in a too-simple representation of the
system (systematic error) versus cumulative errors associated with estimation of more and more
parameters to account for those mechanistic details (estimation error). This tradeoff results in an
optimum model complexity where overall prediction error is minimized.
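A minimal numerical sketch of this tradeoff follows. The functional forms are assumptions chosen only for illustration and are not taken from O'Neill (1973): systematic error is assumed to halve with each added parameter, and estimation error is assumed to grow by a fixed increment per parameter.

```python
def systematic_error(n_params, base=1.0, retained=0.5):
    """Assumed form: each added parameter removes half of the remaining
    error caused by missing mechanistic detail."""
    return base * retained ** n_params


def estimation_error(n_params, per_param=0.03):
    """Assumed form: each estimated parameter contributes a fixed amount
    of error that accumulates as parameters are added."""
    return per_param * n_params


def total_error(n_params):
    return systematic_error(n_params) + estimation_error(n_params)


candidates = range(0, 21)  # model complexity measured as number of parameters
best = min(candidates, key=total_error)

for n in candidates:
    marker = "  <- minimum" if n == best else ""
    print(f"{n:2d} parameters: total error = {total_error(n):.4f}{marker}")
```

Under these assumed forms the total error falls at first, reaches a minimum at an intermediate complexity, and then rises as accumulated estimation error outweighs the shrinking systematic error. The location of the minimum depends entirely on the assumed error functions.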
The O'Neill (1973) analysis, however, presupposes that the underlying structure of the
system being modeled is actually understood. Only if that structure is understood will added
model complexity be guaranteed to reduce systematic error. If it is not understood, then the
added complexity might have no relation to the real mechanism and systematic error could
actually increase. Thus, there is at least one more axis to be considered in the O'Neill (1973)
analysis, an axis reflecting how well the system is understood.
Modeling for Understanding. My ontological perspective is strictly reductionist; in
principle, all properties of a system can be explained, and therefore understood, based on the
properties of its component parts and their interactions. However, what seems straightforward in
principle is often intractable in practice. The problem is the daunting complexity of biological
systems. Bedau (2013) argues that some systems have interactions that are "too complex to
predict exactly in practice, except by crawling the causal web." In this view, emergent system
properties can be fully explained in terms of the properties of the system's component parts and their
interactions, but that explanation might be "incompressible" in the sense that the system
properties can only be replicated by simulation of the full complexity of the system (Bedau
2013).
The issue of incompressibility is hugely problematic. Taken to its extreme, it implies that a
system can only be understood from the perspective of a model that is at least as complex as the
system itself. What possible use could such a model be, other than to demonstrate that you have
"crawled the causal web" correctly? Certainly the heuristic value of such a model would be very
limited. Indeed, the formulations of many for-understanding models in ecology are selected
explicitly for ease of analytical or graphical analysis, that is, for compressibility (e.g., Lotka
1925, Volterra 1926, MacArthur and Levins 1964, Tilman 1980). Achieving this compressibility
requires a high degree of abstraction, a focus on a specific subset of system properties, and the
sacrifice of quantitative accuracy. In exchange, there can be substantial heuristic return.
Unlike for-numbers models, which at least have accurate quantitative prediction as a
common goal, it is difficult to generalize about for-understanding models except to say that they
have to be mechanistic. Otherwise, how could they address how and why questions? However,
a mechanistic model does not require inclusion of every process or mechanism ever described for
the system. As I imply above, such an approach is counterproductive; it degrades the heuristic
value of the model and therefore impedes understanding rather than enhances it. The key
modeling step, and often the most difficult aspect of modeling for understanding, is identifying
only those components and processes absolutely needed to address the question being asked.
The typical for-understanding model has three elements: (1) a characterization of the
potential behaviors of each of the relevant system components, (2) a characterization of the
interactions among these system components, and (3) a set of boundary conditions that specify,
for example, the initial properties of the system components and the influence of any factors
outside the system on the components of the system. The very best for-understanding models are
elegantly simple.
A classic example of an elegant for-understanding model is the Lotka-Volterra model of the
interactions among competing species (Lotka 1925, Volterra 1926):
$$\frac{dN_i}{dt} = r_i N_i \, \frac{K_i - \sum_{j=1}^{n} \alpha_{ji} N_j}{K_i} \qquad (1)$$
where $N_i$ is the number of individuals of species i, $r_i$ is the intrinsic growth parameter for
species i, $K_i$ is the number of individuals of species i that the environment is able to support
in the absence of competition (carrying capacity), $\alpha_{ji}$ is the number of individuals of
species i that are displaced from the carrying capacity by one individual of species j, n is the
number of competing species, and t is time. The components of the system are the environment,
characterized by the carrying capacities for each of the n species ($K_i$), and the n species of
competing populations, characterized by their intrinsic rates of growth ($r_i N_i$). The
environment interacts with each of the species through a density-dependent feedback that slows
the rate of population growth as the population size approaches the environment's carrying
capacity for that species ($[K_i - N_i]/K_i$; here I have assumed $\alpha_{ii} = 1$; Fig. 1). Each
species j interacts with the other species i by reducing the carrying capacity of the environment
for species i in proportion to the abundance of species j ($\alpha_{ji} N_j$). The only boundary
conditions needed are the initial sizes of each of the n populations.
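The three elements of the model (component behaviors, interactions, and boundary conditions) can be made concrete in a short simulation sketch. It assumes Eq. 1 has the form dN_i/dt = r_i N_i (K_i - sum_j alpha_ji N_j) / K_i, as described in the text. The two-species parameter values, the forward-Euler integrator, and the function names are hypothetical choices made for illustration.

```python
def lv_rates(N, r, K, alpha):
    """Right-hand side of the competition model.

    alpha[j][i] is the number of individuals of species i displaced from
    its carrying capacity by one individual of species j (alpha[i][i] = 1).
    """
    n = len(N)
    rates = []
    for i in range(n):
        crowding = sum(alpha[j][i] * N[j] for j in range(n))
        rates.append(r[i] * N[i] * (K[i] - crowding) / K[i])
    return rates


def simulate(N0, r, K, alpha, dt=1.0, steps=10000):
    """Forward-Euler integration; adequate for a sketch with small r * dt."""
    N = list(N0)
    for _ in range(steps):
        dN = lv_rates(N, r, K, alpha)
        N = [max(0.0, x + dt * d) for x, d in zip(N, dN)]
    return N


# Hypothetical two-species example with weak interspecific competition.
r = [0.01, 0.01]            # intrinsic growth parameters
K = [100.0, 80.0]           # carrying capacities (the environment)
alpha = [[1.0, 0.5],        # effect of species 0 on species 0 and 1
         [0.5, 1.0]]        # effect of species 1 on species 0 and 1
N0 = [10.0, 10.0]           # boundary condition: initial population sizes

final = simulate(N0, r, K, alpha)
# Setting both rates to zero gives N_0 + 0.5*N_1 = 100 and 0.5*N_0 + N_1 = 80,
# so the coexistence equilibrium for these assumed values is N_0 = 80, N_1 = 40.
print([round(x, 3) for x in final])
```

With these assumed coefficients each species limits itself more than it limits its competitor, so both persist. Raising the off-diagonal entries of `alpha` above 1 would instead lead to exclusion of one species, with the outcome depending on the initial sizes.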
The Lotka-Volterra model spawned lots of research up through the early 1980s (e.g., Gause
1934, MacArthur and Wilson 1967, Parry 1981). This research sought to examine the nature of
competition, the structuring of communities, and the struggle for existence. The model is still
used today, but mostly as a component within larger models (e.g., Pao 2015). However, perhaps
the most important legacy of this or any for-understanding model is that through its limitations
it inspires a new generation of models.
The most influential of these next-generation models is one developed by MacArthur and
Levins (1964) and further developed and applied especially by Tilman (1977, 1980):
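$$\frac{dB_i}{dt} = g_i B_i \min_{j=1,\dots,p}\left(\frac{R_j}{k_{ij} + R_j}\right) - m_i B_i \qquad (2)$$

$$\frac{dR_j}{dt} = S_j - \sum_{i=1}^{n} q_{ij}\, g_i B_i \min_{k=1,\dots,p}\left(\frac{R_k}{k_{ik} + R_k}\right) \qquad (3)$$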
where $B_i$ is the biomass of species i, $g_i$ and $m_i$ are the growth and turnover parameters for
Figure 1: Comparison of net growth versus species abundance for Eqs. 1, 2, and 5. Upper panel:
Lotka (1925) and Volterra (1926) model (Eq. 1) with $r_i$ = 0.01, $\alpha_{ii}$ = 1, and
$\alpha_{ji}$ = 0 for j ≠ i. Middle panel: MacArthur and Levins (1964) model (Eq. 2) with
$g_i$ = 0.1, $k_{ij}$ = 10, and $m_i$ = 0.05, with R, the most limiting resource, held constant at
the specified value. Lower panel: Rastetter and Ågren (2002) model (Eq. 5) with $g_i$ = 0.125,
$k_{ij}$ = 10, $m_i$ = 0.05, [symbol missing]_i = 0.001, and [symbol missing]_i = 0.00431, with R,
the most limiting resource, held constant at the specified value.
[Figure 1 panels: Lotka-Volterra; MacArthur & Levins; Rastetter & Ågren. x-axis: Individuals (N) or biomass (B), 0 to 100. y-axis: Net growth (dN/dt or dB/dt), -0.4 to 0.4. Curve labels: K = 100; R = R* = 10; R = 10.9.]
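The upper and middle panels of Figure 1 can be approximated with a short sketch using the parameter values given in the caption. It assumes Eq. 1 reduces to r N (K - N) / K for a single species with no competitors, and that Eq. 2 with one limiting resource R held constant reduces to g B R / (k + R) - m B. The function names are hypothetical, and Eq. 5 is omitted because its form is not given in this section.

```python
def lv_net_growth(N, r=0.01, K=100.0):
    """Eq. 1 for one species with alpha_ii = 1 and no competitors."""
    return r * N * (K - N) / K


def ml_net_growth(B, R, g=0.1, k=10.0, m=0.05):
    """Eq. 2 with a single limiting resource R held constant."""
    return g * B * R / (k + R) - m * B


def r_star(g=0.1, k=10.0, m=0.05):
    """Resource level at which growth balances turnover in Eq. 2:
    g * R / (k + R) = m  implies  R = m * k / (g - m)."""
    return m * k / (g - m)


print(f"R* = {r_star():.3f}")
print(" abundance   Eq. 1    Eq. 2 (R = 10)   Eq. 2 (R = 10.9)")
for x in range(0, 101, 20):
    print(f"{x:10d} {lv_net_growth(x):8.3f} {ml_net_growth(x, 10.0):12.3f}"
          f" {ml_net_growth(x, 10.9):16.3f}")
```

The sketch makes the contrast in the two formulations visible: net growth in Eq. 1 is a hump-shaped function of abundance that returns to zero at K, whereas net growth in Eq. 2 at a fixed resource level is simply proportional to biomass, zero everywhere at R = R* and positive everywhere when R exceeds R*. Density dependence in Eq. 2 arises only through the drawdown of the resource described by Eq. 3.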
