Modeling for Understanding v. Modeling for Numbers

(In Press at Ecosystems)

Edward B. Rastetter

The Ecosystems Center, Marine Biological Laboratory, Woods Hole, MA 02543

Abstract. I draw a distinction between Modeling for Numbers, which aims to address how

much, when, and where questions, and Modeling for Understanding, which aims to address how

and why questions. For-numbers models are often empirical, and empirical models can be more accurate than

their mechanistic analogues as long as they are well calibrated and predictions are made within

the domain of the calibration data. To extrapolate beyond the domain of available system-level

data, for-numbers models should be mechanistic, relying on the ability to calibrate to the system

components even if it is not possible to calibrate to the system itself. However, development of a

mechanistic model that is reliable depends on an adequate understanding of the system. This

understanding is best advanced using a for-understanding modeling approach. To address how

and why questions, for-understanding models have to be mechanistic. The best of these for-

understanding models are focused on specific questions, stripped of extraneous detail, and

elegantly simple. Once the mechanisms are well understood, one can then decide if the benefits

of incorporating the mechanism in a for-numbers model are worth the added complexity and the

uncertainty associated with estimating the additional model parameters.

Introduction. I draw a distinction between two types of modeling that actually represent

extremes on a continuum. The first I call Modeling for Numbers. The questions addressed using

these models can be summarized as: How much, where, and when? For example, how much

carbon will be sequestered or released, by which parts of the biosphere, on what time course over

the next 100 years (e.g., Cramer and others 2001)? The use of these models is clearly important;

they address pressing environmental issues and attract a large amount of research money and

effort. The second type of modeling I call Modeling for Understanding. The questions

addressed with these models can be summarized as: How and why? For example, why can there

be only one species per limiting factor (Levin 1970)? These for-understanding questions are

more qualitative than the for-numbers questions. The emphasis of modeling for understanding is

to understand underlying mechanisms, often by stripping away extraneous detail and thereby

sacrificing quantitative accuracy. Modeling for understanding is at least as important as

modeling for numbers (Ågren and Bosatta 1990), although the application to pressing ecological

issues might be less direct.

Modeling for Numbers. There is no inherent reason why a for-numbers model has to be

mechanistic. Answers to how much, where, and when can frequently be found based on past

experience using purely empirical or statistical models. Such models have been used for

thousands of years, for example, to know when to sow crops (e.g., after the Nile flood; Janick

2002). Modern science relies on non-mechanistic models in many ways. For example, to assess

the medical risk of smoking, LaCroix and others (1991) followed 11,000 individuals, 65 years of

age or older, for five years to quantify the relationship between mortality rates and smoking

(Table 1). The resulting tabular model is purely correlative and therefore cannot address the how

and why connecting smoking to mortality, but it has diagnostic and predictive value. The push

for "big data" approaches in Ecology hopes to capitalize on analogous analyses of large

ecological data bases (e.g., Hampton and others 2013).
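
As a minimal sketch of what a purely correlative tabular model looks like in practice, the factors in Table 1 can be stored as a lookup and applied to a baseline rate. The function names and the baseline rate used in the example are hypothetical and are not part of the LaCroix and others (1991) analysis; only the tabulated factors come from Table 1.

```python
# Relative mortality factors from Table 1 (LaCroix and others 1991):
# (point estimate, lower 95% limit, upper 95% limit).
# Never-smokers of the same sex are the reference group (factor 1.0).
RELATIVE_MORTALITY = {
    ("men", "current"): (2.1, 1.7, 2.7),
    ("men", "former"): (1.5, 1.2, 1.9),
    ("women", "current"): (1.8, 1.4, 2.4),
    ("women", "former"): (1.1, 0.8, 1.5),
}


def relative_mortality(sex, smoking_status):
    """Look up the tabulated factor; no mechanism is represented."""
    if smoking_status == "never":
        return (1.0, 1.0, 1.0)  # reference group by definition
    return RELATIVE_MORTALITY[(sex, smoking_status)]


def projected_mortality(baseline_rate, sex, smoking_status):
    """Scale a baseline (never-smoker) rate by the tabulated factor.

    baseline_rate is a hypothetical input supplied by the user.
    """
    factor, _, _ = relative_mortality(sex, smoking_status)
    return baseline_rate * factor


# Hypothetical baseline rate of 0.05 per year, chosen only for illustration.
print(projected_mortality(0.05, "men", "current"))
```

The lookup answers a how-much question, but nothing in it says how or why smoking raises mortality, and it offers no guidance for groups outside the sampled population.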

Empirical models are common in ecology. For example, biomass allometric equations

(Yanai and others 2010), stand self-thinning relationships (Vanclay and Sands 2009), and

degree-day sum phenology models (Richardson and others 2006) are all empirical models.

Although various mechanisms might be hypothesized based on an examination of these

models (e.g., West and Brown 2005), the models themselves have no underlying mechanism and

therefore describe, rather than explain, the relationship.

Empirical models like the ones listed above have obvious value. I would further argue that

in terms of producing quantitative predictions, empirical models in Biology are often, perhaps

usually, more accurate than mechanistic models. For example, I cannot conceive of a

mechanistic model doing as well predicting increased mortality rates with smoking as the

LaCroix and others (1991) tabular model (Table 1); there is simply too much uncertainty

associated with any hypothesized causal mechanism. Even if the underlying mechanism is well

understood, error in estimating the parameters needed to implement a mechanistic model adds

uncertainty that might overwhelm any benefit of a mechanistic approach (O'Neill 1973). I think

that most empirical models are more accurate than their mechanistic analogues, with two

caveats: (1) that there are enough data available to adequately calibrate the empirical model and

(2) that the predictions are interpolated within the domain of the data used to calibrate the

empirical model.

The weakness of empirical models is in extrapolation. Outside the domain of the calibration

data there is just no way to know how well the empirical model will work, and in many cases the

extrapolation is known to be poor (e.g., Richardson and others 2006). I suspect that most

empirically based ecological models will not do well if extrapolated to the warmer, high-CO₂

conditions of the future; the new conditions will change productivity, allometry, competition,

phenology, and many other ecosystem characteristics and thereby alter the relationships

underlying these models.
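
The contrast between interpolation and extrapolation can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the saturating "true" response, the calibration range of 0 to 20, and the extrapolation point of 100 are all hypothetical, and the calibration data are taken to be error free so that any discrepancy comes from the form of the empirical model alone.

```python
def true_response(x):
    """Hypothetical saturating process standing in for the real system."""
    return 100.0 * x / (20.0 + x)


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Calibrate the empirical (straight-line) model on x = 0..20 only.
calib_x = [float(x) for x in range(0, 21)]
calib_y = [true_response(x) for x in calib_x]
slope, intercept = fit_line(calib_x, calib_y)


def empirical_model(x):
    """Purely descriptive model: it knows nothing about saturation."""
    return slope * x + intercept


# Largest error inside the calibration domain versus error far outside it.
max_error_inside = max(
    abs(empirical_model(x) - true_response(x)) for x in calib_x
)
error_outside = abs(empirical_model(100.0) - true_response(100.0))

print(f"largest error inside the calibration domain: {max_error_inside:.1f}")
print(f"error when extrapolated to x = 100: {error_outside:.1f}")
```

Within the calibration domain the straight line tracks the response closely; outside it the line keeps rising while the assumed process saturates, so the error grows many times larger.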

Of course, many of the most pressing environmental issues involve extrapolation into

conditions for which there is little or no data (e.g., under future CO₂ concentrations). Because of

the long response times of most ecosystems, experimental approaches cannot generate the data

needed to develop empirical models quickly enough to be of practical use. The only alternative

is to use mechanistic models.

Table 1. Relative mortality rates in relation to smoking for men and women over 65.

Numbers indicate the factor (and 95% confidence limit) by which mortality rates increase

relative to individuals of the same sex that have never smoked. Source: LaCroix and others

(1991).

                   Men                Women
Current smoker     2.1 (1.7 - 2.7)    1.8 (1.4 - 2.4)
Former smoker      1.5 (1.2 - 1.9)    1.1 (0.8 - 1.5)

But why should mechanistic models be better for extrapolation than empirical models? The

main reason is that it is often possible to empirically constrain mathematical representations of

the components of a system even when it is not possible to similarly constrain an empirical

representation of the whole system. Mechanistic models take advantage of the hierarchical

structure of ecosystems (O'Neill and others 1986) and tie system-level behaviors to the

characteristics of and interactions among the components of that system (Rastetter and Vallino

2015). Because of this hierarchical structure, the system components have to be smaller and

respond more quickly than the system itself (O'Neill and others 1986), which makes them more

tractable for experimental and observational study than the whole system. For example, the

long-term, whole-system question to be addressed might be: Will forests sequester carbon over

the next 200 years of elevated CO₂ and warming? To address this question empirically at the

ecosystem scale would require replicated experiments on whole forests that last 200 years. With

those data one might then derive empirical relationships between initial stand biomass and soil

properties and the magnitude of carbon sequestration or loss. However, such an approach is not

of much use for predicting those responses for the next 200 years because the experiment takes

too long. The mechanistic alternative is to instead conduct short-term experiments (<10 years)

on individual trees of different ages and different species, and on soils with different

characteristics and then piece that information and any other available information together in a

mechanistic model of the ecosystem to try to predict the long-term, whole-system rate of carbon

sequestration.

This mechanistic approach also has its caveats (Rastetter 1996). Although short-term

experiments can constrain representations of system components, they do not yield information

about feedbacks acting at a system level when those components are linked together. Thus there

is no way to know if the slow-responding system feedbacks that might dominate long-term

responses are adequately represented in the model. The model might therefore be corroborated

with existing short-term data even though it is inadequate for making long-term projections.

Conversely, the system-level predictions of the model might be falsified with short-term, high-

frequency data even though the long-term, slow-responding feedbacks that dominate long-term

responses, and will eventually override the short-term, high-frequency responses, are in fact

adequately represented in the model. Such models should therefore be treated with

caution (Stroeve and others 2007), but confidence in them should build slowly over time with the iterative process of

model development and testing and the accumulation of many independent sources of

corroborating evidence.

A key objective of modeling for numbers is often prediction accuracy. O'Neill (1973)

postulated a tradeoff between model complexity and errors associated with estimating the

parameters needed to represent that complexity in the model. As more process detail is

incorporated into the model, prediction of system-level dynamics should improve because more

of the processes determining those dynamics are included in the model. However, the added

model complexity comes at the price of having to estimate more parameters. Error in estimating

those parameters will propagate through the model and, as more parameters are added, prediction

accuracy will deteriorate. Thus as the model becomes more complex there should be a tradeoff

between errors associated with lack of mechanistic detail in a too-simple representation of the

system (systematic error) versus cumulative errors associated with estimation of more and more

parameters to account for those mechanistic details (estimation error). This tradeoff results in an

optimum model complexity where overall prediction error is minimized.
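
The tradeoff can be sketched numerically. The functional forms and constants below are assumptions chosen only for illustration and are not taken from O'Neill (1973): systematic error is assumed to decay exponentially as parameters are added, and estimation error is assumed to grow linearly with the number of parameters.

```python
import math


def systematic_error(n_params, initial=10.0, decay=0.5):
    """Assumed form: error from missing mechanism shrinks as detail is added."""
    return initial * math.exp(-decay * n_params)


def estimation_error(n_params, per_parameter=0.3):
    """Assumed form: each added parameter contributes its own estimation error."""
    return per_parameter * n_params


def total_error(n_params):
    """Overall prediction error as the sum of the two assumed components."""
    return systematic_error(n_params) + estimation_error(n_params)


# Scan model complexity (number of parameters) and pick the minimum.
complexities = range(1, 21)
best = min(complexities, key=total_error)

for c in complexities:
    marker = "  <- minimum" if c == best else ""
    print(f"{c:2d} parameters: total error {total_error(c):.3f}{marker}")
```

With these assumed forms the minimum falls at an intermediate complexity. The location of the optimum depends entirely on the chosen constants; only the existence of an interior minimum reflects the argument in the text.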

The O'Neill (1973) analysis, however, presupposes that the underlying structure of the

system being modeled is actually understood. Only if that structure is understood will added

model complexity be guaranteed to reduce systematic error. If it is not understood, then the

added complexity might have no relation to the real mechanism and systematic error could

actually increase. Thus, there is at least one more axis to be considered in the O'Neill (1973)

analysis, an axis reflecting how well the system is understood.

Modeling for Understanding. My ontological perspective is strictly reductionist; in

principle, all properties of a system can be explained, and therefore understood, based on the

properties of its component parts and their interactions. However, what seems straightforward in

principle is often intractable in practice. The problem is the daunting complexity of biological

systems. Bedau (2013) argues that some systems have interactions that are "too complex to

predict exactly in practice, except by crawling the causal web." In this view, emergent system

properties can be fully explained in terms of the properties of the system's component parts and their

interactions, but that explanation might be "incompressible" in the sense that the system

properties can only be replicated by simulation of the full complexity of the system (Bedau

2013).

The issue of incompressibility is hugely problematic. Taken to its extreme, it implies that a

system can only be understood from the perspective of a model that is at least as complex as the

system itself. What possible use could such a model be, other than to demonstrate that you have

"crawled the causal web" correctly? Certainly the heuristic value of such a model would be very

limited. Indeed, the formulations of many for-understanding models in ecology are selected

explicitly for ease of analytical or graphical analysis, that is, for compressibility (e.g., Lotka

1925, Volterra 1926, MacArthur and Levins 1964, Tilman 1980). Achieving this compressibility

requires a high degree of abstraction, a focus on a specific subset of system properties, and the

sacrifice of quantitative accuracy. In exchange, there can be substantial heuristic return.

Unlike for-numbers models, which at least have accurate quantitative prediction as a

common goal, it is difficult to generalize about for-understanding models except to say that they

have to be mechanistic. Otherwise, how could they address how and why questions? However,

a mechanistic model does not require inclusion of every process or mechanism ever described for

the system. As I imply above, such an approach is counterproductive; it degrades the heuristic

value of the model and therefore impedes understanding rather than enhances it. The key

modeling step, and often the most difficult aspect of modeling for understanding, is identifying

only those components and processes absolutely needed to address the question being asked.

The typical for-understanding model has three elements: (1) a characterization of the

potential behaviors of each of the relevant system components, (2) a characterization of the

interactions among these system components, and (3) a set of boundary conditions that specify,

for example, the initial properties of the system components and the influence of any factors

outside the system on the components of the system. The very best for-understanding models are

elegantly simple.

A classic example of an elegant for-understanding model is the Lotka-Volterra model of the

interactions among competing species (Lotka 1925, Volterra 1926):

\[
\frac{dN_i}{dt} = r_i N_i \, \frac{K_i - \sum_{j=1}^{n} \alpha_{ji} N_j}{K_i} \tag{1}
\]

where $N_i$ is the number of individuals of species i, $r_i$ is the intrinsic growth parameter for

species i, $K_i$ is the number of individuals of species i that the environment is able to support in

the absence of competition (carrying capacity), $\alpha_{ji}$ is the number of individuals of species i

that are displaced from the carrying capacity by one individual of species j, n is the number of

competing species, and t is time. The components of the system are the environment, characterized

by the carrying capacities for each of the n species ($K_i$), and the n species of competing

populations, characterized by their intrinsic rates of growth ($r_i N_i$). The environment interacts

with each of the species through a density-dependent feedback that slows the rate of population

growth as the population size approaches the environment's carrying capacity for that species

($[K_i - N_i]/K_i$; here I have assumed $\alpha_{ii} = 1$; Fig. 1). Each species j interacts with the

other species i by reducing the carrying capacity of the environment for species i in proportion to

the abundance of species j ($\alpha_{ji} N_j$). The only boundary conditions needed are the initial

sizes of each of the n populations.
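
The three elements of the model (component behaviors, interactions, and boundary conditions) map directly onto a short simulation. The following is a minimal sketch, assuming simple forward-Euler integration and a hypothetical two-species parameter set; only r = 0.01 and K = 100 are taken from the Fig. 1 caption, while the competition coefficients of 0.5, the initial population sizes, and the step settings are illustrative assumptions.

```python
def lv_rates(N, r, K, alpha):
    """Right-hand side of the Lotka-Volterra competition equation (Eq. 1).

    alpha[j][i] is the number of individuals of species i displaced
    by one individual of species j; alpha[i][i] is 1.
    """
    n = len(N)
    rates = []
    for i in range(n):
        crowding = sum(alpha[j][i] * N[j] for j in range(n))
        rates.append(r[i] * N[i] * (K[i] - crowding) / K[i])
    return rates


def simulate(N0, r, K, alpha, dt=1.0, steps=5000):
    """Forward-Euler integration (an assumed, deliberately simple scheme)."""
    N = list(N0)  # boundary conditions: initial population sizes
    for _ in range(steps):
        dN = lv_rates(N, r, K, alpha)
        N = [x + dt * d for x, d in zip(N, dN)]
    return N


# One species alone approaches its carrying capacity K = 100.
single = simulate([10.0], r=[0.01], K=[100.0], alpha=[[1.0]])

# Two symmetric competitors (hypothetical alpha = 0.5) coexist below K.
pair = simulate(
    [10.0, 20.0],
    r=[0.01, 0.01],
    K=[100.0, 100.0],
    alpha=[[1.0, 0.5], [0.5, 1.0]],
)

print("single species:", round(single[0], 2))
print("two competitors:", [round(x, 2) for x in pair])
```

For the symmetric two-species case, setting the rates in Eq. 1 to zero gives an equilibrium of K/(1 + 0.5) for each species, which the simulated populations approach.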

The Lotka-Volterra model spawned a great deal of

research up through the early 1980s (e.g., Gause

1934, MacArthur and Wilson 1967, Parry

1981). This research sought to examine the

nature of competition, the structuring of

communities, and the struggle for existence.

The model is still used today, but mostly as a

component within larger models (e.g., Pao

2015). However, perhaps the most important

legacy of this or any for-understanding model is

that through its limitations it inspires a new

generation of models.

The most influential of these next-

generation models is one developed by

MacArthur and Levins (1964) and further

developed and applied especially by Tilman

(1977, 1980):

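\[
\frac{dB_i}{dt} = g_i B_i \min_{j=1,\dots,p}\left(\frac{R_j}{k_{ij} + R_j}\right) - m_i B_i \tag{2}
\]

\[
\frac{dR_j}{dt} = S_j - \sum_{i=1}^{n} q_{ij}\, g_i B_i \min_{k=1,\dots,p}\left(\frac{R_k}{k_{ik} + R_k}\right) \tag{3}
\]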

where $B_i$ is the biomass of species i, $g_i$ and $m_i$ are the growth and turnover parameters for

Figure 1: Comparison of net growth versus species abundance for Eqs. 1, 2, and 5. Upper panel -

Lotka (1925) and Volterra (1926) model (Eq. 1) with $r_i$ = 0.01, $\alpha_{ii}$ = 1, and $\alpha_{ji}$ = 0

for j ≠ i. Middle panel - MacArthur and Levins (1964) model (Eq. 2) with $g_i$ = 0.1, $k_{ij}$ = 10,

and $m_i$ = 0.05, and with R, the most limiting resource, held constant at the specified value.

Lower panel - Rastetter and Ågren (2002) model (Eq. 5) with $g_i$ = 0.125, $k_{ij}$ = 10, $m_i$ = 0.05,

[…]$_i$ = 0.001, and […]$_i$ = 0.00431, and with R, the most limiting resource, held constant at the

specified value.

[Figure 1 image: three panels titled Lotka-Volterra, MacArthur & Levins, and Rastetter & Ågren; curve labels K = 100, R = R* = 10, and R = 10.9; x-axis: Individuals (N) or biomass (B), 0 to 100; y-axis: Net growth (dN/dt or dB/dt), -0.4 to 0.4.]