Visualization of Regression Models Using visreg

01 Jan 2017-R Journal (The R Foundation)-Vol. 9, Iss: 2, pp 56-71
CONTRIBUTED RESEARCH ARTICLE 56
Visualization of Regression Models
Using visreg
by Patrick Breheny and Woodrow Burchett
Abstract
Regression models allow one to isolate the relationship between the outcome and an explanatory variable while the other variables are held constant. Here, we introduce an R package, visreg, for the convenient visualization of this relationship via short, simple function calls. In addition to estimates of this relationship, the package also provides pointwise confidence bands and partial residuals to allow assessment of variability as well as outliers and other deviations from modeling assumptions. The package provides several options for visualizing models with interactions, including lattice plots, contour plots, and both static and interactive perspective plots. The implementation of the package is designed to be fully object-oriented and interface seamlessly with R’s rich collection of model classes, allowing a consistent interface for visualizing not only linear models, but generalized linear models, proportional hazards models, generalized additive models, robust regression models, and many more.
Introduction
In simple linear regression, it is both straightforward and extremely useful to plot the regression line.
The plot tells you everything you need to know about the model and what it predicts. It is common to
superimpose this line over a scatter plot of the two variables. A further refinement is the addition of
a confidence band. Thus, in one plot, the analyst can immediately assess the empirical relationship between x and y in addition to the relationship estimated by the model and the uncertainty in that estimate, and also assess how well the two agree and whether assumptions may be violated.
Multiple regression models address a more complicated question: what is the relationship between
an explanatory variable and the outcome as the other explanatory variables are held constant? This
relationship is just as important to visualize as the relationship in simple linear regression, but doing
so is not nearly as common in statistical practice.
As models get more complicated, it becomes more difficult to construct these sorts of plots. With
multiple variables, we cannot simply plot the observed data, as this does not hold the other variables
constant. Interactions among variables, transformations, and non-linear relationships all add extra
barriers, making it time-consuming for the analyst to construct these plots. This is unfortunate, because as models grow more complex, there is an even greater need to represent them with clear illustrations.
In this paper, we aim to eliminate the hurdle of implementation through the development of a simple interface for visualizing regression models arising from a wide class of models: linear models, generalized linear models, robust regression models, additive models, proportional hazards models, and more. We implement this interface in R and provide it as the package visreg, publicly available from the Comprehensive R Archive Network. The purpose of the package is to automate the work involved in plotting regression functions, so that after fitting one of the above types of models, the analyst can construct attractive and illustrative plots with simple, one-line function calls. In particular, visreg offers several tools for the visualization of models containing interactions, which are among the easiest to misinterpret and the hardest to explain.
It is worth noting that there are two distinct goals involved in plotting regression models: illustrat-
ing the fitted model visually and diagnosing violations of model assumptions through examination of
residuals. The approach taken by visreg is to construct a single plot that simultaneously addresses both goals. This is not a new idea. Indeed, this project was inspired by the work of Trevor Hastie, Robert Tibshirani, and Simon Wood, who have convincingly demonstrated the utility of these types of plots in the context of generalized additive models (Hastie and Tibshirani, 1990; Wood, 2006).
In particular, visreg offers partial residuals, which can be defined for any regression model and are easily superimposed on visualization plots. Partial residuals are widely useful in detecting many types of problems, although several authors have pointed out that they are not without limitations (Mallows, 1986; Cook, 1993). Various extensions and modifications of partial residuals have been proposed, and there is an extensive literature on regression diagnostics (Belsley et al., 1980; Cook and Weisberg, 1982); indeed, many diagnostics are specific to the type of model (e.g., Pregibon, 1981; Grambsch and Therneau, 1994; Loy and Hofmann, 2013). Partial residuals are a useful, easily generalized idea that can be applied to virtually any type of model, although it is certainly worth being aware of other types of diagnostics that are specific to the modeling framework in question.
There are a number of R packages that offer functions for visualizing regression models, including
The R Journal Vol. 9/2, December 2017 ISSN 2073-4859

rms (Harrell, 2015), rockchalk (Johnson, 2016), car (Fox and Weisberg, 2011), effects (Fox, 2003), and, in base R, the termplot function. The primary advantage of visreg over these alternatives is that each of them is specific to visualizing a certain class of model, usually lm or glm. visreg, by virtue of its object-oriented approach, works with any model that provides a predict method, meaning that it can be used with hundreds of different R packages as well as user-defined model classes. We also feel that visreg offers a simpler interface and produces nicer-looking plots, but admit that beauty is in the eye of the beholder. Nevertheless, there are situations in which each of these packages is very useful and offers some features that others do not, such as greater flexibility for other types of residuals (car) and better support for visualizing three-way interactions (effects).
Each type of model has different mathematical details. All models, however, describe how the response is expected to vary as a function of the explanatory variables. In R, this is implemented for an extensive catalog of models that provide an associated predict method. Although there are no explicit rules forcing programmers to write predict methods for a given class in a consistent manner, there is a widely agreed-upon convention to follow the general syntax of predict.lm. It is this abstraction upon which visreg is based: the use of object-oriented programming to provide a single tool with a consistent interface for the convenient visualization of a wide array of models.
There are thousands of R packages, many of which provide an implementation of some type of model. It is impossible for any programmer or team of programmers to write an R package that is familiar with the details of all of them. However, the encapsulation and abstraction offered by an object-oriented programming language allow for an elegant solution to this problem. By passing a fitted model object to visreg, we can call the predict method provided by that model class to obtain appropriate predictions and standard errors without needing to know any of the details concerning how those calculations work for that type of model; the same applies to the construction of residuals through the residuals method.
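As a concrete sketch of this abstraction (our illustration, not code from the paper), the generic predict() call below dispatches to predict.lm and returns both fits and standard errors; any model class that follows this convention can be handled through the same interface:

```r
# Sketch of the predict() abstraction: generic dispatch returns predictions
# and standard errors without any class-specific code on the caller's side.
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
nd <- data.frame(Solar.R = 200, Wind = 10, Temp = 80)
p <- predict(fit, newdata = nd, se.fit = TRUE)  # dispatches to predict.lm
c(fit = unname(p$fit), se = unname(p$se.fit))
```

Swapping `lm` for, say, a robust or additive model fitter leaves the calling code unchanged, which is exactly the property visreg exploits.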
The only other R package that we are aware of that provides this kind of object-oriented flexibility is plotmo by Stephen Milborrow. The visreg and plotmo projects were each started independently around the year 2011 and have developed into mature, widely used packages for model visualization. The organization and syntax of the packages are quite different, but both are based on the idea of using the generic predict and residuals methods provided by a model class to offer a single interface capable of visualizing virtually any type of model. The primary difference between the two packages is that plotmo separates the visualization of models and the plotting of residuals, constructed using the plotmo() and plotres() functions, respectively, while as mentioned earlier, visreg combines the two into a single plot (plotmo offers an option to superimpose the unadjusted response onto a plot, but this is very different from plotting partial residuals). Furthermore, as one would expect, each package offers a few options that the other does not. For example, plotmo offers the ability to construct partial dependence plots (Hastie et al., 2009), while visreg offers options for contrast plots and what we call “cross-sectional” plots (Figs. 6, 7, and 8). Broadly speaking, plotmo is somewhat more oriented towards machine learning-type models, while visreg is more oriented towards regression models, though both packages can be used for either purpose. In particular, plotmo supports the X,y syntax used by packages like glmnet, which is more popular among machine learning packages, while visreg focuses exclusively on models that use a formula-based interface.
The outline of the paper is as follows. In “Conditional and contrast plots”, we explicitly define the relevant mathematical details for what appears in visreg’s plots. The remainder of the article is devoted to illustrating the interface and results produced by the software in three extensions of simple linear regression: multiple (additive) linear regression models, models that possess interactions, and finally, other sorts of models, such as generalized linear models, proportional hazards models, random effect models, random forests, etc.
Conditional and contrast plots
We begin by considering regression models, where all types of visreg plots are well-developed and clearly defined. At the end of this section, we describe how these ideas can be extended generically to any model capable of making predictions.
In a regression model, the relationship between the outcome and the explanatory variables is expressed in terms of a linear predictor η:

    η = Xβ = Σ_j x_j β_j,    (1)
where x_j is the jth column of the design matrix X. For the sake of clarity, we focus in this section on linear regression, in which the expected value of the outcome E(Y_i) equals η_i; extensions to other, nonlinear models are discussed in “Other models”. In the absence of interactions (see “Linear models with interactions”), the relationship between X_j and Y is neatly summarized by β_j, which expresses the amount by which the expected value of Y changes given a one-unit change in X_j.
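To make equation (1) concrete, the linear predictor can be computed explicitly from the design matrix. This short sketch (ours, using the airquality data introduced later in the article) checks that Xβ̂ reproduces the fitted values of a linear model:

```r
# Compute the linear predictor eta = X %*% beta-hat by hand and compare it
# with the fitted values returned by lm().
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
X <- model.matrix(fit)          # design matrix, including the intercept column
eta <- drop(X %*% coef(fit))    # eta = X beta-hat
all.equal(unname(eta), unname(fitted(fit)))  # the two agree
```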
Partial residuals are a natural multiple regression analog to plotting the observed x and y in simple linear regression. Partial residuals were developed by Ezekiel (1924), rediscovered by Larsen and McCleary (1972), and have been discussed in numerous papers and textbooks ever since (Wood, 1973; Atkinson, 1982; Kutner et al., 2004). Letting r denote the vector of residuals for a given model fit, the partial residuals belonging to variable j are defined as

    r_j = y − X_{−j} β̂_{−j}    (2)
        = r + x_j β̂_j,    (3)

where the −j subscript refers to the portion of X or β that remains after the jth column/element is removed.
The reason partial residuals are a natural extension to the multiple regression setting is that the slope of the simple linear regression of r_j on x_j is equal to the value β̂_j that we obtain from the multiple regression model (Larsen and McCleary, 1972).
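This slope property is easy to verify numerically. In the sketch below (our illustration, again using the airquality data from the following sections), the simple regression of the partial residuals for Wind on Wind recovers the multiple-regression coefficient exactly:

```r
# Verify: the slope of lm(partial residuals ~ x_j) equals beta-hat_j.
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
d <- model.frame(fit)                                # rows actually used in the fit
r_j <- residuals(fit) + d$Wind * coef(fit)["Wind"]   # r + x_j * beta-hat_j, eq. (3)
slope <- coef(lm(r_j ~ d$Wind))[2]
all.equal(unname(slope), unname(coef(fit)["Wind"]))  # slopes agree
```

The agreement is exact (up to floating point) because the residuals r are orthogonal to every column of the design matrix, so regressing r + x_j β̂_j on x_j contributes slope 0 + β̂_j.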
Thus, it would seem straightforward to visualize the relationship between X_j and Y by plotting a line with slope β̂_j through the partial residuals. Clearly, however, we may add any constant to the line and to r_j and the above result would still hold. Nor is it obvious how the confidence bands should be calculated.
We consider asking two subtly different questions about the relationship between X_j and Y:

(1) What is the relationship between E(Y) and X_j given x_{−j} = x*_{−j}?

(2) How do changes in X_j relative to a reference value x*_j affect E(Y)?

The biggest difference between the two questions is that the first requires specification of some x*_{−j}, whereas the second does not. The reward for specifying x*_{−j} is that specific values for the predicted E(Y) may be plotted on the scale of the original variable Y; the latter type of plot can address only relative changes. Here, we refer to the first type of plot as a conditional plot, and the second type as a contrast plot. As we will see, the two questions produce regression lines with identical slopes, but with different intercepts and confidence bands. It is worth noting that these are not the only possible questions; other possibilities, such as “What is the marginal relationship between X_j and Y, integrating over X_{−j}?” exist, although we do not explore them here.
For a contrast plot, we consider the effect of changing X_j away from an arbitrary point x*_j; the choice of x*_j thereby determines the intercept, as the line by definition passes through (x*_j, 0). The equation of this line is y = (x − x*_j) β̂_j. For a continuous X_j, we set x*_j equal to x̄_j. The confidence interval at the point x_j = x is based on

    V(x) = V{η̂(x) − η̂(x*_j)} = (x − x*_j)² V(β̂_j).
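The contrast variance above reduces to a single element of the coefficient covariance matrix. This sketch (ours) checks that (x − x*_j)² V(β̂_j) agrees with the general quadratic form λᵀ V(β̂) λ for a contrast in Wind, with the reference value x*_j taken as the mean of Wind:

```r
# Contrast-plot variance: V{eta-hat(x) - eta-hat(x*_j)} = (x - x*_j)^2 * V(beta-hat_j).
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
xstar <- mean(model.frame(fit)$Wind)      # reference value x*_j = mean of Wind
x <- 15                                   # an arbitrary point on the Wind axis
lam <- c(0, 0, x - xstar, 0)              # contrast vector over (Int, Solar.R, Wind, Temp)
v1 <- drop(t(lam) %*% vcov(fit) %*% lam)  # general quadratic form
v2 <- (x - xstar)^2 * vcov(fit)["Wind", "Wind"]
all.equal(v1, unname(v2))                 # the two computations agree
```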
When X_j is categorical, we plot differences between each level of the factor and the reference category (see Figure 3 for an example); in this case, we are literally plotting contrasts in the classical ANOVA sense of the term (hence the name). Our usage of the term “contrast” for continuous variables is somewhat looser, but still logical in the sense that it estimates the contrast between a value of X_j and the reference value.
For a conditional plot, on the other hand, all explanatory variables are fully specified by x and x*_{−j}. Let λ(x)ᵀ denote the row of the design matrix that would be constructed from x_j = x and x*_{−j}. Then the equation of the line is y = λ(x)ᵀ β̂ and the confidence interval at x is based on

    V(x) = V{λ(x)ᵀ β̂} = λ(x)ᵀ V(β̂) λ(x).
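The conditional estimate and its variance can likewise be checked against predict(). In this sketch (ours), λ(x) is built by hand for the additive model used in the next section and matched to the fit and standard error returned by predict.lm:

```r
# Conditional-plot quantities: y = lambda(x)' beta-hat and V(x) = lambda' V(beta-hat) lambda,
# compared with the output of predict(..., se.fit = TRUE).
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
nd <- data.frame(Solar.R = 200, Wind = 10, Temp = 80)
lam <- c(1, nd$Solar.R, nd$Wind, nd$Temp)   # row of the design matrix at x
p <- predict(fit, newdata = nd, se.fit = TRUE)
all.equal(unname(sum(lam * coef(fit))), unname(p$fit))                  # lambda' beta-hat
all.equal(drop(sqrt(t(lam) %*% vcov(fit) %*% lam)), unname(p$se.fit))   # sqrt of V(x)
```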
In both conditional and contrast plots, the confidence interval at x is then formed around the estimate in the usual manner by adding and subtracting t_{n−p, 1−α/2} √V(x), where t_{n−p, 1−α/2} is the 1 − α/2 quantile of the t distribution with n − p degrees of freedom. Examples of contrast plots and conditional plots are given in Figures 2 and 3. Both plots depict the same relationship between wind and ozone level as estimated by the same model (details given in the following section). Note the difference, however, in the vertical scale and confidence bands. In particular, the confidence interval for the contrast plot has zero width at x*_j; all other things remaining the same, if we do not change X_j, we can say with certainty that E(Y) will not change either. There is still uncertainty, however, regarding the actual value of E(Y), which is illustrated in the fact that the confidence interval of the conditional plot has positive width everywhere.
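The band construction can be reproduced directly. The sketch below (ours) rebuilds the upper limit from the t quantile and standard error and matches the confidence interval that predict.lm reports:

```r
# Wald band: estimate +/- t_{n-p, 1-alpha/2} * sqrt(V(x)), with alpha = 0.05.
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
nd <- data.frame(Solar.R = 200, Wind = 10, Temp = 80)
p <- predict(fit, newdata = nd, se.fit = TRUE, interval = "confidence")
upper <- p$fit[, "fit"] + qt(0.975, df = p$df) * p$se.fit  # hand-built upper limit
all.equal(unname(upper), unname(p$fit[, "upr"]))           # matches predict.lm's band
```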
This description of confidence intervals focuses on Wald-type confidence intervals of the form estimate ± multiple of the standard error, constructed on the scale of the linear predictor. This is the most common type of interval provided by modeling packages in R, and the only one for which a widely agreed-upon, object-oriented consensus has emerged in terms of what the predict method returns. For this reason, this is usually the only type of interval available for plotting by visreg. However, it should be noted that these intervals are common for their convenience, not due to superiority; it is typically the case that more accurate confidence intervals exist (see, for example, Efron, 1987; Withers and Nadarajah, 2012). In principle, one could plot other types of intervals, but visreg does not calculate intervals itself so much as plot the intervals that the modeling package returns. Thus, unless the modeling package provides methods for calculating other types of intervals, visreg is restricted to plotting Wald intervals.
Contrast plots can only be constructed for regression-based models, as they explicitly require an additive decomposition in terms of a design matrix and coefficients. Conditional plots, however, can be constructed for any model that produces predictions. Denote this prediction f(x), where x is a vector of predictors for the model. Writing this as a one-dimensional function of predictor j with the remaining predictors fixed at x*_{−j}, let us express this prediction as f(x | x*_{−j}). In a conditional plot, the partial residuals for predictor j are

    r_j = r + x_j β̂_j + (x*_{−j})ᵀ β̂_{−j} = r + f(x | x*_{−j}),

which offers a clear procedure for constructing the equivalent of partial residuals for general prediction models. Note that this construction requires the model class to implement a residuals method. If a model class lacks a residuals method, visreg will still produce a plot, but must omit the partial residuals; see “Non-regression models” for additional details. Likewise, visreg requires the predict method for the model class to return standard errors in order to plot confidence intervals; see “Hierarchical and random effect models” for an example in which standard errors are not returned.
It is worth mentioning that visreg is only concerned with confidence bands for the conditional mean E(Y|X), not “prediction intervals” that have a specified probability of containing a future outcome Y observed for a certain value of X. Unlike standard errors for the mean, very few model classes in R offer methods for calculating such intervals; indeed, such intervals are often not well-defined outside of classical linear models.
Additive linear models
We are now ready to describe the basic framework and usage of visreg. In this section, we will fit various models to a data set involving the relationship between air quality (in terms of ozone concentration) and various aspects of weather in the standard R data set airquality.
Basic framework
The basic interface to the package is the function visreg, which requires only one argument: the fitted model object. So, for example, the following code produces Figure 1:
fit <- lm(Ozone ~ Solar.R + Wind + Temp, data=airquality)
visreg(fit)
By default, visreg provides conditional plots for each of the explanatory variables in the model. For the conditioning, the other variables in x*_{−j} are set to their median for numeric variables and to the most common category for factors. All of these options can be modified by passing additional arguments to visreg. For example, contrast plots can be obtained with the type argument; the following code produces Figure 2.
visreg(fit, "Wind", type="contrast")
visreg(fit, "Wind", type="conditional")
The second argument specifies the explanatory variable to be visualized; note that the right plot in
Figure 2 is the same as the middle plot in Figure 1.
Figure 1: Basic output of visreg for an additive linear model: conditional plots for each explanatory variable.

Figure 2: The estimated relationship between wind and ozone concentration in the same model, as illustrated by two different types of plots. Left: Contrast plot. Right: Conditional plot.

In addition to continuous explanatory variables, visreg also allows the easy visualization of differences between the levels of categorical variables (factors). The following block of code creates a factor called Heat by discretizing Temp, and then visualizes its relationship with Ozone, producing the plot in Figure 3.
airquality$Heat <- cut(airquality$Temp, 3, labels=c("Cool", "Mild", "Hot"))
fit.heat <- lm(Ozone ~ Solar.R + Wind + Heat, data=airquality)
visreg(fit.heat, "Heat", type="contrast")
visreg(fit.heat, "Heat", type="conditional")
Figure 3: Visualization of a regression function involving a categorical explanatory variable. Left: Contrast plot. Right: Conditional plot.
Again, note that the confidence interval for the contrast plot has zero width for the reference category. There is no uncertainty about how the expected value of ozone will change if we remain at the same level of Heat; it is zero by definition. On the other hand, the confidence interval for Mild heat is wider for the contrast plot than it is for the conditional plot. There is less uncertainty about the expected value of ozone on a mild day than there is about the difference in expected values between mild and cool days.

Citations
More filters
Journal ArticleDOI
TL;DR: In this article, the authors examined global N and P limitation using the ratio of site-averaged leaf resorption efficiencies of the dominant species across 171 sites and evaluated their predictions using a global database of N- and P-limitation experiments based on nutrient additions at 106 and 53 sites, respectively.
Abstract: Nitrogen (N) and phosphorus (P) limitation constrains the magnitude of terrestrial carbon uptake in response to elevated carbon dioxide and climate change. However, global maps of nutrient limitation are still lacking. Here we examined global N and P limitation using the ratio of site-averaged leaf N and P resorption efficiencies of the dominant species across 171 sites. We evaluated our predictions using a global database of N- and P-limitation experiments based on nutrient additions at 106 and 53 sites, respectively. Globally, we found a shift from relative P to N limitation for both higher latitudes and precipitation seasonality and lower mean annual temperature, temperature seasonality, mean annual precipitation and soil clay fraction. Excluding cropland, urban and glacial areas, we estimate that 18% of the natural terrestrial land area is significantly limited by N, whereas 43% is relatively P limited. The remaining 39% of the natural terrestrial land area could be co-limited by N and P or weakly limited by either nutrient alone. This work provides both a new framework for testing nutrient limitation and a benchmark of N and P limitation for models to constrain predictions of the terrestrial carbon sink. Spatial patterns in the phosphorus and nitrogen limitation in natural terrestrial ecosystems are reported from analysis of a global database of the resorption efficiency of nutrients by leaves.

426 citations

Journal ArticleDOI
03 Mar 2017-Science
TL;DR: Analysis of plant distributions, archaeological sites, and environmental data indicates that modern tree communities in Amazonia are structured to an important extent by a long history of plant domestication by Amazonian peoples.
Abstract: The extent to which pre-Columbian societies altered Amazonian landscapes is hotly debated. We performed a basin-wide analysis of pre-Columbian impacts on Amazonian forests by overlaying known archaeological sites in Amazonia with the distributions and abundances of 85 woody species domesticated by pre-Columbian peoples. Domesticated species are five times more likely than nondomesticated species to be hyperdominant. Across the basin, the relative abundance and richness of domesticated species increase in forests on and around archaeological sites. In southwestern and eastern Amazonia, distance to archaeological sites strongly influences the relative abundance and richness of domesticated species. Our analyses indicate that modern tree communities in Amazonia are structured to an important extent by a long history of plant domestication by Amazonian peoples.

398 citations

Journal ArticleDOI
TL;DR: Temperature-dependent transmission based on a mechanistic model is an important predictor of human transmission occurrence and incidence in tropical and subtropical regions and in temperate areas even if vectors are present.
Abstract: Recent epidemics of Zika, dengue, and chikungunya have heightened the need to understand the seasonal and geographic range of transmission by Aedes aegypti and Ae. albopictus mosquitoes. We use mechanistic transmission models to derive predictions for how the probability and magnitude of transmission for Zika, chikungunya, and dengue change with mean temperature, and we show that these predictions are well matched by human case data. Across all three viruses, models and human case data both show that transmission occurs between 18-34°C with maximal transmission occurring in a range from 26-29°C. Controlling for population size and two socioeconomic factors, temperature-dependent transmission based on our mechanistic model is an important predictor of human transmission occurrence and incidence. Risk maps indicate that tropical and subtropical regions are suitable for extended seasonal or year-round transmission, but transmission in temperate areas is limited to at most three months per year even if vectors are present. Such brief transmission windows limit the likelihood of major epidemics following disease introduction in temperate zones.

392 citations

Journal ArticleDOI
03 Sep 2020-Cell
TL;DR: This study provides a framework for interrogating how complex biological processes, such as antitumoral immunity, occur through concerted actions of cells and spatial domains in effective versus ineffective tumor control.

350 citations


Cites background or methods from "Visualization of Regression Models ..."

  • ...The partial residual plot in Figure 6J was created using the visreg R package (Breheny and Burchett, 2013)....

    [...]

  • ...…et al., 2010 Visreg R package https://cran.r-project.org/web/ packages/visreg/index.html Breheny and Burchett, 2013 Deldir R package https://cran.r-project.org/web/ packages/deldir/index.html N/A ComplexHeatmap R package…...

    [...]

Journal ArticleDOI
TL;DR: A short-term intervention with an isocaloric low-carbohydrate diet with increased protein content in obese subjects with NAFLD and the resulting alterations in metabolism and the gut microbiota are characterized using a multi-omics approach to highlight the potential of exploring diet-microbiota interactions for treatingNAFLD.

305 citations

References
More filters
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Book
01 Jan 1989
TL;DR: Hosmer and Lemeshow as discussed by the authors provide an accessible introduction to the logistic regression model while incorporating advances of the last decade, including a variety of software packages for the analysis of data sets.
Abstract: From the reviews of the First Edition. "An interesting, useful, and well-written book on logistic regression models... Hosmer and Lemeshow have used very little mathematics, have presented difficult concepts heuristically and through illustrative examples, and have included references."- Choice "Well written, clearly organized, and comprehensive... the authors carefully walk the reader through the estimation of interpretation of coefficients from a wide variety of logistic regression models . . . their careful explication of the quantitative re-expression of coefficients from these various models is excellent." - Contemporary Sociology "An extremely well-written book that will certainly prove an invaluable acquisition to the practicing statistician who finds other literature on analysis of discrete data hard to follow or heavily theoretical."-The Statistician In this revised and updated edition of their popular book, David Hosmer and Stanley Lemeshow continue to provide an amazingly accessible introduction to the logistic regression model while incorporating advances of the last decade, including a variety of software packages for the analysis of data sets. Hosmer and Lemeshow extend the discussion from biostatistics and epidemiology to cutting-edge applications in data mining and machine learning, guiding readers step-by-step through the use of modeling techniques for dichotomous data in diverse fields. Ample new topics and expanded discussions of existing material are accompanied by a wealth of real-world examples-with extensive data sets available over the Internet.

35,847 citations

Journal ArticleDOI
TL;DR: Applied Logistic Regression, Third Edition provides an easily accessible introduction to the logistic regression model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables.
Abstract: \"A new edition of the definitive guide to logistic regression modeling for health science and other applicationsThis thoroughly expanded Third Edition provides an easily accessible introduction to the logistic regression (LR) model and highlights the power of this model by examining the relationship between a dichotomous outcome and a set of covariables. Applied Logistic Regression, Third Edition emphasizes applications in the health sciences and handpicks topics that best suit the use of modern statistical software. The book provides readers with state-of-the-art techniques for building, interpreting, and assessing the performance of LR models. New and updated features include: A chapter on the analysis of correlated outcome data. A wealth of additional material for topics ranging from Bayesian methods to assessing model fit Rich data sets from real-world studies that demonstrate each method under discussion. Detailed examples and interpretation of the presented results as well as exercises throughout Applied Logistic Regression, Third Edition is a must-have guide for professionals and researchers who need to model nominal or ordinal scaled outcome variables in public health, medicine, and the social sciences as well as a wide range of other fields and disciplines\"--

30,190 citations


"Visualization of Regression Models ..." refers to methods in this paper

  • ...We begin with a logistic regression model applied to a study investigating risk factors associated with low birth weight (Hosmer and Lemeshow, 2000)....

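The usage quoted above can be sketched in R. The following is a minimal illustration, not code from the paper itself; it assumes the `birthwt` data set shipped with the MASS package (the same low-birth-weight study as Hosmer and Lemeshow, 2000) and that the visreg package is installed:

```r
# Logistic regression for low birth weight, then an isolated-effect plot
library(MASS)    # provides the birthwt data set
library(visreg)  # visualization of regression models

fit <- glm(low ~ age + lwt + factor(race) + smoke,
           data = birthwt, family = binomial)

# Plot the relationship between mother's weight (lwt) and the outcome
# on the probability scale, with the other variables held constant;
# visreg adds pointwise confidence bands by default
visreg(fit, "lwt", scale = "response")
```

Here `scale = "response"` asks visreg to draw the fitted relationship on the probability scale rather than the linear-predictor scale.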

Book
13 Aug 2009
TL;DR: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics.
Abstract: This book describes ggplot2, a new data visualization package for R that uses the insights from Leland Wilkinson's Grammar of Graphics to create a powerful and flexible system for creating data graphics. With ggplot2, it's easy to: produce handsome, publication-quality plots, with automatic legends created from the plot specification; superpose multiple layers (points, lines, maps, tiles, box plots, to name a few) from different data sources, with automatically adjusted common scales; add customisable smoothers that use the powerful modelling capabilities of R, such as loess, linear models, generalised additive models and robust regression; save any ggplot2 plot (or part thereof) for later modification or reuse; create custom themes that capture in-house or journal style requirements, and that can easily be applied to multiple plots; and approach your graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying their data in an informative and attractive way. You will need some basic knowledge of R (i.e. you should be able to get your data into R), but ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page.
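The layering and smoothing capabilities described in this abstract can be illustrated with a short sketch; this is a minimal example, not from the book, using only ggplot2 and R's built-in `mtcars` data:

```r
library(ggplot2)

# Points plus a model-based smoother, layered on the same data,
# with an automatic legend created from the colour mapping
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +  # linear-model smoother with band
  theme_bw()                               # apply an alternative theme
print(p)

# Save the most recent plot for later modification or reuse
ggsave("mpg-vs-wt.png", width = 5, height = 4)
```

Each `geom_*()` call adds a layer; because both layers share the same aesthetic mapping, their scales and legend are adjusted automatically.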

29,504 citations

Book
28 Jul 2013
TL;DR: In this book, the authors describe the important ideas in these areas in a common conceptual framework; the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap.
Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations