A PANIC ATTACK ON UNIT ROOTS AND COINTEGRATION
Jushan Bai∗
Serena Ng†
December 2001
Abstract
This paper develops a new methodology that makes use of the factor structure of large
dimensional panels to understand the nature of non-stationarity in the data. We refer to it
as PANIC: a ‘Panel Analysis of Non-stationarity in Idiosyncratic and Common components’.
PANIC consists of univariate and panel tests with a number of novel features. It can detect
whether the non-stationarity is pervasive, or variable-specific, or both. It tests the components
of the data instead of the observed series. Inference is therefore more accurate when the
components have different orders of integration. PANIC also permits the construction of valid panel
tests even when cross-section correlation invalidates pooling of statistics constructed using the
observed data. The key to PANIC is consistent estimation of the components even when the
regressions are individually spurious. We provide a rigorous theory for estimation and inference.
In Monte Carlo simulations, the tests have very good size and power. PANIC is applied to a
panel of inflation series.
Keywords: Panel data, common factors, common trends, principal components
∗ Dept. of Economics, Boston College, Chestnut Hill, MA 02467. Email: Jushan.Bai@bc.edu
† Dept. of Economics, Johns Hopkins University, Baltimore, MD 21218. Email: Serena.Ng@jhu.edu
This paper was presented at the NSF 2001 Summer Symposium on Econometrics and Statistics in Berkeley, California,
the CEPR/Banca d’Italia Conference in Rome, and at NYU. We thank the seminar participants and the discussants
(Andrew Harvey and Marco Lippi) for many helpful comments. We also thank Todd Clark for providing us with the
inflation data. The first author acknowledges financial support from the NSF (grant SBR 9709508).
1 Introduction
Knowledge of whether a series is stationary or non-stationary is important for a wide range of
economic analysis. As such, unit root testing is extensively conducted in empirical work. But
in spite of the development of many elegant theories, the power of univariate unit root tests is
severely constrained in practice by the short span of macroeconomic time series. Panel unit root
tests have since been developed with the goal of increasing power through pooling information
across units. But pooling is valid only if the units are independent, an assumption that is perhaps
unreasonable given that many economic models imply, and the data support, the comovement of
economic variables.
In this paper, we propose a new approach to understanding non-stationarity in the data, both
on a series by series basis, and from the viewpoint of a panel. Rather than treating the cross-
section correlation as a nuisance, we exploit these comovements to develop new univariate statistics
and valid pooled tests for the null hypothesis of non-stationarity. Our tests are applied to two
components of the data, one with the characteristic that it is strongly correlated with many series,
and one with the characteristic that it is largely unit specific. More precisely, we consider a factor
analytic model:
$$X_{it} = D_{it} + \lambda_i' F_t + e_{it}$$
where $D_{it}$ is a polynomial trend function of order p, $F_t$ is an r × 1 vector of common factors, and
$\lambda_i$ is a vector of factor loadings. The series $X_{it}$ is the sum of a deterministic component $D_{it}$, a
common component $\lambda_i' F_t$, and an error $e_{it}$ that is largely idiosyncratic. A factor model with N
variables will have N idiosyncratic components but a small number of common factors.¹
A series with a factor structure is non-stationary if one or more of the common factors are
non-stationary, or the idiosyncratic error is non-stationary, or both. Except by assumption, there is
nothing that restricts $F_t$ to be all I(1) or all I(0). There is also nothing that rules out the possibility
that $F_t$ and $e_{it}$ are integrated of different orders. These are not merely cases of theoretical interest,
but also of empirical relevance. As an example, let $X_{it}$ be real output of country i. It may consist
of a global trend component $F_{1t}$, a global cyclical component $F_{2t}$, and an idiosyncratic component
($e_{it}$) that may or may not be stationary. As another example, the inflation rate of durable goods
may consist of a component that is common to all prices, and a component that is specific to
durable goods.
¹ This is a static factor model, and is to be distinguished from the dynamic factor model being analyzed in Forni,
Hallin, Lippi and Reichlin (2000).
It is well known that the sum of two time series can have dynamic properties very different
from the individual series themselves. If one component is I(1) and one is I(0), it could be difficult
to establish that a unit root exists from observations on $X_{it}$ alone, especially if the stationary
component is large. Unit root tests on $X_{it}$ can be expected to be oversized while stationarity
tests will have no power. The issue is documented in Schwert (1989), and formally analyzed in
Pantula (1991), Ng and Perron (2001), among others, in the context of a negative moving-average
component in the first-differenced data.
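To see where the negative moving-average component comes from, consider a minimal sketch in which a series is the sum of a random walk and white noise. If $X_t = F_t + e_t$ with $\Delta F_t = u_t$ and $e_t = v_t$, then $\Delta X_t = u_t + v_t - v_{t-1}$, whose lag-1 autocovariance is $-\sigma_v^2$. The function names, the Gaussian draws, and the parameters `sigma_u` and `sigma_v` below are assumptions introduced for illustration only; they do not appear in the paper.

```python
import numpy as np

def theoretical_lag1_autocorr(sigma_u: float, sigma_v: float) -> float:
    """Lag-1 autocorrelation of the first difference of (random walk + white noise).

    Var(dX) = sigma_u^2 + 2 sigma_v^2 and Cov(dX_t, dX_{t-1}) = -sigma_v^2.
    """
    return -sigma_v**2 / (sigma_u**2 + 2.0 * sigma_v**2)

def simulated_lag1_autocorr(sigma_u: float, sigma_v: float,
                            T: int = 200_000, seed: int = 0) -> float:
    """Sample lag-1 autocorrelation of dX from one simulated path (illustrative)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=T)   # innovations of the I(1) component
    v = rng.normal(0.0, sigma_v, size=T)   # the I(0) component itself
    X = np.cumsum(u) + v                   # observed series
    dX = np.diff(X)
    dX = dX - dX.mean()
    return float(dX[1:] @ dX[:-1] / (dX @ dX))
```

As `sigma_v` grows relative to `sigma_u`, the autocorrelation approaches -1/2, which corresponds to a moving-average root close to the unit circle in the differenced series.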
Instead of testing for the presence of unit roots in $X_{it}$, the approach proposed in this paper is
to test the common factors and the idiosyncratic components separately. We refer to such a Panel
Analysis of Non-stationarity in the Idiosyncratic and Common components as PANIC. PANIC
allows us to determine if non-stationarity comes from a pervasive or an idiosyncratic source. To
our knowledge, there does not exist a test in the literature for this purpose. PANIC can also
potentially resolve three econometric problems. The first is the size issue relating to summing
series with different orders of integration just mentioned. The second is a consequence of the fact
that the idiosyncratic components in a factor model can only be weakly correlated across i by design.
In contrast, $X_{it}$ will be strongly correlated across units if the data obey a factor structure. Thus,
pooled tests based upon $e_{it}$ are more likely to satisfy the cross-section independence assumption
required for pooling. The third relates to power, and follows from the fact that pooled tests exploit
cross-section information and are more powerful than univariate unit root tests.
Since the factors and the idiosyncratic components are both unobserved, and our objective is
to test if they have unit roots, the key to our analysis is consistent estimation of these components
irrespective of their stationarity properties. To this end, we propose a robust common-idiosyncratic
(I-C) decomposition of the data using large dimensional panels, that is, datasets in which the
number of observations in the time (T) and the cross-section (N) dimensions are both large. Loosely
speaking, a large N permits consistent estimation of the common variation whether or not it
is stationary, while a large T enables application of the relevant central limit theorems so that
limiting distributions of the tests can be obtained. Robustness is achieved by a ‘differencing and
re-cumulating’ estimation procedure so that I(1) and I(0) errors can be accommodated. Our
results add to the growing literature on large dimensional factor analysis by showing how consistent
estimates of the factors can be obtained using the method of principal components even without
imposing stationarity on the errors.
Our framework differs from conventional multivariate time series models in which N is small.
In small N analysis of cointegration, common trends and cycles, the estimation methodology being
employed typically depends on whether the variables considered are all I(1) or all I(0).² Pretesting
for unit roots is thus necessary. Because N is small, what is extracted is the trend or the cycle
common to just a small number of variables. Not only is the information in many potentially
relevant series left unexploited, consistent estimation of common factors is in fact not possible
when the number of variables is small. In our analysis with N and T large, the common variation
can be extracted without appealing to stationarity assumptions and/or cointegration restrictions.
This makes it possible to decouple the extraction of common trends and cycles from the issue of
testing stationarity.
² See, for example, King, Plosser, Stock and Watson (1991), Engle and Kozicki (1993), and Gonzalo and Granger
(1995).
The rest of the paper is organized as follows. In Section 2, we describe the PANIC procedures
and present asymptotic results for the Dickey-Fuller t test of the unit root hypothesis. As an
intermediate result, we establish uniform consistency of the factor estimates even when the individual
regressions are spurious. As this result is important in its own right, we devote Section 3 to the
large sample properties of the factor estimates. Section 4 uses simulations to illustrate the
properties of the factor estimates and the tests in finite samples. PANIC is then applied to a panel of
inflation data. Proofs are given in the Appendix.
2 PANIC
The data $X_{it}$ are assumed to be generated by
$$X_{it} = c_i + \beta_i t + \lambda_i' F_t + e_{it}, \qquad t = 1, \ldots, T, \tag{1}$$
$$F_{mt} = \alpha_m F_{m,t-1} + u_{mt}, \qquad m = 1, \ldots, r, \tag{2}$$
$$e_{it} = \rho_i e_{i,t-1} + \epsilon_{it}, \qquad i = 1, \ldots, N. \tag{3}$$
Factor m is stationary if $\alpha_m < 1$. The idiosyncratic error $e_{it}$ is stationary if $\rho_i < 1$. The objective
is to understand the stationarity properties of $F_{mt}$ and $e_{it}$ when these are all unobserved and must be
estimated, which we do by the method of principal components.
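As a concrete illustration of the data generating process in (1)-(3), the following sketch simulates a panel with user-chosen autoregressive parameters. It is not the authors' simulation design: the function name `simulate_panel`, the standard normal draws for the intercepts, loadings and innovations, the zero initial conditions, and the restriction to no linear trend are all assumptions made for illustration.

```python
import numpy as np

def simulate_panel(N=20, T=100, alpha=(1.0,), rho=0.5, seed=0):
    """Simulate a T x N panel from equations (1)-(3) with beta_i = 0 (assumed).

    alpha : AR(1) coefficients of the r factors (alpha_m = 1 gives an I(1) factor).
    rho   : AR(1) coefficient(s) of the idiosyncratic errors, scalar or length N.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    r = alpha.size
    rho = np.broadcast_to(np.asarray(rho, dtype=float), (N,))
    c = rng.normal(size=N)           # intercepts c_i (hypothetical choice)
    lam = rng.normal(size=(N, r))    # factor loadings lambda_i (hypothetical choice)
    u = rng.normal(size=(T, r))      # factor innovations u_mt
    eps = rng.normal(size=(T, N))    # idiosyncratic innovations epsilon_it
    F = np.zeros((T, r))
    e = np.zeros((T, N))
    F[0], e[0] = u[0], eps[0]        # zero initial conditions (assumed)
    for t in range(1, T):
        F[t] = alpha * F[t - 1] + u[t]     # equation (2)
        e[t] = rho * e[t - 1] + eps[t]     # equation (3)
    X = c + F @ lam.T + e                  # equation (1) with beta_i = 0
    return X, F, e, lam
```

For example, `simulate_panel(alpha=(1.0, 0.5), rho=1.0)` would give one I(1) factor, one I(0) factor, and I(1) idiosyncratic errors, a case in which the factors and the errors have different orders of integration.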
When $e_{it}$ is I(0), the principal components estimators for $F_t$ and $\lambda_i$ have been shown to be
consistent when all the factors are I(0) and when some or all of them are I(1). But consistent
estimation of the factors when $e_{it}$ is I(1) has not been considered in the literature. Indeed, when
$e_{it}$ has a unit root, a regression of $X_{it}$ on $F_t$ is spurious even if $F_t$ were observed, and the estimates
of $\lambda_i$ and thus of $e_{it}$ will not be consistent. The validity of PANIC thus hinges on the ability to
obtain estimates of $F_t$ and $e_{it}$ that preserve their orders of integration, both when $e_{it}$ is I(1) and
when it is I(0). We now outline a set of procedures that accomplish this goal. Essentially, the
trick is to apply the method of principal components to the first differenced data. We show in this
section that inference about unit roots is not affected by the fact that $F_t$ and $e_{it}$ are not observed.
We defer the discussion on the theoretical underpinnings of PANIC and the properties of factor
estimates to Section 3 so as to keep unit root testing the main focus of this section.
We consider two specifications of the deterministic trend function, leading to what will be
referred to as the intercept only model and the linear trend model. We assume the number of common
factors (r) is known.³ To simplify the proof, we let $\epsilon_{it}$ and $u_t$ be serially uncorrelated. This allows
us to consider the t statistic on the first order autoregressive parameter developed in Dickey and
Fuller (1979). More general errors can be permitted, provided they satisfy the assumptions stated
in Section 3. Remarks to this effect will be made below.
2.1 The Intercept Only Case
The factor model in the intercept only case is
$$X_{it} = c_i + \lambda_i' F_t + e_{it}. \tag{4}$$
We assume $E(\Delta F_t) = 0$. This is without loss of generality because if $F_t = a + \xi_t$ such that
$E(\Delta \xi_t) = 0$, then $X_{it} = c_i + \lambda_i' a + \lambda_i' \xi_t + e_{it}$. The first differenced model
$\Delta X_{it} = \lambda_i' \Delta \xi_t + \Delta e_{it}$ is thus observationally equivalent to
$\Delta X_{it} = \lambda_i' \Delta F_t + \Delta e_{it}$. Denote
$$x_{it} = \Delta X_{it}, \quad f_t = \Delta F_t, \quad \text{and} \quad z_{it} = \Delta e_{it}. \tag{5}$$
Then the model in first-differenced form is:
$$x_{it} = \lambda_i' f_t + z_{it}. \tag{6}$$
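The observational equivalence argument behind (6) can be checked numerically. The sketch below builds a one-factor panel in which the factor carries a constant shift a, differences it, and compares the result with the right-hand side of (6). The function name `differenced_panels`, the Gaussian draws, and the default dimensions are hypothetical choices for illustration.

```python
import numpy as np

def differenced_panels(a, T=50, N=4, seed=0):
    """Return (x, x_model): the differenced data and lambda_i' f_t + z_it from (6)."""
    rng = np.random.default_rng(seed)
    lam = rng.normal(size=(N, 1))                      # loadings (one factor, assumed)
    c = rng.normal(size=N)                             # intercepts c_i
    xi = np.cumsum(rng.normal(size=(T, 1)), axis=0)    # I(1) factor with E(delta xi) = 0
    e = rng.normal(size=(T, N))                        # idiosyncratic errors
    X = c + (a + xi) @ lam.T + e                       # F_t = a + xi_t in equation (4)
    x = np.diff(X, axis=0)                             # x_it = delta X_it
    x_model = np.diff(xi, axis=0) @ lam.T + np.diff(e, axis=0)
    return x, x_model
```

Under these assumptions, calling the function with `a = 0.0` and with `a = 3.0` (same seed) yields the same differenced data up to floating-point error, since both $c_i$ and $\lambda_i' a$ drop out after differencing.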
The test statistics are constructed as follows:
1. Difference the data and estimate $f_t$ and $\lambda_i$ from (6) by the method of principal components. To
be precise, let $x$ be the $(T-1) \times N$ data matrix such that the $i$th column is $(x_{i2}, x_{i3}, \ldots, x_{iT})'$,
$i = 1, 2, \ldots, N$. Let $f = (f_2, f_3, \ldots, f_T)'$ and $\Lambda = (\lambda_1, \ldots, \lambda_N)'$. The principal component
estimator of $f$, denoted $\hat{f}$, is $\sqrt{T-1}$ times the r eigenvectors corresponding to the first r
largest eigenvalues of the $(T-1) \times (T-1)$ matrix $xx'$. The estimated loading matrix is
$\hat{\Lambda} = x'\hat{f}/(T-1)$. Define $\hat{z}_{it} = x_{it} - \hat{\lambda}_i' \hat{f}_t$.
2. Given $\hat{f}_t$, define for each $m = 1, \ldots, r$,
$$\hat{F}_{mt} = \sum_{s=2}^{t} \hat{f}_{ms}.$$
³ Consistent estimation of r is possible using the method of Bai and Ng (2002) with data in differences. It can be
shown that this will not affect the limiting distribution of the test statistics when the number of factors is estimated.
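Steps 1 and 2 for the intercept only case can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `panic_pc_estimates` is hypothetical, the number of factors r is taken as given, and the eigen-decomposition uses `numpy.linalg.eigh` on the $(T-1) \times (T-1)$ matrix $xx'$.

```python
import numpy as np

def panic_pc_estimates(X, r):
    """Principal components on differenced data, then re-cumulation.

    X : T x N array of levels (rows are time periods t = 1..T).
    r : number of common factors, assumed known.
    Returns f_hat ((T-1) x r), lam_hat (N x r), z_hat ((T-1) x N), F_hat ((T-1) x r).
    """
    x = np.diff(X, axis=0)                       # x_it = delta X_it, t = 2..T
    Tm1 = x.shape[0]
    eigval, eigvec = np.linalg.eigh(x @ x.T)     # symmetric matrix, ascending eigenvalues
    top = np.argsort(eigval)[::-1][:r]           # indices of the r largest eigenvalues
    f_hat = np.sqrt(Tm1) * eigvec[:, top]        # normalization: f_hat' f_hat / (T-1) = I_r
    lam_hat = x.T @ f_hat / Tm1                  # Lambda_hat = x' f_hat / (T-1)
    z_hat = x - f_hat @ lam_hat.T                # z_hat_it = x_it - lam_hat_i' f_hat_t
    F_hat = np.cumsum(f_hat, axis=0)             # F_hat_mt = sum_{s=2}^t f_hat_ms
    return f_hat, lam_hat, z_hat, F_hat
```

The estimated factors are identified only up to sign (and, for r > 1, up to rotation), so comparisons with a simulated true factor should use absolute correlations or a projection rather than a direct difference.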