Forecasting Using Principal Components From a Large Number of Predictors

doi:10.1198/016214502388618960


James H. STOCK and Mark W. WATSON

This article considers forecasting a single time series when there are many predictors ($N$) and time series observations ($T$). When the data follow an approximate factor model, the predictors can be summarized by a small number of indexes, which we estimate using principal components. Feasible forecasts are shown to be asymptotically efficient in the sense that the difference between the feasible forecasts and the infeasible forecasts constructed using the actual values of the factors converges in probability to 0 as both $N$ and $T$ grow large. The estimated factors are shown to be consistent, even in the presence of time variation in the factor model.

KEY WORDS: Factor models; Forecasting; Principal components.

1. INTRODUCTION

James H. Stock is Professor, Kennedy School of Government, Harvard University, Cambridge, MA 02138, and the National Bureau of Economic Research (E-mail: james-stock@harvard.edu). Mark W. Watson is Professor, Department of Economics and Woodrow Wilson School, Princeton University, Princeton, NJ 08540, and the National Bureau of Economic Research (E-mail: mwatson@princeton.edu). The results in this article originally appeared in the paper titled "Diffusion Indexes" (NBER Working Paper 6702, August 1998). The authors thank the associate editor and referees, Jushan Bai, Michael Boldin, Frank Diebold, Gregory Chow, Andrew Harvey, Lucrezia Reichlin, Ken Wallis, and Charles Whiteman for helpful discussions and/or comments, and Lewis Chan, Piotr Eliasz, and Alexei Onatski for skilled research assistance. This research was supported in part by National Science Foundation grants SBR-9409629 and SBR-9730489.

This article considers forecasting one series using a large number of predictor series. In macroeconomic forecasting, for example, the number of candidate predictor series ($N$) can be very large, often larger than the number of time series observations ($T$) available for model fitting. This high-dimensional problem is simplified by modeling the covariability of the series in terms of a relatively small number of unobserved latent factors. Forecasting can then be carried out in a two-step process. First, a time series of the factors is estimated from the predictors; second, the relationship between the variable to be forecast and the factors is estimated by a linear regression. If the number of predictors is large, then precise estimates of the latent factors can be constructed using simple methods even under fairly general assumptions about the cross-sectional and temporal dependence in the variables. We estimate the factors using principal components, and show that these estimates are consistent in an approximate factor model with idiosyncratic errors that are serially and cross-sectionally correlated.

To be specific, let $y_t$ be the scalar time series variable to be forecast and let $X_t$ be an $N$-dimensional multiple time series of candidate predictors. It is assumed that $(X_t, y_{t+h})$ admit a factor model representation with $r$ common latent factors $F_t$,

$$X_t = \Lambda F_t + e_t \tag{1}$$

and

$$y_{t+h} = \beta_F' F_t + \beta_w' w_t + \varepsilon_{t+h}, \tag{2}$$

where $e_t$ is an $N \times 1$ vector of idiosyncratic disturbances, $h$ is the forecast horizon, $w_t$ is an $m \times 1$ vector of observed variables (e.g., lags of $y_t$) that together with $F_t$ are useful for forecasting $y_{t+h}$, and $\varepsilon_{t+h}$ is the resulting forecast error. Data are available for $\{y_t, X_t, w_t\}_{t=1}^T$, and the goal is to forecast $y_{T+h}$.

If the idiosyncratic disturbances $e_t$ in (1) were cross-sectionally independent and temporally iid, then (1) would be the classic factor analysis model. In our macroeconomic forecasting application, these assumptions are unlikely to be satisfied, and so we allow the error terms to be both serially correlated and (weakly) cross-sectionally correlated. In this sense, (1) is a serially correlated version of the approximate factor model introduced by Chamberlain and Rothschild (1983) for the study of asset prices. To construct forecasts of $y_{T+h}$, we form principal components of $\{X_t\}_{t=1}^T$ to serve as estimates of the factors. These estimated factors, together with $w_t$, are then used in (2) to estimate the regression coefficients. The forecast is constructed as $\hat y_{T+h} = \hat\beta_F' \hat F_T + \hat\beta_w' w_T$, where $\hat\beta_F$, $\hat\beta_w$, and $\hat F_T$ are the estimated coefficients and factors.
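The two-step procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names `pc_factors` and `diffusion_index_forecast`, the simulated data, and every parameter value are hypothetical, and the factors are normalized so that $\hat F'\hat F/T = I_r$.

```python
import numpy as np

def pc_factors(X, r):
    """Estimate r factors from a T x N panel by principal components.

    Returns a T x r matrix normalized so that F'F/T = I_r (an assumed
    normalization; any nonsingular rotation gives the same forecast).
    """
    T = X.shape[0]
    # Left singular vectors of X are the eigenvectors of XX'.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sqrt(T) * U[:, :r]

def diffusion_index_forecast(y, X, W, r, h=1):
    """Two-step forecast of y_{T+h}: PC factors, then OLS as in (2)."""
    T = len(y)
    F_hat = pc_factors(X, r)
    Z = np.column_stack([F_hat, W])           # regressors z_t = (F_t', w_t')'
    # Regress y_{t+h} on z_t for t = 1, ..., T-h.
    beta, *_ = np.linalg.lstsq(Z[:T - h], y[h:], rcond=None)
    return Z[-1] @ beta                        # forecast uses z_T

# Simulated example; all values below are arbitrary assumptions.
rng = np.random.default_rng(0)
T, N, r = 200, 100, 2
beta_F = np.array([1.0, -0.5])
F = rng.standard_normal((T, r))
Lam = rng.standard_normal((N, r))
X = F @ Lam.T + rng.standard_normal((T, N))
W = np.ones((T, 1))                            # w_t: a constant only
y = np.zeros(T)
y[1:] = F[:-1] @ beta_F + 0.1 * rng.standard_normal(T - 1)

y_hat = diffusion_index_forecast(y, X, W, r, h=1)
y_infeasible = F[-1] @ beta_F                  # true factors and coefficients
```

Because OLS fitted values are invariant to nonsingular rotations of the regressors, the sign and rotation indeterminacy of the estimated factors does not affect `y_hat`.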

This article makes three contributions. First, under general conditions on the errors discussed in Section 2, we show that the principal components of $X_t$ are consistent estimators of the true latent factors (subject to a normalization discussed in Sec. 2). Consistency requires that both $N$ and $T \to \infty$, although there are no restrictions on their relative rates of increase. Second, we show that the feasible forecast, $\hat y_{T+h}$, constructed from the estimated factors together with the estimated coefficients converges to the infeasible forecast that would be obtained if the factors and coefficients were known. Again, this result holds as $N, T \to \infty$. Thus the feasible forecast is first-order asymptotically efficient. Finally, motivated by the problem of temporal instability in macroeconomic forecasting models, we study the robustness of the consistency results to time variation in the factor model. We show that these results continue to hold when the temporal instability is small (as suggested by empirical work in macroeconomics) and weakly cross-sectionally dependent, in a sense that is made precise in Section 3.

This article is related to a large literature on factor analysis and a much smaller literature on forecasting. The literature on principal components and classical factor models is large and well known (Lawley and Maxwell 1971). Sargent and Sims (1977) and Geweke (1977) extended the classical factor model to dynamic models, and several researchers have applied versions of their dynamic factor model. In most applications of the classic factor model and its dynamic generalization, the dimension of $X$ is small, and so the question of consistent estimation of the factors is not relevant. However, several authors have noted that with large $N$, consistent estimation is possible. Connor and Korajczyk (1986, 1988, 1993) discussed the problem in a static model and argued that the factors can be consistently estimated by principal components as $N \to \infty$ even if the error terms are weakly cross-sectionally correlated. Forni and Reichlin (1996, 1998) and Forni, Hallin, Lippi, and Reichlin (1999) discussed consistent ($N, T \to \infty$) estimation of factors in a dynamic version of the approximate factor model. Finally, in a prediction problem similar to the one considered here, Ding and Hwang (1999) analyzed the properties of forecasts constructed from principal components in a setting with large $N$ and $T$. Their analysis is conducted under the assumption that the error process $\{e_{it}\}$ is cross-sectionally and temporally iid, an assumption that is inappropriate for economic models and when interest focuses on multiperiod forecasts. We highlight the differences between our results and those of others later in the article.

© 2002 American Statistical Association. Journal of the American Statistical Association, December 2002, Vol. 97, No. 460, Theory and Methods.

The article is organized as follows. Section 2 presents the model in more detail, discusses the assumptions, and presents the main consistency results. Section 3 generalizes the model to allow temporal instability in the factor model. Section 4 examines the finite-sample performance of these methods in a Monte Carlo study, and Section 5 discusses an application to macroeconomic forecasting.

2. THE MODEL AND ESTIMATION

2.1 Assumptions

As described in Section 1, we focus on a forecasting situation in which $N$ and $T$ are both large. This motivates our asymptotic results requiring that $N, T \to \infty$ jointly or, equivalently, that $N = N(T)$ with $\lim_{T\to\infty} N(T) = \infty$. No restrictions on the relative rates of $N$ and $T$ are required.

The assumptions about the model are grouped into assumptions about the factors and factor loadings, assumptions about the errors in (1), and assumptions about the regressors and errors in (2).

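Assumption F1 (Factors and Factor Loadings).

a. $(\Lambda'\Lambda/N) \to I_r$.

b. $E(F_t F_t') = \Sigma_{FF}$, where $\Sigma_{FF}$ is a diagonal matrix with elements $\sigma_{ii} > \sigma_{jj} > 0$ for $i < j$.

c. $|\lambda_{i,m}| \le \bar\lambda < \infty$.

d. $T^{-1}\sum_t F_t F_t' \xrightarrow{p} \Sigma_{FF}$.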

Assumption F1 serves to identify the factors. The nonsingular limiting values of $(\Lambda'\Lambda/N)$ and $\Sigma_{FF}$ imply that each of the factors provides a nonnegligible contribution to the average variance of $x_{it}$, where $x_{it}$ is the $i$th element of $X_t$ and the average is taken over both $i$ and $t$. Moreover, because $\Lambda F_t = \Lambda R R^{-1} F_t$ for any nonsingular matrix $R$, a normalization is required to uniquely define the factors. Said differently, the model with factor loadings $\Lambda R$ and factors $R^{-1}F_t$ is observationally equivalent to the model with factor loadings $\Lambda$ and factors $F_t$. Assumption F1(a) restricts $R$ to be orthonormal, and this together with Assumption F1(b) restricts $R$ to be a diagonal matrix with diagonal elements of $\pm 1$. This identifies the factors up to a change of sign. Equivalently, Assumption F1 provides this normalization (asymptotically) by associating $\Lambda$ with the ordered orthonormal eigenvectors of $(NT)^{-1}\sum_{t=1}^T \Lambda F_t F_t' \Lambda'$ and $\{F_t\}_{t=1}^T$ with the principal components of $\{\Lambda F_t\}_{t=1}^T$. The diagonal elements of $\Sigma_{FF}$ correspond to the limiting eigenvalues of $(NT)^{-1}\sum_{t=1}^T \Lambda F_t F_t' \Lambda'$. For convenience, these eigenvalues are assumed to be distinct. If they were not distinct, then the factors could only be consistently estimated up to an orthonormal transformation.

Assumption F1(b) allows the factors to be serially correlated, although it does rule out stochastic trends and other processes with nonconstant unconditional second moments. The assumption also allows lags of the factors to enter the equations for $x_{it}$ and $y_{t+h}$. A leading example of this occurs in the dynamic factor model

$$x_{it} = \lambda_i(L)' f_t + e_{it} \tag{3}$$

and

$$y_{t+h} = \beta_f(L)' f_t + \beta_w' w_t + \varepsilon_{t+h}, \tag{4}$$

where $\lambda_i(L)$ and $\beta_f(L)$ are lag polynomials in nonnegative powers of the lag operator $L$. If the lag polynomials have finite order $q$, then (3)-(4) can be rewritten as (1)-(2) with $F_t = (f_t'\ f_{t-1}' \cdots f_{t-q}')'$, and Assumption F1(b) will be satisfied if the $f_t$ process is covariance stationary.
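The stacking of current and lagged dynamic factors into a static factor vector can be illustrated with a short sketch. The helper name `stack_lags` and the toy data are hypothetical and serve only to show the bookkeeping: with $J$ dynamic factors and lag order $q$, the stacked vector has $J(q+1)$ elements, and the first $q$ dates are lost.

```python
import numpy as np

def stack_lags(f, q):
    """Stack f_t, f_{t-1}, ..., f_{t-q} into static factors F_t.

    f is a T x J array; the result has T - q rows (the first q dates
    lack a full set of lags) and J * (q + 1) columns.
    """
    T = f.shape[0]
    return np.hstack([f[q - j : T - j] for j in range(q + 1)])

f = np.arange(10.0).reshape(5, 2)   # T = 5 dates, J = 2 dynamic factors
F = stack_lags(f, q=1)              # each row is (f_t', f_{t-1}')
```

In this toy case `F` has 4 rows and 4 columns, and its first row combines the second and first rows of `f`.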

In the classical model, the errors or "uniquenesses" are assumed to be iid and normally distributed. This assumption is clearly inappropriate in the macroeconomic forecasting application, because the variables are serially correlated, and many of the variables (e.g., alternative measures of the money supply) may be cross-correlated even after the aggregate factors are controlled for. We therefore modify the classic assumptions to accommodate these complications.

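Assumption M1 (Moments of the Errors $e_t$). Let $e_{it}$ denote the $i$th element of $e_t$; then

b. $E(e_{it}e_{jt}) = \tau_{ij,t}$, $\lim_{N\to\infty}\sup_t N^{-1}\sum_{i=1}^N\sum_{j=1}^N |\tau_{ij,t}| < \infty$, and

c. $\lim_{N\to\infty}\sup_{t,s} N^{-1}\sum_{i=1}^N\sum_{j=1}^N |\mathrm{cov}(e_{is}e_{it}, e_{js}e_{jt})| < \infty$.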

Assumption M1(a) allows for serial correlation in the $e_{it}$ processes. As in the approximate factor model of Chamberlain and Rothschild (1983) and Connor and Korajczyk (1986, 1993), Assumption M1(b) allows $\{e_{it}\}$ to be weakly correlated across series. Forni et al. (1999) also allowed for serial correlation and cross-correlation with assumptions similar to M1(a)-(b). Normality is not assumed, but M1(c) limits the size of fourth moments.

It is assumed that the forecasting equation (2) is well behaved in the sense that if $\{F_t\}$ were observed, then ordinary least squares (OLS) would provide a consistent estimator of the regression coefficients. The specific assumption is as follows.

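Assumption Y1 (Forecasting Equation). Let $z_t = (F_t'\ w_t')'$ and $\beta = (\beta_F'\ \beta_w')'$. Then the following hold:

a. $E(z_t z_t') = \Sigma_{zz} = \begin{bmatrix}\Sigma_{FF} & \Sigma_{Fw}\\ \Sigma_{wF} & \Sigma_{ww}\end{bmatrix}$, a positive definite matrix.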


Assumptions Y1(a)-(c) are a standard set of conditions that imply consistency of OLS from the regression of $y_{t+h}$ onto $(F_t'\ w_t')$. Here $F_t$ is not observed, and the additional assumptions are useful for showing consistency of the OLS regression coefficients in the regression of $y_{t+h}$ onto $(\hat F_t'\ w_t')$ and the resulting forecast of $y_{T+h}$.

2.2 Estimation

In "small-$N$" dynamic factor models, forecasts are generally constructed using a three-step process (see, e.g., Stock and Watson 1989). First, parametric models are postulated for the joint stochastic process $\{y_{t+h}, X_t, w_t, e_t\}$, and the sample data $\{y_{t+h}, X_t, w_t\}_{t=1}^{T-h}$ are used to estimate the parameters of this process, typically using a Gaussian maximum likelihood estimator (MLE). Next, these estimated parameters are used in signal extraction algorithms to estimate the unknown value of $F_T$. Finally, the forecast of $y_{T+h}$ is constructed using this estimated value of the factor and the estimated parameters. When $N$ is large, this process requires estimating many parameters using iterative nonlinear methods, which can be computationally prohibitive. We therefore take a different approach and estimate the dynamic factors nonparametrically using the method of principal components.

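Consider the nonlinear least squares objective function,

$$V(\tilde F, \tilde\Lambda) = (NT)^{-1}\sum_{i}\sum_{t}(x_{it} - \tilde\lambda_i'\tilde F_t)^2, \tag{5}$$

written as a function of hypothetical values of the factors ($\tilde F$) and factor loadings ($\tilde\Lambda$), where $\tilde F = (\tilde F_1\ \tilde F_2 \cdots \tilde F_T)'$ and $\tilde\lambda_i'$ is the $i$th row of $\tilde\Lambda$. Let $\hat F$ and $\hat\Lambda$ denote the minimizers of $V(\tilde F, \tilde\Lambda)$. After concentrating out $\tilde F$, minimizing (5) is equivalent to maximizing $\mathrm{tr}[\tilde\Lambda' X'X \tilde\Lambda]$ subject to $\tilde\Lambda'\tilde\Lambda/N = I_r$, where $X$ is the $T \times N$ data matrix with $t$th row $X_t'$ and $\mathrm{tr}(\cdot)$ denotes the matrix trace. This is the classical principal components problem, which is solved by setting $\hat\Lambda$ equal to the eigenvectors of $X'X$ corresponding to its $r$ largest eigenvalues. The resulting principal components estimator of $F$ is then

$$\hat F = X\hat\Lambda/N. \tag{6}$$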

Computation of $\hat F$ requires the eigenvectors of the $N \times N$ matrix $X'X$; when $N > T$, a computationally simpler approach uses the $T \times T$ matrix $XX'$. By concentrating out $\tilde\Lambda$, minimizing (5) is equivalent to maximizing $\mathrm{tr}[\tilde F'(XX')\tilde F]$, subject to $\tilde F'\tilde F/T = I_r$, which yields the estimator, say $\tilde F$, which is the matrix of the first $r$ eigenvectors of $XX'$. The column spaces of $\hat F$ and $\tilde F$ are equivalent, and so for forecasting purposes they can be used interchangeably, depending on computational convenience.
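The equivalence of the two computational routes can be checked numerically. The sketch below is illustrative only: the simulated panel, the variable names, and the scaling conventions (eigenvectors multiplied by $\sqrt{N}$ or $\sqrt{T}$ to match the stated normalizations) are assumptions, not part of the article.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, r = 60, 150, 3                        # N > T, so XX' is the smaller matrix
X = (rng.standard_normal((T, r)) @ rng.standard_normal((r, N))
     + 0.5 * rng.standard_normal((T, N)))

# Route 1: eigenvectors of the N x N matrix X'X, scaled so Lam'Lam/N = I_r.
_, vecs_N = np.linalg.eigh(X.T @ X)         # eigh returns ascending eigenvalues
Lam_hat = np.sqrt(N) * vecs_N[:, ::-1][:, :r]
F_hat = X @ Lam_hat / N                     # T x r factor estimates

# Route 2: eigenvectors of the T x T matrix XX', scaled so F'F/T = I_r.
_, vecs_T = np.linalg.eigh(X @ X.T)
F_tilde = np.sqrt(T) * vecs_T[:, ::-1][:, :r]

def projector(A):
    """Orthogonal projection matrix onto the column space of A."""
    Q, _ = np.linalg.qr(A)
    return Q @ Q.T

# Equal projectors mean equal column spaces, hence identical OLS forecasts.
same_space = np.allclose(projector(F_hat), projector(F_tilde), atol=1e-8)
```

The two estimates differ in column scaling and possibly sign, which is why the comparison is made on projection matrices and not on the raw arrays.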

2.3 Consistent Estimation of Factors and Forecasts


The first result presented in this section shows that the principal component estimator is pointwise (for any date $t$) consistent and has limiting mean squared error (MSE) over all $t$ that converges in probability to 0. Because Assumption F1 does not identify the sign of the factors, the theorem is stated in terms of sign-adjusted estimators.

Theorem 1. Let $S_i$ denote a variable with value of $\pm 1$, let $N, T \to \infty$, and suppose that F1 and M1 hold. Suppose that $k$ factors are estimated, where $k$ may be $\le$ or $>$ $r$, the true number of factors. Then $S_i$ can be chosen so that the following hold:

a. For $i = 1, 2, \ldots, r$, $T^{-1}\sum_{t=1}^T (S_i\hat F_{it} - F_{it})^2 \xrightarrow{p} 0$.

b. For $i = 1, 2, \ldots, r$, $S_i\hat F_{it} \xrightarrow{p} F_{it}$.

c. For $i = r+1, \ldots, k$, $T^{-1}\sum_{t=1}^T \hat F_{it}^2 \xrightarrow{p} 0$.

The details of the proof are provided in the Appendix; here we offer only a few remarks to provide some insight into the problem and the need for the assumptions given in the preceding section. The estimation problem would be considerably simplified if it happened that $\Lambda$ were known, because then $F_t$ could be estimated by the least squares regression of $\{x_{it}\}_{i=1}^N$ onto $\{\lambda_i\}_{i=1}^N$. Consistency of the resulting estimator would then be studied by analyzing $\hat F_t - F_t = (\Lambda'\Lambda/N)^{-1}(N^{-1}\sum_i \lambda_i e_{it})$. Because $N \to \infty$, $(\Lambda'\Lambda/N) \to I_r$ [by F1(a)], and $N^{-1}\sum_i \lambda_i e_{it} \xrightarrow{p} 0$ [by M1(a) and F1(c)], the consistency of $\hat F_t$ would follow directly. Alternatively, if $F$ were known, then $\lambda_i$ could be estimated by regressing $\{x_{it}\}_{t=1}^T$ onto $\{F_t\}_{t=1}^T$, and consistency would be studied by analyzing $(T^{-1}\sum_t F_t F_t')^{-1}T^{-1}\sum_t F_t e_{it}$ as $T \to \infty$ in a similar fashion. Because both $F$ and $\Lambda$ are unknown, both $N$ and $T \to \infty$ are needed, and as it turns out, the proof is more complicated than these two simple steps suggest. The strategy that we have used is to show that the first $r$ eigenvectors of $(NT)^{-1}X'X$ behave like the first $r$ eigenvectors of $(NT)^{-1}\Lambda F'F\Lambda'$ (Assumption M1 is critical in this regard), and then show that these eigenvectors can be used to construct a consistent estimator of $F$ (Assumption F1 is critical in this regard).
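As a worked step for the first remark above, the error of the known-$\Lambda$ estimator follows from substituting (1) into the least squares formula. In this display only, $\hat F_t$ denotes the infeasible cross-sectional regression estimator (not the principal components estimator), and $\lambda_i'$ is taken to be the $i$th row of $\Lambda$:

$$
\hat F_t = (\Lambda'\Lambda)^{-1}\Lambda' X_t
= (\Lambda'\Lambda)^{-1}\Lambda'(\Lambda F_t + e_t)
= F_t + (\Lambda'\Lambda/N)^{-1}\Big(N^{-1}\sum_{i=1}^N \lambda_i e_{it}\Big),
$$

where the last equality uses $\Lambda' e_t = \sum_{i=1}^N \lambda_i e_{it}$ and multiplies and divides by $N$. The first factor is controlled by F1(a) and the second is a cross-sectional average of the errors, which is why only $N \to \infty$ matters in this simplified case.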

The next result shows that the feasible forecast (constructed using the estimated factors and estimated parameters) converges to the optimal infeasible forecast and thus is asymptotically efficient. In addition, it shows that the feasible regression coefficient estimators are consistent.

The result assumes that the forecasting equation is estimated using the $k = r$ factors. This is with little loss of generality, because there are several methods for consistently estimating the number of factors. For example, using analysis similar to that in Theorem 1, Bai and Ng (2001) constructed estimators of $r$ based on penalized versions of the minimized value of (5), and in an earlier version of this article (Stock and Watson 1998a), we developed a consistent estimator of $r$ based on the fit of the forecasting equation (2).

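Theorem 2. Suppose that Y1 and the conditions of Theorem 1 hold. Let $\hat\beta_F$ and $\hat\beta_w$ denote the OLS estimates of $\beta_F$ and $\beta_w$ from the regression of $\{y_{t+h}\}_{t=1}^{T-h}$ onto $\{\hat F_t, w_t\}_{t=1}^{T-h}$. Then the following conditions hold:

a. $(\hat\beta_F'\hat F_T + \hat\beta_w' w_T) - (\beta_F' F_T + \beta_w' w_T) \xrightarrow{p} 0$.

b. $\hat\beta_w - \beta_w \xrightarrow{p} 0$ and $S_i$ (defined in Theorem 1) can be chosen so that $S_i\hat\beta_{iF} - \beta_{iF} \xrightarrow{p} 0$ for $i = 1, \ldots, r$.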


The theorem follows directly from Theorem 1 together with Assumption Y1. The details of the proof are given in the Appendix.

3. TIME-VARYING FACTOR LOADINGS

In practice, when macroeconomic forecasts are constructed using many variables over a long period, some degree of temporal instability is inevitable. In this section we model this instability as stochastic drift in the factor loadings, and show that if this drift is not too large and not too dependent across series (in a sense made precise later), then the results of Theorems 1 and 2 continue to hold. Thus the principal components estimator and forecast are robust to small and idiosyncratic shifts in the factor loadings.

Specifically, replace the time-invariant factor model (1) with

$$x_{it} = \lambda_{it}' F_t + e_{it} \tag{7}$$

and

$$\lambda_{it} = \lambda_{it-1} + g_{iT}\zeta_{it} \tag{8}$$

for $i = 1, \ldots, N$ and $t = 1, \ldots, T$, where $g_{iT}$ is a scalar and $\zeta_{it}$ is an $r \times 1$ vector of random variables. This formulation implies that factor loadings for the $i$th variable shift by an amount, $g_{iT}\zeta_{it}$, in time period $t$. The assumptions given in this section limit this time variation in two ways. First, the scalar $g_{iT}$ is assumed to be small [$g_{iT} \sim O_p(T^{-1})$], which is consistent with the empirical literature in macroeconomics that estimates the degree of temporal instability in macroeconomic relations (Stock and Watson 1996, 1998b). This nesting implies that $\lambda_{it} - \lambda_{i0} \sim O_p(T^{-1/2})$. Second, $\zeta_{it}$ is assumed to have weak cross-sectional dependence. That is, whereas some of the $x$ variables may undergo related shifts in a given period, wholesale shifts involving a large number of the $x$'s are ruled out. Presumably such wholesale shifts could be better represented by shifts in the factors rather than in the factor loadings. In any event, this section shows that when these assumptions hold (along with technical assumptions given later), then the instability does not affect the consistency of the principal components estimator of $F_t$.

To motivate the additional assumptions used in this section, rewrite (7) as

$$x_{it} = \lambda_{i0}' F_t + a_{it}, \tag{9}$$

where $a_{it} = e_{it} + (\lambda_{it} - \lambda_{i0})' F_t = e_{it} + g_{iT}\sum_{s=1}^t \zeta_{is}' F_t$. This equation has the same general form as the time-invariant factor model studied in the last section, with $\lambda_{i0}$ and $a_{it}$ in (9) replacing $\lambda_i$ and $e_{it}$ in the time-invariant model. This section introduces two new sets of assumptions that imply that $\lambda_{i0}$ and $a_{it}$ in (9) satisfy the assumptions concerning $\lambda_i$ and $e_{it}$ from the preceding section. This means that the conclusions of Theorems 1 and 2 will continue to hold for the time-varying factor model of this section.
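The step from the random walk for the loadings to the composite error can be written out explicitly. Solving the loading recursion backward to date 0 and substituting into the time-varying model gives

$$
\lambda_{it} = \lambda_{i0} + g_{iT}\sum_{s=1}^{t}\zeta_{is}
\quad\Longrightarrow\quad
x_{it} = \lambda_{it}'F_t + e_{it}
= \lambda_{i0}'F_t + \Big[e_{it} + g_{iT}\sum_{s=1}^{t}\zeta_{is}'F_t\Big]
= \lambda_{i0}'F_t + a_{it}.
$$

As a heuristic for the stated order of the drift, and assuming the $\zeta_{is}$ are mean zero and weakly dependent over time, the partial sum of at most $T$ terms is $O_p(T^{1/2})$; multiplying by $g_{iT} = O_p(T^{-1})$ gives $\lambda_{it} - \lambda_{i0} = O_p(T^{-1/2})$. The formal conditions are those stated in the assumptions of this section, not this heuristic.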

The first new assumption is as follows.

Assumption F2.

a. $g_{iT}$ is independent of $F_t$, $e_{jt}$, and $\zeta_{jt}$ for all $i$, $j$, and $t$ and $\sup_{i,j,k,m} T[E(|g_{iT}g_{jT}g_{kT}g_{mT}|)]^{1/4} < \infty$.

b. The initial values of the factor loadings satisfy $N^{-1}\sum_i \lambda_{i0}\lambda_{i0}' = \Lambda_0'\Lambda_0/N \to I_r$ and $\sup_{i,j}|\lambda_{ij,0}| < \bar\lambda$, where $\lambda_{ij,0}$ is the $j$th element of $\lambda_{i0}$.

As discussed earlier, Assumption F2(a) makes the amount of time variation small. Assumption F2(b) means that the initial values of the factor loadings satisfy the same assumptions as the time-invariant factor loadings of the preceding section.

The next assumption limits the dependence in $\zeta_{it}$. This assumption is written in fairly general form, allowing for some dependence in the random variables in the model.

Assumption M2. Let $\zeta_{it,m}$ denote the $m$th element of $\zeta_{it}$. Then the following hold:

This assumption essentially repeats Assumption M1 for the components of the composite error term $a_{it}$ in (9). To interpret the assumption, consider the leading case in which the various components $\{\varepsilon_t\}$, $\{F_t\}$, $\{e_{it}\}$, and $\{\zeta_{it}\}$ are independent and have mean 0. Then, assuming that the $F_t$ have finite fourth moments, and given the assumptions made in the last section, Assumption M2 is satisfied if (a) $\lim_{T\to\infty}\sum_{u=1-s}^{T-s}\sup_{i,l,m}|E(\zeta_{is,l}\zeta_{is+u,m})| < \infty$; (b) $\lim_{N\to\infty} N^{-1}\sum_i\sum_j \sup_{l,m,s,u}|E(\zeta_{is,l}\zeta_{ju,m})| < \infty$; and (c) $\lim_{N\to\infty} N^{-1}\sum_i\sum_j \sup_{\{l_k\},\{t_k\}}|\mathrm{cov}(\zeta_{it_1,l_1}\zeta_{it_2,l_2}, \zeta_{jt_3,l_3}\zeta_{jt_4,l_4})| < \infty$, which are the analogs of the assumptions in M1 applied to the $\zeta$ error terms.

These two new assumptions yield the main result of this section, which follows.

Theorem 3. Given F1(b), F1(d), F2, M1, and M2, the results of Theorems 1 and 2 continue to hold.

The proof is given in the Appendix.

4. MONTE CARLO ANALYSIS

In this section we study some of the finite-sample properties of the principal components estimator and forecast using a small Monte Carlo experiment. The framework used in the preceding two sections was quite rich, allowing for distributed lags of potentially serially correlated factors to enter the $x$ and $y$ equations, error terms that were conditionally heteroscedastic and serially and cross-correlated, and factor loadings that evolved through time. The design used here incorporates all of these features, and the data are generated according to

and

$$\sigma_{it}^2 = \delta_0 + \delta_1\sigma_{it-1}^2 + \delta_2 v_{it-1}^2, \tag{15}$$

where $i = 1, \ldots, N$, $t = 1, \ldots, T$, $f_t$ and $\lambda_{ijt}$ are $J \times 1$, and the variables $\{\zeta_{ijt}\}$, $\{u_{jt}\}$, and $\{\eta_{it}\}$ are mutually independent iid $N(0, 1)$ random variables. Equation (10) is a dynamic factor model with $q$ lags of $J$ factors that, as shown in Section 2, can be represented as the static factor model (1) with $r = J(1 + q)$ factors. From (12), the factors evolve as a vector autoregressive [VAR(1)] model with common scalar autoregressive (AR) parameter $\alpha$. From (13), the error terms in the factor equation are serially correlated, with an AR(1) coefficient of $a$, and cross-correlated [with spatial moving average [MA(1)] coefficient $b$]. The innovations $v_{it}$ are conditionally heteroscedastic and follow a GARCH(1, 1) process with parameters $\delta_0$, $\delta_1$, and $\delta_2$ [see (14) and (15)]. Finally, from (11), the factor loadings evolve as a random walk, with innovation standard deviation proportional to $c$.

The scalar variable to be forecast is generated as

where $\iota$ is a $J \times 1$ vector of 1s and $\varepsilon_{t+1} \sim$ iid $N(0, 1)$ and is independent of the other errors in (10)-(15).

The other design details are as follows. The initial factor loading matrix, $\Lambda_0$, was chosen as a function of $R_i^2$, the fraction of the variance of $x_{i0}$ explained by $F_0$. The value of $R_i^2$ was chosen as an iid random variable equal to 0 with probability $\pi$ and drawn from a uniform distribution on [.1, .8] with probability $1 - \pi$. A nonzero value of $\pi$ allows for the inclusion of $x$'s unrelated to the factors. Given this value of $R_i^2$, the initial factor loading was computed as $\lambda_{ij0} = \lambda^*(R_i^2)\bar\lambda_{ij0}$, where $\lambda^*(R_i^2)$ is a scalar and $\bar\lambda_{ij0} \sim$ iid $N(0, 1)$ and independent of $\{\eta_{it}, \zeta_{ijt}, u_t\}$. The initial values of the factors were drawn from their stationary distribution. The parameter $\delta_0$ was chosen so that the unconditional variance of $v_{it}$ was unity.

Principal components were used to estimate $k$ factors, as discussed in Section 2.2. These $k$ estimated factors were used to estimate $r$ (the true number of factors) using methods described later, and the coefficients $\beta$ in the forecasting regression (2) were estimated by the OLS coefficients $\hat\beta$ in the regression of $y_{t+1}$ onto $\hat F_{jt}$, $j = 1, \ldots, \hat r$, $t = 1, \ldots, T-1$, where $\hat r$ is the estimated number of factors. The out-of-sample forecast is $\hat y_{T+1/T} = \sum_{j=1}^{\hat r}\hat\beta_j\hat F_{jT}$. For comparison purposes, the infeasible out-of-sample forecast $\tilde y_{T+1/T} = \tilde\beta' F_T$ was also computed, where $\tilde\beta$ is the OLS estimator obtained from regressing $y_{t+1}$ onto $F_t$, $t = 1, \ldots, T-1$. The free parameters in the Monte Carlo experiment are $N$, $T$, $J$, $q$, $k$, $\pi$, $a$, $b$, $c$, $\delta_1$, and $\delta_2$.

The results are summarized by two statistics. The first statistic is a trace $R^2$ of the multivariate regression of $\hat F$ onto $F$,

where $\hat E$ denotes the expectation estimated by averaging the relevant statistic over the Monte Carlo repetitions and $P_F = F(F'F)^{-1}F'$. According to Theorem 1, $R^2_{\hat F,F} \xrightarrow{p} 1$.
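A trace $R^2$ of this kind can be computed as in the sketch below. The exact formula is an assumption inferred from the surrounding definitions, namely $\hat E\,\mathrm{tr}(\hat F' P_F \hat F)/\hat E\,\mathrm{tr}(\hat F'\hat F)$, and the function name `trace_r2` and the toy inputs are hypothetical.

```python
import numpy as np

def trace_r2(F_hat_draws, F_draws):
    """Trace R^2 of F_hat on F, with expectations replaced by Monte Carlo means.

    The arguments are sequences of T x k and T x r arrays, one pair per
    repetition. The ratio-of-means form is an assumed reading of the text.
    """
    num, den = [], []
    for F_hat, F in zip(F_hat_draws, F_draws):
        # P_F F_hat: fitted values from regressing each column of F_hat on F.
        coef, *_ = np.linalg.lstsq(F, F_hat, rcond=None)
        fitted = F @ coef
        num.append(np.sum(fitted ** 2))       # tr(F_hat' P_F F_hat)
        den.append(np.sum(F_hat ** 2))        # tr(F_hat' F_hat)
    return np.mean(num) / np.mean(den)

rng = np.random.default_rng(2)
F = rng.standard_normal((100, 2))
rotated = F @ np.array([[0.0, 1.0], [-1.0, 0.0]])   # same column space as F
noise = rng.standard_normal((100, 2))               # unrelated to F
r2_rotated = trace_r2([rotated], [F])
r2_noise = trace_r2([noise], [F])
```

Because the statistic depends on $\hat F$ only through its projection onto the column space of $F$, it is unaffected by the sign and rotation indeterminacy of the estimated factors.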

The second statistic measures how close the forecast based on the estimated factors is to the infeasible forecast based on the true factors,

Because $\hat y_{T+1/T} - \tilde y_{T+1/T} \xrightarrow{p} 0$ when $k = r$ from Theorem 1, $S^2_{\hat y,\tilde y}$ should be close to 1 for appropriately large $N$ and $T$.

$S^2_{\hat y,\tilde y}$ is computed for several choices of $\hat r$. First, as a benchmark, results are shown for $\hat r = r$. Second, $\hat r$ is formed using three of the information criteria suggested by Bai and Ng (2001). These criteria have the form $IC_p(k) = \ln(\hat V_k) + k\,g(T, N)$, where $\hat V_k$ is the minimized value of the objective function (5) for a model with $k$ factors and $g_j(T, N)$ is a penalty function. Three of the penalty functions suggested by Bai and Ng are used:

$$g_1(T, N) = \frac{N+T}{NT}\ln\Big(\frac{NT}{N+T}\Big),$$

$$g_2(T, N) = \frac{N+T}{NT}\ln C_{NT}^2,$$

and

$$g_3(T, N) = \frac{\ln C_{NT}^2}{C_{NT}^2},$$

where $C_{NT}^2 = \min(N, T)$, resulting in criteria labeled $IC_{p1}$, $IC_{p2}$, and $IC_{p3}$. The minimizers of these criteria yield a consistent estimator of $r$, and interest here focuses on their relative small-sample accuracy. Finally, results are shown with $\hat r$ computed using the conventional Akaike information criterion (AIC) and Bayes information criterion (BIC) applied to the forecasting equation (2).
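Selecting the number of factors with penalized criteria of this form can be sketched as follows. This is an illustration under assumptions, not the implementation used for Table 1: the function name `select_num_factors` is hypothetical, the search runs over $k = 1, \ldots, k_{\max}$ only, the three penalties are written in their standard Bai and Ng form, and $\hat V_k$ is computed from the singular values of the data matrix.

```python
import numpy as np

def select_num_factors(X, k_max):
    """Pick k minimizing IC_p(k) = ln(V_k) + k * g(T, N) for three penalties."""
    T, N = X.shape
    c2 = min(N, T)
    penalties = {
        "ICp1": (N + T) / (N * T) * np.log(N * T / (N + T)),
        "ICp2": (N + T) / (N * T) * np.log(c2),
        "ICp3": np.log(c2) / c2,
    }
    # Squared singular values give V_k: the mean squared residual left
    # after removing the first k principal components.
    s2 = np.linalg.svd(X, compute_uv=False) ** 2
    total = s2.sum()
    ks = np.arange(1, k_max + 1)
    V = np.array([(total - s2[:k].sum()) / (N * T) for k in ks])
    return {name: int(ks[np.argmin(np.log(V) + ks * g)])
            for name, g in penalties.items()}

# Simulated panel with three strong factors; all values are arbitrary.
rng = np.random.default_rng(3)
T, N, r = 100, 100, 3
X = (rng.standard_normal((T, r)) @ rng.standard_normal((r, N))
     + rng.standard_normal((T, N)))
chosen = select_num_factors(X, k_max=8)
```

The dictionary `chosen` maps each criterion label to its selected number of factors, so the criteria can be compared on the same simulated panel.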

The results are summarized in Table 1. Panel A presents results for the static factor model with iid errors and factors and with large $N$ and $T$ ($N, T \ge 100$). Panel B gives corresponding results for small values of $N$ and $T$ ($N, T \le 50$). Panel C adds irrelevant $x_{it}$'s to the model ($\pi > 0$). Panel D extends the model to idiosyncratic errors that are serially correlated, cross-correlated, conditionally heteroscedastic, or some combination of these. Panel E considers the dynamic factor model with serially correlated factors and/or lags of the factors entering $X_t$. Finally, panel F gives time-varying factor loadings.

First, consider the results for the static factor model shown in panel A. The values of $R^2_{\hat F,F}$ exceed .85 except when many redundant factors are estimated. The smallest value of $R^2_{\hat F,F}$ is .69, which obtains when $N$ and $T$ are relatively small ($N = T = 100$) and there are 10 redundant factors ($r = 5$ and


References

Determining the Number of Factors in Approximate Factor Models

Macroeconomic Forecasting Using Diffusion Indexes

The Generalized Dynamic-Factor Model: Identification and Estimation

Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets
