Cluster Analysis of Typhoon Tracks. Part I: General Properties

Abstract
A new probabilistic clustering technique, based on a regression mixture model, is used to describe tropical cyclone trajectories in the western North Pacific. Each component of the mixture model consists of a quadratic regression curve of cyclone position against time. The best-track 1950–2002 dataset is described by seven distinct clusters. These clusters are then analyzed in terms of genesis location, trajectory, landfall, intensity, and seasonality. Both genesis location and trajectory play important roles in defining the clusters. Several distinct types of straight-moving, as well as recurving, trajectories are identified, thus enriching this main distinction found in previous studies. Intensity and seasonality of cyclones, though not used by the clustering algorithm, are both highly stratified from cluster to cluster. Three straight-moving trajectory types have very small within-cluster spread, while the recurving types are more diffuse. Tropical cyclone landfalls over East and Southeast Asia are found to be strongly cluster dependent, both in terms of frequency and region of impact. The relationships of each cluster type with the large-scale circulation, sea surface temperatures, and the phase of the El Niño–Southern Oscillation are studied in a companion paper.


SUZANA J. CAMARGO AND ANDREW W. ROBERTSON
International Research Institute for Climate and Society, The Earth Institute at Columbia University, Palisades, New York
SCOTT J. GAFFNEY AND PADHRAIC SMYTH
Department of Computer Science, University of California, Irvine, Irvine, California
MICHAEL GHIL*
Department of Atmospheric and Oceanic Sciences, and Institute for Geophysics and Planetary Physics, University of California,
Los Angeles, Los Angeles, California
(Manuscript received 6 January 2006, in final form 28 August 2006)
1. Introduction
Typhoons have a large socioeconomic impact in
many Asian countries. The risk of landfall of a typhoon
or tropical storm depends on its trajectory. These tra-
jectories, in turn, vary strongly with the season (Gray
1979; Harr and Elsberry 1991), as well as on interannual
(Chan 1985) and interdecadal time scales (Ho et al.
2004). However, current knowledge is largely qualita-
tive, and the probabilistic behavior of tropical cyclone
trajectories needs to be better understood in order to
isolate potentially predictable aspects of landfall. Well-
calibrated probabilistic seasonal predictions of landfall
risk could form an important tool in risk management.
Tropical cyclogenesis over the tropical northwest
(NW) Pacific takes place in a broad region west of the
date line, between about and 25°N. South of 15°N,
most of these tropical cyclones (TCs) follow rather
straight west-northwestward tracks. About one-third of
them continue in this direction and make landfall in
southeast Asia and southern China. Most of the re-
mainder “recurve,” that is, slow down, turn northward,
and then accelerate eastward as they enter the midlati-
tude westerlies (e.g., Harr and Elsberry 1995). Another
fraction of TCs track northward over the ocean, posing
no threat to land.
* Additional affiliation: Département Terre-Atmosphère-Océan, and Laboratoire de Météorologie Dynamique du CNRS/IPSL, Ecole Normale Supérieure, Paris, France.
Corresponding author address: Dr. Suzana J. Camargo, International Research Institute for Climate and Society, Monell 225, 61 Route 9W, Palisades, NY 10964-8000.
E-mail: suzana@iri.columbia.edu
15 JULY 2007 CAMARGO ET AL. 3635
DOI: 10.1175/JCLI4188.1
© 2007 American Meteorological Society

The large-scale circulation of the atmosphere has a
predominant role in determining a TC's motion through
the steering by the surrounding large-scale flow (e.g.,
Chan and Gray 1982; Franklin et al. 1996; Chan 2005).
The cyclone and the environment interact to modify the
surrounding flow (Wu and Emanuel 1995), and the vor-
tex is then advected (steered) by the modified flow.
One important dynamical factor is the beta drift, in-
volving the interaction of the cyclone, the planetary
vorticity gradient, and the environmental flow. This
leads TCs to move northwestward even in a resting
environment in the Northern Hemisphere (Adem 1956;
Holland 1983; Wu and Wang 2004). Other effects can
also be important: the interaction of tropical cyclones
with mountain ranges leads to significant variations in
tracks, as often occurs in Taiwan (Wu and Kuo 1999).
This two-part study explores the hypothesis that the
large observed spread of TC tracks over the tropical
NW Pacific can be described well by a small number of
clusters of tracks, or TC regimes. The observed TC
variability on seasonal and interannual time scales is
then interpreted in terms of changes in the frequency of
occurrence of these TC regimes. In this paper, we ex-
plore the basic attributes of the underlying clusters by
applying a new clustering technique to the best-track
dataset of the Joint Typhoon Warning Center (JTWC).
The technique employs a mixture of polynomial regres-
sion models (i.e., curves) to fit the geographical
shape of the trajectories (Gaffney and Smyth 1999,
2005; Gaffney 2004). Camargo et al. (2007, hereafter
Part II) examine relationships between the clusters we
describe in the present paper and the large-scale atmospheric
circulation, as well as the El Niño–Southern
Oscillation (ENSO).
In midlatitude meteorology, the concept of planetary
circulation regimes (Legras and Ghil 1985), sometimes
called weather regimes (Reinhold and Pierrehumbert
1982), has been introduced in attempting to connect the
observations of persistent and recurring midlatitude
flow patterns with large-scale atmospheric dynamics.
These midlatitude circulation regimes have intrinsic
time scales of several days to a week or more and exert
a control on local weather (e.g., Robertson and Ghil
1999). Longer time-scale variability of weather statistics
(TCs in our case) is a result of changes over time in the
frequency-of-occurrence of circulation regimes. This
paradigm of climate variability provides a counterpart
to wave-like decompositions of atmospheric variability,
allowing the connection to be made with oscillatory
phenomena (Ghil and Robertson 2002), such as the
Madden–Julian oscillation.
Circulation regimes have most often been defined in
terms of clustering, whether fuzzy (Mo and Ghil 1987)
or hierarchical (Cheng and Wallace 1993), in terms of
maxima in the probability density function (PDF) of
the large-scale, low-frequency flow (Molteni et al. 1990;
Kimoto and Ghil 1993a,b), as well as in terms of quasi
stationarity (Ghil and Childress 1987; Vautard 1990)
and, more recently, using a probabilistic Gaussian mix-
ture model (Smyth et al. 1999).
In the case of TC trajectories, the K-means method
(MacQueen 1967) has been used to study western
North Pacific (Elsner and Liu 2003) and North Atlantic
(Elsner 2003) TCs. In those studies, the grouping
was done according to the positions of maximum and
final hurricane intensity (i.e., the last position at which
the TC had hurricane intensity). In both basins, three
clusters were chosen to describe the trajectories. The
K-means approach has also been used to cluster
North Atlantic extratropical cyclone trajectories, where
6-hourly latitude–longitude positions over 3 days were
converted into 24-dimensional vectors suitable for clus-
tering (Blender et al. 1997).
The K-means method is a straightforward and widely
used partitioning method that seeks to assign each track
to one of K groups such that the total variance among
the groups is minimized. However, K-means cannot ac-
commodate tracks of different lengths, and we show
this to be a serious shortcoming for TCs. In a different
approach, Harr and Elsberry (1995) used fuzzy cluster
analysis and empirical orthogonal functions to describe
the spatial patterns associated with different typhoon
characteristics.
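
The fixed-length requirement of K-means can be illustrated with a minimal sketch. It assumes that each track is stored as a NumPy array of (lat, lon) rows at 6-hourly intervals; the function name `track_to_vector` and the two sample tracks are hypothetical and are not taken from any of the cited studies. Twelve 6-hourly positions cover 3 days and give a 24-dimensional vector, whereas a shorter track cannot be converted without truncating or padding the data.

```python
import numpy as np

def track_to_vector(track, n_points=12):
    """Flatten the first n_points (lat, lon) rows of a track into one vector.

    Returns None when the track has fewer than n_points fixes, which is the
    case a fixed-length method cannot handle without truncation or padding.
    """
    track = np.asarray(track, dtype=float)
    if track.shape[0] < n_points:
        return None
    return track[:n_points].reshape(-1)

# Hypothetical tracks: one with 12 six-hourly fixes (3 days), one with only 7.
long_track = np.column_stack([np.linspace(10, 20, 12), np.linspace(150, 130, 12)])
short_track = np.column_stack([np.linspace(12, 15, 7), np.linspace(140, 135, 7)])

vec = track_to_vector(long_track)
short_vec = track_to_vector(short_track)
print(vec.shape)   # (24,)
print(short_vec)   # None
```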
The finite mixture model used in this paper to fit the
geographical shape of the trajectories allows the clus-
tering to be posed in a rigorous probabilistic framework
and accommodates tropical cyclone tracks of different
lengths. These characteristics provide advantages over
the K-means method used in previous studies. The
main novelty here is to use an objective method to
classify the typhoon tracks based not only on a few
points of the trajectory, but on trajectory shape and
location.
The clustering methodology is briefly described in
section 2 and applied to the JTWC best-track dataset in
section 3. The two main trajectory types identified by
the cluster analysis correspond to straight movers and
recurvers; additional clusters correspond to more de-
tailed differences among these two main types, based
on location and track type. We study several character-
istics of the TCs in each cluster, including first position,
mean track, landfall, intensity, and lifetime, and com-
pare them with previous works in section 4. Discussion
and conclusions follow in section 5. In Part II, we study
how the large-scale circulation and ENSO affect each
cluster.
3636 JOURNAL OF CLIMATE VOLUME 20

2. Data and methodology
a. Data and definitions
The TC data used in this paper were based on the
JTWC best-track dataset available at 6-hourly sampling
frequency over the time interval 1950–2002 (Joint Typhoon
Warning Center 2005). The tracks were studied
over the western North Pacific, defined such that the
latitude–longitude positions of the TCs are inside the rectangle
(0°–60°N and 100°E–180°) during at least part of their
lifetimes. The clustering technique and the resulting
analysis were applied to a total of 1393 cyclone tracks.
We included only TCs with tropical storm intensity or
higher: tropical storms (TSs), both category 1 and 2
typhoons (TYs) as defined by the Saffir–Simpson scale
(Saffir 1977; Simpson and Riehl 1981), and intense typhoons
(ITYs; categories 3–5). Tropical depressions are
not included in the analysis.
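
The selection criteria above can be sketched as a simple filter. This is an illustration under stated assumptions rather than the processing actually applied to the JTWC data: longitudes are assumed to be in degrees east (0 to 360), winds in knots, and the 34-kt tropical storm threshold is the conventional value, which is not stated in the text. The names `in_domain` and `keep_track` are hypothetical.

```python
import numpy as np

# Assumed tropical-storm threshold in knots; not given in the text above.
TS_THRESHOLD_KT = 34.0

def in_domain(lat, lon):
    """True where a position lies inside the rectangle 0-60N, 100E-180."""
    return (lat >= 0) & (lat <= 60) & (lon >= 100) & (lon <= 180)

def keep_track(lat, lon, wind_kt):
    """Keep a track that enters the domain at least once and reaches TS intensity."""
    lat, lon, wind_kt = (np.asarray(a, dtype=float) for a in (lat, lon, wind_kt))
    return bool(in_domain(lat, lon).any() and wind_kt.max() >= TS_THRESHOLD_KT)

# Hypothetical examples.
kept = keep_track([5, 10], [95, 105], [20, 40])       # enters domain, reaches 40 kt
too_weak = keep_track([5, 10], [95, 105], [20, 30])   # tropical depression only
outside = keep_track([5, 10], [185, 190], [50, 60])   # never inside the rectangle
```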
The observed data quality is thought to be consider-
ably poorer during presatellite years (pre-1970). We
assume that although some of the TCs may be missing
in the JTWC (2005) database for the presatellite data,
especially those that remain over the ocean, the tracks
for those that do appear in the dataset are reliable, even
if their intensity is not. We repeated the cluster analysis
for the time interval 1970–2002 and found that the types
of tracks obtained in each cluster are essentially the
same. This verification lends credence to the data in the
earlier part of the record and demonstrates the robust-
ness of our results.
b. Clustering methodology
We present here a brief summary of the clustering
methodology (details are given in the appendix). A
more complete discussion is given by Gaffney (2004),
with an application of the clustering method to extra-
tropical cyclones over the North Atlantic (Gaffney et
al. 2007; a Matlab toolbox with the clustering algo-
rithms described in this paper is available online at
http://www.datalab.uci.edu/resources/CCT).
Our curve clustering method is based on the finite
mixture model (e.g., Everitt and Hand 1981), which
represents a data distribution as a convex linear com-
bination of component density functions. A key feature
of the mixture model is its ability to model highly non-
Gaussian (and possibly multimodal) densities using a
small set of basic component densities. Finite mixture
models have been widely used for clustering data in a
variety of areas (e.g., McLachlan and Basford 1988),
including the large-scale atmospheric circulation
(Smyth et al. 1999; Hannachi and O'Neill 2001).
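
In symbols, the convex combination described above can be written as follows. The notation is assumed here for illustration (K components, mixing weights \alpha_k, component parameters \theta_k) and may differ from that used in the appendix.

```latex
p(\mathbf{x} \mid \Theta) \;=\; \sum_{k=1}^{K} \alpha_k \, p_k(\mathbf{x} \mid \theta_k),
\qquad \alpha_k \ge 0, \qquad \sum_{k=1}^{K} \alpha_k = 1 .
```

Because the weights are nonnegative and sum to one, the mixture is itself a valid density, and it can be multimodal even when every component is unimodal.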
Regression mixture models extend the standard mix-
ture modeling framework by replacing the marginal
component densities with conditional density compo-
nents. The new conditional densities are functions of
the data (i.e., cyclone position) conditioned on an independent
variable (i.e., time). In this paper, the component
densities model a cyclone's longitudinal and
latitudinal positions versus time using quadratic poly-
nomial regression functions, as discussed in Gaffney
(2004). The latitude and longitude positions are treated
as conditionally independent given the model, and thus
the complete function for a cyclone track is the product
of these two. Other models, such as higher-order poly-
nomials and splines can also be used within the mixture
framework, but the simple quadratic model appears to
offer the best trade-off between ease of interpretation
and goodness-of-fit.
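
A minimal sketch of one such conditional component density is given below. It assumes Gaussian noise about the quadratic regression curve, which is an assumption made here for illustration since the noise model is not specified in this section; the function name and all parameter names are hypothetical. Because latitude and longitude are treated as conditionally independent, the density of a track is a product, so the two log-densities are added.

```python
import numpy as np

def component_loglik(t, lat, lon, beta_lat, beta_lon, sigma_lat, sigma_lon):
    """Log-density of one track under one quadratic regression component.

    beta_* hold (intercept, linear, quadratic) coefficients and sigma_* are
    the assumed Gaussian noise standard deviations for each coordinate.
    """
    X = np.vander(np.asarray(t, dtype=float), 3, increasing=True)  # columns: 1, t, t^2

    def gauss_logpdf(y, mean, sigma):
        y = np.asarray(y, dtype=float)
        return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                      - (y - mean) ** 2 / (2 * sigma**2))

    ll_lat = gauss_logpdf(lat, X @ np.asarray(beta_lat, dtype=float), sigma_lat)
    ll_lon = gauss_logpdf(lon, X @ np.asarray(beta_lon, dtype=float), sigma_lon)
    return ll_lat + ll_lon  # conditional independence: log-densities add

# Hypothetical track lying exactly on the component's curve.
t = np.arange(5)
beta_lat, beta_lon = [10.0, 1.0, 0.0], [150.0, -2.0, 0.1]
lat = 10.0 + 1.0 * t
lon = 150.0 - 2.0 * t + 0.1 * t**2
ll_on_curve = component_loglik(t, lat, lon, beta_lat, beta_lon, 1.0, 1.0)
ll_off_curve = component_loglik(t, lat + 3.0, lon, beta_lat, beta_lon, 1.0, 1.0)
```

A track displaced from the curve receives a lower log-density, which is what lets the mixture distinguish components by both shape and location.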
Each trajectory (i.e., each cyclone track) is assumed
to be generated by one of K different regression models,
each having its own shape parameters. The clustering
problem is to (i) learn the parameters of all K models
given the TC tracks, and (ii) infer which of the K
models is most likely to have generated each TC
track. Each track can be assigned to the mixture com-
ponent (and thus the cluster) that was most likely to
have generated that track given the model. In other
words, the assigned cluster has the highest posterior
probability given the track. An expectation maximiza-
tion (EM) algorithm for learning these model param-
eters can be defined in a manner similar to that for
standard (unconditional) mixtures (DeSarbo and Cron
1988; Gaffney and Smyth 1999; McLachlan and Krish-
nan 1997; McLachlan and Peel 2000). The resulting EM
algorithm is straightforward to implement and use, and
its computational complexity is linear in the number of
observations.
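
The two steps of the clustering problem can be sketched with a compact EM loop. This is an illustrative sketch, not the Matlab toolbox referred to above: it assumes Gaussian noise, uses the time-step index as the independent variable, and initializes memberships at random. The function names and the synthetic tracks are hypothetical. Each track contributes its own design matrix, so tracks of different lengths are handled without padding.

```python
import numpy as np

def design(t):
    """Quadratic design matrix with columns 1, t, t^2."""
    return np.vander(np.asarray(t, dtype=float), 3, increasing=True)

def em_regression_mixture(tracks, K, n_iter=100, seed=0):
    """Fit a K-component quadratic regression mixture to variable-length tracks.

    tracks: list of arrays of shape (n_i, 2) holding (lat, lon) per time step.
    Returns mixing weights, coefficients (K, 3, 2), variances (K, 2), and the
    posterior membership matrix (n_tracks, K).
    """
    rng = np.random.default_rng(seed)
    Ys = [np.asarray(y, dtype=float) for y in tracks]
    Xs = [design(np.arange(len(y))) for y in Ys]
    n = len(Ys)
    resp = rng.dirichlet(np.ones(K), size=n)  # random soft initial memberships
    for _ in range(n_iter):
        # M-step: weighted least squares, every point of a track weighted by
        # that track's membership probability in component k.
        alpha = resp.mean(axis=0)
        beta = np.zeros((K, 3, 2))
        var = np.zeros((K, 2))
        for k in range(K):
            w_k = resp[:, k]
            A = sum(w * X.T @ X for w, X in zip(w_k, Xs))
            b = sum(w * X.T @ Y for w, X, Y in zip(w_k, Xs, Ys))
            beta[k] = np.linalg.solve(A + 1e-8 * np.eye(3), b)
            sq = sum(w * ((Y - X @ beta[k]) ** 2).sum(axis=0)
                     for w, X, Y in zip(w_k, Xs, Ys))
            npts = sum(w * len(Y) for w, Y in zip(w_k, Ys))
            var[k] = sq / npts + 1e-6
        # E-step: posterior probability of each component given a whole track.
        logp = np.zeros((n, K))
        for i, (X, Y) in enumerate(zip(Xs, Ys)):
            for k in range(K):
                r = Y - X @ beta[k]
                logp[i, k] = np.log(alpha[k]) + np.sum(
                    -0.5 * np.log(2 * np.pi * var[k]) - r**2 / (2 * var[k]))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.maximum(np.exp(logp), 1e-12)  # keep every weight positive
        resp /= resp.sum(axis=1, keepdims=True)
    return alpha, beta, var, resp

# Synthetic demonstration: straight-moving and recurving tracks of varying length.
rng = np.random.default_rng(1)
tracks = []
for _ in range(15):
    t = np.arange(int(rng.integers(8, 16)))
    tracks.append(np.column_stack([10 + 0.5 * t, 150 - 1.0 * t])
                  + rng.normal(0, 0.2, (len(t), 2)))
for _ in range(15):
    t = np.arange(int(rng.integers(8, 16)))
    tracks.append(np.column_stack([20 + 1.0 * t, 140 - 1.0 * t + 0.1 * t**2])
                  + rng.normal(0, 0.2, (len(t), 2)))

alpha, beta, var, resp = em_regression_mixture(tracks, K=2)
labels = resp.argmax(axis=1)  # hard assignment: highest posterior probability
```

Each pass over the data touches every observation once per component, which is consistent with the linear complexity noted above.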
Certain preprocessing steps are typically performed
on the cyclone tracks prior to clustering. For example,
Blender et al. (1997) subtract the coordinates of the
initial points of each extratropical cyclone track so that
they all begin at the latitude–longitude position of 0°,
0°. In addition they also normalize the latitude and lon-
gitude measurements to have the same variance. In our
experiments below we did not use any such preprocessing;
clustering the tracks directly produced results that
were easier to interpret and more meaningful than the
clustering of preprocessed tracks.
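
For reference, the two preprocessing steps attributed to Blender et al. (1997), which were not applied in this study, can be sketched as follows. The function names are hypothetical, and the variance is pooled over all points of all tracks, which is an assumption about how the normalization is carried out.

```python
import numpy as np

def shift_to_origin(track):
    """Subtract the first (lat, lon) fix so that the track starts at (0, 0)."""
    track = np.asarray(track, dtype=float)
    return track - track[0]

def equalize_variance(tracks):
    """Scale lat and lon so that each has unit variance over all pooled points."""
    arrays = [np.asarray(t, dtype=float) for t in tracks]
    scale = np.vstack(arrays).std(axis=0)
    return [t / scale for t in arrays]

# Hypothetical tracks with different genesis locations.
raw = [np.array([[10.0, 150.0], [11.0, 148.0], [12.0, 146.0]]),
       np.array([[20.0, 135.0], [22.0, 134.0], [25.0, 136.0], [28.0, 140.0]])]
shifted = [shift_to_origin(t) for t in raw]
scaled = equalize_variance(shifted)
```

After the shift, the two tracks share the same starting point, so any information carried by genesis location is no longer available to the clustering.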
c. Number of clusters
To select the most appropriate number of clusters,
we looked at both the in-sample and out-of-sample log-
likelihood values. The log-likelihood is defined as the
log-probability of the observed data under the model,
which can be seen as a goodness-of-fit metric for probabilistic
models. Used as an objective measure, one selects
the number of clusters for which the log-likelihood
is largest across a candidate set of values. Our resulting
in-sample score curve is shown in Fig. 1 (the out-of-
sample curve is similar and is not shown). The observed
log-likelihood values increased in direct relation to the
number of clusters, and thus did not directly provide an
optimal number of chosen clusters. In addition the
within-cluster spread is plotted in Fig. 2 and can be used
as an additional measure for goodness of fit. The curves
in Figs. 1 and 2 mirror each other, showing obvious
diminishing returns of improvement in fit beyond
K = 6–8, suggesting a reasonable stopping point somewhere
in-between.
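
The diminishing-returns argument can be made concrete by looking at the gain in log-likelihood for each added cluster. The sketch below uses made-up numbers chosen only to show the pattern; they are not the values plotted in Fig. 1, and the function name `marginal_gains` is hypothetical.

```python
import numpy as np

def marginal_gains(ks, logliks):
    """Increase in log-likelihood for each step from one K to the next."""
    gains = np.diff(np.asarray(logliks, dtype=float))
    return dict(zip(list(ks)[1:], gains.tolist()))

# Illustrative numbers only; they are NOT the values shown in Fig. 1.
ks = [2, 3, 4, 5, 6, 7, 8, 9]
logliks = [-100.0, -80.0, -66.0, -56.0, -50.0, -47.0, -45.5, -44.5]
gains = marginal_gains(ks, logliks)
for k, g in gains.items():
    print(f"K = {k}: gain {g:.1f}")
```

A curve that keeps rising but with shrinking gains, as in this example, gives no single maximum, which is why a qualitative criterion is needed to settle on a value of K.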
To evaluate the values K = 6–8 as candidates for the
number of clusters, we also carried out a qualitative
analysis based on how much the track types differ from
one cluster to another as the number of clusters in-
creases. Preliminary results carried out with six clusters
(Camargo et al. 2004) are very similar to those presented
here. The main difference is that one of the K = 6
track types splits in two when K is set to 7, with
slightly different characteristics. Most of the results presented
here and in Camargo et al. (2007) are not sensitive
to the choice of K within the range 6–8. As described by
Camargo et al. (2007), the choice of K = 7 is found to
produce particularly interpretable results with respect
to ENSO and was thus taken to be our final choice.
Figure 3 illustrates how the choice for the number of
clusters from K = 2–9 affects the final regression
curves. To emphasize differences in shape, the mean
regression trajectories are plotted with their initial po-
sitions collocated at the origin. The two main types of
TC behavior found in previous studies (Harr and Els-
berry 1991, 1995) are evident in these plots, namely,
straight movers and recurvers. The differentiation
between the two types is achieved once K reaches 3. For each
of these two broad types, additional clusters yield dif-
ferences in compass bearing for the straight movers and
differences in the recurving portion for the recurvers.
This remark is particularly valid for odd values of K
(Figs. 3b,d,f,h). Although some of the regression curves
look very similar in Fig. 3, their initial positions differ in
several cases and there are also differences in trajectory
length. Since the regression curves are plotted with the
same number of points, the distances between plotted
points are smaller or larger based on average speed
over such a period. It is interesting to note that along
the recurving trajectories, the points are very close to
each other within the recurving portion, showing that
TCs slow down before changing direction. The recurv-
ing usually occurs when the storms move from a region
of easterlies to a region of westerlies, with the wind
speed decreasing near the recurve point. It is important
to note, however, that the clustering technique has no
access to the wind fields.
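
The link between point spacing and translation speed can be sketched as follows. A quadratic curve is evaluated at equally spaced times, shifted so that it starts at the origin (for plotting only), and the distance between successive points is computed. The coefficients describe a hypothetical recurving track and are not taken from the fitted clusters; distances are in degrees and ignore spherical geometry.

```python
import numpy as np

def mean_curve(beta_lat, beta_lon, t_max, n_points=20):
    """Evaluate a quadratic (lat, lon) curve at n_points equally spaced times."""
    t = np.linspace(0.0, t_max, n_points)
    X = np.vander(t, 3, increasing=True)  # columns: 1, t, t^2
    return np.column_stack([X @ np.asarray(beta_lat, dtype=float),
                            X @ np.asarray(beta_lon, dtype=float)])

def collocate(curve):
    """Shift a curve so that its first point sits at the origin (plotting only)."""
    return curve - curve[0]

def step_lengths(curve):
    """Distance in degrees between successive plotted points."""
    return np.linalg.norm(np.diff(curve, axis=0), axis=1)

# Hypothetical recurving curve: westward motion reverses to eastward at t = 5.
curve = mean_curve([15.0, 1.0, 0.0], [140.0, -2.0, 0.2], t_max=10.0)
shifted = collocate(curve)
steps = step_lengths(shifted)
slowest = int(np.argmin(steps))  # index of the shortest step
```

In this example the shortest step falls in the middle of the curve, at the recurve point, which mirrors the clustering of plotted points noted above.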
The regression trajectories for the six, seven, and
eight clusters are shown in Fig. 4; in this case, the initial
positions were retained. Note that solutions with odd (even)
values of K resemble one another more than solutions with adjacent values of K.
FIG. 1. Log-likelihood values for different numbers of TC track
clusters. The log-likelihood values shown are the maximum of 16
runs, obtained by a random permutation of the tropical cyclones
given to the cluster model.
FIG. 2. Within-cluster error for different numbers of TC track
clusters. The cluster error values shown are the minimum of 16
runs, obtained by a random permutation of the tropical cyclones
given to the cluster model.
For the chosen number of clusters (K = 7), shown in
Fig. 4c, there are four clusters of straight movers and
three of recurvers. Notice the strong separation be-
tween the clusters in terms of their genesis location: five
clusters have genesis positions near 10°N in latitude,
but spread in longitude from near the Philippines to just
west of the date line. The other two clusters (both re-
curvers) start near 20°N.
Looking at the population of each cluster in Table 1,
we see that there are three dominant clusters (A, B, and
C), each accounting for approximately 20% of the
tracks. Clusters D and E occur less often (13%), while
clusters F and G (each containing about 100 cyclones)
are relatively rare (8%). When only considering the last
33 yr, 1970–2002, the number and characteristics of the
clusters did not change (see section 2a), but their rela-
tive sizes did change somewhat (not shown), with the
dominant clusters (such as A and C) decreasing and the
least populated ones (E, F, and G) increasing. This sig-
nificant change in relative cluster sizes could be due to
either a decadal shift in the occurrence of tracks (Ho et
al. 2004), or to data issues, with fewer TCs being de-
tected over open waters before the satellite era.
3. Tropical cyclone clusters
a. Trajectories
The TC tracks in clusters A–G from the time interval
1983–2002 are shown in Fig. 5, along with the mean
regression curves for each cluster. For comparison, the
tracks of all TCs in the same time interval are also
shown (Fig. 5h). The figure illustrates the high degree
of geographic localization achieved by the cluster
analysis, mainly due to the fact that the tracks were not
reduced to a common origin before performing the
clustering. The spread about the mean track for the
straight-moving clusters B, D, and F is particularly
small. Although the mean regression trajectories of
FIG. 3. Mean regression trajectories of the western North Pacific TCs with (a) two, (b) three,
(c) four, (d) five, (e) six, (f) seven, (g) eight, and (h) nine clusters. The mean trajectories start
at 0° lat and lon, for plotting purposes only.

References
Blender et al. (1997): Identification of cyclone-track regimes in the North Atlantic.
Kimoto and Ghil (1993a): Multiple Flow Regimes in the Northern Hemisphere Winter. Part I: Methodology and Hemispheric Regimes.
Mo and Ghil (1987): Statistics and Dynamics of Persistent Anomalies.
Reinhold and Pierrehumbert (1982): Dynamics of Weather Regimes: Quasi-Stationary Waves and Blocking.
Wu and Kuo (1999): Typhoons Affecting Taiwan: Current Understanding and Future Challenges.