
Detecting correlation in stock market

TL;DR: In order to find hidden correlations in the daily returns, this work builds cross prediction models and uses the normalized modeling error as a generalized correlation measure that extends the concept of the classical correlation matrix.
Abstract: We present a new method for detecting dependencies in the stock market. In order to find hidden correlations in the daily returns, we build cross prediction models and use the normalized modeling error as a generalized correlation measure that extends the concept of the classical correlation matrix.

Summary (1 min read)

1 Introduction

  • In the definition of the cross-correlation matrix of the returns, the brackets indicate the time average over all trading days in the investigated period.
  • Following their investigations the authors see strong indications that this asymmetric interaction exists in a way that the dynamics of single stocks are leading the dynamics of others significantly.
  • The authors indicate this with a cross modeling scheme which is described in the following section.

2 Mixed State Analysis

  • For δ(i, j) > 0 the authors have cp(i, j) > cp(j, i) which means that the returns of the i-th stock contain more useful information to model the returns of the j-th stock than the other way around.
  • In the terms of synchronization this indicates an asymmetrical coupling strength between the two stocks.

3 Numerical Simulations

  • For all 30 stocks in the DJIA, the authors build the time series of daily returns and calculate the cross-correlation matrix ρ(i, j) (see equation 1).
  • The stocks that behave anti-correlated with respect to the index (the blue stripes in the correlation matrix) occur in cp(i, j) with a modeling error near one.
  • In the matrix of the error differences δ(i, j) the authors find the amount of asymmetry regarding their mixed state analysis that offers a field of further investigations.
  • The next step will be a detailed analysis of the time dependence of these asymmetries and the nonlinear dependencies in the stock market.


Detecting Correlation in Stock Market
Jörg D. Wichard (a), Christian Merkwirth (b), Maciej Ogorzałek (a)
(a) AGH University of Science and Technology
Department of Electrical Engineering
al. Mickiewicza 30
30-059 Kraków, Poland
(b) Max-Planck-Institut für Informatik
Stuhlsatzenhausweg 85
66123 Saarbrücken, Germany
Abstract
We present a new method for detecting dependencies in the stock market. In order
to find hidden correlations in the daily returns, we build cross prediction models
and use the normalized modeling error as a generalized correlation measure that
extends the concept of the classical correlation matrix.
Key words: Econophysics, Multivariate analysis, Time series analysis
PACS: 89.65.Gh, 02.50.Sk, 05.45.Tp
1 Introduction
The analysis of the cross-correlation matrix of the returns plays an important
role in portfolio theory and financial analysis. We build the time series
of daily returns
$$R_i(t) = \frac{Y_i(t+1) - Y_i(t)}{Y_i(t)},$$
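wherein $Y_i(t)$ denotes the closing price of the i-th stock at day t. The
cross-correlation matrix of the returns is defined as
$$\rho_{ij} = \frac{\langle R_i R_j \rangle - \langle R_i \rangle \langle R_j \rangle}{\sqrt{\left(\langle R_i^2 \rangle - \langle R_i \rangle^2\right)\left(\langle R_j^2 \rangle - \langle R_j \rangle^2\right)}},$$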
where the brackets indicate the time average over all trading days in the
investigated period. The analysis of $\rho_{ij}$ leads to some interesting
insights into the market dynamics. Mantegna (1999) discovered a hierarchical
organization inside a portfolio of stocks by introducing a metric related to
the correlation coefficients. By definition the correlation matrix is symmetric
with respect to i and j and thus cannot be used to distinguish a symmetrical
interaction between different stocks from an asymmetric one. Our
investigations give strong indications that such an asymmetric interaction
exists, in the sense that the dynamics of single stocks significantly lead the
dynamics of others. We demonstrate this with a cross modeling scheme, which is
described in the following section.
Preprint submitted to Elsevier Science, 25 April 2004
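The two definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `daily_returns` and `cross_correlation` and the small price table are hypothetical, and the time averages are taken as plain means over all available days.

```python
import numpy as np

def daily_returns(prices):
    """R_i(t) = (Y_i(t+1) - Y_i(t)) / Y_i(t) for a (days, stocks) array of closing prices."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] - prices[:-1]) / prices[:-1]

def cross_correlation(returns):
    """rho_ij built from time averages, following the definition in the text."""
    mean = returns.mean(axis=0)                      # <R_i>
    second = returns.T @ returns / returns.shape[0]  # <R_i R_j>
    cov = second - np.outer(mean, mean)              # <R_i R_j> - <R_i><R_j>
    std = np.sqrt(np.diag(cov))                      # sqrt(<R_i^2> - <R_i>^2)
    return cov / np.outer(std, std)

# Hypothetical closing prices (illustration only): 4 days, 2 stocks.
prices = np.array([[100.0, 50.0],
                   [110.0, 49.0],
                   [99.0, 50.0],
                   [108.0, 49.5]])
returns = daily_returns(prices)
rho = cross_correlation(returns)
```

By construction `rho` is symmetric with ones on the diagonal, which is exactly why it cannot separate a symmetric interaction from an asymmetric one. It should agree with `np.corrcoef(returns, rowvar=False)`.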
2 Mixed State Analysis
The scheme we introduce for market analysis is related to the “mixed state
analysis” of multivariate time series which was developed to detect weak cou-
pling between dynamical systems in the framework of chaotic synchronization
(see Wiesenfeldt et al. (2001)). This approach is based on the reconstruction
of mixed states consisting of delayed samples taken from simultaneously mea-
sured time series of both systems under investigation.
We adopted this idea and changed it for our purpose in a way that a linear
model $f(\vec{R}_{i,j}(t))$ is constructed that maps the time-lagged returns of
the j-th stock together with the time-lagged returns of the i-th stock
$$\vec{R}_{i,j}(t) = \left(R_j(t), R_j(t-1), \ldots, R_j(t-\tau), R_i(t-1), \ldots, R_i(t-\tau)\right) \tag{1}$$
onto the actual returns of the i-th stock $R_i(t)$. The model f(·) is a linear
function that is fitted using the standard least squares approach for multiple
linear regression models (see for example Hastie et al. (2001)), i.e. it should
minimize the residual sum of squares
$\sum_t \left(R_i(t) - f(\vec{R}_{i,j}(t))\right)^2$. We would like to remark
that this model f(·) is certainly not able to predict the returns for the next
day; however, it is able to find the relationship between the actual returns
$R_i(t)$ and $R_j(t)$ with respect to the time-lagged returns, which may
contain some information about linear trends on short time scales.
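As a rough sketch of this construction, the snippet below builds the mixed states of equation (1) and fits the linear model by ordinary least squares. The helper names `mixed_states` and `fit_linear_model`, the intercept term, and the synthetic returns are assumptions made for illustration; the text does not say whether the authors' model includes an intercept.

```python
import numpy as np

def mixed_states(r_i, r_j, tau):
    """Build the mixed states of equation (1) and the matching targets R_i(t).

    Row s corresponds to t = tau + s and holds
    (R_j(t), R_j(t-1), ..., R_j(t-tau), R_i(t-1), ..., R_i(t-tau)).
    """
    T = len(r_i)
    cols = [r_j[tau - k:T - k] for k in range(0, tau + 1)]   # R_j(t-k), k = 0..tau
    cols += [r_i[tau - k:T - k] for k in range(1, tau + 1)]  # R_i(t-k), k = 1..tau
    return np.column_stack(cols), r_i[tau:]

def fit_linear_model(X, y):
    """Ordinary least squares fit; the intercept column is an assumption."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

# Synthetic returns (not market data): stock i follows stock j with a one-day lag.
rng = np.random.default_rng(0)
T, tau = 600, 3
r_j = rng.normal(size=T)
r_i = 0.1 * rng.normal(size=T)
r_i[1:] += 0.5 * r_j[:-1]

X, y = mixed_states(r_i, r_j, tau)
coef, fitted = fit_linear_model(X, y)
rss = np.sum((y - fitted) ** 2)  # residual sum of squares minimized by the fit
```

With τ = 3 each mixed state has 2τ + 1 = 7 components, and the first τ days are dropped because their lagged values are not available. In this synthetic setup the fitted coefficient on R_j(t−1) (`coef[2]`, after the intercept and R_j(t)) should come out close to the 0.5 used to generate the data.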
If we consider a portfolio of N different stocks, we can define the N × N matrix
of the normalized modeling error as
$$\mathrm{cp}(i,j) = \frac{\left\langle \left(R_i - f(\vec{R}_{i,j})\right)^2 \right\rangle}{\langle R_i^2 \rangle - \langle R_i \rangle^2}, \tag{2}$$
where the brackets denote the time average. The modeling error is normalized
by the variance of the time series $R_i(t)$ for a simple reason: a value of
cp(i, j) ≥ 1.0 indicates that the mean value $\langle R_i \rangle$ is a more
appropriate model than f(·), which means that there is no linear dependence in
the time series under investigation. Smaller values of cp(i, j) indicate that
there is at least a weak linear interrelation between the dynamics of the
returns. In general, the matrix cp(i, j) is not symmetric, i.e.
cp(i, j) ≠ cp(j, i). We define the matrix of differences δ(i, j) as
$$\delta(i,j) = \mathrm{cp}(i,j) - \mathrm{cp}(j,i). \tag{3}$$
The values of δ(i, j) reflect asymmetric dependencies in the market dynamics.
If the returns of i and j are uncorrelated or they interact on the same level,
then we expect δ(i, j) ≈ 0.
For δ(i, j) > 0 we have cp(i, j) > cp(j, i), which means that the returns of
the i-th stock contain more useful information to model the returns of the
j-th stock than the other way around. In terms of synchronization this
indicates an asymmetrical coupling strength between the two stocks.
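Equations (2) and (3) can be illustrated on synthetic data in which one series leads another. This is a sketch under stated assumptions, not the authors' implementation: the names `modeling_error` and `cp_and_delta` are hypothetical, the fit includes an intercept, the error is evaluated in-sample on the same days used for fitting, and the variance is taken over those days only.

```python
import numpy as np

def modeling_error(r_i, r_j, tau):
    """Normalized in-sample error cp(i, j) of modeling R_i(t) from the mixed state."""
    T = len(r_i)
    cols = [np.ones(T - tau)]                                 # intercept (assumption)
    cols += [r_j[tau - k:T - k] for k in range(0, tau + 1)]   # R_j(t), ..., R_j(t-tau)
    cols += [r_i[tau - k:T - k] for k in range(1, tau + 1)]   # R_i(t-1), ..., R_i(t-tau)
    A = np.column_stack(cols)
    y = r_i[tau:]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.mean((y - A @ coef) ** 2) / np.var(y)

def cp_and_delta(returns, tau=3):
    """Matrices cp(i, j) and delta(i, j) = cp(i, j) - cp(j, i) for a (days, stocks) array."""
    n = returns.shape[1]
    cp = np.zeros((n, n))  # diagonal stays zero: R_i(t) models itself exactly
    for i in range(n):
        for j in range(n):
            if i != j:
                cp[i, j] = modeling_error(returns[:, i], returns[:, j], tau)
    return cp, cp - cp.T

# Synthetic returns (not market data): series 0 leads series 1 by one day,
# series 2 is independent of both.
rng = np.random.default_rng(1)
T = 600
leader = rng.normal(size=T)
follower = 0.3 * rng.normal(size=T)
follower[1:] += 0.8 * leader[:-1]
independent = rng.normal(size=T)

returns = np.column_stack([leader, follower, independent])
cp, delta = cp_and_delta(returns, tau=3)
```

Here `cp[1, 0]` should be small, because the lagged leader explains most of the follower, while `cp[0, 1]` stays near one, so `delta[0, 1]` is clearly positive, matching the reading that stock 0 carries more information about stock 1 than the reverse. Because of the in-sample evaluation with an intercept, values above one cannot occur in this sketch; they would require, for example, evaluating the model on held-out days.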
3 Numerical Simulations
We investigate 600 trading days of the Dow-Jones Industrial Average (DJIA)
between 2-Oct-2000 and 3-Mar-2003. For all 30 stocks in the DJIA, we build
the time series of daily returns and calculate the cross-correlation matrix
ρ(i, j) (as defined in Section 1). For the mixed state analysis we use a time
lag of τ = 3 and we calculate the matrix of the modeling error (see footnote 1)
as defined in equation 2 and further the matrix of differences δ(i, j) from
equation 3. The results are shown in Figure 1. The cross-correlation matrix
shows some interesting structures; for example, there are obvious clusters, as
described by Mantegna (1999). Some of these structures can also be found in the
matrix of the modeling error cp(i, j). The stocks that behave anti-correlated
with respect to the index (the blue stripes in the correlation matrix) occur in
cp(i, j) with a modeling error near one. In the matrix of the error differences
δ(i, j) we find the amount of asymmetry regarding our mixed state analysis that
offers a field of further investigations. The next step will be a detailed
analysis of the time dependence of these asymmetries and the nonlinear
dependencies in the stock market.
References
Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical
Learning. Springer Series in Statistics. Springer-Verlag.
Mantegna, R., 1999. Hierarchical structure in financial markets. Eur. Phys. J.
B. 11, 193–197.
Wiesenfeldt, M., Parlitz, U., Lauterborn, W., 2001. Mixed state analysis of
multivariate time series. Int. J. Bifurcation and Chaos 11 (8), 2217–2226.
Footnote 1: In order to achieve a better graphical resolution in the plots, we
set the zero diagonal elements to one.

[Figure: three matrix plots, top to bottom, each labeled with the 30 DJIA ticker symbols AA, AXP, BA, C, CAT, DD, DIS, EK, GE, GM, HD, HON, HPQ, IBM, INTC, IP, JNJ, JPM, KO, MCD, MMM, MO, MRK, MSFT, PG, SBC, T, UTX, WMT, XOM. Color scale ranges: −0.5 to 1 (top), 0.50 to 1.00 (middle), −0.2 to 0.2 (bottom).]
Fig. 1. The cross-correlation matrix (top), the matrix of the normalized modeling
error cp(i, j) (middle) and the matrix δ(i, j) of the error differences as defined in
equation 3 (bottom) for 600 days of the DJIA (Ticker symbols on the left).


Frequently Asked Questions (8)
Q1. What have the authors contributed in "Detecting correlation in stock market" ?

The authors present a new method for detecting dependencies in the stock market. 

The scheme the authors introduce for market analysis is related to the “mixed state analysis” of multivariate time series which was developed to detect weak coupling between dynamical systems in the framework of chaotic synchronization (see Wiesenfeldt et al. (2001)). 

The modeling error is normalized with the variance of the time series Ri(t) for a simple reason: A value of cp(i, j) ≥ 1.0 indicates that the mean value 〈Ri〉 is a more appropriate model than f(·), which means that there is no linear dependence in the time series under investigation.

The model f(·) is a linear function that is fitted using the standard least squares approach (see for example Hastie et al. (2001)) for multiple linear regression models, i.e. it should minimize the residual sum of squares ∑t(Ri(t) − f(~Ri,j(t)))2. 

The analysis of the cross-correlation matrix of the returns plays an important role in portfolio theory and financial analysis.

By definition the correlation matrix is symmetric with respect to i and j and thus cannot be used to distinguish a symmetrical interaction between different stocks from an asymmetric one. 

For the mixed state analysis the authors use a time lag of τ = 3 and calculate the matrix of the modeling error as defined in equation 2 and further the matrix of differences δ(i, j) from equation 3.

This approach is based on the reconstruction of mixed states consisting of delayed samples taken from simultaneously measured time series of both systems under investigation. 

Trending Questions (1)
How can we predict stock market correlation?

The paper proposes a method of detecting correlations in the stock market by building cross prediction models and using normalized modeling error as a correlation measure.