Author

Andreas S. Weigend

Bio: Andreas S. Weigend is an academic researcher from New York University. The author has contributed to research in topics: Probability distribution & Time series. The author has an h-index of 14 and has co-authored 28 publications receiving 1,543 citations.

Papers
Posted Content
TL;DR: It is shown that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs, and when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one.
Abstract: This paper discusses the application of a modern signal processing technique known as independent component analysis (ICA) or blind source separation to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series into a new space of statistically independent components (ICs). This can be viewed as a factorization of the portfolio since joint probabilities become simple products in the coordinate system of the ICs. We apply ICA to three years of daily returns of the 28 largest Japanese stocks and compare the results with those obtained using principal component analysis. The results indicate that the estimated ICs fall into two categories, (i) infrequent but large shocks (responsible for the major changes in the stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall level of the stocks). We show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs. In contrast, when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one. Independent component analysis is a potentially powerful method of analyzing and understanding driving mechanisms in financial markets. There are further promising applications to risk management since ICA focuses on higher-order statistics.
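
The map-threshold-reconstruct pipeline lends itself to a short sketch. The following is illustrative only: synthetic heavy-tailed "returns" stand in for the Japanese stock data, scikit-learn's FastICA stands in for the ICA algorithm the authors used, and the two-standard-deviation cutoff is an arbitrary choice.

```python
# Illustrative sketch of ICA-based reconstruction from thresholded components.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=(750, 28))   # ~3 years of daily returns, 28 stocks (placeholder)

ica = FastICA(n_components=28, max_iter=500, random_state=0)
ics = ica.fit_transform(returns)                 # independent components, shape (T, 28)

# Keep only the large "shocks" in each IC; zero out the frequent small fluctuations.
threshold = 2.0 * ics.std(axis=0)                # illustrative cutoff, not the paper's
shocks = np.where(np.abs(ics) > threshold, ics, 0.0)

# Map the thresholded ICs back to return space and cumulate to a price level.
recon_returns = shocks @ ica.mixing_.T + ica.mean_
price_level = recon_returns[:, 0].cumsum()       # reconstructed level of stock 0
```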

291 citations

Journal ArticleDOI
TL;DR: In this paper, the authors apply independent component analysis (ICA) to multivariate financial time series such as a portfolio of stocks and show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded independent components.
Abstract: This paper explores the application of a signal processing technique known as independent component analysis (ICA) or blind source separation to multivariate financial time series such as a portfolio of stocks. The key idea of ICA is to linearly map the observed multivariate time series into a new space of statistically independent components (ICs). We apply ICA to three years of daily returns of the 28 largest Japanese stocks and compare the results with those obtained using principal component analysis. The results indicate that the estimated ICs fall into two categories, (i) infrequent large shocks (responsible for the major changes in the stock prices), and (ii) frequent smaller fluctuations (contributing little to the overall level of the stocks). We show that the overall stock price can be reconstructed surprisingly well by using a small number of thresholded weighted ICs. In contrast, when using shocks derived from principal components instead of independent components, the reconstructed price is less similar to the original one.

272 citations

Posted ContentDOI
TL;DR: This chapter reports on a competition run through the Santa Fe Institute in which participants from a range of relevant disciplines applied a variety of time series analysis tools to a small group of common data sets in order to help make meaningful comparisons among their approaches.
Abstract: Throughout scientific research, measured time series are the basis for characterizing an observed system and for predicting its future behavior. A number of new techniques (such as state-space reconstruction and neural networks) promise insights that traditional approaches to these very old problems cannot provide. In practice, however, the application of such new techniques has been hampered by the unreliability of their results and by the difficulty of relating their performance to that of mature algorithms. This chapter reports on a competition run through the Santa Fe Institute in which participants from a range of relevant disciplines applied a variety of time series analysis tools to a small group of common data sets in order to help make meaningful comparisons among their approaches. The design and the results of this competition are described, and the historical and theoretical backgrounds necessary to understand the successful entries are reviewed.

196 citations

Journal ArticleDOI
TL;DR: The structure that is present in the semantic space of topics is used in order to improve performance in text categorization: according to their meaning, topics can be grouped together into “meta-topics”, e.g., gold, silver, and copper are all metals.
Abstract: With the recent dramatic increase in electronic access to documents, text categorization—the task of assigning topics to a given document—has moved to the center of the information sciences and knowledge management. This article uses the structure that is present in the semantic space of topics in order to improve performance in text categorization: according to their meaning, topics can be grouped together into “meta-topics”, e.g., gold, silver, and copper are all metals. The proposed architecture matches the hierarchical structure of the topic space, as opposed to a flat model that ignores the structure. It accommodates both single and multiple topic assignments for each document. Its probabilistic interpretation allows its predictions to be combined in a principled way with information from other sources. The first level of the architecture predicts the probabilities of the meta-topic groups. This allows the individual models for each topic on the second level to focus on finer discriminations within the group. Evaluating the performance of a two-level implementation on the Reuters-22173 testbed of newswire articles shows the most significant improvement for rare classes.
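
A minimal sketch of the two-level idea follows. The meta-topic grouping, toy documents, and both classifiers (META_TOPICS, level1, level2) are invented here for illustration and are not the paper's actual models or features:

```python
# Toy two-level (meta-topic -> topic) categorizer with probabilistic combination.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

META_TOPICS = {"metals": ["gold", "silver", "copper"],
               "grains": ["wheat", "corn"]}

docs   = ["gold prices rose", "silver demand weakened",
          "wheat harvest fell", "corn futures dropped"]
topics = ["gold", "silver", "wheat", "corn"]
meta   = [next(m for m, ts in META_TOPICS.items() if t in ts) for t in topics]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Level 1: predict the meta-topic group of a document.
level1 = LogisticRegression().fit(X, meta)

# Level 2: one classifier per group, making finer discriminations within it.
level2 = {}
for m in META_TOPICS:
    rows = [i for i, g in enumerate(meta) if g == m]
    level2[m] = LogisticRegression().fit(X[rows], [topics[i] for i in rows])

# Combine probabilistically: P(topic | d) = P(group | d) * P(topic | group, d).
def predict_topic(text):
    x = vec.transform([text])
    group_p = dict(zip(level1.classes_, level1.predict_proba(x)[0]))
    scores = {t: group_p[m] * p
              for m, clf in level2.items()
              for t, p in zip(clf.classes_, clf.predict_proba(x)[0])}
    return max(scores, key=scores.get)

print(predict_topic("copper and gold prices"))
```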

182 citations

Journal ArticleDOI
TL;DR: In this paper, the authors compare the uncertainty in the solution stemming from the data splitting with neural network specific uncertainties (parameter initialization, choice of number of hidden units, etc.).
Abstract: This article exposes problems of the commonly used technique of splitting the available data into training, validation, and test sets that are held fixed, warns about drawing too strong conclusions from such static splits, and shows potential pitfalls of ignoring variability across splits. Using a bootstrap or resampling method, we compare the uncertainty in the solution stemming from the data splitting with neural network specific uncertainties (parameter initialization, choice of number of hidden units, etc.). We present two results on data from the New York Stock Exchange. First, the variation due to different resamplings is significantly larger than the variation due to different network conditions. This result implies that it is important to not over-interpret a model (or an ensemble of models) estimated on one specific split of the data. Second, on each split, the neural network solution with early stopping is very close to a linear model; no significant nonlinearities are extracted.
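
The comparison can be sketched in a few lines. This is a toy version under stated assumptions: synthetic near-linear data, scikit-learn's MLPRegressor as the network, and repeated random train/test splits standing in for the paper's bootstrap resampling of New York Stock Exchange data:

```python
# Spread of test error across resampled splits vs. across network
# initializations on one fixed split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=400)   # near-linear target

def test_mse(split_seed, init_seed):
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=split_seed)
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=init_seed)
    net.fit(Xtr, ytr)
    return float(np.mean((net.predict(Xte) - yte) ** 2))

across_splits = [test_mse(split_seed=s, init_seed=0) for s in range(20)]
across_inits  = [test_mse(split_seed=0, init_seed=s) for s in range(20)]
print("std across splits:", np.std(across_splits))
print("std across inits: ", np.std(across_inits))
```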

82 citations


Cited by
Journal ArticleDOI
TL;DR: The basic theory and applications of ICA are presented; the goal is to find a linear representation of non-Gaussian data so that the components are statistically independent, or as independent as possible.

8,231 citations

Journal ArticleDOI
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

7,539 citations

Journal ArticleDOI
TL;DR: The theory of proper scoring rules on general probability spaces is reviewed and developed, and the intuitively appealing interval score is proposed as a utility function in interval estimation that addresses width as well as coverage.
Abstract: Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the ...
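
The interval score mentioned in the TL;DR has a compact standard form: for a central (1 − α) prediction interval [l, u] and observation x, it adds the interval width to a penalty of 2/α times the distance by which x misses the interval. A small sketch under that convention (negatively oriented, so smaller is better):

```python
# Interval score for a central (1 - alpha) prediction interval [l, u]:
# width, plus 2/alpha times any miss distance; rewards narrow intervals
# while still penalizing poor coverage.
def interval_score(l, u, x, alpha):
    score = u - l
    if x < l:
        score += (2.0 / alpha) * (l - x)
    elif x > u:
        score += (2.0 / alpha) * (x - u)
    return score

print(interval_score(2.0, 8.0, x=5.0, alpha=0.1))   # inside: 6.0 (width only)
print(interval_score(2.0, 8.0, x=10.0, alpha=0.1))  # outside: 6.0 + 20 * 2 = 46.0
```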

4,644 citations

Posted Content
TL;DR: In this book, the author provides a unified and comprehensive theory of structural time series models, including a detailed treatment of the Kalman filter for modeling economic and social time series, and addresses the special problems which the treatment of such series poses.
Abstract: In this book, Andrew Harvey sets out to provide a unified and comprehensive theory of structural time series models. Unlike the traditional ARIMA models, structural time series models consist explicitly of unobserved components, such as trends and seasonals, which have a direct interpretation. As a result the model selection methodology associated with structural models is much closer to econometric methodology. The link with econometrics is made even closer by the natural way in which the models can be extended to include explanatory variables and to cope with multivariate time series. From the technical point of view, state space models and the Kalman filter play a key role in the statistical treatment of structural time series models. The book includes a detailed treatment of the Kalman filter. This technique was originally developed in control engineering, but is becoming increasingly important in fields such as economics and operations research. This book is concerned primarily with modelling economic and social time series, and with addressing the special problems which the treatment of such series poses. The properties of the models and the methodological techniques used to select them are illustrated with various applications. These range from the modelling of trends and cycles in US macroeconomic time series to an evaluation of the effects of seat belt legislation in the UK.
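
A brief sketch of the unobserved-components idea, using statsmodels' UnobservedComponents as a modern stand-in for the book's state space treatment (the series below is synthetic, and the specification string follows statsmodels' API rather than the book's notation):

```python
# Fit a local linear trend model via the Kalman filter and recover the
# unobserved level component with the Kalman smoother.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
level = np.cumsum(rng.normal(scale=0.5, size=200))   # random-walk level component
y = level + rng.normal(scale=1.0, size=200)          # observed series = level + noise

model = sm.tsa.UnobservedComponents(y, level="local linear trend")
result = model.fit(disp=False)

# Unlike an ARIMA fit, the estimated state has a direct interpretation:
# state 0 is the smoothed level of the series.
smoothed_level = result.smoothed_state[0]
print(result.summary())
```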

4,252 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a state-of-the-art survey of ANN applications in forecasting and provide a synthesis of published research in this area, insights on ANN modeling issues, and future research directions.

3,680 citations