Journal Article

Data-mining massive time series astronomical data: challenges, problems and solutions

TL;DR
The major characteristics of the time series astronomical data, data preprocessing techniques to process these time series, and some domain-specific techniques to separate candidate variable stars from the nonvariant ones are presented.
Abstract
In this paper we present some initial results of a project that uses data-mining techniques to search for evidence of massive compact halo objects (MACHOs) in a very large time series database. MACHOs are the proposed material that probably makes up the “dark matter” surrounding our own and other galaxies. It has been suggested that MACHOs may be detected through the gravitational microlensing effect, which can be identified from the light curves of background stars. The objective of this project is twofold: (i) identification of new classes of variable stars and (ii) detection of microlensing events. In this paper, we present the major characteristics of the time series astronomical data, data preprocessing techniques to process these time series, and some domain-specific techniques to separate candidate variable stars from the nonvariant ones. We discuss the use of the Fourier model to represent the time series and the k-means-based clustering method to classify variable stars.
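As a rough illustration of the last sentence of the abstract, the sketch below fits a truncated Fourier series to each light curve and clusters the resulting harmonic amplitudes with k-means. It is not the authors' implementation: the function names `fourier_features` and `kmeans`, the number of harmonics, the assumption that each star's period is already known, and the synthetic light curves are all hypothetical choices made for this example.

```python
import numpy as np

def fourier_features(t, mag, period, n_harmonics=3):
    """Least-squares fit of a truncated Fourier series to one light curve.

    Assumes the period is already known. Returns the amplitude of each
    harmonic, which does not depend on where the phase origin is placed.
    """
    phase = 2.0 * np.pi * np.asarray(t) / period
    cols = [np.ones_like(phase)]                 # constant (mean magnitude) term
    for k in range(1, n_harmonics + 1):
        cols.append(np.cos(k * phase))
        cols.append(np.sin(k * phase))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, np.asarray(mag), rcond=None)
    a, b = coef[1::2], coef[2::2]                # cosine and sine coefficients
    return np.hypot(a, b)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic, irregularly sampled light curves (made-up data): ten purely
# sinusoidal variables followed by ten with a strong second harmonic.
rng = np.random.default_rng(0)
period = 1.5                                     # days, assumed known
features = []
for i in range(20):
    t = np.sort(rng.uniform(0.0, 30.0, 200))
    phase = 2.0 * np.pi * t / period + rng.uniform(0.0, 2.0 * np.pi)
    if i < 10:
        mag = 15.0 + 0.5 * np.sin(phase)
    else:
        mag = 15.0 + 0.3 * np.sin(phase) + 0.3 * np.sin(2.0 * phase)
    mag = mag + rng.normal(0.0, 0.02, t.size)    # photometric noise
    features.append(fourier_features(t, mag, period))

labels, centers = kmeans(np.array(features), k=2)
```

Using harmonic amplitudes rather than raw cosine and sine coefficients keeps the features unchanged when a light curve is shifted in phase, so stars with the same shape but different phase origins can land in the same cluster. In practice the period would itself have to be estimated from the data.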


Citations
Journal Article

Time series clustering and classification by the autoregressive metric

TL;DR: The statistical properties of the autoregressive distance between ARIMA processes are investigated and the asymptotic distribution of the squared AR distance and an approximation which is computationally efficient are derived.
Proceedings Article

Matching patterns from historical data using PCA and distance similarity factors

TL;DR: The diagnosis of abnormal plant operation can be greatly facilitated if periods of similar plant performance can be located in the historical database, and a novel methodology is proposed for this pattern matching problem.
Journal Article

A coherence-based approach for the pattern recognition of time series

TL;DR: A pattern recognition approach based on the frequency-domain measure of squared coherence is useful for identifying linearly related groupings of time series over different periods of time.
Journal Article

Clustering streamflow time series for regional classification

TL;DR: In this article, the authors show how the Mahalanobis distance between regression coefficients and the Euclidean distance between autoregressive weights can be applied to hydrologic time series clustering.
Journal Article

An Improved Ant Colony Optimization Cluster Algorithm Based on Swarm Intelligence

TL;DR: An improved ant colony optimization cluster algorithm based on a classic algorithm, the LF algorithm, is proposed; it can handle large category datasets more rapidly, accurately, and effectively while retaining good scalability.
References
Journal Article

Smoothing Noisy Data with Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation

Peter Craven, +1 more
TL;DR: In this paper, a method is presented for estimating the optimum amount of smoothing from the data; it is based on smoothing splines, which are well known to provide nice curves that smooth discrete, noisy data.
Journal Article

Smoothing noisy data with spline functions

TL;DR: In this article, a generalized cross-validation estimate is proposed for the smoothing parameter of smoothing polynomial splines, which controls the tradeoff between the "roughness" of the solution and its infidelity to the data, as measured by the average square error.