Publication: Optimal distance estimation between compressed data series
Co-Authors
Freris, Nikolaos M.
Vlachos, Michail
Language
English
Abstract
Most real-world data contain repeated or periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate complete orthogonal basis (e.g., Fourier, Wavelets, Karhunen-Loeve expansion or Principal Components).
In the face of ever-increasing data repositories, and given that most mining operations are distance-based, it is vital to perform accurate distance estimation directly on the compressed data. However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower and upper bounds on the distance based on the available information. In particular, we consider the problem where a distinct set of coefficients is maintained for each sequence, and the L2-norm of the compression error is recorded. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest provable estimation of the L2-norm or the correlation, and executes at least two orders of magnitude faster than a numerical solution based on convex optimization. The contributions of this work extend beyond the purview of periodic data, as our methods are applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.
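To make the problem setting concrete, the following is a minimal Python sketch of the compressed representation described in the abstract: each sequence keeps its own set of coefficients plus the L2-norm of its compression error. It assumes real-valued coefficients in an orthonormal basis, so that energy is preserved by the transform. The bound it computes is a plain Cauchy-Schwarz bound, which is valid but generally looser than the tightest bounds the paper derives; it is not the paper's algorithm. The names `compress` and `distance_bounds` are hypothetical and chosen for illustration only.

```python
import math


def compress(coeffs, k):
    """Keep the k largest-magnitude coefficients and the L2-norm of the rest.

    Assumes `coeffs` are real coefficients in an orthonormal basis.
    Returns (kept, err): a dict index -> value, and the compression error norm.
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    err = math.sqrt(sum(coeffs[i] ** 2 for i in order[k:]))
    return kept, err


def distance_bounds(kept_x, err_x, kept_y, err_y):
    """Lower/upper bounds on the Euclidean distance from compressed data only.

    Illustrative Cauchy-Schwarz bound (an assumption of this sketch, not the
    optimal bound from the paper). Indices stored for both sequences contribute
    exactly; everything else is bounded through its energy.
    """
    common = kept_x.keys() & kept_y.keys()
    # Exact contribution from positions where both coefficients are stored.
    exact = sum((kept_x[i] - kept_y[i]) ** 2 for i in common)
    # Norm of each sequence outside the common positions:
    # stored-but-unmatched coefficients plus the recorded error energy.
    p = math.sqrt(sum(v * v for i, v in kept_x.items() if i not in common) + err_x ** 2)
    q = math.sqrt(sum(v * v for i, v in kept_y.items() if i not in common) + err_y ** 2)
    lower = math.sqrt(exact + (p - q) ** 2)
    upper = math.sqrt(exact + (p + q) ** 2)
    return lower, upper


if __name__ == "__main__":
    x = [4.0, 3.0, 0.5, 0.1]
    y = [1.0, 0.2, 3.0, 2.0]
    kx, ex = compress(x, 2)
    ky, ey = compress(y, 2)
    lo, hi = distance_bounds(kx, ex, ky, ey)
    true = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    print(f"lower={lo:.3f} true={true:.3f} upper={hi:.3f}")
```

The gap between the two bounds shrinks as the sequences share more stored positions; the optimization studied in the paper targets exactly the part this sketch bounds crudely, namely how the unknown error energy can be distributed over the unmatched positions.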
Source:
Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
Publisher:
Society for Industrial and Applied Mathematics Publications
Subject
Engineering, Electrical and electronics engineering