Publication: Optimal distance estimation between compressed data series
dc.contributor.coauthor | Freris, Nikolaos M. | |
dc.contributor.coauthor | Vlachos, Michail | |
dc.contributor.department | Department of Electrical and Electronics Engineering | |
dc.contributor.kuauthor | Kozat, Süleyman Serdar | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.date.accessioned | 2024-11-09T22:53:42Z | |
dc.date.issued | 2012 | |
dc.description.abstract | Most real-world data contain repeated or periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate complete orthogonal basis (e.g., Fourier, Wavelets, Karhunen-Loeve expansion or Principal Components). In the face of ever increasing data repositories and given that most mining operations are distance-based, it is vital to perform accurate distance estimation directly on the compressed data. However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on the distance based on the available information. In particular, we consider the problem where a distinct set of coefficients is maintained for each sequence, and the L2-norm of the compression error is recorded. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest provable estimation of the L2-norm or the correlation, and executes at least two order of magnitudes faster than a numerical solution based on convex optimization. The contributions of this work extend beyond the purview of periodic data, as our methods are applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme. | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.description.sponsorship | American Statistical Association | |
dc.identifier.doi | 10.1137/1.9781611972825.30 | |
dc.identifier.isbn | 9781-6119-7232-0 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-84880227969 | |
dc.identifier.uri | https://doi.org/10.1137/1.9781611972825.30 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/7245 | |
dc.keywords | Compression | |
dc.keywords | Distance estimation | |
dc.keywords | Fourier | |
dc.keywords | KKT conditions | |
dc.keywords | Orthogonal bases | |
dc.keywords | Tine series | |
dc.keywords | Water-filling algorithm | |
dc.keywords | Wavelets | |
dc.keywords | Clustering algorithms | |
dc.keywords | Compaction | |
dc.keywords | Convex optimization | |
dc.keywords | Data mining | |
dc.keywords | Fourier series | |
dc.keywords | Distance estimation | |
dc.keywords | Fourier | |
dc.keywords | KKT condition | |
dc.keywords | Orthogonal basis | |
dc.keywords | Tine series | |
dc.keywords | Water-filling algorithm | |
dc.keywords | Wavelets | |
dc.keywords | Metadata | |
dc.language.iso | eng | |
dc.publisher | Society for Industrial and Applied Mathematics Publications | |
dc.relation.ispartof | Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012 | |
dc.subject | Engineering | |
dc.subject | Electrical and electronics engineering | |
dc.title | Optimal distance estimation between compressed data series | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.contributor.kuauthor | Kozat, Süleyman Serdar | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit2 | Department of Electrical and Electronics Engineering | |
relation.isOrgUnitOfPublication | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |