Publication:
Optimal distance estimation between compressed data series

dc.contributor.coauthorFreris, Nikolaos M.
dc.contributor.coauthorVlachos, Michail
dc.contributor.departmentDepartment of Electrical and Electronics Engineering
dc.contributor.kuauthorKozat, Süleyman Serdar
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.date.accessioned2024-11-09T22:53:42Z
dc.date.issued2012
dc.description.abstractMost real-world data contain repeated or periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate complete orthogonal basis (e.g., Fourier, Wavelets, Karhunen-Loeve expansion or Principal Components). In the face of ever increasing data repositories and given that most mining operations are distance-based, it is vital to perform accurate distance estimation directly on the compressed data. However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on the distance based on the available information. In particular, we consider the problem where a distinct set of coefficients is maintained for each sequence, and the L2-norm of the compression error is recorded. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest provable estimation of the L2-norm or the correlation, and executes at least two order of magnitudes faster than a numerical solution based on convex optimization. The contributions of this work extend beyond the purview of periodic data, as our methods are applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.
dc.description.indexedbyScopus
dc.description.openaccessYES
dc.description.publisherscopeInternational
dc.description.sponsoredbyTubitakEuN/A
dc.description.sponsorshipAmerican Statistical Association
dc.identifier.doi10.1137/1.9781611972825.30
dc.identifier.isbn9781-6119-7232-0
dc.identifier.quartileN/A
dc.identifier.scopus2-s2.0-84880227969
dc.identifier.urihttps://doi.org/10.1137/1.9781611972825.30
dc.identifier.urihttps://hdl.handle.net/20.500.14288/7245
dc.keywordsCompression
dc.keywordsDistance estimation
dc.keywordsFourier
dc.keywordsKKT conditions
dc.keywordsOrthogonal bases
dc.keywordsTine series
dc.keywordsWater-filling algorithm
dc.keywordsWavelets
dc.keywordsClustering algorithms
dc.keywordsCompaction
dc.keywordsConvex optimization
dc.keywordsData mining
dc.keywordsFourier series
dc.keywordsDistance estimation
dc.keywordsFourier
dc.keywordsKKT condition
dc.keywordsOrthogonal basis
dc.keywordsTine series
dc.keywordsWater-filling algorithm
dc.keywordsWavelets
dc.keywordsMetadata
dc.language.isoeng
dc.publisherSociety for Industrial and Applied Mathematics Publications
dc.relation.ispartofProceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
dc.subjectEngineering
dc.subjectElectrical and electronics engineering
dc.titleOptimal distance estimation between compressed data series
dc.typeConference Proceeding
dspace.entity.typePublication
local.contributor.kuauthorKozat, Süleyman Serdar
local.publication.orgunit1College of Engineering
local.publication.orgunit2Department of Electrical and Electronics Engineering
relation.isOrgUnitOfPublication21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isParentOrgUnitOfPublication8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files