Publication:
Optimal distance bounds on time-series data

dc.contributor.coauthorVlachos, Michail
dc.contributor.coauthorYu, Philip S.
dc.contributor.departmentDepartment of Electrical and Electronics Engineering
dc.contributor.kuauthorKozat, Süleyman Serdar
dc.contributor.kuprofileFaculty Member
dc.contributor.otherDepartment of Electrical and Electronics Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.yokid177972
dc.date.accessioned2024-11-09T23:58:58Z
dc.date.issued2009
dc.description.abstractMost data mining operations include an integral search component at their core. For example, the performance of similarity search or classification based on Nearest Neighbors is largely dependent on the underlying compression and distance estimation techniques. As data repositories grow larger, there is an explicit need not only for storing the data in a compressed form, but also for facilitating mining operations directly on the compressed data. Naturally, the quality or tightness of the estimated distances on the compressed objects directly affects the search performance. We motivate our work within the setting of search engine weblog repositories, where keyword demand trends over time are represented and stored as compressed time-series data. Search and analysis over such sequence data has important applications for the search engines, including discovery of important news events, keyword recommendation and efficient keyword-to-advertisement mapping. We present new mechanisms for very fast search operations over the compressed time-series data, with specific focus on weblog data. An important contribution of this work is the derivation of optimally tight bounds on the Euclidean distance estimation between compressed sequences. Since our methodology is applicable to sequential data in general, the proposed technique is of independent interest. Additionally, our distance estimation strategy is not tied to a specific compression methodology, but can be applied on top of any orthonormal based compression technique (Fourier, Wavelet, PCA, etc). The experimental results indicate that the new optimal bounds lead to a significant improvement in the pruning power of search compared to previous state-of-the-art, in many cases eliminating more than 80% of the candidate search sequences.
dc.description.indexedbyScopus
dc.description.openaccessYES
dc.description.publisherscopeInternational
dc.description.volume1
dc.identifier.doiN/A
dc.identifier.isbn9781-6156-7109-0
dc.identifier.linkhttps://www.scopus.com/inward/record.uri?eid=2-s2.0-72849126859andpartnerID=40andmd5=4ebdbc0ca716bf118d930ce67ca49c88
dc.identifier.quartileN/A
dc.identifier.scopus2-s2.0-72849126859
dc.identifier.uriN/A
dc.identifier.urihttps://hdl.handle.net/20.500.14288/15560
dc.keywordsCompression techniques
dc.keywordsData repositories
dc.keywordsDistance bound
dc.keywordsDistance estimation
dc.keywordsEuclidean distance
dc.keywordsFast search
dc.keywordsFourier
dc.keywordsMining operations
dc.keywordsNearest neighbors
dc.keywordsNew mechanisms
dc.keywordsON time
dc.keywordsOptimal bounds
dc.keywordsOrthonormal
dc.keywordsSearch performance
dc.keywordsSequence data
dc.keywordsSequential data
dc.keywordsSimilarity search
dc.keywordsTight bound
dc.keywordsTime-series data
dc.keywordsEstimation
dc.keywordsInformation retrieval
dc.keywordsMining
dc.keywordsSearch engines
dc.keywordsWorld Wide Web
dc.keywordsData handling
dc.languageEnglish
dc.publisherSociety for Industrial and Applied Mathematics
dc.sourceSociety for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
dc.subjectEngineering
dc.subjectElectrical and electronics engineering
dc.titleOptimal distance bounds on time-series data
dc.typeConference proceeding
dspace.entity.typePublication
local.contributor.authorid0000-0002-6488-3848
local.contributor.kuauthorKozat, Süleyman Serdar
relation.isOrgUnitOfPublication21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery21598063-a7c5-420d-91ba-0cc9b2db0ea0

Files