Publication:
Optimal distance bounds for fast search on compressed time-series query logs

dc.contributor.coauthorVlachos, Michail
dc.contributor.coauthorYu, Philip S.
dc.contributor.departmentDepartment of Electrical and Electronics Engineering
dc.contributor.departmentDepartment of Electrical and Electronics Engineering
dc.contributor.kuauthorKozat, Süleyman Serdar
dc.contributor.kuprofileFaculty Member
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.yokid177972
dc.date.accessioned2024-11-09T23:19:22Z
dc.date.issued2010
dc.description.abstractConsider a database of time-series, where each datapoint in the series records the total number of users who asked for a specific query at an internet search engine. Storage and analysis of such logs can be very beneficial for a search company from multiple perspectives. First, from a data organization perspective, because query Weblogs capture important trends and statistics, they can help enhance and optimize the search experience (keyword recommendation, discovery of news events). Second, Weblog data can provide an important polling mechanism for the microeconomic aspects of a search engine, since they can facilitate and promote the advertising facet of the search engine (understand what users request and when they request it). Due to the sheer amount of time-series Weblogs, manipulation of the logs in a compressed form is an impeding necessity for fast data processing and compact storage requirements. Here, we explicate how to compute the lower and upper distance bounds on the time-series logs when working directly on their compressed form. Optimal distance estimation means tighter bounds, leading to better candidate selection/elimination and ultimately faster search performance. Our derivation of the optimal distance bounds is based on the careful analysis of the problem using optimization principles. The experimental evaluation suggests a clear performance advantage of the proposed method, compared to previous compression/search techniques. The presented method results in a 10-30% improvement on distance estimations, which in turn leads to 25-80% improvement on the search performance.
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.issue2
dc.description.openaccessNO
dc.description.publisherscopeInternational
dc.description.volume4
dc.identifier.doi10.1145/1734200.1734203
dc.identifier.eissn1559-114X
dc.identifier.issn1559-1131
dc.identifier.quartileQ2
dc.identifier.scopus2-s2.0-77951928248
dc.identifier.urihttp://dx.doi.org/10.1145/1734200.1734203
dc.identifier.urihttps://hdl.handle.net/20.500.14288/10540
dc.identifier.wos277060300003
dc.keywordsAlgorithms
dc.languageEnglish
dc.publisherAssoc Computing Machinery
dc.sourceAcm Transactions on The Web
dc.subjectComputer science
dc.subjectInformation systems
dc.subjectSoftware engineering
dc.titleOptimal distance bounds for fast search on compressed time-series query logs
dc.typeJournal Article
dspace.entity.typePublication
local.contributor.authorid0000-0002-6488-3848
local.contributor.kuauthorKozat, Süleyman Serdar
relation.isOrgUnitOfPublication21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery21598063-a7c5-420d-91ba-0cc9b2db0ea0

Files