Selecting rows and columns for training support vector regression models with large retail datasets

dc.contributor.department	Department of Business Administration
dc.contributor.department	Graduate School of Sciences and Engineering
dc.contributor.kuauthor	Faculty Member, Ali, Özden Gür
dc.contributor.kuauthor	Master Student, Yaman, Kübra
dc.contributor.schoolcollegeinstitute	College of Administrative Sciences and Economics
dc.contributor.schoolcollegeinstitute	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.date.accessioned	2024-11-09T22:51:48Z
dc.date.issued	2013
dc.description.abstract	Although support vector regression models are used successfully in various applications, the size of business datasets, with millions of observations and thousands of variables, makes training them difficult, if not impossible. This paper introduces the Row and Column Selection Algorithm (ROCSA) to select a small but informative dataset for training support vector regression models with standard SVM tools. ROCSA uses epsilon-SVR models with L1-norm regularization of the dual and primal variables for the row and column selection steps, respectively. The row selection step processes data chunks in parallel and selects a fraction of the original observations: those that are representative of the pattern identified in the chunk, and those that do not fit the identified pattern. The column selection step dramatically reduces the number of variables and the multicollinearity in the dataset, increasing the interpretability of the resulting models and their ease of maintenance. Evaluated on six retail datasets from two countries and a publicly available research dataset, the reduced ROCSA training data improves the predictive accuracy on average by 39% compared with the original dataset when trained with standard SVM tools. A comparison with the epsilon-SSVR method using the reduced kernel technique shows a similar performance improvement. Training a standard SVM tool with the ROCSA-selected observations improves the predictive accuracy on average by 21% compared with the practical approach of random sampling. (C) 2012 Elsevier B.V. All rights reserved.
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.issue	3
dc.description.openaccess	NO
dc.description.publisherscope	International
dc.description.sponsoredbyTubitakEu	N/A
dc.description.sponsorship	KUMPEM This work is partially supported by a KUMPEM grant. We thank the leading grocery store chain of Turkey for providing the Daily Grocery data. We thank IRI for providing the Weekly Grocery data. We thank the anonymous reviewers for their insightful comments that improved the paper significantly.
dc.description.volume	226
dc.identifier.doi	10.1016/j.ejor.2012.11.013
dc.identifier.eissn	1872-6860
dc.identifier.issn	0377-2217
dc.identifier.quartile	Q1
dc.identifier.scopus	2-s2.0-84872927704
dc.identifier.uri	https://doi.org/10.1016/j.ejor.2012.11.013
dc.identifier.uri	https://hdl.handle.net/20.500.14288/6910
dc.identifier.wos	314559900009
dc.keywords	Data mining
dc.keywords	Support vector regression
dc.keywords	Feature selection
dc.keywords	Sampling
dc.keywords	Retail
dc.keywords	Big data
dc.language.iso	eng
dc.publisher	Elsevier
dc.relation.ispartof	European Journal of Operational Research
dc.subject	Management
dc.subject	Operations research
dc.subject	Management science
dc.title	Selecting rows and columns for training support vector regression models with large retail datasets
dc.type	Journal Article
dspace.entity.type	Publication
local.contributor.kuauthor	Ali, Özden Gür
local.contributor.kuauthor	Yaman, Kübra
local.publication.orgunit1	College of Administrative Sciences and Economics
local.publication.orgunit1	GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit2	Department of Business Administration
local.publication.orgunit2	Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication	ca286af4-45fd-463c-a264-5b47d5caf520
relation.isOrgUnitOfPublication	3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication.latestForDiscovery	ca286af4-45fd-463c-a264-5b47d5caf520
relation.isParentOrgUnitOfPublication	972aa199-81e2-499f-908e-6fa3deca434a
relation.isParentOrgUnitOfPublication	434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublication.latestForDiscovery	972aa199-81e2-499f-908e-6fa3deca434a
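
Illustrative note on the row selection step described in the abstract: the sketch below is a simplified stand-in, not the ROCSA implementation. It assumes a single predictor and replaces the L1-regularized epsilon-SVR fitted on each chunk with a least-squares line through the origin. The function names, the `epsilon` tube width, and the `n_representatives` parameter are hypothetical and do not come from the paper. It only shows the general idea of processing chunks in parallel and keeping, per chunk, a few rows that follow the fitted pattern plus every row that does not.

```python
from concurrent.futures import ThreadPoolExecutor


def select_rows_in_chunk(chunk, epsilon, n_representatives):
    """Return the rows of one chunk worth keeping.

    chunk is a list of (x, y) pairs. A least-squares line through the
    origin stands in for the per-chunk epsilon-SVR model (assumption).
    """
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    slope = sxy / sxx if sxx else 0.0

    # Absolute residual of every row against the pattern fitted on the chunk.
    scored = [(abs(y - slope * x), (x, y)) for x, y in chunk]

    # Rows outside the epsilon tube do not fit the pattern: keep all of them.
    outliers = [row for residual, row in scored if residual > epsilon]

    # Rows inside the tube follow the pattern: keep only the closest few.
    inside = sorted(
        (item for item in scored if item[0] <= epsilon),
        key=lambda item: item[0],
    )
    representatives = [row for _, row in inside[:n_representatives]]
    return representatives + outliers


def select_rows(rows, chunk_size, epsilon, n_representatives):
    """Split rows into chunks, process the chunks in parallel, merge results."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor() as pool:
        kept = pool.map(
            lambda c: select_rows_in_chunk(c, epsilon, n_representatives),
            chunks,
        )
        return [row for chunk_rows in kept for row in chunk_rows]


# Toy data: one chunk follows y = 2x with a single outlier,
# the other chunk follows y = -x exactly.
rows = [(float(x), 2.0 * x) for x in range(1, 10)] + [(10.0, 50.0)]
rows += [(float(x), -1.0 * x) for x in range(1, 11)]

selected = select_rows(rows, chunk_size=10, epsilon=8.0, n_representatives=2)
print(len(rows), "rows reduced to", len(selected))
```

On the toy data, the outlier (10.0, 50.0) is retained along with two representative rows per chunk. In the method the abstract describes, the retained rows would instead be determined by the L1-regularized dual variables of the epsilon-SVR model, which this sketch does not compute.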

Collections

Publications without Fulltext
