Publication: Fast and interpretable genomic data analysis using multiple approximate kernel learning
dc.contributor.coauthor | Ak, Ciğdem | |
dc.contributor.department | Department of Industrial Engineering | |
dc.contributor.kuauthor | Gönen, Mehmet | |
dc.contributor.kuauthor | Bektaş, Ayyüce Begüm | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.other | Department of Industrial Engineering | |
dc.contributor.schoolcollegeinstitute | School of Medicine | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.yokid | 237468 | |
dc.contributor.yokid | N/A | |
dc.date.accessioned | 2024-11-09T11:43:13Z | |
dc.date.issued | 2022 | |
dc.description.abstract | Motivation: Dataset sizes in computational biology have increased drastically with the help of improved data collection tools and the increasing size of patient cohorts. Previous kernel-based machine learning algorithms proposed for increased interpretability started to fail with large sample sizes, owing to their lack of scalability. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from the genomic data while conjointly learning a model for out-of-sample prediction. It scales with increasing sample size by approximating distinct kernel matrices instead of calculating them. Results: To test our computational framework, namely Multiple Approximate Kernel Learning (MAKL), we ran experiments on three cancer datasets and showed that MAKL is capable of outperforming the baseline algorithm while using only a small fraction of the input features. We also reported selection frequencies of approximated kernel matrices associated with feature subsets (i.e. gene sets/pathways), which helps to assess their relevance for the given classification task. Our fast and interpretable MKL algorithm producing sparse solutions is promising for computational biology applications, considering its scalability and the highly correlated structure of genomic datasets, and it can be used to discover new biomarkers and new therapeutic guidelines. | |
dc.description.fulltext | YES | |
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.indexedby | PubMed | |
dc.description.issue | Sup-1 | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | TÜBİTAK | |
dc.description.sponsorship | Scientific and Technological Research Council of Turkey (TÜBİTAK) | |
dc.description.sponsorship | Turkish Academy of Sciences (TÜBA) Young Outstanding Researcher Support Programme (GEBIP) | |
dc.description.sponsorship | Science Academy of Turkey (BAGEP The Young Scientist Award Program) | |
dc.description.version | Publisher version | |
dc.description.volume | 38 | |
dc.format | ||
dc.identifier.doi | 10.1093/bioinformatics/btac241 | |
dc.identifier.eissn | 1460-2059 | |
dc.identifier.embargo | NO | |
dc.identifier.filenameinventoryno | IR03781 | |
dc.identifier.issn | 1367-4803 | |
dc.identifier.link | https://doi.org/10.1093/bioinformatics/btac241 | |
dc.identifier.quartile | Q1 | |
dc.identifier.scopus | 2-s2.0-85133882826 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/309 | |
dc.identifier.wos | 817250400014 | |
dc.keywords | Varying coefficient model | |
dc.keywords | Quantile regression | |
dc.keywords | High-dimensional | |
dc.language | English | |
dc.publisher | Oxford University Press (OUP) | |
dc.relation.grantno | EEEAG 117E181 | |
dc.relation.uri | http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/10639 | |
dc.source | Bioinformatics | |
dc.subject | Biochemical research methods | |
dc.subject | Biotechnology and applied microbiology | |
dc.subject | Computer science, interdisciplinary applications | |
dc.subject | Mathematical and computational biology | |
dc.subject | Statistics and probability | |
dc.title | Fast and interpretable genomic data analysis using multiple approximate kernel learning | |
dc.type | Journal Article | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-2483-075X | |
local.contributor.authorid | N/A | |
local.contributor.kuauthor | Gönen, Mehmet | |
local.contributor.kuauthor | Bektaş, Ayyüce Begüm | |
relation.isOrgUnitOfPublication | d6d00f52-d22d-4653-99e7-863efcd47b4a | |
relation.isOrgUnitOfPublication.latestForDiscovery | d6d00f52-d22d-4653-99e7-863efcd47b4a |