Integrating gene set analysis and nonlinear predictive modeling of disease phenotypes using a Bayesian multitask formulation

dc.contributor.department	Department of Industrial Engineering
dc.contributor.kuauthor	Gönen, Mehmet
dc.contributor.schoolcollegeinstitute	College of Engineering
dc.date.accessioned	2024-11-09T11:58:06Z
dc.date.issued	2016
dc.description.abstract	Molecular signatures of disease phenotypes are identified using two mainstream approaches: (i) Predictive modeling methods, such as linear classification and regression algorithms, are used to find signatures predictive of phenotypes from genomic data, but the resulting signatures may not be robust because of limited sample sizes or the highly correlated nature of genomic data. (ii) Gene set analysis methods are used to find gene sets on which phenotypes are linearly dependent by bringing prior biological knowledge into the analysis, but they may not capture more complex nonlinear dependencies. Thus, formulating an integrated model of gene set analysis and nonlinear predictive modeling is of great practical importance. In this study, we propose a Bayesian binary classification framework to integrate gene set analysis and nonlinear predictive modeling. We then generalize this formulation to a multitask learning setting to model multiple related datasets jointly. Our main novelty is the probabilistic nonlinear formulation, which enables us to robustly capture nonlinear dependencies between genomic data and phenotype even with small sample sizes. We demonstrate the performance of our algorithms using repeated random subsampling validation experiments on two cancer and two tuberculosis datasets by predicting important disease phenotypes from genome-wide gene expression data. We obtain comparable or even better predictive performance than a baseline Bayesian nonlinear algorithm and identify sparse sets of relevant genes and gene sets on all datasets. We also show that our multitask learning formulation further improves generalization performance and helps to better understand the biological processes behind disease phenotypes.
dc.description.fulltext	YES
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.indexedby	PubMed
dc.description.openaccess	YES
dc.description.publisherscope	International
dc.description.sponsoredbyTubitakEu	N/A
dc.description.sponsorship	Koç University
dc.description.version	Publisher version
dc.identifier.doi	10.1186/s12859-016-1311-3
dc.identifier.embargo	NO
dc.identifier.filenameinventoryno	IR00988
dc.identifier.issn	1471-2105
dc.identifier.quartile	N/A
dc.identifier.scopus	2-s2.0-85003430034
dc.identifier.uri	https://doi.org/10.1186/s12859-016-1311-3
dc.identifier.wos	392601400001
dc.keywords	Gene set analysis
dc.keywords	Nonlinear predictive modeling
dc.keywords	Disease phenotypes
dc.keywords	Multiple kernel learning
dc.keywords	Cancer
dc.keywords	Tuberculosis
dc.language.iso	eng
dc.publisher	BioMed Central
dc.relation.ispartof	BMC Bioinformatics
dc.relation.uri	http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/1003
dc.subject	Industrial engineering
dc.subject	Phenotype
dc.title	Integrating gene set analysis and nonlinear predictive modeling of disease phenotypes using a Bayesian multitask formulation
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Gönen, Mehmet
local.publication.orgunit1	College of Engineering
local.publication.orgunit2	Department of Industrial Engineering
person.familyName	Gönen
person.givenName	Mehmet
relation.isOrgUnitOfPublication	d6d00f52-d22d-4653-99e7-863efcd47b4a
relation.isOrgUnitOfPublication.latestForDiscovery	d6d00f52-d22d-4653-99e7-863efcd47b4a
relation.isParentOrgUnitOfPublication	8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication.latestForDiscovery	8e756b23-2d4a-4ce8-b1b3-62c794a8c164

Files

Original bundle

Name: 1003.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format

Collections

Publications with Fulltext
