Research Outputs

Permanent URI for this community: https://hdl.handle.net/20.500.14288/2


Metadata only
A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers
(Oxford University Press (OUP), 2020) N/A; N/A; Department of Industrial Engineering; Rahimi, Arezou; Gönen, Mehmet; PhD Student; Faculty Member; Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 237468
Motivation: Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction. Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature.
Open Access
Classification of drug molecules considering their IC(50) values using mixed-integer linear programming based hyper-boxes method
(BioMed Central, 2008) Department of Industrial Engineering; Department of Chemical and Biological Engineering; Armutlu, Pelin; Özdemir, Muhittin Emre; Yüksektepe, Fadime Üney; Kavaklı, İbrahim Halil; Türkay, Metin; Faculty Member; Department of Industrial Engineering; Department of Chemical and Biological Engineering; The Center for Computational Biology and Bioinformatics (CCBB); College of Engineering; N/A; N/A; N/A; 40319; 24956
Background: A priori analysis of the activity of drugs on the target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we approach the activity prediction problem as a classification problem, and we aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with a mixed-integer programming based hyper-boxes classification method, where drug molecules are classified as low active or high active regarding their binding activity (IC(50) values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. Results: We first apply our approach by analyzing the activities of widely known inhibitor datasets including Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR) and Cyclooxygenase-2 (COX-2) with known IC(50) values. The results at this stage proved that our approach consistently gives better classification accuracies compared to 63 other reported classification methods such as SVM and Naive Bayes, where we were able to predict the experimentally determined IC(50) values with a worst-case accuracy of 96%. To further test the applicability of this approach, we first created a dataset for Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Conclusion: Our results indicate that this approach can be utilized to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance the drug discovery process, but also save time and resources committed.
Metadata only
Discriminating early- and late-stage cancers using multiple kernel learning on gene sets
(Oxford Univ Press, 2018) N/A; N/A; Department of Industrial Engineering; Rahimi, Arezou; Gönen, Mehmet; PhD Student; Faculty Member; Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 237468
Motivation: Identifying molecular mechanisms that drive cancers from early to late stages is highly important to develop new preventive and therapeutic strategies. Standard machine learning algorithms could be used to discriminate early- and late-stage cancers from each other using their genomic characterizations. Even though these algorithms would achieve satisfactory predictive performance, their knowledge extraction capability would be quite restricted due to the highly correlated nature of genomic data. That is why we need algorithms that can also extract relevant information about these biological mechanisms using our prior knowledge about pathways/gene sets. Results: In this study, we addressed the problem of separating early- and late-stage cancers from each other using their gene expression profiles. We proposed to use a multiple kernel learning (MKL) formulation that makes use of pathways/gene sets (i) to obtain satisfactory/improved predictive performance and (ii) to identify biological mechanisms that might have an effect in cancer progression. We extensively compared our proposed MKL on gene sets algorithm against two standard machine learning algorithms, namely, random forests and support vector machines, on 20 diseases from the Cancer Genome Atlas cohorts for two different sets of experiments. Our method obtained statistically significantly better or comparable predictive performance on most of the datasets using significantly fewer gene expression features. We also showed that our algorithm was able to extract meaningful and disease-specific information that gives clues about the progression mechanism.
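As a rough illustration of the general idea behind MKL on gene sets, the sketch below builds one Gaussian kernel per gene set and combines them with non-negative weights. It is not the authors' formulation: the toy expression values, the gene set names, the kernel width and the fixed weights are all assumptions made for illustration, and a real MKL method would learn the weights jointly with the classifier.

```python
import math

def gene_set_kernel(samples, gene_indices, sigma=1.0):
    """Gaussian kernel computed only on the genes belonging to one gene set."""
    sub = [[row[g] for g in gene_indices] for row in samples]
    n = len(sub)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(sub[i], sub[j])) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; a zero weight drops that gene set."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy data: 3 tumors x 4 genes (values are made up).
expression = [
    [0.2, 1.1, 0.5, 0.0],
    [0.3, 0.9, 2.0, 1.5],
    [1.8, 0.1, 0.4, 0.2],
]
# Hypothetical gene sets given as column indices.
gene_sets = {"set_a": [0, 1], "set_b": [2, 3]}

kernels = [gene_set_kernel(expression, idx) for idx in gene_sets.values()]
weights = [0.7, 0.3]  # assumed fixed here; MKL would learn these
combined = combine_kernels(kernels, weights)
```

The combined matrix can be passed to any kernel classifier that accepts a precomputed kernel. Gene sets whose learned weight is zero contribute nothing, which is what makes the weights readable as a relevance signal.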
Open Access
Fast and interpretable genomic data analysis using multiple approximate kernel learning
(Oxford University Press (OUP), 2022) Ak, Ciğdem; Department of Industrial Engineering; Gönen, Mehmet; Bektaş, Ayyüce Begüm; Faculty Member; Department of Industrial Engineering; School of Medicine; College of Engineering; Graduate School of Sciences and Engineering; 237468; N/A
Motivation: Dataset sizes in computational biology have increased drastically with the help of improved data collection tools and the increasing size of patient cohorts. Previous kernel-based machine learning algorithms proposed for increased interpretability started to fail with large sample sizes, owing to their lack of scalability. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm to be particularly used with large-scale data that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from the genomic data while conjointly learning a model for out-of-sample prediction. It is scalable with increasing sample size by approximating instead of calculating distinct kernel matrices. Results: To test our computational framework, namely, Multiple Approximate Kernel Learning (MAKL), we ran experiments on three cancer datasets and showed that MAKL is capable of outperforming the baseline algorithm while using only a small fraction of the input features. We also reported selection frequencies of approximated kernel matrices associated with feature subsets (i.e. gene sets/pathways), which helps to see their relevance for the given classification task. Our fast and interpretable MKL algorithm producing sparse solutions is promising for computational biology applications considering its scalability and the highly correlated structure of genomic datasets, and it can be used to discover new biomarkers and new therapeutic guidelines.
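To make the phrase "approximating instead of calculating" kernel matrices concrete, the sketch below uses random Fourier features to approximate a Gaussian kernel. The abstract does not state which approximation MAKL uses, so the choice of random Fourier features, the feature count, the kernel width and the example vectors are assumptions for illustration only.

```python
import math
import random

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value between two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def make_random_features(dim, n_features, sigma, seed=0):
    """Return a feature map z such that z(x) . z(y) approximates the Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies drawn from N(0, 1/sigma^2), phases from U(0, 2*pi).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map

# Hypothetical inputs (values are made up).
x = [0.5, -1.0, 0.3]
y = [0.1, -0.4, 0.9]

feature_map = make_random_features(dim=3, n_features=2000, sigma=1.0)
approx = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
exact = gaussian_kernel(x, y, sigma=1.0)
```

With explicit features, a linear model can be trained on the mapped samples without ever forming an N x N kernel matrix, so cost grows linearly with the number of samples rather than quadratically.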
Metadata only
Modeling gene-wise dependencies improves the identification of drug response biomarkers in cancer studies
(Oxford Univ Press, 2017) Nikolova, Olga; Moser, Russell; Kemp, Christopher; Margolin, Adam A.; Department of Industrial Engineering; Gönen, Mehmet; Faculty Member; Department of Industrial Engineering; College of Engineering; 237468
Motivation: In recent years, vast advances in biomedical technologies and comprehensive sequencing have revealed the genomic landscape of common forms of human cancer in unprecedented detail. The broad heterogeneity of the disease calls for rapid development of personalized therapies. Translating the readily available genomic data into useful knowledge that can be applied in the clinic remains a challenge. Computational methods are needed to aid these efforts by robustly analyzing genome-scale data from distinct experimental platforms for prioritization of targets and treatments. Results: We propose a novel, biologically motivated, Bayesian multitask approach, which explicitly models gene-centric dependencies across multiple and distinct genomic platforms. We introduce a gene-wise prior and present a fully Bayesian formulation of a group factor analysis model. In supervised prediction applications, our multitask approach leverages similarities in response profiles of groups of drugs that are more likely to be related to true biological signal, which leads to more robust performance and improved generalization ability. We evaluate the performance of our method on molecularly characterized collections of cell lines profiled against two compound panels, namely the Cancer Cell Line Encyclopedia and the Cancer Therapeutics Response Portal. We demonstrate that accounting for the gene-centric dependencies enables leveraging information from multi-omic input data and improves prediction and feature selection performance. We further demonstrate the applicability of our method in an unsupervised dimensionality reduction application by inferring genes essential to tumorigenesis in the pancreatic ductal adenocarcinoma and lung adenocarcinoma patient cohorts from The Cancer Genome Atlas.
Metadata only
Path2Surv: Pathway/gene set-based survival analysis using multiple kernel learning
(Oxford University Press (OUP), 2019) N/A; Department of Industrial Engineering; Department of Industrial Engineering; Dereli, Onur; Oğuz, Ceyda; Gönen, Mehmet; PhD Student; Faculty Member; Faculty Member; Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; 6033; 237468
Motivation: Survival analysis methods that integrate pathways/gene sets into their learning model could identify molecular mechanisms that determine survival characteristics of patients. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that conjointly performs these two steps using multiple kernel learning. Results: We extensively tested our Path2Surv algorithm on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained comparable predictive performance against survival support vector machine (SVM) using significantly fewer gene expression features (i.e. less than 10% of what survival RF and survival SVM used).
Metadata only
Structure based drug design against biological clock related diseases
(Current Biology Ltd, 2011) N/A; Department of Chemical and Biological Engineering; Department of Industrial Engineering; Kavaklı, İbrahim Halil; Türkay, Metin; Faculty Member; Faculty Member; Department of Chemical and Biological Engineering; Department of Industrial Engineering; College of Engineering; College of Engineering; 40319; 24956
N/A