Publications with Fulltext

Permanent URI for this collection: https://hdl.handle.net/20.500.14288/6

Search Results

Now showing 1 - 2 of 2
  • Publication, Open Access
    Classification of drug molecules considering their IC(50) values using mixed-integer linear programming based hyper-boxes method
    (BioMed Central, 2008) Department of Industrial Engineering; Department of Chemical and Biological Engineering; Armutlu, Pelin; Özdemir, Muhittin Emre; Yüksektepe, Fadime Üney; Kavaklı, İbrahim Halil; Türkay, Metin; Faculty Member; Department of Industrial Engineering; Department of Chemical and Biological Engineering; The Center for Computational Biology and Bioinformatics (CCBB); College of Engineering; N/A; N/A; N/A; 40319; 24956
    Background: A priori analysis of the activity of drugs on a target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, a large number of computational methods predict the activity of drugs on proteins. In this study, we approach activity prediction as a classification problem and aim to improve classification accuracy by introducing an algorithm that combines partial least squares regression with a mixed-integer programming based hyper-boxes classification method, in which drug molecules are classified as low active or high active according to their binding activity (IC(50) values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. Results: We first apply our approach to widely known inhibitor datasets with known IC(50) values, including Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR), and Cyclooxygenase-2 (COX-2). The results at this stage showed that our approach consistently gives better classification accuracies than 63 other reported classification methods, such as SVM and Naive Bayes; we were able to predict the experimentally determined IC(50) values with a worst-case accuracy of 96%. To further test the applicability of this approach, we created a dataset for Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Conclusion: Our results indicate that this approach can be used to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance the drug discovery process but also save time and resources.
  • Publication, Open Access
    Fast and interpretable genomic data analysis using multiple approximate kernel learning
    (Oxford University Press (OUP), 2022) Ak, Ciğdem; Department of Industrial Engineering; Gönen, Mehmet; Bektaş, Ayyüce Begüm; Faculty Member; Department of Industrial Engineering; School of Medicine; College of Engineering; Graduate School of Sciences and Engineering; 237468; N/A
    Motivation: Dataset sizes in computational biology have increased drastically with the help of improved data collection tools and larger patient cohorts. Previous kernel-based machine learning algorithms proposed for increased interpretability started to fail with large sample sizes, owing to their lack of scalability. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from the genomic data while conjointly learning a model for out-of-sample prediction. It scales with increasing sample size by approximating distinct kernel matrices instead of calculating them. Results: To test our computational framework, Multiple Approximate Kernel Learning (MAKL), we ran experiments on three cancer datasets and showed that MAKL is capable of outperforming the baseline algorithm while using only a small fraction of the input features. We also reported selection frequencies of approximated kernel matrices associated with feature subsets (i.e. gene sets/pathways), which helps to show their relevance for the given classification task. Our fast and interpretable MKL algorithm, which produces sparse solutions, is promising for computational biology applications considering its scalability and the highly correlated structure of genomic datasets, and it can be used to discover new biomarkers and new therapeutic guidelines.
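
The second abstract's idea of approximating a kernel matrix instead of calculating it can be illustrated with a small sketch. The abstract does not say which approximation MAKL uses, so this example assumes random Fourier features for a Gaussian kernel, one common choice. The function names, the `gamma` parameter, and the toy vectors are all hypothetical and are not taken from the publication.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact Gaussian kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_random_features(dim, n_features, gamma, seed=0):
    """Return a map x -> z(x) such that z(x) . z(y) approximates rbf_kernel(x, y).

    Assumption: random Fourier features. Frequencies are drawn once from
    N(0, 2 * gamma * I) and phases from Uniform(0, 2 * pi).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def transform(x):
        # One cosine feature per random frequency/phase pair.
        return [
            scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, phases)
        ]

    return transform


if __name__ == "__main__":
    # Toy 3-dimensional "samples"; real genomic inputs would be far larger.
    x = [0.1, 0.2, 0.3]
    y = [0.4, 0.0, -0.1]
    transform = make_random_features(dim=3, n_features=2000, gamma=0.5)
    approx = sum(a * b for a, b in zip(transform(x), transform(y)))
    print("exact kernel value: ", rbf_kernel(x, y, 0.5))
    print("approximate value:  ", approx)
```

With explicit feature vectors like these, a linear model can be trained on an n-by-D feature matrix instead of an n-by-n kernel matrix, which is why this kind of approximation scales with sample size. Under the same assumption, one such feature block could be built per gene set, with a group Lasso penalty selecting among the blocks; that step is not shown here.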