Research Outputs
Permanent URI for this community: https://hdl.handle.net/20.500.14288/2
Search Results (9 results)
A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers
Oxford University Press (OUP), 2020
Authors: Rahimi, Arezou (PhD Student); Gönen, Mehmet (Faculty Member)
Affiliations: Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering

Motivation: Genomic information is increasingly being used in the diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways that play an important role in the progression of disease stage is of great interest. Given the similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning to study multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction.

Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machines and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with relationships reported in the relevant literature.

Correlated coalescing Brownian flows on R and the circle
IMPA - Instituto de Matemática Pura e Aplicada, 2018
Authors: Hajri, Hatem; Çağlar, Mine (Faculty Member); Karakuş, Abdullah Harun (Master Student)
Affiliations: Department of Mathematics; College of Sciences; Graduate School of Sciences and Engineering

We consider a stochastic differential equation on the real line driven by two correlated Brownian motions, $B^+$ on the positive half-line and $B^-$ on the negative half-line. We assume $|d\langle B^+, B^- \rangle_t| \le \rho\,dt$ with $\rho \in [0, 1)$. We prove that the equation has a unique flow solution. We then generalize this flow to a flow on the circle, which represents an oriented graph with two edges and two vertices, and prove that both flows are coalescing. Coalescence leads to the study of a correlated reflected Brownian motion on the quadrant. Moreover, we find the distribution of the hitting time of the origin by a reflected Brownian motion. This has implications for the effect of the correlation coefficient $\rho$ on the coalescence time of our flows.

Dandelion plot: a method for the visualization of R-mode exploratory factor analyses
Springer Heidelberg, 2014
Authors: Çene, Erhan; Sedef, Ahmet; Demir, İbrahim; Manukyan, Artur (PhD Student)
Affiliations: Graduate School of Sciences and Engineering

An important aspect of exploratory factor analysis (EFA) is discovering underlying structures in real-life problems. In particular, R-mode methods of EFA investigate the relationships between variables. Visualizing an efficient EFA model is as important as obtaining one, and a good graph of an EFA should be simple, informative and easy to interpret. Only a few visualization methods exist. In this study, we use the dandelion plot, a novel visualization method for R-mode EFA that provides a more effective representation of factors. With this method, factor variances and factor loadings can be plotted in a single window. Another strength of the method is that it represents both positive and negative factor loadings.

DeepCOP: deep learning-based approach to predict gene regulating effects of small molecules
Oxford Univ Press, 2020
Authors: Woo, Godwin; Fernandez, Michael; Hsing, Michael; Cherkasov, Artem; Lack, Nathan Alan (Faculty Member); Cavga, Ayşe Derya (PhD Student)
Affiliations: School of Medicine; Graduate School of Sciences and Engineering

Motivation: Recent advances in bioinformatics and chemogenomics are poised to accelerate the discovery of small-molecule regulators of cell development. Combining large genomic and molecular data sources with powerful deep learning techniques has the potential to revolutionize predictive biology. In this study, we present Deep gene COmpound Profiler (DeepCOP), a deep learning-based model that can predict the gene-regulating effects of low-molecular-weight compounds. This model can be used to directly identify a drug candidate that causes a desired gene expression response, without using any information on its interactions with protein target(s).

Results: We combined molecular fingerprint descriptors and gene descriptors (derived from Gene Ontology terms) to train deep neural networks that predict differential gene regulation endpoints collected in the LINCS database. We achieved 10-fold cross-validation RAUC scores of 0.80 and above, as well as enrichment factors above 5. We validated our models using an external, in-house RNA-Seq dataset describing the effect of three potent antiandrogens (with different modes of action) on gene expression in the LNCaP prostate cancer cell line. The results of this pilot study demonstrate that deep learning models can effectively integrate molecular and genomic descriptors and can be used to screen for novel drug candidates with the desired effect on gene expression. We anticipate that such models could find broad use in developing novel cancer therapeutics and could facilitate precision oncology efforts.

Discriminating early- and late-stage cancers using multiple kernel learning on gene sets
Oxford Univ Press, 2018
Authors: Rahimi, Arezou (PhD Student); Gönen, Mehmet (Faculty Member)
Affiliations: Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering

Motivation: Identifying the molecular mechanisms that drive cancers from early to late stages is highly important for developing new preventive and therapeutic strategies. Standard machine learning algorithms could be used to discriminate early- and late-stage cancers from each other using their genomic characterizations. Even though these algorithms may achieve satisfactory predictive performance, their knowledge extraction capability is quite restricted due to the highly correlated nature of genomic data. We therefore need algorithms that can also extract relevant information about these biological mechanisms using prior knowledge about pathways/gene sets.

Results: In this study, we addressed the problem of separating early- and late-stage cancers from each other using their gene expression profiles. We proposed a multiple kernel learning (MKL) formulation that makes use of pathways/gene sets (i) to obtain satisfactory/improved predictive performance and (ii) to identify biological mechanisms that might have an effect on cancer progression.
We extensively compared our proposed MKL-on-gene-sets algorithm against two standard machine learning algorithms, namely random forests and support vector machines, on 20 diseases from The Cancer Genome Atlas cohorts in two different sets of experiments. Our method obtained statistically significantly better or comparable predictive performance on most of the datasets while using significantly fewer gene expression features. We also showed that our algorithm was able to extract meaningful, disease-specific information that gives clues about the progression mechanism.

Path2Surv: Pathway/gene set-based survival analysis using multiple kernel learning
Oxford University Press (OUP), 2019
Authors: Dereli, Onur (PhD Student); Oğuz, Ceyda (Faculty Member); Gönen, Mehmet (Faculty Member)
Affiliations: Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering

Motivation: Survival analysis methods that integrate pathways/gene sets into their learning model could identify the molecular mechanisms that determine patients' survival characteristics. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that performs these two steps jointly using multiple kernel learning.

Results: We extensively tested Path2Surv on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and the gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained predictive performance comparable to survival support vector machine (SVM) while using significantly fewer gene expression features (i.e. less than 10% of what survival RF and survival SVM used).

Portfolio selection with imperfect information: a hidden Markov model
Wiley-Blackwell, 2011
Authors: Çanakoğlu, Ethem (Researcher); Özekici, Süleyman (Faculty Member)
Affiliations: Department of Industrial Engineering; Graduate School of Social Sciences and Humanities; College of Engineering

We consider a utility-based portfolio selection problem in which the parameters change according to a Markovian market that cannot be observed perfectly. The market consists of a riskless asset and many risky assets whose returns depend on the state of the unobserved market process. The states of the market describe the prevailing economic, financial, social, political or other conditions that affect the deterministic and probabilistic parameters of the model. However, investment decisions are based on the information obtained by the investors, which constitutes our observation process. There is therefore a Markovian market process whose states are unobserved, and a separate observation process whose states are observed by the investors, who use this information to determine their portfolios. The two processes are, of course, probabilistically related. The market process is a hidden Markov chain, and we use sufficient statistics to represent the state of our financial system. We solve the problem with a dynamic programming approach to obtain an explicit characterization of the optimal policy and the value function. In particular, the return-risk frontiers of the terminal wealth are shown to be linear.
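For a hidden Markov chain, a common choice of sufficient statistic is the posterior distribution of the hidden market state given the observations so far, updated period by period with a Bayes filter. The sketch below shows one such update, assuming a finite set of market states and a finite set of observable signals; `filter_step`, the transition matrix `P`, the observation matrix `Q` and the numbers are hypothetical illustrations, not the paper's notation or model.

```python
def filter_step(belief, P, Q, obs):
    """One Bayes-filter update of the posterior over hidden market states.

    Hypothetical setup (not the paper's notation):
    belief[i]: current probability that the market is in state i
    P[i][j]:   probability of moving from market state i to state j
    Q[j][k]:   probability of observing signal k when the market is in state j
    obs:       index of the signal observed in the new period
    """
    n = len(belief)
    # Predict: push the current belief through the market's transition matrix.
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight each state by the likelihood of the observed signal.
    unnormalized = [predicted[j] * Q[j][obs] for j in range(n)]
    total = sum(unnormalized)
    # Normalize so the posterior sums to one.
    return [u / total for u in unnormalized]


# Hypothetical two-state market ("calm", "turbulent") with two signals.
P = [[0.9, 0.1],
     [0.2, 0.8]]
Q = [[0.7, 0.3],
     [0.2, 0.8]]
belief = [0.5, 0.5]
belief = filter_step(belief, P, Q, obs=0)
print(belief)  # roughly [0.81, 0.19]
```

In a dynamic-programming formulation, a belief vector like this (rather than the unobservable market state itself) can serve as the state variable; the paper's exact construction may differ.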
Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces
Oxford Univ Press, 2005
Authors: Aytuna, Ali Selim (Master Student); Gürsoy, Attila (Faculty Member); Keskin, Özlem (Faculty Member)
Affiliations: Department of Computer Engineering; Department of Chemical and Biological Engineering; Graduate School of Sciences and Engineering; College of Engineering

Motivation: Elucidating the full network of protein-protein interactions is crucial for understanding the principles of biological systems and processes. Thus, there is a need for in silico methods for predicting interactions. We present a novel algorithm for the automated prediction of protein-protein interactions that employs a unique bottom-up approach combining structure and sequence conservation in protein interfaces.

Results: Running the algorithm on a template dataset of 67 interfaces and a sequentially non-redundant dataset of 6170 protein structures, we predict 62616 potential interactions. These interactions are compared with those in two publicly available interaction databases (the Database of Interacting Proteins and the Biomolecular Interaction Network Database) and in the Protein Data Bank. A significant number of predictions are verified in these databases. The unverified ones may correspond to (1) interactions that are not covered in these databases but are known in the literature, (2) unknown interactions that actually occur in nature and (3) interactions that do not occur naturally but may be realized synthetically under laboratory conditions. Some unverified interactions that are strongly supported by studies in the literature are discussed.

Testing for the stochastic dominance efficiency of a given portfolio
Wiley-Blackwell, 2014
Authors: Linton, Oliver; Whang, Yoon-Jae; Post, Gerrit Tjeerd (position listed as Other)
Affiliations: Graduate School of Business

We propose a new statistical test of the stochastic dominance efficiency of a given portfolio over a class of portfolios. We establish its asymptotic properties under the null and alternative hypotheses and define a method for consistently estimating critical values. We present numerical evidence that our tests work well in moderate-sized samples.
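The abstract does not spell out the test statistic. As a rough illustration of what stochastic dominance efficiency means, the sketch below checks, within a single sample, whether any portfolio from a finite candidate list second-order stochastically dominates a given portfolio, using the criterion that X dominates Y in the second order when E[(z - X)^+] <= E[(z - Y)^+] for every threshold z. The choice of second-order dominance, the finite candidate list, the data and all function names are assumptions; this is a descriptive sample check, not the authors' test, and it computes no critical values.

```python
def lower_partial_moment(returns, z):
    """Empirical E[(z - R)^+]: the integrated CDF of the sample at threshold z."""
    return sum(max(z - r, 0.0) for r in returns) / len(returns)


def ssd_dominates(x, y, tol=1e-12):
    """True if sample x weakly second-order dominates sample y, strictly somewhere.

    Both empirical integrated CDFs are piecewise linear with kinks at sample
    points (and their difference is constant beyond the largest point), so
    comparing them on the pooled sample values is sufficient.
    """
    grid = sorted(set(x) | set(y))
    diffs = [lower_partial_moment(x, z) - lower_partial_moment(y, z) for z in grid]
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)


def portfolio_returns(weights, asset_returns):
    """Per-period portfolio returns; asset_returns[t][k] is asset k's return in period t."""
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]


def is_ssd_efficient(tau, candidates, asset_returns):
    """Sample check: no candidate weight vector yields returns that SSD-dominate tau's."""
    base = portfolio_returns(tau, asset_returns)
    return not any(
        ssd_dominates(portfolio_returns(w, asset_returns), base) for w in candidates
    )


# Hypothetical data: asset 0 is risky, asset 1 pays a constant 2% per period.
assets = [[0.01, 0.02], [0.03, 0.02], [-0.02, 0.02]]
print(is_ssd_efficient([0.0, 1.0], [[1.0, 0.0], [0.5, 0.5]], assets))  # True
print(is_ssd_efficient([1.0, 0.0], [[0.0, 1.0]], assets))  # False
```

A sample check like this ignores sampling error; a formal test instead compares a test statistic with estimated critical values, which is what the abstract says the paper develops.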