Research Outputs
Permanent URI for this community: https://hdl.handle.net/20.500.14288/2
8 results
Search Results
Publication Metadata only
An investigation of new graph invariants related to the domination number of random proximity catch digraphs (Springer, 2012)
Department of Mathematics; Ceyhan, Elvan; Faculty Member; Department of Mathematics; College of Sciences; N/A
Proximity catch digraphs (PCDs) are a special type of proximity graphs based on proximity maps which yield proximity regions. PCDs are defined using the relative allocation of points from two or more classes in a region of interest and have applications in various fields. We introduce some auxiliary tools for PCDs and graph invariants related to the domination number of the PCDs and investigate their probabilistic properties. We consider the cases in which the vertices of the PCDs come from uniform and non-uniform distributions in the region of interest. We also provide some of the newly defined proximity maps as illustrative examples.

Publication Metadata only
Cell-specific and post-hoc spatial clustering tests based on nearest neighbor contingency tables (Korean Statistical Soc, 2017)
Department of Mathematics; Ceyhan, Elvan; Faculty Member; Department of Mathematics; College of Sciences; N/A
Spatial clustering patterns in a multi-class setting such as segregation and association between classes have important implications in various fields, e.g., in ecology, and can be tested using nearest neighbor contingency tables (NNCTs). An NNCT is constructed based on the types of the nearest neighbor (NN) pairs and their frequencies. We survey the cell-specific (or pairwise) and overall segregation tests based on NNCTs in the literature, introduce new ones, and determine their asymptotic distributions. We demonstrate that cell-specific tests enjoy asymptotic normality, while overall tests have chi-square distributions asymptotically. Some of the overall tests are confounded by the unstable generalized inverse of the rank-deficient covariance matrix.
To overcome this problem, we propose rank-based corrections for the overall tests to stabilize their behavior. We also perform an extensive Monte Carlo simulation study to compare the finite sample performance of the tests in terms of empirical size and power based on the asymptotic and Monte Carlo critical values and determine the tests that have the best size and power performance and are robust to differences in relative abundances (of the classes). In addition to the cell-specific tests, we discuss one(-class)-versus-rest type of tests as post-hoc tests after a significant overall test. We also introduce the concepts of total, strong, and partial segregation/association to differentiate different levels of these patterns. We compare the new tests with the existing NNCT-tests in the literature with simulations and illustrate the tests on an ecological data set. (C) 2016 The Korean Statistical Society. Published by Elsevier B.V. All rights reserved.

Publication Metadata only
Discriminating early- and late-stage cancers using multiple kernel learning on gene sets (Oxford Univ Press, 2018)
N/A; N/A; Department of Industrial Engineering; Rahimi, Arezou; Gönen, Mehmet; PhD Student; Faculty Member; Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 237468
Motivation: Identifying molecular mechanisms that drive cancers from early to late stages is highly important to develop new preventive and therapeutic strategies. Standard machine learning algorithms could be used to discriminate early- and late-stage cancers from each other using their genomic characterizations. Even though these algorithms would get satisfactory predictive performance, their knowledge extraction capability would be quite restricted due to the highly correlated nature of genomic data. That is why we need algorithms that can also extract relevant information about these biological mechanisms using our prior knowledge about pathways/gene sets.
Results: In this study, we addressed the problem of separating early- and late-stage cancers from each other using their gene expression profiles. We proposed to use a multiple kernel learning (MKL) formulation that makes use of pathways/gene sets (i) to obtain satisfactory/improved predictive performance and (ii) to identify biological mechanisms that might have an effect in cancer progression. We extensively compared our proposed MKL on gene sets algorithm against two standard machine learning algorithms, namely, random forests and support vector machines, on 20 diseases from the Cancer Genome Atlas cohorts for two different sets of experiments. Our method obtained statistically significantly better or comparable predictive performance on most of the datasets using significantly fewer gene expression features. We also showed that our algorithm was able to extract meaningful and disease-specific information that gives clues about the progression mechanism.

Publication Open Access
Distribution of maximum loss of fractional Brownian motion with drift (Elsevier, 2013)
Vardar-Acar, Ceren; Department of Mathematics; Çağlar, Mine; Faculty Member; Department of Mathematics; College of Sciences; 105131
In this paper, we find bounds on the distribution of the maximum loss of fractional Brownian motion with H >= 1/2 and derive estimates on its tail probability.
Asymptotically, the tail of the distribution of maximum loss over [0, t] behaves like the tail of the marginal distribution at time t.

Publication Metadata only
Path2Surv: Pathway/gene set-based survival analysis using multiple kernel learning (Oxford University Press (OUP), 2019)
N/A; Department of Industrial Engineering; Department of Industrial Engineering; Dereli, Onur; Oğuz, Ceyda; Gönen, Mehmet; PhD Student; Faculty Member; Faculty Member; Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; 6033; 237468
Motivation: Survival analysis methods that integrate pathways/gene sets into their learning model could identify molecular mechanisms that determine survival characteristics of patients. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that conjointly performs these two steps using multiple kernel learning.
Results: We extensively tested our Path2Surv algorithm on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained comparable predictive performance against survival support vector machine (SVM) using significantly fewer gene expression features (i.e.
less than 10% of what survival RF and survival SVM used).

Publication Metadata only
Simulation and characterization of multi-class spatial patterns from stochastic point processes of randomness, clustering and regularity (Springer, 2014)
Department of Mathematics; Ceyhan, Elvan; Faculty Member; Department of Mathematics; College of Sciences; N/A
Spatial pattern analysis of data from multiple classes (i.e., multi-class data) has important implications. We investigate the resulting patterns when classes are generated from various spatial point processes. Our null pattern is that the nearest neighbor probabilities are proportional to class frequencies in the multi-class setting. In the two-class case, the deviations are mainly in two opposite directions, namely, segregation and association of the classes. But for three or more classes, the classes might exhibit mixed patterns, in which one pair exhibits segregation while another pair exhibits association or complete spatial randomness independence. To detect deviations from the null case, we employ tests based on nearest neighbor contingency tables (NNCTs), as NNCT methods can provide an omnibus test and post-hoc tests after a significant omnibus test in a multi-class setting. In particular, for analyzing these multi-class patterns (mixed or not), we use an omnibus overall test based on NNCTs. After the overall test, the pairwise interactions are analyzed by the post-hoc cell-specific tests based on NNCTs. We propose various parameterizations of the segregation and association alternatives, list some appealing properties of these patterns, and propose three processes for the two-class association pattern. We also consider various clustering and regularity patterns to determine which one(s) cause segregation from or association with a class from a homogeneous Poisson process and from other processes as well.
We perform an extensive Monte Carlo simulation study to investigate the newly proposed association patterns and to understand which stochastic processes might result in segregation or association. The methodology is illustrated on two real-life data sets from plant ecology.

Publication Metadata only
Spatial clustering tests based on the domination number of a new random digraph family (Taylor & Francis Inc, 2011)
Department of Mathematics; Ceyhan, Elvan; Faculty Member; Department of Mathematics; College of Sciences; N/A
We use the domination number of a parametrized random digraph family called proportional-edge proximity catch digraphs (PCDs) for testing multivariate spatial point patterns. This digraph family is based on relative positions of data points from various classes. We extend the results on the distribution of the domination number of proportional-edge PCDs, and use the domination number as a statistic for testing segregation and association against complete spatial randomness. We demonstrate that the domination number of the PCD has a binomial distribution when the size of one class is fixed while the size of the other (whose points constitute the vertices of the digraph) tends to infinity, and is asymptotically normal when the sizes of both classes tend to infinity. We evaluate the finite sample performance of the test by Monte Carlo simulations and prove the consistency of the test under the alternatives. We find the optimal parameters for testing each of the segregation and association alternatives.
Furthermore, the methodology discussed in this article is also valid for data in higher dimensions.

Publication Metadata only
The distribution of the relative arc density of a family of interval catch digraph based on uniform data (Springer, 2012)
Department of Mathematics; Ceyhan, Elvan; Faculty Member; Department of Mathematics; College of Sciences; N/A
We study a family of interval catch digraphs called proportional-edge proximity catch digraphs (PCDs), which are also a special type of intersection digraph, parameterized with an expansion and a centrality parameter. PCDs are random catch digraphs that have been developed recently and have applications in classification and spatial pattern analysis. We investigate a graph invariant of the PCDs called relative arc density. We demonstrate that the relative arc density of PCDs is a U-statistic and, using the central limit theory of U-statistics, we derive the (asymptotic) distribution of the relative arc density of the proportional-edge PCD for uniform data in one dimension. We also determine the parameters for which the rate of convergence to asymptotic normality is fastest.
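
As a rough illustration of the relative arc density mentioned in the last abstract, the sketch below counts the arcs of a catch digraph on points in (0, 1) and divides by n(n - 1), the maximum number of arcs a digraph on n vertices can have. The function names and the proximity region `toy_region` (with expansion parameter `r` and centrality parameter `c`) are assumptions made for this example only and may differ from the definitions used in the paper.

```python
from typing import Callable, Sequence, Tuple

Interval = Tuple[float, float]


def relative_arc_density(
    points: Sequence[float],
    region: Callable[[float], Interval],
) -> float:
    """Number of arcs divided by n * (n - 1).

    An arc x -> y exists when y lies strictly inside the open
    interval returned by region(x); this is what makes the graph
    a "catch" digraph.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    arcs = 0
    for i, x in enumerate(points):
        lo, hi = region(x)
        for j, y in enumerate(points):
            if i != j and lo < y < hi:
                arcs += 1
    return arcs / (n * (n - 1))


def toy_region(x: float, r: float = 2.0, c: float = 0.5) -> Interval:
    """Hypothetical proximity region on (0, 1), for illustration only.

    Points left of the center c get an interval anchored at 0 and
    stretched by the expansion parameter r; points right of c get
    the mirror image anchored at 1. Both are clipped to (0, 1).
    """
    if x < c:
        return (0.0, min(1.0, r * x))
    return (max(0.0, 1.0 - r * (1.0 - x)), 1.0)


if __name__ == "__main__":
    sample = [0.2, 0.3, 0.8]
    print(relative_arc_density(sample, toy_region))
```

With the three sample points, 0.2 and 0.3 catch each other while 0.8 catches neither, so two of the six possible arcs are present.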