Research Outputs

Permanent URI for this community: https://hdl.handle.net/20.500.14288/2


Search Results

Now showing 1 - 10 of 18
  • Publication
    A kernel-based multilayer perceptron framework to identify pathways related to cancer stages
    (Springer International Publishing AG, 2023) Mokhtaridoost, Milad; Soleimanpoor, Marzieh; Gönen, Mehmet; Department of Industrial Engineering; Graduate School of Sciences and Engineering; College of Engineering
    Standard machine learning algorithms have limited knowledge extraction capability when discriminating cancer stages based on genomic characterizations, due to the strongly correlated nature of high-dimensional genomic data. Moreover, the activation of pathways plays a crucial role in the growth and progression of cancer from early stage to late stage. We therefore implemented a kernel-based neural network framework that integrates pathways and gene expression data using multiple kernels and discriminates early and late stages of cancers. Our goal is to identify the relevant molecular mechanisms of the biological processes that might be driving cancer progression. As the input of the developed multilayer perceptron (MLP), we constructed kernel matrices on multiple views of expression profiles of primary tumors extracted from pathways. We used the Hallmark and Pathway Interaction Database (PID) gene sets to restrict the search space to interpretable solutions. We applied our algorithm to 12 cancer cohorts from the Cancer Genome Atlas (TCGA), comprising more than 5,100 primary tumors. The results showed that our algorithm could extract meaningful and disease-specific mechanisms of cancers. We tested the predictive performance of our MLP algorithm and compared it against three existing classification algorithms, namely random forests, support vector machines, and multiple kernel learning. Our MLP method obtained better or comparable predictive performance against these algorithms.
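    As a rough illustration of the recipe above, the sketch below builds one kernel matrix per pathway view and feeds the combined kernel into a small MLP. The data, gene sets, Gaussian kernel choice, and unweighted kernel combination are all placeholder assumptions, not the paper's actual multiple-kernel architecture or TCGA inputs.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neural_network import MLPClassifier

# Synthetic stand-ins: an expression matrix (tumors x genes) and a few
# "pathways" given as gene index sets. Real runs would use TCGA profiles
# and the Hallmark / PID gene sets instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))           # 200 tumors, 500 genes
y = rng.integers(0, 2, size=200)          # 0 = early stage, 1 = late stage
pathways = [rng.choice(500, size=30, replace=False) for _ in range(5)]

# One kernel matrix per pathway view: tumor-tumor similarity restricted
# to that pathway's genes (Gaussian kernel as a common default choice).
kernels = [rbf_kernel(X[:, genes]) for genes in pathways]

# Naive unweighted kernel combination; the paper's framework instead
# learns how the pathway kernels combine inside the network.
K = np.mean(kernels, axis=0)

# Rows of the combined kernel matrix are the inputs to a small MLP.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
clf.fit(K, y)
print("training accuracy:", clf.score(K, y))
```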
  • Publication (Open Access)
    Alpha-beta-conspiracy search
    (International Computer Games Association (ICGA), 2002) McAllester, David A.; Yüret, Deniz; Department of Computer Engineering; College of Engineering
    We introduce a variant of alpha-beta search in which each node is associated with two depths rather than one. The purpose of alpha-beta search is to find strategies for each player that together establish a value for the root position. A max strategy establishes a lower bound and a min strategy establishes an upper bound. It has long been observed that forced moves should be searched more deeply. Here we make the observation that in the max strategy we are only concerned with the forcedness of max moves, and in the min strategy we are only concerned with the forcedness of min moves. This leads to two measures of depth, one for each strategy, and to a two-depth variant of alpha-beta called ABC search. The two-depth approach can be formally derived from conspiracy theory, and the structure of the ABC procedure is justified by two theorems relating ABC search and conspiracy numbers.
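    A minimal sketch of the two-depth idea on a toy game tree: each side has its own depth budget, and a forced move does not consume the mover's budget. The game, evaluation function, and depth accounting are invented for illustration and do not reproduce the paper's conspiracy-number derivation.

```python
import random

# Toy two-depth alpha-beta in the spirit of ABC search: each node carries a
# depth budget for the max strategy and one for the min strategy, and a
# forced move (a single legal reply) does not consume the mover's budget.

def legal_moves(pos):
    # Toy game: positions are tuples of moves; every third ply is forced.
    if len(pos) >= 8:
        return ()
    return (0,) if len(pos) % 3 == 2 else (0, 1, 2)

def evaluate(pos):
    # Deterministic pseudo-random leaf value for the toy game.
    return random.Random(hash(pos)).uniform(-1.0, 1.0)

def abc(pos, d_max, d_min, alpha, beta, maximizing):
    moves = legal_moves(pos)
    budget = d_max if maximizing else d_min
    if not moves or budget <= 0:
        return evaluate(pos)
    cost = 1 if len(moves) > 1 else 0     # forced moves are free to the mover
    best = float("-inf") if maximizing else float("inf")
    for m in moves:
        child = pos + (m,)
        if maximizing:
            v = abc(child, d_max - cost, d_min, alpha, beta, False)
            best, alpha = max(best, v), max(alpha, v)
        else:
            v = abc(child, d_max, d_min - cost, alpha, beta, True)
            best, beta = min(best, v), min(beta, v)
        if beta <= alpha:
            break                          # standard alpha-beta cutoff
    return best

print(abc((), 3, 3, float("-inf"), float("inf"), True))
```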
  • Publication
    Analysis and optimization on FlexDPDP: a practical solution for dynamic provable data possession
    (Springer-Verlag Berlin, 2015) Esiner, Ertem; Küpçü, Alptekin; Özkasap, Öznur; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    Security measures, such as proofs of data integrity, became more important with the increase in popularity of cloud data storage services. Dynamic Provable Data Possession (DPDP) was proposed in the literature to enable the cloud server to prove to the client that her data is kept intact, even in a dynamic setting where the client may update her files. Realizing that variable-sized updates are very inefficient in DPDP (in the worst case leading to uploading the whole file again), Flexible DPDP (FlexDPDP) was proposed. In this paper, we analyze the FlexDPDP scheme and propose optimized algorithms. We show that the initial pre-processing phase at the client and server sides during file upload (generally the most time-consuming operation) can be parallelized effectively, resulting in a speed-up of 6 with 8 cores. We propose a way of handling multiple updates at once at both the server and the client side, achieving an efficiency gain of 60% at the server side and 90% in the client's update verification time. We deployed the optimized FlexDPDP on the large-scale network testbed PlanetLab and demonstrate the efficiency of our proposed optimizations in multi-client scenarios using real workloads based on version control system traces.
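    The per-block part of such pre-processing is embarrassingly parallel, which is what makes the reported multi-core speed-up plausible. The sketch below hashes file blocks across cores; the SHA-256 tags and block size are stand-ins, since FlexDPDP's real pre-processing builds homomorphic tags and a rank-based authenticated skip list.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

BLOCK_SIZE = 4096

def tag(block: bytes) -> bytes:
    # Stand-in per-block tag; FlexDPDP uses homomorphic tags instead.
    return hashlib.sha256(block).digest()

def preprocess(data: bytes, workers: int = 8) -> list[bytes]:
    # Split the file into fixed-size blocks and tag them across cores.
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tag, blocks, chunksize=64))

if __name__ == "__main__":
    tags = preprocess(b"\x00" * (BLOCK_SIZE * 1000))
    print(len(tags), "block tags computed")
```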
  • Publication
    Application of data mining techniques to protein-protein interaction prediction
    (Springer, 2003) Atalay, R.; Kocataş, Alper Tolga; Gürsoy, Attila; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    Protein-protein interactions are key to understanding biological processes and disease mechanisms in organisms. There is a vast amount of data on proteins waiting to be explored. In this paper, we describe the application of data mining techniques, namely association rule mining and ID3 classification, to the problem of predicting protein-protein interactions. We combined available interaction data and protein domain decomposition data to infer new interactions. Preliminary results show that our approach helps us find plausible rules for understanding biological processes.
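    A minimal sketch of the domain-based inference idea: mine domain pairs that frequently co-occur across known interacting protein pairs, then predict an interaction when a candidate pair exhibits such a domain pair. The proteins, domains, and support threshold below are invented for illustration; real inputs would come from domain decompositions and an interaction database.

```python
from collections import Counter
from itertools import product

domains = {
    "P1": {"SH2", "SH3"}, "P2": {"KIN"}, "P3": {"SH2"},
    "P4": {"KIN", "PDZ"}, "P5": {"PDZ"},
}
known_interactions = [("P1", "P2"), ("P3", "P4")]

# Count how often each domain pair appears across known interactions.
pair_counts = Counter()
for a, b in known_interactions:
    for da, db in product(domains[a], domains[b]):
        pair_counts[frozenset((da, db))] += 1

# Rule: a domain pair supported by >= 2 known interactions predicts one.
rules = {p for p, c in pair_counts.items() if c >= 2}

def predict(a, b):
    return any(frozenset((da, db)) in rules
               for da, db in product(domains[a], domains[b]))

print(predict("P1", "P4"))   # True, via the (SH2, KIN) rule
```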
  • Publication
    Automated and modular refinement reasoning for concurrent programs
    (Springer International Publishing AG, 2015) Hawblitzel, Chris; Petrank, Erez; Qadeer, Shaz; Taşıran, Serdar; Department of Computer Engineering; College of Engineering
    We present CIVL, a language and verifier for concurrent programs based on automated and modular refinement reasoning. CIVL supports reasoning about a concurrent program at many levels of abstraction. Atomic actions in a high-level description are refined to fine-grained and optimized lower-level implementations. A novel combination of automata-theoretic and logic-based checks is used to verify refinement. Modular specifications and proof annotations, such as location invariants and procedure pre- and post-conditions, are specified separately and independently at each level, in terms of the variables visible at that level. We have implemented CIVL as an extension to the Boogie language and verifier. We have used CIVL to refine a realistic concurrent garbage collection algorithm from a simple high-level specification down to a highly concurrent implementation described in terms of individual memory accesses.
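    A brute-force toy of the refinement question CIVL automates, assuming a simple two-thread counter: the unlocked read/write implementation admits a lost update, so its set of outcomes is not contained in the atomic specification's. This exhaustive enumeration is only illustrative; CIVL replaces it with automata-theoretic and logic-based checks.

```python
from itertools import permutations

def interleavings(t0, t1):
    """Yield every interleaving of two threads' step sequences."""
    for picks in set(permutations([0] * len(t0) + [1] * len(t1))):
        idx, seq = [0, 0], []
        for t in picks:
            seq.append((t, (t0, t1)[t][idx[t]]))
            idx[t] += 1
        yield seq

def run(seq):
    x, tmp = 0, {}
    for tid, op in seq:
        if op == "read":
            tmp[tid] = x                 # local copy of shared x
        elif op == "write":
            x = tmp[tid] + 1             # non-atomic increment completes
        else:                            # "incr": atomic read-modify-write
            x += 1
    return x

spec = {run(s) for s in interleavings(["incr"], ["incr"])}
impl = {run(s) for s in interleavings(["read", "write"], ["read", "write"])}
print(spec, impl, impl <= spec)          # {2} {1, 2} False: refinement fails
```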
  • Publication
    Corporate network analysis based on graph learning
    (Springer International Publishing AG, 2023) Atan, Emre; Duymaz, Ali; Sarisozen, Funda; Aydin, Ugur; Koras, Murat; Akgün, Barış; Gönen, Mehmet; Department of Computer Engineering; Department of Industrial Engineering; College of Engineering
    Using our large-scale data set of money transactions, we constructed a financial network based on the relationships of the customers in our database with our other customers or with customers of other banks. This study has two main aims. Our first aim is to identify the most profitable customers by prioritizing companies in terms of centrality, based on the volume of money transfers between companies; this supports acquiring new customers, deepening relationships with existing customers, and re-activating inactive customers. Our second aim is to determine the effect that a customer's financial deterioration has on related customers in this network. While creating the network, a data set was built from money transfers between companies, and text similarity algorithms were used to match the company title in the database with the title appearing in the transfer. For companies that are not customers of our bank, information such as the IBAN number is used as a unique identifier. We showed that the average profitability of the top 30% of customers in terms of centrality is five times higher than that of the remaining customers. Moreover, the variables we created to capture the effect of financial disruptions on other customers contributed an additional 1% of Gini coefficient to the model the bank currently uses, which is notable given how difficult it is to improve a strong model that already operates with a high Gini coefficient.
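    A small sketch of the centrality-based prioritization step, assuming weighted degree (total in-plus-out transfer volume) as the centrality measure; the data is invented, and the paper's pipeline additionally handles title matching and bank-scale volumes.

```python
import networkx as nx

# Edges are money transfers weighted by volume; companies are ranked by
# weighted degree centrality and the top 30% are prioritized.
G = nx.DiGraph()
transfers = [                       # (sender, receiver, amount) -- invented
    ("A", "B", 120_000), ("B", "C", 40_000), ("A", "C", 90_000),
    ("D", "A", 300_000), ("C", "D", 15_000), ("E", "B", 5_000),
]
for src, dst, amount in transfers:
    G.add_edge(src, dst, weight=amount)

volume = {n: G.degree(n, weight="weight") for n in G.nodes}
ranked = sorted(volume, key=volume.get, reverse=True)
top = ranked[: max(1, int(0.3 * len(ranked)))]   # top 30% by centrality
print("prioritized companies:", top)
```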
  • Publication
    Dynamic verification for hybrid concurrent programming models
    (Springer International Publishing AG, 2014) Gajinov, Vladimir; Cristal, Adrian; Unsal, Osman S.; Mutlu, Erdal; Taşıran, Serdar; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering
    We present a dynamic verification technique for a class of concurrent programming models that combine dataflow and shared-memory programming. In this class of hybrid concurrency models, programs are built from tasks whose data dependencies are explicitly defined by the programmer and used by the runtime system to coordinate task execution. Unlike pure dataflow, tasks are allowed to have shared state, which must be properly protected using synchronization mechanisms such as locks or transactional memory (TM). While these hybrid models enable programmers to reason at a higher level about programs, especially those with irregular data sharing and communication patterns, they may also give rise to new kinds of bugs that are unfamiliar to programmers. We identify and illustrate a novel category of bugs in these hybrid concurrency programming models and provide a technique for randomized exploration of program behaviors in this setting.
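    The sketch below gives a toy randomized explorer for such programs: tasks declare dataflow dependencies but also race on shared state, so different dependency-respecting schedules yield different final values. The task graph and operations are invented for illustration and do not reproduce the paper's technique or target runtime.

```python
import random

tasks = {"produce": [], "consume": ["produce"], "log": ["produce"]}

def run_once(rng):
    x, done = 0, set()
    def ready():
        return [t for t in tasks
                if t not in done and all(d in done for d in tasks[t])]
    while (r := ready()):
        t = rng.choice(r)       # the runtime may pick any ready task
        if t == "produce":
            x = 1
        elif t == "consume":
            x += 1              # consume and log race on shared x
        else:                   # "log"
            x *= 2
        done.add(t)
    return x

rng = random.Random(42)
print({run_once(rng) for _ in range(100)})   # {3, 4}: schedule-dependent
```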
  • Publication
    Geolocation risk scores for credit scoring models
    (Springer Science and Business Media Deutschland GmbH, 2024) Ünal, Erdem; Aydın, Uğur; Koraş, Murat; Akgün, Barış; Gönen, Mehmet; Department of Computer Engineering; Department of Industrial Engineering; College of Engineering
    Customer location is considered one of the most informative demographic attributes for predictive modeling. It has been widely used in various sectors, including finance; commercial banks use this information in their credit scoring systems. Generally, the customer's city and district are used as demographic features. Even though these features are quite informative, they cannot fully capture the socio-economic heterogeneity of customers within cities or districts. In this study, we introduce a micro-region approach as an alternative to the district- or city-level approach. We created features based on the characteristics of micro-regions and developed predictive credit risk models. Since the models use only micro-region-specific data, we were able to apply them to all possible locations and calculate a risk score for each micro-region. We showed their positive contribution to our regular credit risk models.
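    A minimal sketch of a micro-region risk score, assuming a crude fixed-pitch coordinate grid; the paper's actual micro-region definition, features, and customer data are internal to the bank.

```python
import math
from collections import defaultdict

CELL = 0.01                                   # grid pitch in degrees

def cell(lat, lon):
    # Snap a coordinate to its micro-region grid cell.
    return (math.floor(lat / CELL), math.floor(lon / CELL))

history = [                                   # (lat, lon, defaulted) -- synthetic
    (41.0151, 28.9791, 1), (41.0162, 28.9793, 0), (41.0178, 28.9797, 1),
    (39.9201, 32.8541, 0), (39.9215, 32.8552, 0),
]

stats = defaultdict(lambda: [0, 0])           # cell -> [defaults, customers]
for lat, lon, d in history:
    s = stats[cell(lat, lon)]
    s[0] += d
    s[1] += 1

def risk_score(lat, lon):
    d, n = stats.get(cell(lat, lon), (0, 0))
    return d / n if n else None               # None: micro-region never seen

print(risk_score(41.0155, 28.9795))           # default rate of that cell
```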
  • Publication
    Learning from the users for spatio-temporal data visualization explorations on social events
    (Springer International Publishing AG, 2016) Çay, Damla; Yantaç, Asım Evren; Department of Media and Visual Arts; KU Arçelik Research Center for Creative Industries (KUAR); Graduate School of Social Sciences and Humanities; College of Social Sciences and Humanities
    The amount of volunteered geographic information is on the rise through geo-tagged data on social media. While this growth opens new paths for designers and developers to create new geographical visualizations and interactive geographic tools, it also raises new design and visualization problems. We can now turn many kinds of data into information that is useful in our daily lives. This paper explores novel visualization methods for spatio-temporal data related to what is happening in the city, planned or unplanned. We evaluate design students' work on visualizing social events in the city and share the results as design implications. We thereby contribute intuitive visualization ideas for social events, for use by interactive media designers and developers building map-based interactive tools.
  • Publication
    Line segmentation of individual demographic data from Arabic handwritten population registers of Ottoman Empire
    (Springer International Publishing AG, 2021) Can, Yekta Said; Kabadayı, Mustafa Erdem; Department of History; College of Social Sciences and Humanities
    Recently, more and more studies have applied state-of-the-art algorithms to extract information from handwritten historical documents. Line segmentation is a vital stage in handwritten text recognition (HTR) systems; it directly affects the character segmentation stage, which in turn affects recognition success. In this study, we first applied deep learning-based layout analysis techniques to detect individuals in the first Ottoman population register series, collected between the 1840s and 1860s. Then, we applied an A* path planning algorithm-based line segmentation to the demographic information of the detected individuals in these registers. We achieved encouraging results on the selected regions, which can be used to recognize the text in these registers.
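    A minimal A*-style seam tracer illustrating the path-planning idea on a synthetic image: the separating path runs left to right between two text lines and pays a heavy penalty for crossing ink. The cost function and image are invented; the paper's segmentation operates on the detected individual regions of the registers with a more elaborate setup.

```python
import heapq
import numpy as np

img = np.zeros((20, 40), dtype=np.uint8)
img[4, 2:38] = 1          # text line 1 (ink pixels = 1)
img[14, 2:38] = 1         # text line 2

def seam(img, start_row):
    h, w = img.shape
    INK = 100.0                            # cost of cutting through ink
    start = (start_row, 0)
    g = {start: 0.0}
    pq = [(w - 1.0, start, [start])]       # f = g + h, h = columns remaining
    while pq:
        _, (r, c), path = heapq.heappop(pq)
        if c == w - 1:
            return path                    # reached the right image edge
        for nr in (r - 1, r, r + 1):       # step right, optionally up/down
            if 0 <= nr < h:
                ng = g[(r, c)] + 1 + INK * img[nr, c + 1]
                if ng < g.get((nr, c + 1), float("inf")):
                    g[(nr, c + 1)] = ng
                    heapq.heappush(pq, (ng + (w - 2 - c), (nr, c + 1),
                                        path + [(nr, c + 1)]))
    return None

path = seam(img, start_row=9)              # trace between the two lines
print(all(img[r, c] == 0 for r, c in path))   # True: seam avoids all ink
```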