Publications with Fulltext

Permanent URI for this collection: https://hdl.handle.net/20.500.14288/6

Search Results

Now showing 1 - 10 of 77
  • Publication (Open Access)
    User interface paradigms for visually authoring mid-air gestures: a survey and a provocation
    (CEUR-WS, 2014) Baytaş, Mehmet Aydın; Yemez, Yücel; Özcan, Oğuzhan; Department of Media and Visual Arts; Department of Computer Engineering; College of Social Sciences and Humanities; College of Engineering
    Gesture authoring tools enable the rapid and experiential prototyping of gesture-based interfaces. We survey visual authoring tools for mid-air gestures and identify three paradigms used for representing and manipulating gesture information: graphs, visual markup languages, and timelines. We examine the strengths and limitations of these approaches and propose a novel paradigm for authoring location-based mid-air gestures based on space discretization.
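    The space-discretization idea can be illustrated with a toy sketch (not the authors' implementation; the grid resolution, bounds, and cell encoding here are our own assumptions): the interaction volume is divided into a coarse 3D grid, and a gesture becomes the sequence of grid cells the hand passes through.

    ```python
    import numpy as np

    def discretize_gesture(points, bounds, grid=(3, 3, 3)):
        """Map a sequence of 3D hand positions to grid-cell indices.

        points: (N, 3) array of positions; bounds: ((lo, hi), ...) per axis.
        Returns the sequence of visited cell ids with consecutive
        duplicates collapsed, so the gesture reads as a path of cells.
        """
        points = np.asarray(points, dtype=float)
        cells = []
        for axis in range(3):
            lo, hi = bounds[axis]
            # normalize to [0, 1), then scale to the grid resolution
            t = np.clip((points[:, axis] - lo) / (hi - lo), 0.0, 1.0 - 1e-9)
            cells.append((t * grid[axis]).astype(int))
        # flatten (x, y, z) cell coordinates into a single cell id
        ids = cells[0] * grid[1] * grid[2] + cells[1] * grid[2] + cells[2]
        path = [int(ids[0])]
        for i in ids[1:]:
            if int(i) != path[-1]:
                path.append(int(i))
        return path

    # a rightward swipe along the bottom-front edge of a unit cube
    pts = [(0.1, 0.1, 0.1), (0.5, 0.1, 0.1), (0.9, 0.1, 0.1)]
    print(discretize_gesture(pts, bounds=((0, 1), (0, 1), (0, 1))))  # → [0, 9, 18]
    ```

    Encoding gestures as cell sequences makes two gestures comparable by simple sequence matching, which is one plausible reason to discretize in the first place.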
  • Publication (Open Access)
    Engagement rewarded actor-critic with conservative Q-learning for speech-driven laughter backchannel generation
    (Association for Computing Machinery (ACM), 2021) Bayramoğlu, Öykü Zeynep; Erzin, Engin; Sezgin, Tevfik Metin; Yemez, Yücel; Department of Computer Engineering; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI) / Koç University İş Bank Artificial Intelligence Center (KUIS AI); College of Engineering; Graduate School of Sciences and Engineering
    We propose a speech-driven laughter backchannel generation model to reward engagement during human-agent interaction. We formulate the problem as a Markov decision process where the speech signal represents the state and the objective is to maximize human engagement. Since online training is often impractical in the case of human-agent interaction, we utilize existing human-to-human dyadic interaction datasets to train our agent for the backchannel generation task. We address the problem using an actor-critic method based on conservative Q-learning (CQL), which mitigates the distributional shift problem by suppressing Q-value over-estimation during training. The proposed CQL-based approach is evaluated objectively on the IEMOCAP dataset for the laughter generation task. Compared to existing off-policy Q-learning methods, we observe improved compliance with the dataset in terms of laugh generation rate. Furthermore, we show the effectiveness of the learned policy by estimating the expected engagement using off-policy policy evaluation techniques.
  • Publication (Open Access)
    Craft: a benchmark for causal reasoning about forces and interactions
    (Association for Computational Linguistics (ACL), 2022) Ateş, Tayfun; Ateşoğlu, M. Şamil; Yiğit, Çağatay; Erdem, Aykut; Göksun, Tilbe; Yüret, Deniz; Kesen, İlker; Kobaş, Mert; Department of Computer Engineering; Department of Psychology; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI) / Koç University İş Bank Artificial Intelligence Center (KUIS AI); Graduate School of Sciences and Engineering; College of Engineering; College of Social Sciences and Humanities
    Humans are able to perceive, understand, and reason about causal events. Developing models with similar physical and causal understanding capabilities is a long-standing goal of artificial intelligence. As a step in this direction, we introduce CRAFT, a new video question answering dataset that requires causal reasoning about physical forces and object interactions. It contains 58K video and question pairs generated from 10K videos from 20 different virtual environments, containing various objects in motion that interact with each other and the scene. Two question categories in CRAFT include previously studied descriptive and counterfactual questions. Additionally, inspired by the Force Dynamics Theory in cognitive linguistics, we introduce a new causal question category that involves understanding the causal interactions between objects through notions like cause, enable, and prevent. Our results show that even though the questions in CRAFT are easy for humans, the tested baseline models, including existing state-of-the-art methods, do not yet cope with the challenges posed by our benchmark.
  • Publication (Open Access)
    Kart-ON: an extensible paper programming strategy for affordable early programming education
    (Association for Computing Machinery (ACM), 2022) Sezgin, Tevfik Metin; Sabuncuoğlu, Alpay; Department of Computer Engineering; Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI) / Koç University İş Bank Artificial Intelligence Center (KUIS AI); College of Engineering; Graduate School of Sciences and Engineering
    Programming has become a core subject in primary and middle school curricula. Yet, conventional solutions for in-class programming activities require each student to have expensive equipment, which creates an opportunity gap for low-income students. Paper programming can provide an affordable, engaging, and collaborative in-class programming experience by allowing groups of students to use inexpensive materials and share smartphones. However, current paper-programming examples are limited in terms of language expressivity and generalizability. Addressing these limitations, we developed a paper-programming flow and its variants at different abstraction levels and input/output styles. The programming environments consist of pre-defined tangible programming cards and a mobile application that runs computer vision models to recognize them. This paper describes our educational and technical development process, presents a qualitative analysis of early user study results, and shares our design considerations to help develop wide-reaching paper-programming environments.
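    Once the vision model has recognized a sequence of card tokens, some interpreter must execute them. A hypothetical sketch of that last step (the card names, the REPEAT semantics, and the turtle-on-a-grid output below are our own illustration, not Kart-ON's actual card set):

    ```python
    # movement deltas for direction cards: (dx, dy) on a grid
    MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}

    def run_cards(cards, start=(0, 0)):
        """Execute a recognized card sequence; 'REPEAT n' applies the
        following card n times. Returns the final grid position."""
        x, y = start
        i = 0
        while i < len(cards):
            card = cards[i]
            if card.startswith("REPEAT"):
                times = int(card.split()[1])
                dx, dy = MOVES[cards[i + 1]]
                x, y = x + times * dx, y + times * dy
                i += 2  # consume the REPEAT card and its target card
            else:
                dx, dy = MOVES[card]
                x, y = x + dx, y + dy
                i += 1
        return (x, y)

    print(run_cards(["UP", "REPEAT 3", "RIGHT", "DOWN"]))  # → (3, 0)
    ```

    Keeping the interpreter this simple is what lets the heavy lifting (card recognition) stay on a shared smartphone while the program semantics remain transparent to students.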
  • Publication (Open Access)
    GestAnalytics: experiment and analysis tool for gesture-elicitation studies
    (Association for Computing Machinery (ACM), 2017) Buruk, Oğuz Turan; Özcan, Oğuzhan; Department of Computer Engineering; KU Arçelik Research Center for Creative Industries (KUAR) / KU Arçelik Yaratıcı Endüstriler Uygulama ve Araştırma Merkezi (KUAR); College of Engineering
    Gesture-elicitation studies are common and important for understanding user preferences. In these studies, researchers aim to extract gestures that users find desirable for different kinds of interfaces. During this process, researchers have to manually analyze many videos, which is a tiring and time-consuming process. Although current tools for video analysis provide annotation opportunities and features like automatic gesture analysis, researchers still need to (1) divide videos into meaningful pieces, (2) manually examine each piece, (3) match collected user data with these pieces, (4) code each video, and (5) verify their coding. These processes are burdensome, and current tools do not make them easier or faster. To fill this gap, we developed “GestAnalytics”, which features simultaneous video monitoring, video tagging, and filtering. Our internal pilot tests show that GestAnalytics can be a beneficial tool for researchers who practice video analysis for gestural interfaces.
  • Publication (Open Access)
    Tree-stack LSTM in transition based dependency parsing
    (Association for Computational Linguistics (ACL), 2018) Yüret, Deniz; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering
    We introduce tree-stack LSTM to model the state of a transition-based parser with recurrent neural networks. Tree-stack LSTM does not use any parse-tree-based or hand-crafted features, yet performs better than models with these features. We also develop a new set of embeddings from raw features to enhance performance. The model has four main components: the stack's σ-LSTM, the buffer's β-LSTM, the actions' LSTM, and a tree-RNN. All LSTMs take continuous dense feature vectors (embeddings) as input, and the tree-RNN updates these embeddings based on transitions. We show that our model improves performance on low-resource languages compared with its predecessors. We participated in the CoNLL 2018 UD Shared Task as the “KParse” team and ranked 16th in LAS and 15th in MLAS and BLEX among 27 participants parsing 82 test sets from 57 languages.
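    The stack/buffer machinery such a parser models can be illustrated with a minimal arc-standard transition system (a standard textbook formulation, not this paper's code; it executes gold transitions and contains none of the LSTMs):

    ```python
    def parse(n_words, transitions):
        """Apply SHIFT / LEFT / RIGHT transitions to words 1..n_words.

        SHIFT moves the next buffer word onto the stack; LEFT attaches the
        second-from-top stack word to the top; RIGHT attaches the top to
        the word below it (or to the root, id 0, if the stack empties).
        Returns a dict head[word] for every attached word.
        """
        stack, buffer = [], list(range(1, n_words + 1))
        head = {}
        for t in transitions:
            if t == "SHIFT":
                stack.append(buffer.pop(0))
            elif t == "LEFT":
                dep = stack.pop(-2)
                head[dep] = stack[-1]
            elif t == "RIGHT":
                dep = stack.pop()
                head[dep] = stack[-1] if stack else 0
        return head

    # "She ate fish": word 2 (ate) is the root, words 1 and 3 attach to it
    gold = ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT", "RIGHT"]
    print(parse(3, gold))  # → {1: 2, 3: 2, 2: 0}
    ```

    In the paper's model, the σ-LSTM, β-LSTM, and actions' LSTM would summarize exactly these three data structures (stack, buffer, transition history) to score the next transition.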
  • Publication (Open Access)
    Observation of the correlations between pairwise interaction and functional organization of the proteins, in the protein interaction network of Saccharomyces cerevisiae
    (World Academy of Science, Engineering and Technology (WASET), 2008) Haliloğlu, T.; Tunçbağ, Nurcan; Keskin, Özlem; Department of Computer Engineering; Department of Chemical and Biological Engineering; College of Engineering
    Understanding the cell's large-scale organization is an important task in computational biology, and protein-protein interactions can reveal much about the organization and function of the cell. Here, we investigated the correspondence between protein interactions and function in yeast. We computed correlations among the set of proteins and clustered them using both hierarchical clustering and biclustering methods. Proteins in each cluster were then analyzed in detail using their functional annotations. We found that some functional classes appear together in almost all biclusters, whereas in hierarchical clustering the dominance of one functional class is observed. In brief, moving from interaction data to function, we observe correlations between interaction and function that may give clues about the organization of the proteins.
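    The clustering step can be sketched on toy data (the correlations, the threshold, and this simple single-linkage routine are our own illustration, not the paper's pipeline): correlations are turned into distances and proteins merge when any cross-cluster pair is close enough.

    ```python
    import numpy as np

    def single_linkage_clusters(corr, threshold):
        """Cluster proteins whose correlation-distance (1 - corr) falls
        below `threshold` under single linkage; returns a list of clusters."""
        n = len(corr)
        dist = 1.0 - np.asarray(corr)
        clusters = [{i} for i in range(n)]
        merged = True
        while merged:
            merged = False
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    # single linkage: distance between closest members
                    d = min(dist[a, b] for a in clusters[i] for b in clusters[j])
                    if d < threshold:
                        clusters[i] |= clusters.pop(j)
                        merged = True
                        break
                if merged:
                    break
        return [sorted(c) for c in clusters]

    # toy correlations: proteins 0 and 1 strongly correlated; 2 stands apart
    corr = np.array([[1.0, 0.9, 0.1],
                     [0.9, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
    print(single_linkage_clusters(corr, threshold=0.3))  # → [[0, 1], [2]]
    ```

    The paper's functional analysis would then ask whether the annotations of proteins 0 and 1, which landed in the same cluster, agree.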
  • Publication (Open Access)
    On the importance of hidden bias and hidden entropy in representational efficiency of the Gaussian-Bipolar Restricted Boltzmann Machines
    (Elsevier, 2018) Isabekov, Altynbek; Erzin, Engin; Department of Computer Engineering; College of Engineering
    In this paper, we analyze the role of hidden bias in the representational efficiency of Gaussian-Bipolar Restricted Boltzmann Machines (GBPRBMs), which are similar to the widely used Gaussian-Bernoulli RBMs. Our experiments show that hidden bias plays an important role in shaping the probability density function of the visible units. We define hidden entropy and propose it as a measure of the representational efficiency of the model. Using this measure, we investigate the effect of hidden bias on the hidden entropy and provide a full analysis of the hidden entropy as a function of the hidden bias for small models with up to three hidden units. We also provide insight into the representational efficiency of larger-scale models. Furthermore, we introduce the Normalized Empirical Hidden Entropy (NEHE) as an alternative to hidden entropy that can be computed for large models. Experiments on the MNIST, CIFAR-10 and Faces data sets show that NEHE can serve as a measure of representational efficiency and gives insight into the minimum number of hidden units required to represent the data.
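    One plausible way to compute an empirical entropy over hidden patterns is sketched below; note the exact NEHE definition is in the paper, and this formula (pattern entropy in bits divided by the number of hidden units) is our assumption:

    ```python
    import numpy as np

    def normalized_empirical_entropy(hidden_samples):
        """hidden_samples: (n_samples, n_hidden) array of binary activations.
        Returns the empirical entropy of the observed hidden patterns, in
        bits, divided by n_hidden: 0 means a single pattern is ever used,
        1 means all 2^n_hidden patterns occur uniformly."""
        n_hidden = hidden_samples.shape[1]
        # count how often each distinct hidden pattern occurs
        patterns, counts = np.unique(hidden_samples, axis=0, return_counts=True)
        p = counts / counts.sum()
        entropy_bits = -(p * np.log2(p)).sum()
        return entropy_bits / n_hidden

    # two hidden units, all four patterns equally likely: entropy 2 bits
    h = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    print(normalized_empirical_entropy(h))  # → 1.0
    ```

    Under this reading, a model whose normalized entropy stays near zero is wasting hidden units, which matches the abstract's use of NEHE to estimate the minimum number of hidden units needed.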
  • Publication (Open Access)
    Leveraging frequency based salient spatial sound localization to improve 360 degrees video saliency prediction
    (Institute of Electrical and Electronics Engineers (IEEE), 2021) Çökelek, Mert; İmamoğlu, Nevrez; Özçınar, Çağrı; Erdem, Aykut; Department of Computer Engineering; College of Engineering
    Virtual and augmented reality (VR/AR) systems have gained dramatically in popularity, with application areas such as gaming, social media, and communication. It is therefore crucial to have the know-how to efficiently utilize, store, or deliver 360° videos to end-users. Towards this aim, researchers have been developing deep neural network models for 360° multimedia processing and computer vision tasks. In this line of work, an important research direction is to build models that can learn and predict observers' attention on 360° videos, computationally obtaining so-called saliency maps. Although a few saliency models have been proposed for this purpose, they generally consider only visual cues in video frames, neglecting audio cues from sound sources. In this study, an unsupervised frequency-based saliency model is presented for predicting the strength and location of saliency in spatial audio. The prediction of salient audio cues is then used as an audio bias on the video saliency predictions of state-of-the-art models. Our experiments yield promising results and show that integrating the proposed spatial audio bias into existing video saliency models consistently improves their performance.
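    The "audio bias" idea can be sketched as a convex combination of normalized saliency maps (a toy formulation of our own; the paper's actual fusion rule and weighting may differ):

    ```python
    import numpy as np

    def fuse_saliency(visual, audio, w=0.3):
        """Blend a visual saliency map with a spatial-audio saliency map.

        Both maps are min-max normalized to [0, 1] first; w controls how
        strongly the audio map biases the result.
        """
        def normalize(m):
            m = np.asarray(m, dtype=float)
            rng = m.max() - m.min()
            return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
        return (1 - w) * normalize(visual) + w * normalize(audio)

    visual = np.array([[0.0, 1.0], [0.2, 0.4]])   # purely visual prediction
    audio = np.array([[1.0, 0.0], [0.0, 0.0]])    # sound source at top-left
    fused = fuse_saliency(visual, audio)
    print(fused)  # top-left cell is boosted by the audio bias
    ```

    With a larger w, a strong enough sound source can move the predicted peak of attention toward its location, which is the behavior the abstract describes for localized spatial audio.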
  • Publication (Open Access)
    SParse: Koç University graph-based parsing system for the CoNLL 2018 shared task
    (Association for Computational Linguistics (ACL), 2018) Yüret, Deniz; Önder, Berkay Furkan; Gümeli, Can; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering
    We present SParse, our graph-based parsing model submitted to the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (Zeman et al., 2018). Our model extends the state-of-the-art biaffine parser (Dozat and Manning, 2016) with a structural meta-learning module, SMeta, that combines local and global label predictions. Our parser was trained and run on Universal Dependencies datasets (Nivre et al., 2016, 2018). In our official submission, it scores 87.48% LAS, 78.63% MLAS, 78.69% BLEX, and 81.76% CLAS (Nivre and Fang, 2017) on the Italian-ISDT dataset, and 72.78% LAS, 59.10% MLAS, 61.38% BLEX, and 61.72% CLAS on the Japanese-GSD dataset. All other corpora were evaluated after the submission deadline, for which we present unofficial test results.
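    The biaffine arc scorer at the core of the Dozat and Manning (2016) parser this submission extends can be sketched as follows (dimensions and random weights are toy values, and the SMeta module is not reproduced):

    ```python
    import numpy as np

    def biaffine_scores(h_dep, h_head, U, u_bias):
        """Score every (dependent, head) token pair.

        h_dep, h_head: (n, d) token representations from the encoder.
        U: (d, d) bilinear weight matrix; u_bias: (d,) head-bias vector.
        Returns an (n, n) matrix where entry [i, j] scores token j as the
        head of token i.
        """
        # bilinear term + a linear term that depends only on the head,
        # broadcast across all dependents
        return h_dep @ U @ h_head.T + h_head @ u_bias

    rng = np.random.default_rng(0)
    n, d = 4, 8                        # 4 tokens, toy representation size 8
    h = rng.normal(size=(n, d))
    scores = biaffine_scores(h, h, rng.normal(size=(d, d)), rng.normal(size=d))
    pred_heads = scores.argmax(axis=1)  # greedy head choice per token
    print(scores.shape, pred_heads.shape)
    ```

    In the full parser, head and dependent views come from separate MLPs over the encoder states, and a second biaffine classifier scores the dependency label for each chosen arc.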