Publications without Fulltext
Permanent URI for this collection: https://hdl.handle.net/20.500.14288/3
499 results
Search Results
Publication Metadata only
Subspace-based techniques for retrieval of general 3D models (IEEE, 2009)
Sankur, Bülent; Dutaǧac, Helin; Department of Computer Engineering; Yemez, Yücel; Faculty Member; Department of Computer Engineering; College of Engineering; 107907
In this paper we investigate the potential of subspace techniques, such as PCA, ICA and NMF, in the case of indexing and retrieval of generic 3D models. We address the 3D shape alignment problem via continuous PCA and the exhaustive axis labeling and reflections. We find that the most propitious 3D distance transform leading to discriminative subspace features is the inverse distance transform. Our performance on the Princeton Shape Benchmark is on a par with the state-of-the-art methods. ©2009 IEEE.

Publication Metadata only
Enhancement of throat microphone recordings by learning phone-dependent mappings of speech spectra (Institute of Electrical and Electronics Engineers (IEEE), 2013)
N/A; Department of Computer Engineering; Turan, Mehmet Ali Tuğtekin; Erzin, Engin; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 34503
We investigate the spectral envelope mapping problem with joint analysis of throat- and acoustic-microphone recordings to enhance throat-microphone speech. A new phone-dependent GMM-based spectral envelope mapping scheme, which performs the minimum mean square error (MMSE) estimation of the acoustic-microphone spectral envelope, has been proposed. Experimental evaluations are performed to compare the proposed mapping scheme to the state-of-the-art GMM-based estimator using both objective and subjective evaluations. Objective evaluations are performed with the log-spectral distortion (LSD) and the wideband perceptual evaluation of speech quality (PESQ) metrics. Subjective evaluations are performed with the A/B pair comparison listening test.
Both objective and subjective evaluations show that the proposed phone-dependent mapping consistently improves performance over the state-of-the-art GMM estimator.

Publication Metadata only
A novel test coverage metric for concurrently-accessed software components (A work-in-progress paper) (Springer-Verlag Berlin, 2006)
N/A; Department of Computer Engineering; N/A; Department of Computer Engineering; Department of Computer Engineering; Taşıran, Serdar; Elmas, Tayfun; Bölükbaşı, Güven; Keremoğlu, M. Erkan; Faculty Member; PhD Student; Undergraduate Student; Researcher; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering; College of Engineering, College of Engineering; N/A; N/A; N/A; N/A
We propose a novel, practical coverage metric called "location pairs" (LP) for concurrently-accessed software components. The LP metric captures well common concurrency errors that lead to atomicity or refinement violations. We describe a software tool for measuring LP coverage and outline an inexpensive application of predicate abstraction and model checking for ruling out infeasible coverage targets.

Publication Metadata only
Hotspotizer: end-user authoring of mid-air gestural interactions (Association for Computing Machinery, 2014)
N/A; Department of Computer Engineering; Department of Media and Visual Arts; Baytaş, Mehmet Aydın; Yemez, Yücel; Özcan, Oğuzhan; PhD Student; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Media and Visual Arts; KU Arçelik Research Center for Creative Industries (KUAR) / KU Arçelik Yaratıcı Endüstriler Uygulama ve Araştırma Merkezi (KUAR); Graduate School of Social Sciences and Humanities; College of Engineering; College of Social Sciences and Humanities; N/A; 107907; 12532
Drawing from a user-centered design process and guidelines derived from the literature, we developed a paradigm based on space discretization for declaratively authoring mid-air gestures and
implemented it in Hotspotizer, an end-to-end toolkit for mapping custom gestures to keyboard commands. Our implementation empowers diverse user populations - including end-users without domain expertise - to develop custom gestural interfaces within minutes, for use with arbitrary applications.

Publication Metadata only
An audio-driven dancing avatar (Springer, 2008)
Balci, Koray; Kizoglu, Idil; Akarun, Lale; Canton-Ferrer, Cristian; Tilmanne, Joelle; Bozkurt, Elif; Erdem, A. Tanju; Department of Computer Engineering; N/A; N/A; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Yemez, Yücel; Ofli, Ferda; Demir, Yasemin; Erzin, Engin; Tekalp, Ahmet Murat; Faculty Member; PhD Student; Master Student; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; College of Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; 107907; N/A; N/A; 34503; 26207
We present a framework for training and synthesis of an audio-driven dancing avatar. The avatar is trained for a given musical genre using the multicamera video recordings of a dance performance. The video is analyzed to capture the time-varying posture of the dancer's body whereas the musical audio signal is processed to extract the beat information. We consider two different marker-based schemes for the motion capture problem. The first scheme uses 3D joint positions to represent the body motion whereas the second uses joint angles. Body movements of the dancer are characterized by a set of recurring semantic motion patterns, i.e., dance figures. Each dance figure is modeled in a supervised manner with a set of HMM (Hidden Markov Model) structures and the associated beat frequency.
In the synthesis phase, an audio signal of unknown musical type is first classified, within a time interval, into one of the genres that have been learnt in the analysis phase, based on mel frequency cepstral coefficients (MFCC). The motion parameters of the corresponding dance figures are then synthesized via the trained HMM structures in synchrony with the audio signal based on the estimated tempo information. Finally, the generated motion parameters, either the joint angles or the 3D joint positions of the body, are animated along with the musical audio using two different animation tools that we have developed. Experimental results demonstrate the effectiveness of the proposed framework.

Publication Metadata only
Multimodal analysis of speech prosody and upper body gestures using hidden semi-Markov models (Institute of Electrical and Electronics Engineers (IEEE), 2013)
N/A; N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Bozkurt, Elif; Asta, Shahriar; Özkul, Serkan; Yemez, Yücel; Erzin, Engin; PhD Student; PhD Student; Master Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; N/A; N/A; 107907; 34503
Gesticulation is an essential component of face-to-face communication, and it contributes significantly to the natural and affective perception of human-to-human communication. In this work we investigate a new multimodal analysis framework to model relationships between intonational and gesture phrases using the hidden semi-Markov models (HSMMs). The HSMM framework effectively associates longer duration gesture phrases to shorter duration prosody clusters, while maintaining realistic gesture phrase duration statistics.
We evaluate the multimodal analysis framework by generating speech prosody driven gesture animation, and employing both subjective and objective metrics.

Publication Metadata only
Multicamera audio-visual analysis of dance figures (IEEE, 2007)
N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Ofli, Ferda; Erzin, Engin; Yemez, Yücel; Tekalp, Ahmet Murat; PhD Student; Faculty Member; Faculty Member; Faculty Member; Department of Computer Engineering; Department of Electrical and Electronics Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; College of Engineering; N/A; 34503; 107907; 26207
We present an automated system for multicamera motion capture and audio-visual analysis of dance figures. The multiview video of a dancing actor is acquired using 8 synchronized cameras. The motion capture technique is based on 3D tracking of the markers attached to the person's body in the scene, using stereo color information without need for an explicit 3D model. The resulting set of 3D points is then used to extract the body motion features as 3D displacement vectors whereas MFC coefficients serve as the audio features. In the first stage of multimodal analysis, we perform Hidden Markov Model (HMM) based unsupervised temporal segmentation of the audio and body motion features, separately, to determine the recurrent elementary audio and body motion patterns.
Then, in the second stage, we investigate the correlation of body motion patterns with audio patterns, which can be used for estimation and synthesis of realistic audio-driven body animation.

Publication Metadata only
Audio-facial laughter detection in naturalistic dyadic conversations (Ieee-Inst Electrical Electronics Engineers Inc, 2017)
N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Department of Computer Engineering; Türker, Bekir Berker; Yemez, Yücel; Sezgin, Tevfik Metin; Erzin, Engin; PhD Student; Faculty Member; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; 107907; 18632; 34503
We address the problem of continuous laughter detection over audio-facial input streams obtained from naturalistic dyadic conversations. We first present meticulous annotation of laughters, cross-talks and environmental noise in an audio-facial database with explicit 3D facial mocap data. Using this annotated database, we rigorously investigate the utility of facial information, head movement and audio features for laughter detection. We identify a set of discriminative features using mutual information-based criteria, and show how they can be used with classifiers based on support vector machines (SVMs) and time delay neural networks (TDNNs). Informed by the analysis of the individual modalities, we propose a multimodal fusion setup for laughter detection using different classifier-feature combinations. We also effectively incorporate bagging into our classification pipeline to address the class imbalance problem caused by the scarcity of positive laughter instances. Our results indicate that a combination of TDNNs and SVMs leads to superior detection performance, and bagging effectively addresses data imbalance.
Our experiments show that our multimodal approach supported by bagging compares favorably to the state of the art in the presence of detrimental factors such as cross-talk, environmental noise, and data imbalance.

Publication Metadata only
Object placement for high bandwidth memory augmented with high capacity memory (IEEE, 2017)
N/A; N/A; Department of Computer Engineering; Laghari, Mohammad; Erten, Didem Unat; Master Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 219274
High bandwidth memory (HBM) is a new emerging technology that aims to improve the performance of bandwidth limited applications. Even though it provides high bandwidth, it must be augmented with DRAM to meet the memory capacity requirement of any application. Due to this limitation, objects in an application should be optimally placed on the heterogeneous memory subsystems. In this study, we propose an object placement algorithm that places program objects to fast or slow memories in case the capacity of fast memory is insufficient to hold all the objects to increase the overall application performance. Our algorithm uses the reference counts and type of references (read or write) to make an initial placement of data. In addition, we perform various memory bandwidth benchmarks to be used in our placement algorithm on Intel Knights Landing (KNL) architecture. Not surprisingly, high bandwidth memory sustains higher read bandwidth than write bandwidth; however, placing write-intensive data on HBM results in better overall performance because write-intensive data is punished by the DRAM speed more severely compared to read-intensive data. Moreover, our benchmarks demonstrate that if a basic block makes references to both types of memories, it performs worse than if it makes references to only one type of memory in some cases. We test our proposed placement algorithm with 6 applications under various system configurations.
By allocating objects according to our placement scheme, we are able to achieve a speedup of up to 2x.

Publication Metadata only
Fundamental frequency estimation for heterophonical Turkish music by using VMD (Institute of Electrical and Electronics Engineers (IEEE), 2016)
Simsek, Berrak Ozturk; Akan, Aydin; Department of Computer Engineering; Bozkurt, Barış; Faculty Member; Department of Computer Engineering; College of Engineering; N/A
In this study, a new method is presented for the fundamental frequency estimation of heterophonical Turkish makam music recordings that include percussive instruments by using Variational Mode Decomposition (VMD). VMD is a method to decompose an input signal into an ensemble of sub-signals (modes) which is entirely non-recursive and determines the relevant bands adaptively and estimates the corresponding modes concurrently. In order to decompose a given signal optimally, actuated by the narrow-band properties corresponding to the Intrinsic Mode Function definition used in Empirical Mode Decomposition (EMD), we seek an ensemble of modes. Simulation results on fundamental frequency estimation of real music data show comparable performance to other common decomposition methods for music signals such as YIN and MELODIA based methods.