Research Outputs

Permanent URI for this community: https://hdl.handle.net/20.500.14288/2

Search Results

Now showing 1 - 10 of 22
  • Publication
    Adaptive classifier cascade for multimodal speaker identification
    (International Speech Communication Association, 2004) Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel (Faculty Members, Department of Electrical and Electronics Engineering and Department of Computer Engineering, College of Engineering)
    We present a multimodal open-set speaker identification system that integrates information coming from the audio, face, and lip motion modalities. For the fusion of multiple modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers. The order of the classifiers in the cascade is adaptively determined based on the reliability of each modality combination. A novel reliability measure that genuinely fits the open-set speaker identification problem is also proposed to assess the accept/reject decisions of a classifier. The proposed adaptive rule is more robust in the presence of unreliable modalities, and outperforms the hard-level max rule and the soft-level weighted summation rule, provided that the employed reliability measure is effective in assessing classifier decisions. Experimental results that support this assertion are provided.
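The cascade idea described above can be sketched as follows. This is an illustrative reading, not the paper's exact formulation: `cascade_decision`, the tuple layout, and the accept threshold are all assumptions; the cascade simply tries modality combinations in decreasing order of estimated reliability and stops at the first confident accept.

```python
# Hedged sketch of a reliability-ordered classifier cascade for open-set
# identification. Names, scores, and the threshold are illustrative.

def cascade_decision(classifier_outputs, accept_thr=0.7):
    """classifier_outputs: list of (reliability, best_id, score) tuples,
    one per modality combination. Returns an accepted speaker id, or
    None for an open-set reject."""
    # Adaptive ordering: most reliable modality combination goes first.
    ordered = sorted(classifier_outputs, key=lambda t: t[0], reverse=True)
    for reliability, best_id, score in ordered:
        if score >= accept_thr:   # confident accept: stop the cascade
            return best_id
    return None                   # no classifier accepted: reject

decision = cascade_decision([
    (0.9, "spk3", 0.85),   # reliable audio: confident accept
    (0.4, "spk7", 0.50),   # noisy lip motion: never reached
])
```

If the most reliable combination is confident, the remaining (possibly unreliable) classifiers are never consulted, which is what makes the cascade robust to unreliable modalities.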
  • Publication
    Adaptive streaming of multiview video over P2P networks
    (Wiley, 2013) Gürler, Cihat Göktuğ (PhD Student, Graduate School of Sciences and Engineering); Tekalp, Ahmet Murat (Faculty Member, Department of Electrical and Electronics Engineering, College of Engineering)
    Three-dimensional (3D) video is the next natural step in the evolution of digital media technologies. Recent 3D auto-stereoscopic displays can display multiview video with up to 200 views. While it is possible to broadcast 3D stereo video (two views) over digital TV platforms today, streaming over IP provides a more flexible approach for distributing stereo and free-view 3D media to home and mobile users with different connection bandwidths and different 3D displays. Here, flexible transport refers to quality-scalable and view-scalable transport over the Internet. These scalability options are the key to dealing with the biggest challenge in the delivery of multiview video: the scarcity of bandwidth in IP networks. However, even with the scalability options at hand, it is quite possible that the bandwidth requirement on the sender side reaches critical levels and renders such a service infeasible. Peer-to-peer (P2P) video streaming is a promising approach that has received significant attention recently and can be used to alleviate the problem of bandwidth scarcity in server-client-based applications. Unfortunately, P2P also introduces new challenges, such as handling unstable peer connections and peers' limited upload capacity. In this chapter, we provide an adaptive P2P video streaming solution that addresses these challenges for streaming multiview video over P2P overlays. We start by reviewing fundamental video transmission concepts and the state-of-the-art P2P video streaming solutions. We then look beyond the state of the art and introduce methods for enabling adaptive video streaming over P2P networks to distribute legacy monoscopic video. Finally, we move on to the modifications that are needed to deliver multiview video in an adaptive manner over the Internet. We provide benchmark test results against state-of-the-art P2P video streaming solutions to demonstrate the superiority of the proposed approach in adaptive video transmission.
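The quality/view-scalable adaptation the chapter builds on can be sketched with a toy rate-selection rule: given the throughput currently measured from peers, subscribe to the longest prefix of layers (base plus enhancements) that fits. The function name, layer bitrates, and the prefix rule are illustrative assumptions, not the chapter's actual adaptation logic.

```python
# Hedged sketch of rate adaptation with layered (scalable) video over a
# bandwidth-limited link. Layer bitrates are hypothetical.

def select_layers(layer_kbps, available_kbps):
    """layer_kbps: bitrates of base + enhancement layers in decode order.
    Returns how many layers to request (the base layer is always tried)."""
    chosen, used = 0, 0
    for rate in layer_kbps:
        if used + rate > available_kbps:
            break                 # next enhancement no longer fits
        used += rate
        chosen += 1
    return max(chosen, 1)         # always attempt at least the base layer

layers = [400, 300, 300, 500]     # base + 3 enhancements, in kbit/s
n = select_layers(layers, available_kbps=1100)   # base + 2 enhancements
```

Dropping enhancement layers first (quality or extra views) while protecting the base layer is exactly why scalable coding suits unstable P2P upload capacity.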
  • Publication
    Analysis and synthesis of multiview audio-visual dance figures
    (IEEE, 2008) Canton-Ferrer, C.; Tilmanne, J.; Balcı, K.; Bozkurt, E.; Kızoǧlu, İ.; Akarun, L.; Erdem, A. T.; Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel (Faculty Members, Department of Electrical and Electronics Engineering and Department of Computer Engineering, College of Engineering); Ofli, Ferda (PhD Student); Demir, Yasemin (Master Student, Graduate School of Sciences and Engineering)
    This paper presents a framework for audio-driven human body motion analysis and synthesis. The video is analyzed to capture the time-varying posture of the dancer's body whereas the musical audio signal is processed to extract the beat information. The human body posture is extracted from multiview video information without any human intervention using a novel marker-based algorithm based on annealing particle filtering. Body movements of the dancer are characterized by a set of recurring semantic motion patterns, i.e., dance figures. Each dance figure is modeled in a supervised manner with a set of HMM (Hidden Markov Model) structures and the associated beat frequency. In synthesis, given an audio signal of a learned musical type, the motion parameters of the corresponding dance figures are synthesized via the trained HMM structures in synchrony with the input audio signal based on the estimated tempo information. Finally, the generated motion parameters are animated along with the musical audio using a graphical animation tool. Experimental results demonstrate the effectiveness of the proposed framework.
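The beat-analysis step mentioned above (extracting beat/tempo information from the musical audio to drive synthesis) can be illustrated with a simple autocorrelation-based tempo estimator over an onset-strength envelope. This is a generic stand-in, not the paper's beat tracker; the frame rate, BPM range, and the synthetic envelope are assumptions.

```python
import numpy as np

# Hedged sketch: estimate tempo from an onset-strength envelope by
# picking the strongest autocorrelation lag within a plausible BPM range.

def estimate_tempo(onset_env, frame_rate, bpm_range=(60, 180)):
    """onset_env: 1-D onset-strength signal at `frame_rate` frames/s.
    Returns the estimated tempo in beats per minute."""
    env = onset_env - onset_env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # lags >= 0
    lo = int(round(frame_rate * 60.0 / bpm_range[1]))  # shortest period
    hi = int(round(frame_rate * 60.0 / bpm_range[0]))  # longest period
    lag = lo + int(np.argmax(ac[lo:hi + 1]))           # best beat period
    return 60.0 * frame_rate / lag

fps = 100                          # envelope frames per second (assumed)
env = np.zeros(1000)
env[::50] = 1.0                    # a synthetic onset every 0.5 s
tempo = estimate_tempo(env, fps)   # periodic onsets at 0.5 s -> 120 BPM
```

In the paper's pipeline an estimate like this would gate the HMM-based figure synthesis so generated motion stays in synchrony with the music.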
  • Publication
    Analysis of distributed algorithms for density estimation in VANETs (poster)
    (IEEE-Inst Electrical Electronics Engineers Inc, 2012) Özkasap, Öznur; Ergen, Sinem Çöleri (Faculty Members, Department of Computer Engineering and Department of Electrical and Electronics Engineering, College of Engineering); Akhtar, Nabeel (Master Student, Graduate School of Sciences and Engineering)
    Vehicle density is an important system metric used in monitoring road traffic conditions. Most existing methods for vehicular density estimation require either building an infrastructure, such as pressure pads, inductive loop detectors, roadside radar, cameras, and wireless sensors, or using a centralized approach based on counting the number of vehicles in a particular geographical location via clustering or grouping mechanisms. These techniques, however, suffer from low reliability and limited coverage as well as high deployment and maintenance costs. In this paper, we propose fully distributed and infrastructure-free mechanisms for density estimation in vehicular ad hoc networks. Unlike previous distributed approaches, which rely either on group formation or on vehicle flow and speed information to calculate density, our study is inspired by the mechanisms proposed for system size estimation in peer-to-peer networks. We adapted and implemented three fully distributed algorithms, namely Sample & Collide, Hop Sampling, and Gossip-based Aggregation. Extensive simulations of these algorithms at different vehicle traffic densities and area sizes, for both highways and urban areas, reveal that Hop Sampling provides the highest accuracy in the least convergence time and introduces the least overhead on the network, but at the cost of a higher load on the initiator node.
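Of the three algorithms named above, Gossip-based Aggregation is the easiest to sketch: one initiator starts with value 1 and everyone else with 0; repeated pairwise averaging drives every node toward 1/N, so each node estimates the system size as the reciprocal of its value. The topology-free pairing, round count, and seed below are toy simulation choices, not the paper's setup.

```python
import random

# Hedged sketch of Gossip-based Aggregation for system-size (density)
# estimation: pairwise averaging converges every value to 1/N.

def gossip_size_estimate(n_nodes, rounds=2000, seed=7):
    rng = random.Random(seed)
    values = [1.0] + [0.0] * (n_nodes - 1)   # initiator holds mass 1
    for _ in range(rounds):
        i, j = rng.randrange(n_nodes), rng.randrange(n_nodes)
        avg = (values[i] + values[j]) / 2.0  # one pairwise averaging step
        values[i] = values[j] = avg          # total mass is conserved
    return [1.0 / v for v in values]         # per-node size estimates

estimates = gossip_size_estimate(n_nodes=25)  # every node converges to ~25
```

Because the total mass stays 1 throughout, convergence of all values to a common number forces that number to be 1/N, which is what makes the estimate infrastructure-free.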
  • Publication
    Comparative lip motion analysis for speaker identification
    (Institute of Electrical and Electronics Engineers (IEEE), 2005) Yemez, Yücel; Erzin, Engin; Tekalp, Ahmet Murat (Faculty Members, Department of Computer Engineering and Department of Electrical and Electronics Engineering, College of Engineering); Çetingül, Hasan Ertan (Master Student, Graduate School of Sciences and Engineering)
    The aim of this work is to determine the best lip analysis system, and thus the most accurate lip motion features, for the audio-visual open-set speaker identification problem. Based on different analysis points on the lip region, two alternatives for the initial lip motion representation are considered. In the first alternative, the feature vector is composed of the 2D-DCT coefficients of the motion vectors estimated within the rectangular mouth region, whereas in the second, the outer lip boundaries are tracked over the video frames and only the motion vectors around the lip contour are taken into account, along with the shape of the lip boundary. A further comparison is performed between optical flow and block-matching motion estimation methods to find the best model for lip movement. The dimension of the resulting lip feature vector is then reduced by a two-stage discrimination method that selects the most discriminative lip features. An HMM-based identification system is used for the performance comparison of these motion representations. It is observed that the lower-dimensional feature vector computed by block-matching within a rectangular grid in the lip region maximizes the identification performance.
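The first feature alternative (2D-DCT coefficients of the motion field over the rectangular mouth region) can be sketched as follows. The grid size, the kept-coefficient count, and the helper names are illustrative assumptions; only the "2-D DCT, keep the low-frequency block" idea comes from the abstract.

```python
import numpy as np

# Hedged sketch: 2-D DCT of a dense motion field over a rectangular
# mouth region, keeping the low-frequency (top-left) coefficients.

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)        # DC row gets the 1/sqrt(n) scale
    return m

def lip_motion_features(motion_field, keep=4):
    """motion_field: (H, W) array, e.g. one component of the motion
    vectors in the mouth region. Returns the keep x keep low-frequency
    2D-DCT block, flattened."""
    h, w = motion_field.shape
    coeffs = dct_matrix(h) @ motion_field @ dct_matrix(w).T
    return coeffs[:keep, :keep].ravel()

field = np.ones((8, 8))                    # constant motion field
feat = lip_motion_features(field, keep=2)  # only the DC term is nonzero
```

A constant field concentrates all energy in the DC coefficient, which is why truncating to the low-frequency block compacts smooth motion patterns into few features.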
  • Publication
    Evaluation of audio features for the audio-visual analysis of dance figures
    (IEEE, 2008) Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel (Faculty Members, Department of Electrical and Electronics Engineering and Department of Computer Engineering, College of Engineering); Ofli, Ferda (PhD Student); Demir, Yasemin (Master Student, Graduate School of Sciences and Engineering)
    We present a framework for selecting the best audio features for the audiovisual analysis and synthesis of dance figures. Dance figures are performed synchronously with the musical rhythm. They can be analyzed through the audio spectra using spectral and rhythmic musical features. In the proposed audio feature evaluation system, dance figures are manually labeled over the video stream. The music segments that correspond to labeled dance figures are used to train hidden Markov model (HMM) structures to learn temporal spectrum patterns for the dance figures. The dance figure recognition performance of the HMM models is evaluated for various spectral feature sets. The audio features that maximize dance figure recognition performance are selected as the best audio features for the analyzed audiovisual dance recordings. In our evaluations, mel-frequency cepstral coefficients (MFCC) with their first and second derivatives, spectral centroid, spectral flux, and spectral roll-off are used as candidate audio features. Selection of the best audio features can be used towards the analysis and synthesis of audio-driven body animation.
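Three of the candidate features named above have short textbook definitions, sketched here from a magnitude spectrum. The toy spectrum, frame handling, and the 0.85 roll-off fraction are common defaults assumed for illustration, not necessarily the paper's settings.

```python
import numpy as np

# Hedged sketch of three candidate spectral features: centroid, flux,
# and roll-off, computed from per-frame magnitude spectra.

def spectral_centroid(mag, freqs):
    """Magnitude-weighted mean frequency (the spectrum's center of mass)."""
    return float(np.sum(freqs * mag) / np.sum(mag))

def spectral_flux(mag_prev, mag_cur):
    """Euclidean change between consecutive magnitude spectra."""
    return float(np.sqrt(np.sum((mag_cur - mag_prev) ** 2)))

def spectral_rolloff(mag, freqs, fraction=0.85):
    """Frequency below which `fraction` of the spectral energy lies."""
    cumulative = np.cumsum(mag)
    idx = int(np.searchsorted(cumulative, fraction * cumulative[-1]))
    return float(freqs[idx])

freqs = np.array([0.0, 100.0, 200.0, 300.0])
mag = np.array([0.0, 1.0, 1.0, 0.0])        # energy at 100 and 200 Hz
centroid = spectral_centroid(mag, freqs)    # midpoint: 150 Hz
```

Features like these, stacked per frame, are what the HMMs consume as temporal spectrum patterns.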
  • Publication
    Detection of stride time and stance phase ratio from accelerometer data for gait analysis
    (Institute of Electrical and Electronics Engineers Inc., 2022) Vural, Atay (Faculty Member, School of Medicine); Erzin, Engin (Faculty Member, Department of Computer Engineering, College of Engineering); Akar, Kardelen (Master Student, Graduate School of Sciences and Engineering); Tokmak, Fadime; Köprücü, Nursena; Emirdağı, Ahmet Rasim (Students, College of Engineering); Koç University Research Center for Translational Medicine (KUTTAM) / Koç Üniversitesi Translasyonel Tıp Araştırma Merkezi (KUTTAM)
    Stride time and stance phase ratio are supportive biomarkers used in the diagnosis and treatment of gait disorders and are currently in frequent use in research studies. In this study, the 3-axis accelerometer signal taken from the foot was denoised with a low-pass FIR (finite impulse response) filter. Fundamental frequency analysis was used to find the dominant frequency, which in turn determined an optimal length for a window to be shifted across the whole signal. The turning region was extracted using the Pearson correlation coefficient between the selected window and the overlapping segments obtained by shifting it over the whole signal. After obtaining the walking segments, the stride time parameter was calculated with a simple peak-picking algorithm. The stance and swing periods of the pseudo-steps, which emerged as a result of the stride (double step) time calculation algorithm, were found with the dynamic time warping method, and the ratio of the stance phase to the whole step was calculated as a percentage. The results were compared with those of the APDM system; the mean absolute error was 0.029 s for the stride time and 0.0084 for the stance phase ratio.
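Two steps of this pipeline, finding the dominant gait frequency and picking peaks spaced by that period to get stride times, can be sketched as below. The sampling rate, the 0.6-period minimum peak spacing, and the synthetic sine "gait" are assumptions; FIR denoising, turning-region removal, and the DTW stance/swing split are omitted.

```python
import numpy as np

# Hedged sketch: dominant gait frequency via FFT, then stride times via
# simple peak picking with a minimum spacing derived from that frequency.

def dominant_frequency(signal, fs):
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[int(np.argmax(spectrum))]

def stride_times(signal, fs):
    f0 = dominant_frequency(signal, fs)
    min_gap = int(0.6 * fs / f0)      # refuse peaks closer than ~0.6 period
    peaks, last = [], -min_gap
    for i in range(1, len(signal) - 1):
        if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]:
            if i - last >= min_gap:   # keep only well-separated peaks
                peaks.append(i)
                last = i
    return np.diff(peaks) / fs        # stride times in seconds

fs = 100.0                            # Hz, assumed sampling rate
t = np.arange(0, 10, 1 / fs)
accel = np.sin(2 * np.pi * 1.0 * t)   # synthetic 1 Hz "gait" signal
strides = stride_times(accel, fs)     # ~1.0 s per stride
```

Tying the minimum peak spacing to the dominant frequency is what makes naive peak picking robust to small secondary bumps within a stride.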
  • Publication
    Discriminative lip-motion features for biometric speaker identification
    (IEEE, 2004) Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel (Faculty Members, Department of Electrical and Electronics Engineering and Department of Computer Engineering, College of Engineering); Çetingül, Hasan Ertan (Master Student, Graduate School of Sciences and Engineering)
    This paper addresses the selection of the best lip motion features for biometric open-set speaker identification. The best features are those that result in the highest discrimination between individual speakers in a population. We first detect the face region in each video frame. The lip region for each frame is then segmented following registration of successive face regions by global motion compensation. The initial lip feature vector is composed of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. The discriminant analysis is composed of two stages. In the first stage, the most discriminative features are selected from the full set of DCT coefficients of a single lip motion frame using a probabilistic measure that maximizes the ratio of intra-class and inter-class probabilities. In the second stage, the resulting discriminative feature vectors are interpolated and concatenated for each time instant within a neighborhood, and further analyzed by LDA to reduce dimensionality, this time taking temporal discrimination information into account. Experimental results of the HMM-based speaker identification system are included to demonstrate the performance.
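The first-stage selection idea can be illustrated with a Fisher-style between-class/within-class variance ratio per feature, a common criterion standing in here for the paper's probabilistic measure (which the abstract does not spell out). The data, scoring function, and top-k rule are illustrative; the second LDA stage is omitted.

```python
import numpy as np

# Hedged sketch: score each coefficient by between-class vs within-class
# variance and keep the top-k most discriminative features.

def fisher_scores(X, y):
    """X: (n_samples, n_features), y: integer class (speaker) labels.
    Higher score = feature separates the classes better."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)   # guard against zero variance

def select_top_k(X, y, k):
    return np.argsort(fisher_scores(X, y))[::-1][:k]

# Feature 0 separates the two speakers; feature 1 is pure noise.
X = np.array([[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [1.1, -5.1]])
y = np.array([0, 0, 1, 1])
best = select_top_k(X, y, k=1)          # picks feature 0
```

Per-feature scoring like this is cheap enough to run on the full DCT coefficient set before the costlier temporal LDA stage.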
  • Publication
    Distributed QoS architectures for multimedia streaming over software defined networks
    (Institute of Electrical and Electronics Engineers (IEEE), 2014) Eğilmez, Hilmi E.; Tekalp, Ahmet Murat (Faculty Member, Department of Electrical and Electronics Engineering, College of Engineering)
    This paper presents novel QoS extensions to distributed control plane architectures for multimedia delivery over large-scale, multi-operator Software Defined Networks (SDNs). We foresee that large-scale SDNs will be managed by a distributed control plane consisting of multiple controllers, where each controller performs optimal QoS routing within its domain and shares summarized (aggregated) QoS routing information with other domain controllers to enable inter-domain QoS routing with reduced problem dimensionality. To this effect, this paper proposes (i) topology aggregation and link summarization methods to efficiently acquire network topology and state information, (ii) a general optimization framework for flow-based end-to-end QoS provision over multi-domain networks, and (iii) two distributed control plane designs that address the messaging between controllers for scalable and secure inter-domain QoS routing. We apply these extensions to the streaming of layered videos and compare the performance of different control planes in terms of received video quality, communication cost, and memory overhead. Our experimental results show that the proposed distributed solution closely approaches the global optimum (with full network state information) and scales well to large networks.
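The per-domain QoS routing subproblem behind this architecture can be sketched as a delay-constrained least-cost path search: among all paths from source to destination, pick the cheapest one whose end-to-end delay stays within a bound. Exhaustive search over a toy topology is used below purely for illustration; the paper's optimization framework is far more general, and the graph and bounds are assumptions.

```python
# Hedged sketch of delay-constrained least-cost routing on a tiny graph.

def qos_route(graph, src, dst, delay_bound):
    """graph: {node: [(neighbor, cost, delay), ...]}.
    Returns (path, cost) of the cheapest delay-feasible path, or None."""
    best = None

    def dfs(node, path, cost, delay):
        nonlocal best
        if delay > delay_bound:
            return                      # prune: delay budget exceeded
        if node == dst:
            if best is None or cost < best[1]:
                best = (path, cost)
            return
        for nxt, c, d in graph.get(node, []):
            if nxt not in path:         # simple paths only
                dfs(nxt, path + [nxt], cost + c, delay + d)

    dfs(src, [src], 0, 0)
    return best

graph = {
    "A": [("B", 1, 10), ("C", 5, 2)],
    "B": [("D", 1, 10)],
    "C": [("D", 5, 2)],
}
route = qos_route(graph, "A", "D", delay_bound=10)  # cheap path is too slow
```

With a tight delay bound the router must take the expensive low-delay path A-C-D; relaxing the bound lets it fall back to the cheap path, which is the trade-off QoS routing optimizes.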
  • Publication
    Dynamic management of control plane performance in software-defined networks
    (Institute of Electrical and Electronics Engineers (IEEE), 2016) Görkemli, Burak; Parlakışık, A. Murat; Civanlar, Seyhan; Ulaş, Aydın; Tekalp, Ahmet Murat (Faculty Member, Department of Electrical and Electronics Engineering, College of Engineering)
    The controller, or the control plane, is at the heart of software-defined networks (SDN). As SDN migrates to wide area networks (WAN), scalability and performance are two important factors that differentiate one controller from another, and they are critical for the success of SDN in end-to-end service management. We distinguish control flows from data flows, and introduce a novel dynamic control plane architecture that distributes different control flows among multiple controller instances depending on the specific controller load and controller processor utilization, or on the data flow service type. We propose control flow tables, a concept introduced in this paper, that are embedded in OpenFlow flow tables to distribute the control flows among various controller instances. Experimental results demonstrate the improvements in data plane service performance as a result of the proposed control flow management procedures when the bottleneck is the controller CPU or the throughput of links between the controller and the switches.
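The load-dependent distribution of control flows can be sketched as a simple assignment rule: send each control flow to the least-loaded controller instance, breaking ties deterministically so repeated lookups stay consistent. The flow-key format, the load metric, and the hash tie-break are illustrative assumptions, not the paper's OpenFlow encoding of control flow tables.

```python
# Hedged sketch: map a control flow to a controller instance by current
# load, with a stable hash tie-break for consistency.

def assign_controller(flow_key, controllers, loads):
    """controllers: list of instance names; loads: name -> load in [0,1].
    Returns the chosen controller instance for this control flow."""
    min_load = min(loads[c] for c in controllers)
    # All instances tied at the minimum load are candidates.
    candidates = sorted(c for c in controllers if loads[c] == min_load)
    # Stable, deterministic tie-break keyed on the flow identifier.
    return candidates[sum(flow_key.encode()) % len(candidates)]

controllers = ["ctrl-1", "ctrl-2", "ctrl-3"]
loads = {"ctrl-1": 0.9, "ctrl-2": 0.2, "ctrl-3": 0.2}
target = assign_controller("10.0.0.1->10.0.0.2:arp", controllers, loads)
```

A rule like this steers new control flows away from a CPU-saturated instance, which is the failure mode the paper's experiments target.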