Researcher: Çetingül, Hasan Ertan
Search Results: 8 publications
Publication (metadata only): On optimal selection of lip-motion features for speaker identification (IEEE, 2004)
Authors: Çetingül, Hasan Ertan; Erzin, Engin; Yemez, Yücel; Tekalp, Ahmet Murat
This paper addresses the selection of the best lip motion features for biometric open-set speaker identification. The best features are those that result in the highest discrimination of individual speakers in a population. We first detect the face region in each video frame. The lip region for each frame is then segmented following registration of successive face regions by global motion compensation. The initial lip feature vector is composed of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. We propose to select the most discriminative features from the full set of transform coefficients by using a probabilistic measure that maximizes the ratio of intra-class and inter-class probabilities. The resulting discriminative feature vector with reduced dimension is expected to maximize the identification performance. Experimental results are included to demonstrate the performance.

Publication (metadata only): Robust lip-motion features for speaker identification (IEEE, 2005)
Authors: Çetingül, Hasan Ertan; Yemez, Yücel; Erzin, Engin
This paper addresses the selection of robust lip-motion features for the audio-visual open-set speaker identification problem. We consider two alternatives for the initial lip motion representation. In the first alternative, the feature vector is composed of the 2D-DCT coefficients of the motion vectors estimated within the detected rectangular mouth region, whereas in the second, lip boundaries are tracked over the video frames and only the motion vectors around the lip contour are taken into account, along with the shape of the lip boundary. Experimental results of the HMM-based identification system are included for a performance comparison of the two lip motion representation alternatives.
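The feature extraction these two papers share, 2D-DCT coefficients of dense motion estimated over the lip region, can be sketched as follows. This is a minimal illustration assuming OpenCV's Farneback optical flow and SciPy's dctn; the function name lip_motion_features and all parameter values are illustrative, not taken from the papers.

```python
# A minimal sketch of the initial lip feature extraction described above,
# assuming OpenCV for dense optical flow and SciPy for the 2D-DCT.
# Function and parameter names are illustrative, not from the papers.
import cv2
import numpy as np
from scipy.fft import dctn

def lip_motion_features(prev_lip, curr_lip, n_coeffs=50):
    """Compute 2D-DCT coefficients of the optical flow within the lip region.

    prev_lip, curr_lip: grayscale lip-region images (uint8) from
    consecutive, motion-compensated video frames.
    """
    # Dense optical flow between consecutive lip-region frames
    # (Farneback is one common choice; the papers do not mandate it).
    flow = cv2.calcOpticalFlowFarneback(
        prev_lip, curr_lip, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # 2D-DCT of each flow component; low-frequency coefficients carry
    # most of the lip-motion energy, so keep the top-left block.
    feats = []
    for c in range(2):  # horizontal and vertical flow components
        coeffs = dctn(flow[..., c], norm="ortho")
        k = int(np.sqrt(n_coeffs // 2))
        feats.append(coeffs[:k, :k].ravel())
    return np.concatenate(feats)
```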
Publication (metadata only): The use of lip motion for biometric speaker identification (IEEE, 2004)
Authors: Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel; Çetingül, Hasan Ertan
This paper addresses the selection of the best lip motion features for biometric open-set speaker identification. The best features are those that result in the highest discrimination of individual speakers in a population. We first detect the face region in each video frame. The lip region for each frame is then segmented following registration of successive face regions by global motion compensation. The initial lip feature vector is composed of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. We propose to select the most discriminative features from the full set of transform coefficients by using a probabilistic measure that maximizes the ratio of intra-class and inter-class probabilities. The resulting discriminative feature vector with reduced dimension is expected to maximize the identification performance, and experimental results support that it indeed improves the identification performance.

Publication (metadata only): Comparative lip motion analysis for speaker identification (IEEE, 2005)
Authors: Yemez, Yücel; Erzin, Engin; Tekalp, Ahmet Murat; Çetingül, Hasan Ertan
The aim of this work is to determine the best lip analysis system, and thus the most accurate lip motion features, for the audio-visual open-set speaker identification problem. Based on different analysis points on the lip region, two alternatives for the initial lip motion representation are considered. In the first alternative, the feature vector is composed of the 2D-DCT coefficients of the motion vectors estimated within the rectangular mouth region, whereas in the second, the outer lip boundaries are tracked over the video frames and only the motion vectors around the lip contour, together with the shape of the lip boundary, are taken into account. A further comparison has been performed between optical flow and block-matching motion estimation methods to find the best model for lip movement. The dimension of the resulting lip feature vector is then reduced by a two-stage discrimination method that selects the most discriminative lip features. An HMM-based identification system has been used for performance comparison of these motion representations. It is observed that the lower-dimensional feature vector computed by block-matching within a rectangular grid in the lip region maximizes the identification performance.
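The block-matching alternative compared in this paper can be illustrated with a toy exhaustive-search estimator. The block size, search range, and SAD criterion below are common defaults assumed for the sketch, not values reported in the paper.

```python
# A toy block-matching motion estimator, for contrast with the optical-flow
# variant sketched earlier; exhaustive search over a small window, pure NumPy.
import numpy as np

def block_match(prev, curr, block=8, search=4):
    """Return a (H//block, W//block, 2) array of motion vectors estimated by
    minimizing the sum of absolute differences (SAD) within +/- search pixels."""
    H, W = prev.shape
    mv = np.zeros((H // block, W // block, 2), dtype=np.int32)
    prev = prev.astype(np.int32)
    curr = curr.astype(np.int32)
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            ref = curr[y:y + block, x:x + block]
            best, best_dxy = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                        continue  # candidate block would leave the frame
                    sad = np.abs(prev[yy:yy + block, xx:xx + block] - ref).sum()
                    if best is None or sad < best:
                        best, best_dxy = sad, (dx, dy)
            mv[by, bx] = best_dxy
    return mv
```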
Publication (metadata only): Multimodal speaker identification using discriminative lip motion features (IGI Global, 2009)
Authors: Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel; Çetingül, Hasan Ertan
This chapter presents a multimodal speaker identification system that integrates audio, lip texture, and lip motion modalities, and the authors propose to use the "explicit" lip motion information that best represents the modality for the given problem. The work is presented in two stages. First, several lip motion feature candidates are considered, such as dense motion features on the lip region, motion features on the outer lip contour, and lip shape features; here the authors introduce their main contribution, a novel two-stage spatial-temporal discrimination analysis framework designed to obtain the best lip motion features. For speaker identification, the best lip motion features are those that result in the highest discrimination among speakers. Next, the benefits of including the best lip motion features in multimodal recognition are investigated. Audio, lip texture, and lip motion modalities are fused by the reliability weighted summation (RWS) decision rule, and hidden Markov model (HMM)-based modeling is performed for both unimodal and multimodal recognition. Experimental results indicate that discriminative grid-based lip motion features prove more valuable and provide additional performance gains in speaker identification. © 2009, IGI Global.
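The RWS decision rule amounts to a reliability-weighted sum of per-modality class scores. Below is a minimal sketch assuming the reliability weights are already estimated; how they are computed is specific to the chapter, and the function name rws_fuse and all numbers are illustrative.

```python
# A hedged sketch of reliability-weighted summation (RWS) fusion: per-modality
# class scores are combined as a weighted sum. The reliability weights are
# taken as given inputs here, not estimated as in the chapter.
import numpy as np

def rws_fuse(scores, weights):
    """scores: list of (n_classes,) log-likelihood arrays, one per modality
    (e.g. audio, lip texture, lip motion); weights: per-modality reliabilities.
    Returns the index of the winning class."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # normalize reliabilities
    fused = sum(w * np.asarray(s) for w, s in zip(weights, scores))
    return int(np.argmax(fused))

# Example: three modalities scoring four enrolled speakers (made-up numbers)
audio   = np.array([-12.1, -9.8, -11.4, -10.6])
texture = np.array([-8.3, -8.9, -7.7, -9.2])
motion  = np.array([-6.5, -7.1, -6.9, -7.8])
print(rws_fuse([audio, texture, motion], weights=[0.5, 0.3, 0.2]))
```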
Publication (metadata only): Discriminative analysis of lip motion features for speaker identification and speech-reading (IEEE, 2006)
Authors: Çetingül, Hasan Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, Ahmet Murat
There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable in the speech-reading application.

Publication (metadata only): Multimodal speaker/speech recognition using lip motion, lip texture and audio (Elsevier, 2006)
Authors: Çetingül, Hasan Ertan; Erzin, Engin; Yemez, Yücel; Tekalp, Ahmet Murat
We present a new multimodal speaker/speech recognition system that integrates audio, lip texture and lip motion modalities. Fusion of audio and face texture modalities has been investigated in the literature before; the emphasis of this work is to investigate the benefits of including the lip motion modality for two distinct cases: speaker recognition and speech recognition. The audio modality is represented by the well-known mel-frequency cepstral coefficients (MFCC) along with their first and second derivatives, whereas the lip texture modality is represented by the 2D-DCT coefficients of the luminance component within a bounding box about the lip region. In this paper, we employ a new lip motion modality representation based on discriminative analysis of the dense motion vectors within the same bounding box for speaker/speech recognition. The fusion of audio, lip texture and lip motion modalities is performed by the so-called reliability weighted summation (RWS) decision rule. Experimental results show that inclusion of the lip motion modality provides further performance gains over those obtained by fusion of audio and lip texture alone, in both speaker identification and isolated word recognition scenarios. (c) 2006 Published by Elsevier B.V.
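The audio front end described here, MFCCs plus first and second derivatives, can be sketched with librosa. The 13-coefficient setting, 16 kHz sample rate, and the file name are assumptions for illustration, not values taken from the paper.

```python
# A minimal sketch of the MFCC + delta + delta-delta audio features,
# using librosa; n_mfcc=13 is a common convention, not from the paper.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
d1 = librosa.feature.delta(mfcc)                  # first derivative
d2 = librosa.feature.delta(mfcc, order=2)         # second derivative
audio_feats = np.vstack([mfcc, d1, d2])           # shape (39, n_frames)
```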
Publication (metadata only): Discriminative lip-motion features for biometric speaker identification (IEEE, 2004)
Authors: Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel; Çetingül, Hasan Ertan
This paper addresses the selection of the best lip motion features for biometric open-set speaker identification. The best features are those that result in the highest discrimination of individual speakers in a population. We first detect the face region in each video frame. The lip region for each frame is then segmented following registration of successive face regions by global motion compensation. The initial lip feature vector is composed of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. The discriminant analysis is composed of two stages. In the first stage, the most discriminative features are selected from the full set of DCT coefficients of a single lip motion frame by using a probabilistic measure that maximizes the ratio of intra-class and inter-class probabilities. In the second stage, the resulting discriminative feature vectors are interpolated and concatenated for each time instant within a neighborhood, and further analyzed by LDA to reduce dimension, this time taking temporal discrimination information into account. Experimental results of the HMM-based speaker identification system are included to demonstrate the performance.
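A rough sketch of the two-stage analysis described above, substituting a simple between-/within-class variance ratio for the paper's probabilistic intra-/inter-class measure in stage one, and using scikit-learn's LDA over temporally stacked frames for stage two; all names and parameters are illustrative.

```python
# Stage one scores each DCT coefficient by a between/within-class variance
# ratio (a stand-in for the paper's probabilistic measure); stage two applies
# LDA to short temporal windows of the selected features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def select_discriminative(X, y, k):
    """X: (n_frames, n_coeffs) per-frame features; y: speaker labels.
    Keep the k coefficients with the largest between/within variance ratio."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    between = sum((X[y == c].mean(axis=0) - mu) ** 2 for c in classes)
    within = sum(X[y == c].var(axis=0) for c in classes) + 1e-9
    return np.argsort(between / within)[-k:]

def temporal_lda(X, y, window=5, n_dims=10):
    """Concatenate each frame with its temporal neighbors, then reduce with
    LDA. Frame boundaries wrap around here (np.roll), a simplification; note
    that n_dims must be smaller than the number of speakers for LDA."""
    stacked = np.hstack([np.roll(X, s, axis=0)
                         for s in range(-(window // 2), window // 2 + 1)])
    lda = LinearDiscriminantAnalysis(n_components=n_dims)
    return lda.fit_transform(stacked, y)
```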