Researcher:
Bozkurt, Elif


Job Title

PhD Student

First Name

Elif

Last Name

Bozkurt

Name

Name Variants

Bozkurt, Elif

Search Results

Now showing 1 - 10 of 22
  • Publication
    Multimodal analysis of speech prosody and upper body gestures using hidden semi-Markov models
    (Institute of Electrical and Electronics Engineers (IEEE), 2013) N/A; N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Bozkurt, Elif; Asta, Shahriar; Özkul, Serkan; Yemez, Yücel; Erzin, Engin; PhD Student; PhD Student; Master Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; N/A; N/A; 107907; 34503
    Gesticulation is an essential component of face-to-face communication, and it contributes significantly to the natural and affective perception of human-to-human communication. In this work, we investigate a new multimodal analysis framework to model relationships between intonational and gesture phrases using hidden semi-Markov models (HSMMs). The HSMM framework effectively associates longer-duration gesture phrases with shorter-duration prosody clusters, while maintaining realistic gesture phrase duration statistics. We evaluate the multimodal analysis framework by generating speech-prosody-driven gesture animation and employing both subjective and objective metrics.
  • Publication
    Agreement and disagreement classification of dyadic interactions using vocal and gestural cues
    (Institute of Electrical and Electronics Engineers (IEEE), 2016) N/A; N/A; N/A; Department of Computer Engineering; Khaki, Hossein; Bozkurt, Elif; Erzin, Engin; PhD Student; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; N/A; 34503
    In human-to-human communication, gesture and speech co-exist in time with tight synchrony, and we tend to use gestures to complement or emphasize speech. In this study, we investigate the roles of vocal and gestural cues in identifying a dyadic interaction as agreement or disagreement. We use the JESTKOD database, which consists of speech and full-body motion capture recordings of dyadic interactions under agreement and disagreement scenarios. Spectral features of the vocal channel and upper-body joint angles of the gestural channel are used to obtain unimodal and multimodal classification performances. Both modalities attain classification rates significantly above chance level, and the multimodal classifier achieves a classification rate of more than 80% over 15-second utterances using statistical features of speech and motion.
  • Publication
    Real-time speech driven gesture animation
    (2016) N/A; N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Kasarcı, Kenan; Bozkurt, Elif; Yemez, Yücel; Erzin, Engin; Master Student; PhD Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; N/A; 107907; 34503
    Gesture and speech co-exist in time with tight synchrony; they are planned and shaped by the emotional state and produced together. In our earlier studies, we developed joint gesture-speech models and proposed algorithms for speech-driven gesture animation. These algorithms are mainly based on Viterbi decoders and cannot run in real time. In this paper, we present the modifications needed for these algorithms to run in real time. In the experimental studies, we present the conditions that satisfy a real-time factor of 1. We also demonstrate the real-time and non-real-time speech-driven gesture animations for subjective evaluation.
  • Publication
    The JESTKOD database: an affective multimodal database of dyadic interactions
    (Springer, 2017) N/A; N/A; N/A; N/A; Department of Computer Engineering; Department of Computer Engineering; Bozkurt, Elif; Khaki, Hossein; Keçeci, Sinan; Türker, Bekir Berker; Yemez, Yücel; Erzin, Engin; PhD Student; PhD Student; Master Student; PhD Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; N/A; N/A; N/A; 107907; 34503
    In human-to-human communication, gesture and speech co-exist in time with tight synchrony, and gestures are often used to complement or emphasize speech. In human-computer interaction systems, natural, affective and believable use of gestures would be a valuable component in adopting and emphasizing human-centered aspects. However, natural and affective multimodal data for studying computational models of gesture and speech is limited. In this study, we introduce the JESTKOD database, which consists of speech and full-body motion capture recordings in a dyadic interaction setting under agreement and disagreement scenarios. Participants of the dyadic interactions are native Turkish speakers, and the recordings of each participant are rated in dimensional affect space. We present our multimodal data collection and annotation process, as well as our preliminary experimental studies on agreement/disagreement classification of dyadic interactions using body gesture and speech data. The JESTKOD database provides a valuable asset for investigating gesture and speech towards designing more natural and affective human-computer interaction systems.
  • Publication
    Exploring modulation spectrum features for speech-based depression level classification
    (International Speech and Communication Association, 2014) Toledo-Ronen, Orith; Sorin, Alexander; N/A; Bozkurt, Elif; PhD Student; Graduate School of Sciences and Engineering; N/A
    In this paper, we propose a manageable Modulation Spectrum-based feature set for the detection of depressed speech. The Modulation Spectrum (MS) is obtained from the conventional speech spectrogram by spectral analysis along the temporal trajectories of the acoustic frequency bins. While the MS representation of speech provides rich and high-dimensional joint frequency information, extracting discriminative features from it remains an open question. We propose a lower-dimensional representation, which first employs a Mel-frequency filterbank in the acoustic frequency domain and a Discrete Cosine Transform in the modulation frequency domain, and then applies feature selection in both domains. We compare and fuse the proposed feature set with other complementary prosodic and spectral features at the feature and decision levels. In our experiments, we use Support Vector Machines to discriminate depressed speech in a speaker-independent fashion. Feature-level fusion of the proposed MS-based features with other prosodic and spectral features after dimension reduction provides up to ~9% improvement over the baseline results and also correlates the most with clinical ratings of patients' depression level.
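The definition of the modulation spectrum in the abstract above (spectral analysis along the temporal trajectory of each acoustic frequency bin) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic amplitude-modulated tone, the frame length, the hop size, the rectangular window, and the helper names. The Mel filterbank, DCT, and feature selection stages are not shown.

```python
import cmath
import math


def dft_magnitude(values, k):
    """Magnitude of DFT bin k of a real sequence (naive direct sum)."""
    n = len(values)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(values)))


def spectrogram(signal, frame_len, hop):
    """Magnitude spectrogram: one list of acoustic-bin magnitudes per frame."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [[dft_magnitude(signal[s:s + frame_len], k)
             for k in range(frame_len // 2 + 1)]
            for s in starts]


def modulation_spectrum(spec):
    """Second DFT, taken along time, of each acoustic bin's trajectory."""
    n_frames = len(spec)
    result = []
    for b in range(len(spec[0])):
        trajectory = [frame[b] for frame in spec]
        mean = sum(trajectory) / n_frames
        centred = [v - mean for v in trajectory]  # drop the DC term
        result.append([dft_magnitude(centred, m)
                       for m in range(n_frames // 2 + 1)])
    return result


# Hypothetical test signal: a 100 Hz tone whose amplitude varies at 5 Hz.
fs, frame_len, hop = 800, 32, 16
n_samples = frame_len + 99 * hop  # yields exactly 100 frames
signal = [(1 + 0.5 * math.cos(2 * math.pi * 5 * n / fs))
          * math.cos(2 * math.pi * 100 * n / fs)
          for n in range(n_samples)]

spec = spectrogram(signal, frame_len, hop)
mod = modulation_spectrum(spec)

# Locate the strongest (acoustic bin, modulation bin) pair.
peak_bin, peak_mod = max(
    ((b, m) for b in range(len(mod)) for m in range(len(mod[0]))),
    key=lambda bm: mod[bm[0]][bm[1]])
acoustic_hz = peak_bin * fs / frame_len
modulation_hz = peak_mod * (fs / hop) / len(spec)
print(acoustic_hz, modulation_hz)
```

For this synthetic input the peak should fall at the tone's acoustic frequency and at the 5 Hz modulation rate, which is the joint frequency information the abstract refers to.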
  • Publication
    Evaluation of emotion recognition from speech
    (IEEE, 2012) Department of Computer Engineering; N/A; Erzin, Engin; Bozkurt, Elif; Faculty Member; PhD Student; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering; 34503; N/A
    Over the last few years, interest in paralinguistic information classification has grown considerably. However, in comparison to related speech processing tasks such as automatic speech and speaker recognition, practically no standardised corpora and test conditions exist for comparing performance under exactly the same conditions. The successive challenges held at the INTERSPEECH conferences, the world's largest conference on automatic speech processing, are important for comparing the performance of statistical classifiers. In this paper, we summarize the challenge results, the methods commonly used by challenge participants, and the results of the Koç University Multimedia, Vision and Graphics Laboratory on the same tasks. Our main contributions include formant position-based weighted spectral features that emphasize emotion in speech, and RANSAC-based (Random Sampling Consensus) training data selection for pruning possible outliers in the training set. © 2012 IEEE.
  • Publication
    Ransac-based training data selection for speaker state recognition
    (Isca-Int Speech Communication Assoc, 2011) Erdem, Çiğdem Eroğlu; Erdem, A. Tanju; N/A; Department of Computer Engineering; Bozkurt, Elif; Erzin, Engin; PhD Student; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; N/A; 34503
    We present a Random Sampling Consensus (RANSAC) based training approach for the problem of speaker state recognition from spontaneous speech. Our system is trained and tested with the INTERSPEECH 2011 Speaker State Challenge corpora, which include the Intoxication and the Sleepiness Sub-challenges, where each sub-challenge defines a two-class classification task. We perform RANSAC-based training data selection coupled with Support Vector Machine (SVM) based classification to prune possible outliers in the training data. Our experimental evaluations indicate that RANSAC-based training data selection provides unweighted average (UA) recall rates of 66.32% and 65.38% on the development and test sets of the Sleepiness Sub-challenge, respectively, and a slight improvement in the Intoxication Sub-challenge performance.
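The idea of RANSAC-based training data selection described above can be sketched as follows. This is only an illustration under stated assumptions. A one-dimensional nearest-centroid classifier stands in for the SVM used in the paper, and the toy data, the subset size, the iteration count, and the function names are all hypothetical.

```python
import random


def fit_centroids(samples):
    """Return {label: mean feature} for a list of (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in samples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the label of the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def ransac_select(data, subset_size, iterations, seed=0):
    """Keep the largest set of training items consistent with some model
    fitted on a random subset (the consensus set)."""
    rng = random.Random(seed)
    labels = {y for _, y in data}
    best = []
    for _ in range(iterations):
        subset = rng.sample(data, subset_size)
        centroids = fit_centroids(subset)
        if set(centroids) != labels:
            continue  # the subset missed a class, so no usable model
        inliers = [(x, y) for x, y in data if predict(centroids, x) == y]
        if len(inliers) > len(best):
            best = inliers
    return best


# Toy data: class 0 near 0-2, class 1 near 8-10, plus two mislabeled items.
clean = [(v, 0) for v in (0.0, 0.5, 1.0, 1.5, 2.0)] + \
        [(v, 1) for v in (8.0, 8.5, 9.0, 9.5, 10.0)]
mislabeled = [(9.0, 0), (1.0, 1)]
data = clean + mislabeled

kept = ransac_select(data, subset_size=6, iterations=200)
print(len(kept), "of", len(data), "training items kept")
```

A final classifier would then be trained on `kept` only. In this toy setting the two mislabeled items are the ones that fall outside the consensus set.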
  • Publication
    Ransac-based training data selection for emotion recognition from spontaneous speech
    (ACM, 2010) Erdem, Çiǧdem Eroǧlu; Erdem, A. Tanju; Department of Computer Engineering; N/A; Erzin, Engin; Bozkurt, Elif; Faculty Member; PhD Student; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering; 34503; N/A
    Training datasets containing spontaneous emotional expressions are often imperfect due to the ambiguities and difficulties of labeling such data by human observers. In this paper, we present a Random Sampling Consensus (RANSAC) based training approach for the problem of emotion recognition from spontaneous speech recordings. Our motivation is to insert a data cleaning process into the training phase of the Hidden Markov Models (HMMs) in order to remove suspicious instances of labels that may exist in the training dataset. Our experiments using HMMs with various numbers of states and Gaussian mixtures per state indicate that using RANSAC in the training phase provides an improvement of up to 2.84% in the unweighted recall rates on the test set. This improvement in the accuracy of the classifier is shown to be statistically significant using McNemar's test.
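McNemar's test, mentioned above, compares two classifiers on the same test items using only the items on which they disagree. The sketch below uses the common chi-square form with a continuity correction, clamped at zero. The disagreement counts are hypothetical and are not taken from the paper, and the paper may have used a different variant of the test.

```python
import math


def mcnemar(only_a_correct, only_b_correct):
    """Return (statistic, p_value) for McNemar's test.

    only_a_correct: items classifier A got right and classifier B got wrong.
    only_b_correct: items classifier B got right and classifier A got wrong.
    """
    b, c = only_a_correct, only_b_correct
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    # Chi-square statistic with continuity correction, clamped at zero.
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: baseline alone correct on 10 items,
# RANSAC-trained model alone correct on 25 items.
statistic, p_value = mcnemar(10, 25)
print(round(statistic, 2), p_value < 0.05)
```

Items that both classifiers get right, or both get wrong, do not enter the statistic, which is why only the two disagreement counts are needed.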
  • Publication
    Interspeech 2009 emotion recognition challenge evaluation
    (IEEE, 2010) Erdem, Çiǧdem Eroǧlu; Erdem, A. Tanju; Department of Computer Engineering; N/A; Erzin, Engin; Bozkurt, Elif; Faculty Member; PhD Student; Department of Computer Engineering; College of Engineering; Graduate School of Sciences and Engineering; 34503; N/A
    In this paper, we evaluate the INTERSPEECH 2009 Emotion Recognition Challenge results. The challenge presents the problem of accurately classifying natural and emotionally rich FAU Aibo recordings into five and two emotion classes. We evaluate prosody-related, spectral and HMM-based features with Gaussian mixture model (GMM) classifiers to attack this problem. Spectral features consist of mel-scale cepstral coefficients (MFCC), line spectral frequency (LSF) features and their derivatives, whereas prosody-related features consist of pitch, the first derivative of pitch, and intensity. We employ unsupervised training of HMM structures with prosody-related temporal features to define HMM-based features. We also investigate data fusion of different features and decision fusion of different classifiers to improve emotion recognition results. Our two-stage decision fusion method achieves recall rates of 41.59% and 67.90% for the five- and two-class problems, respectively, and takes second and fourth place among the overall challenge results. ©2010 IEEE.
  • Publication
    Speech rhythm-driven gesture animation
    (Institute of Electrical and Electronics Engineers (IEEE), 2013) N/A; Department of Computer Engineering; Department of Computer Engineering; Bozkurt, Elif; Erzin, Engin; Yemez, Yücel; PhD Student; Faculty Member; Faculty Member; Department of Computer Engineering; Graduate School of Sciences and Engineering; College of Engineering; College of Engineering; N/A; 34503; 107907
    Gesticulation is an essential component of face-to-face communication, and it contributes significantly to the natural and affective perception of human-to-human communication. In this work, we investigate a new multimodal analysis framework to model the relationship between speech rhythm and gesture phrases. We extract speech rhythm using Fourier analysis of the amplitude envelope of bandpass-filtered speech; that is, rather than computing rhythm from time-domain measurements of interval durations, we employ a frequency-domain representation. The speech-rhythm-driven gesture animation framework effectively associates gesture phrases with speech phrases while maintaining realistic gesture rhythm. We evaluate the speech-rhythm-driven gesture animation with subjective tests.
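The rhythm extraction step described above (Fourier analysis of the amplitude envelope) can be sketched in a few lines. The example makes several assumptions that do not come from the paper: a synthetic amplitude-modulated tone stands in for bandpass-filtered speech, the envelope is estimated by rectification plus a moving average, and the window length and candidate frequency grid are arbitrary.

```python
import cmath
import math


def amplitude_envelope(signal, window):
    """Rectify the signal and smooth it with a centred moving average."""
    rectified = [abs(s) for s in signal]
    half = window // 2
    envelope = []
    for i in range(len(rectified)):
        chunk = rectified[max(0, i - half):i + half + 1]
        envelope.append(sum(chunk) / len(chunk))
    return envelope


def envelope_spectrum(envelope, sample_rate, freqs_hz):
    """Magnitude of the envelope's Fourier transform at the given frequencies."""
    mean = sum(envelope) / len(envelope)
    centred = [e - mean for e in envelope]  # remove the DC component
    spectrum = {}
    for f in freqs_hz:
        acc = sum(e * cmath.exp(-2j * math.pi * f * i / sample_rate)
                  for i, e in enumerate(centred))
        spectrum[f] = abs(acc) / len(centred)
    return spectrum


# Hypothetical input: a 120 Hz carrier whose loudness pulses 4 times per second.
fs = 1000
signal = [(1 + 0.8 * math.cos(2 * math.pi * 4 * n / fs))
          * math.cos(2 * math.pi * 120 * n / fs)
          for n in range(2 * fs)]

envelope = amplitude_envelope(signal, window=25)
spectrum = envelope_spectrum(envelope, fs, freqs_hz=range(1, 11))
dominant_hz = max(spectrum, key=spectrum.get)
print("dominant envelope frequency:", dominant_hz, "Hz")
```

The dominant envelope frequency plays the role of a rhythm estimate here. Real speech would need an actual bandpass filter and would yield a broader spectrum than this single-peak toy case.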