Research Outputs
Permanent URI for this community: https://hdl.handle.net/20.500.14288/2
3D shape recovery and tracking from multi-camera video sequences via surface deformation (IEEE, 2006)
Skala, V.; Sahillioğlu, Yusuf; Yemez, Yücel

This paper addresses 3D reconstruction and modeling of time-varying real objects using multicamera video. The work consists of two phases. In the first phase, the initial shape of the object is recovered from its silhouettes using a surface deformation model. The same deformation model is also employed in the second phase to track the recovered initial shape through the time-varying silhouette information by surface evolution. The surface deformation/evolution model allows us to construct a spatially and temporally smooth surface mesh representation with fixed connectivity. This eventually leads to an overall space-time representation that preserves the semantics of the underlying motion and that is much more efficient to process, visualize, store, and transmit.

A new statistical excitation mapping for enhancement of throat microphone recordings (International Speech Communication Association, 2013)
Turan, Mehmet Ali Tuğtekin; Erzin, Engin

In this paper we investigate a new statistical excitation mapping technique to enhance throat-microphone speech using joint analysis of throat- and acoustic-microphone recordings. In a recent study we employed source-filter decomposition to enhance the spectral envelope of the throat-microphone recordings. In the source-filter decomposition framework we observed that the spectral envelope difference between the excitation signals of the throat- and acoustic-microphone recordings is an important source of the degradation in throat-microphone voice quality. In this study we model the spectral envelope difference of the excitation signals as a spectral tilt vector, and we propose a new phone-dependent GMM-based spectral tilt mapping scheme to enhance the throat excitation signal. Experiments are performed to evaluate the proposed excitation mapping scheme against state-of-the-art throat-microphone speech enhancement techniques using both objective and subjective evaluations. Objective evaluations are performed with the wideband perceptual evaluation of speech quality (ITU-PESQ) metric, and subjective evaluations with an A/B pair-comparison listening test. Both evaluations show that the proposed statistical excitation mapping consistently delivers larger improvements than statistical mapping of the spectral envelope alone.
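A minimal sketch of the silhouette-driven surface evolution idea from the 3D shape recovery entry above; this is an illustration under assumptions, not the authors' implementation. The `inside_silhouettes` callback is a hypothetical stand-in for the multi-camera silhouette consistency test, and the outward direction is a crude substitute for true vertex normals.

```python
import numpy as np

def evolve_mesh(verts, neighbors, inside_silhouettes, iters=100, dt=0.01, lam=0.5):
    # verts: (N, 3) vertex positions; neighbors: per-vertex neighbor index
    # lists (the mesh connectivity stays fixed throughout the evolution).
    v = verts.copy()
    for _ in range(iters):
        # Laplacian term keeps the evolving mesh spatially smooth.
        lap = np.array([v[nb].mean(axis=0) for nb in neighbors]) - v
        # Crude outward direction used in place of true vertex normals.
        out = v - v.mean(axis=0)
        out /= np.linalg.norm(out, axis=1, keepdims=True) + 1e-9
        # Grow while a vertex projects inside all silhouettes, shrink otherwise.
        sign = np.where(inside_silhouettes(v), 1.0, -1.0)[:, None]
        v += dt * (sign * out + lap * lam)
    return v
```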
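The excitation-mapping entry above rests on GMM-based mapping between parallel feature streams. Below is a hedged sketch of the generic joint-GMM regression machinery (fit a GMM on stacked source/target vectors, then take the posterior-weighted conditional mean); phone dependence, feature extraction, and the tilt representation itself are omitted, and all names and parameters are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(src, tgt, n_components=8):
    # src, tgt: (T, D) parallel feature vectors (e.g. spectral-tilt vectors).
    return GaussianMixture(n_components, covariance_type='full',
                           random_state=0).fit(np.hstack([src, tgt]))

def mmse_map(gmm, x):
    # MMSE estimate under a joint GMM: posterior-weighted E[y | x, k].
    D = x.shape[1]
    logp = np.empty((x.shape[0], gmm.n_components))
    cond = []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        mx, my = mu[:D], mu[D:]
        Sxx, Syx = S[:D, :D], S[D:, :D]
        logp[:, k] = np.log(gmm.weights_[k]) + multivariate_normal.logpdf(x, mx, Sxx)
        cond.append(my + (x - mx) @ np.linalg.solve(Sxx, Syx.T))  # E[y | x, k]
    post = np.exp(logp - logp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    return sum(post[:, k:k + 1] * c for k, c in enumerate(cond))
```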
A phonetic classification for throat microphone enhancement (IEEE, 2014)
Turan, Mehmet Ali Tuğtekin; Erzin, Engin

In this analysis paper, we investigate the effect of phonetic clustering based on place and manner of articulation on the enhancement of throat-microphone speech through spectral envelope mapping. Place of articulation (PoA) and manner of articulation (MoA) dependent GMM-based spectral envelope mapping schemes are investigated using the reflection coefficient representation of the linear prediction model. Reflection coefficients are expected to localize mapping performance within the concatenated lossless tubes model of the vocal tract. In experimental studies, we evaluate spectral mapping performance within the PoA and MoA clusters using the log-spectral distortion (LSD), and as a function of reflection coefficient mapping using the mean-square error distance. Our findings indicate that the highest degradations after spectral mapping occur with the stop and liquid classes of the MoA and the velar and alveolar classes of the PoA. The MoA classification attains higher improvements than the PoA classification.

A volumetric fusion technique for surface reconstruction from silhouettes and range data (Academic Press Inc. Elsevier Science, 2007)
Yemez, Yücel; Wetherilt, Can James

Optical triangulation, an active reconstruction technique, is known to be accurate, but occlusion and the laser reflectance properties of the object surface often lead to holes and inaccuracies on the recovered surface. Shape from silhouette, on the other hand, as a passive reconstruction technique, yields a robust, hole-free reconstruction of the visual hull of the object. In this paper, a hybrid surface reconstruction method that fuses geometrical information acquired from silhouette images and optical triangulation is presented. Our motivation is to recover the geometry from silhouettes on those parts of the surface which the range data fail to capture. A volumetric octree representation is first obtained from the silhouette images and then carved by range points to amend the missing cavity information. An isolevel value is accumulated on each surface cube of the carved octree structure using local surface triangulations obtained separately from the range data and the silhouettes. The marching cubes algorithm is then applied to triangulate the volumetric representation. The performance of the proposed technique is demonstrated on several real objects.
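The phonetic-classification entry above maps reflection coefficients of the linear prediction model. The sketch below shows one standard way to obtain them from a speech frame, the Levinson-Durbin recursion; this is an assumption about tooling, not necessarily the authors' pipeline.

```python
import numpy as np

def reflection_coeffs(frame, order=10):
    # Autocorrelation of the windowed speech frame.
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    a = np.zeros(order + 1); a[0] = 1.0   # LPC polynomial A(z)
    e = r[0]                              # prediction error energy
    k = np.zeros(order)                   # reflection coefficients
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k[i - 1] = -acc / e               # i-th reflection coefficient
        a[1:i + 1] += k[i - 1] * a[i - 1::-1]  # Levinson-Durbin update
        e *= 1.0 - k[i - 1] ** 2          # error energy shrinks each order
    return k

# Toy usage on a synthetic frame:
k = reflection_coeffs(np.sin(0.3 * np.arange(400)), order=8)
```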
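For the volumetric fusion entry above, once carving and isolevel accumulation have produced a volumetric field, marching cubes triangulates it. The hedged sketch below uses a dense signed field over a toy sphere as a stand-in for the paper's carved octree representation.

```python
import numpy as np
from skimage import measure

# Dense signed field standing in for the carved octree with isolevel values.
grid = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
field = np.sqrt((grid ** 2).sum(axis=0)) - 0.7   # negative inside the surface

# Marching cubes extracts the triangle mesh at the chosen isolevel.
verts, faces, normals, _ = measure.marching_cubes(field, level=0.0)
print(verts.shape, faces.shape)                  # vertices and triangles
```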
Adaptive classifier cascade for multimodal speaker identification (International Speech Communication Association, 2004)
Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel

We present a multimodal open-set speaker identification system that integrates information coming from the audio, face, and lip motion modalities. For the fusion of multiple modalities, we propose a new adaptive cascade rule that favors reliable modality combinations through a cascade of classifiers. The order of the classifiers in the cascade is adaptively determined based on the reliability of each modality combination. A novel reliability measure that genuinely fits the open-set speaker identification problem is also proposed to assess the accept/reject decisions of a classifier. The proposed adaptive rule is more robust in the presence of unreliable modalities, and it outperforms the hard-level max rule and the soft-level weighted summation rule, provided that the employed reliability measure is effective in assessing classifier decisions. Experimental results that support this assertion are provided.

Affect burst recognition using multi-modal cues (IEEE Computer Society, 2014)
Türker, Bekir Berker; Marzban, Shabbir; Erzin, Engin; Yemez, Yücel; Sezgin, Tevfik Metin

Affect bursts, which are nonverbal expressions of emotions in conversations, play a critical role in analyzing affective states. Although a number of methods exist for affect burst detection and recognition using only audio information, little effort has been spent on combining cues in a multi-modal setup. We suggest that facial gestures constitute a key component in characterizing affect bursts, and hence have potential for more robust affect burst detection and recognition. We take a data-driven approach to characterize affect bursts using Hidden Markov Models (HMMs), and employ a multimodal decision fusion scheme that combines cues from audio and facial gestures for the classification of affect bursts. We demonstrate the contribution of facial gestures to affect burst recognition by conducting experiments on an audiovisual database comprising speech and facial motion data from various dyadic conversations.
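A hedged sketch of the adaptive-cascade idea from the speaker identification entry above: classifiers over modality combinations are ordered by a reliability score and the first confident decision is accepted. The reliability values, threshold, and combination names here are generic stand-ins, not the paper's measure.

```python
def cascade_identify(scores_by_combo, reliability, accept_thresh):
    # scores_by_combo: {combo: (best_speaker_id, confidence)} per classifier;
    # reliability: {combo: float}, estimated per utterance (stand-in measure).
    order = sorted(scores_by_combo, key=lambda c: reliability[c], reverse=True)
    for combo in order:                  # most reliable classifier goes first
        best_id, conf = scores_by_combo[combo]
        if conf >= accept_thresh:        # accept and stop the cascade
            return best_id
    return None                          # open-set reject

# Toy usage with made-up numbers:
decision = cascade_identify(
    {'audio': (3, 0.42), 'audio+lip': (3, 0.81), 'face': (7, 0.55)},
    {'audio': 0.3, 'audio+lip': 0.9, 'face': 0.6},
    accept_thresh=0.7)                   # -> 3, accepted by the audio+lip stage
```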
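For the affect burst entry above, a minimal sketch of multimodal decision fusion: per-class HMM log-likelihoods from the audio and facial-gesture streams are combined with a weighted sum before picking the class. The weight and the class set are illustrative assumptions.

```python
import numpy as np

def fuse_and_classify(ll_audio, ll_face, w_audio=0.6):
    # ll_audio, ll_face: (n_classes,) log-likelihoods, one HMM per class.
    fused = w_audio * np.asarray(ll_audio) + (1 - w_audio) * np.asarray(ll_face)
    return int(np.argmax(fused))         # fused decision

# Toy usage: three hypothetical burst classes (e.g. laughter, breath, other).
print(fuse_and_classify([-120.0, -95.0, -110.0], [-80.0, -88.0, -70.0]))  # -> 1
```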
Analysis and synthesis of multiview audio-visual dance figures (IEEE, 2008)
Canton-Ferrer, C.; Tilmanne, J.; Balcı, K.; Bozkurt, E.; Kızoğlu, I.; Akarun, L.; Erdem, A. T.; Tekalp, Ahmet Murat; Erzin, Engin; Yemez, Yücel; Ofli, Ferda; Demir, Yasemin

This paper presents a framework for audio-driven human body motion analysis and synthesis. The video is analyzed to capture the time-varying posture of the dancer's body, whereas the musical audio signal is processed to extract the beat information. The human body posture is extracted from multiview video information without any human intervention using a novel marker-based algorithm based on annealing particle filtering. Body movements of the dancer are characterized by a set of recurring semantic motion patterns, i.e., dance figures. Each dance figure is modeled in a supervised manner with a set of Hidden Markov Model (HMM) structures and the associated beat frequency. In synthesis, given an audio signal of a learned musical type, the motion parameters of the corresponding dance figures are synthesized via the trained HMM structures in synchrony with the input audio signal, based on the estimated tempo information. Finally, the generated motion parameters are animated along with the musical audio using a graphical animation tool. Experimental results demonstrate the effectiveness of the proposed framework.

Analysis of engagement and user experience with a laughter responsive social robot (International Speech Communication Association, 2017)
Türker, Bekir Berker; Buçinca, Zana; Erzin, Engin; Yemez, Yücel; Sezgin, Tevfik Metin

We explore the effect of laughter perception and response on engagement in human-robot interaction. We designed two distinct experiments in which the robot has two modes: laughter responsive and laughter non-responsive. In responsive mode, the robot detects laughter using a multimodal real-time laughter detection module and invokes laughter as a backchannel to users accordingly. In non-responsive mode, the robot makes no use of laughter detection and thus provides no such feedback. In the experimental design, we use a straightforward question-answer-based interaction scenario with a back-projected robot head. We evaluate the interactions with objective and subjective measurements of engagement and user experience.
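A hedged sketch of the synthesis-side timing in the dance-figure entry above: beats estimated from the input audio pace the playback of learned figures. `synthesize_figure` is a stub standing in for sampling the trained HMM structures, and 'song.wav' is a placeholder path.

```python
import librosa

def synthesize_figure(duration):
    # Stub: would emit HMM-sampled motion parameters for one beat interval.
    return {'duration': duration}

y, sr = librosa.load('song.wav')                       # placeholder input audio
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)     # estimated tempo and beats
times = librosa.frames_to_time(beats, sr=sr)
# One figure segment per beat interval, keeping motion in sync with the music.
motion = [synthesize_figure(t1 - t0) for t0, t1 in zip(times[:-1], times[1:])]
```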
Analysis of head gesture and prosody patterns for prosody-driven head-gesture animation (IEEE Computer Society, 2008)
Sargin, Mehmet Emre; Yemez, Yücel; Erzin, Engin; Tekalp, Ahmet Murat

We propose a new two-stage framework for the joint analysis of head gesture and speech prosody patterns of a speaker toward automatic realistic synthesis of head gestures from speech prosody. In the first-stage analysis, we perform Hidden Markov Model (HMM)-based unsupervised temporal segmentation of head gesture and speech prosody features separately to determine elementary head gesture and speech prosody patterns, respectively, for a particular speaker. In the second stage, joint analysis of the correlations between these elementary head gesture and prosody patterns is performed using multistream HMMs to determine an audio-visual mapping model. The resulting audio-visual mapping model is then employed to synthesize natural head gestures from arbitrary input test speech, given a head model for the speaker. In the synthesis stage, the audio-visual mapping model is used to predict a sequence of gesture patterns from the prosody pattern sequence computed for the input test speech. The Euler angles associated with each gesture pattern are then applied to animate the speaker head model. Objective and subjective evaluations indicate that the proposed synthesis-by-analysis scheme provides natural-looking head gestures for the speaker with any input test speech, as well as in "prosody transplant" and "gesture transplant" scenarios.

Analysis of interaction attitudes using data-driven hand gesture phrases (IEEE, 2014)
Yang, Zhaojun; Metallinou, Angeliki; Narayanan, Shrikanth; Erzin, Engin

Hand gesture is one of the most expressive, natural, and common types of body language for conveying attitudes and emotions in human interactions. In this paper, we study the role of hand gesture in expressing attitudes of friendliness or conflict towards interlocutors during interactions. We first employ an unsupervised clustering method using a parallel HMM structure to extract recurring patterns of hand gesture (hand gesture phrases, or primitives). We further investigate the validity of the derived hand gesture phrases by examining the correlation of the dyad's hand gestures for different interaction types defined by the attitudes of the interlocutors. Finally, we model the interaction attitudes with an SVM using the dynamics of the derived hand gesture phrases over an interaction. The classification results are promising, suggesting the expressiveness of the derived hand gesture phrases for conveying attitudes and emotions.
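A hedged sketch of the first analysis stage in the head-gesture entry above: unsupervised HMM segmentation of a prosody feature stream into recurring elementary patterns. Random features stand in for real pitch/energy tracks, and the second-stage multistream mapping is omitted.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Stand-in (T, 2) prosody track, e.g. pitch and energy per frame.
prosody = np.random.default_rng(0).standard_normal((500, 2))

# Unsupervised segmentation: each HMM state plays the role of one
# elementary prosody pattern; the number of states is an assumption.
hmm = GaussianHMM(n_components=8, covariance_type='diag', n_iter=50)
hmm.fit(prosody)
labels = hmm.predict(prosody)      # elementary prosody pattern per frame
```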
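For the hand-gesture entry above, a minimal sketch of the final classification stage: each interaction is summarized by the occupancy histogram of its gesture-phrase labels and classified with an SVM. This featurization is one plausible reading of "dynamics of the derived phrases", not the paper's exact recipe, and the data below are toy stand-ins.

```python
import numpy as np
from sklearn.svm import SVC

def phrase_histogram(labels, n_phrases):
    # Normalized occupancy of each gesture phrase over an interaction.
    h = np.bincount(labels, minlength=n_phrases).astype(float)
    return h / max(h.sum(), 1.0)

rng = np.random.default_rng(0)
label_sequences = [rng.integers(0, 10, 200) for _ in range(20)]  # toy phrase labels
attitudes = rng.integers(0, 2, 20)        # toy labels: 0 friendly, 1 conflict

X = np.array([phrase_histogram(s, 10) for s in label_sequences])
clf = SVC(kernel='rbf').fit(X, attitudes) # attitude classifier on phrase usage
```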