Publication:
Analysis of head gesture and prosody patterns for prosody-driven head-gesture animation

dc.contributor.coauthor: Sargin, Mehmet Emre
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.kuauthor: Yemez, Yücel
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Tekalp, Ahmet Murat
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.other: Department of Electrical and Electronics Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: 107907
dc.contributor.yokid: 34503
dc.contributor.yokid: 26207
dc.date.accessioned: 2024-11-09T22:59:42Z
dc.date.issued: 2008
dc.description.abstract: We propose a new two-stage framework for joint analysis of head gesture and speech prosody patterns of a speaker toward automatic realistic synthesis of head gestures from speech prosody. In the first-stage analysis, we perform Hidden Markov Model (HMM)-based unsupervised temporal segmentation of head gesture and speech prosody features separately to determine elementary head gesture and speech prosody patterns, respectively, for a particular speaker. In the second stage, joint analysis of correlations between these elementary head gesture and prosody patterns is performed using Multistream HMMs to determine an audio-visual mapping model. The resulting audio-visual mapping model is then employed to synthesize natural head gestures from arbitrary input test speech, given a head model for the speaker. In the synthesis stage, the audio-visual mapping model is used to predict a sequence of gesture patterns from the prosody pattern sequence computed for the input test speech. The Euler angles associated with each gesture pattern are then applied to animate the speaker's head model. Objective and subjective evaluations indicate that the proposed synthesis-by-analysis scheme provides natural-looking head gestures for the speaker with any input test speech, as well as in "prosody transplant" and "gesture transplant" scenarios.
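The abstract outlines a two-stage pipeline: unsupervised HMM segmentation of each feature stream into elementary patterns, then a learned mapping from prosody patterns to gesture patterns that drives Euler-angle head animation. Below is a minimal, hypothetical Python sketch of that flow, not the paper's implementation: it assumes hmmlearn's GaussianHMM as the segmenter, uses a simple co-occurrence table in place of the paper's Multistream HMM mapping, and mocks the feature extraction (pitch/energy prosody features, Euler-angle head-pose features) with random arrays.

```python
# Hedged sketch of the two-stage analysis/synthesis pipeline described above.
# Assumptions (not from the paper): hmmlearn's GaussianHMM stands in for the
# HMM-based unsupervised segmentation; a co-occurrence table replaces the
# Multistream HMM audio-visual mapping; feature arrays are random stand-ins.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
prosody = rng.normal(size=(2000, 2))  # stand-in for per-frame (pitch, energy)
gesture = rng.normal(size=(2000, 3))  # stand-in for per-frame Euler angles

# Stage 1: unsupervised temporal segmentation of each stream into
# elementary patterns (HMM states) for this particular speaker.
prosody_hmm = GaussianHMM(n_components=8, covariance_type="diag",
                          n_iter=50, random_state=0).fit(prosody)
gesture_hmm = GaussianHMM(n_components=8, covariance_type="diag",
                          n_iter=50, random_state=0).fit(gesture)
p_labels = prosody_hmm.predict(prosody)
g_labels = gesture_hmm.predict(gesture)

# Stage 2: learn an audio-visual mapping between the two label streams.
# (The paper uses Multistream HMMs; this frequency table is a crude proxy.)
counts = np.zeros((8, 8))
for p, g in zip(p_labels, g_labels):
    counts[p, g] += 1
mapping = counts.argmax(axis=1)  # most frequent gesture pattern per prosody pattern

# Synthesis: prosody patterns of test speech -> gesture patterns -> Euler
# angles (here, each gesture state's mean angles drive the head model).
test_prosody = rng.normal(size=(500, 2))
test_g_labels = mapping[prosody_hmm.predict(test_prosody)]
euler_track = gesture_hmm.means_[test_g_labels]  # (frames, 3) rotation angles
print(euler_track.shape)
```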
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.indexedby: PubMed
dc.description.issue: 8
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.volume: 30
dc.identifier.doi: 10.1109/TPAMI.2007.70797
dc.identifier.eissn: 1939-3539
dc.identifier.issn: 0162-8828
dc.identifier.quartile: Q1
dc.identifier.scopus: 2-s2.0-46149109647
dc.identifier.uri: http://dx.doi.org/10.1109/TPAMI.2007.70797
dc.identifier.uri: https://hdl.handle.net/20.500.14288/7941
dc.identifier.wos: 256679700002
dc.keywords: Multimedia information systems
dc.keywords: Speech analysis
dc.keywords: Face and gesture recognition
dc.keywords: Pattern analysis and recognition
dc.keywords: Animation
dc.language: English
dc.publisher: IEEE Computer Soc
dc.source: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.subject: Computer Science
dc.subject: Artificial intelligence
dc.subject: Electrical electronics engineering
dc.title: Analysis of head gesture and prosody patterns for prosody-driven head-gesture animation
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.authorid: 0000-0002-7515-3138
local.contributor.authorid: 0000-0002-2715-2368
local.contributor.authorid: 0000-0003-1465-8121
local.contributor.kuauthor: Yemez, Yücel
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Tekalp, Ahmet Murat
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
