Publication:
Multimodal speech driven facial shape animation using deep neural networks

dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Sadiq, Rizwan
dc.contributor.kuauthor: Asadiabadi, Sasan
dc.contributor.kuprofile: Faculty Member
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.yokid: 34503
dc.contributor.yokid: N/A
dc.contributor.yokid: N/A
dc.date.accessioned: 2024-11-09T11:50:10Z
dc.date.issued: 2018
dc.description.abstract: In this paper we present a multimodal deep learning approach for speech-driven generation of facial animations. Training a speaker-independent model capable of conveying the speaker's different emotions is crucial for realistic animation. Unlike previous approaches, which use either acoustic features or phoneme-label features to estimate facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. The phoneme-based model enables generation of speaker-independent animation, whereas the acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data, but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures. (An illustrative sketch of such a multimodal network follows the metadata fields below.)
dc.description.fulltext: YES
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.sponsorship: N/A
dc.description.version: Author's final manuscript
dc.format: pdf
dc.identifier.doi: 10.23919/APSIPA.2018.8659713
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR01880
dc.identifier.isbn: 9789881476852
dc.identifier.issn: 2309-9402
dc.identifier.link: https://doi.org/10.23919/APSIPA.2018.8659713
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85063081138
dc.identifier.uri: https://hdl.handle.net/20.500.14288/669
dc.identifier.wos: 468383400245
dc.keywords: Deep learning
dc.keywords: Speech driven animations
dc.keywords: Deep neural network (DNN)
dc.keywords: Active shape models (ASM)
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.grantno: NA
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/8562
dc.source: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
dc.subject: Engineering, electrical and electronic
dc.title: Multimodal speech driven facial shape animation using deep neural networks
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.authorid: 0000-0002-2715-2368
local.contributor.authorid: N/A
local.contributor.authorid: N/A
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Sadiq, Rizwan
local.contributor.kuauthor: Asadiabadi, Sasan
relation.isOrgUnitOfPublication: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
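
Illustrative sketch: the abstract describes a DNN that fuses an acoustic-feature branch with a phoneme-label branch to drive lip-shape animation, with quality reported as MSE over Active Shape Model (ASM) parameters. The minimal PyTorch sketch below shows one plausible realization of that multimodal fusion; the class name MultimodalShapeNet, all layer sizes, and the feature dimensions (n_acoustic, n_phonemes, n_asm_params) are assumptions for illustration, not the authors' actual configuration.

# Hedged sketch of a multimodal speech-to-shape DNN: acoustic features
# and one-hot phoneme labels are encoded separately, concatenated, and
# regressed to ASM lip-shape parameters under an MSE loss.
# All dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalShapeNet(nn.Module):
    def __init__(self, n_acoustic=26, n_phonemes=40, n_asm_params=8):
        super().__init__()
        # Separate encoder branch per modality.
        self.acoustic_branch = nn.Sequential(nn.Linear(n_acoustic, 128), nn.ReLU())
        self.phoneme_branch = nn.Sequential(nn.Linear(n_phonemes, 128), nn.ReLU())
        # Fused representation regresses the ASM shape parameters.
        self.regressor = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_asm_params))

    def forward(self, acoustic, phoneme_onehot):
        fused = torch.cat([self.acoustic_branch(acoustic),
                           self.phoneme_branch(phoneme_onehot)], dim=-1)
        return self.regressor(fused)

model = MultimodalShapeNet()
criterion = nn.MSELoss()                 # the paper reports MSE as its metric
acoustic = torch.randn(32, 26)           # e.g., frame-level acoustic features
phonemes = torch.zeros(32, 40)           # one-hot phoneme labels
phonemes[:, 0] = 1.0
target = torch.randn(32, 8)              # ASM lip-shape parameters (dummy data)
loss = criterion(model(acoustic, phonemes), target)
loss.backward()

The two-branch-then-concatenate design reflects the abstract's claim that the phoneme modality supports speaker independence while the acoustic modality captures affective variation; in practice either branch could be deepened or made recurrent to model temporal context.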

Files

Original bundle

Name: 8562.pdf
Size: 334.4 KB
Format: Adobe Portable Document Format