Publication: Multimodal speech driven facial shape animation using deep neural networks
dc.contributor.department | N/A | |
dc.contributor.department | N/A | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Asadiabadi, Sasan | |
dc.contributor.kuauthor | Sadiq, Rizwan | |
dc.contributor.kuauthor | Erzin, Engin | |
dc.contributor.kuprofile | PhD Student | |
dc.contributor.kuprofile | PhD Student | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | 34503 | |
dc.date.accessioned | 2024-11-09T23:12:54Z | |
dc.date.issued | 2018 | |
dc.description.abstract | In this paper, we present a deep learning multimodal approach for speech-driven generation of face animations. Training a speaker-independent model capable of conveying the speaker's different emotions is crucial for realistic animations. Unlike previous approaches, which use either acoustic features or phoneme labels to estimate facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. The phoneme-based model enables speaker-independent animation, whereas the acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, reporting the mean squared error (MSE) over various network structures. | |
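The abstract describes the model only at a high level. As a minimal, hypothetical sketch of the kind of multimodal DNN it outlines, the following assumes early fusion of per-frame acoustic features with one-hot phoneme labels, and MSE training toward active shape model (ASM) parameters; all feature dimensions, layer sizes, and names below are illustrative assumptions, not the paper's actual configuration.

```python
# Illustrative sketch only (not the authors' code): a DNN that fuses
# acoustic features with one-hot phoneme labels to predict ASM shape
# parameters, trained with the MSE criterion the paper reports.
import torch
import torch.nn as nn

# Assumed, illustrative dimensions: acoustic features, phoneme
# inventory size, and number of ASM shape parameters per frame.
ACOUSTIC_DIM, PHONEME_DIM, ASM_DIM = 26, 40, 16

class MultimodalAnimationDNN(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACOUSTIC_DIM + PHONEME_DIM, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, ASM_DIM),  # ASM parameters per frame
        )

    def forward(self, acoustic, phoneme_onehot):
        # Early fusion: concatenate the two modalities frame by frame.
        return self.net(torch.cat([acoustic, phoneme_onehot], dim=-1))

model = MultimodalAnimationDNN()
criterion = nn.MSELoss()  # the paper evaluates models with MSE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on random stand-in data (batch of 32 frames).
acoustic = torch.randn(32, ACOUSTIC_DIM)
phonemes = torch.eye(PHONEME_DIM)[torch.randint(PHONEME_DIM, (32,))]
target = torch.randn(32, ASM_DIM)

loss = criterion(model(acoustic, phonemes), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Feeding both modalities into one network mirrors the abstract's claim that phoneme labels support speaker independence while acoustic features carry affective variation; how the paper actually combines the two streams is not specified here, so the simple concatenation above is just one plausible reading.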
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.identifier.doi | N/A | |
dc.identifier.isbn | 978-988-14768-5-2 | |
dc.identifier.issn | 2309-9402 | |
dc.identifier.scopus | 2-s2.0-85063081138 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/9884 | |
dc.identifier.wos | 468383400245 | |
dc.keywords | Deep learning | |
dc.keywords | Speech driven animations | |
dc.keywords | Deep neural network (DNN) | |
dc.keywords | Active shape models (ASM) | |
dc.language | English | |
dc.publisher | IEEE | |
dc.source | 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | |
dc.subject | Engineering | |
dc.subject | Electrical and electronic engineering | |
dc.title | Multimodal speech driven facial shape animation using deep neural networks | |
dc.type | Conference proceeding | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0001-9774-6105 | |
local.contributor.authorid | N/A | |
local.contributor.authorid | 0000-0002-2715-2368 | |
local.contributor.kuauthor | Asadiabadi, Sasan | |
local.contributor.kuauthor | Sadiq, Rizwan | |
local.contributor.kuauthor | Erzin, Engin | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |