Publication: Multimodal speech driven facial shape animation using deep neural networks
dc.contributor.department | Department of Electrical and Electronics Engineering | |
dc.contributor.department | Department of Electrical and Electronics Engineering | |
dc.contributor.kuauthor | Erzin, Engin | |
dc.contributor.kuauthor | Sadiq, Rizwan | |
dc.contributor.kuauthor | Asadiabadi, Sasan | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.yokid | 34503 | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | N/A | |
dc.date.accessioned | 2024-11-09T11:50:10Z | |
dc.date.issued | 2018 | |
dc.description.abstract | In this paper, we present a deep learning multimodal approach for speech-driven generation of face animations. Training a speaker-independent model capable of generating different emotions of the speaker is crucial for realistic animations. Unlike previous approaches, which use either acoustic features or phoneme label features to estimate the facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. A phoneme-based model enables generation of speaker-independent animation, whereas an acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures. | |
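As a rough illustration of the multimodal idea summarized in the abstract, the sketch below shows a hypothetical PyTorch regressor that concatenates per-frame acoustic features with a phoneme label embedding and predicts active shape model (ASM) parameters under an MSE loss. All layer sizes, feature dimensions, and names are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class MultimodalAnimationNet(nn.Module):
    """Toy multimodal regressor: acoustic features + phoneme labels -> ASM parameters.

    Dimensions are illustrative assumptions, not values from the paper.
    """
    def __init__(self, acoustic_dim=39, num_phonemes=40, phoneme_emb_dim=16, asm_dim=16):
        super().__init__()
        self.phoneme_emb = nn.Embedding(num_phonemes, phoneme_emb_dim)
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + phoneme_emb_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, asm_dim),  # regress ASM shape parameters per frame
        )

    def forward(self, acoustic, phoneme_ids):
        # acoustic: (batch, acoustic_dim); phoneme_ids: (batch,) integer labels
        x = torch.cat([acoustic, self.phoneme_emb(phoneme_ids)], dim=-1)
        return self.net(x)

# Minimal training step with the MSE objective mentioned in the abstract.
model = MultimodalAnimationNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

acoustic = torch.randn(8, 39)          # e.g. MFCC-like frame features (assumed)
phonemes = torch.randint(0, 40, (8,))  # frame-level phoneme labels (assumed)
target_asm = torch.randn(8, 16)        # ground-truth ASM parameters (assumed)

pred = model(acoustic, phonemes)
loss = criterion(pred, target_asm)
loss.backward()
optimizer.step()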
dc.description.fulltext | YES | |
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.description.sponsorship | N/A | |
dc.description.version | Author's final manuscript | |
dc.identifier.doi | 10.23919/APSIPA.2018.8659713 | |
dc.identifier.embargo | NO | |
dc.identifier.filenameinventoryno | IR01880 | |
dc.identifier.isbn | 9789881476852 | |
dc.identifier.issn | 2309-9402 | |
dc.identifier.link | https://doi.org/10.23919/APSIPA.2018.8659713 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85063081138 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/669 | |
dc.identifier.wos | 468383400245 | |
dc.keywords | Deep learning | |
dc.keywords | Speech driven animations | |
dc.keywords | Deep neural network (DNN) | |
dc.keywords | Active shape models (ASM) | |
dc.language | English | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | |
dc.relation.grantno | N/A | |
dc.relation.uri | http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/8562 | |
dc.source | 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | |
dc.subject | Engineering, electrical and electronic | |
dc.title | Multimodal speech driven facial shape animation using deep neural networks | |
dc.type | Conference proceeding | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-2715-2368 | |
local.contributor.authorid | N/A | |
local.contributor.authorid | N/A | |
local.contributor.kuauthor | Erzin, Engin | |
local.contributor.kuauthor | Sadiq, Rizwan | |
local.contributor.kuauthor | Asadiabadi, Sasan | |
relation.isOrgUnitOfPublication | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 |