Publication: Multimodal speech driven facial shape animation using deep neural networks
dc.contributor.department | Department of Electrical and Electronics Engineering | |
dc.contributor.department | Department of Electrical and Electronics Engineering | |
dc.contributor.kuauthor | Erzin, Engin | |
dc.contributor.kuauthor | Sadiq, Rizwan | |
dc.contributor.kuauthor | Asadiabadi, Sasan | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.yokid | 34503 | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | N/A | |
dc.date.accessioned | 2024-11-09T11:50:10Z | |
dc.date.issued | 2018 | |
dc.description.abstract | In this paper, we present a deep learning multimodal approach for speech-driven generation of face animations. Training a speaker-independent model capable of generating different emotions of the speaker is crucial for realistic animations. Unlike previous approaches, which use either acoustic features or phoneme label features to estimate the facial movements, we utilize both modalities to generate natural-looking, speaker-independent lip animations synchronized with affective speech. A phoneme-based model enables generation of speaker-independent animation, whereas an acoustic feature-based model captures affective variation during animation generation. We show that our multimodal approach not only performs significantly better on affective data but also improves performance on neutral data. We evaluate the proposed multimodal speech-driven animation model on two large-scale datasets, GRID and SAVEE, by reporting the mean squared error (MSE) over various network structures. | |
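As a rough illustration of the multimodal idea summarized in the abstract, the sketch below shows a hypothetical PyTorch regressor that concatenates per-frame acoustic features with a phoneme label embedding and predicts active shape model (ASM) parameters under an MSE loss. All layer sizes, feature dimensions, and names are assumptions for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class MultimodalAnimationNet(nn.Module):
    """Toy multimodal regressor: acoustic features + phoneme labels -> ASM parameters.

    Dimensions are illustrative assumptions, not values from the paper.
    """
    def __init__(self, acoustic_dim=39, num_phonemes=40, phoneme_emb_dim=16, asm_dim=16):
        super().__init__()
        self.phoneme_emb = nn.Embedding(num_phonemes, phoneme_emb_dim)
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + phoneme_emb_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, asm_dim),  # regress ASM shape parameters per frame
        )

    def forward(self, acoustic, phoneme_ids):
        # acoustic: (batch, acoustic_dim); phoneme_ids: (batch,) integer labels
        x = torch.cat([acoustic, self.phoneme_emb(phoneme_ids)], dim=-1)
        return self.net(x)

# Minimal training step with the MSE objective mentioned in the abstract.
model = MultimodalAnimationNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

acoustic = torch.randn(8, 39)          # e.g. MFCC-like frame features (assumed)
phonemes = torch.randint(0, 40, (8,))  # frame-level phoneme labels (assumed)
target_asm = torch.randn(8, 16)        # ground-truth ASM parameters (assumed)

pred = model(acoustic, phonemes)
loss = criterion(pred, target_asm)
loss.backward()
optimizer.step()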
dc.description.fulltext | YES | |
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.description.sponsorship | N/A | |
dc.description.version | Author's final manuscript | |
dc.identifier.doi | 10.23919/APSIPA.2018.8659713 | |
dc.identifier.embargo | NO | |
dc.identifier.filenameinventoryno | IR01880 | |
dc.identifier.isbn | 9789881476852 | |
dc.identifier.issn | 2309-9402 | |
dc.identifier.link | https://doi.org/10.23919/APSIPA.2018.8659713 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85063081138 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/669 | |
dc.identifier.wos | 468383400245 | |
dc.keywords | Deep learning | |
dc.keywords | Speech driven animations | |
dc.keywords | Deep neural network (DNN) | |
dc.keywords | Active shape models (ASM) | |
dc.language | English | |
dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | |
dc.relation.grantno | N/A | |
dc.relation.uri | http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/8562 | |
dc.source | 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) | |
dc.subject | Engineering, electrical and electronic | |
dc.title | Multimodal speech driven facial shape animation using deep neural networks | |
dc.type | Conference proceeding | |
dspace.entity.type | Publication | |
local.contributor.authorid | 0000-0002-2715-2368 | |
local.contributor.authorid | N/A | |
local.contributor.authorid | N/A | |
local.contributor.kuauthor | Erzin, Engin | |
local.contributor.kuauthor | Sadiq, Rizwan | |
local.contributor.kuauthor | Asadiabadi, Sasan | |
relation.isOrgUnitOfPublication | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 | |
relation.isOrgUnitOfPublication.latestForDiscovery | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 |