Publication: Speech-driven automatic facial expression synthesis
dc.contributor.coauthor | Bozkurt, Elif | |
dc.contributor.coauthor | Erdem, Cigdem Eroglu | |
dc.contributor.coauthor | Erdem, Tanju | |
dc.contributor.coauthor | Özkan, Mehmet | |
dc.contributor.department | Department of Electrical and Electronics Engineering | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Erzin, Engin | |
dc.contributor.kuauthor | Tekalp, Ahmet Murat | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.date.accessioned | 2024-11-09T22:56:47Z | |
dc.date.issued | 2008 | |
dc.description.abstract | This paper focuses on the problem of automatically generating speech-synchronous facial expressions for 3D talking heads. The proposed system is speaker- and language-independent. We parameterize speech data with prosody-related and spectral features together with their first- and second-order derivatives. We then classify the seven emotions in the dataset with two different classifiers: Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The probability density function of the spectral feature space is modeled with a GMM for each emotion, while the temporal patterns of the emotion-dependent prosody contours are modeled with an HMM-based classifier. We use the Berlin Emotional Speech dataset (EMO-DB) [1] in the experiments. The GMM classifier achieves the best overall recognition rate, 82.85%, when cepstral features with delta and acceleration coefficients are used. The HMM-based classifier yields lower recognition rates than the GMM-based classifier; however, fusion of the two classifiers reaches an average recognition rate of 83.80%. Experimental results on automatic facial expression synthesis are encouraging. | |
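As a rough illustration of the GMM branch described in the abstract, the sketch below fits one diagonal-covariance GMM per emotion on 39-dimensional cepstral frames (MFCCs with delta and acceleration coefficients) and labels an utterance by maximum average log-likelihood. The library choices (librosa, scikit-learn), the sample rate, the mixture size, and all function names are illustrative assumptions, not the authors' implementation.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def extract_features(wav_path, n_mfcc=13):
    """Cepstral features with delta and acceleration coefficients (assumed setup)."""
    y, sr = librosa.load(wav_path, sr=16000)  # 16 kHz is an assumption, not from the paper
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)             # first-order derivatives
    delta2 = librosa.feature.delta(mfcc, order=2)   # second-order (acceleration)
    return np.vstack([mfcc, delta, delta2]).T       # shape: (frames, 39)

def train_gmms(files_by_emotion, n_components=16):
    """Fit one GMM per emotion on pooled frame-level features."""
    gmms = {}
    for emotion, paths in files_by_emotion.items():
        frames = np.concatenate([extract_features(p) for p in paths])
        gmms[emotion] = GaussianMixture(n_components, covariance_type='diag').fit(frames)
    return gmms

def classify(wav_path, gmms):
    """Pick the emotion whose GMM gives the highest average log-likelihood."""
    frames = extract_features(wav_path)
    return max(gmms, key=lambda emotion: gmms[emotion].score(frames))

The paper additionally models temporal prosody contours with an HMM-based classifier and fuses the two classifiers; that branch is omitted here for brevity.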
dc.description.indexedby | WOS | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | NO | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.description.sponsorship | EC within FP6 [511568]; this work is supported by the EC within FP6 under Grant 511568 with the acronym 3DTV | |
dc.identifier.isbn | 978-1-4244-1760-5 | |
dc.identifier.issn | 2161-2021 | |
dc.identifier.scopus | 2-s2.0-51149092366 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/7443 | |
dc.identifier.wos | 258372100064 | |
dc.keywords | Emotion recognition | |
dc.keywords | Facial expression synthesis | |
dc.keywords | Classifier fusion | |
dc.language.iso | eng | |
dc.publisher | IEEE | |
dc.relation.ispartof | 2008 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video | |
dc.subject | Engineering | |
dc.subject | Electrical electronic engineering | |
dc.subject | Imaging science | |
dc.subject | Photographic technology | |
dc.title | Speech-driven automatic facial expression synthesis | |
dc.type | Conference Proceeding | |
dspace.entity.type | Publication | |
local.contributor.kuauthor | Erzin, Engin | |
local.contributor.kuauthor | Tekalp, Ahmet Murat | |
local.publication.orgunit1 | College of Engineering | |
local.publication.orgunit2 | Department of Computer Engineering | |
local.publication.orgunit2 | Department of Electrical and Electronics Engineering | |
relation.isOrgUnitOfPublication | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 21598063-a7c5-420d-91ba-0cc9b2db0ea0 | |
relation.isParentOrgUnitOfPublication | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 | |
relation.isParentOrgUnitOfPublication.latestForDiscovery | 8e756b23-2d4a-4ce8-b1b3-62c794a8c164 |