Publication: Investigating contributions of speech and facial landmarks for talking head generation
dc.contributor.department | N/A | |
dc.contributor.department | Department of Computer Engineering | |
dc.contributor.kuauthor | Kesim, Ege | |
dc.contributor.kuauthor | Erzin, Engin | |
dc.contributor.kuprofile | Faculty Member | |
dc.contributor.other | Department of Computer Engineering | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.schoolcollegeinstitute | College of Engineering | |
dc.contributor.yokid | N/A | |
dc.contributor.yokid | 34503 | |
dc.date.accessioned | 2024-11-09T13:10:40Z | |
dc.date.issued | 2021 | |
dc.description.abstract | Talking head generation is an active research problem. It has been widely studied as a direct speech-to-video or a two-stage speech-to-landmarks-to-video mapping problem. In this study, our main motivation is to assess the individual and joint contributions of speech and facial landmarks to talking head generation quality through a state-of-the-art generative adversarial network (GAN) architecture. Incorporating frame and sequence discriminators and a feature matching loss, we investigate the performance of speech-only, landmark-only, and joint speech- and landmark-driven talking head generation on the CREMA-D dataset. Objective evaluations using the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and landmark distance (LMD) indicate that while landmarks bring PSNR and SSIM improvements to the speech-driven system, speech brings an LMD improvement to the landmark-driven system. Furthermore, feature matching is observed to improve the speech-driven talking head generation models significantly. | |
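The abstract evaluates generated frames with PSNR, SSIM, and landmark distance (LMD). As a minimal illustrative sketch (not the authors' code), the PSNR and LMD metrics can be computed with NumPy as below; function names are illustrative, frames are assumed to be uint8 images in [0, 255], and landmarks are assumed to be (N, 2) arrays of (x, y) points.

```python
import numpy as np

def psnr(ref, gen, max_val=255.0):
    """Peak signal-to-noise ratio between a reference and a generated frame."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def landmark_distance(ref_pts, gen_pts):
    """LMD: mean Euclidean distance between corresponding facial landmarks."""
    return float(np.mean(np.linalg.norm(
        np.asarray(ref_pts, dtype=np.float64) - np.asarray(gen_pts, dtype=np.float64),
        axis=-1)))
```

SSIM is omitted here because its windowed computation is longer; in practice one would use an existing implementation such as `skimage.metrics.structural_similarity`.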
dc.description.fulltext | YES | |
dc.description.indexedby | Scopus | |
dc.description.openaccess | YES | |
dc.description.publisherscope | International | |
dc.description.sponsoredbyTubitakEu | N/A | |
dc.description.sponsorship | N/A | |
dc.description.version | Publisher version | |
dc.format | ||
dc.identifier.doi | 10.21437/Interspeech.2021-1585 | |
dc.identifier.embargo | NO | |
dc.identifier.filenameinventoryno | IR03341 | |
dc.identifier.isbn | 9.78171E+12 | |
dc.identifier.issn | 2308-457X | |
dc.identifier.link | https://doi.org/10.21437/Interspeech.2021-1585 | |
dc.identifier.quartile | N/A | |
dc.identifier.scopus | 2-s2.0-85119170398 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/2823 | |
dc.keywords | Speech driven animation | |
dc.keywords | Talking head generation | |
dc.language | English | |
dc.publisher | International Speech Communication Association (ISCA) | |
dc.relation.grantno | N/A | |
dc.relation.uri | http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/10127 | |
dc.source | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | |
dc.title | Investigating contributions of speech and facial landmarks for talking head generation | |
dc.type | Conference proceeding | |
dspace.entity.type | Publication | |
local.contributor.authorid | N/A | |
local.contributor.authorid | 0000-0002-2715-2368 | |
local.contributor.kuauthor | Kesim, Ege | |
local.contributor.kuauthor | Erzin, Engin | |
relation.isOrgUnitOfPublication | 89352e43-bf09-4ef4-82f6-6f9d0174ebae | |
relation.isOrgUnitOfPublication.latestForDiscovery | 89352e43-bf09-4ef4-82f6-6f9d0174ebae |