Publication:
Investigating contributions of speech and facial landmarks for talking head generation

dc.contributor.department: N/A
dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Kesim, Ege
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: N/A
dc.contributor.yokid: 34503
dc.date.accessioned: 2024-11-09T13:10:40Z
dc.date.issued: 2021
dc.description.abstract: Talking head generation is an active research problem. It has been widely studied as a direct speech-to-video or a two-stage speech-to-landmarks-to-video mapping problem. In this study, our main motivation is to assess the individual and joint contributions of speech and facial landmarks to talking head generation quality through a state-of-the-art generative adversarial network (GAN) architecture. Incorporating frame and sequence discriminators and a feature matching loss, we investigate the performance of speech-only, landmark-only, and joint speech- and landmark-driven talking head generation on the CREMA-D dataset. Objective evaluations using the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and landmark distance (LMD) indicate that while landmarks bring PSNR and SSIM improvements to the speech-driven system, speech brings an LMD improvement to the landmark-driven system. Furthermore, feature matching is observed to improve the speech-driven talking head generation models significantly.
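The abstract above reports objective evaluation with PSNR, SSIM, and landmark distance (LMD). As a minimal sketch of what two of these metrics measure, the following Python snippet implements the standard PSNR formula and a mean-Euclidean-distance LMD; this is an illustrative reimplementation from the textbook definitions, not the authors' evaluation code, and the toy frames and landmark coordinates are made up for the example.

```python
import numpy as np

def psnr(ref, gen, max_val=255.0):
    # Peak signal-to-noise ratio between two frames (higher is better):
    # 10 * log10(MAX^2 / MSE).
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def lmd(ref_pts, gen_pts):
    # Landmark distance: mean Euclidean distance over corresponding
    # facial landmark points (lower is better).
    return float(np.mean(np.linalg.norm(ref_pts - gen_pts, axis=-1)))

# Toy 64x64 frames differing in a single pixel (illustrative only).
ref = np.zeros((64, 64), dtype=np.uint8)
gen = ref.copy()
gen[0, 0] = 16
print(round(psnr(ref, gen), 2))  # -> 60.17

# Two toy landmark sets of two points each.
pts_a = np.array([[10.0, 10.0], [20.0, 30.0]])
pts_b = np.array([[13.0, 14.0], [20.0, 30.0]])
print(lmd(pts_a, pts_b))  # mean of distances 5.0 and 0.0 -> 2.5
```

SSIM is omitted here because its windowed luminance/contrast/structure computation is considerably longer; in practice it is usually taken from an image-processing library rather than reimplemented.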
dc.description.fulltext: YES
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: N/A
dc.description.sponsorship: N/A
dc.description.version: Publisher version
dc.format: pdf
dc.identifier.doi: 10.21437/Interspeech.2021-1585
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR03341
dc.identifier.isbn: 9.78171E+12
dc.identifier.issn: 2308-457X
dc.identifier.link: https://doi.org/10.21437/Interspeech.2021-1585
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85119170398
dc.identifier.uri: https://hdl.handle.net/20.500.14288/2823
dc.keywords: Speech driven animation
dc.keywords: Talking head generation
dc.language: English
dc.publisher: International Speech Communication Association (ISCA)
dc.relation.grantno: NA
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/10127
dc.source: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.subject: Photography
dc.subject: Image color analysis
dc.subject: Pipelines
dc.subject: Computer architecture
dc.subject: Network architecture
dc.subject: Noise measurement
dc.subject: Colored noise
dc.subject: Computational photography
dc.subject: Low-light imaging
dc.subject: Image denoising
dc.subject: Burst images
dc.title: Investigating contributions of speech and facial landmarks for talking head generation
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.authorid: N/A
local.contributor.authorid: 0000-0002-2715-2368
local.contributor.kuauthor: Kesim, Ege
local.contributor.kuauthor: Erzin, Engin
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae

Files

Original bundle

Name: 10127.pdf
Size: 414.5 KB
Format: Adobe Portable Document Format