Publication:
Investigating contributions of speech and facial landmarks for talking head generation

dc.contributor.coauthorN/A
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.departmentGraduate School of Sciences and Engineering
dc.contributor.departmentKUIS AI (Koç University & İş Bank Artificial Intelligence Center)
dc.contributor.kuauthorErzin, Engin
dc.contributor.kuauthorKesim, Ege
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.schoolcollegeinstituteGRADUATE SCHOOL OF SCIENCES AND ENGINEERING
dc.contributor.schoolcollegeinstituteResearch Center
dc.date.accessioned2024-11-09T22:55:57Z
dc.date.issued2021
dc.description.abstractTalking head generation is an active research problem. It has been widely studied as a direct speech-to-video or two-stage speech-to-landmarks-to-video mapping problem. In this study, our main motivation is to assess individual and joint contributions of the speech and facial landmarks to the talking head generation quality through a state-of-the-art generative adversarial network (GAN) architecture. Incorporating frame and sequence discriminators and a feature matching loss, we investigate performances of speech-only, landmark-only, and joint speech and landmark driven talking head generation on the CREMA-D dataset. Objective evaluations using the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM) and landmark distance (LMD) indicate that while landmarks bring PSNR and SSIM improvements to the speech driven system, speech brings LMD improvement to the landmark driven system. Furthermore, feature matching is observed to improve the speech driven talking head generation models significantly.
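The abstract's objective evaluation relies on PSNR and landmark distance (LMD). As a minimal sketch of the standard definitions of these two metrics (not the paper's exact implementation, and assuming 8-bit frames and 2D landmark arrays of shape frames × landmarks × 2):

```python
import numpy as np

def psnr(ref, gen, max_val=255.0):
    """Peak signal-to-noise ratio between a reference and a generated frame.
    Standard definition: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - gen.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

def landmark_distance(ref_lmk, gen_lmk):
    """LMD: mean Euclidean distance between corresponding facial landmarks,
    averaged over all landmarks and frames. Arrays assumed to have shape
    (frames, landmarks, 2)."""
    diff = ref_lmk - gen_lmk
    return float(np.mean(np.linalg.norm(diff, axis=-1)))

# A perfectly reconstructed frame gives infinite PSNR;
# perfectly predicted landmarks give LMD = 0.
frame = np.random.randint(0, 256, (64, 64, 3))
print(psnr(frame, frame))
lmks = np.random.rand(10, 68, 2)  # 68 landmarks is a common convention
print(landmark_distance(lmks, lmks))
```

Higher PSNR (and SSIM) indicates better frame-level reconstruction, while lower LMD indicates better lip and face shape accuracy, which is why the two metric families respond differently to the speech-driven and landmark-driven conditions.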
dc.description.indexedbyWOS
dc.description.indexedbyScopus
dc.description.openaccessYES
dc.description.publisherscopeInternational
dc.description.sponsoredbyTubitakEuN/A
dc.identifier.doi10.21437/interspeech.2021-1585
dc.identifier.issn2308-457X
dc.identifier.quartileN/A
dc.identifier.scopus2-s2.0-85119170398
dc.identifier.urihttps://doi.org/10.21437/interspeech.2021-1585
dc.identifier.urihttps://hdl.handle.net/20.500.14288/7273
dc.identifier.wos841879501148
dc.keywordsTalking head generation
dc.keywordsSpeech driven animation
dc.language.isoeng
dc.publisherInternational Speech Communication Association (ISCA)
dc.relation.ispartofInterspeech 2021
dc.subjectAudiology
dc.subjectSpeech-language pathology
dc.subjectComputer science
dc.subjectArtificial intelligence
dc.subjectComputer science
dc.subjectSoftware engineering
dc.titleInvestigating contributions of speech and facial landmarks for talking head generation
dc.typeConference Proceeding
dspace.entity.typePublication
local.contributor.kuauthorKesim, Ege
local.contributor.kuauthorErzin, Engin
local.publication.orgunit1GRADUATE SCHOOL OF SCIENCES AND ENGINEERING
local.publication.orgunit1College of Engineering
local.publication.orgunit1Research Center
local.publication.orgunit2Department of Computer Engineering
local.publication.orgunit2KUIS AI (Koç University & İş Bank Artificial Intelligence Center)
local.publication.orgunit2Graduate School of Sciences and Engineering
relation.isOrgUnitOfPublication89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication3fc31c89-e803-4eb1-af6b-6258bc42c3d8
relation.isOrgUnitOfPublication77d67233-829b-4c3a-a28f-bd97ab5c12c7
relation.isOrgUnitOfPublication.latestForDiscovery89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isParentOrgUnitOfPublication8e756b23-2d4a-4ce8-b1b3-62c794a8c164
relation.isParentOrgUnitOfPublication434c9663-2b11-4e66-9399-c863e2ebae43
relation.isParentOrgUnitOfPublicationd437580f-9309-4ecb-864a-4af58309d287
relation.isParentOrgUnitOfPublication.latestForDiscovery8e756b23-2d4a-4ce8-b1b3-62c794a8c164
