Investigating contributions of speech and facial landmarks for talking head generation

Files

10127.pdf (414.5 KB)

Departments

Department of Computer Engineering

Graduate School of Sciences and Engineering

School / College / Institute

College of Engineering

Graduate School of Sciences and Engineering

KU-Authors

Erzin, Engin

Kesim, Ege

Date

2021

Type

Conference Proceeding

Embargo Status

NO

Abstract

Talking head generation is an active research problem. It has been widely studied as a direct speech-to-video or two-stage speech-to-landmarks-to-video mapping problem. In this study, our main motivation is to assess the individual and joint contributions of speech and facial landmarks to talking head generation quality through a state-of-the-art generative adversarial network (GAN) architecture. Incorporating frame and sequence discriminators and a feature matching loss, we investigate the performance of speech-only, landmark-only, and joint speech-and-landmark-driven talking head generation on the CREMA-D dataset. Objective evaluations using the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and landmark distance (LMD) indicate that, while landmarks bring PSNR and SSIM improvements to the speech-driven system, speech brings an LMD improvement to the landmark-driven system. Furthermore, feature matching is observed to significantly improve the speech-driven talking head generation models.
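
As a rough illustration of two of the objective metrics named in the abstract, the sketch below computes PSNR and LMD with their commonly used definitions: PSNR as 10·log10(MAX² / MSE), and LMD as the mean Euclidean distance between corresponding landmarks. These definitions, the function names, and the input layout (flat pixel lists, per-frame lists of (x, y) pairs) are assumptions for illustration; the paper's exact evaluation protocol may differ.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists.

    Assumes the common definition 10 * log10(max_value^2 / MSE).
    """
    if len(reference) != len(generated) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def landmark_distance(reference, generated):
    """Mean Euclidean distance between corresponding (x, y) landmarks.

    Both arguments are lists of frames; each frame is a list of (x, y) pairs.
    The average runs over every landmark in every frame (an assumed convention).
    """
    total, count = 0.0, 0
    for ref_frame, gen_frame in zip(reference, generated):
        for (rx, ry), (gx, gy) in zip(ref_frame, gen_frame):
            total += math.hypot(rx - gx, ry - gy)
            count += 1
    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count
```

Higher PSNR indicates closer pixel-level agreement with the reference frames, while lower LMD indicates closer agreement of facial landmark positions.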

Publisher

International Speech Communication Association (ISCA)

Subject

Photography, Image color analysis, Pipelines, Computer architecture, Network architecture, Noise measurement, Colored noise, Computational photography, Low-light imaging, Image denoising, Burst images

Source

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2021-1585

URI

https://hdl.handle.net/20.500.14288/2823

Collections

Publications with Fulltext
