Publication: Multimodal speaker identification using discriminative lip motion features
Program
KU Authors
Co-Authors
Advisor
Publication Date
2009
Language
English
Type
Book Chapter
Abstract
This chapter presents a multimodal speaker identification system that integrates audio, lip texture, and lip motion modalities, and the authors propose to use the "explicit" lip motion information that best represents the modality for the given problem. The work is presented in two stages. First, several lip motion feature candidates are considered, such as dense motion features on the lip region, motion features on the outer lip contour, and lip shape features. The main contribution is a novel two-stage, spatial-temporal discrimination analysis framework designed to obtain the best lip motion features; for speaker identification, the best lip motion features are those that yield the highest discrimination among speakers. Next, the benefits of including the best lip motion features in multimodal recognition are investigated. Audio, lip texture, and lip motion modalities are fused by the reliability weighted summation (RWS) decision rule, and hidden Markov model (HMM)-based modeling is performed for both unimodal and multimodal recognition. Experimental results indicate that discriminative grid-based lip motion features are the most valuable and provide additional performance gains in speaker identification. © 2009, IGI Global.
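The reliability weighted summation (RWS) decision rule mentioned in the abstract can be illustrated with a minimal sketch: each modality produces per-speaker scores (e.g., HMM log-likelihoods), and the fused decision is an argmax over a reliability-weighted sum. The function name, the toy scores, and the fixed reliability weights below are illustrative assumptions, not the chapter's exact formulation (which estimates reliabilities from the data).

```python
import numpy as np

def rws_fuse(scores, reliabilities):
    """Reliability Weighted Summation fusion (illustrative sketch).

    scores: dict mapping modality name -> per-speaker score array
    reliabilities: dict mapping modality name -> scalar reliability
    Returns the index of the identified speaker.
    """
    # Normalize reliability weights so they sum to 1
    w = np.array([reliabilities[m] for m in scores], dtype=float)
    w = w / w.sum()
    # Weighted sum of per-modality score vectors
    fused = sum(wi * np.asarray(scores[m], dtype=float)
                for wi, m in zip(w, scores))
    return int(np.argmax(fused))

# Toy example: three modalities, four candidate speakers
scores = {
    "audio":       [0.2, 0.5, 0.1, 0.2],
    "lip_texture": [0.1, 0.4, 0.3, 0.2],
    "lip_motion":  [0.2, 0.6, 0.1, 0.1],
}
reliabilities = {"audio": 0.6, "lip_texture": 0.2, "lip_motion": 0.2}
print(rws_fuse(scores, reliabilities))  # -> 1 (speaker index with highest fused score)
```

In practice the reliability weights would be derived per utterance or per modality from confidence measures, rather than fixed constants as in this toy example.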
Description
Source:
Visual Speech Recognition: Lip Segmentation and Mapping
Publisher:
IGI Global
Keywords:
Subject
Electrical electronics engineering, Computer engineering