Multimodal speaker/speech recognition using lip motion, lip texture and audio

Departments

Department of Computer Engineering

School / College / Institute

College of Engineering

KU-Authors

Çetingül, Hasan Ertan

Erzin, Engin

Tekalp, Ahmet Murat

Yemez, Yücel

Date

2006

Type

Journal Article

Embargo Status

N/A

Abstract

We present a new multimodal speaker/speech recognition system that integrates audio, lip texture and lip motion modalities. Fusion of audio and face texture modalities has been investigated in the literature before. The emphasis of this work is to investigate the benefits of inclusion of lip motion modality for two distinct cases: speaker and speech recognition. The audio modality is represented by the well-known mel-frequency cepstral coefficients (MFCC) along with the first and second derivatives, whereas lip texture modality is represented by the 2D-DCT coefficients of the luminance component within a bounding box about the lip region. In this paper, we employ a new lip motion modality representation based on discriminative analysis of the dense motion vectors within the same bounding box for speaker/speech recognition. The fusion of audio, lip texture and lip motion modalities is performed by the so-called reliability weighted summation (RWS) decision rule. Experimental results show that inclusion of lip motion modality provides further performance gains over those which are obtained by fusion of audio and lip texture alone, in both speaker identification and isolated word recognition scenarios.
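As an illustration of the lip texture representation described in the abstract, the following is a minimal sketch of taking the 2D-DCT of the luminance values inside a lip bounding box and keeping a small block of low-frequency coefficients as the feature vector. The function names, the `keep` parameter, and the choice of a square low-frequency block are assumptions made for this example; the abstract does not state how many coefficients are retained or how they are selected.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)
        )
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in zip(*rows_done)]
    # cols_done is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols_done)]


def lip_texture_features(luma_box, keep=4):
    """Return the keep x keep lowest-frequency 2D-DCT coefficients.

    luma_box: list of rows of luminance values from the lip bounding box.
    keep: hypothetical truncation size (an assumption, not from the paper).
    """
    coeffs = dct_2d(luma_box)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


if __name__ == "__main__":
    # A flat grey patch stands in for a real lip region here.
    patch = [[128.0] * 16 for _ in range(8)]
    features = lip_texture_features(patch, keep=4)
    print(len(features), features[0])
```

For a uniform patch, all of the energy falls in the first (DC) coefficient, which is why low-frequency coefficients are a compact summary of smooth texture. A real system would apply this per video frame after locating the lip region.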

Publisher

Elsevier

Subject

Engineering, electrical and electronic

Source

Signal Processing

DOI

10.1016/j.sigpro.2006.02.045

URI

https://doi.org/10.1016/j.sigpro.2006.02.045
https://hdl.handle.net/20.500.14288/10249
