Multimodal speaker identification with audio-video processing

Files

Primary IR06888.pdf (342.9 KB)

Departments

Organizational Unit

MVGL (Multimedia, Vision and Graphics Laboratory)

School / College / Institute

Organizational Unit

Laboratory

Date

2003

Type

Conference Proceeding

Embargo Status

No

Abstract

In this paper we present a multimodal audio-visual speaker identification system. The objective is to improve recognition performance over conventional unimodal schemes. The proposed system decomposes the information in a video stream into three components: speech, face texture, and lip motion. Lip motion between successive frames is first computed in terms of optical flow vectors and then encoded as a feature vector in a magnitude-direction histogram domain. The feature vectors obtained along the whole stream are then interpolated to match the rate of the speech signal and fused with the mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Experimental results are also included to demonstrate the system's performance.
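
The lip-motion branch described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the bin counts, the magnitude cap `max_mag`, the choice of linear interpolation, and the use of plain concatenation as the fusion step are all assumptions. The dense motion field and the MFCC matrix are taken as given inputs, since the record does not specify how they are computed.

```python
import numpy as np

def magnitude_direction_histogram(flow, n_mag_bins=4, n_dir_bins=8, max_mag=8.0):
    """Encode one motion field of shape (H, W, 2) as a normalized
    magnitude-direction histogram (bin counts and max_mag are assumed values)."""
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.clip(np.hypot(dx, dy), 0.0, max_mag)   # cap large displacements
    ang = np.arctan2(dy, dx)                        # direction in [-pi, pi]
    hist, _, _ = np.histogram2d(
        mag, ang,
        bins=[n_mag_bins, n_dir_bins],
        range=[[0.0, max_mag], [-np.pi, np.pi]],
    )
    total = hist.sum()
    return (hist / total).ravel() if total > 0 else hist.ravel()

def interpolate_to_rate(features, n_target):
    """Linearly resample a (T_video, D) feature sequence to n_target frames."""
    t_src = np.linspace(0.0, 1.0, features.shape[0])
    t_dst = np.linspace(0.0, 1.0, n_target)
    return np.column_stack(
        [np.interp(t_dst, t_src, features[:, d]) for d in range(features.shape[1])]
    )

def fuse(audio_features, visual_features):
    """Build joint vectors by upsampling the visual stream to the audio frame
    rate and concatenating per frame (concatenation is an assumption here)."""
    visual_up = interpolate_to_rate(visual_features, audio_features.shape[0])
    return np.hstack([audio_features, visual_up])

# Toy usage with synthetic data: 5 video frames, 10 audio frames, 13 MFCCs.
flows = np.random.default_rng(0).normal(size=(5, 16, 16, 2))
visual = np.stack([magnitude_direction_histogram(f) for f in flows])
mfcc = np.zeros((10, 13))          # placeholder for real MFCC frames
joint = fuse(mfcc, visual)         # one joint vector per audio frame
```

The joint vectors would then be passed to an HMM trainer per speaker; face texture scores from the eigenface branch are combined only at the decision stage and are not part of this sketch.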

Publisher

Institute of Electrical and Electronics Engineers

Subject

Computer science, Artificial intelligence, Imaging systems, Photography

Source

2003 International Conference on Image Processing, Vol 3, Proceedings

URI

https://hdl.handle.net/20.500.14288/16652

Rights

Other

Collections

Publications with Fulltext
