Multimodal speaker identification with audio-video processing

dc.conference.date	SEP 14-17, 2003
dc.conference.location	Barcelona, Spain
dc.conference.organizer	IEEE International Conference on Image Processing
dc.contributor.department	MVGL (Multimedia, Vision and Graphics Laboratory)
dc.contributor.facultymember	Yes
dc.contributor.kuauthor	Erzin, Engin
dc.contributor.kuauthor	Kanak, Alper
dc.contributor.kuauthor	Tekalp, Ahmet Murat
dc.contributor.kuauthor	Yemez, Yücel
dc.contributor.schoolcollegeinstitute	Laboratory
dc.date.accessioned	2024-11-10T00:06:40Z
dc.date.issued	2003
dc.description.abstract	In this paper we present a multimodal audio-visual speaker identification system. The objective is to improve recognition performance over conventional unimodal schemes. The proposed system decomposes the information in a video stream into three components: speech, face texture and lip motion. Lip motion between successive frames is first computed in terms of optical flow vectors and then encoded as a feature vector in a magnitude-direction histogram domain. The feature vectors obtained along the whole stream are then interpolated to match the rate of the speech signal and fused with the mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Experimental results are included to demonstrate the system's performance.
dc.description.fulltext	Yes
dc.description.harvestedfrom	Manual
dc.description.indexedby	WOS
dc.description.indexedby	Scopus
dc.description.openaccess	Green OA
dc.description.peerreviewstatus	N/A
dc.description.publisherscope	International
dc.description.readpublish	N/A
dc.description.sponsoredbyTubitakEu	N/A
dc.description.studentonlypublication	No
dc.description.studentpublication	Yes
dc.description.version	Post-print
dc.identifier.embargo	No
dc.identifier.endpage	8
dc.identifier.filenameinventoryno	IR06888
dc.identifier.isbn	0780377508
dc.identifier.quartile	N/A
dc.identifier.scopus	2-s2.0-0345565788
dc.identifier.startpage	5
dc.identifier.uri	https://hdl.handle.net/20.500.14288/16652
dc.identifier.wos	000187010500002
dc.keywords	Speech
dc.keywords	Pattern recognition
dc.keywords	Video processing
dc.language.iso	eng
dc.publisher	Institute of Electrical and Electronics Engineers
dc.relation.affiliation	Koç University
dc.relation.collection	Koç University Institutional Repository
dc.relation.ispartof	2003 International Conference on Image Processing, Vol 3, Proceedings
dc.relation.openaccess	Yes
dc.rights	Other
dc.subject	Computer science
dc.subject	Artificial intelligence
dc.subject	Imaging systems
dc.subject	Photography
dc.title	Multimodal speaker identification with audio-video processing
dc.type	Conference Proceeding
dspace.entity.type	Publication
local.contributor.kuauthor	Yemez, Yücel
local.contributor.kuauthor	Kanak, Alper
local.contributor.kuauthor	Erzin, Engin
local.contributor.kuauthor	Tekalp, Ahmet Murat
relation.isOrgUnitOfPublication	cb6bbbf6-fd19-4052-b581-f591a9748d21
relation.isOrgUnitOfPublication.latestForDiscovery	cb6bbbf6-fd19-4052-b581-f591a9748d21
relation.isParentOrgUnitOfPublication	20385dee-35e7-484b-8da6-ddcc08271d96
relation.isParentOrgUnitOfPublication.latestForDiscovery	20385dee-35e7-484b-8da6-ddcc08271d96
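
Illustrative sketch (not part of the record): the feature-level fusion outlined in the abstract can be pictured roughly as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. The function names, the bin counts (`n_mag`, `n_dir`), the `max_mag` clipping value and the use of linear interpolation are all hypothetical choices made for illustration; the abstract does not specify them. Optical flow estimation, MFCC extraction, HMM training and the eigenface decision fusion are not shown.

```python
import math


def magnitude_direction_histogram(flow, n_mag=4, n_dir=8, max_mag=8.0):
    """Encode 2-D motion vectors (dx, dy) as a normalised, flattened
    magnitude-direction histogram. Bin layout is an assumption."""
    hist = [0.0] * (n_mag * n_dir)
    for dx, dy in flow:
        mag = math.hypot(dx, dy)
        ang = math.atan2(dy, dx) % (2 * math.pi)  # map angle to [0, 2*pi)
        # Clamp indices so large magnitudes and rounding at 2*pi stay in range.
        m = min(int(mag / max_mag * n_mag), n_mag - 1)
        d = min(int(ang / (2 * math.pi) * n_dir), n_dir - 1)
        hist[m * n_dir + d] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def interpolate_sequence(frames, target_len):
    """Linearly interpolate a sequence of equal-length vectors to target_len."""
    if target_len <= 0 or not frames:
        return []
    if len(frames) == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    scale = (len(frames) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = min(int(pos), len(frames) - 2)
        w = pos - lo
        out.append([(1 - w) * a + w * b
                    for a, b in zip(frames[lo], frames[lo + 1])])
    return out


def fuse(audio_feats, visual_feats):
    """Upsample visual features to the audio frame count and concatenate."""
    visual_up = interpolate_sequence(visual_feats, len(audio_feats))
    return [list(a) + v for a, v in zip(audio_feats, visual_up)]
```

In this sketch, `audio_feats` stands in for per-frame MFCC vectors and `visual_feats` for per-video-frame lip motion histograms; `fuse` returns one joint vector per audio frame, which is the kind of input the abstract describes feeding to the HMM-based identifier.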

Files

Original bundle

Name: IR06888.pdf
Size: 342.9 KB
Format: Adobe Portable Document Format

Collections

Publications with Fulltext
