Publication:
Multimodal speaker identification with audio-video processing

dc.conference.date: SEP 14-17, 2003
dc.conference.location: Barcelona, Spain
dc.conference.organizer: IEEE International Conference on Image Processing
dc.contributor.department: MVGL (Multimedia, Vision and Graphics Laboratory)
dc.contributor.facultymember: Yes
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Kanak, Alper
dc.contributor.kuauthor: Tekalp, Ahmet Murat
dc.contributor.kuauthor: Yemez, Yücel
dc.contributor.schoolcollegeinstitute: Laboratory
dc.date.accessioned: 2024-11-10T00:06:40Z
dc.date.issued: 2003
dc.description.abstract: In this paper we present a multimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system decomposes the information existing in a video stream into three components: speech, face texture and lip motion. Lip motion between successive frames is first computed in terms of optical flow vectors and then encoded as a feature vector in a magnitude-direction histogram domain. The feature vectors obtained along the whole stream are then interpolated to match the rate of the speech signal and fused with mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in eigenface domain and integrated into the system through decision-fusion. Experimental results are also included for demonstration of the system performance.
dc.description.fulltext: Yes
dc.description.harvestedfrom: Manual
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.openaccess: Green OA
dc.description.peerreviewstatus: N/A
dc.description.publisherscope: International
dc.description.readpublish: N/A
dc.description.sponsoredbyTubitakEu: N/A
dc.description.studentonlypublication: No
dc.description.studentpublication: Yes
dc.description.version: Post-print
dc.identifier.embargo: No
dc.identifier.endpage: 8
dc.identifier.filenameinventoryno: IR06888
dc.identifier.isbn: 0780377508
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-0345565788
dc.identifier.startpage: 5
dc.identifier.uri: https://hdl.handle.net/20.500.14288/16652
dc.identifier.wos: 000187010500002
dc.keywords: Speech
dc.keywords: Pattern recognition
dc.keywords: Video processing
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.affiliation: Koç University
dc.relation.collection: Koç University Institutional Repository
dc.relation.ispartof: 2003 International Conference on Image Processing, Vol 3, Proceedings
dc.relation.openaccess: Yes
dc.rights: Other
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Imaging systems
dc.subject: Photography
dc.title: Multimodal speaker identification with audio-video processing
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Yemez, Yücel
local.contributor.kuauthor: Kanak, Alper
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Tekalp, Ahmet Murat
relation.isOrgUnitOfPublication: cb6bbbf6-fd19-4052-b581-f591a9748d21
relation.isOrgUnitOfPublication.latestForDiscovery: cb6bbbf6-fd19-4052-b581-f591a9748d21
relation.isParentOrgUnitOfPublication: 20385dee-35e7-484b-8da6-ddcc08271d96
relation.isParentOrgUnitOfPublication.latestForDiscovery: 20385dee-35e7-484b-8da6-ddcc08271d96

Files

Original bundle

Name: IR06888.pdf
Size: 342.9 KB
Format: Adobe Portable Document Format