Publication:
Joint audio-video processing for biometric speaker identification

dc.conference.location: Baltimore, USA
dc.conference.organizer: 4th International Conference on Multimedia and Expo (ICME 2003)
dc.contributor.department: MVGL (Multimedia, Vision and Graphics Laboratory)
dc.contributor.facultymember: Yes
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Kanak, Alper
dc.contributor.kuauthor: Tekalp, Ahmet Murat
dc.contributor.kuauthor: Yemez, Yücel
dc.contributor.schoolcollegeinstitute: Laboratory
dc.date: JUL 06-09, 2003
dc.date.accessioned: 2024-11-09T22:59:03Z
dc.date.issued: 2003
dc.description.abstract: We present a bimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted from each video frame are transformed onto an eigenspace. The obtained eigenlip coefficients are interpolated to match the rate of the speech signal and fused with Mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are included to demonstrate the system performance.
dc.description.fulltext: No
dc.description.harvestedfrom: Manual
dc.description.indexedby: WOS
dc.description.indexedby: Scopus
dc.description.openaccess: NO
dc.description.peerreviewstatus: N/A
dc.description.publisherscope: International
dc.description.readpublish: N/A
dc.description.sponsoredbyTubitakEu: N/A
dc.description.version: N/A
dc.identifier.embargo: N/A
dc.identifier.endpage: 564
dc.identifier.isbn: 0780379659
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-0141590222
dc.identifier.startpage: 561
dc.identifier.uri: https://hdl.handle.net/20.500.14288/7831
dc.identifier.wos: 000185328600095
dc.keywords: Speech
dc.keywords: Multimodal biometrics
dc.keywords: Face recognition
dc.language.iso: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.affiliation: Koç University
dc.relation.collection: Koç University Institutional Repository
dc.relation.ispartof: 2003 International Conference on Multimedia and Expo, Vol III, Proceedings
dc.relation.openaccess: N/A
dc.rights: N/A
dc.subject: Computer science
dc.subject: Artificial intelligence
dc.subject: Engineering
dc.subject: Electrical and electronic engineering
dc.subject: Imaging science
dc.subject: Photographic technology
dc.title: Joint audio-video processing for biometric speaker identification
dc.type: Conference Proceeding
dspace.entity.type: Publication
local.contributor.kuauthor: Kanak, Alper
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Yemez, Yücel
local.contributor.kuauthor: Tekalp, Ahmet Murat
relation.isOrgUnitOfPublication: cb6bbbf6-fd19-4052-b581-f591a9748d21
relation.isOrgUnitOfPublication.latestForDiscovery: cb6bbbf6-fd19-4052-b581-f591a9748d21
relation.isParentOrgUnitOfPublication: 20385dee-35e7-484b-8da6-ddcc08271d96
relation.isParentOrgUnitOfPublication.latestForDiscovery: 20385dee-35e7-484b-8da6-ddcc08271d96
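
The feature pipeline summarized in the abstract (eigenspace projection of lip images, interpolation of the eigenlip coefficients to the speech frame rate, fusion with MFCCs) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not specify the decomposition method, the interpolation scheme, or the fusion rule, so the sketch assumes SVD-based PCA, linear interpolation, and plain concatenation. All function names, array shapes, and frame counts are hypothetical. MFCC extraction and HMM training are out of scope here, so the MFCC matrix is taken as a given input.

```python
import numpy as np

def fit_eigenlips(lip_images, n_components):
    """Fit an eigenspace to flattened lip images of shape (n_frames, n_pixels).

    Assumption: PCA via SVD of the mean-centered data.
    """
    mean = lip_images.mean(axis=0)
    centered = lip_images - mean
    # Rows of vt are the principal directions (the "eigenlips").
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(lip_images, mean, basis):
    """Return eigenlip coefficients of shape (n_frames, n_components)."""
    return (lip_images - mean) @ basis.T

def interpolate_to_rate(coeffs, n_out):
    """Resample each coefficient track to n_out frames.

    Assumption: linear interpolation over a normalized time axis.
    """
    t_in = np.linspace(0.0, 1.0, coeffs.shape[0])
    t_out = np.linspace(0.0, 1.0, n_out)
    tracks = [np.interp(t_out, t_in, coeffs[:, k]) for k in range(coeffs.shape[1])]
    return np.column_stack(tracks)

def fuse(mfcc, eigenlip_coeffs):
    """Build joint feature vectors, one per audio frame.

    Assumption: fusion is simple concatenation of the two feature sets.
    """
    upsampled = interpolate_to_rate(eigenlip_coeffs, mfcc.shape[0])
    return np.hstack([mfcc, upsampled])

# Synthetic stand-in data (illustrative sizes only).
rng = np.random.default_rng(0)
lips = rng.random((25, 16 * 24))       # 25 video frames of 16x24 lip regions
mfcc = rng.standard_normal((100, 13))  # 100 audio frames, 13 MFCCs each

mean, basis = fit_eigenlips(lips, n_components=8)
joint = fuse(mfcc, project(lips, mean, basis))
print(joint.shape)  # (100, 21)
```

In a full system, each row of `joint` would be one observation vector for the HMM-based identifier mentioned in the abstract.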
