Publication:
Joint audio-video processing for biometric speaker identification

dc.contributor.coauthorN/A
dc.contributor.departmentN/A
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.departmentDepartment of Electrical and Electronics Engineering
dc.contributor.kuauthorKanak, Alper
dc.contributor.kuauthorErzin, Engin
dc.contributor.kuauthorYemez, Yücel
dc.contributor.kuauthorTekalp, Ahmet Murat
dc.contributor.kuprofileMaster Student
dc.contributor.kuprofileFaculty Member
dc.contributor.kuprofileFaculty Member
dc.contributor.kuprofileFaculty Member
dc.contributor.otherDepartment of Computer Engineering
dc.contributor.otherDepartment of Electrical and Electronics Engineering
dc.contributor.schoolcollegeinstituteGraduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.yokidN/A
dc.contributor.yokid34503
dc.contributor.yokid107907
dc.contributor.yokid26207
dc.date.accessioned2024-11-09T22:59:03Z
dc.date.issued2003
dc.description.abstractWe present a bimodal audio-visual speaker identification system. The objective is to improve the recognition performance over conventional unimodal schemes. The proposed system exploits not only the temporal and spatial correlations existing in the speech and video signals of a speaker, but also the cross-correlation between these two modalities. Lip images extracted from each video frame are transformed onto an eigenspace. The obtained eigenlip coefficients are interpolated to match the rate of the speech signal and fused with Mel frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a hidden Markov model (HMM) based identification system. Experimental results are included to demonstrate the system performance.
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.openaccessNO
dc.description.publisherscopeInternational
dc.identifier.doiN/A
dc.identifier.isbn0-7803-7965-9
dc.identifier.quartileN/A
dc.identifier.scopus2-s2.0-0141590222
dc.identifier.uriN/A
dc.identifier.urihttps://hdl.handle.net/20.500.14288/7831
dc.identifier.wos185328600095
dc.keywordsSpeech
dc.languageEnglish
dc.publisherIEEE
dc.source2003 international Conference on Multimedia and Expo, Vol Iii, Proceedings
dc.subjectComputer science
dc.subjectArtificial intelligence
dc.subjectEngineering
dc.subjectElectrical and electronic engineering
dc.subjectImaging science
dc.subjectPhotographic technology
dc.titleJoint audio-video processing for biometric speaker identification
dc.typeConference proceeding
dspace.entity.typePublication
local.contributor.authorid0000-0003-2541-7753
local.contributor.authorid0000-0002-2715-2368
local.contributor.authorid0000-0002-7515-3138
local.contributor.authorid0000-0003-1465-8121
local.contributor.kuauthorKanak, Alper
local.contributor.kuauthorErzin, Engin
local.contributor.kuauthorYemez, Yücel
local.contributor.kuauthorTekalp, Ahmet Murat
relation.isOrgUnitOfPublication89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery21598063-a7c5-420d-91ba-0cc9b2db0ea0

Files