Publication:
Multimodal speaker identification using canonical correlation analysis

dc.contributor.department: N/A
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.kuauthor: Sargın, Mehmet Emre
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Yemez, Yücel
dc.contributor.kuauthor: Tekalp, Ahmet Murat
dc.contributor.kuprofile: Master Student
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.other: Department of Electrical and Electronics Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: N/A
dc.contributor.yokid: 34503
dc.contributor.yokid: 107907
dc.contributor.yokid: 26207
dc.date.accessioned: 2024-11-09T22:53:07Z
dc.date.issued: 2006
dc.description.abstract: In this work, we explore the use of canonical correlation analysis to improve the performance of multimodal recognition systems that involve multiple correlated modalities. More specifically, we consider the audiovisual speaker identification problem, where speech and lip texture (or intensity) modalities are fused in an open-set identification framework. Our motivation is based on the following observation: the late integration strategy, also referred to as decision or opinion fusion, is especially effective when the contributing modalities are uncorrelated, so that the resulting partial decisions are statistically independent. Early integration techniques, on the other hand, are favored only when the modalities are highly correlated. However, coupled modalities such as audio and lip texture also contain components that are mutually independent. We therefore first perform a cross-correlation analysis on the audio and lip modalities to extract the correlated part of the information, and then employ an optimal combination of early and late integration techniques to fuse the extracted features. Experimental results evaluating the performance of the proposed system are also provided.
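
The pipeline described in the abstract can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: it uses scikit-learn's CCA on synthetic stand-ins for audio and lip-texture features (the feature dimensions, frame count, and number of canonical components are all hypothetical) to extract the correlated part of the two modalities, which would feed early integration, while the remaining, mutually independent components would be handled by separate classifiers combined via late (decision) fusion.

```python
# Hedged sketch of CCA-based audio/lip feature extraction.
# All dimensions and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Synthetic per-frame features: a shared latent source makes part of the
# two modalities correlated, the rest is independent noise.
n_frames = 500
shared = rng.standard_normal((n_frames, 4))               # correlated source
audio = np.hstack([shared @ rng.standard_normal((4, 13)),
                   rng.standard_normal((n_frames, 13))])  # 26-dim "audio"
lip = np.hstack([shared @ rng.standard_normal((4, 8)),
                 rng.standard_normal((n_frames, 8))])     # 16-dim "lip"

# Canonical correlation analysis: find projections of each modality that
# are maximally correlated with each other.
cca = CCA(n_components=4)
cca.fit(audio, lip)
audio_c, lip_c = cca.transform(audio, lip)

# Early integration of the correlated part: concatenate the canonical
# projections into one joint feature vector per frame. In the paper's
# scheme, the residual (independent) components would instead be scored
# by separate classifiers and merged by late (decision) fusion.
joint = np.hstack([audio_c, lip_c])
print(joint.shape)  # (500, 8)
```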
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.openaccess: NO
dc.identifier.doi: N/A
dc.identifier.isbn: 978-1-4244-0468-1
dc.identifier.issn: 1520-6149
dc.identifier.scopus: 2-s2.0-33947376189
dc.identifier.uri: https://hdl.handle.net/20.500.14288/7146
dc.identifier.wos: 245559901036
dc.keywords: N/A
dc.language: English
dc.publisher: IEEE
dc.source: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13
dc.subject: Acoustics
dc.subject: Computer Science
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject: Software
dc.subject: Electrical and electronics engineering
dc.title: Multimodal speaker identification using canonical correlation analysis
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.authorid: N/A
local.contributor.authorid: 0000-0002-2715-2368
local.contributor.authorid: 0000-0002-7515-3138
local.contributor.authorid: 0000-0003-1465-8121
local.contributor.kuauthor: Sargın, Mehmet Emre
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Yemez, Yücel
local.contributor.kuauthor: Tekalp, Ahmet Murat
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
