Publication:
Audiovisual synchronization and fusion using canonical correlation analysis

dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.kuauthor: Sargın, Mehmet Emre
dc.contributor.kuauthor: Yemez, Yücel
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Tekalp, Ahmet Murat
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.other: Department of Electrical and Electronics Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.yokid: N/A
dc.contributor.yokid: N/A
dc.contributor.yokid: 34503
dc.contributor.yokid: 26207
dc.date.accessioned: 2024-11-09T12:29:11Z
dc.date.issued: 2007
dc.description.abstract: It is well known that early integration (also called data fusion) is effective when the modalities are correlated, and late integration (also called decision or opinion fusion) is optimal when the modalities are uncorrelated. In this paper, we propose a new multimodal fusion strategy for open-set speaker identification using a combination of early and late integration following canonical correlation analysis (CCA) of speech and lip-texture features. We also propose a method for high-precision synchronization of the speech and lip features using CCA prior to the proposed fusion. Experimental results show that i) the proposed fusion strategy yields the best equal error rates (EER), which are used to quantify the performance of the fusion strategy for open-set speaker identification, and ii) precise synchronization prior to fusion improves the EER; hence, the best EER is obtained when the proposed synchronization scheme is employed together with the proposed fusion strategy. We note that the proposed fusion strategy outperforms the others because the features used in the late integration are truly uncorrelated, being outputs of the CCA.
dc.description.fulltext: YES
dc.description.indexedby: WoS
dc.description.indexedby: Scopus
dc.description.issue: 7
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: EU
dc.description.sponsorship: European FP6 Network of Excellence SIMILAR
dc.description.version: Author's final manuscript
dc.description.volume: 9
dc.format: pdf
dc.identifier.doi: 10.1109/TMM.2007.906583
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR01073
dc.identifier.issn: 1520-9210
dc.identifier.link: https://doi.org/10.1109/TMM.2007.906583
dc.identifier.quartile: Q1
dc.identifier.scopus: 2-s2.0-57549101447
dc.identifier.uri: https://hdl.handle.net/20.500.14288/1845
dc.identifier.wos: 250447400006
dc.keywords: Information systems
dc.keywords: Software engineering
dc.keywords: Telecommunications
dc.keywords: Audiovisual synchronization
dc.keywords: Correlation
dc.keywords: Multimodal
dc.keywords: Fusion
dc.keywords: Speaker recognition
dc.language: English
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/6100
dc.source: IEEE Transactions on Multimedia
dc.subject: Computer science
dc.title: Audiovisual synchronization and fusion using canonical correlation analysis
dc.type: Journal Article
dspace.entity.type: Publication
local.contributor.authorid: N/A
local.contributor.authorid: N/A
local.contributor.authorid: 0000-0002-2715-2368
local.contributor.authorid: 0000-0003-1465-8121
local.contributor.kuauthor: Sargın, Mehmet Emre
local.contributor.kuauthor: Yemez, Yücel
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Tekalp, Ahmet Murat
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication.latestForDiscovery: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
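
Illustrative sketch

The abstract describes synchronizing the speech and lip-texture streams by finding the temporal offset that maximizes their canonical correlation, before fusing them. The following Python sketch illustrates only that general idea; it is not the authors' implementation. The use of scikit-learn's CCA, the function names, the (n_frames, n_dims) feature layout, and the 5-frame search window are all assumptions made here for illustration.

import numpy as np
from sklearn.cross_decomposition import CCA

def first_canonical_correlation(X, Y):
    # Project both feature streams onto their first canonical pair
    # and return the correlation of the two projections.
    cca = CCA(n_components=1)
    Xc, Yc = cca.fit_transform(X, Y)
    return np.corrcoef(Xc[:, 0], Yc[:, 0])[0, 1]

def estimate_av_offset(speech_feats, lip_feats, max_shift=5):
    # Search integer frame offsets and keep the one that maximizes
    # the canonical correlation between the two streams.
    # speech_feats, lip_feats: equal-length (n_frames, n_dims) arrays
    # sampled at a common frame rate (an assumption of this sketch);
    # max_shift is an assumed search window, not a value from the paper.
    best_shift, best_corr = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            X, Y = speech_feats[s:], lip_feats[:len(lip_feats) - s]
        else:
            X, Y = speech_feats[:len(speech_feats) + s], lip_feats[-s:]
        r = first_canonical_correlation(X, Y)
        if r > best_corr:
            best_shift, best_corr = s, r
    return best_shift, best_corr

Once aligned, the CCA projections are by construction maximally correlated, so they are natural inputs for early integration, while components left uncorrelated by the analysis suit late (decision-level) fusion, which is the intuition behind the combined strategy the abstract outlines.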

Files

Original bundle

Name: 6100.pdf
Size: 262.72 KB
Format: Adobe Portable Document Format