Publication:
Multimodal speaker identification using discriminative lip motion features

dc.contributor.department: Department of Electrical and Electronics Engineering
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: Department of Computer Engineering
dc.contributor.department: N/A
dc.contributor.kuauthor: Tekalp, Ahmet Murat
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Yemez, Yücel
dc.contributor.kuauthor: Çetingül, Hasan Ertan
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Master Student
dc.contributor.other: Department of Electrical and Electronics Engineering
dc.contributor.other: Department of Computer Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.yokid: 26207
dc.contributor.yokid: 34503
dc.contributor.yokid: 107907
dc.contributor.yokid: N/A
dc.date.accessioned: 2024-11-09T23:35:37Z
dc.date.issued: 2009
dc.description.abstract: This chapter presents a multimodal speaker identification system that integrates audio, lip texture, and lip motion modalities, in which the authors propose to use the "explicit" lip motion information that best represents the modality for the given problem. The work proceeds in two stages. First, they consider several lip motion feature candidates, such as dense motion features on the lip region, motion features on the outer lip contour, and lip shape features, and introduce their main contribution: a novel two-stage spatiotemporal discrimination analysis framework designed to obtain the best lip motion features, i.e., those yielding the highest discrimination among speakers. Next, they investigate the benefit of including these best lip motion features in multimodal recognition. Audio, lip texture, and lip motion modalities are fused by the reliability weighted summation (RWS) decision rule, and hidden Markov model (HMM)-based modeling is performed for both unimodal and multimodal recognition. Experimental results show that discriminative grid-based lip motion features prove more valuable and provide additional performance gains in speaker identification. © 2009, IGI Global.
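
For illustration, here is a minimal Python sketch of the reliability-weighted-summation (RWS) fusion step described in the abstract. It assumes each modality's HMM has already produced per-speaker log-likelihoods normalized to a common range; the function name rws_fuse, the dictionary layout, and the example numbers are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def rws_fuse(scores, weights):
        """Fuse per-modality speaker scores with reliability weights.

        scores  : dict mapping modality name -> array of shape (num_speakers,)
                  holding normalized HMM log-likelihoods for that modality
                  (normalization is assumed to have been done upstream).
        weights : dict mapping modality name -> reliability weight in [0, 1],
                  assumed to sum to 1.
        Returns the index of the identified speaker (argmax of the fused score).
        """
        fused = sum(weights[m] * np.asarray(scores[m]) for m in scores)
        return int(np.argmax(fused))

    # Hypothetical example: three modalities, three enrolled speakers.
    scores = {
        "audio":       np.array([0.9, 0.4, 0.2]),
        "lip_texture": np.array([0.6, 0.7, 0.1]),
        "lip_motion":  np.array([0.8, 0.3, 0.5]),
    }
    weights = {"audio": 0.5, "lip_texture": 0.3, "lip_motion": 0.2}
    print(rws_fuse(scores, weights))  # -> 0: speaker 0 has the highest fused score

The weighted sum rewards speakers that score well under the more reliable modalities, which is the intuition behind letting audio, lip texture, and lip motion compensate for one another.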
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.identifier.doi: 10.4018/978-1-60566-186-5.ch016
dc.identifier.isbn: 978-1-60566-186-5
dc.identifier.link: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84900179389&doi=10.4018%2f978-1-60566-186-5.ch016&partnerID=40&md5=fcb17d7d71b78420819c86c412554530
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-84900179389
dc.identifier.uri: http://dx.doi.org/10.4018/978-1-60566-186-5.ch016
dc.identifier.uri: https://hdl.handle.net/20.500.14288/12531
dc.keywords: N/A
dc.language: English
dc.publisher: IGI Global
dc.source: Visual Speech Recognition: Lip Segmentation and Mapping
dc.subject: Electrical electronics engineering
dc.subject: Computer engineering
dc.title: Multimodal speaker identification using discriminative lip motion features
dc.type: Book Chapter
dspace.entity.type: Publication
local.contributor.authorid: 0000-0003-1465-8121
local.contributor.authorid: 0000-0002-2715-2368
local.contributor.authorid: 0000-0002-7515-3138
local.contributor.authorid: N/A
local.contributor.kuauthor: Tekalp, Ahmet Murat
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Yemez, Yücel
local.contributor.kuauthor: Çetingül, Hasan Ertan
relation.isOrgUnitOfPublication: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 21598063-a7c5-420d-91ba-0cc9b2db0ea0
