Multimodal speaker identification using discriminative lip motion features

Type: Book Chapter
Departments: Department of Electrical and Electronics Engineering; Department of Computer Engineering
Record date: 2024-11-09
Publication year: 2009
ISBN: 978-1-60566-186-5
DOI: 10.4018/978-1-60566-186-5.ch016 (http://dx.doi.org/10.4018/978-1-60566-186-5.ch016)
Scopus EID: 2-s2.0-84900179389
Handle: https://hdl.handle.net/20.500.14288/12531
Keywords: Electrical electronics engineering; Computer engineering
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-84900179389&doi=10.4018%2f978-1-60566-186-5.ch016&partnerID=40&md5=fcb17d7d71b78420819c86c412554530

Abstract: This chapter presents a multimodal speaker identification system that integrates the audio, lip texture, and lip motion modalities, and the authors propose to use the "explicit" lip motion information that best represents the modality for the given problem. The work is presented in two stages. First, they consider several lip motion feature candidates, such as dense motion features on the lip region, motion features on the outer lip contour, and lip shape features, and introduce their main contribution: a novel two-stage spatial-temporal discrimination analysis framework designed to obtain the best lip motion features, i.e., those that yield the highest discrimination among speakers. Next, they investigate the benefit of including these best lip motion features in multimodal recognition. Audio, lip texture, and lip motion modalities are fused by the reliability weighted summation (RWS) decision rule, and hidden Markov model (HMM)-based modeling is performed for both unimodal and multimodal recognition. Experimental results indicate that discriminative grid-based lip motion features are the most valuable and provide additional performance gains in speaker identification. © 2009, IGI Global.
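The record contains no code; as a minimal sketch of the reliability weighted summation (RWS) fusion idea summarized in the abstract, the following Python snippet combines per-modality speaker scores with reliability weights. The function name fuse_rws, the min-max normalization, and the margin-based reliability estimate are illustrative assumptions, not necessarily the chapter's exact formulation.

import numpy as np

def fuse_rws(modality_scores):
    # Fuse per-modality speaker scores by reliability weighted summation.
    # modality_scores: one 1-D score array per modality (e.g., audio,
    # lip texture, lip motion), each entry scoring one enrolled speaker.
    # NOTE: the normalization and reliability estimate below are
    # common choices, assumed here for illustration.
    normed, weights = [], []
    for s in modality_scores:
        s = np.asarray(s, dtype=float)
        # Map scores into [0, 1] so modalities are comparable.
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)
        normed.append(s)
        # Reliability proxy: margin between the best and second-best
        # score; a confident modality separates its top candidate clearly.
        top_two = np.sort(s)[-2:]
        weights.append(top_two[1] - top_two[0])
    w = np.asarray(weights)
    w /= w.sum() + 1e-12  # reliability weights sum to one
    fused = sum(wi * si for wi, si in zip(w, normed))
    return int(np.argmax(fused))  # index of the identified speaker

# Example: three modalities scoring five enrolled speakers.
audio       = [0.2, 0.9, 0.1, 0.3, 0.4]
lip_texture = [0.5, 0.6, 0.2, 0.1, 0.3]
lip_motion  = [0.1, 0.8, 0.2, 0.2, 0.3]
print(fuse_rws([audio, lip_texture, lip_motion]))  # -> 1

Under this weighting, a modality whose top candidate clearly outscores the runner-up contributes more to the fused decision, which is the intuition behind reliability-based fusion of audio and visual streams.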