Robust lip-motion features for speaker identification

Date: 2024-11-10
Publication year: 2005
ISBN: 0-7803-8874-7
ISSN: 1520-6149
Scopus EID: 2-s2.0-33646818965
URI: https://hdl.handle.net/20.500.14288/16559
Language: English
Subjects: Computer science; Artificial intelligence; Engineering; Electrical electronic engineering
Type: Conference Proceeding

This paper addresses the selection of robust lip-motion features for the audio-visual open-set speaker identification problem. We consider two alternatives for the initial lip-motion representation. In the first alternative, the feature vector is composed of the 2D-DCT coefficients of the motion vectors estimated within the detected rectangular mouth region. In the second, the lip boundaries are tracked over the video frames, and only the motion vectors around the lip contour are taken into account, along with the shape of the lip boundary. Experimental results from the HMM-based identification system are included to compare the performance of the two lip-motion representations.
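
To make the first representation more concrete, the following is a minimal sketch of how a feature vector might be built from the 2D-DCT coefficients of a motion field over a rectangular mouth region. It is an illustration only, not the paper's implementation: the function names, the 4x4 grid, the orthonormal DCT-II scaling, and the choice to keep only the top-left `keep` x `keep` low-frequency block are all assumptions, and the motion estimation step itself is not shown.

```python
import math


def dct_1d(signal):
    """Orthonormal DCT-II of a 1-D sequence (plain Python, O(n^2))."""
    n = len(signal)
    out = []
    for k in range(n):
        total = sum(
            x * math.cos(math.pi * (i + 0.5) * k / n)
            for i, x in enumerate(signal)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result has the same layout as the input.
    return [list(row) for row in zip(*cols)]


def lip_motion_features(vx, vy, keep=3):
    """Concatenate the keep x keep low-frequency DCT coefficients of the
    horizontal (vx) and vertical (vy) motion components.

    Assumption: vx and vy are equally sized 2-D lists holding one motion
    vector component per grid position inside the mouth rectangle.
    """
    features = []
    for component in (vx, vy):
        coeffs = dct_2d(component)
        features.extend(coeffs[u][v] for u in range(keep) for v in range(keep))
    return features


if __name__ == "__main__":
    # Hypothetical 4x4 motion field: uniform rightward motion, no vertical motion.
    vx = [[1.0] * 4 for _ in range(4)]
    vy = [[0.0] * 4 for _ in range(4)]
    print(lip_motion_features(vx, vy, keep=2))
```

Keeping only the low-frequency block is one common way to obtain a compact descriptor; the number of coefficients actually retained would have to be chosen to match the experimental setup.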