Department of Electrical and Electronics Engineering
Department of Computer Engineering

Date: 2024-11-09
Publication year: 2004
ISBN: 0-7803-8554-3
ISSN: 1522-4880
DOI: 10.1109/ICIP.2004.1421480
Scopus EID: 2-s2.0-20444432705
DOI link: http://dx.doi.org/10.1109/ICIP.2004.1421480
Handle: https://hdl.handle.net/20.500.14288/9250

Abstract: This paper addresses the selection of the best lip-motion features for biometric open-set speaker identification. The best features are those that yield the highest discrimination among individual speakers in a population. We first detect the face region in each video frame. The lip region in each frame is then segmented after successive face regions are registered by global motion compensation. The initial lip feature vector is composed of the 2D-DCT coefficients of the optical flow vectors within the lip region at each frame. The discriminant analysis consists of two stages. In the first stage, the most discriminative features are selected from the full set of DCT coefficients of a single lip-motion frame using a probabilistic measure that maximizes the ratio of intra-class and inter-class probabilities. In the second stage, the resulting discriminative feature vectors are interpolated and concatenated for each time instant within a neighborhood, and further analyzed by LDA to reduce dimension, this time taking temporal discrimination information into account. Experimental results of the HMM-based speaker identification system are included to demonstrate its performance.

Subjects: Electrical and electronics engineering; Computer engineering

Title: Discriminative lip-motion features for biometric speaker identification
Type: Conference proceeding
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-20444432705&doi=10.1109%2fICIP.2004.1421480&partnerID=40&md5=de0dbcd081e18cd1f5cf1df262e45d88
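For a concrete picture of the per-frame feature the abstract describes, below is a minimal sketch of the optical-flow/2D-DCT step. It assumes pre-segmented, motion-compensated grayscale lip patches; the Farneback flow estimator, the raster-order coefficient selection, and the number of retained coefficients are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch: 2D-DCT coefficients of dense optical flow inside a
# segmented lip patch. Assumptions (not from the paper): Farneback flow,
# a low-frequency block in raster order rather than zigzag scanning,
# and n_coeffs=50 retained coefficients per flow component.
import cv2
import numpy as np
from scipy.fft import dctn

def lip_motion_features(prev_lip: np.ndarray, curr_lip: np.ndarray,
                        n_coeffs: int = 50) -> np.ndarray:
    """prev_lip, curr_lip: uint8 grayscale lip patches of identical size."""
    # Dense optical flow between consecutive (globally registered) frames.
    flow = cv2.calcOpticalFlowFarneback(
        prev_lip, curr_lip, None, pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    feats = []
    for c in range(2):  # horizontal and vertical flow components
        coeffs = dctn(flow[..., c], norm='ortho')  # 2D-DCT of the flow field
        k = int(np.ceil(np.sqrt(n_coeffs)))
        feats.append(coeffs[:k, :k].ravel()[:n_coeffs])  # low-frequency block
    return np.concatenate(feats)  # per-frame lip-motion feature vector
```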
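The two-stage discriminant analysis can be sketched in the same spirit. The ranking below uses a per-coefficient between-/within-class variance (Fisher) ratio as a stand-in for the paper's probabilistic intra-/inter-class measure, omits the interpolation step, and uses scikit-learn's LDA for the second stage; the window length and the number of kept coefficients are assumed values.

```python
# Sketch of the two-stage analysis. Stage 1: rank DCT coefficients by a
# Fisher-style variance ratio (stand-in for the paper's probabilistic
# measure) and keep the best. Stage 2: concatenate selected vectors over
# a temporal window (interpolation omitted) and reduce dimension by LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def select_discriminative(X: np.ndarray, y: np.ndarray, n_keep: int) -> np.ndarray:
    """Indices of the n_keep features with the largest between/within ratio."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    between = sum((y == c).sum() * (X[y == c].mean(axis=0) - mu) ** 2
                  for c in classes)
    within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
                 for c in classes)
    return np.argsort(between / (within + 1e-12))[::-1][:n_keep]

def temporal_concat(frames: np.ndarray, window: int = 5) -> np.ndarray:
    """Stack each frame with its neighbors: (T, d) -> (T-window+1, window*d)."""
    return np.stack([frames[t:t + window].ravel()
                     for t in range(len(frames) - window + 1)])

# Usage with synthetic per-frame features (3 speakers, 100 frames each):
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 40))
y = np.repeat(np.arange(3), 100)
idx = select_discriminative(X, y, n_keep=16)        # stage 1
Xt = temporal_concat(X[:, idx], window=5)           # stage 2: temporal stack
yt = y[:len(Xt)]  # label each window by its first frame (illustrative)
Z = LinearDiscriminantAnalysis().fit_transform(Xt, yt)  # <= n_classes-1 dims
```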
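Finally, the open-set decision of the HMM-based identifier can be sketched as one HMM per enrolled speaker plus a likelihood threshold for impostor rejection. hmmlearn stands in for the paper's HMM implementation, and the state count, feature dimensions, and threshold are arbitrary assumptions for illustration.

```python
# Sketch of open-set identification: score a test sequence under every
# enrolled speaker's HMM and accept the best match only if its per-frame
# log-likelihood clears a rejection threshold (all values assumed).
import numpy as np
from hmmlearn.hmm import GaussianHMM

def identify(models: dict, seq: np.ndarray, threshold: float):
    """models: {speaker_id: fitted GaussianHMM}; seq: (T, d) feature sequence."""
    scores = {spk: m.score(seq) / len(seq) for spk, m in models.items()}
    best = max(scores, key=scores.get)
    # Open-set decision: below the threshold, reject as an impostor.
    return best if scores[best] >= threshold else None

# Enrollment with synthetic stand-in features (two speakers, 6-state HMMs):
rng = np.random.default_rng(0)
models = {}
for spk, shift in [("spk1", 0.0), ("spk2", 2.0)]:
    X = rng.normal(shift, 1.0, size=(200, 8))
    models[spk] = GaussianHMM(n_components=6, covariance_type="diag",
                              n_iter=20, random_state=0).fit(X)
print(identify(models, rng.normal(0.0, 1.0, size=(50, 8)), threshold=-20.0))
```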