Multimodal analysis of speech prosody and upper body gestures using hidden semi-Markov models

Date: 2024-11-10
Year: 2013
ISBN: 978-1-4799-0356-6
ISSN: 1520-6149
DOI: 10.1109/ICASSP.2013.6638339
Scopus ID: 2-s2.0-84890457155
URL: https://doi.org/10.1109/ICASSP.2013.6638339
Handle: https://hdl.handle.net/20.500.14288/17636

Abstract: Gesticulation is an essential component of face-to-face communication, and it contributes significantly to the natural and affective perception of human-to-human communication. In this work we investigate a new multimodal analysis framework to model relationships between intonational and gesture phrases using hidden semi-Markov models (HSMMs). The HSMM framework effectively associates longer-duration gesture phrases with shorter-duration prosody clusters, while maintaining realistic gesture phrase duration statistics. We evaluate the multimodal analysis framework by generating speech-prosody-driven gesture animation and employing both subjective and objective metrics.

Language: eng
Subjects: Acoustics; Electrical electronics engineering
Type: Conference Proceeding
3296115031621094