Department of Computer Engineering
Record date: 2024-11-09
Publication year: 2018
ISBN: 978-1-5108-7221-9
ISSN: 2308-457X
DOI: 10.21437/interspeech.2018-2215
Scopus ID: 2-s2.0-85054959957
URL: http://dx.doi.org/10.21437/interspeech.2018-2215
Handle: https://hdl.handle.net/20.500.14288/14715

Abstract: Head-nods and turn-taking both significantly contribute to conversational dynamics in dyadic interactions. Timely prediction and use of these events is highly valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be utilized in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multimodal classification performances for head-nod and turn-taking events are reported over the IEMOCAP dataset.

Subjects: Computer Science; Artificial Intelligence; Electrical and Electronics Engineering
Title: Audio-visual prediction of head-nod and turn-taking events in dyadic interactions
Type: Conference proceeding
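As a rough illustration of the LSTM-RNN branch described in the abstract, the sketch below shows a small classifier that consumes fixed-length windows of concatenated audio-visual feature frames and outputs a binary event / no-event score. This is not the authors' implementation: the choice of PyTorch, the feature dimensions, the early-fusion concatenation, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): an LSTM-RNN that reads a
# window of fused audio-visual features and predicts whether a head-nod or
# turn-taking event follows the window.
import torch
import torch.nn as nn

class EventPredictor(nn.Module):
    def __init__(self, audio_dim=26, visual_dim=10, hidden_dim=64):
        super().__init__()
        # Early fusion: audio and visual features are concatenated per frame (assumption).
        self.lstm = nn.LSTM(audio_dim + visual_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)  # binary event / no-event logit

    def forward(self, features):           # features: (batch, time, audio_dim + visual_dim)
        _, (h_n, _) = self.lstm(features)  # h_n: (1, batch, hidden_dim), final hidden state
        return self.classifier(h_n[-1])    # logits: (batch, 1)

# Example: one training step on a random batch of 20-frame windows.
model = EventPredictor()
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 20, 36)                # 8 windows, 20 frames, 26 + 10 features
y = torch.randint(0, 2, (8, 1)).float()   # per-window event labels
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

An SVM baseline, as also mentioned in the abstract, would typically operate on window-level statistics of the same features rather than on the frame sequence itself.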