Publication:
Audio-visual prediction of head-nod and turn-taking events in dyadic interactions

dc.contributor.departmentN/A
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.departmentDepartment of Computer Engineering
dc.contributor.kuauthorTürker, Bekir Berker
dc.contributor.kuauthorErzin, Engin
dc.contributor.kuauthorYemez, Yücel
dc.contributor.kuauthorSezgin, Tevfik Metin
dc.contributor.kuprofilePhD Student
dc.contributor.kuprofileFaculty Member
dc.contributor.kuprofileFaculty Member
dc.contributor.kuprofileFaculty Member
dc.contributor.otherDepartment of Computer Engineering
dc.contributor.schoolcollegeinstituteGraduate School of Sciences and Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.schoolcollegeinstituteCollege of Engineering
dc.contributor.yokidN/A
dc.contributor.yokid34503
dc.contributor.yokid107907
dc.contributor.yokid18632
dc.date.accessioned2024-11-09T23:51:28Z
dc.date.issued2018
dc.description.abstractHead-nods and turn-taking both significantly contribute to conversational dynamics in dyadic interactions. Timely prediction and use of these events is quite valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be utilized in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multimodal classification performances of head-nod and turn-taking events are reported over the IEMOCAP dataset.
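Illustrative note: the abstract describes LSTM-RNN classifiers trained on windowed audio-visual features. The following is a minimal sketch of such an event classifier, assuming PyTorch; the feature dimension, window length, and hyperparameters are illustrative assumptions, not values from the paper.

# Hypothetical sketch of an LSTM-based head-nod / turn-taking event classifier
# over frame-level audio-visual feature windows. Dimensions and hyperparameters
# are assumptions for illustration only.
import torch
import torch.nn as nn

class EventPredictor(nn.Module):
    def __init__(self, feat_dim=60, hidden_dim=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, frames, feat_dim) -- a sliding window of fused
        # audio (e.g., MFCC) and visual (e.g., head-pose) features
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])        # logits: event vs. no-event

if __name__ == "__main__":
    model = EventPredictor()
    window = torch.randn(8, 50, 60)    # 8 windows of 50 frames each
    logits = model(window)
    print(logits.shape)                # torch.Size([8, 2])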
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.openaccessNO
dc.description.publisherscopeInternational
dc.description.sponsoredbyTubitakEuTÜBİTAK
dc.description.sponsorshipTurkish Scientific and Technical Research Council (TÜBİTAK) [113E324, 217E040]. This work is supported by the Turkish Scientific and Technical Research Council (TÜBİTAK) under grant numbers 113E324 and 217E040.
dc.identifier.doi10.21437/interspeech.2018-2215
dc.identifier.isbn978-1-5108-7221-9
dc.identifier.issn2308-457X
dc.identifier.quartileN/A
dc.identifier.scopus2-s2.0-85054959957
dc.identifier.urihttp://dx.doi.org/10.21437/interspeech.2018-2215
dc.identifier.urihttps://hdl.handle.net/20.500.14288/14715
dc.identifier.wos465363900364
dc.keywordsHead-nod
dc.keywordsTurn-taking
dc.keywordsSocial signals
dc.keywordsEvent prediction
dc.keywordsDyadic conversations
dc.keywordsHuman-robot interaction
dc.languageEnglish
dc.publisherISCA - International Speech Communication Association
dc.source19th Annual Conference of the International Speech Communication Association (Interspeech 2018), Vols 1-6: Speech Research for Emerging Markets in Multilingual Societies
dc.subjectComputer Science
dc.subjectArtificial Intelligence
dc.subjectElectrical and Electronics Engineering
dc.titleAudio-visual prediction of head-nod and turn-taking events in dyadic interactions
dc.typeConference proceeding
dspace.entity.typePublication
local.contributor.authoridN/A
local.contributor.authorid0000-0002-2715-2368
local.contributor.authorid0000-0002-7515-3138
local.contributor.authorid0000-0002-1524-1646
local.contributor.kuauthorTürker, Bekir Berker
local.contributor.kuauthorErzin, Engin
local.contributor.kuauthorYemez, Yücel
local.contributor.kuauthorSezgin, Tevfik Metin
relation.isOrgUnitOfPublication89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery89352e43-bf09-4ef4-82f6-6f9d0174ebae

Files