Publication:
Engagement rewarded actor-critic with conservative Q-learning for speech-driven laughter backchannel generation

dc.contributor.department: Department of Computer Engineering
dc.contributor.kuauthor: Bayramoğlu, Öykü Zeynep
dc.contributor.kuauthor: Erzin, Engin
dc.contributor.kuauthor: Sezgin, Tevfik Metin
dc.contributor.kuauthor: Yemez, Yücel
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.kuprofile: Faculty Member
dc.contributor.other: Department of Computer Engineering
dc.contributor.researchcenter: Koç Üniversitesi İş Bankası Yapay Zeka Uygulama ve Araştırma Merkezi (KUIS AI) / Koç University İş Bank Artificial Intelligence Center (KUIS AI)
dc.contributor.schoolcollegeinstitute: College of Engineering
dc.contributor.schoolcollegeinstitute: Graduate School of Sciences and Engineering
dc.contributor.yokid: N/A
dc.contributor.yokid: 34503
dc.contributor.yokid: 18632
dc.contributor.yokid: 107907
dc.date.accessioned: 2024-11-09T13:56:20Z
dc.date.issued: 2021
dc.description.abstract: We propose a speech-driven laughter backchannel generation model that rewards engagement during human-agent interaction. We formulate the problem as a Markov decision process in which the speech signal represents the state and the objective is to maximize human engagement. Since online training is often impractical for human-agent interaction, we use existing human-to-human dyadic interaction datasets to train our agent for the backchannel generation task. We address the problem with an actor-critic method based on conservative Q-learning (CQL), which mitigates the distributional shift problem by suppressing Q-value over-estimation during training. The proposed CQL-based approach is evaluated objectively on the IEMOCAP dataset for the laughter generation task. Compared to existing off-policy Q-learning methods, it achieves improved compliance with the dataset in terms of laughter generation rate. Furthermore, we demonstrate the effectiveness of the learned policy by estimating the expected engagement using off-policy policy evaluation techniques.
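For context on the abstract: the conservative penalty that CQL adds to the standard TD loss can be sketched roughly as below. This is a minimal illustration under a discrete-action assumption, not the authors' implementation; the function name and signature are hypothetical.

```python
import numpy as np

def cql_penalty(q_values, data_action, alpha=1.0):
    """Conservative Q-learning regularizer (sketch).

    Pushes down Q-values across all actions (via a log-sum-exp term)
    while pushing up the Q-value of the action actually logged in the
    dataset, which suppresses over-estimation on out-of-distribution
    actions -- the distributional shift problem the abstract mentions.

    q_values:    1-D array of Q(s, a) for every discrete action a
    data_action: index of the action taken in the logged dataset
    alpha:       weight of the conservative term
    """
    # numerically stable log-sum-exp over the action dimension
    m = np.max(q_values)
    logsumexp = m + np.log(np.sum(np.exp(q_values - m)))
    return alpha * (logsumexp - q_values[data_action])
```

In training, this penalty would be added to the usual Bellman (TD) error, so the critic stays pessimistic about actions it has not seen in the offline interaction data.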
dc.description.fulltext: YES
dc.description.indexedby: Scopus
dc.description.openaccess: YES
dc.description.publisherscope: International
dc.description.sponsoredbyTubitakEu: TÜBİTAK
dc.description.sponsorship: Scientific and Technological Research Council of Turkey (TÜBİTAK)
dc.description.version: Author's final manuscript
dc.format: pdf
dc.identifier.doi: 10.1145/3462244.3479944
dc.identifier.embargo: NO
dc.identifier.filenameinventoryno: IR03356
dc.identifier.isbn: 978-1-4503-8481-0
dc.identifier.link: https://doi.org/10.1145/3462244.3479944
dc.identifier.quartile: N/A
dc.identifier.scopus: 2-s2.0-85119021073
dc.identifier.uri: https://hdl.handle.net/20.500.14288/4059
dc.keywords: Backchannels
dc.keywords: Human-agent interaction
dc.keywords: Offline reinforcement learning
dc.keywords: User engagement
dc.language: English
dc.publisher: Association for Computing Machinery (ACM)
dc.relation.grantno: 2.17E+42
dc.relation.uri: http://cdm21054.contentdm.oclc.org/cdm/ref/collection/IR/id/10144
dc.source: International Conference on Multimodal Interaction
dc.subject: Generation
dc.title: Engagement rewarded actor-critic with conservative Q-learning for speech-driven laughter backchannel generation
dc.type: Conference proceeding
dspace.entity.type: Publication
local.contributor.authorid: N/A
local.contributor.authorid: 0000-0002-2715-2368
local.contributor.authorid: 0000-0002-1524-1646
local.contributor.authorid: 0000-0002-7515-3138
local.contributor.kuauthor: Bayramoğlu, Öykü Zeynep
local.contributor.kuauthor: Erzin, Engin
local.contributor.kuauthor: Sezgin, Tevfik Metin
local.contributor.kuauthor: Yemez, Yücel
relation.isOrgUnitOfPublication: 89352e43-bf09-4ef4-82f6-6f9d0174ebae
relation.isOrgUnitOfPublication.latestForDiscovery: 89352e43-bf09-4ef4-82f6-6f9d0174ebae

Files

Original bundle

Name: 10144.pdf
Size: 839.18 KB
Format: Adobe Portable Document Format