Publication:
Reward learning from very few demonstrations

dc.contributor.departmentN/A
dc.contributor.kuauthorEteke, Cem
dc.contributor.kuauthorKebüde, Doğancan
dc.contributor.kuauthorAkgün, Barış
dc.contributor.kuprofilePhD Student
dc.contributor.schoolcollegeinstituteGraduate School of Sciences and Engineering
dc.contributor.yokidN/A
dc.date.accessioned2024-11-10T00:05:51Z
dc.date.issued2021
dc.description.abstractThis article introduces a novel skill learning framework that learns rewards from very few demonstrations and uses them in policy search (PS) to improve the skill. The demonstrations are used to learn a parameterized policy to execute the skill and a goal model, in the form of a hidden Markov model (HMM), to monitor executions. The rewards are learned from the HMM structure and its monitoring capability. The HMM is converted to a finite-horizon Markov reward process (MRP), and a Monte Carlo approach is used to calculate its state values. The HMM and the values are then merged into a partially observable MRP to obtain execution returns for use with PS to improve the policy. In addition to reward learning, a black-box PS method with an adaptive exploration strategy is adopted. The resulting framework is evaluated with five PS approaches and two skills in simulation. The results show that the learned dense rewards lead to better performance than sparse monitoring signals, and that adaptive exploration leads to faster convergence with higher success rates and lower variance. The efficacy of the framework is validated in a real-robot setting by improving three skills from complete failure to complete success using the learned rewards, whereas sparse rewards failed.
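To make the pipeline described in the abstract concrete, the following is a minimal, hypothetical Python sketch of one step: Monte Carlo estimation of state values for a finite-horizon Markov reward process derived from an HMM's transition matrix. It is not the authors' implementation; the state count, per-state rewards, horizon, and rollout count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

n_states = 5                                          # hypothetical number of HMM states
T = rng.dirichlet(np.ones(n_states), size=n_states)   # placeholder HMM transition matrix (rows sum to 1)
r = np.linspace(0.0, 1.0, n_states)                   # assumed per-state reward derived from the goal model
horizon = 20                                          # finite horizon of the MRP
n_rollouts = 2000                                     # Monte Carlo sample size

def mc_state_values(T, r, horizon, n_rollouts):
    """Estimate V(s) = E[sum of rewards over the horizon | start in state s]."""
    n = T.shape[0]
    V = np.zeros(n)
    for s0 in range(n):
        returns = np.zeros(n_rollouts)
        for k in range(n_rollouts):
            s, g = s0, 0.0
            for _ in range(horizon):
                g += r[s]                              # accumulate the reward of the current state
                s = rng.choice(n, p=T[s])              # sample the next state from the transition row
            returns[k] = g
        V[s0] = returns.mean()                         # Monte Carlo estimate of the state value
    return V

print(mc_state_values(T, r, horizon, n_rollouts))

In the framework, such values would then be combined with the HMM's observation model to score executions for policy search; here they are only printed to show the estimation step in isolation.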
dc.description.indexedbyWoS
dc.description.indexedbyScopus
dc.description.issue3
dc.description.openaccessNO
dc.description.sponsorshipScientific and Technological Research Council of Turkey [117E949] This work was supported by the Scientific and Technological Research Council of Turkey under Project 117E949.
dc.description.volume37
dc.identifier.doi10.1109/TRO.2020.3038698
dc.identifier.eissn1941-0468
dc.identifier.issn1552-3098
dc.identifier.scopus2-s2.0-85097956746
dc.identifier.urihttp://dx.doi.org/10.1109/TRO.2020.3038698
dc.identifier.urihttps://hdl.handle.net/20.500.14288/16512
dc.identifier.wos658341900013
dc.keywordsHidden Markov models
dc.keywordsRobots
dc.keywordsFeature extraction
dc.keywordsTrajectory
dc.keywordsTraining
dc.keywordsData mining
dc.keywordsNeural networks
dc.keywordsLearning from demonstration (LfD)
dc.keywordsreinforcement learning (RL)
dc.keywordsRGB-D perception
dc.keywordsVisual learning
dc.languageEnglish
dc.publisherIEEE-Inst Electrical Electronics Engineers Inc
dc.sourceIEEE Transactions on Robotics
dc.subjectRobotics
dc.titleReward learning from very few demonstrations
dc.typeJournal Article
dspace.entity.typePublication
local.contributor.authoridN/A
local.contributor.kuauthorEteke, Cem
local.contributor.kuauthorKebüde, Doğancan
local.contributor.kuauthorAkgün, Barış