Publication: Reward learning from very few demonstrations
dc.contributor.department | N/A | |
dc.contributor.kuauthor | Eteke, Cem | |
dc.contributor.kuauthor | Kebüde, Doğancan | |
dc.contributor.kuauthor | Akgün, Barış | |
dc.contributor.kuprofile | PhD Student | |
dc.contributor.schoolcollegeinstitute | Graduate School of Sciences and Engineering | |
dc.contributor.yokid | N/A | |
dc.date.accessioned | 2024-11-10T00:05:51Z | |
dc.date.issued | 2021 | |
dc.description.abstract | This article introduces a novel skill learning framework that learns rewards from very few demonstrations and uses them in policy search (PS) to improve the skill. The demonstrations are used to learn a parameterized policy to execute the skill and a goal model, in the form of a hidden Markov model (HMM), to monitor executions. The rewards are learned from the HMM structure and its monitoring capability. The HMM is converted to a finite-horizon Markov reward process (MRP), and a Monte Carlo approach is used to calculate its state values. The HMM and these values are then merged into a partially observable MRP to obtain execution returns for use with PS in improving the policy. In addition to reward learning, a black-box PS method with an adaptive exploration strategy is adopted. The resulting framework is evaluated with five PS approaches and two skills in simulation. The results show that the learned dense rewards lead to better performance than sparse monitoring signals, and that adaptive exploration leads to faster convergence with higher success rates and lower variance. The efficacy of the framework is validated in a real-robot setting by improving three skills from complete failure to complete success using the learned rewards, whereas sparse rewards failed. | |
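dc.description.note | The abstract outlines the core pipeline: a goal-model HMM is converted into a finite-horizon Markov reward process whose state values are estimated by Monte Carlo rollouts. Below is a minimal, illustrative Python sketch of that Monte Carlo value-estimation step only; the transition matrix, reward assignment, horizon, and rollout count are invented placeholders for illustration, not values or code from the paper.

    import numpy as np

    # Hypothetical goal-model chain: transition matrix over hidden "progress"
    # states of a skill (illustrative numbers, not from the paper).
    T = np.array([
        [0.6, 0.4, 0.0, 0.0],
        [0.0, 0.7, 0.3, 0.0],
        [0.0, 0.0, 0.8, 0.2],
        [0.0, 0.0, 0.0, 1.0],   # absorbing goal state
    ])

    # One plausible sparse reward read off the model structure:
    # reaching the goal state pays 1, intermediate states pay 0.
    reward = np.array([0.0, 0.0, 0.0, 1.0])

    def mc_state_values(T, reward, horizon=20, n_rollouts=5000, seed=0):
        """Estimate finite-horizon MRP state values via Monte Carlo rollouts."""
        rng = np.random.default_rng(seed)
        n = T.shape[0]
        values = np.zeros(n)
        for s0 in range(n):
            total_return = 0.0
            for _ in range(n_rollouts):
                s, g = s0, 0.0
                for _ in range(horizon):
                    g += reward[s]
                    s = rng.choice(n, p=T[s])  # sample next hidden state
                total_return += g
            values[s0] = total_return / n_rollouts
        return values

    print("Monte Carlo state values:", np.round(mc_state_values(T, reward), 2))

In the framework described by the abstract, such state values would then be combined with the HMM's observation model to score (return) full executions, providing the dense reward signal for policy search. | |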
dc.description.indexedby | WoS | |
dc.description.indexedby | Scopus | |
dc.description.issue | 3 | |
dc.description.openaccess | NO | |
dc.description.sponsorship | This work was supported by the Scientific and Technological Research Council of Turkey under Project 117E949. | |
dc.description.volume | 37 | |
dc.identifier.doi | 10.1109/TRO.2020.3038698 | |
dc.identifier.eissn | 1941-0468 | |
dc.identifier.issn | 1552-3098 | |
dc.identifier.scopus | 2-s2.0-85097956746 | |
dc.identifier.uri | http://dx.doi.org/10.1109/TRO.2020.3038698 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14288/16512 | |
dc.identifier.wos | 658341900013 | |
dc.keywords | Hidden Markov models | |
dc.keywords | Robots | |
dc.keywords | Feature extraction | |
dc.keywords | Trajectory | |
dc.keywords | Training | |
dc.keywords | Data mining | |
dc.keywords | Neural networks | |
dc.keywords | Learning from demonstration (LfD) | |
dc.keywords | reinforcement learning (RL) | |
dc.keywords | RGB-D perception | |
dc.keywords | Visual learning | |
dc.language | English | |
dc.publisher | IEEE-Inst Electrical Electronics Engineers Inc | |
dc.source | IEEE Transactions on Robotics | |
dc.subject | Robotics | |
dc.title | Reward learning from very few demonstrations | |
dc.type | Journal Article | |
dspace.entity.type | Publication | |
local.contributor.authorid | N/A | |
local.contributor.kuauthor | Eteke, Cem | |
local.contributor.kuauthor | Kebüde, Doğancan | |
local.contributor.kuauthor | Akgün, Barış |