
Commit

Merge pull request #130 from StanfordASL/daniele/update_bib
update SchmidtEtAl
DanieleGammelli authored Nov 5, 2024
2 parents e45e390 + 174db08 commit ac744e3
Showing 1 changed file with 1 addition and 1 deletion.
_bibliography/ASL_Bib.bib: 1 addition & 1 deletion
@@ -1821,7 +1821,7 @@ @inproceedings{SchneiderBylardEtAl2022
 @inproceedings{SchmidtGammelliEtAl2024,
 author = {Schmidt, C. and Gammelli, D. and Harrison, J. and Pavone, M. and Rodrigues, F.},
 title = {Offline Hierarchical Reinforcement Learning via Inverse Optimization},
-booktitle = proc_NIPS,
+booktitle = proc_ICLR,
 keywords = {sub},
 note = {Submitted},
 abstract = {Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose OHIO: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the \textit{inverse problem}, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both in direct deployment of policies trained offline and when online fine-tuning is performed.},
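For reference, bare values such as proc_NIPS and proc_ICLR are BibTeX @string abbreviations rather than quoted literals, so the new booktitle resolves only if a matching @string definition exists elsewhere in the bibliography. A minimal sketch of what such definitions could look like; the expansions below are assumptions, not taken from ASL_Bib.bib:

% Hypothetical @string definitions; check ASL_Bib.bib (or its shared preamble) for the actual expansions.
@string{proc_NIPS = {Conf. on Neural Information Processing Systems}}
@string{proc_ICLR = {Int. Conf. on Learning Representations}}

With definitions like these in place, booktitle = proc_ICLR expands to the ICLR proceedings name when the bibliography is compiled.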
