Unsupervised reinforcement learning via state entropy maximization

Mutti, Mirco <1993>

Subject

ING-INF/05 Information Processing Systems (Sistemi di elaborazione delle informazioni)

Description

Reinforcement Learning (RL) provides a powerful framework for sequential decision-making problems in which the transition dynamics are unknown or too complex to be represented. The RL approach is based on speculating on the best decision to make given sample estimates obtained from previous interactions, a recipe that has led to several breakthroughs in domains ranging from game playing to robotics. Despite their success, current RL methods hardly generalize from one task to another, and achieving the kind of generalization obtained through unsupervised pre-training in non-sequential problems seems unthinkable. Unsupervised RL has recently emerged as a way to improve the generalization of RL methods. Just like its non-sequential counterpart, the unsupervised RL framework comprises two phases: an unsupervised pre-training phase, in which the agent interacts with the environment without external feedback, and a supervised fine-tuning phase, in which the agent aims to efficiently solve a task in the same environment by exploiting the knowledge acquired during pre-training. In this thesis, we study unsupervised RL via state entropy maximization, in which the agent uses the unsupervised interactions to pre-train a policy that maximizes the entropy of its induced state distribution. First, we provide a theoretical characterization of the learning problem by considering a convex RL formulation that subsumes state entropy maximization. Our analysis shows that maximizing the state entropy in finite trials is inherently harder than RL. Then, we study the state entropy maximization problem from an optimization perspective. In particular, we show that the primal formulation of the corresponding optimization problem can be (approximately) addressed through tractable linear programs. Finally, we provide the first practical methodologies for state entropy maximization in complex domains, both when the pre-training takes place in a single environment and when it takes place in multiple environments.
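
As a rough illustration of the quantity the abstract refers to, the sketch below computes the Shannon entropy of the empirical state visitation distribution for two state sequences. It is not the method developed in the thesis: the function name `state_entropy`, the four-state setting, and both sequences are hypothetical and chosen only to show that visiting states more evenly yields a higher entropy.

```python
import math
from collections import Counter


def state_entropy(states):
    """Shannon entropy (in nats) of the empirical state visitation distribution."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Two hypothetical state sequences in a four-state environment (illustrative data only).
narrow = [0, 0, 0, 1, 0, 0, 0, 1]  # mostly stays in one state
broad = [0, 1, 2, 3, 0, 1, 2, 3]   # visits all four states equally often

h_narrow = state_entropy(narrow)
h_broad = state_entropy(broad)

print(f"narrow: {h_narrow:.3f} nats, broad: {h_broad:.3f} nats")
```

Under these assumptions, the evenly spread sequence attains the maximum entropy for four states, log 4, while the concentrated sequence scores lower. A policy pre-trained for state entropy maximization is one whose induced state distribution scores high on this kind of measure.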

Date

2023-03-29

Type

Doctoral Thesis

PeerReviewed

Format

application/pdf

Identifier

https://amsdottorato.unibo.it/id/eprint/10588/1/mutti_mirco_tesi.pdf

urn:nbn:it:unibo-29109

Mutti, Mirco (2023) Unsupervised reinforcement learning via state entropy maximization, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Data science and computation, 34 Ciclo. DOI 10.48676/unibo/amsdottorato/10588.

Relations

https://amsdottorato.unibo.it/id/eprint/10588/