• Exploiting and generalizing epistemic uncertainty in reinforcement learning and planning
  • Likmeta, Amarildo <1995>

Subject

  • ING-INF/05 Sistemi di elaborazione delle informazioni

Description

  • Solving sequential decision-making problems with complex and non-linear dynamics has been a goal of Artificial Intelligence since the conception of the field. Reinforcement Learning (RL) offers a general framework for solving such problems. Its approach of learning by direct interaction with the environment, which allows for speculation on the value of candidate solutions, testing, and counterfactual reasoning, has enabled researchers to achieve remarkable results in a multitude of challenging problems, both simulated and real-world. Nonetheless, the successful application of RL to new problems still requires a large degree of task-specific tuning. One of the main open challenges in RL remains the exploration-exploitation dilemma: an agent that optimizes a cumulative objective in an unknown environment while learning must decide whether to trust the information gathered so far and exploit it by executing the best-known strategies, or to follow explorative strategies that gather more information in the hope of finding better ones. The exploration problem has been thoroughly studied in the literature, and a multitude of solutions exist for tabular domains or continuous domains with known structure. However, when moving to complex domains where neural networks are employed as function approximators, deep and directed exploration is still a challenge. In this dissertation, we tackle the exploration problem in RL by proposing Wasserstein TD-Learning (WTD), a novel framework that models the uncertainty over the value function in a model-free manner and propagates it across the state-action space through variational updates. These updates give us enough control to establish desirable theoretical properties in the tabular setting while keeping the method easily scalable in the Deep RL setting. As a result, WTD can be adapted to a multitude of different settings by modifying algorithms from the literature to handle the distributional nature of our value function, enabling deep and directed exploration.
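
    To make the idea of propagating epistemic uncertainty over the value function concrete, the following is a minimal tabular sketch, not taken from the thesis itself. It assumes (as an illustrative choice) that each Q(s, a) posterior is a one-dimensional Gaussian, that the TD update interpolates the current posterior towards the target posterior via the closed-form L2-Wasserstein barycenter of two Gaussians (convex combination of means and of standard deviations), and that exploration is done by Thompson-style sampling from the posteriors. The class name and hyperparameters are hypothetical.

    # Sketch of uncertainty-aware TD-learning with Gaussian Q-value posteriors.
    # Assumptions: Gaussian parameterization, Wasserstein-barycenter interpolation,
    # Thompson-style action selection. Not the thesis' exact algorithm.
    import numpy as np


    class WassersteinTDAgent:
        def __init__(self, n_states, n_actions, gamma=0.99, alpha=0.1,
                     prior_mean=0.0, prior_std=10.0, seed=0):
            self.gamma = gamma      # discount factor
            self.alpha = alpha      # interpolation step size
            self.rng = np.random.default_rng(seed)
            # Gaussian posterior over each Q(s, a): mean and standard deviation.
            self.mean = np.full((n_states, n_actions), prior_mean)
            self.std = np.full((n_states, n_actions), prior_std)

        def act(self, state):
            # Thompson-style exploration: sample one value per action from the
            # posterior and act greedily with respect to the samples.
            samples = self.rng.normal(self.mean[state], self.std[state])
            return int(np.argmax(samples))

        def update(self, state, action, reward, next_state, done):
            # Target posterior from the greedy next action (by posterior mean);
            # its uncertainty is propagated through the Bellman backup.
            if done:
                target_mean, target_std = reward, 0.0
            else:
                a_next = int(np.argmax(self.mean[next_state]))
                target_mean = reward + self.gamma * self.mean[next_state, a_next]
                target_std = self.gamma * self.std[next_state, a_next]
            # L2-Wasserstein barycenter of two 1-D Gaussians: both the mean and
            # the standard deviation are convex combinations of current and target.
            self.mean[state, action] = (
                (1 - self.alpha) * self.mean[state, action] + self.alpha * target_mean
            )
            self.std[state, action] = (
                (1 - self.alpha) * self.std[state, action] + self.alpha * target_std
            )

    In such a sketch, the posterior standard deviation shrinks along trajectories that are visited often and stays large for poorly explored state-action pairs, which is what drives the deep and directed exploration the abstract refers to.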

Date

  • 2024-06-21

Type

  • Doctoral Thesis
  • PeerReviewed

Format

  • application/pdf

Identifier

  • urn:nbn:it:unibo-30392

  • Likmeta, Amarildo (2024) Exploiting and generalizing epistemic uncertainty in reinforcement learning and planning, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Data science and computation, 35 Ciclo. DOI 10.48676/unibo/amsdottorato/11445.

Relations