PMID- 37645053
OWN - NLM
STAT- PubMed-not-MEDLINE
LR  - 20240216
IS  - 2331-8422 (Electronic)
IS  - 2331-8422 (Linking)
DP  - 2023 Aug 15
TI  - Planning to Learn: A Novel Algorithm for Active Learning during Model-Based Planning.
LID - arXiv:2308.08029v1
AB  - Active Inference is a recently developed framework for modeling decision processes under uncertainty. Over the last several years, empirical and theoretical work has begun to evaluate the strengths and weaknesses of this approach and how it might be extended and improved. One recent extension is the "sophisticated inference" (SI) algorithm, which improves performance on multi-step planning problems through a recursive decision tree search. However, little work to date has compared SI to other established planning algorithms in reinforcement learning (RL). In addition, SI was developed with a focus on inference as opposed to learning. The present paper therefore has two aims. First, we compare the performance of SI to Bayesian RL schemes designed to solve similar problems. Second, we present and compare an extension of SI - sophisticated learning (SL) - that more fully incorporates active learning during planning. SL maintains beliefs about how model parameters would change under the future observations expected under each policy. This allows a form of counterfactual retrospective inference in which the agent considers what could be learned from current or past observations given different future observations. To accomplish these aims, we make use of a novel, biologically inspired environment that requires an optimal balance between goal-seeking and active learning, and which was designed to highlight the problem structure for which SL offers a unique solution. This setup requires an agent to continually search an open environment for available (but changing) resources in the presence of competing affordances for information gain. Our simulations demonstrate that SL outperforms all other algorithms in this context - most notably, Bayes-adaptive RL and upper confidence bound (UCB) algorithms, which aim to solve multi-step planning problems using similar principles (i.e., directed exploration and counterfactual reasoning about belief updates given different possible actions/observations). These results provide added support for the utility of Active Inference in solving this class of biologically relevant problems and offer additional tools for testing hypotheses about human cognition.
FAU - Hodson, Rowan
AU  - Hodson R
AD  - Laureate Institute for Brain Research, Tulsa, OK, USA.
FAU - Bassett, Bruce
AU  - Bassett B
AD  - University of Cape Town, South Africa.
AD  - African Institute for Mathematical Sciences, Muizenberg, Cape Town.
AD  - South African Astronomical Observatory, Observatory, Cape Town.
FAU - van Hoof, Charel
AU  - van Hoof C
AD  - Delft University of Technology, Department of Cognitive Robotics.
FAU - Rosman, Benjamin
AU  - Rosman B
AD  - University of the Witwatersrand, South Africa.
FAU - Solms, Mark
AU  - Solms M
AD  - University of Cape Town, South Africa.
FAU - Shock, Jonathan P
AU  - Shock JP
AD  - University of Cape Town, South Africa.
AD  - INRS, Montreal, Canada.
FAU - Smith, Ryan
AU  - Smith R
AD  - Laureate Institute for Brain Research, Tulsa, OK, USA.
LA  - eng
GR  - P20 GM121312/GM/NIGMS NIH HHS/United States
PT  - Preprint
DEP - 20230815
PL  - United States
TA  - ArXiv
JT  - ArXiv
JID - 101759493
PMC - PMC10462173
EDAT- 2023/08/30 06:48
MHDA- 2023/08/30 06:49
PMCR- 2023/08/15
CRDT- 2023/08/30 03:46
PHST- 2023/08/30 06:48 [pubmed]
PHST- 2023/08/30 06:49 [medline]
PHST- 2023/08/30 03:46 [entrez]
PHST- 2023/08/15 00:00 [pmc-release]
AID - 2308.08029 [pii]
PST - epublish
SO  - ArXiv [Preprint]. 2023 Aug 15:arXiv:2308.08029v1.
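
Editor's note: the abstract above describes scoring policies by the counterfactual belief updates expected under their predicted observations. The Python sketch below is only a minimal illustration of that general idea under assumed Dirichlet beliefs over outcome probabilities; the function names, the one-step horizon, and the toy reward values are hypothetical and are not the paper's implementation of sophisticated learning.

# Illustrative sketch (not the authors' code): score each action by expected
# extrinsic reward plus the expected information gain from a counterfactual
# Dirichlet belief update under each imagined future observation.
import numpy as np
from scipy.special import digamma, gammaln

def dirichlet_kl(alpha_q, alpha_p):
    """KL(Dir(alpha_q) || Dir(alpha_p)) for 1-D concentration vectors."""
    a_q, a_p = np.sum(alpha_q), np.sum(alpha_p)
    return (gammaln(a_q) - np.sum(gammaln(alpha_q))
            - gammaln(a_p) + np.sum(gammaln(alpha_p))
            + np.sum((alpha_q - alpha_p) * (digamma(alpha_q) - digamma(a_q))))

def expected_info_gain(alpha):
    """Average KL between the counterfactual posterior and the current belief,
    weighted by the current predictive probability of each observation."""
    pred = alpha / alpha.sum()            # predictive distribution over outcomes
    gain = 0.0
    for o, p_o in enumerate(pred):        # imagine each possible future observation
        alpha_post = alpha.copy()
        alpha_post[o] += 1.0              # counterfactual belief update for outcome o
        gain += p_o * dirichlet_kl(alpha_post, alpha)
    return gain

# Toy example: two candidate actions, each with its own Dirichlet belief
# over three possible outcomes (values chosen arbitrarily for illustration).
beliefs = {"action_a": np.array([1.0, 1.0, 1.0]),     # poorly known: high info gain
           "action_b": np.array([20.0, 2.0, 2.0])}    # well known: low info gain
rewards = {"action_a": 0.3, "action_b": 0.4}          # assumed expected rewards

scores = {a: rewards[a] + expected_info_gain(alpha) for a, alpha in beliefs.items()}
print(scores, "->", max(scores, key=scores.get))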