Michael Bradley Johanson, PhD

The Frost Hollow Papers:

Overview: The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents

Pavlovian Signalling: Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making

VR Interaction: Assessing Human Interaction in Virtual Reality with Continually Learning Prediction Agents Based on Reinforcement Learning Algorithms: A Pilot Study

Download

Overview Paper: [PDF]
Pavlovian Signalling Paper: [PDF]
VR Interaction Paper: [PDF]

Abstract

Learned communication between agents is a powerful tool when approaching decision-making problems that are hard to overcome by any single agent in isolation. However, continual coordination and communication learning between machine agents or human-machine partnerships remains a challenging open problem. As a stepping stone toward solving the continual communication learning problem, in this paper we contribute a multi-faceted study into what we term Pavlovian Signalling -- a process by which learned, temporally extended predictions made by one agent inform decision-making by another agent with different perceptual access to their shared environment. We seek to establish how different temporal processes and representational choices impact Pavlovian signalling between learning agents. To do so, we introduce a partially observable decision-making domain we call the Frost Hollow. Extending from classical animal learning experiments, in this domain a prediction learning agent and a reinforcement learning agent are coupled into a two-part decision-making system that seeks to acquire sparse reward while avoiding time-conditional hazards. We evaluate two domain variations: machine prediction and control learning agents interacting in a simulated linear walk, and, as a key case of interest, a prediction learning machine interacting with a human participant via Pavlovian signalling in a virtual reality environment. Our results showcase the speed of learning for Pavlovian signalling, the impact that different temporal representations do (and do not) have on agent-agent coordination, and how temporal aliasing impacts agent-agent and human-agent interactions differently. As a main contribution, we establish Pavlovian signalling as a natural bridge between fixed signalling paradigms and fully adaptive communication learning between two agents. 
We further show how to computationally build this adaptive signalling process out of a fixed signalling process, characterized by fast continual prediction learning and minimal constraints on the nature of the agent receiving signals. Our results therefore point to an actionable, constructivist path towards continual communication learning between reinforcement learning agents, with potential impact in a range of real-world settings.

Notes

These three papers are closely related, and the first largely subsumes the other two, which appeared at the AAMAS 2022 ALA workshop. The papers explore dividing an RL agent into two or more tightly coupled components that each learn independently, with upstream agents producing features for consumption by downstream agents. Pavlovian Signalling is a particular choice of interface between these sub-agents, which filters the output signal down to one or more booleans. This gives the signal a more consistent meaning, which makes it easier for downstream agents to learn to use and makes it interpretable when consumed by humans. In particular, we envision replacing the final agent in the chain with a human, and having the upstream agents learn to produce useful signals for that human.

We explore these ideas in a new environment called "Frost Hollow", which involves a temporal task: the agent has to avoid a cold wind, but the environment gives no warning of when the wind is about to blow. However, the wind blows on a predictable schedule, allowing upstream agents to learn to predict when it is imminent, and then warn the downstream agent responsible for choosing actions -- whether an RL agent or a human -- so that they can move to safety in time. We produced two implementations of this environment. The first, highlighted in the "Pavlovian Signalling" paper, focuses on the Agent-Agent task in an abstract Python environment. The second, highlighted in the "VR Interaction" paper, focuses on the Agent-Human task in a virtual reality environment, where the prediction agent learns in real time alongside the human.
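To make the idea of a boolean warning signal concrete, here is a minimal Python sketch. It is not the implementation from the papers: the tabular phase representation, the TD(0) update, the threshold rule, and every name and parameter value (learn_hazard_signal, period, gamma, alpha, threshold) are assumptions chosen for illustration. An upstream learner predicts a discounted count of upcoming wind onsets on a fixed schedule, and the prediction is thresholded into a single boolean that a downstream agent or human could act on.

```python
def learn_hazard_signal(period=10, gamma=0.5, alpha=0.1, threshold=0.2, steps=5000):
    """Learn a per-phase hazard prediction with tabular TD(0), then threshold it.

    Hypothetical setup: the wind blows whenever the phase returns to 0, and the
    upstream learner is assumed to observe the phase of the cycle directly.
    """
    values = [0.0] * period  # one prediction per phase of the wind cycle
    for t in range(steps):
        phase = t % period
        next_phase = (t + 1) % period
        # Cumulant is 1 on the transition into the wind step, 0 otherwise.
        cumulant = 1.0 if next_phase == 0 else 0.0
        td_error = cumulant + gamma * values[next_phase] - values[phase]
        values[phase] += alpha * td_error
    # The boolean signal: True when the learned prediction exceeds the threshold.
    signal = [v > threshold for v in values]
    return values, signal


values, signal = learn_hazard_signal()
warning_phases = [phase for phase, on in enumerate(signal) if on]
print(warning_phases)
```

With these illustrative settings the signal switches on during the last few phases before the wind and stays off earlier in the cycle. Raising the threshold or lowering gamma shortens the warning window, which is one way such a signal could be tuned to how quickly the downstream agent can reach safety.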

BibTeX


@article{pilarski2022frost,
  title={The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents},
  author={Pilarski, Patrick M and Butcher, Andrew and Davoodi, Elnaz and Johanson, Michael Bradley and Brenneis, Dylan JA and Parker, Adam SR and Acker, Leslie and Botvinick, Matthew M and Modayil, Joseph and White, Adam},
  journal={arXiv e-prints},
  pages={arXiv--2203},
  year={2022}
}

@article{butcher2022pavlovian,
  title={Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making},
  author={Butcher, Andrew and Johanson, Michael Bradley and Davoodi, Elnaz and Brenneis, Dylan JA and Acker, Leslie and Parker, Adam SR and White, Adam and Modayil, Joseph and Pilarski, Patrick M},
  journal={arXiv preprint arXiv:2201.03709},
  year={2022}
}

@article{brenneis2021fhvr,
  title={Assessing Human Interaction in Virtual Reality With Continually Learning Prediction Agents Based on Reinforcement Learning Algorithms: A Pilot Study},
  author={Brenneis, Dylan JA and Parker, Adam S and Johanson, Michael Bradley and Butcher, Andrew and Davoodi, Elnaz and Acker, Leslie and Botvinick, Matthew M and Modayil, Joseph and White, Adam and Pilarski, Patrick M},
  journal={arXiv preprint arXiv:2112.07774},
  year={2021}
}