The value-complexity trade-off for reinforcement learning based brain-computer interfaces

Hadar Levi-Aharoni; Naftali Tishby

doi:10.1088/1741-2552/abc8d8

J Neural Eng. 2021 Feb 13;17(6):066011. doi: 10.1088/1741-2552/abc8d8.

Authors

Hadar Levi-Aharoni¹, Naftali Tishby¹,²

Affiliations

¹ The Edmond and Lilly Safra Center for Brain Sciences, Hebrew University of Jerusalem, Jerusalem, Israel.
² School of Engineering and Computer Science, Hebrew University of Jerusalem, Jerusalem, Israel.

PMID: 33586668
DOI: 10.1088/1741-2552/abc8d8

Abstract

Objective: One of the recent developments in the field of brain-computer interfaces (BCI) is the reinforcement learning (RL) based BCI paradigm, which uses neural error responses as the reward feedback on the agent's action. While having several advantages over motor imagery based BCI, the reliability of RL-BCI is critically dependent on the decoding accuracy of noisy neural error signals. A principled method is needed to optimally handle this inherent noise under general conditions.

Approach: By determining a trade-off between the expected value and the informational cost of policies, the info-RL (IRL) algorithm provides optimal low-complexity policies, which are robust under noisy reward conditions and achieve the maximal obtainable value. In this work we utilize the IRL algorithm to characterize the maximal obtainable value under different noise levels, which in turn is used to extract the optimal robust policy for each noise level.
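To make the idea of trading expected value against the informational cost of a policy concrete, the following is a minimal sketch and not the authors' IRL implementation. It assumes a single-step setting with known expected rewards, and it assumes that policy complexity is measured by the mutual information I(S;A) between states and actions; the abstract does not specify the complexity measure. The function name `info_limited_policy`, the parameter `beta`, and the toy reward table are hypothetical choices made for illustration. The iteration is a standard Blahut-Arimoto-style alternation between the policy and the marginal action distribution.

```python
import numpy as np

def info_limited_policy(q, p_s, beta, n_iter=200):
    """Blahut-Arimoto-style iteration for a one-step value-complexity trade-off.

    Assumed objective (illustrative): maximize E[q(s, a)] - (1 / beta) * I(S; A).
    q:    (n_states, n_actions) array of expected rewards, assumed known
    p_s:  (n_states,) state distribution
    beta: trade-off parameter; larger beta favours value over low complexity
    """
    tiny = 1e-300  # floor that keeps the logarithms finite
    n_states, n_actions = q.shape
    p_a = np.full(n_actions, 1.0 / n_actions)  # marginal action distribution

    for _ in range(n_iter):
        # Policy update: pi(a|s) proportional to p(a) * exp(beta * q(s, a))
        logits = np.log(np.maximum(p_a, tiny))[None, :] + beta * q
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        # Marginal update: p(a) = sum_s p(s) * pi(a|s)
        p_a = p_s @ pi

    # Expected value of the policy
    value = float(np.sum(p_s[:, None] * pi * q))
    # Complexity: mutual information I(S; A) in bits
    log_ratio = np.log2(np.maximum(pi, tiny)) - np.log2(np.maximum(p_a, tiny))[None, :]
    complexity = float(np.sum(p_s[:, None] * pi * log_ratio))
    return pi, value, complexity


# Toy example (hypothetical): two states, two actions, reward 1 for the matching action
q = np.eye(2)
p_s = np.array([0.5, 0.5])
for beta in (0.0, 1.0, 50.0):
    _, value, complexity = info_limited_policy(q, p_s, beta)
    print(f"beta={beta:5.1f}  value={value:.3f}  complexity={complexity:.3f} bits")
```

Sweeping `beta` traces out a value-complexity curve under these assumptions: at `beta = 0` the policy ignores the state entirely (zero complexity), while large `beta` approaches the deterministic, highest-value policy at the price of higher complexity.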

Main results: Our simulation results of a setting with Gaussian noise show that the complexity level of the optimal policy is dependent on the reward magnitude but not on the reward variance, whereas the variance determines whether a lower complexity solution is favorable or not. We show how this analysis can be utilized to select optimal robust policies for an RL-BCI and demonstrate its use on EEG data.

Significance: We propose here a principled method to determine the optimal policy complexity of an RL problem with a noisy reward, which we argue is particularly useful for RL-based BCI paradigms. This framework may be used to minimize initial training time and allow for a more dynamic and robust shared control between the agent and the operator under different conditions.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Brain-Computer Interfaces*
Electroencephalography
Learning
Reinforcement, Psychology
Reproducibility of Results