Researchers from École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland and the University of Texas at Austin (UT) have developed a brain-computer interface (BCI) that allows people to modify a robot manipulator's motion trajectories. The interface system uses inverse reinforcement learning (IRL) and can learn a user's preferences from fewer than five demonstrations.
The system and a set of experiments were described in Nature's Communications Biology journal. The main aim of the research is to assist paralyzed patients by developing robots that can be controlled using a BCI, explains Anthony Alford, a Development Group Manager at Genesys Cloud Services, in an InfoQ article.
The robot's software includes a semi-autonomous obstacle avoidance routine with parameters that are updated using IRL based on error-related potentials. Aude Billard, a lead researcher, commented:
“Assistance from robots could help [people with a spinal cord injury] recover some of their lost dexterity, since the robot can execute tasks in their place.”
BCI devices typically measure neural activity using internal implants or external sensors such as EEG electrodes. The main goal is to convert this sensor data into a signal that can be used as a computer input. Because directly commanding a robot manipulator via a BCI could be time-consuming and fatiguing, the team chose to investigate how a BCI could be used to adjust the behavior of a semi-autonomous robot manipulator.
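To illustrate the decoding step in the abstract, the toy sketch below reduces an EEG epoch to a single yes/no error decision by flagging an unusually large peak. This is purely illustrative: the function name, threshold, and z-score rule are assumptions, and a real ErrP decoder would use a trained classifier over many channels rather than a single-channel amplitude test.

```python
import statistics

def decode_errp(epoch, threshold=2.0):
    """Toy ErrP decoder (illustrative only): flag an error when the
    epoch's peak amplitude exceeds `threshold` standard deviations
    relative to the epoch's own mean. Real decoders are trained
    classifiers, not simple thresholds."""
    mean = statistics.fmean(epoch)
    sd = statistics.pstdev(epoch)
    if sd == 0:
        return False  # flat signal: nothing to detect
    z_peak = max(abs(x - mean) for x in epoch) / sd
    return z_peak > threshold

# A sharp spike in an otherwise quiet epoch is flagged;
# low-level noise is not.
spiky = [0.0] * 20 + [5.0]
noisy = [0.1, -0.1] * 10 + [0.0]
```

The point of the sketch is only that whatever the decoder looks like internally, its output to the rest of the system is a simple binary signal: error perceived or not.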
Accordingly, the system adjusts the robot's obstacle avoidance algorithm in response to the user's error-related potentials (ErrPs). To make this adjustment, the researchers implemented an IRL training algorithm that learns both the reward function and the optimal action from a set of demonstrations. As the manipulator approached an obstacle, the robot would attempt to avoid it; if the user anticipated that the robot would fail to do so, the ErrP signal detected by the BCI was used to adjust the reward function and the obstacle avoidance parameters.
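The feedback loop described above can be sketched as follows. This is a minimal caricature, not the authors' implementation: it assumes the reward trades a weight on obstacle clearance against a quadratic motion-effort cost, and nudges that weight upward whenever a simulated ErrP signals that the chosen motion looked unsafe to the user. All names, candidate values, and the update rule are illustrative assumptions.

```python
def choose_clearance(weight, candidates):
    """Pick the clearance whose reward is highest, where reward is
    (weight * clearance) minus a quadratic motion-effort cost."""
    return max(candidates, key=lambda c: weight * c - c ** 2)

def update_weight(weight, errp_detected, lr=0.5):
    """If the decoded ErrP says the chosen motion looked unsafe,
    increase the reward weight on clearance."""
    return weight + lr if errp_detected else weight

candidates = [0.1, 0.3, 0.5, 0.7]  # clearances (m) the planner may choose
weight = 0.2                       # initial reward weight on clearance

for demo in range(4):  # a handful of demonstrations, as in the paper
    clearance = choose_clearance(weight, candidates)
    # Simulated user: an ErrP fires whenever the clearance looks unsafe.
    errp = clearance < 0.5
    weight = update_weight(weight, errp)
```

After a few simulated demonstrations the weight has grown enough that the planner prefers a safe clearance and no further ErrPs fire, mirroring the paper's claim that preferences converge within a handful of demonstrations.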
In a set of experiments, the researchers found that their system could identify a user's reward function from as few as three demonstrations. They also noted that the approach was "robust to the natural variability and sub-optimal performance of the ErrP decoder," a useful property given that EEG sensing can be noisy.