Laughing as a sequential decision making problem

Posted on Mon 27 Jan 2014

When designing a laughing virtual agent, one needs to detect the context (multimodal analysis) and to synthesize laughter. Between these two modules sits a decision problem handled by a laugh manager: should the agent laugh, and if so, with what type of laughter? Moreover, what the agent does affects the human interacting with it, which modifies the context and calls for a new reaction from the agent. In other words, this decision problem takes place in a perception/action loop. Laughing can therefore be framed as a sequential decision making problem.

A standard machine learning approach for such a problem is reinforcement learning. In this paradigm, an agent learns to behave optimally in its environment by interacting with it. At each time step, given the current context, it chooses an action and applies it, resulting in a new context. The agent receives a reward for this transition, quantifying the local quality of the control. The ultimate goal of the agent is to maximize the cumulative reward over the long term (not just the next reward). While designing a reward is easy for some tasks (for example, if the agent learns to play a game, the reward could be +1 for winning, -1 for losing and 0 otherwise), it is a very hard problem for the task of laughing (which is quite subjective anyway). Even a human cannot make explicit the cumulative reward he or she is optimizing when laughing.
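To make the loop concrete, here is a minimal Python sketch of an agent interacting with a toy environment and of the cumulative (discounted) reward it collects. Everything in it is an assumption made for illustration: the two contexts ("neutral", "amused"), the two actions, the hand-written reward and the alternating dynamics are hypothetical and are not the model used in the project.

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative reward, where the reward at step t is weighted by gamma**t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


def step(context, action):
    """Hypothetical toy dynamics: returns (next_context, reward).

    Laughing when the human is amused is rewarded, laughing in a neutral
    context is penalized, and staying silent is neutral.
    """
    if action == "laugh":
        reward = 1.0 if context == "amused" else -1.0
    else:
        reward = 0.0
    # Assumption: the human simply alternates between the two contexts.
    next_context = "amused" if context == "neutral" else "neutral"
    return next_context, reward


def run_episode(policy, step_fn, initial_context, horizon=10):
    """Perception/action loop: observe the context, act, receive a reward."""
    context = initial_context
    rewards = []
    for _ in range(horizon):
        action = policy(context)
        context, reward = step_fn(context, action)
        rewards.append(reward)
    return rewards


def laugh_when_amused(context):
    return "laugh" if context == "amused" else "stay_silent"


def always_laugh(context):
    return "laugh"


for policy in (laugh_when_amused, always_laugh):
    rewards = run_episode(policy, step, "neutral", horizon=4)
    print(policy.__name__, discounted_return(rewards, gamma=0.5))
```

With these toy choices, the policy that laughs only when the human is amused gets a positive discounted return, while the one that always laughs gets a negative one. The hard part for real laughter is precisely that nobody can write the `step` reward by hand.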

Yet, it is easy to record expert behavior: it suffices to collect data from a human (replacing the virtual agent) interacting with other humans (playing their own role). Then, a natural approach is to use supervised learning (and more precisely classification) to generalize the mapping between contexts and actions recorded from the human. However, this does not take into account the sequential nature of the problem.
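As an illustration of this classification view, the sketch below imitates recorded behavior with a 1-nearest-neighbor classifier written in plain Python. The context features (two numbers between 0 and 1), the action labels and the demonstration data are all invented for the example; any standard classifier could play the same role.

```python
def nearest_neighbor_policy(demonstrations):
    """Build a policy from (context_features, action) pairs.

    The policy returns the action of the recorded context that is closest
    (in squared Euclidean distance) to the context it is queried with.
    """
    def policy(context):
        def squared_distance(pair):
            features, _ = pair
            return sum((a - b) ** 2 for a, b in zip(features, context))

        _, action = min(demonstrations, key=squared_distance)
        return action

    return policy


# Hypothetical demonstrations: (other speaker laughing, smile intensity) -> action.
demonstrations = [
    ((0.9, 1.0), "laugh"),
    ((0.8, 0.0), "smile"),
    ((0.1, 0.0), "stay_silent"),
]

policy = nearest_neighbor_policy(demonstrations)
print(policy((0.85, 0.9)))
print(policy((0.0, 0.1)))
```

Each context is classified independently of what came before and of what the chosen action will cause next, which is exactly the limitation mentioned above.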

To handle this issue, a principled approach is inverse reinforcement learning. Given some examples of expert behavior (the human laughing), the aim is to estimate the reward that the expert is optimizing. This allows imitating the human, by optimizing the learnt reward, but it could also provide some information on the principles underlying laughter, by semantically analyzing the learnt reward. However, inverse reinforcement learning is a very hard problem (for various reasons) and is much less studied than supervised learning. A contribution of the Ilhaire project has been to draw connections between supervised learning and inverse reinforcement learning, leading to new algorithms advancing the state of the art and motivated by the laughing problem [1-4].
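To give a rough idea of what "estimating a reward from demonstrations" can look like, here is a deliberately crude Python sketch. It is not one of the algorithms of [1-4]: it assumes a reward that is linear in hand-picked features and sets the weights to the difference between the discounted feature expectations of the expert and those of some baseline behavior, a simple feature-matching heuristic. The features, contexts and trajectories are hypothetical.

```python
def features(context, action):
    """Hypothetical features describing a (context, action) pair."""
    laughs = action == "laugh"
    return [
        1.0 if laughs and context == "amused" else 0.0,   # laughing with the human
        1.0 if laughs and context == "neutral" else 0.0,  # laughing out of place
    ]


def feature_expectation(trajectories, feature_fn, gamma=0.9):
    """Average over trajectories of the discounted sum of feature vectors."""
    dim = len(feature_fn(*trajectories[0][0]))
    mu = [0.0] * dim
    for trajectory in trajectories:
        for t, (context, action) in enumerate(trajectory):
            phi = feature_fn(context, action)
            for i in range(dim):
                mu[i] += (gamma ** t) * phi[i]
    return [m / len(trajectories) for m in mu]


contexts = ["neutral", "amused", "neutral", "amused"]
# Expert demonstration: laughs only when the human is amused.
expert = [[(c, "laugh" if c == "amused" else "stay_silent") for c in contexts]]
# Baseline behaviors to compare against: always laughing and never laughing.
baseline = [
    [(c, "laugh") for c in contexts],
    [(c, "stay_silent") for c in contexts],
]

mu_expert = feature_expectation(expert, features, gamma=0.5)
mu_baseline = feature_expectation(baseline, features, gamma=0.5)
# Features the expert visits more often than the baseline get a positive weight.
weights = [e - b for e, b in zip(mu_expert, mu_baseline)]


def learnt_reward(context, action):
    """Estimated reward: linear in the features, with the weights found above."""
    return sum(w * f for w, f in zip(weights, features(context, action)))


print(weights)
print(learnt_reward("amused", "laugh"), learnt_reward("neutral", "laugh"))
```

On this toy data the estimated reward is positive for laughing when the human is amused and negative for laughing in a neutral context, and the weights can be read directly, which hints at how a learnt reward might be analyzed.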

[1] E. Klein, M. Geist, B. Piot and O. Pietquin. “Inverse Reinforcement Learning through Structured Classification”. In Advances in Neural Information Processing Systems (NIPS), 2012.

[2] E. Klein, B. Piot, M. Geist and O. Pietquin. “A cascaded supervised learning approach to inverse reinforcement learning”. In European Conference on Machine Learning (ECML), 2013.

[3] B. Piot, M. Geist and O. Pietquin. “Learning from demonstrations: Is it worth estimating a reward function?”. In European Conference on Machine Learning (ECML), 2013.

[4] B. Piot, M. Geist and O. Pietquin. “Boosted and Reward-regularized Classification for Apprenticeship Learning”. In International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.