Multimodal motion synthesis of laughter

Posted on Fri 27 Jun 2014
Last updated 30 Jun 2014
Yu Ding, Jing Huang, Catherine Pelachaud
Telecom-ParisTech, CNRS-LTCI

Laughter movements are highly rhythmic and show saccadic patterns. To capture these motion characteristics, we developed a statistical approach that reproduces high-frequency movements such as shaking and trembling. We used coupled HMMs, which are designed to model multiple interdependent streams of observations. The coupled model can capture the relationship between modalities, in particular between head and torso motions. Our statistical model takes as input pseudo-phoneme sequences and acoustic features of the laughter sound; it outputs the head and torso animations of the virtual agent as well as facial expressions. During training, the model captures not only the relation between input and output features but also the relation between head and torso movements.
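The key property of the coupled model is that each modality's next hidden state depends on the previous states of both modalities. This can be sketched as two coupled Markov chains; the state counts, sequence length, and random parameters below are purely illustrative, not those of the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_head, n_torso, T = 3, 3, 50   # hypothetical state counts and sequence length

def random_coupled_transitions(n_self, n_other, rng):
    """Transition tensor A[s_self, s_other, s_next]: the next state of one
    chain is conditioned on the previous states of BOTH chains (the coupling)."""
    A = rng.random((n_self, n_other, n_self))
    return A / A.sum(axis=-1, keepdims=True)   # normalise rows to probabilities

A_head = random_coupled_transitions(n_head, n_torso, rng)
A_torso = random_coupled_transitions(n_torso, n_head, rng)

# Sample one joint state sequence for the two modalities
head, torso = [0], [0]
for t in range(1, T):
    head.append(rng.choice(n_head, p=A_head[head[-1], torso[-1]]))
    torso.append(rng.choice(n_torso, p=A_torso[torso[-1], head[-2]]))
```

In a full coupled HMM each chain would also emit observations (here, head and torso motion features) from its hidden state; the sketch only shows the cross-chain transition structure that distinguishes it from two independent HMMs.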

The animation synthesis model was trained on human motion data from two subjects, comprising 205 laugh sequences and 25,625 frames in total. Human data from another subject, containing 54 laugh sequences and 6,750 frames, was held out for validation. Objective and subjective evaluations were conducted to validate the proposed model. The objective evaluation tested whether the coupled HMM captures the relationship between multiple modalities, and measured the similarity between synthesized and real motions over three features: the main frequency of the signal, its amplitude, and its energy. Two perceptual studies were also conducted: the first compared motions produced by the two statistical models (HMM vs. coupled HMM); the second, mirroring the objective evaluation, compared synthesized and real motions. Both evaluations showed that our model captures the dynamics of laughter movements, although it does not surpass animation derived from human data.
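The exact implementation of the similarity measures is not given, but the three features could be extracted from a 1-D motion signal along these lines (a hypothetical sketch assuming NumPy and a known sampling rate; `motion_features` is an illustrative name, not from the paper):

```python
import numpy as np

def motion_features(x, fs):
    """Hypothetical extraction of the three similarity features used in the
    objective evaluation: main frequency (Hz), amplitude, and energy of a
    1-D motion signal x sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                # remove DC offset before the FFT
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    main_freq = freqs[np.argmax(spectrum[1:]) + 1]  # dominant bin, skipping 0 Hz
    amplitude = (x.max() - x.min()) / 2.0           # half peak-to-peak
    energy = float(np.mean(x ** 2))                 # mean signal power
    return main_freq, amplitude, energy

# Example: a synthetic 5 Hz "head shake" sampled at 100 Hz for 2 s
t = np.arange(0, 2, 1.0 / 100)
f, a, e = motion_features(0.3 * np.sin(2 * np.pi * 5 * t), fs=100)
```

Comparing these per-feature values between a synthesized and a real motion signal gives a simple per-sequence similarity score of the kind described above.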