Real-time acoustic laughter synthesis
To enable convenient control over the synthesis, the algorithm that automatically generates phonetic transcriptions from intensity curves has been refined, ported to real-time operation, and integrated into a Graphical User Interface built with Pure Data. The user can thus move an intensity slider and listen to the synthesized laugh in real time. The generation algorithm receives the intensity values point by point and selects laughter syllables one by one using a unit-selection approach (Urbain, Cakmak, Charlier, Denti, & Dutoit, 2014).
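The point-by-point selection idea can be sketched as follows. This is a minimal illustration, not the actual ILHAIRE algorithm: the syllable labels, intensity values, and cost terms below are all hypothetical, whereas a real unit-selection system would use richer target and join costs over an annotated laughter database.

```python
# Hypothetical syllable database: (label, mean intensity) pairs.
SYLLABLES = [("ha_soft", 0.2), ("ha_mid", 0.5), ("ha_loud", 0.8),
             ("hi_mid", 0.45), ("grunt", 0.1)]

def select_syllable(intensity, previous=None, repeat_penalty=0.15):
    """Pick the syllable whose mean intensity best matches the target,
    with a small join penalty for repeating the previous unit."""
    def cost(unit):
        label, mean_int = unit
        target_cost = abs(intensity - mean_int)
        join_cost = repeat_penalty if label == previous else 0.0
        return target_cost + join_cost
    return min(SYLLABLES, key=cost)[0]

# Simulated real-time loop: one intensity value in, one syllable out.
prev = None
transcription = []
for value in [0.1, 0.5, 0.8, 0.8, 0.4]:
    prev = select_syllable(value, prev)
    transcription.append(prev)
print(transcription)  # ['grunt', 'ha_mid', 'ha_loud', 'ha_loud', 'hi_mid']
```

Because each decision uses only the current intensity value and the previously chosen unit, the loop can run incrementally as the slider moves, which is what makes the approach suitable for real-time control.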
The MAGE platform (Astrinaki, D'Alessandro, Reboursière, Moinet, & Dutoit, 2013) has been used to perform real-time HMM-based laughter synthesis, driven by the phonetic transcriptions received in real time from the generation algorithm. MAGE implements HMM-based synthesis techniques similar to those of HTS, which was previously used in ILHAIRE for offline laughter synthesis. The main difference introduced by MAGE is real-time processing: the waveform is computed piece by piece, without knowledge of future input. This poses a significant challenge for optimizing the parameter trajectories and ensuring a smooth, coherent output. After tuning the method for laughter, we were able to synthesize laughs in real time using MAGE. MAGE also offers real-time control over the synthesis parameters, such as pitch. A demonstration video is presented below (Video 1). However, the synthesis quality is lower than in the offline (HTS) case, as confirmed by perceptual evaluations conducted at the University of Mons.
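The trajectory-optimization constraint can be illustrated with a toy example. Offline HTS-style generation can smooth a parameter trajectory using the whole utterance, past and future; a streaming synthesizer must emit each frame using past frames only. The sketch below uses a simple one-pole (exponential moving average) filter as a stand-in for causal smoothing; MAGE's actual parameter-generation machinery is more sophisticated, and the pitch values are invented for illustration.

```python
def smooth_stream(frames, alpha=0.3):
    """Yield smoothed parameter values one by one, causally:
    each output depends only on the frames seen so far."""
    state = None
    for value in frames:
        state = value if state is None else alpha * value + (1 - alpha) * state
        yield state

# Toy pitch targets (Hz) arriving frame by frame, e.g. a step from a
# low-intensity to a high-intensity laughter syllable and back.
pitch_targets = [100.0, 100.0, 180.0, 180.0, 120.0]
smoothed = [round(v, 1) for v in smooth_stream(pitch_targets)]
print(smoothed)  # [100.0, 100.0, 124.0, 140.8, 134.6]
```

Note how the causal smoother lags behind the targets: it cannot anticipate the step at frame 3 the way an offline optimizer, which sees the entire trajectory, could. This lag versus smoothness trade-off is one face of the challenge mentioned above.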
Hence, it was decided to explore another synthesis technique, laughter concatenation, which is known for better audio quality than HMM-based synthesis but is also less flexible. With concatenation, instead of passing the phonetic transcriptions from the generation algorithm to a synthesizer, we directly play the recorded sound of the syllable selected by the generation algorithm. An example video is displayed below (Video 2). The audio quality is indeed better than with HMM-based synthesis, at the cost of a lack of flexibility. For instance, changing the pitch in real time is difficult with this technique and would degrade the audio quality. Repeated sounds can also occur, which may become disturbing over longer interactions.
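A basic concern when playing recorded units back to back is the audible click that a discontinuity at the joint can produce; a common remedy is a short crossfade at unit boundaries. The sketch below shows this on toy sine-wave "syllables"; the durations, frequencies, and overlap length are arbitrary, and the real system works on recorded laughter audio rather than synthetic tones.

```python
import math

def tone(freq, dur, sr=16000):
    """Toy 'syllable': a short sine burst at the given frequency."""
    return [math.sin(2 * math.pi * freq * n / sr) for n in range(int(dur * sr))]

def crossfade_concat(a, b, overlap=160):
    """Join two units with a linear crossfade over `overlap` samples,
    so the boundary does not produce an audible click."""
    out = a[:-overlap] if overlap else list(a)
    for i in range(overlap):
        w = i / overlap
        out.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    out.extend(b[overlap:])
    return out

ha = tone(220, 0.05)  # 800 samples at 16 kHz
he = tone(330, 0.05)  # 800 samples
laugh = crossfade_concat(ha, he)
print(len(laugh))  # 800 + 800 - 160 = 1440
```

Crossfading preserves the natural quality of the recordings at the joins, but it does not add any control over pitch or timing, which is why this approach trades flexibility for audio quality.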
The choice between the two synthesis techniques remains open, and which method is preferable depends on the application (e.g., whether flexibility is important). Further research is currently being conducted at UMONS to improve and evaluate both techniques.
Astrinaki, M., D'Alessandro, N., Reboursière, L., Moinet, A., & Dutoit, T. (2013). MAGE 2.0: New features and its application in the development of a talking guitar. 13th International Conference on New Interfaces for Musical Expression (NIME'13). Daejeon and Seoul, South Korea.
Urbain, J., Cakmak, H., Charlier, A., Denti, M., & Dutoit, T. (2014). Arousal-driven synthesis of laughter. IEEE Journal of Selected Topics in Signal Processing, 8(2), 273-284.