Laughter Databases

Posted on Wed 02 May 2012 by Jerome Urbain
Last update 03-05-2012

To analyse, characterise and model laughter, samples are necessary. Gathering natural laughter samples is a tricky task: laughter is an emotional signal and its realisation is strongly influenced by our feelings. Hence, natural laughter can only be obtained in realistic situations; simply asking people to laugh in front of cameras does not make it possible to gather the wide variety of laughs we encounter every day.

Three main techniques have been used to build databases in the fields of emotion and laughter research (Scherer, 2003), from the most natural to the most convenient:

  • Natural expression: data is collected from the real world, with the subjects free to express themselves and, ideally, not aware that they are being recorded until the end of the data acquisition. A popular setting for emotion recognition is the use of data collected in call centres (Morrison, Wang, & De Silva, 2007) (Devillers & Vidrascu, 2007). The big advantage of these techniques is the naturalness of the data. However, a lot of post-processing is needed to segment the laughter utterances, which are surrounded by many other signals, and it is difficult to ensure high-quality recordings in such “hidden” settings.
  • Induced responses: subjects are presented with a stimulus (picture, video, vocal information, etc.) chosen to elicit a target emotion (happiness, fear, etc.). For example, laughter can be induced by showing a comedy video. Participants may be aware that they are being recorded, but everything is done to provoke natural reactions. This kind of setting makes it possible to keep good control over the quality of the acquired signals, e.g. by using head-mounted microphones, frontal camera views, etc. However, the scenario must be carefully designed to ensure that participants somehow forget they are being recorded and act naturally.
  • Portrayed emotions: actors, professional or not, are directly asked to portray the emotional state or, in our case, to laugh. This kind of data is the easiest to work with, as the annotation phase is quicker and everything can be done to acquire high-quality signals (audio, video, etc.). The drawback is the lack of naturalness.

While laughter can be found in virtually any database involving interacting humans, some databases have been recorded with particular attention to laughter (during the recordings or the post-processing phases). We will only mention some of these “laughter-related” databases here. Further laughter data will be recorded in the framework of the ILHAIRE project to overcome some of the limitations of the existing databases. According to the ILHAIRE objectives, we ideally need laughter data that is abundant, accurately annotated (not only must laughs be accurately spotted, we also need information about the context that elicited and sustained each laugh), multimodal (high-quality audio, high-quality video of facial movements, body movements, respiration signal, etc.), multi-user (among other reasons, to study contagion), spontaneous and multi-cultural. No existing database satisfies all these requirements, and it is indeed unrealistic to expect a single corpus to fulfil all the conditions. When appropriate data for our research does not already exist, new data will be recorded by ILHAIRE participants, in line with the scenarios we are targeting in the project, while trying to make this new data as profitable as possible for the whole scientific community.

  1. The ICSI Meeting Corpus

    In 2000, the International Computer Science Institute (ICSI) of Berkeley launched a project to record a large speech database from meetings, called the ICSI Meeting Corpus. The purpose was to obtain speech that was as natural as possible, so they chose to record only meetings that would have occurred anyway (Janin, et al., 2003). All the meetings took place in a meeting room of their lab, where they would have taken place even if the ICSI Meeting Corpus project had not existed. The only, though important, unnatural constraint the project imposed on the meetings was the use of head-mounted microphones, in order to ease speech activity detection and obtain high-quality speech transcriptions, and also to avoid penalising non-acoustic research, such as dialogue structure analysis, with poor acoustic signals (Janin, et al., 2003). As a consequence, all the subjects knew they were being recorded.

    The meetings involved 3 to 10 participants, with an average of 6 (Janin, et al., 2004). In total, 72 hours of meetings were recorded, involving 53 different participants. A huge effort was made to annotate the data. For each meeting, there is a full speech transcription, with beginning and ending times for each utterance. Laughter occurrences are also included in the transcriptions. The corpus is available from the Linguistic Data Consortium (LDC).

    Laughter processing was not the initial purpose of the ICSI Meeting Corpus, and laughter was not the event that received the most attention. However, thanks to the quality of the database, which was recorded in a natural environment and presents numerous laughter episodes in all their forms, it became a standard for laughter processing. Some of the groups using the ICSI Meeting Corpus for laughter processing carried out additional annotation work, to keep only clearly audible laughs (Truong & van Leeuwen, 2007) or to localise the boundaries of the laughter segments with more accuracy (Laskowski & Burger, 2007).
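
    As an illustration of how such time-aligned transcriptions can be exploited for laughter research, here is a minimal sketch that collects laughter segments from an annotation file. The tab-separated layout (start time, end time, speaker, label) and the file name are assumptions made for the example, not the actual ICSI distribution format.

    ```python
    # Minimal sketch: collect laughter segments from a time-aligned transcription.
    # The "start \t end \t speaker \t label" layout and the file name are assumed
    # for illustration; the real ICSI transcriptions use their own formats.
    import csv

    def load_laughter_segments(path, laughter_tag="laugh"):
        """Return (speaker, start, end, duration) tuples for laughter events."""
        segments = []
        with open(path, newline="", encoding="utf-8") as f:
            for start, end, speaker, label in csv.reader(f, delimiter="\t"):
                if laughter_tag in label.lower():
                    start, end = float(start), float(end)
                    segments.append((speaker, start, end, end - start))
        return segments

    if __name__ == "__main__":
        laughs = load_laughter_segments("Bmr021_transcript.tsv")  # hypothetical file
        total = sum(duration for *_, duration in laughs)
        print(f"{len(laughs)} laughter segments, {total:.1f} s in total")
    ```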

  2. The AMI Meeting Corpus

    The AMI Meeting Corpus (Carletta, 2007) consists of 100 hours of meeting recordings. One third consists of naturally occurring lab meetings. The remaining two thirds were elicited by a role-playing game in which participants had to take on different roles in a team project. While this differs from the setting of the ICSI Meeting Corpus, it has little influence on laughter naturalness.

    The recordings include synchronized audio (individual and far-field microphones) and video (individual and room-view cameras). All of the 138 role-playing meetings involved 4 participants. Of the 33 naturally occurring meetings, 25 also involved 4 participants, 5 involved 3 conversationalists and the last 3 involved 5 participants.

    The database is freely available for non-commercial purposes from the AMI website. The signals are provided with a range of annotations: some recordings include annotations of dialogue acts, emotions, actions, gestures, etc. The vocal activity of each participant in each meeting has also been manually transcribed. These transcriptions include the speech as it was uttered by the speaker (with grammatical errors, hesitations, etc.) as well as non-verbal vocalisations such as laughter and coughs.

  3. TWSES corpus

    In the framework of the artistic installation “The world starts every second” (TWSES) (Lafontaine & Todoroff, 2007), voluntary laughs from children and professional singers were recorded. Some singers portrayed different states of mind such as “lover laugh”, “hysteric laugh”, “obsessional laugh”, etc. Some are obviously exaggerated, corresponding to stereotypes of what we consider as “free laughter”. They are clearly different from spontaneous laughs, but they have a strong power to elicit laughter in their listeners. The recordings include only audio.

  4. SEMAINE database

    Laughter occurs frequently in the SEMAINE SAL recordings (McKeown, Valstar, Cowie, Pantic, & Schroder, 2011), in which users converse with either an operator-driven avatar or a limited automated avatar designed to elicit particular emotions. Each file of the SAL database has been labelled with emotional states by 6 to 8 annotators, providing information about the emotional states leading to and following laughter. The database contains audio-visual recordings of the participants, captured with frontal cameras and head-mounted microphones. It is freely available to the research community.

  5. The Belfast Induced Natural Emotion Database

    The Belfast Induced Natural Emotion Database (BINED) (Sneddon, McRorie, McKeown, & Hanratty, 2011) contains recordings of the natural reactions of subjects participating in different tasks designed to elicit five emotional states: frustration, surprise, fear, disgust and amusement. Laughter frequently appeared in all these tasks. The database is freely available for research.

  6. The Green Persuasive Database

    The Green Persuasive Database consists of recordings of conversations between a persuader and a person whom the persuader tries to convince to adopt more ecological behaviours. Each conversational partner is recorded with a separate camera. As in any natural conversation, there are laughter occurrences in the dataset. This database is also freely available for research.

  7. The AVLaughterCycle database

    The AVLaughterCycle (AVLC) database (Urbain, et al., 2010) was recorded during the eNTERFACE'09 workshop. Its objective was to obtain good-quality laughs: 24 subjects were filmed while watching a 12-minute comedy video. The recorded signals consist of audio, frontal video (webcam) and facial motion-tracking data. The obtained laughter utterances have been segmented and phonetically annotated (Urbain & Dutoit, 2011). The database is freely available for research purposes.
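
    To give an idea of how the phonetic annotations of such a corpus can be used, the sketch below tallies phone labels and their durations over a directory of annotated laughs. It assumes one plain-text file per laugh with “start end label” lines; this layout and the directory name are illustrative assumptions, not the actual AVLC annotation format.

    ```python
    # Minimal sketch: per-label duration statistics over phonetically annotated laughs.
    # Assumes one "start end label" line per annotated segment; this layout and the
    # directory name are assumptions for illustration, not the actual AVLC format.
    from collections import defaultdict
    from pathlib import Path

    def phone_statistics(annotation_dir):
        """Return {label: (occurrence_count, total_duration_in_seconds)}."""
        counts = defaultdict(int)
        durations = defaultdict(float)
        for path in Path(annotation_dir).glob("*.txt"):
            for line in path.read_text(encoding="utf-8").splitlines():
                if not line.strip():
                    continue  # skip empty lines
                start, end, label = line.split(maxsplit=2)
                counts[label] += 1
                durations[label] += float(end) - float(start)
        return {label: (counts[label], durations[label]) for label in counts}

    if __name__ == "__main__":
        for label, (n, dur) in sorted(phone_statistics("avlc_annotations").items()):
            print(f"{label:>12}: {n:4d} occurrences, {dur:7.2f} s")
    ```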

  8. The MAHNOB laughter database

    The MAHNOB laughter database is similar to the AVLC corpus: it was also recorded specifically to acquire laughter data, with subjects filmed alone while watching a funny video. The signals include audio, video and thermal video recordings. The subjects were additionally asked to speak, which provides interesting data for studying the relationship between a person's laughter and speech styles/features.

  9. Other databases including laughter

    Since laughter occurs frequently in everyday situations, most speech databases contain laughs. We can cite the Corpus of Spontaneous Japanese (Maekawa, Koiso, Kikuchi, & Yoneyama, 2003), which contains around 650 hours of spontaneous speech. This huge audio database has been transcribed, with labels denoting the presence of laughter but no time boundaries. Professor Nick Campbell has also been involved in several extensive recordings of spontaneous speech and has shown interest in spotting laughter inside his large corpora (Campbell’s website). Among others, there are 20 hours of telephone conversations in Japanese between 8 pairs of volunteers, accounting for 2001 laughter and 1129 speech-laugh utterances (Campbell, 2007).

    For his observations of laughs, Wallace Chafe (Chafe, 2007) used excerpts of the Santa Barbara Corpus of Spoken American English, which contains 60 recordings of discourse segments in a range of everyday situations (talking about studies, preparing dinner, business conversations, etc.). The corpus contains transcriptions of the audio files, including labels identifying laughter.

    John Esling (Esling, 2007) used samples from the University of Victoria Larynx Research Project, which includes nasoendoscopic videos of the larynx, to analyse the states of the larynx in laughter. Devillers and Vidrascu (Devillers & Vidrascu, 2007) were interested in the emotions conveyed by laughter and used 20 hours of telephone conversations in a call centre providing medical advice. Verbal and non-verbal contents such as laughs or tears were manually annotated. More than half of the 119 laughter utterances in this corpus were related to negative emotions.

    Dedicated laughter databases have also been recorded to study specific aspects of laughter. To analyse laughter acoustics, Bachorowski et al. (Bachorowski, Smoski, & Owren, 2001) enrolled 139 students and had them watch videos containing humorous sequences, either alone or with a partner. Laughs from 97 individuals (52 females, 45 males) were kept for the acoustic analyses, for a total of 559 female and 465 male laughter bouts (i.e. laughter exhalation segments). To identify acoustic correlates of different emotions in laughter, Szameitat et al. (Szameitat, Alter, Szameitat, Wildgruber, Sterr, & Darwin, 2009) asked 8 professional actors to portray laughs for 4 different affective states, namely joyous, taunting, tickling and schadenfreude (a German word meaning “pleasure in another's misfortune”) laughs. Kipper and Todt (Kipper & Todt, 2007) induced laughter while subjects were reading by playing their own voice back to them with a 200 ms delay.

Bibliography

  • Bachorowski, J.-A., Smoski, M. J., & Owren, M. J. (2001). The acoustic features of human laughter. Journal of the Acoustical Society of America, 1581-1597.
  • Campbell, N. (2007). Whom we laugh with affects how we laugh. Proceedings of the Interdisciplinary Workshop on the Phonetics of Laughter, (pp. 61-65). Saarbrücken, Germany.
  • Campbell, N. (n.d.). Nick's Data website. Retrieved June 5, 2011, from http://www.speech-data.jp/
  • Carletta, J. (2007). Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus. Language Resources and Evaluation, 41(2), 181-190.
  • Chafe, W. (2007). The Importance of not being earnest. The feeling behind laughter and humor. (Paperback 2009 ed., Vol. 3). Amsterdam, The Netherlands: John Benjamins Publishing Company.
  • Devillers, L., & Vidrascu, L. (2007). Positive and negative emotional states behind the laughs in spontaneous spoken dialogs. Proceedings of the Interdisciplinary Workshop on the Phonetics of Laughter, (pp. 37-40). Saarbrücken, Germany.
  • Esling, J. H. (2007). States of the larynx in laughter. Proceedings of the Interdisciplinary Workshop on the Phonetics of Laughter, (pp. 15-20). Saarbrücken, Germany.
  • Janin, A., Ang, J., Bhagat, S., Dhillon, R., Edwards, J., Macias-Guarasa, J., et al. (2004). The ICSI Meeting Project: Resources and Research. NIST ICASSP 2004 Meeting Recognition Workshop. Montreal, Canada.
  • Janin, A., Baron, D., Edwards, J., Ellis, D., Gelbart, D., Morgan, N., et al. (2003). The ICSI Meeting Corpus. 2003 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'03), (pp. I-364). Hong Kong.
  • Kipper, S., & Todt, D. (2007). Series of similar vocal elements as a crucial acoustic structure in human laughter. Proceedings of the Interdisciplinary Workshop on the Phonetics of Laughter, (pp. 3-7). Saarbrücken.
  • Lafontaine, M.-J., & Todoroff, T. (2007). The world starts every second. Artistic installation held at the Musée des Beaux-Arts, Angers, France.
  • Laskowski, K., & Burger, S. (2007). On the Correlation between Perceptual and Contextual Aspects of Laughter in Meetings. Proceedings of the Interdisciplinary Workshop on the Phonetics of Laughter, (pp. 55-60). Saarbrücken, Germany.
  • Maekawa, K., Koiso, H., Kikuchi, H., & Yoneyama, K. (2003). Use of a large-scale spontaneous speech corpus in the study of linguistic variation. Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS 2003), (pp. 643-646).
  • McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2011). The SEMAINE database: Annotated multimodal records of emotionally coloured conversations between a person and a limited agent. IEEE Transactions on Affective Computing.
  • Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 98-112.
  • Scherer, K. (2003). Vocal communication of emotion: a review of research paradigms. Speech Communication, 227-256.
  • Sneddon, I., McRorie, M., McKeown, G., & Hanratty, J. (2011). The Belfast Induced Natural Emotion Database. IEEE Transactions on Affective Computing.
  • Szameitat, D. P., Alter, K., Szameitat, A. J., Wildgruber, D., Sterr, A., & Darwin, C. J. (2009). Acoustic profiles of distinct emotional expressions in laughter. (ASA, Ed.) The Journal of the Acoustical Society of America, 126(1), 354-366.
  • Truong, K. P., & van Leeuwen, D. A. (2007). Automatic discrimination between laughter and speech. Speech Communication, 144-158.
  • Urbain, J., & Dutoit, T. (2011). A phonetic analysis of natural laughter, for use in automatic laughter processing systems. Affective Computing and Intelligent Interaction, (pp. 397-406). Memphis, Tennessee, USA.
  • Urbain, J., Bevacqua, E., Dutoit, T., Moinet, A., Niewiadomski, R., Pelachaud, C., et al. (2010). AVLaughterCycle: Enabling a virtual agent to join in laughing with a conversational partner using a similarity-driven audiovisual laughter animation. Journal on Multimodal User Interfaces, 47-58.