Group conversations in Noisy environments (GiN) – Multimedia recordings for location-aware speech enhancement

Published in IEEE Open Journal of Signal Processing (OJSP), 2023

Authors: Emilie d’Olne, Alastair H. Moore, Patrick A. Naylor, Jacob Donley, Vladimir Tourbabin, Thomas Lunner.

Recent years have seen growing interest in using smart glasses fitted with microphones to solve the cocktail party problem through beamforming or machine learning. Many such approaches could bring substantial advances in hearing aid or augmented reality (AR) research. To validate these methods, the EasyCom [1] dataset introduced high-quality multi-modal recordings of conversations in noise, including egocentric multi-channel microphone array audio, speech source pose, and headset microphone audio. While comprehensive, EasyCom lacks diversity both in the acoustic environments considered and in the degree of overlapping speech in conversations. This work therefore presents the Group in Noise (GiN) dataset, comprising over 2 hours of group conversations in noisy environments recorded using binaural microphones and a pair of glasses fitted with 5 microphones. The recordings took place in 3 rooms and involve 6 seated participants as well as a standing facilitator. The data also include close-talking microphone audio and head-pose data for each speaker, an audio channel from a fixed reference microphone, and automatically annotated speaker activity information. A baseline method is used to demonstrate the use of the data for speech enhancement. The dataset is publicly available at [2].

References

[1]  J. Donley et al., “EasyCom: An augmented reality dataset to support algorithms for easy communication in noisy environments,” arXiv preprint arXiv:2107.04174, 2021.

[2]  E. d’Olne, A. H. Moore, P. A. Naylor, J. Donley, V. Tourbabin, and T. Lunner, “Group in Noise (GiN) data - 2023,” doi: https://doi.org/10.14469/hpc/13463.