Direction-Aware Room Impulse Response Estimation for Immersive Audio Rendering in Real Environments

Barbon Junior, Sylvio; Carini, Alberto
2025-01-01

Abstract

Evolving multimedia systems are increasingly being adopted in virtual reality and gaming applications. Such systems emphasize immersion to engage users by bridging the gap between real and virtual content. In this context, visual and acoustic stimuli are the two key media that dictate such immersion. While visual 3D rendering is advancing rapidly, the same is not true for audio, where most research is limited to the reconstruction of the room impulse response (RIR) using omnidirectional audio or, at best, binaural audio. Such methods do not adequately account for the directions and orientations of the acoustic signals with respect to either the source or the listener, thereby compromising immersion quality. In this work, we explore the effect of adding such "directionality" to the training data to improve the estimation of the room's acoustic parameters. A more accurate set of such parameters in fact implies a more realistic predicted RIR, leading to a more immersive experience of the acoustic scene. Specifically, we propose a novel framework driven by a suitable loss function to account for directionality in ambisonic microphones, and novel variants of loss functions for both the omnidirectional and ambisonic cases. We also propose to account for microphone characteristics and their contribution to the predicted RIRs. Experiments were performed using two datasets of real recordings, and the results established the efficacy of the proposed methods.
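The paper's actual loss functions are not given on this page. Purely as an illustration of the idea of a direction-aware loss over ambisonic RIRs, the sketch below compares predicted and reference first-order ambisonic RIRs (channels W, X, Y, Z) with a per-channel MSE that up-weights the directional channels; the function name, channel weighting, and sample values are all hypothetical assumptions, not the authors' method.

```python
import numpy as np

def ambisonic_rir_loss(pred, ref, directional_weight=2.0):
    """Weighted per-channel MSE between two first-order ambisonic RIRs.

    pred, ref: arrays of shape (4, n_samples) in ACN order (W, X, Y, Z).
    directional_weight: hypothetical extra weight on the directional
    channels X, Y, Z to emphasize directional accuracy over the
    omnidirectional W channel.
    """
    weights = np.array([1.0,
                        directional_weight,
                        directional_weight,
                        directional_weight])
    per_channel_mse = np.mean((pred - ref) ** 2, axis=1)  # one MSE per channel
    return float(np.sum(weights * per_channel_mse) / np.sum(weights))

# Toy usage: a reference RIR and a noisy "prediction" of it.
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 4800))
pred = ref + 0.1 * rng.standard_normal((4, 4800))
loss = ambisonic_rir_loss(pred, ref)
```

A perfect prediction yields zero loss, while errors on X, Y, or Z are penalized more heavily than errors on W, which is one simple way a training objective could be made sensitive to directionality.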
Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3115681