Direction-Aware Room Impulse Response Estimation for Immersive Audio Rendering in Real Environments
Barbon Junior, Sylvio; Carini, Alberto
2025-01-01
Abstract
Evolving multimedia systems are increasingly being adopted in virtual reality and gaming applications. Such systems emphasize immersion to engage users by bridging the gap between real and virtual content. In this context, visual and acoustic stimuli are the two key media that dictate such immersion. While visual 3D rendering is advancing rapidly, the same is not true for audio, where most research is limited to the reconstruction of the room impulse response (RIR) from omnidirectional or, at best, binaural audio. Such methods do not adequately account for the directions and orientations of the acoustic signals with respect to either the source or the listener, thereby compromising immersion quality. In this work, we explore the effect of adding such "directionality" to the training data to improve the estimation of a room's acoustic parameters. A more accurate set of such parameters in fact implies a more realistic predicted RIR, leading to a more immersive rendering of the acoustic scene. Specifically, we propose a novel framework driven by a loss function designed to account for directionality in ambisonic microphones, along with novel variants of loss functions for both the omnidirectional and ambisonic cases. We also propose to account for microphone characteristics and their contribution to the predicted RIRs. Experiments were performed on two datasets of real recordings, and the results establish the efficacy of the proposed methods.
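To make the idea of a direction-aware loss concrete, the sketch below shows one way such a loss could be expressed over first-order ambisonic (B-format) RIRs, where the W channel is omnidirectional and the X, Y, Z channels carry directional information. The abstract does not specify the paper's actual loss, channel weighting, or training setup; the function name, the dir_weight parameter, and the plain MSE form are illustrative assumptions, not the authors' method.

    # Hypothetical sketch of a direction-aware RIR loss over first-order
    # ambisonic (B-format) signals. Channel weighting and the MSE form
    # are assumptions for illustration; the paper's loss may differ.
    import numpy as np

    def directional_rir_loss(pred: np.ndarray, target: np.ndarray,
                             dir_weight: float = 1.0) -> float:
        """MSE between predicted and target B-format RIRs.

        pred, target: arrays of shape (4, T) holding the ambisonic
        channels (W, X, Y, Z); W is omnidirectional, X/Y/Z encode
        direction. dir_weight scales the penalty on the directional
        channels so the loss can emphasize orientation errors with
        respect to the source/listener.
        """
        w_err = np.mean((pred[0] - target[0]) ** 2)      # omni channel
        xyz_err = np.mean((pred[1:] - target[1:]) ** 2)  # directional channels
        return float(w_err + dir_weight * xyz_err)

    # Toy usage: random 4-channel RIRs of 4800 samples (0.1 s at 48 kHz).
    rng = np.random.default_rng(0)
    target = rng.standard_normal((4, 4800))
    pred = target + 0.1 * rng.standard_normal((4, 4800))
    print(directional_rir_loss(pred, target, dir_weight=2.0))

Weighting the X/Y/Z error separately from the W error lets training penalize orientation mismatches more heavily than overall energy errors, which is the intuition behind adding directionality to the training signal.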


