A Real-Time Spatial Sound Processor for Music and Virtual Reality Applications
Jean-Marc Jot, Olivier Warusfel
ICMC 95, Banff (Canada) 1995
Copyright © Ircam - Centre Georges-Pompidou 1995
Abstract
The Spatialisateur, developed by Espaces Nouveaux and Ircam,
is a real-time spatial processor which makes it possible to reproduce and
control the localization of sound sources and the projection of sounds in a
real or virtual space. It can be configured for various reproduction formats
over loudspeakers or headphones, and controlled through a higher-level user
interface based on perceptual attributes derived from psychoacoustical
research. Applications include studio recording and computer music, virtual
reality and multimedia, and variable acoustics in rooms (sound reinforcement
and reverberation enhancement).
Introduction
The Spatialisateur was developed in the Max object-oriented software
environment (Puckette 1991), and is implemented as a Max object (named
Spat~) running in real-time on the Ircam Musical Workstation.
Spat~ can also be considered as a library of elementary objects for
real-time spatial processing of sounds (artificial reverberators, multichannel
panning potentiometers, parametric equalizers, etc.). This modularity allows
one to configure the spatial processor for different applications or with
different computational costs, depending on the reproduction format or set-up,
the desired flexibility in controlling the room effect, and the available
digital signal processing resources. The design approach focuses on letting
the user specify the desired effect from the point of view of the listener,
rather than from the point of view of the technological apparatus or physical
process which generates that effect. In a musical context, this allows the
user to take spatial effects into account directly at the composition stage,
without referring to a particular electro-acoustical apparatus or performing
space.
1. Processing Structure
To provide a global description of the reproduced effect, the temporal and
directional aspects are integrated in a cost-efficient implementation, using
the capacity of a single programmable digital signal processor per sound
source, with no additional arithmetic hardware. Spat~ can be viewed as
an extension of the system proposed in (Chowning 1971), allowing effective and
intuitive control of the direction of sound events as well as of their
distance or proximity (see section 3 below). The Spat~ processor is
formed by cascading four configurable sub-modules, namely: Source~,
Room~, Pan~, Out~. The Room~ module is a computationally
efficient and scalable multi-channel reverberator based on multi-channel delay
networks with feedback, designed to ensure the naturalness and accuracy
required by music and virtual reality applications (Jot et al. 1995). The
input signal (assumed devoid of reverberation) is pre-processed by the
Source~ module, which includes a low-pass filter and a variable delay
line to reproduce air absorption and the Doppler effect. Input equalizers
allow additional corrections according to the nature of the input signal(s) or
the position of the microphone(s) relative to the instrument.
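As an illustration of this processing chain, the following is a minimal sketch
in Python of a source pre-processing stage (a propagation delay and a one-pole
low-pass filter standing in for the Doppler effect and air absorption) feeding
a small four-channel feedback delay network. The function names, delay
lengths, filter laws and the Householder feedback matrix are illustrative
assumptions, not the actual Spat~ algorithms or parameter values.

# Illustrative sketch only; not the Spat~ implementation.
import numpy as np

FS = 44100  # sampling rate (Hz)

def source_preprocess(x, distance_m, fs=FS):
    """Delay and low-pass the dry signal to mimic propagation delay
    (Doppler when the delay varies over time) and air absorption."""
    # propagation delay in samples (speed of sound ~ 343 m/s)
    delay = int(round(distance_m / 343.0 * fs))
    y = np.concatenate([np.zeros(delay), x])[:len(x)]
    # one-pole low-pass: cutoff lowered as distance grows (air absorption)
    cutoff = max(500.0, 16000.0 / (1.0 + distance_m / 10.0))
    a = np.exp(-2.0 * np.pi * cutoff / fs)
    out = np.zeros_like(y)
    state = 0.0
    for n, v in enumerate(y):
        state = (1.0 - a) * v + a * state
        out[n] = state
    return out

def fdn_reverb(x, t60=2.0, fs=FS):
    """4-channel feedback delay network: mutually prime delay lines,
    a Householder feedback matrix, and per-line attenuation derived
    from the requested decay time."""
    delays = np.array([1031, 1327, 1523, 1783])      # samples
    gains = 10.0 ** (-3.0 * delays / (fs * t60))      # -60 dB after t60
    A = np.eye(4) - 0.5 * np.ones((4, 4))             # Householder, unitary
    lines = [np.zeros(d) for d in delays]
    ptr = np.zeros(4, dtype=int)
    out = np.zeros((len(x), 4))
    for n, v in enumerate(x):
        taps = np.array([lines[i][ptr[i]] for i in range(4)])
        out[n] = taps
        fb = A @ (taps * gains)                        # feedback mixing
        for i in range(4):
            lines[i][ptr[i]] = v + fb[i]
            ptr[i] = (ptr[i] + 1) % delays[i]
    return out

# usage: dry mono burst -> pre-processing -> 4-channel reverberant signal
dry = np.random.randn(FS // 2) * np.hanning(FS // 2)
wet = fdn_reverb(source_preprocess(dry, distance_m=5.0))

The per-line gain 10^(-3 d / (fs t60)) simply sets the attenuation of each
recirculating path so that the network decays by 60 dB over the requested
reverberation time.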
2. Configuration According to the Reproduction Context
The directional distribution module Pan~ converts the multi-channel
output of the Room~ module to a given reproduction format, while
simultaneously allowing control of the apparent direction of the sound source.
It can be configured for two-channel formats, including three-dimensional
stereophony (binaural or transaural) over headphones or over a pair of
loudspeakers (Jot et al. 1995), and the simulation of coincident or
non-coincident microphone recordings. Multi-channel configurations,
appropriate for studios or auditoria, include the '3/2-stereo' format derived
from the motion picture industry (Theile 1993) and systems of 4 to 8
loudspeakers capable of reproducing all directions in the horizontal plane.
The reproduced effect can be specified irrespective of the reproduction
context and is, as much as possible, preserved from one reproduction mode or
listening room to another. The Out~ module compensates for the frequency
response of the loudspeakers or headphones and for time lags due to the
geometry of the loudspeaker system. Additionally, when the listening room is
not acoustically neutral, the processor can take into account measurements
made at a reference listening position in order to automatically perform the
necessary corrections in the room effect synthesis, so that the perceived
effect at the reference position is as close as possible to the specification
given by the user.
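The following sketch illustrates, in simplified form, the kind of processing
performed by Pan~ and Out~ for a horizontal loudspeaker ring:
constant-power pairwise panning between the two loudspeakers adjacent to the
target azimuth, followed by per-loudspeaker delay and gain alignment to a
reference listening position. The panning law, loudspeaker layout and
alignment rule are assumptions chosen for illustration, not the Spat~
algorithms.

# Illustrative sketch only; not the Spat~ implementation.
import numpy as np

FS = 44100
SPEED_OF_SOUND = 343.0  # m/s

def pan_ring(x, azimuth_deg, speaker_az_deg):
    """Distribute a mono signal to the two loudspeakers adjacent to the
    target azimuth, with a constant-power (sine/cosine) crossfade."""
    az = np.deg2rad(azimuth_deg % 360.0)
    spk = np.deg2rad(np.sort(np.asarray(speaker_az_deg) % 360.0))
    n = len(spk)
    out = np.zeros((len(x), n))
    # find the pair of adjacent speakers surrounding the target azimuth
    i = int(np.searchsorted(spk, az) % n)
    j = (i - 1) % n
    lo, hi = spk[j], spk[i]
    span = (hi - lo) % (2 * np.pi)
    if span == 0.0:
        span = 2 * np.pi
    frac = ((az - lo) % (2 * np.pi)) / span   # 0 at speaker j, 1 at speaker i
    out[:, j] = np.cos(frac * np.pi / 2) * x  # constant-power gains
    out[:, i] = np.sin(frac * np.pi / 2) * x
    return out

def align_outputs(multi, speaker_dist_m, fs=FS):
    """Compensate for unequal loudspeaker distances: delay and attenuate
    the nearer speakers so all channels arrive aligned in time and level
    at the reference listening position."""
    d = np.asarray(speaker_dist_m, dtype=float)
    extra = (d.max() - d) / SPEED_OF_SOUND    # seconds of delay to add
    gains = d / d.max()                        # 1/r level matching
    out = np.zeros_like(multi)
    for k in range(multi.shape[1]):
        lag = int(round(extra[k] * fs))
        out[lag:, k] = multi[:multi.shape[0] - lag, k] * gains[k]
    return out

# usage: pan a short noise burst to 60 degrees over a 6-speaker ring
speakers_az = [0, 60, 120, 180, 240, 300]
speakers_dist = [2.0, 2.1, 1.9, 2.0, 2.2, 2.0]
sig = np.random.randn(FS // 10)
feeds = align_outputs(pan_ring(sig, 60.0, speakers_az), speakers_dist)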
3. Perceptual Control Interface
The reproduced effect can be specified through a higher-level user interface,
which controls the different signal processing modules in Spat~
simultaneously. At its core is a perceptual control module derived from
psychoacoustical research carried out at Ircam (Jullien 1995, Warusfel 1990).
The perceptual approach makes it possible to design a spatial processor which
does not rely on a physical and geometrical description of the virtual
environment for synthesizing the artificial room effect (e.g. Moore 1983,
Foster et al. 1991). Instead, the user interface is directly related to the
perception of the reproduced sound by the listener, which is described by a
small number of mutually independent perceptual factors:
- source proximity, brilliance and warmth (energy and spectrum of direct
sound and early reflections),
- room presence and envelopment (relative energies of direct sound, early and
late room effect),
- running reverberance (early decay time) and late reverberance (late decay
time),
- heaviness and liveness (variation of decay time with frequency).
Each perceptual factor is related to a measurable acoustical criterion
characterizing the sound transformation, which makes it possible to map the
perceptual representation onto signal processing parameters. Consequently,
virtual and measured acoustical qualities can be manipulated within a unified
framework.
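The sketch below illustrates one way such a mapping could look: a handful of
perceptual factors, each scaled to the range 0..1, are translated into decay
times, reverberator line gains and send levels. The factor names follow the
paper, but the formulas, value ranges and parameter names are illustrative
assumptions rather than the actual Spat~ mapping.

# Illustrative sketch only; not the Spat~ mapping.
import numpy as np

FS = 44100
FDN_DELAYS = np.array([1031, 1327, 1523, 1783])  # reverberator delays (samples)

def perceptual_to_dsp(factors, fs=FS):
    """Translate perceptual factors (each scaled 0..1) into low-level
    reverberator and mixing parameters."""
    p = factors
    # late reverberance -> mid-frequency decay time -> per-delay-line gain
    t60_mid = 0.5 + 3.5 * p["late_reverberance"]          # 0.5 s .. 4.0 s
    line_gains = 10.0 ** (-3.0 * FDN_DELAYS / (fs * t60_mid))
    # heaviness / liveness -> decay-time ratios at low / high frequencies
    t60_low = t60_mid * (1.0 + p["heaviness"])             # longer bass decay
    t60_high = t60_mid * (0.3 + 0.7 * p["liveness"])       # brighter tail
    # room presence -> reverberant-to-direct balance -> reverb send level
    reverb_send_db = -40.0 + 40.0 * p["room_presence"]
    # source proximity -> direct-sound level (a closer source sounds louder)
    direct_db = -20.0 + 20.0 * p["source_proximity"]
    return {
        "t60_low_s": t60_low,
        "t60_mid_s": t60_mid,
        "t60_high_s": t60_high,
        "fdn_line_gains": line_gains,
        "reverb_send_db": reverb_send_db,
        "direct_db": direct_db,
    }

# usage: a fairly reverberant, distant source
spec = {"late_reverberance": 0.7, "heaviness": 0.3, "liveness": 0.6,
        "room_presence": 0.8, "source_proximity": 0.2}
print(perceptual_to_dsp(spec))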
4. Applications
By inserting a Spat~ processor in each channel of a mixing console or
virtual mixing environment (devoting one DSP to each source channel), the
localization and room effect can be controlled intuitively for each sound
event. The mix can be produced in traditional as well as currently developing
formats, including 3/2-stereo or three-dimensional two-channel stereo (binaural
or transaural recording). The processor allows dynamic movements of sound
sources and remote control through pointing or tracking devices. The realism
of sound reproduction over headphones is substantially enhanced by
interfacing the spatial processor with a head-tracking device and by the
synthesis of a natural-sounding room effect. Music, multimedia or virtual
reality applications can benefit from a perceptually-oriented user interface
which is particularly suitable for dynamic interpolation between different
acoustical qualities. The Spat~ library can also be used in the design
of an electro-acoustic system that dynamically modifies the acoustical quality
of a large hall, for sound reinforcement or reverberation enhancement.
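As a small illustration of the interpolation idea mentioned above, the sketch
below glides between two hypothetical perceptual presets ("dry booth" and
"large hall") by interpolating the factor values themselves; at each update
the resulting factors would then be mapped to signal processing parameters,
as in the mapping sketch of section 3. The preset names and values are
invented for the example.

# Illustrative sketch only; preset names and values are hypothetical.
PRESET_DRY_BOOTH = {"room_presence": 0.1, "late_reverberance": 0.15,
                    "envelopment": 0.2, "source_proximity": 0.9}
PRESET_LARGE_HALL = {"room_presence": 0.85, "late_reverberance": 0.8,
                     "envelopment": 0.9, "source_proximity": 0.3}

def interpolate_presets(a, b, t):
    """Linear interpolation between two perceptual presets, t in [0, 1]."""
    return {k: (1.0 - t) * a[k] + t * b[k] for k in a}

# usage: a gradual transition from a dry booth to a large hall
for step in range(101):
    t = step / 100.0
    factors = interpolate_presets(PRESET_DRY_BOOTH, PRESET_LARGE_HALL, t)
    # the factors would be mapped to reverberator / mixing parameters here
    if step % 50 == 0:
        print(t, factors)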
References
J. Chowning, "The simulation of moving sound sources", J. Audio Eng.
Soc., vol. 19, no. 1, 1971.
S. Foster, E. M. Wenzel, R. M. Taylor, "Real-time synthesis of complex acoustic
environments", Proc. IEEE Workshop on Applications of Digital Signal
Processing to Audio and Acoustics, 1991.
J.-M. Jot, V. Larcher, O. Warusfel, "Digital signal processing issues in the
context of binaural and transaural stereophony", Proc. 98th Conv. Audio
Eng. Soc., preprint 3980, 1995.
J.-P. Jullien, "Structured model for the representation and the control of room
acoustical quality", Proc. 15th International Conf. on Acoustics,
1995.
F. R. Moore, "A general model for spatial processing of sounds", Computer
Music Journal, vol. 7, no. 3, 1983.
M. Puckette, "Combining event and signal processing in the Max graphical
programming environment", Computer Music Journal, vol. 15, no. 3,
1991.
G. Theile, "The new sound format '3/2-stereo'", Proc. 94th Conv. Audio Eng.
Soc., preprint 3550a, 1993.
O. Warusfel, "Étude des paramètres liés à la prise de son pour les
applications d'acoustique virtuelle", Proc. 1st French Congress on Acoustics,
1990.