Server © IRCAM - CENTRE POMPIDOU 1996-2005.
All rights reserved for all countries.

Spat~ : A Spatial Processor for Musicians and Sound Engineers

Jean-Marc Jot, Olivier Warusfel

CIARM 95, Ferrara (Italy) 1995
Copyright © CIARM 1995

Abstract

Spat~ is real-time spatial processing software that runs on the Ircam Music Workstation in the Max graphical signal processing environment. It provides a library of elementary modules (pan-pots, equalizers, reverberators...) that can be linked into a compact processor integrating the localization of sound events with the manipulation of room acoustical quality. This processor can be configured for various reproduction formats over loudspeakers or headphones, and controlled through a higher-level user interface including perceptual attributes derived from psychoacoustical research. Applications include studio recording and computer music, virtual reality, and variable acoustics in rooms.

1 Introduction

The goal of the Spatialisateur project, conducted by Ircam and Espaces Nouveaux in collaboration with the Centre National d'Etudes des Telecommunications, is to design a virtual acoustics processor allowing composers, performers or sound engineers to control the diffusion of sound in a real or virtual space. This project incorporates research carried out within the Ircam room acoustics laboratory on the objective and perceptual characterization of room acoustical quality, and research done at Telecom Paris on digital signal processing algorithms for the spatialization and artificial reverberation of sounds. The Spatialisateur was developed in the Max object-oriented graphical signal processing software environment, and is available as a Max object named Spat~ running in real time on the Ircam Music Workstation.
The Spat~ processor receives sounds from instrumental or synthetic sources (assumed to be devoid of reverberation), adds spatialization effects in real time, and outputs signals for reproduction on an electroacoustic system (loudspeakers or headphones). The general approach taken in the design of Spat~ is to let the user specify the desired effect from the point of view of the listener, irrespective of the device or process used to generate that effect.
Practically, this approach results in the following three general features:

directional aspects and temporal aspects are reproduced in real time by a single processor
this processor can be configured according to the reproduction context and the sound-pickup technique
a control interface is proposed which allows the user to specify the desired effect using perceptual attributes rather than technological parameters.

This control strategy does not suffer from the limitations that inevitably result from a geometrical and physical characterization of the enclosure, which is also less relevant from a perceptual point of view and involves heavier control computations. Since each perceptual attribute is linked to measurable transformations of the sound, virtual and real acoustical qualities can be manipulated intuitively within a unified representation.

2 Processor structure

The temporal aspects (artificial reverberation) and the directional aspects (localization of sound sources and spatial content of the room effect) are integrated in a single processor. This overcomes the limitations of heterogeneous systems in which the localization of sound sources and the reverberation effect are generated with separate devices (e.g. by associating a mixing console with standalone reverberation units). It allows, for instance, more precise and more intuitive control of the distance or proximity of sound events. From this standpoint, Spat~ can be seen as an extension of the system designed by John Chowning in the seventies [1].
A Spat~ module processes one sound source, and is built by associating four main sub-modules. Each of these sub-modules can be instantiated in several versions according to the application:

Source~: pre-processing of the source signal(s). This module includes a variable delay line that reproduces the propagation time from the source to the listener, together with a low-pass filter reproducing the effect of air absorption at high frequencies. A continuous variation of this "pre-delay" naturally reproduces the Doppler effect (apparent pitch shift) associated with a movement of the sound source. Parametric equalizers can also be included if spectral corrections of the source signal(s) are necessary (e.g. due to the positioning of the microphone(s) relative to the instrument).
Room~: multi-channel reverberator allowing real-time synthesis and control of the room effect (reflections and reverberation). This module is based on artificial reverberation algorithms described in [2, 3], using feedback delay networks to synthesize the diffuse reverberation decay.
Pan~: directional distribution of primary signals and reverberation signals. This module adapts the output of the Room~ module to the loudspeaker setup and controls the apparent direction of the sound source with respect to the listener.
Out~: equalization of the output signals. This module compensates for the frequency response of the loudspeakers or headphones, as well as for time lags due to the geometry of the loudspeaker system, with respect to a reference listening position.
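
The feedback delay network principle mentioned for the Room~ module can be illustrated with a minimal sketch. It is a generic textbook-style network and not the actual Room~ implementation: the function names, the four delay lengths and the choice of a Householder feedback matrix are assumptions made for illustration. Each delay line of length m samples is attenuated by 10^(-3m / (fs T60)), so that every feedback loop decays by 60 dB in T60 seconds.

```python
def fdn_gains(delays, fs, t60):
    """Attenuation per delay line so that each loop loses 60 dB in t60 seconds."""
    return [10.0 ** (-3.0 * m / (fs * t60)) for m in delays]


def householder_mix(x):
    """Apply the orthogonal (energy-preserving) matrix I - (2/N) * ones(N, N)."""
    s = 2.0 * sum(x) / len(x)
    return [v - s for v in x]


def fdn_impulse_response(delays, fs, t60, n_samples):
    """Impulse response of a small feedback delay network (illustrative only)."""
    gains = fdn_gains(delays, fs, t60)
    buffers = [[0.0] * m for m in delays]  # one circular buffer per delay line
    pos = [0] * len(delays)
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0  # unit impulse at the input
        # Read the oldest sample of each line and apply its attenuation.
        taps = [g * buf[p] for g, buf, p in zip(gains, buffers, pos)]
        out.append(sum(taps))
        # Mix the line outputs and feed them back together with the input.
        feedback = householder_mix(taps)
        for i, buf in enumerate(buffers):
            buf[pos[i]] = x + feedback[i]
            pos[i] = (pos[i] + 1) % len(buf)
    return out


# Hypothetical settings: 1 kHz sample rate, 0.5 s decay time, four delay lines.
response = fdn_impulse_response([31, 37, 41, 53], fs=1000, t60=0.5, n_samples=1000)
```

Because the feedback matrix is orthogonal, the decay rate in this sketch is set only by the per-line gains, which is what makes the decay time directly controllable.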

3 Configuration according to the reproduction context

The structure of the Room~ module is independent of the reproduction setup. Its output signals are transmitted to the Pan~ module, which feeds each loudspeaker channel and can be configured for different reproduction formats:

Multi-channel systems that can reproduce all directions in the horizontal plane. Such setups, typically using 4 to 8 loudspeakers, are suitable for computer music studios or concerts in small- or medium-sized auditoria. The directional processing is based on intensity panning and derived from the method described in [1].
3/2-Stereo: a recently proposed 5-channel sound format derived from systems used in the motion picture industry [4]. It includes an additional center channel to stabilize frontal sound images for a larger listening area, and two surround channels for reproducing ambience and diffuse reverberation coming from the sides and the back of the audience.
3-D stereophony over headphones (binaural reproduction) or a pair of loudspeakers (transaural reproduction). In these modes, Spat~ synthesizes the acoustic information which would be captured in a binaural recording made with a dummy head or with microphones inserted in the ear canals of an individual. The transaural reproduction mode involves an additional "cross-talk cancelling" processing stage [3].
Conventional two-channel stereophony, including the simulation of a sound recording made with a coincident or non-coincident microphone pair.
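
As an illustration of intensity panning over a horizontal loudspeaker ring, the sketch below computes constant-power gains for the two loudspeakers adjacent to the source direction. This is one common panning law, shown under assumptions: the exact law used in Pan~ or in [1] is not given here, and the function name and the layout (loudspeaker k placed at azimuth k * 360 / N degrees) are hypothetical.

```python
import math


def pan_gains(azimuth_deg, n_speakers):
    """Constant-power gains for loudspeakers equally spaced on a circle.

    Assumed layout: loudspeaker k sits at azimuth k * 360 / n_speakers degrees.
    Only the two loudspeakers surrounding the source receive signal.
    """
    spacing = 360.0 / n_speakers
    az = azimuth_deg % 360.0
    k = int(az // spacing)      # loudspeaker just below the source
    frac = az / spacing - k     # 0 at loudspeaker k, 1 at the next one
    k %= n_speakers
    gains = [0.0] * n_speakers
    gains[k] = math.cos(frac * math.pi / 2.0)
    gains[(k + 1) % n_speakers] = math.sin(frac * math.pi / 2.0)
    return gains


# A source halfway between the first two loudspeakers of a 4-channel setup.
gains = pan_gains(45.0, 4)
```

The cosine and sine gains keep the sum of squared gains equal to one, so the total radiated power stays constant while the source moves around the listener.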

From a signal processing point of view, the binaural recording technique can be viewed as a particular case of a stereo recording technique with a non-coincident pair of microphones. The binaural synthesis process relies on a database of measured "head-related transfer functions", or HRTFs, to model the directivity of the two ears (we have used HRTFs measured on individuals at Ircam [5], as well as data measured by Gardner and Martin [6]). Although the HRTFs must be accurately modelled by digital filters in order to convey the direction of incidence of the direct sound, the efficiency of the binaural room effect simulation can be significantly improved by simplifying the directional processing for early reflections and reverberation, with hardly any audible difference [3].
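
The core of binaural synthesis for the direct sound is a pair of convolutions, one per ear. The sketch below shows only this step, under assumptions: the function names are hypothetical, and the two impulse responses are short placeholders rather than measured HRTF data.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binaural_direct_sound(mono, hrir_left, hrir_right):
    """Filter a mono source with the left-ear and right-ear impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Placeholder impulse responses (NOT measured data): the right ear receives
# the sound two samples later and attenuated, as for a source on the left.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]
left, right = binaural_direct_sound([1.0, 0.5], hrir_left, hrir_right)
```

In a real system the impulse responses would come from an HRTF database and would be selected or interpolated according to the direction of incidence.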
Once Spat~ has been configured according to the reproduction format and the geometry of the loudspeaker setup, the reproduced effect can be specified irrespective of this reproduction context and is, as much as possible, preserved from one reproduction mode or listening room to another. When the listening room is not acoustically neutral (which is generally the case in concert), Spat~ can take into account measurements made at a reference listening position. It then automatically performs the necessary corrections of the room effect synthesis parameters, so that the perceived effect at that position is as close as possible to the specification given by the user. Such a compensation is necessary, for example, to reproduce the acoustics of a given room in another room.

4 Perceptual control interface

A higher-level control interface is proposed, which allows simultaneous control of the different processing modules in Spat~. This user interface is not merely a collection of the "low-level" processing parameters of each sub-module: it provides a global description of the perceived effect and is based on a perceptual control module derived from psychoacoustical research carried out at Ircam [7, 8].

4.1 Controlling the acoustical quality

The term 'acoustical quality' is used here to describe globally the transformations undergone by the message radiated by a sound source before it reaches the listener. In a natural situation with a sound source and a listener in a room, the acoustical quality is influenced by the geometry and acoustical properties of the listening room and obstacles, the positions of the listener and the sound source in the room, and the orientation and directivity of the sound source. If several sound sources are present in the same room at different positions or with different orientations or directivity patterns, the acoustical quality is generally different for each one of them.
In a simulation context, the acoustical quality must be dynamically updated when the source or the listener moves. This variation can be reproduced by a spatial processor whose signal processing parameters are computed in real time according to a geometrical and physical description of the virtual room, the virtual source and the listener [9, 10]. Although a geometrical and physical control of the acoustical quality can be implemented using the reverberation model synthesized by the Room~ module, this approach has a number of disadvantages in a musical or artistic context:

The control parameters are not perceptually relevant: the perceived effect of varying a geometrical or physical parameter may often be unpredictable (or sometimes non-existent).
Updating the DSP parameters involves a complex control process, which typically relies on the computation of an image source distribution to derive the arrival times and energies of room reflections from the geometrical and physical parameters.
This control method is limited to reproducing physically realizable situations. Even if the modelled room is imaginary, the laws of physics will limit the range of realizable acoustical qualities. For instance, in a room of a given shape, modifying wall absorption coefficients to modify the decay time will cause a change in the level of the room effect at the same time.
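
The coupling between decay time and room effect level can be illustrated with classical diffuse-field approximations. These formulas are not taken from this paper: the sketch assumes Sabine's relation RT60 = 0.161 V / (S alpha) and a reverberant level proportional to 4 / (S alpha), and the room dimensions are hypothetical.

```python
import math


def sabine_rt60(volume_m3, surface_m2, alpha):
    """Sabine decay time in seconds for a mean absorption coefficient alpha."""
    return 0.161 * volume_m3 / (surface_m2 * alpha)


def reverberant_level_db(surface_m2, alpha):
    """Relative level of the diffuse reverberant field, in dB (arbitrary reference)."""
    return 10.0 * math.log10(4.0 / (surface_m2 * alpha))


# Hypothetical 10 m x 8 m x 5 m room.
volume = 10.0 * 8.0 * 5.0
surface = 2.0 * (10.0 * 8.0 + 10.0 * 5.0 + 8.0 * 5.0)

for alpha in (0.1, 0.2):
    rt60 = sabine_rt60(volume, surface, alpha)
    level = reverberant_level_db(surface, alpha)
    print(f"alpha={alpha}: RT60={rt60:.2f} s, reverberant level={level:.1f} dB")
```

Under these assumptions, doubling the absorption halves the decay time and also lowers the reverberant level by about 3 dB, so the two quantities cannot be set independently in a physically based model.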

The approach taken in the Spatialisateur project makes it possible to design a spatial processor which does not rely on a physical and geometrical description of the virtual environment for synthesizing the room effect. Instead, the proposed user interface is directly related to the perception of the reproduced sound by the listener. In a musical context, this approach allows the acoustical quality to be taken into account immediately at the composition stage (by integrating perceptual attributes in the score, for example), without referring to a particular electroacoustical setup or to the place where the work will actually be performed. Additionally, the real-time computational efficiency is maximized since the processing is focused on the reproduction and control of perceptually relevant attributes.

4.2 The perceptual factors

In the perceptual control interface, the acoustical quality is described by a small number of mutually independent perceptual factors, each of which is related to an objectively measurable criterion which characterizes the transformation undergone by the sound. These relations make it possible to translate the perceptual factors into signal processing parameters, and to reproduce the acoustical quality of an existing room. Furthermore, the perceptual factors provide a relevant basis for controlling dynamic interpolation processes between different acoustical qualities. Such processes can be used for musical or artistic purposes, or applied to virtual reality and simulation applications. The perceptual factors are manipulated by means of sliders which are scaled to account for the average sensitivity of listeners (see Figure 1). Additionally, the radiation characteristics of the sound source are modelled by a directivity index specified as a function of frequency.
Three perceptual factors describe effects which are characteristic of the room (the objective criteria are indicated in parentheses):

late reverberance (late decay time)
heaviness and liveness (variation of decay time with frequency)

The six other factors describe effects which depend on the position, directivity and orientation of the source. The first three are perceived as characteristics of the source, while the next three are perceptually associated with the room:

source presence (energy of the direct sound and early room effect)
brilliance and warmth (variation of early energy with frequency)
room presence (energy of late room effect)
running reverberance (early decay time)
envelopment (energy of early room effect relative to direct sound)

A variation of the source presence creates a convincing effect of proximity or remoteness of the sound source. The term "reverberance" refers to the sensation that sounds are prolonged by the room reverberation. Late reverberance differs from running reverberance in that it is essentially perceived during interruptions of the message radiated by the source. Running reverberance, on the contrary, remains perceived during continuous music.

Figure 1: higher-level user interface

5 Applications and perspectives

From the point of view of signal processing computational cost, the binaural / transaural mode is particularly demanding. Yet, even in this case, the implementation of Spat~ requires fewer than 400 operations (multiply-accumulates) per sample at a sampling frequency of 48 kHz [3]. This corresponds to less than 20 million operations per second, which can be handled by typical digital signal processors (DSPs) for audio applications. It is thus economically feasible to design a digital mixing console including a room simulator in each channel (which implies devoting one DSP per source). For studio recording and computer music applications, this evolution may call for a new kind of user interface to control the room simulation: providing a reduced set of independent perceptual attributes, as described in this paper, is particularly promising from the point of view of ergonomics. By differentiating sound sources in the generation of the room effect, this approach yields, in particular, a more effective and intuitive control of the subjective distance of each virtual sound source.
Spatial processors for virtual reality and multimedia applications also rely on a real-time mixing architecture and may benefit substantially from the reproduction of a natural-sounding room effect allowing effective control of the subjective distance of sound events. Spat~ is designed to ensure the necessary degree of interactivity, and allows dynamic movements of the virtual sound sources or the use of a head-tracking device in headphone reproduction.
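
The processing budget quoted above follows from a one-line calculation, shown here as a small sketch. The function name is hypothetical, and 400 is the upper bound stated in the text rather than a measured figure.

```python
def ops_per_second(ops_per_sample, sample_rate_hz):
    """Processing load in operations per second for a per-sample cost."""
    return ops_per_sample * sample_rate_hz


# Upper bound from the text: 400 multiply-accumulates per sample at 48 kHz.
budget = ops_per_second(400, 48_000)
print(budget / 1e6, "million operations per second")
```

The result, 19.2 million operations per second, is the upper bound behind the "less than 20 million" figure.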
In the field of architectural acoustics, one perspective is the evolution of auralization systems toward real-time operation, allowing instant monitoring of modifications in source or listener location, geometry or wall materials of a room or concert hall, before its construction. This application places strong requirements on the accuracy of the room effect synthesis and the validation of the physical modelling algorithms used to predict the propagation of sound in rooms [11, 12]. The virtual acoustics processor can also be used to modify the acoustical quality of an existing room (sound reinforcement and/or assisted reverberation, with live sources or pre-recorded signals). To control effectively the perceptual attributes related to the direct sound and the early reflections in a relatively large room, the structure of the processor is configured according to a division of the audience area and/or the stage area into adjacent zones.

References

[1] J. Chowning, "The simulation of moving sound sources", Journal of the Audio Engineering Society, vol. 19, no. 1, pp. 2-6, 1971.

[2] J.-M. Jot, "Etude et realisation d'un spatialisateur de sons par modeles physiques et perceptifs", Doctoral dissertation, Telecom Paris, 1992.

[3] J.-M. Jot, V. Larcher, O. Warusfel, "Digital signal processing issues in the context of binaural and transaural stereophony", Proc. 98th Convention of the Audio Engineering Society, preprint 3980, Paris, 1995.

[4] G. Theile, "The new sound format '3/2-Stereo'", Proc. 94th Convention of the Audio Engineering Society, preprint 3550a, Berlin, 1993.

[5] M. Marin, A. Gilloire, J.-F. Lacoume, O. Warusfel, J.-M. Jot, "Environnement de simulation pour l'evaluation psychoacoustique des systemes de prise et de restitution du son dans un contexte de teleconference", Proc. 3rd French Congress on Acoustics, Toulouse, April 1994.

[6] W. G. Gardner, K. Martin, "HRTF measurements on a KEMAR dummy-head microphone", Tech. Report #280, MIT Media Lab Perceptual Computing, 1994.

[7] J.-P. Jullien, "Structured model for the representation and the control of room acoustical quality", Proc. 15th International Conference on Acoustics, Trondheim, 1995.

[8] O. Warusfel, "Etude des parametres lies a la prise de son pour les applications d'acoustique virtuelle", Proc. 1st French Congress on Acoustics, Lyon, 1990.

[9] F. R. Moore, "A general model for spatial processing of sounds", Computer Music Journal, vol. 7, no. 6, pp. 6-15, 1983.

[10] S. H. Foster, E. M. Wenzel, R. M. Taylor, "Real-time synthesis of complex acoustic environments", Proc. IEEE Workshop on Applications of Digital Signal Processing to Audio and Acoustics, New Paltz, NY, Oct. 1991.

[11] M. Kleiner, B.-I. Dalenback, P. Svensson, "Auralization - An overview", Journal of the Audio Engineering Society, vol. 41, no. 11, pp. 861-875, 1993.

[12] O. Warusfel, F. Cruz-Barney, "Validation of a computer simulation environment for room acoustics prediction", Proc. 15th International Conference on Acoustics, Trondheim, 1995.