 
Server © IRCAM - CENTRE POMPIDOU 1996-2005. All rights reserved for all countries.
IEEE 93, Mohonk (USA) 1993
Copyright © IEEE 1993
Although the Spatialisateur is ultimately intended for use in any reproduction environment (taking into account the limitations encountered in each specific case), the current prototype is designed for reproduction on two channels, using headphones (binaural reproduction) or a pair of loudspeakers (conventional stereo or transaural reproduction). The first part of this paper describes the general architecture of the Spatialisateur and the current binaural/transaural prototype. The second part discusses the digital signal processing models which make the binaural reproduction of the room effect realizable in real time. This involves the parametric modelling of head-related transfer functions (HRTFs) using IIR filters. A method is then proposed for further reducing the complexity of the binaural processing of early echoes and for including artificial reverberation algorithms to simulate the later part of the room effect.
Fig.2 describes in more detail the prototype DSP module which has been implemented on the Ircam Music Workstation, using only standard objects available in the Max graphical environment. The structure of this prototype illustrates the separation of the room effect into temporal aspects and spatial aspects. The control of temporal aspects is based on the definition of a "generic room effect" (fig.3) separated into four sections: direct sound (DS), echoes R1, echoes R2 and late reverberation (Rev). This definition is derived from recent studies carried out at Ircam on the characterization of room acoustical quality [Lav89, Jul92, Jul93]. The artificial reverberation algorithm which reconstructs this generic room effect is composed of a series association of three basic modules: "echoes", "cluster" and "reverb".
The structure of this generic algorithm is derived from [Jot92b, c], except for the introduction of the intermediate processing stage (cluster), which allows the echoes R1 and the echoes R2 to be controlled separately. The delay line T1 provides several delayed copies of the input signal, used for reproducing the early echoes R1. These signals are also mixed in the matrix M2, whose outputs separately feed the parallel bank of delay lines T2, producing the echoes R2. The final stage of the algorithm produces the late reverberation, through the association of the matrix M3 and the bank of delay lines T3 within a feedback loop. The matrices M2 and M3 have no null coefficients, in order to ensure a fast build-up of the echo density from echoes R1 to echoes R2 and into the late reverberation. Explicit control of the reverberation time is obtained by associating an "absorbent filter" with each one of the delay lines T3, and by selecting for M3 a unitary matrix. If the matrix M2 is also unitary, the process producing the echoes R2 is a unitary system, as defined by Gerzon [Gerz76]. This ensures that the spectral content of the echoes R2 and of the late reverberation is not affected by the delay times of the early echoes (R1 or R2). The delay times in T1, T2 and T3 can be selected in order to realize the temporal decomposition shown in fig.3.
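The late reverberation stage described above (a unitary matrix and a bank of delay lines in a feedback loop, with one absorbent element per delay line) can be illustrated with a minimal sketch. It is not the prototype's Max implementation. The Householder matrix, the frequency-independent absorbent gains, the function names and the delay lengths used in the tests are all assumptions chosen for illustration. The Householder matrix is orthogonal and has no null coefficients when there are more than two delay lines, which matches the two properties required of M3.

```python
import numpy as np

def householder_matrix(n):
    """Orthogonal (real unitary) matrix; all entries are non-zero for n > 2."""
    return np.eye(n) - (2.0 / n) * np.ones((n, n))

def absorbent_gains(delays, rt60, fs):
    """Gain per delay line so that the loop decays by 60 dB in rt60 seconds.

    Assumption: a plain gain stands in for each frequency-dependent
    absorbent filter, so the reverberation time is the same at all frequencies.
    """
    return 10.0 ** (-3.0 * np.asarray(delays, dtype=float) / (rt60 * fs))

def fdn_impulse_response(delays, rt60, fs, n_samples):
    """Impulse response of a feedback delay network (delays in samples)."""
    n = len(delays)
    feedback = householder_matrix(n)
    gains = absorbent_gains(delays, rt60, fs)
    buffers = [np.zeros(d) for d in delays]   # circular buffers, one per line
    index = [0] * n
    out = np.zeros(n_samples)
    for t in range(n_samples):
        x = 1.0 if t == 0 else 0.0
        # Read the oldest sample of each delay line and apply absorption.
        taps = np.array([buffers[i][index[i]] for i in range(n)]) * gains
        out[t] = taps.sum()
        # Mix the delay outputs through the unitary matrix and add the input.
        fed = feedback @ taps + x
        for i in range(n):
            buffers[i][index[i]] = fed[i]
            index[i] = (index[i] + 1) % delays[i]
    return out
```

Mutually prime delay lengths are a common choice here, since they avoid coincident echoes.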
 
Figure 1 : Processing for one source / one listening zone
 
Figure 2 : General structure of the DSP algorithm
It has been known for several decades that a recursive digital delay network can be used for imitating room reverberation in real time, and that the difficulty of obtaining a natural sounding effect essentially lies in the simulation of the late reverberation [Schr62, Moor79, Stau82, Smith85]. For this purpose, new criteria must be introduced, in terms of "density" of the response in the time domain and in the frequency domain: the echo density (number of echoes per unit time) and the modal density (number of normal modes per unit frequency). It can be shown that both densities can be made high enough, provided that the number and total length of the delays in the feedback network are sufficient [Jot92c, Jot91]. Furthermore, to avoid unnatural resonances in the late reverberation, the absorbent filters which control the reverberation time should be designed so as to specify the decay time of any normal mode as a function of its frequency. When this requirement is met, the reverberation time and the reverberation energy level can be controlled independently as functions of frequency in order to synthesize, in real time, a late reverberation indistinguishable from that measured in an existing room, provided that this room is large enough [Jot92a, b, c].
The generic algorithm which reproduces the temporal structure of the room effect remains unchanged irrespective of the techniques used for the sound capture stage and the diffusion stage, unlike the processing which reproduces the spatial (directional) information. In the case of a source whose directivity pattern is known in all frequency ranges, it is possible to transfer the spatial processing to the output stage, as shown in fig.2.
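The two density criteria can be estimated from the delay lengths alone, as in the rough sketch below. The modal density follows from counting the poles of the network: there is one per sample of total delay, spread over the sampling band. The echo density formula is an assumption borrowed from the parallel comb filter case and only describes the initial echo rate, since a feedback matrix without null coefficients makes the density grow over time. The function names are hypothetical.

```python
def modal_density(delays_samples, fs):
    """Normal modes per Hz: total delay length expressed in seconds."""
    return sum(delays_samples) / fs

def initial_echo_density(delays_samples, fs):
    """Echoes per second, estimated as for a bank of parallel comb filters."""
    return sum(fs / d for d in delays_samples)
```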
This implies that each section of the room effect undergoes a spectral correction to account for the radiation characteristics of the source, as suggested in [War90]. On the sound capture side, it is assumed that the DSP module is fed with a single signal devoid of reverberation (e.g. recorded in an anechoic room or with a contact microphone placed on the instrument).
 
Figure 3 : Temporal decomposition - generic room effect
In the approximation of plane waves propagating inside the ear canal, the spatial information introduced by the head may be fully characterized by measuring the transfer function between the source and the entrance of the ear canal. Such a transfer function may then be used to simulate an extra-cranial virtual source. One aim of the "Spatialisateur" being the control of directional events, a database of HRTFs has been collected from 20 subjects. For each subject, this database consists of 49 measurements for different azimuths and elevations. Two horizontal planes (elevation 0deg. and 30deg.) are sampled every 15deg., and an additional measurement is performed at 90deg. elevation. The corresponding impulse responses were measured with MLS sequences at a 48 kHz sampling frequency. The signal was emitted by a concentric two-driver loudspeaker, and recorded with electret microphones inserted in the ear canals.
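The count of 49 measurements per subject follows from the sampling grid: two rings of 360/15 = 24 azimuths each, plus one measurement overhead. A short sketch enumerating the grid is given below. The function name and the (azimuth, elevation) tuple layout are assumptions for illustration.

```python
def measurement_directions(step_deg=15, ring_elevations=(0, 30), top_elevation=90):
    """List the (azimuth, elevation) pairs of the HRTF grid, in degrees."""
    directions = [(az, el)
                  for el in ring_elevations
                  for az in range(0, 360, step_deg)]
    directions.append((0, top_elevation))  # single measurement at the zenith
    return directions
```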
The two output signals of the binaural "Spatialisateur" can be directly used for headphone listening, like a dummy head recording, provided that a correcting filter be introduced at the output stage to equalize the headphones, if necessary. For this purpose, the transfer function between each earphone and the microphone inserted in the corresponding ear canal is measured for each subject during the measurement procedure described above.
The reproduction can also be performed on a pair of loudspeakers in an acoustically neutral environment, in which case the output filter is replaced by a transaural cross-cancellation filter which inverts the transfer function from the loudspeakers to the ear canals (this filter is implemented for each subject according to [Coop89] and [Jot92c]). A well-known drawback of transaural reproduction is that, for a faithful reproduction of the spatial information conveyed by the binaural signal, the listener must be placed in a specific position relative to the loudspeakers. When this constraint is satisfied, sounds coming from the sides or the back of the listener can be faithfully reproduced although both loudspeakers are in front.
Fig.4 shows how the general structure of fig.2 is adapted to the case of the binaural / transaural reproduction. The directions of arrival of the direct sound and echoes R1 are reproduced with localization filters derived from HRTF measurements performed on the listener, as described in the second part of this paper. Assuming a statistical description of the temporal and directional distribution of later echoes and reverberation, the directional information for echoes R2 and late reverberation can be reproduced by controlling the cross-correlation coefficient of left and right signals. Each temporal section of the room effect can be corrected separately in level and spectrum, to allow the control of perceptual attributes which depend on the directivity of the source and its position in the room.
The output filter is software switchable between binaural mode (headphone equalization filter) and transaural mode (cross-talk cancellation filter). In the case of conventional stereophonic reproduction on two loudspeakers, this output filter can be simply eliminated (or used for equalizing each loudspeaker) and the localization filters simply become conventional panoramic potentiometers ("panpots") as found in mixing desks. Several types of panpots have been implemented to simulate various stereophonic sound capture techniques (e.g. coincident or non-coincident microphone pairs with cardioid or omnidirectional directivity).
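As an illustration of a panpot that simulates a stereophonic sound capture technique, the sketch below computes the left and right gains of a coincident pair of cardioid microphones. The function name, the sign convention (positive azimuth towards the left) and the default 45deg. half-angle are assumptions. The panpot laws actually implemented in the prototype are not given in the text.

```python
import math

def cardioid_pair_gains(source_azimuth_deg, half_angle_deg=45.0):
    """Gains of a coincident cardioid pair aimed at +/- half_angle_deg.

    A cardioid has gain 0.5 * (1 + cos(angle off axis)).
    """
    phi = math.radians(source_azimuth_deg)
    alpha = math.radians(half_angle_deg)
    left = 0.5 * (1.0 + math.cos(phi - alpha))
    right = 0.5 * (1.0 + math.cos(phi + alpha))
    return left, right
```

A non-coincident pair would add an inter-channel delay to these gains.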
 
Figure 4 : Structure of the DSP algorithm for two-channel reproduction
 (binaural, transaural or stereophonic reproduction)
The advantage of this approach, in a tool designed for musical applications, lies in the possibility of including room acoustical quality attributes in the initial composition stage of a musical work (e.g. by integrating them in the score), using a formalism which does not refer to any particular electronic equipment, or to the architectural environment where the work will be performed.
The parameters of the processing described in the preceding sections are determined by a set of objective criteria which specify the desired acoustical quality and are directly related to the listener's perception. These objective criteria can be measured in a real situation in order to simulate the acoustics of an existing room. The first envisioned application for this prototype is its use for psychoacoustical experiments designed for validating and, if necessary, refining the definition of the objective criteria and their relation to the listener's perception, in the case of a single source. However, the musical potential of the prototype was tested in a musical piece, written by Georges Bloch, in which two sources are processed simultaneously with independent controls.
The control interface allows the listener to use his or her own HRTFs or to select instantly, within the database of available measurements, the set of HRTFs which yields the most natural auditory sensation. The acoustical quality is controlled by the use of six mutually independent perceptual factors, associated with their respective objective criteria. The perceptual factors are graphically displayed in the form of sliders operated with a mouse. The range of each slider takes into account the average sensitivity of subjects with regard to the corresponding perceptual factor (this sensitivity was estimated in previous studies [Jul92]). The six proposed perceptual factors are the following (the corresponding objective criteria are listed in parentheses) :
In a further step, knowledge of the binaural signal modifications which remain imperceptible to the listener would permit a simplification of the description required for controlling the room effect. We will particularly search for a description of the perceptual effect of the early echoes expressed in terms of objective criteria directly related to perceptual factors, rather than in terms of the arrival time, energy and direction of each of the echoes.
Recent studies [Beg92] suggest that some modifications of the temporal and spatial distribution of early echoes are not perceptible. A first approach is to investigate the perceptual effect of elementary modifications in order to evaluate the differential perception thresholds for each of the parameters which characterize the distribution of the echoes (in terms of arrival time, energy, direction). We have chosen a different approach, initially proposed in [Jul93], where the parameters of the distribution remain unchanged, but where we attempt to reduce the amount of information required for the binaural reproduction of each echo without affecting the overall perceptual effect.
This chapter describes the proposed models. We first present a parametric modeling procedure which can be applied to the reproduction of the direct sound (or an echo) perceived as an isolated event. We will then present further simplifications to be considered when the direct sound and a series of echoes, forming the early part of a room effect, are reproduced simultaneously.
Two properties will help model the binaural filters [Pon92, Jot92c] :
* a general property of rational linear filters (pole - zero filters) : every stable filter can be decomposed into the series association of a minimum phase filter and an all-pass filter (the latter simply realizes an "excess phase") [Opp75].
* a particular property of HRTFs : in the case of HRTFs, the all-pass filter is approximately equivalent to a pure delay (the excess phase of the HRTF is a linear function of frequency, at least below roughly 10 kHz) [Mehr77, Coop89].
A parametric modeling method can be used to approximate the minimum phase part with an IIR filter (recursive filter). The method currently used in our implementation [Mar93] consists in solving the Yule-Walker equations (standard routine available in the Matlab environment), and the chosen order of the IIR filters is 20, which reduces the number of coefficients to 40. This reduction of information results in the smoothing of the magnitude frequency response of the filter (see fig.5).
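The decomposition into a minimum phase part and a pure delay can be sketched as follows. This is an illustration only: the minimum phase part is obtained here by folding the real cepstrum (a standard homomorphic technique, not necessarily the one used in [Mar93]), and the delay is taken as the lag of the cross-correlation peak between the measured response and its minimum phase version. The function names and the FFT size are assumptions, and the subsequent order-20 IIR fit is not shown.

```python
import numpy as np

def minimum_phase_part(h, n_fft=1024):
    """Minimum phase impulse response with the same magnitude spectrum as h."""
    magnitude = np.abs(np.fft.fft(h, n_fft))
    log_magnitude = np.log(np.maximum(magnitude, 1e-9))  # avoid log(0)
    cepstrum = np.fft.ifft(log_magnitude).real
    # Fold the anti-causal half of the cepstrum onto the causal half.
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    min_phase_spectrum = np.exp(np.fft.fft(cepstrum * fold))
    return np.fft.ifft(min_phase_spectrum).real

def excess_delay(h, h_min):
    """Delay (in samples) that best aligns h_min with h (circular correlation)."""
    n_fft = len(h_min)
    xcorr = np.fft.ifft(np.fft.fft(h, n_fft) * np.conj(np.fft.fft(h_min))).real
    return int(np.argmax(xcorr))
```

Applied to the left and right responses of one direction, the difference between the two estimated delays gives an estimate of the interaural delay.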

Figure 5a : Implementation of directional filters with IIR model of order 20
Dt represents the interaural delay (here in the case of a source located to the right of the median plane of the head).

Figure 5b : Frequency response (amplitude and phase) of an HRTF (0deg. elevation, 75deg. azimuth). Dotted line : modeling of this HRTF by an IIR filter of order 20 cascaded with a pure delay.

The contribution of the early echoes to the perceived spectrum depends on :

Figure 6 : Simulation of a group of echoes. Three levels of reproduction are indicated : omni, stereo and binaural. Ai represents the amplitude of echo i. Gl, Gr and Δt are the monaural gains and interaural delays defined in section 3.2.1.
To ensure that the grouping of the directional filters remains inaudible, it is proposed that, for each ear, the total energy conveyed by the echoes should remain unchanged for all frequencies.
For N echoes of amplitudes Ai and directions corresponding respectively to HRTFs (Hl,i , Hr,i), the total energy arriving at each ear may be written:
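With the notation just introduced, and assuming that the echoes add in energy (cross terms between echoes neglected), the energy sum would plausibly take the following form. The symbols E_l and E_r are those used later in the text, and the exact expression is a reconstruction from the surrounding definitions.

```latex
E_l(f) = \sum_{i=1}^{N} A_i^2 \, \left| H_{l,i}(f) \right|^2 ,
\qquad
E_r(f) = \sum_{i=1}^{N} A_i^2 \, \left| H_{r,i}(f) \right|^2
```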
 
       

In the realization described in fig.7, the transfer function of each of the IIR filters must be normalised in order to take into account the total energy introduced by the gains Ai and Gl,i . The required transfer functions for the pair of IIR filters shown in fig.7 can hence be written :
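A form consistent with this requirement is sketched below, under the assumption that each echo i passes through the gains A_i and G_{l,i} (or G_{r,i}) before the common filter, and that only the magnitude of the common filter is constrained. Here E_l(f) and E_r(f) denote the total energy arriving at each ear, and H_l, H_r are names introduced for the two common filters.

```latex
\left| H_l(f) \right|^2 = \frac{E_l(f)}{\sum_{i=1}^{N} A_i^2 \, G_{l,i}^2} ,
\qquad
\left| H_r(f) \right|^2 = \frac{E_r(f)}{\sum_{i=1}^{N} A_i^2 \, G_{r,i}^2}
```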
 

Considering the calculation of the monaural gains Gl and Gr described in section 3.2.1, this amounts to dividing each spectrum, El(f) and Er(f), by its average energy calculated in the frequency band [f1, f2]. Of course, this normalization must also be carried out for the IIR filters represented on fig.6, each of which is derived directly from a particular measured HRTF.
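The grouping and normalization can be sketched numerically for one ear as follows. Section 3.2.1 is not reproduced here, so the sketch assumes that each squared monaural gain is the average energy of the corresponding HRTF in the band [f1, f2], which is what the remark above implies. The function names and array layout are hypothetical.

```python
import numpy as np

def band_average(energy, freqs, f1, f2):
    """Mean of an energy spectrum over the frequency band [f1, f2]."""
    band = (freqs >= f1) & (freqs <= f2)
    return energy[..., band].mean(axis=-1)

def common_filter(hrtf_magnitudes, amplitudes, freqs, f1, f2):
    """Common filter magnitude and monaural gains for one ear.

    hrtf_magnitudes: array (N echoes, F frequencies) of |H_i(f)|.
    amplitudes: array (N,) of echo amplitudes A_i.
    """
    hrtf_energy = np.abs(hrtf_magnitudes) ** 2
    a2 = np.asarray(amplitudes, dtype=float) ** 2
    total_energy = (a2[:, None] * hrtf_energy).sum(axis=0)   # E(f)
    gains_sq = band_average(hrtf_energy, freqs, f1, f2)      # assumed G_i^2
    magnitude = np.sqrt(total_energy / np.dot(a2, gains_sq))
    return magnitude, np.sqrt(gains_sq)
```

With this choice the common filter has unit average energy in [f1, f2], so the level of each echo is carried entirely by A_i and its monaural gain.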

Figure 7 : Common directional filtering of a group of echoes
Finally, it is possible to reintroduce in this model the frequency dependence of the acoustical characteristics of the sound source and the walls : this is equivalent to replacing each of the Ai by the product of the elementary filters undergone by the echo prior to its arrival at the receiver. However this is equivalent, for each echo, to transferring these spectral variations into the corresponding left and right IIR filters shown in fig.6. To achieve this, the power gain Ai² must be equal to the average energy of the echo in the frequency band [f1, f2] used to calculate monaural gains Gl,i and Gr,i. The grouping of the filters can then be implemented just as described above.
The method for reproducing the early echoes, suggested above, leads us to define the directional filter for processing the late reverberation as having a frequency response given by the average of the HRTF energy spectra corresponding to all possible directions. A filtering similar to this diffuse field HRTF is proposed in [Mart91] for the binaural reproduction of the late reverberation. This process can be compared to the pre-equalization of some high fidelity headphones that have been calibrated for the diffuse field [Blau83]. For the reproduction of the directional information (introduced in section 3.2.1 by the interaural time and gain differences for each echo) the proposed process [Mart91, Jot92] consists in controlling the interaural cross-correlation coefficient (or IACC, defined e.g. in [Blau83]).
At the opposite extremity of the acoustic channel, a similar model can be used for taking into account the directivity characteristics of the source. Under the hypothesis of a diffuse field, the late reverberation process is fed uniformly by all directions of emission from the source. A power spectrum of the sound source (energy average of the spectrum emitted by the source in all directions) is suitable for properly describing the contribution of the source to the late reverberation process [War90]. In the case of a source with frequency-dependent directivity, the equalization which reconstructs this power spectrum of the source from the input must simply multiply the "diffuse field HRTF".
The calculation of the "diffuse field HRTF" exactly follows the method described in section 3.2.3, considering that all directions equally contribute to the global spectrum (El, Er). In practice, this calculation is identical to the calculation of the "average HRTF" filter, described in section 3.2.3, applied to the particular case of 49 echoes coming from each of the 49 directions for which an HRTF measurement was made. However, since these measurements do not constitute a regular discretization of the space surrounding the listener, the energy of each HRTF is weighted by the solid angle represented by the corresponding measurement.
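One simple way to impose a given correlation coefficient between the left and right signals is to mix two uncorrelated signals of equal power, as in the sketch below. This is an assumed technique for illustration, not necessarily the process of [Mart91]. It controls the zero-lag correlation coefficient, whereas the IACC is usually defined as the maximum of the normalized cross-correlation over a range of interaural lags.

```python
import numpy as np

def mix_for_correlation(a, b, rho):
    """Mix uncorrelated, equal-power signals a and b into a (left, right)
    pair whose zero-lag correlation coefficient is rho."""
    theta = 0.5 * np.arccos(np.clip(rho, -1.0, 1.0))
    left = np.cos(theta) * a + np.sin(theta) * b
    right = np.cos(theta) * a - np.sin(theta) * b
    return left, right

def correlation_coefficient(x, y):
    """Normalized zero-lag cross-correlation of two signals."""
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))
```

In a reverberator the two inputs could be two output channels of the delay network, assuming these are sufficiently decorrelated.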
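The solid-angle weighting can be sketched as follows for a grid made of horizontal rings. The elevation band assigned to each ring is an assumption (the text does not say how the solid angles were attributed, and the band edges in the example are hypothetical). The solid angle of a band between two elevations is 2π(sin e_high − sin e_low), shared equally between the azimuths of the ring.

```python
import numpy as np

def ring_solid_angle(el_low_deg, el_high_deg, n_azimuths):
    """Solid angle (steradians) per measurement for one elevation band."""
    lo, hi = np.radians([el_low_deg, el_high_deg])
    return 2.0 * np.pi * (np.sin(hi) - np.sin(lo)) / n_azimuths

def diffuse_field_energy(hrtf_energy, weights):
    """Weighted average of HRTF energy spectra, shape (directions, freqs)."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * np.asarray(hrtf_energy)).sum(axis=0) / w.sum()

# Hypothetical band edges for the 24 + 24 + 1 measurement grid.
weights = ([ring_solid_angle(-15, 15, 24)] * 24
           + [ring_solid_angle(15, 60, 24)] * 24
           + [ring_solid_angle(60, 90, 1)])
```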
Direct sound
The directional filter used for reproducing the direct sound is the IIR model of the transfer function corresponding to the direction of arrival. Informal tests led us to reduce the order of the filter down to 20. The spectral smoothing induced by the modeling of the minimum phase part of the HRTF was then considered to be undetectable. Due to the precedence effect, further simplification of the processing of the direct sound cannot be undertaken without risking localisation distortions.
Late reverberation and cluster
The directional filter used at the output of the "cluster" and "reverb" processes is the "diffuse field HRTF" introduced in the previous section, i.e. computed from all possible directions of arrival.
Early echoes
The envisioned simplifications consist in filtering the whole group of echoes with one common filter, keeping as individual directional cues for each echo the pair of monaural gains and the interaural delay. As described above, two different approaches may be considered for the calculation of the common filter. The first consists in using the same diffuse field filter as for the later reverberation process. The second consists in designing a common filter which is only calculated from the HRTFs corresponding to the specific directions of arrival of the echoes, weighted by their respective energies.
Both filters yield a substantial reduction of the processing cost. However, in the context of a time-varying room simulation, the "average HRTF filter" is difficult to control, since it must be recomputed whenever the distribution of echoes changes (for example when the source or the receiver is moving within the room). The final part of this paper will present some results of a psychoacoustic experiment which intends to study the conditions under which these approaches are consistent.
Diffuse-field output equalization
As one can see, the diffuse field filter is involved in several sections of the room response and is mandatory for reproducing the late reverberation on headphones. It is thus natural to transfer this filtering into the output equalization filter which appears in fig.4. This strategy is particularly interesting if the binaural output signal is played over diffuse-field calibrated headphones. In this case, the output equalization filter can simply be eliminated!
In order to restore the proper binaural information for the direct sound and for the early echoes, all the corresponding directional filters must then be normalized (divided) by the diffuse field filter. We can then sum up the different processes on the flowgraph presented in figure 4. Each time section of the response (including direct sound) first goes through a directional process. The direct sound requires a binaural filter normalised by the diffuse field filter, the early echoes have individual interaural delays and monaural gains and are filtered by a common "average filter" (also normalized by the diffuse-field filter), and the cluster and the reverberation sections simply undergo a correlation control between left and right channels. Each section of the room effect is filtered by an equalizer which describes the information linked to the behaviour of the room (sound attenuation during propagation and reflections) and to the source characteristics (positioning and radiation). After mixing of the different sections, both left and right channels are processed, if necessary, through an output equalization stage which performs the diffuse field filtering and an inverse filtering depending on headphone or loudspeaker reproduction.
In order to evaluate the validity domain for the proposed simplifications, the experiment was carried out with various acoustical conditions where three parameters were controlled :
The main observations pointed out by the experiment are as follows :
[Blau83] J. Blauert, "Spatial Hearing : the psychophysics of human sound localization", MIT Press, Cambridge, 1983.
[Coop89] D.H. Cooper, J.L. Bauck, "Prospects for transaural recording", J. Audio Eng. Soc. 37(1/2): 3-19, 1989.
[Gerz76] M.A. Gerzon, "Unitary (Energy preserving) multichannel networks with feedbacks", in Electronics Letters, V, 12-11, 1976.
[Jot91] J.M. Jot, A. Chaigne "Digital delay networks for designing artificial reverberators", Proc. 90th A.E.S. Conv., Paris, preprint 3030 (E-2), 1991.
[Jot92a] J.M. Jot, "An analysis/synthesis approach to real-time artificial reverberation", Proc. IEEE ICASSP, San Francisco (paper n° 675), March 1992.
[Jot92b] J.M. Jot, A. Chaigne, "Spatialisation artificielle audio-numérique", French patent n° 92 02528, awarded March 1992.
[Jot92c] J.M. Jot, "Etude et réalisation d'un spatialisateur de sons par modèles physiques et perceptifs", doctoral dissertation, Télécom Paris, 1992.
[Jul92] J.P. Jullien et al., "Some results on the objective characterisation of room acoustical quality in both laboratory and real environments", Proc. Inst. of Acoustics, XIV(2), Birmingham, 1992.
[Jul93] J.P. Jullien, E. Kahle, M. Marin, O. Warusfel, G. Bloch, J.M. Jot, "Spatializer: a perceptual approach", Proc. 94th AES Convention, Berlin, preprint 3465, 1993. (Available from jmjot@ircam.fr, warusfel@ircam.fr, kahle@ircam.fr)
[Lav89] C. Lavandier, "Validation perceptive d'un modèle objectif de caractérisation de la qualité acoustique des salles", doctoral dissertation, Univ. du Maine, Le Mans, Juin 1989.
[Mar93] M. Marin, rapport CNET NT/LAA/TSS, 1993.
[Mart91] J. Martin et al., "Binaural simulation of concert halls : a new approach for the binaural reverberation process", to be published.
[Mehr77] S. Mehrgardt, V. Mellert, "Transformation characteristics of the external human ear", J. Acoust. Soc. Am. 61(6): 1567-1576, 1977.
[Mein93] M. Mein, "Perception de l'information binaurale liée aux réflexions précoces dans une salle. Application à la simulation de la qualité acoustique", Mémoire de DEA, Univ. du Maine, Le Mans, Septembre 1993.
[Moor79] J.A. Moorer, "About this reverberation business", Computer Music Journal 3(2): 13-18, 1979.
[Opp75] A.V. Oppenheim, R.W. Schafer, "Digital Signal Processing", Prentice Hall, 1975.
[Pers89] A. Persterer, "A very high performance digital audio processing system", Proc. 13th ICA, Belgrade, 1989.
[Pers91] A. Persterer, "Binaural reproduction of an 'ideal control room' for headphone reproduction", Proc. 90th AES Convention, Paris, preprint 3062, 1991.
[Pon92] F. Poncet, "Simulation de localisation de sources sonores dans l'espace", rapport Télécom Paris, Dpt Signal, 1992.
[Schr62] M.R. Schroeder "Natural sounding artificial reverberation", J. Audio Eng. Soc. 10(3): 219-223, 1962.
[Smith85] J.O. Smith, "A new approach to digital reverberation using closed waveguide networks", Proc. Int. Computer Music Conference: 47-63, 1985.
[Stau82] J. Stautner, M. Puckette, "Designing multi-channel reverberators", Computer Music Journal 6(1): 52-65, 1982.
[War90] O. Warusfel, "Etude des paramètres liés à la prise de son pour les applications d'acoustique virtuelle", Proc. 1st French Congress on Acoustics, vol. 2: 877-880, 1990.