
Binaural Concert Hall Simulation in Real Time

Jean-Marc Jot, Olivier Warusfel, Eckhard Kahle, Mireille Mein

IEEE 93, Mohonk (USA) 1993
Copyright © IEEE 1993

Abstract

A room simulator for binaural and transaural listening has been developed by Ircam and Espaces Nouveaux within the "Spatialisateur" project. In its current state, all control and digital processing operations are prototyped in the Max graphical environment on the Ircam Music Workstation. Because a perceptually relevant control interface operating in real time is required, the processing is based on the definition of a generic room effect, whose temporal sections are reproduced with artificial reverberation algorithms. In order to reproduce the directional (spatial) information in headphone listening, additional perceptually relevant simplifications of the processing are proposed. Psychoacoustical validation experiments indicate that it is possible to substantially reduce the complexity of the binaural filtering without altering the listener's perception.

1. Introduction

Espaces Nouveaux and Ircam are collaborating on a room acoustics research project called "Spatialisateur". The envisioned tool can be considered as a virtual acoustics processor allowing composers, performers or sound engineers to control the projection of sounds in a performing space. This control should include the localization aspects traditionally considered by composers, but also extend the range of available parameters by integrating parameters related to room acoustical quality into the composition process.

Although the Spatialisateur should eventually be used in any reproduction environment (taking into account the limitations encountered in each specific case), the current prototype is designed for reproduction on two channels, using headphones (binaural reproduction) or a pair of loudspeakers (conventional stereo or transaural reproduction). The first part of this paper describes the general architecture of the Spatialisateur and the current binaural/transaural prototype. The second part is a discussion of the digital signal processing models which make the binaural reproduction of the room effect realizable in real time. This involves the parametric modelling of head related transfer functions (HRTFs) using IIR filters. A method is then proposed for further reducing the complexity of the binaural processing of early echoes and for including artificial reverberation algorithms to simulate the later part of the room effect.

2. Description of the Prototype

2.1. Structure of the processor

In the architecture of the real-time processor, illustrated in fig.1, the DSP module receives signals from an acoustic or synthesized instrument and computes the signals which feed the reproduction system (a set of loudspeakers or a pair of earphones). The processing must allow control of the localization of the source with respect to the listener, and also of the acoustical quality (the perception of the transformation of sound from the source to the listener in a room), by adding an artificial room effect to the input signals. To account for the aspects of acoustical quality which depend on the directivity of the source, several input signals emanating from this source may be necessary.

Fig.2 describes in more detail the prototype DSP module which has been implemented on the Ircam Music Workstation, using only standard objects available in the Max graphical environment. The structure of this prototype illustrates the separation of the room effect into temporal aspects and spatial aspects. The control of temporal aspects is based on the definition of a "generic room effect" (fig.3) separated into four sections : direct sound (DS), echoes R1, echoes R2 and late reverberation (Rev). This definition is derived from recent studies carried out at Ircam on the characterization of room acoustical quality [Lav89, Jul92, Jul93]. The artificial reverberation algorithm which reconstructs this generic room effect is composed of a series association of three basic modules : "echoes", "cluster" and "reverb".

The structure of this generic algorithm is derived from [Jot92b, c], except for the introduction of the intermediate processing stage (cluster), which allows the echoes R1 and the echoes R2 to be controlled separately. The delay line T1 provides several delayed copies of the input signal, used for reproducing the early echoes R1. These signals are also mixed in the matrix M2, whose outputs separately feed the parallel bank of delay lines T2, producing the echoes R2. The final stage of the algorithm produces the late reverberation, through the association of the matrix M3 and the bank of delay lines T3 within a feedback loop. The matrices M2 and M3 have no null coefficients, in order to ensure a fast build-up of the echo density from echoes R1 to echoes R2 and into the late reverberation. Explicit control of the reverberation time is obtained by associating an "absorbent filter" with each one of the delay lines T3, and by selecting for M3 a unitary matrix. If the matrix M2 is also unitary, the process producing the echoes R2 is a unitary system, as defined by Gerzon [Gerz76]. This ensures that the spectral content of the echoes R2 and of the late reverberation is not affected by the delay times of the early echoes (R1 or R2). The delay times in T1, T2 and T3 can be selected in order to realize the temporal decomposition shown in fig.3.
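The final stage described above (delay lines T3, unitary matrix M3 and absorbent gains inside a feedback loop) can be illustrated with a minimal sketch. It is not the Max implementation: the function name, the Householder choice for M3 and the frequency-independent absorbent gains are assumptions made for illustration, and the stages T1, M2 and T2 are omitted.

```python
import numpy as np

def fdn_impulse_response(delays, rt60, fs, n_samples):
    """Impulse response of a small feedback delay network (hypothetical helper).

    delays : delay lengths in samples, one per delay line (T3 in the text)
    rt60   : desired reverberation time in seconds (frequency independent here)
    """
    delays = np.asarray(delays, dtype=int)
    n = len(delays)
    # Assumed unitary (orthogonal) feedback matrix: a Householder reflection.
    # For n >= 3 it has no null coefficients.
    m3 = np.eye(n) - (2.0 / n) * np.ones((n, n))
    # Assumed absorbent gains: each line loses 60 dB after rt60 seconds,
    # i.e. 20*log10(g_i) = -60 * tau_i / rt60 with tau_i = delays[i] / fs.
    gains = 10.0 ** (-3.0 * delays / (fs * rt60))
    buffers = [np.zeros(d) for d in delays]  # one circular buffer per line
    pos = np.zeros(n, dtype=int)
    out = np.zeros(n_samples)
    for t in range(n_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # Read the delay line outputs and apply the absorption.
        taps = np.array([buffers[i][pos[i]] for i in range(n)]) * gains
        out[t] = taps.sum()
        # Mix through the unitary matrix and feed back, adding the input.
        feedback = m3 @ taps + x
        for i in range(n):
            buffers[i][pos[i]] = feedback[i]
            pos[i] = (pos[i] + 1) % delays[i]
    return out

response = fdn_impulse_response([149, 211, 263, 293], rt60=0.5, fs=8000, n_samples=8000)
```

Because the matrix is unitary, the decay of the response is set by the absorbent gains alone, which is what makes the reverberation time explicitly controllable.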

Figure 1 : Processing for one source / one listening zone

Figure 2 : General structure of the DSP algorithm

It has been known for several decades that a recursive digital delay network can be used for imitating room reverberation in real time, and that the difficulty of obtaining a natural sounding effect essentially lies in the simulation of the late reverberation [Schr62, Moor79, Stau82, Smith85]. For this purpose, new criteria must be introduced, in terms of "density" of the response in the time domain and in the frequency domain : the echo density (number of echoes per unit time) and the modal density (number of normal modes per unit frequency). It can be shown that both densities can be made high enough, provided that the number and total length of the delays in the feedback network are sufficient [Jot92c, Jot91]. Furthermore, to avoid unnatural resonances in the late reverberation, the absorbent filters which control the reverberation time should be designed so as to specify the decay time of any normal mode as a function of its frequency. When this requirement is met, the reverberation time and the reverberation energy level can be controlled independently as functions of frequency in order to synthesize, in real time, a late reverberation indistinguishable from that measured in an existing room, provided that this room is large enough [Jot92a, b, c].
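These requirements can be stated compactly. The relations below are a sketch in the notation commonly used for feedback delay networks, not formulas quoted from this paper; the symbols \tau_i (length of delay line i in seconds), h_i (absorbent filter of line i), T_r (reverberation time), D_f, D_t and N (number of delay lines) are introduced here for illustration.

```latex
% Absorbent filter of delay line i: every mode decays by 60 dB in T_r(omega)
20 \log_{10} \left| h_i(e^{j\omega}) \right| = -60 \, \frac{\tau_i}{T_r(\omega)}

% Modal density (modes per Hz) and echo density (echoes per second)
D_f = \sum_{i=1}^{N} \tau_i
\qquad
D_t = \sum_{i=1}^{N} \frac{1}{\tau_i}
```

Under these assumptions, lengthening the delays raises the modal density while adding more delay lines raises both densities, which is why the number and the total length of the delays must both be sufficient.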

The generic algorithm which reproduces the temporal structure of the room effect remains unchanged irrespective of the techniques used for the sound capture stage and the diffusion stage, unlike the processing which reproduces the spatial (directional) information. In the case of a source whose directivity pattern is known in all frequency ranges, it is possible to transfer the spatial processing to the output stage, as shown in fig.2.

This implies that each section of the room effect undergoes a spectral correction to account for the radiation characteristics of the source, as suggested in [War90]. On the sound capture side, it is assumed that the DSP module is fed with a single signal devoid of reverberation (e.g. recorded in an anechoic room or with a contact microphone placed on the instrument).

Figure 3 : Temporal decomposition - generic room effect

2.2. Available reproduction modes

In its current implementation, the prototype is adapted to the reproduction on two output channels only. In order to overcome the limitations of conventional two-channel stereophony (unsuited to headphone listening and limited to the reproduction of frontal information on loudspeakers), the emphasis was laid, in a first step, on the binaural reproduction mode. In this mode, the spatializer attempts to reconstruct the information which would be captured by two microphones inserted in the ear canals of a listener placed in the virtual acoustical environment.

In the approximation of plane waves propagating inside the ear canal, the spatial information introduced by the head may be fully characterized by measuring the transfer function between the source and the entrance of the ear canal. Such a transfer function may then be used to simulate an extra-cranial virtual source. One aim of the "Spatialisateur" being the control of directional events, a database of HRTFs has been collected from 20 subjects. For each subject, this database consists of 49 measurements for different azimuths and elevations. Two horizontal planes (elevation 0deg. and 30deg.) are sampled every 15deg., and an additional measurement is performed at 90deg. elevation. The corresponding impulse responses were measured with maximum length sequences (MLS) at a 48 kHz sampling frequency. The signal was emitted by a concentric two-driver loudspeaker, and recorded with electret microphones inserted in the ear canals.
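The count of 49 measurements follows directly from this sampling scheme, as the short enumeration below shows. The variable names and the azimuth assigned to the overhead measurement are illustrative assumptions.

```python
# Enumerate the measurement directions as (elevation, azimuth) pairs in degrees.
azimuths = range(0, 360, 15)  # 24 azimuths per horizontal plane
directions = [(elev, az) for elev in (0, 30) for az in azimuths]
directions.append((90, 0))    # single overhead measurement (azimuth is arbitrary)

print(len(directions))        # 2 * 24 + 1 = 49
```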

The two output signals of the binaural "Spatialisateur" can be directly used for headphone listening, like a dummy head recording, provided that a correcting filter be introduced at the output stage to equalize the headphones, if necessary. For this purpose, the transfer function between each earphone and the microphone inserted in the corresponding ear canal is measured for each subject during the measurement procedure described above.

The reproduction can also be performed on a pair of loudspeakers in an acoustically neutral environment, in which case the output filter is replaced by a transaural cross-talk cancellation filter which inverts the transfer function from the loudspeakers to the ear canals (this filter is implemented for each subject according to [Coop89] and [Jot92c]). A well-known drawback of transaural reproduction is that, for a faithful reproduction of the spatial information conveyed by the binaural signal, the listener must be placed in a specific position with reference to the loudspeakers. When this constraint is satisfied, sounds coming from the sides or the back of the listener can be faithfully reproduced although both loudspeakers are in the front.

Fig.4 shows how the general structure of fig.2 is adapted to the case of the binaural / transaural reproduction. The directions of arrival of the direct sound and echoes R1 are reproduced with localization filters derived from HRTF measurements performed on the listener, as described in the second part of this paper. Assuming a statistical description of the temporal and directional distribution of later echoes and reverberation, the directional information for echoes R2 and late reverberation can be reproduced by controlling the cross-correlation coefficient of left and right signals. Each temporal section of the room effect can be corrected separately in level and spectrum, to allow the control of perceptual attributes which depend on the directivity of the source and its position in the room.
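One simple way of controlling the correlation between the left and right signals is to mix two uncorrelated reverberation outputs through a 2x2 rotation-like matrix. The sketch below assumes two uncorrelated inputs of equal power and controls the zero-lag correlation coefficient only; the function name and the mixing law are illustrative assumptions, not the scheme used in the prototype.

```python
import numpy as np

def set_interaural_correlation(a, b, rho):
    """Mix two uncorrelated, equal-power signals into a left/right pair
    whose zero-lag correlation coefficient is approximately rho (-1 <= rho <= 1)."""
    theta = 0.5 * np.arcsin(rho)
    # Each output keeps the input power, since cos^2 + sin^2 = 1;
    # the cross term gives a correlation of 2*sin*cos = sin(2*theta) = rho.
    left = np.cos(theta) * a + np.sin(theta) * b
    right = np.sin(theta) * a + np.cos(theta) * b
    return left, right

rng = np.random.default_rng(0)
a, b = rng.standard_normal((2, 200_000))  # stand-ins for two reverb outputs
left, right = set_interaural_correlation(a, b, 0.4)
measured = np.corrcoef(left, right)[0, 1]
```

With rho = 0 the two channels stay fully decorrelated, and with rho = 1 they become identical.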

The output filter is software switchable between binaural mode (headphone equalization filter) and transaural mode (cross-talk cancellation filter). In the case of conventional stereophonic reproduction on two loudspeakers, this output filter can be simply eliminated (or used for equalizing each loudspeaker) and the localization filters simply become conventional panoramic potentiometers ("panpots") as found in mixing desks. Several types of panpots have been implemented to simulate various stereophonic sound capture techniques (e.g. coincident or non-coincident microphone pairs with cardioid or omnidirectional directivity).

Figure 4 : Structure of the DSP algorithm for two-channel reproduction
(binaural, transaural or stereophonic reproduction)

2.3. Perceptual control interface

The specificities of the approach which was chosen in the Spatialisateur project are described in [Jul93] : the artificial room effect is not controlled through a physical and geometrical description of the virtual room, but with a control interface which is directly related to the listener's perception, derived from recent perceptual studies [Lav89, Jul92].

The advantage of this approach, in a tool designed for musical applications, lies in the possibility of including room acoustical quality attributes in the initial composition stage of a musical work (e.g. by integrating them in the score), using a formalism which refers neither to a particular piece of electronic equipment nor to the architectural environment where the work will be performed.

The parameters of the processing described in the preceding sections are determined by a set of objective criteria which specify the desired acoustical quality and are directly related to the listener's perception. These objective criteria can be measured in a real situation in order to simulate the acoustics of an existing room. The first envisioned application for this prototype is its use in psychoacoustical experiments designed for validating and, if necessary, refining the definition of the objective criteria and their relation to the listener's perception, in the case of a single source. However, the musical potential of the prototype was also tested in a musical piece, written by Georges Bloch, in which two sources are processed simultaneously with independent controls.

The control interface allows the listener to use his or her own HRTFs or to select instantly, within the database of available measurements, the set of HRTFs which yields the most natural auditory sensation. The acoustical quality is controlled by six mutually independent perceptual factors, each associated with its objective criterion. The perceptual factors are graphically displayed in the form of sliders operated with a mouse. The range of each slider takes into account the average sensitivity of subjects with regard to the corresponding perceptual factor (this sensitivity was estimated in previous studies [Jul92]). The six proposed perceptual factors are the following (the corresponding objective criteria are listed in parentheses) :

(a) Perceptual factors characterizing the room :
- reverberance (reverberation time at medium frequencies)
- liveness (ratio of high-frequency to medium-frequency reverberation time)
(b) Perceptual factors related to the position and directivity of the source :
- proximity (energy of direct sound and early room effect)
- room effect (energy of the late reverberation)
- clarity (slope of integrated decay curve in the time interval from 40 to 80 ms)
- transparency (ratio of direct sound energy to early room effect energy)

The localization of the source in the horizontal plane is continuously variable with the azimuth control (direction angle with reference to the frontal direction). This control requires an interpolation to be computed in real time between the localization filters derived from the measurements performed on every subject (with an angular step of 15deg.). Although measurements allowing the elevation of the source to be simulated were also performed for each subject, the interpolation process has not been extended to elevation at the time of this writing.

3. Modeling

The goal of the modeling described in this chapter is to reduce the quantity of binaural information needed for a reliable reproduction of the room effect. In the short term, this simplification should reduce the implementation cost of the directional filters for the real-time simulation.

In a further step, knowledge of the binaural signal modifications which remain imperceptible to the listener would permit a simplification of the description required for controlling the room effect. We will particularly search for a description of the perceptual effect of the early echoes expressed in terms of objective criteria directly related to perceptual factors, rather than in terms of the date, energy and direction of each echo.

Recent studies [Beg92] suggest that some modifications of the temporal and spatial distribution of early echoes are not perceptible. A first approach is to investigate the perceptual effect of elementary modifications in order to evaluate the differential perception thresholds for each of the parameters which characterize the distribution of the echoes (in terms of date, energy, direction). We have chosen a different approach, initially proposed in [Jul93], where the parameters of the distribution remain unchanged, but where we attempt to reduce the amount of information required for the binaural restitution of each echo without affecting the overall perceptual effect.

This chapter describes the proposed models. We first present a parametric modeling procedure which can be applied to the reproduction of the direct sound (or an echo) perceived as an isolated event. We will then present further simplifications to be considered when the direct sound and a series of echoes, forming the early part of a room effect, are reproduced simultaneously.

3.1. Parametric modeling of the binaural filters

The real-time implementation of the binaural filters in the form of FIR (finite impulse response) filters requires considerable processing power. Since the impulse responses are about 5 ms long, each FIR filter typically requires about 250 coefficients at the 48 kHz sampling frequency.

Two properties will help model the binaural filters [Pons92, Jot92c] :

* a general property of rational linear filters (pole - zero filters) : every stable filter can be decomposed into the series association of a minimum phase filter and an all-pass filter (the latter simply realizes an "excess phase") [Opp75].

* a particular property of HRTFs : in the case of HRTFs, the all-pass filter is approximately equivalent to a pure delay (the excess phase of the HRTF is a linear function of frequency, at least below roughly 10 kHz) [Mehr77, Coop89].
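The two properties above suggest a simple decomposition procedure: compute the minimum phase part of a measured response, then fit a straight line to the remaining (excess) phase to obtain the equivalent pure delay. The sketch below uses the standard real-cepstrum (homomorphic) construction of the minimum phase part; the function name, FFT size and the 10 kHz fitting limit are illustrative assumptions, and the paper does not state which method was used.

```python
import numpy as np

def minimum_phase_and_delay(h, fs, n_fft=4096, f_max=10000.0):
    """Split an impulse response into a minimum phase part and a pure delay (s)."""
    H = np.fft.fft(h, n_fft)
    log_mag = np.log(np.maximum(np.abs(H), 1e-12))
    cep = np.fft.ifft(log_mag).real            # real cepstrum of |H|
    # Fold the cepstrum onto positive quefrencies to make it causal.
    fold = np.zeros(n_fft)
    fold[0] = cep[0]
    fold[1:n_fft // 2] = 2.0 * cep[1:n_fft // 2]
    fold[n_fft // 2] = cep[n_fft // 2]
    H_min = np.exp(np.fft.fft(fold))           # same magnitude, minimum phase
    h_min = np.fft.ifft(H_min).real
    # Excess phase = total phase minus minimum phase, fitted by a line below f_max.
    half = n_fft // 2
    freqs = np.arange(half) * fs / n_fft
    excess = np.unwrap(np.angle(H[:half])) - np.unwrap(np.angle(H_min[:half]))
    band = freqs <= f_max
    slope = np.polyfit(2.0 * np.pi * freqs[band], excess[band], 1)[0]
    return h_min, -slope                       # a delay gives a negative slope

# Synthetic check: a minimum phase response [1, 0.5] delayed by 20 samples.
fs = 48000
h = np.zeros(64)
h[20], h[21] = 1.0, 0.5
h_min, delay = minimum_phase_and_delay(h, fs)
```

For a left/right HRTF pair, the difference between the two fitted delays gives the interaural delay.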

A parametric modeling method can be used to approximate the minimum phase part with an IIR filter (recursive filter). The method currently used in our implementation [Mar93] consists in solving the Yule-Walker equations (standard routine available in the Matlab environment), and the chosen order of the IIR filters is 20 (reducing the number of coefficients to 40). This reduction of information results in the smoothing of the magnitude frequency response of the filter (see fig.5).
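To illustrate the principle, the sketch below solves the Yule-Walker equations for an all-pole model of a given power spectrum. This is a simplification: the Matlab routine mentioned above also fits a numerator, whereas this hypothetical `allpole_fit` helper estimates the denominator and an overall gain only.

```python
import numpy as np

def allpole_fit(power_spectrum, order):
    """Fit gain / (1 + a1 z^-1 + ... + ap z^-p) to |H|^2 sampled on a full FFT grid."""
    # Autocorrelation is the inverse transform of the power spectrum.
    r = np.fft.ifft(power_spectrum).real
    # Toeplitz system of Yule-Walker equations: R a = -r[1..p].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a_tail = np.linalg.solve(R, -r[1:order + 1])
    gain = np.sqrt(r[0] + np.dot(a_tail, r[1:order + 1]))
    return gain, np.concatenate(([1.0], a_tail))

# Synthetic check: recover a known second-order all-pole filter.
true_a = np.array([1.0, -0.9, 0.5])
spectrum = 1.0 / np.abs(np.fft.fft(true_a, 4096)) ** 2
gain, a = allpole_fit(spectrum, order=2)
```

Choosing a low order relative to the detail of the measured spectrum is what produces the smoothing of the magnitude response.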

Figure 5a : Implementation of directional filters with IIR model of order 20

Δt represents the interaural delay (here in the case of a source located to the right of the median plane of the head).

Figure 5b : Frequency response (amplitude and phase) of an HRTF (0deg. elevation, 75deg. azimuth). Dotted line : modeling of this HRTF by an IIR filter of order 20 cascaded with a pure delay.

3.2. Modeling of the binaural information related to the room effect

To realize the binaural simulation of a room effect in real time, the above binaural filter modeling method can be applied to the directional filter of the direct sound as well as to each of the early echoes. However, when reproducing a series of successive echoes within a short time interval, we can expect that a further simplification of the model would remain imperceptible.

3.2.1. Reproducing echoes with a "stereophonic" model

The binaural information in the HRTF can be separated as follows :
- interaural differences of delay and amplitude,
- spectral cues due to the diffraction by the head and the external ears [Blau83].

This can be understood as a hierarchization of binaural cues. The parametric modelling suggested before can be interpreted as a reduction of spectral cues (smoothing). When the order of the filter is reduced to the lowest limit (order 0), only the delay information will remain as well as a frequency-independent interaural gain (this simplified model for reproducing the directional information of the earliest echoes is proposed in [Jot92c]). This model can be called "stereophonic" because it is equivalent to a stereo recording realized with a non-coincident pair of microphones whose spacing and directivity (assumed independent of frequency) are defined in order to realize the following characteristics :


* Duration of interaural delays :
For frequencies above about 2 kHz, the interaural delay is approximately a linear function of the azimuth (increasing from 0 at azimuth 0deg. to about 0.7 ms at azimuth 90deg.) [Blau83]. This result is approximately verified when the interaural delay is derived from the excess phase of the HRTF [Ponc92], as in the model described in section 3.1. However, since the purpose of this study is not to derive universal properties of HRTFs, the calculated delay values are kept for each individual.
* Monaural gains :
To evaluate the monaural gains, noted gl and gr, we are naturally led to evaluate the mean energy of each transfer function within a significant frequency band [f1, f2]. The interaural gain proves to be hardly modified whether the frequency band is [1 kHz, 8 kHz], [1 kHz, 5 kHz], or [2 kHz, 8 kHz]. As a consequence, in accordance with previous work, the chosen limit frequencies are f1 = 1 kHz and f2 = 5 kHz. The monaural gains for the direction i are given by the following formulas where the pair (Hl,i ; Hr,i) is the HRTF pair measured for direction i :
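A form of these formulas consistent with the description above can be written as follows, assuming that the "mean energy" is the power spectrum averaged linearly over the band [f1, f2]. The gains are written G_{l,i} and G_{r,i}, as in section 3.2.3.

```latex
G_{l,i}^{2} = \frac{1}{f_2 - f_1} \int_{f_1}^{f_2} \left| H_{l,i}(f) \right|^{2} \, df
\qquad
G_{r,i}^{2} = \frac{1}{f_2 - f_1} \int_{f_1}^{f_2} \left| H_{r,i}(f) \right|^{2} \, df
```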

3.2.2. Separating spatial and temporal aspects

Implementing the directional filters with the above "stereophonic" model yields a considerable reduction in processing cost, compared to a complete binaural simulation of the early room effect. However, the information extracted by this method mainly reproduces the spatial (directional) information and ignores the spectral modifications introduced by the head and the outer ears.
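The cost reduction is easy to see in a sketch: with the "stereophonic" model each echo costs one delay and one gain per ear, with no filtering at all. The function below is a hypothetical illustration; its name, the tuple layout and the sign convention for the interaural delay are assumptions.

```python
import numpy as np

def render_echoes_stereo(x, fs, echoes):
    """Render echoes with per-ear gains and an interaural delay only.

    echoes : list of (arrival_time_s, amplitude, g_left, g_right, itd_s);
             itd_s > 0 means the source is on the right (left ear is delayed).
    """
    x = np.asarray(x, dtype=float)
    latest = max(t + abs(itd) for t, _, _, _, itd in echoes)
    n_out = len(x) + int(np.ceil(latest * fs)) + 1
    left, right = np.zeros(n_out), np.zeros(n_out)
    for t, amp, g_l, g_r, itd in echoes:
        d_near = int(round(t * fs))               # ear facing the echo
        d_far = int(round((t + abs(itd)) * fs))   # shadowed ear, delayed by the ITD
        d_left, d_right = (d_far, d_near) if itd > 0 else (d_near, d_far)
        left[d_left:d_left + len(x)] += amp * g_l * x
        right[d_right:d_right + len(x)] += amp * g_r * x
    return left, right

# One echo arriving from the right, 10 ms after the input, with a 0.5 ms ITD.
left, right = render_echoes_stereo([1.0], 48000, [(0.010, 0.5, 0.8, 1.0, 0.0005)])
```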

The contribution of the early echoes to the perceived spectrum depends on :

a) the temporal distribution of the echoes and their respective amplitudes (noted Ai),
b) the history of each echo : the spectrum of each echo is the result of a series of elementary filters associated to the direction of emission (radiation pattern of the sound source), to the absorption of the walls (frequency dependent) and to the HRTFs of the listener for the direction of arrival of the echo.

In a first step, the directivity of the source and the absorption coefficients of the walls can be considered independent of frequency ; they will then only result in a pure attenuation for each echo. Under these assumptions, the information from a) is exactly that which would be recovered by an omnidirectional microphone, while the information from b) is strictly directional and corresponds to the difference between the information recovered by the listener's ear and that recovered with an omnidirectional microphone. As illustrated in fig.6, we can further decompose this difference by first considering the information recovered with a pair of microphones (according to the stereophonic model of section 3.2.1) and then considering the additional spectral information due to the head and the external ears.

Figure 6 : Simulation of a group of echoes. Three levels of reproduction are indicated :
omni, stereo and binaural. Ai represents the amplitude of echo i. Gl, Gr and Δt
are the monaural gains and interaural delays defined in section 3.2.1

3.2.3. Grouping the directional filters for a series of echoes

Given the temporal integration properties of the ear, we may assume that the perception of the different spectra of the individual echoes is partly described by a common spectrum of the set of echoes, irrespective of their temporal distribution. This suggests that the binaural reproduction of a dense group of echoes can be simplified without affecting the listener's perception, by using a common filter for all echoes, as suggested in [Jul93]. If all the filters which appear in the final (binaural) stage of fig.6 have the same transfer function, it is equivalent to implement the filtering as shown in fig.7.

To ensure that the grouping of the directional filters remains inaudible, it is proposed that, for each ear, the total energy conveyed by the echoes should remain unchanged for all frequencies.

For N echoes of amplitudes Ai and directions corresponding respectively to HRTFs (Hl,i , Hr,i), the total energy arriving at each ear may be written:
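Assuming that the echoes add in energy (i.e. that their relative phases are ignored, which is what disregarding the temporal distribution implies), the total energy spectra can be written as below. The symbols E_l(f) and E_r(f) are those used in the following paragraphs.

```latex
E_l(f) = \sum_{i=1}^{N} A_i^{2} \left| H_{l,i}(f) \right|^{2}
\qquad
E_r(f) = \sum_{i=1}^{N} A_i^{2} \left| H_{r,i}(f) \right|^{2}
```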

In the realization described in fig.7, the transfer function of each of the IIR filters must be normalised in order to take into account the total energy introduced by the gains Ai and Gl,i . The required transfer functions for the pair of IIR filters shown in fig.7 can hence be written :
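Writing E_l(f) and E_r(f) for the total energy spectra arriving at the left and right ears, and requiring that the gains A_i and G_{l,i} followed by the common filter convey the same energy at every frequency, one obtains the following magnitude responses. This is a reconstruction from the surrounding definitions; the names H_l and H_r for the common filters are introduced here for illustration.

```latex
\left| H_l(f) \right|^{2} = \frac{E_l(f)}{\sum_{i=1}^{N} A_i^{2} \, G_{l,i}^{2}}
\qquad
\left| H_r(f) \right|^{2} = \frac{E_r(f)}{\sum_{i=1}^{N} A_i^{2} \, G_{r,i}^{2}}
```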

Considering the calculation of the monaural gains Gl and Gr described in section 3.2.1, this amounts to dividing each spectrum, El(f) and Er(f), by its average energy calculated in the frequency band [f1, f2]. Of course, this normalization must also be carried out for the IIR filters represented in fig.6, each of which is derived directly from a particular measured HRTF.
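The normalization can be checked numerically. The sketch below computes the magnitude of a common filter for one ear from a set of HRTF magnitudes and echo amplitudes, assuming an energy-weighted sum of power spectra and a linear average over [f1, f2]; the function names and the random test data are hypothetical.

```python
import numpy as np

def band_mean(power, freqs, f1=1000.0, f2=5000.0):
    """Mean of a power spectrum over the band [f1, f2] (last axis = frequency)."""
    band = (freqs >= f1) & (freqs <= f2)
    return power[..., band].mean(axis=-1)

def common_filter_magnitude(hrtf_mag, amplitudes, freqs, f1=1000.0, f2=5000.0):
    """Magnitude of the common filter for one ear.

    hrtf_mag   : (N, F) HRTF magnitudes for the N echo directions
    amplitudes : (N,) echo amplitudes A_i
    """
    power = np.abs(hrtf_mag) ** 2
    a2 = np.asarray(amplitudes, dtype=float) ** 2
    total = (a2[:, None] * power).sum(axis=0)   # E(f), total energy spectrum
    g2 = band_mean(power, freqs, f1, f2)        # squared monaural gains G_i^2
    return np.sqrt(total / np.dot(a2, g2))

rng = np.random.default_rng(1)
freqs = np.linspace(0.0, 24000.0, 257)
hrtfs = rng.uniform(0.1, 2.0, size=(7, 257))   # stand-in data, not measurements
amps = rng.uniform(0.1, 1.0, size=7)
mag = common_filter_magnitude(hrtfs, amps, freqs)
```

By construction, the resulting filter has unit average energy in the band [f1, f2], so the overall level of each echo is still set by Ai and the monaural gains.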

Figure 7 : Common directional filtering of a group of echoes

Finally, it is possible to reintroduce in this model the frequency dependence of the acoustical characteristics of the sound source and the walls : this amounts to replacing each of the Ai by the product of the elementary filters undergone by the echo prior to its arrival at the receiver. Equivalently, for each echo, these spectral variations can be transferred into the corresponding left and right IIR filters shown in fig.6. To achieve this, the power gain Ai² must be equal to the average energy of the echo in the frequency band [f1, f2] used to calculate the monaural gains Gl,i and Gr,i. The grouping of the filters can then be implemented just as described above.

3.2.4. Application to late reverberation

For reproducing later echoes, we assume that these echoes create a diffuse field : at any time, the reverberation is the result of the superposition of a great number of echoes coming from directions equi-distributed in space.

The method for reproducing the early echoes, suggested above, leads us to define the directional filter for processing the late reverberation as having a frequency response given by the average of the HRTF energy spectra corresponding to all possible directions. A filtering similar to this diffuse field HRTF is proposed in [Mart91] for the binaural reproduction of the late reverberation. This process can be compared to the pre-equalization of some high fidelity headphones that have been calibrated for the diffuse field [Blau83]. For the reproduction of the directional information (introduced in section 3.2.1 by the interaural time and gain differences for each echo), the proposed process [Mart91, Jot92] consists in controlling the interaural cross-correlation coefficient (or IACC, defined e.g. in [Blau83]).

At the opposite extremity of the acoustic channel, a similar model can be used for taking into account the directivity characteristics of the source. Under the hypothesis of a diffuse field, the late reverberation process is fed uniformly by all directions of emission from the source. A power spectrum of the sound source (energy average of the spectrum emitted by the source in all directions) is suitable for properly describing the contribution of the source to the late reverberation process [War90]. In the case of a source with frequency-dependent directivity, the equalization which reconstructs this power spectrum of the source from the input must simply multiply the "diffuse field HRTF".

The calculation of the "diffuse field HRTF" exactly follows the method described in section 3.2.3, considering that all directions equally contribute to the global spectrum (El, Er). In practice, this calculation is identical to the calculation of the "average HRTF" filter, described in section 3.2.3, applied to the particular case of 49 echoes coming from each of the 49 directions for which an HRTF measurement was made. However, since these measurements do not constitute a regular discretization of the space surrounding the listener, the energy of each HRTF is weighted by the solid angle represented by the corresponding measurement.
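A minimal sketch of this weighted average is given below. The partition of the sphere into elevation bands (and therefore the weights) is a hypothetical choice made for illustration; the paper does not specify how the solid angles were assigned to the 49 measurements.

```python
import numpy as np

def band_solid_angle(elev_low_deg, elev_high_deg):
    """Solid angle (steradians) of the band between two elevations."""
    lo, hi = np.radians([elev_low_deg, elev_high_deg])
    return 2.0 * np.pi * (np.sin(hi) - np.sin(lo))

def diffuse_field_magnitude(hrtf_mag, weights):
    """Weighted energy average of HRTF magnitudes; hrtf_mag has shape (N, F)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.sqrt((w[:, None] * np.abs(hrtf_mag) ** 2).sum(axis=0))

# Assumed partition: the 0 deg. plane covers elevations -15..15 deg., the 30 deg.
# plane covers 15..60 deg. (each shared by 24 azimuths), and the overhead
# measurement covers the cap from 60 to 90 deg.
weights = ([band_solid_angle(-15, 15) / 24] * 24
           + [band_solid_angle(15, 60) / 24] * 24
           + [band_solid_angle(60, 90)])
```

The band normalization of section 3.2.3 can then be applied to the result in the same way as for the "average HRTF" filter.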

3.3. Realization of the binaural simulator

The different modeling procedures having been discussed in the previous sections, we will now sum up how they may be used for the different time sections of the generic room effect, as described in fig.4.

Direct sound

The directional filter used for reproducing the direct sound is the IIR model of the transfer function corresponding to the direction of arrival. Informal tests led us to reduce the order of the filter down to 20. The spectral smoothing induced by the modeling of the minimum phase part of the HRTF was then considered to be undetectable. Due to the precedence effect, further simplification of the processing of the direct sound cannot be undertaken without risking localization distortions.

Late reverberation and cluster

The directional filter used at the output of the "cluster" and "reverb" processes is the "diffuse field HRTF" introduced in the previous section, i.e. computed from all possible directions of arrival.

Early echoes

The envisioned simplifications consist in filtering the whole group of echoes with one common filter, keeping as individual directional cues for each echo the pair of monaural gains and the interaural delay. As described above, two different approaches may be considered for the calculation of the common filter. The first consists in using the same diffuse field filter as for the late reverberation process. The second consists in designing a common filter which is only calculated from the HRTFs corresponding to the specific directions of arrival of the echoes, weighted by their respective energies.

Both filters yield a substantial reduction of the processing cost. However, in the context of a time-varying room simulation, the "average HRTF filter" is awkward to control, since it must be recomputed whenever the distribution of echoes changes (for example when the source or the receiver moves within the room). The final part of this paper presents some results of a psychoacoustic experiment designed to study the conditions under which these approaches are consistent.

Diffuse-field output equalization

As one can see, the diffuse field filter is involved in several sections of the room response and is mandatory for reproducing the late reverberation on headphones. It is thus natural to transfer this filtering into the output equalization filter which appears in fig.4. This strategy is particularly interesting if the binaural output signal is reproduced over diffuse-field calibrated headphones. In this case, the output equalization filter can simply be eliminated.

In order to restore the proper binaural information for the direct sound and for the early echoes, all the corresponding directional filters must then be normalized (divided) by the diffuse field filter. The different processes can then be summed up on the flowgraph presented in figure 4. Each time section of the response (including the direct sound) first goes through a directional process : the direct sound requires a binaural filter normalized by the diffuse field filter ; the early echoes have individual interaural delays and monaural gains and are filtered by a common "average filter" (also normalized by the diffuse field filter) ; the cluster and the reverberation sections simply undergo a correlation control between left and right channels. Each section of the room effect is then filtered by an equalizer which describes the information linked to the behaviour of the room (sound attenuation during propagation and reflections) and to the source characteristics (positioning and radiation). After mixing of the different sections, both left and right channels are processed, if necessary, through an output equalization stage which performs the diffuse field filtering and an inverse filtering depending on headphone or loudspeaker reproduction.

4. Psychoacoustic Validation of the Simplified Binaural Processing for Early Echoes

This section briefly presents the results of some psychoacoustic experiments that were undertaken in order to study the conditions under which the proposed simplifications of the binaural synthesis process are perceptually acceptable. Since these simplifications rely on the hypothesis of a temporal integration of the spectral information provided by the different echoes, we expect that their validity will depend on the temporal and spatial distribution of the echoes and on their energy compared with the direct sound or the reverberated field. The experiments consist in comparing different acoustical situations in which a group of 7 early echoes is either processed through a common filter or individually filtered by the corresponding measured HRTFs. As described above, two types of common filters are considered: a diffuse field filter constructed from all the possible directions of arrival (section 3.2.4), or a specific average filter which is designed for each specific spatial distribution of the echoes (section 3.2.3).

In order to evaluate the domain of validity of the proposed simplifications, the experiment was carried out under various acoustical conditions in which three parameters were controlled:

- in the spatial domain we consider the average direction of arrival, called the "Center direction". It takes the form of a vector whose coordinates are the average of the individual echo directions weighted by their respective energies. The magnitude of this vector indicates whether the distribution of echoes has a main direction or is evenly distributed. Five center directions were studied: front, side, back, up, plus an even distribution.

- in the time domain we consider the "Center time", which is the weighted average of the arrival times of the echo distribution. The associated standard deviation provides a measure of the evenness of the time distribution. Three cases were studied: early and late Center time (respectively < 30 ms and > 40 ms), plus a regular time distribution.

- the energy distribution is controlled by means of the ratio between the direct sound and the group of reflexions (0, +3, -6 dB) and the ratio between the group of reflexions and the late reverberated energy (0, +3 dB).
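The three descriptors listed above can be sketched as follows. This is only an illustration under stated assumptions: echo directions are given as unit vectors in a hypothetical (x = front, y = left, z = up) convention, the Center time and its standard deviation are assumed to be weighted by echo energy, and the function names and toy values are not taken from the experiment.

```python
import math

def center_direction(directions, energies):
    """Energy-weighted average of unit direction vectors (x, y, z)."""
    total = sum(energies)
    return tuple(
        sum(e * d[i] for d, e in zip(directions, energies)) / total
        for i in range(3)
    )

def vector_magnitude(v):
    """Near 1: one main direction. Near 0: evenly distributed echoes."""
    return math.sqrt(sum(c * c for c in v))

def center_time(times_ms, energies):
    """Weighted mean arrival time and standard deviation, in ms.

    Assumption: the weights are the echo energies.
    """
    total = sum(energies)
    mean = sum(e * t for t, e in zip(times_ms, energies)) / total
    var = sum(e * (t - mean) ** 2 for t, e in zip(times_ms, energies)) / total
    return mean, math.sqrt(var)

def energy_ratio_db(energy_a, energy_b):
    """Ratio of two energies expressed in dB."""
    return 10.0 * math.log10(energy_a / energy_b)

# Toy example (hypothetical): one frontal and one rear echo.
dirs = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
energies = [3.0, 1.0]
times = [20.0, 40.0]

cd = center_direction(dirs, energies)       # (0.5, 0.0, 0.0): frontal bias
mean_t, std_t = center_time(times, energies)  # mean_t is 25.0 ms
ratio = energy_ratio_db(2.0, 1.0)           # about +3 dB
```

With equal energies the two opposite echoes would cancel and the Center direction vector would have zero magnitude, which corresponds to the "even distribution" case.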

The main observations drawn from the experiment are as follows:

- the use of a common filter calculated from the specific distribution of echoes cannot be distinguished from the situation where each echo keeps its own measured HRTF filter. This property only fails when a late echo coming from the side emerges from the distribution. As seen before, this simplification reduces the signal processing cost, but not the control processing cost (in the case of time-varying room simulation).

- when the energy ratio between the direct sound and the group of early reflexions is +3 dB, the distortions induced by a specific average filter or by the diffuse field filter could not be noticed, irrespective of the temporal and spatial distributions of the early echoes. In that case the use of a diffuse field filter is valid and provides a substantial reduction of both signal and control processing.

- when the "Center direction" of the echoes is backward, the use of the diffuse field filter could not be noticed, irrespective of the energy distribution. This is consistent with the hypothesis that backward reflexions are subjectively integrated within the late reverberation field.

In other words, one could roughly say that, except for frontal or side echo distributions emerging from the sound decay, it is possible to filter the series of early echoes with a diffuse field filter (just as for the late reverberation section), and, otherwise, with an average filter. These observations tend to validate the simplifications proposed for reproducing the directional information of the early echoes. Furthermore, the results of the psychoacoustic experiments suggest the derivation of a reduced number of perceptual controls which would describe the spatial distribution of early echoes in terms of a simple directivity representation.
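As an illustration only, the two conditions under which the diffuse field filter went unnoticed can be written as a simple selection rule. This is a hypothetical sketch, not a procedure given by the authors: the function name, the direction labels and the use of a ">= 3 dB" threshold (the experiment only tested the values 0, +3 and -6 dB) are assumptions, and all other cases conservatively fall back to the specific average filter.

```python
def choose_early_echo_filter(center_direction_label, direct_to_early_db):
    """Pick the common filter applied to the group of early echoes.

    center_direction_label: "front", "side", "back", "up" or "even"
        (hypothetical labels for the five tested cases).
    direct_to_early_db: energy ratio between the direct sound and the
        group of early reflexions, in dB.
    """
    # Reported case: at +3 dB neither common filter could be noticed.
    # Assumption: this still holds for ratios above +3 dB.
    if direct_to_early_db >= 3.0:
        return "diffuse_field"
    # Reported case: a backward Center direction, for any energy ratio.
    if center_direction_label == "back":
        return "diffuse_field"
    # Conservative fallback for every case not explicitly reported.
    return "average"

choices = [
    choose_early_echo_filter("front", 3.0),
    choose_early_echo_filter("back", -6.0),
    choose_early_echo_filter("side", 0.0),
]
```

The rule does not cover the one reported failure of the average filter (a late lateral echo emerging from the distribution), for which individual HRTF filtering would still be needed.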

5. Conclusion

The architecture of a real-time virtual acoustics processor has been described. The current binaural prototype relies on a generic room effect that allows the control of a reduced number of perceptually relevant attributes. For the real-time reproduction of the spatial information conveyed by the different sections of the room response, some simplifications of the binaural processing have been proposed. These simplifications consist in the parametric modelling of the HRTFs, and in the derivation of specific binaural filters for the group of early echoes and for the late reverberation. Psychoacoustic experiments have been undertaken to validate this approach and tend to confirm the possibility of significantly reducing the total signal and control processing cost.

References

[Beg92] D.R. Begault, "Binaural auralization and perceptual veridicality", Proc. 93rd AES Conv., San Francisco, preprint 3421 (M-3), 1992.

[Blau83] J. Blauert, "Spatial Hearing: the psychophysics of human sound localization", MIT Press, Cambridge, 1983.

[Coop89] D.H. Cooper, J.L. Bauck, "Prospects for transaural recording", J. Audio Eng. Soc. 37(1/2): 3-19, 1989.

[Gerz76] M.A. Gerzon, "Unitary (Energy preserving) multichannel networks with feedbacks", in Electronics Letters, V, 12-11, 1976.

[Jot91] J.M. Jot, A. Chaigne, "Digital delay networks for designing artificial reverberators", Proc. 90th AES Conv., Paris, preprint 3030 (E-2), 1991.

[Jot92a] J.M. Jot, "An analysis/synthesis approach to real-time artificial reverberation", Proc. IEEE ICASSP, San Francisco (paper n° 675), March 1992.

[Jot92b] J.M. Jot, A. Chaigne, "Spatialisation artificielle audio-numérique", French patent n° 92 02528, awarded March 1992.

[Jot92c] J.M. Jot, "Etude et réalisation d'un spatialisateur de sons par modèles physiques et perceptifs", doctoral dissertation, Télécom Paris, 1992.

[Jul92] J.P. Jullien et al., "Some results on the objective characterisation of room acoustical quality in both laboratory and real environments", Proc. Inst. of Acoustics, XIV(2), Birmingham, 1992.

[Jul93] J.P. Jullien, E. Kahle, M. Marin, O. Warusfel, G. Bloch, J.M. Jot, "Spatializer: a perceptual approach", Proc. 94th AES Convention, Berlin, preprint 3465, 1993. (Available from jmjot@ircam.fr, warusfel@ircam.fr, kahle@ircam.fr)

[Lav89] C. Lavandier, "Validation perceptive d'un modèle objectif de caractérisation de la qualité acoustique des salles", doctoral dissertation, Univ. du Maine, Le Mans, Juin 1989.

[Mar93] M. Marin, rapport CNET NT/LAA/TSS, 1993.

[Mart91] J. Martin et al., "Binaural simulation of concert halls: a new approach for the binaural reverberation process", to be published.

[MEHR 77] S. Mehrgardt, V. Mellert, "Transformation characteristics of the external human ear", J. Acoust. Soc. Am. 61(6): 1567-1576, 1977.

[Mein93] M. Mein, "Perception de l'information binaurale liée aux réflexions précoces dans une salle. Application à la simulation de la qualité acoustique", Mémoire de DEA, Univ. du Maine, Le Mans, Septembre 1993.

[Moor79] J.A. Moorer, "About this reverberation business", Computer Music Journal 3(2): 13-18, 1979.

[Opp75] A.V. Oppenheim, R.W. Schafer, "Digital Signal Processing", Prentice Hall, 1975.

[Pers89] A. Persterer, "A very high performance digital audio processing system", Proc. 13th ICA, Belgrade, 1989.

[Pers91] A. Persterer, "Binaural reproduction of an 'ideal control room' for headphone reproduction", Proc. 90th AES Convention, Paris, preprint 3062, 1991.

[Pon92] F. Poncet, "Simulation de localisation de sources sonores dans l'espace", rapport Télécom Paris, Dpt Signal, 1992.

[Schr62] M.R. Schroeder, "Natural sounding artificial reverberation", J. Audio Eng. Soc. 10(3): 219-223, 1962.

[Smith85] J.O. Smith, "A new approach to digital reverberation using closed waveguide networks", Proc. Int. Computer Music Conference: 47-63, 1985.

[Stau82] J. Stautner, M. Puckette, "Designing multi-channel reverberators", Computer Music Journal 6(1): 52-65, 1982.

[War90] O. Warusfel, "Etude des paramètres liés à la prise de son pour les applications d'acoustique virtuelle", Proc. 1st French Congress on Acoustics, vol. 2: 877-880, 1990.