Abstract |
Query-by-example retrieval of environmental sound recordings is a research area with applications to sound design, music composition, and automatic suggestion of metadata for labeling sound databases. Retrieval problems usually consist of successive feature extraction (FE) and similarity measurement (SM) steps: a set of extracted features that encodes important properties of the sound recordings is used to compute the distance between elements in the database. Previous research has pointed out that features that are successful in the speech and music domains, such as MFCCs, may fail to describe environmental sounds, which are intrinsically variable and noisy. We present a set of novel multiresolution features obtained by modeling the distribution of wavelet subband coefficients with generalized Gaussian densities (GGDs). We define the similarity measure in terms of the Kullback-Leibler divergence between GGDs. Experimental results on a database of 1020 environmental sound recordings show that our approach always outperforms a method based on traditional MFCC features and Euclidean distance, improving retrieval rates from 51% to 62%. |
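
As an illustration of the similarity measurement step, the Kullback-Leibler divergence between two zero-mean GGDs has a known closed form in terms of their scale and shape parameters. The sketch below is a minimal example under stated assumptions, not the implementation evaluated in this work: the parameterization p(x) = beta / (2 alpha Gamma(1/beta)) exp(-(|x|/alpha)^beta), the function names `ggd_kl` and `subband_divergence`, and the choice to sum divergences over subbands (which treats subbands as independent) are all assumptions made for illustration. Estimating alpha and beta from the wavelet coefficients is not shown.

```python
import math


def ggd_kl(alpha1, beta1, alpha2, beta2):
    """Closed-form KL divergence D(p1 || p2) between two zero-mean GGDs.

    Assumed parameterization (not taken from the abstract):
        p(x) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x| / alpha) ** beta)
    with scale alpha > 0 and shape beta > 0.
    """
    # log( beta1 * alpha2 * Gamma(1/beta2) / (beta2 * alpha1 * Gamma(1/beta1)) )
    log_term = (
        math.log(beta1) - math.log(beta2)
        + math.log(alpha2) - math.log(alpha1)
        + math.lgamma(1.0 / beta2) - math.lgamma(1.0 / beta1)
    )
    # (alpha1 / alpha2) ** beta2 * Gamma((beta2 + 1) / beta1) / Gamma(1 / beta1)
    ratio_term = (alpha1 / alpha2) ** beta2 * math.exp(
        math.lgamma((beta2 + 1.0) / beta1) - math.lgamma(1.0 / beta1)
    )
    return log_term + ratio_term - 1.0 / beta1


def subband_divergence(query, candidate):
    """Sum the per-subband KL divergences of two recordings.

    `query` and `candidate` are lists of (alpha, beta) pairs, one pair per
    wavelet subband, in the same subband order. Summing over subbands is an
    illustrative assumption (independent subbands).
    """
    if len(query) != len(candidate):
        raise ValueError("both recordings need the same number of subbands")
    return sum(
        ggd_kl(a1, b1, a2, b2)
        for (a1, b1), (a2, b2) in zip(query, candidate)
    )
```

With this parameterization, beta = 2 corresponds to a Gaussian with alpha equal to sqrt(2) times the standard deviation, and beta = 1 to a Laplacian, so the formula reduces to the familiar Gaussian KL divergence in the first case. The divergence is not symmetric, so ranking database items by `subband_divergence(query, item)` generally differs from ranking by `subband_divergence(item, query)`.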