Abstract
This paper presents results on the automatic characterisation
of musical and acoustic signals in terms of features attributed
to signal segments at various levels. These features describe some
of the musical and acoustical content of the sound and can be
used in applications such as intelligent sound processing,
retrieval from music and sound databases, or music editing and labelling.
Three interdependent levels of segmentation are defined. They
correspond to different levels of signal attributes. The {\it source}
level classifies the nature of the source of sound into speech,
singing voice, instrumental sounds and various noises.
The {\it feature} level deals with characteristics such as silence/sound,
transitory/steady, voiced/unvoiced, harmonic, vibrato and so forth.
The last level is the segmentation into {\it notes} and {\it phones}.
A large set of features is first computed: derivative of fundamental
frequency and energy, voicing coefficient, measure of the inharmonicity
of the partials, spectral centroid, spectral ``flux'', etc.
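As an illustration of two of the features listed above, the following is a minimal sketch of spectral centroid and spectral flux computed on single frames. The function names, the naive DFT, and the Euclidean definition of flux are assumptions made for this example; the paper's exact definitions are not given here.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def spectral_centroid(mags, sample_rate, frame_len):
    """Magnitude-weighted mean frequency in Hz (0.0 for an all-zero frame)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / frame_len
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total

def spectral_flux(prev_mags, mags):
    """Euclidean distance between two successive magnitude spectra
    (one common definition, assumed here)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(prev_mags, mags)))

def sine(freq, sample_rate, n):
    """Test signal: n samples of a unit-amplitude sinusoid."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]

SR, N = 8000, 64                               # bin spacing is 125 Hz
low = magnitude_spectrum(sine(1000, SR, N))    # 1000 Hz falls exactly on bin 8
high = magnitude_spectrum(sine(2000, SR, N))   # 2000 Hz falls exactly on bin 16

centroid_low = spectral_centroid(low, SR, N)   # close to 1000 Hz
flux_same = spectral_flux(low, low)            # identical spectra give zero flux
flux_change = spectral_flux(low, high)         # a spectral change gives large flux
```

In this sketch a steady sound yields a flux near zero, while a change of spectral content between frames yields a large value, which is why flux is a natural cue for transitions.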
Decision functions on the set of features have been built
and provide the segmentation marks. For research purposes,
a graphical interface has been designed to allow visualization
of these features, the results of the decisions, and the final result.
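A decision function of the kind described above can be sketched, in its simplest form, as a threshold on one feature. The example below is an assumption for illustration only (a fixed energy threshold for the silence/sound decision, with hypothetical names); the decision functions actually used may combine several features.

```python
def segmentation_marks(energies, threshold):
    """Return (frame_index, label) at every frame where the
    silence/sound decision changes, starting with frame 0."""
    marks = []
    previous = None
    for i, energy in enumerate(energies):
        label = "sound" if energy >= threshold else "silence"
        if label != previous:
            marks.append((i, label))
            previous = label
    return marks

# Hypothetical per-frame energies and threshold.
frame_energies = [0.0, 0.01, 0.5, 0.8, 0.7, 0.02, 0.0, 0.6]
marks = segmentation_marks(frame_energies, threshold=0.1)
```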
For the {\it source} level, the mean and the variance of the features
are computed on sound segments of one second or more. Several
classification methods are used, trained on data sets
collected by sampling radio broadcasts and movie sound tracks.
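The segment-level statistics mentioned above can be sketched as follows. The frame rate, the one-second segment length, the use of the population variance, and the choice to drop a trailing partial segment are all assumptions made for this example.

```python
from statistics import fmean, pvariance

def segment_statistics(feature_track, frame_rate, segment_seconds=1.0):
    """Mean and population variance of one feature over consecutive,
    non-overlapping segments; a trailing partial segment is dropped."""
    frames_per_segment = max(1, int(round(frame_rate * segment_seconds)))
    stats = []
    last_start = len(feature_track) - frames_per_segment
    for start in range(0, last_start + 1, frames_per_segment):
        segment = feature_track[start:start + frames_per_segment]
        stats.append((fmean(segment), pvariance(segment)))
    return stats

# Hypothetical feature track at 4 frames per second (9 frames in total).
track = [1.0, 3.0, 1.0, 3.0, 2.0, 2.0, 2.0, 2.0, 5.0]
stats = segment_statistics(track, frame_rate=4)
```

The two segments in this example have the same mean but different variances, which is the kind of distinction a classifier trained on these statistics could use.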
Segmentation starts with the {\it source} level. Information obtained
at a given level is propagated towards the other levels. For example,
in the case of instrumental music or singing voice, if vibrato is
detected at the {\it feature} level, its amplitude and frequency
are estimated and taken into account at the {\it notes}
and {\it phones} level. The vibrato is removed from the fundamental
frequency trajectory, and the high frequencies of the signal are not
used in spectral flux computation.
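One simple way to remove vibrato from a fundamental frequency trajectory is to average it over about one vibrato period. The sketch below assumes that technique, and assumes the vibrato rate has already been estimated; it is not necessarily the method used in the system, and all names are hypothetical.

```python
import math

def moving_average(values, window):
    """Centered moving average; the window shrinks symmetrically at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        h = min(half, i, len(values) - 1 - i)
        chunk = values[i - h:i + h + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def remove_vibrato(f0, frame_rate, vibrato_rate_hz):
    """Smooth f0 over roughly one vibrato period.
    Returns (smoothed f0, approximate vibrato extent in Hz)."""
    window = max(1, round(frame_rate / vibrato_rate_hz))
    smooth = moving_average(f0, window)
    residual = [a - b for a, b in zip(f0, smooth)]
    extent = max(abs(r) for r in residual)
    return smooth, extent

# Synthetic note: 440 Hz with a 5 Hz vibrato of +/- 6 Hz, 2 seconds long.
FRAME_RATE = 100  # f0 frames per second (assumed)
f0 = [440.0 + 6.0 * math.sin(2 * math.pi * 5.0 * t / FRAME_RATE)
      for t in range(200)]
smooth, extent = remove_vibrato(f0, FRAME_RATE, vibrato_rate_hz=5.0)
```

Away from the edges, the smoothed trajectory stays close to the underlying note frequency, so a note-level segmentation working on it would not mistake vibrato cycles for pitch changes.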
A complete feature extraction and segmentation system is
demonstrated. Applications and results on various examples such as a
movie sound track are presented.