Abstract
This paper proposes a method for improving the performance of non-negative tensor factorization (NTF), one of the most widely used source separation techniques. The improvement comes mainly from weighting the bin-wise NTF cost function, which distinguishes the NTF components assigned to the target from the remaining components so that the target is approximated more precisely than the rest. Assuming that sources are sparsely distributed in a 2-D sound field, the components that approximate a target source are selected exclusively by supplying a spatial cue to the NTF framework; the cue is either specified by a user or derived from accompanying images. The spatial cue is expressed in a format similar to the well-known binaural feature, inter-channel level difference (IID). This makes the cue easy to incorporate into the system, because features in the same format can be computed for every spectrogram bin. The weighting functions are designed according to the distance between the spatial cue and the computed features: the largest weights are assigned to the spectrogram bins whose features are most similar to the spatial cue, and the weight decreases in proportion to that distance. The method is evaluated in terms of separation quality by comparing the proposed algorithm with the conventional NTF technique, PARAFAC-NTF, and with other source separation techniques. The results, measured by signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and signal-to-artifact ratio (SAR), demonstrate the effectiveness of the new method. The gains come primarily from the weighting function and the IID-based initialization, and the method also reduces computational cost, a significant problem with NTF.
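The bin-wise weighting described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the dB form of the level-difference feature, the linear decay with `max_dist_db` and `floor`, and the squared-error cost are all assumptions chosen for illustration.

```python
import numpy as np


def iid_feature(left_mag, right_mag, eps=1e-12):
    """Per-bin level difference in dB between two channel magnitude
    spectrograms (assumed form of the IID-like feature)."""
    return 20.0 * np.log10((left_mag + eps) / (right_mag + eps))


def bin_weights(feature, cue_db, max_dist_db=20.0, floor=0.1):
    """Weight is 1 where the feature equals the spatial cue and decreases
    linearly with the distance, never dropping below `floor`.
    `max_dist_db` and `floor` are hypothetical tuning parameters."""
    dist = np.abs(feature - cue_db)
    return np.maximum(1.0 - (1.0 - floor) * dist / max_dist_db, floor)


def weighted_cost(observed, model, weights):
    """Weighted bin-wise squared error (an assumed cost; the paper's
    divergence may differ)."""
    return float(np.sum(weights * (observed - model) ** 2))
```

Under these assumptions, bins whose level difference matches the cue dominate the cost, so the factorization is pushed to fit them more closely than bins attributed to other sources.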