Abstract:
Existing speaker identification methods are sensitive to noise and environmental sounds. To alleviate this limitation, a novel robust text-independent speaker identification approach using a single training sample is proposed. In this method, the main frequency components of an acoustic signal are determined in the time-frequency domain, and their local distributions and variations in that domain are then extracted and treated as acoustic local features. These local features are robust to white noise and pink noise, invariant to the intensity of the acoustic signal, and reflective of a person's inherent phonation characteristics. A Bayesian decision classifier for these acoustic local features is introduced. Experimental results on English and Chinese speech databases demonstrate that the proposed approach can perform speaker identification from a single training sample and achieves a higher correct classification percentage than conventional acoustic features such as linear predictive coding cepstral (LPCC) coefficients and mel-frequency cepstral coefficients (MFCC). The results also show that the proposed approach is highly tolerant of white noise, pink noise, and environmental sounds.
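
As a rough illustration of why features built from the positions of the main frequency components can be invariant to signal intensity, the following minimal sketch picks the strongest bins of a short-time Fourier transform for each frame. It is not the feature extraction proposed here: the function name `dominant_frequency_bins`, the frame length, the hop size, and the Hann window are all assumptions made for the example.

```python
import numpy as np

def dominant_frequency_bins(signal, frame_len=256, hop=128, top_k=1):
    """Return, for each frame, the indices of the top_k strongest FFT bins.

    Hypothetical helper for illustration only; the parameters are assumed,
    not taken from the abstract.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    peaks = np.empty((n_frames, top_k), dtype=int)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        magnitude = np.abs(np.fft.rfft(frame))
        # argsort is ascending: keep the last top_k, strongest first.
        peaks[i] = np.argsort(magnitude)[-top_k:][::-1]
    return peaks

# A 1 kHz tone sampled at 8 kHz; bin spacing is 8000 / 256 = 31.25 Hz,
# so the tone falls on bin 32.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

peaks = dominant_frequency_bins(tone)
louder_peaks = dominant_frequency_bins(4.0 * tone)  # same tone, 4x amplitude
```

Here `peaks` and `louder_peaks` contain the same bin indices, because scaling the waveform scales every magnitude equally and leaves the location of the peaks unchanged. Robustness to additive noise is a separate property that this sketch does not demonstrate.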