Kernel PCA and Pre-Image Iterations for Speech Enhancement

Publication from Digital

Christina Leitner

1/2013


In this thesis, we present novel methods to enhance speech corrupted by noise. All methods are based on the processing of complex-valued spectral data. First, kernel principal component analysis (PCA) for speech enhancement is proposed. Subsequently, a simplification of kernel PCA, called pre-image iterations (PI), is derived. This method computes enhanced feature vectors iteratively as linear combinations of noisy feature vectors. The weights of the linear combination are determined by a kernel function that measures the similarity between feature vectors. The kernel variance is a key parameter that controls the degree of de-noising and has to be set according to the signal-to-noise ratio (SNR).

PI were initially proposed for speech corrupted by additive white Gaussian noise. To remove the need for knowledge of the SNR and to generalize to other stationary noise types, PI are extended by automatic determination of the kernel variance for white and colored noise. This allows the kernel variance to be set without prior knowledge of the SNR; for colored noise, the setting is frequency-dependent. PI are executed on feature vectors extracted from the spectral representation. Analysis of PI shows that the convergence behavior of the feature vectors reveals information about the signal content. We use this information to segment the spectral representation into speech and non-speech regions and to derive a mask that suppresses musical noise in enhanced speech as a post-processing step.

We evaluate the proposed methods by listening, by visual inspection of the spectrograms, by objective quality measures, and by the word accuracy of an automatic speech recognizer. Listening to the utterances and inspecting the spectrograms confirm that noise is suppressed. No musical noise occurs; however, some residual noise remains around speech components. In terms of objective quality measures on speech corrupted by white noise, the proposed methods achieve results similar to those of the generalized subspace method, spectral subtraction, and the minimum mean-square error log-spectral amplitude estimator. PI with automatic determination of the kernel variance achieve better results than the initial PI method. For colored noise, PI perform better than the generalized subspace method but worse than the other two reference methods. In terms of automatic speech recognition, PI with automatic determination of the kernel variance outperform the generalized subspace method and perform similarly to spectral subtraction, while the minimum mean-square error log-spectral amplitude estimator achieves higher recognition results.
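To make the core idea of PI concrete (enhanced feature vectors computed iteratively as kernel-weighted linear combinations of the noisy feature vectors, with the kernel variance controlling the amount of de-noising), here is a minimal sketch in Python/NumPy. It is a mean-shift-like reading of that description, not the thesis's exact update: the Gaussian kernel form k(z, x) = exp(-||z - x||^2 / c), the uniform expansion coefficients, the fixed iteration count, and the function name `pre_image_iterations` are all assumptions made for illustration. Rows of `X` stand in for complex-valued feature vectors extracted from the spectral representation; how those vectors are extracted is not reproduced here.

```python
import numpy as np


def pre_image_iterations(X, c, n_iter=10):
    """Hypothetical sketch of pre-image-style iterations with a Gaussian kernel.

    X      : (N, D) complex array, one noisy feature vector per row
    c      : kernel variance; larger values average over more vectors
    n_iter : number of fixed-point iterations (a fixed count is assumed here)

    Each estimate z_j is repeatedly replaced by a linear combination of the
    *noisy* vectors x_i, weighted by k(z_j, x_i) = exp(-||z_j - x_i||^2 / c).
    Uniform expansion coefficients are an assumption of this sketch.
    """
    X = np.asarray(X, dtype=complex)
    Z = X.copy()  # initialize each estimate with its own noisy vector
    for _ in range(n_iter):
        # Squared Euclidean distances ||z_j - x_i||^2 for complex vectors, shape (N, N)
        d2 = np.sum(np.abs(Z[:, None, :] - X[None, :, :]) ** 2, axis=2)
        # Subtracting each row's minimum scales that row's weights by a constant,
        # which cancels in the normalization below but avoids underflow for small c.
        W = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / c)
        # Normalized kernel-weighted linear combination of the noisy vectors
        Z = (W @ X) / W.sum(axis=1, keepdims=True)
    return Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = 8
    # Toy "clean" data: two repeated complex patterns (50 copies each)
    clean = np.vstack([np.tile(np.ones(D), (50, 1)),
                       np.tile(-np.ones(D), (50, 1))]).astype(complex)
    sigma = 0.3
    # Circular complex Gaussian noise with per-element variance sigma^2
    noise = sigma * (rng.standard_normal(clean.shape)
                     + 1j * rng.standard_normal(clean.shape)) / np.sqrt(2)
    X = clean + noise
    Z = pre_image_iterations(X, c=2.0)
    print("MSE noisy   :", np.mean(np.abs(X - clean) ** 2))
    print("MSE enhanced:", np.mean(np.abs(Z - clean) ** 2))
```

The role of the kernel variance shows up directly in this sketch: a small `c` concentrates the weights on the nearest noisy vectors and changes little, while a large `c` averages over many vectors, giving stronger de-noising at the cost of detail (in the limit, every estimate collapses to the global mean). This is why the abstract ties the choice of kernel variance to the SNR.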