A simple approach to using the noisy data stems from an early linear speech enhancement technique. The idea is that if the noise is additive, an adaptive linear predictor will be unable to predict the noise at time lags beyond the correlation length of the noise [42,3]. For white noise, a linear one-step-ahead predictor trained on the noisy data can therefore only learn to predict the signal, not the noise.
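The principle can be sketched with an LMS-adapted one-step-ahead linear predictor. The toy signal, noise level, filter order, and step size below are illustrative assumptions, not values from the text:

```python
import numpy as np

# Sketch: one-step-ahead linear prediction with LMS on noisy "speech".
rng = np.random.default_rng(0)
n = 4000
t = np.arange(n)
clean = np.sin(2 * np.pi * 0.03 * t)          # toy stand-in for a voiced segment
noisy = clean + 0.5 * rng.standard_normal(n)  # additive white noise

order, mu = 16, 0.002                          # filter order and LMS step size
w = np.zeros(order)
pred = np.zeros(n)
for k in range(order, n):
    x = noisy[k - order:k][::-1]   # tapped delay line of past noisy samples
    pred[k] = w @ x                # one-step-ahead prediction = enhanced sample
    e = noisy[k] - pred[k]         # prediction error drives the LMS update
    w += mu * e * x

# The predictor can track the correlated signal but not the white noise,
# so pred is an enhanced (lower-noise) estimate of the clean signal.
```

Because the white noise is unpredictable from past samples, the converged predictor's output contains mostly the correlated signal component.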
Extending these ideas to neural predictors is straightforward: the network is adapted on-line, with a tapped delay vector of noisy speech as input and the same signal, time-advanced, as the training target. This is illustrated in Figure 14.8. Unfortunately, neural predictors used in this fashion are problematic because, over a finite data segment, a neural network will begin to model the noise process as well. The advantage gained by potentially improving the prediction of the speech is offset by the disadvantage of also predicting the noise.
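The on-line scheme of Figure 14.8 can be sketched as a single-hidden-layer network trained sample by sample; the network size, learning rate, and toy signal here are assumptions for illustration:

```python
import numpy as np

# Sketch of the on-line neural predictor: input is a tapped delay vector of
# noisy speech, the target is the same signal one step ahead.
rng = np.random.default_rng(1)
n, order, hidden, lr = 6000, 12, 8, 0.01

t = np.arange(n)
clean = np.sin(2 * np.pi * 0.03 * t)          # toy stand-in for speech
noisy = clean + 0.5 * rng.standard_normal(n)

W1 = 0.1 * rng.standard_normal((hidden, order))
b1 = np.zeros(hidden)
W2 = 0.1 * rng.standard_normal(hidden)
b2 = 0.0

pred = np.zeros(n)
for k in range(order, n):
    x = noisy[k - order:k][::-1]        # tapped delay line
    h = np.tanh(W1 @ x + b1)
    pred[k] = W2 @ h + b2               # one-step-ahead prediction
    e = noisy[k] - pred[k]              # error against the time-advanced target
    # On-line gradient step (back-propagation through the hidden layer)
    gh = e * W2 * (1 - h ** 2)
    W2 += lr * e * h
    b2 += lr * e
    W1 += lr * np.outer(gh, x)
    b1 += lr * gh
```

Trained long enough on a stationary segment, such a network can begin to fit structure in the noise as well, which is exactly the problem described above.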
One possible way to reduce this problem is to limit the flexibility of the neural predictor (by choosing a smaller architecture) until it can no longer predict the noise but can still predict the speech; minimum description length ideas have been used to select such an architecture. In general, on-line predictive approaches to speech enhancement have received less interest than more recent techniques: unvoiced speech is problematic, and performance is also degraded for real-world noise sources, whose correlation length may exceed that of the speech.
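A minimum description length criterion trades residual error against parameter count. As a minimal sketch (using linear predictors of varying order for simplicity; the same score could rank hidden-layer sizes of a neural predictor), with a toy signal and the standard two-part MDL form as assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
t = np.arange(n)
noisy = np.sin(2 * np.pi * 0.03 * t) + 0.5 * rng.standard_normal(n)

def mdl_score(order):
    # Least-squares one-step predictor of the given order, scored by an
    # MDL-style criterion: data cost plus a parameter-count penalty.
    X = np.column_stack(
        [noisy[order - j - 1 : n - j - 1] for j in range(order)]
    )
    y = noisy[order:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_var = np.mean((y - X @ w) ** 2)
    N = len(y)
    return 0.5 * N * np.log(resid_var) + 0.5 * order * np.log(N)

# Pick the order that minimizes the description length: large enough to
# predict the correlated signal, small enough not to fit the noise.
best = min(range(1, 21), key=mdl_score)
```

The penalty term grows with model size, so the selected architecture stops growing once extra parameters only help to fit the noise.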