Audio demos: Hybrid generalized spectral subtraction + perceptual wavelet denoising + quantile-based noise tracking
1) Dual EKF Noise Reduction
Speech corrupted by several different classes of noise is enhanced by the Dual EKF algorithm using a 10-4-1 MLP neural network. The speech is sampled at 8 kHz (16 bits per sample) and processed in 64 ms Hamming-windowed frames with 87.5% overlap.
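The framing parameters above can be made concrete with a little arithmetic: at 8 kHz, a 64 ms frame holds 512 samples, and 87.5% overlap gives a hop of 64 samples. The following is a minimal sketch of that framing step only, not the Dual EKF itself. The function names `hamming` and `frame_signal` are hypothetical, and the symmetric Hamming window definition is an assumption, since the page does not say which variant was used.

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n (assumed variant)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(x, fs=8000, frame_ms=64, overlap=0.875):
    """Split x into overlapping Hamming-windowed frames.

    Returns (frames, frame_len, hop). Trailing samples that do not
    fill a whole frame are dropped in this sketch.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))    # 512 samples at 8 kHz
    hop = int(round(frame_len * (1.0 - overlap)))     # 64 samples at 87.5% overlap
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + i] * win[i] for i in range(frame_len)])
    return frames, frame_len, hop

# One second of a constant signal, just to show the frame count.
frames, frame_len, hop = frame_signal([1.0] * 8000)
print(frame_len, hop, len(frames))
```

With these settings each sample is covered by up to eight frames, which is why the heavy overlap matters when the enhanced frames are recombined.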
For the first set of experiments, the clean speech signal was available, so SNRs could be calculated. Known noise variances and non-recurrent (truncated) derivatives were used.
- white stationary noise
- pink stationary noise
- white bursting noise
- cellular phone noise
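Because a clean reference exists for these four samples, an SNR can be computed before and after enhancement. The page does not state which SNR definition was used, so the sketch below assumes a plain global SNR over the whole signal (not a segmental one); `snr_db` is a hypothetical helper name.

```python
import math

def snr_db(clean, estimate):
    """Global SNR in dB of `estimate` measured against the clean reference.

    Everything in `estimate` that differs from `clean` is treated as noise.
    """
    signal_power = sum(s * s for s in clean)
    noise_power = sum((e - s) ** 2 for s, e in zip(clean, estimate))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)

# Toy example: a unit-amplitude signal with a constant error of 0.1.
clean = [1.0, -1.0, 1.0, -1.0]
noisy = [1.1, -0.9, 1.1, -0.9]
print(round(snr_db(clean, noisy), 3))
```

The SNR improvement of an enhancement method would then be `snr_db(clean, enhanced) - snr_db(clean, noisy)`.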
For the second set of experiments, real-world data were used. The Dual EKF was used to clean up noisy recordings of speech for which no clean reference was available. The noise variances were estimated online and full recurrent derivatives were used. The network architecture was the same as in the first set of experiments. We subjectively compared the quality of the Dual EKF enhanced speech with that obtained from several industry-standard noise suppression / speech enhancement algorithms, such as spectral subtraction (SS).
- car phone recording (male)
transcription : "driving on the border turnpike, about 55-60 miles an hour, with the left window open. I'm holding telephone in my..."
- car phone recording (female)
transcription : "this message is recorded while driving on highway... uh... sixty-five..."
- PASSS seminar recording
transcription : "now, a semiconductor on the other hand: if he had asked me silicon..."
- OV-10A cockpit recording
2) Monaural Blind Signal Separation
A proof-of-concept for separating two speech signals when only a single recording of the combined signals is available. The models (10-4-1 MLP neural networks) were trained on clean speech. The speech is sampled at 8 kHz (16 bits per sample) and processed in 64 ms Hamming-windowed frames with 87.5% overlap.
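To illustrate the size of a 10-4-1 network, here is a minimal forward-pass sketch. The page gives only the layer sizes, so several things here are assumptions: that the 10 inputs are the 10 most recent speech samples, that the 4 hidden units use tanh, and that the single linear output predicts the next sample. The weights are random rather than trained, and `init_mlp`, `predict_next`, and `count_params` are hypothetical names.

```python
import math
import random

def init_mlp(n_in=10, n_hidden=4, seed=0):
    """Random (untrained) weights for an n_in - n_hidden - 1 network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2

def predict_next(params, past):
    """One-step prediction from the last n_in samples (assumed usage)."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * x for w, x in zip(row, past)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

def count_params(params):
    """Total number of weights and biases."""
    w1, b1, w2, _ = params
    return sum(len(row) for row in w1) + len(b1) + len(w2) + 1

params = init_mlp()
print(count_params(params))               # 10*4 + 4 + 4 + 1 weights and biases
print(predict_next(params, [0.0] * 10))   # zero input with zero biases
```

A model this small has only 49 free parameters, which keeps per-frame estimation of the weights tractable.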
3) Noise Regularized Adaptive Filtering (NRAF)
Speech corrupted by several different classes of noise is enhanced by the NRAF algorithm. The speech is sampled at 8 kHz (16 bits per sample) and processed in 600-point Hamming-windowed frames with 75% overlap. Within each frame, the speech was processed using a 19-5-1 MLP neural network operating on a 25-point filter window with a KLT embedding dimension of 19.
- white stationary noise
- colored factory noise
- colored F16-cockpit noise
4) Dual EKF and NRAF front-ends for speech recognition
The Dual EKF and NRAF algorithms were used to clean up speech before it was passed to a speech recognition system that had been trained on clean speech. The data set consisted of zip-code and address fragments spoken and recorded over a telephone channel. These speech samples were corrupted by additive white noise (SNR = 6 dB). The CSLU Toolkit's digit recognizer was used as the recognition engine. The NRAF and Dual EKF front-ends were compared to a spectral subtraction (SS) front-end as well as the speech enhancement system used in the industry-standard IS-718 VBR-CDMA system.
|| Correct Words (%) || Correct Sentences (%)
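The corruption step used for this experiment, additive white noise at SNR = 6 dB, can be sketched as follows. This is an illustration under assumptions rather than the procedure actually used: the noise is taken to be Gaussian, the SNR is taken to be a global power ratio over the whole utterance, and `add_white_noise` is a hypothetical name.

```python
import math
import random

def add_white_noise(speech, snr_db=6.0, seed=0):
    """Add white Gaussian noise scaled so the global SNR equals snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    # Noise power needed for the requested SNR: P_s / 10^(SNR/10).
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]

# Stand-in for speech: one second of a 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2.0 * math.pi * 200.0 * t / 8000.0) for t in range(8000)]
noisy = add_white_noise(tone, snr_db=6.0)

# Check the achieved SNR against the clean stand-in signal.
added = [y - s for y, s in zip(noisy, tone)]
achieved = 10.0 * math.log10(sum(s * s for s in tone) / sum(n * n for n in added))
print(round(achieved, 3))
```

Scaling the noise from its measured power, rather than from its nominal variance, makes the achieved SNR match the target for each utterance rather than only on average.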