Audio Demonstrations


For a complete description of the algorithms, see the research section.



Audio demos: Hybrid generalized spectral subtraction + perceptual wavelet denoising + quantile-based noise tracking


1) Dual EKF Noise Reduction

Speech corrupted by several different classes of noise is enhanced by the Dual EKF algorithm using a 10-4-1 MLP neural network. The speech is sampled at 8 kHz (16 bits per sample) and processed in 64 ms Hamming-windowed frames with 87.5% overlap.
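The framing parameters above imply 512-sample frames that advance by 64 samples. The following is a minimal sketch of that framing step only, not of the Dual EKF itself; the function names and the choice of a symmetric Hamming window are assumptions made for illustration.

```python
import math

SAMPLE_RATE_HZ = 8000   # from the text: 8 kHz sampling
FRAME_MS = 64           # from the text: 64 ms frames
OVERLAP = 0.875         # from the text: 87.5% overlap

FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 512 samples per frame
HOP = round(FRAME_LEN * (1 - OVERLAP))          # 64 samples between frame starts


def hamming(n):
    """Symmetric Hamming window of length n (assumed window variant)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return Hamming-windowed frames; trailing samples that do not fill a frame are dropped."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


# Usage: one second of a 440 Hz test tone (placeholder for real speech)
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE_HZ) for t in range(SAMPLE_RATE_HZ)]
frames = split_into_frames(tone)
print(FRAME_LEN, HOP, len(frames))
```

With 87.5% overlap, each sample is covered by up to eight frames, so the per-frame processing cost is roughly eight times that of non-overlapping framing.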

For the first set of experiments, the clean speech signal was available for calculating SNRs. Known noise variances and non-recurrent (truncated) derivatives were used.
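When a clean reference is available, an SNR can be computed by treating the difference between the reference and the noisy or enhanced signal as the noise. The sketch below uses this common whole-signal definition; the page does not state which SNR variant (for example, segmental SNR) was actually reported, so treat the function as an illustrative assumption.

```python
import math


def snr_db(clean, estimate):
    """SNR in dB of `estimate` measured against the clean reference `clean`."""
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_power = sum(c * c for c in clean)
    error_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    if error_power == 0:
        return math.inf       # estimate matches the reference exactly
    if signal_power == 0:
        return -math.inf      # silent reference, non-zero error
    return 10 * math.log10(signal_power / error_power)


# Usage: a synthetic reference and a copy with a small alternating error
clean = [math.sin(0.1 * n) for n in range(1000)]
noisy = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]
print(round(snr_db(clean, noisy), 1))
```

The improvement due to enhancement is then the difference between the SNR of the enhanced signal and the SNR of the noisy input, both measured against the same clean reference.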



For the second set of experiments, real-world data were used. The Dual EKF was used to clean up noisy recordings of speech signals where no clean reference was available. The noise variances were estimated online and full recurrent derivatives were used. The network architecture was the same as in the first set of experiments. We subjectively compared the quality of the Dual EKF enhanced speech with that obtained from several industry-standard noise suppression / speech enhancement algorithms, such as spectral subtraction (SS).




2) Monaural Blind Signal Separation

A proof-of-concept for separating two speech signals when only a single recording of the combined signals is available. The models (10-4-1 MLP neural networks) were trained on clean speech. The speech is sampled at 8 kHz (16 bits per sample) and processed in 64 ms Hamming-windowed frames with 87.5% overlap.
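The "10-4-1" notation is read here as 10 inputs, 4 hidden units, and 1 output. The sketch below shows a forward pass for such a network used as a next-sample predictor. Several details are assumptions not stated on this page: the tanh hidden layer, the linear output, the use of the 10 most recent samples as input, and the random placeholder weights (a real model would be trained on clean speech).

```python
import math
import random

N_INPUTS, N_HIDDEN = 10, 4   # "10-4-1": 10 inputs, 4 hidden units, 1 output


def init_mlp(seed=0):
    """Random placeholder weights; hypothetical stand-in for a trained model."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
                     for _ in range(N_HIDDEN)],
        "b_hidden": [0.0] * N_HIDDEN,
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b_out": 0.0,
    }


def predict_next(params, past_samples):
    """Predict one sample from the 10 previous ones (assumed tanh hidden, linear output)."""
    if len(past_samples) != N_INPUTS:
        raise ValueError("expected exactly 10 past samples")
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, past_samples)) + b)
        for row, b in zip(params["w_hidden"], params["b_hidden"])
    ]
    return sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]


def count_parameters(params):
    """Total number of weights and biases in the network."""
    return (sum(len(row) for row in params["w_hidden"]) + len(params["b_hidden"])
            + len(params["w_out"]) + 1)


# Usage: predict from 10 samples of a test tone
params = init_mlp()
history = [math.sin(0.2 * n) for n in range(N_INPUTS)]
print(count_parameters(params), predict_next(params, history))
```

Under these assumptions the network has only 49 free parameters, which is small enough to re-estimate within each short frame.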




3) Noise Regularized Adaptive Filtering (NRAF)

Speech corrupted by several different classes of noise is enhanced by the NRAF algorithm. The speech is sampled at 8 kHz (16 bits per sample) and processed in 600-point Hamming-windowed frames with 75% overlap. Within each frame, the speech was processed using a 19-5-1 MLP neural network operating on a 25-point filter window with a KLT embedding dimension of 19.
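One plausible reading of the 25-point window with a KLT embedding dimension of 19 is that each 25-sample window is projected onto the 19 leading Karhunen-Loeve (principal component) directions, giving the 19 inputs of the 19-5-1 network. The sketch below illustrates that reading only. The eigenvectors are approximated by orthogonal iteration with a fixed number of steps, a technique chosen here to avoid external libraries, and all function names are hypothetical.

```python
import math
import random

WINDOW = 25      # from the text: 25-point filter window
EMBED_DIM = 19   # from the text: KLT embedding dimension


def sliding_windows(frame, width=WINDOW):
    """All length-`width` windows of the frame, advancing one sample at a time."""
    return [frame[i:i + width] for i in range(len(frame) - width + 1)]


def covariance(vectors):
    """Sample mean and covariance matrix of a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    return mean, cov


def top_eigenvectors(cov, k, iterations=50, seed=0):
    """Approximate k leading eigenvectors by orthogonal iteration.

    Assumes a full-rank covariance matrix; the result is orthonormal but only
    approximately converged after a fixed number of iterations.
    """
    rng = random.Random(seed)
    d = len(cov)
    basis = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
    for _ in range(iterations):
        # Multiply every basis vector by the covariance matrix
        basis = [[sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
                 for v in basis]
        # Re-orthonormalise in order (modified Gram-Schmidt)
        for a in range(k):
            for b in range(a):
                dot = sum(x * y for x, y in zip(basis[a], basis[b]))
                basis[a] = [x - dot * y for x, y in zip(basis[a], basis[b])]
            norm = math.sqrt(sum(x * x for x in basis[a]))
            basis[a] = [x / norm for x in basis[a]]
    return basis


def klt_embed(vector, mean, basis):
    """Project a mean-removed window onto the KLT basis."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [sum(b * c for b, c in zip(row, centered)) for row in basis]


# Usage: a synthetic 600-point frame (two tones plus noise) stands in for speech
rng = random.Random(1)
frame = [math.sin(0.3 * n) + 0.5 * math.sin(0.11 * n) + 0.1 * rng.gauss(0, 1)
         for n in range(600)]
windows = sliding_windows(frame)
mean, cov = covariance(windows)
basis = top_eigenvectors(cov, EMBED_DIM)
embedded = [klt_embed(w, mean, basis) for w in windows]
print(len(windows), len(embedded[0]))
```

Dropping the 6 weakest directions discards the components of each window that carry the least variance within the frame, which reduces the number of network inputs from 25 to 19.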




4) Dual EKF and NRAF front-ends for speech recognition

The Dual EKF and NRAF algorithms were used to clean up speech before it was passed to a speech recognition system that was trained on clean speech. The data set consisted of zip-code and address fragments spoken and recorded over a telephone channel. These speech samples were corrupted by additive white noise (SNR = 6 dB). The CSLU Toolkit's digit recognizer served as the recognition engine. The NRAF and Dual EKF front-ends were compared to a spectral subtraction (SS) front-end as well as the speech enhancement system used in the industry-standard IS-718 VBR-CDMA system.

Front-end    Correct Words (%)    Correct Sentences (%)
clean        96.4                 85.8
noisy        59.2                 21.4
SS           77.5                 38.1
IS-718       67.3                 29.2
NRAF         81.6                 44.3
Dual EKF     82.2                 52.9
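The corruption step described in this section, additive white noise at SNR = 6 dB, can be sketched as follows. This is an illustration under stated assumptions: the noise is taken to be Gaussian (the page says only "white"), the SNR is defined over the whole utterance, and the function names are hypothetical.

```python
import math
import random


def mean_power(x):
    """Average squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(clean, snr_db, seed=0):
    """Return clean + white Gaussian noise, scaled so the realised SNR equals snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    target_noise_power = mean_power(clean) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [c + scale * n for c, n in zip(clean, noise)]


# Usage: corrupt one second of a 200 Hz test tone sampled at 8 kHz
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noisy = add_white_noise(clean, snr_db=6.0)

# Check the realised SNR from the added noise
noise = [y - c for y, c in zip(noisy, clean)]
measured = 10 * math.log10(mean_power(clean) / mean_power(noise))
print(round(measured, 3))
```

Scaling by the realised noise power, rather than by its expected value, makes the SNR of each corrupted utterance match the target rather than only matching it on average.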