
Model Structure

A fairly general characterization of noisy speech is given by the following model structure:
$\displaystyle y_k = h(x_k,n_k),$ (1) 

where $x_k$ is the clean speech signal, $h(\cdot)$ models the communications channel, and $n_k$ is a noise process. The degraded speech signal is represented by $y_k$. For generality, we allow the communications channel to have a nonlinear effect on the speech. However, with a few exceptions (e.g., nonlinear distortion due to switching on telephone networks, and noise amplification due to automatic gain control), the channel can usually be replaced by a linear convolution with impulse response $h_k$. If the noise is additive, this yields:
$\displaystyle y_k = h_k * x_k + n_k,$ (2) 

where $n_k$ now includes channel effects. In most applications, $n_k$ is statistically independent of the speech signal $x_k$.
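
As an illustration of the linear, additive-noise model in Equation (2), the following is a minimal sketch that generates a degraded signal from a clean one. The function name `degrade`, the sinusoidal stand-in for speech, the three-tap impulse response, and the white Gaussian noise are all assumptions chosen for the example; they are not taken from the text.

```python
import numpy as np


def degrade(x, h, noise_std, rng):
    """Sketch of y_k = h_k * x_k + n_k (Equation 2).

    x         : clean signal (1-D array)
    h         : channel impulse response (1-D array)
    noise_std : standard deviation of the additive noise (assumed white Gaussian)
    rng       : numpy random Generator, drawn independently of x
    Returns the degraded signal y and the noise realization n.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    # Linear convolution with the channel, truncated to the input length.
    channel_out = np.convolve(x, h)[: len(x)]
    # Additive noise, generated independently of the clean signal.
    n = rng.normal(0.0, noise_std, size=len(x))
    return channel_out + n, n


rng = np.random.default_rng(0)
k = np.arange(800)
x = np.sin(2 * np.pi * 0.01 * k)   # stand-in for clean speech (assumption)
h = np.array([1.0, 0.5, 0.25])     # hypothetical channel impulse response
y, n = degrade(x, h, noise_std=0.1, rng=rng)
```

Setting `h = [1.0]` reduces the sketch to the pure additive-noise case $y_k = x_k + n_k$, which is the setting addressed by methods that remove noise without compensating for the channel.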

Note that some of the methods covered in this chapter do not compensate for channel distortion, but are designed purely to remove the noise signal $n_k$. This is appropriate for improving perceived quality when the channel can be assumed to have been designed properly for clean speech; our ears are insensitive to small phase distortions or global spectral shifts. On the other hand, compensating for the channel may be critical to improving robustness in automatic speech recognition (ASR) systems. For example, simply changing the recording microphone can drastically affect recognition accuracy.

Note that the model excludes multiple-microphone systems ($y_k$ is a scalar), which employ beam-forming [1,2] and noise cancellation [3] techniques. Beam-formers adapt the gains of a microphone array to place nulls at the noise source, while noise-cancellers assume the availability of a separate reference to the noise signal. Although these methods can be extremely effective, they impose restrictive requirements on the availability and placement of multiple microphones. This chapter considers only single-microphone methods.

