Automatic Speech Recognition & Hidden Markov Models
Course No: 520.678, Summer 2008
Zak Shafran, Biomedical Computer Science, Oregon Health & Science University
Image: Intimity by Jan Koblasa, from Statistical Methods for Speech Recognition, Frederick Jelinek, MIT Press
There will be no final exam. Instead, the course requires a final project of interest to the student, chosen in consultation with the instructor. The project requires a written report and a final presentation. In most cases, the data, software toolkit, and key components for the project will be made available. Students will also have an opportunity to present papers related to the topics covered in the syllabus and to their project.
| Computer Speech Processing | Overview of the course; a brief history and progress in three decades of research. |
| Speech Sounds | Physiology of speech production; a model for speech production; types of sounds. |
| Feature Extraction | Feature extraction; linear predictive coding; cepstral coefficients. |
| Statistical Framework for ASR | Classification of static vectors; popular classifiers in machine learning; Bayes decision rule. |
| Sequence Matching | Template matching; dynamic programming; dynamic time warping. |
| Hidden Markov Models | From DTW to HMM; EM algorithm for HMM; Viterbi algorithm. |
| Decision Tree Clustering | Pronunciation modeling; context-dependent models; clustering distributions; decision tree based state clustering. |
| Training Acoustic Models | Increasing the complexity in steps; context-independent to context-dependent acoustic models; Gaussian mixture splitting; multistage search. |
| Language Models | Types of language models, estimation procedures, techniques for adaptation. |
| Finite State Machines | Basic operations; representing components of ASR as FSMs; simplifying and improving the efficiency of search using determinization and minimization. Slides from an ICSLP tutorial by Mohri and Riley. |
| Speaker Adaptation & Normalization | Bridging the gap between SI and SD systems; types of adaptation: speaker adaptive training and speaker adaptation; MLLR; constrained MLLR; regression class; MAP; vocal tract length normalization. |
| Confidence, Consensus, etc. | Blame assignment, confidence estimation, consensus decoding, re-ranking, Bayes minimum risk. |
| Discriminative Training | Need for discriminative training; maximum mutual information estimation (conditional maximum likelihood); extended Baum-Welch algorithm. |
| Linear Transforms | Overview of different linear transforms applied to the observation models, including HLDA, HDA, FAC, FACILT, SPAM, EMLLT, etc. |
| Sequence Classification | Utterance classification (language or speaker recognition); Fisher kernel; rational kernels; affect recognition. |
| Lectures | Tue/Thu 11:00 am -- 12:30 pm |
| Venue | Central 123 on the west campus, videoconferenced to BICC 131B on the Marquam Hill campus |
| Office hours | Central 123, Thu 12:30 pm -- 1:30 pm, or request an appointment by email |