Automatic Speech Recognition & Hidden Markov Models
Course No: 520.678, Summer 2008


Zak Shafran
Biomedical Computer Science
Oregon Health & Science University


Image: "Intimity" by Jan Koblasa, from Statistical Methods for Speech Recognition, Frederick Jelinek, MIT Press

Assignments

Assignment 1 (due July 21st)
Assignment 2 / Semi-Final Exam (due Aug 11th)

Course Information

This course aims to provide theoretical foundations and practical experience in computer speech processing and recognition. Many of the techniques and algorithms covered in the course apply to a variety of areas concerned with recognizing sequences. On completing the course, students should understand the basic principles of pattern recognition and be familiar with automatic speech recognition (ASR) system design and the trade-offs it involves. The course should also enable students to read and discuss technical papers in ASR, speech processing, and pattern recognition.

There will be no final exam. Instead, the course requires a final project of interest to the student, chosen in consultation with the instructor. The project requires a written report and a final presentation. In most cases, the data, software toolkit, and key components for the project will be made available. Students will also have an opportunity to present papers related to the topics on the syllabus and to their project.


Lectures

Computer Speech Processing: Overview of the course; a brief history and progress in three decades of research.
Speech Sounds: Physiology of speech production; a model for speech production; types of sounds.
Feature Extraction: Feature extraction; linear predictive coding; cepstral coefficients.
Statistical Framework for ASR: Classification of static vectors; popular classifiers in machine learning; Bayes decision rule.
Sequence Matching: Template matching; dynamic programming; dynamic time warping.
Hidden Markov Models: From DTW to HMM; EM algorithm for HMM; Viterbi algorithm.
Decision Tree Clustering: Pronunciation modeling; context-dependent models; clustering distributions; decision-tree-based state clustering.
Training Acoustic Models: Increasing the complexity in steps; context-independent to context-dependent acoustic models; Gaussian mixture splitting; multistage search.
Language Models: Types of language models; estimation procedures; techniques for adaptation.
Finite State Machines: Basic operations; representing components of ASR as FSMs; simplifying and improving the efficiency of search using determinization and minimization. Slides from an ICSLP tutorial by Mohri and Riley.
Speaker Adaptation & Normalization: Bridging the gap between SI and SD systems; types of adaptation (speaker adaptive training and speaker adaptation); MLLR; constrained MLLR; regression class; MAP; vocal tract length normalization.
Confidence, Consensus, etc.: Blame assignment; confidence estimation; consensus decoding; re-ranking; Bayes minimum risk.
Discriminative Training: Need for discriminative training; maximum mutual information estimation (conditional maximum likelihood); extended Baum-Welch algorithm.
Linear Transforms: Overview of different linear transforms applied to the observation models, including HLDA, HDA, FAC, FACILT, SPAM, EMLLT, etc.
Sequence Classification: Utterance classification (language or speaker recognition); Fisher kernel; rational kernels; affect recognition.

Textbooks

Note: For recently developed techniques, we will rely on selected papers, which will be provided as required readings.

Schedule

Lectures: Tue/Thu, 11:00 am to 12:30 pm
Venue: Central 123 on the west campus, video-conferenced to BICC 131B on the Marquam Hill campus
Office hours: Central 123, Thu 12:30 pm to 1:30 pm, or by appointment (request by email)

Links

Related Online Lectures, Journals & Conference Proceedings
Relevant Software Tools & Resources

This page is maintained by Zak Shafran. Last updated on Jan 31, 2005.