Alexander Kain (kaina@ohsu.edu)

Center for Spoken Language Understanding (CSLU)
Institute on Development & Disability (IDD)
School of Medicine (SOM)
Oregon Health & Science University (OHSU)

ORCID 0000-0001-5807-9311



Research Support




CS 506/606 - Research Programming

Credits: 1
This course will cover important software for quantitative research. The first unit will focus on the UNIX programming environment with a special emphasis on version control. The second unit will cover the Python programming language, focusing on libraries for efficient numeric computation.

CS 506/606 - Speech Signal Processing

Credits: 3
Speech systems are increasingly commonplace in today's computer systems; examples include speech recognition and text-to-speech synthesis systems. This course will introduce the fundamentals of the underlying speech signal processing that enables such systems. Topics include speech production and perception by humans, frequency transforms, filters, linear predictive features, pitch estimation, speech coding, speech enhancement, and prosodic speech modification.
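As an illustration of one of the topics above, pitch estimation can be sketched as a minimal autocorrelation-based F0 estimator. This is an illustrative sketch only, not course material; the function name and parameter choices are assumptions.

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=50.0, fmax=400.0):
    """Estimate F0 (Hz) of a voiced frame via the autocorrelation method."""
    frame = frame - np.mean(frame)
    # one-sided autocorrelation: ac[k] = sum_n frame[n] * frame[n + k]
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)  # shortest plausible pitch period, in samples
    lag_max = int(fs / fmin)  # longest plausible pitch period, in samples
    peak = lag_min + np.argmax(ac[lag_min:lag_max])
    return fs / peak

# usage: a synthetic 200 Hz sawtooth sampled at 16 kHz
fs = 16000
t = np.arange(0, 0.04, 1 / fs)
frame = (t * 200.0) % 1.0 - 0.5
print(estimate_pitch(frame, fs))  # → 200.0
```

A real estimator would add voicing detection and sub-sample peak interpolation; the search window simply restricts candidate periods to a plausible F0 range.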

CS 553/653 - Speech Synthesis

Credits: 3
This course will introduce students to the problem of synthesizing speech from text input. Speech synthesis is a challenging area that draws on expertise from a diverse set of scientific fields, including signal processing, linguistics, psychology, statistics, and artificial intelligence. Fundamental advances in each of these areas will be needed to achieve truly human-like synthesis quality; such advances will also benefit other realms of speech technology, such as speech recognition, speech coding, and speech enhancement. In this course, we will consider current approaches to sub-problems such as text analysis, pronunciation, linguistic analysis of prosody, and generation of the speech waveform. Lectures, demonstrations, and readings of relevant literature in the area will be supplemented by hands-on student lab exercises.

CS 506/606 - Computational Approaches to Speech and Language Disorders

Credits: 3
This course covers a range of speech and language analysis algorithms developed for the measurement of speech- or language-based markers of neurological disorders, for the creation of assistive devices, and for remedial applications. Topics include an introduction to speech and language disorders, robust speech signal processing, statistical approaches to pitch and timing modeling, voice transformation algorithms, speech segmentation, and modeling of disfluency. The class will use a wide array of clinical data and will be closely tied to several ongoing research projects.

Peer-reviewed Publications


Speech Intelligibility

Speech intelligibility is the degree to which listeners can understand a speech signal's message. Historically, the specific acoustic sources of intelligibility have been poorly understood, and automatic approaches to modifying the degree of intelligibility were limited. We invented a hybridization approach that allows for precisely measuring the degree to which one or more acoustic features contribute to speech intelligibility. We applied this approach to find the acoustic features most responsible for the intelligibility improvement in clearly spoken typical and dysarthric speech, which allows a principled study of different remedial strategies. We also created algorithms that automatically improve the intelligibility of dysarthric or conversational speech signals, using approaches from speech analysis, machine learning, and speech synthesis. These algorithms may be instrumental for next-generation hearing and speaking aids.


Coarticulation

Coarticulation refers to the phenomenon in which a conceptually isolated speech sound becomes more similar to a preceding or following speech sound. Modeling of coarticulation in speech has largely been limited to short sequences and/or limited phonetic contexts. We introduced a methodology for modeling both formant frequencies and bandwidths in continuous speech. Applications of such a model include improved formant tracking, characterization of conversational versus clear speech, and detection of typical versus disordered speech.
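As background for the formant modeling described above: a standard way to obtain formant frequencies and bandwidths is from the pole angles and radii of a linear-prediction (LPC) polynomial. The following is a minimal sketch of that classical analysis, not our published methodology; all names, the synthetic signal, and the parameter choices are illustrative.

```python
import numpy as np

def lpc(x, order):
    """LPC coefficients via the autocorrelation method (normal equations)."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[ac[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, ac[1:order + 1])
    return np.concatenate(([1.0], -a))  # prediction-error filter A(z)

def formants(x, fs, order):
    """Formant frequencies and bandwidths (Hz) from LPC pole angles and radii."""
    poles = np.roots(lpc(x, order))
    poles = poles[np.imag(poles) > 0]  # keep one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bws = -np.log(np.abs(poles)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]

# usage: recover a single synthetic resonance at 500 Hz with an 80 Hz bandwidth
fs = 8000
r, theta = np.exp(-np.pi * 80 / fs), 2 * np.pi * 500 / fs
x = np.zeros(400)
x[0] = 1.0  # impulse excitation of a two-pole resonator
for n in range(1, len(x)):
    x[n] = 2 * r * np.cos(theta) * x[n - 1] - r * r * (x[n - 2] if n >= 2 else 0.0)
f, b = formants(x, fs, order=2)
```

For real speech one would analyze pre-emphasized, windowed frames with a higher LPC order (roughly two coefficients per kHz of bandwidth) and then track formant candidates across frames.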

Text-to-Speech Synthesis (TTS)

Text-to-Speech (TTS) synthesis is the process of generating human speech artificially from textual input. Although TTS systems are increasingly commonplace, many challenges remain in producing natural-sounding, meaningful output. We have created algorithms that significantly reduce audible artifacts in the synthesis output, improve the naturalness of the intonation contour, and allow remarkable data compression of acoustic inventories.

Voice Conversion

Voice conversion modifies a source speaker's utterance to sound as if a target speaker had spoken it. Its uses include entertainment and security applications and, most importantly, the adaptation of text-to-speech systems' voices to new speakers, which especially benefits individuals who depend on speech-generating devices for communication. Despite continuous research in the field, speech quality and mimicry accuracy are still insufficient for everyday use. We have advanced the state of the art by researching novel approaches, including joint-density Gaussian mixture models and semi-supervised learning with deep autoencoders and deep neural networks.
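The joint-density Gaussian mixture model approach mentioned above can be sketched as follows: a GMM is fit to stacked source-target feature vectors, and each source frame is then mapped to the conditional expectation of the target features given the source features. The sketch below (using scikit-learn and SciPy) omits feature extraction and time alignment, and all names and parameters are illustrative, not our published implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_jdgmm(X, Y, n_components):
    """Fit a GMM on stacked [source; target] feature vectors (assumed time-aligned)."""
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=0).fit(np.hstack([X, Y]))

def convert(gmm, X, dx):
    """Map source frames X to E[y | x] under the joint-density GMM."""
    mu_x, mu_y = gmm.means_[:, :dx], gmm.means_[:, dx:]
    S_xx = gmm.covariances_[:, :dx, :dx]
    S_yx = gmm.covariances_[:, dx:, :dx]
    # component responsibilities p(m | x) from the marginal source model
    logp = np.stack([multivariate_normal.logpdf(X, mu_x[m], S_xx[m])
                     for m in range(gmm.n_components)], axis=1)
    logp += np.log(gmm.weights_)
    logp -= logp.max(axis=1, keepdims=True)
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)
    # responsibility-weighted mixture of per-component conditional means E[y | x, m]
    Y = np.zeros((len(X), mu_y.shape[1]))
    for m in range(gmm.n_components):
        A = S_yx[m] @ np.linalg.inv(S_xx[m])
        Y += resp[:, [m]] * (mu_y[m] + (X - mu_x[m]) @ A.T)
    return Y

# usage: recover a known linear source-to-target mapping y = 2x + 1 from noisy data
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 1))
Y = 2.0 * X + 1.0 + 0.05 * rng.standard_normal((500, 1))
gmm = train_jdgmm(X, Y, n_components=2)
Xt = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
Yp = convert(gmm, Xt, dx=1)
```

In a real system X and Y would be aligned spectral feature sequences (e.g., from parallel recordings), and the conversion is typically followed by smoothing and waveform resynthesis.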




Technical Reports

Audio Demos