Alexander Kain

Computer Science & Electrical Engineering (CSEE)
Center for Spoken Language Understanding (CSLU)
Institute on Development & Disability (IDD)
School of Medicine (SOM)
Oregon Health & Science University (OHSU)

ORCID 0000-0001-5807-9311



Research Support




CS 627 - Data Science Programming

Credits: 3
This course is a best-of compilation of concepts, practices, and R- and Python-based software libraries (all free, open-source, and unrestricted) that allow for relatively rapid, straightforward, and easy-to-maintain implementation of new ideas and scientific questions. Students will gain awareness of, and initial working knowledge in, some of the most fundamental computational tools for performing a wide variety of academic research. The course focuses on breadth rather than depth: for each concept we will discuss motivation, key ideas, and concrete usage scenarios, but omit mathematical background and proofs, which can be acquired in more specialized classes. In this class we will: use R for data exploration and visualization; write programs in Python; perform numeric tasks using numpy and scipy; analyze data using pandas; discuss audio and image processing using scipy.signal and scikit-image; apply machine learning algorithms using scikit-learn; visualize data using matplotlib and pyqtgraph; use Qt to build graphical user interfaces; learn how to version-control files with git; address performance issues via compilation, profiling, and parallelization tools; and much more.
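To give a flavor of the workflow the course targets, here is a minimal, self-contained sketch (using made-up data) of the kind of pandas-based exploration covered:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: reaction times (ms) under two conditions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "condition": ["A"] * 50 + ["B"] * 50,
    "rt_ms": np.concatenate([rng.normal(400, 30, 50),
                             rng.normal(450, 30, 50)]),
})

# Group-wise summary statistics: a typical first exploration step.
summary = df.groupby("condition")["rt_ms"].agg(["mean", "std", "count"])
print(summary)
```

The same pattern (load, group, aggregate, plot) scales from toy data like this to real experimental datasets.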

EE 658 - Speech Signal Processing

Credits: 3
Speech systems are becoming commonplace in today's computer systems and Augmentative and Alternative Communication (AAC) devices; examples include speech recognition and text-to-speech synthesis systems. This course will introduce the fundamentals of the underlying speech signal processing that enables such systems. Topics include speech production and perception by humans, frequency transforms, filters, linear predictive features, pitch estimation, speech coding, speech enhancement, and prosodic speech modification.
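As an illustration of one listed topic, pitch estimation, the following is a bare-bones autocorrelation-based sketch (a classroom simplification, not a robust pitch tracker):

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate fundamental frequency of a frame via autocorrelation."""
    frame = frame - np.mean(frame)
    # One-sided autocorrelation of the frame.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Search only lags corresponding to plausible pitch periods.
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

# Synthetic 50 ms voiced-like tone at 120 Hz, sampled at 16 kHz.
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
print(estimate_f0(x, fs))
```

Real systems add windowing, normalization, voicing decisions, and octave-error handling on top of this core idea.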

CS 653 - Speech Synthesis

Credits: 3
This course will introduce students to the problem of synthesizing speech from text input. Speech synthesis is a challenging area that draws on expertise from a diverse set of scientific fields, including signal processing, linguistics, psychology, statistics, and artificial intelligence. Fundamental advances in each of these areas will be needed to achieve truly human-like synthesis quality, and such advances will also benefit other realms of speech technology (such as speech recognition, speech coding, and speech enhancement). In this course, we will consider current approaches to sub-problems such as text analysis, pronunciation, linguistic analysis of prosody, and generation of the speech waveform. Lectures, demonstrations, and readings of relevant literature in the area will be supplemented by hands-on student lab exercises.

CS 606 - Computational Approaches to Speech and Language Disorders

Credits: 3
This course covers a range of speech and language analysis algorithms that have been developed for the measurement of speech- or language-based markers of neurological disorders, for the creation of assistive devices, and for remedial applications. Topics will include an introduction to speech and language disorders, robust speech signal processing, statistical approaches to pitch and timing modeling, voice transformation algorithms, speech segmentation, and modeling of disfluency. The class will use a wide array of clinical data and will be closely tied to several ongoing research projects.

Peer-reviewed Publications


Speech Intelligibility

Speech intelligibility is the degree to which listeners can understand a speech signal's message. Historically, the specific acoustic sources of intelligibility have been poorly understood, and automatic approaches to modifying the degree of intelligibility have been limited. We invented a hybridization approach that allows for precisely measuring the degree to which one or more acoustic features contribute to speech intelligibility. We applied this approach to find the acoustic features most responsible for the intelligibility improvement in clearly spoken typical and dysarthric speech, enabling a principled study of different remedial strategies. We also created algorithms that automatically improve the intelligibility of dysarthric or conversational speech signals, using approaches from speech analysis, machine learning, and speech synthesis. These algorithms may be instrumental for next-generation hearing and speaking aids.

Text-to-Speech Synthesis (TTS)

Text-to-Speech (TTS) synthesis is the process of generating human speech artificially from textual input. Although TTS systems are increasingly commonplace, many challenges remain in producing natural-sounding, meaningful output. We have created algorithms that significantly reduce audible artifacts in the synthesis output, improve the naturalness of the intonation contour, and allow remarkable data compression of acoustic inventories.

Voice Conversion

Voice conversion modifies a source speaker's utterance to sound as if a target speaker had spoken it. Its uses include entertainment and security applications and, most importantly, the adaptation of text-to-speech voices to new speakers, especially benefitting individuals who depend on speech-generating devices for communication. Despite continuous research in the field, speech quality and mimicry accuracy are still insufficient for everyday use. We have advanced the state of the art by researching novel approaches, including joint-density Gaussian mixture models and semi-supervised learning with deep autoencoders and deep neural networks.


Coarticulation

Coarticulation refers to the phenomenon in which a conceptually isolated speech sound becomes more similar to a preceding or following speech sound. Modeling coarticulation in speech has largely been limited to short sequences and/or limited phonetic contexts. We introduced a methodology for modeling both formant frequencies and bandwidths in continuous speech. Applications of such a model include improved formant tracking, characterization of conversational versus clear speech, and detection of typical versus disordered speech.
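For context, formant candidates of the kind such a model operates on are commonly obtained from the roots of a linear-prediction (LPC) polynomial; the following is a standard textbook sketch, not the methodology described above:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(frame, order):
    """LPC coefficients via the autocorrelation method (Toeplitz solve)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -a))

def formant_candidates(frame, fs, order=10):
    """Formant candidates: angles of the upper-half-plane LPC roots."""
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0.01]  # keep one of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

# Synthetic 30 ms frame with resonances near 700 and 1200 Hz.
fs = 8000
t = np.arange(240) / fs
rng = np.random.default_rng(0)
x = (np.exp(-60 * t) * np.sin(2 * np.pi * 700 * t)
     + np.exp(-80 * t) * np.sin(2 * np.pi * 1200 * t)
     + 1e-4 * rng.standard_normal(t.size))
print(formant_candidates(x, fs))
```

Selecting and tracking true formants among these candidates across continuous speech is the hard part that motivates statistical models of coarticulation.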

Sleep Apnea




Technical Reports

Audio Demos