Oregon Health & Science University | Center for Spoken Language Understanding

Alexander Kain

Center for Spoken Language Understanding (CSLU)

Institute on Development & Disability (IDD)

School of Medicine (SOM)

Oregon Health & Science University (OHSU)

3181 S.W. Sam Jackson Park Road
Portland, Oregon 97239-3098
Email: kaina at ohsu edu
Phone: (503) 349-3750
Fax: (503) 346-3754

Positions

  • Oregon Health & Science University, Portland, OR
    Associate Professor, 2014-present
    Assistant Professor, 2007-2014
    Senior Research Associate, 2005-2007
  • BioSpeech, Inc., Portland, OR
    Chief Scientist, 2005-present
  • Sensory, Inc., Santa Clara, CA
    Lead Speech Synthesis Technologist, 2001-2008
  • AT&T Research Labs, Florham Park, NJ
    Visiting Researcher, 1999
  • Reviewer / Guest Editor for: Journal of the Acoustical Society of America (JASA); Computer Speech and Language; Journal of Speech, Language, and Hearing Research (JSLHR); IEEE Journals; scientific conferences such as Interspeech; National Science Foundation (NSF) proposals.

Education

Research Support

Current

Completed

  • 2011/04/01-2012/03/31: National Institutes of Health 5R42DC008712, "User Adaptation of AAC Device Voices - Phase 2", PI: Klabbers (BioSpeech). Developing and evaluating voice transformation and prosody modification technologies to customize synthetic voices in AAC devices, mimicking the individual user's pre-morbid speech.
  • 2011/03/01-2013/03/31: National Institutes of Health 1R43DC011706-01, "SBIR Phase I: Computerized System for Phonemic Awareness Intervention", PI: Connors (BioSpeech). This grant aims to develop and evaluate a play-and-drag-and-drop audio-visual interface for analyzing and sequencing phonemes in words, to help children build the foundational phonemic and phonological awareness skills necessary for literacy.
  • 2009/09/01-2013/08/31: National Science Foundation IIS-0915754, "RI: Small: Modeling Coarticulation for Automatic Speech Recognition", PI: Kain (OHSU). Performing automatic speech recognition (ASR) using the Asynchronous Interpolation Model (AIM) framework. By decomposing the input speech signal into basis vectors and weights, we search for phonemic basis vectors and weights that yield the highest-probability match to the input signal.
  • 2009/07/15-2012/06/30: National Science Foundation IIS-0905095, "HCC: Automatic detection of atypical patterns in cross-modal affect", PI: van Santen (OHSU). The long-term goal is to build interactive, agent-based systems for (1) remediation of poor affect communication and (2) diagnosis of the underlying neurological disorders based on analysis of affective signals.
  • 2009/07/17-2012/06/30: National Institutes of Health 5R21DC010035, "Quantitative Modeling of Segmental Timing in Dysarthria", PI: van Santen (OHSU). The project seeks to apply a quantitative modeling framework to segment durations in sentences produced by speakers with a variety of neurological diagnoses and dysarthrias.
  • 2008-2009: Nancy Lurie Marks Family Foundation, "In Your Own Voice: Personal AAC Voices for Minimally Verbal Children with Autism Spectrum Disorder", PI: van Santen (OHSU). Adapted a text-to-speech voice to sound like a child's voice.
  • 2007/09/01-2011/08/31: National Science Foundation IIS-0713617, "HCC: High-quality Compression, Enhancement, and Personalization of Text-to-Speech Voices", PI: Kain (OHSU). Developed text-to-speech technologies focused on eliminating concatenation errors and on accurate speech modification in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics, using an asynchronous interpolation model.
  • 2007/01/01-2008/06/30: National Institutes of Health 1R41DC008712, "User Adaptation of AAC Device Voices - Phase 1", PI: van Santen (BioSpeech). Developed and evaluated voice transformation and prosody modification technologies to customize synthetic voices in AAC devices, mimicking the individual user's pre-morbid speech.
  • 2006/09/01-2008/03/31: National Institutes of Health 1R41DC007240, "Voice Transformation for Dysarthria - Phase 1", PI: van Santen (BioSpeech). Developed software that transforms speech compromised by dysarthria into easier-to-understand and more natural-sounding speech. The software resides on a wearable computer, with headset microphone input and powered speaker or line output.
  • 2005/01/10-2010/12/31: National Institutes of Health 5R01DC007129, "Expressive crossmodal affect integration in Autism", PI: van Santen (OHSU). This study performed a comprehensive analysis of crossmodal integration of affect expression in ASD.
  • 2005/01/01-2006/06/30: National Science Foundation IIP-0441125, "STTR Phase 1: Small Footprint Speech Synthesis", PI: Kain (BioSpeech). Created and evaluated speech compression technologies for concatenative text-to-speech synthesizers.
  • 2001/10/01-2005/09/30: National Science Foundation IIS-0117911, "Making Dysarthric Speech Intelligible", PI: van Santen (OHSU). Developed new algorithms that enable dysarthric individuals to be more easily understood by the general population.

Courses

CS 506/606 - Special Topics: Speech Signal Processing

Credits: 3
Description: Speech systems are becoming more and more commonplace in today's computer systems. Examples are speech recognition systems and Text-to-Speech synthesis systems. This course will introduce the fundamentals of the underlying speech signal processing that enables such systems. Topics include speech production and perception by humans, frequency transforms, filters, linear predictive features, pitch estimation, speech coding, speech enhancement, and prosodic speech modification.

CS 553/653 - Speech Synthesis

Credits: 3
Description: This course will introduce students to the problem of synthesizing speech from text input. Speech synthesis is a challenging area that draws on expertise from a diverse set of scientific fields, including signal processing, linguistics, psychology, statistics, and artificial intelligence. Fundamental advances in each of these areas will be needed to achieve truly human-like synthesis quality and advances in other realms of speech technology (like speech recognition, speech coding, speech enhancement). In this course, we will consider current approaches to sub-problems such as text analysis, pronunciation, linguistic analysis of prosody, and generation of the speech waveform. Lectures, demonstrations, and readings of relevant literature in the area will be supplemented by student lab exercises using hands-on tools.

CS 506/606 - Special Topics: Computational Approaches to Speech and Language Disorders

Credits: 3
Description: This course covers a range of speech and language analysis algorithms that have been developed for measurement of speech- or language-based markers of neurological disorders, for the creation of assistive devices, and for remedial applications. Topics will include an introduction to speech and language disorders, robust speech signal processing, statistical approaches to pitch and timing modeling, voice transformation algorithms, speech segmentation, and modeling of disfluency. The class will use a wide array of clinical data, and will be closely tied to several ongoing research projects.

Peer-reviewed Publications

Intelligibility

Text-to-Speech Synthesis (TTS)

Voice Conversion

Miscellaneous

Abstracts

  • A. Kain, "Speech transformation: Increasing intelligibility and changing speakers", Journal of the Acoustical Society of America, 126(4):2205 (2009).
  • J.P. Hosom, A. Kain, and B. Bush, "Towards the recovery of targets from coarticulated speech for automatic speech recognition", Journal of the Acoustical Society of America, 130(4):2407 (2011).

Patents

  • J. van Santen and A. Kain, OHSU. System and Method for Compressing Concatenative Acoustic Inventories for Speech Synthesis.
  • A. Kain and Y. Stylianou, AT&T Research Laboratories. Stochastic Modeling Of Spectral Adjustment For High Quality Pitch Modification.

Technical Reports

Audio Demos