Alexander Kain
Center for Spoken Language Understanding (CSLU)
Division of Biomedical Computer Science
Department of Science & Engineering
School of Medicine
Oregon Health & Science University (OHSU)
20000 NW Walker Road
Beaverton, Oregon 97006
Email: kaina at ohsu edu
Phone: (503) 748-1539
Fax: (503) 748-1306
Professional Positions
- Assistant Professor, 2007-present
Oregon Health & Science University, Portland, OR
- Chief Scientist, 2005-present
BioSpeech, Inc., Portland, OR
- Lead Speech Synthesis Technologist and Consultant, 2001-present
Sensory, Inc., Santa Clara, CA
- Visiting Researcher, 1999
AT&T Research Labs, Florham Park, NJ
Education
- Postdoctoral Training, 2002-2005
OGI School of Science & Engineering, Portland, OR
- Ph.D. in Computer Science and Engineering, 2001
Oregon Graduate Institute, Portland, OR
- B.A. in Computer Science and B.A. in Mathematics, 1995
Rockford College, Rockford, IL
Research Interests
- Quantitative assessment and transformation of clear and conversational speech, with the aim of advancing hearing-aid performance (without extra noise: conversational, clear prosody and conversational spectrum, conversational prosody and clear spectrum, clear; with multi-talker background noise: conversational, clear prosody and conversational spectrum, conversational prosody and clear spectrum, clear)
- Transformation of aphonic speech to improve intelligibility and acceptability (aphonic speech, transformation)
- Transformation of dysarthric speech to improve intelligibility and perceived voice quality (dysarthric speech, transformation)
- Increasing spectral control in concatenative synthesizers to eliminate concatenation errors (baseline, formant + spectral-band + time-domain crossfading)
- Representing acoustic inventories of Text-to-Speech systems with an asynchronous interpolation model, allowing high rates of compression, elimination of concatenation errors, and speaker transformation (compression: original, compression with AIM coder @ 3.4kbps, compression with speex coder @ 3.4kbps for comparison; speaker transformation: transformation-1, transformation-2, transformation-3, transformation-4, transformation-5)
- Improving the accuracy and quality of speaker transformation systems and designing speaker recognizability perceptual tests (transformation of natural speech: source, transformation, target; transformation of TTS synthesis voices: source, transformation, target)
- Multi-purpose speech modification algorithms (original, resynthesis, slow to 300%, speed-up to 50%, lower pitch to 50%, raise pitch to 200%, scale formants to 80%, scale formants to 120%, mimic child, mimic man)
- Singing synthesis ("The Search is Over")
Research Support
Current
- National Institutes of Health, "Expressive crossmodal affect integration in autism": The study aims to be the first to perform a comprehensive analysis of crossmodal integration of affect expression in ASD.
- National Institutes of Health, "Quantitative Modeling of Segmental Timing in Dysarthria": The project seeks to apply a quantitative modeling framework to segment durations in sentences produced by speakers with a variety of neurological diagnoses and dysarthrias.
- National Science Foundation, "HCC: Automatic detection of atypical patterns in cross-modal affect": The long-term goal is to build interactive, agent-based systems for (1) remediation of poor affect communication and (2) diagnosis of the underlying neurological disorders based on analysis of affective signals.
- Nancy Lurie Marks Family Foundation, "'In Your Own Voice': Personal AAC Voices for Minimally Verbal Children with Autism Spectrum Disorder": The project adapts a text-to-speech voice to sound like a child's voice.
- National Science Foundation, "HCC: High-quality Compression, Enhancement, and Personalization of Text-to-Speech Voices": Developing Text-to-Speech technologies that focus on elimination of concatenation errors, and accurate speech modifications in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics, using an asynchronous interpolation model.
Completed
- National Institutes of Health, "Voice Transformation for Dysarthria – Phase 1": Developed software that transforms speech compromised by dysarthria into easier-to-understand and more natural-sounding speech. The software resides on a wearable computer, with headset microphone input and powered speaker or line output.
- National Institutes of Health, "User Adaptation of AAC Device Voices – Phase 1": Developed and evaluated voice transformation and prosody modification technologies to customize synthetic voices in AAC devices, mimicking the individual user's pre-morbid speech.
- National Science Foundation, "STTR Phase 1: Small Footprint Speech Synthesis": Created and evaluated speech compression technologies for concatenative text-to-speech synthesizers.
- National Science Foundation, "Making Dysarthric Speech Intelligible": Developed new algorithms that enable dysarthric individuals to be more easily understood by the general population.
Courses
EE 530 / EE 630 - Speech Synthesis
Credits: 3
Description: This course will introduce students to the problem of synthesizing speech from text input. Speech synthesis is a challenging area that draws on expertise from a diverse set of scientific fields, including signal processing, linguistics, psychology, statistics, and artificial intelligence. Fundamental advances in each of these areas will be needed to achieve truly human-like synthesis quality and advances in other realms of speech technology (such as speech recognition, speech coding, and speech enhancement). In this course, we will consider current approaches to sub-problems such as text analysis, pronunciation, linguistic analysis of prosody, and generation of the speech waveform. Lectures, demonstrations, and readings of relevant literature in the area will be supplemented by student lab exercises using hands-on tools.
CS 506 / CS 606 - Computational Approaches to Speech and Language Disorders
Credits: 3
Description: This course covers a range of speech and language analysis algorithms that have been developed for measurement of speech or language based markers of neurological disorders, for the creation of assistive devices, and for remedial applications. Topics will include introduction to speech and language disorders, robust speech signal processing, statistical approaches to pitch and timing modeling, voice transformation algorithms, speech segmentation, and modeling of disfluency. The class will use a wide array of clinical data, and will be closely tied to several ongoing research projects.
Peer-reviewed Publications
Intelligibility
- A. Kain, J. van Santen. "Using Speech Transformation to Increase Speech Intelligibility for the Hearing- and Speaking-impaired". Proceedings of ICASSP, April 2009.
- A. Kain, A. Amano-Kusumoto, and J.-P. Hosom. "Hybridizing Conversational and Clear Speech to Determine the Degree of Contribution of Acoustic Features to Intelligibility". Journal of the Acoustical Society of America, Volume 124, Issue 4, October 2008, Pages 2308-2319.
- A. Kusumoto, A. Kain, P. Hosom, and J. van Santen. "Hybridizing Conversational and Clear Speech". Proceedings of Interspeech, August 2007.
- A. Kain, J. Hosom, X. Niu, J. van Santen, M. Fried-Oken, J. Staehely. "Improving the Intelligibility of Dysarthric Speech". Speech Communication, Volume 49, Issue 9, September 2007, Pages 743-759.
- X. Niu, A. Kain, J. van Santen. "A Noninvasive, Low-cost Device to Study the Velopharyngeal Port During Speech and Some Preliminary Results". Proceedings of Interspeech, September 2006.
- X. Niu, A. Kain, J. van Santen. "Estimation of the Acoustic Properties of the Nasal Tract during the Production of Nasalized Vowels". Proceedings of EUROSPEECH, September 2005.
- A. Kain, X. Niu, J. Hosom, Q. Miao, J. van Santen. "Formant Re-synthesis of Dysarthric Speech". Proceedings of 5th ISCA Workshop on Speech Synthesis, June 2004.
- J. Hosom, A. Kain, T. Mishra, J. van Santen, M. Fried-Oken, J. Staehely. "Intelligibility of modifications to dysarthric speech". Proceedings of ICASSP, May 2003.
Text-to-Speech Synthesis
- R. Moldover, A. Kain. "Compression of Line Spectral Frequency Parameters with Asynchronous Interpolation". Proceedings of ICASSP, April 2009.
- A. Kain, Q. Miao, J. van Santen. "Spectral Control in Concatenative Speech Synthesis". Proceedings of 6th ISCA Workshop on Speech Synthesis, August 2007.
- A. Kain and J. van Santen. "Unit-Selection Text-to-Speech Synthesis Using an Asynchronous Interpolation Model". Proceedings of 6th ISCA Workshop on Speech Synthesis, August 2007.
- E. Klabbers, J. van Santen, A. Kain. "The contribution of various sources of spectral mismatch to audible discontinuities in a diphone database". IEEE Transactions on Audio, Speech, and Language Processing Journal, Volume 15, Issue 3, Pages 949-956, March 2007.
- J. van Santen, A. Kain, E. Klabbers, and T. Mishra. "Synthesis of Prosody using Multi-level Unit Sequences". Speech Communication Journal, Volume 46, Issues 3-4, Pages 365-375, July 2005.
- J. van Santen, A. Kain, and E. Klabbers. "Synthesis by Recombination of Segmental and Prosodic Information". Speech Prosody 2004, March 2004.
- A. Kain and J. van Santen. "A speech model of acoustic inventories based on asynchronous interpolation". Proceedings of EUROSPEECH, Pages 329-332, August 2003.
- J. van Santen, L. Black, G. Cohen, A. Kain, E. Klabbers, T. Mishra, J. de Villiers, X. Niu. "Applications of computer generated expressive speech for communication disorders". Proceedings of EUROSPEECH, Pages 1657-1660, August 2003.
- A. Kain and J. van Santen. "Compression of Acoustic Inventories using Asynchronous Interpolation". Proceedings of IEEE Workshop on Speech Synthesis, Pages 83-86, September 2002.
- J. van Santen, J. Wouters, and A. Kain. "Modification of Speech: A Tribute to Mike Macon". Proceedings of IEEE Workshop on Speech Synthesis, September 2002.
- A. Kain and Y. Stylianou. "Stochastic modeling of spectral adjustment for high quality pitch modification". Proceedings of ICASSP, June 2000, vol. 2, pp. 949-952.
Speaker Transformation
- H. Duxans, A. Bonafonte, A. Kain, J. van Santen. "Including Dynamic and Phonetic Information in Voice Conversion Systems". Proceedings of ICSLP, October 2004.
- A. Kain. "High Resolution Voice Transformation". Ph.D. thesis, OGI School of Science & Engineering at Oregon Health & Science University, 2001. The data used in this thesis are available from the Linguistic Data Consortium as the VOICES Corpus.
- A. Kain and M. Macon. "Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction". Proceedings of ICASSP, May 2001.
- A. Kain and M. Macon. "Personalizing a speech synthesizer by voice adaptation". Third ESCA/COCOSDA International Speech Synthesis Workshop, November 1998, pp. 225-230.
- A. Kain and M. Macon. "Text-to-speech voice adaptation from sparse training data". Proceedings of ICSLP, November 1998, vol. 7, pp. 2847-50.
- A. Kain and M. Macon. "Spectral Voice Conversion for Text-to-Speech Synthesis". Proceedings of ICASSP, May 1998, vol. 1, pp. 285-288.
Miscellaneous
- J. House, A. Kain, and J. Hines. "ESP - Metaphor for learning: an evolutionary algorithm". Proceedings of GECCO 2000, Las Vegas, NV.
- S. Sutton, R. Cole, J. de Villiers, J. Schalkwyk, P. Vermeulen, M. Macon, Y. Yan, E. Kaiser, B. Rundle, K. Shobaki, P. Hosom, A. Kain, J. Wouters, D. Massaro, M. Cohen. "Universal speech tools: the CSLU Toolkit". Proceedings of ICSLP, November 1998, vol. 7, pp. 3221-24.
- N. Malayath, H. Hermansky, A. Kain and R. Carlson. "Speaker-Independent Feature Extraction by Oriented Principal Component Analysis". Proceedings of EUROSPEECH 1997.
Patents
- J. van Santen and A. Kain, OHSU. System and Method for Compressing Concatenative Acoustic Inventories for Speech Synthesis.
- A. Kain and Y. Stylianou, AT&T Research Laboratories. Stochastic Modeling Of Spectral Adjustment For High Quality Pitch Modification.