John-Paul Hosom


Contact Information

Assistant Professor
Center for Spoken Language Understanding (CSLU)
Division of Biomedical Computer Science (BMCS)
Department of Science and Engineering (DSE)
School of Medicine (SOM)
Oregon Health & Science University (OHSU)

20000 N.W. Walker Road
Beaverton, OR 97006 USA
Telephone: (503) 748-1456
Fax: (503) 748-1306
E-mail: hosom at cslu dot ogi dot edu
URL: http://www.cslu.ogi.edu/people/hosom


Research Interests

Automatic speech recognition, speech intelligibility, time alignment of phonemes, acoustic analysis of speech, assistive technology, machine learning.


Education
1994 - 2000 Ph.D. in Computer Science & Engineering, Oregon Graduate Institute of Science & Technology (OGI), Center for Spoken Language Understanding (CSLU), Beaverton, OR, 2000. Thesis: Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information
1983 - 1987 B.S. in Computer and Information Science, University of Massachusetts at Amherst, Amherst, MA, 1987


Professional Experience
2001 - Present Assistant Professor
Oregon Health & Science University (OHSU), OGI School of Science & Engineering, Center for Spoken Language Understanding, Beaverton, OR.
Research topics have included improving the intelligibility of dysarthric speech, a novel assistive device for persons with dysarthria, development of diagnostic markers for childhood apraxia of speech, new models of automatic speech recognition, automation of tests for Mild Cognitive Impairment (a precursor to Alzheimer's Disease), reliable estimation of fundamental frequency, stop burst detection, automatic phoneme alignment, and measuring the contribution of specific acoustic features to sentence-level speech intelligibility.
2000 - 2001 Post-Doctoral Research Associate
Oregon Graduate Institute's Center for Spoken Language Understanding, Beaverton, OR.
Research on measuring how well a phoneme has been pronounced (for language training), developing children's speech recognizers and an audio-visual corpus of children's speech, and creating a Brazilian Portuguese version of a software toolkit for developing spoken-language systems.
1994 - 2000 Research Assistant
Oregon Graduate Institute's Center for Spoken Language Understanding, Beaverton, OR.
Research on automatic time-alignment of phonemes in the speech signal and on improving accuracy of automatic speech recognition. Developed tools for speech display and annotation, and for training of hybrid Hidden Markov Model and Artificial Neural Network speech recognizers.
1989 - 1993 Researcher
Sumitomo Electric Industries, Ltd., Osaka, Japan.
Research and development of Japanese-language text-to-speech synthesis, with a focus on acoustic analysis of the speech signal.

Teaching and Advising
Course Title: Hidden Markov Models for Speech Recognition (CSE 552; ECE 580-HMM)
Description: The theory and implementation of Hidden Markov Models, applied to the task of automatic speech recognition.
Terms Taught: Winter, 2000; Spring, 2001; Spring, 2002; Spring, 2003; Spring, 2004; Spring, 2005; Spring, 2006
Course Title: The Structure of Spoken Language (CSE 551; ECE 580-SSL)
Description: An overview of speech in terms of phonetics, acoustic-phonetic patterns, and models of human speech perception and production.
Terms Taught: Fall, 2001; Fall, 2002; Fall, 2003; Fall, 2004; Fall, 2005
Ph.D. Student: Akiko Kusumoto
Thesis Topic:
Identifying the Contribution of Acoustic Features to Speech Intelligibility
Thesis Proposal: A number of studies have shown that the intelligibility of speech spoken deliberately clearly, referred to as "clear speech" or CLR speech, is higher than that of speech spoken during typical communication, referred to as "conversational speech" or CNV speech. Significant differences in the acoustic features of CLR speech, as compared to those of CNV speech, have been found in previous studies. However, little is known about the causal relationship between individual sets of acoustic features and speech intelligibility. Our long-term goal is to better understand and model those features that contribute to speech intelligibility for particular groups of listeners. The objective of this work is to identify specific acoustic features that contribute to the increased intelligibility of CLR speech over CNV speech, which we refer to as "relevant features," and to determine the degree of contribution of these features. We propose a hybridization algorithm that replaces a single feature or a combination of features of CNV speech with those of CLR speech, in order to examine, through perceptual testing, the relative contribution of these features to intelligibility. Hybridized (HYB) speech is the synthesized speech whose features consist of both CNV and CLR features. In a series of preliminary experiments, we confirmed that it is possible to obtain intelligibility of HYB speech that is higher than the intelligibility of CNV speech. We have also demonstrated that intelligibility levels of CNV vowels in consonant-vowel-consonant words can be improved to the level of CLR speech by applying the CLR vowel formants and durations to the CNV speech.

Grants
Title: "Making Dysarthric Speech Intelligible"
Sponsor: National Science Foundation (NSF)
Project Dates: 10/1/01 - 9/30/04
PIs: van Santen and Hosom
Project Goal: To transform the speech signal of a person with a motor speech impairment so that it becomes more intelligible to the untrained listener.
Title: "Toward Automatic Speech Recognition Without Viterbi Search"
Sponsor: Defense Advanced Research Projects Agency (DARPA)
Project Dates: 4/10/01 - 3/31/02
PIs: Hosom and van Santen
Project Goal: To challenge current thinking in the automatic-speech-recognition community by prototyping a high-risk, high-yield framework for speech recognition that is different from standard approaches.
Title: "ITR: Prosody Generation for Child Oriented Speech Synthesis"
Sponsor: National Science Foundation (NSF)
Project Dates: 8/15/02 - 9/30/07
PIs: van Santen, Black, Sproat, and Hosom
Project Goal: The goal of this project has been to address three issues in text-to-speech synthesis in order to improve the prosody of speech generated for children: (1) computation of abstract tags to identify regions requiring emphasis and phrasing, (2) determination of a realistic contour of the fundamental frequency, and (3) performing signal processing of speech units so that extreme distortions in pitch do not distort the audio quality of the units.
Title: "Phase II S43: Reading Remediation Using Computer Speech Recognition"
Sponsor: National Institutes of Health (NIH)
Project Dates: 9/01/02 - 2/28/04
PI: Steely
Project Goal: The primary objective of this project was to develop a software-based reading remediation tool that implements the "direct instruction" teaching model and uses computer speech recognition to evaluate pronunciation and timing. Results included the delivery of real-time word and phoneme recognition software, trained on children’s speech.
Role: Subcontractor
Title: "Pilot Study for Word Recognition of Children with Speech Delay"
Sponsor: Oregon Medical Research Foundation
Project Dates: 6/01/04 - 5/31/05
PI: Hosom
Project Goal: The major goal of this project was to analyze the speech of children who have been diagnosed with speech delay, to determine acoustic features of their observed pronunciation that may be used to identify their intended pronunciation. Results include software for accurate formant estimation and other speech parameters.
Title: "Automated Analysis of Spoken Story Recall Tests"
Sponsor: The Oregon Roybal Center for Aging, Technology, Education and Community Health (ORCATECH)
Project Dates: 8/01/05 - 7/31/06
PI: Roark
Project Goal: The major goal of this pilot study was to automate existing manual tests of Mild Cognitive Impairment based on verbal recall of stories. Automation involved automatic recognition of speech from elderly speakers, classification of speakers based on analysis of language, and language entropy measures. Entropy and pause-duration measures yielded encouraging results on the small amount of pilot data.
Role: Co-Investigator
Title: "Diagnostic Markers for Childhood Apraxia of Speech"
Sponsor: National Institutes of Health, NIDCD
Project Dates: 4/01/04 - 3/31/08
PI: Hosom
Project Goal: The major goal of this project is to develop automated diagnostic markers for childhood apraxia of speech. Research includes improvements to existing (manual) markers, development of new markers, and automatic combination of these markers for improved sensitivity and specificity.
Title: "Automated Test of Word Recognition - Phase II"
Sponsor: National Institutes of Health
Project Dates: 4/01/05 - 3/31/07
PI: Margolis
Project Goal: The major goal of this Phase II project is to automate hearing tests based on word recognition in noise. Automation involves automatic recognition of isolated words that may be acoustically very similar to target words.
Role: Subcontractor
Title: "Automation and Analysis of Standardized Verbal Tests for Mild Cognitive Impairment"
Sponsor: Oregon Alzheimer's Disease Center (OADC)
Project Dates: 4/01/06 - 3/31/07
PI: Hosom
Project Goal: The goal of this project is to improve robustness of the automatic analysis of verbal tests for mild dementia. The focus of this work is on a new model of unknown words or phrases.
Title: "Speech Supplemented Word Prediction Program"
Sponsor: National Institutes of Health
Project Dates: 8/01/06 - 7/31/09
PI: Jakobs
Project Goal: The goal of this research is to provide people with motor speech disorders (dysarthria) with a unique assistive-device access method that utilizes their speech. This project combines dysarthric speech recognition and word prediction techniques into a single access method.
Role: Investigator
Title: "Alzheimer's Disease Cooperative Study (ADCS): Instrument Protocol"
Sponsor: National Institutes of Health
Project Dates: 10/01/06 - 9/30/12
PI: Thal / Kaye
Project Goal: The goal of this research is to develop more efficient and sensitive home-based methods to capture meaningful cognitive decline and dementia in the elderly as outcomes for clinical trials aimed at primary prevention of AD. Existing neuropsychological exams, which typically involve simple spoken language tasks, will be automated using new algorithms for automatic word recognition designed to take advantage of constraints in the speech data.
Role: Investigator
Title: "OHSU BAIC: Technologies for Behavioral Assessment and Intervention"
Sponsor: Intel Corporation
Project Dates: 9/25/06 - 9/24/07
PI: Hayes
Project Goal: The major goals of this project are: to develop new technologies and algorithms for assessing neurological change through unobtrusive in-home technologies; to create a standards-based infrastructure for sharing of artifacts and data collected with such systems; and to establish a "living laboratory" of residential homes in which new technologies may be field tested on an ongoing basis. Existing neuropsychological exams, which typically involve simple spoken language tasks, will be automated using standard algorithms for automatic speech recognition.
Role: Investigator
Title: "DHB: Measuring Spoken Language Variability in Elderly Individuals"
Sponsor: National Science Foundation
Project Dates: 11/01/08 - 10/31/11
PIs: Roark, Hosom, Howieson, Kemper
Project Goal: The research objectives of this work are twofold: to develop and validate algorithms for extracting features that can be robustly derived automatically from variously elicited spoken language samples; and to contrast methods of elicitation in terms of the utility of the resulting spoken language sample for tracking individual behavior via automatically extracted features.
Title: "RI: Small: Modeling Coarticulation for Automatic Speech Recognition"
Sponsor: National Science Foundation
Project Dates: 9/01/09 - 8/31/12
PI: Hosom
Project Goal: Despite the effective use of stochastic models, current ASR systems are often unable to sufficiently account for the large degree of variability observed in speech.  In many cases, this variability is not due to random factors, but is due to predictable changes in the speech signal.  These factors are currently modeled in order to generate speech via TTS, but they are not yet modeled in order to recognize speech, largely because of non-local dependencies.  We apply the Asynchronous Interpolation Model (AIM) used in TTS to the task of automatic speech recognition, by decomposing the speech signal into target vectors and weight trajectories, and then searching weight-trajectory and stochastic target-vector models for the highest-probability match to the input signal.


Book Chapters

  1. Hosom, J.P., "Computer Processing for Analysis of Speech Disorders." In Speech Sound Disorders in Children, R. Paul and P. Flipsen, Jr. (eds.). San Diego: Plural Publishing, pp. 115-140, 2009.

  2. Hosom, J.P., "Automatic Speech Recognition." In Encyclopedia of Information Systems, H. Bidgoli (ed.). San Francisco: Academic Press, vol. 4, pp. 155-169, 2003.

Journal Articles

  1. Hosom, J.P., "Speaker-Independent Phoneme Alignment Using Transition-Dependent States," in Speech Communication, 51 (4), pp. 352-368, Apr. 2009.

  2. Kain, A., Kusumoto, A., and Hosom, J. P., "Hybridizing Conversational and Clear Speech to Determine the Degree of Contribution of Acoustic Features to Intelligibility," in Journal of the Acoustical Society of America, 124 (4), pp. 2308-2319, 2008.

  3. Kain, A. B., Hosom, J. P., Niu, X., van Santen, J. P. H., Fried-Oken, M., and Staehely, J., "Improving the Intelligibility of Dysarthric Speech," in Speech Communication, 49, pp. 743-759, 2007.

  4. Hosom, J.P., Shriberg, L., and Green, J. R., "Diagnostic Assessment of Childhood Apraxia of Speech Using Automatic Speech Recognition (ASR) Systems," in Journal of Medical Speech-Language Pathology, 12(4), pp. 167-171, 2004.

  5. Hosom, J.P., Cole, R.A., and Cosi, P., "Improvements in Neural-Network Training and Search Techniques for Continuous Digit Recognition," Australian Journal of Intelligent Information Processing Systems (AJIIPS), vol. 5, no. 4, pp. 277-284, Summer 1998.

  6. Hosom, J. P. and Yamaguchi, M., "Proposal and Evaluation of a Method for Accurate Analysis of Glottal Source Parameters", The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, vol. E77-D, no. 10, pp. 1130-1141, Oct. 1994.

  7. Yamaguchi, M. and Hosom, J. P., "Development of a Rule-Based Speech Synthesizer Module for Embedded Use", The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E76-A, no. 11, pp. 1990-1998, Nov. 1993.


Peer-Reviewed Conference Publications

  1. Amano-Kusumoto, A., Hosom, J. P., and Shafran, I., "Classifying Clear and Conversational Speech Based on Acoustic Features," in Proceedings of InterSpeech, pp. 1735-1738, Sep. 2009.

  2. Amano-Kusumoto, A. and Hosom, J. P., "The Effect of Formant Trajectories and Phoneme Durations on Vowel Intelligibility," in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4677-4680, Apr. 2009.

  3. Kusumoto, A., Kain, A. B., Hosom, J. P., and van Santen, J. P. H., "Hybridizing Conversational and Clear Speech," in Proceedings of InterSpeech, Antwerp, Belgium, Sep. 2007.

  4. Coulston, R., Klabbers, E., de Villiers, J., and Hosom, J. P., "Application of Speech Technology in a Home Based Assessment Kiosk for Early Detection of Alzheimer's Disease," in Proceedings of InterSpeech, Antwerp, Belgium, Sep. 2007.

  5. Roark, B., Hosom, J. P., Mitchell, M., and Kaye, J. A., "Automatically Derived Spoken Language Markers for Detecting Mild Cognitive Impairment," in Proceedings of the 2nd International Conference on Technology and Aging (ICTA), Toronto, Canada, Jun. 2007.

  6. Hosom, J.P., "F0 Estimation for Adult and Children’s Speech," in Proceedings of InterSpeech, Lisbon, Portugal, pp. 317-320, Sep. 2005.

  7. Vu, T.T., Nguyen, D.T., Luong, M.C., and Hosom, J.P., "Vietnamese Large Vocabulary Continuous Speech Recognition," in Proceedings of InterSpeech, Lisbon, Portugal, pp. 1689-1692, Sep. 2005.

  8. Duc, D. N., Hosom, J. P., and Luong, C. M., "HMM/ANN System for Vietnamese Continuous Digit Recognition." In Developments in Applied Artificial Intelligence, Lecture Notes in Artificial Intelligence 2718, Paul W. H. Chung, Chris Hinde, and Moonis Ali (eds.). Berlin: Springer-Verlag, pp. 481-486, 2003.

  9. Hosom, J. P., Kain, A. B., Mishra, T., van Santen, J.P.H., Fried-Oken, M., and Staehely, J., "Intelligibility of Modifications to Dysarthric Speech," in 2003 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong, vol. I, pp. 878-881, Apr. 2003.

  10. Hosom, J. P., "Automatic Phoneme Alignment Based on Acoustic-Phonetic Modeling," in 2002 International Conference on Spoken Language Processing (ICSLP 2002), Boulder, CO, vol. I, pp. 357-360, Sep. 2002.

  11. Hosom, J.P. and Cole, R.A., "Burst Detection Based on Measurements of Intensity Discrimination," in 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. IV, pp. 564-567, Oct. 2000.

  12. Cosi, P. and Hosom, J. P., "High Performance General Purpose Phonetic Recognition for Italian," in 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. II, pp. 527-530, Oct. 2000.

  13. Cosi, P., Hosom, J. P., and Tesser, F., "High Performance Italian Continuous Digit Recognition," in 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. IV, pp. 242-245, Oct. 2000.

  14. Shobaki, K., Hosom, J.P., and Cole, R.A., "The OGI Kids' Speech Recognizers and Corpus," in 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. IV, pp. 258-261, Oct. 2000.

  15. van Santen, J., Macon, M., Cronk, A., Hosom, P., Kain, A., Pagel, V., and Wouters, J., "When Will Synthetic Speech Sound Human: Roles of Rules and Data," in 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. III, pp. 402-409, Oct. 2000.

  16. Cosi, P. and Hosom, J.P., "HMM/Neural Network-Based System for Italian Continuous Digit Recognition," in Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS), San Francisco, Aug. 1999.

  17. Sutton, S., Cole, R. A., de Villiers, J., Schalkwyk, J., Vermeulen, P., Macon, M., Yan, Y., Kaiser, E., Rundle, B., Shobaki, K., Hosom, P., Kain, A., Wouters, J., Massaro, D., and Cohen, M., "Universal Speech Tools: The CSLU Toolkit", in 1998 International Conference on Spoken Language Processing (ICSLP98), Sydney, Nov.-Dec. 1998, vol. 7, pp. 3221-3224.

  18. Hosom, J. P., Cosi, P., and Cole, R. A., "Evaluation and Integration of Neural-Network Training Techniques for Continuous Digit Recognition", in 1998 International Conference on Spoken Language Processing (ICSLP98), Sydney, Nov.-Dec. 1998, vol. 3, pp. 731-734.

  19. Hosom, J. P. and Cole, R. A., "A Diphone-Based Digit Recognition System using Neural Networks", in 1997 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 97), vol. 4, pp. 3369-3372, Apr. 1997.

  20. Yamaguchi, M. and Hosom, J. P., "Development of a Rule-Based Speech Synthesizer Module for Embedding and Its Power Control," in Proceedings of the Spring 1993 Meeting of the Acoustical Society of Japan, pp. 147-148, May 1993.

  21. Hirose, K., Asano, T., Asano, Y., Fujisaki, H., Yamaguchi, M., and Hosom, J. P., "Rule-Synthesis Using Terminal Analog Speech Synthesizer with Configuration of Multiple Cascade Circuits," in Proceedings of the Fall 1992 Meeting of the Acoustical Society of Japan, pp. 317-318, Apr. 1992 (in Japanese).

  22. Hosom, J. P. and Yamaguchi, M., "A Comparison of AbS and AIF Analysis of Glottal Source Parameters", in Proceedings of the Spring 1992 Meeting of the Acoustical Society of Japan, 1-2-5, pp. 215-216, 1992.

  23. Hosom, J. P., Yamaguchi, M., and Fujisaki, H., "Acoustic Characteristics of Japanese Nasal Consonants and Nasalized Vowels," in Proceedings of the Fall 1990 Meeting of the Acoustical Society of Japan, pp. 227-228, Sep. 1990.


Workshop Publications

  1. Kain, A., Niu, X., Hosom, J.P., Miao, Q., and van Santen, J. P. H., "Formant Re-synthesis of Dysarthric Speech," in Proceedings of the 5th IEEE Workshop on Speech Synthesis, Pittsburgh, PA, pp. 25-30, June, 2004.

  2. Cosi, P., Hosom, J.P., and Tesser, F., "Towards the Italian CSLU Toolkit", in Proceedings Workshop Annuale AIIA -- "Elaborazione del Linguaggio e Riconoscimento del Parlato", Povo di Trento, 16-17 December, 1999, pp. 33-44.

  3. Cosi, P., Hosom, J. P., Valente, A., "High Performance Telephone Bandwidth Speaker Independent Continuous Digit Recognition," in Proceedings of the Automatic Speech Recognition and Understanding (ASRU) Workshop, Trento, Italy, Dec. 2001.

  4. Cole, R.A., Serridge, B., Hosom, J.P., Cronk, A., and Kaiser, E., "A Platform for Multilingual Research in Spoken Dialogue Systems", in Proceedings of the Workshop on Multi-Lingual Interoperability in Speech Technology (MIST), Leusden, The Netherlands, pp. 43-48, Sep. 1999.

  5. Carmell, T., Cole, R., and Hosom, J.P., "An Interactive Course in Spectrogram Reading," in Proceedings of the Method and Tool Innovations for Speech Science Education (MATISSE) Workshop, London, Apr. 1999.

  6. Cosi, P., Hosom, J. P., Schalkwyk, J., Sutton, S., and Cole, R. A., "Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers", in 4th IEEE Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA-ETWR98), Turin, Sep. 1998, pp. 135-140.

  7. Wright, E. L. and Hosom, J. P., "Seismic-Reflector Database Software", in Proceedings of the Fourth Working Symposium on Oceanographic Data Systems, Computer Society Press, Washington D.C., pp. 184-190, 1986.


Non-Peer-Reviewed Conference Presentations

  1. Amano-Kusumoto, A. and Hosom, J. P., "The effect of formant trajectories and phoneme durations on vowel perception," presented at International Hearing Aid Research Conference (IHCON), Aug. 2008.

  2. van Santen, J., Niu, X., Hosom, J.P., and Kain, A., "Towards Automated Measures of Speech Intelligibility in Dysarthria," presented at 2007 American Speech-Language-Hearing Association (ASHA), Boston, Massachusetts, 17 November 2007.

  3. Hosom, J. P., Shriberg, L. D., and Green, J., "The Coefficient of Variation Ratio Determined Using Automatic Speech Recognition," presented at 5th International Conference on Speech Motor Control, Nijmegen, The Netherlands, 9 June, 2006.

  4. Kusumoto, A., Hosom, J.P., and Hayes, T. L., "Effect of Prosodic Modifications on Sentence Recall," presented at Biennial International Conference of the VA RR&D National Center for Rehabilitative Auditory Research (NCRAR), Portland, Oregon, 2005.

  5. Kusumoto, A., Hosom, J.P., Vaughan, N., "Comparison of Acoustic Features of Time-Compressed and Natural Speech," presented at Acoustical Society of America, 148th Meeting, San Diego, CA, Nov. 2004.

  6. Shriberg, L., Hosom, J. P., and Green, J., "Diagnostic Assessment of Childhood Apraxia of Speech Using Automatic Speech Recognition (ASR) Systems," presented at Conference on Motor Speech: Motor Speech Disorders / Speech Motor Control, Albuquerque, New Mexico, 20 March, 2004.

  7. Hosom, J.P., "Toward ASR Without Viterbi Search: Motivation and Implementation," presented at The Second Speech in Noisy Environments (SPINE) Evaluation and Workshop, Orlando, Florida, 29 Nov. 2001.

  8. Hosom, J.P., "Toward ASR Without Viterbi Search: A Prototype System for the SPINE Evaluation," presented at The Second Speech in Noisy Environments (SPINE) Evaluation and Workshop, Orlando, Florida, 30 Nov. 2001.


Patent

  1. Hosom, J. P. and Yamaguchi, M., "Speech Analysis Apparatus for Extracting Glottal Source Parameters and Formant Parameters," U.S. Patent number 5,577,160, assigned to Sumitomo Electric Industries, Ltd., granted Nov. 19, 1996.