John-Paul Hosom


Contact Information

Assistant Professor
Center for Spoken Language Understanding (CSLU)
Departments of Computer Science & Electrical Engineering (CSEE) and Biomedical Engineering (BME),
OGI School of Science & Engineering (OGI)
Oregon Health & Science University (OHSU)

20000 N.W. Walker Road
Beaverton, OR 97006 USA
Telephone: (503) 748-1456
Fax: (503) 748-1306
E-mail: hosom at cslu dot ogi dot edu
URL: http://www.cslu.ogi.edu/people/hosom


Research Interests

Automatic speech recognition, speech intelligibility, time alignment of phonemes, acoustic analysis of speech, assistive technology, machine learning.


Education
1994 - 2000 Ph.D. in Computer Science & Engineering
Oregon Graduate Institute of Science & Technology (OGI), Center for Spoken Language Understanding (CSLU), Beaverton, OR, 2000.
Thesis: Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information
1983 - 1987 B.S. in Computer and Information Science
University of Massachusetts at Amherst, Amherst, MA, 1987.


Professional Experience
2001 - Present Assistant Professor
Oregon Health & Science University (OHSU), OGI School of Science & Engineering, Center for Spoken Language Understanding, Beaverton, OR.
Research topics have included improving the intelligibility of dysarthric speech, a novel assistive device for persons with dysarthria, development of diagnostic markers for childhood apraxia of speech, new models of automatic speech recognition, automation of tests for Mild Cognitive Impairment (a precursor to Alzheimer's Disease), reliable estimation of fundamental frequency, stop burst detection, automatic phoneme alignment, and measuring the contribution of specific acoustic features to sentence-level speech intelligibility.
2000 - 2001 Post-Doctoral Research Associate
Oregon Graduate Institute's Center for Spoken Language Understanding, Beaverton, OR.
Research on measuring how well a phoneme has been pronounced (for language training), developing children's speech recognizers and an audio-visual corpus of children's speech, and creating a Brazilian Portuguese version of a software toolkit for developing spoken-language systems.
1994 - 2000 Research Assistant
Oregon Graduate Institute's Center for Spoken Language Understanding, Beaverton, OR.
Research on automatic time-alignment of phonemes in the speech signal and on improving accuracy of automatic speech recognition. Developed tools for speech display and annotation, and for training of hybrid Hidden Markov Model and Artificial Neural Network speech recognizers.
1989 - 1993 Researcher
Sumitomo Electric Industries, Ltd., Osaka, Japan.
Research and development of Japanese-language text-to-speech synthesis, with a focus on acoustic analysis of the speech signal.

Teaching and Advising
Course Title: Hidden Markov Models for Speech Recognition (CSE 552; ECE 580-HMM)
Description: The theory and implementation of Hidden Markov Models, applied to the task of automatic speech recognition.
Terms Taught: Winter, 2000; Spring, 2001; Spring, 2002; Spring, 2003; Spring, 2004; Spring, 2005; Spring, 2006
Course Title: The Structure of Spoken Language (CSE 551; ECE 580-SSL)
Description: An overview of speech in terms of phonetics, acoustic-phonetic patterns, and models of human speech perception and production.
Terms Taught: Fall, 2001; Fall, 2002; Fall, 2003; Fall, 2004; Fall, 2005
Ph.D. Student: Akiko Kusumoto
Thesis Topic: Identifying the Contribution of Acoustic Features to Speech Intelligibility
Thesis Proposal: A number of studies have shown that the intelligibility of speech spoken deliberately clearly, referred to as "clear speech" or CLR speech, is higher than that of speech spoken during typical communication, referred to as "conversational speech" or CNV speech. Previous studies have found significant differences between the acoustic features of CLR speech and those of CNV speech. However, little is known about the causal relationship between individual sets of acoustic features and speech intelligibility. Our long-term goal is to better understand and model those features that contribute to speech intelligibility for particular groups of listeners. The objective of this proposal is to identify specific acoustic features that contribute to the increased intelligibility of CLR speech over CNV speech, which we refer to as "relevant features," and to determine the degree of contribution of these features. We propose a hybridization algorithm that replaces a single feature or a combination of features of CNV speech with those of CLR speech, in order to examine, through perceptual testing, the relative contribution of these features to intelligibility. Hybridized (HYB) speech is synthesized speech whose features are a combination of CNV and CLR features. In a series of preliminary experiments, we confirmed that it is possible to obtain intelligibility of HYB speech that is higher than the intelligibility of CNV speech.

Grants
Title: "Making Dysarthric Speech Intelligible"
Sponsor: National Science Foundation (NSF)
Project Dates: 10/1/01 - 9/30/04
PIs: van Santen and Hosom
Project Goal: To transform the speech signal of a person with a motor speech impairment so that it becomes more intelligible to the untrained listener.
Title: "Toward Automatic Speech Recognition Without Viterbi Search"
Sponsor: Defense Advanced Research Projects Agency (DARPA)
Project Dates: 4/10/01 - 3/31/02
PIs: Hosom and van Santen
Project Goal: To challenge current thinking in the automatic-speech-recognition community by prototyping a high-risk, high-yield framework for speech recognition that is different from standard approaches.
Title: "ITR: Prosody Generation for Child Oriented Speech Synthesis"
Sponsor: National Science Foundation (NSF)
Project Dates: 8/15/02 - 9/30/07
PIs: van Santen, Black, Sproat, and Hosom
Project Goal: The goal of this project has been to address three issues in text-to-speech synthesis in order to improve the prosody of speech generated for children: (1) computation of abstract tags to identify regions requiring emphasis and phrasing, (2) determination of a realistic contour of the fundamental frequency, and (3) signal processing of speech units so that extreme modifications of pitch do not distort the audio quality of the units.
Title: "Phase II S43: Reading Remediation Using Computer Speech Recognition"
Sponsor: National Institutes of Health (NIH)
Project Dates: 9/01/02 - 2/28/04
PI: Steely
Project Goal: The primary objective of this project was to develop a software-based reading remediation tool that implements the "direct instruction" teaching model and uses computer speech recognition to evaluate pronunciation and timing. Results included the delivery of real-time word and phoneme recognition software, trained on children’s speech.
Role: Subcontractor
Title: "Pilot Study for Word Recognition of Children with Speech Delay"
Sponsor: Oregon Medical Research Foundation
Project Dates: 6/01/04 - 5/31/05
PI: Hosom
Project Goal: The major goal of this project was to analyze the speech of children who have been diagnosed with speech delay, to determine acoustic features of their observed pronunciation that may be used to identify their intended pronunciation. Results include software for accurate estimation of formants and other speech parameters.
Title: "Automated Analysis of Spoken Story Recall Tests"
Sponsor: The Oregon Roybal Center for Aging, Technology, Education and Community Health (ORCATECH)
Project Dates: 8/01/05 - 7/31/06
PI: Roark
Project Goal: The major goal of this pilot study was to automate existing manual tests of Mild Cognitive Impairment based on verbal recall of stories. Automation involved automatic recognition of speech from elderly speakers, classification of speakers based on analysis of language, and language entropy measures. Entropy and pause-duration measures yielded encouraging results on the small amount of pilot data.
Role: Co-Investigator
Title: "Diagnostic Markers for Childhood Apraxia of Speech"
Sponsor: National Institutes of Health, NIDCD
Project Dates: 4/01/04 - 3/31/08
PI: Hosom
Project Goal: The major goal of this project is to develop automated diagnostic markers for childhood apraxia of speech. Research includes improvements to existing (manual) markers, development of new markers, and automatic combination of these markers for improved sensitivity and specificity.
Title: "Automated Test of Word Recognition - Phase II"
Sponsor: National Institutes of Health
Project Dates: 4/01/05 - 3/31/07
PI: Margolis
Project Goal: The major goal of this Phase II project is to automate hearing tests based on word recognition in noise. Automation involves automatic recognition of isolated words that may be acoustically very similar to target words.
Role: Subcontractor
Title: "Automation and Analysis of Standardized Verbal Tests for Mild Cognitive Impairment"
Sponsor: Oregon Alzheimer's Disease Center (OADC)
Project Dates: 4/01/06 - 3/31/07
PI: Hosom
Project Goal: The goal of this project is to improve robustness of the automatic analysis of verbal tests for mild dementia. The focus of this work is on a new model of unknown words or phrases.
Title: "Speech Supplemented Word Prediction Program"
Sponsor: National Institutes of Health
Project Dates: 8/01/06 - 7/31/09
PI: Jakobs
Project Goal: The goal of this research is to provide people with motor speech disorders (dysarthria) with a unique assistive-device access method that utilizes their speech. This project combines dysarthric speech recognition and word prediction techniques into a single access method.
Role: Investigator
Title: "Alzheimer's Disease Cooperative Study (ADCS): Instrument Protocol"
Sponsor: National Institutes of Health
Project Dates: 10/01/06 - 9/30/12
PI: Thal / Kaye
Project Goal: The goal of this research is to develop more efficient and sensitive home-based methods to capture meaningful cognitive decline and dementia in the elderly as outcomes for clinical trials aimed at primary prevention of AD. Existing neuropsychological exams, which typically involve simple spoken language tasks, will be automated using new algorithms for automatic word recognition designed to take advantage of constraints in the speech data.
Role: Investigator
Title: "OHSU BAIC: Technologies for Behavioral Assessment and Intervention"
Sponsor: Intel Corporation
Project Dates: 9/25/06 - 9/24/07
PI: Hayes
Project Goal: The major goals of this project are: to develop new technologies and algorithms for assessing neurological change through unobtrusive in-home technologies; to create a standards-based infrastructure for sharing of artifacts and data collected with such systems; and to establish a "living laboratory" of residential homes in which new technologies may be field tested on an ongoing basis. Existing neuropsychological exams, which typically involve simple spoken language tasks, will be automated using standard algorithms for automatic speech recognition.
Role: Investigator


Journal Articles and Book Chapters

  1. Kain, A. B., Hosom, J. P., Niu, X., van Santen, J. P. H., Fried-Oken, M., and Staehely, J., "Improving the Intelligibility of Dysarthric Speech," in Speech Communication, 49, pp. 743-759, 2007.

  2. Hosom, J.P., Shriberg, L., and Green, J. R., "Diagnostic Assessment of Childhood Apraxia of Speech Using Automatic Speech Recognition (ASR) Systems," in Journal of Medical Speech-Language Pathology, 12(4), pp. 167-171, 2004.

  3. Hosom, J. P., "Automatic Speech Recognition." In Encyclopedia of Information Systems, H. Bidgoli (ed.). San Francisco: Academic Press, vol. 4, pp. 155-169, 2003 (invited chapter).

  4. Hosom, J.P., Cole, R.A., and Cosi, P., "Improvements in Neural-Network Training and Search Techniques for Continuous Digit Recognition," Australian Journal of Intelligent Information Processing Systems (AJIIPS), vol. 5, no. 4, pp. 277-284, Summer 1998 (invited paper).

  5. Hosom, J. P. and Yamaguchi, M., "Proposal and Evaluation of a Method for Accurate Analysis of Glottal Source Parameters", The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Information and Systems, vol. E77-D, no. 10, pp. 1130-1141, Oct. 1994.

  6. Yamaguchi, M. and Hosom, J. P., "Development of a Rule-Based Speech Synthesizer Module for Embedded Use", The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E76-A, no. 11, pp. 1990-1998, Nov. 1993.


Peer-Reviewed Conference Publications

  1. Kusumoto, A., Kain, A. B., Hosom, J. P., and van Santen, J. P. H., "Hybridizing Conversational and Clear Speech," in Proceedings of InterSpeech, Antwerp, Belgium, Sep. 2007.

  2. Coulston, R., Klabbers, E., de Villiers, J., and Hosom, J. P., "Application of Speech Technology in a Home Based Assessment Kiosk for Early Detection of Alzheimer's Disease," in Proceedings of InterSpeech, Antwerp, Belgium, Sep. 2007.

  3. Roark, B., Hosom, J. P., Mitchell, M., and Kaye, J. A., "Automatically Derived Spoken Language Markers for Detecting Mild Cognitive Impairment," in Proceedings of the 2nd International Conference on Technology and Aging (ICTA), Toronto, Canada, Jun. 2007.

  4. Hosom, J.P., "F0 Estimation for Adult and Children’s Speech," in Proceedings of InterSpeech, Lisbon, Portugal, pp. 317-320, Sep. 2005.

  5. Vu, T.T., Nguyen, D.T., Luong, M.C., and Hosom, J.P., "Vietnamese Large Vocabulary Continuous Speech Recognition," in Proceedings of InterSpeech, Lisbon, Portugal, pp. 1689-1692, Sep. 2005.

  6. Duc, D. N., Hosom, J. P., and Luong, C. M., "HMM/ANN System for Vietnamese Continuous Digit Recognition." In Developments in Applied Artificial Intelligence, Lecture Notes in Artificial Intelligence 2718, Paul W. H. Chung, Chris Hinde, and Moonis Ali (eds.). Berlin: Springer-Verlag, pp. 481-486, 2003.

  7. Hosom, J. P., Kain, A. B., Mishra, T., van Santen, J.P.H., Fried-Oken, M., and Staehely, J., "Intelligibility of Modifications to Dysarthric Speech," 2003 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Hong Kong, vol. I, pp. 878-881, Apr. 2003.

  8. Hosom, J. P., "Automatic Phoneme Alignment Based on Acoustic-Phonetic Modeling," 2002 International Conference on Spoken Language Processing (ICSLP 2002), Boulder, CO, vol. I, pp. 357-360, Sep. 2002.

  9. Hosom, J.P. and Cole, R.A., "Burst Detection Based on Measurements of Intensity Discrimination," 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. IV, pp. 564-567, Oct. 2000.

  10. Cosi, P. and Hosom, J. P., "High Performance General Purpose Phonetic Recognition for Italian," 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. II, pp. 527-530, Oct. 2000.

  11. Cosi, P., Hosom, J. P., and Tesser, F., "High Performance Italian Continuous Digit Recognition," 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. IV, pp. 242-245, Oct. 2000.

  12. Shobaki, K., Hosom, J.P., and Cole, R.A., "The OGI Kids' Speech Recognizers and Corpus," 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. IV, pp. 258-261, Oct. 2000.

  13. van Santen, J., Macon, M., Cronk, A., Hosom, P., Kain, A., Pagel, V., and Wouters, J., "When Will Synthetic Speech Sound Human: Roles of Rules and Data," 2000 International Conference on Spoken Language Processing (ICSLP 2000), Beijing, vol. III, pp. 402-409, Oct. 2000.

  14. Cosi, P. and Hosom, J.P., "HMM/Neural Network-Based System for Italian Continuous Digit Recognition," Proceedings of the 14th International Congress of Phonetic Sciences (ICPhS), San Francisco, Aug. 1999.

  15. Sutton, S., Cole, R. A., de Villiers, J., Schalkwyk, J., Vermeulen, P., Macon, M., Yan, Y., Kaiser, E., Rundle, B., Shobaki, K., Hosom, P., Kain, A., Wouters, J., Massaro, D., and Cohen, M., "Universal Speech Tools: The CSLU Toolkit", 1998 International Conference on Spoken Language Processing (ICSLP98), Sydney, Nov.-Dec. 1998, vol. 7, pp. 3221-3224.

  16. Hosom, J. P., Cosi, P., and Cole, R. A., "Evaluation and Integration of Neural-Network Training Techniques for Continuous Digit Recognition", 1998 International Conference on Spoken Language Processing (ICSLP98), Sydney, Nov.-Dec. 1998, vol. 3, pp. 731-734.

  17. Hosom, J. P. and Cole, R. A., "A Diphone-Based Digit Recognition System using Neural Networks", 1997 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 97), vol. 4, pp. 3369-3372, Apr. 1997.

  18. Yamaguchi, M. and Hosom, J. P., "Development of a Rule-Based Speech Synthesizer Module for Embedding and Its Power Control," Proceedings of the Spring 1993 Meeting of the Acoustical Society of Japan, pp. 147-148, May 1993.

  19. Hirose, K., Asano, T., Asano, Y., Fujisaki, H., Yamaguchi, M., and Hosom, J. P., "Rule-Synthesis Using Terminal Analog Speech Synthesizer with Configuration of Multiple Cascade Circuits," Proceedings of the Fall 1992 Meeting of the Acoustical Society of Japan, pp. 317-318, Apr. 1992 (in Japanese).

  20. Hosom, J. P. and Yamaguchi, M., "A Comparison of AbS and AIF Analysis of Glottal Source Parameters", Proceedings of the Spring 1992 Meeting of the Acoustical Society of Japan, 1-2-5, pp. 215-216, 1992.

  21. Hosom, J. P., Yamaguchi, M., and Fujisaki, H., "Acoustic Characteristics of Japanese Nasal Consonants and Nasalized Vowels," Proceedings of the Fall 1990 Meeting of the Acoustical Society of Japan, pp. 227-228, Sep. 1990.


Workshop Publications

  1. Kain, A., Niu, X., Hosom, J.P., Miao, Q., and van Santen, J. P. H., "Formant Re-synthesis of Dysarthric Speech," in Proceedings of the 5th IEEE Workshop on Speech Synthesis, Pittsburgh, PA, pp. 25-30, June, 2004.

  2. Cosi, P., Hosom, J.P., and Tesser, F., "Towards the Italian CSLU Toolkit", Proceedings Workshop Annuale AIIA -- "Elaborazione del Linguaggio e Riconoscimento del Parlato", Povo di Trento, 16-17 December, 1999, pp. 33-44.

  3. Cosi, P., Hosom, J. P., and Valente, A., "High Performance Telephone Bandwidth Speaker Independent Continuous Digit Recognition." In Proceedings of the Automatic Speech Recognition and Understanding (ASRU) Workshop, Trento, Italy, Dec. 2001.

  4. Cole, R.A., Serridge, B., Hosom, J.P., Cronk, A., and Kaiser, E., "A Platform for Multilingual Research in Spoken Dialogue Systems", Proceedings of the Workshop on Multi-Lingual Interoperability in Speech Technology (MIST), Leusden, The Netherlands, pp. 43-48, Sep. 1999.

  5. Carmell, T., Cole, R., and Hosom, J.P., "An Interactive Course in Spectrogram Reading," Proceedings of the Method and Tool Innovations for Speech Science Education (MATISSE) Workshop, London, Apr. 1999.

  6. Cosi, P., Hosom, J. P., Schalkwyk, J., Sutton, S., and Cole, R. A., "Connected Digit Recognition Experiments with the OGI Toolkit's Neural Network and HMM-Based Recognizers", 4th IEEE Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA-ETWR98), Turin, Sep. 1998, pp. 135-140.

  7. Wright, E. L. and Hosom, J. P., "Seismic-Reflector Database Software", Proceedings of the Fourth Working Symposium on Oceanographic Data Systems, Computer Society Press, Washington D.C., pp. 184-190, 1986.


Non-Peer-Reviewed Conference Presentations

  1. van Santen, J., Niu, X., Hosom, J.P., and Kain, A., "Towards Automated Measures of Speech Intelligibility in Dysarthria," presented at the 2007 American Speech-Language-Hearing Association (ASHA), Boston, Massachusetts, 17 November 2007.

  2. Hosom, J. P., Shriberg, L. D., and Green, J., "The Coefficient of Variation Ratio Determined Using Automatic Speech Recognition," presented at the 5th International Conference on Speech Motor Control, Nijmegen, The Netherlands, 9 June 2006.

  3. Kusumoto, A., Hosom, J.P., and Hayes, T. L., "Effect of Prosodic Modifications on Sentence Recall," presented at the Biennial International Conference of the VA RR&D National Center for Rehabilitative Auditory Research (NCRAR), Portland, Oregon, 2005.

  4. Kusumoto, A., Hosom, J.P., and Vaughan, N., "Comparison of Acoustic Features of Time-Compressed and Natural Speech," presented at Acoustical Society of America, 148th Meeting, San Diego, CA, Nov. 2004.

  5. Shriberg, L., Hosom, J. P., and Green, J., "Diagnostic Assessment of Childhood Apraxia of Speech Using Automatic Speech Recognition (ASR) Systems," presented at Conference on Motor Speech: Motor Speech Disorders / Speech Motor Control, Albuquerque, New Mexico, 20 March 2004.

  6. Hosom, J.P., "Toward ASR Without Viterbi Search: Motivation and Implementation," presented at The Second Speech in Noisy Environments (SPINE) Evaluation and Workshop, Orlando, Florida, 29 Nov. 2001.

  7. Hosom, J.P., "Toward ASR Without Viterbi Search: A Prototype System for the SPINE Evaluation," presented at The Second Speech in Noisy Environments (SPINE) Evaluation and Workshop, Orlando, Florida, 30 Nov. 2001.


Patent

  1. Hosom, J. P. and Yamaguchi, M., "Speech Analysis Apparatus for Extracting Glottal Source Parameters and Formant Parameters," U.S. Patent number 5,577,160, assigned to Sumitomo Electric Industries, Ltd., granted Nov. 19, 1996.