Esther Klabbers-Judd
Address Information
Center for Spoken Language Understanding
News
- The Center for Spoken Language Understanding has several openings for PhD students in speech and language processing with rolling admissions. If you are interested, here is a flyer.
Research Interests
- Personalizing text-to-speech voices in AAC devices.
- Analysis and modeling of different affects (emotions such as anger and happiness) for use in text-to-speech synthesis systems.
- Perception of duration and pitch in Parkinson's patients.
- Modeling pitch contours in text-to-speech synthesis systems using the superpositional approach to intonation.
- Modeling segmental duration prediction in text-to-speech synthesis systems using the Bell-Labs sums-of-products model.
- Perception of spectral discontinuities in concatenative text-to-speech synthesis.
Grants
- Oregon Alzheimer's Disease Center (OADC) pilot grant
- National Institutes of Health, SBIR Phase I: Computerized System for Phonemic Awareness Intervention
- National Institutes of Health, SBIR Phase II: Computer-based auditory skill building program for aural (re)habilitation
- OHSU Medical Research Foundation, Perception of Temporal Structure of Speech in Parkinson's Disease (PD)
- National Science Foundation, Synthesis and Perception of Speaker Identity
- National Institutes of Health, STTR Phase II: User Adaptation of AAC Device Voices
- National Institutes of Health, "SBIR Phase I: Computer-based auditory skill building program for aural (re)habilitation"
- Nancy Lurie Marks Family Foundation, "In Your Own Voice: Personal AAC Voices for Minimally Verbal Children with Autism Spectrum Disorder"
- National Institutes of Health, "STTR Phase I: User Adaptation of AAC Device Voices"
Education
- PhD in Speech Synthesis (Language and Computer Science) in 2000 at IPO, Center for User-System Interaction at Eindhoven University of Technology, the Netherlands
- MA in Language and Computer Science in 1996 at University of Nijmegen, the Netherlands
- BA in English Language and Literature in 1992 at University of Nijmegen, the Netherlands
Professional Experience
- Assistant Professor, 2008 - Present, CSLU, Oregon Health & Science University, Portland, OR
- Assistant Scientist, 2007-April 2008, CSLU, Oregon Health & Science University, Portland, OR
- Senior Scientist, 2007-Present, Biospeech, Inc., Portland, OR
- Senior Research Associate, 2001-2006, CSLU, Oregon Health & Science University, Portland, OR
- Postdoc, 2000, IPO, Center for User-System Interaction, Eindhoven University of Technology, the Netherlands
- Winter Research Intern, 1999, Lucent Technologies, NJ
- Summer Research Intern, 1996, KPN Research, Leidschendam, the Netherlands
Organizational Activities
Conference Organization
- Local Arrangements Chair for InterSpeech 2012, Portland, Oregon
NIH Panel Reviews
- NIH Small Business Review Meeting, July 2012
Journal Reviews
- IEEE Transactions on Speech and Audio Processing: 2004-present
- Speech Communication: 2007-present
- ICPhS 2007, Interspeech 2007-present
- Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT): 2007
- International Conference on Spoken Language Processing (ICSLP): 2006
- Annual Meeting of the Association for Computational Linguistics (ACL): 2005
- 6th ISCA Speech Synthesis Workshop, Bonn, Germany, 2007
- 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, 2004
Courses
CS 551 / CS 651 - Structure of Spoken Language
Credits: 3
Description: Speech is considered a key component in the future of human-computer communication. However, the success of speech recognition and text-to-speech synthesis systems depends on development of the technology as well as further research advances. Research and development of spoken-language technology is facilitated by an understanding of the acoustic and symbolic structure of language, as well as the capabilities and limitations of current systems. This course will present some of what is known about speech in terms of phonetics, acoustic-phonetic patterns, and models of speech perception and production. The goals are for the student to understand how speech is structured, understand and identify acoustic cues (especially in different phonetic contexts), and understand how this information may be relevant to automatic speech recognition or generation systems.
CS 553 / CS 653 - Speech Synthesis
Credits: 3
Description: This course will introduce students to the problem of synthesizing speech from text input. Speech synthesis is a challenging area that draws on expertise from a diverse set of scientific fields, including signal processing, linguistics, psychology, statistics, and artificial intelligence. Fundamental advances in each of these areas will be needed to achieve truly human-like synthesis quality and advances in other realms of speech technology (like speech recognition, speech coding, speech enhancement). In this course, we will consider current approaches to sub-problems such as text analysis, pronunciation, linguistic analysis of prosody, and generation of the speech waveform. Lectures, demonstrations, and readings of relevant literature in the area will be supplemented by student lab exercises using hands-on tools.
CS 506 / CS 606 - Computational Approaches to Speech and Language Disorders
Credits: 3
Description: This course covers a range of speech and language analysis algorithms that have been developed for measurement of speech or language based markers of neurological disorders, for the creation of assistive devices, and for remedial applications. Topics will include introduction to speech and language disorders, robust speech signal processing, statistical approaches to pitch and timing modeling, voice transformation algorithms, speech segmentation, and modeling of disfluency. The class will use a wide array of clinical data, and will be closely tied to several ongoing research projects.
Demos
Phrase concatenation
This speech generation method was developed as part of my PhD project. It produces high-quality speech output for limited domain synthesizers. The output speech sounds close to natural speech. This is due to the careful recordings of carrier sentences and slot fillers. There are several variants of slot filler words that vary in accentuation and location in the sentence. The correct version of a word is selected at run-time by taking into account information about which words are accented and where phrase boundaries occur.
- GoalGetter: automatic spoken summaries of soccer matches in Dutch
- The GoalGetter system presents one implementation of the phrase concatenation methodology. The input text for the speech generation module is automatically generated from tabular data using language generation. As such, there are no typos, and the locations of accents and phrase boundaries are known in advance and are correct. These recordings were made using a non-professional speaker. We later used the same methodology in the OVIS train information system with a professional speaker. The speech output in OVIS sounds even more natural. Follow the link above for more information about the GoalGetter system.
- OVIS: Openbaar Vervoer Informatie Systeem (Public Transit Information System)
- OVIS was developed as part of an NWO project. It is a full-fledged spoken dialogue system. My PhD project dealt with the speech generation component of this system. I worked closely with Mariët Theune, who was responsible for the language generation module. As in the GoalGetter system, the phrase concatenation uses different tokens for slot filler words in carrier phrases, depending on accentuation and position in the sentence.
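The variant selection described above for phrase concatenation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GoalGetter or OVIS implementation: the inventory, the file names, and the names `Slot`, `select_variant`, and `build_utterance` are all hypothetical, and position in the sentence is reduced to a single phrase-final flag.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    """A slot in a carrier sentence, with its prosodic context (hypothetical)."""
    word: str
    accented: bool       # does the word carry a pitch accent?
    phrase_final: bool   # does the word sit directly before a phrase boundary?


# Hypothetical inventory: one recording per (word, accented, phrase_final).
INVENTORY = {
    ("Utrecht", True, False): "utrecht_acc_medial.wav",
    ("Utrecht", True, True): "utrecht_acc_final.wav",
    ("Utrecht", False, False): "utrecht_unacc_medial.wav",
    ("Utrecht", False, True): "utrecht_unacc_final.wav",
}


def select_variant(slot, inventory=INVENTORY):
    """Pick the recorded variant whose prosodic context matches the slot."""
    key = (slot.word, slot.accented, slot.phrase_final)
    if key not in inventory:
        raise KeyError(f"no recording for {key}")
    return inventory[key]


def build_utterance(template, inventory=INVENTORY):
    """Replace each Slot in the template by its matching recording.

    Fixed carrier parts are plain strings and are passed through unchanged.
    """
    return [
        select_variant(part, inventory) if isinstance(part, Slot) else part
        for part in template
    ]


# Assumed example: a carrier sentence with one accented, phrase-medial slot.
template = [
    "carrier_the_train_to.wav",
    Slot("Utrecht", accented=True, phrase_final=False),
    "carrier_departs_at.wav",
]
playlist = build_utterance(template)
```

Because the language generation module already knows which words are accented and where phrase boundaries fall, the lookup key can be filled in directly at run-time without any prosodic analysis of the text.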
Diphone Synthesis
For OVIS we also developed a Dutch diphone voice using the same professional speaker as was used in the phrase concatenation method. I developed rules for the intonation to match the natural intonation used in the carrier phrases.
At OGI I have recorded an American English diphone database.
Peer Reviewed Publications
2012
- E. Morley, E. Klabbers, J. van Santen, A. Kain, and S.H. Mohammadi (2012), "Synthetic F0 Can Effectively Convey Speaker ID in Delexicalized Speech", Proceedings InterSpeech 2012, Portland, Oregon
2011
- E. Morley, J. van Santen, E. Klabbers, and A. Kain (2011), "F0 range and peak alignment across speakers and emotions", Proceedings of ICASSP 2011, Prague, Czech Republic, p4952-4955.
2010
- E. Klabbers, A. Kain, and J. van Santen (2010), "Evaluation of Speaker Mimic Technology for Personalizing SGD Voices", Proceedings Interspeech 2010, Makuhari, Japan, p2154-2157.
2008
- J. van Santen, T. Mishra, and E. Klabbers (2008), "Prosodic Processing", In J. Benesty, M. Sondhi, and Y. Huang (Eds.), Springer Handbook of Speech Processing, Springer, Berlin Heidelberg, pp. 471-488.
2007
- J. van Santen, E. Klabbers, and T. Mishra (2007), "Towards measurement of pitch alignment", Italian Journal of Linguistics. Special issue on "Autosegmental-metrical approaches to intonation in Europe: tonal targets and anchors", pp. 161-188.
- E. Klabbers, T. Mishra, and J. van Santen (2007), Analysis of affective speech recordings using the superpositional intonation model, Proceedings of the 6th ISCA workshop on speech synthesis (SSW6), pp. 339-344, Bonn, Germany.
- E. Klabbers, J. van Santen, and A. Kain (2007). "The contribution of various sources of spectral mismatch to audible discontinuities in a diphone database", IEEE Transactions on Audio, Speech, and Language Processing Journal, 15(3), pp. 949-956.
2006
- Q. Miao, X. Niu, E. Klabbers, and J. van Santen (2006). "Effects of Prosodic Factors on Spectral Balance: Analysis and Synthesis", Proceedings Speech Prosody 2006, Dresden, Germany
- T. Mishra, J. van Santen and E. Klabbers (2006). Decomposition of pitch curves in the general superpositional model, Proceedings of Speech Prosody 2006, Dresden, Germany.
2005
- Y. Pantazis, Y. Stylianou, and E. Klabbers (2005), Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis, Proceedings EUROSPEECH 2005, Lisbon, Portugal, pp. 2817 - 2820.
- J. van Santen, A. Kain, E. Klabbers and T. Mishra (2005). Synthesis of prosody using multi-level sequence units, Speech Communication, 46(3-4), 365-375, July 2005, Quantitative Prosody Modelling for Natural Speech Description and Generation.
- Esther Klabbers, Jan van Santen and Johan Wouters (2005). Minimizing the amount of pitch modification in speech synthesis, in S. Narayanan and A. Alwan (eds). Text to Speech synthesis: New Paradigms and Advances, Prentice Hall Professional Technical Reference, Upper Saddle River, NJ, pp. 89-107.
2004
- J. van Santen, T. Mishra, and E. Klabbers (2004). Estimating phrase curves in the general superpositional model, Proceedings ISCA Speech Synthesis Workshop 5, Pittsburgh, PA, pp. 61-66.
- Esther Klabbers and Jan van Santen (2004). Clustering of foot-based pitch contours in expressive speech, Proceedings ISCA Speech Synthesis Workshop 5, Pittsburgh, PA, pp. 73-78.
- Jan van Santen, Alexander Kain and Esther Klabbers (2004). Synthesis by recombination of segmental and prosodic information, International Conference on Speech Prosody 2004, Nara, Japan.
2003
- Esther Klabbers and Jan van Santen (2003). Control and prediction of the impact of pitch modification on synthetic speech quality, EUROSPEECH'03, Geneva, Switzerland, p317-320.
- Taniya Mishra, Esther Klabbers and Jan van Santen (2003). Detection of list-type sentences, EUROSPEECH'03, Geneva, Switzerland, p2477-2480.
- Jan van Santen, Lois Black, Gilead Cohen, Esther Klabbers, Taniya Mishra, Jacques de Villiers and Xiaochuan Niu (2003). Applications of computer generated expressive speech for communication disorders, EUROSPEECH'03, Geneva, Switzerland, p1657-1660.
- Raymond Veldhuis and Esther Klabbers (2003). On the computation of the Kullback-Leibler measure for spectral distances, IEEE Transactions on Speech and Audio Processing, vol. 11, no. 1, January 2003, p100-103.
2002
- Esther Klabbers, Jan van Santen and Johan Wouters (2002). Prosodic factors for predicting local pitch shape, IEEE 2002 Workshop on Speech Synthesis, Santa Monica, CA, September 11-13 2002.
2001
- Esther Klabbers, Karlheinz Stöber (2001). Creation of speech corpora for the multilingual Bonn Open Synthesis System, 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Pitlochry, Scotland, p23-27 (ps).
- M. Theune, E. Klabbers, J. Odijk, J.R. de Pijper and E. Krahmer (2001). From Data to Speech: A General Approach, Natural Language Engineering 7(1), p47-86 (ps).
- Esther Klabbers, Karlheinz Stöber, Raymond Veldhuis, Petra Wagner and Stefan Breuer (2001). Speech synthesis development made easy: The Bonn Open Synthesis System, EUROSPEECH 2001, Aalborg, Denmark, volume I, 521-524 (ps).
- Esther Klabbers and Raymond Veldhuis (2001). Reducing audible spectral discontinuities, IEEE Transactions on Speech and Audio Processing, vol. 9, no. 1, January 2001, p39-51.
2000
- Esther Klabbers, Raymond Veldhuis and Kim Koppen (2000). A solution to the reduction of concatenation artefacts in speech synthesis, Proceedings ICSLP 2000, Beijing, China, volume III, 474-477 (ps).
- Esther Klabbers and Jan van Santen (2000). Predicting segmental durations for Dutch using the sums-of-products approach, Proceedings ICSLP 2000, Beijing, China, volume III, 670-673. (ps)
- Esther Klabbers (2000). Segmental and Prosodic Improvements to Speech Generation, PhD Thesis, Eindhoven University of Technology (TUE).
1998
- Esther Klabbers and Rene Collier (1998). On the performance of speech output in a practical setting, IPO Annual Progress Report 33, Eindhoven, p121-128 (ps).
- Esther Klabbers and Raymond Veldhuis (1998). On the reduction of concatenation artefacts in diphone synthesis, Proceedings ICSLP'98, Sydney, Australia, p1983 - 1986 (ps).
- Esther Klabbers, Emiel Krahmer and Mariët Theune (1998). A generic algorithm for generating spoken monologues, Proceedings ICSLP'98, Sydney, Australia, p2759-2762. (ps)
1997
- E. Klabbers (1997). High-quality speech output generation through advanced phrase concatenation, Proceedings of the COST Workshop on Speech Technology in the Public Telephone Network: Where are we today?, Rhodes, Greece, p85-88 (ps).
- M. Theune, E. Klabbers, J. Odijk and J.R. de Pijper (1997). Computing Prosodic Properties in a Data-to-Speech System, Proceedings of the Workshop on Concept-to-Speech Generation Systems, ACL/EACL, Madrid, p39-45 (ps).
- Esther Klabbers (1997). Speech Output Generation in GoalGetter, Computational Linguistics in the Netherlands, Papers from the Seventh CLIN Meeting, Eindhoven, p57-68 (ps).
- M. Theune and E. Klabbers (1997). Gebruik van taal en spraak in informatiesystemen, NWO Annual Report 1997, p103-106 (MS-Word document).
1996
- Esther Klabbers, Jan Odijk, Jan-Roelof de Pijper and Mariët Theune (1996). GoalGetter: From Teletext to Speech, IPO Annual Progress Report 31, Eindhoven, p66-75 (ps).
- M. Theune, E. Klabbers, J. Odijk and J.R. de Pijper (1996). From Data to Speech: A Generic Approach, IPO manuscript 1202 (ps).
Links
- CSLU download for OGI Festival components
- Audio demonstrations from my PhD thesis
- GoalGetter
- Edinburgh Speech Tools Library
- EMU Transcription
- Festival Speech Synthesis System
- FestVox
- HTK
- Praat
- SIL Speech Analysis Tools
- Snack
- Wavesurfer
- The MBROLA Project
- Transcriber: a tool for segmenting, labeling and transcribing speech
- WWStim
- SAMPA Homepage
Esther Klabbers
Last modified: February 19 2013
