Oregon Health & Science University | Center for Spoken Language Understanding

Esther Klabbers-Judd

Address Information


Center for Spoken Language Understanding
Division of Biomedical Computer Science at OHSU
20000 NW Walker Road
Beaverton, Oregon 97006
email: klabbers AT-SYMBOL c s l u dot o g i dot e d u
phone: (503) 748 3005
fax: (503) 748 1306


News

  • The Center for Spoken Language Understanding has several openings for PhD students in speech and language processing with rolling admissions. If you are interested, here is a flyer.

Research Interests

  • Personalizing text-to-speech voices in AAC devices.
  • Analyzing and modeling different affects (emotions such as anger and happiness) for use in text-to-speech synthesis systems.
  • Perception of duration and pitch in Parkinson's patients.
  • Modeling pitch contours in text-to-speech synthesis systems using the superpositional approach to intonation.
  • Modeling segmental duration prediction in text-to-speech synthesis systems using the Bell-Labs sums-of-products model.
  • Perception of spectral discontinuities in concatenative text-to-speech synthesis.

Grants

Education

  • PhD in Speech Synthesis (Language and Computer Science) in 2000
    at IPO, Center for User-System Interaction at Eindhoven University of Technology, the Netherlands
  • MA in Language and Computer Science in 1996
    at University of Nijmegen, the Netherlands
  • BA in English Language and Literature in 1992
    at University of Nijmegen, the Netherlands

Professional Experience

  • Assistant Professor, 2008 - Present,
    CSLU, Oregon Health & Science University, Portland, OR
  • Assistant Scientist, 2007-April 2008,
    CSLU, Oregon Health & Science University, Portland, OR
  • Senior Scientist, 2007-Present,
    Biospeech, Inc., Portland, OR
  • Senior Research Associate, 2001-2006,
    CSLU, Oregon Health & Science University, Portland, OR
  • Postdoc, 2000,
    IPO, Center for User-System Interaction, Eindhoven University of Technology, the Netherlands
  • Winter Research Intern, 1999,
    Lucent Technologies, NJ
  • Summer Research Intern, 1996,
    KPN Research, Leidschendam, the Netherlands

Organizational Activities

Conference Organization

NIH Panel Reviews

  • NIH Small Business Review Meeting, July 2012

Journal Reviews

  • IEEE Transactions on Speech and Audio Processing: 2004-present
  • Speech Communication: 2007-present

Conference Paper Reviews

Workshop Paper Reviews

  • 6th ISCA Speech Synthesis Workshop, Bonn, Germany, 2007
  • 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, 2004

Courses

CS 551 / CS 651 - Structure of Spoken Language
Credits: 3
Description: Speech is considered a key component in the future of human-computer communication. However, the success of speech recognition and text-to-speech synthesis systems depends on development of the technology as well as further research advances. Research and development of spoken-language technology is facilitated by an understanding of the acoustic and symbolic structure of language, as well as the capabilities and limitations of current systems. This course will present some of what is known about speech in terms of phonetics, acoustic-phonetic patterns, and models of speech perception and production. The goals are for the student to understand how speech is structured, understand and identify acoustic cues (especially in different phonetic contexts), and understand how this information may be relevant to automatic speech recognition or generation systems.

CS 553 / CS 653 - Speech Synthesis
Credits: 3
Description: This course will introduce students to the problem of synthesizing speech from text input. Speech synthesis is a challenging area that draws on expertise from a diverse set of scientific fields, including signal processing, linguistics, psychology, statistics, and artificial intelligence. Fundamental advances in each of these areas will be needed to achieve truly human-like synthesis quality and advances in other realms of speech technology (like speech recognition, speech coding, speech enhancement). In this course, we will consider current approaches to sub-problems such as text analysis, pronunciation, linguistic analysis of prosody, and generation of the speech waveform. Lectures, demonstrations, and readings of relevant literature in the area will be supplemented by student lab exercises using hands-on tools.

CS 506 / CS 606 - Computational Approaches to Speech and Language Disorders
Credits: 3
Description: This course covers a range of speech and language analysis algorithms that have been developed for measurement of speech or language based markers of neurological disorders, for the creation of assistive devices, and for remedial applications. Topics will include introduction to speech and language disorders, robust speech signal processing, statistical approaches to pitch and timing modeling, voice transformation algorithms, speech segmentation, and modeling of disfluency. The class will use a wide array of clinical data, and will be closely tied to several ongoing research projects.

Demos

Phrase concatenation

This speech generation method was developed as part of my PhD project. It produces high-quality speech output for limited domain synthesizers. The output speech sounds close to natural speech. This is due to the careful recordings of carrier sentences and slot fillers. There are several variants of slot filler words that vary in accentuation and location in the sentence. The correct version of a word is selected at run-time by taking into account information about which words are accented and where phrase boundaries occur.
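
As a rough illustration only (not the actual GoalGetter or OVIS code), the run-time selection step described above can be sketched in Python. The `Variant` structure, the `select_variant` function, the inventory layout, the example word, and the file names are all hypothetical; the sketch assumes that the accent and phrase-boundary marks arrive together with the input text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One recorded version of a slot-filler word (hypothetical structure)."""
    word: str
    accented: bool       # recorded with a pitch accent?
    phrase_final: bool   # recorded directly before a phrase boundary?
    audio_file: str      # placeholder file name, not a real recording


def select_variant(inventory, word, accented, phrase_final):
    """Return the recorded variant that matches the requested prosodic context."""
    for v in inventory:
        if (v.word, v.accented, v.phrase_final) == (word, accented, phrase_final):
            return v
    raise LookupError(f"no recording of {word!r} for this context")


# Toy inventory: four prosodic variants of a single slot-filler word.
INVENTORY = [
    Variant("Eindhoven", accented, final,
            f"eindhoven_{int(accented)}{int(final)}.wav")
    for accented in (False, True)
    for final in (False, True)
]

# Accent and boundary marks are assumed to be supplied with the input text.
chosen = select_variant(INVENTORY, "Eindhoven", accented=True, phrase_final=False)
print(chosen.audio_file)
```

A full system would also need the carrier sentences and a step that concatenates the selected recordings; the sketch covers only the lookup of the matching slot-filler variant.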

  • GoalGetter: automatic spoken summaries of soccer matches in Dutch
  • The GoalGetter system presents one implementation of the phrase concatenation methodology. The input text for the speech generation module is automatically generated from tabular data using language generation. As a result, there are no typos, and the locations of accents and phrase boundaries are known in advance and are correct. These recordings were made using a non-professional speaker. We later used the same methodology in the OVIS train information system with a professional speaker. The speech output in OVIS sounds even more natural. Follow the link above for more information about the GoalGetter system.
  • OVIS: Openbaar Vervoer Informatie Systeem (Public Transit Information System)
  • OVIS was developed as part of an NWO project. It is a full-fledged spoken dialogue system. My PhD project dealt with the speech generation component of this system. I worked closely with Mariet Theune, who was responsible for the language generation module. As in the GoalGetter system, the phrase concatenation uses different tokens for slot filler words in carrier phrases, depending on accentuation and position in the sentence.

Diphone Synthesis

For OVIS we also developed a Dutch diphone voice using the same professional speaker as was used in the phrase concatenation method. I developed rules for the intonation to match the natural intonation used in the carrier phrases.

At OGI I recorded an American English diphone database.

Peer Reviewed Publications


2012

  • E. Morley, E. Klabbers, J. van Santen, A. Kain, and S.H. Mohammadi (2012), "Synthetic F0 Can Effectively Convey Speaker ID in Delexicalized Speech", Proceedings InterSpeech 2012, Portland, Oregon.

2011

  • E. Morley, J. van Santen, E. Klabbers, and A. Kain (2011), "F0 range and peak alignment across speakers and emotions", Proceedings of ICASSP 2011, Prague, Czech Republic, pp. 4952-4955.

2010

  • E. Klabbers, A. Kain, and J. van Santen (2010), "Evaluation of Speaker Mimic Technology for Personalizing SGD Voices", Proceedings Interspeech 2010, Makuhari, Japan, pp. 2154-2157.

2008

  • J. van Santen, T. Mishra, and E. Klabbers (2008), "Prosodic Processing", In J. Benesty, M. Sondhi, and Y. Huang (Eds.), Springer Handbook of Speech Processing, Springer, Berlin Heidelberg, pp. 471-488.

2007


2006

  • Q. Miao, X. Niu, E. Klabbers, and J. van Santen (2006), "Effects of Prosodic Factors on Spectral Balance: Analysis and Synthesis", Proceedings Speech Prosody 2006, Dresden, Germany.
  • T. Mishra, J. van Santen, and E. Klabbers (2006), "Decomposition of pitch curves in the general superpositional model", Proceedings of Speech Prosody 2006, Dresden, Germany.

2005

  • Y. Pantazis, Y. Stylianou, and E. Klabbers (2005), "Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis", Proceedings EUROSPEECH 2005, Lisbon, Portugal, pp. 2817-2820.
  • J. van Santen, A. Kain, E. Klabbers, and T. Mishra (2005), "Synthesis of prosody using multi-level sequence units", Speech Communication, 46(3-4), 365-375, July 2005, Quantitative Prosody Modelling for Natural Speech Description and Generation.
  • E. Klabbers, J. van Santen, and J. Wouters (2005), "Minimizing the amount of pitch modification in speech synthesis", in S. Narayanan and A. Alwan (Eds.), Text to Speech Synthesis: New Paradigms and Advances, Prentice Hall Professional Technical Reference, Upper Saddle River, NJ, pp. 89-107.

2004



2003


2002


2001


2000


1998


1997


1996


Links


Esther Klabbers
Last modified: February 19 2013