========================================================================
                          --- OGIresLPC ---
                     OGI Residual LPC Synthesizer
                             version 2.2
                            December 2006

                     http://cslu.cse.ogi.edu/tts

                           Esther Klabbers
                           Alexander Kain
                           Johan Wouters
                           Mike Macon
                           Andrew Cronk

   This file is additional code NOT INCLUDED with the Festival TTS
   system as distributed by CSTR/University of Edinburgh.

   Copyright (c) 2006
   Center for Spoken Language Understanding
   OGI School of Science & Engineering @ OHSU

   Please see the file "license_ogi_tts.txt" for information on usage
   and redistribution, and for a DISCLAIMER OF ALL WARRANTIES.
========================================================================

DESCRIPTION
-----------

This package is a drop-in module for the Festival TTS system created by
CSTR at the University of Edinburgh. This version of OGIresLPC has been
designed to work with Festival version 1.96 and older, released Jul
2004. It is meant to serve as a simple baseline synthesizer in the CSLU
Toolkit and for other projects.

If you have ftp'ed all the files, the distribution includes the
following:

   1) OGIresLPC-2.2.tar.gz  -- the C++ and Scheme code

   2) OGIlexicon-2.2.tar.gz -- a pronunciation lexicon that started as
                               a union of CMUdict and Moby, with many
                               entries changed since.

   3) a bunch of voices:

      voice_as_di-2.3.tar.gz      -- American female voice
      voice_mwm5_di-2.3.tar.gz    -- American male voice
      voice_aec_di-2.3.tar.gz     -- American male voice
      voice_jph_di-2.3.tar.gz     -- American male voice
      voice_tll_di-2.3.tar.gz     -- American female voice
      voice_ogirab_di-2.1.tar.gz  -- British English male voice using
                                     diphones from CSTR
      voice_convert_di-2.1.tar.gz -- TLL and JPH voices mapped from MWM
                                     using a voice conversion technique

You must first install the OGIresLPC package in order to use the voices.

Festival itself is available from the Festvox website at
http://www.festvox.org

INSTALLATION
------------

OGIresLPC:

   1) Uncompress and unpack all the tar files in the directory directly
      above the festival directory.
      This will unpack the synthesizer code into your existing festival
      installation.

   2) Add the following line to the file festival/config/config.in and
      run ./configure:

         ALSO_INCLUDE += OGIcommon OGIeffect OGIinsert OGIdbase OGIresLPC

   3) Add the following modules to the file
      festival/src/modules/Makefile:

         OPTIONAL = OGIcommon OGIeffect OGIinsert OGIdbase OGIresLPC

   4) cd to the festival/ directory and remake festival. You should see
      the make routine compiling the new code in the
      festival/src/modules/OGIresLPC directory and the
      festival/src/modules/OGIcommon directory. If not, run 'make clean'
      on your whole installation and start over.

   5) Be sure you have unpacked everything (except perhaps the voices)
      from the Edinburgh distribution, specifically the *POSLEX.tar
      files (found in the Festvox dir where you got Festival).

   6) When festival is restarted, the commands (voice_???_diphone), with
      ??? replaced by the particular voice names above, should be
      available in Festival.

   7) If you unpack new voices at a later time, no recompilation is
      needed.

USING VOICES
------------

When properly installed, you should be able to issue the festival
command

   festival> (voice_???_diphone)

where ??? is one of {as, mwm5, aec, jph, tll, mwm2tll, mwm2jph, ogirab},
depending on which of the optional modules you have installed.
Subsequent synthesis will then use the selected voice.

Notes:

   1) The voices AEC, JPH, and TLL were created by recording nonsense
      words and then _automatically_ aligning phonetic labels to them
      using an HMM speech recognizer in the CSLU Toolkit. The MWM5 and
      AS diphones were manually labeled.

CHANGES
-------

1. For American English, a generic voice configuration file called
   ogi_configure_voice.scm has been created that sets the modules to be
   used by Festival. Previously each voice had its own long voice
   configuration file for this purpose, which resulted in a fair amount
   of overlap.
   Replacing or adding a module meant adding the same line in the voice
   configuration file of each voice. Since most modules only change per
   language, it is more elegant to group them together in a single
   voice configuration file, and override certain speaker-specific
   parameters in the voice configuration file of that voice.

   The voice configuration files have recently been changed to include
   the extension "diphone" in the voice name. In the past this was taken
   care of in the ogi_configure_voice file, but in the future we may be
   working on other databases besides diphones.

   The following modules are set in the generic voice configuration
   file:

   Lexicon
      ogi_lexicons.scm: defines the command (setup_ogi_lex), which loads
      ogi_lex.out. This lexicon has recently been recreated using a new
      syllabification algorithm. See Syllabification. The distinction
      between primary and secondary stress has been preserved.

      ogi_lex_addenda.scm: defines (function_words_addenda), which lists
      a set of function words whose stress levels are set to 0. This
      avoids the assignment of accents to these words.

   Phone set
      ogi_worldbet_phones.scm. This set has not changed much over time.
      However, the consensus now is not to use the phone iU any more,
      but to use j u or j U instead. The following features have also
      been changed:

         (aU + d 3 2 - 0 0 0)   ;; how
      to
         (aU + d 3 2 + 0 0 0)   ;; how
      (This was done because the diphthong gets rounded towards the end,
      and to distinguish it by feature from the diphthong aI.)

         (dx - s 0 0 0 s a -)   ;;; flap
      to
         (dx - 0 0 0 0 s a +)   ;;; flap
      (dx is a flap and not a vowel, so the feature s for short vowel
      doesn't make any sense here.)

         (S - 0 0 0 0 f a -)
      to
         (S - 0 0 0 0 f p -)
      (The feature place of articulation is moved from alveolar to
      palatal, to distinguish it by feature from the consonant s.)

         (? - 0 0 0 0 0 0 +)
      (This line adds the glottal stop.
      The function add_glottal_stop in ogi_postlex.scm adds a glottal
      stop between a word-final vowel and a following word-initial
      vowel when the second vowel is stressed).

   Token to word rules
      This module uses the standard Festival token to word rules as
      defined in token.scm

   POS tagger
      This module uses the standard Festival POS tagger as defined in
      pos.scm

   Postlexical rules
      ogi_postlex.scm: contains functions for reducing vowels etc.
      Previously, the file ogi_hack.scm was used for defining functions
      such as plosive_hack and flap_hack. This file is now redundant.

      The function plosive_hack has been rewritten as
      plosive_aspiration, which is now part of ogi_postlex.scm. It adds
      allophone features for aspirated unvoiced plosives. Unvoiced
      plosives are aspirated when they occur in the onset of a stressed
      syllable and are not preceded by an s in that onset.

      The function flap_hack has been moved from ogi_hack.scm to
      ogi_postlex.scm. It changes the allophone feature of t to dx when
      it occurs intervocalically.

   Syllabification
      A new syllabification algorithm has been created. The function is
      defined in ogi_syllabify.scm. It calls a CFG rewrite grammar
      defined in ogi_lts_syllabify.scm. It uses the lts
      (letter-to-sound) parser. The new lexicon was created by calling:

         festival> (ogi_syllabify.scm 'newlex.scm 'ogi_lex.out)

      Words that are not in the lexicon are syllabified by setting the
      postlexical hook syllabify_lts.

      A function called secondary_stress_hack has been written that
      changes the stress from 2 to 0. This is necessary because the
      duration and intonation modules at present cannot deal with
      secondary-stressed syllables. However, it is still better to
      consider them unstressed than stressed, as was the case in the
      old lexicon. Additionally, we don't have to change the lexicon if
      we ever do want to use that information.

   Phrase prediction
      Phrase prediction algorithms are defined in ogi_phrase.scm. This
      has not been improved on for a long time, and still needs some
      major improvements.
      At the moment, the phrasing algorithm points to the very simple
      function OGI_PuncPhrasify, which uses only the punctuation to
      assign phrase breaks.

   Accent and tone prediction
      This module uses the standard Festival intonation module as
      defined in tobi.scm. It uses a Classification and Regression
      Tree. In the next release of the CSLU Toolkit, a binary version
      will be available of a new intonation module developed at CSLU
      that is based on the Bell Labs superpositional approach to
      intonation.

   F0 prediction
      This module uses the standard Festival software as defined in
      f2bf0lr.scm. In the next release of the CSLU Toolkit, a binary
      version will be available of a new intonation module. The
      parameters for target F0 mean and standard deviation default to
      170 and 34, respectively. Speaker-specific values are set in the
      speaker's voice configuration file.

   Duration prediction
      This module uses a CART as defined in ogi_kddurtreeZ_wb.scm. In
      the next release of the CSLU Toolkit, a binary version will be
      available of a new duration module that is based on the Bell Labs
      sums-of-products approach.

   Synthesis
      This module uses OGI's resLPC method. The default parameters are
      set in the generic voice definition file. For female voices, some
      of the parameters are reset in their voice configuration file.

2. A new diphone voice was created (voice_as_diphone). This 22 kHz US
   female voice was recorded at OGI in a professional recording studio.

CITATIONS
---------

If you find this module useful for your own research projects, please
cite our work in your publications. Citations in your papers should
appear like this:

   M. Macon, A. Cronk, J. Wouters, and A. Kain, "OGIresLPC: Diphone
   synthesiser using residual-excited linear prediction", Technical
   Report CSE-97-007, Department of Computer Science, Oregon Graduate
   Institute of Science and Technology, September 1997.

   A. Kain and M. W. Macon, "Spectral Voice Conversion for
   Text-to-Speech Synthesis," Proc.
   of International Conference on Acoustics, Speech, and Signal
   Processing, Vol. 1, pp. 285-288, 1998.

To download these and other publications by our group, please see our
web page.

LICENSE
-------

This module and accompanying data are freely available for
non-commercial use only, and are covered under the same licensing
agreement as the CSLU Toolkit. Please see the file
"license_ogi_tts.txt" for information on usage and redistribution, and
for a DISCLAIMER OF ALL WARRANTIES.

Please contact CSTR for information regarding commercial use of
Festival or the British English diphone set.

BUGS and IMPROVEMENTS
---------------------

Please contact us via email if you find bugs, have suggestions for
improvements, or would like to be informed of future releases of the
module.
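
Illustration of the aspiration rule: the condition given under
"Postlexical rules" (an unvoiced plosive is aspirated when it is in the
onset of a stressed syllable and is not preceded by an s in that onset)
can be sketched roughly as below. This is only a Python illustration of
the rule as worded in this file, not the Scheme code from
ogi_postlex.scm. The names Syllable, UNVOICED_PLOSIVES and
aspirated_onset_indices are hypothetical, the phone labels p, t, k are
assumed, and "preceded by an s" is read here as "an s occurs anywhere
earlier in the same onset".

```python
from dataclasses import dataclass
from typing import List

# Assumption: the unvoiced plosives are labelled p, t and k.
UNVOICED_PLOSIVES = {"p", "t", "k"}


@dataclass
class Syllable:
    """Hypothetical stand-in for a syllable; not a Festival structure."""
    onset: List[str]  # consonants before the vowel, in order
    stress: int       # 1 = stressed, 0 = unstressed


def aspirated_onset_indices(syl: Syllable) -> List[int]:
    """Return onset positions of unvoiced plosives the rule would aspirate."""
    # Only onsets of stressed syllables are candidates.
    if syl.stress != 1:
        return []
    return [
        i
        for i, phone in enumerate(syl.onset)
        # Skip plosives that follow an s within the same onset.
        if phone in UNVOICED_PLOSIVES and "s" not in syl.onset[:i]
    ]


if __name__ == "__main__":
    print(aspirated_onset_indices(Syllable(["t"], 1)))       # plain onset
    print(aspirated_onset_indices(Syllable(["s", "t"], 1)))  # s-cluster
    print(aspirated_onset_indices(Syllable(["t"], 0)))       # unstressed
```

The sketch returns positions instead of setting an allophone feature,
so the condition can be checked on its own without any Festival
utterance structure.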