========================================================================
                          --- OGIsable ---
                            version 1.5
                            October 2000

                    http://cslu.cse.ogi.edu/tts

                           Johan Wouters
                            Mike Macon

This file is additional code NOT INCLUDED with the Festival TTS system
as distributed by CSTR/University of Edinburgh. Festival is being
redistributed with permission of CSTR as part of the CSLU Toolkit.

The CSLU Toolkit and this code are
Copyright (c) 1997-2000
Center for Spoken Language Understanding
Oregon Graduate Institute of Science & Technology

Please see the file "license_ogi_tts.txt" for information on usage and
redistribution, and for a DISCLAIMER OF ALL WARRANTIES.
========================================================================

DESCRIPTION
-----------
OGIsable is a module for the Festival TTS system that provides
extensions to the SABLE markup standard.

This version of OGIsable has been designed to work with Festival
version 1.4.1, released Dec. 1999.

The module is distributed in two chunks:
  1) OGIsable-1.5.tar.gz  -- Scheme code, examples, DTD
  2) OGIsounds-1.5.tar.gz -- sounds to insert into the wave

You must first install the OGIresLPC package. Festival itself is
available from CSTR at http://www.cstr.ed.ac.uk/projects/festival.html.

INSTALLATION
------------
1) Uncompress and unpack all the tar files in the directory above
   festival/
   This will unpack the synthesizer code into your existing festival
   installation.

2) Add the following lines to the file festival/config/config

      ALSO_INCLUDE += OGIinsert
      ALSO_INCLUDE += OGIeffect

3) cd to the festival/ directory and remake festival. You should see
   the make routine compiling the new code in the
   festival/src/modules/OGIinsert directory. If not, 'make clean' on
   your whole installation and start over.

4) When festival is restarted, the new commands should be available in
   Festival.
   As a test, try starting festival, then running

      (tts "festival/examples/example.ogisable" 'ogisable)

   [We need to make a better example.] This one is more interesting:

      (tts "festival/examples/enh_example.ogisable" 'ogisable)

LICENSE
-------
This module and accompanying data are freely available for
non-commercial use only. Please see the file "license_ogi_tts.txt" for
information on usage and redistribution, and for a DISCLAIMER OF ALL
WARRANTIES.

BUGS and IMPROVEMENTS
---------------------
Please contact us via email if you find bugs or have suggestions for
improvements:

SUMMARY OF MARKUP TAGS
======================
Here's a quick overview of markup tags currently supported in
ogisable-mode. A tag can have zero or more attributes. For an example
of marked up text, see the file example.ogisable. All original SABLE
commands are also supported.

BREAK
  - LEVEL="LARGE" : this starts a new utterance.
      If LEVEL is set to NONE, a break cannot occur where the tag is
      inserted. If LEVEL has any other value, a minor phrase break is
      introduced.
  - MSEC="500" : this inserts a pause of a specified number of
      milliseconds (here, half a second)

AUDIO
  - SRC="laugh.wav" : this inserts a laugh sound in the synthesized
      speech. Other sounds currently available are:
        annoyed.wav   hihi.pk       relief.wav    sniff.wav
        surprise.wav  yawn.wav      hihi.wav      shh.wav
        breath.wav    hmm.wav       smack.wav     burp.wav
        laugh.wav     smack2.wav    victory.wav   haha.wav
        release.wav   sneeze.wav    uhm.wav

PRON
  - SUB="edin bruh" : this replaces the pronunciation of the tagged
      text
  - IPA="w oU 9r l d b E t" : specify pronunciation using Worldbet
  - SEGDUR="w 100; 3r 200; 9r 100; d 70" : specify pronunciation and
      phoneme durations

SPEAKER
  - NAME="abc" : this changes the voice to abc (Mexican Spanish male)
      OGI voices: mwm, tll, jph, aec, abc, hvs, axk, bcs, ogirab
      See http://cslu.cse.ogi.edu/tts/demos for a complete overview.

RATE
  - SPEED="-20%" : speaking rate 20% slower
      you can also specify presets of
      (fastest, fast, medium, slow, slowest)

PITCH
  - BASE="-10%" : this lets the average pitch drop by 10%
      you can also specify one of (highest,high,medium,low,lowest)
  - RANGE="+10%" : this increases the pitch range by 10% (more dynamic
      speech)
      you can also specify one of (largest,large,medium,small,smallest)
  - TOBI="H*;H-H%" : this enables specification of ToBI intonation
      events. If a separator ";" is found, the first label is assumed
      to be the pitch accent and the second label is the boundary tone.
      Not all legal ToBI types will affect the speech, since the F0
      generation module is trained on a limited set.
  - CONTOUR="0.1 100; 0.3 150; 1.0 100" : allows specification of an
      F0 contour. The first coordinate is relative time and the second
      coordinate is pitch in Hertz.

VOLUME
  - LEVEL="+10%" : increases volume by 10%
      you can also specify one of (loudest,loud,default,medium,quiet)
  - CONTOUR="0.1 1; 0.3 1.5; 1.0 2.0" : allows specification of a
      volume contour. The first coordinate is relative time and the
      second coordinate is relative gain.

MARKER
  - EMBED="some text" : this places a marker in the input text.
      This tag can be used to communicate specific timing information
      to the engine, e.g. to drive movements of an animated face.

SAYAS
  - MODE="phone" : pronounce tagged text (i.e. number) as a phone
      number
      current modes: phone, digits, literal, cardinal, ordinal,
      syllabify.
      The 'literal' mode will spell text or read digits.
      The 'syllabify' mode enables rhythmic syllabified saying of the
      tagged word(s).
  - MODETYPE="fluent" : For MODE="literal", this means the text is
      spelled fluently. The other option is MODETYPE="isolated".
      For MODE="syllabify", the MODETYPE options are "automatic" or a
      specification using Worldbet symbols, such as
      "s I . l @ . b I . f aI". The dots are used to indicate the
      syllable boundaries.
  - MODE="enhance" : implementation of sub-phoneme 'enhanced' cues,
      i.e. time stretching and amplification of transitions between
      pairs of phonemes.
      To specify which transition segments to enhance, set MODETYPE to
      be a scheme list with the following parameters:

      Tgain    - amplification gain factor to apply to transition
                 legal values: 0.0 - infinity
                               0.0 - 1.0 = attenuation
      Tstretch - time stretch factor to apply to transition
                 legal values: 0.01 - 100
                               0.01 - 1.0 = time compression

                 For both `gain' and `stretch' operations, only a
                 portion of the phonemes immediately adjacent to the
                 transition will be affected. In vowels, for example,
                 this proportion is set to 10%. So if you specify
                 Tstretch=6.0, the first 10% of the vowel will be
                 lengthened to 6 times its original length, and the
                 overall vowel duration will be (.9 + .1*6.0) = 1.5
                 times longer. The proportions for each of several
                 phoneme classes are currently set in the function
                 `get_breakpoint' in festival/lib/ogi_enhance.scm.

                 The F0 contour is also stretched so that F0 movements
                 stay aligned with the appropriate syllables after the
                 stretched segments.

      TfeatL,  - all transitions between a pair of segments that match
      TfeatR     ALL features in TfeatL,TfeatR will be affected. These
                 can be a list of any Festival features that can be
                 found related to a Segment item using item.feat. The
                 most useful are the following (see
                 ogi_worldbet_phones.scm for more):

        feat       legal values     explanation
        ----       ------------     -----------
        name       any valid phoneme name in the phoneset for a
                   particular voice
        ph_vc      + -              vowel (+) or consonant (-)
        ph_cvox    + - 0            consonant voicing (+ = VOICED)
        ph_ctype   s f a n l r 0    cons type: stop fricative affricate
                                    nasal lateral approximant
        ph_cplace  l a p b d v g 0  cons place: labial alveolar palatal
                                    labio-dental dental velar glottal

      For an example of this, see
      festival/examples/enh_example.ogisable.

      Enhancement functions are defined in festival/lib/ogi_enhance.scm.
      It's best to consult the code if you have questions about exactly
      how the functions work.

REFERENCES
-----------
J. Wouters, B. Rundle and M. W. Macon, "Authoring Tools for Speech
Synthesis using the Sable Markup Standard", Eurospeech'99.
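
EXAMPLE MARKUP (SKETCH)
-----------------------
The tags in the summary above can be combined in a single document.
The fragment below is only an illustrative sketch, not a copy of the
distributed example files. It assumes the standard SABLE root element
<SABLE>, omits the XML prolog and DOCTYPE line (copy those from
festival/examples/example.ogisable), and assumes the mwm voice and the
OGIsounds package are installed. The sentence text and the phone number
are made up for illustration; every tag and attribute is taken from the
summary above.

```xml
<SABLE>
<!-- Sketch only: prolog/DOCTYPE omitted, see example.ogisable. -->
<SPEAKER NAME="mwm">
Welcome to the demonstration.
<!-- Pause for half a second. -->
<BREAK MSEC="500"/>
<RATE SPEED="-20%">This sentence is spoken twenty percent slower.</RATE>
<PITCH BASE="-10%" RANGE="+10%">
This one has a lower average pitch and a wider pitch range.
</PITCH>
<!-- LEVEL="LARGE" starts a new utterance. -->
<BREAK LEVEL="LARGE"/>
<VOLUME LEVEL="+10%">This part is a little louder.</VOLUME>
<!-- Insert one of the sounds listed under AUDIO. -->
<AUDIO SRC="laugh.wav"/>
Call us at <SAYAS MODE="phone">5035551234</SAYAS>.
</SPEAKER>
</SABLE>
```

If saved under a name of your choosing, such a file would be played the
same way as the distributed examples, with (tts "yourfile" 'ogisable).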