Try the demos. They are available in .au (Sun), .wav (PC), and .mp3 (PC) formats.
This is Festival using the OGIresLPC synthesis module and diphone-based voices developed at OGI in 1997.
These examples were synthesized from MIDI file input. A sinusoidal model analysis was performed on a single recording of the syllables "/L/ /AA/" and "/L/ /EY/". The sinusoidal model then synthesizes the musical phrase from these prototype syllables, driven by control functions specified in the MIDI file.
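As a rough illustration of how MIDI note data can drive a sinusoidal synthesizer, the sketch below converts MIDI note numbers to fundamental frequencies and sums harmonically related sinusoids. It is not the system used for the demos: the function names are hypothetical, and the fixed harmonic amplitudes are an assumption standing in for the parameters that the real system obtains by analyzing the recorded syllables.

```python
import math

def midi_to_hz(note):
    """Convert a MIDI note number to Hz (equal temperament, note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def synthesize_note(note, duration_s, harmonic_amps, sample_rate=16000):
    """Sum harmonics of the note's fundamental.

    harmonic_amps[k-1] is the (assumed, fixed) amplitude of harmonic k.
    Harmonics at or above the Nyquist frequency are skipped to avoid aliasing.
    """
    f0 = midi_to_hz(note)
    n_samples = int(round(duration_s * sample_rate))
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = 0.0
        for k, amp in enumerate(harmonic_amps, start=1):
            if k * f0 < sample_rate / 2:
                value += amp * math.sin(2.0 * math.pi * k * f0 * t)
        samples.append(value)
    return samples

def synthesize_phrase(notes, harmonic_amps, sample_rate=16000):
    """Render a list of (midi_note, duration_s) pairs back to back.

    Phase is restarted at every note, which a real system would avoid.
    """
    out = []
    for note, duration_s in notes:
        out.extend(synthesize_note(note, duration_s, harmonic_amps, sample_rate))
    return out
```

For example, `synthesize_phrase([(60, 0.5), (64, 0.5), (67, 1.0)], [1.0, 0.5, 0.25])` renders a C major arpeggio. A singing synthesizer would additionally vary the harmonic amplitudes over time to follow the vowel and consonant spectra.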
We have also extended the synthesis capability to cover arbitrary English utterances. The next example was synthesized using a set of phonetic transition units stored in an online inventory. At runtime, the system selects the units it needs, concatenates them, and then performs smoothing and pitch/time-scale modification. For more details, see our paper presented at ICASSP 1997 in Munich.
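The concatenate-then-smooth step can be pictured with a minimal sketch. The text does not say how the smoothing is done, so the code below substitutes a plain linear crossfade on waveform samples as a stand-in; the function name and the `overlap` parameter are hypothetical, and pitch/time-scale modification is left out entirely.

```python
def crossfade_concat(units, overlap):
    """Join sample lists, blending `overlap` samples at each boundary.

    The tail of the running output is faded out while the head of the
    next unit is faded in, which softens the discontinuity at the join.
    """
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        for i in range(n):
            w = (i + 1) / (n + 1)          # weight of the incoming unit
            j = len(out) - n + i           # matching position in the output tail
            out[j] = (1.0 - w) * out[j] + w * unit[i]
        out.extend(unit[n:])
    return out
```

With `overlap=0` this reduces to plain concatenation. Each join shortens the result by the number of blended samples, so a real system would account for that when scheduling durations.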
Find out more about singing synthesis research and our new Festival-based system for singing synthesis, called Flinger (Festival singer).
Voice conversion is a technique that modifies a source speaker's speech so that it sounds as if it were spoken by a target speaker. Speech synthesizers that use voice conversion allow developers to create more voices from a single database, and allow users to personalize the synthesizer to speak with any desired voice after a training period. The following sound files were synthesized using Festival in conjunction with a voice conversion module.
To begin the demonstration, first listen to the synthetic source and the desired, natural target speaker by clicking on the bubbles below (left half for .au and right half for .wav format). During training, we create mapping models that describe the relationship between the source and the target speaker. The storage requirements for these models are several orders of magnitude smaller than those of a diphone database. Now listen to the simulated target speaker. The demonstrated conversion algorithm transforms the pitch (rate of vibration of the vocal folds) and the spectrum (frequency components) of the source speech, but leaves the residual (detailed description of the movement of the vocal folds) unchanged.
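To give a feel for what a trained mapping model can look like, here is a minimal sketch of the pitch part only. It uses mean and standard deviation matching of log-F0, a common and simple pitch mapping; the text does not state which mapping the demo uses, so treat this as an assumption. The function names are hypothetical, and the spectral mapping is not shown.

```python
import math

def log_f0_stats(f0_values):
    """Training: mean and standard deviation of log-F0 over voiced frames.

    Frames with F0 <= 0 are treated as unvoiced and ignored.
    """
    logs = [math.log(f) for f in f0_values if f > 0]
    if not logs:
        raise ValueError("no voiced frames")
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(var)

def convert_f0(f0_values, src_stats, tgt_stats):
    """Conversion: shift and scale source log-F0 to match target statistics."""
    src_mean, src_std = src_stats
    tgt_mean, tgt_std = tgt_stats
    scale = tgt_std / src_std if src_std > 0 else 1.0
    out = []
    for f in f0_values:
        if f <= 0:
            out.append(0.0)  # keep unvoiced frames unvoiced
        else:
            out.append(math.exp(tgt_mean + scale * (math.log(f) - src_mean)))
    return out
```

In this sketch the pitch model for a speaker pair is just four numbers, which hints at why mapping models can be far smaller than a database of recorded units.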
For more information, see the research section for voice conversion.