News

9 July 2004

Regrettably, our costs associated with producing the corpora on CD for non-commercial use were not adequately covered by the previous price. Effective immediately, the cost per corpus is $300 US.

16 June 2003

CSLU Center memberships, commercial licensing and not-for-profit licensing agreements have been updated.

22 September 2002

Release of the Natcell-2.3 corpus with time-aligned phoneme transcriptions for the available orthographic word transcriptions, and final release of Stories-1.2 with time-aligned phoneme transcriptions (forced alignment) for all files with orthographic word transcriptions.

23 August 2002

Final releases of the Alphadigits, CWP, Names, Numbers, PortCell, SRX4, SSW, and YesNo corpora, including time-aligned phoneme transcriptions (forced alignment). Release of Apple Words and Phrases v1.3 with time-aligned phoneme transcriptions (forced alignment) for all files with orthographic word transcriptions.

19 August 2002

Final release of the Isolet corpus, including time-aligned phoneme transcriptions.

3 June 2002

Several corpora were updated with new files, and all corpora went through a consistency check. We are currently verifying all corpora for accuracy. The next step will be forced alignment of all English corpora that have existing orthographic transcriptions, so that we can provide automatic time-aligned phoneme transcriptions.

25 February 2002

A new corpus, the Spoltech Brazilian Portuguese corpus, has been released and is now available.

14 February 2002

Note that a license agreement must be signed by an authorized representative of the organization, whether it is a not-for-profit organization or a university.

7 February 2002

A new corpus, VOICES, has been released and is available for commercial use via a special licensing agreement, not as part of the standard membership agreement. The corpus consists of 12 speakers, with 50 phonetically rich sentences per speaker. The recording procedure involved mimicking.

28 January 2002

CSLU released National Cellular v2.2. The corpus contains cellular telephone speech from 2337 speakers from locations throughout the United States, of which 1996 speakers are transcribed.

7 December 2001

We are pleased to announce that the final version of the Kid's Speech Corpus has been released. This corpus contains spontaneous and scripted utterances from kids in grades K to 10.

15 October 2000

Regrettably, our costs associated with producing the corpora on CD were not adequately covered by the previous price. Effective immediately, the cost per corpus is $30 US.

15 October 2000

All outstanding corpora orders have been filled and shipped. Watch the skies.

1 October 2000

We are pleased to announce the first release of the Kid's Speech Corpus. This corpus contains about 1000 two-minute spontaneous conversations with kids in grades 3-10.

15 September 2000

CSLU hosts over 50 speech technology professionals from more than 20 companies. Yes, we have finally finished the comprehensive review and CD conversion of all our corpora. We are burning CDs now and catching up on orders. Thanks for your patience.

24 July 2000

In order to provide easier access to our corpora, CSLU is transferring all of its corpora to CD format. Before burning the corpora onto CD, we will be performing the following tasks:
  • Converting files from NIST to RIFF format
  • Re-organizing the corpora file structure
  • Removing invalid or incorrect files
  • Updating the documentation to reflect the changes
  • Updating the web pages to reflect the changes

Our goal is to have the corpora updated and burned to CD as soon as possible; however, this will entail a delay of three to four weeks. Since we do not wish to further delay orders, we can send the corpora in the original DAT format. We will update this web site each time a corpus has been updated and burned to CD. Current orders are entitled to the upgraded version of a corpus on CD when it becomes available.
To receive the updated corpora, email cronk@ece.ogi.edu. BE SURE TO INCLUDE THE NAME OF THE ORIGINAL CORPORA YOU ORDERED AS WELL AS YOUR ORIGINAL INVOICE NUMBER.
Please accept our apologies for these delays while we improve the quality of our corpora for our fellow researchers.

Corpora Status

Corpus                            Completed
22 Language                       September 15
Alphadigit                        30 June 2000
Apple Words and Phrases           07 September 2000
Cellular Words and Phrases        16 August 2000
Foreign Accented English          07 August 2000
ISOLET                            31 July 2000
Multi-Language Telephone Speech   September 15
Names                             07 August 2000
National Cellular                 07 July 2000
Numbers                           18 August 2000
Portland Cellular                 September 15
Speaker Recognition               September 15
Spelled and Spoken Words          September 15
SR4X                              24 July 2000
Stories                           31 July 2000
Yes/No                            19 July 2000

30 May 2000

Our corpora are moving to CD-ROM. In the past we have distributed corpora on DAT. The process is expected to take about a week. All corpus shipments will be delayed until the transition is complete. We'll make an announcement when the job is finished.

27 May 2000

Welcome four new phonetic transcribers to our team! See People to meet Flink, Lisa, Trina, and Kay.

11 May 2000

We are currently compiling a cellular speech corpus and a corpus for research into speaker recognition, as part of an initiative in human language technology supported by the National Science Foundation and DARPA.
JOB ANNOUNCEMENT: Phonetic Transcriber. We are currently hiring phonetic transcribers. The OGI Employment page has the official announcement.
Center members are entitled to commercial use of our corpora as part of the membership agreement. Non-members may request our corpora for non-commercial research purposes. We have instituted a $20 US per corpus fee to cover our costs of shipping, media, and handling.