Numbers v1.3


General Description
The Numbers Corpus is a collection of naturally produced numbers. The utterances were taken from other CSLU telephone speech data collections, and include isolated digit strings, continuous digit strings, and ordinal/cardinal numbers. A total of 23902 files are included in this corpus.

Recording Details
The data in this corpus were collected over both analog and digital telephone lines.

The analog data were recorded using a Gradient Technologies analog-to-digital conversion box. These files were recorded at 8 kHz with 16-bit samples and stored in a linear format.

The digital data were recorded with the CSLU T1 digital data collection system. These files were sampled at 8 kHz with 8-bit samples and stored as ulaw files.

All of the data have been linearly encoded in the 16-bit RIFF standard file format.
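As a quick sanity check on the format described above, the following minimal sketch reads the header of a RIFF/WAVE file with Python's standard wave module and compares it with the stated 8 kHz, 16-bit encoding. The function names describe_wav and matches_corpus_format are hypothetical helpers, not part of the distribution, and the sketch assumes the files are plain PCM WAVE files that the wave module can open.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        bits = wav.getsampwidth() * 8      # sample width is reported in bytes
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, bits, channels, duration


def matches_corpus_format(path):
    """True if the file is 8 kHz, 16-bit, as the documentation describes."""
    rate, bits, _channels, _duration = describe_wav(path)
    return rate == 8000 and bits == 16
```

Calling matches_corpus_format on a file from the speech directory should return True if the file follows the description above; the channel count is reported but not checked, since this documentation does not state it.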

Directory Structure
There are five top-level directories in this distribution: docs, speech, labels, trans, and misc. The docs directory contains assorted documentation files.

The speech, trans, and labels directories contain the data files, which have the following name structure:
xxxxx = call number
y = utterance code
zzz = file extension (txt/wav/phn)

For example:

This utterance is from caller 1016 and contains numbers from a street address.

Corresponding text and phonetic transcriptions can be found in these files:

These audio and text files are subdivided into directories based on their call number integer-divided by 100 (for call 1016, this gives 10). So, these files would be found in /numbers/speech/10, /numbers/trans/10, and /numbers/labels/10, respectively.
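The directory rule can be sketched in a few lines of Python. This is an illustration only: the helper names call_subdir and data_dirs are hypothetical, the root /numbers is taken from the paths quoted above, and whether subdirectory names are zero-padded for small call numbers is not stated here, so the sketch uses the plain integer.

```python
from pathlib import PurePosixPath


def call_subdir(call_number):
    """Subdirectory name: the call number integer-divided by 100."""
    return str(call_number // 100)


def data_dirs(call_number, root="/numbers"):
    """Map each data type to the directory holding files for this call."""
    sub = call_subdir(call_number)
    return {
        kind: str(PurePosixPath(root) / kind / sub)
        for kind in ("speech", "trans", "labels")
    }
```

For call 1016, data_dirs(1016) yields the three directories named above.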

The text transcriptions were performed according to the non time-aligned word-level conventions described in the CSLU Labeling Guide.

Phonetic transcriptions are plain text files that carry time-aligned phonetic labels. The first two lines of the file are a header, which defines the length of a "frame" in milliseconds. The rest of the file consists of lines, each holding two numbers that define a frame range and a label that applies to that region. For example:
		MillisecondsPerFrame: 1.000000
		2 113 .pau
		113 191 w
		191 267 ^
		267 395 n        

So, we can see here that a frame corresponds to 1 millisecond (ms) of time, and that from 2 to 113 ms into the file, there is a pause (.pau), with the first phoneme (w) starting at 113 ms and stretching to 191 ms.
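A reader for this label format might look like the following minimal sketch. It assumes the layout shown in the excerpt: a header line beginning with "MillisecondsPerFrame:" followed by lines of the form "start end label". The function name parse_phn is hypothetical, and any line that matches neither pattern (such as the other header line, whose content is not shown here) is simply skipped.

```python
def parse_phn(text):
    """Parse label text into a list of (start_ms, end_ms, label) tuples."""
    ms_per_frame = None
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0] == "MillisecondsPerFrame:" and len(fields) >= 2:
            ms_per_frame = float(fields[1])
            continue
        # A label line: two integer frame indices, then the label.
        if len(fields) == 3 and fields[0].isdigit() and fields[1].isdigit():
            segments.append((int(fields[0]), int(fields[1]), fields[2]))
    if ms_per_frame is None:
        raise ValueError("no MillisecondsPerFrame header found")
    # Convert frame indices to milliseconds using the header value.
    return [(s * ms_per_frame, e * ms_per_frame, lab) for s, e, lab in segments]
```

Applied to the excerpt above, the first two entries returned are (2.0, 113.0, ".pau") and (113.0, 191.0, "w"), matching the reading given in the text.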