Distribution of Corpora

One of CSLU's greatest contributions to the speech recognition research community (and related fields) is the development and distribution of speech corpora. Since the early 1990s, CSLU has been collecting, processing, and distributing corpora worldwide at no cost to academic institutions and non-profit organizations. CSLU currently distributes 17 corpora, which appear on the list of available corpora.

Corpus Distribution by Corpus

The following table shows how often each corpus has been requested.
Corpus               Quantity
22 Language          102
Alphadigit           70
Apple                54
CWP                  74
FAE                  24
ISOLET               116
MLTS                 132
MLTS (mem only)      77
Names                66
National Cellular    45
Numbers              137
Portland Cellular    53
SR4X                 54
Spkrec               47
Stories              118
Whitepages           88
Yes/No               52
Total                1309
 
Corpus Distribution by Year

The following table presents the number of corpora distributed each year.

Year     Quantity
1992     13
1993     11
1994     53
1995     93
1996     245
1997     316
1998     120
1999     458
Total    1309
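
As a quick cross-check of the arithmetic in the yearly table, the short Python sketch below sums the per-year counts and reports each year's share of the total. The counts are copied from the table; the names yearly_counts, total_distributed, and share_by_year are hypothetical and chosen only for illustration.

```python
# Yearly distribution counts, copied from the table above.
yearly_counts = {
    1992: 13,
    1993: 11,
    1994: 53,
    1995: 93,
    1996: 245,
    1997: 316,
    1998: 120,
    1999: 458,
}


def total_distributed(counts):
    """Return the sum of all per-year quantities."""
    return sum(counts.values())


def share_by_year(counts):
    """Return each year's percentage of the total, rounded to one decimal."""
    total = total_distributed(counts)
    return {year: round(100 * n / total, 1) for year, n in counts.items()}


total = total_distributed(yearly_counts)
# Year with the largest number of corpora distributed.
peak_year = max(yearly_counts, key=yearly_counts.get)

print(f"Total distributed: {total}")
print(f"Peak year: {peak_year}")
for year, pct in share_by_year(yearly_counts).items():
    print(f"{year}: {pct}%")
```

The same approach could be applied to the per-corpus table by swapping in its counts.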
 
Corpus Distribution by Country

The following table lists each country to which corpora have been sent, along with the number sent to that country.

Country              Quantity
Australia            50
Belgium              29
Brazil               20
China                8
Denmark              4
England              15
Germany              27
Hong Kong            73
Iceland              6
Iran                 2
Israel               47
Japan                13
Malaysia             3
New Zealand          9
Philippines          1
Portugal             6
Romania              16
Scotland             2
Slovakia             4
South Korea          11
Sweden               19
Taiwan               5
Turkey               4
USA                  594
Belarus              4
Bolivia              2
Canada               22
Czech                4
Egypt                7
France               40
Greece               5
Hungary              3
India                28
Ireland              5
Italy                37
Korea                24
Mexico               6
Peru                 3
Poland               18
Republic of Korea    1
Russia               9
Singapore            29
Slovenia             7
South Africa         5
Spain                18
Switzerland          5
The Netherlands      14
UK                   26
unknown              17