Publications

Most CSLU corpora include transcriptions of their speech data. These are usually non-time-aligned word-level transcriptions, but a few corpora also contain time-aligned word-level and phonetic transcriptions. To ensure consistency across transcriptions, the following guide has been developed. It describes the word-level and phonetic transcription conventions for each of the languages in which we have speech data.

Conference Papers

The following are conference papers that provide detailed descriptions of many of the CSLU corpora.

Please also see the CSLU publications page for a complete listing of CSLU publications.
  • Khaldoun Shobaki, John-Paul Hosom, and Ronald Cole. The OGI Kids' Speech Corpus and Recognizers. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), Beijing, China, October 2000.
  • Ronald Cole, Mike Noel, and Victoria Noel. The CSLU Speaker Recognition Corpus. In Proceedings of ICSLP, Sydney, Australia, 1998.
  • R. A. Cole, M. Noel, T. Lander, and T. Durham. New telephone speech corpora at CSLU. In Proc. of the Fourth European Conference on Speech Communication and Technology, Madrid, Spain, September 1995.
  • T. Lander, R. A. Cole, B. T. Oshika, and M. Noel. The OGI 22 language telephone speech corpus. In Proc. of the Fourth European Conference on Speech Communication and Technology, Madrid, Spain, September 1995.
  • R. A. Cole, M. Fanty, M. Noel, and T. Lander. Telephone speech corpus development at CSLU. In Proceedings of the International Conference on Spoken Language Processing (ICSLP), Yokohama, Japan, September 1994.
  • R. A. Cole, M. Noel, D. C. Burnett, M. Fanty, T. Lander, B. Oshika, and S. Sutton. Corpus development activities at the Center for Spoken Language Understanding. In Proc. of the ARPA Workshop on Human Language Technology, April 1994.
  • R. A. Cole, B. T. Oshika, M. Noel, T. Lander, and M. Fanty. Labeler agreement in phonetic labeling of continuous speech. In Proceedings of the International Conference on Spoken Language Processing, pages 2131-2134, Yokohama, Japan, September 1994.
  • R. A. Cole, Y. Muthusamy, and M. Fanty. The ISOLET Spoken Letter Database. 1994.
  • Y. K. Muthusamy, R. A. Cole, and B. T. Oshika. The OGI Multi-language Telephone Speech Corpus.
  • R. A. Cole, K. Roginski, and M. Fanty. A Telephone Speech Database of Spelled and Spoken Names.
  • Jacques de Villiers, Pieter Vermeulen, and Mark Fanty. Digital Data Collection at CSLU. 1994.