CS506/606: Special Topics, Computational Linguistics

Richard Sproat

are double you ess at eks
					    oh bee a dot com

Fall 2009

TR 4-5:20, WCC403

Office Hours: Central Building, 130, Thursday 2-3:30

The following books will be used in this course. They are not required: if you can get by without them, fine. But I think you will find them useful, and you will likely need them if you plan to take Natural Language Processing.

It is assumed that you know how to program. For students who do not, Kristy Hollingshead has agreed to provide a quick intro to Linux scripting early in the quarter. We will discuss the details, as necessary, on the first day.

We will make use of publicly available tools, in particular the OpenFst toolkit. Versions of this will be installed on the CSLU server. You are also, of course, welcome to download and install the code on your own machine.


Your grade will depend upon the following components:


This course is about computational linguistics, by which is meant computational approaches to the study of language. These may have engineering applications, in which case the field shades into Natural Language Processing, or they may aim only at understanding more about how language works. In this course we will look at a few problems that fit into each of these categories.

This course will be organized around specific linguistic problems, each treated in its own unit. Each unit will take approximately two and a half weeks.

For each unit, we will present the problem, cover the background needed to understand one or more computational approaches to it, and finally present a computational solution. For the first two or three units, there will be an associated homework, due one week after the end of the unit.

Depending upon interest, we may also spend some time going over issues of text encoding.

Unit 1: Part-of-speech tagging.

9/29, 10/1, 10/6, 10/8, 10/13.


Slides: Unit 1


Homework: Homework 1

Unit 2: Morphology and Phonology.

10/15, 10/20, 10/22, 10/27, 10/29.




Homework: Homework 2

Unit 3: Syntax.

11/3, 11/5, 11/10*, 11/12, 11/17.


Slides: Unit 3


Homework: Homework 3

Unit 4: Statistical Methods in the Study of Ancient Symbols.



Slides: Unit 4

Reading: "A Computational Approach to Deciphering Unknown Scripts" (K. Knight and K. Yamada), Proceedings of the ACL Workshop on Unsupervised Learning in Natural Language Processing, 1999.

Final Two Weeks.

11/24, 12/1, 12/3.


4:00-4:15 Mahsa
4:15-4:30 Mark
4:30-4:45 Steve
4:45-5:00 Géza
5:00-5:15 Masoud
5:15-5:30 Nate


4:00-4:15 David
4:15-4:30 Zephy
4:30-4:45 Başak
4:45-5:00 Mike
5:00-5:15 Eric
5:15-5:30 Maider


4:00-4:15 Yongshun
4:15-4:30 Emily
4:30-4:45 Ethan
4:45-5:00 Charles
5:00-5:15 Chris

