CSE 562/662 - Natural Language Processing

Instructor: Brian Roark

Class time: MW 11:30 AM - 12:50 PM    Jan. 9 - Mar. 24, 2006

Class location: Wilson Clark Center - Room 403

Office hours: W 2-4, Central 157, or by appointment

Required textbook: Jurafsky and Martin, Speech and Language Processing
(on reserve at the library)



The goal of this course is to give a broad but detailed introduction to the key algorithms and modeling techniques used for Natural Language Processing (NLP) today. With a few exceptions, NLP involves taking a sequence of words as input (e.g., a sentence) and returning some annotation(s) for that string. Well-known examples include part-of-speech tagging and syntactic parsing. Many other common tasks, e.g., shallow parsing or named-entity recognition, can easily be recast as tagging tasks; hence certain basic techniques can be widely applied within NLP. Applications such as automatic speech recognition, machine translation, information extraction, and question answering all make use of NLP techniques. By the end of this course, you should understand how to approach common natural language problems arising in these and other applications.
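To make the recasting idea concrete, here is a minimal sketch (our own toy example, not from the textbook) of how a named-entity recognition problem becomes a tagging problem via the standard BIO encoding: each token gets a tag, with B- marking the beginning of an entity and I- marking its continuation.

```python
# Sketch: recasting named-entity recognition as a tagging task via BIO encoding.
# The sentence and entity spans below are illustrative, not course data.

def to_bio(tokens, entities):
    """Map entity spans (start, end, label), end exclusive, onto per-token BIO tags."""
    tags = ["O"] * len(tokens)          # "O" = outside any entity
    for start, end, label in entities:
        tags[start] = "B-" + label      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label      # tokens inside the entity
    return tags

tokens = ["Brian", "Roark", "teaches", "at", "OGI", "."]
entities = [(0, 2, "PER"), (4, 5, "ORG")]
print(list(zip(tokens, to_bio(tokens, entities))))
```

Once entities are encoded this way, any sequence tagger (e.g., an HMM-based POS tagger) can be trained on the tag sequences directly.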


CSE 560/660 - Artificial Intelligence is a prerequisite for this course. There is no official programming language for this course, but completing the assignments will require a fair amount of programming, so facility with some programming language is assumed.


Your grade will be based 10% on in-class discussion, 50% on the homework assignments, and 20% each on the midterm and final exams.

What we'll cover and an approximate schedule

Roughly speaking, half of the course will be devoted to finite-state methods, and half to context-free methods (or beyond). Algorithms for annotating linguistic structure will always be presented with statistical variants, which provide the basis for disambiguation.
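As a small preview of the statistical side, the following sketch (a toy corpus of our own, not an example from the readings) shows a bigram model with add-one (Laplace) smoothing, one of the simplest of the smoothing techniques covered in the n-gram lectures: every bigram count is incremented by one so that unseen word pairs receive nonzero probability.

```python
# Sketch: bigram language model with add-one (Laplace) smoothing.
# Toy corpus for illustration only.
from collections import Counter

corpus = ["<s> the cat sat </s>", "<s> the dog sat </s>"]
sentences = [sent.split() for sent in corpus]

unigrams = Counter(w for sent in sentences for w in sent)
bigrams = Counter(b for sent in sentences for b in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size

def p_add_one(w_prev, w):
    """P(w | w_prev) = (c(w_prev, w) + 1) / (c(w_prev) + V)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

print(p_add_one("the", "cat"))    # seen bigram
print(p_add_one("the", "zebra"))  # unseen bigram: still nonzero
```

Note that add-one smoothing is known to perform poorly in practice; the Chen and Goodman reading surveys much better alternatives (Katz backoff, Kneser-Ney, etc.).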

I have listed tentative reading assignments from the Jurafsky and Martin (J&M) textbook that are associated with each section. These should be taken as a rough guide, as the schedule will almost certainly change over the term.

Date   | Topic                                                                                  | Reading                          | Assignment  | FAQ
Jan 9  | Introduction to NLP; applications using NLP; Chomsky hierarchy                         | J&M Ch. 1, 13                    |             |
Jan 11 | Weighted finite-state automata and transducers; n-grams and smoothing                  | J&M Ch. 2, 6                     | HW 1        | FAQ1
Jan 13 | Optional tutorial on regular expression processing, CSEE Central 123, 11-noon          |                                  |             |
Jan 18 | Finite-state morphology                                                                | J&M Ch. 3                        | HW 2, pt. 1 | FAQ2
Jan 23 | Phonology and pronunciation                                                            | J&M Ch. 4, 5                     | HW 2, pt. 2 |
Jan 25 | N-grams and smoothing (cont.); grammar and lexicon composition                         | C&G, 1998; MP&R, 2001            |             |
Jan 30 | POS-tagging and class-based language modeling                                          | J&M Ch. 8 (new Ch. 4)            |             |
Feb 1  | Expectation Maximization (EM) for smoothing and word classes                           | JB, 1997, sec. 4.0-1             | HW 4        | FAQ4
Feb 6  | Machine learning for tagging                                                           | MC, 2002; S&P, 2003              |             |
Feb 8  | Context-free grammars (CFGs); treebanks; probabilistic CFGs                            | J&M Ch. 9                        |             |
Feb 13 | Midterm Exam                                                                           |                                  |             |
Feb 15 | Context-free parsing (introduction); CYK                                               | J&M Ch. 12.1; fixed algorithm    |             |
Feb 22 | Context-free parsing (cont.); Earley, shift-reduce, top-down, left-corner; grammar transformations | J&M Ch. 10; MJ, 1998(a) |          |
Feb 27 | Context-sensitive grammars; unification                                                | J&M Ch. 11                       |             |
Mar 1  | Tree-adjoining grammar; categorial grammar                                             | J&S, 1997, sec. 1-4, 8; MS, 1999 |             |
Mar 6  | Statistical parsing approaches (1)                                                     | J&M Ch. 12; MJ, 1998(b)          |             |
Mar 8  | Statistical parsing approaches (2); finite-state parsing                               | MC, 1997; EC, 1997               |             |
Mar 13 | Lexical semantics and word statistics                                                  | J&M Ch. 16                       | HW 8        | FAQ8
Mar 15 | Word sense disambiguation                                                              | J&M Ch. 17                       |             |
Mar 20 | Anaphora resolution; NP coreference                                                    | J&M Ch. 18                       |             |
Mar 22 | Final Exam                                                                             |                                  |             |

JB, 1997  Jeff A. Bilmes. A Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. International Computer Science Institute (ICSI) Technical Report TR-97-021, 1997.
EC, 1997  Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the Fourteenth National Conference on Artificial Intelligence, AAAI Press/MIT Press, Menlo Park, 1997.
C&G, 1998  Stan Chen and Joshua Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. Harvard University Technical Report TR-10-98, 1998.
MC, 1997  Michael Collins. Three Generative, Lexicalised Models for Statistical Parsing. In Proceedings of the 35th Annual Meeting of the ACL, 1997.
MC, 2002  Michael Collins. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. In Proceedings of Empirical Methods in Natural Language Processing (EMNLP), 2002.
MJ, 1998(a)  Mark Johnson. Finite-state approximation of constraint-based grammars using left-corner grammar transforms. In Proceedings of the 36th Annual Meeting of the ACL, 1998, pp. 619-623.
MJ, 1998(b)  Mark Johnson. PCFG models of linguistic tree representations. Computational Linguistics, 24(4), pp. 617-636, 1998.
J&S, 1997  Aravind Joshi and Yves Schabes. Tree-Adjoining Grammars. In Handbook of Formal Languages, G. Rozenberg and A. Salomaa (eds.), Vol. 3, Springer, Berlin, New York, 1997, pp. 69-124.
MP&R, 2001  Mehryar Mohri, Fernando Pereira and Michael Riley. Weighted Finite-State Transducers in Speech Recognition. Computer Speech & Language, 16(1), pp. 69-88, 2001.
MS, 1999  Mark Steedman. Categorial Grammar. In MIT Encyclopedia of Cognitive Sciences, R. Wilson and F. Keil (eds.), 1999.
S&P, 2003  Fei Sha and Fernando Pereira. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL 2003, pp. 213-220.