Formal Syntax and Grammar Engineering

Stephan Oepen

Lilja Øvrelid


a brief summary of course contents and goals;

draft schedule of course and exercise topics;

background information on the LinGO project at CSLI Stanford;

obtaining the LKB package (source code and binaries for certain platforms);

Course Summary

From machine translation to speech recognition and web-based search engines, a wide range of applications demand increasing accuracy and robustness from natural language processing. Meeting these demands requires hand-built, linguistic grammars of human languages (combined with sophisticated statistical processing methods).

In this course we will introduce the fundamental concepts of formal and computational models of natural language grammar and gain practical experience in grammar implementation, drawing on a combination of contemporary grammatical theory and hands-on engineering skills. We will work in the framework of unification-based (or constraint-based) grammar and acquire a solid understanding of the formalism of typed feature structures and their use for linguistic description. Selected chapters from Sag, Wasow, & Bender (2003) will provide the linguistic background knowledge, while we will use the Linguistic Knowledge Builder (LKB; Copestake, 2001) as the implementation environment.
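To give a first impression of what unification means in this setting, here is a minimal sketch in Python. It is not LKB code and not the formalism used in the course: it assumes feature structures are plain nested dictionaries with atomic string values, and it leaves out both the type hierarchy and structure sharing (reentrancy). The feature names (HEAD, AGR, PER, NUM) and the toy structures are hypothetical and chosen only for illustration.

```python
class UnificationFailure(Exception):
    """Raised when two feature structures carry incompatible values."""


def unify(fs1, fs2):
    """Unify two feature structures given as nested dicts or atomic strings.

    Simplifying assumptions (not part of the course formalism):
    no types, no reentrancy; atomic values unify only if they are equal.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)  # shallow copy, so the inputs are not modified
        for feature, value in fs2.items():
            if feature in result:
                # Both structures constrain this feature: unify recursively.
                result[feature] = unify(result[feature], value)
            else:
                # Only fs2 constrains this feature: take its value over.
                result[feature] = value
        return result
    if fs1 == fs2:
        return fs1
    raise UnificationFailure(f"{fs1!r} does not unify with {fs2!r}")


# Hypothetical toy structures, for illustration only.
noun_phrase = {"HEAD": "noun", "AGR": {"PER": "3rd", "NUM": "sg"}}
singular_constraint = {"AGR": {"NUM": "sg"}}
plural_constraint = {"AGR": {"NUM": "pl"}}

# Compatible information is merged into one structure.
unified = unify(noun_phrase, singular_constraint)
print(unified)

# Conflicting information makes unification fail.
try:
    unify(noun_phrase, plural_constraint)
except UnificationFailure as error:
    print("failure:", error)
```

The point of the sketch is only that unification combines compatible partial descriptions and fails on conflicting ones, which is how constraints such as subject-verb agreement can be enforced. Typed feature structures as used in the LKB add a type hierarchy and structure sharing on top of this idea.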

A combination of lectures and in-class exercises will enable students to investigate the implementation of constraints in morphology, syntax, and semantics, working within a unification-based lexicalist framework. While most of the course work will focus on developing small grammars for English, we will apply our jointly acquired grammar engineering expertise to another language towards the end of the term.

Some preliminary knowledge of syntactic theory and phrase structure grammar will be helpful, but no prior programming skills are required. There will be four hands-on exercises assigned throughout the course (see the draft schedule below) that will form the basis for joint laboratory sessions; we will not try to complete each exercise during the laboratory hours, but instead expect students to continue implementation work individually outside of class hours. Each assignment is expected to take between two and six hours, and students will be asked to submit their solutions electronically.

Exercises will be graded and contribute substantially towards the final course assessment; exercise results will be complemented by a 120-minute written exam in December (exact date tba). In addition to the four regular exercises that are part of the course schedule below, there will be one optional exercise towards the end of the course, essentially asking students to adapt the English grammar we will have implemented by then to Swedish. Completion and submission of the additional Swedish exercise will be a prerequisite for being considered for a VG grade.

Expected Schedule

Date Time Room Topic
Thu, October 21 10:00 – 12:00 E230 Lecture: Course Overview and Motivation; Phrase Structure Grammar
Thu, October 21 13:00 – 15:00 E230 Lecture: Formal Syntax — Unification-Based Grammar
Wed, November 3 09:00 – 10:00 Mac Lecture: Structured Categories
Wed, November 3 10:00 – 12:00 Mac Laboratory: Exercise 2 (due Thu, November 4; 18:00 h)
Thu, November 4 09:00 – 12:00 E230 Lecture: Agreement, Government, Modification
Thu, November 4 15:00 – 17:00 Mac Laboratory: Exercise 2 (due Thu, November 4; 18:00 h)
Fri, November 5 09:00 – 10:00 Mac Lecture: Generalisations in Typed Feature Structures
Fri, November 5 10:00 – 12:00 Mac Laboratory: Exercise 3 (due Fri, November 19; 18:00 h)
Thu, November 11 10:00 – 12:00 Mac Laboratory: Exercise 3 (due Fri, November 19; 18:00 h)
Tue, December 7 09:00 – 10:00 Mac Lecture: Lexical Rules and Morphology
Tue, December 7 10:00 – 12:00 Mac Laboratory: Exercise 4 (due Wed, December 9; 18:00 h)
Wed, December 8 09:00 – 10:00 Mac Lecture: Semantics in Typed Feature Structures
Wed, December 8 10:00 – 12:00 Mac Laboratory: Exercise 5 (due Tue, December 14; 18:00 h)
Wed, December 8 13:00 – 15:00 Mac Laboratory: Exercise 5 (due Tue, December 14; 18:00 h)
Thu, December 9 09:00 – 12:00 E230 Lecture: Natural Language Processing; Summary
Mon, December 13 10:00 – 12:00 Mac Laboratory: Exercise 5 (due Tue, December 14; 18:00 h)
Sat, December 18 09:00 – 12:00   Written Exam: 120 Minutes, Nice & Simple
Fri, December 24 12:00   Submission Deadline for (Optional) Swedish Exercise
Mon, January 24 09:00 – 12:00   Written Exam (2nd Slot): 120 Minutes, Nice & Simple


Slides Grammar Exercises Solution
Overview no grammar Exercise Solution
Categories Grammar Exercise Solution
Typed Feature Structures Grammar Exercise Solution
Lexical Rules Grammar Exercise Solution
Sample Exam   Exercise  
  Basic & Optional Data Exercise  

Background Reading

All participants are advised to have access to a copy of Sag, Wasow, & Bender (2003) throughout the term; check the library or buy your own copy. Additionally, Copestake (2001) will serve as reference documentation for the LKB software, which we will use during laboratories and for exercise assignments. We expect to make at least one copy available for use during laboratory sessions.

The remaining two references, viz. Shieber (1986) and Pollard & Sag (1994), serve mainly for historical completeness and thus constitute optional reading.

last modified: 11-dec-04 (oe@csli.stanford.edu)