TSNLP User Manual

To document the major results of TSNLP and make the methodology, test data, and tools developed by the project widely accessible to the academic and industrial public, the consortium has compiled a concise three-volume User Manual. The User Manual is intended to update, summarize, reflect, and relate the information that is available in the TSNLP publications and project reports, and in some cases it supersedes the latter.

Potential users of TSNLP results, or NLP developers who want to build on and extend the TSNLP test suites, should consult the User Manual before turning to the wealth of publications and project reports. In several cases, however, the User Manual does not duplicate the full information contained in a particular project report but gives a reference instead; the complete set of TSNLP project reports thus remains available for technical details and as background information. The TSNLP User Manual comes in three volumes, the second of which has a companion volume 2b:

Volume 1: Background, Methodology, Customization, and Testing;
Volume 2: Core Test Suite Technology;
Volume 2b: Test Suite Tools; and
Volume 3: Data Documentation.

Volume 1 contains a background chapter that sketches some of the factors which influenced the design of the project. The methodology chapter gives a step-by-step account of how to go about writing core data, that is, data that cover central phenomena of a language and are intended to be applicable to a wide range of applications. The customization chapter describes how the core data can be customized to a particular application (and sketches how they could be customized to a particular domain or text type). The chapter on testing gives an example of how the test suite can be applied in a real-life evaluation scenario.

Volume 2 describes the annotation scheme according to which the data were constructed, the construction tool tsct(1) used to create the data, the database tsdb(1) in which the data are stored, and the automated import and consistency-checking procedure from tsct(1) to tsdb(1).

Volume 2b documents the test suite generation (AutoTSG) and lexical replacement tools.

Volume 3 contains the detailed documentation that accompanies the data and is intended to make the data more accessible to users. It also lists the category and function labels used in the English, French, and German test data, with examples for each language, as well as the vocabulary used in the test data for the three languages.

Please note that the title page of most TSNLP User Manual volumes contains colour PostScript, which causes older versions of ghostview(1) to report an error. Even if your version of ghostview(1) does not support colour, you can still browse the entire document (except for the title page) by advancing to the following pages; moreover, the documents print without problems on all black-and-white printers we have access to.

Background, Methodology, Customization, and Testing.

Lorna Balkan, Frederik Fouvry, Sylvie Regnier-Prost (editors).

University of Essex, UK

This volume consists of four chapters. The Background chapter discusses the factors which influenced the design of the project. The Methodology chapter gives a step-by-step account of how to go about writing core data, that is, data that cover central phenomena of a language and are intended to be applicable to a wide range of applications. The Customisation chapter describes how the core data can be customised to a particular application (and sketches how they could be customised to a particular domain or text type). The chapter on Testing gives an example of how the test suite can be applied in a real-life evaluation scenario.

Available: `.ps' file and `.bib' entry.

Core Test Suite Technology.

Stephan Oepen, Frederik Fouvry, Klaus Netter, Tom Fettig, Fred Oberhauser.

Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI) GmbH

Because constructing the test data proper, as well as customizing and applying a multi-purpose test suite to a specific NLP system or domain, are laborious, cost-intensive, and error-prone tasks, TSNLP put strong emphasis on supplying suitable special-purpose technology to facilitate both the development and the use of the TSNLP test data. The TSNLP core technology is designed to support both developers of additional or new test data and users who plan to apply the TSNLP data to a particular system or domain. It comprises the following software packages:

the test suite construction tool tsct(1), an intelligent form-based editor for TSNLP test data;
the test suite database tsdb(1), a simplified and fine-tuned relational database management system;
the import and consistency-checking procedure import(1), which converts data from tsct(1) to tsdb(1) format and ensures basic well-formedness and consistency; and
the test suite utilities tsu(1), a set of utility scripts that evolved over the course of the project and provide some useful functionality (e.g. computing a frequency-sorted list of the vocabulary used).

Since both the test suite construction tool and the test suite database crucially build on the TSNLP annotation schema (the formal specification of properties and values used in classifying and organizing TSNLP test data), the abstract annotation schema is reviewed in section 2 and then related to its implementations in tsct(1) and tsdb(1).

Available: `.ps' file and `.bib' entry.

Test Suite Tools.

Frederik Fouvry (editor).

University of Essex, UK

This volume of the TSNLP user manual describes the tools developed during the project to automate the process of test item construction. The volume contains three chapters:

Chapter 1 describes AutoTSG, the automatic test suite generation tool.
Chapter 2 describes the graphical user interface to AutoTSG.
Chapter 3 describes the lexical replacement tool for changing the vocabulary of test suites.

Chapters 1 and 3 and this introduction were written by Doug Arnold; chapter 2 was written by Martin Rondell. The revisions (this volume is a slightly revised and updated version of the TSNLP deliverable WP 5.1) were made by Frederik Fouvry. As regards the Prolog and Lisp code, the code for the engine is due to Dave Moffat (who did the original implementation), Doug Arnold, and Martin Rondell. The GUI code is by Martin Rondell, apart from the code dealing with the viewing of trees. The Lisp code for the lexical replacement tool is due to Doug Arnold and Frederik Fouvry.

Available: `.ps' file and `.bib' entry.

Data Documentation.

Sabine Lehmann, Dominique Estival, Kirsten Falkedal, Hervé Compagnion,
Lorna Balkan, Frederik Fouvry, Judith Baur, Judith Klein.

Istituto Dalle Molle per gli Studii Semantici e Cognitivi (ISSCO)

This volume of the TSNLP user manual describes the TSNLP test data available through the TSNLP database. The documentation explains how the various phenomena have been treated in the three languages English, French, and German. Not all phenomena have been covered to the same extent in all languages, and in some cases they have not been covered at all in one or more languages. The document provides an overview of the number of existing test sentences for the different phenomena. Furthermore, it includes two annexes which present:

the category and function labels used in the English, French, and German test data, with examples for each language; and
a list of the vocabulary used in the test data for the three languages.

The complete set of TSNLP test data can be accessed through the TSNLP home page http://www.delph-in.net/tsnlp/.

Available: `.ps' file and `.bib' entry.


last modified: 21-may-96