[incr tsdb()] - Competence and Performance Laboratory. User Manual.
Stephan Oepen.
Technical Report. Computational Linguistics, Saarland University (in preparation).
This user manual documents [incr tsdb()], an integrated package for
diagnostics, evaluation, and benchmarking in practical grammar and
system engineering.
The software implements an approach to grammar development and system
optimization that builds on precise empirical data and systematic
experimentation as suggested by, among others, [Erbach 1991] and
[Carroll 1994].
[incr tsdb()] has been integrated with several contemporary grammar
development systems; the methodology and tools were designed for
sufficient flexibility and generality to facilitate interfacing and
adaptation to other platforms.
The [incr tsdb()] package is made available to the general public (see below).
Available: `.ps.gz' or `.pdf' file and `.bib' entry.
Ambiguity Packing in Constraint-based Parsing --- Practical Results.
Stephan Oepen, John Carroll.
NAACL, Seattle, WA (May 2000).
We describe a novel approach to `packing' of local ambiguity in parsing with
a wide-coverage HPSG grammar, and provide an empirical assessment of the
interaction between various packing and parsing strategies.
We present a linear-time, bidirectional subsumption test for typed
feature structures and demonstrate that (a) subsumption- and
equivalence-based packing is applicable to large HPSG grammars and
(b) average parse complexity can be greatly reduced in bottom-up chart
parsing with comprehensive HPSG implementations.
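The packing regime sketched in this abstract can be illustrated with a small toy example. All names below (`Edge`, `subsumes`, `insert_edge`) are hypothetical and stand in for whatever the actual parser provides; a real typed-feature-structure subsumption test additionally consults the type hierarchy and reentrancies, which this sketch reduces to a plain subset check over constraint sets.

```python
# Illustrative sketch of subsumption-based ambiguity packing in a
# bottom-up chart parser.  All names are invented for this example and
# do not reproduce any particular system's API.

class Edge:
    def __init__(self, span, fs):
        self.span = span        # (start, end) chart positions
        self.fs = fs            # stand-in feature structure: a set of constraints
        self.packed = []        # equivalent or more specific edges packed here

def subsumes(general, specific):
    """Toy subsumption: every constraint of `general` also holds in
    `specific`.  Real TFS subsumption also checks the type hierarchy
    and reentrancies."""
    return general <= specific

def insert_edge(chart, new):
    """Add `new` to its chart cell, packing it under an existing
    equivalent or more general edge whenever possible, so that only
    one representative per equivalence class is ever extended."""
    cell = chart.setdefault(new.span, [])
    for host in cell:
        if subsumes(host.fs, new.fs):   # host is equally or more general
            host.packed.append(new)     # `new` is frozen, never extended
            return host
    for host in list(cell):             # `new` may be the more general one
        if subsumes(new.fs, host.fs):
            cell.remove(host)
            new.packed.append(host)     # retroactive packing
    cell.append(new)
    return new
```

Because packed edges are never combined with other edges, the chart holds at most one active representative per equivalence class, which is what drives the reduction in average parse complexity.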
Available: `.ps.gz' file and `.bib' entry.
Introduction to this Special Issue.
Stephan Oepen, Dan Flickinger, Hans Uszkoreit, Jun-ichi Tsujii.
In Dan Flickinger, Stephan Oepen, Jun-ichi Tsujii, Hans Uszkoreit (editors):
Journal of Natural Language Engineering # 6 (1).
Special Issue on Efficient Processing with HPSG (March 2000).
This volume reports on recent achievements in the domain of HPSG-based
parsing.
Research groups at Saarbrücken, CSLI Stanford, and
the University of Tokyo have worked on grammar development and
processing systems that allow the use of HPSG-based processing in
practical application contexts.
Much of the research reported here has been collaborative, and all of
the work shares a commitment to producing comparable results on
wide-coverage grammars with substantial test suites.
The focus of this special issue is deliberately narrow, in order to
allow detailed technical reports on the results obtained among the
collaborating groups.
Thus, the volume cannot aim at providing a complete survey on the
current state of the field.
This introduction summarizes the research background for the work
reported in the volume and puts the major new approaches and results
into perspective.
Relationships to similar efforts pursued elsewhere are included,
along with a brief summary of the research and development efforts
reflected in the volume, the joint reference grammar, and the common
sets of reference data.
Available: `.ps.gz' file and `.bib' entry.
Parser Engineering and Performance Profiling.
Stephan Oepen, John Carroll.
In Dan Flickinger, Stephan Oepen, Jun-ichi Tsujii, Hans Uszkoreit (editors):
Journal of Natural Language Engineering # 6 (1).
Special Issue on Efficient Processing with HPSG (March 2000).
We describe and argue for a strategy of performance profiling and
comparison in the engineering of parsing systems for wide-coverage
linguistic grammars.
A performance profile is a precise, rich, and structured snapshot of
system (and grammar) behaviour at a given development point.
The aim is to characterize system performance at a very detailed
technical level, but at the same time to abstract away from
idiosyncrasies of particular processors.
Profiles are obtained with minimal effort by applying a specialized
profiling tool to a set of structured reference data (taken from both
existing test suites and corpora), in conjunction with a uniform format
for test data and processing results.
The resulting profiles can be analyzed and visualized at various
levels of granularity in order to highlight different aspects of system
performance, thus providing a solid empirical basis for system
refinement and optimization.
Since profiles are stored in a database, comparison with earlier
versions, different parameter settings, or other processing systems is
straightforward.
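Since a stored profile is just a database relation, a cross-version comparison reduces to joining two result sets on the test-item identifier. The following sketch is purely illustrative: the function and variable names are invented for this example and do not reflect the actual [incr tsdb()] schema, which records many more attributes per item.

```python
# Illustrative profile comparison.  A stored profile is reduced here to
# a mapping from test-item identifier to parse time in seconds.

def compare_profiles(old, new):
    """Return the mean speed-up factor on test items present in both
    profile runs, or None if the runs share no comparable items."""
    shared = old.keys() & new.keys()
    ratios = [old[i] / new[i] for i in shared if new[i] > 0]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)

old_run = {1: 0.40, 2: 1.20, 3: 0.80}   # times from an earlier version
new_run = {1: 0.20, 2: 0.60, 3: 0.40}   # times after optimization
print(compare_profiles(old_run, new_run))   # prints 2.0
```

Restricting the comparison to the intersection of test items keeps the contrast fair when coverage differs between runs.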
We apply several salient performance metrics in a contrastive
discussion of various (one-pass, bottom-up, chart-based) parsing
strategies (viz. passive vs. active and uni- vs. bidirectional
approaches).
Based on insights gained from detailed performance profiles, we outline
and evaluate a novel `hyper-active' parsing strategy.
We also present preliminary profiles for techniques for `packing' of
local ambiguities with respect to (partial) subsumption of feature
structures.
Available: `.ps.gz' file and `.bib' entry.
Measure for Measure: Parser Cross-Fertilization. Towards Increased Component Comparability and Exchange.
Stephan Oepen, Ulrich Callmeier.
6th International Workshop on Parsing Technology, Trento (February 2000).
Over the past few years, significant progress has been made in efficient
processing with wide-coverage HPSG grammars.
HPSG-based parsing systems are now available that can process
medium-complexity sentences (of ten to twenty words, say) in average
parse times equivalent to real (i.e. human reading) time.
A large number of engineering improvements in current HPSG systems
were achieved through collaboration of multiple research centers and
mutual exchange of experience, encoding techniques, algorithms, and
even pieces of software.
This article presents an approach to grammar and system engineering,
termed competence & performance profiling, that makes systematic
experimentation and the precise empirical study of system properties
a focal point in development.
Adapting the profiling metaphor familiar from software engineering to
constraint-based grammars and parsers enables developers to maintain
an accurate record of system evolution, to identify grammar and system
deficiencies quickly, and to compare against earlier versions or
between different systems.
We discuss a number of exemplary problems that motivate the
experimental approach, and apply the empirical methodology in a fairly
detailed discussion of what was achieved during a development period of
three years.
Given the collaborative nature of this effort, the empirical results we
present build on the research and achievements of a large group of people.
Available: `.ps.gz' file and `.bib' entry.
Towards Systematic Grammar Profiling. Test Suite Technology Ten Years After.
Stephan Oepen, Daniel P. Flickinger.
In Robert Gaizauskas (editor):
Journal of Computer Speech and Language # 12 (4).
Special Issue on Evaluation (June 1998).
An experiment with recent test suite and grammar (engineering) resources is
outlined: a critical assessment of the EU-funded TSNLP (Test Suites for Natural
Language Processing) package as a diagnostic and benchmarking facility for a
distributed (multi-site) large-scale HPSG grammar engineering effort.
This paper argues for a generalized, systematic, and fully automated testing
and diagnosis facility as an integral part of the linguistic engineering cycle
and gives a practical assessment of existing resources; both a flexible
methodology and tools for competence and performance profiling are presented.
By comparison with earlier evaluation work, as reflected in the
Hewlett-Packard test suite data released exactly ten years before TSNLP,
we assess where test-suite-based evaluation has improved over time (and
where it has not).
Available: `.ps.gz' file and `.bib' entry.
TSNLP --- Test Suites for Natural Language Processing.
Stephan Oepen, Klaus Netter, Judith Klein.
In John Nerbonne (editor): Linguistic Databases. CSLI Lecture Notes # 77 (November 1997).
The objective of the TSNLP project is to construct test suites in three
different languages building on a common basis and methodology.
Specifically, TSNLP addresses a range of issues related to the
construction and use of test suites.
The main goals of the project concern the construction, maintenance, and
application of such test suites; both the methodology and the test data
developed are currently being validated in a testing and application
phase (section 1.2.4).
In the present paper, the authors take the opportunity to present some
of the recent results of TSNLP to the community of language technology
developers as well as to potential users of NLP systems.
Accordingly, the presentation puts emphasis on practical aspects of
applicability and plausibility rather than on theoretically demanding
research topics; the TSNLP results presented are of both
methodological and technological interest.
Available: `.ps.gz' file and `.bib' entry.
TSNLP --- Test Suites for Natural Language Processing.
Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter,
Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry,
Dominique Estival, Eva Dauphin, Hervé Compagnion, Judith Baur,
Lorna Balkan, Doug Arnold.
COLING, Copenhagen, Denmark (August 1996).
The growing language technology industry needs measurement
tools to allow researchers, engineers, managers, and customers
to track development, evaluate and assure quality, and assess
suitability for a variety of applications.
The TSNLP (Test Suites for Natural Language Processing)
project has investigated various aspects of the construction,
maintenance and application of systematic test suites as
diagnostic and evaluation tools for NLP applications.
The paper summarizes the motivation and main results of
TSNLP: besides the solid methodological foundation of the
project, TSNLP has produced substantial (i.e. larger than
any existing general test suites) multi-purpose and multi-user
test suites for three European languages together with a
set of specialized tools that facilitate the construction,
extension, maintenance, retrieval, and customization of the
test data.
The publicly available results of TSNLP represent a valuable linguistic
resource that has the potential to provide a widespread, pre-standard
diagnostic and evaluation tool for both developers and users of NLP
applications.
Available: `.ps.gz' file and `.bib' entry.
last modified: 22-jun-00
(oe@coli.uni-sb.de)