[incr tsdb()] - Competence and Performance Laboratory. User Manual.
Stephan Oepen.
Technical Report. Computational Linguistics, Saarland University (in preparation).
This user manual documents [incr tsdb()], an integrated package for
diagnostics, evaluation, and benchmarking in practical grammar and
system engineering.
The software implements an approach to grammar development and system
optimization that builds on precise empirical data and systematic
experimentation as suggested by, among others, [Erbach 1991] and
[Carroll 1994].
[incr tsdb()] has been integrated with several contemporary grammar
development systems; the methodology and tools were designed for
sufficient flexibility and generality to facilitate interfacing and
adaptation to other platforms.
The [incr tsdb()] package is made available to the general public (see below).
Available: `.ps.gz' or `.pdf' file and `.bib' entry.
Ambiguity Packing in Constraint-based Parsing --- Practical Results.
Stephan Oepen, John Carroll.
NAACL, Seattle, WA (May 2000).
We describe a novel approach to `packing' of local ambiguity in parsing with
a wide-coverage HPSG grammar, and provide an empirical assessment of the
interaction between various packing and parsing strategies.
We present a linear-time, bidirectional subsumption test for typed
feature structures and demonstrate that (a) subsumption- and
equivalence-based packing is applicable to large HPSG grammars and
(b) average parse complexity can be greatly reduced in bottom-up chart
parsing with comprehensive HPSG implementations.
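The packing regime sketched in this abstract can be illustrated with a small toy example. All names below (`Edge`, `subsumes`, `insert_edge`) are hypothetical and stand in for whatever the actual parser provides; a real typed-feature-structure subsumption test additionally consults the type hierarchy and reentrancies, which this sketch reduces to a plain subset check over constraint sets.

```python
# Illustrative sketch of subsumption-based ambiguity packing in a
# bottom-up chart parser.  All names are invented for this example and
# do not reproduce any particular system's API.

class Edge:
    def __init__(self, span, fs):
        self.span = span        # (start, end) chart positions
        self.fs = fs            # stand-in feature structure: a set of constraints
        self.packed = []        # equivalent or more specific edges packed here

def subsumes(general, specific):
    """Toy subsumption: every constraint of `general` also holds in
    `specific`.  Real TFS subsumption also checks the type hierarchy
    and reentrancies."""
    return general <= specific

def insert_edge(chart, new):
    """Add `new` to its chart cell, packing it under an existing
    equivalent or more general edge whenever possible, so that only
    one representative per equivalence class is ever extended."""
    cell = chart.setdefault(new.span, [])
    for host in cell:
        if subsumes(host.fs, new.fs):   # host is equally or more general
            host.packed.append(new)     # `new` is frozen, never extended
            return host
    for host in list(cell):             # `new` may be the more general one
        if subsumes(new.fs, host.fs):
            cell.remove(host)
            new.packed.append(host)     # retroactive packing
    cell.append(new)
    return new
```

Because packed edges are never combined with other edges, the chart holds at most one active representative per equivalence class, which is what drives the reduction in average parse complexity.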
Available: `.ps.gz' file and `.bib' entry.
Introduction to this Special Issue.
Stephan Oepen, Dan Flickinger, Hans Uszkoreit, Jun-ichi Tsujii.
In Dan Flickinger, Stephan Oepen, Jun-ichi Tsujii, Hans Uszkoreit (editors):
Journal of Natural Language Engineering # 6 (1).
Special Issue on Efficient Processing with HPSG (March 2000).
This volume reports on recent achievements in the domain of HPSG-based
parsing.
Research groups at Saarbrücken, CSLI Stanford, and
the University of Tokyo have worked on grammar development and
processing systems that allow the use of HPSG-based processing in
practical application contexts.
Much of the research reported here has been collaborative, and all of
the work shares a commitment to producing comparable results on
wide-coverage grammars with substantial test suites.
The focus of this special issue is deliberately narrow, in order to
allow detailed technical reports on the results obtained among the
collaborating groups.
Thus, the volume cannot aim at providing a complete survey on the
current state of the field.
This introduction summarizes the research background for the work
reported in the volume and puts the major new approaches and results
into perspective.
Relationships to similar efforts pursued elsewhere are included,
along with a brief summary of the research and development efforts
reflected in the volume, the joint reference grammar, and the common
sets of reference data.
Available: `.ps.gz' file and `.bib' entry.
Parser Engineering and Performance Profiling.
Stephan Oepen, John Carroll.
In Dan Flickinger, Stephan Oepen, Jun-ichi Tsujii, Hans Uszkoreit (editors):
Journal of Natural Language Engineering # 6 (1).
Special Issue on Efficient Processing with HPSG (March 2000).
We describe and argue for a strategy of performance profiling and
comparison in the engineering of parsing systems for wide-coverage
linguistic grammars.
A performance profile is a precise, rich, and structured snapshot of
system (and grammar) behaviour at a given development point.
The aim is to characterize system performance at a very detailed
technical level, but at the same time to abstract away from
idiosyncrasies of particular processors.
Profiles are obtained with minimal effort by applying a specialized
profiling tool to a set of structured reference data (taken from both
existing test suites and corpora), in conjunction with a uniform format
for test data and processing results.
The resulting profiles can be analyzed and visualized at various
levels of granularity in order to highlight different aspects of system
performance, thus providing a solid empirical basis for system
refinement and optimization.
Since profiles are stored in a database, comparison with earlier
versions, different parameter settings, or other processing systems is
straightforward.
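Since a stored profile is just a database relation, a cross-version comparison reduces to joining two result sets on the test-item identifier. The following sketch is purely illustrative: the function and variable names are invented for this example and do not reflect the actual [incr tsdb()] schema, which records many more attributes per item.

```python
# Illustrative profile comparison.  A stored profile is reduced here to
# a mapping from test-item identifier to parse time in seconds.

def compare_profiles(old, new):
    """Return the mean speed-up factor on test items present in both
    profile runs, or None if the runs share no comparable items."""
    shared = old.keys() & new.keys()
    ratios = [old[i] / new[i] for i in shared if new[i] > 0]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)

old_run = {1: 0.40, 2: 1.20, 3: 0.80}   # times from an earlier version
new_run = {1: 0.20, 2: 0.60, 3: 0.40}   # times after optimization
print(compare_profiles(old_run, new_run))   # prints 2.0
```

Restricting the comparison to the intersection of test items keeps the contrast fair when coverage differs between runs.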
We apply several salient performance metrics in a contrastive
discussion of various (one-pass, bottom-up, chart-based) parsing
strategies (viz. passive vs. active and uni- vs. bidirectional
approaches).
Based on insights gained from detailed performance profiles, we outline
and evaluate a novel `hyper-active' parsing strategy.
We also present preliminary profiles for techniques for `packing' of
local ambiguities with respect to (partial) subsumption of feature
structures.
Available: `.ps.gz' file and `.bib' entry.
Measure for Measure: Parser Cross-Fertilization. Towards Increased Component Comparability and Exchange.
Stephan Oepen, Ulrich Callmeier.
6th International Workshop on Parsing Technology, Trento (February 2000).
Over the past few years, significant progress has been made in efficient
processing with wide-coverage HPSG grammars.
HPSG-based parsing systems are now available that can process
medium-complexity sentences (of ten to twenty words, say) in average
parse times equivalent to real (i.e. human reading) time.
A large number of engineering improvements in current HPSG systems
were achieved through collaboration of multiple research centers and
mutual exchange of experience, encoding techniques, algorithms, and
even pieces of software.
This article presents an approach to grammar and system engineering,
termed competence & performance profiling, that makes systematic
experimentation and the precise empirical study of system properties
a focal point in development.
Adapting the profiling metaphor familiar from software engineering to
constraint-based grammars and parsers enables developers to maintain
an accurate record of system evolution, to identify grammar and system
deficiencies quickly, and to compare against earlier versions or
between different systems.
We discuss a number of exemplary problems that motivate the
experimental approach, and apply the empirical methodology in a fairly
detailed discussion of what was achieved during a development period of
three years.
Given the collaborative nature of this effort, the empirical results we
present build on the research and achievements of a large group of people.
Available: `.ps.gz' file and `.bib' entry.
Towards Systematic Grammar Profiling. Test Suite Technology Ten Years After.
Stephan Oepen, Daniel P. Flickinger.
In Robert Gaizauskas (editor):
Journal of Computer Speech and Language # 12 (4).
Special Issue on Evaluation (June 1998).
An experiment with recent test suite and grammar (engineering) resources is
outlined: a critical assessment of the EU-funded TSNLP (Test Suites for Natural
Language Processing) package as a diagnostic and benchmarking facility for a
distributed (multi-site) large-scale HPSG grammar engineering effort.
This paper argues for a generalized, systematic, and fully automated testing
and diagnosis facility as an integral part of the linguistic engineering cycle
and gives a practical assessment of existing resources; both a flexible
methodology and tools for competence and performance profiling are presented.
By comparison with earlier evaluation work, as reflected in the
Hewlett-Packard test suite data released exactly ten years before TSNLP,
we assess where test-suite-based evaluation has improved over time (and
where it has not).
Available: `.ps.gz' file and `.bib' entry.
TSNLP --- Test Suites for Natural Language Processing.
Stephan Oepen, Klaus Netter, Judith Klein.
In John Nerbonne (editor): Linguistic Databases. CSLI Lecture Notes # 77 (November 1997).
The objective of the TSNLP project is to construct test suites in three
different languages building on a common basis and methodology.
Specifically, TSNLP addresses a range of issues related to the
construction and use of test suites.
The main goals of the project concern the construction, maintenance, and
application of such test suites; both the methodology and the test data
developed are currently being validated in a testing and application
phase (section 1.2.4).
In the present paper, the authors take the opportunity to present some
of the recent results of TSNLP to the community of language technology
developers as well as to potential users of NLP systems.
Accordingly, the presentation puts emphasis on practical aspects of
applicability and plausibility rather than on theoretically demanding
research topics; the TSNLP results presented are of both
methodological and technological interest.
Available: `.ps.gz' file and `.bib' entry.
TSNLP --- Test Suites for Natural Language Processing.
Sabine Lehmann, Stephan Oepen, Sylvie Regnier-Prost, Klaus Netter,
Veronika Lux, Judith Klein, Kirsten Falkedal, Frederik Fouvry,
Dominique Estival, Eva Dauphin, Hervé Compagnion, Judith Baur,
Lorna Balkan, Doug Arnold.
COLING, Copenhagen, Denmark (August 1996).
The growing language technology industry needs measurement
tools to allow researchers, engineers, managers, and customers
to track development, evaluate and assure quality, and assess
suitability for a variety of applications.
The TSNLP (Test Suites for Natural Language Processing)
project has investigated various aspects of the construction,
maintenance and application of systematic test suites as
diagnostic and evaluation tools for NLP applications.
The paper summarizes the motivation and main results of
TSNLP: besides the solid methodological foundation of the
project, TSNLP has produced substantial (i.e. larger than
any existing general test suites) multi-purpose and multi-user
test suites for three European languages together with a
set of specialized tools that facilitate the construction,
extension, maintenance, retrieval, and customization of the
test data.
The publicly available results of TSNLP represent a valuable linguistic
resource that has the potential to provide a widespread, pre-standard
diagnostic and evaluation tool for both developers and users of NLP
applications.
Available: `.ps.gz' file and `.bib' entry.
last modified: 22-jun-00
(oe@coli.uni-sb.de)