Grammars


DELPH-IN members share a commitment to re-usable, multi-purpose resources and active exchange. Through contributions from several members and joint development over many years, DELPH-IN has built an open-source repository of software and linguistic resources that is widely used in education, research, and application building.

At the core of the DELPH-IN repository is an agreement among partners on a shared set of linguistic assumptions (grounded in HPSG and Minimal Recursion Semantics) and on a common formalism (i.e. logic) for linguistic description in typed feature structures. The formalism is implemented in several development and processing environments, which can serve differing purposes, and it enables the exchange of grammars and lexicons across platforms. This continuity of formalism has in turn allowed DELPH-IN researchers to develop several comprehensive, wide-coverage grammars of diverse languages that can be processed by a variety of software tools.

A re-usable, multi-purpose grammar is dubbed a ‘resource grammar’. In the modern DELPH-IN ecology, a resource grammar has the following properties: it is open source; it is under active development; it is developed and maintained against a set of test corpora representing naturally occurring language from different genres; it has an associated treebank and a treebank-trained parse selection model; and it has documentation (often including a demo) available online. In addition to the resource grammars described below, there is also a set of ‘emerging’ grammars for other languages.

LinGO English Resource Grammar (ERG)

The ERG has been under development at the Center for the Study of Language and Information (CSLI) at Stanford University since 1993. It was originally developed within the Verbmobil machine translation effort, but has since been ported to additional domains and significantly extended. The grammar includes a hand-built lexicon of over 35,000 lexemes and can interface with external lexical resources (such as COMLEX). The main grammar developer is Dan Flickinger, with contributions from (among others) Emily Bender, Rob Malouf, Stephan Oepen, and Jeff Smith.

Jacy Japanese Grammar

Jacy is a grammar of Japanese that builds on the ChaSen package for word segmentation, morphological analysis, and the treatment of unknown words. It has been developed at multiple sites. It originated at the German Research Center for Artificial Intelligence (DFKI GmbH) and Saarland University (both in Saarbrücken, Germany). Development continued through a cooperation with YY Technologies (in Mountain View, CA), later at NTT Communication Science Laboratories and the National Institute for Information Technologies, Japan, and is now based at Nanyang Technological University in Singapore. Melanie Siegel, Emily Bender, and Francis Bond are the main developers.

German Grammar (GG)

GG was developed at the DFKI Language Technology Lab in Saarbrücken, Germany, and first published in 1996. Its development has been funded by the German Ministry for Education, Science, Research and Technology under a variety of projects, and it has been used in applications including machine translation (Verbmobil), question answering (Quetal), and grammar checking (Checkpoint). The grammar is currently developed by Berthold Crysmann.

Spanish Resource Grammar (SRG)

The SRG is a large-scale grammar of Spanish. It was originally developed at the Institut Universitari de Lingüística Aplicada (IULA) of the Universitat Pompeu Fabra, and is currently maintained at the Grup de Recerca Interuniversitari en Aplicacions Lingüístiques (GRIAL) of the Universitat de Barcelona. Montserrat Marimon is the main developer.

Additional Grammar Resources

Information about additional grammars for other languages, including German, Portuguese, Korean, Norwegian, and Chinese, among others, can be found on the Grammar Catalogue page of the DELPH-IN internal wiki. These grammars range in size from fairly broad-coverage grammars to small experimental ones.