Treebanks

From DELPH-IN

As DELPH-IN resources began to be deployed in practical applications, both in industry and in academia, it became clear that there was a need for improved parse ranking, disambiguation, and robust recovery techniques. More generally, there is now wide consensus that applying broad-coverage linguistic grammars to analysis or generation requires sophisticated stochastic models. The LinGO Redwoods initiative provides the methodology and tools for a novel type of treebank, one that is far richer in the granularity of available linguistic information and dynamic both in how treebank information is accessed and in how it evolves over time.

As of the Seventh Growth (matched to the 1111 release of the ERG), the English Redwoods treebanks included over 51,000 sentences across a variety of genres. This resource has enabled research into generative and conditional models for parse selection and realization ranking. The DeepBank project applies the same methodology to the Wall Street Journal text that is annotated in the Penn Treebank.

Similar resources have been created on the basis of other DELPH-IN grammars using the same methodology, including the Hinoki Treebank for Japanese and the Tibidabo Treebank for Spanish.
