Treebanks

From DELPH-IN

Revision as of 01:11, 12 February 2013 by EmilyBender

Finally, as processing efficiency and grammatical coverage have become less pressing concerns for ‘deep’ NLP applications, the research focus of several DELPH-IN members has shifted to combining ‘deep’ processing with stochastic approaches to NLP, on the one hand, and to building hybrid NLP systems that integrate ‘deep’ and ‘shallow’ techniques in novel ways, on the other. More specifically, the transfer of DELPH-IN resources into industry has amplified the need for improved parse ranking, disambiguation, and robust recovery techniques, and there is now broad consensus that applications of broad-coverage linguistic grammars for analysis or generation require sophisticated stochastic models.

The LinGO Redwoods initiative provides the methodology and tools for a novel type of treebank, far richer in the granularity of available linguistic information and dynamic in both the access to treebank information and its evolution over time. Redwoods has completed two treebanks, each of around 7,000 sentences: one of transcribed Verbmobil dialogues and one of customer emails from an ecommerce domain. Ongoing research by the Redwoods group at Stanford (with partners in Edinburgh and Saarbrücken) is investigating generative and conditional probabilistic models for parse disambiguation in conjunction with the LinGO ERG (and other DELPH-IN grammars).
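To make the idea of conditional parse disambiguation concrete, here is a minimal sketch of a conditional log-linear (maximum-entropy) model of the general kind trained on Redwoods-style treebanks: each candidate analysis of a sentence is scored by a weighted feature vector, and the scores are normalized into a distribution over the candidates. The feature names, weights, and toy parses below are purely illustrative assumptions, not taken from the ERG or any Redwoods model.

```python
import math

def score(weights, features):
    """Dot product of a weight vector and sparse feature counts."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

def parse_probabilities(weights, candidate_parses):
    """P(t | s) = exp(w.f(t)) / sum over t' of exp(w.f(t')),
    normalized over the candidate analyses of one sentence."""
    scores = [score(weights, f) for f in candidate_parses]
    m = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: two candidate analyses of one sentence, each represented
# by sparse counts of (hypothetical) rule and lexical features.
weights = {"rule:subj-head": 1.2, "rule:head-comp": 0.4, "lex:see_v1": 0.7}
parses = [
    {"rule:subj-head": 1, "lex:see_v1": 1},   # analysis 1
    {"rule:head-comp": 2},                    # analysis 2
]
probs = parse_probabilities(weights, parses)
best = max(range(len(parses)), key=lambda i: probs[i])
```

In practice the weights would be estimated from treebanked disambiguation decisions; the dynamic, fine-grained annotation in Redwoods is what makes extracting such training features feasible.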
