Treebanks

From DELPH-IN

Revision as of 01:20, 12 February 2013

As DELPH-IN resources were deployed in practical applications, both in industry and in academia, it became clear that there was a need for improved parse ranking, disambiguation, and robust recovery techniques. More generally, there is now broad consensus that applications of broad-coverage linguistic grammars for analysis or generation require the use of sophisticated stochastic models. The [http://moin.delph-in.net/RedwoodsTop LinGO Redwoods] initiative provides the methodology and tools for a novel type of treebank, far richer in the granularity of the available linguistic information and dynamic both in how treebank information is accessed and in how it evolves over time.

As of the [http://svn.delph-in.net/erg/tags/1111/tsdb/gold/ Seventh Growth] (matched to the 1111 release of the ERG), the English Redwoods treebanks included over 51,000 sentences across a variety of genres. This resource has enabled research into generative and conditional models for parse selection and realization ranking. The [http://moin.delph-in.net/DeepBank DeepBank] project applies the same methodology to the Wall Street Journal text annotated in the Penn Treebank.
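Conditional models for parse selection are typically log-linear (maximum-entropy) rankers: each candidate parse the grammar licenses for a sentence is scored by a weighted sum of its features, and the scores are normalized over just that sentence's candidates. The sketch below illustrates the scoring scheme only; the feature names and weights are hypothetical, not taken from any actual Redwoods model.

```python
import math


def parse_score(features, weights):
    """Log-linear score: dot product of feature counts and learned weights."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())


def rank_parses(candidates, weights):
    """Conditional parse selection: normalize exponentiated scores over the
    candidate parses for one sentence, and return the index of the most
    probable parse along with the full distribution."""
    scores = [parse_score(f, weights) for f in candidates]
    z = sum(math.exp(s) for s in scores)  # per-sentence partition function
    probs = [math.exp(s) / z for s in scores]
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return best, probs


# Toy example: two candidate parses described by (hypothetical) rule-use counts.
weights = {"hd-cmp": 0.8, "hd-adj": -0.3}
candidates = [{"hd-cmp": 2}, {"hd-adj": 1}]
best, probs = rank_parses(candidates, weights)
```

Because the normalization runs over the candidates for a single sentence rather than over all possible trees, such a model can be trained directly from a treebank that records which of the grammar's analyses the annotator preferred.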
