Deep Dependencies Evaluation Corpus

What Lurks in the Depths?

The Deep Dependencies Evaluation Corpus (DDEC; Bender et al. 2011) is a collection of 1000 naturally occurring English sentences illustrating 10 linguistic phenomena, collected to probe the depth of analysis of syntactic parsers. Each phenomenon is characterized by up to two dependency triple types, which were manually annotated for each example. All 1000 examples were dual-annotated, and the annotations were then reconciled. The phenomena and annotation scheme, along with the results of using the corpus to evaluate seven parsers, are described in:

Bender, Emily M., Dan Flickinger, Stephan Oepen and Yi Zhang. 2011. Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011). pp. 397-408. [.bib] [slides from the talk]
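
To make the idea of annotating examples with dependency triples more concrete, here is a minimal Python sketch of how gold triples might be represented and compared against a parser's output. Everything in it is an assumption for illustration: the Triple fields, the relation labels (ARG1, ARG2), the example words and the exact-match recall function are hypothetical, and do not reflect the corpus file format or the scoring procedure used in the paper.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A hypothetical dependency triple: head word, dependency type, dependent word."""
    head: str
    dep_type: str
    dependent: str


def recall(gold: set[Triple], predicted: set[Triple]) -> float:
    """Fraction of gold triples that also appear, exactly, in the predicted set."""
    if not gold:
        return 0.0
    return len(gold & predicted) / len(gold)


# Invented example data, not taken from the corpus.
gold = {Triple("saw", "ARG2", "dog"), Triple("barked", "ARG1", "dog")}
predicted = {Triple("saw", "ARG2", "dog"), Triple("saw", "ARG1", "I")}

print(recall(gold, predicted))  # 0.5
```

Exact set intersection is the simplest possible matching criterion; a real evaluation would first need to map each parser's native output onto the annotation scheme, which this sketch does not attempt.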

As reproducibility and reusability are among our primary goals in this work, the corpus, annotations and software used in the evaluation are available for download from the ACL Anthology.

Corpus and software download (gzip file)

The 10 phenomena we chose to evaluate are only a sample of the kinds of subtleties that exist in English (and, by extension, other languages), but we believe them to be representative.


These materials were published along with the paper in the proceedings of EMNLP. ACL asserts copyright and licensing conditions on the ACL Anthology main page.