Work on the new PADT 2.0 is now in progress. The recent developments are described in our submission to the LREC 2008 Workshop on Arabic & Local Languages:
- Prague Arabic Dependency Treebank: A Word on the Million Words
- [paper]
According to the paper, PADT 2.0 is expected to include the following annotations:
| PADT 2.0 | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---:|---:|---:|---:|---|
| Total | 1,095,610 | 1,281,858 | 1,001,908 | 30,894 | merged annotations |
| Prague | 328,240 | 383,482 | 282,252 | 30,894 | original annotations |
| Penn | 767,370 | 898,376 | 719,656 | | converted annotations |

| Prague | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---:|---:|---:|---:|---|
| AEP | 99,360 | 116,717 | 116,717 | 9,690 | Arabic English Parallel News |
| EAT | 48,371 | 55,097 | 55,097 | 13,934 | English-Arabic Treebank |
| ASB | 11,881 | 14,254 | 14,254 | | Arabic Gigaword |
| NHR | 21,445 | 25,329 | 12,613 | | Arabic Gigaword |
| HYT | 85,683 | 100,537 | 41,855 | 5,228 | Arabic Gigaword |
| XIN | 61,500 | 71,548 | 41,716 | 2,042 | Arabic Gigaword |

| Penn | Corpus | Fun. Morphology | Dep. Syntax | Tectogrammatics | Notes |
|---|---:|---:|---:|---:|---|
| 1v3 | 151,546 | 172,386 | 172,386 | | Penn Arabic Treebank 1v3 |
| 2v2 | 141,515 | 161,217 | 161,217 | | Penn Arabic Treebank 2v2 |
| 3v2 | 335,250 | 394,466 | 394,466 | | Penn Arabic Treebank 3v2 |
| 4v1 | 149,784 | 178,720 | | | Penn Arabic Treebank 4v1 |
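The figures above are additive in two places: the six Prague sub-corpora add up to the Prague row, and the Prague and Penn rows add up to the Total row. A minimal sketch of that check follows, assuming that blank cells count as zero; the names `PRAGUE_PARTS` and `column_sums` are hypothetical and used only for illustration.

```python
# Counts copied from the figures above.
# Tuple order: (Corpus, Fun. Morphology, Dep. Syntax, Tectogrammatics).
# Assumption: a blank cell is treated as 0.
PRAGUE_PARTS = {
    "AEP": (99_360, 116_717, 116_717, 9_690),
    "EAT": (48_371, 55_097, 55_097, 13_934),
    "ASB": (11_881, 14_254, 14_254, 0),
    "NHR": (21_445, 25_329, 12_613, 0),
    "HYT": (85_683, 100_537, 41_855, 5_228),
    "XIN": (61_500, 71_548, 41_716, 2_042),
}
PRAGUE = (328_240, 383_482, 282_252, 30_894)
PENN = (767_370, 898_376, 719_656, 0)
TOTAL = (1_095_610, 1_281_858, 1_001_908, 30_894)


def column_sums(rows):
    """Add up an iterable of equal-length count tuples column by column."""
    return tuple(sum(column) for column in zip(*rows))


# The Prague sub-corpora add up to the Prague row.
assert column_sums(PRAGUE_PARTS.values()) == PRAGUE
# The Prague and Penn rows add up to the Total row.
assert column_sums([PRAGUE, PENN]) == TOTAL
```

The sketch covers only these two relations; it does not check the individual Penn Arabic Treebank parts.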
Your suggestions and comments are very welcome. Thank you.