Estonian Academy Publishers
The Yearbook of the Estonian Mother Tongue Society
Impact Factor (2022): 0.3
Eesti keele sõltuvuspuude pank ja selle keeleteoreetilised lähted [The Estonian Dependency Treebank and its theoretical basis]; pp. 122–145
DOI: http://dx.doi.org/10.3176/esa62.04

Authors
Kadri Muischnek, Kaili Müürisep
Abstract

The Estonian Dependency Treebank and its theoretical basis

This article presents the Estonian Dependency Treebank (EDT) and discusses its language-theoretical basis. EDT contains about 400,000 tokens of fiction, newspaper and scientific texts, and its syntactic annotation follows the principles of dependency syntax. Earlier experiments with annotating Estonian sentences according to the principles of phrase structure syntax showed that the resulting trees tend to be too shallow and therefore encode linguistic information poorly. A dependency-syntactic representation was chosen instead.
Dependency relations efficiently encode typical head-dependent relations, such as verb-argument and head-modifier relations, but they are less suitable for analysing adpositional phrases, verbal chains, multi-word expressions and other constructions without clear internal syntactic relations. For such constructions, every possible analysis has arguments both for and against it.
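
As a rough illustration of the head-dependent encoding described in the abstract, the Python sketch below stores each token's head as an index (0 marks the root) and checks that the annotation forms a well-formed tree. The example sentences, the relation labels and the two competing analyses of the postpositional phrase "laua peal" ('on the table') are hypothetical illustrations; they are not taken from EDT, do not use its tag set, and do not represent the analysis the article adopts.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    head: int   # 1-based index of the head token; 0 marks the root
    label: str  # illustrative relation name, NOT the EDT tag set


def dependents(sentence, head_index):
    """Return the word forms of all tokens attached to the token at head_index."""
    return [t.form for t in sentence if t.head == head_index]


def is_tree(sentence):
    """Check for valid head indices, exactly one root, and no cycles."""
    n = len(sentence)
    if any(not 0 <= t.head <= n for t in sentence):
        return False
    if sum(1 for t in sentence if t.head == 0) != 1:
        return False
    for start in range(1, n + 1):
        seen = set()
        current = start
        while current != 0:  # follow head links upward until the root
            if current in seen:
                return False  # a token was revisited: the head links form a cycle
            seen.add(current)
            current = sentence[current - 1].head
    return True


# Verb-argument and head-modifier relations (hypothetical example):
# "Väike poiss loeb raamatut" ('The little boy reads a book')
simple = [
    Token("Väike", 2, "modifier"),
    Token("poiss", 3, "subject"),
    Token("loeb", 0, "root"),
    Token("raamatut", 3, "object"),
]

# An adpositional phrase without a clear internal head (hypothetical example):
# "Poiss istub laua peal" ('The boy sits on the table')
# Option A: the postposition "peal" heads the phrase.
pp_postposition_head = [
    Token("Poiss", 2, "subject"),
    Token("istub", 0, "root"),
    Token("laua", 4, "complement"),
    Token("peal", 2, "adverbial"),
]
# Option B: the noun "laua" heads the phrase.
pp_noun_head = [
    Token("Poiss", 2, "subject"),
    Token("istub", 0, "root"),
    Token("laua", 2, "adverbial"),
    Token("peal", 3, "marker"),
]

for tree in (simple, pp_postposition_head, pp_noun_head):
    print(is_tree(tree), [(t.form, t.head) for t in tree])
```

Both analyses of the postpositional phrase pass the same well-formedness check. This mirrors the abstract's observation: the dependency formalism by itself does not decide which word should head such a construction, so the choice has to be argued on linguistic grounds.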
