The DiGreC Treebank: Linguistics and Literature
In: Research Data journal for the humanities and social sciences, Band 6, Heft 1, S. 1-12
ISSN: 2452-3666
Abstract
The DiGreC (DIachrony of GREek Case) treebank is a corpus of selected sentences from Greek texts, ranging from Homer to Modern Greek, which have been annotated morphosyntactically and semantically. The corpus comprises excerpts from 655 texts, for a total of 3385 sentences and 56,440 word tokens; automated tagging and lemmatisation has been supplemented with manual review to ensure accuracy. The data exist in xml and csv formats, which can be manipulated and converted automatically to other schemata. A web site has also been created to allow users to interact with the data more easily, and to provide specialised functionality for searching and visualisation. This corpus was created to inform theoretical debates regarding the role of case in grammar, and may be of use to researchers searching for specific attestations of a range of different constructions in Greek.