CorDon - About the Lexicon



The Lexicon

The Lexicon contains the vocabulary of Donelaitis’s texts, sorted by lemma. It is structured on the concordance principle.

The dark grey box at the head of each lemma entry contains the following information: the lemma’s part(s) of speech, its accentuation, its inherent categories (such as inflectional class or grammatical gender), and Lithuanian glosses of its meaning in its concrete occurrences in the texts. The lemma taip (which may be an adverb, a particle or a conjunction), for example, is used in Donelaitis’s texts in the meanings 'taip', 'taip pat', 'tiek', 'taigi, todėl', 'labai, daug' and 'taip (kaip)'. The box also contains a link to the corresponding LKŽ dictionary entry and the ten nearest neighbours of the lemma in lemma vector space, computed over the texts of the corpus.

Below the grey box is a list of the individual forms of the lemma, as they appear in the annotated base text. The lemma taip, for example, occurs in three spellings: “taip”, “taìp” and “tâip”. Below each form you will find the number of occurrences as well as their locations. The verse IDs have the format siglum_page or sheet_line number(verse number), that is, the siglum of the text, the page or sheet, and the line number, followed by the verse number in parentheses. Click on an ID to open the annotated Reading View at the corresponding location. Each entry also gives a transliteration of the form into the modern Lithuanian alphabet, its equivalent in modern Standard Lithuanian, its part of speech, inflection and morphology, and its ten nearest neighbours in word vector space, computed over the texts of the corpus.
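The verse ID format described above can be taken apart mechanically. The following is a minimal sketch, assuming that the siglum and the page or sheet contain no underscores and that line and verse numbers are plain digits. The pattern, the function name parse_verse_id and the sample ID "XX_12_5(105)" are hypothetical illustrations, not the project's actual parser or a real location in the corpus.

```python
import re

# Assumed layout: siglum_pageOrSheet_line(verse).
# This pattern is inferred from the prose description only; real IDs in the
# corpus may deviate from it (e.g. sigla containing underscores).
VERSE_ID = re.compile(
    r"^(?P<siglum>[^_]+)_(?P<page_or_sheet>[^_]+)_(?P<line>\d+)\((?P<verse>\d+)\)$"
)


def parse_verse_id(verse_id):
    """Split a verse ID into its parts, or return None if it does not match."""
    match = VERSE_ID.match(verse_id.strip())
    if match is None:
        return None
    parts = match.groupdict()
    # Line and verse numbers are converted to integers for sorting.
    parts["line"] = int(parts["line"])
    parts["verse"] = int(parts["verse"])
    return parts


# Hypothetical ID, made up for illustration.
example = parse_verse_id("XX_12_5(105)")
```

Returning None for anything that does not fit the pattern keeps the sketch conservative: an ID in an unexpected shape is reported as unparsed rather than silently misread.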

The information in the Lexicon is generated from the annotated texts. The additional information about the nearest neighbours of each lemma and of its forms in vector space was generated using the word2vec tool (Mikolov 2013). Proximity in vector space typically indicates a paradigmatic, syntagmatic and/or metrical similarity to the word in question.
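To illustrate what a nearest-neighbour lookup in vector space involves, here is a minimal sketch that ranks words by cosine similarity, a measure commonly used with word2vec vectors. The source does not state which similarity measure or parameters were used for the Lexicon, so this is an assumption. The function names, the placeholder words word_a to word_c and all vector values are made up for illustration; real vectors would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(target, vectors, k=10):
    """Return the k words whose vectors are most similar to the target's vector."""
    target_vec = vectors[target]
    scored = [
        (cosine(target_vec, vec), word)
        for word, vec in vectors.items()
        if word != target  # a word is not its own neighbour
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:k]]


# Toy three-dimensional vectors, invented for this example only.
toy_vectors = {
    "taip": [0.9, 0.1, 0.0],
    "word_a": [0.8, 0.2, 0.1],
    "word_b": [0.1, 0.9, 0.2],
    "word_c": [0.0, 0.8, 0.3],
}

neighbours = nearest_neighbours("taip", toy_vectors, k=2)
```

The sketch covers only the lookup step. Training the vectors, which is what the word2vec tool does, is not shown here.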

The sigla of all texts in the corpus are resolved here.

Figure: Number of lemmas per first letter. P is the most frequent initial letter.


The data provided by this website were digitized and processed as part of the Fritz Thyssen project Altlitauisch Digital: Corpus des Kristijonas Donelaitis (1714–1780). Part of the data, especially the annotations, was adapted from the SLIEKKAS project.

If you want to use the content of this website, please cite this page as follows:

CorDon 2020 – Jolanta Gelumbeckaitė, Armin Hoenen (board), Mortimer Drach (annotator), Philipp Büch (programmer), Altlitauisch Digital: Corpus des Kristijonas Donelaitis (1714–1780). Fritz Thyssen
Retrieved from http://titus.fkidg1.uni-frankfurt.de/cordon/start.html

