The Lexicon contains the vocabulary of Donelaitis’s texts, sorted by lemma. It is structured on the concordance principle.
The dark grey box at the head of each lemma entry contains the following information: the lemma’s part(s) of speech, its accentuation, its inherent categories (such as inflectional class or grammatical gender), and Lithuanian glosses of its meaning in its concrete occurrences in the text. The lemma taip (which may be an adverb, a particle or a conjunction), for example, is used in Donelaitis’s texts in the meanings 'taip', 'taip pat', 'tiek', 'taigi, todėl', 'labai, daug' and 'taip (kaip)'. The box also contains a link to the corresponding LKŽ dictionary entry and lists the ten nearest neighbours of the lemma in lemma vector space in the texts of the corpus.
Below the grey box is a list of the individual forms of the lemma, as they appear in the annotated base text. The lemma taip, for example, occurs in three spellings: “taip”, “taìp” and “tâip”. Below each form you will find the number of occurrences and their locations. Verse IDs consist of the siglum, the page or sheet, and the line number, joined by underscores and followed by the verse number in parentheses. Click on an ID to open the annotated Reading View at the corresponding location. Each entry also contains a transliteration of the form into the modern Lithuanian alphabet, its equivalent in modern Standard Lithuanian, its part of speech, inflection and morphology, and its ten nearest neighbours in word vector space in the texts of the corpus.
The information in the Lexicon is generated from the annotated texts. The additional information about the nearest neighbours of the lemma and its forms in vector space was generated using the word2vec tool (Mikolov 2013). Proximity in vector space typically indicates a paradigmatic, syntagmatic and/or metrical similarity to the word in question.
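To illustrate what "nearest neighbours in vector space" means, here is a minimal sketch in Python. It assumes that proximity is measured by cosine similarity, which is a common choice for word2vec vectors but is not stated for the Lexicon. The function names and the three-dimensional vectors are invented for illustration only and are not taken from the corpus model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, vectors, k=10):
    """Return the k entries most similar to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word  # a word is not its own neighbour
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; real word2vec vectors have far more dimensions.
toy_vectors = {
    "taip": [0.9, 0.1, 0.0],
    "tiek": [0.8, 0.2, 0.1],
    "labai": [0.5, 0.5, 0.1],
    "todėl": [-0.7, 0.3, 0.2],
}

for neighbour, score in nearest_neighbours("taip", toy_vectors, k=2):
    print(neighbour, round(score, 3))
```

With these toy values, "tiek" ranks above "labai" because its vector points in almost the same direction as that of "taip". The Lexicon applies the same kind of ranking twice: once to lemmas and once to individual word forms.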
The sigla of all texts in the corpus are resolved here.