Start Texts Search Lexicon Alignment Links Publications Visus English Version Lithuanian Version German Version

Some Visualisations

Graphic representation of two identical lines of verse at different points in a Donelaitis text according to the ReAF principle, Hoenen 2018

Recurrence Analysis Function (ReAF): A description of the ReAf can be found in the concurrent article Hoenen (2018). The main goal of this visualisation is to use textual redundancy to characterise the mode of text genesis as either born-oral (bardic) or born-written. Even though a non-standardised orthography can obscure redundancy, the concurrent analyses on the basis of the lemmata of the texts have (just as the wordforms) not made any other conclusion probable than the one that Donelaitis' texts are born-written.
ReAFs: Pavasario linksmybės, Vasaros darbai, Rudenio gėrybės, Žiemos rūpesčiai, Fortsetzung, Pričkaus pasaka apie lietuvišką svodbą, Lapės ir gandro česnis, Rudikis jomarkininks, Šuo didgalvis, Pasaka apie šūdvabalį, Vilks provininks, Aužuols gyrpelnys

Word cloud: A word cloud counts as one of the most popular visualisations in digital space. It allows for an approximate impression of the textual content through the rendering of its most frequently ocurring word forms. This word cloud is based on the lemmata of the texts whilst all positions with a non content-word part-of-speech tag have been excluded. The graphic was produced with Python. Word cloud of the CorDon texts

Stilometry: Stylometry analyses the styles of different authors using under more statistical characteristics of their texts and has generated public attention when contributing to cases of attribution of texts of anonymous authorship to concrete persons. In this case, we use the R package stylo in order to compute how similar the single texts appear to be to each other given the use of frequent words. Generelly one can observe, that the fables and Metai are lying somewhat apart, but that Metai fragments show a tendency towards the fables. This could theoretically be influenced by text length. In using the most frequent words, similarities in content are usually not captured since the most frequent words are usually function words. In another view onto the same data (so-called multi dimensional scaling) where texts are located within a 2-dimensional space according to their distances the interpretation seems similar. The tree does not change using more words.
Cluster tree of the CorDon texts, 10 most common words, 2 large clusters, Metai and Pasakos, the Metai continuation, however, clusters with the Pasakos
Multi-dimensional scaling, the texts are arranged in a 2-D space, Fortsetzung and Fritzen's stories are further away from the rest of the Metai cluster

Topic Models: Topic models are a method which recognises word groups which are statistically similar in their distribution throughout the texts in a corpus. Such word groups are called topics in this technology. In opposition to other technologies in topic models one word may belong to different topics, may indicate different topics with different strengths. The only parameter one needs to initialise a topic model is the number of assumed topics within ones corpus. Analysing the Metai (lemmata) assuming 4 topics, the following graphic can be produced. The topics here are characterised by the following word groups (only 10 most indicative terms per topic):
topic1: savo, būti, būras, ponas, dievas, reikėti, žinoti, pasidaryti, kartas, padaryti
topic2: pulkas, lėlė, austi, kopinėti, nuogas, oras, vanduo, skranda, padėti, trūsas
topic3: krizas, česnis, šventas, matyti, enskys, bjaurus, ranka, svodba, stalas, šokti
topic4: amtsrotas, varna, kakalys, šiltas, šaltyšius, mylėti, metas, klastuoti, žiemys, tamsa
Topic model, visualized as grid lines, which indicate for each text by the size of the diameter of concentric circles how strong which of the topics 1-4 is present. Topic 1 overall largest: autumn, continuation, winter, spring, summer, Fritz. Topic 2: Spring, Topic 3: Fritz, Summer, Topic 4: Winter

Dispersion: The following simple graphics display where a certain word appears. The x-axis has the "text time", that is with every token (punctuation mark or word) the text time increments. The 100st token has thus text time 100. Text of the investigation was Metai (without Pričkaus pasaka). One sees that the peasant is active throughout the entire year, and that the seasons are not only mentioned in their respective chapters. Free after Jockers (2014).
Distribution of the term for spring - distribution accumulation in the first quarter, especially shortly after text time token 5000, afterwards seldom. Text time up to approx. 30000. Distribution of the term for summer - distribution accumulation in the first quarter, shortly after text time token 5000 parallel to spring, then around text time 12000 to 14000 (middle) and accumulated at the end. Distribution of the term for autumn - distribution accumulation in the second half, especially around text time 17000-22000 and at the end. Distribution of the term for winter - distribution without a single cluster with gaps at the end of the first quarter, in the middle and at around text time 20,000 and 25,000. Strongest cluster of terms around tokens 23,000. Distribution of the term for farmer - almost even distribution over the entire span with 4-6 term clusters and just as many shorter gaps.


;