Main Principles of Computer-aided Modelling
(on the basis of the Georgian and Udi Languages)
Manana Tandashvili, Tbilisi
Computer processing of natural language is a modern and essentially new stage
of language study. It developed in the 1960s as an independent branch of
artificial intelligence and covers a wide spectrum of fields, both theoretical
and practical: computer modelling, computer analysis of texts (natural
language systems), computer analysis and synthesis of speech, automated
correction systems (spellcheckers), systems for visual processing of
information, computer-aided translation, etc.
Two principles are known in the computer processing of natural language (aimed
at creating the information-technology products mentioned above): the
accumulative principle, which implies compiling a dictionary of correct word
forms of the maximum possible size, accompanied by a search system for each
wordform to be verified; and the productive principle, which implies building
a full computer model of word derivation and, on its basis, creating
productive rules for the formal description of paradigms. Both approaches have
advantages and limitations, and both have been used with equal success for
different languages, depending on the morphological and syntactic
peculiarities of the language in question.
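The contrast between the two principles can be sketched in a few lines of
Python. This is only a minimal illustration under stated assumptions: the
stems and endings are invented placeholders (not Georgian or Udi data), and
the names KNOWN_FORMS, STEMS, SUFFIX_RULES and both check functions are
hypothetical.

```python
# Toy illustration only: the stems, endings and forms below are invented
# placeholders, not real Georgian or Udi data.

# Accumulative principle: store every correct wordform and look it up.
KNOWN_FORMS = {"stem", "stem-a", "stem-b", "root", "root-a", "root-b"}


def check_accumulative(wordform: str) -> bool:
    """Verify a wordform by searching the stored list of correct forms."""
    return wordform in KNOWN_FORMS


# Productive principle: store stems plus rules and generate the paradigm.
STEMS = {"stem", "root"}
SUFFIX_RULES = ["", "-a", "-b"]  # hypothetical paradigm: bare form, two endings


def generate_paradigm(stem: str) -> list:
    """Apply every productive rule to a stem to obtain its wordforms."""
    return [stem + suffix for suffix in SUFFIX_RULES]


def check_productive(wordform: str) -> bool:
    """Verify a wordform by testing whether some known stem generates it."""
    return any(wordform in generate_paradigm(stem) for stem in STEMS)
```

In this sketch the accumulative variant pays in storage (every form is
listed), while the productive variant pays in the effort of formulating rules
that generate exactly the correct forms.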
For the computer processing of Iberian-Caucasian languages, we essentially
follow the second approach, and accordingly we see the prospects for computer
research on these languages in the possibilities of computer modelling.
Computer modelling of a language implies building a fully formalized model of
its morphology, which in turn consists of (1) the necessary linguistic facts
(e.g. an electronic morphological dictionary containing linguistic data of two
types: a stem dictionary and a morpheme database) and (2) productive rules (a
base of transformation rules and the computer programs for analysis and
synthesis designed on its basis). The combination of these two components (the
electronic morphological dictionary and the transformation rules base) creates
an informational (ultimately software) product, the so-called morphological
knowledge base.
Both mentioned components are interrelated and they must meet the following
four essential requirements:
(1) Integrity;
(2) Logical and grammatical consistency;
(3) Entirety;
(4) Minimal redundancy of information.
The formalized model of natural language morphology is obtained deductively.
It is the result of the deductive generalization of theoretical conclusions
obtained through a systematic description of linguistic facts. It is a
conceptual model and is therefore not yet a system that adequately reflects
language as a pragmatic-communicative means. Its implementation (in the form
of programs) on the one hand, and its verification on the basis of
representative lexical material on the other hand, constitute the next stage
in formalizing the conceptual model. The present work is an attempt to carry
out conceptual and computer modelling of Iberian-Caucasian languages (using
the example of the Georgian and Udi languages) by applying the general
principles of computer modelling, with the aim of investigating these
languages by means of modern technologies.
Frame Model of Georgian Verb Conjugation
The necessity of creating both conceptual and computer models of the Georgian
language is dictated by the current level of theoretical research in
linguistics and by the design requirements of applied data processing systems.
The difficulties in solving this fundamental problem are chiefly connected
with the complexity of formalizing the Georgian verb: it conjugates according
to both subject and object persons (verb polypersonality), which increases the
number of verbal forms in the paradigm to a degree incomparable with other
languages. That is why the conceptions developed for the study of verbs in
Indo-European languages, which form the basis of the available computer
models, seem insufficiently effective when applied to Georgian, and interested
researchers have to look for other approaches to this problem.
One possible conception of a Georgian verb model, oriented to the level of
formalization necessary for computer applications, is described below. The
theory of "A Paradigm of marks" developed by T. Uturgaidze is used as the
linguistic basis for the model. According to this theory, the conjugation
paradigm of the Georgian verb may be expanded into the Cartesian product of
three relatively independent components: the paradigm of persons, the paradigm
of tense and mood, and the paradigm of marks. While the first and the second
components exist in practically all languages, the paradigm of marks is a
unique feature of Georgian; it embraces such grammatical categories of a
polypersonal verb as voice, version and causation. The introduction of the
concept of mark, which generalizes these categories, together with the related
new, extended interpretation of the person paradigm, is undoubtedly an
essentially new step in the study of the Georgian verb and considerably
facilitates its formal modelling. On the other hand, the limits of the model
restrict the present work to the problems of modelling the paradigm of marks
and the paradigm of persons; models of tense and mood will be a subject of
further research.
A number of researchers have described the possible combination types of
verbal forms in the Georgian language using such concepts as verbal
personality (A. Shanidze, 1973), valency (Th. Gamkrelidze, 1979), and position
numbers in predicates (R. Asatiani, 1982). The present work is based on the
concept of the "actant" of a verbal form (L. Tesnière, 1959), which can have
the following four varieties in the deep structures of Georgian: I – initiator
of action; P – object affected (patient); R – object to which the action is
oriented (recipient); E – direct executor of the action. From this standpoint,
the existence of one-, two-, three- and four-actant types of verbal forms in
Georgian is confirmed. Their deep (Scheme 1) and surface (Scheme 1A)
structures for the concrete stem "-c̣er" ("to write") may be written as
follows:
V-P: ic̣ereba "it (a letter) is being written"
V-PR: mec̣ereba "it (a letter) is being written for/to me"
V-EP: c̣ers "he (Luka) is writing it (a letter)"
V-EPR: mic̣ers "he (Luka) is writing it (a letter) for/to me"
V-IEP: ac̣erinebs "she (Anastasia) makes him (Luka) write it (a letter)"
V-IEPR: mic̣erinebs "she (Anastasia) makes him (Luka) write it (a letter) for/to me"
(Scheme 1)
V-S ic̣ereba S – 'a'; PASSIVE – 'i-eb'
V-SOind mec̣ereba S – 'a', Oind – 'm'; PASSIVE – 'e-eb'
V-SOdir c̣ers S – 's', Odir – ø; ACTIVE – ø
V-SOdirOind mic̣ers S – 's', Odir – ø, Oind – 'm-i'; ACTIVE – ø
V-SOindexOdir ac̣erinebs S – 's', Odir – ø, Oindex – ø; CAUSATIVE – 'a-ineb'
V-SOindexOdirOind mic̣erinebs S – 's', Odir – ø, Oind – 'm-i', Oindex – ø; CAUSATIVE – '-ineb'
(Scheme 1A)
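The correspondence between the deep frames of Scheme 1 and the surface
structures of Scheme 1A can be held in a small lookup table. The sketch below
only transcribes the six pairs given above; the dictionary layout and the
helper names (FRAMES, deep_label, surface_label) are assumptions made for
illustration, not part of the original model.

```python
# Deep frame -> (surface actants, example form), transcribed from
# Schemes 1 and 1A for the stem "-c̣er" ("to write").
# The layout and the helper names are assumptions for illustration.
FRAMES = {
    "P":    (("S",), "ic̣ereba"),
    "PR":   (("S", "Oind"), "mec̣ereba"),
    "EP":   (("S", "Odir"), "c̣ers"),
    "EPR":  (("S", "Odir", "Oind"), "mic̣ers"),
    "IEP":  (("S", "Oindex", "Odir"), "ac̣erinebs"),
    "IEPR": (("S", "Oindex", "Odir", "Oind"), "mic̣erinebs"),
}


def deep_label(frame: str) -> str:
    """Label of the deep structure, e.g. 'V-EPR'."""
    return "V-" + frame


def surface_label(frame: str) -> str:
    """Label of the matching surface structure, e.g. 'V-SOdirOind'."""
    actants, _form = FRAMES[frame]
    return "V-" + "".join(actants)


# One row per frame: (deep label, surface label, example form).
ROWS = [(deep_label(f), surface_label(f), FRAMES[f][1]) for f in FRAMES]
```

Each deep frame has exactly as many surface actants as deep actants, which is
easy to check mechanically in this representation.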
The main assumption of the present work is that combinations of Georgian
verbal forms can be described formally using the conception of frames,
originally developed by M. Minsky to solve knowledge representation problems.
In the opinion of the authors, there is a close similarity between the
description of stereotyped situations in the environment and the
representation of the actants in verbal forms together with their
interrelations. For Indo-European languages such an analogy seems rather
superficial, but applied to the polypersonal Georgian verb it may lead to a
number of non-trivial deductions. In particular, the language for describing
stereotyped situations developed in the theory of frames will undoubtedly help
to give a more compact and convincing form to the paradigm of marks, and to
predict more effectively the virtual word forms that can potentially be
realized.
If we represent the totality of verbal forms as a frame or script, following
M. Minsky's metaphor (cf. L. Tesnière's "play"), treat the actants operating
in it as slots, and describe them by two features, "the role" set by the
script and "the actor" playing this role, then a fairly obvious correspondence
is established between all possible actant combinations and the corresponding
frames; it also extends to the relations between the deep and surface
structures. In the important particular case of subjective version forms such
as "ვიწერ" ("I write it (a letter) for myself"), this interpretation differs
from the traditional standpoint, since it treats the form as an instance of
the frame V-EPR (see Scheme 1) in which three roles are played by two actors,
with E=R. Similarly, the causative passive "მეწერინება" ("the letter (i.e.
extralinguistic factors related to it) prompts me to write it (i.e. the letter
itself)") is treated as a variant of the frame V-IEP with I=P.
As can be seen directly from the enumeration of the deep structure frames, P
is the main actant of the Georgian verb, being represented in all frames.
Therefore, V-P should be taken as the main verbal frame, while the other
frames only expand the information about the actants. This may be tentatively
represented in the following scheme as a transfer to the left and to the
right: I ← E ← P → R.
This link between the frames may also be represented as a system of
subordination, which is transformational for the frames of the surface
structure. V-P is the superframe of the system, while all the others are
subframes. A direct extension of the frame V-P yields the two-actant frames
V-EP and V-PR; their combination gives the three-actant frame V-EPR; in turn,
its combination with the frame V-IEP creates the four-actant frame V-IEPR:
[Figure: Schema 1]
In the four-level system of verbal frames, the principle of actant inheritance
holds: each actant of an upper level is preserved on all lower levels. One
possible order of building up the actants may be represented as follows:
Level | Deep structure (Scheme 3) | Surface structure (Scheme 3A)
P level | P | S
R level | R P | Oind S
E level | E R P | S Oind Odir
I level | I E R P | S Oindex Oind Odir
It follows from the foregoing that the canonical number of deep structures of
the Georgian verb cannot be more than 6 (rather than the 15 that would be
expected for an arbitrary four-actant frame). It is significant that Oind is
the only actant of the surface structure that is not transformed when moving
from one level to another (compare Schemes 1A and 3A). The remaining actants
are transformed as follows: the surface-structure actant S, determined on the
P and R levels, turns into P, and the actant S enters the form; in its turn,
on the level I it turns into actant E (with corresponding changes in the
morphemes representing the actants).
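The two counts mentioned above can be reproduced by enumeration. The sketch
below assumes one reading of the chain I ← E ← P → R: P is present in every
frame, and I can appear only if E is present. Under these two assumed
constraints, the 15 non-empty subsets of the four actants reduce to exactly
the six frames of Scheme 1. The function names are hypothetical.

```python
from itertools import combinations

ACTANTS = ("I", "E", "P", "R")  # fixed order used for labels such as V-IEPR


def all_nonempty_subsets(items):
    """Every non-empty combination of actants (an arbitrary four-actant frame)."""
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)


def is_canonical(frame) -> bool:
    """Assumed constraints: P is always present; I requires E."""
    return "P" in frame and ("I" not in frame or "E" in frame)


def label(frame) -> str:
    """Frame label in the notation of Scheme 1, e.g. 'V-EPR'."""
    return "V-" + "".join(a for a in ACTANTS if a in frame)


arbitrary = list(all_nonempty_subsets(ACTANTS))
canonical = [frame for frame in arbitrary if is_canonical(frame)]
canonical_labels = sorted(label(frame) for frame in canonical)
```

Here `len(arbitrary)` is 15 and `len(canonical)` is 6; the surviving labels
are V-P, V-PR, V-EP, V-EPR, V-IEP and V-IEPR.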
The scheme assumes a more obvious form if we divide the frames into those that
contain the recipient and those that do not. In this case, we obtain a
three-level transformation system in which horizontal transfer between the
frames reflects non-transformational changes and vertical transfer reflects
transformational changes in the surface structure.
[Figure: Schema 3]
Since the frames of level c stand at the end of the transformation chain, they
cannot be subject to subsequent transformations, and naturally the actants of
their deep and surface structures fully coincide (see Schemes 1 and 1A). At
the same time, the actant S on levels a, b and c describes considerably
different entities: the subject of level a is non-agentive, while that of
levels b and c is agentive. Accordingly, the frames of level a reflect a
passive verbal construction, while the frames of levels b and c reflect an
active verbal construction.
It should be emphasized that, in the opinion of the authors, the verbal
categories that a natural language uses to generate the transformation system
of the frames of a polypersonal verb (voice, version, causation) can be
assigned neither to the derivational (A. Shanidze) nor to the inflectional
(T. Uturgaidze) grammatical categories. In fact, they should be counted among
the concepts of the morphological-syntactic level (M. Machavariani, 1987), and
the model presented above belongs, to a great degree, among
morphological-syntactic models.
One more argument in favour of the adopted conception is the straightforward
interpretation of such an important concept of frame theory as the hidden slot
value. One can see the corresponding mechanism for the implied representation
of the actants in the above schemes. The presence of certain values of the
actants P, R, E and I described above is necessary and sufficient for a full
description of the semantics of a possible stereotyped situation in the
environment reflected by the verb. It is another matter which of the actants
is made active in each concrete case, on which of them logical attention is
focused, and in what sequence the actants are built up. The transformation
system of the Georgian verb resembles a three-act "play" in which the rules
for the appearance and departure of the actors are strictly defined.
Let us consider terminal versions of the above frames, taking into account the
substitution of the verbal actants by the 6 person values possible in a
natural language (three persons in the singular and plural). For example, the
frame V-P branches into 6 superterminals: V-P1, V-P2, V-P3, V-P1p, V-P2p and
V-P3p. Any subterminal (terminal of a subframe) located on a lower level has a
corresponding node in the information graph, which specifies the values of the
actant related to this node; the branches of this graph are defined by the
order in which the actants are built up and, naturally, by the selection of
the expanding person.
In other words, owing to the fundamental property mentioned above, namely that
the lower levels preserve the information of the upper levels in unchanged
form, only the subterminals of the same superterminal enter into combinations
with each other; that is, the above graph is in fact a tree.
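The branching of frames into terminals can be sketched as a simple tree
expansion. The code below is an illustration under stated assumptions: a node
is modelled as a mapping from actant to person value, and the names PERSONS,
expand and label are hypothetical. It does not filter out "forbidden"
combinations, which would need additional rules.

```python
ORDER = ("I", "E", "P", "R")                    # label order, as in V-IEPR
PERSONS = ("1", "2", "3", "1p", "2p", "3p")     # three persons, singular and plural


def label(node: dict) -> str:
    """Terminal label such as 'V-P1' or 'V-P1R3p'."""
    return "V-" + "".join(actant + node[actant] for actant in ORDER if actant in node)


def expand(node: dict, actant: str) -> list:
    """Children of a node: the parent's values are kept unchanged and
    one new actant is added with each of the six person values."""
    return [{**node, actant: person} for person in PERSONS]


# The frame V-P branches into six superterminals ...
superterminals = expand({}, "P")
# ... and each of them has its own six subterminals on the R level.
children_of_first = expand(superterminals[0], "R")
```

Because every child copies its parent's values, a subterminal belongs to
exactly one superterminal, which is what makes the graph a tree.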
As to the sphere of applicability of the introduced conception, it covers
practically the whole verb paradigm for the active and passive voices (taking
into account all diatheses of version and causation), and it is applicable to
most static verbs and verbs of the medial voice as well. The verbs that are
difficult to formalize within the cited conception (e.g. deponents) form a
relatively small group, as a rule with a substantially restricted paradigm,
and it is advisable to work out special (narrower) models for such verbs.
From the standpoint of practical use, applying frame theory to the
formalization of the Georgian verb makes it possible to build a paradigm
automatically for any new stem that enters the language, and to form potential
word forms of existing stems, that is, forms that are possible in human
communication but do not occur in textual materials. On the other hand, this
conception explains the existence of the so-called "forbidden" combinations in
the verb paradigm (e.g., V-E1P1R1) and sheds light on the mechanisms used in
Georgian in such cases.
Main Aspects of Computer Research of Udi Language
The necessity of computational linguistic research in the Udi language.
The Udi language belongs to the Lezghian subgroup of the Daghestan group of
the Caucasian languages. Like many other Caucasian languages, Udi is the
native language of a few thousand people. Already in the middle of the 19th
century, A. Schiefner, one of the first investigators of the Udi language,
pointed out that this language was becoming less used in its Azerbaijani
environment, and he came to the pessimistic conclusion that it would disappear
forever very soon. By now, the area where the Udi language is used is
extremely limited: all in all, there are only three villages where Udi people
live compactly, namely Vartashen and Nizh in Azerbaijan and Oktomberi on the
territory of Georgia.
The importance of computational research on the Udi language is due, on the
one hand, to morphosyntactic peculiarities that make the Udi language
interesting from a typological viewpoint, and, on the other hand, to the fact
that this language is thought to be a relic of the so-called "Albanian"
language of the Caucasus.
We have created the basis for computational research on the Udi language by
designing electronic databases (using Visual Fox-Pro 2.6 and 3.0 as DBMS)
which include a morphological-lexicological dictionary of the Udi language,
electronic databases of an annotated bibliography, and textual databases. The
morphological-lexicological dictionary of the Udi language was first publicly
presented at the 2nd Tbilisi International Symposium on "Language, Logic,
Computer" in 1997. The annotated electronic bibliography was developed under
the guidance of Prof. Dr. J. Gippert during a research visit to the Institute
of Comparative Linguistics of Frankfurt University. At that institute, we also
started scanning the New Testament text that was translated into the Udi
language and published at the beginning of the 20th century.
The first stage of the computational processing of Udi text materials consists
of dividing the text into lexemes and sorting them into form-variable and
form-invariable words. The second stage comprises the analysis and synthesis
of form-variable words by means of computer programs, with the aim of
automatically generating an electronic dictionary.
Software for a computational synthesis and analysis of Udi nouns and verbs. In
order to develop software for the computational synthesis and analysis of Udi
nouns and verbs, we have carried out computer modelling of these elements. On
the basis of computer modelling principles, the necessary parameters of the
noun and verb paradigms (Ps and Pv, respectively) were defined. The
declensional paradigm of Udi nouns consists of the following parameters:
Ps = P(P1,P2,P3,P4),
where P1 - wordform,
P2 - declension system classificator,
P3 - declension type,
P4 - type of paradigm.
On the basis of these components, the structure of the Udi noun can be written
down in a formal model as follows:
P1 = stem (+Em) (+Pl) (+Kn)
m ∊ {Esing., Epl.}
n ∊ {nominative, ergative, genitive, dative, ablative,
comitative, adessive, allative, superessive, causative},
where stem = (S1,S2,S3,S4),
S1,S2,S3,S4-phonematic-structural characteristics of the stem.
Em - insertion according to the type of paradigm,
Pl - plural marker,
Kn - set of markers of main and postpositional cases.
P2 = P2 {K,K1,K2,K3,K4}
where K - one-stem declension type,
K1 - two-stem "diffusional" declension type,
K2 - two-stem "deergative" declension type,
K3 - two-stem "degenitive" declension type,
K4 - two-stem "insertional" declension type.
P3 = P3(S1,S2,S3,S4)
The declension type is defined on the basis of the
phonematic-structural analysis of the stem; the type of
paradigm is defined on the basis of the insertion Em: P4 = P4(Em).
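The slot structure P1 = stem (+Em) (+Pl) (+Kn) translates directly into a
concatenation routine with optional slots. The following sketch is a
hypothetical illustration: the marker strings ("-EM", "-PL", "-ERG", ...) are
placeholders rather than actual Udi morphemes, and the function names are
assumptions.

```python
from typing import Dict, Optional

# Case names as listed in the formal model above.
CASES = ("nominative", "ergative", "genitive", "dative", "ablative",
         "comitative", "adessive", "allative", "superessive", "causative")


def build_wordform(stem: str,
                   insertion: Optional[str] = None,
                   plural: Optional[str] = None,
                   case_marker: Optional[str] = None) -> str:
    """Concatenate stem (+Em) (+Pl) (+Kn), skipping slots that are absent."""
    return "".join(part for part in (stem, insertion, plural, case_marker) if part)


def decline(stem: str,
            case_markers: Dict[str, str],
            insertion: Optional[str] = None,
            plural: Optional[str] = None) -> Dict[str, str]:
    """Build one wordform per case from a mapping case name -> marker."""
    unknown = set(case_markers) - set(CASES)
    if unknown:
        raise ValueError(f"unknown case names: {sorted(unknown)}")
    return {case: build_wordform(stem, insertion, plural, marker)
            for case, marker in case_markers.items()}


# Placeholder markers only; real markers would come from the morpheme database.
example = decline("STEM", {"ergative": "-ERG", "dative": "-DAT"}, plural="-PL")
```

With these placeholders, `example` maps "ergative" to "STEM-PL-ERG" and
"dative" to "STEM-PL-DAT".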
There is a certain interrelationship between the parameters which is clearly
reflected in the table below:
P4 | P2 | Em
T1 | K, K1, K2, K3 | ___
T2 | K4 | Em ∊ {Es.}
T3 | K4 | Em ∊ {Epl.}
T4 | K4 | Em ∊ {Es., Epl.}
As we can see from the table, there is no strictly dominant relationship
between parameters P2 and P4. For one group of nouns, the type of paradigm
(T1) determines the declension system classificator, while for the second
group, on the contrary, the declension system classificator (K4) is the
determining factor.
[Figure: Schema 14]
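The interrelationship between the type of paradigm (T1 to T4), the declension
system classificator (K, K1 to K4) and the insertion Em can be expressed as a
small consistency check. This is a sketch under the assumption that the
presence of Em is recorded as a set of numbers, "sg" and/or "pl"; the
function and constant names are hypothetical.

```python
def paradigm_type(insertion_numbers) -> str:
    """Map the numbers in which the insertion Em occurs to T1..T4."""
    has_sg = "sg" in insertion_numbers
    has_pl = "pl" in insertion_numbers
    if has_sg and has_pl:
        return "T4"   # Em in both the singular and the plural paradigm
    if has_sg:
        return "T2"   # Em in the singular paradigm only
    if has_pl:
        return "T3"   # Em in the plural paradigm only
    return "T1"       # no insertion


# Classificators compatible with each paradigm type, as in the table.
ALLOWED_CLASSIFICATORS = {
    "T1": {"K", "K1", "K2", "K3"},
    "T2": {"K4"},
    "T3": {"K4"},
    "T4": {"K4"},
}


def is_consistent(classificator: str, insertion_numbers) -> bool:
    """True if the classificator is compatible with the paradigm type."""
    return classificator in ALLOWED_CLASSIFICATORS[paradigm_type(insertion_numbers)]
```

For instance, `is_consistent("K4", {"pl"})` holds, while a noun marked K2 with
an insertion in the singular would be flagged as inconsistent.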
In the computer modelling of verbs, two verb components must be
distinguished: constants and variables. In the Udi language, root
morphemes-semes are considered as constants. Proceeding from the peculiarities
of formation, three types of constants can be distinguished: C = (C1, C2, C3),
where C1 = N-Stem; C2 = IM; C3 = Pr.
The "screeve" paradigms of the 1st series are produced on the basis of constant
C2, the screeves of the 2nd series on the basis of constant C3, and the screeves
of the 3rd series on the basis of constant C1. With a view to constructing a
formal model of the verb paradigm (Pv), the following paradigm parameters are
defined:
Pv = (P1, P2, P3, P4, P5, P6)
where P1 - syntactic construction of verb,
P2 - parameter of stem structure,
P3 - type of constant (C),
P4 - person marker parameter,
P5 - parameter defining screeve-producing morpheme,
P6 - parameter of causation.
There is certain interdependence between the verb paradigm parameters:
P3 = P3(P2)
where P2 = (An; Bn)
P4 = P4(P1)
where P1 = {N-K, E-K, D-K, G-K}
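The verb parameters can be gathered into a single record, with the stated link
between screeve series and constants kept as a lookup table. In the sketch
below the class name, field names and the placeholder values in the example
are assumptions; the dependencies P3 = P3(P2) and P4 = P4(P1) are not
implemented, since their tables are not given here.

```python
from dataclasses import dataclass

# Series of screeves -> constant on which its paradigms are produced.
SERIES_TO_CONSTANT = {1: "C2", 2: "C3", 3: "C1"}
CONSTANT_TYPES = {"C1": "N-Stem", "C2": "IM", "C3": "Pr"}
CONSTRUCTIONS = ("N-K", "E-K", "D-K", "G-K")   # admissible values of P1


@dataclass(frozen=True)
class VerbParadigm:
    construction: str      # P1: syntactic construction of the verb
    stem_structure: str    # P2: parameter of stem structure
    constant: str          # P3: type of constant (C1, C2 or C3)
    person_marker: str     # P4: person marker parameter
    screeve_morpheme: str  # P5: screeve-producing morpheme
    causation: str         # P6: parameter of causation

    def __post_init__(self):
        if self.construction not in CONSTRUCTIONS:
            raise ValueError(f"unknown construction: {self.construction}")
        if self.constant not in CONSTANT_TYPES:
            raise ValueError(f"unknown constant: {self.constant}")


def constant_for_series(series: int) -> str:
    """Constant used to produce the screeves of the given series."""
    return SERIES_TO_CONSTANT[series]


# Placeholder values ("A1", "PM", "SM", "none") are purely illustrative.
sample = VerbParadigm("N-K", "A1", constant_for_series(1), "PM", "SM", "none")
```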
During the computational processing of a text, the computer analysis of nouns
and verbs is carried out in three stages: (1) the selected wordforms are
marked; (2) the wordform to be analyzed is divided into possible morphemes by
means of the paradigms Ps and Pv and verified against the Udi language
database, i.e., by referring to the electronic dictionary of the Udi language
on the one hand and by using the identifiers of attribute registers on the
other hand; as a result, the stems and segmented morphemes are identified;
(3) if there is no appropriate stem in the database, all permissible wordforms
that may be members of either a declensional or a conjugational paradigm of
the stem to be identified are marked in the text by means of selection.
Depending on a human operator's decision during an interactive dialogue, the
wordform under analysis can be inserted into the database as a new lexical
entry.
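Stages (2) and (3) can be sketched as suffix segmentation against a stem
dictionary, followed by an operator-confirmed insertion. This is a simplified
illustration, not the actual program: the stems and marker sequences are
invented placeholders, and all names are hypothetical.

```python
# Placeholder data, not real Udi: a stem dictionary and the marker
# sequences that the paradigm model would allow after a stem.
STEM_DICTIONARY = {"stem", "root"}
SUFFIXES = ("-pl-x", "-pl-y", "-pl", "-x", "-y", "")


def candidate_splits(wordform: str) -> list:
    """Stage 2a: every (stem, suffix) split whose suffix is a known sequence."""
    splits = []
    for suffix in SUFFIXES:
        if wordform.endswith(suffix) and len(wordform) > len(suffix):
            stem = wordform[:len(wordform) - len(suffix)]
            splits.append((stem, suffix))
    return splits


def analyse(wordform: str) -> dict:
    """Stage 2b: verify the splits against the dictionary.
    Stage 3: if no stem is known, return the candidates for the operator."""
    splits = candidate_splits(wordform)
    known = [(stem, suffix) for stem, suffix in splits if stem in STEM_DICTIONARY]
    if known:
        return {"status": "identified", "analyses": known}
    return {"status": "unknown stem", "candidates": splits}


def confirm_new_entry(stem: str) -> None:
    """Insert a stem after the human operator has approved it."""
    STEM_DICTIONARY.add(stem)
```

A wordform such as "stem-pl-x" is identified at once, whereas "new-x" is
returned with its candidate splits until the operator confirms "new" as a new
lexical entry.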
Concerning The Structure Of Electronic Databank
Of the Udi Language
The history of the study of the Caucasian languages is not long. In comparison
with Indo-European studies, which have centuries-old traditions, Caucasiology
is young: it has existed for only two centuries. The special interest shown by
linguists and by persons who visited the Caucasus on various missions resulted
in the creation of the first brief grammars of the Caucasian languages and of
textual records, which laid the foundation for the scientific study of the
Caucasian languages, and finally in the establishment of Caucasian linguistics
as a separate branch of linguistics.
The intensive development of the newest information technologies and progress
in computational linguistics have opened a new stage in the research of the
Caucasian languages. Today the necessity of electronic databases containing
language data is indisputable. They can be used in creating computer methods
for language research as well as in building computer models of languages.
The Caucasian languages are distinguished by the complexity of their
phonological systems and morphological structures. Grammatical categories such
as class, version and causation, the system of conversion rules, the existence
of the ergative construction, etc. considerably increase the size of the verb
and noun paradigms, which greatly complicates the formalization of the
morphology of these languages.
The work we are introducing is part of a project that is being actively
carried out at the Arn. Chikobava Institute of Linguistics. The project
includes an electronic databank which integrates Caucasian language databases
of three types: databases with textual data, electronic dictionaries and an
electronic bibliography. Here we introduce a part of one of these databases,
the electronic dictionary of the Udi language, which illustrates the main
principles of building the databases included in this project.
The electronic dictionary has been created using the well-known database
management system Visual FoxPro for Windows. The set of dictionary units forms
the database structure. As dictionary units the database contains
form-variable words (nouns, adjectives, numerals, pronouns, verbs) and
form-invariable ones (adverbs). A form-variable word is accompanied by its
grammatical characteristics, arranged in the main, morphological and
lexicological fields. A special index (a structural and morphological
characteristic) is assigned to the dictionary unit in the main and
morphological fields, which makes it possible to solve an important search
problem: using this index, we can retrieve a set of similar lexical data. Let
us separately consider the representation of nouns, adjectives, numerals, and
pronouns in the database.
The main field for a noun contains the dictionary unit, its translation (into
Georgian) and an index (the structural characteristic of the stem). The
morphological field contains grammatical forms in accordance with their
morphological categories: the plural form and the case marks for the main
(ergative, genitive, dative) and postpositive (locative) cases. Where adding
the case mark causes phonetic changes (mainly in the plural paradigm of the
noun), the reconstructions are given as well.
Depending upon how the noun is declined in singular and plural number, 5
types of nouns are distinguished:
1. The nouns for which the singular and plural paradigms are produced by the
case marks only;
2. The nouns for which "the insertion" can precede the case mark in the singular
paradigm;
3. The nouns for which "the insertion" can precede the case mark in the plural
paradigm;
4. The nouns for which "the insertion" can precede the case mark in both
singular and plural paradigm;
5. The nouns which decline in a special way.
The index given in the morphological field reflects the morphological
typization of the noun and is used for the retrieval of morphologically
identical units.
The lexicological fields are divided into the following parts.
The subfield C1 contains the derivation forms reflecting derivative possibilities
of the lexical unit, e.g. kul - hand, kulnut - handless.
The subfield C2 contains the index which enables us to find the ideographic
groups such as parts of body of man, animals and birds, utensil, dishes,
cookery, food stuff, etc.
The subfield C3 contains fixed phraseological expressions and syntagmas in
which the dictionary unit participates most frequently, and the words close to
it in meaning. In fact, this subfield reflects semantic
actants of the lexeme: paradigmatic (e.g. ṭul - grapes; ṭulla imaǯi -
vintage; ġoma - bunch of grapes; gila - berry; č̣aṗ - vine; č̣aṗluġ -
vineyard; φi - wine) and syntagmatic (kul - hand; kin kaša - finger;
kulkex biq̇sun - to shake hands; muš - wind; mušen puntexa - the
wind blows).
The subfield C4 contains the synonyms (e.g. for the lexeme č̣aṗluġ - vineyard,
the synonym "ga " is given in the subfield C4).
The subfield C5 contains the word etymology:
a) the common Caucasian stem (Indo-European or other) is given (e.g. in the
etymological subfield for the lexeme - head it is pointed out that common
Caucasian "L" corresponds to Udi "L", "B" to "B", and "T" to zero);
b) bibliography (author, book, page);
c) corresponding forms in other Caucasian languages, such as Avarian, Dargian,
Lakian, Lezghinian, etc.
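The field layout described above for a noun (main field, morphological field,
lexicological subfields C1 to C5) can be sketched as a record type with a
simple retrieval by ideographic group. The class and field names are
assumptions, the assignment of "kul" to the group "parts of body" is an
illustrative guess, and English glosses are used here instead of the Georgian
translations stored in the actual database.

```python
from dataclasses import dataclass, field


@dataclass
class NounEntry:
    # Main field
    headword: str
    translation: str
    structural_index: str = ""          # structural characteristic of the stem
    # Morphological field
    morphological_index: str = ""       # morphological type of the noun
    # Lexicological field
    derivations: dict = field(default_factory=dict)   # C1: derived forms
    ideographic_group: str = ""                       # C2: thematic group
    syntagmas: dict = field(default_factory=dict)     # C3: fixed expressions
    synonyms: list = field(default_factory=list)      # C4: synonyms
    etymology: str = ""                               # C5: etymological notes


ENTRIES = [
    NounEntry("kul", "hand",
              derivations={"kulnut": "handless"},
              ideographic_group="parts of body",      # assumed grouping
              syntagmas={"kin kaša": "finger",
                         "kulkex biq̇sun": "to shake hands"}),
    NounEntry("muš", "wind",
              syntagmas={"mušen puntexa": "the wind blows"}),
]


def find_by_group(entries, group: str) -> list:
    """Headwords of all entries that belong to the given ideographic group."""
    return [entry.headword for entry in entries if entry.ideographic_group == group]
```

Retrieval by the structural or morphological index would work in the same way
as `find_by_group`, once index values are assigned.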
The electronic dictionary of Udi language is a morphological-lexicological
dictionary which contains both morphological and lexicological characteristics
of parts of speech. To a certain degree, syntactic requirements of the language
are taken into account as well. For instance, the adjectives are represented in
two ways in the database; as an attributive word (in such case, it has no
morphological field, since it is form-invariable by number and case), and as a
substantivized word (in this case, the adjective's basic structure in the main
reflects the structure of the noun, and it has the main, morphological and
lexicological fields). The subfields of the adjective's lexicological field
are structured like those of the noun, though unlike the noun the adjective
has an antonym subfield and no ideographic subfield. The derivation subfield
(C1) of the adjective is the same in both cases (as it is for nouns) and
demonstrates the adjective's derivational ability, while C3 contains fixed
phraseological expressions and those syntagmas in which the given adjective
participates most frequently. For example, pis - bad; pis-luġ - evil;
pis-uḳla - spiteful (literally: with evil heart); pis-baksun - be out of temper;
amc̣io - empty; bul amc̣i - rattle-brained; kul amc̣i - lit.: with empty hands;
amc̣i besun - desolation; mac̣io - white; mac̣i luġ - whiteness; mac̣i kul -
lime (lit.: white ground); mac̣i xe - vodka (lit.: white water); φoφlin mac̣io -
albumen; bul mac̣i - grey-headed (lit.: white-headed); č́o mac̣i - right (lit.:
with white mouth). The antonyms are given in the antonym subfield, e.g.
šel _ pis – (good - bad); buj _ amc̣i – (full – empty); mac̣i _ májn –
(white - black), etc.
The numeral database in the Udi electronic dictionary contains the ordinal
numbers, cardinal numbers and distributive words. Like the adjectives, they
are included in the database in two ways: as attributive words and as
substantivized words. In the first case, being uninflected, the numeral is
represented without a morphological field, while in the second case it is
represented like other substantivized wordforms.
The lexicological field for the numeral consists of the subfields C1, C2, C3
and C4; accordingly, it demonstrates the numeral's derivational ability and
its participation in syntagmas and fixed phraseological expressions, and gives
its synonyms and etymology. For instance, so - one; camǯi - the first; sã-sã -
one by one; sa q̇ärän / sa jak - one time. The database is based on the
material of the Vartashen dialect, though often (mainly in the synonym
subfield) material of the Nizh dialect is present as well. For
example, wu'ġ_wic̣ 70, C3 – xibq̇o _ wic̣ (Vart.), jetmiš (Turkish-Azerb.).
The dictionary includes the following pronouns: personal, demonstrative,
possessive, interrogative, reflexive, and indefinite pronouns. Unlike other
dictionary units, the pronoun's lexicological field contains only the
etymological subfield. For example, the subfield C4 of the dictionary unit
for - you provides the information that the common Caucasian n gives n in the
Udi language. The related lexemes in other Caucasian languages are given
here: mun (Avarian), men (Andian), mene (Akhvakhian), ma
(Dido), un (Dargian), un ← *wun (Archibian).
It is noteworthy that the Udi language database we are introducing is created
using the Georgian transcription system; its conversion into the Latin
transcription is not difficult. In this case the database will only contain the Udi
language material without translation.
The given structure enables us to carry out different kinds of data processing
for the language studied and to process different types of queries, including
data retrieval in the form of various dictionaries (bilingual, stem,
ideographic, synonym, etymological, inverse, etc.).
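Two of the dictionary views named above, the inverse dictionary and the
reversed bilingual dictionary, can be derived from the same word list. The
sketch below uses a handful of Udi words quoted in this text with English
glosses; the function names are hypothetical. Words written with combining
diacritics would need Unicode-aware reversal, which this minimal version does
not attempt.

```python
# A few headwords quoted in the text, with English glosses.
WORDS = {
    "kul": "hand",
    "gila": "berry",
    "pis": "bad",
    "buj": "full",
    "so": "one",
}


def inverse_order(words) -> list:
    """Inverse dictionary: headwords sorted by their spelling read backwards."""
    return sorted(words, key=lambda word: word[::-1])


def reverse_bilingual(words: dict) -> dict:
    """Bilingual dictionary in the opposite direction: gloss -> headword."""
    return {gloss: headword for headword, gloss in words.items()}


inverse = inverse_order(WORDS)
by_gloss = reverse_bilingual(WORDS)
```

Sorting by the reversed spelling groups words with the same ending, which is
the usual purpose of an inverse dictionary in morphological work.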
The creation of analogous databases for other Caucasian languages, along with
the development of the relevant software, will accelerate the comprehensive
study of the Caucasian languages and will be the first and most important
stage in the progress of the modern methodology of their study.