Ways of trying in Russian: clustering behavioral profiles

01 Jan 2006, Vol. 2, Iss. 1, pp. 23-60
DAGMAR DIVJAK and STEFAN TH. GRIES*
Abstract
This article proposes a methodology for addressing three long-standing problems of near synonym research. First, we show how the internal structure of a group of near synonyms can be revealed. Second, we deal with the problem of distinguishing the subclusters and the words in those subclusters from each other. Finally, we illustrate how these results identify the semantic properties that should be mentioned in lexicographic entries. We illustrate our methodology with a case study on nine near synonymous Russian verbs that, in combination with an infinitive, express TRY. Our approach is corpus-linguistic and quantitative: assuming a strong correlation between semantic and distributional properties, we analyze 1,585 occurrences of these verbs taken from the Amsterdam Corpus and the Russian National Corpus, supplemented where necessary with data from the Web. We code each particular instance in terms of 87 variables (a.k.a. ID tags), i. e., morphosyntactic, syntactic and semantic characteristics that form a verb’s behavioral profile. The resulting co-occurrence table is evaluated by means of a hierarchical agglomerative cluster analysis and additional quantitative methods. The results show that this behavioral profile approach can be used (i) to elucidate the internal structure of the group of near synonymous verbs and present it as a radial network structured around a prototypical member and (ii) to make explicit the scales of variation along which the near synonymous verbs vary.
Key words: (near) synonymy, behavioral profiles, ID tags, (hierarchical agglomerative) cluster analysis, t-values, z-scores, Russian, verbs of trying
Corpus Linguistics and Linguistic Theory 2-1 (2006), 23-60    1613-7027/06/0002-0023
DOI 10.1515/CLLT.2006.002
Walter de Gruyter
1. Introduction
An intriguing meaning relation in natural language is that of “(near) sameness of meaning”, i. e., (near) synonymy. Synonymy has received relatively little attention in Western linguistics in recent years. It is said to “waste” the limited lexical resources on one and the same semantic unit, and therefore it should not exist in an ideal one-to-one semiotic system (De Jonge 1993: 521; Taylor 2003: 264). But, even if synonyms name one and the same thing, they name it in different ways; they present different perspectives on a situation. And this provides interesting information on how a particular semantic and related conceptual space is structured. In what follows we will show how a quantitative corpus linguistic approach that is informed by findings from cognitive linguistics provides a solid empirical basis for theoretical modeling.
1.1. Problems of research on near synonymy
Near synonymy is an area in which the theoretical interests of the lexical semanticist and the applied interests of the lexicographer converge. It is, however, also a particularly problematic area, both from a general, lexical-semantic point of view and from a more specific, synonymy-related stance.
Defining any word’s meaning or distinguishing between its senses is a rather elusive endeavor; as a consequence, cases of ambiguity and vagueness are difficult to deal with on a principled and objective basis. This situation would be aggravated even further if the semantic tests used for distinguishing senses of a single word, i. e., cases of polysemy, were applied to the study of groups that contain two or more semantically similar words, i. e., synonymy. Since no two words ever are exact synonyms, but instead always differ from a syntactic, semantic and/or pragmatic point of view, among other things, scholars assume the existence of a scale of synonymy (Cruse 1986: 267-268; Taylor 2003: 265). Although such a scalar view on synonymy obviates the need for clear-cut decisions, it leaves the analyst with a multitude of possible scalar distinctions to choose from. Some of these problems, which are germane to synonymy research, are summarized briefly in what follows. As examples, we will use tentative verbs (i. e., verbs that express try-ing) in Russian.
1.1.1. The delineation problem
The first main problem of synonym research concerning word X, e. g., a Russian tentative verb, is how to decide which near synonyms of X should be mentioned in X’s entry and which ones should be left out. A check of three major works dealing with tentative verbs reveals that no two of them list exactly the same verbs.
Consider as a first example Apresjan et al. (1999: 303-308), who, following the principles of the semantic metalanguage (cf. below for some details), list only the verbs in (1) as verbs that share the meaning “make an effort in order to carry out a certain action, while the subject or the speaker does not know whether the effort in question will lead to the necessary result”:
(1) probovat’ (‘try’), pytat’sja (‘try, attempt’), starat’sja (‘try, endeavor’), silit’sja (‘try, make efforts’)
By contrast, Apresjan et al. (1999: 308) list the verbs in (2) as also exhibiting a significant part of the general semantic structure attributed to the verbs in (1), although there is not enough overlap for them to be considered full-fledged near synonyms of the verbs in (1):
(2) dobivat’sja (‘get, obtain’), domogat’sja (‘seek, solicit’), chotet’ (‘want, intend’), namerevat’sja (‘intend, mean’), stremit’sja (‘strive, try’), rvat’sja (‘strain, burst to’), poryvat’sja (‘try, endeavor’), bit’sja (‘struggle’), osilit’ (‘manage’), tščit’sja (‘try, endeavor’), pyžit’sja (‘go all out’), norovit’ (‘try, strive to, aim at’), ispytyvat’ (‘test’), probovat’₂ (‘test’)
A second work, Černova (1996: 87), counts nine verbs that fit the ‘try, attempt’-stage of the frame ‘plan – accomplishment of the plan’. They are listed under (3):
(3) probovat’ (‘try’), pytat’sja (‘try, attempt’), starat’sja (‘try, endeavor’), norovit’ (‘try, strive to, aim at’), silit’sja (‘try, make efforts’), tščit’sja (‘try, endeavor’), iskat’ (‘look for, seek’), domogat’sja (‘seek, solicit’), ne počesat’sja (‘(not) to be itching to’)
Černova (1996: 87) states that “a situation of trial and attempt arises when the subject is not convinced that s/he will reach the desired result. This uncertainty is brought about by the presence of external or internal obstacles and by the subject’s lack of experience to carry out the action”.
Finally, the dictionary of synonyms by Evgen’eva (2001, 2: 323, 496) includes as synonyms of pytat’sja (‘try, attempt’) only probovat’ (‘try’). The six verbs in (4) are paraphrased as meaning ‘make an effort in order to obtain or realize something’ and are given as synonyms for starat’sja (‘try, endeavor’):
(4) stremit’sja (‘strive, try’), pytat’sja (‘try, attempt’), norovit’ (‘try, strive to, aim at’), silit’sja (‘try, make efforts’), tščit’sja (‘try, endeavor’), pyžit’sja (‘go all out’)
Comparing the verbs presented in Apresjan et al. (1999) with those listed by Černova (1996) and Evgen’eva (2001) exemplifies a typical problem within synonymy research: different (groups of) researchers arrive at rather distinct sets of near synonymous verbs. Apresjan et al.’s core group of tentative verbs consists of four verbs, while Černova’s consists of nine and Evgen’eva distinguishes two groups containing two and six verbs respectively. In other words, Apresjan et al. draw the line for near synonyms, listed in (1), much more tightly than the other two researchers. At the same time, they include reference to groups of “semantically similar verbs”, enumerated in (2); the semantic criteria for inclusion in this latter type of group are much looser, as already becomes apparent from the translations that, apart from verbs that express try, include verbs like test, struggle and manage.¹ The method to be outlined below will address this problem of delineation in more detail.
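The disagreement between the three sources can also be expressed numerically. The following Python sketch is only an illustration and not part of the analysis reported in this paper: it takes the verb lists from (1), (3) and (4) and computes their pairwise overlap. Two choices are assumptions made for the example: the Jaccard coefficient as the overlap measure, and the pooling of Evgen’eva's two entries (headwords included) into a single set.

```python
from itertools import combinations

# Verb lists as given in (1), (3) and (4) above.
apresjan_core = {"probovat'", "pytat'sja", "starat'sja", "silit'sja"}
cernova = {"probovat'", "pytat'sja", "starat'sja", "norovit'", "silit'sja",
           "tščit'sja", "iskat'", "domogat'sja", "ne počesat'sja"}
# Assumption: Evgen'eva's two synonym entries are pooled, headwords included.
evgeneva = {"pytat'sja", "probovat'", "starat'sja", "stremit'sja",
            "norovit'", "silit'sja", "tščit'sja", "pyžit'sja"}

sources = {
    "Apresjan et al. (1)": apresjan_core,
    "Černova (3)": cernova,
    "Evgen'eva (4)": evgeneva,
}


def jaccard(a, b):
    """Verbs listed by both sources, as a share of the verbs listed by either."""
    return len(a & b) / len(a | b)


# Pairwise overlap between the three sources (assumed measure: Jaccard).
overlap = {
    (m, n): jaccard(sources[m], sources[n])
    for m, n in combinations(sources, 2)
}
# Verbs on which all three sources agree.
shared_by_all = set.intersection(*sources.values())

for (m, n), score in overlap.items():
    print(f"{m} vs {n}: {score:.2f}")
print("listed by all three:", sorted(shared_by_all))
```

Under these assumptions, the only verbs on which all three sources agree are the four verbs in (1), which shows how narrow the common ground is.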
1.1.2. The structuring problem
The comparison of the three researchers’ groups of tentative verbs does not only show that the groups differ in terms of the number of verbs that are considered as near synonyms; it also reveals differences concerning how the groups of verbs are structured. For example, Apresjan et al. (1999: 303-308) treat probovat’ and starat’sja as belonging to a small core group of tentative verbs whereas Černova (1996: 87) classifies them as belonging to a much larger group of tentative verbs. Evgen’eva (2001: 323, 496), however, keeps probovat’ separate from nearly all other verbs, which are classed together with starat’sja. Conflicting analyses such as these underscore the general need for more objective and thus replicable lexical-semantic analyses. The internal structure of a group of near synonyms is an issue that has hitherto remained largely undiscussed in the literature. With the notable exception of Edmonds and Hirst (2002), many if not most analyses we are aware of tend to treat synonyms in pairs; cf. standard textbook references (cf., e. g., Cruse 1986; Saeed 1997), lexical-semantic studies (cf., e. g., Geeraerts 1985; Mondry and Taylor 1992), corpus-based studies (cf., e. g., Gries 2001, 2003; Kjellmer 2003; Taylor 2002), etc. However, synonym dictionaries and thesauri typically cross-reference individual words, thus revealing that pairs of near synonyms form larger series of semantically similar words and word fields. This raises the question of whether and, if so, how semantically coherent categories are structured internally. We will take up this question below.
1.1.3. The description problem
The third problem is one of comparing potentially synonymous words. A prerequisite for measuring the similarity between words is having a means to compare them. While traditional semantic analysis has advanced several tests to uncover subtle differences between senses of lexemes and/or between lexemes (cf. Cruse 1986 for insightful discussion), these tests offer relative judgments, e. g., which senses and/or lexemes are more or less similar in meaning, without specifying the magnitude of the difference or providing an objective motivation for it. In addition, given that these tests are not designed to produce precise results, they have only low interrater reliability and replicability. What is needed, therefore, is a reliable means to capture differences between different words’ meanings.
1.2. Objectives and overview of the present paper
In this paper, we develop and outline a largely objective and verifiable approach to tackle two of the three above-mentioned problems, i. e., structuring and description (for delineation see Divjak 2004, in press). The paper is organized as follows. Section 2 presents our methodology in quite some detail. More specifically, we will comprehensively discuss the corpus data we investigate (Section 2.1) as well as all the variables making up a behavioral profile (Section 2.2). In addition we will introduce the statistical method of hierarchical agglomerative cluster analysis, which we use for inferring structure from the data (Section 2.3). Section 3 presents the results of cluster analysis applied to the full set of variables. Section 4 illustrates how the results from the cluster analysis and additional statistics derived from the behavioral profiles can be used to develop a radial network of the verbs investigated; in addition, it shows how this approach facilitates identifying subtle semantic differences between near synonymous verbs. Finally, Section 5 summarizes our main results and outlines possibilities for future research.
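As a rough illustration of the steps just outlined (annotating corpus hits, deriving behavioral profiles, and clustering them bottom-up), consider the following minimal Python sketch. It is not the implementation used in this study. The verb labels, the two variables and all counts are invented, and the Euclidean distance and average linkage are assumptions chosen only to keep the example short; Section 2.3 introduces the method actually used.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented toy annotations: each corpus hit is (verb, {variable: level}).
# Verb labels, variables and counts are hypothetical, not data from the study.
hits = (
    [("verb_a", {"aspect": "imperfective", "clause": "main"})] * 3
    + [("verb_a", {"aspect": "perfective", "clause": "main"})]
    + [("verb_b", {"aspect": "imperfective", "clause": "main"})] * 3
    + [("verb_b", {"aspect": "perfective", "clause": "subordinate"})]
    + [("verb_c", {"aspect": "perfective", "clause": "subordinate"})] * 3
    + [("verb_c", {"aspect": "perfective", "clause": "main"})]
)


def behavioral_profiles(hits):
    """Map each verb to the share of its hits carrying each variable level."""
    counts = defaultdict(Counter)
    totals = Counter()
    for verb, tags in hits:
        totals[verb] += 1
        for variable, level in tags.items():
            counts[verb][f"{variable}={level}"] += 1
    return {
        verb: {tag: n / totals[verb] for tag, n in tag_counts.items()}
        for verb, tag_counts in counts.items()
    }


def distance(p, q):
    """Euclidean distance between two profiles (an assumed metric)."""
    tags = set(p) | set(q)
    return sum((p.get(t, 0.0) - q.get(t, 0.0)) ** 2 for t in tags) ** 0.5


def agglomerate(profiles):
    """Bottom-up clustering with average linkage (an assumed amalgamation rule).

    Returns the merges in order as (cluster_a, cluster_b, linkage_distance).
    """
    def linkage(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(verb,) for verb in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters that are currently closest to each other.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


profiles = behavioral_profiles(hits)
merges = agglomerate(profiles)
for a, b, d in merges:
    print(a, "+", b, f"at distance {d:.3f}")
```

With these toy counts, verb_a and verb_b are merged first, because their profiles differ only in the distribution of clause types, and verb_c joins them last. The order of merges is what a dendrogram would display.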
We use the example of near synonymous tentative verbs in Russian (introduced above) to exemplify our method, but wish to emphasize from the outset that it is applicable to other areas of lexical semantics as well; in fact, the method has already been applied successfully to the analysis of a highly polysemous English verb, run (cf. Gries 2006). In other words, this study has a theoretical and methodological focus: it aims at elucidating the structure of the category of tentative verbs and finding the elements that are of interest for describing the prototype usage of each verb; it does not pretend to offer a full-fledged “lexicographical portrait” (Apresjan et al. 1995) of each tentative verb.
In this paper, we approach near synonyms from a quantitative, corpus-linguistic perspective. More specifically and like most other corpus-linguistic approaches to lexicography/semantics, we make use of an assumed correlation between distributional similarity on the one hand and