What contributions have the authors mentioned in the paper "Ways of trying in Russian: clustering behavioral profiles"?

This article proposes a methodology for addressing three long-standing problems of near synonym research. First, the authors show how the internal structure of a group of near synonyms can be revealed. The authors illustrate their methodology with a case study on nine near synonymous Russian verbs that, in combination with an infinitive, express TRY. Their approach is corpus-linguistic and quantitative: assuming a strong correlation between semantic and distributional properties, the authors analyze 1,585 occurrences of these verbs taken from the Amsterdam Corpus and the Russian National Corpus, supplemented where necessary with data from the Web. The results show that this behavioral profile approach can be used (i) to elucidate the internal structure of the group of near synonymous verbs and present it as a radial network structured around a prototypical member and (ii) to make explicit the scales of variation along which the near synonymous verbs vary.

What future work is suggested in "Ways of trying in Russian: clustering behavioral profiles"?

Obviously, it is possible to extend this analysis to include such data, yet in that case it would be of utmost importance not to neglect the variation that is induced by this factor. Further empirical underpinning for exploratory research using clustering algorithms can be sought in different directions, of which the authors mention the two most frequently used. The authors hope that this paper will stimulate future research along the lines suggested, given that they have shown how rewarding behavioral profiles and the proposed methods for their evaluation are for the analysis of near synonyms in particular and lexical-semantic research in general. In addition, results of this type may also be relevant to researchers from neighboring disciplines, such as psycholinguistics, when it comes to formulating and evaluating hypotheses concerning the interaction between grammar and lexicon in language acquisition and the mental reality of radial networks.

(Open Access) Ways of trying in Russian: clustering behavioral profiles (2006) | Dagmar Divjak

Ways of trying in Russian: clustering behavioral profiles

DAGMAR DIVJAK and STEFAN TH. GRIES*

Abstract

This article proposes a methodology for addressing three long-standing

problems of near synonym research. First, we show how the internal struc-

ture of a group of near synonyms can be revealed. Second, we deal with

the problem of distinguishing the subclusters and the words in those sub-

clusters from each other. Finally, we illustrate how these results identify

the semantic properties that should be mentioned in lexicographic entries.

We illustrate our methodology with a case study on nine near synonymous

Russian verbs that, in combination with an infinitive, express TRY.

Our approach is corpus-linguistic and quantitative: assuming a strong

correlation between semantic and distributional properties, we analyze

1,585 occurrences of these verbs taken from the Amsterdam Corpus and

the Russian National Corpus, supplemented where necessary with data

from the Web. We code each particular instance in terms of 87 variables

(a.k.a. ID tags), i. e., morphosyntactic, syntactic and semantic character-

istics that form a verb’s behavioral profile. The resulting co-occurrence ta-

ble is evaluated by means of a hierarchical agglomerative cluster analysis

and additional quantitative methods. The results show that this behavioral

profile approach can be used (i) to elucidate the internal structure of the

group of near synonymous verbs and present it as a radial network struc-

tured around a prototypical member and (ii) to make explicit the scales of

variation along which the near synonymous verbs vary.

Key words: (near) synonymy, behavioral profiles, ID tags, (hierarchical

agglomerative) cluster analysis, t-values, z-scores, Russian,

verbs of trying

1. Introduction

An intriguing meaning relation in natural language is that of “(near)

sameness of meaning”, i. e., (near) synonymy. Synonymy has received

Corpus Linguistics and Linguistic Theory 2–1 (2006), 23–60 1613-7027/06/0002–0023

DOI 10.1515/CLLT.2006.002

© Walter de Gruyter

relatively little attention in Western linguistics in recent years. It is said

to “waste” the limited lexical resources on one and the same semantic

unit, and therefore it should not exist in an ideal one-to-one semiotic

system (De Jonge 1993: 521; Taylor 2003: 264). But, even if synonyms

name one and the same thing, they name it in different ways; they pre-

sent different perspectives on a situation. And this provides interesting

information on how a particular semantic and related conceptual space

is structured. In what follows we will show how a quantitative corpus

linguistic approach that is informed by findings from cognitive linguis-

tics provides a solid empirical basis for theoretical modeling.

1.1. Problems of research on near synonymy

Near synonymy is an area in which the theoretical interests of the lexical

semanticist and the applied interests of the lexicographer converge. It is,

however, also a particularly problematic area, both from a general,

lexical-semantic point of view and from a more specific, synonymy-related

stance.

Defining any word’s meaning or distinguishing between its senses is a

rather elusive endeavor; as a consequence, cases of ambiguity and vagueness

are difficult to deal with on a principled and objective basis. This

situation would be aggravated further if the semantic tests used for

distinguishing senses of a single word, i. e., cases of polysemy, were applied

to the study of groups that contain two or more semantically similar

words, i. e., synonymy. Since no two words are ever exact synonyms,

but instead always differ from a syntactic, semantic and/or pragmatic

point of view, among other things, scholars assume the existence of a

scale of synonymy (Cruse 1986: 267–268; Taylor 2003: 265). Although

such a scalar view on synonymy obviates the need for clear-cut decisions,

it leaves the analyst with a multitude of possible scalar distinctions to

choose from. Some of these problems, which are germane to synonymy

research, are summarized briefly in what follows. As examples, we will

use tentative verbs (i. e., verbs that express try-ing) in Russian.

1.1.1. The delineation problem

The first main problem of synonym research concerning word X, e. g., a

Russian tentative verb, is how to decide which near synonyms of X

should be mentioned in X’s entry and which ones should be left out. A

check of three major works dealing with tentative verbs reveals that none

of them lists exactly the same verbs.

Consider as a first example Apresjan et al. (1999: 303–308), who,

following the principles of the semantic metalanguage (cf. below for

some details), list only the verbs in (1) as verbs that share the meaning

“make an effort in order to carry out a certain action, while the subject

or the speaker does not know whether the effort in question will lead to

the necessary result”:

(1) probovat’ (‘try’), pytat’sja (‘try, attempt’), starat’sja (‘try, endeavor’),

silit’sja (‘try, make efforts’)

By contrast, Apresjan et al. (1999: 308) list the verbs in (2) as also

exhibiting a significant part of the general semantic structure attributed

to the verbs in (1), although there is not enough overlap for them to be

considered full-fledged near synonyms of the verbs in (1):

(2) dobivat’sja (‘get, obtain’), domogat’sja (‘seek, solicit’), chotet’ (‘want,

intend’), namerevat’sja (‘intend, mean’), stremit’sja (‘strive, try’),

rvat’sja (‘strain, burst to’), poryvat’sja (‘try, endeavor’), bit’sja (‘struggle’),

osilit’ (‘manage’), tščit’sja (‘try, endeavor’), pyžit’sja (‘go all

out’), norovit’ (‘try, strive to, aim at’), ispytyvat’ (‘test’), probovat’

(‘test’)

A second work, Černova (1996: 87), counts nine verbs that fit the ‘try,

attempt’-stage of the frame ‘plan – accomplishment of the plan’. They

are listed under (3):

(3) probovat’ (‘try’), pytat’sja (‘try, attempt’), starat’sja (‘try, endeavor’),

norovit’ (‘try, strive to, aim at’), silit’sja (‘try, make efforts’), tščit’sja

(‘try, endeavor’), iskat’ (‘look for, seek’), domogat’sja (‘seek, solicit’),

ne počesat’sja (‘(not) to be itching to’)

Černova (1996: 87) states that “a situation of trial and attempt arises

when the subject is not convinced that s/he will reach the desired result.

This uncertainty is brought about by the presence of external or internal

obstacles and by the subject’s lack of experience to carry out the action”.

Finally, the dictionary of synonyms by Evgen’eva (2001, 2: 323, 496)

includes as synonyms of pytat’sja (‘try, attempt’) only probovat’ (‘try’).

The six verbs in (4) are paraphrased as meaning ‘make an effort in order

to obtain or realize something’ and are given as synonyms for starat’sja

(‘try, endeavor’):

(4) stremit’sja (‘strive, try’), pytat’sja (‘try, attempt’), norovit’ (‘try, strive

to, aim at’), silit’sja (‘try, make efforts’), tščit’sja (‘try, endeavor’),

pyžit’sja (‘go all out’)

Comparing the verbs presented in Apresjan et al. (1999) with those

listed by Černova (1996) and Evgen’eva (2001) exemplifies a typical

problem within synonymy research: different (groups of) researchers arrive

at rather distinct sets of near synonymous verbs. Apresjan et al.’s

core group of tentative verbs consists of four verbs, while Černova’s

consists of nine and Evgen’eva distinguishes two groups containing

two and six verbs respectively. In other words, Apresjan et al. draw the line

for near synonyms, listed in (1), much more tightly than the other two researchers.

At the same time, they include reference to groups of “semantically

similar verbs”, enumerated in (2); the semantic criteria for inclusion

in this latter type of group are much looser, as already becomes apparent

from the translations that, apart from verbs that express try, include

verbs like test, struggle and manage.

The method to be outlined below

will address this problem of delineation in more detail.
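
The disagreement between the three sources can be made concrete with a small set comparison. The following Python sketch uses the verbs exactly as listed in (1), (3) and (4). Two things in it are assumptions made for illustration and are not taken from the text: Evgen’eva's two synonym groups are pooled into one set (together with their headwords pytat’sja and starat’sja), and the Jaccard coefficient is used as the overlap measure.

```python
# Verb sets as listed in (1), (3) and (4); apostrophes mark palatalization.
apresjan = {"probovat'", "pytat'sja", "starat'sja", "silit'sja"}
cernova = {
    "probovat'", "pytat'sja", "starat'sja", "norovit'", "silit'sja",
    "tščit'sja", "iskat'", "domogat'sja", "ne počesat'sja",
}
# Assumption: Evgen'eva's two groups are pooled, headwords included.
evgeneva = {"pytat'sja", "probovat'"} | {
    "starat'sja", "stremit'sja", "pytat'sja", "norovit'",
    "silit'sja", "tščit'sja", "pyžit'sja",
}


def jaccard(a, b):
    """Share of verbs common to both sets among all verbs in either set."""
    return len(a & b) / len(a | b)


sources = {"Apresjan et al.": apresjan, "Černova": cernova, "Evgen'eva": evgeneva}
names = list(sources)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        score = jaccard(sources[first], sources[second])
        print(f"{first} vs. {second}: {score:.2f}")

# Verbs on which all three sources agree.
common_core = apresjan & cernova & evgeneva
print(sorted(common_core))
```

Under these assumptions the verbs shared by all three sources are just the four verbs in (1), which is one way of seeing that Apresjan et al. draw the tightest boundary.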

1.1.2. The structuring problem

The comparison of the three researchers’ groups of tentative verbs shows

not only that the groups differ in terms of the number of verbs

that are considered near synonyms; it also reveals differences concerning

how the groups of verbs are structured. For example, Apresjan

et al. (1999: 303–308) treat probovat’ and starat’sja as belonging to a

small core group of tentative verbs whereas Černova (1996: 87) classifies

them as belonging to a much larger group of tentative verbs. Evgen’eva

(2001: 323, 496), however, keeps probovat’ separate from nearly all other

verbs, which are classed together with starat’sja. Conflicting analyses

such as these underscore the general need for more objective and thus

replicable lexical-semantic analyses. The internal structure of a group of

near synonyms is an issue that has hitherto remained largely undiscussed

in the literature. With the notable exception of Edmonds and Hirst

(2002), many if not most analyses we are aware of tend to treat synonyms

in pairs; cf. standard textbook references (e. g., Cruse 1986;

Saeed 1997), lexical-semantic studies (e. g., Geeraerts 1985; Mondry

and Taylor 1992), corpus-based studies (e. g., Gries 2001, 2003;

Kjellmer 2003; Taylor 2002), etc. However, synonym dictionaries and

thesauri typically cross-reference individual words, thus revealing that

pairs of near synonyms form larger series of semantically similar words

and word fields. This raises the question of whether and, if so, how

semantically coherent categories are structured internally. We will take

up this question below.

1.1.3. The description problem

The third problem is one of comparing potentially synonymous words.

A prerequisite for measuring the similarity between words is having a

means to compare them. While traditional semantic analysis has advanced

several tests to uncover subtle differences between senses of lexemes

and/or between lexemes (cf. Cruse 1986 for insightful discussion),

these tests offer relative judgments, e. g., which senses and/or lexemes are

more or less similar in meaning, without specifying the magnitude of the

difference or providing an objective motivation for it. In addition,

given that these tests are not designed to produce precise results, they

have only low interrater reliability and replicability. What is needed,

therefore, is a reliable means to capture differences between different

words’ meanings.
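
To illustrate what specifying the magnitude of a difference could look like, the following Python sketch turns coded corpus instances into per-verb relative frequencies and ranks the coded properties by how strongly two verbs differ on them. Everything in it is hypothetical: the eight instances, the two variables (`inf_aspect`, `clause`) and their levels are invented toy data, and the helper names are not from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical coded instances (invented for illustration only).
instances = [
    {"verb": "probovat'", "inf_aspect": "perfective", "clause": "main"},
    {"verb": "probovat'", "inf_aspect": "perfective", "clause": "main"},
    {"verb": "probovat'", "inf_aspect": "imperfective", "clause": "subordinate"},
    {"verb": "probovat'", "inf_aspect": "perfective", "clause": "main"},
    {"verb": "starat'sja", "inf_aspect": "imperfective", "clause": "main"},
    {"verb": "starat'sja", "inf_aspect": "imperfective", "clause": "subordinate"},
    {"verb": "starat'sja", "inf_aspect": "perfective", "clause": "main"},
    {"verb": "starat'sja", "inf_aspect": "imperfective", "clause": "main"},
]


def behavioral_profiles(rows, variables):
    """Relative frequency of every (variable, level) pair for each verb."""
    counts = defaultdict(Counter)
    totals = Counter()
    for row in rows:
        totals[row["verb"]] += 1
        for var in variables:
            counts[row["verb"]][(var, row[var])] += 1
    return {
        verb: {key: n / totals[verb] for key, n in tally.items()}
        for verb, tally in counts.items()
    }


def profile_differences(profiles, verb_a, verb_b):
    """Differences in relative frequency, largest absolute difference first."""
    keys = set(profiles[verb_a]) | set(profiles[verb_b])
    diffs = {
        key: profiles[verb_a].get(key, 0.0) - profiles[verb_b].get(key, 0.0)
        for key in keys
    }
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


profiles = behavioral_profiles(instances, ["inf_aspect", "clause"])
ranked = profile_differences(profiles, "probovat'", "starat'sja")
for (variable, level), diff in ranked:
    print(f"{variable}={level}: {diff:+.2f}")
```

In this toy data the two verbs differ only in the aspect of the infinitive and not in clause type, so the ranking both locates the difference and states its size.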

1.2. Objectives and overview of the present paper

In this paper, we develop and outline a largely objective and verifiable

approach to tackle two of the three above-mentioned problems, i. e.,

structuring and description (for delineation see Divjak 2004, in press).

The paper is organized as follows. Section 2 presents our methodology

in quite some detail. More specifically, we will comprehensively discuss

the corpus data we investigate (Section 2.1) as well as all the variables

making up a behavioral profile (Section 2.2). In addition we will intro-

duce the statistical method of hierarchical agglomerative cluster analysis,

which we use for inferring structure from the data (Section 2.3). Section

3 presents the results of cluster analysis applied to the full set of vari-

ables. Section 4 illustrates how the results from the cluster analysis and

additional statistics derived from the behavioral profiles can be used to

develop a radial network of the verbs investigated; in addition, it shows

how this approach facilitates identifying subtle semantic differences be-

tween near synonymous verbs. Finally, Section 5 summarizes our main

results and outlines possibilities for future research.
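
As a rough illustration of how hierarchical agglomerative clustering infers structure from behavioral profiles, the Python sketch below repeatedly merges the two most similar clusters until only one remains. It is a minimal sketch under stated assumptions, not the procedure of Section 2.3: the five profiles are invented, the verb labels are placeholders, and Euclidean distance with average linkage is an arbitrary choice made here for simplicity.

```python
from itertools import combinations
from math import dist

# Hypothetical behavioral profiles: relative frequencies of three invented
# ID tag levels per verb (placeholder labels, toy numbers).
toy_profiles = {
    "verb_A": [0.70, 0.20, 0.10],
    "verb_B": [0.60, 0.30, 0.10],
    "verb_C": [0.20, 0.30, 0.50],
    "verb_D": [0.15, 0.35, 0.50],
    "verb_E": [0.40, 0.40, 0.20],
}


def average_linkage(cluster_1, cluster_2, vectors):
    """Mean Euclidean distance over all cross-cluster pairs of verbs."""
    pairs = [(a, b) for a in cluster_1 for b in cluster_2]
    return sum(dist(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def agglomerate(vectors):
    """Merge the two closest clusters until one is left; return the merges."""
    clusters = [(name,) for name in vectors]
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        height = average_linkage(left, right, vectors)
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left + right)
        merges.append((left, right, height))
    return merges


merge_history = agglomerate(toy_profiles)
for left, right, height in merge_history:
    print(f"{height:.3f}  {left} + {right}")
```

The sequence of merges and their heights is the information a dendrogram displays: verbs merged early at low heights have similar profiles, while late merges separate the larger subclusters.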

We use the example of near synonymous tentative verbs in Russian

(introduced above) to exemplify our method, but wish to emphasize

from the outset that it is applicable to other areas of lexical semantics as

well; in fact, the method has already been applied successfully to the

analysis of a highly polysemous English verb, run (cf. Gries 2006). In

other words, this study has a theoretical and methodological focus: it

aims at elucidating the structure of the category of tentative verbs and

finding the elements that are of interest for describing the prototype

usage of each verb; it does not pretend to offer a full-fledged “lexicographical

portrait” (Apresjan et al. 1995) of each tentative verb.

In this paper, we approach near synonyms from a quantitative, cor-

pus-linguistic perspective. More specifically and like most other corpus-

linguistic approaches to lexicography/semantics, we make use of an as-

sumed correlation between distributional similarity on the one hand and
