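What are the contributions mentioned in the paper "Introduction to WordNet: An On-line Lexical Database"?

The paper introduces WordNet, an on-line lexical reference system whose design is inspired by psycholinguistic theories of human lexical memory. English nouns, verbs, and adjectives are organized into synonym sets, each representing one underlying lexical concept, and different relations link the synonym sets.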

What is the entailment between verbs like bequeath and own?

Like the backward presupposition relation that holds between verbs like fail/succeed and try, the entailment between verbs like bequeath and own is characterized by the absence of temporal inclusion.

What are the common opposition relations among deadjectival verbs?

Many deadjectival verbs formed with a suffix such as -en or -ify inherit opposition relations from their root adjectives: lengthen/shorten, strengthen/weaken, prettify/uglify, for example.

What are the common adjectives that are not gradable?

Relational adjectives, like nouns and unlike descriptive adjectives, are not gradable: *the extremely atomic bomb, like *the extremely atom bomb or *the very baseball game, are not acceptable.

What is the semantic field containing verbs of bodily care and functions?

The semantic field containing verbs of bodily care and functions consists of a number of independent hierarchies that form a coherent semantic field by virtue of the fact that most of the verbs (wash, comb, shampoo, make up; ache, atrophy) select for the same kinds of noun arguments (body parts).

What is the common way to describe opposition relations among verbs?

As in the case of adjectives, much of the opposition among verbs is based on the morphological markedness of one member of an opposed pair, as in the pairs tie/untie and appear/disappear.

Why do relational adjectives have pointers to the nouns?

Because the syntactic and semantic properties of relational adjectives are a mixture of those of adjectives and those of nouns used as noun modifiers, WordNet does not attempt to integrate them into either structure; it maintains a separate file of relational adjectives with pointers to the corresponding nouns.

Why do relational adjectives not have antonyms?

Since relational adjectives do not have antonyms, they cannot be incorporated into the clusters that characterize descriptive adjectives.

What is the problem with a topical thesaurus?

The problem with a topical thesaurus is that two look-ups are required, first on an alphabetical list and again in the thesaurus proper, thus doubling a user’s search time.

What is suspected about the elaborate color terminology available in the languages of industrialized countries?

There is some reason to suspect that the elaborate color terminology available in the languages of industrialized countries is a consequence of technological progress and not a natural linguistic development.

What is the need for a fine-grained sub-classification of creation verbs?

In discussing these verbs, Fellbaum and Kegl (1988) point out that the data suggest a need for a fine-grained sub-classification of creation verbs that distinguishes a class of verbs referring to acts of mental creation (such as fabricate and compose) from verbs denoting the creation from raw materials (such as weave and mold).

(Open Access) Introduction to WordNet: An On-line Lexical Database (1990) | George A. Miller

Introduction to WordNet: An On-line Lexical Database

George A. Miller, Richard Beckwith, Christiane Fellbaum,

Derek Gross, and Katherine Miller

(Revised August 1993)

WordNet is an on-line lexical reference system whose design is inspired by current

psycholinguistic theories of human lexical memory. English nouns, verbs, and adjectives are

organized into synonym sets, each representing one underlying lexical concept. Different

relations link the synonym sets.

Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list. Unfortunately, there is no obvious alternative, no other simple way for lexicographers to keep track of what has been done or for readers to find the word they are looking for. But a frequent objection to this solution is that finding things on an alphabetical list can be tedious and time-consuming. Many people who would like to refer to a dictionary decide not to bother with it because finding the information would interrupt their work and break their train of thought.

In this age of computers, however, there is an answer to that complaint. One

obvious reason to resort to on-line dictionaries—lexical databases that can be read by

computers—is that computers can search such alphabetical lists much faster than people

can. A dictionary entry can be available as soon as the target word is selected or typed

into the keyboard. Moreover, since dictionaries are printed from tapes that are read by

computers, it is a relatively simple matter to convert those tapes into the appropriate kind

of lexical database. Putting conventional dictionaries on line seems a simple and natural

marriage of the old and the new.

Once computers are enlisted in the service of dictionary users, however, it quickly

becomes apparent that it is grossly inefficient to use these powerful machines as little

more than rapid page-turners. The challenge is to think what further use to make of

them. WordNet is a proposal for a more effective combination of traditional

lexicographic information and modern high-speed computation.

This, and the accompanying four papers, is a detailed report of the state of WordNet

as of 1990. In order to reduce unnecessary repetition, the papers are written to be read

consecutively.

Psycholexicology

Murray’s Oxford English Dictionary (1928) was compiled “on historical

principles” and no one doubts the value of the OED in settling issues of word use or

sense priority. By focusing on historical (diachronic) evidence, however, the OED, like

other standard dictionaries, neglected questions concerning the synchronic organization

of lexical knowledge.

It is now possible to envision ways in which that omission might be repaired. The

20th Century has seen the emergence of psycholinguistics, an interdisciplinary field of

research concerned with the cognitive bases of linguistic competence. Both linguists and

psycholinguists have explored in considerable depth the factors determining the

contemporary (synchronic) structure of linguistic knowledge in general, and lexical

knowledge in particular—Miller and Johnson-Laird (1976) have proposed that research

concerned with the lexical component of language should be called psycholexicology.

As linguistic theories evolved in recent decades, linguists became increasingly explicit

about the information a lexicon must contain in order for the phonological, syntactic, and

lexical components to work together in the everyday production and comprehension of

linguistic messages, and those proposals have been incorporated into the work of

psycholinguists. Beginning with word association studies at the turn of the century and

continuing down to the sophisticated experimental tasks of the past twenty years,

psycholinguists have discovered many synchronic properties of the mental lexicon that

can be exploited in lexicography.

In 1985 a group of psychologists and linguists at Princeton University undertook to

develop a lexical database along lines suggested by these investigations (Miller, 1985).

The initial idea was to provide an aid to use in searching dictionaries conceptually, rather

than merely alphabetically—it was to be used in close conjunction with an on-line

dictionary of the conventional type. As the work proceeded, however, it demanded a

more ambitious formulation of its own principles and goals. WordNet is the result.

Inasmuch as it instantiates hypotheses based on results of psycholinguistic research,

WordNet can be said to be a dictionary based on psycholinguistic principles.

How the leading psycholinguistic theories should be exploited for this project was

not always obvious. Unfortunately, most research of interest for psycholexicology has

dealt with relatively small samples of the English lexicon, often concentrating on nouns

at the expense of other parts of speech. All too often, an interesting hypothesis is put

forward, fifty or a hundred words illustrating it are considered, and extension to the rest

of the lexicon is left as an exercise for the reader. One motive for developing WordNet

was to expose such hypotheses to the full range of the common vocabulary. WordNet

presently contains approximately 95,600 different word forms (51,500 simple words and

44,100 collocations) organized into some 70,100 word meanings, or sets of synonyms,

and only the most robust hypotheses have survived.

The most obvious difference between WordNet and a standard dictionary is that WordNet divides the lexicon into five categories: nouns, verbs, adjectives, adverbs, and function words. Actually, WordNet contains only nouns, verbs, adjectives, and adverbs.[1] The relatively small set of English function words is omitted on the assumption (supported by observations of the speech of aphasic patients: Garrett, 1982) that they are probably stored separately as part of the syntactic component of language. The realization that syntactic categories differ in subjective organization emerged first from studies of word associations. Fillenbaum and Jones (1965), for example, asked English-speaking subjects to give the first word they thought of in response to highly familiar words drawn from different syntactic categories. The modal response category was the same as the category of the probe word: noun probes elicited noun responses 79% of the time, adjectives elicited adjectives 65% of the time, and verbs elicited verbs 43% of the time. Since grammatical speech requires a speaker to know (at least implicitly) the syntactic privileges of different words, it is not surprising that such information would be readily available. How it is learned, however, is more of a puzzle: it is rare in connected discourse for adjacent words to be from the same syntactic category, so Fillenbaum and Jones’s data cannot be explained as association by contiguity.

[1] A discussion of adverbs is not included in the present collection of papers.

The price of imposing this syntactic categorization on WordNet is a certain amount

of redundancy that conventional dictionaries avoid—words like back, for example, turn

up in more than one category. But the advantage is that fundamental differences in the

semantic organization of these syntactic categories can be clearly seen and systematically

exploited. As will become clear from the papers following this one, nouns are organized

in lexical memory as topical hierarchies, verbs are organized by a variety of entailment

relations, and adjectives and adverbs are organized as N-dimensional hyperspaces. Each

of these lexical structures reflects a different way of categorizing experience; attempts to

impose a single organizing principle on all syntactic categories would badly misrepresent

the psychological complexity of lexical knowledge.

The most ambitious feature of WordNet, however, is its attempt to organize lexical

information in terms of word meanings, rather than word forms. In that respect,

WordNet resembles a thesaurus more than a dictionary, and, in fact, Laurence Urdang’s

revision of Rodale’s The Synonym Finder (1978) and Robert L. Chapman’s revision of

Roget’s International Thesaurus (1977) have been helpful tools in putting WordNet

together. But neither of those excellent works is well suited to the printed form. The

problem with an alphabetical thesaurus is redundant entries: if word W_x and word W_y are

synonyms, the pair should be entered twice, once alphabetized under W_x and again

alphabetized under W_y. The problem with a topical thesaurus is that two look-ups are

required, first on an alphabetical list and again in the thesaurus proper, thus doubling a

user’s search time. These are, of course, precisely the kinds of mechanical chores that a

computer can perform rapidly and efficiently.
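The mechanical chores described here can be sketched in a few lines of Python. The example below is a minimal illustration, not WordNet's implementation: each group of synonyms is stored once, and an alphabetical index is derived from the groups automatically, so neither duplicate entries nor a manual second look-up is needed. The names SYNONYM_GROUPS, build_alphabetical_index, and look_up, as well as the sample words, are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical topical groups: each set of synonyms is stored exactly once.
SYNONYM_GROUPS = [
    {"board", "plank"},
    {"board", "committee"},
    {"car", "automobile"},
]


def build_alphabetical_index(groups):
    """Map every word to the positions of the groups that contain it."""
    index = defaultdict(list)
    for position, group in enumerate(groups):
        for word in group:
            index[word].append(position)
    return dict(index)


def look_up(word, groups, index):
    """Return the synonyms of `word`, one sorted list per group it appears in."""
    return [sorted(groups[i] - {word}) for i in index.get(word, [])]


INDEX = build_alphabetical_index(SYNONYM_GROUPS)
print(look_up("plank", SYNONYM_GROUPS, INDEX))
print(look_up("board", SYNONYM_GROUPS, INDEX))
```

The index plays the role of the alphabetical list and the groups play the role of the thesaurus proper; the program performs both look-ups in a single call.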

WordNet is not merely an on-line thesaurus, however. In order to appreciate what

more has been attempted in WordNet, it is necessary to understand its basic design

(Miller and Fellbaum, 1991).

The Lexical Matrix

Lexical semantics begins with a recognition that a word is a conventional

association between a lexicalized concept and an utterance that plays a syntactic role.

This definition of “word” raises at least three classes of problems for research. First,

what kinds of utterances enter into these lexical associations? Second, what is the nature

and organization of the lexicalized concepts that words can express? Third, what

syntactic roles do different words play? Although it is impossible to ignore any of these

questions while considering only one, the emphasis here will be on the second class of

problems, those dealing with the semantic structure of the English lexicon.

Since the word “word” is commonly used to refer both to the utterance and to its

associated concept, discussions of this lexical association are vulnerable to

terminological confusion. In order to reduce ambiguity, therefore, “word form” will be

used here to refer to the physical utterance or inscription and “word meaning” to refer to

the lexicalized concept that a form can be used to express. Then the starting point for

lexical semantics can be said to be the mapping between forms and meanings (Miller,

1986). A conservative initial assumption is that different syntactic categories of words

may have different kinds of mappings.

Table 1 is offered simply to make the notion of a lexical matrix concrete. Word

forms are imagined to be listed as headings for the columns; word meanings as headings

for the rows. An entry in a cell of the matrix implies that the form in that column can be

used (in an appropriate context) to express the meaning in that row. Thus, entry E_{1,1}

implies that word form F_1 can be used to express word meaning M_1. If there are two

entries in the same column, the word form is polysemous; if there are two entries in the

same row, the two word forms are synonyms (relative to a context).

Table 1

Illustrating the Concept of a Lexical Matrix: F_1 and F_2 are synonyms; F_2 is polysemous

| Word Meanings \ Word Forms | F_1 | F_2 | F_3 | . . . | F_n |
|---|---|---|---|---|---|
| M_1 | E_{1,1} | E_{1,2} | | | |
| M_2 | | E_{2,2} | | | |
| M_3 | | | E_{3,3} | | |
| . . . | | | | . . . | |
| M_m | | | | | E_{m,n} |

Mappings between forms and meanings are many:many—some forms have several

different meanings, and some meanings can be expressed by several different forms.

Two difficult problems of lexicography, polysemy and synonymy, can be viewed as

complementary aspects of this mapping. That is to say, polysemy and synonymy are

problems that arise in the course of gaining access to information in the mental lexicon: a

listener or reader who recognizes a form must cope with its polysemy; a speaker or writer

who hopes to express a meaning must decide between synonyms.
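As a rough illustration of this many:many mapping, the filled cells of a small lexical matrix can be stored as (meaning, form) pairs and read by column or by row. This is only a sketch under stated assumptions: the labels M1 to M3 and F1 to F3 follow Table 1, while ENTRIES and the function names are hypothetical and do not come from WordNet itself.

```python
# Each pair (meaning, form) marks a filled cell of the lexical matrix.
# The four cells mirror the entries shown in Table 1.
ENTRIES = {("M1", "F1"), ("M1", "F2"), ("M2", "F2"), ("M3", "F3")}


def meanings_of(form, entries=ENTRIES):
    """Read down a column: every meaning the form can express."""
    return sorted(m for m, f in entries if f == form)


def forms_for(meaning, entries=ENTRIES):
    """Read across a row: every form that can express the meaning."""
    return sorted(f for m, f in entries if m == meaning)


def is_polysemous(form, entries=ENTRIES):
    """A form is polysemous if its column holds more than one entry."""
    return len(meanings_of(form, entries)) > 1


def are_synonyms(form_a, form_b, entries=ENTRIES):
    """Two forms are synonyms if they share at least one row."""
    shared = set(meanings_of(form_a, entries)) & set(meanings_of(form_b, entries))
    return bool(shared)


print(meanings_of("F2"))         # the listener's problem: which meaning?
print(forms_for("M1"))           # the speaker's problem: which form?
print(is_polysemous("F2"))
print(are_synonyms("F1", "F2"))
```

Reading a column corresponds to the listener's task of coping with polysemy, and reading a row corresponds to the speaker's task of choosing among synonyms.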

As a parenthetical comment, it should be noted that psycholinguists frequently

represent their hypotheses about language processing by box-and-arrow diagrams. In

that notation, a lexical matrix could be represented by two boxes with arrows going

between them in both directions. One box would be labeled ‘Word Meaning’ and the

other ‘Word Form’; arrows would indicate that a language user could start with a

meaning and look for appropriate forms to express it, or could start with a form and

retrieve appropriate meanings. This box-and-arrow representation makes clear the

difference between meaning:meaning relations (in the Word Meaning box) and

word:word relations (in the Word Form box). In its initial conception, WordNet was

concerned solely with the pattern of semantic relations between lexicalized concepts; that

is to say, it was to be a theory of the Word Meaning box. As work proceeded, however,

it became increasingly clear that lexical relations in the Word Form box could not be

ignored. At present, WordNet distinguishes between semantic relations and lexical

relations; the emphasis is still on semantic relations between meanings, but relations

between words are also included.

Although the box-and-arrow representation respects the difference between these

two kinds of relations, it has the disadvantage that the intricate details of the many:many

mapping between meanings and forms are slighted, which not only conceals the

reciprocity of polysemy and synonymy, but also obscures the major device used in

WordNet to represent meanings. For that reason, this description of WordNet has been

introduced in terms of a lexical matrix, rather than as a box-and-arrow diagram.

How are word meanings represented in WordNet? In order to simulate a lexical

matrix it is necessary to have some way to represent both forms and meanings in a

computer. Inscriptions can provide a reasonably satisfactory solution for the forms, but

how meanings should be represented poses a critical question for any theory of lexical

semantics. Lacking an adequate psychological theory, methods developed by

lexicographers can provide an interim solution: definitions can play the same role in a

simulation that meanings play in the mind of a language user.

How lexicalized concepts are to be represented by definitions in a theory of lexical

semantics depends on whether the theory is intended to be constructive or merely

differential. In a constructive theory, the representation should contain sufficient

information to support an accurate construction of the concept (by either a person or a

machine). The requirements of a constructive theory are not easily met, and there is

some reason to believe that the definitions found in most standard dictionaries do not

meet them (Gross, Kegl, Gildea, and Miller, 1989; Miller and Gildea, 1987). In a

differential theory, on the other hand, meanings can be represented by any symbols that

enable a theorist to distinguish among them. The requirements for a differential theory

are more modest, yet suffice for the construction of the desired mappings. If the person

who reads the definition has already acquired the concept and needs merely to identify it,

then a synonym (or near synonym) is often sufficient. In other words, the word meaning M_1

in Table 1 can be represented by simply listing the word forms that can be used to

express it: {F_1, F_2, . . . }. (Here and later, the curly brackets, ‘{’ and ‘},’ surround the

sets of synonyms that serve as identifying definitions of lexicalized concepts.) For

example, someone who knows that board can signify either a piece of lumber or a group

of people assembled for some purpose will be able to pick out the intended sense with no

more help than plank or committee. The synonym sets, {board, plank} and {board,

committee} can serve as unambiguous designators of these two meanings of board.

These synonym sets (synsets) do not explain what the concepts are; they merely signify

that the concepts exist. People who know English are assumed to have already acquired
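The differential use of synonym sets can be sketched as follows. This is a minimal illustration, assuming that a synset is modeled as a Python frozenset; the two synsets come from the board example in the text, while SYNSETS, senses, and pick_sense are hypothetical names rather than part of WordNet.

```python
# Synsets taken from the example in the text. A frozenset is hashable, so a
# synset can serve as an identifier for the concept it designates.
SYNSETS = [
    frozenset({"board", "plank"}),
    frozenset({"board", "committee"}),
]


def senses(form, synsets=SYNSETS):
    """All synsets containing the form; more than one means it is polysemous."""
    return [s for s in synsets if form in s]


def pick_sense(form, hint, synsets=SYNSETS):
    """Select the sense of `form` whose synset also contains the synonym `hint`."""
    matches = [s for s in senses(form, synsets) if hint in s]
    if len(matches) != 1:
        raise LookupError(f"{hint!r} does not single out one sense of {form!r}")
    return matches[0]


print(len(senses("board")))
print(sorted(pick_sense("board", "plank")))
print(sorted(pick_sense("board", "committee")))
```

The code distinguishes the two senses without describing either one, which is all a differential theory requires.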
