Automatic acquisition of hyponyms from large text corpora

doi:10.3115/992133.992154

Open Access Proceedings Article

Marti A. Hearst

pp. 539-545

TL;DR: The paper identifies a set of lexico-syntactic patterns that are easily recognizable, occur frequently and across text genre boundaries, and indisputably indicate the lexical relation of interest (hyponymy).

Abstract:

We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested.
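
The idea of a lexico-syntactic pattern can be illustrated with a minimal sketch. The abstract does not list the patterns themselves, so the "X such as A, B and C" form used here is an assumption chosen for illustration, and the names `SUCH_AS` and `extract_hyponyms` are hypothetical. Noun phrases are approximated as single words; a fuller implementation would presumably rely on a part-of-speech tagger or chunker.

```python
import re

# Assumed, simplified pattern: "<hypernym> such as <item>, <item> and <item>".
# Each noun phrase is approximated by a single word (\w+).
# The lookahead keeps "and"/"or" from being consumed as a list item.
SUCH_AS = re.compile(
    r"(?P<hyper>\w+),? such as "
    r"(?P<hypos>\w+(?:, (?!and |or )\w+)*(?:,? (?:and|or) \w+)?)"
)


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hyper")
        # Split the enumerated items on ", ", " and ", " or ", ", and ", ", or ".
        items = re.split(r",? (?:and|or) |, ", match.group("hypos"))
        pairs.extend((item, hypernym) for item in items if item)
    return pairs


if __name__ == "__main__":
    text = "He studied authors such as Herrick, Goldsmith and Shakespeare."
    for hyponym, hypernym in extract_hyponyms(text):
        print(f"{hyponym} is a kind of {hypernym}")
```

Run on the sample sentence, the sketch yields the pairs (Herrick, authors), (Goldsmith, authors) and (Shakespeare, authors). Pairs of this kind could then be compared against an existing thesaurus to suggest missing or questionable links, which is the kind of use the abstract describes.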

Citations

Proceedings Article

Distant supervision for relation extraction without labeled data

Mike D. Mintz, +3 more

TL;DR: This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.

Journal Article

A survey of named entity recognition and classification

David Nadeau, +1 more

01 Jan 2007

Lingvisticae Investigationes

TL;DR: Observations about languages, named entity types, domains and textual genres studied in the literature, along with other critical aspects of NERC such as features and evaluation methods, are reported.

Proceedings Article

Audio Set: An ontology and human-labeled dataset for audio events

Jort F. Gemmeke, +7 more

TL;DR: The creation of Audio Set is described, a large-scale dataset of manually-annotated audio events that endeavors to bridge the gap in data availability between image and audio research and substantially stimulate the development of high-performance audio event recognizers.

Journal Article

Word sense disambiguation: A survey

Roberto Navigli

23 Feb 2009

ACM Computing Surveys

TL;DR: This work introduces the reader to the motivations for solving the ambiguity of words and provides a description of the task, and overviews supervised, unsupervised, and knowledge-based approaches.

Book

Ontology Learning for the Semantic Web

Alexander Maedche, +1 more

TL;DR: The authors present an ontology learning framework that extends typical ontology engineering environments by using semiautomatic ontology construction tools and encompasses ontology import, extraction, pruning, refinement and evaluation.

References

Journal Article

Introduction to WordNet: An On-line Lexical Database

George A. Miller, +4 more

01 Dec 1990

International Journal of Lexicography

TL;DR: Standard alphabetical procedures for organizing lexical information put together words that are spelled alike and scatter words with similar or related meanings haphazardly through the list.

Journal Article

Lexical cohesion computed by thesaural relations as an indicator of the structure of text

Jane Morris, +1 more

01 Mar 1991

Computational Linguistics

TL;DR: Since the lexical chains are computable, and exist in non-domain-specific text, they provide a valuable indicator of text structure, and provide a semantic context for interpreting words, concepts, and sentences.

Proceedings Article

A Practical Part-of-Speech Tagger

Douglass R. Cutting, +3 more

TL;DR: An implementation of a part-of-speech tagger based on a hidden Markov model that enables robust and accurate tagging with few resource requirements; accuracy exceeds 96%.

Proceedings Article

Noun classification from predicate-argument structures

Donald Hindle

TL;DR: The resulting quasi-semantic classification of nouns demonstrates the plausibility of the distributional hypothesis, and has potential application to a variety of tasks, including automatic indexing, resolving nominal compounds, and determining the scope of modification.

Book Chapter

Providing machine tractable dictionary tools

Yorick Wilks, +5 more

01 Jun 1990

Machine Translation

TL;DR: This paper discusses three different but related large-scale computational methods to transform machine-readable dictionaries (MRDs) into machine-tractable dictionaries (MTDs), using the Longman Dictionary of Contemporary English (LDOCE); the methods require some handcoding of initial information but are largely automatic.

Related Papers (5)

WordNet: an electronic lexical database

WordNet: a lexical database for English

Snowball: extracting relations from large plain-text collections

Yago: a core of semantic knowledge

Automatic Retrieval and Clustering of Similar Words