Open Access Proceedings Article

Nymble: a High-Performance Learning Name-finder

Abstract
This paper presents a statistical, learned approach to finding names and other nonrecursive entities in text (as per the MUC-6 definition of the NE task), using a variant of the standard hidden Markov model. We present our justification for the problem and our approach, a detailed discussion of the model itself and finally the successful results of this new approach.

