Topic
Terminology extraction
About: Terminology extraction is a research topic. Over its lifetime, 411 publications have appeared on this topic, receiving 7,396 citations. The topic is also known as term extraction.
Papers
••
TL;DR: This paper presents a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora using C-value/NC-value, which enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term.
Abstract: Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms); 2) the incorporation of information from term context words into the extraction of terms.
849 citations
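The C-value part of the abstract above can be illustrated with a minimal sketch. It assumes the commonly cited formulation of C-value: a candidate's frequency is weighted by the base-2 logarithm of its length in words, and, if the candidate is nested inside longer candidates, the average frequency of those longer candidates is subtracted first. The function names `c_value` and `is_nested` and the input format (a dictionary from word tuples to frequencies) are illustrative assumptions, not taken from the paper, and the linguistic filtering and the NC-value step are left out.

```python
import math
from collections import defaultdict


def is_nested(a, b):
    """Return True if word tuple `a` occurs as a contiguous sub-sequence of `b`."""
    n = len(a)
    return any(b[i:i + n] == a for i in range(len(b) - n + 1))


def c_value(freq):
    """Score candidate terms; `freq` maps a tuple of words to its corpus frequency.

    Assumed formulation (not quoted from the listing):
      not nested: log2(|a|) * f(a)
      nested:     log2(|a|) * (f(a) - mean frequency of longer candidates containing a)
    """
    terms = list(freq)
    containers = defaultdict(list)  # candidate -> longer candidates that contain it
    for a in terms:
        for b in terms:
            if len(b) > len(a) and is_nested(a, b):
                containers[a].append(b)

    scores = {}
    for a in terms:
        length_weight = math.log2(len(a))  # 0 for single words: the measure targets multi-word terms
        if not containers[a]:
            scores[a] = length_weight * freq[a]
        else:
            mean_container_freq = sum(freq[b] for b in containers[a]) / len(containers[a])
            scores[a] = length_weight * (freq[a] - mean_container_freq)
    return scores


# Hypothetical frequencies for illustration only.
example = {("soft", "contact", "lens"): 5, ("contact", "lens"): 8}
scores = c_value(example)
```

In this toy input, "contact lens" appears 8 times, 5 of them inside "soft contact lens", so its score is discounted relative to its raw frequency, which is the sensitivity to nested terms that the abstract describes.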
••
TL;DR: This paper identifies some linguistic properties of technical terminology and uses them to formulate an algorithm for identifying technical terms in running text.
Abstract: This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
794 citations
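The two properties named in the abstract above (a preferred noun-phrase structure and repetition) can be combined in a short sketch. It assumes the text is already part-of-speech tagged with hypothetical one-letter tags (A = adjective, N = noun, P = preposition, O = anything else); the exact tag pattern, the `min_freq` threshold of 2, and the `max_len` cap are illustrative assumptions rather than details given in the abstract.

```python
import re
from collections import Counter

# Assumed structure filter over one-letter tags: a run of adjectives/nouns,
# optionally containing one noun+preposition link, always ending in a noun.
PATTERN = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")


def candidate_terms(sentences, min_freq=2, max_len=4):
    """Return multi-word spans that match PATTERN and repeat at least `min_freq` times.

    `sentences` is a list of sentences, each a list of (word, tag) pairs.
    """
    counts = Counter()
    for sent in sentences:
        words = [w.lower() for w, _ in sent]
        tags = "".join(t for _, t in sent)
        for i in range(len(sent)):
            # Consider every span of 2..max_len tokens starting at position i.
            for j in range(i + 2, min(i + max_len, len(sent)) + 1):
                if PATTERN.fullmatch(tags[i:j]):
                    counts[" ".join(words[i:j])] += 1
    return {term: n for term, n in counts.items() if n >= min_freq}


# Hypothetical pre-tagged input for illustration only.
tagged = [
    [("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"), ("halts", "O")],
    [("a", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"), ("runs", "O")],
]
terms = candidate_terms(tagged)
```

The sketch keeps every matching sub-span, so "processing unit" is reported alongside "central processing unit"; a fuller implementation would need a policy for such nested candidates.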
••
TL;DR: The major challenge of biomedical text mining over the next 5–10 years is to make text-mining systems useful to biomedical researchers, which will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.
Abstract: The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5–10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.
782 citations
••
TL;DR: This paper gives an overview of the principles and methods of automatic term recognition, examining two major trends: studies in automatic recognition of significant elements for indexing, mainly carried out in information-retrieval circles, and current research in automatic term recognition in the field of computational linguistics.
Abstract: Following the growing interest in "corpus-based" approaches to computational linguistics, a number of studies have recently appeared on the topic of automatic term recognition or extraction. Because a successful term-recognition method has to be based on proper insights into the nature of terms, studies of automatic term recognition not only contribute to the applications of computational linguistics but also to the theoretical foundation of terminology. Many studies on automatic term recognition treat interesting aspects of terms, but most of them are not well founded and described. This paper tries to give an overview of the principles and methods of automatic term recognition. For that purpose, two major trends are examined, i.e., studies in automatic recognition of significant elements for indexing mainly carried out in information-retrieval circles and current research in automatic term recognition in the field of computational linguistics.
409 citations
••
TL;DR: This paper describes a new hybrid term extraction technique for technical corpora that bases its term extraction process on lexical items selected by a statistical test that targets items that are highly specific to the technical corpus being analyzed.
Abstract: This paper describes a new hybrid term extraction technique for technical corpora. Our main goal is to reduce the amount of noise in the list of candidate terms by restricting the lexical items that can appear inside candidate terms. In order to do so, we base our term extraction process on lexical items selected by a statistical test that targets items that are highly specific to the technical corpus being analyzed.
253 citations
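The idea in the abstract above, restricting candidate terms to lexical items that a statistical test marks as specific to the technical corpus, can be sketched as follows. The abstract does not name the test, so this sketch substitutes a log-likelihood ratio against a reference corpus as an assumption; the function names, the threshold of 3.84 (the chi-square critical value for p = 0.05 with one degree of freedom), and the whitespace tokenization are likewise illustrative choices, not the paper's.

```python
import math
from collections import Counter


def log_likelihood(a, b, n_tech, n_ref):
    """Log-likelihood ratio for a word seen `a` times in the technical corpus
    and `b` times in the reference corpus (assumed stand-in for the paper's test)."""
    expected_tech = n_tech * (a + b) / (n_tech + n_ref)
    expected_ref = n_ref * (a + b) / (n_tech + n_ref)
    ll = 0.0
    if a:
        ll += a * math.log(a / expected_tech)
    if b:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def specific_items(tech_tokens, ref_tokens, threshold=3.84):
    """Select words that are significantly over-represented in the technical corpus."""
    tech, ref = Counter(tech_tokens), Counter(ref_tokens)
    n_tech, n_ref = len(tech_tokens), len(ref_tokens)
    selected = set()
    for word, a in tech.items():
        b = ref[word]
        over_represented = a / n_tech > b / n_ref
        if over_represented and log_likelihood(a, b, n_tech, n_ref) >= threshold:
            selected.add(word)
    return selected


def filter_candidates(candidates, allowed):
    """Keep only candidate terms built entirely from the selected lexical items."""
    return [c for c in candidates if all(w in allowed for w in c.split())]


# Hypothetical toy corpora for illustration only.
tech_tokens = ["laser"] * 10 + ["beam"] * 10 + ["the"] * 10
ref_tokens = ["the"] * 20 + ["dog"] * 10
allowed = specific_items(tech_tokens, ref_tokens)
kept = filter_candidates(["laser beam", "the laser"], allowed)
```

Requiring every word of a candidate to be corpus-specific is the strictest reading of "restricting the lexical items that can appear inside candidate terms"; a looser variant could exempt function words or require only the head to be specific.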