
Terminology extraction

About: Terminology extraction is a research topic. Over its lifetime, 411 publications have appeared within this topic, receiving 7,396 citations. The topic is also known as: term extraction.


Papers
Journal Article
TL;DR: This paper presents a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method, C-value/NC-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term.
Abstract: Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word term, the nested term. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms); 2) the incorporation of information from term context words into the extraction of terms.

849 citations
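
The abstract above does not give the C-value formula itself. As a minimal sketch, assuming the commonly cited formulation (log2 of the term length times its frequency, with the frequency reduced by the mean frequency of the longer candidate terms that contain it), the nested-term adjustment could look as follows. The function names `contains` and `c_value`, the tuple-of-words representation, and the example frequencies are all hypothetical, and the sketch omits the linguistic filtering and the NC-value step.

```python
import math

def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    m = len(shorter)
    return any(longer[i:i + m] == shorter for i in range(len(longer) - m + 1))

def c_value(freq):
    """Score candidate terms (tuples of words) from a dict of raw frequencies.

    Assumed formulation: log2(length) * frequency for a term that is not
    nested; for a nested term the frequency is first reduced by the mean
    frequency of the longer candidates that contain it.
    """
    scores = {}
    for term, f in freq.items():
        nesting = [f_b for b, f_b in freq.items()
                   if len(b) > len(term) and contains(b, term)]
        adjusted = f - sum(nesting) / len(nesting) if nesting else f
        scores[term] = math.log2(len(term)) * adjusted
    return scores

# Hypothetical frequencies: "contact lens" appears 8 times, 5 of them
# inside the longer candidate "soft contact lens".
example = {("soft", "contact", "lens"): 5, ("contact", "lens"): 8}
print(c_value(example))
```

Under this formulation a single-word candidate scores zero because log2(1) = 0, which is consistent with the method being aimed at multi-word terms.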

Journal Article
TL;DR: This paper identifies some linguistic properties of technical terminology and uses them to formulate an algorithm for identifying technical terms in running text.
Abstract: This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.

794 citations
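
The two properties named in this abstract (a preferred phrase structure and repetition) can be illustrated with a small filter. This is a sketch under assumptions, not the paper's implementation: the part-of-speech pattern is a commonly cited form of such a filter that the abstract does not spell out, the input is assumed to be already tagged with the hypothetical single-letter tags A (adjective), N (noun), and P (preposition), and `candidate_terms`, `min_freq`, and `max_len` are illustrative names.

```python
import re
from collections import Counter

# Assumed pattern over one-letter tags: a run of adjectives and nouns,
# optionally containing one noun-preposition pair, and ending in a noun.
PATTERN = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")

def candidate_terms(tagged, min_freq=2, max_len=4):
    """Return multi-word strings that fit the pattern and repeat in the text.

    `tagged` is a list of (word, tag) pairs; any tag other than A, N, or P
    is treated as a break that no candidate may span.
    """
    tags = "".join(tag if tag in ("A", "N", "P") else "x" for _, tag in tagged)
    words = [word.lower() for word, _ in tagged]
    counts = Counter()
    n = len(tagged)
    for i in range(n):
        # Consider every window of 2..max_len tokens starting at position i.
        for j in range(i + 2, min(i + max_len, n) + 1):
            if PATTERN.fullmatch(tags[i:j]):
                counts[" ".join(words[i:j])] += 1
    return {term: c for term, c in counts.items() if c >= min_freq}

# Hypothetical pre-tagged input (D, V, C stand for other word classes).
text = [("the", "D"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("executes", "V"), ("code", "N"), ("while", "C"), ("the", "D"),
        ("central", "A"), ("processing", "N"), ("unit", "N"), ("idles", "V")]
print(candidate_terms(text))
```

The frequency threshold stands in for the repetition property: a noun phrase that fits the pattern but occurs only once is not reported.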

Journal Article
TL;DR: The major challenge of biomedical text mining over the next 5-10 years is to make text-mining systems useful to biomedical researchers, which will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.
Abstract: The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5–10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

782 citations

Journal Article
TL;DR: This paper gives an overview of the principles and methods of automatic term recognition. It examines two major trends: studies in automatic recognition of significant elements for indexing, mainly carried out in information-retrieval circles, and current research in automatic term recognition in the field of computational linguistics.
Abstract: Following the growing interest in "corpus-based" approaches to computational linguistics, a number of studies have recently appeared on the topic of automatic term recognition or extraction. Because a successful term-recognition method has to be based on proper insights into the nature of terms, studies of automatic term recognition not only contribute to the applications of computational linguistics but also to the theoretical foundation of terminology. Many studies on automatic term recognition treat interesting aspects of terms, but most of them are not well founded and described. This paper tries to give an overview of the principles and methods of automatic term recognition. For that purpose, two major trends are examined, i.e., studies in automatic recognition of significant elements for indexing mainly carried out in information-retrieval circles and current research in automatic term recognition in the field of computational linguistics.

409 citations

Journal Article
TL;DR: This paper describes a new hybrid term extraction technique for technical corpora that bases its term extraction process on lexical items selected by a statistical test targeting items that are highly specific to the technical corpus being analyzed.
Abstract: This paper describes a new hybrid term extraction technique for technical corpora. Our main goal is to reduce the amount of noise in the list of candidate terms by restricting the lexical items that can appear inside candidate terms. In order to do so, we base our term extraction process on lexical items selected by a statistical test that targets items that are highly specific to the technical corpus being analyzed.

253 citations
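
The abstract does not say which statistical test selects the corpus-specific lexical items. Purely as an illustration, the sketch below swaps in a log-likelihood ratio comparison between a technical corpus and a reference corpus; this is a stand-in technique, not the paper's test. The names `log_likelihood` and `specific_items`, the threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), and the toy corpora are all assumptions, and both corpora are assumed to be non-empty.

```python
import math
from collections import Counter

def log_likelihood(a, b, size_tech, size_ref):
    """Log-likelihood ratio for a word seen `a` times in the technical
    corpus and `b` times in the reference corpus (sizes in tokens)."""
    total = size_tech + size_ref
    expected_tech = size_tech * (a + b) / total
    expected_ref = size_ref * (a + b) / total
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_tech)
    if b > 0:
        g2 += b * math.log(b / expected_ref)
    return 2 * g2

def specific_items(tech_tokens, ref_tokens, threshold=3.84):
    """Return words that are over-represented in the technical corpus."""
    tech, ref = Counter(tech_tokens), Counter(ref_tokens)
    n_tech, n_ref = len(tech_tokens), len(ref_tokens)
    selected = {}
    for word, a in tech.items():
        b = ref[word]
        # Keep only words that are relatively more frequent in the technical corpus.
        if a / n_tech > b / n_ref:
            score = log_likelihood(a, b, n_tech, n_ref)
            if score >= threshold:
                selected[word] = score
    return selected

# Hypothetical toy corpora.
tech_corpus = ["stent"] * 10 + ["the"] * 10
ref_corpus = ["the"] * 20
print(specific_items(tech_corpus, ref_corpus))
```

In a hybrid pipeline of the kind described, the selected items would then restrict which words may appear inside candidate terms, which is how the list of candidates is kept free of general-language noise.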


Network Information
Related Topics (5)
Machine translation
22.1K papers, 574.4K citations
84% related
Parsing
21.5K papers, 545.4K citations
83% related
Language model
17.5K papers, 545K citations
82% related
Natural language
31.1K papers, 806.8K citations
80% related
Ontology (information science)
57K papers, 869.1K citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    18
2020    18
2019    17
2018    27
2017    8
2016    26