Technical terminology: some linguistic properties and an algorithm for identification in text

doi:10.1017/S1351324900000048

Journal Article•DOI•

John S. Justeson¹, Slava M. Katz²•Institutions (2)

University at Albany, SUNY¹, IBM²

01 Mar 1995-Natural Language Engineering (Cambridge University Press)-Vol. 1, Iss: 01, pp 9-27

TL;DR: This paper identifies some linguistic properties of technical terminology and uses them to formulate and implement an algorithm for identifying technical terms in running text.

Abstract: This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
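
The two properties named in the abstract (a preferred noun-phrase structure and repetition) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the part-of-speech filter usually attributed to this paper, ((A|N)+ | (A|N)*(N P)?(A|N)*) N, and a repetition threshold of two occurrences. The one-letter tagset (A adjective, N noun, P preposition, anything else ignored), the function name `candidate_terms`, and the `max_len` cap are hypothetical choices made for the example.

```python
import re
from collections import Counter

# Assumed tag pattern: ((A|N)+ | (A|N)*(N P)?(A|N)*) N, written over one-letter tags.
TERM_PATTERN = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")


def candidate_terms(tagged, min_freq=2, max_len=6):
    """Return multi-word phrases whose tags fit the pattern and that repeat.

    tagged is a list of (word, tag) pairs with tags "A", "N", "P";
    any other tag is treated as a break ("O").
    """
    tags = "".join(t if t in {"A", "N", "P"} else "O" for _, t in tagged)
    words = [w.lower() for w, _ in tagged]
    counts = Counter()
    n = len(words)
    for i in range(n):
        # Only spans of two or more words are considered (multi-word terms).
        for j in range(i + 2, min(n, i + max_len) + 1):
            if TERM_PATTERN.fullmatch(tags[i:j]):
                counts[" ".join(words[i:j])] += 1
    # Repetition filter: keep phrases seen at least min_freq times.
    return {phrase: c for phrase, c in counts.items() if c >= min_freq}


if __name__ == "__main__":
    text = [
        ("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("executes", "O"), ("instructions", "N"), ("and", "O"),
        ("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("stores", "O"), ("results", "N"),
    ]
    print(candidate_terms(text))
```

The sketch counts every matching span, including spans nested inside longer matches; how nested candidates should be treated is a design decision that the abstract does not settle.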

Citations

Proceedings Article•DOI•

Mining and summarizing customer reviews

Minqing Hu¹, Bing Liu¹•Institutions (1)

University of Illinois at Chicago¹

22 Aug 2004

TL;DR: This research aims to mine and to summarize all the customer reviews of a product, and proposes several novel techniques to perform these tasks.

Abstract: Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.

7,330 citations

Cites background from "Technical terminology: some linguis..."

...In terminology finding, there are basically two techniques for discovering terms in corpora: symbolic approaches that rely on syntactic description of terms, namely noun phrases, and statistical approaches that exploit the fact that the words composing a term tend to be found close to each other and reoccurring [21, 22, 7, 6]....
[...]

Proceedings Article•DOI•

Opinion observer: analyzing and comparing opinions on the Web

Bing Liu¹, Minqing Hu¹, Junsheng Cheng¹•Institutions (1)

University of Illinois at Chicago¹

10 May 2005

TL;DR: A novel framework for analyzing and comparing consumer opinions of competing products is proposed, and a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of reviews.

Abstract: The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, discussion groups, and blogs. This paper focuses on online customer reviews of products. It makes two contributions. First, it proposes a novel framework for analyzing and comparing consumer opinions of competing products. A prototype system called Opinion Observer is also implemented. The system is such that with a single glance of its visualization, the user is able to clearly see the strengths and weaknesses of each product in the minds of consumers in terms of various product features. This comparison is useful to both potential customers and product manufacturers. For a potential customer, he/she can see a visual side-by-side and feature-by-feature comparison of consumer opinions on these products, which helps him/her to decide which product to buy. For a product manufacturer, the comparison enables it to easily gather marketing intelligence and product benchmarking information. Second, a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of reviews. Such features form the basis for the above comparison. Experimental results show that the technique is highly effective and outperforms existing methods significantly.

1,758 citations

Proceedings Article•

Mining opinion features in customer reviews

Minqing Hu¹, Bing Liu¹•Institutions (1)

University of Illinois at Chicago¹

25 Jul 2004

TL;DR: This project aims to summarize all the customer reviews of a product by mining the opinion/product features that reviewers have commented on, and presents a number of techniques to mine such features.

Abstract: It is a common practice that merchants selling products on the Web ask their customers to review the products and associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds. This makes it difficult for a potential customer to read them in order to make a decision on whether to buy the product. In this project, we aim to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we are only interested in the specific features of the product that customers have opinions on and also whether the opinions are positive or negative. We do not summarize the reviews by selecting or rewriting a subset of the original sentences from the reviews to capture their main points as in the classic text summarization. In this paper, we only focus on mining opinion/product features that the reviewers have commented on. A number of techniques are presented to mine such features. Our experimental results show that these techniques are highly effective.

1,373 citations

Cites background or methods from "Technical terminology: some linguis..."

...In terminology identification, there are basically two techniques for discovering terms in corpora: symbolic approaches that rely on syntactic description of terms, namely noun phrases, and statistical approaches that exploiting the fact that the words composing a term tend to be found close to each other and reoccurring (Jacquemin and Bourigault 2001; Justeson and Katz 1995; Daille 1996; Church and Hanks 1990)....
[...]

Proceedings Article•DOI•

Improved automatic keyword extraction given more linguistic knowledge

Anette Hulth¹•Institutions (1)

Stockholm University¹

11 Jul 2003

TL;DR: Adding linguistic knowledge to the representation, rather than relying only on statistics, gives better results as measured against keywords previously assigned by professional indexers; in particular, extracting NP-chunks gives better precision than n-grams.

Abstract: In this paper, experiments on automatic extraction of keywords from abstracts using a supervised machine learning algorithm are discussed. The main point of this paper is that by adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), a better result is obtained as measured by keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives a better precision than n-grams, and by adding the PoS tag(s) assigned to the term as a feature, a dramatic improvement of the results is obtained, independent of the term selection approach applied.

958 citations

Cites background or methods from "Technical terminology: some linguis..."

...In a first set of runs, the terms were defined in a manner similar to Turney (2000) and Frank et al....
[...]
...Boguraev and Kennedy (1999) extract technical terms based on the noun phrase patterns suggested by Justeson and Katz (1995); these terms are then the basis for a headline-like characterisation of a document....
[...]

Journal Article•DOI•

Automatic recognition of multi-word terms: the C-value/NC-value method

Katerina T. Frantzi¹, Sophia Ananiadou¹, Hideki Mima²•Institutions (2)

University of Manchester¹, University of Tokyo²

01 Aug 2000-International Journal on Digital Libraries

TL;DR: This paper presents a domain-independent method, C-value/NC-value, for the automatic extraction of multi-word terms from machine-readable special language corpora. C-value enhances the common statistical measure of frequency of occurrence, making it sensitive to a particular type of multi-word term, the nested term.

Abstract: Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms, from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word terms, the nested terms. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms); 2) the incorporation of information from term context words to the extraction of terms.
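
The abstract does not state the C-value formula. The Python sketch below uses the form in which the measure is usually given: log2|a| · f(a) for a candidate a that is not nested in any longer candidate, and log2|a| · (f(a) minus the mean frequency of the longer candidates that contain a) otherwise. This is a simplified illustration rather than the authors' procedure: the function names, the representation of candidates as word tuples, and the toy frequencies are assumptions, and the NC-value context step is not covered.

```python
import math


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def c_value(freq):
    """Score candidates given freq: {tuple of words: frequency in the corpus}.

    Candidates are assumed to be multi-word; a one-word candidate scores 0
    because log2(1) is 0.
    """
    scores = {}
    for a, f_a in freq.items():
        # Longer candidate terms in which `a` appears nested.
        containers = [b for b in freq if len(b) > len(a) and contains(b, a)]
        if containers:
            # Discount by the mean frequency of the containing candidates.
            nested_mean = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (f_a - nested_mean)
        else:
            scores[a] = math.log2(len(a)) * f_a
    return scores


if __name__ == "__main__":
    toy = {
        ("soft", "contact", "lens"): 3,
        ("contact", "lens"): 5,
    }
    for term, score in sorted(c_value(toy).items(), key=lambda kv: -kv[1]):
        print(" ".join(term), round(score, 3))
```

In the toy input, "contact lens" occurs five times but three of those are inside "soft contact lens", so its score is discounted, which is the sensitivity to nested terms that the abstract describes.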

849 citations

Cites background or methods from "Technical terminology: some linguis..."

...An example of such a filter is that of Justeson and Katz, [18]....
[...]
...A number of different filters have been used, [3,8,6,18]....
[...]
...Dagan and Church, [6], Daille et al., [8], and Justeson and Katz, [18], and Enguehard and Pantera, [11], use frequency of occurrence....
[...]
...Since most terms consist of nouns and adjectives, [27], and sometimes prepositions, [18], we use a linguistic filter that accepts these types of terms....
[...]
..., [8], and Justeson and Katz, [18], Enguehard and Pantera, [11], use frequency of occurrence....
[...]

References

Proceedings Article•DOI•

A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text

Kenneth Church¹•Institutions (1)

Bell Labs¹

09 Feb 1988

TL;DR: The program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech).

Abstract: A program that tags each word in an input sentence with the most likely part of speech has been written. The program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech). Program performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.

913 citations

"Technical terminology: some linguis..." refers methods in this paper

...McCord (personal communication, 1990) implemented our algorithm using his parser (McCord 1990); Dagan and Church (1994) implemented an abbreviated version of it using a part-of-speech tagger (Church 1988)....
[...]

Proceedings Article•DOI•

A stochastic parts program and noun phrase parser for unrestricted text

Kenneth Church¹•Institutions (1)

Bell Labs¹

23 May 1989

TL;DR: A program that tags each word in an input sentence with the most likely part of speech has been written and performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.

838 citations

Journal Article•DOI•

General Principles of Classification and Nomenclature in Folk Biology

Dennis E. Breedlove¹, Peter H. Raven²•Institutions (2)

California Academy of Sciences¹, Missouri Botanical Garden²

01 Feb 1973-American Anthropologist

TL;DR: In this paper, it has been shown that several important and far reaching generalizations can be formulated which promise to throw considerable light on prescientific man's understanding of his biological universe.

Abstract: Since about 1954, modern field research has been carried out by a number of ethnographers and biologists in an effort to understand more fully the nature of folk biological classification. Much of this work has been devoted to studies dealing with the naming and classification of plants and animals in non-Western societies. It has now become apparent that several important and far reaching generalizations can be formulated which promise to throw considerable light on prescientific man's understanding of his biological universe.

692 citations

Book•

Introduction to the Grammar of English

Rodney Huddleston

09 Mar 2009

TL;DR: This textbook provides a thorough and precise account of all the major areas of English grammar within the framework of modern 'structural' linguistics, and offers a foundation for more advanced work in theoretical linguistics.

Abstract: This textbook provides a thorough and precise account of all the major areas of English grammar. For practical reasons the author concentrates on Standard English and only selected aspects of its regional variation. The book is written for students who may have no previous knowledge of linguistics and little familiarity with 'traditional' grammar. All grammatical terms, whether traditional or more recent, are therefore carefully explained, and in the first three chapters the student is introduced to the theoretical concepts and methodological principles needed to follow the later descriptive chapters. Nevertheless, the book is more than a straightforward 'grammar of English'. Rodney Huddleston does not espouse any formalised contemporary model of syntax and morphology, but he adopts the framework of modern 'structural' linguistics, in a very broad understanding of that term. The grammatical categories postulated derive from a study of the combinational and contrastive relationships the words and other forms enter into, and Dr Huddleston pays particular attention to the problem of choosing between alternative analyses and justifying the analysis he proposes. In this sense his book is addressed to the student of linguistics, who will find Introduction to the Grammar of English a much needed foundation for more advanced work in theoretical linguistics.

641 citations

Journal Article•DOI•

A Comprehensive Dictionary of Psychological and Psychoanalytical Terms

Ashley Montagu

01 Apr 1959-American Journal of Psychiatry

584 citations