Journal ArticleDOI

Technical terminology: some linguistic properties and an algorithm for identification in text

01 Mar 1995-Natural Language Engineering (Cambridge University Press)-Vol. 1, Iss: 01, pp 9-27
TL;DR: This paper identifies some linguistic properties of technical terminology and uses them to formulate an algorithm for identifying technical terms in running text.
Abstract: This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
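The two kinds of property named in the abstract, a preferred noun-phrase shape and repetition, can be illustrated with a minimal Python sketch. It assumes the text is already part-of-speech tagged with Penn Treebank-style tags. The pattern is the noun-phrase filter commonly attributed to this paper, ((A | N)+ | (A | N)*(N P)?(A | N)*) N. The function names, the tag mapping, and the `min_count` and `max_len` parameters are illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter


def tag_class(tag):
    """Map a Penn Treebank-style tag to N (noun), A (adjective),
    P (preposition), or x (anything else). Hypothetical mapping."""
    if tag.startswith("NN"):
        return "N"
    if tag.startswith("JJ"):
        return "A"
    if tag == "IN":
        return "P"
    return "x"


# Noun-phrase filter over tag classes: ((A|N)+ | (A|N)*(N P)?(A|N)*) N
TERM_PATTERN = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")


def candidate_terms(tagged_tokens, min_count=2, max_len=5):
    """Return multi-word strings that match the filter and repeat.

    tagged_tokens: list of (word, tag) pairs for one text.
    Every matching span is counted, including spans nested inside longer
    matches; that is a simplification made for this sketch.
    """
    words = [w.lower() for w, _ in tagged_tokens]
    classes = "".join(tag_class(t) for _, t in tagged_tokens)
    counts = Counter()
    n = len(words)
    for start in range(n):
        # Only spans of two or more words are considered.
        for end in range(start + 2, min(n, start + max_len) + 1):
            if TERM_PATTERN.fullmatch(classes, start, end):
                counts[" ".join(words[start:end])] += 1
    # Repetition requirement: keep strings seen at least min_count times.
    return {term: c for term, c in counts.items() if c >= min_count}
```

Single nouns are never returned because only spans of two or more words are examined, which mirrors the abstract's emphasis on multi-word phrases.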
Citations
Proceedings ArticleDOI
22 Aug 2004
TL;DR: This research aims to mine and to summarize all the customer reviews of a product, and proposes several novel techniques to perform these tasks.
Abstract: Merchants selling products on the Web often ask their customers to review the products that they have purchased and the associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds or even thousands. This makes it difficult for a potential customer to read them to make an informed decision on whether to purchase the product. It also makes it difficult for the manufacturer of the product to keep track and to manage customer opinions. For the manufacturer, there are additional difficulties because many merchant sites may sell the same product and the manufacturer normally produces many kinds of products. In this research, we aim to mine and to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we only mine the features of the product on which the customers have expressed their opinions and whether the opinions are positive or negative. We do not summarize the reviews by selecting a subset or rewrite some of the original sentences from the reviews to capture the main points as in the classic text summarization. Our task is performed in three steps: (1) mining product features that have been commented on by customers; (2) identifying opinion sentences in each review and deciding whether each opinion sentence is positive or negative; (3) summarizing the results. This paper proposes several novel techniques to perform these tasks. Our experimental results using reviews of a number of products sold online demonstrate the effectiveness of the techniques.

7,330 citations


Cites background from "Technical terminology: some linguis..."

  • ...In terminology finding, there are basically two techniques for discovering terms in corpora: symbolic approaches that rely on syntactic description of terms, namely noun phrases, and statistical approaches that exploit the fact that the words composing a term tend to be found close to each other and reoccurring [21, 22, 7, 6]....

    [...]

Proceedings ArticleDOI
10 May 2005
TL;DR: A novel framework for analyzing and comparing consumer opinions of competing products is proposed, and a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of reviews.
Abstract: The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, discussion groups, and blogs. This paper focuses on online customer reviews of products. It makes two contributions. First, it proposes a novel framework for analyzing and comparing consumer opinions of competing products. A prototype system called Opinion Observer is also implemented. The system is such that with a single glance of its visualization, the user is able to clearly see the strengths and weaknesses of each product in the minds of consumers in terms of various product features. This comparison is useful to both potential customers and product manufacturers. For a potential customer, he/she can see a visual side-by-side and feature-by-feature comparison of consumer opinions on these products, which helps him/her to decide which product to buy. For a product manufacturer, the comparison enables it to easily gather marketing intelligence and product benchmarking information. Second, a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of reviews. Such features form the basis for the above comparison. Experimental results show that the technique is highly effective and outperforms existing methods significantly.

1,758 citations

Proceedings Article
25 Jul 2004
TL;DR: This project aims to summarize all the customer reviews of a product by mining opinion/product features that the reviewers have commented on and a number of techniques are presented to mine such features.
Abstract: It is a common practice that merchants selling products on the Web ask their customers to review the products and associated services. As e-commerce is becoming more and more popular, the number of customer reviews that a product receives grows rapidly. For a popular product, the number of reviews can be in hundreds. This makes it difficult for a potential customer to read them in order to make a decision on whether to buy the product. In this project, we aim to summarize all the customer reviews of a product. This summarization task is different from traditional text summarization because we are only interested in the specific features of the product that customers have opinions on and also whether the opinions are positive or negative. We do not summarize the reviews by selecting or rewriting a subset of the original sentences from the reviews to capture their main points as in the classic text summarization. In this paper, we only focus on mining opinion/product features that the reviewers have commented on. A number of techniques are presented to mine such features. Our experimental results show that these techniques are highly effective.

1,373 citations


Cites background or methods from "Technical terminology: some linguis..."

  • ...In terminology identification, there are basically two techniques for discovering terms in corpora: symbolic approaches that rely on syntactic description of terms, namely noun phrases, and statistical approaches that exploiting the fact that the words composing a term tend to be found close to each other and reoccurring (Jacquemin and Bourigault 2001; Justeson and Katz 1995; Daille 1996; Church and Hanks 1990)....

    [...]


Proceedings ArticleDOI
11 Jul 2003
TL;DR: Adding linguistic knowledge to the representation, rather than relying only on statistics, gives a better result as measured against keywords previously assigned by professional indexers; in particular, extracting NP-chunks gives better precision than n-grams.
Abstract: In this paper, experiments on automatic extraction of keywords from abstracts using a supervised machine learning algorithm are discussed. The main point of this paper is that by adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), a better result is obtained as measured by keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives a better precision than n-grams, and by adding the PoS tag(s) assigned to the term as a feature, a dramatic improvement of the results is obtained, independent of the term selection approach applied.

958 citations


Cites background or methods from "Technical terminology: some linguis..."

  • ...In a first set of runs, the terms were defined in a manner similar to Turney (2000) and Frank et al....

    [...]

  • ...Boguraev and Kennedy (1999) extract technical terms based on the noun phrase patterns suggested by Justeson and Katz (1995); these terms are then the basis for a headline-like characterisation of a document....

    [...]

Journal ArticleDOI
TL;DR: This paper presents a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora using C-value/NC-value, which enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word terms, the nested terms.
Abstract: Technical terms (henceforth called terms) are important elements for digital libraries. In this paper we present a domain-independent method for the automatic extraction of multi-word terms from machine-readable special language corpora. The method (C-value/NC-value) combines linguistic and statistical information. The first part, C-value, enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive to a particular type of multi-word terms, the nested terms. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms); 2) the incorporation of information from term context words to the extraction of terms.
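As a rough illustration of how a frequency measure can be made sensitive to nested terms, the sketch below computes the C-value in its commonly stated form: log2 of the term length times its frequency, with the frequency reduced by the mean frequency of the longer candidates that contain the term. The abstract does not give the formula, so this form, the function names, and the example frequencies are assumptions for illustration, not the paper's exact procedure.

```python
import math


def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside the
    strictly longer word tuple `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_values(freq):
    """Score candidate terms.

    freq maps each candidate term (a tuple of two or more words) to its
    frequency in the corpus.
    """
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidates in which `a` is nested.
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        if nesting:
            # Discount occurrences that are explained by longer terms.
            f_a = f_a - sum(nesting) / len(nesting)
        scores[a] = math.log2(len(a)) * f_a
    return scores
```

With hypothetical counts of 5 for ("basal", "cell", "carcinoma") and 7 for ("cell", "carcinoma"), the shorter string keeps only the 2 occurrences not accounted for by the longer one, so its score drops well below its raw frequency.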

849 citations


Cites background or methods from "Technical terminology: some linguis..."

  • ...An example of such a filter is that of Justeson and Katz, [18]....

    [...]

  • ...A number of different filters have been used, [3,8,6,18]....

    [...]

  • ...Dagan and Church, [6], Daille et al., [8], and Justeson and Katz, [18], and Enguehard and Pantera, [11], use frequency of occurrence....

    [...]

  • ...Since most terms consist of nouns and adjectives, [27], and sometimes prepositions, [18], we use a linguistic filter that accepts these types of terms....

    [...]


References
Proceedings ArticleDOI
Kenneth Church
09 Feb 1988
TL;DR: A program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech).
Abstract: A program that tags each word in an input sentence with the most likely part of speech has been written. The program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech). Program performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.
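The optimization described in this abstract can be sketched as a small dynamic program. The version below is a simplification that conditions each tag on only one following tag (n = 1) and works from the end of the sentence backwards. The function name, the `END` marker, and the probability tables passed in are hypothetical; this is not the tagger's actual code or data.

```python
def tag_sentence(words, lexical, contextual, end="END"):
    """Return the tag sequence that maximizes the product over positions of
    lexical[word][tag] * contextual[(tag, next_tag)].

    lexical:    word -> {tag: P(tag | word)}
    contextual: (tag, next_tag) -> P(tag | next_tag); missing pairs count as 0
    """
    # best[tag] = (score, tags) for the best tagging of the suffix seen so
    # far whose first tag is `tag`. Start from the end-of-sentence marker.
    best = {end: (1.0, [])}
    for word in reversed(words):
        new_best = {}
        for tag, p_lex in lexical[word].items():
            # Choose the following tag that gives the highest product.
            score, nxt = max(
                ((p_lex * contextual.get((tag, n), 0.0) * s, n)
                 for n, (s, _) in best.items()),
                key=lambda pair: pair[0],
            )
            new_best[tag] = (score, [tag] + best[nxt][1])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])[1]
```

For a fixed tag set the work per word is constant, so the running time grows linearly with sentence length, which matches the linear-time claim in the abstract.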

913 citations


"Technical terminology: some linguis..." refers methods in this paper

  • ...McCord (personal communication, 1990) implemented our algorithm using his parser (McCord 1990); Dagan and Church (1994) implemented an abbreviated version of it using a part-of-speech tagger (Church 1988)....

    [...]

Proceedings ArticleDOI
Kenneth Church
23 May 1989
TL;DR: A program that tags each word in an input sentence with the most likely part of speech has been written and performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.
Abstract: A program that tags each word in an input sentence with the most likely part of speech has been written. The program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech). Program performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.

838 citations

Journal ArticleDOI
TL;DR: In this paper, it has been shown that several important and far reaching generalizations can be formulated which promise to throw considerable light on prescientific man's understanding of his biological universe.
Abstract: Since about 1954, modern field research has been carried out by a number of ethnographers and biologists in an effort to understand more fully the nature of folk biological classification. Much of this work has been devoted to studies dealing with the naming and classification of plants and animals in non-Western societies. It has now become apparent that several important and far reaching generalizations can be formulated which promise to throw considerable light on prescientific man's understanding of his biological universe.

692 citations

Book
09 Mar 2009
TL;DR: This textbook provides a thorough and precise account of all the major areas of English grammar within the framework of modern 'structural' linguistics, and offers a foundation for more advanced work in theoretical linguistics.
Abstract: This textbook provides a thorough and precise account of all the major areas of English grammar. For practical reasons the author concentrates on Standard English and only selected aspects of its regional variation. The book is written for students who may have no previous knowledge of linguistics and little familiarity with 'traditional' grammar. All grammatical terms, whether traditional or more recent, are therefore carefully explained, and in the first three chapters the student is introduced to the theoretical concepts and methodological principles needed to follow the later descriptive chapters. Nevertheless, the book is more than a straightforward 'grammar of English'. Rodney Huddleston does not espouse any formalised contemporary model of syntax and morphology, but he adopts the framework of modern 'structural' linguistics, in a very broad understanding of that term. The grammatical categories postulated derive from a study of the combinational and contrastive relationships the words and other forms enter into, and Dr Huddleston pays particular attention to the problem of choosing between alternative analyses and justifying the analysis he proposes. In this sense his book is addressed to the student of linguistics, who will find Introduction to the Grammar of English a much needed foundation for more advanced work in theoretical linguistics.

641 citations