Topic

Phrase

About: Phrase is a research topic. Over its lifetime, 12,580 publications have been published within this topic, receiving 317,823 citations. The topic is also known as: syntagma & phrases.


Papers
Proceedings Article
11 Jul 2010
TL;DR: A novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures that relies on syntactic analysis on the source side and encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data.
Abstract: We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. Our approach relies on syntactic analysis on the source side (English) and then encodes a wide variety of local and non-local syntactic structures as complex structural tags which appear as additional factors in the training data. On the target side (Turkish), we only perform morphological analysis and disambiguation but treat the complete complex morphological tag as a factor, instead of separating morphemes. We incrementally explore capturing various syntactic substructures as complex tags on the English side, and evaluate how our translations improve in BLEU scores. Our maximal set of source and target side transformations, coupled with some additional techniques, provides a 39% relative improvement, from a baseline of 17.08 to 23.78 BLEU, all averaged over 10 training and test sets. Now that the syntactic analysis on the English side is available, we also experiment with more long-distance constituent reordering to bring the English constituent order closer to Turkish, but find that these transformations do not provide any additional consistent tangible gains when averaged over the 10 sets.

68 citations
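
The 39% figure quoted in the abstract above can be checked from the two BLEU scores it reports. The following is a minimal sketch of that arithmetic only; the function name `relative_improvement` is a hypothetical helper introduced here for illustration and does not come from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    return (improved - baseline) / baseline * 100.0


# BLEU scores as reported in the abstract (averaged over 10 training and test sets).
baseline_bleu = 17.08
best_bleu = 23.78

gain = relative_improvement(baseline_bleu, best_bleu)
print(f"{gain:.1f}% relative improvement")  # prints "39.2% relative improvement"
```

The absolute gain is 6.70 BLEU points; dividing by the 17.08 baseline gives roughly 39%, which matches the figure stated in the abstract.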

Proceedings Article
18 Dec 2006
TL;DR: A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only, and that enhances the clustering quality of sets of documents substantially.
Abstract: Most text mining techniques are based on word and/or phrase analysis of the text. The statistical analysis of a term (word or phrase) frequency captures the importance of the term within a document. However, to achieve a more accurate analysis, the underlying mining technique should indicate terms that capture the semantics of the text, from which the importance of a term in a sentence and in the document can be derived. A new concept-based mining model is introduced that relies on the analysis of both the sentence and the document, rather than the traditional analysis of the document dataset only. The proposed mining model consists of a concept-based analysis of terms and a concept-based similarity measure. The term which contributes to the sentence semantics is analyzed with respect to its importance at the sentence and document levels. The model can efficiently find significant matching terms, either words or phrases, of the documents according to the semantics of the text. The similarity between documents relies on a new concept-based similarity measure which is applied to the matching terms between documents. Experiments using the proposed concept-based term analysis and similarity measure in text clustering are conducted. Experimental results demonstrate that the newly developed concept-based mining model enhances the clustering quality of sets of documents substantially.

68 citations
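
The abstract above contrasts its concept-based model with the traditional approach of counting how often a term (word or phrase) occurs in a document. As a rough illustration of that traditional baseline only, and not of the paper's concept-based analysis, here is a minimal sketch that counts single words and two-word phrases. The function name `term_frequencies`, the whitespace tokenization, and the sample sentence are all assumptions made for this example.

```python
from collections import Counter


def term_frequencies(text: str, max_n: int = 2) -> Counter:
    """Count every word and every contiguous phrase of up to `max_n` words."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokens = text.lower().split()
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample document.
doc = "hot dog stand sells hot dog and cold drinks"
tf = term_frequencies(doc)
print(tf["hot dog"])    # prints 2
print(tf["dog stand"])  # prints 1
```

Raw counts like these say nothing about a term's role in the meaning of a sentence, which is the gap the abstract says the concept-based model is meant to address.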

Proceedings Article
Colin Bannard
28 Jun 2007
TL;DR: This paper describes a method for identifying items in corpora, focussing on English verb-noun combinations, that achieves greater accuracy than existing MWE extraction methods based on lexical association.
Abstract: Natural languages contain many multi-word sequences that do not display the variety of syntactic processes we would expect given their phrase type, and consequently must be included in the lexicon as multiword units. This paper describes a method for identifying such items in corpora, focussing on English verb-noun combinations. In an evaluation using a set of dictionary-published MWEs we show that our method achieves greater accuracy than existing MWE extraction methods based on lexical association.

68 citations

Proceedings Article
01 Jan 1996
TL;DR: The CLARIT NLP track effort is focused on evaluating the usefulness of syntactic phrases for document indexing, and hypothesizes that the use of lexical atoms, such as hot dog, to replace single words for indexing would increase both precision and recall.
Abstract: The CLARIT NLP track effort is focused on evaluating the usefulness of syntactic phrases for document indexing. The authors are inclined to propose the following two hypotheses: 1) the use of lexical atoms, such as hot dog, to replace single words for indexing would increase both precision and recall; 2) the use of syntactic phrases, such as junior college, to supplement single words would increase precision without hurting recall, and using more such phrases results in greater improvement in precision.

67 citations

Journal Article
TL;DR: This paper aims at analyzing a solution for sentiment classification at a fine-grained level, namely the sentence level, at which the polarity of a sentence is assigned to one of three categories: positive, negative, or neutral.
Abstract: Sentiment classification is a way to analyze the subjective information in the text and then mine the opinion. Sentiment analysis is the procedure by which information is extracted from the opinions, appraisals and emotions of people with regard to entities, events and their attributes. In decision making, the opinions of others have a significant effect on the ease with which customers make choices about online shopping, events, products and entities. The approaches of text sentiment analysis typically work at a particular level, such as the phrase, sentence or document level. This paper aims at analyzing a solution for sentiment classification at a fine-grained level, namely the sentence level, at which the polarity of a sentence is assigned to one of three categories: positive, negative, or neutral.

67 citations


Network Information
Related Topics (5)
Sentence: 41.2K papers, 929.6K citations, 92% related
Vocabulary: 44.6K papers, 941.5K citations, 88% related
Natural language: 31.1K papers, 806.8K citations, 84% related
Grammar: 33.8K papers, 767.6K citations, 83% related
Perception: 27.6K papers, 937.2K citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    467
2022    1,079
2021    360
2020    470
2019    525
2018    535