Author

Salah Ait-Mokhtar

Bio: Salah Ait-Mokhtar is an academic researcher from Xerox. The author has contributed to research in topics: Computer science & Parser combinator. The author has an h-index of 12 and has co-authored 18 publications receiving 872 citations.

Papers
Journal ArticleDOI
TL;DR: This work argues that with a systematic incremental methodology one can go beyond shallow parsing to deeper language analysis, while preserving robustness, and describes a generic system based on such a methodology and designed for building robust analyzers that tackle deeper linguistic phenomena than those traditionally handled by the now widespread shallow parsers.
Abstract: Robustness is a key issue for natural language processing in general and parsing in particular, and many approaches have been explored in the last decade for the design of robust parsing systems. Among those approaches is shallow or partial parsing, which produces minimal and incomplete syntactic structures, often in an incremental way. We argue that with a systematic incremental methodology one can go beyond shallow parsing to deeper language analysis, while preserving robustness. We describe a generic system based on such a methodology and designed for building robust analyzers that tackle deeper linguistic phenomena than those traditionally handled by the now widespread shallow parsers. The rule formalism allows the recognition of n-ary linguistic relations between words or constituents on the basis of global or local structural, topological and/or lexical conditions. It offers the advantage of accepting various types of inputs, ranging from raw to chunked or constituent-marked texts, so for instance it can be used to process existing annotated corpora, or to perform a deeper analysis on the output of an existing shallow parser. It has been successfully used to build a deep functional dependency parser, as well as for the task of co-reference resolution, in a modular way.
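The incremental methodology described here lends itself to a simple illustration: rules fire in a fixed sequence, and each stage only adds relations to the analysis built by earlier stages. The sketch below is a loose, hypothetical rendering of that idea in Python, not the paper's actual rule formalism; the token shapes, rule conditions, and relation names are invented for illustration.

```python
# A minimal sketch of the incremental methodology: rules apply in a fixed
# sequence, each adding relations without discarding earlier results.
# Data shapes and rule conditions are illustrative only.

from dataclasses import dataclass, field

@dataclass
class Token:
    text: str
    pos: str  # part-of-speech tag, assumed given (raw or chunked input)

@dataclass
class Analysis:
    tokens: list
    relations: list = field(default_factory=list)  # (name, head, dependent)

def subject_rule(a: Analysis) -> None:
    # Local topological condition: a noun immediately before a verb.
    for i in range(len(a.tokens) - 1):
        if a.tokens[i].pos == "NOUN" and a.tokens[i + 1].pos == "VERB":
            a.relations.append(("SUBJ", a.tokens[i + 1].text, a.tokens[i].text))

def object_rule(a: Analysis) -> None:
    # A noun immediately after a verb is taken as its object.
    for i in range(1, len(a.tokens)):
        if a.tokens[i].pos == "NOUN" and a.tokens[i - 1].pos == "VERB":
            a.relations.append(("OBJ", a.tokens[i - 1].text, a.tokens[i].text))

def parse(tokens, rules):
    a = Analysis(tokens)
    for rule in rules:      # incremental: each stage only adds information
        rule(a)
    return a

sent = [Token("Mary", "NOUN"), Token("reads", "VERB"), Token("books", "NOUN")]
print(parse(sent, [subject_rule, object_rule]).relations)
# [('SUBJ', 'reads', 'Mary'), ('OBJ', 'reads', 'books')]
```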

321 citations

Proceedings ArticleDOI
31 Mar 1997
TL;DR: This paper describes a new finite-state shallow parser that overcomes the inefficiency of previous fully reductionist constraint-based systems, while maintaining broad coverage and linguistic granularity.
Abstract: This paper describes a new finite-state shallow parser. It merges constructive and reductionist approaches within a highly modular architecture. Syntactic information is added at the sentence level in an incremental way, depending on the contextual information available at a given stage. This approach overcomes the inefficiency of previous fully reductionist constraint-based systems, while maintaining broad coverage and linguistic granularity. The implementation relies on a sequence of networks built with the replace operator. Given the high level of modularity, the core grammar is easily augmented with corpus-specific sub-grammars. The current system is implemented for French and is being expanded to new languages.
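The replace-operator cascade can be mimicked, very roughly, with ordinary regular-expression substitutions: each stage rewrites a POS-tagged string, adding bracketing that later stages rely on. A toy sketch, with invented tag conventions standing in for the paper's compiled finite-state networks:

```python
# A rough illustration of a cascade of replace stages, using Python regexes
# in place of compiled finite-state networks. Each stage rewrites the tagged
# string incrementally; later stages depend on earlier markup.

import re

def np_stage(s: str) -> str:
    # Mark a determiner-(adjective)*-noun sequence as a noun phrase.
    return re.sub(r"(\S+/DET(?: \S+/ADJ)* \S+/NOUN)", r"[NP \1 ]", s)

def vp_stage(s: str) -> str:
    # Mark a verb followed by a noun phrase found by the previous stage.
    return re.sub(r"(\S+/VERB \[NP .+? \])", r"[VP \1 ]", s)

def parse(tagged: str) -> str:
    for stage in (np_stage, vp_stage):   # modular sequence of stages
        tagged = stage(tagged)
    return tagged

print(parse("le/DET chat/NOUN mange/VERB la/DET souris/NOUN"))
# [NP le/DET chat/NOUN ] [VP mange/VERB [NP la/DET souris/NOUN ] ]
```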

174 citations

Patent
22 Apr 2008
TL;DR: In this paper, a system and a method for providing a factuality assessment of a retrieved information source's statement are disclosed. The method includes receiving a user's query which identifies an information source whose statements are to be retrieved, retrieving documents which refer to the information source, mapping statements in the retrieved documents to their authors, identifying as information source statements those mapped statements whose author is compatible with the information source, and, for at least one of the information source's statements, assessing the factuality of the statement with respect to the information source.
Abstract: A system and method for providing a factuality assessment of a retrieved information source's statement are disclosed. The method includes receiving a user's query which identifies an information source whose statements are to be retrieved, retrieving documents which refer to the information source, mapping statements in the retrieved documents to their authors, identifying as information source statements, the mapped statements that are mapped to an author which is compatible with the information source, and for at least one of the information source's statements, assessing the factuality of the information source's statement according to the information source.
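Read as a pipeline, the claimed method has four stages: retrieve documents mentioning the source, map statements to authors, keep the statements whose author is compatible with the source, and assess each kept statement's factuality. The sketch below wires those stages together with placeholder components; none of the retrieval, compatibility, or assessment logic reflects the patented implementation.

```python
# A schematic of the claimed pipeline with stubbed-out components; the
# retrieval, author mapping, and scoring logic are placeholders only.

def retrieve_documents(source: str) -> list:
    # Placeholder: a real system would search a document collection.
    return [
        {"statement": "X will rise.", "author": "J. Smith"},
        {"statement": "Y is false.", "author": "Jane Smith"},
        {"statement": "Z happened.", "author": "Other Person"},
    ]

def author_compatible(author: str, source: str) -> bool:
    # Placeholder compatibility test: surname overlap with the queried source.
    return author.split()[-1].lower() in source.lower()

def assess_factuality(statement: str) -> str:
    # Placeholder: real assessment would analyse modality, tense, polarity.
    return "speculative" if "will" in statement else "asserted"

def factuality_report(source: str) -> list:
    docs = retrieve_documents(source)
    own = [d for d in docs if author_compatible(d["author"], source)]
    return [(d["statement"], assess_factuality(d["statement"])) for d in own]

print(factuality_report("Jane Smith"))
# [('X will rise.', 'speculative'), ('Y is false.', 'asserted')]
```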

82 citations

01 Jan 1997
TL;DR: An approach for fast automatic recognition and extraction of subject and object dependency relations from large French corpora, using a sequence of finite-state transducers, is described, and the impact of POS tagging errors on subject/object dependency extraction is evaluated.
Abstract: We describe and evaluate an approach for fast automatic recognition and extraction of subject and object dependency relations from large French corpora, using a sequence of finite-state transducers. The extraction is performed in two major steps: incremental finite-state parsing and extraction of subject/verb and object/verb relations. Our incremental and cautious approach during the first phase allows the system to deal successfully with complex phenomena such as embeddings, coordination of VPs and NPs or non-standard word order. The extraction requires no subcategorisation information. It relies on POS information only. After describing the two steps, we give the results of an evaluation on various types of unrestricted corpora. Precision is around 90-97% for subjects (84-88% for objects) and recall around 86-92% for subjects (80-90% for objects). We also provide some error analysis; in particular, we evaluate the impact of POS tagging errors on subject/object dependency extraction.
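The reported figures are standard precision and recall over extracted relation tuples. For concreteness, a minimal sketch of that computation on toy gold and system sets (the relations themselves are made up):

```python
# Precision/recall over extracted relation tuples, on invented toy data.

def precision_recall(system: set, gold: set):
    tp = len(system & gold)                     # correctly extracted relations
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {("SUBJ", "mange", "chat"), ("OBJ", "mange", "souris"),
        ("SUBJ", "dort", "chien")}
system = {("SUBJ", "mange", "chat"), ("OBJ", "mange", "souris"),
          ("OBJ", "dort", "os")}               # one wrong, one missed
print(precision_recall(system, gold))          # (0.666..., 0.666...)
```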

67 citations

Proceedings Article
01 Oct 2001

58 citations


Cited by
Journal ArticleDOI
01 May 2006
TL;DR: It is shown that extending the term‐counting method with contextual valence shifters improves the accuracy of the classification, and combining the two methods achieves better results than either method alone.
Abstract: We present two methods for determining the sentiment expressed by a movie review. The semantic orientation of a review can be positive, negative, or neutral. We examine the effect of valence shifters on classifying the reviews. We examine three types of valence shifters: negations, intensifiers, and diminishers. Negations are used to reverse the semantic polarity of a particular term, while intensifiers and diminishers are used to increase and decrease, respectively, the degree to which a term is positive or negative. The first method classifies reviews based on the number of positive and negative terms they contain. We use the General Inquirer to identify positive and negative terms, as well as negation terms, intensifiers, and diminishers. We also use positive and negative terms from other sources, including a dictionary of synonym differences and a very large Web corpus. To compute corpus-based semantic orientation values of terms, we use their association scores with a small group of positive and negative terms. We show that extending the term-counting method with contextual valence shifters improves the accuracy of the classification. The second method uses a Machine Learning algorithm, Support Vector Machines. We start with unigram features and then add bigrams that consist of a valence shifter and another word. The accuracy of classification is very high, and the valence shifter bigrams slightly improve it. The features that contribute to the high accuracy are the words in the lists of positive and negative terms. Previous work focused on either the term-counting method or the Machine Learning method. We show that combining the two methods achieves better results than either method alone.
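The first (term-counting) method is easy to make concrete. A toy sketch, where the short hand-written word lists stand in for the General Inquirer lexicon and only the immediately preceding word is checked for a valence shifter:

```python
# Toy term-counting sentiment classifier with contextual valence shifters.
# The word lists are illustrative stand-ins for the General Inquirer lexicon.

POSITIVE = {"good", "great", "enjoyable"}
NEGATIVE = {"bad", "boring", "awful"}
NEGATIONS = {"not", "never"}
INTENSIFIERS = {"very", "extremely"}   # scale a term's polarity up
DIMINISHERS = {"barely", "somewhat"}   # scale it down

def review_orientation(tokens):
    score = 0.0
    for i, tok in enumerate(tokens):
        polarity = 1.0 if tok in POSITIVE else -1.0 if tok in NEGATIVE else 0.0
        if polarity == 0.0:
            continue
        prev = tokens[i - 1] if i > 0 else ""
        if prev in NEGATIONS:        # reverse semantic polarity
            polarity = -polarity
        elif prev in INTENSIFIERS:   # strengthen
            polarity *= 2.0
        elif prev in DIMINISHERS:    # weaken
            polarity *= 0.5
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(review_orientation("the film was not boring and very enjoyable".split()))
# positive: "not boring" flips to +1, "very enjoyable" scores +2
```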

735 citations

Book
01 Jan 1997
TL;DR: This paper discusses attempts to derive templates directly from corpora and to derive knowledge structures and lexicons directly from corpora, including discussion of the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora.
Abstract: It seems widely agreed that IE (Information Extraction) is now a tested language technology that has reached precision+recall values that put it in about the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains. In this paper we discuss some methods that have been tried to ease this problem, and to create something more rapid than the bench-mark one-month figure, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted, to have preexisting templates ready to hand, as opposed to a user who has a vague idea of what is needed from a corpus. We shall discuss attempts to derive templates directly from corpora; to derive knowledge structures and lexicons directly from corpora, including discussion of the recent LE project ECRAN which attempted to tune existing lexicons to new corpora. An important issue is how far established methods in Information Retrieval of tuning to a user’s needs with feedback at an interface can be transferred to IE.

716 citations

Book ChapterDOI
01 Jan 2003
TL;DR: A treebank project for French has annotated a newspaper corpus of 1 million words with part of speech, inflection, compounds, lemmas, and constituency; the paper also presents some uses of the corpus.
Abstract: We present a treebank project for French. We have annotated a newspaper corpus of 1 million words with part of speech, inflection, compounds, lemmas and constituency. We describe the tagging and parsing phases of the project, and for each, the automatic tools, the guidelines and the validation process. We then present some uses of the corpus as well as some directions for future work.

509 citations

Journal ArticleDOI
TL;DR: This survey discusses related issues and main approaches to these problems, namely, subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction.
Abstract: The sentiment detection of texts has witnessed a booming interest in recent years, due to the increased availability of online reviews in digital form and the ensuing need to organize them. To date, there are mainly four different problems predominating in this research community, namely, subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction. In fact, there are inherent relations between them. Subjectivity classification can prevent the sentiment classifier from considering irrelevant or even potentially misleading text. Document sentiment classification and opinion extraction have often involved word sentiment classification techniques. This survey discusses related issues and main approaches to these problems.

447 citations

Journal ArticleDOI
TL;DR: Text mining is used to transform patent documents into structured data from which keyword vectors are identified, and principal component analysis is employed to reduce the number of keyword vectors so that they are suitable for use on a two-dimensional map.
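The pipeline this summary sketches, keyword vectors reduced to two dimensions for plotting, can be approximated in a few lines of scikit-learn; this is an analogue under assumed tooling, not the paper's own implementation:

```python
# TF-IDF keyword vectors reduced by PCA to two components for a patent map.
# scikit-learn stands in for whatever tooling the paper actually used.

from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

patents = [
    "battery cell electrode lithium charge",
    "lithium electrode anode battery capacity",
    "image sensor pixel lens optical",
    "optical lens focus sensor camera",
]

tfidf = TfidfVectorizer()
vectors = tfidf.fit_transform(patents).toarray()      # documents x keywords

coords = PCA(n_components=2).fit_transform(vectors)   # 2-D map positions
for doc, (x, y) in zip(patents, coords):
    print(f"({x:+.2f}, {y:+.2f})  {doc[:30]}")
```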

386 citations