Author

Silvio Amir

Other affiliations: Northeastern University, Carnegie Mellon University, INESC-ID

Bio: Silvio Amir is an academic researcher from Johns Hopkins University who has contributed to research on topics including computer science and sentiment analysis. The author has an h-index of 11 and has co-authored 24 publications that have received 1,624 citations. Previous affiliations of Silvio Amir include Northeastern University and Carnegie Mellon University.

Topics: Computer science, Sentiment analysis, SemEval, Social media, Mental health ...

Papers

Proceedings Article•DOI•

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Wang Ling¹, Chris Dyer¹, Alan W. Black², Isabel Trancoso², Ramón Fernandez Astudillo², Silvio Amir², Luís Marujo¹, Tiago Luís¹

INESC-ID¹, Carnegie Mellon University²

09 Aug 2015

TL;DR: A model for constructing vector representations of words by composing characters using bidirectional LSTMs. It requires only a single vector per character type and a fixed set of parameters for the compositional model, and yields state-of-the-art results in language modeling and part-of-speech tagging.

Abstract: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form‐function relationship in language, our “composed” word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).

538 citations

Posted Content•

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso

09 Aug 2015-arXiv: Computation and Language

Abstract: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).

519 citations

Proceedings Article•DOI•

Modelling Context with User Embeddings for Sarcasm Detection in Social Media

Silvio Amir¹, Byron C. Wallace², Hao Lyu, Paula Carvalho¹, Mário J. Silva¹

INESC-ID¹, Northeastern University²

01 Aug 2016

TL;DR: This work proposes to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm, and shows that the model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features.

165 citations

Posted Content•

Modelling Context with User Embeddings for Sarcasm Detection in Social Media

Silvio Amir, Byron C. Wallace, Hao Lyu, Paula Carvalho, Mário J. Silva

04 Jul 2016-arXiv: Computation and Language

TL;DR: This paper proposed a deep neural network for automated sarcasm detection using user embeddings to exploit contextual features beyond lexical and syntactic cues present in utterances, and achieved state-of-the-art performance.

Abstract: We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such speaker information. Current methods have achieved this by way of laborious feature engineering. By contrast, we propose to automatically learn and then exploit user embeddings, to be used in concert with lexical signals to recognize sarcasm. Our approach does not require elaborate feature engineering (and concomitant data scraping); fitting user embeddings requires only the text from their previous posts. The experimental results show that our model outperforms a state-of-the-art approach leveraging an extensive set of carefully crafted features.

164 citations

Proceedings Article•DOI•

Not All Contexts Are Created Equal: Better Word Representations with Variable Attention

Wang Ling¹, Yulia Tsvetkov¹, Silvio Amir¹, Ramón Fernandez Astudillo¹, Chris Dyer², Alan W. Black², Isabel Trancoso², Chu-Cheng Lin²

Carnegie Mellon University¹, INESC-ID²

01 Sep 2015

TL;DR: This work introduces an extension to the bag-of-words model for learning word representations that takes into account both syntactic and semantic properties of language. It employs an attention model that finds, among the contextual words, those that are relevant for each prediction.

Abstract: We introduce an extension to the bag-of-words model for learning word representations that take into account both syntactic and semantic properties within language. This is done by employing an attention model that finds, within the contextual words, the words that are relevant for each prediction. The general intuition of our model is that some words are only relevant for predicting local context (e.g. function words), while other words are more suited for determining global context, such as the topic of the document. Experiments performed on both semantically and syntactically oriented tasks show gains using our model over the existing bag-of-words model. Furthermore, compared to other more sophisticated models, our model scales better as we increase the size of the context of the model.

156 citations

Cited by

Journal Article•DOI•

Enriching Word Vectors with Subword Information

Piotr Bojanowski¹, Edouard Grave¹, Armand Joulin¹, Tomas Mikolov¹

Facebook¹

12 Jun 2017-Transactions of the Association for Computational Linguistics

TL;DR: This paper proposes a new approach based on the skip-gram model in which each word is represented as a bag of character n-grams, and a word's vector is the sum of these n-gram representations. The method trains quickly on large corpora and can compute representations for words that did not appear in the training data.

Abstract: Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models to learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated to each character n-gram, words being represented as the sum of these representations. Our method is fast, allowing to train models on large corpora quickly and allows to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

7,537 citations

Proceedings Article•DOI•

Neural Machine Translation of Rare Words with Subword Units

Rico Sennrich, Barry Haddow, Alexandra Birch

12 Aug 2016

TL;DR: This paper introduces a simpler and more effective approach that makes the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. It empirically shows that subword models improve over a back-off dictionary baseline for the WMT 15 English→German and English→Russian translation tasks by up to 1.1 and 1.3 BLEU, respectively.

Abstract: Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by up to 1.1 and 1.3 BLEU, respectively.

6,898 citations

Proceedings Article•DOI•

Hierarchical Attention Networks for Document Classification

Zichao Yang¹, Diyi Yang¹, Chris Dyer¹, Xiaodong He², Alexander J. Smola¹, Eduard Hovy¹

Carnegie Mellon University¹, Microsoft²

13 Jun 2016

TL;DR: Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin.

Abstract: We propose a hierarchical attention network for document classification. Our model has two distinctive characteristics: (i) it has a hierarchical structure that mirrors the hierarchical structure of documents; (ii) it has two levels of attention mechanisms applied at the word- and sentence-level, enabling it to attend differentially to more and less important content when constructing the document representation. Experiments conducted on six large-scale text classification tasks demonstrate that the proposed architecture outperforms previous methods by a substantial margin. Visualization of the attention layers illustrates that the model selects qualitatively informative words and sentences.

4,282 citations

Proceedings Article•DOI•

Neural Architectures for Named Entity Recognition

Guillaume Lample¹, Miguel Ballesteros², Sandeep Subramanian³, Kazuya Kawakami⁴, Chris Dyer³

Facebook¹, Pompeu Fabra University², Carnegie Mellon University³, Google⁴

04 Mar 2016

TL;DR: Paper presented at the 2016 Conference of the North American Chapter of the Association for Computational Linguistics, held in San Diego (CA, USA) from 12 to 17 June 2016.

3,960 citations

Posted Content•

Enriching Word Vectors with Subword Information

Piotr Bojanowski¹, Edouard Grave¹, Armand Joulin¹, Tomas Mikolov¹

Facebook¹

15 Jul 2016-arXiv: Computation and Language

TL;DR: A new approach based on the skipgram model, where each word is represented as a bag of character n-grams, with words being represented as the sum of these representations, which achieves state-of-the-art performance on word similarity and analogy tasks.

Abstract: Continuous word representations, trained on large unlabeled corpora are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words, by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character $n$-grams. A vector representation is associated to each character $n$-gram; words being represented as the sum of these representations. Our method is fast, allowing to train models on large corpora quickly and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

2,425 citations
