Author

Thamar Solorio

Bio: Thamar Solorio is an academic researcher from the University of Houston. The author has contributed to research in topics such as named-entity recognition and task (project management). The author has an h-index of 30 and has co-authored 175 publications receiving 3,308 citations. Previous affiliations of Thamar Solorio include the University of Alabama at Birmingham and the National University of Colombia.


Papers
Proceedings ArticleDOI
08 Oct 2010
TL;DR: This paper explores the possibility of utilizing confidence-weighted classification combined with content-based phishing URL detection to produce a dynamic and extensible system for detection of present and emerging types of phishing domains.
Abstract: Phishing is a form of cybercrime where spammed emails and fraudulent websites entice victims to provide sensitive information to the phishers. The acquired sensitive information is subsequently used to steal identities or gain access to money. This paper explores the possibility of utilizing confidence-weighted classification combined with content-based phishing URL detection to produce a dynamic and extensible system for detection of present and emerging types of phishing domains. Our system is capable of detecting emerging threats as they appear and can subsequently provide increased protection against zero-hour threats, unlike traditional blacklisting techniques, which function reactively.
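The content-based URL detection described in this abstract works on lexical properties of the URL string itself. A minimal sketch of that kind of feature extraction is below; the specific feature names are illustrative rather than taken from the paper, and the confidence-weighted classifier that would consume these features is omitted:

```python
# Hypothetical sketch: lexical features from a URL, of the kind used in
# content-based phishing URL detection. Feature choices here are
# illustrative, not the paper's actual feature set.
from urllib.parse import urlparse

def url_features(url):
    """Extract simple lexical features from a URL."""
    parsed = urlparse(url)
    host = parsed.netloc
    return {
        "url_length": len(url),                          # long URLs are suspicious
        "num_dots": host.count("."),                     # many subdomains
        "num_hyphens": host.count("-"),                  # brand-spoofing hosts
        "has_ip": host.replace(".", "").isdigit(),       # raw-IP host
        "has_at": "@" in url,                            # userinfo trick
        "path_depth": parsed.path.count("/"),            # deep, obfuscated paths
    }

feats = url_features("http://paypal.secure-login.example.com/verify/account")
```

An online learner (the paper's confidence-weighted classifier, or any other incremental model) could be updated on such feature vectors as new phishing domains are observed, which is what makes the approach adaptive compared with static blacklists.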

196 citations

Proceedings ArticleDOI
01 Oct 2014
TL;DR: The evaluation showed that language identification at the token level is more difficult when the languages present are closely related, as in the case of MSA-DA, where the prediction performance was the lowest among all language pairs.
Abstract: We present an overview of the first shared task on language identification on code-switched data. The shared task included code-switched data from four language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA), Mandarin-English (MAN-EN), Nepali-English (NEP-EN), and Spanish-English (SPA-EN). A total of seven teams participated in the task and submitted 42 system runs. The evaluation showed that language identification at the token level is more difficult when the languages present are closely related, as in the case of MSA-DA, where the prediction performance was the lowest among all language pairs. In contrast, the language pairs with the highest F-measure were SPA-EN and NEP-EN. The task made evident that language identification in code-switched data is still far from solved and warrants further research.
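Token-level language identification means assigning a language label to every token in a mixed-language sentence. A toy sketch below uses tiny hand-made wordlists (real shared-task systems used supervised classifiers over character n-gram and contextual features); it also shows why closely related pairs like MSA-DA are hard, since shared vocabulary falls into the ambiguous case:

```python
# Hypothetical toy sketch of token-level language identification on
# code-switched Spanish-English text. The wordlists are illustrative;
# actual shared-task systems were trained classifiers.
SPA = {"no", "quiero", "ir", "a", "la", "escuela"}
ENG = {"but", "i", "have", "to", "go", "today"}

def label_tokens(tokens):
    labels = []
    for tok in tokens:
        t = tok.lower()
        if t in SPA and t not in ENG:
            labels.append("spa")
        elif t in ENG and t not in SPA:
            labels.append("eng")
        else:
            # words shared by both languages cannot be resolved without
            # context -- the situation that makes related pairs harder
            labels.append("ambiguous")
    return labels

labels = label_tokens("No quiero ir but I have to go".split())
```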

174 citations

Proceedings ArticleDOI
01 Jan 2015
TL;DR: It is demonstrated that character n-grams that capture information about affixes and punctuation account for almost all of the power of character n-grams as features.
Abstract: Character n-grams have been identified as the most successful feature in both single-domain and cross-domain Authorship Attribution (AA), but the reasons for their discriminative value were not fully understood. We identify subgroups of character n-grams that correspond to linguistic aspects commonly claimed to be covered by these features: morphosyntax, thematic content and style. We evaluate the predictiveness of each of these groups in two AA settings: a single-domain setting and a cross-domain setting where multiple topics are present. We demonstrate that character n-grams that capture information about affixes and punctuation account for almost all of the power of character n-grams as features. Our study contributes new insights into the use of n-grams for future AA work and other classification tasks.
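The core idea of splitting character n-grams into subgroups can be sketched as follows. The two categories below (punctuation-bearing and word-boundary/affix-like) are a simplification of the paper's finer-grained taxonomy, chosen only to illustrate the mechanism:

```python
# Hypothetical sketch of categorizing character n-grams into subgroups,
# in the spirit of the paper's analysis. The category definitions here
# are simplified and illustrative.
import string

def char_ngrams(text, n=3):
    """All overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def categorize(ngram):
    if any(c in string.punctuation for c in ngram):
        return "punct"           # punctuation-bearing n-gram
    if ngram.startswith(" ") or ngram.endswith(" "):
        return "affix"           # touches a word boundary: prefix/suffix-like
    return "word-internal"       # thematic/content-bearing middle of a word
```

Training an attribution classifier on only the "punct" and "affix" groups, versus all n-grams, is the kind of comparison the paper's finding rests on.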

165 citations

Proceedings ArticleDOI
01 Apr 2017
TL;DR: A model to perform authorship attribution of tweets using Convolutional Neural Networks over character n-grams and a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification are presented.
Abstract: We present a model to perform authorship attribution of tweets using Convolutional Neural Networks (CNNs) over character n-grams. We also present a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification. The experimental evaluation shows that text CNNs perform competitively and are able to outperform previous methods.
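The core operation of such a model is a 1-D convolution sliding over character embeddings. A self-contained NumPy sketch of that operation is below; the embedding values, filter weights, and dimensions are random stand-ins for what the trained network would learn, and the real model adds a softmax classification layer on top:

```python
# Hypothetical NumPy sketch of a character-level 1-D convolution with
# ReLU and max-over-time pooling, the building block of a text CNN.
# All weights are random placeholders, not trained parameters.
import numpy as np

rng = np.random.default_rng(0)
vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")}
emb = rng.normal(size=(len(vocab), 8))    # one 8-dim embedding per character
filters = rng.normal(size=(16, 3, 8))     # 16 filters of width 3

def conv_features(text):
    x = emb[[vocab[c] for c in text]]                               # (L, 8)
    windows = np.stack([x[i:i + 3] for i in range(len(text) - 2)])  # (L-2, 3, 8)
    conv = np.einsum("lwd,fwd->lf", windows, filters)               # (L-2, 16)
    return np.maximum(conv, 0).max(axis=0)  # ReLU, then max-pool over time

vec = conv_features("the quick brown fox")  # fixed-size 16-dim text vector
```

The fixed-size pooled vector is what a final linear layer would map to author labels; interpretability methods like the one in the paper trace which input character windows most influenced that prediction.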

158 citations

Posted Content
TL;DR: In this paper, a novel model for multimodal learning based on gated neural networks is presented, which is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities.
Abstract: This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU learns to decide how modalities influence the activation of the unit using multiplicative gates. It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro F-score of single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Along with this work, we release the MM-IMDb dataset, which, to the best of our knowledge, is the largest publicly available multimodal dataset for movie genre prediction.
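The gating mechanism described above can be sketched for two modalities: each modality gets its own nonlinear projection, and a learned sigmoid gate decides, per hidden unit, how much each modality contributes. The NumPy sketch below uses random weights as stand-ins for learned parameters and assumes the two-modality form of the unit:

```python
# Minimal NumPy sketch of a two-modality Gated Multimodal Unit.
# Weights are random stand-ins for what the network would learn.
import numpy as np

rng = np.random.default_rng(0)
d_v, d_t, d_h = 10, 20, 8          # visual dim, textual dim, hidden dim
W_v = rng.normal(size=(d_h, d_v))
W_t = rng.normal(size=(d_h, d_t))
W_z = rng.normal(size=(d_h, d_v + d_t))

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gmu(x_v, x_t):
    h_v = np.tanh(W_v @ x_v)                        # visual representation
    h_t = np.tanh(W_t @ x_t)                        # textual representation
    z = sigmoid(W_z @ np.concatenate([x_v, x_t]))   # per-unit modality gate
    return z * h_v + (1.0 - z) * h_t                # gated combination

h = gmu(rng.normal(size=d_v), rng.normal(size=d_t))
```

Because the gate z is computed from both inputs, the unit can learn, per example, to lean on the poster for some genres and the plot for others, which is what distinguishes it from fixed concatenation or averaging.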

153 citations


Cited by
Journal ArticleDOI
TL;DR: This work first classifies deep multimodal learning architectures and then discusses methods to fuse learned multimodal representations in deep-learning architectures.
Abstract: The success of deep learning has been a catalyst to solving increasingly complex machine-learning problems, which often involve multiple data modalities. We review recent advances in deep multimodal learning and highlight the state of the art, as well as gaps and challenges in this active research field. We first classify deep multimodal learning architectures and then discuss methods to fuse learned multimodal representations in deep-learning architectures. We highlight two areas of research, regularization strategies and methods that learn or optimize multimodal fusion structures, as exciting areas for future work.

529 citations