Author

Utsab Barman

Other affiliations: Jadavpur University
Bio: Utsab Barman is an academic researcher from Dublin City University. The author has contributed to research in topics: Language identification & Conditional random field. The author has an h-index of 6, having co-authored 8 publications receiving 480 citations. Previous affiliations of Utsab Barman include Jadavpur University.

Papers
Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new dataset is described, containing Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi; the dictionary-based approach is surpassed by supervised classification and sequence labelling, and taking contextual clues into consideration proves important.
Abstract: In social media communication, multilingual speakers often switch between languages, and, in such an environment, automatic language identification becomes both a necessary and challenging task. In this paper, we describe our work in progress on the problem of automatic language identification for the language of social media. We describe a new dataset that we are in the process of creating, which contains Facebook posts and comments that exhibit code mixing between Bengali, English and Hindi. We also present some preliminary word-level language identification experiments using this dataset. Different techniques are employed, including a simple unsupervised dictionary-based approach, supervised word-level classification with and without contextual clues, and sequence labelling using Conditional Random Fields. We find that the dictionary-based approach is surpassed by supervised classification and sequence labelling, and that it is important to take contextual clues into consideration.
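The unsupervised dictionary-based baseline mentioned above can be sketched as follows: each token is labelled with the language whose word list contains it, with ambiguous or unseen tokens falling back to a default. The tiny word lists and the fallback rule here are illustrative assumptions, not the paper's actual resources.

```python
# Toy word lists (romanised); invented for demonstration only.
BN_WORDS = {"ami", "tumi", "bhalo", "khub"}      # Bengali
HI_WORDS = {"main", "tum", "accha", "bahut"}     # Hindi
EN_WORDS = {"i", "you", "good", "very", "is"}    # English

def dictionary_lid(tokens, default="en"):
    """Label each token bn/hi/en by dictionary lookup."""
    labels = []
    for tok in tokens:
        t = tok.lower()
        hits = [lang for lang, vocab in
                (("bn", BN_WORDS), ("hi", HI_WORDS), ("en", EN_WORDS))
                if t in vocab]
        # ambiguous or out-of-vocabulary tokens get the default label
        labels.append(hits[0] if len(hits) == 1 else default)
    return labels

print(dictionary_lid(["ami", "very", "accha"]))
```

A method this simple has no way to use context, which is exactly the weakness the supervised classifiers and CRF sequence labelling in the paper address.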

273 citations

Proceedings ArticleDOI
24 Aug 2014
TL;DR: The DCU team submitted one constrained run for the restaurant domain and one for the laptop domain for sub-task B (aspect term polarity prediction), ranking highest out of 36 systems on the restaurant test set and joint highest on the laptop test set.
Abstract: We describe the work carried out by DCU on the Aspect Based Sentiment Analysis task at SemEval 2014. Our team submitted one constrained run for the restaurant domain and one for the laptop domain for sub-task B (aspect term polarity prediction), ranking highest out of 36 systems on the restaurant test set and joint highest out of 32 systems on the laptop test set.

216 citations

Proceedings ArticleDOI
01 Oct 2014
TL;DR: The DCU-UVT team's participation in the Language Identification in Code-Switched Data shared task at the Workshop on Computational Approaches to Code Switching is described; an SVM-based system with contextual clues is selected as the final system.
Abstract: This paper describes the DCU-UVT team’s participation in the Language Identification in Code-Switched Data shared task in the Workshop on Computational Approaches to Code Switching. Word-level classification experiments were carried out using a simple dictionary-based method, linear kernel support vector machines (SVMs) with and without contextual clues, and a k-nearest neighbour approach. Based on these experiments, we select our SVM-based system with contextual clues as our final system and present results for the Nepali-English and Spanish-English datasets.
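The "contextual clues" used by the word-level classifiers above can be illustrated as feature extraction: a token's features combine its own character n-grams with the surface forms of neighbouring words. The feature names, window size, and n-gram length below are illustrative choices, not the paper's exact configuration.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word, padded with boundary markers."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def token_features(tokens, i, window=1):
    """Feature dict for tokens[i], including +/-window context words."""
    feats = {f"ngram={g}": 1 for g in char_ngrams(tokens[i])}
    for off in range(-window, window + 1):
        if off and 0 <= i + off < len(tokens):
            feats[f"ctx[{off}]={tokens[i + off].lower()}"] = 1
    return feats

feats = token_features(["esta", "noche", "party"], 1)
```

Feature dictionaries of this shape can then be vectorised and fed to a linear-kernel SVM; dropping the `ctx` features recovers the context-free variant the paper compares against.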

36 citations

Proceedings ArticleDOI
02 Nov 2016
TL;DR: This work annotates a subset of a trilingual code-mixed corpus with part-of-speech (POS) tags and investigates the use of a joint model which performs language identification (LID) and part-of-speech (POS) tagging simultaneously.
Abstract: Multilingual users of social media sometimes use multiple languages during conversation. Mixing multiple languages in content is known as code-mixing. We annotate a subset of a trilingual code-mixed corpus (Barman et al., 2014) with part-of-speech (POS) tags. We investigate two state-of-the-art POS tagging techniques for code-mixed content and combine the features of the two systems to build a better POS tagger. Furthermore, we investigate the use of a joint model which performs language identification (LID) and part-of-speech (POS) tagging simultaneously.

24 citations

Proceedings ArticleDOI
01 Jul 2018
TL;DR: Feedback from AML practitioners suggests that the proposed distributed framework can reduce time and cost by approximately 30% compared to their previous manual approaches to AML investigation.
Abstract: Most of the current anti money laundering (AML) systems, using handcrafted rules, are heavily reliant on existing structured databases, which are not capable of effectively and efficiently identifying hidden and complex ML activities, especially those with dynamic and time-varying characteristics, resulting in a high percentage of false positives. Therefore, analysts are engaged for further investigation, which significantly increases human capital cost and processing time. To alleviate these issues, this paper presents a novel framework for the next generation of AML by applying and visualizing deep learning-driven natural language processing (NLP) technologies in a distributed and scalable manner to augment AML monitoring and investigation. The proposed distributed framework performs news and tweet sentiment analysis, entity recognition, relation extraction, entity linking and link analysis on different data sources (e.g. news articles and tweets) to provide additional evidence to human investigators for final decision-making. Each NLP module is evaluated on a task-specific data set, and the overall experiments are performed on synthetic and real-world datasets. Feedback from AML practitioners suggests that our system can reduce time and cost by approximately 30% compared to their previous manual approaches to AML investigation.

22 citations


Cited by
Proceedings ArticleDOI
01 Nov 2016
TL;DR: A deep memory network for aspect-level sentiment classification is proposed which explicitly captures the importance of each context word when inferring the sentiment polarity of an aspect; importance degrees and text representations are calculated with multiple computational layers, each of which is a neural attention model over an external memory.
Abstract: We introduce a deep memory network for aspect level sentiment classification. Unlike feature-based SVM and sequential neural models such as LSTM, this approach explicitly captures the importance of each context word when inferring the sentiment polarity of an aspect. Such importance degrees and text representations are calculated with multiple computational layers, each of which is a neural attention model over an external memory. Experiments on laptop and restaurant datasets demonstrate that our approach performs comparably to a state-of-the-art feature-based SVM system, and substantially better than LSTM and attention-based LSTM architectures. On both datasets we show that multiple computational layers can improve the performance. Moreover, our approach is also fast: the deep memory network with 9 layers is 15 times faster than LSTM with a CPU implementation.
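The attention "hop" at the heart of such a memory network can be sketched in a few lines: score each context-word memory against the aspect vector, softmax the scores, and return the attention-weighted sum of the memories. This is a minimal pure-Python sketch with toy vectors, not the paper's actual architecture (which stacks several such hops with learned transformations).

```python
import math

def attention_hop(aspect, memories):
    """One attention hop: weighted sum of memories, scored against aspect."""
    # dot-product score of the aspect vector against each memory slot
    scores = [sum(a * m for a, m in zip(aspect, mem)) for mem in memories]
    # numerically stable softmax over the scores
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # attention-weighted sum of the memory vectors
    out = [sum(w * mem[d] for w, mem in zip(weights, memories))
           for d in range(len(aspect))]
    return out, weights

vec, w = attention_hop([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Stacking several hops, each re-using the previous output as the query, is what the paper means by "multiple computational layers".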

731 citations

Proceedings Article
26 Apr 2018
TL;DR: A novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge, augmenting the LSTM network with a hierarchical attention mechanism.
Abstract: Analyzing people’s opinions and sentiments towards certain aspects is an important task of natural language understanding. In this paper, we propose a novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge. We augment the long short-term memory (LSTM) network with a hierarchical attention mechanism consisting of a target-level attention and a sentence-level attention. Commonsense knowledge of sentiment-related concepts is incorporated into the end-to-end training of a deep neural network for sentiment classification. In order to tightly integrate the commonsense knowledge into the recurrent encoder, we propose an extension of LSTM, termed Sentic LSTM. We conduct experiments on two publicly released datasets, which show that the combination of the proposed attention architecture and Sentic LSTM can outperform state-of-the-art methods in targeted aspect sentiment tasks.

491 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This paper constructs an auxiliary sentence from the aspect and converts ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI), and fine-tune the pre-trained model from BERT.
Abstract: Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA to a sentence-pair classification task, such as question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained model from BERT and achieve new state-of-the-art results on SentiHood and SemEval-2014 Task 4 datasets. The source codes are available at https://github.com/HSLCY/ABSA-BERT-pair.
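The auxiliary-sentence idea above can be sketched as plain string construction: turn each (review, aspect) pair into a sentence pair so ABSA becomes sentence-pair classification for a model like BERT. The exact templates below are illustrative; the paper explores several variants (QA-style and NLI-style, with yes/no or polarity targets).

```python
def make_qa_pair(review, aspect):
    """QA-style auxiliary sentence: ask about the aspect directly."""
    return (review, f"what do you think of the {aspect} ?")

def make_nli_pair(review, aspect, polarity):
    """NLI-style auxiliary sentence: a hypothesis asserting a polarity."""
    return (review, f"the polarity of the aspect {aspect} is {polarity}")

pair = make_qa_pair("The food was great but the service was slow.", "service")
```

Each pair is then tokenized as a standard `[CLS] sentence_a [SEP] sentence_b [SEP]` input and classified, which is how the task is reduced to sentence-pair classification.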

397 citations

Journal ArticleDOI
TL;DR: This article aims to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.
Abstract: The increasing volume of user-generated content on the web has made sentiment analysis an important tool for the extraction of information about the human emotional state. A current research focus for sentiment analysis is the improvement of granularity at the aspect level, representing two distinct aims: aspect extraction and sentiment classification of product reviews, and sentiment classification of target-dependent tweets. Deep learning approaches have emerged as a prospect for achieving these aims with their ability to capture both syntactic and semantic features of text without requirements for high-level feature engineering, as is the case in earlier methods. In this article, we aim to provide a comparative review of deep learning for aspect-based sentiment analysis to place different approaches in context.

388 citations

Proceedings ArticleDOI
01 Nov 2020
TL;DR: This paper proposes a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks, and shows the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.
Abstract: The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.

328 citations