
Showing papers by "Walter Daelemans published in 2021"


Proceedings ArticleDOI
01 Jun 2021
TL;DR: The results of the conducted experiments show that hateful metaphor features improve model performance for both tasks; the effect of different metaphor information encoding methods on hate speech type and target detection accuracy is also investigated.
Abstract: We study the usefulness of hateful metaphors as features for the identification of the type and target of hate speech in Dutch Facebook comments. For this purpose, all hateful metaphors in the Dutch LiLaH corpus were annotated and interpreted in line with Conceptual Metaphor Theory and Critical Metaphor Analysis. We provide SVM and BERT/RoBERTa results, and investigate the effect of different metaphor information encoding methods on hate speech type and target detection accuracy. The results of the conducted experiments show that hateful metaphor features improve model performance for both tasks. To our knowledge, this is the first time that the effectiveness of hateful metaphors as an information source for hate speech classification has been investigated.
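
The paper's exact encoding methods are not detailed in this abstract; as a rough illustration, one plausible way to expose metaphor annotations to an SVM is to append a binary feature to standard n-gram vectors (all data, labels, and the encoding below are hypothetical):

```python
# Hypothetical sketch: appending a binary "contains hateful metaphor"
# feature to tf-idf n-grams for an SVM. Data and labels are placeholders.
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse import hstack, csr_matrix

comments = ["voorbeeld reactie een", "voorbeeld reactie twee"]
has_metaphor = [[1], [0]]        # assumed gold metaphor annotations
labels = ["threat", "none"]      # assumed hate speech type labels

ngrams = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(comments)
X = hstack([ngrams, csr_matrix(has_metaphor)])   # n-grams + metaphor flag
clf = LinearSVC().fit(X, labels)
```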

14 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: An SVM approach is introduced that significantly improves on state-of-the-art results when combined with deep learning models in a simple majority-voting ensemble, mainly due to a reduction of the false positive rate.
Abstract: Hate speech detection is an actively growing field of research with a variety of recently proposed approaches that have pushed state-of-the-art results. One of the challenges of such automated approaches, namely recent deep learning models, is a risk of false positives (i.e., false accusations), which may lead to over-blocking or removal of harmless social media content in applications with little moderator intervention. We evaluate deep learning models under both in-domain and cross-domain hate speech detection conditions, and introduce an SVM approach that significantly improves on the state-of-the-art results when combined with the deep learning models in a simple majority-voting ensemble. The improvement is mainly due to a reduction of the false positive rate.
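
A minimal sketch of the majority-voting step described above (model names and predictions are invented; the paper's actual models differ):

```python
# Minimal sketch of a majority-voting ensemble over binary hate/non-hate
# predictions; not the paper's code. Requiring agreement between the SVM
# and the deep models is what cuts down on false positives.
import numpy as np

def majority_vote(*model_predictions):
    """Return 1 where more than half of the models predict 1."""
    votes = np.sum(model_predictions, axis=0)
    return (votes > len(model_predictions) / 2).astype(int)

svm_preds  = np.array([1, 0, 1, 0])   # hypothetical SVM outputs
bert_preds = np.array([1, 1, 1, 0])   # hypothetical deep model outputs
lstm_preds = np.array([0, 1, 1, 0])   # hypothetical second deep model
print(majority_vote(svm_preds, bert_preds, lstm_preds))  # [1 1 1 0]
```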

12 citations


Journal ArticleDOI
TL;DR: This collection of scholarly articles asks the question "How useful is translation technology?" and describes several new statistical approaches to more rigorous evaluation methods.
Abstract: This collection of scholarly articles asks the question "How useful is translation technology?" Pointing to the need for a widely used and reliable way to test the efficiency of language translation programs, the presenters show that commercial tools such as translation memories and translation workbenches are popular, and their developers find them useful in terms of productivity, consistency, or quality. However, these claims are rarely proven using objective comparative studies, and this group describes several new statistical approaches to more rigorous evaluation methods.

9 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper explores whether supplementing textual domain knowledge in the medical NLI task can improve the performance of BERT models for domain-specific inference. It concludes that the task of unsupervised text retrieval to bridge the gap in existing information to facilitate inference is more complex than what the state-of-the-art methods can solve, and warrants extensive research in the future.
Abstract: We explore whether state-of-the-art BERT models encode sufficient domain knowledge to correctly perform domain-specific inference. Although BERT implementations such as BioBERT are better at domain-based reasoning than those trained on general-domain corpora, there is still a wide margin compared to human performance on these tasks. To bridge this gap, we explore whether supplementing textual domain knowledge in the medical NLI task: a) by further language model pretraining on medical-domain corpora, b) by means of lexical match algorithms such as the BM25 algorithm, c) by supplementing lexical retrieval with dependency relations, or d) by using a trained retriever module, can push this performance closer to that of humans. However, we do not find any significant difference between knowledge-supplemented classification and the baseline BERT models. This is contrary to the results for evidence retrieval on other tasks such as open-domain question answering (QA). By examining the retrieval output, we show that the methods fail due to unreliable knowledge retrieval for complex domain-specific reasoning. We conclude that the task of unsupervised text retrieval to bridge the gap in existing information to facilitate inference is more complex than what the state-of-the-art methods can solve, and warrants extensive research in the future.
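
For variant (b), a minimal sketch of BM25-based knowledge retrieval, assuming the rank_bm25 package; the knowledge snippets and query are invented:

```python
# Minimal sketch of lexical retrieval with BM25 using the rank_bm25 package.
# The knowledge snippets and query below are invented placeholders.
from rank_bm25 import BM25Okapi

knowledge = [
    "metformin is a first-line treatment for type 2 diabetes".split(),
    "insulin lowers blood glucose levels".split(),
]
bm25 = BM25Okapi(knowledge)

query = "the patient takes metformin for her diabetes".split()
scores = bm25.get_scores(query)          # one relevance score per snippet
best = knowledge[scores.argmax()]        # retrieved evidence
# In a setup like the paper's, retrieved text would be appended to the
# NLI input before classification with a BERT model.
print(" ".join(best))
```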

7 citations


Journal ArticleDOI
01 Sep 2021
TL;DR: In this paper, the authors evaluate publicly available resources for cyberbullying detection and demonstrate difficulties with data collection, and present an effective crowdsourcing method to generate plausible data that can be used to enrich real data.
Abstract: The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.
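
A toy sketch of the cross-domain evaluation pattern referred to in (ii): train on each corpus, test on every other. The corpora and the bag-of-words SVM are placeholders, not the paper's datasets or models:

```python
# Illustrative sketch of cross-domain evaluation for cyberbullying detection.
# All texts, labels, and the classifier are toy stand-ins.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

corpora = {
    "platform_a": (["you are pathetic", "nice photo",
                    "nobody likes you", "see you soon"], [1, 0, 1, 0]),
    "platform_b": (["go away loser", "great game",
                    "everyone hates you", "happy birthday"], [1, 0, 1, 0]),
}

for train_name, (X_train, y_train) in corpora.items():
    model = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(X_train, y_train)
    for test_name, (X_test, y_test) in corpora.items():
        if test_name != train_name:
            f1 = f1_score(y_test, model.predict(X_test))
            print(f"{train_name} -> {test_name}: F1 = {f1:.2f}")
```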

4 citations



Journal ArticleDOI
01 Jan 2021
TL;DR: A comparison of three exploratory text representation approaches to study the issue communication of parties on Twitter shows a clear trade-off between interpretability and discriminative power, where a combination of all three simultaneously provides the best insights.
Abstract: Party competition in Western Europe is increasingly focused on “issue competition”, which is the selective emphasis on issues by parties. The aim of this paper is to contribute methodologically to the increasing number of studies that deal with different aspects of parties’ issue competition and communication. We systematically compare the value and shortcomings of three exploratory text representation approaches to study the issue communication of parties on Twitter. More specifically, we analyze which issues separate the online communication of one party from that of the other parties and how consistent party communication is. Our analysis was performed on two years of Twitter data from six Belgian political parties, comprising over 56,000 political tweets. The results indicate that our exploratory approach is useful to study how political parties profile themselves on Twitter and which strategies are at play. In addition, our method allows us to analyze the communication of individual politicians, which contributes to the classical literature on party unity and party discipline. A comparison of our three methods shows a clear trade-off between interpretability and discriminative power, where a combination of all three simultaneously provides the best insights.
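
The abstract does not name the three representation approaches; purely as an illustration of the general idea, one way to surface issue terms that separate a party's tweets from the others is to inspect the coefficients of a one-vs-rest linear classifier (all data below is invented):

```python
# Illustrative sketch only: not one of the paper's three methods, but a
# generic discriminative representation for one-party-vs-rest analysis.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

tweets = ["klimaat actie nu", "belastingen moeten omlaag",
          "klimaat beleid nodig", "minder belastingen"]
is_party_a = np.array([1, 0, 1, 0])   # 1 = tweet by the party of interest

vec = TfidfVectorizer()
X = vec.fit_transform(tweets)
clf = LogisticRegression().fit(X, is_party_a)

terms = np.array(vec.get_feature_names_out())
print(terms[np.argsort(clf.coef_[0])[-2:]])  # terms most indicative of party A
```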

2 citations


05 Oct 2021
TL;DR: In this article, a score-and-aggregate module between encoder and decoder is added to learn to pick the proper knowledge through minimising the language modelling loss (i.e. without having access to knowledge labels).
Abstract: Knowledge Grounded Conversation Models are usually based on a selection/retrieval module and a generation module, trained separately or simultaneously, with or without having access to a ‘gold’ knowledge option. With the introduction of large pre-trained generative models, the selection and generation parts have become more and more entangled, shifting the focus towards enhancing knowledge incorporation (from multiple sources) instead of trying to pick the best knowledge option. These approaches, however, depend on knowledge labels and/or a separate dense retriever for their best performance. In this work we study the unsupervised selection abilities of pre-trained generative models (e.g. BART) and show that by adding a score-and-aggregate module between encoder and decoder, they are capable of learning to pick the proper knowledge through minimising the language modelling loss (i.e. without having access to knowledge labels). Trained as such, our model, K-Mine, shows competitive selection and generation performance against models that benefit from knowledge labels and/or a separate dense retriever.
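
A minimal PyTorch sketch of the score-and-aggregate idea, not the K-Mine implementation:

```python
# Minimal sketch: each knowledge candidate's encoder states are pooled,
# scored, and mixed with softmax weights before decoding; the language
# modelling loss alone then trains the scorer. Dimensions are arbitrary.
import torch
import torch.nn.functional as F

hidden = 16
enc_states = torch.randn(3, 5, hidden)   # 3 knowledge candidates x 5 tokens
scorer = torch.nn.Linear(hidden, 1)

pooled = enc_states.mean(dim=1)                            # (3, hidden)
weights = F.softmax(scorer(pooled).squeeze(-1), dim=0)     # (3,) soft selection
aggregated = (weights[:, None, None] * enc_states).sum(0)  # (5, hidden)
# `aggregated` replaces a single candidate's states as decoder input; the
# LM loss backpropagates through `weights`, without any knowledge labels.
```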

2 citations


Journal ArticleDOI
30 Aug 2021
TL;DR: This article examined how teenagers adapt their language use to that of their conversation partner (i.e., the linguistic phenomenon of accommodation) in interactions with peers and with older interlocutors (intergenerational communication).
Abstract: The present study examines how teenagers adapt their language use to that of their conversation partner (i.e., the linguistic phenomenon of accommodation) in interactions with peers (intragenerational communication) and with older interlocutors (intergenerational communication). We analyze a large corpus of Flemish teenagers' conversations on Facebook Messenger and WhatsApp, which appear to be highly peer-oriented. With Poisson models, we examine whether the teenage participants adjust their writing style to older interlocutors. The same trend emerges for all three sets of prototypical markers of the informal online genre: teenagers insert significantly fewer of these markers when interacting with older interlocutors, thus matching their interlocutors' style and increasing linguistic similarity. Finally, the analyses reveal subtle differences in accommodation patterns for the distinct linguistic variables with respect to the impact of the teenagers' sociodemographic profiles and their interlocutors' age.
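
A minimal sketch of the kind of Poisson model used, with hypothetical variable names and invented counts:

```python
# Minimal sketch: a Poisson regression of how many informal-genre markers a
# teenager produces per message, depending on whether the interlocutor is
# older. Column names and counts are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "marker_count":       [12, 3, 9, 2, 10, 4, 8, 1],
    "older_interlocutor": [0, 1, 0, 1, 0, 1, 0, 1],
})
model = smf.poisson("marker_count ~ older_interlocutor", data=df).fit()
print(model.params)  # negative coefficient = fewer markers with older partners
```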

1 citation


01 Apr 2021
TL;DR: In this article, the authors evaluated the impact of stylometric and emotion-based features on hate speech detection in three languages (English, Slovene, and Dutch) and found that the combination of features that model the targeted phenomena outperforms words and character n-gram features under cross-domain conditions.
Abstract: In this paper, we describe experiments designed to evaluate the impact of stylometric and emotion-based features on hate speech detection: the task of classifying textual content into hate or non-hate speech classes. Our experiments are conducted for three languages (English, Slovene, and Dutch) in both in-domain and cross-domain setups, and aim to investigate hate speech using features that model two linguistic phenomena: the writing style of hateful social media content, operationalized as function word usage, on the one hand, and emotion expression in hateful messages on the other. The results of experiments with features that model different combinations of these phenomena support our hypothesis that stylometric and emotion-based features are robust indicators of hate speech. Their contribution remains persistent with respect to domain and language variation. We show that the combination of features that model the targeted phenomena outperforms word and character n-gram features under cross-domain conditions, and provides a significant boost when combined in an ensemble with deep learning models, which currently obtain the best results.
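
A minimal sketch of how such feature groups could be combined, with invented stand-in lexicons rather than the paper's actual resources:

```python
# Minimal sketch: stylometric features (function-word counts) combined with
# an emotion-lexicon count. Both lexicons below are invented stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from scipy.sparse import hstack, csr_matrix
from sklearn.svm import LinearSVC

FUNCTION_WORDS = ["the", "of", "and", "you", "they"]  # assumed stylometric set
ANGER_LEXICON = {"hate", "disgusting"}                # assumed emotion lexicon

texts = ["they are disgusting and i hate them", "the sea and the sky"]
labels = [1, 0]

style = CountVectorizer(vocabulary=FUNCTION_WORDS).fit_transform(texts)
emotion = csr_matrix([[sum(w in ANGER_LEXICON for w in t.split())]
                      for t in texts])
X = hstack([style, emotion])
clf = LinearSVC().fit(X, labels)
```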

1 citation


Posted Content
TL;DR: This article proposed a pretraining procedure to adapt ConveRT, an English SOTA conversational agent, to other languages with less training data available, and applied it for the first time to the task of Dutch FAQ answering related to the COVID19 vaccine.
Abstract: Knowledgeable FAQ chatbots are a valuable resource to any organization. Unlike traditional call centers or FAQ web pages, they provide instant responses and are always available. Our experience running a COVID19 chatbot revealed the lack of resources available for FAQ answering in non-English languages. While powerful and efficient retrieval-based models exist for English, this is rarely the case for other languages, which do not have the same amount of training data available. In this work, we propose a novel pretraining procedure to adapt ConveRT, an English SOTA conversational agent, to other languages with less training data available. We apply it for the first time to the task of Dutch FAQ answering related to the COVID19 vaccine. We show it performs better than an open-source alternative in both low-data and high-data regimes.

01 Apr 2021
TL;DR: This paper proposed a scalable multi-task training regime for biomedical name encoders that can learn robust representations using only higher-level semantic classes; these representations generalise both bottom-up and top-down among various semantic hierarchies.
Abstract: Neural encoders of biomedical names are typically considered robust if representations can be effectively exploited for various downstream NLP tasks. To achieve this, encoders need to model domain-specific biomedical semantics while rivaling the universal applicability of pretrained self-supervised representations. Previous work on robust representations has focused on learning low-level distinctions between names of fine-grained biomedical concepts. These fine-grained concepts can also be clustered together to reflect higher-level, more general semantic distinctions, such as grouping the names nettle sting and tick-borne fever together under the description puncture wound of skin. It has not yet been empirically confirmed that training biomedical name encoders on fine-grained distinctions automatically leads to bottom-up encoding of such higher-level semantics. In this paper, we show that this bottom-up effect exists, but that it is still relatively limited. As a solution, we propose a scalable multi-task training regime for biomedical name encoders which can also learn robust representations using only higher-level semantic classes. These representations can generalise both bottom-up as well as top-down among various semantic hierarchies. Moreover, we show how they can be used out-of-the-box for improved unsupervised detection of hypernyms, while retaining robust performance on various semantic relatedness benchmarks.
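
The abstract does not spell out the training regime; as one hedged illustration of learning from higher-level classes, a triplet objective can pull names from the same class together (the encoder and data below are stand-ins, not the paper's exact multi-task setup):

```python
# Minimal sketch of class-level representation learning with a triplet loss.
# The encoder and inputs are invented stand-ins.
import torch

encoder = torch.nn.Linear(32, 64)             # stand-in for a name encoder
triplet = torch.nn.TripletMarginLoss(margin=1.0)

anchor   = encoder(torch.randn(8, 32))  # e.g. names like "nettle sting"
positive = encoder(torch.randn(8, 32))  # names from the same high-level class
negative = encoder(torch.randn(8, 32))  # names from a different class
loss = triplet(anchor, positive, negative)
loss.backward()
```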

Proceedings ArticleDOI
01 Apr 2021
TL;DR: This article used conceptual grounding constraints to align encoded names to pre-trained embeddings of their concept identifiers, which is effective even when using a simple feed-forward encoding architecture that allows for scaling to large corpora while remaining sufficiently expressive.
Abstract: Effective representation of biomedical names for downstream NLP tasks requires the encoding of both lexical as well as domain-specific semantic information. Ideally, the synonymy and semantic relatedness of names should be consistently reflected by their closeness in an embedding space. To achieve such robustness, prior research has considered multi-task objectives when training neural encoders. In this paper, we take a next step towards truly robust representations, which capture more domain-specific semantics while remaining universally applicable across different biomedical corpora and domains. To this end, we use conceptual grounding constraints which more effectively align encoded names to pretrained embeddings of their concept identifiers. These constraints are effective even when using a Deep Averaging Network, a simple feedforward encoding architecture that allows for scaling to large corpora while remaining sufficiently expressive. We empirically validate our approach using multiple tasks and benchmarks, which assess both literal synonymy as well as more general semantic relatedness.
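
A minimal sketch of a Deep Averaging Network with a grounding constraint, under assumptions (vocabulary, dimensions, and the pretrained concept vector are invented):

```python
# Minimal sketch: a DAN encodes a name (average word embeddings, then
# feedforward layers), and a grounding loss pulls the encoding toward a
# pretrained embedding of the name's concept identifier. All values invented.
import torch
import torch.nn.functional as F

word_emb = torch.nn.Embedding(5000, 64)
dan = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 64))

name_tokens = torch.tensor([[10, 42, 7]])          # token ids of one name
name_vec = dan(word_emb(name_tokens).mean(dim=1))  # DAN encoding, (1, 64)

concept_vec = torch.randn(1, 64)  # pretrained embedding of the concept id
grounding_loss = 1 - F.cosine_similarity(name_vec, concept_vec).mean()
grounding_loss.backward()
```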

Proceedings Article
01 Nov 2021
TL;DR: In this article, the problem text is first mapped to a formal representation in a declarative language using a sequence-to-sequence model, and then the resulting representation is executed using a probabilistic programming system to provide the answer.
Abstract: While solving math word problems automatically has received considerable attention in the NLP community, few works have addressed probability word problems specifically. In this paper, we employ and analyse various neural models for answering such word problems. In a two-step approach, the problem text is first mapped to a formal representation in a declarative language using a sequence-to-sequence model, and then the resulting representation is executed using a probabilistic programming system to provide the answer. Our best performing model incorporates general-domain contextualised word representations that were finetuned using transfer learning on another in-domain dataset. We also apply end-to-end models to this task, which bring out the importance of the two-step approach in obtaining correct solutions to probability problems.
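
A toy sketch of the second step: an invented declarative form for "What is the probability that two fair dice sum to 7?", solved exactly by enumeration as a stand-in for the probabilistic programming system used in the paper:

```python
# Toy sketch: a seq2seq model would map the problem text to a structured
# representation like `decl` below, which is then executed exactly.
from fractions import Fraction
from itertools import product

decl = {"dice": 2, "sides": 6, "event": lambda outcome: sum(outcome) == 7}

outcomes = list(product(range(1, decl["sides"] + 1), repeat=decl["dice"]))
favourable = sum(decl["event"](o) for o in outcomes)
print(Fraction(favourable, len(outcomes)))   # 1/6
```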

Posted Content
TL;DR: The authors collected around 6M FAQ pairs from the web, in 21 different languages, and adopted a setup similar to Dense Passage Retrieval (DPR), testing various bi-encoders on this dataset.
Abstract: In this paper, we present the first publicly available multilingual FAQ dataset. We collected around 6M FAQ pairs from the web, in 21 different languages. Although this is significantly larger than existing FAQ retrieval datasets, it comes with its own challenges: duplication of content and uneven distribution of topics. We adopt a setup similar to Dense Passage Retrieval (DPR) and test various bi-encoders on this dataset. Our experiments reveal that a multilingual model based on XLM-RoBERTa achieves the best results, except for English. Lower-resource languages seem to learn from one another, as a multilingual model achieves a higher MRR than language-specific ones. Our qualitative analysis reveals the brittleness of the model to simple word changes. We publicly release our dataset, model, and training script.
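
A minimal sketch of the DPR-style bi-encoder evaluation with MRR; the random vectors stand in for XLM-RoBERTa encoder outputs:

```python
# Minimal sketch: questions and FAQ answers encoded separately, ranked by
# dot product, scored with mean reciprocal rank (MRR). Vectors are random
# stand-ins for bi-encoder outputs.
import torch

n, dim = 4, 128
q_vecs = torch.randn(n, dim)  # question encoder outputs
a_vecs = torch.randn(n, dim)  # answer encoder outputs; row i matches question i

scores = q_vecs @ a_vecs.T                          # (n, n) similarities
order = scores.argsort(dim=1, descending=True)      # ranked answer ids
ranks = (order == torch.arange(n)[:, None]).nonzero()[:, 1] + 1
print((1.0 / ranks.float()).mean())                 # mean reciprocal rank
```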

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a few-shot learning approach is proposed to explore the impact of conceptual distinctions on robust biomedical name representations, which is effective for various types of input representations, whether domain-specific or unsupervised.
Abstract: Recent research on robust representations of biomedical names has focused on modeling large amounts of fine-grained conceptual distinctions using complex neural encoders. In this paper, we explore the opposite paradigm: training a simple encoder architecture using only small sets of names sampled from high-level biomedical concepts. Our encoder post-processes pretrained representations of biomedical names, and is effective for various types of input representations, whether domain-specific or unsupervised. We validate our proposed few-shot learning approach on multiple biomedical relatedness benchmarks, and show that it allows for continual learning, where we accumulate information from various conceptual hierarchies to consistently improve encoder performance. Given these findings, we propose our approach as a low-cost alternative for exploring the impact of conceptual distinctions on robust biomedical name representations.
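
A hedged sketch of few-shot post-processing: a linear layer over pretrained name vectors trained with a contrastive objective over a handful of names per high-level concept (all inputs invented):

```python
# Minimal sketch: a linear layer post-processes pretrained name vectors,
# trained to pull same-concept names together. Vectors and labels invented.
import torch
import torch.nn.functional as F

post = torch.nn.Linear(64, 64)              # post-processing encoder
pretrained = torch.randn(6, 64)             # 6 few-shot name representations
concept = torch.tensor([0, 0, 1, 1, 2, 2])  # 3 concepts, 2 names each

z = F.normalize(post(pretrained), dim=1)
mask = torch.eye(6, dtype=torch.bool)
sim = (z @ z.T / 0.1).masked_fill(mask, -1e9)        # temperature-scaled
positives = (concept[:, None] == concept[None, :]) & ~mask
loss = -F.log_softmax(sim, dim=1)[positives].mean()  # pull same-concept names
loss.backward()
```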

Proceedings ArticleDOI
01 Jun 2021
TL;DR: The authors propose a pipeline to explain RNNs by means of decision lists (also called rules) over skipgrams, which persistently achieves high explanation fidelity and yields qualitatively interpretable rules.
Abstract: Several previous studies on explanation for recurrent neural networks focus on approaches that find the most important input segments for a network as its explanations. In that case, the manner in which these input segments combine with each other to form an explanatory pattern remains unknown. To overcome this, some previous work tries to find patterns (called rules) in the data that explain neural outputs. However, their explanations are often insensitive to model parameters, which limits the scalability of text explanations. To overcome these limitations, we propose a pipeline to explain RNNs by means of decision lists (also called rules) over skipgrams. To evaluate the explanations, we create a synthetic sepsis-identification dataset and also apply our technique to additional clinical and sentiment analysis datasets. We find that our technique persistently achieves high explanation fidelity and qualitatively interpretable rules.
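
A minimal sketch of the general approach, not the paper's pipeline: skipgram features fit to an RNN's predictions, with a shallow decision tree standing in for a decision list:

```python
# Minimal sketch: binary skipgram features (via nltk.util.skipgrams) fit to
# an RNN's outputs with an interpretable surrogate. Texts and the "RNN"
# labels are invented.
from nltk.util import skipgrams
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

texts = ["fever high lactate rising", "stable vitals no fever"]
rnn_predictions = [1, 0]   # outputs of the RNN we want to explain

def skipgram_features(text):
    return [" ".join(g) for g in skipgrams(text.split(), 2, 2)]

vec = CountVectorizer(analyzer=skipgram_features, binary=True)
X = vec.fit_transform(texts)
surrogate = DecisionTreeClassifier(max_depth=2).fit(X, rnn_predictions)
print(export_text(surrogate, feature_names=list(vec.get_feature_names_out())))
```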
