
Showing papers by "Walter Daelemans published in 2018"


Journal ArticleDOI
08 Oct 2018-PLOS ONE
TL;DR: This paper describes the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and performs a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection.
Abstract: While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

231 citations
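
The setup described above maps naturally onto a scikit-learn pipeline. Below is a minimal sketch of a linear SVM text classifier with n-gram features and F1-scored hyperparameter search; the data, feature set, and parameter grid are illustrative placeholders, not the paper's actual configuration.

    # Minimal sketch of a linear-SVM text classifier in the spirit of the
    # setup above. Posts, labels, and the grid are placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import Pipeline
    from sklearn.model_selection import GridSearchCV

    posts = [
        "nobody likes you, just leave",
        "you are such a loser lol",
        "everyone at school hates you",
        "see you at practice tomorrow!",
        "happy birthday, have a great day",
        "can you send me the homework?",
    ]
    labels = [1, 1, 1, 0, 0, 0]  # 1 = bullying-related, 0 = not

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
        ("svm", LinearSVC()),
    ])

    # Hyperparameter optimisation on the regularisation strength,
    # scored with F1 as in the paper (3 folds only for the toy data).
    grid = GridSearchCV(pipeline, {"svm__C": [0.01, 0.1, 1, 10]},
                        scoring="f1", cv=3)
    grid.fit(posts, labels)
    print(grid.best_params_, grid.best_score_)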


01 Jan 2018
TL;DR: This edition of PAN studies two tasks, the novel task of cross-domain authorship attribution, where the texts of known and unknown authorship belong to different domains, and style change detection, where single-author and multi-author texts are to be distinguished.
Abstract: Author identification attempts to reveal the authors behind texts. It is an emerging area of research associated with applications in literary research, cyber-security, forensics, and social media analysis. In this edition of PAN, we study two tasks: the novel task of cross-domain authorship attribution, where the texts of known and unknown authorship belong to different domains, and style change detection, where single-author and multi-author texts are to be distinguished. For the former task, we make use of fanfiction texts, a large part of contemporary fiction written by non-professional authors who are inspired by specific well-known works, which enables us to control the domain of texts for the first time. We describe a new corpus of fanfiction texts covering five languages (English, French, Italian, Polish, and Spanish). For the latter, a new data set of Q&As covering multiple topics in English is introduced. We received 11 submissions for the cross-domain authorship attribution task and 5 submissions for the style change detection task. A survey of participant methods and analytical evaluation results are presented in this paper.

84 citations


Proceedings ArticleDOI
26 Mar 2018
TL;DR: The authors presented a new dataset for machine comprehension in the medical domain using clinical case reports with around 100,000 gap-filling queries about these cases and applied several baselines and state-of-the-art neural readers to the dataset, and observed a considerable gap in performance (20% F1) between the best human and machine readers.
Abstract: We present a new dataset for machine comprehension in the medical domain. Our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. We apply several baselines and state-of-the-art neural readers to the dataset, and observe a considerable gap in performance (20% F1) between the best human and machine readers. We analyze the skills required for successful answering and show how reader performance varies depending on the applicable skills. We find that inferences using domain knowledge and object tracking are the most frequently required skills, and that recognizing omitted information and spatio-temporal reasoning are the most difficult for the machines.

68 citations
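
Gap-filling queries of the kind described are typically constructed by blanking out an entity mention in the text and keeping it as the answer. A toy sketch of that construction, with a hypothetical case text and entity list standing in for the paper's clinical annotations:

    # Toy sketch of gap-filling (cloze) query construction. The case text
    # and entity list are hypothetical; the paper derives some 100,000
    # such queries from clinical case reports.
    def make_cloze_queries(sentences, entities, placeholder="@placeholder"):
        """Blank out each entity mention and keep it as the answer."""
        queries = []
        for sent in sentences:
            for ent in entities:
                if ent in sent:
                    queries.append({
                        "query": sent.replace(ent, placeholder),
                        "answer": ent,
                    })
        return queries

    case = ["The patient was treated with warfarin.",
            "An MRI revealed a cortical infarct."]
    entities = ["warfarin", "cortical infarct"]
    for q in make_cloze_queries(case, entities):
        print(q["query"], "->", q["answer"])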


Book ChapterDOI
08 Sep 2018
TL;DR: This paper shows how DCNNs fine-tuned on a large artistic collection outperform the same architectures pre-trained on the ImageNet dataset only, when it comes to the classification of heritage objects from a different dataset.
Abstract: In this paper we investigate whether Deep Convolutional Neural Networks (DCNNs), which have obtained state-of-the-art results on the ImageNet challenge, are able to perform equally well on three different art classification problems. In particular, we assess whether it is beneficial to fine-tune the networks instead of just using them as off-the-shelf feature extractors for a separately trained softmax classifier. Our experiments show that the first approach yields significantly better results and allows the DCNNs to develop new selective attention mechanisms over the images, which provide powerful insights about which pixel regions allow the networks to successfully tackle the proposed classification challenges. Furthermore, we also show that DCNNs fine-tuned on a large artistic collection outperform the same architectures pre-trained on the ImageNet dataset only, when it comes to the classification of heritage objects from a different dataset.

49 citations
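
The two transfer regimes compared in the abstract can be sketched in a few lines of PyTorch; a torchvision ResNet stands in here for the DCNN architectures the paper evaluates:

    # Sketch of the two transfer-learning regimes compared above, with a
    # torchvision ResNet as a stand-in for the paper's DCNNs.
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 10  # placeholder for the art-classification label set

    # (a) Off-the-shelf feature extractor: freeze the pretrained backbone
    # and train only a new softmax classifier on top.
    frozen = models.resnet50(weights="IMAGENET1K_V1")
    for p in frozen.parameters():
        p.requires_grad = False
    frozen.fc = nn.Linear(frozen.fc.in_features, NUM_CLASSES)

    # (b) Fine-tuning: replace the classifier but leave all weights
    # trainable, so the convolutional layers adapt to the artistic data.
    finetuned = models.resnet50(weights="IMAGENET1K_V1")
    finetuned.fc = nn.Linear(finetuned.fc.in_features, NUM_CLASSES)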


Journal ArticleDOI
TL;DR: The utility of a stacked denoising autoencoder and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes is explored and novel techniques to facilitate model interpretability are proposed.

42 citations


Posted Content
TL;DR: A new dataset for machine comprehension in the medical domain using clinical case reports with around 100,000 gap-filling queries is presented, and it is found that inferences using domain knowledge and object tracking are the most frequently required skills.
Abstract: We present a new dataset for machine comprehension in the medical domain. Our dataset uses clinical case reports with around 100,000 gap-filling queries about these cases. We apply several baselines and state-of-the-art neural readers to the dataset, and observe a considerable gap in performance (20% F1) between the best human and machine readers. We analyze the skills required for successful answering and show how reader performance varies depending on the applicable skills. We find that inferences using domain knowledge and object tracking are the most frequently required skills, and that recognizing omitted information and spatio-temporal reasoning are the most difficult for the machines.

28 citations


Proceedings ArticleDOI
01 Aug 2018
TL;DR: It is found that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80.
Abstract: Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network’s performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at https://github.com/clips/interpret_with_rules.

14 citations
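
The pipeline lends itself to a compact sketch. Below, an L2-norm weight score stands in for the paper's feature importance computation and a decision tree stands in for the rule induction model; both substitutions are illustrative, not the authors' exact choices.

    # Illustrative sketch of the explanation pipeline described above:
    # (1) score feature importance in the trained network, (2) reweigh
    # the inputs, (3) fit an interpretable rule inducer on the result.
    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X = rng.random((200, 5))                   # placeholder feature matrix
    y = (X[:, 0] + X[:, 3] > 1.0).astype(int)  # placeholder labels

    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)

    # Crude importance proxy: L2 norm of each input's first-layer weights.
    importance = np.linalg.norm(net.coefs_[0], axis=1)

    # Reweigh inputs and fit rules to mimic the network's predictions.
    X_weighted = X * importance
    rules = DecisionTreeClassifier(max_depth=3).fit(X_weighted, net.predict(X))
    print(export_text(rules))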


Proceedings Article
01 Aug 2018
TL;DR: This paper describes CLiPS’s submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018 and explores different ways to combine classifiers trained on different feature groups.
Abstract: This paper describes CLiPS’s submissions for the Discriminating between Dutch and Flemish in Subtitles (DFS) shared task at VarDial 2018. We explore different ways to combine classifiers trained on different feature groups. Our best system uses two Linear SVM classifiers; one trained on lexical features (word n-grams) and one trained on syntactic features (PoS n-grams). The final prediction of whether a document is in Flemish Dutch or Netherlandic Dutch is made by the classifier that outputs the highest probability for one of the two labels. This confidence vote approach outperforms a meta-classifier on the development data and on the test data.

13 citations
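
The confidence vote itself is only a few lines: each classifier outputs a probability distribution over the two labels and the prediction comes from whichever model is most confident. A sketch with placeholder probabilities (LinearSVC needs calibration, e.g. CalibratedClassifierCV, to produce them):

    # Sketch of the confidence vote between two classifiers: the final
    # label comes from whichever model assigns the highest probability
    # to either class. The probability arrays here are placeholders.
    import numpy as np

    def confidence_vote(proba_lexical, proba_syntactic):
        """Pick, per document, the prediction of the more confident model.

        Both arguments are (n_docs, 2) arrays of class probabilities,
        e.g. from CalibratedClassifierCV(LinearSVC()).predict_proba().
        """
        stacked = np.stack([proba_lexical, proba_syntactic])  # (2, n, 2)
        winner = stacked.max(axis=2).argmax(axis=0)   # most confident model
        return stacked[winner, np.arange(stacked.shape[1])].argmax(axis=1)

    p_lex = np.array([[0.55, 0.45], [0.30, 0.70]])
    p_syn = np.array([[0.20, 0.80], [0.60, 0.40]])
    print(confidence_vote(p_lex, p_syn))  # [1 1]: syn wins doc 0, lex doc 1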


Posted Content
TL;DR: This report presents a study of eight corpora of online hate speech, by demonstrating the NLP techniques that were used to collect and analyze the jihadist, extremist, racist, and sexist content.
Abstract: In this report, we present a study of eight corpora of online hate speech, by demonstrating the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we have focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study.

10 citations


Journal ArticleDOI
28 Dec 2018-PLOS ONE
TL;DR: Analysis of distributional properties that facilitate the categorization of words into lexical categories shows how, in order for the learner to see an opportunity to form a category, there needs to be a certain degree of uncertainty in the co-occurrence pattern.
Abstract: This paper analyzes distributional properties that facilitate the categorization of words into lexical categories. First, word-context co-occurrence counts were collected using corpora of transcribed English child-directed speech. Then, an unsupervised k-nearest neighbor algorithm was used to categorize words into lexical categories. The categorization outcome was regressed over three main distributional predictors computed for each word, including frequency, contextual diversity, and average conditional probability given all the co-occurring contexts. Results show that both contextual diversity and frequency have a positive effect while the average conditional probability has a negative effect. This indicates that words are easier to categorize in the face of uncertainty: categorization works best for words which are frequent, diverse, and hard to predict given the co-occurring contexts. This shows how, in order for the learner to see an opportunity to form a category, there needs to be a certain degree of uncertainty in the co-occurrence pattern.

9 citations
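
All three predictors can be read off a word-by-context co-occurrence matrix. A minimal sketch of one plausible way to compute them (the toy counts and the exact formulation of the conditional probability are illustrative):

    # Sketch of the three distributional predictors used in the
    # regression, computed from a toy word-by-context count matrix.
    import numpy as np

    counts = np.array([[10, 5, 0],    # rows: words, columns: contexts
                       [ 2, 2, 2],
                       [ 0, 0, 8]], dtype=float)

    frequency = counts.sum(axis=1)         # total occurrences per word
    diversity = (counts > 0).sum(axis=1)   # number of distinct contexts

    # Average conditional probability P(word | context), averaged over
    # the contexts the word actually occurs in.
    p_word_given_context = counts / counts.sum(axis=0, keepdims=True)
    avg_cond_prob = np.array([
        row[row > 0].mean() for row in p_word_given_context
    ])

    print(frequency, diversity, avg_cond_prob)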


Journal ArticleDOI
01 Jan 2018
TL;DR: The authors studied the impact of Flemish adolescents' social background on non-standard writing and found significant correlations between different aspects of social class (level of education, home language and profession of the parents) and all examined deviations from formal written standard Dutch.
Abstract: In a large corpus (2.9 million tokens) of chat conversations, we studied the impact of Flemish adolescents’ social background on non-standard writing. We found significant correlations between different aspects of social class (level of education, home language and profession of the parents) and all examined deviations from formal written standard Dutch. Clustering several social variables might not only lead to a better operationalization of the complex phenomenon of social class, it certainly allows for discriminating social groups with distinct linguistic practices: lower class teenagers used each of the non-standard features much more often and in some cases in a different way than their upper class peers. Possible explanations concern discrepancies in terms of both linguistic proficiency and linguistic attitudes. Our findings emphasize the importance of including social background as an independent variable in variationist studies on youngsters’ computer-mediated communication.

01 Sep 2018
TL;DR: This article analyzes Flemish adolescents’ non-standard writing practices in a large social media corpus and looks for correlations with the teenagers’ social class; since the social background parameters prove highly correlated, they are combined into one social class label.
Abstract: In a large social media corpus (2.9 million tokens), we analyze Flemish adolescents’ non-standard writing practices and look for correlations with the teenagers’ social class. Three different aspects of adolescents’ social background are included: educational track, parental profession, and home language. Since the data reveal that these parameters are highly correlated, we combine them into one social class label. The different linguistic practices emerging from the analyses demonstrate the crucial impact of social class on adolescent online writing practices. Furthermore, our results nuance classical findings on working class adherence to ‘old vernacular’ by also highlighting working class youth’s strong connection to the online writing culture, or ‘new vernacular’. Finally, we point out the complexity of the social class variable by demonstrating interactions with gender and age, and by examining groups of teenagers whose social background is ambiguous and therefore hard to operationalize.

Proceedings Article
01 Aug 2018
TL;DR: This work elaborates on SentProp, a framework for inducing domain-specific polarities from word embeddings, by evaluating its use for enhancing DuOMan, a general-purpose lexicon, for use in the political domain.
Abstract: Lexicon based methods for sentiment analysis rely on high quality polarity lexicons. In recent years, automatic methods for inducing lexicons have increased the viability of lexicon based methods for polarity classification. SentProp is a framework for inducing domain-specific polarities from word embeddings. We elaborate on SentProp by evaluating its use for enhancing DuOMan, a general-purpose lexicon, for use in the political domain. By adding only top sentiment bearing words from the vocabulary and applying small polarity shifts in the general-purpose lexicon, we increase accuracy in an in-domain classification task. The enhanced lexicon performs worse than the original lexicon in an out-domain task, showing that the words we added and the polarity shifts we applied are domain-specific and do not translate well to an out-domain setting.
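
SentProp proper propagates polarity through a random walk over an embedding-similarity graph; the simplified sketch below uses a plain seed-similarity score to illustrate the enhancement step (adding top sentiment-bearing words and applying small polarity shifts). All names and numbers are placeholders.

    # Simplified sketch of the lexicon-enhancement step. SentProp itself
    # uses a random walk over an embedding-similarity graph; a plain
    # seed-similarity score stands in here. Seeds, embeddings, and the
    # shift size are placeholders.
    import numpy as np

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    def induce_polarity(word_vec, pos_seeds, neg_seeds):
        """Polarity in [-1, 1]: similarity to positive minus negative seeds."""
        pos = np.mean([cosine(word_vec, s) for s in pos_seeds])
        neg = np.mean([cosine(word_vec, s) for s in neg_seeds])
        return pos - neg

    def enhance_lexicon(lexicon, embeddings, pos_seeds, neg_seeds,
                        top_k=100, shift=0.1):
        scores = {w: induce_polarity(v, pos_seeds, neg_seeds)
                  for w, v in embeddings.items()}
        # Add only the top sentiment-bearing out-of-lexicon words ...
        new = sorted((w for w in scores if w not in lexicon),
                     key=lambda w: abs(scores[w]), reverse=True)[:top_k]
        enhanced = dict(lexicon)
        enhanced.update({w: scores[w] for w in new})
        # ... and apply small domain-specific shifts to existing entries.
        for w in lexicon:
            if w in scores:
                enhanced[w] += shift * np.sign(scores[w])
        return enhanced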

Proceedings ArticleDOI
01 Oct 2018
TL;DR: While the detection of the most theory- and practice-oriented educational tracks seems to be a relatively easy task, the hybrid Technical level appears to be much harder to capture based on online writing style, as expected.
Abstract: We aim to predict Flemish adolescents’ educational track based on their Dutch social media writing. We distinguish between the three main types of Belgian secondary education: General (theory-oriented), Vocational (practice-oriented), and Technical Secondary Education (hybrid). The best results are obtained with a Naive Bayes model, i.e. an F-score of 0.68 (std. dev. 0.05) in 10-fold cross-validation experiments on the training data and an F-score of 0.60 on unseen data. Many of the most informative features are character n-grams containing specific occurrences of chatspeak phenomena such as emoticons. While the detection of the most theory- and practice-oriented educational tracks seems to be a relatively easy task, the hybrid Technical level appears to be much harder to capture based on online writing style, as expected.
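
A character n-gram Naive Bayes model of the kind reported takes only a few lines in scikit-learn. The sketch below uses placeholder data and 3-fold validation for brevity; the paper's experiments used 10-fold cross-validation on real chat data.

    # Generic sketch of a character n-gram Naive Bayes classifier for
    # educational-track prediction. Texts and labels are placeholders.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    texts = [
        "wa doede gij vanavond? :p", "ik snap die wiskunde egt nie",
        "morgen weer stage, zucht", "kzal u straks nog bellen he",
        "examen was egt moeilijk :(", "wie gaat er mee naar de match?",
        "da was egt grappig gisteren xD", "ik moet nog leren voor frans",
        "tot straks op den atelier!",
    ]
    tracks = ["General", "Vocational", "Technical"] * 3

    model = make_pipeline(
        # Character n-grams capture chatspeak phenomena such as emoticons.
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        MultinomialNB(),
    )
    scores = cross_val_score(model, texts, tracks, cv=3, scoring="f1_macro")
    print(scores.mean(), scores.std())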

Posted Content
TL;DR: The authors proposed a rule induction model to explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80.
Abstract: Understanding the behavior of a trained network and finding explanations for its outputs is important for improving the network's performance and generalization ability, and for ensuring trust in automated systems. Several approaches have previously been proposed to identify and visualize the most important features by analyzing a trained network. However, the relations between different features and classes are lost in most cases. We propose a technique to induce sets of if-then-else rules that capture these relations to globally explain the predictions of a network. We first calculate the importance of the features in the trained network. We then weigh the original inputs with these feature importance scores, simplify the transformed input space, and finally fit a rule induction model to explain the model predictions. We find that the output rule-sets can explain the predictions of a neural network trained for 4-class text classification from the 20 newsgroups dataset to a macro-averaged F-score of 0.80. We make the code available at https://github.com/clips/interpret_with_rules.

Proceedings Article
01 Dec 2018
TL;DR: It is suggested that a carefully composed set of deep features is as informative as surface-feature word and character n-grams, and that combining surface and deep features results in a slight increase in F-score; mainly syntactic features differ between the groups, possibly indicating a less dynamic writing style for adolescents with ASD.
Abstract: One of the main characteristics of individuals with autism spectrum disorder (ASD) is a deficit in social communication. The effects of ASD on both verbal and non-verbal communication are widely researched in this respect. In this exploratory study, we investigate whether texts of Dutch-speaking adolescents with ASD (aged 12-18 years) are (automatically) distinguishable from texts written by typically developing peers. First, we want to reveal whether specific characteristics can be found in the writing style of adolescents with ASD, and secondly, we examine the possibility to use these features in an automated classification task. We look for surface features (word and character n-grams, and simple linguistic metrics), but also for deep linguistic features (namely syntactic, semantic and discourse features). The differences between the ASD group and control group are tested for statistical significance and we show that mainly syntactic features are different among the groups, possibly indicating a less dynamic writing style for adolescents with ASD. For the classification task, a Logistic Regression classifier is used. With a surface feature approach, we could reach an F-score of 72.15%, which is much higher than the random baseline of 50%. However, a pure n-gram-based approach very much relies on content and runs the risk of detecting topics instead of style, which argues for the use of deeper linguistic features. The best combination in the deep feature approach originally reached an F-score of just 62.14%, which could not be boosted by automatic feature selection. However, by taking into account the information from the statistical analysis and merely using the features that were significant or trending, we could equal the surface-feature performance and again reached an F-score of 72.15%. This suggests that a carefully composed set of deep features is as informative as surface-feature word and character n-grams. Moreover, combining surface and deep features resulted in a slight increase in F-score to 72.33%.

Proceedings Article
01 May 2018
TL;DR: This work presents wordkit, a python package which allows users to switch between feature sets and featurizers with a uniform API, allowing for rapid prototyping, and is the first package which integrates a variety of orthographic and phonological featurizer in a single package.
Abstract: The modeling of psycholinguistic phenomena, such as word reading, with machine learning techniques requires the featurization of word stimuli into appropriate orthographic and phonological representations. Critically, the choice of features impacts the performance of machine learning algorithms, and can have important ramifications for the conclusions drawn from a model. As such, featurizing words with a variety of feature sets, without having to resort to using different tools is beneficial in terms of development cost. In this work, we present wordkit, a python package which allows users to switch between feature sets and featurizers with a uniform API, allowing for rapid prototyping. To the best of our knowledge, this is the first package which integrates a variety of orthographic and phonological featurizers in a single package. The package is fully compatible with scikit-learn, and hence can be integrated into a variety of machine learning pipelines. Furthermore, the package is modular and extensible, allowing for the future integration of a large variety of feature sets and featurizers. The package and documentation can be found at github.com/stephantul/wordkit
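
To avoid guessing at wordkit's actual API, the sketch below illustrates the general pattern the package implements: interchangeable featurizers behind one scikit-learn-style transformer interface. The class and its methods are hypothetical, not wordkit's.

    # Illustrative sketch of a uniform, scikit-learn-compatible
    # featurizer interface of the kind wordkit provides. This class is
    # hypothetical, not wordkit's actual API.
    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    class OneHotCharacterFeaturizer(BaseEstimator, TransformerMixin):
        """Turn words into fixed-length one-hot character matrices."""

        def __init__(self, max_len=10):
            self.max_len = max_len

        def fit(self, words, y=None):
            self.alphabet_ = sorted({c for w in words for c in w})
            self.index_ = {c: i for i, c in enumerate(self.alphabet_)}
            return self

        def transform(self, words):
            out = np.zeros((len(words), self.max_len, len(self.alphabet_)))
            for i, w in enumerate(words):
                for j, c in enumerate(w[: self.max_len]):
                    out[i, j, self.index_[c]] = 1.0
            return out.reshape(len(words), -1)  # flatten for pipelines

    feats = OneHotCharacterFeaturizer().fit(["salt", "slat", "work"])
    print(feats.transform(["salt"]).shape)

Because it subclasses TransformerMixin, a featurizer like this drops into any scikit-learn Pipeline alongside a classifier, which is the interoperability the paper highlights.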

Proceedings Article
01 Dec 2018
TL;DR: This paper explores the differences in propositional behavior between healthy individuals and Alzheimer’s patients, using a newly developed computerized propositional idea density measure for Dutch texts, and provides support to the hypothesis that a slight decrease in propositional idea density over time might be a predictor of cognitive decline in late life.
Abstract: Low linguistic ability has been associated with low cognitive reserve, which might result in the development of Alzheimer’s disease. As a result, propositional idea density (PID), as a measure of linguistic ability in early life, might predict cognitive decline in late life. This paper explores the differences in propositional behavior between healthy individuals and Alzheimer’s patients, using a newly developed computerized propositional idea density measure for Dutch texts. This exploratory study describes an experiment on literary text. We measured the propositional idea density of the works of one author without Alzheimer’s disease (i.e. Elsschot) and one author with attested Alzheimer’s disease (i.e. Claus). Changes in propositional idea density for both authors were compared, as well as the differences in propositional idea density in early life. Analyses from this experiment showed that the propositional idea density in early life of Elsschot was not significantly higher than that of Claus. The propositional idea density of Elsschot significantly increased over time. This change in propositional idea density greatly differed from the slight decrease in propositional idea density of Claus. On the one hand, this study fails to support the hypothesis that a low propositional idea density in early life predicts cognitive decline in late life. On the other hand, the results provide support to the hypothesis that a slight decrease in propositional idea density over time might be a predictor of cognitive decline in late life. However, much more research is needed to corroborate these findings. The propositional idea density software for Dutch is available on request.
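
The authors' Dutch software is available on request; as a rough illustration of the measure itself, PID is commonly approximated (in the CPIDR tradition) as the number of propositions, counted via part-of-speech tags, divided by the number of words:

    # Toy illustration of propositional idea density (PID): propositions
    # per word, approximating propositions by POS tags in the CPIDR
    # tradition. A rough stand-in, not the authors' Dutch software.
    PROPOSITION_TAGS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}

    def idea_density(tagged_tokens):
        """tagged_tokens: list of (token, universal_pos_tag) pairs."""
        if not tagged_tokens:
            return 0.0
        props = sum(1 for _, tag in tagged_tokens if tag in PROPOSITION_TAGS)
        return props / len(tagged_tokens)

    sentence = [("the", "DET"), ("old", "ADJ"), ("man", "NOUN"),
                ("walked", "VERB"), ("slowly", "ADV"), ("home", "NOUN")]
    print(idea_density(sentence))  # 3 propositions / 6 words = 0.5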

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A new neighborhood measure, rd20, is introduced, which can be used to quantify neighborhood effects over arbitrary feature spaces, and it is shown that feature sets that do not allow for transposition or deletion explain more variance in Reaction Time measurements.
Abstract: We investigate the relation between the transposition and deletion effects in word reading, i.e., the finding that readers can successfully read “SLAT” as “SALT”, or “WRK” as “WORK”, and the neighborhood effect. In particular, we investigate whether lexical orthographic neighborhoods take into account transposition and deletion in determining neighbors. If this is the case, it is more likely that the neighborhood effect takes place early during processing, and does not solely rely on similarity of internal representations. We introduce a new neighborhood measure, rd20, which can be used to quantify neighborhood effects over arbitrary feature spaces. We calculate the rd20 over large sets of words in three languages using various feature sets and show that feature sets that do not allow for transposition or deletion explain more variance in Reaction Time (RT) measurements. We also show that the rd20 can be calculated using the hidden state representations of a Multi-Layer Perceptron, and that these explain less variance than the raw features. We conclude that the neighborhood effect is unlikely to have a perceptual basis, but is more likely to be the result of items co-activating after recognition. All code is available at: www.github.com/clips/conll2018
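
The authors' code is in the linked repository. As a sketch, under the assumption that rd20, by analogy with OLD20, is the mean distance from an item to its 20 nearest neighbours in an arbitrary feature space:

    # Sketch of an rd20-style neighbourhood measure, assuming (by analogy
    # with OLD20) it is the mean distance to the 20 nearest neighbours in
    # an arbitrary feature space. See the linked repository for the
    # authors' actual implementation.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def rd20(features, k=20):
        """Mean distance from each item to its k nearest neighbours."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(features)  # +1: self
        dist, _ = nn.kneighbors(features)
        return dist[:, 1:].mean(axis=1)  # drop the zero self-distance

    vectors = np.random.default_rng(0).random((500, 40))  # placeholder
    print(rd20(vectors)[:5])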



Proceedings ArticleDOI
01 Oct 2018
TL;DR: It is found that including ontological association between drugs and problems, and data-induced association between medical concepts does not reliably improve the performance, but that large gains are obtained by the incorporation of semantic classes to capture relation triggers.
Abstract: Recently, segment convolutional neural networks have been proposed for end-to-end relation extraction in the clinical domain, achieving results comparable to or outperforming the approaches with heavy manual feature engineering. In this paper, we analyze the errors made by the neural classifier based on confusion matrices, and then investigate three simple extensions to overcome its limitations. We find that including ontological association between drugs and problems, and data-induced association between medical concepts does not reliably improve the performance, but that large gains are obtained by the incorporation of semantic classes to capture relation triggers.
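
Confusion-matrix error analysis of the kind described is straightforward to reproduce. A generic sketch with example clinical relation labels (gold and predicted values are placeholders):

    # Generic sketch of confusion-matrix error analysis for a relation
    # classifier. Labels and predictions are placeholders.
    from sklearn.metrics import confusion_matrix

    labels = ["TrIP", "TrAP", "TrCP", "none"]   # example relation types
    gold = ["TrAP", "TrAP", "none", "TrIP", "TrCP"]
    pred = ["TrAP", "none", "none", "TrAP", "TrCP"]

    cm = confusion_matrix(gold, pred, labels=labels)
    for i, row in enumerate(cm):
        # Off-diagonal counts show which relations the model confuses.
        print(f"{labels[i]:>5}: " + " ".join(f"{n:3d}" for n in row))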