
Showing papers by "Marie-Francine Moens published in 2021"


Journal ArticleDOI
01 Jan 2021
TL;DR: It is argued that combining discrete and continuous representations and their processing will be essential to build systems that exhibit a general form of intelligence.
Abstract: Discrete and continuous representations of content (e.g., of language or images) have interesting properties to be explored for the understanding of or reasoning with this content by machines. This position paper puts forward our opinion on the role of discrete and continuous representations and their processing in the deep learning field. Current neural network models compute continuous-valued data. Information is compressed into dense, distributed embeddings. By stark contrast, humans use discrete symbols in their communication with language. Such symbols represent a compressed version of the world that derives its meaning from shared contextual information. Additionally, human reasoning involves symbol manipulation at a cognitive level, which facilitates abstract reasoning, the composition of knowledge and understanding, generalization and efficient learning. Motivated by these insights, in this paper we argue that combining discrete and continuous representations and their processing will be essential to build systems that exhibit a general form of intelligence. We suggest and discuss several avenues that could improve current neural networks with the inclusion of discrete elements to combine the advantages of both types of representations.

12 citations
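
The paper does not commit to a particular mechanism, but one concrete way to inject discrete symbols into a continuous network is vector quantization. The sketch below (plain PyTorch, illustrative only; the codebook size and the straight-through trick are our assumptions, not the paper's proposal) snaps continuous embeddings to their nearest codebook entry while keeping the model trainable.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Maps continuous embeddings to discrete codebook entries; a
    straight-through estimator keeps the network trainable end to end."""
    def __init__(self, num_codes: int = 512, dim: int = 64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                               # z: (batch, dim)
        dists = torch.cdist(z, self.codebook.weight)    # (batch, num_codes)
        codes = dists.argmin(dim=-1)                    # discrete symbol ids
        z_q = self.codebook(codes)                      # continuous lookup
        # straight-through: forward uses z_q, gradients flow through z
        z_q = z + (z_q - z).detach()
        return z_q, codes

vq = VectorQuantizer()
z_q, codes = vq(torch.randn(8, 64))
print(codes.shape, z_q.shape)   # torch.Size([8]) torch.Size([8, 64])
```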


Journal ArticleDOI
TL;DR: In this paper, the authors present a survey of state-of-the-art methods to identify causal relationships between events or entities within biomedical texts, including multiview CNN, attention-based BiLSTM, and graph LSTM.

11 citations


Journal ArticleDOI
TL;DR: In this article, the authors investigate the hypothesis that the timestamp of a Web page is crucial to how it should be ranked for a given claim, and they delineate four temporal ranking methods that constrain evidence ranking differently and simulate hypothesis-specific evidence rankings given the evidence timestamps as gold standard.

11 citations
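
The abstract does not spell out the four temporal ranking methods, so as a hedged illustration, here is one plausible way to constrain evidence ranking by timestamps: decay a base relevance score by the gap between the page date and the claim date (the exponential-decay form and the half-life are our assumptions).

```python
import math
from datetime import datetime

def time_aware_score(relevance: float, page_date: datetime,
                     claim_date: datetime, half_life_days: float = 30.0) -> float:
    """Downweights evidence the further its timestamp lies from the claim
    date (one of many possible temporal constraints on a base ranker)."""
    gap = abs((claim_date - page_date).days)
    return relevance * math.exp(-math.log(2) * gap / half_life_days)

# Rank (base-ranker relevance, page timestamp) pairs for a claim:
claim = datetime(2021, 3, 1)
evidence = [(0.9, datetime(2019, 3, 1)), (0.7, datetime(2021, 2, 20))]
ranked = sorted(evidence, key=lambda e: -time_aware_score(e[0], e[1], claim))
print(ranked[0][1])   # the fresher page wins despite lower base relevance
```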


Book ChapterDOI
01 Jan 2021
TL;DR: In this paper, a multinomial sequence classifier for dialogue breakdown detection was proposed; the best-performing model was selected and compared with the best model and the majority baseline from the previous challenge.
Abstract: One of the principal problems of human-computer interaction is miscommunication. Occurring mainly on behalf of the dialogue system, miscommunication can lead to dialogue breakdown, i.e., a point at which the dialogue cannot be continued. Detecting breakdown can facilitate its prevention, or recovery after a breakdown has occurred. In this paper, we propose a multinomial sequence classifier for dialogue breakdown detection. We explore several LSTM models that differ in model type and in the word embedding models they use. We select our best-performing model and compare it with the best model and the majority baseline from the previous challenge. We conclude that our detector outperforms the baselines in offline testing.

8 citations
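
As a rough sketch of the kind of model described (shapes, hyperparameters, and the three-way label set are our assumptions, loosely following the dialogue breakdown detection challenge's breakdown / possible breakdown / no breakdown scheme), an LSTM over embedded dialogue turns with a per-turn multinomial head could look like this:

```python
import torch
import torch.nn as nn

class BreakdownDetector(nn.Module):
    """Per-turn multinomial classifier over a dialogue: for each system
    utterance, predict one of 3 breakdown classes."""
    def __init__(self, emb_dim: int = 300, hidden: int = 128, n_classes: int = 3):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, turn_embs):            # (batch, turns, emb_dim)
        out, _ = self.lstm(turn_embs)
        return self.head(out)                # logits per turn

model = BreakdownDetector()
logits = model(torch.randn(2, 10, 300))      # 2 dialogues, 10 turns each
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 3), torch.randint(0, 3, (20,)))
```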


Journal ArticleDOI
TL;DR: It is proposed that the IR community start working on a road map for transitioning the IR literature to a fully, "diamond", open access model.
Abstract: Almost all of the important literature on Information Retrieval (IR) is published in subscription-based journals and digital libraries. We argue that the lack of open access publishing in IR is seriously hampering progress and inclusiveness of the field. We propose that the IR community starts working on a road map for transitioning the IR literature to a fully, "diamond", open access model.

7 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the predictive power of solely processing spatial clues for scene understanding in 2D images and compare such an approach with visual appearance, and propose a scale-, mirror-, and translation-invariant representation that captures the spatial essence of the relationship, i.e., a canonical spatial representation.
Abstract: Humans often leverage spatial clues to categorize scenes in a fraction of a second. This form of intelligence is very relevant in time-critical situations (e.g., when driving a car) and valuable to transfer to automated systems. This work investigates the predictive power of solely processing spatial clues for scene understanding in 2D images and compares such an approach with the predictive power of visual appearance. To this end, we design the laboratory task of predicting the identity of two objects (e.g., “man” and “horse”) and their relationship or predicate (e.g., “riding”) given exclusively the ground truth bounding box coordinates of both objects. We also measure the performance attainable in Human Object Interaction (HOI) detection, a real-world spatial task, which includes a setting where ground truth boxes are not available at test time. An additional goal is to identify the principles necessary to effectively represent a spatial template, that is, the visual region in which two objects involved in a relationship expressed by a predicate occur. We propose a scale-, mirror-, and translation-invariant representation that captures the spatial essence of the relationship, i.e., a canonical spatial representation. Tests in two benchmarks reveal: (1) High performance is attainable by using exclusively spatial information in all tasks. (2) In HOI detection, the canonical template outperforms the rest of spatial, visual, and several state-of-the-art baselines. (3) Simple fusion of visual and spatial features substantially improves performance. (4) Our methods fare remarkably well with a small amount of data and rare categories. Our results obtained on the Visual Genome (VG) and the Humans Interacting with Common Objects - Detection (HICO-DET) datasets indicate that great predictive power can be obtained from spatial clues alone, opening up possibilities for performing fast scene understanding at a glance.

4 citations
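
The paper's exact parameterization of the canonical template is not reproduced here, but a minimal sketch of a scale-, mirror-, and translation-invariant encoding of a box pair might anchor the frame on the subject box and mirror the pair so the object always lies on one side (all specifics below are our assumptions):

```python
import numpy as np

def canonical_pair(subj, obj):
    """Encode two boxes (x1, y1, x2, y2) in a translation-, scale- and
    mirror-invariant frame anchored on the subject box."""
    subj, obj = np.asarray(subj, float), np.asarray(obj, float)
    cx, cy = (subj[0] + subj[2]) / 2, (subj[1] + subj[3]) / 2
    scale = max(subj[2] - subj[0], subj[3] - subj[1])
    boxes = np.stack([subj, obj])
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - cx) / scale   # translate + scale x
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - cy) / scale   # translate + scale y
    obj_cx = (boxes[1, 0] + boxes[1, 2]) / 2
    if obj_cx < 0:                                       # mirror so the object
        boxes[:, [0, 2]] = -boxes[:, [2, 0]]             # sits on the right
    return boxes.flatten()                               # 8-dim feature

print(canonical_pair([10, 10, 50, 90], [60, 40, 100, 80]))
```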


Journal ArticleDOI
TL;DR: This work proposes a weakly supervised alignment model where the correspondence between the input training visual and textual fragments is not known but their corresponding units that refer to the same artwork are treated as a positive pair.
Abstract: In this paper, we target the tasks of fine-grained image–text alignment and cross-modal retrieval in the cultural heritage domain as follows: (1) given an image fragment of an artwork, we retrieve the noun phrases that describe it; (2) given a noun phrase artifact attribute, we retrieve the corresponding image fragment it specifies. To this end, we propose a weakly supervised alignment model where the correspondence between the input training visual and textual fragments is not known but their corresponding units that refer to the same artwork are treated as a positive pair. The model exploits the latent alignment between fragments across modalities using attention mechanisms by first projecting them into a shared common semantic space; the model is then trained by increasing the image–text similarity of the positive pair in the common space. During this process, we encode the inputs of our model with hierarchical encodings and remove irrelevant fragments with different indicator functions. We also study techniques to augment the limited training data with synthetic relevant textual fragments and transformed image fragments. The model is later fine-tuned on a limited set of small-scale image–text fragment pairs. We rank the test image fragments and noun phrases by their intermodal similarity in the learned common space. Extensive experiments demonstrate that our proposed models outperform two state-of-the-art methods adapted to fine-grained cross-modal retrieval of cultural items for two benchmark datasets.

2 citations
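
As a hedged sketch of the training signal described, attention-based fragment alignment in a shared space combined with a margin loss over positive (same artwork) and negative pairs could look as follows (the pooling scheme and margin value are our assumptions):

```python
import torch
import torch.nn.functional as F

def align_similarity(img_frags, txt_frags):
    """Latent alignment: each text fragment attends over image fragments;
    the image--text score is the mean of the attended similarities."""
    img = F.normalize(img_frags, dim=-1)      # (n_img, d) in common space
    txt = F.normalize(txt_frags, dim=-1)      # (n_txt, d)
    sims = txt @ img.T                        # (n_txt, n_img)
    attn = sims.softmax(dim=-1)               # soft fragment alignment
    return (attn * sims).sum(-1).mean()

def hinge_loss(pos_img, pos_txt, neg_txt, margin=0.2):
    """Push a matching artwork pair above a non-matching one by a margin."""
    s_pos = align_similarity(pos_img, pos_txt)
    s_neg = align_similarity(pos_img, neg_txt)
    return torch.clamp(margin - s_pos + s_neg, min=0)

loss = hinge_loss(torch.randn(6, 128), torch.randn(4, 128), torch.randn(5, 128))
```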


Proceedings ArticleDOI
01 Aug 2021
TL;DR: In this paper, a system combining pretrained multimodal models (CLIP) and chained classifiers was proposed to detect persuasion techniques in multimodal content (memes) for SemEval-2021 Task 6.
Abstract: We describe our approach for SemEval-2021 Task 6 on the detection of persuasion techniques in multimodal content (memes). Our system combines pretrained multimodal models (CLIP) and chained classifiers. We also propose to enrich the data with a data augmentation technique. Our submission achieves a rank of 8/16 in terms of F1-micro and 9/16 in terms of F1-macro on the test set.

2 citations
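
A minimal sketch of the described pipeline, assuming precomputed CLIP features and scikit-learn's ClassifierChain as the chained classifier (the base estimator, feature dimension, and label count are placeholders, not the authors' exact setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Stand-ins for precomputed CLIP embeddings of memes and multi-label
# persuasion-technique annotations (illustrative shapes only).
X = np.random.randn(200, 512)             # one CLIP feature vector per meme
Y = np.random.randint(0, 2, (200, 20))    # 20 persuasion-technique labels

# A chain of binary classifiers: each label's prediction is fed as an
# extra feature to the next, modelling dependencies between techniques.
chain = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0)
chain.fit(X, Y)
print(chain.predict(X[:5]).shape)         # (5, 20)
```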


Posted Content
TL;DR: This article showed that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization, which raises doubts about the use of the lottery ticket hypothesis.
Abstract: The lottery ticket hypothesis states that sparse subnetworks exist in randomly initialized dense networks that can be trained to the same accuracy as the dense network they reside in. However, subsequent work has failed to replicate this on large-scale models and has required rewinding to an early stable state instead of to initialization. We show that by using a training method that is stable with respect to linear mode connectivity, large networks can also be entirely rewound to initialization. Our subsequent experiments on common vision tasks give strong credence to the hypothesis in Evci et al. (2020b) that lottery tickets simply retrain to the same regions (although not necessarily to the same basin). These results imply that existing lottery tickets could not have been found without the preceding dense training by iterative magnitude pruning, raising doubts about the use of the lottery ticket hypothesis.

1 citation
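
For orientation, iterative magnitude pruning with rewinding to initialization, the procedure the paper revisits, can be sketched as follows (the pruning fraction, per-layer thresholding, and the `train_fn` hook are our assumptions):

```python
import copy
import torch

def imp_rewind_to_init(model, train_fn, rounds=3, prune_frac=0.2):
    """Iterative magnitude pruning: train, prune the smallest surviving
    weights per layer, rewind the remainder to their initial values, repeat.
    `train_fn(model, masks)` is assumed to train while respecting the masks."""
    init_state = copy.deepcopy(model.state_dict())      # theta_0
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train_fn(model, masks)                          # masked training
        for n, p in model.named_parameters():
            alive = p.abs()[masks[n].bool()]
            thresh = alive.quantile(prune_frac)         # drop 20% of survivors
            masks[n] = masks[n] * (p.abs() > thresh).float()
        model.load_state_dict(init_state)               # rewind to init
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(masks[n])                        # apply sparsity mask
    return model, masks
```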


Posted Content
TL;DR: This article proposed a contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure, which achieved state-of-the-art results on the relation extraction task using only a simple KNN classifier.
Abstract: Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive learning to impose relation-related structure on the token-level representations of the sentence obtained with a CharacterBERT (El Boukkouri et al.,2020) model. The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task using only a simple KNN classifier, thereby demonstrating the success of the proposed method. Additional visualization by a tSNE analysis shows the effectiveness of the learned representation space compared to baselines. Furthermore, we show that we can learn a different space for named entity recognition, again using a contrastive learning objective, and demonstrate how to successfully combine both representation spaces in an entity-relation task.

1 citation
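
The paper's objective operates on CharacterBERT token-level representations; as a simplified stand-in, a generic supervised contrastive loss over sentence embeddings with relation labels, followed by a KNN classifier, captures the overall recipe (temperature, dimensions, and label set are our assumptions):

```python
import torch
import torch.nn.functional as F
from sklearn.neighbors import KNeighborsClassifier

def relation_contrastive_loss(emb, rel_labels, temp=0.1):
    """Pull together embeddings of sentences expressing the same relation,
    push apart the rest (generic supervised contrastive objective)."""
    emb = F.normalize(emb, dim=-1)
    sims = emb @ emb.T / temp
    self_mask = torch.eye(len(emb), dtype=torch.bool)
    sims = sims.masked_fill(self_mask, -1e9)            # exclude self-pairs
    pos = (rel_labels[None, :] == rel_labels[:, None]) & ~self_mask
    return -sims.log_softmax(dim=-1)[pos].mean()

emb = torch.randn(32, 256, requires_grad=True)          # sentence embeddings
labels = torch.randint(0, 4, (32,))                     # relation types
loss = relation_contrastive_loss(emb, labels)           # minimized in training

# After training, relation extraction reduces to a simple KNN lookup:
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(emb.detach().numpy(), labels.numpy())
```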


Proceedings ArticleDOI
07 Jun 2021
TL;DR: In this article, a neural network approach that can aid the process of medical document classification is presented, where a human annotator can correct the model predictions for a new training sample which can then be used for training.
Abstract: Hospitals have to deal with continually arriving clinical data. In this paper, we present a neural network approach that can aid the process of medical document classification. In this scenario, a human annotator can correct the model's predictions for a new training sample, which can then be used for training. This data needs to be classified into ICD categories, and the newly obtained knowledge should be captured by the model with minimal loss of already acquired knowledge. More specifically, different strategies are proposed and evaluated for constructing a replay dataset in a continual learning setting. The presented methodology alternates an incremental learning phase with a full retraining on all training samples seen so far. In this manner, a balance can be found where most of the time newly obtained knowledge can immediately be added to the model, but not to the extent that it loses a vital part of previously obtained knowledge.
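
The abstract compares several replay-construction strategies without detailing them; one simple, hypothetical instantiation is a class-balanced replay buffer with random replacement that is mixed into each incremental update:

```python
import random

class ReplayBuffer:
    """Bounded, class-balanced store of past documents that gets mixed into
    every incremental update (one of several possible replay strategies)."""
    def __init__(self, capacity_per_class: int = 50):
        self.capacity = capacity_per_class
        self.store = {}                              # icd_code -> samples

    def add(self, doc: dict, icd_code: str):
        bucket = self.store.setdefault(icd_code, [])
        if len(bucket) < self.capacity:
            bucket.append(doc)
        else:                                        # replace a random old one
            bucket[random.randrange(self.capacity)] = doc

    def sample(self, k: int):
        pool = [d for b in self.store.values() for d in b]
        return random.sample(pool, min(k, len(pool)))

buffer = ReplayBuffer()
buffer.add({"text": "discharge note ...", "label": "I25.1"}, "I25.1")
new_sample = {"text": "new annotator-corrected note ...", "label": "J18.9"}
train_batch = [new_sample] + buffer.sample(16)       # new + replayed knowledge
```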

Proceedings ArticleDOI
14 Apr 2021
TL;DR: TIEVis as mentioned in this paper is a visual analytics dashboard that visualizes event-timelines extracted from clinical reports, highlighting the importance of seeing events in their context, and the ability to manually verify and update critical events in a patient history.
Abstract: Clinical reports, as unstructured texts, contain important temporal information. However, it remains a challenge for natural language processing (NLP) models to accurately combine temporal cues into a single coherent temporal ordering of described events. In this paper, we present TIEVis, a visual analytics dashboard that visualizes event-timelines extracted from clinical reports. We present the findings of a pilot study in which healthcare professionals explored and used the dashboard to complete a set of tasks. Results highlight the importance of seeing events in their context, and the ability to manually verify and update critical events in a patient history, as a basis to increase user trust.

Journal ArticleDOI
TL;DR: In this article, the authors propose a model that detects uncertain situations when a command is given, finds the visual objects causing the uncertainty, and generates a question describing those objects.

Posted Content
TL;DR: The authors proposed to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations, which is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network.
Abstract: Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.
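
As a hedged sketch of the mechanism described, an auxiliary head attached to an intermediate layer can be trained to predict the representation of the *next* sentence, adding a predictive-coding-style signal to the usual objective (the head architecture and loss weighting are our assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownSentencePredictor(nn.Module):
    """Auxiliary head on an intermediate layer that tries to predict the
    representation of the next sentence (a predictive-coding-style signal)."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(),
                                  nn.Linear(hidden, hidden))

    def forward(self, cur_sent_repr, next_sent_repr):
        pred = self.proj(cur_sent_repr)              # top-down prediction
        return 1 - F.cosine_similarity(pred, next_sent_repr, dim=-1).mean()

# intermediate-layer [CLS] vectors for sentences t and t+1 (stand-ins)
head = TopDownSentencePredictor()
aux_loss = head(torch.randn(8, 768), torch.randn(8, 768))
# total_loss = mlm_loss + lambda_pc * aux_loss   (weighting is an assumption)
```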

Posted Content
TL;DR: In this article, a multimodal learning algorithm for disinformation detection in online news articles is proposed, which is guided by the profile of users who prefer content similar to the news article that is evaluated, and this effect is reinforced if content is shared among different users.
Abstract: User-generated content (e.g., tweets and profile descriptions) and shared content between users (e.g., news articles) reflect a user's online identity. This paper investigates whether correlations between user-generated and user-shared content can be leveraged for detecting disinformation in online news articles. We develop a multimodal learning algorithm for disinformation detection. The latent representations of news articles and user-generated content allow the model to be guided during training by the profile of users who prefer content similar to the news article being evaluated; this effect is reinforced if that content is shared among different users. By only leveraging user information during model optimization, the model does not rely on user profiling when predicting an article's veracity. The algorithm is successfully applied to three widely used neural classifiers, and results are obtained on different datasets. Visualization techniques show that the proposed model learns feature representations of unseen news articles that better discriminate between fake and real news texts.
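
A minimal sketch of the training scheme described, in which user information shapes the article representation via an auxiliary loss during training but is not needed at inference (the encoders, dimensions, and guidance weight are stand-ins, not the authors' exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArticleClassifier(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.enc = nn.Linear(768, dim)      # stand-in article encoder
        self.clf = nn.Linear(dim, 2)        # fake vs. real

    def forward(self, article_feat):
        z = torch.tanh(self.enc(article_feat))
        return z, self.clf(z)

model = ArticleClassifier()
art = torch.randn(16, 768)                  # article features
usr = torch.randn(16, 256)                  # matched user-profile embeddings
labels = torch.randint(0, 2, (16,))

z, logits = model(art)
ce = F.cross_entropy(logits, labels)
guide = 1 - F.cosine_similarity(z, usr, dim=-1).mean()  # user guidance term
loss = ce + 0.5 * guide    # user signal used during training only;
                           # inference calls model(art) alone
```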

Proceedings ArticleDOI
01 Apr 2021
TL;DR: This paper proposed two soft constraints that can improve the model's ability to resolve coreference relations in dialog in an unsupervised way, achieving state-of-the-art performance on the VisDial v1.0 dataset.
Abstract: Visual dialog is a vision-language task where an agent needs to answer a series of questions grounded in an image based on the understanding of the dialog history and the image. The occurrence of coreference relations in the dialog makes it a more challenging task than visual question answering. Most previous works have focused on learning better multi-modal representations or on exploring different ways of fusing visual and language features, while the coreferences in the dialog are mainly ignored. In this paper, based on linguistic knowledge and discourse features of human dialog, we propose two soft constraints that can improve the model's ability to resolve coreferences in dialog in an unsupervised way. Experimental results on the VisDial v1.0 dataset show that our model, which integrates two novel and linguistically inspired soft constraints in a deep transformer neural architecture, obtains new state-of-the-art performance in terms of recall at 1 and other evaluation metrics compared to existing models, and this without pretraining on other vision-language datasets. Our qualitative results also demonstrate the effectiveness of the proposed method.

Proceedings Article
02 Sep 2021
TL;DR: This paper proposed a contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure, which achieved state-of-the-art results on the relation extraction task using only a simple KNN classifier.
Abstract: Though language model text embeddings have revolutionized NLP research, their ability to capture high-level semantic information, such as relations between entities in text, is limited. In this paper, we propose a novel contrastive learning framework that trains sentence embeddings to encode the relations in a graph structure. Given a sentence (unstructured text) and its graph, we use contrastive learning to impose relation-related structure on the token level representations of the sentence obtained with a CharacterBERT (El Boukkouri et al., 2020) model. The resulting relation-aware sentence embeddings achieve state-of-the-art results on the relation extraction task using only a simple KNN classifier, thereby demonstrating the success of the proposed method. Additional visualization by a tSNE analysis shows the effectiveness of the learned representation space compared to baselines. Furthermore, we show that we can learn a different space for named entity recognition, again using a contrastive learning objective, and demonstrate how to successfully combine both representation spaces in an entity-relation task.

Book ChapterDOI
28 Mar 2021
TL;DR: This paper studied the effect of simple feature transforms (e.g., standardizing) in 25 datasets with 6 tasks covering semantic similarity and text and image retrieval, and found that some feature transforms provide solid improvements, suggesting their default adoption; cosine similarity fares better than Euclidean similarity.
Abstract: Practitioners often resort to off-the-shelf feature extractors such as language models (e.g., BERT or Glove) for text or pre-trained CNNs for images. These features are often used without further supervision in tasks such as text or image retrieval and semantic similarity with cosine-based semantic match. Although cosine similarity is sensitive to centering and other feature transforms, their impact on task performance has not been systematically studied. Prior studies are limited to a single domain (e.g., bilingual embeddings) and one data modality (text). Here, we systematically study the effect of simple feature transforms (e.g., standardizing) in 25 datasets with 6 tasks covering semantic similarity and text and image retrieval. We further back up our claims in ad-hoc laboratory experiments. We include 15 (8 image + 7 text) embeddings, covering the state-of-the-art models. Our second goal is to determine whether the common practice of defaulting to the cosine similarity is empirically supported. Our findings reveal that: (i) some feature transforms provide solid improvements, suggesting their default adoption; (ii) cosine similarity fares better than Euclidean similarity, thus backing up standard practices. Ultimately, our takeaways provide actionable advice for practitioners.
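
To make the studied setup concrete, here is a minimal NumPy sketch of one of the simple transforms (standardizing) applied consistently to corpus and query embeddings before cosine-based matching:

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Center each feature and scale to unit variance (one of the simple
    transforms studied; statistics come from a reference/training set)."""
    mean = X.mean(0) if mean is None else mean
    std = X.std(0) + 1e-8 if std is None else std
    return (X - mean) / std, mean, std

def cosine_sim(a, B):
    a = a / np.linalg.norm(a)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return B @ a

emb = np.random.randn(1000, 300)           # off-the-shelf embeddings
emb_t, mu, sd = standardize(emb)
query_t = (emb[0] - mu) / sd               # transform queries consistently
scores = cosine_sim(query_t, emb_t)        # rank by cosine in the new space
print(scores.argsort()[::-1][:5])
```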

Posted Content
TL;DR: In this paper, a compositional/few-shot action recognition approach is proposed, where multi-head attention is used over spatio-temporal layouts, i.e., configurations of object bounding boxes.
Abstract: Recognizing human actions is fundamentally a spatio-temporal reasoning problem, and should be, at least to some extent, invariant to the appearance of the human and the objects involved. Motivated by this hypothesis, in this work, we take an object-centric approach to action recognition. Multiple works have studied this setting before, yet it remains unclear (i) how well a carefully crafted, spatio-temporal layout-based method can recognize human actions, and (ii) how, and when, to fuse the information from layout and appearance-based models. The main focus of this paper is compositional/few-shot action recognition, where we advocate the usage of multi-head attention (proven to be effective for spatial reasoning) over spatio-temporal layouts, i.e., configurations of object bounding boxes. We evaluate different schemes to inject video appearance information into the system, and benchmark our approach on background-cluttered action recognition. On the Something-Else and Action Genome datasets, we demonstrate (i) how to extend multi-head attention for spatio-temporal layout-based action recognition, (ii) how to improve the performance of appearance-based models by fusion with layout-based models, and (iii) that even on non-compositional, background-cluttered video datasets, a fusion between layout- and appearance-based models improves performance.
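
As a hedged sketch of the layout-based branch, each object bounding box in each frame can be projected to a token and fed to a transformer encoder (positional/temporal and object-identity embeddings are omitted for brevity; all sizes below are our assumptions, e.g. a 174-class action head):

```python
import torch
import torch.nn as nn

class LayoutActionModel(nn.Module):
    """Treats each (frame, object) bounding box as a token and lets
    multi-head self-attention reason over the spatio-temporal layout."""
    def __init__(self, d: int = 128, heads: int = 8, n_actions: int = 174):
        super().__init__()
        self.box_proj = nn.Linear(4, d)       # (x1, y1, x2, y2) -> token
        layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, n_actions)

    def forward(self, boxes):                 # (batch, frames*objects, 4)
        tokens = self.box_proj(boxes)
        enc = self.encoder(tokens)
        return self.head(enc.mean(dim=1))     # clip-level action logits

model = LayoutActionModel()
logits = model(torch.rand(2, 16 * 4, 4))      # 16 frames x 4 object boxes
```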

Book ChapterDOI
01 Jan 2021
TL;DR: The authors argue that modern machine learning approaches fail to adequately address how grammar and common sense should be learned, and advocate experiments with abstract, confined world environments where agents interact, with an emphasis on learning world models.
Abstract: In this position paper we argue that modern machine learning approaches fail to adequately address how grammar and common sense should be learned. State-of-the-art language models achieve impressive results in a range of specialized tasks but lack underlying world understanding. We advocate experiments with abstract, confined world environments where agents interact, with an emphasis on learning world models. Agents are induced to learn the grammar needed to navigate the environment, hence their grammar will be grounded in this abstracted world. We believe that this grounded grammar will therefore facilitate a more realistic, interpretable and human-like form of common sense.

Proceedings Article
01 Nov 2021
TL;DR: This article proposed to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations, which is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network.
Abstract: Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.