Author

Tom F. Wilderjans

Bio: Tom F. Wilderjans is an academic researcher at Leiden University. The author has contributed to research on topics including cluster analysis and computer science. The author has an h-index of 16 and has co-authored 47 publications receiving 680 citations. Previous affiliations of Tom F. Wilderjans include VU University Amsterdam and Katholieke Universiteit Leuven.


Papers
Journal ArticleDOI
TL;DR: The wide applicability of the CHull method is demonstrated by showing how it can be used to solve various model selection problems in the context of PCA, reduced K-means, best-subset regression, and partial least squares regression.
Abstract: When analyzing data, researchers are often confronted with a model selection problem (e.g., determining the number of components/factors in principal components analysis [PCA]/factor analysis or identifying the most important predictors in a regression analysis). To tackle such a problem, researchers may apply some objective procedure, like parallel analysis in PCA/factor analysis or stepwise selection methods in regression analysis. A drawback of these procedures is that they can only be applied to the model selection problem at hand. An interesting alternative is the CHull model selection procedure, which was originally developed for multiway analysis (e.g., multimode partitioning). However, the key idea behind the CHull procedure—identifying a model that optimally balances model goodness of fit/misfit and model complexity—is quite generic. Therefore, the procedure may also be used when applying many other analysis techniques. The aim of this article is twofold. First, we demonstrate the wide applicability of the CHull method by showing how it can be used to solve various model selection problems in the context of PCA, reduced K-means, best-subset regression, and partial least squares regression. Moreover, a comparison of CHull with standard model selection methods for these problems is performed. Second, we present the CHULL software, which may be downloaded from http://ppw.kuleuven.be/okp/software/CHULL/, to assist the user in applying the CHull procedure.
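A rough sketch of the generic CHull idea (not the authors' CHULL software, whose hull construction and tie-breaking are more refined): given a complexity value and a goodness-of-fit value per candidate model, keep the models on the upper boundary of the (complexity, fit) plot and select the one with the largest scree-test ratio. The helper name and the toy numbers below are hypothetical.

```python
import numpy as np

def chull_select(complexity, fit):
    """Toy CHull-style selection: higher `fit` is better, lower `complexity` is simpler."""
    order = np.argsort(complexity)
    c, f = np.asarray(complexity, float)[order], np.asarray(fit, float)[order]

    # Keep only models whose fit improves on every simpler retained model ...
    hull = [0]
    for i in range(1, len(c)):
        if f[i] > f[hull[-1]]:
            hull.append(i)
    # ... then drop points lying below the line through their neighbours (upper hull).
    changed = True
    while changed and len(hull) > 2:
        changed = False
        for j in range(1, len(hull) - 1):
            i0, i1, i2 = hull[j - 1], hull[j], hull[j + 1]
            interp = f[i0] + (f[i2] - f[i0]) * (c[i1] - c[i0]) / (c[i2] - c[i0])
            if f[i1] < interp:
                del hull[j]
                changed = True
                break

    # Scree-test ratio: fit gain per unit complexity before vs. after each model.
    best, best_ratio = hull[0], -np.inf
    for j in range(1, len(hull) - 1):
        i0, i1, i2 = hull[j - 1], hull[j], hull[j + 1]
        num = (f[i1] - f[i0]) / (c[i1] - c[i0])
        den = (f[i2] - f[i1]) / (c[i2] - c[i1])
        ratio = num / den if den > 0 else np.inf
        if ratio > best_ratio:
            best, best_ratio = i1, ratio
    return order[best]

# Toy example: fit levels off after three components, so index 2 is returned.
print(chull_select(complexity=[1, 2, 3, 4, 5], fit=[0.40, 0.62, 0.75, 0.78, 0.80]))
```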

107 citations

Journal ArticleDOI
TL;DR: Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses and second, interpretation of the results is highly facilitated by their sparseness.
Abstract: High-throughput data are complex, and methods that reveal the structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays, the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, the interpretation of the principal and simultaneous components is often daunting because contributions of each of the biomolecules (transcripts, proteins) have to be taken into account. We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks. Sparse simultaneous component analysis is a useful method for data integration: first, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses; second, interpretation of the results is greatly facilitated by their sparseness. The approach offered is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach). The additional file contains a MATLAB implementation of the sparse simultaneous component method.
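As a rough illustration of the idea (not the block-penalized method or the MATLAB code from the paper), the sketch below runs ordinary sparse PCA on the column-wise concatenation of two simulated data blocks and inspects how the nonzero loadings of each component are spread over the blocks. scikit-learn's SparsePCA only offers a plain lasso-type penalty, so the group and elitist lasso variants discussed in the paper are not covered; block names and sizes are hypothetical.

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.preprocessing import StandardScaler

# Two hypothetical data blocks measured on the same 50 samples
# (e.g., two metabolomics platforms); column counts differ per block.
rng = np.random.default_rng(0)
block1 = rng.normal(size=(50, 30))
block2 = rng.normal(size=(50, 80))

# Simultaneous component analysis starts from the column-wise concatenation
# of the (standardized) blocks, so all blocks share the same component scores.
X = np.hstack([StandardScaler().fit_transform(b) for b in (block1, block2)])

# Lasso-penalized loadings: plain sparse PCA on the concatenated matrix.
model = SparsePCA(n_components=2, alpha=1.0, random_state=0)
scores = model.fit_transform(X)     # shared component scores (50 x 2)
loadings = model.components_        # sparse loadings (2 x 110)

# Inspect sparseness per block to see which block a component is tied to.
n1 = block1.shape[1]
for k, w in enumerate(loadings):
    print(f"component {k}: nonzero loadings "
          f"block1={np.count_nonzero(w[:n1])}, block2={np.count_nonzero(w[n1:])}")
```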

54 citations

Journal ArticleDOI
TL;DR: The findings support the assumptions regarding the heterogeneity of obesity and the association between temperament subtypes and psychopathology.
Abstract: Objective This study aimed to investigate temperament subtypes in obese patients. Methods Ninety-three bariatric surgery candidates and 63 obese inpatients from a psychotherapy unit answered the Behavioral Inhibition System/Behavioral Activation System Scale (BIS/BAS), the Effortful Control subscale of the Adult Temperament Questionnaire-Short Form (ATQ-EC), and questionnaires for eating disorder, depressive and attention deficit hyperactivity disorder (ADHD) symptoms, and completed neurocognitive testing for executive functions. Binge eating disorder and impulse control disorders were diagnosed using interviews. Results A latent profile analysis using BIS/BAS and ATQ-EC scores revealed a ‘resilient/high functioning’ cluster (n = 88) showing high ATQ-EC and low BIS/BAS scores and an ‘emotionally dysregulated/undercontrolled’ cluster (n = 68) with low ATQ-EC and high BIS/BAS scores. Patients from the ‘emotionally dysregulated/undercontrolled’ cluster showed more eating disorder, depressive and ADHD symptoms, and poorer performance in the labyrinth task. Conclusion The findings support the assumptions regarding the heterogeneity of obesity and the association between temperament subtypes and psychopathology.
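For readers unfamiliar with latent profile analysis: it essentially amounts to fitting a Gaussian mixture to continuous indicator variables. The sketch below uses simulated scores standing in for the BIS/BAS and ATQ-EC data (all numbers hypothetical) and selects the number of profiles by BIC.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Hypothetical questionnaire scores for 156 patients:
# columns = BIS, BAS, ATQ-EC (the study used the actual subscale scores).
rng = np.random.default_rng(1)
scores = np.vstack([
    rng.normal([2.0, 2.5, 4.5], 0.5, size=(88, 3)),  # resilient/high functioning
    rng.normal([3.5, 3.8, 2.5], 0.5, size=(68, 3)),  # dysregulated/undercontrolled
])
X = StandardScaler().fit_transform(scores)

# Latent profile analysis ~ Gaussian mixture on continuous indicators;
# pick the number of profiles with the lowest BIC.
fits = {k: GaussianMixture(n_components=k, n_init=10, random_state=0).fit(X)
        for k in range(1, 5)}
best_k = min(fits, key=lambda k: fits[k].bic(X))
labels = fits[best_k].predict(X)
print(f"profiles selected: {best_k}, profile sizes: {np.bincount(labels)}")
```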

47 citations

Journal ArticleDOI
TL;DR: The CHull (Ceulemans & Kiers, 2006) method, which also balances model fit and complexity, is presented as an interesting alternative model selection strategy for MFA.
Abstract: Mixture analysis is commonly used for clustering objects on the basis of multivariate data. When the data contain a large number of variables, regular mixture analysis may become problematic, because a large number of parameters need to be estimated for each cluster. To tackle this problem, the mixtures-of-factor-analyzers (MFA) model was proposed, which combines clustering with exploratory factor analysis. MFA model selection is rather intricate, as both the number of clusters and the number of underlying factors have to be determined. To this end, the Akaike (AIC) and Bayesian (BIC) information criteria are often used. AIC and BIC try to identify a model that optimally balances model fit and model complexity. In this article, the CHull (Ceulemans & Kiers, 2006) method, which also balances model fit and complexity, is presented as an interesting alternative model selection strategy for MFA. In an extensive simulation study, the performances of AIC, BIC, and CHull were compared. AIC performs poorly and systematically selects overly complex models, whereas BIC performs slightly better than CHull when considering the best model only. However, when taking model selection uncertainty into account by looking at the first three models retained, CHull outperforms BIC. This especially holds in more complex, and thus more realistic, situations (e.g., more clusters, factors, noise in the data, and overlap among clusters).
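A minimal sketch of the AIC/BIC selection step, using a plain Gaussian mixture as a stand-in because scikit-learn has no mixtures-of-factor-analyzers implementation; the fit/complexity pairs printed at the end are the kind of input a CHull-style procedure would work on. Data and parameter choices are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical data with 3 clusters in 6 variables; a plain Gaussian mixture
# stands in for the candidate MFA models being compared.
X, _ = make_blobs(n_samples=400, centers=3, n_features=6, random_state=0)
d = X.shape[1]

models = {k: GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
          for k in range(1, 7)}

aic_best = min(models, key=lambda k: models[k].aic(X))
bic_best = min(models, key=lambda k: models[k].bic(X))
print(f"AIC selects k={aic_best}, BIC selects k={bic_best}")

# A CHull-style alternative balances fit against complexity: total
# log-likelihood versus the number of free parameters (means + full
# covariances + mixing weights).
fit = {k: m.score(X) * len(X) for k, m in models.items()}
complexity = {k: k * d + k * d * (d + 1) // 2 + (k - 1) for k in models}
for k in models:
    print(f"k={k}: loglik={fit[k]:.1f}, free parameters={complexity[k]}")
```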

46 citations

Journal ArticleDOI
TL;DR: The main benefits of the DISCO-SCA GUI are that it is easy to use, strongly facilitates the choice of model selection parameters (such as the number of mechanisms and their status as being common or distinctive), and is freely available.
Abstract: Behavioral researchers often obtain information about the same set of entities from different sources. A main challenge in the analysis of such data is to reveal, on the one hand, the mechanisms underlying all of the data blocks under study and, on the other hand, the mechanisms underlying a single data block or a few such blocks only (i.e., common and distinctive mechanisms, respectively). A method called DISCO-SCA has been proposed by which such mechanisms can be found. The goal of this article is to make the DISCO-SCA method more accessible, in particular for applied researchers. To this end, first we will illustrate the different steps in a DISCO-SCA analysis, with data stemming from the domain of psychiatric diagnosis. Second, we will present in this article the DISCO-SCA graphical user interface (GUI). The main benefits of the DISCO-SCA GUI are that it is easy to use, strongly facilitates the choice of model selection parameters (such as the number of mechanisms and their status as being common or distinctive), and is freely available.
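A rough sketch of the common-versus-distinctive idea only (not the DISCO-SCA algorithm or its GUI, which rotate the simultaneous components toward a target structure): components are extracted from the concatenated blocks and heuristically labeled by how their loading variance is split over the blocks. The simulated data and the 25%/75% threshold are hypothetical.

```python
import numpy as np

# Two hypothetical data blocks on the same 40 entities (e.g., two symptom
# questionnaires); block 1 carries only the common structure, block 2 also
# carries a distinctive source.
rng = np.random.default_rng(2)
common = rng.normal(size=(40, 1))
distinct = rng.normal(size=(40, 1))
block1 = common @ rng.normal(size=(1, 12)) + 0.3 * rng.normal(size=(40, 12))
block2 = (common @ rng.normal(size=(1, 20))
          + distinct @ rng.normal(size=(1, 20))
          + 0.3 * rng.normal(size=(40, 20)))

# Simultaneous components via SVD of the column-centered concatenation.
X = np.hstack([block1 - block1.mean(0), block2 - block2.mean(0)])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
n1 = block1.shape[1]

# Heuristic labeling: how is each component's loading variance split over blocks?
for k in range(2):
    v = Vt[k]
    share1 = np.sum(v[:n1] ** 2) / np.sum(v ** 2)
    status = "common" if 0.25 < share1 < 0.75 else "distinctive"
    print(f"component {k}: block-1 loading share = {share1:.2f} -> {status}")
```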

44 citations


Cited by
Journal Article
TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality.
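The concentration effect is easy to reproduce numerically; below is a minimal sketch under the simplest condition (i.i.d. uniform dimensions), showing the nearest-to-farthest distance ratio approaching 1 as the dimensionality grows.

```python
import numpy as np

# For i.i.d. uniform data, the nearest and farthest neighbors of a query point
# become nearly equidistant as the number of dimensions grows.
rng = np.random.default_rng(0)
for d in [2, 10, 15, 100, 1000]:
    data = rng.random((10_000, d))
    query = rng.random(d)
    dist = np.linalg.norm(data - query, axis=1)
    print(f"d={d:5d}  nearest/farthest distance ratio = {dist.min() / dist.max():.3f}")
```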

1,992 citations

Journal ArticleDOI
20 Aug 2015
TL;DR: In this paper, “diversity” is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the data sets.
Abstract: In various disciplines, information about the same phenomenon can be acquired from different types of detectors, at different conditions, in multiple experiments or subjects, among others. We use the term “modality” for each such acquisition framework. Due to the rich characteristics of natural phenomena, it is rare that a single modality provides complete knowledge of the phenomenon of interest. The increasing availability of several modalities reporting on the same system introduces new degrees of freedom, which raise questions beyond those related to exploiting each modality separately. As we argue, many of these questions, or “challenges,” are common to multiple domains. This paper deals with two key issues: “why we need data fusion” and “how we perform it.” The first issue is motivated by numerous examples in science and technology, followed by a mathematical framework that showcases some of the benefits that data fusion provides. In order to address the second issue, “diversity” is introduced as a key concept, and a number of data-driven solutions based on matrix and tensor decompositions are discussed, emphasizing how they account for diversity across the data sets. The aim of this paper is to provide the reader, regardless of his or her community of origin, with a taste of the vastness of the field, the prospects, and the opportunities that it holds.
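As a minimal illustration of the shared-factor idea behind such data fusion (not any specific method from the paper), the sketch below jointly factorizes two simulated modalities observed on the same samples with a common sample-factor matrix, using alternating least squares. All sizes and names are hypothetical.

```python
import numpy as np

# Toy coupled matrix factorization: modalities X1 (n x p1) and X2 (n x p2)
# observed on the same n samples are modeled as X1 ~ A @ B1.T and X2 ~ A @ B2.T
# with a shared sample-factor matrix A, fitted by alternating least squares.
rng = np.random.default_rng(0)
n, p1, p2, r = 100, 20, 30, 3
A_true = rng.normal(size=(n, r))
X1 = A_true @ rng.normal(size=(r, p1)) + 0.1 * rng.normal(size=(n, p1))
X2 = A_true @ rng.normal(size=(r, p2)) + 0.1 * rng.normal(size=(n, p2))

A = rng.normal(size=(n, r))
for _ in range(100):
    # Update modality-specific loadings for fixed shared factor A ...
    B1 = np.linalg.lstsq(A, X1, rcond=None)[0].T
    B2 = np.linalg.lstsq(A, X2, rcond=None)[0].T
    # ... then update A against both modalities at once.
    A = np.linalg.lstsq(np.vstack([B1, B2]), np.hstack([X1, X2]).T, rcond=None)[0].T

err = np.linalg.norm(X1 - A @ B1.T) ** 2 + np.linalg.norm(X2 - A @ B2.T) ** 2
total = np.linalg.norm(X1) ** 2 + np.linalg.norm(X2) ** 2
print(f"relative residual: {err / total:.4f}")
```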

673 citations

Journal ArticleDOI
TL;DR: It is shown how an effort account of pupil dilation can explain these findings, and future directions to further corroborate this account are discussed in the context of recent theories on cognitive control and effort and their potential neurobiological substrates.
Abstract: Pupillometry research has experienced an enormous revival in the last two decades. Here we briefly review the surge of recent studies on task-evoked pupil dilation in the context of cognitive control tasks, with the primary aim of evaluating the feasibility of using pupil dilation as an index of effort exertion, rather than task demand or difficulty. Our review shows that across the three cognitive control domains of updating, switching, and inhibition, increases in task demands typically lead to increases in pupil dilation. Studies show a diverging pattern with respect to the relationship between pupil dilation and performance, and we show how an effort account of pupil dilation can provide an explanation of these findings. We also discuss future directions to further corroborate this account in the context of recent theories on cognitive control and effort and their potential neurobiological substrates.

371 citations

Journal ArticleDOI
TL;DR: This survey presents some of the most widely used tensor decompositions, providing the key insights behind them, and summarizing them from a practitioner’s point of view.
Abstract: Tensors and tensor decompositions are very powerful and versatile tools that can model a wide variety of heterogeneous, multiaspect data. As a result, tensor decompositions, which extract useful latent information out of multiaspect data tensors, have witnessed increasing popularity and adoption by the data mining community. In this survey, we present some of the most widely used tensor decompositions, providing the key insights behind them, and summarizing them from a practitioner’s point of view. We then provide an overview of a very broad spectrum of applications where tensors have been instrumental in achieving state-of-the-art performance, ranging from social network analysis to brain data analysis, and from web mining to healthcare. Subsequently, we present recent algorithmic advances in scaling tensor decompositions up to today’s big data, outlining the existing systems and summarizing the key ideas behind them. Finally, we conclude with a list of challenges and open problems that outline exciting future research directions.
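A minimal CP (PARAFAC) example using the TensorLy library, assuming a recent version in which parafac and cp_to_tensor are available; the tensor sizes and rank are hypothetical.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# Build a noisy rank-3 tensor from known factors and recover a rank-3 CP model.
rng = np.random.default_rng(0)
factors_true = [rng.normal(size=(dim, 3)) for dim in (30, 40, 20)]
tensor = tl.cp_to_tensor((np.ones(3), factors_true))
tensor += 0.01 * rng.normal(size=tensor.shape)

weights, factors = parafac(tl.tensor(tensor), rank=3, n_iter_max=200)
reconstruction = tl.cp_to_tensor((weights, factors))
rel_error = tl.norm(tensor - reconstruction) / tl.norm(tensor)
print(f"relative reconstruction error: {rel_error:.4f}")
```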

347 citations