Author

S. Van Dongen

Bio: S. Van Dongen is an academic researcher from the European Bioinformatics Institute. The author has contributed to research in the topics Comparative genomics and Protein structure database. The author has an h-index of 2 and has co-authored 2 publications receiving 3159 citations.

Papers

Journal Article

An efficient algorithm for large-scale detection of protein families

Anton J. Enright¹, S. Van Dongen, Christos A. Ouzounis•Institutions (1)

European Bioinformatics Institute¹

01 Apr 2002-Nucleic Acids Research

TL;DR: This work presents a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families based on precomputed sequence similarity information that has been rigorously tested and validated on a number of very large databases.

Abstract: Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.
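The abstract above says that TRIBE-MCL assigns proteins to families by running the Markov cluster (MCL) algorithm on precomputed similarity scores. As a rough illustration only, the following is a minimal pure-Python sketch of the generic MCL loop (expansion, inflation, column normalisation). It is not the authors' implementation: the function names, the inflation value of 2.0, the added self-loops, the thresholds, and the toy similarity matrix are all assumptions made for the example.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two steps of the random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise entries to the power r, then renormalise columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster nodes of a weighted graph given as a square similarity matrix.

    Returns a sorted list of clusters, each a sorted list of node indices.
    Clusters may overlap in rare cases; this sketch reports them as found.
    """
    n = len(similarity)
    # Assumption: add a self-loop of weight 1 to every node before normalising.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Rows that keep non-negligible mass act as attractors; the columns they
    # cover form one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical toy data: two triangles of similar "proteins" joined by
    # one weak similarity between nodes 2 and 3.
    toy = [
        [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
    ]
    print(mcl(toy))
```

In this sketch a higher inflation value would be expected to give finer clusters; the dense list-of-lists representation is chosen for readability and would not scale to the database sizes mentioned in the abstract.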

3,468 citations

Journal Article

Lack of correlation between predicted and actual off-target effects of short-interfering RNAs targeting the human papillomavirus type 16 E7 oncogene.

Jennifer E Hanning¹, Harpreet K Saini², Matthew J. Murray¹, S. Van Dongen², Matthew P. Davis², Emily Barker¹, Dawn Ward¹, Cinzia G. Scarpini¹, Anton J. Enright², Mark R. Pett¹, Nicholas Coleman¹•Institutions (2)

University of Cambridge¹, European Bioinformatics Institute²

05 Feb 2013-British Journal of Cancer

TL;DR: The off-target effects (OTEs) of potential therapeutic siRNAs targeting the human papillomavirus type-16 E7 oncogene are investigated, finding no correlation between the number of computationally predicted OTEs and the actual number of seed-dependent OTEs.

Abstract: Lack of correlation between predicted and actual off-target effects of short-interfering RNAs targeting the human papillomavirus type 16 E7 oncogene

22 citations

Cited by

Journal Article

OrthoMCL: identification of ortholog groups for eukaryotic genomes.

Li Li¹, Christian J. Stoeckert, David S. Roos•Institutions (1)

University of Pennsylvania¹

01 Sep 2003-Genome Research

TL;DR: OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs.

Abstract: The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.

5,321 citations

Journal Article

Maps of random walks on complex networks reveal community structure

Martin Rosvall¹, Carl T. Bergstrom•Institutions (1)

University of Washington¹

29 Jan 2008-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: An information theoretic approach is introduced that reveals community structure in weighted and directed networks of large-scale biological and social systems and reveals a directional pattern of citation from the applied fields to the basic sciences.

Abstract: To comprehend the multipartite organization of large-scale biological and social systems, we introduce an information theoretic approach that reveals community structure in weighted and directed networks. We use the probability flow of random walks on a network as a proxy for information flows in the real system and decompose the network into modules by compressing a description of the probability flow. The result is a map that both simplifies and highlights the regularities in the structure and their relationships. We illustrate the method by making a map of scientific communication as captured in the citation patterns of >6,000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network—including physics, chemistry, molecular biology, and medicine—information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.
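The abstract above uses the probability flow of random walks on a weighted, directed network as its starting point. As a hedged illustration of that first ingredient only (not of the module-finding or compression step), the sketch below estimates how often a random walker visits each node by power iteration. The function name, the teleportation probability of 0.15, the uniform handling of nodes without outgoing links, and the toy edge list are assumptions for the example, not details taken from the paper.

```python
def visit_frequencies(edges, n, teleport=0.15, iters=200):
    """Estimate random-walk visit frequencies on a weighted directed graph.

    edges: list of (source, target, weight) tuples with node ids in 0..n-1.
    teleport: assumed probability of jumping to a uniformly random node.
    Returns a list of n frequencies that sums to 1.
    """
    out_weight = [0.0] * n
    for src, _dst, w in edges:
        out_weight[src] += w

    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # Mass sitting on nodes with no outgoing links is spread uniformly.
        dangling = sum(p[i] for i in range(n) if out_weight[i] == 0.0)
        base = teleport / n + (1.0 - teleport) * dangling / n
        nxt = [base] * n
        # Remaining mass follows outgoing links in proportion to their weight.
        for src, dst, w in edges:
            nxt[dst] += (1.0 - teleport) * p[src] * w / out_weight[src]
        p = nxt
    return p


if __name__ == "__main__":
    # Hypothetical toy network: nodes 1 and 2 both link to node 0,
    # and node 0 links back to node 1.
    toy_edges = [(1, 0, 1.0), (2, 0, 1.0), (0, 1, 1.0)]
    print(visit_frequencies(toy_edges, 3))
```

A method of the kind described in the abstract would then use such flow estimates to compare candidate partitions of the network; that part is not sketched here.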

4,051 citations

Journal Article

Network Medicine: A Network-Based Approach to Human Disease

Albert-László Barabási¹, Natali Gulbahce²,³,⁴, Joseph Loscalzo⁵•Institutions (5)

Dana Corporation¹, University of California, San Francisco², Harvard University³, Northeastern University⁴, Brigham and Women's Hospital⁵

01 Jan 2011-Nature Reviews Genetics

TL;DR: Advances in this direction are essential for identifying new disease genes, for uncovering the biological significance of disease-associated mutations identified by genome-wide association studies and full-genome sequencing, and for identifying drug targets and biomarkers for complex diseases.

Abstract: Given the functional interdependencies between the molecular components in a human cell, a disease is rarely a consequence of an abnormality in a single gene, but reflects the perturbations of the complex intracellular and intercellular network that links tissue and organ systems. The emerging tools of network medicine offer a platform to explore systematically not only the molecular complexity of a particular disease, leading to the identification of disease modules and pathways, but also the molecular relationships among apparently distinct (patho)phenotypes. Advances in this direction are essential for identifying new disease genes, for uncovering the biological significance of disease-associated mutations identified by genome-wide association studies and full-genome sequencing, and for identifying drug targets and biomarkers for complex diseases.

3,978 citations

Journal Article

MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity

Yupeng Wang¹, Haibao Tang¹, Jeremy D. DeBarry¹, Xu-fei Tan¹, Jingping Li¹, Xiyin Wang¹, Tae-Ho Lee¹, Huizhe Jin¹, Barry S. Marler¹, Hui Guo¹, Jessica C. Kissinger¹, Andrew H. Paterson¹•Institutions (1)

Plant Genome Mapping Laboratory¹

01 Apr 2012-Nucleic Acids Research

TL;DR: The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses.

Abstract: MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/.

3,388 citations

Journal Article

The consensus molecular subtypes of colorectal cancer

Justin Guinney¹, Rodrigo Dienstmann¹,², Xingwu Wang³,⁴, Aurélien de Reyniès, Andreas Schlicker⁵, Charlotte Soneson⁶, Laetitia Marisa, Paul Roepman, Gift Nyamundanda, Paolo Angelino⁶, Brian M. Bot¹, Jeffrey S. Morris⁷, Iris Simon, Sarah Gerster⁶, Evelyn Fessler³, Felipe De Sousa E Melo³, Edoardo Missiaglia⁶, Hena R. Ramay⁶, David Barras⁶, Krisztian Homicsko⁸, Dipen M. Maru⁷, Ganiraju C. Manyam⁷, Bradley M. Broom⁷, Valérie Boige⁹, Beatriz Perez-Villamil¹⁰, Ted Laderas¹, Ramon Salazar, Joe W. Gray¹¹, Douglas Hanahan⁸, Josep Tabernero², René Bernards⁵, Stephen H. Friend¹, Pierre Laurent-Puig¹², Jan Paul Medema³, Anguraj Sadanandam, Lodewyk F. A. Wessels⁵, Mauro Delorenzi⁶,¹³, Scott Kopetz⁷, Louis Vermeulen³, Sabine Tejpar¹⁴•Institutions (14)

Sage Bionetworks¹, Autonomous University of Barcelona², University of Amsterdam³, City University of Hong Kong⁴, Netherlands Cancer Institute⁵, Swiss Institute of Bioinformatics⁶, University of Texas MD Anderson Cancer Center⁷, École Polytechnique Fédérale de Lausanne⁸, Institut Gustave Roussy⁹, Hospital Clínico San Carlos¹⁰, Oregon Health & Science University¹¹, Paris Descartes University¹², University of Lausanne¹³, Katholieke Universiteit Leuven¹⁴

01 Nov 2015-Nature Medicine

TL;DR: An international consortium dedicated to large-scale data sharing and analytics across expert groups is formed, showing marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features.

Abstract: Colorectal cancer (CRC) is a frequently lethal disease with heterogeneous outcomes and drug responses. To resolve inconsistencies among the reported gene expression-based CRC classifications and facilitate clinical translation, we formed an international consortium dedicated to large-scale data sharing and analytics across expert groups. We show marked interconnectivity between six independent classification systems coalescing into four consensus molecular subtypes (CMSs) with distinguishing features: CMS1 (microsatellite instability immune, 14%), hypermutated, microsatellite unstable and strong immune activation; CMS2 (canonical, 37%), epithelial, marked WNT and MYC signaling activation; CMS3 (metabolic, 13%), epithelial and evident metabolic dysregulation; and CMS4 (mesenchymal, 23%), prominent transforming growth factor-β activation, stromal invasion and angiogenesis. Samples with mixed features (13%) possibly represent a transition phenotype or intratumoral heterogeneity. We consider the CMS groups the most robust classification system currently available for CRC, with clear biological interpretability, and the basis for future clinical stratification and subtype-based targeted interventions.

3,351 citations
