Author
Aalt D. J. van Dijk
Other affiliations: University of Florence, Utrecht University
Bio: Aalt D. J. van Dijk is an academic researcher from Wageningen University and Research Centre. The author has contributed to research in topics: Arabidopsis & Protein function prediction. The author has an h-index of 34 and has co-authored 88 publications receiving 5042 citations. Previous affiliations of Aalt D. J. van Dijk include University of Florence & Utrecht University.
Papers
Indiana University, Buck Institute for Research on Aging, University of California, San Francisco, University of California, Santa Cruz, Colorado State University, University of Colorado Denver, Icahn School of Medicine at Mount Sinai, University of California, Berkeley, European Bioinformatics Institute, University of Bologna, University of Missouri, University of Bristol, University of Helsinki, University College London, Centre for Development of Advanced Computing, Purdue University, Baylor College of Medicine, Royal Holloway, University of London, Technische Universität München, University of Turku, Queen's University, University UCINF, Max Planck Society, Imperial College London, Wageningen University and Research Centre, Nestlé, Fudan University, University of Padua, Temple University, University of Geneva, Swiss Institute of Bioinformatics, Hebrew University of Jerusalem, Miami University
TL;DR: Today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; although the top methods perform well enough to guide experiments, currently available tools still need considerable improvement.
Abstract: Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
859 citations
TL;DR: HADDOCK2.0 incorporates considerable improvements and new features, including two additional ab initio docking modes based on either random patch definition or center-of-mass restraints.
Abstract: Here we present version 2.0 of HADDOCK, which incorporates considerable improvements and new features. HADDOCK is now able to model not only protein-protein complexes but also other kinds of biomolecular complexes and multi-component (N > 2) systems. In the absence of any experimental and/or predicted information to drive the docking, HADDOCK now offers two additional ab initio docking modes based on either random patch definition or center-of-mass restraints. The docking protocol has been considerably improved, supporting, among other things, solvated docking, automatic definition of semi-flexible regions, and inclusion of a desolvation energy term in the scoring scheme. The performance of HADDOCK2.0 is evaluated on the targets of rounds 4-11, run in a semi-automated mode using the original information we used in our CAPRI submissions. This enables a direct assessment of the progress made since the previous versions. Although HADDOCK performed very well in CAPRI (65% and 71% success rates, overall and for unbound targets only, respectively), a substantial improvement was achieved with HADDOCK2.0.
542 citations
TL;DR: The second critical assessment of functional annotation (CAFA2), a timed challenge to assess computational methods that automatically assign protein function, found that the top-performing methods outperformed those from CAFA1, while the interpretation of results and the usefulness of individual methods remain context-dependent.
Abstract: BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
330 citations
TL;DR: It is reported that two members of CYP711 enzymes can catalyze two distinct steps in SL biosynthesis, identifying the first enzymes involved in B-C ring closure and a subsequent structural diversification step of SLs.
Abstract: Strigolactones (SLs) are a class of phytohormones and rhizosphere signaling compounds with high structural diversity. Three enzymes, carotenoid isomerase DWARF27 and carotenoid cleavage dioxygenases CCD7 and CCD8, were previously shown to convert all-trans-β-carotene to carlactone (CL), the SL precursor. However, how CL is metabolized to SLs has remained elusive. Here, by reconstituting the SL biosynthetic pathway in Nicotiana benthamiana, we show that a rice homolog of Arabidopsis More Axillary Growth 1 (MAX1), encodes a cytochrome P450 CYP711 subfamily member that acts as a CL oxidase to stereoselectively convert CL into ent-2'-epi-5-deoxystrigol (B-C lactone ring formation), the presumed precursor of rice SLs. A protein encoded by a second rice MAX1 homolog then catalyzes the conversion of ent-2'-epi-5-deoxystrigol to orobanchol. We therefore report that two members of CYP711 enzymes can catalyze two distinct steps in SL biosynthesis, identifying the first enzymes involved in B-C ring closure and a subsequent structural diversification step of SLs.
289 citations
TL;DR: Significant indications are provided that higher-order complex formation is a general and essential molecular mechanism for plant MADS box protein functioning and attribute a pivotal role to the SEP3 'glue' protein in mediating multimerization.
Abstract: Plant MADS box proteins play important roles in a plethora of developmental processes. In order to regulate specific sets of target genes, MADS box proteins dimerize and are thought to assemble into multimeric complexes. In this study a large-scale yeast three-hybrid screen is utilized to provide insight into the higher-order complex formation capacity of the Arabidopsis MADS box family. SEPALLATA3 (SEP3) has been shown to mediate complex formation and, therefore, special attention is paid to this factor in this study. In total, 106 multimeric complexes were identified; in more than half of these at least one SEP protein was present. Besides the known complexes involved in determining floral organ identity, various complexes consisting of combinations of proteins known to play a role in floral organ identity specification, and flowering time determination were discovered. The capacity to form this latter type of complex suggests that homeotic factors play essential roles in down-regulation of the MADS box genes involved in floral timing in the flower via negative auto-regulatory loops. Furthermore, various novel complexes were identified that may be important for the direct regulation of the floral transition process. A subsequent detailed analysis of the APETALA3, PISTILLATA, and SEP3 proteins in living plant cells suggests the formation of a multimeric complex in vivo. Overall, these results provide strong indications that higher-order complex formation is a general and essential molecular mechanism for plant MADS box protein functioning and attribute a pivotal role to the SEP3 'glue' protein in mediating multimerization.
261 citations
Cited by
13 Aug 2016
TL;DR: Node2vec learns a mapping of nodes to a low-dimensional feature space that maximizes the likelihood of preserving network neighborhoods of nodes, using a biased random walk procedure to explore diverse neighborhoods efficiently.
Abstract: Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
7,072 citations
TL;DR: The new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence.
Abstract: Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, the new NCBI's Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.
3,902 citations
01 Jan 2011
TL;DR: The sheer volume and scope of diverse genome-wide data pose a significant challenge to the development of efficient and intuitive visualization tools that can scale to very large data sets and flexibly integrate multiple data types, including clinical data.
Abstract: Rapid improvements in sequencing and array-based platforms are resulting in a flood of diverse genome-wide data, including data from exome and whole-genome sequencing, epigenetic surveys, expression profiling of coding and noncoding RNAs, single nucleotide polymorphism (SNP) and copy number profiling, and functional assays. Analysis of these large, diverse data sets holds the promise of a more comprehensive understanding of the genome and its relation to human disease. Experienced and knowledgeable human review is an essential component of this process, complementing computational approaches. This calls for efficient and intuitive visualization tools able to scale to very large data sets and to flexibly integrate multiple data types, including clinical data. However, the sheer volume and scope of data pose a significant challenge to the development of such tools.
2,187 citations
TL;DR: In node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks, a flexible notion of a node's network neighborhood is defined and a biased random walk procedure is designed, which efficiently explores diverse neighborhoods.
Abstract: Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
2,174 citations
TL;DR: Important new components of jasmonate signalling, including its receptor, were identified, providing deeper insight into the role of jasmonate signalling pathways in stress responses and development.
1,868 citations