Showing papers by "Albert-László Barabási" published in 2018


Journal ArticleDOI
02 Mar 2018-Science
TL;DR: The Science of Science (SciSci) as discussed by the authors provides a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales, providing insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science.
Abstract: BACKGROUND The increasing availability of digital data on scholarly inputs and outputs (from research funding, productivity, and collaboration to paper citations and scientist mobility) offers unprecedented opportunities to explore the structure and evolution of science. The science of science (SciSci) offers a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales: It provides insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science. In the past decade, SciSci has benefited from an influx of natural, computational, and social scientists who together have developed big data–based capabilities for empirical analysis and generative modeling that capture the unfolding of science, its institutions, and its workforce. The value proposition of SciSci is that with a deeper understanding of the factors that drive successful science, we can more effectively address environmental, societal, and technological problems.
ADVANCES Science can be described as a complex, self-organizing, and evolving network of scholars, projects, papers, and ideas. This representation has unveiled patterns characterizing the emergence of new scientific fields through the study of collaboration networks and the path of impactful discoveries through the study of citation networks. Microscopic models have traced the dynamics of citation accumulation, allowing us to predict the future impact of individual papers. SciSci has revealed choices and trade-offs that scientists face as they advance both their own careers and the scientific horizon. For example, measurements indicate that scholars are risk-averse, preferring to study topics related to their current expertise, which constrains the potential of future discoveries. Those willing to break this pattern engage in riskier careers but become more likely to make major breakthroughs. Overall, the highest-impact science is grounded in conventional combinations of prior work but features unusual combinations. Last, as the locus of research is shifting into teams, SciSci is increasingly focused on the impact of team research, finding that small teams tend to disrupt science and technology with new ideas drawing on older and less prevalent ones. In contrast, large teams tend to develop recent, popular ideas, obtaining high, but often short-lived, impact.
OUTLOOK SciSci offers a deep quantitative understanding of the relational structure between scientists, institutions, and ideas because it facilitates the identification of fundamental mechanisms responsible for scientific discovery. These interdisciplinary data-driven efforts complement contributions from related fields such as scientometrics and the economics and sociology of science. Although SciSci seeks long-standing universal laws and mechanisms that apply across various fields of science, a fundamental challenge going forward is accounting for undeniable differences in culture, habits, and preferences between different fields and countries. This variation makes some cross-domain insights difficult to appreciate and associated science policies difficult to implement. The differences among the questions, data, and skills specific to each discipline suggest that further insights can be gained from domain-specific SciSci studies, which model and identify opportunities adapted to the needs of individual research fields.

630 citations


Journal ArticleDOI
TL;DR: It is demonstrated that a unique integration of protein-protein interaction network proximity and large-scale patient-level longitudinal data complemented by mechanistic in vitro studies can facilitate drug repurposing.
Abstract: Here we identify hundreds of new drug-disease associations for over 900 FDA-approved drugs by quantifying the network proximity of disease genes and drug targets in the human (protein–protein) interactome. We select four network-predicted associations to test their causal relationship using large healthcare databases with over 220 million patients and state-of-the-art pharmacoepidemiologic analyses. Using propensity score matching, two of four network-based predictions are validated in patient-level data: carbamazepine is associated with an increased risk of coronary artery disease (CAD) [hazard ratio (HR) 1.56, 95% confidence interval (CI) 1.12–2.18], and hydroxychloroquine is associated with a decreased risk of CAD (HR 0.76, 95% CI 0.59–0.97). In vitro experiments show that hydroxychloroquine attenuates pro-inflammatory cytokine-mediated activation in human aortic endothelial cells, supporting mechanistically its potential beneficial effect in CAD. In summary, we demonstrate that a unique integration of protein-protein interaction network proximity and large-scale patient-level longitudinal data complemented by mechanistic in vitro studies can facilitate drug repurposing.
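The abstract does not spell out how network proximity is computed. As a minimal sketch, assuming a "closest distance" definition (the average, over drug targets, of the shortest-path distance to the nearest disease gene), the idea can be illustrated as follows. The toy interactome, the gene names, and the function names are hypothetical, and the comparison against randomized gene sets that a real analysis would likely need is omitted.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closest_proximity(graph, drug_targets, disease_genes):
    """Average, over drug targets, of the distance to the nearest disease gene.

    Assumed definition for illustration; targets that cannot reach any
    disease gene are skipped.
    """
    total = 0
    counted = 0
    for target in drug_targets:
        dist = bfs_distances(graph, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if reachable:
            total += min(reachable)
            counted += 1
    return total / counted if counted else float("inf")


# Hypothetical toy interactome: undirected chain A - B - C - D - E
toy_interactome = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
proximity = closest_proximity(toy_interactome, ["A", "C"], ["D", "E"])
print(proximity)
```

A smaller value means the drug's targets sit closer to the disease genes in the network; on its own the raw number says little until it is compared with a suitable random baseline.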

352 citations


Journal ArticleDOI
TL;DR: Using biochemical networks with experimentally measured kinetic parameters, it is shown that a knowledge of the network topology offers 65–80% accuracy in predicting the impact of perturbation patterns, which could open new avenues in modeling drug action and in identifying drug targets relying on the human interactome only.
Abstract: High-throughput technologies, offering an unprecedented wealth of quantitative data underlying the makeup of living systems, are changing biology. Notably, the systematic mapping of the relationships between biochemical entities has fueled the rapid development of network biology, offering a suitable framework to describe disease phenotypes and predict potential drug targets. However, our ability to develop accurate dynamical models remains limited, due in part to the limited knowledge of the kinetic parameters underlying these interactions. Here, we explore the degree to which we can make reasonably accurate predictions in the absence of the kinetic parameters. We find that simple dynamically agnostic models are sufficient to recover the strength and sign of the biochemical perturbation patterns observed in 87 biological models for which the underlying kinetics are known. Surprisingly, a simple distance-based model achieves 65% accuracy. We show that this predictive power is robust to topological and kinetic parameter perturbations, and we identify key network properties that can increase the recovery rate of the true perturbation patterns to up to 80%. We validate our approach using experimental data on the chemotactic pathway in bacteria, finding that a network model of perturbation spreading predicts with ∼80% accuracy the directionality of gene expression and phenotype changes in knock-out and overproduction experiments. These findings show that the steady advances in mapping out the topology of biochemical interaction networks open avenues for accurate perturbation spread modeling, with direct implications for medicine and drug development.
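To make the notion of a kinetics-free, distance-based prediction concrete, here is an illustrative sketch. It is not the paper's model: it simply assumes that a perturbation spreads along directed, signed edges, that the predicted sign is the product of edge signs along a shortest path, and that nodes farther away respond more weakly. The toy network and all names are hypothetical.

```python
from collections import deque


def predict_perturbation(edges, source):
    """Topology-only guess of how perturbing `source` affects other nodes.

    edges: dict mapping node -> list of (target, sign), sign in {+1, -1}.
    Returns dict node -> (distance, predicted_sign) for reachable nodes.
    The sign follows the first shortest path found by breadth-first search.
    """
    result = {source: (0, 1)}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        distance, sign_so_far = result[node]
        for target, sign in edges.get(node, ()):
            if target not in result:
                result[target] = (distance + 1, sign_so_far * sign)
                queue.append(target)
    return result


# Hypothetical signed pathway: X activates Y, Y inhibits Z, Z inhibits W
toy_pathway = {
    "X": [("Y", +1)],
    "Y": [("Z", -1)],
    "Z": [("W", -1)],
}
prediction = predict_perturbation(toy_pathway, "X")
for node, (distance, sign) in sorted(prediction.items()):
    print(node, distance, sign)
```

In this toy case, increasing X is predicted to raise Y, lower Z, and raise W, with the expected strength falling off as the distance from X grows.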

175 citations


Journal ArticleDOI
16 Nov 2018-Science
TL;DR: An extensive record of exhibition and auction data is used to study and model the career trajectory of individual artists relative to a network of galleries and museums, finding a lock-in effect among highly reputed artists who started their career in high-prestige institutions and a long struggle for access to elite institutions among those who start their career at the network periphery.
Abstract: In areas of human activity where performance is difficult to quantify in an objective fashion, reputation and networks of influence play a key role in determining access to resources and rewards. To understand the role of these factors, we reconstructed the exhibition history of half a million artists, mapping out the coexhibition network that captures the movement of art between institutions. Centrality within this network captured institutional prestige, allowing us to explore the career trajectory of individual artists in terms of access to coveted institutions. Early access to prestigious central institutions offered life-long access to high-prestige venues and reduced dropout rate. By contrast, starting at the network periphery resulted in a high dropout rate, limiting access to central institutions. A Markov model predicts the career trajectory of individual artists and documents the strong path and history dependence of valuation in art.
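The Markov model mentioned above is not specified in the abstract. The following is a minimal sketch of the general technique only, assuming four hypothetical states (three prestige tiers plus an absorbing "dropout" state) and made-up transition probabilities; none of the numbers come from the paper.

```python
STATES = ["low", "mid", "high", "dropout"]

# Hypothetical transition probabilities between prestige tiers; each row sums to 1.
TRANSITIONS = {
    "low": {"low": 0.6, "mid": 0.1, "high": 0.0, "dropout": 0.3},
    "mid": {"low": 0.2, "mid": 0.5, "high": 0.1, "dropout": 0.2},
    "high": {"low": 0.0, "mid": 0.1, "high": 0.85, "dropout": 0.05},
    "dropout": {"low": 0.0, "mid": 0.0, "high": 0.0, "dropout": 1.0},
}


def step(distribution):
    """Advance a probability distribution over states by one exhibition."""
    nxt = {state: 0.0 for state in STATES}
    for src, prob in distribution.items():
        for dst, trans_prob in TRANSITIONS[src].items():
            nxt[dst] += prob * trans_prob
    return nxt


def evolve(start_state, n_steps):
    """Distribution over states after n_steps, starting surely in start_state."""
    distribution = {state: 0.0 for state in STATES}
    distribution[start_state] = 1.0
    for _ in range(n_steps):
        distribution = step(distribution)
    return distribution


from_periphery = evolve("low", 3)
from_center = evolve("high", 3)
print(from_periphery["dropout"], from_center["dropout"])
```

With these assumed probabilities, an artist starting in the low tier is far more likely to have dropped out after three steps than one starting in the high tier, which mirrors the qualitative pattern the abstract describes.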

121 citations


Posted ContentDOI
02 Mar 2018-bioRxiv
TL;DR: A fundamental organizing principle of biological networks is unveiled, L3, that predicts yet uncovered protein interactions based on paths of length three (L3) and is expected to have a broad applicability, enabling to better understand the emergence of biological function under both healthy and pathological conditions.
Abstract: As biological function emerges through interactions between a cell's molecular constituents, understanding cellular mechanisms requires us to catalogue all physical interactions between proteins. Despite spectacular advances in high-throughput mapping, the number of missing human protein-protein interactions (PPIs) continues to exceed the experimentally documented interactions. Computational tools that exploit structural, sequence or network topology information are increasingly used to fill in the gap, using the patterns of the already known interactome to predict undetected, yet biologically relevant interactions. Such network-based link prediction tools rely on the Triadic Closure Principle (TCP), stating that two proteins likely interact if they share multiple interaction partners. TCP is rooted in social network analysis, namely the observation that the more common friends two individuals have, the more likely that they know each other. Here, we offer direct empirical evidence across multiple datasets and organisms that, despite its dominant use in biological link prediction, TCP is not valid for most protein pairs. We show that this failure is fundamental: TCP violates both structural constraints and evolutionary processes. This understanding allows us to propose a link prediction principle, consistent with both structural and evolutionary arguments, that predicts yet uncovered protein interactions based on paths of length three (L3). A systematic computational cross-validation shows that the L3 principle significantly outperforms existing link prediction methods. To experimentally test the L3 predictions, we perform both large-scale high-throughput and pairwise tests, finding that the predicted links test positively at the same rate as previously known interactions, suggesting that most (if not all) predicted interactions are real. Combining L3 predictions with experimental tests provided new interaction partners of FAM161A, a protein linked to retinitis pigmentosa, offering novel insights into the molecular mechanisms that lead to the disease. Because L3 is rooted in a fundamental biological principle, we expect it to have a broad applicability, enabling us to better understand the emergence of biological function under both healthy and pathological conditions.
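The abstract only states that candidate interactions are ranked using paths of length three. As a rough sketch, the code below scores each unconnected pair (X, Y) by summing over all paths X-U-V-Y, with each path down-weighted by the degrees of the intermediate nodes as 1/sqrt(k_U * k_V). That normalization is an assumption made here for illustration and is not stated in the abstract; the toy network and function name are hypothetical.

```python
from math import sqrt


def l3_scores(adjacency):
    """Score unconnected node pairs by degree-normalized paths of length three.

    adjacency: dict node -> set of neighbours (undirected, no self-loops).
    Returns dict (x, y) -> score with x < y, for pairs joined by at least one
    path x - u - v - y. The 1/sqrt(k_u * k_v) weight is an assumed choice.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    scores = {}
    for x in adjacency:
        for u in adjacency[x]:
            for v in adjacency[u]:
                for y in adjacency[v]:
                    # Keep each unordered pair once; skip existing links.
                    # These checks also rule out u == y and v == x.
                    if not x < y or y in adjacency[x]:
                        continue
                    weight = 1.0 / sqrt(degree[u] * degree[v])
                    scores[(x, y)] = scores.get((x, y), 0.0) + weight
    return scores


# Hypothetical toy network: a simple chain a - b - c - d
toy_network = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}
scores = l3_scores(toy_network)
print(scores)
```

Note that the pairs (a, c) and (b, d), which share a neighbour and would be favoured by triadic closure, receive no score here; only (a, d), connected by a path of length three, is proposed.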

118 citations


Journal ArticleDOI
TL;DR: Light is shed on the role played by experience in publishing within specific scientific journals, on the paths toward acquiring the necessary experience and expertise, and on the skills required to publish in prestigious venues, by quantifying the chaperone effect.
Abstract: Experience plays a critical role in crafting high-impact scientific work. This is particularly evident in top multidisciplinary journals, where a scientist is unlikely to appear as senior author if he or she has not previously published within the same journal. Here, we develop a quantitative understanding of author order by quantifying this “chaperone effect,” capturing how scientists transition into senior status within a particular publication venue. We illustrate that the chaperone effect has a different magnitude for journals in different branches of science, being more pronounced in medical and biological sciences and weaker in natural sciences. Finally, we show that in the case of high-impact venues, the chaperone effect has significant implications, specifically resulting in a higher average impact relative to papers authored by new principal investigators (PIs). Our findings shed light on the role played by experience in publishing within specific scientific journals, on the paths toward acquiring the necessary experience and expertise, and on the skills required to publish in prestigious venues.

82 citations


Journal ArticleDOI
28 Nov 2018-Nature
TL;DR: A modelling framework is presented to determine the optimal layout and physical properties of networks in which the nodes and links have physical sizes and intersections between components are prohibited, and shows that networks with large numbers of nodes will ultimately exist in the strongly interacting regime.
Abstract: In many physical networks, including neurons in the brain, three-dimensional integrated circuits and underground hyphal networks, the nodes and links are physical objects that cannot intersect or overlap with each other. To take this into account, non-crossing conditions can be imposed to constrain the geometry of networks, which consequently affects how they form, evolve and function. However, these constraints are not included in the theoretical frameworks that are currently used to characterize real networks. Most tools for laying out networks are variants of the force-directed layout algorithm, which assumes dimensionless nodes and links, and are therefore unable to reveal the geometry of densely packed physical networks. Here we develop a modelling framework that accounts for the physical sizes of nodes and links, allowing us to explore how non-crossing conditions affect the geometry of a network. For small link thicknesses, we observe a weakly interacting regime in which link crossings are avoided via local link rearrangements, without altering the overall geometry of the layout compared to the force-directed layout. Once the link thickness exceeds a threshold, a strongly interacting regime emerges in which multiple geometric quantities, such as the total link length and the link curvature, scale with the link thickness. We show that the crossover between the two regimes is driven by the non-crossing condition, which allows us to derive the transition point analytically and show that networks with large numbers of nodes will ultimately exist in the strongly interacting regime. We also find that networks in the weakly interacting regime display a solid-like response to stress, whereas in the strongly interacting regime they behave in a gel-like fashion. Networks in the weakly interacting regime are amenable to 3D printing and so can be used to visualize network geometry, and the strongly interacting regime provides insights into the scaling of the sizes of densely packed mammalian brains.

46 citations


Journal ArticleDOI
TL;DR: There is a universal pattern to book sales and a statistical model is introduced to explain the time evolution of sales that reproduces the entire sales trajectory of a book but also predicts the total number of copies it will sell in its lifetime, based on its early sales numbers.
Abstract: Reading remains the preferred leisure activity for most individuals, continuing to offer a unique path to knowledge and learning. As such, books remain an important cultural product, consumed widely. Yet, while over 3 million books are published each year, very few are read widely and fewer than 500 make it to the New York Times bestseller lists. And once there, only a handful of authors can command the lists for more than a few weeks. Here we bring a big data approach to book success by investigating the properties and sales trajectories of bestsellers. We find that there are seasonal patterns to book sales, with more books being sold during holidays, and even among bestsellers, fiction books sell more copies than nonfiction books. General fiction and biographies make the list more often than books of any other genre, and the higher a book's initial place in the rankings, the longer the book stays on the list. Looking at patterns characterizing authors, we find that fiction writers are more productive than nonfiction writers, commonly achieving bestseller status with multiple books. Additionally, there is no gender disparity among bestselling fiction authors, but in nonfiction, most bestsellers are written by male authors. Finally, we find that there is a universal pattern to book sales. Using this universality, we introduce a statistical model to explain the time evolution of sales. This model not only reproduces the entire sales trajectory of a book but also predicts the total number of copies it will sell in its lifetime, based on its early sales numbers. The analysis of the bestseller characteristics and the discovery of the universal nature of sales patterns with its driving forces are crucial for our understanding of the book industry, and more generally, of how we as a society interact with cultural products.

37 citations


Journal ArticleDOI
03 Jul 2018
TL;DR: A tissue-specific gene regulatory network derived from human pancreatic islets is constructed and the genes that control the network are determined, using the concept of “control centrality.” Pathways with high control centrality were significantly more associated with Type 2 Diabetes (T2D), and harbored loci that significantly affected gene expression related to glucose levels.
Abstract: Probing the dynamic control features of biological networks represents a new frontier in capturing the dysregulated pathways in complex diseases. Here, using patient samples obtained from a pancreatic islet transplantation program, we constructed a tissue-specific gene regulatory network and used the control centrality (Cc) concept to identify the high control centrality (HiCc) pathways, which might serve as key pathobiological pathways for Type 2 Diabetes (T2D). We found that HiCc pathway genes were significantly enriched with modest GWAS p-values in the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) study. We identified variants regulating gene expression (expression quantitative trait loci, eQTL) of HiCc pathway genes in islet samples. These eQTL genes showed higher levels of differential expression compared to non-eQTL genes in low, medium, and high glucose concentrations in rat islets. Among genes with highly significant eQTL evidence, NFATC4 belonged to four HiCc pathways. We asked if the expression of T2D-associated candidate genes from GWAS and literature is regulated by Nfatc4 in rat islets. Extensive in vitro silencing of Nfatc4 in rat islet cells resulted in reduced expression of 16, and increased expression of four, putative downstream T2D genes. Overall, our approach uncovers the mechanistic connection of NFATC4 with downstream targets including a previously unknown one, TCF7L2, and establishes the HiCc pathways' relationship to T2D.

25 citations


Journal ArticleDOI
TL;DR: Yan et al. as mentioned in this paper provided a thorough review of the network control framework as applied to Caenorhabditis elegans, and further presented the Python code to enable exploration of control principles in a manner specific to this prototypical organism.
Abstract: Control is essential to the functioning of any neural system. Indeed, under healthy conditions the brain must be able to continuously maintain a tight functional control between the system's inputs and outputs. One may therefore hypothesize that the brain's wiring is predetermined by the need to maintain control across multiple scales, maintaining the stability of key internal variables, and producing behaviour in response to environmental cues. Recent advances in network control have offered a powerful mathematical framework to explore the structure-function relationship in complex biological, social and technological networks, and are beginning to yield important and precise insights on neuronal systems. The network control paradigm promises a predictive, quantitative framework to unite the distinct datasets necessary to fully describe a nervous system, and provide mechanistic explanations for the observed structure and function relationships. Here, we provide a thorough review of the network control framework as applied to Caenorhabditis elegans (Yan et al. 2017 Nature 550, 519-523. (doi:10.1038/nature24056)), in the style of Frequently Asked Questions. We present the theoretical, computational and experimental aspects of network control, and discuss its current capabilities and limitations, together with the next likely advances and improvements. We further present the Python code to enable exploration of control principles in a manner specific to this prototypical organism. This article is part of a discussion meeting issue 'Connectome to behaviour: modelling C. elegans at cellular resolution'.
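The Python code referred to in the abstract is not reproduced on this page. As a generic illustration of one standard calculation in structural network control (not the article's own code), the sketch below estimates the minimum number of driver nodes of a directed network as max(N - |M|, 1), where |M| is the size of a maximum matching. The matching routine is a plain augmenting-path search, and the toy networks and function name are hypothetical.

```python
def count_driver_nodes(nodes, edges):
    """Minimum number of driver nodes under structural controllability.

    nodes: list of node labels; edges: list of directed (source, target).
    A target is "matched" when one incoming edge is selected such that no two
    selected edges share a source or a target. Unmatched nodes need an input.
    """
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)

    matched_source = {}  # target -> source currently matched to it

    def try_match(source, visited):
        # Augmenting-path search for bipartite maximum matching.
        for target in successors[source]:
            if target in visited:
                continue
            visited.add(target)
            if target not in matched_source or try_match(matched_source[target], visited):
                matched_source[target] = source
                return True
        return False

    matching_size = sum(1 for source in nodes if try_match(source, set()))
    return max(len(nodes) - matching_size, 1)


# Hypothetical toy networks
chain_drivers = count_driver_nodes([1, 2, 3], [(1, 2), (2, 3)])
star_drivers = count_driver_nodes([1, 2, 3], [(1, 2), (1, 3)])
print(chain_drivers, star_drivers)
```

A directed chain can be steered from its first node alone, whereas a hub pointing to two leaves needs two inputs, because a single signal through the hub cannot move both leaves independently.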

24 citations


Journal ArticleDOI
TL;DR: The online footprint of the 50 US state governments reflects their close embeddedness with state economies and suggests that other factors widely hypothesized to influence government play more limited roles, including location and income.
Abstract: Governments in modern societies undertake an array of complex functions that shape politics and economics, individual and group behavior, and the natural, social, and built environment. How are governments structured to execute these diverse responsibilities? How do those structures vary, and what explains the differences? To examine these longstanding questions, we develop a technique for mapping the Internet "footprint" of government with network science methods. We use this approach to describe and analyze the diversity in functional scale and structure among the 50 US state governments reflected in the webpages and links they have created online: 32.5 million webpages and 110 million hyperlinks among 47,631 agencies. We first verify that this extensive online footprint systematically reflects known characteristics: 50 hierarchically organized networks of state agencies that scale with population and are specialized around easily identifiable functions in accordance with legal mandates. We also find that the footprint reflects extensive diversity among these state functional hierarchies. We hypothesize that this variation should reflect, among other factors, state income, economic structure, ideology, and location. We find that government structures are most strongly associated with state economic structures, with location and income playing more limited roles. Voters' recent ideological preferences about the proper roles and extent of government are not significantly associated with the scale and structure of their state governments as reflected online. We conclude that the online footprint of governments offers a broad and comprehensive window on how they are structured that can help deepen understanding of those structures.

Posted Content
TL;DR: A thorough review of the network control framework as applied to Caenorhabditis elegans is provided, and the Python code is presented to enable exploration of control principles in a manner specific to this prototypical organism.
Abstract: Control is essential to the functioning of any neural system. Indeed, under healthy conditions the brain must be able to continuously maintain a tight functional control between the system's inputs and outputs. One may therefore hypothesise that the brain's wiring is predetermined by the need to maintain control across multiple scales, maintaining the stability of key internal variables, and producing behaviour in response to environmental cues. Recent advances in network control have offered a powerful mathematical framework to explore the structure-function relationship in complex biological, social, and technological networks, and are beginning to yield important and precise insights for neuronal systems. The network control paradigm promises a predictive, quantitative framework to unite the distinct datasets necessary to fully describe a nervous system, and provide mechanistic explanations for the observed structure and function relationships. Here, we provide a thorough review of the network control framework as applied to C. elegans, in the style of a FAQ. We present the theoretical, computational, and experimental aspects of network control, and discuss its current capabilities and limitations, together with the next likely advances and improvements. We further present the Python code to enable exploration of control principles in a manner specific to this prototypical organism.

Posted ContentDOI
17 Jun 2018-bioRxiv
TL;DR: These findings show that the steady advances in mapping out the topology of biochemical interaction networks open avenues for accurate perturbation spread modeling, with direct implications for medicine and drug development.
Abstract: High-throughput technologies, offering an unprecedented wealth of quantitative data underlying the makeup of living systems, are changing biology. Notably, the systematic mapping of the relationships between biochemical entities has fueled the rapid development of network biology, offering a suitable framework to describe disease phenotypes and predict potential drug targets. Yet, our ability to develop accurate dynamical models remains limited, due in part to the limited knowledge of the kinetic parameters underlying these interactions. Here, we explore the degree to which we can make reasonably accurate predictions in the absence of the kinetic parameters. We find that simple dynamically agnostic models are sufficient to recover the strength and sign of the biochemical perturbation patterns observed in 87 biological models for which the underlying kinetics are known. Surprisingly, a simple distance-based model achieves 65% accuracy. We show that this predictive power is robust to topological and kinetic parameter perturbations, and we identify key network properties that can increase the recovery rate of the true perturbation patterns to up to 80%. We validate our approach using experimental data on the chemotactic pathway in bacteria, finding that a network model of perturbation spreading predicts with ~80% accuracy the directionality of gene expression and phenotype changes in knock-out and overproduction experiments. These findings show that the steady advances in mapping out the topology of biochemical interaction networks open avenues for accurate perturbation spread modeling, with direct implications for medicine and drug development.

Posted Content
TL;DR: This work derives several key network characteristics of spatial networks, from the analytical form of the degree distribution to path lengths and local clustering, and predicts the existence of two distinct phases, each governed by a different dynamical equation, with distinct testable predictions.
Abstract: Most social, technological and biological networks are embedded in a finite dimensional space, and the distance between two nodes influences the likelihood that they link to each other. Indeed, in social systems, the chance that two individuals know each other drops rapidly with the distance between them; in the cell, proteins predominantly interact with proteins in the same cellular compartment; in the brain, neurons mainly link to nearby neurons. Most modeling frameworks that aim to capture the empirically observed degree distributions tend to ignore these spatial constraints. In contrast, models that account for the role of the physical distance often predict bounded degree distributions, in disagreement with the empirical data. Here we address a long-standing gap in the spatial network literature by deriving several key network characteristics of spatial networks, from the analytical form of the degree distribution to path lengths and local clustering. The mathematically exact results predict the existence of two distinct phases, each governed by a different dynamical equation, with distinct testable predictions. We use empirical data to offer direct evidence for the practical relevance of each of these phases in real networks, helping better characterize the properties of spatial networks.

Patent
26 Dec 2018
TL;DR: In this article, a pre-processor can determine representations of comparative intrinsic characteristics of the products based on the representations of characteristics of products and representations of corresponding comparative extrinsic characteristics.
Abstract: Systems and methods are disclosed for predicting a product's (e.g., a book's) performance prior to its availability. An example embodiment is a system for machine learning classification that includes representations of characteristics of products, a pre-processor, and a machine learning classifier. The pre-processor can determine (i) representations of comparative intrinsic characteristics of the products based on the representations of characteristics of products and (ii) representations of corresponding comparative extrinsic characteristics of the products. The pre-processor can generate a data structure representing relationships between the comparative intrinsic characteristics and the comparative extrinsic characteristics. The machine learning classifier is trained with the data structure. The classifier can return representations of comparative extrinsic characteristics in response to given comparative intrinsic characteristics. A disambiguator can rank a plurality of intervals between the extrinsic characteristics for a plurality of other products and determine an extrinsic characteristic for the given product based on the ranking.