Showing papers in "Journal of Complex Networks" in 2021


Journal ArticleDOI
TL;DR: In this article, the authors present network embedding algorithms that capture information about a node from the local distribution over node attributes around it, as observed over random walks, following an approach similar to Skip-gram.
Abstract: We present network embedding algorithms that capture information about a node from the local distribution over node attributes around it, as observed over random walks following an approach similar to Skip-gram. Observations from neighborhoods of different sizes are either pooled (AE) or encoded distinctly in a multi-scale approach (MUSAE). Capturing attribute-neighborhood relationships over multiple scales is useful for a diverse range of applications, including latent feature identification across disconnected networks with similar attributes. We prove theoretically that matrices of node-feature pointwise mutual information are implicitly factorized by the embeddings. Experiments show that our algorithms are robust, computationally efficient and outperform comparable models on social networks and web graphs.

103 citations
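
The node-feature pointwise mutual information (PMI) mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that co-occurrence counts of (node, feature) pairs have already been collected, for example over random walks. The function name `pmi_matrix` and the toy counts are hypothetical and are not taken from the paper, which factorizes such matrices implicitly rather than computing them this way.

```python
from math import log

def pmi_matrix(counts):
    """Return PMI(node, feature) = log(p(n, f) / (p(n) * p(f))) for nonzero counts.

    `counts` maps (node, feature) pairs to co-occurrence counts
    (a hypothetical input format chosen for this sketch).
    """
    total = sum(counts.values())
    node_tot, feat_tot = {}, {}
    for (node, feat), c in counts.items():
        node_tot[node] = node_tot.get(node, 0) + c
        feat_tot[feat] = feat_tot.get(feat, 0) + c
    # p(n, f) / (p(n) p(f)) simplifies to c * total / (row_total * column_total).
    return {
        (node, feat): log(c * total / (node_tot[node] * feat_tot[feat]))
        for (node, feat), c in counts.items()
        if c > 0
    }

# Toy counts: node "a" co-occurs mostly with feature "x", node "b" with "y".
toy_counts = {("a", "x"): 3, ("a", "y"): 1, ("b", "x"): 1, ("b", "y"): 3}
pmi = pmi_matrix(toy_counts)
```

Positive entries mark node-feature pairs that co-occur more often than independence would predict; negative entries mark pairs that co-occur less often.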


Journal ArticleDOI
TL;DR: In this paper, the authors studied the impact of human mobility networks on the COVID-19 onset in 203 different countries and found that air flights were the dominant mode of transportation while male and returning travellers were the main carriers.
Abstract: Human mobility networks are crucial for better understanding and controlling the spread of epidemics. Here, we study the impact of human mobility networks on the COVID-19 onset in 203 different countries. We use exponential random graph models to perform an analysis of the country-to-country global spread of COVID-19. We find that most countries had similar levels of virus spreading, with only a few acting as the main global transmitters. Our evidence suggests that migration and tourism inflows increase the probability of COVID-19 case importations while controlling for contiguity, continent co-location and sharing a language. Moreover, we find that air flights were the dominant mode of transportation while male and returning travellers were the main carriers. In conclusion, a mix of mobility and geography factors predicts the COVID-19 global transmission from one country to another. These findings have implications for non-pharmaceutical public health interventions and the management of transborder human circulation.

32 citations


Journal ArticleDOI
TL;DR: A class of assortativity coefficients is proposed that capture the assortative characteristics and structure of weighted and directed networks more precisely and reveals interesting insights that would not be obtained by using existing ones.
Abstract: Assortativity measures the tendency of a vertex in a network to be connected to other vertices with respect to some vertex-specific features. Classical assortativity coefficients are defined for unweighted and undirected networks with respect to vertex degree. We propose a class of assortativity coefficients that capture the assortative characteristics and structure of weighted and directed networks more precisely. The vertex-to-vertex strength correlation is used as an example, but the proposed measure can be applied to any pair of vertex-specific features. The effectiveness of the proposed measure is assessed through extensive simulations based on prevalent random network models in comparison with existing assortativity measures. In an application to World Input-Output Networks, the new measures reveal interesting insights that would not be obtained by using existing ones. An implementation is publicly available in an R package "wdnet".

16 citations
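
The vertex-to-vertex strength correlation mentioned in the abstract above can be sketched as follows. This is a minimal illustration that assumes the coefficient is read as an edge-weighted Pearson correlation between the out-strength of each edge's source and the in-strength of its target. The paper's exact definition may differ, and the function name `strength_assortativity` is hypothetical (it is not the API of the "wdnet" package).

```python
from math import sqrt

def strength_assortativity(edges):
    """Edge-weighted Pearson correlation of source out-strength and target in-strength.

    `edges` is a list of (source, target, weight) triples with positive weights.
    Returns NaN when either strength has zero variance across edges.
    """
    out_s, in_s = {}, {}
    total = 0.0
    for u, v, w in edges:
        out_s[u] = out_s.get(u, 0.0) + w
        in_s[v] = in_s.get(v, 0.0) + w
        total += w
    # Weighted means, covariance and variances, each edge weighted by w / total.
    mean_x = sum(w * out_s[u] for u, v, w in edges) / total
    mean_y = sum(w * in_s[v] for u, v, w in edges) / total
    cov = sum(w * (out_s[u] - mean_x) * (in_s[v] - mean_y) for u, v, w in edges) / total
    var_x = sum(w * (out_s[u] - mean_x) ** 2 for u, v, w in edges) / total
    var_y = sum(w * (in_s[v] - mean_y) ** 2 for u, v, w in edges) / total
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")
    return cov / sqrt(var_x * var_y)

# Toy network: strong vertices trade with strong ones, weak with weak.
toy_edges = [("a", "b", 3.0), ("b", "a", 3.0), ("c", "d", 1.0), ("d", "c", 1.0)]
r = strength_assortativity(toy_edges)
```

In the toy network the strong pair and the weak pair only connect among themselves, so the coefficient takes its maximum value, indicating perfectly assortative mixing by strength.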


Journal ArticleDOI
TL;DR: In this paper, the authors study the effect of symmetries of a hypergraph on the spectrum of the Laplacian in the case where each vertex-hyperedge incidence has a real coefficient.
Abstract: Chemical hypergraphs and their associated normalized Laplace operators are generalized and studied in the case where each vertex-hyperedge incidence has a real coefficient. We systematically study the effect of symmetries of a hypergraph on the spectrum of the Laplacian.

15 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the complexity of multivariate interactions by inferring a connection transitivity that includes all possible measures of path length for weighted graphs and demonstrate that the distance backbone is very small in large networks across domains ranging from air traffic to the human brain connectome, revealing that network robustness to attacks and failures seems to stem from surprisingly vast amounts of redundancy.
Abstract: Redundancy needs more precise characterization as it is a major factor in the evolution and robustness of networks of multivariate interactions. We investigate the complexity of such interactions by inferring a connection transitivity that includes all possible measures of path length for weighted graphs. The result, without breaking the graph into smaller components, is a distance backbone subgraph sufficient to compute all shortest paths. This is important for understanding the dynamics of spread and communication phenomena in real-world networks. The general methodology we formally derive yields a principled graph reduction technique and provides a finer characterization of the triangular geometry of all edges: those that contribute to shortest paths and those that do not but are involved in other network phenomena. We demonstrate that the distance backbone is very small in large networks across domains ranging from air traffic to the human brain connectome, revealing that network robustness to attacks and failures seems to stem from surprisingly vast amounts of redundancy.

12 citations
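
The idea of a backbone subgraph that is sufficient to compute all shortest paths can be illustrated with a minimal sketch. It assumes the simplest case, where path length is the sum of edge distances; the abstract above covers more general path-length measures. An edge is kept only if no indirect path is shorter than it. The function name `metric_backbone` and the toy graph are hypothetical.

```python
def metric_backbone(nodes, weights):
    """Return the edges whose direct distance equals the shortest-path distance.

    `weights` maps (u, v) pairs to positive distances of an undirected graph.
    Shortest paths are computed with the Floyd-Warshall algorithm.
    """
    inf = float("inf")
    dist = {(u, v): (0.0 if u == v else inf) for u in nodes for v in nodes}
    for (u, v), w in weights.items():
        dist[u, v] = dist[v, u] = min(dist[u, v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    # An edge is redundant for shortest paths if some indirect path beats it.
    return {edge for edge, w in weights.items() if w <= dist[edge]}

# Toy triangle: the direct a-c edge (3.0) is longer than the path a-b-c (2.0).
toy_nodes = ["a", "b", "c"]
toy_weights = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 3.0}
backbone = metric_backbone(toy_nodes, toy_weights)
```

Removing the a-c edge leaves every shortest-path distance unchanged, which is the sense in which the discarded edges are redundant. Floyd-Warshall runs in cubic time, so this sketch only suits small graphs.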


Journal ArticleDOI
TL;DR: It is found that characteristics such as homophily, fitness and geographic distance are significant preferential attachment rules for modeling real networks.

11 citations


Journal ArticleDOI
TL;DR: This paper describes what animal social networks are and the main research themes in which they are studied, and gives an overview of the methods commonly used to study them, to facilitate further interdisciplinary collaborations and further integration of these networks into the field of complex systems.
Abstract: Many animals live in societies where individuals frequently interact socially with each other. The social structures of these systems can be studied in depth by means of network analysis. A large number of studies on animal social networks in many species have in recent years been carried out in the biological research field of animal behaviour and have provided new insights into behaviour, ecology, and social evolution. This line of research is currently not so well connected to the field of complex systems as could be expected. The purpose of this paper is to provide an introduction to animal social networks for complex systems scientists and highlight areas of synergy. We believe that an increased integration of animal social networks with the interdisciplinary field of complex systems and networks would be beneficial for various reasons. Increased collaboration between researchers in this field and biologists studying animal social systems could be valuable in solving challenges that are of importance to animal social network research. Furthermore, animal social networks provide the opportunity to investigate hypotheses about complex systems across a range of natural real-world social systems. In this paper, we describe what animal social networks are and main research themes where they are studied; we give an overview of the methods commonly used to study animal social networks; we highlight challenges in the study of animal social networks where complex systems expertise may be particularly valuable; and we consider aspects of animal social networks that may be of particular interest to complex systems researchers. We hope that this will help to facilitate further interdisciplinary collaborations involving animal social networks, and further integration of these networks into the field of complex systems.

11 citations



Journal ArticleDOI
TL;DR: The mathematical theory of the friendship paradox is developed, both in general as well as for specific model networks, focusing not only on average behavior but also on variation about the average and using generating function methods to calculate full distributions of quantities of interest.
Abstract: The friendship paradox is the observation that the degrees of the neighbours of a node in any network will, on average, be greater than the degree of the node itself. In common parlance, your friends have more friends than you do. In this article, we develop the mathematical theory of the friendship paradox, both in general as well as for specific model networks, focusing not only on average behaviour but also on variation about the average and using generating function methods to calculate full distributions of quantities of interest. We compare the predictions of our theory with measurements on a large number of real-world network datasets and find remarkably good agreement. We also develop equivalent theory for the generalized friendship paradox, which compares characteristics of nodes other than degree to those of their neighbours.

10 citations
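
The friendship paradox described in the abstract above can be checked numerically on a small graph. The following is a minimal sketch, not the paper's generating-function method: it compares the mean degree with the mean, over nodes, of the average degree of each node's neighbours. The function name `friendship_paradox` and the star graph are hypothetical choices for illustration.

```python
def friendship_paradox(adj):
    """Return (mean degree, mean over nodes of the average neighbour degree).

    `adj` maps each node to a list of its neighbours (undirected graph).
    Nodes without neighbours are skipped in the second average.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    mean_degree = sum(degree.values()) / len(adj)
    per_node = [
        sum(degree[u] for u in nbrs) / len(nbrs)
        for nbrs in adj.values()
        if nbrs
    ]
    return mean_degree, sum(per_node) / len(per_node)

# Star graph: one hub connected to four leaves.
star = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}
mean_k, mean_nbr_k = friendship_paradox(star)
```

On the star graph the mean degree is 8/5 = 1.6, while the mean neighbour degree is (1 + 4 + 4 + 4 + 4)/5 = 3.4, because every leaf's only neighbour is the hub.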


Journal ArticleDOI
TL;DR: In this article, the authors describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error is unknown.
Abstract: Most empirical studies of complex networks do not return direct, error-free measurements of network structure. Instead, they typically rely on indirect measurements that are often error-prone and unreliable. A fundamental problem in empirical network science is how to make the best possible estimates of network structure given such unreliable data. In this paper we describe a fully Bayesian method for reconstructing networks from observational data in any format, even when the data contain substantial measurement error and when the nature and magnitude of that error is unknown. The method is introduced through pedagogical case studies using real-world example networks, and specifically tailored to allow straightforward, computationally efficient implementation with a minimum of technical input. Computer code implementing the method is publicly available.

10 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare six algorithms for dynamic community detection in terms of instantaneous and longitudinal similarity with the planted ground truth, smoothness of dynamic partitions, and scalability.
Abstract: Many algorithms have been proposed in the last ten years for the discovery of dynamic communities. However, these methods are seldom compared between themselves. In this article, we propose a generator of dynamic graphs with planted evolving community structure, as a benchmark to compare and evaluate such algorithms. Unlike previously proposed benchmarks, it is able to specify any desired evolving community structure through a descriptive language, and then to generate the corresponding progressively evolving network. We empirically evaluate six existing algorithms for dynamic community detection in terms of instantaneous and longitudinal similarity with the planted ground truth, smoothness of dynamic partitions, and scalability. We notably observe different types of weaknesses depending on their approach to ensure smoothness, namely Glitches, Oversimplification and Identity loss. Although no method arises as a clear winner, we observe clear differences between methods, and we identified the fastest, those yielding the most smoothed or the most accurate solutions at each step.

Journal ArticleDOI
TL;DR: It is illustrated that hypergraphs can be more suitable than pairwise graphs for the analysis of multiprotein complex data.
Abstract: Protein-protein interactions are crucial in many biological pathways and facilitate cellular function. Investigating these interactions as a graph of pairwise interactions can help to gain a systemic understanding of cellular processes. It is known, however, that proteins interact with each other not exclusively in pairs but also in polyadic interactions and they can form multiprotein complexes, which are stable interactions between multiple proteins. In this manuscript, we use hypergraphs to investigate multiprotein complex data. We investigate two random null models to test which hypergraph properties occur as a consequence of constraints, such as the size and the number of multiprotein complexes. We find that assortativity, the number of connected components, and clustering differ between the data and these null models. Our main finding is that projecting a hypergraph of polyadic interactions onto a graph of pairwise interactions leads to the identification of different proteins as hubs than the hypergraph. We find in our data set that the hypergraph degree is a more accurate predictor for gene-essentiality than the degree in the pairwise graph. We find that analysing a hypergraph as a pairwise graph drastically changes the distribution of the local clustering coefficient. Furthermore, using a pairwise interaction representation of multiprotein complex data may lead to a spurious hierarchical structure, which is not observed in the hypergraph. Hence, we illustrate that hypergraphs can be more suitable than pairwise graphs for the analysis of multiprotein complex data.


Journal ArticleDOI
TL;DR: In this article, the authors applied relational hyperevent models on 19,713 individuals with 13,377 infection ties to determine to what degree the disease spread is affected by age whilst controlling for other covariate and human-to-human transmission network effects.
Abstract: We analyse officially procured data detailing the COVID-19 transmission in Romania's capital Bucharest between 1st August and 31st October 2020. We apply relational hyperevent models on 19,713 individuals with 13,377 infection ties to determine to what degree the disease spread is affected by age whilst controlling for other covariate and human-to-human transmission network effects. We find that positive cases are more likely to nominate alters of similar age as their sources of infection, thus providing evidence for age homophily. We also show that the relative infection risk is negatively associated with the age of peers, such that the risk of infection increases as the average age of contacts decreases. Additionally, we find that adults between the ages 35 and 44 are pivotal in the transmission of the disease to other age groups. Our results may contribute to better controlling future COVID-19 waves, and they also point to the key age groups which may be essential for vaccination given their prominent role in the transmission of the virus.

Journal ArticleDOI
TL;DR: In this paper, a detailed analysis of Twitter-based information cascades is performed, and it is demonstrated that branching process hypotheses are approximately satisfied. Using a branching process framework, models of agent-to-agent transmission are compared, and a limited attention model is found to reproduce the relevant characteristics of the data better than the more common independent cascade model.
Abstract: A detailed analysis of Twitter-based information cascades is performed, and it is demonstrated that branching process hypotheses are approximately satisfied. Using a branching process framework, models of agent-to-agent transmission are compared to conclude that a limited attention model better reproduces the relevant characteristics of the data than the more common independent cascade model. Existing and new analytical results for branching processes are shown to match well to the important statistical characteristics of the empirical information cascades, thus demonstrating the power of branching process descriptions for understanding social information spreading.

Journal ArticleDOI
TL;DR: This work briefly surveys recent proposals that seek to capture in numerical terms the resilience and the robustness of a graph, and describes some of the numerous application areas for such characterizations.
Abstract: We briefly survey recent proposals that seek to capture in numerical terms the resilience and the robustness of a graph. After a brief introduction and the establishment of notation and terminology, we catalogue characterizations proposed in journal articles published within the last two decades. We then describe some of the numerous application areas for such characterizations. We experiment with implementations of numerous characteristics on several graph-generation models, after which we conclude with a discussion of open problems and future directions.

Journal ArticleDOI
TL;DR: A new collaboration network of artists from the online music streaming service Spotify is presented, and a critical change in the eigenvector centrality of artists is demonstrated, as low popularity artists are removed.
Abstract: The modern age of digital music access has increased the availability of data about music consumption and creation, facilitating the large-scale analysis of the complex networks that connect music together. Data about user streaming behaviour, and the musical collaboration networks are particularly important with new data-driven recommendation systems. Without thorough analysis, such collaboration graphs can lead to false or misleading conclusions. Here we present a new collaboration network of artists from the online music streaming service Spotify, and demonstrate a critical change in the eigenvector centrality of artists, as low popularity artists are removed. The critical change in centrality, from classical artists to rap artists, demonstrates deeper structural properties of the network. A Social Group Centrality model is presented to simulate this critical transition behaviour, and switching between dominant eigenvectors is observed. This model presents a novel investigation of the effect of popularity bias on how centrality and importance are measured, and provides a new tool for examining such flaws in networks.


Journal ArticleDOI
TL;DR: The authors analyzed and modeled the macroscopic structure of tags applied by users to annotate and catalog questions, using a collection of 168 Stack Exchange websites, and found striking similarity in tagging structure across these Stack Exchange communities, even though each community evolves independently (albeit under similar guidelines).
Abstract: Large Question-and-Answer (Q&A) platforms support diverse knowledge curation on the Web. While researchers have studied user behavior on the platforms in a variety of contexts, there is relatively little insight into important by-products of user behavior that also encode knowledge. Here, we analyze and model the macroscopic structure of tags applied by users to annotate and catalog questions, using a collection of 168 Stack Exchange websites. We find striking similarity in tagging structure across these Stack Exchange communities, even though each community evolves independently (albeit under similar guidelines). Using our empirical findings, we develop a simple generative model that creates random bipartite graphs of tags and questions. Our model accounts for the tag frequency distribution but does not explicitly account for co-tagging correlations. Even under these constraints, we demonstrate empirically and theoretically that our model can reproduce a number of statistical properties of the co-tagging graph that links tags appearing in the same post.

Journal ArticleDOI
TL;DR: In this paper, a link prediction method based on the hyperbolic geometry of the complex network is proposed to discover redundant links, and numerical simulations reveal its superiority over other common and recent link prediction-based methods used for network recovery, especially in the case of attacks based on edge betweenness strategy.
Abstract: Recovery of complex networks is an important issue that has been extensively studied in various fields. Much work has been done to measure and improve the stability of complex networks during attacks. Recently, many studies have focused on network recovery strategies after an attack. In many real cases, link retrieval and recovery of critical infrastructures such as transmission networks and telecommunications infrastructures are of particular importance and should be prioritized. For example, when a flood disrupts optical fibre communications in transmission networks and paralyzes the network, link retrieval corresponds to the recovery of fibre communications, so that the transmission network communication capacity can be restored at the earliest possible time. So, predicting the appropriate reserved links in a way that the network can be recovered at the lowest cost and in the shortest time after attacks or interruptions will be critical in a disaster. In this article, different kinds of attack strategies are provided and some retrieval strategies based on link prediction methods are proposed to recover the network after failure and attack. Besides that, a new link prediction method based on the hyperbolic geometry of the complex network is proposed to discover redundant links. The numerical simulations reveal its superiority over other common and recent link prediction-based methods used for network recovery, especially in the case of attacks based on edge betweenness strategy.

Journal ArticleDOI
TL;DR: In this article, a variety of common network measures and their ability to characterize various two-dimensional and three-dimensional spatial random-graph models and empirical 2D granular networks are examined.
Abstract: Various approaches and measures from network analysis have been applied to granular and particulate networks to gain insights into their structural, transport, failure-propagation and other systems-level properties. In this article, we examine a variety of common network measures and study their ability to characterize various two-dimensional and three-dimensional spatial random-graph models and empirical two-dimensional granular networks. We identify network measures that are able to distinguish between physically plausible and unphysical spatial network models. Our results also suggest that there are significant differences in the distributions of certain network measures in two and three dimensions, hinting at important differences that we also expect to arise in experimental granular networks.

Journal ArticleDOI
TL;DR: This paper proposes to extend the Pearson correlation coefficient to work on complex networks by defining a function that uses the topology of the network to return a correlation coefficient, and shows that the formulation is intuitive and returns the expected values in a number of scenarios.
Abstract: Complex networks are useful tools to understand propagation events like epidemics, word-of-mouth, adoption of habits, and innovations. Estimating the correlation between two processes happening on the same network is therefore an important problem with a number of applications. However, at present there is no way to do so: current methods either correlate a network with itself, a single process with the network structure, or calculate a network distance between two processes. In this paper, we propose to extend the Pearson correlation coefficient to work on complex networks. Given two vectors, we define a function that uses the topology of the network to return a correlation coefficient. We show that our formulation is intuitive and returns the expected values in a number of scenarios. We also demonstrate how the classical Pearson correlation coefficient is unable to do so. We conclude the paper with two case studies, showcasing how our network correlation can facilitate tasks in social network analysis and economics. We provide examples of how we could use our network correlation to infer user characteristics from their activities on social media; and relationships between industrial products, under some assumptions as to what should make two exporting countries similar.

Journal ArticleDOI
TL;DR: A method for the modelling of the propagation of text data in web site space among some groups by using a multivariate Hawkes process with a sparse structure is proposed and a hybrid method using a quasi-maximum likelihood estimator (QMLE) and an $L^1$-penalized QMLE is introduced.
Abstract: We propose a method for the modelling of the propagation of text data in web site space among some groups by using a multivariate Hawkes process with a sparse structure. For estimation, we introduced a hybrid method using a quasi-maximum likelihood estimator (QMLE) and an $L^1$-penalized QMLE. As a real example, we investigated posts on a Japanese web service about uncomfortable gender experiences, which we classified into 12 groups by age and sex, and we calculated the magnitude of the correlation between each group. In addition, we visualized the propagation structure of posts in each group by summarizing the results in a directed Hawkes graph and a heat map of time integrals of kernel functions.

Journal ArticleDOI
TL;DR: This paper proposes a "divergence score" that can be assigned to various embeddings to distinguish good ones from bad ones and provides a tool for an unsupervised graph embedding comparison.
Abstract: Graph embedding is a transformation of vertices of a graph into a set of vectors. Good embeddings should capture the graph topology, vertex-to-vertex relationship, and other relevant information about graphs, subgraphs, and vertices. If these objectives are achieved, they are meaningful, understandable, and compressed representations of networks. They also provide more options and tools for data scientists as machine learning on graphs is still quite limited. Finally, vector operations are simpler and faster than comparable operations on graphs. The main challenge is that one needs to make sure that embeddings describe the properties of the graphs well. In particular, the decision has to be made on the embedding dimensionality which highly impacts the quality of an embedding. As a result, selecting the best embedding is a challenging task and very often requires domain experts. In this paper, we propose a "divergence score" that can be assigned to various embeddings to distinguish good ones from bad ones. This general framework provides a tool for an unsupervised graph embedding comparison. In order to achieve it, we needed to generalize the well-known Chung-Lu model to incorporate geometry, which is interesting in its own right. In order to test our framework, we did a number of experiments with synthetic networks as well as real-world networks, and various embedding algorithms.

Journal ArticleDOI
TL;DR: In this article, the authors investigate cognitive networks tied to key concepts of computational thinking provided by 159 high school students enrolled in a science curriculum and 59 researchers in complex systems and simulations, finding evidence of a crippled computational thinking mindset in students who acquire mathematical skills that are not channelled toward real-world discovery through coding.
Abstract: Computational thinking is a way of reasoning about the world in terms of data. This mindset channels number crunching toward an ambition to discover knowledge through logic, models and simulations. Here we show how computational cognitive science can be used to reconstruct and analyse the structure of computational thinking mindsets (forma mentis in Latin) through complex networks. As a case study, we investigate cognitive networks tied to key concepts of computational thinking provided by: (i) 159 high school students enrolled in a science curriculum and (ii) 59 researchers in complex systems and simulations. Researchers' reconstructed forma mentis highlighted a positive mindset about scientific modelling, semantically framing data and simulations as ways of discovering nature. Students correctly identified different aspects of logic reasoning but perceived "computation" as a distressing, anxiety-eliciting task, framed with math jargon and lacking links to real-world discovery. Students' mindsets around "data", "model" and "simulations" critically revealed no awareness of numerical modelling as a way for understanding the world. Our findings provide evidence of a crippled computational thinking mindset in students, who acquire mathematical skills that are not channelled toward real-world discovery through coding. This unlinked knowledge ends up being perceived as distressing number-crunching expertise with no relevant outcome. The virtuous mindset of researchers reported here indicates that computational thinking can be restored by training students specifically in coding, modelling and simulations in relation to discovering nature. Our approach opens innovative ways for quantifying computational thinking and enhancing its development through mindset reconstruction.

Journal ArticleDOI
TL;DR: In this article, the authors provide methods for extracting signed network backbones from intrinsically dense, unsigned, unipartite weighted networks; using a null model based on statistical techniques, the proposed significance filter and vigor filter allow edge signs to be inferred.
Abstract: Networks provide useful tools for analyzing diverse complex systems from natural, social, and technological domains. Growing size and variety of data such as more nodes and links and associated weights, directions, and signs can provide accessory information. Link and weight abundance, on the other hand, results in denser networks with noisy, insignificant, or otherwise redundant data. Moreover, typical network analysis and visualization techniques presuppose sparsity and are not appropriate or scalable for dense and weighted networks. As a remedy, network backbone extraction methods aim to retain only the important links while preserving the useful and elucidative structure of the original networks for further analyses. Here, we provide the first methods for extracting signed network backbones from intrinsically dense unsigned unipartite weighted networks. Utilizing a null model based on statistical techniques, the proposed significance filter and vigor filter allow inferring edge signs. Empirical analysis on migration, voting, temporal interaction, and species similarity networks reveals that the proposed filters extract meaningful and sparse signed backbones while preserving the multiscale nature of the network. The resulting backbones exhibit characteristics typically associated with signed networks such as reciprocity, structural balance, and community structure. The developed tool is provided as a free, open-source software package.

Journal ArticleDOI
TL;DR: A detailed analysis of matches played in the sport of Snooker during the period 1968-2020 is used to calculate a directed and weighted dominance network based upon the corresponding results.
Abstract: A detailed analysis of matches played in the sport of Snooker during the period 1968-2020 is used to calculate a directed and weighted dominance network based upon the corresponding results. We consider a ranking procedure based upon the well-studied PageRank algorithm that incorporates details of not only the number of wins a player has had over their career but also the quality of opponent faced in these wins. Through this study we find that John Higgins is the highest performing Snooker player of all time with Ronnie O'Sullivan appearing in second place. We demonstrate how this approach can be applied across a variety of temporal periods in each of which we may identify the strongest player in the corresponding era. This procedure is then compared with more classical ranking schemes. Furthermore, a visualization tool known as the rank-clock is introduced to the sport which allows for immediate analysis of the career trajectory of individual competitors. These results further demonstrate the use of network science in the quantification of success within the field of sport.
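
A PageRank-style ranking on a directed, weighted dominance network, as described in the abstract above, can be sketched with plain power iteration. This is a minimal illustration under stated assumptions: each edge points from the loser to the winner with the number of wins as its weight, the damping factor is 0.85, and the rank of players who never lost is spread uniformly. None of these choices is confirmed by the source, and the players "A", "B" and "C" are hypothetical.

```python
def pagerank(wins, damping=0.85, tol=1e-12, max_iter=1000):
    """Rank players by power iteration on a loser -> winner dominance network.

    `wins[(loser, winner)]` is the number of times `winner` beat `loser`.
    """
    players = sorted({p for edge in wins for p in edge})
    n = len(players)
    out_weight = {p: 0.0 for p in players}
    for (loser, _winner), w in wins.items():
        out_weight[loser] += w
    rank = {p: 1.0 / n for p in players}
    for _ in range(max_iter):
        # Players with no losses have no out-edges; spread their rank uniformly.
        dangling = sum(rank[p] for p in players if out_weight[p] == 0.0)
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in players}
        for (loser, winner), w in wins.items():
            new[winner] += damping * rank[loser] * w / out_weight[loser]
        delta = sum(abs(new[p] - rank[p]) for p in players)
        rank = new
        if delta < tol:
            break
    return rank

# Toy results: A beat B three times and C twice; B beat C once.
toy_wins = {("B", "A"): 3, ("C", "A"): 2, ("C", "B"): 1}
scores = pagerank(toy_wins)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Because each loser passes rank to the players who beat them, a win over a highly ranked opponent contributes more than a win over a weak one, which is the "quality of opponent" effect the abstract refers to.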


Journal ArticleDOI
TL;DR: In this article, the authors study the space of optimal solutions of the correlation clustering problem, on a collection of synthetic complete graphs, and show empirically that under certain conditions, there can be many optimal partitions of a signed graph.
Abstract: In order to study real-world systems, many applied works model them through signed graphs, i.e. graphs whose edges are labeled as either positive or negative. Such a graph is considered as structurally balanced when it can be partitioned into a number of modules, such that positive (resp. negative) edges are located inside (resp. in-between) the modules. When it is not the case, authors look for the closest partition to such balance, a problem called Correlation Clustering (CC). Due to the complexity of the CC problem, the standard approach is to find a single optimal partition and stick to it, even if other optimal or high scoring solutions possibly exist. In this work, we study the space of optimal solutions of the CC problem, on a collection of synthetic complete graphs. We show empirically that under certain conditions, there can be many optimal partitions of a signed graph. Some of these are very different and thus provide distinct perspectives on the system, as illustrated on a small real-world graph. This is an important result, as it implies that one may have to find several, if not all, optimal solutions of the CC problem, in order to properly study the considered system.
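
The observation that a signed graph can have several optimal partitions can be reproduced on a tiny example by brute force. The sketch below assumes an unweighted imbalance count (negative edges inside modules plus positive edges between modules) and enumerates every partition, which is only feasible for a handful of vertices. The function names and the toy graph are hypothetical and do not reproduce the paper's method.

```python
def set_partitions(n):
    """Yield every partition of n items as a tuple of module labels."""
    def extend(labels, used):
        if len(labels) == n:
            yield tuple(labels)
            return
        # The next item may join an existing module (0..used-1) or open module `used`.
        for lab in range(used + 1):
            labels.append(lab)
            yield from extend(labels, max(used, lab + 1))
            labels.pop()
    yield from extend([], 0)

def imbalance(labels, signed_edges):
    """Count negative edges inside modules plus positive edges between modules."""
    cost = 0
    for (i, j), sign in signed_edges.items():
        same = labels[i] == labels[j]
        if (sign < 0 and same) or (sign > 0 and not same):
            cost += 1
    return cost

def optimal_partitions(n, signed_edges):
    """Return the minimum imbalance and every partition that attains it."""
    best, winners = None, []
    for labels in set_partitions(n):
        cost = imbalance(labels, signed_edges)
        if best is None or cost < best:
            best, winners = cost, [labels]
        elif cost == best:
            winners.append(labels)
    return best, winners

# Toy complete signed graph on 3 vertices: two positive edges, one negative.
toy = {(0, 1): +1, (1, 2): +1, (0, 2): -1}
best_cost, best_parts = optimal_partitions(3, toy)
```

For this unbalanced triangle the minimum imbalance is 1, and three different partitions attain it: all vertices together, {0, 1} versus {2}, and {0} versus {1, 2}. Picking any single one of them would hide the other two readings of the same graph.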