Journal

arXiv: Data Structures and Algorithms 

About: arXiv: Data Structures and Algorithms (the arXiv preprint category cs.DS) is indexed as an academic journal. It publishes mainly in the areas of time complexity and approximation algorithms. Over its lifetime, 11,914 publications have been posted, receiving 96,051 citations.


Papers
Posted Content
TL;DR: In this article, the authors employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities.
Abstract: A large body of work has been devoted to defining and identifying clusters or communities in social and information networks. We explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. We employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the "best" possible community--according to the conductance measure--over a wide range of size scales. We study over 100 large real-world social and information networks. Our results suggest a significantly more refined picture of community structure in large networks than has been appreciated previously. In particular, we observe tight communities that are barely connected to the rest of the network at very small size scales; and communities of larger size scales gradually "blend into" the expander-like core of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, it is exactly the opposite of what one would expect based on intuition from expander graphs, low-dimensional or manifold-like graphs, and from small social networks that have served as testbeds of community detection algorithms. We have found that a generative graph model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community profile plot similar to what we observe in our network datasets.
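The network community profile can be made concrete on a toy graph. This sketch assumes the common definition of conductance, phi(S) = cut(S, V - S) / min(vol(S), vol(V - S)), where vol sums node degrees, and computes the profile Phi(k), the minimum conductance over all node sets of exactly k nodes. The function names are hypothetical. Exhaustive enumeration is only feasible for tiny graphs; the paper instead uses approximation algorithms to reach networks with millions of nodes.

```python
from itertools import combinations

def conductance(adj, S):
    """Conductance of node set S in an undirected graph given as an adjacency dict.

    phi(S) = cut(S, V - S) / min(vol(S), vol(V - S)), where vol sums node degrees.
    """
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)  # edges leaving S
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_s, vol_rest)
    return cut / denom if denom > 0 else float("inf")

def community_profile(adj, max_size):
    """Exact profile Phi(k) = min conductance over all k-node sets, for k = 1..max_size.

    Brute force over every k-subset: exponential cost, toy graphs only.
    """
    nodes = list(adj)
    return {
        k: min(conductance(adj, S) for S in combinations(nodes, k))
        for k in range(1, max_size + 1)
    }

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(community_profile(adj, 3))
```

On this graph each triangle is the best 3-node "community": one edge leaves it and its volume is 7, so Phi(3) = 1/7, lower than the best 1-node or 2-node sets.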

1,555 citations

Posted Content
TL;DR: This tutorial describes different graph Laplacians and their basic properties, presents the most common spectral clustering algorithms, and derives those algorithms from scratch by several different approaches.
Abstract: In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
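As a minimal illustration of the "standard linear algebra" step, the sketch below builds the unnormalized Laplacian L = D - W and splits the graph using the eigenvector of the second-smallest eigenvalue. It assumes k = 2 and a connected graph, and replaces the usual k-means step with thresholding that eigenvector at zero, a common simplification for two clusters. The function name is hypothetical, not taken from the tutorial.

```python
import numpy as np

def spectral_bisection(W):
    """Two-way spectral partition using the unnormalized graph Laplacian L = D - W.

    W is a symmetric, non-negative weight (adjacency) matrix of a connected graph.
    Returns an array of 0/1 cluster labels.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # unnormalized Laplacian
    # eigh is for symmetric matrices and returns eigenvalues in ascending order
    _, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]     # eigenvector of the second-smallest eigenvalue
    # For k = 2, splitting by sign stands in for k-means on this 1-D embedding
    return (fiedler > 0).astype(int)

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(spectral_bisection(W))
```

The eigenvector's overall sign is arbitrary, so the two label values may come out swapped between runs or platforms; only the grouping is meaningful.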

1,160 citations

Posted Content
TL;DR: In this paper, the authors explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify, and examine several different classes of approximation algorithms that aim to optimize such objective functions.
Abstract: Detecting clusters or communities in large real-world graphs such as large social or information networks is a problem of considerable interest. In practice, one typically chooses an objective function that captures the intuition of a network cluster as a set of nodes with better internal connectivity than external connectivity, and then one applies approximation algorithms or heuristics to extract sets of nodes that are related to the objective function and that "look like" good communities for the application of interest. In this paper, we explore a range of network community detection methods in order to compare them and to understand their relative performance and the systematic biases in the clusters they identify. We evaluate several common objective functions that are used to formalize the notion of a network community, and we examine several different classes of approximation algorithms that aim to optimize such objective functions. In addition, rather than simply fixing an objective and asking for an approximation to the best cluster of any size, we consider a size-resolved version of the optimization problem. Considering community quality as a function of its size provides a much finer lens with which to examine community detection algorithms, since objective functions and approximation algorithms often have non-obvious size-dependent behavior.
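To see how different objective functions formalize "better internal than external connectivity", this sketch scores one node set under four common measures, all of which are lower for more community-like sets. The definitions are assumptions chosen as widely used variants (conductance with vol(S) in the denominator, expansion, cut ratio, internal density); the paper's exact normalizations may differ, and `score_set` is a hypothetical name. Here n_S is the number of nodes in S, m_S the number of edges inside S, and c_S the number of edges leaving S.

```python
def score_set(adj, S):
    """Score node set S of an undirected graph (adjacency dict) under several objectives.

    Assumes 1 < |S| < |V|. Lower scores mean a more community-like set.
    """
    S = set(S)
    n, n_s = len(adj), len(S)
    if not 1 < n_s < n:
        raise ValueError("S must contain more than one node but not every node")
    # Each internal edge is counted once from each endpoint, hence // 2.
    m_s = sum(1 for u in S for v in adj[u] if v in S) // 2
    c_s = sum(1 for u in S for v in adj[u] if v not in S)  # boundary edges
    vol_s = 2 * m_s + c_s                                   # sum of degrees in S
    return {
        "conductance": c_s / vol_s if vol_s else float("inf"),
        "expansion": c_s / n_s,                   # boundary edges per node
        "cut_ratio": c_s / (n_s * (n - n_s)),     # fraction of possible outgoing edges
        "internal_density": 1 - m_s / (n_s * (n_s - 1) / 2),
    }

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(score_set(adj, {0, 1, 2}))
print(score_set(adj, {1, 2, 3}))
```

The triangle {0, 1, 2} scores better than the straddling set {1, 2, 3} on every measure here, but on real networks the measures can disagree, which is one reason a size-resolved comparison is informative.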

950 citations

Proceedings ArticleDOI
TL;DR: In this paper, the authors present a method to analyze the powers of a given trilinear form (a special kind of algebraic construction, also called a tensor) and obtain upper bounds on the asymptotic complexity of matrix multiplication.
Abstract: This paper presents a method to analyze the powers of a given trilinear form (a special kind of algebraic construction, also called a tensor) and obtain upper bounds on the asymptotic complexity of matrix multiplication. Compared with existing approaches, this method is based on convex optimization, and thus has polynomial-time complexity. As an application, we use this method to study powers of the construction given by Coppersmith and Winograd [Journal of Symbolic Computation, 1990] and obtain the upper bound $\omega<2.3728639$ on the exponent of square matrix multiplication, which slightly improves the best known upper bound.
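For context, the exponent omega has a standard definition, and a short recursion shows how any fixed bilinear algorithm yields an upper bound on it. This is a textbook illustration using Strassen's algorithm (n_0 = 2, r = 7), not the convex-optimization analysis of the Coppersmith and Winograd construction that the paper uses.

```latex
\omega = \inf\{\tau : \text{two } n \times n \text{ matrices can be multiplied with } O(n^{\tau}) \text{ arithmetic operations}\}

% A bilinear algorithm multiplying n_0 x n_0 matrices with r multiplications (r > n_0^2),
% applied recursively to blocks, gives
T(n) = r\,T(n/n_0) + O(n^2) \;\Longrightarrow\; T(n) = O\!\left(n^{\log_{n_0} r}\right) \;\Longrightarrow\; \omega \le \log_{n_0} r

% Strassen: n_0 = 2, r = 7
\omega \le \log_2 7 \approx 2.807
```

By comparison, the bound in the abstract means square matrices can be multiplied with O(n^{2.3728639}) arithmetic operations.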

940 citations

Posted Content
TL;DR: This work develops an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate.
Abstract: Information diffusion and virus propagation are fundamental processes taking place in networks. While it is often possible to directly observe when nodes become infected with a virus or adopt the information, observing individual transmissions (i.e., who infects whom, or who influences whom) is typically very difficult. Furthermore, in many applications, the underlying network over which the diffusions and propagations spread is actually unobserved. We tackle these challenges by developing a method for tracing paths of diffusion and influence through networks and inferring the networks over which contagions propagate. Given the times when nodes adopt pieces of information or become infected, we identify the optimal network that best explains the observed infection times. Since the optimization problem is NP-hard to solve exactly, we develop an efficient approximation algorithm that scales to large datasets and finds provably near-optimal networks. We demonstrate the effectiveness of our approach by tracing information diffusion in a set of 170 million blogs and news articles over a one year period to infer how information flows through the online media space. We find that the diffusion network of news for the top 1,000 media sites and blogs tends to have a core-periphery structure with a small set of core media sites that diffuse information to the rest of the Web. These sites tend to have stable circles of influence with more general news media sites acting as connectors between them.
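The greedy idea behind "efficient approximation algorithm ... provably near-optimal networks" can be sketched on a toy input. The sketch assumes a simple model that is not taken from the paper: a real edge u -> v explains v's infection with probability beta * exp(-dt / alpha), where dt is the delay, while an "epsilon edge" explains it with tiny probability eps. Each infected node picks its best earlier-infected parent among the chosen edges, and edges are added greedily by marginal gain in log-likelihood. This objective is monotone and submodular, which is the property that gives greedy selection its approximation guarantee. All names and parameter values are hypothetical.

```python
import math
from itertools import permutations

def edge_weight(dt, alpha=1.0, beta=0.5, eps=1e-3):
    """Log-likelihood gain of explaining a delay dt by a real edge instead of an eps edge.

    Assumed model (hypothetical): P(transmission) = beta * exp(-dt / alpha).
    """
    return math.log(beta) - dt / alpha - math.log(eps)

def score(edges, cascades, **kw):
    """Sum over cascades and infected nodes of the best parent's gain (0 = eps edge)."""
    total = 0.0
    for c in cascades:  # each cascade maps node -> infection time
        for v, tv in c.items():
            best = 0.0
            for src, dst in edges:
                if dst == v and src in c and c[src] < tv:  # parent must be infected earlier
                    best = max(best, edge_weight(tv - c[src], **kw))
            total += best
    return total

def greedy_network(nodes, cascades, k, **kw):
    """Greedily add up to k directed edges, each with the largest positive marginal gain."""
    chosen = []
    for _ in range(k):
        base = score(chosen, cascades, **kw)
        best_edge, best_gain = None, 0.0
        for e in permutations(nodes, 2):
            if e in chosen:
                continue
            gain = score(chosen + [e], cascades, **kw) - base
            if gain > best_gain:
                best_edge, best_gain = e, gain
        if best_edge is None:  # no remaining edge improves the score
            break
        chosen.append(best_edge)
    return chosen

cascades = [{"a": 0, "b": 1, "c": 2}, {"a": 0, "b": 1}, {"a": 0, "c": 1.5}]
print(greedy_network(["a", "b", "c"], cascades, k=3))
```

Recomputing the full score for every candidate keeps the sketch short; a scalable implementation would update marginal gains incrementally (for example with lazy evaluation) instead.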

915 citations

Network Information
Related Journals (5)
SIAM Journal on Computing: 3.5K papers, 327.5K citations (93% related)
Information Processing Letters: 7.7K papers, 189.7K citations (90% related)
Theoretical Computer Science: 12.4K papers, 368.9K citations (90% related)
Journal of Computer and System Sciences: 2.7K papers, 161K citations (88% related)
Discrete Applied Mathematics: 9.1K papers, 178.6K citations (86% related)
Performance Metrics
Number of papers from the journal in previous years:
Year    Papers
2021    1,096
2020    1,469
2019    1,343
2018    1,202
2017    1,051
2016    1,013