scispace - formally typeset
Search or ask a question
Author

C. Seshadhri

Other affiliations: IBM, Sandia National Laboratories, University of California  ...read more
Bio: C. Seshadhri is an academic researcher from University of California, Santa Cruz. The author has contributed to research in topics: Degree (graph theory) & Property testing. The author has an hindex of 39, co-authored 198 publications receiving 5103 citations. Previous affiliations of C. Seshadhri include IBM & Sandia National Laboratories.


Papers
More filters
Journal ArticleDOI
TL;DR: The Block Two-Level Erdős-Rényi (BTER) model is proposed, and it is demonstrated that it accurately captures the observable properties of many real-world social networks.
Abstract: Community structure plays a significant role in the analysis of social networks and similar graphs, yet this structure is little understood and not well captured by most models. We formally define a community to be a subgraph that is internally highly connected and has no deeper substructure. We use tools of combinatorics to show that any such community must contain a dense Erd\ifmmode \mbox{\H{o}}\else \H{o}\fi{}s-R\'enyi (ER) subgraph. Based on mathematical arguments, we hypothesize that any graph with a heavy-tailed degree distribution and community structure must contain a scale-free collection of dense ER subgraphs. These theoretical observations corroborate well with empirical evidence. From this, we propose the Block Two-Level Erd\ifmmode \mbox{\H{o}}\else \H{o}\fi{}s-R\'enyi (BTER) model, and demonstrate that it accurately captures the observable properties of many real-world social networks.

292 citations

Proceedings ArticleDOI
12 Aug 2012
TL;DR: It is theoretically demonstrate that two commonly observed properties of social networks, heavy-tailed degree distributions and large clustering coefficients, imply the existence of vertex neighborhoods (also known as egonets) that are themselves good communities.
Abstract: The communities of a social network are sets of vertices with more connections inside the set than outside. We theoretically demonstrate that two commonly observed properties of social networks, heavy-tailed degree distributions and large clustering coefficients, imply the existence of vertex neighborhoods (also known as egonets) that are themselves good communities. We evaluate these neighborhood communities on a range of graphs. What we find is that the neighborhood communities can exhibit conductance scores that are as good as the Fiedler cut. Also, the conductance of neighborhood communities shows similar behavior as the network community profile computed with a personalized PageRank community detection method. Neighborhood communities give us a simple and powerful heuristic for speeding up local partitioning methods. Since finding good seeds for the PageRank clustering method is difficult, most approaches involve an expensive sweep over a great many starting vertices. We show how to use neighborhood communities to quickly generate a small set of seeds.

210 citations

Journal Article
TL;DR: A property tester is given that given a graph with degree bound d, an expansion bound �, and a parameter " > 0, accepts the graph with high probability if its expansion is more than�, and rejects it withhigh probability if it is "-far from agraph with expansion � 0 withdegree bound d".
Abstract: We consider the problem of testing graph expansion (either vertex or edge) in the bounded degree model [2]. We give a property tester that given a graph with degree bound d, an expansion bound �, and a parameter " > 0, accepts the graph with high probability if its expansion is more than �, and rejects it with high probability if it is "-far from a graph with expansion � 0 with degree bound d, where � 0 < � is a function of �. For edge expansion, we obtain � 0 = ( � 2 d ), and for vertex expansion, we obtain � 0 = ( � 2 d2 ). In either case, the algorithm runs in time

172 citations

Proceedings ArticleDOI
Elad Hazan1, C. Seshadhri1
14 Jun 2009
TL;DR: A different performance metric is proposed which strengthens the standard metric of regret and measures performance with respect to a changing comparator and can be applied to various learning scenarios, i.e. online portfolio selection, for which there are experimental results showing the advantage of adaptivity.
Abstract: We study online learning in an oblivious changing environment. The standard measure of regret bounds the difference between the cost of the online learner and the best decision in hindsight. Hence, regret minimizing algorithms tend to converge to the static best optimum, clearly a suboptimal behavior in changing environments. On the other hand, various metrics proposed to strengthen regret and allow for more dynamic algorithms produce inefficient algorithms.We propose a different performance metric which strengthens the standard metric of regret and measures performance with respect to a changing comparator. We then describe a series of data-streaming-based reductions which transform algorithms for minimizing (standard) regret into adaptive algorithms albeit incurring only poly-logarithmic computational overhead.Using this reduction, we obtain efficient low adaptive-regret algorithms for the problem of online convex optimization. This can be applied to various learning scenarios, i.e. online portfolio selection, for which we describe experimental results showing the advantage of adaptivity.

171 citations

Journal ArticleDOI
TL;DR: It is proposed that the proposed Block Two-Level Erdss-Renyi (BTER) model can be used as a graph generator for benchmarking purposes and provide idealized degree distributions and clustering coefficient profiles that can be tuned for user specifications.
Abstract: Network data is ubiquitous and growing, yet we lack realistic generative network models that can be calibrated to match real-world data. The recently proposed Block Two-Level Erdss-Renyi (BTER) model can be tuned to capture two fundamental properties: degree distribution and clustering coefficients. The latter is particularly important for reproducing graphs with community structure, such as social networks. In this paper, we compare BTER to other scalable models and show that it gives a better fit to real data. We provide a scalable implementation that requires only O(d_max) storage where d_max is the maximum number of neighbors for a single node. The generator is trivially parallelizable, and we show results for a Hadoop MapReduce implementation for a modeling a real-world web graph with over 4.6 billion edges. We propose that the BTER model can be used as a graph generator for benchmarking purposes and provide idealized degree distributions and clustering coefficient profiles that can be tuned for user specifications.

147 citations


Cited by
More filters
Journal ArticleDOI

[...]

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Journal ArticleDOI
TL;DR: It is shown that the full set of hydromagnetic equations admit five more integrals, besides the energy integral, if dissipative processes are absent, which made it possible to formulate a variational principle for the force-free magnetic fields.
Abstract: where A represents the magnetic vector potential, is an integral of the hydromagnetic equations. This -integral made it possible to formulate a variational principle for the force-free magnetic fields. The integral expresses the fact that motions cannot transform a given field in an entirely arbitrary different field, if the conductivity of the medium isconsidered infinite. In this paper we shall show that the full set of hydromagnetic equations admit five more integrals, besides the energy integral, if dissipative processes are absent. These integrals, as we shall presently verify, are I2 =fbHvdV, (2)

1,858 citations

Journal ArticleDOI
TL;DR: In this article, the authors distinguish between structural and functional definitions of network communities and identify networks with explicitly labeled functional communities to which they refer as ground-truth communities, where nodes explicitly state their community memberships and use such social groups to define a reliable and robust notion of groundtruth communities.
Abstract: Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of definitions of network communities, intractability of methods for detecting them, and the issues with evaluation which stem from the lack of a reliable gold-standard ground-truth. In this paper, we distinguish between structural and functional definitions of network communities. Structural definitions of communities are based on connectivity patterns, like the density of connections between the community members, while functional definitions are based on (often unobserved) common function or role of the community members in the network. We argue that the goal of network community detection is to extract functional communities based on the connectivity structure of the nodes in the network. We then identify networks with explicitly labeled functional communities to which we refer as ground-truth communities. In particular, we study a set of 230 large real-world social, collaboration, and information networks where nodes explicitly state their community memberships. For example, in social networks, nodes explicitly join various interest-based social groups. We use such social groups to define a reliable and robust notion of ground-truth communities. We then propose a methodology, which allows us to compare and quantitatively evaluate how different structural definitions of communities correspond to ground-truth functional communities. We study 13 commonly used structural definitions of communities and examine their sensitivity, robustness and performance in identifying the ground-truth. We show that the 13 structural definitions are heavily correlated and naturally group into four classes. We find that two of these definitions, Conductance and Triad participation ratio, consistently give the best performance in identifying ground-truth communities. We also investigate a task of detecting communities given a single seed node. We extend the local spectral clustering algorithm into a heuristic parameter-free community detection method that easily scales to networks with more than 100 million nodes. The proposed method achieves 30 % relative improvement over current local clustering methods.

1,518 citations