Trawling the Web for emerging cyber-communities

doi:10.1016/S1389-1286(99)00040-7

Vol. 31, Issue 11, pp. 1481-1493

Abstract:

The Web harbors a large number of communities (groups of content creators sharing a common interest), each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities, those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl, a process we call trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and algorithmic engineering necessary to find structures that fit this notion, the challenges in handling such a huge data set, and the results of our experiment. © 1999 Published by Elsevier Science B.V. All rights reserved.
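The abstract does not spell out which graph structure signals a community. As a hedged illustration only, the sketch below assumes the signature is a small complete bipartite subgraph K_{i,j}: i "fan" pages that each link to all of the same j "center" pages. It first prunes edges by degree (a fan needs at least j out-links, a center at least i in-links), repeating until nothing changes, then enumerates every j-set of centers shared by at least i fans. The names `prune` and `find_cores` and the thresholds `i`, `j` are hypothetical; this is an in-memory toy, not the paper's pipeline.

```python
from collections import defaultdict
from itertools import combinations

# Assumption (not stated in the abstract): a community's signature is a
# complete bipartite subgraph K_{i,j}, i.e. i fans that all link to the same j centers.


def prune(edges, i, j):
    """Iteratively drop links whose fan has out-degree < j or whose center has in-degree < i.

    edges: set of (fan, center) pairs, each a directed link fan -> center.
    Repeats until a fixed point, because removing one link can push
    other pages below the thresholds.
    """
    edges = set(edges)
    while True:
        out_deg = defaultdict(int)
        in_deg = defaultdict(int)
        for u, v in edges:
            out_deg[u] += 1
            in_deg[v] += 1
        kept = {(u, v) for (u, v) in edges if out_deg[u] >= j and in_deg[v] >= i}
        if kept == edges:
            return kept
        edges = kept


def find_cores(edges, i, j):
    """Return {center_tuple: fan_set} for every j-set of centers linked to by >= i common fans."""
    out_links = defaultdict(set)
    for u, v in prune(edges, i, j):
        out_links[u].add(v)
    fans_of = defaultdict(set)
    for fan, targets in out_links.items():
        # Every j-subset of this fan's targets is a candidate center set.
        for centers in combinations(sorted(targets), j):
            fans_of[centers].add(fan)
    return {c: f for c, f in fans_of.items() if len(f) >= i}


if __name__ == "__main__":
    # Toy crawl: fans a, b, c all link to x and y; d and e are noise.
    links = {("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"),
             ("c", "x"), ("c", "y"), ("d", "x"), ("e", "z")}
    for centers, fans in find_cores(links, i=3, j=2).items():
        print(centers, sorted(fans))  # prints ('x', 'y') ['a', 'b', 'c']
```

Enumerating every j-subset of a fan's out-links grows combinatorially with out-degree, which is why degree pruning runs first here; a crawl-scale version would need much more aggressive pruning and out-of-core processing than this sketch attempts.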

Citations

Evolution of networks

Sergey N. Dorogovtsev et al.

Advances in Physics, 01 Jun 2002

TL;DR: The recent rapid progress in the statistical physics of evolving networks is reviewed, and how growing networks self-organize into scale-free structures is discussed, and the role of the mechanism of preferential linking is investigated.

Measurement and analysis of online social networks

Alan Mislove et al.

TL;DR: This paper examines data gathered from four popular online social networks (Flickr, YouTube, LiveJournal, and Orkut) and reports that the in-degree of user nodes tends to match the out-degree, that the networks contain a densely connected core of high-degree nodes, and that this core links small groups of strongly clustered, low-degree nodes at the fringes of the network.

Why we twitter: understanding microblogging usage and communities

Akshay Java et al.

TL;DR: People use microblogging to talk about their daily activities and to seek or share information; analyzing user intentions at the community level shows how users with similar intentions connect with each other.

The wealth of networks: how social production transforms markets and freedom

Yochai Benkler

Graphs over time: densification laws, shrinking diameters and possible explanations

Jure Leskovec et al.

TL;DR: A new graph generator is provided, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.

Related Papers (5)

Authoritative sources in a hyperlinked environment

The anatomy of a large-scale hypertextual Web search engine

Graph structure in the Web

Emergence of Scaling in Random Networks

Collective dynamics of small-world networks