A Comparative Study of Community Detection Techniques for Large Evolving Graphs
Summary
1 Introduction
- Community detection techniques in complex networks are a well-covered topic in academic literature nowadays as identifying meaningful substructures in complex networks has numerous applications in a vast variety of fields ranging from biology, mathematics, and computer science to finance, economics and sociology.
- Nonetheless, newly proposed community detection methods for dynamic graphs are typically compared with only very few methods in settings aiming to demonstrate superiority of the proposed method.
- The setup and results of these comparisons might contain an unconscious bias towards one’s own algorithm.
- This is not surprising given the many different aspects which come into play when comparing DCD methods: different underlying network models, different community definitions, different temporal segments used for detecting communities, different community evolution events tracked etc.
- The results show that no single best-performing community detection technique exists.
2.1 Algorithm selection
- Given the soundness and completeness of Cazabet and Rossetti’s classification framework [1], the authors opt for using this framework as a steering wheel in the process of method selection.
- In an applied setting, neglecting previous states of the network oftentimes leads to sub-optimal solutions.
- For each subcategory, at least one representative is chosen.
- Additionally, the list of compared algorithms is complemented by more recently published techniques, which, in turn, are also classified in the four previously mentioned categories.
- Secondly, the algorithm would preferably be able to detect overlapping communities to ensure a realistic partitioning in social network problems.
2.3 Empirical analysis setup
- Given their different characteristics, and to provide a fair comparison, the selected DCD methods are benchmarked on both synthetic and real-life datasets.
- A larger value makes the community sizes relatively larger and more dispersed, while a smaller value makes the differences between community sizes smaller and more uniform.
- The rate of node appearance and vanishing is fixed to 0.05 and 0.02 respectively.
- To measure the relative performance of the different algorithms, two metrics were chosen.
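The fixed appearance and vanishing rates can be illustrated with a small simulation. This is a minimal sketch, not the benchmark generator used in the paper: it assumes the rates act as independent per-step probabilities per existing node, and the function name `simulate_node_churn` and its parameters are hypothetical.

```python
import random


def simulate_node_churn(initial_nodes, steps, p_appear=0.05, p_vanish=0.02, seed=0):
    """Return the node count after each step of a simple churn process.

    Assumption (not from the paper): at every step each existing node
    vanishes with probability p_vanish, and each existing node
    independently triggers one new node with probability p_appear.
    """
    rng = random.Random(seed)
    nodes = set(range(initial_nodes))
    next_id = initial_nodes
    history = [len(nodes)]
    for _ in range(steps):
        current = sorted(nodes)  # fixed order keeps runs reproducible
        vanished = {n for n in current if rng.random() < p_vanish}
        n_new = sum(1 for _ in current if rng.random() < p_appear)
        nodes -= vanished
        for _ in range(n_new):
            nodes.add(next_id)  # new nodes always get fresh ids
            next_id += 1
        history.append(len(nodes))
    return history


if __name__ == "__main__":
    print(simulate_node_churn(initial_nodes=1000, steps=10))
```

Under these assumptions the expected node count grows by a factor of 1 + 0.05 - 0.02 = 1.03 per step, so a synthetic graph slowly gains nodes over time.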
3.1 Algorithm selection
- It is more time-efficient than its modularity-based peers that do not rely on community updating.
- The algorithms belonging to this category all consider a list of historic network changes in order to update the network’s partitioning.
- FacetNet is used as a benchmark approach in many papers introducing algorithms with similar capabilities.
- DEMON, introduced in [23], is a technique that can hierarchically detect overlapping communities but, unlike all previous methods, cannot identify community evolution events.
3.2 Qualitative results
- The first aspect that stands out is the larger presence of algorithms that update communities by a set of rules (2.2), not only in the final selection but also among the more recently proposed methods, such as AFOCS, HOCTracker, OLCPM and DOCET, which are also more focused on performing in dynamic social environments.
- Finally, TILES, AFOCS, HOCTracker and DOCET appear to be the most complex algorithms as their computation time is expected to grow quadratically with the number of nodes, which is particularly problematic for large graphs.
- Firstly, it is remarkable that the event resurgence cannot be detected by any of the selected algorithms, nor by any of the other algorithms that were analyzed, even though the event has been included in the literature, among others by [1].
- It might be the case that continue is implied/detected when no event occurs and is therefore not mentioned by the authors.
- Secondly, the algorithms that were included in addition to the survey by [1] because they were more recent and possessed good features for social network community detection, such as OLCPM, HOCTracker and DOCET, can detect most of the community evolution events.
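To make the notion of community evolution events concrete, the sketch below labels events between two consecutive snapshots by matching communities on Jaccard overlap. It is an illustrative assumption, not the procedure of any algorithm compared in the paper: the function names, the threshold of 0.3, and the event labels other than continue (birth, death, merge, split) are chosen here for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two node sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_events(prev, curr, threshold=0.3):
    """Label simple evolution events between two snapshots.

    prev, curr: dicts mapping a community id to its set of node ids.
    Returns a list of (event, previous_ids, current_ids) tuples.
    The threshold is a hypothetical tuning parameter.
    """
    # Link communities across snapshots when their overlap is large enough.
    succ = {p: [c for c in curr if jaccard(prev[p], curr[c]) >= threshold]
            for p in prev}
    pred = {c: [p for p in prev if jaccard(prev[p], curr[c]) >= threshold]
            for c in curr}

    events = []
    for p, cs in succ.items():
        if not cs:
            events.append(("death", [p], []))
        elif len(cs) > 1:
            events.append(("split", [p], cs))
    for c, ps in pred.items():
        if not ps:
            events.append(("birth", [], [c]))
        elif len(ps) > 1:
            events.append(("merge", ps, [c]))
        elif len(succ[ps[0]]) == 1:
            # One-to-one match: the community simply carries on.
            events.append(("continue", ps, [c]))
    return events


if __name__ == "__main__":
    prev = {"a": {1, 2, 3, 4, 5, 6}, "b": {7, 8, 9}, "c": {10, 11}}
    curr = {"x": {1, 2, 3}, "y": {4, 5, 6}, "z": {7, 8, 9, 12}, "w": {20, 21}}
    for event in classify_events(prev, curr):
        print(event)
```

A pairwise comparison of adjacent snapshots like this one cannot label a resurgence, since that would require remembering communities from earlier, non-adjacent snapshots.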
5 Conclusion
- Dynamic community detection has numerous applications in different fields and as such is extensively studied in the current literature.
- The qualitative analysis included an overall set of characteristics relevant for community detection such as community definition used, the ability to track community life-cycle events, overlapping and hierarchical communities, and the time complexity.
- For the empirical analysis, several limiting factors, such as unavailable or poorly documented source code and the inability to run methods on large graphs, led to a narrower set of compared methods.
- Nevertheless, 900 synthetic, evolving graphs of various sizes and community size distributions and the most frequently used real-world DBLP dataset were used for a thorough analysis.
Citations
Cites methods from "A Comparative Study of Community De..."
...To the best of our knowledge, a single paper has been published so far comparing empirically dynamic community detection algorithms: in [7], 5 methods have been tested on RDyn benchmark [28]....
References
"A Comparative Study of Community De..." refers background in this paper
...Although the most prominently used benchmark graphs Girvan-Newman (GN) [4] and Lancichinetti-Fortunato-Radicchi (LFR) [2] are not suited for temporal community discovery, to this end, their extensions in [6] and [5] respectively, were proposed....
Frequently Asked Questions (2)
Q2. What have the authors stated for future work in "A comparative study of community detection techniques for large evolving graphs"?
For future work, the authors envision an even more extensive empirical evaluation.