Journal ArticleDOI

A New Approach for Mining Correlated Frequent Subgraphs

TL;DR: In this paper, the authors proposed a graph mining algorithm for extracting frequent subgraphs from graph datasets, a task that has proven crucial in numerous areas such as scientific research and computer vision.
Abstract: Nowadays, graph datasets have a vast number of applications. As a result, graph mining—mining graph datasets to extract frequent subgraphs—has proven to be crucial in numerous aspects. It ...
Citations
Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, the authors present a system that utilizes available public resources while optimizing revenue under predefined restrictions, especially in the parking management field, and design a data-driven time-series-based prediction system that can support dynamic pricing.
Abstract: Although urbanization benefits modern society and residents of urban cities, limited public resources—such as parking facilities—remain a problem. Parking pricing acts as a tool to adjust the available resources. How should parking pricing be used to maximize parking resource utilization while optimizing the parking revenue for parking management? In this paper, we present a system that utilizes available public resources while optimizing revenue with predefined restrictions, especially in the parking management field. More specifically, we design a data-driven time-series based prediction system, which can support dynamic pricing. Evaluation results show the effectiveness and practicality of our parking data analytics system for supporting parking facility management and dynamic pricing for parking applications.
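As a rough sketch of the kind of pipeline such a system could follow, the snippet below forecasts hourly occupancy with a naive seasonal average and maps predicted utilization to a price adjustment. The occupancy data, thresholds, and pricing rule are hypothetical illustrations, not the prediction model or pricing policy described in the paper.

```python
import numpy as np

def forecast_occupancy(history, period=24):
    """Naive seasonal forecast: average the occupancy observed at the
    same hour of the day over all previous days in the history."""
    history = np.asarray(history, dtype=float)
    days = history.reshape(-1, period)        # one row per day
    return days.mean(axis=0)                  # expected occupancy per hour

def dynamic_price(expected_occupancy, capacity, base_price=2.0,
                  low=0.5, high=0.85):
    """Raise the hourly rate when predicted utilization is high and
    lower it when utilization is low (illustrative thresholds)."""
    util = expected_occupancy / capacity
    prices = np.full_like(util, base_price)
    prices[util > high] = base_price * 1.5    # discourage overcrowding
    prices[util < low] = base_price * 0.75    # attract more drivers
    return prices

# Hypothetical two days of hourly occupancy counts for a 100-space facility.
rng = np.random.default_rng(42)
occupancy = rng.integers(20, 95, size=48)
print(dynamic_price(forecast_occupancy(occupancy), capacity=100))
```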

5 citations

Proceedings ArticleDOI
01 Aug 2022
TL;DR: This paper presents a data science solution for multi-dimensional analysis of traffic accident data that integrates heterogeneous data regarding vehicles, accidents and causality, and reuses past knowledge and information discovered from historical data for handling future situations.
Abstract: In the current data-driven era, large volumes of data of different dimensions are generated and collected at a rapid rate. Examples of these big data include transportation data (e.g., traffic accident data). Integration of different transportation data, as well as reuse of past knowledge and information on public transit, can serve social good (e.g., can help road users avoid traffic accidents). Multi-dimensional data analysis and mining help reveal factors associated with, or contributing to, traffic accidents. To manage this type of human-made disaster, we present in this paper a data science solution for multi-dimensional analysis of traffic accident data. It integrates heterogeneous data regarding vehicles, accidents and causality. It reuses past knowledge and information discovered from historical data for handling future situations. Evaluation on real-life accident data from the UK reveals some common conditions leading to serious and/or fatal accidents. It demonstrates the practicality of our solution in multi-dimensional analysis of traffic accident data, as well as the benefits of data integration and information (and knowledge) reuse, for disaster management in smart cities. Moreover, although we illustrate our solution on UK accident data, it is expected to be reusable for the analysis of traffic accidents, the support of disaster management, and the building of smart cities at other locations.
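A minimal sketch of the data integration and multi-dimensional roll-up such a solution performs is shown below, assuming UK-style accident and vehicle tables keyed by an accident index; the column names and records are hypothetical and do not reflect the actual schema used in the paper.

```python
import pandas as pd

# Hypothetical extracts of heterogeneous accident data; real column
# names differ across releases of the UK open data set.
accidents = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3", "A4"],
    "severity":       ["Fatal", "Slight", "Serious", "Slight"],
    "light":          ["Darkness", "Daylight", "Darkness", "Daylight"],
    "weather":        ["Raining", "Fine", "Raining", "Fine"],
})
vehicles = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3", "A4"],
    "vehicle_type":   ["Car", "Motorcycle", "Car", "Car"],
})

# Integrate the heterogeneous tables on the shared key.
integrated = accidents.merge(vehicles, on="accident_index")

# Multi-dimensional analysis: which (light, weather) conditions are
# associated with serious or fatal accidents?
severe = integrated["severity"].isin(["Serious", "Fatal"])
summary = (integrated.assign(severe=severe)
                     .groupby(["light", "weather"])["severe"]
                     .mean()
                     .sort_values(ascending=False))
print(summary)
```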

4 citations

Proceedings ArticleDOI
22 Aug 2022
TL;DR: Evaluation results show that the new quantitative vertical Q-Eclat algorithm takes shorter runtime to mine quantitative frequent patterns than the existing MQA-M algorithm, which was built for quantitative horizontal frequent pattern mining.
Abstract: Frequent pattern mining is a popular technique in big data mining and analytics. It discovers frequently occurring sets of items (e.g., popular merchandise items, frequently co-occurring events) from big data found in numerous database-engineered applications. These frequent patterns can be discovered horizontally by transaction-centric mining algorithms or vertically by item-centric mining algorithms. Regardless of their mining direction (horizontal or vertical), traditional frequent pattern mining algorithms aim to discover Boolean frequent patterns in the sense that patterns capture the presence (or absence) of items within the discovered patterns. However, there are many real-life situations in which quantities of items within the patterns are important. For example, the quantity of items may also affect profits of selling the items within the discovered patterns. Hence, in this paper, we present an algorithm for vertical mining of interesting quantitative frequent patterns. This Q-Eclat algorithm first represents the big data as a collection of equivalence classes according to their prefix item labels. Each domain item is represented by one of these classes. Their corresponding item-centric sets capture (a) IDs of transactions containing the item, as well as (b) the quantity of that item in each transaction. With this representation, our algorithm then vertically mines quantitative frequent patterns. When compared with the existing MQA-M algorithm (which was built for quantitative horizontal frequent pattern mining), evaluation results show that our quantitative vertical Q-Eclat algorithm takes shorter runtime to mine quantitative frequent patterns.
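The vertical representation and Eclat-style join described above can be sketched as follows; this is an illustrative toy implementation over an assumed tiny quantitative transaction database, not the authors' Q-Eclat code.

```python
from itertools import combinations

# Hypothetical quantitative transactions: transaction ID -> {item: quantity}.
transactions = {
    1: {"a": 2, "b": 1, "c": 4},
    2: {"a": 1, "c": 2},
    3: {"b": 3, "c": 1},
    4: {"a": 2, "b": 2, "c": 3},
}
min_support = 2  # minimum number of supporting transactions

# Build the vertical (item-centric) representation:
# item -> {tid: quantity of the item in that transaction}.
vertical = {}
for tid, items in transactions.items():
    for item, qty in items.items():
        vertical.setdefault(item, {})[tid] = qty

# Eclat-style join: a 2-itemset's tidset is the intersection of its
# members' tidsets; per-transaction quantities are kept alongside for
# later interestingness measures (e.g., profit weighting).
frequent = {(item,): tids for item, tids in vertical.items()
            if len(tids) >= min_support}
for (x,), (y,) in combinations(sorted(frequent), 2):
    common = frequent[(x,)].keys() & frequent[(y,)].keys()
    if len(common) >= min_support:
        quantities = {tid: (vertical[x][tid], vertical[y][tid]) for tid in common}
        print((x, y), "support =", len(common), "quantities =", quantities)
```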

4 citations

References
Journal ArticleDOI
TL;DR: A survey of current research in the field of frequent subgraph mining is presented and solutions to address the main research issues are proposed.
Abstract: Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.
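As an illustration of the second research goal (counting the support of generated candidates), the sketch below counts in how many graphs of a small database a candidate occurs as a (node-induced) subgraph, using networkx's isomorphism matcher. The database and candidate graphs are hypothetical.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

# Hypothetical graph database of three small unlabelled graphs.
database = [
    nx.cycle_graph(4),        # square
    nx.path_graph(4),         # chain of 4 vertices
    nx.complete_graph(3),     # triangle
]

# A candidate produced by some candidate-generation scheme.
candidate = nx.path_graph(3)  # chain of 3 vertices

# Support counting: number of database graphs containing the candidate
# as a node-induced subgraph (up to isomorphism).
support = sum(
    GraphMatcher(graph, candidate).subgraph_is_isomorphic()
    for graph in database
)
min_support = 2
print("support:", support, "frequent:", support >= min_support)
```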

333 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This work derives a novel one-pass, streaming graph partitioning algorithm and shows that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs.
Abstract: Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors or in the cluster with the least number of non-neighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas METIS took more than 8.5 hours to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform with respect to the communication cost and runtime.
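A compact sketch of this family of one-pass greedy rules is given below: each arriving vertex is scored by its neighbor affinity to every part minus a convex balance penalty. The scoring function, parameters, and toy graph are illustrative assumptions rather than the paper's exact objective.

```python
def stream_partition(adjacency, vertex_stream, k, alpha=1.5, gamma=1.5):
    """One-pass greedy streaming partitioner (illustrative).

    Each vertex is placed into the part that maximizes
    (neighbors already in the part) - alpha * gamma * |part|**(gamma - 1),
    i.e. a neighbor-affinity term minus a convex balance penalty.
    """
    parts = [set() for _ in range(k)]
    assignment = {}
    for v in vertex_stream:
        neighbours = adjacency.get(v, ())

        def score(i):
            affinity = sum(1 for u in neighbours if u in parts[i])
            penalty = alpha * gamma * len(parts[i]) ** (gamma - 1)
            return affinity - penalty

        best = max(range(k), key=score)
        parts[best].add(v)
        assignment[v] = best
    return assignment

# Hypothetical adjacency lists of a small graph, streamed vertex by vertex.
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(stream_partition(adjacency, sorted(adjacency), k=2))
```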

324 citations

Journal ArticleDOI
TL;DR: This work proposes a mixed integer programming model and develops a branch-and-price algorithm for routing trucks and drones in an integrated manner; experiments show the good computational performance of the proposed algorithm.
Abstract: The vehicle routing problem with drones (VRPD) is an extension of the classic capacitated vehicle routing problem, where not only trucks but drones are used to deliver parcels to customers. One distinctive feature of the VRPD is that a drone may travel with a truck, take off from its stop to serve customers, and land at a service hub to travel with another truck as long as the flying range and loading capacity limitations are satisfied. Routing trucks and drones in an integrated manner makes the problem much more challenging and different from classical vehicle routing literature. We propose a mixed integer programming model, and develop a branch-and-price algorithm. Extensive experiments are conducted on the instances randomly generated in a practical setting, and the results demonstrate the good computational performance of the proposed algorithm. We also conduct sensitivity analysis on a key factor that may affect the total cost of a solution.
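For orientation, the schematic below shows the kind of arc-based objective and customer-covering constraint that truck-and-drone formulations typically use; the sets K (trucks), D (drones), C (customers), A (arcs), the costs, and the binary arc variables are illustrative notation, and the flying-range, capacity, and truck-drone synchronization constraints of the authors' full model are omitted.

```latex
\begin{align}
  \min\;& \sum_{k \in K} \sum_{(i,j) \in A} c_{ij}\, x_{ijk}
          + \sum_{d \in D} \sum_{(i,j) \in A} \hat{c}_{ij}\, y_{ijd}
          && \text{(truck and drone travel cost)} \\
  \text{s.t.}\;& \sum_{k \in K} \sum_{j:(i,j) \in A} x_{ijk}
          + \sum_{d \in D} \sum_{j:(i,j) \in A} y_{ijd} \;\ge\; 1
          && \forall\, i \in C \quad \text{(every customer is served)} \\
  & x_{ijk},\, y_{ijd} \in \{0,1\}
          && \forall\,(i,j) \in A,\; k \in K,\; d \in D.
\end{align}
```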

216 citations

Journal ArticleDOI
TL;DR: The lack of robustness of this test is more serious than previously thought; by treating the WMW test as a two-sample T test on ranks, the results are explained through some undesirable properties of the rank transformation.
Abstract: The Wilcoxon-Mann-Whitney (WMW) test is often used to compare the means or medians of two independent, possibly nonnormal distributions. For this problem, the true significance level of the large sample approximate version of the WMW test is known to be sensitive to differences in the shapes of the distributions. Based on a wide ranging simulation study, our paper shows that the problem of lack of robustness of this test is more serious than is thought to be the case. In particular, small differences in variances and moderate degrees of skewness can produce large deviations from the nominal type I error rate. This is further exacerbated when the two distributions have different degrees of skewness. Other rank-based methods like the Fligner-Policello (FP) test and the Brunner-Munzel (BM) test perform similarly, although the BM test is generally better. By considering the WMW test as a two-sample T test on ranks, we explain the results by noting some undesirable properties of the rank transformation. In practice, the ranked samples should be examined and found to sufficiently satisfy reasonable symmetry and variance homogeneity before the test results are interpreted.
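The sensitivity described above is straightforward to reproduce with a small simulation. The sketch below estimates the rejection rate of scipy's Mann-Whitney U test for two populations with equal means but different variance and skewness; the sample sizes, distributions, and replication count are arbitrary choices for illustration, not the paper's simulation design.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
n, reps, alpha = 30, 5000, 0.05

def rejection_rate(draw_x, draw_y):
    """Fraction of simulated data sets in which the large-sample
    WMW test rejects at level alpha."""
    rejections = 0
    for _ in range(reps):
        _, p = mannwhitneyu(draw_x(), draw_y(), alternative="two-sided")
        rejections += p < alpha
    return rejections / reps

# Both populations have mean 0, but differ in variance and skewness:
# a symmetric normal versus a right-skewed, centred exponential.
same_shape = rejection_rate(lambda: rng.normal(0, 1, n),
                            lambda: rng.normal(0, 1, n))
diff_shape = rejection_rate(lambda: rng.normal(0, 1, n),
                            lambda: rng.exponential(2.0, n) - 2.0)

print(f"rejection rate, identical shapes: {same_shape:.3f}")  # close to 0.05
print(f"rejection rate, different shapes: {diff_shape:.3f}")  # typically inflated
```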

208 citations