Journal ArticleDOI

A New Approach for Mining Correlated Frequent Subgraphs

TL;DR: In this paper, the authors proposed a graph mining algorithm for extracting frequent subgraphs from graph datasets, a task that has proven crucial in numerous areas such as scientific research and computer vision.
Abstract: Nowadays, graph datasets have a vast number of applications. As a result, graph mining—mining graph datasets to extract frequent subgraphs—has proven to be crucial in numerous aspects. It ...
Citations
Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, the authors present a system that utilizes available public resources while optimizing revenue under predefined restrictions, especially in the parking management field, and design a data-driven time-series-based prediction system that can support dynamic pricing.
Abstract: Although urbanization benefits modern society and residents of urban cities, limited public resources—such as parking facilities—remain a problem. Parking pricing acts as a tool to adjust the available resources. How should parking pricing be used to maximize parking resource utilization while optimizing the parking revenue for parking management? In this paper, we present a system that utilizes available public resources while optimizing revenue with predefined restrictions, especially in the parking management field. More specifically, we design a data-driven time-series based prediction system, which can support dynamic pricing. Evaluation results show the effectiveness and practicality of our parking data analytics system for supporting parking facility management and dynamic pricing for parking applications.
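As a rough sketch of the kind of pipeline such a system could follow, the snippet below forecasts hourly occupancy with a naive seasonal average and maps predicted utilization to a price adjustment. The occupancy data, thresholds, and pricing rule are hypothetical illustrations, not the prediction model or pricing policy described in the paper.

```python
import numpy as np

def forecast_occupancy(history, period=24):
    """Naive seasonal forecast: average the occupancy observed at the
    same hour of the day over all previous days in the history."""
    history = np.asarray(history, dtype=float)
    days = history.reshape(-1, period)        # one row per day
    return days.mean(axis=0)                  # expected occupancy per hour

def dynamic_price(expected_occupancy, capacity, base_price=2.0,
                  low=0.5, high=0.85):
    """Raise the hourly rate when predicted utilization is high and
    lower it when utilization is low (illustrative thresholds)."""
    util = expected_occupancy / capacity
    prices = np.full_like(util, base_price)
    prices[util > high] = base_price * 1.5    # discourage overcrowding
    prices[util < low] = base_price * 0.75    # attract more drivers
    return prices

# Hypothetical two days of hourly occupancy counts for a 100-space facility.
rng = np.random.default_rng(42)
occupancy = rng.integers(20, 95, size=48)
print(dynamic_price(forecast_occupancy(occupancy), capacity=100))
```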

5 citations

Proceedings ArticleDOI
01 Aug 2022
TL;DR: This paper presents a data science solution for multi-dimensional analysis of traffic accident data that integrates heterogeneous data regarding vehicles, accidents and causality, and reuses past knowledge and information discovered from historical data for handling future situations.
Abstract: In the current data-driven era, large volumes of data of different dimensions are generated and collected at a rapid rate. Examples of these big data include transportation data (e.g., traffic accident data). Integration of different transportation data, as well as reuse of past knowledge and information on public transit, can serve social good (e.g., can help road users avoid traffic accidents). Multi-dimensional data analysis and mining help reveal factors associated with, or contributing to, traffic accidents. To manage this type of human-made disaster, we present in this paper a data science solution for multi-dimensional analysis of traffic accident data. It integrates heterogeneous data regarding vehicles, accidents and causality. It reuses past knowledge and information discovered from historical data for handling future situations. Evaluation on real-life accident data from the UK reveals some common conditions leading to serious and/or fatal accidents. It demonstrates the practicality of our solution in multi-dimensional analysis of traffic accident data, as well as the benefits of data integration and information (and knowledge) reuse, for disaster management in smart cities. Moreover, although we illustrate our solution on UK accident data, it is expected to be reusable for the analysis of traffic accidents, the support of disaster management, and the building of smart cities at other locations.
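A minimal sketch of the data integration and multi-dimensional roll-up such a solution performs is shown below, assuming UK-style accident and vehicle tables keyed by an accident index; the column names and records are hypothetical and do not reflect the actual schema used in the paper.

```python
import pandas as pd

# Hypothetical extracts of heterogeneous accident data; real column
# names differ across releases of the UK open data set.
accidents = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3", "A4"],
    "severity":       ["Fatal", "Slight", "Serious", "Slight"],
    "light":          ["Darkness", "Daylight", "Darkness", "Daylight"],
    "weather":        ["Raining", "Fine", "Raining", "Fine"],
})
vehicles = pd.DataFrame({
    "accident_index": ["A1", "A2", "A3", "A4"],
    "vehicle_type":   ["Car", "Motorcycle", "Car", "Car"],
})

# Integrate the heterogeneous tables on the shared key.
integrated = accidents.merge(vehicles, on="accident_index")

# Multi-dimensional analysis: which (light, weather) conditions are
# associated with serious or fatal accidents?
severe = integrated["severity"].isin(["Serious", "Fatal"])
summary = (integrated.assign(severe=severe)
                     .groupby(["light", "weather"])["severe"]
                     .mean()
                     .sort_values(ascending=False))
print(summary)
```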

4 citations

Proceedings ArticleDOI
22 Aug 2022
TL;DR: Evaluation results show that the new quantitative vertical Q-Eclat algorithm takes shorter runtime to mine quantitative frequent patterns than the existing MQA-M algorithm, which was built for quantitative horizontal frequent pattern mining.
Abstract: Frequent pattern mining is a popular technique in big data mining and analytics. It discovers frequently occurring sets of items (e.g., popular merchandise items, frequently co-occurring events) from big data found in numerous database-engineered applications. These frequent patterns can be discovered horizontally by transaction-centric mining algorithms or vertically by item-centric mining algorithms. Regardless of their mining direction (horizontal or vertical), traditional frequent pattern mining algorithms aim to discover Boolean frequent patterns in the sense that patterns capture the presence (or absence) of items within the discovered patterns. However, there are many real-life situations in which quantities of items within the patterns are important. For example, the quantity of items may also affect profits of selling the items within the discovered patterns. Hence, in this paper, we present an algorithm for vertical mining of interesting quantitative frequent patterns. This Q-Eclat algorithm first represents the big data as a collection of equivalence classes according to their prefix item labels. Each domain item is represented by one of these classes. Their corresponding item-centric sets capture (a) IDs of transactions containing the item, as well as (b) the quantity of that item in each transaction. With this representation, our algorithm then vertically mines quantitative frequent patterns. When compared with the existing MQA-M algorithm (which was built for quantitative horizontal frequent pattern mining), evaluation results show that our quantitative vertical Q-Eclat algorithm takes shorter runtime to mine quantitative frequent patterns.
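The vertical representation and Eclat-style join described above can be sketched as follows; this is an illustrative toy implementation over an assumed tiny quantitative transaction database, not the authors' Q-Eclat code.

```python
from itertools import combinations

# Hypothetical quantitative transactions: transaction ID -> {item: quantity}.
transactions = {
    1: {"a": 2, "b": 1, "c": 4},
    2: {"a": 1, "c": 2},
    3: {"b": 3, "c": 1},
    4: {"a": 2, "b": 2, "c": 3},
}
min_support = 2  # minimum number of supporting transactions

# Build the vertical (item-centric) representation:
# item -> {tid: quantity of the item in that transaction}.
vertical = {}
for tid, items in transactions.items():
    for item, qty in items.items():
        vertical.setdefault(item, {})[tid] = qty

# Eclat-style join: a 2-itemset's tidset is the intersection of its
# members' tidsets; per-transaction quantities are kept alongside for
# later interestingness measures (e.g., profit weighting).
frequent = {(item,): tids for item, tids in vertical.items()
            if len(tids) >= min_support}
for (x,), (y,) in combinations(sorted(frequent), 2):
    common = frequent[(x,)].keys() & frequent[(y,)].keys()
    if len(common) >= min_support:
        quantities = {tid: (vertical[x][tid], vertical[y][tid]) for tid in common}
        print((x, y), "support =", len(common), "quantities =", quantities)
```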

4 citations

References
Journal ArticleDOI
TL;DR: A survey of current research in the field of frequent subgraph mining is presented and solutions to address the main research issues are proposed.
Abstract: Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and proposes solutions to address the main research issues.
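As an illustration of the second research goal (counting the support of generated candidates), the sketch below counts in how many graphs of a small database a candidate occurs as a (node-induced) subgraph, using networkx's isomorphism matcher. The database and candidate graphs are hypothetical.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

# Hypothetical graph database of three small unlabelled graphs.
database = [
    nx.cycle_graph(4),        # square
    nx.path_graph(4),         # chain of 4 vertices
    nx.complete_graph(3),     # triangle
]

# A candidate produced by some candidate-generation scheme.
candidate = nx.path_graph(3)  # chain of 3 vertices

# Support counting: number of database graphs containing the candidate
# as a node-induced subgraph (up to isomorphism).
support = sum(
    GraphMatcher(graph, candidate).subgraph_is_isomorphic()
    for graph in database
)
min_support = 2
print("support:", support, "frequent:", support >= min_support)
```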

333 citations

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This work derives a novel one-pass, streaming graph partitioning algorithm and shows that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs.
Abstract: Balanced graph partitioning in the streaming setting is a key problem to enable scalable and efficient computations on massive graph data such as web graphs, knowledge graphs, and graphs arising in the context of online social networks. Two families of heuristics for graph partitioning in the streaming setting are in wide use: place the newly arrived vertex in the cluster with the largest number of neighbors or in the cluster with the least number of non-neighbors. In this work, we introduce a framework which unifies the two seemingly orthogonal heuristics and allows us to quantify the interpolation between them. More generally, the framework enables a well-principled design of scalable, streaming graph partitioning algorithms that are amenable to distributed implementations. We derive a novel one-pass, streaming graph partitioning algorithm and show that it yields significant performance improvements over previous approaches using an extensive set of real-world and synthetic graphs. Surprisingly, despite the fact that our algorithm is a one-pass streaming algorithm, we found its performance to be in many cases comparable to the de facto standard offline software METIS and in some cases even superior. For instance, for the Twitter graph with more than 1.4 billion edges, our method partitions the graph in about 40 minutes, achieving a balanced partition that cuts as few as 6.8% of edges, whereas METIS took more than 8.5 hours to produce a balanced partition that cuts 11.98% of edges. We also demonstrate the performance gains by using our graph partitioner while solving standard PageRank computation in a graph processing platform with respect to the communication cost and runtime.
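A compact sketch of this family of one-pass greedy rules is given below: each arriving vertex is scored by its neighbor affinity to every part minus a convex balance penalty. The scoring function, parameters, and toy graph are illustrative assumptions rather than the paper's exact objective.

```python
def stream_partition(adjacency, vertex_stream, k, alpha=1.5, gamma=1.5):
    """One-pass greedy streaming partitioner (illustrative).

    Each vertex is placed into the part that maximizes
    (neighbors already in the part) - alpha * gamma * |part|**(gamma - 1),
    i.e. a neighbor-affinity term minus a convex balance penalty.
    """
    parts = [set() for _ in range(k)]
    assignment = {}
    for v in vertex_stream:
        neighbours = adjacency.get(v, ())

        def score(i):
            affinity = sum(1 for u in neighbours if u in parts[i])
            penalty = alpha * gamma * len(parts[i]) ** (gamma - 1)
            return affinity - penalty

        best = max(range(k), key=score)
        parts[best].add(v)
        assignment[v] = best
    return assignment

# Hypothetical adjacency lists of a small graph, streamed vertex by vertex.
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(stream_partition(adjacency, sorted(adjacency), k=2))
```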

324 citations

Journal ArticleDOI
TL;DR: This work proposes a mixed integer programming model and develops a branch-and-price algorithm for routing trucks and drones in an integrated manner; experiments show the good computational performance of the proposed algorithm.
Abstract: The vehicle routing problem with drones (VRPD) is an extension of the classic capacitated vehicle routing problem, where not only trucks but drones are used to deliver parcels to customers. One distinctive feature of the VRPD is that a drone may travel with a truck, take off from its stop to serve customers, and land at a service hub to travel with another truck as long as the flying range and loading capacity limitations are satisfied. Routing trucks and drones in an integrated manner makes the problem much more challenging and different from classical vehicle routing literature. We propose a mixed integer programming model, and develop a branch-and-price algorithm. Extensive experiments are conducted on the instances randomly generated in a practical setting, and the results demonstrate the good computational performance of the proposed algorithm. We also conduct sensitivity analysis on a key factor that may affect the total cost of a solution.
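For orientation, the schematic below shows the kind of arc-based objective and customer-covering constraint that truck-and-drone formulations typically use; the sets K (trucks), D (drones), C (customers), A (arcs), the costs, and the binary arc variables are illustrative notation, and the flying-range, capacity, and truck-drone synchronization constraints of the authors' full model are omitted.

```latex
\begin{align}
  \min\;& \sum_{k \in K} \sum_{(i,j) \in A} c_{ij}\, x_{ijk}
          + \sum_{d \in D} \sum_{(i,j) \in A} \hat{c}_{ij}\, y_{ijd}
          && \text{(truck and drone travel cost)} \\
  \text{s.t.}\;& \sum_{k \in K} \sum_{j:(i,j) \in A} x_{ijk}
          + \sum_{d \in D} \sum_{j:(i,j) \in A} y_{ijd} \;\ge\; 1
          && \forall\, i \in C \quad \text{(every customer is served)} \\
  & x_{ijk},\, y_{ijd} \in \{0,1\}
          && \forall\,(i,j) \in A,\; k \in K,\; d \in D.
\end{align}
```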

216 citations

Journal ArticleDOI
TL;DR: The lack of robustness of this test is more serious than previously thought; by treating the WMW test as a two-sample T test on ranks, the results are explained through some undesirable properties of the rank transformation.
Abstract: The Wilcoxon-Mann-Whitney (WMW) test is often used to compare the means or medians of two independent, possibly nonnormal distributions. For this problem, the true significance level of the large sample approximate version of the WMW test is known to be sensitive to differences in the shapes of the distributions. Based on a wide ranging simulation study, our paper shows that the problem of lack of robustness of this test is more serious than is thought to be the case. In particular, small differences in variances and moderate degrees of skewness can produce large deviations from the nominal type I error rate. This is further exacerbated when the two distributions have different degrees of skewness. Other rank-based methods like the Fligner-Policello (FP) test and the Brunner-Munzel (BM) test perform similarly, although the BM test is generally better. By considering the WMW test as a two-sample T test on ranks, we explain the results by noting some undesirable properties of the rank transformation. In practice, the ranked samples should be examined and found to sufficiently satisfy reasonable symmetry and variance homogeneity before the test results are interpreted.
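The sensitivity described above is straightforward to reproduce with a small simulation. The sketch below estimates the rejection rate of scipy's Mann-Whitney U test for two populations with equal means but different variance and skewness; the sample sizes, distributions, and replication count are arbitrary choices for illustration, not the paper's simulation design.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
n, reps, alpha = 30, 5000, 0.05

def rejection_rate(draw_x, draw_y):
    """Fraction of simulated data sets in which the large-sample
    WMW test rejects at level alpha."""
    rejections = 0
    for _ in range(reps):
        _, p = mannwhitneyu(draw_x(), draw_y(), alternative="two-sided")
        rejections += p < alpha
    return rejections / reps

# Both populations have mean 0, but differ in variance and skewness:
# a symmetric normal versus a right-skewed, centred exponential.
same_shape = rejection_rate(lambda: rng.normal(0, 1, n),
                            lambda: rng.normal(0, 1, n))
diff_shape = rejection_rate(lambda: rng.normal(0, 1, n),
                            lambda: rng.exponential(2.0, n) - 2.0)

print(f"rejection rate, identical shapes: {same_shape:.3f}")  # close to 0.05
print(f"rejection rate, different shapes: {diff_shape:.3f}")  # typically inflated
```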

208 citations