# A novel graph clustering algorithm based on discrete-time quantum random walk

01 Jan 2017 · pp. 361–389

TL;DR: This chapter explains how the quantum random walk helps in graph-based clustering and proposes a new quantum clustering algorithm based on the discrete-time quantum random walk, which finds the clusters from a given adjacency matrix of a graph.

Abstract: Clustering is an unsupervised learning task that partitions data into segments. Data are grouped by identifying common characteristics, labeled as similarities, among the data items. Graph clustering is a tool needed in many computer applications, such as network routing, analysis of social networks, computer vision, and VLSI physical design. This chapter explains how the quantum random walk helps in graph-based clustering, and we propose a new quantum clustering algorithm. The proposed algorithm is based on the discrete-time quantum random walk, which finds the clusters from a given adjacency matrix of a graph. We give a quantum circuit model and a Quantum Computing Language-based simulation of our algorithm and illustrate its faster rate of convergence. Simulation results for experimental graphs show that the proposed algorithm achieves an exponential speedup over existing classical algorithms.
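As a point of reference for the walk the chapter builds on, the following is a minimal classical simulation of a coined (Hadamard) discrete-time quantum walk on a cycle graph. The function name, the cycle topology, and the balanced initial coin state are illustrative assumptions for this sketch, not the chapter's actual circuit.

```python
import numpy as np

def hadamard_walk_on_cycle(n_nodes, steps):
    """Coined discrete-time quantum walk on an n-node cycle graph.

    The walker state is an (n_nodes, 2) complex array: one amplitude per
    (position, coin) pair. Each step applies a Hadamard coin at every node,
    then shifts the coin-0 amplitude one node one way around the cycle and
    the coin-1 amplitude the other way.
    """
    H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    psi = np.zeros((n_nodes, 2), dtype=complex)
    psi[0] = np.array([1, 1j], dtype=complex) / np.sqrt(2)  # balanced coin
    for _ in range(steps):
        psi = psi @ H.T                           # coin flip at every node
        psi = np.stack([np.roll(psi[:, 0], -1),   # coin 0 steps "left"
                        np.roll(psi[:, 1], +1)],  # coin 1 steps "right"
                       axis=1)
    # Probability of finding the walker at each node.
    return (np.abs(psi) ** 2).sum(axis=1)
```

Unlike the classical random walk on the same cycle, which spreads diffusively, the quantum walk spreads ballistically; faster mixing of this kind is what quantum-walk-based clustering exploits.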

##### Citations



TL;DR: In this paper, the authors assessed possible sources of contamination affecting groundwater quality in the basement rocks of Osun State, South-Western Nigeria, using the multivariate analyses of Principal Component Analysis (PCA) and Cluster Analysis (CA).

Abstract: Groundwater is a major source of drinking water in many rural and urban areas of developing nations. Pollution of groundwater from diverse sources is an issue of concern due to inherent health problems. This study assessed possible sources of contamination affecting groundwater quality in the basement rocks of Osun State, South-Western Nigeria, using the multivariate analyses of Principal Component Analysis (PCA) and Cluster Analysis (CA). Secondary data from 536 wells across the 30 Local Government Areas in the State were collected from the Rural Water and Environmental Sanitation Agency (RUWESA). The groundwater data include pH, temperature, turbidity, oxido-reduction potential, total dissolved solids, electrical conductivity, total alkalinity, magnesium hardness, calcium hardness, total hardness, free chlorine, total chlorine, chloride, fluoride, nitrate, nitrite, iron, manganese and zinc. The data were subjected to simple and inferential statistics using the Statistical Package for Social Sciences (SPSS v. 21.0). The mean values of parameters such as Mn and nitrate were higher than the World Health Organisation (WHO) limits of 0.4 and 10 mg/L, respectively. The results of the PCA and CA revealed possible sources of pollutants affecting groundwater quality: weathering of bedrock, leachate from septic tanks and dumpsites, runoff of materials, hardness, nutrients from agricultural lands, and chlorine pollution.
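The PCA step used in studies like this one can be sketched with a numpy-only decomposition. The function below is an illustrative assumption (the study itself used SPSS): it shows how component scores and explained-variance ratios fall out of an SVD of the centered data matrix.

```python
import numpy as np

def pca(X, n_components=2):
    """Principal Component Analysis via SVD of the centered data matrix.

    Returns the sample scores on the first `n_components` components and
    the fraction of total variance each of those components explains.
    """
    Xc = X - X.mean(axis=0)                        # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T              # projected samples
    explained = (S[:n_components] ** 2) / (S ** 2).sum()
    return scores, explained
```

In a water-quality setting, each row of `X` would be a well and each column a measured parameter (pH, turbidity, nitrate, ...); wells that load together on the same components are then candidates for a common pollution source.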

8 citations

### Cites methods from "A novel graph clustering algorithm ..."



TL;DR: This study constructed a novel methodological framework by evaluating different machine learning models to group textual information based on pictorial content to uncover the destination image based on Instagram photographs.

Abstract: Symbols are powerful in branding and marketing to represent tourist attractions. By bridging semiotics, marketing, and data science in the tourism context, this study uncovers the destination image based on Instagram photographs. This study constructed a novel methodological framework by evaluating different machine learning models to group textual information based on pictorial content. The results highlighted specific destination image clusters such as the wilderness and spirituality of alpine experiences. This information facilitates marketers' understanding of tourists’ preferences and movement. It also discloses blind spots that are less promoted by the marketers.

8 citations


TL;DR: The experimental results showed that the combination of DBSCAN clustering and parallel indexing makes the B3CF-trees outperform the latest real-data indexing methods in terms of quality, and that the use of parallelism during kNN search significantly reduces the retrieval time of the similarity query search.

Abstract: In recent years, the large amount of heterogeneous data generated by Internet of Things (IoT) sensors and devices has made recording and retrieval tasks much more difficult, and most state-of-the-art methods fail to meet the new IoT requirements. This article proposes a new, efficient method that simplifies data indexing and enhances the quality and speed of similarity query search in the IoT environment. In this method, the fog layer is divided into two levels. In the clustering fog level, the incremental density-based spatial clustering of applications with noise (DBSCAN) algorithm separates the collected data into clusters in order to minimize data overlap during parallel index construction. Parallelism is also used in the indexing fog level to speed up the similarity-based search process. The data in each cluster are indexed using our proposed structure called the B3CF-tree (binary tree based on containers at the cloud-clusters fog computing level). The objects in the leaf nodes of the B3CF-trees are, finally, stored in the cloud. Using this approach for computing multiple datasets, the retrieval time of the similarity search is significantly reduced. The experimental results showed that the combination of DBSCAN clustering and parallel indexing makes the B3CF-trees outperform the latest real-data indexing methods. For example, in terms of quality, the B3CF-tree has the smallest number of nodes and leaf nodes. In addition, the use of parallelism during kNN search significantly reduced the retrieval time of the similarity query search.
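The DBSCAN step at the clustering fog level can be illustrated with a plain, single-machine sketch of the classic algorithm. The function name and parameters (`eps`, `min_pts`) follow the usual DBSCAN conventions and are assumptions for this sketch; the article's version is incremental and fog-distributed.

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=4):
    """Plain DBSCAN: returns a cluster id per point, or -1 for noise."""
    n = len(X)
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return np.flatnonzero(np.linalg.norm(X - X[i], axis=1) <= eps)

    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        seeds = list(neighbors(i))
        if len(seeds) < min_pts:
            continue              # noise, unless later claimed as a border point
        labels[i] = cluster
        while seeds:              # expand the cluster from its core points
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                nbrs = neighbors(j)
                if len(nbrs) >= min_pts:   # j is itself a core point
                    seeds.extend(nbrs)
            if labels[j] == -1:
                labels[j] = cluster
        cluster += 1
    return labels
```

Because DBSCAN discovers the number of clusters from density alone, it suits the article's goal of partitioning incoming IoT data without fixing a cluster count in advance.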

4 citations


TL;DR: This study presents one of the first attempts to apply a DP-GMM to full automation of Operational Modal Analysis (OMA), validated using field test data from a large-scale operating cable-stayed bridge, which has two closely-spaced modes around 3 Hz.

Abstract: The development of a fully automated system identifier, without the need for human intervention, is a key step for real-time vibration-based Structural Health Monitoring (SHM). In this paper a novel approach based on the Dirichlet Process Gaussian Mixture Model (DP-GMM) is developed in order to analyze the stabilization diagram. The aim is to separate the true physical modes from the mathematically spurious modes in a fully automated manner, while eliminating the need for any manually specified parameters or thresholds. The parametric Covariance-Driven Stochastic Subspace Identification (SSI-Cov) is adopted to estimate the modal parameters and establish the initial stabilization diagram. From there, a two-stage algorithm involving a DP-GMM is proposed to non-parametrically perform an automated cleaning of the stabilization diagram. The contributions of the paper are five-fold: (1) A probabilistic approach based on a DP-GMM is proposed to analyze the stabilization diagram. To the best knowledge of the authors, this study presents one of the first attempts to apply a DP-GMM to full automation of Operational Modal Analysis (OMA). The method is validated using the field test data from a large-scale operating cable-stayed bridge, which has two closely-spaced modes around 3 Hz. Not only are these closely-spaced modes consistently identified, but the performance of the method with respect to the problem of missing modes is compared against a reference method based on the conventional multi-stage clustering technique used in OMA, wherein the superior performance of the proposed method is demonstrated. (2) The method does not require specification of any threshold or parameter at any stage of the algorithm for cleaning the stabilization diagram, making the approach a candidate for robust and fully automated modal identification. (3) Compared to many conventional multi-stage clustering techniques, the proposed approach is computationally efficient, as intelligent updates are made to the model using multiple linear algebra properties. (4) New feature vectors are developed which are justified using a combination of mathematical rigor, visual understanding, and engineering intuition. (5) Due to the probabilistic nature of the method, the identification results are accompanied by uncertainty bounds. Several mathematical proofs are presented to explain the observed behavior of the uncertainty bounds.

3 citations


TL;DR: In this paper, a machine learning K-means algorithm is applied to data on seven aerosol properties from a global aerosol simulation using EMAC-MADE3 to partition the aerosol properties across the global atmosphere into specific aerosol regimes.

Abstract: A machine learning K-means algorithm is applied to data on seven aerosol properties from a global aerosol simulation using EMAC-MADE3. The aim is to partition the aerosol properties across the global atmosphere into specific aerosol regimes. K-means is an unsupervised machine learning method with the advantage that an a priori definition of the aerosol classes is not required. Using K-means, we are able to quantitatively define global aerosol regimes, so-called aerosol clusters, and explain their internal properties as well as their location and extension. This analysis shows that aerosol regimes in the lower troposphere are strongly influenced by emissions. Key drivers of the clusters' internal properties and spatial distribution are, for instance, pollutants from biomass burning/biogenic sources, mineral dust, anthropogenic pollution, as well as their mixing. Several continental clusters propagate into oceanic regions. The identified oceanic regimes show a higher degree of pollution in the northern hemisphere than over the southern oceans. With increasing altitude, the aerosol regimes shift from emission-induced clusters in the lower troposphere to roughly zonally distributed regimes in the middle troposphere and in the tropopause region. Notably, three polluted clusters identified over Africa, India and eastern China cover the whole atmospheric column from the lower troposphere to the tropopause region. A markedly wide application potential of the classification procedure is identified, and further aerosol studies are proposed which could benefit from this classification.
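A minimal version of the K-means procedure such a study applies can be sketched as Lloyd's algorithm. The greedy farthest-point initialization below is an illustrative assumption (the paper does not specify its seeding), chosen here to keep the example deterministic.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate point assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Greedy farthest-point seeding: avoids duplicate initial centers.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep an old center if its cluster emptied.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```

In the aerosol setting, each row of `X` would be a grid cell described by the seven aerosol properties, and the resulting labels correspond to the "aerosol regimes"; the trade-off noted in the abstract remains: K-means finds the regimes without predefined classes, but `k` itself must still be chosen.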

1 citation


##### References



TL;DR: In this article, the modularity of a network is expressed in terms of the eigenvectors of a characteristic matrix for the network, which is then used for community detection.

Abstract: Many networks of interest in the sciences, including social networks, computer networks, and metabolic and regulatory networks, are found to divide naturally into communities or modules. The problem of detecting and characterizing this community structure is one of the outstanding issues in the study of networked systems. One highly effective approach is the optimization of the quality function known as “modularity” over the possible divisions of a network. Here I show that the modularity can be expressed in terms of the eigenvectors of a characteristic matrix for the network, which I call the modularity matrix, and that this expression leads to a spectral algorithm for community detection that returns results of demonstrably higher quality than competing methods in shorter running times. I illustrate the method with applications to several published network data sets.
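The spectral method this abstract describes reduces, in its simplest two-community form, to splitting nodes by the sign of the leading eigenvector of the modularity matrix B = A - k k^T / (2m). A minimal numpy sketch (the function name is assumed):

```python
import numpy as np

def leading_eigenvector_split(A):
    """Newman's spectral bisection on an undirected adjacency matrix A.

    Builds the modularity matrix B = A - k k^T / (2m), where k is the
    degree vector and m the edge count, and splits nodes by the sign of
    the eigenvector belonging to B's largest eigenvalue.
    """
    k = A.sum(axis=1)
    two_m = k.sum()                       # 2m: twice the number of edges
    B = A - np.outer(k, k) / two_m
    vals, vecs = np.linalg.eigh(B)        # B is symmetric
    v = vecs[:, np.argmax(vals)]          # leading eigenvector
    return (v >= 0).astype(int)           # community label per node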

8,969 citations


TL;DR: The multidimensional binary search tree (or k-d tree) as a data structure for storage of information to be retrieved by associative searches is developed and it is shown to be quite efficient in its storage requirements.

Abstract: This paper develops the multidimensional binary search tree (or k-d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k-d tree is defined and examples are given. It is shown to be quite efficient in its storage requirements. A significant advantage of this structure is that a single data structure can handle many types of queries very efficiently. Various utility algorithms are developed; their proven average running times in an n record file are: insertion, O(log n); deletion of the root, O(n(k-1)/k); deletion of a random node, O(log n); and optimization (guarantees logarithmic performance of searches), O(n log n). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O(n(k-t)/k)] and for nearest neighbor queries [empirically observed average running time of O(log n).] These performances far surpass the best currently known algorithms for these tasks. An algorithm is presented to handle any general intersection query. The main focus of this paper is theoretical. It is felt, however, that k-d trees could be quite useful in many applications, and examples of potential uses are given.
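The k-d tree construction and nearest-neighbour search described above can be sketched in a few dozen lines; the dict-based node layout here is an illustrative simplification of the paper's structure.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively build a k-d tree, cycling the splitting axis by depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point becomes the node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def dist(a, b):
    return math.dist(a, b)

def nearest(node, target, best=None):
    """Branch-and-bound nearest-neighbour search in the k-d tree."""
    if node is None:
        return best
    if best is None or dist(node["point"], target) < dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = nearest(node[near], target, best)
    # Visit the far side only if the splitting plane is closer than
    # the best distance found so far.
    if abs(diff) < dist(best, target):
        best = nearest(node[far], target, best)
    return best
```

The pruning test on `abs(diff)` is what gives the empirically logarithmic nearest-neighbour behaviour the abstract reports: whole subtrees are skipped whenever the splitting hyperplane lies beyond the current best distance.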

6,444 citations


Bell Labs

TL;DR: Las Vegas algorithms are given for finding discrete logarithms and factoring integers on a quantum computer that take a number of steps polynomial in the input size, e.g., the number of digits of the integer to be factored.

Abstract: A computer is generally considered to be a universal computational device; i.e., it is believed able to simulate any physical computational device with a cost in computation time of at most a polynomial factor. It is not clear whether this is still true when quantum mechanics is taken into consideration. Several researchers, starting with David Deutsch, have developed models for quantum mechanical computers and have investigated their computational properties. This paper gives Las Vegas algorithms for finding discrete logarithms and factoring integers on a quantum computer that take a number of steps which is polynomial in the input size, e.g., the number of digits of the integer to be factored. These two problems are generally considered hard on a classical computer and have been used as the basis of several proposed cryptosystems. We thus give the first examples of quantum cryptanalysis.

5,655 citations


Bell Labs

TL;DR: In this paper, it is shown that a quantum mechanical computer can solve the integer factorization problem in a number of steps polynomial in log N, where N is the integer to be factored.

Abstract: Quantum mechanical computers were proposed in the early 1980's [Benioff80] and shown to be at least as powerful as classical computers, an important but not surprising result, since classical computers, at the deepest level, ultimately follow the laws of quantum mechanics. The description of quantum mechanical computers was formalized in the late 80's and early 90's [Deutsch85][BB92][BV93][Yao93], and they were shown to be more powerful than classical computers on various specialized problems. In early 1994, [Shor94] demonstrated that a quantum mechanical computer could efficiently solve a well-known problem for which no efficient classical algorithm was known: the problem of integer factorization, i.e., finding the prime factors of a given integer N, in a time which is polynomial in log N.

5,636 citations


TL;DR: In this paper, the authors used data from a voluntary association to construct a new formal model for a traditional anthropological problem, fission in small groups, where the process leading to fission is viewed as an unequal flow of sentiments and information across the ties in a social network.

Abstract: Data from a voluntary association are used to construct a new formal model for a traditional anthropological problem, fission in small groups. The process leading to fission is viewed as an unequal flow of sentiments and information across the ties in a social network. This flow is unequal because it is uniquely constrained by the contextual range and sensitivity of each relationship in the network. The subsequent differential sharing of sentiments leads to the formation of subgroups with more internal stability than the group as a whole, and results in fission. The Ford-Fulkerson labeling algorithm allows an accurate prediction of membership in the subgroups and of the locus of the fission to be made from measurements of the potential for information flow across each edge in the network. Methods for measurement of potential information flow are discussed, and it is shown that all appropriate techniques will generate the same predictions.
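The Ford-Fulkerson labeling step used above to predict subgroup membership can be illustrated with its shortest-augmenting-path variant (Edmonds-Karp). The dict-of-dicts graph encoding is an assumption for this sketch; the function returns the maximum s-t flow, whose value equals the capacity of the minimum cut separating the two subgroups.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: push flow along BFS-shortest augmenting paths.

    `capacity` maps each node to a dict of neighbour -> edge capacity.
    Works on a residual copy, so the caller's graph is left untouched.
    """
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():          # add zero-capacity back edges
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS from s for any augmenting path with spare residual capacity.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow                       # no augmenting path: done
        # Recover the path, find its bottleneck, and update residuals.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck
```

By max-flow/min-cut duality, the edges saturated at termination identify the weakest "information flow" boundary in the network, which is how the labeling algorithm locates the fission line between subgroups.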

3,342 citations
