
Showing papers by "James Bailey published in 2014"


Proceedings ArticleDOI
24 Aug 2014
TL;DR: It is shown how the resulting NP-hard global optimization problem could be efficiently approximately solved via spectral relaxation and semi-definite programming techniques.
Abstract: Most current mutual information (MI) based feature selection techniques are greedy in nature and are thus prone to sub-optimal decisions. Potential performance improvements could be gained by systematically posing MI-based feature selection as a global optimization problem. A rare attempt at providing a global solution for MI-based feature selection is the recently proposed Quadratic Programming Feature Selection (QPFS) approach. We point out that the QPFS formulation faces several non-trivial issues, in particular, how to properly treat feature `self-redundancy' while ensuring the convexity of the objective function. In this paper, we take a systematic approach to the problem of global MI-based feature selection. We show how the resulting NP-hard global optimization problem can be efficiently and approximately solved via spectral relaxation and semi-definite programming techniques. We experimentally demonstrate the efficiency and effectiveness of these novel feature selection frameworks.
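For orientation, the QPFS-style program the paper starts from poses feature selection as a quadratic optimization over feature weights (the notation below is a common presentation of QPFS and may differ in detail from the paper):

\[ \min_{\mathbf{x}} \; (1-\alpha)\,\mathbf{x}^{\top} Q\, \mathbf{x} \;-\; \alpha\, \mathbf{f}^{\top} \mathbf{x} \quad \text{s.t.} \quad \mathbf{x} \ge 0, \; \textstyle\sum_i x_i = 1, \]

where \(Q_{ij}\) is the mutual information between features i and j (redundancy), \(f_i\) is the mutual information between feature i and the class (relevance), and \(\alpha\) balances the two. Requiring a genuinely global, combinatorial selection rather than a relaxed weight vector x is presumably what yields the NP-hard problem that the paper attacks with spectral relaxation and semi-definite programming.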

117 citations


Proceedings Article
21 Jun 2014
TL;DR: It is argued that a further type of statistical adjustment for the mutual information is also beneficial - an adjustment to correct selection bias, which requires computation of the variance of mutual information under a hypergeometric model of randomness.
Abstract: Mutual information is a very popular measure for comparing clusterings. Previous work has shown that it is beneficial to make an adjustment for chance to this measure, by subtracting an expected value and normalizing via an upper bound. This yields the constant baseline property that enhances intuitiveness. In this paper, we argue that a further type of statistical adjustment for the mutual information is also beneficial - an adjustment to correct selection bias. This type of adjustment is useful when carrying out many clustering comparisons, to select one or more preferred clusterings. It reduces the tendency for the mutual information to choose clustering solutions i) with more clusters, or ii) induced on fewer data points, when compared to a reference one. We term our new adjusted measure the standardized mutual information. It requires computation of the variance of mutual information under a hypergeometric model of randomness, which is technically challenging. We derive an analytical formula for this variance and analyze its complexity. We then experimentally assess how our new measure can address selection bias and also increase interpretability. We recommend using the standardized mutual information when making multiple clustering comparisons in situations where the number of records is small compared to the number of clusters considered.
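As a hedged sketch of the adjustments being discussed (symbols assumed rather than quoted from the paper), with \(I(U,V)\) the mutual information between clusterings U and V: the chance-adjusted form subtracts the expected value and normalizes by an upper bound (one common choice shown), while the standardized form instead divides the centred value by its standard deviation under the hypergeometric model of randomness:

\[ \mathrm{AMI}(U,V) = \frac{I(U,V) - \mathbb{E}[I(U,V)]}{\max(H(U), H(V)) - \mathbb{E}[I(U,V)]}, \qquad \mathrm{SMI}(U,V) = \frac{I(U,V) - \mathbb{E}[I(U,V)]}{\sqrt{\operatorname{Var}(I(U,V))}}. \]

Roughly speaking, the standardization is what addresses the selection bias described above: comparisons whose MI has high variance under the null are no longer favoured merely for producing occasional high raw values.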

102 citations


Journal ArticleDOI
TL;DR: The analysis suggests that the molecular basis of cell shape may, in addition to motor force, be a key adaptive strategy for malaria parasite dissemination and, as such, transmission.
Abstract: Motility is a fundamental part of cellular life and survival, including for Plasmodium parasites--single-celled protozoan pathogens responsible for human malaria. The motile life cycle forms achieve motility, called gliding, via the activity of an internal actomyosin motor. Although gliding is based on the well-studied system of actin and myosin, its core biomechanics are not completely understood. Currently accepted models suggest it results from a specifically organized cellular motor that produces a rearward directional force. When linked to surface-bound adhesins, this force is passaged to the cell posterior, propelling the parasite forwards. Gliding motility is observed in all three life cycle stages of Plasmodium: sporozoites, merozoites and ookinetes. However, it is only the ookinetes--formed inside the midgut of infected mosquitoes--that display continuous gliding without the necessity of host cell entry. This makes them ideal candidates for invasion-free biomechanical analysis. Here we apply a plate-based imaging approach to study ookinete motion in three-dimensional (3D) space to understand Plasmodium cell motility and how movement facilitates midgut colonization. Using single-cell tracking and numerical analysis of parasite motion in 3D, our analysis demonstrates that ookinetes move with a conserved left-handed helical trajectory. Investigation of cell morphology suggests this trajectory may be based on the ookinete subpellicular cytoskeleton, with complementary whole and subcellular electron microscopy showing that, like their motion paths, ookinetes share a conserved left-handed corkscrew shape and underlying twisted microtubular architecture. Through comparisons of 3D movement between wild-type ookinetes and a cytoskeleton-knockout mutant we demonstrate that perturbation of cell shape changes motion from helical to broadly linear. Therefore, while the precise linkages between cellular architecture and actomyosin motor organization remain unknown, our analysis suggests that the molecular basis of cell shape may, in addition to motor force, be a key adaptive strategy for malaria parasite dissemination and, as such, transmission.

48 citations


Proceedings Article
21 Jun 2014
TL;DR: This paper proposes a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigorously by means of a statistical test, and develops local and global optimization algorithms for this new feature selection model.
Abstract: Mutual information (MI) based approaches are a popular feature selection paradigm. Although the stated goal of MI-based feature selection is to identify a subset of features that share the highest mutual information with the class variable, most current MI-based techniques are greedy methods that make use of low dimensional MI quantities. The reason for using low dimensional approximations has mostly been attributed to the difficulty of estimating the high dimensional MI from limited samples. In this paper, we argue a different viewpoint: even given a very large amount of data, the high dimensional MI objective is still problematic to employ as a meaningful optimization criterion because of its overfitting nature - the MI almost always increases as more features are added, thus leading to a trivial solution which includes all features. We propose a novel approach to the MI-based feature selection problem, in which the overfitting phenomenon is controlled rigorously by means of a statistical test. We develop local and global optimization algorithms for this new feature selection model, and demonstrate its effectiveness in the applications of explaining variables and objects.
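One way to picture this kind of statistically controlled selection is a forward search that only accepts a candidate feature if the MI gain it brings survives a significance test. The sketch below is purely illustrative (the function names, the hash-based joint MI estimate, and the permutation test are stand-ins, not the paper's actual model or test); features are assumed discrete.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def joint_mi(features, y):
    # Approximate the joint MI of several discrete features with the class by
    # collapsing each row of selected feature values into a single label.
    joint = np.array([hash(tuple(row)) for row in features])
    return mutual_info_score(joint, y)

def select_features(X, y, alpha=0.05, n_perm=200, rng=np.random.default_rng(0)):
    """Forward selection gated by a permutation test on the MI gain (illustrative)."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # Greedily pick the candidate giving the largest MI gain.
        gain, best = max((joint_mi(X[:, selected + [j]], y), j) for j in remaining)
        base = joint_mi(X[:, selected], y) if selected else 0.0
        observed = gain - base
        # Permutation test: is the observed gain larger than expected by chance?
        null = []
        for _ in range(n_perm):
            y_perm = rng.permutation(y)
            null.append(joint_mi(X[:, selected + [best]], y_perm) -
                        (joint_mi(X[:, selected], y_perm) if selected else 0.0))
        p_value = np.mean([g >= observed for g in null])
        if p_value > alpha:
            break  # gain is explainable by chance; stop to avoid overfitting
        selected.append(best)
        remaining.remove(best)
    return selected
```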

30 citations


Book ChapterDOI
21 Apr 2014
TL;DR: The meta path-based similarity measure PathSim is extended by incorporating richer information, such as transitive similarity and temporal dynamics, to help solve the problem of similarity search in heterogeneous information networks.
Abstract: Heterogeneous information networks have attracted much attention in recent years and a key challenge is to compute the similarity between two objects. In this paper, we study the problem of similarity search in heterogeneous information networks, and extend the meta path-based similarity measure PathSim by incorporating richer information, such as transitive similarity and temporal dynamics. Experiments on a large DBLP network show that our improved similarity measure is more effective at identifying similar authors in terms of their future collaborations.
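For context, the base PathSim measure being extended is usually defined, for a symmetric meta path \(\mathcal{P}\), as (definition recalled from the PathSim literature rather than quoted from this paper):

\[ s(x, y) = \frac{2\,\big|\{ p_{x \rightsquigarrow y} : p \in \mathcal{P} \}\big|}{\big|\{ p_{x \rightsquigarrow x} : p \in \mathcal{P} \}\big| + \big|\{ p_{y \rightsquigarrow y} : p \in \mathcal{P} \}\big|}, \]

i.e. the number of meta path instances connecting x and y, normalized by the two objects' self-connectivity. The paper's extension layers transitive similarity and temporal dynamics on top of this base score.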

19 citations


Journal ArticleDOI
TL;DR: Two new algorithms for alternative clustering generation are presented; their distinctive feature is a principled formulation of an objective function that facilitates the discovery of a subspace satisfying natural quality and orthogonality criteria.
Abstract: Clustering analysis is important for exploring complex datasets. Alternative clustering analysis is an emerging subfield involving techniques for the generation of multiple different clusterings, allowing the data to be viewed from different perspectives. We present two new algorithms for alternative clustering generation. A distinctive feature of our algorithms is their principled formulation of an objective function, facilitating the discovery of a subspace satisfying natural quality and orthogonality criteria. The first algorithm is a regularization of the principal component analysis method, whereas the second is a regularization of graph-based dimension reduction. In both cases, we demonstrate that a globally optimal subspace solution can be computed. Experimental evaluation shows our techniques are able to equal or outperform a range of existing methods.
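One plausible shape for the kind of regularized objective described above (an illustrative form under assumed notation, not necessarily the paper's exact formulation) is to seek an orthonormal projection W that preserves variance while being penalized for aligning with the subspace already captured by the existing clustering:

\[ \max_{W^{\top} W = I} \; \operatorname{tr}\!\big(W^{\top} X^{\top} X\, W\big) \;-\; \lambda\, \operatorname{tr}\!\big(W^{\top} M\, W\big), \]

where M encodes the existing clustering structure and \(\lambda\) trades clustering quality against orthogonality. Trace-difference objectives of this kind reduce to an eigendecomposition of \(X^{\top}X - \lambda M\), which is consistent with the globally optimal subspace solution claimed in the abstract.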

16 citations


Proceedings ArticleDOI
01 Dec 2014
TL;DR: This paper generalizes eight information theoretic crisp indices to soft clusterings, so that they can be used with partitions of any type (i.e., crisp or soft, with soft including fuzzy, probabilistic and possibilistic cases).
Abstract: There have been a large number of external validity indices proposed for cluster validity. One such class of cluster comparison indices is the information theoretic measures, due to their strong mathematical foundation and their ability to detect non-linear relationships. However, they are devised for evaluating crisp (hard) partitions. In this paper, we generalize eight information theoretic crisp indices to soft clusterings, so that they can be used with partitions of any type (i.e., crisp or soft, with soft including fuzzy, probabilistic and possibilistic cases). We present experimental results to demonstrate the effectiveness of the generalized information theoretic indices.
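One standard way to lift a crisp information theoretic index to soft partitions, in the spirit of the generalization described above (an illustrative sketch, not necessarily the paper's exact construction; the function name soft_nmi is hypothetical), is to build a "soft contingency table" from the two membership matrices and evaluate the usual formula on it:

```python
import numpy as np

def soft_nmi(U, V, eps=1e-12):
    """Normalized mutual information between two soft partitions.

    U: (n, r) membership matrix of clustering 1; V: (n, c) of clustering 2.
    With crisp one-hot memberships this recovers the standard NMI.
    """
    N = U.T @ V                        # soft contingency table, shape (r, c)
    P = N / N.sum()                    # joint "probability" of cluster pairs
    pu = P.sum(axis=1, keepdims=True)  # marginals of clustering 1
    pv = P.sum(axis=0, keepdims=True)  # marginals of clustering 2
    mi = np.sum(P * np.log(P / (pu @ pv + eps) + eps))
    hu = -np.sum(pu * np.log(pu + eps))
    hv = -np.sum(pv * np.log(pv + eps))
    return mi / max(np.sqrt(hu * hv), eps)

# crisp example: identical partitions of 4 points into 2 clusters -> NMI close to 1
U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
print(soft_nmi(U, U))
```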

14 citations


Proceedings ArticleDOI
25 Nov 2014
TL;DR: This paper proposes a proof of concept Do-Not-Disturb (DND) service that can a) determine a user's context relevant for DND service from the built-in smartphone sensors and b) correctly predict the DND status based on the given context such as being in a meeting, sleeping, or working at the office.
Abstract: Modern sensor-equipped smartphones have attracted significant research interest in the pervasive computing community for recognizing user context and creating context-aware applications at a personal or community scale. In this paper, we propose a proof-of-concept Do-Not-Disturb (DND) service that can a) determine a user's context relevant to the DND service from the built-in smartphone sensors and b) correctly predict the DND status based on the given context, such as being in a meeting, sleeping, or working at the office. In this preliminary study, we investigate whether sensor data can be clustered to represent user contexts. We use standard machine learning techniques to learn the relationship between a user's context and the corresponding DND status (available or unavailable). Given a user's current context, the DND service predicts a DND status and configures the mobile device accordingly. Our preliminary experiment demonstrates that the proposed system can achieve a prediction accuracy of up to 90% when trained with sufficient data.

12 citations


Proceedings Article
01 Jan 2014
TL;DR: This paper focuses on the core problem of computing substring matching probability in uncertain sequences and proposes an efficient dynamic programming algorithm for this task, which contributes towards a foundation for adapting classic sequence mining methods to deal with uncertain data.
Abstract: Substring matching is fundamental to data mining methods for sequential data. It involves checking the existence of a short subsequence within a longer sequence, ensuring no gaps within a match. Whilst a large amount of existing work has focused on substring matching and mining techniques for certain sequences, there are only a few results for uncertain sequences. Uncertain sequences provide powerful representations for modelling sequence behavioural characteristics in emerging domains, such as bioinformatics, sensor streams and trajectory analysis. In this paper, we focus on the core problem of computing substring matching probability in uncertain sequences and propose an efficient dynamic programming algorithm for this task. We demonstrate our approach is both competitive theoretically, as well as effective and scalable experimentally. Our results contribute towards a foundation for adapting classic sequence mining methods to deal with uncertain data.
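A minimal sketch of the kind of computation involved, assuming each position of the uncertain sequence is an independent categorical distribution over symbols (an illustrative dynamic program, not necessarily the paper's algorithm; the function name is hypothetical): track the probability mass over pattern-prefix states position by position and absorb the mass that completes a match.

```python
def substring_match_probability(pattern, uncertain_seq):
    """Probability that `pattern` occurs contiguously at least once.

    uncertain_seq is a list of dicts {symbol: probability}, one per position,
    with positions assumed independent (illustrative assumption).
    """
    m = len(pattern)
    # KMP failure function for the pattern
    fail, k = [0] * m, 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    def step(q, c):
        # automaton transition: longest prefix matched after reading symbol c in state q
        while q > 0 and c != pattern[q]:
            q = fail[q - 1]
        return q + 1 if c == pattern[q] else 0

    dist = [0.0] * m      # probability of being in each prefix state (0..m-1)
    dist[0] = 1.0
    matched = 0.0         # absorbed probability of having seen a full match
    for pos_dist in uncertain_seq:
        new = [0.0] * m
        for q, pq in enumerate(dist):
            if pq == 0.0:
                continue
            for c, pc in pos_dist.items():
                nq = step(q, c)
                if nq == m:
                    matched += pq * pc
                else:
                    new[nq] += pq * pc
        dist = new
    return matched
```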

9 citations


Book ChapterDOI
13 May 2014
TL;DR: This paper presents CSMiner, a mining method with various pruning techniques, which is substantially faster than the baseline method and demonstrates that this problem has important applications, and at the same time is very challenging.
Abstract: In this paper, we tackle a novel problem of mining contrast subspaces. Given a set of multidimensional objects in two classes C+ and C− and a query object o, we want to find the top-k subspaces S that maximize the ratio of the likelihood of o in C+ against that in C−. We demonstrate that this problem has important applications and, at the same time, is very challenging: it does not even admit a polynomial time approximation. We present CSMiner, a mining method with various pruning techniques. CSMiner is substantially faster than the baseline method. Our experimental results on real data sets verify the effectiveness and efficiency of our method.
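Concretely, the score of a subspace S for the query object o can be pictured as a likelihood ratio, with the class-conditional likelihoods estimated, for example, by kernel density estimation restricted to the dimensions in S (an illustrative formulation under assumed notation; details may differ from the paper):

\[ \mathrm{score}(S \mid o) \;=\; \frac{\hat{L}_{C^{+}}(o_S)}{\hat{L}_{C^{-}}(o_S)}, \]

where \(o_S\) is the projection of o onto S. CSMiner's pruning techniques presumably work by bounding the attainable value of such a ratio so that unpromising subspaces can be discarded without full evaluation.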

9 citations


Journal Article
TL;DR: This work introduces a real-time feedback system for surgical technique within a temporal bone surgical simulator and shows that this feedback system performs exceptionally well with respect to accuracy and effectiveness.
Abstract: Timely feedback on surgical technique is an important aspect of surgical skill training in any learning environment, be it virtual or otherwise. Feedback on technique should be provided in real-time to allow trainees to recognize and amend their errors as they occur. Expert surgeons have typically carried out this task, but they have limited time available to spend with trainees. Virtual reality surgical simulators offer effective, repeatable training at relatively low cost, but their benefits may not be fully realized while they still require the presence of experts to provide feedback. We attempt to overcome this limitation by introducing a real-time feedback system for surgical technique within a temporal bone surgical simulator. Our evaluation study shows that this feedback system performs exceptionally well with respect to accuracy and effectiveness.

Proceedings ArticleDOI
25 Nov 2014
TL;DR: This work shows that diagnostic information that is not considered sensitive could be used to identify a user after just three consecutive days of monitoring, using only diagnostic features like hardware statistics and system settings.
Abstract: Mobile smart phones capture a great amount of information about a user across a variety of different data domains. This information can be sensitive and allow for identifying a user profile, thus causing potential threats to a user's privacy. Our work shows that diagnostic information that is not considered sensitive could be used to identify a user after just three consecutive days of monitoring. We have used the Device Analyzer dataset to determine what features of a mobile device are important in identifying a user. Many mobile games and applications collect diagnostic data as a means of identifying or resolving issues. Diagnostic data is commonly accepted as less sensitive information. Our experimental results demonstrate that, using only diagnostic features like hardware statistics and system settings, a user's device can be identified with an accuracy of 94% with a Naive Bayes classifier.

Book ChapterDOI
13 May 2014
TL;DR: This paper focuses on comparison measures for two important graph clustering approaches, community detection and blockmodelling, and proposes comparison measures that work for weighted (and unweighted) graphs.
Abstract: Clustering in graphs aims to group vertices with similar patterns of connections. Applications include discovering communities and latent structures in graphs. Many algorithms have been proposed to find graph clusterings, but an open problem is the need for suitable comparison measures to quantitatively validate these algorithms, perform consensus clustering and track evolving (graph) clusters across time. To date, most comparison measures have focused on comparing the vertex groupings, and completely ignore the difference in the structural approximations in the clusterings, which can lead to counter-intuitive comparisons. In this paper, we propose new measures that account for differences in the approximations. We focus on comparison measures for two important graph clustering approaches, community detection and blockmodelling, and propose comparison measures that work for weighted (and unweighted) graphs.

Journal ArticleDOI
TL;DR: Commonly measured laboratory variables are tested for their ability to identify surgical patients at risk of major adverse events (death, unplanned intensive care unit (ICU) admission or rapid response team (RRT) activation).
Abstract: Background/Aims: To test whether commonly measured laboratory variables can identify surgical patients at risk of major adverse events (death, unplanned intensive care unit (ICU) admission or rapid response team (RRT) activation). Methods: We conducted a prospective observational study in a surgical ward of a university-affiliated hospital in a cohort of 834 surgical patients admitted for >24 h. We applied a previously validated multivariable model-derived risk assessment to each combined set of common laboratory tests to identify patients at risk. We compared the clinical course of such patients with that of control patients from the same ward who had blood tests but were identified as low risk. Results: We studied 7955 batches and 73 428 individual tests in 834 patients (males 55%; average age 65.8 ± 17.6 years). Among these patients, 66 (7.9%) were identified as ‘high risk’. High-risk patients were older (75.9 vs 61.8 years of age; P < 0.0001), had much greater early (48 h) mortality (6/66 (9%) vs 4/768 (0.5%); P < 0.0001) and greater overall hospital mortality (11/66 (16.7%) vs 9/768 (1.2%); P < 0.0001). They also had more early (8/66 (12.1%) vs 14/768 (1.8%); P = 0.0001) and overall in-hospital unplanned ICU admissions (12/66 (18.2%) vs 18/768 (2.3%); P < 0.0001) and more early (26/66 (39.3%) vs 50/768 (6.5%); P < 0.0001) and overall in-hospital RRT calls (26/66 (39.4%) vs 55/768 (7.2%); P < 0.0001). Conclusions: Commonly performed laboratory tests identify surgical ward patients at risk of early major adverse events. Further studies are needed to assess whether such an identification system can be used to trigger interventions that help improve patient outcomes.

Book ChapterDOI
15 Sep 2014
TL;DR: In this paper, a simple and effective filtering algorithm (FILTA) is proposed that can be flexibly used in conjunction with any meta-clustering method to support the discovery of multiple clusterings in a dataset.
Abstract: Meta-clustering is a popular approach for finding multiple clusterings in a dataset; it takes a large number of base clusterings as input for further user navigation and refinement. However, the effectiveness of meta-clustering is highly dependent on the distribution of the base clusterings, and open challenges exist with regard to its stability and noise tolerance. In this paper we propose a simple and effective filtering algorithm (FILTA) that can be flexibly used in conjunction with any meta-clustering method. Given a (raw) set of base clusterings, FILTA employs information theoretic criteria to remove those having poor quality or high redundancy. The filtered set of clusterings is then highly suitable for further exploration, particularly the use of visualization for determining the dominant views in the dataset. We evaluate FILTA on both synthetic and real world datasets, and show how its use can enhance view discovery for complex scenarios.
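A hedged sketch of this kind of quality-and-redundancy filtering (the function name, the quality criterion plugged in via quality_fn, and the thresholds are placeholders, not FILTA's exact information theoretic criteria):

```python
from sklearn.metrics import normalized_mutual_info_score

def filter_clusterings(base_clusterings, quality_fn, min_quality=0.2, max_redundancy=0.8):
    """Keep base clusterings that are of sufficient quality and not redundant
    with an already-kept clustering.

    base_clusterings: list of label arrays (one array per base clustering).
    quality_fn: maps a label array to a quality score (placeholder criterion).
    """
    # consider higher-quality clusterings first
    ranked = sorted(base_clusterings, key=quality_fn, reverse=True)
    kept = []
    for labels in ranked:
        if quality_fn(labels) < min_quality:
            break  # remaining candidates are lower quality still
        # redundancy measured here by pairwise NMI against the kept set
        if all(normalized_mutual_info_score(labels, k) < max_redundancy for k in kept):
            kept.append(labels)
    return kept
```

The filtered set would then be passed to any meta-clustering method for visualization and view discovery, as the abstract describes.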

Book ChapterDOI
01 Dec 2014
TL;DR: It is demonstrated that higher order transformations have the potential to boost prediction performance and that DLR is a promising method for transfer learning.
Abstract: Density based logistic regression (DLR) is a recently introduced classification technique that performs a one-to-one non-linear transformation of the original feature space into another feature space based on density estimations. This new feature space is particularly well suited for learning a logistic regression model. Whilst performance gains, good interpretability and time efficiency make DLR attractive, there exist some limitations to its formulation. In this paper, we tackle these limitations and propose several new extensions: 1) a more robust methodology for performing density estimations, 2) a method that can transform two or more features into a single target feature, based on the use of higher order kernel density estimation, and 3) an analysis of the utility of DLR for transfer learning scenarios. We evaluate our extensions using several synthetic and publicly available datasets, demonstrating that higher order transformations have the potential to boost prediction performance and that DLR is a promising method for transfer learning.
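As a rough picture of the density based transform being extended (a plausible univariate form; the paper's exact estimator may differ), each feature \(x_j\) is mapped through class-conditional density estimates before a logistic regression model is fit:

\[ \phi_j(x_j) = \log \frac{\hat{p}(x_j \mid y = 1)}{\hat{p}(x_j \mid y = 0)}, \]

and the proposed higher order extension replaces the univariate estimates with joint kernel density estimates over two or more features, producing a single transformed target feature.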

Proceedings ArticleDOI
24 Aug 2014
TL;DR: This work proposes a transfer learning framework to adapt a classifier built on a single temporal bone specimen to multiple specimens, and builds a surgical end-product performance classifier from 16 expert trials on a simulated temporal bone specimen.
Abstract: Evaluation of the outcome (end-product) of surgical procedures carried out in virtual reality environments is an essential part of simulation-based surgical training. Automated end-product assessment can be carried out by performance classifiers built from a set of expert performances. When applied to temporal bone surgery simulation, these classifiers can evaluate performance on the bone specimen they were trained on, but they cannot be extended to new specimens. Thus, new expert performances need to be recorded for each new specimen, requiring considerable time commitment from time-poor expert surgeons. To eliminate this need, we propose a transfer learning framework to adapt a classifier built on a single temporal bone specimen to multiple specimens. Once a classifier is trained, we translate each new specimen's features to the original feature space, which allows us to carry out performance evaluation on different specimens using the same classifier. In our experiment, we built a surgical end-product performance classifier from 16 expert trials on a simulated temporal bone specimen. We applied the transfer learning approach to 8 new specimens to obtain machine-generated end-products. We also collected end-products for these 8 specimens drilled by a single expert. We then compared the machine-generated end-products to those drilled by the expert. The drilled regions generated by transfer learning were similar to those drilled by the expert.

Proceedings ArticleDOI
14 Dec 2014
TL;DR: This paper proposes a new formulation of the graph clustering problem that results in clusterings that are easy to interpret and more accurate than state-of-the-art algorithms for both synthetic and real datasets.
Abstract: Graphs are a powerful representation of relational data, such as social and biological networks. Often, these entities form groups and are organised according to a latent structure. However, these groupings and structures are generally unknown and it can be difficult to identify them. Graph clustering is an important type of approach used to discover these vertex groups and the latent structure within graphs. One type of approach for graph clustering is non-negative matrix factorisation. However, the formulations of existing factorisation approaches can be overly relaxed: their groupings and results are consequently difficult to interpret, they may fail to discover the true latent structure and groupings, and they may converge to extreme solutions. In this paper, we propose a new formulation of the graph clustering problem that results in clusterings that are easy to interpret. Combined with a novel algorithm, the clusterings are also more accurate than those of state-of-the-art algorithms on both synthetic and real datasets.
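For orientation, NMF-style graph clustering of the kind discussed is often posed as approximating the adjacency matrix by a vertex-to-cluster assignment and a cluster interaction (image) matrix (a generic formulation, not the paper's new one):

\[ \min_{H \ge 0,\; B \ge 0} \; \lVert A - H B H^{\top} \rVert_F^2, \]

where A is the (possibly weighted) adjacency matrix, H assigns vertices to clusters and B captures the density of connections between clusters. One common reading of the "overly relaxed" criticism is that H is allowed arbitrary non-negative values rather than behaving like a (near) hard assignment, which is what makes the resulting groupings hard to interpret.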