Author

Thomas Brendan Murphy

Other affiliations: Trinity College, Dublin
Bio: Thomas Brendan Murphy is an academic researcher from University College Dublin. He has contributed to research on cluster analysis and mixture models. He has an h-index of 32 and has co-authored 140 publications that have received 3612 citations. His previous affiliations include Trinity College, Dublin.


Papers
Journal Article
TL;DR: Introduces a class of eight parsimonious Gaussian mixture models based on the mixtures of factor analyzers model, with maximum likelihood estimates of the parameters found using an AECM algorithm.
Abstract: Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models which are based on the mixtures of factor analyzers model is introduced and the maximum likelihood estimates for the parameters in these models are found using an AECM algorithm. The class of models includes parsimonious models that have not previously been developed. These models are applied to the analysis of chemical and physical properties of Italian wines and the chemical properties of coffee; the models are shown to give excellent clustering performance.

337 citations

Journal Article
TL;DR: This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models, which gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.
Abstract: Motivation: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation–maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets. Results: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data. Availability: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info Contact: pmcnicho@uoguelph.ca

211 citations
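
The abstract above quantifies clustering performance with the adjusted Rand index. As a minimal sketch of how that index can be computed from two label vectors (the function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not taken from the paper):

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pair counts from the contingency table and from its margins.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    # Relabelling the clusters does not change the index.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A value of 1 indicates identical partitions up to relabelling, values near 0 indicate agreement no better than chance, and negative values are possible.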

Journal Article
TL;DR: High-throughput HILIC profiling of serum from lung cancer patients and age-matched controls identifies significant alterations in N-linked glycosylation. The most significantly altered HILIC peak contains predominantly disialylated and tri- and tetra-antennary glycans, and the combination of all glyco-biomarkers had the highest sensitivity and specificity.
Abstract: Lung cancer has a poor prognosis and a 5-year survival rate of 15%. Therefore, early detection is vital. Diagnostic testing of serum for cancer-associated biomarkers is a noninvasive detection method. Glycosylation is the most frequent post-translational modification of proteins and it has been shown to be altered in cancer. In this paper, high-throughput HILIC technology was applied to serum samples from 100 lung cancer patients, alongside 84 age-matched controls, and significant alterations in N-linked glycosylation were identified. Increases were detected in glycans containing Sialyl Lewis X, monoantennary glycans and highly sialylated glycans, and decreases were observed in core-fucosylated biantennary glycans, with some being detectable as early as in Stage I. The N-linked glycan profile of haptoglobin demonstrated similar alterations to those elucidated in the total serum glycome. The most significantly altered HILIC peak in lung cancer samples includes predominantly disialylated and tri- and tetra-antennary glycans. This potential disease marker is significantly increased across all disease groups compared to controls and a strong disease effect is visible even after the effect of smoking is accounted for. The combination of all glyco-biomarkers had the highest sensitivity and specificity. This study identifies candidates for further study as potential biomarkers for the disease.

186 citations

Journal Article
TL;DR: Mixtures of distance-based models are used to analyze ranking data from heterogeneous populations, including the Irish electoral system and the Irish college admission system.

145 citations

Journal Article
TL;DR: This review summarizes methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and latent variable models, with many of them illustrated on the Zachary karate dataset.
Abstract: The analysis of network data is an area that is rapidly growing, both within and outside of the discipline of statistics. This review provides a concise summary of methods and models used in the statistical analysis of network data, including the Erdős–Rényi model, the exponential family class of network models, and recently developed latent variable models. Many of the methods and models are illustrated by application to the well-known Zachary karate dataset. Software routines available for implementing methods are emphasized throughout. The aim of this paper is to provide a review with enough detail about many common classes of network models to whet the appetite and to point the way to further reading. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012 (This material is based upon work supported by the Science Foundation Ireland under Grant No. 08/SRC/I1407: Clique: Graph & Network Analysis Cluster.)

140 citations
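
The review above lists the Erdős–Rényi model among the network models it covers. As a minimal sketch of that model in its G(n, p) form, where each of the n(n-1)/2 possible undirected edges is included independently with probability p (the function name `erdos_renyi_edges` and its parameters are illustrative assumptions, not software referenced by the review):

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, seed=None):
    """Sample the edge list of an undirected G(n, p) random graph."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)  # local generator so results are reproducible
    # Each unordered pair of distinct nodes becomes an edge with probability p.
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


if __name__ == "__main__":
    edges = erdos_renyi_edges(n=34, p=0.1, seed=1)
    print(len(edges), "edges sampled out of", 34 * 33 // 2, "possible")
```

The choice of 34 nodes in the usage lines only mirrors the size of the Zachary karate network; the sampled graph is otherwise unrelated to that dataset.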


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: Covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Proceedings Article
22 Jan 2006
TL;DR: Reviews major results and challenging open problems in random graphs, covering algorithmic and structural questions and touching on newer models, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

01 Jan 2016
TL;DR: Modern Applied Statistics with S (book).

5,249 citations

01 Jan 2012

3,692 citations