
Showing papers by Svetha Venkatesh published in 2015


Journal ArticleDOI
TL;DR: A computational framework harnesses EMR with minimal human supervision via a restricted Boltzmann machine (RBM), deriving a new representation of medical objects by embedding them in a low-dimensional vector space that facilitates algebraic and statistical manipulations such as projection onto a 2D plane, object grouping, and risk stratification.
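The listing gives no implementation detail, but the core idea, embedding binary EMR event vectors with an RBM and projecting the embedding onto a 2D plane, can be sketched roughly as follows. This is a minimal illustration using scikit-learn's BernoulliRBM on synthetic data, not the authors' code; the matrix shape and hyperparameters are invented.

```python
# Minimal sketch: embed binary EMR code vectors with an RBM, then project to 2D.
# The data and hyperparameters are illustrative, not taken from the paper.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
records = (rng.rand(500, 200) < 0.05).astype(float)  # patients x medical codes (binary)

rbm = BernoulliRBM(n_components=50, learning_rate=0.05, n_iter=30, random_state=0)
embedding = rbm.fit_transform(records)                # low-dimensional representation

plane = PCA(n_components=2).fit_transform(embedding)  # projection onto a 2D plane
print(plane.shape)  # (500, 2) -- ready for grouping or risk stratification
```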

149 citations


Journal ArticleDOI
TL;DR: It is shown that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods such as Information Gain, ReliefF and T-test, and that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods.
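The stability comparison can be made concrete with a generic resampling protocol: select features on bootstrap resamples and measure pairwise overlap of the selected sets. The snippet below is an illustrative sketch of that idea for plain Lasso (synthetic data, Jaccard overlap), not the paper's exact protocol or its Tree-Lasso implementation.

```python
# Sketch: measure feature-selection stability of Lasso under resampling
# (pairwise Jaccard overlap of selected feature sets; illustrative only).
import numpy as np
from itertools import combinations
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=100, n_informative=10, random_state=0)
rng = np.random.RandomState(0)

selections = []
for _ in range(20):
    idx = rng.choice(len(X), len(X), replace=True)          # bootstrap resample
    coef = Lasso(alpha=1.0).fit(X[idx], y[idx]).coef_
    selections.append(frozenset(np.flatnonzero(coef)))

jaccard = [len(a & b) / max(len(a | b), 1) for a, b in combinations(selections, 2)]
print("mean pairwise Jaccard stability:", np.mean(jaccard))
```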

77 citations


Proceedings ArticleDOI
14 Nov 2015
TL;DR: A novel random forest algorithm under the framework of differential privacy is proposed, together with feasible adversary models for inferring the attribute and class label of unknown data given knowledge of all other data.
Abstract: Privacy-preserving data mining has become an active focus of the research community in the domains where data are sensitive and personal in nature. For example, highly sensitive digital repositories of medical or financial records offer enormous values for risk prediction and decision making. However, prediction models derived from such repositories should maintain strict privacy of individuals. We propose a novel random forest algorithm under the framework of differential privacy. Unlike previous works that strictly follow differential privacy and keep the complete data distribution approximately invariant to change in one data instance, we only keep the necessary statistics (e.g. variance of the estimate) invariant. This relaxation results in significantly higher utility. To realize our approach, we propose a novel differentially private decision tree induction algorithm and use it to create an ensemble of decision trees. We also propose feasible adversary models to infer the attribute and class label of unknown data in the presence of the knowledge of all other data. Under these adversary models, we derive bounds on the maximum number of trees that are allowed in the ensemble while maintaining privacy. We focus on the binary classification problem and demonstrate our approach on four real-world datasets. Compared to existing privacy-preserving approaches, we achieve significantly higher utility.
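The paper's induction algorithm is not reproduced in this listing; the sketch below shows only the standard differential-privacy building block such trees rely on, releasing leaf label counts through the Laplace mechanism, with an invented epsilon and toy counts.

```python
# Sketch: the Laplace mechanism that differentially private tree induction
# typically uses to release leaf label counts (values are illustrative).
import numpy as np

def laplace_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Release noisy counts satisfying epsilon-differential privacy.

    Adding/removing one record changes each count by at most `sensitivity`,
    so Laplace noise with scale sensitivity/epsilon suffices.
    """
    rng = rng or np.random.default_rng(0)
    noise = rng.laplace(0.0, sensitivity / epsilon, size=len(counts))
    return np.maximum(counts + noise, 0.0)

leaf_counts = np.array([40.0, 3.0])          # (class 0, class 1) in one leaf
noisy = laplace_counts(leaf_counts, epsilon=0.5)
print("noisy majority class:", int(np.argmax(noisy)))
```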

70 citations


Journal ArticleDOI
TL;DR: The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD—their behaviour, concerns, needs, etc.
Abstract: Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals' carers who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD -- their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.

46 citations


Journal ArticleDOI
TL;DR: A novel mixed norm is introduced that trades off between SRC and JSRC.

42 citations


Journal ArticleDOI
TL;DR: A novel ordinal regression framework for predicting medical risk stratification from EMR is constructed, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features, and two indices are introduced that measure the model stability against data resampling.
Abstract: The recent wide adoption of electronic medical records (EMRs) presents great opportunities and challenges for data mining. EMR data are largely temporal, often noisy, irregular and high dimensional. This paper constructs a novel ordinal regression framework for predicting medical risk stratification from EMR. First, a conceptual view of EMR as a temporal image is constructed to extract a diverse set of features. Second, ordinal modeling is applied for predicting cumulative or progressive risk. The challenges are building a transparent predictive model that works with a large number of weakly predictive features and, at the same time, is stable against resampling variations. Our solution employs sparsity methods that are stabilized through domain-specific feature interaction networks. We introduce two indices that measure the model stability against data resampling. Feature networks are used to generate two multivariate Gaussian priors with sparse precision matrices (the Laplacian and Random Walk). We apply the framework to a large short-term suicide risk prediction problem and demonstrate that our methods outperform clinicians by a large margin, discover suicide risk factors that conform to mental health knowledge, and produce models with enhanced stability.
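A minimal sketch of the stabilization idea follows: a Gaussian prior with a graph-Laplacian precision adds a w^T L w penalty to the logistic loss, pulling the weights of linked features together. The toy feature network, data, and plain gradient-descent solver are assumptions for illustration, not the paper's implementation.

```python
# Sketch: logistic regression with a feature-graph Laplacian penalty w'Lw,
# i.e. a Gaussian prior with a sparse (Laplacian) precision. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, d = 300, 20
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

A = np.zeros((d, d)); A[0, 1] = A[1, 0] = 1.0     # toy feature network: 0 -- 1
L = np.diag(A.sum(1)) - A                          # graph Laplacian

w, lam, lr = np.zeros(d), 1.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - y) / n + lam * (L @ w)       # logistic loss + Laplacian prior
    w -= lr * grad

print("weights of linked features pulled together:", w[0], w[1])
```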

42 citations


Journal ArticleDOI
TL;DR: In this article, the authors explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD.
Abstract: Considering the rising socio-economic burden of autism spectrum disorder (ASD), timely and evidence-driven public policy decision-making and communication of the latest guidelines pertaining to the treatment and management of the disorder is crucial. Yet evidence suggests that policy makers and medical practitioners do not always have a good understanding of the practices and relevant beliefs of ASD-afflicted individuals’ carers who often follow questionable recommendations and adopt advice poorly supported by scientific data. The key goal of the present work is to explore the idea that Twitter, as a highly popular platform for information exchange, could be used as a data-mining source to learn about the population affected by ASD—their behaviour, concerns, needs, etc. To this end, using a large data set of over 11 million harvested tweets as the basis for our investigation, we describe a series of experiments which examine a range of linguistic and semantic aspects of messages posted by individuals interested in ASD. Our findings, the first of their nature in the published scientific literature, strongly motivate additional research on this topic and present a methodological basis for further work.

37 citations


Proceedings Article
25 Jan 2015
TL;DR: Tensor-variate Restricted Boltzmann Machines are introduced which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables and are demonstrated on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis.
Abstract: Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Standard RBMs applied to such data would require vectorizing matrices and tensors, resulting in unnecessarily high dimensionality while destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than those of rival methods, resulting in better classification performance.
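The compactness claim can be illustrated with a back-of-the-envelope parameter count. The snippet assumes a mode-wise factorization with F factors, a plausible reading of the multiplicative-interaction scheme, and made-up dimensions; the exact TvRBM parameterization may differ.

```python
# Sketch: why a tensor-variate RBM is compact. Vectorizing a D1 x D2 x D3 input
# for K hidden units needs K * D1*D2*D3 weights; a mode-wise factorisation with
# F factors needs roughly F * (D1 + D2 + D3 + K). Dimensions are made up.
D = (30, 40, 50)   # data modes (e.g. EEG: channel x frequency x time)
K, F = 100, 20     # hidden units, factors

vectorised_rbm = K * D[0] * D[1] * D[2]
tensor_rbm = F * (sum(D) + K)
print(vectorised_rbm, "vs", tensor_rbm)   # 6000000 vs 4400
```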

32 citations


Journal ArticleDOI
TL;DR: Inspired by tree aerodynamics theory, a novel method named local variation persistence (LVP) is proposed that captures the key characteristics of swaying motions; it is posed as a convex optimization problem whose variable is the local variation.
Abstract: Dynamically changing background (dynamic background) still presents a great challenge to many motion-based video surveillance systems. In the context of event detection, it is a major source of false alarms. There is a strong need from the security industry either to detect and suppress these false alarms, or dampen the effects of background changes, so as to increase the sensitivity to meaningful events of interest. In this paper, we restrict our focus to one of the most common causes of dynamic background changes: that of swaying tree branches and their shadows under windy conditions. Considering the ultimate goal in a video analytics pipeline, we formulate a new dynamic background detection problem as a signal processing alternative to the previously described but unreliable computer vision-based approaches. Within this new framework, we directly reduce the number of false alarms by testing if the detected events are due to characteristic background motions. In addition, we introduce a new data set suitable for the evaluation of dynamic background detection. It consists of real-world events detected by a commercial surveillance system from two static surveillance cameras. The research question we address is whether dynamic background can be detected reliably and efficiently using simple motion features and in the presence of similar but meaningful events, such as loitering. Inspired by tree aerodynamics theory, we propose a novel method named local variation persistence (LVP), which captures the key characteristics of swaying motions. The method is posed as a convex optimization problem, whose variable is the local variation. We derive a computationally efficient algorithm for solving the optimization problem, the solution of which is then used to form a powerful detection statistic. On our newly collected data set, we demonstrate that the proposed LVP achieves excellent detection results and outperforms the best alternative adapted from existing art in the dynamic background literature.

30 citations


Journal ArticleDOI
TL;DR: This work further investigates these online populations through the contents of not only their posts but also their comments, finding all three features to be significantly different between Autism and Control, and between autism Personal and Community blogs.
Abstract: The Internet has provided an ever more popular platform for individuals to voice their thoughts and for like-minded people to share stories. This unintentionally leaves characteristics of individuals and communities, which are often difficult to collect in traditional studies. Individuals with autism are such a case, for whom the Internet could facilitate even more communication, given that its social-spatial distance suits a characteristic preference of individuals with autism. Previous studies examined the traces left in the posts of online autism communities (Autism) in comparison with other online communities (Control). This work further investigates these online populations through the contents of not only their posts but also their comments. We first compare the Autism and Control blogs based on three features: topics, language styles and affective information. The autism groups are then further examined, based on the same three features, by looking at their personal (Personal) and community (Community) blogs separately. Machine learning and statistical methods are used to discriminate blog contents in both cases. All three features are found to be significantly different between Autism and Control, and between autism Personal and Community. These features also show good indicative power in the prediction of autism blogs in both personal and community settings.

29 citations


Proceedings ArticleDOI
TL;DR: In this article, a temporal topic model based on a hierarchical Dirichlet process is used to track the evolution of a complex topic structure of a Twitter community using autism-related tweets.
Abstract: Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
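The bootstrap step, following URLs embedded in tweets and treating each fetched page as a document linked to its tweet, can be sketched with standard libraries. The snippet below (requests plus BeautifulSoup, a hypothetical example URL) is illustrative only and omits rate limiting, deduplication, and the HDP topic model itself.

```python
# Sketch of the bootstrap step: follow URLs found in tweets and extract the
# linked pages' text as documents for topic modelling. Illustrative only.
import re
import requests
from bs4 import BeautifulSoup

URL_RE = re.compile(r"https?://\S+")

def linked_documents(tweets, timeout=5):
    docs = []
    for tweet in tweets:
        for url in URL_RE.findall(tweet):
            try:
                html = requests.get(url, timeout=timeout).text
            except requests.RequestException:
                continue  # skip dead or slow links
            text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
            docs.append({"tweet": tweet, "url": url, "text": text})
    return docs

docs = linked_documents(["New ASD guideline: https://example.org/guideline"])
```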

Journal ArticleDOI
04 May 2015-PLOS ONE
TL;DR: It is found that regional prevalence estimates for non-communicable diseases can be reasonably predicted, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease is stressed.
Abstract: For years, we have relied on population surveys to keep track of regional public health statistics, including the prevalence of non-communicable diseases. Because of the cost and limitations of such surveys, we often do not have the up-to-date data on health outcomes of a region. In this paper, we examined the feasibility of inferring regional health outcomes from socio-demographic data that are widely available and timely updated through national censuses and community surveys. Using data for 50 American states (excluding Washington DC) from 2007 to 2012, we constructed a machine-learning model to predict the prevalence of six non-communicable disease (NCD) outcomes (four NCDs and two major clinical risk factors), based on population socio-demographic characteristics from the American Community Survey. We found that regional prevalence estimates for non-communicable diseases can be reasonably predicted. The predictions were highly correlated with the observed data, in both the states included in the derivation model (median correlation 0.88) and those excluded from the development for use as a completely separate validation sample (median correlation 0.85), demonstrating that the model had sufficient external validity to make good predictions, based on demographics alone, for areas not included in the model development. This highlights both the utility of this sophisticated approach to model development, and the vital importance of simple socio-demographic characteristics as both indicators and determinants of chronic disease.
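The validation scheme, fit on some states and test predictive correlation on held-out states, can be sketched generically. Everything below (ridge regression, synthetic features and prevalence values) is an illustrative stand-in for the paper's actual ACS/CDC data and model.

```python
# Sketch: predict state-level disease prevalence from socio-demographic
# features and validate on held-out states (synthetic data, not ACS/CDC).
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 15))                       # 50 states x demographics
prevalence = X @ rng.normal(size=15) + rng.normal(scale=0.5, size=50)

train, test = np.arange(40), np.arange(40, 50)      # held-out "states"
model = Ridge(alpha=1.0).fit(X[train], prevalence[train])
r, _ = pearsonr(model.predict(X[test]), prevalence[test])
print("correlation on held-out states:", round(r, 2))
```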

Proceedings ArticleDOI
25 Aug 2015
TL;DR: A novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step to bootstrap more sophisticated data collection from directly related but much richer content sources and it is demonstrated that valuable information can be collected by following URLs included in tweets.
Abstract: Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140 character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.

Posted Content
TL;DR: In this paper, the authors describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension, which does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture.
Abstract: In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, and splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) - an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we will make freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.
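The HDP topic discovery itself does not fit in a short snippet, but the temporal similarity graph, linking topics in consecutive epochs whose word distributions are sufficiently close, can be sketched as below. The cosine similarity and threshold are illustrative choices, not necessarily those of the paper.

```python
# Sketch: build a temporal similarity graph over per-epoch topics.
# Edges connect similar topics in consecutive epochs; branching then encodes
# splitting/merging, and missing edges encode disappearance/emergence.
import numpy as np

def cosine(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def similarity_graph(epoch_topics, threshold=0.6):
    """epoch_topics: list of (n_topics_t x vocab) arrays, one per epoch."""
    edges = []
    for t in range(len(epoch_topics) - 1):
        for i, p in enumerate(epoch_topics[t]):
            for j, q in enumerate(epoch_topics[t + 1]):
                if cosine(p, q) >= threshold:
                    edges.append(((t, i), (t + 1, j)))
    return edges

# toy usage: three epochs of five random topics over a 50-word vocabulary
topics = [np.random.default_rng(t).dirichlet(np.ones(50), size=5) for t in range(3)]
print(len(similarity_graph(topics)), "cross-epoch edges")
```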

Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the running quantile of a data stream when the memory for storing observations is limited is addressed, and two novel algorithms that exploit the proposed principle in different ways are introduced.
Abstract: The need to estimate a particular quantile of a distribution is an important problem that frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many semiautomatic surveillance analytics systems that detect abnormalities in closed-circuit television footage using statistical models of low-level motion features. In this paper, we specifically address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We make several major contributions: 1) we highlight the limitations of approaches previously described in the literature that make them unsuitable for nonstationary streams; 2) we describe a novel principle for the utilization of the available storage space; 3) we introduce two novel algorithms that exploit the proposed principle in different ways; and 4) we present a comprehensive evaluation and analysis of the proposed algorithms and the existing methods in the literature on both synthetic data sets and three large real-world streams acquired in the course of operation of an existing commercial surveillance system. Our findings convincingly demonstrate that both of the proposed methods are highly successful and vastly outperform the existing alternatives. We show that the better of the two algorithms (data-aligned histogram) exhibits far superior performance in comparison with the previously described methods, achieving more than 10 times lower estimate errors on real-world data, even when its available working memory is an order of magnitude smaller.
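To make the memory constraint concrete, here is the simplest fixed-memory baseline: a capped buffer maintained by reservoir sampling. This is not the paper's data-aligned histogram, which adapts its bins to the data and is reported to perform far better on non-stationary streams.

```python
# Sketch: running quantile with a fixed-size buffer. A naive baseline that
# makes the constraint concrete; the paper's data-aligned histogram is far
# more accurate on non-stationary streams.
import bisect
import random

class BufferedQuantile:
    def __init__(self, capacity, q):
        self.capacity, self.q, self.buf, self.seen = capacity, q, [], 0

    def update(self, x):
        self.seen += 1                     # reservoir sampling keeps the buffer
        if len(self.buf) < self.capacity:  # a uniform sample of the stream
            bisect.insort(self.buf, x)
        elif random.random() < self.capacity / self.seen:
            self.buf.pop(random.randrange(len(self.buf)))
            bisect.insort(self.buf, x)

    def estimate(self):
        return self.buf[int(self.q * (len(self.buf) - 1))]

est = BufferedQuantile(capacity=100, q=0.95)
for _ in range(10000):
    est.update(random.gauss(0, 1))
print("0.95 quantile estimate:", est.estimate())
```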

Journal ArticleDOI
TL;DR: TOBY may make a useful contribution to early intervention programming for children with ASD, delivering high rates of appropriate learning opportunities; further research evaluating its efficacy in relation to independent indicators of functioning is warranted.
Abstract: Purpose: To investigate use patterns and learning outcomes associated with the use of Therapy Outcomes By You (TOBY) Playpad, an early intervention iPad application. Methods: Participants were 33 families with a child with an autism spectrum disorder (ASD) aged 16 years or less, and with a diagnosis of autism or pervasive developmental disorder – not otherwise specified, and no secondary diagnoses. Families were provided with TOBY and asked to use it for 4–6 weeks, without further prompting or coaching. Dependent variables included participant use patterns and initial indicators of child progress. Results: Twenty-three participants engaged extensively with TOBY, being exposed to at least 100 complete learn units and completing between 17% and 100% of the curriculum. Conclusions: TOBY may make a useful contribution to early intervention programming for children with ASD, delivering high rates of appropriate learning opportunities. Further research evaluating the efficacy of TOBY in relation to independent indicators of functioning is warranted.

Book ChapterDOI
19 May 2015
TL;DR: A framework based on discretization of time into epochs, epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and a temporal similarity graph which allows for the modelling of complex topic changes is proposed.
Abstract: In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) – an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature.

Posted Content
TL;DR: In this paper, the authors describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension, which does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture.
Abstract: In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on two medical literature corpora concerned with the autism spectrum disorder (ASD) and the metabolic syndrome (MetS) -- both increasingly important research subjects with significant social and healthcare consequences. In addition to the collected ASD and metabolic syndrome literature corpora which we made freely available, our contribution also includes an extensive empirical analysis of the proposed framework. We describe a detailed and careful examination of the effects that our algorithm's free parameters have on its output, and discuss the significance of the findings both in the context of the practical application of our algorithm as well as in the context of the existing body of work on temporal topic analysis. Our quantitative analysis is followed by several qualitative case studies highly relevant to the current research on ASD and MetS, on which our algorithm is shown to capture well the actual developments in these fields.

Journal ArticleDOI
TL;DR: This paper demonstrates that the proposed method already outperforms the state-of-the-art for pair-wise registration, achieving greater accuracy and reliability while reducing the computational cost of the task, and that increasing the number of available images in a set consistently reduces the average registration error.

Journal ArticleDOI
TL;DR: The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
Abstract: Background The WHO framework for non-communicable disease (NCD) describes risks and outcomes comprising the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity behaviour as a proxy for chronic disease risk factors. Methods Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to Centers for Disease Control and Prevention (CDC) measured risk factor/disease prevalence. Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. Results For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean difference between predicted and measured differences by State ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. Conclusions The high predictive validity of web search activity for NCD risk has potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.

Journal ArticleDOI
TL;DR: A prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture is proposed, and it is shown that by replacing the HDP with a dynamic HDP prior, a more compact set of distributions can be learnt.
Abstract: Medical interventions critically determine clinical outcomes. But prediction models either ignore interventions or dilute impact by building a single prediction rule by amalgamating interventions with other features. One rule across all interventions may not capture differential effects. Also, interventions change with time as innovations are made, requiring prediction models to evolve over time. To address these gaps, we propose a prediction framework that explicitly models interventions by extracting a set of latent intervention groups through a Hierarchical Dirichlet Process (HDP) mixture. Data are split into temporal windows and, for each window, a separate distribution over the intervention groups is learnt. This ensures that the model evolves with changing interventions. The outcome is modeled as conditional on both the latent grouping and the patients' condition through a Bayesian logistic regression. Learning distributions for each time window results in an over-complex model when interventions do not change in every window. We show that by replacing the HDP with a dynamic HDP prior, a more compact set of distributions can be learnt. Experiments performed on two hospital datasets demonstrate the superiority of our framework over many existing clinical and traditional prediction frameworks.
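A rough sketch of the pipeline shape follows, with scikit-learn's Dirichlet-process Gaussian mixture standing in for the paper's (dynamic) HDP and synthetic data throughout; it shows only the per-window grouping and the outcome model conditioned on both the latent group and the patient's condition.

```python
# Sketch: per-window Dirichlet-process mixture over intervention vectors,
# with the outcome modelled conditional on the inferred group. The DP mixture
# stands in for the paper's (dynamic) HDP; all data are synthetic.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_interv = rng.normal(size=(600, 8))              # intervention features
X_cond = rng.normal(size=(600, 5))                # patient condition
y = (X_cond[:, 0] > 0).astype(int)
windows = np.repeat([0, 1, 2], 200)               # temporal windows

groups = np.empty(600, dtype=int)
for w in np.unique(windows):                      # separate grouping per window
    dp = BayesianGaussianMixture(
        n_components=10, weight_concentration_prior_type="dirichlet_process",
        random_state=0)
    groups[windows == w] = dp.fit_predict(X_interv[windows == w])

# outcome conditional on both the latent grouping and the patient's condition
features = np.column_stack([X_cond, np.eye(10)[groups]])
print(LogisticRegression(max_iter=1000).fit(features, y).score(features, y))
```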

Journal ArticleDOI
TL;DR: Using a cohort of patients with heart failure, this work demonstrates better feature stability and goodness-of-fit through feature-graph stabilization, achieved by introducing Laplacian-based regularization into a regression model.
Abstract: We investigate feature stability in the context of clinical prognosis derived from high-dimensional electronic medical records. To reduce variance in the selected features that are predictive, we introduce Laplacian-based regularization into a regression model. The Laplacian is derived on a feature graph that captures both the temporal and hierarchic relations between hospital events, diseases, and interventions. Using a cohort of patients with heart failure, we demonstrate better feature stability and goodness-of-fit through feature graph stabilization.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: This study presents an analysis of online LiveJournal communities with and without mental health-related conditions, including depression and autism, and reveals useful insights into hyper-group detection of online mental health-related communities.
Abstract: People are increasingly using social media, especially online communities, to discuss mental health issues and seek support. Understanding the topics, interaction, sentiment and clustering structures of these communities informs important aspects of mental health. It can potentially add knowledge about the underlying cognitive dynamics, mood swing patterns, shared interests, and interaction. There has been growing research interest in analyzing online mental health communities; however, sentiment analysis of these communities has been largely under-explored. This study presents an analysis of online LiveJournal communities with and without mental health-related conditions, including depression and autism. Latent topics for mood tags, affective words, and generic words in the content of the posts made in these communities were learned using nonparametric topic modelling. These representations were then input into nonparametric clustering to discover meta-groups among the communities. The best clustering performance was achieved with the latent mood-based representation of the communities. The study also found significant differences in the usage of latent topics for mood tags and affective features between online communities with and without affective disorders. The findings reveal useful insights into hyper-group detection of online mental health-related communities.

Journal ArticleDOI
TL;DR: This paper revisits the abnormality detection problem through the lens of Bayesian nonparametric (BNP) methods and develops a novel usage of BNP methods for this problem, employing the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery.
Abstract: In data science, anomaly detection is the process of identifying items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision community and in security management, discovering suspicious events is the key issue in abnormality detection for video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. However, the crucial challenge is that the number of coherent segments in the surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametric (BNP) methods and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.

Proceedings Article
01 Jan 2015
TL;DR: This paper presents two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation), and further develops a streaming learning framework for the popular Dirichlet process mixture models.
Abstract: Bayesian nonparametric models are theoretically suitable for learning streaming data due to their complexity relaxation to the volume of observed data. However, most of the existing variational inference algorithms are not applicable to streaming applications since they require truncation on variational distributions. In this paper, we present two truncation-free variational algorithms, one for mixed-membership inference called TFVB (truncation-free variational Bayes), and the other for hard clustering inference called TFME (truncation-free maximization expectation). With these algorithms, we further develop a streaming learning framework for the popular Dirichlet process mixture (DPM) models. Our experiments demonstrate the usefulness of our framework on both synthetic and real-world data.

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This work proposes a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix.
Abstract: Feature selection is an important step in building predictive models for most real-world problems. One of the popular methods in feature selection is Lasso. However, it shows instability in selecting features when dealing with correlated features. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness, which is captured via a feature covariance matrix. Besides modeling positive feature correlations, our method can also identify negative correlations between features. We propose a convex formulation for our model along with an alternating optimization algorithm that can learn the weights of the features as well as the relationship between them. Using both synthetic and real-world data, we show that the proposed method is more stable than Lasso and many state-of-the-art shrinkage and feature selection methods. Also, its predictive performance is comparable to other methods.
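The flavor of the model, an l1 penalty plus a quadratic feature-relatedness term w^T C w, can be sketched with proximal gradient descent (ISTA). Taking C as the empirical feature covariance is a simplification made here for illustration; the paper learns the feature relationships jointly with the weights.

```python
# Sketch: Lasso plus a quadratic feature-relatedness penalty w'Cw, solved by
# proximal gradient (ISTA). C is the empirical feature covariance here for
# illustration; the paper learns the relationship matrix jointly.
import numpy as np

def covariance_lasso(X, y, lam1=0.1, lam2=0.1, iters=500):
    n, d = X.shape
    C = np.cov(X, rowvar=False)
    w = np.zeros(d)
    # step size from a Lipschitz bound on the smooth part's gradient
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + lam2 * np.linalg.norm(C, 2))
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n + lam2 * (C @ w)            # smooth part
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * lam1, 0)  # soft-threshold
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] - X[:, 1] + rng.normal(size=100)
print("selected features:", np.flatnonzero(covariance_lasso(X, y)))
```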

Posted Content
TL;DR: The findings convincingly demonstrate that the proposed method is highly successful and vastly outperforms the existing alternatives, especially when the target quantile is high-valued and the available buffer capacity is severely limited.
Abstract: The need to estimate a particular quantile of a distribution is an important problem which frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many semi-automatic surveillance analytics systems which detect abnormalities in closed-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper we specifically address the problem of estimating the running quantile of a data stream with non-stationary stochasticity when the memory for storing observations is limited. We make several major contributions: (i) we derive an important theoretical result which shows that the change in the quantile of a stream is constrained regardless of the stochastic properties of data, (ii) we describe a set of high-level design goals for an effective estimation algorithm that emerge as a consequence of our theoretical findings, (iii) we introduce a novel algorithm which implements the aforementioned design goals by retaining a sample of data values in a manner adaptive to changes in the distribution of data and progressively narrowing down its focus in the periods of quasi-stationary stochasticity, and (iv) we present a comprehensive evaluation of the proposed algorithm and compare it with the existing methods in the literature on both synthetic data sets and three large real-world streams acquired in the course of operation of an existing commercial surveillance system. Our findings convincingly demonstrate that the proposed method is highly successful and vastly outperforms the existing alternatives, especially when the target quantile is high-valued and the available buffer capacity is severely limited.

Book ChapterDOI
30 Nov 2015
TL;DR: This work proposes a new method to increase the stability of l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix.
Abstract: The support vector machine (SVM) is a popular method for classification, well known for finding the maximum-margin hyperplane. Combining SVM with an l1-norm penalty further enables it to simultaneously perform feature selection and margin maximization within a single framework. However, the l1-norm SVM shows instability in selecting features in the presence of correlated features. We propose a new method to increase the stability of the l1-norm SVM by encouraging similarities between feature weights based on feature correlations, which are captured via a feature covariance matrix. Our proposed method can capture both positive and negative correlations between features. We formulate the model as a convex optimization problem and propose a solution based on alternating minimization. Using both synthetic and real-world datasets, we show that our model achieves better stability and classification accuracy compared to several state-of-the-art regularized classification methods.

Proceedings ArticleDOI
01 Jan 2015
TL;DR: This paper proposes a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering, and derives a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation.
Abstract: In many real-world computer vision applications, such as multi-camera surveillance, the objects of interest are captured by visual sensors concurrently, resulting in multi-view data. These views usually provide complementary information to each other. One recent and powerful computer vision method for clustering is sparse subspace clustering (SSC); however, it was not designed for multi-view data, which break down its linear separability assumption. To integrate complementary information between views, multi-view clustering algorithms are required to improve clustering performance. In this paper, we propose a novel multi-view subspace clustering method that searches for a unified latent structure as a global affinity matrix in subspace clustering. Because it integrates the affinity matrices of each view, this global affinity matrix can best represent the relationships between clusters, helping us achieve better performance on face clustering. We derive a provably convergent algorithm based on the alternating direction method of multipliers (ADMM) framework, which is computationally efficient, to solve the formulation. We demonstrate that this formulation outperforms state-of-the-art alternatives on challenging multi-view face datasets.
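The ADMM formulation is not reproduced here; the sketch below only conveys the pipeline shape: build a per-view affinity, fuse the views into one global affinity matrix, and spectrally cluster it. The RBF affinity stands in for true sparse self-representation, and simple averaging stands in for the learned unified structure.

```python
# Sketch: fuse per-view affinities into a unified affinity matrix and apply
# spectral clustering. An RBF affinity stands in for sparse subspace
# self-representation, and averaging stands in for the ADMM fusion.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
labels_true = np.repeat([0, 1, 2], 30)             # three "identities"
views = [rng.normal(size=(90, 12)) + labels_true[:, None] for _ in range(3)]

unified = np.mean([rbf_kernel(v) for v in views], axis=0)   # global affinity
pred = SpectralClustering(n_clusters=3, affinity="precomputed",
                          random_state=0).fit_predict(unified)
print("predicted clusters (first 10):", pred[:10])
```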

Book ChapterDOI
19 May 2015
TL;DR: This work presents a Bayesian nonparametric framework for multilevel regression in which individuals, comprising observations and outcomes, are organized into groups, and uses a Dirichlet Process with a product-space base measure in a nested structure to model the group-level context distribution.
Abstract: Regression is at the cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare, to name a few. We present a Bayesian nonparametric framework for multilevel regression in which individuals, comprising observations and outcomes, are organized into groups. Furthermore, our approach exploits additional group-specific context observations: we use a Dirichlet Process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model.