
Showing papers on "Graph (abstract data type) published in 2010"


Proceedings ArticleDOI
06 Jun 2010
TL;DR: A model for processing large graphs, designed for efficient, scalable, and fault-tolerant implementation on clusters of thousands of commodity computers, whose implied synchronicity makes reasoning about programs easier.
Abstract: Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs - in some cases billions of vertices, trillions of edges - poses challenges to their efficient processing. In this paper we present a computational model suitable for this task. Programs are expressed as a sequence of iterations, in each of which a vertex can receive messages sent in the previous iteration, send messages to other vertices, and modify its own state and that of its outgoing edges or mutate graph topology. This vertex-centric approach is flexible enough to express a broad set of algorithms. The model has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier. Distribution-related details are hidden behind an abstract API. The result is a framework for processing large graphs that is expressive and easy to program.

3,840 citations
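
To make the vertex-centric model concrete, here is a minimal Python sketch of a synchronous superstep loop propagating the maximum value through a graph. It mirrors the message-passing and halting behaviour the abstract describes, but it is an illustration, not Google's actual API; the graph and values are invented.

```python
def pregel_max(graph, value):
    """graph: {v: [out_neighbours]}; value: {v: number}. Converges to the max."""
    # Superstep 0: every vertex announces its value to its out-neighbours.
    inbox = {v: [] for v in graph}
    for v, nbrs in graph.items():
        for u in nbrs:
            inbox[u].append(value[v])
    # Later supersteps: a vertex is active only if it received messages, and it
    # sends messages only when its state actually changed.
    while any(inbox.values()):
        next_inbox = {v: [] for v in graph}
        for v, msgs in inbox.items():
            if msgs and max(msgs) > value[v]:
                value[v] = max(msgs)
                for u in graph[v]:
                    next_inbox[u].append(value[v])
        inbox = next_inbox          # the barrier between supersteps
    return value

print(pregel_max({1: [2], 2: [3], 3: [1]}, {1: 5, 2: 9, 3: 2}))  # all become 9
```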



Journal ArticleDOI
TL;DR: The network-based statistic (NBS) is introduced, its power is evaluated using receiver operating characteristic (ROC) curves, and its utility is demonstrated on a real case-control study of people with schizophrenia for whom resting-state functional MRI data were acquired.

2,042 citations
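
A rough sketch of the NBS idea as commonly described: compare each connection between groups, keep supra-threshold edges, and measure the connected components they form. In the full procedure, the size of the largest component is tested against a null distribution obtained by permuting group labels (the permutation loop is omitted here); the data shapes and threshold below are assumptions.

```python
import numpy as np
import networkx as nx
from scipy import stats

def nbs_component_sizes(group_a, group_b, t_thresh):
    """group_*: (subjects, n, n) connectivity matrices. Returns the sizes of
    connected components formed by edges whose group difference exceeds the
    threshold; the largest size is the network-based statistic."""
    n = group_a.shape[1]
    t, _ = stats.ttest_ind(group_a, group_b, axis=0)   # edge-wise t-test
    g = nx.empty_graph(n)
    for i, j in zip(*np.triu_indices(n, k=1)):
        if abs(t[i, j]) > t_thresh:
            g.add_edge(i, j)
    return sorted((len(c) for c in nx.connected_components(g)), reverse=True)

rng = np.random.default_rng(0)
a = rng.normal(size=(12, 20, 20)); a += a.transpose(0, 2, 1)  # symmetric noise
b = rng.normal(size=(12, 20, 20)); b += b.transpose(0, 2, 1)
print(nbs_component_sizes(a, b, t_thresh=3.0))
```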


Proceedings ArticleDOI
13 Jun 2010
TL;DR: To recognize human actions from sequences of depth maps, an action graph is employed to explicitly model the dynamics of the actions, and a bag of 3D points characterizes a set of salient postures corresponding to the nodes of the action graph.
Abstract: This paper presents a method to recognize human actions from sequences of depth maps. Specifically, we employ an action graph to model explicitly the dynamics of the actions and a bag of 3D points to characterize a set of salient postures that correspond to the nodes in the action graph. In addition, we propose a simple, but effective projection based sampling scheme to sample the bag of 3D points from the depth maps. Experimental results have shown that over 90% recognition accuracy was achieved by sampling only about 1% of the 3D points from the depth maps. Compared to 2D-silhouette-based recognition, the recognition errors were halved. In addition, we demonstrate the potential of the bag-of-points posture model to deal with occlusions through simulation.

1,437 citations


Journal ArticleDOI
TL;DR: An introductory description of the graph-based SLAM problem is provided, and a state-of-the-art solution is discussed that is based on least-squares error minimization and exploits the structure of the SLAM problem during optimization.
Abstract: Being able to build a map of the environment and to simultaneously localize within this map is an essential skill for mobile robots navigating in unknown environments in absence of external referencing systems such as GPS. This so-called simultaneous localization and mapping (SLAM) problem has been one of the most popular research topics in mobile robotics for the last two decades and efficient approaches for solving this task have been proposed. One intuitive way of formulating SLAM is to use a graph whose nodes correspond to the poses of the robot at different points in time and whose edges represent constraints between the poses. The latter are obtained from observations of the environment or from movement actions carried out by the robot. Once such a graph is constructed, the map can be computed by finding the spatial configuration of the nodes that is mostly consistent with the measurements modeled by the edges. In this paper, we provide an introductory description to the graph-based SLAM problem. Furthermore, we discuss a state-of-the-art solution that is based on least-squares error minimization and exploits the structure of the SLAM problems during optimization. The goal of this tutorial is to enable the reader to implement the proposed methods from scratch.

1,103 citations
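
The least-squares core of graph-based SLAM is easiest to see in one dimension, where the error terms are linear and a single Gauss-Newton step solves the problem exactly. A minimal sketch under that simplification, with invented measurements:

```python
import numpy as np

def solve_pose_graph_1d(n, edges):
    """edges: (i, j, z) with z the measured offset x_j - x_i. Pose 0 is
    anchored at the origin to fix the gauge freedom. H and b are the
    information matrix and vector of the least-squares problem."""
    H = np.zeros((n, n))
    b = np.zeros(n)
    for i, j, z in edges:            # residual e = (x_j - x_i) - z
        H[i, i] += 1.0; H[j, j] += 1.0
        H[i, j] -= 1.0; H[j, i] -= 1.0
        b[i] -= z
        b[j] += z
    x = np.zeros(n)
    x[1:] = np.linalg.solve(H[1:, 1:], b[1:])   # solve with x[0] fixed at 0
    return x

# A chain 0->1->2 plus a slightly inconsistent loop closure 0->2: the solver
# spreads the 0.3 of inconsistency evenly over the three constraints.
print(solve_pose_graph_1d(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.3)]))
```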


Proceedings ArticleDOI
13 Jun 2010
TL;DR: An efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm that generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity.
Abstract: We present an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a “region graph” over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high quality segmentations, which are temporally coherent with stable region boundaries, and allows subsequent applications to choose from varying levels of granularity. We further improve segmentation quality by using dense optical flow to guide temporal connections in the initial graph. We also propose two novel approaches to improve the scalability of our technique: (a) a parallel out-of-core algorithm that can process volumes much larger than an in-core algorithm, and (b) a clip-based processing algorithm that divides the video into overlapping clips in time, and segments them successively while enforcing consistency. We demonstrate hierarchical segmentations on video shots as long as 40 seconds, and even support a streaming mode for arbitrarily long videos, albeit without the ability to process them hierarchically.

772 citations


Journal ArticleDOI
TL;DR: It is demonstrated that schizophrenia involves an aberrant topology of the structural infrastructure of the brain network, which suggests that schizophrenia patients have a less strongly globally integrated structural brain network with a reduced central role for key frontal hubs.
Abstract: Brain regions are not independent. They are interconnected by white matter tracts, together forming one integrative complex network. The topology of this network is crucial for efficient information integration between brain regions. Here, we demonstrate that schizophrenia involves an aberrant topology of the structural infrastructure of the brain network. Using graph theoretical analysis, complex structural brain networks of 40 schizophrenia patients and 40 healthy controls were examined. Diffusion tensor imaging was used to reconstruct the white matter connections of the brain network, with the strength of the connections defined as the level of myelination of the tracts as measured by means of magnetization transfer ratio magnetic resonance imaging. Patients displayed a preserved overall small-world network organization, but focusing on specific brain regions and their capacity to communicate with other regions of the brain revealed significantly longer node-specific path lengths (higher L) of frontal and temporal regions, especially of bilateral inferior/superior frontal cortex and temporal pole regions. These findings suggest that schizophrenia impacts global network connectivity of frontal and temporal brain regions. Furthermore, frontal hubs of patients showed a significant reduction of betweenness centrality, suggesting a less central hub role of these regions in the overall network structure. Together, our findings suggest that schizophrenia patients have a less strongly globally integrated structural brain network with a reduced central role for key frontal hubs, resulting in a limited structural capacity to integrate information across brain regions.

625 citations


Journal ArticleDOI
01 Sep 2010
TL;DR: Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions by up to 30%.
Abstract: We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner). The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E), to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions up to 30%.

602 citations
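
A toy version of Schism's first phase, assuming networkx: one node per tuple, edge weights counting how often two tuples are touched by the same transaction, then a balanced split. Kernighan-Lin bisection stands in for the METIS-style k-way partitioner the system uses, and the workload below is invented.

```python
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def partition_workload(transactions):
    """transactions: iterable of sets of tuple ids accessed together."""
    g = nx.Graph()
    for txn in transactions:
        g.add_nodes_from(txn)
        for a, b in combinations(sorted(txn), 2):
            if g.has_edge(a, b):
                g[a][b]["weight"] += 1     # co-access count = edge weight
            else:
                g.add_edge(a, b, weight=1)
    return kernighan_lin_bisection(g, weight="weight")

workload = [{1, 2}, {1, 2}, {3, 4}, {3, 4}, {2, 3}]
print(partition_workload(workload))        # e.g. ({1, 2}, {3, 4}): only the
                                           # single {2, 3} txn stays distributed
```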


Journal ArticleDOI
TL;DR: This review will focus mainly on recent findings concerning graph theoretical analysis of human brain networks with a variety of imaging modalities, exploring whether graph-based brain network analysis could yield reliable biomarkers for disease diagnosis and treatment.
Abstract: Purpose of review In recent years, there has been an explosion of studies on network modeling of brain connectivity. This review will focus mainly on recent findings concerning graph theoretical analysis of human brain networks with a variety of imaging modalities, including structural MRI, diffusion MRI, functional MRI, and EEG/MEG. Recent findings Recent studies have utilized graph theoretical approaches to investigate the organizational principles of brain networks. These studies have consistently shown many important statistical properties underlying the topological organization of the human brain, including modularity, small-worldness, and the existence of highly connected network hubs. Importantly, these quantifiable network properties were found to change during normal development, aging, and various neurological and neuropsychiatric diseases such as Alzheimer's disease and schizophrenia. Moreover, several studies have also suggested that these network properties correlate with behavioral and genetic factors. Summary The exciting research regarding graph theoretical analysis of brain connectivity yields truly integrative and comprehensive descriptions of the structural and functional organization of the human brain, which provides important implications for health and disease. Future research will most likely involve integrative models of brain structural and functional connectivity with multimodal neuroimaging data, exploring whether graph-based brain network analysis could yield reliable biomarkers for disease diagnosis and treatment.

601 citations


Journal ArticleDOI
01 Oct 2010
TL;DR: A novel learnable proximity measure is described which instead uses one weight per edge label sequence: proximity is defined by a weighted combination of simple “path experts”, each corresponding to following a particular sequence of labeled edges.
Abstract: Scientific literature with rich metadata can be represented as a labeled directed graph. This graph representation enables a number of scientific tasks such as ad hoc retrieval or named entity recognition (NER) to be formulated as typed proximity queries in the graph. One popular proximity measure is called Random Walk with Restart (RWR), and much work has been done on the supervised learning of RWR measures by associating each edge label with a parameter. In this paper, we describe a novel learnable proximity measure which instead uses one weight per edge label sequence: proximity is defined by a weighted combination of simple "path experts", each corresponding to following a particular sequence of labeled edges. Experiments on eight tasks in two subdomains of biology show that the new learning method significantly outperforms the RWR model (both trained and untrained). We also extend the method to support two additional types of experts to model intrinsic properties of entities: query-independent experts, which generalize the PageRank measure, and popular entity experts which allow rankings to be adjusted for particular entities that are especially important.

582 citations
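
In matrix form, a "path expert" is just a product of per-label (row-normalised) adjacency matrices, and proximity is a weighted sum of the experts' outputs. A toy numeric sketch, with an invented graph and fixed weights where the paper learns them:

```python
import numpy as np

def normalise(m):
    s = m.sum(axis=1, keepdims=True)
    return np.divide(m, s, out=np.zeros_like(m), where=s > 0)

# One transition matrix per edge label (3 nodes, invented links).
A = {"authored": normalise(np.array([[0., 1, 1], [0, 0, 0], [0, 0, 0]])),
     "cites":    normalise(np.array([[0., 0, 0], [0, 0, 1], [0, 0, 0]]))}

experts = [("authored",), ("authored", "cites")]   # labelled edge sequences
weights = [0.3, 0.7]                               # one weight per sequence

def proximity(query_dist):
    score = np.zeros(len(query_dist))
    for w, path in zip(weights, experts):
        walk = query_dist.copy()
        for label in path:
            walk = walk @ A[label]     # a random-walk step along this label
        score += w * walk
    return score

print(proximity(np.array([1.0, 0, 0])))  # proximity of every node to node 0
```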


Journal ArticleDOI
TL;DR: This article reports on an approach based on supervised learning to automatically infer users' transportation modes, including driving, walking, taking a bus and riding a bike, from raw GPS logs, using a change-point-based segmentation method and a Decision Tree-based inference model.
Abstract: User mobility has given rise to a variety of Web applications, in which the global positioning system (GPS) plays many important roles in bridging between these applications and end users. As a kind of human behavior, transportation modes, such as walking and driving, can provide pervasive computing systems with more contextual information and enrich a user's mobility with informative knowledge. In this article, we report on an approach based on supervised learning to automatically infer users' transportation modes, including driving, walking, taking a bus and riding a bike, from raw GPS logs. Our approach consists of three parts: a change-point-based segmentation method, an inference model and a graph-based post-processing algorithm. First, we propose a change-point-based segmentation method to partition each GPS trajectory into separate segments of different transportation modes. Second, from each segment, we identify a set of sophisticated features, which are not affected by differing traffic conditions (e.g., a person's direction when in a car is constrained more by the road than any change in traffic conditions). These features are then fed to a generative inference model to classify the segments of different modes. Third, we conduct graph-based post-processing to further improve the inference performance. This post-processing algorithm considers both the commonsense constraints of the real world and typical user behaviors based on locations in a probabilistic manner. The advantages of our method over the related works include three aspects. (1) Our approach can effectively segment trajectories containing multiple transportation modes. (2) Our work mined the location constraints from user-generated GPS logs, while being independent of additional sensor data and map information like road networks and bus stops. (3) The model learned from the dataset of some users can be applied to infer GPS data from others. Using the GPS logs collected by 65 people over a period of 10 months, we evaluated our approach via a set of experiments. As a result, based on the change-point-based segmentation method and Decision Tree-based inference model, we achieved prediction accuracy greater than 71 percent. Further, the graph-based post-processing algorithm improved the performance by an additional 4 percent.
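
A heavily stripped-down version of the pipeline for illustration: segment a velocity sequence at walk/non-walk change points, extract per-segment features, and classify with a decision tree. The threshold, features, and labels are assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def segments(velocities, walk_speed=2.5):
    """Cut the trajectory wherever velocity crosses a walking threshold,
    a crude stand-in for the paper's change-point segmentation."""
    cuts = [0] + [i for i in range(1, len(velocities))
                  if (velocities[i] > walk_speed) != (velocities[i - 1] > walk_speed)]
    cuts.append(len(velocities))
    return [velocities[a:b] for a, b in zip(cuts, cuts[1:])]

def features(seg):
    v = np.asarray(seg, dtype=float)
    return [v.mean(), v.max(), v.std()]   # the paper uses richer, traffic-robust features

print(segments([1, 2, 8, 9, 1]))          # -> [[1, 2], [8, 9], [1]]

# Hypothetical labelled segments: 0 = walk, 1 = bus, 2 = drive.
X = [features(s) for s in ([1, 2, 1], [8, 12, 6, 9], [15, 22, 18])]
clf = DecisionTreeClassifier(random_state=0).fit(X, [0, 1, 2])
print(clf.predict([features([2, 1, 2])]))  # -> [0], i.e. walking
```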

Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper studies a query-dependent variant of the community-detection problem, called the community-search problem: given a graph G and a set of query nodes in the graph, find a subgraph of G that contains the query nodes and is densely connected; an optimum greedy algorithm is developed for the proposed density measure.
Abstract: A lot of research in graph mining has been devoted to the discovery of communities. Most of the work has focused on the scenario where communities need to be discovered with only reference to the input graph. However, for many interesting applications one is interested in finding the community formed by a given set of nodes. In this paper we study a query-dependent variant of the community-detection problem, which we call the community-search problem: given a graph G, and a set of query nodes in the graph, we seek to find a subgraph of G that contains the query nodes and is densely connected. We motivate a measure of density based on minimum degree and distance constraints, and we develop an optimum greedy algorithm for this measure. We proceed by characterizing a class of monotone constraints and we generalize our algorithm to compute optimum solutions satisfying any set of monotone constraints. Finally, we modify the greedy algorithm and present two heuristic algorithms that find communities of size no greater than a specified upper bound. Our experimental evaluation on real datasets demonstrates the efficiency of the proposed algorithms and the quality of the solutions we obtain.
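
The greedy algorithm for the minimum-degree measure is short enough to sketch with networkx: repeatedly delete a minimum-degree non-query vertex and keep the best intermediate subgraph still containing (and connecting) the query nodes. Connectivity handling is simplified relative to the paper.

```python
import networkx as nx

def community_search(g, query):
    h = g.copy()
    best, best_score = None, -1
    while True:
        # Restrict to the component that still holds all query vertices.
        comp = next((c for c in nx.connected_components(h) if set(query) <= c), None)
        if comp is None:
            break
        h = h.subgraph(comp).copy()
        score = min(dict(h.degree()).values())   # minimum degree = density measure
        if score > best_score:
            best, best_score = h.copy(), score
        removable = [v for v in h if v not in query]
        if not removable:
            break
        h.remove_node(min(removable, key=h.degree))
    return best

g = nx.karate_club_graph()
print(sorted(community_search(g, query={0, 33})))  # community around nodes 0 and 33
```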

Journal ArticleDOI
TL;DR: Investigating how variations in parcellation templates affect key graph analytic measures of functional brain organization using resting-state fMRI in 30 healthy volunteers found that gross inferences regarding network topology were robust to the template used, but that both absolute values of, and individual differences in, specific parameters such as path length, clustering, small-worldness, and degree distribution descriptors varied considerably.
Abstract: Graph analysis has become an increasingly popular tool for characterizing topological properties of brain connectivity networks. Within this approach, the brain is modeled as a graph comprising N nodes connected by M edges. In functional magnetic resonance imaging (fMRI) studies, the nodes typically represent brain regions and the edges some measure of interaction between them. These nodes are commonly defined using a variety of regional parcellation templates, which can vary both in the volume sampled by each region, and the number of regions parcellated. Here, we sought to investigate how such variations in parcellation templates affect key graph analytic measures of functional brain organization using resting-state fMRI in thirty healthy volunteers. Seven different parcellation resolutions (84, 91, 230, 438, 890, 1314 and 4320 regions) were investigated. We found that gross inferences regarding network topology, such as whether the brain is small-world or scale-free, were robust to the template used, but that both absolute values of, and individual differences in, specific parameters such as path length, clustering, small-worldness and degree distribution descriptors varied considerably across the resolutions studied. These findings underscore the need to consider the effect that a specific parcellation approach has on graph analytic findings in human fMRI studies, and indicate that results obtained using different templates may not be directly comparable.
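
A rough illustration of the paper's point: the same statistics computed on graphs of different sizes (standing in for parcellations at different resolutions) shift in absolute value even when the qualitative small-world verdict is stable. Synthetic small-world graphs replace thresholded fMRI networks here, and all parameters are invented, not the paper's pipeline.

```python
import networkx as nx

def graph_metrics(n):
    g = nx.connected_watts_strogatz_graph(n, k=10, p=0.1, seed=0)
    rand = nx.gnm_random_graph(n, g.number_of_edges(), seed=0)
    rand = rand.subgraph(max(nx.connected_components(rand), key=len))
    C, L = nx.average_clustering(g), nx.average_shortest_path_length(g)
    Cr, Lr = nx.average_clustering(rand), nx.average_shortest_path_length(rand)
    # Small-worldness: clustering ratio over path-length ratio vs a random graph.
    return {"clustering": C, "path_length": L,
            "small_worldness": (C / Cr) / (L / Lr)}

for n in (84, 230, 890):     # three of the seven template sizes studied
    print(n, graph_metrics(n))
```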

Journal ArticleDOI
TL;DR: This paper describes a decentralized estimation procedure that allows each agent to track the algebraic connectivity of a time-varying graph and proposes a decentralized gradient controller for each agent to maintain global connectivity during motion.
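
For reference, the tracked quantity is the algebraic connectivity: the second-smallest eigenvalue of the graph Laplacian, positive exactly when the graph is connected. A centralised computation takes a few lines (the paper's contribution is estimating it in a decentralized fashion):

```python
import numpy as np

def algebraic_connectivity(adj):
    lap = np.diag(adj.sum(axis=1)) - adj       # graph Laplacian L = D - A
    return np.sort(np.linalg.eigvalsh(lap))[1]

path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(algebraic_connectivity(path))            # 1.0 for the 3-node path graph
```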

Book ChapterDOI
Melissa Chase, Seny Kamara
05 Dec 2010
TL;DR: The notion of structured encryption is introduced which generalizes previous work on symmetric searchable encryption (SSE) to the setting of arbitrarily-structured data.
Abstract: We consider the problem of encrypting structured data (e.g., a web graph or a social network) in such a way that it can be efficiently and privately queried. For this purpose, we introduce the notion of structured encryption which generalizes previous work on symmetric searchable encryption (SSE) to the setting of arbitrarily-structured data.

Journal ArticleDOI
TL;DR: Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
Abstract: The investigation of plant genome structure and evolution requires comprehensive characterization of repetitive sequences that make up the majority of higher plant nuclear DNA. Since genome-wide characterization of repetitive elements is complicated by their high abundance and diversity, novel approaches based on massively-parallel sequencing are being adapted to facilitate the analysis. It has recently been demonstrated that the low-pass genome sequencing provided by a single 454 sequencing reaction is sufficient to capture information about all major repeat families, thus providing the opportunity for efficient repeat investigation in a wide range of species. However, the development of appropriate data mining tools is required in order to fully utilize this sequencing data for repeat characterization. We adapted a graph-based approach for similarity-based partitioning of whole genome 454 sequence reads in order to build clusters made of the reads derived from individual repeat families. The information about cluster sizes was utilized for assessing the proportion and composition of repeats in the genomes of two model species, Pisum sativum and Glycine max, differing in genome size and 454 sequencing coverage. Moreover, statistical analysis and visual inspection of the topology of the cluster graphs using a newly developed program tool, SeqGrapheR, were shown to be helpful in distinguishing basic types of repeats and investigating sequence variability within repeat families. Repetitive regions of plant genomes can be efficiently characterized by the presented graph-based analysis and the graph representation of repeats can be further used to assess the variability and evolutionary divergence of repeat families, discover and characterize novel elements, and aid in subsequent assembly of their consensus sequences.
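
A miniature of the clustering step, assuming networkx: reads become nodes, an edge joins reads whose pairwise similarity clears a threshold (in the paper, sequence overlaps between 454 reads), and graph components become clusters whose sizes estimate the genomic proportion of each repeat family. All scores below are invented.

```python
import networkx as nx

def cluster_reads(similarities, threshold=0.9):
    """similarities: iterable of (read_a, read_b, score) pairs."""
    g = nx.Graph()
    for a, b, score in similarities:
        g.add_nodes_from((a, b))
        if score >= threshold:
            g.add_edge(a, b)
    # Each connected component approximates one repeat family.
    return sorted((sorted(c) for c in nx.connected_components(g)),
                  key=len, reverse=True)

sims = [("r1", "r2", 0.95), ("r2", "r3", 0.92), ("r4", "r5", 0.97), ("r1", "r4", 0.2)]
print(cluster_reads(sims))   # [['r1', 'r2', 'r3'], ['r4', 'r5']]
```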

01 May 2010
TL;DR: In the words of Lord Kelvin, "if you cannot measure it, you cannot improve it"; one of the long-lasting successes of the Top 500 list is sustained, community-wide floating-point performance improvement.
Abstract: In the words of Lord Kelvin, "if you cannot measure it, you cannot improve it". One of the long-lasting successes of the Top 500 list is sustained, community-wide floating point performance improvement. Emerging large-data problems, either resulting from measured real-world phenomena or as further processing of data generated by simulation, have radically different performance characteristics and architectural requirements. As the community contemplates scaling to large-scale HPC resources to solve these problems, we are challenged by the reality that supercomputers are typically optimized for the 3D simulation of physics, not large-scale, data-driven analysis. Consequently, the community contemplating this kind of analysis requires a new yardstick for evaluating future platforms. Since the invention of the von Neumann architecture, the physics simulation has largely driven the development and evolution of High Performance Computing. This allows scientists and engineers to test hypotheses, designs, and ask "what if" questions. Emerging informatics and analytics applications are different both in purpose and structure. While physics simulations typically are core-memory sized, floating point intensive, and well-structured, informatics applications tend to be out of core, integer oriented, and unstructured. (It could be argued that physics simulations are moving in this direction.) The graph abstraction is a powerful model in com-

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A method based on a combination of Support Vector Regression and Markov Random Fields to drastically reduce the time needed to search for a point's location and increase the accuracy and robustness of the algorithm.
Abstract: Finding fiducial facial points in any frame of a video showing rich naturalistic facial behaviour is an unsolved problem. Yet this is a crucial step for geometric-feature-based facial expression analysis, and for methods that use appearance-based features extracted at fiducial facial point locations. In this paper we present a method based on a combination of Support Vector Regression and Markov Random Fields to drastically reduce the time needed to search for a point's location and increase the accuracy and robustness of the algorithm. Using Markov Random Fields allows us to constrain the search space by exploiting the constellations that facial points can form. The regressors, on the other hand, learn a mapping between the appearance of the area surrounding a point and the position of that point, which makes detection of the points very fast and can make the algorithm robust to variations of appearance due to facial expression and moderate changes in head pose. The proposed point detection algorithm was tested on 1855 images, and the results showed that we outperform current state-of-the-art point detectors.
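
Only the regression half of the method is easy to sketch: an SVR learns a mapping from patch appearance to the offset between the patch and the true point, so test patches can vote for a point's location. Synthetic one-dimensional features stand in for real image patches below, and the MRF that couples points into a constellation is not shown.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
true_offsets = rng.uniform(-5, 5, size=200)          # patch-to-point offsets
patches = np.column_stack([true_offsets + rng.normal(0, 0.3, 200),
                           rng.normal(size=200)])    # feature 0 carries the signal
model = SVR(kernel="rbf", C=10.0).fit(patches, true_offsets)
print(model.predict([[2.0, 0.0]]))                   # ~2.0: predicted offset
```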

Journal ArticleDOI
01 Sep 2010
TL;DR: The experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method in addressing graph queries on large networks.
Abstract: The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such large-scale graph-structured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the NP-complete nature of subgraph isomorphism. It becomes even more challenging when the network examined is large and diverse. In this paper, we present a high performance graph indexing mechanism, SPath, to address the graph query problem on large networks. SPath leverages decomposed shortest paths around vertex neighborhood as basic indexing units, which prove to be both effective in graph search space pruning and highly scalable in index construction and deployment. Via SPath, a graph query is processed and optimized beyond the traditional vertex-at-a-time fashion to a more efficient path-at-a-time way: the query is first decomposed into a set of shortest paths, among which a subset of candidates with good selectivity is picked by a query plan optimizer; candidate paths are then joined together to recover the query graph and finalize the graph query processing. We evaluate SPath with the state-of-the-art GraphQL on both real and synthetic data sets. Our experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method in addressing graph queries on large networks.
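
The pruning idea behind shortest-path-based indexing can be sketched as follows (a simplification of SPath, assuming networkx): record, for each vertex, how many vertices of each label sit at every shortest-path distance up to k; a query vertex can then only map to data vertices whose signature dominates its own.

```python
import networkx as nx
from collections import Counter

def neighborhood_signature(g, labels, k=2):
    """For each vertex: {distance: Counter of labels seen at that distance}."""
    sig = {}
    for v in g:
        lengths = nx.single_source_shortest_path_length(g, v, cutoff=k)
        sig[v] = {d: Counter(labels[u] for u, du in lengths.items() if du == d)
                  for d in range(1, k + 1)}
    return sig

def may_match(query_sig, data_sig):
    """Prune: the data vertex must cover the query vertex's label counts."""
    return all(cnt <= data_sig[d].get(lbl, 0)
               for d, counter in query_sig.items()
               for lbl, cnt in counter.items())

g = nx.path_graph(4)                         # 0-1-2-3
labels = {0: "A", 1: "B", 2: "A", 3: "B"}
sig = neighborhood_signature(g, labels)
print(may_match(sig[0], sig[2]))             # True: vertex 2's neighbourhood covers 0's
```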

Proceedings Article
09 Oct 2010
TL;DR: A Topical PageRank (TPR) is built on a word graph to measure word importance with respect to different topics; experiments show that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.
Abstract: Existing graph-based ranking methods for keyphrase extraction compute a single importance score for each word via a single random walk. Motivated by the fact that both documents and words can be represented by a mixture of semantic topics, we propose to decompose traditional random walk into multiple random walks specific to various topics. We thus build a Topical PageRank (TPR) on word graph to measure word importance with respect to different topics. After that, given the topic distribution of the document, we further calculate the ranking scores of words and extract the top ranked ones as keyphrases. Experimental results show that TPR outperforms state-of-the-art keyphrase extraction methods on two datasets under various evaluation metrics.
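
A miniature Topical PageRank, assuming networkx: one personalised PageRank per topic, with the restart distribution proportional to each word's probability under that topic, blended by the document's topic mixture. The word graph and all probabilities below are invented.

```python
import networkx as nx

g = nx.Graph([("graph", "mining"), ("graph", "query"), ("mining", "data"),
              ("data", "query")])                        # co-occurrence word graph
topics = {"databases": {"query": 0.6, "data": 0.3, "graph": 0.1, "mining": 0.0},
          "ml":        {"mining": 0.5, "data": 0.3, "graph": 0.2, "query": 0.0}}
doc_mixture = {"databases": 0.7, "ml": 0.3}              # document's topic weights

score = {w: 0.0 for w in g}
for topic, prior in topics.items():
    pr = nx.pagerank(g, alpha=0.85, personalization=prior)  # topic-specific walk
    for w in g:
        score[w] += doc_mixture[topic] * pr[w]
print(sorted(score, key=score.get, reverse=True))        # top words -> keyphrases
```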

Proceedings ArticleDOI
13 Jun 2010
TL;DR: Intra-city travel itineraries are constructed automatically by tapping a latent source reflecting geo-temporal breadcrumbs left by millions of tourists, and user studies indicate that high quality itineraries can be automatically constructed from Flickr data.
Abstract: Vacation planning is one of the frequent---but nonetheless laborious---tasks that people engage in online, requiring skilled interaction with a multitude of resources. This paper constructs intra-city travel itineraries automatically by tapping a latent source reflecting geo-temporal breadcrumbs left by millions of tourists. For example, the popular rich media sharing site, Flickr, allows photos to be stamped with the time they were taken and be mapped to Points Of Interest (POIs) by geographical (i.e. latitude-longitude) and semantic (e.g., tags) metadata. Leveraging this information, we construct itineraries following a two-step approach. Given a city, we first extract photo streams of individual users. Each photo stream provides estimates on where the user was, how long he stayed at each place, and what was the transit time between places. In the second step, we aggregate all user photo streams into a POI graph. Itineraries are then automatically constructed from the graph based on the popularity of the POIs and subject to the user's time and destination constraints. We evaluate our approach by constructing itineraries for several major cities and comparing them, through a "crowd-sourcing" marketplace (Amazon Mechanical Turk), against itineraries constructed from popular bus tours that are professionally generated. Our extensive survey-based user studies over about 450 workers on AMT indicate that high quality itineraries can be automatically constructed from Flickr data.
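
A toy version of the final construction step: given a POI graph aggregated from photo streams (visit durations, popularity scores, and transit times, all invented below), greedily extend the itinerary with the most popular POI that still fits the time budget. The paper's construction also honours destination constraints, which this sketch omits.

```python
def build_itinerary(pois, transit, start, budget_hours):
    """pois: {name: (visit_hours, popularity)}; transit: {(a, b): hours}."""
    itinerary, here = [start], start
    time_left = budget_hours - pois[start][0]
    visited = {start}
    while True:
        options = [(p, transit[(here, p)] + pois[p][0])
                   for p in pois if p not in visited and (here, p) in transit]
        options = [(p, cost) for p, cost in options if cost <= time_left]
        if not options:
            return itinerary
        here = max(options, key=lambda pc: pois[pc[0]][1])[0]  # most popular next
        time_left -= dict(options)[here]
        itinerary.append(here)
        visited.add(here)

pois = {"museum": (2.0, 90), "park": (1.0, 60), "tower": (1.5, 120)}
transit = {("museum", "park"): 0.25, ("museum", "tower"): 0.5,
           ("park", "tower"): 0.5, ("tower", "park"): 0.5}
print(build_itinerary(pois, transit, "museum", budget_hours=5))
```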

Journal ArticleDOI
01 Sep 2010
TL;DR: A general framework is proposed for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from GPS data; it is capable of outperforming baseline methods and an extension of an existing proposal.
Abstract: With the increasing deployment and use of GPS-enabled devices, massive amounts of GPS data are becoming available. We propose a general framework for the mining of semantically meaningful, significant locations, e.g., shopping malls and restaurants, from such data. We present techniques capable of extracting semantic locations from GPS data. We capture the relationships between locations and between locations and users with a graph. Significance is then assigned to locations using random walks over the graph that propagate significance among the locations. In doing so, mutual reinforcement between location significance and user authority is exploited for determining significance, as are aspects such as the number of visits to a location, the durations of the visits, and the distances users travel to reach locations. Studies using up to 100 million GPS records from a confined spatio-temporal region demonstrate that the proposal is effective and is capable of outperforming baseline methods and an extension of an existing proposal.
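
The mutual-reinforcement step can be sketched as a HITS-style power iteration: location significance and user authority boost each other through a visit matrix, which in the paper folds in visit counts, durations, and travel distances. The matrix here is invented.

```python
import numpy as np

V = np.array([[3.0, 1.0, 0.0],        # 2 users x 3 locations, invented weights
              [0.0, 2.0, 2.0]])

authority = np.ones(V.shape[0])
for _ in range(50):                    # power iteration to the fixed point
    significance = V.T @ authority     # locations gain from authoritative visitors
    significance /= np.linalg.norm(significance)
    authority = V @ significance       # users gain from visiting significant places
    authority /= np.linalg.norm(authority)
print(significance)                    # higher = more significant location
```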

Journal ArticleDOI
01 Jun 2010
TL;DR: This paper studies optimal linear-consensus algorithms for multivehicle systems with single-integrator dynamics in both continuous-time and discrete-time settings and shows that any symmetric Laplacian matrix is inverse optimal with respect to a properly chosen cost function.
Abstract: Laplacian matrices play an important role in linear-consensus algorithms. This paper studies optimal linear-consensus algorithms for multivehicle systems with single-integrator dynamics in both continuous-time and discrete-time settings. We propose two global cost functions, namely, interaction-free and interaction-related cost functions. With the interaction-free cost function, we derive the optimal (nonsymmetric) Laplacian matrix by using a linear-quadratic-regulator-based method in both continuous-time and discrete-time settings. It is shown that the optimal (nonsymmetric) Laplacian matrix corresponds to a complete directed graph. In addition, we show that any symmetric Laplacian matrix is inverse optimal with respect to a properly chosen cost function. With the interaction-related cost function, we derive the optimal scaling factor for a prespecified symmetric Laplacian matrix associated with the interaction graph in both continuous-time and discrete-time settings. Illustrative examples are given as a proof of concept.
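
The single-integrator consensus dynamics the paper studies can be sketched in a few lines: with a symmetric Laplacian, the discrete-time update x(t+1) = x(t) - eps * L x(t) drives all states to the average of the initial values for small enough eps. All numbers below are illustrative.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)   # undirected interaction graph
L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
x = np.array([1.0, 3.0, 5.0, 7.0])          # initial vehicle states
eps = 0.2                                    # stable for 0 < eps < 1/max_degree
for _ in range(200):
    x = x - eps * (L @ x)                    # discrete-time consensus step
print(x)                                     # -> [4, 4, 4, 4], the average
```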

Posted Content
TL;DR: The method has a clear interpretation: the authors use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling, which requires essentially no conditions.
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include K-fold cross-validation (K-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.
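
A compressed sketch of StARS using scikit-learn's graphical lasso: fit the estimator on many subsamples at each regularization level, measure how often subsamples disagree about each edge, and return the least regularization whose average edge instability stays below a cutoff beta. The subsample count and tolerances are illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

def stars(X, lambdas, n_sub=20, beta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = min(n, int(10 * np.sqrt(n)))           # subsample size, as in the paper
    iu = np.triu_indices(p, k=1)
    best = None
    for lam in sorted(lambdas, reverse=True):  # from sparse (stable) to dense
        freq = np.zeros((p, p))
        for _ in range(n_sub):
            idx = rng.choice(n, size=b, replace=False)
            prec = GraphicalLasso(alpha=lam).fit(X[idx]).precision_
            freq += np.abs(prec) > 1e-8        # edge selected on this subsample
        theta = (freq / n_sub)[iu]
        instability = np.mean(2 * theta * (1 - theta))
        if instability > beta:
            return best                        # previous lambda was the answer
        best = lam
    return best

X = np.random.default_rng(1).normal(size=(200, 5))
print(stars(X, lambdas=[0.05, 0.1, 0.2, 0.4]))
```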

Journal ArticleDOI
TL;DR: In Fig. 1a of the version of this Review originally published, the graph labelled '2-μm-thick Si wafer' is that for a 10-μm-thick Si wafer; the corrected figure is shown below.
Abstract: Nature Materials 9, 205–213 (2010); published online 19 February 2010; corrected online 1 September 2010. In Fig. 1a of the version of this Review originally published, the graph labelled '2-μm-thick Si wafer' is that for a 10-μm-thick Si wafer. The corrected figure is shown below. The original figure caption and descriptions in the text are correct.

Journal ArticleDOI
TL;DR: This model represents faces in each age group by a hierarchical And-Or graph, in which And nodes decompose a face into parts to describe details crucial for age perception and Or nodes represent the large diversity of faces by alternative selections.
Abstract: In this paper, we present a compositional and dynamic model for face aging. The compositional model represents faces in each age group by a hierarchical And-Or graph, in which And nodes decompose a face into parts to describe details (e.g., hair, wrinkles, etc.) crucial for age perception and Or nodes represent the large diversity of faces by alternative selections. A face instance is then a transverse of the And-Or graph, called a parse graph. Face aging is modeled as a Markov process on the parse graph representation. We learn the parameters of the dynamic model from a large annotated face data set, and the stochasticity of face aging is modeled explicitly in the dynamics. Based on this model, we propose a face aging simulation and prediction algorithm. Inversely, an automatic age estimation algorithm is also developed under this representation. We study two criteria to evaluate the aging results using human perception experiments: (1) accuracy of simulation: whether the aged faces are perceived as being of the intended age group, and (2) preservation of identity: whether the aged faces are perceived as the same person. Quantitative statistical analysis validates the performance of our aging model and age estimation algorithm.

Journal ArticleDOI
TL;DR: This paper introduces a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training, and investigates several measures of graph connectivity with the aim of identifying those best suited for WSD.
Abstract: Word sense disambiguation (WSD), the task of identifying the intended meanings (senses) of words in context, has been a long-standing research objective for natural language processing. In this paper, we are concerned with graph-based algorithms for large-scale WSD. Under this framework, finding the right sense for a given word amounts to identifying the most "important" node among the set of graph nodes representing its senses. We introduce a graph-based WSD algorithm which has few parameters and does not require sense-annotated data for training. Using this algorithm, we investigate several measures of graph connectivity with the aim of identifying those best suited for WSD. We also examine how the chosen lexicon and its connectivity influence WSD performance. We report results on standard data sets and show that our graph-based approach performs comparably to the state of the art.
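
The core decision rule of graph-based WSD fits in a few lines: build a graph over the candidate senses of the words in context, connect senses related in the lexicon, and pick each word's most "important" sense node. Degree centrality stands in for the several connectivity measures the paper compares, and the senses and lexicon edges are invented.

```python
import networkx as nx

senses = {"bank": ["bank#river", "bank#finance"], "money": ["money#cash"]}
relations = [("bank#finance", "money#cash")]       # invented lexicon edges

g = nx.Graph(relations)
g.add_nodes_from(s for ss in senses.values() for s in ss)
centrality = nx.degree_centrality(g)               # one of several possible measures
for word, candidates in senses.items():
    print(word, "->", max(candidates, key=lambda s: centrality[s]))
# bank -> bank#finance: the sense best connected to the context wins
```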

Proceedings ArticleDOI
Jiajun Bu, Shulong Tan, Chun Chen, Can Wang, Hao Wu, Lijun Zhang, Xiaofei He
25 Oct 2010
TL;DR: A novel music recommendation algorithm is proposed that uses both multiple kinds of social media information and music acoustic-based content; it models the various objects and relations with a hypergraph and treats music recommendation as a ranking problem on this hypergraph.
Abstract: Acoustic-based music recommender systems have received increasing interest in recent years. Due to the semantic gap between low level acoustic features and high level music concepts, many researchers have explored collaborative filtering techniques in music recommender systems. Traditional collaborative filtering music recommendation methods only focus on user rating information. However, there are various kinds of social media information, including different types of objects and relations among these objects, in music social communities such as Last.fm and Pandora. This information is valuable for music recommendation. However, there are two challenges to exploit this rich social media information: (a) There are many different types of objects and relations in music social communities, which makes it difficult to develop a unified framework taking into account all objects and relations. (b) In these communities, some relations are much more sophisticated than pairwise relations, and thus cannot be simply modeled by a graph. In this paper, we propose a novel music recommendation algorithm that uses both multiple kinds of social media information and music acoustic-based content. Instead of a graph, we use a hypergraph to model the various objects and relations, and consider music recommendation as a ranking problem on this hypergraph. While an edge of an ordinary graph connects only two objects, a hyperedge represents a set of objects. In this way, hypergraphs can be naturally used to model high-order relations. Experiments on a data set collected from the music social community Last.fm have demonstrated the effectiveness of our proposed algorithm.
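
The ranking machinery this line of work builds on (a hypergraph analogue of PageRank-style diffusion) fits in a few lines of numpy: objects are vertices, each relation is a hyperedge in an incidence matrix, and scores diffuse from a query vertex through the normalised hypergraph operator. All data below are invented.

```python
import numpy as np

H = np.array([[1, 0],                     # incidence: 4 objects x 2 hyperedges
              [1, 1],                     # (e.g. "user listened to these tracks")
              [0, 1],
              [1, 0]], dtype=float)
w = np.array([1.0, 1.0])                  # hyperedge weights

Dv_isqrt = np.diag((H @ w) ** -0.5)       # vertex-degree normalisation
De_inv = np.diag(1.0 / H.sum(axis=0))     # hyperedge-degree normalisation
Theta = Dv_isqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_isqrt

alpha, y = 0.9, np.array([1.0, 0, 0, 0])  # query vertex 0
f = (1 - alpha) * np.linalg.solve(np.eye(4) - alpha * Theta, y)
print(f)                                  # higher score = stronger recommendation
```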

Journal ArticleDOI
TL;DR: A new perspective on this problem is provided by considering the existing shapes as a group and studying their similarity measures to the query shape in a graph structure, which achieves promising improvements on both shape classification and shape clustering.
Abstract: Shape similarity and shape retrieval are very important topics in computer vision. The recent progress in this domain has been mostly driven by designing smart shape descriptors for providing a better similarity measure between pairs of shapes. In this paper, we provide a new perspective on this problem by considering the existing shapes as a group, and study their similarity measures to the query shape in a graph structure. Our method is general and can be built on top of any existing shape similarity measure. For a given similarity measure, a new similarity is learned through graph transduction. The new similarity is learned iteratively so that the neighbors of a given shape influence its final similarity to the query. The basic idea here is related to PageRank ranking, which forms a foundation of Google Web search. The presented experimental results demonstrate that the proposed approach yields significant improvements over the state-of-the-art shape matching algorithms. We obtained a retrieval rate of 91.61 percent on the MPEG-7 data set, which is the highest ever reported in the literature. Moreover, the similarity learned by the proposed method also achieves promising improvements on both shape classification and shape clustering.
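
The diffusion at the heart of graph transduction is compact: turn pairwise similarities into transition probabilities, then repeatedly average each shape's similarity to the query over its neighbours while keeping the query clamped, so shapes supported by consistent neighbours rise in the ranking. The similarity matrix below is invented.

```python
import numpy as np

W = np.array([[1.0, 0.8, 0.7, 0.1],
              [0.8, 1.0, 0.6, 0.1],
              [0.7, 0.6, 1.0, 0.2],
              [0.1, 0.1, 0.2, 1.0]])      # base pairwise shape similarities
P = W / W.sum(axis=1, keepdims=True)      # random-walk transition matrix

f = W[0].copy()                           # similarity of every shape to query 0
for _ in range(100):
    f = P @ f                             # one graph-transduction step
    f[0] = 1.0                            # keep the query's self-similarity fixed
print(f)                                  # context-improved similarity ranking
```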

Proceedings ArticleDOI
Mihai Patrascu
05 Jun 2010
TL;DR: This work describes a carefully chosen dynamic version of set disjointness (the "multiphase problem") and conjectures that it requires n^Ω(1) time per operation; the reduction from 3SUM is the first nonalgebraic one, enabling 3SUM-hardness results for combinatorial problems.
Abstract: We consider a number of dynamic problems with no known poly-logarithmic upper bounds, and show that they require n^Ω(1) time per operation, unless 3SUM has strongly subquadratic algorithms. Our result is modular: (1) We describe a carefully-chosen dynamic version of set disjointness (the "multiphase problem"), and conjecture that it requires n^Ω(1) time per operation. All our lower bounds follow by easy reduction. (2) We reduce 3SUM to the multiphase problem. Ours is the first nonalgebraic reduction from 3SUM, and it allows 3SUM-hardness results for combinatorial problems. For instance, it implies hardness of reporting all triangles in a graph. (3) It is plausible that an unconditional lower bound for the multiphase problem can be established via a number-on-forehead communication game.
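
For context, the conjecture underlying these reductions concerns the classic 3SUM problem: given n numbers, decide whether some triple sums to zero; no strongly subquadratic algorithm is known. The standard quadratic two-pointer baseline, for reference:

```python
def three_sum(nums):
    """Return a triple summing to zero, or None. Runs in O(n^2) time."""
    a = sorted(nums)
    for i in range(len(a) - 2):
        lo, hi = i + 1, len(a) - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return a[i], a[lo], a[hi]
            # Too small: advance the low pointer; too large: retreat the high one.
            lo, hi = (lo + 1, hi) if s < 0 else (lo, hi - 1)
    return None

print(three_sum([-5, 1, 4, 2, -1]))   # (-5, 1, 4)
```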