scispace - formally typeset
Search or ask a question
Author

Kanat Tangwongsan

Bio: Kanat Tangwongsan is an academic researcher from Mahidol University International College. The author has contributed to research in topics: Parallel algorithm & Sliding window protocol. The author has an hindex of 22, co-authored 50 publications receiving 1548 citations. Previous affiliations of Kanat Tangwongsan include Carnegie Mellon University & IBM.

Papers
More filters
Journal Article•DOI•
01 Sep 2013
TL;DR: This paper presents a new space-efficient algorithm for counting and sampling triangles--and more generally, constant-sized cliques--in a massive graph whose edges arrive as a stream.
Abstract: This paper presents a new space-efficient algorithm for counting and sampling triangles--and more generally, constant-sized cliques--in a massive graph whose edges arrive as a stream. Compared to prior work, our algorithm yields significant improvements in the space and time complexity for these fundamental problems. Our algorithm is simple to implement and has very good practical performance on large graphs.

203 citations

Proceedings Article•DOI•
25 Jun 2012
TL;DR: This announcement describes the problem based benchmark suite (PBBS), a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems.
Abstract: This announcement describes the problem based benchmark suite (PBBS). PBBS is a set of benchmarks designed for comparing parallel algorithmic approaches, parallel programming language styles, and machine architectures across a broad set of problems. Each benchmark is defined concretely in terms of a problem specification and a set of input distributions. No requirements are made in terms of algorithmic approach, programming language, or machine architecture. The goal of the benchmarks is not only to compare runtimes, but also to be able to compare code and other aspects of an implementation (e.g., portability, robustness, determinism, and generality). As such the code for an implementation of a benchmark is as important as its runtime, and the public PBBS repository will include both code and performance results.The benchmarks are designed to make it easy for others to try their own implementations, or to add new benchmark problems. Each benchmark problem includes the problem specification, the specification of input and output file formats, default input generators, test codes that check the correctness of the output for a given input, driver code that can be linked with implementations, a baseline sequential implementation, a baseline multicore implementation, and scripts for running timings (and checks) and outputting the results in a standard format. The current suite includes the following problems: integer sort, comparison sort, remove duplicates, dictionary, breadth first search, spanning forest, minimum spanning forest, maximal independent set, maximal matching, K-nearest neighbors, Delaunay triangulation, convex hull, suffix arrays, n-body, and ray casting. For each problem, we report the performance of our baseline multicore implementation on a 40-core machine.

196 citations

Proceedings Article•DOI•
13 Apr 2015
TL;DR: This paper describes the design and implementation of simple and fast multicore parallel algorithms for exact, as well as approximate, triangle counting and other triangle computations that scale to billions of nodes and edges, and is much faster than existing parallel approximate triangle counting implementations.
Abstract: Triangle counting and enumeration has emerged as a basic tool in large-scale network analysis, fueling the development of algorithms that scale to massive graphs. Most of the existing algorithms, however, are designed for the distributed-memory setting or the external-memory setting, and cannot take full advantage of a multicore machine, whose capacity has grown to accommodate even the largest of real-world graphs.

143 citations

Journal Article•DOI•
01 Feb 2015
TL;DR: Reactive Aggregator (RA), a new framework for incremental sliding-window aggregation, is presented, which has an algorithmic complexity of O(m + m log (n/m)), rivaling the best prior algorithms for any m.
Abstract: Stream processing is gaining importance as more data becomes available in the form of continuous streams and companies compete to promptly extract insights from them. In such applications, sliding-window aggregation is a central operator, and incremental aggregation helps avoid the performance penalty of re-aggregating from scratch for each window change.This paper presents Reactive Aggregator (RA), a new framework for incremental sliding-window aggregation. RA is general in that it does not require aggregation functions to be invertible or commutative, and it does not require windows to be FIFO. We implemented RA as a drop-in replacement for the Aggregate operator of a commercial streaming engine. Given m updates on a window of size n, RA has an algorithmic complexity of O(m + m log (n/m)), rivaling the best prior algorithms for any m. Furthermore, RA's implementation minimizes overheads from allocation and pointer traversals by using a single flat array.

96 citations

Posted Content•
TL;DR: A proof of the $k-median result which avoids the ``coupling'' argument and can be used in other settings where the Arya et al. arguments have been used.
Abstract: We study local search algorithms for metric instances of facility location problems: the uncapacitated facility location problem (UFL), as well as uncapacitated versions of the $k$-median, $k$-center and $k$-means problems. All these problems admit natural local search heuristics: for example, in the UFL problem the natural moves are to open a new facility, close an existing facility, and to swap a closed facility for an open one; in $k$-medians, we are allowed only swap moves. The local-search algorithm for $k$-median was analyzed by Arya et al. (SIAM J. Comput. 33(3):544-562, 2004), who used a clever ``coupling'' argument to show that local optima had cost at most constant times the global optimum. They also used this argument to show that the local search algorithm for UFL was 3-approximation; their techniques have since been applied to other facility location problems. In this paper, we give a proof of the $k$-median result which avoids this coupling argument. These arguments can be used in other settings where the Arya et al. arguments have been used. We also show that for the problem of opening $k$ facilities $F$ to minimize the objective function $\Phi_p(F) = \big(\sum_{j \in V} d(j, F)^p\big)^{1/p}$, the natural swap-based local-search algorithm is a $\Theta(p)$-approximation. This implies constant-factor approximations for $k$-medians (when $p=1$), and $k$-means (when $p = 2$), and an $O(\log n)$-approximation algorithm for the $k$-center problem (which is essentially $p = \log n$).

94 citations


Cited by
More filters
01 Jan 2012

3,692 citations

Book•
02 Jan 1991

1,377 citations

Report•DOI•
01 May 2014
TL;DR: This work presents GraphChi, a disk-based system for computing efficiently on graphs with billions of edges, and builds on the basis of Parallel Sliding Windows to propose a new data structure Partitioned Adjacency Lists, which is used to design an online graph database graphChi-DB.
Abstract: : Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel Parallel Sliding Windows algorithm, GraphChi is able to execute several advanced data mining, graph mining and machine learning algorithms on very large graphs, using just a single consumer-level computer. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. We build on the basis of Parallel Sliding Windows to propose a new data structure Partitioned Adjacency Lists, which we use to design an online graph database GraphChi-DB.We demonstrate that, on a single PC, GraphChi-DB can process over one hundred thousand graph updates per second, while simultaneously performing computation. GraphChi-DB compares favorably to existing graph databases, particularly on data that is much larger than the available memory. We evaluate our work both experimentally and theoretically. Based on the Parallel Sliding Windows algorithm, we propose new I/O efficient algorithms for solving fundamental graph problems. We also propose a novel algorithm for simulating billions of random walks in parallel on a single computer. By repeating experiments reported for existing distributed systems we show that with only fraction of the resources, GraphChi can solve the same problems in a very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

907 citations

Proceedings Article•DOI•
08 Oct 2012
TL;DR: GraphChi as mentioned in this paper is a disk-based system for computing efficiently on graphs with billions of edges, using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method.
Abstract: Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts.In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives.By repeating experiments reported for existing distributed systems, we show that, with only fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.

874 citations

Proceedings Article•DOI•
23 Feb 2013
TL;DR: This paper presents a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
Abstract: There has been significant recent interest in parallel frameworks for processing graphs due to their applicability in studying social networks, the Web graph, networks in biology, and unstructured meshes in scientific simulation. Due to the desire to process large graphs, these systems have emphasized the ability to run on distributed memory machines. Today, however, a single multicore server can support more than a terabyte of memory, which can fit graphs with tens or even hundreds of billions of edges. Furthermore, for graph algorithms, shared-memory multicores are generally significantly more efficient on a per core, per dollar, and per joule basis than distributed memory systems, and shared-memory algorithms tend to be simpler than their distributed counterparts.In this paper, we present a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write. The framework has two very simple routines, one for mapping over edges and one for mapping over vertices. Our routines can be applied to any subset of the vertices, which makes the framework useful for many graph traversal algorithms that operate on subsets of the vertices. Based on recent ideas used in a very fast algorithm for breadth-first search (BFS), our routines automatically adapt to the density of vertex sets. We implement several algorithms in this framework, including BFS, graph radii estimation, graph connectivity, betweenness centrality, PageRank and single-source shortest paths. Our algorithms expressed using this framework are very simple and concise, and perform almost as well as highly optimized code. Furthermore, they get good speedups on a 40-core machine and are significantly more efficient than previously reported results using graph frameworks on machines with many more cores.

816 citations