Author

Weimin Zheng

Bio: Weimin Zheng is an academic researcher from Tsinghua University. The author has contributed to research in topics including Scalability and Grid computing. The author has an h-index of 36 and has co-authored 371 publications receiving 5,509 citations. Previous affiliations of Weimin Zheng include the Institute of High Performance Computing, Singapore.


Papers
Proceedings ArticleDOI
18 Apr 2006
TL;DR: This study focuses on user behavior, content access patterns, and their implications for the design of multimedia streaming systems, and introduces a modified Poisson distribution that more accurately models the observations.
Abstract: Video-on-demand over IP (VOD) is one of the best-known examples of "next-generation" Internet applications cited as a goal by networking and multimedia researchers. Without empirical data, researchers have generally relied on simulated models to drive their design and development efforts. In this paper, we present one of the first measurement studies of a large VOD system, using data covering 219 days and more than 150,000 users in a VOD system deployed by China Telecom. Our study focuses on user behavior, content access patterns, and their implications for the design of multimedia streaming systems. Our results also show that when used to model the user-arrival rate, the traditional Poisson model is conservative and overestimates the probability of large arrival groups. We introduce a modified Poisson distribution that more accurately models our observations. We also observe a surprising result: video session lengths have a weak inverse correlation with the video's popularity. Finally, we gain a better understanding of the sources of video popularity through analysis of a number of internal and external factors.
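
To make the arrival-rate observation concrete, the sketch below fits a plain Poisson model to a per-second arrival series and compares its tail probability for "large arrival groups" against the empirical frequency. The data is synthetic and the paper's actual modified Poisson distribution is not reproduced here; all names and thresholds are invented for illustration.

```python
# Hypothetical illustration: comparing empirical per-second arrival counts
# against a fitted Poisson model. The synthetic data below stands in for
# real VOD logs; it is NOT the China Telecom trace used in the paper.
import math
import random

random.seed(42)

# Synthetic "observed" arrival counts per second (stand-in for a real trace).
observed = [random.randint(0, 6) for _ in range(10_000)]

lam = sum(observed) / len(observed)          # maximum-likelihood Poisson rate

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Tail probability of "large arrival groups" (here: >= 10 arrivals/second).
threshold = 10
model_tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(threshold))
empirical_tail = sum(c >= threshold for c in observed) / len(observed)

print(f"fitted rate lambda = {lam:.2f} arrivals/s")
print(f"Poisson tail P(X >= {threshold}) = {model_tail:.2e}")
print(f"empirical tail                = {empirical_tail:.2e}")
```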

728 citations

Proceedings ArticleDOI
02 Nov 2016
TL;DR: Gemini, a distributed graph processing system that applies multiple optimizations targeting computation performance to build scalability on top of efficiency, is presented; it significantly outperforms all well-known existing distributed graph processing systems.
Abstract: Traditionally, distributed graph processing systems have largely focused on scalability through optimizations of inter-node communication and load balance. However, they often deliver unsatisfactory overall processing efficiency compared with shared-memory graph computing frameworks. We analyze the behavior of several graph-parallel systems and find that the added overhead for achieving scalability becomes a major limiting factor for efficiency, especially with modern multi-core processors and high-speed interconnection networks. Based on our observations, we present Gemini, a distributed graph processing system that applies multiple optimizations targeting computation performance to build scalability on top of efficiency. Gemini adopts (1) a sparse-dense signal-slot abstraction to extend the hybrid push-pull computation model from shared-memory to distributed scenarios, (2) a chunk-based partitioning scheme enabling low-overhead scale-out designs and locality-preserving vertex accesses, (3) a dual representation scheme to compress accesses to vertex indices, (4) NUMA-aware sub-partitioning for efficient intra-node memory accesses, plus (5) locality-aware chunking and fine-grained work-stealing for improving inter-node and intra-node load balance, respectively. Our evaluation on an 8-node high-performance cluster (using five widely used graph applications and five real-world graphs) shows that Gemini significantly outperforms all well-known existing distributed graph processing systems, delivering up to 39.8× (from 8.91×) improvement over the fastest among them.
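
As a rough single-machine illustration of the push/pull duality that the sparse-dense signal-slot abstraction generalizes to distributed memory, the sketch below switches a BFS step between pushing from a sparse frontier and pulling into unvisited vertices. It illustrates the idea only, not Gemini's implementation; the graph, names, and mode-switch heuristic are invented for this example.

```python
# Minimal single-machine sketch of the push vs. pull duality that Gemini's
# sparse-dense signal-slot abstraction generalizes to distributed memory.
# This is an illustration of the idea only, not Gemini's implementation.
from collections import defaultdict

def bfs_step_push(frontier, out_edges, visited):
    """Sparse mode: active vertices push to their out-neighbors."""
    nxt = set()
    for u in frontier:
        for v in out_edges[u]:
            if v not in visited:
                visited.add(v)
                nxt.add(v)
    return nxt

def bfs_step_pull(frontier, in_edges, visited, all_vertices):
    """Dense mode: unvisited vertices pull from their in-neighbors."""
    nxt = set()
    for v in all_vertices:
        if v in visited:
            continue
        if any(u in frontier for u in in_edges[v]):
            visited.add(v)
            nxt.add(v)
    return nxt

# Toy graph: 0 -> 1, 0 -> 2, 1 -> 2
out_edges = defaultdict(set, {0: {1, 2}, 1: {2}})
in_edges = defaultdict(set, {1: {0}, 2: {0, 1}})
vertices = {0, 1, 2}

visited, frontier = {0}, {0}
while frontier:
    # Invented heuristic: push while the frontier is small, pull once it grows.
    if len(frontier) <= len(vertices) // 2:
        frontier = bfs_step_push(frontier, out_edges, visited)
    else:
        frontier = bfs_step_pull(frontier, in_edges, visited, vertices)
print(sorted(visited))  # -> [0, 1, 2]
```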

314 citations

Proceedings ArticleDOI
04 Apr 2017
TL;DR: DUDETM, a crash-consistent durable transaction system that avoids the drawbacks of both undo logging and redo logging, is presented; it can be implemented on existing hardware TMs with minor hardware modifications, leading to a further 1.7× speedup.
Abstract: Emerging non-volatile memory (NVM) offers non-volatility, byte-addressability and fast access at the same time. To make the best use of these properties, it has been shown by empirical evidence that programs should access NVM directly through CPU load and store instructions, so that the overhead of a traditional file system or database can be avoided. Thus, durable transactions become a common choice for applications accessing persistent memory data in a crash-consistent manner. However, existing durable transaction systems employ either undo logging, which requires a fence for every memory write, or redo logging, which requires intercepting all memory reads within transactions. This paper presents DUDETM, a crash-consistent durable transaction system that avoids the drawbacks of both undo logging and redo logging. DUDETM uses shadow DRAM to decouple the execution of a durable transaction into three fully asynchronous steps. The advantage is that only minimal fences and no memory read instrumentation are required. This design also enables an out-of-the-box transactional memory (TM) to be used as an independent component in our system. The evaluation results show that DUDETM adds durability to a TM system with only 7.4%-24.6% throughput degradation. Compared to existing durable transaction systems, DUDETM provides 1.7× to 4.4× higher throughput. Moreover, DUDETM can be implemented with existing hardware TMs with minor hardware modifications, leading to a further 1.7× speedup.
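
The sketch below is a highly simplified, single-process rendition of the three-step decoupling described above (roughly: execute against a volatile shadow copy, persist a redo log, replay it into NVM). Both memories are plain dicts, all names are hypothetical, and real persistence ordering (fences, cache-line flushes) is omitted; it is not DUDETM's implementation.

```python
# Simplified sketch of shadow-DRAM decoupling: the transaction runs entirely
# against a volatile copy and emits a redo log; a background thread later
# applies that log to a stand-in for persistent memory.
import queue
import threading

nvm = {}                      # stands in for persistent memory
shadow = dict(nvm)            # volatile shadow copy in DRAM
persist_log = queue.Queue()   # redo-log entries awaiting background handling

def tx_execute(updates):
    """Step 1: run the transaction against the volatile shadow only,
    collecting a redo log instead of writing NVM on the critical path."""
    redo = []
    for key, value in updates.items():
        shadow[key] = value
        redo.append((key, value))
    persist_log.put(redo)     # hand the log off asynchronously

def persister():
    """Later steps, merged here for brevity: persist the redo log, then
    replay it into 'NVM'; the paper decouples these into asynchronous stages."""
    while True:
        redo = persist_log.get()
        if redo is None:
            break
        for key, value in redo:   # replay: the only writes that touch NVM
            nvm[key] = value
        persist_log.task_done()

worker = threading.Thread(target=persister, daemon=True)
worker.start()

tx_execute({"balance:alice": 90, "balance:bob": 110})
persist_log.join()            # wait until the background replay has finished
persist_log.put(None)         # stop the worker
print(nvm)                    # {'balance:alice': 90, 'balance:bob': 110}
```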

179 citations

Proceedings ArticleDOI
12 Feb 2013
TL;DR: An object-based flash translation layer design (OFTL) is proposed, in which file system mechanisms are co-designed with flash memory: it enables lazy persistence of index metadata and eliminates journals while keeping consistency, and its coarse-grained block state maintenance reduces persistent free space management overhead.
Abstract: Flash memory has gained popularity as a storage device for both enterprise and embedded systems because of its high performance, low energy and reduced cost. The endurance problem of flash memory, however, is still a challenge and is getting worse as storage density increases with the adoption of multi-level cells (MLC). Prior work has addressed wear leveling and data reduction, but there is significantly less work on using the file system to improve flash lifetimes. Some common mechanisms in traditional file systems, such as journaling, metadata synchronization, and page-aligned updates, can induce extra write operations and aggravate the wear of flash memory. This problem is called write amplification from file systems. In order to mitigate write amplification, we propose an object-based flash translation layer design (OFTL), in which mechanisms are co-designed with flash memory. By leveraging page metadata, OFTL enables lazy persistence of index metadata and eliminates journals while keeping consistency. Coarse-grained block state maintenance reduces persistent free space management overhead. With byte-unit access interfaces, OFTL is able to compact and co-locate small updates with metadata to further reduce updates. Experiments show that an OFTL-based system, OFSS, offers a write amplification reduction of 47.4%-89.4% in SYNC mode and 19.8%-64.0% in ASYNC mode compared with ext3, ext2, and btrfs on an up-to-date page-level FTL.
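
A back-of-the-envelope sketch of the write amplification being targeted: padding each small update to a page-aligned write versus packing several byte-unit updates (plus an assumed metadata overhead) into shared pages. The page size, overhead, and update sizes below are invented for illustration and are not taken from the paper or from OFTL's design.

```python
# Illustration of write amplification: page-aligned updates pad every small
# write to a full flash page, while a byte-unit interface can pack several
# small updates (plus their metadata) into one page. Numbers are made up.
PAGE_SIZE = 4096          # bytes per flash page (typical value, assumed)
METADATA_PER_UPDATE = 64  # assumed per-update index/journal overhead

small_updates = [128, 256, 64, 512, 100]   # user payload sizes in bytes

# Page-aligned scheme: each update (plus metadata) occupies a whole page.
page_aligned_flash = PAGE_SIZE * len(small_updates)

# Byte-unit packing: updates and metadata are appended into shared pages.
packed_bytes = sum(small_updates) + METADATA_PER_UPDATE * len(small_updates)
packed_flash = -(-packed_bytes // PAGE_SIZE) * PAGE_SIZE   # round up to pages

user_bytes = sum(small_updates)
print(f"write amplification, page-aligned: {page_aligned_flash / user_bytes:.1f}x")
print(f"write amplification, byte-packed:  {packed_flash / user_bytes:.1f}x")
```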

174 citations

Proceedings ArticleDOI
Chuntao Hong, Dehao Chen, Wenguang Chen, Weimin Zheng, Haibo Lin
11 Sep 2010
TL;DR: This research presents a novel and scalable approach to the problem of the high development and maintenance cost of writing GPU-specific code with low-level GPU APIs such as CUDA.
Abstract: Graphics Processing Units (GPUs) have been playing an important role in the general-purpose computing market recently. The common approach to programming GPUs today is to write GPU-specific code with low-level GPU APIs such as CUDA. Although this approach can achieve very good performance, it raises serious portability issues: programmers are required to write a specific version of the code for each potential target architecture, which results in high development and maintenance cost. We believe it is desirable to have a programming model that provides source code portability between CPUs and GPUs, as well as across different GPUs: programmers only need to write one version of code, which can be compiled and executed on either CPUs or GPUs efficiently without modification. In this paper, we propose MapCG, a MapReduce framework that provides source code level portability between CPU and GPU. Different from OpenCL, our framework is based on MapReduce, which provides a high-level programming model and makes programming much easier. We describe the design of the MapReduce-based high-level programming language and the underlying runtime system that enable portability between CPU and GPU. A prototype of the MapCG runtime was implemented, supporting multi-core CPUs and NVIDIA GPUs. Experiments show that our implementation can execute the same source code efficiently on multi-core CPU platforms and GPUs, achieving an average of 1.6-2.5x speedup over previous implementations of MapReduce on eight commonly used applications.
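
The sketch below shows the single-source MapReduce programming model in its most generic form: a word count where the user supplies only map and reduce functions and a toy runtime does the grouping. It is plain Python written for illustration, not MapCG's actual CPU/GPU API; the point is that the same user code could be dispatched to different backends by the runtime.

```python
# Generic MapReduce word count, illustrating the single-source programming
# model MapCG advocates: the user writes only map() and reduce(); a runtime
# decides where they execute. This is not MapCG's actual API.
from collections import defaultdict

def map_fn(document):
    """Emit (word, 1) for every word in one input document."""
    for word in document.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Sum all partial counts for one word."""
    return word, sum(counts)

def run_mapreduce(documents):
    """A toy sequential 'runtime'; a CPU/GPU framework would dispatch these
    phases to threads or kernels without changing map_fn/reduce_fn."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(run_mapreduce(["the quick brown fox", "the lazy dog", "The fox"]))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```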

150 citations


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently, namely those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers: the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90%, and an 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.
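
As a toy illustration of the third (spatial-decomposition) strategy, the sketch below bins atoms into a regular grid of regions, one per notional processor, so that short-range force computation only needs data from neighboring regions. It is a serial sketch with invented parameters, not the paper's message-passing implementation.

```python
# Toy spatial decomposition: atoms are assigned to processors by which cell
# of a regular grid they fall in, so short-range forces only require data
# from neighboring cells. Serial sketch; box size and grid are made up.
import random

random.seed(0)
BOX = 10.0          # cubic simulation box edge length (arbitrary units)
GRID = 4            # regions per dimension -> GRID**3 spatial regions

atoms = [(random.uniform(0, BOX),
          random.uniform(0, BOX),
          random.uniform(0, BOX)) for _ in range(1000)]

def owner(pos):
    """Map an atom position to the (i, j, k) index of the region owning it."""
    return tuple(min(GRID - 1, int(c / BOX * GRID)) for c in pos)

regions = {}
for atom in atoms:
    regions.setdefault(owner(atom), []).append(atom)

# Each region only exchanges data with its 26 neighbors (plus itself) for
# short-range forces, which keeps communication cost bounded as atoms move.
counts = [len(v) for v in regions.values()]
print(f"{len(regions)} occupied regions, "
      f"{min(counts)}-{max(counts)} atoms per region")
```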

29,323 citations

Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
TL;DR: An overview of the CHARMM program as it exists today is provided with an emphasis on developments since the publication of the original CHARMM article in 1983.
Abstract: CHARMM (Chemistry at HARvard Molecular Mechanics) is a highly versatile and widely used molecular simulation program. It has been developed over the last three decades with a primary focus on molecules of biological interest, including proteins, peptides, lipids, nucleic acids, carbohydrates, and small molecule ligands, as they occur in solution, crystals, and membrane environments. For the study of such systems, the program provides a large suite of computational tools that include numerous conformational and path sampling methods, free energy estimators, molecular minimization, dynamics, and analysis techniques, and model-building capabilities. The CHARMM program is applicable to problems involving a much broader class of many-particle systems. Calculations with CHARMM can be performed using a number of different energy functions and models, from mixed quantum mechanical-molecular mechanical force fields, to all-atom classical potential energy functions with explicit solvent and various boundary conditions, to implicit solvent and membrane models. The program has been ported to numerous platforms in both serial and parallel architectures. This article provides an overview of the program as it exists today with an emphasis on developments since the publication of the original CHARMM article in 1983.
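
For a flavor of the all-atom classical potential energy functions mentioned above, the sketch below sums a harmonic bond term, a 12-6 Lennard-Jones term, and a Coulomb term for a toy pair of atoms. The real CHARMM force field includes many more terms (angles, dihedrals, impropers, CMAP corrections) and carefully parameterized constants; every value here is invented for illustration.

```python
# Minimal sketch of the kind of all-atom classical potential energy function
# used in CHARMM-style force fields: harmonic bonds plus non-bonded
# Lennard-Jones and Coulomb terms. Parameters below are made-up toy values.
def bond_energy(r, r0=1.0, k=100.0):
    """Harmonic bond stretch: k * (r - r0)^2."""
    return k * (r - r0) ** 2

def lennard_jones(r, epsilon=0.2, sigma=3.4):
    """12-6 Lennard-Jones interaction between two non-bonded atoms."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 ** 2 - s6)

def coulomb(r, q1, q2, ke=332.06):
    """Coulomb interaction; ke is approximately Coulomb's constant in
    kcal*Angstrom/(mol*e^2), a common convention in MD codes."""
    return ke * q1 * q2 / r

# Energy of a toy two-atom "system" at separation r = 3.8 Angstrom.
r = 3.8
total = bond_energy(r, r0=3.8) + lennard_jones(r) + coulomb(r, +0.3, -0.3)
print(f"toy potential energy: {total:.2f} kcal/mol")
```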

7,035 citations

Proceedings ArticleDOI
Meeyoung Cha, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, Sue Moon
24 Oct 2007
TL;DR: In this article, the authors analyzed YouTube, the world's largest UGC VoD system, and provided an in-depth study of the popularity life cycle of videos, intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content.
Abstract: User Generated Content (UGC) is re-shaping the way people watch video and TV, with millions of video producers and consumers. In particular, UGC sites are creating new viewing patterns and social interactions, empowering users to be more creative, and developing new business opportunities. To better understand the impact of UGC systems, we have analyzed YouTube, the world's largest UGC VoD system. Based on a large amount of data collected, we provide an in-depth study of YouTube and other similar UGC systems. In particular, we study the popularity life-cycle of videos, the intrinsic statistical properties of requests and their relationship with video age, and the level of content aliasing or of illegal content in the system. We also provide insights on the potential for more efficient UGC VoD systems (e.g. utilizing P2P techniques or making better use of caching). Finally, we discuss the opportunities to leverage the latent demand for niche videos that are not reached today due to information filtering effects or other system scarcity distortions. Overall, we believe that the results presented in this paper are crucial in understanding UGC systems and can provide valuable information to ISPs, site administrators, and content owners with major commercial and technical implications.
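
As an illustrative companion to the popularity analysis, the sketch below counts requests per video in a synthetic, Zipf-like request log and reports how much traffic the most popular items capture, the kind of rank-popularity summary such measurement studies produce. It does not reproduce the paper's YouTube data or its fitted distributions; the catalog size, skew, and threshold are invented.

```python
# Illustrative (synthetic) rank-popularity computation: count requests per
# video, sort by popularity, and measure the share of traffic captured by
# the head of the distribution. Randomly generated data, not YouTube logs.
import random
from collections import Counter

random.seed(7)

# Synthetic request log: video IDs drawn with a skewed (Zipf-like) weighting.
videos = [f"video_{i}" for i in range(1000)]
weights = [1.0 / (rank + 1) for rank in range(len(videos))]
requests = random.choices(videos, weights=weights, k=100_000)

counts = Counter(requests)
ranked = counts.most_common()

top_share = sum(c for _, c in ranked[:100]) / len(requests)
print(f"top 100 of {len(videos)} videos capture {top_share:.0%} of requests")
```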

1,713 citations

Book
01 Jan 1968

1,644 citations