Home
/
Authors
/
Peter Varman

Author

Peter Varman

Other affiliations: University of Texas at Austin, VMware

Bio: Peter Varman is an academic researcher from Rice University. The author has contributed to research in topics: I/O scheduling & Scheduling (computing). The author has an hindex of 24, co-authored 119 publications receiving 2072 citations. Previous affiliations of Peter Varman include University of Texas at Austin & VMware.

Papers published on a yearly basis

2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2005
2004
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988
1987
1986
1985
1984
1983
1982

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

mClock: handling throughput variability for hypervisor IO scheduling

[...]

Ajay Gulati¹, Arif Merchant², Peter Varman³•Institutions (3)

VMware¹, Hewlett-Packard², Rice University³

04 Oct 2010

TL;DR: The algorithm, mClock, supports proportional-share fairness subject to minimum reservations and maximum limits on the IO allocations for VMs and indicates that these rich QoS controls are quite effective in isolating VM performance and providing better application latency.

...read moreread less

Abstract: Virtualized servers run a diverse set of virtual machines (VMs), ranging from interactive desktops to test and development environments and even batch workloads. Hypervisors are responsible for multiplexing the underlying hardware resources among VMs while providing them the desired degree of isolation using resource management controls. Existing methods provide many knobs for allocating CPU and memory to VMs, but support for control of IO resource allocation has been quite limited. IO resource management in a hypervisor introduces significant new challenges and needs more extensive controls than in commodity operating systems.This paper introduces a novel algorithm for IO resource allocation in a hypervisor. Our algorithm, mClock, supports proportional-share fairness subject to minimum reservations and maximum limits on the IO allocations for VMs. We present the design of mClock and a prototype implementation inside the VMware ESX server hypervisor. Our results indicate that these rich QoS controls are quite effective in isolating VM performance and providing better application latency. We also show an adaptation of mClock (called dmClock) for a distributed storage environment, where storage is jointly provided by multiple nodes.

...read moreread less

240 citations

Proceedings Article•DOI•

pClock: an arrival curve based approach for QoS guarantees in shared storage systems

[...]

Ajay Gulati¹, Arif Merchant², Peter Varman¹•Institutions (2)

Rice University¹, Hewlett-Packard²

12 Jun 2007

TL;DR: The algorithm pClock, based on arrival curves that intuitively capture the bandwidth and burst requirements of applications, is implemented and it is shown analytically that an application following its arrival curve never misses its deadline.

...read moreread less

Abstract: Storage consolidation is becoming an attractive paradigm for data organization because of the economies of sharing and the ease of centralized management. However, sharing of resources is viable only if applications can be isolated from each other. This work targets the problem of providing performance guarantees to an application irrespective of the behavior of other workloads. Application requirements are represented in terms of the average throughput, latency and maximum burst size. Most earlier schemes only do weighted bandwidth allocation; schemes that provide control of latency either cannot handle bursts or penalize applications for their own prior behavior, such as using spare capacity. Our algorithm pClock is based on arrival curves that intuitively capture the bandwidth and burst requirements of applications. We show analytically that an application following its arrival curve never misses its deadline. We have implemented pClock both in DiskSim and as a module in the Linux kernel 2.6. Our evaluation shows three important features of pClock: (1) benefits over existing algorithms; (2) efficient performance isolation and burst handling; and (3) the ability to allocate spare capacity to either speed up some applications or to a background utility, such as backup. pClock can be efficiently implemented in a system without much overhead.

...read moreread less

146 citations

Proceedings Article•DOI•

High performance reliable variable latency carry select addition

[...]

Kai Du¹, Peter Varman¹, Kartik Mohanram²•Institutions (2)

Rice University¹, University of Pittsburgh²

12 Mar 2012

TL;DR: An analytical model for the error rate of SCSA is developed to facilitate both design exploration and convergence and shows that on average, variable latency addition using SCSA-based speculative adders is 10% faster than the DesignWare adder with up to 43% area reduction.

...read moreread less

Abstract: Speculative adders have attracted strong interest for reducing critical path delays to sub-logarithmic delays by exploiting the trade-offs between reliability and performance. Speculative adders also find use in the design of reliable variable latency adders, which combine speculation with error correction to achieve high performance for low area overhead over traditional adders. This paper describes speculative carry select addition (SCSA), a novel function speculation technique for the design of low error-rate speculative adders and low overhead, high performance, reliable variable latency adders. We develop an analytical model for the error rate of SCSA to facilitate both design exploration and convergence. We show that for an error rate of 0.01% (0.25%), SCSA-based speculative addition is 10% faster than the DesignWare adder with up to 43% (56%) area reduction. Further, on average, variable latency addition using SCSA-based speculative adders is 10% faster than the DesignWare adder with area requirements of -19% to 16% (-17% to 29%) for unsigned random (signed Gaussian) inputs.

...read moreread less

138 citations

Journal Article•DOI•

An efficient multiversion access structure

[...]

Peter Varman¹, Rakesh M. Verma•Institutions (1)

Rice University¹

01 May 1997-IEEE Transactions on Knowledge and Data Engineering

TL;DR: An efficient multiversion access structure for a transaction-time database is presented that requires optimal storage and query times for several important queries and logarithmic update times and good storage utilization and query performance is obtained.

...read moreread less

Abstract: An efficient multiversion access structure for a transaction-time database is presented. Our method requires optimal storage and query times for several important queries and logarithmic update times. Three version operations-inserts, updates, and deletes-are allowed on the current database, while queries are allowed on any version, present or past. The following query operations are performed in optimal query time: key range search, key history search, and time range view. The key-range query retrieves all records having keys in a specified key range at a specified time; the key history query retrieves all records with a given key in a specified time range; and the time range view query retrieves all records that were current during a specified time interval. Special cases of these queries include the key search query, which retrieves a particular version of a record, and the snapshot query which reconstructs the database at some past time. To the best of our knowledge no previous multiversion access structure simultaneously supports all these query and version operations within these time and space bounds. The bounds on query operations are worst case per operation, while those for storage space and version operations are (worst-case) amortized over a sequence of version operations. Simulation results show that good storage utilization and query performance is obtained.

...read moreread less

135 citations

Proceedings Article•DOI•

SoftWrAP: A lightweight framework for transactional support of storage class memory

[...]

Ellis Giles¹, Kshitij A. Doshi², Peter Varman¹•Institutions (2)

Rice University¹, Intel²

01 May 2015

TL;DR: This paper presents SoftWrAP, an open-source framework for Software based Write-Aside Persistence that provides lightweight atomicity and durability for SCM storage transactions, while ensuring fast paths to data in processor caches, DRAM, and persistent memory tiers.

...read moreread less

Abstract: In-memory computing is gaining popularity as a means of sidestepping the performance bottlenecks of block storage operations. However, the volatile nature of DRAM makes these systems vulnerable to system crashes, while the need to continuously refresh massive amounts of passive memoryresident data increases power consumption. Emerging storage-class memory (SCM) technologies combine fast DRAM-like cache-line access granularity with the persistence of storage devices like disks or SSDs, resulting in potential 10x-100x performance gains, and low passive power consumption. This unification of storage and memory into a single directly-accessible persistent tier raises significant reliability and pro-grammability challenges. In this paper, we present SoftWrAP, an open-source framework for Software based Write-Aside Persistence. SoftWrAP provides lightweight atomicity and durability for SCM storage transactions, while ensuring fast paths to data in processor caches, DRAM, and persistent memory tiers. We use our framework to evaluate both handcrafted SCM-based microbenchmarks as well as existing applications, specifically the STX B+Tree library and SQLite database, backed by emulated SCM. Our results show significant benefits of SoftWrAP over existing methods such as undo logging and shadow copying, and can match non-atomic durable writes to SCM, thereby gaining atomic consistency almost for free.

...read moreread less

94 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Approximate computing: An emerging paradigm for energy-efficient design

[...]

Jie Han¹, Michael Orshansky²•Institutions (2)

University of Alberta¹, University of Texas at Austin²

27 May 2013

TL;DR: This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.

...read moreread less

Abstract: Approximate computing has recently emerged as a promising approach to energy-efficient design of digital systems. Approximate computing relies on the ability of many systems and applications to tolerate some loss of quality or optimality in the computed result. By relaxing the need for fully precise or completely deterministic operations, approximate computing techniques allow substantially improved energy efficiency. This paper reviews recent progress in the area, including design of approximate arithmetic blocks, pertinent error and quality measures, and algorithm-level techniques for approximate computing.

...read moreread less

921 citations

Proceedings Article•DOI•

DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings

[...]

Aayush Gupta¹, Youngjae Kim¹, Bhuvan Urgaonkar¹•Institutions (1)

Pennsylvania State University¹

07 Mar 2009

TL;DR: This work proposes a complete paradigm shift in the design of the core FTL engine from the existing techniques with a Demand-based Flash Translation Layer (DFTL), which selectively caches page-level address mappings and develops a flash simulation framework called FlashSim.

...read moreread less

Abstract: Recent technological advances in the development of flash-memory based devices have consolidated their leadership position as the preferred storage media in the embedded systems market and opened new vistas for deployment in enterprise-scale storage systems. Unlike hard disks, flash devices are free from any mechanical moving parts, have no seek or rotational delays and consume lower power. However, the internal idiosyncrasies of flash technology make its performance highly dependent on workload characteristics. The poor performance of random writes has been a cause of major concern, which needs to be addressed to better utilize the potential of flash in enterprise-scale environments. We examine one of the important causes of this poor performance: the design of the Flash Translation Layer (FTL), which performs the virtual-to-physical address translations and hides the erase-before-write characteristics of flash. We propose a complete paradigm shift in the design of the core FTL engine from the existing techniques with our Demand-based Flash Translation Layer (DFTL), which selectively caches page-level address mappings. We develop a flash simulation framework called FlashSim. Our experimental evaluation with realistic enterprise-scale workloads endorses the utility of DFTL in enterprise-scale storage systems by demonstrating: (i) improved performance, (ii) reduced garbage collection overhead and (iii) better overload behavior compared to state-of-the-art FTL schemes. For example, a predominantly random-write dominant I/O trace from an OLTP application running at a large financial institution shows a 78% improvement in average response time (due to a 3-fold reduction in operations of the garbage collector), compared to a state-of-the-art FTL scheme. Even for the well-known read-dominant TPC-H benchmark, for which DFTL introduces additional overheads, we improve system response time by 56%.

...read moreread less

832 citations

Journal Article•DOI•

External memory algorithms and data structures: dealing with massive data

[...]

Jeffrey Scott Vitter¹•Institutions (1)

Duke University¹

01 Jun 2001-ACM Computing Surveys

TL;DR: The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.

...read moreread less

Abstract: Data sets in large applications are often too massive to fit completely inside the computers internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this article we survey the state of the art in the design and analysis of external memory (or EM) algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs. We consider a variety of EM paradigms for solving batched and online problems efficiently in external memory. For the batched problem of sorting and related problems such as permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We also consider useful techniques for batched EM problems involving matrices (such as matrix multiplication and transposition), geometric data (such as finding intersections and constructing convex hulls), and graphs (such as list ranking, connected components, topological sorting, and shortest paths). In the online domain, canonical EM applications include dictionary lookup and range searching. The two important classes of indexed data structures are based upon extendible hashing and B-trees. The paradigms of filtering and bootstrapping provide a convenient means in online data structures to make effective use of the data accessed from disk. We also reexamine some of the above EM problems in slightly different settings, such as when the data items are moving, when the data items are variable-length (e.g., text strings), or when the allocated amount of internal memory can change dynamically. Programming tools and environments are available for simplifying the EM programming task. During the course of the survey, we report on some experiments in the domain of spatial databases using the TPIE system (transparent parallel I/O programming environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.

...read moreread less

751 citations

2004아테네 올림픽대회출전 한국레슬링 대표선수의 무산소성 운동능력 및 등속성 근력

[...]

윤재량 ( Jae Ryang Yoon )

01 Jan 2005

608 citations

Proceedings Article•DOI•

Automated control of multiple virtualized resources

[...]

Pradeep Padala¹, Kai-Yuan Hou¹, Kang G. Shin¹, Xiaoyun Zhu², Mustafa Uysal³, Zhikui Wang³, Sharad Singhal³, Arif Merchant³ - Show less +4 more•Institutions (3)

University of Michigan¹, VMware², Hewlett-Packard³

01 Apr 2009

TL;DR: Experimental evaluation with RUBiS and TPC-W benchmarks along with production-trace-driven workloads indicates that AutoControl can detect and mitigate CPU and disk I/O bottlenecks that occur over time and across multiple nodes by allocating each resource accordingly.

...read moreread less

Abstract: Virtualized data centers enable sharing of resources among hosted applications. However, it is difficult to satisfy service-level objectives(SLOs) of applications on shared infrastructure, as application workloads and resource consumption patterns change over time. In this paper, we present AutoControl, a resource control system that automatically adapts to dynamic workload changes to achieve application SLOs. AutoControl is a combination of an online model estimator and a novel multi-input, multi-output (MIMO) resource controller. The model estimator captures the complex relationship between application performance and resource allocations, while the MIMO controller allocates the right amount of multiple virtualized resources to achieve application SLOs. Our experimental evaluation with RUBiS and TPC-W benchmarks along with production-trace-driven workloads indicates that AutoControl can detect and mitigate CPU and disk I/O bottlenecks that occur over time and across multiple nodes by allocating each resource accordingly. We also show that AutoControl can be used to provide service differentiation according to the application priorities during resource contention.

...read moreread less

553 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse