Journal

arXiv: Performance 

About: arXiv: Performance is an academic journal. It publishes mainly in the areas of Server and Queueing theory. Over its lifetime, the journal has published 872 publications, which have received 5021 citations.


Papers
Posted Content
TL;DR: This work investigates cost reduction opportunities that arise by the use of uninterrupted power supply units as energy storage devices and develops an online control algorithm that can optimally exploit these devices to minimize the time average cost.
Abstract: Since the electricity bill of a data center constitutes a significant portion of its overall operational costs, reducing this has become important. We investigate cost reduction opportunities that arise by the use of uninterrupted power supply (UPS) units as energy storage devices. This represents a deviation from the usual use of these devices as mere transitional fail-over mechanisms between utility and captive sources such as diesel generators. We consider the problem of opportunistically using these devices to reduce the time average electric utility bill in a data center. Using the technique of Lyapunov optimization, we develop an online control algorithm that can optimally exploit these devices to minimize the time average cost. This algorithm operates without any knowledge of the statistics of the workload or electricity cost processes, making it attractive in the presence of workload and pricing uncertainties. An interesting feature of our algorithm is that its deviation from optimality reduces as the storage capacity is increased. Our work opens up a new area in data center power management.

335 citations

Posted Content
TL;DR: This paper uses complex, near-identical kernels from a Quantum Monte Carlo application to compare the performance of CUDA and OpenCL and shows that when using NVIDIA compiler tools, converting a CUDA kernel to an OpenCL kernel involves minimal modifications.
Abstract: CUDA and OpenCL are two different frameworks for GPU programming. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, its generality may entail a performance penalty. In this paper, we use complex, near-identical kernels from a Quantum Monte Carlo application to compare the performance of CUDA and OpenCL. We show that when using NVIDIA compiler tools, converting a CUDA kernel to an OpenCL kernel involves minimal modifications. Making such a kernel compile with ATI's build tools involves more modifications. Our performance tests measure and compare data transfer times to and from the GPU, kernel execution times, and end-to-end application execution times for both CUDA and OpenCL.

225 citations

Journal Article
TL;DR: In this paper, the stationary distribution of the age of information (AoI) in information update systems is expressed in terms of the stationary distributions of the system delay and the peak AoI, and is analyzed for single-server queues under different service disciplines.
Abstract: This paper considers the stationary distribution of the age of information (AoI) in information update systems. We first derive a general formula for the stationary distribution of the AoI, which holds for a wide class of information update systems. The formula indicates that the stationary distribution of the AoI is given in terms of the stationary distributions of the system delay and the peak AoI. To demonstrate its applicability and usefulness, we analyze the AoI in single-server queues with four different service disciplines: first-come first-served (FCFS), preemptive last-come first-served (LCFS), and two variants of non-preemptive LCFS service disciplines. For the FCFS and the preemptive LCFS service disciplines, the GI/GI/1, M/GI/1, and GI/M/1 queues are considered, and for the non-preemptive LCFS service disciplines, the M/GI/1 and GI/M/1 queues are considered. With these results, we further show comparison results for the mean AoI's in the M/GI/1 and GI/M/1 queues under those service disciplines.

134 citations

Journal Article
TL;DR: The current landscape of TinyML is presented, the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads are discussed, and three preliminary benchmarks are presented along with the selection methodology.
Abstract: Recent advancements in ultra-low-power machine learning (TinyML) hardware promise to unlock an entirely new class of smart applications. However, continued progress is limited by the lack of a widely accepted benchmark for these systems. Benchmarking allows us to measure and thereby systematically compare, evaluate, and improve the performance of systems and is therefore fundamental to a field reaching maturity. In this position paper, we present the current landscape of TinyML and discuss the challenges and direction towards developing a fair and useful hardware benchmark for TinyML workloads. Furthermore, we present our four benchmarks and discuss our selection methodology. Our viewpoints reflect the collective thoughts of the TinyMLPerf working group, which comprises over 30 organizations.

127 citations

Posted Content
TL;DR: This paper evaluates the performance and compares the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference and discusses the recent changes in the Android ML pipeline.
Abstract: The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference. We also discuss the recent changes in the Android ML pipeline and provide an overview of the deployment of deep learning models on mobile devices. All numerical results provided in this paper can be found and are regularly updated on the official project website: this http URL.

88 citations

Network Information
Related Journals (5)
IEEE Transactions on Parallel and Distributed Systems
5.2K papers, 237.8K citations
85% related
IEEE/ACM Transactions on Networking
4K papers, 296.9K citations
83% related
arXiv: Learning
45K papers, 837.1K citations
81% related
Computer Networks
5.9K papers, 261.1K citations
79% related
IEEE Journal on Selected Areas in Communications
7.3K papers, 647.9K citations
79% related
Performance Metrics
No. of papers from the Journal in previous years
Year    Papers
2021    73
2020    122
2019    126
2018    94
2017    87
2016    57