Author

Suresh Purini

Bio: Suresh Purini is an academic researcher from the International Institute of Information Technology, Hyderabad. The author has contributed to research in the topics of compilers and cloud computing. The author has an h-index of 8 and has co-authored 31 publications receiving 229 citations. Previous affiliations of Suresh Purini include the University of Maryland, Baltimore County and the International Institute of Information Technology.

Papers
Journal Article
20 Jan 2013
TL;DR: This work constructs a small set of good sequences such that for every program class there exists a near-optimal optimization sequence in the good sequences set and completely circumvents the need to solve the program classification problem.
Abstract: The compiler optimizations we enable and the order in which we apply them to a program have a substantial impact on the program's execution time. Compilers provide default optimization sequences which can give good program speedup. As the default sequences have to optimize programs with different characteristics, they embed multiple subsequences which can optimize different classes of programs. These subsequences may interact adversely with each other and limit the achievable program speedup. Instead of searching for a single universally optimal sequence, we can construct a small set of good sequences such that for every program class there exists a near-optimal optimization sequence in that set. If we can construct such a good sequences set covering all the program classes in the program space, then we can choose the best sequence for a program by trying all the sequences in the set. This approach completely circumvents the need to solve the program classification problem. Using a sequence set of size around 10, we obtained an average speedup of up to 14% on PolyBench programs and up to 12% on MiBench programs. Our approach is quite different from both the iterative compilation and machine-learning-based prediction modeling techniques proposed in the literature so far. We use separate training and test datasets for cross-validation, as opposed to the Leave-One-Out cross-validation technique.
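As a rough illustration of the idea described above, the sketch below greedily builds a small set of good sequences from a precomputed program-versus-sequence speedup table, then selects the best member for a new program by simply trying each one. The table layout, the greedy criterion, and the helper names are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch of the "good sequences set" construction:
# speedup[p][s] holds the measured speedup of sequence s on training
# program p (an assumed data layout, not the paper's code).

def build_good_set(speedup, k=10):
    programs = list(speedup)
    sequences = list(next(iter(speedup.values())))
    best = {p: max(speedup[p].values()) for p in programs}  # per-program optimum
    chosen = []
    for _ in range(k):
        # Greedily add the sequence that most shrinks the total gap
        # between what the chosen set achieves and the per-program optimum.
        def gap(cand):
            return sum(best[p] - max(speedup[p][s] for s in chosen + [cand])
                       for p in programs)
        chosen.append(min(set(sequences) - set(chosen), key=gap))
    return chosen

def pick_sequence(good_set, run):
    # Deployment: try every sequence in the small set on the new program;
    # run(seq) is an assumed helper that compiles, executes, and returns time.
    return min(good_set, key=run)
```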

63 citations

Proceedings Article
11 Sep 2016
TL;DR: This paper develops an approach to map image processing pipelines expressed in the PolyMage DSL to efficient parallel FPGA designs, which deliver significantly higher throughput than a state-of-the-art DSL-to-FPGA compiler and support a greater variety of filters.
Abstract: This paper describes an automatic approach to accelerate image processing pipelines using FPGAs. An image processing pipeline can be viewed as a graph of interconnected stages that processes images successively. Each stage typically performs a point-wise, stencil, or other more complex operations on image pixels. Recent efforts have led to the development of domain-specific languages (DSL) and optimization frameworks for image processing pipelines. In this paper, we develop an approach to map image processing pipelines expressed in the PolyMage DSL to efficient parallel FPGA designs. Our approach exploits reuse and available memory bandwidth (or chip resources) maximally. When compared to Darkroom, a state-of-the-art approach to compile high-level DSL to FPGAs, our approach (a) leads to designs that deliver significantly higher throughput, and (b) supports a greater variety of filters. Furthermore, the designs we generate obtain an improvement even over pre-optimized FPGA implementations provided by vendor libraries for some of the benchmarks.
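The reuse the abstract mentions is commonly realized in hardware with line buffers, which keep only a few image rows on chip so each pixel is fetched from external memory once. Below is a minimal software model of a line buffer for a 3x3 stencil; it illustrates the general technique under that assumption and is not PolyMage-generated output.

```python
from collections import deque

def stencil_3x3(rows, width, op):
    """Stream image rows through a 3-row line buffer, applying `op`
    to each 3x3 window; only 3 rows are ever resident at a time."""
    buf = deque(maxlen=3)                       # models on-chip line buffers
    for row in rows:
        buf.append(row)
        if len(buf) == 3:                       # enough rows for a window
            yield [op([buf[y][x + dx] for y in range(3) for dx in (-1, 0, 1)])
                   for x in range(1, width - 1)]
```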

42 citations

Proceedings Article
27 Jun 2014
TL;DR: RLC provides an efficient and less disruptive migration scheme by utilizing the three phases of process migration, and introduces a learning phase to estimate the writable working set (WWS) prior to migration, resulting in an almost single transfer of each page.
Abstract: Today, IaaS cloud providers dynamically minimize the cost of data center operations while maintaining the Service Level Agreement (SLA). Currently, this is achieved through live migration, a state-of-the-art virtualization technology. However, existing migration techniques suffer from high network bandwidth utilization, large network data transfers, long migration times, and possible failure of the destination VM during migration. In this paper, we propose Reliable Lazy Copy (RLC), a fast, efficient, and reliable migration technique. RLC provides an efficient and less disruptive migration scheme by utilizing the three phases of process migration. To use network bandwidth effectively and reduce the total migration time, we introduce a learning phase that estimates the writable working set (WWS) prior to the migration, resulting in an almost single transfer of each page. Our approach decreases the total data transfer by 1.16x-12.21x and the total migration time by a factor of 1.42x-9.84x compared with existing approaches, thus providing fast, efficient, and reliable migration of VMs in the cloud.
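A possible reading of the learning phase, sketched below: sample the set of dirtied pages over several intervals, treat a page as part of the WWS if it is dirtied in enough intervals, pre-copy the remaining stable pages while the VM runs, and send the WWS once during the brief stop phase. The sampling threshold and function names are illustrative assumptions, not the RLC implementation.

```python
from collections import Counter

def estimate_wws(dirty_samples, threshold=0.5):
    # dirty_samples: one set of dirtied page numbers per observation interval.
    # A page joins the WWS if it was dirtied in >= threshold of the intervals.
    counts = Counter(p for sample in dirty_samples for p in sample)
    need = threshold * len(dirty_samples)
    return {p for p, c in counts.items() if c >= need}

def migrate(all_pages, dirty_samples, send):
    wws = estimate_wws(dirty_samples)
    for page in all_pages - wws:    # pre-copy stable pages while the VM runs
        send(page)
    for page in wws:                # brief stop: WWS pages move exactly once
        send(page)
```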

40 citations

Proceedings Article
05 Feb 2017
TL;DR: This work proposes a novel hybrid approach for automatic plagiarism detection in programming assignments using static features extracted from the intermediate representation of a program in a compiler infrastructure such as gcc and demonstrates the use of unsupervised learning techniques on the extracted feature representations.
Abstract: In this work, we propose a novel hybrid approach for automatic plagiarism detection in programming assignments. Most well-known plagiarism detectors either employ a text-based approach or use features based on properties of the program at the syntactic level. However, both approaches succumb to code obfuscation, which is a major obstacle for automatic software plagiarism detection. Our proposed method uses static features extracted from the intermediate representation of a program in a compiler infrastructure such as gcc. We demonstrate the use of unsupervised learning techniques on the extracted feature representations and show that our system is robust to code obfuscation. We test our method on assignments from an introductory programming course. The preliminary results show that our system compares favorably with other popular tools such as MOSS. To visualize the local and global structure of the features, we obtained low-dimensional representations of our features using t-SNE, a variation of Stochastic Neighbor Embedding that can preserve neighborhood identity in low dimensions. Based on this idea of preserving neighborhood identity, we mine interesting information such as the diversity of student solution approaches to a given problem. The presence of well-defined clusters in the low-dimensional visualizations demonstrates that our features are capable of capturing interesting programming patterns.
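A hedged sketch of the pipeline as the abstract outlines it: represent each submission by a vector of intermediate-representation features, cluster the vectors, and project them to 2-D with t-SNE. The opcode-histogram features below are a crude stand-in for the paper's actual feature set.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

IR_OPS = ["load", "store", "add", "mul", "br", "call", "cmp", "phi"]

def features(ir_text):
    # Stand-in for real IR feature extraction: an opcode histogram.
    toks = ir_text.split()
    return np.array([toks.count(op) for op in IR_OPS], dtype=float)

def analyze(submissions, n_clusters=5):
    X = np.stack([features(s) for s in submissions])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    xy = TSNE(n_components=2, perplexity=5).fit_transform(X)  # needs > 5 samples
    return labels, xy   # cluster labels plus a 2-D map for visualization
```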

14 citations

Proceedings Article
01 Jul 2016
TL;DR: A new correction technique, Accurus, wherein correction starts from the most significant bit, resulting in fast convergence of the result towards the accurate value; the approximate adder is evaluated in a Gaussian blur filter applied to an image.
Abstract: Approximate computing techniques have paved new paths to substantial improvements in speed and power efficiency by trading off the accuracy of computations in inherently error-tolerant applications, such as those in the image and video processing domains. The accuracy requirements of various applications can differ from each other. Even within the same application, different computations can have different accuracy requirements, which can vary over time and with user requirements. Accuracy-configurable arithmetic circuits are essential for these reasons. Techniques proposed earlier in the literature, such as ACA, work by improving the accuracy over several pipeline stages. However, those techniques suffer from the drawback that the corrections made in the initial pipeline stages are small in magnitude, as they are performed from the least significant bit position. In this paper, we propose a new correction technique, Accurus, wherein we start from the most significant bit, resulting in fast convergence of the result towards the accurate one. We used our approximate adder circuit in a Gaussian blur filter, which was then applied to an image. After one stage of correction, we achieved a peak signal-to-noise ratio of 40.90 dB, compared with 25.59 dB obtained using the previous well-known technique (ACA).
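One way to picture the MSB-first correction, under the assumption that the adder is split into fixed-size blocks whose inter-block carries are initially dropped: each correction stage restores the exact carry into one more block, starting from the most significant block, so the earliest corrections remove the largest errors. The block size and staging below are illustrative; this is a behavioral model, not the Accurus circuit.

```python
def approx_add(a, b, width=16, block=4, corrected=0):
    """Block-based approximate adder: inter-block carries are dropped.
    `corrected` counts how many MSB-side blocks use the exact carry-in."""
    nblocks = width // block
    mask = (1 << block) - 1
    out = 0
    for i in range(nblocks):
        ai = (a >> (i * block)) & mask
        bi = (b >> (i * block)) & mask
        if i >= nblocks - corrected:
            low = (1 << (i * block)) - 1                  # bits below this block
            cin = ((a & low) + (b & low)) >> (i * block)  # exact carry-in
        else:
            cin = 0                                # approximation: carry dropped
        out |= ((ai + bi + cin) & mask) << (i * block)
    return out

# With corrected=0 the result is fully approximate; with corrected=nblocks
# it equals (a + b) masked to `width` bits.
```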

12 citations


Cited by
Proceedings Article
02 Jul 2018
TL;DR: An ITiCSE working group conducted a systematic review of the introductory programming literature to explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research.
Abstract: As computing becomes a mainstream discipline embedded in the school curriculum and acts as an enabler for an increasing range of academic disciplines in higher education, the literature on introductory programming is growing. Although there have been several reviews that focus on specific aspects of introductory programming, there has been no broad overview of the literature exploring recent trends across the breadth of introductory programming. This paper is the report of an ITiCSE working group that conducted a systematic review in order to gain an overview of the introductory programming literature. Partitioning the literature into papers addressing the student, teaching, the curriculum, and assessment, we explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research.

282 citations

01 Jan 2008
TL;DR: Analyzes the architecture of the KVM subsystem, which is implemented on top of processor hardware virtualization support.
Abstract: This paper analyzes the architecture of the KVM subsystem, which is implemented on top of processor hardware virtualization support. To address KVM's limitation of tracking only isolated event information, a new KVM event tracing mechanism (kvmtrace) is proposed for performance tuning, and it is designed and implemented using the relayfs interface. The implementation of Linux kernel Markers and its practical application in kvmtrace are also discussed.
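For flavor, a heavily hypothetical user-space consumer of such a relay channel: relay (relayfs) exposes per-CPU buffer files that stream binary records. Both the file path and the record layout below are guesses for illustration and do not reflect the actual kvmtrace format.

```python
import struct

RECORD = struct.Struct("<QII")   # assumed layout: timestamp, event id, vcpu

def read_events(path="/sys/kernel/debug/kvmtrace/trace0"):  # hypothetical path
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            if len(chunk) < RECORD.size:
                break                      # partial record at end of stream
            yield RECORD.unpack(chunk)     # (timestamp, event, vcpu)
```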

157 citations

Proceedings Article
11 Jun 2018
TL;DR: This work describes a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators, and summarizes the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning.
Abstract: Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for productivity and are difficult to target from higher level languages. HLS tools are more productive, but offer an ad-hoc mix of software and hardware abstractions which make performance optimizations difficult. In this work, we describe a new domain-specific language and compiler called Spatial for higher level descriptions of application accelerators. We describe Spatial's hardware-centric abstractions for both programmer productivity and design performance, and summarize the compiler passes required to support these abstractions, including pipeline scheduling, automatic memory banking, and automated design tuning driven by active machine learning. We demonstrate the language's ability to target FPGAs and CGRAs from common source code. We show that applications written in Spatial are, on average, 42% shorter and achieve a mean speedup of 2.9x over SDAccel HLS when targeting a Xilinx UltraScale+ VU9P FPGA on an Amazon EC2 F1 instance.
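As a sketch of the "automated design tuning driven by active machine learning" mentioned above: fit a surrogate model to the design points measured so far, then pick the next candidate where predicted quality plus model uncertainty is highest. The parameter space, objective, and acquisition rule are invented for illustration; this is not Spatial's actual tuner.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def tune(candidates, measure, budget=20):
    # candidates: 2-D array of design-parameter vectors (e.g. unroll, banking).
    # measure(x): assumed helper that synthesizes/runs design x, returns quality.
    rng = np.random.default_rng(0)
    tried = [int(rng.integers(len(candidates)))]
    scores = [measure(candidates[tried[0]])]
    gp = GaussianProcessRegressor()
    for _ in range(budget - 1):
        gp.fit(candidates[tried], scores)
        mu, sigma = gp.predict(candidates, return_std=True)
        ucb = mu + sigma                  # favor good or uncertain designs
        ucb[tried] = -np.inf              # never re-measure a design
        nxt = int(np.argmax(ucb))
        tried.append(nxt)
        scores.append(measure(candidates[nxt]))
    return candidates[tried[int(np.argmax(scores))]]
```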

154 citations

Patent
29 Dec 2016
TL;DR: In this article, a driver intercepts the first write operation generated by the VM to store data in a first sector, determines the identity of the first sector based on the intercepted write operation, and modifies the corresponding entry in the change block bitmap file to indicate that the data in the first sector has changed.
Abstract: According to certain aspects, a system includes a client device that includes a virtual machine (VM) executed by a hypervisor, a driver located within the hypervisor, and a data agent. The VM may include a virtual hard disk file and a change block bitmap file. The driver may intercept a first write operation generated by the VM to store data in a first sector, determine an identity of the first sector based on the intercepted write operation, determine an entry in the change block bitmap file that corresponds with the first sector, and modify the entry in the change block bitmap file to indicate that data in the first sector has changed. The data agent may generate an incremental backup of the VM based on the change block bitmap file in response to an instruction from a storage manager, where the incremental backup includes the data in the first sector.
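A minimal model of the mechanism the claims describe: the driver sets one bit per written sector in a bitmap, and the incremental backup copies only sectors whose bit is set, then clears the bitmap. Class and method names are illustrative.

```python
class ChangeBlockTracker:
    def __init__(self, num_sectors):
        self.num_sectors = num_sectors
        self.bitmap = bytearray((num_sectors + 7) // 8)

    def on_write(self, sector):            # driver intercepts a VM write
        self.bitmap[sector // 8] |= 1 << (sector % 8)

    def changed(self, sector):
        return bool(self.bitmap[sector // 8] & (1 << (sector % 8)))

    def incremental_backup(self, read_sector):
        # Copy only changed sectors, then reset the bitmap for the next cycle.
        snap = {s: read_sector(s)
                for s in range(self.num_sectors) if self.changed(s)}
        self.bitmap = bytearray(len(self.bitmap))
        return snap
```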

115 citations

Patent
22 Sep 2014
TL;DR: In this article, an enhanced media agent may pre-stage certain backed up data blocks which may be needed to launch the virtual machine, based on predictive analysis pertaining to the VM's operational profile.
Abstract: Systems and methods enable a virtual machine, including any applications executing thereon, to quickly start executing and servicing users based on pre-staged data blocks supplied from a backup copy in secondary storage. An enhanced media agent may pre-stage certain backed up data blocks which may be needed to launch the virtual machine, based on predictive analysis pertaining to the virtual machine's operational profile. The enhanced media agent may also pre-stage backed up data blocks for a virtual-machine-file-relocation operation, based on the operation's relocation scheme. Servicing read requests to the virtual machine may take priority over ongoing pre-staging of backed up data. Read requests may be tracked so that the media agent may properly maintain the contents of an associated read cache. Some embodiments of the illustrative storage management system may lack, or may simply not require, the relocation operation, and may operate in a “live mount” configuration.
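A toy model of the pre-staging behavior described above: guest reads are served first and fill a read cache on a miss, while predicted-hot blocks are fetched from the backup copy only during idle steps. The prediction input and one-block-per-tick policy are assumptions for illustration.

```python
class MediaAgent:
    def __init__(self, backup, predicted_hot):
        self.backup = backup                        # block id -> bytes
        self.cache = {}                             # read cache
        self.queue = list(predicted_hot)            # from the operational profile

    def read(self, block):                          # guest reads take priority
        if block not in self.cache:
            self.cache[block] = self.backup[block]  # miss: fetch from backup
        return self.cache[block]

    def idle_tick(self):                            # background pre-staging
        while self.queue:
            block = self.queue.pop(0)
            if block not in self.cache:
                self.cache[block] = self.backup[block]
                break                               # one block per idle tick
```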

99 citations