Author

Mohammad Hammoud

Other affiliations: University of Pittsburgh
Bio: Mohammad Hammoud is an academic researcher from Carnegie Mellon University. The author has contributed to research in topics: Cache & Cache pollution. The author has an h-index of 11 and has co-authored 33 publications receiving 586 citations. Previous affiliations of Mohammad Hammoud include the University of Pittsburgh.

Papers
Proceedings ArticleDOI
29 Nov 2011
TL;DR: LARTS attempts to collocate reduce tasks with the maximum required data, computed after recognizing input data network locations and sizes, and adopts a cooperative paradigm that seeks good data locality while circumventing scheduling delay, scheduling skew, poor system utilization, and a low degree of parallelism.
Abstract: MapReduce offers a promising programming model for big data processing. Inspired by functional languages, MapReduce allows programmers to write functional-style code that gets automatically divided into multiple map and/or reduce tasks and scheduled over distributed data across multiple machines. Hadoop, an open-source implementation of MapReduce, schedules map tasks in the vicinity of their inputs in order to diminish network traffic and improve performance. However, Hadoop schedules reduce tasks at requesting nodes without considering data locality, leading to performance degradation. This paper describes the Locality-Aware Reduce Task Scheduler (LARTS), a practical strategy for improving MapReduce performance. LARTS attempts to collocate reduce tasks with the maximum required data, computed after recognizing input data network locations and sizes. LARTS adopts a cooperative paradigm, seeking good data locality while circumventing scheduling delay, scheduling skew, poor system utilization, and a low degree of parallelism. We implemented LARTS in Hadoop-0.20.2. Evaluation results show that LARTS outperforms the native Hadoop reduce task scheduler by an average of 7%, and by up to 11.6%.
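
The core placement idea can be illustrated with a short sketch. The following Python fragment is a hypothetical, simplified rendering of locality-aware reduce placement as the abstract describes it, not Hadoop's or LARTS's actual scheduler API; the function names, data shapes, and fallback policy are all illustrative assumptions.

# Hypothetical sketch: place each reduce task on the node that already hosts
# the largest share of its input (map output), instead of on whichever node
# requests work first.
def choose_reduce_node(partition_bytes_by_node, free_slots):
    # partition_bytes_by_node: {node: bytes of this task's input stored there}
    # free_slots: set of nodes with an available reduce slot
    candidates = [n for n in partition_bytes_by_node if n in free_slots]
    if candidates:
        # Prefer the feasible node with the most local input data.
        return max(candidates, key=partition_bytes_by_node.get)
    # No locality achievable right now: fall back to any free node rather than
    # waiting indefinitely (mirroring the cooperative stance toward scheduling
    # delay and skew mentioned above).
    return next(iter(free_slots)) if free_slots else None

# Example: most of the task's map output lives on node-3.
sizes = {"node-1": 120_000_000, "node-3": 480_000_000, "node-7": 200_000_000}
print(choose_reduce_node(sizes, {"node-1", "node-3"}))  # -> node-3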

155 citations

Journal ArticleDOI
01 Feb 2015
TL;DR: DREAM is presented, a distributed and adaptive RDF system that combines the advantages of the state-of-the-art centralized and distributed RDF systems, whereby data communication is avoided and cluster resources are aggregated.
Abstract: The Resource Description Framework (RDF) and the SPARQL query language are gaining wide popularity and acceptance. In this paper, we present DREAM, a distributed and adaptive RDF system. As opposed to existing RDF systems, DREAM avoids partitioning RDF datasets and partitions only SPARQL queries. By not partitioning datasets, DREAM offers a general paradigm for different types of pattern matching queries and entirely averts intermediate data shuffling (only auxiliary data are shuffled). Moreover, by partitioning queries, DREAM provides an adaptive scheme, which automatically runs queries on various numbers of machines depending on their complexities. Hence, in essence, DREAM combines the advantages of state-of-the-art centralized and distributed RDF systems, whereby data communication is avoided and cluster resources are aggregated. Likewise, it avoids their disadvantages, namely limited single-machine resources and hindering communication overhead. DREAM achieves all its goals by employing a novel graph-based, rule-oriented query planner and a new cost model. We implemented DREAM and conducted comprehensive experiments on a private cluster and on the Amazon EC2 platform. Results show that DREAM can significantly outperform three related popular RDF systems.
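
To make the query-partitioning idea concrete, here is a deliberately crude Python sketch. It splits a SPARQL-like list of triple patterns across machines and sizes the machine count by pattern count; DREAM's actual planner is graph-based and rule-oriented with a real cost model, so everything below (the round-robin split, the cost heuristic, the names) is an illustrative assumption.

# Each machine stores the full dataset and evaluates only its subquery;
# only join-variable bindings (auxiliary data) would be exchanged.
def plan(triple_patterns, max_machines):
    # Crude stand-in for a cost model: simpler queries use fewer machines.
    n = min(max_machines, max(1, len(triple_patterns) // 2))
    return [triple_patterns[i::n] for i in range(n)]

query = [
    ("?s", "advisor", "?p"),
    ("?p", "worksFor", "?u"),
    ("?u", "locatedIn", "?c"),
    ("?c", "partOf", "?r"),
    ("?r", "name", "Qatar"),
]
for i, subquery in enumerate(plan(query, max_machines=4)):
    print(f"machine {i}: {subquery}")  # 5 patterns -> 2 machines here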

117 citations

Proceedings ArticleDOI
24 Jun 2012
TL;DR: This paper proposes the Center-of-Gravity Reduce Scheduler (CoGRS), a locality-aware, skew-aware reduce task scheduler for saving MapReduce network traffic, and implements it in Hadoop-0.20.2.
Abstract: MapReduce is by far one of the most successful realizations of large-scale data-intensive cloud computing platforms. MapReduce automatically parallelizes computation by running multiple map and/or reduce tasks over distributed data across multiple machines. Hadoop is an open-source implementation of MapReduce. When Hadoop schedules reduce tasks, it neither exploits data locality nor addresses the partitioning skew present in some MapReduce applications. This can lead to increased cluster network traffic. In this paper we investigate the problems of data locality and partitioning skew in Hadoop. We propose the Center-of-Gravity Reduce Scheduler (CoGRS), a locality-aware, skew-aware reduce task scheduler for saving MapReduce network traffic. In an attempt to exploit data locality, CoGRS schedules each reduce task at its center-of-gravity node, which is computed with partitioning skew taken into account. We implemented CoGRS in Hadoop-0.20.2 and tested it on a private cloud as well as on Amazon EC2. Compared to native Hadoop, our results show that CoGRS minimizes off-rack network traffic by averages of 9.6% and 38.6% on our private cloud and on an Amazon EC2 cluster, respectively. This translates into job execution time improvements of up to 23.8%.
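
A center-of-gravity placement can be sketched as a weighted network-cost minimization. The Python below is an illustrative approximation, not CoGRS's exact formulation: the toy rack-based distance function, the node naming, and the cost definition are all assumptions.

# Weight each candidate node by the cost of pulling every partition of the
# reduce task to it; larger partitions (partitioning skew) count
# proportionally more.
def network_distance(a, b):
    # Toy topology: 0 = same node, 1 = same rack, 2 = off-rack.
    if a == b:
        return 0
    return 1 if a.split("/")[0] == b.split("/")[0] else 2

def center_of_gravity(partition_bytes_by_node, candidate_nodes):
    def pull_cost(node):
        return sum(size * network_distance(node, src)
                   for src, size in partition_bytes_by_node.items())
    return min(candidate_nodes, key=pull_cost)

parts = {"rack1/n1": 500, "rack1/n2": 300, "rack2/n5": 100}  # MB per source node
print(center_of_gravity(parts, list(parts)))  # -> rack1/n1 (cheapest to feed)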

89 citations

Journal ArticleDOI
TL;DR: It is shown how the cloud paradigm can be leveraged to teach a cybersecurity course via a virtual classroom equipped with live audio and video, and guidelines are proposed that can be applied to teach similar computer science and engineering courses.
Abstract: Cloud computing platforms can be highly attractive for conducting course assignments and empowering students with valuable and indispensable hands-on experience. In particular, the cloud can offer teaching staff and students (whether local or remote) on-demand, elastic, dedicated, isolated, (virtually) unlimited, and easily configurable virtual machines. As such, employing cloud-based laboratories can have clear advantages over using classical ones, which impose major hindrances to fulfilling pedagogical objectives and do not scale well as the number of students and distant university campuses grows. We show how the cloud paradigm can be leveraged to teach a cybersecurity course. Specifically, we share our experience using cloud computing to teach a senior course on cybersecurity across two campuses via a virtual classroom equipped with live audio and video. Furthermore, based on this teaching experience, we propose guidelines that can be applied to teach similar computer science and engineering courses. We demonstrate how cloud-based laboratory exercises can greatly help students acquire crucial cybersecurity skills as well as cloud computing skills, which are in high demand nowadays. The cloud we used for this course was the Amazon Web Services (AWS) public cloud. However, our presented use cases and approaches are equally applicable to other available cloud platforms such as Rackspace and Google Compute Engine, among others.

44 citations

Proceedings ArticleDOI
25 Apr 2007
TL;DR: Detailed design aspects of CA-RAM are presented, to be integrated in future general-purpose and application-specific processors and systems, and achieves comparable search performance while occupying much smaller area and consuming significantly less power.
Abstract: This paper proposes a specialized memory structure called CA-RAM (content addressable random access memory) to accelerate the search operations present in many important real-world applications. Search operations can occupy a significant portion of total execution time and energy consumption, while posing a performance problem that is difficult to tackle using traditional memory hierarchy concepts. In essence, CA-RAM is a direct hardware implementation of the well-known hashing technique. Searchable records are stored in CA-RAM at a location determined by a hash function defined on their search key. After a database has been built, looking up a record in CA-RAM typically involves a single memory access followed by a parallel key matching operation. Compared with a conventional CAM (content addressable memory) solution, CA-RAM capitalizes on dense SRAM and DRAM designs, and achieves comparable search performance while occupying a much smaller area and consuming significantly less power. This paper presents detailed design aspects of CA-RAM, to be integrated in future general-purpose and application-specific processors and systems. To further motivate and justify our approach, we present two real examples of using CA-RAM to build a high-performance search accelerator, targeting IP address lookup in core routers and trigram lookup in a large speech recognition system.
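
The lookup path (one hashed row access, then a parallel key match across that row) has a straightforward software analogue. The Python sketch below is only an illustrative model of that behavior under assumed parameters (row count, hash function, record layout); it is not the hardware design.

# Software analogue of a CA-RAM lookup: a hash selects one memory row
# (bucket); every key stored in that row is then compared against the search
# key, which the hardware would do with parallel comparators in one step.
NUM_ROWS = 1024  # assumed row count

def hash_row(key: int) -> int:
    return key % NUM_ROWS  # stand-in for the hardware hash function

class CARam:
    def __init__(self):
        self.rows = [[] for _ in range(NUM_ROWS)]  # row = list of (key, value)

    def insert(self, key, value):
        self.rows[hash_row(key)].append((key, value))

    def lookup(self, key):
        # One "memory access" fetches the whole row; the scan below models the
        # parallel key-matching stage.
        for k, v in self.rows[hash_row(key)]:
            if k == key:
                return v
        return None

table = CARam()
table.insert(0x0A000001, "port-7")  # e.g., an IP-lookup-style record
print(table.lookup(0x0A000001))     # -> port-7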

36 citations


Cited by

Journal ArticleDOI
TL;DR: This review introduces future innovations and a research agenda for cloud computing, supporting the transformation of the volume, velocity, variety, and veracity of Big Data into value for local-to-global digital earth science and applications.
Abstract: Big Data has emerged in the past few years as a new paradigm providing abundant data and opportunities to improve and/or enable research and decision-support applications with unprecedented value for digital earth applications, including business, sciences, and engineering. At the same time, Big Data presents challenges for digital earth in storing, transporting, processing, mining, and serving the data. Cloud computing provides fundamental support to address these challenges with shared computing resources including computing, storage, networking, and analytical software; the application of these resources has fostered impressive Big Data advancements. This paper surveys the two frontiers, Big Data and cloud computing, and reviews the advantages and consequences of utilizing cloud computing to tackle Big Data in the digital earth and relevant science domains. From the aspects of a general introduction, sources, challenges, technology status, and research opportunities, the following observations are offered: (i...

545 citations

01 Jan 2016
TL;DR: The book Modern Operating Systems is available in an online collection with public access, so it can be obtained instantly, and it is compatible with any reading device.
Abstract: Thank you for downloading Modern Operating Systems. As you may know, people have searched hundreds of times for favorite readings like Modern Operating Systems, but end up with infectious downloads. Rather than enjoying a good book with a cup of coffee in the afternoon, they instead juggled with harmful bugs inside their desktop computers. Modern Operating Systems is available in our book collection, and online access to it is set as public so you can get it instantly. Our book collection spans multiple locations, allowing you to download any of our books with minimal latency. Kindly note that Modern Operating Systems is universally compatible with any reading device.

368 citations

Dissertation
21 Sep 2012
TL;DR: The thesis presents possible operations on sparse matrices and algorithms that fundamentally operate on graphs but, via the duality between a graph and its adjacency matrix, can be expressed as sequences of matrix operations.
Abstract: The thesis presents the usefulness of the duality between a graph and its adjacency matrix. The theoretical part provides the basics of graph theory and matrix algebra, focusing mainly on sparse matrices and on representation options that take into account the number of nonzero elements in a matrix. The thesis includes a presentation of possible operations on sparse matrices and of algorithms that fundamentally operate on graphs but, with the help of the duality between a graph and its adjacency matrix, can be expressed as sequences of matrix operations. The practical part presents Java implementations of some algorithms that can work with either graphs or their adjacency matrices, and tests the algorithms that work with matrices. It focuses on comparing the efficiency of an algorithm operating on a matrix stored in standard form versus one stored in a sparse-matrix format, and studies which matrix representation works better for which algorithm.
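
The graph/matrix duality at the heart of the thesis can be shown in a few lines. The sketch below (Python rather than the thesis's Java, with an assumed dictionary-based sparse format) expresses one level of breadth-first expansion as a sparse matrix-vector product over booleans.

# Adjacency matrix in a minimal sparse form: row -> {column: 1} per edge,
# so only nonzero entries are stored.
A = {0: {1: 1, 2: 1}, 1: {3: 1}, 2: {3: 1}, 3: {}}

def spmv(A, x):
    # Boolean matrix-vector product: returns the nodes reachable in one step
    # from the frontier x, i.e., one BFS level expressed as a matrix operation.
    y = {}
    for i, cols in A.items():
        if x.get(i):
            for j in cols:
                y[j] = 1
    return y

frontier = {0: 1}                   # start the traversal at node 0
print(spmv(A, frontier))            # -> {1: 1, 2: 1} (first BFS level)
print(spmv(A, spmv(A, frontier)))   # -> {3: 1} (second BFS level)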

253 citations

Journal ArticleDOI
01 Jun 2016
TL;DR: This paper introduces a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter.
Abstract: RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. Thus, the ever-increasing size of RDF data collections raises the need for scalable distributed approaches. We endorse the usage of existing infrastructures for Big Data processing, like Hadoop, for this purpose. Yet, SPARQL query performance is a major challenge, as Hadoop is not intentionally designed for RDF processing. Existing approaches often favor certain query pattern shapes while performance drops significantly for other shapes. In this paper, we introduce a novel relational partitioning schema for RDF data called ExtVP that uses a semi-join based preprocessing, akin to the concept of Join Indices in relational databases, to efficiently minimize query input size regardless of its pattern shape and diameter. Our prototype system S2RDF is built on top of Spark and uses SQL to execute SPARQL queries over ExtVP. We demonstrate its superior performance in comparison to state-of-the-art SPARQL-on-Hadoop approaches.
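
The semi-join preprocessing can be illustrated compactly. The Python sketch below is a hedged approximation of the ExtVP idea as the abstract describes it: for a pair of predicates, prune one predicate's vertical-partition table down to the rows that can participate in an object-subject join with the other. The table contents, naming, and restriction to the object-subject case are illustrative assumptions.

# Vertical partitioning: one (subject, object) table per predicate.
VP = {
    "worksFor":  [("alice", "cmu"), ("bob", "mit")],
    "locatedIn": [("cmu", "qatar"), ("eth", "zurich")],
}

def extvp_os(p1, p2):
    # Semi-join reduction: keep only rows of VP[p1] whose object also occurs
    # as a subject in VP[p2]. A query with the path p1/p2 can then read this
    # smaller precomputed table instead of all of VP[p1].
    subjects_p2 = {s for s, _ in VP[p2]}
    return [(s, o) for s, o in VP[p1] if o in subjects_p2]

# Query pattern: ?x worksFor ?y . ?y locatedIn ?z
print(extvp_os("worksFor", "locatedIn"))  # -> [('alice', 'cmu')]; bob pruned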

164 citations