
Showing papers in "Software - Practice and Experience in 2016"


Journal ArticleDOI
TL;DR: SPOON enables Java developers to write a large range of domain-specific analyses and transformations in an easy and concise manner; developers do not need to dive into parsing, hack a compiler infrastructure, or master a new formalism.
Abstract: This paper presents SPOON, a library for the analysis and transformation of Java source code. SPOON enables Java developers to write a large range of domain-specific analyses and transformations in an easy and concise manner. SPOON analyses and transformations are written in plain Java. With SPOON, developers do not need to dive into parsing, to hack a compiler infrastructure, or to master a new formalism. Copyright © 2015 John Wiley & Sons, Ltd.

232 citations
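To make the 'plain Java' claim concrete, here is a minimal sketch of the kind of analysis SPOON supports: building a model of a source tree and reporting methods with more than 50 statements. The class and method names (Launcher, CtModel, CtMethod, TypeFilter) are taken from the Spoon library's public API but may differ across versions; treat this as a sketch rather than verified code.

    import spoon.Launcher;
    import spoon.reflect.CtModel;
    import spoon.reflect.declaration.CtMethod;
    import spoon.reflect.visitor.filter.TypeFilter;

    public class LongMethodFinder {
        public static void main(String[] args) {
            // Build a model of the source tree passed on the command line.
            Launcher launcher = new Launcher();
            launcher.addInputResource(args[0]);
            launcher.buildModel();
            CtModel model = launcher.getModel();

            // Plain-Java query: report methods whose body has more than 50 statements.
            for (CtMethod<?> m : model.getElements(new TypeFilter<>(CtMethod.class))) {
                if (m.getBody() != null && m.getBody().getStatements().size() > 50) {
                    System.out.println("Long method: " + m.getSignature());
                }
            }
        }
    }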


Journal ArticleDOI
TL;DR: In this paper, the authors discuss the evolution of big data computing, differences between traditional data warehousing and big data, taxonomy of Big Data computing and underpinning technologies, integrated platform of Big data and clouds known as big data clouds, layered architecture and components of big Data cloud, and finally open-technical challenges and future directions.
Abstract: Advances in information technology and its widespread growth in several areas of business, engineering, medical, and scientific studies are resulting in an information/data explosion. Knowledge discovery and decision-making from such rapidly growing voluminous data are a challenging task in terms of data organization and processing, an emerging trend known as big data computing: a new paradigm that combines large-scale compute, new data-intensive techniques, and mathematical models to build data analytics. Big data computing demands huge storage and compute capacity for data curation and processing, which can be delivered from on-premise or cloud infrastructures. This paper discusses the evolution of big data computing, differences between traditional data warehousing and big data, a taxonomy of big data computing and its underpinning technologies, the integrated platform of big data and clouds known as big data clouds, the layered architecture and components of the big data cloud, and finally open technical challenges and future directions. Copyright © 2015 John Wiley & Sons, Ltd.

141 citations


Journal ArticleDOI
TL;DR: The Roaring compressed bitmap format as mentioned in this paper uses packed arrays for compression instead of RLE, and it has been shown to be faster than RLE-based bitmap compression.
Abstract: Bitmap indexes are commonly used in databases and search engines. By exploiting bit-level parallelism, they can significantly accelerate queries. However, they can use much memory, and thus, we might prefer compressed bitmap indexes. Following Oracle's lead, bitmaps are often compressed using run-length encoding (RLE). Building on prior work, we introduce the Roaring compressed bitmap format: it uses packed arrays for compression instead of RLE. We compare it to two high-performance RLE-based bitmap encoding techniques: the Word Aligned Hybrid compression scheme and the Compressed 'n' Composable Integer Set. On synthetic and real data, we find that Roaring bitmaps (1) often compress significantly better (e.g., 2×) and (2) are faster than the compressed alternatives (up to 900× faster for intersections). Our results challenge the view that RLE-based bitmap compression is best. Copyright © 2015 John Wiley & Sons, Ltd.

136 citations
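The core idea ('packed arrays instead of RLE') can be illustrated with a toy two-level structure: values are partitioned by their high 16 bits, and each chunk stores the low 16 bits sparsely until it passes a threshold, then switches to a bitmap. The sketch below is illustrative only; the real Roaring format uses packed 16-bit arrays rather than a TreeSet, although the 4096 array-to-bitmap threshold is the one described in the paper.

    import java.util.BitSet;
    import java.util.TreeMap;

    // Illustrative two-level layout: split each value on its high 16 bits and keep
    // the low 16 bits per chunk either as a sparse set or as a 65536-bit bitmap.
    public class ToyRoaring {
        static final int ARRAY_TO_BITMAP_THRESHOLD = 4096; // threshold from the Roaring paper

        private final TreeMap<Integer, Container> containers = new TreeMap<>();

        public void add(int x) {
            int high = x >>> 16, low = x & 0xFFFF;
            containers.computeIfAbsent(high, k -> new Container()).add(low);
        }

        public boolean contains(int x) {
            Container c = containers.get(x >>> 16);
            return c != null && c.contains(x & 0xFFFF);
        }

        static final class Container {
            // Sparse representation (the real format uses a packed sorted short[]).
            private java.util.TreeSet<Integer> sparse = new java.util.TreeSet<>();
            private BitSet dense; // used once the container grows past the threshold

            void add(int low) {
                if (dense != null) { dense.set(low); return; }
                sparse.add(low);
                if (sparse.size() > ARRAY_TO_BITMAP_THRESHOLD) {
                    dense = new BitSet(1 << 16);
                    for (int v : sparse) dense.set(v);
                    sparse = null;
                }
            }

            boolean contains(int low) {
                return dense != null ? dense.get(low) : sparse.contains(low);
            }
        }
    }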


Journal ArticleDOI
TL;DR: This paper presents the experience in the design and implementation of the real-time scheduler that has been recently included in the Linux kernel, based on the Resource Reservation paradigm, which makes it possible to enforce temporal isolation between the running tasks.
Abstract: During the last decade, there has been considerable interest in using Linux in real-time systems, especially for industrial control. The simple and elegant design of Linux guarantees reliability and very good performance, while its open-source license allows users to modify and change the source code according to their needs. However, Linux has been designed to be a general-purpose operating system. Therefore, it presents some issues, such as unpredictable latencies and limited support for real-time scheduling. In this paper, we present our experience in the design and implementation of the real-time scheduler that has been recently included in the Linux kernel. The scheduler is based on the Resource Reservation paradigm, which makes it possible to enforce temporal isolation between the running tasks. We describe the genesis of the project, the challenges we have encountered, the implementation details, and the API offered to programmers. Then, we show experimental results measured on real hardware. Copyright © 2015 John Wiley & Sons, Ltd.

80 citations
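As a rough illustration of the Resource Reservation idea mentioned above, the sketch below admits a new reservation of Q time units every P time units only if the total utilisation stays at or below 1. This is a simplified, hypothetical admission test written for illustration; it is not the actual admission control implemented in the kernel scheduler.

    import java.util.ArrayList;
    import java.util.List;

    public class ReservationAdmission {
        record Reservation(double runtime, double period) {}

        private final List<Reservation> admitted = new ArrayList<>();

        public boolean tryAdmit(double runtime, double period) {
            double u = runtime / period;
            for (Reservation r : admitted) u += r.runtime() / r.period();
            if (u > 1.0) return false;        // would overload the CPU: reject
            admitted.add(new Reservation(runtime, period));
            return true;                      // temporal isolation can be preserved
        }

        public static void main(String[] args) {
            ReservationAdmission cpu = new ReservationAdmission();
            System.out.println(cpu.tryAdmit(10, 100));   // 10% utilisation -> admitted
            System.out.println(cpu.tryAdmit(50, 100));   // 60% total -> admitted
            System.out.println(cpu.tryAdmit(45, 100));   // would exceed 100% -> rejected
        }
    }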


Journal ArticleDOI
TL;DR: The S4-BP128-D4 algorithm as mentioned in this paper uses as little as 0.7 CPU cycles per decoded 32-bit integer while still providing state-of-the-art compression.
Abstract: Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the single-instruction, multiple-data (SIMD) instructions available in common processors to boost the speed of integer compression schemes. Our S4-BP128-D4 scheme uses as little as 0.7 CPU cycles per decoded 32-bit integer while still providing state-of-the-art compression. However, if the subsequent processing of the integers is slow, the effort spent on optimizing decompression speed can be wasted. To show that it does not have to be so, we (1) vectorize and optimize the intersection of posting lists and (2) introduce the SIMD GALLOPING algorithm. We exploit the fact that one SIMD instruction can compare four pairs of 32-bit integers at once. We experiment with two Text REtrieval Conference (TREC) text collections, GOV2 and ClueWeb09 (category B), using logs from the TREC million-query track. We show that using only the SIMD instructions ubiquitous in all modern CPUs, our techniques for conjunctive queries can double the speed of a state-of-the-art approach. Copyright © 2015 John Wiley & Sons, Ltd.

69 citations
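For readers unfamiliar with galloping intersection, the following scalar (non-SIMD) sketch shows the classic exponential-search approach to intersecting two sorted posting lists. It is only the baseline idea; the paper's SIMD GALLOPING compares four 32-bit pairs per instruction, which cannot be expressed with plain Java arrays.

    import java.util.Arrays;

    // Scalar galloping (exponential-search) intersection of two sorted posting lists.
    public final class GallopingIntersect {
        public static int intersect(int[] small, int[] large, int[] out) {
            int count = 0, lo = 0;
            for (int v : small) {
                // Gallop: grow the probe window until large[hi + step] >= v or we run out.
                int step = 1, hi = lo;
                while (hi + step < large.length && large[hi + step] < v) {
                    hi += step;
                    step <<= 1;
                }
                int end = Math.min(large.length, hi + step + 1);
                int pos = Arrays.binarySearch(large, hi, end, v);
                if (pos >= 0) {
                    out[count++] = v;
                    lo = pos + 1;
                } else {
                    lo = -pos - 1;   // resume after the insertion point
                }
            }
            return count; // number of common elements written to out
        }
    }

Callers pass the shorter list as small and size out to at least small.length.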


Journal ArticleDOI
TL;DR: The authors believe that no well-rounded review provides a significant comparison among parallel clustering algorithms using MapReduce, and this work aims to serve as a stepping stone for researchers who are studying big data clustering algorithms.
Abstract: Enterprises today are dealing with massive volumes of data, which have been increasing explosively. The key requirements to address this challenge are to extract, analyze, and process data in a timely manner. Clustering is an essential data mining tool that plays an important role in analyzing big data. However, large-scale data clustering has become a challenging task because of the large amount of information that emerges from technological progress in many areas, including finance and business informatics. Accordingly, researchers have dealt with parallel clustering algorithms using parallel programming models to address this issue. MapReduce is one of the most famous frameworks, and it has attracted great attention because of its flexibility, ease of programming, and fault tolerance. However, the framework has evident performance limitations, especially for iterative programs. This study first reviews the proposed iterative frameworks that extended MapReduce to support iterative algorithms. We summarize these techniques, discuss their uniqueness and limitations, and explain how they address the challenging issues of iterative programs. We also perform an in-depth review to understand the problems and the solving techniques for parallel clustering algorithms. To the best of our knowledge, no well-rounded review provides a significant comparison among parallel clustering algorithms using MapReduce. This work aims to serve as a stepping stone for researchers who are studying big data clustering algorithms. Copyright © 2015 John Wiley & Sons, Ltd.

63 citations


Journal ArticleDOI
TL;DR: This paper explains the vision of an automated tool for class diagram generation from user requirements expressed in natural language, which amalgamates the statistical and pattern recognition properties of natural language processing techniques.
Abstract: The software development life cycle is a structured process, including the definition of the user requirements specification, the system design, and programming. The design task comprises the transfer of natural language specifications into models. The class diagram of the Unified Modeling Language has been considered one of the most useful diagrams. It is a formal description of the user's requirements and serves as input to the developers. The automated extraction of a UML class diagram from natural language requirements is a highly challenging task. This paper explains our vision of an automated tool for class diagram generation from user requirements expressed in natural language. Our new approach amalgamates the statistical and pattern recognition properties of natural language processing techniques. More than 1000 patterns are defined for the extraction of the class diagram concepts. Once these concepts are captured, an XML Metadata Interchange (XMI) file is generated and imported into a Computer-Aided Software Engineering tool to build the corresponding UML class diagram. Copyright © 2015 John Wiley & Sons, Ltd.

46 citations


Journal ArticleDOI
TL;DR: This paper presents the first work in this domain that formulates the optimal scheduling problem for mobile application software execution requests with three-dimensional context parameters, and demonstrates that QCASH outperforms state-of-the-art approaches in success rate, waiting time, and QoE.
Abstract: Application software execution requests, from mobile devices to cloud service providers, are often heterogeneous in terms of device, network, and application runtime contexts. These heterogeneous contexts include the remaining battery level of a mobile device, the network signal strength it receives, and the quality-of-service (QoS) requirement of an application software submitted from that device. Scheduling such application software execution requests from many mobile devices on competent virtual machines to enhance user quality of experience (QoE) is a multi-constrained optimization problem. However, existing solutions in the literature either address the utility maximization problem for service providers or optimize the application QoS levels, bypassing device-level and network-level contextual information. In this paper, a multi-objective nonlinear programming solution to the context-aware application software scheduling problem has been developed, namely, the QoE and context-aware scheduling (QCASH) method, which minimizes the application execution times (i.e., maximizes the QoE) and maximizes the application execution success rate. To the best of our knowledge, QCASH is the first work in this domain that formulates the optimal scheduling problem for mobile application software execution requests with three-dimensional context parameters. In QCASH, the context priority of each application is measured by applying min-max normalization and multiple linear regression models on three context parameters: battery level, network signal strength, and application QoS. Experimental results, obtained from simulation runs on the CloudSim toolkit, demonstrate that QCASH outperforms state-of-the-art approaches in success rate, waiting time, and QoE. Copyright © 2016 John Wiley & Sons, Ltd.

44 citations
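The priority computation described in the abstract (min-max normalisation followed by a linear combination) can be sketched as follows. The value ranges and weights below are placeholders chosen for illustration, not the regression coefficients fitted in the paper.

    // Illustrative context-priority computation in the spirit of QCASH.
    public final class ContextPriority {
        static double minMax(double v, double min, double max) {
            return (v - min) / (max - min); // maps v into [0, 1]
        }

        public static double priority(double batteryPct, double signalDbm, double qosLevel) {
            double b = minMax(batteryPct, 0, 100);    // remaining battery, 0..100 %
            double s = minMax(signalDbm, -110, -50);  // assumed usable signal range in dBm
            double q = minMax(qosLevel, 1, 5);        // assumed 1..5 QoS classes
            // Hypothetical weights; a real deployment would fit these with
            // multiple linear regression over observed QoE outcomes.
            return 0.4 * (1 - b) + 0.3 * (1 - s) + 0.3 * q;
        }

        public static void main(String[] args) {
            // A low-battery, weak-signal, high-QoS request gets a high priority.
            System.out.println(priority(15, -100, 5));
        }
    }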


Journal ArticleDOI
TL;DR: cplint on SWISH, a web interface to cplint, allows users to experiment with Probabilistic Logic Programming without the need to install a system, a procedure that is often complex, error prone, and limited mainly to the Linux platform.
Abstract: We present the web application 'cplint on SWISH', which allows the user to write Probabilistic Logic Programs and submit the computation of the probability of queries with a web browser. The application is based on SWISH (SWI-Prolog for SHaring), a web framework for Logic Programming. SWISH is based on various features and packages of SWI-Prolog, in particular its web server and its Pengine library, which make it possible to create remote Prolog engines and to pose queries to them. In order to develop the web application, we started from the PITA system, which is included in cplint, a suite of programs for reasoning over Logic Programs with Annotated Disjunctions, by porting PITA to SWI-Prolog. Moreover, we modified the PITA library so that it can be executed in a multi-threading environment. Developing 'cplint on SWISH' also required modification of the JavaScript SWISH code that creates and queries Pengines. 'cplint on SWISH' includes a number of examples that cover a wide range of domains and provide interesting applications of Probabilistic Logic Programming. By providing a web interface to cplint, we allow users to experiment with Probabilistic Logic Programming without the need to install a system, a procedure that is often complex, error prone, and limited mainly to the Linux platform. In this way, we aim to reach out to a wider audience and popularize Probabilistic Logic Programming. Copyright © 2015 John Wiley & Sons, Ltd.

39 citations


Journal ArticleDOI
TL;DR: In this paper, a new Roaring hybrid is proposed that combines uncompressed bitmaps, packed arrays, and RLE-compressed segments, and can be several times faster (up to two orders of magnitude) than traditional RLE-based alternatives.
Abstract: Compressed bitmap indexes are used in databases and search engines. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). However, on unsorted data, we can get superior performance with a hybrid compression technique that uses both uncompressed bitmaps and packed arrays inside a two-level tree. An instance of this technique, Roaring, has recently been proposed. Due to its good performance, it has been adopted by several production platforms (e.g., Apache Lucene, Apache Spark, Apache Kylin, and Druid). Yet there are cases where run-length-encoded bitmaps are smaller than the original Roaring bitmaps, typically when the data are sorted so that the bitmaps contain long compressible runs. To better handle these cases, we build a new Roaring hybrid that combines uncompressed bitmaps, packed arrays, and RLE-compressed segments. The result is a new Roaring format that compresses better. Overall, our new implementation of Roaring can be several times faster (up to two orders of magnitude) than implementations of traditional RLE-based alternatives (WAH, Concise, and EWAH) while compressing better. We review the design choices and optimizations that make these good results possible. Copyright © 2016 John Wiley & Sons, Ltd.

37 citations
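A back-of-the-envelope way to see why a third, run-length-encoded container type helps is to compare the serialized size of the three container representations and pick the smallest. The byte costs below (2 bytes per value for arrays, a fixed 8 KiB for bitmaps, roughly 4 bytes per run) are approximations of the Roaring container costs and should not be read as the exact specification.

    // Toy container chooser for a single 2^16-value chunk.
    public final class ContainerChooser {
        enum Kind { ARRAY, BITMAP, RUN }

        static Kind choose(int cardinality, int numRuns) {
            int arrayBytes  = 2 * cardinality;   // one 16-bit value per element
            int bitmapBytes = 8192;              // 2^16 bits, fixed cost
            int runBytes    = 2 + 4 * numRuns;   // run count + (start, length) pairs
            if (runBytes <= arrayBytes && runBytes <= bitmapBytes) return Kind.RUN;
            return arrayBytes <= bitmapBytes ? Kind.ARRAY : Kind.BITMAP;
        }

        public static void main(String[] args) {
            System.out.println(choose(60000, 3));    // long sorted runs -> RUN
            System.out.println(choose(100, 90));     // scattered values -> ARRAY
            System.out.println(choose(40000, 9000)); // dense but fragmented -> BITMAP
        }
    }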


Journal ArticleDOI
TL;DR: Experimental results indicate that DTPM algorithm and DTC algorithm can significantly improve the overall performance and scalability of trajectory pattern mining and trajectory clustering on massive taxi trace data.
Abstract: As a well-known field of big data applications, the smart city takes advantage of massive data analysis to achieve efficient management and sustainable development in the current worldwide urbanization process. An important problem in the smart city is how to discover frequent trajectory sequence patterns and cluster trajectories. To solve this problem, this paper proposes a cloud-based taxi trajectory pattern mining and trajectory clustering framework for the smart city. Our work mainly includes (1) preprocessing raw Global Positioning System traces by calling the Baidu Geocoding API; (2) proposing a distributed trajectory pattern mining (DTPM) algorithm based on Spark; and (3) proposing a distributed trajectory clustering (DTC) algorithm based on Spark. The proposed DTPM and DTC algorithms overcome the high input/output and communication overheads by adopting in-memory computation. In addition, the proposed DTPM algorithm avoids generating redundant local trajectory patterns, which significantly improves the overall performance. The proposed DTC algorithm enhances the performance of trajectory similarity computation by transforming the trajectory similarity calculation into AND and OR operators. Experimental results indicate that the DTPM and DTC algorithms can significantly improve the overall performance and scalability of trajectory pattern mining and trajectory clustering on massive taxi trace data. Copyright © 2016 John Wiley & Sons, Ltd.
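One plausible reading of 'transforming the trajectory similarity calculation into AND and OR operators' is to encode each trajectory as a bit vector of visited grid cells or road segments and compute a Jaccard-style similarity with bitwise AND and OR, as sketched below. This is an illustration, not the exact formulation used by the DTC algorithm.

    import java.util.BitSet;

    public final class TrajectorySimilarity {
        // Jaccard similarity |A AND B| / |A OR B| over two cell-membership bit vectors.
        static double jaccard(BitSet a, BitSet b) {
            BitSet and = (BitSet) a.clone();
            and.and(b);
            BitSet or = (BitSet) a.clone();
            or.or(b);
            return or.isEmpty() ? 0.0 : (double) and.cardinality() / or.cardinality();
        }

        public static void main(String[] args) {
            BitSet t1 = new BitSet(); t1.set(3); t1.set(7); t1.set(12);
            BitSet t2 = new BitSet(); t2.set(7); t2.set(12); t2.set(20);
            System.out.println(jaccard(t1, t2)); // 2 shared cells out of 4 -> 0.5
        }
    }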

Journal ArticleDOI
TL;DR: An aspect‐oriented programming language that allows programmers to convey domain‐specific knowledge and nonfunctional requirements to a toolchain composed of source‐to‐source transformers, compiler optimizers, and mapping/synthesis tools is described.
Abstract: The development of applications for high-performance embedded systems is a long and error-prone process because, in addition to the required functionality, developers must consider various and often conflicting nonfunctional requirements such as performance and/or energy efficiency. The complexity of this process is further exacerbated by the multitude of target architectures and mapping tools. This article describes LARA, an aspect-oriented programming language that allows programmers to convey domain-specific knowledge and nonfunctional requirements to a toolchain composed of source-to-source transformers, compiler optimizers, and mapping/synthesis tools. LARA is sufficiently flexible to target different tools and host languages while also allowing the specification of compilation strategies to enable efficient generation of software code and hardware cores (using hardware description languages) for hybrid target architectures, a feature that, to the best of our knowledge, is not found in any other aspect-oriented programming language. A key feature of LARA is its ability to deal with different models of join points, actions, and attributes. In this article, we describe the LARA approach and evaluate its impact on code instrumentation and analysis and on selecting critical code sections to be migrated to hardware accelerators for two embedded applications from industry. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The necessary balance that needs to be struck between privacy and utility in this emerging area is discussed and privacy‐preserving targeted mobile advertising is proposed as a solution that tries to achieve that balance.
Abstract: With the continued proliferation of mobile devices, the collection of information associated with such devices and their users (such as location, installed applications, and cookies associated with built-in browsers) has become increasingly straightforward. By analysing such information, organisations are often able to deliver more relevant and better focused advertisements. Of course, such targeted mobile advertising gives rise to a number of concerns, with privacy-related concerns being prominent. In this paper, we discuss the necessary balance that needs to be struck between privacy and utility in this emerging area and propose privacy-preserving targeted mobile advertising as a solution that tries to achieve that balance. Our aim is to develop a solution that can be deployed by users but is also palatable to businesses that operate in this space. This paper focuses on the requirements and design of privacy-preserving targeted mobile advertising and also describes an initial prototype. We also discuss how more detailed technical aspects and a complete evaluation will underpin our future work in this area. Copyright © 2016 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: SALOON, as described in this paper, is a software product line-based platform to support migrating legacy systems or deploying new applications to a cloud environment; it relies on feature models combined with a domain model to select a well-suited cloud environment.
Abstract: Migrating legacy systems or deploying a new application to a cloud environment has recently become very popular, as the number of available cloud providers is still increasing. These cloud environments provide a wide range of resources at different levels of functionality, which must be appropriately configured by stakeholders for the application to run properly. Handling this variability during the configuration and deployment stages is known to be a complex and error-prone process, usually done in an ad hoc manner. In this paper, we propose SALOON, a software product line-based platform to face these issues. We describe the architecture of the SALOON platform, which relies on feature models combined with a domain model to select a well-suited cloud environment. SALOON supports stakeholders while configuring the selected cloud environment in a consistent way and automates the deployment of such configurations through the generation of executable configuration scripts. This paper also reports on some experiments, showing that using SALOON significantly reduces the time to configure a cloud environment compared with a manual approach and provides a reliable way to find a correct and suitable configuration. Moreover, our empirical evaluation shows that our approach is effective and scalable enough to properly deal with a significant number of cloud environments. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper traces a parallel between the theory of response‐time analysis and the abstractions used in the Linux kernel, and describes a customized trace tool that allows the measurement of the delays associated with the main abstractions of the real‐time scheduling theory.
Abstract: In the theory of real-time scheduling, tasks are described by mathematical variables, which are used in analytical models in order to prove schedulability of the system. On real-time Linux, tasks are computer programs, and Linux developers try to lower the latencies caused by the Linux kernel, trying to achieve faster response for the highest-priority task. Although both seek temporal correctness, they use different abstractions, which end up separating these efforts into two different worlds, making it hard for Linux practitioners to understand and apply the formally proved models to the Linux kernel and for theoretical researchers to apply the restrictions imposed by Linux to the theoretical models. This paper traces a parallel between the theory of response-time analysis and the abstractions used in the Linux kernel. The contribution of this paper is threefold. We first identify the PREEMPT RT Linux kernel mechanisms that impact the timing of real-time tasks and map these impacts to the main abstractions used by the real-time scheduling theory. Then, we describe a customized trace tool, based on the existing trace infrastructure of the Linux kernel, that allows the measurement of the delays associated with the main abstractions of the real-time scheduling theory. Finally, we use this customized trace tool to characterize the timing lines resulting from the behavior of the PREEMPT RT Linux kernel. Copyright © 2015 John Wiley & Sons, Ltd.
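The 'theory of response-time analysis' referred to here is the classic fixed-point computation for fixed-priority preemptive scheduling, R = C + B + sum over higher-priority tasks j of ceil(R / T_j) * C_j. The sketch below iterates that recurrence to a fixed point; the task parameters in main are made up for illustration.

    public final class ResponseTimeAnalysis {
        // C = worst-case execution time, T = period, B = blocking from lower-priority tasks.
        static long responseTime(long C, long B, long[] hpC, long[] hpT, long deadline) {
            long r = C + B;
            while (true) {
                long next = C + B;
                for (int j = 0; j < hpC.length; j++) {
                    next += ceilDiv(r, hpT[j]) * hpC[j]; // interference from higher-priority task j
                }
                if (next == r) return r;        // fixed point reached: worst-case response time
                if (next > deadline) return -1; // exceeds the deadline: not schedulable
                r = next;
            }
        }

        static long ceilDiv(long a, long b) { return (a + b - 1) / b; }

        public static void main(String[] args) {
            // Task under analysis: C=3, B=1, deadline 20; two higher-priority tasks.
            System.out.println(responseTime(3, 1, new long[]{1, 2}, new long[]{5, 10}, 20));
        }
    }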

Journal ArticleDOI
TL;DR: It is shown that Netkit is particularly well suited for a quick preparation of complex network scenarios comprising a wide range of networking technologies, which can be specified using configuration languages that are similar to those used on real devices and, once set up, can be easily distributed via email or published on the Web.
Abstract: In the era of virtualization, virtual networking plays an important role. Besides production use, virtual networking can be effectively adopted in many other contexts where accurate emulation of functionalities is important, such as testing before deployment, evaluation of what-if scenarios, research, and, increasingly, didactics. In this paper, we describe our 10-year experience in designing, implementing, using, and maintaining Netkit, an environment for simple, inexpensive, and lightweight network emulation targeted at didactics. We analyze the peculiar requirements in this context and discuss how the architecture chosen for Netkit is tailored to fulfill them. We show that Netkit is particularly well suited for the quick preparation of complex network scenarios comprising a wide range of networking technologies. These scenarios can be specified using configuration languages that are similar to those used on real devices and, once set up, can be easily distributed via email or published on the Web. Netkit comes with a rich set of ready-to-use pre-configured networks, accompanied by lecture slides that enable users to immediately experiment with specific case studies. To complete the picture, we report our experience in supporting and fostering the growth of the community of users revolving around Netkit: more than 15 educational institutions worldwide take advantage of Netkit, allowing teachers and students to practice with realistic networks without the need for expensive laboratories. We also detail how we profitably use Netkit within advanced academic-level networking courses and related examinations at Roma Tre University. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A procedure and a prototype implementation for the automatic recognition of design patterns from documentation of software artefacts design and implementation, provided in a machine readable form, namely, the XML Metadata Interchange (XMI) coded representation of UML class diagrams.
Abstract: In the present work, we describe a procedure and a prototype implementation for the automatic recognition of design patterns from documentation of software artefacts' design and implementation, provided in a machine-readable form, namely, the XML Metadata Interchange (XMI) coded representation of UML class diagrams. The procedure exploits a semantic representation of the patterns to be recognized, based on an existing Web Ontology Language (OWL) ontology, known as the object design ontology layer (ODOL), defined by Massey University (New Zealand), which has been augmented with an OWL-S based representation of the patterns' dynamic behaviour. Both the UML set of diagrams related to the analysed software artefacts and the ODOL+OWL-S pattern representation are automatically scanned and translated into a first-order logic representation (namely, Prolog). A set of first-order logic rules, independent from the specific pattern to be recognized, has been defined to describe the heuristics and features that trigger the recognition, exploiting the Prolog description of the patterns to be recognized and the base of Prolog facts, which represents the UML documentation. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The steps necessary to migrate existing applications to a public cloud environment and the steps required to add multi‐tenancy to these applications are described and verified by means of two case studies.
Abstract: Cloud computing is a technology that enables elastic, on-demand resource provisioning, allowing application developers to build highly scalable systems. Multi-tenancy, the hosting of multiple customers by a single application instance, leads to improved efficiency, improved scalability, and lower costs. While these technologies make it possible to create many new applications, legacy applications can also benefit from the added flexibility and cost savings of cloud computing and multi-tenancy. In this article, we describe the steps required to migrate existing applications to a public cloud environment, and the steps required to add multi-tenancy to these applications. We present a generic approach and verify this approach by means of two case studies: a commercial medical communications software package, mainly used within hospitals for nurse call systems, and a schedule planner for managing medical appointments. Both case studies are subject to stringent security and performance constraints, which need to be taken into account during the migration. In our evaluation, we estimate the required investment costs and compare them to the long-term benefits of the migration. Copyright © 2015 John Wiley & Sons Ltd.

Journal ArticleDOI
TL;DR: In this paper, image segmentation is used to extract "regions" from a web page and computer vision techniques to extract a set of characteristic features from each region, which are then compared against regions extracted from the browser under test based on characteristic features.
Abstract: Cross-browser compatibility testing is concerned with identifying perceptible differences in the way a Web page is rendered across different browsers or configurations thereof. Existing automated cross-browser compatibility testing methods are generally based on document object model (DOM) analysis, or in some cases, a combination of DOM analysis with screenshot capture and image processing. DOM analysis, however, may miss incompatibilities that arise not during DOM construction but rather during rendering. Conversely, DOM analysis produces false alarms because different DOMs may lead to identical or sufficiently similar renderings. This paper presents a novel method for cross-browser testing based purely on image processing. The method relies on image segmentation to extract 'regions' from a Web page and computer vision techniques to extract a set of characteristic features from each region. Regions extracted from a screenshot taken on a baseline browser are compared against regions extracted from the browser under test based on characteristic features. A machine learning classifier is used to determine whether differences between two matched regions should be classified as an incompatibility. An evaluation involving 140 pages shows that the proposed method achieves an F-score exceeding 90%, outperforming a state-of-the-art cross-browser testing tool based on DOM analysis. Copyright © 2015 John Wiley & Sons, Ltd.
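As an example of the kind of 'characteristic feature' a purely image-based comparison can use, the sketch below computes a coarse colour histogram for a rendered region and compares two regions by histogram intersection. The actual features and the learned classifier in the paper are different; this only illustrates the style of comparison.

    import java.awt.image.BufferedImage;

    public final class RegionHistogram {
        // Coarse 4x4x4 RGB histogram, normalised by the region's pixel count.
        static double[] histogram(BufferedImage region) {
            double[] h = new double[64];
            for (int y = 0; y < region.getHeight(); y++) {
                for (int x = 0; x < region.getWidth(); x++) {
                    int rgb = region.getRGB(x, y);
                    int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                    h[(r / 64) * 16 + (g / 64) * 4 + (b / 64)]++;
                }
            }
            double total = (double) region.getWidth() * region.getHeight();
            for (int i = 0; i < h.length; i++) h[i] /= total;
            return h;
        }

        // Histogram intersection: 1.0 means identical colour distributions.
        static double similarity(BufferedImage a, BufferedImage b) {
            double[] ha = histogram(a), hb = histogram(b);
            double sim = 0;
            for (int i = 0; i < ha.length; i++) sim += Math.min(ha[i], hb[i]);
            return sim;
        }
    }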

Journal ArticleDOI
TL;DR: The AVOCLOUDY simulator is proposed, the internal architecture of the simulator is presented, implementation details are provided, several notable applications are summarized, and experimental results that measure the simulator performance and its accuracy are provided.
Abstract: The increasing demand for computational and storage resources is shifting users toward the adoption of cloud technologies. Cloud computing is based on the vision of computing as a utility, where users no longer need to buy machines but simply access remote resources made available on demand by cloud providers. The relationship between users and providers is defined by a service-level agreement, where the non-fulfillment of its terms is regulated by the associated penalty fees. Therefore, it is important that providers adopt proper monitoring and managing strategies. Despite their limited adoption, intelligent agents constitute a feasible technology for adding autonomic features to cloud operations. Furthermore, the volunteer computing paradigm, one of the Information and Communications Technology (ICT) trends of the last decade, can be pulled alongside traditional cloud approaches, with the purpose of 'greening' them. Indeed, the combination of data center and volunteer resources, managed by agents, allows one to obtain a more robust and scalable cloud computing platform. The increased challenges in designing such a complex system can benefit from a simulation-based approach, to test autonomic management solutions before their deployment in the production environment. However, currently available simulators of cloud platforms are not suitable for modeling and analyzing such heterogeneous, large-scale, and highly dynamic systems. We propose the AVOCLOUDY simulator to fill this gap. This paper presents the internal architecture of the simulator, provides implementation details, summarizes several notable applications, and provides experimental results that measure the simulator's performance and accuracy. The latter experiments are based on real-world, worldwide-distributed computations on top of the PlanetLab platform. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce new algorithms that are sometimes three orders of magnitude faster than a naive approach and show that bitmap indexes are more broadly applicable than is commonly believed.
Abstract: Compressed bitmap indexes are used to speed up simple aggregate queries in databases. Indeed, set operations like intersections, unions, and complements can be represented as logical operations (AND, OR, and NOT) that are ideally suited to bitmaps. However, it is less obvious how to apply bitmaps to more advanced queries. For example, we might seek products in a store that meet some, but maybe not all, criteria. Such threshold queries generalize intersections and unions; they are often used in information-retrieval and data-mining applications. We introduce new algorithms that are sometimes three orders of magnitude faster than a naive approach. Our work shows that bitmap indexes are more broadly applicable than is commonly believed. Copyright © 2014 John Wiley & Sons, Ltd.
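A threshold query asks for the elements set in at least T of N bitmaps; T = N gives an intersection and T = 1 a union. The naive per-bit counting baseline that the paper's algorithms improve upon by up to three orders of magnitude looks roughly like this:

    import java.util.BitSet;

    public final class ThresholdQuery {
        // Returns the positions set in at least `threshold` of the input bitmaps.
        static BitSet atLeast(BitSet[] bitmaps, int threshold, int universe) {
            int[] counts = new int[universe];
            for (BitSet b : bitmaps) {
                for (int i = b.nextSetBit(0); i >= 0; i = b.nextSetBit(i + 1)) {
                    counts[i]++;
                }
            }
            BitSet result = new BitSet(universe);
            for (int i = 0; i < universe; i++) {
                if (counts[i] >= threshold) result.set(i);
            }
            return result;
        }
    }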

Journal ArticleDOI
TL;DR: This work proposes a novel approach for generating random benchmarks for evaluating program analysis and testing tools and compilers that uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs.
Abstract: Benchmarks are heavily used in different areas of computer science to evaluate algorithms and tools. In program analysis and testing, open-source and commercial programs are routinely used as benchmarks to evaluate different aspects of algorithms and tools. Unfortunately, many of these programs are written by programmers who introduce different biases, not to mention that it is very difficult to find programs that can serve as benchmarks with high reproducibility of results. We propose a novel approach for generating random benchmarks for evaluating program analysis and testing tools and compilers. Our approach uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs. We implemented our tool for Java and applied it to generate a set of large benchmark programs of up to 5 million lines of code each, with which we evaluated different program analysis and testing tools and compilers. The generated benchmarks let us independently rediscover several issues in the evaluated tools. Copyright © 2014 John Wiley & Sons, Ltd.
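The stochastic-parse-tree idea can be illustrated with a toy sampler: each production rule carries a probability, and generation expands nonterminals by sampling rules according to those probabilities. The tiny grammar below is made up for illustration and is not the Java grammar used by the tool.

    import java.util.List;
    import java.util.Map;
    import java.util.Random;

    public final class StochasticGen {
        record Rule(double prob, String[] rhs) {}

        // Toy grammar: a statement is an assignment, an if, or a while, with given probabilities.
        static final Map<String, List<Rule>> GRAMMAR = Map.of(
            "stmt", List.of(new Rule(0.5, new String[]{"assign"}),
                            new Rule(0.3, new String[]{"if (cond) {", "stmt", "}"}),
                            new Rule(0.2, new String[]{"while (cond) {", "stmt", "}"})),
            "assign", List.of(new Rule(1.0, new String[]{"x = x + 1;"}))
        );

        static final Random RNG = new Random(42);

        static void expand(String symbol, StringBuilder out) {
            List<Rule> rules = GRAMMAR.get(symbol);
            if (rules == null) {               // terminal: emit as-is
                out.append(symbol).append('\n');
                return;
            }
            double r = RNG.nextDouble(), acc = 0;  // pick a rule by its probability
            for (Rule rule : rules) {
                acc += rule.prob();
                if (r <= acc) {
                    for (String s : rule.rhs()) expand(s, out);
                    return;
                }
            }
        }

        public static void main(String[] args) {
            StringBuilder out = new StringBuilder();
            expand("stmt", out);
            System.out.println(out);
        }
    }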

Journal ArticleDOI
TL;DR: This work has performed an empirical study on real JavaScript applications to understand the dynamic behavior of JavaScript objects and investigated the behavioral patterns of observed objects to understand coding and user interaction practices in JavaScript software.
Abstract: Despite the popularity of JavaScript for client-side web applications, there is a lack of effective software tools supporting JavaScript development and testing. The dynamic characteristics of JavaScript pose software engineering challenges such as program understanding and security. One important feature of JavaScript is that its objects support flexible mechanisms such as property changes at runtime and prototype-based inheritance, making it difficult to reason about object behavior. We have performed an empirical study on real JavaScript applications to understand the dynamic behavior of JavaScript objects. We present metrics to measure the behavior of JavaScript objects during execution (e.g., operations associated with an object, object size, and property type changes). We also investigated the behavioral patterns of observed objects to understand the coding or user interaction practices in JavaScript software. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: The results have shown that TRINI can achieve significant performance improvements, as well as a consistent behaviour, when it is applied to a set of commonly used load balancing algorithms, demonstrating its generality.
Abstract: Nowadays, clustered environments are commonly used in high-performance computing and enterprise-level applications to achieve faster response time and higher throughput than single machine environments. Nevertheless, how to effectively manage the workloads in these clusters has become a new challenge. As a load balancer is typically used to distribute the workload among the cluster's nodes, multiple research efforts have concentrated on enhancing the capabilities of load balancers. Our previous work presented a novel adaptive load balancing strategy TRINI that improves the performance of a clustered Java system by avoiding the performance impacts of major garbage collection, which is an important cause of performance degradation in Java. The aim of this paper is to strengthen the validation of TRINI by extending its experimental evaluation in terms of generality, scalability and reliability. Our results have shown that TRINI can achieve significant performance improvements, as well as a consistent behaviour, when it is applied to a set of commonly used load balancing algorithms, demonstrating its generality. TRINI also proved to be scalable across different cluster sizes, as its performance improvements did not noticeably degrade when increasing the cluster size. Finally, TRINI exhibited reliable behaviour over extended time periods, introducing only a small overhead to the cluster in such conditions. These results offer practitioners a valuable reference regarding the benefits that a load balancing strategy, based on garbage collection, can bring to a clustered Java system. Copyright © 2016 John Wiley & Sons, Ltd.
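A minimal sketch of the kind of signal a garbage-collection-aware balancing strategy such as TRINI can act on: a node reports itself as likely to undergo a major collection when heap occupancy crosses a threshold, and the load balancer temporarily routes requests elsewhere. The 0.85 threshold and the use of overall heap occupancy (rather than a forecasting model over old-generation data) are simplifications for illustration, not the paper's method.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public final class GcAwareNodeProbe {
        private static final double MAJOR_GC_RISK_THRESHOLD = 0.85; // illustrative value

        // True when heap occupancy suggests a major collection may happen soon.
        public static boolean likelyToCollectSoon() {
            MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
            return heap.getMax() > 0
                && (double) heap.getUsed() / heap.getMax() >= MAJOR_GC_RISK_THRESHOLD;
        }

        public static void main(String[] args) {
            // A load balancer polls this probe and removes the node from rotation when true.
            System.out.println("Skip this node? " + likelyToCollectSoon());
        }
    }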

Journal ArticleDOI
TL;DR: C‐strider is presented, a framework for writing C heap traversals and transformations that tracks types as it walks the heap, so every callback is supplied with the exact type of the associated location.
Abstract: Researchers have proposed many tools and techniques that work by traversing the heap, including checkpointing systems, heap profilers, heap assertion checkers, and dynamic software updating systems. Yet building a heap traversal for C remains difficult, and to our knowledge, extant services have used their own application-specific traversals. This paper presents C-strider, a framework for writing C heap traversals and transformations. Writing a basic C-strider service requires implementing only four callbacks; C-strider then generates a program-specific traversal that invokes the callbacks as each heap location is visited. Critically, C-strider is type aware: it tracks types as it walks the heap, so every callback is supplied with the exact type of the associated location. We used C-strider to implement heap serialization, dynamic software updating, heap checking, and profiling, and then applied the resulting traversals to several programs. We found that C-strider requires little programmer effort, and the resulting services are efficient and effective. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This paper designs a novel malware, which makes use of the weaknesses in anti-virus software and embeds itself to become a part of the vulnerable anti-virus solution, and shows how the proposed defence can be applied to the current versions of vulnerable anti-virus solutions without requiring significant modifications.
Abstract: Major anti-virus solutions have introduced a feature known as 'self-protection' so that malware, and even users, cannot modify or disable the core functionality of their products. In this paper, we have investigated 12 anti-virus products from four vendors (AVG, Avira, McAfee, and Symantec) and have discovered that they have certain security weaknesses that can be exploited by malware. We have then designed a novel malware, which makes use of the weaknesses in anti-virus software and embeds itself to become a part of the vulnerable anti-virus solution. It subverts the self-protection features of several anti-virus software solutions. This malware-integrated anti-virus enjoys several advantages such as longevity (the anti-virus is active while the system is running), improved stealthy behaviour, highest privilege, and the capability to bypass security measures. We then propose an effective defence against such malware. We have also implemented the defensive measure and evaluated its effectiveness. Finally, we show how the proposed defence can be applied to the current versions of vulnerable anti-virus solutions without requiring significant modifications. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This work proposes design principles for implementing hybrid typing languages that continue gathering type information about dynamically typed references at compile time, recovering opportunities for improving compile-time error detection and runtime performance that existing hybrid languages miss.
Abstract: Dynamic languages are suitable for developing specific applications where runtime adaptability is an important issue. On the contrary, statically typed languages commonly provide better compile-time type error detection and more opportunities for compiler optimizations. Because both approaches offer different benefits, there exist programming languages that support hybrid dynamic and static typing. However, the existing hybrid typing languages commonly do not gather type information of dynamic references at compile time, missing opportunities for improving compile-time error detection and runtime performance. Therefore, we propose some design principles to implement hybrid typing languages that continue gathering type information of dynamically typed references. This type information is used to perform compile-time type checking of the dynamically typed code and improve its runtime performance. As an example, we have implemented a hybrid typing language following the proposed design principles. We have evaluated the runtime performance and memory consumption of the generated code. The average performance of the dynamic and hybrid typing code is at least 2.53× and 4.51× better than the related approaches for the same platform, consuming less memory resources. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: A hybrid file system with high flexibility and performance, called the Trident file system (TridentFS), is proposed to manage three types of storage with different performance characteristics, that is, Non-Volatile RAM (NVRAM), flash memory, and magnetic disk.
Abstract: A hybrid file system with high flexibility and performance, called the Trident file system (TridentFS), is proposed to manage three types of storage with different performance characteristics, that is, Non-Volatile RAM (NVRAM), flash memory, and magnetic disk. Unlike previous NVRAM-based hybrid file systems, novel techniques are used in TridentFS to improve flexibility and performance. TridentFS is flexible through its support for various forms of flash memory and a wide range of NVRAM sizes. The former is achieved on the basis of the concept of stackable file systems, and the latter is achieved by allowing data eviction from the NVRAM. TridentFS achieves high performance by keeping hot data in the NVRAM and allowing data evicted from the NVRAM to be distributed in parallel to the flash memory and disk. A data eviction policy is proposed to determine the data to be evicted from the NVRAM. Moreover, a data distribution algorithm is proposed to effectively leverage the parallelism between flash memory and disk during data distribution. TridentFS is implemented as a loadable module on Linux 2.6.29. The performance results show that it works well for both small-sized and large-sized NVRAM, and the proposed eviction policy outperforms LRU by 27%. Moreover, by effectively leveraging the parallelism between flash memory and disk, the proposed data distribution algorithm outperforms RAID-0 and a size-based distribution method by up to 471.6% and 82.6%, respectively. By considering the data size and performance characteristics of the storage, the proposed data distribution algorithm outperforms the greedy algorithm by up to 15.5%. Copyright © 2014 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: It is shown that the HCCT achieves a similar precision as the CCT in a space that is several orders of magnitude smaller and roughly proportional to the number of hot contexts, and can be effectively combined with previous context‐sensitive profiling techniques, as for static bursting.
Abstract: Calling context trees (CCTs) associate performance metrics with paths through a program's call graph, providing valuable information for program understanding and performance analysis. In real applications, however, CCTs might easily consist of tens of millions of nodes, making them difficult to analyze and also hurting execution times because of poor access locality. For performance analysis, accurately mining only hot calling contexts may be more useful than constructing an entire CCT with millions of uninteresting paths, because the distribution of context frequencies is typically very skewed. In this article, we show how to exploit this property to considerably reduce the CCT size, introducing a novel runtime data structure, called the hot CCT (HCCT), in the spectrum of representations for interprocedural control flow. The HCCT includes only hot nodes and their ancestors in a CCT and can be constructed independently from it by using fast, space-efficient algorithms for mining frequent items in data streams. With this approach, we can distinguish between hot and cold contexts on the fly while obtaining very accurate frequency counts. We show, both theoretically and experimentally, that the HCCT achieves a similar precision as the CCT in a space that is several orders of magnitude smaller and roughly proportional to the number of hot contexts. Our approach can be effectively combined with previous context-sensitive profiling techniques, as we show for static bursting. We devise an implementation as a plug-in for the gcc compiler that incurs a slowdown competitive with the gprof call-graph profiler while collecting finer-grained profiles. Copyright © 2015 John Wiley & Sons, Ltd.
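One classic streaming frequent-items algorithm of the kind the abstract refers to is Space-Saving, which keeps at most k counters and therefore bounds memory regardless of stream length; whether this exact variant matches the paper's implementation is an assumption. A compact sketch:

    import java.util.HashMap;
    import java.util.Map;

    public final class SpaceSaving<T> {
        private final int k;
        private final Map<T, Long> counters = new HashMap<>();

        public SpaceSaving(int k) { this.k = k; }

        public void observe(T item) {
            Long c = counters.get(item);
            if (c != null) {
                counters.put(item, c + 1);
            } else if (counters.size() < k) {
                counters.put(item, 1L);
            } else {
                // Evict the current minimum and inherit its count (+1), which keeps
                // every estimate an over-approximation of the true frequency.
                T min = null; long minCount = Long.MAX_VALUE;
                for (Map.Entry<T, Long> e : counters.entrySet()) {
                    if (e.getValue() < minCount) { min = e.getKey(); minCount = e.getValue(); }
                }
                counters.remove(min);
                counters.put(item, minCount + 1);
            }
        }

        public Map<T, Long> estimates() { return counters; }
    }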

Journal ArticleDOI
TL;DR: DataMill is a community‐based services‐oriented open benchmarking infrastructure for rigorous performance evaluation and provides a platform for investigating interactions and composition of hidden factors.
Abstract: Empirical systems research is facing a dilemma. Minor aspects of an experimental setup can have a significant impact on its associated performance measurements and potentially invalidate conclusions drawn from them. Examples of such influences, often called hidden factors, include binary link order, process environment size, compiler-generated randomized symbol names, or group scheduler assignments. The growth in complexity and size of modern systems will further aggravate this dilemma, especially with the given time pressure of producing results. How can one trust any reported empirical analysis of a new idea or concept in computer science? DataMill is a community-based, services-oriented, open benchmarking infrastructure for rigorous performance evaluation. DataMill facilitates producing robust, reliable, and reproducible results. The infrastructure incorporates the latest results on hidden factors and automates the variation of these factors. DataMill is also of interest for research on performance evaluation. The infrastructure supports quantifying the effect of hidden factors, disseminating the research results beyond mere reporting. It provides a platform for investigating interactions and composition of hidden factors. This paper discusses experience earned through creating and using an open benchmarking infrastructure. Multiple research groups participate in and have used DataMill. Furthermore, DataMill has been used for a performance competition at the International Conference on Runtime Verification (RV) 2014 and is currently hosting the RV 2015 competition. This paper includes a summary of our experience hosting the first RV competition. Copyright © 2015 John Wiley & Sons, Ltd.