Data Mining Practical Machine Learning Tools and Techniques

The paradigmatic shift from a Web of manual interactions to a Web of programmatic interactions driven by Web services is creating unprecedented opportunities for the formation of online business-to-business (B2B) collaborations. In particular, the creation of value-added services by composition of existing ones is gaining a significant momentum. Since many available Web services provide overlapping or identical functionality, albeit with different quality of service (QoS), a choice needs to be made to determine which services are to participate in a given composite service. This paper presents a middleware platform which addresses the issue of selecting Web services for the purpose of their composition in a way that maximizes user satisfaction expressed as utility functions over QoS attributes, while satisfying the constraints set by the user and by the structure of the composite service. Two selection approaches are described and compared: one based on local (task-level) selection of services and the other based on global allocation of tasks to services using integer programming.

/pdf/qos-aware-middleware-for-web-services-composition-x57gs4mhme.pdf

QoS-aware middleware for Web services composition

Ontologies tend to be found everywhere. They are viewed as the silver bullet for many applications, such as database integration, peer-to-peer systems, e-commerce, semantic web services, or social networks. However, in open or evolving systems, such as the semantic web, different parties would, in general, adopt different ontologies. Thus, merely using ontologies, like using XML, does not reduce heterogeneity: it just raises heterogeneity problems to a higher level. Euzenat and Shvaikos book is devoted to ontology matching as a solution to the semantic heterogeneity problem faced by computer systems. Ontology matching aims at finding correspondences between semantically related entities of different ontologies. These correspondences may stand for equivalence as well as other relations, such as consequence, subsumption, or disjointness, between ontology entities. Many different matching solutions have been proposed so far from various viewpoints, e.g., databases, information systems, and artificial intelligence. The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content. In particular, the book includes a new chapter dedicated to the methodology for performing ontology matching. It also covers emerging topics, such as data interlinking, ontology partitioning and pruning, context-based matching, matcher tuning, alignment debugging, and user involvement in matching, to mention a few. More than 100 state-of-the-art matching systems and frameworks were reviewed. With Ontology Matching, researchers and practitioners will find a reference book that presents currently available work in a uniform framework. In particular, the work and the techniques presented in this book can be equally applied to database schema matching, catalog integration, XML schema matching and other related problems. The objectives of the book include presenting (i) the state of the art and (ii) the latest research results in ontology matching by providing a systematic and detailed account of matching techniques and matching systems from theoretical, practical and application perspectives.

Ontology Matching

In the Hamadryas baboon, males are substantially larger than females. A troop of baboons is subdivided into a number of ‘one-male groups’, consisting of one adult male and one or more females with their young. The male prevents any of ‘his’ females from moving too far from him. Kummer (1971) performed the following experiment. Two males, A and B, previously unknown to each other, were placed in a large enclosure. Male A was free to move about the enclosure, but male B was shut in a small cage, from which he could observe A but not interfere. A female, unknown to both males, was then placed in the enclosure. Within 20 minutes male A had persuaded the female to accept his ownership. Male B was then released into the open enclosure. Instead of challenging male A , B avoided any contact, accepting A’s ownership.

Evolution and the Theory of Games

This ebook is the first authorized digital version of Kernighan and Ritchie's 1988 classic, The C Programming Language (2nd Ed.). One of the best-selling programming books published in the last fifty years, "K&R" has been called everything from the "bible" to "a landmark in computer science" and it has influenced generations of programmers. Available now for all leading ebook platforms, this concise and beautifully written text is a "must-have" reference for every serious programmers digital library.

As modestly described by the authors in the Preface to the First Edition, this "is not an introductory programming manual; it assumes some familiarity with basic programming concepts like variables, assignment statements, loops, and functions. Nonetheless, a novice programmer should be able to read along and pick up the language, although access to a more knowledgeable colleague will help."

The C programming language

Survey of machine learning techniques for malware analysis

Many security and software testing applications require checking whether certain properties of a program hold for any possible usage scenario. For instance, a tool for identifying software vulnerabilities may need to rule out the existence of any backdoor to bypass a program’s authentication. One approach would be to test the program using different, possibly random inputs. As the backdoor may only be hit for very specific program workloads, automated exploration of the space of possible inputs is of the essence. Symbolic execution provides an elegant solution to the problem, by systematically exploring many possible execution paths at the same time without necessarily requiring concrete inputs. Rather than taking on fully specified input values, the technique abstractly represents them as symbols, resorting to constraint solvers to construct actual instances that would cause property violations. Symbolic execution has been incubated in dozens of tools developed over the past four decades, leading to major practical breakthroughs in a number of prominent software reliability applications. The goal of this survey is to provide an overview of the main ideas, challenges, and solutions developed in the area, distilling them for a broad audience.

A Survey of Symbolic Execution Techniques

Permissioned blockchains are arising as a solution to federate companies prompting accountable interactions. A variety of consensus algorithms for such blockchains have been proposed, each of which has different benefits and drawbacks. Proof-of-Authority (PoA) is a new family of Byzantine fault-tolerant (BFT) consensus algorithms largely used in practice to ensure better performance than traditional Practical Byzantine Fault Tolerance (PBFT). However, the lack of adequate analysis of PoA hinders any cautious evaluation of their effectiveness in real-world permissioned blockchains deployed over the Internet, hence on an eventually synchronous network experimenting Byzantine nodes. In this paper, we analyse two of the main PoA algorithms, named Aura and Clique, both in terms of provided guarantees and performances. First, we derive their functioning including how messages are exchanged, then we weight, by relying on the CAP theorem, consistency, availability and partition tolerance guarantees. We also report a qualitative latency analysis based on message rounds. The analysis advocates that PoA for per- missioned blockchains, deployed over the Internet with Byzantine nodes, do not provide adequate consistency guarantees for scenarios where data integrity is essential. We claim that PBFT can fit better such scenarios, despite a limited loss in terms of performance.

/pdf/pbft-vs-proof-of-authority-applying-the-cap-theorem-to-46q13ghfet.pdf

PBFT vs proof-of-authority: applying the CAP theorem to permissioned blockchain

Today we are witnessing a dramatic shift toward a data-driven economy, where the ability to efficiently and timely analyze huge amounts of data marks the difference between industrial success stories and catastrophic failures. In this scenario Storm, an open source distributed realtime computation system, represents a disruptive technology that is quickly gaining the favor of big players like Twitter and Groupon. A Storm application is modeled as a topology, i.e. a graph where nodes are operators and edges represent data flows among such operators. A key aspect in tuning Storm performance lies in the strategy used to deploy a topology, i.e. how Storm schedules the execution of each topology component on the available computing infrastructure.In this paper we propose two advanced generic schedulers for Storm that provide improved performance for a wide range of application topologies. The first scheduler works offline by analyzing the topology structure and adapting the deployment to it; the second scheduler enhance the previous approach by continuously monitoring system performance and rescheduling the deployment at run-time to improve overall performance. Experimental results show that these algorithms can produce schedules that achieve significantly better performances compared to those produced by Storm's default scheduler.

Adaptive online scheduling in storm

Data is nowadays an invaluable resource, indeed it guides all business decisions in most of the computer-aided human activities. Threats to data integrity are thus of paramount relevance, as tampering with data may maliciously affect crucial business decisions. This issue is especially true in cloud computing environments, where data owners cannot control fundamental data aspects, like the physical storage of data and the control of its accesses. Blockchain has recently emerged as a fascinating technology which, among others, provides compelling properties about data integrity. Using the blockchain to face data integrity threats seems to be a natural choice, but its current limitations of low throughput, high latency, and weak stability hinder the practical feasibility of any blockchain-based solutions. In this paper, by focusing on a case study from the European SUNFISH project, which concerns the design of a secure by-design cloud federation platform for the public sector, we precisely delineate the actual data integrity needs of cloud computing environments and the research questions to be tackled to adopt blockchain-based databases. First, we detail the open research questions and the difficulties inherent in addressing them. Then, we outline a preliminary design of an effective blockchain-based database for cloud computing environments.

/pdf/blockchain-based-database-to-ensure-data-integrity-in-cloud-30vi0n1e8d.pdf

Roberto Baldoni

Papers

Survey of machine learning techniques for malware analysis

A Survey of Symbolic Execution Techniques

PBFT vs proof-of-authority: applying the CAP theorem to permissioned blockchain

Adaptive online scheduling in storm

Blockchain-Based Database to Ensure Data Integrity in Cloud Computing Environments.