
Showing papers by "Mikhail J. Atallah" published in 2016


Journal ArticleDOI
TL;DR: The Similarity SQL-based Group-By operator (SGB) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values.
Abstract: The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity-aware grouping provides a more realistic view of real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently realize these approximate semantics, they primarily focus on one-dimensional attributes and treat multi-dimensional attributes independently. As a result, correlated attributes, such as those in spatial data, are processed independently, and hence groups in the multi-dimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multi-dimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and a social check-in dataset, demonstrates that the proposed algorithms can achieve up to three orders of magnitude improvement in performance over baseline methods developed to solve the same problem.
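
The two grouping criteria described in the abstract can be illustrated with a small sketch. This is not the paper's PostgreSQL implementation or its algorithms: the function names (`sgb_any`, `sgb_all_greedy`, `is_clique`), the Euclidean metric, and the quadratic pairwise scan are all assumptions chosen for brevity. The greedy clique variant resolves overlaps by placing a tuple in the first group it fits, which loosely corresponds to semantics (ii).

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def sgb_any(points, eps):
    """Distance-to-any: groups are connected components of the eps-neighbor graph."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) scan over all pairs (illustrative only).
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def is_clique(group, eps):
    """Distance-to-all check: every pair of tuples in the group is within eps."""
    return all(dist(p, q) <= eps for p, q in combinations(group, 2))


def sgb_all_greedy(points, eps):
    """Clique grouping, greedy: join the first group where all members are within eps."""
    groups = []
    for p in points:
        for g in groups:
            if all(dist(p, q) <= eps for q in g):
                g.append(p)
                break
        else:
            groups.append([p])  # no existing group fits: start a new one
    return groups
```

For the points (0,0), (1,0), (2,0), (10,10) with a threshold of 1, `sgb_any` chains the first three points into one group, whereas `sgb_all_greedy` separates (2,0) because it is farther than the threshold from (0,0).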

20 citations


Journal ArticleDOI
TL;DR: This paper addresses the security issues that arise when outsourcing business processes in the cloud as BPaaS (Business Process as a Service), and provides an efficient anonymization-based protocol to preserve the process fragment provenance and to guarantee the end-to-end availability of process-based applications.

17 citations


Proceedings Article
19 Nov 2016
TL;DR: In this article, the authors give protocols for securely outsourcing (+,min), (min,max), and (min,+) matrix multiplication, cases in which one or both of the two operations is the "min" or "max" operation and the additive and multiplicative inverses that existing techniques for the (+,∗) ring rely on may not exist.
Abstract: Many protocols exist for a client to outsource the multiplication of matrices to a remote server without revealing to the server the input matrices or the resulting product, and such that the server does all of the super-linear work whereas the client does only work proportional to the size of the input matrices. These existing techniques hinge on the existence of additive and multiplicative inverses for the familiar matrix multiplication over the (+,∗) ring, and they fail when one (or both) of these inverses do not exist, as happens for many practically important algebraic structures (including closed semi-rings) when one or both of the two operations in the matrix multiplication is the “min” or “max” operation. Such matrix multiplications are very common in optimization. We give protocols for the cases of (+,min) multiplication, (min,max) multiplication, and of (min,+) multiplication; the last two cases are particularly important primitives in many combinatorial opti-
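
For readers unfamiliar with these algebraic structures, the following sketch shows the plaintext operations that such a protocol would outsource. It does not implement any privacy-preserving protocol from the paper; the function names `min_plus` and `min_max` and the list-of-lists matrix representation are assumptions for illustration. In (min,+) multiplication, `min` plays the role of addition and `+` the role of multiplication, and since `min` has no inverse, masking tricks that rely on subtraction do not carry over.

```python
INF = float("inf")  # represents "no edge" in a weight matrix


def min_plus(A, B):
    """(min,+) product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    inner = len(B)
    return [
        [min(A[i][k] + B[k][j] for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def min_max(A, B):
    """(min,max) product: C[i][j] = min over k of max(A[i][k], B[k][j])."""
    inner = len(B)
    return [
        [min(max(A[i][k], B[k][j]) for k in range(inner)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Edge weights of a small directed graph: 0 -> 1 costs 3, 1 -> 2 costs 1.
W = [
    [0, 3, INF],
    [INF, 0, 1],
    [INF, INF, 0],
]

# Squaring W under (min,+) yields shortest paths that use at most two edges.
two_hop = min_plus(W, W)
```

Here `two_hop[0][2]` equals 4, the cost of the path 0 -> 1 -> 2, while `min_max(W, W)[0][2]` equals 3, the largest single edge weight on that path (a bottleneck value).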

9 citations


Proceedings ArticleDOI
16 May 2016
TL;DR: Two new Similarity SQL-based Group-By (SGB) operators for multi-dimensional data are introduced, and experiments demonstrate that the proposed algorithms can achieve up to three orders of magnitude improvement in performance over baseline methods developed to solve the same problem.
Abstract: The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently realize these approximate semantics, they primarily focus on one-dimensional attributes and treat multi-dimensional attributes independently. As a result, correlated attributes, such as those in spatial data, are processed independently, and hence groups in the multi-dimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multi-dimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and a social check-in dataset, demonstrates that the proposed algorithms can achieve up to three orders of magnitude improvement in performance over baseline methods developed to solve the same problem.

7 citations



Journal ArticleDOI
12 Dec 2016
TL;DR: This work utilizes a hardware-dependent function, such as a physically unclonable function (PUF) or a hardware security module (HSM), at the authentication server to inhibit offline password discovery and describes a framework for implementing ErsatzPasswords using a Trusted Platform Module (TPM).
Abstract: In this work, we present a simple, yet effective and practical scheme to improve the security of stored password hashes, increasing the difficulty of cracking passwords and exposing cracking attempts. We utilize a hardware-dependent function (HDF), such as a physically unclonable function (PUF) or a hardware security module (HSM), at the authentication server to inhibit offline password discovery. Additionally, a deception mechanism is incorporated to alert administrators of cracking attempts. Using an HDF to generate password hashes hinders attackers from recovering the true passwords without constant access to the HDF. Our scheme can integrate with legacy systems without needing additional servers, changing the structure of the hashed password file, or modifying client machines. When using our scheme, the structure of the hashed password file, e.g., /etc/shadow or /etc/master.passwd, will appear no different from that of traditional hashed password files. However, when attackers exfiltrate the hashed password file and attempt to crack it, the passwords they will receive are ErsatzPasswords, that is, “fake passwords.” The ErsatzPasswords scheme is flexible by design, enabling it to be integrated into existing authentication systems without changes to user experience. The proposed scheme is integrated into the pam_unix module as well as two client/server authentication schemes: Lightweight Directory Access Protocol (LDAP) authentication and the Pythia pseudorandom function (PRF) Service [Everspaugh et al. 2015]. The core library supporting ErsatzPasswords, written in C and Python, consists of 255 and 103 lines of code, respectively. The integration of ErsatzPasswords into each explored authentication system required fewer than 100 lines of additional code. Experimental evaluation of ErsatzPasswords shows an increase in authentication latency on the order of 100 ms, which may be acceptable for real-world systems. We also describe a framework for implementing ErsatzPasswords using a Trusted Platform Module (TPM).
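
The idea that a stored hash cannot be checked without access to the hardware-dependent function can be sketched as follows. This is a simplified illustration rather than the ErsatzPasswords construction: it omits the deception mechanism entirely, and it substitutes a keyed HMAC-SHA256 for the HDF, with `device_key` standing in for whatever secret a PUF, HSM, or TPM would hold. The function names and the iteration count are likewise assumptions.

```python
import hashlib
import hmac


def hdf(device_key: bytes, data: bytes) -> bytes:
    """Software stand-in for a hardware-dependent function (assumption: HMAC-SHA256)."""
    return hmac.new(device_key, data, hashlib.sha256).digest()


def hash_password(password: str, salt: bytes, device_key: bytes) -> bytes:
    """Pass the password through the HDF before applying an ordinary salted hash."""
    hardened = hdf(device_key, password.encode("utf-8"))
    return hashlib.pbkdf2_hmac("sha256", hardened, salt, 50_000)


def verify(password: str, salt: bytes, stored: bytes, device_key: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hash_password(password, salt, device_key)
    return hmac.compare_digest(candidate, stored)
```

In this sketch, an attacker who copies the salt and stored hash but lacks `device_key` cannot confirm even a correct password guess offline, because the guess must first pass through the HDF.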

3 citations