
Showing papers in "Journal of Cryptographic Engineering in 2019"


Journal ArticleDOI
TL;DR: In this paper, the authors introduce a framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms, including 8-bit AVR, 16-bit MSP430, and 32-bit ARM.
Abstract: In this paper, we introduce a framework for the benchmarking of lightweight block ciphers on a multitude of embedded platforms. Our framework is able to evaluate the execution time, RAM footprint, as well as binary code size, and allows one to define a custom “figure of merit” according to which all evaluated candidates can be ranked. We used the framework to benchmark implementations of 19 lightweight ciphers, namely AES, Chaskey, Fantomas, HIGHT, LBlock, LEA, LED, Piccolo, PRESENT, PRIDE, PRINCE, RC5, RECTANGLE, RoadRunneR, Robin, Simon, SPARX, Speck, and TWINE, on three microcontroller platforms: 8-bit AVR, 16-bit MSP430, and 32-bit ARM. Our results bring some new insights into the question of how well these lightweight ciphers are suited to secure the Internet of things. The benchmarking framework provides cipher designers with an easy-to-use tool to compare new algorithms with the state of the art and allows standardization organizations to conduct a fair and consistent evaluation of a large number of candidates.
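As a rough illustration of the kind of "figure of merit" described above, the following sketch ranks candidates by a weighted combination of execution time, RAM footprint and code size; the weights, the ranking function and the numbers are hypothetical, not the framework's actual metric or data.

```python
# Sketch: ranking ciphers by a custom "figure of merit" combining
# execution time, RAM footprint and code size (hypothetical weights
# and data, not the benchmarking framework's actual metric).

def figure_of_merit(time_cycles, ram_bytes, code_bytes,
                    w_time=1.0, w_ram=1.0, w_code=1.0):
    # Lower is better for every metric, so a weighted sum suffices.
    return w_time * time_cycles + w_ram * ram_bytes + w_code * code_bytes

# Made-up example numbers for illustration only.
candidates = {
    "CipherA": (12000, 200, 1500),
    "CipherB": (9000, 350, 2200),
    "CipherC": (15000, 150, 900),
}

ranking = sorted(candidates.items(),
                 key=lambda kv: figure_of_merit(*kv[1]))
for name, metrics in ranking:
    print(name, figure_of_merit(*metrics))
```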

94 citations


Journal ArticleDOI
TL;DR: It is shown that online template attacks need only one power consumption trace of a scalar multiplication on the target device and are suitable not only against ECDSA and static elliptic curve Diffie–Hellman (ECDH), but also against elliptic curve scalar multiplication in ephemeral ECDH.
Abstract: Template attacks are a special kind of side-channel attacks that work in two stages. In the first stage, the attacker builds up a database of template traces collected from a device which is identical to the attacked device, but under the attacker’s control. In the second stage, traces from the target device are compared to the template traces to recover the secret key. In the context of attacking elliptic curve scalar multiplication with template attacks, one can interleave template generation and template matching and reduce the number of template traces. This paper enhances the power of this technique by defining and applying the concept of online template attacks, a general attack technique with minimal assumptions for an attacker, who has very limited control over the template device. We show that online template attacks need only one power consumption trace of a scalar multiplication on the target device; they are thus suitable not only against ECDSA and static elliptic curve Diffie–Hellman (ECDH), but also against elliptic curve scalar multiplication in ephemeral ECDH. In addition, online template attacks need only one template trace per scalar bit and they can be applied to a broad variety of scalar multiplication algorithms. To demonstrate the power of online template attacks, we recover scalar bits of a scalar multiplication using the double-and-add-always algorithm on a twisted Edwards curve running on a smartcard with an ATmega163 CPU.
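To make the matching step concrete, here is a toy sketch of deciding a single scalar bit by correlating a target trace segment against two candidate templates; the traces are synthetic and the procedure is only an illustration of the principle, not the authors' attack code.

```python
# Toy illustration of template matching: decide one scalar bit by
# correlating a target trace segment against two candidate templates.
# Synthetic data; real attacks use measured power/EM traces.
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

random.seed(0)
template_bit0 = [random.gauss(0.0, 1.0) for _ in range(100)]
template_bit1 = [random.gauss(0.5, 1.0) for _ in range(100)]

# Target segment: the "bit = 1" template plus measurement noise.
target = [s + random.gauss(0.0, 0.2) for s in template_bit1]

corr0 = pearson(target, template_bit0)
corr1 = pearson(target, template_bit1)
recovered_bit = 1 if corr1 > corr0 else 0
print("recovered scalar bit:", recovered_bit)
```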

50 citations


Journal ArticleDOI
TL;DR: This paper investigates whether the basic circuit of Moradi et al. can be tweaked to provide dual functionality of encryption and decryption (ENC/DEC) while keeping the hardware overhead as low as possible.
Abstract: The implementation of the AES encryption core by Moradi et al. at Eurocrypt 2011 is one of the smallest in terms of gate area. The circuit takes around 2400 gates and operates on an 8-bit datapath. However, this is an encryption-only core and unable to cater to block cipher modes like CBC and ELmD that require access to both the AES encryption and decryption modules. In this paper, we investigate whether the basic circuit of Moradi et al. can be tweaked to provide dual functionality of encryption and decryption (ENC/DEC) while keeping the hardware overhead as low as possible. We report two constructions of the AES circuit. The first is an 8-bit serialized implementation that provides the functionality of both encryption and decryption and occupies around 2605 GE with a latency of 226 cycles. This is a substantial improvement over the next smallest AES ENC/DEC circuit (Grain of Sand) by Feldhofer et al., which takes around 3400 gates but has a latency of over 1000 cycles for both the encryption and decryption cycles. In the second part, we optimize the above architecture to provide the dual encryption/decryption functionality in only 2227 GE and latency of 246/326 cycles for the encryption and decryption operations, respectively. We take advantage of clock gating techniques to achieve the Shiftrow and Inverse Shiftrow operations in 3 cycles instead of 1. This helps us replace many of the scan flip-flops in the design with ordinary flip-flops. Furthermore, we take advantage of the fact that the Inverse Mixcolumn matrix in AES is the cube of the Forward Mixcolumn matrix. Thus by executing the Forward Mixcolumn operation three times over the state, one can achieve the functionality of Inverse Mixcolumn. This saves some more gate area as one is no longer required to have a combined implementation of the Forward and Inverse Mixcolumn circuit.
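The MixColumns property mentioned at the end can be checked directly in software: applying the forward MixColumns transformation three times to a column gives the same result as applying the inverse transformation once. A minimal sketch with reference-style GF(2^8) arithmetic (not the hardware circuit):

```python
# Verify that AES InvMixColumns equals MixColumns applied three times
# (the inverse matrix is the cube of the forward matrix).

def xtime(a):
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def gf_mul(a, b):
    # Multiplication in GF(2^8) with the AES polynomial x^8+x^4+x^3+x+1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def mix_column(col, first_row):
    # Entry (i, j) of the circulant matrix is first_row[(j - i) % 4].
    return [
        gf_mul(first_row[(0 - i) % 4], col[0]) ^
        gf_mul(first_row[(1 - i) % 4], col[1]) ^
        gf_mul(first_row[(2 - i) % 4], col[2]) ^
        gf_mul(first_row[(3 - i) % 4], col[3])
        for i in range(4)
    ]

MIX     = [0x02, 0x03, 0x01, 0x01]
INV_MIX = [0x0E, 0x0B, 0x0D, 0x09]

col = [0xDB, 0x13, 0x53, 0x45]          # arbitrary test column
three_times = mix_column(mix_column(mix_column(col, MIX), MIX), MIX)
assert three_times == mix_column(col, INV_MIX)
print("MixColumns^3 == InvMixColumns on", [hex(b) for b in col])
```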

23 citations


Journal ArticleDOI
TL;DR: Misoczki et al. proposed a variant of McEliece encryption based on quasi-cyclic moderate-density parity-check (QC-MDPC) codes that has significantly smaller keys than the original McEliece encryption; this paper studies software optimizations of the primitives that QC-MDPC cryptosystems commonly employ.
Abstract: The anticipated emergence of quantum computers in the foreseeable future drives the cryptographic community to start considering cryptosystems, which are based on problems that remain intractable even with large-scale quantum computers. One example is the family of code-based cryptosystems that relies on the syndrome decoding problem. Recent work by Misoczki et al. (in: 2013 IEEE international symposium on information theory, pp 2069–2073, 2013. https://doi.org/10.1109/ISIT.2013.6620590 ) showed a variant of McEliece encryption which is based on quasi cyclic moderate density parity check (QC-MDPC) codes and has significantly smaller keys than the original McEliece encryption. It was followed by the newly proposed QC-MDPC-based cryptosystems CAKE (Barreto et al. in: IMA international conference on cryptography and coding, Springer, Berlin, pp 207–226, 2017) and Ouroboros (Deneuville et al. in Ouroboros: a simple, secure and efficient key exchange protocol based on coding theory, Springer, Cham, pp 18–34, 2017. https://doi.org/10.1007/978-3-319-59879-6_2 ). These motivate dedicated new software optimizations. This paper lists the cryptographic primitives that QC-MDPC cryptosystems commonly employ, studies their software optimizations on modern processors, and reports the achieved speedups. It also assesses methods for side channel protection of the implementations and their performance costs. These optimized primitives offer a useful toolbox that can be used, in various ways, by designers and implementers of QC-MDPC cryptosystems. Indeed, we applied our methods to generate a platform-specific additional implementation of “BIKE”—a QC-MDPC key encapsulation mechanism (KEM) proposal submitted to the NIST Post-Quantum Project (NIST:Post-Quantum Cryptography—call for proposals, https://csrc.nist.gov/Projects/Post-Quantum-Cryptography , 2017). This gave a $$5\times $$ speedup compared to the reference implementation.
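One of the primitives such QC-MDPC schemes rely on is multiplication of binary polynomials modulo x^r - 1 with a sparse operand. The sketch below shows the operation in plain Python using bit rotations; it is only a functional illustration, not the vectorized or side-channel-protected implementations discussed in the paper, and r and the operands are arbitrary toy values.

```python
# Sketch of a QC-MDPC building block: multiplication of binary
# polynomials modulo x^r - 1, with the sparse operand given by its
# support (set of nonzero exponents). Functional illustration only.

def cyclic_mul_sparse(dense_bits, sparse_support, r):
    # dense_bits: r-bit integer, bit i = coefficient of x^i
    # sparse_support: list of exponents with coefficient 1
    mask = (1 << r) - 1
    acc = 0
    for e in sparse_support:
        # Multiplying by x^e modulo x^r - 1 is a cyclic rotation by e.
        rotated = ((dense_bits << e) | (dense_bits >> (r - e))) & mask
        acc ^= rotated                      # addition in GF(2)
    return acc

r = 11
a = 0b10010110011                          # dense polynomial, 11 bits
h = [0, 3, 7]                              # sparse polynomial 1 + x^3 + x^7
print(bin(cyclic_mul_sparse(a, h, r)))
```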

21 citations


Journal ArticleDOI
TL;DR: The Meas scheme is presented, which combines ideas from fresh re-keying and authentication trees by storing encryption keys in a tree structure to thwart first-order DPA without the need for DPA-protected cryptographic primitives.
Abstract: Memory encryption is used in many devices to protect memory content from attackers with physical access to a device. However, many current memory encryption schemes can be broken using differential power analysis (DPA). In this work, we present Meas—the first Memory Encryption and Authentication Scheme providing security against DPA attacks. The scheme combines ideas from fresh re-keying and authentication trees by storing encryption keys in a tree structure to thwart first-order DPA without the need for DPA-protected cryptographic primitives. Therefore, the design strictly limits the use of every key to encrypt at most two different plaintext values. Meas prevents higher-order DPA without changes to the cipher implementation by using masking of the plaintext values. Meas is applicable to all kinds of memory, e.g., NVM and RAM. For RAM, we give two concrete Meas instances based on the lightweight primitives Ascon, PRINCE, and QARMA. We implement and evaluate both instances on a Zynq XC7Z020 FPGA showing that Meas has memory and performance overhead comparable to existing memory authentication techniques without DPA protection.

21 citations


Journal ArticleDOI
TL;DR: This communication advances a fast unified hardware architecture for elliptic curve point multiplication over NIST primes; the improvements of this work include word-based modular division, parallel point additions and doublings, and pipelined scalable multiplications and modular reductions.
Abstract: Elliptic curve cryptography has been widely used in public key cryptography, as it uses shorter keys to achieve the same security level as RSA cryptosystems. This communication advances a fast unified hardware architecture for elliptic curve point multiplication over NIST primes. The improvements of this work include word-based modular division, parallel point additions and doublings, and pipelined scalable multiplications and modular reductions. The hardware integrates computation for five NIST curves and can compute one NIST-192/224/256/384/521 elliptic curve point multiplication in 0.437/0.574/0.776/1.57/2.74 ms on a Xilinx Virtex-4 device, occupying an area of 21,638 slices, 32 DSPs and 26 kbits of RAM, which outperforms most reported results as far as we know.

20 citations


Journal ArticleDOI
TL;DR: New RNS Montgomery reduction algorithms are proposed whose main part consists of two matrix multiplications; a new restriction on the RNS base makes it possible to remove some multiplication steps from conventional algorithms, so the new algorithms are simpler and have higher regularity than conventional ones.
Abstract: The residue number system (RNS) is a method for representing an integer as an n-tuple of its residues with respect to a given base. Since RNS has inherent parallelism, it is actively researched as a way to implement faster processing systems for public-key cryptography. This paper proposes new RNS Montgomery reduction algorithms, Q-RNSs, the main part of which consists of two matrix multiplications. Letting n be the size of a base set, the number of unit modular multiplications in the proposed algorithms is evaluated as $$(2n^2+n)$$. This is achieved by posing a new restriction on the RNS base, namely, that its elements should have a certain quadratic residuosity. This makes it possible to remove some multiplication steps from conventional algorithms, and thus the new algorithms are simpler and have higher regularity compared with conventional ones. From our experiments, it is confirmed that there are sufficient candidates for RNS bases meeting the quadratic residuosity requirements.
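For readers unfamiliar with RNS, the sketch below shows the representation itself: an integer is carried as residues modulo a base of pairwise coprime moduli and recovered with the CRT. The paper's Q-RNS Montgomery reduction builds on this representation but is not reproduced here; the moduli and values are arbitrary examples.

```python
# Minimal illustration of the residue number system (RNS):
# represent an integer by residues modulo coprime moduli and
# reconstruct it with the Chinese remainder theorem.
from math import prod

base = [13, 17, 19, 23]                     # pairwise coprime moduli
M = prod(base)

def to_rns(x):
    return [x % m for m in base]

def from_rns(residues):
    # Classic CRT reconstruction.
    x = 0
    for r, m in zip(residues, base):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)
    return x % M

x = 31415
assert from_rns(to_rns(x)) == x
print(to_rns(x))
```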

18 citations


Journal ArticleDOI
TL;DR: This analysis shows that with a polynomial number of votes, the XOR Arbiter PUF stability of almost all challenges can be boosted exponentially close to 1; that is, the stability gain through majority voting can exceed the stability loss introduced by large XORs for a feasible number of votes.
Abstract: In a novel analysis, we formally prove that arbitrarily many Arbiter PUFs can be combined into a stable XOR Arbiter PUF. To the best of our knowledge, this design cannot be modeled by any known oracle access attack in polynomial time. Using a majority vote of arbiter chain responses, our analysis shows that with a polynomial number of votes, the XOR Arbiter PUF stability of almost all challenges can be boosted exponentially close to 1; that is, the stability gain through majority voting can exceed the stability loss introduced by large XORs for a feasible number of votes. Considering state-of-the-art modeling attacks by Becker and Rührmair et al., our proposal enables the designer to increase the attacker’s effort exponentially while still maintaining polynomial design effort. This is the first result that relates PUF design to this traditional cryptographic design principle.
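The effect of majority voting on stability can be illustrated with a toy simulation: if a single evaluation of an arbiter chain reproduces its reference response with probability p > 1/2, the majority of n independent evaluations fails with a probability that drops rapidly in n. The numbers below are illustrative, not the paper's model.

```python
# Toy simulation of the majority-voting idea: the error rate of the
# majority of n independent evaluations shrinks quickly with n
# (illustrative parameters only).
import random

def majority_error(p_correct, n_votes, trials=20000):
    errors = 0
    for _ in range(trials):
        correct = sum(random.random() < p_correct for _ in range(n_votes))
        if correct <= n_votes // 2:          # majority got it wrong
            errors += 1
    return errors / trials

random.seed(1)
for n in (1, 5, 15, 45):                     # odd vote counts, no ties
    print(n, "votes -> error rate", majority_error(0.9, n))
```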

16 citations


Journal ArticleDOI
TL;DR: A method for converting a Boolean mask to an arithmetic mask that runs in constant time for a fixed order and has quadratic complexity as the security order increases, a significant improvement over previous work, which has exponential complexity.
Abstract: Converting a Boolean mask to an arithmetic mask, and vice versa, is often required in implementing side-channel-resistant instances of cryptographic algorithms that mix Boolean and arithmetic operations. In this paper, we describe a method for converting a Boolean mask to an arithmetic mask that runs in constant time for a fixed order and has quadratic complexity as the security order increases, a significant improvement over previous work, which has exponential complexity. We propose explicit algorithms for a second-order secure Boolean-to-arithmetic mask conversion that uses 31 instructions and for a third-order secure mask conversion that uses 74 instructions. We show that our second-order secure algorithm is at least an order of magnitude faster and our third-order secure algorithm is more than twice as fast as other algorithms in the literature.
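For context, the classic first-order Boolean-to-arithmetic conversion in the style of Goubin already runs in a constant number of operations; the paper's contribution lies in the second- and third-order generalizations, which are not reproduced here. A minimal first-order sketch:

```python
# First-order Boolean-to-arithmetic conversion in the style of Goubin
# (constant number of operations, no secret-dependent branches).
# Shown only to illustrate the masking relation; the paper's algorithms
# are the higher-order variants.
import random

BITS = 32
MASK = (1 << BITS) - 1

def bool_to_arith(x_prime, r):
    # Input:  x_prime = x XOR r          (Boolean masking)
    # Output: A with x = (A + r) mod 2^BITS   (arithmetic masking)
    gamma = random.getrandbits(BITS)
    t = x_prime ^ gamma
    t = (t - gamma) & MASK
    t ^= x_prime
    gamma ^= r
    a = x_prime ^ gamma
    a = (a - gamma) & MASK
    return a ^ t

random.seed(2)
x, r = 0xDEADBEEF, random.getrandbits(BITS)
a = bool_to_arith(x ^ r, r)
assert (a + r) & MASK == x
print(hex(a))
```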

14 citations


Journal ArticleDOI
TL;DR: A pseudo random number generator (PRNG) based on cache-timing-attack resistant AES is implemented and compared with the fastest implementations in both CPU and Nvidia GPU domains to assess the performance overhead and productivity improvement achievable through the PHAST library.
Abstract: PHAST library is a high-level heterogeneous STL-like C++ library that can be targeted on multi-core processors and Nvidia GPUs. It makes it possible to exploit the performance of modern parallel architectures without the complexity of parallel programming. The library manages the programming and critical fine tuning of the parallel execution on both platforms without interfering with the application code structure, while maintaining the possibility to use architecture-specific features and instructions. In cryptography, performance and architectural efficiency of software implementations are crucial. This is witnessed by the extensive research in highly optimized and specialized versions of many protocols. In this paper, we assess the performance overhead and productivity improvement achievable through the PHAST library. We implement a pseudo random number generator (PRNG) based on cache-timing-attack resistant AES. We compare it with the fastest implementations in both CPU and Nvidia GPU domains. Achieved results show that the PHAST code is shorter and simpler than the state-of-the-art implementations. Its source length is 59.59% of the reference CUDA C implementation and 88.18% of the sequential C++ version for CPUs, despite being the same for both targets. It is also far less complex in terms of McCabe’s and Halstead’s metrics. Results show that these productivity improvements induce a limited performance overhead of the library layer: less than 5% on single-thread execution for CPUs and around 10% on Nvidia GPUs. Furthermore, performance of the PHAST PRNG automatically scales with the available cores in a nearly linear fashion, allowing programmers to fully exploit multi-core resources with the same source code.

9 citations


Journal ArticleDOI
TL;DR: The findings reveal that the fault model undertaken while targeting the counter can be relaxed at the expense of an exponentially larger message size, which implies that in case of PAEQ the time complexities of the IDFA attack reported remain unaffected.
Abstract: In Saha and Chowdhury (Cryptographic hardware and embedded systems—CHES 2016—18th international conference, Santa Barbara, CA, USA, August 17–19, 2016, Proceedings, 2016) the concept of fault analysis using internal differentials within a cipher was introduced and used to overcome the nonce barrier of conventional differential fault analysis, with a demonstration on the authenticated cipher PAEQ. However, the attack had a limitation with regard to the fault model, which restricted one of the faults to be injected in the last byte of the counter. This in turn also required the message size to be fixed at 255 complete blocks. In this work, we overcome these limitations by extending the concept in a more general setting. In particular, we look at the concept of Fault-Quartets, which is central to this kind of fault-based attack. We theorize the relation between the fault model and the message size, which forms an important aspect of the complexity of internal differential fault analysis (IDFA). Our findings reveal that the fault model undertaken while targeting the counter can be relaxed at the expense of an exponentially larger message size. Interestingly, the algorithm for finding a Fault-Quartet still remains linear. This in turn implies that, in the case of PAEQ, the time complexities of the IDFA attack reported remain unaffected. The internal differential fault attack is able to uniquely retrieve the key of three versions of full-round PAEQ of key sizes 64, 80 and 128 bits with complexities of about $$2^{16}$$, $$2^{16}$$ and $$2^{50}$$, respectively.

Journal ArticleDOI
TL;DR: The theoretical estimates indicate that this new scalar point multiplication algorithm is uniform, can be parallelized, and admits differential addition formulas; it also allows trading speed for precomputation cost and storage requirements.
Abstract: We propose new algorithms for constructing multidimensional differential addition chains and for performing multidimensional scalar point multiplication based on these chains. Our algorithms work in any dimension and offer some key efficiency and security features. In particular, our scalar point multiplication algorithm is uniform, it can be parallelized, and differential addition formulas can be deployed. It also allows trading speed for precomputation cost and storage requirements. These key features and our theoretical estimates indicate that this new algorithm may offer some performance advantages over the existing point multiplication algorithms in practice. We also report some experimental results and verify some of our theoretical findings, and a simplistic Magma implementation is provided.

Journal ArticleDOI
TL;DR: A symbolic method is proposed to verify the side-channel robustness of masked programs; it aims to verify that intermediate computations are statistically independent of secret variables using defined distribution inference rules.
Abstract: Masking is a popular countermeasure against side-channel attacks, which randomizes secret data with random and uniform variables called masks. At the software level, masking is usually added in the source code and its effectiveness needs to be verified. In this paper, we propose a symbolic method to verify the side-channel robustness of masked programs. The analysis is performed at the assembly level since compilation and optimizations may alter the added protections. Our proposed method aims to verify that intermediate computations are statistically independent of secret variables, using defined distribution inference rules. We verify the first round of a masked AES in 22 s and show that some secure algorithms or source codes are not leakage-free in their assembly implementations.
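The property being checked can be stated very concretely: an intermediate value must have the same distribution, taken over the random masks, for every value of the secret. The toy sketch below verifies this exhaustively for a 4-bit XOR-masked intermediate and shows that an unmasked-style intermediate fails the same test; it is a conceptual illustration, not the paper's inference-rule engine.

```python
# Tiny version of the verified property: an intermediate of a masked
# computation must have the same distribution (over the masks) for
# every value of the secret.
from collections import Counter

def distribution(secret, bits=4):
    # Distribution of the intermediate t = secret XOR mask over all masks.
    return Counter((secret ^ m) & ((1 << bits) - 1)
                   for m in range(1 << bits))

reference = distribution(0)
assert all(distribution(s) == reference for s in range(16))
print("XOR-masked intermediate is independent of the secret")

# Counterexample: t = secret AND mask leaks, its distribution depends on s.
leaky = lambda s: Counter(s & m for m in range(16))
print("leaky intermediate independent?",
      all(leaky(s) == leaky(0) for s in range(16)))   # -> False
```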

Journal ArticleDOI
TL;DR: This paper focuses on the design of a scheme based on pairings and elliptic curves that is able to handle applications where the number of multiplications is not too high, with interesting practical efficiency when compared to lattice-based solutions.
Abstract: Homomorphic encryption makes it possible to carry out operations on encrypted data. In this paper, we focus on the design of a scheme based on pairings and elliptic curves that is able to handle applications where the number of multiplications is not too high, with interesting practical efficiency when compared to lattice-based solutions. The starting point is the Boneh–Goh–Nissim (BGN for short) encryption scheme (Boneh et al. in Kilian J (ed) Theory of cryptography, second theory of cryptography conference, TCC 2005, Cambridge, MA, USA, February 10–12, 2005), which enables the homomorphic evaluation of polynomials of degree at most 2 on ciphertexts. In our scheme, we use constructions coming from Freeman (Gilbert H (ed) Advances in cryptology—EUROCRYPT 2010, 29th annual international conference on the theory and applications of cryptographic techniques, French Riviera, May 30–June 3, 2010) and Catalano and Fiore (Ray I, Li N, Kruegel C (eds) Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, Denver, CO, USA, October 12–16, 2015), to propose a variant of the BGN scheme that can handle the homomorphic evaluation of polynomials of degree at most 4. We discuss both the mathematical structure of the scheme and its implementation. We provide simulation results, showing the relevance of this solution for applications requiring a low multiplicative depth, and give a relative comparison with respect to lattice-based homomorphic encryption schemes.

Journal ArticleDOI
TL;DR: The main focus of the work was electromagnetic side channels, since signals with a high signal-to-noise ratio (SNR) can be captured more conveniently; the obtained data can be used in vulnerability testing, which can examine the side-channel robustness of cryptographic software in the first steps of development.
Abstract: In this paper, we present an examination of several side-channel attack scenarios on PC-based cryptosystems. Our goal was the development of a unified physical model for sensitive information leakage. The main focus of our work was electromagnetic side channels, since signals with a high signal-to-noise ratio (SNR) can be captured more conveniently. Moreover, the attacker can correlate the EM signal with other types of side-channel signals (such as voltage fluctuations and acoustic emanations). This suggests that there may be a common source for a vulnerable signal that passes through several side channels. We have simulated several attacks on targeted cryptosystems with distinct instruction sets. Trace analysis reveals empirical evidence that corresponds to the theoretical principles of the leakage mechanisms of x86 and x64 processors. The hardware reasons for the leakage lie in the instructions and data in the processor cache, and more specifically in the resulting fluctuations of power consumption, which lead to changes in the voltage regulator of the processor. Thus, the fluctuations in LC circuits result in leakage on multiple side channels. In general, the obtained data together with the principles of signal formation can be used in vulnerability testing, which can examine the side-channel robustness of cryptographic software in the first steps of development.

Journal ArticleDOI
TL;DR: This study concentrates on implementing SPA-resistant Montgomery multipliers, and introduces new encoding schemes that allow multiplication with the operands having no zero digits, in order to lay out more secure and timing-independent multipliers.
Abstract: As modular arithmetic remains central to public-key systems, cryptographic engineering continues to pursue improved architectures for realizing it. This sophistication does not only involve high-performance, low-power or area-aware optimizations, but also includes secure or hardened realizations that are immune to the so-called side-channel attacks. Among these, the simple power analysis (SPA) attack, requiring only one or a few power traces of the cryptographic activity, is considered the most dangerous threat to security. This study concentrates on implementing SPA-resistant Montgomery multipliers, which are key ingredients in designing substantial cryptosystems. We introduce new encoding schemes that allow multiplication with operands having no zero digits. Naturally, such encodings result in a homogeneous multiplication in which accumulation needs equivalent computational work. Moreover, in order to lay out more secure and timing-independent multipliers, we impose I/O requirements such that the resulting Montgomery multipliers do not need an extra final reduction. Finally, as the proposed methods allow architectures suitable for word-serial processing, a memory-performance trade-off is possible for constrained environments.
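The paper's specific operand encodings are not reproduced here, but the general idea of a zero-free digit representation can be illustrated with a well-known recoding: any odd integer can be written with signed binary digits from {-1, +1}, so every digit triggers the same kind of work. A sketch (illustrative background only, not the proposed Montgomery-multiplier encoding):

```python
# Background sketch: recode an odd integer into signed binary digits
# {-1, +1}, a classical zero-free digit representation. This is NOT the
# paper's encoding; it only illustrates the idea of zero-free digits.

def zero_free_recoding(k):
    assert k & 1, "recoding defined for odd integers"
    bits = [(k >> i) & 1 for i in range(k.bit_length())]
    # d_i = 2*bit_{i+1} - 1 for i < n-1, and d_{n-1} = 1
    digits = [2 * bits[i + 1] - 1 for i in range(len(bits) - 1)] + [1]
    return digits

k = 45
digits = zero_free_recoding(k)
assert sum(d << i for i, d in enumerate(digits)) == k
assert all(d != 0 for d in digits)
print(digits)
```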

Journal ArticleDOI
TL;DR: In this paper, a new class of irreducible pentanomials is introduced and a multiplier based on the Karatsuba algorithm is proposed, which has a comparable time delay when compared to other multipliers based on the Karatsuba algorithm.
Abstract: We introduce a new class of irreducible pentanomials over $${\mathbb F}_{2}$$ of the form $$f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$$. Let $$m=2b+c$$ and use f to define the finite field extension of degree m. We give the exact number of operations required for computing the reduction modulo f. We also provide a multiplier based on the Karatsuba algorithm in $$\mathbb {F}_2[x]$$ combined with our reduction process. We give the total cost of the multiplier and find that the bit-parallel multiplier defined by this new class of polynomials has improved XOR and AND complexity. Our multiplier has a comparable time delay when compared to other multipliers based on the Karatsuba algorithm.

Journal ArticleDOI
TL;DR: In this article, a practical masking scheme specifying ODSM is proposed to protect symmetric encryption against side-channel attacks and fault injection attacks in AES-128 and AES-256.
Abstract: Side-channel attacks (SCAs) and fault injection attacks (FIAs) allow an opponent to have partial access to the internal behavior of the hardware. Since the end of the 1990s, many works have shown that this type of attack constitutes a serious threat to cryptosystems implemented in embedded devices. In the state of the art, there exist several countermeasures to protect symmetric encryption (especially AES-128). Most of them protect only against one of these two attacks (SCA or FIA). A method called ODSM has been proposed to withstand SCA and FIA, but its implementation in the whole algorithm is a big open problem when no particular hardware protection is possible. In the present paper, we propose a practical masking scheme specifying ODSM which makes it possible to protect the symmetric encryption against these two attacks.

Journal ArticleDOI
TL;DR: This work proposes a compositional top-down approach to embedded system specification and verification, where the system-on-chip is modeled as a network of distributed automata communicating via paired synchronous message passing and reports on the complete verification of guest mode security properties in the HOL4 theorem prover.
Abstract: The security of embedded systems can be dramatically improved through the use of formally verified isolation mechanisms such as separation kernels, hypervisors, or microkernels. For trustworthiness, particularly for system-level behavior, the verifications need precise models of the underlying hardware. Such models are hard to attain, highly complex, and proofs of their security properties may not easily apply to similar but different platforms. This may render verification economically infeasible. To address these issues, we propose a compositional top-down approach to embedded system specification and verification, where the system-on-chip is modeled as a network of distributed automata communicating via paired synchronous message passing. Using abstract specifications for each component makes it possible to delay the development of detailed models for cores, devices, etc., while still being able to verify high-level security properties like integrity and confidentiality, and soundly refine the result for different instantiations of the abstract components at a later stage. As a case study, we apply this methodology to the verification of information flow security for an industry-scale security-oriented hypervisor on the ARMv8-A platform and report on the complete verification of guest mode security properties in the HOL4 theorem prover.

Journal ArticleDOI
TL;DR: This paper evaluates the efficacy of the proposed circuit by gate counts and logic synthesis with a 65-nm CMOS standard cell library in comparison with conventional circuits and demonstrates that AES S-Box with the proposed circuit achieves the best area–time efficiency.
Abstract: This paper proposes a compact and highly efficient $$\textit{GF}(2^8)$$ inversion circuit design based on a combination of non-redundant and redundant Galois field (GF) (or finite field) arithmetic. The proposed design utilizes an optimal normal basis and redundant GF representations, called polynomial ring representation and redundantly represented basis, to implement $$\textit{GF}(2^8)$$ inversion using a tower field $$\textit{GF}((2^4)^2)$$ . The flexibility of the redundant representations provides efficient mappings from/to the $$\textit{GF}(2^8)$$ . This paper evaluates the efficacy of the proposed circuit by gate counts and logic synthesis with a 65-nm CMOS standard cell library in comparison with conventional circuits. Consequently, we show that the proposed circuit achieves approximately 25% higher area–time efficiency than the conventional best inversion circuit in our environment. We also demonstrate that AES S-Box with the proposed circuit achieves the best area–time efficiency.

Journal ArticleDOI
TL;DR: This paper presents the first standard cube attack to yield maxterms for Grain-128 up to 160 initialization rounds on non-programmable hardware, and demonstrates the scalability of the solution on multi-GPU systems.
Abstract: Dinur and Shamir’s cube attack has attracted significant attention in the literature. Nevertheless, the lack of implementations achieving effective results casts doubts on its practical relevance. On the theoretical side, promising results have been recently achieved leveraging on division trails. The present paper follows a more practical approach and aims at giving new impetus to this line of research by means of a cipher-independent flexible framework that is able to carry out the cube attack on GPU/CPU clusters. We address all issues posed by a GPU implementation, providing evidence in support of parallel variants of the attack and identifying viable directions for solving open problems in the future. We report the results of running our GPU-based cube attack against round-reduced versions of three well-known ciphers: Trivium, Grain-128 and SNOW 3G. Our attack against Trivium improves the state of the art, permitting full key recovery for Trivium reduced to (up to) 781 initialization rounds (out of 1152) and finding the first-ever maxterm after 800 rounds. In this paper, we also present the first standard cube attack (i.e., neither dynamic nor tester) to yield maxterms for Grain-128 up to 160 initialization rounds on non-programmable hardware. We include a thorough evaluation of the impact of system parameters and GPU architecture on the performance. Moreover, we demonstrate the scalability of our solution on multi-GPU systems. We believe that our extensive set of results can be useful for the cryptographic engineering community at large and can pave the way to further results in the area.
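The core mechanism of the cube attack can be shown on a toy cipher: summing the output over all assignments of a chosen set of public (cube) variables yields the superpoly of that cube in the key bits, and a cube whose superpoly is linear is a maxterm. The cipher below is a made-up three-variable polynomial, not Trivium, Grain-128 or SNOW 3G.

```python
# Toy cube attack: summing a black-box Boolean function over a cube of
# public variables isolates its superpoly in the key bits.
from itertools import product

def toy_cipher(key, iv):
    k0, k1, k2 = key
    v0, v1, v2 = iv
    # Secret polynomial over GF(2): v0*v1*(k0+k2) + v0*k1 + v2*k0*k1 + v1
    return (v0 & v1 & (k0 ^ k2)) ^ (v0 & k1) ^ (v2 & k0 & k1) ^ v1

def cube_sum(key, cube_positions, iv_len=3):
    acc = 0
    for assignment in product((0, 1), repeat=len(cube_positions)):
        iv = [0] * iv_len                    # non-cube IV bits fixed to 0
        for pos, val in zip(cube_positions, assignment):
            iv[pos] = val
        acc ^= toy_cipher(key, iv)
    return acc

# Summing over the cube {v0, v1} isolates the linear superpoly k0 + k2:
for key in product((0, 1), repeat=3):
    assert cube_sum(key, [0, 1]) == key[0] ^ key[2]
print("cube {v0,v1} has linear superpoly k0 XOR k2 -> it is a maxterm")
```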

Journal ArticleDOI
TL;DR: This paper presents a new approach for the storage/online computation trade-off, using a multiplicative splitting of the digits of the exponent radix-R representation, and adapts classical algorithms for modular exponentiation and scalar multiplication in order to take advantage of the proposed exponent recoding.
Abstract: The digital signature algorithm (DSA) (resp. ECDSA) involves modular exponentiation (resp. scalar multiplication) of a public and known base by a random one-time exponent. In order to speed up this operation, well-known methods take advantage of the memorization of base powers (resp. base multiples). The best approaches are the Fixed-base radix-R method and the Fixed-base Comb method. In this paper, we present a new approach for the storage/online computation trade-off, by using a multiplicative splitting of the digits of the exponent radix-R representation. We adapt classical algorithms for modular exponentiation and scalar multiplication in order to take advantage of the proposed exponent recoding. An analysis of the complexity for practical sizes shows that our proposed approach involves a lower storage for a given level of online computation. This is confirmed by implementation results showing significant memory savings, up to 3 times for the largest NIST standardized key sizes, compared to the state-of-the-art approaches.
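As background, the classical fixed-base radix-R method mentioned above precomputes the powers g^(R^i) once and then evaluates each exponentiation with multiplications only; the paper's multiplicative splitting of the digits is a further refinement not shown here. A minimal sketch with small, arbitrary parameters:

```python
# Sketch of the classical fixed-base radix-R exponentiation method:
# precompute g^(R^i) once, then compute g^k with multiplications only.
# Toy parameters; not the paper's recoding.

def precompute(g, R, num_digits, mod):
    # table[i] = g^(R^i) mod mod
    table = [g % mod]
    for _ in range(1, num_digits):
        table.append(pow(table[-1], R, mod))
    return table

def fixed_base_pow(table, k, R, mod):
    digits = []
    while k:                                 # radix-R digits, LSB first
        digits.append(k % R)
        k //= R
    a = b = 1
    for d in range(R - 1, 0, -1):            # BGMW-style accumulation
        for i, ki in enumerate(digits):
            if ki == d:
                b = (b * table[i]) % mod
        a = (a * b) % mod
    return a

p, g, R = 1000003, 5, 16
table = precompute(g, R, num_digits=8, mod=p)
k = 123456
assert fixed_base_pow(table, k, R, p) == pow(g, k, p)
print(fixed_base_pow(table, k, R, p))
```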

Journal ArticleDOI
TL;DR: A generic and automated framework has been proposed, which can determine the exploitability of fault instances from any given block cipher in a fast and scalable manner and significantly outperforms another recently proposed one as reported by Khanna et al.
Abstract: Faults have been practically exploited on several occasions to compromise the security of mathematically robust cryptosystems at the implementation level. However, not every possible fault within a cryptosystem is exploitable for fault attack. Comprehensive knowledge about the exploitable part of the fault space is thus imperative for both the algorithm designer and the implementer in order to invent precise countermeasures and robust algorithms. This paper addresses the problem of exploitable fault characterization in the context of differential fault analysis attacks on block ciphers. A generic and automated framework has been proposed, which can determine the exploitability of fault instances from any given block cipher in a fast and scalable manner. Such automation is supposed to work as the core engine for analysing the fault spaces, which are, in general, difficult to characterize with manual effort due to their formidable size and the complex structural features of the ciphers. Our framework significantly outperforms another recently proposed one as reported by Khanna et al. (in: DAC, ACM, pp. 1–6, 2017), in terms of attack class coverage and automation effort. Evaluation of the framework on AES and PRESENT establishes its efficacy as a potential tool for exploitable fault analysis.

Journal ArticleDOI
TL;DR: Effective solutions to efficiently perform all the steps of horizontal side-channel attacks are introduced, i.e., practicable means for implementing efficient horizontal attacks.
Abstract: Nowadays, horizontal or single-shot side-channel attacks against protected implementations of RSA and similar algorithms constitute a theoretical threat against secure devices. Nevertheless, in practice their application remains very difficult not only because of their complexity, but also because of environmental countermeasures integrated by designers that render their application even more difficult. Horizontal side-channel attacks take place in multiple steps. Among them, the most important are the acquisition of a complete trace with a sufficiently high sampling rate, cutting it into regular patterns, the realignment of the obtained patterns, the reduction of noise in the acquired trace as far as possible, the identification of the points of interest and the application of an effective distinguisher. Each of these steps is crucial and leads, if performed without enough attention, to an unsuccessful attack. In this context, this paper introduces effective solutions to efficiently perform all these steps, i.e., practicable means for implementing efficient horizontal attacks.

Journal ArticleDOI
TL;DR: The early life of Horst Feistel is described, in particular, the events shaping his career, which led to the development of today’s high-grade cryptographic algorithms.
Abstract: This paper documents the early life of Horst Feistel, in particular, the events shaping his career. His creativity led to the development of today’s high-grade cryptographic algorithms. We describe Feistel’s successful escape from Nazi Germany, his university training in physics in Zurich and in Boston, and the career change to cryptography. Feistel became a Research Staff Member at the IBM Thomas J. Watson Research Center in Yorktown Heights, New York, in 1968. The cryptographic algorithm LUCIFER encrypts data to secure their contents. It embodies the ideas intrinsic in Feistel’s 1971 IBM patent. Claude Shannon’s 1949 prescription for achieving ideal secrecy was the basis for LUCIFER and its successors DES, 3DES and AES. DES authenticated transactions in the automated teller machine system developed by IBM as part of the Lloyds Bank Cashpoint System in England. Public key cryptography and advances in communication networks would provide a means to secure credit card transactions and lead to a lucrative environment for E-Commerce. The availability of high-grade encryption appears to have drastically limited the National Security Agency’s Signals Intelligence mission. The Department of Justice’s dispute with Apple over the iPhone is an attempt to restrict the commercial availability of high-grade encryption algorithms. It signals the struggle between privacy and national security.

Journal ArticleDOI
TL;DR: This work suggests that using optimal radix encoding results in an asymptotic 50% increase in bandwidth and proposes a polynomial-time approximation algorithm to find an optimal solution.
Abstract: In this work, we explore a combinatorial optimization problem stemming from the Naccache–Stern cryptosystem. We show that solving this problem results in bandwidth improvements, and suggest a polynomial-time approximation algorithm to find an optimal solution. Our work suggests that using optimal radix encoding results in an asymptotic 50% increase in bandwidth.

Journal ArticleDOI
TL;DR: This paper designs an effective countermeasure for HCCA protection by investigating how the side-channel leakage of a schoolbook multiplication depends on the underlying multiplier operands, and shows how changing the sequence in which the operands are passed to the multiplication algorithm introduces dissimilarity in the information leakage.
Abstract: Horizontal collision correlation analysis, in short HCCA, poses a serious threat to simple power analysis-resistant elliptic curve cryptosystems involving unified algorithms, e.g., the Edwards curve unified formula. This attack can be mounted even in the presence of differential power analysis-resistant randomization schemes. In this paper, we have designed an effective countermeasure for HCCA protection, in which the dependency of the side-channel leakage of a schoolbook multiplication on the underlying multiplier operands is investigated. We have shown how changing the sequence in which the operands are passed to the multiplication algorithm introduces dissimilarity in the information leakage. This disparity has been utilized in constructing a minimal-cost countermeasure against HCCA. This countermeasure, integrated with an effective randomization method, has been shown to successfully thwart HCCA. Additionally, we provide experimental validation for our proposed countermeasure technique on a SASEBO platform. To the best of our knowledge, this is the first time that asymmetry in information leakage has been utilized in designing a side-channel countermeasure.
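The asymmetry that the countermeasure builds on can be illustrated in software: in a word-level schoolbook multiplication, swapping the two operands leaves the product unchanged but changes the sequence of operand pairs fed to the word multiplier, and hence the leakage pattern. A toy sketch with 8-bit words (not the authors' implementation):

```python
# Illustration of the operand-order asymmetry in a word-level schoolbook
# multiplication: same product, different sequence of partial products.

def schoolbook(a_words, b_words, trace):
    # a_words/b_words: little-endian lists of 8-bit words
    result = [0] * (len(a_words) + len(b_words))
    for i, a in enumerate(a_words):
        carry = 0
        for j, b in enumerate(b_words):
            trace.append((a, b))      # operand pair seen by the multiplier
            t = result[i + j] + a * b + carry
            result[i + j] = t & 0xFF
            carry = t >> 8
        result[i + len(b_words)] += carry
    return result

x, y = [0x12, 0x34], [0xAB, 0xCD]
t1, t2 = [], []
r1 = schoolbook(x, y, t1)
r2 = schoolbook(y, x, t2)
assert r1 == r2          # same product ...
assert t1 != t2          # ... but a different sequence of partial products
print(t1)
print(t2)
```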