Proceedings Article

Shakti-T: A RISC-V Processor with Light Weight Security Extensions

TL;DR: This work presents a unified hardware framework for handling spatial and temporal memory attacks with a RISC-V based micro-architecture with an enhanced application binary interface that enables software layers to use these features to protect sensitive data.
Abstract: With increased usage of compute cores for sensitive applications, including e-commerce, there is a need to provide additional hardware support for securing information from memory-based attacks. This work presents a unified hardware framework for handling spatial and temporal memory attacks. The paper integrates the proposed hardware framework with a RISC-V based micro-architecture with an enhanced application binary interface that enables software layers to use these features to protect sensitive data. We demonstrate the effectiveness of the proposed scheme through practical case studies in addition to taking the design through a VLSI CAD design flow. The proposed processor reduces the metadata storage overhead by up to 4× in comparison with existing solutions, while incurring an area overhead of just 1914 LUTs and 2197 flip-flops on an FPGA, without affecting the critical path delay of the processor.
Citations
Proceedings Article
30 May 2020
TL;DR: Xuantie-910 is an industry-leading 64-bit high performance embedded RISC-V processor from Alibaba T-Head division that features custom extensions to arithmetic operation, bit manipulation, load and store, TLB and cache operations, and implements the 0.7.1 stable release of the RISC-V vector extension specification for high efficiency vector processing.
Abstract: The open source RISC-V ISA has been quickly gaining momentum. This paper presents Xuantie-910, an industry-leading 64-bit high performance embedded RISC-V processor from Alibaba T-Head division. It is fully based on the RV64GCV instruction set and it features custom extensions to arithmetic operation, bit manipulation, load and store, TLB and cache operations. It also implements the 0.7.1 stable release of RISC-V vector extension specification for high efficiency vector processing. Xuantie-910 supports multi-core multi-cluster SMP with cache coherence. Each cluster contains 1 to 4 core(s) capable of booting the Linux operating system. Each single core utilizes the state-of-the-art 12-stage deep pipeline, out-of-order, multi-issue superscalar architecture, achieving a maximum clock frequency of 2.5 GHz in the typical process, voltage and temperature condition in a TSMC 12nm FinFET process technology. Each single core with the vector execution unit costs an area of 0.8 mm² (excluding the L2 cache). The toolchain is enhanced significantly to support the vector extension and custom extensions. Through hardware and toolchain co-optimization, to date Xuantie-910 delivers the highest performance (in terms of IPC, speed, and power efficiency) for a number of industrial control flow and data computing benchmarks, when compared with its predecessors in the RISC-V family. Xuantie-910 FPGA implementation has been deployed in the data centers of Alibaba Cloud, for application-specific acceleration (e.g., blockchain transaction). The ASIC deployment at low-cost SoC applications, such as IoT endpoints and edge computing, is planned to facilitate Alibaba's end-to-end and cloud-to-edge computing infrastructure.

55 citations


Cites background from "Shakti-T: A RISC-V Processor with L..."

  • ...Some prior arts extended RISC-V to domain-specific accelerators/coprocessors [22], [27]–[29]....

    [...]

Proceedings Article
06 Mar 2019
TL;DR: A lightweight hardware-based secure boot architecture that incorporates an optimized Physical Unclonable Function (PUF) to provide keys to the security blocks of the System on Chip (SoC), including secure boot and remote attestation.
Abstract: Securing thousands of connected, resource-constrained computing devices is a major challenge nowadays. Adding to the challenge, third party service providers need regular access to the system. To ensure the integrity of the system and authenticity of the software vendor, secure boot is supported by several commercial processors. However, the existing solutions are either complex, or have been compromised by determined attackers. In this scenario, open-source secure computing architectures are poised to play an important role for designers and white hat attackers. In this manuscript, we propose a lightweight hardware-based secure boot architecture. The architecture uses efficient implementations of the Elliptic Curve Digital Signature Algorithm (ECDSA), the Secure Hash Algorithm 3 (SHA3) and Direct Memory Access (DMA). In addition, the architecture includes a Key Management Unit, which incorporates an optimized Physical Unclonable Function (PUF) for providing keys to the security blocks of the System on Chip (SoC), including secure boot and remote attestation. We demonstrated the framework on a RISC-V based SoC. Detailed analysis of performance and security for the platform is presented.

29 citations


Cites background from "Shakti-T: A RISC-V Processor with L..."

  • ...Shakti-T [8] employs the concept of base and bounds to ensure that pointers access only valid memory regions....

    [...]

Proceedings Article
23 Jun 2019
TL;DR: The proposal is to use stack-based cookies for crafting fat-pointers instead of having object-based identifiers, which eliminates the use of shadow memory space, or any table to store the pointer metadata, and reduces the storage overheads by a great extent.
Abstract: In this era of IoT devices, security is very often traded off for smaller device footprint and low power consumption. Considering the exponentially growing security threats of IoT and cyber-physical systems, it is important that these devices have built-in features that enhance security. In this paper, we present Shakti-MS, a lightweight RISC-V processor with built-in support for both temporal and spatial memory protection. At run time, Shakti-MS can detect and stymie memory misuse in C and C++ programs, with minimum runtime overheads. The solution uses a novel implementation of fat-pointers to efficiently detect misuse of pointers at runtime. Our proposal is to use stack-based cookies for crafting fat-pointers instead of having object-based identifiers. We store the fat-pointer on the stack, which eliminates the use of shadow memory space, or any table to store the pointer metadata. This reduces the storage overheads by a great extent. The cookie also helps to preserve control flow of the program by ensuring that the return address never gets modified by vulnerabilities like buffer overflows. Shakti-MS introduces new instructions in the microprocessor hardware, and also a modified compiler that automatically inserts these new instructions to enable memory protection. This co-design approach is intended to reduce runtime and area overheads, and also provides an end-to-end solution. The hardware has an area overhead of 700 LUTs on a Xilinx Virtex Ultrascale FPGA and 4100 cells on an open 55nm technology node. The clock frequency of the processor is not affected by the security extensions, while there is a marginal increase in the code size by 11% with an average runtime overhead of 13%.

14 citations


Cites background or methods from "Shakti-T: A RISC-V Processor with L..."

  • ...Although [23] enhances a RISC-V processor to efficiently implement memory checks, the software support required for [23] is extremely complex....

    [...]

  • ...On the other hand, hardware solutions like [23, 25] reduce the run time overhead at the cost of hardware complexity....

    [...]

  • ...
      Work | Safety Check: Spatial | Safety Check: Temporal | Instrumentation: Hardware | Instrumentation: Compiler | Metadata Size | Performance Overhead: Hardware | Performance Overhead: Software
      [33] | ✔ | × | × | ✔ | 128*n | NA | NA
      [27] | ✔ | ✔ | × | ✔ | 256*n + 64 | NA | 29%
      [25] | ✔ | ✔ | ✔ | ✔ | 256*n + 64 | NA | 25%
      [23] | ✔ | ✔ | ✔ | × | 64*n + 128 | 0% | NA
      [7] | ✔ | × | ✔ | × | 128*n | NA | 10%
      Shakti-MS | ✔ | ✔ | ✔ | ✔ | 128*n | 0% | 13%
    ...

    [...]

  • ...Further, unlike [25], we are not using any separate shadow memory space and unlike [23], there are no additional tables or tag bits that are required in the processor to store pointer metadata....

    [...]

Journal Article
TL;DR: A hardware-based countermeasure against return address corruption in the processor stack is proposed and validated on the OpenRISC core with a minimal hardware modification of the targeted core and an easy integration at the application level.
Abstract: With the emergence of the Internet of Things, embedded devices are increasingly the target of software attacks. The aim of these attacks is to maliciously modify the behavior of the software being executed by the device. The work presented in this letter has been developed for the Cyber Security Awareness Week Embedded Security Challenge. This contest focuses on memory corruption issues, such as stack overflow vulnerabilities. These low level vulnerabilities are the result of code errors. Once exploited, they allow an attacker to write arbitrary data in memory without limitations. We detail in this letter a hardware-based countermeasure against return address corruption in the processor stack. First, several exploitation techniques targeting stack return addresses are discussed; then a lightweight hardware countermeasure is proposed and validated on the OpenRISC core. The countermeasure presented follows the shadow stack concept with a minimal hardware modification of the targeted core and an easy integration at the application level.
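
The shadow stack concept named in this abstract can be illustrated with a small software model. This is a minimal sketch, not the OpenRISC hardware design: the class and method names (`ShadowStackMachine`, `call`, `ret`) are hypothetical, and the shadow copy is simply assumed to be unreachable by ordinary stores.

```python
class ReturnAddressCorrupted(Exception):
    """Raised when the main-stack return address differs from the shadow copy."""


class ShadowStackMachine:
    """Toy model: an attacker-writable main stack plus a protected shadow stack."""

    def __init__(self):
        self.main_stack = []    # reachable by a buffer overflow in this model
        self.shadow_stack = []  # assumed unreachable by ordinary stores

    def call(self, return_address):
        # On a call, the return address is pushed to both stacks.
        self.main_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        # On a return, the two copies are compared before control transfers.
        observed = self.main_stack.pop()
        expected = self.shadow_stack.pop()
        if observed != expected:
            raise ReturnAddressCorrupted(hex(observed))
        return expected
```

Overwriting `main_stack[-1]` between `call` and `ret` models a stack overflow that reaches the return address; in this sketch the mismatch is caught at `ret`.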

12 citations


Cites background or methods from "Shakti-T: A RISC-V Processor with L..."

  • ...On the other hand, ISA extensions such as Shakti-T [9] and Watchdog Lite [10] aim at mitigating pointer hijacking....

    [...]

  • ...First, those that use specific toolchains, compilers [9], [10] or library to adapt an applica-...

    [...]

  • ...To identify pointers, Shakti-T, and Watchdog Lite need to instrument the code in advance using compiler modification....

    [...]

Journal Article
01 Dec 2020
TL;DR: This manuscript discusses a set of primitive building blocks of a secure SoC and presents some of the implemented security subsystems using these building blocks—such as secure boot, memory protection, PUF-based key management, a countermeasure methodology for RISC-V micro-architectural side-channel leakage, and an integration of the open keystone-enclaves for TEE.
Abstract: A rising tide of exploits, in the recent years, following a steady discovery of the many vulnerabilities pervasive in modern computing systems has led to a growing number of studies in designing systems-on-chip (SoCs) with security as a first-class consideration. Following the momentum behind RISC-V-based systems in the public domain, much of this effort targets RISC-V-based SoCs; most ideas, however, are independent of this choice. In this manuscript, we present a consolidation of our early efforts along these lines in designing a secure SoC around RISC-V, named ITUS. In particular, we discuss a set of primitive building blocks of a secure SoC and present some of the implemented security subsystems using these building blocks—such as secure boot, memory protection, PUF-based key management, a countermeasure methodology for RISC-V micro-architectural side-channel leakage, and an integration of the open keystone-enclaves for TEE. The current ITUS SoC prototype, integrating the discussed security subsystems, was built on top of the lowRISC project; however, these are portable to any other SoC code base. The SoC prototype has been evaluated on an FPGA.

9 citations

References
Proceedings Article
01 Mar 2008
TL;DR: A hardware bounded pointer architectural primitive that supports cooperative hardware/software enforcement of spatial memory safety for C programs is proposed, which is a new hardware primitive datatype for pointers that leaves the standard C pointer representation intact, but augments it with bounds information maintained separately and invisibly by the hardware.
Abstract: The C programming language is at least as well known for its absence of spatial memory safety guarantees (i.e., lack of bounds checking) as it is for its high performance. C's unchecked pointer arithmetic and array indexing allow simple programming mistakes to lead to erroneous executions, silent data corruption, and security vulnerabilities. Many prior proposals have tackled enforcing spatial safety in C programs by checking pointer and array accesses. However, existing software-only proposals have significant drawbacks that may prevent wide adoption, including: unacceptably high run-time overheads, lack of completeness, incompatible pointer representations, or need for non-trivial changes to existing C source code and compiler infrastructure. Inspired by the promise of these software-only approaches, this paper proposes a hardware bounded pointer architectural primitive that supports cooperative hardware/software enforcement of spatial memory safety for C programs. This bounded pointer is a new hardware primitive datatype for pointers that leaves the standard C pointer representation intact, but augments it with bounds information maintained separately and invisibly by the hardware. The bounds are initialized by the software, and they are then propagated and enforced transparently by the hardware, which automatically checks a pointer's bounds before it is dereferenced. One mode of use requires instrumenting only malloc, which enables enforcement of per-allocation spatial safety for heap-allocated objects for existing binaries. When combined with simple intraprocedural compiler instrumentation, hardware bounded pointers enable a low-overhead approach for enforcing complete spatial memory safety in unmodified C programs.
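
The three steps this abstract describes (bounds initialized at allocation, propagated through pointer arithmetic, checked before dereference) can be modelled in software. This is a rough sketch under stated assumptions: `BoundedPointer`, `bounded_malloc` and `load` are hypothetical names, the bump allocator is invented for illustration, and the bounds are kept inside a Python object rather than in separate hardware-managed storage as the paper proposes.

```python
from dataclasses import dataclass


class BoundsViolation(Exception):
    """Raised when an access falls outside the pointer's [base, bound) range."""


@dataclass(frozen=True)
class BoundedPointer:
    addr: int   # the ordinary pointer value
    base: int   # first valid byte of the allocation
    bound: int  # one past the last valid byte

    def offset(self, delta):
        # Pointer arithmetic changes the address; the bounds are carried along.
        return BoundedPointer(self.addr + delta, self.base, self.bound)


def bounded_malloc(heap_top, size):
    """Hypothetical bump allocator: returns (pointer, new heap top)."""
    return BoundedPointer(heap_top, heap_top, heap_top + size), heap_top + size


def load(memory, ptr, width=1):
    # The bounds check happens before the dereference.
    if ptr.addr < ptr.base or ptr.addr + width > ptr.bound:
        raise BoundsViolation(f"access at {ptr.addr:#x} outside allocation")
    return bytes(memory[ptr.addr:ptr.addr + width])
```

Here only the allocator sets bounds, which loosely mirrors the "instrument only malloc" mode of use; `memory` is assumed to be a `bytearray` standing in for the address space.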

231 citations


"Shakti-T: A RISC-V Processor with L..." refers background in this paper

  • ...This overhead can be justified by the additional implicit security guarantee provided by the PLM against temporal memory attacks, which is absent in [14, 37]....

    [...]

  • ...Under this scenario, the solution proposed in [14, 37] would require a storage overhead of 2n words (base and bounds for each pointer), while the solution proposed in [23, 24] would incur a storage overhead of 4n + 1 words (base, bound, lock and key per pointer; and a single key value for the entire set)....

    [...]

  • ...• Reduced metadata storage overheads (of the order of 2× and 4× over [14] and [23, 24] respectively) by using a common memory region across all pointers to store the base and bounds....

    [...]

  • ...While Shakti-T clearly offers benefits in scenarios where aliased pointers exist, in scenarios where all pointers point to different memory regions it incurs relatively higher storage overheads (three words per pointer - ptr_id, base and bound) as compared to [14, 37] which incurs an overhead of only two words per pointer (base and bound)....

    [...]

  • ...Hardware solutions [14, 37, 27], on the contrary, incur minimal run-time overheads in addition to providing strong security guarantees....

    [...]
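
The word counts quoted in these excerpts can be checked with a little arithmetic. The sketch below takes 2n, 4n + 1 and three words per pointer directly from the excerpts. The aliased Shakti-T figure of n + 2 words (one ptr_id per pointer plus a single shared base and bound) is an assumption inferred from the "common memory region" description, and the function names are hypothetical.

```python
def base_bounds_words(n):
    # Base and bound stored per pointer, as quoted for [14, 37].
    return 2 * n


def lock_key_words(n):
    # Base, bound, lock and key per pointer, plus one key for the set, as quoted for [23, 24].
    return 4 * n + 1


def shakti_t_distinct_words(n):
    # Every pointer targets a different region: ptr_id, base and bound each.
    return 3 * n


def shakti_t_aliased_words(n):
    # ASSUMPTION: n pointers alias one region, so they share one base and one bound.
    return n + 2


if __name__ == "__main__":
    n = 1000
    aliased = shakti_t_aliased_words(n)
    print(base_bounds_words(n) / aliased)  # approaches 2 as n grows
    print(lock_key_words(n) / aliased)     # approaches 4 as n grows
```

Under that assumption the ratios tend towards the "of the order of 2× and 4×" figures quoted above, while the distinct-region case costs 3n words against 2n for plain base and bounds.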

Proceedings Article
14 Apr 2015
TL;DR: This work studies the inherent overheads of shadow stack schemes, and designs a new scheme, the parallel shadow stack, and shows that its performance cost is significantly less than the traditional shadow stack: 3.5%.
Abstract: Control flow defenses against ROP either use strict, expensive, but strong protection against redirected RET instructions with shadow stacks, or much faster but weaker protections without. In this work we study the inherent overheads of shadow stack schemes. We find that the overhead is roughly 10% for a traditional shadow stack. We then design a new scheme, the parallel shadow stack, and show that its performance cost is significantly less: 3.5%. Our measurements suggest it will not be easy to improve performance on current x86 processors further, due to inherent costs associated with RET and memory load/store instructions. We conclude with a discussion of the design decisions in our shadow stack instrumentation, and possible lighter-weight alternatives.

214 citations


Additional excerpts

  • ...Some of the proposed solutions include: stack canaries [8]; encryption of the code pointer [9]; storing the return address in a shadow stack [11, 33, 12]; re-arranging argument locations, return addresses, previous frame pointers and local variables [34]; control flow integrity checks [1]; and, Address Space Layout Randomization (ASLR) [31]....

    [...]

Journal Article
01 Jul 2005
TL;DR: Observing the Blaster worm's activity can provide insight into the evolution of Internet worms.
Abstract: The Blaster worm of 2003 infected at least 100,000 Microsoft Windows systems and cost millions in damage. In spite of cleanup efforts, an antiworm, and a removal tool from Microsoft, the worm persists. Observing the worm's activity can provide insight into the evolution of Internet worms.

140 citations


"Shakti-T: A RISC-V Processor with L..." refers methods in this paper

  • ...Researchers have also found several ways to exploit this vulnerability, such as the blaster worm [5] and the slammer worm [21] which have been used to perform Distributed Denial of Service attacks within a network....

    [...]

Proceedings Article
04 Nov 2013
TL;DR: To achieve the safety of fat pointers without increasing program state, this work compactly encode approximate base and bound pointers along with exact address pointers for a 46b address space into one 64-bit word with a worst-case memory overhead of 3%.
Abstract: Referencing outside the bounds of an array or buffer is a common source of bugs and security vulnerabilities in today's software. We can enforce spatial safety and eliminate these violations by inseparably associating bounds with every pointer (fat pointer) and checking these bounds on every memory access. By further adding hardware-managed tags to the pointer, we make them unforgeable. This, in turn, allows the pointers to be used as capabilities to facilitate fine-grained access control and fast security domain crossing. Dedicated checking hardware runs in parallel with the processor's normal datapath so that the checks do not slow down processor operation (0% runtime overhead). To achieve the safety of fat pointers without increasing program state, we compactly encode approximate base and bound pointers along with exact address pointers for a 46b address space into one 64-bit word with a worst-case memory overhead of 3%. We develop gate-level implementations of the logic for updating and validating these compact fat pointers and show that the hardware requirements are low and the critical paths for common operations are smaller than processor ALU operations. Specifically, we show that the fat-pointer check and update operations can run in a 4 ns clock cycle on a Virtex 6 (40nm) implementation while only using 1100 6-LUTs or about the area of a double-precision, floating-point adder.

130 citations


"Shakti-T: A RISC-V Processor with L..." refers methods in this paper

  • ...A lightweight hardware implementation of fat-pointers (Low-Fat pointers), as proposed in [20], reduces this storage overhead by encoding the base and bounds into a custom 16-bit floating point format which is stored in the upper bits of the 64-bit virtual address....

    [...]

Journal Article
09 Jun 2012
TL;DR: This paper proposes Watchdog, a hardware-based approach for safe and secure manual memory management, and extends its mechanisms to detect bounds errors, thereby providing full hardware-enforced memory safety at low overheads.
Abstract: Languages such as C and C++ use unsafe manual memory management, allowing simple bugs (i.e., accesses to an object after deallocation) to become the root cause of exploitable security vulnerabilities. This paper proposes Watchdog, a hardware-based approach for ensuring safe and secure manual memory management. Inspired by prior software-only proposals, Watchdog generates a unique identifier for each memory allocation, associates these identifiers with pointers, and checks to ensure that the identifier is still valid on every memory access. This use of identifiers and checks enables Watchdog to detect errors even in the presence of reallocations. Watchdog stores these pointer identifiers in a disjoint shadow space to provide comprehensive protection and ensure compatibility with existing code. To streamline the implementation and reduce runtime overhead: Watchdog (1) uses micro-ops to access metadata and perform checks, (2) eliminates metadata copies among registers via modified register renaming, and (3) uses a dedicated metadata cache to reduce checking overhead. Furthermore, this paper extends Watchdog's mechanisms to detect bounds errors, thereby providing full hardware-enforced memory safety at low overheads.
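
The identifier scheme in this abstract (a unique identifier per allocation, associated with each pointer and validated on every access) can be sketched as follows. This is a simplified software model, not Watchdog's micro-op or shadow-space implementation: `IdentifierHeap` and its methods are hypothetical, and the identifier travels inside the pointer tuple instead of a disjoint shadow space.

```python
import itertools


class UseAfterFree(Exception):
    """Raised when a pointer's allocation identifier is no longer valid."""


class IdentifierHeap:
    """Toy model of identifier-based temporal safety checks."""

    def __init__(self):
        self._next_id = itertools.count(1)  # identifiers are never reused
        self._live_ids = set()

    def malloc(self, address):
        # Each allocation receives a fresh identifier, even at a reused address.
        ident = next(self._next_id)
        self._live_ids.add(ident)
        return (address, ident)

    def free(self, ptr):
        _, ident = ptr
        self._live_ids.discard(ident)

    def check(self, ptr):
        # Performed on every access: the identifier must still be live.
        address, ident = ptr
        if ident not in self._live_ids:
            raise UseAfterFree(f"stale pointer to {address:#x}")
        return address
```

Because identifiers are never reused, a stale pointer is rejected even after its address has been handed out again, which is the reallocation case the abstract highlights.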

124 citations