Author

Yunhai Shang

Bio: Yunhai Shang is an academic researcher from Alibaba Group. The author has contributed to research in topics: Reduced instruction set computing & RISC-V. The author has an h-index of 1 and has co-authored 2 publications receiving 11 citations.

Papers
Proceedings ArticleDOI
30 May 2020
TL;DR: Xuantie-910 is an industry-leading 64-bit high-performance embedded RISC-V processor from Alibaba's T-Head division that features custom extensions to arithmetic operation, bit manipulation, load and store, TLB and cache operations, and implements the 0.7.1 stable release of the RISC-V vector extension specification for high-efficiency vector processing.
Abstract: The open source RISC-V ISA has been quickly gaining momentum. This paper presents Xuantie-910, an industry leading 64-bit high performance embedded RISC-V processor from Alibaba T-Head division. It is fully based on the RV64GCV instruction set and it features custom extensions to arithmetic operation, bit manipulation, load and store, TLB and cache operations. It also implements the 0.7.1 stable release of RISC-V vector extension specification for high efficiency vector processing. Xuantie-910 supports multi-core multi-cluster SMP with cache coherence. Each cluster contains 1 to 4 core(s) capable of booting the Linux operating system. Each single core utilizes the state-of-the-art 12-stage deep pipeline, out-of-order, multi-issue superscalar architecture, achieving a maximum clock frequency of 2.5 GHz in the typical process, voltage and temperature condition in a TSMC 12nm FinFET process technology. Each single core with the vector execution unit costs an area of 0.8 mm² (excluding the L2 cache). The toolchain is enhanced significantly to support the vector extension and custom extensions. Through hardware and toolchain co-optimization, to date Xuantie-910 delivers the highest performance (in terms of IPC, speed, and power efficiency) for a number of industrial control flow and data computing benchmarks, when compared with its predecessors in the RISC-V family. Xuantie-910 FPGA implementation has been deployed in the data centers of Alibaba Cloud, for application-specific acceleration (e.g., blockchain transaction). The ASIC deployment at low-cost SoC applications, such as IoT endpoints and edge computing, is planned to facilitate Alibaba's end-to-end and cloud-to-edge computing infrastructure.

55 citations

Journal ArticleDOI
TL;DR: In this paper, a hardware non-invasive mapping method for condition bits (HNIMCB) is proposed for binary translation from source architectures with condition bit operations to target architectures without condition bit operations.
Abstract: Binary translation, as an important bridge for application compatibility between different instruction set architectures (ISAs), has attracted much attention in the industry. However, due to hardware resource limitations of the target ISA, the translation efficiency and the practicability are poor. Recently, Apple has made it possible to run x86 programs on ARM through a translation technology called Rosetta based on software-hardware collaboration. In this paper, we propose a hardware non-invasive mapping method for condition bits (HNIMCB) in binary translation, which implements the setting and referencing operations of the condition bits without changing the original instruction encoding and function of the target processor. This method is applicable to binary translation from source architectures with condition bit operations to target architectures without condition bit operations. It eliminates the difference in condition bit resources between the source and target ISAs, reduces the computational instructions and memory access operations after translation from the source to the target ISA, and dramatically improves the translation efficiency. We conducted this experiment at the functional simulation level using the QEMU binary translator from ARM to RISC-V. A series of benchmark tests revealed that the total number of instructions decreased by 41%, while the number of memory access instructions decreased by 37%, after translation with HNIMCB applied.
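
To make the overhead described in this abstract concrete, the following minimal sketch shows what a pure-software translator has to compute when it emulates the ARM condition bits (N, Z, C, V) for one 32-bit flag-setting add on a target without condition bits. This is not the HNIMCB method and not QEMU code; the function name `adds_flags` and the dictionary layout are hypothetical, chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF  # 32-bit register width


def adds_flags(a: int, b: int):
    """Emulate a 32-bit flag-setting add and return (result, flags).

    Hypothetical helper: each flag below costs extra target
    instructions when the target ISA has no condition bits.
    """
    a &= MASK32
    b &= MASK32
    full = a + b                      # unbounded sum, keeps the carry
    result = full & MASK32            # value written to the register
    n = result >> 31                  # N: sign bit of the result
    z = int(result == 0)              # Z: result is zero
    c = int(full > MASK32)            # C: unsigned carry out of bit 31
    # V: operands share a sign that differs from the result's sign
    v = ((a ^ result) & (b ^ result)) >> 31
    return result, {"N": n, "Z": z, "C": c, "V": v}


if __name__ == "__main__":
    print(adds_flags(0x7FFFFFFF, 1))  # signed overflow case
    print(adds_flags(0xFFFFFFFF, 1))  # unsigned carry case
```

A single source instruction therefore expands into several arithmetic and compare operations on the target, which is the kind of cost a hardware mapping of condition bits aims to remove.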

Cited by
Proceedings ArticleDOI
11 May 2021
TL;DR: In this article, the authors compared the most prominent open-source application-class RISC-V projects by running identical benchmarks on identical platforms with defined configuration settings, including the Rocket, BOOM, CVA6, and SHAKTI C-Class implementations.
Abstract: The numerous emerging implementations of RISC-V processors and frameworks underline the success of this Instruction Set Architecture (ISA) specification. The free and open source character of many implementations facilitates their adoption in academic and commercial projects. As yet it is not easy to say which implementation fits best for a system with given requirements such as processing performance or power consumption. With varying backgrounds and histories, the developed RISC-V processors are very different from each other. Comparisons are difficult, because results are reported for arbitrary technologies and configuration settings. Scaling factors are used to draw comparisons, but this gives only rough estimates. In order to give more substantiated results, this paper compares the most prominent open-source application-class RISC-V projects by running identical benchmarks on identical platforms with defined configuration settings. The Rocket, BOOM, CVA6, and SHAKTI C-Class implementations are evaluated for processing performance, area and resource utilization, power consumption as well as efficiency. Results are presented for the Xilinx Virtex UltraScale+ family and GlobalFoundries 22FDX ASIC technology.

35 citations

Journal ArticleDOI
TL;DR: Vector architectures lack tools for research, so the gem5 simulator, which is possibly the leading platform for computer-system architecture research, does not have an ava...
Abstract: Vector architectures lack tools for research. Consider the gem5 simulator, which is possibly the leading platform for computer-system architecture research. Unfortunately, gem5 does not have an available distribution that includes a flexible and customizable vector architecture model. In consequence, researchers have to develop their own simulation platform to test their ideas, which consumes much research time. However, once the base simulator platform is developed, another question is the following: Which applications should be tested to perform the experiments? The lack of Vectorized Benchmark Suites is another limitation. To face these problems, this work presents a set of tools for designing and evaluating vector architectures. First, the gem5 simulator was extended to support the execution of RISC-V Vector instructions by adding a parameterizable Vector Architecture model for designers to evaluate different approaches according to the target they pursue. Second, a novel Vectorized Benchmark Suite is presented: a collection composed of seven data-parallel applications from different domains that can be classified according to the modules that are stressed in the vector architecture. Finally, a study of the Vectorized Benchmark Suite executing on the gem5-based Vector Architecture model is highlighted. This suite is the first in its category that covers the different possible usage scenarios that may occur within different vector architecture designs such as embedded systems, mainly focused on short vectors, or High-Performance-Computing (HPC), usually designed for large vectors.

20 citations

Proceedings ArticleDOI
13 Jun 2021
TL;DR: In this article, the authors proposed a method to evaluate the energy consumption using power and performance values, and shortlisted the most optimized cores for resource-constrained devices and implemented them using an ASIC prototyping platform as a unified technology.
Abstract: Resource-constrained electronic devices targeting IoT applications need a microcontroller to control their operation, and this microcontroller should be energy efficient to increase battery life. RISC-V is an emerging open-source processor ISA; being open source, it provides a great opportunity for innovation and creativity in designing processor cores. This study surveys open-source RISC-V cores and classifies them as high-performance or resource-constrained. Afterward, we shortlist the most optimized cores for resource-constrained devices. Seven shortlisted cores are implemented using an ASIC prototyping platform as a unified technology and are compared using resource utilization and energy consumption profile to find the most energy-efficient core. We propose a method to evaluate the energy consumption using power and performance values. Results show that the Ibex core has the best energy consumption characteristics, with 8.7 CoreMark iterations/mJ on the ASIC prototyping platform.
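
The abstract does not spell out the evaluation method, but an iterations-per-millijoule figure can be derived from power and performance values by unit conversion alone: since 1 mW equals 1 mJ/s, dividing iterations per second by power in mW gives iterations per mJ. The sketch below assumes exactly that relationship; the function name and the input values are hypothetical and are not taken from the paper.

```python
def iterations_per_mj(iterations_per_second: float, power_mw: float) -> float:
    """Energy efficiency in benchmark iterations per millijoule.

    Assumed relationship (not the paper's stated method):
    (iterations / s) / (mJ / s) = iterations / mJ, with 1 mW = 1 mJ/s.
    """
    if power_mw <= 0:
        raise ValueError("power_mw must be positive")
    return iterations_per_second / power_mw


if __name__ == "__main__":
    # Made-up illustrative inputs, not measurements from any core.
    print(iterations_per_mj(iterations_per_second=100.0, power_mw=20.0))
```

Expressing the result per unit of energy rather than per unit of time lets cores that run at different clock frequencies and power levels be ranked on a single axis.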

8 citations

Proceedings ArticleDOI
01 Jul 2022
TL;DR: This work presents its first open-source implementation of the RISC-V V extension, discusses the new specification's impact on the micro-architecture of a lane-based design, and provides insights on performance-oriented design of coupled scalar-vector processors.
Abstract: Vector architectures are gaining traction for highly efficient processing of data-parallel workloads, driven by all major ISAs (RISC-V, Arm, Intel), and boosted by landmark chips, like the Arm SVE-based Fujitsu A64FX, powering the TOP500 leader Fugaku. The RISC-V V extension has recently reached 1.0-Frozen status. Here, we present its first open-source implementation, discuss the new specification's impact on the micro-architecture of a lane-based design, and provide insights on performance-oriented design of coupled scalar-vector processors. Our system achieves comparable/better PPA than state-of-the-art vector engines that implement older RVV versions: 15% better area, 6% improved throughput, and FPU utilization >98.5% on crucial kernels.

7 citations

Proceedings ArticleDOI
01 Oct 2022
TL;DR: MINJIE, an open-source platform supporting agile processor development flow that integrates a broad set of tools for logic design, functional verification, performance modelling, pre-silicon validation and debugging for better development efficiency of state-of-the-art processor designs is proposed.
Abstract: While research has shown that the agile chip design methodology is promising to sustain the scaling of computing performance in a more efficient way, it is still of limited usage in actual applications due to two major obstacles: 1) Lack of tool-chain and developing framework supporting agile chip design, especially for large-scale modern processors. 2) The conventional verification methods are less agile and become a major bottleneck of the entire process. To tackle both issues, we propose MINJIE, an open-source platform supporting agile processor development flow. MINJIE integrates a broad set of tools for logic design, functional verification, performance modelling, pre-silicon validation and debugging for better development efficiency of state-of-the-art processor designs. We demonstrate the usage and effectiveness of MINJIE by building two generations of an open-source superscalar out-of-order RISC-V processor code-named XIANGSHAN using agile methodologies. We quantify the performance of XIANGSHAN using SPEC CPU2006 benchmarks and demonstrate that XIANGSHAN achieves industry-competitive performance.

6 citations