Author
Qi Chen
Bio: Qi Chen is an academic researcher from Tianjin University. The author has contributed to research in the topics Computer science and Virtual routing and forwarding. The author has an h-index of 2 and has co-authored 2 publications receiving 12 citations.
Papers
TL;DR: This paper proposes the first known approach to protect the authorship and the usage legitimacy of NoCs using a specially designed routing algorithm, square spiral routing, which exploits the routing redundancy inherent in mesh NoCs and transports packets along paths that have a very low probability of being taken under commonly used routing algorithms.
Abstract: Intellectual property (IP) core reuse is essential to the system-on-chip (SoC) design process. The network-on-chip (NoC) has been used as an independent IP core during SoC design. However, the NoC has not been covered by IP protection, and little attention has been paid to protecting its innovations. This paper proposes the first known approach to protect the authorship and the usage legitimacy of NoCs using a specially designed routing algorithm, square spiral routing. The algorithm exploits the routing redundancy inherent in mesh NoCs and transports packets along paths that have a very low probability of being taken under commonly used routing algorithms. These unique and diverse paths are exploited to embed information about the author and to identify the legal buyer of the NoC, showing high robustness and credibility. The hardware implementation of an IP-protected mesh NoC shows that the overhead is small, about 0.74% in area and about 0.52% in power, while the functionality and performance of the network are not affected. The approach is presented for the mesh NoC, but the idea is equally applicable to other NoC topologies in which such unique and diverse paths also inherently exist.
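The routing redundancy mentioned in this abstract can be illustrated with a small sketch. It is not the paper's square spiral routing, whose details are not given here. It only shows, for a 2D mesh, the single path chosen by ordinary dimension-ordered (XY) routing next to the number of shortest paths that exist between the same two nodes. The function names `xy_route` and `minimal_path_count` and the coordinate convention are assumptions made for illustration.

```python
from math import comb


def xy_route(src, dst):
    """Dimension-ordered (XY) routing: move along X first, then along Y."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


def minimal_path_count(src, dst):
    """Number of distinct shortest paths between two nodes of a 2D mesh."""
    hops_x = abs(dst[0] - src[0])
    hops_y = abs(dst[1] - src[1])
    # Choose which of the (hops_x + hops_y) hops are taken in the X direction.
    return comb(hops_x + hops_y, hops_x)


route = xy_route((0, 0), (3, 2))
print(route)                                # the one path XY routing takes
print(minimal_path_count((0, 0), (3, 2)))   # shortest paths available: 10
```

XY routing always uses one of the ten shortest paths in this example, so the other nine, together with any non-minimal paths, are rarely or never taken by common deterministic routing. That unused path diversity is the kind of redundancy the abstract describes exploiting.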
9 citations
01 Sep 2015
TL;DR: This paper divides the logic operations of the router into two pipeline stages, implements a modified backpressure flow control mechanism to support high-frequency pipelined routing, and designs a high-throughput architecture based on the fine-grained configurability and customizability provided by FPGAs.
Abstract: Emerging System-on-Chip (SoC) applications on FPGAs have boosted FPGA-based Network-on-Chip (NoC) implementations. Router microarchitecture plays a central role in the performance of an NoC. This paper investigates the router architecture in detail and designs a high-throughput architecture based on the fine-grained configurability and customizability provided by FPGAs. Specifically, we 1) divide the logic operations of the router into two pipeline stages and implement a modified backpressure flow control mechanism to support high-frequency pipelined routing; 2) explore different buffering schemes to find an architecture that can sustain low queuing delays. The experimental results show that the pipelined architecture achieves an operating clock frequency of over 400 MHz, which is 3 times higher than that of an open source FPGA-based NoC, CONNECT, leading to about 5 times improvement in the network saturation throughput.
6 citations
01 Jan 2023
TL;DR: This paper proposes DFBUFFER, a user-space forwarding mechanism whose data write buffer asynchronously handles write requests to accelerate the write bandwidth of compute nodes, and evaluates it on the Sunway exascale prototype system.
Abstract: Most supercomputers adopt a data forwarding architecture to achieve storage scalability. However, it results in a significant reduction in single-process bandwidth compared to direct file system access. Moreover, because a majority of applications use only a single process for writing and reading data, the low single-process performance also leads to a time overhead for these applications. This paper proposes a user-space forwarding mechanism, DFBUFFER, with two performance optimization methods: user-space multi-thread request processing and a per-file data write buffer. The client of DFBUFFER is embedded in the application as a library, reducing the software overhead, and the server implements multi-thread I/O request processing to improve bandwidth efficiency. The data write buffer can asynchronously handle write requests, which accelerates the write bandwidth of compute nodes. We evaluate DFBUFFER on the Sunway exascale prototype system. The results indicate that in the regular mode of DFBUFFER, both the write and read latency are reduced, and the single-process write bandwidth and large-block read bandwidth are increased by 1.8 times and 2.8 times respectively. The DFBUFFER buffer mode increases the write bandwidth of a single process by 0.8 times over the regular mode. Although the performance advantage of the regular mode of DFBUFFER gradually weakens as the number of concurrent processes increases, the buffer mode still improves the write bandwidth: for an application with 64 I/O processes, it is increased by 0.2 times.
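The idea of a write buffer that handles requests asynchronously can be sketched generically as a write-behind buffer. This is not DFBUFFER's implementation, which the abstract does not detail. It is a minimal illustration, assuming a single background worker thread and an in-memory sink, and the class name `WriteBehindBuffer` is hypothetical.

```python
import io
import queue
import threading


class WriteBehindBuffer:
    """Accepts writes immediately and drains them to `sink` on a worker thread."""

    def __init__(self, sink):
        self._sink = sink
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, data: bytes) -> int:
        # Return as soon as the data is queued; the caller does not wait
        # for the (possibly slow) sink.
        self._pending.put(data)
        return len(data)

    def _drain(self):
        while True:
            data = self._pending.get()
            if data is None:  # sentinel: no more writes will arrive
                self._pending.task_done()
                return
            self._sink.write(data)
            self._pending.task_done()

    def close(self):
        # Wait until every queued write has reached the sink, then stop the worker.
        self._pending.put(None)
        self._pending.join()
        self._worker.join()


sink = io.BytesIO()
buf = WriteBehindBuffer(sink)
buf.write(b"hello ")
buf.write(b"world")
buf.close()
print(sink.getvalue())
```

Because a single worker drains a FIFO queue, writes reach the sink in the order they were issued. The caller only pays the cost of enqueueing, which is the general reason such a buffer can raise the write bandwidth observed by the application.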
Proceedings Article
01 Nov 2022 - International Conference for High Performance Computing, Networking, Storage and Analysis
TL;DR: SeqDLM is a distributed lock manager that exploits the sequencer mechanism to mitigate the lock conflict resolution overhead using early grant and early revocation while keeping the same semantics as traditional distributed locks.
Abstract: Distributed locks are used to guarantee the distributed client-cache coherence in parallel file systems. However, they lead to poor performance in the case of parallel writes under high-contention workloads. We analyze the distributed lock manager and find that lock conflict resolution is the root cause of the poor performance, which involves frequent lock revocations and slow data flushing from client caches to data servers. We design a distributed lock manager named SeqDLM by exploiting the sequencer mechanism. SeqDLM mitigates the lock conflict resolution overhead using early grant and early revocation while keeping the same semantics as traditional distributed locks. To evaluate SeqDLM, we have implemented a parallel file system called ccPFS using both SeqDLM and traditional distributed locks. Evaluations on 96 nodes show SeqDLM outperforms the traditional distributed locks by up to $10.3\times$ for high-contention parallel writes on a shared file with multiple stripes.
Cited by
TL;DR: ProNoC is an integrated tool for rapid prototyping and validation of NoC-based MCSoC projects targeting FPGA devices and adopts most advanced NoC features such as the support of virtual channel (VC), virtual network, low latency routing and different routing algorithms.
Abstract: Network-on-chip (NoC) is an emerging interconnect infrastructure that addresses the scalability limitation of the conventional shared bus architecture for many-core systems-on-chip (MCSoCs). Current field-programmable gate arrays (FPGAs) have over a million lookup tables, making it possible to prototype a complete NoC-based MCSoC on a single FPGA device. FPGA prototyping allows rapid system verification and estimation of optimum design parameters. However, existing NoC-based MCSoC prototypes usually adopt simple NoC architectural functionality. These NoC prototypes cannot represent a realistic projection of state-of-the-art application-specific integrated circuit (ASIC) NoCs because they have limited overall system performance. This paper presents ProNoC, an integrated tool for rapid prototyping and validation of NoC-based MCSoC projects targeting FPGA devices. ProNoC adopts most advanced NoC features, such as support for virtual channels (VCs), virtual networks, low-latency routing, and different routing algorithms. Results show that the NoC interconnect in ProNoC outperforms CONNECT, the most recent VC-based prototype NoC, with lower logic cell utilization, higher maximum operating frequency, higher average saturation throughput, and lower average communication latency. Moreover, ProNoC is equipped with a graphical user interface to facilitate the development of MCSoC prototypes on FPGA platforms.
51 citations
TL;DR: This paper proposes a new publicly verifiable watermarking detection technique based on chaos-based zero-knowledge interaction and time stamping to resiliently resist the sensitive information leakage and embedding attacks, and is thus robust to the cheating from the prover, verifier, or third party.
Abstract: Watermarking, as a novel intellectual property (IP) protection technique, can protect field-programmable gate array IPs from infringement. However, existing watermarking techniques may give away sensitive information during public verification, which enables malicious verifiers or third parties to remove the embedded watermark and resell the design. Current zero-knowledge watermarking verification schemes can address the sensitive information leakage issue but are vulnerable to embedding attacks, which makes them ineffective in preventing untrusted buyers (verifiers) from denying infringement. This paper proposes a new publicly verifiable watermarking detection technique based on chaos-based zero-knowledge interaction and time stamping that resists sensitive information leakage and embedding attacks, and is thus robust to cheating by the prover, verifier, or third party. Experimental results and analysis show that the proposed method has better robustness than the most recent related literature.
18 citations
TL;DR: This work addresses fault tolerance and security at NoC level with SDR, a routing algorithm that includes the concept of security zones in the MPSoC while providing support for dependable routing avoiding faulty links.
Abstract: The Internet-of-Things (IoT) boosted the building of computational systems that share computation, communication and storage resources for countless types of applications. The MultiProcessor System-on-Chip (MPSoC) is a fundamental component of such systems, offering a large degree of parallelism in an ocean of processors and memories connected through one or more Networks-on-Chip (NoCs). Therefore, a massive quantity of sensitive information from several applications can share the computation and communication resources of the MPSoCs, demanding security mechanisms and policies. Besides, the advances of CMOS technologies increase the quantity of static and dynamic faults, requiring a dependable and resilient target architecture, which can be partially fulfilled by an effective and efficient NoC design. This work addresses fault tolerance and security at NoC level with SDR, a routing algorithm that includes the concept of security zones in the MPSoC while providing support for dependable routing avoiding faulty links. The proposed routing algorithm prioritizes communication paths deemed secure in 2D mesh NoCs with deadlock freedom. Experimental results employing realistic workload scenarios based on the NASA Numeric Aerodynamic Simulation (NAS) Parallel Benchmark (NPB) and a fault model for 65nm and 22nm CMOS fabrication technologies demonstrate the scalability, security, and dependability of SDR.
10 citations
TL;DR: A study treats an outstanding concept for system-on-chip communication introduced as communication network on-chip (NoC), which includes the NoC basics, network topology, relevant research issues and different abstraction levels.
Abstract: Large scale System-on-Chip (SoC) has been enabled by the scaling of microchip technologies. As data intensive applications have emerged and processing power has increased, the threat of the communication components on single-chip systems introduced network on chip (NoC). NoC provides the concept of intra-chip communication. This paper studies the network-on-chip (NoC) as a concept for system-on-chip communication. It covers the NoC basics, network topology, relevant research issues and different abstraction levels.
5 citations
01 Dec 2017
TL;DR: A novel NoC architecture for FPGA-based MPSoCs that combines data transfers with application-specific processing by adding high-level synthesized processing units to routers of the NoC is presented.
Abstract: The end of Dennard scaling led to the use of heterogeneous Multi-Processor Systems-on-Chip (MPSoCs). Heterogeneous MPSoCs provide high efficiency in terms of energy and performance because each processing element can be optimized for an application task. However, the evolution of MPSoCs shows a growing number of processing elements (PEs), which leads to tremendous communication costs that tend to become the performance bottleneck. Networks-on-Chip (NoCs) are a promising and scalable intra-chip communication technology for MPSoCs. This paper presents a novel NoC architecture for FPGA-based MPSoCs that combines data transfers with application-specific processing by adding high-level synthesized processing units to routers of the NoC. The execution of application-specific operations during data exchange between PEs efficiently exploits the transmission time. Furthermore, the processing units can be programmed in C/C++ using high-level synthesis and can therefore be specifically optimized for an application. This approach allows transferred data to be processed by a processing element, such as a MicroBlaze processor, before the transmission or by a router during the transmission. Moreover, the additional processing capabilities of the routers free up computing resources of the PEs.
4 citations