Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.


Papers
Patent
07 May 2014
TL;DR: An analysis method for the degree of parallelism of a simulation task, based on a DAG (Directed Acyclic Graph), is presented. The method comprises the following steps: constructing a simulation task description and parallelism degree analyzing system; describing the computational complexity, communication coupling degree, and task causality sequence of the simulation task with the DAG-based simulation task description module; normalizing the DAG with the normalizing module; and automatically performing inter-task parallelism degree analysis to obtain a quantified parallelism degree value.
Abstract: The invention discloses an analysis method for the degree of parallelism of a simulation task on the basis of a DAG (Directed Acyclic Graph). A prototype system realizing the analysis method mainly comprises a DAG-based simulation task description module, a DAG-normalizing module, and a simulation task parallelism degree analyzing module. The analysis method comprises the following concrete implementation steps: constructing a simulation task description and parallelism degree analyzing system; describing the computational complexity, communication coupling degree, and task causality sequence of the simulation task with the DAG-based simulation task description module; normalizing the DAG of the simulation task with the DAG-normalizing module; and automatically performing inter-task parallelism degree analysis and obtaining a quantified parallelism degree value with the simulation task parallelism degree analyzing module, according to the normalized DAG. The method realizes degree-of-parallelism analysis of simulation tasks for high-efficiency simulation of complex systems: inter-task parallelism can be analyzed quickly, effectively, and automatically from the DAG description of the simulation task, and the parallelism and efficiency of a high-efficiency simulation system are ensured.

5 citations
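
The patent abstract above does not state how the quantified parallelism degree value is computed. As a minimal sketch only, the following assumes one common textbook measure: average parallelism, defined as total work divided by critical-path length, alongside the maximum number of tasks at the same DAG depth. The function name `degree_of_parallelism`, the `costs` mapping, and the example graph are hypothetical and are not taken from the patent.

```python
from collections import defaultdict, deque


def degree_of_parallelism(costs, edges):
    """Return (average_parallelism, max_level_width) for a task DAG.

    costs: dict mapping task -> positive computation cost (assumed attribute)
    edges: iterable of (predecessor, successor) causality pairs
    """
    if not costs:
        raise ValueError("no tasks given")

    succs = defaultdict(list)
    indeg = {t: 0 for t in costs}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1

    start = {t: 0 for t in costs}   # earliest start time of each task
    depth = {t: 0 for t in costs}   # longest edge count from any source
    finish = {}                     # earliest finish time of each task

    # Kahn's algorithm: visit tasks in topological order.
    ready = deque(t for t, d in indeg.items() if d == 0)
    while ready:
        u = ready.popleft()
        finish[u] = start[u] + costs[u]
        for v in succs[u]:
            start[v] = max(start[v], finish[u])
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    if len(finish) != len(costs):
        raise ValueError("graph contains a cycle, so it is not a DAG")

    total_work = sum(costs.values())
    critical_path = max(finish.values())

    width = defaultdict(int)
    for d in depth.values():
        width[d] += 1

    return total_work / critical_path, max(width.values())


# Hypothetical diamond-shaped task graph: a -> {b, c} -> d
example_costs = {"a": 2, "b": 3, "c": 1, "d": 2}
example_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
avg, width = degree_of_parallelism(example_costs, example_edges)
```

Here the critical path is a, b, d with length 7 and the total work is 8, so the average parallelism is 8/7, while at most two tasks (b and c) share a depth. Communication coupling, which the patent also describes, is not modelled in this sketch.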

01 Jan 2018
TL;DR: An FPGA-based odd-even merge sorter which features throughput of 27.18 GB/s when merging 4 streams and presents stable throughput performance when the number of input streams is increased due to its high degree of parallelism.
Abstract: As database systems have shifted from disk-based to in-memory, and the scale of the database in big data analysis increases significantly, the workloads analyzing huge datasets are growing. Adopting FPGAs as hardware accelerators improves the flexibility, parallelism and power consumption versus CPU-only systems. The accelerators are also required to keep up with high memory bandwidth provided by advanced memory technologies and new interconnect interfaces. Sorting is the most fundamental database operation. In multiple-pass merge sorting, the final pass of the merge operation requires significant throughput performance to keep up with the high memory bandwidth. We study the state-of-the-art hardware-based sorters and present an analysis of our own design. In this thesis, we present an FPGA-based odd-even merge sorter which features throughput of 27.18 GB/s when merging 4 streams. Our design also presents stable throughput performance when the number of input streams is increased due to its high degree of parallelism. Thanks to such a generic design, the odd-even merge sorter does not suffer throughput drop for skewed data distributions and presents constant performance over various kinds of input distributions.

5 citations
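
For readers unfamiliar with the odd-even merge named in the thesis abstract above, the following is a software sketch of Batcher's classic odd-even merge network, which merges two sorted halves with a fixed, data-independent sequence of compare-and-swap operations. It is not the thesis's streaming FPGA design; the helper names `oddeven_merge` and `merge_sorted_halves` are illustrative, and the sketch assumes the input length is a power of two.

```python
def oddeven_merge(lo, hi, r=1):
    """Yield comparator index pairs (i, j) of Batcher's odd-even merge.

    Merges positions lo..hi (inclusive) taken with stride r, assuming the
    two halves of that range are already sorted.
    """
    step = r * 2
    if step < hi - lo:
        # Recursively merge the even-indexed and odd-indexed subsequences.
        yield from oddeven_merge(lo, hi, step)
        yield from oddeven_merge(lo + r, hi, step)
        # Final column of comparators between neighbouring elements.
        for i in range(lo + r, hi - r, step):
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def merge_sorted_halves(values):
    """Merge a list whose two halves are each sorted, using the network."""
    n = len(values)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    x = list(values)
    for i, j in oddeven_merge(0, n - 1):
        # Compare-and-swap: the only operation the network ever performs.
        if x[i] > x[j]:
            x[i], x[j] = x[j], x[i]
    return x


merged = merge_sorted_halves([1, 3, 5, 7, 2, 4, 6, 8])
# merged == [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the comparator sequence does not depend on the data, comparators that touch disjoint positions can run at the same time in hardware, which is consistent with the abstract's claim of constant performance across input distributions.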

Journal ArticleDOI
TL;DR: Aiming at HEVC, the new-generation video compression standard being formulated, a sub-pixel interpolation filtering algorithm is proposed (luminance: 1/4 precision, chrominance: 1/8 precision), and a hardware design with a pipeline structure and a high degree of parallelism is put forward.
Abstract: Aiming at HEVC, the new-generation video compression standard being formulated, a sub-pixel interpolation filtering algorithm is proposed (luminance: 1/4 precision, chrominance: 1/8 precision). Based on the algorithm, a hardware design with a pipeline structure and a high degree of parallelism is put forward. The hardware overhead is reduced by multiplexing the Wiener filter and reducing the size of the register array. A vertical-first interpolation order is adopted to reduce the read bandwidth of the storage. Performance analysis indicates that this interpolation structure offers better performance and smaller hardware overhead. The design also balances speed and area, meeting the requirements of processing standard-definition and high-definition video. DOI: http://dx.doi.org/10.11591/telkomnika.v11i12.3676

5 citations

Journal ArticleDOI
TL;DR: A new algorithm for parallel synchronous simulation of VHDL designs to be executed on desktop computers is proposed, which focuses on parallelizing the simulation kernel with special emphasis on signal grouping while maintaining language semantics.
Abstract: This article proposes a new algorithm for parallel synchronous simulation of VHDL designs to be executed on desktop computers. Besides executing VHDL processes in parallel, the algorithm focuses on parallelizing the simulation kernel with special emphasis on signal grouping while maintaining language semantics. Synchronous approaches are the most suitable for shared memory multiprocessor (SMP) desktop computers but may be difficult to parallelize because of the low activity detected in most of the designs. The degree of parallelism is increased in this approach by performing an exhaustive VHDL signal dependencies analysis and avoiding any sequential phase in the simulator. VHDL semantics impose a synchronization barrier after each phase, that is, the process and the kernel simulation phase, as the language definition does not allow simultaneous execution of kernel and processes. These barriers have been relaxed in order to increase the level of parallelism and obtain better performance. Another aspect the new algorithm takes into account is to improve load balancing and locality of references, both critical issues in synchronous simulators, by introducing a new load balancing algorithm that exploits the cyclic characteristics of circuit simulators. These developments make the algorithm suitable for commodity hardware, that is, SMP that are currently used as desktop personal computers.

5 citations

Proceedings ArticleDOI
22 Feb 2004
TL;DR: This work aims to describe a methodology for scheduling and allocation of hardware contexts, in applications with high degree of parallelism, in a Run-Time-Reconfiguration (RTR) proceeding for a reconfigurable FPGA.
Abstract: This work aims to describe a methodology for scheduling and allocation of hardware contexts, in applications with high degree of parallelism, in a Run-Time-Reconfiguration (RTR) proceeding for a reconfigurable FPGA. The Scheduling approach is based on the hardware resource distribution in the FPGA architecture. The Scheduler is modeled as a Petri Net and the best performance yields the best scheduling. The hardware contexts allocation is based on a Left-Edge algorithm principle for rationalization of resources in scheduling approach. The adaptation of the algorithm considers that pre-located areas for loading of the contexts in the architecture are used.

5 citations
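
The abstract above says context allocation follows a Left-Edge algorithm principle but does not give the adapted version. The sketch below shows only the generic left-edge algorithm: intervals are sorted by their left edge and packed greedily into as few non-overlapping tracks as possible. Reading intervals as the time spans during which a hardware context is loaded, and tracks as pre-located FPGA areas, is an assumption made for illustration; the name `left_edge` and the sample data are hypothetical.

```python
def left_edge(intervals):
    """Pack half-open (start, end) intervals into non-overlapping tracks.

    Returns a list of tracks; each track is a list of intervals in
    increasing order. Intervals that merely touch (end == start) may
    share a track.
    """
    pending = sorted(intervals)  # sort by left edge, then by right edge
    tracks = []
    while pending:
        track, rest = [], []
        last_end = None
        for start, end in pending:
            if last_end is None or start >= last_end:
                # Fits after the last interval placed on this track.
                track.append((start, end))
                last_end = end
            else:
                # Overlaps; defer to a later track.
                rest.append((start, end))
        tracks.append(track)
        pending = rest
    return tracks


# Hypothetical context lifetimes, in scheduling time steps.
contexts = [(0, 3), (1, 4), (3, 6), (4, 7), (5, 8)]
areas = left_edge(contexts)
# areas == [[(0, 3), (3, 6)], [(1, 4), (4, 7)], [(5, 8)]]
```

Three tracks are used here, which matches the largest number of intervals alive at once (three overlap around time step 5), so no packing with fewer tracks exists for this input.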


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 85% related
Scheduling (computing): 78.6K papers, 1.3M citations, 83% related
Network packet: 159.7K papers, 2.2M citations, 80% related
Web service: 57.6K papers, 989K citations, 80% related
Quality of service: 77.1K papers, 996.6K citations, 79% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75