
Degree of parallelism

About: Degree of parallelism is a research topic. Over its lifetime, 1,515 publications have been published within this topic, receiving 25,546 citations.


Papers
Proceedings ArticleDOI
25 Jun 2012
TL;DR: This work presents a parallel patch-based texture synthesis technique that achieves a high degree of parallelism, and proposes a complete implementation tuned to take advantage of massive GPU parallelism.
Abstract: Fast parallel algorithms exist for pixel-based texture synthesizers. Unfortunately, these synthesizers often fail to preserve structures from the exemplar unless the user specifies additional feature information. Patch-based synthesizers, by contrast, are better at capturing and preserving structural patterns, but they require relatively slow algorithms to lay out the patches and stitch them together. We present a parallel patch-based texture synthesis technique that achieves a high degree of parallelism. Our synthesizer starts from a low-quality result and adds several patches in parallel to improve it. It selects patches that blend seamlessly with the existing result and that hide existing visual artifacts. This is made possible by two main algorithmic contributions: an algorithm to quickly find a good cut around a patch, and a deformation algorithm to further align features crossing the patch boundary. We show that even with a uniform parallel random sampling of the patches, our improved patch stitching achieves high-quality synthesis results. We discuss several synthesis strategies, such as using patches of decreasing size or using various amounts of deformation during the optimization. We propose a complete implementation tuned to take advantage of massive GPU parallelism.
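The "good cut around a patch" idea can be illustrated with a classic dynamic-programming seam through the overlap error between a candidate patch and the existing result. This is a minimal sketch in that spirit, not the paper's actual cut-finding algorithm; the function name and error-matrix input are hypothetical.

```python
def min_cost_vertical_cut(error):
    """Minimum-cost top-to-bottom cut through an overlap error surface.

    error[i][j] is the per-pixel mismatch between the new patch and the
    existing result in the overlap region. Returns one column index per
    row; consecutive indices differ by at most 1, giving a connected seam.
    Illustrative dynamic programming, not the paper's cut algorithm.
    """
    h, w = len(error), len(error[0])
    cost = [row[:] for row in error]
    # Forward pass: accumulate the cheapest path cost into each cell.
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(0, j - 1), min(w, j + 2)
            cost[i][j] += min(cost[i - 1][lo:hi])
    # Backtrack from the cheapest bottom cell.
    j = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [j]
    for i in range(h - 2, -1, -1):
        lo, hi = max(0, seam[-1] - 1), min(w, seam[-1] + 2)
        seam.append(min(range(lo, hi), key=lambda c: cost[i][c]))
    return seam[::-1]
```

In a parallel implementation, many such cuts (one per candidate patch) would be computed independently, which is where the massive GPU parallelism comes from.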

28 citations

Proceedings ArticleDOI
19 Apr 2010
TL;DR: This paper presents implementations of both the Breadth First Search algorithm and a Matrix Parenthesization algorithm; the two exhibit similar synchronization behavior when implemented on a GPU using CUDA, enabling a more direct comparison between them.
Abstract: Recently, Graphical Processing Units (GPUs) have become increasingly capable and well-suited to general-purpose applications. As a result of the GPU's high degree of parallelism and computational power, there has been a great deal of interest in the platform for parallel application development. Much of the focus, however, has been on very regular applications that exhibit a high degree of data parallelism, as these applications map well to the GPU. Irregular applications, such as the Breadth First Search discussed in this paper, have not been as extensively studied and are more difficult to implement efficiently on the GPU. We present both an implementation of the Breadth First Search algorithm and an implementation of a Matrix Parenthesization algorithm. This pair of algorithms exhibits similar synchronization behavior when implemented on a GPU using CUDA, enabling a more direct comparison between them. The results obtained can be used to showcase some of the synchronization issues that irregular algorithms face on the GPU.
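The standard way BFS is parallelized on GPUs is level-synchronous frontier expansion: each iteration processes the whole current frontier at once, with a global barrier between levels. Below is a minimal CPU sketch of that structure (the paper's CUDA kernel details are not reproduced; comments note where GPU-specific mechanisms would apply).

```python
def frontier_bfs(adj, source):
    """Level-synchronous BFS over an adjacency dict.

    Each while-loop iteration expands the entire frontier, which is the
    step a GPU would process with one thread per frontier vertex.
    Returns a dict mapping vertex -> BFS level.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:            # on a GPU: one thread per vertex
            for v in adj.get(u, []):
                if v not in level:    # on a GPU: atomic visited check
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier      # implicit global barrier between levels
    return level
```

The irregularity the abstract refers to shows up here: frontier sizes vary wildly between levels, so GPU threads get uneven work and the per-level barrier becomes a synchronization bottleneck.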

28 citations

Book ChapterDOI
22 Jan 2005
TL;DR: By relaxing the hypothesis that a port numbering exists, a model with a strictly lower power of computation is obtained; the model studied involves more synchronization than the message passing model, since a high level of synchronization in one atomic computation step makes a model powerful but reduces the degree of parallelism.
Abstract: The different local computation mechanisms are very useful for delimiting the borderline between positive and negative results in distributed computation. Indeed, they enable the study of the importance of the synchronization level and of how important the initial knowledge is. A high level of synchronization involved in one atomic computation step makes a model powerful but reduces the degree of parallelism. Charron-Bost et al. [1] study the difference between synchronous and asynchronous message passing models. The model studied in this paper involves more synchronization than the message passing model: an elementary computation step modifies the states of two neighbours in the network, depending only on their current states. The information the processors initially have can be global information about the network, such as its size, diameter, or topology. The initial knowledge can also be local: for example, each node can initially know its own degree. Another example of local knowledge is the existence of a port numbering: each processor locally numbers its incident edges and can thereby consistently distinguish its neighbours. In Angluin's model [2], a port numbering is assumed to exist, whereas this is not the case in our model. In fact, we obtain a model with a strictly lower power of computation by relaxing the hypothesis on the existence of a port numbering.
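The model's elementary step (two neighbours atomically rewriting their states as a function of their current states only, with no port numbering or identifiers) is easy to simulate. The sketch below is an illustrative simulation under that assumption, with a hypothetical flooding-style rule; it does not reproduce the paper's computability results.

```python
import random

def run_local_computations(edges, state, rule, steps=1000, seed=0):
    """Simulate the anonymous pairwise model: each atomic step picks an
    edge and rewrites both endpoint states using only their current
    states. rule(a, b) -> (a', b'). Illustrative scheduler, not the
    paper's formalism."""
    rng = random.Random(seed)
    for _ in range(steps):
        u, v = rng.choice(edges)
        state[u], state[v] = rule(state[u], state[v])
    return state

# Example rule: both neighbours adopt the larger value, so the maximum
# floods through the graph without any node knowing who its neighbours are.
def spread_max(a, b):
    m = max(a, b)
    return m, m
```

Note that the rule is symmetric in its two arguments: since there is no port numbering, an endpoint cannot behave differently depending on which "side" of the edge it is on.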

28 citations

Journal ArticleDOI
TL;DR: This paper presents a method for the solution of parabolic PDEs on parallel computers, which is a combination of implicit and explicit finite difference schemes based on a domain decomposition (DD) strategy and is specifically suitable for parallel computers having a high synchronization cost or highly varying load.
Abstract: This paper presents a method for the solution of parabolic PDEs on parallel computers, which is a combination of implicit and explicit finite difference schemes based on a domain decomposition (DD) strategy. Moreover, this method is asynchronous (i.e., no explicit synchronization is required among processors). We determine the values at subdomains' boundaries by our new high-order asynchronous explicit schemes. Then, any known high-order implicit finite difference scheme can be applied within each subdomain. We present a technique for derivation of appropriate asynchronous-explicit schemes based on Green's functions. Synchronous versions of these schemes are obtained as special cases. The applicability of this method is also demonstrated for a family of nonlinear problems. Our new explicit schemes are of high order and yet stable for a large time step, as established in our analysis of their numerical properties. Moreover, these schemes provide attractive properties for parallel implementation. Being asynchronous, they allow local time stepping, thus eliminating the need for a global synchronized time step. Moreover, our asynchronous computation is time stabilizing, in the sense that the calculation implicitly prevents a growing time gap between neighboring subdomains. The locality property, due to the exponential decay of Green's functions, implies that communication is needed only between neighboring processors. Hence, this method which is designed to minimize the overhead associated with the synchronization of the multiple processors is specifically suitable for parallel computers having a high synchronization cost or highly varying load, even in cases in which some processors have persistent speed differences. Furthermore, the implementation of different resolution in each subdomain (e.g., irregular or unstructured grid) makes it valuable as an adaptive algorithm. 
The above schemes were implemented and tested on the shared-memory multi-user Cray J90 and Sequent Balance machines. These implementations demonstrate high accuracy and a high degree of parallelism. This work is complementary to our previous work on asynchronous schemes [Comput. Math. Appl., 24 (1992), pp. 33--53; Appl. Numer. Math., 12 (1993), pp. 27--45; Numer. Algorithms, 6 (1994), pp. 275--296; Numer. Algorithms, 12 (1996), pp. 159--192].
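The domain-decomposition structure, in which each subdomain advances using only interface values from its neighbour, can be sketched with a plain explicit FTCS scheme for the 1D heat equation u_t = u_xx on two subdomains. This is a simplified, synchronous, first-order stand-in for the paper's high-order asynchronous schemes; the function name and setup are assumptions for illustration.

```python
def ftcs_two_domains(u0, r, steps):
    """Explicit FTCS update for u_t = u_xx on two subdomains.

    The subdomains exchange only one ghost value each per step (the only
    communication), with fixed Dirichlet values at the outer endpoints.
    r = dt/dx^2; the explicit scheme is stable for r <= 0.5.
    Simplified synchronous sketch, not the paper's asynchronous schemes.
    """
    n = len(u0)
    mid = n // 2
    u = list(u0)
    for _ in range(steps):
        ghost_left = u[mid]       # right subdomain's first cell, seen by left
        ghost_right = u[mid - 1]  # left subdomain's last cell, seen by right
        new = u[:]
        for i in range(1, mid):            # left subdomain interior
            right_nb = u[i + 1] if i + 1 < mid else ghost_left
            new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + right_nb)
        for i in range(mid, n - 1):        # right subdomain interior
            left_nb = u[i - 1] if i - 1 >= mid else ghost_right
            new[i] = u[i] + r * (left_nb - 2 * u[i] + u[i + 1])
        u = new
    return u
```

In the paper's asynchronous setting, the ghost values would instead be predicted by the high-order explicit boundary schemes, so a subdomain need not wait for its neighbour to reach the same time step.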

27 citations

Journal ArticleDOI
TL;DR: A maintenance-free itinerary-based approach to K-nearest-neighbor query processing called density-aware itinerary KNN query processing (DIKNN), which outperforms the runner-up with up to a 50 percent saving in energy consumption and up to a 40 percent reduction in query response time, while delivering the same level of query result accuracy.
Abstract: The K-nearest neighbors (KNN) query has been of significant interest in many studies and has become one of the most important spatial queries in mobile sensor networks. Applications of KNN queries include vehicle navigation, wildlife social discovery, and squad/platoon searching on the battlefield. Current approaches to KNN search in mobile sensor networks require some kind of indexing support. This index can be either a centralized spatial index or an in-network data structure distributed over the sensor nodes. Creating and maintaining these index structures to reflect the network dynamics caused by sensor node mobility may result in long query response times and low battery efficiency, limiting their practical use. In this paper, we propose a maintenance-free itinerary-based approach called density-aware itinerary KNN query processing (DIKNN). DIKNN divides the search area into multiple cone-shape areas centered at the query point. It then performs query dissemination and response collection along an itinerary in each of the cone-shape areas in parallel. The design of the DIKNN scheme takes into account several challenging issues, such as the trade-off between the degree of parallelism and network interference on query response time, and the dynamic adjustment of the search radius (in terms of number of hops) according to spatial irregularity or mobility of sensor nodes. To optimize the performance of DIKNN, a detailed analytical model is derived that automatically determines the most suitable degree of parallelism under various network conditions. This model is validated by extensive simulations. The simulation results show that DIKNN yields substantially better performance and scalability than previous work, both as K increases and as sensor node mobility increases. It outperforms the runner-up with up to a 50 percent saving in energy consumption and up to a 40 percent reduction in query response time, while delivering the same level of query result accuracy.
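The cone-shape partitioning can be sketched as follows: split the plane around the query point into m equal angular sectors, gather candidates per sector, then merge the k nearest overall. This is an illustrative centralized approximation; the actual DIKNN routes an itinerary through each sector's sensor nodes, which is not modeled here, and the function and parameter names are hypothetical.

```python
import math
import heapq

def sector_knn(nodes, query, k, m=4):
    """Partition nodes into m angular sectors around the query point
    (the 'cone-shape areas'), then merge the k nearest across sectors.

    In DIKNN each sector would be traversed by its own itinerary in
    parallel; here the sectors are only built and merged centrally.
    """
    qx, qy = query
    sectors = [[] for _ in range(m)]
    width = 2 * math.pi / m
    for (x, y) in nodes:
        ang = math.atan2(y - qy, x - qx) % (2 * math.pi)
        s = min(int(ang / width), m - 1)        # guard against float edge
        d = math.hypot(x - qx, y - qy)
        sectors[s].append((d, (x, y)))          # itinerary would visit in order
    # Merge per-sector candidates into the global k nearest.
    candidates = [item for sec in sectors for item in sec]
    return [p for _, p in heapq.nsmallest(k, candidates)]
```

Increasing m raises the degree of parallelism (more concurrent itineraries) at the cost of more radio interference between adjacent sectors, which is exactly the trade-off the paper's analytical model optimizes.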

27 citations


Network Information
Related Topics (5)

- Server: 79.5K papers, 1.4M citations (85% related)
- Scheduling (computing): 78.6K papers, 1.3M citations (83% related)
- Network packet: 159.7K papers, 2.2M citations (80% related)
- Web service: 57.6K papers, 989K citations (80% related)
- Quality of service: 77.1K papers, 996.6K citations (79% related)
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2022    1
2021    47
2020    48
2019    52
2018    70
2017    75