Showing papers by "Manuel P. Malumbres" published in 1996


Proceedings ArticleDOI
23 Oct 1996
TL;DR: This paper proposes a tree-based multicast scheme that uses pruning to avoid deadlocks, targeted at situations where message data is very small, such as invalidation and update messages in distributed shared-memory multiprocessors.
Abstract: This paper presents an efficient routing and flow control mechanism to implement multidestination message passing in wormhole networks. It is targeted to situations where the size of message data is very small, like in invalidation and update messages in distributed shared-memory multiprocessors (DSMs) with hardware cache coherence. The mechanism is a variation of tree-based multicast with pruning to avoid deadlocks. The new scheme does not require that the destination addresses in a given multicast message be ordered, thereby avoiding any ordering overhead. It allows messages to use any deadlock-free routing function and only requires one startup for each multicast message. The new scheme has been evaluated on several k-ary n-cube networks under synthetic loads. The results show that the proposed scheme is faster than other multicast mechanisms when the multicast traffic is composed of short messages.

127 citations


Book ChapterDOI
26 Aug 1996
TL;DR: This paper compares the 2D-mesh and hypercube topologies under a very detailed router model and shows that hypercubes achieve slightly lower average latency and much higher throughput than meshes, making them suitable for DSMs.
Abstract: Many distributed shared-memory multiprocessors (DSMs) use a direct interconnection network to implement a cache coherence protocol. An interesting characteristic of the message traffic produced by coherence protocols is that all the messages are very short. Most current multicomputers use low-dimensional meshes or tori because these topologies usually achieve a higher performance. However, when messages are very short, latency is mainly dominated by the distance traveled in the network. As a consequence, higher-dimensional topologies may achieve a lower latency than low-dimensional topologies. In this paper, we compare the 2D-mesh and the hypercube topologies assuming a very detailed router model. Network load has been modeled taking into account the traffic produced by cache coherence protocols. Performance results show that average latency for hypercubes is slightly lower than for meshes. Moreover, hypercubes achieve a much higher throughput than meshes, making them suitable for DSMs.

23 citations
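
The distance argument in the mesh versus hypercube abstract above can be illustrated with a rough hop-count comparison. The sketch below is not the paper's router model or workload: it assumes uniform traffic between distinct nodes, minimal routing, and counts hops only, ignoring contention, router delay, and wire length. The function names and the 64-node sizes are hypothetical choices made for illustration.

```python
from itertools import product


def mesh_avg_distance(k):
    """Average minimal hop count between distinct nodes of a k x k 2D mesh.

    Assumes minimal routing, so the hop count is the Manhattan distance.
    """
    nodes = list(product(range(k), repeat=2))
    total = sum(
        abs(ax - bx) + abs(ay - by)
        for (ax, ay) in nodes
        for (bx, by) in nodes
    )
    n = len(nodes)
    # Identical pairs contribute 0 to the total, so divide by n * (n - 1)
    # to average over ordered pairs of distinct nodes only.
    return total / (n * (n - 1))


def hypercube_avg_distance(n_dims):
    """Average minimal hop count between distinct nodes of an n_dims-cube.

    Assumes minimal routing, so the hop count is the Hamming distance
    between the binary node labels.
    """
    n = 1 << n_dims
    total = sum(bin(a ^ b).count("1") for a in range(n) for b in range(n))
    return total / (n * (n - 1))


if __name__ == "__main__":
    # Both networks have 64 nodes: an 8 x 8 mesh and a 6-dimensional hypercube.
    print(f"8x8 mesh:    {mesh_avg_distance(8):.3f} hops on average")
    print(f"6-hypercube: {hypercube_avg_distance(6):.3f} hops on average")
```

Under these assumptions the 64-node hypercube averages roughly 3.05 hops against roughly 5.33 for the 8 x 8 mesh. This only shows why distance favors the higher-dimensional topology when messages are short; the latency and throughput results quoted in the abstract come from the authors' detailed simulation, not from a calculation like this one.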


01 Jan 1996
TL;DR: The proposed scheme is faster than other multicast mechanisms when the multicast traffic is composed of short messages. It allows messages to use any deadlock-free routing function and requires only one startup for each multicast message.
Abstract: This paper presents an efficient routing and flow control mechanism to implement multidestination message passing in wormhole networks. It is targeted to situations where the size of message data is very small, like in invalidation and update messages in distributed shared-memory multiprocessors (DSMs) with hardware cache coherence. The mechanism is a variation of tree-based multicast with pruning to avoid deadlocks. The new scheme does not require that the destination addresses in a given multicast message be ordered, thereby avoiding any ordering overhead. It allows messages to use any deadlock-free routing function and only requires one startup for each multicast message. The new scheme has been evaluated on several k-ary n-cube networks under synthetic loads. The results show that the proposed scheme is faster than other multicast mechanisms when the multicast traffic is composed of short messages.

10 citations