
Book ChapterDOI

A low-level communication library for Java HPC

02 Oct 2005, pp. 429-434

TL;DR: A new low-level communication library for Java HPC, called mpjdev, is introduced with the goal that it can be implemented portably on network platforms and efficiently on parallel hardware.

Abstract: Designing a simple but powerful low-level communication library for Java HPC environments is an important task. We introduce a new low-level communication library for Java HPC, called mpjdev. The mpjdev API is designed with the goal that it can be implemented portably on network platforms and efficiently on parallel hardware. Unlike MPI, which is intended for the application developer, mpjdev is meant for library developers. Application-level communication may be implemented on top of mpjdev. The mpjdev API itself might be implemented on top of Java sockets in a portable network implementation, or, on HPC platforms, through a JNI (Java Native Interface) binding to a subset of MPI.

Topics: Java Native Interface (67%), Java API for XML-based RPC (65%), Java annotation (64%), Real time Java (64%), Java (64%)

Summary (1 min read)

1 Introduction

  • HPJava [1] is an environment for scientific and parallel programming using Java.
  • Moreover, a translated and compiled HPJava program is a standard Java class file that can be executed by a distributed collection of Java Virtual Machines.
  • HPJava does not provide any special syntax for accessing non-local elements.
  • The mpjdev API itself might be implemented on top of Java sockets in a portable network implementation, or, on HPC platforms, through JNI (Java Native Interface) to a subset of MPI.
  • Currently not all of the communication stack in this figure is implemented.

2 Communications API

  • Point-to-point communication and collective communication are two main communication modes of MPI.
  • Currently the only messaging modes for mpjdev are standard blocking mode (like MPI_SEND, MPI_RECV) and standard non-blocking mode (like MPI_ISEND, MPI_IRECV), together with a couple of "wait" primitives.
  • The Comm class also has the initial communicator, WORLD, like MPI_COMM_WORLD in MPI, and other utility methods.
  • The recv() method initializes the source and tag fields of the returned Status object, which describes a completed communication.
  • Unlike blocking send, a non-blocking send returns immediately after its call and does not wait for completion.

3 Message Format

  • This section describes the message format used by mpjdev.
  • These features are in the spirit of MPI.
  • This is to allow for native implementations of the buffer operations, which (unlike standard Java read/write operations) may use either byte order.
  • The elements in a section will all have identical primitive Java type, or they will all have Object type (in the latter case the exact classes of the objects need not be homogeneous within the section).
  • After the primary payload there is a secondary header.

4 Discussion

  • The authors have explored enabling parallel, high-performance computation, in particular the development of scientific software, in the network-aware programming language Java.
  • Traditionally, this kind of computing was done in Fortran.
  • One important issue is how to transfer data between the Java program and the network while reducing overheads of the Java Native Interface.
  • The authors discussed the message buffer and communication APIs of mpjdev and also the format of a message.


A Low-Level Communication Library for Java HPC

Sang Boem Lim (1), Bryan Carpenter (2), Geoffrey Fox (3) and Han-Ku Lee (4)

(1) Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea. slim@kisti.re.kr
(2) OMII, University of Southampton, Southampton SO17 1BJ, UK. dbc@ecs.soton.ac.uk
(3) Pervasive Technology Labs at Indiana University, Bloomington, IN 47404-3730. gcf@indiana.edu
(4) School of Internet and Multimedia Engineering, Konkuk University, Seoul, Korea. hlee@konkuk.ac.kr
Abstract. Designing a simple but powerful low-level communication library for Java HPC environments is an important task. We introduce a new low-level communication library for Java HPC, called mpjdev. The mpjdev API is designed with the goal that it can be implemented portably on network platforms and efficiently on parallel hardware. Unlike MPI, which is intended for the application developer, mpjdev is meant for library developers. Application-level communication may be implemented on top of mpjdev. The mpjdev API itself might be implemented on top of Java sockets in a portable network implementation, or, on HPC platforms, through a JNI (Java Native Interface) binding to a subset of MPI.
1 Introduction
HPJava [1] is an environment for scientific and parallel programming using Java.
It is based on an extended version of the Java language. HPJava incorporates
all of the Java language as a subset. This means any ordinary Java class can be
invoked from an HPJava program without recompilation. Moreover, a translated
and compiled HPJava program is a standard Java class file that can be executed
by a distributed collection of Java Virtual Machines.
Locally held elements of multiarrays and distributed arrays can be accessed
using some special syntax provided by HPJava. HPJava does not provide any
special syntax for accessing non-local elements. Non-local elements can only be
accessed by making explicit library calls. This policy in the HPJava language attempts to leverage successful library-based approaches to SPMD parallel computing. This idea is very much in the spirit of MPI, with its explicit point-to-point and collective communications. HPJava raises the level of abstraction a notch, and adds excellent support for development of libraries that manipulate distributed arrays. But it still exposes a multi-threaded, non-shared-memory execution model to the programmer. Advantages of this approach include flexibility for the programmer, and ease of compilation, because the compiler does not have to analyze and optimize communication patterns.
Fig. 1. An HPJava communication stack: the Java version of Adlib, MPJ, and other application-level APIs sit on top of mpjdev, which is implemented either in pure Java (on networks of PCs or SMPs) or over native MPI (on parallel hardware, e.g. IBM SP3, Sun HPC).
The mpjdev [2] [3] API is designed with the goal that it can be implemented portably on network platforms and efficiently on parallel hardware. Unlike MPI, which is intended for the application developer, mpjdev is meant for library developers. Application-level communication libraries like the Java version of Adlib (or MPJ [1]) may be implemented on top of mpjdev. The mpjdev API itself might be implemented on top of Java sockets in a portable network implementation, or, on HPC platforms, through JNI (Java Native Interface) to a subset of MPI. The positioning of the mpjdev API is illustrated in Figure 1. Currently not all of the communication stack in this figure is implemented. The Java version of Adlib, the pure Java implementation on SMPs, and the native MPI implementation have been developed and included in the current HPJava or mpiJava releases. The rest of the stack may be filled in in the future.
2 Communications API
In MPI there is a rich set of communication modes. Point-to-point communication and collective communication are the two main communication modes of MPI. Point-to-point communication supports blocking and non-blocking modes. Blocking communication includes one blocking-mode receive, MPI_RECV, and four different send modes: standard mode, MPI_SEND; synchronous mode, MPI_SSEND; ready mode, MPI_RSEND; and buffered mode, MPI_BSEND. Non-blocking communication likewise has one receive, MPI_IRECV, and the same four modes as blocking send: standard, MPI_ISEND; synchronous,

public class Comm {
public int size() { ... }
public int id() { ... }
public Comm dup() { ... }
public Comm create(int [] ids) { ... }
public void free() { ... }
public void send(Buffer buf, int dest, int tag) { ... }
public Status recv(Buffer buf, int src, int tag) { ... }
public Request isend(Buffer buf, int dest, int tag) { ... }
public Request irecv(Buffer buf, int src, int tag) { ... }
public static String [] init(String[] args) { ... }
public static void finish() { ... }
. . .
}
Fig. 2. The public interface of the mpjdev Comm class.
MPI_ISSEND; ready, MPI_IRSEND; and buffered, MPI_IBSEND. Collective communication also includes various communication modes. It has characteristic collective operations like broadcast, MPI_BCAST, gather, MPI_GATHER, and scatter, MPI_SCATTER. Global reduction operations are also included in collective communication.
The mpjdev API is much simpler. It only includes point-to-point communications. Currently the only messaging modes for mpjdev are standard blocking mode (like MPI_SEND, MPI_RECV) and standard non-blocking mode (like MPI_ISEND, MPI_IRECV), together with a couple of "wait" primitives.
The communicator class, Comm, is very similar to the one in MPI but it has reduced functionality. It has communication methods like send(), recv(), isend(), and irecv(), and defines the constants ANY_SOURCE and ANY_TAG as static variables. Figure 2 shows the public interface of the Comm class.
We can get the number of processes that are spanned by this communicator by calling size() (similar to MPI_COMM_SIZE). The current id of the process relative to this communicator is returned by id() (similar to MPI_COMM_RANK).
The two methods send() and recv() are blocking communication modes. These two methods block until the communication finishes. The method send() sends a message containing the contents of buf to the destination described by dest with message tag value tag.
The method recv() receives a message from the matching source described by src with matching tag value tag and copies the contents of the message into the receive buffer, buf. The receiver may use the wildcard value ANY_SOURCE for src and ANY_TAG for tag instead of specifying src and tag values. These indicate that

public class Request {
public Status iwait() { ... }
public Status iwaitany(Request [] reqs) { ... }
. . .
}
Fig. 3. The public interface of the Request class.
a receiver accepts a message from any source and/or with any tag. The capacity of the receive buffer must be large enough to accept the message contents. recv() initializes the source and tag fields of the returned Status object, which describes the completed communication. The Comm class also has the initial communicator, WORLD (like MPI_COMM_WORLD in MPI), and other utility methods.
The functionalities of the send() and recv() methods are the same as standard-mode point-to-point communication in MPI (MPI_SEND and MPI_RECV). A recv() will block until the send is posted. A send() will block until the message has been safely stored away. Internal buffering is not guaranteed in send(), and the message may be copied directly into the matching receive buffer. If no recv() is posted, send() is allowed to block indefinitely, depending on the availability of internal buffering in the implementation. The programmer must allow for this; it is a low-level API for experts.
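
To make these blocking semantics concrete, here is a minimal sketch of a two-process exchange. It is illustrative only: it assumes WORLD is exposed as a static field Comm.WORLD and that Buffer has a fixed-capacity constructor, neither of which is spelled out in this excerpt.

public class PingSketch {
    public static void main(String[] args) {
        args = Comm.init(args);           // must precede any other mpjdev call
        Comm world = Comm.WORLD;          // initial communicator (assumed static field)
        int me = world.id();
        Buffer buf = new Buffer(1024);    // hypothetical fixed-capacity constructor
        if (me == 0) {
            // ... pack message elements into buf ...
            world.send(buf, 1, 77);       // may block until the message is stored away
        } else if (me == 1) {
            Status s = world.recv(buf, 0, 77);  // blocks until a matching send is posted
            // s.source and s.tag now describe the completed communication
        }
        Comm.finish();                    // the equivalent of MPI_FINALIZE
    }
}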
The other two communication methods, isend() and irecv(), are non-blocking versions of send() and recv(). They are equivalent to MPI_ISEND and MPI_IRECV in MPI. Unlike a blocking send, a non-blocking send returns immediately after its call and does not wait for completion. To complete the communication a separate send-complete call (like the iwait() and iwaitany() methods in the Request class) is needed. A non-blocking receive works similarly. The wait operations block exactly as for the blocking versions of send() and recv() (e.g. the wait operation for an isend() is allowed to block indefinitely if no matching receive is posted). The method dup() creates a new communicator spanning the same set of processes, but with a distinct communication context. We can also create a new communicator spanning a selected set of processes using the create() method. The array ids contains a list of ids relative to this communicator. Processes that are outside of the group will get a null result. The new communicator has a distinct communication context. By calling the free() method, we can destroy this communicator (like MPI_COMM_FREE in MPI). This method is usually called when the communicator is no longer in use. It frees any resources used by the communicator.
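
As a sketch of these communicator-management calls (assuming, as the signatures in Figure 2 suggest, that dup() and create() return the new Comm):

Comm world = Comm.WORLD;

Comm dupped = world.dup();               // same processes, distinct communication context

// Sub-communicator over processes 0 and 2 of `world'; processes
// outside the group get a null result.
Comm sub = world.create(new int[] { 0, 2 });
if (sub != null) {
    // ... communication within the subgroup ...
    sub.free();                          // like MPI_COMM_FREE: release its resources
}
dupped.free();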
We should call the static init() method once before calling any other methods of the communicator. This static method initializes mpjdev and makes it ready to use. The static method finish() (the equivalent of MPI_FINALIZE) is the last method that should be called in mpjdev.

The other important class is Request (Figure 3). This class is used by non-blocking communications to ensure completion of a non-blocking send or receive. We wait for a single non-blocking communication to complete by calling the iwait() method. This method returns when the operation identified by the current object is complete. The other method, iwaitany(), waits for one non-blocking communication from a set of requests reqs to complete. This method returns when one of the operations associated with the active requests in the array reqs has completed. After completion of an iwait() or iwaitany() call, the source and tag fields of the returned Status object are initialized. One more field, index, is initialized by the iwaitany() method. This field indicates the index of the selected request in the reqs array.
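
A sketch of a ring exchange using the non-blocking calls, under the same assumptions about Buffer as in the earlier sketch:

Comm world = Comm.WORLD;
int me = world.id(), np = world.size();
int right = (me + 1) % np, left = (me + np - 1) % np;

Buffer sendBuf = new Buffer(1024);       // hypothetical fixed-capacity constructor
Buffer recvBuf = new Buffer(1024);

// Post the receive and the send before waiting on either, so that
// every process has a matching operation outstanding.
Request recvReq = world.irecv(recvBuf, left, 0);
Request sendReq = world.isend(sendBuf, right, 0);

sendReq.iwait();                         // blocks exactly as a blocking send() would
Status st = recvReq.iwait();             // st.source and st.tag are now initialized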
3 Message Format
This section describes the message format used by mpjdev. The specification here doesn't define how the message vector contained in the Buffer object is stored internally; for example it may be stored as a Java byte [] array, or as a C char [] array accessed through native methods. But this section does define the organization of data in the buffer. It is the responsibility of the user to ensure that sufficient space is available in the buffer to hold the desired message. Trying to write too much data to a buffer causes an exception to be thrown. Likewise, trying to receive a message into a buffer that is too small will cause an exception to be thrown. These features are (arguably) in the spirit of MPI.
A message is divided into two main parts. The primary payload is used to
store message elements of primitive type. The secondary payload is intended to
hold the data from object elements in the message (although other uses for the
secondary payload are conceivable). The size of the primary payload is limited
by the fixed capacity of the buffer, as discussed above. The size of the secondary
payload, if it is non-empty, is likely to be determined "dynamically", for example
as objects are written to the buffer.
The message starts with a short primary header, defining an encoding scheme
used in headers and primary payload, and the total number of data bytes in the
primary payload. Only one byte is allocated in the message to describe the
encoding scheme: currently the only encoding schemes supported or envisaged
are big-endian and little-endian. This is to allow for native implementations of
the buffer operations, which (unlike standard Java read/write operations) may
use either byte order. A message is divided into zero or more sections. Each
section contains a fixed number of elements of homogeneous type. The elements
in a section will all have identical primitive Java type, or they will all have
Object type (in the latter case the exact classes of the objects need not be
homogeneous within the section).
Each section has a short header in the primary payload, specifying the type of the elements and the number of elements in the section. For sections with primitive type, the header is followed by the actual data. For sections with object type, the header is the only representation of the section appearing in the primary payload.
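
As an illustration of this layout, the sketch below packs a single float section behind a primary header. The field widths other than the one-byte encoding field are assumptions: the excerpt does not fix the sizes of the byte-count or element-count fields, nor the actual type codes.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LayoutSketch {
    static final byte BIG_ENDIAN = 0;     // one-byte encoding-scheme field
    static final byte TYPE_FLOAT = 5;     // hypothetical element-type code

    static ByteBuffer pack(float[] section) {
        int dataBytes = 4 * section.length;
        int sectionHeader = 1 + 4;        // assumed: type byte + 4-byte element count
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + sectionHeader + dataBytes);
        buf.order(ByteOrder.BIG_ENDIAN);
        buf.put(BIG_ENDIAN);              // primary header: encoding scheme ...
        buf.putInt(sectionHeader + dataBytes);  // ... and data bytes in the primary payload
        buf.put(TYPE_FLOAT);              // section header: element type ...
        buf.putInt(section.length);       // ... and number of elements in the section
        for (float f : section)
            buf.putFloat(f);              // for primitive sections, data follows the header
        return buf;
    }
}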

Citations

Proceedings ArticleDOI
23 May 2009
TL;DR: This paper proposes a methodology to compare the communication pattern of distributed-memory programs and applies it to four applications in the NAS parallel benchmark suite and evaluates the communication patterns by studying the effects of varying problem size and the number of logical processes (LPs).
Abstract: Interprocessor communication is an important factor in determining the performance scalability of parallel systems. The communication requirements of a parallel application can be quantified to understand its communication pattern, and communication pattern similarities among applications can be determined. This is essential for the efficient mapping of applications on parallel systems and leads to better interprocessor communication implementation, among others. This paper proposes a methodology to compare the communication patterns of distributed-memory programs. A communication correlation coefficient quantifies the degree of similarity between two applications based on the communication metrics selected to characterize the applications. To capture the network topology requirements, we extract the communication graph of each application and quantify this similarity. We apply this methodology to four applications in the NAS parallel benchmark suite and evaluate the communication patterns by studying the effects of varying problem size and the number of logical processes (LPs).

26 citations


Cites background from "A low–level communication library f..."

  • ...To achieve good cost-performance trade-off, it is important to understand the underlying communication behavior and patterns of parallel applications [2], [8], [21]....


  • ...Existing related work focuses either on the characterization of collective and point-to-point communication of parallel applications [11], [20], [23], or amelioration of the efficiency of the communication of parallel applications [2], [3], [17]....



Journal ArticleDOI
TL;DR: This paper presents a more efficient Java message-passing communications device, based on Java Input/Output sockets, that avoids this buffering overhead and implements several strategies, both in the communication protocol and in the HPC hardware support, which optimize Java message-passing communications.
Abstract: Since its release, the Java programming language has attracted considerable attention from the high-performance computing (HPC) community because of its portability, high programming productivity, and built-in multithreading and networking support. As a consequence, several initiatives have been taken to develop a high-performance Java message-passing library to program distributed memory architectures, such as clusters. The performance of Java message-passing applications relies heavily on the communications performance. Thus, the design and implementation of low-level communication devices that support message-passing libraries is an important research issue in Java for HPC. MPJ Express is our Java message-passing implementation for developing high-performance parallel Java applications. Its public release currently contains three communication devices: the first one is built using the Java New Input/Output (NIO) package for the TCP/IP; the second one is specifically designed for the Myrinet Express library on Myrinet; and the third one supports thread-based shared memory communications. Although these devices have been successfully deployed in many production environments, previous performance evaluations of MPJ Express suggest that the buffering layer, tightly coupled with these devices, incurs a certain degree of copying overhead, which represents one of the main performance penalties. This paper presents a more efficient Java message-passing communications device, based on Java Input/Output sockets, that avoids this buffering overhead. Moreover, this device implements several strategies, both in the communication protocol and in the HPC hardware support, which optimizes Java message-passing communications. In order to evaluate its benefits, this paper analyzes the performance of this device comparatively with other Java and native message-passing libraries on various high-speed networks, such as Gigabit Ethernet, Scalable Coherent Interface, Myrinet, and InfiniBand, as well as on a shared memory multicore scenario. The reported communication overhead reduction encourages the upcoming incorporation of this device in MPJ Express (). Copyright © 2011 John Wiley & Sons, Ltd.

8 citations


Cites methods from "A low–level communication library f..."

  • ...The point-to-point primitives (or base level) are implemented on top of mpjdev, the MPJ device layer [13], which has two implementations, the ‘pure’ (100%) Java and the native one....



Proceedings ArticleDOI
09 Dec 2008
TL;DR: This work proposes a new approach MPACP (matching of parallel application communication patterns) to automate the analysis of the similarity between two parallel applications and provides a reliable report which will help users or developers understand the similarity among communication patterns of parallel applications.
Abstract: Current trends in HPC (high performance computing) suggest that clusters will soon consist of hundreds, if not thousands, of processors, and the size of current scientific problems is becoming much larger than before. Many researchers have predicted that communication among these processors will dominate the execution time of scientific parallel applications. Users need a good understanding of the communication patterns of scientific parallel applications and their similarities, so that they benefit not only from cost savings in constructing the running environment for these applications but also from obtaining better performance. In this paper, we address communication pattern matching, and focus on point-to-point communication, which is primarily utilized (over 90% of all MPI (message passing interface) calls) in most MPI codes and has much more impact on communication performance than collective communication does. In this work, our contribution is that we propose a new approach, MPACP (matching of parallel application communication patterns), to automate the analysis of the similarity between two parallel applications and provide a reliable report which will help users or developers understand the similarity among communication patterns of parallel applications. Furthermore, experimental results demonstrate the effective performance of our scheme in terms of the automatic matching of parallel application communication patterns.

2 citations


Cites background from "A low–level communication library f..."

  • ...Many teams include researchers and developers have realized that the communication patterns of parallel applications play a significant role in the performance [4], [5] and [6]....



Proceedings ArticleDOI
27 Aug 2009
TL;DR: The paper gives an overview of APJava and the pre-translation and basic translation schemes adopted in a translator for the APJava language.
Abstract: The paper introduces the APJava programming environment. APJava is an aspect-oriented parallel dialect of Java that imports HPJava-like arrays -- in particular the distributed arrays -- as new data structures. The main purpose of APJava is to provide an easy-to-use aspect-oriented parallel programming environment to engineers and scientists unfamiliar with parallel programming. The paper gives an overview of APJava and the pre-translation and basic translation schemes adopted in a translator for the APJava language.

1 citation




Book ChapterDOI
TL;DR: This paper introduces a non-blocking communication library to efficiently support specialized communication hardware, which provides the basis for a Java message-passing library to be implemented on top of it.
Abstract: This paper presents communication strategies for supporting efficient non-blocking Java communication on clusters. The communication performance is critical for the overall cluster performance. It is possible to use non-blocking communications to reduce the communication overhead. Previous efforts to efficiently support non-blocking communication in Java have led to the introduction of the Java NIO API. Although the Java NIO package addresses scalability issues by providing select() like functionality, it lacks support for high speed interconnects. To solve this issue, this paper introduces a non-blocking communication library to efficiently support specialized communication hardware. This library focuses on reducing the startup communication time, avoiding unnecessary copying, and overlapping computation with communication. This project provides the basis for a Java Message-passing library to be implemented on top of it. Towards the end, this paper evaluates the proposed approach on a Scalable Coherent Interface (SCI) and Gigabit Ethernet (GbE) testbed cluster. Experimental results show that the proposed library reduces the communication overhead and increases computation and communication overlapping.

1 citation


Cites methods from "A low–level communication library f..."

  • ...This is the case for mpjdev [4] used in HPJava [5] and of xdev used in a Java messaging-passing system, MPJ Express [6]....



References

01 Jan 2003
TL;DR: The paper describes the novel issues in the implementation of the device level library on different platforms, and gives comprehensive benchmark results on a parallel platform.
Abstract: Two characteristic run-time communication libraries of HPJava are developed as an application level library and device level library. A high-level communication API, Adlib, is developed as an application level communication library. This communication library supports collective operations on distributed arrays. The mpjdev API is a device level underlying communication library for HPJava. This library is developed to perform actual communication between processes. The paper describes the novel issues in the implementation of the device level library on different platforms, and gives comprehensive benchmark results on a parallel platform. All software developed in this project is available for free download from www.hpjava.org.

Procs2 p = new Procs2(P, P);
on(p) {
  Range x = new BlockRange(M, p.dim(0));
  Range y = new BlockRange(N, p.dim(1));
  float [[-,-]] a = new float [[x, y]],
                b = new float [[x, y]],
                c = new float [[x, y]];
  // ... initialize values in `a', `b'
  overall(i = x for :)
    overall(j = y for :)
      c[i, j] = a[i, j] + b[i, j];
}

11 citations


01 Jan 2003
TL;DR: The dissertation work concentrates on issues related to the development of efficient run-time support software for parallel languages extending an underlying object-oriented language, and gives comprehensive benchmark results on a parallel platform.
Abstract: The dissertation research is concerned with enabling parallel, high-performance computation, in particular the development of scientific software in the network-aware programming language, Java. Traditionally, this kind of computing was done in Fortran. Arguably, Fortran is becoming a marginalized language, with limited economic incentive for vendors to produce modern development environments, optimizing compilers for new hardware, or other kinds of associated software expected by today's programmers. Hence, Java looks like a very promising alternative for the future. The dissertation discusses in detail a particular environment called HPJava. HPJava is an environment for parallel programming, especially data-parallel scientific programming, in Java. HPJava is based around a small set of language extensions designed to support parallel computation with distributed arrays, plus a set of communication libraries. In particular the dissertation work concentrates on issues related to the development of efficient run-time support software for parallel languages extending an underlying object-oriented language. Two characteristic run-time communication libraries of HPJava are developed: an application-level library and a device-level library. A high-level communication API, Adlib, is developed as an application-level communication library suitable for HPJava. This communication library supports collective operations on distributed arrays. We include Java Object as one of the Adlib communication data types, so we fully support communication of intrinsic Java types, including primitive types, and Java object types. The Adlib library is developed on top of the low-level communication library called mpjdev, designed to interface efficiently to the Java execution environment (virtual machine). The mpjdev API is a device-level underlying communication library for HPJava. This library is developed to perform actual communication between processes. The mpjdev API is developed with HPJava in mind, but it is a standalone library and could be used by other systems. It can be implemented portably on network platforms and efficiently on parallel hardware. The dissertation describes the novel issues in the interface and implementation of these libraries on different platforms, and gives comprehensive benchmark results on a parallel platform. All software developed in this project is available for free download from www.hpjava.org.

6 citations