focus: developing scientific software
Understanding the High-Performance-Computing Community: A Software Engineer's Perspective

Victor R. Basili and Daniela Cruzes, University of Maryland, College Park, and Fraunhofer Center for Experimental Software Engineering-Maryland
Jeffrey C. Carver, Mississippi State University
Lorin M. Hochstein, University of Nebraska-Lincoln
Jeffrey K. Hollingsworth and Marvin V. Zelkowitz, University of Maryland, College Park
Forrest Shull, Fraunhofer Center for Experimental Software Engineering-Maryland
Computational scientists developing software for HPC systems face unique software engineering issues. Attempts to transfer SE technologies to this domain must take these issues into account.
For the past few years, we've had the opportunity, as software engineers, to observe the development of computational-science software (called codes) built for high-performance-computing (HPC) machines in many different contexts. Although we haven't studied all types of HPC development, we've encountered a wide cross-section of projects. Despite these projects' diversity, several common traits exist:
Many developers receive their software training from other scientists. Although the scientists often have been writing software for many years, they generally lack formal software engineering (SE) training, especially in managing multiperson development teams and complex software artifacts.

Many of the codes aren't initially designed to be large. They start small and then grow on the basis of their scientific success.

Many development teams use their own code (or code developed as part of their research group).
For these reasons (and many others), development practices in this community differ considerably from those in more "traditional" SE.

We aim here to distill our experience about how software engineers can productively engage the HPC community. Several SE practices generally considered good ideas in other development environments are quite mismatched to the HPC community's needs. For SE researchers, the keys to successful interactions include a healthy sense of humility and the avoidance of assumptions that SE expertise applies equally in all contexts.
Background
A list of the 500 fastest supercomputers (www.top500.org) shows that, as of November 2007, the most powerful system had 212,992 processors. Although a given application wouldn't routinely use all these processors, it would regularly use a high percentage of them for a single job. Effectively using tens of thousands of processors on a single project is considered normal in this community.
We were interested in codes requiring nontrivial communication among the individual processors throughout the execution. Although HPC systems have many uses, a common application is to simulate physical phenomena such as earthquakes, global climate change, or nuclear reactions. These codes must be written to explicitly harness HPC systems' parallelism. Although many parallel-programming models exist, the dominant model is MPI (message-passing interface), a library where the programmer explicitly specifies all communication. Fortran remains widely used for developing new HPC software, as do C and C++. Frequently, a single system incorporates multiple programming languages. We even saw several projects use dynamic languages such as Python to couple different modules written in a mix of Fortran, C, and C++.
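To make the programming model concrete, here is a minimal sketch of ours (not drawn from any code in the article) of the explicit style MPI imposes in C: the programmer spells out each message's source, destination, count, datatype, and tag.

/* Minimal illustrative MPI program (a sketch, not from the studied codes). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double result = 42.0;   /* stand-in for locally computed data */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        /* Rank 1 explicitly sends one double to rank 0 with tag 0. */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* Rank 0 must post a matching receive; nothing moves implicitly. */
        MPI_Recv(&result, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 received %.1f from rank 1\n", result);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two processes (for example, mpirun -np 2 ./a.out). Every such exchange must be coded by hand, which is where much of the development effort described below goes.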
In 2004, Darpa launched the High Productivity Computing Systems program (HPCS, www.highproductivity.org) to significantly advance HPC technology by supporting vendor efforts to develop next-generation systems, focusing on both hardware and software issues. Darpa also funded researchers to develop productivity evaluation methods that measure scientific output more realistically than does simple processor utilization, the measure used by the Top500 list. Our initial role was to evaluate how newly proposed languages affect programmer productivity. In addition, one of us helped conduct a series of case studies of existing HPC projects in government labs to characterize these projects and document lessons learned.
The HPCS program's significance was its shift in emphasis from execution time to time-to-solution, which incorporates both development and execution time. We began this research by running controlled experiments to measure the impact of different parallel-programming models. Because the proposed languages weren't yet usable, we studied available technologies such as MPI, OpenMP, UPC (Unified Parallel C), Co-Array Fortran, and Matlab*P, using students in parallel-programming courses from eight different universities.1
To widen this research's scope, we collected "folklore"—that is, the community's tacit, unformalized view of what's true. We collected it first through a focus group of HPC researchers, then by surveying HPC practitioners involved in the HPCS program, and then by interviewing a sampling of practitioners including academic researchers, technologists developing new HPC systems, and project managers. Finally, we conducted case studies of projects at both US government labs2 and academic labs.3
The development world of the computational scientist

To understand why certain SE technologies are a poor fit for computational scientists, it's important to first understand the scientists' world and the constraints it places on them. Overall, we found that there's no such thing as a single "HPC community." Our research was restricted entirely to computational scientists using HPC systems to run simulations. Despite this narrow focus, we saw enormous variation, especially in the kinds of problems that people are using HPC systems to solve. Table 1 shows four of the many attributes that vary across the HPC community.

Table 1. HPC community attributes (attribute, values, and description)

Team size
Individual: This scenario, sometimes called the "lone researcher" scenario, involves only one developer.
Large: This scenario involves "community codes" with multiple groups, possibly geographically distributed.

Code life
Short: A code that's executed few times (for example, one from the intelligence community) might trade less development time (less time spent on performance and portability) for more execution time.
Long: A code that's executed many times (for example, a physics simulation) will likely spend more time in development (to increase portability and performance) and amortize that time over many executions.

Users
Internal: Only developers use the code.
External: The code is used by other groups in the organization (for example, at US government labs) or sold commercially (for example, Gaussian, www.gaussian.com).
Both: "Community codes" are used both internally and externally. Version control is more complex in this case because both a development and a release version must be maintained.
The goal of scientists is to do science, not execute software

One possible measure of productivity is scientifically useful results over calendar time. This implies sufficient simulated time and resolution, plus sufficient accuracy of the physical models and algorithms. (All quotes are from interviews with scientists unless otherwise noted.)

[Floating-point operations per second] rates are not a useful measure of science achieved.—user talk, IBM scientific users group conference4
Initially, we believed that performance was of paramount importance to scientists developing on HPC systems. However, after in-depth interviews, we found that scientific researchers focus on producing publishable results. Writing codes that perform efficiently on HPC systems is a means to an end, not an end in itself. Although this point might sound obvious, we feel that many in the HPC community overlook it.

A scientist's goal is to produce new scientific knowledge. So, if scientists can execute their computational simulations using the time and resources allocated to them on the HPC system, they see no need for or benefit from optimizing the performance. They see the need for optimization only when they can't complete the simulation at the desired fidelity with the allocated resources. When optimization is necessary, it's often broad-based, not only including traditional computer science notions of code tuning and algorithm modification but also rethinking the underlying mathematical approximations and potentially fundamentally changing the computation. So, technologies that focus only on code tuning are of somewhat limited utility to this community.

Computational scientists don't view performance gains in the same way as computer scientists. For example, one of us (trained in computer science) improved a code's performance by more than a factor of two. He expected this improvement would save computing time. Instead, when he informed the computational scientist, the scientist responded that the saved time could be used to add more function—that is, to get a higher-fidelity approximation of the problem being solved.

Conclusion: Scientists make decisions based on maximizing scientific output, not program performance.
Performance vs. portability and maintainability

If somebody said, maybe you could get 20 percent [performance improvement] out of it, but you have to do quite a bit of a rewrite, and you have to do it in such a way that it becomes really ugly and unreadable, then maintainability becomes a real problem.

I don't think we would ever do anything for 20 percent. The number would have to be between 2x and an order of magnitude.

Readability is critical in these codes: describe the algorithms in a mathematical language as opposed to a computer language.

Scientists must balance performance and development effort. We saw a preference for technologies that let scientists control the performance to the level needed for their science, even by sacrificing abstraction and ease of programming. Hence their extensive use of C and Fortran, which offer more predictable performance and less abstraction than higher-level programming languages.

Conversely, the scientists aren't driven entirely by performance. They won't sacrifice significant maintainability for modest performance improvements. Because the codes must run on multiple current and future HPC systems, portability is a major concern. Codes must run efficiently on multiple machines. Application scientists aren't interested in performing machine-specific performance tuning because they'll lose the benefits of their efforts when they port the code to the next platform. In addition, source code changes that improve performance typically make code more difficult to understand, creating a disincentive to make certain kinds of performance improvements.

Conclusion: Scientists want the control to increase performance as necessary but won't sacrifice everything to performance.
Verification and validation for scientific codes

Testing is different. … It's very much a qualitative judgment about how an algorithm is actually performing in a mathematical sense. Finally, when the thing is working in a satisfactory way—say, in a single component—you may then go and run it in a coupled application, and you'll find out there are some features you didn't understand that came about in a coupled application and you need to go back and think about those.
Simulation software commonly produces an approximation to a set of equations that can't be solved exactly. You can think of this development as a two-step process: translating the problem to an algorithm and translating the algorithm to code. You can evaluate these approximations (mapping a problem to an algorithm) qualitatively on the basis of possessing desirable properties (for example, stability) and ensuring that various conservation laws hold (for example, that energy is conserved). The approximations' required precision depends on the nature of the phenomenon you're simulating. For example, new problems can arise when you integrate approximations of a system's different aspects. Suddenly, an approximation that was perfectly adequate for standalone use might not be good enough for the integrated simulation. Identifying and evaluating an algorithm's quality is a challenge. One scientist we spoke with said that algorithmic defects are much more significant than coding defects.
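As a toy illustration of this kind of qualitative check (ours, not from the article), the following C sketch monitors a conserved quantity (the energy of a harmonic oscillator) and flags drift beyond a tolerance; a failure points at the algorithm or its parameters rather than at a coding defect.

/* Illustrative sketch only: check that a conserved quantity stays within
   a relative tolerance over the run (here, a unit harmonic oscillator
   advanced with a symplectic Euler step). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 1.0, v = 0.0;            /* position, velocity */
    const double dt = 1e-3, tol = 1e-2; /* step size, allowed relative drift */
    const double e0 = 0.5 * (v * v + x * x);

    for (int step = 0; step < 100000; step++) {
        v -= x * dt;                    /* symplectic (semi-implicit) Euler */
        x += v * dt;
        double drift = fabs(0.5 * (v * v + x * x) - e0) / e0;
        if (drift > tol) {
            printf("step %d: relative energy drift %.2e exceeds %.0e\n",
                   step, drift, tol);
            return 1;
        }
    }
    printf("energy conserved to within %.0e over the run\n", tol);
    return 0;
}

Real codes apply the same idea to mass, momentum, or energy budgets of much larger systems, where deciding how much drift is acceptable is exactly the qualitative judgment the scientists describe.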
Validating simulation codes is an enormous challenge. In principle, you can validate a code by comparing the simulation output with a physical experiment's results. In practice, because simulations are written for domains in which experiments are prohibitively expensive or impossible, validation is very difficult. Entire scientific programs, costing hundreds of millions of dollars per year for many years, have been built around experimental validation of large codes.

Conclusion: Debugging and validation are qualitatively different for HPC than for traditional software development.
Skepticism of new technologies
I hate MPI, I hate C++. [But] if I had to choose again, I would probably choose the same.

Our codes are much larger and more complex than the "toy" programs normally used in [classroom settings]. We would like to see a number of large workhorse applications converted and benchmarked.

The scientists have a cynical view of new technologies because the history of HPC is littered with new technologies that promised increased scientific productivity but are no longer available. Some of this skepticism is also due to the long life of HPC codes; frequently, a code will have a 30-year life cycle. Because of this long life, scientists will embrace a new technology only if they believe it will survive in the long term. This explains MPI's widespread popularity, despite constant grumbling about its difficulty.
Scientific programmers often develop code such that they can plug in different technologies to evaluate them. For example, when MPI was new in the 1990s, many groups were cautious about its long-term prospects and added it to their code alongside existing message-passing libraries. As MPI became widely used and trusted, these older libraries were retired. Similar patterns have been observed with solver libraries, I/O libraries, and tracing tools.
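A minimal sketch of this coexistence pattern (ours, with a hypothetical legacy library, not code from the studied projects): the application calls one thin interface, and a build flag selects which message-passing technology sits behind it, so a new library can be evaluated without committing to it.

/* comm_broadcast(): one interface, two interchangeable back ends.
   The legacy_msg.h header and legacy_bcast() call are hypothetical. */
#ifdef USE_MPI
#include <mpi.h>

void comm_broadcast(double *buf, int count)
{
    MPI_Bcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
#else
#include "legacy_msg.h"   /* older message-passing library kept alongside MPI */

void comm_broadcast(double *buf, int count)
{
    legacy_bcast(buf, count, /*root=*/0);
}
#endif

Building with -DUSE_MPI switches the whole code over; once the new technology has earned trust, the old branch can be deleted, which is how the scientists describe retiring their pre-MPI libraries.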
The languages being developed in the Darpa HPCS program were intended to extend the frontiers of what's possible in today's machines. So, we sought practitioners working on very large codes running on very large machines. Because of the time they've already invested in their codes and their need for long-lived codes, they all expressed great trepidation at the prospect of rewriting a code in a new language.

Conclusion: A new technology that can coexist with older ones has a greater chance of success than one requiring complete buy-in at the beginning.
Shared, centralized computing resources
The problem with debugging, of course, is that you want to rerun and rerun. The whole concept of a batch queue would make that a week-long process. Whereas, on a dedicated weekend, in a matter of hours you can pound out 10 or 20 different runs of enormous size and understand where the logic is going wrong.
Because of HPC systems' cost, complexity, and size, they're typically located at HPC centers and shared among user groups, with batch scheduling to coordinate executions. Users submit their jobs to a queue with a request for a certain number of processors and maximum execution time. This information is used to determine when to schedule the job. If the time estimate is too low, the job will be preemptively terminated; if it's too high, the job will wait in the queue longer than necessary.

Because these systems are shared resources, scientists are physically remote from the computers they use. So, potentially useful tools that were designed to be interactive become unusably slow and are soon discarded because they don't take into account the long latency times of remote connections. Unfortunately for scientists, using an HPC system typically means interacting with the batch queue.
Debugging batch-scheduled jobs is also tedious because the queue wait increases the turnaround time. Some systems provide "interactive nodes" that let users run smaller jobs without entering the batch queue. Unfortunately, some defects manifest themselves only when the code runs on large numbers of processors.

Center policies that use system utilization as a productivity metric might exacerbate the problem of the queue. Because utilization is inversely proportional to availability, policies that favor maximizing utilization will have longer waits.5 As a counterexample, Lincoln Laboratories provides interactive access to all users and purchases excess computing capacity to ensure that users' computational needs are met.2

Conclusion: Remote access precludes the use of certain software tools, and system access policies can significantly affect productivity.
Mismatches between computational science and SE

Repeatedly, we saw that SE technologies that don't take the scientists' constraints into account fail or aren't adopted. The computer science community isn't necessarily aware of this lesson. Software engineers collaborating with scientists should understand that the resistance to adoption of unfamiliar technologies is based on real experiences. For example, concepts such as CMMI aren't well matched to the incremental nature of HPC development.
Object-oriented languages
Java is for drinking.—parallel-programming course syllabus

Developers on a project said, "we're going to use class library X that will hide all our array operations and do all the right things." … Immediately, you ran into all sorts of issues. First of all, C++, for example, was not transportable because compilers work in different ways across these machines.

OO technologies are firmly entrenched in the SE community. But in the HPC community, C and Fortran still dominate, although C++ is used and one project was exploring the use of Java. We also saw some Python use, although never for performance-critical code.

Fortran-like Matlab has seen widespread adoption among scientists, although not necessarily in the HPC community. To date, OO hasn't been a good fit for HPC, even though the community has adopted some concepts. One reason for the lack of widespread adoption might be that OO-based languages such as C++ have been evolving much more rapidly than C and Fortran in recent years and are therefore riskier choices.

Conclusion: More study is needed to identify why OO has seen such little adoption and whether pockets exist in HPC where OO might be suitable.
Frameworks
If you talk about components in the Common Component Architecture or anywhere else, components make very myopic decisions. In order to achieve capability, you need to make global decisions. If you allow the components to make local decisions, performance isn't as good.

Frameworks provide programmers a higher level of abstraction, but at the cost of adopting the framework's perspective on how to structure the code. Example HPC frameworks include Pooma (Parallel Object-Oriented Methods and Applications), a novel C++ OO framework for writing parallel codes that hides the low-level details of parallelism, and CCA (Common Component Architecture), for implementing component-based HPC software.
Douglass Post and Richard Kendall tell how Los Alamos National Laboratory sought to modernize an old Fortran-based HPC code using Pooma.6 Even though the project spent over 50 percent of its code-development resources on Pooma, the framework was slower than the original Fortran code. It also lacked the flexibility of the lower-level parallel libraries to implement the desired physics.

The scientists in our studies don't use frameworks. Instead, they implement their own abstraction levels on top of MPI to hide low-level details, and they develop their own component architectures to couple their subsystems.
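The abstraction layers we saw take roughly the following shape; this is a hypothetical sketch of ours (the helper name and the 1D decomposition are illustrative), not code from any project we studied. The physics code calls a domain-level routine and never touches ranks, tags, or MPI datatypes directly.

/* Hypothetical domain-level wrapper over MPI: exchange one ghost cell with
   each neighbor in a 1D decomposition. u holds nlocal interior points plus
   a ghost cell at each end: u[0] and u[nlocal + 1]. */
#include <mpi.h>

void exchange_ghost_cells(double *u, int nlocal, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* MPI_PROC_NULL turns the boundary sends/receives into no-ops. */
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send the rightmost interior point right, receive the left ghost cell. */
    MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 0,
                 &u[0],      1, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);
    /* Send the leftmost interior point left, receive the right ghost cell. */
    MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                 &u[nlocal + 1], 1, MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);
}

Keeping the MPI calls behind a handful of such routines is what lets these groups port to a new machine, or swap the communication technology, without rewriting the physics.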
Of all the multiphysics applications we encountered, only one used any aspect of CCA technology, and one of that application's developers was an active member of the CCA initiative. When we

Scientists have yet to be convinced that reusing existing frameworks will save them more effort than building their own from scratch.

References

Software Development Environments for Scientific and Engineering Software: A Series of Case Studies (conference paper). Presents nine lessons learned from five representative projects, along with their software engineering implications, to provide insight into the software development environments in this domain.

Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers (conference paper). Instruments the development process used in multiple HPC classroom environments and analyzes data within and across such studies, varying factors such as the parallel-programming model used and the application being developed, to understand their impact on the development process.

Software Project Management and Quality Engineering Practices for Complex, Coupled Multiphysics, Massively Parallel Computational Simulations: Lessons Learned from ASCI (journal article). Presents lessons learned from a set of code projects that the Department of Energy National Nuclear Security Administration has sponsored to develop nuclear weapons simulations over the last 50 years.

The ASC-Alliance Projects: A Case Study of Large-Scale Parallel Scientific Code Development (journal article). Examines five large-scale computational-science software projects operated at the five ASC-Alliance centers to better understand the nature of software development in this context.