focus: developing scientific software
Understanding the High-Performance-Computing Community: A Software Engineer's Perspective

Victor R. Basili and Daniela Cruzes, University of Maryland, College Park, and Fraunhofer Center for Experimental Software Engineering-Maryland
Jeffrey C. Carver, Mississippi State University
Lorin M. Hochstein, University of Nebraska-Lincoln
Jeffrey K. Hollingsworth and Marvin V. Zelkowitz, University of Maryland, College Park
Forrest Shull, Fraunhofer Center for Experimental Software Engineering-Maryland
Computational scientists developing software for HPC systems face unique software engineering issues. Attempts to transfer SE technologies to this domain must take these issues into account.
For the past few years, we've had the opportunity, as software engineers, to observe the development of computational-science software (called codes) built for high-performance-computing (HPC) machines in many different contexts. Although we haven't studied all types of HPC development, we've encountered a wide cross-section of projects. Despite these projects' diversity, several common traits exist:
Many developers receive their software training from other scientists. Although the scientists often have been writing software for many years, they generally lack formal software engineering (SE) training, especially in managing multiperson development teams and complex software artifacts.

Many of the codes aren't initially designed to be large. They start small and then grow on the basis of their scientific success.

Many development teams use their own code (or code developed as part of their research group).
For these reasons (and many others), development practices in this community differ considerably from those in more "traditional" SE.

We aim here to distill our experience about how software engineers can productively engage the HPC community. Several SE practices generally considered good ideas in other development environments are quite mismatched to the HPC community's needs. For SE researchers, the keys to successful interactions include a healthy sense of humility and the avoidance of assumptions that SE expertise applies equally in all contexts.
Background
A list of the 500 fastest supercomputers (www.top500.org) shows that, as of November 2007, the most powerful system had 212,992 processors. Although a given application wouldn't routinely use all these processors, it would regularly use a high percentage of them for a single job. Effectively using tens of thousands of processors on a single project is considered normal in this community.
We were interested in codes requiring nontrivial communication among the individual processors throughout the execution. Although HPC systems have many uses, a common application is to simulate physical phenomena such as earthquakes, global climate change, or nuclear reactions. These codes must be written to explicitly harness HPC systems' parallelism. Although many parallel-programming models exist, the dominant model is MPI (message-passing interface), a library where the programmer explicitly specifies all communication. Fortran remains widely used for developing new HPC software, as do C and C++. Frequently, a single system incorporates multiple programming languages. We even saw several projects use dynamic languages such as Python to couple different modules written in a mix of Fortran, C, and C++.
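To make the programming model concrete, here is a minimal sketch of ours (not drawn from any code in the article) of the explicit style MPI imposes in C: the programmer spells out each message's source, destination, count, datatype, and tag.

/* Minimal illustrative MPI program (a sketch, not from the studied codes). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double result = 42.0;   /* stand-in for locally computed data */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        /* Rank 1 explicitly sends one double to rank 0 with tag 0. */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* Rank 0 must post a matching receive; nothing moves implicitly. */
        MPI_Recv(&result, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 received %.1f from rank 1\n", result);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two processes (for example, mpirun -np 2 ./a.out). Every such exchange must be coded by hand, which is where much of the development effort described below goes.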
In 2004, Darpa launched the High Productivity Computing Systems program (HPCS, www.highproductivity.org) to significantly advance HPC technology by supporting vendor efforts to develop next-generation systems, focusing on both hardware and software issues. Darpa also funded researchers to develop productivity evaluation methods that measure scientific output more realistically than does simple processor utilization, the measure used by the Top500 list. Our initial role was to evaluate how newly proposed languages affect programmer productivity. In addition, one of us helped conduct a series of case studies of existing HPC projects in government labs to characterize these projects and document lessons learned.
The HPCS program's significance was its shift in emphasis from execution time to time-to-solution, which incorporates both development and execution time. We began this research by running controlled experiments to measure the impact of different parallel-programming models. Because the proposed languages weren't yet usable, we studied available technologies such as MPI, OpenMP, UPC (Unified Parallel C), Co-Array Fortran, and Matlab*P, using students in parallel-programming courses from eight different universities.1
To widen this research's scope, we collected "folklore"—that is, the community's tacit, unformalized view of what's true. We collected it first through a focus group of HPC researchers, then by surveying HPC practitioners involved in the HPCS program, and then by interviewing a sampling of practitioners including academic researchers, technologists developing new HPC systems, and project managers. Finally, we conducted case studies of projects at both US government labs2 and academic labs.3
The development world of the computational scientist

To understand why certain SE technologies are a poor fit for computational scientists, it's important to first understand the scientists' world and the constraints it places on them. Overall, we found that there's no such thing as a single "HPC community." Our research was restricted entirely to computational scientists using HPC systems to run simulations. Despite this narrow focus, we saw enormous variation, especially in the kinds of problems that people are using HPC systems to solve. Table 1 shows four of the many attributes that vary across the HPC community.

Table 1. HPC community attributes (attribute, values, and description)

Team size
Individual: This scenario, sometimes called the "lone researcher" scenario, involves only one developer.
Large: This scenario involves "community codes" with multiple groups, possibly geographically distributed.

Code life
Short: A code that's executed few times (for example, one from the intelligence community) might trade less development time (less time spent on performance and portability) for more execution time.
Long: A code that's executed many times (for example, a physics simulation) will likely spend more time in development (to increase portability and performance) and amortize that time over many executions.

Users
Internal: Only developers use the code.
External: The code is used by other groups in the organization (for example, at US government labs) or sold commercially (for example, Gaussian, www.gaussian.com).
Both: "Community codes" are used both internally and externally. Version control is more complex in this case because both a development and a release version must be maintained.
The goal of scientists is to do science, not execute software

One possible measure of productivity is scientifically useful results over calendar time. This implies sufficient simulated time and resolution, plus sufficient accuracy of the physical models and algorithms. (All quotes are from interviews with scientists unless otherwise noted.)

[Floating-point operations per second] rates are not a useful measure of science achieved.—user talk, IBM scientific users group conference4
Initially, we believed that performance was of paramount importance to scientists developing on HPC systems. However, after in-depth interviews, we found that scientific researchers focus on producing publishable results. Writing codes that perform efficiently on HPC systems is a means to an end, not an end in itself. Although this point might sound obvious, we feel that many in the HPC community overlook it.

A scientist's goal is to produce new scientific knowledge. So, if scientists can execute their computational simulations using the time and resources allocated to them on the HPC system, they see no need for or benefit from optimizing the performance. They see the need for optimization only when they can't complete the simulation at the desired fidelity with the allocated resources. When optimization is necessary, it's often broad-based, not only including traditional computer science notions of code tuning and algorithm modification but also rethinking the underlying mathematical approximations and potentially fundamentally changing the computation. So, technologies that focus only on code tuning are of somewhat limited utility to this community.

Computational scientists don't view performance gains in the same way as computer scientists. For example, one of us (trained in computer science) improved a code's performance by more than a factor of two. He expected this improvement would save computing time. Instead, when he informed the computational scientist, the scientist responded that the saved time could be used to add more function—that is, to get a higher-fidelity approximation of the problem being solved.

Conclusion: Scientists make decisions based on maximizing scientific output, not program performance.
Performance vs. portability and maintainability

If somebody said, maybe you could get 20 percent [performance improvement] out of it, but you have to do quite a bit of a rewrite, and you have to do it in such a way that it becomes really ugly and unreadable, then maintainability becomes a real problem.

I don't think we would ever do anything for 20 percent. The number would have to be between 2x and an order of magnitude.

Readability is critical in these codes: describe the algorithms in a mathematical language as opposed to a computer language.

Scientists must balance performance and development effort. We saw a preference for technologies that let scientists control the performance to the level needed for their science, even by sacrificing abstraction and ease of programming. Hence their extensive use of C and Fortran, which offer more predictable performance and less abstraction than higher-level programming languages.

Conversely, the scientists aren't driven entirely by performance. They won't sacrifice significant maintainability for modest performance improvements. Because the codes must run on multiple current and future HPC systems, portability is a major concern. Codes must run efficiently on multiple machines. Application scientists aren't interested in performing machine-specific performance tuning because they'll lose the benefits of their efforts when they port the code to the next platform. In addition, source code changes that improve performance typically make code more difficult to understand, creating a disincentive to make certain kinds of performance improvements.

Conclusion: Scientists want the control to increase performance as necessary but won't sacrifice everything to performance.
Verification and validation for scientific codes

Testing is different. … It's very much a qualitative judgment about how an algorithm is actually performing in a mathematical sense. Finally, when the thing is working in a satisfactory way—say, in a single component—you may then go and run it in a coupled application, and you'll find out there are some features you didn't understand that came about in a coupled application and you need to go back and think about those.
Simulation software commonly produces an approximation to a set of equations that can't be solved exactly. You can think of this development as a two-step process: translating the problem to an algorithm and translating the algorithm to code. You can evaluate these approximations (mapping a problem to an algorithm) qualitatively on the basis of possessing desirable properties (for example, stability) and ensuring that various conservation laws hold (for example, that energy is conserved). The approximations' required precision depends on the nature of the phenomenon you're simulating. For example, new problems can arise when you integrate approximations of a system's different aspects. Suddenly, an approximation that was perfectly adequate for standalone use might not be good enough for the integrated simulation. Identifying and evaluating an algorithm's quality is a challenge. One scientist we spoke with said that algorithmic defects are much more significant than coding defects.
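As a toy illustration of this kind of qualitative check (ours, not from the article), the following C sketch monitors a conserved quantity (the energy of a harmonic oscillator) and flags drift beyond a tolerance; a failure points at the algorithm or its parameters rather than at a coding defect.

/* Illustrative sketch only: check that a conserved quantity stays within
   a relative tolerance over the run (here, a unit harmonic oscillator
   advanced with a symplectic Euler step). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 1.0, v = 0.0;            /* position, velocity */
    const double dt = 1e-3, tol = 1e-2; /* step size, allowed relative drift */
    const double e0 = 0.5 * (v * v + x * x);

    for (int step = 0; step < 100000; step++) {
        v -= x * dt;                    /* symplectic (semi-implicit) Euler */
        x += v * dt;
        double drift = fabs(0.5 * (v * v + x * x) - e0) / e0;
        if (drift > tol) {
            printf("step %d: relative energy drift %.2e exceeds %.0e\n",
                   step, drift, tol);
            return 1;
        }
    }
    printf("energy conserved to within %.0e over the run\n", tol);
    return 0;
}

Real codes apply the same idea to mass, momentum, or energy budgets of much larger systems, where deciding how much drift is acceptable is exactly the qualitative judgment the scientists describe.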
Validating simulation codes is an enormous challenge. In principle, you can validate a code by comparing the simulation output with a physical experiment's results. In practice, because simulations are written for domains in which experiments are prohibitively expensive or impossible, validation is very difficult. Entire scientific programs, costing hundreds of millions of dollars per year for many years, have been built around experimental validation of large codes.

Conclusion: Debugging and validation are qualitatively different for HPC than for traditional software development.
Skepticism of new technologies
I hate MPI, I hate C++. [But] if I had to choose again, I would probably choose the same.

Our codes are much larger and more complex than the "toy" programs normally used in [classroom settings]. We would like to see a number of large workhorse applications converted and benchmarked.

The scientists have a cynical view of new technologies because the history of HPC is littered with new technologies that promised increased scientific productivity but are no longer available. Some of this skepticism is also due to the long life of HPC codes; frequently, a code will have a 30-year life cycle. Because of this long life, scientists will embrace a new technology only if they believe it will survive in the long term. This explains MPI's widespread popularity, despite constant grumbling about its difficulty.
Scientific programmers often develop code such that they can plug in different technologies to evaluate them. For example, when MPI was new in the 1990s, many groups were cautious about its long-term prospects and added it to their code alongside existing message-passing libraries. As MPI became widely used and trusted, these older libraries were retired. Similar patterns have been observed with solver libraries, I/O libraries, and tracing tools.
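A minimal sketch of this coexistence pattern (ours, with a hypothetical legacy library, not code from the studied projects): the application calls one thin interface, and a build flag selects which message-passing technology sits behind it, so a new library can be evaluated without committing to it.

/* comm_broadcast(): one interface, two interchangeable back ends.
   The legacy_msg.h header and legacy_bcast() call are hypothetical. */
#ifdef USE_MPI
#include <mpi.h>

void comm_broadcast(double *buf, int count)
{
    MPI_Bcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
#else
#include "legacy_msg.h"   /* older message-passing library kept alongside MPI */

void comm_broadcast(double *buf, int count)
{
    legacy_bcast(buf, count, /*root=*/0);
}
#endif

Building with -DUSE_MPI switches the whole code over; once the new technology has earned trust, the old branch can be deleted, which is how the scientists describe retiring their pre-MPI libraries.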
The languages being developed in the Darpa HPCS program were intended to extend the frontiers of what's possible in today's machines. So, we sought practitioners working on very large codes running on very large machines. Because of the time they've already invested in their codes and their need for long-lived codes, they all expressed great trepidation at the prospect of rewriting a code in a new language.

Conclusion: A new technology that can coexist with older ones has a greater chance of success than one requiring complete buy-in at the beginning.
Shared, centralized computing resources
The problem with debugging, of course, is that you want to rerun and rerun. The whole concept of a batch queue would make that a week-long process. Whereas, on a dedicated weekend, in a matter of hours you can pound out 10 or 20 different runs of enormous size and understand where the logic is going wrong.
Because of HPC systems' cost, complexity, and size, they're typically located at HPC centers and shared among user groups, with batch scheduling to coordinate executions. Users submit their jobs to a queue with a request for a certain number of processors and maximum execution time. This information is used to determine when to schedule the job. If the time estimate is too low, the job will be preemptively terminated; if it's too high, the job will wait in the queue longer than necessary.

Because these systems are shared resources, scientists are physically remote from the computers they use. So, potentially useful tools that were designed to be interactive become unusably slow and are soon discarded because they don't take into account the long latency times of remote connections. Unfortunately for scientists, using an HPC system typically means interacting with the batch queue.
Debugging batch-scheduled jobs is also tedious because the queue wait increases the turnaround time. Some systems provide "interactive nodes" that let users run smaller jobs without entering the batch queue. Unfortunately, some defects manifest themselves only when the code runs on large numbers of processors.

Center policies that use system utilization as a productivity metric might exacerbate the problem of the queue. Because utilization is inversely proportional to availability, policies that favor maximizing utilization will have longer waits.5 As a counterexample, Lincoln Laboratories provides interactive access to all users and purchases excess computing capacity to ensure that users' computational needs are met.2

Conclusion: Remote access precludes the use of certain software tools, and system access policies can significantly affect productivity.
Mismatches between computational science and SE

Repeatedly, we saw that SE technologies that don't take the scientists' constraints into account fail or aren't adopted. The computer science community isn't necessarily aware of this lesson. Software engineers collaborating with scientists should understand that the resistance to adoption of unfamiliar technologies is based on real experiences. For example, concepts such as CMMI aren't well matched to the incremental nature of HPC development.
Object-oriented languages
Java is for drinking.—parallel-programming course syllabus

Developers on a project said, "we're going to use class library X that will hide all our array operations and do all the right things." … Immediately, you ran into all sorts of issues. First of all, C++, for example, was not transportable because compilers work in different ways across these machines.

OO technologies are firmly entrenched in the SE community. But in the HPC community, C and Fortran still dominate, although C++ is used and one project was exploring the use of Java. We also saw some Python use, although never for performance-critical code.

Fortran-like Matlab has seen widespread adoption among scientists, although not necessarily in the HPC community. To date, OO hasn't been a good fit for HPC, even though the community has adopted some concepts. One reason for the lack of widespread adoption might be that OO-based languages such as C++ have been evolving much more rapidly than C and Fortran in recent years and are therefore riskier choices.

Conclusion: More study is needed to identify why OO has seen such little adoption and whether pockets exist in HPC where OO might be suitable.
Frameworks
If you talk about components in the Common Component Architecture or anywhere else, components make very myopic decisions. In order to achieve capability, you need to make global decisions. If you allow the components to make local decisions, performance isn't as good.

Frameworks provide programmers a higher level of abstraction, but at the cost of adopting the framework's perspective on how to structure the code. Example HPC frameworks include Pooma (Parallel Object-Oriented Methods and Applications), a novel C++ OO framework for writing parallel codes that hides the low-level details of parallelism, and CCA (Common Component Architecture), for implementing component-based HPC software.
Douglass Post and Richard Kendall tell how Los Alamos National Laboratory sought to modernize an old Fortran-based HPC code using Pooma.6 Even though the project spent over 50 percent of its code-development resources on Pooma, the framework was slower than the original Fortran code. It also lacked the flexibility of the lower-level parallel libraries to implement the desired physics.

The scientists in our studies don't use frameworks. Instead, they implement their own abstraction levels on top of MPI to hide low-level details, and they develop their own component architectures to couple their subsystems.
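The abstraction layers we saw take roughly the following shape; this is a hypothetical sketch of ours (the helper name and the 1D decomposition are illustrative), not code from any project we studied. The physics code calls a domain-level routine and never touches ranks, tags, or MPI datatypes directly.

/* Hypothetical domain-level wrapper over MPI: exchange one ghost cell with
   each neighbor in a 1D decomposition. u holds nlocal interior points plus
   a ghost cell at each end: u[0] and u[nlocal + 1]. */
#include <mpi.h>

void exchange_ghost_cells(double *u, int nlocal, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* MPI_PROC_NULL turns the boundary sends/receives into no-ops. */
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send the rightmost interior point right, receive the left ghost cell. */
    MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 0,
                 &u[0],      1, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);
    /* Send the leftmost interior point left, receive the right ghost cell. */
    MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                 &u[nlocal + 1], 1, MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);
}

Keeping the MPI calls behind a handful of such routines is what lets these groups port to a new machine, or swap the communication technology, without rewriting the physics.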
Of all the multiphysics applications we encountered, only one used any aspect of CCA technology, and one of that application's developers was an active member of the CCA initiative. When we

Scientists have yet to be convinced that reusing existing frameworks will save them more effort than building their own from scratch.

References

Software Development Environments for Scientific and Engineering Software: A Series of Case Studies (conference paper). Presents nine lessons learned from five representative projects, along with their software engineering implications, to provide insight into the software development environments in this domain.

Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers (conference paper). Instruments the development process used in multiple HPC classroom environments and analyzes data within and across such studies, varying factors such as the parallel-programming model used and the application being developed, to understand their impact on the development process.

Software Project Management and Quality Engineering Practices for Complex, Coupled Multiphysics, Massively Parallel Computational Simulations: Lessons Learned from ASCI (journal article). Presents lessons learned from a set of code projects that the Department of Energy National Nuclear Security Administration has sponsored to develop nuclear weapons simulations over the last 50 years.

The ASC-Alliance Projects: A Case Study of Large-Scale Parallel Scientific Code Development (journal article). Examines five large-scale computational-science software projects operated at the five ASC-Alliance centers to better understand the nature of software development in this context.