Modeling Particle Accelerators using C++ and the POOMA Framework

 

Graham A. Mark, William F. Humphrey, Julian C. Cummings, Timothy J. Cleland, Robert D. Ryne, and Salman Habib

 

Los Alamos National Laboratory, Los Alamos, NM

 

Introduction

 

This paper concerns the use of C++ and the POOMA Framework [Reynders et al., 1996] to model high-intensity particle accelerators. This work is part of the Computational Accelerator Physics Grand Challenge, sponsored by the U.S. Department of Energy. Another paper in this conference, "The DOE Grand Challenge in Computational Accelerator Physics," describes the goals and progress to date of the project.

 

This Grand Challenge project requires implementation of well-known numerical methods in electromagnetic particle-in-cell simulation on the latest parallel computer architectures, development of alternative computational approaches, and smooth interaction of multiple physics packages. For these reasons, we have chosen to use an object-oriented design for our linear accelerator code. By structuring our model in terms of abstractions ("objects" and "classes") relevant to accelerator physics, we can develop code that is relatively easy to understand, maintain and extend.

 

C++ Language Features

 

C++ has many features that make it an attractive language for object-oriented scientific application codes. C++ classes provide the means to define abstractions relevant to a particular problem domain. These classes can contain both data and methods that act on the data. The contents of a class may be either hidden from or visible to other code modules, a device that allows that class's internals to be encapsulated and its external interface to be fixed. In addition, C++ classes can be developed in a hierarchy, so that a child class inherits properties of parent classes. Inheritance can greatly aid in adapting and specializing an existing class for a new purpose.

 

Classes can be created to describe the features and behaviors of physical objects such as charged particles, particle beams, and beamline elements. Similarly, physics entities such as electric and magnetic fields and computational objects such as grids for spatial discretization can be represented by classes. A specific accelerator model may be constructed by creating the appropriate objects with appropriate internal state. This approach leads to a tremendous amount of flexibility and code reuse.

 

In addition to the object model of programming, C++ offers the capabilities of polymorphism and generic programming. Polymorphism allows an object to specify its behavior or characteristics when a particular function acts on it. This permits a very flexible style of coding in which methods are invoked on heterogeneous collections of objects; each object determines how the methods are to do their tasks. Polymorphism is achieved in C++ through virtual member functions, overloaded functions and operators, and the definition of "traits" classes using C++ templates. Templates underlie generic programming. They allow the C++ programmer to parameterize a class or function with an unspecified type. Specifying the parametric type creates a particular kind of object (an "instantiation"); different instantiations result from different parametric types. This technique allows the same piece of code to be reused in many different settings. The judicious use of templates and generic programming can produce compact but highly powerful code.
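
The two mechanisms described above can be illustrated with a minimal sketch. Every name in it (Species, Proton, Electron, totalCharge, Vec) is hypothetical and chosen only for illustration; none comes from POOMA or from the accelerator code, and the sketch uses modern C++ (C++14) rather than the dialect available when this paper was written.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Run-time polymorphism: each derived class decides what the call returns.
struct Species {
    virtual ~Species() = default;
    virtual double chargeInUnitsOfE() const = 0;
};

struct Proton : Species {
    double chargeInUnitsOfE() const override { return +1.0; }
};

struct Electron : Species {
    double chargeInUnitsOfE() const override { return -1.0; }
};

// The same call is made on a heterogeneous collection of objects;
// the virtual function dispatches to the right implementation.
double totalCharge(const std::vector<std::unique_ptr<Species>>& bunch) {
    double q = 0.0;
    for (const auto& p : bunch) {
        assert(p != nullptr);
        q += p->chargeInUnitsOfE();
    }
    return q;
}

// Generic programming: one template, parameterized by element type and
// length. Each distinct <T, N> pair is a separate instantiation.
template <class T, std::size_t N>
struct Vec {
    T data[N];

    T dot(const Vec& other) const {
        T sum = T();
        for (std::size_t i = 0; i < N; ++i) sum += data[i] * other.data[i];
        return sum;
    }
};
```

For example, Vec<int, 3> and Vec<double, 2> are two unrelated types generated from the same source text, while a vector holding both Proton and Electron objects is summed through a single call site.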

 

Parallel Programming and POOMA

 

These language features can be used to address a problem known as the "Parallel Platform Paradox": the time it takes to develop a typical physics application code for the latest supercomputer roughly equals the lifetime of that computer. This problem exists because custom code must be used if one is to exploit the novel features of the latest supercomputer. This custom code may include such things as message passing or load balancing algorithms, as well as architecture-specific data structures or numerical optimizations. Learning about the new system and writing the code takes time, however, and the newest supercomputer rather quickly becomes obsolete.

 

Any given problem domain requires certain commonly used data structures and operations. Representing these structures and operations efficiently in a program often requires substantial amounts of optimized architecture-specific code. It would make sense to collect these data structures and operations in a C++ class library, where the optimization and custom coding for the target machine would be done just once. Classes in the library would provide interfaces for the structures and operations and would simultaneously encapsulate them, keeping custom code out of the application code.

 

Another area that often requires custom code is parallelism. All modern supercomputers rely on parallel processing of some form. Machines differ, however, in exactly how they undertake parallelism and how their parallel architecture is best exploited. C++'s encapsulation can hide an algorithm's implementation, whether parallel or serial. Using a library of suitably encapsulated algorithms, the application developer can construct a physics model without worrying about the exact target architecture. The resulting code should be both portable and efficient.

 

POOMA, an acronym for "Parallel Object-Oriented Methods and Applications," is a C++ class library designed to provide all of these services, and thus to resolve the Parallel Platform Paradox. Application code that relies on POOMA can be compiled and run without any change wherever POOMA itself exists: on a parallel supercomputer, on a workstation cluster, or on a desktop system. The problems of porting code and of efficient exploitation of each computer system become problems for the maintainers of the POOMA Framework. The person writing application code can concentrate on physics rather than on machine-specific programming. This division of labor speeds the development of new applications and broadcasts code optimizations across a rather broad class of physics codes.

 

Object-Oriented Accelerator Model

 

We began with a High Performance Fortran program written by R. Ryne and S. Habib. The program models transport of an intense charged particle beam in a magnetic quadrupole channel. The central routines of the code follow a collection of particles moving through successive elements in the channel. The electromagnetic field of each element, and the beam's self-field, affect the particles' positions and momenta. The program's major computational job is integrating the particles forward in time through each of the beamline elements using the charged-particle equations of motion in an electromagnetic field.

 

We defined classes that correspond to the main entities in the model: a class BeamlineElement, with subclasses Drift and Quadrupole to describe specific types of elements; a class called Beamline, consisting of a collection of BeamlineElement objects; and a Beam class that consists of a set of charged particles. To tie it all together, we created an Accelerator class that contains a Beamline and a Beam and describes our complete physical system.
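
The relationships among these classes can be sketched as follows. Only the class names come from the text; the member functions, the Particle struct, the slope-based coordinates, and the thin-lens treatment of the quadrupole are assumptions made to keep the sketch short. It is written in standalone modern C++ (C++14), does not use POOMA, and ignores the beam's self-field, so it should not be read as the actual implementation.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical transverse coordinates of one particle: positions (x, y)
// and slopes (xp, yp) with respect to the beam axis.
struct Particle {
    double x, xp, y, yp;
};

// A Beam is a set of charged particles.
class Beam {
public:
    explicit Beam(std::vector<Particle> particles)
        : particles_(std::move(particles)) {}
    std::vector<Particle>& particles() { return particles_; }
    const std::vector<Particle>& particles() const { return particles_; }
private:
    std::vector<Particle> particles_;
};

// Common interface of all beamline elements.
class BeamlineElement {
public:
    explicit BeamlineElement(double length) : length_(length) {
        assert(length >= 0.0);
    }
    virtual ~BeamlineElement() = default;
    double length() const { return length_; }
    // Apply this element's external field to every particle in the beam.
    virtual void advance(Beam& beam) const = 0;
private:
    double length_;
};

// Field-free region: positions change, slopes do not.
class Drift : public BeamlineElement {
public:
    explicit Drift(double length) : BeamlineElement(length) {}
    void advance(Beam& beam) const override {
        for (Particle& p : beam.particles()) {
            p.x += length() * p.xp;
            p.y += length() * p.yp;
        }
    }
};

// Quadrupole in the thin-lens approximation (an assumption of this sketch):
// focusing in x, defocusing in y, with integrated strength k * length.
class Quadrupole : public BeamlineElement {
public:
    Quadrupole(double length, double k) : BeamlineElement(length), k_(k) {}
    void advance(Beam& beam) const override {
        for (Particle& p : beam.particles()) {
            p.xp -= k_ * length() * p.x;
            p.yp += k_ * length() * p.y;
        }
    }
private:
    double k_;
};

// A Beamline is an ordered collection of BeamlineElement objects.
class Beamline {
public:
    void append(std::unique_ptr<BeamlineElement> element) {
        elements_.push_back(std::move(element));
    }
    void advance(Beam& beam) const {
        for (const auto& element : elements_) element->advance(beam);
    }
private:
    std::vector<std::unique_ptr<BeamlineElement>> elements_;
};

// The complete system: a Beamline plus the Beam travelling through it.
class Accelerator {
public:
    Accelerator(Beamline line, Beam beam)
        : line_(std::move(line)), beam_(std::move(beam)) {}
    void run() { line_.advance(beam_); }
    const Beam& beam() const { return beam_; }
private:
    Beamline line_;
    Beam beam_;
};
```

A new element type would be added by deriving another class from BeamlineElement and overriding advance(); Beamline and Accelerator would not need to change.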

 

Some of these classes (Beamline and BeamlineElement, for example) are useful only in an accelerator code, and we defined them from scratch for this project. Others, like the Beam class, rely on concepts that are useful in other kinds of physics applications. This is precisely the sort of general physics-based abstraction that POOMA provides. POOMA has a base class, ParticleBase, from which our Beam class was derived. ParticleBase provides a minimal description of a particle collection (a position and ID number for each particle), along with interfaces for a variety of useful operations such as data-parallel computations and interpolation to and from a grid. The Beam class inherits these features and adds data specific to our charged-particle representation.

 

Another POOMA class of this sort is the Field class, which represents a multidimensional array. The Field class provides several characteristics usually expected of field quantities in physics models, such as built-in boundary conditions, existence on a discretized mesh, and the ability to have scalar, vector or tensor fields. Moreover, the POOMA Field supports array syntax, stencil operations, differential operators, and reductions. In the accelerator code, we use the Field class to represent charge density, electrostatic potential, and the electric field. POOMA also contains an FFT class that operates on Fields and is used extensively within the field solver portion of the code.

 

POOMA's ParticleBase and Field classes contain parallel data structures, which are automatically distributed across processors. By using these classes, we avail ourselves of the many data-parallel operations that are built into POOMA. We can compile and run our code without change on any platform to which POOMA has been ported; POOMA will utilize that particular hardware and parallel system as efficiently as possible. In addition to this portable parallelism, POOMA applications such as ours can take advantage of the many built-in features of the physics-based abstractions contained in the POOMA Framework.

 

Performance Issues

 

Despite all of these benefits, the use of C++ in general and of POOMA in particular would make no sense if the performance of the resulting code were substantially worse than the performance of equivalent custom-coded Fortran. Until very recently, numerical codes written in C++ did not perform well in comparison to equivalent Fortran, but the situation is rapidly changing [Veldhuizen, 1997]. One reason for the poor performance of C++ has been the absence of good optimizing compilers. The KCC compiler from Kuck and Associates, Inc. (KAI) has filled that gap well, and other good optimizing compilers that are fully compliant with the ANSI C++ standard are on the horizon.

 

Another cause of poor performance is inherent in the C++ language. Consider the following code example:

 

class Matrix { /*...*/ };

Matrix A, B, C, D;

/* ... */

A = B + C + D;

 

Suppose that class Matrix overloads the operators "+" and "=" to perform elementwise addition and assignment. The final line will be evaluated in a series of binary operations. These will involve temporary Matrix objects that store intermediate results: tmp1 = B + C; tmp2 = tmp1 + D; A = tmp2. Creation and destruction of temporary objects can severely degrade performance, especially if each object contains a lot of data.

 

This problem has been recognized for some time, and various attempts have been made to solve it. The best solution to date is "expression templates" [Veldhuizen, 1995], a flexible and general device that avoids the creation of temporaries. POOMA relies heavily on expression templates to optimize data-parallel expressions involving particles and fields. POOMA applications thereby retain the benefits of overloaded operators with no loss in performance.
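
The idea behind expression templates can be shown with a minimal sketch. It is not POOMA's implementation: the names Expr, Sum, and Array are hypothetical, Array is a one-dimensional stand-in for the Matrix class above, and only addition is supported. The point is that operator+ builds a lightweight object describing the sum, and the arithmetic happens in a single loop inside the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base class (curiously recurring template pattern) that marks any type
// usable as an elementwise expression.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing "lhs + rhs". It stores references only and computes
// each element on demand, so no array-sized temporary is created.
template <class L, class R>
class Sum : public Expr<Sum<L, R>> {
public:
    Sum(const L& lhs, const R& rhs) : lhs_(lhs), rhs_(rhs) {
        assert(lhs.size() == rhs.size());
    }
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }
    std::size_t size() const { return lhs_.size(); }
private:
    const L& lhs_;
    const R& rhs_;
};

class Array : public Expr<Array> {
public:
    explicit Array(std::size_t n, double value = 0.0) : data_(n, value) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assignment from any expression: one loop over the elements.
    template <class E>
    Array& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }
private:
    std::vector<double> data_;
};

// B + C returns a Sum<Array, Array>; (B + C) + D returns a
// Sum<Sum<Array, Array>, Array>. Neither holds any element data.
template <class L, class R>
Sum<L, R> operator+(const Expr<L>& lhs, const Expr<R>& rhs) {
    return Sum<L, R>(lhs.self(), rhs.self());
}
```

With these definitions, A = B + C + D compiles to the equivalent of a single loop computing A[i] = B[i] + C[i] + D[i]. Because the Sum nodes hold references, this sketch is safe only when the expression is consumed within the same statement; storing it in a named variable for later use would leave dangling references.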

 

Project Status

 

Our goal is to produce a "dimension-independent" linear accelerator model capable of simulating beam behavior for a variety of beamline elements. We will use classes that are parameterized by dimension using C++ templates. This means that a single code base will support both 2D and 3D models. (Other dimensionalities are formally possible but have little practical use.) POOMA provides classes templated on dimension, so our accelerator code can use this feature and derive templated classes from POOMA classes as needed.
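
A minimal sketch of what parameterizing by dimension can look like is given below. The Particle struct and the drift function are hypothetical and standalone; they do not use POOMA's dimension-templated classes, whose interfaces are not described in this paper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Particle coordinates in Dim spatial dimensions.
template <std::size_t Dim>
struct Particle {
    std::array<double, Dim> position;
    std::array<double, Dim> momentum;
};

// Field-free drift of length s (normalized units assumed):
// position += s * momentum in every dimension. The same source text
// serves 2D and 3D; the compiler generates one version per Dim.
template <std::size_t Dim>
void drift(std::vector<Particle<Dim>>& beam, double s) {
    static_assert(Dim >= 1, "at least one spatial dimension is required");
    assert(s >= 0.0);
    for (auto& p : beam) {
        for (std::size_t d = 0; d < Dim; ++d) {
            p.position[d] += s * p.momentum[d];
        }
    }
}
```

Calling drift on a std::vector<Particle<2>> or on a std::vector<Particle<3>> selects the matching instantiation automatically, since Dim is deduced from the argument type.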

 

We have a 2D prototype code implemented in C++ and POOMA. It supports a K-V or Gaussian initial beam distribution in the x-y plane and integrates the beam particles through a series of drift and quadrupole elements. The integration is performed using a split-operator approach. The beam's self-consistent electrostatic potential is computed by scattering charge density into a Field, performing an FFT, applying a Green's function in Fourier space, and inverting the FFT. POOMA provides simple functions for scattering the particle charge density, computing the gradient of the electrostatic potential, and gathering the resulting electric field at the particle positions.
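
The structure of a split-operator step can be sketched in a few lines. The symmetric arrangement used here (half step with the external-field map, full step with the self-field kick, half step with the external-field map) is one common second-order choice and is an assumption; the text does not say which splitting the prototype uses. The names State and splitStep are hypothetical, the example is one-dimensional, and the self-field kick is passed in as a callable rather than computed by an FFT-based solver.

```cpp
#include <cassert>
#include <functional>

// One-dimensional phase-space point (hypothetical, normalized units).
struct State {
    double x;  // position
    double p;  // momentum
};

// A map advances a state by a given step size.
using Map = std::function<void(State&, double)>;

// One symmetric split-operator step of size tau:
//   external map for tau/2, self-field kick for tau, external map for tau/2.
void splitStep(State& s, double tau, const Map& externalMap,
               const Map& selfFieldKick) {
    assert(tau > 0.0);
    externalMap(s, 0.5 * tau);
    selfFieldKick(s, tau);
    externalMap(s, 0.5 * tau);
}
```

As a usage example, externalMap could be a drift (x += p * dt) and selfFieldKick a momentum change computed from the electric field gathered at the particle position (p += E(x) * dt in normalized units). In a full code the kick would be preceded by the charge scatter and field solve described above.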

 

Our results are in agreement with results of Ryne and Habib's 2D HPF code. The POOMA code is instrumented to send particle and field data to ACLVIS, a Los Alamos visualization package, during a code run. This provides real-time data visualization capabilities that enable users to spot problems in code behavior quickly and to study the effects of various beamline elements. Furthermore, POOMA provides a simple mechanism for profiling application codes with the Tau profiling tools [Tau]. Simple macros in the accelerator code generate timing data. Tau uses the data to chart the CPU time spent in each instrumented routine by each processor. We are using these profiling tools to analyze the performance of our code, and to compare it with the performance of the HPF code. Our most recent tests, run on an Origin 2000 symmetric multiprocessor computer, indicate that the performance of the POOMA code is comparable to that of the HPF code. More studies need to be done before specific performance data can be provided. Our future work includes such studies and recasting the current code into a generic templated form.

 

References

 

Reynders, John V. W., Paul J. Hinker, Julian C. Cummings, Susan R. Atlas, Subhankar Banerjee, William F. Humphrey, Steve R. Karmesin, Katarzyna Keahey, M. Srikant, MaryDell Tholburn, 1996. "POOMA." In Parallel Programming Using C++, G. V. Wilson and P. Lu, eds. The MIT Press, Cambridge.

 

Tau. http://www.acl.lanl.gov/tau/

 

Veldhuizen, Todd, 1995. "Expression Templates." C++ Report 7:5 (June, 1995), 26-31. Reprinted in C++ Gems, Stanley B. Lippman, ed., 1996. Sigs Books, NY.

 

Veldhuizen, Todd, 1997. "Scientific Computing: C++ versus Fortran." http://monet.uwaterloo.ca/~tveldhui/DrDobbs2/drdobbs2.html

 

Principal Author

 

Graham A. Mark

gam@lanl.gov

505-667-8147