National Aeronautics and Space Administration
Jet Propulsion Laboratory, California Institute of Technology
Department of Physics and Astronomy
University of California, Los Angeles
and
National Aeronautics and Space Administration
Jet Propulsion Laboratory, California Institute of Technology
Department of Computer Science
Rensselaer Polytechnic Institute
Abstract
One goal of the Numerical Turbulent Transport Project is to model a tokamak (fusion) plasma with 10^8-10^9 particles, to explain anomalous transport of particles and energy. Since this ambitious high performance computing and communications (HPCC) project involves multiple institutions and multidisciplinary collaborations, several project members have been investigating object-oriented techniques for designing particle-in-cell (PIC) codes. We summarize our experiences in this area using the modern constructs of the Fortran 90 programming language [2,3,4,7,8].
The Fortran 90 programming language [5] addresses
the needs of modern scientific programming by providing features that
raise the level of abstraction, without sacrificing performance.
Consider a 3D parallel plasma particle-in-cell program in Fortran 77
which will typically define the particles, charge density field, force
field, and routines to push particles and deposit charge. This is a
segment of the main program where many details have been omitted.
dimension part(idimp, npmax), q(nx, ny, nzpmx)
dimension fx(nx, ny, nzpmx), fy(nx, ny, nzpmx), fz(nx, ny, nzpmx)
data qme, dt /-1.,.2/
call push(part,fx,fy,fz,npp,noff,qtme,dt,wke,nx,ny,idimp,npmax,nzpmx)
call dpost(part,q,npp,noff,qme,nx,ny,idimp,npmax,nzpmx)
Note that the arrays must be dimensioned at compile time. Also,
parameters must either be passed by reference, creating long argument
lists, or kept in COMMON blocks, where they are exposed to inadvertent
modification. Such an organization is hard to maintain, especially as
codes are modified for new experiments.
Using the new features of Fortran 90, abstractions can be introduced
that clarify the organization of the code. The Fortran 90 version is
more readable and is designed for modification and extension.
use partition_module ; use plasma_module
type (species) :: electrons
type (scalarfield) :: charge_density
type (vectorfield) :: efield
type (slabpartition) :: edges
real :: dt = .2
call plasma_particle_push( electrons, efield, edges, dt )
call plasma_deposit_charge( electrons, charge_density, edges )
This style of object-oriented programming, where the basic data unit
is an "object" that shields its internal data from misuse by providing
public routines to manipulate it, allows such a code to be designed
and written. Object-oriented programming clarifies software while
increasing safety and communication among developers, but its benefits
are realized only in sufficiently large and complex programs.
While Fortran 90 is not an object-oriented language, the new features allow most of these concepts to be modeled directly. (Some concepts are more complex to emulate.) In the following, we will describe how object-oriented concepts can be modeled in Fortran 90, the application of these ideas to plasma PIC programming on supercomputers, and the future of Fortran programming (represented by Fortran 2000) that will contain explicit object-oriented features.
These objects represent abstractions. Another important concept is the notion of inheritance, which allows new abstractions to be created by preserving features of existing abstractions. This allows objects to gain new features through some form of code reuse. Additionally, polymorphism allows routines to be applied to a variety of objects that share some relationship, but the specific action taken varies dynamically based on the object's type. These ideas are mechanisms for writing applications that more closely represent the problem at hand. As a result, a number of programming languages support OOP concepts in some manner.
Fortran 90 is well-known for introducing array-syntax operations and dynamic memory management. While useful, these represent a small subset of the powerful new features available for scientific programming. Fortran 90 is backward compatible with Fortran 77 and, since it is a subset of High Performance Fortran (HPF), it provides a migration path for data-parallel programming. When a routine has an explicit interface (as module procedures do), Fortran 90 type-checks its arguments, so passing the wrong arguments generates a compile-time error. Additionally, the automatic creation of implicit variables can be suppressed with the implicit none statement, reducing unexpected results.
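As a minimal sketch of the features just listed, the module below combines implicit none, whole-array syntax, an assumed-shape argument, and run-time allocation. The module name array_demo_module and the routines kinetic_energy and grid_total are hypothetical names chosen for illustration; they are not taken from the PIC code.

```fortran
module array_demo_module
  implicit none   ! every name must be declared; typos become compile-time errors
  private
  public :: kinetic_energy, grid_total
contains

  ! Kinetic energy of unit-mass particles, written with whole-array syntax.
  function kinetic_energy(v) result(ek)
    real, dimension(:), intent(in) :: v   ! assumed shape: the size travels with v
    real :: ek
    ek = 0.5 * sum(v * v)
  end function kinetic_energy

  ! Build an nx-by-ny grid whose size is known only at run time,
  ! fill it with the value q0, and return the summed charge.
  function grid_total(nx, ny, q0) result(total)
    integer, intent(in) :: nx, ny
    real, intent(in) :: q0
    real :: total
    real, dimension(:,:), allocatable :: q
    allocate(q(nx, ny))
    q = q0              ! array assignment, no explicit loops
    total = sum(q)
    deallocate(q)
  end function grid_total

end module array_demo_module
```

Because both routines are module procedures, their interfaces are explicit to any program unit that uses the module, so a call such as kinetic_energy(3), which passes an integer scalar where a real array is expected, should be rejected by the compiler rather than failing at run time.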
However, more powerful features include derived-types, which allow user-defined types to be created from existing intrinsic types and previously defined derived-types. Many forms of dynamic memory management operations are now available, including dynamic arrays and pointers. These new Fortran 90 constructs are objects that know information such as their size, whether they have been allocated, and whether they refer to valid data. Fortran 90 modules allow routines to be associated with types and data defined within the module. These modules can be used in various ways to bring new functionality to program units. Components of the module can be private or public, allowing interfaces to be constructed that control the accessibility of module components. Additionally, operator and routine overloading (name reuse) are supported, allowing the proper routine to be called automatically based on the number and types of the arguments. Optional arguments are supported, as well as generic procedures that allow a single routine name to be used while the action taken differs based on the type of the parameter. All of these features can be used to support an object-oriented programming methodology [2,3,6].
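The following sketch shows how several of these features might fit together in one module: a derived type with private components, a generic procedure name, an overloaded operator, and optional arguments. The type vec3 and every routine name here are hypothetical and were chosen only to keep the example small.

```fortran
module vec3_demo_module
  implicit none
  private                          ! hide everything not listed below
  public :: vec3, vec3_new, vec3_norm2, vec3_scale, operator(+)

  type vec3
    private                        ! components are visible only inside this module
    real :: x, y, z
  end type vec3

  interface vec3_scale             ! generic name, resolved by argument type
    module procedure scale_by_real, scale_by_integer
  end interface

  interface operator(+)            ! operator overloading
    module procedure vec3_add
  end interface

contains

  function vec3_new(x, y, z) result(v)
    real, intent(in) :: x
    real, intent(in), optional :: y, z   ! omitted components default to zero
    type (vec3) :: v
    v%x = x ; v%y = 0.0 ; v%z = 0.0
    if (present(y)) v%y = y
    if (present(z)) v%z = z
  end function vec3_new

  function vec3_norm2(v) result(n2)
    type (vec3), intent(in) :: v
    real :: n2
    n2 = v%x**2 + v%y**2 + v%z**2
  end function vec3_norm2

  function scale_by_real(v, a) result(w)
    type (vec3), intent(in) :: v
    real, intent(in) :: a
    type (vec3) :: w
    w%x = a * v%x ; w%y = a * v%y ; w%z = a * v%z
  end function scale_by_real

  function scale_by_integer(v, n) result(w)
    type (vec3), intent(in) :: v
    integer, intent(in) :: n
    type (vec3) :: w
    w = scale_by_real(v, real(n))
  end function scale_by_integer

  function vec3_add(a, b) result(c)
    type (vec3), intent(in) :: a, b
    type (vec3) :: c
    c%x = a%x + b%x ; c%y = a%y + b%y ; c%z = a%z + b%z
  end function vec3_add

end module vec3_demo_module
```

A caller can write vec3_scale(v, 2) or vec3_scale(v, 0.5) and the compiler selects the matching specific routine. Code outside the module cannot read or write v%x directly, so all access goes through the public routines.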
A portion of the species module, shown below, illustrates how data and
routines can be encapsulated using object-oriented concepts. This
module defines the particle collection, where the interface to the
particle Maxwellian distribution routine is included.
module species_module
  use distribution_module ; use partition_module
  implicit none
  type particle
    private
    real :: x, y, z, vx, vy, vz ! position & velocity components
  end type particle
  type species
    real :: qm, qbm, ek ! charge, charge/mass, kinetic energy
    integer :: nop, npp ! # of particles, # of particles on PE
    type (particle), dimension(:), pointer :: p ! particle collection (dynamic)
  end type species
contains
  subroutine species_distribution(this, edges, distf)
    type (species), intent (out) :: this
    type (slabpartition), intent (in) :: edges
    type (distfcn), intent (in) :: distf
    ! SUBROUTINE BODY
  end subroutine species_distribution
  ! ADDITIONAL MODULE MEMBER ROUTINES
end module species_module
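To make the encapsulation idea concrete without the omitted modules, here is a self-contained sketch of how such a species type might manage its particle storage. It is an assumption-laden illustration, not the project's code: the routines species_create, species_count, species_kinetic_energy, and species_destroy are hypothetical, the species components are made private here, and the loading step is a placeholder that gives every particle the same velocity rather than sampling a Maxwellian.

```fortran
module species_sketch_module
  implicit none
  private
  public :: particle, species
  public :: species_create, species_count, species_kinetic_energy, species_destroy

  type particle
    private
    real :: x, y, z, vx, vy, vz   ! position & velocity components
  end type particle

  type species
    private
    real :: qm, qbm               ! charge, charge/mass
    integer :: npp                ! # of particles held
    type (particle), dimension(:), pointer :: p   ! particle collection (dynamic)
  end type species

contains

  ! Allocate npp particles at the origin, all moving with velocity v0 along x.
  ! (Placeholder loading; a real code would sample a distribution here.)
  subroutine species_create(this, qm, qbm, npp, v0)
    type (species), intent (out) :: this
    real, intent (in) :: qm, qbm, v0
    integer, intent (in) :: npp
    this%qm = qm ; this%qbm = qbm ; this%npp = npp
    allocate(this%p(npp))
    this%p = particle(0.0, 0.0, 0.0, v0, 0.0, 0.0)
  end subroutine species_create

  function species_count(this) result(n)
    type (species), intent (in) :: this
    integer :: n
    n = this%npp
  end function species_count

  ! Kinetic energy: 0.5 * mass * sum(v^2), with mass = qm / qbm.
  function species_kinetic_energy(this) result(ek)
    type (species), intent (in) :: this
    real :: ek
    ek = 0.5 * (this%qm / this%qbm) * &
         sum(this%p%vx**2 + this%p%vy**2 + this%p%vz**2)
  end function species_kinetic_energy

  ! Release the particle storage.
  subroutine species_destroy(this)
    type (species), intent (inout) :: this
    if (associated(this%p)) deallocate(this%p)
    this%npp = 0
  end subroutine species_destroy

end module species_sketch_module
```

A main program would call species_create once, use the query routines as needed, and call species_destroy when the species is no longer required. The particle layout can then change without touching any calling code.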
Some OOP concepts, such as inheritance, had limited usefulness, while
run-time polymorphism was used infrequently. Our experience has shown
that these features, while sometimes appropriate for general purpose
programming, do not seem to be as useful in scientific programming.
Well-defined interfaces that support manipulation of abstractions
were more important. More details on the overall structure of the
code can be found in [7].
The wall-clock execution times for the 3D parallel PIC code written in Fortran 90, Fortran 77, and C++ are given in the table below. Although our experience has been that Fortran 90 consistently outperforms C++ on complete programs, generally by a factor of two, others have performance results that indicate that C++ can sometimes outperform Fortran 90 on some computational kernels [1]. (In these cases, "expression templates" are introduced as a compile-time optimization to speed up complicated array operations.)
Cornell Theory Center IBM SP2 (32 PEs, 8 Million Particles)
3D Plasma PIC, time in seconds
Language | Compiler | RS6000 Model 390 Chips | P2SC Super Chips | P2SC Optimized
Fortran 77 | IBM xlf | N/A | 668.03 | 537.95
Fortran 90 | IBM xlf90 | 1226.75 | 622.60 | 488.88
C++ | KAI KCC | 2817.62 | 1316.20 | 1173.31
3D Parallel Plasma PIC Experiments - CPU Times for Various Compilers
(KAI C++, IBM F90, and IBM F77 with IBM MPI on Cornell's SP2)
The most aggressive compiler options produced the fastest timings and are represented in the table. The KAI C++ compiler with +K3 -O3 --abstract_pointer spent over 2 hours in the compilation process. The IBM F90 compiler with -O3 -qlanglvl=90std -qstrict -qalias=noaryovrlp used 5 minutes for compilation. (The KAI compiler is generally considered the most efficient C++ compiler when objects are used. This compiler generated slightly faster executables than the IBM C++ compiler.) Applying hardware optimization switches (-qarch=pwr2 -qtune=pwr2) introduced additional performance improvements specific to the P2SC processors. These timings appear in the P2SC Optimized column of the table.
We have found Fortran 90 very useful for large problems on supercomputers: it is generally safer and faster than C++, and sometimes faster than Fortran 77. For large problems, Fortran 90 derived-type objects improved cache utilization over Fortran 77. (The C++ and Fortran 90 objects had the same storage organization.) Fortran 90 is less powerful than C++, since it has fewer features and those available can be restricted to enhance performance, but many of the advanced features of C++ have not been required in scientific computing. Nevertheless, advanced C++ features may be more appropriate for other problem domains [4,7].
Our web site provides many additional examples of how object-oriented concepts can be modeled in Fortran 90 [6]. Many concepts, like encapsulation of data and routines, can be represented directly. Other features, such as inheritance and polymorphism, must be emulated with a combination of Fortran 90's existing features and user-defined constructs. (Procedures for doing this are also included at the web site.) Additionally, an evaluation of compilers is included to provide users with an impartial comparison of products from different vendors.
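One possible way to emulate these two features in Fortran 90 is sketched below. It is a common pattern and not necessarily the procedure given at the web site. Inheritance is approximated by embedding the "base" type as a component and delegating to its routines, and run-time polymorphism by a tagged wrapper type that dispatches with select case. All names (drifting_species, any_species, as_any, and so on) are hypothetical.

```fortran
module inherit_demo_module
  implicit none
  private
  public :: species, drifting_species, any_species
  public :: species_new, drifting_new, as_any, total_charge, mean_velocity

  integer, parameter :: is_species = 1, is_drifting = 2

  type species                     ! plays the role of a base class
    private
    real :: charge                 ! charge per particle
    integer :: n                   ! number of particles
  end type species

  type drifting_species            ! "derived class": a species plus a drift
    private
    type (species) :: base
    real :: vdrift
  end type drifting_species

  type any_species                 ! tagged wrapper for run-time dispatch
    private
    integer :: tag
    type (species) :: s
    type (drifting_species) :: d
  end type any_species

  interface total_charge           ! one name, resolved at compile time
    module procedure species_charge, drifting_charge
  end interface
  interface mean_velocity
    module procedure species_velocity, drifting_velocity, any_velocity
  end interface
  interface as_any
    module procedure any_from_species, any_from_drifting
  end interface

contains

  function species_new(charge, n) result(s)
    real, intent(in) :: charge
    integer, intent(in) :: n
    type (species) :: s
    s%charge = charge ; s%n = n
  end function species_new

  function drifting_new(charge, n, vdrift) result(d)
    real, intent(in) :: charge, vdrift
    integer, intent(in) :: n
    type (drifting_species) :: d
    d%base = species_new(charge, n)
    d%vdrift = vdrift
  end function drifting_new

  function species_charge(s) result(q)
    type (species), intent(in) :: s
    real :: q
    q = s%charge * real(s%n)
  end function species_charge

  function drifting_charge(d) result(q)   ! "inherited": delegate to the base
    type (drifting_species), intent(in) :: d
    real :: q
    q = species_charge(d%base)
  end function drifting_charge

  function species_velocity(s) result(v)
    type (species), intent(in) :: s
    real :: v
    v = 0.0
  end function species_velocity

  function drifting_velocity(d) result(v)  ! "overridden" behaviour
    type (drifting_species), intent(in) :: d
    real :: v
    v = d%vdrift
  end function drifting_velocity

  function any_from_species(s) result(a)
    type (species), intent(in) :: s
    type (any_species) :: a
    a%tag = is_species ; a%s = s
    a%d = drifting_new(0.0, 0, 0.0)       ! unused slot, defined for safety
  end function any_from_species

  function any_from_drifting(d) result(a)
    type (drifting_species), intent(in) :: d
    type (any_species) :: a
    a%tag = is_drifting ; a%d = d ; a%s = d%base
  end function any_from_drifting

  function any_velocity(a) result(v)      ! dispatch on the run-time tag
    type (any_species), intent(in) :: a
    real :: v
    select case (a%tag)
    case (is_species)
      v = species_velocity(a%s)
    case default
      v = drifting_velocity(a%d)
    end select
  end function any_velocity

end module inherit_demo_module
```

The wrapper stores a copy of each variant, which wastes memory but keeps the sketch short. A pointer-based wrapper would avoid the copies at the cost of managing targets and association status.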
The Fortran 2000 standard has been defined to include explicit object-oriented features, including single inheritance, polymorphic objects, parameterized derived-types, constructors, and destructors. Other features, such as interoperability with C, will simplify support for advanced graphics within Fortran 2000.
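The standard referred to here as Fortran 2000 was eventually published as Fortran 2003. As a rough sketch of what single inheritance and polymorphic objects look like in that syntax, the module below uses type extension, type-bound procedures, and a class dummy argument. The type and routine names are hypothetical, and the sketch leaves out parameterized derived-types, constructors, and destructors.

```fortran
module species_oo_module
  implicit none
  private
  public :: species, drifting_species, report_velocity

  type species
    real :: charge = -1.0          ! charge per particle
    integer :: n = 0               ! number of particles
  contains
    procedure :: mean_velocity => species_mean_velocity
    procedure :: total_charge => species_total_charge
  end type species

  ! Single inheritance: drifting_species gains charge, n, and total_charge.
  type, extends(species) :: drifting_species
    real :: vdrift = 0.0
  contains
    procedure :: mean_velocity => drifting_mean_velocity   ! override
  end type drifting_species

contains

  function species_mean_velocity(this) result(v)
    class (species), intent(in) :: this
    real :: v
    v = 0.0
  end function species_mean_velocity

  function species_total_charge(this) result(q)
    class (species), intent(in) :: this
    real :: q
    q = this%charge * real(this%n)
  end function species_total_charge

  function drifting_mean_velocity(this) result(v)
    class (drifting_species), intent(in) :: this
    real :: v
    v = this%vdrift
  end function drifting_mean_velocity

  ! Accepts a species or any extension of it; the binding that runs
  ! is chosen from the dynamic type of the actual argument.
  function report_velocity(s) result(v)
    class (species), intent(in) :: s
    real :: v
    v = s%mean_velocity()
  end function report_velocity

end module species_oo_module
```

Here report_velocity is written once against the base type, yet calling it with a drifting_species argument invokes the overriding binding. In Fortran 90 this kind of dispatch has to be emulated by hand.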
Parallel programming with MPI on supercomputers is possible with Fortran 90. However, MPI does not explicitly support Fortran 90 style arrays, so structures such as array subsections cannot be passed to MPI routines. The Fortran 90 programs were longer than the Fortran 77 versions (but more readable), and much shorter than the C++ programs because features useful for scientific programming are not automatically available in C++.
Jet Propulsion Laboratory
California Institute of Technology
MS 168-522
4800 Oak Grove Drive
Pasadena, CA 91109-8099, U.S.A.
email: Charles.D.Norton@jpl.nasa.gov
phone: (818) 393-3920