Tutorial: MPI-parallelized calculation of spectra

In this short tutorial we demonstrate how to accelerate the calculation of spectra using MPI-based multi-process parallelization.

Load modules

In [1]:
## --- for benchmark: limit openmp to 1 thread (for parallel scipy/numpy routines)
##     Must be done before loading numpy to have an effect

import os
nthreads = 1
os.environ["MKL_NUM_THREADS"] = "{}".format(int(nthreads))
os.environ["NUMEXPR_NUM_THREADS"] = "{}".format(int(nthreads))
os.environ["OMP_NUM_THREADS"] = "{}".format(int(nthreads))

## --- load pyGDM modules
from pyGDM2 import structures
from pyGDM2 import materials
from pyGDM2 import fields

from pyGDM2 import core

import numpy as np


## --- Note: it is not necessary to load mpi4py in the simulation script; this
## --- is done automatically by pyGDM2 prior to the actual MPI simulation.
## --- We load it here nevertheless, in order to print to stdout only from
## --- the master process (rank == 0).
from mpi4py import MPI
rank = MPI.COMM_WORLD.rank

Config of the simulation

We will demonstrate the MPI spectrum calculation on the simple example of a small dielectric sphere.

In [2]:
## ---------- Setup structure
mesh = 'cube'
step = 20.0
radius = 3.5
geometry = structures.sphere(step, R=radius, mesh=mesh)
material = materials.dummy(2.0)
n1, n2 = 1.0, 1.0
struct = structures.struct(step, geometry, material, n1,n2,
                           structures.get_normalization(mesh))

## ---------- Setup incident field
field_generator = fields.planewave
wavelengths = np.linspace(400, 800, 20)
kwargs = dict(theta = [0.0])
efield = fields.efield(field_generator, wavelengths=wavelengths, kwargs=kwargs)

## ---------- Simulation initialization
sim = core.simulation(struct, efield)
/home/wiecha/.local/lib/python2.7/site-packages/pyGDM2/structures.py:108: UserWarning: Minimum structure Z-value lies below substrate level! Shifting structure bottom to Z=step/2.
  " Shifting structure bottom to Z=step/2.")

Run the simulation with the MPI wrapper to core.scatter

The only difference from a non-MPI-parallelized run of the simulation is that we use core.scatter_mpi instead of core.scatter. scatter_mpi automatically distributes the calculation of the different wavelengths in the spectrum over the available processes.

In [3]:
## --- mpi: print in process with rank=0, to avoid flooding of stdout
if rank == 0:
    print "performing MPI parallel simulation... "

core.scatter_mpi(sim)

if rank == 0:
    print "simulation done."

/home/wiecha/.local/lib/python2.7/site-packages/pyGDM2/core.py:408: UserWarning: Executing only one MPI process! Should be run using e.g. 'mpirun -n X python scriptname.py', where X is the number of parallel processes.
  " is the number of parallel processes.")
performing MPI parallel simulation...
timing 400.00nm - inversion: 150.1 ms, repropagation: 260.1ms (1 field configs), total: 410.4 ms
timing 421.05nm - inversion: 116.4 ms, repropagation: 123.9ms (1 field configs), total: 240.5 ms
timing 442.11nm - inversion: 107.5 ms, repropagation: 121.2ms (1 field configs), total: 228.8 ms
timing 463.16nm - inversion: 115.8 ms, repropagation: 123.1ms (1 field configs), total: 239.0 ms
timing 484.21nm - inversion: 129.8 ms, repropagation: 124.0ms (1 field configs), total: 253.9 ms
timing 505.26nm - inversion: 134.8 ms, repropagation: 124.4ms (1 field configs), total: 259.3 ms
timing 526.32nm - inversion: 100.3 ms, repropagation: 122.6ms (1 field configs), total: 223.0 ms
timing 547.37nm - inversion: 100.3 ms, repropagation: 120.8ms (1 field configs), total: 221.2 ms
timing 568.42nm - inversion: 132.5 ms, repropagation: 122.0ms (1 field configs), total: 254.6 ms
timing 589.47nm - inversion: 100.7 ms, repropagation: 124.1ms (1 field configs), total: 224.9 ms
timing 610.53nm - inversion: 133.0 ms, repropagation: 122.7ms (1 field configs), total: 255.9 ms
timing 631.58nm - inversion: 100.1 ms, repropagation: 120.8ms (1 field configs), total: 221.0 ms
timing 652.63nm - inversion: 135.8 ms, repropagation: 124.4ms (1 field configs), total: 260.2 ms
timing 673.68nm - inversion: 114.7 ms, repropagation: 121.8ms (1 field configs), total: 236.6 ms
timing 694.74nm - inversion: 135.6 ms, repropagation: 127.8ms (1 field configs), total: 263.5 ms
timing 715.79nm - inversion: 112.8 ms, repropagation: 123.6ms (1 field configs), total: 236.5 ms
timing 736.84nm - inversion: 119.2 ms, repropagation: 121.5ms (1 field configs), total: 240.8 ms
timing 757.89nm - inversion: 135.6 ms, repropagation: 122.7ms (1 field configs), total: 258.5 ms
timing 778.95nm - inversion: 102.0 ms, repropagation: 125.0ms (1 field configs), total: 227.1 ms
timing 800.00nm - inversion: 100.3 ms, repropagation: 122.0ms (1 field configs), total: 222.5 ms
simulation done.
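
After core.scatter_mpi returns, the results are available in the simulation object on the master process, so post-processing should be done from rank 0 only. The following is a minimal sketch, assuming the usual pyGDM2 post-processing helpers (tools.get_possible_field_params_spectra, tools.calculate_spectrum, linear.extinct, tools.save_simulation) behave as in the sequential spectra tutorial; the file name is arbitrary and chosen for illustration.

## --- post-processing sketch, rank 0 only (helper functions and column layout
## --- assumed to match the sequential pyGDM2 spectra examples)
from pyGDM2 import tools
from pyGDM2 import linear

if rank == 0:
    ## all available field-parameter sets (here: a single plane-wave config)
    field_kwargs = tools.get_possible_field_params_spectra(sim)

    ## extinction / scattering / absorption spectrum for the first config
    wl, spectrum = tools.calculate_spectrum(sim, field_kwargs[0], linear.extinct)
    print(wl)
    print(spectrum.T[0])   # first column: extinction cross-section (nm^2), per the pyGDM2 examples

    ## optionally store the full simulation for later evaluation (illustrative file name)
    tools.save_simulation(sim, "mpi_simulation.pygdmsim")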

IMPORTANT NOTE: To actually run in parallel via MPI, the script needs to be launched with the program mpirun:

$ mpirun -n 4 python pygdm_script_using_mpi.py

where, in this example, the argument "-n 4" tells mpirun to start 4 parallel processes.

Note: if the number of wavelengths in the spectrum is not divisible by the number of MPI processes, some MPI processes will be idle for part of the execution. In this case scatter_mpi will print a warning:

UserWarning: Efficiency warning: Number of wavelengths (20) not divisable by Nr of processes (3)!
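
To avoid such idle time, one can pick the number of MPI processes as a divisor of the number of wavelengths. A purely illustrative check (plain mpi4py, no pyGDM2 specifics assumed) could look like this:

## --- illustrative check: does the number of processes divide the number of wavelengths?
from mpi4py import MPI

N_wl = len(wavelengths)                 # 20 wavelengths in this example
N_proc = MPI.COMM_WORLD.Get_size()      # number of MPI processes

if rank == 0 and N_wl % N_proc != 0:
    print("warning: {} wavelengths on {} processes --> load imbalance".format(N_wl, N_proc))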

Timing of the above example

  • MPI-run: 1.25 s
  • sequential-run: 4.05 s
  • speed-up: x3.2

The speed-up should be closer to x4 for a larger simulation, where the MPI overhead becomes small compared to the simulation runtime.
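
The timings above depend on the machine and will vary. One simple way to reproduce such a comparison is to time the call to core.scatter_mpi on the master process with MPI.Wtime; the sketch below is an assumption about how to benchmark, not the procedure used to obtain the numbers above (for the sequential reference, time core.scatter in a single-process run instead).

## --- benchmark sketch: wall-clock timing of the MPI-parallel spectrum calculation
from mpi4py import MPI

MPI.COMM_WORLD.Barrier()          # synchronize all processes before timing
t0 = MPI.Wtime()

core.scatter_mpi(sim)

MPI.COMM_WORLD.Barrier()          # make sure all processes have finished
t1 = MPI.Wtime()

if rank == 0:
    print("MPI run with {} processes: {:.2f} s".format(MPI.COMM_WORLD.Get_size(), t1 - t0))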