Multiprocessing with vasp.py
One of the reasons we use scripting is to set up, run, and analyze a lot of calculations. Usually, though, our scripts are serial: if we want to do ten things, we do them one at a time. Here we consider how to use multiprocessing to speed things up, using a simple equation of state as the example: 10 calculations at lattice constants spanning ±10% of a base value. This isn't a definitive guide to multiprocessing or threads, just some working examples of what they do.
1 The serial case
We start with the serial case where each result is read one after another. This will give us a base case for how long it takes to read results when they are finished.
```python
from vasp import Vasp
from ase import Atom, Atoms
import time
import numpy as np

t0 = time.time()

factors = np.linspace(0.9, 1.1, 10)
a = 3.6

calcs = [Vasp('mp/serial-Cu-{}'.format(i),
              xc='pbe',
              encut=300,
              kpts=[6, 6, 6],
              isym=2,
              atoms=Atoms([Atom('Cu', (0, 0, 0))],
                          cell=0.5 * a * f * np.array([[1.0, 1.0, 0.0],
                                                       [0.0, 1.0, 1.0],
                                                       [1.0, 0.0, 1.0]])))
         for i, f in enumerate(factors)]

energies = [calc.potential_energy for calc in calcs]
print(energies)
print('Elapsed-time = {0:1.1f} seconds'.format(time.time() - t0))
```
[-2.84280715, -3.21541663, -3.46563573, -3.62036549, -3.69449845, -3.71605521, -3.69440019, -3.64156925, -3.5622029, -3.464047]
Elapsed-time = 4.6 seconds
Just under 5 seconds. That doesn't seem like a big deal, but suppose there were 1000 results, or you needed to do this over and over. Then it might make sense to shorten this if possible. One way is to run the calculations in parallel; each one is independent, so this should be easy.
2 Multiprocessing case
See https://docs.python.org/2/library/multiprocessing.html. The key to multiprocessing is to create a function that does the work, and then create the processes that run the function. There are a couple of examples here. It might seem like more processes would finish faster, but that isn't always the case: it takes some time to create processes/threads. I found that just 2-3 processes gives close to the best performance, and more is not better for this example. That may be specific to this example, since it doesn't take long to get each energy anyway.
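The overhead point can be seen without VASP at all. Here is a minimal sketch (the function name `cheap` is made up for illustration) showing that for a trivial function, the cost of creating and coordinating a pool outweighs any parallel gain; a plain loop wins:

```python
import time
from multiprocessing.pool import ThreadPool

def cheap(x):
    # a trivial function: pool creation and coordination dominate here
    return x + 1

t0 = time.time()
serial = [cheap(x) for x in range(10000)]
t_serial = time.time() - t0

t0 = time.time()
pool = ThreadPool(2)
pooled = pool.map(cheap, range(10000))
pool.close()
pool.join()
t_pool = time.time() - t0

print(serial == pooled)   # the answers are identical
print(t_pool > t_serial)  # but the pool is slower for cheap work
```

This is why adding workers stops helping: once the per-task work is small relative to the bookkeeping, more workers just add more bookkeeping.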
2.1 Pool example
```python
from vasp import Vasp
from ase import Atom, Atoms
import multiprocessing
import time
import numpy as np

def worker(arg):
    i, f = arg
    a = 3.6
    atoms = Atoms([Atom('Cu', (0, 0, 0))],
                  cell=0.5 * a * f * np.array([[1.0, 1.0, 0.0],
                                               [0.0, 1.0, 1.0],
                                               [1.0, 0.0, 1.0]]))
    calc = Vasp('mp/mp-pool-Cu-{}'.format(i),
                xc='pbe',
                encut=300,
                kpts=[6, 6, 6],
                isym=2,
                atoms=atoms)
    return calc.potential_energy

if __name__ == '__main__':
    t0 = time.time()
    factors = np.linspace(0.9, 1.1, 10)

    from multiprocessing import Pool
    p = Pool(2)
    energies = p.map(worker, enumerate(factors))
    print(energies)
    print('Elapsed-time = {0:1.1f} seconds'.format(time.time() - t0))
```
[-2.84280715, -3.21541663, -3.46563573, -3.62036549, -3.69449845, -3.71605521, -3.69440019, -3.64156925, -3.5622029, -3.464047]
Elapsed-time = 2.8 seconds
2.2 ThreadPool example
This is basically like using threads, but through multiprocessing's pool interface: multiprocessing.pool.ThreadPool uses threads rather than processes.
```python
from vasp import Vasp
from ase import Atom, Atoms
import multiprocessing
import time
import numpy as np

def worker(i, f):
    a = 3.6
    atoms = Atoms([Atom('Cu', (0, 0, 0))],
                  cell=0.5 * a * f * np.array([[1.0, 1.0, 0.0],
                                               [0.0, 1.0, 1.0],
                                               [1.0, 0.0, 1.0]]))
    calc = Vasp('mp/mp-threadpool-Cu-{}'.format(i),
                xc='pbe',
                encut=300,
                kpts=[6, 6, 6],
                isym=2,
                atoms=atoms)
    return calc.potential_energy

t0 = time.time()
factors = np.linspace(0.9, 1.1, 10)

from multiprocessing.pool import ThreadPool
pool = ThreadPool(processes=2)

energies = [r.get() for r in
            [pool.apply_async(worker, (i, f))
             for i, f in enumerate(factors)]]
print(energies)
print('Elapsed-time = {0:1.1f} seconds'.format(time.time() - t0))
```
[-2.84280715, -3.21541663, -3.46563573, -3.62036549, -3.69449845, -3.71605521, -3.69440019, -3.64156925, -3.5622029, -3.464047]
Elapsed-time = 3.3 seconds
Interestingly, it isn't quite two times faster, but it is reproducibly faster.
3 Threading
Threads are a bit different from multiprocessing. For one, the threading module doesn't come with a nice "pool" feature, so we have to roll our own with a queue. I tried a few variations, and here is one that worked in the end. Another difference is that we have to use a mutable list to capture the results; there is no way to "return" a value from a thread!
```python
from vasp import Vasp
from ase import Atom, Atoms
import Queue
import threading
import time
import numpy as np

class Worker(threading.Thread):
    def __init__(self, queue):
        self.queue = queue
        threading.Thread.__init__(self)
        self.daemon = True

    def run(self):
        while 1:
            f, energies, index = self.queue.get()
            a = 3.6
            atoms = Atoms([Atom('Cu', (0, 0, 0))],
                          cell=0.5 * a * f * np.array([[1.0, 1.0, 0.0],
                                                       [0.0, 1.0, 1.0],
                                                       [1.0, 0.0, 1.0]]))
            calc = Vasp('mp/queue-Cu-{}'.format(index),
                        xc='pbe',
                        encut=300,
                        kpts=[6, 6, 6],
                        isym=2,
                        atoms=atoms)
            # store the result in the shared list; threads cannot return values
            energies[index] = calc.potential_energy
            self.queue.task_done()

t0 = time.time()
factors = np.linspace(0.9, 1.1, 10)

# Set up our queue
q = Queue.Queue()
num_threads = 2

for i in range(num_threads):
    Worker(q).start()

energies = [None for f in factors]

for i, f in enumerate(factors):
    q.put([f, energies, i])

q.join()
print(energies)
print('Elapsed-time = {0:1.1f} seconds'.format(time.time() - t0))
```
[-2.84280715, -3.21541663, -3.46563573, -3.62036549, -3.69449845, -3.71605521, -3.69440019, -3.64156925, -3.5622029, -3.464047]
Elapsed-time = 3.2 seconds
Again, it is not notably faster if you add more threads; your mileage may vary. The ThreadPoolExecutor from concurrent.futures is a little more convenient to use here, as it allows you to return something from the worker function. The syntax is a little heavier with the context manager.
```python
from vasp import Vasp
from ase import Atom, Atoms
import time
import numpy as np
from concurrent import futures

def worker(arg):
    f, index = arg
    a = 3.6
    calc = Vasp('mp/futures-Cu-{}'.format(index),
                xc='pbe',
                encut=300,
                kpts=[6, 6, 6],
                isym=2,
                atoms=Atoms([Atom('Cu', (0, 0, 0))],
                            cell=0.5 * a * f * np.array([[1.0, 1.0, 0.0],
                                                         [0.0, 1.0, 1.0],
                                                         [1.0, 0.0, 1.0]])))
    return calc.potential_energy

t0 = time.time()
factors = np.linspace(0.9, 1.1, 10)

with futures.ThreadPoolExecutor(max_workers=2) as executor:
    fs = [executor.submit(worker, (f, i)) for i, f in enumerate(factors)]
    energies = [f.result() for f in futures.as_completed(fs)]

print(energies)
print('Elapsed-time = {0:1.1f} seconds'.format(time.time() - t0))
```
[-2.84280715, -3.21541663, -3.46563573, -3.62036549, -3.69449845, -3.71605521, -3.69440019, -3.64156925, -3.464047, -3.5622029]
Elapsed-time = 3.3 seconds

Note that the last two energies come out swapped relative to the serial case: as_completed yields futures in the order they finish, not the order they were submitted, so if order matters you need to match results back to their inputs.
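The ordering difference is easy to demonstrate without VASP. In this sketch (the `worker` function and its sleep times are made up for illustration), Executor.map returns results in input order, while as_completed yields them as they finish:

```python
import time
from concurrent import futures

def worker(x):
    # earlier inputs sleep longer, so they tend to finish last
    time.sleep(0.05 * (4 - x))
    return x

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map guarantees results in input order, regardless of completion order
    mapped = list(executor.map(worker, range(4)))

    # as_completed yields futures in completion order
    fs = [executor.submit(worker, x) for x in range(4)]
    completed = [f.result() for f in futures.as_completed(fs)]

print(mapped)             # always [0, 1, 2, 3]
print(sorted(completed))  # same values, but possibly collected out of order
```

If you only need the set of results, as_completed is fine; if position matters (as in our equation of state, where each energy belongs to a lattice constant), use map or keep a future-to-index mapping.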
4 Summary
This example didn't show the benefits of threading/multiprocessing very well. It is hard to get even a two-fold speedup with this example, and it plateaus after 2-3 threads. That is likely a consequence of how fast each function evaluation is in the first place. Still, if you have code that is iterative and the iterations are independent and long-running, you might try this to see if you get a real-time speedup. Take care if you are on a multi-user machine though; the other users might not appreciate you using all the cores and putting a heavy load on them ;)
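To close the loop on the "long-running iterations" claim, here is a minimal sketch where the payoff is clear. time.sleep stands in for a slow, independent calculation (it releases the GIL while waiting, much like waiting on I/O or an external code, which is why threads help here; pure CPU-bound Python would not speed up this way):

```python
import time
from multiprocessing.pool import ThreadPool

def slow(x):
    # stand-in for a long-running, independent iteration
    time.sleep(0.2)
    return x * x

t0 = time.time()
serial = [slow(x) for x in range(8)]
t_serial = time.time() - t0      # roughly 8 * 0.2 = 1.6 s

t0 = time.time()
pool = ThreadPool(4)
parallel = pool.map(slow, range(8))
pool.close()
pool.join()
t_pool = time.time() - t0        # roughly 2 * 0.2 = 0.4 s

print(serial == parallel)  # same results
print(t_pool < t_serial)   # the pooled run finishes sooner
```

When each iteration takes seconds rather than milliseconds, the pool overhead becomes negligible and the speedup approaches the number of workers.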