Having a long calculation terminated just because it ran out of time in the queue is very frustrating, even more so if restarting it from the last accessible point is hard to do.

I recently ran a particularly demanding calculation: a Basis Set Superposition Error (BSSE) correction with the Counterpoise method at the second-order Møller-Plesset perturbation theory (MP2) level. The calculation ran out of time, but I was able to restart it because I had the rwf (read-write) file! My route section looked a bit like this:

#p mp2/GEN counterpoise=2 maxdisk=200GB

So here is how it works.

The very first lines of your output file give you the process ID (PID) of the calculation. This is not the same as the PID given by the queue system, because the latter corresponds to the submitted script and not to the instructions in it, i.e. your calculation.

 Entering Gaussian System, Link 0=g09
 Initial command:
 /opt/SC/aplicaciones/g09-C.01/l1.exe /tmpu/joaqbf_g/joaqbf/Gau-38954.inp -scrdir=/tmpu/joaqbf_g/joaqbf/
 Entering Link 1 = /opt/SC/aplicaciones/g09-C.01/l1.exe PID=     38955.

(the number to look for is the PID at the end of the last line, here 38955)

This is the number you want to write down. You will need to find the corresponding rwf file (usually in your scratch directory), which is named Gau-PID.rwf (in the case above, Gau-38955.rwf). If you are a bit paranoid like me, you will want to copy this file and keep it safe, but be aware that these files are very large; in my case it was 175 GB. Now you need to launch your calculation again with the following input:

%rwf=myfile.rwf
%nosave
%chk=myfile.chk
# restart

Title Card

rest of input

You can add any other Link 0 commands, such as %nprocshared or %mem, according to your needs.
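If you do this often, the bookkeeping (reading the PID off the log, working out the name of the rwf file and writing the Link 0 lines) can be scripted. Here is a minimal Python sketch of that idea. The function names are my own and are not part of Gaussian, the regular expression assumes the log contains an "Entering Link 1 = ... PID=" line like the one shown above, and the script only builds the Link 0 section, so the route line and the rest of the input are still up to you.

```python
import re
from pathlib import Path

# Matches the line printed when Link 1 starts, e.g.
# " Entering Link 1 = /opt/.../l1.exe PID=     38955."
PID_PATTERN = re.compile(r"Entering Link 1 = .*PID=\s*(\d+)")


def find_link1_pid(log_text):
    """Return the PID reported on the 'Entering Link 1' line of a log."""
    match = PID_PATTERN.search(log_text)
    if match is None:
        raise ValueError("no 'Entering Link 1 = ... PID=' line found")
    return int(match.group(1))


def rwf_path(scratch_dir, pid):
    """Build the expected rwf file name, Gau-PID.rwf, inside scratch_dir."""
    return Path(scratch_dir) / f"Gau-{pid}.rwf"


def link0_section(rwf, chk, extra=()):
    """Return the Link 0 lines for a restart job as a single string.

    'extra' holds optional additional Link 0 commands (for instance
    %mem or %nprocshared lines), written after the file names.
    """
    lines = [f"%rwf={rwf}", "%nosave", f"%chk={chk}"]
    lines.extend(extra)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Log excerpt taken from the output shown earlier in this post.
    sample_log = (
        " Entering Gaussian System, Link 0=g09\n"
        " Initial command:\n"
        " /opt/SC/aplicaciones/g09-C.01/l1.exe"
        " /tmpu/joaqbf_g/joaqbf/Gau-38954.inp"
        " -scrdir=/tmpu/joaqbf_g/joaqbf/\n"
        " Entering Link 1 = /opt/SC/aplicaciones/g09-C.01/l1.exe"
        " PID=     38955.\n"
    )
    pid = find_link1_pid(sample_log)
    print("rwf file to look for:", rwf_path("/tmpu/joaqbf_g/joaqbf", pid))
    print(link0_section("myfile.rwf", "myfile.chk"), end="")
```

Note that the script does not check that the rwf file actually exists or copy it anywhere; with files this large you probably want to do that step by hand.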

I’m pretty sure this should also work for other kinds of calculations in which restarting from the checkpoint file is not as easy, so if you run into this kind of problem, it’s worth a try.
