Restarting Calculations from rwf Files – Gaussian

Having a long calculation terminated just because it ran out of time in the queue is very frustrating; even more so if restarting it from the last accessible point is hard to do.

I recently performed a particularly demanding calculation: a Basis Set Superposition Error (BSSE) correction with the Counterpoise method at the second-order Møller-Plesset perturbation theory (MP2) level. The calculation ran out of time, but I was able to restart it because I had the rwf file! My input looked a bit like this:

#p mp2/GEN counterpoise=2 maxdisk=200GB

So here is how it works.

The very first lines of your output file give you the process ID (PID) of the calculation, which is not necessarily the same as the PID given by the queue system (in fact, it is not the same, because the latter corresponds to the submitted script rather than to the instructions in it, i.e. your calculation):

 Entering Gaussian System, Link 0=g09
 Initial command:
 /opt/SC/aplicaciones/g09-C.01/l1.exe /tmpu/joaqbf_g/joaqbf/Gau-38954.inp 
-scrdir=/tmpu/joaqbf_g/joaqbf/
 Entering Link 1 = /opt/SC/aplicaciones/g09-C.01/l1.exe PID=     38955.

(The emphasis is mine: look at the PID at the end of the last line.)

This is the number you want to write down. You will need to find the corresponding rwf file (usually in your SCRATCH directory), which is named Gau-PID.rwf (in the aforementioned case, Gau-38955.rwf). If you are a bit paranoid like myself you will want to copy this file and keep it safe, but be aware that these are very large files; in my case it was 175 GB. Now you need to launch your calculation again with the following input:

%rwf=myfile.rwf
%nosave
%chk=myfile.chk
# restart

Title Card

rest of input

You can add all other controls to the Link0 section such as %nprocshared or %mem according to your needs.
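The bookkeeping described above (reading the PID from the log, deriving the rwf name, and writing the restart input) can be scripted. What follows is a minimal Python sketch and not part of Gaussian: the function names `find_link1_pid`, `rwf_name`, and `restart_input` are hypothetical. It assumes the log contains the `Entering Link 1 = ... PID= NNNNN.` line shown earlier and that the read-write file follows the Gau-PID.rwf naming. It writes only the Link 0 lines and a route containing `restart`; append whatever else your input needs.

```python
import re

# Matches e.g. " Entering Link 1 = /path/to/l1.exe PID=     38955."
_PID_RE = re.compile(r"Entering Link 1 = \S+ PID=\s*(\d+)\.")


def find_link1_pid(log_text):
    """Return the Link 1 PID found in a Gaussian log as an int, or None."""
    match = _PID_RE.search(log_text)
    return int(match.group(1)) if match else None


def rwf_name(pid):
    """Scratch read-write file name for a given PID (Gau-PID.rwf)."""
    return f"Gau-{pid}.rwf"


def restart_input(rwf_path, chk_path, extra_link0=()):
    """Build the text of a restart input.

    Link 0 lines come first (%rwf, %nosave, %chk, then any extras such as
    "%mem=16GB"), followed by a route section holding only "restart".
    """
    lines = [f"%rwf={rwf_path}", "%nosave", f"%chk={chk_path}"]
    lines.extend(extra_link0)
    lines += ["# restart", ""]  # the route is terminated by a blank line
    return "\n".join(lines) + "\n"
```

For the log excerpt above, `rwf_name(find_link1_pid(log_text))` gives `Gau-38955.rwf`, which is the file to look for in the scratch directory before writing the restart input.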

I’m pretty sure it should work for other kinds of calculations in which restarting from the checkpoint file is not as easy, so if you run into this kind of problem, it’s worth a try.

About joaquinbarroso

Theoretical chemist in his late thirties, in love with life and deeply in love with his woman and children. I love science, baseball, literature, movies (perhaps even in that order). I'm passionate about food and lately wines have become a major hobby. In a nutshell I'm filled with regrets but also with hope, and that is called "living".

Posted on October 13, 2015, in Computational Chemistry, Tricks, White papers. 8 Comments.

  1. Is it possible to restart with additional parameters, for example
    # restart CPHF(MaxInv=15000)
    ?
    I haven’t had any luck with this.

    • I don’t think so. The restart option recovers the calculation from the read-write file as it was. Depending on the calculation you’re doing, you may want to first load the wavefunction with guess=read and then add more parameters, but don’t use the restart option.
      If you provide more details I may be able to make a better suggestion.
      Have a nice day!

  2. Thanks a lot for a great overview of Gaussian calculations; your post helped me a lot.

  3. Marcel Louzada

    Does it work to leave the input file the same, add %oldchk=$$$.chk, and then copy over the chk file from the last job? If so, is it limited to certain jobs?

    • I’m not sure I understand your question. Do you want to write over an old chk file or not? You can use these options:
      %chk=oldfilename.chk
      %NoSave

      this will use the previous chk file in your new calculation but leave it unchanged after the new calculation is done. You will not have access to all the data for the new calculation except for that in the log file.

      I hope this helps

  4. Diogo Medeiros

    Dear Joaquim,

    thanks for your explanations, they are always very useful.
    I am trying to perform a single point energy calculation at the CCSD(T)/aug-cc-pVQZ level on an alkyl radical with G09. I run my calculations on a cluster that allows a maximum of 96 hours per job, so when I run out of time I need to restart them. I have tried your method but every time I try to restart I get the following error message:

    Intra-link restart file has length 1070 expected 106.
    SetILR is confused about IRwILR.
    Error termination via Lnk1e in /apps/applications/gaussian/G09.D01/2/default/g09/l913.exe at Sun Aug 20 23:35:07 2017.

    After this error message, if I try to restart again (simply by repeating the same restart attempt) it runs, but if I check the convergence I can see it did not restart from where it was previously: the E(CORR) value is different and further from convergence.

    Examples:
    log file before my calculation ran out of time in the cluster.

    Iteration Nr. 6
    **********************
    DD1Dir will call FoFJK 3 times, MxPair= 70
    NAB= 105 NAA= 0 NBB= 0 NumPrc= 24.
    DE(Corr)= -0.91612505 E(CORR)= -194.95762662 Delta=-2.96D-05
    NORM(A)= 0.11537662D+01

    First iteration after restart:
    Iteration Nr. 1
    **********************
    DD1Dir will call FoFJK 2 times, MxPair= 106
    NAB= 105 NAA= 0 NBB= 0 NumPrc= 16.
    FoFJK: IHMeth= 1 ICntrl= 200 DoSepK=F KAlg= 0 I1Cent= 0 FoldK=F
    IRaf= 990000000 NMat= 106 IRICut= 132 DoRegI=T DoRafI=T ISym2E= 2.
    FoFCou: FMM=F IPFlag= 0 FMFlag= 100000 FMFlg1= 0
    NFxFlg= 0 DoJE=F BraDBF=F KetDBF=F FulRan=T
    wScrn= 0.000000 ICntrl= 200 IOpCl= 0 I1Cent= 0 NGrid= 0
    NMat0= 106 NMatS0= 0 NMatT0= 53 NMatD0= 106 NMtDS0= 0 NMtDT0= 0
    Integrals replicated using symmetry in FoFCou.
    MP4(R+Q)= 0.33372292D-01
    E3= -0.90312015D-01 EUMP3= -0.19501553117D+03
    E4(DQ)= -0.64362779D-02 UMP4(DQ)= -0.19502196745D+03
    E4(SDQ)= -0.13244209D-01 UMP4(SDQ)= -0.19502877538D+03
    DE(Corr)= -0.90945857 E(Corr)= -194.95096014
    NORM(A)= 0.11503023D+01

    Have you ever faced this problem? I appreciate your attention.
    Best regards,

    Diogo

    • Hello Diogo,

      I have unfortunately (or maybe fortunately) found this error. Restarting a CCSD calculation should only employ the word RESTART in the route section with no other keyword, but it should also make use of a read-write file. Specify one in the link0 section as:
      %RWF=/path/filename.rwf

      You can add several files to split the rwf, but start with one. If you don’t specify the rwf file path then you cannot restart the job from where it left off, because the checkpoint file doesn’t save the amplitudes of each iteration of the CCSD procedure.

      The Gaussian manual says:
      “If you anticipate that a job may need restarting at some point, then you can assign a name to the read-write file in the original job:

      %RWF=myrwfile
      %NoSave
      %Chk=myfile
      # desired route section

      job file continues …
      By default, any file which is named with a % directive is retained when the job finishes. The %NoSave after the %RWF overrides this default, so that the read-write file will still be deleted if the job finishes normally (but will be left behind if the job terminates early). In this case, the %Chk directive will typically be placed after %NoSave since you usually want to retain it for future use.”
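To follow the manual's advice without editing every input by hand, a small script can add the directives to a job before it is submitted. This is a minimal Python sketch under stated assumptions: the function name `add_rwf_directives` is hypothetical, and it simply prepends the three Link 0 lines in the order the manual uses. It does not check whether the input already names an rwf or chk file.

```python
def add_rwf_directives(input_text, rwf_path, chk_path):
    """Prepend %RWF, %NoSave and %Chk to a Gaussian input.

    The order follows the manual excerpt: %NoSave comes after %RWF (so the
    rwf is deleted on normal termination) and before %Chk (so the
    checkpoint file is kept).
    """
    header = [f"%RWF={rwf_path}", "%NoSave", f"%Chk={chk_path}"]
    return "\n".join(header) + "\n" + input_text
```

Calling `add_rwf_directives(original_input, "myrwfile", "myfile")` returns the original input with the three directives from the manual excerpt placed on top.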

      I hope this helps. Have a nice day
