Restarting Calculations from rwf Files – Gaussian

Having a long calculation terminated just because it ran out of time in the queue is very frustrating; even more so if restarting it from the last accessible point is hard to do.

I have recently performed a particularly demanding calculation: a Basis Set Superposition Error (BSSE) correction with the Counterpoise method at the second-order Møller-Plesset perturbation theory (MP2) level. The calculation ran out of time but I was able to restart it because I had the rwf file! My input looked a bit like this:

#p mp2/GEN counterpoise=2 maxdisk=200GB

So here is how it works.

The first lines of the output file give you the process ID (PID) of your calculation, which is not necessarily the same as the PID reported by the queue system (in fact, it is not the same, because the latter corresponds to the submitted script and not to the instructions in it, i.e. your calculation):

 Entering Gaussian System, Link 0=g09
 Initial command:
 /opt/SC/aplicaciones/g09-C.01/l1.exe /tmpu/joaqbf_g/joaqbf/Gau-38954.inp -scrdir=/tmpu/joaqbf_g/joaqbf/
 Entering Link 1 = /opt/SC/aplicaciones/g09-C.01/l1.exe PID=     38955.

(the emphasis on the PID in the last line, 38955, is mine)

This is the number you want to write down. You will need to find the corresponding rwf file (usually in your SCRATCH directory), named Gau-PID.rwf (in the case above, Gau-38955.rwf). If you are a bit paranoid like me, you will want to copy this file and keep it safe, but be aware that these are very large files; in my case it was 175 GB. Now you need to launch your calculation again with the following input:

%rwf=myfile.rwf
%nosave
%chk=myfile.chk
# restart

Title Card

rest of input

You can add all other controls to the Link0 section such as %nprocshared or %mem according to your needs.
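
The steps above (read the PID from the log, locate Gau-PID.rwf, write the restart input) can be sketched as a small Python helper. This is only an illustration and not part of Gaussian: the function names and the `%nprocshared=8` value are hypothetical, the regular expression assumes the "Entering Link 1" line looks like the one in the log excerpt above, and the generated input assumes that `# restart` directly follows the Link 0 lines with nothing else after it.

```python
import re
from pathlib import PurePosixPath

# Matches e.g. " Entering Link 1 = /opt/.../l1.exe PID=     38955."
PID_LINE = re.compile(r"Entering Link 1 = \S+ PID=\s*(\d+)\.")


def find_link1_pid(log_text):
    """Return the PID printed on the 'Entering Link 1' line, or None."""
    match = PID_LINE.search(log_text)
    return int(match.group(1)) if match else None


def rwf_path(scratch_dir, pid):
    """Build the expected read-write file name, Gau-PID.rwf."""
    return PurePosixPath(scratch_dir) / f"Gau-{pid}.rwf"


def restart_input(rwf, chk, extra_link0=()):
    """Assemble a minimal restart input: Link 0 lines, then '# restart'."""
    lines = [f"%rwf={rwf}", "%nosave", f"%chk={chk}"]
    lines.extend(extra_link0)        # e.g. "%nprocshared=8" (hypothetical value)
    lines.extend(["# restart", ""])  # the blank line ends the route section
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    log = (
        " Entering Gaussian System, Link 0=g09\n"
        " Entering Link 1 = /opt/SC/aplicaciones/g09-C.01/l1.exe PID=     38955.\n"
    )
    pid = find_link1_pid(log)
    rwf = rwf_path("/tmpu/joaqbf_g/joaqbf", pid)
    print(restart_input(rwf, "myfile.chk", ["%nprocshared=8"]))
```

The sketch points %rwf straight at the file left in the scratch directory; if you copied the rwf somewhere safe first, pass the path of whichever copy you actually want the job to use.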

I’m pretty sure it should work for other kinds of calculations in which restarting from the checkpoint file is not as easy, so if you run into this kind of problem, it’s worth a try.

About joaquinbarroso

Theoretical chemist in his early forties, in love with life and deeply in love with his woman and children. I love science, baseball, literature, movies (perhaps even in that order). I'm passionate about food and lately wines have become a major hobby. In a nutshell I'm filled with regrets but also with hope, and that is called "living".

Posted on October 13, 2015, in Computational Chemistry, Tricks, White papers.

  1. Is it possible to restart with additional parameters, for example
    # restart CPHF(MaxInv=15000)
    ?
    I haven’t had any luck with this.

    • I don’t think so. The restart option recovers the calculation from the read-write file as it was. Depending on the calculation you’re doing, you may want to first load the wavefunction with guess=read and then add more parameters, but in that case don’t use the restart option.
      If you provide more details I may be able to make a better suggestion.
      Have a nice day!

  2. Thanks a lot for a great overview of Gaussian calculations; your post helped me a lot.

  3. Marcel Louzada

    Does it work to leave the input file the same but include %oldchk=$$$.chk, and then copy over the chk file from the last job? If so, is it limited to certain jobs?

    • I’m not sure I understand your question. Do you want to write over an old chk file or not? You can use these options:
      %chk=oldfilename.chk
      %NoSave

      This will use the previous chk file in your new calculation but leave it unchanged after the new calculation is done. You will not have access to any data from the new calculation except what is in the log file.

      I hope this helps

  4. Diogo Medeiros

    Dear Joaquin,

    thanks for your explanations, they are always very useful.
    I am trying to perform a single-point energy calculation at the CCSD(T)/aug-cc-pVQZ level on an alkyl radical with G09. The cluster I use allows jobs to run for a maximum of 96 hours, so when I run out of time I need to restart them. I have tried your method, but every time I try to restart I get the following error message:

    Intra-link restart file has length 1070 expected 106.
    SetILR is confused about IRwILR.
    Error termination via Lnk1e in /apps/applications/gaussian/G09.D01/2/default/g09/l913.exe at Sun Aug 20 23:35:07 2017.

    After this error message, if I try to restart again (simply by repeating the same restart attempt) it runs, but when I check the convergence I can see that it did not restart from where it previously was: the E(CORR) value is different and further from convergence.

    Examples:
    Log file before my calculation ran out of time on the cluster:

    Iteration Nr. 6
    **********************
    DD1Dir will call FoFJK 3 times, MxPair= 70
    NAB= 105 NAA= 0 NBB= 0 NumPrc= 24.
    DE(Corr)= -0.91612505 E(CORR)= -194.95762662 Delta=-2.96D-05
    NORM(A)= 0.11537662D+01

    First iteration after restart:
    Iteration Nr. 1
    **********************
    DD1Dir will call FoFJK 2 times, MxPair= 106
    NAB= 105 NAA= 0 NBB= 0 NumPrc= 16.
    FoFJK: IHMeth= 1 ICntrl= 200 DoSepK=F KAlg= 0 I1Cent= 0 FoldK=F
    IRaf= 990000000 NMat= 106 IRICut= 132 DoRegI=T DoRafI=T ISym2E= 2.
    FoFCou: FMM=F IPFlag= 0 FMFlag= 100000 FMFlg1= 0
    NFxFlg= 0 DoJE=F BraDBF=F KetDBF=F FulRan=T
    wScrn= 0.000000 ICntrl= 200 IOpCl= 0 I1Cent= 0 NGrid= 0
    NMat0= 106 NMatS0= 0 NMatT0= 53 NMatD0= 106 NMtDS0= 0 NMtDT0= 0
    Integrals replicated using symmetry in FoFCou.
    MP4(R+Q)= 0.33372292D-01
    E3= -0.90312015D-01 EUMP3= -0.19501553117D+03
    E4(DQ)= -0.64362779D-02 UMP4(DQ)= -0.19502196745D+03
    E4(SDQ)= -0.13244209D-01 UMP4(SDQ)= -0.19502877538D+03
    DE(Corr)= -0.90945857 E(Corr)= -194.95096014
    NORM(A)= 0.11503023D+01

    Have you ever faced this problem? I appreciate your attention.
    Best regards,

    Diogo
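
The comparison described in this comment (last E(CORR) before the job was killed versus the first one after the restart) can be scripted. The following is a rough Python sketch, assuming the energy lines have the same layout as in the two excerpts above; the function names are hypothetical, and real log files may need a more tolerant pattern.

```python
import re

# Matches lines such as:
#   DE(Corr)= -0.91612505 E(CORR)= -194.95762662 Delta=-2.96D-05
# The Delta field is optional: the first iteration after the restart lacks it.
ECORR_LINE = re.compile(
    r"DE\(Corr\)=[ \t]*(\S+)[ \t]+E\(CORR\)=[ \t]*(\S+)(?:[ \t]+Delta=[ \t]*(\S+))?",
    re.IGNORECASE,
)


def fortran_float(token):
    """Convert Fortran-style numbers like '-2.96D-05' to a Python float."""
    return float(token.replace("D", "E").replace("d", "e"))


def ccsd_energies(log_text):
    """Return a list of (E(CORR), Delta or None), one entry per matching line."""
    results = []
    for m in ECORR_LINE.finditer(log_text):
        delta = fortran_float(m.group(3)) if m.group(3) else None
        results.append((fortran_float(m.group(2)), delta))
    return results


if __name__ == "__main__":
    before = "DE(Corr)= -0.91612505 E(CORR)= -194.95762662 Delta=-2.96D-05\n"
    after = "DE(Corr)= -0.90945857 E(Corr)= -194.95096014\n"
    last_before = ccsd_energies(before)[-1][0]
    first_after = ccsd_energies(after)[0][0]
    # A large jump here suggests the amplitudes were not picked up on restart.
    print(f"E(CORR) changed by {first_after - last_before:+.8f} Hartree")
```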

    • Hello Diogo,

      I have unfortunately (or maybe fortunately) found this error. Restarting a CCSD calculation should only employ the word RESTART in the route section with no other keyword, but it should also make use of a read-write file. Specify one in the link0 section as:
      %RWF=/path/filename.rwf

      You can add several files to split the rwf, but start with one. If you don’t specify the rwf file path then you cannot restart the calculation from where it left off, because the checkpoint file doesn’t save the amplitudes of each iteration of the CCSD procedure.

      The Gaussian manual says:
      “If you anticipate that a job may need restarting at some point, then you can assign a name to the read-write file in the original job:

      %RWF=myrwfile
      %NoSave
      %Chk=myfile
      # desired route section

      job file continues …
      By default, any file which is named with a % directive is retained when the job finishes. The %NoSave after the %RWF overrides this default, so that the read-write file will still be deleted if the job finishes normally (but will be left behind if the job terminates early). In this case, the %Chk directive will typically be placed after %NoSave since you usually want to retain it for future use.”

      I hope this helps. Have a nice day
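
The advice in the manual excerpt above (name the read-write file in the original job, before you know whether a restart will be needed) can be applied to existing inputs with a few lines of Python. This is a hedged sketch rather than an official tool: the function name and the `.rwf`/`.chk` extensions are my own choices, and it assumes a single-step input, since it drops any existing %rwf, %nosave or %chk line wherever it appears.

```python
def name_scratch_files(input_text, basename):
    """Prepend %RWF, %NoSave and %Chk lines to a Gaussian input.

    Existing %rwf / %nosave / %chk lines are removed so that the order
    %RWF, %NoSave, %Chk from the manual excerpt is preserved. Other
    Link 0 lines (e.g. %mem, %oldchk) are left untouched.
    """
    managed = ("%rwf", "%nosave", "%chk")
    kept = [
        line for line in input_text.splitlines()
        if not line.strip().lower().startswith(managed)
    ]
    header = [f"%RWF={basename}.rwf", "%NoSave", f"%Chk={basename}.chk"]
    return "\n".join(header + kept) + "\n"


if __name__ == "__main__":
    original = (
        "%chk=old.chk\n"
        "%mem=4GB\n"
        "# ccsd(t)/aug-cc-pVQZ\n"
        "\n"
        "Title\n"
        "\n"
        "0 2\n"
    )
    print(name_scratch_files(original, "job"))
```

Because %NoSave sits between %RWF and %Chk, the rwf is deleted only if the job finishes normally, while the chk file is always kept, as the quoted passage describes.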

  5. Hello Joaquin,

    I have tried to restart a frequency calculation with:

    %RWF=myrwfile
    %Chk=myfile
    # Restart

    But it ends immediately with the error message being:
    No route information found on RWF

    But I have given the correct path to the rwf file above, and the file is still in that folder.
    Do you have any idea why it doesn’t work?

    Thanks in advance and best wishes,

    Fred

    • Hello Fred,

      Did you include the %NoSave line in your original input file right after the %rwf line? I sometimes get confused about Gaussian’s logic, but if your freq calculation ended abnormally the rwf should be kept intact despite the %nosave option. However, I think restarting from the rwf is only possible for analytical frequencies and not numerical ones; if that is your case, you have a better chance of restarting from the chk file, but then you need to specify the whole route section.

      Let me know if it worked, please.

      Best wishes to you too

      • Hi Joaquin,

        No, I didn’t include the %NoSave line in the original file.
        I was doing an analytical frequency calculation. My jobs were accidentally killed by the administrator of our HPC cluster.
        I couldn’t restart this particular job, even though I had the rwf file and put in the correct path and rwf filename (I double-checked it).
        For some reason it seems the chk file was overwritten, since its file date had changed.
        Maybe I made a mistake when restarting it for the first time, although that doesn’t really explain the error message.
        I was able to restart all the other jobs with %rwf=/path/myfile and the # restart command.
        Therefore I can’t tell for sure what the problem was.

        Thanks for your reply,

        Thomas

  6. THANK YOU!!! I just restarted a few long-running (month-plus) energy and counterpoise calculations that were knocked out by a power outage. THANK YOU!!! -Neal
