Having a long calculation terminated just because it ran out of time in the queue is very frustrating; even more so if restarting it from the last accessible point is hard to do.
I have recently performed a particularly demanding calculation: Basis Set Superposition Error (BSSE) with the Counterpoise method at the second order Møller-Plesset perturbation theory (MP2) level. The calculation ran out of time but I was able to restart it because I had the rwf file! My input looked a bit like this:
#p mp2/GEN counterpoise=2 maxdisk=200GB
So here is how it works.
The first lines of your output file give you the process ID (PID) of the calculation, which is not the same as the PID given by the queue system, because the latter corresponds to the submitted script and not to the instructions in it, i.e. your calculation:
Entering Gaussian System, Link 0=g09
Initial command:
/opt/SC/aplicaciones/g09-C.01/l1.exe /tmpu/joaqbf_g/joaqbf/Gau-38954.inp
-scrdir=/tmpu/joaqbf_g/joaqbf/
Entering Link 1 = /opt/SC/aplicaciones/g09-C.01/l1.exe PID= 38955.
(The PID at the end of the last line, 38955, is the part I highlighted.)
This is the number you want to write down. You will need to find the corresponding rwf file (usually in your SCRATCH directory), which is named Gau-PID.rwf (in the aforementioned case, Gau-38955.rwf). If you are a bit paranoid like myself you will want to copy this file and keep it safe, but be aware that these are very large files; in my case it was 175 GB. Now you need to launch your calculation again with the following input:
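If you would rather not look for the PID by eye, here is a minimal Python sketch that pulls it out of the log text and builds the expected rwf file name. The function name rwf_name_from_log is just something I made up for illustration, and the Gau-PID.rwf pattern is the one described above; check it against what you actually see in your scratch directory.

```python
import re

# Matches the "Entering Link 1 = ... PID= 38955." line shown above.
PID_LINE = re.compile(r"Entering Link 1 = .*?PID=\s*(\d+)\.")


def rwf_name_from_log(log_text):
    """Return the expected rwf file name (Gau-PID.rwf), or None if no PID line is found."""
    match = PID_LINE.search(log_text)
    if match is None:
        return None
    return f"Gau-{match.group(1)}.rwf"


# Example with the header lines quoted in this post.
sample = """ Entering Gaussian System, Link 0=g09
 Initial command:
 /opt/SC/aplicaciones/g09-C.01/l1.exe /tmpu/joaqbf_g/joaqbf/Gau-38954.inp
 Entering Link 1 = /opt/SC/aplicaciones/g09-C.01/l1.exe PID= 38955.
"""
print(rwf_name_from_log(sample))
```

Note that the number in the .inp file name (38954 here) is not the one you want; the sketch only looks at the "Entering Link 1" line.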
%rwf=myfile.rwf
%nosave
%chk=myfile.chk
# restart

Title Card

rest of input
You can add all other controls to the Link0 section such as %nprocshared or %mem according to your needs.
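If you restart jobs often, writing this header by hand gets tedious. Below is a small Python sketch that assembles the Link0 lines and the restart route in the order used above (%rwf, then %nosave, then %chk). The function restart_input and its arguments are my own hypothetical names, not anything Gaussian provides, and the sketch stops at the route section: whether your job needs a title or further input after it is left for you to add.

```python
def restart_input(rwf, chk, extra_link0=()):
    """Build the text of a restart input: Link0 directives plus '# restart'.

    extra_link0 holds any other Link0 lines, e.g. "%mem=55GB" or "%nprocshared=16".
    """
    lines = [
        f"%rwf={rwf}",
        "%nosave",        # placed right after %rwf, as in the input above
        f"%chk={chk}",
    ]
    lines.extend(extra_link0)
    lines.append("# restart")
    lines.append("")      # blank line that closes the route section
    return "\n".join(lines) + "\n"


# Example using the file names from this post.
print(restart_input("myfile.rwf", "myfile.chk", ["%mem=55GB"]))
```

The extra Link0 lines go after %chk here simply so that the %rwf/%nosave/%chk ordering stays untouched.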
I’m pretty sure it should work for other kinds of calculations in which restarting from the checkpoint file is not as easy, so if you run into this kind of problem, it’s worth a try.
Is it possible to restart with additional parameters, for example
# restart CPHF(MaxInv=15000)
?
I haven’t had any luck with this.
I don’t think so. The restart option recovers the calculation from the read write file as it was. Depending on the calculation you’re doing you may want to first load the wavefunction with guess=read and then add more parameters, but don’t use the restart option.
If you provide more details I may be able to make a better suggestion.
Have a nice day!
Thanks a lot for a great overview of Gaussian calculations; your post helped me a lot.
Does it work to leave the input file the same but include %oldchk=$$$.chk, and then copy over the chk file from the last job? If so, is it limited to certain jobs?
I’m not sure I understand your question. Do you want to write over an old chk file or not? You can use these options:
%chk=oldfilename.chk
%NoSave
This will use the previous chk file in your new calculation but leave it unchanged after the new calculation is done. You will not have access to the data from the new calculation, except for what is in the log file.
I hope this helps
Dear Joaquim,
thanks for your explanations, they are always very useful.
I am trying to perform a single point energy calculation at the CCSD(T)/aug-cc-pVQZ level on an alkyl radical with G09. I run my calculations on a cluster that allows me to run computations for 96 hours maximum, so when I run out of time, I need to restart them. I have tried your method but every time I try to restart I get the following error message:
Intra-link restart file has length 1070 expected 106.
SetILR is confused about IRwILR.
Error termination via Lnk1e in /apps/applications/gaussian/G09.D01/2/default/g09/l913.exe at Sun Aug 20 23:35:07 2017.
After this error message, if I try to restart again (simply by repeating the same restart attempt) it runs, but if I check the convergence I can see that it did not restart from where it was previously: the E(CORR) value is different and further from convergence.
Examples:
log file before my calculation ran out of time in the cluster.
Iteration Nr. 6
**********************
DD1Dir will call FoFJK 3 times, MxPair= 70
NAB= 105 NAA= 0 NBB= 0 NumPrc= 24.
DE(Corr)= -0.91612505 E(CORR)= -194.95762662 Delta=-2.96D-05
NORM(A)= 0.11537662D+01
First iteration after restart:
Iteration Nr. 1
**********************
DD1Dir will call FoFJK 2 times, MxPair= 106
NAB= 105 NAA= 0 NBB= 0 NumPrc= 16.
FoFJK: IHMeth= 1 ICntrl= 200 DoSepK=F KAlg= 0 I1Cent= 0 FoldK=F
IRaf= 990000000 NMat= 106 IRICut= 132 DoRegI=T DoRafI=T ISym2E= 2.
FoFCou: FMM=F IPFlag= 0 FMFlag= 100000 FMFlg1= 0
NFxFlg= 0 DoJE=F BraDBF=F KetDBF=F FulRan=T
wScrn= 0.000000 ICntrl= 200 IOpCl= 0 I1Cent= 0 NGrid= 0
NMat0= 106 NMatS0= 0 NMatT0= 53 NMatD0= 106 NMtDS0= 0 NMtDT0= 0
Integrals replicated using symmetry in FoFCou.
MP4(R+Q)= 0.33372292D-01
E3= -0.90312015D-01 EUMP3= -0.19501553117D+03
E4(DQ)= -0.64362779D-02 UMP4(DQ)= -0.19502196745D+03
E4(SDQ)= -0.13244209D-01 UMP4(SDQ)= -0.19502877538D+03
DE(Corr)= -0.90945857 E(Corr)= -194.95096014
NORM(A)= 0.11503023D+01
Have you ever faced this problem? I appreciate your attention.
Best regards,
Diogo
Hello Diogo,
I have unfortunately (or maybe fortunately) found this error. Restarting a CCSD calculation should only employ the word RESTART in the route section with no other keyword, but it should also make use of a read-write file. Specify one in the link0 section as:
%RWF=/path/filename.rwf
You can add several files to split the rwf but start with one. If you don’t specify the rwf file path then you cannot restart it from where it left off, because the checkpoint file doesn’t save the amplitudes of each iteration of the CCSD procedure.
The Gaussian manual says:
“If you anticipate that a job may need restarting at some point, then you can assign a name to the read-write file in the original job:
%RWF=myrwfile
%NoSave
%Chk=myfile
# desired route section
job file continues …
By default, any file which is named with a % directive is retained when the job finishes. The %NoSave after the %RWF overrides this default, so that the read-write file will still be deleted if the job finishes normally (but will be left behind if the job terminates early). In this case, the %Chk directive will typically be placed after %NoSave since you usually want to retain it for future use.”
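To check whether a restarted CCSD job really picked up where it left off, it helps to compare the last E(CORR) before the restart with the first one after it. Here is a rough Python sketch that extracts those values from the log text; the function name ecorr_values is hypothetical, and the pattern is based only on the "DE(Corr)= ... E(CORR)= ..." lines quoted in this thread, so other versions or methods may print them differently.

```python
import re

# Matches "E(CORR)=" or "E(Corr)=" but not the "DE(Corr)=" that precedes it on the same line.
ECORR = re.compile(r"(?<!D)E\(CORR\)=\s*(-?\d+\.\d+)", re.IGNORECASE)


def ecorr_values(log_text):
    """Return every E(CORR) value found in the log text, in order of appearance."""
    return [float(value) for value in ECORR.findall(log_text)]


# The two lines quoted above: last iteration before the timeout, first after the restart.
sample = """ DE(Corr)= -0.91612505 E(CORR)= -194.95762662 Delta=-2.96D-05
 DE(Corr)= -0.90945857 E(Corr)= -194.95096014
"""
before, after = ecorr_values(sample)
print(before, after)
```

If the first value after the restart is noticeably further from convergence than the last value before it, as in the lines quoted above, the job most likely started the CCSD iterations over instead of resuming them.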
I hope this helps. Have a nice day
Dear Diogo,
I know it has been quite some time but were you able to solve the problem? I am running a multi-link calculation using the G4(MP2)-6X composite method in g09 on a single node (32 virtual threads). After the available 24h on my local cluster, the CI/CC has only just converged. (Please find the final lines of the log file below).
Iteration Nr. 36
**********************
DD1Dir will call FoFMem 1 times, MxPair= 3782
NAB= 1891 NAA= 0 NBB= 0.
Norm of the A-vectors is 4.3258077D-06 conv= 1.00D-05.
RLE energy= -3.2585251235
DE(Corr)= -3.2585251 E(CORR)= -1047.8759797 Delta= 2.95D-10
NORM(A)= 0.14736506D+01
CI/CC converged in 36 iterations to DelEn= 2.95D-10 Conv= 1.00D-07 ErrA1= 4.33D-06 Conv= 1.00D-05
T1 Diagnostic = 0.01347087
Largest amplitude= 5.02D-02
However, when restarting the CCSD(T) job, I get exactly the same error in the first attempt and the subsequent behavior you have described.
@Joaquim: I believe I named and specified the rwf path as you suggested:
%RWF=/some_long_absolute_path/gaussian_scratch/productionRun2.rwf
%NoSave
%mem=55GB
%chk=production_run2.chk
# restart
--Link1--
…
I am looking forward to any ideas.
Kind regards,
Christoph
Hello Joaquin,
I have tried to restart a frequency calculation with:
%RWF=myrwfile
%Chk=myfile
# Restart
But it ends immediately with the error message being:
No route information found on RWF
But I have given the correct path to the rwf file above and the file is still in this folder.
Do you have any idea why it doesn’t work?
Thanks in advance and best wishes,
Fred
Hello Fred,
Did you include the %NoSave line in your original input file right after the %rwf? I sometimes get confused about Gaussian’s logic, so if your freq calculation ended abnormally the rwf should be kept intact despite having the %nosave option. However I think restarting from rwf is only for analytical freqs and not numerical; if this is your case you have a better chance of restarting it from chk file but in that case you need to specify the whole route section.
Let me know if it worked, please.
Best wishes to you too
Hi Joaquin,
No, I didn’t include the %NoSave line in the original file.
I was doing an analytical frequency calculation. My jobs were accidentally killed by the administrator of our HPC.
This particular job I couldn’t restart, even though I had the rwf file and put in the correct path and rwf filename (I double checked it).
For some reason it seems the chk file was overwritten, since the file date was changed.
Maybe I made a mistake when restarting it for the first time, although that doesn’t really explain the error message.
All the other jobs I was able to restart with %rwf=/path/myfile and the # restart command.
Therefore I can’t tell for sure what the problem was.
Thanks for your reply,
Thomas
THANK YOU!!! I just restarted a few long-running (month-plus) energy and counterpoise calculations that were knocked out by a power outage. THANK YOU!!! -Neal
Thank you, Neal. I’m always glad to know this blog is useful.
Have a nice day
Hi Joaquin, I am running some single point calculations on fairly complex inorganic cluster compounds at the ub3lyp+/ccPVTZ level of theory. I am running the single point calculations using geometries that were optimized at the bpv86/def2SVP level of theory, and I’ve been restarting them by writing and reading a rwf file, as they time out after running for 4 days. This approach has worked for two of my four compounds, but unfortunately my more complex compounds time out after running for an additional 4 days. Is it possible to restart a single-point calculation from a rwf file more than once? I am not sure if I am just overwriting my previous restart calculation in doing so. Thank you for your help!
Hi Jon,
Interesting question. Have you checked whether you get new rwf files after the second submission? Each time you submit a job it gets a new process ID (PID), which is echoed in the first lines of the output file; this number is used to identify the rwf files.
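A quick way to check is to list the rwf files in your scratch directory ordered by modification time, so you can see whether a new file appeared or an old one was updated after the second submission. This is only a sketch in Python: list_rwf_files is a name I made up, and the directory you pass to it is whatever your own scratch path happens to be.

```python
from pathlib import Path


def list_rwf_files(scratch_dir):
    """Return (file name, size in bytes) for every *.rwf file, oldest modification first."""
    files = sorted(Path(scratch_dir).glob("*.rwf"), key=lambda p: p.stat().st_mtime)
    return [(p.name, p.stat().st_size) for p in files]
```

Calling list_rwf_files on your scratch directory before and after the resubmission and comparing the two lists should tell you whether the restart wrote a new read-write file or reused the old one.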
I hope this helps
Hi there,
Sorry for bringing this up again, but I’ve actually come across the same issue as the previous comment. An analytical frequency job I was running ran out of time on the HPC I was using, so I needed to restart it. Eventually I figured out how to get it working (the contradictory information in the Gaussian manual didn’t help) and continued the frequency calculation until it, again, ran out of time. I went to the scratch directory and found that no new .chk or .rwf file had been made, which concerned me, but I thought ‘Fair enough, it must have overwritten the previous .chk and .rwf file’. Unfortunately, this was not the case, and it appears that the second restart started at the same place the first restart did.
Has this ever happened to you? Is it not possible to restart a calculation a second time in Gaussian? I would really appreciate any help anyone can give me… 😦
Sir, I can’t seem to find the rwf file in the scratch directory. I only have this first line in my output file:
Entering Link 1 = C:\G09W\l1.exe PID= 13420.
I don’t have the initial command line.