# Category Archives: White papers

## Atom specifications unexpectedly found in input stream.

*“Well, where else were they supposed to appear?”*

I was sent this error along with the previous question for a failed optimization. Apparently there is no answer in the internet (I quickly checked) so here it is:

Gaussian is confused about finding atomic coordinates because there is also a** geom=check **instruction placed in the route section, i.e., it was told to retrieve the atomic coordinates from a checkpoint and then it was given those atomic coordinates within the input so it doesn’t know what you mean and exits.

## The HOMO-LUMO Gap in Open Shell Calculations. Meaningful or meaningless?

The HOMO – LUMO orbitals are central to the Frontier Molecular Orbital (FMO) Theory devised by Kenichi Fukui back in the fifties. The central tenet of the FMO theory resides on the idea that most of chemical reactivity is dominated by the interaction between these orbitals in an electron donor-acceptor pair, in which the most readily available electrons of the former arise from the HOMO and will land at the LUMO in the latter. The energy difference between the HOMO and LUMO of any chemical species, known as the HOMO-LUMO gap, is a very useful quantity for describing and understanding the photochemistry and photophysics of organic molecules since most of the electronic transitions in the UV-Vis region are dominated by the electron transfer between these two frontier orbitals.

But when we talk about Frontier Orbitals we’re usually referring to their doubly occupied version; in the case of open shell calculations the electron density with *α* spin is separate from the one with *β* spin, therefore giving rise to two separate sets of singly occupied orbitals and those in turn have a *α-*HOMO/LUMO and *β-*HOMO/LUMO, although SOMO (Singly Occupied Molecular Orbital) is the preferred nomenclature. Most people will then dismiss the HOMO/LUMO question for open shell systems as meaningless because ultimately we are dealing with two different sets of molecular orbitals. Usually the approach is to work backwards when investigating the optical transitions of a, say, organic radical, e.g. by calculating the transitions with such methods like TD-DFT (Time Dependent DFT) and look to the main orbital components of each within the set of *α* and *β* densities.

To the people who have asked me this question I strongly suggest to first try Restricted Open calculations, RODFT, which pair all electrons and treat them with identical orbitals and treat the unpaired ones independently. As a consequence, RO calculations and Unrestricted calculations vary due to variational freedom. RO calculations could yield wavefunctions with small to large values of spin contamination, so beware. Or just go straight to TDDFT calculations with hybrid orbitals which include a somewhat large percentage of HF exchange and polarized basis sets, but to always compare results to experimental values, if available, since DFT based calculations are Kohn-Sham orbitals which are defined for non-interacting electrons so the energy can be biased. Performing CI or CASSCF calculations is almost always prohibitive for systems of chemical interest but of course they would be the way to go.

## Calculating NMR shifts – Short and Long Ways

Nuclear Magnetic Resonance is a most powerful tool for elucidating the structure of diamagnetic compounds, which makes it practically universal for the study of organic chemistry, therefore the calculation of ^{1}H and ^{13}C chemical shifts, as well as coupling constants, is extremely helpful in the assignment of measured signals on a spectrum to an actual functional group.

Several packages offer an additive (group contribution) empirical approach to the calculation of chemical shifts (ChemDraw, Isis, ChemSketch, etc.) but they are usually only partially accurate for the simplest molecules and no insight is provided for the more interesting effects of long distance interactions (*vide infra*) so quantum mechanical calculations are really the way to go.

With Gaussian the calculation is fairly simple just use the NMR keyword in the route section in order to calculate the NMR shielding tensors for relevant nuclei. Bear in mind that an optimized structure with a large basis set is required in order to get the best results, also the use of an implicit solvation model goes a long way. The output displays the value of the total isotropic magnetic shielding for each nucleus in ppm (image taken from the Gaussian website):

Magnetic shielding (ppm): 1 C Isotropic = 57.7345 Anisotropy = 194.4092 XX= 48.4143 YX= .0000 ZX= .0000 XY= .0000 YY= -62.5514 ZY= .0000 XZ= .0000 YZ= .0000 ZZ= 187.3406 2 H Isotropic = 23.9397 Anisotropy = 5.2745 XX= 27.3287 YX= .0000 ZX= .0000 XY= .0000 YY= 24.0670 ZY= .0000 XZ= .0000 YZ= .0000 ZZ= 20.4233

Now, here is why this is the long way; in order for these values to be meaningful they need to be contrasted with a reference, which experimentally for ^{1}H and ^{13}C is tetramethylsilane, TMS. This means you have to perform the same calculation for TMS at -preferably- the same level of theory used for the sample and substract the corresponding values for either H or C accordingly. Only then the chemical shifts will read as something we can all remember from basic analytical chemistry class.

GaussView 6.0 provides a shortcut; open the Results menu, select NMR and in the new window there is a dropdown menu for selecting the nucleus and a second menu for selecting a reference. In the case of hydrogen the available references are TMS calculated with the HF and B3LYP methods. The SCF – GIAO plot will show the assignments to each atom, the integration simulation and a reference curve if desired.

The chemical shifts obtained this far will be a good approximation and will allow you to assign any peaks in any given spectrum but still not be completely accurate though. The reasons behind the numerical deviations from calculated and experimental values are many, from the chosen method to solvent interactions or basis set limitations, scaling factors are needed; that’s when you can ask the Cheshire Cat which way to go

If you don’t know where you are going any road will get you there.

Lewis Carroll – Alice in Wonderland

Well, not really. The Chemical Shift Repository for computed NMR scaling factors, with Coupling Constants Added Too (aka CHESHIRE CCAT) provides with straight directions on how to correct your computed NMR chemical shifts according to the level of theory without the need to calculate the NMR shielding tensor for the reference compound (usually TMS as pointed out earlier). In a nutshell, the group of Prof. Dean Tantillo (UC Davis) has collected a large number of isotropic magnetic shielding values and plotted them against experimental chemical shifts. Just go to their scaling factors page and check all their linear regressions and use the values that more closely approach to your needs, there are also all kinds of scripts and spreadsheets to make your job even easier. Of course, if you make use of their website **don’t forget** to give the proper credit by including these references in your paper.

We’ve recently published an interesting study in which the 1H – 19F coupling constants were calculated via the long way (I was just recently made aware of CHESHIRE CCAT by Dr. Jacinto Sandoval who knows all kinds of web resources for computational chemistry calculations) as well as their conformational dependence for some substituted 2-aza-carbazoles (fig. 1).

The paper is published in the Journal of Molecular Structure. In this study we used the GIAO NMR computations to assign the peaks on an otherwise cluttered spectrum in which the signals were overlapping due to conformational variations arising from the rotation of the C-C bond which re-orients the F atoms in the fluorophenyl grou from the H atom in the carbazole. After the calculations and the scans were made assigning the peaks became a straightforward task even without the use of scaling factors. We are now expanding these calculations to more complex systems and will contrast both methods in this space. Stay tuned.

## Post Calculation Addition of Empirical Dispersion – Fixing interaction energies

Calculation of interaction energies is one of those things people are more concerned with and is also something mostly done wrong. The so called ‘*gold standard*‘ according to Pavel Hobza for calculating supramolecular interaction energies is the CCSD(T)/CBS level of theory, which is highly impractical for most cases beyond 50 or so light atoms. Basis set extrapolation methods and inclusion of electronic correlation with MP2 methods yield excellent results but they are not nonetheless almost as time consuming as CC. DFT methods in general are terrible and still are the most widely used tools for electronic structure calculations due to their competitive computing times and the wide availability of schemes for including terms which help describe various kinds of interactions. The most important ingredients needed to get a decent to good interaction energies values calculated with DFT methods are correlation and dispersion. The first part can be recreated by a good correlation functional and the use of empirical dispersion takes care of the latter shortcoming, dramatically improving the results for interaction energies even for lousy functionals such as the infamous B3LYP. The results still wont be of benchmark quality but still the deviations from the *gold standard* will be shortened significantly, thus becoming more quantitatively reliable.

There is an online tool for calculating and adding the empirical dispersion from Grimme’s group to a calculation which originally lacked it. In the link below you can upload your calculation, select the basis set and functionals employed originally in it, the desired damping model and you get in return the corrected energy through a geometrical-Counterpoise correction and Grimme’s empirical dispersion function, D3, of which I have previously written here.

The gCP-D3 Webservice is located at: http://wwwtc.thch.uni-bonn.de/

The platform is entirely straightforward to use and it works with xyz, turbomole, orca and gaussian output files. The concept is very simple, a both gCP and D3 contributions are computed in the selected basis set and added to the uncorrected DFT (or HF) energy (eq. 1)

(**1**)

If you’re trying to calculate interaction energies, remember to perform these corrections for every component in your supramolecular assembly (eq. 2)

(**2**)

Here’s a screen capture of the outcome after uploading a G09 log file for the simplest of options B3LYP/6-31G(*d*), a decomposed energy is shown at the left while a 3D interactive Jmol rendering of your molecule is shown at the right. Also, various links to the literature explaining the details of these calculations are available in the top menu.

I’m currently writing a book chapter on methods for calculating ineraction energies so expect many more posts like this. A special mention to Dr. Jacinto Sandoval, who is working with us as a postdoc researcher, for bringing this platform to my attention, I was apparently living under a rock.

## Error for Gaussian16 .log files and GaussView5

There’s an error message when opening some Gaussian16 output files in GaussView5 for which the message displayed is the following:

ConnectionGLOG::Parse_Gauss_Coord(). Failure reading oriented atomic coordinates. Line Number

We have shared some solutions to the GaussView handling of *chk and *.fchk files in teh past but never for *.log files, and this time Dr. Davor Šakić from the University of Zagreb in Croatia has brought to my attention a fix for this error. If “Dipole orientation” with subsequent orientation is removed, the file becomes again readable by GaussView5.

Here you can download a script to fix the file without any hassle. The usage from the command line is simply:

˜$ chmod 777 Fg16TOgv5 ˜$ ./Fg16TOgv5 name.log

The first line is to change and grant all permissions to the script (use at your discretion/own risk), which in turn will take the output file **name.log** and yield two more files: **gv5_name.log** and and **name.arch**; the latter archive allows for easy generation of SI files while the former is formatted for GaussView5.x.

Thanks to Dr. Šakić for his script and insight, we hope you find it useful and if indeed you do please credit him whenever its due, also, if you find this or other posts in the blog useful, please let us know by sharing, staring and commenting in all of them, your feedback is incredibly helpful in justifying to my bosses the time I spent curating this blog.

Thanks for reading.

## Python scripts for calculating Fukui Indexes

One of the most popular posts in this blog has to do with calculating Fukui indexes, however, when dealing with a large number of molecules, our described methodology can become cumbersome since it requires to manually extract the population analysis from two or three different output files and then performing the arithmetic on them separately with a spreadsheet or something.

Our new team member Ricardo Loaiza has written a python script that takes the three aforementioned files and yields a .csv file with the calculated Fukui indexes, and it even points out which of the atoms exhibit the largest values so if you have a large molecule you don’t have to manually check for them. We have also a batch version which takes all the files in any given directory and performs the Fukui calculations for each, provided it can find file triads with the naming requirements described below.

Output files must be named *filename*.log (the N electrons reference state), *filename***_plus**.log (the state with N+1 electrons) and *filename***_minus**.log (the N-1 electrons state). Another restriction is that so far these scripts only work with NBO population analysis as provided by the NBO3.1 program available in the various versions of Gaussian. I imagine the listing is similar in NBO5.x and NBO6.x and so it should work if you do the population analysis with them.

The syntax for the single molecule version is:

python fukui.py filename.log filename_minus.log filename_plus.log

For the batch version is:

./fukuiPorLote.sh

(*Por Lote* means *In Batch* in Spanish.)

These scripts are available via GitHub. We hope you find them useful, and you do please let us know whether here at the comments section or at our GitHub site.

## fchk file errors (Gaussian) Missing or bad Data: RBond

We’ve covered some common errors when dealing with formatted checkpoint files (*.fchk) generated from Gaussian, specially when analyzed with the associated GaussView program. (see here and here for previous posts on the matter.)

Prof. Neal Zondlo from the University of Delaware kindly shared this solution with us when the following message shows up:

CConnectionGFCHK::Parse_GFCHK() Missing or bad data: Rbond Line Number 1234

The Rbond label has to do with the connectivity displayed by the visualizer and can be overridden by close examination of the input file. In the example provided by Prof. Zondlo he found the following line in the connectivity matrix of the input file:

2 9 0.0

which indicates a zero bond order between atoms 2 and 9, possibly due to their proximity. He changed the line to simply

2

So editing the connectivity of your atoms in the input can help preventing the Rbond message.

I hope this helps someone else.

## Quick note on WFN(X) files and MP2 calculations #G09 #CompChem

A few weeks back we wrote about using WFN(X) files with MultiWFN in order to find σ-holes in halogen atoms by calculating the maximum potential on a given surface. We later found out that using a chk file to generate a wfn(x) file using the guess=(read,only) keyword didn’t retrieve the MP2 wavefunction but rather the HF wavefunction! Luckily we realized this problem very quickly and were able to fix it. We tried to generate the wfn(x) file with the following keywords at the route section

#p guess=(read,only) density=current

but we kept retrieving the HF values, which we noticed by running the corresponding HF calculation and noticing that every value extracted from the WFN file was exactly the same.

So, if you want a WFN(X) file for post processing an MP2 (or any other post-HartreFock calculation for that matter) ask for it from the beginning of your calculation in the same job. I still don’t know how to work around this or but will be happy to report it whenever I do.

PS. A sincere apology to all subscribers for getting a notification to this post when it wasn’t still finished.

## The “art” of finding Transition States Part 2

Last week we posted some insights on finding Transitions States in Gaussian 09 in order to evaluate a given reaction mechanism. A stepwise methodology is tried to achieve and this time we’ll wrap the post with two flow charts trying to synthesize the information given. It must be stressed that knowledge about the chemistry of the reaction is of paramount importance since G09 cannot guess the structure connecting two minima on its own but rather needs our help from our chemical intuition. So, without further ado here is the remainder of Guillermo’s post.

**METHOD 3. **QST3. For this method, you provide the coordinates of your reagents, products and TS (in that order) and G09 uses the QST3 method to find the first order saddle point. As for QST2 the numbering scheme must match for all the atoms in your three sets of coordinates, again, use the connection editor to verify it. Here is an example of the input file.

link 0 --blank line-- #p b3lyp/6-31G(d,p)opt=(qst3,calcfc)geom=connectivity freq=noraman --blank line--Charge MultiplicityCoordinates of reagents --blank line—Charge MultiplicityCoordinates of products --blank line--Charge MultiplicityCoordinates of TS --blank line---

As I previously mentioned, it happens that you find a first order saddle point but does not correspond to the TS you want, you find an imaginary vibration that is not the one for the bond you are forming or breaking. For these cases, I suggest you to take that TS structure and manually modify the region that is causing you trouble, then use method 2.

**METHOD 4. **When the previous methods fail to yield your desired TS, the brute force way is to acquire the potential energy surface (PES) and visually locate your possible TS. The task is to perform a rigid PES scan, for this, the molecular structure must be defined using z-matrix. Here is an example of the input file.

link 0 --blank line-- #p b3lyp/6-31G(d,p)scan testgeom=connectivity --blank line--Charge MultiplicityZ-matrix of reagents (or products) --blank line--

In the Z-matrix section you must specify which variables (B, A or D) you want to modify. First, locate the variables you want to modify (distance B, angle A, or dihedral angle D). Then modify those lines within the Z-matrix, here is an example.

B1 1.41 3 0.05 A1 104.5 2 1.0

What you are specifying with this is that the variable B1 (a distance) is going to be stepped 3 times by 0.05. Then variable A1 (an angle) is going to be stepped 2 times by 1.0. Thus, a total of 12 energy evaluations will be performed. At the end of the calculation open the .log file in gaussview and in Results choose the Scan… option. This will open a 3D surface where you should locate the saddle point, this is an educated guess, so take the structure you think corresponds to your TS and use it for method 2.

I have not fully explored this method so I encourage you to go to Gaussian.com and thoroughly review it.

Once you have found your TS structure and via the imaginary vibration confirmed that is the one you are looking for the next step is to verify that your TS connects both your reagents and products in the potential energy surface. For this, an Intrinsic Reaction Coordinate (IRC) calculation must be performed. Here is an example of the input file for the IRC.

link 0 --blank line-- #p b3lyp/6-31G(d,p)irc=calcfcgeom=connectivity --blank line--Charge MultiplicityCoordinates of TS --blank line--

With this input, you ask for an IRC calculation, the default numbers of steps are 20 for each side of your TS in the PES; you must specify the coordinates of your TS or take them from the .chk file of your optimization. In addition, an initial force constant calculation must be made. It often occurs that the calculation fails in the correction step, thus, for complicated cases I hardly suggest to use **irc=calcall**, this will consume very long time (even days) but there is a 95% guaranty. If the number of points is insufficient you can put more within the route section, here is such an example for a complicated case.

link 0 --blank line-- #p b3lyp/6-31G(d,p)irc=(calcall,maxpoints=80)geom=connectivity --blank line--Charge MultiplicityCoordinates of TS --blank line--

With this route section, you are asking to perform an IRC calculation with 80 points on each side of the PES, calculating the force constants at every point. For an even complicated case try adding the **scf=qc** keyword in the route section, quadratic convergence often works better for IRC calculations.