# Blog Archives

## Basis Set Superposition Error (BSSE). A short intro

Molecular Orbitals (MOs) are linear combinations of Atomic Orbitals (AOs), which in turn are linear combinations of other functions called ‘basis functions’. A basis, or more accurately a basis set, is a collection of functions which obey a set of rules (such as being orthogonal to each other and possibly being normalized) with which all AOs are constructed, and although these are centered on each atomic nucleus, the canonical way in which they are combined yield delocalized MOs; in other words, an MO can occupy a large space spanning several atoms at once. We don’t mind this expansion across a molecule, but what about between two molecules? Calculating the interaction energy between two or more molecular fragments leads to an artificial extra–stabilization term that stems from the fact that electrons in molecule 1 can occupy AO’s (or the basis functions which form them) centered on atoms from molecule 2.

Fundamentally, the interaction energy of any A—B dimer, Eint, is calculated as the energy difference between the dimer and the separately calculated energies for each component (Equation 1).

Eint = EAB – EA – EB (1)

However the calculation of Eint by this method is highly sensitive to the choice of basis set due to the Basis Set Superposition Error (BSSE) described in the first paragraph. The BSSE is particularly troublesome when small basis sets are used, due to the poor description of dispersion interactions but treating this error by just choosing a larger basis set is seldom useful for systems of considerable sizes. The Counterpoise method is a nifty correction to equation 1, in which EA and EB are calculated with the basis set of A and B respectively, i.e., only in EAB a larger basis set (that of A and B simultaneously) is used. The Counterpoise method calculates each component with the AB basis set (Equation 2)

EintCP = EABAB – EAAB– EBAB (2)

where the superscript AB means the whole basis set is used. This is accomplished by using ‘ghost‘ atoms with no nuclei and no electrons but empty basis set functions centered on them.

In Gaussian, BSSE is calculated with the Counterpoise method developed by Boys and Simon. It requires the keyword Counterpoise=N where N is the number of fragments to be considered (for an A—B system, N=2). Each atom in the coordinates list must be specified to which fragment pertains; additionally, the charge and multiplicity for each fragment and the whole supermolecular ensemble must be specified. Follow the example of this hydrogen fluoride dimer.

```%chk=HF2.chk
#P opt wB97XD/6-31G(d,p) Counterpoise=2

HF dimer

0,1 0,1 0,1
H(Fragment=1) 0.00 0.00 0.00
F(Fragment=1) 0.00 0.00 0.70
H(Fragment=2) 0.00 0.00 1.00
F(Fragment=2) 0.00 0.00 1.70```

For closed shell fragments the first line is straightforward but one must pay attention that the first pair of numbers in the charge multiplicity line correspond to the whole ensemble, whereas the folowing pairs correspond to each fragment in consecutive order. Fragments do not need to be specified contiguously, i.e., you don’t need to define all atoms for fragment 1 and after those the atoms for fragment 2, etc. They could be mixed and the program still assigns them correctly. Just as an example I typed wB97XD but any other method, DFT or ab initio, may be used; only semiempirical methods do not admit a BSSE calculation because they don’t make use of a basis set in the first place!

The output provides the corrected energy (in atomic units) for the whole system, as well as the BSSE correction (which added to the previous term yields the un-corrected energy of the system). Gaussian16 also provides these values in kcal/mol as ‘Complexation energies’ first raw (uncorrected) and then the corrected energy.

BSSE is always present and cannot be entirely eliminated because of the use of finite basis sets but it can be correctly dealt with if the Counterpoise method is included.

## Aurides Chemistry – New Paper in Organometallics

Compound 2 represents the first structural example of a 12 e− auride complex, with a pseudohalide/hydride nature in bonding. According to our NBO calculations, this electron deficient gold center is stabilized by weak intramolecular interactions between Au p orbitals and σC−C and σC−H bonds of adjacent aromatic rings together with a Ga−Au−Ga 3 centers−2 electrons bond (I like the term ‘banana bond‘, don’t you?).

I was invited to participate in this wonderful venture by my good friend and colleague Dr. José Oscar Carlos Jiménez-Halla, from the University of Guanajuato, Mexico, with whom we’re now working with Prof. Rong Shang at the Hiroshima University. Prof. Shang has synthesized this portentous Auride complex and over the last year, Leonardo “Leo” Lugo has worked with Oscar and I in calculating their electronic structure and bonding properties.

Gold catalysis is an active area of research but low valent Au compounds are electron deficient and therefore highly reactive and elusive; that’s why researchers prefer to synthesize these compounds in situ, to harness their catalytic properties before they’re lost. Power’s digalladeltacyclane was used as a ligand framework to bind to a Au(I) center, which became reduced after the addition and breaking of the Ga−Ga bond while the opposite face of the metallic center became blocked by the bulky aromatic groups on the main ligand. NBO calculations at the M05-2X/[LANL2TZ(f),6-311G(d,p)] and QTAIM BCP analysis show the main features of Au bonding in 2, noteworthy features are the 3c−2e bond (banana) and the σC−C and σC−H donations (See figure 2).

One of the most interesting features of this compound is the fact that Au(PPh3)Cl reacts differently to the digallane ligand than it does to analogous B−B, Si−Si, or Sn−Sn bonds. The Au−Cl bond does not undergo metathesis as with B−B, nor does it undergo an oxidative addition, so to further understand the chemistry of−and leading to−compound 2, the reaction mechanism energy profile was calculated in a rather painstakingly effort (Kudos, Leo, and a big shoutout to my friend Dr. Jacinto Sandoval for his one on one assistance). Figure 3 shows the energy profile for the reaction mechanism for the formation of 2 from Power’s digallane reagent and Au(PPh3)Cl.

You can read more details about this research in Organometallics DOI:10.1021/acs.organomet.0c00557. Thanks again to Profs. Rong Shang and Óscar Jiménez-Halla for bringing me on board of this project and to Leo for his relentless work getting those NBO calculations done; this is certainly the beginning of a golden opportunity for us to collaborate on a remarkable field of chemistry, it has certainly made me go bananas over Aurides chemistry. OK I’ll see myself out.

## Orbital Contributions to Excited States

This is a guest post by our very own Gustavo “Gus” Mondragón whose work centers around the study of excited states chemistry of photosynthetic pigments.

When you’re calculating excited states (no matter the method you’re using, TD-DFT, CI-S(D), EOM-CCS(D)) the analysis of the orbital contributions to electronic transitions poses a challenge. In this post, I’m gonna guide you through the CI-singles excited states calculation and the analysis of the electronic transitions.

I’ll use adenine molecule for this post. After doing the corresponding geometry optimization by the method of your choice, you can do the excited states calculation. For this, I’ll use two methods: CI-Singles and TD-DFT.

The route section for the CI-Singles calculation looks as follows:

`%chk=adenine.chk%nprocshared=8%mem=1Gb#p CIS(NStates=10,singlets)/6-31G(d,p) geom=check guess=read scrf=(cpcm,solvent=water)adenine excited states with CI-Singles method0 1--blank line--`

I use the same geometry from the optimization step, and I request only for 10 singlet excited states. The CPCP implicit solvation model (solvent=water) is requested. If you want to do TD-DFT, the route section should look as follows:

`%chk=adenine.chk%nprocshared=8%mem=1Gb#p FUNCTIONAL/6-31G(d,p) TD(NStates=10,singlets) geom=check guess=read scrf=(cpcm,solvent=water)adenine excited states with CI-Singles method0 1--blank line--`

Where FUNCTIONAL is the DFT exchange-correlation functional of your choice. Here I strictly not recommend using B3LYP, but CAM-B3LYP is a noble choice to start.

Both calculations give to us the excited states information: excitation energy, oscillator strength (as f value), excitation wavelength and multiplicity:

Excitation energies and oscillator strengths:

` Excited State   1:      Singlet-A      6.3258 eV  196.00 nm  f=0.4830  <S**2>=0.000      11 -> 39        -0.00130      11 -> 42        -0.00129      11 -> 43         0.00104      11 -> 44        -0.00256      11 -> 48         0.00129      11 -> 49         0.00307      11 -> 52        -0.00181      11 -> 53         0.00100      11 -> 57        -0.00167      11 -> 59         0.00152      11 -> 65         0.00177`

The data below corresponds to all the electron transitions involved in this excited state. I have to cut all the electron transitions because there are a lot of them for all excited states. If you have done excited states calculations before, you realize that the HOMO-LUMO transition is always an important one, but not the only one to be considered. Here is when we calculate the Natural Transition Orbitals (NTO), by these orbitals we can analyze the electron transitions.

For the example, I’ll show you first the HOMO-LUMO transition in the first excited state of adenine. It appears in the long list as follows:

35 -> 36         0.65024

The 0.65024 value corresponds to the transition amplitude, but it doesn’t mean anything for excited state analysis. We must calculate the NTOs of an excited state from a new Gaussian input file, requesting from the checkpoint file we used to calculate excited states. The file looks as follows:

`%Oldchk=adenine.chk%chk=adNTO1.chk%nproc=8%mem=1Gb#p SP geom=allcheck guess=(read,only) density=(Check,Transition=1) pop=(minimal,NTO,SaveNTO)`

I want to say some important things right here for this last file. See that no level of theory is needed, all the calculation data is requested from the checkpoint file “adenine.chk”, and saved into the new checkpoint file “adNTO1.chk”, we must use the previous calculated density and specify the transition of interest, it means the excited state we want to analyze. As we don’t need to specify charge, multiplicity or even the comment line, this file finishes really fast.

After doing this last calculation, we use the new checkpoint file “adNTO1.chk” and we format it:

`formchk -3 adNTO1.chk adNTO1.fchk`

If we open this formatted checkpoint file with GaussView, chemcraft or the visualizer you want, we will see something interesting by watching he MOs diagram, as follows:

We can realize that frontier orbitals shows the same value of 0.88135, which means the real transition contribution to the first excited state. As these orbitals are contributing the most, we can plot them by using the cubegen routine:

`cubegen 0 mo=homo adNTO1.fchk adHOMO.cub 0 h`

This last command line is for plotting the equivalent as the HOMO orbital. If we want to plot he LUMO, just change the “homo” keyword for “lumo”, it doesn’t matter if it is written with capital letters or not.

You must realize that the Natural Transition Orbitals are quite different from Molecular Orbitals. For visual comparisson, I’ve printed also the molecular orbitals, given from the optimization and from excited states calculations, without calculating NTOs:

These are the molecular frontier orbitals, plotted with Chimera with 0.02 as the isovalue for both phase spaces:

The frontier NTOs look qualitatively the same, but that’s not necessarily always the case:

If we analyze these NTOs on a hole-electron model, the HOMO refers to the hole space and the LUMO refers to the electron space.

Maybe both orbitals look the same, but both frontier orbitals are quite different between them, and these last orbitals are the ones implied on first excited state of adenine. The electron transition will be reported as follows:

If I can do a graphic summary for this topic, it will be the next one:

NTOs analysis is useful no matter if you calculate excited states by using CIS(D), EOM-CCS(D), TD-DFT, CASSCF, or any of the excited states method of your election. These NTOs are useful for population analysis in excited states, but these calculations require another software, MultiWFN is an open-source code that allows you to do this analysis, and another one is called TheoDORE, which we’ll cover in a later post.

## Non-canonical Base Pairs show Watson-Crick pairing in MD simulations

Elucidating the pairing of non-hydrogen bonded unnatural base pairs (UBPs) is still a controversial subject due to the lack of specificity in their mutual interactions. Experimentally, NMR is the method of choice but the DNA strand must be affixed on template of sorts such as a polymerase protein. Those discrepancies are well documented in a recent review which cites our previous computational work, both DFT and MD, on UBPs.

Since that last paper of ours on synthetic DNA, my good friend Dr. Rodrigo Galindo from Utah U. and I have had serious doubts on the real pairing fashion exhibited by Romesberg’s famous hydrophobic nucleotides d5SICS – dNaM. While the authors claim a stacked pairing (within the context of the strand in the KlenTaq polymerase enzime), our simulations showed a Watson-Crick-like pairing was favored in the native form. To further shed light on the matter we performed converged micro-seconds long simulations, varying the force field (two recent AMBER fields were explored: Bsc1 and OL15), the water model (TIP3P and OPC), and the ionic compensation scheme (Na+/Cl or Mg2+/Cl).

In the image below it can be observed how the pairing is consistently WC (dC1′-C1′ ~10.4 A) in the most populated clusters regardless of the force field.

Also, a flipping experiment was performed where both nucleotides were placed 180.0° outwards and the system was left to converge inwards to explore a ‘de novo’ pairing guided solely by their mutual interactions and the template formed by the rest of the strand. Distance population for C1′ – C1′ were 10.4 A for Bsc1 (regardless of ionic compensation) and 9.8 A for OL15 (10.4 A where Mg2+ was used as charge compensation).

This study is now published in the Journal of Biomolecular Structure & Dynamics doi.org/10.1080/07391102.2019.1671898.

Despite the successful rate of replication by a living organism -which is a fantastic feat!- of these two nucleotides, there is little chance they can be used for real coding applications (biological or otherwise) due to the lack of structural control of the double helix. The work of Romesberg is impressive, make no mistake about it, but my money isn’t on hydrophobic unnatural nucleotides for information applications 🙂

All credit and glory is due to the amazing Dr. Rodrigo Galindo-Murillo from the University of Utah were he works as a developer for the AMBER code among many other things. Go check his impressive record!

## XXVIII International Materials Research Congress

I just came back from beautiful Cancun where I attended for the third time the IMRC conference invited by my good friend and awesome collaborator Dr. Eddie López-Honorato, who once again pulled off the organization of a wonderful symposium on materials with environmental applications.

Dr. López-Honorato and I have been working for a number of years now on the design application of various kinds of materials that can eliminate arsenic species from drinking water supplies, an ever present problem in northern Mexico in South West US. So far we have successfully explored the idea of using calix[n]arenes hosts for various arsenic (V) oxides and their derivatives, but now his group has been thoroughly exploring the use of graphene and graphene oxide (GO) to perform the task.

Our joint work is a wonderful example of what theory and experiment and achieve when working hand-in-hand. During this invited talk I had the opportunity to speak about the modeling side of graphene oxide, in which we’ve been able to rationalize why polar solvents seem to be -counterintuitively- more efficient than non-polar solvents to exfoliate graphene sheets from graphite through attrition milling, as well as to understand the electronic mechanism by which UV light radiation degrades GO without significantly diminishing there arsenic-adsorbing properties. All these results are part of an upcoming paper so more details will come ahead.

Thanks to Dr. Eddie López for his invitation and the opportunity provided to meet old friends and make new ones within the wonderful world of scientific collaborations.

## Useful Thermochemistry from Gaussian Calculations

Statistical Mechanics is the bridge between microscopic calculations and thermodynamics of a particle ensemble. By means of calculating a partition function divided in electronic, rotational, translational and vibrational functions, one can calculate all thermodynamic functions required to fully characterize a chemical reaction. From these functions, the vibrational contribution, together with the electronic contribution, is the key element to getting thermodynamic functions.

Calculating the Free Energy change of any given reaction is a useful approach to asses their thermodynamic feasibility. A large negative change in Free Energy when going from reagents to products makes up for a quantitative spontaneous (and exothermic) reaction, nevertheless the rate of the reaction is a different story, one that can be calculated as well.

Using the freq option in your route section for a Gaussian calculation is mandatory to ascertain the current wave function corresponds to a minimum on a potential energy hypersurface, but also yields the thermochemistry and thermodynamic values for the current structure. However, thermochemistry calculations are not restricted to minima but it can also be applied to transition states, therefore yielding a full thermodynamic characterization of a reaction mechanism.

A regular freq calculation yields the following output (all values in atomic units):

```Zero-point correction=                           0.176113 (Hartree/Particle)
Thermal correction to Energy=                    0.193290
Thermal correction to Enthalpy=                  0.194235
Thermal correction to Gibbs Free Energy=         0.125894
Sum of electronic and zero-point Energies=           -750.901777
Sum of electronic and thermal Energies=              -750.884600
Sum of electronic and thermal Enthalpies=            -750.883656
Sum of electronic and thermal Free Energies=         -750.951996```

For any given reaction say A+B -> C one could take the values from the last row (lets call it G) for all three components of the reaction and perform the arithmetic: DG = GC – [GA + GB], so products minus reagents.

By default, Gaussian calculates these values (from the previously mentioned partition function) using normal conditions, T = 298.15 K and P = 1 atm. For an assessment of the thermochemistry at other conditions you can include in your route section the corresponding keywords Temperature=x.x and Pressure=x.x, in Kelvin and atmospheres, respectively.

(Huge) Disclaimer: Although calculating the thermochemistry of any reaction by means of DFT calculations is a good (and potentially very useful) guide to chemical reactivity, getting quantitative results require of high accuracy methods like G3 or G4 methods, collectively known as Gn mehtods, which are composed of pre-defined stepwise calculations. The sequence of these calculations is carried out automatically; no basis set should be specified. Other high accuracy methods like CBS-QB3 or W1U can also be considered whenever Gn methods are too costly.

## Atom specifications unexpectedly found in input stream.

“Well, where else were they supposed to appear?”

I was sent this error along with the previous question for a failed optimization. Apparently there is no answer in the internet (I quickly checked) so here it is:

Gaussian is confused about finding atomic coordinates because there is also a geom=check instruction placed in the route section, i.e., it was told to retrieve the atomic coordinates from a checkpoint and then it was given those atomic coordinates within the input so it doesn’t know what you mean and exits.

## 10 Years of Blogging!

This week marks the 10th anniversary of this little blog! It’s crazy to think a pet project that I took on during my last year as a postdoc is still going on after a decade of recording the work of our group in computational chemistry and it is also a happy coincidence that this year is the centennial anniversary of IUPAC and the sesquicentennial anniversary of the Periodic Table, for which 2019 has been designated as the International Year of the Periodic Table. I will release various posts celebrating this first decade of blogging and some regarding the IYPT2019 as soon as possible, also some major changes in layout and look are coming. It has been suggested to me that setting a patreon.com account could help me raise some funding for assisting underprivileged students but I’m not so sure yet.

By 2009 the chemistry blogosphere was already in full swing, so I got to it a bit late. (Is commenting ‘First!‘ still a thing?) At the time my job future seemed a bit uncertain, I had already spent two years as a postdoc in Romania and prior to that I worked for a private company in their research center here in Mexico so I started to ramble here so in upcoming job interviews I could point to a resource which gathered my thoughts and some achievements in a more informal fashion than a CV or a resume. (Plus, I like writing about things other than chemistry just for myself, maybe someday I’ll start a blog with some fiction writings I have here and there.) Quite frankly I didn’t think I could go back into academia so I was mainly looking for jobs in the R&D departments of various chemical companies, particularly in the field of coatings which was the one I already had some experience.

I never imagined this little blog would gain any attention, I think I had something like 2,000 views on the first year, now it’s up to 1,500 views a week! One of the first posts that gained popularity quite quickly dealt with the calculation of SCRF calculations and some parameters we were struggling to get right. Once I found the best parameters for running them, my boss, the late Prof. Dr. Ioan Silaghi-Dumitrescu at Babes-Bolyai University, asked me to email them to the group and post them physically in the lab so we wouldn’t loose them; I thought it would be a better idea to have them on the blog so we all could access them easily and at the same time share our findings with whomever had the same issues. Turned out that many people struggled with these parameters for SCRF calculations in Gaussian and from that moment on that became one of the underlying principles of the blog: “any problem we face in the lab is definitely faced by someone else, so lets share our solution”; the other principle of course was my blatant self promotion.

One of the most rewarding aspects of having kept this blog going on for so long is knowing that it is a modest resource that some people has found helpful. Attending conferences and having people telling me they like my posts and have found help in them is extremely gratifying. Also, academically it has allowed me to meet wonderful people with whom I’ve established very interesting collaborations in various countries like Iran, US, Slovakia, Czech Republic, Chile, Bulgaria and many more.

Very early I started getting direct questions to specific problems and to the best of my abilities I’ve tried to answer them although I not always have the time to do so and for that I apologize to all the readers who didn’t get an answer; up until now keeping this blog has been a spare-time endeavor which not always gets the priority it deserves withing my academic tasks.

Thank you for reading, commenting and sharing these posts during this past decade! I truly appreciate it and it has been very important to me; I sometimes feel the posts go into the void but every now and then I’m approached by readers who have found the blog helpful and that is very rewarding. Here’s to ten more years!

## Using PDB files for Electronic Structure Calculations

#### Quick Post on preparing Gaussian input files from PDB files.

If you’re modeling biological systems chances are that, more often than not, you start by retrieving a PDB file. The Protein Data Bank is a repository for all things biochemistry – from oligo-peptides to full DNA sequences with over 140,000 available files encoding the corresponding structure obtained by various experimental means ranging from X-Ray diffraction, NMR and more recently, Cryo Electron Microscopy (CEM).

The PDB file encodes the Cartesian coordinates for each atom present in the structure as well as their in the same way molecular dynamics codes -like AMBER or GROMACS- code the parameters for a force field; this makes the PDB a natural input file for MD.

There are however some considerations to have in mind for when you need to use these coordinates in electronic structure calculations. Personally I give it a pass with OpenBabel to add (or possibly just re-add) all Hydrogen atoms with the following instruction:

`\$>obabel -ipdb filename.pdb -ogjf -Ofilename.gjf -hAlternatively, you can select a pH value, say 7.5 with:\$>obabel -ipdb filename.pdb -ogjf -Ofilename.gjf -h -p7.5`

You may also use the GUI if by any chance you’re working in Windows:

This sends all H atoms to the end of the atoms list. Usually for us the next step is to optimize their positions with a partial optimization at a low level of theory for which you need to use the ReadOptimize ReadOpt or RdOpt in the route section and then add the atom list at the end of the input file:

`Atomic coordinates--blank line--noatoms atoms=H--blank line--`

Finally, visual inspection of your input structure is always helpful to find any meaningful errors, remember that PDB files come from experimental measurements which are not free of problems.

As usual thanks for reading, commenting, and sharing.

## Gustavo “Gus” Mondragón M.Sc. – Thesis Defense

We celebrate the successful thesis defense of Gustavo “Gus” Mondragón who has now completed his Masters degree and is now on to getting a PhD in our group. Gustavo has worked on the search for multiexcitonic states and their involvement in the excitonic transference between photosynthetic pigments, specifically between bacteriochlorophyll-d molecules (BChl-d) from the bchQRU chlorosome whose whole structure is shown in the gallery below. To this end, Gustavo has studied and implemented the Restricted Active Space method with double spin flip (RAS-2SF) with the use of QChem5.0, a method that has required the use and understanding of states with high multiplicities. Additionally, Gustavo has investigated the influence of the environment within the chlorosome by performing ONIOM calculations for the spectroscopic properties of a BChl-d dimer, finding albeit qualitatively a batochromic effect, probably an expected result but nonetheless an impressive feat for the level of theory selected.

There’s still a lot of work to do in this line of research and although we’re eager to publish our results in this excitonic transference mechanism we want to be completely sure that we’re taking every possibility into consideration so we don’t incur into any inconsistencies.

Gustavo cultivates many research interests from excited states of these pigments to biochemical processes that require the use of various tools; I’m sure his permanence in our lab will bring lots of interesting results. Congratulations, Gus! Thank you for your hard work.