# Blog Archives

## DFT beyond academia

Density Functional Theory is by far the most successful way of gaining access to molecular properties starting from their composition. Calculating the electronic structure of molecules or solid phases has become a widespread activity in computational as well as in experimental labs not only for shedding light on the properties of a system under study but also as a tool to design those systems with taylor-made properties. This level of understanding of matter brought by DFT is based in a rigorous physical and mathematical development, still–and maybe because of it–DFT (and electronic structure calculations in general for that matter) might be thought of as something of little use outside academia.

Prof. Juan Carlos Sancho-García from the University of Alicante in Spain, encouraged me to talk to his students last month about the reaches of DFT in the industrial world. Having once worked in the IP myself I remembered the simulations performed there were mostly DPD (Dissipative Particle Dynamics), a coarse grained kind of molecular dynamics, for investigating the interactions between polymers and surfaces, but no DFT calculations were ever on sight. It is widely known that Docking, QSAR, and Molecular Dynamics are widely used in the pharma industry for the development of new drugs but I wasn’t sure where DFT could fit in all this. I thought patent search would be a good descriptor for the commercial applicability of DFT. So I took a shallow dive and searched for patents explicitly mentioning the use of DFT as part of the invention development process and protection. The first thing I noticed is that although they appear to be only a few, these are growing in numbers throughout the years (Figure 1). Again, this was not an exhaustive search so I’m obviously overlooking many.

The second thing that caught my attention was that the first hit came from 1998, nicely coinciding with the rise of B3LYP (Figure 2). This patent was awarded to Australian inventors from the University of Wollongong, South New Wales to determine trace gas concentrations by chromatography by means of calculating the FT-IR spectra of sample molecules (Figure 3), so DFT is used as part of the invention but I ignore if this is a widespread method in analytical labs.

While I’m mentioning the infamous B3LYP functional, a search about it in patents yields the following graph (Figure 4), most of which relate to the protection of photoluminescent or thermoluminescent molecules for light emitting devices; it appears that DFT calculations are used to provide the key features of their protection, such as HOMO-LUMO gap etc.

**Figure 4**– Patents bearing B3LYP as part of their invention

So what about software? Most of the more recent patents in Figure 1 (2018 – 2022) lie in the realm of electronics, particularly the development of semiconductors, ceramical or otherwise, so it was safe to assume VASP could be a popular choice to that end, right? turns out that’s not necessarily the case since a patent search for VASP only accounts for about the 10% of all awarded patents (Figure 5).

I guess it’s safe to say by now that DFT has a significant impact in the industrial development, one could only expect it to keep on rising, however the advent of machine learning techniques and other artificial intelligence related methods promise an accelerated development. I went again to the patents database and this time searched for ‘*machine learning *development *materials*‘ (the term ‘development’ was deleted by the search engine, guess found it too obvious) and its rise is quite notorious, surpassing the frequency of DFT in patents (Figure 6), particularly in the past 5 years (2018 – 2022).

**Figure 6**– The rise of the machines in materials development

I’m guessing in some instances DFT and ML will tend to go hand in hand in the industrial development process, but the timescales reachable by ML will only tend to grow, so I’m left with the question of what are we waiting for to make ML and AI part of the chemistry curricula? As computational chemistry teachers we should start talking about this points with our students and convince the head of departments to help us create proper courses or we risk our graduates to become niche scientists in a time when new skills are sought after in the IP.

__________________________________________________________________________________

Thanks again to Prof. Juan Carlos Sancho García at the University of Alicante, Spain, who asked me talk about the subject in front of his class, and to Prof. José Pedro Cerón-Carrasco from Cartagena for allowing me to talk about this and other topics at Centro Universitario de la Defensa. Thank you, guys! I look forward to meeting you again soon.

## Geometry Optimizations for Excited States

Electronic excitations are calculated vertically according to the Frank—Condon principle, this means that the geometry does not change upon the excitation and we merely calculate the energy required to reach the next electronic state. But for some instances, say calculating not only the absorption spectra but also the emission, it is important to know what the geometry minimum of this final state looks like, or if it even exists at all (Figure 1). Optimizing the geometry of a given excited state requires the prior calculation of the vertical excitations whether via a multireference method, quantum Monte Carlo, or the Time Dependent Density Functional Theory, TD-DFT, which due to its lower computational cost is the most widespread method.

Most single-reference treatments, ab initio or density based, yield good agreement with experiments for lower states, but not so much for the higher excitations or process that involve the excitation of two electrons. Of course, an appropriate selection of the method ensures the accuracy of the obtained results, and the more states are considered, the better their description although it becomes more computationally demanding in turn.

In Gaussian 09 and 16, the argument to the ROOT keyword selects a given excited state to be optimized. In the following example, five excited states are calculated and the optimization is requested upon the second excited state. If no ROOT is specified, then the optimization would be carried out by default on the first excited state (Where L.O.T. stands for Level of Theory).

#p opt TD=(nstates=5,root=2)L.O.T.

Gaussian16 includes now the calculation of analytic second derivatives which allows for the calculation of vibrational frequencies for IR and Raman spectra, as well as transition state optimization and IRC calculations in excited states opening thus an entire avenue for the computation of photochemistry.

If you already computed the excited states and just want to optimize one of them from a previous calculation, you can read the previous results with the following input :

#p opt TD=(Read,Root=N)L.O.T.Density=Current Guess=Read Geom=AllCheck

Common problems. The following error message is commonly observed in excited state calculations whether in TD-DFT, CIS or other methods:

No map to state XX, you need to solve for more vectors in order to follow this state.

This message usually means you need to increase the number of excited states to be calculated for a proper description of the one you’re interested in. Increase the number N for nstates=N in the route section at higher computational cost. A rule of thumb is to request at least 2 more states than the state of interest. This message can also reflect the fact that during the optimization the energy ordering changes between states, and can also mean that the ground state wave function is unstable, i.e., the energy of the excited state becomes lower than that of the ground state, in this case a single determinant approach is unviable and CAS should be used if the size of the molecule allows it. Excited state optimizations are tricky this way, in some cases the optimization may cross from one PES to another making it hard to know if the resulting geometry corresponds to the state of interest or another. Gaussian recommends changing the step size of the optimization from the default 0.3 Bohr radius to 0.1, but obviously this will make the calculation take longer.

Opt=(MaxStep=10)

If the minimum on the excited state potential energy surface (PES) doesn’t exist, then the excited state is not bound; take for example the first excited state of the H_{2} molecule which doesn’t show a minimum, and therefore the optimized geometry would correspond to both H atoms moving away from each other indefinitely (Figure 2). Nevertheless, a failed optimization doesn’t necessarily means the minimum does not exist and further analysis is required, for instance, checking the gradient is converging to zero while the forces do not.

## Density Keyword in Excited State Calculations with Gaussian

I have written about extracting information from excited state calculations but an important consideration when analyzing the results is the proper use of the keyword *density*.

This keyword let’s Gaussian know which density is to be used in calculating some results. An important property to be calculated when dealing with excited states is the change in dipole moment between the ground state and any given state. The Transition Dipole Moment is an important quantity that allows us to predict whether any given electronic transition will be allowed or not. A change in the dipole moment (i.e. non-zero) of a molecule during an electronic transition helps us characterize said transition.

Say you perform a TD-DFT calculation without the *density* keyword, the default will provide results on the lowest excited state from all the requested states, which may or may not be the state of interest to the transition of interest; you may be interested in the dipole moment of all your excited states.

Three separate calculations would be required to calculate the change of dipole moment upon an electronic transition:

1) A regular DFT for the ground state as a reference

2) TD-DFT, to calculate the electronic transitions; request as many states as you need/want, analyze it and from there you can see which transition is the most important.

3) Request the density of the Nth state of interest to be recovered from the checkpoint file with the following route section:

# TD(Read,Root=N)LOTDensity=Current Guess=Read Geom=AllCheck

replace *N* for the *N*th state which caught your eye in step number 2) and *LOT* for the *Level of Theory* you’ve been using in the previous steps. That should give you the dipole moment for the structure of the *N*th excited state and you can compare it with the one in the ground state calculated in 1). Again, if density=current is not used, only properties of *N*=1 will be printed.

## NIST CCCBDB – Vibrational Scaling Factors & ThermoChem Data

The Computational Chemistry Comparison and Benchmark DataBase (CCCBDB) from the National Institute of Standards and Technology (NIST) collects experimental and calculated thermochemistry—related values for 1968 common molecules, constituting a vast source of benchmarks for various kinds of calculations.

In particular, scaling factors for vibrational frequencies are very useful when calculating vibrational spectra. These scaling factors are arranged by levels of theory ranging from HF to MP2, DFT, and multireference methods. These scaling factors are obtained by least squares regression between experimental and calculated frequencies for a set of molecules at a given level of theory.

Aside from vibrational spectroscopy, a large number of structural and energetic properties can be found and estimated for small molecules. A quick formation enthalpy can be calculated from experimental data and then compared to the reported theoretical values at a large number of levels of theory. Moments of inertia, enthalpies, entropies, charges, frontier orbital gaps, and even some odd values or even calculations gone awry are pointed out for you to know if you’re dealing with a particularly problematic system. The CCCB Database includes tutorials and input/output files for performing these kinds of calculations around thermochemistry, making it also a valuable learning resource.

Every computational chemist should be aware of this site, particularly when collaborating with experimentalists or when carrying calculations trying to replicate experimental data. The vastness of the site calls for a long dive to explore their possibilities and capabilities for more accurate calculations.

## XXVIII International Materials Research Congress

I just came back from beautiful Cancun where I attended for the third time the IMRC conference invited by my good friend and awesome collaborator Dr. Eddie López-Honorato, who once again pulled off the organization of a wonderful symposium on materials with environmental applications.

Dr. López-Honorato and I have been working for a number of years now on the design application of various kinds of materials that can eliminate arsenic species from drinking water supplies, an ever present problem in northern Mexico in South West US. So far we have successfully explored the idea of using calix[*n*]arenes hosts for various arsenic (V) oxides and their derivatives, but now his group has been thoroughly exploring the use of graphene and graphene oxide (GO) to perform the task.

Our joint work is a wonderful example of what theory and experiment and achieve when working hand-in-hand. During this invited talk I had the opportunity to speak about the modeling side of graphene oxide, in which we’ve been able to rationalize why polar solvents seem to be -counterintuitively- more efficient than non-polar solvents to exfoliate graphene sheets from graphite through attrition milling, as well as to understand the electronic mechanism by which UV light radiation degrades GO without significantly diminishing there arsenic-adsorbing properties. All these results are part of an upcoming paper so more details will come ahead.

Thanks to Dr. Eddie López for his invitation and the opportunity provided to meet old friends and make new ones within the wonderful world of scientific collaborations.

## Post Calculation Addition of Empirical Dispersion – Fixing interaction energies

Calculation of interaction energies is one of those things people are more concerned with and is also something mostly done wrong. The so called ‘*gold standard*‘ according to Pavel Hobza for calculating supramolecular interaction energies is the CCSD(T)/CBS level of theory, which is highly impractical for most cases beyond 50 or so light atoms. Basis set extrapolation methods and inclusion of electronic correlation with MP2 methods yield excellent results but they are not nonetheless almost as time consuming as CC. DFT methods in general are terrible and still are the most widely used tools for electronic structure calculations due to their competitive computing times and the wide availability of schemes for including terms which help describe various kinds of interactions. The most important ingredients needed to get a decent to good interaction energies values calculated with DFT methods are correlation and dispersion. The first part can be recreated by a good correlation functional and the use of empirical dispersion takes care of the latter shortcoming, dramatically improving the results for interaction energies even for lousy functionals such as the infamous B3LYP. The results still wont be of benchmark quality but still the deviations from the *gold standard* will be shortened significantly, thus becoming more quantitatively reliable.

There is an online tool for calculating and adding the empirical dispersion from Grimme’s group to a calculation which originally lacked it. In the link below you can upload your calculation, select the basis set and functionals employed originally in it, the desired damping model and you get in return the corrected energy through a geometrical-Counterpoise correction and Grimme’s empirical dispersion function, D3, of which I have previously written here.

The gCP-D3 Webservice is located at: http://wwwtc.thch.uni-bonn.de/

The platform is entirely straightforward to use and it works with xyz, turbomole, orca and gaussian output files. The concept is very simple, a both gCP and D3 contributions are computed in the selected basis set and added to the uncorrected DFT (or HF) energy (eq. 1)

(**1**)

If you’re trying to calculate interaction energies, remember to perform these corrections for every component in your supramolecular assembly (eq. 2)

(**2**)

Here’s a screen capture of the outcome after uploading a G09 log file for the simplest of options B3LYP/6-31G(*d*), a decomposed energy is shown at the left while a 3D interactive Jmol rendering of your molecule is shown at the right. Also, various links to the literature explaining the details of these calculations are available in the top menu.

I’m currently writing a book chapter on methods for calculating ineraction energies so expect many more posts like this. A special mention to Dr. Jacinto Sandoval, who is working with us as a postdoc researcher, for bringing this platform to my attention, I was apparently living under a rock.

## DFT Textbook in Spanish by Dr. José Cerón-Carrasco

Today’s science is published mostly in English, which means that non-English speakers must first tackle the language barrier before sharing their scientific ideas and results with the community; this blog is a proof that non-native-English speakers such as myself cannot outreach a large audience in another language.

For young scientists learning English is a must nowadays but it shouldn’t shy students away from learning science in their own native tongues. To that end, the noble effort by Dr. José Cerón-Carrasco from Universidad Católica San Antonio de Murcia, in Spain, of writing a DFT textbook in Spanish constitutes a remarkable resource for Spanish-speaking computational chemistry students because it is not only a clear and concise introduction to ab initio and DFT methods but because it was also self published and written directly in Spanish. His book “*Introducción a los métodos DFT: Descifrando B3LYP sin morir en el intento*” is now available in Amazon. Dr. Cerón-Carrasco was very kind to invite me to write a prologue for his book, I’m very thankful to him for this opportunity.

Así que para los estudiantes hispanoparlantes hay ahora un muy valioso recurso para aprender DFT sin morir en el intento gracias al esfuerzo y la mente del Dr. José Pedro Cerón Carrasco a quien le agradezco haberme compartido la primicia de su libro

¡Salud y olé!

## XVI Mexican Meeting on Phys.Chem.

A yearly tradition of this Comp.Chem. lab and many others throughout our nation is to attend the Mexican Meeting on Theoretical Physical Chemistry to share news, progress and also a few drinks and laughs. This year the RMFQT was held in Puebla and although unfortunately I was not able to attend this lab was proudly represented by its current members. Gustavo Mondragón gave a talk about his progress on his photosynthesis research linking to the previous work of María Eugenia Sandoval already presented in previous editions; kudos to Gustavo for performing remarkably and thanks to all those who gave us their valuable feedback and criticism. Also, five posters were presented successfully, I can only thank the entire team for representing our laboratory in such an admirable way, and a special mention to the junior members, I hope this was the first of many scientific events they attend and may you deeply enjoy each one of them.

Among the invited speakers, the RMFQT had the honor to welcome Prof. John Perdew (yes, the **P** in PBE); the team took the opportunity of getting a lovely picture with him.

Here is the official presentation of the newest members of our group:

**Alejandra Barrera** (hyperpolarizabilty calculations on hypothetical poly-calyx[n]arenes for the search of NLO materials)

**Fernando Uribe** (Interaction energy calculations for non-canonical nucleotides)

**Juan Guzmán** (Reaction mechanisms calculations for catalyzed organic reactions)

We thank the organizing committee for giving us the opportunity to actively participate in this edition of the RMFQT, we eagerly await for next year as every year.

## Dealing with Spin Contamination

Most organic chemistry deals with closed shell calculations, but every once in a while you want to calculate carbenes, free radicals or radical transition states coming from a homolytic bond break, which means your structure is now open shell.

Closed shell systems are characterized by having doubly occupied molecular orbitals, that is to say the calculation is ‘restricted’: Two electrons with opposite spin occupy the same orbital. In open shell systems, unrestricted calculations have a complete set of orbitals for the electrons with alpha spin and another set for those with beta spin. Spin contamination arises from the fact that wavefunctions obtained from unrestricted calculations are no longer eigenfunctions of the total spin operator <*S*^2>. In other words, one obtains an artificial mixture of spin states; up until now we’re dealing only with single reference methods. With each step of the SCF procedure the value of <*S*^2> is calculated and compared to *s*(*s*+1) where *s* is half the number of unpaired electrons (0.75 for a radical and 2.0 for triplets, and so on); if a large deviation between these two numbers is found, the then calculation stops.

Gaussian includes an annihilation step during SCF to reduce the amount of spin contamination but it’s not 100% reliable. Spin contaminated wavefunctions aren’t reliable and lead to errors in geometries, energies and population analyses.

One solution to overcome spin contamination is using Restricted Open Shell calculations (ROHF, ROMP2, etc.) for which singly occupied orbitals is used for the unpaired electrons and doubly occupied ones for the rest. These calculations are far more expensive than the unrestricted ones and energies for the unpaired electrons (the interesting ones) are unreliable, specially spin polarization is lost since dynamical correlation is hardly accounted for. The IOP(5/14=2) in Gaussian uses the annihilated wavefunction for the population analysis if acceptable but since Mulliken’s method is not reliable either I don’t advice it anyway.

The case of DFT is different since rho.alpha and rho.beta can be separated (similarly to the case of unrestricted ab initio calculations), but the fact that both densities are built of Kohn-Sham orbitals and not true canonical orbitals, compensates the contamination somehow. That is not to say that it never shows up in DFT calculations but it is usually less severe, of course for the case of hybrid functional the more HF exchange is included the more important spin contamination may become.

So, in short, for spin contaminated wavefunctions you want to change from restricted to unrestricted and if that doesn’t work then move to Restricted Open Shell; if using DFT you can use the same scheme and also try changing from hybrid to pure orbitals at the cost of CPU time. There is a last option which is using spin projection methods but I’ll discuss that in a following post.