Category Archives: Thermodynamics
The war against COVID-19 has been waged on many fronts. The computational chemistry community has done its share during this pandemic to put forward a cure, a vaccine, or a better understanding of the molecular mechanisms behind the human infection by the SARS-CoV-2 virus. A few vaccines are now rearing their heads and starting to make their way around the globe to stop the spread, amidst a climate of disinformation, distrust and political upheaval, all of which pose challenges yet to be faced aside from the technical and scientific ones.
This is by no means a comprehensive review of the literature; in fact, most of the literature cited herein was spotted on Twitter under the combined #CompChem and #COVID hashtags. Summarizing the research by the CompChem community on COVID-19 related topics in a single blog post would be next to impossible; I trust a book is being written on it as I type these lines.
The structural elucidation of the proteins associated with the SARS-CoV-2 virus is probably the first step required in designing chemical compounds capable of modifying their functions and altering the viral life cycle without altering the biochemistry of the hosts. The Coronavirus Structural Taskforce has elucidated the structure of 28 proteins of SARS-CoV-2, aside from the 300+ proteins from the previous SARS-CoV virus, using the tools from the FoldIt at home game, which is based on the Rosetta program, to heuristically predict the structure of these proteins. Structure-based drug design relies on knowledge of the structure of the active site (hence the name), but for newly discovered proteins for which homology modeling is not entirely feasible, a ligand-based approach named D3Similarity was developed early in the pandemic by the group of Prof. Zhijian Xu to identify the possible active sites. Mapping of the viral genome and proteome was also achieved early on, during the first days of lockdown in the American continent. The information was readily made available and usable for further studies, which prompts another challenge: the rapid dissemination, review and evaluation of information in order to make scientifically sound claims and data-based decisions. In this regard, the role of preprints cannot be stressed enough. Without rapid communication, scientific results cannot generate the critical mass needed to turn all these data into knowledge. As evidenced by the vast majority of the links present in this post, ChemRxiv from the ACS served the much needed function of gathering, linking and putting the data out there for scientific evaluation, in order to accelerate the discovery of solutions targeting the various steps of the virus’ reproductive cycle through various strategies.
The role of supercomputing has been paramount worldwide to the various efforts made in CompChem (read the C&EN piece) on various fronts, from structural elucidation, such as the AI-driven structural modelling of spike proteins and their infection mechanism led by Prof. Rommie Amaro (UCSD) and Dr. Arvind Ramanathan, which was recognized with the Gordon Bell Prize, to the development of vaccines. Many Molecular Dynamics simulations have been performed on potential inhibitors of proteins such as the spike protein. In some cases these simulations, coupled with cryo-EM microscopy, allowed for the elucidation of the hinging mechanism of these spike proteins and their thermodynamic properties, while all-atom simulations assessed the rigidity of the receptor as the cause of its infectivity. Still, owning these computing resources isn’t always cost effective; that’s why calculations have been outsourced to companies such as Amazon Web Services, as Pearlman did for the QM/DFT calculations of the binding energy of several drug candidates for the inhibition of the virus’ main protease (MPro). Many other CADD studies are available (here, here, and here). Researchers from all around the world can chip in and join the effort by reaching out to the COVID-19 High Performance Computing Consortium (HPC), which puts some of the most advanced computing systems in the hands of private and academic researchers with relevant projects aimed at the study of the virus. On the other side of the Atlantic, the Partnership for Advanced Computing in Europe (PRACE) also provides access to advanced computing services for research. In an effort to keep all the developing information curated and concentrated, the COVID-19 Molecular Structure and Therapeutics Hub was created to provide a community-driven data repository and curation service for molecular structures, models, therapeutics, and simulations related to computational research on therapeutic opportunities.
As described above, molecular dynamics simulations are central to assessing how drugs interact with proteins. But molecular dynamics can only do so much, since it is computationally intensive; hence the use of Polarizable Force Field (PFF) algorithms, combined with high-resolution sampling methods, to obtain results in the microsecond regime. These have also been applied to the modeling of the MPro protein: the phase space is sampled by different MD trajectories, which are then tested and selected. Aside from classical simulations, artificial intelligence predictions and docking calculations, quantum mechanical calculations have also been employed in the search for the most intimate interactions governing the mechanisms of protein inhibition. On this front, a Fragment Molecular Orbital based analysis was carried out to find which residues in MPro interacted the most with a given inhibitor.
Virtual screening is at the heart of the computationally aided drug discovery process, especially high-throughput virtual screening such as the one performed by the group of Andre Fischer at Basel, in which a pool of over 600 million compounds analyzed as potential protease inhibitors was narrowed down to 11 potential drugs. Repurposing of antiviral drugs, and other entry-inhibiting compounds, is also a major avenue explored in the search for treatments; in the linked study by Shailly Tomar et al., antiviral drugs which are also anti-inflammatory are believed to take care of the lung inflammation and injury associated with the infection while also disrupting the virus’ infection mechanism. The comeback of Virtual Reality can make virtual screening more cooperative, even during lockdown conditions, and more ‘tangible’, as the company Nanome has proven with their COVID-19 Town Hall meetings, which are aimed at modeling proteins in 3D space. Aside from the de novo and repurposing efforts, the search for peptides against infection by SARS-CoV-2 was an important topic (here and here). More recently, Skariyachan and Gopal turned to natural products of herbal origin for their virtual screening (molecular docking and dynamics). In their perspective, the chemical complexity achieved through biosynthesis can overcome the bottleneck of chemical discovery while at the same time drawing on the ancient practices of herbal remedies described in Ayurveda. Other researchers, like Manish Manish, have also turned to libraries of 500,000+ natural compounds to find potential drugs for MPro.
The year is coming to an end, but the pandemic is not, by any means. Now, with the advent of new strains and the widespread vaccination effort put in place, it is more important than ever to keep the fight strong in our labs but also in our personal habits and responsibilities: the same advice given at the beginning of the year is still in effect today and will continue to be for the months to come. I want to wish everyone who reads this a happy holiday season, but above all I want to pay a small tribute to the scientists working relentlessly in one of the largest coordinated scientific efforts in modern history, one that can only be compared to the Moon landing or the Manhattan Project; to those scientists and all the healthcare personnel, may you find rest soon, may your efforts never go unnoticed: Thank you for your service.
The Computational Chemistry Comparison and Benchmark DataBase (CCCBDB) from the National Institute of Standards and Technology (NIST) collects experimental and calculated thermochemistry-related values for 1968 common molecules, constituting a vast source of benchmarks for various kinds of calculations.
In particular, scaling factors for vibrational frequencies are very useful when calculating vibrational spectra. These scaling factors are arranged by levels of theory ranging from HF to MP2, DFT, and multireference methods. These scaling factors are obtained by least squares regression between experimental and calculated frequencies for a set of molecules at a given level of theory.
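The least squares fit described above can be sketched in a few lines, assuming a one-parameter fit through the origin, i.e. the factor c that minimizes the sum of (c·ω_i − ν_i)², where ω_i are calculated and ν_i experimental frequencies. The function name and the frequencies below are made-up placeholders for illustration, not values taken from the CCCBDB.

```python
def scaling_factor(calculated, experimental):
    """Least-squares scale factor c minimizing sum((c * w_i - v_i) ** 2)."""
    if len(calculated) != len(experimental) or len(calculated) == 0:
        raise ValueError("need two non-empty lists of equal length")
    # Closed-form solution of the one-parameter fit through the origin.
    numerator = sum(w * v for w, v in zip(calculated, experimental))
    denominator = sum(w * w for w in calculated)
    return numerator / denominator


# Made-up placeholder frequencies in cm^-1 (NOT CCCBDB data).
calculated_freqs = [1000.0, 2000.0, 3000.0]
experimental_freqs = [960.0, 1920.0, 2880.0]

factor = scaling_factor(calculated_freqs, experimental_freqs)
# Apply the factor to the calculated frequencies before comparing spectra.
scaled_freqs = [factor * w for w in calculated_freqs]
print(f"scaling factor = {factor:.4f}")
```

In practice the factor is fitted once per level of theory over a large set of molecules and then reused for new calculations at that same level.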
Aside from vibrational spectroscopy, a large number of structural and energetic properties can be found and estimated for small molecules. A quick formation enthalpy can be calculated from experimental data and then compared to the reported theoretical values at a large number of levels of theory. Moments of inertia, enthalpies, entropies, charges, frontier orbital gaps, and even some odd values or even calculations gone awry are pointed out for you to know if you’re dealing with a particularly problematic system. The CCCB Database includes tutorials and input/output files for performing these kinds of calculations around thermochemistry, making it also a valuable learning resource.
Every computational chemist should be aware of this site, particularly when collaborating with experimentalists or when carrying out calculations that try to replicate experimental data. The vastness of the site calls for a long dive to explore its possibilities and capabilities for more accurate calculations.
Statistical Mechanics is the bridge between microscopic calculations and the thermodynamics of a particle ensemble. By calculating a partition function factorized into electronic, rotational, translational and vibrational contributions, one can obtain all the thermodynamic functions required to fully characterize a chemical reaction. Of these contributions, the vibrational one, together with the electronic one, is the key element in getting thermodynamic functions.
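As a reminder of how that bridge works, these are the standard textbook relations, written here under the usual assumption of N non-interacting, indistinguishable molecules (ideal gas); the symbols q, Q and the subscripts are notation chosen for this sketch, not taken from any particular program's output.

```latex
\begin{align*}
q &= q_{\mathrm{elec}}\, q_{\mathrm{trans}}\, q_{\mathrm{rot}}\, q_{\mathrm{vib}},
  \qquad Q = \frac{q^{N}}{N!} \\
U &= k_B T^{2} \left( \frac{\partial \ln Q}{\partial T} \right)_{N,V} \\
S &= k_B \ln Q + k_B T \left( \frac{\partial \ln Q}{\partial T} \right)_{N,V} \\
H &= U + PV, \qquad G = H - TS
\end{align*}
```

Because q is a product, ln Q splits into a sum, so each kind of motion contributes its own additive term to U, S and G.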
Calculating the Free Energy change of any given reaction is a useful approach to assess its thermodynamic feasibility. A large negative change in Free Energy when going from reagents to products corresponds to a quantitative, spontaneous (exergonic) reaction; nevertheless, the rate of the reaction is a different story, one that can be calculated as well.
Using the freq option in your route section for a Gaussian calculation is mandatory to ascertain that the current structure corresponds to a minimum on the potential energy hypersurface, but it also yields the thermochemistry and thermodynamic values for that structure. However, thermochemistry calculations are not restricted to minima; they can also be applied to transition states, therefore yielding a full thermodynamic characterization of a reaction mechanism.
A regular freq calculation yields the following output (all values in atomic units):
Zero-point correction= 0.176113 (Hartree/Particle)
Thermal correction to Energy= 0.193290
Thermal correction to Enthalpy= 0.194235
Thermal correction to Gibbs Free Energy= 0.125894
Sum of electronic and zero-point Energies= -750.901777
Sum of electronic and thermal Energies= -750.884600
Sum of electronic and thermal Enthalpies= -750.883656
Sum of electronic and thermal Free Energies= -750.951996
For any given reaction, say A + B -> C, one could take the last value, the sum of electronic and thermal Free Energies (let's call it G), for all three components of the reaction and perform the arithmetic ΔG = G(C) – [G(A) + G(B)], that is, products minus reagents.
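The products-minus-reagents arithmetic can be sketched as follows. The helper names and the regular expression are illustrative assumptions, not part of Gaussian; the value for C is the one from the output above, whereas the values for A and B are hypothetical placeholders chosen only so that the example runs.

```python
import re

HARTREE_TO_KCAL_PER_MOL = 627.5095  # conversion factor, 1 Hartree in kcal/mol

# Matches the last line of the thermochemistry summary shown above.
FREE_ENERGY_PATTERN = re.compile(
    r"Sum of electronic and thermal Free Energies=\s*(-?\d+\.\d+)"
)


def read_free_energy(log_text):
    """Return the last reported free energy (Hartree) found in the text."""
    matches = FREE_ENERGY_PATTERN.findall(log_text)
    if not matches:
        raise ValueError("no free energy line found; was freq requested?")
    return float(matches[-1])


def reaction_free_energy(products, reagents):
    """Delta G in kcal/mol: sum over products minus sum over reagents."""
    return (sum(products) - sum(reagents)) * HARTREE_TO_KCAL_PER_MOL


g_c = read_free_energy(
    " Sum of electronic and thermal Free Energies=         -750.951996"
)
g_a = -300.500000  # hypothetical value for reagent A (Hartree)
g_b = -450.440000  # hypothetical value for reagent B (Hartree)

delta_g = reaction_free_energy([g_c], [g_a, g_b])
print(f"Delta G = {delta_g:.2f} kcal/mol")
```

All species must of course be computed at the same level of theory and under the same temperature and pressure for the difference to be meaningful.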
By default, Gaussian calculates these values (from the previously mentioned partition function) using normal conditions, T = 298.15 K and P = 1 atm. For an assessment of the thermochemistry at other conditions you can include in your route section the corresponding keywords Temperature=x.x and Pressure=x.x, in Kelvin and atmospheres, respectively.
(Huge) Disclaimer: Although calculating the thermochemistry of any reaction by means of DFT calculations is a good (and potentially very useful) guide to chemical reactivity, getting quantitative results requires high accuracy methods like G3 or G4, collectively known as Gn methods, which are composed of pre-defined stepwise calculations. The sequence of these calculations is carried out automatically; no basis set should be specified. Other high accuracy methods like CBS-QB3 or W1U can also be considered whenever Gn methods are too costly.
Calculating the pKa value for a Brønsted acid is very hard, like really hard. A full thermodynamic cycle (fig. 1) needs to be calculated along with the high-accuracy solvation free energy for each of the species under consideration, not to mention the use of expensive methods, which will be reviewed here in another post in two weeks' time.
Finding descriptors that help us circumvent the need for such sophisticated calculations can help a great deal in estimating the pKa value of any given acid. We’ve been interested in the reactivity of σ-hole bearing groups in the past and, just like Halogen, Tetrel, Pnicogen and Chalcogen bonds, Hydrogen bonds are highly directional and their strength depends on the polarization of the O-H bond. Therefore, we suggested the use of the maximum surface electrostatic potential (VS,max) on the acidic hydrogen atom of carboxylic acids as a descriptor for the strength of their interaction with water, the first step in the deprotonation process.
We selected six basis sets and six methods (five density functionals plus MP2), for a total of thirty-six levels of theory, to optimize and calculate VS,max on thirty carboxylic acids for a grand total of 1,080 wavefunctions, which were later passed on to MultiWFN (all calculations were carried out with PCM = water). Comparison with the experimental pKa values showed a strong correlation across the levels of theory (R² > 0.9), except for B3LYP. Still, the best correlations were obtained with LC-wPBE/cc-pVDZ and wB97XD/cc-pVDZ. From this latter level of theory the linear correlation yielded the following equation:
pKa = -0.2185(VS,max) + 16.1879
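Applying the fitted line is a one-liner; the sketch below only wraps the equation above in a function. It assumes VS,max is supplied in the same units and at the same level of theory used for the fit (the units are not stated here), and the input value is a hypothetical placeholder, not a computed potential for any real acid.

```python
SLOPE = -0.2185      # coefficient of VS,max in the fitted equation
INTERCEPT = 16.1879  # intercept of the fitted equation


def predict_pka(vs_max):
    """Estimate the pKa of a carboxylic acid from its VS,max descriptor."""
    return SLOPE * vs_max + INTERCEPT


example_vs_max = 55.0  # hypothetical descriptor value, for illustration only
print(f"predicted pKa = {predict_pka(example_vs_max):.2f}")
```

Since the slope is negative, a more positive potential on the acidic hydrogen translates into a lower predicted pKa, i.e. a stronger acid.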
Differences between predicted and experimental pKa values turned out to be less than 0.5 units, which is remarkable for such a straightforward method; bear in mind that for full thermodynamic cycles an error of only about 1.4 kcal/mol in the free energy, barely above chemical accuracy (1.0 kcal/mol), already shifts the pKa by a full unit.
We then put this equation to the test with 10 different carboxylic acids, and the prediction had a correlation of 98% (fig. 2).
I think this method can really catch on as a quick way to predict the pKa values of any carboxylic acid imaginable. We’re now working on extending the model to other groups (e.g. Brønsted bases) and putting together a black-box workflow so as to make it even more accessible and straightforward to use.
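How sensitive a pKa is to free-energy errors follows from the standard relation pKa = ΔG / (RT ln 10). The short sketch below, with an illustrative function name and T = 298.15 K assumed as the default, converts an error in ΔG into the corresponding pKa shift.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_shift(delta_g_error_kcal, temperature=298.15):
    """pKa shift caused by a free-energy error given in kcal/mol."""
    return delta_g_error_kcal / (R_KCAL * temperature * math.log(10))


# Free energy equivalent to one pKa unit at 298.15 K (about 1.36 kcal/mol).
kcal_per_pka_unit = R_KCAL * 298.15 * math.log(10)
print(f"1 pKa unit = {kcal_per_pka_unit:.2f} kcal/mol")
print(f"1.0 kcal/mol error = {pka_shift(1.0):.2f} pKa units")
```

This is why errors that look small for reaction energies become large on the pKa scale.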
We’ve recently published this work in the journal Molecules, an open access publication. Thanks to Prof. Steve Scheiner for inviting us to participate in the special issue devoted to tetrel bonding. Thanks to Guillermo Caballero for the inception of this project, to Dr. Jacinto Sandoval for taking time from his research in photosynthesis to work on this pet project of ours, and of course to the rest of the students (Gustavo Mondragón, Marco Diaz, Raúl Torres), whose hard work made this paper possible.
Another scientific concept that is hard for laypeople to grasp, and that in my opinion has been the center of much distortion in the chemistry classroom, is the thermodynamic function Entropy, S.
More often than not, S is said to be a measure of “disorder” and people just take it! If one were to define disorder, then one would also have to define order: Is my apartment too entropic? What about my life? Does nature understand order in the same way we do? How do we understand order inside a living cell, where many molecules and organelles are floating around? If S were indeed a measure of disorder, why would it be important to measure it?
Entropy in a nutshell. There have been many attempts to define S in a way young students may understand, yet drawing parallels with ordinary everyday concepts is hard and often leads to misconceptions. A student of mine once asked: “if entropy is always increasing, how come bodies tend to cool down?” He meant to ask how come the translational motions of a molecular ensemble tend to decrease (thereby achieving “order”).
Prof. Mayo Martínez-Kahn at UNAM in Mexico wrote a very interesting paper about Entropy in the local journal of the Chemistry School, “Educación Química”. The paper was entitled “The tombs of Entropy”, a reference to the widely known fact that Boltzmann’s famous equation relating Entropy to the number of microstates W, S = k log W, is engraved on his tomb. Prof. Martínez then ventures to imagine what the tombs of other people who have contributed to the concept and notion of S would look like. I distinctly remember Sadi Carnot’s, on which his famous thermodynamic cycle was displayed.
Entropy, in so many words, is a function that describes how many different energy levels are available in a thermodynamic system. The more accessible levels, the higher the entropy. It also describes how spontaneous a process is, since in nature spontaneous changes are those that increase the total entropy of a system together with its surroundings.
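The link between available levels and entropy can be illustrated with a toy model: Boltzmann populations over a handful of energy levels and the Gibbs formula S = −R Σ p ln p. This is only a sketch; the level energies and function names are invented for illustration and do not correspond to any real molecule.

```python
import math

R = 8.314462618      # gas constant in J/(mol K)
K_B_CM = 0.6950348   # Boltzmann constant in cm^-1 per kelvin


def populations(levels_cm, temperature):
    """Boltzmann populations of energy levels given in cm^-1."""
    weights = [math.exp(-e / (K_B_CM * temperature)) for e in levels_cm]
    q = sum(weights)  # partition function of this toy system
    return [w / q for w in weights]


def molar_entropy(levels_cm, temperature):
    """Gibbs entropy, -R * sum(p ln p), in J/(mol K)."""
    return -R * sum(
        p * math.log(p) for p in populations(levels_cm, temperature) if p > 0.0
    )


# Invented ladders of levels spaced by 200 cm^-1 (illustration only).
two_levels = [0.0, 200.0]
six_levels = [200.0 * n for n in range(6)]

s_two = molar_entropy(two_levels, 298.15)
s_six = molar_entropy(six_levels, 298.15)
print(f"S(2 levels) = {s_two:.2f} J/(mol K)")
print(f"S(6 levels) = {s_six:.2f} J/(mol K)")
```

Adding thermally accessible levels raises S, and for n equally populated levels the formula reduces to R ln n, the molar form of Boltzmann's S = k log W.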
How come Gibbs’ or Helmholtz’s free energies don’t cause such confusion? My guess is that nobody has attached an everyday word to them!
PS. It is still important to make scientific concepts permeate into the general audience. Recently deceased comedian George Carlin mentioned Entropy in the following video…