Elucidating the pairing of non-hydrogen-bonded unnatural base pairs (UBPs) is still a controversial subject due to the lack of specificity in their mutual interactions. Experimentally, NMR is the method of choice, but the DNA strand must be affixed to a template of sorts, such as a polymerase protein. Those discrepancies are well documented in a recent review which cites our previous computational work, both DFT and MD, on UBPs.
Since that last paper of ours on synthetic DNA, my good friend Dr. Rodrigo Galindo from Utah U. and I have had serious doubts about the real pairing fashion exhibited by Romesberg’s famous hydrophobic nucleotides d5SICS – dNaM. While the authors claim a stacked pairing (within the context of the strand in the KlenTaq polymerase enzyme), our simulations showed that a Watson-Crick-like pairing was favored in the native form. To shed further light on the matter we performed converged, microsecond-long simulations, varying the force field (two recent AMBER force fields were explored: Bsc1 and OL15), the water model (TIP3P and OPC), and the ionic compensation scheme (Na+/Cl– or Mg2+/Cl–).
In the image below it can be observed how the pairing is consistently WC (dC1′-C1′ ~10.4 Å) in the most populated clusters regardless of the force field.
Also, a flipping experiment was performed where both nucleotides were placed 180.0° outwards and the system was left to converge inwards to explore a ‘de novo’ pairing guided solely by their mutual interactions and the template formed by the rest of the strand. The most populated C1′ – C1′ distances were 10.4 Å for Bsc1 (regardless of ionic compensation) and 9.8 Å for OL15 (10.4 Å where Mg2+ was used as charge compensation).
Despite the successful rate of replication by a living organism -which is a fantastic feat!- of these two nucleotides, there is little chance they can be used for real coding applications (biological or otherwise) due to the lack of structural control of the double helix. The work of Romesberg is impressive, make no mistake about it, but my money isn’t on hydrophobic unnatural nucleotides for information applications 🙂
All credit and glory are due to the amazing Dr. Rodrigo Galindo-Murillo from the University of Utah, where he works as a developer for the AMBER code among many other things. Go check his impressive record!
I just came back from beautiful Cancun where I attended for the third time the IMRC conference invited by my good friend and awesome collaborator Dr. Eddie López-Honorato, who once again pulled off the organization of a wonderful symposium on materials with environmental applications.
Dr. López-Honorato and I have been working for a number of years now on the design and application of various kinds of materials that can eliminate arsenic species from drinking water supplies, an ever-present problem in northern Mexico and the southwestern US. So far we have successfully explored the idea of using calix[n]arene hosts for various arsenic (V) oxides and their derivatives, but now his group has been thoroughly exploring the use of graphene and graphene oxide (GO) to perform the task.
Our joint work is a wonderful example of what theory and experiment can achieve when working hand-in-hand. During this invited talk I had the opportunity to speak about the modeling side of graphene oxide, in which we’ve been able to rationalize why polar solvents seem to be -counterintuitively- more efficient than non-polar solvents at exfoliating graphene sheets from graphite through attrition milling, as well as to understand the electronic mechanism by which UV light radiation degrades GO without significantly diminishing its arsenic-adsorbing properties. All these results are part of an upcoming paper, so more details will follow.
Thanks to Dr. Eddie López for his invitation and the opportunity provided to meet old friends and make new ones within the wonderful world of scientific collaborations.
Statistical Mechanics is the bridge between microscopic calculations and thermodynamics of a particle ensemble. By means of calculating a partition function divided in electronic, rotational, translational and vibrational functions, one can calculate all thermodynamic functions required to fully characterize a chemical reaction. From these functions, the vibrational contribution, together with the electronic contribution, is the key element to getting thermodynamic functions.
Calculating the Free Energy change of any given reaction is a useful approach to assess its thermodynamic feasibility. A large negative change in Free Energy when going from reagents to products makes for a quantitative, spontaneous (exergonic) reaction; nevertheless, the rate of the reaction is a different story, one that can be calculated as well.
Using the freq option in your route section for a Gaussian calculation is mandatory to ascertain that the current wave function corresponds to a minimum on a potential energy hypersurface, but it also yields the thermochemistry and thermodynamic values for the current structure. However, thermochemistry calculations are not restricted to minima; they can also be applied to transition states, therefore yielding a full thermodynamic characterization of a reaction mechanism.
A regular freq calculation yields the following output (all values in atomic units):
Zero-point correction= 0.176113 (Hartree/Particle)
Thermal correction to Energy= 0.193290
Thermal correction to Enthalpy= 0.194235
Thermal correction to Gibbs Free Energy= 0.125894
Sum of electronic and zero-point Energies= -750.901777
Sum of electronic and thermal Energies= -750.884600
Sum of electronic and thermal Enthalpies= -750.883656
Sum of electronic and thermal Free Energies= -750.951996
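If you have many output files, reading that last value by hand gets old quickly. Here is a minimal Python sketch that pulls it out of the text of an output file; the function name `read_free_energy` is just my own label, and the only thing it assumes about the file is the label text shown in the block above.

```python
import re

# Label exactly as it appears in the thermochemistry block shown above.
FREE_ENERGY_LABEL = "Sum of electronic and thermal Free Energies="

def read_free_energy(log_text):
    """Return the last free energy value (in hartree) found in the given text."""
    pattern = re.escape(FREE_ENERGY_LABEL) + r"\s*(-?\d+\.\d+)"
    matches = re.findall(pattern, log_text)
    if not matches:
        raise ValueError("no thermochemistry block found in this text")
    # A file may hold several freq jobs; keep the last one printed.
    return float(matches[-1])

# Two lines copied from the sample output above.
sample = (
    "Sum of electronic and thermal Enthalpies= -750.883656\n"
    "Sum of electronic and thermal Free Energies= -750.951996\n"
)
print(read_free_energy(sample))
```

For a real job you would pass the contents of your own output file, e.g. `read_free_energy(open("job.log").read())`, where `job.log` is a placeholder name.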
For any given reaction, say A + B -> C, one could take the values from the last row (let’s call it G) for all three components of the reaction and perform the arithmetic: ΔG = G(C) – [G(A) + G(B)], that is, products minus reagents.
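The arithmetic is trivial, but a small sketch helps avoid sign and unit slips. The values for A and B below are made up purely for illustration (only the value for C comes from the sample output above), and the conversion factor of 627.5095 kcal/mol per hartree is the commonly quoted one.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # commonly quoted conversion factor

def reaction_free_energy(products, reagents):
    """Return the free energy change (kcal/mol): sum of products minus sum of reagents."""
    delta_hartree = sum(products) - sum(reagents)
    return delta_hartree * HARTREE_TO_KCAL_MOL

# Hypothetical free energies in hartree for A and B (NOT real data).
g_a = -300.500000
g_b = -450.400000
# Value for C taken from the sample thermochemistry block above.
g_c = -750.951996

delta_g = reaction_free_energy(products=[g_c], reagents=[g_a, g_b])
print(f"Free energy change = {delta_g:.2f} kcal/mol")
```

Passing lists rather than single numbers means the same function works for reactions with any number of reagents and products; stoichiometric coefficients can be handled by repeating a value in the list.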
By default, Gaussian calculates these values (from the previously mentioned partition function) using normal conditions, T = 298.15 K and P = 1 atm. For an assessment of the thermochemistry at other conditions you can include in your route section the corresponding keywords Temperature=x.x and Pressure=x.x, in Kelvin and atmospheres, respectively.
(Huge) Disclaimer: Although calculating the thermochemistry of any reaction by means of DFT calculations is a good (and potentially very useful) guide to chemical reactivity, getting quantitative results requires high-accuracy methods like G3 or G4, collectively known as Gn methods, which are composed of pre-defined stepwise calculations. The sequence of these calculations is carried out automatically; no basis set should be specified. Other high-accuracy methods like CBS-QB3 or W1U can also be considered whenever Gn methods are too costly.
The format of a research paper hasn’t changed much throughout history, despite the enormous changes in platforms available for their consumption and the near extinction of the library issue. Convenient electronic files such as PDFs still resemble printed-and-bound-in-issues papers in their layout instead of exploiting the seemingly endless capabilities of the electronic format.
For instance, why do we still need to have page numbers? A DOI is a full, traceable and unique identification for each work, and there are so many papers nowadays that publishers have to pour them out as e-first, ASAPs, and just accepted before having them assigned page numbers, a process which is still a concern for some researchers (and even for some of the organizations funding them or evaluating their performance). Numbers for Issues, Volumes and Pages are library indexes needed to sort and retrieve information from physical journals, but in the e-realm, where one can browse all issues online, perform a search and download the results, these indexes are hardly of any use; only the year is helpful in establishing a chronological order for the development of ideas.
This brings me to the next issue (no pun intended): if bound issues are no longer a thing, then neither should covers be. Being selected for a cover is a huge honor; it means the editorial staff think your work stands out from the works published in the same period. But nowadays it is an honor that comes at a price, sometimes a high price. With the existence of covers, back-covers, inner-covers, inner-back-covers and whatnot at USD$1,500 apiece, the honor gets a bit diluted. Advertisers know this, and now they place their ads as banners, pop-ups and other online digital formats instead of -to some extent- paying for placing ads in the pages of the journals.
I recently posted a quick informal poll on Twitter about the scientific reading habits of chemists and I confirmed what I expected: only one in five still prefers to mostly read papers on actual paper*, the rest rely on an electronic version such as HTML full text or the most popular PDF on a suitable reader.
— Joaquin Barroso (@joaquinbarroso) June 3, 2019
What came as a surprise for me was that in the follow up poll, Reference Manager programs such as Mendeley, Zotero, EndNote or ReadCube are only preferred by 15% while 80% prefer the PDF reader (I’m guessing Acrobat Reader might be the most popular.) A minority seems to prefer the HTML full text version, which I think is the richest but hardly customizable for note taking, sharing, or, uhm hoarding.
A follow up on the previous poll. Dear #ChemTweeps, if you mostly read papers in electronic format what is your preferred platform?
— Joaquin Barroso (@joaquinbarroso) June 10, 2019
I’m a Mendeley user because I like the integration between users, its portability between platforms and the synchronization features, but if I were to move to another reference manager it would be ReadCube. I like taking notes, highlighting text, and adding summaries and ideas onto the file, but above all I like the fact that I can conduct searches in the myriad of PDF files I’ve accumulated over the years. During my PhD studies I had piles of (physical) paper and folders with PDF files that sometimes were easier to print than to sort and organize (I even had a spreadsheet with the literature read, a nightmarish project in itself!)
So, here is my wish list for what I want e-papers in the 21st century to do. Some features are somewhat available in some journals and some can be achieved within the PDF itself; others would require a new format or a new platform to be carried out. Please comment on what other features you would like to have in papers.
- Say goodbye to the two columns format. I’m zooming to a single column anyway.
- Pop-up charts/plots/schemes/figures. Let me take a look at any graphical object by hovering (or 3D touching in iOS, whatever) on the “see Figure X” legend instead of having to move back and forth to check it, especially when the legend is “see figure SX” and I have to go to the Supporting Information file/section.
- Pop-up References. Currently some PDFs let you jump to the References section when you click on one, but you can’t jump back; you have to scroll and find the point where you left off.
- Interactive objects. Structures, whether from X-ray diffraction experiments or calculations, could be deposited as raw coordinate files for people to play with and, most importantly, to download** and work with. This would increase the hosting space journals need to devote to each work, so I’m not holding my breath.
- Audio output. This one should be trickier, but by far the most helpful. I commute long hours, so having papers read out loud would be a huge time-saver, but it has to be smart. Currently I make Siri read papers by opening them in the Mendeley app, then “select all“, “voice“, but when it hits a formula or a set of equations the flow is lost (instead of reading water as ‘H-Two-O‘, it reads ‘H-subscript Two-O‘; try having the formula of a perovskite read out)
- A compiler that outputs the ‘traditional version‘ for printing. Sure, why not.
I realize this post may come out as shallow in view of the Plan-S or FAIR initiatives, sorry for that but comfort is not incompatible with accessibility.
What other features do you think research papers should have by now?
* It is true that our attention -and more importantly- our retention of information is not the same when we read on paper as when we read on a screen. Recently there was an interview on this matter on Science Friday.
** I absolutely hate having a Supporting Information section with long PDF lists of coordinates to copy-paste and fix into a new input file. OpenBabel, people!
The following message appeared for a GIAO calculation when trying to open the file in GaussView5.0 (it opens successfully in ChemCraft):
CConnectionGLOG::Parse_GLOG() Failure reading NMR data Line Number 2414
When you go to said line (line 2414) you find the following string:
Eigenvalues:-12345.6789 -12345.6789 -12345.6789
This line holds the eigenvalues of the SCF NMR GIAO shielding tensor. The problem lies with the space missing between the colon sign ‘:’ and the ‘-‘ sign of the first eigenvalue. You can fix it by hand with an editor, but GV only warns you about the first instance, so there may be others and you would need to repeat the procedure. It is probably best to fix them all in one go with the following command from the terminal (filename.log stands for your own output file; note the plain straight quotes):
sed -i 's/Eigenvalues:-/Eigenvalues: -/g' filename.log
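If sed is not at hand (say, on Windows), the same substitution can be sketched in Python. This is only an equivalent of the one-liner above, not something GaussView or Gaussian provides; the function names and the file name in the usage note are placeholders of mine.

```python
import re
from pathlib import Path

def fix_eigenvalue_spacing(text):
    """Insert the missing space between 'Eigenvalues:' and a leading minus sign."""
    return re.sub(r"Eigenvalues:-", "Eigenvalues: -", text)

def fix_file(path):
    """Rewrite the file in place, like sed -i does."""
    log = Path(path)
    log.write_text(fix_eigenvalue_spacing(log.read_text()))

# Quick check on the offending string from the output file.
broken = "Eigenvalues:-12345.6789 -12345.6789 -12345.6789"
print(fix_eigenvalue_spacing(broken))
```

To patch an actual file you would call `fix_file("filename.log")`; keep a backup copy first, since the file is overwritten.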
It is good to be back in Romania at the UBB writing these posts where this blog began. Thanks to my good friend Dr. Alexandru Lupan for pointing out this error.
“Well, where else were they supposed to appear?”
I was sent this error along with the previous question for a failed optimization. Apparently there is no answer on the internet (I quickly checked), so here it is:
Gaussian is confused about the atomic coordinates because there is also a geom=check instruction in the route section; i.e., it was told to retrieve the atomic coordinates from a checkpoint file and was then given atomic coordinates within the input as well, so it doesn’t know which ones you mean and exits.
This week marks the 10th anniversary of this little blog! It’s crazy to think a pet project that I took on during my last year as a postdoc is still going on after a decade of recording the work of our group in computational chemistry. It is also a happy coincidence that this year is the centennial anniversary of IUPAC and the sesquicentennial anniversary of the Periodic Table, for which 2019 has been designated as the International Year of the Periodic Table. I will release various posts celebrating this first decade of blogging and some regarding the IYPT2019 as soon as possible; some major changes in layout and look are also coming. It has been suggested to me that setting up a patreon.com account could help me raise some funding for assisting underprivileged students, but I’m not so sure yet.
By 2009 the chemistry blogosphere was already in full swing, so I got to it a bit late. (Is commenting ‘First!‘ still a thing?) At the time my job future seemed a bit uncertain: I had already spent two years as a postdoc in Romania, and prior to that I had worked for a private company in their research center here in Mexico, so I started to ramble here so that in upcoming job interviews I could point to a resource which gathered my thoughts and some achievements in a more informal fashion than a CV or a resume. (Plus, I like writing about things other than chemistry just for myself; maybe someday I’ll start a blog with some fiction writings I have here and there.) Quite frankly, I didn’t think I could go back into academia, so I was mainly looking for jobs in the R&D departments of various chemical companies, particularly in the field of coatings, in which I already had some experience.
I never imagined this little blog would gain any attention; I think I had something like 2,000 views in the first year, and now it’s up to 1,500 views a week! One of the first posts that gained popularity quite quickly dealt with SCRF calculations and some parameters we were struggling to get right. Once I found the best parameters for running them, my boss, the late Prof. Dr. Ioan Silaghi-Dumitrescu at Babes-Bolyai University, asked me to email them to the group and post them physically in the lab so we wouldn’t lose them; I thought it would be a better idea to have them on the blog so we could all access them easily and at the same time share our findings with whoever had the same issues. It turned out that many people struggled with these parameters for SCRF calculations in Gaussian, and from that moment on that became one of the underlying principles of the blog: “any problem we face in the lab is definitely faced by someone else, so let’s share our solution”; the other principle, of course, was my blatant self-promotion.
One of the most rewarding aspects of having kept this blog going for so long is knowing that it is a modest resource that some people have found helpful. Attending conferences and having people tell me they like my posts and have found help in them is extremely gratifying. Also, academically it has allowed me to meet wonderful people with whom I’ve established very interesting collaborations in various countries like Iran, US, Slovakia, Czech Republic, Chile, Bulgaria and many more.
Very early on I started getting direct questions about specific problems, and to the best of my abilities I’ve tried to answer them, although I don’t always have the time to do so, and for that I apologize to all the readers who didn’t get an answer; up until now keeping this blog has been a spare-time endeavor which doesn’t always get the priority it deserves within my academic tasks.
Thank you for reading, commenting and sharing these posts during this past decade! I truly appreciate it and it has been very important to me; I sometimes feel the posts go into the void but every now and then I’m approached by readers who have found the blog helpful and that is very rewarding. Here’s to ten more years!
Quick Post on preparing Gaussian input files from PDB files.
If you’re modeling biological systems, chances are that, more often than not, you start by retrieving a PDB file. The Protein Data Bank is a repository for all things biochemistry – from oligo-peptides to full DNA sequences – with over 140,000 available files encoding the corresponding structure obtained by various experimental means, ranging from X-ray diffraction and NMR to, more recently, Cryo Electron Microscopy (CEM).
The PDB file encodes the Cartesian coordinates for each atom present in the structure as well as their in the same way molecular dynamics codes -like AMBER or GROMACS- code the parameters for a force field; this makes the PDB a natural input file for MD.
There are however some considerations to have in mind for when you need to use these coordinates in electronic structure calculations. Personally I give it a pass with OpenBabel to add (or possibly just re-add) all Hydrogen atoms with the following instruction:
$>obabel -ipdb filename.pdb -ogjf -Ofilename.gjf -h
Alternatively, you can select a pH value, say 7.5 with:
$>obabel -ipdb filename.pdb -ogjf -Ofilename.gjf -h -p7.5
You may also use the GUI if by any chance you’re working in Windows:
This sends all H atoms to the end of the atom list. For us, the next step is usually to optimize their positions with a partial optimization at a low level of theory, for which you need to use the ReadOptimize option of Opt (ReadOpt and RdOpt are synonyms) in the route section and then add the atom list at the end of the input file:
Finally, visual inspection of your input structure is always helpful to find any meaningful errors; remember that PDB files come from experimental measurements, which are not free of problems.
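As an illustration only, here is a small Python sketch that builds such an atom list from the element symbols of your structure. It assumes the `noatoms atoms=...` syntax described for ReadOptimize in the Gaussian Opt keyword documentation (start from an empty list, then add atoms by number or range); please check that against the manual for your Gaussian version. The function name and the six-atom example are hypothetical.

```python
def hydrogen_atom_list(symbols):
    """Build a 'noatoms atoms=...' line selecting only the H atoms (1-based numbering)."""
    indices = [i for i, s in enumerate(symbols, start=1) if s.upper() == "H"]
    if not indices:
        raise ValueError("no hydrogen atoms found")
    # Collapse consecutive atom numbers into ranges, e.g. 4,5,6 becomes 4-6.
    ranges = []
    for i in indices:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i
        else:
            ranges.append([i, i])
    parts = [str(a) if a == b else f"{a}-{b}" for a, b in ranges]
    return "noatoms atoms=" + ",".join(parts)

# Hypothetical six-atom fragment with the H atoms sent to the end of the list.
print(hydrogen_atom_list(["C", "O", "N", "H", "H", "H"]))
```

The resulting line goes in its own section after the molecule specification, terminated by a blank line. Since the same documentation says atom types are accepted too, `noatoms atoms=H` should be an equivalent shortcut, again under the assumption above.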
As usual thanks for reading, commenting, and sharing.
We celebrate the successful thesis defense of Gustavo “Gus” Mondragón, who has now completed his Master’s degree and is on to getting a PhD in our group. Gustavo has worked on the search for multiexcitonic states and their involvement in the excitonic transference between photosynthetic pigments, specifically between bacteriochlorophyll-d molecules (BChl-d) from the bchQRU chlorosome whose whole structure is shown in the gallery below. To this end, Gustavo has studied and implemented the Restricted Active Space method with double spin flip (RAS-2SF) with the use of QChem5.0, a method that has required the use and understanding of states with high multiplicities. Additionally, Gustavo has investigated the influence of the environment within the chlorosome by performing ONIOM calculations for the spectroscopic properties of a BChl-d dimer, finding, albeit qualitatively, a bathochromic effect, probably an expected result but nonetheless an impressive feat for the level of theory selected.
There’s still a lot of work to do in this line of research, and although we’re eager to publish our results on this excitonic transference mechanism, we want to be completely sure that we’re taking every possibility into consideration so we don’t incur any inconsistencies.
Gustavo cultivates many research interests from excited states of these pigments to biochemical processes that require the use of various tools; I’m sure his permanence in our lab will bring lots of interesting results. Congratulations, Gus! Thank you for your hard work.
Calculating the pKa value for a Brønsted acid is very hard, like really hard. A full thermodynamic cycle (fig. 1) needs to be calculated along with the high-accuracy solvation free energy for each of the species under consideration, not to mention the use of expensive methods which will be reviewed here in another post in two weeks’ time.
Finding descriptors that help us circumvent the need for such sophisticated calculations can help a great deal in estimating the pKa value of any given acid. We’ve been interested in the reactivity of σ-hole bearing groups in the past and, just like Halogen, Tetrel, Pnicogen and Chalcogen bonds, Hydrogen bonds are highly directional and their strength depends on the polarization of the O-H bond. Therefore, we suggested the use of the maximum surface electrostatic potential (VS,max) on the acid hydrogen atom of carboxylic acids as a descriptor for the strength of their interaction with water, the first step in the deprotonation process.
We selected six basis sets, five density functionals and the MP2 method, for a total of thirty-six levels of theory, to optimize and calculate VS,max on thirty carboxylic acids for a grand total of 1,080 wavefunctions, which were later passed on to MultiWFN (all calculations were performed with PCM = water). The VS,max values correlated very well with the experimental pKa values across the levels of theory (R² > 0.9), except for B3LYP. Still, the best correlations were obtained with LC-wPBE/cc-pVDZ and wB97XD/cc-pVDZ. From this latter level of theory the linear correlation yielded the following equation:
pKa = -0.2185(VS,max) + 16.1879
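Plugging numbers into the equation is a one-liner; here is a minimal Python sketch of it. Keep in mind that VS,max must be in the same units and computed at the same level of theory as in the fit, which this post doesn’t spell out, and that the input value of 50.0 below is an arbitrary number for illustration, not a computed result.

```python
SLOPE = -0.2185      # coefficient from the linear fit above
INTERCEPT = 16.1879  # intercept from the linear fit above

def pka_from_vsmax(vs_max):
    """Estimate the pKa of a carboxylic acid from VS,max using the fitted line."""
    return SLOPE * vs_max + INTERCEPT

# Arbitrary illustrative input (NOT a computed VS,max value).
print(f"Estimated pKa: {pka_from_vsmax(50.0):.2f}")
```

Since the fit was built only on carboxylic acids, I wouldn’t trust this function outside that family.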
Differences in pKa turned out to be less than 0.5 units, which is remarkable for such a straightforward method; bear in mind that calculation of full thermodynamic cycles above chemical accuracy (1.0 kcal/mol) yields pKa differences above 1.0 units.
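To put those numbers in perspective, an error in the deprotonation free energy translates into a pKa error through the standard relation ΔG = RT ln(10) · pKa. Here is a quick Python sketch of that conversion; the temperature of 298.15 K is my assumption, and the function name is just a label.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def pka_shift(delta_g_kcal, temperature=298.15):
    """Convert a free energy difference (kcal/mol) into pKa units."""
    return delta_g_kcal / (R_KCAL * temperature * math.log(10))

# How many pKa units does an error of 1.0 kcal/mol represent?
print(f"{pka_shift(1.0):.2f} pKa units")
```

In other words, at room temperature one pKa unit corresponds to roughly 1.36 kcal/mol, which is why thermodynamic cycles need such accurate energies to be useful here.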
We then took this equation for a test with 10 different carboxylic acids and the prediction had a correlation of 98% (fig. 2)
I think this method can really catch on as a quick way to predict the pKa values of any carboxylic acid imaginable. We’re now working on extending the model to other groups (e.g. Brønsted bases) and putting together a black-box workflow so as to make it even more accessible and straightforward to use.
We’ve recently published this work in the journal Molecules, an open access publication. Thanks to Prof. Steve Scheiner for inviting us to participate in the special issue devoted to tetrel bonding. Thanks to Guillermo Caballero for the inception of this project and to Dr. Jacinto Sandoval for taking the time from his research in photosynthesis to work on this pet project of ours and of course the rest of the students (Gustavo Mondragón, Marco Diaz, Raúl Torres) whose hard work produced this work.