Density Functional Theory (DFT) is by far the most successful way of gaining access to molecular properties starting from their composition. Calculating the electronic structure of molecules or solid phases has become a widespread activity in computational as well as experimental labs, not only for shedding light on the properties of a system under study but also as a tool for designing systems with tailor-made properties. This level of understanding of matter brought by DFT rests on a rigorous physical and mathematical development; still, and maybe because of it, DFT (and electronic structure calculations in general, for that matter) might be thought of as something of little use outside academia.
Prof. Juan Carlos Sancho-García from the University of Alicante in Spain encouraged me to talk to his students last month about the reach of DFT in the industrial world. Having once worked in the private sector myself, I remembered that the simulations performed there were mostly DPD (Dissipative Particle Dynamics), a coarse-grained kind of molecular dynamics, used for investigating the interactions between polymers and surfaces, but no DFT calculations were ever in sight. It is well known that docking, QSAR, and molecular dynamics are widely used in the pharma industry for the development of new drugs, but I wasn’t sure where DFT could fit in all this. I thought a patent search would be a good descriptor of the commercial applicability of DFT, so I took a shallow dive and searched for patents explicitly mentioning the use of DFT as part of the invention’s development and protection. The first thing I noticed is that although there appear to be only a few, their numbers have been growing over the years (Figure 1). Again, this was not an exhaustive search, so I’m obviously overlooking many.
The second thing that caught my attention was that the first hit came from 1998, nicely coinciding with the rise of B3LYP (Figure 2). This patent was awarded to Australian inventors from the University of Wollongong, New South Wales, to determine trace gas concentrations by chromatography by means of calculating the FT-IR spectra of sample molecules (Figure 3). DFT is thus used as part of the invention, but I don’t know whether this is a widespread method in analytical labs.
While I’m mentioning the infamous B3LYP functional, a patent search for it yields the following graph (Figure 4). Most of the hits relate to the protection of photoluminescent or thermoluminescent molecules for light-emitting devices; it appears that DFT calculations are used to provide the key features being protected, such as the HOMO-LUMO gap.
So what about software? Most of the more recent patents in Figure 1 (2018 – 2022) lie in the realm of electronics, particularly the development of semiconductors, ceramic or otherwise, so it was safe to assume VASP would be a popular choice to that end, right? It turns out that’s not necessarily the case, since a patent search for VASP accounts for only about 10% of all awarded patents (Figure 5).
I guess it’s safe to say by now that DFT has a significant impact on industrial development, and one could only expect it to keep rising; however, the advent of machine learning techniques and other artificial-intelligence-related methods promises an accelerated development. I went back to the patent database and this time searched for ‘machine learning development materials’ (the term ‘development’ was dropped by the search engine; I guess it found it too obvious), and its rise is quite remarkable, surpassing the frequency of DFT in patents (Figure 6), particularly in the past 5 years (2018 – 2022).
I’m guessing in some instances DFT and ML will tend to go hand in hand in the industrial development process, but the timescales reachable by ML will only tend to grow, so I’m left with the question: what are we waiting for to make ML and AI part of the chemistry curricula? As computational chemistry teachers we should start talking about these points with our students and convince our heads of department to help us create proper courses, or we risk our graduates becoming niche scientists at a time when new skills are sought after in the private sector.
Thanks again to Prof. Juan Carlos Sancho-García at the University of Alicante, Spain, who asked me to talk about the subject in front of his class, and to Prof. José Pedro Cerón-Carrasco from Cartagena for allowing me to talk about this and other topics at Centro Universitario de la Defensa. Thank you, guys! I look forward to meeting you again soon.
Some Gaussian16 output files fail to open in GaussView5, which displays the following error message:
ConnectionGLOG::Parse_Gauss_Coord(). Failure reading oriented atomic coordinates. Line Number
We have shared some solutions to GaussView’s handling of *.chk and *.fchk files in the past, but never for *.log files. This time, Dr. Davor Šakić from the University of Zagreb in Croatia has brought to my attention a fix for this error: if the “Dipole orientation” section, with the orientation that follows it, is removed, the file becomes readable by GaussView5 again.
Here you can download a script to fix the file without any hassle. The usage from the command line is simply:
~$ chmod 777 Fg16TOgv5
~$ ./Fg16TOgv5 name.log
The first line grants all permissions on the script (use at your own discretion and risk). The script then takes the output file name.log and yields two more files: gv5_name.log and name.arch; the former is formatted for GaussView5.x, while the latter archive allows for easy generation of SI files.
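For readers curious about what this kind of fix involves, here is a minimal Python sketch of the idea. It is not Dr. Šakić’s script, and it rests on assumptions about the file layout: that the offending block begins at a line starting with “Dipole orientation” and continues over the purely numeric coordinate rows that follow it. The function names are hypothetical, and the sketch does not produce the name.arch file.

```python
def _is_numeric_row(line):
    """Return True if the line is non-empty and made only of numbers."""
    tokens = line.split()
    if not tokens:
        return False
    try:
        for token in tokens:
            float(token)
    except ValueError:
        return False
    return True


def strip_dipole_orientation(lines):
    """Return the lines with any 'Dipole orientation' block left out.

    Assumption (not checked against every Gaussian16 revision): the block
    is a header line starting with 'Dipole orientation' followed by rows
    that contain only numbers. The first line that is blank or has any
    non-numeric token ends the block and is kept.
    """
    kept = []
    skipping = False
    for line in lines:
        if line.strip().startswith("Dipole orientation"):
            skipping = True      # drop the header itself
            continue
        if skipping and _is_numeric_row(line):
            continue             # drop the coordinate rows under the header
        skipping = False
        kept.append(line)
    return kept


def fix_log(path_in, path_out):
    """Write a copy of path_in to path_out without the dipole block."""
    with open(path_in, "r") as src:
        cleaned = strip_dipole_orientation(src.readlines())
    with open(path_out, "w") as dst:
        dst.writelines(cleaned)
```

Calling fix_log("name.log", "gv5_name.log") would mimic the naming used by the script above. Always keep the original output file, since this sketch has no way of verifying that the result is what GaussView5 expects.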
Thanks to Dr. Šakić for his script and insight. We hope you find it useful, and if you do, please credit him wherever it’s due. Also, if you find this or other posts in the blog useful, please let us know by sharing, starring and commenting on them; your feedback is incredibly helpful in justifying to my bosses the time I spend curating this blog.
Thanks for reading.
One of the most popular posts in this blog has to do with calculating Fukui indexes. However, when dealing with a large number of molecules, the methodology we described can become cumbersome, since it requires manually extracting the population analysis from two or three different output files and then performing the arithmetic on them separately in a spreadsheet or something similar.
Our new team member Ricardo Loaiza has written a Python script that takes the three aforementioned files and yields a .csv file with the calculated Fukui indexes; it even points out which atoms exhibit the largest values, so if you have a large molecule you don’t have to check for them manually. We also have a batch version, which takes all the files in a given directory and performs the Fukui calculations for each molecule, provided it can find file triads that follow the naming requirements described below.
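The arithmetic itself is short enough to sketch here. What follows is not Ricardo’s script, only a minimal illustration assuming the per-atom charges for the three states have already been extracted into lists; the function names and CSV column names are hypothetical. In terms of atomic charges q, the condensed indexes are f+ = q(N) − q(N+1), f− = q(N−1) − q(N), and f0 as their average.

```python
import csv


def fukui_indexes(q_neutral, q_plus, q_minus):
    """Condensed Fukui indexes from per-atom charges.

    q_neutral, q_plus and q_minus hold the atomic charges of the N, N+1
    and N-1 electron states, in the same atom order.
    """
    if not (len(q_neutral) == len(q_plus) == len(q_minus)):
        raise ValueError("the three charge lists must have the same length")
    rows = []
    for atom, (q0, qp, qm) in enumerate(zip(q_neutral, q_plus, q_minus), start=1):
        rows.append({
            "atom": atom,
            "f_plus": q0 - qp,          # susceptibility to nucleophilic attack
            "f_minus": qm - q0,         # susceptibility to electrophilic attack
            "f_zero": (qm - qp) / 2.0,  # susceptibility to radical attack
        })
    return rows


def largest(rows, key):
    """Return the row with the largest value of the given index."""
    return max(rows, key=lambda row: row[key])


def write_csv(rows, path):
    """Write the indexes to a .csv file, one atom per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["atom", "f_plus", "f_minus", "f_zero"]
        )
        writer.writeheader()
        writer.writerows(rows)
```

A quick sanity check for any such calculation: since the total charge changes by one unit between states, the f+ values should add up to 1 over all atoms, and so should the f− values.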
Output files must be named filename.log (the N electrons reference state), filename_plus.log (the state with N+1 electrons) and filename_minus.log (the N-1 electrons state). Another restriction is that so far these scripts only work with NBO population analysis as provided by the NBO3.1 program available in the various versions of Gaussian. I imagine the listing is similar in NBO5.x and NBO6.x and so it should work if you do the population analysis with them.
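To illustrate how the naming convention can drive a batch run, here is a minimal sketch that looks for complete file triads in a directory. Again, this is an assumption about how such a search could be done rather than a copy of our batch script, and find_triads is a hypothetical name.

```python
from pathlib import Path


def find_triads(directory):
    """Return (neutral, minus, plus) paths for every complete triad.

    A triad is filename.log plus filename_minus.log and filename_plus.log
    in the same directory. Molecules missing either file are skipped.
    """
    directory = Path(directory)
    triads = []
    for neutral in sorted(directory.glob("*.log")):
        stem = neutral.stem
        # Files that are themselves the N+1 or N-1 state are not references.
        if stem.endswith("_plus") or stem.endswith("_minus"):
            continue
        plus = directory / f"{stem}_plus.log"
        minus = directory / f"{stem}_minus.log"
        if plus.is_file() and minus.is_file():
            triads.append((neutral, minus, plus))
    return triads
```

One consequence of this simple rule is that a reference file whose own name ends in _plus or _minus would be ignored, so such names are best avoided for the N electrons state.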
The syntax for the single molecule version is:
python fukui.py filename.log filename_minus.log filename_plus.log
For the batch version it is:
(Por Lote means In Batch in Spanish.)
These scripts are available via GitHub. We hope you find them useful, and if you do, please let us know, either here in the comments section or on our GitHub site.