Adding SMILES codes to HDF5 Files
=================================

In this tutorial, you will learn how to take an existing file in the DEEPMD-Kit HDF5 format and add a SMILES string to each molecule in the file.

Learning Objectives
-------------------

- Understand how to read an HDF5 file in the DEEPMD-Kit format.
- Learn how to add SMILES strings to the HDF5 file.
- Understand how to save the modified HDF5 file.

Required Files
--------------

The files for this tutorial are located in ``examples/AddSmiles`` and ``examples/AddSmiles/inputs``.

- ``t8.hdf5``: The HDF5 file that you will be modifying. It comes from the Tautobase and is relatively small, which makes it easier to work with.
- ``setup_smiles.py``: The script that adds the SMILES strings to the HDF5 file.

Tutorial
--------

In this example, we demonstrate how to add SMILES strings to the HDF5 file. The SMILES strings are generated using RDKit, a popular cheminformatics library.

SMILES strings represent chemical structures as text. They are widely used in cheminformatics and molecular modeling, and can be easily converted to other formats (e.g., InChI or SDF).

Sometimes the SMILES strings are not immediately available, and you need to generate them from the molecular structure when first building the database. This is what we will do in this example.

To get started, we import the library and load the HDF5 file to show its current contents.

.. literalinclude :: ../../../../examples/AddSmiles/setup_smiles.py
   :language: python
   :start-at: from pharmaforge.database import DataBase
   :end-at: four keys in the entry

Now we will add SMILES strings in two ways. The first is to add them directly, assuming the charge of every structure is zero.

.. literalinclude :: ../../../../examples/AddSmiles/setup_smiles.py
   :language: python
   :start-at: outputs/t8_with_smiles.hdf5
   :end-before: outputs/t8_with_smiles_suggested_charge.hdf5

The second is to add them while allowing RDKit to predict the charge. This is done by passing the optional flag ``use_suggested_charge=True``.

.. literalinclude :: ../../../../examples/AddSmiles/setup_smiles.py
   :language: python
   :start-at: outputs/t8_with_smiles_suggested_charge.hdf5

The SMILES strings have now been added to the HDF5 file!

Full Code
---------

.. literalinclude :: ../../../../examples/AddSmiles/setup_smiles.py
   :language: python
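
The files written by the script can also be inspected independently of the tutorial code. The following is a minimal sketch, assuming the ``h5py`` package is installed. ``list_datasets`` is a hypothetical helper name used only for illustration, and the sketch makes no assumption about the group or dataset names that ``setup_smiles.py`` writes: it simply lists whatever datasets it finds.

.. code-block:: python

   import h5py


   def list_datasets(path):
       """Return {dataset path: (shape, dtype)} for every dataset in an HDF5 file.

       Hypothetical helper for illustration; not part of the tutorial script.
       """
       found = {}

       def visit(name, obj):
           # visititems walks both groups and datasets; keep only the datasets.
           if isinstance(obj, h5py.Dataset):
               found[name] = (obj.shape, str(obj.dtype))

       # Open read-only so the file being inspected cannot be modified.
       with h5py.File(path, "r") as handle:
           handle.visititems(visit)
       return found

For example, ``list_datasets("outputs/t8_with_smiles.hdf5")`` should list every dataset in the first output file, assuming the script has been run and the file exists at that path. Comparing the result with the listing for ``t8.hdf5`` is one way to see which entries were added.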