3.1. pharmaforge.io.hdf5_utils
- pharmaforge.io.hdf5_utils.assign_smiles(molecule_name, use_suggested_charge=False, maxcount=2, xyz_file='molecule_structure.xyz')[source]
Attempts to assign a SMILES notation to a molecule using its XYZ coordinates.
The function tries multiple possible formal charges to determine bonds and generate a valid SMILES representation using RDKit. If all attempts fail, the molecule name is added to the failed_mols list.
The use_suggested_charge option lets the function use the charge suggested in the RDKit error message, which typically reads something like: Final molecular charge (0) does not match input (-1); could not find valid bond ordering. In this case the function first tried charge 0, and the next charge it attempts is -1. The maxcount option sets how many attempts the function makes before giving up and returning no SMILES.
Warning
This function, with the use_suggested_charge option, assigns the charge based on the error message from RDKit. Treat the resulting charges with some skepticism: they may not be the molecule's true charges.
- Parameters:
molecule_name (str) – The name of the molecule for which to generate SMILES.
failed_mols (list) – A list to keep track of failed molecules.
use_suggested_charge (bool, optional) – If True, use the suggested charge from the error message. Default is False.
maxcount (int, optional) – The maximum number of attempts to generate SMILES with different charges. Only takes effect when use_suggested_charge is True; when it is False, the function tries only charge 0. Default is 2, per the signature.
xyz_file (str, optional) – The path to the XYZ file containing the molecule’s coordinates. Default is ‘molecule_structure.xyz’, which is the default for the save_molecule_to_xyz function.
- Returns:
str or None – The SMILES notation for the molecule, or None if generation failed.
int or None – The charge used to generate the SMILES notation, or None if generation failed.
- Raises:
FileNotFoundError – If the XYZ file does not exist.
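The retry behaviour described above can be sketched without RDKit. This is a minimal illustration, not the library's implementation: suggested_charge, smiles_with_retries and fake_perceive are hypothetical names, the bond-perception step is replaced by a stand-in callable, and raising ValueError is an assumption made for the sketch. The parser picks the first parenthesised number in the message that differs from the charge just tried, which matches the example message quoted in the description.

```python
import re
from typing import Callable, Optional, Tuple

# Matches integers in parentheses, e.g. "(0)" and "(-1)" in the quoted message.
_PAREN_INT = re.compile(r"\((-?\d+)\)")


def suggested_charge(message: str, tried: int) -> Optional[int]:
    """Return the first parenthesised charge in `message` that differs from `tried`."""
    for token in _PAREN_INT.findall(message):
        value = int(token)
        if value != tried:
            return value
    return None


def smiles_with_retries(
    perceive: Callable[[int], str],
    use_suggested_charge: bool = False,
    maxcount: int = 2,
) -> Tuple[Optional[str], Optional[int]]:
    """Try charge 0 first, then charges suggested by the error messages."""
    # Without use_suggested_charge only charge 0 is attempted.
    attempts = maxcount if use_suggested_charge else 1
    charge = 0
    for _ in range(attempts):
        try:
            return perceive(charge), charge
        except ValueError as err:  # assumed exception type for this sketch
            next_charge = suggested_charge(str(err), charge)
            if next_charge is None:
                break
            charge = next_charge
    return None, None


def fake_perceive(charge: int) -> str:
    """Hypothetical stand-in for bond perception: succeeds only at charge -1."""
    if charge != -1:
        raise ValueError(
            f"Final molecular charge ({charge}) does not match input (-1); "
            "could not find valid bond ordering"
        )
    return "CC(=O)[O-]"


result_default = smiles_with_retries(fake_perceive)
result_retry = smiles_with_retries(fake_perceive, use_suggested_charge=True, maxcount=2)
```

Here result_default fails because only charge 0 is tried, while result_retry succeeds on the second attempt. As the warning notes, a charge recovered this way should still be checked against what is known about the molecule.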
Check which molecules are shared between all databases in the DataBase instance.
- Parameters:
database_instance (DataBase) – An instance of the DataBase class containing the HDF5 data.
- Returns:
Prints the shared molecules between all databases.
- Return type:
None
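Finding the molecules common to all databases amounts to a set intersection. The sketch below assumes each database can be reduced to a collection of molecule names; the shared_molecules helper, the dictionary layout and the example names are hypothetical and do not reflect the actual DataBase class.

```python
from typing import Dict, Iterable, Set


def shared_molecules(databases: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the molecule names present in every database."""
    name_sets = [set(names) for names in databases.values()]
    if not name_sets:
        return set()
    return set.intersection(*name_sets)


# Illustrative data only: database label -> molecule names.
example = {
    "db_a": ["C2H6O1", "C1H4", "H2O1"],
    "db_b": ["C1H4", "H2O1"],
    "db_c": ["H2O1", "C1H4", "C6H6"],
}
shared = shared_molecules(example)
print(sorted(shared))
```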
- pharmaforge.io.hdf5_utils.modify_hdf5_file(original_file: str, new_file: str) None [source]
Modifies an HDF5 file by renaming molecules and creating a type map.
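- Parameters:
original_file (str) – Path to the original HDF5 file.
new_file (str) – Path to the new HDF5 file to be created.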
- Returns:
The function creates a new HDF5 file with modified molecule names and type maps.
- Return type:
None
- Raises:
FileNotFoundError – If the original file does not exist.
ValueError – If the original file is not an HDF5 file or if the original and new files are the same.
- pharmaforge.io.hdf5_utils.modify_molecule_name(mol_name: str) Tuple[str, List[str]] [source]
Modifies the molecule name by removing zero count elements and creating a type map.
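As an illustration of the idea, the sketch below assumes molecule names are element-count strings such as C2H6N0O1. That naming format is an assumption, since the documentation does not state it, and drop_zero_counts is a hypothetical helper rather than the library function. It returns the shortened name and a list of the remaining element symbols, mirroring the Tuple[str, List[str]] return type in the signature.

```python
import re
from typing import List, Tuple

# One element symbol followed by its count, e.g. "C2" or "Cl0".
_ELEMENT_COUNT = re.compile(r"([A-Z][a-z]?)(\d+)")


def drop_zero_counts(mol_name: str) -> Tuple[str, List[str]]:
    """Remove zero-count elements and list the elements that remain."""
    pairs = _ELEMENT_COUNT.findall(mol_name)
    kept = [(element, count) for element, count in pairs if int(count) > 0]
    new_name = "".join(f"{element}{count}" for element, count in kept)
    type_map = [element for element, _ in kept]
    return new_name, type_map


new_name, type_map = drop_zero_counts("C2H6N0O1")
```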
- pharmaforge.io.hdf5_utils.new_hdf5_file_with_smiles(original_file: str, new_file: str, exist_ok=False, use_suggested_charge=False, maxcount=5) None [source]
Creates a new HDF5 file with SMILES notation for each molecule. The function reads the original HDF5 file, generates a SMILES string for each molecule, and saves it in the new file. It also handles the case where the original file and the new file are the same, and raises an error if the original file does not exist or if the new file already exists and exist_ok is False.
- Parameters:
original_file (str) – Path to the original HDF5 file.
new_file (str) – Path to the new HDF5 file to be created.
exist_ok (bool) – If True, overwrite the existing file if it exists. Default is False.
use_suggested_charge (bool) – If True, use the suggested charge from the error message. Default is False. (Passed through to assign_smiles.)
maxcount (int) – The maximum number of attempts to generate SMILES with different charges. Default is 5. (Passed through to assign_smiles.)
- Raises:
FileNotFoundError – If the original file does not exist.
ValueError – If the original file is not an HDF5 file.
FileExistsError – If the new file already exists and exist_ok is False.
OSError – If the directory for the new file cannot be created.
See also
save_molecule_to_xyz
Function to save a molecule’s coordinates and types to a temporary XYZ file.
assign_smiles
Function to generate SMILES notation for a molecule using its XYZ coordinates.
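The error conditions listed under Raises can be pictured as a series of pre-flight checks. The sketch below uses only the standard library; check_paths and outcome are hypothetical helpers, the order of the checks is an assumption, and the HDF5 test is simplified to comparing the 8-byte format signature at offset 0 (files with a user block would be rejected). It does not cover the case where the original and new paths are identical.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def check_paths(original_file: str, new_file: str, exist_ok: bool = False) -> None:
    """Raise the documented exceptions before any data is written."""
    if not os.path.isfile(original_file):
        raise FileNotFoundError(f"Original file not found: {original_file}")
    with open(original_file, "rb") as handle:
        # Simplification: only offset 0 is checked for the signature.
        if handle.read(8) != HDF5_SIGNATURE:
            raise ValueError(f"Not an HDF5 file: {original_file}")
    if os.path.exists(new_file) and not exist_ok:
        raise FileExistsError(f"New file already exists: {new_file}")
    parent = os.path.dirname(os.path.abspath(new_file))
    os.makedirs(parent, exist_ok=True)  # raises OSError if it cannot be created


def outcome(original_file: str, new_file: str, exist_ok: bool = False) -> str:
    """Return "ok" or the name of the exception raised by check_paths."""
    try:
        check_paths(original_file, new_file, exist_ok)
        return "ok"
    except (FileNotFoundError, ValueError, FileExistsError) as err:
        return type(err).__name__


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.h5")
    bad = os.path.join(tmp, "bad.h5")
    with open(good, "wb") as handle:
        handle.write(HDF5_SIGNATURE)
    with open(bad, "wb") as handle:
        handle.write(b"not hdf5")
    results = [
        outcome(os.path.join(tmp, "missing.h5"), good),
        outcome(bad, os.path.join(tmp, "out.h5")),
        outcome(good, bad),
        outcome(good, bad, exist_ok=True),
        outcome(good, os.path.join(tmp, "nested", "out.h5")),
    ]
```

In real code, h5py.is_hdf5 is a more robust way to test whether a file is HDF5.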
- pharmaforge.io.hdf5_utils.process_multiple_hdf5_files(files: List[str]) None [source]
Process multiple HDF5 files by modifying their contents and creating new files.
- Parameters:
files (List[str]) – A list of paths to the original HDF5 files to be processed.
- Returns:
The function creates new HDF5 files with modified contents based on the original files.
- Return type:
None
- Raises:
None – The function handles file operations and exceptions internally.
- pharmaforge.io.hdf5_utils.save_molecule_to_xyz(db, db_name, mol, output_file='molecule_structure.xyz', config=0)[source]
Saves a molecule’s coordinates and types to a temporary XYZ file.
- Parameters:
- Returns:
The function creates an XYZ file with the molecule’s coordinates and types.
- Return type:
None
- Raises:
None – The function handles file operations and exceptions internally.
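For reference, an XYZ file holds the atom count on the first line, a comment on the second, and one "symbol x y z" line per atom. The sketch below writes that layout with the standard library; write_xyz is a hypothetical helper, not the library function, and the water geometry is approximate and purely illustrative.

```python
import os
import tempfile
from typing import Sequence, Tuple


def write_xyz(
    symbols: Sequence[str],
    coords: Sequence[Tuple[float, float, float]],
    path: str,
    comment: str = "",
) -> None:
    """Write element symbols and Cartesian coordinates in XYZ layout."""
    if len(symbols) != len(coords):
        raise ValueError("symbols and coords must have the same length")
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, coords):
        lines.append(f"{symbol} {x:.6f} {y:.6f} {z:.6f}")
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    xyz_path = os.path.join(tmp, "molecule_structure.xyz")
    write_xyz(
        ["O", "H", "H"],
        [(0.0, 0.0, 0.117), (0.0, 0.757, -0.469), (0.0, -0.757, -0.469)],
        xyz_path,
        comment="water",
    )
    with open(xyz_path) as handle:
        xyz_lines = handle.read().splitlines()
```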
Verify that all shared molecules between two databases have identical attributes, including nested groups like ‘set.000’.
- Parameters:
- Returns:
Prints the matched and unmatched molecules between the two databases.
- Return type:
None
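A recursive comparison of this kind can be sketched with nested dictionaries standing in for HDF5 groups. same_content and compare_shared are hypothetical helpers, and the keys and values in the example are illustrative. With real HDF5 datasets the leaf comparison would need an array-aware equality check (for instance numpy.array_equal) instead of ==.

```python
from typing import Any, Dict, List, Tuple


def same_content(a: Any, b: Any) -> bool:
    """Compare two nested dict trees; leaves are compared with ==."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(same_content(a[k], b[k]) for k in a)
    if isinstance(a, dict) or isinstance(b, dict):
        return False  # a group on one side, a leaf on the other
    return a == b


def compare_shared(
    db1: Dict[str, Any], db2: Dict[str, Any]
) -> Tuple[List[str], List[str]]:
    """Split the molecules present in both databases into matched and unmatched."""
    matched: List[str] = []
    unmatched: List[str] = []
    for name in sorted(db1.keys() & db2.keys()):
        target = matched if same_content(db1[name], db2[name]) else unmatched
        target.append(name)
    return matched, unmatched


# Illustrative data: molecule -> attributes, with a nested 'set.000' group.
db_one = {
    "C1H4": {"atoms": [0, 1, 1, 1, 1], "set.000": {"energy": [-1.0]}},
    "H2O1": {"atoms": [0, 1, 1], "set.000": {"energy": [-2.0]}},
    "C6H6": {"atoms": [0] * 6 + [1] * 6},
}
db_two = {
    "C1H4": {"atoms": [0, 1, 1, 1, 1], "set.000": {"energy": [-1.0]}},
    "H2O1": {"atoms": [0, 1, 1], "set.000": {"energy": [-2.5]}},
}
matched, unmatched = compare_shared(db_one, db_two)
```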