5.1. create_mongodb_collections

pharmaforge.dbutils.create_mongodb_collections.add_theory_to_collection(collection_name, hdf5_file_path, level_of_theory)

Add a new level of theory to an existing MongoDB collection using data from an HDF5 file.

Parameters:
  • collection_name (pymongo.collection.Collection) – The MongoDB collection to update.

  • hdf5_file_path (str) – Path to the HDF5 file containing the new level of theory data.

  • level_of_theory (str) – The name of the new level of theory to add.

Return type:

None

Raises:
  • FileNotFoundError – If the specified HDF5 file does not exist.

  • KeyError – If the molecule ID does not exist in the collection.
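The documented error behavior can be illustrated with an in-memory stand-in. This is a minimal sketch, not the pharmaforge implementation: the function name add_theory_sketch is hypothetical, a plain dict keyed by molecule ID stands in for the MongoDB collection, a second dict stands in for the parsed HDF5 contents, and storing the new data under a key named after the level of theory is an assumption.

```python
from pathlib import Path


def add_theory_sketch(documents, hdf5_file_path, new_values, level_of_theory):
    """Illustrative stand-in for the documented behavior (hypothetical helper).

    documents:  dict mapping molecule ID -> document dict (stands in for the collection)
    new_values: dict mapping molecule ID -> data for the new level of theory
                (stands in for the parsed HDF5 contents)
    """
    # Mirror the documented FileNotFoundError for a missing HDF5 file.
    if not Path(hdf5_file_path).is_file():
        raise FileNotFoundError(f"HDF5 file not found: {hdf5_file_path}")

    # Mirror the documented KeyError for molecule IDs absent from the collection.
    # Checking every ID first avoids a partial update (a design choice of this sketch).
    missing = [mol_id for mol_id in new_values if mol_id not in documents]
    if missing:
        raise KeyError(f"Molecule IDs not in collection: {missing}")

    # Assumed layout: the new data is stored under the level-of-theory name.
    for mol_id, values in new_values.items():
        documents[mol_id][level_of_theory] = values
```

A caller that cannot guarantee its inputs would wrap the real call in a try/except for FileNotFoundError and KeyError in the same way.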

pharmaforge.dbutils.create_mongodb_collections.create_collection(collection_name, hdf5_file_path, level_of_theory='wB97M-D3(BJ)/def2-TZVPPD', data_source=None)

Create a MongoDB collection from an HDF5 file that contains SMILES notation.

Extra fields are also added: atom_coeff, if present, and related configuration fields.

Parameters:
  • collection_name (str) – Name of the MongoDB collection to create.

  • hdf5_file_path (str) – Path to the HDF5 file containing molecule data.

  • level_of_theory (str, optional) – Level of theory used for the calculations. Defaults to 'wB97M-D3(BJ)/def2-TZVPPD'.

  • data_source (str, optional) – Source of the data, if applicable. Defaults to None.

Return type:

None

Raises:

FileNotFoundError – If the specified HDF5 file does not exist.
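How the optional fields might be attached to a document can be sketched as follows. This is an illustration under stated assumptions, not the pharmaforge implementation: build_document_sketch is a hypothetical helper, and the field names smiles, level_of_theory, and data_source, as well as the overall document layout, are assumed. Only atom_coeff and the default level of theory come from the documentation above.

```python
# Default taken from the create_collection signature above.
DEFAULT_LEVEL_OF_THEORY = "wB97M-D3(BJ)/def2-TZVPPD"


def build_document_sketch(smiles, record,
                          level_of_theory=DEFAULT_LEVEL_OF_THEORY,
                          data_source=None):
    """Assemble one hypothetical document; the field layout is assumed."""
    document = {"smiles": smiles, "level_of_theory": level_of_theory}

    # Optional fields are added only when there is something to store.
    if data_source is not None:
        document["data_source"] = data_source
    if "atom_coeff" in record:
        document["atom_coeff"] = record["atom_coeff"]

    return document
```

Leaving absent fields out, rather than storing them as None, keeps documents small; whether the library does the same is not stated in this reference.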

pharmaforge.dbutils.create_mongodb_collections.process_hdf5_folder(folder_path, database_name, level_of_theory=None, data_source=None)

Process all HDF5 files in a folder and create MongoDB collections for each HDF5 file.

Parameters:
  • folder_path (str) – Path to the folder containing HDF5 files.

  • database_name (str) – Name of the MongoDB database to use.

  • level_of_theory (str, optional) – Level of theory to use for all collections.

  • data_source (str, optional) – Data source identifier for all collections.
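The one-collection-per-file behavior can be sketched with the standard library alone. This is a sketch under assumptions, not the pharmaforge implementation: plan_collections is a hypothetical helper, the recognized extensions (.hdf5 and .h5) are assumed, and naming each collection after the file stem is assumed as well.

```python
from pathlib import Path


def plan_collections(folder_path, extensions=(".hdf5", ".h5")):
    """Return (collection_name, file_path) pairs for the HDF5 files in a folder.

    Hypothetical helper: the extension list and the use of the file stem
    as the collection name are assumptions of this sketch.
    """
    folder = Path(folder_path)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder_path}")

    plan = []
    # Sort the entries so the processing order is deterministic.
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            plan.append((path.stem, str(path)))
    return plan
```

Each pair could then be passed to create_collection together with the shared level_of_theory and data_source values.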

Todo

  • Remove hardcoded MongoDB connection string
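One common way to address this item is to read the connection string from the environment. The sketch below is a suggestion, not the project's chosen approach: the variable name MONGODB_URI, the helper get_mongodb_uri, and the local fallback URI are all assumptions.

```python
import os

# Assumed fallback for local development; not taken from the pharmaforge source.
DEFAULT_MONGODB_URI = "mongodb://localhost:27017"


def get_mongodb_uri(environ=None):
    """Return the connection string from MONGODB_URI, or the local fallback.

    The environ parameter lets tests pass a plain dict instead of os.environ.
    """
    env = os.environ if environ is None else environ
    return env.get("MONGODB_URI", DEFAULT_MONGODB_URI)
```

The returned string can be passed to pymongo.MongoClient, which accepts a MongoDB URI as its first argument.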