5.2. mongo_utils

MongoDB utility functions for handling molecular data.

This module provides functions to create collections, add fields, modify documents, and visualize data in a MongoDB database. It also includes functions to create HDF5 files from query results and to count documents without SMILES strings.

pharmaforge.dbutils.mongo_utils.add_configurations_count(database_name, collection_name)

Add a ‘configurations_count’ field to all documents in the collection based on the number of configurations in the ‘set.000.force.npy’ array.

Parameters:
  • database_name (str) – Name of the MongoDB database.

  • collection_name (str) – Name of the collection within the database.

Returns:

Summary of how many documents were updated.

Return type:

dict
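The sketch below shows the counting step only. It assumes three things:

- Each document stores the raw bytes of a NumPy `.npy` file under a top-level field whose name contains no dots. MongoDB treats dots in field paths as nesting.
- The first axis indexes configurations, as in a DeePMD-kit style `force.npy` of shape `(n_frames, n_atoms * 3)`.
- `collection` is a PyMongo `Collection`.

The `array_field` argument, the `_sketch` function, and the `{"updated": ...}` summary shape are hypothetical. This is not the library's actual implementation.

```python
import io

import numpy as np


def count_configurations(npy_bytes: bytes) -> int:
    """Return the number of configurations in a serialized .npy force array.

    Assumes axis 0 indexes configurations (frames); a 1-D array is
    treated as a single configuration.
    """
    arr = np.load(io.BytesIO(npy_bytes))
    return int(np.atleast_2d(arr).shape[0])


def add_configurations_count_sketch(collection, array_field: str) -> dict:
    """Hypothetical version: set 'configurations_count' on each document.

    `collection` is assumed to be a pymongo Collection and `array_field`
    a hypothetical top-level field holding raw .npy bytes.
    """
    updated = 0
    # Project only the array field to avoid pulling whole documents.
    cursor = collection.find({array_field: {"$exists": True}}, {array_field: 1})
    for doc in cursor:
        n = count_configurations(bytes(doc[array_field]))
        result = collection.update_one(
            {"_id": doc["_id"]}, {"$set": {"configurations_count": n}}
        )
        updated += result.modified_count
    return {"updated": updated}
```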

pharmaforge.dbutils.mongo_utils.add_field_to_all_documents(database_name, collection_name, new_field_name, new_field_value)

Add a new field to all documents in a MongoDB collection.

Parameters:
  • database_name (str) – Name of the MongoDB database.

  • collection_name (str) – Name of the collection within the database.

  • new_field_name (str) – Name of the new field to add.

  • new_field_value (any) – Value to set for the new field.

Returns:

Summary of the update operation, including matched and modified counts.

Return type:

dict
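This operation corresponds to a single `update_many` call with an empty filter. The sketch below makes two assumptions:

- A PyMongo `Collection` is passed in directly. The documented function takes database and collection names instead.
- The summary uses PyMongo's `UpdateResult` counters.

`set_field_update` and the `_sketch` function are hypothetical names.

```python
from typing import Any


def set_field_update(field_name: str, value: Any) -> dict:
    """Build the update document that sets `field_name` to `value`."""
    return {"$set": {field_name: value}}


def add_field_to_all_documents_sketch(collection, field_name: str, value: Any) -> dict:
    # An empty filter {} matches every document in the collection.
    result = collection.update_many({}, set_field_update(field_name, value))
    return {
        "matched_count": result.matched_count,
        "modified_count": result.modified_count,
    }
```

Because `$set` overwrites, documents that already contain `field_name` have their existing value replaced. To touch only documents that lack the field, use the filter `{field_name: {"$exists": False}}` instead of `{}`.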

pharmaforge.dbutils.mongo_utils.clientloader(mongoclient='mongodb://localhost:27017/')

Create a client connected to the MongoDB server.

Parameters:

mongoclient (str) – The MongoDB client connection string. Default is ‘mongodb://localhost:27017/’.

Returns:

client – The MongoDB client object.

Return type:

MongoClient

pharmaforge.dbutils.mongo_utils.count_docs_without_smiles_for_all_collections(db: Database)

Prints the number of documents without SMILES in each collection of the MongoDB database.

Parameters:

db (Database) – The MongoDB database instance.

Returns:

Nothing. The count of documents without SMILES is printed for each collection.

Return type:

None
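The "without SMILES" condition can be expressed as one query filter applied to every collection. This sketch makes two assumptions:

- The SMILES string lives in a field named `smiles`, the default used by `plot_molecule_frequency_histogram`.
- A document lacks SMILES when that field is missing, `null`, or an empty string.

The helper names are hypothetical. Unlike the documented function, the sketch also returns the counts.

```python
def missing_smiles_filter(smiles_field: str = "smiles") -> dict:
    # In MongoDB, {"$in": [None, ...]} matches documents where the field is
    # null *or* absent; "" additionally catches blank SMILES strings.
    return {smiles_field: {"$in": [None, ""]}}


def count_docs_without_smiles_sketch(db, smiles_field: str = "smiles") -> dict:
    """Print and return per-collection counts; `db` is a pymongo Database."""
    query = missing_smiles_filter(smiles_field)
    counts = {}
    for name in db.list_collection_names():
        counts[name] = db[name].count_documents(query)
        print(f"{name}: {counts[name]} documents without SMILES")
    return counts
```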

pharmaforge.dbutils.mongo_utils.modify_field_by_query(database_name, collection_name, query, field_name, new_value)

Modify a specific field for documents matching the query criteria.

Parameters:
  • database_name (str) – Name of the MongoDB database.

  • collection_name (str) – Name of the collection within the database.

  • query (dict) – MongoDB query to filter documents.

  • field_name (str) – Name of the field to modify.

  • new_value (any) – New value to set for the field.

Returns:

Summary of the update operation, including matched and modified counts.

Return type:

dict
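The sketch below reproduces this behavior with PyMongo, assuming a `Collection` object is passed in. `field_update_spec` and the `_sketch` function are hypothetical names. Building the filter and update documents in a separate function makes it easy to inspect exactly what will be sent to the server.

```python
from typing import Any


def field_update_spec(query: dict, field_name: str, new_value: Any) -> tuple:
    """Return the (filter, update) pair for an update_many call."""
    return query, {"$set": {field_name: new_value}}


def modify_field_by_query_sketch(collection, query: dict, field_name: str, new_value: Any) -> dict:
    filter_doc, update_doc = field_update_spec(query, field_name, new_value)
    result = collection.update_many(filter_doc, update_doc)
    return {
        "matched_count": result.matched_count,
        "modified_count": result.modified_count,
    }
```

For example, `field_update_spec({"charge": {"$gt": 0}}, "is_cation", True)` would flag documents whose `charge` field is positive. Both field names are hypothetical. `modified_count` can be smaller than `matched_count`, because MongoDB does not count a document as modified if it already holds `new_value`.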

pharmaforge.dbutils.mongo_utils.modify_single_field(database_name, collection_name, field_name, old_value, new_value)

Replace old_value with new_value in field_name for every document whose field_name currently equals old_value.

Parameters:
  • database_name (str) – Name of the MongoDB database.

  • collection_name (str) – Name of the collection within the database.

  • field_name (str) – Name of the field to modify.

  • old_value (any) – The value to be replaced.

  • new_value (any) – The new value to set.

Returns:

Summary of the update operation, including matched and modified counts.

Return type:

dict
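`modify_single_field` can be viewed as a special case of `modify_field_by_query` in which the filter is an equality match on the field itself. The sketch below builds that filter/update pair. The helper name is hypothetical.

```python
from typing import Any


def single_field_spec(field_name: str, old_value: Any, new_value: Any) -> tuple:
    """Return the (filter, update) pair that replaces old_value with new_value.

    Pass the result to collection.update_many(*spec) with a pymongo Collection.
    """
    # Equality filter: only documents whose field currently equals old_value.
    filter_doc = {field_name: old_value}
    update_doc = {"$set": {field_name: new_value}}
    return filter_doc, update_doc
```

One caveat follows from standard MongoDB matching. If `field_name` holds an array, `{field_name: old_value}` also matches arrays that merely contain `old_value`, and `$set` then replaces the whole array.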

pharmaforge.dbutils.mongo_utils.plot_molecule_frequency_histogram(database_name, collection_name, smiles_field='smiles')

Plot a histogram of document counts for each unique molecule, identified by its SMILES string, in a MongoDB collection.

Parameters:
  • database_name (str) – Name of the MongoDB database.

  • collection_name (str) – Name of the collection within the database.

  • smiles_field (str) – Name of the field containing the SMILES string. Default is ‘smiles’.

Returns:

Nothing. The histogram of molecule counts is displayed.

Return type:

None
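Counting molecules per SMILES string can be done on the server with an aggregation pipeline, which leaves only the plotting to the client. The sketch below assumes PyMongo's `Collection.aggregate` and Matplotlib. Three choices are made here and may differ from the actual implementation:

- Documents with a missing or empty SMILES field are skipped.
- Rows are sorted by count.
- One bar is drawn per unique SMILES string.

The function names are hypothetical.

```python
def smiles_count_pipeline(smiles_field: str = "smiles") -> list:
    """Aggregation pipeline yielding {"_id": <smiles>, "count": <n>} rows."""
    return [
        # Skip documents whose SMILES field is missing, null, or empty.
        {"$match": {smiles_field: {"$nin": [None, ""]}}},
        # One output row per unique SMILES string, with its document count.
        {"$group": {"_id": f"${smiles_field}", "count": {"$sum": 1}}},
        # Most frequent molecules first.
        {"$sort": {"count": -1}},
    ]


def plot_smiles_counts(rows: list) -> None:
    """Draw a bar chart from the pipeline output rows."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot

    labels = [row["_id"] for row in rows]
    counts = [row["count"] for row in rows]
    fig, ax = plt.subplots(figsize=(max(6, 0.3 * len(labels)), 4))
    positions = list(range(len(labels)))
    ax.bar(positions, counts)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels, rotation=90, fontsize=6)
    ax.set_xlabel("Molecule (SMILES)")
    ax.set_ylabel("Number of documents")
    fig.tight_layout()
    plt.show()
```

With a PyMongo collection, usage would be `plot_smiles_counts(list(collection.aggregate(smiles_count_pipeline())))`.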