1. DataBase

class pharmaforge.database.DataBase[source]

Bases: object

A class to represent a pymongo database of molecules

Methods

add_data(filepath)

Add data to the database

find_all_unique([aslist])

Find all unique keys in the entire database

find_empty_molecules()

Find all empty molecules in the database

find_molecule(mol_key[, verbose])

Find a molecule in the database

find_shared_molecules(db_key1, db_key2)

Find all shared molecules between two databases

find_unique(db_key)

Find all unique keys in an individual database

obtain_feature(feature_key[, db_key])

Obtain a feature from the database

add_data(filepath)[source]

Add data to the database

Parameters:

filepath (str or Path) – The name or path to the data file to add. The file must be in hdf5 format.

Return type:

None

Raises:
  • ValueError – If the file type is not hdf5

  • FileNotFoundError – If the file is not found, or the path does not exist
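The two documented error conditions can be illustrated with a small standalone check. This is a minimal sketch, not the library's implementation: the helper name check_hdf5_path, the accepted suffixes, and the order of the two checks are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: these suffixes count as HDF5; the library's real check may differ.
HDF5_SUFFIXES = {".hdf5", ".h5"}


def check_hdf5_path(filepath):
    """Return filepath as a Path, or raise the errors documented for add_data."""
    path = Path(filepath)
    # Assumption: the file type is checked before the file's existence.
    if path.suffix.lower() not in HDF5_SUFFIXES:
        raise ValueError(f"Expected an HDF5 file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return path
```

Running a check like this before calling add_data lets a script report a bad path early, since add_data itself returns None on success.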

find_all_unique(aslist=False)[source]

Find all unique keys in the entire database

Parameters:

aslist (bool) – Whether to return the unique keys as a list (True) or as a dictionary (False). Default is False

Returns:

unique – A dictionary of all unique keys in the database

Return type:

dictionary of lists

find_empty_molecules()[source]

Find all empty molecules in the database

Returns:

empty – A dictionary of all empty molecules in the database

Return type:

dictionary of lists

find_molecule(mol_key, verbose=True)[source]

Find a molecule in the database

Parameters:

mol_key (str) – The key of the molecule to search

Returns:

mol – The molecule data

Return type:

dict

find_shared_molecules(db_key1, db_key2)[source]

Find all shared molecules between two databases

Parameters:
  • db_key1 (str) – The key of the first database to search

  • db_key2 (str) – The key of the second database to search

Returns:

shared – A list of all shared molecules between the two databases

Return type:

list
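Conceptually, finding shared molecules is a set intersection over molecule keys. The sketch below shows that idea on plain Python data. It does not use the DataBase class, and the function name shared_keys, the database keys, and the molecule keys are hypothetical.

```python
def shared_keys(databases, db_key1, db_key2):
    """Return the sorted molecule keys present in both named databases."""
    first = set(databases[db_key1])
    second = set(databases[db_key2])
    return sorted(first & second)


# Hypothetical database keys and molecule keys, for illustration only.
databases = {
    "set_a": ["mol_001", "mol_002", "mol_003"],
    "set_b": ["mol_002", "mol_003", "mol_004"],
}
print(shared_keys(databases, "set_a", "set_b"))  # ['mol_002', 'mol_003']
```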

find_unique(db_key)[source]

Find all unique keys in an individual database

Parameters:

db_key (str) – The key of the database to search

Returns:

unique – A list of all unique keys in the database

Return type:

list

obtain_feature(feature_key, db_key=None)[source]

Obtain a feature from the database

Parameters:
  • feature_key (str) – The key of the feature to obtain

  • db_key (str) – The key of the database to search

Returns:

feature – A list of the feature values

Return type:

list
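One way to picture the optional db_key argument is as a filter applied before the feature values are collected. The sketch below works on a plain list of dictionaries. The record layout (a "db" field naming the source database), the function name collect_feature, and the choice to skip records that lack the feature are assumptions, not the library's documented behaviour.

```python
def collect_feature(records, feature_key, db_key=None):
    """Collect feature_key from each record, optionally restricted to one database."""
    values = []
    for record in records:
        # With db_key=None every record is considered.
        if db_key is not None and record.get("db") != db_key:
            continue
        # Assumption: records without the feature are skipped rather than reported.
        if feature_key in record:
            values.append(record[feature_key])
    return values


# Hypothetical records, for illustration only.
records = [
    {"name": "mol_001", "db": "set_a", "energy": -1.5},
    {"name": "mol_002", "db": "set_b", "energy": -2.0},
    {"name": "mol_003", "db": "set_a"},
]
print(collect_feature(records, "energy"))                  # [-1.5, -2.0]
print(collect_feature(records, "energy", db_key="set_a"))  # [-1.5]
```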

class pharmaforge.database.Loader(client_addr='mongodb://localhost:27017/', database_name=None, collection_name=None)[source]

Bases: object

A class to access the database once it has been loaded into MongoDB

Parameters:
  • client_addr (str, optional) – The address of the client to open. Default is ‘mongodb://localhost:27017/’

  • database_name (str, optional) – The name of the database to open. Default is None

  • collection_name (str, optional) – The name of the collection to open. Default is None

client

The MongoDB client

Type:

pymongo.MongoClient

db

The database object

Type:

pymongo.database.Database

collection

The collection object

Type:

pymongo.collection.Collection

Methods

list_collection_entry([entry])

List all the data in the selected entry of the collection

list_collections([verbose])

List all collections in the selected database

list_db_names()

List all database names in the client

select_collection(collection_name)

Select a collection from the database

select_db(db_name[, verbose])

Select a database from the client

list_collection_entry(entry=None)[source]

List all the data in the selected entry of the collection

Parameters:

entry (str) – The name of the entry to list. If None, all entries will be listed

Return type:

None

list_collections(verbose=True)[source]

List all collections in the selected database

Parameters:

verbose (bool) – Whether to print the collection names or not

Returns:

collection_names – A list of all collection names in the database

Return type:

list

list_db_names()[source]

List all database names in the client

Parameters:

None

Returns:

db_names – A list of all database names in the client

Return type:

list

select_collection(collection_name)[source]

Select a collection from the database

Parameters:

collection_name (str) – The name of the collection to select

Return type:

None

select_db(db_name, verbose=True)[source]

Select a database from the client

Parameters:

db_name (str) – The name of the database to select

Return type:

None
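The client, db, and collection attributes follow the usual pymongo hierarchy of client, then database, then collection. The sketch below walks that hierarchy using only list_database_names(), list_collection_names(), and item access, all of which pymongo provides. The stub classes are in-memory stand-ins so the example runs without a MongoDB server. They, the helper open_collection, and the sample names are hypothetical and are not part of pharmaforge.

```python
class StubDatabase(dict):
    """In-memory stand-in for pymongo.database.Database (illustration only)."""

    def list_collection_names(self):
        return sorted(self)


class StubClient(dict):
    """In-memory stand-in for pymongo.MongoClient (illustration only)."""

    def list_database_names(self):
        return sorted(self)


def open_collection(client, database_name, collection_name):
    """Walk client -> database -> collection, checking each name on the way."""
    if database_name not in client.list_database_names():
        raise KeyError(f"Unknown database: {database_name}")
    db = client[database_name]
    if collection_name not in db.list_collection_names():
        raise KeyError(f"Unknown collection: {collection_name}")
    return db[collection_name]


# Hypothetical database and collection names, for illustration only.
client = StubClient(molecules=StubDatabase(set_a=[{"_id": "mol_001"}]))
print(client.list_database_names())                   # ['molecules']
print(open_collection(client, "molecules", "set_a"))  # [{'_id': 'mol_001'}]
```

With a running server, a pymongo.MongoClient("mongodb://localhost:27017/") instance could be passed in place of the stub, in which case the function returns a pymongo collection object rather than a list.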