8.1. Comparator - Statistical Comparisons

class pharmaforge.stats.comparator.Comparator(client_addr=None, database_name=None, collection_name=None)[source]

Bases: object

Performs operations on data stored in the database, such as comparing data computed at different levels of theory.

client

The MongoDB client object used to connect to the database.

Type:

pharmaforge.database.Loader

Parameters:
  • client_addr (str, optional) – The address of the MongoDB client. Default is None, which uses the default address “mongodb://localhost:27017”.

  • database_name (str, optional) – The name of the MongoDB database. Default is None.

  • collection_name (str, optional) – The name of the MongoDB collection. Default is None.

Methods

calculate(first_level, second_level[, ...])

Calculate the mean absolute error or mean squared error between two levels of theory for a given quantity.

calculate(first_level, second_level, quantity='energies', query='nmols gt 0', comparetype='MAE', verbose=False)[source]

Calculate the mean absolute error or mean squared error between two levels of theory for a given quantity.

This function compares the data from two different levels of theory for a given quantity (e.g., energies or forces) and then calculates the mean absolute error (MAE) or mean squared error (MSE) between them. The comparison is done for all molecules in the database that match the given query, assuming that the same data is present at both levels of theory (some molecules may not have data for both levels, or the SCF may not have converged during the relabeling of the data).
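As an illustration of the two metrics named above, the following is a minimal sketch and not pharmaforge's implementation. The helper `compare_values`, the dictionary layout keyed by molecule identifier, and the choice to skip molecules that lack data at either level are all assumptions made for the example.

```python
from math import fsum


def compare_values(first, second, comparetype="MAE"):
    """Return the MAE or MSE over entries present in both mappings.

    `first` and `second` map a molecule identifier to a list of floats
    (a hypothetical layout, not pharmaforge's actual storage format).
    """
    errors = []
    for key, a_vals in first.items():
        b_vals = second.get(key)
        # Assumption: skip molecules that lack data at either level of theory.
        if a_vals is None or b_vals is None or len(a_vals) != len(b_vals):
            continue
        errors.extend(a - b for a, b in zip(a_vals, b_vals))
    if not errors:
        raise ValueError("no overlapping data to compare")
    if comparetype == "MAE":
        # Mean absolute error: average of |a - b|.
        return fsum(abs(e) for e in errors) / len(errors)
    if comparetype == "MSE":
        # Mean squared error: average of (a - b)^2.
        return fsum(e * e for e in errors) / len(errors)
    raise ValueError(f"unknown comparetype: {comparetype!r}")


# Toy data: "mol3" and "mol4" each lack data at one level and are skipped.
reference = {"mol1": [1.0, 2.0], "mol2": [3.0], "mol3": None}
model = {"mol1": [1.5, 1.0], "mol2": [3.0], "mol4": [9.0]}
mae = compare_values(reference, model, "MAE")
mse = compare_values(reference, model, "MSE")
```

Both metrics average over the same set of element-wise differences; MSE weights large deviations more heavily because each difference is squared.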

Parameters:
  • first_level (str) – The first level of theory.

  • second_level (str) – The second level of theory.

  • quantity (str, optional) – The quantity to compare. Default is “energies”. The other option is “forces”.

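  • query (str, optional) – The query to filter the data. Default is “nmols gt 0”. This is a MongoDB query string.

  • comparetype (str, optional) – The error metric to calculate. Default is “MAE” (mean absolute error).

  • verbose (bool, optional) – Verbosity flag. Default is False.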

Returns:

The mean absolute error or mean squared error between the two levels of theory for the given quantity, depending on comparetype.

Return type:

float

Raises:

ValueError – If the database is not found or if the query is invalid.

Example

>>> from pharmaforge.stats.comparator import Comparator
>>> compare = Comparator(
...     client_addr="mongodb://localhost:27017/",
...     database_name="QDPi1_Database",
... )
>>> results = compare.calculate(
...     first_level="wB97X/6-31G*-DFTB3",
...     second_level="QDPi1",
...     quantity="forces",
...     query="nmols gt 0",
...     comparetype="MAE"
... )
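The call above can be wrapped to collect more than one metric and to handle the documented ValueError. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, the string “MSE” is an assumed value for comparetype inferred from the method description, and `StubComparator` is a stand-in so the example runs without a MongoDB server.

```python
def summarize(comparator, first_level, second_level, quantity="energies"):
    """Collect MAE and MSE from a Comparator-like object.

    Returns a dict mapping the metric name to a float, or to None when
    calculate() raises ValueError.
    """
    summary = {}
    for metric in ("MAE", "MSE"):  # "MSE" is an assumed comparetype value
        try:
            summary[metric] = comparator.calculate(
                first_level=first_level,
                second_level=second_level,
                quantity=quantity,
                comparetype=metric,
            )
        except ValueError:
            # Documented failure mode: database not found or invalid query.
            summary[metric] = None
    return summary


class StubComparator:
    """Hypothetical stand-in so the sketch runs without a database."""

    def calculate(self, first_level, second_level, quantity="energies",
                  query="nmols gt 0", comparetype="MAE", verbose=False):
        if comparetype == "MAE":
            return 0.25
        raise ValueError("unsupported comparetype in this stub")


summary = summarize(StubComparator(), "wB97X/6-31G*-DFTB3", "QDPi1", "forces")
```

With a real Comparator instance in place of the stub, the same helper would forward each call to the database-backed calculate method.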