Welcome to PharmaForge’s documentation!

This is the documentation for the PharmaForge package, which is designed to store, process, and query molecular data in MongoDB. The dataset includes hierarchical molecular information, which makes a NoSQL database such as MongoDB a natural choice for scalability and flexibility.
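To illustrate why a document store suits hierarchical data, the sketch below shows what a single molecule record might look like. It is a hypothetical layout: only `contains_elements` appears elsewhere in this documentation, and every other field name and value is an assumption, not PharmaForge's actual schema.

```python
import json

# Hypothetical record for one molecule. Only `contains_elements` is taken
# from the documentation; every other field name is an assumption.
molecule_doc = {
    "name": "water",
    "contains_elements": ["H", "O"],
    "frames": [
        {
            "atoms": ["O", "H", "H"],
            # Illustrative coordinates in angstrom, not reference data.
            "coordinates": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
            # Labels grouped per level of theory; values left empty here.
            "labels": {"xtb": {}, "gaussian": {}},
        }
    ],
}

# Nested dicts and lists map directly onto a MongoDB document, so the
# record can be serialized without flattening it into tables.
serialized = json.dumps(molecule_doc, indent=2)
print(serialized)
```

Keeping frames and their labels nested under the molecule means a new level of theory can be added to one record without changing a fixed table layout.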

The workflow includes:

  • Storing molecular data in MongoDB

  • Processing the data using the PharmaForge package

  • Labeling the data with the help of the PharmaForge package (e.g., adding SMILES notation)

  • Querying the data using MongoDB’s powerful query language

  • Exporting the data to the DeepMD-kit HDF5 format for machine learning potential (MLP) training

  • Relabeling data within the database using the PharmaForge package

PharmaForge is designed to be easy to use, flexible across a wide range of molecular systems, and easily queryable, so that you can find the data you need.

The package includes interfaces to a wide variety of electronic structure codes, including XTB, DFTB+, Gaussian, and Psi4, as well as a direct interface to the DeepMD-kit library.

There is also a set of tools for querying the data within the database through the query API, which translates a short query string into a MongoDB filter:

>>> from pharmaforge.queries.Query import Query
>>> query = "contains_elements only ['H', 'O']"
>>> q = Query(query)
>>> print(q.parsed_query)
{'contains_elements': {'$not': {'$elemMatch': {'$nin': ['H', 'O']}}}}
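To make the meaning of the parsed filter concrete, the following sketch emulates it in plain Python. It is not PharmaForge or pymongo code: the `matches_only` helper and the sample documents are hypothetical and serve only to illustrate how `$not`, `$elemMatch`, and `$nin` combine.

```python
def matches_only(doc, allowed):
    """Emulate {'contains_elements': {'$not': {'$elemMatch': {'$nin': allowed}}}}.

    `$elemMatch` with `$nin` matches when at least one array element lies
    outside `allowed`; `$not` inverts that, so a document passes only when
    no element falls outside the allowed set.
    """
    elements = doc.get("contains_elements", [])
    return not any(element not in allowed for element in elements)


# Hypothetical documents, reduced to the fields needed for the illustration.
sample_docs = [
    {"name": "water", "contains_elements": ["H", "O"]},
    {"name": "hydrogen", "contains_elements": ["H"]},
    {"name": "methanol", "contains_elements": ["C", "H", "O"]},
]

matched = [doc["name"] for doc in sample_docs if matches_only(doc, ["H", "O"])]
print(matched)  # ['water', 'hydrogen']
```

Under this reading, `only` is a subset test: a molecule containing just hydrogen also matches, while anything with an additional element is excluded. With a live database, the parsed dictionary would presumably be passed as the filter argument to pymongo's `collection.find`, although this page does not show that step.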

Folder Structure

The folder structure of the project is as follows:

digraph G {
    rankdir=LR;
    node [shape=folder];

    "mldatabase" [label="mldatabase/"];
    "docs" [label="docs/"];
    "src" [label="src/"];
    "pharmaforge" [label="pharmaforge/"];
    "queries" [label="queries/"];
    "dbutils" [label="dbutils/"];
    "interfaces" [label="interfaces/"];
    "scripts" [label="scripts/"];
    "recipes" [label="recipes/"];
    "io" [label="io/"];
    "labeling" [label="labeling/"];
    "stats" [label="stats/"];
    "examples" [label="examples/"];

    "mldatabase" -> "docs";
    "mldatabase" -> "examples";
    "examples" -> "AddSmiles/";
    "examples" -> "DatabaseGeneration/";
    "examples" -> "QDPi1_Generation/";
    "examples" -> "QDPi2_Generation/";
    "examples" -> "DatabaseQuerying/";
    "examples" -> "CalculationInterfaces/";
    "examples" -> "Relabeling/";
    "examples" -> "CalculateMLP/";
    "mldatabase" -> "src";
    "src" -> "pharmaforge";
    "pharmaforge" -> "queries";
    "pharmaforge" -> "dbutils";
    "pharmaforge" -> "interfaces";
    "pharmaforge" -> "scripts";
    "pharmaforge" -> "recipes";
    "pharmaforge" -> "io";
    "pharmaforge" -> "labeling";
    "pharmaforge" -> "stats";


    node [shape=box];
    "pharmaforge" -> "DataBase.py";
    "dbutils" -> "create_mongodb_collections.py";
    "dbutils" -> "mongo_utils.py";
    "interfaces" -> "gaussianio.py";
    "interfaces" -> "xtbio.py";
    "interfaces" -> "psi4io.py";
    "interfaces" -> "deepmdio.py";
    "interfaces" -> "dftbio.py";
    "interfaces" -> "abstract.py";
    "interfaces" -> "deepio.py";
    "interfaces" -> "scanning.py";
    "scripts" -> "run_chunk_relabel.py";
    "recipes" -> "GeneralDatabase.py";
    "recipes" -> "QDPi1.py";
    "recipes" -> "QDPi2.py";
    "io" -> "hdf5_utils.py";
    "labeling" -> "get_smiles.py";
    "labeling" -> "relabeler.py";
    "labeling" -> "smiles.py";
    "queries" -> "Query.py";
    "stats" -> "comparator.py";
}

Folder structure for the project.

User Documentation