Welcome to PharmaForge’s documentation!
This is the documentation for the PharmaForge package, which is designed to store, process, and query molecular data in MongoDB. The dataset includes hierarchical molecular information, which makes a NoSQL document database such as MongoDB a good fit for scalability and flexibility.
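As a rough illustration of why a document store suits this kind of data, the sketch below builds one nested record as a plain Python dictionary. Only the `contains_elements` field name comes from the query example in this documentation; every other field name and all numeric values are hypothetical placeholders, not the PharmaForge schema.

```python
import json

# Hypothetical record layout. Only "contains_elements" appears in this
# documentation; all other keys and values are illustrative placeholders.
molecule_doc = {
    "name": "water",
    "contains_elements": ["H", "O"],
    "frames": [
        {
            "coordinates": [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
            "labels": {"method": "placeholder-method", "energy": 0.0},
        }
    ],
}

# Nested fields are reached by walking the hierarchy directly.
first_energy = molecule_doc["frames"][0]["labels"]["energy"]

# The record survives a JSON round trip unchanged: it is already in the
# nested shape that a document database stores without flattening.
round_tripped = json.loads(json.dumps(molecule_doc))
print(round_tripped == molecule_doc)
```

A relational layout would typically split such a record across several tables (molecules, frames, labels), whereas a document store keeps it as one unit.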
The workflow includes:
- Storing molecular data in MongoDB
- Processing the data using the PharmaForge package
- Labeling the data with the help of the PharmaForge package (e.g., adding SMILES notation)
- Querying the data using MongoDB’s query language
- Exporting the data to the DeepMD-kit HDF5 format for machine-learning potential (MLP) training
- Relabeling data within the database using the PharmaForge package
The design principles of the PharmaForge package are ease of use, flexibility across a wide range of molecular systems, and easy querying, so that you can find the data you need.
The package includes interfaces to a wide variety of electronic structure codes, including XTB, DFTB+, Gaussian, and Psi4, as well as a direct interface to the DeepMD-kit library.
There is also a set of tools for querying the data within the database through the querying API:
>>> from pharmaforge.queries.Query import Query
>>> query = "contains_elements only ['H', 'O']"
>>> q = Query(query)
>>> print(q.parsed_query)
{'contains_elements': {'$not': {'$elemMatch': {'$nin': ['H', 'O']}}}}
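To make the parsed filter above easier to read, here is a minimal plain-Python sketch of what that MongoDB expression selects. It does not use PharmaForge or a database; the helper `matches_only` and the sample records are hypothetical and exist only to mirror the `$not` / `$elemMatch` / `$nin` logic on in-memory dictionaries.

```python
def matches_only(doc, allowed, field="contains_elements"):
    """Emulate {field: {"$not": {"$elemMatch": {"$nin": allowed}}}}."""
    value = doc.get(field)
    if not isinstance(value, list):
        # $elemMatch only matches arrays, so the $not branch accepts
        # documents where the field is missing or is not an array.
        return True
    # $elemMatch with $nin means "some element lies outside `allowed`";
    # $not negates that, leaving arrays whose elements are all allowed.
    return not any(element not in allowed for element in value)


# Hypothetical sample records, for illustration only.
docs = [
    {"name": "water", "contains_elements": ["H", "O"]},
    {"name": "hydrogen", "contains_elements": ["H"]},
    {"name": "methanol", "contains_elements": ["C", "H", "O"]},
]

selected = [d["name"] for d in docs if matches_only(d, ["H", "O"])]
print(selected)  # ['water', 'hydrogen']
```

Note that the filter expresses a subset condition: a record containing only hydrogen still matches, because none of its elements falls outside the allowed list. A dictionary of this shape is what pymongo's `Collection.find` accepts as its filter argument.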
Folder Structure
The folder structure of the project is as follows:
![digraph G {
rankdir=LR;
node [shape=folder];
"mldatabase" [label="mldatabase/"];
"docs" [label="docs/"];
"src" [label="src/"];
"pharmaforge" [label="pharmaforge/"];
"queries" [label="queries/"];
"dbutils" [label="dbutils/"];
"interfaces" [label="interfaces/"];
"scripts" [label="scripts/"];
"recipes" [label="recipes/"];
"io" [label="io/"];
"labeling" [label="labeling/"];
"examples" [label="examples/"];
"mldatabase" -> "docs";
"mldatabase" -> "examples";
"examples" -> "AddSmiles/";
"examples" -> "DatabaseGeneration/";
"examples" -> "QDPi1_Generation/";
"examples" -> "QDPi2_Generation/";
"examples" -> "DatabaseQuerying/";
"examples" -> "CalculationInterfaces/";
"examples" -> "Relabeling/";
"examples" -> "CalculateMLP/";
"mldatabase" -> "src";
"src" -> "pharmaforge";
"pharmaforge" -> "queries";
"pharmaforge" -> "dbutils";
"pharmaforge" -> "interfaces";
"pharmaforge" -> "scripts";
"pharmaforge" -> "recipes";
"pharmaforge" -> "io";
"pharmaforge" -> "labeling";
"pharmaforge" -> "stats";
node [shape=box];
"pharmaforge" -> "DataBase.py";
"dbutils" -> "create_mongodb_collections.py";
"dbutils" -> "mongo_utils.py";
"interfaces" -> "gaussianio.py";
"interfaces" -> "xtbio.py";
"interfaces" -> "psi4io.py";
"interfaces" -> "deepmdio.py";
"interfaces" -> "dftbio.py";
"interfaces" -> "abstract.py";
"interfaces" -> "deepio.py";
"interfaces" -> "scanning.py";
"scripts" -> "run_chunk_relabel.py";
"recipes" -> "GeneralDatabase.py";
"recipes" -> "QDPi1.py";
"recipes" -> "QDPi2.py";
"io" -> "hdf5_utils.py";
"labeling" -> "get_smiles.py";
"labeling" -> "relabeler.py";
"labeling" -> "smiles.py";
"queries" -> "Query.py";
"stats" -> "comparator.py";
}](_images/graphviz-e84b2c30ac8336aaa5dbed9166550f9ee002955c.png)
Folder structure for the project.
User Documentation
- 1. Installation
- 2. Query Syntax
- 3. Examples
  - 3.1. Adding SMILES codes to HDF5 Files
  - 3.2. Building a MongoDB Database
  - 3.3. Generating QDPi1 MongoDB Database
  - 3.4. Generating QDPi2 MongoDB Database
  - 3.5. Querying the MongoDB Database
  - 3.6. Interfaces with ASE
  - 3.7. Running DeepMD-Kit with PharmaForge
  - 3.8. Relabeling Existing Data within the Database
  - 3.9. Generating Free Energy Profiles