7.1. Query Class
This module provides a class for querying the MongoDB database and accessing the data stored in it.
- class pharmaforge.queries.query.Query(querystring: str)[source]
Bases:
object
This is a class for querying a MongoDB database. It takes a query string and parses it into a dictionary format that can be used to query the database. The query string can contain various operators such as ‘and’, ‘or’, ‘not’, ‘any’, and field-value pairs. The class also provides methods to display the query, apply it to the database, and convert the results to an HDF5 file format.
- Parameters:
querystring (str) – The query string to be parsed.
- theorylevels
A dictionary to store the theory levels of the molecules and how many molecules match each level.
- Type:
dict
Notes
The query string should be in the format: field operator value
Queries can be run using the apply method of the Query class, like this:
    query = "nmols eq 5"
    q = Query(query)
    results = q.apply(db, "ani_qdpi")
    for molecule in results:
        print(str(molecule.get('molecule_id')))
See also
pymongo
The MongoDB driver for Python.
h5py
The HDF5 file format library for Python.
Methods
apply(db, collection[, verbose])  Returns the results of the query.
display_query()  Displays the parsed query in a readable format.
parse_query(querystring)  Parses the query string and returns a dictionary of the query.
result_to_ase(result)  Converts the result to an ASE structure.
results_to_deepmdkit(hdf5_filename[, ...])  Create an HDF5 file from MongoDB query results.
- apply(db, collection, verbose=True)[source]
Returns the results of the query.
- Parameters:
db (pymongo.database.Database) – The MongoDB database to query.
collection (str) – The name of the collection to query.
verbose (bool, optional) – If True, prints additional information about the query results. Default is True.
- Returns:
A list of documents matching the query.
- Return type:
list
- Raises:
ValueError – If the query string is invalid or contains unsupported operators.
Notes
The function applies the parsed query to the specified collection in the MongoDB database and returns the results as a list of documents. The function also prints the number of documents found and the theory levels of the molecules in the results.
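To make concrete what "applying" a parsed query means, the following is a minimal sketch that evaluates a filter of the shape shown in the parse_query examples against plain Python dictionaries. It is not how apply is implemented: with a live database the same filter would normally be passed to pymongo, for example db[collection].find(parsed_query). The helper names matches and _field_matches and the sample documents are hypothetical, and only scalar comparison operators plus $not, $and and $or are covered.

```python
import operator

# Scalar comparison operators, named as in MongoDB filter documents.
_COMPARE = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def _field_matches(value, condition):
    """Check one field value against an operator document such as {'$gt': 1}."""
    for op, operand in condition.items():
        if op == "$not":
            # $not inverts the nested operator document.
            if _field_matches(value, operand):
                return False
        elif not _COMPARE[op](value, operand):
            return False
    return True


def matches(document, query):
    """Return True if `document` satisfies `query` (hypothetical helper)."""
    for key, condition in query.items():
        if key == "$and":
            if not all(matches(document, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(document, sub) for sub in condition):
                return False
        elif not _field_matches(document.get(key), condition):
            return False
    return True


# Made-up documents; every queried field is present in each of them.
docs = [
    {"molecule_id": "a", "nmols": 5},
    {"molecule_id": "b", "nmols": 1},
    {"molecule_id": "c", "nmols": 6},
]
parsed_query = {"$and": [{"nmols": {"$gt": 4}}, {"nmols": {"$lt": 7}}]}
selected = [d["molecule_id"] for d in docs if matches(d, parsed_query)]
print(selected)
```

The sketch deliberately ignores array fields such as contains_elements, whose matching rules in MongoDB differ from scalar comparison.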
- display_query()[source]
Displays the parsed query in a readable format, printing it as indented JSON.
This function is useful for debugging and understanding the structure of the query, and seeing it across multiple lines.
- Parameters:
None
- Return type:
None
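As a rough illustration of the multi-line JSON view described above, a parsed query dictionary can be pretty-printed with the standard library. This sketch assumes the output resembles json.dumps with an indent; the exact formatting used by display_query is not stated here, and the dictionary below is only an example filter.

```python
import json

# Example filter in the shape returned by parse_query.
parsed_query = {"$and": [{"nmols": {"$gt": 4}}, {"nmols": {"$lt": 7}}]}

# indent=4 spreads the nested structure over several lines.
formatted = json.dumps(parsed_query, indent=4)
print(formatted)
```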
- parse_query(querystring)[source]
Parses the query string and returns a dictionary of the query.
This function interprets the query string and converts it into a dictionary format that can be used to query a MongoDB database. The query string can contain various operators such as ‘and’, ‘or’, ‘not’, ‘any’, and field-value pairs.
- Parameters:
querystring (str) – The query string to be parsed.
- Returns:
A dictionary representing the parsed query.
- Return type:
dict
- Raises:
ValueError – If the query string is invalid or contains unsupported operators.
Notes
The query string should be in the format: field operator value where ‘field’ is the name of the field to query, ‘operator’ is the comparison operator (e.g., ‘gt’, ‘lt’, ‘gte’, ‘lte’, ‘eq’, ‘ne’, ‘any’, ‘all’), and ‘value’ is the value to compare against.
Here are some examples:
    example_queries = [
        "nmols eq 5",
        "not nmols eq 5",
        "nmols eq 1",
        "contains_elements any [H,N,O,C]",
        "contains_elements any [H,O] and contains_elements any [C]",
        "contains_elements any [H,O] and not contains_elements any [C]",
        "contains_elements any [H,N,O,C] or nmols gt 1",
        "molecular_charge eq -1",
        "not molecular_charge eq 0",
    ]
Examples
    >>> from pharmaforge.queries.query import Query
    >>> q = Query("nmols eq 5")
    >>> q.parse_query("nmols eq 5")
    {'nmols': {'$eq': 5}}
    >>> q = Query("not nmols eq 5")
    >>> q.parse_query("not nmols eq 5")
    {'nmols': {'$not': {'$eq': 5}}}
    >>> q = Query("contains_elements any [H,N,O,C]")
    >>> q.parse_query("contains_elements any [H,N,O,C]")
    {'contains_elements': {'$not': {'$nin': ['H', 'N', 'O', 'C']}}}
    >>> q = Query("contains_elements any [H,O] and contains_elements any [C]")
    >>> q.parse_query("contains_elements any [H,O] and contains_elements any [C]")
    {'$and': [{'contains_elements': {'$not': {'$nin': ['H', 'O']}}}, {'contains_elements': {'$not': {'$nin': ['C']}}}]}
    >>> q = Query("contains_elements any [H,O] and not contains_elements any [C]")
    >>> q.parse_query("contains_elements any [H,O] and not contains_elements any [C]")
    {'$and': [{'contains_elements': {'$not': {'$nin': ['H', 'O']}}}, {'contains_elements': {'$not': {'$not': {'$nin': ['C']}}}}]}
    >>> q = Query("contains_elements any [H,N,O,C] or nmols gt 1")
    >>> q.parse_query("contains_elements any [H,N,O,C] or nmols gt 1")
    {'$or': [{'contains_elements': {'$not': {'$nin': ['H', 'N', 'O', 'C']}}}, {'nmols': {'$gt': 1}}]}
    >>> q = Query("molecular_charge eq -1")
    >>> q.parse_query("molecular_charge eq -1")
    {'molecular_charge': {'$eq': -1}}
    >>> q = Query("not molecular_charge eq 0")
    >>> q.parse_query("not molecular_charge eq 0")
    {'molecular_charge': {'$not': {'$eq': 0}}}
    >>> q = Query("contains_elements only [H,N,O,C]")
    >>> q.parse_query("contains_elements only [H,N,O,C]")
    {'contains_elements': {'$not': {'$elemMatch': {'$nin': ['H', 'N', 'O', 'C']}}}}
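The mapping from "field operator value" to a MongoDB filter can be sketched for the simplest case, a single comparison clause with an optional leading "not". This is an illustrative stand-in, not the parser used by Query: the names OPERATORS, parse_value and parse_clause are hypothetical, and the list operators ('any', 'all', 'only') and the 'and'/'or' connectives are left out.

```python
# Query-string comparison operators and their MongoDB equivalents.
OPERATORS = {
    "eq": "$eq",
    "ne": "$ne",
    "gt": "$gt",
    "gte": "$gte",
    "lt": "$lt",
    "lte": "$lte",
}


def parse_value(text):
    """Convert a value token to int or float when possible, else keep the string."""
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


def parse_clause(clause):
    """Parse '[not] field operator value' into a filter dict (hypothetical helper)."""
    words = clause.split()
    negate = bool(words) and words[0] == "not"
    if negate:
        words = words[1:]
    if len(words) < 3:
        raise ValueError(f"expected 'field operator value', got: {clause!r}")
    field, op = words[0], words[1]
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    condition = {OPERATORS[op]: parse_value(" ".join(words[2:]))}
    if negate:
        # Negation wraps the operator document, as in the examples above.
        condition = {"$not": condition}
    return {field: condition}


print(parse_clause("nmols eq 5"))
print(parse_clause("not molecular_charge eq 0"))
```

For the comparison-only inputs it handles, the sketch produces dictionaries of the same shape as the documented parse_query output, and it raises ValueError on an unknown operator.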
- static result_to_ase(result)[source]
Converts the result to an ASE structure.
- Parameters:
result (dict) – The result from the database query.
- Returns:
ase_structures – A list of ASE Atoms objects representing the structure.
- Return type:
list
Notes
This function is a placeholder and should be implemented based on the specific requirements of the ASE structure.
- results_to_deepmdkit(hdf5_filename, level_of_theory=None, second_level_of_theory=None, verbose=False)[source]
Create an HDF5 file from MongoDB query results.
- Parameters:
hdf5_filename (str) – Name of the HDF5 file to create (without extension).
level_of_theory (str) – The level of theory to be used for the HDF5 file. This is required to specify the forces and energies.
second_level_of_theory (str, optional) – An additional level of theory to be used if training a delta MLP. Default is None.
verbose (bool, optional) – If True, prints additional information about the HDF5 file creation process. Default is False.
- Returns:
The function creates an HDF5 file and does not return any value.
- Return type:
None
Notes
The function creates an HDF5 file with the specified filename and stores the query results in it.
As an example:
    query = "nmols gt 4 and nmols lt 7"
    q = Query(query)
    print("*****************************************")
    print(f"Query: {q.querystring}")
    print(f"Parsed query: {q.parsed_query}")
    q.display_query()
    results = q.apply(db, "ani_qdpi")
    for molecule in results:
        print(str(molecule.get('molecule_id')))
    q.results_to_deepmdkit("saved_model.hdf5")