2. Query Syntax

The pharmaforge package implements a query syntax for building complex queries in a simple, readable format. A query is built from keywords, operators, and values, which can be combined into larger expressions to filter data.

An example of querying a dataset is given in Querying the MongoDB Database.

2.1. Basic queries

The basic form of a query is as follows:

query = "field operator value"

Where:

  • field is the name of the field you want to query.

  • operator is the operator you want to use (e.g., eq, ne, gt, lt, gte, lte, any).

  • value is the value you want to compare against.

For example:

query = "nmols eq 5"

This query will return all records where the nmols field is equal to 5.

You can also use the not operator to negate a condition:

query = "not nmols eq 5"

Behind the scenes, each query string is translated into a more verbose query dictionary:

query = {
    "nmols": {
        "$eq": 5
    }
}

for the first example, and…

query = {
    "nmols": {
        "$not": {
            "$eq": 5
        }
    }
}

for the second example.
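The translation shown above can be sketched in a few lines of Python. This is a minimal illustration, not pharmaforge's actual implementation: the names MONGO_OPS, parse_value, and translate_basic are hypothetical, and the sketch assumes values are single tokens that are either numbers or plain strings.

```python
# Illustrative sketch only; not pharmaforge's actual implementation.
# Comparison keywords from the query syntax mapped to MongoDB-style operators.
MONGO_OPS = {
    "eq": "$eq", "ne": "$ne",
    "gt": "$gt", "lt": "$lt",
    "gte": "$gte", "lte": "$lte",
}


def parse_value(text):
    """Convert a value token to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def translate_basic(query):
    """Translate 'field operator value', optionally prefixed by 'not' (hypothetical helper)."""
    tokens = query.split()
    negated = bool(tokens) and tokens[0] == "not"
    if negated:
        tokens = tokens[1:]
    if len(tokens) != 3:
        raise ValueError(f"expected 'field operator value', got: {query!r}")
    field, op, value = tokens
    condition = {MONGO_OPS[op]: parse_value(value)}
    if negated:
        # Wrap the condition in $not, mirroring the second translation above.
        condition = {"$not": condition}
    return {field: condition}


print(translate_basic("nmols eq 5"))      # {'nmols': {'$eq': 5}}
print(translate_basic("not nmols eq 5"))  # {'nmols': {'$not': {'$eq': 5}}}
```

The sketch deliberately leaves out the any operator and compound clauses, which need list parsing and logical connectives.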

2.2. Advanced Query Syntax

Compound queries are also supported. These are queries that combine multiple conditions using logical operators (and, or).

The syntax for a compound query is as follows:

compound_query = "field1 operator1 value1 and field2 operator2 value2"

Where:

  • field1 and field2 are the names of the fields you want to query.

  • operator1 and operator2 are the operators you want to use (e.g., eq, ne, gt, lt, gte, lte, any).

  • value1 and value2 are the values you want to compare against.

For example, you can create a compound query that returns all records where contains_elements includes H or O, but not C, as:

query = "contains_elements any [H,O] and not contains_elements any [C]"

This query will return all records where the contains_elements field contains either H or O, but not C.

The compound query is translated into a more complex query, like:

query = {
    "$and": [
        {
            "contains_elements": {
                "$not": {
                    "$nin": ["H", "O"]
                }
            }
        },
        {
            "contains_elements": {
                "$not": {
                    "$not": {
                        "$nin": ["C"]
                    }
                }
            }
        }
    ]
}

Note that the nested negations in the translated query are hard to read, which is a clear advantage of writing not as part of a clause in the query syntax instead.
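To make the compound translation concrete, here is a minimal Python sketch that reproduces the dictionary shown above. It is not pharmaforge's actual parser: the names translate_clause and translate are hypothetical, parentheses are not handled, and the sketch assumes that and binds more tightly than or, which the syntax description does not state.

```python
import re

# Illustrative sketch only; not pharmaforge's actual parser.
COMPARISONS = {
    "eq": "$eq", "ne": "$ne",
    "gt": "$gt", "lt": "$lt",
    "gte": "$gte", "lte": "$lte",
}
# Optional "not", then field, operator, and the rest of the clause as the value.
CLAUSE = re.compile(r"^(not\s+)?(\w+)\s+(\w+)\s+(.+)$")


def parse_scalar(text):
    """Convert a token to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def translate_clause(clause):
    """Translate one '[not] field operator value' clause (hypothetical helper)."""
    match = CLAUSE.match(clause.strip())
    if match is None:
        raise ValueError(f"cannot parse clause: {clause!r}")
    negated, field, op, raw = match.groups()
    raw = raw.strip()
    if op == "any":
        # "[H,O]" becomes ["H", "O"]; "any" is expressed as a negated $nin,
        # following the translated query shown above.
        items = [parse_scalar(item.strip()) for item in raw.strip("[]").split(",")]
        condition = {"$not": {"$nin": items}}
    else:
        condition = {COMPARISONS[op]: parse_scalar(raw)}
    if negated:
        condition = {"$not": condition}
    return {field: condition}


def translate(query):
    """Translate a compound query string into a nested filter dictionary."""
    # Assumption: "and" binds tighter than "or", so split on "or" first.
    or_groups = []
    for group in re.split(r"\s+or\s+", query.strip()):
        clauses = [translate_clause(c) for c in re.split(r"\s+and\s+", group)]
        or_groups.append(clauses[0] if len(clauses) == 1 else {"$and": clauses})
    return or_groups[0] if len(or_groups) == 1 else {"$or": or_groups}


result = translate("contains_elements any [H,O] and not contains_elements any [C]")
print(result)
```

Splitting on whitespace-delimited and/or keeps the sketch short, but it would misread a field or value that is itself the word and or or; a real parser would tokenize first.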

2.3. List of Query operators

The query syntax currently has preliminary support for the following operators:

  • eq: Equal to

  • ne: Not equal to

  • gt: Greater than

  • lt: Less than

  • gte: Greater than or equal to

  • lte: Less than or equal to

  • any: Contains any of the specified values

  • and: Logical AND

  • or: Logical OR

  • not: Logical NOT
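As an informal reading of the operator list, the sketch below evaluates single conditions against a plain Python dictionary. It is only meant to illustrate what each keyword means: the COMPARE table, the matches helper, and the sample record are all made up for this example, and the database may treat edge cases (such as missing fields) differently.

```python
import operator

# Illustrative sketch only: in-memory meaning of each comparison keyword.
COMPARE = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
    "gte": operator.ge,
    "lte": operator.le,
    # "any": true when the record's list shares at least one item with the query list.
    "any": lambda field_values, wanted: bool(set(field_values) & set(wanted)),
}


def matches(record, field, op, value, negate=False):
    """Return whether record[field] satisfies 'field op value'; negate models 'not'."""
    result = COMPARE[op](record[field], value)
    return not result if negate else result


# Made-up record for illustration; field names follow the examples in this section.
record = {"nmols": 2, "molecular_charge": 0, "contains_elements": ["H", "O"]}

print(matches(record, "nmols", "gt", 1))                             # True
print(matches(record, "contains_elements", "any", ["C"]))            # False
print(matches(record, "molecular_charge", "eq", 0, negate=True))     # False

# "and" / "or" combine the results of individual clauses.
print(
    matches(record, "contains_elements", "any", ["H", "O"])
    and matches(record, "contains_elements", "any", ["C"], negate=True)
)                                                                    # True
```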

2.4. List of fields

The query syntax does not depend on which fields a dataset contains. For the qdpi dataset, the following queries have been tested.

# Start Examples
example_queries=[
    "nmols eq 5",
    "not nmols eq 5",
    "nmols eq 1",
    "contains_elements any [H,N,O,C]",
    "contains_elements any [H,O] and contains_elements any [C]",
    "contains_elements any [H,O] and not contains_elements any [C]",
    "contains_elements any [H,N,O,C] or nmols gt 1",
    "molecular_charge eq -1",
    "not molecular_charge eq 0",
]
# End Examples