Relabeling Existing Data within the Database
=============================================

In this tutorial, we continue in the vein of the previous labeling example, but now show how to relabel data that is already present in the database and how to add the newly labeled data back into the database. We also show how to use the subdivided relabeling approach to run large datasets in parallel.

.. image:: figures/db_relabel_diagram.png
   :width: 1200px
   :align: center
   :alt: Diagram of different relabeling options.

Learning Objectives
-------------------

- Learn how to use the `pharmaforge` package with `ase` to relabel data in the database.
- Learn how to add the new data to the database.
- Learn how to use the parallel interface to subdivide relabeling tasks into chunks.

Required Files
--------------

- The tutorial Python script is located in examples/Relabeling

Tutorial
--------

As with most of our examples, we first need to import the required packages and set up a working database. For this example, we create a temporary database called "example_add_data", which contains the collection "example_qdpi". This example works with the same t8 dataset that we used in the previous example, but you can use any dataset that you want.

.. literalinclude:: ../../../../examples/Relabeling/add_data_to_db.py
   :language: python
   :end-at: END Database Generation

As a start, we show how to relabel an entire database (with all entries). This is done with the `Relabeler` class. In the next code snippet, we initialize the class, add the database to the class as the data to be relabeled, and then initialize the interface (in this case XTB) that we will use to relabel the data.

.. literalinclude:: ../../../../examples/Relabeling/add_data_to_db.py
   :language: python
   :start-at: # Call the relabeling class
   :end-before: # Do the actual relabeling

At this point, we haven't done any calculations.
We will now set up the calculation, run it, and then add the data back into the database.

.. literalinclude:: ../../../../examples/Relabeling/add_data_to_db.py
   :language: python
   :start-at: # Do the actual relabeling
   :end-at: relabeling_task.to_database(db)

Note that the data is now present within the database, and you can query it as you would any other data.

Relabeling only Query
^^^^^^^^^^^^^^^^^^^^^

As described elsewhere, you can also relabel only the results of a query. This is done in a similar way to the above, but now you pass a query to the class alongside the database.

.. literalinclude:: ../../../../examples/Relabeling/add_data_to_db.py
   :language: python
   :start-at: # Now, we can try this with a query
   :end-before: # End query example

Note that here we have not added the data back into the database, but you could do so if you wanted to.

Chunking Large Relabeling Tasks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Often you will have a large dataset and perhaps a more expensive interface than DFTB. In that case, the `Relabeler` class can chunk the relabeling task into smaller pieces that can either be run locally (as in this example) or moved to a cluster or other parallel high-throughput computing system.

.. literalinclude:: ../../../../examples/Relabeling/add_data_to_db.py
   :language: python
   :start-at: # Now for BIG datasets
   :end-at: combine_task.combine_subdivided

The general logic here is to run the setup for a parallel task on a local system, which generates pickle (binary) files that describe the tasks. These tasks can then be run on a parallel system (which must also have `pharmaforge` installed, but need not have the database present), where they generate output pickle files. These output pickle files can then be read back into the local system and combined using the `pharmaforge.relabeling.Relabeler.combine_subdivided` function.
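
The subdivide, run, and combine pattern described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the `pharmaforge` API: the function names (`subdivide`, `write_tasks`, `run_task`, `combine`), the file naming scheme, and the `placeholder_label` function are all hypothetical and stand in for the real relabeling machinery.

```python
import pickle
import tempfile
from pathlib import Path


def subdivide(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def write_tasks(chunks, workdir):
    """Write one input pickle per chunk and return the file paths."""
    paths = []
    for index, chunk in enumerate(chunks):
        path = Path(workdir) / f"task_{index}.pkl"
        with open(path, "wb") as handle:
            pickle.dump(chunk, handle)
        paths.append(path)
    return paths


def run_task(task_path, label_fn):
    """Load one task file, label every item, and write an output pickle."""
    with open(task_path, "rb") as handle:
        chunk = pickle.load(handle)
    results = [(item, label_fn(item)) for item in chunk]
    out_path = task_path.with_name(task_path.stem + "_out.pkl")
    with open(out_path, "wb") as handle:
        pickle.dump(results, handle)
    return out_path


def combine(output_paths):
    """Read the output pickles back in order and concatenate the results."""
    combined = []
    for path in output_paths:
        with open(path, "rb") as handle:
            combined.extend(pickle.load(handle))
    return combined


def placeholder_label(item):
    """Hypothetical stand-in for an expensive calculation."""
    return item * item


with tempfile.TemporaryDirectory() as workdir:
    # Setup step: done on the local system.
    task_paths = write_tasks(subdivide(list(range(10)), chunk_size=4), workdir)
    # Run step: on a cluster, each run_task call would be a separate job.
    output_paths = [run_task(path, placeholder_label) for path in task_paths]
    # Combine step: done back on the local system.
    relabeled = combine(output_paths)
```

Because each task file is self-contained, the run step only needs the task file and the code that performs the calculation, which mirrors why the parallel system does not need access to the database. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.
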
As before, after this point you can add the data back into the database using the same `to_database` method.

Full Code
---------

.. literalinclude:: ../../../../examples/Relabeling/add_data_to_db.py
   :language: python