4.3. Relabeler Class

class pharmaforge.labeling.relabeler.Relabeler

Bases: object

Handles relabeling of data stored in the database. The class is designed so that data can be relabeled without the database being available at the time of relabeling.

Methods

chunk_calc(nchunk)

Reads a chunk input file, runs the relabeling process on that chunk, and saves the results to a new file.

combine_subdivided(result_files)

Combines the results of the subdivided relabeling process into a single, calculated result in the same format as the relabel output.

relabel([interface, restart])

Relabels the data in the entire selection in one go.

relabel_chunk(chunk[, interface])

Relabels a chunk of data.

select_data(database[, query, select_collection])

Select data that must be relabeled.

subdivided_relabel([nchunks])

Sets up the relabeling process to be subdivided into smaller chunks with an equal number of tasks.

to_database(database[, level_of_theory])

Saves the relabeled data back to the database.

static chunk_calc(nchunk)

Reads a chunk input file, runs the relabeling process on that chunk, and saves the results to a new file.

Parameters:

nchunk (int) – The chunk number to read in.

Return type:

None
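
The read, relabel, and save cycle described for chunk_calc can be sketched roughly as follows. This is a minimal illustration and not the pharmaforge implementation: the JSON format, the chunk_<n>.json file names, and the names process_chunk_file and relabel_fn are all assumptions made for the example.

```python
import json
from pathlib import Path


def process_chunk_file(nchunk, workdir, relabel_fn):
    """Read one chunk file, relabel each entry, and write a result file.

    Hypothetical helper: the file names and JSON format are assumptions.
    """
    workdir = Path(workdir)
    in_path = workdir / f"chunk_{nchunk}.json"
    out_path = workdir / f"chunk_{nchunk}_result.json"

    # Load the list of entries stored for this chunk.
    with in_path.open() as handle:
        chunk = json.load(handle)

    # Apply the relabeling callable to every entry in the chunk.
    results = [relabel_fn(entry) for entry in chunk]

    # Write the results to a new file so the input chunk is left untouched.
    with out_path.open("w") as handle:
        json.dump(results, handle)
    return out_path
```

Keeping input and output in separate files means a failed chunk can be rerun on its own without touching the others.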

combine_subdivided(result_files)

Combines the results of the subdivided relabeling process into a single, calculated result in the same format as the relabel output.

Parameters:

result_files – The result files produced by the subdivided relabeling process that are to be combined.
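
Combining per-chunk results into one result can be sketched as below. The sketch assumes each result file holds a JSON list, which the documentation above does not state. The name combine_result_files is hypothetical and is not part of pharmaforge.

```python
import json
from pathlib import Path


def combine_result_files(result_files):
    """Concatenate per-chunk results in the order the files are given.

    Hypothetical helper: assumes every file contains a JSON list.
    """
    combined = []
    for path in result_files:
        with Path(path).open() as handle:
            # Each file contributes its entries to the single combined list.
            combined.extend(json.load(handle))
    return combined
```

Passing the files in chunk order keeps the combined entries in the same order as the original selection.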

relabel(interface=None, restart=False, **kwargs)

Relabels the data in the entire selection in one go. Ideal for systems with a small number of processors and for small datasets.

Parameters:
  • interface (object, optional) – The interface to use for relabeling.

  • **kwargs (dict, optional) – Additional arguments to pass to the interface.

static relabel_chunk(chunk, interface=None, **kwargs)

Relabels a chunk of data. This is useful for large datasets, where one might want to relabel the data in chunks.

Parameters:
  • chunk (list) – The chunk of data to relabel.

  • interface (object, optional) – The interface to use for relabeling.

  • output_file (str, optional) – The file to save the relabeled data to.

  • **kwargs (dict, optional) – Additional arguments to pass to the interface.

select_data(database, query=None, select_collection=None)

Select data that must be relabeled. The selection can be either the result of a query or an entire collection of data.

Parameters:
  • database (pymongo.database.Database) – The database to select data from.

  • query (str, optional) – The query to select data from the database. If not provided, the entire collection will be selected.

  • select_collection (str, optional) – The collection to select data from the database. If not provided, the default collection will be used.

subdivided_relabel(nchunks=1)

Sets up the relabeling process to be subdivided into smaller chunks with an equal number of tasks. This is ideal for systems with a large number of processors, or for HPC environments where one might want to do many tasks at once.

Parameters:

nchunks (int, optional) – The number of chunks to subdivide the relabeling process into. Default is 1.
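
One way to divide a task list into nchunks pieces of near-equal size is sketched below. It only illustrates the idea. The helper name split_into_chunks is hypothetical, and the handling of task counts that do not divide evenly (chunk sizes differ by at most one) is an assumption, not documented pharmaforge behavior.

```python
def split_into_chunks(tasks, nchunks=1):
    """Split a list into nchunks contiguous pieces of near-equal size.

    Hypothetical helper: when len(tasks) is not divisible by nchunks,
    the first few chunks each receive one extra task.
    """
    if nchunks < 1:
        raise ValueError("nchunks must be at least 1")

    base, extra = divmod(len(tasks), nchunks)
    chunks = []
    start = 0
    for index in range(nchunks):
        # The first `extra` chunks take one additional task each.
        size = base + (1 if index < extra else 0)
        chunks.append(tasks[start:start + size])
        start += size
    return chunks
```

For example, split_into_chunks(list(range(10)), 3) gives chunks of sizes 4, 3, and 3, and every task appears in exactly one chunk.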

to_database(database, level_of_theory=None)

Saves the relabeled data back to the database.
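
Taken together, the methods above suggest a select, relabel, and store sequence. The sketch below uses only the signatures documented in this section. It assumes that Relabeler() can be constructed without arguments, which the class entry suggests but does not confirm. The wrapper name relabel_collection and its parameters are hypothetical.

```python
def relabel_collection(database, interface, collection, level_of_theory):
    """Select a collection, relabel it in one go, and store the results.

    Sketch only: assumes Relabeler() takes no constructor arguments.
    """
    # Imported here so the sketch can be defined without pharmaforge installed.
    from pharmaforge.labeling.relabeler import Relabeler

    relabeler = Relabeler()
    # Select the whole collection; a query could be passed instead.
    relabeler.select_data(database, select_collection=collection)
    # Relabel the entire selection in a single pass.
    relabeler.relabel(interface=interface)
    # Write the relabeled data back under the given level of theory.
    relabeler.to_database(database, level_of_theory=level_of_theory)
    return relabeler
```

For large selections, the same sequence would presumably replace the relabel call with subdivided_relabel, chunk_calc, and combine_subdivided.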