4.3. Relabeler Class

class pharmaforge.labeling.relabeler.Relabeler

Bases: object

Handles relabeling of data stored in a database. The class is designed so that data can be relabeled without the database being available at relabeling time. The data are first selected from the database (select_data), then relabeled, and later saved back (to_database).

Methods

`chunk_calc`(nchunk)	Reads the input file for a single chunk, runs the relabeling process on that chunk, and saves the results to a new file.
`combine_subdivided`(result_files)	Combines the results of the subdivided relabeling process into a single result in the same format as the output of relabel.
`relabel`([interface, restart])	Relabels the data in the entire selection in one go.
`relabel_chunk`(chunk[, interface])	Relabels a chunk of data.
`select_data`(database[, query, select_collection])	Select data that must be relabeled.
`subdivided_relabel`([nchunks])	Sets up the relabeling process to run as smaller chunks, each containing an equal number of tasks.
`to_database`(database[, level_of_theory])	Saves the relabeled data back to the database.

static chunk_calc(nchunk)

Reads the input file for a single chunk, runs the relabeling process on that chunk, and saves the results to a new file.

Parameters: nchunk (int) – The number of the chunk to read in.
Return type: None
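
As a rough illustration of the read, relabel, and write cycle that chunk_calc describes, the sketch below assumes each chunk is stored as a JSON list in a file named chunk_<n>.json and that results go to result_<n>.json. The file names, the JSON format, and the relabel_item callable are hypothetical and not taken from pharmaforge. Unlike the real static method, which takes only nchunk, the sketch passes the working directory and the relabeling function explicitly.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def chunk_calc_sketch(nchunk: int, directory: str,
                      relabel_item: Callable[[Any], Any]) -> None:
    """Read chunk `nchunk`, relabel every entry, and save the results.

    Hypothetical layout (not pharmaforge's): input is chunk_<n>.json,
    output is result_<n>.json, both inside `directory`.
    """
    workdir = Path(directory)
    # Read the chunk input file written when the work was subdivided.
    chunk = json.loads((workdir / f"chunk_{nchunk}.json").read_text())
    # Relabel each entry independently of the others.
    results = [relabel_item(entry) for entry in chunk]
    # Write to a per-chunk file so that chunks never overwrite each other.
    (workdir / f"result_{nchunk}.json").write_text(json.dumps(results))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "chunk_0.json").write_text(json.dumps([1.0, 2.0]))
        chunk_calc_sketch(0, tmp, lambda energy: energy * 2)
        print(json.loads(Path(tmp, "result_0.json").read_text()))  # [2.0, 4.0]
```

Writing one result file per chunk means independent jobs never write to the same file, which is what allows chunks to run concurrently.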

combine_subdivided(result_files)

Combines the results of the subdivided relabeling process into a single result in the same format as the output of relabel.

Parameters: result_files (list) – The result files produced by the individual chunk calculations.
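
One detail that matters when combining chunk results is ordering: sorting file names as strings puts result_10 before result_2. The sketch below is a minimal illustration under the assumption (hypothetical, not confirmed from pharmaforge) that each result file is a JSON list named result_<n>.json. It sorts the files by their numeric chunk index before concatenating them.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Any, Iterable, List


def _chunk_index(path: Path) -> int:
    """Extract n from a hypothetical result_<n>.json file name."""
    match = re.fullmatch(r"result_(\d+)\.json", path.name)
    if match is None:
        raise ValueError(f"unexpected result file name: {path.name}")
    return int(match.group(1))


def combine_subdivided_sketch(result_files: Iterable[str]) -> List[Any]:
    """Concatenate per-chunk results into one list, in numeric chunk order."""
    # Sort numerically so that result_2 comes before result_10.
    paths = sorted((Path(p) for p in result_files), key=_chunk_index)
    combined: List[Any] = []
    for path in paths:
        combined.extend(json.loads(path.read_text()))
    return combined


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for n, data in [(10, ["k"]), (2, ["c"]), (0, ["a", "b"])]:
            path = Path(tmp, f"result_{n}.json")
            path.write_text(json.dumps(data))
            files.append(str(path))
        print(combine_subdivided_sketch(files))  # ['a', 'b', 'c', 'k']
```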

relabel(interface=None, restart=False, **kwargs)

Relabels the data in the entire selection in one go. Best suited to small datasets and to systems with few processors.

Parameters:

interface (object, optional) – The interface to use for relabeling.
restart (bool, optional) – Default is False.
**kwargs (dict, optional) – Additional arguments to pass to the interface.

static relabel_chunk(chunk, interface=None, **kwargs)

Relabels a chunk of data. This is useful for large datasets that one might want to relabel in chunks.

Parameters:

chunk (list) – The chunk of data to relabel.
interface (object, optional) – The interface to use for relabeling.
output_file (str, optional) – The file to save the relabeled data to.
**kwargs (dict, optional) – Additional arguments to pass to the interface.
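
The interface parameter is described only as an object, so the sketch below assumes, hypothetically, that an interface exposes a calculate(entry, **kwargs) method. The names Interface, ScaleInterface, calculate, and relabel_chunk_sketch are illustrative and are not pharmaforge API. The sketch shows extra keyword arguments being forwarded unchanged to the interface, and an optional output_file receiving the results. Here output_file is an explicit keyword, whereas the documented signature presumably receives it through **kwargs.

```python
import json
from typing import Any, List, Optional, Protocol


class Interface(Protocol):
    """Hypothetical shape of a relabeling interface."""

    def calculate(self, entry: Any, **kwargs: Any) -> Any: ...


class ScaleInterface:
    """Toy interface that scales a number; stands in for a real calculator."""

    def calculate(self, entry: float, factor: float = 1.0, **kwargs: Any) -> float:
        return entry * factor


def relabel_chunk_sketch(chunk: List[Any], interface: Optional[Interface] = None,
                         output_file: Optional[str] = None, **kwargs: Any) -> List[Any]:
    # The real method's behavior when interface is None is not documented;
    # this sketch simply refuses to run without one.
    if interface is None:
        raise ValueError("an interface is required to relabel data")
    # Forward any extra keyword arguments unchanged to the interface.
    results = [interface.calculate(entry, **kwargs) for entry in chunk]
    # Optionally persist the relabeled chunk as JSON.
    if output_file is not None:
        with open(output_file, "w") as handle:
            json.dump(results, handle)
    return results
```

For example, relabel_chunk_sketch([1.0, 2.5], ScaleInterface(), factor=2.0) returns [2.0, 5.0]. The factor keyword is never inspected by relabel_chunk_sketch itself and is passed straight through to the interface.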

select_data(database, query=None, select_collection=None)

Select data that must be relabeled. This can either be a query or an entire collection of data.

Parameters:

database (pymongo.database.Database) – The database to select data from.
query (str, optional) – The query to select data from the database. If not provided, the entire collection will be selected.
select_collection (str, optional) – The collection to select data from the database. If not provided, the default collection will be used.

subdivided_relabel(nchunks=1)

Sets up the relabeling process to run as smaller chunks, each containing an equal number of tasks. This suits systems with many processors, or HPC environments where many tasks are run at once.

Parameters: nchunks (int, optional) – The number of chunks to subdivide the relabeling process into. Default is 1.
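
When the number of tasks is not divisible by nchunks, chunks of exactly equal size are impossible. A common approach, assumed here rather than confirmed from the pharmaforge source, is to make chunk sizes differ by at most one. The helper name split_into_chunks is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], nchunks: int = 1) -> List[List[T]]:
    """Split items into nchunks contiguous chunks whose sizes differ by at most one."""
    if nchunks < 1:
        raise ValueError("nchunks must be at least 1")
    base, extra = divmod(len(items), nchunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(nchunks):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks
```

For example, split_into_chunks(list(range(10)), 3) returns [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]. Keeping chunks contiguous preserves the original order, so concatenating the per-chunk results in chunk order reproduces the order of a single-pass run.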

to_database(database, level_of_theory=None)[source]: Saves the relabeled data back to the database.