4.3. Relabeler Class
- class pharmaforge.labeling.relabeler.Relabeler
Bases: object
A class that handles relabeling of data in the database. It is designed so that data can be relabeled without the database having to be available at the time of relabeling.
Methods
chunk_calc(nchunk) – Reads a chunk input file, runs the relabeling process on that chunk, and saves the results to a new file.
combine_subdivided(result_files) – Combines the results of the subdivided relabeling process into a single, calculated result in the same format as the relabel output.
relabel([interface, restart]) – Relabels the data in the entire selection in one go.
relabel_chunk(chunk[, interface]) – Relabels a chunk of data.
select_data(database[, query, select_collection]) – Select data that must be relabeled.
subdivided_relabel([nchunks]) – Sets up the relabeling process to be subdivided into smaller chunks with an equal number of tasks.
to_database(database[, level_of_theory]) – Saves the relabeled data back to the database.
- static chunk_calc(nchunk)
Reads a chunk input file, runs the relabeling process on that chunk, and saves the results to a new file.
- Parameters:
nchunk (int) – The chunk number to read in.
- Return type:
None
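The read, process, and save cycle described for chunk_calc can be illustrated with a minimal standalone sketch. It does not use pharmaforge: the file names (chunk_N.json, chunk_N_result.json), the JSON format, and the helper process_chunk_file are all hypothetical, chosen only to show the pattern of one input file in and one result file out per chunk.

```python
import json
import tempfile
from pathlib import Path


def process_chunk_file(nchunk, workdir, label_fn):
    """Read chunk `nchunk`, apply `label_fn` to each record, write results to a new file."""
    workdir = Path(workdir)
    in_path = workdir / f"chunk_{nchunk}.json"          # hypothetical input file name
    out_path = workdir / f"chunk_{nchunk}_result.json"  # hypothetical output file name
    records = json.loads(in_path.read_text())
    results = [label_fn(record) for record in records]
    out_path.write_text(json.dumps(results))
    return out_path


# Demo with a stand-in "relabeling" function that simply doubles a number.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "chunk_0.json").write_text(json.dumps([1, 2, 3]))
    result_path = process_chunk_file(0, tmp, lambda x: 2 * x)
    demo_results = json.loads(result_path.read_text())
```

Because each chunk only touches its own input and output file, independent jobs can process different chunk numbers at the same time without coordinating.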
- combine_subdivided(result_files)
Combines the results of the subdivided relabeling process into a single, calculated result in the same format as the relabel output.
- Parameters:
result_files – The result files from the subdivided relabeling process that are to be combined.
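As a rough illustration of the combining step, the sketch below concatenates per-chunk result files in the order given. It is not the pharmaforge implementation: the JSON list format and the helper combine_result_files are assumptions made for the example, and the real result format may differ.

```python
import json
import tempfile
from pathlib import Path


def combine_result_files(result_files):
    """Concatenate the records of several result files, preserving file order."""
    combined = []
    for path in result_files:
        combined.extend(json.loads(Path(path).read_text()))
    return combined


# Demo: write three hypothetical per-chunk result files, then merge them.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, part in enumerate([[1, 2], [3], [4, 5]]):
        path = Path(tmp) / f"result_{i}.json"
        path.write_text(json.dumps(part))
        paths.append(path)
    combined = combine_result_files(paths)
```

Passing the files in chunk order keeps the combined records in the same order as the original selection, which is what makes the merged output comparable to a single-pass result.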
- relabel(interface=None, restart=False, **kwargs)
Relabels the data in the entire selection in one go. Ideal for systems with a small number of processors and for small datasets.
- static relabel_chunk(chunk, interface=None, **kwargs)
Relabels a chunk of data. This is useful for large datasets, where one might want to relabel the data in chunks.
- select_data(database, query=None, select_collection=None)
Select data that must be relabeled. This can either be a query or an entire collection of data.
- Parameters:
database (pymongo.database.Database) – The database to select data from.
query (str, optional) – The query to select data from the database. If not provided, the entire collection will be selected.
select_collection (str, optional) – The collection to select data from the database. If not provided, the default collection will be used.
- subdivided_relabel(nchunks=1)
Sets up the relabeling process to be subdivided into smaller chunks with an equal number of tasks. This is ideal for systems with a large number of processors, or for HPC environments where one might want to do many tasks at once.
- Parameters:
nchunks (int, optional) – The number of chunks to subdivide the relabeling process into. Default is 1.
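One common way to divide work into chunks with an equal number of tasks is sketched below. This is an assumption about the general technique, not a description of how subdivided_relabel is implemented; the helper split_into_chunks is hypothetical. When the task count is not divisible by nchunks, the first few chunks receive one extra task, so chunk sizes differ by at most one.

```python
def split_into_chunks(tasks, nchunks=1):
    """Split `tasks` into `nchunks` contiguous chunks whose sizes differ by at most one."""
    if nchunks < 1:
        raise ValueError("nchunks must be at least 1")
    base, extra = divmod(len(tasks), nchunks)
    chunks = []
    start = 0
    for i in range(nchunks):
        # The first `extra` chunks take one additional task each.
        size = base + (1 if i < extra else 0)
        chunks.append(tasks[start:start + size])
        start += size
    return chunks


# Demo: ten tasks split across three chunks.
chunks = split_into_chunks(list(range(10)), nchunks=3)
chunk_sizes = [len(chunk) for chunk in chunks]
```

With the default nchunks=1 the whole task list ends up in a single chunk, which corresponds to relabeling everything in one go.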