cleanX.image_work.journaling_pipeline module

class cleanX.image_work.journaling_pipeline.JournalingPipeline(steps=None, batch_size=None, journal=True, keep_journal=False)

Bases: Pipeline

Extends Pipeline with the ability to store its progress and state in a journal database, so that an interrupted run can later be resumed with restore().

__init__(steps=None, batch_size=None, journal=True, keep_journal=False)

Initializes the pipeline with two additional arguments that control persistent storage. See Pipeline for the remaining arguments.

Parameters:
  • journal (Union[bool, str]) – If True, the pipeline stores the journal in a preconfigured directory. Otherwise, this must be the path to the directory in which to store the journal database.

  • keep_journal (bool) – Controls whether the journal is kept after successful completion of the pipeline.

classmethod restore(journal_dir, skip=0, **overrides)

Restores a previously created journaling pipeline, resuming from the last executed step.

Parameters:
  • journal_dir (Suitable for os.path.join()) – The directory containing the journal database to restore from.

  • skip – Skip this many steps before attempting to resume the pipeline. This is useful if you know that the step that failed will fail again, but you want to execute the rest of the steps in the pipeline.

  • **overrides – Arguments to pass to the created pipeline instance that will override those restored from the journal.

Returns:

A fresh JournalingPipeline object, fast-forwarded to the last executed step plus skip.

Return type:

JournalingPipeline
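
The resume-and-skip behavior described above can be illustrated with a small standalone sketch. It does not use cleanX and does not reflect its actual journal format: ToyJournal, run, and resume are hypothetical names, the SQLite schema is invented for the illustration, and the sketch assumes that "last executed step" means the last step that completed, so that skip=0 retries the failed step and skip=1 passes over it.

```python
import json
import os
import sqlite3
import tempfile


class ToyJournal:
    """Hypothetical journal: one row per completed step (not cleanX's schema)."""

    def __init__(self, journal_dir):
        os.makedirs(journal_dir, exist_ok=True)
        self.db = sqlite3.connect(os.path.join(journal_dir, "journal.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS progress "
            "(step INTEGER PRIMARY KEY, state TEXT)"
        )

    def record(self, index, state):
        # Commit after every step so progress survives a crash.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO progress VALUES (?, ?)",
                (index, json.dumps(state)),
            )

    def last(self):
        # Return (index, state) of the last completed step, or (-1, None).
        row = self.db.execute(
            "SELECT step, state FROM progress ORDER BY step DESC LIMIT 1"
        ).fetchone()
        return (row[0], json.loads(row[1])) if row else (-1, None)


def run(steps, state, journal, start=0):
    """Run steps[start:] in order, journaling the state after each one."""
    for index in range(start, len(steps)):
        state = steps[index](state)
        journal.record(index, state)
    return state


def resume(steps, initial, journal, skip=0):
    """Continue after the last completed step, passing over `skip` steps."""
    last_index, state = journal.last()
    if state is None:
        state = initial
    return run(steps, state, journal, start=last_index + 1 + skip)


def flaky(state):
    raise RuntimeError("this step always fails")


steps = [lambda s: s + ["loaded"], flaky, lambda s: s + ["saved"]]

journal_dir = tempfile.mkdtemp()
try:
    run(steps, [], ToyJournal(journal_dir))
except RuntimeError:
    pass  # step 1 failed; step 0 is already in the journal

# Reopen the journal and skip the step that is known to fail.
result = resume(steps, [], ToyJournal(journal_dir), skip=1)
```

With skip=0 the toy resume would call flaky again and fail in the same place; skip=1 jumps straight to the final step, which is the use case the skip parameter is described as serving.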

process(source)

Starts this pipeline.

Parameters:

source (Iterable) – This must be an iterable that yields file names for the images to be processed.
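
Any iterable of file names satisfies the source requirement. As a minimal sketch, the generator below walks a directory tree and yields image paths in a stable order; image_file_names and its extensions parameter are hypothetical helpers written for this illustration, not part of cleanX.

```python
import os
import tempfile


def image_file_names(root, extensions=(".jpg", ".jpeg", ".png")):
    """Yield paths of files under root whose names end in an image extension."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                yield os.path.join(dirpath, name)


# Demonstration on a throwaway directory.
root = tempfile.mkdtemp()
for name in ("b.png", "a.JPG", "notes.txt"):
    open(os.path.join(root, name), "w").close()

names = list(image_file_names(root))
```

A generator like this could be handed to process() directly, for example pipeline.process(image_file_names(root)). Note that a generator can be consumed only once, so a list is the safer choice if the same source has to be iterated again.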

process_batch_agg(batch, step)
process_batch_parallel(batch, step)
process_step(step, srciter)