cleanX.image_work.pipeline module

class cleanX.image_work.pipeline.MultiSource(sources)

Bases: object

A class that chains multiple sources, such as GlobSource or DirectorySource, into a single source. All source classes implement the iterator interface.

__init__(sources)

Initializes this iterator with multiple sources, which are iterated one after another in a way similar to itertools.chain().

Parameters:

sources (Sequence) – A sequence of sources, such as GlobSource or DirectorySource.

__iter__()

Iterator implementation.
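
The chaining behaviour described above can be illustrated with a minimal sketch. ChainedSource is a hypothetical stand-in written for this example, not the cleanX implementation; it only assumes that each source is an iterable of file names.

```python
from itertools import chain


class ChainedSource:
    """Hypothetical stand-in for MultiSource; not the cleanX implementation."""

    def __init__(self, sources):
        # Any sequence of iterables that yield file names.
        self.sources = sources

    def __iter__(self):
        # chain.from_iterable() exhausts each source in the order given.
        return chain.from_iterable(self.sources)


first = ["scans/a.jpg", "scans/b.jpg"]
second = iter(["extra/c.jpg"])
combined = list(ChainedSource([first, second]))
```

Here combined holds the names from first followed by those from second, which is the ordering itertools.chain() guarantees.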

class cleanX.image_work.pipeline.GlobSource(expression, recursive=False)

Bases: object

A class that creates an iterator listing all files that match a glob pattern.

__init__(expression, recursive=False)

Initializes this iterator with the arguments to be passed to glob.glob().

Parameters:
  • expression (Union[str, bytes]) – Expression to be passed to glob.glob().

  • recursive – Controls how the ** pattern is interpreted. If True, ** matches any number of nested directories, including none.

__iter__()

Iterator implementation.
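
The effect of the recursive flag can be seen in a small sketch built directly on the standard glob module. GlobSketch is a hypothetical stand-in for illustration only; the temporary directory tree and file names are made up for the demo.

```python
import glob
import os
import tempfile


class GlobSketch:
    """Hypothetical stand-in for GlobSource; not the cleanX implementation."""

    def __init__(self, expression, recursive=False):
        self.expression = expression
        self.recursive = recursive

    def __iter__(self):
        # glob.iglob() yields matches lazily; with recursive=True the "**"
        # pattern matches zero or more nested directories.
        return glob.iglob(self.expression, recursive=self.recursive)


# Demo on a throwaway directory tree: top.jpg, a/mid.jpg, a/b/deep.jpg.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("top.jpg",
                os.path.join("a", "mid.jpg"),
                os.path.join("a", "b", "deep.jpg")):
        open(os.path.join(root, rel), "w").close()

    # glob.escape() guards against special characters in the temp path.
    pattern = os.path.join(glob.escape(root), "**", "*.jpg")
    flat = sorted(os.path.relpath(p, root) for p in GlobSketch(pattern))
    deep = sorted(os.path.relpath(p, root)
                  for p in GlobSketch(pattern, recursive=True))
```

Without recursive=True, ** behaves like a single * and matches exactly one directory level, so flat contains only a/mid.jpg, while deep contains all three files.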

class cleanX.image_work.pipeline.DirectorySource(directory, extension='jpg')

Bases: object

A class that creates an iterator over the files in the given directory.

__init__(directory, extension='jpg')

Initializes this iterator.

Parameters:
  • directory (Must be valid for os.path.join()) – The directory in which to look for images.

  • extension – A glob pattern for the file extension. Whether it is case-sensitive depends on the filesystem being used.

__iter__()

Iterator implementation.
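
One way such a directory source could work is sketched below. DirectorySketch is hypothetical, and the way it builds the pattern (appending the extension to "*.") is an assumption, not something this reference states. The sketch also assumes a str directory, although the real parameter accepts anything valid for os.path.join().

```python
import glob
import os
import tempfile


class DirectorySketch:
    """Hypothetical stand-in for DirectorySource; not the cleanX code."""

    def __init__(self, directory, extension="jpg"):
        self.directory = directory
        self.extension = extension

    def __iter__(self):
        # Assumption: the extension is appended to "*." to form the pattern.
        # glob.escape() keeps special characters in the directory literal.
        pattern = os.path.join(glob.escape(self.directory),
                               "*." + self.extension)
        return glob.iglob(pattern)


# Demo on a throwaway directory with two images and one text file.
with tempfile.TemporaryDirectory() as root:
    for name in ("one.jpg", "two.jpg", "notes.txt"):
        open(os.path.join(root, name), "w").close()

    default_matches = sorted(os.path.basename(p)
                             for p in DirectorySketch(root))
    text_matches = sorted(os.path.basename(p)
                          for p in DirectorySketch(root, "txt"))
```

Because extension is itself a glob pattern, a value such as "jp*g" would, under the same assumption, match both .jpg and .jpeg files.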

exception cleanX.image_work.pipeline.PipelineError

Bases: RuntimeError

These errors are reported when the pipeline encounters errors reading or writing images.

class cleanX.image_work.pipeline.Pipeline(steps=None, batch_size=None)

Bases: object

This class is the builder for the image processing pipeline.

This class executes a sequence of Step objects. It attempts to execute as many steps as possible in parallel. However, to avoid running out of memory, it saves the intermediate results to disk. You can control the number of images processed at once by specifying the batch_size parameter.

__init__(steps=None, batch_size=None)

Initializes this pipeline, but doesn’t start its execution.

Parameters:
  • steps (Sequence[Step]) – A sequence of Step objects this pipeline should execute.

  • batch_size (int) – The number of images that will be processed in parallel.
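
The idea of bounding memory use by grouping images into batches can be sketched as follows. The batches() helper is hypothetical and only illustrates splitting a source into groups of at most batch_size items; how Pipeline actually forms and schedules its batches is not described in this reference.

```python
from itertools import islice


def batches(source, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(source)
    while True:
        # islice() pulls the next batch_size items without reading further,
        # so only one batch is held in memory at a time.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


names = [f"img_{i}.jpg" for i in range(5)]
grouped = list(batches(names, 2))
```

With five names and a batch size of 2, grouped contains two full batches and a final batch holding the single remaining name.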

process(source)

Starts this pipeline.

Parameters:

source (Iterable) – This must be an iterable that yields file names for the images to be processed.

process_step(step, srciter)
process_batch_agg(batch, step)
process_batch_parallel(batch, step)