cleanX Command-Line Interface

python3 -m cleanX

python3 -m cleanX [OPTIONS] COMMAND [ARGS]...

Options

-c, --config <config>

Configuration value pairs. The values are processed with a JSON parser. For the list of possible values, see Config.

-f, --config-file <config_file>

As an alternative to --config, all the necessary configuration settings can be provided in a file. The file must be in JSON format suitable for Config.parse().
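
To illustrate what JSON parsing implies for option values, the sketch below uses Python's standard json module to decode a few literal strings and to write a small file of the kind that might be passed to --config-file. The flat name-to-value layout and the file name are assumptions, not something this page specifies; GLOB_IS_RECURSIVE and JOURNAL_HOME are the only setting names taken from this page, and the values shown are made up.

```python
import json
import os
import tempfile

# How a JSON parser interprets raw strings such as those given on a command line.
raw_values = ["true", "42", '"/tmp/journal"', "[1, 2]"]
parsed = [json.loads(raw) for raw in raw_values]
print(parsed)

# Assumed layout: a single JSON object mapping setting names to values.
settings = {"GLOB_IS_RECURSIVE": True, "JOURNAL_HOME": "/tmp/journal"}

# Hypothetical file name, written to a scratch directory.
config_path = os.path.join(tempfile.mkdtemp(), "cleanx-config.json")
with open(config_path, "w") as f:
    json.dump(settings, f, indent=2)

# Round trip: the file parses back to the same mapping.
with open(config_path) as f:
    loaded = json.load(f)
print(loaded)
```

A bare word such as /tmp/journal is not valid JSON. String values need their double quotes, which usually means quoting them a second time for the shell.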

-v, --verbosity <verbosity>

Controls the verbosity level used for logging. The default is logging.WARNING.

Options:

critical | debug | error | fatal | info | warning | warn | notset
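
The accepted names mirror the level names of Python's logging module. The sketch below assumes the CLI maps each name to the logging constant of the same upper-cased name (the page only confirms this for the default, logging.WARNING) and prints the resulting numeric levels.

```python
import logging

names = ["critical", "debug", "error", "fatal", "info", "warning", "warn", "notset"]

# Assumption: each option value corresponds to logging.<NAME>.
levels = {name: getattr(logging, name.upper()) for name in names}

# List the names from least to most severe.
for name, level in sorted(levels.items(), key=lambda item: item[1]):
    print(f"{name:8} -> {level}")
```

Under this mapping, fatal and critical share a value, as do warn and warning.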

dataset

python3 -m cleanX dataset [OPTIONS] COMMAND [ARGS]...

report

python3 -m cleanX dataset report [OPTIONS]

Options

-r, --train-source <train_source>

The source of the training data (usually, a file path).

Supported source types are:

* json
* csv

-t, --test-source <test_source>

The source of the test data (usually, a file path).

Supported source types are:

* json
* csv

-i, --unique_id <unique_id>

The name of the column that uniquely identifies cases in the dataset (typically, the patient’s ID). If not given, the first matching column in the test and train datasets is taken to be the unique ID.

-l, --label-tag <label_tag>

The name of the column that holds the property being learned in this machine learning task (typically, the diagnosis). The default value is “Label”.

-s, --sensitive-category <sensitive_category>

Repeatable. The name of a column that describes a property of the dataset that may exhibit bias, e.g. “gender” or “ethnicity”.

--report-duplicates, --no-report-duplicates

Whether the report should contain information about duplicates.

--report-leakage, --no-report-leakage

Whether the report should contain information about leakage.

--report-bias, --no-report-bias

Whether the report should contain information about bias.

--report-understand, --no-report-understand

Whether the report should contain information about understanding.

-o, --output <output>

The file to write the report to. If no file is given, the report is printed to stdout. The report format is inferred from the file extension. Supported formats are:

* txt
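
The options above can be combined as in the following sketch, which only assembles and prints a command line; it does not run cleanX. The file names and the patient_id column are hypothetical placeholders, while the flags are the ones listed in this section.

```python
import shlex

# Hypothetical inputs; replace with real paths and column names.
train_source = "train.csv"
test_source = "test.csv"
sensitive_columns = ["gender", "ethnicity"]

argv = [
    "python3", "-m", "cleanX", "dataset", "report",
    "--train-source", train_source,
    "--test-source", test_source,
    "--unique_id", "patient_id",
    "--label-tag", "Label",
]

# --sensitive-category is repeatable: pass the flag once per column.
for column in sensitive_columns:
    argv += ["--sensitive-category", column]

# Switch off one report part and write the result to a txt file.
argv += ["--no-report-understand", "--output", "report.txt"]

command = shlex.join(argv)
print(command)
```

Keeping the arguments as a list avoids quoting mistakes; the same list can be passed to subprocess.run once cleanX is installed.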

dicom

python3 -m cleanX dicom [OPTIONS] COMMAND [ARGS]...

extract

python3 -m cleanX dicom extract [OPTIONS]

Options

-i, --input <input>

Repeatable. Takes two arguments: the first is the source type, the second is the source description.

Supported source types are:

* dir
* glob

If the source type is 'dir', the source description must be a path to a directory.

If the source type is 'glob', the source description must be a glob pattern as used by Python’s built-in glob function. Whether the glob pattern is interpreted as recursive is controlled by the configuration setting GLOB_IS_RECURSIVE.
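
The effect of recursive interpretation can be seen with Python's glob module directly. The sketch below builds a scratch directory with hypothetical file names and compares the two modes; it assumes GLOB_IS_RECURSIVE corresponds to the recursive argument of glob.glob, which this page does not state explicitly.

```python
import glob
import os
import tempfile

# Scratch tree with files at three depths (names are placeholders).
root = tempfile.mkdtemp()
for relative in ["a.dcm", "sub/b.dcm", "sub/deep/c.dcm"]:
    path = os.path.join(root, *relative.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

pattern = os.path.join(root, "**", "*.dcm")


def matches(recursive):
    """Return matched paths relative to root, sorted, with '/' separators."""
    found = glob.glob(pattern, recursive=recursive)
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in found)


recursive_matches = matches(True)
flat_matches = matches(False)
print(recursive_matches)
print(flat_matches)
```

With recursive=True the pattern matches all three files. With recursive=False the ** segment behaves like a single *, so only sub/b.dcm matches.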

-o, --output <output>

The directory where the extracted images will be placed.

-c, --config-reader <config_reader>

Options to pass to the DICOM reader at initialization time.

These will depend on the chosen reader.
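
Because --input takes two arguments and may be repeated, a command line can be assembled from (source type, source description) pairs, as in the sketch below. All paths are hypothetical, and the command is only printed, not executed.

```python
import shlex

# Hypothetical sources as (source type, source description) pairs.
sources = [
    ("dir", "/data/dicom"),
    ("glob", "/data/incoming/**/*.dcm"),
]

argv = ["python3", "-m", "cleanX", "dicom", "extract"]
for source_type, description in sources:
    # Only the two source types documented above are accepted here.
    if source_type not in ("dir", "glob"):
        raise ValueError(f"unsupported source type: {source_type}")
    argv += ["--input", source_type, description]
argv += ["--output", "/data/extracted"]

command = shlex.join(argv)
print(command)
```

shlex.join puts the glob pattern in single quotes, so the shell passes it through unexpanded and the pattern reaches the program as written.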

report

python3 -m cleanX dicom report [OPTIONS]

Options

-i, --input <input>

Repeatable. Takes two arguments: the first is the source type, the second is the source description.

Supported source types are:

* dir
* glob

If the source type is 'dir', the source description must be a path to a directory.

If the source type is 'glob', the source description must be a glob pattern as used by Python’s built-in glob function. Whether the glob pattern is interpreted as recursive is controlled by the configuration setting GLOB_IS_RECURSIVE.

-o, --output <output>

The directory where the extracted images will be placed.

-c, --config-reader <config_reader>

Options to pass to the DICOM reader at initialization time.

These will depend on the chosen reader.

images

python3 -m cleanX images [OPTIONS] COMMAND [ARGS]...

restore-pipeline

python3 -m cleanX images restore-pipeline [OPTIONS]

Options

-j, --journal-dir <journal_dir>

Required. The directory where the journal is stored.

-s, --skip <skip>

The number of steps to skip before resuming the pipeline.

-r, --source <source>

A glob-like expression used to look for source images.

run-pipeline

python3 -m cleanX images run-pipeline [OPTIONS]

Options

-s, --step <step>

A step to be executed by the pipeline.

-b, --batch-size <batch_size>

How many images to process concurrently.

-j, --journal <journal>

Where to store the journal. If not specified, the default journal location is used. The default location is controlled by the JOURNAL_HOME configuration setting.

-k, --keep-journal

Whether to keep the journal after the pipeline finishes.

-r, --source <source>

A glob-like expression used to look for source images.