cleanX Command-Line Interface
python3 -m cleanX
python3 -m cleanX [OPTIONS] COMMAND [ARGS]...
Options
- -c, --config <config>
Configuration value pairs. The values are parsed as JSON. For the list of possible values, see Config.
- -f, --config-file <config_file>
Similar to --config, but all the necessary configuration settings are provided in a file. The file must be in a JSON format suitable for Config.parse().
- -v, --verbosity <verbosity>
Controls the verbosity level set for logging. The default is logging.WARNING.
Options:
critical | debug | error | fatal | info | warning | warn | notset
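The global options above go between "cleanX" and the COMMAND in the usage line. The sketch below writes a settings file for --config-file and assembles the corresponding argument list. It assumes the file is a single JSON object mapping setting names to values; the exact schema accepted by Config.parse() is not described on this page. The setting names GLOB_IS_RECURSIVE and JOURNAL_HOME come from this page, while their values, the file name settings.json, and the helpers write_config and global_args are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Values accepted by -v/--verbosity, as listed above.
VERBOSITY_LEVELS = {"critical", "debug", "error", "fatal", "info",
                    "warning", "warn", "notset"}


def write_config(path, settings):
    """Write settings as one JSON object (assumed format for --config-file)."""
    path = Path(path)
    path.write_text(json.dumps(settings, indent=2))
    return path


def global_args(config_file, verbosity="warning"):
    """Build the global options, which precede the COMMAND."""
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unsupported verbosity: {verbosity}")
    return ["python3", "-m", "cleanX",
            "-f", str(config_file), "-v", verbosity]


tmp = Path(tempfile.mkdtemp())
# Hypothetical values: only the setting names appear on this page.
cfg = write_config(tmp / "settings.json", {
    "GLOB_IS_RECURSIVE": True,
    "JOURNAL_HOME": str(tmp / "journals"),
})
argv = global_args(cfg, "info") + ["dataset", "report"]
```

The resulting list can be passed to subprocess.run() as-is.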
dataset
python3 -m cleanX dataset [OPTIONS] COMMAND [ARGS]...
report
python3 -m cleanX dataset report [OPTIONS]
Options
- -r, --train-source <train_source>
The source of the training data (usually, a file path).
Supported source types are:
* json
* csv
- -t, --test-source <test_source>
The source of the test data (usually, a file path).
Supported source types are:
* json
* csv
- -i, --unique_id <unique_id>
The name of the column that uniquely identifies cases in the dataset (typically, the patient’s ID). If not given, the first matching column in the test and training datasets is taken to be the unique ID.
- -l, --label-tag <label_tag>
The name of the column that typically holds the diagnosis, i.e., the property being learned in this machine learning task. The default value is “Label”.
- -s, --sensitive-category <sensitive_category>
Repeatable. The name of a column that describes a property of the dataset that may exhibit bias, e.g., “gender”, “ethnicity”, etc.
- --report-duplicates, --no-report-duplicates
Whether the report should contain information about duplicates.
- --report-leakage, --no-report-leakage
Whether the report should contain information about leakage.
- --report-bias, --no-report-bias
Whether the report should contain information about bias.
- --report-understand, --no-report-understand
Whether the report should contain information about understanding.
- -o, --output <output>
The file to write the report to. If no file is given, the report is printed to stdout. Supported report formats (inferred from the file extension) are:
* txt
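A minimal sketch of assembling a dataset report invocation from Python, using only the option spellings listed above. It shows two conventions from this section: -s is repeated once per sensitive column, and each report section is toggled with its --report-… or --no-report-… form. The CSV files, their column names (patient_id, Label, gender), and the helpers write_csv and report_args are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, rows):
    """Write a list of dicts to CSV, using the first row's keys as the header."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def report_args(train, test, output, unique_id=None, label="Label",
                sensitive=(), duplicates=True, leakage=True,
                bias=True, understand=True):
    """Build a "dataset report" command line (hypothetical helper)."""
    args = ["python3", "-m", "cleanX", "dataset", "report",
            "-r", str(train), "-t", str(test),
            "-l", label, "-o", str(output)]
    if unique_id is not None:
        args += ["-i", unique_id]
    for category in sensitive:
        # -s is repeatable: one flag per sensitive column.
        args += ["-s", category]
    toggles = {"duplicates": duplicates, "leakage": leakage,
               "bias": bias, "understand": understand}
    for name, enabled in toggles.items():
        args.append(f"--report-{name}" if enabled else f"--no-report-{name}")
    return args


tmp = Path(tempfile.mkdtemp())
train = write_csv(tmp / "train.csv", [
    {"patient_id": "p1", "Label": "healthy", "gender": "F"},
    {"patient_id": "p2", "Label": "sick", "gender": "M"},
])
test = write_csv(tmp / "test.csv", [
    {"patient_id": "p2", "Label": "sick", "gender": "M"},
])
# The .txt extension selects the plain-text report format.
argv = report_args(train, test, tmp / "report.txt", unique_id="patient_id",
                   sensitive=["gender"], understand=False)
```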
dicom
python3 -m cleanX dicom [OPTIONS] COMMAND [ARGS]...
extract
python3 -m cleanX dicom extract [OPTIONS]
Options
- -i, --input <input>
Repeatable. Takes two arguments: the first is the source type, the second is the source description.
Supported source types are:
* dir
* glob
If the source type is “dir”, the source description must be a path to a directory.
If the source type is “glob”, the source description must be a glob pattern as used by Python’s built-in glob function. Whether the glob pattern is interpreted as recursive is controlled by the GLOB_IS_RECURSIVE configuration setting.
- -o, --output <output>
The directory where the extracted images will be placed.
- -c, --config-reader <config_reader>
Options to pass to the DICOM reader at initialization time.
These will depend on the chosen reader.
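The difference GLOB_IS_RECURSIVE makes can be seen with Python’s own glob.glob and its recursive parameter. This sketch assumes the setting is forwarded to that parameter, which this page implies but does not state outright. With recursive=True, "**" matches zero or more directory levels; with recursive=False it behaves like "*" and matches exactly one level. The directory layout and file names are made up for the demonstration.

```python
import glob
import os
import tempfile
from pathlib import Path

# Build a small hypothetical tree of DICOM files.
root = Path(tempfile.mkdtemp())
(root / "study1" / "series1").mkdir(parents=True)
for rel in ["top.dcm", "study1/a.dcm", "study1/series1/b.dcm"]:
    (root / rel).touch()

pattern = os.path.join(str(root), "**", "*.dcm")

# recursive=True: "**" spans any depth, so all three files match.
recursive = sorted(glob.glob(pattern, recursive=True))
# recursive=False: "**" acts like "*", so only study1/a.dcm matches.
flat = sorted(glob.glob(pattern, recursive=False))

# "-i" takes a source type followed by a source description.
argv = ["python3", "-m", "cleanX", "dicom", "extract",
        "-i", "glob", pattern,
        "-o", str(root / "extracted")]
```

When passing the pattern from a shell rather than from a Python list, quote it so the shell does not expand it before cleanX sees it.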
report
python3 -m cleanX dicom report [OPTIONS]
Options
- -i, --input <input>
Repeatable. Takes two arguments: the first is the source type, the second is the source description.
Supported source types are:
* dir
* glob
If the source type is “dir”, the source description must be a path to a directory.
If the source type is “glob”, the source description must be a glob pattern as used by Python’s built-in glob function. Whether the glob pattern is interpreted as recursive is controlled by the GLOB_IS_RECURSIVE configuration setting.
- -o, --output <output>
The directory where the extracted images will be placed.
- -c, --config-reader <config_reader>
Options to pass to the DICOM reader at initialization time.
These will depend on the chosen reader.
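Because -i is repeatable and takes two arguments, several sources become repeated "-i TYPE DESCRIPTION" groups. The hypothetical helper input_args below does that flattening and checks the two rules stated above: the source type must be dir or glob, and a dir source must be an existing directory. It is an illustration only, not part of cleanX.

```python
import tempfile
from pathlib import Path

SOURCE_TYPES = {"dir", "glob"}


def input_args(sources):
    """Flatten (source_type, description) pairs into repeated -i options."""
    args = []
    for source_type, description in sources:
        if source_type not in SOURCE_TYPES:
            raise ValueError(f"unsupported source type: {source_type!r}")
        if source_type == "dir" and not Path(description).is_dir():
            raise ValueError(f"not a directory: {description}")
        args += ["-i", source_type, str(description)]
    return args


scans = Path(tempfile.mkdtemp())
argv = ["python3", "-m", "cleanX", "dicom", "report"] + input_args([
    ("dir", scans),
    ("glob", str(scans / "**" / "*.dcm")),
])
```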
images
python3 -m cleanX images [OPTIONS] COMMAND [ARGS]...
restore-pipeline
python3 -m cleanX images restore-pipeline [OPTIONS]
Options
- -j, --journal-dir <journal_dir>
Required. The directory where the journal is stored.
- -s, --skip <skip>
The number of steps to skip before resuming the pipeline.
- -r, --source <source>
A glob-like expression used to find source images.
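A minimal sketch of a restore-pipeline invocation built as an argument list. The hypothetical helper restore_args checks that the required journal directory exists and that the skip count is non-negative before calling cleanX. Passing the list to subprocess.run() without a shell means the glob expression reaches cleanX unexpanded, which matters because --source takes a single value. The pattern /data/xrays/*.png is made up.

```python
import tempfile
from pathlib import Path


def restore_args(journal_dir, skip=0, source=None):
    """Build a restore-pipeline command line; --journal-dir is required."""
    journal_dir = Path(journal_dir)
    if not journal_dir.is_dir():
        raise FileNotFoundError(f"journal directory not found: {journal_dir}")
    if skip < 0:
        raise ValueError("skip must be a non-negative number of steps")
    args = ["python3", "-m", "cleanX", "images", "restore-pipeline",
            "-j", str(journal_dir), "-s", str(skip)]
    if source is not None:
        # Kept as one list element, so no shell expands the pattern.
        args += ["-r", source]
    return args


journal = Path(tempfile.mkdtemp())
argv = restore_args(journal, skip=2, source="/data/xrays/*.png")
```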
run-pipeline
python3 -m cleanX images run-pipeline [OPTIONS]
Options
- -s, --step <step>
A step to be executed by the pipeline.
- -b, --batch-size <batch_size>
How many images to process concurrently.
- -j, --journal <journal>
Where to store the journal. If not specified, the default journal location is used. You can control the default location by changing the JOURNAL_HOME configuration setting.
- -k, --keep-journal
Whether to keep the journal after the pipeline finishes.
- -r, --source <source>
A glob-like expression used to find source images.
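The --batch-size option bounds how many images are processed at once. The sketch below illustrates that general idea with the standard library: the images matched by a glob are split into batches, and each batch is processed concurrently by a thread pool. It is not cleanX’s implementation; process() is a hypothetical stand-in for a pipeline step, and the image files are placeholders.

```python
import glob
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def process(path):
    # Hypothetical stand-in for a pipeline step: return the file size.
    return os.path.getsize(path)


root = tempfile.mkdtemp()
for i in range(5):
    # Placeholder "images" whose sizes are 0..4 bytes.
    with open(os.path.join(root, f"img{i}.png"), "wb") as f:
        f.write(b"\x00" * i)

batch_size = 2
paths = sorted(glob.glob(os.path.join(root, "*.png")))
results = []
with ThreadPoolExecutor(max_workers=batch_size) as pool:
    for chunk in batches(paths, batch_size):
        # Each batch of at most batch_size images runs concurrently.
        results.extend(pool.map(process, chunk))
```

A larger batch size lets more images be processed at the same time, at the cost of holding more of them in memory.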