cleanX.image_work.image_functions module

CleanX: a library for cleaning radiological data used in machine learning applications

exception cleanX.image_work.image_functions.Cv2Error

Bases: RuntimeError

cleanX.image_work.image_functions.cv2_imread(image, *args)

cleanX.image_work.image_functions.crop_np(image_array)

Crops black edges of an image array

Parameters:

image_array (ndarray) – Image array.

Returns:

NumPy array with the image data with the black margins cropped.

Return type:

ndarray
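As a rough illustration of what cropping black margins involves, here is a minimal NumPy sketch. It is not the cleanX implementation: the helper name crop_black_margins and the threshold parameter are assumptions made for this example only.

```python
import numpy as np


def crop_black_margins(image_array, threshold=0):
    """Hypothetical stand-in: trim rows/columns whose pixels are all <= threshold."""
    mask = image_array > threshold
    if image_array.ndim == 3:
        # A pixel counts as non-black if any channel exceeds the threshold.
        mask = mask.any(axis=2)
    if not mask.any():
        # Entirely black image: nothing sensible to crop to.
        return image_array
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return image_array[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# A 6x8 black canvas with a 3x4 bright rectangle inside it.
framed = np.zeros((6, 8), dtype=np.uint8)
framed[2:5, 3:7] = 200
cropped = crop_black_margins(framed)
print(cropped.shape)
```

Only the bounding box of the non-black pixels is kept, so dark regions inside the image are left untouched.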

cleanX.image_work.image_functions.crop_np_white(image_array)

Crops white edges of an image array.

Parameters:

image_array (ndarray) – Image array.

Returns:

NumPy array with the image data with the white margins cropped.

Return type:

ndarray

cleanX.image_work.image_functions.find_outliers_sum_of_pixels_across_set(directory, percent_to_examine)

This function finds images that are outliers in terms of having a large or small total pixel sum. In most cases, once images are normalized, this correlates with under- or overexposure, or with pathology if the percent parameter is set to a low number. A value of 3% (written as 3) is recommended.

Parameters:

directory (string) – Directory of images.

Returns:

top, bottom (dataframes of highest and lowest total)

Return type:

tuple

cleanX.image_work.image_functions.hist_sum_of_pixels_across_set(directory)

This function finds the sum of pixels per image in a set of images, then turns these values into a histogram. This is useful to compare exposure across normalized groups of images.

Parameters:

directory (string) – Directory of images.

Returns:

NumPy array shown as histogram.

Return type:

ndarray

cleanX.image_work.image_functions.crop(image)

Crops a black or white frame from an image. It currently accepts NumPy arrays only; the previous version handled PIL images, and the next version is intended to handle borders of any color, not only black or white frames.

Parameters:

image (ndarray) – Image. This must be a NumPy array holding image data.

Returns:

NumPy array with the image data with the black margins cropped.

Return type:

ndarray

cleanX.image_work.image_functions.subtle_sharpie_enhance(image)

Makes a new image that is very subtly sharper to the human eye, but has new values in most of the pixels (besides the background). This augmentation has not been tested for how well its outputs match X-rays from new, well-operated machines, but the results are within a reasonable range to the human eye.

Parameters:

image (str) – String for image name

Returns:

new_image_array, a nearly imperceptibly sharpened image for humans

Return type:

ndarray

cleanX.image_work.image_functions.harsh_sharpie_enhance(image)

Makes a new image that is much sharper to the human eye, and has new values in most of the pixels (besides the background). This augmentation may allow humans to understand certain elements of an image, but should be used with care when making augmented data.

Parameters:

image (str) – String for image name

Returns:

new_image_array, a sharpened image for humans

Return type:

ndarray

cleanX.image_work.image_functions.salting(img)

This function adds some noise to an image. The noise is synthetic. It has not been tested for similarity to older machines, which also add noise.

Parameters:

img (str) – String for image name

Returns:

new_image_array, with noise

Return type:

ndarray

cleanX.image_work.image_functions.simple_rotate_no_pil(image, angle, center=None, scale=1.0)

This function works without the PIL library. It takes one picture and rotates it by the number of degrees given in the parameter angle. This function can be used with the augment_and_move function as follows (example):

from functools import partial

augment_and_move(
    'D:/my_academia/dataset/random_within_domain', 'D:/my_academia/elo',
    [partial(simple_rotate_no_pil, angle=5)],
)

Parameters:

image (Image (JPEG)) – Image.

Returns:

rotated image

Return type:

numpy.ndarray

cleanX.image_work.image_functions.blur_out_edges(image)

For an individual image, blurs out the edges as an augmentation. This augmentation is not like any real-world X-ray, but it can produce images that help force a neural net's attention away from the edges of the images.

Parameters:

image (Image (JPEG)) – Image

Returns:

blurred_edge_image, an array of an image blurred around the edges

Return type:

ndarray

cleanX.image_work.image_functions.multi_rotation_augmentation_no_pill(angle1, angle2, number_slices, image)

Works on a single image, and returns a list of augmented images which are based on twisting the angle from angle1 to angle2, with number_slices as the number of augmented images to be made from the original. It is not realistic or desirable to augment images of most X-rays by flipping them. In abdominal or chest X-rays that would create an augmentation that could imply specific pathologies, e.g. situs inversus. We suggest augmenting between angles -5 to 5.

Parameters:
  • angle1 (float) – The angle from the original to the first augmented image.

  • angle2 (float) – The angle from the original to the last augmented image.

  • number_slices (int) – Number of images to be produced.

  • image (string) – Image (string where the image is located).

Returns:

list of image arrays

Return type:

list
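To make the relationship between the two angles and the number of slices concrete, the sketch below lists evenly spaced angles for the suggested -5 to 5 range. The helper rotation_angles is hypothetical, and whether the library spaces the angles evenly or includes both endpoints is an assumption of this example.

```python
import numpy as np


def rotation_angles(angle1, angle2, number_slices):
    """Hypothetical helper: evenly spaced angles, both endpoints included (an assumption)."""
    return np.linspace(angle1, angle2, number_slices).tolist()


# Five augmentations spread across the suggested -5 to 5 degree range.
angles = rotation_angles(-5, 5, 5)
print(angles)
```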

cleanX.image_work.image_functions.show_major_lines_on_image(pic_name)

A function that takes individual images and shows suspect lines i.e. lines more likely to be non-biological.

Parameters:

pic_name (str) – String of image full name e.g. “C:/folder/image.jpg”

Returns:

shows image but technically returns a matplotlib plotted image

Return type:

matplotlib.image.AxesImage

cleanX.image_work.image_functions.find_big_lines(directory, line_length)

Finds the number of lines in images at or over the length of line_length, and gives back a DataFrame with this information. Note that lines can fold back on themselves, and every pixel is counted if they are all contiguous.

Parameters:
  • directory (Suitable for os.path.join()) – Directory with set_of_images (should include final ‘/’).

  • line_length (int) – Minimal length of lines for the function to count.

Returns:

DataFrame with column for line count at or above line_length

Return type:

DataFrame

cleanX.image_work.image_functions.separate_image_averager(set_of_images, s=5)

This function runs on a list of images to make a prototype tiny X-ray that is an average image of them. The images should be given as the strings that are the location for the image file.

Parameters:
  • set_of_images (Collection of elements suitable for os.path.join()) – Set_of_images

  • s (int) – length of sides in the image made

Returns:

image

Return type:

numpy.ndarray

cleanX.image_work.image_functions.dimensions_to_df(image_directory)

Finds dimensions on images in a folder, and makes a dataframe for exploratory data analysis.

Parameters:

image_directory (os.path.join()) – Address of folder with images (should include final ‘/’)

Returns:

image height, width and proportion height/width as a new DataFrame

Return type:

DataFrame

cleanX.image_work.image_functions.dimensions_to_histo(image_directory, bins_count=10)

Looks in the directory given, and produces a histogram of the various widths and heights. This is important information, as many neural nets take images that are all the same size. Classically, most chest X-rays are 2500 × 2000 or 2500 × 2048; however, the dataset may be different and/or varied.

Parameters:
  • image_directory (str) – Directory name (should include final ‘/’).

  • bins_count (int) – bins_count, number of bins desired (defaults to 10)

Returns:

histo_ht_wt, a labeled histogram

Return type:

tuple

cleanX.image_work.image_functions.proportions_ht_wt_to_histo(folder_name, bins_count=10)

Looks in the directory given, and produces a histogram of the various proportions of the images, obtained by dividing their heights by their widths. This is important information, as many neural nets take images that are all the same size. Classically, most chest X-rays are 2500 × 2000 or 2500 × 2048; however, the dataset may be different and/or varied.

Parameters:
  • folder_name (str) – Directory name (should include final ‘/’).

  • bins_count (int) – bins_count, number of bins desired (defaults to 10)

Returns:

histo_ht_wt_p, a labeled histogram

Return type:

tuple

cleanX.image_work.image_functions.find_very_hazy(directory)

Finds pictures that are really “hazy”, i.e. there is no real straight line because they are blurred. Usually, at least the left or right tag should give straight lines, thus this function finds images of a certain questionable technique level.

Parameters:

directory (os.path.join()) – The folder the images are in (should include final ‘/’)

Returns:

DataFrame with images sorted as hazy or regular under label_for_haze

Return type:

DataFrame

cleanX.image_work.image_functions.find_by_sample_upper(source_directory, percent_height_of_sample, value_for_line)

This function takes an average (mean) of upper pixels, and can show outliers defined by a percentage, i.e. the function shows images with an average of upper pixels in top x % where x is the percent height of the sample. Note: images with high averages in the upper pixels are likely to be inverted, upside down or otherwise different from more typical X-rays.

Parameters:
  • source_directory (os.path.join()) – folder the images are in (should include final ‘/’)

  • percent_height_of_sample (int) – From where on image to call upper

  • value_for_line (int) – From where in pixel values to call averaged values abnormal

Returns:

DataFrame with images labeled

Return type:

DataFrame

cleanX.image_work.image_functions.find_sample_upper_greater_than_lower(source_directory, percent_height_of_sample)

Takes average of upper pixels, average of lower pixels (you define what percent of picture should be considered upper and lower) and compares. In a CXR if lower average is greater than upper it may be upside down or otherwise bizarre, as the neck is smaller than the abdomen.

Parameters:
  • source_directory (os.path.join()) – folder the images are in (should include final ‘/’)

  • percent_height_of_sample (int) – From where on image to call upper or lower

Returns:

DataFrame with images labeled

Return type:

DataFrame
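The upper-versus-lower comparison can be sketched for a single array as follows. This is an illustration only: upper_lower_means is a hypothetical helper, it works on an already loaded array rather than a directory, and it leaves the decision about which direction is suspicious to the documented function.

```python
import numpy as np


def upper_lower_means(image_array, percent_height_of_sample):
    """Hypothetical helper: mean of the top and bottom bands of an image."""
    # Number of rows in each band; at least one row so tiny images still work.
    rows = max(1, int(image_array.shape[0] * percent_height_of_sample / 100))
    upper = float(image_array[:rows].mean())
    lower = float(image_array[-rows:].mean())
    return upper, lower


toy = np.zeros((10, 4), dtype=np.uint8)
toy[:2] = 30    # darker band at the top
toy[-2:] = 180  # brighter band at the bottom
upper, lower = upper_lower_means(toy, 20)
print(upper, lower, upper > lower)
```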

cleanX.image_work.image_functions.find_outliers_by_total_mean(source_directory, percentage_to_say_outliers)

Takes the average of all pixels in an image, and returns a DataFrame with those images that are outliers by mean. This function can catch some inverted or otherwise problematic images.

Parameters:
  • source_directory (os.path.join()) – folder the images are in (should include final ‘/’)

  • percentage_to_say_outliers (int) – Percentage to capture

Returns:

DataFrame made up of outliers only

Return type:

DataFrame

cleanX.image_work.image_functions.find_outliers_by_mean_to_df(source_directory, percentage_to_say_outliers)

Takes the average of all pixels in an image, and returns a DataFrame with those images classified. This function can catch some inverted or otherwise problematic images. Important note: the cut is approximate, and the function can by chance split the groups so that images with the same mean fall both inside and outside the normal range, depending on where the cut lands.

Parameters:
  • source_directory (os.path.join()) – The folder in which the images are

  • percentage_to_say_outliers (int) – Percentage to capture

Returns:

DataFrame of all images, marked as high, low or within range

Return type:

DataFrame
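One way to picture the high / low / within range labelling is with percentile cut-offs on the per-image means. The sketch below is an assumption about the general idea, not the library's code: label_by_mean is a hypothetical helper and it takes precomputed means instead of reading a directory.

```python
import numpy as np


def label_by_mean(means, percentage_to_say_outliers):
    """Hypothetical helper: label each mean as low, high or within range."""
    values = np.asarray(means, dtype=float)
    low_cut = np.percentile(values, percentage_to_say_outliers)
    high_cut = np.percentile(values, 100 - percentage_to_say_outliers)
    labels = []
    for value in values:
        if value < low_cut:
            labels.append("low")
        elif value > high_cut:
            labels.append("high")
        else:
            labels.append("within range")
    return labels


# Ten per-image means: one very dark image, one very bright image.
means = [10, 50, 52, 55, 58, 60, 61, 63, 65, 240]
labels = label_by_mean(means, 10)
print(list(zip(means, labels)))
```

Because the cut-offs are interpolated from the data, a small set of images or many tied means can move an image across the boundary, which is the approximation the note above warns about.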

cleanX.image_work.image_functions.create_matrix(width, height, default_element)

Takes width and height, then creates a matrix populated by the default element. Super handy for advanced image manipulation. Note that you cannot create matrices bigger than your computer's memory can handle. The function will therefore work on matrices with dimensions up to roughly 500*500, depending on the memory available.

Parameters:
  • width (int) – Width of the matrix to be created

  • height (int) – The height of matrix to be created

  • default_element (Union[float, int, str]) – Element to populate the matrix with

Returns:

2D matrix populated

Return type:

list
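A list-of-lists matrix of this kind can be built with a nested comprehension. The sketch below uses a hypothetical make_matrix helper and assumes the result has height rows of width elements each; the orientation used by create_matrix itself is not stated here.

```python
def make_matrix(width, height, default_element):
    """Hypothetical helper: height rows, each a fresh list of width elements."""
    return [[default_element for _ in range(width)] for _ in range(height)]


grid = make_matrix(3, 2, 0)
grid[0][1] = 7  # changing one cell must not affect the other row
print(grid)
```

The comprehension builds a new inner list per row. Writing `[[0] * width] * height` instead would repeat one shared row object, so an assignment to one row would show up in all of them.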

cleanX.image_work.image_functions.find_tiny_image_differences(directory, s=5, percentile=8)

Finds differences between a manufactured tiny image and all of your images reduced to that size. The outliers returned are likely inverted, or dramatically different in some other way. Note: the percentile returned is approximate, and may be slightly more.

Parameters:
  • directory (Suitable for os.path.join()) – Directory with source images.

  • s (int) – length to make the sides of the tiny image for comparison

  • percentile (int) – percentile to mark as abnormal

Returns:

DataFrame with a column that notes mismatches and within range images

Return type:

DataFrame

cleanX.image_work.image_functions.tesseract_specific(directory)

Finds images with text on them. Multi-lingual including English.

Parameters:

directory (Suitable for os.path.join()) – Directory with source images.

Returns:

DataFrame with a column of text found

Return type:

DataFrame

cleanX.image_work.image_functions.find_suspect_text(directory, label_word)

Finds images with a specific text you ask for on them. Multi-lingual including English. Accuracy is very high, but not perfect.

Parameters:
  • directory (Suitable for os.path.join()) – Directory with source images.

  • label_word (str) – Label word

Returns:

DataFrame with a column of text found over the length

Return type:

DataFrame

cleanX.image_work.image_functions.find_suspect_text_by_length(directory, length)

Finds images with text over a specific length (of letters, digits, and spaces), specified by you the user. Useful if you know you do not care about R and L or SUP. Multi-lingual including English. Accuracy is very high, but not perfect.

Parameters:
  • directory (Suitable for os.path.join()) – Directory with source images.

  • length (int) – Length to find above, inclusive

Returns:

DataFrame with a column of text found

Return type:

DataFrame

cleanX.image_work.image_functions.histogram_difference_for_inverts(directory)

This function looks for images by a spike on the end of their pixel value histogram to find inverted images. Note we assume classical X-rays, not inverted fluoroscopy images.

Parameters:

directory (Suitable for os.path.join()) – Directory with source images.

Returns:

a list of images suspected to be inverted

Return type:

list

cleanX.image_work.image_functions.inverts_by_sum_compare(directory)

This function looks for images and compares them to their inverts. In the case of inverted typical CXR images the sum of all pixels in the image will be higher than the sum of pixels in the un-inverted (or inverted*2) image

Parameters:

directory (Suitable for os.path.join()) – Directory with source images.

Returns:

a DataFrame with images categorized

Return type:

DataFrame
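The sum comparison described above can be sketched for a single 8-bit array. The helper looks_inverted is hypothetical, and the 0 to 255 value range is an assumption of this example.

```python
import numpy as np


def looks_inverted(image_array, max_value=255):
    """Hypothetical helper: True if the image sums higher than its own inverse."""
    data = image_array.astype(np.int64)  # avoid uint8 overflow when summing
    original_sum = int(data.sum())
    inverted_sum = int((max_value - data).sum())
    return original_sum > inverted_sum


mostly_dark = np.full((4, 4), 40, dtype=np.uint8)
mostly_bright = 255 - mostly_dark
print(looks_inverted(mostly_dark), looks_inverted(mostly_bright))
```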

cleanX.image_work.image_functions.histogram_difference_for_inverts_todf(directory)

This function looks for images by a spike on the end of their pixel value histogram to find inverted images, then puts what it found into a DataFrame. Images are listed as regulars, inverts or unclear (the unclear have equal spikes on both ends).

Parameters:

directory (Suitable for os.path.join()) – Directory with source images.

Returns:

a DataFrame with images categorized

Return type:

DataFrame

cleanX.image_work.image_functions.find_duplicated_images(directory)

Finds duplicated images with filecmp and returns a list of them. This function should be replaced with cv2_phash_for_dupes

Parameters:

directory (Suitable for os.path.join()) – Directory with source images.

Returns:

a list of duplicated images

Return type:

list

cleanX.image_work.image_functions.find_duplicated_images_todf(directory)

Finds duplicated images and returns a DataFrame of them. This function should be replaced with cv2_phash_for_dupes

Parameters:

directory (Suitable for os.path.join()) – Directory with source images.

Returns:

a DataFrame of duplicated images

Return type:

DataFrame

cleanX.image_work.image_functions.show_images_in_df(iter_ob, length_name)

Shows images by taking them off a DataFrame column, and displays them but in smaller versions, so they can be compared quickly

Parameters:
  • iter_ob (list) – List, could be a DataFrame column, use .to_list()

  • length_name (int) – Size of image name going from end

Returns:

technically no return but makes a plot of images with names

Return type:

none

cleanX.image_work.image_functions.dataframe_up_my_pics(directory, diagnosis_string)

Takes images in a directory (should all be with same label), and puts the name (with path) and label into a DataFrame

Parameters:
  • directory (Suitable for os.path.join()) – Directory with source images.

  • diagnosis_string (str) – Usually a label, may be any string

Returns:

DataFrame of pictures and label

Return type:

DataFrame

class cleanX.image_work.image_functions.Rotator(image, center=None, scale=1.0)

Bases: object

Class for rotating OpenCV images.

class RotationIterator(rotator, start, end, step)

Bases: object

Iterator implementation over a range of rotated images.

__init__(rotator, start, end, step)

Creates an instance of RotationIterator.

Parameters:
  • rotator (Rotator) – The Rotator object for which this is an iterator.

  • start (numeric) – The initial angle (in degrees).

  • end (numeric) – The final angle (in degrees).

  • step (numeric) – Increment (in degrees).

__iter__()

Implementation of the iterable protocol.

__init__(image, center=None, scale=1.0)

Creates a wrapper object that allows creation of ranges of rotation of the given image.

Parameters:
  • image (cv2.Image) – OpenCV image

  • center – Coordinate of the center of rotation (defaults to the middle of the image).

  • scale – Scale ratio of the resulting image (after rotation, defaults to 1.0).

iter(start=0, end=360, step=1)

The iter method returns a generator of images rotated at angles from start to end in increments of step. Usage example:

import cv2
from cleanX.image_work.image_functions import Rotator

image = cv2.imread('normal-frontal-chest-x-ray.jpg')
rotator = Rotator(image)
for rotated in rotator.iter(0, 360, 10):
    # shows the np arrays for the 36 (step=10) images
    print(rotated)

cleanX.image_work.image_functions.simple_spinning_template(picy, greys_template, angle_start, angle_stop, slices, threshold4=0.7)

This function compares a base image against rotated copies of a template, and returns a copy of the base image with the matched areas outlined.

Parameters:
  • picy (str) – String for image name of base image

  • greys_template (ndarray) – The image array of the template.

  • angle_start (float) – Angle to start spinning the template from; it would normally be zero if picking up the exact template itself is desired.

  • angle_stop (float) – Last angle to spin the template to.

  • slices (float) – number of different templates to make between angles

  • threshold4 (float) – A number between zero and one which sets the precision of matching. NB: .999 is stringent, .1 will pick up too much

Returns:

copy_image, a copy of base image with the template areas caught outlined in blue rectangles

Return type:

ndarray

cleanX.image_work.image_functions.make_contour_image(im)

Makes an image into a contour image.

Parameters:

im (str) – image name

Returns:

drawing, the contour image

Return type:

ndarray

cleanX.image_work.image_functions.avg_image_maker(set_of_images)

This function shows you an average sized image that has been made with the average per pixel place (in a normalized matrix) of all images averaged from the set_of_images group.

Parameters:

set_of_images (list) – A set of images, can be read in with glob.glob() on a folder of jpgs.

Returns:

final_avg, an image that is the average image of images in the set

Return type:

ndarray

cleanX.image_work.image_functions.set_image_variability(set_of_images)

This function shows you an average sized image created to show variability per pixel if all images were averaged (in terms of size) and compared. Here you will see where the variability, and therefore in some cases pathologies like pneumonia, can typically be located, as well as the patient-air interface (not all subjects are the same size) and other obviously variable aspects of your image set.

Parameters:

set_of_images (list) – A set of images, can be read in with glob.glob() on a folder of jpgs.

Returns:

Final_diff, an image that is the average variability per pixel of the images in the set.

Return type:

ndarray
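For intuition, a per-pixel average and a per-pixel variability map can be computed from a stack of equally sized arrays. This sketch makes two assumptions that are not taken from the library: the images are already the same shape (no resizing step), and variability is measured as the standard deviation. The helper name is hypothetical.

```python
import numpy as np


def average_and_variability(arrays):
    """Hypothetical helper: per-pixel mean and standard deviation of same-shape images."""
    stack = np.stack([a.astype(np.float64) for a in arrays])
    return stack.mean(axis=0), stack.std(axis=0)


first = np.array([[0, 100], [50, 200]], dtype=np.uint8)
second = np.array([[0, 200], [50, 0]], dtype=np.uint8)
avg, spread = average_and_variability([first, second])
print(avg)
print(spread)
```

Pixels that are identical across the set get a variability of zero, while pixels that differ between images stand out in the second array.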

cleanX.image_work.image_functions.avg_image_maker_by_label(master_df, dataframe_image_column, dataframe_label_column, image_folder)

This function sorts images by labels and makes an average image per label. If images are all the same size, subtracting one from the other should reveal salient differences. N.B. blending different views, e.g. PA and lateral, is not suggested.

Parameters:
  • master_df (DataFrame) – Dataframe with image location and labels (must be in image folder)

  • dataframe_image_column (str) – name of dataframe column with image location string

  • dataframe_label_column (str) – name of dataframe column with label string

  • image_folder (str) – name of folder where images are

Returns:

list of titled average images per label

Return type:

list

cleanX.image_work.image_functions.zero_to_twofivefive_simplest_norming(img_pys)

This function takes an image and makes the highest pixel value 255, and the lowest zero. Note that this will not give anything like a true normalization, but will put all images into 0 to 255 values

Parameters:

img_pys (str) – Image name.

Returns:

OpenCV image.

Return type:

cv2.Image
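The stretch to the full 0 to 255 range is a min-max rescale. The sketch below shows that arithmetic on an in-memory array; min_max_to_uint8 is a hypothetical helper and, unlike the documented function, it does not read the image from a file name.

```python
import numpy as np


def min_max_to_uint8(image_array):
    """Hypothetical helper: map the lowest pixel to 0 and the highest to 255."""
    data = image_array.astype(np.float64)
    low, high = data.min(), data.max()
    if high == low:
        # A flat image has no range to stretch; return all zeros.
        return np.zeros(image_array.shape, dtype=np.uint8)
    scaled = (data - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)


narrow = np.array([[50, 75], [100, 150]], dtype=np.uint8)
stretched = min_max_to_uint8(narrow)
print(stretched)
```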

cleanX.image_work.image_functions.rescale_range_from_histogram_low_end(img, tail_cut_percent)

This function takes an image and makes the highest pixel value 255, and the lowest zero. It also normalizes based on the histogram distribution of values, such that the lowest percent (specified by tail_cut_percent) all become zero. This function must take images where the pixel values are integers. To implement this function on an image valued from 0 to 1, multiply all pixels by 255 first. The new histogram will be more sparse, but resamples should fix the problem (presumably you will have to sample down in size for a neural net anyways)

Parameters:
  • img – NumPy array with image data.

  • tail_cut_percent (int) – Percent of histogram to be discarded from low end

Returns:

New NumPy array with image data.

Return type:

ndarray
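Discarding the low tail before rescaling can be sketched with a percentile cut. This is an illustration under assumptions: rescale_after_low_cut is a hypothetical helper, and using np.percentile to find the cut-off is this example's choice rather than the library's documented method.

```python
import numpy as np


def rescale_after_low_cut(img, tail_cut_percent):
    """Hypothetical helper: zero out the lowest tail, then stretch to 0-255."""
    data = img.astype(np.float64)
    floor = np.percentile(data, tail_cut_percent)
    top = data.max()
    if top <= floor:
        return np.zeros(img.shape, dtype=np.uint8)
    clipped = np.clip(data, floor, top)  # everything below the cut becomes the floor
    return np.round((clipped - floor) / (top - floor) * 255.0).astype(np.uint8)


# Pixel values 0..99 laid out row by row; the first row holds the lowest 10%.
ramp = np.arange(0, 100, dtype=np.uint8).reshape(10, 10)
rescaled = rescale_after_low_cut(ramp, 10)
print(rescaled[0], rescaled.max())
```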

cleanX.image_work.image_functions.make_histo_scaled_folder(imgs_folder, tail_cut_percent, target_folder)

This function takes each image inside a folder and normalizes it by the histogram. It then puts the new normalized images into a folder called the target folder (to be made by the user).

Parameters:
  • imgs_folder (str) – Folder with source images.

  • tail_cut_percent (int) – Percent of histogram to be discarded from low end

  • target_folder – Folder for the normalized images (to be made by the user)

Returns:

Target_name, but your images go into target folder with target_name.

Return type:

str

cleanX.image_work.image_functions.give_size_count_df(folder)

This function returns a dataframe of the unique sizes of the images, and how many images have such a size.

Parameters:

folder (string) – folder with jpgs

Returns:

df

Return type:

pandas.core.frame.DataFrame

cleanX.image_work.image_functions.give_size_counted_dfs(folder)

This function returns dataframes of uniquely sized images in a list

Parameters:

folder (string) – folder with jpgs

Returns:

big_sizer

Return type:

list

cleanX.image_work.image_functions.image_quality_by_size(specific_image)

This function returns the size of an image which can indicate one aspect of quality (can be used as a helper function)

Parameters:

specific_image – the jpg image

Returns:

q

Return type:

int

cleanX.image_work.image_functions.find_close_images(folder, compression_level, ref_mse)

This function finds potentially duplicated images by comparing compressed versions of the images.

Parameters:
  • folder (str) – folder with jpgs

  • compression_level (float) – size to compress down to

  • ref_mse (float) – mse is a mean squared error

Returns:

near_dupers

Return type:

DataFrame
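The idea of comparing compressed versions with a mean squared error threshold can be sketched as below. Everything here is illustrative: shrink and mse are hypothetical helpers, block averaging stands in for whatever resizing the library uses, and the image sides are assumed to be exact multiples of the target size.

```python
import numpy as np


def shrink(image, size):
    """Hypothetical helper: block-average a 2D array down to size x size."""
    h, w = image.shape
    blocks = image.astype(np.float64).reshape(size, h // size, size, w // size)
    return blocks.mean(axis=(1, 3))


def mse(a, b):
    """Mean squared error between two same-shape arrays."""
    return float(np.mean((a - b) ** 2))


base = np.arange(64, dtype=np.float64).reshape(8, 8)
near = base.copy()
near[0, 0] += 4.0      # almost the same image
flipped = base[::-1]   # a clearly different image

ref_mse = 1.0  # assumed threshold for this toy example
near_mse = mse(shrink(base, 4), shrink(near, 4))
far_mse = mse(shrink(base, 4), shrink(flipped, 4))
print(near_mse < ref_mse, far_mse < ref_mse)
```

Pairs whose error falls under the threshold are candidates for near-duplicates and still deserve a visual check.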

cleanX.image_work.image_functions.show_close_images(folder, compression_level, ref_mse, plot_limit=20)

This function shows potentially duplicated images by comparing compressed versions of the images, then displays them for inspection.

Parameters:
  • folder (str) – folder with jpgs

  • compression_level (float) – size to compress down to

  • ref_mse (float) – mse is a mean squared error

  • plot_limit (int) – How many images to plot when showing duplicates. Negative values mean to show all images.

cleanX.image_work.image_functions.image_to_histo(image)

This is a small helper function that returns the array of an image histogram.

Parameters:

image (array) – the image as an array (not filename)

Returns:

histogram

Return type:

float
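For reference, a pixel-value histogram of an 8-bit array can be produced with np.histogram. The helper histogram_256 is hypothetical, and the choice of 256 bins over the 0 to 255 range is an assumption of this sketch.

```python
import numpy as np


def histogram_256(image):
    """Hypothetical helper: counts of each pixel value 0..255."""
    counts, _ = np.histogram(image, bins=256, range=(0, 256))
    return counts


tiny = np.array([[0, 0], [255, 128]], dtype=np.uint8)
counts = histogram_256(tiny)
print(counts[0], counts[128], counts[255])
```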

cleanX.image_work.image_functions.black_end_ratio(image_array)

This function assesses a specific parameter of image quality: whether there are enough very dark/black pixels. In a normal chest X-ray we would expect black around the neck, and therefore a lot of those low values. If the image was shot without the neck, we assume poor technique (in some theoretical cases this technique might have been requested, but it is not standard at all). If the ratio is below 0.3, you have a chest X-ray that is unusual in its value distribution, and in 9.9 out of 10 cases one shot with poor technique. The images MUST be cropped of any frames and normalized to 0-255.

Parameters:

image_array (array) – the image as an array

Returns:

ratio

Return type:

float

cleanX.image_work.image_functions.outline_segment_by_otsu(image_to_transform, blur_k_size=1)

This function turns an X-ray into an outline using a specific method that involves an implementation of Otsu’s algorithm and the cv2 version of Canny. The result is line images that can be very useful in and of themselves to run a neural net on, or can be used for segmentation in some cases. blur_k_size is used in a blur that makes the lines less detailed when set to a higher value; it must be odd, with 0 < value < 100.

Parameters:
  • image_to_transform (string) – the image name

  • blur_k_size (int) – kernel size for the blur that makes the outlines less detailed; must be odd and < 100

Returns:

edges (an image with lines)

Return type:

numpy.ndarray

cleanX.image_work.image_functions.binarize_by_otsu(image_to_transform, blur_k_size)

This function turns an X-ray into a binarized image using a specific method that involves an implementation of Otsu’s algorithm. The resulting images can be very useful in and of themselves to run a neural net on, or can be used for segmentation in some cases. blur_k_size is used in a blur that makes the lines less detailed when set to a higher value; it must be odd, with 0 < value < 100.

Parameters:
  • image_to_transform (string) – the image name

  • blur_k_size (int) – kernel size for the blur that makes the outlines less detailed; must be odd and < 100

Returns:

output_image (an image binarized to 0s or 255s)

Return type:

numpy.ndarray

cleanX.image_work.image_functions.column_sum_folder(directory)

Takes the images in a directory and makes a graph for each image of the sums along horizontal or vertical lines; each graph is saved as an accompanying image. Returns a dataframe with this information for each image. It also deposits the new images into a new folder, because otherwise each run would include the newly made images. NB: This is a home-made projection algorithm. Projection algorithms can be used in image registration, and future versions of cleanX will have more efficient projection algorithms. Also note that the df will be enormous.

Parameters:

directory (string) – Directory with set_of_images.

Returns:

sumpix_df (df with info from new images of column sums)

Return type:

pandas.core.frame.DataFrame

cleanX.image_work.image_functions.blind_quality_matrix(directory)

Creates a dataframe of image quality characteristics including: Laplacian variance (somewhat correlated to blurriness/resolution), total pixel sum (somewhat correlated to exposure), a fast Fourier transform variance measure (correlated to resolution and contrast), contrast by two different measures (standard deviation and Michelson), bit depth (with an eye to a future when there may well be higher bit depths, although probably not on your screen, since at some point these distinctions go beyond the ability of the human eye), and file size divided by image area. The data frame is colored with a diverging color map (purple low, green high) so that groups of images can be compared intuitively. NB: images should be roughly comparable in dimensions for results to be meaningful.

Parameters:

directory (string) – Directory with set_of_images.

Returns:

frame (dataframe)

Return type:

pandas.io.formats.style.Styler

cleanX.image_work.image_functions.fourier_transf(image)

A Fourier-transformed image of an X-ray can provide information on everything from aliasing (moiré patterns) and other noise patterns to image orientation and potential registration, in the right hands. This creates Fourier-transformed images out of all images in a directory. This function is simply the appropriate NumPy fast Fourier transforms made into a single line of code, i.e. a “wrapper”.

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

transformed

Return type:

numpy.ndarray
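A typical NumPy recipe for viewing an image in the frequency domain is a 2D FFT, a shift that moves the zero frequency to the center, and a log of the magnitude. The sketch below shows that recipe; it is not claimed to be the exact sequence fourier_transf uses, and log_magnitude_spectrum is a hypothetical helper.

```python
import numpy as np


def log_magnitude_spectrum(image):
    """Hypothetical helper: centered log-magnitude spectrum of an image."""
    if image.ndim == 3:
        # Collapse channels to one plane (a simplification for this sketch).
        image = image.mean(axis=2)
    shifted = np.fft.fftshift(np.fft.fft2(image))
    return np.log1p(np.abs(shifted))


# A perfectly flat image has all of its energy at the zero frequency.
flat = np.full((8, 8), 10.0)
spectrum = log_magnitude_spectrum(flat)
peak = np.unravel_index(spectrum.argmax(), spectrum.shape)
print(peak)
```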

cleanX.image_work.image_functions.pad_to_size(img, ht, wt)

This function applies a padding with value 0 around the image symmetrically until it is the size specified by the ht and wt parameters. Note that if ht or wt below the existing ones are chosen, the image will be returned unpadded with a message. Note: this is not suggested as a pre-convolution padding. A pre-convolution padding can be done easily in OpenCV with the copyMakeBorder function. This function is a helper function, but can be used alone.

Parameters:
  • img (numpy.ndarray) – original image (3 or single channel)

  • ht (int) – desired image height

  • wt (int) – desired image width

Returns:

image

Return type:

numpy.ndarray
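Symmetric zero padding can be sketched with np.pad. The helper pad_symmetric is hypothetical; in particular, putting the extra pixel on the bottom or right when the difference is odd is an assumption of this example.

```python
import numpy as np


def pad_symmetric(img, ht, wt):
    """Hypothetical helper: zero-pad img to (ht, wt), or return it unchanged if too big."""
    extra_h, extra_w = ht - img.shape[0], wt - img.shape[1]
    if extra_h < 0 or extra_w < 0:
        return img
    top, left = extra_h // 2, extra_w // 2
    pad = [(top, extra_h - top), (left, extra_w - left)]
    pad += [(0, 0)] * (img.ndim - 2)  # leave any channel axis untouched
    return np.pad(img, pad, mode="constant", constant_values=0)


small = np.ones((2, 3), dtype=np.uint8)
padded = pad_symmetric(small, 5, 7)
print(padded)
```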

cleanX.image_work.image_functions.cut_to_size(img, ht, wt)

This function applies a crop around the image symmetrically until it is the ht and wt parameters specified. Note if ht or wt above the existing ones are chosen, the original image will be returned uncut, and a message will be printed.

Parameters:
  • img (numpy.ndarray) – original image (3 or single channel)

  • ht (int) – desired image height

  • wt (int) – desired image width

Returns:

image

Return type:

numpy.ndarray
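A symmetric cut amounts to a center crop. The sketch below uses a hypothetical center_crop helper and omits the printed message that the documented function gives when the requested size is too large.

```python
import numpy as np


def center_crop(img, ht, wt):
    """Hypothetical helper: crop img around its center to (ht, wt)."""
    if ht > img.shape[0] or wt > img.shape[1]:
        return img  # requested size exceeds the image; return it uncut
    top = (img.shape[0] - ht) // 2
    left = (img.shape[1] - wt) // 2
    return img[top:top + ht, left:left + wt]


grid = np.arange(36).reshape(6, 6)
middle = center_crop(grid, 2, 4)
print(middle)
```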

cleanX.image_work.image_functions.cut_or_pad(img, ht, wt)

This function applies a cropping or a padding around the image symmetrically until it is the size specified by the ht and wt parameters. Please note: what is usually appropriate for neural nets is to crop off frames, then resize all the images, then pad them all, so they are all as uniform as possible.

Parameters:
  • img (numpy.ndarray) – original image (3 or single channel)

  • ht (int) – desired image height

  • wt (int) – desired image width

Returns:

image

Return type:

numpy.ndarray

cleanX.image_work.image_functions.rotated_with_max_clean_area(image, angle)

Given an image, will rotate the image and crop off the blank triangle edges. Note: if the image is given with a triangle edge (previously rotated?) or a border, these existing edges and borders will not be cropped.

Parameters:
  • image (numpy.ndarray) – original image (3 or single channel)

  • angle (int) – desired angle for rotation

Returns:

image

Return type:

numpy.ndarray

cleanX.image_work.image_functions.noise_sum_cv(image)

Given an image, will try to sum up the noise, then divide by the area of the image. The noise summation here is based on an opencv2 algorithm for noise called fastNlMeansDenoising which is an implementation of non-local means denoising.

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

final_sum

Return type:

float

cleanX.image_work.image_functions.noise_sum_median_blur(image)

Given an image, attempts to sum up the noise, then divides by the area of the image. The noise summation here is based on median filter denoising.

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

final_sum

Return type:

float
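The general idea behind these noise_sum_* functions (denoise, measure what was removed, divide by the image area) can be sketched without OpenCV. This is a NumPy-only stand-in for a single-channel image. The names median_filter_2d and noise_per_pixel are hypothetical, and using the absolute difference between the image and its denoised version as "the noise" is an assumption about how the residual is measured.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_2d(gray, k=3):
    """k x k median filter with edge replication (stand-in for an OpenCV blur)."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    windows = sliding_window_view(padded, (k, k))
    return np.median(windows, axis=(-2, -1))


def noise_per_pixel(gray, k=3):
    """Sum of |image - denoised| divided by the number of pixels."""
    gray = gray.astype(np.float64)
    denoised = median_filter_2d(gray, k)
    # Assumption: the residual removed by the filter is treated as noise.
    return float(np.abs(gray - denoised).sum() / gray.size)
```

A perfectly flat image scores zero, while isolated bright or dark pixels raise the score. Passing a larger k (for example 5 or 7) makes the estimate sensitive to coarser repetitive patterns.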

cleanX.image_work.image_functions.noise_sum_gaussian(image)

Given an image, attempts to sum up the noise, then divides by the area of the image. The noise summation here is based on Gaussian filter denoising.

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

final_sum

Return type:

float

cleanX.image_work.image_functions.noise_sum_bilateral(image)

Given an image, attempts to sum up the noise, then divides by the area of the image. The noise summation here is based on bilateral filter denoising given a fairly large area (15 pixels).

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

final_sum

Return type:

float

cleanX.image_work.image_functions.noise_sum_bilateralLO(image)

Given an image, attempts to sum up the noise, then divides by the area of the image. The noise summation here is based on bilateral filter denoising given a fairly large area (15 pixels).

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

final_sum

Return type:

float

cleanX.image_work.image_functions.noise_sum_5k(image)

Given an image, attempts to sum up the noise, then divides by the area of the image. The noise summation here is based on median filter denoising using a 5×5 kernel. This kernel is recommended for picking up moiré patterns and other repetitive noise that may be missed by a smaller kernel.

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

final_sum

Return type:

float

cleanX.image_work.image_functions.noise_sum_7k(image)

Given an image, attempts to sum up the noise, then divides by the area of the image. The noise summation here is based on median filter denoising using a 7×7 kernel. This kernel is recommended for picking up moiré patterns and other repetitive noise that may be missed by a smaller kernel.

Parameters:

image (numpy.ndarray) – original image (3 or single channel)

Returns:

final_sum

Return type:

float

cleanX.image_work.image_functions.blind_noise_matrix(directory)

Creates a dataframe of image noise approximations produced by different algorithms, each run over the whole image. The dataframe is colored with a diverging color map (purple low, green high) so that groups of images can be compared intuitively. NB: images should be roughly comparable in dimensions for the results to be meaningful.

Parameters:

directory (string) – Directory with set_of_images.

Returns:

frame (dataframe)

Return type:

pandas.io.formats.style.Styler
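A colored table of this kind can be produced with the pandas Styler API. The sketch below is an assumption about how such a table might be built, not the cleanX code: the function names, column names, and scores are purely illustrative, and PRGn is matplotlib's purple-to-green diverging colormap, chosen here to match the description above.

```python
import pandas as pd


def noise_table(scores):
    """Build a DataFrame from {image_name: {estimator_name: score}}."""
    return pd.DataFrame.from_dict(scores, orient="index")


def style_noise_table(frame):
    """Color each column on a purple (low) to green (high) scale.

    Rendering the returned Styler requires matplotlib to be installed.
    """
    return frame.style.background_gradient(cmap="PRGn")


# Illustrative values only; real scores would come from the noise_sum_* functions.
scores = {
    "a.png": {"median": 0.8, "gaussian": 1.1},
    "b.png": {"median": 2.4, "gaussian": 2.9},
}
frame = noise_table(scores)
```

By default background_gradient scales each column independently, so estimators with different numeric ranges remain comparable within a column.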

cleanX.image_work.image_functions.segmented_blind_noise_matrix(directory)

Creates a dataframe of image noise approximations produced by different algorithms, but only on the very dark areas. Essentially this is a segmentation of the background, followed by a judgement of the noise there. The dataframe is colored with a diverging color map (purple low, green high) so that groups of images can be compared intuitively. NB: images should be roughly comparable in dimensions for the results to be meaningful.

Parameters:

directory (string) – Directory with set_of_images.

Returns:

frame (dataframe)

Return type:

pandas.io.formats.style.Styler

cleanX.image_work.image_functions.make_inverted(read_image)

Creates an inverted image from read_image.

Parameters:

read_image (numpy.ndarray) – An image.

Returns:

inverted image (black is white and white is black)

Return type:

numpy.ndarray
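For 8-bit image data, inversion amounts to subtracting every pixel from 255. The following is a minimal sketch under that assumption; invert_uint8 is a hypothetical name, and the 8-bit restriction is a simplification for the example, not a documented property of make_inverted.

```python
import numpy as np


def invert_uint8(img):
    """Return the photographic negative of an 8-bit image. Hypothetical helper."""
    if img.dtype != np.uint8:
        raise TypeError("this sketch assumes 8-bit image data")
    # 0 (black) becomes 255 (white) and vice versa; the result stays uint8.
    return 255 - img
```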

cleanX.image_work.image_functions.cv2_phash_for_dupes(origin_folder)

Finds duplicated images by using p-hashing and returns them as a DataFrame.

Parameters:

origin_folder (Suitable for os.path.join()) – Directory with source images.

Returns:

a df of duplicated images

Return type:

pandas.DataFrame
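Perceptual hashing in general reduces each image to a short fingerprint based on its low-frequency content, so that copies of the same image collide. The sketch below illustrates a DCT-based variant with NumPy only; it is not the cleanX implementation. All names are hypothetical, and the input is assumed to be a grayscale array already resized to 32×32.

```python
from collections import defaultdict

import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m


def phash_bits(gray32):
    """64-bit fingerprint: low-frequency DCT block thresholded at its median."""
    d = dct_matrix(32)
    coeffs = d @ gray32.astype(np.float64) @ d.T
    low = coeffs[:8, :8]
    return tuple((low > np.median(low)).ravel().tolist())


def group_duplicates(images):
    """Group image names whose fingerprints match exactly.

    images maps a name to a 32 x 32 grayscale array (assumed pre-resized).
    """
    groups = defaultdict(list)
    for name, arr in images.items():
        groups[phash_bits(arr)].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

Exact fingerprint matching only catches identical or nearly identical images. A looser duplicate search would compare fingerprints by Hamming distance instead.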