Intro¶
If you have read chapter 4 of Practical Deep Learning for Coders
and you have questions about all the numbers that come with image data,
like what $28*28$ or $784$ means, or why we divide tensors with image data by 255,
then this post is for you.
The usual start of working with images¶
from fastai.vision.all import *
Download the MNIST_SAMPLE image dataset with the FastAI function untar_data.
os.getcwd() + '/images'
returns the path of the notebook directory and appends
the subdirectory images
to it,
so the images will be downloaded into this [notebook dir]/images
directory.
path_to_img_dir = untar_data(URLs.MNIST_SAMPLE, data=Path(os.getcwd() + '/images'))
path_to_img_dir
Path('/home/harley/mnt/pci_ssd/jupyter_notebooks/fastai/images/mnist_sample')
After we have a directory with images, we use a few FastAI commands
and receive the images in nice batches which we can feed to our model.
db_imgs = DataBlock(blocks = (ImageBlock, CategoryBlock),
get_items=get_image_files,
splitter=RandomSplitter(seed=42),
get_y=parent_label)
dls_imgs = db_imgs.dataloaders(path_to_img_dir)
Here is our first batch of images
dls_imgs.show_batch(nrows=3, ncols=3, figsize=(4, 4))
What is going on under the hood in the previous code block?¶
Let’s simulate it on an easy example with two digits from the MNIST image dataset.
Image unpacking and loading¶
After downloading and unpacking the dataset from FastAI or from another source,
we have a directory which contains all the images.
In the MNIST dataset’s case this directory contains
subdirectories named after the digit shown in the images.
Make two lists of paths to images of 3 and 7
paths_to_threes = (path_to_img_dir/'train'/'3').ls()
paths_to_sevens = (path_to_img_dir/'train'/'7').ls()
# printing the first five paths to images of 3
for path in paths_to_threes[:5]: print(str(path)[42:])
/fastai/images/mnist_sample/train/3/43330.png
/fastai/images/mnist_sample/train/3/34239.png
/fastai/images/mnist_sample/train/3/5102.png
/fastai/images/mnist_sample/train/3/40805.png
/fastai/images/mnist_sample/train/3/3171.png
Let’s make two lists of tensors with the 3 and 7 images.
For this we will use:
- Image.open(full_path_to_image) for opening an image
- tensor(...) for converting the image to a tensor
- a list comprehension [...] for packing these tensors into a list
list_tensors_three = [tensor(Image.open(path)) for path in paths_to_threes]
list_tensors_seven = [tensor(Image.open(path)) for path in paths_to_sevens]
The length of each list is the number of images in it.
print(f"Quantity of images of 3 in the three_tensors: {len(list_tensors_three)}")
print(f"Quantity of images of 7 in the seven_tensors: {len(list_tensors_seven)}")
Quantity of images of 3 in the three_tensors: 6131
Quantity of images of 7 in the seven_tensors: 6265
Looking inside the image¶
Each item of these lists is a tensor/array which contains the image data.
list_tensors_three[5]
means the image/tensor at index 5 in our list.
list_tensors_three[5].shape
torch.Size([28, 28])
torch.Size([28, 28])
means that there are $28*28$ pixels/numbers in the tensor,
and each of these numbers is an integer from 0 to 255.
This 0-to-255 number is a greyscale value:
- 0 – black / no color
- 128 – a middle shade of grey
- 255 – white
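As a minimal sketch of this range (on a hand-made tensor, not one of the MNIST images), a uint8 tensor holds exactly these 0–255 greyscale values:

```python
import torch

# a hand-made row of three pixels: black, middle grey, white
pixels = torch.tensor([0, 128, 255], dtype=torch.uint8)

print(pixels.min().item(), pixels.max().item())  # 0 255
```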
If we look at the first 25 rows and the 14 middle columns (from the 10th to the 24th),
we will see an ASCII picture of a 3.
(If we output more than 14 columns, the notebook splits each row into two rows
and we won’t see the picture.)
list_tensors_three[5][0:25, 10:24]
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 78, 207, 254, 206, 254, 230, 144, 42, 0, 0, 0, 0],
        [0, 55, 244, 254, 253, 253, 253, 253, 253, 250, 69, 0, 0, 0],
        [0, 14, 183, 254, 184, 111, 102, 175, 253, 253, 190, 0, 0, 0],
        [0, 0, 5, 11, 4, 0, 0, 56, 253, 253, 199, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 80, 253, 253, 99, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 57, 235, 253, 206, 22, 0, 0, 0],
        [0, 0, 0, 0, 3, 104, 239, 253, 250, 30, 0, 0, 0, 0],
        [0, 33, 45, 60, 181, 253, 253, 200, 65, 0, 0, 0, 0, 0],
        [188, 237, 253, 254, 253, 253, 253, 122, 0, 0, 0, 0, 0, 0],
        [253, 253, 253, 254, 253, 253, 253, 246, 96, 0, 0, 0, 0, 0],
        [111, 111, 111, 112, 139, 234, 255, 254, 216, 12, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 31, 217, 253, 253, 22, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 133, 253, 253, 22, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 133, 253, 253, 22, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 133, 253, 253, 22, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 133, 253, 222, 14, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 53, 239, 253, 112, 0, 0, 0, 0, 0],
        [45, 45, 45, 60, 155, 237, 253, 200, 22, 0, 0, 0, 0, 0],
        [253, 253, 253, 254, 253, 253, 203, 23, 0, 0, 0, 0, 0, 0],
        [253, 253, 253, 240, 143, 52, 16, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=torch.uint8)
There is an interesting way of using pandas.DataFrame
to get a much clearer ASCII-art image from this data:
df = pd.DataFrame(list_tensors_three[5])
df.style.set_properties(**{'font-size':'6pt', 'padding': '1px'}).background_gradient('Greys_r')
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 78 | 207 | 254 | 206 | 254 | 230 | 144 | 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 55 | 244 | 254 | 253 | 253 | 253 | 253 | 253 | 250 | 69 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 183 | 254 | 184 | 111 | 102 | 175 | 253 | 253 | 190 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 11 | 4 | 0 | 0 | 56 | 253 | 253 | 199 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 80 | 253 | 253 | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 57 | 235 | 253 | 206 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 104 | 239 | 253 | 250 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 | 45 | 60 | 181 | 253 | 253 | 200 | 65 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 125 | 188 | 237 | 253 | 254 | 253 | 253 | 253 | 122 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 78 | 251 | 253 | 253 | 253 | 254 | 253 | 253 | 253 | 246 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 39 | 111 | 111 | 111 | 111 | 112 | 139 | 234 | 255 | 254 | 216 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 31 | 217 | 253 | 253 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 133 | 253 | 253 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 133 | 253 | 253 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
18 | 0 | 0 | 0 | 0 | 15 | 56 | 27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 133 | 253 | 253 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
19 | 0 | 0 | 0 | 0 | 67 | 253 | 225 | 127 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 133 | 253 | 222 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
20 | 0 | 0 | 0 | 0 | 67 | 253 | 253 | 253 | 112 | 1 | 0 | 0 | 0 | 0 | 0 | 53 | 239 | 253 | 112 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
21 | 0 | 0 | 0 | 0 | 26 | 208 | 253 | 253 | 253 | 158 | 45 | 45 | 45 | 60 | 155 | 237 | 253 | 200 | 22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
22 | 0 | 0 | 0 | 0 | 0 | 26 | 246 | 253 | 253 | 253 | 253 | 253 | 253 | 254 | 253 | 253 | 203 | 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
23 | 0 | 0 | 0 | 0 | 0 | 0 | 41 | 191 | 230 | 253 | 253 | 253 | 253 | 240 | 143 | 52 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
26 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
27 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
And here is the same image data, but rendered as a regular image:
show_image(list_tensors_three[5], cmap='grey')
<Axes: >
Preparing image data for ML¶
So we have lists of tensors which contain integers from 0 to 255.
For machine learning we need to convert these lists into tensors.
torch.stack
converts a list of tensors into a single stacked tensor (a tensor of tensors).
tensor_threes = torch.stack(list_tensors_three)
tensor_sevens = torch.stack(list_tensors_seven)
print(f"type of list_tensors_three: {type(list_tensors_three)}")
print(f"Type of tensor_threes: {type(tensor_threes)}")
print(f"Shape of tensor_threes: {tensor_threes.shape}")
type of list_tensors_three: <class 'list'>
Type of tensor_threes: <class 'torch.Tensor'>
Shape of tensor_threes: torch.Size([6131, 28, 28])
So now we have tensor_threes
with 6131 elements,
each of which contains a 28*28 image.
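The effect of torch.stack can be sketched on a smaller, hand-made example (not the MNIST data):

```python
import torch

# a Python list of two tiny 2x2 "images"
list_of_imgs = [torch.zeros(2, 2), torch.ones(2, 2)]

# stack turns the list into one tensor, adding a new
# leading dimension that indexes the images
stacked = torch.stack(list_of_imgs)

print(type(stacked))   # <class 'torch.Tensor'>
print(stacked.shape)   # torch.Size([2, 2, 2])
```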
Changing the image data’s range¶
Machine learning models usually work better with numbers from 0 to 1 or from -1 to 1,
so we need to convert our data into this smaller range.
To do this we convert the tensor to float and divide each pixel by 255.
Tensors use broadcasting,
so this division divides not the tensor object as a whole,
but each element/pixel of the tensor.
converted_tensor_threes = tensor_threes.float()/255
converted_tensor_sevens = tensor_sevens.float()/255
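A minimal sketch of this broadcasting, on a hand-made tensor rather than a real image:

```python
import torch

img = torch.tensor([[0, 51, 102],
                    [153, 204, 255]], dtype=torch.uint8)

# the scalar 255 is broadcast: every pixel is divided by it individually,
# and the shape of the tensor is unchanged
scaled = img.float() / 255

print(scaled.min().item(), scaled.max().item())  # 0.0 1.0
```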
What’s interesting: after the division we can still see this image in the DataFrame visualization,
just with different numbers inside.
df = pd.DataFrame(converted_tensor_threes[5])
df.style.set_properties(**{'font-size':'4pt', 'padding': '1px'}).background_gradient('Greys_r')
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
3 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
4 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.305882 | 0.811765 | 0.996078 | 0.807843 | 0.996078 | 0.901961 | 0.564706 | 0.164706 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
5 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.215686 | 0.956863 | 0.996078 | 0.992157 | 0.992157 | 0.992157 | 0.992157 | 0.992157 | 0.980392 | 0.270588 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
6 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.054902 | 0.717647 | 0.996078 | 0.721569 | 0.435294 | 0.400000 | 0.686275 | 0.992157 | 0.992157 | 0.745098 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
7 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.019608 | 0.043137 | 0.015686 | 0.000000 | 0.000000 | 0.219608 | 0.992157 | 0.992157 | 0.780392 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
8 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.313726 | 0.992157 | 0.992157 | 0.388235 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
9 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.223529 | 0.921569 | 0.992157 | 0.807843 | 0.086275 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
10 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.011765 | 0.407843 | 0.937255 | 0.992157 | 0.980392 | 0.117647 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
11 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.129412 | 0.176471 | 0.235294 | 0.709804 | 0.992157 | 0.992157 | 0.784314 | 0.254902 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
12 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.490196 | 0.737255 | 0.929412 | 0.992157 | 0.996078 | 0.992157 | 0.992157 | 0.992157 | 0.478431 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
13 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.305882 | 0.984314 | 0.992157 | 0.992157 | 0.992157 | 0.996078 | 0.992157 | 0.992157 | 0.992157 | 0.964706 | 0.376471 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
14 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.152941 | 0.435294 | 0.435294 | 0.435294 | 0.435294 | 0.439216 | 0.545098 | 0.917647 | 1.000000 | 0.996078 | 0.847059 | 0.047059 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
15 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.121569 | 0.850980 | 0.992157 | 0.992157 | 0.086275 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
16 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.521569 | 0.992157 | 0.992157 | 0.086275 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
17 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.521569 | 0.992157 | 0.992157 | 0.086275 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
18 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.058824 | 0.219608 | 0.105882 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.521569 | 0.992157 | 0.992157 | 0.086275 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
19 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.262745 | 0.992157 | 0.882353 | 0.498039 | 0.074510 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.521569 | 0.992157 | 0.870588 | 0.054902 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
20 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.262745 | 0.992157 | 0.992157 | 0.992157 | 0.439216 | 0.003922 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.207843 | 0.937255 | 0.992157 | 0.439216 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
21 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.101961 | 0.815686 | 0.992157 | 0.992157 | 0.992157 | 0.619608 | 0.176471 | 0.176471 | 0.176471 | 0.235294 | 0.607843 | 0.929412 | 0.992157 | 0.784314 | 0.086275 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
22 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.101961 | 0.964706 | 0.992157 | 0.992157 | 0.992157 | 0.992157 | 0.992157 | 0.992157 | 0.996078 | 0.992157 | 0.992157 | 0.796078 | 0.090196 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
23 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.160784 | 0.749020 | 0.901961 | 0.992157 | 0.992157 | 0.992157 | 0.992157 | 0.941176 | 0.560784 | 0.203922 | 0.062745 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
24 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
26 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
27 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
We can also still see this data rendered as a regular image:
show_image(converted_tensor_threes[5], cmap='grey')
<Axes: >
An observation about image data¶
These range transformations before ML training always seemed strange to me.
I used to think they were some kind of mutilation of the data.
This visualization shows that the transformation doesn’t really change the data:
we changed the magnitude of the numbers, but not the relations/proportions among the pieces of data.
Image editors like the range 0..255.
Machine learning likes the smaller range 0..1.
But the data is the same in both cases.
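We can check this on a hand-made sketch: after dividing by 255, the proportions among the pixels stay the same.

```python
import torch

raw = torch.tensor([50.0, 100.0, 200.0])
scaled = raw / 255

# each pixel's share of the total is the same before and after rescaling
print(torch.allclose(raw / raw.sum(), scaled / scaled.sum()))  # True
```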
Do ML models use information about the sides of an image?¶
Some models work with the information about side sizes, for example CNNs.
In that case we already have tensors with the data in a good format.
print(f"Shape of Converted tensor images ready for CNN: {converted_tensor_threes.shape}")
Shape of Converted tensor images ready for CNN: torch.Size([6131, 28, 28])
Some models throw away this information about side sizes, for example fully connected networks.
In that case we need to flatten our image data.
If you have read chapter 4 of Practical Deep Learning for Coders,
that’s exactly what is going on there: a linear model is used,
so the $28*28$ images need to be flattened into a row of 784 pixels.
We can do this flattening with the view method.
The number 784 is
$width * height = 28 * 28 = 784$
This method slices the image into rows and concatenates these rows from left to right.
flattened_tensor_threes = converted_tensor_threes.view(-1, 28*28)
print(f"Shape of flattened tensor images ready for a linear model: {flattened_tensor_threes.shape}")
Shape of flattened tensor images ready for a linear model: torch.Size([6131, 784])
If the previous explanation of the view
method doesn’t make sense,
let’s see the output of this method on a simpler example.
We create an “image” with sides 3 * 3 and flatten it out:
D2_tensor = torch.tensor([[1,2,3], [4,5,6], [7,8,9]])
print(D2_tensor)
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
flat_D2_tensor = D2_tensor.view(-1, 3*3)
print(flat_D2_tensor)
tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
What about color images?¶
All of this was about greyscale images. What about full-color images?
The same principles apply, but with a little trick from the visual domain.
By mixing red, green, and blue we can get any color.
So to store a full-color image we can use three greyscale images and combine them
to achieve any color.
These three mixed images are called channels.
Manual crafting of a color image¶
Let’s create a tensor with the same sides but with an additional dimension.
Earlier we used this dimension as an index of the file:
when we saw Shape of tensor_threes: torch.Size([6131, 28, 28]),
we knew that this is a tensor with 6131 images.
Now we will use this dimension as a channel:
when we see Shape of clr_tensor: torch.Size([3, 28, 28]),
we know that this is an image tensor with 3 channels and 28 * 28 sides:
- R – channel 0 is red
- G – channel 1 is green
- B – channel 2 is blue
It’s still the same dimension from the programming point of view;
we just assign a different meaning to it.
clr_tensor = torch.zeros(3,28,28)
print(f"Shape of clr_tensor: {clr_tensor.shape}")
Shape of clr_tensor: torch.Size([3, 28, 28])
Let’s look at our image
show_image(clr_tensor)
<Axes: >
It’s just black, and that’s OK, because we created the tensor filled with zeroes,
which means black in an image.
Now we will fill the first 5 rows of channel 0 (red) with ones,
so we should get a horizontal red line.
clr_tensor[0][:5]=1
show_image(clr_tensor)
<Axes: >
The next channel, 1, is green; we will put a horizontal line of ones at the bottom.
clr_tensor[1][23:28]=1
show_image(clr_tensor)
<Axes: >
Finally, channel 2 is blue.
We fill it with a vertical line of ones,
and here we can see an interesting effect of color mixing:
- top left corner – red and blue mix into magenta
- bottom left corner – blue and green mix into cyan
clr_tensor[2][:,:5]=1
show_image(clr_tensor)
<Axes: >