Innocent beginning¶
I was figuring out normalization and came across some code where the MNIST dataset was divided by 255 before normalization. Based on observations from this article: How ML models work with images?, I began to suspect that this is incorrect. Well, not incorrect, but unnecessary, because normalization should “handle” this division by itself.
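For context, a typical torchvision preprocessing pipeline looks roughly like the sketch below (assuming torchvision and the commonly quoted MNIST statistics on the [0, 1] scale). As long as the mean and std are computed on the same scale as the data, an extra manual division by 255 does not change the normalized result.
from torchvision import transforms

# ToTensor already converts uint8 pixels in [0, 255] to floats in [0, 1],
# so the "division by 255" effectively happens here.
mnist_transform = transforms.Compose([
    transforms.ToTensor(),
    # Normalize applies (x - mean) / std; 0.1307 and 0.3081 are the
    # commonly quoted MNIST mean and std on the [0, 1] scale.
    transforms.Normalize((0.1307,), (0.3081,)),
])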
I always thought that std gives you a range around the mean. However, when I calculated the mean and std of the MNIST dataset (not divided by 255), I found that $mean = 33.31$ and $std = 78.56$. But $33 - 78 = -45$, and since MNIST images do not have negative values, I realized my understanding of std was wrong.
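For reference, those numbers can be reproduced roughly like this (a sketch, assuming torchvision's MNIST dataset and its raw .data tensor of uint8 pixels, not divided by 255):
from torchvision import datasets

# Raw MNIST training images as a uint8 tensor of shape [60000, 28, 28]
mnist = datasets.MNIST(root='data', train=True, download=True)
pixels = mnist.data.float()

print(f'mean = {pixels.mean():.2f}')  # ~33.31
print(f'std = {pixels.std():.2f}')    # ~78.56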
To clarify this, I decided to check it on a simple array:
import torch
tensor_a = torch.tensor([0.,2,3,4,255])
print(f'mean = {tensor_a.mean()}')
print(f'std = {tensor_a.std()}')
mean = 52.79999923706055
std = 113.04291534423828
First step into the rabbit hole¶
I don’t know why, but I decided that it would be beneficial for me to calculate the mean and std from scratch.
I found the formula for the mean:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
tensor_sum = 0
for i in tensor_a:
    tensor_sum += i  # add up all elements
mean = tensor_sum / len(tensor_a)  # divide by the number of elements
print(mean)
tensor(52.8000)
It’s always so cool when a daunting mathematical formula
turns into simple and easily understandable code.
Now for the std formula, and we’re done. Easy peasy.
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
squared_sum_diffs = 0.
for i in tensor_a:
    squared_sum_diffs += (i - mean)**2  # sum of squared differences from the mean
variance = squared_sum_diffs / len(tensor_a)
std = torch.sqrt(variance)
print(f'std={std}')
std=101.10865020751953
Oops¶
My code and torch.std gave different results. Of course, I assumed I had made some stupid mistake. I started debugging by trying to isolate different parts. After giving up, I began looking at others’ code, only to find the same result as mine.
Finally, I realized I should check another well-known library where std also exists: numpy.
import numpy as np
numpy_a = np.array(tensor_a)
np.std(numpy_a)
101.10865
Ok, at least it’s nice to know that I didn’t make a mistake
in my implementation.
What is going on?¶
Why does numpy give one result and torch another?
That’s why you need to learn math if you want to understand ML
It turns out there are two types of std:
- Population standard deviation
- Sample standard deviation

The only difference is the divisor: for Population std, division is by $N$; for Sample std, division is by $N - 1$.
Population std is used when we have complete information about the whole population; Sample std is used when we have only a sample of the data.
With division by $N - 1$ (Bessel’s correction), we get a larger variability estimate that adjusts for the fact that we have incomplete information.
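Written out side by side, with the same notation as above:
$$\sigma_{\text{population}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \qquad s_{\text{sample}} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$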
Let’s double-check as usual
print(f'torch std={torch.std(tensor_a):.2f}')
squared_sum_diffs = 0.
for i in tensor_a:
    squared_sum_diffs += (i - mean)**2
variance = squared_sum_diffs / (len(tensor_a) - 1)  # divide by N - 1: sample std
std = torch.sqrt(variance)
print(f'manual sample std={std:.2f}')
torch std=113.04
manual sample std=113.04
print(f'numpy std={np.std(numpy_a):.2f}')
squared_sum_diffs = 0.
for i in tensor_a:
    squared_sum_diffs += (i - mean)**2
variance = squared_sum_diffs / len(tensor_a)  # divide by N: population std
std = torch.sqrt(variance)
print(f'manual population std={std:.2f}')
numpy std=101.11
manual population std=101.11
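As an aside, Python’s built-in statistics module makes the distinction explicit with two separate functions. A quick check on the same values (a sketch):
import statistics

values = [0., 2, 3, 4, 255]
print(statistics.pstdev(values))  # population std (divide by N)     -> ~101.11
print(statistics.stdev(values))   # sample std (divide by N - 1)     -> ~113.04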
So PyTorch by default uses Sample std, while NumPy uses Population std.
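Both libraries also let you choose the convention explicitly, so the results can be made to agree (a quick sketch; unbiased in torch.std and ddof in np.std control the divisor):
import numpy as np
import torch

tensor_a = torch.tensor([0., 2, 3, 4, 255])
numpy_a = tensor_a.numpy()

# Population std from both (divide by N)
print(torch.std(tensor_a, unbiased=False))  # ~101.11
print(np.std(numpy_a))                      # ~101.11

# Sample std from both (divide by N - 1)
print(torch.std(tensor_a))                  # ~113.04
print(np.std(numpy_a, ddof=1))              # ~113.04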
That’s cool. Just a few hours of figuring things out.
It could be much worse. It could be part of some big project.
Then it would be days of debugging.