Innocent beginning¶
I was figuring out normalization and came across some code where the MNIST dataset was divided by 255 before normalization. Based on observations from this article: How ML models work with images?, I began to suspect that this is incorrect. Well, not incorrect, but unnecessary, because normalization should “handle” this division by itself.
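For context, a typical torchvision preprocessing pipeline looks roughly like the sketch below (assuming torchvision and the commonly quoted MNIST statistics on the [0, 1] scale). As long as the mean and std are computed on the same scale as the data, an extra manual division by 255 does not change the normalized result.
from torchvision import transforms

# ToTensor already converts uint8 pixels in [0, 255] to floats in [0, 1],
# so the "division by 255" effectively happens here.
mnist_transform = transforms.Compose([
    transforms.ToTensor(),
    # Normalize applies (x - mean) / std; 0.1307 and 0.3081 are the
    # commonly quoted MNIST mean and std on the [0, 1] scale.
    transforms.Normalize((0.1307,), (0.3081,)),
])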
I always thought that std gives you a range around the mean. However, when I calculated the mean and std of the MNIST dataset (not divided by 255), I found that $mean = 33.31$ and $std = 78.56$. But $33 - 78 = -45$, and since MNIST images do not have negative values, I realized my understanding of std was wrong.
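For reference, those numbers can be reproduced roughly like this (a sketch, assuming torchvision's MNIST dataset and its raw .data tensor of uint8 pixels, not divided by 255):
from torchvision import datasets

# Raw MNIST training images as a uint8 tensor of shape [60000, 28, 28]
mnist = datasets.MNIST(root='data', train=True, download=True)
pixels = mnist.data.float()

print(f'mean = {pixels.mean():.2f}')  # ~33.31
print(f'std = {pixels.std():.2f}')    # ~78.56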
To clarify this, I decided to check it on a simple array:
import torch
tensor_a = torch.tensor([0.,2,3,4,255])
print(f'mean = {tensor_a.mean()}')
print(f'std = {tensor_a.std()}')
mean = 52.79999923706055
std = 113.04291534423828
First step into the rabbit hole¶
I don’t know why, but I decided that it would be beneficial for me to calculate the mean and std from scratch.
I found the formula for the mean:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
tensor_sum = 0
for i in tensor_a:
    tensor_sum += i  # add up all elements
mean = tensor_sum / len(tensor_a)  # divide by the number of elements
print(mean)
tensor(52.8000)
It’s always so cool when a daunting mathematical formula
turns into simple and easily understandable code.
Now for the std formula, and we’re done. Easy peasy.
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
squared_sum_diffs = 0.
for i in tensor_a:
    squared_sum_diffs += (i - mean)**2  # sum of squared differences from the mean
variance = squared_sum_diffs / len(tensor_a)
std = torch.sqrt(variance)
print(f'std={std}')
std=101.10865020751953
Oops¶
My code and torch.std gave different results. Of course, I assumed I had made some stupid mistake. I started debugging by trying to isolate different parts. After giving up, I began looking at others’ code, only to find the same result as mine.
Finally, I realized I should check another well-known library where std also exists: numpy.
import numpy as np
numpy_a = np.array(tensor_a)
np.std(numpy_a)
101.10865
Ok, at least it’s nice to know that I didn’t make a mistake
in my implementation.
What is going on?¶
Why does numpy give one result and torch another?
That’s why you need to learn math if you want to understand ML
It turns out there are two types of std:
- Population standard deviation
- Sample standard deviation

The only difference is the divisor: for Population std, division is by $N$; for Sample std, division is by $N - 1$.
Population std is used when we have complete information about the whole population; Sample std is used when we have only a sample of the data.
With division by $N - 1$ (Bessel’s correction), we get a larger variability estimate that adjusts for the fact that we have incomplete information.
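Written out side by side, with the same notation as above:
$$\sigma_{\text{population}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \qquad s_{\text{sample}} = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$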
Let’s double-check as usual
print(f'torch std={torch.std(tensor_a):.2f}')
squared_sum_diffs = 0.
for i in tensor_a:
    squared_sum_diffs += (i - mean)**2
variance = squared_sum_diffs / (len(tensor_a) - 1)  # divide by N - 1: sample std
std = torch.sqrt(variance)
print(f'manual sample std={std:.2f}')
torch std=113.04
manual sample std=113.04
print(f'numpy std={np.std(numpy_a):.2f}')
squared_sum_diffs = 0.
for i in tensor_a:
    squared_sum_diffs += (i - mean)**2
variance = squared_sum_diffs / len(tensor_a)  # divide by N: population std
std = torch.sqrt(variance)
print(f'manual population std={std:.2f}')
numpy std=101.11
manual population std=101.11
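As an aside, Python’s built-in statistics module makes the distinction explicit with two separate functions. A quick check on the same values (a sketch):
import statistics

values = [0., 2, 3, 4, 255]
print(statistics.pstdev(values))  # population std (divide by N)     -> ~101.11
print(statistics.stdev(values))   # sample std (divide by N - 1)     -> ~113.04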
So PyTorch by default uses Sample std, while NumPy uses Population std.
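Both libraries also let you choose the convention explicitly, so the results can be made to agree (a quick sketch; unbiased in torch.std and ddof in np.std control the divisor):
import numpy as np
import torch

tensor_a = torch.tensor([0., 2, 3, 4, 255])
numpy_a = tensor_a.numpy()

# Population std from both (divide by N)
print(torch.std(tensor_a, unbiased=False))  # ~101.11
print(np.std(numpy_a))                      # ~101.11

# Sample std from both (divide by N - 1)
print(torch.std(tensor_a))                  # ~113.04
print(np.std(numpy_a, ddof=1))              # ~113.04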
That’s cool. Just a few hours of figuring things out.
It could be much worse. It could be part of some big project.
Then it would be days of debugging.