Numba and Computing Means
I needed a reason to visit Python JIT compilers, and in particular Numba.
Iterative Means
When I was working in a neuroscience laboratory, we needed to compute the mean and standard deviation for signal in each brain parcel. This was accomplished in C++ using the ITK library.
The first iteration of the program I wrote the means didn’t match a previously caculuated number, they were off my a small amount $\approx 0.001%$.
My boss was not having it. It didn’t matter if the discrepancy was trivial and smaller than the uncertainty of the signal numbers, that still needed fixing. He found this great article by a professional statistician, John Cook.
Large Numbers and Floating Point Rounding Error
I’m not going to attempt to explain the fine points of floating point numbers and how they are defined. You can read that on Wikipedia.
It a nut shell, we need to know that a 32 bit floating points number has 24 significant bits which is about 7 decimal digits. A 64 bit floating point number has 15 decimals of precisionSO Answer nicely summarizes.
To get a better feel for how a number is represented in floating point notation watch this 4 minute video.
If we are calculating the standard deviation of large numbers on the order of magnitude of 1e9, when we perform the substraction the rounding error can cause a negaitve number where we’d expect and require a positive number. JC’s blog post has more detail and better explanation; link.
Numba
So I use Python in my day to day programming and analysis work. The numpy mean
method is not special in any way other than being vectorized and optimzed. Which is usually enough for my needs but not always. Note, for weighted averages use the nump.average method.
To compute the iterative standard deviation we also need an iterative mean.
OK they are a tiny bit different. Before moving on to the full standard deviation calculation I want to time the executions.
Good, Numba is faster, I must be doing it somewhat correctly.
to be continued