Fitting low stats data

Mantid still doesn’t seem to have a cost function that lets Fit deal correctly with low statistics bins, especially those with zero counts. There, a simple error=sqrt(counts)=0 pins the function to zero at that point, and either changing the error to 1 or discarding those points gives a biased result. There isn’t a correct error that will make Least Squares work exactly!
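As a small illustration of the point above, here is a sketch that fits a constant to three made-up bin contents. The data values and the helper name `weighted_mean` are assumptions for illustration only, not anything from Mantid. It relies on the fact that the Poisson maximum-likelihood estimate of a constant is the plain arithmetic mean of the counts, and compares that with the two least-squares workarounds.

```python
import numpy as np

# Made-up bin contents, purely for illustration
counts = np.array([0.0, 1.0, 4.0])

def weighted_mean(values, errors):
    """Least-squares estimate of a constant with weights 1/error**2."""
    weights = 1.0 / errors**2
    return float(np.sum(weights * values) / np.sum(weights))

# Workaround 1: error = sqrt(N), but zero-count bins get error = 1
errors = np.where(counts > 0, np.sqrt(counts), 1.0)
est_error_one = weighted_mean(counts, errors)

# Workaround 2: discard the zero-count bins altogether
keep = counts > 0
est_discard = weighted_mean(counts[keep], np.sqrt(counts[keep]))

# Poisson maximum likelihood for a constant: the arithmetic mean
est_poisson = float(counts.mean())

print(est_error_one, est_discard, est_poisson)
```

For these three bins both workarounds land below the Poisson estimate, by different amounts. Three bins prove nothing in general, but they show that the choice of error for empty bins changes the answer.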

The correct cost function formula for Poisson statistics should be (for bin contents=N, fitted mean count rate=y)
y<0: forbidden (infinite cost)
y>=0, N=0: cost function=2y
y>=0, N>0: cost function=2*( (y-N) + N*(log(N)-log(y)) )
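The three cases above can be written as one function. This is only a sketch in NumPy, not Mantid code: the name `poisson_cost` and its arguments are my own, and it simply sums the per-bin cost over all bins.

```python
import numpy as np

def poisson_cost(counts, model):
    """Sum over bins of the Poisson cost for counts N and fitted values y.

    y < 0          : forbidden, returns infinity
    y >= 0, N == 0 : 2*y
    y >= 0, N > 0  : 2*((y - N) + N*(log(N) - log(y)))
    """
    counts, model = np.broadcast_arrays(
        np.asarray(counts, dtype=float), np.asarray(model, dtype=float)
    )
    if np.any(model < 0):
        return float("inf")
    # The 2*(y - N) part; with N == 0 this is already the full cost 2*y
    cost = 2.0 * (model - counts)
    nonzero = counts > 0
    # y == 0 with N > 0 gives log(0) = -inf, hence an infinite cost
    with np.errstate(divide="ignore"):
        cost[nonzero] += 2.0 * counts[nonzero] * (
            np.log(counts[nonzero]) - np.log(model[nonzero])
        )
    return float(cost.sum())
```

A minimiser would call this with `model` evaluated from the current fit parameters. The cost is zero when y equals N in every bin and stays finite for empty bins as long as y is non-negative.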

Least Squares approximates to this for very large N.
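A quick numerical check of that limit, as a sketch: expanding the N>0 cost around y=N gives (y-N)^2/N as the leading term, which is the least-squares term with error=sqrt(N). The function names below are made up for this example, and y is placed half a standard deviation above N so that the least-squares term is always 0.25.

```python
import math

def poisson_term(n, y):
    """Poisson cost for one bin with n > 0 counts and fitted value y > 0."""
    return 2.0 * ((y - n) + n * (math.log(n) - math.log(y)))

def least_squares_term(n, y):
    """Least-squares cost for one bin with error = sqrt(n)."""
    return (y - n) ** 2 / n

for n in (4, 100, 10000):
    y = n + 0.5 * math.sqrt(n)  # half a standard deviation above n
    print(n, poisson_term(n, y), least_squares_term(n, y))
```

The two columns get closer as n grows, and at n=4 they still differ visibly, which is the regime the low stats fits are in.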

You are assuming that all points have equal statistical weights. Say you have several points that you measured for different lengths of time. If you measured point 1 for one hour, we assume an error bar of 1. If you measured point 2 for one hour and got 0 counts, we should say its error bar is also 1. But what if you measured point 2 for 100 hours? Then the error bar should be 0.01. The main problem that Mantid has in this respect is that it does not carry data and statistical weights around separately.

Opened issue #14490 to add the Poisson cost function.

In the example I’m using, these values are all counts in raw data bins from a time spectrum, with the same width, and accumulated for the same number of ISIS “frames”. So the different weighting that Andrei mentions does not occur. I agree that might be relevant in other cases such as triple axis neutron measurements. The solution is to have the value “y” in my formula be the theoretical count rate multiplied by the exposure time for that bin, which would have to be known on a bin-by-bin basis, and is independent of the data itself. If fitting multiple spectra then y also has to be multiplied by that detector’s solid angle or some equivalent calibration factor.
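To make the exposure scaling concrete, here is a minimal sketch for an empty bin, where the cost is just 2y. The function name and the `exposure` and `calibration` parameters are hypothetical, and the units (counts per hour, hours) are chosen only to match the one hour versus 100 hours example earlier in the thread.

```python
def zero_count_cost(rate, exposure, calibration=1.0):
    """Cost 2*y for a bin with zero counts.

    y = rate * exposure * calibration, where exposure and calibration
    (e.g. a solid angle factor) are assumed known and positive, and are
    independent of the data.
    """
    if rate < 0:
        return float("inf")  # negative rates are forbidden
    y = rate * exposure * calibration
    return 2.0 * y

rate = 0.5  # trial count rate, counts per hour (made-up value)
cost_1h = zero_count_cost(rate, exposure=1.0)
cost_100h = zero_count_cost(rate, exposure=100.0)
print(cost_1h, cost_100h)
```

An empty bin measured for 100 hours penalises the same trial rate 100 times more than one measured for a single hour, so the longer measurement carries more weight without any separate error bar being attached to the zero.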

No progress seems to have been made in the last 2 years, but we still have lots of low stats data that has not been analysed at all, and medium stats data where we could do a better job. A particular example is the long time limit in muon spectra. Is it on anyone’s “to-do” list?


It is on Roman’s to-do list, but unfortunately he has been consistently very busy with other work, and is scheduled to remain so for the next few months. If any resource becomes available in that time, I’ll prioritize this.