Chi squared for different fit minimisers

Question from @jameslord:

I’m looking at the “Cost Function” returned by fits.

I’d expect to see (weighted) chi-squared per degree of freedom. This should be close to 1.0 for a successful fit where the error bars on the input data are an accurate representation of the scatter in the points and the fit function can represent the ideal curve well. I do get this with the Levenberg-Marquardt minimiser and also from CalculateChiSquared(). All the other minimisers (including Levenberg-MarquardtMD) generally return a value which is half of this, though the fitted parameters are the same and CalculateChiSquared still gives 1.0 from the same function and final parameters.
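One plausible explanation for the factor of two (an assumption on my part, not confirmed in this thread): many least-squares libraries, GSL among them, define their internal cost as F = ½ Σ rᵢ² rather than χ² = Σ rᵢ², so a minimiser that reports its raw cost will show half the chi-squared. A quick numpy sketch with made-up data showing the two conventions:

```python
import numpy as np

# Made-up data: observed y with error bars e, and model values m
y = np.array([1.0, 2.1, 2.9, 4.2])
e = np.array([0.1, 0.1, 0.1, 0.1])
m = np.array([1.0, 2.0, 3.0, 4.0])

r = (y - m) / e                  # weighted residuals
chi2 = np.sum(r ** 2)            # the usual weighted chi-squared
gsl_cost = 0.5 * np.sum(r ** 2)  # GSL-style cost F = 1/2 * sum(r_i^2)

ndof = len(y) - 2                # say, two free parameters
chi2_per_dof = chi2 / ndof       # ~1 for a good fit with honest error bars
half_per_dof = gsl_cost / ndof   # exactly half of the above
```

If this is what is happening, the fitted parameters are unaffected (scaling the cost by ½ does not move the minimum), which matches the observation that only the reported number differs.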

I’ve observed this both with Fit() from a Python script (see attached) and using the Muon Interface.

The script actually tries different numbers of maximum iterations and returns the status and parameters so far. Sometimes Fit returns something like “Failed to converge after 10 iterations”. Shouldn’t this also appear as a warning (orange) in the Results Log window, between the usual “Fit started” and “Fit successful, duration 1.0 seconds”?

Some minimisers (the Conjugate Gradient ones for example), given plenty more iterations than necessary to work with, stop with “iteration is not making progress towards solution” rather than “success”. The parameter values and cost function look good anyway. Conversely sometimes Simplex returns “success” but its parameters aren’t quite as good as the others.

I note there’s now a page:

Should this list the convergence parameters (e.g. MaxError) that some minimisers take and under what conditions I might want to change them from the defaults?

It also says that some minimisers such as Levenberg-Marquardt and BFGS use the second derivatives of the function. Should there therefore be a way for a fit function to return these higher derivatives, where they’re straightforward to derive analytically, as it can for the first order ones?

Minor problem while writing that script: it seems that EvaluateFunction requires the OutputWorkspace property to be set, even when I'm assigning its return value (that workspace reference) to a Python variable for use as input to other algorithms later and don't care whether it goes in the ADS or not. This is unlike most other algorithms.


Another minimiser comment:

If I have a parameter with a large value and a relatively small error bar, or meaningful range, such as the line centre position on a spectrum, things can go wrong with the fit if it falls back to numerical derivatives. See the attached script.

I think Mantid’s default numerical derivative varies the parameter by 0.1%. If this shifts a sharp line by an amount comparable to its width, the derivatives will be meaningless!

I've modified some of my code to have a hard-coded offset on some sensitive parameters so they are in the range (-1, 1), and all seems to be working much better now! But there should be a warning in the documentation - and also perhaps at the conclusion of Fit() if it finds that its step size for derivatives is not small compared to the error bar on that parameter. Or should I be able to hint at an appropriate step size for each parameter, based on knowledge of the function?
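To illustrate the failure mode with a toy example (a hypothetical sharp line, not Mantid's actual derivative code): a 0.1% relative step on a parameter whose value is ~1000 is a step of 1.0, i.e. ten widths of a line of width 0.1, so the finite difference bears no relation to the true derivative. The offset trick keeps the same relative step tiny:

```python
import numpy as np

WIDTH = 0.1

def line(x, centre):
    # Hypothetical sharp Gaussian line of width 0.1
    return np.exp(-((x - centre) ** 2) / (2.0 * WIDTH ** 2))

def d_line_d_centre(x, centre):
    # Analytical derivative with respect to the centre, for comparison
    return line(x, centre) * (x - centre) / WIDTH ** 2

def forward_diff(x, centre, h):
    # Simple forward difference, as a relative-step scheme would compute it
    return (line(x, centre + h) - line(x, centre)) / h

x = np.array([999.95, 1000.0, 1000.05])

# Raw parameter ~1000: a 0.1% relative step is h = 1.0 (ten line widths),
# so the finite difference is meaningless
bad = forward_diff(x, 1000.0, h=1e-3 * 1000.0)

# Offset trick: fit centre = 1000.0 + p with p ~ 0.01; the same 0.1%
# relative step is now h = 1e-5, far smaller than the width
good = forward_diff(x, 1000.01, h=1e-3 * 0.01)
```

Here `bad` differs wildly from the analytical derivative, while `good` agrees to a fraction of a percent, which matches the behaviour described above.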

A further comment: giving a parameter an initial guess of zero, even if that’s quite close to the expected value, upsets the numerical derivative and the minimiser exits with “Singular matrix”.
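That would be consistent with a purely relative step: 0.1% of zero is zero, so every Jacobian entry for that parameter comes out zero and the normal-equations matrix is singular. A toy illustration (not Mantid's code; a common fix is to fall back to an absolute step when the parameter is near zero):

```python
def relative_step_deriv(f, p, rel_step=1e-3):
    # Purely relative step: collapses to nothing when p == 0
    h = rel_step * p
    if h == 0.0:
        return 0.0  # the Jacobian column for this parameter is all zeros
    return (f(p + h) - f(p)) / h

f = lambda p: 3.0 * p + 1.0            # true derivative is 3 everywhere

ok = relative_step_deriv(f, 2.0)       # close to 3.0, as expected
broken = relative_step_deriv(f, 0.0)   # 0.0: a zero column -> "Singular matrix"
```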

Just wondering - In the comparison of fit minimiser performance, were these tests done using analytical derivatives (specially coded C++ or Python fit function) or numerical derivatives (name=UserFunction, formula=…)? Would it be worth running both variants?

And shouldn’t the Accuracy ranking be relative to the certified values provided by NIST rather than the best that any of the currently implemented Mantid minimisers can do? (I’d expect that some do a good job and it doesn’t make a difference but it would be good to be certain)

Yes, the comparison of minimizers was done using UserFunction in all cases (no tailored fit function definitions). We should clarify that detail in the text.

We defined the ranking as relative to the best solution available in Mantid for simplicity and readability. As it stands now it is a comparison in relative terms, but not an evaluation of how good the minimizers are. In the future we should add further details on how close the best Mantid minimiser is to the best/certified solution.

In our experience, and also according to the experts from the numerical group of the SCD we are collaborating with, the NIST certified values are extremely good. For some problems we have not been able to get close to those certified best solutions. We suspect that they were obtained by generating more starting points, as it seems really challenging to obtain such good results starting from either of the two official starting points.

Thank you for your comments, James. Please don't hesitate to give us any further feedback. This was just a first version and we are planning to extend it in the next releases. Our aim is to include more specific neutron & muon datasets and fitting problems, as well as new minimizers that are being developed.

With regards to derivatives, might it be worthwhile looking into automatic differentiation again? A while ago I looked into that (a library called Adept) for completely different reasons, but never completed the work properly. There's a design document:

This was done using version 1.0; there's now a version 1.1 which apparently had some major changes, so I don't know whether the performance figures still hold.

The advantage of using automatic differentiation is that you get the same values as for analytical derivatives down to machine precision, but you don’t need to write the derivatives. In many cases it’s much faster than numerical derivatives as well. The disadvantage is that you need to use a special data type (not plain double) that records the operations on the variables and function evaluation can get a bit slower (usually not massively).
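For readers unfamiliar with the idea, here is a toy forward-mode sketch of that "special data type" in Python (Adept itself is a C++ library using reverse mode; this is only to illustrate how operator overloading propagates exact derivatives alongside the function value):

```python
import math

class Dual:
    # Minimal forward-mode AD value: carries (value, derivative) together
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        # Product rule, applied automatically at every multiplication
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

    def sin(self):
        # Chain rule for sin
        return Dual(math.sin(self.val), math.cos(self.val) * self.der)

# d/dp [p * sin(p)] at p = 2 is sin(2) + 2*cos(2), exact to machine precision
p = Dual(2.0, 1.0)   # seed derivative 1 for the active variable
y = p * p.sin()      # y.val is the function value, y.der the derivative
```

No finite-difference step size appears anywhere, which is exactly why the result matches the analytical derivative to machine precision.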

Let me know if this would still be of interest.

That Adept library sounds interesting and may have applications elsewhere, but will it work for me? In my fit functions the parameters are used to initialise some elements of NumPy matrices, which are then multiplied, inverted, diagonalised, etc., and the final fit curve is usually evaluated as a series of sine/cosine terms.

I could write my own derivatives to do numerical differencing internally with a carefully chosen step size and then fill in the Jacobian inside functionDeriv1D() as if I had defined analytical derivatives. That assumes, of course, that the minimiser uses them in the case of composite functions or multi-dataset fitting. As the calculations are to double precision, there is probably quite a large dynamic range between the step size being too large (function non-linearity) and too small (precision/rounding), and it's not too hard to write a "test" function to plot the errors in the derivatives versus the step.
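A minimal sketch of that idea, using hypothetical stand-ins for function1D/functionDeriv1D rather than the real Mantid interfaces: central differences with a per-parameter step, filled into the Jacobian column by column as if the derivatives were analytical:

```python
import numpy as np

def function1D(x, params):
    # Hypothetical model: amplitude * sin(freq * x)
    a, f = params
    return a * np.sin(f * x)

def functionDeriv1D(x, params, steps):
    # Fill the Jacobian by central differences with a carefully chosen
    # per-parameter step, as if analytical derivatives had been supplied
    jac = np.empty((len(x), len(params)))
    for j, h in enumerate(steps):
        up = list(params); up[j] += h
        dn = list(params); dn[j] -= h
        jac[:, j] = (function1D(x, up) - function1D(x, dn)) / (2.0 * h)
    return jac

x = np.linspace(0.0, 1.0, 5)
params = [2.0, 3.0]
jac = functionDeriv1D(x, params, steps=[1e-6, 1e-6])

# Analytical Jacobian for comparison: d/da = sin(f*x), d/df = a*x*cos(f*x)
exact = np.column_stack([np.sin(3.0 * x), 2.0 * x * np.cos(3.0 * x)])
```

Central differences are a good default here because their error goes as h², widening the usable range between the non-linearity and rounding limits mentioned above.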

As an aside, function1D returns the whole fit curve in one go as a numpy array while functionDeriv1D needs Npts * Nparams calls to jacobian.set(). In most simple functions the analytical derivative with respect to a parameter is best calculated as a numpy array and could be returned all in one go if there was an interface for it.
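A thin wrapper makes the point concrete (the Jacobian class below is a stand-in for the real interface, not Mantid code): an interface accepting one numpy column per parameter would reduce Npts * Nparams element-wise calls from user code to Nparams calls:

```python
import numpy as np

class Jacobian:
    # Stand-in for an element-wise Jacobian interface (hypothetical)
    def __init__(self, npts, nparams):
        self.data = np.zeros((npts, nparams))

    def set(self, i, j, value):
        self.data[i, j] = value

def set_column(jac, j, values):
    # Accept a whole numpy column in one go, forwarding to the
    # element-wise set() that the current interface requires
    for i, v in enumerate(values):
        jac.set(i, j, v)

x = np.linspace(0.0, 1.0, 4)
jac = Jacobian(len(x), 1)
set_column(jac, 0, np.cos(x))  # one call per parameter from user code
```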