Sample environment log timestamps wrong

jameslord · June 26, 2018, 1:52pm

Expected behavior

Time stamps of log values (seen by the Python API) should be correct!

Actual behaviour

For runs measured in the (UK) summer, time stamps are an extra hour ahead of local time so all the log points appear to be during or more likely after the run. Winter times are OK.

Steps to reproduce the behaviour

Try the following:

LR=LoadMuonNexus(Filename="HIFI00134470",OutputWorkspace="exampleRun")
ws=LR.OutputWorkspace[0] # grouped, was in period mode
print "run begin=",ws.getRun().getProperty("run_start").value
print "run end=",ws.getRun().getProperty("run_end").value
t=ws.getRun().getProperty("Temp_Sample").times
print "first log timestamp",repr(t[0])
print "last log timestamp",repr(t[-1])

I get:

run begin= 2018-06-25T09:14:57
run end= 2018-06-25T09:26:59
first log timestamp numpy.datetime64('2018-06-25T09:57:16.000000000+0100')
last log timestamp numpy.datetime64('2018-06-25T10:27:00.000000000+0100')

This example run chosen as I was in the cabin at the time. The run begin and end times are correct (in BST), and the same as in the Nexus file as viewed by HDFview, where they are stored as strings. The Nexus file time stamps for log points are integer seconds, running from -1061 to +723. (The script changed temperature after the previous run and before this one started, so I expect log points before run_begin)

Platforms affected

Muon data taken by SECI during daylight savings time, saved in Nexus version 1 format. Possibly also affects neutron data, and that acquired by IBEX (we have no IBEX muon instruments yet)
The bug may have appeared with the change to numpy.datetime64 from the previous Mantid-own time format.

jameslord · June 27, 2018, 10:32am

I note that the FloatTimeSeriesProperty also has method nthTime(index) which returns a mantid.kernel.DateAndTime object. These are correct. So I have an option for a work around (for now).

The recipe to convert mantid.kernel.DateAndTime to numpy.datetime64 on this page reproduces the bug:
http://docs.mantidproject.org/nightly/api/python/mantid/kernel/DateAndTime.html

peterfpeterson · July 3, 2018, 12:48pm

I created an issue in github and expect to have a fix in time for the release.

peterfpeterson · July 3, 2018, 3:08pm

Looking at this more, it is significantly more subtle than your example suggests. The string representation of numpy.datetime64 is the correct value if you notice the timestamp. The value found in ws.getRun().getProperty("run_start").value is a time stamp in “local time” which is ambiguous (not in ISO8601 format) because it does not have a time-zone designation. The times stored in ws.getRun().getProperty("run_end").value are proper time stamps (absolute time) and the repr and str functions will have them print their value using the operating system’s current time zone. For me in eastern USA during daylight savings (time-zone is EDT or -04:00) I get

first log timestamp numpy.datetime64('2018-06-25T04:57:16.000000000-0400')
last log timestamp numpy.datetime64('2018-06-25T05:27:00.000000000-0400')

The “better” way to print the times from a ndarray of datetime64 is to use numpy.datetime_as_string which allows specifying a timezone using pytz. I haven’t been able to confirm, but it appears to use the daylight savings appropriate for the datetime itself

import numpy as np
import pytz
print np.datetime_as_string(t[0], timezone=pytz.timezone('US/Eastern'))
print np.datetime_as_string(t[0], timezone=pytz.timezone('Europe/London'))
print np.datetime_as_string(t[0], timezone=pytz.timezone('Europe/Copenhagen'))
print np.datetime_as_string(t[0], timezone=pytz.timezone('Europe/Paris'))

gives the “proper” ISO8601 strings

'2018-06-25T04:57:16.000000000-0400'
'2018-06-25T09:57:16.000000000+0100'
'2018-06-25T10:57:16.000000000+0200'
'2018-06-25T10:57:16.000000000+0200'

Numpy decided that including the timezone designation was clarifying behavior as documented here. Mantid doesn’t currently have a mechanism for inquiring where (geographically) data was measured which would be required for determining if the particular datetime was during daylight savings.

I will update the documentation, but a “fix” for the issue is not going change the behavior you see with str(t[0]). The “best” solution I can think of is to add a method to the Facility object to return the correct pytz object to simplify using np.datetime_as_string.

jameslord · July 3, 2018, 3:55pm

I think the problem is not in the final string representation of the numpy.datetime64 values but the initial creation of them. The end time of the previous run, and therefore the approximate time of the first log point in this one (I was in the cabin and witnessed it!) was 08:57:16+0100 (BST) or 07:57:16+0000 (UTC). Mantid might assume that the ambiguous string for run_start “2018-06-24T09:14:57” was in UTC and then convert it to numpy.datetime64 which when converted back in the Europe/London locale gives 10:14:57+0100 (and slightly earlier times for the first log points).

Your comments highlight that there may be more traps awaiting our users when they get back to their home institution in another time zone and look at their data with a script that isn’t very carefully written.

Originally, the Nexus file contains the ambiguous local times for “start” and “end”, and all the logs are then in seconds offset from “start”. The most important requirement for most scripts is to tell which log values were taken while the run was counting, and which were before (or after). Next, is that times of runs and logs agree with things the users note down in their lab notebook while here, or messages in the ISIS MCR news “beam off at 15:30”. So things should be consistent.

I think the proper fix ought to be:

have a flag in the facilities file to give the “default” time zone for ambiguous data from that facility/instrument
convert ws.getRun().getProperty("run_start").value and ws.getRun().getProperty("run_end").value from strings to the same time data type as ws.getRun().getProperty(any_log).times (using the above flag if they don’t have a time zone included). Use these converted values as the base for the time stamps of other logs. If returning something other than strings would break things, have another method such as “.value_as_datetime64”.
the experimental control software should write run_start and run_end in ISO8601 format from now on.

jameslord · July 4, 2018, 11:24am

As a comparison, the “Sample Logs” dialog creates tables with:

Numeric log values: the “time” column is numeric in seconds from run_start, i.e. not converted to a date/time at all.
String log values: the “time” column is a string or similar shown as hh:mm:ss with no date or timezone information. These values agree with the “run_start”, at least for me in the same time zone as the instrument. Some care is needed for runs that were in progress at midnight. For example look at ICPEVENT.

These tables only exist at the QTIplot level and aren’t accessible from Python.

peterfpeterson · July 5, 2018, 3:44pm

I agree that the absolute best fix would be for the data acquisition to write proper datetime strings into the files (including for the start attribute of the logs which you’ll see below) that remove the ambiguity.

This is also a good time to point out that numpy changed the behavior of how it interprets ambiguous datetime strings (without timezone designation). One version assumes that ambiguous datetime strings are in local timezone, the other version assumes it is UTC. DateAndTimeTest.py shows this off.

Using nxbrowse to look at the contents of HIFI00134470 you’ll see that the datetime strings from both the start time and log are ambiguous:

NX/run> read start_time 
  start_time[19] (NX_CHAR) = 2018-06-25T09:14:57
NX/run> cd temperature_log_1/ 
NX/run/temperature_log_1> read name
  name[11] (NX_CHAR) = Temp_Sample

    available = 320 
NX/run/temperature_log_1> read time 
  time[320] (NX_FLOAT32) = [ -1061.000000 -1055.000000 -1050.000000 ...]
    units = seconds
    start = 2018-06-25T09:14:57

This points out that the first log value is 1061 before the start of the run. Mantid uses the start attribute of the log as the epoch for generating all of the times. Once mantid has read in a log, it is stored as a proper timestamp and converted to strings as appropriate. The times method of a log returns a ndarray of datetime64 as that is used significantly more than the string version of the datetimes. The DateAndTime documentation isn’t explicit, but you can convert the times to the seconds from the start of the log in one line:

time = (times - times[0])/np.timedelta64(1, 's')

If you want seconds from the start of the run, you’ll have to decide how to interpret that ambiguous datetime string.

I will be adding better timezone functionality for the facilities in the PR I’m working on for the release. Changing how the logs are initially read in require a bit more discussion. Another question is if we should convert the run_start and run_end to DateAndTime objects on load in order to make manipulating things easier.