Expected behavior
Time stamps of log values (seen by the Python API) should be correct!
Actual behaviour
For runs measured in the (UK) summer, time stamps are an extra hour ahead of local time so all the log points appear to be during or more likely after the run. Winter times are OK.
Steps to reproduce the behaviour
Try the following:
LR=LoadMuonNexus(Filename="HIFI00134470",OutputWorkspace="exampleRun")
ws=LR.OutputWorkspace[0] # grouped, was in period mode
print "run begin=",ws.getRun().getProperty("run_start").value
print "run end=",ws.getRun().getProperty("run_end").value
t=ws.getRun().getProperty("Temp_Sample").times
print "first log timestamp",repr(t[0])
print "last log timestamp",repr(t[-1])
I get:
run begin= 2018-06-25T09:14:57
run end= 2018-06-25T09:26:59
first log timestamp numpy.datetime64('2018-06-25T09:57:16.000000000+0100')
last log timestamp numpy.datetime64('2018-06-25T10:27:00.000000000+0100')
This example run chosen as I was in the cabin at the time. The run begin and end times are correct (in BST), and the same as in the Nexus file as viewed by HDFview, where they are stored as strings. The Nexus file time stamps for log points are integer seconds, running from -1061 to +723. (The script changed temperature after the previous run and before this one started, so I expect log points before run_begin)
Platforms affected
Muon data taken by SECI during daylight savings time, saved in Nexus version 1 format. Possibly also affects neutron data, and that acquired by IBEX (we have no IBEX muon instruments yet)
The bug may have appeared with the change to numpy.datetime64 from the previous Mantid-own time format.
I note that the FloatTimeSeriesProperty also has method nthTime(index) which returns a mantid.kernel.DateAndTime object. These are correct. So I have an option for a work around (for now).
The recipe to convert mantid.kernel.DateAndTime to numpy.datetime64 on this page reproduces the bug:
http://docs.mantidproject.org/nightly/api/python/mantid/kernel/DateAndTime.html
I created an issue in github and expect to have a fix in time for the release.
Looking at this more, it is significantly more subtle than your example suggests. The string representation of numpy.datetime64
is the correct value if you notice the timestamp. The value found in ws.getRun().getProperty("run_start").value
is a time stamp in “local time” which is ambiguous (not in ISO8601 format) because it does not have a time-zone designation. The times stored in ws.getRun().getProperty("run_end").value
are proper time stamps (absolute time) and the repr
and str
functions will have them print their value using the operating system’s current time zone. For me in eastern USA during daylight savings (time-zone is EDT
or -04:00
) I get
first log timestamp numpy.datetime64('2018-06-25T04:57:16.000000000-0400')
last log timestamp numpy.datetime64('2018-06-25T05:27:00.000000000-0400')
The “better” way to print the times from a ndarray
of datetime64
is to use numpy.datetime_as_string
which allows specifying a timezone using pytz
. I haven’t been able to confirm, but it appears to use the daylight savings appropriate for the datetime itself
import numpy as np
import pytz
print np.datetime_as_string(t[0], timezone=pytz.timezone('US/Eastern'))
print np.datetime_as_string(t[0], timezone=pytz.timezone('Europe/London'))
print np.datetime_as_string(t[0], timezone=pytz.timezone('Europe/Copenhagen'))
print np.datetime_as_string(t[0], timezone=pytz.timezone('Europe/Paris'))
gives the “proper” ISO8601 strings
'2018-06-25T04:57:16.000000000-0400'
'2018-06-25T09:57:16.000000000+0100'
'2018-06-25T10:57:16.000000000+0200'
'2018-06-25T10:57:16.000000000+0200'
Numpy decided that including the timezone designation was clarifying behavior as documented here. Mantid doesn’t currently have a mechanism for inquiring where (geographically) data was measured which would be required for determining if the particular datetime was during daylight savings.
I will update the documentation, but a “fix” for the issue is not going change the behavior you see with str(t[0])
. The “best” solution I can think of is to add a method to the Facility
object to return the correct pytz
object to simplify using np.datetime_as_string
.
I think the problem is not in the final string representation of the numpy.datetime64 values but the initial creation of them. The end time of the previous run, and therefore the approximate time of the first log point in this one (I was in the cabin and witnessed it!) was 08:57:16+0100 (BST) or 07:57:16+0000 (UTC). Mantid might assume that the ambiguous string for run_start “2018-06-24T09:14:57” was in UTC and then convert it to numpy.datetime64 which when converted back in the Europe/London locale gives 10:14:57+0100 (and slightly earlier times for the first log points).
Your comments highlight that there may be more traps awaiting our users when they get back to their home institution in another time zone and look at their data with a script that isn’t very carefully written.
Originally, the Nexus file contains the ambiguous local times for “start” and “end”, and all the logs are then in seconds offset from “start”. The most important requirement for most scripts is to tell which log values were taken while the run was counting, and which were before (or after). Next, is that times of runs and logs agree with things the users note down in their lab notebook while here, or messages in the ISIS MCR news “beam off at 15:30”. So things should be consistent.
I think the proper fix ought to be:
- have a flag in the facilities file to give the “default” time zone for ambiguous data from that facility/instrument
- convert
ws.getRun().getProperty("run_start").value
and ws.getRun().getProperty("run_end").value
from strings to the same time data type as ws.getRun().getProperty(any_log).times
(using the above flag if they don’t have a time zone included). Use these converted values as the base for the time stamps of other logs. If returning something other than strings would break things, have another method such as “.value_as_datetime64”.
- the experimental control software should write
run_start
and run_end
in ISO8601 format from now on.
As a comparison, the “Sample Logs” dialog creates tables with:
- Numeric log values: the “time” column is numeric in seconds from run_start, i.e. not converted to a date/time at all.
- String log values: the “time” column is a string or similar shown as hh:mm:ss with no date or timezone information. These values agree with the “run_start”, at least for me in the same time zone as the instrument. Some care is needed for runs that were in progress at midnight. For example look at ICPEVENT.
These tables only exist at the QTIplot level and aren’t accessible from Python.
I agree that the absolute best fix would be for the data acquisition to write proper datetime strings into the files (including for the start
attribute of the logs which you’ll see below) that remove the ambiguity.
This is also a good time to point out that numpy changed the behavior of how it interprets ambiguous datetime strings (without timezone designation). One version assumes that ambiguous datetime strings are in local timezone, the other version assumes it is UTC. DateAndTimeTest.py shows this off.
Using nxbrowse to look at the contents of HIFI00134470
you’ll see that the datetime strings from both the start time and log are ambiguous:
NX/run> read start_time
start_time[19] (NX_CHAR) = 2018-06-25T09:14:57
NX/run> cd temperature_log_1/
NX/run/temperature_log_1> read name
name[11] (NX_CHAR) = Temp_Sample
available = 320
NX/run/temperature_log_1> read time
time[320] (NX_FLOAT32) = [ -1061.000000 -1055.000000 -1050.000000 ...]
units = seconds
start = 2018-06-25T09:14:57
This points out that the first log value is 1061 before the start of the run. Mantid uses the start
attribute of the log as the epoch for generating all of the times. Once mantid has read in a log, it is stored as a proper timestamp and converted to strings as appropriate. The times
method of a log returns a ndarray
of datetime64
as that is used significantly more than the string version of the datetimes. The DateAndTime documentation isn’t explicit, but you can convert the times to the seconds from the start of the log in one line:
time = (times - times[0])/np.timedelta64(1, 's')
If you want seconds from the start of the run, you’ll have to decide how to interpret that ambiguous datetime string.
I will be adding better timezone functionality for the facilities in the PR I’m working on for the release. Changing how the logs are initially read in require a bit more discussion. Another question is if we should convert the run_start
and run_end
to DateAndTime
objects on load in order to make manipulating things easier.