Tom Lane
2014-10-05 21:33:20 UTC
I got interested in the problem discussed in
1. Allow tznames entries to consist of an abbreviation and the name of
a zic timezone, for example
MSK Europe/Moscow
instead of the current scheme whereby an abbreviation is defined by a
daylight-savings flag and a numeric GMT offset. When an abbreviation is
defined this way, the implied offset and DST flag are looked up
dynamically as described below. (In my message quoted above, I'd imagined
that we'd write a DST flag and a zone name, but it turns out this does not
work because there are cases where the DST property has changed over time.
Yes, really. So this design mandates that we derive the DST flag by
looking into the zic timezone data.) Note that we'll still allow the old
style of entries, and indeed prefer that way for cases where an
abbreviation has never changed meaning, because:
* We need that anyway for forwards compatibility of existing custom
abbreviations files.
* It's a lot cheaper to interpret a fixed-meaning zone abbreviation using
the existing logic than to do it as I propose here, so we shouldn't spend
the extra cycles unless necessary.
* Converting every one of the existing abbreviation-file entries would be
really tedious, so I don't want to do it where not necessary.
Also note that this doesn't touch the aspect of the existing design
whereby there are multiple potential abbreviations files. We still have
the problem that the same abbreviation can be in use in different
timezones, so we have to let users configure which zone they mean by a
given abbreviation.
2. To interpret such an abbreviation in the context of timestamptz input,
look up the referenced zic timezone, and use the meaning of the
abbreviation that prevailed at or most recently before the local time
indicated by the rest of the timestamptz string. If the abbreviation was
never used before that time in the given zone, use its earliest later
interpretation; or if it was never used at all (ie bad configuration file)
throw error. Note that this is different from what happens if you give
the underlying zone name directly. It's always been the case that you
could say, for instance, "EST" to force interpretation of a datetime as
standard time even when DST is in force, or "EDT" to force the opposite
interpretation, and this definition preserves that behavior.
3. In the context of timetz input, we only have a time of day not a full
datetime to look at, so it's not entirely clear what to do. We could
throw an error, but that would result in rejecting some inputs currently
considered valid. Perhaps we don't really care, since we consider timetz
a deprecated type anyway. If that doesn't seem OK, we could assume
today's date and the given time-of-day and look up the abbreviation's
meaning as described above. This would mean that the meaning of, say,
'15:00 MSK'::timetz would change over time --- but that happens now,
whenever we change the contents of the abbreviations file entry for MSK,
so maybe this isn't as horrid as it sounds.
4. I've eyeballed the relevant code a bit, and it seems that the only
implementation aspect that isn't perfectly straightforward is figuring
out how to cram a zic timezone reference into a datetkn table entry.
I suggest that before tackling this feature proper, we bring struct
datetkn into the 21st century by widening it from 12 to 16 bytes, along
the lines of
typedef struct
{
char token[TOKMAXLEN + 1]; /* now always null-terminated */
char type;
int32 value;
} datetkn;
and getting rid of all of the very crufty code that deals with
non-null-terminated token strings and cramming values that don't really
fit into a char-sized field into "value". (We might save more code bytes
that way than we spend on the wider token-table entries :-( ... and we'll
certainly make the code less ugly.) Having done that, the "value" can be
large enough to be an index into additional storage appended to a
TimeZoneAbbrevTable. I imagine it pointing at a struct like this:
struct DynamicTimeZoneAbbrev
{
const pg_tz *tz; /* zic timezone, or NULL if not yet looked up */
char name[1]; /* zone name (variable length string)
};
We'd resolve the timezone name into a pg_tz pointer only upon first use of
a dynamic abbreviation, since we don't want to force loading of every zone
referenced in the configuration file at startup; many sessions wouldn't
ever use them.
(I also considered just allowing struct datetkn to contain a pointer; but
adding a union would make initialization of constant datetkn arrays more
notationally painful, and perhaps impossible with older C compilers.)
5. It's worth debating whether we should back-patch such a change.
It certainly is a feature addition, and as such not something we'd
normally consider back-patching, but:
* Those time-varying zone abbreviations are out there whether we like
it or not. As Bruce noted in the other thread, this is going to be
a pain point for a lot of people, particularly in Russia.
* Our maintenance processes for the timezone data files assume that we
can back-patch the same change into all active branches. It'll be a lot
more tedious and error-prone if we can only use this feature in the most
recent branches.
So I'm inclined to propose not merely doing this, but back-patching
into all supported branches. I can see that there might be consensus
against that though.
Thoughts, objections, better ideas?
regards, tom lane
It's becoming clear to me that our existing design whereby zone
abbreviations represent fixed GMT offsets isn't really good enough.
I've been wondering whether we could change things so that, for instance,
"EDT" means "daylight time according to America/New_York" and the system
would consult the zic database to find out what the prevailing GMT offset
was in that zone on that date. This would be a lot more robust in the
face of the kind of foolishness we now see actually goes on.
Here is a fairly detailed design sketch for a solution:abbreviations represent fixed GMT offsets isn't really good enough.
I've been wondering whether we could change things so that, for instance,
"EDT" means "daylight time according to America/New_York" and the system
would consult the zic database to find out what the prevailing GMT offset
was in that zone on that date. This would be a lot more robust in the
face of the kind of foolishness we now see actually goes on.
1. Allow tznames entries to consist of an abbreviation and the name of
a zic timezone, for example
MSK Europe/Moscow
instead of the current scheme whereby an abbreviation is defined by a
daylight-savings flag and a numeric GMT offset. When an abbreviation is
defined this way, the implied offset and DST flag are looked up
dynamically as described below. (In my message quoted above, I'd imagined
that we'd write a DST flag and a zone name, but it turns out this does not
work because there are cases where the DST property has changed over time.
Yes, really. So this design mandates that we derive the DST flag by
looking into the zic timezone data.) Note that we'll still allow the old
style of entries, and indeed prefer that way for cases where an
abbreviation has never changed meaning, because:
* We need that anyway for forwards compatibility of existing custom
abbreviations files.
* It's a lot cheaper to interpret a fixed-meaning zone abbreviation using
the existing logic than to do it as I propose here, so we shouldn't spend
the extra cycles unless necessary.
* Converting every one of the existing abbreviation-file entries would be
really tedious, so I don't want to do it where not necessary.
Also note that this doesn't touch the aspect of the existing design
whereby there are multiple potential abbreviations files. We still have
the problem that the same abbreviation can be in use in different
timezones, so we have to let users configure which zone they mean by a
given abbreviation.
2. To interpret such an abbreviation in the context of timestamptz input,
look up the referenced zic timezone, and use the meaning of the
abbreviation that prevailed at or most recently before the local time
indicated by the rest of the timestamptz string. If the abbreviation was
never used before that time in the given zone, use its earliest later
interpretation; or if it was never used at all (ie bad configuration file)
throw error. Note that this is different from what happens if you give
the underlying zone name directly. It's always been the case that you
could say, for instance, "EST" to force interpretation of a datetime as
standard time even when DST is in force, or "EDT" to force the opposite
interpretation, and this definition preserves that behavior.
3. In the context of timetz input, we only have a time of day not a full
datetime to look at, so it's not entirely clear what to do. We could
throw an error, but that would result in rejecting some inputs currently
considered valid. Perhaps we don't really care, since we consider timetz
a deprecated type anyway. If that doesn't seem OK, we could assume
today's date and the given time-of-day and look up the abbreviation's
meaning as described above. This would mean that the meaning of, say,
'15:00 MSK'::timetz would change over time --- but that happens now,
whenever we change the contents of the abbreviations file entry for MSK,
so maybe this isn't as horrid as it sounds.
4. I've eyeballed the relevant code a bit, and it seems that the only
implementation aspect that isn't perfectly straightforward is figuring
out how to cram a zic timezone reference into a datetkn table entry.
I suggest that before tackling this feature proper, we bring struct
datetkn into the 21st century by widening it from 12 to 16 bytes, along
the lines of
typedef struct
{
char token[TOKMAXLEN + 1]; /* now always null-terminated */
char type;
int32 value;
} datetkn;
and getting rid of all of the very crufty code that deals with
non-null-terminated token strings and cramming values that don't really
fit into a char-sized field into "value". (We might save more code bytes
that way than we spend on the wider token-table entries :-( ... and we'll
certainly make the code less ugly.) Having done that, the "value" can be
large enough to be an index into additional storage appended to a
TimeZoneAbbrevTable. I imagine it pointing at a struct like this:
struct DynamicTimeZoneAbbrev
{
const pg_tz *tz; /* zic timezone, or NULL if not yet looked up */
char name[1]; /* zone name (variable length string)
};
We'd resolve the timezone name into a pg_tz pointer only upon first use of
a dynamic abbreviation, since we don't want to force loading of every zone
referenced in the configuration file at startup; many sessions wouldn't
ever use them.
(I also considered just allowing struct datetkn to contain a pointer; but
adding a union would make initialization of constant datetkn arrays more
notationally painful, and perhaps impossible with older C compilers.)
5. It's worth debating whether we should back-patch such a change.
It certainly is a feature addition, and as such not something we'd
normally consider back-patching, but:
* Those time-varying zone abbreviations are out there whether we like
it or not. As Bruce noted in the other thread, this is going to be
a pain point for a lot of people, particularly in Russia.
* Our maintenance processes for the timezone data files assume that we
can back-patch the same change into all active branches. It'll be a lot
more tedious and error-prone if we can only use this feature in the most
recent branches.
So I'm inclined to propose not merely doing this, but back-patching
into all supported branches. I can see that there might be consensus
against that though.
Thoughts, objections, better ideas?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers