Kevin Grittner
2014-06-14 23:56:44 UTC
Attached is a WIP patch for implementing the capture of delta
relations for a DML statement, in the form of two tuplestores --
one for the old versions and one for the new versions. In the
short term it is intended to make these relations available in
trigger functions, although the patch so far doesn't touch any PLs
-- it just takes things as far as providing the relations as
tuplestores in the TriggerData structure when appropriate, for the
PLs to pick up from there. It seemed best to get agreement on the
overall approach before digging into all the PLs. This is
implemented only for INSERT, UPDATE, and DELETE since it wasn't
totally clear what the use cases and proper behavior were for other
triggers. Opinions on whether we should try to provide deltas for
other cases, and if so what the semantics are, are welcome.
Once triggers can access this delta data, it will also be used for
incremental maintenance of materialized views, although I don't
want to get too sidetracked on any details of that until we have
proven delta data available in triggers. (One step at a time or
we'll never get there.)
I looked at the standard, and initially tried to implement the
standard syntax for this; however, it appeared that the reasons
given for not using standard syntax for the row variables also
apply to the transition relations (the term used by the standard).
There isn't an obvious way to tie that in to all the PLs we
support. It could be done, but it seems like it would be intolerably
ugly, and more fragile than what we have done so far.
Some things which I *did* follow from the standard: these new
relations are only allowed within AFTER triggers, but are available
in both AFTER STATEMENT and AFTER ROW triggers. That is, an AFTER
UPDATE ... FOR EACH ROW trigger could use both the OLD and NEW row
variables as well as the delta relations (under whatever names we
pick). That probably won't be used very often, but I can imagine
some cases where it might be useful. I expect that these will
normally be used in FOR EACH STATEMENT triggers.
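To make the availability rules above concrete, here is a minimal
sketch in C. It is not code from the patch: the Mock* enums and the
mock_* functions are hypothetical names used only for illustration,
and the rule that INSERT has no old rows and DELETE has no new rows
is my reading of "when appropriate", not something the patch text
spells out.

```c
#include <stdbool.h>

/* Hypothetical event and timing enums; not the real trigger flags. */
typedef enum { MOCK_INSERT, MOCK_UPDATE, MOCK_DELETE, MOCK_TRUNCATE } MockEvent;
typedef enum { MOCK_BEFORE, MOCK_AFTER, MOCK_INSTEAD } MockTiming;

/*
 * Delta relations are only offered in AFTER triggers, and only for
 * INSERT, UPDATE, and DELETE.  Row level versus statement level does
 * not matter, so it is not a parameter here.
 */
bool
mock_deltas_allowed(MockTiming timing, MockEvent event)
{
    if (timing != MOCK_AFTER)
        return false;
    return event == MOCK_INSERT || event == MOCK_UPDATE || event == MOCK_DELETE;
}

/* Assumption: old row versions exist only for UPDATE and DELETE. */
bool
mock_has_old_delta(MockTiming timing, MockEvent event)
{
    return mock_deltas_allowed(timing, event) && event != MOCK_INSERT;
}

/* Assumption: new row versions exist only for INSERT and UPDATE. */
bool
mock_has_new_delta(MockTiming timing, MockEvent event)
{
    return mock_deltas_allowed(timing, event) && event != MOCK_DELETE;
}
```

Under these assumptions an AFTER UPDATE trigger would see both
relations, while a BEFORE trigger of any kind would see neither.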
There are a couple things I would really like to get settled in
this round of review, so things don't need to be refactored in
major ways later:
(1) My first impulse was to capture this delta data in the form of
tuplestores of just TIDs, and fetch the tuples themselves from
the heap on reference. In earlier discussions others have argued
for using tuplestores of the actual rows themselves. I have taken
that advice here, but still don't feel 100% convinced. What I am
convinced of is that I don't want to write a lot of code based on
that decision and only then have people weigh in on the side of how
I had planned to do it in the first place. I hate it when that
happens.
(2) Do we want to just pick names for these in the PLs rather than
using the standard syntax? Implementing the standard syntax seemed
to require three new (unreserved) keywords, changes to the catalogs
to store the chosen relation names, and some way to tie the
specified relation names in to the various PLs. The way I have
gone here just adds two new fields to the TriggerData structure and
leaves it to each PL how to deal with that. Failure to do anything
in a PL just leaves it at the status quo with no real harm done --
it just won't have the new delta relations available.
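As a rough illustration of point (2), here is a self-contained sketch
in C. It does not use the real PostgreSQL headers: MockTuplestore and
MockTriggerData are hypothetical stand-ins, and the field names
tg_olddelta and tg_newdelta are assumed for the example, not
necessarily the names used in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tuplestore: only a row count here. */
typedef struct MockTuplestore
{
    size_t      nrows;
} MockTuplestore;

/* Hypothetical stand-in for TriggerData with two added fields. */
typedef struct MockTriggerData
{
    int             tg_event;       /* simplified existing-style field */
    MockTuplestore *tg_olddelta;    /* assumed name; NULL if not captured */
    MockTuplestore *tg_newdelta;    /* assumed name; NULL if not captured */
} MockTriggerData;

/* A PL that has not been updated never looks at the new fields. */
size_t
legacy_pl_rows_seen(const MockTriggerData *td)
{
    (void) td;
    return 0;
}

/* A PL that has been updated uses whichever stores are present. */
size_t
delta_aware_pl_rows_seen(const MockTriggerData *td)
{
    size_t      n = 0;

    if (td->tg_olddelta != NULL)
        n += td->tg_olddelta->nrows;
    if (td->tg_newdelta != NULL)
        n += td->tg_newdelta->nrows;
    return n;
}
```

The point of the sketch is that the two fields are purely additive:
a handler that ignores them behaves exactly as before, and one that
checks them for NULL can pick up the deltas when they exist.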
Of course, any other comments on the approach taken or how it can
be improved are welcome.
At this point the only testing is that make check-world completes
without problems. If we can agree on this part of it I will look
at the PLs, and create regression tests. I would probably submit
each PL implementation as a separate patch.
I was surprised that the patch to this point was so small:
5 files changed, 170 insertions(+), 19 deletions(-)
Hopefully that's not due to having missed something.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company