Discussion: Question about RI checks
Nick Barnes
2014-10-16 13:51:29 UTC
One of the queries in ri_triggers.c has me a little baffled.

For (relatively) obvious reasons, a FK insert triggers a SELECT 1 FROM
pk_rel ... FOR KEY SHARE.
For not-so-obvious reasons, a PK delete triggers a SELECT 1 FROM fk_rel ...
FOR KEY SHARE.
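
For concreteness, the generated SQL looks roughly like this (simplified --
the real queries quote and schema-qualify everything; pkcol / fkcol stand in
for the actual key columns):

    -- fired when a row is inserted into fk_rel:
    SELECT 1 FROM ONLY pk_rel x WHERE pkcol = $1 FOR KEY SHARE OF x;

    -- fired when a row is deleted from pk_rel (NO ACTION / RESTRICT):
    SELECT 1 FROM ONLY fk_rel x WHERE $1 = fkcol FOR KEY SHARE OF x;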

I can't see what the lock on fk_rel achieves. Both operations are already
contending for the lock on the PK row, which seems like enough to cover
every eventuality.

And even if the lock serves a purpose, KEY SHARE is an odd choice, since
the referencing field is, in general, not a "key" in this sense.

Can anyone provide an explanation / counterexample?
Florian Pflug
2014-10-21 15:04:49 UTC
(CCing Alvaro, since he implemented KEY SHARE locks)
Post by Nick Barnes
One of the queries in ri_triggers.c has me a little baffled.
For (relatively) obvious reasons, a FK insert triggers a SELECT 1 FROM pk_rel ... FOR KEY SHARE.
For not-so-obvious reasons, a PK delete triggers a SELECT 1 FROM fk_rel ... FOR KEY SHARE.
I can't see what the lock on fk_rel achieves. Both operations are already
contending for the lock on the PK row, which seems like enough to cover
every eventuality.
I remember wondering about this too, quite a while ago. That was before we
had KEY locks, i.e. it simply read "FOR SHARE" back then.

I don't think I ever figured out if and why that lock is necessary - so here's
an attempt at unraveling this.

The lock certainly isn't required to ensure that we see all the child rows in
the fk_rel -- since we're doing this in an AFTER trigger, we're already holding
the equivalent of an UPDATE lock on the parent row, so no new fk_rel rows can
appear concurrently. Note that "seeing" here refers to an up-to-date snapshot
-- in REPEATABLE READ mode and above, our transaction's snapshot doesn't
necessarily see those rows, but ri_PerformCheck ensures that we error out if our
transaction's snapshot prevents us from seeing any newly added child rows
(cf. detectNewRows).
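
To make that concrete, here's an illustrative interleaving with a hypothetical
pk_rel/fk_rel pair where fk_rel.pk_id references pk_rel.id:

A:: delete from pk_rel where id = 1;
B:: insert into fk_rel (id, pk_id) values (2, 1);
->  B's RI trigger runs SELECT 1 FROM pk_rel ... FOR KEY SHARE on the parent
    row and blocks behind A's pending delete until A commits or aborts, so
    no new child row can slip in unseen.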

However, deleting from fk_rel isn't restricted by any of the constraint triggers,
so child rows may still be deleted concurrently. So without the lock, it might
happen that we report a constraint violation, yet the offending row is already
gone by the time we report the error. Note that ri_PerformCheck's detectNewRows
flag doesn't enter here -- AFAIK, it only raises an error if the transaction's
snapshot sees *fewer* rows than a current snapshot.

So AFAICS, the current behaviour (sans the KEY SHARE / SHARE confusion, see
below) is this:

In READ COMMITTED mode, we'll ignore rows that are deleted or reparented
concurrently (after waiting for the concurrent transaction to commit, of course).

In REPEATABLE READ mode and above, we'll report a serialization error if a child
row is deleted or reparented concurrently (if the concurrent transaction committed,
of course).

Without the lock, we'd report a constraint violation in both cases.
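
For illustration, in READ COMMITTED (same hypothetical pk_rel/fk_rel pair,
with a single child row referencing pk row 1):

A:: begin;
A:: delete from fk_rel where pk_id = 1;   -- removes the only child
B:: delete from pk_rel where id = 1;
->  B's RI check takes FOR KEY SHARE on the child row and blocks behind A
A:: commit;
->  B's check now ignores the deleted child and the parent delete succeeds;
    without the lock, B would have reported a foreign key violation against
    a row that is already gone by the time the error is raised.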

So in conclusion, the lock avoids raising constraint violation errors in
a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
constraint violation errors into serialization failures. Or at least that's
how it looks to me.
Post by Nick Barnes
And even if the lock serves a purpose, KEY SHARE is an odd choice, since
the referencing field is, in general, not a "key" in this sense.
Hm, yeah, that's certainly weird. AFAICS, what this will do is prevent any
concurrent update of any columns mentioned in any unique index on the *fk_rel*
- but as you say, that set doesn't necessarily cover the set of columns in the
FK constraint at all.

So currently, it seems that the lock only prevents concurrent DELETEs, but
not necessarily concurrent UPDATEs, even if such an UPDATE changes the parent
that a child row refers to.
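
AFAICS that plays out like this (same hypothetical tables; since fk_rel.pk_id
is not part of any unique index on fk_rel, an UPDATE of it only takes a
FOR NO KEY UPDATE row lock, which doesn't conflict with FOR KEY SHARE):

A:: begin;
A:: update fk_rel set pk_id = 2 where id = 1;   -- reparent the only child of pk row 1
B:: delete from pk_rel where id = 1;
->  B's RI check neither blocks the reparenting nor waits for it -- it locks
    the still-visible old row version and raises a foreign key violation
    right away, even though A is about to make the violation moot.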

Independent from whether the lock is actually desirable or not, that
inconsistency certainly looks like a bug to me...

best regards,
Florian Pflug
Kevin Grittner
2014-10-21 16:19:55 UTC
Post by Florian Pflug
So in conclusion, the lock avoids raising constraint violation errors in
a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
constraint violation errors into serialization failures. Or at least that's
how it looks to me.
It doesn't seem like this analysis considers all of the available ON
DELETE and ON UPDATE behaviors. Besides RESTRICT there is
CASCADE, SET NULL, SET DEFAULT, and NO ACTION. Some of those
require updating the referencing rows.
Post by Florian Pflug
Post by Nick Barnes
And even if the lock serves a purpose, KEY SHARE is an odd choice, since
the referencing field is, in general, not a "key" in this sense.
Hm, yeah, that's certainly weird.
I don't think I understand that either.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Nick Barnes
2014-10-21 17:27:57 UTC
Post by Kevin Grittner
It doesn't seem like this analysis considers all of the available ON
DELETE and ON UPDATE behaviors. Besides RESTRICT there is
CASCADE, SET NULL, SET DEFAULT, and NO ACTION. Some of those
require updating the referencing rows.
I think the logic in question is specific to RESTRICT and NO ACTION. The
other cases don't look like they need to explicitly lock anything; the
UPDATE / DELETE itself should take care of that.
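
For example, for ON DELETE CASCADE and ON DELETE SET NULL the trigger issues
roughly the following (fk_rel / fkcol again being placeholders), and those
statements lock the affected child rows as part of normal DELETE / UPDATE
processing:

    DELETE FROM ONLY fk_rel WHERE $1 = fkcol;
    UPDATE ONLY fk_rel SET fkcol = NULL WHERE $1 = fkcol;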
Florian Pflug
2014-10-22 10:31:12 UTC
Post by Florian Pflug
So in conclusion, the lock avoids raising constraint violation errors in
a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
constraint violation errors into serialization failures. Or at least that's
how it looks to me.
I got the REPEATABLE READ case wrong there, and that case is actually broken!
I initially assumed that trying to lock a row whose latest version is invisible
to a transaction in mode REPEATABLE READ or above would result in a serialization
error. But for the queries run by ri_triggers.c, this is not the case.

AFAICS, the only executor node which takes the crosscheck snapshot into account
is nodeModifyTable. So both a plain SELECT and a SELECT FOR [KEY] SHARE | UPDATE
will simply run with the snapshot passed to the SPI, which for the query in
question is always a *current* snapshot (we pass the original snapshot as the
crosscheck snapshot in REPEATABLE READ). Thus, if there's a child row that
is visible to the transaction deleting the parent row, but that child was meanwhile
removed, it seems that we currently *won't* complain. The same holds, of course,
for mode SERIALIZABLE -- we don't distinguish the two in ri_triggers.c.

But that's wrong. The transaction's snapshot *would* see that row, so we
ought to raise an error. Note that this applies also to mode SERIALIZABLE, and
breaks true serializability in some cases, since we don't do conflict detection
for RI enforcement queries.

Here's a test case, involving two transaction A and B. I tried this on
REL9_4_STABLE.

Setup
-----
CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
CREATE TABLE child (id int NOT NULL PRIMARY KEY,
                    parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (1, 1);

Failure Case
------------
A:: set default_transaction_isolation='serializable';
A:: begin;
A:: select * from child;
->  id | parent_id
    ----+-----------
      1 |         1
B:: set default_transaction_isolation='serializable';
B:: begin;
B:: delete from child;
-> DELETE 1
B:: commit;
A:: delete from parent;
-> DELETE 1
A:: commit;

A can neither come before B in any serializable history (since the DELETE
should fail then), nor after it (since it shouldn't see the row in the child
table then). Yet we don't complain, even though both transactions run in
SERIALIZABLE mode.
Post by Florian Pflug
Post by Kevin Grittner
It doesn't seem like this analysis considers all of the available ON
DELETE and ON UPDATE behaviors. Besides RESTRICT there is
CASCADE, SET NULL, SET DEFAULT, and NO ACTION. Some of those
require updating the referencing rows.
I think the logic in question is specific to RESTRICT and NO ACTION.
The other cases don't look like they need to explicitly lock anything;
the UPDATE / DELETE itself should take care of that.
Exactly. We run the dubious SELECT FROM <fkrel> ... FOR KEY SHARE query *only*
for RESTRICT and NO ACTION.

I've traced through the revisions of ri_triggers.c a bit. Locking the
conflicting child rows in the parent table's DELETE and UPDATE triggers was
added in commit 0882951b0cdb4c6686e57121d56216cb2044f7eb which is dated
1999-12-08. (This was before we even had row-level SHARE locks, so the code
did a SELECT ... FOR UPDATE back then). This is also the commit which added
FOR UPDATE locking to RI_FKey_check, i.e. which ensured that parent rows are
locked when adding new child rows. Its commit message unfortunately doesn't
offer much in terms of explanations - it just says "Fixed concurrent visibility
bug."

When SHARE locks were implemented in commit bedb78d386a47fd66b6cda2040e0a5fb545ee371,
dating from 2005-04-28, it seems that "FOR UPDATE" was simply replaced by
"FOR SHARE" throughout ri_triggers.c. The same seems to have happened when
KEY SHARE locks were introduced on 2013-01-13 with commit
0ac5ad5134f2769ccbaefec73844f8504c4d6182.

best regards,
Florian Pflug
Kevin Grittner
2014-10-22 15:46:42 UTC
Post by Florian Pflug
But that's wrong. The transaction's snapshot *would* see that row, so we
ought to raise an error. Note that this applies also to mode SERIALIZABLE, and
breaks true serializability in some cases, since we don't do conflict detection
for RI enforcement queries.
Here's a test case, involving two transaction A and B. I tried this on
REL9_4_STABLE.
Setup
-----
CREATE TABLE parent (id int NOT NULL PRIMARY KEY);
CREATE TABLE child (id int NOT NULL PRIMARY KEY,
                    parent_id int NOT NULL REFERENCES parent (id));
INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (1, 1);
Failure Case
------------
A:: set default_transaction_isolation='serializable';
A:: begin;
A:: select * from child;
->  id | parent_id
    ----+-----------
      1 |         1
B:: set default_transaction_isolation='serializable';
B:: begin;
B:: delete from child;
-> DELETE 1
B:: commit;
A:: delete from parent;
-> DELETE 1
A:: commit;
A can neither come before B in any serializable history (since the DELETE
should fail then), nor after it (since it shouldn't see the row in the child
table then). Yet we don't complain, even though both transactions run in
SERIALIZABLE mode.
Simplifying the display of this a bit:

Tables parent and child each have one row.

Transaction A              Transaction B
=============              =============
select * from child;
[it sees the child row]
                           delete from child;
                           [child row is deleted]
delete from parent;
[parent row deleted]

TA sees the row that TB deletes, creating a rw-dependency that
implies that TA ran first. On the other hand, the FK trigger
fired by the delete from parent *doesn't* see the row deleted by
TB, which implies that TB ran first. I agree, this is a problem
for serializable transactions.

This should not be considered a problem for repeatable read
transactions because the change in visible rows meets the definition
of phantom reads, which are allowed in repeatable read: "A
transaction re-executes a query returning a set of rows that
satisfy a search condition and finds that the set of rows
satisfying the condition has changed due to another
recently-committed transaction." Phantom reads are not *required*
to occur in repeatable read transactions, and in PostgreSQL they
generally don't, so we *might* want to change this behavior; I'm
just saying that we are conforming to requirements of the standard
even if we don't. Leaving this alone for repeatable read
transactions would require a change to our documentation, though,
since we currently assert that we don't allow phantom reads in our
repeatable read implementation.

Either SSI needs to include the RI checking queries in its tracking
*or* TA needs to throw an error if there are child rows visible
according to its transaction snapshot -- at least in serializable
transactions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Florian Pflug
2014-10-22 16:44:39 UTC
Post by Kevin Grittner
This should not be considered a problem for repeatable read
transactions because the change in visible rows meets the definition
of phantom reads, which are allowed in repeatable read: "A
transaction re-executes a query returning a set of rows that
satisfy a search condition and finds that the set of rows
satisfying the condition has changed due to another
recently-committed transaction."
Now I'm confused. Isn't the whole point of REPEATABLE READ to provide, well,
repeatable reads? Also, note that after the DELETE FROM parent, further
SELECTS in the same transaction will use the original snapshot again, and
thus will see the conflicting child rows again that were ignored by the RI
trigger. But they won't, of course, see the parent row.

IOW, transaction A will, after the delete, see a state of the database in
which the PK constraint is broken. I don't think that's acceptable in any
isolation level.
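
To spell that out, continuing the failure case from my previous mail: if A
runs two more queries just before its COMMIT, it sees

A:: select * from parent;
->  (0 rows)
A:: select * from child;
->  id | parent_id
    ----+-----------
      1 |         1

i.e. a child row whose parent is nowhere to be seen.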

Best regards,
Florian Pflug
Kevin Grittner
2014-10-22 17:07:16 UTC
Post by Florian Pflug
Post by Kevin Grittner
This should not be considered a problem for repeatable read
transactions because the change in visible rows meets the
definition of phantom reads, which are allowed in repeatable
read: "A transaction re-executes a query returning a set of rows
that satisfy a search condition and finds that the set of rows
satisfying the condition has changed due to another
recently-committed transaction."
Now I'm confused. Isn't the whole point of REPEATABLE READ to
provide, well, repeatable reads?
What the standard requires for REPEATABLE READ is that if you
re-read the same row later in a transaction it will contain the
same values -- the read of any particular *row* is repeatable (if
it exists at both times); it does not guarantee that the same
selection criteria will return the same set of rows every time.
The standard only requires that of SERIALIZABLE transactions.

PostgreSQL has historically provided more rigorous protections at
REPEATABLE READ than the standard requires. Our docs claim:

| When you select the level Read Uncommitted you really get Read
| Committed, and phantom reads are not possible in the PostgreSQL
| implementation of Repeatable Read, so the actual isolation level
| might be stricter than what you select. This is permitted by the
| SQL standard: the four isolation levels only define which
| phenomena must not happen, they do not define which phenomena
| must happen.

As you have shown, our FK/RI implementation exposes phantom reads
to clients, so at a minimum our docs are wrong.
Post by Florian Pflug
Also, note that after the DELETE FROM parent, further SELECTS in
the same transaction will use the original snapshot again, and
thus will see the conflicting child rows again that were ignored
by the RI trigger. But they won't, of course, see the parent row.
IOW, transaction A will, after the delete, see a state of the
database in which the PK constraint is broken. I don't think
that's acceptable in any isolation level.
Good point. Based on that observation, I agree that our RI is
broken at both the REPEATABLE READ and SERIALIZABLE isolation
levels. I think that READ COMMITTED is OK, because it will see the
child row as deleted in time to prevent problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kevin Grittner
2014-10-23 15:45:52 UTC
Post by Kevin Grittner
Post by Florian Pflug
Also, note that after the DELETE FROM parent, further SELECTS in
the same transaction will use the original snapshot again, and
thus will see the conflicting child rows again that were ignored
by the RI trigger. But they won't, of course, see the parent
row.
IOW, transaction A will, after the delete, see a state of the
database in which the PK constraint is broken. I don't think
that's acceptable in any isolation level.
Good point. Based on that observation, I agree that our RI is
broken at both the REPEATABLE READ and SERIALIZABLE isolation
levels. I think that READ COMMITTED is OK, because it will see
the child row as deleted in time to prevent problems.
Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
transaction a check for child rows when validating a parent DELETE
should consider both rows which exist according to the transaction
snapshot and according to a "current" snapshot. Interestingly, the
run of the query passes both snapshots through to the executor, but
for this query the estate->es_crosscheck_snapshot field (which
contains the transaction snapshot) doesn't seem to be consulted.
It makes me wonder whether we were at some point doing this right
and it later got broken.

Before I write a patch to fix this, does anyone feel that we should
not use that -- in other words, does anyone consider that it is OK
for a REPEATABLE READ or SERIALIZABLE transaction to delete a
referenced row if that transaction can see referencing rows but a
concurrent transaction has deleted them? (This currently allows
subsequent queries in the transaction to see orphaned "child" rows
when they can no longer see the parent.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Florian Pflug
2014-10-23 18:41:46 UTC
Post by Kevin Grittner
Every way I look at it, inside a REPEATABLE READ or SERIALIZABLE
transaction a check for child rows when validating a parent DELETE
should consider both rows which exist according to the transaction
snapshot and according to a "current" snapshot. Interestingly, the
run of the query passes both snapshots through to the executor, but
for this query the estate->es_crosscheck_snapshot field (which
contains the transaction snapshot) doesn't seem to be consulted.
It makes me wonder whether we were at some point doing this right
and it later got broken.
I've been pondering a completely different way to fix this. Many years
ago I tried to get rid of the crosscheck snapshot completely by changing
the way locking conflicts are treated for REPEATABLE READ transactions and
above.

The basic idea is that taking a share lock on a row implies that you're
going to apply further changes whose correctness depends on the existence
of the row you lock. That, in particular, applies to the locks taken
by RI triggers -- we lock the parent row before we add children, because
the children's existence necessitates the existence of the parent. If
you take an exclusive lock, OTOH, that implies a modification of the row
itself (we never explicitly take that lock during an UPDATE or DELETE,
but we do so implicitly, because UPDATEs and DELETEs conflict with SHARE
locks). So after obtaining such a lock, it's the lock holder's responsibility
to check that the desired update doesn't break anything, i.e. in the case
of RI that it doesn't create any orphaned children.

The only reason we need the crosscheck snapshot to do that is because
children may have been added (and the change committed) *after* the
transaction which removed the parent has taken its snapshot, but *before*
that transaction locks the parent row.

My proposal is to instead extend the locking protocol to prevent that.
Essentially, we have to raise a serialization error whenever

1) We attempt to exclusively lock a row (this includes updating or deleting
it), and

2) somebody else did hold a share lock on that row recently, and

3) That somebody is invisible to us according to our snapshot.
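
In the RI case, the rule would kick in roughly like this (sketched with the
parent/child tables from my test case upthread, starting with an empty child
table):

A:: begin;                                -- REPEATABLE READ or above
A:: select * from parent;                 -- takes A's snapshot
B:: insert into child (id, parent_id) values (2, 1);
->  B's RI trigger takes a KEY SHARE lock on parent row 1
B:: commit;
A:: delete from parent where id = 1;
->  A needs an exclusive lock on parent row 1, somebody (B) recently held a
    share lock on it, and B is invisible to A's snapshot -- so the proposed
    rule raises a serialization error here, no crosscheck snapshot required.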

My initial attempt to do that failed, because we used to have very little
means of storing the locking history - the only source of information was
the xmax field, so any update of a tuple removed information about previous
lock holders - even if that update was later aborted. I pondered using
multi-xids for this, but at the time I deemed that too risky - back at the
time, they had a few wraparound issues and the like which were OK for share
locks, but not for what I intended.

But now that we have KEY SHARE locks, the situation changes. We now rely on
multi-xids to a much greater extent, and AFAIK multi-xid wraparound is now
correctly dealt with. We also already ensure that the information contained
in multi-xids is preserved across tuple updates (otherwise, updating a row
on which someone holds a KEY SHARE lock would be broken).

So all that is missing, I think, is

1) To make sure we only remove a multi-xid if none of the xids are invisible
to any snapshot (i.e. lie before GlobalXmin or something like that).

2) When we acquire a lock (either explicitly or implicitly by doing an
UPDATE or DELETE) check if all previous committed lock holders are
visible according to our snapshot, and raise a serialization error
if not.

The big advantage of doing that over fixing the crosscheck logic would be
that it'd make it possible to write concurrency-safe FK triggers in any
procedural language.

best regards,
Florian Pflug
Nick Barnes
2014-10-21 16:54:06 UTC
Thanks! I've been mulling this over for weeks; nice to know it wasn't just
staring me in the face...

Post by Florian Pflug
So in conclusion, the lock avoids raising constraint violation errors in
a few cases in READ COMMITTED mode. In REPEATABLE READ mode, it converts some
constraint violation errors into serialization failures. Or at least that's
how it looks to me.
Yeah, it had occurred to me that this is one place you might see some
benefit. But waiting around on a potentially irrelevant update, just in
case the RI violation resolves itself, didn't really sound like a net win.
Not to mention the possibility of a deadlock, if the other transaction
updates our PK or adds another reference to it.
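
For instance (hypothetical parent/child tables again, child.parent_id
referencing parent.id, one row in each):

A:: begin;
A:: update child set id = 2 where id = 1;   -- a key update, irrelevant to the FK
B:: delete from parent;
->  B's RI check (SELECT ... FROM child ... FOR KEY SHARE) blocks on the row
    A just updated
A:: insert into child (id, parent_id) values (3, 1);
->  A's RI check (SELECT ... FROM parent ... FOR KEY SHARE) blocks on the row
    B is deleting -- deadlock.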

Thanks again,
Nick Barnes