Discussion:
Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Alexey Bashtanov
2014-10-09 10:34:17 UTC
Permalink
Hello!

Autovacuum daemon performs vacuum when the number of rows
updated/deleted (n_dead_tuples) reaches some threshold.
Similarly it performs analyze when the number of rows changed in any way
(incl. inserted).
When a table is mostly insert-only, its visibility map is rarely kept
up to date: the vacuum threshold is almost never reached, and analyze
does not update the visibility map.

Why would it be a bad idea to run vacuum after some number of changes
of any kind, including inserts, the way analyze already does?
Or at least to make it tunable by the user (add a second set of parameters
to control a second vacuum threshold, disabled by default)?

Best regards,
Alexey Bashtanov
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Bruce Momjian
2014-10-09 20:58:51 UTC
Permalink
Post by Alexey Bashtanov
Hello!
Autovacuum daemon performs vacuum when the number of rows
updated/deleted (n_dead_tuples) reaches some threshold.
Similarly it performs analyze when the number of rows changed in any
way (incl. inserted).
When a table is mostly insert-only, its visibility map is not
updated as vacuum threshold is almost never reached, but analyze
does not update visibility map.
Why could it be a bad idea to run vacuum after some number of any
changes including inserts, like analyze?
Or at least make it tunable by user (add a second set of parameters
to control second vacuum threshold, disabled by default)?
I agree this is a serious problem. We have discussed various options,
but have not decided on anything. The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and insert-only
workloads

http://www.postgresql.org/message-id/***@momjian.us
--
Bruce Momjian <***@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
Alvaro Herrera
2014-10-09 21:03:00 UTC
Permalink
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set. (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund
2014-10-09 21:11:26 UTC
Permalink
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.
Isn't that *precisely* what a plain vacuum run does?

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera
2014-10-09 21:16:46 UTC
Permalink
Post by Andres Freund
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.
Isn't that *precisely* what a plain vacuum run does?
Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references. I'm thinking of something
very lightweight. Otherwise, why don't we just reduce the
autovacuum_vacuum_scale_factor default to something very small, so that
vacuum is triggered more often?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund
2014-10-09 21:19:41 UTC
Permalink
Post by Alvaro Herrera
Post by Andres Freund
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.
Isn't that *precisely* what a plain vacuum run does?
Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.
IIRC it doesn't do most of that if there's no need, e.g. if it's an
insert-only table without rollbacks. I *do* think there are some
optimizations we could make in general.
Post by Alvaro Herrera
I'm thinking in something
very lightweight. Otherwise, why don't we just reduce the
vacuum_scale_factor default to something very small, so that vacuum is
triggered more often?
The problem here is that that doesn't trigger on inserts, only on
updates/deletes or rollbacks.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jim Nasby
2014-10-19 02:36:48 UTC
Permalink
Post by Andres Freund
Post by Alvaro Herrera
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.
Isn't that *precisely* what a plain vacuum run does?
Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.
IIRC it doesn't do most of that if there's no need, e.g. if it's an
insert-only table without rollbacks. I *do* think there are some
optimizations we could make in general.
No, it always attempts dead tuple removal. The "weird" part is that if it's not doing a freeze it will just punt on a page if it can't get the cleanup lock. I have to believe that could seriously screw up autovacuum scheduling.

Now that we have forks, I'm wondering if it would be best to come up with a per-page system that could be used to determine when a table needs background work to be done. The visibility map could serve a lot of this purpose, but I'm not sure if it would work for getting hint bits set in the background. I think it would also be a win if we had a way to advance relfrozenxid and relminmxid. Perhaps something that simply remembered the last XID that touched each page...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Andres Freund
2014-10-19 16:41:34 UTC
Permalink
Post by Jim Nasby
Post by Andres Freund
Post by Alvaro Herrera
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.
Isn't that *precisely* what a plain vacuum run does?
Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.
IIRC it doesn't do most of that if there's no need, e.g. if it's an
insert-only table without rollbacks. I *do* think there are some
optimizations we could make in general.
No, it always attempts dead tuple removal.
I said some steps, not all steps. Check it out:

    /* If any tuples need to be deleted, perform final vacuum cycle */
    /* XXX put a threshold on min number of tuples here? */
    if (vacrelstats->num_dead_tuples > 0)
    {
        /* Log cleanup info before we touch indexes */
        vacuum_log_cleanup_info(onerel, vacrelstats);

        /* Remove index entries */
        for (i = 0; i < nindexes; i++)
            lazy_vacuum_index(Irel[i],
                              &indstats[i],
                              vacrelstats);

        /* Remove tuples from heap */
        lazy_vacuum_heap(onerel, vacrelstats);
        vacrelstats->num_index_scans++;
    }

There's a rub here, though. We unconditionally do:

    /* Do post-vacuum cleanup and statistics update for each index */
    for (i = 0; i < nindexes; i++)
        lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

and that's not particularly cheap. Maybe we should skip it when there
have been no lazy_vacuum_index/heap calls at all?
Post by Jim Nasby
The "weird" part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.
I don't think that's particularly weird. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.
Post by Jim Nasby
I have to believe that could seriously screw up autovacuum scheduling.
Why?
Post by Jim Nasby
Now that we have forks, I'm wondering if it would be best to come up
with a per-page system that could be used to determine when a table
needs background work to be done. The visibility map could serve a lot
of this purpose, but I'm not sure if it would work for getting hint
bits set in the background.
It would. By definition, all tuples that are 'all visible' must be
fully hint-bitted.
Post by Jim Nasby
I think it would also be a win if we had a way to advance relfrozenxid
and relminmxid. Perhaps something that simply remembered the last XID
that touched each page...
Not sure what you're getting at here?

I think the big missing piece, unless something like Heikki's XID LSN
ranges work gets finished, is a freeze map.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Tom Lane
2014-10-19 16:50:30 UTC
Permalink
Post by Andres Freund
    /* Do post-vacuum cleanup and statistics update for each index */
    for (i = 0; i < nindexes; i++)
        lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
and that's not particularly cheap. Maybe we should skip it when there
have been no lazy_vacuum_index/heap calls at all?
Absolutely not. If the cleanup step is skippable, it would be the
province of the index AM to make that decision.

regards, tom lane
Andres Freund
2014-10-19 16:54:59 UTC
Permalink
Post by Tom Lane
Post by Andres Freund
    /* Do post-vacuum cleanup and statistics update for each index */
    for (i = 0; i < nindexes; i++)
        lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
and that's not particularly cheap. Maybe we should skip it when there
have been no lazy_vacuum_index/heap calls at all?
Absolutely not. If the cleanup step is skippable, it would be the
province of the index AM to make that decision.
Fair point. At the moment we do a full scan of nbtree indexes every time
we vacuum, even when the heap vacuum only scanned a couple hundred
pages of a huge table. That makes partial vacuum noticeably less
useful. So I do think we need to do something to improve the
situation.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jim Nasby
2014-10-20 01:43:29 UTC
Permalink
Post by Andres Freund
Post by Jim Nasby
Post by Andres Freund
Post by Alvaro Herrera
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.
Isn't that *precisely* what a plain vacuum run does?
Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.
IIRC it doesn't do most of that if there's no need, e.g. if it's an
insert-only table without rollbacks. I *do* think there are some
optimizations we could make in general.
No, it always attempts dead tuple removal.
    /* If any tuples need to be deleted, perform final vacuum cycle */
    /* XXX put a threshold on min number of tuples here? */
    if (vacrelstats->num_dead_tuples > 0)
    {
        /* Log cleanup info before we touch indexes */
        vacuum_log_cleanup_info(onerel, vacrelstats);

        /* Remove index entries */
        for (i = 0; i < nindexes; i++)
            lazy_vacuum_index(Irel[i],
                              &indstats[i],
                              vacrelstats);

        /* Remove tuples from heap */
        lazy_vacuum_heap(onerel, vacrelstats);
        vacrelstats->num_index_scans++;
    }

    /* Do post-vacuum cleanup and statistics update for each index */
    for (i = 0; i < nindexes; i++)
        lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
and that's not particularly cheap. Maybe we should skip it when there
have been no lazy_vacuum_index/heap calls at all?
We could possibly pass in to lazy_cleanup_index whether we actually removed any tuples.
Post by Andres Freund
Post by Jim Nasby
The "weird" part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.
I don't think that's particularly weird. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.
Post by Jim Nasby
I have to believe that could seriously screw up autovacuum scheduling.
Why?
I'm worried there could be some pathological cases where we'd skip a large number of pages, perhaps if a vacuum scan and a seqscan ended up running alongside each other.

Perhaps this is just paranoia, but we have no idea how bad things might be, because we don't have any logging for how many pages we skipped because we couldn't lock them.

Also, if this really is that big a deal for heap pages, how come we don't get screwed by it on Btree index pages, where we mandate that we acquire a cleanup lock?
Post by Andres Freund
Post by Jim Nasby
Now that we have forks, I'm wondering if it would be best to come up
with a per-page system that could be used to determine when a table
needs background work to be done. The visibility map could serve a lot
of this purpose, but I'm not sure if it would work for getting hint
bits set in the background.
It would. By definition, all tuples that are 'all visible' must be
fully hint-bitted.
Post by Jim Nasby
I think it would also be a win if we had a way to advance relfrozenxid
and relminmxid. Perhaps something that simply remembered the last XID
that touched each page...
Not sure what you're getting at here?
That ultimately, our current method for determining when and what to vacuum is rather crude, and likely results in wasted effort during scans as well as not firing autovac often enough. Keep in mind that autovac started as a user-space utility and the best it could possibly do was to keep a table of stats counters.

The visibility map obviously helps cut down on extra work during a scan, but it only goes so far in that regard.

Instead of relying on the crude methods, if we reliably tracked certain txids on a per-block basis in a fork, we could cheaply scan the fork and make an extremely informed decision on how much a vacuum would gain us, and exactly what blocks it should hit.

Let me use freezing as an example. If we had a reliable list of the lowest txid for each block of a relation, that would allow us to do a freeze scan hitting only blocks whose minimum txid falls within our freeze range. The same could be done for multixacts.

That's just one example. We could do something similar for background hinting (oldest xmin/xmax of all non-hinted tuples), and whether it's worth trying to vacuum (I think that'd be a combination of oldest non-locking xmax and seeing how much room the FSM has listed for the page).

If we stored 3 txids for each block in a fork, we could fit information for ~680 heap blocks in each fork block. So in a database with 680G of heap data, we could fully determine every *block* (not table) we needed to vacuum by scanning just 1GB of data. That would allow for far better autovacuum scheduling than what we do today.
Post by Andres Freund
I think the big missing piece lest something like Heikki's xid lsn
ranges thing gets finished is a freeze map.
The problem with a simple freeze map is: when do you actually set the bit? If you do it while the transaction that created all the tuples is still running, then any attempt to use the map before those tuples become all-visible is pointless. Presumably this is why pd_prune_xid stores a txid and isn't just a boolean.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Andres Freund
2014-10-20 20:11:14 UTC
Permalink
Post by Jim Nasby
Post by Andres Freund
Post by Jim Nasby
The "weird" part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.
I don't think that's particularly weird. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.
Post by Jim Nasby
I have to believe that could seriously screw up autovacuum scheduling.
Why?
I'm worried there could be some pathological cases where we'd skip a
large number of pages, perhaps if a vacuum scan and a seqscan ended up
running alongside each other.
I've seen little evidence of that. The reverse, a stuck autovacuum, is
imo much more likely. For this to be an actual problem you'd need to
encounter many pages that are not locked, but are pinned. That state
doesn't exist for very long.
Post by Jim Nasby
Perhaps this is just paranoia, but we have no idea how bad things
might be, because we don't have any logging for how many pages we
skipped because we couldn't lock them.
But so what? If we skip individual pages it won't be too bad - and very
likely waiting very long is going to be more painful. The page won't be
marked 'all visible' so the next vacuum will come around to it
again. And it'll also get cleaned up by opportunistic hot pruning.
Post by Jim Nasby
Also, if this really is that big a deal for heap pages, how come we
don't get screwed by it on Btree index pages, where we mandate that we
acquire a cleanup lock?
Because we never hold pins for btree pages for very long. Whereas we do
that for heap pages. If you e.g. run a cursor forward you can hold a pin
for essentially unbounded time.
Post by Jim Nasby
Post by Andres Freund
Post by Jim Nasby
Now that we have forks, I'm wondering if it would be best to come up
with a per-page system that could be used to determine when a table
needs background work to be done. The visibility map could serve a lot
of this purpose, but I'm not sure if it would work for getting hint
bits set in the background.
It would. By definition, all tuples that are 'all visible' must be
fully hint-bitted.
Post by Jim Nasby
I think it would also be a win if we had a way to advance relfrozenxid
and relminmxid. Perhaps something that simply remembered the last XID
that touched each page...
Not sure what you're getting at here?
That ultimately, our current method for determining when and what to
vacuum is rather crude, and likely results in wasted effort during
scans as well as not firing autovac often enough. Keep in mind that
autovac started as a user-space utility and the best it could possibly
do was to keep a table of stats counters.
I agree that we should trigger autovacuum more often. It's
*intentionally* not triggered *at all* for insert only workloads (if you
discount anti wraparound vacuums). I think it's time to change that. For
that we'd need to make vacuums that don't delete any tuples cheaper. We
already rescan only the changed parts of the heaps - but we always scan
indexes fully...
Post by Jim Nasby
The visibility map obviously helps cut down on extra work during a
scan, but it only goes so far in that regard.
Aha.
Post by Jim Nasby
Instead of relying on the crude methods, if we reliably tracked
certain txids on a per-block basis in a fork, we could cheaply scan
the fork and make an extremely informed decision on how much a vacuum
would gain us, and exactly what blocks it should hit.
Let me use freezing as an example. If we had a reliable list of the
lowest txid for each block of a relation that would allow us to do a
freeze scan by hitting only blocks with minimum txid within our freeze
range. The same could be done for multixacts.
It'd also become a prime contention point because you'd need to
constantly update it. In contrast to a simple 'is frozen' bit (akin to
is_visible) which only changes infrequently, and only in one direction.
Post by Jim Nasby
If we stored 3 txids for each block in a fork, we could fit
information for ~680 heap blocks in each fork block. So in a database
with 680G of heap data, we could fully determine every *block* (not
table) we needed to vacuum by scanning just 1GB of data. That would
allow for far better autovacuum scheduling than what we do today.
It's not that simple. Wraparounds and locking complicate it
significantly.
Post by Jim Nasby
Post by Andres Freund
I think the big missing piece lest something like Heikki's xid lsn
ranges thing gets finished is a freeze map.
The problem with a simple freeze map is when do you actually set the
bit?
There's precisely one place where you can set it during normal
operation: during vacuum's scan.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jim Nasby
2014-10-21 00:39:16 UTC
Permalink
Post by Andres Freund
Post by Jim Nasby
Post by Andres Freund
Post by Jim Nasby
The "weird" part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.
I don't think that's particularly weird. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.
Post by Jim Nasby
I have to believe that could seriously screw up autovacuum scheduling.
Why?
I'm worried there could be some pathological cases where we'd skip a
large number of pages, perhaps if a vacuum scan and a seqscan ended up
running alongside each other.
I've seen little evidence of that. The reverse, a stuck autovacuum, is
imo much more likely. For this to be an actual problem you'd need to
encounter many pages that are not locked, but are pinned. That state
doesn't exist for very long.
Post by Jim Nasby
Perhaps this is just paranoia, but we have no idea how bad things
might be, because we don't have any logging for how many pages we
skipped because we couldn't lock them.
But so what? If we skip individual pages it won't be too bad - and very
likely waiting very long is going to be more painful. The page won't be
marked 'all visible' so the next vacuum will come around to it
again. And it'll also get cleaned up by opportunistic hot pruning.
Probably true. Hopefully we can start logging it and then we'll know for sure.
Post by Andres Freund
Post by Jim Nasby
That ultimately, our current method for determining when and what to
vacuum is rather crude, and likely results in wasted effort during
scans as well as not firing autovac often enough. Keep in mind that
autovac started as a user-space utility and the best it could possibly
do was to keep a table of stats counters.
I agree that we should trigger autovacuum more often. It's
*intentionally* not triggered *at all* for insert only workloads (if you
discount anti wraparound vacuums). I think it's time to change that. For
that we'd need to make vacuums that don't delete any tuples cheaper. We
already rescan only the changed parts of the heaps - but we always scan
indexes fully...
Or maybe vacuum isn't the right way to handle some of these scenarios. It's become the catch-all for all of this stuff, but maybe that doesn't make sense anymore. Certainly when it comes to dealing with inserts there's no reason we *have* to do anything other than set hint bits and possibly freeze xmin.
Post by Andres Freund
Post by Jim Nasby
Instead of relying on the crude methods, if we reliably tracked
certain txids on a per-block basis in a fork, we could cheaply scan
the fork and make an extremely informed decision on how much a vacuum
would gain us, and exactly what blocks it should hit.
Let me use freezing as an example. If we had a reliable list of the
lowest txid for each block of a relation that would allow us to do a
freeze scan by hitting only blocks with minimum txid within our freeze
range. The same could be done for multixacts.
It'd also become a prime contention point because you'd need to
constantly update it. In contrast to a simple 'is frozen' bit (akin to
is_visible) which only changes infrequently, and only in one direction.
Actually, the contention on freeze would very possibly be minimal, because it probably doesn't change very often. Even if it did, it's OK if the value isn't 100% accurate, so long as the recorded XID is guaranteed older than what's actually on the page.
Post by Andres Freund
Post by Jim Nasby
If we stored 3 txids for each block in a fork, we could fit
information for ~680 heap blocks in each fork block. So in a database
with 680G of heap data, we could fully determine every *block* (not
table) we needed to vacuum by scanning just 1GB of data. That would
allow for far better autovacuum scheduling than what we do today.
It's not that simple. Wraparounds and locking complicate it
significantly.
I realize what I'm talking about isn't trivial (though I'm confused by your comment about wraparound, since presumably TransactionIdPrecedes() and its ilk solve that problem...)

My ultimate point here is that we're using what are (today) very crude methods to control what gets vacuumed when, and I think that now that we have relation forks we could do *much* better without a tremendous amount of work. But to make a big advancement here we'll need to take a step back and rethink some assumptions (like vacuum being the only way to handle these problems).

Let me put some thought into this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
Josh Berkus
2014-10-21 00:43:26 UTC
Permalink
Post by Jim Nasby
Or maybe vacuum isn't the right way to handle some of these scenarios.
It's become the catch-all for all of this stuff, but maybe that doesn't
make sense anymore. Certainly when it comes to dealing with inserts
there's no reason we *have* to do anything other than set hint bits and
possibly freeze xmin.
+1
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Andres Freund
2014-10-21 00:46:47 UTC
Permalink
Post by Jim Nasby
Or maybe vacuum isn't the right way to handle some of these scenarios.
It's become the catch-all for all of this stuff, but maybe that doesn't
make sense anymore. Certainly when it comes to dealing with inserts
there's no reason we *have* to do anything other than set hint bits and
possibly freeze xmin.
+1
A page read is a page read. What's the point of having another process
do it? Vacuum doesn't dirty pages if they don't have to be
dirtied. Especially stuff like freezing cannot really be dealt with
outside of vacuum unless you make already-complex code more complex for
a marginal benefit.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jeff Janes
2014-10-21 21:36:58 UTC
Permalink
Post by Andres Freund
Post by Jim Nasby
Or maybe vacuum isn't the right way to handle some of these scenarios.
It's become the catch-all for all of this stuff, but maybe that doesn't
make sense anymore. Certainly when it comes to dealing with inserts
there's no reason we *have* to do anything other than set hint bits and
possibly freeze xmin.
+1
A page read is a page read. What's the point of having another process
do it?
It is only a page read if you have to read the page. It would seem optimal
to have bgwriter opportunistically set hint bits and vm bits, because that is
the last point at which the page can be changed without risking that it be
written out twice. At that point, it has been given the maximum amount of
time for the interested transactions to have committed and to have aged
past the xmin horizon. I seem to recall that the main problem with that,
though, is that you must be attached to a database in order to determine
visibility, and bgwriter is not attached to any database.

Cheers,

Jeff
Jim Nasby
2014-10-21 22:25:58 UTC
Permalink
Post by Andres Freund
Post by Jim Nasby
Or maybe vacuum isn't the right way to handle some of these scenarios.
It's become the catch-all for all of this stuff, but maybe that doesn't
make sense anymore. Certainly when it comes to dealing with inserts
there's no reason we *have* to do anything other than set hint bits and
possibly freeze xmin.
+1
A page read is a page read. What's the point of having another process
do it?
It is only a page read if you have to read the page. It would seem optimal to have bgwriter opportunistically set hint bits and vm bits, because that is the last point at which the page can be changed without risking that it be written out twice. At that point, it has been given the maximum amount of time for the interested transactions to have committed and to have aged past the xmin horizon. I seem to recall that the main problem with that, though, is that you must be attached to a database in order to determine visibility, and bgwriter is not attached to any database.
It's also a bit more complex than a simple question of "is the page still in shared buffers". Our *real* last chance is when the page is about to be evicted from the filesystem cache; after that, reading it back will be extremely expensive (relatively speaking).

I think it's worth considering this, because if you have any moderate-length transactions on a busy database, bgwriter won't be able to help much; you'll be burning through shared buffers too quickly.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Kevin Grittner
2014-10-09 21:10:42 UTC
Permalink
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set. (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)
Wouldn't we get substantially the same thing just by counting tuple
inserts toward the autovacuum vacuum threshold? I mean, unless
the table is due for a wraparound-prevention autovacuum, it will only
visit pages that don't have the all-visible bit set, right? And
how much work would that do beyond what you're describing if none
of the tuples are dead?
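The suggestion can be sketched with the documented autovacuum trigger formula (base threshold plus scale factor times reltuples). This is an illustrative Python model; the counter name `n_ins_since_vacuum` and the idea of reusing the same limit for inserts are assumptions for illustration, not existing parameters.

```python
# Rough model: today only dead tuples count toward the vacuum trigger;
# the proposal also counts inserts, so a pure-insert workload eventually
# gets vacuumed and its visibility-map bits maintained.

VACUUM_THRESHOLD = 50       # autovacuum_vacuum_threshold default
VACUUM_SCALE_FACTOR = 0.2   # autovacuum_vacuum_scale_factor default

def needs_vacuum(n_dead_tup, n_ins_since_vacuum, reltuples):
    limit = VACUUM_THRESHOLD + VACUUM_SCALE_FACTOR * reltuples
    return n_dead_tup > limit or n_ins_since_vacuum > limit

# Insert-only table: one million rows, 300k inserted since the last vacuum.
print(needs_vacuum(0, 300_000, 1_000_000))  # True: inserts alone trigger it
print(needs_vacuum(0, 0, 1_000_000))        # False: nothing counted yet
```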

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Alvaro Herrera
2014-10-09 21:18:29 UTC
Permalink
Post by Kevin Grittner
Wouldn't we get substantially the same thing just by counting tuple
inserts toward the autovacuum vacuum threshold? I mean, unless
the table is due for a wraparound-prevention autovacuum, it will only
visit pages that don't have the all-visible bit set, right? And
how much work would that do beyond what you're describing if none
of the tuples are dead?
The problem is precisely what happens if there are some dead tuples, but
not enough to reach the 20% threshold: this vacuum now has to scan the
table twice and has to clean up indexes also.
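That 20% is the default autovacuum_vacuum_scale_factor. A back-of-the-envelope cost model of the point above: as soon as a vacuum finds any dead tuples at all, it must scan every index and revisit the pruned heap pages, so an insert-triggered vacuum is not free just because the table is "mostly" insert-only. The page counts and the simplification that pass 1 scans the whole heap are illustrative.

```python
# Illustrative model of vacuum's extra work once dead tuples exist:
#   pass 1: scan heap, prune, collect dead TIDs
#   index pass: scan every index to remove those TIDs (only if any found)
#   pass 2: revisit the pruned heap pages to reclaim line pointers

def vacuum_pages_touched(heap_pages, index_pages, pages_with_dead):
    cost = heap_pages              # pass 1 over the heap
    if pages_with_dead > 0:
        cost += index_pages        # full index cleanup, however few TIDs
        cost += pages_with_dead    # second heap pass over pruned pages
    return cost

print(vacuum_pages_touched(10_000, 3_000, 0))   # 10000: clean table, one pass
print(vacuum_pages_touched(10_000, 3_000, 50))  # 13050: 50 dirty pages force
                                                # a full index scan anyway
```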
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jim Nasby
2014-10-09 23:15:29 UTC
Permalink
Post by Alvaro Herrera
Post by Bruce Momjian
I agree this is a serious problem. We have discussed various options,
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-only
workloads
I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze. In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set. (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)
If we go down that road we should also think about having it proactively set hint bits...
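The maintenance pass described above might look roughly like this. All names here are hypothetical and tuples are modeled simply as their xmin values; this is not the PostgreSQL API, just a sketch of the control flow: visit only pages whose all-visible bit is unset, and mark a page once every tuple on it is visible to all.

```python
# Hypothetical sketch of a VM-refresh-only pass: skip pages already marked
# all-visible (so repeat runs stay cheap), and set the bit on any page
# whose every xmin predates the oldest transaction a snapshot could care
# about. Pages that fail stay unset and would need the "don't rescan
# immediately" refinement Alvaro mentions.

def vm_refresh_pass(pages, vm_bits, oldest_xmin):
    """pages: list of lists of committed xmins; vm_bits: list of bools."""
    for blkno, xmins in enumerate(pages):
        if vm_bits[blkno]:
            continue  # already all-visible
        if all(xmin < oldest_xmin for xmin in xmins):
            vm_bits[blkno] = True
    return vm_bits

pages = [[100, 101], [102, 990], [103]]
print(vm_refresh_pass(pages, [False, False, True], oldest_xmin=500))
# → [True, False, True]: page 1 still has a too-recent xmin (990)
```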
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com