Post by Jim Nasby
Post by Andres Freund
Post by Jim Nasby
The "weird" part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.
I don't think that's particularly weird. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.
Post by Jim Nasby
I have to believe that could seriously screw up autovacuum scheduling.
Why?
I'm worried there could be some pathological cases where we'd skip a
large number of pages, perhaps if a vacuum scan and a seqscan ended up
running alongside each other.
I've seen little evidence of that. The reverse, a stuck autovacuum, is
imo much more likely. For this to be an actual problem you'd need to
encounter many pages that are not locked, but are pinned. That state
doesn't exist for very long.
Post by Jim Nasby
Perhaps this is just paranoia, but we have no idea how bad things
might be, because we don't have any logging for how many pages we
skipped because we couldn't lock them.
But so what? If we skip individual pages it won't be too bad - and
waiting a long time for the pin to go away would very likely be more
painful. The page won't be marked 'all visible', so the next vacuum will
come around to it again. And it'll also get cleaned up by opportunistic
HOT pruning.
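For reference, the skip is conditional and cheap - roughly like this (a
simplified sketch of the logic in lazy_scan_heap() in vacuumlazy.c, not
a verbatim excerpt):

/*
 * Simplified sketch of the skip in lazy_scan_heap(); names follow the
 * real code, but this is not a verbatim excerpt.
 */
buf = ReadBufferExtended(onerel, MAIN_FORKNUM, blkno,
                         RBM_NORMAL, vac_strategy);

if (!ConditionalLockBufferForCleanup(buf))
{
    /*
     * Somebody else holds a pin.  Unless this is a freeze scan, skip
     * the page rather than wait: it stays not-all-visible in the VM
     * and the next vacuum will come back to it.
     */
    if (!scan_all)
    {
        ReleaseBuffer(buf);
        continue;
    }
    LockBufferForCleanup(buf);      /* freeze scans have to wait */
}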
Post by Jim Nasby
Also, if this really is that big a deal for heap pages, how come we
don't get screwed by it on Btree index pages, where we mandate that we
acquire a cleanup lock?
Because we never hold pins on btree pages for very long, whereas we do
for heap pages. If you e.g. run a cursor forward, you can hold a pin on
a heap page for an essentially unbounded time.
Post by Jim Nasby
Post by Andres Freund
Post by Jim Nasby
Now that we have forks, I'm wondering if it would be best to come up
with a per-page system that could be used to determine when a table
needs background work to be done. The visibility map could serve a lot
of this purpose, but I'm not sure if it would work for getting hint
bits set in the background.
It would. By definition, all tuples that are 'all visible' need to be
fully hint bitted.
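So a background hint-bit setter could simply key off the visibility map.
A sketch, just to illustrate - no such worker exists:

/*
 * Hypothetical background hint-bit setter.  It can skip any page whose
 * all-visible bit is set, because vacuum only sets that bit after
 * HeapTupleSatisfiesVacuum() has looked at (and thereby hint bitted)
 * every tuple on the page.
 */
for (blkno = 0; blkno < nblocks; blkno++)
{
    if (visibilitymap_test(rel, blkno, &vmbuffer))
        continue;           /* all-visible => already fully hinted */

    /*
     * Otherwise read the page and run its tuples through
     * HeapTupleSatisfiesVacuum() to get the hint bits set.
     */
}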
Post by Jim Nasby
I think it would also be a win if we had a way to advance relfrozenxid
and relminmxid. Perhaps something that simply remembered the last XID
that touched each page...
Not sure what you're getting at here?
That ultimately, our current method for determining when and what to
vacuum is rather crude, and likely results in wasted effort during
scans as well as not firing autovac often enough. Keep in mind that
autovac started as a user-space utility and the best it could possibly
do was to keep a table of stats counters.
I agree that we should trigger autovacuum more often. It's
*intentionally* not triggered *at all* for insert-only workloads (if you
discount anti-wraparound vacuums). I think it's time to change that. For
that we'd need to make vacuums that don't delete any tuples cheaper. We
already rescan only the changed parts of the heap - but we always scan
indexes fully...
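For context, the trigger today only counts dead tuples - roughly (a
sketch of the test in relation_needs_vacanalyze() in autovacuum.c; the
insert-based branch at the bottom is purely hypothetical):

/*
 * Rough sketch of today's autovacuum trigger, following
 * relation_needs_vacanalyze() in autovacuum.c.
 */
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;

/*
 * Only dead tuples count, so insert-only tables never qualify until
 * the anti-wraparound logic forces a vacuum.
 */
*dovacuum = force_vacuum || (vactuples > vacthresh);

/*
 * Hypothetical addition: also trigger once enough tuples were inserted
 * since the last vacuum, so VM bits and hint bits get set eagerly.
 */
/* *dovacuum |= (instuples > insert_thresh); */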
Post by Jim Nasby
The visibility map obviously helps cut down on extra work during a
scan, but it only goes so far in that regard.
Aha.
Post by Jim Nasby
Instead of relying on the crude methods, if we reliably tracked
certain txids on a per-block basis in a fork, we could cheaply scan
the fork and make an extremely informed decision on how much a vacuum
would gain us, and exactly what blocks it should hit.
Let me use freezing as an example. If we had a reliable list of the
lowest txid for each block of a relation, that would allow us to do a
freeze scan by hitting only blocks whose minimum txid falls within our
freeze range. The same could be done for multixacts.
It'd also become a prime contention point because you'd need to
constantly update it, in contrast to a simple 'is frozen' bit (akin to
is_visible) which only changes infrequently, and only in one direction.
Post by Jim Nasby
If we stored 3 txids for each block in a fork, we could fit
information for ~680 heap blocks in each fork block. So in a database
with 680G of heap data, we could fully determine every *block* (not
table) we needed to vacuum by scanning just 1GB of data. That would
allow for far better autovacuum scheduling than what we do today.
It's not that simple. Wraparounds and locking complicate it
significantly.
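The space arithmetic itself checks out, for what it's worth. A
standalone sketch of the proposed layout (the struct and all names are
made up, only BLCKSZ matches Postgres):

#include <stdio.h>
#include <stdint.h>

#define BLCKSZ 8192

/* Hypothetical per-heap-block entry in the proposed fork. */
typedef struct PerBlockInfo
{
    uint32_t    min_xid;    /* oldest xid on the heap block */
    uint32_t    min_mxid;   /* oldest multixact on the heap block */
    uint32_t    last_xid;   /* newest xid that touched the heap block */
} PerBlockInfo;             /* 12 bytes, no padding */

int
main(void)
{
    int     per_fork_block = (int) (BLCKSZ / sizeof(PerBlockInfo));
    double  heap_gb = 680.0;
    double  fork_gb = heap_gb / per_fork_block;

    /* prints: 682 heap blocks per fork block, ~1.00 GB of fork for 680 GB of heap */
    printf("%d heap blocks per fork block, %.2f GB of fork for %.0f GB of heap\n",
           per_fork_block, fork_gb, heap_gb);
    return 0;
}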
Post by Jim Nasby
Post by Andres Freund
I think the big missing piece, unless something like Heikki's xid lsn
ranges thing gets finished, is a freeze map.
The problem with a simple freeze map is when do you actually set the
bit?
There's precisely one place where you can set it in normal
operation: during vacuum's scan.
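I.e. the only safe spot is right where the all-visible bit is set today.
Roughly (a sketch modelled on lazy_scan_heap()'s visibility map
handling; the 'all frozen' part is hypothetical, no such bit or map
exists):

if (all_visible && !all_visible_according_to_vm)
{
    visibilitymap_pin(onerel, blkno, &vmbuffer);
    visibilitymap_set(onerel, blkno, buf, InvalidXLogRecPtr,
                      vmbuffer, visibility_cutoff_xid);
}

/*
 * This is the one point in normal operation where we know every tuple
 * on the page has just been examined, so a per-page "all frozen" bit
 * could only be set in the same place.
 */
/* if (all_frozen) freezemap_set(onerel, blkno, ...); */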
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services