Discussion:
Deferring some AtStart* allocations?
Andres Freund
2014-06-29 23:15:44 UTC
Permalink
Hi,

In workloads with mostly simple statements, memory allocations are the
primary bottleneck. Some of the allocations are unlikely to be avoidable
without major work, but others seem superflous in many scenarios.

Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?
In most transactions neither will be?

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom Lane
2014-06-29 23:52:23 UTC
Permalink
Post by Andres Freund
Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?
Aren't we? Neither of those would be doing much work certainly.

regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund
2014-06-30 00:19:31 UTC
Permalink
Post by Tom Lane
Post by Andres Freund
Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?
Aren't we? Neither of those would be doing much work certainly.
They are perhaps not doing much in absolute terms, but it's a fair share
of the processing overhead for simple statements. AfterTriggerBeginXact()
is called unconditionally from StartTransaction() and does three
MemoryContextAlloc()s. AtStart_Inval() one.
I think they should just be initialized whenever the memory is used?
Doesn't look too complicated to me.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Tom Lane
2014-06-30 01:12:49 UTC
Permalink
Post by Andres Freund
Post by Tom Lane
Post by Andres Freund
Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?
Aren't we? Neither of those would be doing much work certainly.
They are perhaps not doing much in absolute terms, but it's a fair share
of the processing overhead for simple statements. AfterTriggerBeginXact()
is called unconditionally from StartTransaction() and does three
MemoryContextAlloc()s. AtStart_Inval() one.
I think they should just be initialized whenever the memory is used?
Doesn't look too complicated to me.
Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.

regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund
2014-06-30 01:22:07 UTC
Permalink
Post by Tom Lane
Post by Andres Freund
Post by Tom Lane
Post by Andres Freund
Why aren't we delaying allocations in e.g. AtStart_Inval(),
AfterTriggerBeginXact() to when the data structures are acutally used?
Aren't we? Neither of those would be doing much work certainly.
They are perhaps not doing much in absolute terms, but it's a fair share
of the processing overhead for simple statements. AfterTriggerBeginXact()
is called unconditionally from StartTransaction() and does three
MemoryContextAlloc()s. AtStart_Inval() one.
I think they should just be initialized whenever the memory is used?
Doesn't look too complicated to me.
Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
The quick test I ran used prepared statements - there the number of
memory allocations is *much* lower...
Post by Tom Lane
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.
I only noticed it because it shows up in profiles. I doubt it'll even
remotely be noticeable without using prepared statements, but a lot of
people do use those.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas
2014-10-08 17:52:14 UTC
Permalink
Post by Tom Lane
Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.
I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place. The gains are indeed not measurable on a
macrobenchmark, but I had to write the patch to figure that out, so
here it is.

Although the gain isn't a measurable percentage of total runtime, it
does appear to be a significant percentage of the palloc traffic on
trivial queries. I did 30-second SELECT-only pgbench runs at scale
factor 10, using prepared queries. According to perf, on unpatched
master, StartTransactionCommand accounts for 11.46% of the calls to
AllocSetAlloc. (I imagine this is by runtime, not by call count, but
it probably doesn't matter much either way.) But with this patch,
StartTransactionCommand's share drops to 4.43%. Most of that is
coming from AtStart_Inval(), which wouldn't be hard to fix either.

I'm inclined to think that tamping down palloc traffic is worthwhile
even if we can't directly measure the savings on a macrobenchmark.
AllocSetAlloc is often at or near the top of profiling results, but
there's rarely a single caller responsible for enough of those calls
for to make it worth optimizing. But that means that if we refuse to
take patches that save just a few pallocs, we're basically giving up
on ever improving anything in this area, and that doesn't seem like a
good idea either; it's only going to get slowly worse over time as we
add more features.

BTW, if anyone's curious what the biggest source of AllocSetAlloc
calls is on this workload, the answer is ExecInitIndexScan(), which
looks to be indirectly responsible for almost half the total (or just
over half, with the patch applied). That apparently calls a lot of
different things that allocate small amounts of memory, and it
accounts for nearly all of the AllocSetAlloc traffic that originates
from PortalStart(). I haven't poked around in there enough to know
whether there's anything worth optimizing, but given the degree to
which this patch shifts the profile, I bet the number that it takes to
make a material savings is more like a few dozen than a few hundred.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Amit Kapila
2014-10-09 03:37:05 UTC
Permalink
Post by Robert Haas
Post by Tom Lane
Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.
I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place. The gains are indeed not measurable on a
macrobenchmark, but I had to write the patch to figure that out, so
here it is.
Although the gain isn't a measurable percentage of total runtime, it
does appear to be a significant percentage of the palloc traffic on
trivial queries. I did 30-second SELECT-only pgbench runs at scale
factor 10, using prepared queries. According to perf, on unpatched
master, StartTransactionCommand accounts for 11.46% of the calls to
AllocSetAlloc. (I imagine this is by runtime, not by call count, but
it probably doesn't matter much either way.) But with this patch,
StartTransactionCommand's share drops to 4.43%. Most of that is
coming from AtStart_Inval(), which wouldn't be hard to fix either.
I'm inclined to think that tamping down palloc traffic is worthwhile
even if we can't directly measure the savings on a macrobenchmark.
AllocSetAlloc is often at or near the top of profiling results, but
there's rarely a single caller responsible for enough of those calls
for to make it worth optimizing. But that means that if we refuse to
take patches that save just a few pallocs, we're basically giving up
on ever improving anything in this area, and that doesn't seem like a
good idea either; it's only going to get slowly worse over time as we
add more features.
Yeah, this makes sense, even otherwise also if somebody comes
up with a much larger patch which reduce few dozen of pallocs and shows
noticeable gain, then it will be difficult to quantify the patch based on
which particular pallocs gives improvements, as reducing few pallocs
might not give any noticeable gain.
Post by Robert Haas
BTW, if anyone's curious what the biggest source of AllocSetAlloc
calls is on this workload, the answer is ExecInitIndexScan(), which
looks to be indirectly responsible for almost half the total (or just
over half, with the patch applied). That apparently calls a lot of
different things that allocate small amounts of memory, and it
accounts for nearly all of the AllocSetAlloc traffic that originates
from PortalStart().
One way to dig into this problem is that we look at each individual
pallocs in PortalStart() path and try to see which one's can be
optimized, another way is that we introduce a concept of
Prepared Portals some thing similar to Prepared Statements, but for
Portal. This will not only avoid the pallocs in executor startup phase,
but can improve the performance by avoiding many other calls related
to executor startup. I understand that this will be a much bigger
project, however the gains can also be substantial for prepared
statements when all/most of the data fits in memory. It seems other
databases also have similar concepts for execution (example Oracle
uses Cursor_sharing/Shared cursors to achieve the same)


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Andres Freund
2014-10-09 09:34:48 UTC
Permalink
Post by Robert Haas
Post by Tom Lane
Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.
I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place. The gains are indeed not measurable on a
macrobenchmark, but I had to write the patch to figure that out, so
here it is.
Although the gain isn't a measurable percentage of total runtime, it
does appear to be a significant percentage of the palloc traffic on
trivial queries. I did 30-second SELECT-only pgbench runs at scale
factor 10, using prepared queries. According to perf, on unpatched
master, StartTransactionCommand accounts for 11.46% of the calls to
AllocSetAlloc. (I imagine this is by runtime, not by call count, but
it probably doesn't matter much either way.) But with this patch,
StartTransactionCommand's share drops to 4.43%. Most of that is
coming from AtStart_Inval(), which wouldn't be hard to fix either.
Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.
Post by Robert Haas
I'm inclined to think that tamping down palloc traffic is worthwhile
even if we can't directly measure the savings on a macrobenchmark.
AllocSetAlloc is often at or near the top of profiling results, but
there's rarely a single caller responsible for enough of those calls
for to make it worth optimizing. But that means that if we refuse to
take patches that save just a few pallocs, we're basically giving up
on ever improving anything in this area, and that doesn't seem like a
good idea either; it's only going to get slowly worse over time as we
add more features.
I think it depends a bit on the callsites. If its somewhere that nobody
will ever care, because it's a slowpath, then we shouldn't care
either. But that's not the case here, so I do think that makes sense.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas
2014-10-09 12:18:18 UTC
Permalink
Post by Andres Freund
Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.
Whoa. Now that's clearly significant. You didn't attach the patch;
was that inadvertent, or was it too ugly for that?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund
2014-10-09 12:20:38 UTC
Permalink
Post by Robert Haas
Post by Andres Freund
Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.
Whoa. Now that's clearly significant.
Well, my guess it'll be far less noticeable in less trivial
workloads. But it does seem worthwile.
Post by Robert Haas
You didn't attach the patch; was that inadvertent, or was it too ugly
for that?
Far, far too ugly ;). I just removed the AtStart() call from xact.c and
sprinkled it around relevant places instead ;)

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas
2014-10-09 19:01:19 UTC
Permalink
Post by Andres Freund
Post by Robert Haas
Post by Andres Freund
Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.
Whoa. Now that's clearly significant.
Well, my guess it'll be far less noticeable in less trivial
workloads. But it does seem worthwile.
Post by Robert Haas
You didn't attach the patch; was that inadvertent, or was it too ugly
for that?
Far, far too ugly ;). I just removed the AtStart() call from xact.c and
sprinkled it around relevant places instead ;)
OK, here's an attempt at a real patch for that. I haven't perf-tested this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Andres Freund
2014-10-09 19:53:02 UTC
Permalink
Post by Robert Haas
Post by Andres Freund
Post by Robert Haas
Post by Andres Freund
Interesting - in my local profile AtStart_Inval() is more pronounced
than AfterTriggerBeginQuery(). I've quickly and in a ugly fashion hacked
AtStart_Inval() out of readonly queries ontop of your patch. Together
that yields a ~3.5% performance improvement in my trivial 'SELECT * FROM
tbl WHER pkey = xxx' testcase.
Whoa. Now that's clearly significant.
Well, my guess it'll be far less noticeable in less trivial
workloads. But it does seem worthwile.
Post by Robert Haas
You didn't attach the patch; was that inadvertent, or was it too ugly
for that?
Far, far too ugly ;). I just removed the AtStart() call from xact.c and
sprinkled it around relevant places instead ;)
OK, here's an attempt at a real patch for that. I haven't perf-tested this.
Neato. With a really trivial SELECT:

before:
tps = 28150.794776 (excluding connections establishing)
after:
tps = 29978.767703 (excluding connections establishing)

slightly more meaningful:

before:
tps = 21272.400039 (including connections establishing)
after:
tps = 22290.703482 (excluding connections establishing)

So that's a noticeable win. Obviously it's going to be less for more
complicated stuff, but still...

I've not really looked at the patches though.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas
2014-10-09 21:02:02 UTC
Permalink
Post by Andres Freund
Post by Robert Haas
OK, here's an attempt at a real patch for that. I haven't perf-tested this.
tps = 28150.794776 (excluding connections establishing)
tps = 29978.767703 (excluding connections establishing)
tps = 21272.400039 (including connections establishing)
tps = 22290.703482 (excluding connections establishing)
So that's a noticeable win. Obviously it's going to be less for more
complicated stuff, but still...
I've not really looked at the patches though.
Yeah, not bad at all for the amount of work involved. Want to
disclose the actual queries?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund
2014-10-09 21:03:48 UTC
Permalink
Post by Robert Haas
Post by Andres Freund
Post by Robert Haas
OK, here's an attempt at a real patch for that. I haven't perf-tested this.
tps = 28150.794776 (excluding connections establishing)
tps = 29978.767703 (excluding connections establishing)
tps = 21272.400039 (including connections establishing)
tps = 22290.703482 (excluding connections establishing)
So that's a noticeable win. Obviously it's going to be less for more
complicated stuff, but still...
I've not really looked at the patches though.
Yeah, not bad at all for the amount of work involved. Want to
disclose the actual queries?
SELECT 1;
SELECT * FROM smalltable WHERE id = xxx;

The latter is something quite frequent in the real world, so it's not
all academic...

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Simon Riggs
2014-10-16 10:26:27 UTC
Permalink
Post by Robert Haas
OK, here's an attempt at a real patch for that. I haven't perf-tested this.
Patch looks good to me. Surprised we didn't do that before.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund
2014-10-21 16:00:00 UTC
Permalink
Hi,
/*
@@ -960,18 +966,38 @@ AtEOXact_Inval(bool isCommit)
...
+ /*
+ * We create invalidation stack entries lazily, so the parent might
+ * not have one. Instead of creating one, moving all the data over,
+ * and then freeing our own, we can just adjust the level of our own
+ * entry.
+ */
+ if (myInfo->parent == NULL || myInfo->parent->my_level < my_level - 1)
+ {
+ myInfo->my_level--;
+ return;
+ }
+
I think this bit might not be correct. What if the subxact one level up
aborts? Then it'll miss dealing with these invalidation entries. Or am I
misunderstanding something?

I like the patch, except the above potential issue.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas
2014-10-23 16:04:40 UTC
Permalink
Post by Andres Freund
/*
@@ -960,18 +966,38 @@ AtEOXact_Inval(bool isCommit)
...
+ /*
+ * We create invalidation stack entries lazily, so the parent might
+ * not have one. Instead of creating one, moving all the data over,
+ * and then freeing our own, we can just adjust the level of our own
+ * entry.
+ */
+ if (myInfo->parent == NULL || myInfo->parent->my_level < my_level - 1)
+ {
+ myInfo->my_level--;
+ return;
+ }
+
I think this bit might not be correct. What if the subxact one level up
aborts? Then it'll miss dealing with these invalidation entries. Or am I
misunderstanding something?
One of us is. I think you're asking about a situation where we have a
transaction, and a subtransaction, and within that another
subtransaction. Only the innermost subtransaction has invalidation
messages. At the innermost level, we commit; the above code makes
those messages the responsibility of the outer subtransaction. If
that subtransaction abouts, AtEOSubXact_Inval() gets called again,
sees that it has messages (that it inherited from the innermost
subtransaction), and takes the exact same code-path that it would have
pre-patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund
2014-10-21 15:52:09 UTC
Permalink
Post by Robert Haas
Post by Tom Lane
Meh. Even "SELECT 1" is going to be doing *far* more pallocs than that to
get through raw parsing, parse analysis, planning, and execution startup.
If you can find a few hundred pallocs we can avoid in trivial queries,
it would get interesting; but I'll be astonished if saving 4 is measurable.
I got nerd-sniped by this problem today, probably after staring at the
profiling data very similar to what led Andres to ask the question in
the first place.
I've loolked through this patch and I'm happy with it. One could argue
that the current afterTriggers == NULL checks should be replaced with a
Assert ensuring we're inside the xact. But I think you're right in
removing them, and I don't think we actually need the asserts.

Greetings,

Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Loading...