Discussion:
Prepared statements considered harmful
Peter Eisentraut
2006-08-31 11:56:29 UTC
Permalink
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.

I wish we would reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.

A couple of actions to consider:

- Never use prepared statements unless the user has turned them on. (This is
the opposite of the current behavior.)

- Transparently invalidate and regenerate prepared plans more often. This
could be tied to the transaction count, update activity obtained from the
statistics collector, etc.

- Redefine "prepared" to mean "parsed" rather than "parsed and planned".

Each of these or similar changes would only solve a subset of the possible
problems. Possibly, we need more knobs to adjust these things. But
something needs to be done.
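
To make the failure mode concrete, here is a sketch of how a plan cached at
PREPARE time goes stale (the table and values are hypothetical):

```sql
-- Hypothetical table; the point is the timing, not the schema.
PREPARE by_name (text) AS SELECT * FROM accounts WHERE name = $1;

-- The plan was chosen at PREPARE time, against the statistics of that
-- moment and with no knowledge of the actual parameter value...
EXECUTE by_name('smith');

-- ...and the same plan is reused verbatim later, even after the table
-- has grown tenfold or been re-ANALYZEd, which is where the "bad plan
-- was cached somewhere" complaints come from.
EXECUTE by_name('smith');
```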

Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Martijn van Oosterhout
2006-08-31 12:09:09 UTC
Permalink
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
Yeah, it seems to me that many of the benefits of not planning each time
are outweighed by the effects of bad plans.
Post by Peter Eisentraut
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
I think this is the best. Some way to specify that you don't want
planning to take place immediately would be good.

One question though: there is a function PQexecParams(). Does this
suffer from the same problem? I imagine most interfaces would like
out-of-band parameters (no escaping issues), so why do they not use this?

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Csaba Nagy
2006-08-31 12:11:19 UTC
Permalink
Post by Peter Eisentraut
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
How about "prepared" meaning really "prepared"... in the sense of parsed,
analyzed for all sensible plans, with a saved meta-plan which, based on
current statistics and parameter values, chooses one of the considered
(and cached) plans?

That would be immune to both statistics changes and parameter value
changes, within certain limits. It would also be a lot more complex than
a simple plan...

Cheers,
Csaba.





Peter Eisentraut
2006-08-31 12:32:47 UTC
Permalink
Post by Csaba Nagy
How about "prepared" means really "prepared"... in the sense of parsed,
analyzed all sensible plans, and save a meta-plan which based on current
statistics and parameter values chooses one of the considered (and
cached) plans ?
I don't think this could solve one particularly frequent problem which is that
pattern matching queries don't get along with prepared plans if the search
pattern isn't known at planning time.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Csaba Nagy
2006-08-31 12:52:05 UTC
Permalink
Post by Peter Eisentraut
Post by Csaba Nagy
How about "prepared" means really "prepared"... in the sense of parsed,
analyzed all sensible plans, and save a meta-plan which based on current
statistics and parameter values chooses one of the considered (and
cached) plans ?
I don't think this could solve one particularly frequent problem which is that
pattern matching queries don't get along with prepared plans if the search
pattern isn't known at planning time.
Why not? I specifically said you would prepare a few sensible plans,
based on the statistics (and expected variations of the statistics) and
on the parameter value ranges that would trigger different plans.

So for the like query case you could save 2 plans, one for the indexable
case, one for the not indexable case. Then at runtime you choose the
proper one based on the pattern value. The meta-plan I mentioned would
be a collection of plans with rules to choose the right one at run time
based on parameter values and perhaps the current statistics.

This of course would need a lot more preparation time than just preparing
one plan, but that's why you want to do it upfront and then cache the
results. A central plan repository mentioned in other posts would fit
nicely here... and you could use prepared plans for non-parameterized
queries too, by simply treating the constants as parameters to
increase the chances of prepared plan reuse - this of course only for
complex enough queries.

Cheers,
Csaba.


zhou bo
2006-08-31 13:05:31 UTC
Permalink
hello everyone,

I have been added to your mailing list by accident, and I don't know how
to stop receiving your mails. Would you please help me remove my mail
address from the mail group pgsql-***@postgresql.org?

I appreciate what you will do for me. (my mail address:
***@hotmail.com)

thanks.
Peter Eisentraut
2006-08-31 13:19:22 UTC
Permalink
Post by Csaba Nagy
So for the like query case you could save 2 plans, one for the indexable
case, one for the not indexable case. Then at runtime you choose the
proper one based on the pattern value.
OK, why don't you work out an example. Let's look at this query:

SELECT * FROM t1 WHERE a LIKE $1;

What two plans would you prepare?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Csaba Nagy
2006-08-31 13:36:25 UTC
Permalink
Post by Peter Eisentraut
SELECT * FROM t1 WHERE a LIKE $1;
What two plans would you prepare?
if substring($1 from 1 for 1) != '%' then
    use plan 1 (see below);
else
    use plan 2 (see below);
end if;

Save both plans from below with the meta-plan from above, and call it a
prepared plan.

cnagy=# create table t1 (a text);
CREATE TABLE
cnagy=# insert into t1 select round(10000000 * random()) from
generate_series(1,10000);
INSERT 0 10000
cnagy=# create index idx_t1_a on t1 (a);
CREATE INDEX
cnagy=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 55 of 55 pages, containing 10000 live rows and 0
dead rows; 3000 rows in sample, 10000 estimated total rows
ANALYZE
cnagy=# explain select a from t1 where a like '121%';
                               QUERY PLAN
------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=2.06..27.63 rows=10 width=10)
   Filter: (a ~~ '121%'::text)
   ->  Bitmap Index Scan on idx_t1_a  (cost=0.00..2.06 rows=10 width=0)
         Index Cond: ((a >= '121'::text) AND (a < '122'::text))
(4 rows)

cnagy=# explain select a from t1 where a like '%121';
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on t1  (cost=0.00..180.00 rows=80 width=10)
   Filter: (a ~~ '%121'::text)
(2 rows)


Cheers,
Csaba.



Peter Eisentraut
2006-08-31 13:49:21 UTC
Permalink
Post by Csaba Nagy
Post by Peter Eisentraut
SELECT * FROM t1 WHERE a LIKE $1;
What two plans would you prepare?
if substring($1 from 1 for 1) != '%' then
use plan 1 (see below);
else
use plan 2 (see below);
end if;
Note that plan 1 can only be created if you know the actual value for $1.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Csaba Nagy
2006-08-31 13:51:29 UTC
Permalink
Post by Peter Eisentraut
Note that plan 1 can only be created if you know the actual value for $1.
Why would that be so? The plan can contain functions of $1 (both
constants in plan 1 are functions of $1).
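
Concretely, for the prefix case the index-condition bounds really are
computable from the parameter at execute time; a sketch using the t1 table
from the earlier session (the real implementation would have to be more
careful about collations and last-character overflow):

```sql
-- With $1 = '121%': the prefix is '121', and incrementing its last
-- character gives the exclusive upper bound '122'.
SELECT a FROM t1
WHERE a >= '121' AND a < '122'   -- bounds derived from $1
  AND a LIKE '121%';             -- recheck the original pattern
```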

Cheers,
Csaba



m***@mark.mielke.cc
2006-08-31 15:22:33 UTC
Permalink
Post by Csaba Nagy
Post by Peter Eisentraut
SELECT * FROM t1 WHERE a LIKE $1;
What two plans would you prepare?
if substring($1 from 1 for 1) != '%' then
use plan 1 (see below);
else
use plan 2 (see below);
end if;
It would be cool if PostgreSQL did this - but I think it is also
true that anybody (or JDBC) who tries to prepare a plan in the cases
that are known to cause problems is making a mistake.

While on the 'it would be cool' subject - I think it might be cool if
the prepare statement took sample arguments that could be used to
prepare the plans with. "Prepare a plan that would work best with
these arguments." Then JDBC could prepare both plans for you - if it
was smart enough... :-)
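
Spelled out, the sample-arguments idea might look something like this
purely hypothetical syntax (no such clause exists today):

```sql
-- Invented clause: ask PREPARE to plan for representative values.
PREPARE q (text) AS
    SELECT * FROM t1 WHERE a LIKE $1
    PLAN FOR VALUES ('121%'), ('%121');
```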

Cheers,
mark
--
***@mielke.cc / ***@ncf.ca / ***@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


Lukas Kahwe Smith
2006-08-31 15:14:45 UTC
Permalink
Post by Peter Eisentraut
Post by Csaba Nagy
So for the like query case you could save 2 plans, one for the indexable
case, one for the not indexable case. Then at runtime you choose the
proper one based on the pattern value.
SELECT * FROM t1 WHERE a LIKE $1;
What two plans would you prepare?
Well, I guess for the case that none of the "expected" plans fit you can
always fall back to generating a new plan on the fly.

Anyway, it would of course be cool if pgsql could set an invalid flag if
it detects that a certain plan performed badly (maybe even automatically
trigger a fresh table analysis) or that some DDL/DML was executed that
likely invalidated the plan.

I am not sure if there is any "philosophy" that pgsql tries to adhere
to. Does it want to leave the job of tuning to the DBA, or does it want
to do things automatically (which always means that in some situations
it will do the wrong thing)?

tweak planner vs. planner hints
manually analyze vs. automatically analyze
manual vacuum vs. autovacuum

Hmm, actually it's probably not a black and white thing, and the ultimate
goal would be to offer both, with maybe some installer checkbox to
default everything to "DBA-less" automode.

Anyway, I never liked the idea of planner hints. I think it makes much
more sense to give people direct access to plans in that case, meaning
they can "partially" hardcode (parameterized) plans they want. I have
mentioned before that Sybase seems to have such a feature (you can dump
plans, tweak them, remove pieces that should be done on the fly, and
associate them with stored procedures - not sure if you can also do that
for prepared statements).

regards,
Lukas


AgentM
2006-08-31 14:15:02 UTC
Permalink
Post by Csaba Nagy
This of course would need a lot more preparation time than just prepare
one plan, but that's why you want to do it upfront and then cache the
results. A central plan repository mentioned in other posts would fit
nicely here... and you could use prepared plans for non-parameterized
queries too by simply considering the constants as parameters, to
increase the chances for a prepared plan reuse - this of course for
complex enough queries.
If prepared statements become more expensive to create, then it would
make more sense for them to persist across sessions. All of an
application's prepared statements could be cached.

-M

Zeugswetter Andreas DCP SD
2006-08-31 13:24:47 UTC
Permalink
Post by Peter Eisentraut
Post by Csaba Nagy
How about "prepared" means really "prepared"... in the sense of
parsed, analyzed all sensible plans, and save a meta-plan which based
on current statistics and parameter values chooses one of the
considered (and cached) plans ?
I don't think this could solve one particularly frequent
problem which is that pattern matching queries don't get
along with prepared plans if the search pattern isn't known
at planning time.
I think what we would actually want is knowledge about how
much difference different parameters actually make in the plan decision
(the stats show an even distribution and join correlation).
Then we could prepare the plan when there is not much difference,
and postpone planning until we know the parameters when the difference
is big.

OLTP workloads typically benefit from prepared plans, and the one plan
is good for all possible inputs, so imho we cannot just assume all plans
need replanning for different parameters.

Andreas

Stefan Kaltenbrunner
2006-08-31 14:29:47 UTC
Permalink
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
- Never use prepared statements unless the user has turned them on. (This is
the opposite of the current behavior.)
- Transparently invalidate and regenerate prepared plans more often. This
could be tied to the transaction count, update activity obtained from the
statistics collector, etc.
Well, this sounds like it would best be done with the "central plan cache"
idea that is floating around (I think neilc once worked on that) - once we
have something like that, I would expect we could easily
invalidate/regenerate plans there based on certain criteria (from
obvious things like DDL changes to more subtle ones like "age of
the plan", "statistics changed significantly on table foo", or
"regenerate the plan every time table bla is involved").
Most of that is pure speculation - but something like that would be a
very powerful thing to have.


Stefan




Merlin Moncure
2006-08-31 13:05:23 UTC
Permalink
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I have to respectfully disagree. I have used them to great effect in
many of my projects. In the most extreme case, prepared statements can
provide a 50% reduction or greater in overall query time... this is too
good a benefit to simply discard. I worked on converted ISAM projects
which would not have been possible to make efficient without prepared
statements. However, you are correct that the planner does often
create wacky plans which can cause disastrous results in some cases.

My major issue is that you cannot supply hints to the query engine.
For example, one of my favorite tricks is to parameterize the LIMIT
clause in a query, which creates a sliding window over the table for
progressive readahead. Unfortunately the planner assumes 10%, which
borks the plan. My workaround is to turn off bitmap and seqscan before
the prepare and turn them on after.
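
The workaround described above looks roughly like this (table and query
are illustrative):

```sql
-- Constrain the planner only while the statement is being prepared.
SET enable_seqscan = off;
SET enable_bitmapscan = off;

PREPARE window_q (int, int) AS
    SELECT * FROM big_table
    WHERE id > $1
    ORDER BY id
    LIMIT $2;  -- parameterized LIMIT: the planner assumes ~10% of the table

SET enable_seqscan = on;
SET enable_bitmapscan = on;
```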

The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.

merlin

Peter Eisentraut
2006-08-31 13:25:32 UTC
Permalink
Post by Merlin Moncure
The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.
Planner hints are a way to address a deficient planner. But neither a
manually hinted planner nor a perfectly good planner will help if the
planning decisions are based on outdated information.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Peter Eisentraut
2006-08-31 14:42:27 UTC
Permalink
Post by Theo Schlossnagle
I don't chime in very often, but I do think the refusal to
incorporate hints into the planner system is fantastically stubborn
and nonsensical.
What is actually fantastically nonsensical about this is that the issues I
outlined about prepared statements would merely become worse if planner hints
were used. Then, you wouldn't only have to worry about plans that were
created earlier during the session; you would be faced with plans that were
created earlier during the application's development. In general, the
solutions to the prepared statement issues need to ensure that plans are
created more often, not less often.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Zeugswetter Andreas DCP SD
2006-09-01 10:44:18 UTC
Permalink
Post by Peter Eisentraut
I don't chime in very often, but I do think the refusal to
incorporate hints into the planner system is fantastically stubborn and
nonsensical.
What is actually fantastically nonsensical about this is that
the issues I outlined about prepared statements would merely
become worse if planner hints were used. Then, you wouldn't
only have to worry about plans that were created earlier
during the session, you would be faced with plans that were
created earlier during the application's development. In
general, the solutions to the prepared statement issues need
to effect that the plans are created more often, not less often.
I have yet to see one of our partial Informix hints (where the planner
does its usual job, only with one path's costs lowered/elevated) fall
foul of an unanticipated change in the underlying data.

Thus I don't buy the argument that hints are always bad.
Of course their use should be extremely rare and well thought out.
Most of the time, SQL tuning involves a concerted effort between the
programmer and a DB performance expert, usually resulting in
rewritten SQL or program logic without adding hints.

I can see arguments for hints the DBA can set himself centrally on the
server, but in my experience the chances for substantial improvement are
very limited in that case.

Andreas

Theo Schlossnagle
2006-08-31 14:09:10 UTC
Permalink
Post by Peter Eisentraut
Post by Merlin Moncure
The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.
Planner hints are a way to address a deficient planner. But neither a
manually hinted planner nor a perfectly good planner will help if the
planning decisions are based on outdated information.
I don't chime in very often, but I do think the refusal to
incorporate hints into the planner system is fantastically stubborn
and nonsensical. I whole-heartedly agree that it is _better_ to fix
the planner, but many of us have production systems and can't just go
check out CVS HEAD to address our day-to-day issues and we suffer
from this decision.

There are many databases out there with better planners than
PostgreSQL -- likely there always will be. Even those databases have
query planner hints. Why? Because the authors of those databases had
the humility to realize that the planner they designed wasn't perfect
and that people _still_ need their database to perform well despite a
non-optimal query plan here and there.

A good query planner hint system would act as a catalyst to the
improvement of the current query planner as users could share their
complex queries and associated improved query plans through hinting.

I like Postgres a lot, I think the people that work on it are very
very sharp. I do feel that the consistent refusal to allow query
hinting to be introduced demonstrates an unhealthy amount of hubris
that, in the end, negatively impacts users.

While Postgres is missing a ton of other needed features, I rarely
see the attitude that they are _unwanted_. Instead I see the "if it
is important to you, go build it" attitude which is what I would
expect in an open source project.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/



m***@mark.mielke.cc
2006-08-31 15:27:18 UTC
Permalink
Post by Theo Schlossnagle
There are many databases out there with better planners than
PostgreSQL -- likely there will always be. Even those databases have
query planner hints. Why? Because the authors of those database had
the humility to realize that the planner they designed wasn't perfect
and that people _still_ need their database to perform well despite a
non-optimal query plan here and there.
A good query planner hint system would act as a catalyst to the
improvement of the current query planner as users could share their
complex queries and associated improved query plans through hinting.
Would a hint system allow the planner to execute quicker? Eliminate
plans from consideration early, without evaluating how long they might
take to execute? Sort of possible today with toggling of the 'seqscan'
and other such options... :-)
Post by Theo Schlossnagle
I like Postgres a lot, I think the people that work on it are very
very sharp. I do feel that the consistent refusal to allow query
hinting to be introduced demonstrates an unhealthy amount of hubris
that, in the end, negatively impacts users.
Hubris isn't always bad. If hints were provided, the need for the
fully automatic planner to improve would be reduced. But yes, they do
seem to be competing goals, disenfranchising the user.
Post by Theo Schlossnagle
While Postgres is missing a ton of other needed features, I rarely
see the attitude that they are _unwanted_. Instead I see the "if it
is important to you, go build it" attitude which is what I would
expect in an open source project.
There is also "what you submit should be maintainable because we know
you might disappear at any time".

Cheers,
mark


Jeff Davis
2006-08-31 18:55:44 UTC
Permalink
Post by Theo Schlossnagle
Post by Merlin Moncure
The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.
There are many databases out there with better planners than
PostgreSQL -- likely there will always be. Even those databases have
query planner hints. Why? Because the authors of those database had
the humility to realize that the planner they designed wasn't perfect
and that people _still_ need their database to perform well despite a
non-optimal query plan here and there.
You can see a related discussion here:

http://archives.postgresql.org/pgsql-hackers/2006-08/msg00463.php

What I understood from that thread was that the concept of planner hints
was not completely rejected. I think the most likely outcome (if any
hinting system is implemented) is some mechanism to state the hint in a
separate SQL declaration rather than inside the query itself. This can
still result in potentially stale (or very stale) plans, but at least
you don't have to change your application every time you modify the
"hints". However, as far as I know, this has not progressed beyond the
brainstorming stage.

I think many people are still very skeptical of various implementations
of planner hints, but there is some reasonable level of discussion.

Regards,
Jeff Davis


Merlin Moncure
2006-08-31 17:26:11 UTC
Permalink
Post by Peter Eisentraut
Post by Merlin Moncure
The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.
Planner hints are a way to address a deficient planner. But neither a
manually hinted planner nor a perfectly good planner will help if the
planning decisions are based on outdated information.
Right; anyway, it's clearer now what you are suggesting, and I think
your idea regarding implicitly generated plans has some merit. The
major annoyance for me is that I have to force a disconnect any time
there is a schema search_path change.

Query hints, which I still think would make my life much easier, do
not have much to do with the thrust of your argument.

I think maybe we could add some intelligence to implicit plan generation,
perhaps guarded by a GUC:
implicit_plan_generation = [none, smart, all]
with smart meaning some defined events, including perhaps:
* creation or deletion of a temp table
* duration of time
* user invocation
* manipulation of search_path
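
For illustration, the hypothetical GUC might be used like any other
setting (this is a proposal sketch, not an existing parameter):

```sql
-- Invented setting: regenerate implicit plans on the "smart" events above.
SET implicit_plan_generation = 'smart';
```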

just thinking out loud here,
merlin

Andreas Pflug
2006-08-31 13:30:28 UTC
Permalink
Post by Merlin Moncure
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached
somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I have to respectfully disagree. I have used them to great effect in
many of my projects.
Peter doesn't propose to remove prepared statements as such. They are
certainly of great value, if used carefully and specifically, as in your
case. The problems he's addressing stem from plans _implicitly_ created
and stored.
Post by Merlin Moncure
In the most extreme case, prepared statements can
provide a 50% reduction or greater in overall query time...this is too
good a benefit to simply discard. I worked on converted isam projects
which would not have been possbile to make efficient without prepared
statements. However you are correct that the planner does often
create wacky plans which can cause disasterous results in some cases.
My major issue is that you cannot supply hints to the query engine.
I don't believe extending this thread to the we-need-hints issue is a
good idea.

Regards,
Andreas


Jeroen T. Vermeulen
2006-08-31 13:06:57 UTC
Permalink
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere.
Is there any kind of pattern at all to this problem? Anything
recognizable? A few typical pitfalls?

Without knowing much of the internals, I could imagine [waves hands in
vague gestures] other options--something like recognizing major changes
that upset the cost functions that went into generating a plan, and
invalidating the plan based on those; or noting bad estimates somehow as
they become apparent during execution, and annotating the plan with a
"this assumption was a bad idea" marker so you'll do better next time.

I guess you can't go far wrong if you re-define "prepared" to mean merely
"pre-parsed," but it sounds like such a waste of opportunity...


Jeroen



Peter Eisentraut
2006-08-31 13:29:15 UTC
Permalink
Post by Jeroen T. Vermeulen
Is there any kind of pattern at all to this problem? Anything
recognizable? A few typical pitfalls?
If data is not distributed evenly, then any old WHERE foo = $1 is prone to be
the wrong plan for half of the possible values of $1. The more data you have
and the more it changes, the worse this gets.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Phil Frost
2006-08-31 14:41:20 UTC
Permalink
Post by Jeroen T. Vermeulen
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL
statements are just a really bad idea. On some days, it seems like
half the performance problems in PostgreSQL-using systems are
because a bad plan was cached somewhere.
Is there any kind of pattern at all to this problem? Anything
recognizable? A few typical pitfalls?
Frequently I have found preplanning will result in a horrible plan
because it is assumed parameters may be volatile while in practice they
are literals. Here is a function from my database:

CREATE FUNCTION nullorblank(character varying) RETURNS boolean
AS $_$ select $1 is null or trim($1) = '' $_$
LANGUAGE sql IMMUTABLE;

This is used in stored procedures that answer search queries. For
example, let's consider one that searches products, filtered on any
number of "part number", "manufacturer", or "name". If one of these is
not specified, it does not restrict the query. One might write that
query so:

-- $1: part number
-- $2: manufacturer
-- $3: name

SELECT * FROM product WHERE
(nullorblank($1) OR lower(partnumber) = lower($1))
AND (nullorblank($2) OR manufacturername = $2)
AND (nullorblank($3) OR name = $3)

The parameters will always be literal strings, taken from some form
presented to the user. If one does the parameter substitution manually,
the plans are quite reasonable:

EXPLAIN ANALYZE
SELECT * FROM product WHERE
(nullorblank('int2100/512') OR lower(partnumber) = lower('int2100/512'))
AND (nullorblank('') OR manufacturername = '')
AND (nullorblank('') OR name = '');

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=15.54..4494.71 rows=1867 width=254) (actual time=43.502..43.507 rows=1 loops=1)
-> Bitmap Heap Scan on product (cost=15.54..4494.71 rows=1867 width=254) (actual time=43.161..43.162 rows=1 loops=1)
Recheck Cond: (lower((partnumber)::text) = 'int2100/512'::text)
-> Bitmap Index Scan on product_partnumber_loweridx (cost=0.00..15.54 rows=1867 width=0) (actual time=43.022..43.022 rows=1 loops=1)
Index Cond: (lower((partnumber)::text) = 'int2100/512'::text)
Total runtime: 51.626 ms
(7 rows)

The 'manufacturername' and 'name' disjuncts have been removed by
simplification, since the expression is known to be true.

However, if "prepared", it's horrible:

PREPARE to_be_slow(text, text, text) AS
SELECT * FROM product WHERE
(nullorblank($1) OR lower(partnumber) = lower($1))
AND (nullorblank($2) OR manufacturername = $2)
AND (nullorblank($3) OR name = $3);

explain analyze execute to_be_slow('int2100/512', NULL, NULL);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..22317.13 rows=1 width=254) (actual time=1115.167..1579.535 rows=1 loops=1)
-> Seq Scan on product (cost=0.00..22317.12 rows=1 width=254) (actual time=1114.845..1579.211 rows=1 loops=1)
Filter: (((($1)::character varying IS NULL) OR (btrim(($1)::text) = ''::text) OR (lower((partnumber)::text) = lower($1))) AND ((($2)::character varying IS NULL) OR (btrim(($2)::text) = ''::text) OR (manufacturername = $2)) AND ((($3)::character varying IS NULL) OR (btrim(($3)::text) = ''::text) OR ((name)::text = $3)))
Total runtime: 1580.006 ms
(5 rows)

Jeroen T. Vermeulen
2006-09-01 08:56:19 UTC
Permalink
Post by Phil Frost
Post by Jeroen T. Vermeulen
Is there any kind of pattern at all to this problem? Anything
recognizable? A few typical pitfalls?
Frequently I have found preplanning will result in a horrible plan
because it is assumed parameters may be volatile while in practice they
That's a very common thing in processor design as well, and there's a
standard trick for it: the saturating two-bit counter. It tends to work
pretty well for branch prediction, value prediction etc. Usually it's the
first thing you reach for, so of course somebody may already have tried it
here and found it didn't work.

In this particular case it might be applied something like this: for each
parameter in a prepared statement you cache a predictor value, plus a
"confidence counter" saying (more or less--see below) how many times in
succession that value has repeated. Let's say each of the counters counts
from 0 to 3 inclusive, with its confidence threshold right in the middle,
between 1 and 2.

On every invocation, you check each parameter value against the
corresponding predictor value. If it's identical, you increment its
counter (provided it can be incremented any further). If it isn't, you
decrement its counter, and if the counter ends up below its confidence
threshold, you replace the predictor value with the new parameter value.

Then, whenever any new planning needs to be done (I'll get to that in a
moment), you see which counters are above their confidence thresholds. In
your new planning you assume that all parameters with confident
predictions will remain pseudo-constant for the next few invocations.
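A toy sketch of that per-parameter predictor, assuming the 0..3 counter with its threshold between 1 and 2 (class and method names are invented; nothing here is PostgreSQL code):

```python
# Hypothetical per-parameter predictor: a saturating two-bit counter (0..3)
# with the confidence threshold between 1 and 2.
class ParamPredictor:
    def __init__(self):
        self.value = None      # current predicted parameter value
        self.counter = 0       # saturating confidence counter, 0..3

    def confident(self):
        # above the threshold means counter in {2, 3}
        return self.counter >= 2

    def observe(self, value):
        """Update the predictor with one invocation's parameter value."""
        if value == self.value:
            self.counter = min(self.counter + 1, 3)   # saturate at 3
        else:
            self.counter = max(self.counter - 1, 0)
            if self.counter < 2:
                # fell below the threshold: adopt the new value
                self.value = value
```

Note how a confident prediction survives a single mismatch: the counter drops from 3 to 2 but the predicted value is kept, which is exactly the "flick back to its old habits" behaviour described below.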

Of course there's a problem when parameters do not match predicted values.
That's where having one or two backup plans could come in handy. You
could keep your original, fully-generalized plan around. If plans are
cheap enough to store, you could try to keep a cache of old plans for the
same query. The great thing about keeping some backup plans around is
that a pseudo-constant parameter can have a different value once in a
while, then flick back to its old habits without invalidating all your
efforts. Your usually-unused search fields are a good example. You may
also have two stable parameter patterns with different sets of
pseudo-constants competing for your attention.

It's not perfect, and it clearly has its pathological cases--but if it
works well enough overall, the bad cases could be caught and handled as
exceptions. Confidence counters can be tweaked to lose confidence more
easily than they gain it, or vice versa. Some meta-confidence scheme may
catch the worst offenders. I won't go into that now--first I'll shut up
and wait for others to point out what I've missed. :)


Jeroen



Martijn van Oosterhout
2006-09-01 09:53:11 UTC
Permalink
Post by Jeroen T. Vermeulen
That's a very common thing in processor design as well, and there's a
standard trick for it: the saturating two-bit counter. It tends to work
pretty well for branch prediction, value prediction etc. Usually it's the
first thing you reach for, so of course somebody may already have tried it
here and found it didn't work.
Interesting thought. It might be worth trying. But my big question: is
all this testing and counting actually going to be faster than just
replanning? Postgresql's planner is not that slow.
Post by Jeroen T. Vermeulen
Of course there's a problem when parameters do not match predicted values.
That's where having one or two backup plans could come in handy. You
could keep your original, fully-generalized plan around. If plans are
cheap enough to store, you could try to keep a cache of old plans for the
same query. The great thing about keeping some backup plans around is
that a pseudo-constant parameter can have a different value once in a
while, then flick back to its old habits without invalidating all your
efforts. Your usually-unused search fields are a good example. You may
also have two stable parameter patterns with different sets of
pseudo-constants competing for your attention.
The thing is that number of possible plans is going to be proportional
to factorial(number of tables). Once you have 3 tables you're going to
have at least a dozen possible plans, probably more. What the best plan
is depends strongly on what the parameters are.

Anyway, your plan assumes that you have information to work with. The
current system plans prepared queries with no information at all about
parameters and people are advocating to keep it that way. I think a
good first step would be the plan on first execution, like Oracle does.

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Zeugswetter Andreas DCP SD
2006-09-01 10:39:01 UTC
Permalink
Post by Martijn van Oosterhout
Anyway, your plan assumes that you have information to work
with. The current system plans prepared queries with no
information at all about parameters and people are advocating
to keep it that way. I think a good first step would be the
plan on first execution, like Oracle does.
Yup, it is also possible to try to find an obvious plan and only
delay planning (or part of the plan) when different inputs make a
big difference (like MaxDB and Informix).

Andreas

Jeroen T. Vermeulen
2006-09-01 14:10:42 UTC
Permalink
Post by Martijn van Oosterhout
Interesting thought. It might be worth trying. But my big question: is
all this testing and counting actually going to be faster than just
replanning? Postgresql's planner is not that slow.
In the best case (which of course would have to be very frequent for any
of this to matter in the first place) it's mainly just a short loop
comparing the call's parameter values to their counterparts stored with
the plan and updating those two-bit confidence counters. You wouldn't
*believe* how simple you have to keep these things in processor
architecture. :-)
Post by Martijn van Oosterhout
The thing is that number of possible plans is going to be proportional
to factorial(number of tables). Once you have 3 tables you're going to
have at least a dozen possible plans, probably more. What the best plan
is depends strongly on what the parameters are.
Of course. That's the whole point: to end up with a small but effective
subset of all those possible plans. I'd guess that you could cover even
most of the nasty cases with a maximum of three plans or so per prepared
statement, including the original fully-generalized one. The plans could
be replaced on an LRU basis, which isn't very costly for three or so
entries.
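Such a tiny LRU plan cache might look like this (illustrative only; the key and plan representations are placeholders, not anything in PostgreSQL):

```python
# Hypothetical per-statement cache of a handful of plans, evicted LRU.
from collections import OrderedDict

class StatementPlans:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.plans = OrderedDict()   # key: predicted-constant tuple -> plan

    def lookup(self, key):
        plan = self.plans.get(key)
        if plan is not None:
            self.plans.move_to_end(key)   # mark as most recently used
        return plan

    def store(self, key, plan):
        self.plans[key] = plan
        self.plans.move_to_end(key)
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)  # evict least recently used
```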
Post by Martijn van Oosterhout
Anyway, your plan assumes that you have information to work with. The
current system plans prepared queries with no information at all about
parameters and people are advocating to keep it that way. I think a
good first step would be the plan on first execution, like Oracle does.
Yes, delaying things a bit can help a lot sometimes. That's also what JIT
compilers in JVMs do, for instance. FWIW, libpqxx doesn't prepare
statements until they're first called anyway.

But if this choice to discard parameter information is exactly what causes
a lot of the bad plans in the first place, as Peter says, what's wrong
with putting it to use instead? For those cases, you're pretty much
screwed by definition as long as you fail to do so. And it's not like
what I'm suggesting is very difficult!

The real question is whether it's worthwhile. To find that out, we'd need
to estimate four factors: coverage (how often you'd get a useful
prediction), accuracy (how often that prediction would be accurate), cost
of misprediction (near-zero compared to current situation, assuming we
keep the generalized plans handy), and savings for correct prediction (in
our case, benefit of planning for a constant instead of a variable minus
the cost of re-planning which you say isn't very expensive).

Based on what Peter and you tell me about cost, the main worries here are
coverage and accuracy. Coverage and accuracy can be extracted (and
tweaked!) relatively easily if we have logs of prepared-statement
executions in a wide variety of real-life applications. Listings of
consecutive prepared-statement invocations (statement name plus parameter
values) are all that's needed.

Do we have any logs like that? If we do, I'll be more than happy to run
some simulations and see if the idea shows any promise. Like I said,
there's every chance that it doesn't. It was just an off-the-cuff
suggestion and if it's no good I'll have no problems saying so. But
there's not much point sitting around arguing over theoretical merits if
they're that easy to quantify!


Jeroen



Gregory Stark
2006-09-01 15:14:32 UTC
Permalink
Post by Jeroen T. Vermeulen
Post by Martijn van Oosterhout
Interesting thought. It might be worth trying. But my big question: is
all this testing and counting actually going to be faster than just
replanning? Postgresql's planner is not that slow.
In the best case (which of course would have to be very frequent for any
of this to matter in the first place) it's mainly just a short loop
comparing the call's parameter values to their counterparts stored with
the plan and update those two-bit confidence counters. You wouldn't
*believe* how simple you have to keep these things in processor
architecture. :-)
I think the slow part is trying to figure out whether to count the current
call as a hit or a miss. How do you determine whether the plan you're running
is the best plan without replanning the query?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Jeroen T. Vermeulen
2006-09-01 16:35:46 UTC
Permalink
Post by Gregory Stark
I think the slow part is trying to figure out whether to count the current
call as a hit or a miss. How do you determine whether the plan you're running
is the best plan without replanning the query?
The question of knowing which plan is best _based on what's in the actual
tables_ would be unsolved just as it always was. The scheme addresses
only the opportunity to optimize for pseudo-constant parameters. It
treats the existing planner as a black box. If you find a solution to the
problem of inaccurate statistics, it'll probably be more or less
orthogonal to what I'm describing: you could have one or the other, but
combining them shouldn't be much harder.

I don't think telling hits from misses would be all that hard. Let's say
you're having a prepared statement called, and you're evaluating a
candidate plan. Each parameter is in one of two sets: those "predicted"
by the plan to have certain values (let's call them P), and those "not
predicted" by the plan because their confidence counters were below the
threshold (I'm tempted to call this set NP, but let's make it Q instead).
Whether a parameter is in P or in Q can be derived from its confidence
counter. In my previous example, you just take its most-significant bit.

* For any parameter in P, if the actual value does not match the plan's
prediction, you have a miss. Can't use this plan. Use another if you
have one that applies (such as your regular old non-specialized
plan--that always applies), or if not, write a new one!

If you get through this without finding a mismatch, congratulations: you
have a hit. The plan you're looking at is applicable to your call. But
now we see if we can do better:

* For any parameter in Q, if its value would have been predicted
correctly but its counter was below the confidence threshold, you
increment the counter. If that lifts the counter above the threshold,
you have room for improving on this plan. It means there's a good chance
you can re-plan for the case that this parameter is also a
pseudo-constant, without the effort being wasted. Of course you could
also set a minimum number of invocations between re-plannings to get a
more long-term view (e.g. different parameters being recognized as
pseudo-constants in subsequent calls--you may not want to re-plan for
each of those calls).
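The hit/miss test above, sketched in code under the same assumptions (P/Q membership given as booleans per parameter; names invented, not PostgreSQL internals):

```python
# Hypothetical test of whether a cached candidate plan applies to a call.
# P = parameters the plan predicts confidently; Q = the rest.
def plan_applies(predictions, in_p, args):
    """predictions: predicted value per parameter;
    in_p: True if the parameter is in P (confident), False if in Q;
    args: the actual parameter values of this invocation."""
    for predicted, confident, actual in zip(predictions, in_p, args):
        if confident and predicted != actual:
            return False   # a P-parameter mismatched: miss, can't use plan
    return True            # every P-parameter matched: hit
```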

So which plan do you execute if you have more than one applicable
candidate? We can see what works well. As a starter I would definitely
pick the one with the larger P (smaller Q), breaking ties in favour of the
most recently generated plan. I'm assuming we only want one plan for a
given P.

We'd probably want to limit the number of candidate plans per statement to
some very small, fixed number--somewhere between one and four, I'd say; or
maybe one generalized plan plus up to two specialized ones. With numbers
like that, none of this should be very expensive. A simple linear match
against 1-4 candidates may be more effective than any attempt to be
clever.

I must admit I haven't thought through all of the consequences of caching
more than one specialized plan per statement. For example, we could give
every cached plan its own set of confidence counters, and match an
incoming invocation against each of those; or we could keep just one "most
likely" plan with its associated predictor state, and only consider
previously generated plans if we either miss or find room for improvement
in the predictor.


Jeroen



Jim C. Nasby
2006-09-02 21:49:58 UTC
Permalink
Post by Gregory Stark
Post by Jeroen T. Vermeulen
Post by Martijn van Oosterhout
Interesting thought. It might be worth trying. But my big question: is
all this testing and counting actually going to be faster than just
replanning? Postgresql's planner is not that slow.
In the best case (which of course would have to be very frequent for any
of this to matter in the first place) it's mainly just a short loop
comparing the call's parameter values to their counterparts stored with
the plan and update those two-bit confidence counters. You wouldn't
*believe* how simple you have to keep these things in processor
architecture. :-)
I think the slow part is trying to figure out whether to count the current
call as a hit or a miss. How do you determine whether the plan you're running
is the best plan without replanning the query?
Simply looking at estimated row counts/cost versus what actually
happened would probably suffice. It'd at least be a great start.
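As a toy sketch of that check (the threshold and names are hypothetical): in Phil's plan above, an estimate of 1867 rows against an actual 1 row would trip it immediately.

```python
# Hypothetical misestimation check: flag a cached plan for replanning when
# actual row counts diverge from the planner's estimate by more than `ratio`.
def needs_replan(estimated_rows, actual_rows, ratio=10.0):
    est = max(estimated_rows, 1)   # clamp to avoid division by zero
    act = max(actual_rows, 1)
    return est / act > ratio or act / est > ratio
```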
--
Jim C. Nasby, Database Architect ***@nasby.net
512.569.9461 (cell) http://jim.nasby.net

m***@mark.mielke.cc
2006-09-01 14:28:00 UTC
Permalink
Post by Martijn van Oosterhout
Post by Jeroen T. Vermeulen
That's a very common thing in processor design as well, and there's a
standard trick for it: the saturating two-bit counter. It tends to work
pretty well for branch prediction, value prediction etc. Usually it's the
first thing you reach for, so of course somebody may already have tried it
here and found it didn't work.
Interesting thought. It might be worth trying. But my big question: is
all this testing and counting actually going to be faster than just
replanning? Postgresql's planner is not that slow.
The difference between a pre-planned query, and a plan each time
query, for me, seems to be a minimum of around 0.3 - 0.5 ms. This is
on a fairly modern AMD X2 3800+. If the tests and counting are kept
simple - I don't see why they would take anywhere near that long.

Cheers,
mark
--
***@mielke.cc / ***@ncf.ca / ***@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


Tom Lane
2006-09-01 14:30:49 UTC
Permalink
Post by Martijn van Oosterhout
Interesting thought. It might be worth trying. But my big question: is
all this testing and counting actually going to be faster than just
replanning? Postgresql's planner is not that slow.
...
The thing is that number of possible plans is going to be proportional
to factorial(number of tables).
Yeah. One of the reasons the planner is acceptably fast is that it is
aggressive about discarding candidate plans as soon as they are clearly
inferior to other plans. Tracking multiple plans that might be optimal
under varying assumptions about the query parameters would make things
exponentially slower.

regards, tom lane

Jeroen T. Vermeulen
2006-09-01 15:26:07 UTC
Permalink
Post by Tom Lane
Yeah. One of the reasons the planner is acceptably fast is that it is
aggressive about discarding candidate plans as soon as they are clearly
inferior to other plans. Tracking multiple plans that might be optimal
under varying assumptions about the query parameters would make things
exponentially slower.
AFAICS the planner shouldn't be affected at all--it'd just be invoked more
often as and when the need for new plans became apparent. Not
"exponentially" (that's an overused word anyway) but "proportionally" to
that.

I've been assuming that once you have a plan, storing it is not very
expensive. If, say, doubling the number of plans stored with a session's
prepared statements incurs some serious cost (apart from the planning
itself, of course) then that changes things.


Jeroen



Andreas Pflug
2006-08-31 13:18:01 UTC
Permalink
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
- Never use prepared statements unless the user has turned them on. (This is
the opposite of the current behavior.)
- Transparently invalidate and regenerate prepared plans more often. This
could be tied to the transaction count, update activity obtained from the
statistics collector, etc.
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
Each of these or similar changes would only solve a subset of the possible
problems. Possibly, we need more knobs to adjust these things. But
something needs to be done.
Not to mention problems with outdated plans after schema changes. Keeping
views unplanned (i.e. replanning them when they are used in joins) could
lead to improved resulting plans (e.g. if the view contains outer joins
itself).

Regards,
Andreas

Peter Eisentraut
2006-08-31 13:30:25 UTC
Permalink
Post by Andreas Pflug
Not to mention problems with outdated plans after schema changes. Keeping
views unplanned (i.e. replanning them when they are used in joins) could
lead to improved resulting plans (e.g. if the view contains outer joins
itself).
Views don't contain execution plans. I don't see how this is relevant.

Peter Eisentraut
2006-08-31 14:52:56 UTC
Permalink
Cached plans etc. might have an impact, but please do not overlook the
benefits of parameterized queries in avoiding SQL injection attacks, as
well as often being much cleaner to code.
That might be part of the confusion. Composing queries with the variable
parameters out of line is a very nice feature. But that concept is totally
independent of the question whether the execution plan should be cached. The
APIs (and their documentation) just don't convey that very well.

m***@mark.mielke.cc
2006-08-31 15:18:14 UTC
Permalink
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL
statements are just a really bad idea. On some days, it seems like
half the performance problems in PostgreSQL-using systems are
because a bad plan was cached somewhere. I'd say, in the majority
of cases the time you save parsing and planning is irrelevant
compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
...
Comments?
Hello.

I'm attempting to understand why prepared statements would be used for
long enough for tables to change to a point that a given plan will
change from 'optimal' to 'disastrous'.

Wouldn't this require that the tables are completely re-written, or
that their data is drastically updated? For my own tables, most of the
data remains static for months on end. Data is accumulated. Small
changes are made. I don't see why a prepared statement used over a
24 hour period would ever become disastrous.

This suggests to me that you are doing either:

1) Maintaining prepared statements for weeks or months at a time.

2) Churning your tables up into a froth.

I'm guessing, as you mentioned JDBC, that you might be hitting 1), in
the context of JDBC being used from a Web Application, where the
application server holds a connection open for weeks or months at a
time. If so, it does sound as if JDBC is doing wrong by keeping
prepared queries around for that long. A time limit of an hour, or
even a few minutes would make sense.

My experience does not match yours. Prepared queries have always
significantly improved my execution times. They do have a place.
Whatever the scenarios you are hitting should be dealt with, possibly
in JDBC.

Cheers,
mark


AgentM
2006-08-31 15:36:36 UTC
Permalink
Post by m***@mark.mielke.cc
I'm attempting to understand why prepared statements would be used for
long enough for tables to change to a point that a given plan will
change from 'optimal' to 'disastrous'.
Wouldn't this require that the tables are completely re-written, or
that their data is drastically updated? For my own tables, most of the
data remains static for months on end. Data is accumulated. Small
changes are made. I don't see why a prepared statement used over a
24 hour period would ever become disastrous.
Scenario: A web application maintains a pool of connections to the
database. If the connections have to be regularly restarted due to a
postgres implementation detail (stale plans), then that is a database
deficiency.

-M

m***@mark.mielke.cc
2006-08-31 15:53:24 UTC
Permalink
Post by AgentM
Post by m***@mark.mielke.cc
I'm attempting to understand why prepared statements would be used for
long enough for tables to change to a point that a given plan will
change from 'optimal' to 'disastrous'.
Wouldn't this require that the tables are completely re-written, or
that their data is drastically updated? For my own tables, most of the
data remains static for months on end. Data is accumulated. Small
changes are made. I don't see why a prepared statement used over a
24 hour period would ever become disastrous.
Scenario: A web application maintains a pool of connections to the
database. If the connections have to be regularly restarted due to a
postgres implementation detail (stale plans), then that is a database
deficiency.
Or a JDBC deficiency. Nobody is forcing JDBC to automatically reuse a
prepared plan indefinitely. If automatically prepared, it can
regenerate them whenever it wishes.

Does Oracle automatically regenerate prepared plans on occasion?

I don't consider it a deficiency. It is doing exactly what you are
asking it to do. That it isn't second guessing you isn't a deficiency.
For all PostgreSQL knows, your tables are not changing in a way that
makes a query suddenly disastrous a week later; what you prepared a week
ago, and chose to execute today, may still be the optimal plan.

Cheers,
mark


Gregory Stark
2006-08-31 21:48:29 UTC
Permalink
Post by m***@mark.mielke.cc
Does Oracle automatically regenerate prepared plans on occasion?
Not due to statistics changes, only if your schema changes.

(caveat: I last used Oracle back at 8i)
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Tom Lane
2006-08-31 16:04:51 UTC
Permalink
Post by AgentM
Post by m***@mark.mielke.cc
I'm attempting to understand why prepared statements would be used for
long enough for tables to change to a point that a given plan will
change from 'optimal' to 'disastrous'.
Scenario: A web application maintains a pool of connections to the
database. If the connections have to be regularly restarted due to a
postgres implementation detail (stale plans), then that is a database
deficiency.
The two major complaints that I've seen are

* plpgsql's prepared plans don't work at all for scenarios involving
temp tables that are created and dropped in each use of the function,
because the cached plan would need to be regenerated on every
successive call. Right now we tell people they have to use EXECUTE,
which is painful and gives up unnecessary amounts of performance
(because it might well be useful to cache a plan for the lifespan of
the table).

* for parameterized queries, a generic plan gives up too much
performance compared to one generated for specific constant parameter
values.
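To make the second complaint concrete, here is a sketch in plain SQL;
the table, the index, and the data skew are invented for illustration:

```sql
-- Hypothetical table with a heavily skewed column: suppose 99% of
-- rows have status = 'done' and only a handful are 'pending'.
CREATE TABLE jobs (id serial PRIMARY KEY, status text);
CREATE INDEX jobs_status_idx ON jobs (status);
ANALYZE jobs;

-- The prepared statement is planned without knowing the parameter
-- value, so the generic plan must assume an "average" selectivity
-- and may settle on a seqscan even for the rare value.
PREPARE get_jobs(text) AS SELECT * FROM jobs WHERE status = $1;
EXECUTE get_jobs('pending');

-- With the constant inlined, the planner can consult the statistics
-- for 'pending' specifically and pick the index scan instead.
SELECT * FROM jobs WHERE status = 'pending';
```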

Neither of these problems have anything to do with statistics getting
stale.

regards, tom lane

Csaba Nagy
2006-08-31 16:14:08 UTC
Permalink
Post by Tom Lane
Neither of these problems have anything to do with statistics getting
stale.
... and the second one would benefit from a "meta-plan" facility which
puts some "meta-plan" nodes on top of specific plans to dispatch based
on parameter values at runtime.

Incidentally, the dispatch could check the statistics assumptions too.
If you don't need to do the planning for each execution, you could
afford to check the assumptions for each execution instead...

Cheers,
Csaba.



AgentM
2006-08-31 16:17:47 UTC
Permalink
Post by Tom Lane
The two major complaints that I've seen are
<snip>
Post by Tom Lane
Neither of these problems have anything to do with statistics getting
stale.
Not stats-- plans. Plan invalidation has been discussed before, no?

-M

Peter Eisentraut
2006-08-31 16:39:17 UTC
Permalink
Post by AgentM
Post by Tom Lane
The two major complaints that I've seen are
<snip>
Post by Tom Lane
Neither of these problems have anything to do with statistics
getting stale.
Not stats-- plans. Plan invalidation has been discussed before, no?
Plan invalidation helps with schema changes and data changes but not
with parametrized queries.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Tom Lane
2006-08-31 15:27:18 UTC
Permalink
Post by Peter Eisentraut
With time, it becomes ever clearer to me that prepared SQL statements
are just a really bad idea.
That's an overstatement, but I'll agree that they have strong
limitations.
Post by Peter Eisentraut
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
AFAIK those are the only two places where preparation is the default
... what else were you thinking of?
Post by Peter Eisentraut
- Transparently invalidate and regenerate prepared plans more often. This
could be tied to the transaction count, update activity obtained from the
statistics collector, etc.
FWIW, I've assumed right along that once we have a plan-invalidation
mechanism, any ANALYZE stats update would invalidate affected plans.
Post by Peter Eisentraut
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
For plan-inval to work in all cases, we'll have to store either the source
query string or the raw grammar's output tree, before even parse analysis.
Is that what you are thinking of? It's hardly "prepared" at all if you
do that.

As noted downthread, we've confused out-of-line parameter value shipping
with prepared statements. It might be worth rejiggering the FE/BE
protocol to separate those things better.

regards, tom lane

Andrew Dunstan
2006-08-31 15:43:23 UTC
Permalink
Post by Tom Lane
As noted downthread, we've confused out-of-line parameter value shipping
with prepared statements. It might be worth rejiggering the FE/BE
protocol to separate those things better.
Well, that's surely not going to happen in a hurry, is it? Maybe a quick
fix would be a way to allow the user to turn plan caching on and off.

cheers

andrew

Tom Lane
2006-08-31 16:10:35 UTC
Permalink
Post by Andrew Dunstan
Post by Tom Lane
As noted downthread, we've confused out-of-line parameter value shipping
with prepared statements. It might be worth rejiggering the FE/BE
protocol to separate those things better.
Well, that's surely not going to happen in a hurry, is it? Maybe a quick
fix would be a way to allow the user to turn plan caching on and off.
There aren't any "quick fixes" here (at least nothing that's likely to
appear in 8.2). But I didn't mean the above suggestion as our only
response to Peter's criticism --- more that that is one of several areas
we ought to think about.

regards, tom lane

Andrew - Supernews
2006-08-31 15:47:20 UTC
Permalink
Post by Tom Lane
Post by Peter Eisentraut
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
AFAIK those are the only two places where preparation is the default
RI triggers.
--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Martijn van Oosterhout
2006-08-31 16:19:02 UTC
Permalink
Post by Tom Lane
Post by Peter Eisentraut
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
AFAIK those are the only two places where preparation is the default
... what else were you thinking of?
Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a
version 8.0 or higher server.

Or at least, that's the way I read the documentation.

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Lukas Kahwe Smith
2006-08-31 16:29:43 UTC
Permalink
Post by Martijn van Oosterhout
Post by Tom Lane
Post by Peter Eisentraut
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
AFAIK those are the only two places where preparation is the default
... what else were you thinking of?
Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a
version 8.0 or higher server.
Or at least, that's the way I read the documentation.
AFAIK this is also the case for PHP PDO extension, which is bundled
since PHP 5.1.

regards,
Lukas

Lukas Kahwe Smith
2006-08-31 16:34:45 UTC
Permalink
Post by Lukas Kahwe Smith
Post by Martijn van Oosterhout
Post by Tom Lane
Post by Peter Eisentraut
I'd wish that we reconsider when and how prepared statements are
used. The JDBC interface and PL/pgSQL are frequently noticed
perpetrators, but the problem is really all over the place.
AFAIK those are the only two places where preparation is the default
... what else were you thinking of?
Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a
version 8.0 or higher server.
Or at least, that's the way I read the documentation.
AFAIK this is also the case for PHP PDO extension, which is bundled
since PHP 5.1.
BTW: PDO has gotten a switch to force client side placeholder
replacement in favor of using server side prepared statements due to the
fact that prepared statements side-step the MySQL query cache.

http://netevil.org/node.php?uuid=444a6017-0548-2459-2943-44a601714d58
BTW: I am not posting this to solicit MySQL bashing.

The main reason why PDO pushes prepared statements is that they offer
good protection against SQL injection. In a shared-nothing architecture
like PHP's, which does not yet have any sort of
connection/statement-pooling solution, the danger of prepared
statements becoming stale over time is obviously small. However, the
problem of running the same statement with two different parameter
values that require different plans is still quite real.

regards,
Lukas

Martijn van Oosterhout
2006-08-31 17:29:45 UTC
Permalink
Post by Lukas Kahwe Smith
BTW: PDO has gotten a switch to force client side placeholder
replacement in favor of using server side prepared statements due to the
fact that prepared statements side-step the MySQL query cache.
Perl DBD::Pg also has a switch to force one way or the other.

However (as has been stated already) people are confusing prepared
statements with out-of-line parameters. Even DBI uses the phrase
"prepare" for setting up statements, whereas this doesn't actually
require server-side prepare, all it needs is out-of-line parameters.

I see from the source that DBD::Pg does use PQexecParams() sometimes so
maybe it does support out-of-line parameters...

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Zeugswetter Andreas DCP SD
2006-09-01 07:38:41 UTC
Permalink
Post by Tom Lane
Post by Martijn van Oosterhout
Post by Tom Lane
Post by Peter Eisentraut
I'd wish that we reconsider when and how prepared statements are
used. The JDBC interface and PL/pgSQL are frequently noticed
perpetrators, but the problem is really all over the place.
AFAIK those are the only two places where preparation is
the default
Post by Martijn van Oosterhout
Post by Tom Lane
... what else were you thinking of?
Perl DBI (DBD::Pg) defaults to prepared plans when connecting to a
version 8.0 or higher server.
Or at least, that's the way I read the documentation.
Yea, but if you close the statement or leave the scope of the statement
variable, the plan is gone. So it is doing exactly what I would expect.

It is written $db->prepare('select 1'); what else would you expect?
There are enough other functions to get a result without a plan sticking
around, like $db->selectrow_array ...

Andreas

Gregory Stark
2006-08-31 17:43:38 UTC
Permalink
Post by Peter Eisentraut
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
Then you would be going very much against the user's expectations.

Driver interfaces expose very clearly to the user an explicit interface to
prepare and execute a query separately. What you're proposing is to go behind
the user's back and do what he's gone out of his way to tell you not to do.
You can always choose to prepare your queries immediately before use. Most
drivers even supply an interface to do so in a single step for convenience.

If you've gone to the trouble of saving the prepared query handle you very
much do NOT want the database spontaneously deciding to change the behaviour
of that query (even just the performance behaviour) without warning.

In fact somewhere down the list of my personal wishlist for Postgres is plan
stability which would let the DBA control exactly when plans could change.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Peter Eisentraut
2006-08-31 18:01:09 UTC
Permalink
Post by Gregory Stark
Then you would be going very much against the user's expectations.
Driver interfaces expose very clearly to the user an explicit
interface to prepare and execute a query separately. What your
proposing is to go behind the user's back and do what he's gone out
of his way to tell you not to do. You can always choose to prepare
your queries immediately before use. Most drivers even supply an
interface to do so in a single step for convenience.
Let's verify that. JDBC and PL/pgSQL have been mentioned.

The JDBC documentation merely contains statements of the sort "A SQL
statement with or without IN parameters can be pre-compiled and stored
in a PreparedStatement object. This object can then be used to
efficiently execute this statement multiple times." There is
absolutely no indication that the execution plan of the statement is
computed at the time of preparation. In fact, it doesn't say
what "pre-compiled" means at all.

For PL/pgSQL, you simply write a query and all the preparing action
happens implicitly. There is nothing explicit about that interface.

So if users have certain expectations here, they're just making them up.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Tom Lane
2006-08-31 18:58:48 UTC
Permalink
Post by Peter Eisentraut
Post by Gregory Stark
Driver interfaces expose very clearly to the user an explicit
interface to prepare and execute a query separately.
The JDBC documentation merely contains statements of the sort "A SQL
statement with or without IN parameters can be pre-compiled and stored
in a PreparedStatement object. This object can then be used to
efficiently execute this statement multiple times." There is
absolutely no indication that the execution plan of the statement is
computed at the time of preparation.
The key word there is "efficiently". I think it is a reasonable
presumption on the user's part that a query done this way will have less
overhead than just resubmitting the raw query each time.

The important thing I see here is that JDBC allows use of IN parameters
with or without a PreparedStatement (no?). So they've separated the
concepts of out-of-line parameters and "preparing" a statement. That's
the distinction we have unfortunately fudged in the V3 protocol.

The protocol does let you use OOL parameters without retaining a
prepared plan, thanks to the hack introduced later to not plan the
unnamed statement at Parse time, but that's definitely a bit of a wart
on the original protocol design. Maybe it's good enough, or maybe not.

regards, tom lane

Martijn van Oosterhout
2006-08-31 19:14:32 UTC
Permalink
Post by Tom Lane
The protocol does let you use OOL parameters without retaining a
prepared plan, thanks to the hack introduced later to not plan the
unnamed statement at Parse time, but that's definitely a bit of a wart
on the original protocol design. Maybe it's good enough, or maybe not.
Urk, so it was a hack. Unfortunately it seems to be something you can't
really change without changing the protocol.

So what are the options now? A GUC like so:

prepare_means_plan = [true|false]

So then a prepare will always parse straightaway, but you can choose
whether or not you want to plan straightaway or at bind time.

Would this be acceptable?

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Tom Lane
2006-08-31 20:29:44 UTC
Permalink
Post by Martijn van Oosterhout
prepare_means_plan = [true|false]
So then a prepare will always parse straightaway, but you can choose
whether or not you want to plan straightaway or at bind time.
That seems like just a kluge, as you'd typically want query-by-query
control, and a GUC setting isn't convenient for that.

It's entirely possible that the current protocol definition is Good
Enough, assuming that client-library designers are aware of the
implications of using named vs unnamed statements (which I bet not
all of 'em are). You *can* have either behavior today, so far as
client-issued queries go. The area that seems to need work more
drastically is controlling what happens with queries inside plpgsql.
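For reference, the plpgsql workaround in question looks roughly like
this; the function and table names are made up:

```sql
CREATE FUNCTION count_scratch() RETURNS bigint AS $$
DECLARE
    n bigint;
BEGIN
    CREATE TEMP TABLE scratch AS SELECT 1 AS x;
    -- A plain "SELECT count(*) FROM scratch" here would cache a plan
    -- tied to this incarnation of the temp table and fail on the next
    -- call; EXECUTE re-parses and re-plans the string every time.
    EXECUTE 'SELECT count(*) FROM scratch' INTO n;
    DROP TABLE scratch;
    RETURN n;
END;
$$ LANGUAGE plpgsql;
```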

regards, tom lane

Dave Cramer
2006-08-31 19:17:38 UTC
Permalink
Post by Tom Lane
Post by Peter Eisentraut
Post by Gregory Stark
Driver interfaces expose very clearly to the user an explicit
interface to prepare and execute a query separately.
The JDBC documentation merely contains statements of the sort "A SQL
statement with or without IN parameters can be pre-compiled and stored
in a PreparedStatement object. This object can then be used to
efficiently execute this statement multiple times." There is
absolutely no indication that the execution plan of the statement is
computed at the time of preparation.
The key word there is "efficiently". I think it is a reasonable
presumption on the user's part that a query done this way will have less
overhead than just resubmitting the raw query each time.
The important thing I see here is that JDBC allows use of IN
parameters
with or without a PreparedStatement (no?).
No, not that I am aware of. You can create a statement and execute
it, but you need a PreparedStatement to set IN parameters.
Post by Tom Lane
So they've separated the
concepts of out-of-line parameters and "preparing" a statement.
That's
the distinction we have unfortunately fudged in the V3 protocol.
The protocol does let you use OOL parameters without retaining a
prepared plan, thanks to the hack introduced later to not plan the
unnamed statement at Parse time, but that's definitely a bit of a wart
on the original protocol design. Maybe it's good enough, or maybe not.
regards, tom lane
Dave

Hannu Krosing
2006-08-31 19:11:13 UTC
Permalink
Post by Peter Eisentraut
For PL/pgSQL, you simply write a query and all the preparing action
happens implicitly. There is nothing explicit about that interface.
So if users have certain expectations here, they're just making them up.
Or basing them on experience.

I for one would not like it at all if all my queries (select * from
plpgsqlfunc()) just magically became slower by 10-50%.

If there is to be an option not to plan/optimise prepared statements, I
would certainly expect it to be off by default.
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com



Gregory Stark
2006-08-31 22:02:35 UTC
Permalink
Post by Peter Eisentraut
Let's verify that. JDBC and PL/pgSQL have been mentioned.
The JDBC documentation merely contains statements of the sort "A SQL
statement with or without IN parameters can be pre-compiled and stored
in a PreparedStatement object. This object can then be used to
efficiently execute this statement multiple times." There is
absolutely no indication that the execution plan of the statement is
computed at the time of preparation. In fact, it doesn't say
what "pre-compiled" means at all.
I didn't say you were violating the technical definition in the specification.
I said you're going against expectations. This is the problem with being
dogmatic about abstraction boundaries. Obviously someone who doesn't know
what's going on under the hood has no specific expectations about what
"pre-compiling" might mean.

But the reality is that you can't effectively use a database without
understanding what query plans are, and users do have expectations
about behaviour below the abstraction barrier.

If you don't think "pre-compiled" and "efficiently execute multiple
times" translates into "generates a query plan so it doesn't have to go
through that process to execute the query", I think you're in a very
small minority.
Post by Peter Eisentraut
For PL/pgSQL, you simply write a query and all the preparing action
happens implicitly. There is nothing explicit about that interface.
Well that's sort of the inherent problem with PLpgSQL and the way it mixes up
the procedural language with SQL.

I guess the natural extension of questioning PL/pgSQL would be to wonder why
subqueries in SQL queries don't get replanned every time they're executed. The
data distribution could certainly change partway though.
Post by Peter Eisentraut
So if users have certain expectations here, they're just making them up.
Well, that's what makes them "expectations" rather than promises.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

Martijn van Oosterhout
2006-08-31 18:08:40 UTC
Permalink
Post by Gregory Stark
Post by Peter Eisentraut
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
Then you would be going very much against the user's expectations.
Driver interfaces expose very clearly to the user an explicit interface to
prepare and execute a query separately. What you're proposing is to go behind
the user's back and do what he's gone out of his way to tell you not to do.
You can always choose to prepare your queries immediately before use. Most
drivers even supply an interface to do so in a single step for convenience.
Is that really so? Under Perl DBI, the only way to get a statement
handle is to "prepare" it. Yet I don't want to use server-side prepares
because I know of the problems it causes. The "single-step" approach
provides no statement handle at all, which has several drawbacks.

People are encouraged to use prepared statements for clarity and
security reasons, not speed. I would really like an option to choose
between:

- slightly more planning time but always good plans
- plan once and be unforgiving if the plan doesn't work with the
parameters

I'd take the first option anyday, but that's just the types of queries
I'm doing.

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Andrew Dunstan
2006-08-31 18:16:32 UTC
Permalink
Post by Martijn van Oosterhout
Post by Gregory Stark
Post by Peter Eisentraut
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
Then you would be going very much against the user's expectations.
Driver interfaces expose very clearly to the user an explicit interface to
prepare and execute a query separately. What you're proposing is to go behind
the user's back and do what he's gone out of his way to tell you not to do.
You can always choose to prepare your queries immediately before use. Most
drivers even supply an interface to do so in a single step for convenience.
Is that really so? Under Perl DBI, the only way to get a statement
handle is to "prepare" it. Yet I don't want to use server-side prepares
because I know of the problems it causes. The "single-step" approach
provides no statement handle at all, which has several drawbacks.
People are encouraged to use prepared statements for clarity and
security reasons, not speed. I would really like an option to choose
- slightly more planning time but always good plans
- plan once and be unforgiving if the plan doesn't work with the
parameters
I'd take the first option anyday, but that's just the types of queries
I'm doing.
Have a nice day,
According to the docs you can actually choose between server-side
prepare or not on a per-call basis. They contain this example:

$sth->{pg_server_prepare} = 1;
$sth->execute(22);
$sth->{pg_server_prepare} = 0;
$sth->execute(44);
$sth->{pg_server_prepare} = 1;
$sth->execute(66);

cheers

andrew

Martijn van Oosterhout
2006-08-31 18:53:25 UTC
Permalink
Post by Andrew Dunstan
According to the docs you can actually choose between server side
Yeah, but it also contains this:

Using prepared statements is in theory quite a bit faster: not only
does the PostgreSQL backend only have to prepare the query only
once, but DBD::Pg no longer has to worry about quoting each value
before sending it to the server.

Which just continues the misconception: you can not worry about quoting
each value and still not use server-side prepares. There's a third
option which is not made clear (and it's not clear if it's available
via DBI).

Basically, unnamed "prepares" are not planned until bind time, named
statements are planned at prepare time. The question is, do you want to
be able to defer planning for named statements also?

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Gregory Stark
2006-08-31 23:04:07 UTC
Permalink
Post by Martijn van Oosterhout
Post by Gregory Stark
Then you would be going very much against the user's expectations.
Driver interfaces expose very clearly to the user an explicit interface to
prepare and execute a query separately. What you're proposing is to go behind
the user's back and do what he's gone out of his way to tell you not to do.
You can always choose to prepare your queries immediately before use. Most
drivers even supply an interface to do so in a single step for convenience.
Is that really so? Under Perl DBI, the only way to get a statement
handle is to "prepare" it.
Sure, but you can prepare it right before you use it and throw it away instead
of keeping it around.

The server has to prepare the query sometime. The v3 protocol just gives you
control over when that happens, but it doesn't force you to do it at any
particular time.
--
greg


Martijn van Oosterhout
2006-09-01 06:42:42 UTC
Permalink
Post by Gregory Stark
The server has to prepare the query sometime. The v3 protocol just gives you
control over when that happens, but it doesn't force you to do it at any
particular time.
Not really. All named prepares are planned straight away, all unnamed
ones are planned at bind time. Therefore you cannot have more than one
parsed-but-not-planned prepared query at a time. In a connection pool
scenario there's no way to share such plans since you can't tell which
query has been prepared. That's not forcing, but it's an asymmetry we
could do without.

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Lukas Kahwe Smith
2006-09-01 07:26:24 UTC
Permalink
Post by Martijn van Oosterhout
Post by Gregory Stark
The server has to prepare the query sometime. The v3 protocol just gives you
control over when that happens, but it doesn't force you to do it at any
particular time.
Not really. All named prepares are planned straight away, all unnamed
ones are planned at bind time. Therefore you cannot have more than one
parsed-but-not-planned prepared query at a time. In a connection pool
scenario there's no way to share such plans since you can't tell which
query has been prepared. That's not forcing, but it's an asymmetry we
could do without.
AFAIK since Oracle 9i planning is always deferred until the first
execution. This way they hope to get a better plan, which would
obviously not be possible if the selectivity varies greatly.

So are the plans generated without taking any bound values into account
more stable in performance (albeit at times slower than what would have
been produced if the value would have been known)?

Either way, tying the question of when to plan to whether the prepared
statement is named or unnamed seems unexpected.

regards,
Lukas

Martijn van Oosterhout
2006-09-01 07:48:29 UTC
Permalink
Post by Lukas Kahwe Smith
AFAIK since Oracle 9i planning is always deferred until the first
execution. This way they hope to get a better plan, which would
obviously not be possible if the selectivity varies greatly.
Interesting.
Post by Lukas Kahwe Smith
So are the plans generated without taking any bound values into account
more stable in performance (albeit at times slower than what would have
been produced if the value would have been known)?
Possibly, though they might uniformly suck or be uniformly good...
Post by Lukas Kahwe Smith
Either way mixing the question of when to prepare the plan with the
prepared statement being named or unnamed seems unexpected.
Yes, well, when the V3 protocol was designed, separating the parse and
plan phases wasn't considered. The exact commit that got us to where we
are now is here:

http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/tcop/postgres.c#rev1.420

That was over two years ago; I'm not sure whether any of the ideas
mentioned in there have been implemented since.

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Tom Lane
2006-09-01 14:18:37 UTC
Permalink
Post by Martijn van Oosterhout
Post by Gregory Stark
The server has to prepare the query sometime. The v3 protocol just gives you
control over when that happens, but it doesn't force you to do it at any
particular time.
Not really. All named prepares are planned straight away, all unnamed
ones are planned at bind time. Therefore you cannot have more than one
parsed-but-not-planned prepared query at a time. In a connection pool
scenario there's no way to share such plans since you can't tell which
query has been prepared. That's not forcing, but it's an asymmetry we
could do without.
Sure, but how much does it really matter? If you don't want the plan
saved for reuse, merely avoiding retransmission of the query text does
not seem like a major win. Having had time to think about it, I no
longer think the protocol design is a blocking bug for this problem
area. It's something we could improve when we are ready to design
protocol V4, but it does not seem in itself enough reason to make a
new protocol (with all the pain that entails).

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Martijn van Oosterhout
2006-09-01 15:26:44 UTC
Permalink
Post by Tom Lane
Sure, but how much does it really matter? If you don't want the plan
saved for reuse, merely avoiding retransmission of the query text does
not seem like a major win. Having had time to think about it, I no
longer think the protocol design is a blocking bug for this problem
area. It's something we could improve when we are ready to design
protocol V4, but it does not seem in itself enough reason to make a
new protocol (with all the pain that entails).
Well, I see the documentation does actually describe the situation
fairly well, so perhaps the right approach is to get interface writers
to appreciate the difference and indicate whether named or unnamed
prepares are used.
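The asymmetry under discussion (named statements planned at Parse time, a single unnamed statement planned at Bind time) can be sketched as a toy model of the backend's statement slots. The class and method names below are purely illustrative, not the actual server code:

```python
# Toy model of V3-protocol statement slots (illustrative; not the real
# backend). Named statements are planned as soon as they are parsed;
# the single unnamed slot defers planning until Bind, and each new
# unnamed Parse overwrites the previous one.

class Backend:
    def __init__(self):
        self.named = {}      # name -> (query, plan): planned eagerly
        self.unnamed = None  # (query, None): one slot, plan deferred

    def parse(self, query, name=""):
        if name:
            # Named prepare: parsed AND planned immediately; no
            # parameter values are available, so the plan is generic.
            self.named[name] = (query, f"generic-plan({query})")
        else:
            # Unnamed: parsed only; planning waits for bind().
            self.unnamed = (query, None)

    def bind(self, params, name=""):
        if name:
            query, plan = self.named[name]  # reuse cached generic plan
        else:
            query, _ = self.unnamed
            plan = f"custom-plan({query}, {params})"  # planned with values
        return plan

backend = Backend()
backend.parse("SELECT * FROM t WHERE col = $1", name="s1")
backend.parse("SELECT * FROM t WHERE col = $1")   # unnamed
print(backend.bind(["rare"], name="s1"))  # generic plan, value ignored
print(backend.bind(["rare"]))             # custom plan, value used
backend.parse("SELECT 1")                 # a second unnamed Parse...
# ...overwrites the first: at most one parsed-but-not-planned query.
```

The single unnamed slot is why, as noted upthread, a connection pool cannot keep several parsed-but-not-planned queries around and share them.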

Have a nice day,
--
From each according to his ability. To each according to his ability to litigate.
Jim C. Nasby
2006-09-02 22:40:38 UTC
Permalink
Post by Tom Lane
Post by Martijn van Oosterhout
Post by Gregory Stark
The server has to prepare the query sometime. The v3 protocol just gives you
control over when that happens, but it doesn't force you to do it at any
particular time.
Not really. All named prepares are planned straight away, all unnamed
ones are planned at bind time. Therefore you cannot have more than one
parsed-but-not-planned prepared query at a time. In a connection pool
scenario there's no way to share such plans since you can't tell which
query has been prepared. That's not forcing, but it's an asymmetry we
could do without.
Sure, but how much does it really matter? If you don't want the plan
saved for reuse, merely avoiding retransmission of the query text does
not seem like a major win. Having had time to think about it, I no
longer think the protocol design is a blocking bug for this problem
area. It's something we could improve when we are ready to design
protocol V4, but it does not seem in itself enough reason to make a
new protocol (with all the pain that entails).
That should either go into the TODO, or a "V4 wishlist"...
--
Jim C. Nasby, Database Architect ***@nasby.net
512.569.9461 (cell) http://jim.nasby.net

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend