Discussion:
BUG #10728: json_to_recordset with nested json objects NULLs columns
Tom Lane
2014-06-23 15:43:00 UTC
Digging more into that, I have found the issue and a fix for it. It happens
that populate_recordset_object_start, which is used to initialize the
process for the population of the record, is taken *each* time a json
object is found, re-creating every time the hash table for the parsing
process, hence removing from PopulateRecordsetState all the entries already
parsed and creating the problem reported by Matti. The fix I am proposing
to fix this issue is rather simple: simply bypass the creation of the hash
table if lex_level > 1 as we are in presence of a nested object and rely on
the existing hash table.
Yes, this code is clearly not handling the nested-objects case correctly.
I had written a fix more or less equivalent to yours last night.
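The broken control flow is easy to model outside the backend. Here is a toy Python sketch (an illustration only, not the real C code; names and structure are simplified) of the semantic actions: object_start fires for every `{`, nested ones included, and the buggy version rebuilds the hash table each time, discarding fields already parsed.

```python
import json

def populate_record(js, fixed=True):
    """Toy model of populate_recordset's semantic actions
    (hypothetical Python rendering, not the actual C code)."""
    lex_level = 0
    hash_table = {}

    def object_start():
        nonlocal lex_level, hash_table
        lex_level += 1
        # The bug: a fresh hash table on *every* object_start, wiping
        # already-parsed entries.  The fix builds it only at the top level.
        if not fixed or lex_level == 1:
            hash_table = {}

    def object_end():
        nonlocal lex_level
        lex_level -= 1

    def walk(obj):
        object_start()
        for key, value in obj.items():
            if isinstance(value, dict):
                as_text = json.dumps(value)
                walk(value)              # recursing fires object_start again
                if lex_level == 1:
                    hash_table[key] = as_text  # nested object kept as JSON text
            elif lex_level == 1:
                hash_table[key] = value
        object_end()

    walk(json.loads(js))
    return hash_table

row = '{"a":2,"c":3,"b":{"z":4},"d":6}'
print(sorted(populate_record(row, fixed=False)))  # ['b', 'd']  -- "a", "c" lost
print(sorted(populate_record(row, fixed=True)))   # ['a', 'b', 'c', 'd']
```

The buggy run reproduces the reported symptom: every field parsed before the nested object is wiped, so only the nested field and whatever follows it survive.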

However, it seems to me that these functions (json[b]_to_record[set]) are
handling the nested-json-objects case in a fairly brain-dead fashion to
start with. I would like to propose that we should think about getting
rid of the use_json_as_text flag arguments altogether. What purpose do
they serve? If we're going to the trouble of parsing the nested JSON
objects anyway, why don't we just reconstruct from that data?

(IOW, we probably actually should have nested hash tables in this case.
I suspect that the current bug arose from incompletely-written logic
to do it like that.)

Since we've already decided to force an initdb for 9.4beta2, it's not
quite too late to revisit this API, and I think it needs revisiting.

regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Merlin Moncure
2014-06-23 16:19:05 UTC
Post by Tom Lane
Digging more into that, I have found the issue and a fix for it. It happens
that populate_recordset_object_start, which is used to initialize the
process for the population of the record, is taken *each* time a json
object is found, re-creating every time the hash table for the parsing
process, hence removing from PopulateRecordsetState all the entries already
parsed and creating the problem reported by Matti. The fix I am proposing
to fix this issue is rather simple: simply bypass the creation of the hash
table if lex_level > 1 as we are in presence of a nested object and rely on
the existing hash table.
Yes, this code is clearly not handling the nested-objects case correctly.
I had written a fix more or less equivalent to yours last night.
However, it seems to me that these functions (json[b]_to_record[set]) are
handling the nested-json-objects case in a fairly brain-dead fashion to
start with. I would like to propose that we should think about getting
rid of the use_json_as_text flag arguments altogether. What purpose do
they serve? If we're going to the trouble of parsing the nested JSON
objects anyway, why don't we just reconstruct from that data?
I think they should be removed. (I called this out in the feature
level review: http://www.postgresql.org/message-id/CAHyXU0wqadCJk7MMkeARuuY05VrD=***@mail.gmail.com).
AIUI, the flag was introduced as a workaround to try and deal with
mapping nested structures. Text variant 'json' flags have had them.

merlin
Andrew Dunstan
2014-06-23 20:01:03 UTC
Post by Tom Lane
Digging more into that, I have found the issue and a fix for it. It happens
that populate_recordset_object_start, which is used to initialize the
process for the population of the record, is taken *each* time a json
object is found, re-creating every time the hash table for the parsing
process, hence removing from PopulateRecordsetState all the entries already
parsed and creating the problem reported by Matti. The fix I am proposing
to fix this issue is rather simple: simply bypass the creation of the hash
table if lex_level > 1 as we are in presence of a nested object and rely on
the existing hash table.
Yes, this code is clearly not handling the nested-objects case correctly.
I had written a fix more or less equivalent to yours last night.
However, it seems to me that these functions (json[b]_to_record[set]) are
handling the nested-json-objects case in a fairly brain-dead fashion to
start with. I would like to propose that we should think about getting
rid of the use_json_as_text flag arguments altogether. What purpose do
they serve? If we're going to the trouble of parsing the nested JSON
objects anyway, why don't we just reconstruct from that data?
(IOW, we probably actually should have nested hash tables in this case.
I suspect that the current bug arose from incompletely-written logic
to do it like that.)
Since we've already decided to force an initdb for 9.4beta2, it's not
quite too late to revisit this API, and I think it needs revisiting.
Looks like we have some problems in this whole area, not just the new
function, so we need to fix 9.3 also :-(

IIRC, originally, the intention was to disallow nested json objects, but
the use_json_as_text flag was put in as a less drastic alternative.
If we get rid of it our only recourse is to error out if we encounter
nested json. I was probably remiss in not considering the likelihood of
a json target field.

I currently don't have lots of time to devote to this, sadly, but
Michael's patch looks like a good minimal fix.

cheers

andrew
Tom Lane
2014-06-23 23:34:08 UTC
Post by Andrew Dunstan
Post by Tom Lane
However, it seems to me that these functions (json[b]_to_record[set]) are
handling the nested-json-objects case in a fairly brain-dead fashion to
start with. I would like to propose that we should think about getting
rid of the use_json_as_text flag arguments altogether. What purpose do
they serve? If we're going to the trouble of parsing the nested JSON
objects anyway, why don't we just reconstruct from that data?
Looks like we have some problems in this whole area, not just the new
function, so we need to fix 9.3 also :-(
IIRC, originally, the intention was to disallow nested json objects, but
the use_json_as_text was put in as a possibly less drastic possibility.
If we get rid of it our only recourse is to error out if we encounter
nested json. I was probably remiss in not considering the likelihood of
a json target field.
I currently don't have lots of time to devote to this, sadly, but
Michael's patch looks like a good minimal fix.
I can spend some time on it over the next couple of days. I take it you
don't have a problem with the concept of doing recursive processing,
as long as it doesn't add much complication?

I'm not following your comment about 9.3. The json[b]_to_record[set]
functions are new in 9.4, which is what makes me feel it's not too
late to redefine their behavior. But changing behavior of stuff that
was in 9.3 seems a lot more debatable.

regards, tom lane
Andrew Dunstan
2014-06-24 00:34:08 UTC
Post by Tom Lane
Post by Andrew Dunstan
Post by Tom Lane
However, it seems to me that these functions (json[b]_to_record[set]) are
handling the nested-json-objects case in a fairly brain-dead fashion to
start with. I would like to propose that we should think about getting
rid of the use_json_as_text flag arguments altogether. What purpose do
they serve? If we're going to the trouble of parsing the nested JSON
objects anyway, why don't we just reconstruct from that data?
Looks like we have some problems in this whole area, not just the new
function, so we need to fix 9.3 also :-(
IIRC, originally, the intention was to disallow nested json objects, but
the use_json_as_text was put in as a possibly less drastic possibility.
If we get rid of it our only recourse is to error out if we encounter
nested json. I was probably remiss in not considering the likelihood of
a json target field.
I currently don't have lots of time to devote to this, sadly, but
Michael's patch looks like a good minimal fix.
I can spend some time on it over the next couple of days. I take it you
don't have a problem with the concept of doing recursive processing,
as long as it doesn't add much complication?
I'm not following your comment about 9.3. The json[b]_to_record[set]
functions are new in 9.4, which is what makes me feel it's not too
late to redefine their behavior. But changing behavior of stuff that
was in 9.3 seems a lot more debatable.
This problem is also manifest in json_populate_recordset, which also
uses the function in question, and is in 9.3:

andrew=# create type yyy as (a int, b json, c int, d int);
CREATE TYPE
andrew=# select * from json_populate_recordset(null::yyy, '[
{"a":2,"c":3,"b":{"z":4}, "d":6}
]
',true) x;
a | b | c | d
---+---------+---+---
| {"z":4} | | 6
(1 row)

I don't have any problem with recursive processing, but I'm not sure I
understand how it will work. If you post a patch I will be able to look
it over, though.

cheers

andrew
Tom Lane
2014-06-24 01:43:31 UTC
Post by Andrew Dunstan
Post by Tom Lane
I'm not following your comment about 9.3. The json[b]_to_record[set]
functions are new in 9.4, which is what makes me feel it's not too
late to redefine their behavior. But changing behavior of stuff that
was in 9.3 seems a lot more debatable.
This problem is also manifest in json_populate_recordset, which also
Ah, I see the problem.

Here is a first cut suggestion:

* Get rid of the use_json_as_text flag argument for the new functions.
In json_populate_record(set), ignore its value and deprecate using it.
(The fact that it already had a default makes that easier.) The
behavior should always be as below.

* For nested json objects, we'll spit those out in json textual format,
which means they'll successfully convert to either text or json/jsonb.
Compared to the old behavior of json_populate_recordset, this just means
that we don't throw an error anymore regardless of the flag value,
which seems ok (though maybe not something to backpatch into 9.3).

* Nested json arrays are a bit more problematic. What I'd ideally like
is to spit them out in a form that would be successfully parsable as a SQL
array of the appropriate element type. Unfortunately, I think that that
ship has sailed because json_populate_recordset failed to do that in 9.3.
What we should probably do is define this the same as the nested object
case, ie, we spit it out in *json* array format, meaning you can insert it
into a text or json/jsonb field of the result record. Maybe sometime in
the future we can add a json-array-to-SQL-array converter function, but
these functions won't do that.
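The rules above can be sketched in a few lines of Python (an illustration of the proposed behaviour, not PostgreSQL code): any nested object or array is simply re-serialized to JSON text before being handed to the target column's input routine.

```python
import json

def field_as_text(value):
    """Proposed rule: nested JSON containers are emitted as JSON text;
    scalars pass through unchanged.  (Illustrative sketch only.)"""
    if isinstance(value, (dict, list)):
        return json.dumps(value, separators=(",", ":"))
    return value

record = json.loads('{"a": 1, "b": [1,2,3], "c": {"z": 4}}')
out = {k: field_as_text(v) for k, v in record.items()}
print(out)  # {'a': 1, 'b': '[1,2,3]', 'c': '{"z":4}'}
# "b" and "c" now load fine into text or json/jsonb columns; an int[]
# column would still reject '[1,2,3]', since it expects '{1,2,3}'.
```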
From a user's standpoint this just boils down to (a) fix the bug with
mishandling of the hash tables, and (b) get rid of the gratuitous
error report.

regards, tom lane
Michael Paquier
2014-06-24 02:08:18 UTC
Post by Tom Lane
Post by Andrew Dunstan
Post by Tom Lane
I'm not following your comment about 9.3. The json[b]_to_record[set]
functions are new in 9.4, which is what makes me feel it's not too
late to redefine their behavior. But changing behavior of stuff that
was in 9.3 seems a lot more debatable.
This problem is also manifest in json_populate_recordset, which also
Ah, I see the problem.
* Get rid of the use_json_as_text flag argument for the new functions.
In json_populate_record(set), ignore its value and deprecate using it.
(The fact that it already had a default makes that easier.) The
behavior should always be as below.
Agreed. This simplifies the interface of the existing functions and will
need a mention in the release notes of 9.3.
Post by Tom Lane
* For nested json objects, we'll spit those out in json textual format,
which means they'll successfully convert to either text or json/jsonb.
Compared to the old behavior of json_populate_recordset, this just means
that we don't throw an error anymore regardless of the flag value,
which seems ok (though maybe not something to backpatch into 9.3).
Post by Tom Lane
* Nested json arrays are a bit more problematic. What I'd ideally like
is to spit them out in a form that would be successfully parsable as a SQL
array of the appropriate element type. Unfortunately, I think that that
ship has sailed because json_populate_recordset failed to do that in 9.3.
What we should probably do is define this the same as the nested object
case, ie, we spit it out in *json* array format, meaning you can insert it
into a text or json/jsonb field of the result record. Maybe sometime in
the future we can add a json-array-to-SQL-array converter function, but
these functions won't do that.
Just a question (lack of coffee): do those two points implicitly mean that
we do not parse the nested json objects, but instead pass them as simple
text to the hash table, and bypass the creation of fresh hash tables with
lex_level > 1 in populate_recordset_object_start?
--
Michael
Merlin Moncure
2014-06-24 12:31:45 UTC
Post by Tom Lane
* Nested json arrays are a bit more problematic. What I'd ideally like
is to spit them out in a form that would be successfully parsable as a SQL
array of the appropriate element type. Unfortunately, I think that that
ship has sailed because json_populate_recordset failed to do that in 9.3.
What we should probably do is define this the same as the nested object
case, ie, we spit it out in *json* array format, meaning you can insert it
into a text or json/jsonb field of the result record. Maybe sometime in
the future we can add a json-array-to-SQL-array converter function, but
these functions won't do that.
Not quite following your logic here. 9.3 gave an error for an
internally nested array:

postgres=# create type foo as(a int, b int[]);
postgres=# select * from json_populate_recordset(null::foo, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]');
ERROR: cannot call json_populate_recordset on a nested object

With your proposal this would still fail? TBH, I'd rather this
function fail as above than implement a behavior we couldn't take back
later.

merlin
Tom Lane
2014-06-25 00:01:00 UTC
Post by Merlin Moncure
Post by Tom Lane
* Nested json arrays are a bit more problematic. What I'd ideally like
is to spit them out in a form that would be successfully parsable as a SQL
array of the appropriate element type. Unfortunately, I think that that
ship has sailed because json_populate_recordset failed to do that in 9.3.
What we should probably do is define this the same as the nested object
case, ie, we spit it out in *json* array format, meaning you can insert it
into a text or json/jsonb field of the result record. Maybe sometime in
the future we can add a json-array-to-SQL-array converter function, but
these functions won't do that.
Not quite following your logic here. 9.3 gave an error for an
internally nested array:
postgres=# create type foo as(a int, b int[]);
postgres=# select * from json_populate_recordset(null::foo, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]');
ERROR: cannot call json_populate_recordset on a nested object
Yeah, that's the default behavior, with use_json_as_text false.
However, consider what happens with use_json_as_text true:

regression=# select * from json_populate_recordset(null::foo, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]', true);
ERROR: missing "]" in array dimensions

That case is certainly useless, but suppose somebody had done

regression=# create type foo2 as(a int, b json);
CREATE TYPE
regression=# select * from json_populate_recordset(null::foo2, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]', true);
a | b
---+---------
1 | [1,2,3]
1 | [1,2,3]
(2 rows)

or even just

regression=# create type foo3 as(a int, b text);
CREATE TYPE
regression=# select * from json_populate_recordset(null::foo3, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]', true);
a | b
---+---------
1 | [1,2,3]
1 | [1,2,3]
(2 rows)

Since these cases work and do something arguably useful, I doubt we
can break them.

However, I don't see anything wrong with changing the behavior in
cases that currently throw an error, since presumably no application
is depending on them. Perhaps Andrew's comment about looking at the
target type info yields a way forward, ie, we could output in SQL-array
format if the target is an array, or in JSON-array format if the target
is json. Multiply-nested cases might be a pain to get right though.
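The array direction sketched here could look roughly like the following hypothetical converter (illustrative only; it ignores element quoting and NULLs for brevity): the JSON array is rendered in PostgreSQL's array-literal syntax instead, recursing for the multiply-nested case.

```python
import json

def json_array_to_pg_literal(value):
    """Hypothetical json-array-to-SQL-array rendering; real code would
    also quote string elements and handle NULLs."""
    if isinstance(value, list):
        return "{" + ",".join(json_array_to_pg_literal(v) for v in value) + "}"
    return str(value)

print(json_array_to_pg_literal(json.loads("[1,2,3]")))        # {1,2,3}
print(json_array_to_pg_literal(json.loads("[[1,2],[3,4]]")))  # {{1,2},{3,4}}
```

The second call shows why the multiply-nested case is where the pain lives: the recursion has to know, at every depth, whether the target element type is itself an array.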

regards, tom lane
Andrew Dunstan
2014-06-24 14:08:15 UTC
Post by Tom Lane
Post by Andrew Dunstan
Post by Tom Lane
I'm not following your comment about 9.3. The json[b]_to_record[set]
functions are new in 9.4, which is what makes me feel it's not too
late to redefine their behavior. But changing behavior of stuff that
was in 9.3 seems a lot more debatable.
This problem is also manifest in json_populate_recordset, which also
Ah, I see the problem.
* Get rid of the use_json_as_text flag argument for the new functions.
In json_populate_record(set), ignore its value and deprecate using it.
(The fact that it already had a default makes that easier.) The
behavior should always be as below.
* For nested json objects, we'll spit those out in json textual format,
which means they'll successfully convert to either text or json/jsonb.
Compared to the old behavior of json_populate_recordset, this just means
that we don't throw an error anymore regardless of the flag value,
which seems ok (though maybe not something to backpatch into 9.3).
* Nested json arrays are a bit more problematic. What I'd ideally like
is to spit them out in a form that would be successfully parsable as a SQL
array of the appropriate element type. Unfortunately, I think that that
ship has sailed because json_populate_recordset failed to do that in 9.3.
What we should probably do is define this the same as the nested object
case, ie, we spit it out in *json* array format, meaning you can insert it
into a text or json/jsonb field of the result record. Maybe sometime in
the future we can add a json-array-to-SQL-array converter function, but
these functions won't do that.
From a user's standpoint this just boils down to (a) fix the bug with
mishandling of the hash tables, and (b) get rid of the gratuitous
error report.
The big problem is that we have been ignoring the result type when
constructing the hash, even though the info is available. There is some
sense in this in that the field might not even be present in the result
type. And it works except for structured types like records, arrays and
json. Even if we don't have a nested value, the functions will do the
wrong thing for a scalar string destined for a json field (it will be
de-escaped, when it should not be).
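The scalar-string point is easy to demonstrate with plain Python (illustrative only): de-escaping is correct when the target column is text, but for a json target the source text must be kept verbatim.

```python
import json

raw = '"a \\"quoted\\" string"'  # the scalar as it appears in the JSON source

# Right for a text column: hand over the de-escaped value.
assert json.loads(raw) == 'a "quoted" string'

# Wrong for a json column: the de-escaped form is no longer valid JSON.
try:
    json.loads(json.loads(raw))
    ok_as_json = True
except json.JSONDecodeError:
    ok_as_json = False
print(ok_as_json)  # False -- a json field needs the original text, `raw`
```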

w.r.t. json arrays, I think you're chasing a chimera, since they are
heterogeneous, unlike SQL arrays.

w.r.t. the use_json_as_text argument, yes, it has a default, but the
default is false. Ignoring it seems to be more than just deprecating it.
I agree it's a mess, though :-(


cheers

andrew
Merlin Moncure
2014-06-24 16:57:53 UTC
Post by Andrew Dunstan
w.r.t. json arrays, I think you're chasing a chimera, since they are
heterogeneous, unlike SQL arrays.
But, there are many useful cases where the json is known to be well
formed, right? Or do you mean that the difficulties stem from simply
validating the type? Basically, I'm wondering if

SELECT to_json(foo_t[])

is ever going to be able to be reversed by:

SELECT array(json[b]_populate_recordset(null::foo_t[]), '...'::json[b])

...where foo_t is some arbitrarily complex nested type. even simpler
(although not necessarily faster) would be:

SELECT from_json(null::foo_t[], ',,,');

or even

SELECT '...'::foo_t[]::json::foo_t[];

My basic gripe with the json[b] APIs is that there is no convenient
deserialization reverse of to_json. Tom's proposal AIUI, in particular
having internal json arrays force to json, would foreclose the last
two cases from ever being possible.

merlin
Tom Lane
2014-06-26 22:31:54 UTC
Post by Tom Lane
* Get rid of the use_json_as_text flag argument for the new functions.
In json_populate_record(set), ignore its value and deprecate using it.
(The fact that it already had a default makes that easier.) The
behavior should always be as below.
Here's a draft patch that gets rid of the use_json_as_text flag argument
altogether for all the functions newly added in 9.4. The pre-existing
json_populate_record(set) functions still have such an argument, but
it's ignored and undocumented (not that it was well documented before).
The behavior is as if the flag had been specified as true, which AFAICS
is the only useful case.

This does not do anything about the question of possibly printing
JSON arrays in SQL array syntax when the target record field is of
array type. While I think that's definitely do-able, it would require
some significant rewriting of the code, which is probably not something
to be doing at this point for 9.4. I think it's something we can add
as a new feature in 9.5 or later, because it changes the behavior only
in cases that are currently guaranteed-to-fail, so there's not really
any backwards compatibility problem IMO.

BTW, depending on how hard we want to work, one could imagine also
printing JSON objects in SQL record format if the target type is record
rather than JSON, and then making this happen recursively for nested
arrays/composites. Again, any attempt to do that today would fail with
a syntax error from record_in, so there's no backwards compatibility
problem.
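That recursive object-to-record idea might look like the following hypothetical sketch, which renders a JSON object in record_in syntax, quoting fields the way composite output does. (It matches fields by position rather than by declared column name, a simplification the real thing could not make.)

```python
def json_obj_to_record(obj):
    """Hypothetical conversion of a JSON object to PostgreSQL record
    syntax, recursing for nested objects.  Illustration only."""
    parts = []
    for v in obj.values():
        if isinstance(v, dict):
            v = json_obj_to_record(v)
        s = str(v)
        # record_in quoting: wrap fields containing special characters,
        # doubling any embedded double quotes
        if any(ch in s for ch in ',()" ') or s == "":
            s = '"' + s.replace('"', '""') + '"'
        parts.append(s)
    return "(" + ",".join(parts) + ")"

print(json_obj_to_record({"a": 1, "b": "foo"}))     # (1,foo)
print(json_obj_to_record({"a": 1, "b": {"c": 2}}))  # (1,"(2)")
```

As the thread notes, any attempt to feed such nested input to record_in today fails with a syntax error, so adopting something like this later would break nothing.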

regards, tom lane

Michael Paquier
2014-06-24 02:34:38 UTC
Post by Tom Lane
Post by Andrew Dunstan
Post by Tom Lane
However, it seems to me that these functions (json[b]_to_record[set]) are
handling the nested-json-objects case in a fairly brain-dead fashion to
start with. I would like to propose that we should think about getting
rid of the use_json_as_text flag arguments altogether. What purpose do
they serve? If we're going to the trouble of parsing the nested JSON
objects anyway, why don't we just reconstruct from that data?
Looks like we have some problems in this whole area, not just the new
function, so we need to fix 9.3 also :-(
IIRC, originally, the intention was to disallow nested json objects, but
the use_json_as_text was put in as a possibly less drastic possibility.
If we get rid of it our only recourse is to error out if we encounter
nested json. I was probably remiss in not considering the likelihood of
a json target field.
I currently don't have lots of time to devote to this, sadly, but
Michael's patch looks like a good minimal fix.
I can spend some time on it over the next couple of days. I take it you
don't have a problem with the concept of doing recursive processing,
as long as it doesn't add much complication?
I'm not following your comment about 9.3. The json[b]_to_record[set]
functions are new in 9.4, which is what makes me feel it's not too
late to redefine their behavior. But changing behavior of stuff that
was in 9.3 seems a lot more debatable.
This problem is also manifest in json_populate_recordset, which also
uses the function in question, and is in 9.3:
andrew=# create type yyy as (a int, b json, c int, d int);
CREATE TYPE
andrew=# select * from json_populate_recordset(null::yyy, '[
{"a":2,"c":3,"b":{"z":4}, "d":6}
]
',true) x;
a | b | c | d
---+---------+---+---
| {"z":4} | | 6
(1 row)
Yeah, the somewhat-backpatchable patch I sent fixes that as well. I
wouldn't mind writing a more complete patch with new regression tests and
so on for 9.3 and 9.4/master, but it seems that Tom is already on it.
--
Michael
Tom Lane
2014-06-25 04:25:53 UTC
Post by Andrew Dunstan
Post by Andrew Dunstan
I currently don't have lots of time to devote to this, sadly, but
Michael's patch looks like a good minimal fix.
This problem is also manifest in json_populate_recordset, which also
I've pushed this patch back through 9.3, along with a fix to ensure that
json_populate_record destroys the hashtable it creates. I want to do some
more work on this code, but this much is indubitably a bug fix.

regards, tom lane