Discussion: Shared memory and memory context question
r***@playford.net
2006-02-05 14:03:59 UTC
Dear all,

I am writing a C-language shared-object file which is dynamically linked with
postgres, and uses the various SPI functions for executing queries from
numerous trigger functions.

My question is thus: what is the best method for a dynamically linked object
to share memory with the same object running in other backends? Am I right in
thinking that if I allocate memory in the "upper executor context" via
SPI_palloc(), it is not shared with the other processes?
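
For reference, this is roughly the pattern I mean (a sketch only; MyRule is
a hypothetical struct of mine):

    #include "postgres.h"
    #include "executor/spi.h"

    /* Called from a trigger function. The allocation is made in the
     * upper executor context, so it survives SPI_finish() -- but it is
     * still private to this one backend process. */
    static MyRule *
    make_rule(void)
    {
        MyRule *r;

        SPI_connect();
        r = (MyRule *) SPI_palloc(sizeof(MyRule));
        SPI_finish();
        return r;               /* valid here, in this backend only */
    }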

I thought of a few ways of doing this (please forgive me if these appear
idiotic, as I am fairly new to postgres):

1. Change memory context to TopMemoryContext and palloc everything there.
(However, I believe this still isn't shared between processes?)

2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a
chunk of shared memory and use this (Although I would like to avoid writing
my own memory manager to carve up the space).

3. Somehow create shared memory using the shmem functions, and set a memory
context to live *inside* this shared memory, which my trigger functions can
then switch to. Then use palloc() and pfree() without worrying..

Please let me know if this problem has been solved before, as I have searched
through the mailing lists and through the source, but am not sure which is
the best way to resolve it. Thanks for your help.

Regards,

Richard

Martijn van Oosterhout
2006-02-05 14:11:25 UTC
Post by r***@playford.net
1. Change memory context to TopMemoryContext and palloc everything there.
(However, I believe this still isn't shared between processes?)
Not shared, correct.
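
For illustration, option 1 boils down to something like this, and every
backend ends up with its own private copy (a minimal sketch):

    #include "postgres.h"
    #include "utils/memutils.h"

    /* The allocation lives until backend exit, but TopMemoryContext is
     * per-process: nothing here is visible to other backends. */
    static void *
    alloc_for_session(Size size)
    {
        MemoryContext old = MemoryContextSwitchTo(TopMemoryContext);
        void   *p = palloc(size);

        MemoryContextSwitchTo(old);
        return p;
    }
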
Post by r***@playford.net
2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a
chunk of shared memory and use this (Although I would like to avoid writing
my own memory manager to carve up the space).
This is the generally accepted method. Please remember that when
sharing structures you have to worry about concurrency. So you need
locking.
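
A rough sketch of the usual shape, assuming the shared segment has spare
room for your structure (the name, sizes, and error handling here are all
placeholders):

    #include "postgres.h"
    #include "storage/shmem.h"
    #include "storage/spin.h"

    /* A fixed-size shared area, found or created by name. */
    typedef struct MySharedState
    {
        slock_t mutex;          /* protects everything below */
        int     nrules;
        char    data[8192];     /* you carve this up yourself */
    } MySharedState;

    static MySharedState *state;

    static void
    attach_shared_state(void)
    {
        bool    found;

        state = (MySharedState *)
            ShmemInitStruct("my_rules_area", sizeof(MySharedState), &found);
        if (!found)             /* first backend through initializes it */
        {
            SpinLockInit(&state->mutex);
            state->nrules = 0;
        }
    }
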
Post by r***@playford.net
3. Somehow create shared memory using the shmem functions, and set a memory
context to live *inside* this shared memory, which my trigger functions can
then switch to. Then use palloc() and pfree() without worrying..
Nope, palloc/pfree don't deal with concurrency.
Post by r***@playford.net
Please let me know if this problem has been solved before, as I have searched
through the mailing lists and through the source, but am not sure which is
the best way to resolve it. Thanks for your help.
Most people allocate chunks of shared memory and don't use
palloc/pfree. What are you doing that requires such management? Most
shared structures in PostgreSQL are allocated once and never freed...

Have a nice day,
--
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
Richard Hills
2006-02-05 14:31:23 UTC
Post by Martijn van Oosterhout
This is the generally accepted method. Please remember that when
sharing structures you have to worry about concurrency. So you need
locking.
Of course - I have already implemented locking with semaphores (I may simply
use one big lock and carefully avoid reentry).
Post by Martijn van Oosterhout
Nope, palloc/pfree don't deal with concurrency.
Indeed, although if I lock the shared memory then I can palloc() and pfree()
without worrying. The problem I see is that new memory contexts have their
memory assigned to them when they are created; I can't tell them "go here!"
Post by Martijn van Oosterhout
Most people allocate chunks of shared memory and don't use
palloc/pfree. What are you doing that requires such management? Most
shared structures in PostgreSQL are allocated once and never freed...
I have a number of functions which modify tables based on complex rules stored
in script files. I wrote a parser for these files as a separate program first,
then incorporated it into the shared object; it now loads and executes rules
from memory. As anything can be read from the files, and rules can be unloaded
later, I was hoping for flexibility in allocating memory to store it all.

Another option is to load the files but store the rules within the database,
which should be possible, but appears to be a slightly messy way of doing it.
Then again, messing about with shared memory allocation may be messier.
Asking as a fairly inexperienced postgres person, what would you suggest?


Martijn van Oosterhout
2006-02-05 14:43:56 UTC
Post by Richard Hills
I have a number of functions which modify tables based on complex rules stored
in script files. I wrote a parser for these files as a separate program first,
then incorporated it into the shared object; it now loads and executes rules
from memory. As anything can be read from the files, and rules can be unloaded
later, I was hoping for flexibility in allocating memory to store it all.
So what you load are the already processed rules? In that case you
could probably use the buffer management system. Ask it to load the
blocks and they'll be in the buffer cache. As long as you have the
buffer pinned they'll stay there. That's pretty much a read-only
approach.

If you're talking about things that don't come from disk, well, hmm...
If you want you could use a file on disk as backing and mmap() it into
each process's address space...
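
A minimal sketch of that approach (the path, the size, and the thin error
handling are placeholders; writers still need their own locking on top):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION_SIZE (1024 * 1024)

    /* Map a file-backed shared region; every backend that maps the same
     * file sees the same bytes. */
    void *
    attach_region(const char *path)
    {
        void   *base;
        int     fd = open(path, O_RDWR | O_CREAT, 0600);

        if (fd < 0)
            return NULL;
        if (ftruncate(fd, REGION_SIZE) < 0)
        {
            close(fd);
            return NULL;
        }
        base = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
        close(fd);              /* the mapping survives the close */
        return (base == MAP_FAILED) ? NULL : base;
    }
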
Post by Richard Hills
Another option is to load the files but store the rules within the database,
which should be possible, but appears to be a slightly messy way of doing it.
Then again, messing about with shared memory allocation may be messier.
Asking as a fairly inexperienced postgres person, what would you suggest?
The real question is whether it needs to be shared-writable. Shared-readonly
(i.e. one writer, multiple readers) is much easier. Using a file as backing
store for mmap() may be the easiest....

Have a nice day,
--
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
r***@playford.net
2006-02-05 14:58:46 UTC
Post by Martijn van Oosterhout
So what you load are the already processed rules? In that case you
could probably use the buffer management system. Ask it to load the
blocks and they'll be in the buffer cache. As long as you have the
buffer pinned they'll stay there. That's pretty much a read-only
approach.
If you're talking about things that don't come from disk, well, hmm...
If you want you could use a file on disk as backing and mmap() it into
each process's address space...
<...>
Post by Martijn van Oosterhout
The real question is whether it needs to be shared-writable. Shared-readonly
(i.e. one writer, multiple readers) is much easier. Using a file as backing
store for mmap() may be the easiest....
I load the rules from a script and parse them, storing them in a forest of
linked malloced structures. These structures are created by one writer but
then read by a number of readers, and later may be removed by the original
writer.

So, as you can imagine, I could store the forest in the db, although it might
be a mess. First I will look through the buffer management system, and see if
that will do the job.

Thanks for your help,

Regards,

Richard


Tom Lane
2006-02-05 16:16:39 UTC
Post by Martijn van Oosterhout
So what you load are the already processed rules? In that case you
could probably use the buffer management system. Ask it to load the
blocks and they'll be in the buffer cache. As long as you have the
buffer pinned they'll stay there.
... until you get to the end of the transaction, where the buffer
manager will barf because somebody forgot an unpin. Long-term buffer
pins are really not acceptable anyway --- you'd essentially be asserting
that your little facility is more important than any other use of shared
buffers, and I'm sorry but that ain't so.

AFAICT the data structures you are worried about don't have any readily
predictable size, which means there is no good way to keep them in
shared memory --- we can't dynamically resize shared memory. So I think
storing the rules in a table and loading into private memory at need is
really the only reasonable solution. Storing them in a table has a lot
of other advantages anyway, mainly that you can manipulate them from
SQL.
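
Loading at need could look roughly like this (a sketch; the my_rules table
and parse_rule() are hypothetical):

    #include "postgres.h"
    #include "executor/spi.h"

    extern void parse_rule(const char *src);    /* hypothetical parser */

    /* Pull rule source out of a table and parse each row into
     * backend-private memory. */
    void
    load_rules(void)
    {
        if (SPI_connect() != SPI_OK_CONNECT)
            elog(ERROR, "SPI_connect failed");

        if (SPI_exec("SELECT rule_src FROM my_rules", 0) == SPI_OK_SELECT)
        {
            uint64  i;

            for (i = 0; i < SPI_processed; i++)
            {
                char   *src = SPI_getvalue(SPI_tuptable->vals[i],
                                           SPI_tuptable->tupdesc, 1);

                parse_rule(src);    /* build the private rule forest */
            }
        }
        SPI_finish();
    }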

You can find some prior discussion of similar issues in the archives;
IIRC the idea of a shared plan cache was being kicked around for awhile
some years back.

regards, tom lane

Richard Hills
2006-02-05 17:13:29 UTC
Post by Tom Lane
AFAICT the data structures you are worried about don't have any readily
predictable size, which means there is no good way to keep them in
shared memory --- we can't dynamically resize shared memory. So I think
storing the rules in a table and loading into private memory at need is
really the only reasonable solution. Storing them in a table has a lot
of other advantages anyway, mainly that you can manipulate them from
SQL.
I have come to the conclusion that storing the rules and various other bits in
tables is the best solution, although this will require a much more complex
db structure than I had originally planned. Trying to allocate and free
memory in shared memory is fairly straightforward, but likely to become
incredibly messy.

Seeing as some of the rules already include load-value-from-db-on-demand, it
should be fairly straightforward to extend it to load-rule-from-db-on-demand.

Thanks for all your help,

Regards,

Richard

Mark Woodward
2006-02-06 05:17:27 UTC
Post by Richard Hills
Post by Tom Lane
AFAICT the data structures you are worried about don't have any readily
predictable size, which means there is no good way to keep them in
shared memory --- we can't dynamically resize shared memory. So I think
storing the rules in a table and loading into private memory at need is
really the only reasonable solution. Storing them in a table has a lot
of other advantages anyway, mainly that you can manipulate them from
SQL.
I have come to the conclusion that storing the rules and various other bits in
tables is the best solution, although this will require a much more complex
db structure than I had originally planned. Trying to allocate and free
memory in shared memory is fairly straightforward, but likely to become
incredibly messy.
Seeing as some of the rules already include load-value-from-db-on-demand, it
should be fairly straightforward to extend it to
load-rule-from-db-on-demand.
I posted some source to a shared memory sort of thing to the group, as
well as to you, I believe.

For variables and values that change very infrequently, using the DB is
the right idea. PostgreSQL, like most databases, crumbles under a rapidly
changing workload. By changing, I mean a lot of UPDATEs and DELETEs;
INSERTs are not so bad. PostgreSQL has fairly poor (IMHO) UPDATE
behaviour. Most transaction-aware databases do, but PostgreSQL seems quite
bad.

For example, if you are doing a scoreboard sort of thing for a website,
updating a single variable in a table 20 times a second will quickly make
that simple and normally fast update/query take a very long time. You have
to run VACUUM a whole lot.

The next example is a session table for a website: you may have a few
hundred or a few thousand active session rows, but each row may get many
updates, and you may have tens of thousands of sessions which are inactive.
Unless you vacuum very frequently, you are doing a lot of disk I/O for
every session, because the query has to walk the table file to find a
valid row.

A database is a BAD system for managing data like sessions in an active
website. It is a good tool for almost everything else, but if you are
implementing an eBay or a Yahoo, you'll swamp your DB quickly.

The issue with a shared memory system is that you don't get the durability
that you do with disk storage.


Richard Hills
2006-02-06 13:42:55 UTC
Post by Mark Woodward
I posted some source to a shared memory sort of thing to the group, as
well as to you, I believe.
Indeed, and it looks rather interesting. I'll have a look through it when I
have a chance...

So, after more discussion and experimentation, the possible methods in order
of +elegance/-difficulty/-complexity are:

=1. OSSP supported shared mem, possibly with a pg memory context or Mark's
shared memory manager.
=1. Separate application which the postgres backends talk to over tcp (which
actually turns out to be quite a clean way of doing it).
3. Storing rules in the db and reloading them each time (which turns out to be
an utter bastard to do).
4. Shared memory with my own memory manager.

I am *probably* going to go for the separate network application, as I
believe this is easy and relatively clean, and the required messages should be
fairly straightforward. Each postgres backend opens a connection to the
single separate "rules-server", which sends back a series of commands
(probably SQL) before the connection is closed again.
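
The backend side of that might be sketched like so (the host, port, and the
read-until-EOF protocol are all placeholders):

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Connect to the rules-server, read commands until EOF, then close. */
    int
    fetch_rules(const char *host, const char *port)
    {
        struct addrinfo hints, *res;
        char    buf[1024];
        ssize_t n;
        int     fd;

        memset(&hints, 0, sizeof(hints));
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, port, &hints, &res) != 0)
            return -1;

        fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
        {
            if (fd >= 0)
                close(fd);
            freeaddrinfo(res);
            return -1;
        }
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            ;                   /* feed each received command to SPI here */
        close(fd);
        freeaddrinfo(res);
        return 0;
    }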

If this is Clearly Insane - please let me know!

Regards,

Richard

Mark Woodward
2006-02-06 14:43:44 UTC
Post by Richard Hills
Post by Mark Woodward
I posted some source to a shared memory sort of thing to the group, as
well as to you, I believe.
Indeed, and it looks rather interesting. I'll have a look through it when I
have a chance...
So, after more discussion and experimentation, the possible methods in order
of +elegance/-difficulty/-complexity are:
=1. OSSP supported shared mem, possibly with a pg memory context or Mark's
shared memory manager.
=1. Separate application which the postgres backends talk to over tcp (which
actually turns out to be quite a clean way of doing it).
If you hop on over to http://www.mohawksoft.org, you'll see a server
application called "MCache." MCache is written to handle *exactly* the
sort of information you are looking to manage. Its primary duty is to
manage highly concurrent/active sessions for a large web cluster. I have
also been working on a PostgreSQL extension for it. It needs to be fleshed
out and, again, needs some heavy-duty QA, but it "works on my machine."

I alluded to releasing an extension module for PostgreSQL; I'm actually
working on a much larger set of projects intended to tightly integrate
PostgreSQL, web servers (PHP right now), and a set of service applications
including search and recommendations. In another thread I wanted to add an
extension, "xmldbx," to postgresql's contrib dir. Anyway, I digress.

If anyone is interested in lending a hand in QA, examples, and so on, I'd
be glad to take this off line.
Post by Richard Hills
3. Storing rules in the db and reloading them each time (which turns out to be
an utter bastard to do).
4. Shared memory with my own memory manager.
If you have time and the inclination to do so, it is a fun sort of thing to
write.
Post by Richard Hills
I am *probably* going to go for the separate network application, as I
believe this is easy and relatively clean, and the required messages should be
fairly straightforward. Each postgres backend opens a connection to the
single separate "rules-server", which sends back a series of commands
(probably SQL) before the connection is closed again.
If this is Clearly Insane - please let me know!
It isn't a bad idea at all. For MCache, I leave the socket connection open
for the next use of the PostgreSQL session. Web environments usually keep
a cache of active database connections to save the overhead of connecting
each time. You just need to be careful when you clean up.


Doug McNaught
2006-02-05 14:16:51 UTC
Post by r***@playford.net
1. Change memory context to TopMemoryContext and palloc everything there.
(However, I believe this still isn't shared between processes?)
Nope.
Post by r***@playford.net
2. Use the shmem functions in src/backend/storage/ipc/shmem.c to create a
chunk of shared memory and use this (Although I would like to avoid writing
my own memory manager to carve up the space).
3. Somehow create shared memory using the shmem functions, and set a memory
context to live *inside* this shared memory, which my trigger functions can
then switch to. Then use palloc() and pfree() without worrying..
You'd have to do one of the above, but #2 is probably out because all
shared memory is allocated to various purposes at startup and there is
none free at runtime (as I understand it).

For #3, how do you plan to have a memory context shared by multiple
backends with no synchronization? If two backends try to do
allocation or deallocation at the same time you will get corruption,
as I don't think palloc() and pfree() do any locking (they currently
never allocate from shared memory).

You should probably think very carefully about whether you can get
along without using additional shared memory, because it's not that
easy to do.

-Doug

Neil Conway
2006-02-06 01:09:16 UTC
Post by r***@playford.net
3. Somehow create shared memory using the shmem functions, and set a memory
context to live *inside* this shared memory, which my trigger functions can
then switch to. Then use palloc() and pfree() without worrying..
This has been done before, by the TelegraphCQ folks: they implemented a
shared memory MemoryContext on top of OSSP MM[1]. The code is in the
v0.2 TelegraphCQ tarball[2] -- see shmctx.c and shmset.c in
src/backend/utils/mmgr/. I'm not aware of an independent distribution,
but you could probably separate it out without too much pain.
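
For a flavour of what MM gives you underneath, allocation looks roughly like
this (a sketch based on the MM documentation, so details may differ; the pool
size and backing file are made up):

    #include <stddef.h>
    #include <mm.h>             /* OSSP mm */

    /* One shared pool, created once before backends fork; MM supplies
     * the locking around the shared heap. */
    static MM *pool;

    void
    init_pool(void)
    {
        pool = mm_create(1024 * 1024, "/tmp/rules.pool");
    }

    void *
    alloc_rule(size_t size)
    {
        void *rule;

        mm_lock(pool, MM_LOCK_RW);      /* writer takes the exclusive lock */
        rule = mm_malloc(pool, size);   /* lives inside the shared segment */
        mm_unlock(pool);
        return rule;
    }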

(Of course, the comments elsewhere in the thread about using an
alternative are probably still true...)

-Neil

[1] http://www.ossp.org/pkg/lib/mm/
[2] http://telegraph.cs.berkeley.edu/downloads/TelegraphCQ-0.2.tar.gz


Mark Woodward
2006-02-06 03:37:24 UTC
Hi!!

I was just browsing the messages and saw yours. I have actually written a
shared memory system for PostgreSQL.

I've done some basic bench testing, and it seems to work, but I haven't
given it the big QA push yet.

My company, Mohawk Software, is going to release a bunch of PostgreSQL
extensions for text search, shared memory, interfacing, etc.

Here's the source for the shared memory module. Mind you, it has not been
through rigorous QA yet!!! Also, this is the UNIX/Linux SHM version; the
Win32 version has not been written yet.

http://www.mohawksoft.org