Discussion:
.ready files appearing on slaves
Jehan-Guillaume de Rorthais
2014-09-04 15:50:36 UTC
Permalink
Hi hackers,

Since few months, we occasionally see .ready files appearing on some slave
instances from various context. The two I have in mind are under 9.2.x.

I tried to investigate a bit. These .ready files are created when a WAL file
from pg_xlog has no corresponding file in pg_xlog/archive_status. I could
easily experience this by deleting such a file: it is created again at the next
restartpoint or checkpoint received from the master.

Looking at the WAL in pg_xlog folder corresponding to these .ready files, they
are all much older than the current WAL "cycle" in both mtime and name logic
sequence. As instance on one of these box we have currently 6 of those "ghost"
WALs:

0000000200001E53000000FF
0000000200001F18000000FF
0000000200002047000000FF
00000002000020BF000000FF
0000000200002140000000FF
0000000200002370000000FF
000000020000255D000000A8
000000020000255D000000A9
[...normal WAL sequence...]
000000020000255E0000009D

And on another box:

000000010000040E000000FF
0000000100000414000000DA
000000010000046E000000FF
0000000100000470000000FF
00000001000004850000000F
000000010000048500000010
[...normal WAL sequence...]
000000010000048500000052

So it seems for some reasons, these old WALs were "forgotten" by the
restartpoint mechanism when they should have been recylced/deleted.

For one of these servers, I could correlate this with some brutal disconnection
of the streaming replication appearing in its logs. But there was no known SR
disconnection on the second one.

Any idea about this weird behaviour? What can we do to help you investigate
further?

Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Jehan-Guillaume de Rorthais
2014-09-15 15:37:24 UTC
Permalink
Hi hackers,

An issue that seems related to this has been posted on pgsql-admin. See:

http://www.postgresql.org/message-id/CAAS3tyLnXaYDZ0+zhXLPdVtOvHQOvR+***@mail.gmail.com

How can we help on this issue?

Cheers,

On Thu, 4 Sep 2014 17:50:36 +0200
Post by Jehan-Guillaume de Rorthais
Hi hackers,
Since few months, we occasionally see .ready files appearing on some slave
instances from various context. The two I have in mind are under 9.2.x.
I tried to investigate a bit. These .ready files are created when a WAL file
from pg_xlog has no corresponding file in pg_xlog/archive_status. I could
easily experience this by deleting such a file: it is created again at the
next restartpoint or checkpoint received from the master.
Looking at the WAL in pg_xlog folder corresponding to these .ready files, they
are all much older than the current WAL "cycle" in both mtime and name logic
sequence. As instance on one of these box we have currently 6 of those "ghost"
0000000200001E53000000FF
0000000200001F18000000FF
0000000200002047000000FF
00000002000020BF000000FF
0000000200002140000000FF
0000000200002370000000FF
000000020000255D000000A8
000000020000255D000000A9
[...normal WAL sequence...]
000000020000255E0000009D
000000010000040E000000FF
0000000100000414000000DA
000000010000046E000000FF
0000000100000470000000FF
00000001000004850000000F
000000010000048500000010
[...normal WAL sequence...]
000000010000048500000052
So it seems for some reasons, these old WALs were "forgotten" by the
restartpoint mechanism when they should have been recylced/deleted.
For one of these servers, I could correlate this with some brutal
disconnection of the streaming replication appearing in its logs. But there
was no known SR disconnection on the second one.
Any idea about this weird behaviour? What can we do to help you investigate
further?
Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Jehan-Guillaume de Rorthais
2014-09-18 16:07:34 UTC
Permalink
Hi hackers,

We spend some time with Guillaume Lelarge studying this issue.

CreateRestartPoint() calls RemoveOldXlogFiles() to drop/recycle old WALs. This
one is calling XLogArchiveCheckDone() to check if the given WAL can be dropped.
As our slave has "archive_mode" & "archive_command" set, XLogArchivingActive()
returns true, forcing XLogArchiveCheckDone() to look for a ".done" file. If the
corresponding ".done" file doesn't exist, XLogArchiveCheckDone() calls
XLogArchiveNotify() to actually create the ".ready" file.


Now, on a standby, we are supposed to create the ".done" files by calling
XLogArchiveForceDone() either after a successful restore_command or when the
streaming replication finishes and flushes a WAL. It seems like the root cause
of this trouble: maybe a race condition where XLogArchiveForceDone() would not
be called for a WAL. As the ".done" file is never created, we end up calling
XLogArchiveNotify() to create a ".ready" and this WAL will never be archived by
the slave, sitting in the pg_xlog dir.

But worst, we experience this under two 9.2.8 clusters and realized
this version is not supposed to have 256 segments per WAL, finishing a WAL to
the *FE one. I'm quoting back the history of my previous mail as we clearly
have some *FF files on the slave side!
Post by Jehan-Guillaume de Rorthais
Post by Jehan-Guillaume de Rorthais
000000010000040E000000FF
0000000100000414000000DA
000000010000046E000000FF
0000000100000470000000FF
00000001000004850000000F
000000010000048500000010
[...normal WAL sequence...]
000000010000048500000052
Note that it seems to happen only during streaming replication, not in pure Log
Shipping replication. We added the debug message on the master side to make
sure it never archive *FF file

We kept the WAL files and log files for further analysis. How can we help
regarding this issue?

Regards,


On Mon, 15 Sep 2014 17:37:24 +0200
Post by Jehan-Guillaume de Rorthais
Hi hackers,
How can we help on this issue?
Cheers,
On Thu, 4 Sep 2014 17:50:36 +0200
Post by Jehan-Guillaume de Rorthais
Hi hackers,
Since few months, we occasionally see .ready files appearing on some slave
instances from various context. The two I have in mind are under 9.2.x.
I tried to investigate a bit. These .ready files are created when a WAL file
from pg_xlog has no corresponding file in pg_xlog/archive_status. I could
easily experience this by deleting such a file: it is created again at the
next restartpoint or checkpoint received from the master.
Looking at the WAL in pg_xlog folder corresponding to these .ready files,
they are all much older than the current WAL "cycle" in both mtime and name
logic sequence. As instance on one of these box we have currently 6 of
0000000200001E53000000FF
0000000200001F18000000FF
0000000200002047000000FF
00000002000020BF000000FF
0000000200002140000000FF
0000000200002370000000FF
000000020000255D000000A8
000000020000255D000000A9
[...normal WAL sequence...]
000000020000255E0000009D
000000010000040E000000FF
0000000100000414000000DA
000000010000046E000000FF
0000000100000470000000FF
00000001000004850000000F
000000010000048500000010
[...normal WAL sequence...]
000000010000048500000052
So it seems for some reasons, these old WALs were "forgotten" by the
restartpoint mechanism when they should have been recylced/deleted.
For one of these servers, I could correlate this with some brutal
disconnection of the streaming replication appearing in its logs. But there
was no known SR disconnection on the second one.
Any idea about this weird behaviour? What can we do to help you investigate
further?
Regards,
--
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier
2014-10-08 07:44:59 UTC
Permalink
On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais <
Post by Jehan-Guillaume de Rorthais
We kept the WAL files and log files for further analysis. How can we help
regarding this issue?
Commit c2f79ba has added as assumption that the WAL receiver should always
enforce the create of .done files when WAL files are done being streamed
(XLogWalRcvWrite and WalReceiverMain) or archived
(KeepFileRestoredFromArchive). Then using this assumption 1bd42cd has
changed a bit RemoveOldXlogFiles, removing a check looking if the node is
in recovery. Now, based on the information given here yes it happens that
there are still cases where .done file creation is not correctly done,
leading to those extra files. Even by looking at the code, I am not
directly seeing any code paths where an extra call to XLogArchiveForceDone
would be needed on the WAL receiver side but... Something like the patch
attached (which is clearly a band-aid) may help though as it would make
files to be removed even if they are not marked as .done for a node in
recovery. And this is consistent with the pre-1bd42cd.

Comments welcome.
--
Michael
Heikki Linnakangas
2014-10-08 09:54:00 UTC
Permalink
Post by Michael Paquier
On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais <
Post by Jehan-Guillaume de Rorthais
We kept the WAL files and log files for further analysis. How can we help
regarding this issue?
Commit c2f79ba has added as assumption that the WAL receiver should always
enforce the create of .done files when WAL files are done being streamed
(XLogWalRcvWrite and WalReceiverMain) or archived
(KeepFileRestoredFromArchive). Then using this assumption 1bd42cd has
changed a bit RemoveOldXlogFiles, removing a check looking if the node is
in recovery. Now, based on the information given here yes it happens that
there are still cases where .done file creation is not correctly done,
leading to those extra files. Even by looking at the code, I am not
directly seeing any code paths where an extra call to XLogArchiveForceDone
would be needed on the WAL receiver side but... Something like the patch
attached (which is clearly a band-aid) may help though as it would make
files to be removed even if they are not marked as .done for a node in
recovery. And this is consistent with the pre-1bd42cd.
There are two mysteries here:

1. Where do the FF files come from? In 9.2, FF-segments are not supposed
to created, ever.

Since this only happens with streaming replication, the FF segments are
probably being created by walreceiver. XLogWalRcvWrite is the function
that opens the file. I don't see anything obviously wrong there.
XLogWalRcvWrite opens the file corresponding the start position in the
message received from the master. There is no check that the start
position is valid, though; if the master sends a start position in the
FF segment, walreceiver will merrily write it. So the problem could be
in the walsender side. However, I don't see anything wrong there either.

I think we should add a check in walreceiver, to throw an error if the
master sends an invalid WAL pointer, pointing to an FF segment.


2. Why are the .done files sometimes not being created?

I may have an explanation for that. Walreceiver creates a .done file
when it closes an old segment and opens a new one. However, it does this
only when it's about to start writing to the new segment, and still has
the old segment open. If you stream the FE segment fully, but drop
replication connection at exactly that point, the .done file is not
created. That might sound unlikely, but it's actually pretty easy to
trigger. Just do "select pg_switch_xlog()" in the master, followed by
"pg_ctl stop -m i" and a restart.

The creation of the .done files seems quite unreliable anyway. If only a
portion of a segment is streamed, we don't write a .done file for it, so
we still have the original problem that we will try to archive the
segment after failover, even though the master might already have
archived it.

I looked again at the thread where this was discussed:
http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=A+zo+***@mail.gmail.com.
I believe the idea was that the server that generates a WAL segment is
always responsible for archiving it. A standby should never attempt to
archive a WAL segment that was restored from the archive, or streamed
from the master.

In that thread, it was not discussed what should happen to WAL files
that an admin manually copies into pg_xlog of the standby. Should the
standby archive them? I don't think so - the admin should copy them
manually to the archive too, if he wants them archived. It's a good and
simple rule that the server that generates the WAL, archives the WAL.

Instead of creating any .done files during recovery, we could scan
pg_xlog at promotion, and create a .done file for every WAL segment
that's present at that point. That would be more robust. And then apply
your patch, to recycle old segments during archive recovery, ignoring
.done files.

- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier
2014-10-08 13:00:42 UTC
Permalink
Post by Heikki Linnakangas
1. Where do the FF files come from? In 9.2, FF-segments are not supposed
to created, ever.
Since this only happens with streaming replication, the FF segments are
Post by Heikki Linnakangas
probably being created by walreceiver. XLogWalRcvWrite is the function that
opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite
opens the file corresponding the start position in the message received
from the master. There is no check that the start position is valid,
though; if the master sends a start position in the FF segment, walreceiver
will merrily write it. So the problem could be in the walsender side.
However, I don't see anything wrong there either.
Good to hear that. By looking at the wal receiver and sender code paths, I
found nothing really weird.

I think we should add a check in walreceiver, to throw an error if the
Post by Heikki Linnakangas
master sends an invalid WAL pointer, pointing to an FF segment.
Then we're good for a check in ProcessWalSndrMessage for walEnd I guess.
Seems like a straight-forward patch.

2. Why are the .done files sometimes not being created?
Post by Heikki Linnakangas
I may have an explanation for that. Walreceiver creates a .done file when
it closes an old segment and opens a new one. However, it does this only
when it's about to start writing to the new segment, and still has the old
segment open. If you stream the FE segment fully, but drop replication
connection at exactly that point, the .done file is not created. That might
sound unlikely, but it's actually pretty easy to trigger. Just do "select
pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a
restart.
That's exactly the test I have been doing a couple of times to trigger this
behavior before sending my previous email, but without success on the
standby with master: all the WAL files were marked as .done. Now, I have
just retried it, with far more tries on REL9_3_STABLE and on HEAD and I
have been able to actually trigger the problem a couple of times. Simply
run a long transaction generating a lot of WAL like that:
=# create table aa as select generate_series(1,1000000000);
And then run that:
$ psql -c 'select pg_switch_xlog()'; pg_ctl stop -m immediate; pg_ctl start
And with enough "luck", .ready files may appear. It may take a dozen of
tries before seeing at least ones. And I noticed that generally multiple
.ready files appear at the same time.
Post by Heikki Linnakangas
The creation of the .done files seems quite unreliable anyway. If only a
portion of a segment is streamed, we don't write a .done file for it, so we
still have the original problem that we will try to archive the segment
after failover, even though the master might already have archived it.
Yep. Agreed.
Post by Heikki Linnakangas
I believe the idea was that the server that generates a WAL segment is
always responsible for archiving it. A standby should never attempt to
archive a WAL segment that was restored from the archive, or streamed from
the master.
In that thread, it was not discussed what should happen to WAL files that
Post by Heikki Linnakangas
an admin manually copies into pg_xlog of the standby. Should the standby
archive them? I don't think so - the admin should copy them manually to the
archive too, if he wants them archived. It's a good and simple rule that
the server that generates the WAL, archives the WAL.
Question time: why has the check based on recovery state of the node been
removed in 1bd42cd? Just assuming, but did you have in mind that relying on
XLogArchiveForceDone and XLogArchiveCheckDone was enough and more robust at
this point?

Instead of creating any .done files during recovery, we could scan pg_xlog
Post by Heikki Linnakangas
at promotion, and create a .done file for every WAL segment that's present
at that point. That would be more robust. And then apply your patch, to
recycle old segments during archive recovery, ignoring .done files.
The additional process at promotion sounds like a good idea, I'll try to
get a patch done tomorrow. This would result as well in removing the
XLogArchiveForceDone stuff. Either way, not that I have been able to
reproduce the problem manually, things can be clearly solved.
Regards,
--
Michael
Michael Paquier
2014-10-09 06:26:41 UTC
Permalink
Post by Michael Paquier
The additional process at promotion sounds like a good idea, I'll try to
get a patch done tomorrow. This would result as well in removing the
XLogArchiveForceDone stuff. Either way, not that I have been able to
reproduce the problem manually, things can be clearly solved.
Please find attached two patches aimed to fix this issue and to improve the
situation:
- 0001 prevents the apparition of those phantom WAL segment file by
ensuring that when a node is in recovery it will remove it whatever its
status in archive_status. This patch is the real fix, and should be applied
down to 9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the segment
files present in pg_xlog to have their status to .done, marking them for
removal. When looking at the code, I finally concluded that Fujii-san's
point, about marking in all cases as .done segment files that have been
fully streamed, actually makes more sense to not be backward. This patch
would actually not be mandatory for back-patching, but it makes the process
more robust IMO.

I imagine that it would be nice to get those things fixed before the next
minor release.
Regards,
--
Michael
Fujii Masao
2014-10-17 12:23:43 UTC
Permalink
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
Post by Michael Paquier
Post by Michael Paquier
The additional process at promotion sounds like a good idea, I'll try to
get a patch done tomorrow. This would result as well in removing the
XLogArchiveForceDone stuff. Either way, not that I have been able to
reproduce the problem manually, things can be clearly solved.
Please find attached two patches aimed to fix this issue and to improve the
- 0001 prevents the apparition of those phantom WAL segment file by ensuring
that when a node is in recovery it will remove it whatever its status in
archive_status. This patch is the real fix, and should be applied down to
9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the segment
files present in pg_xlog to have their status to .done, marking them for
removal. When looking at the code, I finally concluded that Fujii-san's
point, about marking in all cases as .done segment files that have been
fully streamed, actually makes more sense to not be backward. This patch
would actually not be mandatory for back-patching, but it makes the process
more robust IMO.
Thanks for the patches!

I found one problem in the 0002 patch. The patch changes the recovery so that
it creates .done files for every WAL files which exist in pg_xlog directory at
the end of recovery. But even WAL files which will have to be archived later
can exist in pg_xlog at that moment. For example, the latest, recycled and
fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL files.
The patch wrongly prevents them from being archvied at all.

ISTM that the 0001 patch has the similar problem. Please imagine the following
scenario.

1. There are many unarchived WAL files in pg_xlog because of the continuous
failure of archive_command, for example, and then the server unfortunately
crashes because of the corruption of database itself.

2. DBA restores the backup onto the server and copies all the WAL files
from old pg_xlog to new one. Then he or she prepares for archive recovery.

3. DBA starts the server and the archive recovery starts.

4. After all the archived WAL files are replayed, the server tries to replay
the WAL files in pg_xlog. Since there are many WAL files in pg_xlog,
more than one restartpoints happen while they are being replayed.

In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?

Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Fujii Masao
2014-10-17 13:12:06 UTC
Permalink
Post by Fujii Masao
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
Post by Michael Paquier
Post by Michael Paquier
The additional process at promotion sounds like a good idea, I'll try to
get a patch done tomorrow. This would result as well in removing the
XLogArchiveForceDone stuff. Either way, not that I have been able to
reproduce the problem manually, things can be clearly solved.
Please find attached two patches aimed to fix this issue and to improve the
- 0001 prevents the apparition of those phantom WAL segment file by ensuring
that when a node is in recovery it will remove it whatever its status in
archive_status. This patch is the real fix, and should be applied down to
9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the segment
files present in pg_xlog to have their status to .done, marking them for
removal. When looking at the code, I finally concluded that Fujii-san's
point, about marking in all cases as .done segment files that have been
fully streamed, actually makes more sense to not be backward. This patch
would actually not be mandatory for back-patching, but it makes the process
more robust IMO.
Thanks for the patches!
While reviewing the patch, I found another bug related to WAL file in recovery
mode. The problem is that exitArchiveRecovery() always creates .ready file for
the last WAL file of the old timeline even when it's restored from the archive
and has .done file. So this causes the already-archived WAL file to be archived
again.... Attached patch fixes this bug.

Regards,
--
Fujii Masao
Michael Paquier
2014-10-20 08:15:20 UTC
Permalink
Post by Michael Paquier
Post by Fujii Masao
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier <
Post by Michael Paquier
The additional process at promotion sounds like a good idea, I'll try
to
Post by Fujii Masao
Post by Michael Paquier
get a patch done tomorrow. This would result as well in removing the
XLogArchiveForceDone stuff. Either way, not that I have been able to
reproduce the problem manually, things can be clearly solved.
Please find attached two patches aimed to fix this issue and to improve
the
Post by Fujii Masao
- 0001 prevents the apparition of those phantom WAL segment file by
ensuring
Post by Fujii Masao
that when a node is in recovery it will remove it whatever its status in
archive_status. This patch is the real fix, and should be applied down
to
Post by Fujii Masao
9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the
segment
Post by Fujii Masao
files present in pg_xlog to have their status to .done, marking them for
removal. When looking at the code, I finally concluded that Fujii-san's
point, about marking in all cases as .done segment files that have been
fully streamed, actually makes more sense to not be backward. This patch
would actually not be mandatory for back-patching, but it makes the
process
Post by Fujii Masao
more robust IMO.
Thanks for the patches!
While reviewing the patch, I found another bug related to WAL file in recovery
mode. The problem is that exitArchiveRecovery() always creates .ready file for
the last WAL file of the old timeline even when it's restored from the archive
and has .done file. So this causes the already-archived WAL file to be archived
again.... Attached patch fixes this bug.
That's a good catch! Patch looks good. I think that you should change
xlogpath to use MAXFNAMELEN instead of MAXPGPATH in exitArchiveRecovery.
This is only for correctness, so that's a master-only remark, because this
variable is used only to calculate a segment file name, and not a path.
Renaming the variable from xlogpath to xlogname would make sense as well.
Regards,
--
Michael
Fujii Masao
2014-10-23 07:34:07 UTC
Permalink
On Mon, Oct 20, 2014 at 5:15 PM, Michael Paquier
Post by Michael Paquier
Post by Fujii Masao
Post by Fujii Masao
On Thu, Oct 9, 2014 at 3:26 PM, Michael Paquier
On Wed, Oct 8, 2014 at 10:00 PM, Michael Paquier
Post by Michael Paquier
The additional process at promotion sounds like a good idea, I'll try to
get a patch done tomorrow. This would result as well in removing the
XLogArchiveForceDone stuff. Either way, not that I have been able to
reproduce the problem manually, things can be clearly solved.
Please find attached two patches aimed to fix this issue and to improve the
- 0001 prevents the apparition of those phantom WAL segment file by ensuring
that when a node is in recovery it will remove it whatever its status in
archive_status. This patch is the real fix, and should be applied down to
9.2.
- 0002 is a patch implementing Heikki's idea of enforcing all the segment
files present in pg_xlog to have their status to .done, marking them for
removal. When looking at the code, I finally concluded that Fujii-san's
point, about marking in all cases as .done segment files that have been
fully streamed, actually makes more sense to not be backward. This patch
would actually not be mandatory for back-patching, but it makes the process
more robust IMO.
Thanks for the patches!
While reviewing the patch, I found another bug related to WAL file in recovery
mode. The problem is that exitArchiveRecovery() always creates .ready file for
the last WAL file of the old timeline even when it's restored from the archive
and has .done file. So this causes the already-archived WAL file to be archived
again.... Attached patch fixes this bug.
That's a good catch! Patch looks good. I think that you should change
xlogpath to use MAXFNAMELEN instead of MAXPGPATH in exitArchiveRecovery.
This is only for correctness, so that's a master-only remark, because this
variable is used only to calculate a segment file name, and not a path.
Renaming the variable from xlogpath to xlogname would make sense as well.
Thanks for the review! Applied.

Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier
2014-10-17 13:37:08 UTC
Permalink
Post by Fujii Masao
In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?
The real problem currently is that it is possible to have a segment file
not marked as .done during recovery when stream connection is abruptly cut
when this segment is switched, marking it as .ready in archive_status and
simply letting this segment in pg_xlog because it will neither be recycled
nor removed. I have not been able to look much at this code these days, so
I am not sure how invasive it would be in back-branches, but perhaps we
should try to improve code such as when a segment file is switched and
connection to the is cut, we guarantee that this file is completed and
marked as .done.
--
Michael
Michael Paquier
2014-10-20 06:26:27 UTC
Permalink
On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier
Post by Fujii Masao
In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?
The real problem currently is that it is possible to have a segment file not marked as .done during recovery when stream connection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply letting this segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve code such as when a segment file is switched and connection to the is cut, we guarantee that this file is completed and marked as .done.
I have spent more time on that, with a bit more of underground work...
First, the problem can be reproduced most of the time by running this
simple command:
psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate

This will enforce a segment file switch and restart the master in
crash recovery. This has as effect to immediately cut the WAL stream
on slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0.
For example, let's imagine that stream fails when switching from
000000010000000000000003 to the next segment, then the
the last XLogRecPtr in XLogWalRcvProcessMsg for dataStart is for
example 0/3100000, and walrcv->latestWalEnd is 0/4000000. When stream
restarts it will begin once again from 0/4000000, ignoring that
000000010000000000000003 should be marked as .done, ultimately marking
it in .ready state when old segment files are recycled or removed.
There is nothing that can really be done to enforce the creation of a
.done file before the FATAL of libpqrcv_receive because we cannot
predict the stream failure..

Now, we can do better than what we have now by looking at WAL start
position used when starting streaming in WAL receiver and enforce
.done if the start position is the last one of previous segment.
Hence, in the case of start position 0/4000000 that was found
previously, the file that will be enforced to .done is
000000010000000000000003. I have written the patch attached that
implements this idea and fixes the problem. Now let's see if you guys
see any flaws in this simple logic which uses a sniper gun instead of
a bazooka as in the previous patches sent.

Regards,
--
Michael
Heikki Linnakangas
2014-10-22 11:53:53 UTC
Permalink
Post by Michael Paquier
On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier
Post by Fujii Masao
In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?
The real problem currently is that it is possible to have a segment file not marked as .done during recovery when stream connection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply letting this segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve code such as when a segment file is switched and connection to the is cut, we guarantee that this file is completed and marked as .done.
I have spent more time on that, with a bit more of underground work...
First, the problem can be reproduced most of the time by running this
psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate
This will enforce a segment file switch and restart the master in
crash recovery. This has as effect to immediately cut the WAL stream
on slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0.
For example, let's imagine that stream fails when switching from
000000010000000000000003 to the next segment, then the
the last XLogRecPtr in XLogWalRcvProcessMsg for dataStart is for
example 0/3100000, and walrcv->latestWalEnd is 0/4000000. When stream
restarts it will begin once again from 0/4000000, ignoring that
000000010000000000000003 should be marked as .done, ultimately marking
it in .ready state when old segment files are recycled or removed.
There is nothing that can really be done to enforce the creation of a
.done file before the FATAL of libpqrcv_receive because we cannot
predict the stream failure..
Now, we can do better than what we have now by looking at WAL start
position used when starting streaming in WAL receiver and enforce
.done if the start position is the last one of previous segment.
Hence, in the case of start position 0/4000000 that was found
previously, the file that will be enforced to .done is
000000010000000000000003. I have written the patch attached that
implements this idea and fixes the problem. Now let's see if you guys
see any flaws in this simple logic which uses a sniper gun instead of
a bazooka as in the previous patches sent.
Hmm. This will still miss the .done file if you don't re-establish the
streaming replication connection after the restart. For example, if you
shut down the master, and promote the standby server.

I think we should take a more wholesale approach to this. We should
enforce the rule that the server only ever archives WAL files belonging
to the same timeline that the server generates. IOW, the server only
archives the WAL that it has generated.

- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Heikki Linnakangas
2014-10-22 19:28:15 UTC
Permalink
Post by Heikki Linnakangas
I think we should take a more wholesale approach to this. We should
enforce the rule that the server only ever archives WAL files belonging to
the same timeline that the server generates. IOW, the server only archives
the WAL that it has generated.
Hm?! Would that be really back-patchable? There may be in the wild tools or
users that rely on the fact a node archives segment files from all
timelines.
Hmm, so it would be a tool or user that manually copies a file to the
pg_xlog directory of a standby server, and expects the standby to
archive the file after promotion. That seems a bit far-fetched, although
I've seen people do strange things. I think it would be acceptable as
long as we document the change in behavior in the release notes. I don't
have much hope that we'll ever be able to nail down the correct behavior
with the current approach.

- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Fujii Masao
2014-10-23 05:59:13 UTC
Permalink
On Mon, Oct 20, 2014 at 3:26 PM, Michael Paquier
Post by Michael Paquier
On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier
Post by Fujii Masao
In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?
The real problem currently is that it is possible to have a segment file not marked as .done during recovery when stream connection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply letting this segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve code such as when a segment file is switched and connection to the is cut, we guarantee that this file is completed and marked as .done.
I have spent more time on that, with a bit more of underground work...
First, the problem can be reproduced most of the time by running this
psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate
What about fixing this problem directly? That is, we can make walreceiver
check whether the end of last received WAL data is the end of current WAL file
or not, and then close the WAL file and create .done file if the test is true.

This is not a perfect solution. If the standby crashes during very
short interval
(i.e., after closing the WAL file and before creating .done file), the problem
would happen again. But it can really rarely happen, so I don't think that it's
worth fixing the corner case at least in back-branches. Of course, we can
find out the "perfect" solution for the master, though.

Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Heikki Linnakangas
2014-10-23 08:09:16 UTC
Permalink
Post by Fujii Masao
On Mon, Oct 20, 2014 at 3:26 PM, Michael Paquier
Post by Michael Paquier
On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier
Post by Fujii Masao
In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?
The real problem currently is that it is possible to have a segment file not marked as .done during recovery when stream connection is abruptly cut when this segment is switched, marking it as .ready in archive_status and simply letting this segment in pg_xlog because it will neither be recycled nor removed. I have not been able to look much at this code these days, so I am not sure how invasive it would be in back-branches, but perhaps we should try to improve code such as when a segment file is switched and connection to the is cut, we guarantee that this file is completed and marked as .done.
I have spent more time on that, with a bit more of underground work...
First, the problem can be reproduced most of the time by running this
psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate
What about fixing this problem directly? That is, we can make walreceiver
check whether the end of last received WAL data is the end of current WAL file
or not, and then close the WAL file and create .done file if the test is true.
This is not a perfect solution. If the standby crashes during very
short interval
(i.e., after closing the WAL file and before creating .done file), the problem
would happen again. But it can really rarely happen, so I don't think that it's
worth fixing the corner case at least in back-branches. Of course, we can
find out the "perfect" solution for the master, though.
Sounds reasonable, for back-branches. Although I'm still worried we
might miss some corner-case unless we go with a more wholesale solution.

At least for master, we should consider changing the way the archiving
works so that we only archive WAL that was generated in the same server.
I.e. we should never try to archive WAL files belonging to another timeline.

I just remembered that we discussed a different problem related to this
some time ago, at
http://www.postgresql.org/message-id/***@lab.ntt.co.jp.
The conclusion of that was that at promotion, we should not archive the
last, partial, segment from the old timeline. Also, we could make the
archive recovery more precise in what WAL files it tries to restore. For
example, if we're doing recovery with target timeline 4, but the
timeline history says that segment AB came from timeline 3 (i.e. the
switch to timeline 4 happened within a later segment), we should not try
to restore segment AB from timeline 4. It shouldn't exist, so trying to
restore it is a waste of cycles, but more importantly, if there's some
sort of confusion with the timelines, and the file exists anyway, we
should not restore it. Also, if the timeline history says e.g. that we
switched from TLI 3 to 4 at WAL position 0/AD123456, and we restore
segment AD from timeline 3, we should stop recovering that segment at
position 0/AD123456 and try to find the AD segment from timeline 4. We
currently only do that when streaming, but we should also do that when
restoring from archive.

In summary, let's do something small for back-branches, like what you
suggested. But for master, let's do bigger changes to the timeline
handling. Would you like to write a patch for the back-branches, if I
start working on master?

- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier
2014-10-23 10:25:58 UTC
Permalink
On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <
Sounds reasonable, for back-branches. Although I'm still worried we might
miss some corner-case unless we go with a more wholesale solution.
Don't really want to be the intruder here, but isn't that the simple patch
attached? There is still a small window between XLogWalRcvFlush and
XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly
between them.
--
Michael
Heikki Linnakangas
2014-10-23 11:45:07 UTC
Permalink
Post by Michael Paquier
On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <
Sounds reasonable, for back-branches. Although I'm still worried we might
miss some corner-case unless we go with a more wholesale solution.
Don't really want to be the intruder here, but isn't that the simple patch
attached?
That's not right. Should check *after* the write if the segment was
completed, and close it if so. Like the attached.
Post by Michael Paquier
There is still a small window between XLogWalRcvFlush and
XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly
between them.
Yeah. I think we can live with that.

- Heikki
Fujii Masao
2014-10-23 12:23:08 UTC
Permalink
On Thu, Oct 23, 2014 at 8:45 PM, Heikki Linnakangas
Post by Heikki Linnakangas
Post by Michael Paquier
On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <
Sounds reasonable, for back-branches. Although I'm still worried we might
miss some corner-case unless we go with a more wholesale solution.
Don't really want to be the intruder here, but isn't that the simple patch
attached?
That's not right. Should check *after* the write if the segment was
completed, and close it if so. Like the attached.
Looks good to me. WalReceiverMain has almost the same code as
what XLogWalRcvFileClose does. So we can refactor that.
Post by Heikki Linnakangas
Post by Michael Paquier
There is still a small window between XLogWalRcvFlush and
XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly
between them.
Yeah. I think we can live with that.
Yes.

Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Fujii Masao
2014-10-23 12:34:07 UTC
Permalink
Post by Fujii Masao
On Thu, Oct 23, 2014 at 8:45 PM, Heikki Linnakangas
Post by Heikki Linnakangas
Post by Michael Paquier
On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <
Sounds reasonable, for back-branches. Although I'm still worried we might
miss some corner-case unless we go with a more wholesale solution.
Don't really want to be the intruder here, but isn't that the simple patch
attached?
That's not right. Should check *after* the write if the segment was
completed, and close it if so. Like the attached.
Looks good to me. WalReceiverMain has almost the same code as
what XLogWalRcvFileClose does. So we can refactor that.
While looking at the code of WAL archiving and recovery, I found
another small issue. The standby always creates .ready file for
the timeline history file even when WAL archiving is not enabled.
Since WAL archiving is off, that .ready file will remain infinitely.
Probably this is almost harmless but confusing, so I'd like to fix that.
Patch attached. Thought?

Regards,
--
Fujii Masao
Michael Paquier
2014-10-23 12:53:35 UTC
Permalink
Post by Fujii Masao
Post by Fujii Masao
On Thu, Oct 23, 2014 at 8:45 PM, Heikki Linnakangas
Post by Heikki Linnakangas
Post by Michael Paquier
On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <
Post by Heikki Linnakangas
Sounds reasonable, for back-branches. Although I'm still worried we
might
Post by Fujii Masao
Post by Heikki Linnakangas
Post by Michael Paquier
Post by Heikki Linnakangas
miss some corner-case unless we go with a more wholesale solution.
Don't really want to be the intruder here, but isn't that the simple
patch
Post by Fujii Masao
Post by Heikki Linnakangas
Post by Michael Paquier
attached?
That's not right. Should check *after* the write if the segment was
completed, and close it if so. Like the attached.
Looks good to me. WalReceiverMain has almost the same code as
what XLogWalRcvFileClose does. So we can refactor that.
While looking at the code of WAL archiving and recovery, I found
another small issue. The standby always creates .ready file for
the timeline history file even when WAL archiving is not enabled.
Since WAL archiving is off, that .ready file will remain infinitely.
Probably this is almost harmless but confusing, so I'd like to fix that.
Patch attached. Thought?
Good catch once again. We could as well put the check of
XLogArchivingActive directly in XLogArchiveNotify...
--
Michael
Michael Paquier
2014-10-23 13:16:29 UTC
Permalink
Post by Heikki Linnakangas
Post by Michael Paquier
On Thu, Oct 23, 2014 at 10:09 AM, Heikki Linnakangas <
Sounds reasonable, for back-branches. Although I'm still worried we might
miss some corner-case unless we go with a more wholesale solution.
Don't really want to be the intruder here, but isn't that the simple patch
attached?
That's not right. Should check *after* the write if the segment was
completed, and close it if so. Like the attached.
There is still a small window between XLogWalRcvFlush and
Post by Michael Paquier
XLogArchiveForceDone in XLogWalRcvWrite if the standby crashes exactly
between them.
Yeah. I think we can live with that.
Just tested this patch with the combo pg_switch_xlog() + stop/immediate and
the apparition of the .ready files is not fixed.
Regards,
--
Michael
Michael Paquier
2014-10-23 14:19:37 UTC
Permalink
On Thu, Oct 23, 2014 at 1:45 PM, Heikki Linnakangas <
That's not right. Should check *after* the write if the segment was
completed, and close it if so. Like the attached.
Just tested this patch with the combo pg_switch_xlog() + stop/immediate
and the apparition of the .ready files is not fixed.
Btw, I think that we should pass walEnd to XLogWalRcvWrite and add an
additional check based on that after the write() loop calls to enforce the
segment to be switched to .done.
--
Michael
Fujii Masao
2014-10-23 11:55:35 UTC
Permalink
On Thu, Oct 23, 2014 at 5:09 PM, Heikki Linnakangas
Post by Fujii Masao
On Mon, Oct 20, 2014 at 3:26 PM, Michael Paquier
Post by Michael Paquier
On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier
Post by Michael Paquier
Post by Fujii Masao
In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?
The real problem currently is that it is possible to have a segment file
not marked as .done during recovery when stream connection is abruptly cut
when this segment is switched, marking it as .ready in archive_status and
simply letting this segment in pg_xlog because it will neither be recycled
nor removed. I have not been able to look much at this code these days, so I
am not sure how invasive it would be in back-branches, but perhaps we should
try to improve code such as when a segment file is switched and connection
to the is cut, we guarantee that this file is completed and marked as .done.
I have spent more time on that, with a bit more of underground work...
First, the problem can be reproduced most of the time by running this
psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate
What about fixing this problem directly? That is, we can make walreceiver
check whether the end of last received WAL data is the end of current WAL file
or not, and then close the WAL file and create .done file if the test is true.
This is not a perfect solution. If the standby crashes during very
short interval
(i.e., after closing the WAL file and before creating .done file), the problem
would happen again. But it can really rarely happen, so I don't think that it's
worth fixing the corner case at least in back-branches. Of course, we can
find out the "perfect" solution for the master, though.
Sounds reasonable, for back-branches. Although I'm still worried we might
miss some corner-case unless we go with a more wholesale solution.
+1
At least for master, we should consider changing the way the archiving works
so that we only archive WAL that was generated in the same server. I.e. we
should never try to archive WAL files belonging to another timeline.
I just remembered that we discussed a different problem related to this some
time ago, at
The conclusion of that was that at promotion, we should not archive the
last, partial, segment from the old timeline.
So, the last, partial, segment of the old timeline is never archived?
If yes, I'm afraid that the PITR to the old timeline cannot replay the
last segment. No? Or you're thinking to change the code so that
the segment of new timeline is replayed in that case?
In summary, let's do something small for back-branches, like what you
suggested. But for master, let's do bigger changes to the timeline handling.
Yep.

Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier
2014-10-22 13:20:51 UTC
Permalink
Post by Fujii Masao
I found one problem in the 0002 patch. The patch changes the recovery so that
it creates .done files for every WAL files which exist in pg_xlog directory at
the end of recovery. But even WAL files which will have to be archived later
can exist in pg_xlog at that moment. For example, the latest, recycled and
fully-written-but-not-archived-yet (i.e., maybe having .ready files) WAL files.
The patch wrongly prevents them from being archived at all.
Re-looking at patch 2, yes you are right. Even if it was mentioned that we
should do that for a node that had promotion triggered it was not done this
way as a check on CheckForStandbyTrigger() is actually missing. Attached is
an updated patch.
Post by Fujii Masao
ISTM that the 0001 patch has the similar problem. Please imagine the following
scenario.
In this case, the patch seems to make the restartpoint recycle even WAL files
which have .ready files and will have to be archived later. Thought?
Right, that's really backward. This was the approach taken before c9cc7e0,
and this commit actually prevents removal of unarchived WAL files during
recovery.
--
Michael
Fujii Masao
2014-10-08 13:59:25 UTC
Permalink
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
Post by Michael Paquier
On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais <
Post by Jehan-Guillaume de Rorthais
We kept the WAL files and log files for further analysis. How can we help
regarding this issue?
Commit c2f79ba has added as assumption that the WAL receiver should always
enforce the create of .done files when WAL files are done being streamed
(XLogWalRcvWrite and WalReceiverMain) or archived
(KeepFileRestoredFromArchive). Then using this assumption 1bd42cd has
changed a bit RemoveOldXlogFiles, removing a check looking if the node is
in recovery. Now, based on the information given here yes it happens that
there are still cases where .done file creation is not correctly done,
leading to those extra files. Even by looking at the code, I am not
directly seeing any code paths where an extra call to XLogArchiveForceDone
would be needed on the WAL receiver side but... Something like the patch
attached (which is clearly a band-aid) may help though as it would make
files to be removed even if they are not marked as .done for a node in
recovery. And this is consistent with the pre-1bd42cd.
1. Where do the FF files come from? In 9.2, FF-segments are not supposed to
created, ever.
Since this only happens with streaming replication, the FF segments are
probably being created by walreceiver. XLogWalRcvWrite is the function that
opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite
opens the file corresponding the start position in the message received from
the master. There is no check that the start position is valid, though; if
the master sends a start position in the FF segment, walreceiver will
merrily write it. So the problem could be in the walsender side. However, I
don't see anything wrong there either.
I think we should add a check in walreceiver, to throw an error if the
master sends an invalid WAL pointer, pointing to an FF segment.
2. Why are the .done files sometimes not being created?
I may have an explanation for that. Walreceiver creates a .done file when it
closes an old segment and opens a new one. However, it does this only when
it's about to start writing to the new segment, and still has the old
segment open. If you stream the FE segment fully, but drop replication
connection at exactly that point, the .done file is not created. That might
sound unlikely, but it's actually pretty easy to trigger. Just do "select
pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a
restart.
The creation of the .done files seems quite unreliable anyway. If only a
portion of a segment is streamed, we don't write a .done file for it, so we
still have the original problem that we will try to archive the segment
after failover, even though the master might already have archived it.
I believe the idea was that the server that generates a WAL segment is
always responsible for archiving it. A standby should never attempt to
archive a WAL segment that was restored from the archive, or streamed from
the master.
In that thread, it was not discussed what should happen to WAL files that an
admin manually copies into pg_xlog of the standby. Should the standby
archive them? I don't think so - the admin should copy them manually to the
archive too, if he wants them archived. It's a good and simple rule that the
server that generates the WAL, archives the WAL.
Instead of creating any .done files during recovery, we could scan pg_xlog
at promotion, and create a .done file for every WAL segment that's present
at that point. That would be more robust. And then apply your patch, to
recycle old segments during archive recovery, ignoring .done files.
What happens if a user shutdowns the standby, removes recovery.conf and
starts the server as the master? In this case, no WAL files have .done status
files, so the server will create .ready and archive all of them. Probably this
is problematic. So even if we adopt your idea, ISTM that it's better to create
.done file whenever WAL file is fullly streamed and restored.

Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Heikki Linnakangas
2014-10-08 14:38:58 UTC
Permalink
Post by Fujii Masao
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
Post by Michael Paquier
Instead of creating any .done files during recovery, we could scan pg_xlog
at promotion, and create a .done file for every WAL segment that's present
at that point. That would be more robust. And then apply your patch, to
recycle old segments during archive recovery, ignoring .done files.
What happens if a user shutdowns the standby, removes recovery.conf and
starts the server as the master?
Um, that's not a safe thing to do anyway, is it?

- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-***@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier
2014-10-08 23:23:53 UTC
Permalink
Post by Heikki Linnakangas
Post by Fujii Masao
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
Post by Michael Paquier
Instead of creating any .done files during recovery, we could scan pg_xlog
at promotion, and create a .done file for every WAL segment that's present
at that point. That would be more robust. And then apply your patch, to
recycle old segments during archive recovery, ignoring .done files.
What happens if a user shutdowns the standby, removes recovery.conf and
starts the server as the master?
Um, that's not a safe thing to do anyway, is it?
That's not safe as it bypasses all the consistency checks of promotion.
Now, it is also something that repmgr for example does as far as I recall
to do a node "promotion". What if we simply document the problem properly
then? The apparition of those phantom WAL files is more scary than a user
or a utility that does a promotion with a server restart. Not to mention as
well that users as free to add themselves files to pg_xlog.
--
Michael
Michael Paquier
2014-10-09 08:09:28 UTC
Permalink
On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
1. Where do the FF files come from? In 9.2, FF-segments are not supposed to
created, ever.
I think we should add a check in walreceiver, to throw an error if the
master sends an invalid WAL pointer, pointing to an FF segment.
Attached is a patch for that. This would be needed for PG < 9.3 if applied.
Regards,
--
Michael
Loading...