Not sure what you're trying to do here. You shouldn't activate the
standby at all; like Mark said, it needs to be sitting there in
rollforward pending state. It becomes active, as in a usable database,
when it does a takeover to become the primary database.
There are ways to make the standby available, but then it won't be a
standby anymore, from a HADR standpoint.
/T
> > Do not do a rollforward on a HADR standby database, as it must be in
> > rollforward pending state at all times. The original problem may have been
> > [...]
> The question now is...how do you activate a standby? I must be missing
> something that is key here.
I can't say for sure why the db refused to activate, but presumably
something happened prior to that to put it in a state where it could
not. I'm not familiar with what that may have been, as the "activate
db" command is supposed to be supported on an HADR standby.
- if the standby is not active, it should cause activation
- if the standby is already active, it will return a warning saying as
much
Note that activation of a standby does not mean normal user connections
can be made. Rather, it means that the db server is running and, if it
can connect to its partner, performing as an HADR standby.
If you examine the diagnostic log for the database (db2diag.log in your
db2dump directory), you may be able to find something about the sequence
of events leading to this and the other error messages you received.
> Then I stopped hadr
> on the standby and did an activate and it seemed to not have a problem
> with that but when I started hadr as standby again it stated it was
> successful but still would not connect to the primary.
There may have been some other steps in there. When a standby is
stopped, the db goes into an inactive rollforward-pending mode. In
this mode, the "activate db" command should return an error like this:
SQL1117N A connection to or activation of database "MYDB" cannot be made
because of ROLL-FORWARD PENDING. SQLSTATE=57019
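As a rough illustration of the activation behaviour described above, here is a toy Python model of how "activate db" responds depending on the database's HADR role and state. It is a sketch under stated assumptions, not DB2 code: the `Role` enum, the `activate_db` function and the returned strings are hypothetical; only the SQL1117N text comes from the message quoted above.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical labels for the HADR role recorded for a database
    STANDBY = "STANDBY"
    STANDARD = "STANDARD"


def activate_db(role: Role, active: bool, rollforward_pending: bool) -> str:
    """Toy model of the 'activate db' outcomes described in this thread."""
    if role is Role.STANDBY:
        # A standby is always roll-forward pending, so that flag does not
        # block activation while the role is still STANDBY.
        if active:
            return "warning: database already active"
        # The db server starts in the standby role and tries to reach its
        # partner; normal user connections are still not allowed.
        return "activated as HADR standby"
    if rollforward_pending:
        # HADR was stopped on the ex-standby, but it is still roll-forward pending
        return ("SQL1117N A connection to or activation of database cannot be "
                "made because of ROLL-FORWARD PENDING. SQLSTATE=57019")
    return "activated as standard database"
```

For example, `activate_db(Role.STANDBY, active=False, rollforward_pending=True)` models the deactivate/activate cycle on a healthy standby, while the same call with `Role.STANDARD` models the ex-standby after HADR was stopped.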
In any case, once you got the ex-standby started as a non-HADR
database, it most likely would be impossible for it to reconnect
successfully as a standby again except by reinitialization from scratch
(new db restore). To bring the ex-standby out of rollforward-pending
requires the rollforward to be completed there. That generally puts
that database on what we refer to as a "new log chain". In other
words, from the perspective of the db's history, as reflected in the db
log, that db has diverged and is on a different path from the primary
now.
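To make the "new log chain" idea concrete, here is a deliberately simplified Python model. The assumption (labelled, and not DB2's internal log format) is that a database's log position can be summarized as a chain number plus an offset within that chain, and that completing a rollforward on the ex-standby starts a new chain. Under that assumption the pair no longer validates as primary and standby, however far the logs are replayed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPosition:
    chain: int   # hypothetical log chain number
    offset: int  # hypothetical position within that chain


def complete_rollforward(pos: LogPosition) -> LogPosition:
    # Ending roll-forward recovery puts the database on a new log chain
    # (toy assumption: the chain number simply increments).
    return LogPosition(chain=pos.chain + 1, offset=pos.offset)


def can_pair(primary: LogPosition, standby: LogPosition) -> bool:
    # The standby must share the primary's history and not be ahead of it.
    return standby.chain == primary.chain and standby.offset <= primary.offset


primary = LogPosition(chain=1, offset=5000)
standby = LogPosition(chain=1, offset=4800)
print(can_pair(primary, standby))                        # True
print(can_pair(primary, complete_rollforward(standby)))  # False
```

In this model, once the histories diverge the only way back is to rebuild the standby from the primary's history, which corresponds to the "new db restore" mentioned above.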
> I couldn't do a
> db2pd or a db2 get snapshot to see what state the standby thought it
> was in. When I issued a db2 get snapshot it stated I could not perform
> that action unless the db was activated.
If you looked in the db2diag.log files for both primary and standby you
would likely find messages indicating that the standby had attempted to
connect with the primary but failed in the handshake validations. When
this is rejected, the standby deactivates. (There's no sense in trying
again and again, as this is not a transient error.)
Without the db being active, there's no shared memory for db2pd to
attach to, nor can the get snapshot be performed.
> Basically, I was in a state where I couldn't get an active standby and I
> couldn't do anything as an active standard. I finally had to do the
> rollforward so I was able to get it to an active standard mode.
My guess is a previously issued rollforward on the standby helped it
get into that state.
If you are concerned that HADR is behaving incorrectly, please open a
case with IBM service and provide your understanding of what happened
along with the db2diag.log files from both primary and standby covering
the entirety of the relevant time period.
Regards,
- Steve P.
--
Steve Pearson, IBM DB2 for Linux, UNIX, and Windows, IBM Software Group
DB2 "Portland" Development Team, IBM Beaverton Lab, Beaverton, OR, USA
shorti - 29 Nov 2006 19:45 GMT
> I can't say for sure why the db refused to activate, but presumably
> something happened prior to that to put it in a state where it could
> not.
> Note that activation of a standby does not mean normal user connections
> can be made. Rather, it means that the db server is running and, if it
> can connect to its partner, performing as an HADR standby.
Yes... this is what I am trying to determine. It seems that the standby
was in an inactive standby mode and I could not reach active standby.
After reading some more, I found that the proper procedure was to make
the Primary HADR into a Standard (non-HADR) database, then db2 deactivate
the Standby and stop hadr.
I did not change the Primary HADR to Standard. Instead, the first
thing I did was the db2 deactivate. After that I was not able to get
the Standby to communicate with the Primary, so I assumed that the
standby remained in some sort of Inactive Standby mode. I tried
stopping and starting HADR, etc., but it would not connect with the
Primary.
> If you examine the diagnostic log for the database (db2diag.log in your
> db2dump directory), you may be able to find something about the sequence
> of events leading to this and the other error messages you received.
With your suggestion I did find it had a problem with a log:
FUNCTION: DB2 UDB, data protection, sqlpgArchiveLogFile, probe:3160
MESSAGE : Failed to archive log file S0005791.LOG to
/db2/backups/archive_0/xxxxxxx/THEDB/NODE0000/C0000010/ from
/db2/logs/active_0/NODE0000/ with rc = -2045837302.
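The return code in that message is printed as a signed decimal integer, while DB2 internal return codes are usually looked up in hexadecimal (for example with the `db2diag -rc` option, on releases that provide it). A minimal Python helper, assuming the value is a signed 32-bit code:

```python
def rc_to_hex(rc: int) -> str:
    """Convert a signed 32-bit return code from db2diag.log to hex."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value
    # as an unsigned 32-bit number.
    return f"0x{rc & 0xFFFFFFFF:08X}"


print(rc_to_hex(-2045837302))  # 0x860F000A
```

As far as I know, 0x860F000A is commonly reported as SQLO_FNEX ("file not found"), which would point at a missing file or path during the archive step; confirm against the documentation for your release before acting on it.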
So now I wonder what I should have done to recover. Evidently, the
rollforward was the wrong thing to do unless that was the last resort
to get it to Standard.
> In any case, once you got the ex-standby started as a non-HADR
> database, it most likely would be impossible for it to reconnect
> [...]
> along with the db2diag.log files from both primary and standby covering
> the entirety of the relevant time period.
No... I don't feel that DB2 is the problem, and never have. Now that I
see that it looks like I corrupted the logging by not following
procedure, I would like to know how this could be corrected. For
instance, what if the Standby has a power failure? A few minutes later
it is back up and has this issue. Can I copy the logs to the Standby
and have it replay starting with the 'bad' log? Or am I stuck just
doing a database restore?
Thanks for the great info Steve!
Steve Pearson (news only) - 29 Nov 2006 22:05 GMT
Well... I'm not sure exactly what the sequence of events was, so it's
hard to propose a response. A thorough examination of the db2diag.log
files might reveal some additional information. But we can discuss a
bit more here about the initial concern you mentioned.
Assuming nothing else goes on, it should be possible to issue
"deactivate db" followed by "activate db" on a db configured as HADR
standby, and on the activate that db server should start up and resume
the activities of the standby role. That is, no special recovery
procedure should be needed after deactivation of an HADR standby
database.
Some things that could prevent the standby from properly activating:
- change in HADR config on primary or standby leading to mismatch
= if comms connects but handshake fails, standby deactivates
= if comms connection can't be made, standby stays active and retries
in a loop
- primary and standby pair validation mismatch (e.g., mismatching log
chains or standby ahead of primary); standby deactivates
- missing or corrupt log files/data on standby; standby panics or
deactivates
- missing needed log files on primary (e.g., primary got ahead while
standby was away, and archives of intervening log files are for some
reason not retrievable on the primary); when a needed log file is not
available from primary, standby deactivates
- comms port collision/conflict with another db or other application
= standby may have its connection dropped
= primary may report comms related errors or config mismatch
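The list above can be read as a small triage table: what the standby does in each case, and therefore whether it stays up long enough for db2pd or a snapshot to work (neither can attach to an inactive database). The following Python sketch encodes that reading; the condition keys and the `triage` helper are illustrative names, not DB2 terms, and the port-conflict entry deliberately assumes nothing beyond what the list says.

```python
from typing import Dict, Optional, Tuple

# condition key -> (what the standby does, does it stay active?)
STANDBY_BEHAVIOUR: Dict[str, Tuple[str, Optional[bool]]] = {
    "config_mismatch_handshake_fails": ("deactivates", False),
    "config_mismatch_no_connection": ("stays active, retries in a loop", True),
    "pair_validation_mismatch": ("deactivates", False),
    "missing_or_corrupt_standby_logs": ("panics or deactivates", False),
    "needed_log_missing_on_primary": ("deactivates", False),
    # The list only says the connection may be dropped; activity is unknown.
    "port_conflict": ("connection may be dropped", None),
}


def triage(condition: str) -> str:
    """Suggest where to look first for a given (hypothetical) condition key."""
    behaviour, stays_active = STANDBY_BEHAVIOUR[condition]
    if stays_active:
        # Shared memory exists, so db2pd and snapshots can attach.
        return f"standby {behaviour}: db2pd and get snapshot can attach"
    # Standby is down (or its state is unknown): rely on the diagnostic logs.
    return f"standby {behaviour}: check db2diag.log on both primary and standby"
```

For instance, `triage("pair_validation_mismatch")` points at the db2diag.log files, matching the advice earlier in the thread.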
Regards,
- Steve P.
--
Steve Pearson, DB2 for Linux, UNIX, and Windows Development, IBM
Software Group
DB2 "Portland" Team, IBM Beaverton Lab, Beaverton, OR, USA