Connection hang with HADR takeover by force and old primary server is down

Mark A - 02 Feb 2006 01:20 GMT
DB2 ESE 8.2.3 (FP10) for Linux

We are experiencing a connection hang of 10-15 minutes in the following
HADR and automatic client reroute scenario:

01 server is primary database
02 server is standby database

a. Applications are connected to the database on the 01 server.
b. The 01 server is shut down.
c. We run takeover db by force on the 02 server (force is necessary because
the databases are no longer in peer state).
d. A user logged on directly to the 02 server can connect to the new primary
database without delay as soon as the takeover completes.
e. Remote clients take about 10-15 minutes to get any response back
(the wait time varies each time, and even varies somewhat by app tier blade).
f. After the 10-15 minute delay, automatic client reroute on the remote clients
reconnects to the alternate server 02 after an SQL retry.

However, if the following scenario occurs, there is no delay:

a. Applications are connected to the database on the 01 server.
b. The db2 instance is stopped with force on the 01 server (but the 01 server
is still up and can be pinged).
c. We run HADR takeover db by force on the 02 server (not in peer state).
d. After only a 5-10 second delay, automatic client reroute reconnects to
the alternate server 02 after an SQL retry.

Both scenarios behave the same way (with the same delays) whether we use
the type 2 driver (SQL commands submitted from a remote client via CLI) or a
type 4 client (Websphere 6).

Does anyone know why the connections to the 01 server hang for 10-15
minutes after an HADR takeover by force on 02 only when the 01 server is
completely down, while there is no delay if the 01 server is still reachable
(but the instance is down)?

We tried setting the db2 type 2 client's registry variable
db2tcp_client_rcvtimeout=15 (15 seconds). The registry value seems to have
helped with the waiting issue (the connection was released after about 1
minute), but it also seems to have severed the connection (that is, there was
no automatic client reroute retry). The following error message was received:

SQL30081N  A communication error has been detected.  Communication protocol
being used: "TCP/IP".  Communication API being used: "SOCKETS". Location
where the error was detected: "10.34.9.139".  Communication function
detecting the error: RecvTimeout".  Protocol specific error code(s):  "4",
"*", "*".  SQLSTATE=08001

Then after retry:
Communication function detecting the error: "selectForRecvTimeout".
Protocol specific error code(s):  "4", "*", "*".  SQLSTATE=08001
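
The difference between the two scenarios, and the effect of a receive timeout, can be reproduced in miniature with plain sockets. The sketch below is not DB2 code and says nothing about how the DB2 client is implemented. It assumes a loopback listener stands in for the server, and the helper names `start_server` and `probe` are hypothetical. A peer that closes the connection (similar to an instance being stopped while the host stays up) is noticed at once, whereas a peer that goes silent is only noticed when the client's own receive timeout fires.

```python
import socket
import threading


def start_server(close_after_accept):
    """Start a loopback listener and return (port, stop_event).

    close_after_accept=True  -> the peer closes the connection right away
                                (process stopped, host still up).
    close_after_accept=False -> the peer keeps the connection open and
                                never sends anything (a stand-in for a
                                server that has gone silent).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    stop = threading.Event()

    def serve():
        conn, _ = listener.accept()
        if not close_after_accept:
            stop.wait()  # hold the connection open without replying
        conn.close()
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return port, stop


def probe(port, timeout_seconds):
    """Connect, wait for data, and report how the wait ended."""
    with socket.create_connection(("127.0.0.1", port),
                                  timeout=timeout_seconds) as client:
        try:
            data = client.recv(1024)
        except socket.timeout:
            # Nothing arrived in time; the error goes to the caller,
            # who has to decide whether to retry elsewhere.
            return "timeout"
        # An empty read means the peer closed the connection.
        return "closed" if data == b"" else "data"


if __name__ == "__main__":
    port, _ = start_server(close_after_accept=True)
    print(probe(port, 2.0))   # peer closed the connection

    port, stop = start_server(close_after_accept=False)
    print(probe(port, 0.5))   # peer silent until the timeout fires
    stop.set()
```

In the silent case the simulated server is still reachable, so this only approximates a host that is powered off. What it illustrates is that, without a timeout, `recv` has nothing to wake it up.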
Steve Pearson (news only) - 03 Feb 2006 18:50 GMT
You might want to look at your TCP keepalive (system configuration).
We have seen cases where Automatic Client Reroute suffers a response
delay due to the fact that it does not learn of the connection failure
in a timely fashion.  This shows up where the socket is broken in the
comms layer (such as when the server host is shut down) but doesn't
show up where the database server connection has an explicit error
returned from DB2;  those symptoms seem highly correlated to what you
report.
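
As a rough illustration of what the keepalive settings control, the sketch below enables keepalive on a single socket and estimates how long an idle connection can sit before a dead peer is detected. It is an assumption-laden example, not a diagnosis of the case above: the DB2 client's own sockets are governed by the system configuration (on Linux, the `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl` and `net.ipv4.tcp_keepalive_probes` sysctls), and the function names here are hypothetical. Keepalive probes are only sent on connections that are otherwise idle.

```python
import socket


def worst_case_detection_seconds(idle, interval, probes):
    """Upper bound on the time to notice a dead peer on an idle connection.

    idle     -- seconds of silence before the first probe is sent
    interval -- seconds between unanswered probes
    probes   -- unanswered probes before the connection is dropped
    """
    return idle + interval * probes


def enable_keepalive(sock, idle, interval, probes):
    """Turn on TCP keepalive for one socket, with per-socket timings
    where the platform exposes them (the option names vary by OS)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    settings = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in settings:
        option = getattr(socket, name, None)
        if option is not None:  # skip options this platform lacks
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    # Commonly documented Linux defaults: 7200 s idle, 75 s interval, 9 probes.
    print(worst_case_detection_seconds(7200, 75, 9))

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(sock, idle=60, interval=10, probes=3)
    print(worst_case_detection_seconds(60, 10, 3))
    sock.close()
```

With the commonly documented Linux defaults the estimate comes to 7875 seconds, which is why shortening these values is a usual first step when clients are slow to notice that a server host has gone away.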

Regards,
-Steve P.
----------------------------
Steve Pearson
IBM DB2 UDB for LUW Development
Portland, OR, USA
 