Hello,
We have an Informix 7.31.UC4 server running on HP-UX 11.11 with
High-Availability Data Replication (HDR). Due to network problems,
replication stopped. The primary database server ran without any problems
for about 24 hours and then, with no apparent warning signs, simply
crashed. The online.log is included below. I was under the impression that
replication failures do not cause a crash of the primary server. Thanks in
advance to anyone who can shed some light.
14:13:08 DR: Turned off on primary server
14:13:09 Checkpoint Completed: duration was 0 seconds.
14:13:09 DR: Cannot connect to secondary server
14:13:19 DR: Primary server connected
14:13:20 DR: Receive error
14:13:20 DR: Failure recovery error (2)
14:13:22 DR: Turned off on primary server
14:13:23 Checkpoint Completed: duration was 0 seconds.
14:13:23 DR: Cannot connect to secondary server
14:13:34 DR: Primary server connected
14:13:34 DR: Receive error
14:13:34 DR: Failure recovery error (2)
14:13:34 Assert Failed: No Exception Handler
14:13:34 Informix Dynamic Server Version 7.31.UC4
14:13:34 Who: Session(24, informix@, 0, -635431992)
Thread(55, dr_prsend, da1dc9ec, 1)
File: mtex.c Line: 446
14:13:34 Results: Exception Caught. Type: MT_EX_OS, Context: mem
14:13:34 Action: Please notify Informix Technical Support.
14:13:34 stack trace for pid 15822 written to /tmp/af.41f34fe
14:13:47 See Also: /tmp/af.41f34fe, shmem.41f34fe.0
14:14:05 Error writing '/tmp/shmem.41f34fe.0' errno = 28
14:14:05 mtex.c, line 446, thread 55, proc id 15822, No Exception Handler.
14:14:05 PANIC: Attempting to bring system down
14:26:07 Segment locked: addr=0xc34dc000, size=491446272
Wed Jul 30 14:26:08 2003
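One detail in the log above is worth decoding: the shared-memory dump failed with errno = 28, which on most Unix systems is ENOSPC (no space left on device), so /tmp apparently filled up while the dump was being written. The following is only a minimal sketch for pulling such lines out of an online.log; the function name decode_write_errors is made up for illustration, and the errno mapping covers only the value seen here.

```shell
# Sketch: find "Error writing" lines in an online.log (read from stdin)
# and print the file, the errno value, and a short meaning.
# decode_write_errors is a hypothetical helper, not an Informix tool.
decode_write_errors() {
    sed -n "s/.*Error writing '\([^']*\)' errno = \([0-9][0-9]*\).*/\1 \2/p" |
    while read -r file code; do
        case "$code" in
            28) meaning="ENOSPC (no space left on device)" ;;
            *)  meaning="see errno.h" ;;
        esac
        echo "$file errno=$code $meaning"
    done
}

# Example with the line from the log above:
echo "14:14:05 Error writing '/tmp/shmem.41f34fe.0' errno = 28" | decode_write_errors
```

Checking free space with df -k /tmp on the server would confirm whether the filesystem is still full.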
Sandor - 31 Jul 2003 12:28 GMT
Hi,
It looks like you have a problem in your server.
You should send /tmp/af.41f34fe to Informix Technical Support.
They will help you.
Bye,
Sandor
Everett Mills - 31 Jul 2003 16:45 GMT
You say it ran for 24 hours. Was that 24 hours after the secondary died?
If so, I'd say you ran out of log space. HDR keeps its logical logs on
the primary until the secondary can read them. If your secondary is down
and you know it will stay down for a significant amount of time (how long
depends on how much log space you have), you should turn HDR off to avoid
this type of situation; otherwise the primary will lock up and eventually
crash. To do this, run:
onmode -d standard
on the primary.
Once you have the secondary ready to go, you will need to restart HDR by
restoring a backup to it.
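As a rough sketch of that step (an illustration, not a tested procedure): look at the logical-log status first, then switch the primary to standard mode. The run wrapper, the DRY_RUN variable, and the hdr_off function are names made up here; onstat -l is the usual logical-log report, but check its output on your own version before relying on it.

```shell
# DRY_RUN=1 (the default) only prints the commands.
# Set DRY_RUN=0 on the primary, as user informix, to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: inspect log usage, then turn replication off.
hdr_off() {
    run onstat -l             # logical-log status, to see how full the logs are
    run onmode -d standard    # switch the primary to standard mode
}

hdr_off
```

The dry-run default is deliberate, so that pasting the snippet into a shell cannot change the server mode by accident.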
If your primary is down and will not start because the secondary is
missing, you may use these commands:
onmode -yuk (to stop any instance stuck in the initialization process)
oninit -D (-D is undocumented; it tells the server to start even if
the connection to the secondary cannot be made)
This stops the stalled initialization and restarts the instance without
the secondary. Once the server is restarted, you may run onmode -d
standard, and then restart HDR (restore a backup, etc.) when your
secondary is available.
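Put together, the restart sequence could be scripted roughly as follows. This is only a sketch under the assumptions already stated: oninit -D behaves as described above on 7.31, and the commands are run as user informix on the primary. The run wrapper and the restart_without_secondary function are hypothetical names.

```shell
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: bring the primary up without its secondary.
restart_without_secondary() {
    run onmode -yuk               # stop an instance stuck in initialization
    run oninit -D || return 1     # start without the secondary; stop here on failure
    run onmode -d standard        # stay in standard mode until the secondary is ready
}

restart_without_secondary
```

Re-establishing replication afterwards (restoring a backup to the secondary, etc.) is a separate manual step and is not covered by this sketch.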
--EEM