Hello,
I am having problems with my ER root server: it crashes often,
about 1-2 times per day. The system is UnixWare 7 with IDS 9.20.
The error is:
16:20:23 Informix Dynamic Server 2000 Version 9.20.UC3
16:20:23 Who: Session(92, informix@sco00, 0, 390625404)
Thread(80, CDRACK_1, 17462bd8, 3)
File: mtex.c Line: 408
16:20:23 Results: Exception Caught. Type: MT_EX_OS, Context: mem
16:20:23 Action: Please notify Informix Technical Support.
16:20:23 stack trace for pid 17198 written to /home2/informix/tmp/af.4380c56
16:20:23 See Also: /home2/informix/tmp/af.4380c56
......
16:20:42 mtex.c, line 408, thread 80, proc id 17198, No Exception Handler.
16:20:42 Fatal error in ADM VP at mt.c:11029
16:20:42 Unexpected virtual processor termination, pid = 17198, exit = 0x100
16:20:43 PANIC: Attempting to bring system down
16:20:43 semctl: errno = 22
The root server always crashes in CDRACK_0 or CDRACK_1.
The problem appeared after I defined new replicates between 2 leaf
servers connected directly to the root server.
The architecture is:
sco00: root server
sco01, sco40, sco42, sco43, sco44, sco45, sco46: leaf servers connected
directly to sco00.
The new replicates are primary-target: data is replicated from
sco40-sco46 to sco01. These replicates carry a lot of data, about 30-50
rows (transactions) per minute. I think this volume of data causes the
crash, because I did not have this problem before I defined the new
replicates.
Here is part of the onconfig and of the generated af file:
--onconfig
CDR_LOGBUFFERS 16384
CDR_EVALTHREADS 1,2 # evaluator threads (per-cpu-vp,additional)
CDR_DSLOCKWAIT 300 # DS lockwait timeout (seconds)
#CDR_QUEUEMEM 4096 # Maximum amount of memory for any CDR queue (Kbytes)
CDR_QUEUEMEM 16384
CDR_LOGDELTA 30 # % of log space allowed in queue memory
CDR_NUMCONNECT 100 # Expected connections per server
CDR_NIFRETRY 300 # Connection retry (seconds)
CDR_NIFCOMPRESS 5 # Link level compression (-1 never, 0 none, 9 max)
--onstat -g ath
75 1816b1f0 174614d8 2 sleeping secs: 52 3cpu CDRNsT117
77 1807fcf8 17461a98 2 cond wait CDRBlbslp 3cpu CDRBLOB_0
78 182a2178 17462058 2 cond wait CDRBlbslp 3cpu CDRBLOB_1
79 182af1a0 17462618 2 cond wait CDRAckslp 1cpu CDRACK_0
*80 182bc1a0 17462bd8 2 running 3cpu CDRACK_1
81 182c91f0 17463198 2 cond wait CDRDssleep 1cpu CDRD_0
82 182d7178 17463758 2 cond wait CDRDssleep 1cpu CDRD_1
83 182e4178 17463d18 2 cond wait netnorm 1cpu CDRNr46
onstat -g stk 80 light:
Informix Dynamic Server 2000 Version 9.20.UC3 -- On-Line -- Up 1 days 05:11:08 -- 433028 Kbytes
Stack for thread: 80 CDRACK_1
base: 0x182c0018
len: 36864
pc: 0x0856f0eb
tos: 0x182c7dd8
state: running
vp: 3
0x08863d98 (*nosymtab*)0x8863d98
What can I do to avoid this problem? Can I tune any parameters in the
onconfig file?
I am also thinking of defining sco01 as a nonroot server, with
sco40..sco46 as leaf servers connected to sco01. Is that a good idea? I
expect that with this architecture, data from sco40-46 would be
replicated directly to sco01 instead of passing through sco00, so sco00
would not be involved. Is that correct?
Thank you in advance...
Cristian
Madison Pruet - 28 Oct 2005 16:29 GMT
> Hello,
>
[quoted text clipped - 33 lines]
> row(transaction)/ min. i think that this volume of data cause the crash
> because before i define this new replicates I haven't this problem.
I don't think this is the case. We have customers replicating in the
thousands of transactions a second.
> Here is part of onconfig and af. file genereted:
> --onconfig
[quoted text clipped - 44 lines]
>
> 0x08863d98 (*nosymtab*)0x8863d98
We are going to have to get the stack somehow. It might be worth setting
AFDEBUG so that, instead of just crashing, the server hangs. That would
make it possible to attach a debugger to the server while it is in the
process of crashing and get a stack.
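A minimal sketch of the AFDEBUG suggestion, assuming AFDEBUG is set as an environment variable with the value 1 in the shell that starts the server, so that oninit inherits it. The restart commands are left commented out because they take the instance down, and the choice of debugger on UnixWare is not covered here.

```shell
# Assumption: run as user informix, in the same shell that will start the server.
AFDEBUG=1
export AFDEBUG

# Restart the server from this shell so that it picks up the variable.
# Left commented out because this stops the running instance:
# onmode -ky
# oninit
```

With the variable in place, the failing process should stay suspended after the assertion failure, so that a debugger can be attached to the pid reported in the message log.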
> What can i do to avoid this problem ? Can I tuning any parameters on
> onconfig file?
I would try turning off compression. I don't know if that would help, but
it's worth a try.
> Also I think to define server sco01 as nonroot server and sco40..sco46
> leaf servers connected to sco01..is it a good idea ? My expectation is
> this architecture avoid replication from sco40-46 to sco01 through
> sco00 so the data will be replicated directly and sco00 won't be
> implicated...is it correct ?
sco00 will forward the transactions to sco01. Sco01 may not participate
in the replicated tables, but it will participate in the network flow.
> Thank you in advance...
> Cristian
cristizaharioiu - 31 Oct 2005 10:48 GMT
Thank you Madison,
As a first step I will try turning off compression. Is it necessary to
turn off compression on sco00 only, or on both sco00 and sco01?
mpruet@comcast.net - 31 Oct 2005 14:03 GMT
> Thank you Madison,
>
> At the first step I would try to turn off compression ...it's necessary
> to turn off compression on sco00 or both sco00 and sco01 ?
Compression is negotiated. That means that you only have to set it on
one of the nodes.
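As a sketch only, the change on one node (sco00 is used here purely as an example) would be a one-line edit to the onconfig posted earlier in the thread. The values come from the comment in that onconfig, which lists -1 as "never" and 0 as "none". How the two ends' settings combine during negotiation, and whether a restart is needed, should be checked against the ER documentation for your version.

```
# onconfig on one node, e.g. sco00 (was: CDR_NIFCOMPRESS 5)
CDR_NIFCOMPRESS 0 # Link level compression (-1 never, 0 none, 9 max)
# Alternative to try if 0 still results in a compressed link: -1 (never)
```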
caver - 31 Oct 2005 16:06 GMT
Cristian
In the long term, I would suggest upgrading to at least version 9.4 - I
had lots of ER crashes at 9.21, but after I was finally able to get to
9.4, ER has been much more robust. Warning - for my configuration it
took a lot of work to upgrade my root server to 9.4 - but it was worth
it.
Daniel