Database Forum / Informix Topics / October 2005

ER problem---CDRACK cause rootserver crash

cristizaharioiu - 28 Oct 2005 09:47 GMT
Hello,

I have some problems with my ER root server: it crashes often, about
1-2 times per day. The system is UnixWare 7 with IDS 9.20.
The error is:

16:20:23  Informix Dynamic Server 2000 Version 9.20.UC3
16:20:23   Who: Session(92, informix@sco00, 0, 390625404)
               Thread(80, CDRACK_1, 17462bd8, 3)
               File: mtex.c Line: 408
16:20:23   Results: Exception Caught. Type: MT_EX_OS, Context: mem
16:20:23   Action: Please notify Informix Technical Support.
16:20:23  stack trace for pid 17198 written to /home2/informix/tmp/af.4380c56
16:20:23   See Also: /home2/informix/tmp/af.4380c56
......
16:20:42  mtex.c, line 408, thread 80, proc id 17198, No Exception Handler.
16:20:42  Fatal error in ADM VP at mt.c:11029
16:20:42  Unexpected virtual processor termination, pid = 17198, exit = 0x100

16:20:43  PANIC: Attempting to bring system down
16:20:43  semctl: errno = 22

The root server always crashes in CDRACK_0 or CDRACK_1.
The problem appeared after I defined new replicates between 2 leaf
servers connected directly to the root server.
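To check whether every crash really names a CDRACK thread, the assertion-failure entries in the message log can be tallied by thread name. The following is only a minimal sketch: the function name `count_failing_threads` is hypothetical, and the pattern assumes each entry contains a `Thread(id, name, ...)` line laid out like the one in the log excerpt above.

```python
import re
from collections import Counter

# Assumption: each assertion-failure entry has a "Thread(<id>, <name>, ...)"
# line formatted like the excerpt quoted in this post.
THREAD_RE = re.compile(r"Thread\(\s*\d+\s*,\s*([A-Za-z0-9_]+)")


def count_failing_threads(log_text):
    """Count how often each thread name appears in assertion-failure entries."""
    return Counter(THREAD_RE.findall(log_text))


# Sample text copied from the log excerpt in this post.
sample = """\
16:20:23   Who: Session(92, informix@sco00, 0, 390625404)
               Thread(80, CDRACK_1, 17462bd8, 3)
               File: mtex.c Line: 408
"""

for name, hits in count_failing_threads(sample).most_common():
    print(name, hits)
```

In practice the input would be the full text of the server message log rather than the short sample used here.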

The architecture is:
  sco00: root server
  sco01, sco40, sco42, sco43, sco44, sco45, sco46: leaf servers
  connected directly to sco00

I defined the new replicates as primary-target: data is replicated from
sco40-sco46 to sco01. These replicates carry a lot of data, 30-50
rows (transactions) per minute. I think this volume of data causes the
crash, because I did not have this problem before I defined the new
replicates.

Here is part of the onconfig and the generated af file:
--onconfig
CDR_LOGBUFFERS  16384
CDR_EVALTHREADS 1,2             # evaluator threads (per-cpu-vp,additional)
CDR_DSLOCKWAIT  300             # DS lockwait timeout (seconds)
#CDR_QUEUEMEM    4096            # Maximum amount of memory for any CDR queue (Kbytes)
CDR_QUEUEMEM    16384
CDR_LOGDELTA    30              # % of log space allowed in queue memory
CDR_NUMCONNECT  100             # Expected connections per server
CDR_NIFRETRY    300             # Connection retry (seconds)
CDR_NIFCOMPRESS 5               # Link level compression (-1 never, 0 none, 9 max)
--onstat -g ath

75      1816b1f0 174614d8 2    sleeping secs: 52       3cpu    CDRNsT117
77      1807fcf8 17461a98 2    cond wait  CDRBlbslp    3cpu    CDRBLOB_0
78      182a2178 17462058 2    cond wait  CDRBlbslp    3cpu    CDRBLOB_1
79      182af1a0 17462618 2    cond wait  CDRAckslp    1cpu    CDRACK_0
*80      182bc1a0 17462bd8 2    running                 3cpu    CDRACK_1
81      182c91f0 17463198 2    cond wait  CDRDssleep   1cpu    CDRD_0
82      182d7178 17463758 2    cond wait  CDRDssleep   1cpu    CDRD_1
83      182e4178 17463d18 2    cond wait  netnorm      1cpu    CDRNr46

onstat -g stk 80 light:

Informix Dynamic Server 2000 Version 9.20.UC3   -- On-Line -- Up 1 days 05:11:08 -- 433028 Kbytes

Stack for thread: 80 CDRACK_1
base: 0x182c0018
 len:   36864
  pc: 0x0856f0eb
 tos: 0x182c7dd8
state: running
  vp: 3

0x08863d98 (*nosymtab*)0x8863d98

What can I do to avoid this problem? Can I tune any parameters in the
onconfig file?

I am also thinking of defining sco01 as a nonroot server, with
sco40..sco46 as leaf servers connected to sco01. Is that a good idea?
My expectation is that this architecture avoids routing the replication
from sco40-46 to sco01 through sco00, so the data would be replicated
directly and sco00 would not be involved. Is that correct?

Thank you in advance...
Cristian
Madison Pruet - 28 Oct 2005 16:29 GMT
> Hello,
>
[quoted text clipped - 33 lines]
> rows (transactions) per minute. I think this volume of data causes the
> crash, because I did not have this problem before I defined the new
> replicates.

I don't think this is the case.  We have customers replicating in the
thousands of transactions a second.

> Here is part of the onconfig and the generated af file:
> --onconfig
[quoted text clipped - 44 lines]
>
> 0x08863d98 (*nosymtab*)0x8863d98

We are going to have to get the stack somehow.  It might be worth setting
AFDEBUG so that the server hangs instead of just crashing.  That would
make it possible to attach a debugger to the server while it is in the
process of crashing and get a stack.

> What can I do to avoid this problem? Can I tune any parameters in the
> onconfig file?

I would try turning off compression.  I don't know if that would help, but
it's worth a try.
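Before changing anything, it may help to confirm which compression level is actually in effect, since the posted onconfig contains both commented-out and active lines. This is only a sketch: `active_onconfig_values` is a hypothetical helper, it assumes one parameter per line with `#` starting a comment, and it assumes a later active line overrides an earlier one.

```python
def active_onconfig_values(text):
    """Map parameter name to value for uncommented onconfig lines.

    Assumption: a later active line overrides an earlier one. Check how
    your server version treats duplicate parameters.
    """
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        parts = line.split(None, 1)
        if len(parts) == 2:
            values[parts[0]] = parts[1].strip()
    return values


# Lines copied from the onconfig fragment quoted in this thread.
sample = """\
#CDR_QUEUEMEM    4096            # Maximum amount of memory for any CDR queue (Kbytes)
CDR_QUEUEMEM    16384
CDR_NIFCOMPRESS 5               # Link level compression (-1 never, 0 none, 9 max)
"""

cfg = active_onconfig_values(sample)
level = int(cfg["CDR_NIFCOMPRESS"])
# Per the onconfig comment: -1 never, 0 none, up to 9 max.
print("compression enabled" if level > 0 else "compression off", level)
```

With the posted value of 5, compression is on. Going by the comment in the onconfig itself, 0 or -1 would turn it off.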

> I am also thinking of defining sco01 as a nonroot server, with
> sco40..sco46 as leaf servers connected to sco01. Is that a good idea?
> My expectation is that this architecture avoids routing the replication
> from sco40-46 to sco01 through sco00, so the data would be replicated
> directly and sco00 would not be involved. Is that correct?

sco00 will forward the transactions to sco01. Sco01 may not participate
in the replicated tables, but it will participate in the network flow.
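The forwarding path in the two layouts can be compared with a small tree model. This is a sketch under stated assumptions: it assumes each server exchanges data only with its parent and children, the `proposed` layout is one reading of the question above, and the helper names are hypothetical. It models routing only and says nothing about how ER queues or acknowledges transactions.

```python
def path_to_root(parent, node):
    """List the servers from node up to the root of its tree."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def route(parent, src, dst):
    """Servers a transaction passes through from src to dst in a tree."""
    up = path_to_root(parent, src)
    down = path_to_root(parent, dst)
    for i, node in enumerate(up):
        if node in down:  # first common ancestor of src and dst
            j = down.index(node)
            return up[: i + 1] + down[:j][::-1]
    raise ValueError("servers are not in the same tree")


leaves = ["sco40", "sco42", "sco43", "sco44", "sco45", "sco46"]

# Current layout: every server is a leaf directly under sco00.
current = {"sco00": None, "sco01": "sco00", **{s: "sco00" for s in leaves}}
# Proposed layout (assumed): sco01 is a nonroot under sco00, leaves under sco01.
proposed = {"sco00": None, "sco01": "sco00", **{s: "sco01" for s in leaves}}

print(route(current, "sco40", "sco01"))
print(route(proposed, "sco40", "sco01"))
```

In this model, the current layout sends every sco4x-to-sco01 transaction through sco00, while the proposed layout does not.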

> Thank you in advance...
> Cristian
cristizaharioiu - 31 Oct 2005 10:48 GMT
Thank you Madison,

As a first step I will try turning off compression. Is it necessary
to turn it off on sco00 only, or on both sco00 and sco01?
mpruet@comcast.net - 31 Oct 2005 14:03 GMT
> Thank you Madison,
>
> As a first step I will try turning off compression. Is it necessary
> to turn it off on sco00 only, or on both sco00 and sco01?

Compression is negotiated.  That means you only have to set it on
one of the nodes.
caver - 31 Oct 2005 16:06 GMT
Cristian

In the long term, I would suggest upgrading to at least version 9.4 - I
had lots of ER crashes at 9.21, but after I was finally able to get to
9.4, ER has been much more robust.  Warning - for my configuration it
took a lot of work to upgrade my root server to 9.4 - but it was worth
it.

Daniel
 