RE: [Info-ingres] E_DMA469  _PROCESS_HAS_DIED    Ingres 2.6 Tru64    OS 5.1a

Laframboise André - 04 May 2005 00:41 GMT


We occasionally get that error. Are you running multiple servers?

We often get something like "Exception occurred in the DMF Facility,
exception number 68197" in one of the DBMS logs.

Andre

-----Original Message-----
From: info-ingres-admin@cariboulake.com
To: Oscar Carlés
Cc: info-ingres@cariboulake.com
Sent: 03/05/05 5:24 PM
Subject: RE: [Info-ingres] E_DMA469  _PROCESS_HAS_DIED    Ingres 2.6 Tru64
OS 5.1a

Hi Oscar,

Thank you for replying,

II 2.6/0305 (axp.osf/00) 10670

Thanks,

Hoan

-----Original Message-----
From: Oscar Carlés [mailto:oscar@integra.com.py]
Sent: Tuesday, May 03, 2005 2:20 PM
To: Hoan P. Thai
Cc: info-ingres@cariboulake.com
Subject: Re: [Info-ingres] E_DMA469 _PROCESS_HAS_DIED Ingres 2.6 Tru64
OS 5.1a

Hi Hoan:

What is your Ingres ptf level? (See the $II_SYSTEM/ingres/version.rel
file.)

Regards

Oscar

Thai wrote:

>Hi
>
>Our Ingres 2.6 Tru64 5.1 just died with an error
>"[II_RCP            , 0000000140310c00]: Tue May  3 10:05:19 2005
>E_DMA469_PROCESS_HAS_DIED    Process (0003DD87) has died. A process
>attached to the logging and locking system has exited without going
>through normal cleanup processing. The system will now perform cleanup
>processing on behalf of the failed process.
>"
>
>Has anyone had this similar problem and known the solution?
>
>Thank you very much,
>
>Hoan
>
>_______________________________________________
>Info-ingres mailing list
>Info-ingres@cariboulake.com
>http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres
>
>  

ghingres@yahoo.co.uk - 04 May 2005 14:28 GMT
Hoan,

 I take it you are running OS threads?

 If the problem occurs frequently enough, try switching to Ingres threads
for a day or two (if you need help on how, there's plenty in the
group). In one of our DMA469 problems we found that a large IN clause in an
SQL statement was blowing the stack. It appears Ingres threads can
catch the reason before winding up the process, whereas OS threads manage
to bomb the process too quickly! To obtain the output, you must
have DBMS logs enabled (do an ingsetenv II_DBMS_LOG
${II_SYSTEM}/ingres/files/dbms_%p.log ). Stack size could be what you
are running into; have you tried increasing it? Check the ulimit -aH
settings (man sys_attrs_proc for details)... If you need help in this area,
you'll have to publish the settings you've put in /etc/sysconfigtab
along with your Ingres config (ingprenv, config.dat, protect.dat) so that
people can compare and suggest values etc... I did publish some values
a long time ago on the group...
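
For anyone wanting to compare limits quickly, here is a minimal sketch of reading the soft and hard stack limits from the shell that starts Ingres. The function name show_stack_limits is made up for illustration; ulimit -S -s and ulimit -H -s are standard shell built-ins that report the stack limit (usually in kilobytes, or "unlimited").

```shell
#!/bin/sh
# Sketch only: print the soft and hard stack limits of the current shell.
# show_stack_limits is a hypothetical helper name, not an Ingres utility.
show_stack_limits() {
    soft=$(ulimit -S -s)    # soft limit: what a new process gets by default
    hard=$(ulimit -H -s)    # hard limit: the ceiling the soft limit can be raised to
    echo "stack soft=${soft} hard=${hard}"
}

show_stack_limits
```

Run it as the ingres user, from the same environment the start-up script uses, so the numbers match what the DBMS server actually inherits.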

  If you are not running out of a resource, then the only avenue left
for evidence (because nothing is being recorded) is to use ladebug (as
I do here !!). It doesn't stop the DMA469's but sure gives you some
evidence (and CA) to see where the DBMS server was at the time of SEGV.
The following command and execution script show how it is done
(remember to add onto your start up script and purge down the ladebug
logs every so often !)...

   Perhaps with us all collecting the evidence the root cause may one
day be flushed out...

   Gary

Execution Command: -

nohup ksh "${II_SYSTEM}/ingres/debug/ladebug_dump.sh" >>"${II_SYSTEM}/ingres/ladebug.log" 2>&1 &

Debugger Script (ladebug_dump.sh)

#!/bin/ksh
#
#       Script for Attaching LADEBUG to an IngresII server
#
xpid=`ps -u ingres | grep -v grep | grep " dbms " | awk '{ print $1}'`
for iipid in ${xpid}
do
       echo "`date` Connecting to Ingres ii DBMS Server with PID ${iipid}"
#
#       Execute the Debugger...
#
       ladebug << EOF >${II_SYSTEM}/ingres/debug/ladebug_${iipid}.log&
set \$stoponattach=1
attach ${iipid} ${II_SYSTEM}/ingres/bin/iimerge
ignore SIGUSR2
ignore SIGPIPE
cont
where thread all
# show condition
# show thread
detach
quit
EOF
done
#
#       All Done...
#
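
The advice to purge the ladebug logs every so often could be handled with something like the sketch below. The function name purge_ladebug_logs and the 7-day default are assumptions for illustration; the file name pattern ladebug_*.log matches what the script above writes into ${II_SYSTEM}/ingres/debug.

```shell
#!/bin/sh
# Sketch only: delete ladebug_<pid>.log files older than a given number of days.
# purge_ladebug_logs and the 7-day default are hypothetical choices.
purge_ladebug_logs() {
    log_dir=$1                  # directory holding the ladebug_<pid>.log files
    max_age_days=${2:-7}        # keep anything modified within this many days
    find "$log_dir" -type f -name 'ladebug_*.log' \
        -mtime "+${max_age_days}" -exec rm -f {} \;
}

# Example call (not executed here):
#   purge_ladebug_logs "${II_SYSTEM}/ingres/debug" 7
```

Calling it from cron, or from the same start-up script that launches the debugger, keeps the debug directory from filling up.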
Thai - 04 May 2005 19:04 GMT
Thank you  Gary,

We have had only one crash, yesterday, on our first go-live day.
CA has recommended an OS upgrade to 5.1B.
Which stack_size parameter were you talking about?

Our kernel configuration is:

msg_max = 8192
msg_mnb = 16384
msg_mni = 256
msg_tql = 128
shm_max = 1500000000
shm_min = 1
shm_mni = 256
shm_seg = 102400
sem_mni = 1024
sem_msl = 240
sem_opm = 48
sem_ume = 48
sem_vmx = 65534
sem_aem = 32768

vm_max_wrpgio_kluster = 32768
vm_max_rdpgio_kluster = 16384
vm_cowfaults = 4
vm_segmentation = 1
vm_ubcpagesteal = 24
vm_ubcfilemaxdirtypages = 4294967295
vm_ubcdirtypercent = 10
ubc_maxdirtywrites = 5
vm_ubcseqstartpercent = 50
vm_ubcseqpercent = 10
vm_csubmapsize = 1048576
vm_ubcbuffers = 256
vm_syncswapbuffers = 128
vm_asyncswapbuffers = 4
vm_clustermap = 1048576
vm_clustersize = 65536
vm_syswiredpercent = 80
vm_troll_percent = 4
vm_inswappedmin = 1
vm_page_free_target = 1024
vm_page_free_swap = 522
vm_page_free_hardswap = 16384
vm_page_free_min = 20
vm_page_free_reserved = 10
vm_page_free_optimal = 522
vm_swap_eager = 1
vm_page_prewrite_target = 2048
vm_ffl = 1
ubc_ffl = 1
vm_rss_maxpercent = 100
vm_rss_block_target = 522
vm_rss_wakeup_target = 522
vm_min_kernel_address = 18446741891866165248
malloc_percpu_cache = 1
vm_aggressive_swap = 0
vm_segment_cache_max = 50

max_proc_per_user = 1024
max_threads_per_user = 16384
per_proc_stack_size = 553648128
max_per_proc_stack_size = 553648128
per_proc_data_size = 3221225472
max_per_proc_data_size = 3221225472
max_per_proc_address_space = 3221225472
per_proc_address_space = 3221225472

inet:
       ipport_userreserved = 65000
       tcp_keepalive_default = 1

socket:
       somaxconn = 65535
       sominconn = 65535

vfs:
       bufcache = 1
       name-cache-hash-size = 2048

generic:
       message-buffer-size = 16384
       msgbuf_size = 16384
       new_vers_high = 1445655480385976064
       new_vers_low = 51480

pcount:
       Device_Char_Files = pcount0
       Device_Char_Major = ANY
       Device_Char_Minor = 0
       Module_Config_Name = pcount
       Module_Type = Dynamic
       Subsystem_Description = pcount device driver

hwc:
       hwc_boot_old = 0

Thanks again,
ghingres@yahoo.co.uk - 05 May 2005 18:03 GMT
Hiya Hoan,

per_proc_stack_size = 553648128 and max_per_proc_stack_size = 553648128
set how much stack the OS will give you (and maximum you can have) and
are reflected in the ulimit and ulimit -H output. You currently have
528MB which should be more than adequate ;-)
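
The 528MB figure is just the byte value divided by 1024 twice. A minimal sketch of the arithmetic (bytes_to_mb is a made-up helper name):

```shell
#!/bin/sh
# Sketch only: convert a sysconfigtab byte value to megabytes (1 MB = 1024*1024 bytes).
# bytes_to_mb is a hypothetical helper name.
bytes_to_mb() {
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_mb 553648128    # per_proc_stack_size from the post; prints 528
```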

Ingres has its own internal stack size, set through CBF, both on DBMS
and Recovery server which we currently have set at 524288 and 131072
respectively for our 4GB memory machine.
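
If you want to see what is currently set without going through CBF, a rough sketch is to search config.dat for the stack entries. This assumes the relevant parameter names contain the string "stack_size" and that config.dat lives under ${II_SYSTEM}/ingres/files; the function name show_stack_size_settings is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list config.dat lines that mention stack_size.
# Assumptions: the parameter names contain "stack_size", and the default
# path below is where your installation keeps config.dat.
show_stack_size_settings() {
    config_file=${1:-"${II_SYSTEM}/ingres/files/config.dat"}
    grep -i 'stack_size' "$config_file"
}

# Example call (not executed here):
#   show_stack_size_settings
```

This only reads the file; changes are still best made through CBF as described above.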

If this problem is occurring often, is not something obvious (like Andre
said - someone playing with the kill command), you have no output in the
DBMS logs (I am taking it you now have these switched on) and you need
stability quickly... I would still suggest you stop running OS threads
and switch to Ingres threads. You will lose out on performance, but it
may help diagnose the underlying problem whilst you consider your
options... these would be something along the lines of : -

 1. Upgrade to 5.1B-2 as CA suggests (5.1A is on prior version support
from HP after all)

 2. Keep running OS threads, DBMS logs and attach Ladebug to capture
reasons for server crash
     [ might come up with further clues / breadcrumbs... ]

 3. Switch to Ingres threads, enable DBMS logs and hope you'll find
the culprit
     [ we found a large SQL IN clause this way... ]

I've been in the situation you are currently experiencing, so I know what
it is like. The real driver is to get as much info as you can (DBMS
logs, debugs etc, what was running at the time of the crash) to CA support -
reproducible cases are best ;-) Once they understand how you are
losing your DBMS servers they're usually very quick at patching it...

Hope this gives you some ideas

Cheers

Gary
Thai - 06 May 2005 15:49 GMT
Thank you,

I have already sent the DBMS log to CA. They have not got back to me yet.
I was lucky enough to catch the following query on another machine. How
many more are out there... :o(?
Warning! The following query will bring down an Ingres database, and you
will get a friendly E_DMA469_PROCESS_HAS_DIED
with no other trace.

SELECT DISTINCT
       fac.fac_id ,
       TRIM(address.street_nbr) as street_nbr ,
       TRIM(address.street_nbr_sfx) as street_nbr_sfx ,
       TRIM(address.street_dir) as street_dir ,
       TRIM(address.street_name) as street_name ,
       TRIM(address.street_sfx) as street_sfx ,
       TRIM(address.street_apt) as street_apt ,
       TRIM(address.city) as city ,
       TRIM(address.state) as state ,
       TRIM(address.zip) as zip ,
       TRIM(address.zip_four) as zip_four ,
       fac.fac_status ,
       fac.zone ,
       fac_bus_info.name
  FROM fac , address , fac_bus_info
 WHERE ( address.bus_info_id = fac_bus_info.bus_info_id ) and
       ( fac.bus_info_id = fac_bus_info.bus_info_id ) and
       ( ( fac.fac_id in
(136,136,136,236,236,236,346,346,346,346,346,346,346,346,346,346,346,346,346,346,346,346,346,346,346,346,346,434,465,465,
1003, 1206, 1206, 1206, 1206, 1206, 1206, 1206, 1206, 1206, 1206, 1206,
1206, 1206, 1206, 1206, 1206, 1206, 1206, 1206, 1206, 1308, 1744, 1744,
1744, 1744, 1744...this list go on to 1116 count of
facilities...,12345,232843
) ) and  ( address.system_type = 'LOC' )  )
       ORDER BY fac.fac_id ASC

As you can see, this is a 3-page IN clause. Why our beloved programmer
did that... is beyond my understanding.
We ran into a similar case on Ingres 2.0, where 500 return
characters in a simple SQL statement brought down I2.0 (there was no fix
for it; CA just pumped the limit up to a much higher number, I forget what
the max is). The good thing about I2.0 is that it tells us who the last
person was, unlike I2.6 with E_DMA469_PROCESS_HAS_DIED.
As far as I know, someone internal could build a coffee-break button and
submit this query through ODBC... with no trace.
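
Most of that IN list is the same handful of IDs repeated over and over, so one cheap mitigation on the application side would be to de-duplicate the list before building the statement. A minimal sketch, assuming the IDs arrive as a comma-separated string (dedupe_id_list is a made-up helper name, and whether a shorter list stays under the stack limit on a given server is untested):

```shell
#!/bin/sh
# Sketch only: collapse a comma-separated list of numeric IDs to its unique,
# sorted members. dedupe_id_list is a hypothetical helper name.
dedupe_id_list() {
    printf '%s\n' "$1" |
        tr -d ' ' |             # drop spaces after commas
        tr ',' '\n' |           # one ID per line
        grep -v '^$' |          # ignore empty entries
        sort -n -u |            # numeric sort, keep one copy of each ID
        paste -s -d ',' -       # join back into a comma-separated list
}

dedupe_id_list '136,136,136,236,236,236,346, 346,434,465,465'
# prints 136,236,346,434,465
```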

I will work with CA on this problem and let you all know.
Thank you,

Hoan
ghingres@yahoo.co.uk - 07 May 2005 13:01 GMT
Great News...

 And you don't need to do an OS upgrade to get round the problem ;-)

 CA should already be aware of a large IN clause causing the problem;
it was what caused our first set of DMA469's... We put forward the
suggestion that the query optimizer should internally store a large IN
clause as a temporary internal table and perform a comparison against
that rather than stacking (and thus going bang) - HEY ANY R3 People
wanna have a look ?? Apparently on other OS's you do get a stack
overflow message; on Tru64 you get nothing but DMA469 when running
OS threads...
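
 Until the optimizer does that itself, the same idea can be approximated from the client: load the IDs into a session temporary table and compare against that. The sketch below only generates the SQL text. The table name session.fac_ids and the function name are made up, and the DECLARE GLOBAL TEMPORARY TABLE clauses are an assumption to check against the SQL reference for your release.

```shell
#!/bin/sh
# Sketch only: turn a comma-separated ID list into SQL that fills a session
# temporary table. session.fac_ids and in_list_to_temp_table_sql are
# hypothetical names; verify the DECLARE syntax for your Ingres release.
in_list_to_temp_table_sql() {
    echo "DECLARE GLOBAL TEMPORARY TABLE session.fac_ids (fac_id INTEGER NOT NULL)"
    echo "    ON COMMIT PRESERVE ROWS WITH NORECOVERY;"
    printf '%s\n' "$1" | tr -d ' ' | tr ',' '\n' | grep -v '^$' |
    while read -r id; do
        echo "INSERT INTO session.fac_ids VALUES (${id});"
    done
}

in_list_to_temp_table_sql '136,236,346'
```

 The big query would then use fac.fac_id in (SELECT fac_id FROM session.fac_ids) in place of the literal list.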

 Fun thing to do (on your test box), switch to Ingres threads and run
that same 3 page IN clause... you'll see it is more helpful in telling
you whoops - blown stack ;-)

 Glad you're a happier bunny

 Cheers

 Gary
Betty & Karl Schendel - 07 May 2005 15:04 GMT
>   CA should be already aware of a large IN clause causing the problem
>it was what was causing our first set of DMA469's... We put forward the
>suggestion that the query optimizer should internally store a large IN
>clause as a temporary internal table and perform a comparison against
>that rather than stacking (and thus going bang) - HEY ANY R3 People
>wanna have a look ??

R3 doesn't process IN's recursively any more.  I wouldn't expect any
stack overflow or crash for large IN's in R3.

Karl
Thai - 10 May 2005 17:13 GMT
I ran the same query on R3. It didn't do anything on my R3. Very
interesting, the way Ingres handles errors.

Hoan
 