Here's an interesting Ingres puzzle for you.
We are running an IngresII 2.5 installation and an IngresII 2.0
installation on the same Reliant UNIX 5.43 server. The older software
is hardly used and has not had any problems.
Our IngresII 2.5 main production installation became unresponsive to
all Ingres commands during a large batch job. That job was in the
"sysmod" phase ("Modifying 'iiattribute'").
All the normal processes were running, but not using any CPU resource.
There were no useful error messages in any log file (errlog.log,
iiacp.log, iircp.log, DBMS logs, the batch process log).
The E_DM9043_LOCK_TIMEOUT messages in the error log and DBMS log look
like a symptom rather than a cause. "CS_check_dead" in one of the DBMS
logs is a symptom of killing the DBMS server processes, isn't it?
When I killed processes and re-started the installation it worked fine.
I also re-started the IngresII 2.0 installation (even though it
remained perfectly usable) in order to clean up any shared resources
left behind.
I have naturally searched for similar problems on the web, but without
a specific error message this is difficult!
Has anyone experienced anything similar?
Has anyone any suggestions as to where I might find some evidence?
Many thanks, L Jackson.
> We are running an IngresII 2.5 installation and an IngresII 2.0
> installation on the same Reliant UNIX 5.43 server. The older software
> is hardly used and has not had any problems.
<SMARTASS>
I'd be surprised if it did manage to cause problems while it's not being
used! :-) Although now that I think about it there is a Sun workstation by
my desk where I keep an unused Ingres 6.4 installation and I did twist my
ankle when I tripped over it...
</SMARTASS>
> All the normal processes were running, but not using any CPU resource.
> There were no useful error messages in any log file (errlog.log,
> iiacp.log, iircp.log, DBMS logs, the batch process log).
My first guess is a locking problem of some kind.
> The E_DM9043_LOCK_TIMEOUT messages in the error log
I like locking as the explanation even more. Do you get repeated messages
like that? From the same session?
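One quick way to answer that question is to tally the timeout messages per session. The following is only a rough sketch: it assumes each errlog.log line carries a bracketed "[server, session]" prefix before the message text, which may not match your release, and the function name count_lock_timeouts is made up for illustration.

```python
import re
from collections import Counter

# Assumption: the first [...] on a line identifies the server and session.
# Adjust this pattern if your errlog.log lines are laid out differently.
PREFIX = re.compile(r"\[([^\]]*)\]")


def count_lock_timeouts(lines, code="E_DM9043_LOCK_TIMEOUT"):
    """Count lines containing `code`, grouped by the bracketed prefix."""
    counts = Counter()
    for line in lines:
        if code not in line:
            continue
        match = PREFIX.search(line)
        # Lines without a recognisable prefix are grouped under "<unknown>".
        key = match.group(1).strip() if match else "<unknown>"
        counts[key] += 1
    return counts
```

Pass it an open file, e.g. count_lock_timeouts(open("errlog.log", errors="replace")), and look at .most_common(). One prefix dominating the tally suggests a single session repeatedly timing out; an even spread suggests many sessions queued behind the same lock.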
> When I killed processes and re-started the installation it worked fine.
Well that sure doesn't rule out locking as the culprit.
> I also re-started the IngresII 2.0 installation (even though it
> remained perfectly usable) in order to clean up any shared resources
> left behind.
If by that you mean you restarted Ingres 2.0 (as opposed to rebooting the
host) then that was unnecessary. There are no shared resources whatever.
> I have naturally searched for similar problems on the web, but without
> a specific error message this is difficult!
Next time it happens (if it happens) use IPM or VDBA to see what resources
are being locked and whether a lock request is being blocked. We've had jobs
hang waiting for a lock when someone put a pop-up error message in an
application that was supposed to run as a batch job.
Roy
It looks like the system is hung and blocking on a mutex. The only way
to fix that is to bounce Ingres. If you are able to get into iimonitor
do a "show sessions" and see what state the sessions are in. This will
show you where the mutex is (e.g., QSR sem, ULM pool, ULH TCB, SVCB mem
mutex, etc.). You are supposed to be able to show the mutex, but I'm
willing to bet that won't work.
The best bet would be to upgrade Ingres. Most of those problems went
away with Ingres 2.6 service pack 2 or above.
Chip Nickolett (ChipN@Comp-Soln.com)
Comprehensive Solutions www.Comp-Soln.com
trapspamhere@googlemail.com - 24 Oct 2006 15:19 GMT
For the record, this is a known bug, number 110007, associated with
error message "... LG_resume_lbb. Didn't resume ...".
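For anyone wanting to check their own logs for that text, a small sketch along these lines would do. It assumes the logs of interest sit in one directory and match "*.log" (DBMS logs may be named differently on your installation); find_in_logs is a made-up helper name, not an Ingres tool.

```python
from pathlib import Path


def find_in_logs(log_dir, needle, pattern="*.log"):
    """Return (file name, line number, text) for each line containing needle."""
    hits = []
    # Assumption: every log of interest matches `pattern` inside `log_dir`.
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if needle in line:
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits
```

Calling find_in_logs(log_directory, "LG_resume_lbb") with the directory holding errlog.log, iiacp.log and iircp.log lists every matching line along with the file it came from.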
I did naturally run "ipm -s" in the hope of spotting a MUTEX lock, but
there were none.
Thanks very much for your suggestions.