Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / Informix Topics / August 2003

Tip: Looking for answers? Try searching our database.

WHY index can corrupted???

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
miyaki - 13 Aug 2003 03:01 GMT
Hi all,

OS : Sun Solaris 2.7
IDS : 7.31.UD5

Recently my database engine was crashed due to the
index corruption. The engine was back up after
repairing the index by using the oncheck command. I'm
wondering why the index can be corrupted? In fact, I
just performed the database re-org by using the
dbexport-import at 3 months ago.

Anybody knows the reason that may cause the index
corruption???

Appreciate for the comments.

TIA,
MIyaki

sending to informix-list
Fernando Nunes - 13 Aug 2003 12:22 GMT
> Anybody knows the reason that may cause the index
> corruption???

BUG, hardware problem, or maybe a BTREE cleaner abort due to some system failure (which will probably a BUG anyway).

Regards.
Rajib Sarkar - 13 Aug 2003 16:03 GMT
Usually, in my experience, I've seen index corruptions due to a scenario
where multiple sessions are accessing the same table/index and some of them
are doing so in dirty read isolation causing the corruption ...

Thanx much,

Rajib Sarkar
Advisory Software Engineer (RAS)
IBM Data Management Group
Ph :  (602)-217-2100
Fax:  (602)-217-2100

As long as you derive inner help and comfort from anything, keep it --
Mahatma Gandhi

                                                                                                                                     
                     miyaki                                                                                                          
                     <lcib@yahoo.com>         To:       ids@iiug.org, informix-list@iiug.org                                          
                     Sent by:                 cc:                                                                                    
                     owner-informix-li        Subject:  WHY index can corrupted???                                                    
                     st@iiug.org                                                                                                      
                                                                                                                                     
                                                                                                                                     
                     08/12/2003 07:01                                                                                                
                     PM                                                                                                              
                                                                                                                                     

Hi all,

OS : Sun Solaris 2.7
IDS : 7.31.UD5

Recently my database engine was crashed due to the
index corruption. The engine was back up after
repairing the index by using the oncheck command. I'm
wondering why the index can be corrupted? In fact, I
just performed the database re-org by using the
dbexport-import at 3 months ago.

Anybody knows the reason that may cause the index
corruption???

Appreciate for the comments.

TIA,
MIyaki

sending to informix-list

sending to informix-list
Madison Pruet - 13 Aug 2003 17:54 GMT
There is absolutely no reason for index corruption.  If the index should
become corrupt, then it means that a bug has been encountered.  Please open
a case with tech support to try to discover the root cause of the
corruption.

The only exception to this would be the case where either something outside
of the engine induced the corruption.  This might include

1) the hardware did not do a physical IO that the os(software) thought had
been done
2) the user did somthing wrong such as having two instances using the same
chunk
3) the customer ran into some kind of problem and the logical logs were
patched so that the normal recovery could not successfully complete

M.Pruet

> Hi all,
>
[quoted text clipped - 21 lines]
> http://sitebuilder.yahoo.com
> sending to informix-list
Madison Pruet - 14 Aug 2003 17:02 GMT
I neglected to add

4) using non-logging databases

as a reason that the index might become corrupt.

M.P.

> There is absolutely no reason for index corruption.  If the index should
> become corrupt, then it means that a bug has been encountered.  Please open
[quoted text clipped - 38 lines]
> > http://sitebuilder.yahoo.com
> > sending to informix-list
Alexey Sonkin - 13 Aug 2003 19:10 GMT
Hi, everybody,

The last time I saw index corruption was when
I created index with PDQ on multi-CPU system for a huge table that
was being actively modified just few seconds before index
creation was started (may be, some buffers were not
flushed to disk). That happened with IDS 9.21uc4

Non of Madison's (1) (2) (3) took place in our case.

Formally, we did everything correctly. It was a pure IDS
problem.  

I think that IDS shared memory was corrupted by the
time index creation was started. Memory corruption could be caused
by some problematic query, not related to this index creation task.

------------------------------------------
Alexey Sonkin
Senior Database Administrator

> -----Original Message-----
> From: Madison Pruet [mailto:mpruet@comcast.net]
[quoted text clipped - 46 lines]
> > http://sitebuilder.yahoo.com
> > sending to informix-list

sending to informix-list
Bob Bankay - 27 Aug 2003 00:17 GMT
> Hi all,
>
> OS : Sun Solaris 2.7
> IDS : 7.31.UD5
<snip>

> Anybody knows the reason that may cause the index
> corruption???
[quoted text clipped - 3 lines]
> TIA,
> MIyaki

We use similar hardware and OS in a very high volume db application.  We
used caching RAID
disk controllers that would corrupt data long after it had given IO
complete to Solaris and Informix.  So neither the OS nor the db engine
had anyway to handle a hardware corruption at a later time.  At the time
we were using controllers with 128MB cache in RAID 1+0 config. Somehow,
the hardware would write corrupt data with correct parity, and the
mirror manager would not know which image was correct. Hence index
failures.

Normally we would find out about corruption when doing backups.  The
only way to cure the problem was to get rid of the caching controllers.
We now use RAID 5 which is slower but can self heal from intermittent
disk failures.  Thus, we have less performance but no more corruption
for the past 2 years.

Just one more experience.

Bob B
Rastogi, Asheesh (Cognizant) - 27 Aug 2003 05:03 GMT
This is a multi-part message in MIME format.

------=_NextPartTM-000-13b7e843-fa5d-4eeb-be8d-2cc75c3cb690
Content-Type: text/plain;
    charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Art,
     Your comments on this?

Cheers,
Asheesh.

> -----Original Message-----
> From: Bob Bankay [mailto:bobb@ssl.berkeley.edu]
[quoted text clipped - 44 lines]
>=20
>=20

------=_NextPartTM-000-13b7e843-fa5d-4eeb-be8d-2cc75c3cb690
Content-Type: text/plain;
    name="InterScan_Disclaimer.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
    filename="InterScan_Disclaimer.txt"

This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information.
If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message.
Any unauthorised review, use, disclosure, dissemination, forwarding, printing or copying of this email or any action taken in reliance on this e-mail is strictly
prohibited and may be unlawful.

        Visit us at http://www.cognizant.com

------=_NextPartTM-000-13b7e843-fa5d-4eeb-be8d-2cc75c3cb690--
sending to informix-list
Art S. Kagel - 27 Aug 2003 21:57 GMT
There are several sources of index corruption generally in a few classes:

Caused by bugs:
There was a known BTREE-CLEANER bug that would occassionally corrupt indexes but
I no longer remember what release(s) are affected.

Caused by crashes:
If you use caching controllers and/or an intelligent disk farm there is a real
risk that a system crash can leave data in the cache.  This may or may not be
flushed to disk eventually and you may or may not want it so flushed. Especially
prevalent after a controller failure.

Caused by improper shutdown:
Often I've found corruption similar to what one normally finds after a system
crash following a supposedly graceful system shutdown/restart.  This is caused
by the rc files sending a shutdown to the database and returning immediately
without waiting for the server to actually shutdown.  The delayed shutdown can
be caused by a large number of dirty buffers not yet flushed or applications in
critical sections for which the server must wait before starting the checkpoint
preparatory to a shutdown.

Caused by database logging mode:
UNLOGGED databases are at the highest risk for index corruption as the engine
has no way to recover a partially written index update after a crash or improper
shutdown.  These are most likely to suffer corruption caused by the other
issues.

BUFFERED LOG databases are still at some risk for corruption, especially if
there are no active UNBUFFERED LOG databases running in the same instance. This
is because a BUFFERED LOG database does not force the logical log buffer to
flush when committing a transaction.  UNBUFFERED LOG databases always force a
logical log buffer flush immediately upon writing a COMMIT record to the logical
log buffer and so are rarely found to have a corrupted index.  If you have a
mixture of BUFFERED and UNBUFFERED databases you may be OK if the UNBUFFERED
databases are rather active since the BUFFERED database records will be caught
up in the flushes forced by the UNBUFFERED database activity.

Note that having databases that are UNBUFFERED LOG databases offers the
additional risk that the engine will not be able to restart after a crash
without tech support intervention.  This can be caused when a commit requires
multiple writes to the logical logs and the commit is partially written to one
buffer which fills and is flushed, then the final COMMIT is written to the next
logical log buffer which is not immediately flushed before the system crashes.
On restart the engine will hang when it encounters the end of the logical log
in the middle of a partially written commit.

Art S. Kagel

> Hi all,
>
[quoted text clipped - 18 lines]
> http://sitebuilder.yahoo.com
> sending to informix-list
Michael Mueller - 28 Aug 2003 14:51 GMT
Hi Art and all,

I don't doubt that some of you have observed index corruption in logged
databases after system crashes. If this was not caused by faulty
hardware or os software that corrupts disk contents directly (index and
data pages, log space, etc.) it should be considered an Informix fast
recovery bug and should be fixed if possible. This should also apply to
buffered logging.

Michael

> There are several sources of index corruption generally in a few classes:
>
[quoted text clipped - 66 lines]
>>http://sitebuilder.yahoo.com
>>sending to informix-list

-- --
Art S. Kagel - 28 Aug 2003 20:28 GMT
> Hi Art and all,

There's nothing that can be fixed in IDS to get rid of the risk in BUFFERED LOG
databases.  Here's a typical scenario:

1-Transaction updates a row implying an index page in the buffer cache must be
updated.

2-The preimage of the page is written to the physical log buffer.

3-Record of the update itself is written to the logical log buffer.

4-Data and index pages are updated in the buffer cache.

5-An LRU containing the index page or the data page but not both is flushed to
disk because that LRU has reached its LRU_MAX_DIRTY level.

6-System crash or improper shutdown.

7-System/Engine restart - BOOM index corruption!

Cause: The logical and physical log buffers have never been flushed to disk
(database is BUFFERED LOG right?) but the data or the index page has been
flushed!  So the index reflects a key value that is not reflected in the row or
indicates a row that's been deleted or that should have been inserted but was
not or omits a row that's been inserted or that should have been deleted but
wasn't.  On restart fast recovery cannot correct the problem because the
physical and/or logical log pages needed to undo this part of the partial
transaction were never flushed!  While this is apparently an odd set of
circumstances, I've seen enough corrupted indexes to know that it does indeed
happen in the real world.

If this scenario happens in an UNBUFFERED LOG database instance the logical log
buffers are immediately flushed as soon as ANY transaction on the server
completes so the window of risk in this case is miniscule even compared to the
very small window in a BUFFERED LOG environment.

The only way to protect yourself if you use BUFFERED LOG is to use a very high
value for LRU_MAX_DIRTY and a relatively short checkpoint interval so all buffer
flushes occur at checkpoint time.  Since logical and physical log buffers are
flushed at the beginning of the checkpoint and so before any dirty data/index
pages are flushed then the risk is eliminated unless FG writes occur.

Art S. Kagel

> I don't doubt that some of you have observed index corruption in logged
> databases after system crashes. If this was not caused by faulty hardware or
[quoted text clipped - 76 lines]
>>
> -- --
Madison Pruet - 28 Aug 2003 23:06 GMT
Art,

Do you have a repro/case involved where this occurred?  We are supposed to
be flushing the physical log buffer and the logical log buffers to disk
prior to writing the data page/index page to disk.  If there is a case where
we aren't, then we should consider it a bug that needs to be resolved.

M.Pruet

> > Hi Art and all,
>
[quoted text clipped - 121 lines]
> >>
> > -- --
Michael Mueller - 29 Aug 2003 12:07 GMT
Madison,

I agree we flush the phy log buf. But do we really flush the logical log
buf (except when we do no physical logging in some specially optimized
cases)?

Michael

> Art,
>
[quoted text clipped - 260 lines]
>>>>
>>>-- --
Jonathan Leffler - 30 Aug 2003 05:36 GMT
> I agree we flush the phy log buf. But do we really flush the logical log
> buf (except when we do no physical logging in some specially optimized
> cases)?

There's a thing called log write-ahead protocol that, AFAIK, is
required to ensure recoverability.  That means that the logical log
information must be on disk before the other changes are committed.
That applies absolutely at checkpoints (unless fuzzy checkpoint did
something I'm not aware of - a possibility), and maybe even before
pages get written out normally.  At the least, that was the version 5
theory.  Anybody aware of when the rules changed.  (I suspect that a
few details above are a bit slipshod - corrections or amendments welcome.)

>> Do you have a repro/case involved where this occurred? We are
>> supposed to be flushing the physical log buffer and the logical
>> log buffers to disk prior to writing the data page/index page to
>> disk. If there is a case where we aren't, then we should consider
>> it a bug that needs to be resolved.

[...major snippage because the indentation is shot to pieces...]

Signature

Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/

Umberto Quaia - 30 Aug 2003 00:07 GMT
Miyaki wrote:
> >>OS : Sun Solaris 2.7
> >>IDS : 7.31.UD5
[quoted text clipped - 3 lines]
> >>database re-org by using the dbexport-import at 3 months ago.
> >>Anybody knows the reason that may cause the index corruption???

> Art S. Kagel answered:
> > There are several sources of index corruption generally in a few classes:
[quoted text clipped - 37 lines]
> > in the middle of a partially written commit.
> > Art S. Kagel

"Michael Mueller" <michael.mueller01@kay-mueller.de> observed:
> Hi Art and all,
> I don't doubt that some of you have observed index corruption in logged
[quoted text clipped - 4 lines]
> buffered logging.
> Michael

Michael,
Art has correctly pointed out that caching controllers may potentially
cause corruption. That kind of corruption:
- is not due to hardware failure, but to the way the hardware works
 ordinarily
- is not due to the database engine

So you have a corruption which is not a bug and is not caused by
faulty
hardware...

I agree that a good controller should have a way to disable its cache
if I don't want it (on database disks, for example), but if the
manufacturer doesn't give you a way to do that...
I'd call that too-smart hardware: it tells the engine that data has
been
flushed before it has been really, and the db faithfully marks that
data
as already written. It works as a charm until power outage or
overheating
force a disk farm stop... You are lucky if it's just an index
involved!

With Oracle and ordinary datafiles, the problem is even greater. You
have
OS buffering too. Greater flexibility, but less robustness. It tells
you
that a datafile needs recover, you give RECOVER DATABASE, and
sometimes
it's all OK. I wonder what has been done when it succeeds. If datafile
was corrupt, how can it recover without physical restore? Why doesn't
it
attempt that directly by itself? Well, maybe I'm too curious, I like
to
"look under the hood" to understand... Of course, using cooked files
exposes Dynamic Server to the same problem.

So that's another possible cause. Summarizing them all:

- bugs
- crashes with cacheing controllers
- improper shutdowns
- buffered log or unlogged databases
- usage of cooked files instead of raw devices with databases

Umberto Quaia
Michael Mueller - 30 Aug 2003 18:16 GMT
Umberto,

One could fight about what one should call such caching hardware. You
could try to protect it using emergency power supply. Otherwise it will
not only corrupt databases but also ordinary file systems it hosts. One
can call that unreliable at the least.

With cooked files Informix did not use to guarantee data integrity in a
political sense. But internally cooked chunks are opened using the
O_SYNC os flag (see "man 2 open"). I believe that most modern operating
system honor this by returning from a write only after the data is
really on disk (but there is the same problem with write cache of
course). If this holds data integrity is not endangered even on cooked
chunks.

Michael

> Michael,
> Art has correctly pointed out that caching controllers may potentially
[quoted text clipped - 42 lines]
>
> Umberto Quaia

Signature

=== Michael Mueller ==================
Tel. + 49 8171 63600
Fax. + 49 8171 63615
Web: http://www.mm.kay-mueller.de
     http://www.planets.kay-mueller.de
======================================

Michael Mueller - 28 Aug 2003 09:48 GMT
Hi Art and all,

I don't doubt that some of you have observed index corruption in logged
databases after system crashes. If this was not caused by faulty
hardware or os software that corrupts disk contents directly (index and
data pages, log space, etc.) it should be considered an Informix fast
recovery bug and should be fixed if possible. This should also apply to
buffered logging.

Michael

Art S. Kagel wrote:
> On Tue, 12 Aug 2003 22:01:38 -0400, miyaki wrote:
>
> There are several sources of index corruption generally in a few classes:
>
> Caused by bugs:
> There was a known BTREE-CLEANER bug that would occassionally corrupt
indexes but
> I no longer remember what release(s) are affected.
>
> Caused by crashes:
> If you use caching controllers and/or an intelligent disk farm there
is a real
> risk that a system crash can leave data in the cache.  This may or
may not be
> flushed to disk eventually and you may or may not want it so flushed.
Especially
> prevalent after a controller failure.
>
> Caused by improper shutdown:
> Often I've found corruption similar to what one normally finds after
a system
> crash following a supposedly graceful system shutdown/restart.  This
is caused
> by the rc files sending a shutdown to the database and returning
immediately
> without waiting for the server to actually shutdown.  The delayed
shutdown can
> be caused by a large number of dirty buffers not yet flushed or
applications in
> critical sections for which the server must wait before starting the
checkpoint
> preparatory to a shutdown.
>
> Caused by database logging mode:
> UNLOGGED databases are at the highest risk for index corruption as
the engine
> has no way to recover a partially written index update after a crash
or improper
> shutdown.  These are most likely to suffer corruption caused by the other
> issues.
>
> BUFFERED LOG databases are still at some risk for corruption,
especially if
> there are no active UNBUFFERED LOG databases running in the same
instance. This
> is because a BUFFERED LOG database does not force the logical log
buffer to
> flush when committing a transaction.  UNBUFFERED LOG databases always
force a
> logical log buffer flush immediately upon writing a COMMIT record to
the logical
> log buffer and so are rarely found to have a corrupted index.  If you
have a
> mixture of BUFFERED and UNBUFFERED databases you may be OK if the
UNBUFFERED
> databases are rather active since the BUFFERED database records will
be caught
> up in the flushes forced by the UNBUFFERED database activity.
>
> Note that having databases that are UNBUFFERED LOG databases offers the
> additional risk that the engine will not be able to restart after a crash
> without tech support intervention.  This can be caused when a commit
requires
> multiple writes to the logical logs and the commit is partially
written to one
> buffer which fills and is flushed, then the final COMMIT is written
to the next
> logical log buffer which is not immediately flushed before the system
crashes.
> On restart the engine will hang when it encounters the end of the
logical log
> in the middle of a partially written commit.
>
[quoted text clipped - 7 lines]
>>Recently my database engine was crashed due to the index corruption. The
>>engine was back up after repairing the index by using the oncheck
command. I'm
>>wondering why the index can be corrupted? In fact, I just performed the
>>database re-org by using the dbexport-import at 3 months ago.
[quoted text clipped - 11 lines]
>>http://sitebuilder.yahoo.com
>>sending to informix-list

sending to informix-list
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.