Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / Oracle / Oracle Server / December 2005

Tip: Looking for answers? Try searching our database.

PROBLEM: Cannot rebuild index!!

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
BD - 30 Dec 2005 19:21 GMT
Hi all.

I have a problem which I'm not sure how to deal with.

Here's the scenario (8i on AIX):

We have a system wherein some data warehouse tables are truncated and
repopulated each night with fresh data.

This morning, a unique index on one of these tables was in an unuseable
state. I attempted to rebuild it, and got the error ORA-01452: cannot
CREATE UNIQUE INDEX; duplicate keys found.

I've looked on Metalink, and The only sense I can make of this is that
when the new data was inserted, a duplicate row was created.

I believe that this data insert was done via SQLLoader. How the inserts
succeeded and got past the unique index, I'm not sure.

There's a script on metalink to 'delete' duplicate rows (included
below), but I would like to identify the duplicates before deleting
them. I have changed the script from 'delete from' to 'select * from',
so I can view the data. But I've so far waited 10 minutes for results,
and for a 17000 column table that strikes me as not very promising.

So, in short, my question is: How should I deal with suspected
duplicate rows when my indexes are not working?

Hope someone can help!

BD

***here's the Metalink script***

REM This is an example SQL*Plus Script to delete duplicate rows from
REM a table.
REM
set echo off
set verify off heading off
undefine t
undefine c
prompt
prompt
prompt Enter name of table with duplicate rows
prompt
accept t prompt 'Table: '
prompt
select 'Table '||upper('&&t') from dual;
describe &&t
prompt
prompt Enter name(s) of column(s) which should be unique. If more than
prompt one column is specified , you MUST separate with commas.
prompt
accept c prompt 'Column(s): '
prompt
delete from &&t
where rowid not in (select min(rowid) from &&t group by &&c)
/
Matthias Hoys - 30 Dec 2005 19:46 GMT
> Hi all.
>
[quoted text clipped - 54 lines]
> where rowid not in (select min(rowid) from &&t group by &&c)
> /

You should first check if the business logics allow to have duplicate rows.
If they don't, there's something wrong with your ETL procedures and you
should fix this. If duplicate rows are allowed, you shouldn't be having a
UNIQUE index on those columns. Oh, I haven't seen a 17000 column table
before, but i'm not that experienced with large data warehouses. Have you
tried parallel DML in order to speed up the query ?

Matthias
BD - 30 Dec 2005 20:05 GMT
I found out why the dupes came into play:

We drop indexes, then do the inserts from the remote db, and re-enable
the indexes. It's entirely possible that the source db has made some
changes to how they index their tables.

We'll have to contact the dbas of the source db and see if they've made
some RI changes somewhere...

Blissfully, this particular dataset is not yet in use by anyone, so if
it's broken we won't suffer. ;)
Mark C. Stock - 30 Dec 2005 20:52 GMT
>I found out why the dupes came into play:
>
[quoted text clipped - 7 lines]
> Blissfully, this particular dataset is not yet in use by anyone, so if
> it's broken we won't suffer. ;)

why is it necessary to drop and recreate the indexes when adding rows to the
table?

++ mcs
BD - 30 Dec 2005 21:16 GMT
>why is it necessary to drop and recreate the indexes when adding rows to the table?

It's a performance thing. It's not necessary, per se - but there's a
lot of transformations that go on in these tables as they are updated,
and the index slows the process down. So the index is dropped, the
table is truncated and repopulated, the transformations take place, and
the index is recreated. We kind of rely on the integrity of the source
dbs data for valid inserts. First time I've seen a problem with the
data, so it doesn't happen often. All I can figure is that the source
db must have changed their indexing criteria and not told us.
Sybrand Bakker - 30 Dec 2005 21:10 GMT
>We drop indexes, then do the inserts from the remote db, and re-enable
>the indexes. It's entirely possible that the source db has made some
>changes to how they index their tables.

Strange procedure. The very reason for the ALTER INDEX REBUILD command
was to make procedures like this, coming from the DARK pre 7.3 ages,
when DBAs got no sleep, as they had to reindex from home, obsolete/
Also by dropping the index you are dropping RI.
Couldn't you just set the constraint to novalidate,prior to the load,
and enable validate exceptions into .... ?
Then you would have prevented all this fuss.

--
Sybrand Bakker, Senior Oracle DBA
BD - 30 Dec 2005 23:42 GMT
>Couldn't you just set the constraint to novalidate,prior to the load,
>and enable validate exceptions into .... ?
>Then you would have prevented all this fuss.

Quite possibly. Sadly, I'm 1 year into an environment that has been
humming along for several years prior - so reinvention of the process
will likely not be received all that well. However, if this case can be
used as an object lesson, perhaps some retooling of the process would
at least be considered.
Randy Harris - 30 Dec 2005 19:53 GMT
> Hi all.
>
[quoted text clipped - 54 lines]
> where rowid not in (select min(rowid) from &&t group by &&c)
> /

Perhaps this is overly simplistic, but can you view the duplicate rows with:

Select f1
From t1
Group By f1
Having Count(f1)>1

BTW - I hope that table has 17000 rows rather than 17000 columns.

Signature

Randy Harris
tech at promail dot com
I'm pretty sure I know everything that I can remember.

BD - 30 Dec 2005 20:07 GMT
>Perhaps this is overly simplistic

Not overly simplistic at all - in fact, it works a treat!

We found two instances of duplicates.

Seems we're not going to bother cleaning it up just now, because this
data set is not yet in use by the client base, and so if it's broken no
one will know/care. We'll deal with it next week when we're fully
staffed again.

Oh - yes, it's 17000 rows. *smack smack*. only 8 columns. ;)

Thanks!!
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2010 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.