
Database Forum / DB2 Topics / January 2005


Performance problems when inserting into a large table

Joachim Klassen - 26 Jan 2005 13:44 GMT
Hi all,

First, apologies if this question looks the same as another one I recently
posted - it's a different issue, but for the same scenario :-).

We are having performance problems when inserting/deleting rows from a large
table.
My scenario:

Table (let's call it FACT1) with 1000 million rows distributed across 12
partitions (3 physical hosts with 4 logical partitions each).
Overall size of the table is 350 GB. Each night 1.5 million new rows are
added and approximately the same number of old records are deleted
(roll-in/roll-out with SQL INSERT/DELETE).
The table is stored in an SMS tablespace with a 16K page size and an extent
size of 64 pages.
The tablespace has 6 containers on each partition. Each container is on a
separate IBM ESS array.
The prefetch size is 384 (6 containers * 64 pages). Prefetch behaves very well
with these settings (DB2_PARALLEL_IO is set).
DB2 is V8.1 ESE (DPF) FP5 and runs on AIX.
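
For a sense of scale, the figures quoted above can be turned into a few derived sizes. This is only a rough sketch: it assumes rows and bytes are spread evenly over the 12 partitions, which the post does not state, and the variable names are made up for illustration.

```python
# Figures taken from the post above.
PAGE_SIZE_KB = 16
EXTENT_PAGES = 64
CONTAINERS_PER_PARTITION = 6
PARTITIONS = 12
TABLE_ROWS = 1_000_000_000
TABLE_SIZE_GB = 350

# One extent and one prefetch request, expressed in KB.
extent_kb = PAGE_SIZE_KB * EXTENT_PAGES
prefetch_pages = CONTAINERS_PER_PARTITION * EXTENT_PAGES
prefetch_kb = prefetch_pages * PAGE_SIZE_KB

# ASSUMPTION: perfectly even distribution across partitions.
rows_per_partition = TABLE_ROWS / PARTITIONS
gb_per_partition = TABLE_SIZE_GB / PARTITIONS

print(f"extent size:        {extent_kb} KB")
print(f"prefetch size:      {prefetch_pages} pages = {prefetch_kb} KB")
print(f"rows per partition: {rows_per_partition:,.0f}")
print(f"GB per partition:   {gb_per_partition:.1f}")
```

So one extent is 1 MB, one prefetch request covers 6 MB per partition, and the nightly 1.5 million rows are a small fraction of what each partition already holds.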

It takes 7 hours to insert 1.5 million rows into FACT1 and up to 7 hours to
delete the same number.
The insert is done via INSERT INTO FACT1 ... SELECT * FROM STAGING_TABLE.
Both the fact and the staging table are in tablespaces in the same nodegroup
and have the same partitioning key.

On a similar table (let's call it FACT2) with a comparable amount of
data/rows and nearly identical configuration the same process takes only 5
minutes.

The main difference between these two tables is that FACT1 has 7 indexes
defined on it and FACT2 only 4.
In each case one of the indexes is unique and the others are not (all are
type 2).
There is no clustering index and the APPEND attribute is set to ON.
I'm aware of the pseudo-delete mechanism of type-2 indexes and the
correspondingly longer search time for inserts in the index leaf pages.
But taking an exclusive lock on the table before inserting/deleting does not
change the runtime.
(And the docs say that with an X-lock on the table pseudo-deletes will not
happen.)
Also, after a reorg of the table and indexes the insert runtime is the same as
before.
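
To put the two runtimes side by side, here is a small back-of-the-envelope calculation using only the numbers quoted above (1.5 million rows, 7 hours for FACT1, 5 minutes for FACT2, 12 partitions). The per-partition figure assumes the inserted rows are spread evenly, which is an assumption and not something stated in the post.

```python
ROWS = 1_500_000
PARTITIONS = 12
FACT1_SECONDS = 7 * 3600   # 7 hours
FACT2_SECONDS = 5 * 60     # 5 minutes


def rows_per_second(rows, seconds):
    """Average insert rate over the whole run."""
    return rows / seconds


fact1_rate = rows_per_second(ROWS, FACT1_SECONDS)
fact2_rate = rows_per_second(ROWS, FACT2_SECONDS)
slowdown = FACT1_SECONDS / FACT2_SECONDS

print(f"FACT1: {fact1_rate:.1f} rows/s overall, "
      f"{fact1_rate / PARTITIONS:.1f} rows/s per partition")
print(f"FACT2: {fact2_rate:.1f} rows/s overall, "
      f"{fact2_rate / PARTITIONS:.1f} rows/s per partition")
print(f"FACT1 is {slowdown:.0f}x slower than FACT2")
```

FACT1 manages roughly 60 rows per second across all partitions against 5000 for FACT2, a factor of 84. If index maintenance cost grew about linearly with the number of indexes (again an assumption), 7 indexes versus 4 would account for less than a factor of 2.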

Is it possible that the additional index maintenance for FACT1 leads to such
a long runtime?
What exactly happens internally during index maintenance? (I searched the
docs but did not find the internals.)
Has anyone seen similar behaviour?

I can post additional information if required (table and index definitions,
statistics ...) - but wanted to keep the posting small in the first place.

TIA for any comments
Joachim

PS: Feel free to send comments by email to joklassen at web dot de
PPS: In parallel we are investigating MDC tables, using smaller tables (and
combining them with a UNION ALL view) and the use of LOAD FROM CURSOR
instead of INSERT.
Serge Rielau - 26 Jan 2005 14:31 GMT
> Hi all,
>
[quoted text clipped - 46 lines]
> What exactly happens internally during index maintenance? (I searched the
> docs but did not find the internals.)
I'm not privy to index maintenance internals, but could it be that the 7
indexes cause some heap to spill? Maybe the sort heap? Have you checked
the snapshots?
Have you verified that the plans are good? You shouldn't see any TQs
(table queues).
Also, are you sure you don't have any other complicating factors (SQL
functions, triggers, check or RI constraints)? (The plans will show.)
> PPS: In parallel we are investigating MDC tables, using smaller tables (and
> combining them with a UNION ALL view) and the use of LOAD FROM CURSOR
> instead of INSERT.
Be careful with LOAD FROM CURSOR: the cursor is a bottleneck. To do
that in a scalable fashion you would fire up concurrent LOADs on each
node, filtering the source by DBPARTITION.
You shouldn't need UNION ALL.

Cheers
Serge

Signature

Serge Rielau
DB2 SQL Compiler Development
IBM Toronto Lab

Joachim Klassen - 26 Jan 2005 15:40 GMT
Serge,
again, thanks for your quick reply :-)

I will try to get snapshot information in the next few days. (The problem is
that "get snapshot for all" runs for 1 hour on production and once crashed the
instance in the past :-) That problem is fixed in FP7, which will be applied
soon.)

> Have you verified that the plans are good? You shouldn't see any TQs.
> Also are you sure you don't have any other complicating factors (SQL
> Functions, Triggers, check or RI constraints) (The plans will show).
The plan looks good (to me). Maybe you can comment on it:

Section Code Page = 819

Estimated Cost = 31926.718750
Estimated Cardinality = 75608.000000

       Coordinator Subsection - Main Processing:
(-----)    Distribute Subsection #1
          |  Broadcast to Node List
          |  |  Nodes = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
          |  |          11, 12

       Subsection #1:
(    3)    Access Table Name = DTMP1T.STAGING ID = 411,121
          |  #Columns = 24
          |  Volatile Cardinality
          |  Relation Scan
          |  |  Prefetch: Eligible
          |  Lock Intents
          |  |  Table: Intent Share
          |  |  Row  : Next Key Share
(    2)    Insert:  Table Name = DPERMT.FACT1 ID = 1714,2

End of section

Optimizer Plan:

                 INSERT
                 (   2)
           /----/      \
     TBSCAN       Table:
     (   3)       DPERMT
       |          F7KB_F_A_T_Q_B_K
     Table:
     DTMP1T
     F7KB_F_A_T_Q_B_K

> Be careful with LOAD FROM CURSOR: the cursor is a bottleneck. To do that
> in a scalable fashion you would fire up concurrent LOADs on each node,
> filtering the source by DBPARTITION.

Does that mean
DECLARE C1 CURSOR FOR SELECT * FROM stage WHERE DBPARTITIONNUM(column) = 1
LOAD FROM C1 OF CURSOR INSERT INTO FACT1 ... OUTPUT_DBPARTNUMS 1
DECLARE C2 CURSOR FOR SELECT * FROM stage WHERE DBPARTITIONNUM(column) = 2
LOAD FROM C2 OF CURSOR INSERT INTO FACT1 ... OUTPUT_DBPARTNUMS 2
and so on?

Thanks
Joachim

>> Hi all,
>>
[quoted text clipped - 62 lines]
> Cheers
> Serge
Serge Rielau - 26 Jan 2005 18:18 GMT
> Optimizer Plan:
>
[quoted text clipped - 7 lines]
>  DTMP1T
>  F7KB_F_A_T_Q_B_K
Doesn't get easier than that...
>>Be careful with LOAD FROM CURSOR: the cursor is a bottleneck. To do that
>>in a scalable fashion you would fire up concurrent LOADs on each node,
>>filtering the source by DBPARTITION.
>
> Does that mean
Connect to node 1:
> DECLARE C1 CURSOR for select * from stage where dbpartitionnum(column) = 1
> LOAD FROM C1 OF CURSOR INSERT INTO FACT1 ... OUTPUT_DBPARTNUMS 1
Connect to node 2:
> DECLARE C2 CURSOR for select * from stage where dbpartitionnum(column) = 2
> LOAD FROM C2 OF CURSOR  INSERT INTO FACT1 ... OUTPUT_DBPARTNUMS 2
connect to node "and so on"
> and so on

Basically you are your own splitter.
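
As a rough illustration of "being your own splitter", the per-partition statement pairs can be generated instead of typed by hand. This is a sketch only: the helper names are hypothetical, the table and column names (`stage`, `FACT1`, `column`) are the placeholders used in the thread, and the `...` in the LOAD command is the same elision as in the post, so the generated text is a template and not something to run unchanged. How each session connects to its partition is left out, since the thread does not say how that is done.

```python
def load_commands(partition, source="stage", target="FACT1", key_column="column"):
    """Return the DECLARE/LOAD pair for one database partition.

    The statement shapes are copied from the thread; the '...' stands for
    LOAD options that the thread itself leaves out.
    """
    cursor = f"C{partition}"
    return [
        f"DECLARE {cursor} CURSOR FOR SELECT * FROM {source} "
        f"WHERE DBPARTITIONNUM({key_column}) = {partition}",
        f"LOAD FROM {cursor} OF CURSOR INSERT INTO {target} ... "
        f"OUTPUT_DBPARTNUMS {partition}",
    ]


def build_all(partitions):
    """Map each partition number to its own list of commands."""
    return {p: load_commands(p) for p in partitions}


if __name__ == "__main__":
    # Partitions 1..12, matching the node list in the plan above.
    for partition, commands in build_all(range(1, 13)).items():
        print(f"-- session connected to node {partition}")
        for command in commands:
            print(command)
        print()
```

Each group of commands would be run in its own session, connected to the matching node, so that the 12 loads run concurrently and no single cursor has to feed all partitions.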

This, btw, is a great way to do batch processing with procedures.

Cheers
Serge


 