Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / DB2 Topics / February 2006

Tip: Looking for answers? Try searching our database.

multiple insertions through jdbc

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Diego - 15 Feb 2006 16:09 GMT
Hi everybody!

I'm using DB2 PE v8.2.3 for linux.  I've defined a database with the
following schema:

ANNOTATION(ID,AUTHOR,TEXT)
ANNOTATION_BOOK(ANNOTATION_ID,OBJECT_ID)
BOOK(ID,AUTHOR,TITLE).

Between the book and annotation entities there is a many-to-many
relation: one annotation can involve many books and a book can be
annotated by many annotations.  So the attributes ANNOTATION_ID and
OBJECT_ID of the ANNOTATION_BOOK table are foreign keys referring to the
ANNOTATION and BOOK tables ids, respectively.  I'm using two sequences
for the attributes ID of tables ANNOTATION and BOOK: annotation_id_seq
and book_id_seq.

I've to make batch insertions in this database that are the result of an
algorithm.  The algorithm output is a set made up of thousands of
annotations involving a small number of books.  These books might
already been stored in the database and obviously I don't want to have
multiple tuples that represent the same book, i.e., have the same values
for the attributes AUTHOR and TITLE.

The result of the algorithm has to be stored in the database through
JDBC.  I'm using the driver in the packages: db2jcc.jar and db2java.jar.

My current solution is:

1. Query the table BOOK to determine the ids of the books that are
already persistent.

2. Inserting the book annotations in sequence in a batch.  More
specifically:
    2.1 Creating an instance of object java.sql.Statement, let's call it
statement.
     2.2 Adding an sql insert command for table ANNOTATION to the statement:

insert into ANNOTATION(ID,AUTHOR,TEXT)
            values(next value for annotation_id_seq,...).

    2.3 For each book involved in the book annotation:
        2.3.1(a) If the book is not already persistent add two sql insert
command for table BOOK and table ANNOTATION_BOOK to the statement:

insert into BOOK(ID,AUTHOR,TITLE)
            values(next value for public.book_id_seq,...)

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
       values(previous value for annotation_id_seq,
                   previous value for book_id_seq);

        2.3.1(b) If the book annotation involves a book that is already
persistent add one sql insert command for table ANNOTATION_BOOK with the
value that you retrieved at point 1 for the attribute BOOK_ID:

insert into ANNOTATION_BOOK(ANNOTATION_ID,BOOK_ID)
       values(previous value for annotation_id_seq,
                   <value_retrieved_at_point_1>);
    2.4 Execute the batch:

statement.executeBatch();

This solution is not very efficient.  I would like to improve the
insertion performance.  Maybe after retrieving the ids of the persistent
books (point 1), I could ask the dbms to increase the book and
annotation sequence by how much I need and then insert the tuples one
table at the time, maybe using a prepared statement instead of a statement.

Do you have any suggestion?

Thank you very much,

Diego
Phil Sherman - 17 Feb 2006 17:56 GMT
The nature of this question and the use of PE indicates this could be a
class problem.

The key to improving performance here is to minimize the overhead.
1. Use parameter markers and multiple statements.
2. Figure out a way to make the database manager do some of the work for
you.
3. Consider inserting multiple ANNOTATION_BOOK rows with each statement.

You'll have problems with your 1. because you've made no allowance for
the books you'll be adding to the table. You've also said nothing about
how you will be determining the id for the persistant book associated
with the current annotation. You've also said nothing about determining
that an annotation you are processing already exists in the annotation
table. (It's possible that all annotations are considered new and get a
new row in their table.)

Is a sequence appropriate? Would identity columns for BOOK and
ANNOTATION be an alternative?

Phil Sherman

> Hi everybody!
>
[quoted text clipped - 71 lines]
>
> Diego
Diego - 20 Feb 2006 19:41 GMT
Hi Phil,

I've made some modifications to this procedure that go in the direction
you were suggesting in your points 1,2 and 3 and I had a good improvement.

> You've also said nothing about how you will be determining the id for the persistant book associated with the current annotation.

I assume that the pair of attributes (AUTHOR, TITLE) of the relation
BOOK is unique (actually the db is more complex than that but this was a
significant simplification).  So I query BOOK with two values for this
pair of attributes, e.g.:

select *
from BOOK " +
where AUTHOR = ? and TITLE = ?

> You've also said nothing about determining that an annotation you are processing already exists in the annotation table. (It's possible that all annotations are considered new and get a new row in their table.)

Yes, that's exactly the case.  All annotation are considered new and get
a new row.

> Is a sequence appropriate? Would identity columns for BOOK and ANNOTATION be an alternative?

In the new procedure I cannot use identity columns.

The new procedure works as follows:

1.  Query the table BOOK to determine the ids of the books that are
already persistent.  To do that I execute a PreparedStatement for each
book:

String query = "select * " +
               "from public.BOOK " +
               "where AUTHOR = ? and TITLE = ?";
preparedStatement = connection.prepareStatement(query);
for each book
    preparedStatement.setXXX(1,...);
    preparedStatement.setXXX(2,...);
    resultSet = preparedStatement.executeQuery();

As you suggested in point 1, I tried to execute a multiple statement
(using addBatch) with a preparedStatement but it is not supported with a
SELECT.

2.  Request as many ids as needed for the annotations from the sequences
that manages the ANNOTATION table id:

preparedStatement = connection.prepareStatement("values nextval for
public.annotation_id_seq");
for each annotation
    ResultSet resultSet = preparedStatement.getResultSet();

3.  Request as many ids as needed for the books.

4.  Inserts tuple in ANNOTATION table.
5.  Inserts tuple in BOOK_ANNOTATION table.
6.  Insert tuples in BOOK table.

Points 4,5 and 6 are implemented through a prepared statement batch
(what you called in your point "multiple statement", if I got it right).
 As an example, consider point 4:

String sql = "insert into public.annotation(id,author,text) " +
             "values(?,?,?);";
preparedStatement = connection.prepareStatement(sql);
for each annotation
    preparedStatement.setXXX(1,...);
    preparedStatement.setXXX(2,...);
    preparedStatement.setXXX(3,...);
    preparedStatement.addBatch();

This is analogous to what I intended to do for point 1.  But apparently
multiple sql commands with prepared statements don't work with SELECTs.

Thanks a lot!

Diego
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.