Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / Ingres Topics / December 2006

Tip: Looking for answers? Try searching our database.

Database storage

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
samantha_fein19@yahoo.com - 14 Dec 2006 22:59 GMT
Hi -

In Ingres 2006, is it possible to create a partitioned table such that
data is stored onto more than one physical machine?

sam
Roy Hann - 15 Dec 2006 00:13 GMT
> Hi -
>
> In Ingres 2006, is it possible to create a partitioned table such that
> data is stored onto more than one physical machine?

Currently all the parititions of a partitioned table have to reside in the
same installation.

Ingres Star does allow you distribute a database over multiple
installations/machines, but individual tables cannot be distributed.

Roy
samantha_fein19@yahoo.com - 15 Dec 2006 01:48 GMT
That's a help, thanks. I wonder (since you said "currently") is there a
road map or a time table for adding that kind of thing?

sam

> <samantha_fei...@yahoo.com> wrote in messagenews:1166137199.172782.249130@16g2000cwy.googlegroups.com...
>
[quoted text clipped - 8 lines]
>
> Roy
Karl & Betty Schendel - 15 Dec 2006 05:03 GMT
>That's a help, thanks. I wonder (since you said "currently") is there a
>road map or a time table for adding that kind of thing?

Not really, not as such.  There's not much call for partitioning
across machines unless you have a fancy high-speed interconnect, or
unless they are so nearly independent that they might as well
be separate tables.  Then, to make use of it properly you need
a lot of Star-like machinery on top to put the pieces together.

I don't see this happening unless someone comes along with a
proposal and eager hands to do the work.  (A contribution would
be entirely welcome, though.)

Karl

>> <samantha_fei...@yahoo.com> wrote in messagenews:1166137199.172782.249130@16g2000cwy.googlegroups.com...
>>
[quoted text clipped - 8 lines]
>>
> > Roy
Jim Gramling - 15 Dec 2006 11:04 GMT
Actually there is a way, in theory, kind-a /sort-of, to do this: you
can create horizontally partitoned tables using Ingres Replicator and
lookup tables.

For example, you can create a "logically distributed database" where
data associated with sales/whatever in "NY" is stored on a machine in
NY, and data from San Francisco is stored on a machine in SF.  The
caveat, as I recall, is that you have to have, somewhere, a "master"
version of each table with data from all of the remote nodes.  Whether
it would be implementable, or a practical solution to your problem,
will depend on your business case and other factors (such as whether
all of your tables have unique keys, and whether your criteria for
partitioning is easily lookup-able.

I just thought I would mention this possibility, as it might be enough
of a solution for whatever problem you are looking at.

regards,

Jim Gramling
Rio de Janeiro, Brazil

> >That's a help, thanks. I wonder (since you said "currently") is there a
> >road map or a time table for adding that kind of thing?
[quoted text clipped - 23 lines]
> >>
> > > Roy
Paul Andrews - 15 Dec 2006 11:35 GMT
> Hi -
>
> In Ingres 2006, is it possible to create a partitioned table such that
> data is stored onto more than one physical machine?

Hi,

There have already been a few responses about this, which I hope have
helped.

I think the real question is what benefit do you believe such a facility
would bring to you?
What disadvantage do you perceive without this facility?

Paul

> sam
>
> _______________________________________________
> Info-ingres mailing list
> Info-ingres@cariboulake.com
> http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres
samantha_fein19@yahoo.com - 15 Dec 2006 20:22 GMT
Hi Paul,

Yes, the answers are very helpful - thanks to all. To answer your
question. The advantage is for querying on big tables. Say you have a
10TB table, just a long, flat table. There isn't a lot of writing done
on it, just queries. Imagine trying to configure hardware to run
queries on it. If all 10TB on one machine, it's going to be very
expensive to get any kind of performance.

But what if you could buy 20 computers which could each hold a 500 MB
partition (like NYC, San Francisco, etc., like Jim was saying, but the
computers are physically in the same room with a fast connection), and
a 21st machine which divided up the work (and brought it back together
- I think that's the part that is like "Ingres Star").

It would cost you something like USD 126,000 with two dual core
processors, RAID 10 and sixteen gigs of memory on each computer (that's
without the bulk discount ;). You would also buy enterprise support
from Ingres, naturally.

What you get. 80 processors and 320 GB RAM running under 20 copies of
Ingres. If the the central (Ingres Star) algorithm is good the overall
speed would scale nicely.

I never got a quote on a single machine with 320 GB RAM and 80
processors but I think it must be a few million US dollars. And maybe
it's modestly faster (and a whole lot QUIETER) than this daydream, but
the price difference is a factor of ten or twenty (!)

So the answer to your question about the advantage of such a feature
involves millions in savings. Finest sushi Hawaii has to offer. Mint
Juleps at the Derby from a silver cup. You know the drill.

sam

> <samantha_fei...@yahoo.com> wrote in messagenews:1166137199.172782.249130@16g2000cwy.googlegroups.com...
>
[quoted text clipped - 18 lines]
> > Info-ing...@cariboulake.com
> >http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres
weasel - 16 Dec 2006 08:51 GMT
I think DATAllegro may have done all this sort of work already.

> Hi Paul,
>
[quoted text clipped - 53 lines]
> > > Info-ing...@cariboulake.com
> > >http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres
Jean-Pierre Zuate - 16 Dec 2006 09:28 GMT
Hello Sam (an other),

Very intersting mail, I follow you in that direction.

What you explain here is in the cluster scope, and what you say about saving
money is, from my point of view, true. Cluster is a large subject and I
think many people confuses the concept and the technology (system cluster
and from the Ingres side things like VMS cluster or functionnality announced
with Ingres2006).

What you also need to include here a knowledge layer about star, perhaps
replicator if you want to put some high avaibility option in your
consideration and for database administration. Dealing with 1 to 50 ingres
installation is one thing if they are all the same (I did that in the past),
dealing with 1 big system composed with 20 or more Inres installation is a
other stuff.

I have in mind a french customer (french army) with a very big replicator
configuration. They manage the distribution of information and its
confidentiality with replicator and at the end they have a very complex
scheme of replication. To deal each day with it, they have developed some
particular houskeeping scripts and procedure and finaly it work "well". I
say "weel" because if they can find a solution for each issue they have they
must maintain people with a very good knowledge of their applicaton, Ingres,
replicator, system, and something like a hacker spirit to deal with those
complex issues (and they have some people like that, it is due to the fact
they are working for the army and then are not thinking salary or career but
only to the service they must provide to their "customers"). Saving money on
hardware is one thing, saving money on all the project is an other thing.
What if your system is sometime down for some unfindable reasons ? Or if a
patch break something somewhere ? But even in a more simple configuration
you can have issues like that ...

I already discuss of this kind of subject with many people, in some formal
and less formal context, and few of them think it was doable et maintainable
in a regular basis. My point of view is all of this is a question of
discipline and rigour ... and tests. I see in my life only one customer
doing some real tests with users doing sometimes the same job on 2 system :
the production system and the next production system. This customer is a
bank and due to the amount of money captured by the information system they
have this heavy environment. All the other customers in the indusctry think
first to the bill of hardware and software, some others of the complete bill
(hard + soft + knowledge).

I hope you will acheive your project in the mood you explain and you'll find
all the help you need with Ingres, his support (premium or not), prehap's
this list and the future community site Inges announce. But sure I will read
your post with the first interest, don't care. If I can help you of any way
I will not hesitate ... (your 10TB table is too much interesting to miss a
knowledge on this subject :-))

Kind regards,
Jean-Pierre
15 Dec 2006 12:22:19 -0800, samantha_fein19@yahoo.com <
samantha_fein19@yahoo.com>:

> Hi Paul,
>
[quoted text clipped - 59 lines]
> Info-ingres@cariboulake.com
> http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres

Signature

Jean-Pierre Zuate
La Fage Conseil
+33(0)6 11 40 11 09
jpzuate@gmail.com

Roy Hann - 16 Dec 2006 09:52 GMT
> Hi Paul,
>
[quoted text clipped - 4 lines]
> queries on it. If all 10TB on one machine, it's going to be very
> expensive to get any kind of performance.

It's not clear that plain, "old-fashioned" Ingres rule-based partitioning
alone wouldn't be a huge benefit to you if you spend a little time setting
it up.  Have you already concluded that would be no help to you?

But otherwise, you are describing a Datallegro DW appliance.  Go to
www.datallegro.com.

Roy
samantha_fein19@yahoo.com - 16 Dec 2006 19:45 GMT
Thank you all the answers are interesting. I am looking now at the
datallegro appliance. The "P-series" looks very close to the
description I was giving. They are using faster drives but less RAM...
Retail cost for hardware would be ... USD 8,000 for a similar setup.
Lets compare the hardware price to the software price.

Hardware for 20 systems costs USD 160,000 and let's spend 10,000 on
what they are calling the "master" node. (With our bulk discount, we
are going to upgrade the RAM from 8GB to 12, how's that? ;)

Datallegro actually has their price for this setup (albeit with a mere
8GB RAM) here. http://www.datallegro.com/pdf/usa_pricing_options.pdf.
By coincidence, they show a 10 TB example at the bottom. It is USD
850k. That's a very good price compared to a mainframe!

But still, I am paying that company USD 680,000 for the software. 850k
is five times the hardware price. I'm sure they are brilliant & kind
people, and I hope they get richer than the Queen. I'm not a big
fortune 500 company though. To get this extra 680,000 I'll either have
to turn to a life of crime, or try to get something like 10 people
fired (to pay for my analysis project). Oh, I can think of the 10
people, to tell you the truth, but it's not really my style. ;)

In seriousness, most of the real work is open source. The Allegro
company packages Linux and Ingres, absolutely huge accomplishments.
They add modest value with this coordinator program. (Sure it's a lot
of value to me, but it represents a whole lot less work than Ingres!)

The big question. Whether this kind of feature will have to be an
integrated part of a serious database system in the near future, or if
it can remain a specialized gizmo which can be sold for USD 680,000 to
large companies.

Of course you know what I'm thinking: that this capability is a proper
thing for a database system.  **Data** is what databases are all about.
We are not trying to maximize the usefulness of single "computer".
We're trying to maximize the usefulness of a "data set".

This feature allows one to spend 1/10 to 1/20th the amount on hardware,
to attain the same level of usefulness out of a 10TB dataset. Also the
improvement goes up exponentially with the amount of data. If there is
some other roadmap for getting a ten- or twenty-fold increase in the
performance/price ratio for a data set, well, let's hear it! [If
someone says "Wait for better hardware," they are disqualified.]

So what do you think (I'll step out and listen). Should the next
generation of database systems be able to do this by themselves or
should they rely on a USD 680,000 gizmo?

sam

> <samantha_fei...@yahoo.com> wrote in messagenews:1166214139.694499.169660@73g2000cwn.googlegroups.com...
>
[quoted text clipped - 12 lines]
>
> Roy
Roy Hann - 16 Dec 2006 20:48 GMT
> So what do you think (I'll step out and listen). Should the next
> generation of database systems be able to do this by themselves or
> should they rely on a USD 680,000 gizmo?

Well of course as Scott Adams pointed out, we'd all like better stuff for
free.  But until the average DBMS offers these kinds of capabilities you
have two choices: buy it from someone who's invested the R&D to deliver
something today that they guarantee works; or roll your own that will be
ready in a couple of years (after spending the money to do it, and doing the
testing, and committing to the lifetime of support).

If the business need exists today, and the money required is available, I
struggle to see how the shrink-wrapped solution wouldn't be a no-brainer.
You are always going to pay a huge premium to be ahead of the technological
pack.  But not all businesses see a justification for being there.  Maybe
yours is one.  In which case, sure, wait a few years until it becomes
standard.

Or, to repeat what I said earlier, based on your sketchy (and possibly
hypothetical?) problem description I wonder if the standard rule-based
partitioning that is already available for free in Ingres would completely
solve your problem?  Have you looked at it?  Are you confident it won't work
for you?   What requirements does is fail to satisfy?  (For a quick overview
take a look at http://www.rationalcommerce.com/papers/partition.htm.)

Roy
Karl & Betty Schendel - 17 Dec 2006 03:56 GMT
>[snip]
>In seriousness, most of the real work is open source. The Allegro
>company packages Linux and Ingres, absolutely huge accomplishments.
>They add modest value with this coordinator program. (Sure it's a lot
>of value to me, but it represents a whole lot less work than Ingres!)

Permit me to differ on that one.  The coordinator, as you call it,
can be simple only if you are doing simple concatenation of query
results.  Relatively few queries are like that.  Throw even a simple
aggregation into the problem query and now you have to combine the
answers at the master, which means that the master now has to understand
that it has to do it, which means that it has to parse the SQL.
Which is nontrivial.

And now throw in the ability to do joins across the appliance, and
do it in an efficient manner, and things get *real* complicated
real fast.  Union, anyone?  Outer joins across partitions?  It all
needs the master to understand the query enough to run it across
the entire box.

I could restate it this way, to perhaps give you an idea:  the master
does roughly what Ingres/Star does, but Star does not have nearly the
capability to make this thing work.

>The big question. Whether this kind of feature will have to be an
>integrated part of a serious database system in the near future, or if
>it can remain a specialized gizmo which can be sold for USD 680,000 to
>large companies.

I rather suspect that something kinda sorta like it is already
part of Oracle and/or DB2 under the guise of various clustering
options.  I also rather suspect that the license fees on the proposed
hardware will end up eating most of your $680K, and they are
not nearly as easy to use, and there are various other disadvantages
because they have to be general case, and you STILL don't have any
hardware.

Oh, and by the way, if you try to build your 20-machine setup for
$160K using a couple of 500 Gb disks each, you'll not get anything
remotely like the performance of a P-series, nor a C-series either.
The Datallegro appliance is built on 12 WD Raptors per slave, with
some other engineering thrown in.  Your two-drive-per-node machine
will give you something in the vicinity of 100-150 Mb/sec per machine
which is less than 1/8 of the Datallegro speeds, and that's best
case (simple dual-threaded scan).  Run a real query on that machine
and I guarantee you that it will go up in smoke because there's no place
to put any work space that won't contend with the two drives.
Run multiple queries concurrently, more contention.

I would be willing to bet that you would get better real-world,
real-query performance out of a 10 Tb C-series machine (which by
the way costs WAY less than $850K or $650K or whatever), than you
would get out of a $8k-per-box plumber's special.

>This feature allows one to spend 1/10 to 1/20th the amount on hardware,
>to attain the same level of usefulness out of a 10TB dataset.

As I have attempted to point out, if you had that feature alone, you
could spend 1/10 to 1/20 on hardware to be able to do a few very
simple things on a 10 Tb dataset.  You're fooling yourself if you
think it gives you real query capability on that dataset.  (Unless
we re-implement all of the Datallegro master logic into Ingres, which
would be a multi-year effort for no real value to Datallegro.)

And we haven't even started talking about little things like how are
you going to load it, or update it, or maintain it.  Or what you're
going to use for an interconnect, and how you're going to ship multi-Gb
partial query results back and forth across the nodes or master in
less than hours. Etc, etc, etc.

Signature

Karl R. Schendel, Jr.         schendel@kbcomputer.com
DATAllegro Inc
Ingres DBMS Server Development

 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.