Database Forum / Ingres Topics / December 2006
Database storage
|
|
Thread rating:  |
samantha_fein19@yahoo.com - 14 Dec 2006 22:59 GMT Hi -
In Ingres 2006, is it possible to create a partitioned table such that data is stored onto more than one physical machine?
sam
Roy Hann - 15 Dec 2006 00:13 GMT > Hi - > > In Ingres 2006, is it possible to create a partitioned table such that > data is stored onto more than one physical machine? Currently all the parititions of a partitioned table have to reside in the same installation.
Ingres Star does allow you distribute a database over multiple installations/machines, but individual tables cannot be distributed.
Roy
samantha_fein19@yahoo.com - 15 Dec 2006 01:48 GMT That's a help, thanks. I wonder (since you said "currently") is there a road map or a time table for adding that kind of thing?
sam
> <samantha_fei...@yahoo.com> wrote in messagenews:1166137199.172782.249130@16g2000cwy.googlegroups.com... > [quoted text clipped - 8 lines] > > Roy Karl & Betty Schendel - 15 Dec 2006 05:03 GMT >That's a help, thanks. I wonder (since you said "currently") is there a >road map or a time table for adding that kind of thing? Not really, not as such. There's not much call for partitioning across machines unless you have a fancy high-speed interconnect, or unless they are so nearly independent that they might as well be separate tables. Then, to make use of it properly you need a lot of Star-like machinery on top to put the pieces together.
I don't see this happening unless someone comes along with a proposal and eager hands to do the work. (A contribution would be entirely welcome, though.)
Karl
>> <samantha_fei...@yahoo.com> wrote in messagenews:1166137199.172782.249130@16g2000cwy.googlegroups.com... >> [quoted text clipped - 8 lines] >> > > Roy Jim Gramling - 15 Dec 2006 11:04 GMT Actually there is a way, in theory, kind-a /sort-of, to do this: you can create horizontally partitoned tables using Ingres Replicator and lookup tables.
For example, you can create a "logically distributed database" where data associated with sales/whatever in "NY" is stored on a machine in NY, and data from San Francisco is stored on a machine in SF. The caveat, as I recall, is that you have to have, somewhere, a "master" version of each table with data from all of the remote nodes. Whether it would be implementable, or a practical solution to your problem, will depend on your business case and other factors (such as whether all of your tables have unique keys, and whether your criteria for partitioning is easily lookup-able.
I just thought I would mention this possibility, as it might be enough of a solution for whatever problem you are looking at.
regards,
Jim Gramling Rio de Janeiro, Brazil
> >That's a help, thanks. I wonder (since you said "currently") is there a > >road map or a time table for adding that kind of thing? [quoted text clipped - 23 lines] > >> > > > Roy Paul Andrews - 15 Dec 2006 11:35 GMT > Hi - > > In Ingres 2006, is it possible to create a partitioned table such that > data is stored onto more than one physical machine? Hi,
There have already been a few responses about this, which I hope have helped.
I think the real question is what benefit do you believe such a facility would bring to you? What disadvantage do you perceive without this facility?
Paul
> sam > > _______________________________________________ > Info-ingres mailing list > Info-ingres@cariboulake.com > http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres samantha_fein19@yahoo.com - 15 Dec 2006 20:22 GMT Hi Paul,
Yes, the answers are very helpful - thanks to all. To answer your question. The advantage is for querying on big tables. Say you have a 10TB table, just a long, flat table. There isn't a lot of writing done on it, just queries. Imagine trying to configure hardware to run queries on it. If all 10TB on one machine, it's going to be very expensive to get any kind of performance.
But what if you could buy 20 computers which could each hold a 500 MB partition (like NYC, San Francisco, etc., like Jim was saying, but the computers are physically in the same room with a fast connection), and a 21st machine which divided up the work (and brought it back together - I think that's the part that is like "Ingres Star").
It would cost you something like USD 126,000 with two dual core processors, RAID 10 and sixteen gigs of memory on each computer (that's without the bulk discount ;). You would also buy enterprise support from Ingres, naturally.
What you get. 80 processors and 320 GB RAM running under 20 copies of Ingres. If the the central (Ingres Star) algorithm is good the overall speed would scale nicely.
I never got a quote on a single machine with 320 GB RAM and 80 processors but I think it must be a few million US dollars. And maybe it's modestly faster (and a whole lot QUIETER) than this daydream, but the price difference is a factor of ten or twenty (!)
So the answer to your question about the advantage of such a feature involves millions in savings. Finest sushi Hawaii has to offer. Mint Juleps at the Derby from a silver cup. You know the drill.
sam
> <samantha_fei...@yahoo.com> wrote in messagenews:1166137199.172782.249130@16g2000cwy.googlegroups.com... > [quoted text clipped - 18 lines] > > Info-ing...@cariboulake.com > >http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres weasel - 16 Dec 2006 08:51 GMT I think DATAllegro may have done all this sort of work already.
> Hi Paul, > [quoted text clipped - 53 lines] > > > Info-ing...@cariboulake.com > > >http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres Jean-Pierre Zuate - 16 Dec 2006 09:28 GMT Hello Sam (an other),
Very intersting mail, I follow you in that direction.
What you explain here is in the cluster scope, and what you say about saving money is, from my point of view, true. Cluster is a large subject and I think many people confuses the concept and the technology (system cluster and from the Ingres side things like VMS cluster or functionnality announced with Ingres2006).
What you also need to include here a knowledge layer about star, perhaps replicator if you want to put some high avaibility option in your consideration and for database administration. Dealing with 1 to 50 ingres installation is one thing if they are all the same (I did that in the past), dealing with 1 big system composed with 20 or more Inres installation is a other stuff.
I have in mind a french customer (french army) with a very big replicator configuration. They manage the distribution of information and its confidentiality with replicator and at the end they have a very complex scheme of replication. To deal each day with it, they have developed some particular houskeeping scripts and procedure and finaly it work "well". I say "weel" because if they can find a solution for each issue they have they must maintain people with a very good knowledge of their applicaton, Ingres, replicator, system, and something like a hacker spirit to deal with those complex issues (and they have some people like that, it is due to the fact they are working for the army and then are not thinking salary or career but only to the service they must provide to their "customers"). Saving money on hardware is one thing, saving money on all the project is an other thing. What if your system is sometime down for some unfindable reasons ? Or if a patch break something somewhere ? But even in a more simple configuration you can have issues like that ...
I already discuss of this kind of subject with many people, in some formal and less formal context, and few of them think it was doable et maintainable in a regular basis. My point of view is all of this is a question of discipline and rigour ... and tests. I see in my life only one customer doing some real tests with users doing sometimes the same job on 2 system : the production system and the next production system. This customer is a bank and due to the amount of money captured by the information system they have this heavy environment. All the other customers in the indusctry think first to the bill of hardware and software, some others of the complete bill (hard + soft + knowledge).
I hope you will acheive your project in the mood you explain and you'll find all the help you need with Ingres, his support (premium or not), prehap's this list and the future community site Inges announce. But sure I will read your post with the first interest, don't care. If I can help you of any way I will not hesitate ... (your 10TB table is too much interesting to miss a knowledge on this subject :-))
Kind regards, Jean-Pierre 15 Dec 2006 12:22:19 -0800, samantha_fein19@yahoo.com < samantha_fein19@yahoo.com>:
> Hi Paul, > [quoted text clipped - 59 lines] > Info-ingres@cariboulake.com > http://mailman.cariboulake.com/mailman/listinfo.py/info-ingres
 Signature Jean-Pierre Zuate La Fage Conseil +33(0)6 11 40 11 09 jpzuate@gmail.com
Roy Hann - 16 Dec 2006 09:52 GMT > Hi Paul, > [quoted text clipped - 4 lines] > queries on it. If all 10TB on one machine, it's going to be very > expensive to get any kind of performance. It's not clear that plain, "old-fashioned" Ingres rule-based partitioning alone wouldn't be a huge benefit to you if you spend a little time setting it up. Have you already concluded that would be no help to you?
But otherwise, you are describing a Datallegro DW appliance. Go to www.datallegro.com.
Roy
samantha_fein19@yahoo.com - 16 Dec 2006 19:45 GMT Thank you all the answers are interesting. I am looking now at the datallegro appliance. The "P-series" looks very close to the description I was giving. They are using faster drives but less RAM... Retail cost for hardware would be ... USD 8,000 for a similar setup. Lets compare the hardware price to the software price.
Hardware for 20 systems costs USD 160,000 and let's spend 10,000 on what they are calling the "master" node. (With our bulk discount, we are going to upgrade the RAM from 8GB to 12, how's that? ;)
Datallegro actually has their price for this setup (albeit with a mere 8GB RAM) here. http://www.datallegro.com/pdf/usa_pricing_options.pdf. By coincidence, they show a 10 TB example at the bottom. It is USD 850k. That's a very good price compared to a mainframe!
But still, I am paying that company USD 680,000 for the software. 850k is five times the hardware price. I'm sure they are brilliant & kind people, and I hope they get richer than the Queen. I'm not a big fortune 500 company though. To get this extra 680,000 I'll either have to turn to a life of crime, or try to get something like 10 people fired (to pay for my analysis project). Oh, I can think of the 10 people, to tell you the truth, but it's not really my style. ;)
In seriousness, most of the real work is open source. The Allegro company packages Linux and Ingres, absolutely huge accomplishments. They add modest value with this coordinator program. (Sure it's a lot of value to me, but it represents a whole lot less work than Ingres!)
The big question. Whether this kind of feature will have to be an integrated part of a serious database system in the near future, or if it can remain a specialized gizmo which can be sold for USD 680,000 to large companies.
Of course you know what I'm thinking: that this capability is a proper thing for a database system. **Data** is what databases are all about. We are not trying to maximize the usefulness of single "computer". We're trying to maximize the usefulness of a "data set".
This feature allows one to spend 1/10 to 1/20th the amount on hardware, to attain the same level of usefulness out of a 10TB dataset. Also the improvement goes up exponentially with the amount of data. If there is some other roadmap for getting a ten- or twenty-fold increase in the performance/price ratio for a data set, well, let's hear it! [If someone says "Wait for better hardware," they are disqualified.]
So what do you think (I'll step out and listen). Should the next generation of database systems be able to do this by themselves or should they rely on a USD 680,000 gizmo?
sam
> <samantha_fei...@yahoo.com> wrote in messagenews:1166214139.694499.169660@73g2000cwn.googlegroups.com... > [quoted text clipped - 12 lines] > > Roy Roy Hann - 16 Dec 2006 20:48 GMT > So what do you think (I'll step out and listen). Should the next > generation of database systems be able to do this by themselves or > should they rely on a USD 680,000 gizmo? Well of course as Scott Adams pointed out, we'd all like better stuff for free. But until the average DBMS offers these kinds of capabilities you have two choices: buy it from someone who's invested the R&D to deliver something today that they guarantee works; or roll your own that will be ready in a couple of years (after spending the money to do it, and doing the testing, and committing to the lifetime of support).
If the business need exists today, and the money required is available, I struggle to see how the shrink-wrapped solution wouldn't be a no-brainer. You are always going to pay a huge premium to be ahead of the technological pack. But not all businesses see a justification for being there. Maybe yours is one. In which case, sure, wait a few years until it becomes standard.
Or, to repeat what I said earlier, based on your sketchy (and possibly hypothetical?) problem description I wonder if the standard rule-based partitioning that is already available for free in Ingres would completely solve your problem? Have you looked at it? Are you confident it won't work for you? What requirements does is fail to satisfy? (For a quick overview take a look at http://www.rationalcommerce.com/papers/partition.htm.)
Roy
Karl & Betty Schendel - 17 Dec 2006 03:56 GMT >[snip] >In seriousness, most of the real work is open source. The Allegro >company packages Linux and Ingres, absolutely huge accomplishments. >They add modest value with this coordinator program. (Sure it's a lot >of value to me, but it represents a whole lot less work than Ingres!) Permit me to differ on that one. The coordinator, as you call it, can be simple only if you are doing simple concatenation of query results. Relatively few queries are like that. Throw even a simple aggregation into the problem query and now you have to combine the answers at the master, which means that the master now has to understand that it has to do it, which means that it has to parse the SQL. Which is nontrivial.
And now throw in the ability to do joins across the appliance, and do it in an efficient manner, and things get *real* complicated real fast. Union, anyone? Outer joins across partitions? It all needs the master to understand the query enough to run it across the entire box.
I could restate it this way, to perhaps give you an idea: the master does roughly what Ingres/Star does, but Star does not have nearly the capability to make this thing work.
>The big question. Whether this kind of feature will have to be an >integrated part of a serious database system in the near future, or if >it can remain a specialized gizmo which can be sold for USD 680,000 to >large companies. I rather suspect that something kinda sorta like it is already part of Oracle and/or DB2 under the guise of various clustering options. I also rather suspect that the license fees on the proposed hardware will end up eating most of your $680K, and they are not nearly as easy to use, and there are various other disadvantages because they have to be general case, and you STILL don't have any hardware.
Oh, and by the way, if you try to build your 20-machine setup for $160K using a couple of 500 Gb disks each, you'll not get anything remotely like the performance of a P-series, nor a C-series either. The Datallegro appliance is built on 12 WD Raptors per slave, with some other engineering thrown in. Your two-drive-per-node machine will give you something in the vicinity of 100-150 Mb/sec per machine which is less than 1/8 of the Datallegro speeds, and that's best case (simple dual-threaded scan). Run a real query on that machine and I guarantee you that it will go up in smoke because there's no place to put any work space that won't contend with the two drives. Run multiple queries concurrently, more contention.
I would be willing to bet that you would get better real-world, real-query performance out of a 10 Tb C-series machine (which by the way costs WAY less than $850K or $650K or whatever), than you would get out of a $8k-per-box plumber's special.
>This feature allows one to spend 1/10 to 1/20th the amount on hardware, >to attain the same level of usefulness out of a 10TB dataset. As I have attempted to point out, if you had that feature alone, you could spend 1/10 to 1/20 on hardware to be able to do a few very simple things on a 10 Tb dataset. You're fooling yourself if you think it gives you real query capability on that dataset. (Unless we re-implement all of the Datallegro master logic into Ingres, which would be a multi-year effort for no real value to Datallegro.)
And we haven't even started talking about little things like how are you going to load it, or update it, or maintain it. Or what you're going to use for an interconnect, and how you're going to ship multi-Gb partial query results back and forth across the nodes or master in less than hours. Etc, etc, etc.
 Signature Karl R. Schendel, Jr. schendel@kbcomputer.com DATAllegro Inc Ingres DBMS Server Development
|
|
|