Every now and then, I come across a IT-project that stores its data in
a generic data model. Such data models basically consist of 4 tables:
Entities, Relations, Attributes and Values. The Entities table contains
a record for each table in the conceptual data model, Attributes a
record for each column, etc.
Invariably, the choice for such a data model is defended by the
argumentation that new conceptual tables and columns can be added
without modification of the data model.
This all sounds very nice, but I always have the idea that this comes
at a price that may well be higher then the benefits it offers.
So, what I am looking for is a discussion, be it on line or in
literature, about the pro and cons, as well as some list of
recommendations to help me decide if such a model would be useful. Can
anybody recommend a source of information?
Roel Schreurs
The Netherlands
David Portas - 02 Sep 2005 12:21 GMT
> the choice for such a data model is defended by the
> argumentation that new conceptual tables and columns can be added
> without modification of the data model.
Why would that be an advantage? Is it because your RDBMS* is hard to
use or because the developers lack expertise. In that case the obvious
solution is to get new software or new staff.
A common excuse for the design you've described is that the business
doesn't know or doesn't care to define the data requirements up front
and wants a "cheap" way to support metadata that is "user-defined" at
some point in the future. The problem is that most of those users will
lack the expertise, the time, or the inclination to do a proper job of
designing a data model. The user-defined approach therefore costs more
in the longer term by reducing the data's validity and usefulness.
(* I'm assuming throughout that we are discussing Relational databases
or at least SQL databases, please tell me if I'm wrong)
David Portas
paul c - 02 Sep 2005 15:33 GMT
>>the choice for such a data model is defended by the
>>argumentation that new conceptual tables and columns can be added
[quoted text clipped - 16 lines]
>
> David Portas
sounds right to me. another point - if we're talking about a dbms, the
main 'con' is that it is basically a con game. sometimes it's sold to
the customer as a 'framework'. really a shell game where the the dbms's
ability to manipulate the customer's model of the customer data is
discarded and the dbms is demeaned becoming not much more than a
physical access method. so it's up to the 'framework' to invent new
verbs, catalog et cetera. whether these are complete, sufficient, safe,
portable and so forth is entirely up for grabs, which results in the
main 'pro' - "jobs for the boys" as the consultants' union might say.
meanwhile the customer believes, wrongly, that his investment in the
system software has long-term value but he is actually being screwed.
doubtful whether there is any "schema" that can be discussed logically
- maintenance becomes either a very crippled proposition with limited
choices for extension or every new function becomes a separate adhoc
effort. normalization and other advantages are thrown out the window
along with the baby and the bath water - might as well invent your own
dbms from scratch.
p
BobTheDataBaseBoy - 02 Sep 2005 19:02 GMT
>>the choice for such a data model is defended by the
>>argumentation that new conceptual tables and columns can be added
[quoted text clipped - 16 lines]
>
> David Portas
bill of material processors (now mostly referred to as ERP because they
include Job Control) use just this kind of data structure. there really
isn't much choice if the software producer wants to sell to more than
one company in one industry. if you're building a system for one
business, then you can build specific tables to the business process
definitions.
BTDBB
mAsterdam - 02 Sep 2005 12:40 GMT
schreurs_roel wrote:
> Every now and then, I come across a IT-project that stores its data in
> a generic data model. Such data models basically consist of 4 tables:
> Entities, Relations, Attributes and Values. The Entities table contains
> a record for each table in the conceptual data model, Attributes a
> record for each column, etc.
Sounds like a lousy DBMS built on top of a database.
> Invariably, the choice for such a data model is defended by the
> argumentation that new conceptual tables and columns can be added
> without modification of the data model.
Yep, now they can/have to change their 'conceptual' model
(I don't think conceptual is the appropriate term here) /without/ the
tools of the underlying database.
> This all sounds very nice, but I always have the idea that this comes
> at a price that may well be higher then the benefits it offers.
What are the benefits (not assuming ignorance)?
A try:
Package vendors who only need to control a very
limited subset of DBMS functionality can achieve
some DBMS-brand independence this way.
Roy Hann - 02 Sep 2005 12:50 GMT
> Every now and then, I come across a IT-project that stores its data in
> a generic data model. Such data models basically consist of 4 tables:
> Entities, Relations, Attributes and Values.
Me too. This brainstorm appears to be frequently re-discovered by the
medical research community in particular. (There is an extensive
literature, in refereed journals no less.)
There are innumerable reasons to object to it, and I look forward to seeing
some of them rehearsed here, for convenient reference. I will get the ball
rolling with my personal favourite: take the hardest query you ever wrote
against a "properly" designed database, and try re-writing it for one of
these designs. If that's too hard, try a simpler query. Keep relaxing the
difficulty until you can discover one you actually can rewrite. That is
will be the limit of what you can do with this approach.
Roy
Mikito Harakiri - 02 Sep 2005 18:26 GMT
> There are innumerable reasons to object to it, and I look forward to seeing
> some of them rehearsed here, for convenient reference. I will get the ball
[quoted text clipped - 3 lines]
> difficulty until you can discover one you actually can rewrite. That is
> will be the limit of what you can do with this approach.
This becomes the moot point with PIVOT/UNPIVOT operators. Translating
complex queries is straightforward: take EAV "schema", unpivot it into
the proper relational schema, query it, pivot the result back.
Next comes the performance issue. Clearly, in order for EAV to have
something as basic as multicolumn index, it has to materialize the EAV
data unpivoted into normal relations. Therefore, in principle, it is
possible to have the data stored as EAV with materialized views making
the data appeared as relations. There are a lot of techinicalities,
however. For example, materialized views have to be incrematelly
refreshable (refresh fast on commit). Then, what about concurrency? MVs
are notorious for low transaction throughput.
Marshall Spight - 02 Sep 2005 19:39 GMT
> This becomes the moot point with PIVOT/UNPIVOT operators. Translating
> complex queries is straightforward: take EAV "schema", unpivot it into
> the proper relational schema, query it, pivot the result back.
Interesting. I take it this is some sort of transposition, such
that the attribute name becomes a column name and the attribute
value becomes the value of that column. How do entities work
into it? Tables?
Can you point me at any further reading about this? I'm
*very* interested.
Marshall
vc - 02 Sep 2005 20:20 GMT
> Interesting. I take it this is some sort of transposition, such
> that the attribute name becomes a column name and the attribute
[quoted text clipped - 5 lines]
>
> Marshall
http://www.vldb.org/conf/2004/IND1P2.PDF
Yukon's implementation:
http://www.databasejournal.com/features/mssql/article.php/3521101
Marshall Spight - 02 Sep 2005 21:35 GMT
> > Can you point me at any further reading about this?
>
> http://www.vldb.org/conf/2004/IND1P2.PDF
Glancing over this paper, it occurs to me there are
significant issues around the ability to statically type
this operator.
Rows are not known statically; columns must be. To transform
row values into columns therefor raises some issues.
Marshall
VC - 02 Sep 2005 22:12 GMT
>> > Can you point me at any further reading about this?
>>
[quoted text clipped - 3 lines]
> significant issues around the ability to statically type
> this operator.
Right.
> Rows are not known statically; columns must be. To transform
> row values into columns therefor raises some issues.
To put it mildly.
> Marshall
Kenneth Downs - 02 Sep 2005 14:41 GMT
> Every now and then, I come across a IT-project that stores its data in
> a generic data model. Such data models basically consist of 4 tables:
[quoted text clipped - 14 lines]
> Roel Schreurs
> The Netherlands
To expand on what Roy said about queries, when a person makes this design
blunder they end up having to reinvent SQL. SQL was crafted to read and
write columns from tables. The E-A-V blunder transposes columns to rows,
now requiring a transpose step for every single column of every single
query. I don't even want to think about joins, group bys(!!). Ouch.
Much cheaper is to attack the real problem: the cost of changing table
structures and keeping code synchronized.

Signature
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec)ure(Dat)a(.com)
Tony Andrews - 02 Sep 2005 20:40 GMT
> Every now and then, I come across a IT-project that stores its data in
> a generic data model. Such data models basically consist of 4 tables:
[quoted text clipped - 11 lines]
> recommendations to help me decide if such a model would be useful. Can
> anybody recommend a source of information?
This "EAV" model is one of my pet hates, which I ranted about a little
here:
http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html
schreurs_roel@hotmail.com - 05 Sep 2005 09:06 GMT
Thank you all for your replies. I want to thank you, Tony,
specifically, since you pointed me to a discussion that I can refer to.
Also, now I know that the common name for this construction is EAV
(Entity-Attribute-Value), it has been easy to find out more about it.
The best statement I read about it, from a theoretical point of view,
is the following:
"basically, it's Extreme generalization saying that every entity is
made up of attributes, and that "made up" is the relationshsip, and
that attributes may be on several entites"
(http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=40371)
Interpreting it as follows: Since you force this relationship onto your
model, there is no room for any conceptial relationships to be enforced
anymore.
David Cressey - 08 Sep 2005 14:03 GMT
> Every now and then, I come across a IT-project that stores its data in
> a generic data model. Such data models basically consist of 4 tables:
> Entities, Relations, Attributes and Values. The Entities table contains
> a record for each table in the conceptual data model, Attributes a
> record for each column, etc.
In order to appreciate the down side of this implementaion, you need to
have the following experience:
Get hired as a contract database specialist, hired to generate a new set of
reports, from the existing database. That's a "lightweight" project,
right. After all, any idiot can use Crystal Reports, or some such thing as
that.So people have high expectations that you will promptly deliver
meaningful reports.
And that's your expectation too, "as soon as you can understand the data".
And that's when you discover the downside to this approach.
> Invariably, the choice for such a data model is defended by the
> argumentation that new conceptual tables and columns can be added
> without modification of the data model.
And this is the strongest argument against such an approach. One could
criticize this approach on a number of grounds, including but not limited to
performance, but that skirts the issue.
The real issue is that the documentation of this data is informal and often
unverbalized. The very flexibility that allows you to extend the universe
of discourse without formally incorporating new data definitions is the
biggest weakness of the approach.
If you want undocumented data, why do you want a database?