Database Forum / General DB Topics / DB Theory / July 2005
Property sheet, ad hoc, property page, flexible data
|
|
Thread rating:  |
AC - 19 Jul 2005 19:06 GMT Hello,
I am looking for a solution that hopefully someone here has tackled. I want to create "base" tables with fields that are intrinsic to the application. I also want to allow the customers using the application to be able to create their own fields so that they can easily customize the application. These new fields would either be added to the base table or to a separate table with a 1:1 relationship to the base table. The client application will render the controls based on the new fields in a grid to allow a flexible amount of fields to be added by customers.
How would you handle this?
Thanks in advance,
-- AC
Marshall Spight - 19 Jul 2005 22:13 GMT > I am looking for a solution that hopefully someone here has tackled. I want > to create "base" tables with fields that are intrinsic to the application. [quoted text clipped - 6 lines] > > How would you handle this? You can have whatever schema you want for your built-in, predefined fields; that's what schema is for. But the mistake most people make is thinking that user-defined fields somehow specify more schema; they don't. I hereby propose, in public for the first time, my formulation on the relationship between schema and semantics:
No schema, no semantics. Know schema, know semantics.
Okay, so it's bumper-sticker level material, but it gets the point across: if you don't have a schema for something, you don't know jack about it.
You don't know anything about what these user-defined fields mean. So don't make any schema for them; it's not the right tool. Instead, just make the schema for what you *do* have semantics for, which is "user-supplied name".
Marshall
Neo - 19 Jul 2005 22:27 GMT > No schema, no semantics. Based on your likely definition of schema, you are correct (but is your definition of schema correct).
AC - 20 Jul 2005 21:07 GMT >> I am looking for a solution that hopefully someone here has tackled. I >> want [quoted text clipped - 3 lines] >> create their own fields so that they can easily customize the >> application. <snip/>
>> How would you handle this? <snip/>
> No schema, no semantics. > Know schema, know semantics. > > Okay, so it's bumper-sticker level material, but it gets the > point across: if you don't have a schema for something, you > don't know jack about it. <snip/>
> Marshall If I load the ad hoc data into a grid what else would I need to know? This type of customization was part of Symantec ACT and some other contact management programs. How else do you easily allow customization of a program?
Regards
Neo - 19 Jul 2005 22:19 GMT > How would you handle this? These type of applications tend to be impractical in RM. One method might be to use generic modelling (ie T_Object, T_Property, T_Relation).
Neo - 19 Jul 2005 22:38 GMT > How would you handle this? One way is to design your original schema such that things, properties, values and relationships between things can be added as records in existing tables rather than creating additional tables.
AC - 20 Jul 2005 21:28 GMT >> How would you handle this? > > One way is to design your original schema such that things, properties, > values and relationships between things can be added as records in > existing tables rather than creating additional tables. I will have to think about this. Do you have links to this type of schema?
Thanks
mAsterdam - 20 Jul 2005 00:11 GMT > ...I want to create "base" tables with fields that are > intrinsic to the application. [quoted text clipped - 4 lines] > will render the controls based on the new fields in a grid to allow a > flexible amount of fields to be added by customers. ... If this group would have a FAQ, this Q would be in it. For now you'll have to do with googling the archives.
The short answer is: don't go that way, you'll get the worst of both worlds.
I won't go into this any deeper unless somebody shows a real start (and commitment to maintain it for say the first year) for a cdt FAQ (or unless you showed you googled and come up with some new aspects :-)
It's not a bad question, but I don't like going over the same arguments every time it pops up.
AC - 20 Jul 2005 21:07 GMT >> ...I want to create "base" tables with fields that are intrinsic to the >> application. I also want to allow the customers using the application to >> be able to create their own fields so that they can easily customize the >> application. <snip/>
> If this group would have a FAQ, this Q would be in it. > For now you'll have to do with googling the archives. > > The short answer is: don't go that way, you'll get the worst > of both worlds. <snip/>
> It's not a bad question, but I don't like going over the same arguments > every time it pops up. Can you post the search string I should use? I did google which is why I created the subject I did. So that others that were searching for this type of answer could find it. I don't know how others have described this scenario. I get a lot of pages on property sheets for programming controls. I know that this type of setup was used by Symantec ACT but didn't find references to that either.
Thanks in advance
dawn - 24 Jul 2005 02:33 GMT > >> ...I want to create "base" tables with fields that are intrinsic to the > >> application. I also want to allow the customers using the application to [quoted text clipped - 17 lines] > I know that this type of setup was used by Symantec ACT but didn't find > references to that either. I get a couple of hits on the string
user-defined-attributes
in cdt in googlegroups and I suspect some variations would also yield results.
The bottom line is that when someone who "does" databases is asked how to support user-defined attributes, they often tell the user that their requirements are not as important as the integrity of the database and that there is simply no way they can figure out how to design a solution that mitigates the risks within reason.
I have no problem letting users define attributes, but I don't often use a SQL-DBMS. Even if using a SQL-DBMS, there are multiple ways to try to meet the user requirements. You can assess the risk to your company if you were to toss in a few strings, ints, and floats with names like M1, M2, or userDef1, userDef2 or User_Def_1, User_Def_2 ... (just understand that some DBAs in the audience are certain I'm either stupid or ignorant when they read this). Place such attributes (with unique names) in relations that are likely spots for additions, document it well, and make UI's to collect the data.
Another approach is one where you store tag & value in two separate attributes so they can name a value and enter it, thereby mixing metadata with data. This is, of course, one of the "never do this" commandments, but, again, you need to assess the cost vs benefits and mitigate risks.
Your users will thank you. Scratch that as overly optimistic, but if you have determined that they and the company would be more productive with these attributes when you did your risk / cost assessment, then you just might have done something good.
Cheers! --dawn
AC - 24 Jul 2005 06:05 GMT >> >> ...I want to create "base" tables with fields that are intrinsic to >> >> the [quoted text clipped - 10 lines] >> > The short answer is: don't go that way, you'll get the worst >> > of both worlds. <snip/>
> I get a couple of hits on the string > > user-defined-attributes > > in cdt in googlegroups and I suspect some variations would also yield > results. <snip/>
> try to meet the user requirements. You can assess the risk to your > company if you were to toss in a few strings, ints, and floats with [quoted text clipped - 6 lines] > Another approach is one where you store tag & value in two separate > attributes so they can name a value and enter it, thereby mixing <snip/>
> Cheers! --dawn Thanks for the reply and the search string suggestion, Dawn. We considered tag/value storage but decided that it wouldn't work as well as defining each field. The problem with pre-creation of the fields (UserDef1, UserDef2, etc) is that you would be stuck with for example 5 text fields, 6 ints, 2 date_time, etc. This would be limitation that does not appear to have much benefit.
Regards --AC
Jonathan Leffler - 24 Jul 2005 08:19 GMT >>>>>...I want to create "base" tables with fields that are intrinsic to >>>>>the [quoted text clipped - 39 lines] > > Thanks for the reply and the search string suggestion, Dawn. Another search in groups.google.com that works reasonably well is:
attribute value universal group:comp.databases.theory
I have a feeling that I'm missing the most appropriate combination of words (but 'attribute value' alone is a bad search in this news group).
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/
David Cressey - 24 Jul 2005 14:10 GMT PMFJI.
The fundamental problem with user defined attributes is not a theoretical one. It's one of conflicts between two incompatible requirements of the users.
The first requirement is, as you outlined, that users want to be able to extend the schema to include data items that are not defined in the schema, and are defined in each user's head. Each user's extensions might be understood by the entire user community, but more commonly, each user's extensions are idiosyncratic to that user, and either meaningless or misstated in the view of the rest of the users.
The other requirement is that the data in the data base be sharable and usable by the entire user community in search of information. You can't turn data into information without the semantics. Of all the summaries of the problems of using data not included in the schema, I've never seen a more elegant summary than Marshall's:
No schema, no semantics. Know Schema, Know Semantics.
Now, here's the kicker: there's generally a delay time between the time the users demand the ability to add to the schema, and the time the same user community demands that the architects or administrators make sense out of the whole mess. By the time the mess has been made, and the semantics are encapsulated inside the skulls of hundreds of users, many of whom are former users, it's the devil's own work to analyze all that data, and to obtain the kind of outputs that "ought to be easy, because the data is in a database".
If you can somehow get the users to agree that they are responsible for sharing the semantics of the data with each other, and that the data architecture can't help them , except for data inside the schema, then, and only then, many of the suggestions already posted will do the job just fine
David Cressey - 24 Jul 2005 14:10 GMT PMFJI.
The fundamental problem with user defined attributes is not a theoretical one. It's one of conflicts between two incompatible requirements of the users.
The first requirement is, as you outlined, that users want to be able to extend the schema to include data items that are not defined in the schema, and are defined in each user's head. Each user's extensions might be understood by the entire user community, but more commonly, each user's extensions are idiosyncratic to that user, and either meaningless or misstated in the view of the rest of the users.
The other requirement is that the data in the data base be sharable and usable by the entire user community in search of information. You can't turn data into information without the semantics. Of all the summaries of the problems of using data not included in the schema, I've never seen a more elegant summary than Marshall's:
No schema, no semantics. Know Schema, Know Semantics.
Now, here's the kicker: there's generally a delay time between the time the users demand the ability to add to the schema, and the time the same user community demands that the architects or administrators make sense out of the whole mess. By the time the mess has been made, and the semantics are encapsulated inside the skulls of hundreds of users, many of whom are former users, it's the devil's own work to analyze all that data, and to obtain the kind of outputs that "ought to be easy, because the data is in a database".
If you can somehow get the users to agree that they are responsible for sharing the semantics of the data with each other, and that the data architecture can't help them , except for data inside the schema, then, and only then, many of the suggestions already posted will do the job just fine
Marshall Spight - 24 Jul 2005 18:14 GMT > PMFJI. > > The fundamental problem with user defined attributes is not a theoretical > one. It's one of conflicts between two incompatible requirements of the > users. > fine Strongly agree.
Some more problems:
Dawn describes two possible approaches. approach 1: add columns "UserDef1", "UserDef2" approach 2: additional table, with three columns: FK, Name, Value
The first one is just frightening. It's not even in first normal form, and it will exhaust its expanision capacity quickly.
The second one is better. Amusingly, I would describe that solution as being one that actually has well-defined semantics. The semantics are that the first column specifies the entity being described, the second is the name of the quality being described, and the third is the value for that quality. Not a *lot* of semantics there, but at least it's well-defined. Note that I would not agree with Dawn as describing this as mixing metadata and data, because the values in the name column *aren't* metadata, since we don't know what they mean, other than "they are names."
Another significant issue with these approaches is that they both do not have any way of allowing any structure in the user defined data; you're limited to exactly whatever domains you allow up front; probably int and string, say, but certainly not, say, three pairs of int/float tuples and a list of x, y points.
Marshall
dawn - 24 Jul 2005 21:49 GMT > > PMFJI. > > [quoted text clipped - 13 lines] > The first one is just frightening. It's not even in first normal form, > and it will exhaust its expanision capacity quickly. Yes, it is not a highly scalable approach, but does give more than the NOTHING that is often offered as an alternative to users. I'm a happier e-mail end users with gmail because it not only lets me tag an e-mail with multiple labels, but I can also click a star on e-mails as more information.
Users might be able to reuse the same single variable length character attribute for many purposes over the life of their system. These are basically "comment" attributes and you don't restrict the content of a comment. Name the comment "MultiPurposeShortComment" or "ProductNotation" (if an attribute of a product).
I'm guessing you have seen users request comments where a comment is not really what is needed, but they intend to put intelligent comments in the system so they can search on those. An attribute not named "comment" that is intended for this purpose would be better, right? Again, there are trade-offs and you need to assess the real requirements and design for flexibility for users as well as the system overall.
> The second one is better. Amusingly, I would describe that solution > as being one that actually has well-defined semantics. as would I :-)
> The semantics > are that the first column specifies the entity being described, the > second is the name of the quality being described, and the third is > the value for that quality. Not a *lot* of semantics there, but at > least it's well-defined. Note that I would not agree with Dawn as > describing this as mixing metadata and data, that is what I have heard it called. I would claim that we mix these a lot, so I do not have a problem with this. Metdata and data are all data.
> because the values in > the name column *aren't* metadata, since we don't know what they > mean, other than "they are names." Who is the "we" in that sentence? The user knows what they mean and can, perhaps, write reports that qualify rows based on the value related to a particular tag. So, SQL doesn't grab this metadata from the same place, but there is still data about data stored in as attribute data, instead of in the system catalog.
> Another significant issue with these approaches is that they both > do not have any way of allowing any structure in the user defined > data; Yes, that is both their charm and their downfall.
> you're limited to exactly whatever domains you allow up front; > probably int and string, say, but certainly not, say, three pairs of > int/float tuples and a list of x, y points. Or perhaps a relation-valued attribute ...
You could toss in a few of these notational attributes as strings and perhaps a date or two. I'd prefer they were all defined as strings and cast to whatever is needed at the time, but in a SQL-DBMS, that might not be as easy.
My point is that you need to assess the needs and associated risks with any design you might choose and also consider the risk of ignoring this particular perceived need. I suspect that too many DBAs are inclined to tell their users they can't have such an attribute without doing an analysis and without realizing that there are real benefits to notational fields of this nature. I would suggest that even when a user does not identify this need up front, it is still a good idea to identify if there is a place for such attributes. I didn't ask for that gmail star, but I really appreciate it and suspect that it increases my productivity.
Cheers! --dawn
Marshall Spight - 25 Jul 2005 04:19 GMT > > because the values in > > the name column *aren't* metadata, since we don't know what they > > mean, other than "they are names." > > Who is the "we" in that sentence? The programmer and the software he writes.
> The user knows what they mean and > can, perhaps, write reports that qualify rows based on the value > related to a particular tag. I disagree. The users (plural) *don't* know what they mean. The reason this doesn't work is because there is no canonical, homogenous user who knows what he knows. Instead there are vast seas of users. And while I would agree that one particular user probably knows what he means when he enters a particular name, we have no reason to believe that anyone else will agree. In fact, we have strong reason to believe that chaos will be the order of the day.
Tags/labels/what have you are lexical data merely; they have no semantics exactly because they have no schema. (Beyond the minimal "this is the name, and this is the value.")
> > you're limited to exactly whatever domains you allow up front; > > probably int and string, say, but certainly not, say, three pairs of > > int/float tuples and a list of x, y points. > > Or perhaps a relation-valued attribute ... Sure.
> I'd prefer they were all defined as strings and > cast to whatever is needed at the time, but in a SQL-DBMS, that might > not be as easy. That is a much worse way to do things. If evenything is a string and can be cast to anything as needed, then you can't reason about your code. It has no useful statically analyzable properties.
> I didn't ask for that > gmail star, but I really appreciate it and suspect that it increases my > productivity. I am glad you are enjoying gmail.
I would propose that the star is not an example of a used defined fields, but gmail labels are. They match your option 2) pretty well.
Marshall
David Cressey - 25 Jul 2005 13:56 GMT > > > because the values in > > > the name column *aren't* metadata, since we don't know what they [quoted text clipped - 3 lines] > > The programmer and the software he writes. I see two problems with the above:
First "we" is plural, and "the programmer" is singular. You may think I'm nitpicking, but I don't think I am. When several programmers want to understand what the data means, the understanding can be "joint" or "several", to use the legalese terms.
Second, you confuse the role of "data architect" with the role of "programmer". There's nothing to say that one person cannot do both roles, and do each of them well. But they aren't the same role. And the vast majority of programmers I've come in contact with have no knowledge of what the data "really means". Their level of interpretation of the data is strictly what's necessary in order to determine whether their programs are delivering correct results. That's it.
That's not a deep enough understanding of the data and its purpose to make data sharing feasible and cost effective among a wide and diverse user community.
A disturbing trend is this: there are an increasing number of database designers that are not fit to be data architects.
Marshall Spight - 25 Jul 2005 15:10 GMT > "Marshall Spight" <marshall.spight@gmail.com> wrote in message > > Second, you confuse the role of "data architect" with the role of > "programmer". There's nothing to say that one person cannot do both roles, > and do each of them well. But they aren't the same role. Fair enough.
> And the vast > majority of programmers I've come in contact with have no knowledge of what > the data "really means". Their level of interpretation of the data is > strictly what's necessary in order to determine whether their programs are > delivering correct results. That's it. Hmmm. Could you clarify the difference? The two things you describe here seem to me to be the same thing.
> That's not a deep enough understanding of the data and its purpose to make > data sharing feasible and cost effective among a wide and diverse user > community. How so?
> A disturbing trend is this: there are an increasing number of database > designers that are not fit to be data architects. I'll buy that!
Marshall
Gene Wirchenko - 26 Jul 2005 00:41 GMT [snip]
>> And the vast >> majority of programmers I've come in contact with have no knowledge of what >> the data "really means". Their level of interpretation of the data is >> strictly what's necessary in order to determine whether their programs are >> delivering correct results. That's it. That is, what the data means to the DBMS.
>Hmmm. Could you clarify the difference? The two things you describe >here seem to me to be the same thing. What does the data mean to the DBMS? It is merely the sum of all the constraints.
What does the data mean to the end user? What it means to the DBMS and more. A value of "Toronto" for City means more to the end user than just the characters.
[snip]
Sincerely,
Gene Wirchenko
David Cressey - 26 Jul 2005 15:39 GMT > > "Marshall Spight" <marshall.spight@gmail.com> wrote in message > > [quoted text clipped - 25 lines] > > Marshall I'm going to have to think long and hard about your questions. I'm hoping others will jump in.
All I can say off the top of my head is that data can have value (or worth, if you prefer) that is apart from its correctness.
There is some data in databases that is verifiably correct, but means nothing of much value.
AC - 25 Jul 2005 19:04 GMT > PMFJI. > [quoted text clipped - 5 lines] > extend the schema to include data items that are not defined in the > schema, <snip/>
That is the situation. Just like ACT (previously from Symantec) allowed you to have "property sheets" we would like to do something similar. The reason is that some items will be standard but there are too many possibilities with various business requirements to specify in advance all fields. This way there is flexibility without too much data bastardization. Ad hoc fields will show up in a grid view only.
> The other requirement is that the data in the data base be sharable and > usable by the entire user community in search of information. You can't <snip/> This is not a requirement or expectation. Base fields would be shareable but ad hoc fields would of course not be able to be shared. There is no expectation for sharing in the same way that two companies that used ACT and merged could not expect a 1:1 matching of their "property sheet" data.
Has anyone done this before? I know it's not common as I have not worked on a project with this scenario before however I know that with Salesforce Automation packages it is done.
Regards, --AC
|
|
|