Database Forum / General DB Topics / DB Theory / June 2005
Proving an Upgrade is Possible
|
|
Thread rating:  |
Kenneth Downs - 02 Jun 2005 04:49 GMT I would love to be able to determine with certainty if a certain operation can be validated before it is performed. This is the operation of changing the state of a database by an upgrade.
Let's define some terms.
First, the state of the database before the upgrade is S(1). The state after the upgrade will be S(2). The operation we wish to pre-validate is S(1)->S(2). Let's get back to this after looking at other stuff.
But before we validate the state change, we should validate that the proposed new meta-State will be valid. The database's meta-State is described in normalized tables (real 1NF, no lists please). The meta-State before the upgrade is mS(1), which we assume to be valid. We must validate the proposed new meta-State, mS(2). But this easy because it is in normalized tables, and we can use integrity and referential constraints to ensure it is valid.
So having validated our destination mS(2), we should also now determine that the change in meta-State is valid. This would be to prevent things like converting a column from char to int (or perhaps defining what should happen if one were to do that). Since mS(1) is a set of tables, and mS(2) is a set of the same structure tables, the operation mS(1)->mS(2) can be described in a third set of tables, delta-meta-State, or dmS. We obtain this set of tables by outer joining all corresponding tables in mS(1) and mS(2). We can then have a fourth set of tables that can be used to do referential checks on dmS and determine if it is valid.
The point that is bugging me is that, if we can prove mS(2) is valid, and that dmS is valid, does that imply that dS ( S(1)->S(2) ) is valid? Can it be proven to be valid without producing database S(2) and then database dS, because such a task is insane in actual practice.
So I guess I am looking for a theorem that says, "Given S(1), S(2) and mS(1) and mS(2), where S(1) mS(1), ms(2) and dmS are all known to be valid, it can be safely concluded that dS and S(2) will be valid". Wondering if anybody has seen a rigorous mathematical treatment of this subject, and what the Google terms might be.
 Signature Kenneth Downs Secure Data Software, Inc. (Ken)nneth@(Sec)ure(Dat)a(.com)
Jan Hidders - 04 Jun 2005 13:45 GMT > I would love to be able to determine with certainty if a certain operation > can be validated before it is performed. This is the operation of changing > the state of a database by an upgrade. An upgrade of what? The database? The schema of the database? The DBMS software? What are you talking about here? And what on earth is a "meta-state"?
-- Jan Hidders
Kenneth Downs - 04 Jun 2005 15:35 GMT >> I would love to be able to determine with certainty if a certain >> operation [quoted text clipped - 6 lines] > > -- Jan Hidders An upgrade of a database, not the DBMS software. In the trivial case just changes to the tables and columns, but also at very least unique and referential constraints.
The meta-state is all possible states described by the data dictionary (or meta-data if you prefer).
If you have a data dictionary containing the meta data that describes the database, an upgrade = a change to the data dicitonary. So just as a user transaction changes the state of a database, an upgrade changes the meta data, or the meta state.
My goal is to always know completely whether all structure changes to a db are valid.
Here is a case of where you can prove an upgrade will fail, or at least you cannot prove it will succeed. If the WIDGETS table currently has a unique constraint on COL_A, COL_B, and COL_C, we know that an upgrade (or any change to the database) can fail if the new version has the unique constraint only on COL_A and COL_B. While it may succeed in some cases, it will fail in others.
How would you know in advance though that this upgrade could not work? Well, its simple enough. You have the current state of the database described in tables, meta-State 1, or mS(1). You have the destination state described in other tables, mS(2). You join them together and look at everything that changes, such as key definitions. The dropping of a column in a key is not allowed, so the upgrade log puts out an error and refuses to run.
 Signature Kenneth Downs Secure Data Software, Inc. (Ken)nneth@(Sec)ure(Dat)a(.com)
Jan Hidders - 04 Jun 2005 23:38 GMT > If you have a data dictionary containing the meta data that describes the > database, an upgrade = a change to the data dicitonary. So just as a user > transaction changes the state of a database, an upgrade changes the meta > data, or the meta state. Ok. So you are talking about updates to the database schema.
> My goal is to always know completely whether all structure changes to a db > are valid. When you say "structure changes" you don't mean just changes to the structure, do you? You mean changes to both the database structure and the database constraints, right? Although below it seems you in fact exclude changes to the structure and only consider changes to the constraints.
> Here is a case of where you can prove an upgrade will fail, or at least you > cannot prove it will succeed. If the WIDGETS table currently has a unique > constraint on COL_A, COL_B, and COL_C, we know that an upgrade (or any > change to the database) can fail if the new version has the unique > constraint only on COL_A and COL_B. While it may succeed in some cases, it > will fail in others. Ok, so I suspect are you asking the following:
"Given the old database schema and a few updates to its constraints (but not its structure) is it possible to algorithmically decide (or mathematically prove) whether all instances of the old schema are also instances of the new schema?"
I leave it to you whether it is "algorithmically decide" or "mathematically prove".
Am I close?
-- Jan Hidders
Kenneth Downs - 05 Jun 2005 14:52 GMT >> If you have a data dictionary containing the meta data that describes the >> database, an upgrade = a change to the data dicitonary. So just as a [quoted text clipped - 11 lines] > exclude changes to the structure and only consider changes to the > constraints. The example considers only a constraint, to keep it simple. The OP states structure and constraints. Also included in my systems are automated columns and tables, so they must be included alos.
>> Here is a case of where you can prove an upgrade will fail, or at least >> you [quoted text clipped - 15 lines] > > Am I close? You are on target, except that constraints are included, and also automation (columns like price_extended = price * qty). So while we are at it, this means things like NULL/NOT NULL rules, DEFAULT values and so forth.
Another way to state the question is: "What structure of meta-data allows us to specify completely a database of arbitrary complexity and also allows us to demonstrate by analysis that any given change in the meta data is valid or not valid".
Methinks the structure of the meta-data would reduce the truth to a data validity issue (an invalid transform would fail a validitity check in the meta data), which is closer to an algorithm than to a mathematical proof. Yet, I have the strong intuitive sense that a before-hand mathematical proof could lead to the table structure.
 Signature Kenneth Downs Secure Data Software, Inc. (Ken)nneth@(Sec)ure(Dat)a(.com)
Jan Hidders - 05 Jun 2005 18:51 GMT >>Ok, so I suspect are you asking the following: >> [quoted text clipped - 9 lines] > (columns like price_extended = price * qty). So while we are at it, this > means things like NULL/NOT NULL rules, DEFAULT values and so forth. What do you mean by "constraints are included"? Those were already included in my formulation. Since you only care about whether instances are valid or not, automation can in this case be regarded as a special case of constraints. Default values are completely irrelevant for your question because they do not change the set of allowed valid instances.
Or do you want to also consider updates that change the structure of the schema, because in that case my formulation would be completely off track.
> Another way to state the question is: "What structure of meta-data allows > us to specify completely a database of arbitrary complexity and also allows > us to demonstrate by analysis that any given change in the meta data is > valid or not valid". And when exactly is a change to the meta-data "valid"? What does the specification of the meta-data look like? What do you mean by "arbitrary complexity"? What do you mean by "demonstrate by analysis"?
> Methinks the structure of the meta-data would reduce the truth to a data > validity issue (an invalid transform would fail a validitity check in the > meta data), which is closer to an algorithm than to a mathematical proof. > Yet, I have the strong intuitive sense that a before-hand mathematical > proof could lead to the table structure. Whoah, wait right there! You want to derive the table structure? That's a completely different question! Since you are still a bit sloppy with your terminology and still have not really given a mathematically exact definition of the theorem (which is mathematical in nature) that you want to discuss, I suggest that we first keep things simple and start with the case where the structure of the tables is fixed. Note that I'm not even talking about the answer of the problem here, just about formulating the problem. That's always the first problem you need to solve. :-)
-- Jan Hidders
Kenneth Downs - 06 Jun 2005 14:29 GMT >>>Ok, so I suspect are you asking the following: >>> [quoted text clipped - 16 lines] > case of constraints. Default values are completely irrelevant for your > question because they do not change the set of allowed valid instances. To provide an analytic and deterministic answer to the viability of any db change, you must consider constraints, null rules, and defaults.
Consider this case. Add two columns to a table, COL_A and COL_B, with the constraint that COL_A < COL_B. One has a default value, the other does not, and they may not be null. What on Earth do you do to satisfy these requirements in the rows that already exist in the table?
I am not here asking for the answer to that question, anybody in this forum can come up with case-by-case answers to puzzles. What I am probing is the ability to determine the answer by analysis in the general case. This way a db analsyst punches in what they want to happen, and the system tells them why it will not work.
> Or do you want to also consider updates that change the structure of the > schema, because in that case my formulation would be completely off track. I must not understand your formulation because I thought we were talking about the same thing: analyzing in advance a change to a database that might include anything the db server has implemented: tables, columns, constraints, null rules, and defaults.
Perhaps we should keep automation out of it because that is something I do that is generally considered heresy and it tends to divert discussions.
>> Another way to state the question is: "What structure of meta-data >> allows us to specify completely a database of arbitrary complexity and [quoted text clipped - 4 lines] > specification of the meta-data look like? What do you mean by "arbitrary > complexity"? What do you mean by "demonstrate by analysis"? The whole point is to arrive at a description of the meta-data that allows the analytic prediction of the validity of a state change. The meta-data is in tables, so it is valid if it satisfies the constraints on those tables.
Arbitrary complexity means the system you are describing in meta-data has no limits on complexity, either in number of tables, columns, constraints, or their interaction. Put another way, it is not a "toy" system that breaks down when you try to make real-world mission-critical stuff.
To demonstrate by analysis is to examine the meta-data and know if the upgrade will succeed. I did not choose your term "algorithmically decide" because it is not a step-by-step code thing, the validity or invalidity of the state change should be captured in the state of the meta-data. For the same reason I did not choose your term "mathematically prove". Demonstrate by Analysis seems closer to the mark because we want to examine (or analyze) the meta-data.
 Signature Kenneth Downs Secure Data Software, Inc. (Ken)nneth@(Sec)ure(Dat)a(.com)
Jan Hidders - 07 Jun 2005 22:36 GMT >>>>Ok, so I suspect are you asking the following: >>>> [quoted text clipped - 24 lines] > not, and they may not be null. What on Earth do you do to satisfy these > requirements in the rows that already exist in the table? You don't. In this case the specification of the schema update is incomplete because the new instance for the new schema is not completely determined. There are actually two problems that you would like to be able to decide in advance. The first is whether the new instance is actually fully determined or not. The second is (presuming the first is answered positively) whether the new instance always is a valid instance of the new schema.
To what extent these problems are solvable or not depends upon a lot of details such as if you consider each schema update as a sequence of atomic updates or as one big update, but also the nature of the constraints that are allowed or the type of queries that may be used for derived columns/tables. So until you give an idea of what your update language and its semantics are, how the constraint language looks what type of queries you allow for derived data it's hard to give a meaningful answer to your question.
> upgrade will succeed. I did not choose your term "algorithmically decide" > because it is not a step-by-step code thing, the validity or invalidity of > the state change should be captured in the state of the meta-data. For the > same reason I did not choose your term "mathematically prove". Demonstrate > by Analysis seems closer to the mark because we want to examine (or > analyze) the meta-data. I suspect that what I describe as "mathematically prove" is the same as your "analyze". It simply means using some kind of formalism (set theory, logic, calculus, whatever) to derive true statements about the formal object you are studying. Note that for certain problems for which there is no finite axiomatization this is in a certain sense sometimes not possible.
-- Jan Hidders
Kenneth Downs - 08 Jun 2005 19:19 GMT >>>>>Ok, so I suspect are you asking the following: >>>>> [quoted text clipped - 33 lines] > answered positively) whether the new instance always is a valid instance > of the new schema. Exactly.
> To what extent these problems are solvable or not depends upon a lot of > details such as if you consider each schema update as a sequence of [quoted text clipped - 4 lines] > type of queries you allow for derived data it's hard to give a > meaningful answer to your question. Here is our current specification for database specifications:
http://www.secdat.com/dev/androspec.html
It is pretty thorough in terms of what it can define, but it is still lacking in the topic of this thread: fully and completely determining that an upgrade is possible. I usually start a thread on something here before tackling a problem.
>> upgrade will succeed. I did not choose your term "algorithmically >> decide" because it is not a step-by-step code thing, the validity or [quoted text clipped - 10 lines] > there is no finite axiomatization this is in a certain sense sometimes > not possible. Methinks we are in fact saying the same thing. My final word on it would be that we cast the "mathematical proof" of the validity of the upgrade to the act of examining some meta-data tables for conventional stuff like unique and referential violations.
The question of infinite possibilities is tough. You don't have to resort to Godel to know that any set of features will be unsatisfactory for all possible cases, but you can at least prove that any operation involving those features is valid or not valid. If the primitive operations are powerful enough in combination, you can safely cast all problems in terms of those primitives.
 Signature Kenneth Downs Secure Data Software, Inc. (Ken)nneth@(Sec)ure(Dat)a(.com)
mountain man - 05 Jun 2005 08:05 GMT > My goal is to always know completely whether all structure changes to a db > are valid. [quoted text clipped - 7 lines] > it > will fail in others. Here it seems as if you are relaxing a constraint. I would have thought the relaxation of constraints should always be successful, whereas the creation of constrainst may be problematic.
Will not the determination of success of failure ultimately also resolve to the state of the data (in addition to the schema and its constraints, etc)?
Achieving your stated goal relies upon an assumption about the data, that *the_data* will be amenable to the structural changes imposed. What if it isn't?
Only a test-conversion will tell you that. This is not to say that "proving the schema upgrade is possible" ---- at the schema level alone ---- is not a worthwhile step in the upgrade path, as in the end you need to know about any failure, hopefully in advance.
 Signature Pete Brown IT Managers & Engineers Falls Creek Australia www.mountainman.com.au/software
Kenneth Downs - 05 Jun 2005 14:46 GMT >> My goal is to always know completely whether all structure changes to a >> db are valid. [quoted text clipped - 12 lines] > should always be successful, whereas the creation > of constrainst may be problematic. Ooops, I gave the example in reverse, you are correct.
> Will not the determination of success of failure ultimately also > resolve to the state of the data (in addition to the schema and > its constraints, etc)? Yes, this is what makes it not possible to prove in advance that it will work. You don't want features in the product that are known to be fallible before they even go out the door.
> Achieving your stated goal relies upon an assumption > about the data, that *the_data* will be amenable to > the structural changes imposed. What if it isn't? Again, the point it is to know it in the dev shop and not commit changes that will not work. The original OP is about hwo to always know whether or not the proposed change will work.
> Only a test-conversion will tell you that. This is > not to say that "proving the schema upgrade is > possible" ---- at the schema level alone ---- is > not a worthwhile step in the upgrade path, as > in the end you need to know about any failure, > hopefully in advance. Not true. In the example above, I showed how you could know before the conversion that it would fail, without going through the conversion.
I am seeking the structure of meta data that would offer this ability in the general case, that the feature set is complete enough to describe any application of arbitrary complexity, and that any feature in a particular case can be known to be possible by analysis before attempting the upgrade operation.
 Signature Kenneth Downs Secure Data Software, Inc. (Ken)nneth@(Sec)ure(Dat)a(.com)
mountain man - 06 Jun 2005 00:57 GMT >> Only a test-conversion will tell you that. This is >> not to say that "proving the schema upgrade is [quoted text clipped - 12 lines] > case can be known to be possible by analysis before attempting the upgrade > operation. It appears that you are trying to develop a generic process whereby the consistency/validity of schema changes can be tested at the schema level, before the data is dealt with. Essentially, a schematic-change exception check between the old and the new and proposed elements of schema change.
Is this correct?
The testing with any live data happens at a later stage. Is this correct?
 Signature Pete Brown Falls Creek OZ www.mountainman.com.au
Kenneth Downs - 06 Jun 2005 03:26 GMT >>> Only a test-conversion will tell you that. This is >>> not to say that "proving the schema upgrade is [quoted text clipped - 20 lines] > > Is this correct? Yes.
> The testing with any live data happens at a later stage. > Is this correct? No, I want this to be included in the analysis of the schema change, so it checks both:
mS(2): the destination schema (or meta-State) Delta-mS: the change in schema (or change in meta-State) Delta-S: Change in state of user data
 Signature Kenneth Downs Secure Data Software, Inc. (Ken)nneth@(Sec)ure(Dat)a(.com)
|
|
|