Non-text database theory

Rune Allnor - 05 Sep 2008 10:26 GMT
Hi all.

This might be off topic for this group; if so please direct me to a
more appropriate group.

I have 20 years of programming experience (hobby / personal scale) and
am getting my feet wet with databases for the first time. The project
at hand needs a database to handle large amounts of data. The data are
measured by sonar and amount to hundreds of GB, so one would prefer to
save the data in some binary format to save time on the text <->
binary conversions.

The textbooks I have found on database theory deal solely with text
data, i.e. data that are stored as tables in text files, which I
suppose is OK for educational purposes.

1) Where can I find material on 'real-life' databases which deal with
   the storage and handling of binary data?
2) Are there database implementations which are better suited for my
   application than others? I would like to keep the application
   platform independent, and use C++ as my programming language.

Rune
jefftyzzer - 05 Sep 2008 18:24 GMT
> Hi all.
>
[quoted text clipped - 25 lines]
>
> Rune

Hmmm...there's virtually no limit to the kinds of data a modern RDBMS
can store, particularly with the extended type capabilities that came
along with the object-relational wave of the last decade. The RMoD
certainly doesn't circumscribe (data) types.

Anyway, although it sounds like your textbook is using textual
attributes in its examples, RDBMSs are quite capable of efficiently
storing and allowing you to manipulate binary data. Are you speaking
of sonar *images* here, or some other, more fine-grained, measurement?

As to recommended books, I think for what you're working on, stepping
away from theory books (not in general, mind you!) and looking for
books that are specific to the RDBMS you're working with (which is, by
the way, what?) would take you farther on this specific question.

For books that are more theory-oriented, perhaps _Databases, Types and
the Relational Model_ by Date would be of interest to you.

Regards,

--Jeff
Rune Allnor - 05 Sep 2008 20:12 GMT
> > Hi all.
>
[quoted text clipped - 30 lines]
> along with the object-relational wave of the last decade. The RMoD
> certainly doesn't circumscribe (data) types.

I'm a total neophyte on the subject; acronyms are foreign to me.
RDBMS = Relational DataBase Management System...?
RMoD = ?

> Anyway, although it sounds like your textbook is using textual
> attributes in its examples, RDBMSs are quite capable of efficiently
> storing and allowing you to manipulate binary data. Are you speaking
> of sonar *images* here, or some other, more fine-grained, measurement?

It's anything and everything. Lots of data, measurements and
information flowing all over the place; keeping track is a
full-time job. Literally.

> As to recommended books, I think for what you're working on, stepping
> away from theory books (not in general, mind you!) and looking for
> books that are specific to the RDBMS you're working with (which is, by
> the way, what?) would take you farther on this specific question.

Just looking at options for now. I know what needs to be done; the
question is whether there are good or bad ways of doing it.

> For books that are more theory-oriented, perhaps _Databases, Types and
> the Relational Model_ by Date would be of interest to you.

Thanks.

Rune
jefftyzzer - 05 Sep 2008 20:21 GMT
> > > Hi all.
>
[quoted text clipped - 58 lines]
>
> Rune

Yep, you got "RDBMS" right, and "RMoD" = the Relational Model of Data,
the body of theory collectively undergirding RDBMSs (note that the
vendors' fidelity to the model varies, but that's a story for another
day).

--Jeff
Ed Prochak - 22 Sep 2008 21:05 GMT
> > > Hi all.
>
[quoted text clipped - 58 lines]
>
> Rune

For specific products I was going to suggest PROGRESS, but it looks
like they abandoned their database product. (I guess that happens when
you haven't looked at a product for over 10 years.)

Still it may be useful to search some of the vendor sites to see if
they have something that does what you want.
HTH,
 ed
Tim X - 06 Sep 2008 02:53 GMT
> Hi all.
>
[quoted text clipped - 23 lines]
> platform
>    independent, and use C++ as my programming language.

Many databases have the concept of a 'blob' (binary large object),
which you could use. However, in most cases it isn't going to gain you
much.

The data storage and retrieval aspects of a database are only part of
the benefits of a DBMS. The real power comes from the ability to
retrieve sets of data based on various criteria or attributes. However,
with binary data, there is often little in the way of attributes that
can be easily identified in the data itself - after all, it's just
sequences of 1s and 0s. In fact, with binary data, storing it in the
database can actually complicate things because, more often than not, you
will use other stand-alone applications to process the data. If it's in
the database, you will now need to create some interface between the
database and the application that processes the data. This could be as
easy as having the database dump the data into a disk file that the
application can then read, but then what has the database actually given you?

In most cases, however, you do have meta information about the data. This
could be things like the date and time the data was obtained, the
location, interesting characteristics, data size etc. This is the data I would
store in the database, together with information about where the file is
stored in the filesystem. The database could be responsible for
generating unique filenames, which is very useful if you have lots of
them, as you don't have to think about it and you can use names that are
less user friendly, such as just sequential numbers. The DB might
even manage a special filesystem hierarchy, grouping files into
directories based on certain meta data attributes.
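
A minimal sketch of that idea in Python with the standard sqlite3 module. The table name `recording`, its columns, the `register` helper and the `<location>/<date>/<id>.bin` layout are all made up for illustration, and `location` is assumed to be safe to use as a directory name:

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: one row of meta data per binary file on disk.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recording (
    id          INTEGER PRIMARY KEY,  -- also drives the generated file name
    recorded_at TEXT    NOT NULL,     -- ISO 8601 timestamp
    location    TEXT    NOT NULL,
    size_bytes  INTEGER NOT NULL,
    rel_path    TEXT    UNIQUE        -- where the binary file lives
)
"""

def register(conn, root, recorded_at, location, payload):
    """Write payload under root, record its meta data, return the file path."""
    cur = conn.execute(
        "INSERT INTO recording (recorded_at, location, size_bytes) "
        "VALUES (?, ?, ?)",
        (recorded_at, location, len(payload)),
    )
    # Sequential and not user friendly, but unique: the row id is the name.
    # Files are grouped into directories by location and date.
    rel_path = Path(location) / recorded_at[:10] / f"{cur.lastrowid:08d}.bin"
    target = Path(root) / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    conn.execute(
        "UPDATE recording SET rel_path = ? WHERE id = ?",
        (rel_path.as_posix(), cur.lastrowid),
    )
    conn.commit()
    return target
```

The binary data never passes through the database here; only its description and location do.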

This would give you the best of both worlds: you can obtain lists
of data files from the database that represent data meeting certain
characteristics (e.g. all data from a particular location, date or time)
and, at the same time, you can use other data processing
applications on the data directly at the filesystem level, without
the additional DBMS layer (assuming the processing doesn't change meta
information stored in the database).
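
As a rough illustration of "obtaining lists of data files", here is a small self-contained Python/sqlite3 sketch. The table, the sample rows and the `files_for` helper are invented for the example; timestamps are assumed to be ISO 8601 strings, which sort correctly as plain text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recording (
        id          INTEGER PRIMARY KEY,
        recorded_at TEXT NOT NULL,
        location    TEXT NOT NULL,
        rel_path    TEXT NOT NULL UNIQUE
    )""")
conn.executemany(
    "INSERT INTO recording (recorded_at, location, rel_path) VALUES (?, ?, ?)",
    [
        ("2008-09-01T08:00:00", "fjord_a", "fjord_a/00000001.bin"),
        ("2008-09-03T09:30:00", "fjord_b", "fjord_b/00000002.bin"),
        ("2008-09-05T10:26:00", "fjord_a", "fjord_a/00000003.bin"),
    ],
)

def files_for(conn, location, start, end):
    """Return paths of files from one location recorded in [start, end)."""
    rows = conn.execute(
        "SELECT rel_path FROM recording "
        "WHERE location = ? AND recorded_at >= ? AND recorded_at < ? "
        "ORDER BY recorded_at",
        (location, start, end),
    )
    return [path for (path,) in rows]

# The resulting paths can be handed straight to any stand-alone tool.
paths = files_for(conn, "fjord_a", "2008-09-02", "2008-09-30")
```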

The other advantage of this approach is that you won't need one of the
larger commercial databases, such as Oracle or DB2. In fact, you could
probably use things like SQLite, MySQL or even Berkeley DB hashes.

HTH

Tim

Signature

tcross (at) rapttech dot com dot au

Rune Allnor - 06 Sep 2008 08:50 GMT
...
> In most cases however, you do have meta information about the data. This
> could be things like the date and time the data was obtained, the
[quoted text clipped - 6 lines]
> even manage a special filesystem hierarchy, grouping files into
> directories based on certain meta data attributes.

Your description matches what I want. I was not sufficiently familiar
with the terminology to realize that what I was asking for was not
a database as such.

This must have been done thousands of times already. I don't want to
invent wheels, so is there a description around on how to do these
things? One question which immediately comes to mind is how to
protect the logged files from being tampered with.

> This would give you the best of both worlds in that you can obtain lists
> of data files from the database that represent data that meet certain
[quoted text clipped - 7 lines]
> larger commercial databases, such as Oracle or DB2. In fact, you could
> probably use things like sql lite, mysql or even Berkley DB hashes.

Ah. Just what I want. Thanks for clarifying the big picture.

Rune
Seun Osewa - 06 Sep 2008 17:28 GMT
> This must have been done thousands of times already. I don't want to
> invent wheels, so is there a description around on how to do these
> things? One question which immediately comes to mind is how to
> protect
> the logged files from being tampered with.

This is not a database question. Some ideas:
- Password protect the computer on which they are stored.
- Encrypt them with openssl and store the keys with you.
- Hash the files (SHA1) so you'll know when they've been changed.
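
A sketch of the hashing idea using Python's standard hashlib. The function names are made up, and the default here is SHA-256 rather than the SHA1 mentioned above (pass `algorithm="sha1"` to get that instead). Note that a hash only detects a change; it does not prevent one, and the recorded digests must themselves be kept somewhere the files' writers cannot reach:

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def is_unchanged(path, recorded_digest, algorithm="sha256"):
    """Compare the file's current digest with the one recorded at logging time."""
    return file_digest(path, algorithm) == recorded_digest
```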
Tim X - 07 Sep 2008 04:12 GMT
> ...
>> In most cases however, you do have meta information about the data. This
[quoted text clipped - 17 lines]
> protect
> the logged files from being tampered with.

The answer depends on the OS you're on. For example, if we are talking
about Linux or one of the other members of the *nix family, I would
probably handle this by creating a specific user and group for the
application. You can then control access via normal OS access controls,
such as adding users to the group, using umask to ensure file/directory
permissions are set appropriately etc. Under Windows and other
platforms you have similar functionality, but I'm not familiar enough
with Windows to give a detailed description.
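
For the *nix case, a small Python sketch of the umask/permissions part. It assumes a POSIX system; the helper names and the particular modes (0o027, 0o440) are only examples. Creating the dedicated user and group, and making them own the files, is an administrative step not shown here, and the owner of a file can always change its permission bits back:

```python
import os
import stat

def write_protected(path, payload):
    """Create a file that owner and group may read and nobody may write."""
    old_mask = os.umask(0o027)      # new files: nothing at all for "other"
    try:
        with open(path, "wb") as f:
            f.write(payload)
        os.chmod(path, 0o440)       # drop write permission once logged
    finally:
        os.umask(old_mask)          # the mask is process-wide, so restore it

def mode_of(path):
    """Return the permission bits of path, e.g. 0o440."""
    return stat.S_IMODE(os.stat(path).st_mode)
```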

An important consideration when working out how to lay everything out is
backup and restore. If you have lots of data in lots of files, you will
want to make sure they are set out in a way that makes adding new data
straightforward and that also makes it easy to do backups. How you
approach this depends on how much the data changes, the total amount
of data and what backup facilities you have available.

The design of the database to manage the meta data will depend on what
meta data you have. However, this is really just the application of good
database design principles.

If your database supports database constraints, such as foreign key
constraints, check constraints, not null constraints etc., then use
them. Some argue these are bad because they restrict your ability to
make changes in the future. I think this is rubbish and a sign of a lack
of real analysis and design.

Use the datatypes that best match your 'natural' data and how it is to
be used. Be wary of data that has the word number in its name; it may
not be a number. For example, I can't count the number of systems I've
worked on where the original design used a number field for something
like a staff number or reference number. These sorts of numbers are
often best represented by character types because it's not unusual for
them to have leading 0s, which are significant in the sense that they
are part of the data. However, if you define the data as a number type,
you generally can't have leading zeros. Number types should only be
required when you plan to use them in numeric/math type operations.

Try to use the data size that best fits your data model. I often see
poor database design where a column has been defined to be as large as
possible. Again, this is often done in the misguided belief that it adds
flexibility. However, doing so also means that bad data can get into the
system. For example, if you know all values in column A should never be
larger than 10 characters, then define it to be that large. Then when
something tries to insert a larger value, you know that either the value
is bogus or there has been some change in your domain and you now need
to increase the size of that field - the point is, you are alerted to
the fact that either something is doing something it shouldn't or there
is an error in your underlying data model.

An important point with databases is that the old maxim of GIGO is
fundamental (Garbage In Garbage Out). Any database related application
is only as good as the quality of the data it manages. No matter how
flash, useful or sophisticated your application, if the data
is unreliable, the application is unreliable.
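
To make the points about constraints, character-typed "numbers" and size limits concrete, here is a small Python/sqlite3 sketch. The tables and the `rejected` helper are invented for the example. Two SQLite specifics are assumed: foreign key checks must be switched on per connection, and a declared column length is not enforced, so the 10-character limit is written as a CHECK constraint (other databases would simply use a sized character type):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE staff (
        -- TEXT rather than INTEGER: the leading zeros are part of the data.
        staff_no TEXT PRIMARY KEY CHECK (length(staff_no) <= 10),
        name     TEXT NOT NULL
    );
    CREATE TABLE recording (
        id          INTEGER PRIMARY KEY,
        operator_no TEXT    NOT NULL REFERENCES staff (staff_no),
        size_bytes  INTEGER NOT NULL CHECK (size_bytes >= 0)
    );
""")
conn.execute("INSERT INTO staff VALUES ('00042', 'Example Operator')")

def rejected(sql, params=()):
    """Return True if the database refuses the statement as invalid data."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False

# An over-long staff number is stopped at the door instead of stored.
too_long = rejected("INSERT INTO staff VALUES (?, ?)", ("12345678901", "X"))
```

The rejection is the alert described above: either the value is bogus or the model needs revisiting.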

While analysis and modeling are important, it's also important to
actually get something up and running. I'm a big believer in doing
prototypes. No matter how much analysis, planning and design you do,
there will always be things you discover or realise during the
implementation that just were not obvious in the planning/design
stage. Just trying to do the implementation teaches you a lot about your
problem domain that won't be obvious from reading or thinking
alone. Identify the core functionality you want to address. Keep it
simple and avoid the temptation to add additional functionality (note it
down for later, but move on). Keep it really simple and try to solve
your key problem first. Add bells and whistles later when you're more
comfortable with the problem domain and have a better understanding of
it. Try to get something out as quickly as possible and if others are
going to use it, get them to start playing with it and get feedback.

HTH

Tim

Signature

tcross (at) rapttech dot com dot au

Volker Hetzer - 11 Sep 2008 20:01 GMT
Rune Allnor schrieb:
> Hi all.
>
[quoted text clipped - 10 lines]
> prefer to save the data on some binary format to save time on the
> text <-> binary conversions.
Sounds like modeling isn't the big thing in your application. (Might
play a role though.)
Offhand I can think of three standard applications that store massive
amounts of binary data, with a bit of meta stuff around it:
- pornographic sites have to serve huge amounts of imagery and videos. You
  might look down your nose at it but in terms of design and technology they are
  state of the art in private enterprises.
- radio telescopes process and filter even greater amounts of data, much like
  your sonar data but orders of magnitude more.
- the storage and processing facilities of film studios are geared to storage,
  retrieval and shifting of data around to various processing facilities.

> The textbooks I have found on database theory solely deal with text
> data, i.e. data that are stored as tables in text files, which I
[quoted text clipped - 4 lines]
> the
>    storage and handling of binary data?
As others here have already said, most databases can store blobs either
internally or (transparently) in a file system. You still access them
through the database, but this allows you, for instance, to store the meta data
locally and all the binary stuff on a network drive on a file server
or large storage area network.
It gets more "real life" if you read the database-specific documentation.
This one, for instance, is for Oracle 11g:
http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28393/toc.htm

> 2) Are there database implementations which are better suited for my
>    application than others? I would like to keep the application
> platform
>    independent, and use C++ as my programming language.
I'm not sure about database independence, I don't think BLOB access
has been standardized. But this would be just a couple of classes
with some database dependent innards and a generic interface.
Normally BLOB access means that you have to read out either a stream
or fetch the data in packets and all BLOBable databases pack this
functionality in one shape or other which you can easily repackage
for a generic access.
As for C++ and platform independence, not sure about that. Most databases
offer
- the generic interface (ODBC, OLE DB, ADO.NET) for the platform,
- the database-specific (OC(C)I, libmysqlclient) but platform-independent
  interface or
- Java connectivity as a platform- and database-independent but
  language-specific interface.
You decide.

As for storing the meta data too in a hierarchical structure, I think
it's worth investigating the meshed approach of the entity relationship
model. The tree is a special case of it but you'll find soon that an ERM
allows you to model your data more precisely and gives you more powerful
retrieval possibilities.
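
A toy example of what the meshed model buys over a pure directory tree, again in Python with sqlite3. The entities (`survey`, `sensor`, `recording`), the sample rows and the `paths_by_sensor` helper are invented for illustration. A tree has to pick one grouping, say by survey area, whereas here the same files can also be retrieved by sensor across all areas:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (id INTEGER PRIMARY KEY, area  TEXT NOT NULL);
    CREATE TABLE sensor (id INTEGER PRIMARY KEY, model TEXT NOT NULL);
    CREATE TABLE recording (
        id        INTEGER PRIMARY KEY,
        survey_id INTEGER NOT NULL REFERENCES survey (id),
        sensor_id INTEGER NOT NULL REFERENCES sensor (id),
        rel_path  TEXT    NOT NULL UNIQUE
    );
    INSERT INTO survey VALUES (1, 'fjord_a'), (2, 'fjord_b');
    INSERT INTO sensor VALUES (1, 'sidescan'), (2, 'multibeam');
    INSERT INTO recording VALUES
        (1, 1, 1, 'a/0001.bin'),
        (2, 1, 2, 'a/0002.bin'),
        (3, 2, 2, 'b/0003.bin');
""")

def paths_by_sensor(model):
    """List (path, area) for every file made with one sensor model."""
    rows = conn.execute(
        "SELECT r.rel_path, s.area FROM recording r "
        "JOIN survey s ON s.id = r.survey_id "
        "JOIN sensor n ON n.id = r.sensor_id "
        "WHERE n.model = ? ORDER BY r.id",
        (model,),
    )
    return rows.fetchall()
```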

So, without knowing the slightest thing about the technical environment your
solution has to operate in, nothing about the kind of queries that are run,
nothing about required performance or reliability and not much about
security, I'd recommend some kind of database that allows you to separate blob
storage and meta data storage.

For tamper-proofing the whole thing, securing the database and file
server would be a start.
Everything else (database accounts, roles, grants, audit trails, encryption
and so on) is greatly dependent on your application and its users. (How many?
How often do they change? Etc.)

Lots of Greetings!
Volker
Signature

For email replies, please substitute the obvious.

Evan Keel - 13 Sep 2008 17:42 GMT
> Hi all.
>
[quoted text clipped - 25 lines]
>
> Rune

You will be fine. Find a copy of "Handbook of Relational Database Design"
(Fleming, von Halle), old school but relevant.

Evan
 