Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / January 2008

Tip: Looking for answers? Try searching our database.

Entropy and Quantity of Information

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
David Cressey - 11 Jan 2008 18:41 GMT
Instead of hijacking another topic,  I'll start this topic.

David BL suggested that a way to quantify information is the number of bits
needed to encode, allowing for compression. I said I preferred entropy as
the measure of information,  and suggested that the two measures might in
some way be equivalent.  Someone else recalled the concept of entropy from
the study of statistical mechanics and thermodynamics in physics.

The use of the word "entropy"  definitiely comes from physics.  The formulas
that Shannon (qv) determined for the amount of information in a message bear
an uncanny resemblance to the formulas for entropy in physics.  I personally
don't believe that this is mere coincidence.

Next,  I talked a little about entropy being context sensitive.  There is a
much more precise way than mine of stating this, but I've forgotten what it
is.  I think it goes something like this:  the self information in  a
message is context insensitive but the mutual information between a pair of
messages is dependent on both of them.  My use of the word "database"  in
place of message in my earlier comment  is a little sloppy in this regard.

Next,  the use of logarithms,  and the number of bits needed to store
(represent?) a message are closely related.  If we have 8 equally likely
messages,  it will take precisely 3 bits to store one message.  Note that
log base 2 of 8 is 3.  If you use the formulas for entropy,  and use 2 as
the base for the logarithms,  the unit of information comes out to be
precisely the bit.

There is a form of compression called Huffman encoding on an ensemble of
messages that takes advantage of mathematical expectation to compress, on
the average,  an ensemble of messages when the eight possible messages are
not a priori equally likely.  If for example,  message number 1 is 50%
likely,  you and use a message consisting of 1 bit to represent it,  and use
somewhat longer messages for the other 7 possible messages.  I don't want to
belabor the point in cdt,  but there are ample references for Huffamn
encoding on the web.  If you study Huffman coding in detail,  you'll start
to get logarithms into the picture.

All of this goes back to the 1960s, and some of it to the 1940s.  Is entropy
still widely used in information science?  Is it relevant to database
theory?
Tegiri Nenashi - 11 Jan 2008 19:40 GMT
>  Is entropy
> still widely used in information science?  Is it relevant to database
> theory?

http://www.sigmod.org/pods/proc00/papers/dalkilic.pdf
Joe Thurbon - 12 Jan 2008 00:38 GMT
> Instead of hijacking another topic,  I'll start this topic.
>
> David BL suggested that a way to quantify information is the number of bits
> needed to encode, allowing for compression.

I think the term being looked for is

http://en.wikipedia.org/wiki/Kolmogorov_complexity

There's been a _lot_ of work done in this area.

> I said I preferred entropy as
> the measure of information,  and suggested that the two measures might in
> some way be equivalent.  Someone else recalled the concept of entropy from
> the study of statistical mechanics and thermodynamics in physics.

It also gets a good work out in machine learning. Many of the classical
algorithms  (decision trees in their various forms) have
information-gain (which is defined in terms of entropy) at their heart.
(The precise definition is closely related to the number of bits
required for representation)

I vaguely recall that some of the more theoretical machine learning
results (like learnability and optimality results) rely on a notion entropy.

[... snip interesting perspective ...]

> All of this goes back to the 1960s, and some of it to the 1940s.  Is entropy
> still widely used in information science?  

Yes.

> Is it relevant to database theory?

Good question.

Cheers,
Joe
David Cressey - 12 Jan 2008 09:17 GMT
[snip good stuff]

> I vaguely recall that some of the more theoretical machine learning
> results (like learnability and optimality results) rely on a notion entropy.

I once wrote a little program to play Mastermind  (a code guessing game)
using entropy to measure a move's worth.  Mastermind can be played using a
brute force tree search.  But my program tried out only about 10 moves,  and
picked the one with the best entropy.  Its play was only slightly inferior
to the play of a brute force tree searcher.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2010 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.