I want to demonstrate the potential effects of update statistics in a
repeatable way but am having trouble doing so. This is for a customer
using an elderly version of Informix SE.
Here's what I did.
1) create table item
(code integer, name char(30), type char(1), cost integer)
2) create index item_ix on item(code)
create index item_type_ix on item(type)
3) load from "item.csv" insert into item
I assumed loading with the indexes already in place would be slow, but
it avoids having the statistics updated by creating the indexes after
loading. I want bad stats first. I'd previously generated a million
lines of item.csv with sequential code values, a fixed name,
type = random A-Z, cost = random 0-100.
4) set explain on and time a bunch of queries
select sum(cost) from item where type = 'A'
select sum(cost) from item where type != 'A'
select sum(cost) from item where type between 'B' and 'Z'
5) update statistics for table item
6) Repeat timings. Peruse sqexplain.out.
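The test data described in step 3 could be produced with something like the sketch below. It is not the script actually used for this test; the function names, the fixed name "widget" and the seed are hypothetical, and the comma delimiter is only a guess from the file name. As far as I know, LOAD expects a pipe delimiter unless told otherwise, so adjust the delimiter argument to match your setup.

```python
import csv
import random
import string


def generate_items(n_rows, seed=0):
    """Yield (code, name, type, cost) tuples matching the item table's columns."""
    rng = random.Random(seed)  # seeded so the same data set can be regenerated
    for code in range(1, n_rows + 1):
        item_type = rng.choice(string.ascii_uppercase)  # random A-Z
        cost = rng.randint(0, 100)                      # random 0-100 inclusive
        yield (code, "widget", item_type, cost)         # "widget" is a placeholder name


def write_items(out, n_rows, seed=0, delimiter=","):
    """Write n_rows generated rows to an already open text file object."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerows(generate_items(n_rows, seed))


if __name__ == "__main__":
    with open("item.csv", "w", newline="") as f:
        write_items(f, 1_000_000)
```

Seeding the generator keeps the experiment repeatable: the same seed gives the same file, so timings from different runs are comparing like with like.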
What I found was that the query optimizer selected sequential scan for
"type != 'A'" (40s) but used the index for "type between 'B' and 'Z'"
(420s). Update statistics had no effect.
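As a rough sanity check on those numbers: with type drawn uniformly from A-Z, the second and third queries should select exactly the same rows (about 25 in every 26), so the difference in timing would come from the access path rather than from the size of the result. The sketch below only simulates the type column under that assumption; the function name and seed are hypothetical and nothing here touches the database.

```python
import random
import string


def matching_fractions(n_rows, seed=0):
    """Return the fraction of simulated rows matched by each of the three predicates."""
    rng = random.Random(seed)
    types = [rng.choice(string.ascii_uppercase) for _ in range(n_rows)]
    return {
        "type = 'A'": sum(t == "A" for t in types) / n_rows,
        "type != 'A'": sum(t != "A" for t in types) / n_rows,
        "type between 'B' and 'Z'": sum("B" <= t <= "Z" for t in types) / n_rows,
    }


if __name__ == "__main__":
    for predicate, fraction in matching_fractions(100_000).items():
        print(f"{predicate:28s} {fraction:.3f}")
```

On this data the last two fractions are identical, and the first is close to 1/26.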
I guess I'm misunderstanding and have not devised a suitable scenario to
clearly demonstrate the effect of update statistics. Any clues or
suggestions?

RGB
Art S. Kagel (Oninit) - 23 Apr 2008 19:34 GMT
> I want to demonstrate the potential effects of update statistics in a
> repeatable way but am having trouble doing so. This is for a customer
[quoted text clipped - 30 lines]
> clearly demonstrate the effect of update statistics. Any clues or
> suggestions?
Part of the 'problem' is that you are using SE. The optimizer in SE is
not very bright: if it finds an index, it uses it. The stats are mostly
used to choose the best of two indexes.
Art S. Kagel
Oninit
Jonathan Leffler - 24 Apr 2008 13:49 GMT
On Apr 23, 9:08 am, RedGrittyBrick <RedGrittyBr...@SpamWeary.foo>
wrote:
> I want to demonstrate the potential effects of update statistics in a
> repeatable way but am having trouble doing so. This is for a customer
[quoted text clipped - 30 lines]
> clearly demonstrate the effect of update statistics. Any clues or
> suggestions?
As Art said, the SE optimizer doesn't do a lot with statistics. UPDATE
STATISTICS is also very fast in SE, and very simple. You can run it
(with no qualifiers whatsoever) periodically (say once a week, or once
a month; OTOH, daily doesn't hurt either) and unless you have a very
volatile table or two in the system, you'll be fine.
The SE optimizer is not cost-based - it is heuristic.
To the optimizer, there's a big difference between type != 'A' and
type BETWEEN 'B' AND 'Z'. The first includes [a-z0-9] and the
punctuation and the upper-half of the code set and control characters
and so on (as well as [B-Z]), and the second does not. Because
statistics are only statistics, the optimizer cannot rely on them
being accurate and never assumes that there are no values other than
the ones recorded in the statistics. Also, SE doesn't even use
distributions, so it would not know that the values are limited to the
range A-Z; it might have second lowest (B) and second highest (Y) but
that's all.
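The difference between the two predicates can be made concrete by counting which single characters each one admits. This is only an illustration: it assumes a 256-value single-byte code set compared in code-point order, which may not match the collation actually in use, and the helper name is hypothetical.

```python
def candidates(predicate, code_set_size=256):
    """List every single-byte character value that satisfies the predicate."""
    return [chr(i) for i in range(code_set_size) if predicate(chr(i))]


# type != 'A': everything in the code set except 'A' itself
not_a = candidates(lambda ch: ch != "A")

# type BETWEEN 'B' AND 'Z': only the contiguous range B..Z
between_b_z = candidates(lambda ch: "B" <= ch <= "Z")

if __name__ == "__main__":
    print(len(not_a))        # 255 possible values
    print(len(between_b_z))  # 25 possible values
    print("a" in not_a, "a" in between_b_z)  # True False
```

So a predicate that looks equivalent on this particular data covers roughly ten times as many possible values, and an optimizer that cannot assume the column holds only A-Z has to treat the two differently.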
In many respects, the fact that you don't have to worry about UPDATE
STATISTICS with SE is great -- it means it works well despite
neglect. The target for IDS is to get back to a state where you don't
have to worry about UPDATE STATISTICS either. It will still be a
while before IDS gets there.
-=JL=-
RedGrittyBrick - 24 Apr 2008 14:53 GMT
[Update statistics and Informix SE]
Thanks, Art and Jonathan, for your clear and useful replies.

RGB