Readme for HINT:
-=-=-=-=-=-=-=-=

by N. Douglas and Iowa State University Research Foundation, Inc.

Index:
------
Section 1: Overview
Section 2: Running HINT
Section 3: Example HINT curves
Section 4: Credits

Section 1:
-=-=-=-=-=
HINT is a radical new benchmarking system which essentially calculates how
fast the processor is when accessing a certain amount of memory at once.
Since all current computer systems do essentially one thing - process data
which is held in memory - this benchmark is equally applicable across all
platforms.

HINT is not very well known. It originates from research carried out at Iowa
State University into improving benchmarking, and it is probably the single
best portable benchmark currently available. Things like WinBench are fine
for testing one Intel Windows NT 4 system against another, but when
comparing, say, a DEC Alpha NT 3.51 system against an Intel NT 4 system,
things become a little more complicated and the benchmark is no longer
accurate.

The same goes for old chestnuts like Dhrystone. Dhrystone simply runs a few
looped tests with integers and strings, and that's it. Needless to say, a
processor with built-in string routines will be faster than one without, so
once again you have an unfair benchmark.

HINT, written in portable C, simply does mathematics. It performs a
Hierarchical INTegration, which essentially means that the amount of memory
occupied by the integration tree gets exponentially bigger for each extra
level of integration taken. As all processors ultimately do mathematics (and,
by definition, do it as well as they can), the benchmark simply becomes a
test of how quickly the processor can access a given amount of memory at
once.

Most processors will handle small chunks, e.g. 4k, very quickly, usually at
near core processor speed, as 4k will fit into the processor's L1 cache.
Bigger amounts, like 100k, are slower, but still faster than main memory on
architectures with L2 caches. Finally, there is main memory speed itself, but
even this gets slower as bigger amounts are used, because the TLB
(Translation Look-aside Buffer) gets overloaded. Thus, you will get a curve
from HINT something like this:

                                                            ------------
                 Main memory                               /
                 speed               ----------------------  \
                       \            /                          L1 cache
                        \          /                           size
                         \        /  \
                     -------------    \
              ------                   L2 cache size
         ----                          (if L2 cache fitted)
       -- \
     -     \
    -       \ TLB cache starts
 ---          becoming useless
  \
   \ Main memory speed, raw without any caching at all

Further off to the left, the line flattens. On systems with virtual memory
paging, there should be a sudden drop around the free memory limit as paging
kicks in.

HINT is written in C, and for results to be comparable it assumes that every
compiler optimises the code equally well. Unfortunately, this is not the
case. On Acorn systems, the C compiler isn't yet anything like up to the
standards of other architectures, although I hear Arm Ltd. are working on
this. Even so, it doesn't do too bad a job.

Another problem is that Acorns don't have hardware FP support. HINT takes its
timings in double format, so every time it does this on an Acorn you incur
the wrath of the FP emulator, a slow beastie if ever one was seen. Thus the
results are a bit lower than they should be.

HINT's output is in a unit called QUIPS (QUality Improvement Per Second):
how quickly the quality of the integration's answer improves, which follows
naturally from the above.




Section 2:
-=-=-=-=-=
To run HINT, first decide which platform you wish to run it on:

(i)  : A RISC-OS 3+ based Acorn.
(ii) : A Win32 capable PC (Win95 or WinNT will do)
(iii): Something else

If (i), use the hint executable. Simply make sure at least 1.5Mb of memory
is in the Next slot, press F12 and run hint. See below for more. Also, make
sure no task windows are running, as these can affect HINT.

If (ii), see the zip file called hint_pc/zip. Inside are four .exe files:
i32, i64, f32 and f64. i32 and i64 test 32 and 64 bit integer work, while
f32 and f64 test 32 and 64 bit FP work. Note that i64 takes a LOT of memory
when running, so don't try it on a machine with less than 32Mb, and make
sure as little as possible is loaded. See below for more.

If (iii), you'll find the sources in the c and h directories. See hint.h and
makefilero for how to generate a build. Please note that this version of
HINT won't test multiprocessor machines correctly; go to the HINT web
homepage if you want to do that.

Displaying the results:
-=-=-=-=-=-=-=-=-=-=-=-
The results will usually be placed in a file called RESULTS. Run hint with
the -csv switch if you want the results in CSV format (easier for many
non-Unix graph packages).

Essentially, to plot the results you require the QUIPS column and the Memory
used column. Grab your nearest graphing package (something like Fireworkz
will do, and it also works on a PC) and drop the CSV file into a spreadsheet.
Mark the QUIPS column and the Memory used column and create an X-Y chart with
the Memory used on the X axis. Make the X axis logarithmic.

It's a good idea to discard all values below 1000 bytes of memory usage, as
the results below this are useless and muck up the graph.




Section 3:
-=-=-=-=-=
Enclosed you'll find a Fireworkz file called Exmple/fwk, which contains the
HINT curves for a PII @ 233MHz, a Pentium Pro (P6) @ 200MHz, a classic
Pentium (P5) @ 133MHz, a 200MHz StrongARM (SA) Risc PC (RPC), a 40MHz Arm7
RPC and a classic 25MHz Arm3 A540.

I also have some other curves for things like a 200MHz DEC Alpha machine,
but these are fairly esoteric, so I haven't included them here.

Comments:
-=-=-=-=-
As you can see, a 200MHz SA is about the same speed as a P133. Is this
wrong? I don't think so myself: a P133 is bloody fast when it isn't running
Windows, and besides, the SA is completely misplaced in an RPC, where it has
no L2 cache and relies on a pretty crap motherboard (very slow data bus,
even slower memory). Few realise it, but every time an SA in an RPC wants to
talk to main memory, quite a few cycles are spent synchronising.

<rant on>

Compare this to a modern PC. In this, the L2 cache speeds up access to EDO
RAM (increasingly SDRAM now) by bursting 64 bit words along a data bus
running at 66MHz (I believe it's 33MHz on the RPC, maybe 50MHz though?).
Burst reads from EDO RAM incur a 4-2-2-2 timing on a modern chipset. Compare
this to the standard FPM DRAM in the RPC, which takes X-4-4-4 cycles for the
same burst (I'm not sure of the exact figure in the RPC). Also remember that
PCs interleave their RAM, meaning that two different SIMMs are accessed at
once, thus doubling data throughput. Finally, modern L2 caches are
predictive and adaptive, and some will prefetch data before it has actually
been requested.

So essentially, the SA's poor performance is down to the crap RPC
motherboard. The upcoming RPC II motherboard will fix a LOT of these
problems, though, with its basic 128k L2 cache and EDO RAM support. The data
bus is also quicker, so we should finally be up to around Pentium 60 level.
Unfortunately, by then PCs will be at least three years further on in
technology.

Getting off the crap Acorn hardware vs. PC hardware rant: on other parts of
the graph you can see that the Pentium Pro and PII are *strong* performers.
This is with several of their features disabled, BTW, due to historical bugs
in the Intel chipsets that have long since been fixed. With these features
enabled, they'd be even faster.

Lower down, you can see that the Arm7 isn't exactly quick: it runs at about
486 speeds. Lower again, the Arm3's curve isn't easily distinguishable, but
on it you can see a *huge* difference of some three times between cache
speed and main memory speed (25MHz single-cycle access vs 8MHz
multiple-cycle access!), which is exactly correct.

If an Arm2 were plotted, you'd see a flat line, because the Arm2 has no
cache and accesses everything at 8MHz (main memory speed).

One thing that sticks out on the graph is a large fall around 32k for the
SA. This is the SA's L1 cache size, and performance clearly plummets past it.




Section 4:
-=-=-=-=-=
See the enclosed HINT readme file for more information about HINT itself and
where to get it from. Searching for HINT on the net picks it up fairly
quickly anyway.

My modifications are fairly minimal: writing the RISC OS 1ms timing code,
making the program a bit more verbose and functional, getting it to work on
non-Unix systems, that type of thing. Copyright of the HINT code remains with
Iowa State University; my additions remain with me. My additions also fall
under the GNU software licence that the original code was distributed under.

So essentially, feel free to play! The HINT people would like a catalogue of
HINT curves for various machines (the weirder the better!), so please oblige
them.

Cheers,
Niall Douglas.

Email: douglasn@alf2.tcd.ie

If that doesn't work, try ndouglas@prot.demon.co.uk

And last of all, although there are several months between collections,
ndouglas@digibank.demon.co.uk
