Module time calibration - work notes (in svn for reference)
Vlastimil Babka

Goal:
- tweak inputs or settings of CPU-consuming modules so that their isolated durations are
  comparable and reasonably long.
- classes of inputs (short, medium, long) - how to select them?
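
A minimal sketch of how the isolated duration of one module run could be measured and
sorted into a class. The names measure_ms and duration_class, and the two threshold
values, are assumptions for illustration only (loosely read off the ranges recorded in
these notes); they are not part of the framework.

```python
import statistics
import time


def measure_ms(workload, repeats=5):
    """Return the median wall-clock duration of workload() in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def duration_class(ms, small_max_ms=300.0, medium_max_ms=3000.0):
    """Map a duration to a class name; the thresholds are hypothetical defaults."""
    if ms <= small_max_ms:
        return "small"
    if ms <= medium_max_ms:
        return "medium"
    return "long"


if __name__ == "__main__":
    # Stand-in workload; a real run would invoke one module with one input.
    ms = measure_ms(lambda: sum(i * i for i in range(100_000)))
    print(f"{ms:.2f} ms -> {duration_class(ms)}")
```

Taking the median over a few repeats is just one way to damp outliers; per-module
thresholds may be needed where the recorded small and medium ranges overlap.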

===
In the following, the development is ordered newest-first.
===

==========================================

libquantum: tuned the parameters somewhat

libquantum-small: 60 - 190ms
33, 35, 39, 45, 51, 55
5

libquantum-medium: 410ms - 3s
75, 81, 95 (~1s), 99, 105, 111, 115, 117, 123 (~1s), 143 (~3s)
25

==

lbm
- 1 time step results in ~1s duration; getting less than that would require modifying the input file,
  and naive removal of a block from it breaks the module. The train and ref input files yield the same results.


lbm-medium: 1s - 3s
steps 1,2,3

==

sjeng
- try to change the numbers on even lines in the test input file, should control depth of searching?
  with the default 10/11 it's ~6s
- with 1/1 it's 360ms
- removing the second input it's 270ms
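
A rough sketch of the even-line edit described above. It assumes the input file
alternates one position line with one depth line (the depth being on the even line),
which matches the guess in these notes but is not verified here. The function name
set_search_depths and the placeholder file contents are made up for illustration.

```python
def set_search_depths(input_text, depths):
    """Replace each even-numbered line (2nd, 4th, ...) with a new depth value.

    Assumes odd lines hold positions and even lines hold search depths.
    """
    lines = input_text.splitlines()
    if len(lines) % 2 != 0 or len(depths) != len(lines) // 2:
        raise ValueError("expected one depth per position/depth line pair")
    for i, depth in enumerate(depths):
        lines[2 * i + 1] = str(depth)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder content, not the real SPEC test input.
    original = "position A\n10\nposition B\n11\n"
    print(set_search_depths(original, [5, 6]), end="")
```

Dropping the second input would then amount to keeping only the first two lines
before calling the function with a single depth.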

sjeng-small: 270ms - 300 ms
- only first line from spec test input, low depth
depth 2, 5, 6

sjeng-medium: 410 ms - 1450ms
- spec test with lower depths
depths 5/6, 6/7, 7/8, 8/9

==

bzip2
- changed input size granularity from 1 MB to 128 KB, added it as a parameter instead of hardcoded '2'
- find appropriate files to compress
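
To illustrate what the inputsize parameter means with the 128 KB granularity, here is
a small sketch using Python's standard bz2 module (not the SPEC code). How the module
fills the buffer when the file is shorter than the requested size is not stated in
these notes; repeating the file contents is an assumption, and build_buffer is a
hypothetical name.

```python
import bz2

UNIT_BYTES = 128 * 1024  # granularity from the notes: one inputsize unit = 128 KB


def build_buffer(data, inputsize):
    """Return exactly inputsize * 128 KB of bytes, repeating data if it is too short."""
    if not data:
        raise ValueError("input data must not be empty")
    target = inputsize * UNIT_BYTES
    repeats = -(-target // len(data))  # ceiling division
    return (data * repeats)[:target]


if __name__ == "__main__":
    sample = b"some file contents\n"  # stand-in for text.html, input.program, ...
    for inputsize in (1, 2, 4):
        buf = build_buffer(sample, inputsize)
        compressed = bz2.compress(buf)  # in-memory compression only
        print(inputsize, len(buf), len(compressed))
```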

bzip2-small: 25-240 ms
- file: text.html, dryer.jpg, bzip2.input (=lbm.input)
- inputsize: 1, 2

bzip2-medium: 300 ms - 2.5s
- file: input.program
- inputsize: 4-32

==

lzw-small: 3ms - 18 ms
- file: lzw.input (=lbm.input), text.html, byoudoin.jpg, dryer.jpg
- maxbits: 16

lzw-medium: ~190-200ms
- file: input.program
- maxbits: 12-16

==

fft-tiny: 1ms
- buffer: 1M

fft-small: 13ms
- buffer: 4M

==
mfc
- implemented input shortener in bin/utils/mfc-shorten-input.py, used on SPEC test input

mfc-small: 89 ms - 300ms (450 - 500 timetables)
- file: mfc-small1.input, mfc-small2.input, mfc-small3.input, mfc-small4.input

unfeasible (smallest available input too large): astar, namd

===========================================

===
trunk -r112
===

Approximate times of modules (on washington) and analysis

memspeedmp (1 MB, 128 pointers) ~ 60ms

fft (buffer size 64) ~ 70ns
- the buffer size is probably unrealistically small, must check config

astar (astar-lake.cfg = SPEC test/lake.cfg) ~ 12s
- probably the smallest input available in SPEC CPU2006 (train and ref are larger)
- train/rivers1.cfg takes ~ 60s
- would need to find some input outside the SPEC suite to get less time

lbm (lbm.input) ~ 16s
- uses SPEC test input; train and ref use different files with the same size and similar "ascii-art" contents.
  The setting that probably affects the execution time is in the accompanying lbm.in file and determines
  the number of time steps to compute (test 20, train 300, ref 3000)
- module wrapper hardcodes 20 time steps, changed to module parameter

mcf (mcf.input = SPEC test input) ~ 9s
- SPEC has no shorter input, we could maybe try truncating it (text file with numbers) and see what happens

namd (1 iteration) ~ 19s
- uses the only input that's in the SPEC suite. The time is controlled by the number of iterations.
  SPEC uses 1 for test/train, 38 for ref run. Even with 1 iteration it takes 19s, so we cannot get less
  using the SPEC input file.

bzip2 (probably lbm's input file, not any of the SPEC input files for bzip2) ~ 1s
- thanks to its nature we can use anything as input, just keep in mind that SPEC readme says it's
  compressing only in memory (but we won't need too large files to obtain reasonable runtimes it seems)

lzw (maxbits = 64) ~ 15 ms
- anything goes as input (but since it's much faster than bzip2, we probably can't get long runtimes without
  using large files, which could bring in I/O if not buffered)

sjeng (sjeng.input = SPEC test input) ~ 6s
- SPEC has no shorter input, but its format looks like we can strip it a bit (perhaps decrease the
  look-ahead depth; it's a chess program)

libquantum (params as SPEC test input) ~ 76ms
- looks like it can be nicely fine-tuned by the two parameters (first - number to be factorized, second - base
  for modular exponentiation)
- SPEC suite input is test (33, 5), train (143, 25), ref (1397, 8)
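
To make the roles of the two parameters concrete, here is a purely classical sketch of
the order-finding step that the base feeds into. This is not how libquantum computes
it (libquantum simulates the quantum version); the function names are made up for
illustration, and the brute-force loop is only practical for small numbers like the
ones used above.

```python
from math import gcd


def classical_order(base, n):
    """Smallest r > 0 with base**r % n == 1 (brute force, small n only)."""
    if gcd(base, n) != 1:
        raise ValueError("base and n must be coprime")
    value, order = base % n, 1
    while value != 1:
        value = (value * base) % n
        order += 1
    return order


def factors_from_order(base, n):
    """Try to split n using the order of base; return None if this base fails."""
    r = classical_order(base, n)
    if r % 2 != 0:
        return None
    half = pow(base, r // 2, n)
    if half == n - 1:
        return None
    p, q = gcd(half - 1, n), gcd(half + 1, n)
    return tuple(sorted((p, q)))


if __name__ == "__main__":
    print(classical_order(5, 33))     # 10
    print(factors_from_order(5, 33))  # (3, 11)
```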