Module time calibration - work notes (in svn for reference)
Vlastimil Babka

Goal:
- tweak inputs or settings of CPU-consuming modules so that they have comparable and reasonably long isolated durations
- classes of inputs (short, medium, long) - how to select them?

=== In the following, the development is ordered newest-first. ===

==========================================

libquantum: tuned the parameters somewhat

libquantum-small: 60-190 ms
- numbers to factorize: 33, 35, 39, 45, 51, 55
- base: 5
libquantum-medium: 410 ms - 3 s
- numbers to factorize: 75, 81, 95 (~1s), 99, 105, 111, 115, 117, 123 (~1s), 143 (~3s)
- base: 25

==

lbm
- a single time step results in ~1s duration; getting less than that would require modifying the input file, and naive removal of a block in there breaks the module
- the train and ref input files yield the same results

lbm-medium: 1-3 s
- steps: 1, 2, 3

==

sjeng
- try to change the numbers on even lines in the test input file; they should control the search depth?
- with the default 10/11 it's ~6s
- with 1/1 it's 360 ms
- with the second input removed it's 270 ms

sjeng-small: 270-300 ms
- only the first line from the SPEC test input, low depth
- depth: 2, 5, 6
sjeng-medium: 410-1450 ms
- SPEC test input with lower depths
- depths: 5/6, 6/7, 7/8, 8/9

==

bzip2
- changed input size granularity from 1 MB to 128 KB, added it as a parameter instead of the hardcoded '2'
- find appropriate files to compress

bzip2-small: 25-240 ms
- file: text.html, dryer.jpg, bzip2.input (=lbm.input)
- inputsize: 1, 2
bzip2-medium: 300 ms - 2.5 s
- file: input.program
- inputsize: 4-32

==

lzw-small: 3-18 ms
- file: lzw.input (=lbm.input), text.html, byoudoin.jpg, dryer.jpg
- maxbits: 16
lzw-medium: ~190-200 ms
- file: input.program
- maxbits: 12-16

==

fft-tiny: 1 ms
- buffer: 1M
fft-small: 13 ms
- buffer: 4M

==

mfc
- implemented an input shortener in bin/utils/mfc-shorten-input.py, used on the SPEC test input

mfc-small: 89-300 ms (450 - 500 timetables)
- file: mfc-small1.input, mfc-small2.input, mfc-small3.input, mfc-small4.input

unfeasible (smallest available input too large): astar, namd

===========================================

=== trunk -r112 ===

Approximate times of modules (on washington) and analysis

memspeedmp (1 MB, 128 pointers) ~ 60ms
fft (buffer size 64) ~ 70ns
  - the buffer size is probably unrealistically small, must check config
astar (astar-lake.cfg = SPEC test/lake.cfg) ~ 12s
  - probably the smallest input available in SPEC CPU2006 (train and ref are larger)
  - train/rivers1.cfg takes ~ 60s
  - would need to find some input outside the SPEC suite to get less time
lbm (lbm.input) ~ 16s
  - uses the SPEC test input; train and ref use different files with the same size and similar "ascii-art" contents
  - the setting that probably affects the execution time is in the accompanying lbm.in file and determines the number of time steps to compute (test 20, train 300, ref 3000)
  - the module wrapper hardcoded 20 time steps, changed to a module parameter
mcf (mcf.input = SPEC test input) ~ 9s
  - SPEC has no shorter input; we could maybe try truncating it (text file with numbers) and see what happens
namd (1 iteration) ~ 19s
  - uses the only input that's in the SPEC suite
  - the time is controlled by the number of iterations; SPEC uses 1 for test/train, 38 for the ref run
  - even with 1 iteration it's 19s, so we cannot get less using the SPEC input file
bzip2 (probably lbm's input file, not any of the SPEC input files for bzip2) ~ 1s
  - thanks to its nature we can use anything as input; just keep in mind that the SPEC readme says it's compressing only in memory (but it seems we won't need very large files to obtain reasonable runtimes)
lzw (maxbits = 64) ~ 15 ms
  - anything goes as input (but since it's much faster than bzip2, we probably can't get long runtimes without using large files, potentially going for I/O if not buffered)
sjeng (sjeng.input = SPEC test input) ~ 6s
  - SPEC has no shorter input, but its format looks like we can strip it a bit (perhaps decrease the look-ahead depth, it's a chess program)
libquantum (params as SPEC test input) ~ 76ms
  - looks like it can be nicely fine-tuned by the two parameters (first - number to be factorized, second - base for modular exponentiation)
  - SPEC suite inputs are test (33, 5), train (143, 25), ref (1397, 8)
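==

Sketch: measuring a duration and assigning it to a class

The durations quoted in these notes could be collected and binned with something like the Python sketch below. It is only an illustration, not the harness actually used. The names classify_duration and median_duration are hypothetical, and so are the cutoffs in CLASS_LIMITS (2 ms, 300 ms, 3 s). The notes do not define global boundaries; the classes above were chosen per module.

```python
import statistics
import time

# Hypothetical upper bounds in seconds (inclusive); an assumption, since the
# notes pick class ranges per module rather than defining global cutoffs.
CLASS_LIMITS = [
    ("tiny", 0.002),
    ("small", 0.300),
    ("medium", 3.0),
]


def classify_duration(seconds):
    """Return the first class whose upper bound covers the duration, else 'long'."""
    for name, upper in CLASS_LIMITS:
        if seconds <= upper:
            return name
    return "long"


def median_duration(run, repeats=5):
    """Call run() `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; a real measurement would invoke the module under test.
    def workload():
        sum(i * i for i in range(200_000))

    elapsed = median_duration(workload)
    print(f"{elapsed * 1000:.1f} ms -> {classify_duration(elapsed)}")
```

With these assumed cutoffs, the ~76 ms libquantum run lands in "small" and the ~12 s astar run in "long". lzw-medium (~190-200 ms) would also land in "small", which shows that a single global set of cutoffs does not reproduce the per-module classes listed above.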