I got a new computer (Pentium II) and it shows. :-) With locus-0.88, indexing
50 MB (49 619 019 bytes, to be exact) in 193 files of my new test
collection (Project Gutenberg texts from Jules Verne to the Bible, Usenet
jokes, on-line poetry etc.) takes 15 minutes.
For locus-0.87, I got a significant speedup by increasing the size of the
memory cache: Decline and Fall of the Roman Empire now indexes in 15 minutes.
(That is with only 3 indexing cycles - I should probably pick a bigger
reference collection.)
Performance measurably (although probably not noticeably) improved for
locus-0.8, compiled with gcc 2.8.1. Time to index 9 772 883 bytes
of Gibbon's Decline and Fall of the Roman Empire is now 26 minutes.
locus-0.64 (and later) - compiled with gcc 2.8.0, which can compile my
convoluted C++ with optimizations enabled (as opposed to gcc 2.7.2.3) - is
a definite improvement over the previous version. Time to index
9 772 883 bytes of Gibbon's Decline and Fall of the Roman Empire
went from 62 to 29 minutes.
Time/size dependency is pretty linear. I guess it stays linear as long as
you have enough RAM, but surely not forever: 49 752 228 bytes of VIRUS-L
archive takes almost 6 hours (5 hr 43 min, to be precise). Here, disk writes
are the bottleneck (a stop list cuts the time to 4 hr 45 min).
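
To make these figures easier to compare, here is a small sketch that turns each measurement quoted on this page into a throughput in bytes per second. The sizes and times are copied from the text; the labels, the `MEASUREMENTS` list and the `throughput` function are only illustrative names. Keep in mind that the runs were done with different locus versions and on different machines, and the text does not say which version produced the VIRUS-L timings, so the numbers are a rough guide and not a controlled comparison.

```python
# (label, size in bytes, time in minutes), as quoted in the text.
MEASUREMENTS = [
    ("test collection, locus-0.88", 49_619_019, 15),
    ("Gibbon, locus-0.87", 9_772_883, 15),
    ("Gibbon, locus-0.8", 9_772_883, 26),
    ("Gibbon, locus-0.64", 9_772_883, 29),
    ("Gibbon, before locus-0.64", 9_772_883, 62),
    ("VIRUS-L archive", 49_752_228, 5 * 60 + 43),
    ("VIRUS-L archive, with stop list", 49_752_228, 4 * 60 + 45),
]


def throughput(size_bytes, minutes):
    """Return indexing throughput in bytes per second."""
    return size_bytes / (minutes * 60)


for label, size_bytes, minutes in MEASUREMENTS:
    rate = throughput(size_bytes, minutes)
    print(f"{label:32s} {rate:10.0f} bytes/s")
```

If indexing time were strictly linear in input size, runs with the same version on the same machine would show roughly the same bytes-per-second figure regardless of collection size.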
Of course, many more tests should be done, but I'm too lazy. If you do your
own measurements, I'll gladly include them here.
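
If you want to time a run yourself, a minimal sketch along these lines should do. It measures the wall-clock time of any command you give it; the `time_command` helper is a hypothetical name, and nothing here assumes how locus itself is invoked, so substitute your own indexing command for the placeholder.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # raises if the command fails
    return time.perf_counter() - start


# Example with a harmless placeholder command; replace the list with
# whatever command line you use to index your collection.
elapsed = time_command([sys.executable, "-c", "pass"])
print(f"{elapsed:.2f} s ({elapsed / 60:.1f} min)")
```

Reporting the collection size in bytes and the number of files alongside the time makes the result comparable with the figures above.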