This page presents my pet project: the fulltext database locus. I wanted
a fulltext search tool which would be:
Personal but not lightweight
locus must run on hardware I can afford (currently a 486 PC with Linux). It's
decidedly non-distributed and has no pretensions to replace Internet search
engines like Altavista. On the other hand, I want to index all documents which
fit on my disk, including CDs (e.g. with texts of Project Gutenberg), and I'll
tolerate much slower indexing and higher disk usage (30-70 percent of source
text size for indexes) than Glimpse or Swish as a tradeoff for larger maximum
database size and more focused search. locus was tested on 400MB in 1200
documents and can find uncommon words (i.e. the kind of words you would
normally use to search for something) in under ten seconds.
Smart but not programmer-hostile
The ideal of fulltext search is clear: you just type in a few words and the
program finds what you meant to search for. The problem is, it doesn't
always work that way. So locus gives you a choice: if you just type in
a few words, it uses a relatively complicated search algorithm that tries to
find the best match. If you're not satisfied, you can see why it found what it
found and tweak parameters to your heart's content and beyond, using a simple
query language. locus can search for phrases - not just on one line with
exactly matching spaces, like grep, but for words near each other - as well as
topics (get a word, find fifty associations in your thesaurus and search for
these). Simple stemming is also supported.
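
To illustrate the difference between grep-style matching and searching for
words near each other, here is a minimal Python sketch. It is not the locus
code or its query language: the names build_index, near and max_gap are
made up for this example, and the sample documents are invented. It assumes
a positional index, i.e. one that remembers where in each document a word
occurs.

```python
from collections import defaultdict
import re


def build_index(docs):
    """Map each lowercased word to {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Positions count words, so line breaks and spacing do not matter.
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def near(index, first, second, max_gap=3):
    """Return sorted ids of documents where both words occur
    at most max_gap word positions apart (hypothetical helper)."""
    a = index.get(first, {})
    b = index.get(second, {})
    hits = []
    for doc_id in a.keys() & b.keys():
        if any(abs(p - q) <= max_gap for p in a[doc_id] for q in b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Invented sample documents; in a.txt the two words sit on different lines.
docs = {
    "a.txt": "the quick brown\nfox jumps over the lazy dog",
    "b.txt": "a quick look at the history of the red fox",
}
index = build_index(docs)
print(near(index, "quick", "fox"))
```

With the default max_gap only a.txt matches, although "quick" and "fox" are
on different lines there; in b.txt the words are too far apart. A line-based
tool like grep would miss the first document for the same query.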
Interested?
If you think you might use something like locus, you have the Linux source;
let me know at vbar@comp.cz how you liked it. If you have any questions or
problems getting, installing, understanding, using and/or extending locus,
you may want to see the FAQ before mailing me.
You can also take a look at the available options to see all the exciting
possibilities (well, all the exciting possibilities I cared to document -
but there are enough of them).
Your distribution contains just a (forever unfinished) core of locus.
The newest version is always (well, modulo connection problems) available
at the locus homepage.
Additional code and data files for special uses (e.g. moving databases
on disk) are available upon request.
Last modified 06 Jul 98.