This page presents my pet project: the fulltext database locus. I wanted
a fulltext search tool that would be:
Personal but not lightweight
locus must run on hardware I can afford (currently a 486 PC with Linux). It's
decidedly non-distributed and has no pretensions to replace Internet search
engines like Altavista. On the other hand, I want to index all documents which
fit on my disk, including CDs (e.g. with texts of Project Gutenberg), and I'll
tolerate much slower indexing and higher disk usage (30-70 percent of source
text size for indexes) than Glimpse or Swish as a tradeoff for larger maximum
database size and more focused search. locus was tested on 400MB in 1200
documents and can find uncommon words (i.e. the kind of words you would
normally use to search for something) in under ten seconds.
Smart but not programmer-hostile
The ideal of fulltext search is clear: you just type in a few words and the
program finds what you meant to search for. The problem is, it doesn't
always work that way. So locus gives you the choice: if you just type in
a few words, it uses a relatively complicated search algorithm that tries to
find the best match. If you're not satisfied, you can see why it found what
it found and tweak parameters to your heart's content and beyond, using
a simple query language. locus can search for phrases - not just on one line
with exactly matching spaces, like grep, but for words near each other - as
well as topics (get a word, find fifty associations in your thesaurus and
search for these). Simple stemming is also supported.
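To make "words near each other" concrete, here is a minimal Python sketch of a proximity search over a positional index. It is only an illustration of the general idea, not how locus itself is implemented; the function names build_positional_index and near, and the max_distance parameter, are made up for this example.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Map each lowercased word to {doc_id: [word positions]}.

    Positions count words, not characters, so line breaks and
    runs of spaces between words do not matter.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index


def near(index, first, second, max_distance=3):
    """Return sorted ids of documents where both words occur
    at most max_distance words apart (hypothetical helper)."""
    first_docs = index.get(first.lower(), {})
    second_docs = index.get(second.lower(), {})
    hits = set()
    # Only documents containing both words can match.
    for doc_id in first_docs.keys() & second_docs.keys():
        if any(abs(a - b) <= max_distance
               for a in first_docs[doc_id]
               for b in second_docs[doc_id]):
            hits.add(doc_id)
    return sorted(hits)
```

Because the index stores word positions rather than lines, a pair of words split by a line break still matches, which is exactly what a line-oriented tool like grep would miss.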
Universal back end for any front end
I don't like creating GUIs, and GUIs I do create tend to look awful even to
me (not to mention others). So I decided to concentrate my work on locus on
the back end. But of course, to use a program, one needs an interface...
You can specify queries on the command line and read results from standard
output (or redirect them to a file), and if you want anything fancier, set up
your own front end. grazer output is quite flexible - for example, you can
output HTML and query locus databases through your browser.
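As a rough idea of what a home-made front end could look like, the following Python sketch turns plain result lines into an HTML list. The input format is an assumption made purely for this example (one hit per line, a score and a path separated by a tab); it is not the documented output of locus or grazer, and results_to_html is a hypothetical name.

```python
import html


def results_to_html(lines, title="Search results"):
    """Wrap result lines in a minimal HTML page.

    Assumed (hypothetical) input: one hit per line, "score<TAB>path".
    A line without a tab is treated as a bare path with no score.
    """
    items = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        score, tab, path = line.partition("\t")
        if not tab:
            score, path = "", line
        link = html.escape(path, quote=True)
        label = " ({})".format(html.escape(score)) if score else ""
        items.append('<li><a href="{0}">{0}</a>{1}</li>'.format(link, label))
    return (
        "<html><head><title>{}</title></head><body><ul>\n{}\n</ul></body></html>"
        .format(html.escape(title), "\n".join(items))
    )
```

A script built around this function would read the search program's standard output line by line and write the returned page to a file or hand it to a web server.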
Interested?
If you think you might use something like locus, you have the Linux source;
let me know at vbar@comp.cz how you liked it. If you have any questions,
problems and/or suggestions about getting, installing, understanding, using
and/or extending locus, you may want to see the FAQ before mailing me.
You can also take a look at the available options to see all the exciting
possibilities (well, all the exciting possibilities I cared to document -
but there are enough of them).
Your distribution contains just a (forever unfinished) core of locus.
The newest version is always (well, modulo connection problems) available
at the locus homepage.
Additional code and data files for special uses (e.g. moving databases
on disk) are available upon request.
Last modified 27 Aug 98.