From tag-bounces@lists.linuxgazette.net Tue Feb 3 19:05:08 2009
From: Ben Okopnik
Date: Tue, 3 Feb 2009 21:59:16 -0500
To: Karl Vogel
Cc: The Answer Gang
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:159/okopnik.html
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf8
Status: RO
[ Karl, I hope you don't mind me copying this exchange to The Answer
Gang; I'd like for the error that you pointed out to be noted in our
next issue, and this is the best and easiest way to do it. If you have
any further replies, please CC them to 'tag@lists.linuxgazette.net'. ]
On Tue, Feb 03, 2009 at 02:59:57PM -0500, Karl Vogel wrote:
> Very cool follow-up article!
Thanks, Karl; I appreciate that. That's a very, very fun program -
again, thanks for introducing me to it!
> >> In a previous message, you unhesitatingly continued with this missive:
>
>> In practice, I've found that indexing HTML files with either "-ft" or
>> "-fh" leads to exactly the same results - i.e., a working index for all
>> the content - and so now I lump both of the above under "-ft".
>
> The display is different in the web interface. I indexed the same small
> collection of HTML files as both plain text and HTML, and then looked for
> "samba troubleshooting".
>
> Search for the phrase when indexed as plain text:
> http://localhost/search/plain/estseek.cgi?phrase=samba+troubleshooting
>
> Command used to index:
> estcmd gather -sd -ft plain /tmp/searchlmaiHg
>
> Display:
> SAMBA_Troubleshooting.htm 24428
> name="generator" content=" ... me="Generator" content="Microsoft
> Word 97"> Troubleshooting Log for VOS Samba
> size="6">Samba
> Troubleshooting
> ... 1516637">*
> Samba
> Symptoms, Causes and Resolutions @type: text/plain
>
> Now do a search for the same phrase when indexed as HTML:
> http://localhost/search/hyper/estseek.cgi?phrase=samba+troubleshooting
>
> Command used to index:
> estcmd gather -sd -fh hyper /tmp/searchlmaiHg
>
> Display:
> Troubleshooting Log for VOS Samba 25592
> Samba Troubleshooting Guide Version 2.0.7 Paul Green May 22, 2002 -
> 2001, 2002 Paul Green. Permi ... ree Documentation License". Contents
> Terminology * Samba Symptoms, Causes and Resolutions * Introduction
> * ... and Editing Host Files from a PC * Miscellaneous * Samba Web
> Access Tool (SWAT) * Troubleshooting * GNU Fre ... t that I am unable
> to offer personal assistance in troubleshooting specific problems.
> Installation This section lists ... hat arise during installation
> and configuration of Samba. Symptom: Cannot add a new HOST machine
> to an NT D ...
> http://localhost/search/docs/SAMBA_Troubleshooting.htm - [detail]
>
> Click the "details" link and check the attributes:
> @type: text/html
I just ran a careful, step-by-step manual retest of the above, and
you're absolutely right. I must have lost track of what I did during
which test - it does indeed make a difference.
On the other hand - please bear with me while I think "out loud" about
this - since the only place that difference shows up is in the cited
"hit context" paragraphs in Hyperestraier and not in the content itself,
I'm not sure how much extra effort this deserves. In order to make that
small change - i.e., not have the HTML markup appear in the cited
paragraph, which only shows up for a second or so during the process -
you'd have to split the files into two streams, index each of them
individually, then do 'extkeys/optimize/purge' on both... pretty much
double the processing time and seriously increase the complexity of the
build script. Doesn't seem like much of a payback for a whole lot of
work.
I suppose you could use "-fx" to keep the modifications really simple:
just add something like '-fx htm* H@"lynx -dump -nolist"' to the "estcmd
gather" line... but if you're going to do that, you might as well set up
processing for all the other "interesting" types of files: PDFs, RTFs,
OpenOffice files, etc. (I was going to write about that, too, but
figured it would become too complex at that point.) I guess it's a
question of deciding where the cutoff point is and building the indexer
to reflect that.
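A minimal sketch of such an outer filter for HTML. It assumes that estcmd runs the command with the input file as its first argument and an output file as its second (which is how the filters bundled with Hyper Estraier appear to be written), and that the result should be marked "T@" (plain text, as with locword in Karl's script later in this thread) rather than "H@", since "lynx -dump" writes plain text. The name estfxlynx and the HTML2TEXT override are hypothetical, not part of Hyper Estraier.

```shell
#!/bin/sh
# estfxlynx (hypothetical name): outer filter turning an HTML file into
# plain text. Assumed calling convention: $1 = input file, $2 = output
# file (write to stdout when $2 is empty).
# HTML2TEXT can be overridden (e.g. for testing); the default assumes lynx(1).
HTML2TEXT=${HTML2TEXT:-"lynx -dump -nolist -force_html"}

estfx_html2text() {
    infile=$1
    outfile=$2
    if [ -z "$infile" ]; then
        echo "usage: estfxlynx infile [outfile]" >&2
        return 1
    fi
    # $HTML2TEXT is deliberately unquoted so it splits into command + options.
    if [ -n "$outfile" ]; then
        $HTML2TEXT "$infile" > "$outfile"
    else
        $HTML2TEXT "$infile"
    fi
}

# Only run when called with arguments, so the file can also be sourced.
if [ $# -gt 0 ]; then
    estfx_html2text "$@"
fi
```

With the script on estcmd's PATH, the gather line would then carry something like -fx ".htm,.html" "T@estfxlynx", using the comma-separated suffix list style that Karl's script uses.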
Overall, I don't think that doing major hackery just to fix the context
paragraph is worthwhile. For myself, I'm going to leave it just as it
is until I decide to start processing the other filetypes.
--
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
+-+--------------------------------------------------------------------+-+
You've asked a question of The Answer Gang, so you've been sent the reply
directly as a courtesy. The TAG list has also been copied. Please send
all replies to tag@lists.linuxgazette.net, so that we can help our other
readers by publishing the exchange in our monthly Web magazine:
Linux Gazette (http://linuxgazette.net/)
+-+--------------------------------------------------------------------+-+
_______________________________________________
TAG mailing list
TAG@lists.linuxgazette.net
http://lists.linuxgazette.net/mailman/listinfo/tag
From tag-bounces@lists.linuxgazette.net Thu Feb 5 10:11:10 2009
To: Ben Okopnik
Date: Thu, 5 Feb 2009 13:08:06 -0500 (EST)
From: Karl Vogel
Cc: tag@lists.linuxgazette.net
Reply-To: vogelke+unix@pobox.com,
The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:159/okopnik.html
>> On Tue, 3 Feb 2009 21:59:16 -0500,
>> Ben Okopnik said:
> On the other hand - please bear with me while I think "out loud" about
> this - since the only place that difference shows up is in the cited
> "hit context" paragraphs in Hyperestraier and not in the content
> itself, I'm not sure how much extra effort this deserves.
Yup, this only starts to matter if you're searching lots of different
filetypes. I was trying to index as much content on a fileserver as I
could, to assist in records-office searches.
> [...] you'd have to split the files into two streams, index each of
> them individually, then do 'extkeys/optimize/purge' on both.
No, the extkeys/etc stuff only has to be done once if you're building
one index to hold more than one type of files.
> I suppose you could use "-fx" to keep the modifications really simple:
> just add something like '-fx htm* H@"lynx -dump -nolist"' to the
> "estcmd gather" line... but if you're going to do that, you might as
> well set up processing for all the other "interesting" types of files:
> PDFs, RTFs, OpenOffice files, etc.
And this is where I found the memory problem mentioned in the original
article, not to mention all sorts of MS/Adobe files which aren't
handled well by rtf2txt, antiword, xls2csv, and pdftotext. I finally
had to resort to running "strings" on lots of things and hoping for
the best. That's what the "locword" entry does in the example below.
The approach that worked best (failed least) was to run "file -i" on a
fileset to get the MIME types, and then make a few passes through the
resulting list to index what I could. Here's part of the script.
```
# --------------------------------------------------------------------------
# $ftype holds output from "file":
#
# /tmp/something.xls| application/msword
# /tmp/resume.pdf| application/pdf
# /tmp/somedb.mdb| application/x-msaccess
#
# $dbname (the index directory) and logmsg (a logging helper) are
# defined in the part of the script not shown here.
opts="-cl -sd -cm -xh -cs 128"
(
# -------------------------------------------------------------------
# Plain text files. The mimetypes file looks like this:
# application/x-perl
# application/x-shellscript
# message/news
# message/rfc822
# text/html
# text/plain
# text/rtf
# text/troff
# text/x-asm
# text/x-c
# text/x-mail
# text/x-news
# text/x-pascal
# text/x-tex
# text/xml
logmsg starting plain text
mimetypes='/usr/local/share/mime/plain-text'
fgrep -f $mimetypes $ftype |
cut -f1 -d'|' |
estcmd gather $opts -ft $dbname -
# -------------------------------------------------------------------
# Word files
logmsg starting Word
exten=".doc,.msg,.xls,.xlw"
grep 'application/msword' $ftype |
cut -f1 -d'|' |
estcmd gather $opts -fx "$exten" "T@locword" -fz $dbname -
# -------------------------------------------------------------------
# Access DBs
logmsg starting Access
exten=".mdb,.mde,.mdt,.use"
grep 'application/x-msaccess' $ftype |
cut -f1 -d'|' |
estcmd gather $opts -fx "$exten" "T@locword" -fz $dbname -
# -------------------------------------------------------------------
# Excel files with different MIME type
logmsg starting remaining Excel
exten=".xls,.xlw"
grep 'application/vnd.ms-excel' $ftype |
cut -f1 -d'|' |
estcmd gather $opts -fx "$exten" "T@locword" -fz $dbname -
# -------------------------------------------------------------------
# PDF files
logmsg starting PDF
exten=".pdf"
grep 'application/pdf' $ftype |
cut -f1 -d'|' |
estcmd gather $opts -fx "$exten" "H@estfxpdftohtml" -fz $dbname -
# -------------------------------------------------------------------
# Index cleanup for searching.
logmsg cleaning up index
estcmd extkeys $dbname
estcmd optimize $dbname
estcmd purge -cl $dbname
) > BUILDLOG 2>&1
```
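One fragile spot in the grep/cut passes: grep matches the MIME string anywhere on the line, so a path that happens to contain "application/pdf" (say, a directory named "application") would be picked up by the wrong pass. A minimal sketch of a stricter filter, assuming $ftype lines look like the sample in the header comment (path, a "|", a space, then the type), possibly followed by a "; charset=..." suffix as some versions of file(1) add. paths_of_type is a hypothetical helper; it looks only at the text after the last "|", so a "|" inside a file name does not break it either.

```shell
#!/bin/sh
# paths_of_type TYPE FILE: print the path part of every line in FILE
# ("path| mime/type") whose MIME type begins with TYPE.
paths_of_type() {
    awk -F'|' -v want="$1" '
        NF >= 2 {
            t = $NF
            sub(/^[ \t]+/, "", t)              # drop the blank after the "|"
            if (index(t, want) == 1)           # type starts with wanted string
                print substr($0, 1, length($0) - length($NF) - 1)
        }' "$2"
}
```

Each pass could then begin with something like paths_of_type application/pdf "$ftype" in place of the grep | cut pair.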
--
Karl Vogel I don't speak for the USAF or my company
Rectitude (n.), the formal, dignified demeanor assumed by a proctologist
immediately before he examines you.
--Washington Post "alternate definitions" contest
From tag-bounces@lists.linuxgazette.net Fri Feb 6 16:34:29 2009
From: Ben Okopnik
Date: Fri, 6 Feb 2009 19:30:21 -0500
To: tag@lists.linuxgazette.net
Cc: Karl Vogel
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:159/okopnik.html
On Thu, Feb 05, 2009 at 01:08:06PM -0500, Karl Vogel wrote:
> >> On Tue, 3 Feb 2009 21:59:16 -0500,
> >> Ben Okopnik said:
>
>> [...] you'd have to split the files into two streams, index each of
>> them individually, then do 'extkeys/optimize/purge' on both.
>
> No, the extkeys/etc stuff only has to be done once if you're building
> one index to hold more than one type of files.
You're right, of course.
>> I suppose you could use "-fx" to keep the modifications really simple:
>> just add something like '-fx htm* H@"lynx -dump -nolist"' to the
>> "estcmd gather" line... but if you're going to do that, you might as
>> well set up processing for all the other "interesting" types of files:
>> PDFs, RTFs, OpenOffice files, etc.
>
> And this is where I found the memory problem mentioned in the original
> article, not to mention all sorts of MS/Adobe files which aren't
> handled well by rtf2txt, antiword, xls2csv, and pdftotext. I finally
> had to resort to running "strings" on lots of things and hoping for
> the best. That's what the "locword" entry does in the example below.
I did wonder about that. The way I saw it, trying to convert all the
PDFs at once would really play hell on my poor underpowered laptop. :)
So, I didn't actually go into indexing all the PDFs and such, although I
did a couple of small test runs just to see what it would be like.
> The approach that worked best (failed least) was to run "file -i" on a
> fileset to get the MIME types, and then make a few passes through the
> resulting list to index what I could. Here's part of the script.
That certainly makes sense. I figured that for a simple indexing run,
all you needed was a pipe of the sort I put together - but for anything
more complicated, you'd need tempfiles, for exactly the reason you've
stated (multiple passes.) I actually played around with that quite a bit
('tmp=`mktemp /tmp/searchXXXXXX`' plus 'trap "/bin/rm -rf $tmp" 0' are
my friends!), and found it useful.
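The tempfile-plus-trap idiom can be sketched like this, assuming a POSIX shell and mktemp(1); count_two_ways is a hypothetical helper. The file list is saved once, read twice, and removed by the EXIT trap however the function ends. Single quotes around the trap body delay expanding $tmp until the trap actually runs.

```shell
#!/bin/sh
# count_two_ways PATTERN: save a file list from stdin to a private temp
# file, then make two passes over it: total lines, and lines matching
# PATTERN. The body runs in a subshell "( ... )", so the EXIT trap
# fires when the function returns.
count_two_ways() (
    tmp=$(mktemp "${TMPDIR:-/tmp}/searchXXXXXX") || exit 1
    trap 'rm -f "$tmp"' 0
    cat > "$tmp"
    total=$(( $(wc -l < "$tmp") ))       # $(( )) strips BSD wc padding
    matches=$(grep -c -- "$1" "$tmp")
    echo "$total $matches"
)
```

For example, find /etc -type f | count_two_ways '\.conf$' prints the number of files and how many of them end in ".conf".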
[snipping script]
Thanks, Karl - I was actually going to write something like this for
myself later. This will give me a good start on it; possibly, it'll be
of help to any of our readers who have been following along with this.
--
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *
From tag-bounces@lists.linuxgazette.net Sat Feb 7 17:37:04 2009
To: Ben Okopnik
Date: Sat, 7 Feb 2009 20:22:53 -0500 (EST)
From: Karl Vogel
Cc: tag@lists.linuxgazette.net
Reply-To: vogelke+unix@pobox.com,
The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:159/okopnik.html
>> On Fri, 6 Feb 2009 19:30:21 -0500,
>> Ben Okopnik said:
> The way I saw it, trying to convert all the PDFs at once would really play
> hell on my poor underpowered laptop. :) So, I didn't actually go into
> indexing all the PDFs and such, although I did a couple of small test runs
> just to see what it would be like.
My ideal setup (not there yet, but I'm inching closer) is to have two
distinct filetrees on a workstation or server. The first tree would
be /, /usr, /src -- all the junk we know and love. The second tree
(call it /shadow for now) would have drafts for most files under the
first tree. (If you didn't see the first Estraier article, drafts -- or
".est" files -- are the guts of the system; they hold the stuff that's
actually indexed.)
I don't much like databases for search/retrieval because they're not a
really great fit. I don't like millions of tiny files, either; if you
go down hard and have to run fsck, you not only have time for coffee,
you can go to Colombia and pick the beans. My compromise looks like this:
* Create 256 directories under /shadow using hex digits 00-ff.
Each directory has at most 256 zip files named the same way.
* Create a draft for any regular file of interest. One of the attributes
in each draft will be the hash of the contents of the file being
indexed. The MD5 hash of the filename without newline determines
where the draft file will go. For example, we index /etc/motd like so:
``
me% echo /etc/motd | tr -d '\012' | md5sum
b3097c3f6cd13df91fac6e56735da0b6 -
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- draft-filename
^^^^ <-- directory (b3) + zipfile (09)
me% md5sum /etc/motd
58d9f375623df94a2b26b0bcddb20e3d /etc/motd
''
The file /shadow/b3/09.zip will hold a draft called 7c3f...b6.est.
7c3f...b6.est holds all the interesting stuff about /etc/motd: keywords,
last modification time, and an attribute that holds a signature of the
file contents:
``
@sig=58d9f375623df94a2b26b0bcddb20e3d
''
This way, we can go directly from any "interesting" file on the system
to its corresponding draft by looking in no more than one zipfile,
and the draft doesn't have to be updated or reindexed for searching
unless the original file's contents have changed.
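The lookup and the "has it changed?" test fit in a couple of shell functions. A minimal sketch, assuming GNU md5sum and drafts whose attribute lines start with "@sig=" as in the example; the function names are hypothetical, and the zip handling itself (extracting or replacing a member of, say, /shadow/b3/09.zip) is left out.

```shell
#!/bin/sh
# shadow_location PATH: print the zipfile and draft name for PATH under
# the /shadow layout: MD5 of the path (no trailing newline); first two
# hex digits = directory, next two = zipfile, remaining 28 = draft name.
shadow_location() {
    h=$(printf '%s' "$1" | md5sum | cut -c1-32)
    dir=$(printf '%s' "$h" | cut -c1-2)
    zip=$(printf '%s' "$h" | cut -c3-4)
    draft=$(printf '%s' "$h" | cut -c5-32)
    echo "/shadow/$dir/$zip.zip $draft.est"
}

# draft_is_stale FILE DRAFT: exit 0 (stale) if DRAFT has no @sig=
# attribute or it differs from the MD5 of FILE's contents; exit 1 if
# the signature still matches and FILE need not be reindexed.
draft_is_stale() {
    cur=$(md5sum < "$1" | cut -c1-32)
    old=$(sed -n 's/^@sig=//p' "$2" 2>/dev/null | head -n 1)
    [ "$cur" != "$old" ]
}
```

A nightly pass could walk the first tree, call draft_is_stale for each file against its extracted draft, and regenerate only the drafts that report stale.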
I want something that will scale up to tens of millions of indexed files.
I did a few experiments with this, and the 1.6 million files on my
workstation could fit into 64k zipfiles with an average of 25 drafts per
archive. My home directory has ~17,400 files taking up ~300 Mbytes.
Unpacking and zipping the equivalent draft files takes up 9.1 Mbytes
(about 3% of the original file space) if you don't mind doing without
phrase searches.
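The sizing figures can be checked with a little arithmetic: 256 directories of at most 256 zipfiles each gives the 64k archives, and the per-archive and space numbers follow from that. The last line only extrapolates the same layout to ten million files; it is arithmetic, not a measurement.

```latex
\begin{aligned}
256 \times 256 &= 65\,536 \text{ zipfiles} \\
\frac{1.6 \times 10^{6} \text{ files}}{65\,536 \text{ zipfiles}} &\approx 24.4 \approx 25 \text{ drafts per zipfile} \\
\frac{9.1 \text{ Mbytes}}{300 \text{ Mbytes}} &\approx 0.030 = 3\% \\
\frac{10^{7} \text{ files}}{65\,536 \text{ zipfiles}} &\approx 153 \text{ drafts per zipfile}
\end{aligned}
```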
The current fad seems to be "consolidating" people's working files on
some massive central server for searching, which is dumb on so many
levels; crossing a network to get files that should be local, having a
nice juicy single point of failure, etc. If you want to search files
without generating enough heat to boil the nearest body of water,
put the draft files on the central server and index *those* instead.
--
Karl Vogel I don't speak for the USAF or my company
The outpatients are out in force tonight, I see. --Tom Lehrer
From tag-bounces@lists.linuxgazette.net Tue Feb 10 14:01:35 2009
Date: Tue, 10 Feb 2009 14:00:01 -0800
From: Mike Orr
To: The Answer Gang
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: [TAG] tkb: Talkback:115/orr.html
Here's a follow-up to my 2005 article "WSGI Explorations in Python"
(http://linuxgazette.net/115/orr.html). Michael Will asked me what
had happened since then, so I wrote this:
There is no state-of-the-Python-web overview that I know of, but a lot has
happened since I wrote that article. Pretty much all new frameworks
are written for WSGI, and the older ones have been retrofitted.
(CherryPy can run as a WSGI server, Plone can run as an application,
parts of Zope have been extracted to independent Repoze components,
and Quixote has a WSGI gateway floating around somewhere.) Django
sort of works with WSGI, and has been ported to Google App Engine via
WSGI.
I'm involved with Pylons, a framework that's fully WSGI and modular to
the core, built on top of Paste, which is a low-level WSGI library.
TurboGears 2 is being built on top of Pylons. This means that
different frameworks with different goals and target users can share
the same technology, and essentially makes every TG developer a Pylons
developer, doubling our developer base.
There's a group of WSGI framework developers including
Pylons/TG/Repoze.BFG developers that is designing a new framework to potentially
supersede all of them, with plug-in personalities to reflect their
different application styles. This is still at the idea stage but may
have some alpha code by the end of the year. If so, it could point the
way to the next generation of frameworks.
Another big issue is Python 3. Over the next year frameworks will
either be ported to Python 3 or replaced by frameworks written for
Python 3. (Though the Python 2 frameworks may continue in use for
several years.) This has to be done on a dependency basis; e.g.,
Pylons can't upgrade until all the components it depends on have
upgraded.
--
Mike Orr
From tag-bounces@lists.linuxgazette.net Fri Feb 13 23:52:09 2009
Date: Fri, 13 Feb 2009 23:51:46 -0800
From: "Stack, David"
To: tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: [TAG] tkb: Talkback:159/dokopnik.html
Hello
Thanks for the article. I'm attempting to make this work with the 64-bit
version of Ubuntu Server 8.10 and the 64-bit version of VMware Server
2.0. The install went fine, but I cannot access the VMware console on the
IP of my Ubuntu server on port 8222. I know it's alive on my network, as
it accessed packages from the Internet and I have a Windows server file
share mounted. Any ideas what to check? Thanks
Dave Stack
--
StackAble IT Solutions LLC
For SBMC P.S.
(509) 220-8517
From tag-bounces@lists.linuxgazette.net Sat Feb 14 00:36:06 2009
Date: Sat, 14 Feb 2009 08:35:08 +0000
From: Martin J Hooper
To: The Answer Gang , Stack@sbmc-law.com
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:159/dokopnik.html
Stack, David wrote:
> Hello
>
> Thanks for the article. I’m attempting to make this work with the 64
> bit version of Ubuntu server 8.10 and the 64bit version of vmware server
> 2.0. The install went fine but I cannot access the vmware console on
> the IP of my ubuntu server on port 8222? I know it’s alive on my
> network as it accessed packages from the internet and I have a windows
> server file share mounted. Any Idea’s what to check? Thanks
I'm no expert, but is your firewall blocking port 8222?
You would need to allow incoming and outgoing traffic on that port before
you can access the console. Other than that, I have no idea ;)
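To narrow it down, it can help to test whether anything answers on that port at all, first on the server itself and then from another machine. A minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command; port_open is a hypothetical helper name.

```shell
#!/bin/bash
# port_open HOST PORT: exit 0 if a TCP connection to HOST:PORT opens
# within 3 seconds, non-zero otherwise. Relies on bash's /dev/tcp
# pseudo-device, so the inner command must run under bash, not plain sh.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' port_open "$1" "$2" 2>/dev/null
}
```

If port_open 127.0.0.1 8222 succeeds on the server but port_open with the server's address (a placeholder here) fails from another machine, a firewall between them or on the server is the likely culprit; if it fails even locally, the VMware web service itself is probably not listening.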
From tag-bounces@lists.linuxgazette.net Sat Feb 14 08:35:08 2009
Date: Sat, 14 Feb 2009 14:34:18 -0200
From: Deividson Okopnik
To: The Answer Gang
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:159/dokopnik.html
2009/2/14 Stack, David :
> Hello
>
> Thanks for the article. I'm attempting to make this work with the 64 bit
> version of Ubuntu server 8.10 and the 64bit version of vmware server 2.0.
> The install went fine but I cannot access the vmware console on the IP of my
> ubuntu server on port 8222? I know it's alive on my network as it
> accessed packages from the internet and I have a windows server file share
> mounted. Any Idea's what to check? Thanks
>
Hey David. When you point your browser to :8222, can you access
that page? Does it ask for your login? Can you log in?
I want to know exactly what does not work. If I understood it right,
you manage to get into the VMware interface (so you're not blocked by a
firewall), but when you click to open the terminal, it doesn't work -
is that correct? If yes, what's the error? Or does it just hang?
From tag-bounces@lists.linuxgazette.net Fri Feb 20 17:25:28 2009
To: Ben Okopnik
Date: Fri, 20 Feb 2009 20:23:02 -0500 (EST)
From: Karl Vogel
Cc: tag@lists.linuxgazette.net
Reply-To: vogelke+unix@pobox.com,
The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:158/vogel.html
Greetings:
I ran into some problems while searching for portions of words in one
of my indexes. If the word "workstation" was present, I wanted to be
able to search for (say) "orksta" without getting 0 hits. I can use
substrings like leading or trailing asterisks in a command-line search
by adding the "-sf" option:
``
me% estcmd search -sf -vu -max 40 $db/srch "$pattern"
''
I'd rather avoid the wildcard stuff when doing a browser search using
estseek.cgi, and I found that adding a synonym list works really well.
To build a wordlist for synonyms, run "estcmd words" on your search index:
``
me% estcmd words srch | awk '{print $1}' | sort -u > /path/to/synonyms
''
Here are the changes to estseek.conf:
```
# phraseform: specifies the phrase form. "1" is usual form, "2"
# is simplified form, "3" is rough form, "4" is union form, "5"
# is intersection form.
phraseform: 2

# candetail: specifies whether to enable detail display of a
# document. "true" or "false".
candetail: true

# candir: specifies whether to enable directory display of a
# document. "true" or "false".
candir: true

# If you want query expansion, enable an outer command by editing
# qxpndcmd in estseek.conf. It specifies the absolute path of
# a command which outputs synonyms of a word specified by the
# environment variable "ESTWORD".
qxpndcmd: /usr/local/share/hyperestraier/filter/myxpnd
```
Here's the expansion script "myxpnd":
```
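#!/bin/ksh
# myxpnd: list synonyms

# set variables
LANG=C ; export LANG
LC_ALL=C ; export LC_ALL
PATH="/usr/local/bin:/bin:/usr/bin" ; export PATH

# show help message
case "$1" in
    --help) echo 'List synonyms of a word'; exit 0 ;;
    *) ;;
esac

# An empty $ESTWORD would match every line, so expand to nothing instead.
[ -n "$ESTWORD" ] || exit 0

# List synonyms: every indexed word containing $ESTWORD as a fixed string.
# -e keeps a word starting with "-" from being read as an option.
exec fgrep -e "$ESTWORD" /path/to/synonyms
```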
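Before pointing estseek.cgi at the script, the matching can be tried by hand on a small word list. A minimal sketch of the same fixed-substring lookup as a shell function; list_synonyms and its word-list argument are hypothetical, standing in for the $ESTWORD variable and the hard-coded /path/to/synonyms.

```shell
#!/bin/sh
# list_synonyms WORD WORDLIST: print every word in WORDLIST that
# contains WORD as a fixed substring (the matching myxpnd does).
list_synonyms() {
    [ -n "$1" ] || return 0          # empty query: expand to nothing
    grep -F -e "$1" -- "$2"
}
```

Running list_synonyms orksta /path/to/synonyms against a real word list should show "workstation" among the expansions, which is exactly the partial-word hit the browser search was missing.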
Here's part of a search form that works:
```
```
--
Karl Vogel I don't speak for the USAF or my company
WEDDING DRESS FOR SALE: Worn once by mistake. Call Stephanie.
--seen in the want-ads
From tag-bounces@lists.linuxgazette.net Fri Feb 20 22:06:38 2009
Date: Sat, 21 Feb 2009 11:31:46 +0530
From: Kapil Hari Paranjape
To: Aditya Bhiday , tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: [TAG] tkb: Talkback:144/lg_mail.html
Dear TAG-ers,
I am enclosing a query received regarding #144.
Regards,
Kapil.
P.S. (to Aditya) Please do not mail TAG members directly. Use the
mailing list address above instead.
----- Forwarded message from Aditya Bhiday -----
Date: Sat, 21 Feb 2009 11:18:15 +0530
Subject: Regarding Proxy Tunneling (TLDP)
From: Aditya Bhiday
To: kapil@imsc.res.in
Hi,
I came across a post at
http://tldp.org/LDP/LGNET/144/misc/lg/question_on_how_to_block_a_ssh_host_from_being_used_as_a_socks_proxy.html
which said that
"AllowTcpForwarding Specifies whether TCP forwarding is permitted. The
default is "yes". Note that disabling TCP forwarding does not improve
security unless users are also denied shell access, as they can always
install their own forwarders."
I was just experimenting with tunneling and how to block it.
Could you please explain to me how one can install their own forwarders if
SSH tunneling is blocked, or give me the name of such forwarding software?
Thanks,
Regards,
Aditya Bhiday
----- End forwarded message -----
From tag-bounces@lists.linuxgazette.net Fri Feb 20 22:13:34 2009
Date: Sat, 21 Feb 2009 11:39:12 +0530
From: Aditya Bhiday
To: tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:144/lg_mail.html
Oh, I'm sorry. I'm new to mailing lists.
I'll keep that in mind.
However, when I send a message to a mailing list I'm not subscribed to, do I
receive the replies to my messages in my inbox?
Regards,
Aditya
From tag-bounces@lists.linuxgazette.net Fri Feb 20 22:12:54 2009
Date: Sat, 21 Feb 2009 11:39:15 +0530
From: Kapil Hari Paranjape
To: Aditya Bhiday , tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:144/lg_mail.html
Hello,
On Sat, 21 Feb 2009 Aditya Bhiday wrote:
> I was just experimenting with tunneling and how to block it.
> Could you please explain to me how one can install their own forwarders if
> SSH tunneling is blocked, or give me the name of such forwarding software?
IF:
``
- shell account access is enabled
and
- the user of that shell account can install programs
and
- run these programs
''
then forwarding is possible.
For example, the user can install "slirp" which takes a tty and
converts it into a ppp server. The user can then attach a pppd
process to the other end of the tty.
Kapil.
From tag-bounces@lists.linuxgazette.net Fri Feb 20 22:18:08 2009
Date: Sat, 21 Feb 2009 11:44:44 +0530
From: Aditya Bhiday
To: Aditya Bhiday, tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:144/lg_mail.html
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf8
Status: RO
Yes, but if it is an ordinary user, with no administrative powers, then just
disabling TCP forwarding in the ssh daemon config should block all
tunneling, right?
Regards,
Aditya
From tag-bounces@lists.linuxgazette.net Sat Feb 21 13:15:40 2009
Date: Sat, 21 Feb 2009 13:14:12 -0800
From: Rick Moen
To: Aditya Bhiday
Cc: tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:144/lg_mail.html
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf8
Status: RO
Quoting Aditya Bhiday (aditya.bhiday@gmail.com):
> Oh, I'm sorry. I'm new to mailing lists.
> I'll keep that in mind.
>
> However, when I send a message to a mailing list I am not part of, do I
> receive the replies to my messages in my Inbox?
Not automatically. However: (1) TAG mailing list members make a point
of CCing querents under the assumption that they are not subscribed,
specifically so that you _do_ get copies, and (2) you or anyone else are
of course very welcome to join the TAG mailing list. (See URL at
bottom.) You might merely find following the discussions to be
interesting, and eventually might wish to participate. That's how we
get new members of The Answer Gang! ;->
--
Cheers, "Please return all dogmas to their orthodox positions."
Rick Moen -- Brad Johnson, in r.a.sf.w.r-j
rick@linuxmafia.com
From tag-bounces@lists.linuxgazette.net Sat Feb 21 17:27:45 2009
Date: Sun, 22 Feb 2009 06:53:32 +0530
From: Kapil Hari Paranjape
To: tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:144/lg_mail.html
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf8
Status: RO
Hello,
On Sat, 21 Feb 2009, Aditya Bhiday wrote:
> On Sat, Feb 21, 2009 at 11:39 AM, Kapil Hari Paranjape wrote:
> > For example, the user can install "slirp" which takes a tty and
> > converts it into a ppp server. The user can then attach a pppd
> > process to the other end of the tty.
> Yes, but if it is an ordinary user, with no administrative powers, then just
> disabling TCP forwarding in the ssh daemon config should block all
> tunneling, right?
An "ordinary" user with a shell account can generally download a
program to their home directory and run it. So I don't understand
your remark.
Kapil.
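For illustration, here is a minimal sketch (in Python, using only the standard-library asyncio module) of the kind of "own forwarder" being discussed: a plain TCP relay that an unprivileged user could run from a home directory. The names pipe() and start_relay() are hypothetical, chosen for this example only. Because it never touches sshd's forwarding machinery, setting AllowTcpForwarding to "no" in sshd_config does nothing to stop it; the OpenSSH sshd_config(5) manual page makes the same point, noting that disabling TCP forwarding does not improve security unless users are also denied shell access.

```python
import asyncio

# Hypothetical user-space TCP forwarder: it needs no root privileges and
# does not rely on sshd's forwarding features at all.


async def pipe(reader: asyncio.StreamReader,
               writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close writer.

    This sketch closes the whole connection at EOF instead of doing a
    TCP half-close, which is enough for simple request/response traffic.
    """
    try:
        while True:
            data = await reader.read(4096)
            if not data:          # the peer closed its side
                break
            writer.write(data)
            await writer.drain()  # respect flow control
    except ConnectionError:
        pass                      # the other direction already tore it down
    finally:
        writer.close()


async def start_relay(listen_host: str, listen_port: int,
                      target_host: str, target_port: int) -> asyncio.AbstractServer:
    """Accept connections on listen_host:listen_port and relay each one
    to target_host:target_port in both directions."""

    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        try:
            upstream_reader, upstream_writer = await asyncio.open_connection(
                target_host, target_port)
        except OSError:
            client_writer.close()  # target unreachable: drop the client
            return
        # Shovel bytes both ways until either side closes.
        await asyncio.gather(pipe(client_reader, upstream_writer),
                             pipe(upstream_reader, client_writer))

    return await asyncio.start_server(handle, listen_host, listen_port)
```

An ordinary user would wrap this in a coroutine that awaits serve_forever() on the returned server (for example, listening on 127.0.0.1 port 8022 and relaying to a hypothetical internal web server). No special privileges are needed as long as the listening port is above 1023.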
From tag-bounces@lists.linuxgazette.net Thu Feb 26 16:05:31 2009
Date: Fri, 27 Feb 2009 00:03:59 +0000
From: Jimmy O'Regan
To: The Answer Gang
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: [TAG] tkb: Talkback:155/moen.html
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf8
Status: RO
2008/9/22 Rick Moen:
> the licensed work.) Is it possible to donate a work of original
> ownership directly to the public domain, despite the lack of any legal
> mechanism for doing so? Is it desirable to have a choice-of-law
Probably not, but you can now dedicate a work to 'the Commons':
http://creativecommons.org/licenses/zero/1.0/
``
'The person who associated a work with this document has dedicated
this work to the Commons by waiving all of his or her rights to the
work under copyright law and all related or neighboring legal rights
he or she had in the work, to the extent allowable by law.'
''
From tag-bounces@lists.linuxgazette.net Thu Feb 26 16:45:25 2009
Date: Thu, 26 Feb 2009 16:44:35 -0800
From: Rick Moen
To: tag@lists.linuxgazette.net
Reply-To: The Answer Gang
Sender: tag-bounces@lists.linuxgazette.net
Subject: Re: [TAG] tkb: Talkback:155/moen.html
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf8
Status: RO
Quoting Jimmy O'Regan (joregan@gmail.com):
> 2008/9/22 Rick Moen:
> > the licensed work.) Is it possible to donate a work of original
> > ownership directly to the public domain, despite the lack of any legal
> > mechanism for doing so? Is it desirable to have a choice-of-law
>
> Probably not, but you can now dedicate a work to 'the Commons':
> http://creativecommons.org/licenses/zero/1.0/
>
> 'The person who associated a work with this document has dedicated
> this work to the Commons by waiving all of his or her rights to the
> work under copyright law and all related or neighboring legal rights
> he or she had in the work, to the extent allowable by law.'
Yes, I actually talk about the "CC0 Waiver"[1] in my comprehensive rundown
(Web page) on legal problems that _may_ result from purporting (as
copyright owner) to use any PD grant, including Creative Commons's. The
page includes a dialogue I had with CC co-founder Prof. Lawrence
Lessig, where he acknowledges that the concept is problematic, and that
the CC0 Waiver is merely CC's effort to "frame the dedication in as
complete and reliable way as possible".
My page also points out that all such problems can be avoided completely
and easily by using, instead, a simple one-line permissive licence grant
(with example cited) -- or two lines if you disclaim warranties (which
is advisable).
http://linuxmafia.com/faq/Licensing_and_Law/public-domain.html
[1] I guess they must have recently retired that term, and are now
calling it "Commons Deed" or CC0. I'll update my page, accordingly.
--
Cheers, Bad Unabomber!
Rick Moen Blowing people all to hell.
rick@linuxmafia.com Do you take requests?
-- Unabomber Haiku Contest, CyberLaw mailing list