(?) The Answer Gang (!)
By Jim Dennis, Ben Okopnik, Dan Wilder, Breen, Chris, and... (meet the Gang) ... the Editors of Linux Gazette... and You!


We have guidelines for asking and answering questions. Linux questions only, please.
We make no guarantees about answers, but you can be anonymous on request.
See also: The Answer Gang's Knowledge Base and the LG Search Engine



(?) shell and pipe question

From Brian Chrisman

Answered By: Rick Moen, Heather Stern, Jim Dennis, Thomas Adam

This is what I'm trying to do specifically:

Say I have a bunch of log files:


log.0
log.1
log.2

etc.

And I want to do these things:

cat log.* | gzip > /other/place/all-logs.gz
cat log.* | grep -v somecrap | gzip > /other/place/all-logs-cleaned.gz

Running the two cats is fine when the files in log.* are small. But when they total many gigabytes, reading everything twice gets very expensive.

Any suggestions?

(!) [Rick] This will use (abuse) tee and FIFOs to do kinda what you want as well.
cat - | (mkfifo /tmp/x; (cat /tmp/x | gzip > /tmp/xx.gz &) ; tee -a /tmp/x) | cat -
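Rick's one-liner can be packaged as a reusable filter that passes its input through unchanged while writing a gzip'd copy. This is a minimal sketch, assuming bash and gzip. tee_gz is a hypothetical helper name, and a private directory from mktemp -d replaces the fixed /tmp/x so that concurrent runs don't collide and the FIFO is removed afterwards.

```bash
# Hypothetical helper: copy stdin to stdout unchanged and write a
# gzip'd copy of the same stream to the file named in $1.
tee_gz() {
    local out=$1 dir
    dir=$(mktemp -d) || return 1            # private dir instead of a fixed /tmp/x
    mkfifo "$dir/fifo" || { rmdir "$dir"; return 1; }
    gzip -c9 < "$dir/fifo" > "$out" &       # reader: blocks until tee opens the FIFO
    tee "$dir/fifo"                         # single pass: to the FIFO and to stdout
    wait                                    # gzip sees EOF when tee exits; let it finish
    rm -f "$dir/fifo"
    rmdir "$dir"
}
```

Used as a stage in the original request, the logs are read only once:

cat log.* | tee_gz all-logs.gz | grep -v somecrap | gzip -c9 > all-logs-cleaned.gz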
(!) [Thomas]
(cat log.*) | tee -a gzip > /other/all-logs.gz | grep -v somecrap | gzip > /other/all-logs-cleaned.gz
(!) [JimD] How do you mark the EOF?
(!) [Heather] You really want one file with all the log data, instead of a tar, which would let you tell which log a given line came from? Hmm. Sounds like a way to lose info a sysadmin would care about later.
Well, grep has no problem searching multiple files (just name all of them on its command line, and add -h if you don't want each line prefixed with its file name), so the initial request earns a Useless Use of cat Award. Moreover, a small awk script reading the files could just as easily write out two copies of the log as one... it can even note the name of each incoming file in your first output, so that the log file name isn't really lost during the concatenation.
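The one-pass awk idea might look like this minimal sketch. split_logs_awk is a hypothetical name, the output file names and the "somecrap" pattern come from the question, and the "==> file <==" marker lines are an invented format. It relies on awk's print | "command" form: every print using the same command string goes to the same open pipe, and close() waits for that command to finish.

```bash
# Hypothetical helper: read every file once, write all lines (plus a
# marker naming each source file) to all-logs.gz and the filtered lines
# to all-logs-cleaned.gz.
split_logs_awk() {
    awk '
        BEGIN {
            all   = "gzip -c9 > all-logs.gz"
            clean = "gzip -c9 > all-logs-cleaned.gz"
        }
        FNR == 1    { print "==> " FILENAME " <==" | all }  # note the source file
                    { print | all }                         # every line
        !/somecrap/ { print | clean }                       # filtered copy
        END         { close(all); close(clean) }            # flush and wait for both gzips
    ' "$@"
}
```

Usage: split_logs_awk log.*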
Since you specifically requested bash (the default shell on nearly every Linux distribution), and bash allows you additional file handles (thought you were limited to standard out, didn't you? Haha! It's not so!), you should be able to do something much cooler. Even in another shell you can at least use stderr as well.
{Brief pause while I grodgel around to see whether pipelines support the additional handles. Apparently not, but < and > do... viz. <(commandline) and >(commandline), which automatically generate the FIFO to use. Whee!}
(!) [JimD] Yep. Heather is right. This stuff is called "process substitution", and it substitutes the name of a dynamically created pipe (possibly a named pipe in /tmp) for the <(...) or >(...) expression.
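A plain | only carries standard output, but a redirection can still attach an extra descriptor to an ordinary pipeline. In this minimal sketch (hypothetical function name, Linux assumed), "3>&1" makes descriptor 3 a second handle on the outer pipe, and tee writes its extra copy to /dev/fd/3. It uses only POSIX shell redirection, though it relies on Linux providing /dev/fd; since both gzip processes are ordinary pipeline members, the shell waits for them and both files are complete when the command returns.

```bash
# Hypothetical helper: tee's stdout feeds grep, while its /dev/fd/3 copy
# goes through fd 3 (a duplicate of the outer pipe) to the outer gzip.
split_logs_fd3() {
    { cat "$@" | tee /dev/fd/3 | grep -v somecrap | gzip -c9 > all-logs-cleaned.gz; } 3>&1 |
        gzip -c9 > all-logs.gz
}
```

Usage: split_logs_fd3 log.*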
I have to confess I don't quite understand how the EOF is propagated through these if they are implemented as named pipes (FIFOs). In retrospect this also works:
mkfifo ./pipesmoke
cat log.* | tee ./pipesmoke | gzip -c9 > all.gz &
grep -v somecrap < ./pipesmoke | gzip -c9 > all-logs-cleaned.gz
wait; rm -f ./pipesmoke
(!) [JimD] ... notice that you MUST background that first pipeline, or the tee to pipesmoke will block. Conversely, you could start the second pipeline in the background (it will immediately block on the empty pipesmoke FIFO) and then run the cat ... | tee ... command. Clearly one or the other (or both) must be in the background if they are started from the same script. Otherwise you'd have a process blocked forever, waiting either for input or for room in the pipe buffer.
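On the EOF question: a process reading a FIFO gets end-of-file once every process that had it open for writing has closed it. So when tee, the only writer, exits after copying the last log line, the grep reading pipesmoke sees EOF exactly as it would on an ordinary pipe. The converse ordering (reader in the background, writer in the foreground) might look like this minimal sketch, assuming bash; split_logs_fifo is a hypothetical name, and the FIFO lives in a private mktemp -d directory instead of ./pipesmoke.

```bash
# Hypothetical helper: background reader first, then the foreground writer.
split_logs_fifo() {
    local dir
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/pipesmoke" || { rmdir "$dir"; return 1; }

    # Blocks in open() until tee opens the FIFO for writing; gets EOF
    # when tee exits and closes its end.
    grep -v somecrap < "$dir/pipesmoke" | gzip -c9 > all-logs-cleaned.gz &

    # Single pass over the logs: one copy into the FIFO, one to gzip.
    cat "$@" | tee "$dir/pipesmoke" | gzip -c9 > all.gz

    wait                    # let the background pipeline drain and finish
    rm -f "$dir/pipesmoke"
    rmdir "$dir"
}
```

Usage: split_logs_fifo log.*. Called directly from an interactive shell, the bare wait also waits for any other background jobs you have running.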
(!) [JimD]
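The problem with the "tee -a gzip" version is that tee writes to a file, not a program: it appends to a file literally named gzip, and the > redirect sends tee's own output, uncompressed, to all-logs.gz, leaving nothing for grep. You need "process substitution", like so: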
  cat log.* | tee >( grep -v somecrap | gzip -c9 > all-logs-cleaned.gz ) | gzip -c9 > all.gz
... or:
  cat log.* | tee >( gzip -c9 > all.gz ) | grep -v somecrap | gzip -c9 > all-logs-cleaned.gz


This page edited and maintained by the Editors of Linux Gazette
Copyright © its authors, 2004
Published in issue 100 of Linux Gazette March 2004
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/

