2-Cent Tips

Two-cent Tip: Retrieving directory contents

Ben Okopnik [ben at linuxgazette.net]


Wed, 3 Feb 2010 22:35:08 -0500

During a recent email discussion regarding pulling down the LG archives with 'wget', I discovered (perhaps mistakenly; if so, I wish someone would enlighten me [1]) that there's no way to tell it to pull down all the files in a directory unless there's a page that links to all those files... and the directory index doesn't count (even though it contains links to all those files). So, after a minute or two of fiddling with it, I came up with the following solution:

#!/bin/bash
# Created by Ben Okopnik on Fri Jan 29 14:41:57 EST 2010
 
[ -z "$1" ] && { printf "Usage: ${0##*/} <URL> \n"; exit; }
 
# Safely create a temporary file ('mktemp' is more portable than 'tempfile')
file=$(mktemp) || exit 1
# Extract all the links from the directory listing into a local text file
# (the non-greedy '.*?' stops at the first matching close quote)
wget -q -O - "$1"|\
URL="${1%/}" perl -wlne'print "$ENV{URL}/$2" if /href=(["\047])(.*?)\1/' > "$file"
# Retrieve the listed links
wget -i "$file" && rm "$file"

To summarize, I used 'wget' to grab the directory listing, parsed it to extract all the links, prefixed them with the site URL, and saved the result into a local tempfile. Then, I used that tempfile as a source for 'wget's '-i' option (read the links to be retrieved from a file).
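The link-extraction step can also be tried offline, without touching a server. The following is only a sketch under stated assumptions: the function name 'extract_links' is made up for illustration, it handles double-quoted href attributes only, and it additionally skips the column-sorting links ("?C=N;O=D") and parent-directory links that Apache-style listings usually contain, which the script above does not do.

```shell
#!/bin/bash
# Hypothetical helper: read an HTML directory listing on stdin and print
# one absolute URL per link, prefixed with the base URL given as $1.
extract_links() {
    local base="${1%/}"    # strip one trailing slash, as the script above does
    local link
    grep -o 'href="[^"]*"' |                    # one href per output line
        sed -e 's/^href="//' -e 's/"$//' |      # keep only the link target
        grep -v -e '^?' -e '^/' -e '^\.\./' |   # skip sort links and parent links
        while IFS= read -r link; do
            printf '%s/%s\n' "$base" "$link"
        done
}
```

It would be used as 'wget -q -O - "$url" | extract_links "$url" > "$file"', followed by the same 'wget -i "$file"' as before.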

I've tested this script on about a dozen directories with a variety of servers, and it seems to work fine.

[1] Please test your proposed solution, though. I'm rather cranky at 'wget' with regard to its documentation; perhaps it's just me, but I often find that the options described in its manpage do something rather different from what they promise to do. For me, 'wget' is a terrific program, but the documentation has lost something in the translation from the original Martian.

-- 
* Ben Okopnik * Editor-in-Chief, Linux Gazette * http://LinuxGazette.NET *



Two-cent Tip: How big is that directory?

Dr. Parthasarathy S [drpartha at gmail.com]


Tue, 2 Feb 2010 09:57:02 +0530

At times, you may need to know exactly how big a certain directory (say, the top directory) is, along with all its contents and subdirectories (and their contents). You may need this if you are copying a large directory along with its contents and structure, and you would like to know whether what you got after the copy is what you sent. Or you may need it when copying material onto a device where space is limited, so that you can make sure the device can accommodate what you are planning to send.

Here is a cute little script. Calling sequence::

howmuch <top directory name>

You get a summary, which gives the total size, the number of subdirectories, and the number of files (counted from the top directory). Good for book-keeping.

###########start-howmuch-script
# Tells you how many files, subdirectories and content bytes in a
# directory
# Usage :: how much <directory-path-and-name>
 
# check if there is no command line argument
if [ $# -eq 0 ]
then
echo "You forgot the directory to be accounted for !"
echo "Usage :: howmuch <directoryname with path>"
exit
fi
 
echo "***start-howmuch***"
pwd > ~/howmuch.rep
pwd
echo -n "Disk usage of directory ::" > ~/howmuch.rep
echo $1 >> ~/howmuch.rep
echo -n "made on ::" >> ~/howmuch.rep
du -s $1 > ~/howmuch1
tree $1 > ~/howmuch2
date >> ~/howmuch.rep
tail ~/howmuch1 >> ~/howmuch.rep
tail --lines=1 ~/howmuch2 >> ~/howmuch.rep
cat ~/howmuch.rep
# cleanup
rm ~/howmuch1
rm ~/howmuch2
#Optional -- you can delete howmuch.rep if you want
#rm ~/howmuch.rep
 
echo "***end-howmuch***"
#   
 
 
########end-howmuch-script
-- 
---------------------------------------------------------------------------------------------
Dr. S. Parthasarathy                    |   mailto:drpartha@gmail.com
Algologic Research & Solutions    |
78 Sancharpuri Colony                 |
Bowenpally  P.O                          |   Phone: + 91 - 40 - 2775 1650
Secunderabad 500 011 - INDIA     |
WWW-URL: http://algolog.tripod.com/nupartha.htm
---------------------------------------------------------------------------------------------


Ben Okopnik [ben at linuxgazette.net]


Wed, 3 Feb 2010 22:04:25 -0500

Hi, Partha -

On Tue, Feb 02, 2010 at 09:57:02AM +0530, Dr. Parthasarathy S wrote:

> 
> I hope you will find the enclosed submission worthwhile for LG. Please let me
> know as soon as it gets published, or if it is not worth publishing in LG.
> Thank you.

Pretty much anything sent to TAG, other than flames and spam, is fodder for discussion and publication; that's what we're all about. Now, as to the script itself -

###########start-howmuch-script
# Tells you how many files, subdirectories and content bytes in a
# directory
# Usage :: how much <directory-path-and-name>
 
# check if there is no command line argument
if [ $# -eq 0 ]
then
echo "You forgot the directory to be accounted for !"
echo "Usage :: howmuch <directoryname with path>"
exit
fi
 
echo "***start-howmuch***"
pwd > ~/howmuch.rep
pwd
echo -n "Disk usage of directory ::" > ~/howmuch.rep
echo $1 >> ~/howmuch.rep
echo -n "made on ::" >> ~/howmuch.rep
du -s $1 > ~/howmuch1
tree $1 > ~/howmuch2
date >> ~/howmuch.rep
tail ~/howmuch1 >> ~/howmuch.rep
tail --lines=1 ~/howmuch2 >> ~/howmuch.rep
cat ~/howmuch.rep
# cleanup
rm ~/howmuch1
rm ~/howmuch2
#Optional -- you can delete howmuch.rep if you want
#rm ~/howmuch.rep
 
echo "***end-howmuch***"
#   
 
########end-howmuch-script

One of the standard practices in shell scripting is to stay away from temporary files unless they're necessary (e.g., if you need to use a program that only takes files as input.) What would happen, for example, if you already had a file called 'howmuch.rep' in that directory? For example, if you had run this script for, say, the 'foo' directory yesterday, forgot about it, and wanted to get the results for the 'bar' directory today? The first file would be gone - and you wouldn't know anything about it until you wanted the data.

This is why the standard practice is to construct every program as a filter - that is, arrange it so that it takes data in, transforms it, and outputs it to STDOUT. What this mostly means with regard to coding is using pipes instead of temporary files. For example (I'm going to make an explicit effort to replicate your script's output here):

#!/bin/bash
# Created by Ben Okopnik on Wed Feb  3 21:57:05 EST 2010
 
[ -d "$1" ] || { printf "Usage: ${0##*/} <directory>\n"; exit; }
 
pwd
echo -e "Disk usage of directory ::$1\nmade on ::`date`"
du -sk "$1"
# You could use: tree "$1" | sed -n '$p'   (or stick with the standard toolkit)
# Directory headers in 'ls -R' output end with ':'; matching on that also works for relative paths
ls -lR "$1"|awk '/:$/{d++};/^-/{f++}END{print d-1" directories, "f" files"}'

This will do essentially the same thing as your script, but without any temp files. The output can be saved to a specified file simply by redirecting it, or it can be further filtered/modified by piping it to another program.
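To make the filter idea concrete, here is a minimal sketch of the counting step written as a function that only writes to STDOUT. It is an illustration rather than part of either script: the name 'dir_summary' is made up, and it swaps in 'find' for the 'ls'/'awk' pipeline, assuming a 'find' that supports '-mindepth' (GNU and BSD versions do). File names containing newlines would be miscounted.

```shell
#!/bin/bash
# Hypothetical filter-style summary: no temp files, output goes to STDOUT only.
dir_summary() {
    local dir="$1"
    local ndirs nfiles
    # -mindepth 1 leaves the top directory itself out of the count
    ndirs=$(( $(find "$dir" -mindepth 1 -type d | wc -l) ))
    nfiles=$(( $(find "$dir" -type f | wc -l) ))
    printf '%d directories, %d files\n' "$ndirs" "$nfiles"
}
```

Because nothing is written behind the user's back, the caller decides where the result goes: 'dir_summary foo > foo.rep' keeps a report, while 'dir_summary foo | mail -s report someone' sends it on to another program.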



Dr. Parthasarathy S [drpartha at gmail.com]


Thu, 4 Feb 2010 10:47:22 +0530

YES I agree. Your script is better than mine. If you authorise me, I will resubmit my stuff using your script (after I add a comment line acknowledging your contribution). Or I can just use the script for my internal usage and not submit for publication.

I thank you for your frank opinion and CONSTRUCTIVE criticism. That is how we all learn.

partha



Ben Okopnik [ben at linuxgazette.net]


Thu, 4 Feb 2010 00:56:09 -0500

On Thu, Feb 04, 2010 at 10:47:22AM +0530, Dr. Parthasarathy S wrote:

> YES I agree. Your script is better than mine.

Oh-oh. Context error warning! :)

Partha, this wasn't any sort of competition; as an example of what happens here, I've just sent in a Two-cent tip about using 'wget' to download files based on directory listings - and I fully expect that someone here will tell me that I could have done all that by using the "--download-all-the-files-in-the-directory-listing" option or something like that. :) The thing is, though, that it wouldn't be a question of "better" or "worse": both methods are useful. E.g., I'd learn about the option from the follow-up post; someone else might learn about Perl regular expressions, which I used in my tip, from my post.

My overall point is that we share this information, these ideas, with the Linux community - and all of us get to learn from all the information in the exchange. So, my answer isn't necessarily "better" than yours; yours may well be more useful for someone else's purposes.

> If you authorise me, I will
> resubmit my stuff using your script (after I add a comment line acknowledging
> your contribution). Or I can just use the script for my internal usage and not
> submit for publication.

There's nothing to resubmit, or authorize: TAG is open to our readers for the exact purpose of hosting this kind of discussion. As I mentioned the last time, all TAG content, minus flames and spam, gets published in the Mailbag section of LG - so readers get to see the whole exchange.

> I thank you for your frank opinion and CONSTRUCTIVE criticism. That is how we
> all learn.

You're welcome! And you're right - this is indeed one of the best ways to learn. TAG was one of the keystones of my own learning process about Linux, especially in the early days - and it still remains a useful tool that keeps up my skills, among other things.



Dr. Parthasarathy S [drpartha at gmail.com]


Thu, 4 Feb 2010 17:53:04 +0530

Although I do not show up too often on TAG, believe me I read every issue of LG and of course Two-cent tips.

I do use these ideas often both for myself and for teaching.

Thanks for the great job you are doing.

partha




Two-cent Tip: GRUB and inode sizes

René Pfeiffer [lynx at luchs.at]


Wed, 3 Feb 2010 01:07:03 +0100

Hello!

I had a decent fight with a stubborn server today. It was a Fedora Core 6 system (let's not talk about how old it is) that was scheduled for a change of disks. This is fairly straightforward - until you have to write the boot block. Unfortunately I prepared the new disks before copying the files. As soon as I wanted to install GRUB 0.97 it told me that it could not read the stage1 file. The problem is that GRUB only deals with 128-byte inodes. The prepared / partition has 256-byte inodes. So make sure to use

mkfs.ext3 -I 128 /dev/sda1

when preparing disks intended to co-exist with GRUB. I know this is old news, but I never encountered this problem before. http://www.linuxplanet.com/linuxplanet/tutorials/6480/2/ has more hints ready.
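For a filesystem that already exists, the inode size can be checked before GRUB gets involved: 'tune2fs -l' prints an "Inode size:" line for ext2/ext3 filesystems. A small sketch follows; the helper name 'inode_size' is made up for illustration, and /dev/sda1 is simply the device from the command above.

```shell
#!/bin/bash
# Hypothetical helper: pull the inode size out of 'tune2fs -l' output on stdin.
inode_size() {
    awk '/^Inode size:/ {print $NF}'
}

# Typical use (needs root and a real ext2/ext3 device):
#   tune2fs -l /dev/sda1 | inode_size
```

If this prints anything other than 128, GRUB 0.97 without the distribution patches is likely to have the problem described above.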

Best, René, who is thinking about moving back to LILO.


Thomas Adam [thomas.adam22 at gmail.com]


Wed, 3 Feb 2010 00:11:02 +0000

On Wed, Feb 03, 2010 at 01:07:03AM +0100, René Pfeiffer wrote:

> Hello!
> 
> I had a decent fight with a stubborn server today. It was a Fedora Core
> 6 system (let's not talk about how old it is) that was scheduled for a
> change of disks. This is fairly straightforward - until you have to
> write the boot block. Unfortunately I prepared the new disks before

+1 for Debian here then, who manages to get this right off the bat.

-- Thomas Adam

-- 
"It was the cruelest game I've ever played and it's played inside my head."
-- "Hush The Warmth", Gorky's Zygotic Mynci.


Mulyadi Santosa [mulyadi.santosa at gmail.com]


Wed, 3 Feb 2010 09:14:49 +0700

On Wed, Feb 3, 2010 at 7:07 AM, René Pfeiffer <lynx@luchs.at> wrote:

> that it could not read the stage1 file. The problem is that GRUB only
> deals with 128-byte inodes. The prepared / partition has 256-byte
> inodes. So make sure to use
>
> mkfs.ext3 -I 128 /dev/sda1
>
> when preparing disks intended to co-exist with GRUB. I know this is old
> news, but I never encountered this problem before.
> http://www.linuxplanet.com/linuxplanet/tutorials/6480/2/ has more hints
> ready.

Never thought about that before. Thanks for the tip!

-- 
regards,
Freelance Linux trainer and consultant

blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com



Two-cent Tip: backgrounding the last stopped job without knowing its job ID

Mulyadi Santosa [mulyadi.santosa at gmail.com]


Mon, 22 Feb 2010 16:14:09 +0700

Most people, to send a job to the background after stopping it, will take note of the job ID and then invoke the "bg" command appropriately, as below:

$ (while (true); do yes  > /dev/null ; done)
^Z
[2]+  Stopped                 ( while ( true ); do
    yes > /dev/null;
done )
 
$ bg %2
[2]+ ( while ( true ); do
    yes > /dev/null;
done ) &

Can we omit the job ID? Yes, we can. Simply replace the above "bg %2" with "bg %%". It refers to the current job, which is the last stopped job. This way, typing mistakes can be avoided, too.



Thomas Adam [thomas at xteddy.org]


Mon, 22 Feb 2010 09:28:43 +0000

On 22 February 2010 09:14, Mulyadi Santosa <mulyadi.santosa@gmail.com> wrote:

> For most people, to send a job to background after stopping a task,
> he/she will take a note the job ID and then invoke "bg" command
> appropriately like below:
>
> $ (while (true); do yes  > /dev/null ; done)
> ^Z
> [2]+  Stopped                 ( while ( true ); do
>    yes > /dev/null;
> done )

This is very shell-specific here in terms of output:

% xterm
^Z
zsh: suspended  xterm

> $ bg %2
> [2]+ ( while ( true ); do
>    yes > /dev/null;
> done ) &
>
> Can we omit the job ID? Yes, we can. Simply replace the above "bg %2"
> with "bg %%". It will refer to the last stopped job ID. This way,
> command typing mistake could be avoided too.

Precisely -- which is where the "jobs" builtin is also useful:

% jobs -p
[1]  - 5317 running    xterm
[2]  + 5452 running    xterm

-- Thomas Adam


Ben Okopnik [ben at linuxgazette.net]


Mon, 22 Feb 2010 09:52:41 -0500

On Mon, Feb 22, 2010 at 04:14:09PM +0700, Mulyadi Santosa wrote:

> For most people, to send a job to background after stopping a task,
> he/she will take a note the job ID and then invoke "bg" command
> appropriately like below:
> 
> $ (while (true); do yes  > /dev/null ; done)
> ^Z
> [2]+  Stopped                 ( while ( true ); do
>     yes > /dev/null;
> done )
> 
> $ bg %2
> [2]+ ( while ( true ); do
>     yes > /dev/null;
> done ) &
> 
> Can we omit the job ID? Yes, we can. Simply replace the above "bg %2"
> with "bg %%". It will refer to the last stopped job ID. This way,
> command typing mistake could be avoided too.

What's wrong with a simple 'bg'? The default, when you don't specify an argument, is the current job.

In my experience, specifying an arg to 'bg' or 'fg' is unnecessary 99% of the time - but that may just be the way that I use the job system, since I rarely have more than one thing backgrounded at any one time. The only place where I find '%%' useful is when it's coupled with 'kill': sometimes, a process that's not amenable to being stopped with 'Ctrl-C' will respond to a 'Ctrl-Z', after which it can be killed off with 'kill %%'.
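The way '%%' resolves can be seen in a short bash script. This is a sketch under a few assumptions: 'set -m' is used to switch on job control, which scripts do not have by default, and two 'sleep' commands stand in for real work. With no stopped jobs, the current job is the one most recently started in the background, so '%%' picks job 2 here. Stopping a job with Ctrl-Z is an interactive action, so 'bg' itself is not shown.

```shell
#!/bin/bash
set -m                 # job control is on by default only in interactive shells

sleep 60 &             # becomes job 1
first=$!
sleep 60 &             # becomes job 2, the current job
second=$!

kill %%                # no job number needed: same as 'kill %2' here
wait "$second"
status=$?              # 128 + 15 (SIGTERM) when the job was killed
echo "job 2 exit status: $status"

kill "$first"          # clean up job 1
wait "$first" || true  # reap it; the non-zero status is expected
```

Job 1 is untouched by the 'kill %%' and has to be cleaned up separately, which is the point: '%%' names exactly one job, the current one.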



Mulyadi Santosa [mulyadi.santosa at gmail.com]


Tue, 23 Feb 2010 00:09:13 +0700

Hi Ben...

On Mon, Feb 22, 2010 at 9:52 PM, Ben Okopnik <ben@linuxgazette.net> wrote:

> What's wrong with a simple 'bg'? The default, when you don't specify an
> argument, is the current job.

Oh, you're right... thanks, I just realized that after trying it myself. Funny, I wish I had known it a long time ago :)
