PAGE DATE 19990427

DESCRIPTION
This is the index seedoc for comments on the error messages that occur in a GNU/Linux system. An error message describes what went wrong only in a very limited way, since all the system knows is what failed, not what you were trying to do. As a result, the message produced when an error occurs can be quite misleading, and the art of interpretation comes into play. I hope this document helps.

The docs often mention the "return value" of a command or call. When a child process finishes it leaves one value with its parent: its return value. This return value is often used as an error code. When a user command returns error=true to the shell there is almost always an associated text error message; when programming, the underlying numerical codes must be dealt with directly. As a general rule, with numerical error codes, 0 represents false. A return value usually answers "was there an error?" rather than "did it succeed?", so a command or function that returns an error code will return 0 for no error, and some non-zero value for an error. This is rather counter-intuitive, but it makes sense if you think of a 0 error code as "error = false", i.e. the command or call succeeded. There are exceptions, however: some calls return zero or a positive value on success, and a negative number on failure.

COMMANDS
The traditional manpages for user commands often have a DIAGNOSTICS section which explains the error messages and return values you may see when things don't seem to be going right. The current GNU grep section 1 manpage has a DIAGNOSTICS section, for example. Some tips on various common user command error messages and return values are in the errors.1 seedoc.

Command failure is sometimes associated with a signal. A signal is an interprocess message from another process or the kernel. See signal. A well-known example is gcc failing with signal 11, SIGSEGV, invalid memory reference. This is usually attributed to bad RAM, but can also occur in some obscure cases of bad code. That is, SIGSEGV from gcc isn't always a hardware problem.

THE KERNEL
At install time the most frustrating message is "VFS: unable to mount root fs on 3:01" or some other number. That happens when the kernel can't find a filesystem of a known type on the device it was rdev'ed or LILO'ed to mount / on. Unix is so much about the filesystem that there's no point in booting if you can't mount a root filesystem. The 3:01 gives the major and minor device numbers, in this case major 3, minor 1: the first partition on the first IDE drive. A boot will also fail if the root filesystem is not read-write, with a message about "unable to open initial console", because the system has to be able to make a symlink for /dev/console, and making a symlink is a write.

Another class of kernel error is the "oops". A kernel oops is a debugging dump of various cryptic kernel memory information. Oopzen, in the affectionate plural form, are not supposed to happen in stable series kernels, and normal users rarely see them. If you get an oops in normal use of a current-version stable-series kernel, the kernel developers quite probably want to hear about it via email to linux-kernel@vger.rutgers.edu, even if the oops didn't kill the kernel.

PROGRAMMING
There are roughly 6 times as many libc calls as there are kernel calls, and each has some error behavior. The "section 3 manpages" for libc run to about 5 MB, and thus are not included in cLIeNUX Core. If you are dealing with these things, by all means get the related manpages. Also get the related GNU texinfo for the GNU version of the libc you use. There you will find all the docs on error codes from libc calls, and on the "errno" variable that the C library (not gcc) sets when a call fails. The docs will give you the macro names for the error codes you will encounter. Unfortunately, when an error occurs while you are debugging your code (in C, anyway), all you get is the actual number. You can write a simple program to print out the macros as numbers, and this will tell you the actual numbers for your platform. Then the numbers can be understood based on the documentation of their associated macros.

RIGHTS
This document is Copyright 1999 Richard Allen Hohensee.
Alan Cox told me about non-hardware SIGSEGVs with gcc.
This document is released for redistribution only as part of cLIeNUX Core.