.TI CSHELL/INTERNALS
Internal Operation of the C Shell


The whole point of the shell is to find programs and start them running.
When you enter a command it searches for a program to do your bidding,
starts it running, stands by idly until it completes, and then prints
a prompt for another command.
It continues this read-execute-prompt cycle indefinitely, stopping
only when you logout or the computer goes down.

Ultimately the shell's purpose is to take a user command and put it
in the form Unix requires for starting execution of new
programs:  execl( PROGFILE, ARG0, ARG1, ARG2, ..., 0 ).
For example, if your command were "nroff -ms myfile", the shell's job
would be to execl( "/usr/bin/nroff", "nroff", "-ms", "myfile", 0 ),
where "/usr/bin/nroff" tells Unix in which file to find the nroff program.
In this case the shell had very little work to do.
If your next command were "!! | lpr ; wc * > ~/wcout",
the shell would have much more work to do and end up with 3 execl calls
bearing little resemblance to your command.
This is important because what the shell winds up sending to execl
as arguments are what the programs involved really see.

A program that is executing, as opposed
to one that is stored in a file, is called a process.
When you login, Unix finds the C shell program in the file "/bin/csh"
and starts it running as a process on your terminal.
The same happens to everyone else when they login,
but each of the resulting processes is independent and has
no knowledge of any other processes except those it might create.
Thus you have your own shell when you login, and can
in fact personalize it to some extent.

In a little greater detail than before,
here is what the C shell does with a command.
To illustrate this suppose you enter the command
.br
	% nroff -ms chap* > outfile
.br
Your shell process ...

[1] reads the command and breaks it into separate
command words:  "nroff", "-ms", "chap*" ">", "outfile";

[2] makes new command words if necessary:  in this case replaces the
command word "chap*" by all filenames beginning with "chap",
for example, "chapintro", "chapter1", "chapter2";

[3] finds a file (assumed to contain the program)
named by the first command word:  "/usr/bin/nroff";

[4] makes a copy of itself -- a child process -- which will later be
transformed into the nroff process.

Here the child and parent processes do different things.

[5] The child sets up input and output, removing command words which
indicate redirection:  in this case opens a file
called "output" to which all future output from this child
process will be written instead of the terminal and removes
the words ">" and "outfile" from your command;

[6] the child transforms itself into the program found in step 3 above
using execl:  execl( "/usr/bin/nroff", "nroff", "-ms", "chapintro",
"chapter1", "chapter2", 0 );

[7] the child dies, either because it is done or there was an error,
at which point the Unix kernel removes all
traces of it and sends a signal of this event to the parent process;

[8] the parent process meanwhile literally waits idly for the child
process to finish, and then issues a prompt for another command.

Each of these steps have interesting and important ramifications.
Some are explained below, others are mentioned below and explained
elsewhere.

[1] Reads the command and breaks it into separate command words.

This step (lexical analysis) is needed to get the command words
(arguments) into the execl format.
It gives the typist some flexibility while imposing some restrictions.
In particular, the shell breaks the line into separate words
at blanks and tabs, treating multiple blanks and tabs as if they
were one blank.
So, for example, if you accidentally type extra blanks at
the beginning or end of the command, or between words, the
shell will probably do what you had in mind.
On the other hand, if you leave out blanks between two adjacent
arguments, it will go ahead and bundle them up as one word.
For example, the shell considers the command
.br
	% nroff-ms                myfile
.br
as having only two words, the name of the command being "nroff-ms",
then tries unsuccessfully to locate the program (step 3)
in a file of that name and responds with
.br
	nroff-ms: Command not found.
.br
The last argument would have been correctly interpreted as "myfile".
To add another twist, the command
.br
	nroff -ms-o1,5 myfile
.br
would be execl'd successfully (step 6) but would provoke
an error message from nroff.

One additional rule says that any one of the
characters &|;<>() is considered a separate word,
except when one of &|<> appear doubled, in which case
the doubled character is one word.
For example, the commands
.br
	% neqn    <paper|nroff -ms>>   outfile&
.br
	% neqn < paper | nroff -ms >> outfile &
.br
are interpreted identically, each consisting of 9 words.

On the other hand, if you want a blank, tab, or one of &|;<>()
to be considered part of another word, you must surround
it with quote marks of the type ", `, or ', or precede it with a \\
(use of \\ is also termed quoting).
If you want a carriage-return (newline) to be part of a word,
you must surround it with quote marks AND precede it with a \\,
since preceding it with a \\ and
not using quote marks is treated as a blank.

Beware.
Strictly speaking, quoting prevents the shell from interpreting
the quoted characters according to its usual practice, and this discussion
only mentions how the usual practice is suspended with respect
to word separation.
There are other much more profound side-effects of quoting depending on both
the quoted and the quoting characters.
The documentation is perhaps more unyielding, incomplete, and confusing
on this issue than on any other.

[2] Makes new command words if necessary.

The C Shell recognizes a large variety of characters and constructs
as having special meanings and substitutes other words in their place.
This means that if your command line contains any of them,
as in "!! | lpr ; wc * > ~/wcout" from before,
the resulting call (or calls) to execl (step 6) may be the result
of sweeping changes made in this step.
Note that the programs being called never see your original command
and never have to know anything about the special characters.
Consequently, the same substitution rules apply to ALL programs called
from the shell (for example, "lpr", "vi", "nroff", etc.).

Substitutions are classified by type and are applied in a definite order.
The shell scans command words for characters or constructs of the first type,
making substitutions if it finds any.
Then it takes the resulting command words and scans them to find and make
substitutions of the second type, if any, and so forth.
Here is a list of substitution types in order with an indication of the kinds
of special characters that will trigger them.

.nf
.ta 8n 16n 24n 32n 40n 48n 56n 64n
Type		Triggered By		Typical Uses
-------------------------------------------------------------------
History		!event, ^old^new	re-use earlier commands
Alias		first command word	re-name commands
Variable	$var, $#var, $var[n]	scripts, personalized shell
Command		`shell command`		use command output as args
Filename	*, ?, [], {}, ~		abbreviate groups of files
Input/Output	<, >, |, <<, >>, $<	re-route input and output
Expressions	( x <>=!~+-*/()&|^ y )	arithmetic and branching
.fi

In the hands of a sober, well-informed user, substitutions are very
useful:  (1) they can save tremendous amounts of typing, (2) they
need only be learned for the shell, since all programs called by
users have to go through the shell, and (3) they make it possible
to write programs consisting of shell commands.

In the wrong hands, however, substitutions can be a tricky.
To help you practice, the shell provides a way for you to see
exactly what it comes up with just before it calls execl.
The command "set echo" will cause it to print your command after
all substitutions have been made, just before calling execl.
To avoid the danger of executing a possibly incorrect command, you
can test whether a construct will end up the way you think
just by entering it as an argument to the "echo" command.
The "echo" command does nothing more than print its arguments
on the terminal and like all commands is subject to substitutions.
So, for example, "echo *" prints the words that would result,
on any command line, from substituting for * (which lists all your files).

[3] Finds a file named by the first command word.

The whole point of the shell is to run programs other than itself,
such as "vi", "cc", "troff", etc.
Occasionally there is a need for a command that the shell can perform
internally, that is, without locating a program file or
creating another process.
So in this step the shell usually tries to locate a file containing the
program named by the first command word, but not before checking
to see if it belongs to the set of commands built-in to itself.

If a command is non-built-in, the shell scans a list of directories
called the searchpath, which may be personalized for each user.
It appends the first command word to the first directory on
the list and checks to see if the resulting file name exists.
If not, it checks the second directory in a similar fashion,
and so forth, until a file is found, and that file name is
used when execl is called in step 6.
In the case that no file is found, the shell reports this
and prompts for another command.

If your searchpath becomes garbled, usually because you were
experimenting with it, the shell may not find some or all of
the usual non-built-in commands.
Besides panicking, there are two things to do.
Fortunately, the command to correct the searchpath is built-in
and can still be used, but only if you recognize that
that is the problem.
Also, if the first command word begins with a /, the shell
considers it to be the name of the program file to execute,
for example, the command "/usr/ucb/vi .cshrc" would work.

If a command is built-in, the shell bypasses steps 4, 6, 7, and 8,
which reduces run time greatly, and performs the command in its own way.
For the sake of efficiency, a built-in command is preferred to a
non-built-in command if they perform the same function, and that is
why some of the built-in commands were created.
Other commands were built-in because they would not have worked
otherwise, due to the way that processes
disappear completely in step 7; in particular, if a command is
needed to change the behavior of your shell from that point on,
a non-built-in command would only be able to change the characteristics
of a child process of your shell, the shell process that will read
your next command when the child dies leaving no trace of the change.

The "echo" command, for example, is built-in to the C shell
because it is used so often.
A quick and ugly way to list the files in your directory,
without using the "ls" command, is to type "echo *".
A very quick way to create a one line file, without "vi",
is "echo This is a one line file. > oneliner".
Some commands that have to be built-in are "cd", "set",
"alias", and "history".
Unfortunately, most built-in commands do not have separate manual
sections, so the command "man set" will yield nothing, while "man csh"
will tell you about "cd" after printing the first 9 pages or so.
Ironically, "man echo" will display a manual page because users
of the Bourne shell do not have a built-in "echo" command.

[4] Makes a copy of itself -- a child process.

The Unix kernel requires the C shell -- in fact, requires
all programs that run other programs -- to use execl.
Unfortunately, that causes the process running the new program
to die when it is done.
Your shell therefore has to create a new process to do the execl
in order that the old process survive to prompt you for the
next command.
The only way to create a new process on Unix, though, is for
an existing process to make a copy of itself by executing
a program statement called fork.
The new and old processes are identical except that one knows
it is a parent and the other knows it is a child, and
the internal code of the program for both processes can
take different branches on the basis of this information.
This step is time-consuming, and the documentation sometimes
mentions useful ways to avoid having to fork new processes,
for instance, by using built-in commands.

[5] The child sets up input and output.

In this step, the command words are scanned for special input or
output redirection constructs.
When these constructs have been interpreted, they are removed
from the list of command words.
Any output file specified is created if it does not already exist.
If the file or directory does not have the correct permissions,
or an input file does not exist, the shell, not the program named
by the first command word, issues an error message and prompts
for another command.
The program to be run has no knowledge that its inputs and outputs
have been changed.

In the presence of a pipe between commands, the shell removes the
pipe constructs from the command line after first breaking it up
into separate subcommands.  Each of these subcommands is processed
like any other command, with a separate fork and execl for each.
The main difference is that the parent sets up input and output
between processes and has them all started up before beginning
to wait on any of them.

[6] The child transforms itself into the program found in step 3.

This is where the child does the execl, but not precisely.
For simplicity I did not mention that the actual call is
of the form:  execve( PROGFILE, ARG0, ARG1, ARG2, ..., 0 , ENV0, ENV1,
ENV2, ..., 0 ).
The new arguments (after the first 0) contain definitions of
all the current process's environment variables.
These may contain any information the user may choose
to store in them using the built-in command "setenv" and
have the property that besides input/output redirection, the
current directory, and a handful of other data,
they are some of the very few things
that can be inherited by the new program after execl.

[7] The child dies.

Processes can finish normally or abnormally,
but all of them die eventually.
For example, when you leave "vi" by typing ZZ, or when
"nroff" stops because of a macro/diversion overflow,
then the associated processes die.
Your shell itself is a process which dies when you logout.

When the child process running the new program dies,
the Unix kernel sends a signal to the parent process (your shell)
notifying it of the event.

[8] The parent waits for the child to die, then prompts the user.

In the meantime, the parent process has executed a program statement
called wait which just puts it on hold until Unix
sends a signal notifying the shell that the child has died.
If you had entered an & at the end of the original command,
your shell would not wait for notification of the child's death
but would print the child's process number and then
prompt you for the next command.
That procedure is called backgrounding a process.

While the C shell is waiting for the child (only on 4.1 or 4.2 BSD Unix)
you can type ^Z to wakeup the parent and
freeze the child for the time being.
At that point you could enter other commands to shell and
at a later time you could issue commands to resume execution,
kill it altogether, or resume execution in the background.
This useful feature is called job control.


jak