.TI CSHELL/INTERNALS Internal Operation of the C Shell

The whole point of the shell is to find programs and start them running. When you enter a command it searches for a program to do your bidding, starts it running, stands by idly until it completes, and then prints a prompt for another command. It continues this read-execute-prompt cycle indefinitely, stopping only when you log out or the computer goes down.

Ultimately the shell's purpose is to take a user command and put it in the form Unix requires for starting execution of new programs: execl( PROGFILE, ARG0, ARG1, ARG2, ..., 0 ). For example, if your command were "nroff -ms myfile", the shell's job would be to execl( "/usr/bin/nroff", "nroff", "-ms", "myfile", 0 ), where "/usr/bin/nroff" tells Unix in which file to find the nroff program. In this case the shell had very little work to do. If your next command were "!! | lpr ; wc * > ~/wcout", the shell would have much more work to do and would end up with three execl calls bearing little resemblance to your command. This is important because what the shell winds up sending to execl as arguments is what the programs involved really see.

A program that is executing, as opposed to one that is stored in a file, is called a process. When you log in, Unix finds the C shell program in the file "/bin/csh" and starts it running as a process on your terminal. The same happens to everyone else when they log in, but each of the resulting processes is independent and has no knowledge of any other processes except those it might create. Thus you have your own shell when you log in, and can in fact personalize it to some extent.

In a little greater detail than before, here is what the C shell does with a command. To illustrate this suppose you enter the command
.br
% nroff -ms chap* > outfile
.br
Your shell process ...
.br
[1] reads the command and breaks it into separate command words: "nroff", "-ms", "chap*", ">", "outfile";
.br
[2] makes new command words if necessary: in this case replaces the command word "chap*" by all filenames beginning with "chap", for example, "chapintro", "chapter1", "chapter2";
.br
[3] finds a file (assumed to contain the program) named by the first command word: "/usr/bin/nroff";
.br
[4] makes a copy of itself -- a child process -- which will later be transformed into the nroff process. Here the child and parent processes do different things.
.br
[5] The child sets up input and output, removing command words which indicate redirection: in this case opens a file called "outfile" to which all future output from this child process will be written instead of the terminal and removes the words ">" and "outfile" from your command;
.br
[6] the child transforms itself into the program found in step 3 above using execl: execl( "/usr/bin/nroff", "nroff", "-ms", "chapintro", "chapter1", "chapter2", 0 );
.br
[7] the child dies, either because it is done or there was an error, at which point the Unix kernel removes all traces of it and sends a signal of this event to the parent process;
.br
[8] the parent process meanwhile literally waits idly for the child process to finish, and then issues a prompt for another command.

Each of these steps has interesting and important ramifications. Some are explained below; others are mentioned below and explained elsewhere.

[1] Reads the command and breaks it into separate command words.
.br
This step (lexical analysis) is needed to get the command words (arguments) into the execl format. It gives the typist some flexibility while imposing some restrictions. In particular, the shell breaks the line into separate words at blanks and tabs, treating multiple blanks and tabs as if they were one blank. So, for example, if you accidentally type extra blanks at the beginning or end of the command, or between words, the shell will probably do what you had in mind.
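
The blank-and-tab rule just described can be sketched in C. This is only an illustration, not code from the C shell: the function name split_words and its arguments are hypothetical, and the sketch ignores quoting and every other rule of lexical analysis.

```c
#include <string.h>

/* Hypothetical helper, not taken from the csh source.
 * Splits LINE in place at blanks and tabs and stores up to MAXWORDS
 * word pointers in WORDS. A run of blanks and tabs counts as a single
 * separator, and leading or trailing blanks are ignored.
 * Returns the number of words found. */
int split_words(char *line, char *words[], int maxwords)
{
    int n = 0;
    char *p = line;

    while (*p != '\0' && n < maxwords) {
        p += strspn(p, " \t");      /* skip a run of separators */
        if (*p == '\0')
            break;                  /* only trailing blanks were left */
        words[n++] = p;             /* a word starts here */
        p += strcspn(p, " \t");     /* advance to the end of the word */
        if (*p != '\0')
            *p++ = '\0';            /* terminate the word, move past it */
    }
    return n;
}
```

Given the line "  nroff   -ms  myfile ", this sketch yields the three words "nroff", "-ms", and "myfile", which is the shape execl needs for its arguments.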
On the other hand, if you leave out blanks between two adjacent arguments, it will go ahead and bundle them up as one word. For example, the shell considers the command
.br
% nroff-ms myfile
.br
as having only two words, the name of the command being "nroff-ms", then tries unsuccessfully to locate the program (step 3) in a file of that name and responds with
.br
nroff-ms: Command not found.
.br
The last argument would have been correctly interpreted as "myfile". To add another twist, the command
.br
nroff -ms-o1,5 myfile
.br
would be execl'd successfully (step 6) but would provoke an error message from nroff.

One additional rule says that each of the characters &|;<>() is considered a separate word, except when one of &|<> appears doubled, in which case the doubled character is one word. For example, the commands
.br
% neqn<paper|nroff -ms>>outfile&
.br
% neqn < paper | nroff -ms >> outfile &
.br
are interpreted identically, each consisting of 9 words.

On the other hand, if you want a blank, tab, or one of &|;<>() to be considered part of another word, you must surround it with quote marks of the type ", `, or ', or precede it with a \\ (use of \\ is also termed quoting). If you want a carriage-return (newline) to be part of a word, you must surround it with quote marks AND precede it with a \\, since a newline preceded by a \\ but not surrounded by quote marks is treated as a blank.

Beware. Strictly speaking, quoting prevents the shell from interpreting the quoted characters according to its usual practice, and this discussion only mentions how the usual practice is suspended with respect to word separation. There are other, much more profound side-effects of quoting that depend on both the quoted and the quoting characters. The documentation is perhaps more unyielding, incomplete, and confusing on this issue than on any other.

[2] Makes new command words if necessary.
.br
The C shell recognizes a large variety of characters and constructs as having special meanings and substitutes other words in their place. This means that if your command line contains any of them, as in "!! | lpr ; wc * > ~/wcout" from before, the resulting call (or calls) to execl (step 6) may be the result of sweeping changes made in this step. Note that the programs being called never see your original command and never have to know anything about the special characters. Consequently, the same substitution rules apply to ALL programs called from the shell (for example, "lpr", "vi", "nroff", etc.).

Substitutions are classified by type and are applied in a definite order. The shell scans command words for characters or constructs of the first type, making substitutions if it finds any. Then it takes the resulting command words and scans them to find and make substitutions of the second type, if any, and so forth. Here is a list of substitution types in order with an indication of the kinds of special characters that will trigger them.
.nf
.ta 8n 16n 24n 32n 40n 48n 56n 64n
Type            Triggered By            Typical Uses
-------------------------------------------------------------------
History         !event, ^old^new        re-use earlier commands
Alias           first command word      re-name commands
Variable        $var, $#var, $var[n]    scripts, personalized shell
Command         `shell command`         use command output as args
Filename        *, ?, [], {}, ~         abbreviate groups of files
Input/Output    <, >, |, <<, >>, $<     re-route input and output
Expressions     ( x <>=!~+-*/()&|^ y )  arithmetic and branching
.fi

In the hands of a sober, well-informed user, substitutions are very useful: (1) they can save tremendous amounts of typing, (2) they need only be learned for the shell, since all programs called by users have to go through the shell, and (3) they make it possible to write programs consisting of shell commands. In the wrong hands, however, substitutions can be tricky.
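
To make filename substitution concrete, here is a minimal sketch in C that expands one command word the way "chap*" was expanded in step 2. It is an assumption-laden stand-in, not the shell's own code: the helper name expand_word is hypothetical, and it relies on the POSIX library routine glob(), which in its standard form covers only the *, ?, and [] constructs, not {} or ~.

```c
#include <glob.h>
#include <string.h>

/* Hypothetical helper, not taken from the csh source.
 * Expands the pattern in WORD into the matching file names, using the
 * POSIX glob() routine as a stand-in for the shell's own filename
 * substitution. On return the names are in g->gl_pathv, in sorted
 * order, and the caller must call globfree(g) when done with them.
 * Returns the number of names, 0 if nothing matched, -1 on error. */
int expand_word(const char *word, glob_t *g)
{
    int rc;

    memset(g, 0, sizeof *g);        /* keeps globfree() safe on any path */
    rc = glob(word, 0, NULL, g);
    if (rc == GLOB_NOMATCH)
        return 0;                   /* no file name fits the pattern */
    if (rc != 0)
        return -1;                  /* read error or out of memory */
    return (int)g->gl_pathc;
}
```

In a directory holding "chapintro", "chapter1", "chapter2", and "outfile", calling expand_word("chap*", &g) returns 3 and leaves the first three names in g.gl_pathv. Those names, not "chap*", are what the program eventually receives as arguments.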
To help you practice, the shell provides a way for you to see exactly what it comes up with just before it calls execl. The command "set echo" will cause it to print your command after all substitutions have been made, just before calling execl. To avoid the danger of executing a possibly incorrect command, you can test whether a construct will end up the way you think just by entering it as an argument to the "echo" command. The "echo" command does nothing more than print its arguments on the terminal and, like all commands, is subject to substitutions. So, for example, "echo *" prints the words that would result, on any command line, from substituting for * (namely, the names of the files in your current directory).

[3] Finds a file named by the first command word.
.br
The whole point of the shell is to run programs other than itself, such as "vi", "cc", "troff", etc. Occasionally there is a need for a command that the shell can perform internally, that is, without locating a program file or creating another process. So in this step the shell usually tries to locate a file containing the program named by the first command word, but not before checking to see whether that word names one of the commands built into the shell itself.

If a command is non-built-in, the shell scans a list of directories called the searchpath, which may be personalized for each user. It appends the first command word to the first directory on the list and checks to see if the resulting file name exists. If not, it checks the second directory in a similar fashion, and so forth, until a file is found, and that file name is used when execl is called in step 6. If no file is found, the shell reports this and prompts for another command.

If your searchpath becomes garbled, usually because you were experimenting with it, the shell may not find some or all of the usual non-built-in commands. Besides panicking, there are two things to do.
Fortunately, the command to correct the searchpath is built-in and can still be used, but only if you recognize that the searchpath is the problem. Also, if the first command word begins with a /, the shell considers it to be the name of the program file to execute; for example, the command "/usr/ucb/vi .cshrc" would work.

If a command is built-in, the shell bypasses steps 4, 6, 7, and 8, which reduces run time greatly, and performs the command in its own way. For the sake of efficiency, a built-in command is preferred to a non-built-in command if they perform the same function, and that is why some of the built-in commands were created. Other commands were built-in because they would not have worked otherwise, due to the way that processes disappear completely in step 7. In particular, if a command is needed to change the behavior of your shell from that point on, a non-built-in command would only be able to change the characteristics of a child process of your shell, not those of the shell process that will read your next command; when the child dies, it leaves no trace of the change.

The "echo" command, for example, is built-in to the C shell because it is used so often. A quick and ugly way to list the files in your directory, without using the "ls" command, is to type "echo *". A very quick way to create a one line file, without "vi", is "echo This is a one line file. > oneliner". Some commands that have to be built-in are "cd", "set", "alias", and "history".

Unfortunately, most built-in commands do not have separate manual sections, so the command "man set" will yield nothing, while "man csh" will tell you about "cd" after printing the first 9 pages or so. Ironically, "man echo" will display a manual page because users of the Bourne shell do not have a built-in "echo" command.

[4] Makes a copy of itself -- a child process.
.br
The Unix kernel requires the C shell -- in fact, requires all programs that run other programs -- to use execl.
Unfortunately, execl replaces the program in the process that calls it, so that process dies when the new program is done. Your shell therefore has to create a new process to do the execl so that the old process survives to prompt you for the next command. The only way to create a new process on Unix, though, is for an existing process to make a copy of itself by executing a system call named fork. The new and old processes are identical except that one knows it is a parent and the other knows it is a child, and the internal code of the program for both processes can take different branches on the basis of this information. This step is time-consuming, and the documentation sometimes mentions useful ways to avoid having to fork new processes, for instance, by using built-in commands.

[5] The child sets up input and output.
.br
In this step, the command words are scanned for special input or output redirection constructs. When these constructs have been interpreted, they are removed from the list of command words. Any output file specified is created if it does not already exist. If the file or directory does not have the correct permissions, or an input file does not exist, the shell, not the program named by the first command word, issues an error message and prompts for another command. The program to be run has no knowledge that its inputs and outputs have been changed.

In the presence of a pipe between commands, the shell removes the pipe constructs from the command line after first breaking it up into separate subcommands. Each of these subcommands is processed like any other command, with a separate fork and execl for each. The main difference is that the parent sets up input and output between processes and has them all started up before beginning to wait on any of them.

[6] The child transforms itself into the program found in step 3.
.br
This is where the child does the execl, but not precisely.
For simplicity I did not mention that the actual call is of the form: execve( PROGFILE, ARGV, ENVP ), where ARGV is the list ARG0, ARG1, ARG2, ..., 0 and ENVP is the list ENV0, ENV1, ENV2, ..., 0. The new list, ENVP, contains definitions of all the current process's environment variables. They may contain any information the user chooses to store in them with the built-in command "setenv". Along with input/output redirection, the current directory, and a handful of other data, they are among the very few things that the new program inherits after execl.

[7] The child dies.
.br
Processes can finish normally or abnormally, but all of them die eventually. For example, when you leave "vi" by typing ZZ, or when "nroff" stops because of a macro/diversion overflow, the associated processes die. Your shell itself is a process which dies when you log out. When the child process running the new program dies, the Unix kernel sends a signal to the parent process (your shell) notifying it of the event.

[8] The parent waits for the child to die, then prompts the user.
.br
In the meantime, the parent process has executed a system call named wait, which just puts it on hold until Unix sends a signal notifying the shell that the child has died. If you had entered an & at the end of the original command, your shell would not wait for notification of the child's death but would print the child's process number and then prompt you for the next command. That procedure is called backgrounding a process.

While the C shell is waiting for the child (only on 4.1 or 4.2 BSD Unix) you can type ^Z to wake up the parent and freeze the child for the time being. At that point you could enter other commands to the shell, and at a later time you could issue commands to resume execution of the child, kill it altogether, or resume its execution in the background. This useful feature is called job control.

jak
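
Steps 4 through 8 can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not the C shell's actual code: the helper name run_and_wait and its arguments are hypothetical, it handles only output redirection of the ">" kind, and it uses execv (the array form of execl) because the number of arguments is not known in advance.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the csh source.
 * Runs the program stored in PROGFILE with the argument list ARGV
 * (ending in a null pointer). If OUTFILE is not NULL, the child's
 * standard output is sent to that file, as with "> outfile".
 * Returns the child's exit status, or -1 if the child could not be
 * created or did not exit normally. */
int run_and_wait(const char *progfile, char *const argv[],
                 const char *outfile)
{
    int status;
    pid_t pid = fork();             /* step 4: make a copy of ourselves */

    if (pid < 0)
        return -1;                  /* no child was created */

    if (pid == 0) {                 /* this branch runs in the child */
        if (outfile != NULL) {      /* step 5: set up output */
            int fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0666);
            if (fd < 0)
                _exit(126);         /* could not open the output file */
            dup2(fd, STDOUT_FILENO);
            close(fd);
        }
        execv(progfile, argv);      /* step 6: become the new program */
        _exit(127);                 /* reached only if execv failed */
    }

    /* This branch runs in the parent. Step 8: wait for the child to
     * die (step 7), then report how it went. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller that has already done steps 1 through 3 would pass the file found in the searchpath, for instance "/usr/bin/nroff", together with the final command words. Skipping the waitpid call and returning at once is, in outline, what backgrounding with & amounts to.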