Annotation of 43BSD/contrib/icon/docs/tr84-11a.roff, revision 1.1

1.1     ! root        1: .TR 84-11
        !             2: .DA "August 5, 1984"
        !             3: .Gr
        !             4: .TL
        !             5: A Tour Through the C Implementation of Icon; Version 5.9
        !             6: .AU
        !             7: Ralph E. Griswold
        !             8: .AU
        !             9: Robert K. McConeghy
        !            10: .AU
        !            11: William H. Mitchell
        !            12: .AB
        !            13: This report documents the C implementation
        !            14: of Version 5.9 of the Icon programming language.
        !            15: This report concentrates on the
        !            16: major parts of the system \(em
        !            17: the translator, the linker, and the interpreter.
        !            18: An additional section discusses how the implementation
        !            19: can be modified for new language features.
        !            20: .AE
        !            21: .SH
        !            22: Introduction
        !            23: .PP
        !            24: This report describes the C implementation of Version 5.9 of
        !            25: the Icon programming language [1].
        !            26: Most of the system is coded in C [2] and is designed to be run under \*U.
        !            27: In addition to the C portion of the system, there is some assembly
        !            28: language code.
        !            29: To date, the C implementation has been adapted to the
        !            30: PDP-11\u\(dg\d, VAX-11, and Onyx C8002.
        !            31: .Un
        !            32: .FS
        !            33: \u\(dg\dPDP and VAX are trademarks of Digital Equipment Corporation.
        !            34: .FE
        !            35: This implementation is intended to be portable
        !            36: to other computers running under UNIX,
        !            37: but portability was not a primary design goal.
        !            38: Reference 3 describes the process of transporting this implementation
        !            39: and contains detailed descriptions of the assembly language routines for
        !            40: the VAX implementation.
        !            41: .PP
        !            42: The implementation of the Icon system consists of three parts:
        !            43: a translator, a linker,
        !            44: and an interpreter. The interpreter contains a run-time system that
        !            45: includes routines for the operations that are needed to execute an Icon program.
        !            46: The translator converts an Icon source program into
        !            47: an intermediate code, called
        !            48: .I ucode .
        !            49: The linker combines separately translated ucode files,
        !            50: binds inter-procedure references,
        !            51: and produces interpretable binary output, called \fIicode\fR.
        !            52: .PP
        !            53: The reference language for this report is Version 5.9 of Icon [4].
        !            54: This report is intended to be used in conjunction with
        !            55: the source listings for Version 5.9,
        !            56: although a general overview of the system
        !            57: can be obtained from this document alone.
        !            58: .NH
        !            59: The Translator
        !            60: .PP
        !            61: The Icon translator is written entirely in C
        !            62: and consists of
        !            63: 12 files of source code and 10 header files.
        !            64: The translator builds a parse tree for each Icon procedure,
        !            65: then traverses the tree to generate code.
        !            66: Three of the 12 source files contain only data initialization
        !            67: and are automatically generated from specification files.
        !            68: In addition, the LALR(1) parser is automatically generated
        !            69: by the \fIYacc\fR parser generator [5].
        !            70: .PP
        !            71: The ucode output from the translator consists of two files.
        !            72: One file, with the suffix \fM.u1\fR, contains
        !            73: intermediate code corresponding to the procedures in the program.
        !            74: The second file, with the suffix \fM.u2\fR,
        !            75: contains global symbol table information.
        !            76: Both files are printable.
        !            77: .PP
        !            78: The following sections discuss the four parts of the translator:
        !            79: the lexical analyzer,
        !            80: the parser,
        !            81: the code generator,
        !            82: and the symbol table manager.
        !            83: .NH 2
        !            84: The Lexical Analyzer
        !            85: .PP
        !            86: The lexical analyzer reads the Icon source program,
        !            87: breaks it into tokens,
        !            88: and delivers the tokens to the parser as requested.
        !            89: A token is the basic syntactic unit of the Icon language;
        !            90: it may be an identifier, a literal, a reserved word,
        !            91: or an operator (operators include punctuation).
        !            92: .PP
        !            93: The lexical analyzer consists of four source files:
        !            94: \fMlex.c\fR,
        !            95: \fMchar.c\fR,
        !            96: \fMoptab.c\fR,
        !            97: and
        !            98: \fMtoktab.c\fR.
        !            99: The latter two of these files contain operator and token tables, respectively,
        !           100: and are automatically generated
        !           101: from operator and token specification files,
        !           102: described below.
        !           103: The file
        !           104: \fMchar.c\fR
        !           105: contains character mapping tables and the file
        !           106: \fMlex.c\fR
        !           107: contains the lexical analyzer itself.
        !           108: .PP
        !           109: The parser requests a token by calling
        !           110: \fMyylex\fR,
        !           111: which finds the next token in the source program
        !           112: and determines its token type and value.
        !           113: The parser bases its moves on the token type:
        !           114: if the token is an operator or reserved word,
        !           115: the token type specifically identifies the operator or reserved word;
        !           116: otherwise, the token type indicates one of the six \*(oqprimitive\*(cq types:
        !           117: identifier, integer literal, real literal, string literal, cset literal, or end-of-file.
        !           118: The token value is a leaf node of the parse tree,
        !           119: which, for the primitive types,
        !           120: contains the source program representation of the token.
        !           121: The token value node also contains
        !           122: the source-program line and column numbers where the token starts.
        !           123: A pointer to this node is placed in the global variable
        !           124: \fMyychar\fR,
        !           125: and
        !           126: \fMyylex\fR
        !           127: returns the token type.
        !           128: .PP
        !           129: The lexical analyzer finds the next token
        !           130: by skipping white space, including comments.
        !           131: The first character of the new token indicates which of the classes it belongs to.
        !           132: A letter or underscore begins an identifier or reserved word,
        !           133: a digit begins an integer or real literal,
        !           134: a double quote begins a string literal,
        !           135: a single quote begins a cset literal,
        !           136: and any other character is assumed to begin an operator.
        !           137: An identifier or reserved word is completed
        !           138: by gathering all subsequent letters, digits, and underscores.
        !           139: The reserved word table is consulted to determine if the token
        !           140: is an identifier or a reserved word.
        !           141: Numeric literals are recognized by a finite-state automaton,
        !           142: which distinguishes real from integer literals by the
        !           143: presence of a decimal point or the letter \*(oqe\*(cq.
        !           144: A quoted literal is completed by reading
        !           145: until the opening delimiter is repeated,
        !           146: converting escapes in the process and continuing to new lines as necessary.
        !           147: A table-driven finite-state automaton, described below,
        !           148: recognizes operators.
        !           149: .PP
        !           150: An important task of the lexical analyzer is semicolon insertion.
        !           151: The grammar requires that semicolons separate expressions
        !           152: in a compound expression or procedure body,
        !           153: so they must be inserted into the token stream
        !           154: where they are omitted in the source program.
        !           155: This process is table driven.
        !           156: Associated with each token type are two flags,
        !           157: .I BEGINNER
        !           158: and
        !           159: .I ENDER .
        !           160: The
        !           161: .I BEGINNER
        !           162: flag is true if a token may legally begin an expression
        !           163: (i.e., if it may follow a semicolon).
        !           164: Similarly, the
        !           165: .I ENDER
        !           166: flag is true if a token may legally end an expression
        !           167: (i.e., if it may precede a semicolon).
        !           168: When a newline appears between two tokens, the
        !           169: .I ENDER
        !           170: flag of the first is true, and the
        !           171: .I BEGINNER
        !           172: flag of the second is true,
        !           173: then a semicolon is inserted between the two tokens.
        !           174: .PP
        !           175: The token table is initialized in the file
        !           176: \fMtoktab.c\fR.
        !           177: The table is divided into three sections:
        !           178: primitive types, reserved words, and operators.
        !           179: The primitive types are fixed in the first six
        !           180: slots in the table,
        !           181: and must not be changed,
        !           182: since they are referenced directly from the code.
        !           183: The reserved words follow
        !           184: and must be in alphabetical order.
        !           185: The operators follow in no special order.
        !           186: The last entry merely marks the end of the table.
        !           187: .PP
        !           188: Also in
        !           189: \fMtoktab.c\fR
        !           190: is an index to reserved words.
        !           191: To speed up the search for reserved words,
        !           192: this table hashes the search
        !           193: using the first letter as the hash value.
        !           194: The reserved words that begin with that letter then are examined linearly.
        !           195: .PP
        !           196: The operator table, in
        !           197: \fMoptab.c\fR,
        !           198: describes a finite-state automaton that
        !           199: recognizes each operator in the language.
        !           200: Each state is represented by an array of structures.
        !           201: Each structure in the array
        !           202: corresponds to a transition on the input symbol.
        !           203: The structure contains three fields:
        !           204: an input symbol,
        !           205: an action,
        !           206: and a value used by the action.
        !           207: The recognizer starts in state 0;
        !           208: the current input symbol is the first character
        !           209: of the operator.
        !           210: In a given state with a given input symbol,
        !           211: the recognizer searches the array associated with the current state
        !           212: for an entry that matches the current input symbol.
        !           213: Failing a match, the last entry of the array, with
        !           214: the input symbol field of 0,
        !           215: is used.
        !           216: The recognizer then performs one of the following actions,
        !           217: depending on the value of the action field:
        !           218: .in .5i
        !           219: .IP \(bu \w'\(bu'u+1m
        !           220: goes to the new state indicated by the value field
        !           221: and gets the next input character
        !           222: .IP \(bu
        !           223: issues an error
        !           224: .IP \(bu
        !           225: returns the value field as
        !           226: a pointer to the token table entry for the operator
        !           227: .IP \(bu
        !           228: returns the value field,
        !           229: but pushes the current input character back onto the input.
        !           230: .in 0
        !           231: .LP
        !           232: The difference between the last two actions is that
        !           233: some operators are recognized immediately (e.g., \*(oq\fM;\fR\*(cq),
        !           234: while others are not recognized until the character
        !           235: following the operator is read (e.g., \*(oq\fM=\fR\*(cq).
        !           236: .PP
        !           237: The token table
        !           238: and operator table
        !           239: are automatically constructed by the Icon program
        !           240: \fMmktoktab.icn\fR.
        !           241: This program reads the specification file
        !           242: \fMtokens\fR
        !           243: and builds the file
        !           244: \fMtoktab.c\fR.
        !           245: The file
        !           246: \fMtokens\fR
        !           247: contains a list of all the tokens,
        !           248: their token types (given as defined constants),
        !           249: and any associated flags.
        !           250: This list is divided into the three sections detailed above.
        !           251: The program then reads the specification file
        !           252: \fMoptab\fR
        !           253: and builds the file
        !           254: \fMoptab.c\fR.
        !           255: The former is a skeleton for the operator table;
        !           256: it contains the state tables,
        !           257: but the program fills in the pointers to the token table entries.
        !           258: .NH 2
        !           259: The Parser
        !           260: .PP
        !           261: The parser, in the file
        !           262: \fMparse.c\fR,
        !           263: is automatically generated by \fIYacc\fR.
        !           264: The grammar and semantic actions are contained in the file
        !           265: \fMicon.g\fR.
        !           266: From these specifications,
        !           267: \fIYacc\fR generates parser tables for an LALR(1) parser.
        !           268: .PP
        !           269: In addition to the grammar,
        !           270: \fMicon.g\fR
        !           271: contains
        !           272: a list of all the token types in the language
        !           273: and declarations necessary to the actions.
        !           274: \fIYacc\fR assigns an integer value to each token type,
        !           275: and generates define statements,
        !           276: which are written to the file
        !           277: \fMtoken.h\fR.
        !           278: These defined constants are the token types
        !           279: returned by the lexical analyzer.
        !           280: .PP
        !           281: The grammar is context-free,
        !           282: with actions associated with most of the rules.
        !           283: An action is invoked when the corresponding rule is reduced.
        !           284: The actions perform two duties:
        !           285: maintaining the symbol tables
        !           286: and constructing the parse tree.
        !           287: The parse tree is built from the bottom up \(em
        !           288: the leaves are supplied by the lexical analyzer
        !           289: and the actions build trees from the leaves and from smaller trees
        !           290: with each reduction.
        !           291: .PP
        !           292: The parser requests tokens from the lexical analyzer,
        !           293: building a parse tree until it reduces a procedure.
        !           294: At this point, it passes the root of the parse tree
        !           295: to the code generator.
        !           296: Once the intermediate code has been generated,
        !           297: the parse tree is discarded,
        !           298: and a new tree is begun for the next procedure.
        !           299: .PP
        !           300: Record and global declarations
        !           301: affect only the symbol table
        !           302: and do not generate parse trees.
        !           303: Files named in link directives produce link instructions in the ucode
        !           304: output.
        !           305: .PP
        !           306: A complete parse tree is rooted at a
        !           307: \f3proc\fR
        !           308: node, which identifies the procedure
        !           309: and points to the subtrees for the
        !           310: \fMinitial\fR
        !           311: clause (if any)
        !           312: and the body of the procedure.
        !           313: Each node in the parse tree represents a source program construction
        !           314: or some implicit semantic action.
        !           315: A node can contain up to six fields,
        !           316: the first of which is the node type.
        !           317: The second and third fields are always line and column numbers
        !           318: that are used for error messages and tracing.
        !           319: Any additional fields contain information about the construction,
        !           320: and possibly pointers to subtrees.
        !           321: Appendix A contains a description of all the node types.
        !           322: .PP
        !           323: The grammar, shown in Appendix B, has several ambiguities.
        !           324: The well-known \*(oqdangling else\*(cq problem exists
        !           325: not only in the
        !           326: \fMif-then-else\fR
        !           327: expression, but also in the
        !           328: \fMwhile-do\fR,
        !           329: \fMuntil-do\fR,
        !           330: \fMevery-do\fR,
        !           331: and \fMto-by\fR
        !           332: expressions.
        !           333: In each of these expressions, the last clause is optional,
        !           334: so that when the parser sees an
        !           335: \fMelse\fR,
        !           336: for example,
        !           337: it does not know whether to shift the token
        !           338: (associating it with the most recent
        !           339: \fMif\fR),
        !           340: or to reduce the preceding
        !           341: \fMif-then\fR
        !           342: expression (leaving the else \*(oqdangling\*(cq).
        !           343: The latter choice is obviously incorrect, since the
        !           344: \fMelse\fR
        !           345: would never be shifted, and \fIYacc\fR correctly resolves such conflicts
        !           346: in favor of the shift.
        !           347: Thus, each
        !           348: \fMelse\fR
        !           349: is paired with the most recent unpaired
        !           350: \fMif\fR.
        !           351: All the control structures (except
        !           352: \fMcase\fR)
        !           353: have an additional ambiguity:
        !           354: they do not have a closed syntax,
        !           355: yet they may appear in an expression at the highest precedence level.
        !           356: For example, the expression
        !           357: .DS
        !           358: .tr -\-+\(pl*\(**
        !           359: \fMx := y + if a = b then z else -z * 3
        !           360: .tr --++**
        !           361: .ft R
        !           362: .DE
        !           363: could parse in either of two ways:
        !           364: .DS
        !           365: .tr -\-+\(pl*\(**
        !           366: \fMx := y + (if a = b then z else (-z * 3))
        !           367: x := y + (if a = b then z else -z) * 3
        !           368: .tr --++**
        !           369: .ft R
        !           370: .DE
        !           371: This problem, too, is resolved in favor of the shift,
        !           372: such that the first parse is always used.
        !           373: Thus, in the absence of parentheses,
        !           374: the entire expression to the right of a control structure
        !           375: is part of the control structure.
        !           376: .PP
        !           377: Little attention has been paid to error recovery.
        !           378: A few error productions have been placed in the grammar
        !           379: to enable \fIYacc\fR to recover from syntax errors;
        !           380: the technique for doing so is described by Aho and Johnson [6].
        !           381: The parser is slightly modified by the editor script
        !           382: \fMpscript\fR
        !           383: so that the parser state is passed to the routine
        !           384: \fMyyerror\fR.
        !           385: This routine prints an error message from the file
        !           386: \fMsynerr.h\fR
        !           387: that is associated with the current parser state.
        !           388: This error table currently is constructed by
        !           389: hand from the \fMy.output\fR file obtained by
        !           390: running \fIYacc\fR with the
        !           391: \f3\-v\fR
        !           392: option.
        !           393: .NH 2
        !           394: The Code Generator
        !           395: .PP
        !           396: The parser calls the code generator upon
        !           397: recognition of each Icon procedure,
        !           398: giving it the root of the parse tree.
        !           399: The code generator traverses the parse tree recursively,
        !           400: emitting ucode.
        !           401: Appendix C contains a description of ucode.
        !           402: .PP
        !           403: The file
        !           404: \fMcode.c\fR
        !           405: contains both the tree node allocation and the code generation routines.
        !           406: The included header file
        !           407: \fMcode.h\fR
        !           408: contains macros and definitions needed by the code generator,
        !           409: while
        !           410: \fMtree.h\fR
        !           411: defines the tree nodes and the macros that allocate them.
        !           412: The macros in
        !           413: \fMtree.h\fR
        !           414: provide the interface between the parser and the code generator.
        !           415: .PP
        !           416: The tree traversal routine,
        !           417: \fMtraverse\fR,
        !           418: is a recursive procedure with one argument,
        !           419: a pointer to the root of a tree or subtree
        !           420: for which code is to be generated.
        !           421: The routine examines the type field of the root
        !           422: and, through a switch statement,
        !           423: generates a sequence of ucode instructions as determined by the type.
        !           424: If the node has subtrees,
        !           425: \fMtraverse\fR
        !           426: calls itself recursively at the appropriate point
        !           427: to generate code for the subtree.
        !           428: For example, the code generated for a binary operator
        !           429: first generates code for its two subexpressions,
        !           430: then emits the code that calls the appropriate run-time library routine.
        !           431: .PP
        !           432: The returned value of the traversal routine is used for counting
        !           433: elements of expression lists.
        !           434: If the root of the tree being traversed is an
        !           435: \f3elist\fR
        !           436: (expression list) node,
        !           437: \fMtraverse\fR
        !           438: returns the sum of the returned values of its two subtrees.
        !           439: Otherwise, it returns 1.
        !           440: This count is used when generating code for
        !           441: procedure calls and lists with explicit elements,
        !           442: which need to know the number of arguments
        !           443: to be pushed onto the stack.
        !           444: .PP
        !           445: When generating code for loops,
        !           446: the code generator needs to save three pieces of information
        !           447: for each nested loop:
        !           448: the
        !           449: .I "break label" ,
        !           450: the
        !           451: .I "next label" ,
        !           452: and the expression nesting level.
        !           453: This information is kept on the
        !           454: .I "loop stack" .
        !           455: The break label is a label placed just past the end of the loop;
        !           456: it is the place where control is passed when the loop is finished.
        !           457: The next label is placed near the end of the loop,
        !           458: at a point where the next iteration of the loop can be started.
        !           459: The code for
        !           460: \fMbreak\fR
        !           461: and
        !           462: \fMnext\fR
        !           463: expressions branches to these labels,
        !           464: but in either case, any incomplete expression frames (see Section 3.2)
        !           465: within the loop must first be popped from the stack.
        !           466: The expression nesting level counts the number of currently active
        !           467: expression frames within the current loop;
        !           468: an
        !           469: \f3unmark\fR
        !           470: instruction is generated for that many expression frames
        !           471: (less one for a
        !           472: \fMnext\fR
        !           473: expression).
        !           474: .PP
        !           475: The possibility of nested
        !           476: \fMcase\fR
        !           477: expressions requires that certain information be kept on a
        !           478: .I "case stack" .
        !           479: For each case expression,
        !           480: the code generator allocates a label for the end of the expression
        !           481: and pushes it onto the case stack.
        !           482: When a
        !           483: \fMdefault\fR
        !           484: clause is encountered,
        !           485: its subtree is placed on the top of the case stack to delay
        !           486: code generation for it until the end of the \fMcase\fR expression.
        !           487: .NH 2
        !           488: The Symbol Table Manager
        !           489: .PP
        !           490: The symbol table manager consists of the symbol table data structures
        !           491: and routines that operate upon these data structures.
        !           492: The source code for the symbol table manager is contained in two files.
        !           493: The file
        !           494: \fMkeyword.c\fR
        !           495: contains only the keyword name table and is automatically constructed from a
        !           496: keyword specification file discussed below.
        !           497: The remainder of the symbol table manager is located in the file
        !           498: \fMsym.c\fR.
        !           499: .PP
        !           500: The symbol table manager operates with two logical data structures,
        !           501: the symbol table proper and the string space.
        !           502: When the lexical analyzer identifies a token as either
        !           503: an identifier or a literal,
        !           504: the lexical analyzer requests the symbol table manager
        !           505: to enter the token into the string space.
        !           506: The symbol table manager returns a pointer into
        !           507: the string space for that string.
        !           508: The lexical analyzer then places this pointer in the token value node.
        !           509: To help keep the size of the string space small,
        !           510: all entries are hashed,
        !           511: and only one copy of any string is kept.
        !           512: This has the added benefit that two strings can be compared
        !           513: by checking only the pointers into the string space.
        !           514: .PP
        !           515: The parser determines the context of the token
        !           516: and requests the symbol table manager
        !           517: to enter the token into the symbol table proper.
        !           518: It is the responsibility of the symbol table manager
        !           519: to verify that the use of the token is consistent with prior use.
        !           520: Appropriate diagnostics are issued if the use is inconsistent.
        !           521: .PP
        !           522: The symbol table proper is physically divided into three separate structures:
        !           523: the
        !           524: .I global ,
        !           525: .I local ,
        !           526: and
        !           527: .I literal
        !           528: tables.
        !           529: Each of these tables is hashed,
        !           530: using the pointer into the string space as the key.
        !           531: Since this pointer is an offset into the string space,
        !           532: hashing is simply and effectively performed by taking the rightmost
        !           533: .I n
        !           534: bits of the offset
        !           535: (where
        !           536: .if n .I n "" 2**
        !           537: .if t 2\s-2\u\fIn\fR\d\s+2
        !           538: is the size of the hash vector for the table).
        !           539: .PP
        !           540: The global table contains identifiers that have been declared as
        !           541: globals, procedures, or records.
        !           542: The local table holds all identifiers declared as locals,
        !           543: formal parameters for procedure declarations, field names for record
        !           544: declarations, and all other identifiers referenced in the procedure
        !           545: (including those declared as global elsewhere).
        !           546: The literal table contains entries for literal strings and csets, integers,
        !           547: and floating-point constants.
        !           548: .PP
        !           549: Both the local and literal tables are associated with the current
        !           550: procedure being parsed
        !           551: and are written to the \fM.u1\fR file
        !           552: when the procedure has been successfully parsed.
        !           553: If a record declaration has been parsed, then the local table,
        !           554: containing only the field name identifiers, is written to the
        !           555: \&\fM.u2\fR file.
        !           556: After all procedure, record, and global declarations in a Icon source
        !           557: file have been parsed, the global table is written into the global
        !           558: declarations file.
        !           559: .PP
        !           560: An entry into any of the three symbol table sections is a structure with
        !           561: three fields:
        !           562: a link,
        !           563: a name,
        !           564: and a flag.
        !           565: The link field holds the pointer to the next entry in the same hash bucket.
        !           566: The name is the pointer to the identifier or literal name in the string space.
        !           567: The flag field contains the type
        !           568: .I "formal parameter" , (
        !           569: .I "static local" ,
        !           570: .I "procedure name" ,
        !           571: etc.) of the entry.
        !           572: Global table entries have a fourth field,
        !           573: an integer providing the number of formal parameters for a procedure declaration,
        !           574: or the number of fields in a record declaration.
        !           575: .PP
        !           576: Lookup in the local and global tables is merely the process of
        !           577: following a hash chain until an entry of the same name is found or until
        !           578: the hash chain is exhausted.
        !           579: If a previous entry is found, the flags of the existing and new entries
        !           580: are compared, and diagnostics are printed if the use of the new entry
        !           581: conflicts with the previous usage.
        !           582: The new entry is ignored whenever such an inconsistency is found.
        !           583: .PP
        !           584: The literal table uses the same lookup procedure, except
        !           585: the search down the hash chain stops when an entry is found
        !           586: with the same textual form and flag fields.
        !           587: Thus the string literal
        !           588: \fM"123"\fR
        !           589: and the integer literal
        !           590: \fM123\fR
        !           591: have separate entries in the literal table, even though
        !           592: they have the same string representations.
        !           593: A consequence of this technique is that the integer
        !           594: literals
        !           595: \fM123\fR
        !           596: and
        !           597: \fM0123\fR
        !           598: have separate entries in the literal table,
        !           599: even though they have the same numeric value.
        !           600: Since most programmers use a reasonably consistent style
        !           601: when expressing literals,
        !           602: this technique usually does not produce many
        !           603: duplicate constants.
        !           604: .PP
        !           605: .tr ##
        !           606: A final task of the symbol table manager is
        !           607: the identification of keyword names.
        !           608: (Note that keywords are of the form \fM&\fIname\fR.)
        !           609: The symbol table manager maintains a list of the legal keyword names
        !           610: and, upon request, returns a numeric identification for a keyword name
        !           611: to the parser.
        !           612: An automatic procedure exists for creating the keyword name table:
        !           613: the Icon program
        !           614: \fMmkkeytab.icn\fR
        !           615: reads the specification file
        !           616: \fMkeywords\fR
        !           617: and produces the keyword name table in
        !           618: \fMkeyword.c\fR.
        !           619: The file
        !           620: \fMkeywords\fR
        !           621: is simply a list of the keyword names and a numeric identification for each.
        !           622: Since the number of keyword names is small, and only a few
        !           623: references to keywords are typical in an Icon program, lookup in the
        !           624: keyword name table is done using a linear search.
        !           625: .PP
        !           626: The sizes of the respective portions of the symbol table
        !           627: may be altered with command line arguments to the Icon translator.
        !           628: .NH
        !           629: Linker
        !           630: .PP
        !           631: The Icon linker is written entirely in C. It consists of eight files of
        !           632: source code and three header files.
        !           633: The linker performs three tasks:
        !           634: combining the global symbol tables from one or more
        !           635: runs of the translator,
        !           636: resolving undeclared identifiers,
        !           637: and translating ucode to icode.
        !           638: The resulting combined global symbol table is used
        !           639: for determining the scope of undeclared identifiers
        !           640: during the second task.
        !           641: The second and third tasks are done during a single pass
        !           642: over each intermediate code file.
        !           643: A single file of assembly code is produced.
        !           644: .PP
        !           645: The symbol table module, in the file
        !           646: \fMlsym.c\fR,
        !           647: is similar to the symbol table module of the translator,
        !           648: except that there is an additional table for storing
        !           649: field names of records.
        !           650: The input module, in the file
        !           651: \fMllex.c\fR,
        !           652: recognizes the instructions in both the global symbol table files
        !           653: and the intermediate code files.
        !           654: The global symbol tables are merged by the routine \fMglobals\fR in
        !           655: \fMglob.c\fR,
        !           656: and the intermediate code files are produced by the routines in
        !           657: \fMlcode.c\fR.
        !           658: Of the remaining source files,
        !           659: \fMilink.c\fR
        !           660: and
        !           661: \fMlmem.c\fR
        !           662: contain the main program, miscellaneous support routines,
        !           663: and memory initialization.
        !           664: The files
        !           665: \fMbuiltin.c\fR
        !           666: and
        !           667: \fMopcode.c\fR
        !           668: contain table initializations for the list of built-in procedures
        !           669: (functions)
        !           670: and the ucode operations, respectively.
        !           671: .PP
        !           672: The first phase of the linker reads
        !           673: the global symbol table file from each translator run,
        !           674: and entering all the global symbols into one combined table.
        !           675: The format of a global symbol table file is described in Appendix C.
        !           676: This phase also builds the record/field table that cross-references
        !           677: records and field names,
        !           678: and sets the trace flag for execution-time tracing
        !           679: if any of the files being linked were translated with the
        !           680: \f3\-t\fR
        !           681: option.
        !           682: .PP
        !           683: As records are entered into the global symbol table
        !           684: and the record/field table,
        !           685: they are numbered, starting from 1.
        !           686: These record numbers are used to index the record/field table
        !           687: at run-time when referencing a field.
        !           688: .PP
        !           689: When the linker encounters a link instruction, the named file is added
        !           690: to the end of a linked list of files to be linked. The list initially
        !           691: consists of the files named as arguments. Names are not added to the
        !           692: list if they are already on it.
        !           693: .PP
        !           694: The second phase reads each intermediate code file in sequence,
        !           695: emitting icode as each procedure is encountered.
        !           696: Appendix C describes the intermediate code.
        !           697: The intermediate code contains a prologue for each procedure,
        !           698: beginning with a
        !           699: \f3proc\fR
        !           700: opcode, followed by a series of
        !           701: \f3loc\fR
        !           702: opcodes describing the local symbol table, a series of
        !           703: \f3con\fR
        !           704: opcodes describing the constant table, and a
        !           705: \f3declend\fR
        !           706: opcode terminating the prologue.
        !           707: The local symbol table contains not only local symbols,
        !           708: but all identifiers referenced in the procedure \(em
        !           709: global, local, or undeclared.
        !           710: When an undeclared identifier is entered into the local symbol table,
        !           711: its scope is resolved by the following steps:
        !           712: .in .5i
        !           713: .IP \(bu \w'\(bu'u+1m
        !           714: if the identifier has been entered in the global symbol table,
        !           715: it is entered into the local symbol table as a global identifier
        !           716: .IP \(bu
        !           717: if the identifier matches the name of a function,
        !           718: it is entered into the local symbol table as a function
        !           719: .IP \(bu
        !           720: otherwise it is entered as a local identifier
        !           721: and a warning is issued if the linker was run with the
        !           722: \f3\-u\fR
        !           723: option
        !           724: .in 0
        !           725: .LP
        !           726: The constant table contains an entry for each literal used
        !           727: in the procedure.
        !           728: .PP
        !           729: The linker outputs icode in several regions.
        !           730: The first region contains constants, procedure blocks, and code
        !           731: for the Icon procedures.
        !           732: The next region contains the record/field table and procedure blocks
        !           733: for record constructors.
        !           734: The next four regions contain
        !           735: the global identifiers, the names of the global identifiers,
        !           736: the static identifiers, and the identifier table.
        !           737: The icode is a sequence of instructions,
        !           738: each with an opcode and, in some cases, operands. The sizes of
        !           739: opcodes and operands depend on the machine architecture and the
        !           740: implementor's judgement. On the
        !           741: VAX-11, opcodes are one byte long and operands are four bytes long.
        !           742: Most instructions correspond exactly to instructions
        !           743: in the ucode that is output by the translator.
        !           744: The opcode values are those used internally by the linker
        !           745: (defined in the file
        !           746: \fMlink/opcode.h\fR).
        !           747: .PP
        !           748: Fields are provided in the global symbol and literal tables
        !           749: for associating a location with each entry.
        !           750: As the prologue is being read, each cset, real, or long-integer
        !           751: literal entered into the literal table is output immediately
        !           752: and its location is stored in the literal table.
        !           753: Thus, the locations of all constants are known before their reference.
        !           754: .PP
        !           755: The same is true of references to procedures,
        !           756: since these references only occur in the initialization for global identifiers,
        !           757: which is not output until all procedures have been output.
        !           758: When the prologue for a procedure has been completely processed,
        !           759: the procedure data block is output, and its location is noted in the
        !           760: global symbol table.
        !           761: .PP
        !           762: References to program labels require backpatching,
        !           763: since there often are forward references.
        !           764: Because program label references are always local to the current procedure,
        !           765: the linker buffers the output code for a procedure.
        !           766: A table of values for all program labels is initialized to zero at
        !           767: the beginning of each procedure.
        !           768: When a label is referenced and its table entry is zero,
        !           769: the location of the reference is negated and stored in the table entry
        !           770: and a zero is output for the operand.
        !           771: If a label's table entry is negative,
        !           772: the location of the reference is negated and stored in the table entry
        !           773: as before, but the previous value of the table entry is output for the operand.
        !           774: This forms a linked list of references to the as-yet-undefined label.
        !           775: When a label is defined, each reference on the linked list is replaced
        !           776: with the correct value of the label.
        !           777: .PP
        !           778: References to global and static identifiers are determined at run-time.
        !           779: The
        !           780: \f3glob\fR
        !           781: and
        !           782: \f3static\fR
        !           783: instructions have an integer operand referring to the identifier by
        !           784: position in the global or static identifier region.
        !           785: When one of these instructions is interpreted,
        !           786: the actual address is calculated from the position and the
        !           787: known address of the global or static identifier region.
        !           788: References to functions are also resolved at run-time.
        !           789: Each function is assigned an integer index
        !           790: (its position in the table of functions in
        !           791: \fMbuiltin.c\fR).
        !           792: When the global identifier initialization for a function is output,
        !           793: the negated index is output instead of an address.
        !           794: The interpreter fills in the correct address during program initialization.
        !           795: .PP
        !           796: Once the prologue has been processed,
        !           797: a procedure data block (see Section 3.1)
        !           798: is emitted.
        !           799: Opcodes following the prologue represent execution-time operations,
        !           800: and cause code to be emitted.
        !           801: .PP
        !           802: The record/field table is a two-dimensional matrix,
        !           803: first indexed by a field number assigned to each identifier that
        !           804: is used as a field name,
        !           805: next by a record number assigned to each record type.
        !           806: The value at the selected position in the table is the
        !           807: index of the field in a record of the given type, or \-1
        !           808: if the given record type does not contain the given field.
        !           809: .PP
        !           810: The initial value for global and static identifiers
        !           811: is the null value unless the global identifier is a procedure,
        !           812: function, or record constructor,
        !           813: in which case the initial value is a descriptor of type
        !           814: .I procedure
        !           815: pointing to the appropriate procedure data block.
        !           816: The values output use the data representations described
        !           817: in Section 3.1.
        !           818: .PP
        !           819: The names of global and static identifiers are output as
        !           820: .I "string qualifier"
        !           821: descriptors (see Section 3.1)
        !           822: and are used by the function
        !           823: \fMdisplay\fR.
        !           824: All string qualifiers contained in the generated procedure data blocks
        !           825: and global and static names point into the identifier table,
        !           826: which is just a static string space for that purpose.
        !           827: .NH
        !           828: The Interpreter
        !           829: .PP
        !           830: The interpreter consists of an interpretive loop and a collection of
        !           831: run-time routines that
        !           832: collectively provide support for the execution
        !           833: of an Icon program.
        !           834: .ig
        !           835: This library is searched by the loader for those
        !           836: routines necessary for a particular Icon program.
        !           837: The assembly code generated by the linker contains
        !           838: subroutine calls to library routines to perform most
        !           839: high-level operations where in-line code would be inappropriate.
        !           840: An executable program is created by assembling
        !           841: the linker output, then loading a startup routine and
        !           842: the assembler output with the run-time library and the C library.
        !           843: The startup routine, run-time library, and C library together
        !           844: form the run-time system.
        !           845: ..
        !           846: .PP
        !           847: Three directories contain routines relating directly to source-language
        !           848: operations:
        !           849: \fMfunctions\fR,
        !           850: \fMoperators\fR,
        !           851: and
        !           852: \fMlib\fR.
        !           853: The first two directories contain
        !           854: one routine per function or operator, respectively.
        !           855: The
        !           856: \fMlib\fR
        !           857: directory contains routines relating to Icon control structures.
        !           858: A fourth directory,
        !           859: \fMrt\fR,
        !           860: contains routines for performing common operations
        !           861: needed by many routines in the other three directories.
        !           862: In particular,
        !           863: \fMrt\fR
        !           864: contains routines that handle storage allocation and reclamation,
        !           865: type conversion, data comparison,
        !           866: integer arithmetic with overflow checking,
        !           867: generator suspension,
        !           868: and tracing.
        !           869: The directory \fMiconx\fR contains initialization and the interpreter proper.
        !           870: .PP
        !           871: In each of the four run-time directories, all of the object files are
        !           872: archived in a \fMLib\fR file which is randomized to speed loading.
        !           873: The \fMLib\fR files are loaded together with a startup routine and the
        !           874: interpretive loop to produce the interpreter.
        !           875: .PP
        !           876: Most of the run-time system is coded in C,
        !           877: but some of the routines are coded in assembly language.
        !           878: The interpretive loop and startup routines are written in assembly language, as is
        !           879: integer arithmetic with overflow checking
        !           880: (C does not provide this),
        !           881: as well as other routines concerned with stack management.
        !           882: .PP
        !           883: The interpreter is loaded with the run-time libraries and the C
        !           884: library to form the program \fMiconx\fR, which interprets
        !           885: icode.
        !           886: .PP
        !           887: Before the interpreter begins executing the Icon program,
        !           888: it reads in the icode file generated by the linker.
        !           889: The first eight words of this file contain header information
        !           890: indicating the total size of the rest of the file,
        !           891: the initial value of
        !           892: \fM&trace\fR,
        !           893: and the relative offsets from the beginning of the file to the
        !           894: various regions.
        !           895: These offsets are converted to actual addresses by adding the base
        !           896: address of the icode buffer.
        !           897: Several pointers in the icode must also be relocated.
        !           898: The interpreter sweeps through the global identifiers,
        !           899: looking for procedures, functions, and record constructors.
        !           900: For each function, it supplies the address of the appropriate procedure block.
        !           901: For each procedure, it relocates pointers
        !           902: from its procedure block to the procedure entry point,
        !           903: as well as to strings representing the procedure
        !           904: and local identifier names in the identifier table.
        !           905: For each record, it supplies the address of
        !           906: \fMmkrec\fR,
        !           907: the routine that constructs new records,
        !           908: as the entry point field in the procedure block.
        !           909: .PP
        !           910: The interpreter then begins execution by invoking the value of the first global
        !           911: identifier, which corresponds to the procedure
        !           912: \fMmain\fR.
        !           913: If there is no main procedure, the first global identifier has the
        !           914: null value and a run-time error is reported.
        !           915: The routine
        !           916: \fMinvoke\fR
        !           917: sets
        !           918: the
        !           919: \fIinterpreter program counter \fR(\fIipc\fR)
        !           920: to the entry point, and branches to
        !           921: \fMinterp\fR, which is contained in \fMiconx/interp.s\fR.
        !           922: .PP
        !           923: The routine
        !           924: \fMinterp\fR
        !           925: is the main interpreter loop.
        !           926: It fetches the next opcode,
        !           927: and branches to the appropriate processing routine through a jump table.
        !           928: .NH 2
        !           929: Data Representations
        !           930: .PP
        !           931: Icon has two elementary forms of data objects \(em values and variables.
        !           932: Values often can be converted from one data type to another.
        !           933: When this is done automatically, it is called
        !           934: .I coercion .
        !           935: There are three kinds of variables, each discussed below:
        !           936: .I "natural variables" ,
        !           937: .I "created variables" ,
        !           938: and
        !           939: .I "trapped variables" .
        !           940: The process of obtaining the value referred to by a variable is called
        !           941: .I dereferencing .
        !           942: .PP
        !           943: Every data object is represented by a two-word
        !           944: .I descriptor ,
        !           945: which may, depending on the type of the object,
        !           946: contain a value or
        !           947: refer to some other area of memory for the actual value.
        !           948: The first word of the descriptor always indicates the data type,
        !           949: and the second word either contains the value or a pointer to it.
        !           950: There are six descriptor formats, shown in Appendix D:
        !           951: .I null ,
        !           952: .I "string qualifier" ,
        !           953: .I "integer" ,
        !           954: .I value ,
        !           955: .I variable ,
        !           956: and
        !           957: .I "trapped variable" .
        !           958: These formats are distinguished from one another by the three high-order
        !           959: bits of the first word,
        !           960: except that a
        !           961: .I null
        !           962: descriptor is distinguished from a
        !           963: .I "string qualifier"
        !           964: only by the contents of the second word.
        !           965: Among
        !           966: .I "integer" ,
        !           967: .I value ,
        !           968: and
        !           969: .I "trapped variable"
        !           970: descriptors, the low-order six bits of the first word
        !           971: identify the type of object represented,
        !           972: while the remaining high-order bits in the first word are flags that
        !           973: classify the object (for example, whether the second word contains
        !           974: a pointer
        !           975: \(em historically, a \*(oqfloating address\*(cq [7]).
        !           976: .PP
        !           977: The
        !           978: .I null
        !           979: descriptor represents the null value.
        !           980: A
        !           981: .I "string qualifier"
        !           982: represents a string, and contains the length of the string
        !           983: and a pointer to the first character of the string.
        !           984: An
        !           985: .I "integer"
        !           986: descriptor represents an integer small enough
        !           987: to fit in the second word of the descriptor.
        !           988: This includes all integers on computers whose C \fIints\fR are the same
        !           989: size as C \fIlongs\fR (such as the VAX-11).
        !           990: All data types other than \f3integer\fR, \f3string\fR, and \f3null\fR are
        !           991: represented by
        !           992: .I value
        !           993: descriptors.
        !           994: A value descriptor
        !           995: contains a pointer to a
        !           996: .I "data block"
        !           997: of appropriate format for a value of the given type.
        !           998: On computers whose C \fIlongs\fR are longer than C \fIints\fR
        !           999: (such as the PDP-11),
        !          1000: an integer that requires more bits than there are in an \fIint\fR is contained
        !          1001: in a \fIlong integer\fR data block.
        !          1002: The data block formats for each data type are shown in Appendix D.
        !          1003: .PP
        !          1004: A
        !          1005: .I variable
        !          1006: descriptor represents either a natural variable
        !          1007: or a created variable.
        !          1008: A natural variable contains a pointer to a descriptor at a fixed location
        !          1009: (for a global identifier) or a location on the stack (for a local identifier)
        !          1010: where the value of the variable is stored.
        !          1011: A created variable, formed by a table, list, or field reference,
        !          1012: contains a pointer to a descriptor in the aggregate
        !          1013: where the referenced element is located.
        !          1014: Since such elements are in the heap,
        !          1015: created variables also contain an offset that indicates
        !          1016: the distance (in words) from the beginning of the data block
        !          1017: to the referenced descriptor.
        !          1018: This offset is used during the marking phase of garbage collection,
        !          1019: discussed in Section 3.3.
        !          1020: .PP
        !          1021: A
        !          1022: .I "trapped variable"
        !          1023: [8] descriptor represents a variable for which special action is necessary
        !          1024: upon dereferencing or assignment.
        !          1025: Such variables include substrings,
        !          1026: non-existent elements of tables,
        !          1027: and certain keywords.
        !          1028: Each type of trapped variable is distinguished by the first word
        !          1029: of the descriptor.
        !          1030: .PP
        !          1031: Substring trapped variables,
        !          1032: created by a section or subscripting operation,
        !          1033: contain a pointer to a data block that contains a
        !          1034: .I variable
        !          1035: descriptor identifying the value from which the substring was taken,
        !          1036: an integer indicating the beginning position of the substring,
        !          1037: and an integer showing the length of the substring.
        !          1038: With this information, assignment to a substring of a variable
        !          1039: can modify the contents of the variable properly.
        !          1040: Substrings of non-variables do not produce
        !          1041: substring trapped variables, since assignment to such substrings
        !          1042: is meaningless and illegal;
        !          1043: instead, forming the substring of a non-variable produces a
        !          1044: string qualifier.
        !          1045: .PP
        !          1046: Table element trapped variables,
        !          1047: formed by referencing a non-existent element of a table,
        !          1048: similarly contain a pointer to a data block that contains
        !          1049: enough information for assignment to add the element to the
        !          1050: referenced table or to supply the default table value.
        !          1051: .PP
        !          1052: The keywords
        !          1053: \fM&pos\fR,
        !          1054: \fM&random\fR,
        !          1055: and
        !          1056: \fM&trace\fR
        !          1057: are handled via trapped variables (\fM&subject\fR is handled differently).
        !          1058: These trapped variables
        !          1059: need no additional information.
        !          1060: It is sufficient to know the type of trapped variable on dereferencing \(em
        !          1061: the value of the keyword can be accessed and returned.
        !          1062: On assignment, the new value is coerced to the appropriate type,
        !          1063: checked for validity, and assigned to the keyword.
        !          1064: .PP
        !          1065: Strings formed during program execution are placed in the
        !          1066: .I "string space" ;
        !          1067: string qualifiers for these strings point into this region.
        !          1068: Substrings of existing strings are not allocated again;
        !          1069: instead, a string qualifier is formed that points into the
        !          1070: existing string.
        !          1071: When storage is exhausted in the string space,
        !          1072: the garbage collector (see Section 3.3)
        !          1073: is invoked to reclaim unused space and compact the region;
        !          1074: if enough space cannot be reclaimed, the region is expanded if possible.
        !          1075: .PP
        !          1076: Data blocks formed during program execution are placed in the
        !          1077: .I heap .
        !          1078: Data blocks have a rigid format dictated by the garbage collection
        !          1079: algorithm.
        !          1080: The first word of the block always contains a type code
        !          1081: which identifies the structure of the rest of the block.
        !          1082: Descriptors follow all non-descriptor information in the block.
        !          1083: If the size of the block is not determined by its type,
        !          1084: the size (in bytes) is contained in the second word of the block.
        !          1085: When storage is exhausted in the heap,
        !          1086: the garbage collector is invoked to reclaim unused space and compact the heap;
        !          1087: if enough space cannot be reclaimed, the heap is expanded if possible.
        !          1088: .PP
        !          1089: Co-expression stack blocks are allocated in a separate region and
        !          1090: are treated specially by the garbage collector.
        !          1091: .NH 2
        !          1092: Stack Organization
        !          1093: .PP
        !          1094: The stack is the focus of activity
        !          1095: during the execution of an Icon program.
        !          1096: All operators, functions, and procedures
        !          1097: expect to find their arguments at the top of the stack,
        !          1098: and replace the arguments with the result of their computation.
        !          1099: Local variables for Icon procedures are also kept on the stack.
        !          1100: The arguments, local variables, and temporaries on the stack
        !          1101: for an active Icon procedure are collectively called a
        !          1102: .I "procedure frame" .
        !          1103: This is one of several kinds of
        !          1104: .I "stack frames"
        !          1105: discussed in this section.
        !          1106: Appendix E summarizes the layouts of the stack frames for the PDP-11 and VAX-11.
        !          1107: See [3] for a detailed discussion of stack frames.
        !          1108: Each co-expression also has a stack. For uniformity, the main stack
        !          1109: is treated as the stack for the co-expression \fM&main\fR.
        !          1110: .PP
        !          1111: On the PDP-11 and VAX-11 stacks start in high memory and grow downward.
        !          1112: On these computers, a push causes the stack pointer to decrease and a
        !          1113: pop causes the stack pointer to increase. Thus ``above'' and ``below''
        !          1114: refer, respectively, to ``older'' and ``newer'' information on the
        !          1115: stack. An exception to this is the phrase
        !          1116: ``top of the stack'', which is used to
        !          1117: refer to the \fIlowest\fR memory location. The description of relative
        !          1118: stack locations that follows is based on this kind of architecture
        !          1119: and nomenclature.
        !          1120: .PP
        !          1121: Before an Icon procedure calls another Icon procedure,
        !          1122: the caller pushes the procedure to be called
        !          1123: (a descriptor \(em procedures are data objects in Icon)
        !          1124: onto the stack.
        !          1125: The caller then pushes each argument (also a descriptor) onto the stack,
        !          1126: leftmost argument first.
        !          1127: The caller then pushes one word onto the stack
        !          1128: indicating the number of arguments supplied,
        !          1129: which may be different from the number of arguments expected.
        !          1130: The run-time library routine
        !          1131: \fMinvoke\fR
        !          1132: is then called, which checks that the first descriptor pushed above
        !          1133: actually does represent an integer, procedure, or a variable whose value is
        !          1134: an integer or a procedure.
        !          1135: An integer indicates the selection of one of the
        !          1136: arguments resulting from mutual evaluation.
        !          1137: A procedure, on the other hand,
        !          1138: points to a procedure data block,
        !          1139: which contains various information about the called procedure,
        !          1140: including the number of arguments expected,
        !          1141: the number of local variables used,
        !          1142: and the procedure's entry point address.
        !          1143: The routine
        !          1144: \fMinvoke\fR
        !          1145: next adjusts the number of arguments supplied to match the number expected,
        !          1146: deleting excess arguments or supplying the null value for missing ones.
        !          1147: This adjustment is performed by moving the portion of the stack
        !          1148: below the arguments up or down, as appropriate.
        !          1149: It then dereferences the arguments.
        !          1150: A
        !          1151: .I "procedure marker"
        !          1152: is then pushed onto the stack,
        !          1153: and the
        !          1154: .I "procedure frame pointer"
        !          1155: is set to point to the new procedure marker.
        !          1156: The procedure marker contains, among other things,
        !          1157: the return address in the calling procedure
        !          1158: and the previous value of the procedure frame pointer.
        !          1159: Next, the null value is pushed onto the stack
        !          1160: as the initial value for each local variable.
        !          1161: The routine
        !          1162: \fMinvoke\fR
        !          1163: then transfers control to the procedure's entry point,
        !          1164: and execution of the Icon program resumes in the new procedure.
        !          1165: .PP
        !          1166: When a procedure is ready to return to its caller,
        !          1167: it pushes its return value (a descriptor) on the stack.
        !          1168: It then transfers control to
        !          1169: \fMpret\fR,
        !          1170: which moves the return value
        !          1171: to the location occupied by the descriptor that represented the
        !          1172: called procedure.
        !          1173: That is, the return value is stored in place of the
        !          1174: first descriptor that was pushed
        !          1175: at the beginning of the calling sequence described above.
        !          1176: The return sequence
        !          1177: then restores the state of the previous procedure
        !          1178: from the current procedure marker
        !          1179: (the procedure marker that the procedure frame pointer
        !          1180: currently points to).
        !          1181: This includes restoring the previous value of the procedure frame pointer,
        !          1182: retrieving the return address,
        !          1183: and popping the returning procedure's local variables,
        !          1184: procedure marker, and arguments.
        !          1185: Thus, when the calling procedure regains control,
        !          1186: the arguments have been popped
        !          1187: and the return value is now at the top of the stack.
        !          1188: .PP
        !          1189: Functions and operators are written in C,
        !          1190: and therefore obey the C calling sequence.
        !          1191: By design, the Icon calling sequence described above
        !          1192: is similar to the C calling sequence.
        !          1193: When an Icon procedure calls a function, a
        !          1194: .I boundary
        !          1195: on the stack is introduced,
        !          1196: where the stack above the boundary is regimented by Icon standards,
        !          1197: and the stack below the boundary contains C information.
        !          1198: This boundary is important during garbage collection:
        !          1199: The garbage collector must ignore the area of the stack
        !          1200: below the boundary,
        !          1201: since the structure of this area is unknown,
        !          1202: whereas the structure of the area above the boundary is
        !          1203: well-defined.
        !          1204: In particular, all data above the boundary is contained
        !          1205: in descriptors or is defined by the structure of a frame,
        !          1206: so that all pointers into the heap or string space may be
        !          1207: located during a garbage collection.
        !          1208: .PP
        !          1209: Functions and operators are written to \*(oqstraddle\*(cq
        !          1210: the boundary.
        !          1211: From above, they are designed to resemble Icon procedures;
        !          1212: from below, they are C procedures.
        !          1213: An Icon procedure calls a function in the same way as it
        !          1214: calls another Icon procedure;
        !          1215: in fact, functions are procedure-typed
        !          1216: data objects just as Icon procedures are.
        !          1217: When
        !          1218: \fMinvoke\fR
        !          1219: recognizes that a function is being called,
        !          1220: it bypasses the argument adjustment
        !          1221: if the field in the procedure data block that indicates
        !          1222: the number of arguments expected contains \-1,
        !          1223: which indicates that the function can take an arbitrary number of arguments.
        !          1224: It also does not allocate stack space for local variables, since
        !          1225: any such variables are C variables and are allocated by the C
        !          1226: function itself.
        !          1227: C procedures have an entry sequence that creates a new
        !          1228: procedure frame;
        !          1229: since
        !          1230: \fMinvoke\fR
        !          1231: has already done this,
        !          1232: the entry point for functions must be set past any instructions
        !          1233: that are involved in procedure frame creation.
        !          1234: .PP
        !          1235: The first formal parameter of
        !          1236: all functions is
        !          1237: \fMnargs\fR,
        !          1238: which corresponds to the word that contains the number of
        !          1239: arguments supplied.
        !          1240: For functions that expect a fixed number of arguments,
        !          1241: they are also listed as arguments, in reverse order.
        !          1242: For functions that can take an arbitrary number of arguments,
        !          1243: there is a macro
        !          1244: \fMARG(\fIn\fM)\fR
        !          1245: that uses the address and contents
        !          1246: of
        !          1247: \fMnargs\fR
        !          1248: to calculate the location of the
        !          1249: .I n \^th
        !          1250: argument.
        !          1251: Thus,
        !          1252: \fMARG(1)\fR
        !          1253: accesses the first argument (as a descriptor),
        !          1254: and
        !          1255: \fMARG(nargs)\fR
        !          1256: accesses the last argument.
        !          1257: Each function is responsible for supplying defaults for missing arguments
        !          1258: and for dereferencing arguments that are variables.
        !          1259: Because of the calling protocol,
        !          1260: \fMARG(0)\fR
        !          1261: accesses the location where the return value should be stored.
        !          1262: Every function must place its result there and
        !          1263: then return through normal C conventions.
        !          1264: Each function must also supply a procedure data block that contains
        !          1265: the number of arguments expected (or \-1), its entry point,
        !          1266: and a string qualifier representing its name.
        !          1267: .PP
        !          1268: Operators are very similar to functions. The only difference is that
        !          1269: operators are called directly (rather than being called through \fMinvoke\fR)
        !          1270: and must set the boundary themselves.
        !          1271: .ig
        !          1272: .PP
        !          1273: Operators are written like functions,
        !          1274: with one exception.
        !          1275: Since operators are not variables
        !          1276: (as function and procedure names are),
        !          1277: the name of the operator is known at translation time,
        !          1278: and the Icon procedure calls it directly
        !          1279: (at its normal entry point)
        !          1280: without going through
        !          1281: \fMinvoke\fR.
        !          1282: Thus, operators do not need procedure data blocks.
        !          1283: Instead of pushing a procedure descriptor pointing to a procedure block,
        !          1284: a null value is pushed instead to hold a place for the return value.
        !          1285: ..
        !          1286: .PP
        !          1287: When an operator or function fails to produce a result,
        !          1288: it calls
        !          1289: \fMfail\fR.
        !          1290: This routine initiates backtracking as described below.
        !          1291: .PP
        !          1292: Expressions are evaluated within an
        !          1293: .I "expression frame" .
        !          1294: When the evaluation of an expression is complete,
        !          1295: whether it has produced a result or failed,
        !          1296: the expression frame must be popped from the stack
        !          1297: and the result of the expression must be pushed back onto the stack.
        !          1298: The expression frame marks the stack height at the point
        !          1299: that the expression began to be evaluated,
        !          1300: so that the stack may be restored to its original state
        !          1301: when the evaluation of the expression is complete.
        !          1302: The stack normally would be restored to the original height
        !          1303: (that is, the pops would match the pushes)
        !          1304: except when an expression fails at some midpoint in its evaluation.
        !          1305: The expression frame is also used to limit backtracking:
        !          1306: backtracking is restricted in the language
        !          1307: to the current expression instance only.
        !          1308: .PP
        !          1309: When evaluation of an expression begins, an
        !          1310: .I "expression marker"
        !          1311: is pushed on the stack, the
        !          1312: .I "expression frame pointer"
        !          1313: is set to point to it, and the
        !          1314: .I "generator frame pointer" ,
        !          1315: discussed below, is cleared.
        !          1316: The marker contains the previous values of the
        !          1317: expression and generator frame pointers
        !          1318: and a failure label.
        !          1319: When an expression produces a result,
        !          1320: that result, on the top of the stack,
        !          1321: is popped and saved.
        !          1322: Then the stack is popped to the expression marker,
        !          1323: and the previous values of the two frame pointers are restored.
        !          1324: The marker is popped
        !          1325: and the result of the expression is pushed back onto the stack,
        !          1326: now a part of the previous expression frame.
        !          1327: If an expression fails to produce a result,
        !          1328: \fMfail\fR
        !          1329: pops the stack to the expression marker,
        !          1330: restores the previous values of the two frame pointers,
        !          1331: and branches to the failure label.
        !          1332: In the special case that the failure label is zero,
        !          1333: \fMfail\fR
        !          1334: is effectively called again to indicate failure in the new expression frame.
        !          1335: Thus the failure is propagated from one expression to an enclosing one.
        !          1336: .PP
        !          1337: If an expression has any generators,
        !          1338: then there is a
        !          1339: .I "generator frame"
        !          1340: within the current expression frame
        !          1341: for each generator that is inactive
        !          1342: (that is, that has produced a value but is not yet exhausted).
        !          1343: A generator frame preserves the state of the stack
        !          1344: at the point just before the generator
        !          1345: (whether it be operator, function, or procedure)
        !          1346: suspended (became inactive).
        !          1347: If
        !          1348: \fMfail\fR
        !          1349: is called and there are inactive generators,
        !          1350: then instead of exiting the current expression frame,
        !          1351: the most recently inactivated generator is reactivated
        !          1352: by restoring the stack to the state saved in the most recent
        !          1353: generator frame.
        !          1354: .PP
        !          1355: A function or operator suspends itself by calling
        !          1356: \fMsuspend\fR.
        !          1357: This routine preserves the state of the stack
        !          1358: by duplicating the current expression frame,
        !          1359: bounded on one end by the most recent generator frame
        !          1360: (or, if there are no inactive generators, the current expression frame)
        !          1361: and bounded on the other end by the beginning of the
        !          1362: argument list of the suspending function or operator.
        !          1363: A generator marker is pushed onto the stack,
        !          1364: followed by the duplicate expression frame.
        !          1365: The routine
        !          1366: \fMsuspend\fR
        !          1367: then causes the suspending function or operator to return to its caller,
        !          1368: instead of itself returning.
        !          1369: .PP
        !          1370: When reactivated by
        !          1371: \fMfail\fR,
        !          1372: the stack is restored to the generator marker,
        !          1373: which is used to restore the various frame pointers.
        !          1374: Then the marker is popped.
        !          1375: The stack is then in the same state that it was in when
        !          1376: \fMsuspend\fR
        !          1377: was called.
        !          1378: The routine
        !          1379: \fMfail\fR
        !          1380: then returns to the generator
        !          1381: as if the original call to
        !          1382: \fMsuspend\fR
        !          1383: had returned.
        !          1384: Thus the following schema is typical of operators and functions
        !          1385: that generate a sequence of values.
        !          1386: .DS
        !          1387: .ss 9
        !          1388: .ft M
        !          1389: \fIinitialize\fM;
        !          1390: \fMwhile\fM (\fInot exhausted\fM) {
        !          1391:    \fIcompute next value\fM;
        !          1392:    \fIstore return value\fM;
        !          1393:    suspend()
        !          1394:    }
        !          1395: fail();
        !          1396: .ft R
        !          1397: .ss 4
        !          1398: .DE
        !          1399: The effect of resuming an expression containing generators is that
        !          1400: \fMsuspend\fR
        !          1401: actually causes the generator to return.
        !          1402: If alternatives are needed, backtracking occurs,
        !          1403: and the apparent effect is that
        !          1404: \fMsuspend\fR
        !          1405: has finally returned.
        !          1406: The generator computes the next result,
        !          1407: and suspends with it.
        !          1408: When the generator is exhausted,
        !          1409: it merely fails without suspending,
        !          1410: which just passes the failure back to the next most recently inactivated generator,
        !          1411: if any.
        !          1412: .PP
        !          1413: Just as functions and operators can return normally, suspend, or fail,
        !          1414: so can Icon procedures.
        !          1415: The mechanics are essentially the same, but the differences in stack
        !          1416: layout require different primitives.
        !          1417: When Icon procedures return normally, the return value is presumed to
        !          1418: be at the top of the stack and
        !          1419: \fMpret\fR
        !          1420: is called.
        !          1421: Similarly, Icon procedures call
        !          1422: \fMpsusp\fR
        !          1423: to suspend.
        !          1424: Both of these routines also dereference the return result
        !          1425: if it is a local variable.
        !          1426: The routine
        !          1427: \fMpfail\fR
        !          1428: causes an Icon procedure to return with no result.
        !          1429: .PP
        !          1430: The same three primitives are also needed at the expression level:
        !          1431: \fMeret\fR,
        !          1432: \fMesusp\fR,
        !          1433: and
        !          1434: \fMefail\fR.
        !          1435: Unlike
        !          1436: \fMunmark\fR,
        !          1437: \fMeret\fR
        !          1438: is not a library routine, but is generated as in-line code.
        !          1439: Both cause an exit from the current expression frame; but
        !          1440: \fMeret\fR
        !          1441: supplies a result to the enclosing expression,
        !          1442: while
        !          1443: \fMunmark\fR
        !          1444: does not.
        !          1445: The routine
        !          1446: \fMesusp\fR
        !          1447: creates a inactive generator before supplying a result to the
        !          1448: enclosing expression;
        !          1449: it is used by the alternation control structure.
        !          1450: The routine
        !          1451: \fMefail\fR
        !          1452: simply causes backtracking within the current expression frame.
        !          1453: In fact,
        !          1454: \fMfail\fR
        !          1455: and
        !          1456: \fMpfail\fR
        !          1457: merely exit their procedure frame before branching to
        !          1458: \fMefail\fR.
        !          1459: .NH 2
        !          1460: Storage Allocation and Reclamation
        !          1461: .PP
        !          1462: During program execution, storage allocation is necessary when a data
        !          1463: object is created.
        !          1464: The three primitive routines
        !          1465: \fMallocate\fR,
        !          1466: \fMalcstr\fR, and \fMalcestk\fR
        !          1467: allocate storage in the heap, string space, and co-expression stack space, respectively.
        !          1468: All three routines return pointers to the beginning of newly
        !          1469: allocated space.
        !          1470: None of the routines is responsible for ensuring that enough space
        !          1471: remains in the data regions.
        !          1472: Ensuring that enough space remains in the data regions is
        !          1473: the responsibility of a
        !          1474: .I "predictive need"
        !          1475: strategy described below.
        !          1476: .PP
        !          1477: In the heap,
        !          1478: \fMallocate(n)\fR
        !          1479: returns a pointer to
        !          1480: \fMn\fR
        !          1481: contiguous bytes of storage.
        !          1482: Because a wide variety of objects may reside in the heap, a number of
        !          1483: support routines are provided to simplify the storing of various objects.
        !          1484: There is a specific routine to allocate a block for each datatype in the heap.
        !          1485: Where appropriate, these routines have the actual values to be stored
        !          1486: as their arguments.
        !          1487: All of the routines call
        !          1488: \fMallocate\fR
        !          1489: to obtain storage for the object.
        !          1490: .PP
        !          1491: In the string space,
        !          1492: \fMalcstr(s,\*bl)\fR
        !          1493: allocates room for a string of length
        !          1494: \fMl\fR
        !          1495: and copies the string pointed to by
        !          1496: \fMs\fR
        !          1497: into this space.
        !          1498: Since some routines such as
        !          1499: \fMleft\fR,
        !          1500: \fMright\fR,
        !          1501: and
        !          1502: \fMcenter\fR
        !          1503: need room in the string space in which to construct a string,
        !          1504: a call to
        !          1505: \fMalcstr\fR
        !          1506: with the defined constant
        !          1507: \fMNULL\fR
        !          1508: as the first argument results in the
        !          1509: allocation of storage without attempting to copy a string.
        !          1510: .PP
        !          1511: In the co-expression stack region, \fMalcestk()\fR allocates a new
        !          1512: co-expression stack.
        !          1513: .PP
        !          1514: Source code for all of the allocation routines is contained in the file
        !          1515: \fMrt/alc.c\fR.
        !          1516: Almost all interaction with the storage management is made through these
        !          1517: routines.
        !          1518: Two exceptions occur in string concatenation (\fMoperators/cat.c\fR) and
        !          1519: reading a fixed number of
        !          1520: bytes from a file (\fMfunctions/reads.c\fR).
        !          1521: In these cases, it is simpler and more efficient to have these operations
        !          1522: deal directly with storage management.
        !          1523: .PP
        !          1524: As mentioned earlier, a
        !          1525: .I "predictive need"
        !          1526: strategy is employed to ensure that enough room remains for data storage.
        !          1527: Simply put,
        !          1528: .I "predictive need"
        !          1529: states that it is the responsibility of any routine
        !          1530: that calls an allocation routine both to ensure that enough room remains
        !          1531: in the proper data region and to maintain the validity of any temporary
        !          1532: pointers into the data regions, should a
        !          1533: .I "garbage collection"
        !          1534: be necessary to free storage space.
        !          1535: .PP
        !          1536: Since the check for storage space only needs to occur before the allocation
        !          1537: takes place, each routine may perform this check at its convenience.
        !          1538: This approach permits the minimization of the number of temporary
        !          1539: pointers that must be protected during garbage collection.
        !          1540: As an aid, space for several descriptors is automatically protected
        !          1541: by the procedure invocation mechanism, and usually is used to hold
        !          1542: information pertaining to the arguments of the procedure (see Section 3.4).
        !          1543: .PP
        !          1544: Routines to ensure space are provided for each of the three
        !          1545: storage regions.
        !          1546: The routine
        !          1547: \fMsneed(n)\fR
        !          1548: ensures that at least
        !          1549: \fMn\fR
        !          1550: bytes of storage remain in the string space, and
        !          1551: \fMhneed(n)\fR
        !          1552: performs the same function in the heap.
        !          1553: \fMesneed()\fR ensures that there is a co-expression stack
        !          1554: available.
        !          1555: If either routine finds that there is insufficient storage remaining,
        !          1556: it invokes the
        !          1557: .I "garbage collector"
        !          1558: in an attempt to obtain that storage.
        !          1559: If that fails, then program execution is aborted with an appropriate
        !          1560: diagnostic.
        !          1561: .PP
        !          1562: Garbage collection, or
        !          1563: .I "storage reclamation" ,
        !          1564: is a process that identifies all valid allocated data.
        !          1565: In the string and heap regions, valid data is compacted
        !          1566: in order to provide a contiguous area of unused storage.
        !          1567: The algorithm used for identifying valid data is based upon
        !          1568: the algorithm described by Hanson [7].
        !          1569: Only the more novel features are discussed here.
        !          1570: .PP
        !          1571: Whenever a predictive need request discovers that insufficient
        !          1572: storage remains in either the heap or string space, the garbage
        !          1573: collector is invoked to reclaim unused space in all regions.
        !          1574: This approach is more efficient in situations where all regions
        !          1575: are heavily allocated and only slightly less efficient otherwise.
        !          1576: .PP
        !          1577: The approach is to sweep through the permanent data regions and the
        !          1578: stack, looking for descriptors that are either pointers into the heap or
        !          1579: string qualifiers.
        !          1580: When a string qualifier is found, a pointer to that qualifier is
        !          1581: saved in a temporary data region at the end of the heap.
        !          1582: If the descriptor is a pointer into the heap, then that heap data
        !          1583: block contains valid information.
        !          1584: The block is marked as valid, the descriptor is placed on a back chain
        !          1585: headed in the block,
        !          1586: and the marking process is called
        !          1587: recursively on any descriptors within that block.
        !          1588: Blocks that are already marked as valid are not processed a second time.
        !          1589: To simplify the marking of heap blocks, all data blocks have been
        !          1590: designed so that all descriptors within them exist as a contiguous
        !          1591: section at the end of the block.
        !          1592: Thus to sweep through the descriptors within a block, the marking
        !          1593: algorithm need only know the size of the block and the
        !          1594: location of the first descriptor.
        !          1595: Information concerning a data block's size, as well as the
        !          1596: offset for the first descriptor is in the file
        !          1597: \fMrt/dblocks.c\fR.
        !          1598: .PP
        !          1599: Valid co-expression stacks also may contain string qualifiers and
        !          1600: pointers to other valid data; such stacks are included in the
        !          1601: marking phase.
        !          1602: .PP
        !          1603: After the marking phase is completed, the string region is compacted.
        !          1604: The algorithm used is described by Hanson [9].
        !          1605: The pointers to the string qualifiers are sorted so that the order of
        !          1606: all valid strings within the string space is identified.
        !          1607: The string qualifiers are then processed in order, and modified as
        !          1608: the valid strings are compacted.
        !          1609: If this compaction does not free enough space within the
        !          1610: string space to satisfy the request, the heap must be moved in order
        !          1611: to provide more room in the string space.
        !          1612: An attempt is also made to provide
        !          1613: some additional
        !          1614: ``breathing room''
        !          1615: in the string space to permit future expansion.
        !          1616: .PP
        !          1617: The heap cannot be moved until after the valid pointers into it are
        !          1618: adjusted and the storage is compacted.
        !          1619: The pointer adjustment and heap compaction phases are two linear passes through
        !          1620: the heap which must be performed during standard heap garbage collection.
        !          1621: The only difference when the heap is to be moved is that the adjusted
        !          1622: pointers point to where that data will be after the heap has been moved.
        !          1623: If not enough breathing room is freed in the heap, then
        !          1624: more space is requested from the operating system.
        !          1625: As a last step, if the string space needs more room, the heap is
        !          1626: relocated.
        !          1627: .PP
        !          1628: This method has proved to be quite satisfactory for most applications.
        !          1629: A shortcoming of the implementation is the absence of a process for
        !          1630: decreasing the size of a data region, should it become too large.
        !          1631: It is also possible that insufficient room would be available for storing
        !          1632: the pointers to the string qualifiers, even though enough storage
        !          1633: would become available if the heap were collected separately.
        !          1634: In practice, this has not been a problem.
        !          1635: The source code for the garbage collector is contained in the files
        !          1636: \fMrt/gcollect.s\fR, \fMrt/gc.c\fR, and \fMrt/sweep.c\fR.
        !          1637: .NH 2
        !          1638: Coding Conventions
        !          1639: .PP
        !          1640: The calling conventions for functions and operators have been mentioned
        !          1641: earlier.
        !          1642: Several other aspects of the run-time system are explained here.
        !          1643: .PP
        !          1644: All header files for the run-time system are in the directory
        !          1645: \fMh\fR.
        !          1646: The file
        !          1647: \fMh/rt.h\fR
        !          1648: (or, for assembly-language routines,
        !          1649: \fMh/defs.s\fR)
        !          1650: is included by almost every source file in the run-time system, and contains
        !          1651: machine-dependent defined constants, run-time data structure declarations,
        !          1652: and defined constants and macros for flags,
        !          1653: type codes, argument accessing, and bit manipulations.
        !          1654: .PP
        !          1655: During the execution of an Icon program, many type
        !          1656: conversions are done on temporary values, where data storage is not
        !          1657: required beyond the bounds of the current operation.
        !          1658: For this reason, the type conversion routines all operate with pointers passed
        !          1659: to them that reference
        !          1660: buffers in the calling procedure.
        !          1661: Any routine calling for type conversion must determine if
        !          1662: heap or string space storage is needed, and perform the
        !          1663: allocation.
        !          1664: Most of the conversion routines return the type of the result or
        !          1665: \fMNULL\fR
        !          1666: if the conversion cannot be performed.
        !          1667: One exception is
        !          1668: \fMcvstr\fR
        !          1669: which, in addition to
        !          1670: \fMNULL\fR,
        !          1671: returns
        !          1672: \fM2\fR
        !          1673: if the object was already a string, and
        !          1674: \fM1\fR
        !          1675: if the object
        !          1676: had to be converted to a string.
        !          1677: This distinction makes it possible to avoid a large number of
        !          1678: predictive-need checks.
        !          1679: The second exception is
        !          1680: \fMcvnum\fR,
        !          1681: which returns either
        !          1682: real numbers or integers
        !          1683: and makes no attempt to distinguish between
        !          1684: short and long integers.
        !          1685: .PP
        !          1686: As mentioned in Section 3.3, there is space set aside to hold
        !          1687: temporary descriptors and to protect the validity of these
        !          1688: descriptors during garbage collection.
        !          1689: The garbage collector knows about this region, and
        !          1690: .I tends
        !          1691: it during storage reclamation.
        !          1692: The region is defined in the file
        !          1693: \fMh/gc.h\fR,
        !          1694: and is bounded by the labels
        !          1695: \fMtended\fR
        !          1696: and
        !          1697: \fMetended\fR.
        !          1698: This area can be referenced from C by considering
        !          1699: \fMtended\fR
        !          1700: to be an array of descriptors.
        !          1701: Since a garbage collection can occur
        !          1702: only during a call to
        !          1703: \fMsneed\fR
        !          1704: or
        !          1705: \fMhneed\fR,
        !          1706: or between suspension and reactivation,
        !          1707: the only places where C routines need to ensure that all pointers
        !          1708: into the heap or string space are tended
        !          1709: are just before calls to
        !          1710: \fMsneed\fR,
        !          1711: \fMhneed\fR,
        !          1712: or
        !          1713: \fMsuspend\fR.
        !          1714: .PP
        !          1715: All function names are preceded by the letter
        !          1716: \fMX\fR,
        !          1717: and their procedure blocks are preceded by the letter
        !          1718: \fMB\fR.
        !          1719: This prevents name collisions between Icon procedures and other routines,
        !          1720: such as those for operators, type conversions, and storage management.
        !          1721: Reference from the generated code to functions is made entirely through
        !          1722: the procedure block;
        !          1723: the entry point field of the procedure block references the function itself.
        !          1724: .NH
        !          1725: Modifying the Implementation
        !          1726: .PP
        !          1727: This section is intended to serve as a brief guide
        !          1728: for those who wish to modify the Icon system.
        !          1729: It is not comprehensive;
        !          1730: it only points to various parts of the
        !          1731: implementation that need to be considered
        !          1732: when making various kinds of changes.
        !          1733: .PP
        !          1734: Perhaps the most common kind of change that one might expect to make
        !          1735: is to add new functions (built-in procedures).
        !          1736: To add a function, first write it according to the conventions
        !          1737: described in Section 3.4.
        !          1738: (Use an existing function similar to the new one as a prototype.
        !          1739: Appendix F contains several example functions.)  Be
        !          1740: especially careful to observe the rules
        !          1741: concerning storage allocation and tended descriptors.
        !          1742: .PP
        !          1743: .c  note PI here
        !          1744: Prepare to add the new function to the run-time library
        !          1745: by moving the source code into the
        !          1746: \fMfunctions\fR
        !          1747: directory and adding its name to
        !          1748: \fMfunctions/Makefile\fR
        !          1749: (the name must be added in three places \(em
        !          1750: there are many examples already in the \fMMakefile\fR).
        !          1751: Then add the name to \fMh/pdef.h\fR
        !          1752: in proper alphabetical order.
        !          1753: Use other functions as a guide to the format of the entry.
        !          1754: .PP
        !          1755: When all changes have been made to the source code,
        !          1756: go to the Icon root directory and run
        !          1757: .Ds
        !          1758: make Icon
        !          1759: .De
        !          1760: This runs
        !          1761: \fMmake\fR
        !          1762: in each of the system directories \(em
        !          1763: \fMtran\fR,
        !          1764: \fMlink\fR,
        !          1765: \fMfunctions\fR,
        !          1766: \fMoperators\fR,
        !          1767: \fMlib\fR,
        !          1768: \fMrt\fR, and \fMiconx\fR
        !          1769: \(em and then copies the new versions into the
        !          1770: \fMbin\fR
        !          1771: directory [10].
        !          1772: .PP
        !          1773: Adding a new operator is more complicated.
        !          1774: Again, the first step is to write the routine, place it in the
        !          1775: \fMoperators\fR
        !          1776: directory, and add the appropriate entry to the
        !          1777: \fMMakefile\fR
        !          1778: there.
        !          1779: Next, the operator must be added to the translator, as follows:
        !          1780: .IP (1) \w'(1)'u+1n
        !          1781: Add the operator to the operator table in
        !          1782: \fMtran/optab\fR;
        !          1783: the structure of the table is described in Section 1.1.
        !          1784: .IP (2)
        !          1785: Create a unique name for the new token
        !          1786: and make a new token table entry in
        !          1787: \fMtran/tokens\fR
        !          1788: in the operators section of the table.
        !          1789: Although the operators section of the table is in alphabetical order
        !          1790: by token name
        !          1791: as distributed,
        !          1792: there is no need to preserve this order.
        !          1793: .IP (3)
        !          1794: If a running Version 5 of Icon is not available,
        !          1795: edit the files
        !          1796: \fMtran/optab.c\fR
        !          1797: and
        !          1798: \fMtran/toktab.c\fR
        !          1799: to correspond to the changes made in steps 1 and 2.
        !          1800: This sometimes involves a renumbering of token table entries
        !          1801: in both files (but nowhere else).
        !          1802: If a running Version 5 of Icon is available,
        !          1803: a
        !          1804: \fMmake\fR
        !          1805: in
        !          1806: \fMtran\fR
        !          1807: executes
        !          1808: \fMmktoktab\fR
        !          1809: to produce the new token tables.
        !          1810: .IP (4)
        !          1811: Add the operator to the grammar in
        !          1812: \fMtran/icon.g\fR.
        !          1813: The token name must be added to the list of terminal symbols
        !          1814: at the beginning of the grammar file,
        !          1815: and the operator must be inserted into the syntax at the
        !          1816: appropriate precedence level.
        !          1817: If the precedence is the same as that of an existing operator,
        !          1818: simply add the operator as an alternative to the existing production;
        !          1819: otherwise, insert a new production,
        !          1820: and change the production at the next lower precedence level
        !          1821: to refer to the new one.
        !          1822: The semantic action should create either a
        !          1823: .I BINOP
        !          1824: or a
        !          1825: .I UNOP
        !          1826: node in the parse tree;
        !          1827: use existing actions as a prototype.
        !          1828: .IP (5)
        !          1829: The new operator must now be added to the code generator in
        !          1830: \fMtran/code.c\fR.
        !          1831: Insert a case in either of the routines
        !          1832: \fMbinop\fR
        !          1833: or
        !          1834: \fMunop\fR
        !          1835: for the new token name
        !          1836: that assigns a new intermediate code opcode to
        !          1837: \fMname\fR,
        !          1838: as for other operators \(em
        !          1839: this causes the new opcode to be emitted into the ucode.
        !          1840: The opcode should have the same name as the library routine
        !          1841: that performs the operation.
        !          1842: .IP (6)
        !          1843: The new intermediate code opcode must also be added to the linker.
        !          1844: Add a defined constant to
        !          1845: \fMlink/opcode.h\fR;
        !          1846: order here is not important.
        !          1847: Then add the opcode name and the defined constant to
        !          1848: \fMlink/opcode.c\fR;
        !          1849: alphabetical order must be preserved here,
        !          1850: since a binary search is used.
        !          1851: Then edit the code generator in
        !          1852: \fMlink/lcode.c\fR,
        !          1853: adding a case in the routine
        !          1854: \fMgencode\fR
        !          1855: with either the binary or the unary operators.
        !          1856: The standard processing here emits code that
        !          1857: evaluates the operand(s),
        !          1858: then calls a library routine
        !          1859: with the same name as the intermediate code opcode.
        !          1860: .IP (7)
        !          1861: Add entries for the operator to \fMh/pnames.h\fR, using other operators as a guide.
        !          1862: .LP
        !          1863: The system is then be ready to be made as described above.
        !          1864: .PP
        !          1865: Adding a new control structure is similar in nature to
        !          1866: adding a new operator.
        !          1867: Most often, a new reserved word must be added to
        !          1868: \fMtran/tokens\fR;
        !          1869: this part of the token table must be kept in alphabetical order.
        !          1870: The new token must be added to the grammar,
        !          1871: and productions must be added,
        !          1872: usually at the highest precedence level
        !          1873: (the same as
        !          1874: \fMif\fR,
        !          1875: for example).
        !          1876: The semantic action for the new production probably will
        !          1877: involve creating a parse tree node of a new type.
        !          1878: The new node type should be added to
        !          1879: \fMtran/tree.h\fR
        !          1880: and a new case in the routine
        !          1881: \fMtraverse\fR
        !          1882: (in
        !          1883: \fMtran/code.c\fR)
        !          1884: should be added to generate intermediate code.
        !          1885: The intermediate code generated can use any of the existing
        !          1886: opcodes or can use new ones created specifically for the new control structure.
        !          1887: If new opcodes are created,
        !          1888: they must be added to the linker as described above,
        !          1889: and a new case in the routine
        !          1890: \fMgencode\fR
        !          1891: must generate code for it.
        !          1892: The generated code can be either entirely in-line
        !          1893: or can call a new library routine.
        !          1894: If new code generation templates are needed,
        !          1895: modify the code emission routines
        !          1896: in
        !          1897: \fMlink/lcode.c\fR.
        !          1898: If the code calls a new library routine,
        !          1899: add it to the
        !          1900: \fMlib\fR
        !          1901: directory and the
        !          1902: \fMMakefile\fR
        !          1903: there.
        !          1904: Then the system is ready to be made.
        !          1905: .PP
        !          1906: Modifying the semantics of existing control structures,
        !          1907: operators, or functions,
        !          1908: often involves changing only the generated in-line code
        !          1909: or a library routine.
        !          1910: Modifying the syntax without disturbing any semantics
        !          1911: usually requires only a change to the grammar.
        !          1912: .PP
        !          1913: Adding a new datatype means making many of the above changes.
        !          1914: A new datatype code must be added to
        !          1915: \fMh/rt.h\fR
        !          1916: and
        !          1917: \fMh/defs.s\fR,
        !          1918: and a new data block format must be defined in \fMh/rt.h\fR.
        !          1919: The size and location of the first descriptor of the new data block
        !          1920: must be entered in
        !          1921: \fMrt/dblocks.c\fR
        !          1922: so that the garbage collector knows how to treat the block.
        !          1923: The routines in
        !          1924: \fMfunctions/image.c\fR
        !          1925: and
        !          1926: \fMrt/outimage.c\fR
        !          1927: must be extended so that images of the new datatype can be produced.
        !          1928: The routines in \fMfunctions/type.c\fR and \fMrt/equiv.c\fR must
        !          1929: also be modified to account for the new type.
        !          1930: In addition, \fMrt/anycmp.c\fR must be extended so that objects
        !          1931: of the new type can be sorted relative to other types.
        !          1932: New functions and operators on the new type may be needed,
        !          1933: and possibly new coercion routines must be added to
        !          1934: \fMrt\fR.
        !          1935: .PP
        !          1936: Adding a new keyword entails a change to
        !          1937: \fMtran/keywords\fR
        !          1938: (and, if a running Version 5 of Icon is not available, to
        !          1939: \fMtran/keyword.h\fR)
        !          1940: and a new case in
        !          1941: \fMlib/keyword.c\fR.
        !          1942: A
        !          1943: \fMmake\fR
        !          1944: in
        !          1945: \fMtran\fR
        !          1946: runs the program
        !          1947: \fMmkkeytab\fR
        !          1948: to produce both
        !          1949: \fMtran/keyword.h\fR
        !          1950: and
        !          1951: \fMtran/keyword.c\fR.
        !          1952: Many keywords require trapped variables,
        !          1953: which requires changes to
        !          1954: \fMh/rt.h\fR,
        !          1955: \fMoperators/asgn.c\fR,
        !          1956: and
        !          1957: \fMrt/deref.c\fR;
        !          1958: the trapped variable for
        !          1959: \fM&random\fR
        !          1960: is a good model.
        !          1961: .PP
        !          1962: As mentioned above,
        !          1963: the examples in this section are intended
        !          1964: to identify what parts of the system are affected
        !          1965: by certain kinds of changes or extensions.
        !          1966: A thorough understanding of the system is
        !          1967: suggested, however, for other than minor changes.
        !          1968: .bp
        !          1969: .SH
        !          1970: Acknowledgements
        !          1971: .PP
        !          1972: This report is a revision of earlier descriptions of the C
        !          1973: implementation of Icon [11, 12], which were co-authored by Cary Coutant
        !          1974: and Steve Wampler.
        !          1975: Much of the material in this report is taken from the earlier ones.
        !          1976: .PP
        !          1977: Many features of the current implementation of Icon are based
        !          1978: upon the original Ratfor implementation by
        !          1979: Dave Hanson, Tim Korb, and Walt Hansen [13, 14].
        !          1980: .sp 1
        !          1981: .SH
        !          1982: References
        !          1983: .IP 1. 5n
        !          1984: Griswold, Ralph E. and Madge T. Griswold.
        !          1985: .I "The Icon Programming Language" .
        !          1986: Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1983.
        !          1987: .IP 2. 5n
        !          1988: Kernighan, Brian W., and Dennis M. Ritchie.
        !          1989: .I "The C Programming Language" .
        !          1990: Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1978.
        !          1991: .IP 3. 5n
        !          1992: Mitchell, William W. \fIPorting the UNIX Implementation of Icon\fR.
        !          1993: Technical Report TR 83-10d, Department of Computer Science, The
        !          1994: University of Arizona, Tucson, Arizona, August 1984.
        !          1995: .IP 4. 5n
        !          1996: Griswold, Ralph E., Robert K. McConeghy, and William H. Mitchell. \fIVersion 5.9 of Icon\fR.
        !          1997: Technical Report, Department of Computer Science, The University of
        !          1998: Arizona, Tucson, Arizona, July 1984.
        !          1999: .IP 5. 5n
        !          2000: Johnson, Stephen C.
        !          2001: ``Yacc: Yet Another Compiler-Compiler''
        !          2002: .I "Unix Programmer's Manual, Seventh Edition" .
        !          2003: Bell Telephone Laboratories, Inc., Murray Hill, New Jersey, 1979.
        !          2004: .IP 6. 5n
        !          2005: Aho, A. V., and S. C. Johnson.
        !          2006: ``LR Parsing''"
        !          2007: .I "Computing Surveys"
        !          2008: .B 6 ,
        !          2009: 2 (June 1974), 99-124.
        !          2010: .IP 7. 5n
        !          2011: Hanson, David R.
        !          2012: ``Storage Management for an Implementation of SNOBOL4'',
        !          2013: .I "Software\(emPractice and Experience"
        !          2014: .B 7 ,
        !          2015: 2 (March 1977), 179-192.
        !          2016: .IP 8. 5n
        !          2017: Hanson, David R.
        !          2018: ``Variable Associations in SNOBOL4'',
        !          2019: .I "Software\(emPractice and Experience"
        !          2020: .B 6 ,
        !          2021: 2 (April 1976), 245-254.
        !          2022: .IP 9. 5n
        !          2023: Hanson, David R.
        !          2024: .I "The Manipulation of Variable-Length String Data in Fortran IV" .
        !          2025: Technical Report,
        !          2026: Department of Computer Science,
        !          2027: The University of Arizona, Tucson, Arizona,
        !          2028: May 1975.
        !          2029: .IP 10. 5n
        !          2030: Griswold, Ralph E. and William H. Mitchell. \fIInstallation and Maintenance
        !          2031: Guide for Version 5.9 of Icon\fR. Technical Report TR 84-13,
        !          2032: Department of Computer Science,
        !          2033: The University
        !          2034: of Arizona,
        !          2035: Tucson, Arizona. August 1984.
        !          2036: .IP 11. 5n
        !          2037: Coutant, Cary A. and Stephen B. Wampler. \fIA Tour Through the
        !          2038: C Implementation of Icon; Version 5\fR. Technical Report TR 81-11a,
        !          2039: Department of Computer Science, The University of Arizona, Tucson,
        !          2040: Arizona, December 1981.
        !          2041: .IP 12. 5n
        !          2042: Griswold, Ralph E., William H. Mitchell, and Stephen B. Wampler.
        !          2043: \fIThe C Implementation of Icon; A Tour Through Version 5\fR.
        !          2044: Technical Report TR 83-11a, Department of Computer Science,
        !          2045: The University of Arizona, Tucson, Arizona. December 1983.
        !          2046: .IP 13. 5n
        !          2047: Korb, John Timothy.
        !          2048: .I "The Design and Implementation of a Goal-Directed Programming Language" .
        !          2049: Ph.D. Dissertation, Technical Report TR 79-11,
        !          2050: Department of Computer Science,
        !          2051: The University of Arizona, Tucson, Arizona,
        !          2052: June 1979.
        !          2053: .IP 14. 5n
        !          2054: Hanson, David R., and Walter J. Hansen.
        !          2055: .I "Icon Implementation Notes" .
        !          2056: Technical Report TR 79-12a,
        !          2057: Department of Computer Science,
        !          2058: The University of Arizona, Tucson, Arizona,
        !          2059: February 1980.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.