Annotation of 43BSD/contrib/icon/docs/tr84-11a.roff, revision 1.1.1.1

1.1       root        1: .TR 84-11
                      2: .DA "August 5, 1984"
                      3: .Gr
                      4: .TL
                      5: A Tour Through the C Implementation of Icon; Version 5.9
                      6: .AU
                      7: Ralph E. Griswold
                      8: .AU
                      9: Robert K. McConeghy
                     10: .AU
                     11: William H. Mitchell
                     12: .AB
                     13: This report documents the C implementation
                     14: of Version 5.9 of the Icon programming language.
                     15: This report concentrates on the
                     16: major parts of the system \(em
                     17: the translator, the linker, and the interpreter.
                     18: An additional section discusses how the implementation
                     19: can be modified for new language features.
                     20: .AE
                     21: .SH
                     22: Introduction
                     23: .PP
                     24: This report describes the C implementation of Version 5.9 of
                     25: the Icon programming language [1].
                     26: Most of the system is coded in C [2] and is designed to be run under \*U.
                     27: In addition to the C portion of the system, there is some assembly
                     28: language code.
                     29: To date, the C implementation has been adapted to the
                     30: PDP-11\u\(dg\d, VAX-11, and Onyx C8002.
                     31: .Un
                     32: .FS
                     33: \u\(dg\dPDP and VAX are trademarks of Digital Equipment Corporation.
                     34: .FE
                     35: This implementation is intended to be portable
                     36: to other computers running under UNIX,
                     37: but portability was not a primary design goal.
                     38: Reference 3 describes the process of transporting this implementation
                     39: and contains detailed descriptions of the assembly language routines for
                     40: the VAX implementation.
                     41: .PP
                     42: The implementation of the Icon system consists of three parts:
                     43: a translator, a linker,
                     44: and an interpreter. The interpreter contains a run-time system that
                     45: includes routines for the operations that are needed to execute an Icon program.
                     46: The translator converts an Icon source program into
                     47: an intermediate code, called
                     48: .I ucode .
                     49: The linker combines separately translated ucode files,
                     50: binds inter-procedure references,
                     51: and produces interpretable binary output, called \fIicode\fR.
                     52: .PP
                     53: The reference language for this report is Version 5.9 of Icon [4].
                     54: This report is intended to be used in conjunction with
                     55: the source listings for Version 5.9,
                     56: although a general overview of the system
                     57: can be obtained from this document alone.
                     58: .NH
                     59: The Translator
                     60: .PP
                     61: The Icon translator is written entirely in C
                     62: and consists of
                     63: 12 files of source code and 10 header files.
                     64: The translator builds a parse tree for each Icon procedure,
                     65: then traverses the tree to generate code.
                     66: Three of the 12 source files contain only data initialization
                     67: and are automatically generated from specification files.
                     68: In addition, the LALR(1) parser is automatically generated
                     69: by the \fIYacc\fR parser generator [5].
                     70: .PP
                     71: The ucode output from the translator consists of two files.
                     72: One file, with the suffix \fM.u1\fR, contains
                     73: intermediate code corresponding to the procedures in the program.
                     74: The second file, with the suffix \fM.u2\fR,
                     75: contains global symbol table information.
                     76: Both files are printable.
                     77: .PP
                     78: The following sections discuss the four parts of the translator:
                     79: the lexical analyzer,
                     80: the parser,
                     81: the code generator,
                     82: and the symbol table manager.
                     83: .NH 2
                     84: The Lexical Analyzer
                     85: .PP
                     86: The lexical analyzer reads the Icon source program,
                     87: breaks it into tokens,
                     88: and delivers the tokens to the parser as requested.
                     89: A token is the basic syntactic unit of the Icon language;
                     90: it may be an identifier, a literal, a reserved word,
                     91: or an operator (operators include punctuation).
                     92: .PP
                     93: The lexical analyzer consists of four source files:
                     94: \fMlex.c\fR,
                     95: \fMchar.c\fR,
                     96: \fMoptab.c\fR,
                     97: and
                     98: \fMtoktab.c\fR.
                     99: The latter two of these files contain operator and token tables, respectively,
                    100: and are automatically generated
                    101: from operator and token specification files,
                    102: described below.
                    103: The file
                    104: \fMchar.c\fR
                    105: contains character mapping tables and the file
                    106: \fMlex.c\fR
                    107: contains the lexical analyzer itself.
                    108: .PP
                    109: The parser requests a token by calling
                    110: \fMyylex\fR,
                    111: which finds the next token in the source program
                    112: and determines its token type and value.
                    113: The parser bases its moves on the token type:
                    114: if the token is an operator or reserved word,
                    115: the token type specifically identifies the operator or reserved word;
                    116: otherwise, the token type indicates one of the six \*(oqprimitive\*(cq types:
                    117: identifier, integer literal, real literal, string literal, cset literal, or end-of-file.
                    118: The token value is a leaf node of the parse tree,
                    119: which, for the primitive types,
                    120: contains the source program representation of the token.
                    121: The token value node also contains
                    122: the source-program line and column numbers where the token starts.
                    123: A pointer to this node is placed in the global variable
                    124: \fMyychar\fR,
                    125: and
                    126: \fMyylex\fR
                    127: returns the token type.
                    128: .PP
                    129: The lexical analyzer finds the next token
                    130: by skipping white space, including comments.
                    131: The first character of the new token indicates which of the classes it belongs to.
                    132: A letter or underscore begins an identifier or reserved word,
                    133: a digit begins an integer or real literal,
                    134: a double quote begins a string literal,
                    135: a single quote begins a cset literal,
                    136: and any other character is assumed to begin an operator.
                    137: An identifier or reserved word is completed
                    138: by gathering all subsequent letters, digits, and underscores.
                    139: The reserved word table is consulted to determine if the token
                    140: is an identifier or a reserved word.
                    141: Numeric literals are recognized by a finite-state automaton,
                    142: which distinguishes real from integer literals by the
                    143: presence of a decimal point or the letter \*(oqe\*(cq.
                    144: A quoted literal is completed by reading
                    145: until the opening delimiter is repeated,
                    146: converting escapes in the process and continuing to new lines as necessary.
                    147: A table-driven finite-state automaton, described below,
                    148: recognizes operators.
                    149: .PP
                    150: An important task of the lexical analyzer is semicolon insertion.
                    151: The grammar requires that semicolons separate expressions
                    152: in a compound expression or procedure body,
                    153: so they must be inserted into the token stream
                    154: where they are omitted in the source program.
                    155: This process is table driven.
                    156: Associated with each token type are two flags,
                    157: .I BEGINNER
                    158: and
                    159: .I ENDER .
                    160: The
                    161: .I BEGINNER
                    162: flag is true if a token may legally begin an expression
                    163: (i.e., if it may follow a semicolon).
                    164: Similarly, the
                    165: .I ENDER
                    166: flag is true if a token may legally end an expression
                    167: (i.e., if it may precede a semicolon).
                    168: When a newline appears between two tokens, the
                    169: .I ENDER
                    170: flag of the first is true, and the
                    171: .I BEGINNER
                    172: flag of the second is true,
                    173: then a semicolon is inserted between the two tokens.
                    174: .PP
                    175: The token table is initialized in the file
                    176: \fMtoktab.c\fR.
                    177: The table is divided into three sections:
                    178: primitive types, reserved words, and operators.
                    179: The primitive types are fixed in the first six
                    180: slots in the table,
                    181: and must not be changed,
                    182: since they are referenced directly from the code.
                    183: The reserved words follow
                    184: and must be in alphabetical order.
                    185: The operators follow in no special order.
                    186: The last entry merely marks the end of the table.
                    187: .PP
                    188: Also in
                    189: \fMtoktab.c\fR
                    190: is an index to reserved words.
                    191: To speed up the search for reserved words,
                    192: this table hashes the search
                    193: using the first letter as the hash value.
                    194: The reserved words that begin with that letter then are examined linearly.
                    195: .PP
                    196: The operator table, in
                    197: \fMoptab.c\fR,
                    198: describes a finite-state automaton that
                    199: recognizes each operator in the language.
                    200: Each state is represented by an array of structures.
                    201: Each structure in the array
                    202: corresponds to a transition on the input symbol.
                    203: The structure contains three fields:
                    204: an input symbol,
                    205: an action,
                    206: and a value used by the action.
                    207: The recognizer starts in state 0;
                    208: the current input symbol is the first character
                    209: of the operator.
                    210: In a given state with a given input symbol,
                    211: the recognizer searches the array associated with the current state
                    212: for an entry that matches the current input symbol.
                    213: Failing a match, the last entry of the array, with
                    214: the input symbol field of 0,
                    215: is used.
                    216: The recognizer then performs one of the following actions,
                    217: depending on the value of the action field:
                    218: .in .5i
                    219: .IP \(bu \w'\(bu'u+1m
                    220: goes to the new state indicated by the value field
                    221: and gets the next input character
                    222: .IP \(bu
                    223: issues an error
                    224: .IP \(bu
                    225: returns the value field as
                    226: a pointer to the token table entry for the operator
                    227: .IP \(bu
                    228: returns the value field,
                    229: but pushes the current input character back onto the input.
                    230: .in 0
                    231: .LP
                    232: The difference between the last two actions is that
                    233: some operators are recognized immediately (e.g., \*(oq\fM;\fR\*(cq),
                    234: while others are not recognized until the character
                    235: following the operator is read (e.g., \*(oq\fM=\fR\*(cq).
                    236: .PP
                    237: The token table
                    238: and operator table
                    239: are automatically constructed by the Icon program
                    240: \fMmktoktab.icn\fR.
                    241: This program reads the specification file
                    242: \fMtokens\fR
                    243: and builds the file
                    244: \fMtoktab.c\fR.
                    245: The file
                    246: \fMtokens\fR
                    247: contains a list of all the tokens,
                    248: their token types (given as defined constants),
                    249: and any associated flags.
                    250: This list is divided into the three sections detailed above.
                    251: The program then reads the specification file
                    252: \fMoptab\fR
                    253: and builds the file
                    254: \fMoptab.c\fR.
                    255: The former is a skeleton for the operator table;
                    256: it contains the state tables,
                    257: but the program fills in the pointers to the token table entries.
                    258: .NH 2
                    259: The Parser
                    260: .PP
                    261: The parser, in the file
                    262: \fMparse.c\fR,
                    263: is automatically generated by \fIYacc\fR.
                    264: The grammar and semantic actions are contained in the file
                    265: \fMicon.g\fR.
                    266: From these specifications,
                    267: \fIYacc\fR generates parser tables for an LALR(1) parser.
                    268: .PP
                    269: In addition to the grammar,
                    270: \fMicon.g\fR
                    271: contains
                    272: a list of all the token types in the language
                    273: and declarations necessary to the actions.
                    274: \fIYacc\fR assigns an integer value to each token type,
                    275: and generates define statements,
                    276: which are written to the file
                    277: \fMtoken.h\fR.
                    278: These defined constants are the token types
                    279: returned by the lexical analyzer.
                    280: .PP
                    281: The grammar is context-free,
                    282: with actions associated with most of the rules.
                    283: An action is invoked when the corresponding rule is reduced.
                    284: The actions perform two duties:
                    285: maintaining the symbol tables
                    286: and constructing the parse tree.
                    287: The parse tree is built from the bottom up \(em
                    288: the leaves are supplied by the lexical analyzer
                    289: and the actions build trees from the leaves and from smaller trees
                    290: with each reduction.
                    291: .PP
                    292: The parser requests tokens from the lexical analyzer,
                    293: building a parse tree until it reduces a procedure.
                    294: At this point, it passes the root of the parse tree
                    295: to the code generator.
                    296: Once the intermediate code has been generated,
                    297: the parse tree is discarded,
                    298: and a new tree is begun for the next procedure.
                    299: .PP
                    300: Record and global declarations
                    301: affect only the symbol table
                    302: and do not generate parse trees.
                    303: Files named in link directives produce link instructions in the ucode
                    304: output.
                    305: .PP
                    306: A complete parse tree is rooted at a
                    307: \f3proc\fR
                    308: node, which identifies the procedure
                    309: and points to the subtrees for the
                    310: \fMinitial\fR
                    311: clause (if any)
                    312: and the body of the procedure.
                    313: Each node in the parse tree represents a source program construction
                    314: or some implicit semantic action.
                    315: A node can contain up to six fields,
                    316: the first of which is the node type.
                    317: The second and third fields are always line and column numbers
                    318: that are used for error messages and tracing.
                    319: Any additional fields contain information about the construction,
                    320: and possibly pointers to subtrees.
                    321: Appendix A contains a description of all the node types.
                    322: .PP
                    323: The grammar, shown in Appendix B, has several ambiguities.
                    324: The well-known \*(oqdangling else\*(cq problem exists
                    325: not only in the
                    326: \fMif-then-else\fR
                    327: expression, but also in the
                    328: \fMwhile-do\fR,
                    329: \fMuntil-do\fR,
                    330: \fMevery-do\fR,
                    331: and \fMto-by\fR
                    332: expressions.
                    333: In each of these expressions, the last clause is optional,
                    334: so that when the parser sees an
                    335: \fMelse\fR,
                    336: for example,
                    337: it does not know whether to shift the token
                    338: (associating it with the most recent
                    339: \fMif\fR),
                    340: or to reduce the preceding
                    341: \fMif-then\fR
                    342: expression (leaving the else \*(oqdangling\*(cq).
                    343: The latter choice is obviously incorrect, since the
                    344: \fMelse\fR
                    345: would never be shifted, and \fIYacc\fR correctly resolves such conflicts
                    346: in favor of the shift.
                    347: Thus, each
                    348: \fMelse\fR
                    349: is paired with the most recent unpaired
                    350: \fMif\fR.
                    351: All the control structures (except
                    352: \fMcase\fR)
                    353: have an additional ambiguity:
                    354: they do not have a closed syntax,
                    355: yet they may appear in an expression at the highest precedence level.
                    356: For example, the expression
                    357: .DS
                    358: .tr -\-+\(pl*\(**
                    359: \fMx := y + if a = b then z else -z * 3
                    360: .tr --++**
                    361: .ft R
                    362: .DE
                    363: could parse in either of two ways:
                    364: .DS
                    365: .tr -\-+\(pl*\(**
                    366: \fMx := y + (if a = b then z else (-z * 3))
                    367: x := y + (if a = b then z else -z) * 3
                    368: .tr --++**
                    369: .ft R
                    370: .DE
                    371: This problem, too, is resolved in favor of the shift,
                    372: such that the first parse is always used.
                    373: Thus, in the absence of parentheses,
                    374: the entire expression to the right of a control structure
                    375: is part of the control structure.
                    376: .PP
                    377: Little attention has been paid to error recovery.
                    378: A few error productions have been placed in the grammar
                    379: to enable \fIYacc\fR to recover from syntax errors;
                    380: the technique for doing so is described by Aho and Johnson [6].
                    381: The parser is slightly modified by the editor script
                    382: \fMpscript\fR
                    383: so that the parser state is passed to the routine
                    384: \fMyyerror\fR.
                    385: This routine prints an error message from the file
                    386: \fMsynerr.h\fR
                    387: that is associated with the current parser state.
                    388: This error table currently is constructed by
                    389: hand from the \fMy.output\fR file obtained by
                    390: running \fIYacc\fR with the
                    391: \f3\-v\fR
                    392: option.
                    393: .NH 2
                    394: The Code Generator
                    395: .PP
                    396: The parser calls the code generator upon
                    397: recognition of each Icon procedure,
                    398: giving it the root of the parse tree.
                    399: The code generator traverses the parse tree recursively,
                    400: emitting ucode.
                    401: Appendix C contains a description of ucode.
                    402: .PP
                    403: The file
                    404: \fMcode.c\fR
                    405: contains both the tree node allocation and the code generation routines.
                    406: The included header file
                    407: \fMcode.h\fR
                    408: contains macros and definitions needed by the code generator,
                    409: while
                    410: \fMtree.h\fR
                    411: defines the tree nodes and the macros that allocate them.
                    412: The macros in
                    413: \fMtree.h\fR
                    414: provide the interface between the parser and the code generator.
                    415: .PP
                    416: The tree traversal routine,
                    417: \fMtraverse\fR,
                    418: is a recursive procedure with one argument,
                    419: a pointer to the root of a tree or subtree
                    420: for which code is to be generated.
                    421: The routine examines the type field of the root
                    422: and, through a switch statement,
                    423: generates a sequence of ucode instructions as determined by the type.
                    424: If the node has subtrees,
                    425: \fMtraverse\fR
                    426: calls itself recursively at the appropriate point
                    427: to generate code for the subtree.
                    428: For example, the code generated for a binary operator
                    429: first generates code for its two subexpressions,
                    430: then emits the code that calls the appropriate run-time library routine.
                    431: .PP
                    432: The returned value of the traversal routine is used for counting
                    433: elements of expression lists.
                    434: If the root of the tree being traversed is an
                    435: \f3elist\fR
                    436: (expression list) node,
                    437: \fMtraverse\fR
                    438: returns the sum of the returned values of its two subtrees.
                    439: Otherwise, it returns 1.
                    440: This count is used when generating code for
                    441: procedure calls and lists with explicit elements,
                    442: which need to know the number of arguments
                    443: to be pushed onto the stack.
                    444: .PP
                    445: When generating code for loops,
                    446: the code generator needs to save three pieces of information
                    447: for each nested loop:
                    448: the
                    449: .I "break label" ,
                    450: the
                    451: .I "next label" ,
                    452: and the expression nesting level.
                    453: This information is kept on the
                    454: .I "loop stack" .
                    455: The break label is a label placed just past the end of the loop;
                    456: it is the place where control is passed when the loop is finished.
                    457: The next label is placed near the end of the loop,
                    458: at a point where the next iteration of the loop can be started.
                    459: The code for
                    460: \fMbreak\fR
                    461: and
                    462: \fMnext\fR
                    463: expressions branches to these labels,
                    464: but in either case, any incomplete expression frames (see Section 3.2)
                    465: within the loop must first be popped from the stack.
                    466: The expression nesting level counts the number of currently active
                    467: expression frames within the current loop;
                    468: an
                    469: \f3unmark\fR
                    470: instruction is generated for that many expression frames
                    471: (less one for a
                    472: \fMnext\fR
                    473: expression).
                    474: .PP
                    475: The possibility of nested
                    476: \fMcase\fR
                    477: expressions requires that certain information be kept on a
                    478: .I "case stack" .
                    479: For each case expression,
                    480: the code generator allocates a label for the end of the expression
                    481: and pushes it onto the case stack.
                    482: When a
                    483: \fMdefault\fR
                    484: clause is encountered,
                    485: its subtree is placed on the top of the case stack to delay
                    486: code generation for it until the end of the \fMcase\fR expression.
                    487: .NH 2
                    488: The Symbol Table Manager
                    489: .PP
                    490: The symbol table manager consists of the symbol table data structures
                    491: and routines that operate upon these data structures.
                    492: The source code for the symbol table manager is contained in two files.
                    493: The file
                    494: \fMkeyword.c\fR
                    495: contains only the keyword name table and is automatically constructed from a
                    496: keyword specification file discussed below.
                    497: The remainder of the symbol table manager is located in the file
                    498: \fMsym.c\fR.
                    499: .PP
                    500: The symbol table manager operates with two logical data structures,
                    501: the symbol table proper and the string space.
                    502: When the lexical analyzer identifies a token as either
                    503: an identifier or a literal,
                    504: the lexical analyzer requests the symbol table manager
                    505: to enter the token into the string space.
                    506: The symbol table manager returns a pointer into
                    507: the string space for that string.
                    508: The lexical analyzer then places this pointer in the token value node.
                    509: To help keep the size of the string space small,
                    510: all entries are hashed,
                    511: and only one copy of any string is kept.
                    512: This has the added benefit that two strings can be compared
                    513: by checking only the pointers into the string space.
                    514: .PP
                    515: The parser determines the context of the token
                    516: and requests the symbol table manager
                    517: to enter the token into the symbol table proper.
                    518: It is the responsibility of the symbol table manager
                    519: to verify that the use of the token is consistent with prior use.
                    520: Appropriate diagnostics are issued if the use is inconsistent.
                    521: .PP
                    522: The symbol table proper is physically divided into three separate structures:
                    523: the
                    524: .I global ,
                    525: .I local ,
                    526: and
                    527: .I literal
                    528: tables.
                    529: Each of these tables is hashed,
                    530: using the pointer into the string space as the key.
                    531: Since this pointer is an offset into the string space,
                    532: hashing is simply and effectively performed by taking the rightmost
                    533: .I n
                    534: bits of the offset
                    535: (where
                    536: .if n .I n "" 2**
                    537: .if t 2\s-2\u\fIn\fR\d\s+2
                    538: is the size of the hash vector for the table).
                    539: .PP
                    540: The global table contains identifiers that have been declared as
                    541: globals, procedures, or records.
                    542: The local table holds all identifiers declared as locals,
                    543: formal parameters for procedure declarations, field names for record
                    544: declarations, and all other identifiers referenced in the procedure
                    545: (including those declared as global elsewhere).
                    546: The literal table contains entries for literal strings and csets, integers,
                    547: and floating-point constants.
                    548: .PP
                    549: Both the local and literal tables are associated with the current
                    550: procedure being parsed
                    551: and are written to the \fM.u1\fR file
                    552: when the procedure has been successfully parsed.
                    553: If a record declaration has been parsed, then the local table,
                    554: containing only the field name identifiers, is written to the
                    555: \&\fM.u2\fR file.
                    556: After all procedure, record, and global declarations in a Icon source
                    557: file have been parsed, the global table is written into the global
                    558: declarations file.
                    559: .PP
                    560: An entry into any of the three symbol table sections is a structure with
                    561: three fields:
                    562: a link,
                    563: a name,
                    564: and a flag.
                    565: The link field holds the pointer to the next entry in the same hash bucket.
                    566: The name is the pointer to the identifier or literal name in the string space.
                    567: The flag field contains the type
                    568: .I "formal parameter" , (
                    569: .I "static local" ,
                    570: .I "procedure name" ,
                    571: etc.) of the entry.
                    572: Global table entries have a fourth field,
                    573: an integer providing the number of formal parameters for a procedure declaration,
                    574: or the number of fields in a record declaration.
                    575: .PP
                    576: Lookup in the local and global tables is merely the process of
                    577: following a hash chain until an entry of the same name is found or until
                    578: the hash chain is exhausted.
                    579: If a previous entry is found, the flags of the existing and new entries
                    580: are compared, and diagnostics are printed if the use of the new entry
                    581: conflicts with the previous usage.
                    582: The new entry is ignored whenever such an inconsistency is found.
                    583: .PP
                    584: The literal table uses the same lookup procedure, except
                    585: the search down the hash chain stops when an entry is found
                    586: with the same textual form and flag fields.
                    587: Thus the string literal
                    588: \fM"123"\fR
                    589: and the integer literal
                    590: \fM123\fR
                    591: have separate entries in the literal table, even though
                    592: they have the same string representations.
                    593: A consequence of this technique is that the integer
                    594: literals
                    595: \fM123\fR
                    596: and
                    597: \fM0123\fR
                    598: have separate entries in the literal table,
                    599: even though they have the same numeric value.
                    600: Since most programmers use a reasonably consistent style
                    601: when expressing literals,
                    602: this technique usually does not produce many
                    603: duplicate constants.
                    604: .PP
                    605: .tr ##
                    606: A final task of the symbol table manager is
                    607: the identification of keyword names.
                    608: (Note that keywords are of the form \fM&\fIname\fR.)
                    609: The symbol table manager maintains a list of the legal keyword names
                    610: and, upon request, returns a numeric identification for a keyword name
                    611: to the parser.
                    612: An automatic procedure exists for creating the keyword name table:
                    613: the Icon program
                    614: \fMmkkeytab.icn\fR
                    615: reads the specification file
                    616: \fMkeywords\fR
                    617: and produces the keyword name table in
                    618: \fMkeyword.c\fR.
                    619: The file
                    620: \fMkeywords\fR
                    621: is simply a list of the keyword names and a numeric identification for each.
                    622: Since the number of keyword names is small, and only a few
                    623: references to keywords are typical in an Icon program, lookup in the
                    624: keyword name table is done using a linear search.
                    625: .PP
                    626: The sizes of the respective portions of the symbol table
                    627: may be altered with command line arguments to the Icon translator.
                    628: .NH
                    629: Linker
                    630: .PP
                    631: The Icon linker is written entirely in C. It consists of eight files of
                    632: source code and three header files.
                    633: The linker performs three tasks:
                    634: combining the global symbol tables from one or more
                    635: runs of the translator,
                    636: resolving undeclared identifiers,
                    637: and translating ucode to icode.
                    638: The resulting combined global symbol table is used
                    639: for determining the scope of undeclared identifiers
                    640: during the second task.
                    641: The second and third tasks are done during a single pass
                    642: over each intermediate code file.
                    643: A single file of assembly code is produced.
                    644: .PP
                    645: The symbol table module, in the file
                    646: \fMlsym.c\fR,
                    647: is similar to the symbol table module of the translator,
                    648: except that there is an additional table for storing
                    649: field names of records.
                    650: The input module, in the file
                    651: \fMllex.c\fR,
                    652: recognizes the instructions in both the global symbol table files
                    653: and the intermediate code files.
                    654: The global symbol tables are merged by the routine \fMglobals\fR in
                    655: \fMglob.c\fR,
                    656: and the intermediate code files are produced by the routines in
                    657: \fMlcode.c\fR.
                    658: Of the remaining source files,
                    659: \fMilink.c\fR
                    660: and
                    661: \fMlmem.c\fR
                    662: contain the main program, miscellaneous support routines,
                    663: and memory initialization.
                    664: The files
                    665: \fMbuiltin.c\fR
                    666: and
                    667: \fMopcode.c\fR
                    668: contain table initializations for the list of built-in procedures
                    669: (functions)
                    670: and the ucode operations, respectively.
                    671: .PP
                    672: The first phase of the linker reads
                    673: the global symbol table file from each translator run,
                    674: and entering all the global symbols into one combined table.
                    675: The format of a global symbol table file is described in Appendix C.
                    676: This phase also builds the record/field table that cross-references
                    677: records and field names,
                    678: and sets the trace flag for execution-time tracing
                    679: if any of the files being linked were translated with the
                    680: \f3\-t\fR
                    681: option.
                    682: .PP
                    683: As records are entered into the global symbol table
                    684: and the record/field table,
                    685: they are numbered, starting from 1.
                    686: These record numbers are used to index the record/field table
                    687: at run-time when referencing a field.
                    688: .PP
                    689: When the linker encounters a link instruction, the named file is added
                    690: to the end of a linked list of files to be linked. The list initially
                    691: consists of the files named as arguments. Names are not added to the
                    692: list if they are already on it.
                    693: .PP
                    694: The second phase reads each intermediate code file in sequence,
                    695: emitting icode as each procedure is encountered.
                    696: Appendix C describes the intermediate code.
                    697: The intermediate code contains a prologue for each procedure,
                    698: beginning with a
                    699: \f3proc\fR
                    700: opcode, followed by a series of
                    701: \f3loc\fR
                    702: opcodes describing the local symbol table, a series of
                    703: \f3con\fR
                    704: opcodes describing the constant table, and a
                    705: \f3declend\fR
                    706: opcode terminating the prologue.
                    707: The local symbol table contains not only local symbols,
                    708: but all identifiers referenced in the procedure \(em
                    709: global, local, or undeclared.
                    710: When an undeclared identifier is entered into the local symbol table,
                    711: its scope is resolved by the following steps:
                    712: .in .5i
                    713: .IP \(bu \w'\(bu'u+1m
                    714: if the identifier has been entered in the global symbol table,
                    715: it is entered into the local symbol table as a global identifier
                    716: .IP \(bu
                    717: if the identifier matches the name of a function,
                    718: it is entered into the local symbol table as a function
                    719: .IP \(bu
                    720: otherwise it is entered as a local identifier
                    721: and a warning is issued if the linker was run with the
                    722: \f3\-u\fR
                    723: option
                    724: .in 0
                    725: .LP
                    726: The constant table contains an entry for each literal used
                    727: in the procedure.
                    728: .PP
                    729: The linker outputs icode in several regions.
                    730: The first region contains constants, procedure blocks, and code
                    731: for the Icon procedures.
                    732: The next region contains the record/field table and procedure blocks
                    733: for record constructors.
                    734: The next four regions contain
                    735: the global identifiers, the names of the global identifiers,
                    736: the static identifiers, and the identifier table.
                    737: The icode is a sequence of instructions,
                    738: each with an opcode and, in some cases, operands. The sizes of
                    739: opcodes and operands depend on the machine architecture and the
                    740: implementor's judgement. On the
                    741: VAX-11, opcodes are one byte long and operands are four bytes long.
                    742: Most instructions correspond exactly to instructions
                    743: in the ucode that is output by the translator.
                    744: The opcode values are those used internally by the linker
                    745: (defined in the file
                    746: \fMlink/opcode.h\fR).
                    747: .PP
                    748: Fields are provided in the global symbol and literal tables
                    749: for associating a location with each entry.
                    750: As the prologue is being read, each cset, real, or long-integer
                    751: literal entered into the literal table is output immediately
                    752: and its location is stored in the literal table.
                    753: Thus, the locations of all constants are known before their reference.
                    754: .PP
                    755: The same is true of references to procedures,
                    756: since these references only occur in the initialization for global identifiers,
                    757: which is not output until all procedures have been output.
                    758: When the prologue for a procedure has been completely processed,
                    759: the procedure data block is output, and its location is noted in the
                    760: global symbol table.
                    761: .PP
                    762: References to program labels require backpatching,
                    763: since there often are forward references.
                    764: Because program label references are always local to the current procedure,
                    765: the linker buffers the output code for a procedure.
                    766: A table of values for all program labels is initialized to zero at
                    767: the beginning of each procedure.
                    768: When a label is referenced and its table entry is zero,
                    769: the location of the reference is negated and stored in the table entry
                    770: and a zero is output for the operand.
                    771: If a label's table entry is negative,
                    772: the location of the reference is negated and stored in the table entry
                    773: as before, but the previous value of the table entry is output for the operand.
                    774: This forms a linked list of references to the as-yet-undefined label.
                    775: When a label is defined, each reference on the linked list is replaced
                    776: with the correct value of the label.
                    777: .PP
                    778: References to global and static identifiers are determined at run-time.
                    779: The
                    780: \f3glob\fR
                    781: and
                    782: \f3static\fR
                    783: instructions have an integer operand referring to the identifier by
                    784: position in the global or static identifier region.
                    785: When one of these instructions is interpreted,
                    786: the actual address is calculated from the position and the
                    787: known address of the global or static identifier region.
                    788: References to functions are also resolved at run-time.
                    789: Each function is assigned an integer index
                    790: (its position in the table of functions in
                    791: \fMbuiltin.c\fR).
                    792: When the global identifier initialization for a function is output,
                    793: the negated index is output instead of an address.
                    794: The interpreter fills in the correct address during program initialization.
                    795: .PP
                    796: Once the prologue has been processed,
                    797: a procedure data block (see Section 3.1)
                    798: is emitted.
                    799: Opcodes following the prologue represent execution-time operations,
                    800: and cause code to be emitted.
                    801: .PP
                    802: The record/field table is a two-dimensional matrix,
                    803: first indexed by a field number assigned to each identifier that
                    804: is used as a field name,
                    805: next by a record number assigned to each record type.
                    806: The value at the selected position in the table is the
                    807: index of the field in a record of the given type, or \-1
                    808: if the given record type does not contain the given field.
                    809: .PP
                    810: The initial value for global and static identifiers
                    811: is the null value unless the global identifier is a procedure,
                    812: function, or record constructor,
                    813: in which case the initial value is a descriptor of type
                    814: .I procedure
                    815: pointing to the appropriate procedure data block.
                    816: The values output use the data representations described
                    817: in Section 3.1.
                    818: .PP
                    819: The names of global and static identifiers are output as
                    820: .I "string qualifier"
                    821: descriptors (see Section 3.1)
                    822: and are used by the function
                    823: \fMdisplay\fR.
                    824: All string qualifiers contained in the generated procedure data blocks
                    825: and global and static names point into the identifier table,
                    826: which is just a static string space for that purpose.
                    827: .NH
                    828: The Interpreter
                    829: .PP
                    830: The interpreter consists of an interpretive loop and a collection of
                    831: run-time routines that
                    832: collectively provide support for the execution
                    833: of an Icon program.
                    834: .ig
                    835: This library is searched by the loader for those
                    836: routines necessary for a particular Icon program.
                    837: The assembly code generated by the linker contains
                    838: subroutine calls to library routines to perform most
                    839: high-level operations where in-line code would be inappropriate.
                    840: An executable program is created by assembling
                    841: the linker output, then loading a startup routine and
                    842: the assembler output with the run-time library and the C library.
                    843: The startup routine, run-time library, and C library together
                    844: form the run-time system.
                    845: ..
                    846: .PP
                    847: Three directories contain routines relating directly to source-language
                    848: operations:
                    849: \fMfunctions\fR,
                    850: \fMoperators\fR,
                    851: and
                    852: \fMlib\fR.
                    853: The first two directories contain
                    854: one routine per function or operator, respectively.
                    855: The
                    856: \fMlib\fR
                    857: directory contains routines relating to Icon control structures.
                    858: A fourth directory,
                    859: \fMrt\fR,
                    860: contains routines for performing common operations
                    861: needed by many routines in the other three directories.
                    862: In particular,
                    863: \fMrt\fR
                    864: contains routines that handle storage allocation and reclamation,
                    865: type conversion, data comparison,
                    866: integer arithmetic with overflow checking,
                    867: generator suspension,
                    868: and tracing.
                    869: The directory \fMiconx\fR contains initialization and the interpreter proper.
                    870: .PP
                    871: In each of the four run-time directories, all of the object files are
                    872: archived in a \fMLib\fR file which is randomized to speed loading.
                    873: The \fMLib\fR files are loaded together with a startup routine and the
                    874: interpretive loop to produce the interpreter.
                    875: .PP
                    876: Most of the run-time system is coded in C,
                    877: but some of the routines are coded in assembly language.
                    878: The interpretive loop and startup routines are written in assembly language, as is
                    879: integer arithmetic with overflow checking
                    880: (C does not provide this),
                    881: as well as other routines concerned with stack management.
                    882: .PP
                    883: The interpreter is loaded with the run-time libraries and the C
                    884: library to form the program \fMiconx\fR, which interprets
                    885: icode.
                    886: .PP
                    887: Before the interpreter begins executing the Icon program,
                    888: it reads in the icode file generated by the linker.
                    889: The first eight words of this file contain header information
                    890: indicating the total size of the rest of the file,
                    891: the initial value of
                    892: \fM&trace\fR,
                    893: and the relative offsets from the beginning of the file to the
                    894: various regions.
                    895: These offsets are converted to actual addresses by adding the base
                    896: address of the icode buffer.
                    897: Several pointers in the icode must also be relocated.
                    898: The interpreter sweeps through the global identifiers,
                    899: looking for procedures, functions, and record constructors.
                    900: For each function, it supplies the address of the appropriate procedure block.
                    901: For each procedure, it relocates pointers
                    902: from its procedure block to the procedure entry point,
                    903: as well as to strings representing the procedure
                    904: and local identifier names in the identifier table.
                    905: For each record, it supplies the address of
                    906: \fMmkrec\fR,
                    907: the routine that constructs new records,
                    908: as the entry point field in the procedure block.
                    909: .PP
                    910: The interpreter then begins execution by invoking the value of the first global
                    911: identifier, which corresponds to the procedure
                    912: \fMmain\fR.
                    913: If there is no main procedure, the first global identifier has the
                    914: null value and a run-time error is reported.
                    915: The routine
                    916: \fMinvoke\fR
                    917: sets
                    918: the
                    919: \fIinterpreter program counter \fR(\fIipc\fR)
                    920: to the entry point, and branches to
                    921: \fMinterp\fR, which is contained in \fMiconx/interp.s\fR.
                    922: .PP
                    923: The routine
                    924: \fMinterp\fR
                    925: is the main interpreter loop.
                    926: It fetches the next opcode,
                    927: and branches to the appropriate processing routine through a jump table.
                    928: .NH 2
                    929: Data Representations
                    930: .PP
                    931: Icon has two elementary forms of data objects \(em values and variables.
                    932: Values often can be converted from one data type to another.
                    933: When this is done automatically, it is called
                    934: .I coercion .
                    935: There are three kinds of variables, each discussed below:
                    936: .I "natural variables" ,
                    937: .I "created variables" ,
                    938: and
                    939: .I "trapped variables" .
                    940: The process of obtaining the value referred to by a variable is called
                    941: .I dereferencing .
                    942: .PP
                    943: Every data object is represented by a two-word
                    944: .I descriptor ,
                    945: which may, depending on the type of the object,
                    946: contain a value or
                    947: refer to some other area of memory for the actual value.
                    948: The first word of the descriptor always indicates the data type,
                    949: and the second word either contains the value or a pointer to it.
                    950: There are six descriptor formats, shown in Appendix D:
                    951: .I null ,
                    952: .I "string qualifier" ,
                    953: .I "integer" ,
                    954: .I value ,
                    955: .I variable ,
                    956: and
                    957: .I "trapped variable" .
                    958: These formats are distinguished from one another by the three high-order
                    959: bits of the first word,
                    960: except that a
                    961: .I null
                    962: descriptor is distinguished from a
                    963: .I "string qualifier"
                    964: only by the contents of the second word.
                    965: Among
                    966: .I "integer" ,
                    967: .I value ,
                    968: and
                    969: .I "trapped variable"
                    970: descriptors, the low-order six bits of the first word
                    971: identify the type of object represented,
                    972: while the remaining high-order bits in the first word are flags that
                    973: classify the object (for example, whether the second word contains
                    974: a pointer
                    975: \(em historically, a \*(oqfloating address\*(cq [7]).
                    976: .PP
                    977: The
                    978: .I null
                    979: descriptor represents the null value.
                    980: A
                    981: .I "string qualifier"
                    982: represents a string, and contains the length of the string
                    983: and a pointer to the first character of the string.
                    984: An
                    985: .I "integer"
                    986: descriptor represents an integer small enough
                    987: to fit in the second word of the descriptor.
                    988: This includes all integers on computers whose C \fIints\fR are the same
                    989: size as C \fIlongs\fR (such as the VAX-11).
                    990: All data types other than \f3integer\fR, \f3string\fR, and \f3null\fR are
                    991: represented by
                    992: .I value
                    993: descriptors.
                    994: A value descriptor
                    995: contains a pointer to a
                    996: .I "data block"
                    997: of appropriate format for a value of the given type.
                    998: On computers whose C \fIlongs\fR are longer than C \fIints\fR
                    999: (such as the PDP-11),
                   1000: an integer that requires more bits than there are in an \fIint\fR is contained
                   1001: in a \fIlong integer\fR data block.
                   1002: The data block formats for each data type are shown in Appendix D.
                   1003: .PP
                   1004: A
                   1005: .I variable
                   1006: descriptor represents either a natural variable
                   1007: or a created variable.
                   1008: A natural variable contains a pointer to a descriptor at a fixed location
                   1009: (for a global identifier) or a location on the stack (for a local identifier)
                   1010: where the value of the variable is stored.
                   1011: A created variable, formed by a table, list, or field reference,
                   1012: contains a pointer to a descriptor in the aggregate
                   1013: where the referenced element is located.
                   1014: Since such elements are in the heap,
                   1015: created variables also contain an offset that indicates
                   1016: the distance (in words) from the beginning of the data block
                   1017: to the referenced descriptor.
                   1018: This offset is used during the marking phase of garbage collection,
                   1019: discussed in Section 3.3.
                   1020: .PP
                   1021: A
                   1022: .I "trapped variable"
                   1023: [8] descriptor represents a variable for which special action is necessary
                   1024: upon dereferencing or assignment.
                   1025: Such variables include substrings,
                   1026: non-existent elements of tables,
                   1027: and certain keywords.
                   1028: Each type of trapped variable is distinguished by the first word
                   1029: of the descriptor.
                   1030: .PP
                   1031: Substring trapped variables,
                   1032: created by a section or subscripting operation,
                   1033: contain a pointer to a data block that contains a
                   1034: .I variable
                   1035: descriptor identifying the value from which the substring was taken,
                   1036: an integer indicating the beginning position of the substring,
                   1037: and an integer showing the length of the substring.
                   1038: With this information, assignment to a substring of a variable
                   1039: can modify the contents of the variable properly.
                   1040: Substrings of non-variables do not produce
                   1041: substring trapped variables, since assignment to such substrings
                   1042: is meaningless and illegal;
                   1043: instead, forming the substring of a non-variable produces a
                   1044: string qualifier.
                   1045: .PP
                   1046: Table element trapped variables,
                   1047: formed by referencing a non-existent element of a table,
                   1048: similarly contain a pointer to a data block that contains
                   1049: enough information for assignment to add the element to the
                   1050: referenced table or to supply the default table value.
                   1051: .PP
                   1052: The keywords
                   1053: \fM&pos\fR,
                   1054: \fM&random\fR,
                   1055: and
                   1056: \fM&trace\fR
                   1057: are handled via trapped variables (\fM&subject\fR is handled differently).
                   1058: These trapped variables
                   1059: need no additional information.
                   1060: It is sufficient to know the type of trapped variable on dereferencing \(em
                   1061: the value of the keyword can be accessed and returned.
                   1062: On assignment, the new value is coerced to the appropriate type,
                   1063: checked for validity, and assigned to the keyword.
                   1064: .PP
                   1065: Strings formed during program execution are placed in the
                   1066: .I "string space" ;
                   1067: string qualifiers for these strings point into this region.
                   1068: Substrings of existing strings are not allocated again;
                   1069: instead, a string qualifier is formed that points into the
                   1070: existing string.
                   1071: When storage is exhausted in the string space,
                   1072: the garbage collector (see Section 3.3)
                   1073: is invoked to reclaim unused space and compact the region;
                   1074: if enough space cannot be reclaimed, the region is expanded if possible.
                   1075: .PP
                   1076: Data blocks formed during program execution are placed in the
                   1077: .I heap .
                   1078: Data blocks have a rigid format dictated by the garbage collection
                   1079: algorithm.
                   1080: The first word of the block always contains a type code
                   1081: which identifies the structure of the rest of the block.
                   1082: Descriptors follow all non-descriptor information in the block.
                   1083: If the size of the block is not determined by its type,
                   1084: the size (in bytes) is contained in the second word of the block.
                   1085: When storage is exhausted in the heap,
                   1086: the garbage collector is invoked to reclaim unused space and compact the heap;
                   1087: if enough space cannot be reclaimed, the heap is expanded if possible.
                   1088: .PP
                   1089: Co-expression stack blocks are allocated in a separate region and
                   1090: are treated specially by the garbage collector.
                   1091: .NH 2
                   1092: Stack Organization
                   1093: .PP
                   1094: The stack is the focus of activity
                   1095: during the execution of an Icon program.
                   1096: All operators, functions, and procedures
                   1097: expect to find their arguments at the top of the stack,
                   1098: and replace the arguments with the result of their computation.
                   1099: Local variables for Icon procedures are also kept on the stack.
                   1100: The arguments, local variables, and temporaries on the stack
                   1101: for an active Icon procedure are collectively called a
                   1102: .I "procedure frame" .
                   1103: This is one of several kinds of
                   1104: .I "stack frames"
                   1105: discussed in this section.
                   1106: Appendix E summarizes the layouts of the stack frames for the PDP-11 and VAX-11.
                   1107: See [3] for a detailed discussion of stack frames.
                   1108: Each co-expression also has a stack. For uniformity, the main stack
                   1109: is treated as the stack for the co-expression \fM&main\fR.
                   1110: .PP
                   1111: On the PDP-11 and VAX-11 stacks start in high memory and grow downward.
                   1112: On these computers, a push causes the stack pointer to decrease and a
                   1113: pop causes the stack pointer to increase. Thus ``above'' and ``below''
                   1114: refer, respectively, to ``older'' and ``newer'' information on the
                   1115: stack. An exception to this is the phrase
                   1116: ``top of the stack'', which is used to
                   1117: refer to the \fIlowest\fR memory location. The description of relative
                   1118: stack locations that follows is based on this kind of architecture
                   1119: and nomenclature.
                   1120: .PP
                   1121: Before an Icon procedure calls another Icon procedure,
                   1122: the caller pushes the procedure to be called
                   1123: (a descriptor \(em procedures are data objects in Icon)
                   1124: onto the stack.
                   1125: The caller then pushes each argument (also a descriptor) onto the stack,
                   1126: leftmost argument first.
                   1127: The caller then pushes one word onto the stack
                   1128: indicating the number of arguments supplied,
                   1129: which may be different from the number of arguments expected.
                   1130: The run-time library routine
                   1131: \fMinvoke\fR
                   1132: is then called, which checks that the first descriptor pushed above
                   1133: actually does represent an integer, procedure, or a variable whose value is
                   1134: an integer or a procedure.
                   1135: An integer indicates the selection of one of the
                   1136: arguments resulting from mutual evaluation.
                   1137: A procedure, on the other hand,
                   1138: points to a procedure data block,
                   1139: which contains various information about the called procedure,
                   1140: including the number of arguments expected,
                   1141: the number of local variables used,
                   1142: and the procedure's entry point address.
                   1143: The routine
                   1144: \fMinvoke\fR
                   1145: next adjusts the number of arguments supplied to match the number expected,
                   1146: deleting excess arguments or supplying the null value for missing ones.
                   1147: This adjustment is performed by moving the portion of the stack
                   1148: below the arguments up or down, as appropriate.
                   1149: It then dereferences the arguments.
                   1150: A
                   1151: .I "procedure marker"
                   1152: is then pushed onto the stack,
                   1153: and the
                   1154: .I "procedure frame pointer"
                   1155: is set to point to the new procedure marker.
                   1156: The procedure marker contains, among other things,
                   1157: the return address in the calling procedure
                   1158: and the previous value of the procedure frame pointer.
                   1159: Next, the null value is pushed onto the stack
                   1160: as the initial value for each local variable.
                   1161: The routine
                   1162: \fMinvoke\fR
                   1163: then transfers control to the procedure's entry point,
                   1164: and execution of the Icon program resumes in the new procedure.
                   1165: .PP
                   1166: When a procedure is ready to return to its caller,
                   1167: it pushes its return value (a descriptor) on the stack.
                   1168: It then transfers control to
                   1169: \fMpret\fR,
                   1170: which moves the return value
                   1171: to the location occupied by the descriptor that represented the
                   1172: called procedure.
                   1173: That is, the return value is stored in place of the
                   1174: first descriptor that was pushed
                   1175: at the beginning of the calling sequence described above.
                   1176: The return sequence
                   1177: then restores the state of the previous procedure
                   1178: from the current procedure marker
                   1179: (the procedure marker that the procedure frame pointer
                   1180: currently points to).
                   1181: This includes restoring the previous value of the procedure frame pointer,
                   1182: retrieving the return address,
                   1183: and popping the returning procedure's local variables,
                   1184: procedure marker, and arguments.
                   1185: Thus, when the calling procedure regains control,
                   1186: the arguments have been popped
                   1187: and the return value is now at the top of the stack.
                   1188: .PP
                   1189: Functions and operators are written in C,
                   1190: and therefore obey the C calling sequence.
                   1191: By design, the Icon calling sequence described above
                   1192: is similar to the C calling sequence.
                   1193: When an Icon procedure calls a function, a
                   1194: .I boundary
                   1195: on the stack is introduced,
                   1196: where the stack above the boundary is regimented by Icon standards,
                   1197: and the stack below the boundary contains C information.
                   1198: This boundary is important during garbage collection:
                   1199: The garbage collector must ignore the area of the stack
                   1200: below the boundary,
                   1201: since the structure of this area is unknown,
                   1202: whereas the structure of the area above the boundary is
                   1203: well-defined.
                   1204: In particular, all data above the boundary is contained
                   1205: in descriptors or is defined by the structure of a frame,
                   1206: so that all pointers into the heap or string space may be
                   1207: located during a garbage collection.
                   1208: .PP
                   1209: Functions and operators are written to \*(oqstraddle\*(cq
                   1210: the boundary.
                   1211: From above, they are designed to resemble Icon procedures;
                   1212: from below, they are C procedures.
                   1213: An Icon procedure calls a function in the same way as it
                   1214: calls another Icon procedure;
                   1215: in fact, functions are procedure-typed
                   1216: data objects just as Icon procedures are.
                   1217: When
                   1218: \fMinvoke\fR
                   1219: recognizes that a function is being called,
                   1220: it bypasses the argument adjustment
                   1221: if the field in the procedure data block that indicates
                   1222: the number of arguments expected contains \-1,
                   1223: which indicates that the function can take an arbitrary number of arguments.
                   1224: It also does not allocate stack space for local variables, since
                   1225: any such variables are C variables and are allocated by the C
                   1226: function itself.
                   1227: C procedures have an entry sequence that creates a new
                   1228: procedure frame;
                   1229: since
                   1230: \fMinvoke\fR
                   1231: has already done this,
                   1232: the entry point for functions must be set past any instructions
                   1233: that are involved in procedure frame creation.
                   1234: .PP
                   1235: The first formal parameter of
                   1236: all functions is
                   1237: \fMnargs\fR,
                   1238: which corresponds to the word that contains the number of
                   1239: arguments supplied.
                   1240: For functions that expect a fixed number of arguments,
                   1241: they are also listed as arguments, in reverse order.
                   1242: For functions that can take an arbitrary number of arguments,
                   1243: there is a macro
                   1244: \fMARG(\fIn\fM)\fR
                   1245: that uses the address and contents
                   1246: of
                   1247: \fMnargs\fR
                   1248: to calculate the location of the
                   1249: .I n \^th
                   1250: argument.
                   1251: Thus,
                   1252: \fMARG(1)\fR
                   1253: accesses the first argument (as a descriptor),
                   1254: and
                   1255: \fMARG(nargs)\fR
                   1256: accesses the last argument.
                   1257: Each function is responsible for supplying defaults for missing arguments
                   1258: and for dereferencing arguments that are variables.
                   1259: Because of the calling protocol,
                   1260: \fMARG(0)\fR
                   1261: accesses the location where the return value should be stored.
                   1262: Every function must place its result there and
                   1263: then return through normal C conventions.
                   1264: Each function must also supply a procedure data block that contains
                   1265: the number of arguments expected (or \-1), its entry point,
                   1266: and a string qualifier representing its name.
                   1267: .PP
                   1268: Operators are very similar to functions. The only difference is that
                   1269: operators are called directly (rather than being called through \fMinvoke\fR)
                   1270: and must set the boundary themselves.
                   1271: .ig
                   1272: .PP
                   1273: Operators are written like functions,
                   1274: with one exception.
                   1275: Since operators are not variables
                   1276: (as function and procedure names are),
                   1277: the name of the operator is known at translation time,
                   1278: and the Icon procedure calls it directly
                   1279: (at its normal entry point)
                   1280: without going through
                   1281: \fMinvoke\fR.
                   1282: Thus, operators do not need procedure data blocks.
                   1283: Instead of pushing a procedure descriptor pointing to a procedure block,
                   1284: a null value is pushed instead to hold a place for the return value.
                   1285: ..
                   1286: .PP
                   1287: When an operator or function fails to produce a result,
                   1288: it calls
                   1289: \fMfail\fR.
                   1290: This routine initiates backtracking as described below.
                   1291: .PP
                   1292: Expressions are evaluated within an
                   1293: .I "expression frame" .
                   1294: When the evaluation of an expression is complete,
                   1295: whether it has produced a result or failed,
                   1296: the expression frame must be popped from the stack
                   1297: and the result of the expression must be pushed back onto the stack.
                   1298: The expression frame marks the stack height at the point
                   1299: that the expression began to be evaluated,
                   1300: so that the stack may be restored to its original state
                   1301: when the evaluation of the expression is complete.
                   1302: The stack normally would be restored to the original height
                   1303: (that is, the pops would match the pushes)
                   1304: except when an expression fails at some midpoint in its evaluation.
                   1305: The expression frame is also used to limit backtracking:
                   1306: backtracking is restricted in the language
                   1307: to the current expression instance only.
                   1308: .PP
                   1309: When evaluation of an expression begins, an
                   1310: .I "expression marker"
                   1311: is pushed on the stack, the
                   1312: .I "expression frame pointer"
                   1313: is set to point to it, and the
                   1314: .I "generator frame pointer" ,
                   1315: discussed below, is cleared.
                   1316: The marker contains the previous values of the
                   1317: expression and generator frame pointers
                   1318: and a failure label.
                   1319: When an expression produces a result,
                   1320: that result, on the top of the stack,
                   1321: is popped and saved.
                   1322: Then the stack is popped to the expression marker,
                   1323: and the previous values of the two frame pointers are restored.
                   1324: The marker is popped
                   1325: and the result of the expression is pushed back onto the stack,
                   1326: now a part of the previous expression frame.
                   1327: If an expression fails to produce a result,
                   1328: \fMfail\fR
                   1329: pops the stack to the expression marker,
                   1330: restores the previous values of the two frame pointers,
                   1331: and branches to the failure label.
                   1332: In the special case that the failure label is zero,
                   1333: \fMfail\fR
                   1334: is effectively called again to indicate failure in the new expression frame.
                   1335: Thus the failure is propagated from one expression to an enclosing one.
                   1336: .PP
                   1337: If an expression has any generators,
                   1338: then there is a
                   1339: .I "generator frame"
                   1340: within the current expression frame
                   1341: for each generator that is inactive
                   1342: (that is, that has produced a value but is not yet exhausted).
                   1343: A generator frame preserves the state of the stack
                   1344: at the point just before the generator
                   1345: (whether it be operator, function, or procedure)
                   1346: suspended (became inactive).
                   1347: If
                   1348: \fMfail\fR
                   1349: is called and there are inactive generators,
                   1350: then instead of exiting the current expression frame,
                   1351: the most recently inactivated generator is reactivated
                   1352: by restoring the stack to the state saved in the most recent
                   1353: generator frame.
                   1354: .PP
                   1355: A function or operator suspends itself by calling
                   1356: \fMsuspend\fR.
                   1357: This routine preserves the state of the stack
                   1358: by duplicating the current expression frame,
                   1359: bounded on one end by the most recent generator frame
                   1360: (or, if there are no inactive generators, the current expression frame)
                   1361: and bounded on the other end by the beginning of the
                   1362: argument list of the suspending function or operator.
                   1363: A generator marker is pushed onto the stack,
                   1364: followed by the duplicate expression frame.
                   1365: The routine
                   1366: \fMsuspend\fR
                   1367: then causes the suspending function or operator to return to its caller,
                   1368: instead of itself returning.
                   1369: .PP
                   1370: When reactivated by
                   1371: \fMfail\fR,
                   1372: the stack is restored to the generator marker,
                   1373: which is used to restore the various frame pointers.
                   1374: Then the marker is popped.
                   1375: The stack is then in the same state that it was in when
                   1376: \fMsuspend\fR
                   1377: was called.
                   1378: The routine
                   1379: \fMfail\fR
                   1380: then returns to the generator
                   1381: as if the original call to
                   1382: \fMsuspend\fR
                   1383: had returned.
                   1384: Thus the following schema is typical of operators and functions
                   1385: that generate a sequence of values.
                   1386: .DS
                   1387: .ss 9
                   1388: .ft M
                   1389: \fIinitialize\fM;
                   1390: \fMwhile\fM (\fInot exhausted\fM) {
                   1391:    \fIcompute next value\fM;
                   1392:    \fIstore return value\fM;
                   1393:    suspend()
                   1394:    }
                   1395: fail();
                   1396: .ft R
                   1397: .ss 4
                   1398: .DE
                   1399: The effect of resuming an expression containing generators is that
                   1400: \fMsuspend\fR
                   1401: actually causes the generator to return.
                   1402: If alternatives are needed, backtracking occurs,
                   1403: and the apparent effect is that
                   1404: \fMsuspend\fR
                   1405: has finally returned.
                   1406: The generator computes the next result,
                   1407: and suspends with it.
                   1408: When the generator is exhausted,
                   1409: it merely fails without suspending,
                   1410: which just passes the failure back to the next most recently inactivated generator,
                   1411: if any.
                   1412: .PP
                   1413: Just as functions and operators can return normally, suspend, or fail,
                   1414: so can Icon procedures.
                   1415: The mechanics are essentially the same, but the differences in stack
                   1416: layout require different primitives.
                   1417: When Icon procedures return normally, the return value is presumed to
                   1418: be at the top of the stack and
                   1419: \fMpret\fR
                   1420: is called.
                   1421: Similarly, Icon procedures call
                   1422: \fMpsusp\fR
                   1423: to suspend.
                   1424: Both of these routines also dereference the return result
                   1425: if it is a local variable.
                   1426: The routine
                   1427: \fMpfail\fR
                   1428: causes an Icon procedure to return with no result.
                   1429: .PP
                   1430: The same three primitives are also needed at the expression level:
                   1431: \fMeret\fR,
                   1432: \fMesusp\fR,
                   1433: and
                   1434: \fMefail\fR.
                   1435: Unlike
                   1436: \fMunmark\fR,
                   1437: \fMeret\fR
                   1438: is not a library routine, but is generated as in-line code.
                   1439: Both cause an exit from the current expression frame; but
                   1440: \fMeret\fR
                   1441: supplies a result to the enclosing expression,
                   1442: while
                   1443: \fMunmark\fR
                   1444: does not.
                   1445: The routine
                   1446: \fMesusp\fR
                   1447: creates a inactive generator before supplying a result to the
                   1448: enclosing expression;
                   1449: it is used by the alternation control structure.
                   1450: The routine
                   1451: \fMefail\fR
                   1452: simply causes backtracking within the current expression frame.
                   1453: In fact,
                   1454: \fMfail\fR
                   1455: and
                   1456: \fMpfail\fR
                   1457: merely exit their procedure frame before branching to
                   1458: \fMefail\fR.
                   1459: .NH 2
                   1460: Storage Allocation and Reclamation
                   1461: .PP
                   1462: During program execution, storage allocation is necessary when a data
                   1463: object is created.
                   1464: The three primitive routines
                   1465: \fMallocate\fR,
                   1466: \fMalcstr\fR, and \fMalcestk\fR
                   1467: allocate storage in the heap, string space, and co-expression stack space, respectively.
                   1468: All three routines return pointers to the beginning of newly
                   1469: allocated space.
                   1470: None of the routines is responsible for ensuring that enough space
                   1471: remains in the data regions.
                   1472: Ensuring that enough space remains in the data regions is
                   1473: the responsibility of a
                   1474: .I "predictive need"
                   1475: strategy described below.
                   1476: .PP
                   1477: In the heap,
                   1478: \fMallocate(n)\fR
                   1479: returns a pointer to
                   1480: \fMn\fR
                   1481: contiguous bytes of storage.
                   1482: Because a wide variety of objects may reside in the heap, a number of
                   1483: support routines are provided to simplify the storing of various objects.
                   1484: There is a specific routine to allocate a block for each datatype in the heap.
                   1485: Where appropriate, these routines have the actual values to be stored
                   1486: as their arguments.
                   1487: All of the routines call
                   1488: \fMallocate\fR
                   1489: to obtain storage for the object.
                   1490: .PP
                   1491: In the string space,
                   1492: \fMalcstr(s,\*bl)\fR
                   1493: allocates room for a string of length
                   1494: \fMl\fR
                   1495: and copies the string pointed to by
                   1496: \fMs\fR
                   1497: into this space.
                   1498: Since some routines such as
                   1499: \fMleft\fR,
                   1500: \fMright\fR,
                   1501: and
                   1502: \fMcenter\fR
                   1503: need room in the string space in which to construct a string,
                   1504: a call to
                   1505: \fMalcstr\fR
                   1506: with the defined constant
                   1507: \fMNULL\fR
                   1508: as the first argument results in the
                   1509: allocation of storage without attempting to copy a string.
                   1510: .PP
                   1511: In the co-expression stack region, \fMalcestk()\fR allocates a new
                   1512: co-expression stack.
                   1513: .PP
                   1514: Source code for all of the allocation routines is contained in the file
                   1515: \fMrt/alc.c\fR.
                   1516: Almost all interaction with the storage management is made through these
                   1517: routines.
                   1518: Two exceptions occur in string concatenation (\fMoperators/cat.c\fR) and
                   1519: reading a fixed number of
                   1520: bytes from a file (\fMfunctions/reads.c\fR).
                   1521: In these cases, it is simpler and more efficient to have these operations
                   1522: deal directly with storage management.
                   1523: .PP
                   1524: As mentioned earlier, a
                   1525: .I "predictive need"
                   1526: strategy is employed to ensure that enough room remains for data storage.
                   1527: Simply put,
                   1528: .I "predictive need"
                   1529: states that it is the responsibility of any routine
                   1530: that calls an allocation routine both to ensure that enough room remains
                   1531: in the proper data region and to maintain the validity of any temporary
                   1532: pointers into the data regions, should a
                   1533: .I "garbage collection"
                   1534: be necessary to free storage space.
                   1535: .PP
                   1536: Since the check for storage space only needs to occur before the allocation
                   1537: takes place, each routine may perform this check at its convenience.
                   1538: This approach permits the minimization of the number of temporary
                   1539: pointers that must be protected during garbage collection.
                   1540: As an aid, space for several descriptors is automatically protected
                   1541: by the procedure invocation mechanism, and usually is used to hold
                   1542: information pertaining to the arguments of the procedure (see Section 3.4).
                   1543: .PP
                   1544: Routines to ensure space are provided for each of the three
                   1545: storage regions.
                   1546: The routine
                   1547: \fMsneed(n)\fR
                   1548: ensures that at least
                   1549: \fMn\fR
                   1550: bytes of storage remain in the string space, and
                   1551: \fMhneed(n)\fR
                   1552: performs the same function in the heap.
                   1553: \fMesneed()\fR ensures that there is a co-expression stack
                   1554: available.
                   1555: If either routine finds that there is insufficient storage remaining,
                   1556: it invokes the
                   1557: .I "garbage collector"
                   1558: in an attempt to obtain that storage.
                   1559: If that fails, then program execution is aborted with an appropriate
                   1560: diagnostic.
                   1561: .PP
                   1562: Garbage collection, or
                   1563: .I "storage reclamation" ,
                   1564: is a process that identifies all valid allocated data.
                   1565: In the string and heap regions, valid data is compacted
                   1566: in order to provide a contiguous area of unused storage.
                   1567: The algorithm used for identifying valid data is based upon
                   1568: the algorithm described by Hanson [7].
                   1569: Only the more novel features are discussed here.
                   1570: .PP
                   1571: Whenever a predictive need request discovers that insufficient
                   1572: storage remains in either the heap or string space, the garbage
                   1573: collector is invoked to reclaim unused space in all regions.
                   1574: This approach is more efficient in situations where all regions
                   1575: are heavily allocated and only slightly less efficient otherwise.
                   1576: .PP
                   1577: The approach is to sweep through the permanent data regions and the
                   1578: stack, looking for descriptors that are either pointers into the heap or
                   1579: string qualifiers.
                   1580: When a string qualifier is found, a pointer to that qualifier is
                   1581: saved in a temporary data region at the end of the heap.
                   1582: If the descriptor is a pointer into the heap, then that heap data
                   1583: block contains valid information.
                   1584: The block is marked as valid, the descriptor is placed on a back chain
                   1585: headed in the block,
                   1586: and the marking process is called
                   1587: recursively on any descriptors within that block.
                   1588: Blocks that are already marked as valid are not processed a second time.
                   1589: To simplify the marking of heap blocks, all data blocks have been
                   1590: designed so that all descriptors within them exist as a contiguous
                   1591: section at the end of the block.
                   1592: Thus to sweep through the descriptors within a block, the marking
                   1593: algorithm need only know the size of the block and the
                   1594: location of the first descriptor.
                   1595: Information concerning a data block's size, as well as the
                   1596: offset for the first descriptor is in the file
                   1597: \fMrt/dblocks.c\fR.
                   1598: .PP
                   1599: Valid co-expression stacks also may contain string qualifiers and
                   1600: pointers to other valid data; such stacks are included in the
                   1601: marking phase.
                   1602: .PP
                   1603: After the marking phase is completed, the string region is compacted.
                   1604: The algorithm used is described by Hanson [9].
                   1605: The pointers to the string qualifiers are sorted so that the order of
                   1606: all valid strings within the string space is identified.
                   1607: The string qualifiers are then processed in order, and modified as
                   1608: the valid strings are compacted.
                   1609: If this compaction does not free enough space within the
                   1610: string space to satisfy the request, the heap must be moved in order
                   1611: to provide more room in the string space.
                   1612: An attempt is also made to provide
                   1613: some additional
                   1614: ``breathing room''
                   1615: in the string space to permit future expansion.
                   1616: .PP
                   1617: The heap cannot be moved until after the valid pointers into it are
                   1618: adjusted and the storage is compacted.
                   1619: The pointer adjustment and heap compaction phases are two linear passes through
                   1620: the heap which must be performed during standard heap garbage collection.
                   1621: The only difference when the heap is to be moved is that the adjusted
                   1622: pointers point to where that data will be after the heap has been moved.
                   1623: If not enough breathing room is freed in the heap, then
                   1624: more space is requested from the operating system.
                   1625: As a last step, if the string space needs more room, the heap is
                   1626: relocated.
                   1627: .PP
                   1628: This method has proved to be quite satisfactory for most applications.
                   1629: A shortcoming of the implementation is the absence of a process for
                   1630: decreasing the size of a data region, should it become too large.
                   1631: It is also possible that insufficient room would be available for storing
                   1632: the pointers to the string qualifiers, even though enough storage
                   1633: would become available if the heap were collected separately.
                   1634: In practice, this has not been a problem.
                   1635: The source code for the garbage collector is contained in the files
                   1636: \fMrt/gcollect.s\fR, \fMrt/gc.c\fR, and \fMrt/sweep.c\fR.
                   1637: .NH 2
                   1638: Coding Conventions
                   1639: .PP
                   1640: The calling conventions for functions and operators have been mentioned
                   1641: earlier.
                   1642: Several other aspects of the run-time system are explained here.
                   1643: .PP
                   1644: All header files for the run-time system are in the directory
                   1645: \fMh\fR.
                   1646: The file
                   1647: \fMh/rt.h\fR
                   1648: (or, for assembly-language routines,
                   1649: \fMh/defs.s\fR)
                   1650: is included by almost every source file in the run-time system, and contains
                   1651: machine-dependent defined constants, run-time data structure declarations,
                   1652: and defined constants and macros for flags,
                   1653: type codes, argument accessing, and bit manipulations.
                   1654: .PP
                   1655: During the execution of an Icon program, many type
                   1656: conversions are done on temporary values, where data storage is not
                   1657: required beyond the bounds of the current operation.
                   1658: For this reason, the type conversion routines all operate with pointers passed
                   1659: to them that reference
                   1660: buffers in the calling procedure.
                   1661: Any routine calling for type conversion must determine if
                   1662: heap or string space storage is needed, and perform the
                   1663: allocation.
                   1664: Most of the conversion routines return the type of the result or
                   1665: \fMNULL\fR
                   1666: if the conversion cannot be performed.
                   1667: One exception is
                   1668: \fMcvstr\fR
                   1669: which, in addition to
                   1670: \fMNULL\fR,
                   1671: returns
                   1672: \fM2\fR
                   1673: if the object was already a string, and
                   1674: \fM1\fR
                   1675: if the object
                   1676: had to be converted to a string.
                   1677: This distinction makes it possible to avoid a large number of
                   1678: predictive-need checks.
                   1679: The second exception is
                   1680: \fMcvnum\fR,
                   1681: which returns either
                   1682: real numbers or integers
                   1683: and makes no attempt to distinguish between
                   1684: short and long integers.
                   1685: .PP
                   1686: As mentioned in Section 3.3, there is space set aside to hold
                   1687: temporary descriptors and to protect the validity of these
                   1688: descriptors during garbage collection.
                   1689: The garbage collector knows about this region, and
                   1690: .I tends
                   1691: it during storage reclamation.
                   1692: The region is defined in the file
                   1693: \fMh/gc.h\fR,
                   1694: and is bounded by the labels
                   1695: \fMtended\fR
                   1696: and
                   1697: \fMetended\fR.
                   1698: This area can be referenced from C by considering
                   1699: \fMtended\fR
                   1700: to be an array of descriptors.
                   1701: Since a garbage collection can occur
                   1702: only during a call to
                   1703: \fMsneed\fR
                   1704: or
                   1705: \fMhneed\fR,
                   1706: or between suspension and reactivation,
                   1707: the only places where C routines need to ensure that all pointers
                   1708: into the heap or string space are tended
                   1709: are just before calls to
                   1710: \fMsneed\fR,
                   1711: \fMhneed\fR,
                   1712: or
                   1713: \fMsuspend\fR.
                   1714: .PP
                   1715: All function names are preceded by the letter
                   1716: \fMX\fR,
                   1717: and their procedure blocks are preceded by the letter
                   1718: \fMB\fR.
                   1719: This prevents name collisions between Icon procedures and other routines,
                   1720: such as those for operators, type conversions, and storage management.
                   1721: Reference from the generated code to functions is made entirely through
                   1722: the procedure block;
                   1723: the entry point field of the procedure block references the function itself.
                   1724: .NH
                   1725: Modifying the Implementation
                   1726: .PP
                   1727: This section is intended to serve as a brief guide
                   1728: for those who wish to modify the Icon system.
                   1729: It is not comprehensive;
                   1730: it only points to various parts of the
                   1731: implementation that need to be considered
                   1732: when making various kinds of changes.
                   1733: .PP
                   1734: Perhaps the most common kind of change that one might expect to make
                   1735: is to add new functions (built-in procedures).
                   1736: To add a function, first write it according to the conventions
                   1737: described in Section 3.4.
                   1738: (Use an existing function similar to the new one as a prototype.
                   1739: Appendix F contains several example functions.)  Be
                   1740: especially careful to observe the rules
                   1741: concerning storage allocation and tended descriptors.
                   1742: .PP
                   1743: .c  note PI here
                   1744: Prepare to add the new function to the run-time library
                   1745: by moving the source code into the
                   1746: \fMfunctions\fR
                   1747: directory and adding its name to
                   1748: \fMfunctions/Makefile\fR
                   1749: (the name must be added in three places \(em
                   1750: there are many examples already in the \fMMakefile\fR).
                   1751: Then add the name to \fMh/pdef.h\fR
                   1752: in proper alphabetical order.
                   1753: Use other functions as a guide to the format of the entry.
                   1754: .PP
                   1755: When all changes have been made to the source code,
                   1756: go to the Icon root directory and run
                   1757: .Ds
                   1758: make Icon
                   1759: .De
                   1760: This runs
                   1761: \fMmake\fR
                   1762: in each of the system directories \(em
                   1763: \fMtran\fR,
                   1764: \fMlink\fR,
                   1765: \fMfunctions\fR,
                   1766: \fMoperators\fR,
                   1767: \fMlib\fR,
                   1768: \fMrt\fR, and \fMiconx\fR
                   1769: \(em and then copies the new versions into the
                   1770: \fMbin\fR
                   1771: directory [10].
                   1772: .PP
                   1773: Adding a new operator is more complicated.
                   1774: Again, the first step is to write the routine, place it in the
                   1775: \fMoperators\fR
                   1776: directory, and add the appropriate entry to the
                   1777: \fMMakefile\fR
                   1778: there.
                   1779: Next, the operator must be added to the translator, as follows:
                   1780: .IP (1) \w'(1)'u+1n
                   1781: Add the operator to the operator table in
                   1782: \fMtran/optab\fR;
                   1783: the structure of the table is described in Section 1.1.
                   1784: .IP (2)
                   1785: Create a unique name for the new token
                   1786: and make a new token table entry in
                   1787: \fMtran/tokens\fR
                   1788: in the operators section of the table.
                   1789: Although the operators section of the table is in alphabetical order
                   1790: by token name
                   1791: as distributed,
                   1792: there is no need to preserve this order.
                   1793: .IP (3)
                   1794: If a running Version 5 of Icon is not available,
                   1795: edit the files
                   1796: \fMtran/optab.c\fR
                   1797: and
                   1798: \fMtran/toktab.c\fR
                   1799: to correspond to the changes made in steps 1 and 2.
                   1800: This sometimes involves a renumbering of token table entries
                   1801: in both files (but nowhere else).
                   1802: If a running Version 5 of Icon is available,
                   1803: a
                   1804: \fMmake\fR
                   1805: in
                   1806: \fMtran\fR
                   1807: executes
                   1808: \fMmktoktab\fR
                   1809: to produce the new token tables.
                   1810: .IP (4)
                   1811: Add the operator to the grammar in
                   1812: \fMtran/icon.g\fR.
                   1813: The token name must be added to the list of terminal symbols
                   1814: at the beginning of the grammar file,
                   1815: and the operator must be inserted into the syntax at the
                   1816: appropriate precedence level.
                   1817: If the precedence is the same as that of an existing operator,
                   1818: simply add the operator as an alternative to the existing production;
                   1819: otherwise, insert a new production,
                   1820: and change the production at the next lower precedence level
                   1821: to refer to the new one.
                   1822: The semantic action should create either a
                   1823: .I BINOP
                   1824: or a
                   1825: .I UNOP
                   1826: node in the parse tree;
                   1827: use existing actions as a prototype.
                   1828: .IP (5)
                   1829: The new operator must now be added to the code generator in
                   1830: \fMtran/code.c\fR.
                   1831: Insert a case in either of the routines
                   1832: \fMbinop\fR
                   1833: or
                   1834: \fMunop\fR
                   1835: for the new token name
                   1836: that assigns a new intermediate code opcode to
                   1837: \fMname\fR,
                   1838: as for other operators \(em
                   1839: this causes the new opcode to be emitted into the ucode.
                   1840: The opcode should have the same name as the library routine
                   1841: that performs the operation.
                   1842: .IP (6)
                   1843: The new intermediate code opcode must also be added to the linker.
                   1844: Add a defined constant to
                   1845: \fMlink/opcode.h\fR;
                   1846: order here is not important.
                   1847: Then add the opcode name and the defined constant to
                   1848: \fMlink/opcode.c\fR;
                   1849: alphabetical order must be preserved here,
                   1850: since a binary search is used.
                   1851: Then edit the code generator in
                   1852: \fMlink/lcode.c\fR,
                   1853: adding a case in the routine
                   1854: \fMgencode\fR
                   1855: with either the binary or the unary operators.
                   1856: The standard processing here emits code that
                   1857: evaluates the operand(s),
                   1858: then calls a library routine
                   1859: with the same name as the intermediate code opcode.
                   1860: .IP (7)
                   1861: Add entries for the operator to \fMh/pnames.h\fR, using other operators as a guide.
                   1862: .LP
                   1863: The system is then be ready to be made as described above.
                   1864: .PP
                   1865: Adding a new control structure is similar in nature to
                   1866: adding a new operator.
                   1867: Most often, a new reserved word must be added to
                   1868: \fMtran/tokens\fR;
                   1869: this part of the token table must be kept in alphabetical order.
                   1870: The new token must be added to the grammar,
                   1871: and productions must be added,
                   1872: usually at the highest precedence level
                   1873: (the same as
                   1874: \fMif\fR,
                   1875: for example).
                   1876: The semantic action for the new production probably will
                   1877: involve creating a parse tree node of a new type.
                   1878: The new node type should be added to
                   1879: \fMtran/tree.h\fR
                   1880: and a new case in the routine
                   1881: \fMtraverse\fR
                   1882: (in
                   1883: \fMtran/code.c\fR)
                   1884: should be added to generate intermediate code.
                   1885: The intermediate code generated can use any of the existing
                   1886: opcodes or can use new ones created specifically for the new control structure.
                   1887: If new opcodes are created,
                   1888: they must be added to the linker as described above,
                   1889: and a new case in the routine
                   1890: \fMgencode\fR
                   1891: must generate code for it.
                   1892: The generated code can be either entirely in-line
                   1893: or can call a new library routine.
                   1894: If new code generation templates are needed,
                   1895: modify the code emission routines
                   1896: in
                   1897: \fMlink/lcode.c\fR.
                   1898: If the code calls a new library routine,
                   1899: add it to the
                   1900: \fMlib\fR
                   1901: directory and the
                   1902: \fMMakefile\fR
                   1903: there.
                   1904: Then the system is ready to be made.
                   1905: .PP
                   1906: Modifying the semantics of existing control structures,
                   1907: operators, or functions,
                   1908: often involves changing only the generated in-line code
                   1909: or a library routine.
                   1910: Modifying the syntax without disturbing any semantics
                   1911: usually requires only a change to the grammar.
                   1912: .PP
                   1913: Adding a new datatype means making many of the above changes.
                   1914: A new datatype code must be added to
                   1915: \fMh/rt.h\fR
                   1916: and
                   1917: \fMh/defs.s\fR,
                   1918: and a new data block format must be defined in \fMh/rt.h\fR.
                   1919: The size and location of the first descriptor of the new data block
                   1920: must be entered in
                   1921: \fMrt/dblocks.c\fR
                   1922: so that the garbage collector knows how to treat the block.
                   1923: The routines in
                   1924: \fMfunctions/image.c\fR
                   1925: and
                   1926: \fMrt/outimage.c\fR
                   1927: must be extended so that images of the new datatype can be produced.
                   1928: The routines in \fMfunctions/type.c\fR and \fMrt/equiv.c\fR must
                   1929: also be modified to account for the new type.
                   1930: In addition, \fMrt/anycmp.c\fR must be extended so that objects
                   1931: of the new type can be sorted relative to other types.
                   1932: New functions and operators on the new type may be needed,
                   1933: and possibly new coercion routines must be added to
                   1934: \fMrt\fR.
                   1935: .PP
                   1936: Adding a new keyword entails a change to
                   1937: \fMtran/keywords\fR
                   1938: (and, if a running Version 5 of Icon is not available, to
                   1939: \fMtran/keyword.h\fR)
                   1940: and a new case in
                   1941: \fMlib/keyword.c\fR.
                   1942: A
                   1943: \fMmake\fR
                   1944: in
                   1945: \fMtran\fR
                   1946: runs the program
                   1947: \fMmkkeytab\fR
                   1948: to produce both
                   1949: \fMtran/keyword.h\fR
                   1950: and
                   1951: \fMtran/keyword.c\fR.
                   1952: Many keywords require trapped variables,
                   1953: which requires changes to
                   1954: \fMh/rt.h\fR,
                   1955: \fMoperators/asgn.c\fR,
                   1956: and
                   1957: \fMrt/deref.c\fR;
                   1958: the trapped variable for
                   1959: \fM&random\fR
                   1960: is a good model.
                   1961: .PP
                   1962: As mentioned above,
                   1963: the examples in this section are intended
                   1964: to identify what parts of the system are affected
                   1965: by certain kinds of changes or extensions.
                   1966: A thorough understanding of the system is
                   1967: suggested, however, for other than minor changes.
                   1968: .bp
                   1969: .SH
                   1970: Acknowledgements
                   1971: .PP
                   1972: This report is a revision of earlier descriptions of the C
                   1973: implementation of Icon [11, 12], which were co-authored by Cary Coutant
                   1974: and Steve Wampler.
                   1975: Much of the material in this report is taken from the earlier ones.
                   1976: .PP
                   1977: Many features of the current implementation of Icon are based
                   1978: upon the original Ratfor implementation by
                   1979: Dave Hanson, Tim Korb, and Walt Hansen [13, 14].
                   1980: .sp 1
                   1981: .SH
                   1982: References
                   1983: .IP 1. 5n
                   1984: Griswold, Ralph E. and Madge T. Griswold.
                   1985: .I "The Icon Programming Language" .
                   1986: Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1983.
                   1987: .IP 2. 5n
                   1988: Kernighan, Brian W., and Dennis M. Ritchie.
                   1989: .I "The C Programming Language" .
                   1990: Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1978.
                   1991: .IP 3. 5n
                   1992: Mitchell, William W. \fIPorting the UNIX Implementation of Icon\fR.
                   1993: Technical Report TR 83-10d, Department of Computer Science, The
                   1994: University of Arizona, Tucson, Arizona, August 1984.
                   1995: .IP 4. 5n
                   1996: Griswold, Ralph E., Robert K. McConeghy, and William H. Mitchell. \fIVersion 5.9 of Icon\fR.
                   1997: Technical Report, Department of Computer Science, The University of
                   1998: Arizona, Tucson, Arizona, July 1984.
                   1999: .IP 5. 5n
                   2000: Johnson, Stephen C.
                   2001: ``Yacc: Yet Another Compiler-Compiler''
                   2002: .I "Unix Programmer's Manual, Seventh Edition" .
                   2003: Bell Telephone Laboratories, Inc., Murray Hill, New Jersey, 1979.
                   2004: .IP 6. 5n
                   2005: Aho, A. V., and S. C. Johnson.
                   2006: ``LR Parsing''"
                   2007: .I "Computing Surveys"
                   2008: .B 6 ,
                   2009: 2 (June 1974), 99-124.
                   2010: .IP 7. 5n
                   2011: Hanson, David R.
                   2012: ``Storage Management for an Implementation of SNOBOL4'',
                   2013: .I "Software\(emPractice and Experience"
                   2014: .B 7 ,
                   2015: 2 (March 1977), 179-192.
                   2016: .IP 8. 5n
                   2017: Hanson, David R.
                   2018: ``Variable Associations in SNOBOL4'',
                   2019: .I "Software\(emPractice and Experience"
                   2020: .B 6 ,
                   2021: 2 (April 1976), 245-254.
                   2022: .IP 9. 5n
                   2023: Hanson, David R.
                   2024: .I "The Manipulation of Variable-Length String Data in Fortran IV" .
                   2025: Technical Report,
                   2026: Department of Computer Science,
                   2027: The University of Arizona, Tucson, Arizona,
                   2028: May 1975.
                   2029: .IP 10. 5n
                   2030: Griswold, Ralph E. and William H. Mitchell. \fIInstallation and Maintenance
                   2031: Guide for Version 5.9 of Icon\fR. Technical Report TR 84-13,
                   2032: Department of Computer Science,
                   2033: The University
                   2034: of Arizona,
                   2035: Tucson, Arizona. August 1984.
                   2036: .IP 11. 5n
                   2037: Coutant, Cary A. and Stephen B. Wampler. \fIA Tour Through the
                   2038: C Implementation of Icon; Version 5\fR. Technical Report TR 81-11a,
                   2039: Department of Computer Science, The University of Arizona, Tucson,
                   2040: Arizona, December 1981.
                   2041: .IP 12. 5n
                   2042: Griswold, Ralph E., William H. Mitchell, and Stephen B. Wampler.
                   2043: \fIThe C Implementation of Icon; A Tour Through Version 5\fR.
                   2044: Technical Report TR 83-11a, Department of Computer Science,
                   2045: The University of Arizona, Tucson, Arizona. December 1983.
                   2046: .IP 13. 5n
                   2047: Korb, John Timothy.
                   2048: .I "The Design and Implementation of a Goal-Directed Programming Language" .
                   2049: Ph.D. Dissertation, Technical Report TR 79-11,
                   2050: Department of Computer Science,
                   2051: The University of Arizona, Tucson, Arizona,
                   2052: June 1979.
                   2053: .IP 14. 5n
                   2054: Hanson, David R., and Walter J. Hansen.
                   2055: .I "Icon Implementation Notes" .
                   2056: Technical Report TR 79-12a,
                   2057: Department of Computer Science,
                   2058: The University of Arizona, Tucson, Arizona,
                   2059: February 1980.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.