|
|
1.1 ! root 1: .TR 84-11 ! 2: .DA "August 5, 1984" ! 3: .Gr ! 4: .TL ! 5: A Tour Through the C Implementation of Icon; Version 5.9 ! 6: .AU ! 7: Ralph E. Griswold ! 8: .AU ! 9: Robert K. McConeghy ! 10: .AU ! 11: William H. Mitchell ! 12: .AB ! 13: This report documents the C implementation ! 14: of Version 5.9 of the Icon programming language. ! 15: This report concentrates on the ! 16: major parts of the system \(em ! 17: the translator, the linker, and the interpreter. ! 18: An additional section discusses how the implementation ! 19: can be modified for new language features. ! 20: .AE ! 21: .SH ! 22: Introduction ! 23: .PP ! 24: This report describes the C implementation of Version 5.9 of ! 25: the Icon programming language [1]. ! 26: Most of the system is coded in C [2] and is designed to be run under \*U. ! 27: In addition to the C portion of the system, there is some assembly ! 28: language code. ! 29: To date, the C implementation has been adapted to the ! 30: PDP-11\u\(dg\d, VAX-11, and Onyx C8002. ! 31: .Un ! 32: .FS ! 33: \u\(dg\dPDP and VAX are trademarks of Digital Equipment Corporation. ! 34: .FE ! 35: This implementation is intended to be portable ! 36: to other computers running under UNIX, ! 37: but portability was not a primary design goal. ! 38: Reference 3 describes the process of transporting this implementation ! 39: and contains detailed descriptions of the assembly language routines for ! 40: the VAX implementation. ! 41: .PP ! 42: The implementation of the Icon system consists of three parts: ! 43: a translator, a linker, ! 44: and an interpreter. The interpreter contains a run-time system that ! 45: includes routines for the operations that are needed to execute an Icon program. ! 46: The translator converts an Icon source program into ! 47: an intermediate code, called ! 48: .I ucode . ! 49: The linker combines separately translated ucode files, ! 50: binds inter-procedure references, ! 51: and produces interpretable binary output, called \fIicode\fR. ! 52: .PP ! 53: The reference language for this report is Version 5.9 of Icon [4]. ! 54: This report is intended to be used in conjunction with ! 55: the source listings for Version 5.9, ! 56: although a general overview of the system ! 57: can be obtained from this document alone. ! 58: .NH ! 59: The Translator ! 60: .PP ! 61: The Icon translator is written entirely in C ! 62: and consists of ! 63: 12 files of source code and 10 header files. ! 64: The translator builds a parse tree for each Icon procedure, ! 65: then traverses the tree to generate code. ! 66: Three of the 12 source files contain only data initialization ! 67: and are automatically generated from specification files. ! 68: In addition, the LALR(1) parser is automatically generated ! 69: by the \fIYacc\fR parser generator [5]. ! 70: .PP ! 71: The ucode output from the translator consists of two files. ! 72: One file, with the suffix \fM.u1\fR, contains ! 73: intermediate code corresponding to the procedures in the program. ! 74: The second file, with the suffix \fM.u2\fR, ! 75: contains global symbol table information. ! 76: Both files are printable. ! 77: .PP ! 78: The following sections discuss the four parts of the translator: ! 79: the lexical analyzer, ! 80: the parser, ! 81: the code generator, ! 82: and the symbol table manager. ! 83: .NH 2 ! 84: The Lexical Analyzer ! 85: .PP ! 86: The lexical analyzer reads the Icon source program, ! 87: breaks it into tokens, ! 88: and delivers the tokens to the parser as requested. ! 89: A token is the basic syntactic unit of the Icon language; ! 90: it may be an identifier, a literal, a reserved word, ! 91: or an operator (operators include punctuation). ! 92: .PP ! 93: The lexical analyzer consists of four source files: ! 94: \fMlex.c\fR, ! 95: \fMchar.c\fR, ! 96: \fMoptab.c\fR, ! 97: and ! 98: \fMtoktab.c\fR. ! 99: The latter two of these files contain operator and token tables, respectively, ! 100: and are automatically generated ! 101: from operator and token specification files, ! 102: described below. ! 103: The file ! 104: \fMchar.c\fR ! 105: contains character mapping tables and the file ! 106: \fMlex.c\fR ! 107: contains the lexical analyzer itself. ! 108: .PP ! 109: The parser requests a token by calling ! 110: \fMyylex\fR, ! 111: which finds the next token in the source program ! 112: and determines its token type and value. ! 113: The parser bases its moves on the token type: ! 114: if the token is an operator or reserved word, ! 115: the token type specifically identifies the operator or reserved word; ! 116: otherwise, the token type indicates one of the six \*(oqprimitive\*(cq types: ! 117: identifier, integer literal, real literal, string literal, cset literal, or end-of-file. ! 118: The token value is a leaf node of the parse tree, ! 119: which, for the primitive types, ! 120: contains the source program representation of the token. ! 121: The token value node also contains ! 122: the source-program line and column numbers where the token starts. ! 123: A pointer to this node is placed in the global variable ! 124: \fMyychar\fR, ! 125: and ! 126: \fMyylex\fR ! 127: returns the token type. ! 128: .PP ! 129: The lexical analyzer finds the next token ! 130: by skipping white space, including comments. ! 131: The first character of the new token indicates which of the classes it belongs to. ! 132: A letter or underscore begins an identifier or reserved word, ! 133: a digit begins an integer or real literal, ! 134: a double quote begins a string literal, ! 135: a single quote begins a cset literal, ! 136: and any other character is assumed to begin an operator. ! 137: An identifier or reserved word is completed ! 138: by gathering all subsequent letters, digits, and underscores. ! 139: The reserved word table is consulted to determine if the token ! 140: is an identifier or a reserved word. ! 141: Numeric literals are recognized by a finite-state automaton, ! 142: which distinguishes real from integer literals by the ! 143: presence of a decimal point or the letter \*(oqe\*(cq. ! 144: A quoted literal is completed by reading ! 145: until the opening delimiter is repeated, ! 146: converting escapes in the process and continuing to new lines as necessary. ! 147: A table-driven finite-state automaton, described below, ! 148: recognizes operators. ! 149: .PP ! 150: An important task of the lexical analyzer is semicolon insertion. ! 151: The grammar requires that semicolons separate expressions ! 152: in a compound expression or procedure body, ! 153: so they must be inserted into the token stream ! 154: where they are omitted in the source program. ! 155: This process is table driven. ! 156: Associated with each token type are two flags, ! 157: .I BEGINNER ! 158: and ! 159: .I ENDER . ! 160: The ! 161: .I BEGINNER ! 162: flag is true if a token may legally begin an expression ! 163: (i.e., if it may follow a semicolon). ! 164: Similarly, the ! 165: .I ENDER ! 166: flag is true if a token may legally end an expression ! 167: (i.e., if it may precede a semicolon). ! 168: When a newline appears between two tokens, the ! 169: .I ENDER ! 170: flag of the first is true, and the ! 171: .I BEGINNER ! 172: flag of the second is true, ! 173: then a semicolon is inserted between the two tokens. ! 174: .PP ! 175: The token table is initialized in the file ! 176: \fMtoktab.c\fR. ! 177: The table is divided into three sections: ! 178: primitive types, reserved words, and operators. ! 179: The primitive types are fixed in the first six ! 180: slots in the table, ! 181: and must not be changed, ! 182: since they are referenced directly from the code. ! 183: The reserved words follow ! 184: and must be in alphabetical order. ! 185: The operators follow in no special order. ! 186: The last entry merely marks the end of the table. ! 187: .PP ! 188: Also in ! 189: \fMtoktab.c\fR ! 190: is an index to reserved words. ! 191: To speed up the search for reserved words, ! 192: this table hashes the search ! 193: using the first letter as the hash value. ! 194: The reserved words that begin with that letter then are examined linearly. ! 195: .PP ! 196: The operator table, in ! 197: \fMoptab.c\fR, ! 198: describes a finite-state automaton that ! 199: recognizes each operator in the language. ! 200: Each state is represented by an array of structures. ! 201: Each structure in the array ! 202: corresponds to a transition on the input symbol. ! 203: The structure contains three fields: ! 204: an input symbol, ! 205: an action, ! 206: and a value used by the action. ! 207: The recognizer starts in state 0; ! 208: the current input symbol is the first character ! 209: of the operator. ! 210: In a given state with a given input symbol, ! 211: the recognizer searches the array associated with the current state ! 212: for an entry that matches the current input symbol. ! 213: Failing a match, the last entry of the array, with ! 214: the input symbol field of 0, ! 215: is used. ! 216: The recognizer then performs one of the following actions, ! 217: depending on the value of the action field: ! 218: .in .5i ! 219: .IP \(bu \w'\(bu'u+1m ! 220: goes to the new state indicated by the value field ! 221: and gets the next input character ! 222: .IP \(bu ! 223: issues an error ! 224: .IP \(bu ! 225: returns the value field as ! 226: a pointer to the token table entry for the operator ! 227: .IP \(bu ! 228: returns the value field, ! 229: but pushes the current input character back onto the input. ! 230: .in 0 ! 231: .LP ! 232: The difference between the last two actions is that ! 233: some operators are recognized immediately (e.g., \*(oq\fM;\fR\*(cq), ! 234: while others are not recognized until the character ! 235: following the operator is read (e.g., \*(oq\fM=\fR\*(cq). ! 236: .PP ! 237: The token table ! 238: and operator table ! 239: are automatically constructed by the Icon program ! 240: \fMmktoktab.icn\fR. ! 241: This program reads the specification file ! 242: \fMtokens\fR ! 243: and builds the file ! 244: \fMtoktab.c\fR. ! 245: The file ! 246: \fMtokens\fR ! 247: contains a list of all the tokens, ! 248: their token types (given as defined constants), ! 249: and any associated flags. ! 250: This list is divided into the three sections detailed above. ! 251: The program then reads the specification file ! 252: \fMoptab\fR ! 253: and builds the file ! 254: \fMoptab.c\fR. ! 255: The former is a skeleton for the operator table; ! 256: it contains the state tables, ! 257: but the program fills in the pointers to the token table entries. ! 258: .NH 2 ! 259: The Parser ! 260: .PP ! 261: The parser, in the file ! 262: \fMparse.c\fR, ! 263: is automatically generated by \fIYacc\fR. ! 264: The grammar and semantic actions are contained in the file ! 265: \fMicon.g\fR. ! 266: From these specifications, ! 267: \fIYacc\fR generates parser tables for an LALR(1) parser. ! 268: .PP ! 269: In addition to the grammar, ! 270: \fMicon.g\fR ! 271: contains ! 272: a list of all the token types in the language ! 273: and declarations necessary to the actions. ! 274: \fIYacc\fR assigns an integer value to each token type, ! 275: and generates define statements, ! 276: which are written to the file ! 277: \fMtoken.h\fR. ! 278: These defined constants are the token types ! 279: returned by the lexical analyzer. ! 280: .PP ! 281: The grammar is context-free, ! 282: with actions associated with most of the rules. ! 283: An action is invoked when the corresponding rule is reduced. ! 284: The actions perform two duties: ! 285: maintaining the symbol tables ! 286: and constructing the parse tree. ! 287: The parse tree is built from the bottom up \(em ! 288: the leaves are supplied by the lexical analyzer ! 289: and the actions build trees from the leaves and from smaller trees ! 290: with each reduction. ! 291: .PP ! 292: The parser requests tokens from the lexical analyzer, ! 293: building a parse tree until it reduces a procedure. ! 294: At this point, it passes the root of the parse tree ! 295: to the code generator. ! 296: Once the intermediate code has been generated, ! 297: the parse tree is discarded, ! 298: and a new tree is begun for the next procedure. ! 299: .PP ! 300: Record and global declarations ! 301: affect only the symbol table ! 302: and do not generate parse trees. ! 303: Files named in link directives produce link instructions in the ucode ! 304: output. ! 305: .PP ! 306: A complete parse tree is rooted at a ! 307: \f3proc\fR ! 308: node, which identifies the procedure ! 309: and points to the subtrees for the ! 310: \fMinitial\fR ! 311: clause (if any) ! 312: and the body of the procedure. ! 313: Each node in the parse tree represents a source program construction ! 314: or some implicit semantic action. ! 315: A node can contain up to six fields, ! 316: the first of which is the node type. ! 317: The second and third fields are always line and column numbers ! 318: that are used for error messages and tracing. ! 319: Any additional fields contain information about the construction, ! 320: and possibly pointers to subtrees. ! 321: Appendix A contains a description of all the node types. ! 322: .PP ! 323: The grammar, shown in Appendix B, has several ambiguities. ! 324: The well-known \*(oqdangling else\*(cq problem exists ! 325: not only in the ! 326: \fMif-then-else\fR ! 327: expression, but also in the ! 328: \fMwhile-do\fR, ! 329: \fMuntil-do\fR, ! 330: \fMevery-do\fR, ! 331: and \fMto-by\fR ! 332: expressions. ! 333: In each of these expressions, the last clause is optional, ! 334: so that when the parser sees an ! 335: \fMelse\fR, ! 336: for example, ! 337: it does not know whether to shift the token ! 338: (associating it with the most recent ! 339: \fMif\fR), ! 340: or to reduce the preceding ! 341: \fMif-then\fR ! 342: expression (leaving the else \*(oqdangling\*(cq). ! 343: The latter choice is obviously incorrect, since the ! 344: \fMelse\fR ! 345: would never be shifted, and \fIYacc\fR correctly resolves such conflicts ! 346: in favor of the shift. ! 347: Thus, each ! 348: \fMelse\fR ! 349: is paired with the most recent unpaired ! 350: \fMif\fR. ! 351: All the control structures (except ! 352: \fMcase\fR) ! 353: have an additional ambiguity: ! 354: they do not have a closed syntax, ! 355: yet they may appear in an expression at the highest precedence level. ! 356: For example, the expression ! 357: .DS ! 358: .tr -\-+\(pl*\(** ! 359: \fMx := y + if a = b then z else -z * 3 ! 360: .tr --++** ! 361: .ft R ! 362: .DE ! 363: could parse in either of two ways: ! 364: .DS ! 365: .tr -\-+\(pl*\(** ! 366: \fMx := y + (if a = b then z else (-z * 3)) ! 367: x := y + (if a = b then z else -z) * 3 ! 368: .tr --++** ! 369: .ft R ! 370: .DE ! 371: This problem, too, is resolved in favor of the shift, ! 372: such that the first parse is always used. ! 373: Thus, in the absence of parentheses, ! 374: the entire expression to the right of a control structure ! 375: is part of the control structure. ! 376: .PP ! 377: Little attention has been paid to error recovery. ! 378: A few error productions have been placed in the grammar ! 379: to enable \fIYacc\fR to recover from syntax errors; ! 380: the technique for doing so is described by Aho and Johnson [6]. ! 381: The parser is slightly modified by the editor script ! 382: \fMpscript\fR ! 383: so that the parser state is passed to the routine ! 384: \fMyyerror\fR. ! 385: This routine prints an error message from the file ! 386: \fMsynerr.h\fR ! 387: that is associated with the current parser state. ! 388: This error table currently is constructed by ! 389: hand from the \fMy.output\fR file obtained by ! 390: running \fIYacc\fR with the ! 391: \f3\-v\fR ! 392: option. ! 393: .NH 2 ! 394: The Code Generator ! 395: .PP ! 396: The parser calls the code generator upon ! 397: recognition of each Icon procedure, ! 398: giving it the root of the parse tree. ! 399: The code generator traverses the parse tree recursively, ! 400: emitting ucode. ! 401: Appendix C contains a description of ucode. ! 402: .PP ! 403: The file ! 404: \fMcode.c\fR ! 405: contains both the tree node allocation and the code generation routines. ! 406: The included header file ! 407: \fMcode.h\fR ! 408: contains macros and definitions needed by the code generator, ! 409: while ! 410: \fMtree.h\fR ! 411: defines the tree nodes and the macros that allocate them. ! 412: The macros in ! 413: \fMtree.h\fR ! 414: provide the interface between the parser and the code generator. ! 415: .PP ! 416: The tree traversal routine, ! 417: \fMtraverse\fR, ! 418: is a recursive procedure with one argument, ! 419: a pointer to the root of a tree or subtree ! 420: for which code is to be generated. ! 421: The routine examines the type field of the root ! 422: and, through a switch statement, ! 423: generates a sequence of ucode instructions as determined by the type. ! 424: If the node has subtrees, ! 425: \fMtraverse\fR ! 426: calls itself recursively at the appropriate point ! 427: to generate code for the subtree. ! 428: For example, the code generated for a binary operator ! 429: first generates code for its two subexpressions, ! 430: then emits the code that calls the appropriate run-time library routine. ! 431: .PP ! 432: The returned value of the traversal routine is used for counting ! 433: elements of expression lists. ! 434: If the root of the tree being traversed is an ! 435: \f3elist\fR ! 436: (expression list) node, ! 437: \fMtraverse\fR ! 438: returns the sum of the returned values of its two subtrees. ! 439: Otherwise, it returns 1. ! 440: This count is used when generating code for ! 441: procedure calls and lists with explicit elements, ! 442: which need to know the number of arguments ! 443: to be pushed onto the stack. ! 444: .PP ! 445: When generating code for loops, ! 446: the code generator needs to save three pieces of information ! 447: for each nested loop: ! 448: the ! 449: .I "break label" , ! 450: the ! 451: .I "next label" , ! 452: and the expression nesting level. ! 453: This information is kept on the ! 454: .I "loop stack" . ! 455: The break label is a label placed just past the end of the loop; ! 456: it is the place where control is passed when the loop is finished. ! 457: The next label is placed near the end of the loop, ! 458: at a point where the next iteration of the loop can be started. ! 459: The code for ! 460: \fMbreak\fR ! 461: and ! 462: \fMnext\fR ! 463: expressions branches to these labels, ! 464: but in either case, any incomplete expression frames (see Section 3.2) ! 465: within the loop must first be popped from the stack. ! 466: The expression nesting level counts the number of currently active ! 467: expression frames within the current loop; ! 468: an ! 469: \f3unmark\fR ! 470: instruction is generated for that many expression frames ! 471: (less one for a ! 472: \fMnext\fR ! 473: expression). ! 474: .PP ! 475: The possibility of nested ! 476: \fMcase\fR ! 477: expressions requires that certain information be kept on a ! 478: .I "case stack" . ! 479: For each case expression, ! 480: the code generator allocates a label for the end of the expression ! 481: and pushes it onto the case stack. ! 482: When a ! 483: \fMdefault\fR ! 484: clause is encountered, ! 485: its subtree is placed on the top of the case stack to delay ! 486: code generation for it until the end of the \fMcase\fR expression. ! 487: .NH 2 ! 488: The Symbol Table Manager ! 489: .PP ! 490: The symbol table manager consists of the symbol table data structures ! 491: and routines that operate upon these data structures. ! 492: The source code for the symbol table manager is contained in two files. ! 493: The file ! 494: \fMkeyword.c\fR ! 495: contains only the keyword name table and is automatically constructed from a ! 496: keyword specification file discussed below. ! 497: The remainder of the symbol table manager is located in the file ! 498: \fMsym.c\fR. ! 499: .PP ! 500: The symbol table manager operates with two logical data structures, ! 501: the symbol table proper and the string space. ! 502: When the lexical analyzer identifies a token as either ! 503: an identifier or a literal, ! 504: the lexical analyzer requests the symbol table manager ! 505: to enter the token into the string space. ! 506: The symbol table manager returns a pointer into ! 507: the string space for that string. ! 508: The lexical analyzer then places this pointer in the token value node. ! 509: To help keep the size of the string space small, ! 510: all entries are hashed, ! 511: and only one copy of any string is kept. ! 512: This has the added benefit that two strings can be compared ! 513: by checking only the pointers into the string space. ! 514: .PP ! 515: The parser determines the context of the token ! 516: and requests the symbol table manager ! 517: to enter the token into the symbol table proper. ! 518: It is the responsibility of the symbol table manager ! 519: to verify that the use of the token is consistent with prior use. ! 520: Appropriate diagnostics are issued if the use is inconsistent. ! 521: .PP ! 522: The symbol table proper is physically divided into three separate structures: ! 523: the ! 524: .I global , ! 525: .I local , ! 526: and ! 527: .I literal ! 528: tables. ! 529: Each of these tables is hashed, ! 530: using the pointer into the string space as the key. ! 531: Since this pointer is an offset into the string space, ! 532: hashing is simply and effectively performed by taking the rightmost ! 533: .I n ! 534: bits of the offset ! 535: (where ! 536: .if n .I n "" 2** ! 537: .if t 2\s-2\u\fIn\fR\d\s+2 ! 538: is the size of the hash vector for the table). ! 539: .PP ! 540: The global table contains identifiers that have been declared as ! 541: globals, procedures, or records. ! 542: The local table holds all identifiers declared as locals, ! 543: formal parameters for procedure declarations, field names for record ! 544: declarations, and all other identifiers referenced in the procedure ! 545: (including those declared as global elsewhere). ! 546: The literal table contains entries for literal strings and csets, integers, ! 547: and floating-point constants. ! 548: .PP ! 549: Both the local and literal tables are associated with the current ! 550: procedure being parsed ! 551: and are written to the \fM.u1\fR file ! 552: when the procedure has been successfully parsed. ! 553: If a record declaration has been parsed, then the local table, ! 554: containing only the field name identifiers, is written to the ! 555: \&\fM.u2\fR file. ! 556: After all procedure, record, and global declarations in a Icon source ! 557: file have been parsed, the global table is written into the global ! 558: declarations file. ! 559: .PP ! 560: An entry into any of the three symbol table sections is a structure with ! 561: three fields: ! 562: a link, ! 563: a name, ! 564: and a flag. ! 565: The link field holds the pointer to the next entry in the same hash bucket. ! 566: The name is the pointer to the identifier or literal name in the string space. ! 567: The flag field contains the type ! 568: .I "formal parameter" , ( ! 569: .I "static local" , ! 570: .I "procedure name" , ! 571: etc.) of the entry. ! 572: Global table entries have a fourth field, ! 573: an integer providing the number of formal parameters for a procedure declaration, ! 574: or the number of fields in a record declaration. ! 575: .PP ! 576: Lookup in the local and global tables is merely the process of ! 577: following a hash chain until an entry of the same name is found or until ! 578: the hash chain is exhausted. ! 579: If a previous entry is found, the flags of the existing and new entries ! 580: are compared, and diagnostics are printed if the use of the new entry ! 581: conflicts with the previous usage. ! 582: The new entry is ignored whenever such an inconsistency is found. ! 583: .PP ! 584: The literal table uses the same lookup procedure, except ! 585: the search down the hash chain stops when an entry is found ! 586: with the same textual form and flag fields. ! 587: Thus the string literal ! 588: \fM"123"\fR ! 589: and the integer literal ! 590: \fM123\fR ! 591: have separate entries in the literal table, even though ! 592: they have the same string representations. ! 593: A consequence of this technique is that the integer ! 594: literals ! 595: \fM123\fR ! 596: and ! 597: \fM0123\fR ! 598: have separate entries in the literal table, ! 599: even though they have the same numeric value. ! 600: Since most programmers use a reasonably consistent style ! 601: when expressing literals, ! 602: this technique usually does not produce many ! 603: duplicate constants. ! 604: .PP ! 605: .tr ## ! 606: A final task of the symbol table manager is ! 607: the identification of keyword names. ! 608: (Note that keywords are of the form \fM&\fIname\fR.) ! 609: The symbol table manager maintains a list of the legal keyword names ! 610: and, upon request, returns a numeric identification for a keyword name ! 611: to the parser. ! 612: An automatic procedure exists for creating the keyword name table: ! 613: the Icon program ! 614: \fMmkkeytab.icn\fR ! 615: reads the specification file ! 616: \fMkeywords\fR ! 617: and produces the keyword name table in ! 618: \fMkeyword.c\fR. ! 619: The file ! 620: \fMkeywords\fR ! 621: is simply a list of the keyword names and a numeric identification for each. ! 622: Since the number of keyword names is small, and only a few ! 623: references to keywords are typical in an Icon program, lookup in the ! 624: keyword name table is done using a linear search. ! 625: .PP ! 626: The sizes of the respective portions of the symbol table ! 627: may be altered with command line arguments to the Icon translator. ! 628: .NH ! 629: Linker ! 630: .PP ! 631: The Icon linker is written entirely in C. It consists of eight files of ! 632: source code and three header files. ! 633: The linker performs three tasks: ! 634: combining the global symbol tables from one or more ! 635: runs of the translator, ! 636: resolving undeclared identifiers, ! 637: and translating ucode to icode. ! 638: The resulting combined global symbol table is used ! 639: for determining the scope of undeclared identifiers ! 640: during the second task. ! 641: The second and third tasks are done during a single pass ! 642: over each intermediate code file. ! 643: A single file of assembly code is produced. ! 644: .PP ! 645: The symbol table module, in the file ! 646: \fMlsym.c\fR, ! 647: is similar to the symbol table module of the translator, ! 648: except that there is an additional table for storing ! 649: field names of records. ! 650: The input module, in the file ! 651: \fMllex.c\fR, ! 652: recognizes the instructions in both the global symbol table files ! 653: and the intermediate code files. ! 654: The global symbol tables are merged by the routine \fMglobals\fR in ! 655: \fMglob.c\fR, ! 656: and the intermediate code files are produced by the routines in ! 657: \fMlcode.c\fR. ! 658: Of the remaining source files, ! 659: \fMilink.c\fR ! 660: and ! 661: \fMlmem.c\fR ! 662: contain the main program, miscellaneous support routines, ! 663: and memory initialization. ! 664: The files ! 665: \fMbuiltin.c\fR ! 666: and ! 667: \fMopcode.c\fR ! 668: contain table initializations for the list of built-in procedures ! 669: (functions) ! 670: and the ucode operations, respectively. ! 671: .PP ! 672: The first phase of the linker reads ! 673: the global symbol table file from each translator run, ! 674: and entering all the global symbols into one combined table. ! 675: The format of a global symbol table file is described in Appendix C. ! 676: This phase also builds the record/field table that cross-references ! 677: records and field names, ! 678: and sets the trace flag for execution-time tracing ! 679: if any of the files being linked were translated with the ! 680: \f3\-t\fR ! 681: option. ! 682: .PP ! 683: As records are entered into the global symbol table ! 684: and the record/field table, ! 685: they are numbered, starting from 1. ! 686: These record numbers are used to index the record/field table ! 687: at run-time when referencing a field. ! 688: .PP ! 689: When the linker encounters a link instruction, the named file is added ! 690: to the end of a linked list of files to be linked. The list initially ! 691: consists of the files named as arguments. Names are not added to the ! 692: list if they are already on it. ! 693: .PP ! 694: The second phase reads each intermediate code file in sequence, ! 695: emitting icode as each procedure is encountered. ! 696: Appendix C describes the intermediate code. ! 697: The intermediate code contains a prologue for each procedure, ! 698: beginning with a ! 699: \f3proc\fR ! 700: opcode, followed by a series of ! 701: \f3loc\fR ! 702: opcodes describing the local symbol table, a series of ! 703: \f3con\fR ! 704: opcodes describing the constant table, and a ! 705: \f3declend\fR ! 706: opcode terminating the prologue. ! 707: The local symbol table contains not only local symbols, ! 708: but all identifiers referenced in the procedure \(em ! 709: global, local, or undeclared. ! 710: When an undeclared identifier is entered into the local symbol table, ! 711: its scope is resolved by the following steps: ! 712: .in .5i ! 713: .IP \(bu \w'\(bu'u+1m ! 714: if the identifier has been entered in the global symbol table, ! 715: it is entered into the local symbol table as a global identifier ! 716: .IP \(bu ! 717: if the identifier matches the name of a function, ! 718: it is entered into the local symbol table as a function ! 719: .IP \(bu ! 720: otherwise it is entered as a local identifier ! 721: and a warning is issued if the linker was run with the ! 722: \f3\-u\fR ! 723: option ! 724: .in 0 ! 725: .LP ! 726: The constant table contains an entry for each literal used ! 727: in the procedure. ! 728: .PP ! 729: The linker outputs icode in several regions. ! 730: The first region contains constants, procedure blocks, and code ! 731: for the Icon procedures. ! 732: The next region contains the record/field table and procedure blocks ! 733: for record constructors. ! 734: The next four regions contain ! 735: the global identifiers, the names of the global identifiers, ! 736: the static identifiers, and the identifier table. ! 737: The icode is a sequence of instructions, ! 738: each with an opcode and, in some cases, operands. The sizes of ! 739: opcodes and operands depend on the machine architecture and the ! 740: implementor's judgement. On the ! 741: VAX-11, opcodes are one byte long and operands are four bytes long. ! 742: Most instructions correspond exactly to instructions ! 743: in the ucode that is output by the translator. ! 744: The opcode values are those used internally by the linker ! 745: (defined in the file ! 746: \fMlink/opcode.h\fR). ! 747: .PP ! 748: Fields are provided in the global symbol and literal tables ! 749: for associating a location with each entry. ! 750: As the prologue is being read, each cset, real, or long-integer ! 751: literal entered into the literal table is output immediately ! 752: and its location is stored in the literal table. ! 753: Thus, the locations of all constants are known before their reference. ! 754: .PP ! 755: The same is true of references to procedures, ! 756: since these references only occur in the initialization for global identifiers, ! 757: which is not output until all procedures have been output. ! 758: When the prologue for a procedure has been completely processed, ! 759: the procedure data block is output, and its location is noted in the ! 760: global symbol table. ! 761: .PP ! 762: References to program labels require backpatching, ! 763: since there often are forward references. ! 764: Because program label references are always local to the current procedure, ! 765: the linker buffers the output code for a procedure. ! 766: A table of values for all program labels is initialized to zero at ! 767: the beginning of each procedure. ! 768: When a label is referenced and its table entry is zero, ! 769: the location of the reference is negated and stored in the table entry ! 770: and a zero is output for the operand. ! 771: If a label's table entry is negative, ! 772: the location of the reference is negated and stored in the table entry ! 773: as before, but the previous value of the table entry is output for the operand. ! 774: This forms a linked list of references to the as-yet-undefined label. ! 775: When a label is defined, each reference on the linked list is replaced ! 776: with the correct value of the label. ! 777: .PP ! 778: References to global and static identifiers are determined at run-time. ! 779: The ! 780: \f3glob\fR ! 781: and ! 782: \f3static\fR ! 783: instructions have an integer operand referring to the identifier by ! 784: position in the global or static identifier region. ! 785: When one of these instructions is interpreted, ! 786: the actual address is calculated from the position and the ! 787: known address of the global or static identifier region. ! 788: References to functions are also resolved at run-time. ! 789: Each function is assigned an integer index ! 790: (its position in the table of functions in ! 791: \fMbuiltin.c\fR). ! 792: When the global identifier initialization for a function is output, ! 793: the negated index is output instead of an address. ! 794: The interpreter fills in the correct address during program initialization. ! 795: .PP ! 796: Once the prologue has been processed, ! 797: a procedure data block (see Section 3.1) ! 798: is emitted. ! 799: Opcodes following the prologue represent execution-time operations, ! 800: and cause code to be emitted. ! 801: .PP ! 802: The record/field table is a two-dimensional matrix, ! 803: first indexed by a field number assigned to each identifier that ! 804: is used as a field name, ! 805: next by a record number assigned to each record type. ! 806: The value at the selected position in the table is the ! 807: index of the field in a record of the given type, or \-1 ! 808: if the given record type does not contain the given field. ! 809: .PP ! 810: The initial value for global and static identifiers ! 811: is the null value unless the global identifier is a procedure, ! 812: function, or record constructor, ! 813: in which case the initial value is a descriptor of type ! 814: .I procedure ! 815: pointing to the appropriate procedure data block. ! 816: The values output use the data representations described ! 817: in Section 3.1. ! 818: .PP ! 819: The names of global and static identifiers are output as ! 820: .I "string qualifier" ! 821: descriptors (see Section 3.1) ! 822: and are used by the function ! 823: \fMdisplay\fR. ! 824: All string qualifiers contained in the generated procedure data blocks ! 825: and global and static names point into the identifier table, ! 826: which is just a static string space for that purpose. ! 827: .NH ! 828: The Interpreter ! 829: .PP ! 830: The interpreter consists of an interpretive loop and a collection of ! 831: run-time routines that ! 832: collectively provide support for the execution ! 833: of an Icon program. ! 834: .ig ! 835: This library is searched by the loader for those ! 836: routines necessary for a particular Icon program. ! 837: The assembly code generated by the linker contains ! 838: subroutine calls to library routines to perform most ! 839: high-level operations where in-line code would be inappropriate. ! 840: An executable program is created by assembling ! 841: the linker output, then loading a startup routine and ! 842: the assembler output with the run-time library and the C library. ! 843: The startup routine, run-time library, and C library together ! 844: form the run-time system. ! 845: .. ! 846: .PP ! 847: Three directories contain routines relating directly to source-language ! 848: operations: ! 849: \fMfunctions\fR, ! 850: \fMoperators\fR, ! 851: and ! 852: \fMlib\fR. ! 853: The first two directories contain ! 854: one routine per function or operator, respectively. ! 855: The ! 856: \fMlib\fR ! 857: directory contains routines relating to Icon control structures. ! 858: A fourth directory, ! 859: \fMrt\fR, ! 860: contains routines for performing common operations ! 861: needed by many routines in the other three directories. ! 862: In particular, ! 863: \fMrt\fR ! 864: contains routines that handle storage allocation and reclamation, ! 865: type conversion, data comparison, ! 866: integer arithmetic with overflow checking, ! 867: generator suspension, ! 868: and tracing. ! 869: The directory \fMiconx\fR contains initialization and the interpreter proper. ! 870: .PP ! 871: In each of the four run-time directories, all of the object files are ! 872: archived in a \fMLib\fR file which is randomized to speed loading. ! 873: The \fMLib\fR files are loaded together with a startup routine and the ! 874: interpretive loop to produce the interpreter. ! 875: .PP ! 876: Most of the run-time system is coded in C, ! 877: but some of the routines are coded in assembly language. ! 878: The interpretive loop and startup routines are written in assembly language, as is ! 879: integer arithmetic with overflow checking ! 880: (C does not provide this), ! 881: as well as other routines concerned with stack management. ! 882: .PP ! 883: The interpreter is loaded with the run-time libraries and the C ! 884: library to form the program \fMiconx\fR, which interprets ! 885: icode. ! 886: .PP ! 887: Before the interpreter begins executing the Icon program, ! 888: it reads in the icode file generated by the linker. ! 889: The first eight words of this file contain header information ! 890: indicating the total size of the rest of the file, ! 891: the initial value of ! 892: \fM&trace\fR, ! 893: and the relative offsets from the beginning of the file to the ! 894: various regions. ! 895: These offsets are converted to actual addresses by adding the base ! 896: address of the icode buffer. ! 897: Several pointers in the icode must also be relocated. ! 898: The interpreter sweeps through the global identifiers, ! 899: looking for procedures, functions, and record constructors. ! 900: For each function, it supplies the address of the appropriate procedure block. ! 901: For each procedure, it relocates pointers ! 902: from its procedure block to the procedure entry point, ! 903: as well as to strings representing the procedure ! 904: and local identifier names in the identifier table. ! 905: For each record, it supplies the address of ! 906: \fMmkrec\fR, ! 907: the routine that constructs new records, ! 908: as the entry point field in the procedure block. ! 909: .PP ! 910: The interpreter then begins execution by invoking the value of the first global ! 911: identifier, which corresponds to the procedure ! 912: \fMmain\fR. ! 913: If there is no main procedure, the first global identifier has the ! 914: null value and a run-time error is reported. ! 915: The routine ! 916: \fMinvoke\fR ! 917: sets ! 918: the ! 919: \fIinterpreter program counter \fR(\fIipc\fR) ! 920: to the entry point, and branches to ! 921: \fMinterp\fR, which is contained in \fMiconx/interp.s\fR. ! 922: .PP ! 923: The routine ! 924: \fMinterp\fR ! 925: is the main interpreter loop. ! 926: It fetches the next opcode, ! 927: and branches to the appropriate processing routine through a jump table. ! 928: .NH 2 ! 929: Data Representations ! 930: .PP ! 931: Icon has two elementary forms of data objects \(em values and variables. ! 932: Values often can be converted from one data type to another. ! 933: When this is done automatically, it is called ! 934: .I coercion . ! 935: There are three kinds of variables, each discussed below: ! 936: .I "natural variables" , ! 937: .I "created variables" , ! 938: and ! 939: .I "trapped variables" . ! 940: The process of obtaining the value referred to by a variable is called ! 941: .I dereferencing . ! 942: .PP ! 943: Every data object is represented by a two-word ! 944: .I descriptor , ! 945: which may, depending on the type of the object, ! 946: contain a value or ! 947: refer to some other area of memory for the actual value. ! 948: The first word of the descriptor always indicates the data type, ! 949: and the second word either contains the value or a pointer to it. ! 950: There are six descriptor formats, shown in Appendix D: ! 951: .I null , ! 952: .I "string qualifier" , ! 953: .I "integer" , ! 954: .I value , ! 955: .I variable , ! 956: and ! 957: .I "trapped variable" . ! 958: These formats are distinguished from one another by the three high-order ! 959: bits of the first word, ! 960: except that a ! 961: .I null ! 962: descriptor is distinguished from a ! 963: .I "string qualifier" ! 964: only by the contents of the second word. ! 965: Among ! 966: .I "integer" , ! 967: .I value , ! 968: and ! 969: .I "trapped variable" ! 970: descriptors, the low-order six bits of the first word ! 971: identify the type of object represented, ! 972: while the remaining high-order bits in the first word are flags that ! 973: classify the object (for example, whether the second word contains ! 974: a pointer ! 975: \(em historically, a \*(oqfloating address\*(cq [7]). ! 976: .PP ! 977: The ! 978: .I null ! 979: descriptor represents the null value. ! 980: A ! 981: .I "string qualifier" ! 982: represents a string, and contains the length of the string ! 983: and a pointer to the first character of the string. ! 984: An ! 985: .I "integer" ! 986: descriptor represents an integer small enough ! 987: to fit in the second word of the descriptor. ! 988: This includes all integers on computers whose C \fIints\fR are the same ! 989: size as C \fIlongs\fR (such as the VAX-11). ! 990: All data types other than \f3integer\fR, \f3string\fR, and \f3null\fR are ! 991: represented by ! 992: .I value ! 993: descriptors. ! 994: A value descriptor ! 995: contains a pointer to a ! 996: .I "data block" ! 997: of appropriate format for a value of the given type. ! 998: On computers whose C \fIlongs\fR are longer than C \fIints\fR ! 999: (such as the PDP-11), ! 1000: an integer that requires more bits than there are in an \fIint\fR is contained ! 1001: in a \fIlong integer\fR data block. ! 1002: The data block formats for each data type are shown in Appendix D. ! 1003: .PP ! 1004: A ! 1005: .I variable ! 1006: descriptor represents either a natural variable ! 1007: or a created variable. ! 1008: A natural variable contains a pointer to a descriptor at a fixed location ! 1009: (for a global identifier) or a location on the stack (for a local identifier) ! 1010: where the value of the variable is stored. ! 1011: A created variable, formed by a table, list, or field reference, ! 1012: contains a pointer to a descriptor in the aggregate ! 1013: where the referenced element is located. ! 1014: Since such elements are in the heap, ! 1015: created variables also contain an offset that indicates ! 1016: the distance (in words) from the beginning of the data block ! 1017: to the referenced descriptor. ! 1018: This offset is used during the marking phase of garbage collection, ! 1019: discussed in Section 3.3. ! 1020: .PP ! 1021: A ! 1022: .I "trapped variable" ! 1023: [8] descriptor represents a variable for which special action is necessary ! 1024: upon dereferencing or assignment. ! 1025: Such variables include substrings, ! 1026: non-existent elements of tables, ! 1027: and certain keywords. ! 1028: Each type of trapped variable is distinguished by the first word ! 1029: of the descriptor. ! 1030: .PP ! 1031: Substring trapped variables, ! 1032: created by a section or subscripting operation, ! 1033: contain a pointer to a data block that contains a ! 1034: .I variable ! 1035: descriptor identifying the value from which the substring was taken, ! 1036: an integer indicating the beginning position of the substring, ! 1037: and an integer showing the length of the substring. ! 1038: With this information, assignment to a substring of a variable ! 1039: can modify the contents of the variable properly. ! 1040: Substrings of non-variables do not produce ! 1041: substring trapped variables, since assignment to such substrings ! 1042: is meaningless and illegal; ! 1043: instead, forming the substring of a non-variable produces a ! 1044: string qualifier. ! 1045: .PP ! 1046: Table element trapped variables, ! 1047: formed by referencing a non-existent element of a table, ! 1048: similarly contain a pointer to a data block that contains ! 1049: enough information for assignment to add the element to the ! 1050: referenced table or to supply the default table value. ! 1051: .PP ! 1052: The keywords ! 1053: \fM&pos\fR, ! 1054: \fM&random\fR, ! 1055: and ! 1056: \fM&trace\fR ! 1057: are handled via trapped variables (\fM&subject\fR is handled differently). ! 1058: These trapped variables ! 1059: need no additional information. ! 1060: It is sufficient to know the type of trapped variable on dereferencing \(em ! 1061: the value of the keyword can be accessed and returned. ! 1062: On assignment, the new value is coerced to the appropriate type, ! 1063: checked for validity, and assigned to the keyword. ! 1064: .PP ! 1065: Strings formed during program execution are placed in the ! 1066: .I "string space" ; ! 1067: string qualifiers for these strings point into this region. ! 1068: Substrings of existing strings are not allocated again; ! 1069: instead, a string qualifier is formed that points into the ! 1070: existing string. ! 1071: When storage is exhausted in the string space, ! 1072: the garbage collector (see Section 3.3) ! 1073: is invoked to reclaim unused space and compact the region; ! 1074: if enough space cannot be reclaimed, the region is expanded if possible. ! 1075: .PP ! 1076: Data blocks formed during program execution are placed in the ! 1077: .I heap . ! 1078: Data blocks have a rigid format dictated by the garbage collection ! 1079: algorithm. ! 1080: The first word of the block always contains a type code ! 1081: which identifies the structure of the rest of the block. ! 1082: Descriptors follow all non-descriptor information in the block. ! 1083: If the size of the block is not determined by its type, ! 1084: the size (in bytes) is contained in the second word of the block. ! 1085: When storage is exhausted in the heap, ! 1086: the garbage collector is invoked to reclaim unused space and compact the heap; ! 1087: if enough space cannot be reclaimed, the heap is expanded if possible. ! 1088: .PP ! 1089: Co-expression stack blocks are allocated in a separate region and ! 1090: are treated specially by the garbage collector. ! 1091: .NH 2 ! 1092: Stack Organization ! 1093: .PP ! 1094: The stack is the focus of activity ! 1095: during the execution of an Icon program. ! 1096: All operators, functions, and procedures ! 1097: expect to find their arguments at the top of the stack, ! 1098: and replace the arguments with the result of their computation. ! 1099: Local variables for Icon procedures are also kept on the stack. ! 1100: The arguments, local variables, and temporaries on the stack ! 1101: for an active Icon procedure are collectively called a ! 1102: .I "procedure frame" . ! 1103: This is one of several kinds of ! 1104: .I "stack frames" ! 1105: discussed in this section. ! 1106: Appendix E summarizes the layouts of the stack frames for the PDP-11 and VAX-11. ! 1107: See [3] for a detailed discussion of stack frames. ! 1108: Each co-expression also has a stack. For uniformity, the main stack ! 1109: is treated as the stack for the co-expression \fM&main\fR. ! 1110: .PP ! 1111: On the PDP-11 and VAX-11 stacks start in high memory and grow downward. ! 1112: On these computers, a push causes the stack pointer to decrease and a ! 1113: pop causes the stack pointer to increase. Thus ``above'' and ``below'' ! 1114: refer, respectively, to ``older'' and ``newer'' information on the ! 1115: stack. An exception to this is the phrase ! 1116: ``top of the stack'', which is used to ! 1117: refer to the \fIlowest\fR memory location. The description of relative ! 1118: stack locations that follows is based on this kind of architecture ! 1119: and nomenclature. ! 1120: .PP ! 1121: Before an Icon procedure calls another Icon procedure, ! 1122: the caller pushes the procedure to be called ! 1123: (a descriptor \(em procedures are data objects in Icon) ! 1124: onto the stack. ! 1125: The caller then pushes each argument (also a descriptor) onto the stack, ! 1126: leftmost argument first. ! 1127: The caller then pushes one word onto the stack ! 1128: indicating the number of arguments supplied, ! 1129: which may be different from the number of arguments expected. ! 1130: The run-time library routine ! 1131: \fMinvoke\fR ! 1132: is then called, which checks that the first descriptor pushed above ! 1133: actually does represent an integer, procedure, or a variable whose value is ! 1134: an integer or a procedure. ! 1135: An integer indicates the selection of one of the ! 1136: arguments resulting from mutual evaluation. ! 1137: A procedure, on the other hand, ! 1138: points to a procedure data block, ! 1139: which contains various information about the called procedure, ! 1140: including the number of arguments expected, ! 1141: the number of local variables used, ! 1142: and the procedure's entry point address. ! 1143: The routine ! 1144: \fMinvoke\fR ! 1145: next adjusts the number of arguments supplied to match the number expected, ! 1146: deleting excess arguments or supplying the null value for missing ones. ! 1147: This adjustment is performed by moving the portion of the stack ! 1148: below the arguments up or down, as appropriate. ! 1149: It then dereferences the arguments. ! 1150: A ! 1151: .I "procedure marker" ! 1152: is then pushed onto the stack, ! 1153: and the ! 1154: .I "procedure frame pointer" ! 1155: is set to point to the new procedure marker. ! 1156: The procedure marker contains, among other things, ! 1157: the return address in the calling procedure ! 1158: and the previous value of the procedure frame pointer. ! 1159: Next, the null value is pushed onto the stack ! 1160: as the initial value for each local variable. ! 1161: The routine ! 1162: \fMinvoke\fR ! 1163: then transfers control to the procedure's entry point, ! 1164: and execution of the Icon program resumes in the new procedure. ! 1165: .PP ! 1166: When a procedure is ready to return to its caller, ! 1167: it pushes its return value (a descriptor) on the stack. ! 1168: It then transfers control to ! 1169: \fMpret\fR, ! 1170: which moves the return value ! 1171: to the location occupied by the descriptor that represented the ! 1172: called procedure. ! 1173: That is, the return value is stored in place of the ! 1174: first descriptor that was pushed ! 1175: at the beginning of the calling sequence described above. ! 1176: The return sequence ! 1177: then restores the state of the previous procedure ! 1178: from the current procedure marker ! 1179: (the procedure marker that the procedure frame pointer ! 1180: currently points to). ! 1181: This includes restoring the previous value of the procedure frame pointer, ! 1182: retrieving the return address, ! 1183: and popping the returning procedure's local variables, ! 1184: procedure marker, and arguments. ! 1185: Thus, when the calling procedure regains control, ! 1186: the arguments have been popped ! 1187: and the return value is now at the top of the stack. ! 1188: .PP ! 1189: Functions and operators are written in C, ! 1190: and therefore obey the C calling sequence. ! 1191: By design, the Icon calling sequence described above ! 1192: is similar to the C calling sequence. ! 1193: When an Icon procedure calls a function, a ! 1194: .I boundary ! 1195: on the stack is introduced, ! 1196: where the stack above the boundary is regimented by Icon standards, ! 1197: and the stack below the boundary contains C information. ! 1198: This boundary is important during garbage collection: ! 1199: The garbage collector must ignore the area of the stack ! 1200: below the boundary, ! 1201: since the structure of this area is unknown, ! 1202: whereas the structure of the area above the boundary is ! 1203: well-defined. ! 1204: In particular, all data above the boundary is contained ! 1205: in descriptors or is defined by the structure of a frame, ! 1206: so that all pointers into the heap or string space may be ! 1207: located during a garbage collection. ! 1208: .PP ! 1209: Functions and operators are written to \*(oqstraddle\*(cq ! 1210: the boundary. ! 1211: From above, they are designed to resemble Icon procedures; ! 1212: from below, they are C procedures. ! 1213: An Icon procedure calls a function in the same way as it ! 1214: calls another Icon procedure; ! 1215: in fact, functions are procedure-typed ! 1216: data objects just as Icon procedures are. ! 1217: When ! 1218: \fMinvoke\fR ! 1219: recognizes that a function is being called, ! 1220: it bypasses the argument adjustment ! 1221: if the field in the procedure data block that indicates ! 1222: the number of arguments expected contains \-1, ! 1223: which indicates that the function can take an arbitrary number of arguments. ! 1224: It also does not allocate stack space for local variables, since ! 1225: any such variables are C variables and are allocated by the C ! 1226: function itself. ! 1227: C procedures have an entry sequence that creates a new ! 1228: procedure frame; ! 1229: since ! 1230: \fMinvoke\fR ! 1231: has already done this, ! 1232: the entry point for functions must be set past any instructions ! 1233: that are involved in procedure frame creation. ! 1234: .PP ! 1235: The first formal parameter of ! 1236: all functions is ! 1237: \fMnargs\fR, ! 1238: which corresponds to the word that contains the number of ! 1239: arguments supplied. ! 1240: For functions that expect a fixed number of arguments, ! 1241: they are also listed as arguments, in reverse order. ! 1242: For functions that can take an arbitrary number of arguments, ! 1243: there is a macro ! 1244: \fMARG(\fIn\fM)\fR ! 1245: that uses the address and contents ! 1246: of ! 1247: \fMnargs\fR ! 1248: to calculate the location of the ! 1249: .I n \^th ! 1250: argument. ! 1251: Thus, ! 1252: \fMARG(1)\fR ! 1253: accesses the first argument (as a descriptor), ! 1254: and ! 1255: \fMARG(nargs)\fR ! 1256: accesses the last argument. ! 1257: Each function is responsible for supplying defaults for missing arguments ! 1258: and for dereferencing arguments that are variables. ! 1259: Because of the calling protocol, ! 1260: \fMARG(0)\fR ! 1261: accesses the location where the return value should be stored. ! 1262: Every function must place its result there and ! 1263: then return through normal C conventions. ! 1264: Each function must also supply a procedure data block that contains ! 1265: the number of arguments expected (or \-1), its entry point, ! 1266: and a string qualifier representing its name. ! 1267: .PP ! 1268: Operators are very similar to functions. The only difference is that ! 1269: operators are called directly (rather than being called through \fMinvoke\fR) ! 1270: and must set the boundary themselves. ! 1271: .ig ! 1272: .PP ! 1273: Operators are written like functions, ! 1274: with one exception. ! 1275: Since operators are not variables ! 1276: (as function and procedure names are), ! 1277: the name of the operator is known at translation time, ! 1278: and the Icon procedure calls it directly ! 1279: (at its normal entry point) ! 1280: without going through ! 1281: \fMinvoke\fR. ! 1282: Thus, operators do not need procedure data blocks. ! 1283: Instead of pushing a procedure descriptor pointing to a procedure block, ! 1284: a null value is pushed instead to hold a place for the return value. ! 1285: .. ! 1286: .PP ! 1287: When an operator or function fails to produce a result, ! 1288: it calls ! 1289: \fMfail\fR. ! 1290: This routine initiates backtracking as described below. ! 1291: .PP ! 1292: Expressions are evaluated within an ! 1293: .I "expression frame" . ! 1294: When the evaluation of an expression is complete, ! 1295: whether it has produced a result or failed, ! 1296: the expression frame must be popped from the stack ! 1297: and the result of the expression must be pushed back onto the stack. ! 1298: The expression frame marks the stack height at the point ! 1299: that the expression began to be evaluated, ! 1300: so that the stack may be restored to its original state ! 1301: when the evaluation of the expression is complete. ! 1302: The stack normally would be restored to the original height ! 1303: (that is, the pops would match the pushes) ! 1304: except when an expression fails at some midpoint in its evaluation. ! 1305: The expression frame is also used to limit backtracking: ! 1306: backtracking is restricted in the language ! 1307: to the current expression instance only. ! 1308: .PP ! 1309: When evaluation of an expression begins, an ! 1310: .I "expression marker" ! 1311: is pushed on the stack, the ! 1312: .I "expression frame pointer" ! 1313: is set to point to it, and the ! 1314: .I "generator frame pointer" , ! 1315: discussed below, is cleared. ! 1316: The marker contains the previous values of the ! 1317: expression and generator frame pointers ! 1318: and a failure label. ! 1319: When an expression produces a result, ! 1320: that result, on the top of the stack, ! 1321: is popped and saved. ! 1322: Then the stack is popped to the expression marker, ! 1323: and the previous values of the two frame pointers are restored. ! 1324: The marker is popped ! 1325: and the result of the expression is pushed back onto the stack, ! 1326: now a part of the previous expression frame. ! 1327: If an expression fails to produce a result, ! 1328: \fMfail\fR ! 1329: pops the stack to the expression marker, ! 1330: restores the previous values of the two frame pointers, ! 1331: and branches to the failure label. ! 1332: In the special case that the failure label is zero, ! 1333: \fMfail\fR ! 1334: is effectively called again to indicate failure in the new expression frame. ! 1335: Thus the failure is propagated from one expression to an enclosing one. ! 1336: .PP ! 1337: If an expression has any generators, ! 1338: then there is a ! 1339: .I "generator frame" ! 1340: within the current expression frame ! 1341: for each generator that is inactive ! 1342: (that is, that has produced a value but is not yet exhausted). ! 1343: A generator frame preserves the state of the stack ! 1344: at the point just before the generator ! 1345: (whether it be operator, function, or procedure) ! 1346: suspended (became inactive). ! 1347: If ! 1348: \fMfail\fR ! 1349: is called and there are inactive generators, ! 1350: then instead of exiting the current expression frame, ! 1351: the most recently inactivated generator is reactivated ! 1352: by restoring the stack to the state saved in the most recent ! 1353: generator frame. ! 1354: .PP ! 1355: A function or operator suspends itself by calling ! 1356: \fMsuspend\fR. ! 1357: This routine preserves the state of the stack ! 1358: by duplicating the current expression frame, ! 1359: bounded on one end by the most recent generator frame ! 1360: (or, if there are no inactive generators, the current expression frame) ! 1361: and bounded on the other end by the beginning of the ! 1362: argument list of the suspending function or operator. ! 1363: A generator marker is pushed onto the stack, ! 1364: followed by the duplicate expression frame. ! 1365: The routine ! 1366: \fMsuspend\fR ! 1367: then causes the suspending function or operator to return to its caller, ! 1368: instead of itself returning. ! 1369: .PP ! 1370: When reactivated by ! 1371: \fMfail\fR, ! 1372: the stack is restored to the generator marker, ! 1373: which is used to restore the various frame pointers. ! 1374: Then the marker is popped. ! 1375: The stack is then in the same state that it was in when ! 1376: \fMsuspend\fR ! 1377: was called. ! 1378: The routine ! 1379: \fMfail\fR ! 1380: then returns to the generator ! 1381: as if the original call to ! 1382: \fMsuspend\fR ! 1383: had returned. ! 1384: Thus the following schema is typical of operators and functions ! 1385: that generate a sequence of values. ! 1386: .DS ! 1387: .ss 9 ! 1388: .ft M ! 1389: \fIinitialize\fM; ! 1390: \fMwhile\fM (\fInot exhausted\fM) { ! 1391: \fIcompute next value\fM; ! 1392: \fIstore return value\fM; ! 1393: suspend() ! 1394: } ! 1395: fail(); ! 1396: .ft R ! 1397: .ss 4 ! 1398: .DE ! 1399: The effect of resuming an expression containing generators is that ! 1400: \fMsuspend\fR ! 1401: actually causes the generator to return. ! 1402: If alternatives are needed, backtracking occurs, ! 1403: and the apparent effect is that ! 1404: \fMsuspend\fR ! 1405: has finally returned. ! 1406: The generator computes the next result, ! 1407: and suspends with it. ! 1408: When the generator is exhausted, ! 1409: it merely fails without suspending, ! 1410: which just passes the failure back to the next most recently inactivated generator, ! 1411: if any. ! 1412: .PP ! 1413: Just as functions and operators can return normally, suspend, or fail, ! 1414: so can Icon procedures. ! 1415: The mechanics are essentially the same, but the differences in stack ! 1416: layout require different primitives. ! 1417: When Icon procedures return normally, the return value is presumed to ! 1418: be at the top of the stack and ! 1419: \fMpret\fR ! 1420: is called. ! 1421: Similarly, Icon procedures call ! 1422: \fMpsusp\fR ! 1423: to suspend. ! 1424: Both of these routines also dereference the return result ! 1425: if it is a local variable. ! 1426: The routine ! 1427: \fMpfail\fR ! 1428: causes an Icon procedure to return with no result. ! 1429: .PP ! 1430: The same three primitives are also needed at the expression level: ! 1431: \fMeret\fR, ! 1432: \fMesusp\fR, ! 1433: and ! 1434: \fMefail\fR. ! 1435: Unlike ! 1436: \fMunmark\fR, ! 1437: \fMeret\fR ! 1438: is not a library routine, but is generated as in-line code. ! 1439: Both cause an exit from the current expression frame; but ! 1440: \fMeret\fR ! 1441: supplies a result to the enclosing expression, ! 1442: while ! 1443: \fMunmark\fR ! 1444: does not. ! 1445: The routine ! 1446: \fMesusp\fR ! 1447: creates a inactive generator before supplying a result to the ! 1448: enclosing expression; ! 1449: it is used by the alternation control structure. ! 1450: The routine ! 1451: \fMefail\fR ! 1452: simply causes backtracking within the current expression frame. ! 1453: In fact, ! 1454: \fMfail\fR ! 1455: and ! 1456: \fMpfail\fR ! 1457: merely exit their procedure frame before branching to ! 1458: \fMefail\fR. ! 1459: .NH 2 ! 1460: Storage Allocation and Reclamation ! 1461: .PP ! 1462: During program execution, storage allocation is necessary when a data ! 1463: object is created. ! 1464: The three primitive routines ! 1465: \fMallocate\fR, ! 1466: \fMalcstr\fR, and \fMalcestk\fR ! 1467: allocate storage in the heap, string space, and co-expression stack space, respectively. ! 1468: All three routines return pointers to the beginning of newly ! 1469: allocated space. ! 1470: None of the routines is responsible for ensuring that enough space ! 1471: remains in the data regions. ! 1472: Ensuring that enough space remains in the data regions is ! 1473: the responsibility of a ! 1474: .I "predictive need" ! 1475: strategy described below. ! 1476: .PP ! 1477: In the heap, ! 1478: \fMallocate(n)\fR ! 1479: returns a pointer to ! 1480: \fMn\fR ! 1481: contiguous bytes of storage. ! 1482: Because a wide variety of objects may reside in the heap, a number of ! 1483: support routines are provided to simplify the storing of various objects. ! 1484: There is a specific routine to allocate a block for each datatype in the heap. ! 1485: Where appropriate, these routines have the actual values to be stored ! 1486: as their arguments. ! 1487: All of the routines call ! 1488: \fMallocate\fR ! 1489: to obtain storage for the object. ! 1490: .PP ! 1491: In the string space, ! 1492: \fMalcstr(s,\*bl)\fR ! 1493: allocates room for a string of length ! 1494: \fMl\fR ! 1495: and copies the string pointed to by ! 1496: \fMs\fR ! 1497: into this space. ! 1498: Since some routines such as ! 1499: \fMleft\fR, ! 1500: \fMright\fR, ! 1501: and ! 1502: \fMcenter\fR ! 1503: need room in the string space in which to construct a string, ! 1504: a call to ! 1505: \fMalcstr\fR ! 1506: with the defined constant ! 1507: \fMNULL\fR ! 1508: as the first argument results in the ! 1509: allocation of storage without attempting to copy a string. ! 1510: .PP ! 1511: In the co-expression stack region, \fMalcestk()\fR allocates a new ! 1512: co-expression stack. ! 1513: .PP ! 1514: Source code for all of the allocation routines is contained in the file ! 1515: \fMrt/alc.c\fR. ! 1516: Almost all interaction with the storage management is made through these ! 1517: routines. ! 1518: Two exceptions occur in string concatenation (\fMoperators/cat.c\fR) and ! 1519: reading a fixed number of ! 1520: bytes from a file (\fMfunctions/reads.c\fR). ! 1521: In these cases, it is simpler and more efficient to have these operations ! 1522: deal directly with storage management. ! 1523: .PP ! 1524: As mentioned earlier, a ! 1525: .I "predictive need" ! 1526: strategy is employed to ensure that enough room remains for data storage. ! 1527: Simply put, ! 1528: .I "predictive need" ! 1529: states that it is the responsibility of any routine ! 1530: that calls an allocation routine both to ensure that enough room remains ! 1531: in the proper data region and to maintain the validity of any temporary ! 1532: pointers into the data regions, should a ! 1533: .I "garbage collection" ! 1534: be necessary to free storage space. ! 1535: .PP ! 1536: Since the check for storage space only needs to occur before the allocation ! 1537: takes place, each routine may perform this check at its convenience. ! 1538: This approach permits the minimization of the number of temporary ! 1539: pointers that must be protected during garbage collection. ! 1540: As an aid, space for several descriptors is automatically protected ! 1541: by the procedure invocation mechanism, and usually is used to hold ! 1542: information pertaining to the arguments of the procedure (see Section 3.4). ! 1543: .PP ! 1544: Routines to ensure space are provided for each of the three ! 1545: storage regions. ! 1546: The routine ! 1547: \fMsneed(n)\fR ! 1548: ensures that at least ! 1549: \fMn\fR ! 1550: bytes of storage remain in the string space, and ! 1551: \fMhneed(n)\fR ! 1552: performs the same function in the heap. ! 1553: \fMesneed()\fR ensures that there is a co-expression stack ! 1554: available. ! 1555: If either routine finds that there is insufficient storage remaining, ! 1556: it invokes the ! 1557: .I "garbage collector" ! 1558: in an attempt to obtain that storage. ! 1559: If that fails, then program execution is aborted with an appropriate ! 1560: diagnostic. ! 1561: .PP ! 1562: Garbage collection, or ! 1563: .I "storage reclamation" , ! 1564: is a process that identifies all valid allocated data. ! 1565: In the string and heap regions, valid data is compacted ! 1566: in order to provide a contiguous area of unused storage. ! 1567: The algorithm used for identifying valid data is based upon ! 1568: the algorithm described by Hanson [7]. ! 1569: Only the more novel features are discussed here. ! 1570: .PP ! 1571: Whenever a predictive need request discovers that insufficient ! 1572: storage remains in either the heap or string space, the garbage ! 1573: collector is invoked to reclaim unused space in all regions. ! 1574: This approach is more efficient in situations where all regions ! 1575: are heavily allocated and only slightly less efficient otherwise. ! 1576: .PP ! 1577: The approach is to sweep through the permanent data regions and the ! 1578: stack, looking for descriptors that are either pointers into the heap or ! 1579: string qualifiers. ! 1580: When a string qualifier is found, a pointer to that qualifier is ! 1581: saved in a temporary data region at the end of the heap. ! 1582: If the descriptor is a pointer into the heap, then that heap data ! 1583: block contains valid information. ! 1584: The block is marked as valid, the descriptor is placed on a back chain ! 1585: headed in the block, ! 1586: and the marking process is called ! 1587: recursively on any descriptors within that block. ! 1588: Blocks that are already marked as valid are not processed a second time. ! 1589: To simplify the marking of heap blocks, all data blocks have been ! 1590: designed so that all descriptors within them exist as a contiguous ! 1591: section at the end of the block. ! 1592: Thus to sweep through the descriptors within a block, the marking ! 1593: algorithm need only know the size of the block and the ! 1594: location of the first descriptor. ! 1595: Information concerning a data block's size, as well as the ! 1596: offset for the first descriptor is in the file ! 1597: \fMrt/dblocks.c\fR. ! 1598: .PP ! 1599: Valid co-expression stacks also may contain string qualifiers and ! 1600: pointers to other valid data; such stacks are included in the ! 1601: marking phase. ! 1602: .PP ! 1603: After the marking phase is completed, the string region is compacted. ! 1604: The algorithm used is described by Hanson [9]. ! 1605: The pointers to the string qualifiers are sorted so that the order of ! 1606: all valid strings within the string space is identified. ! 1607: The string qualifiers are then processed in order, and modified as ! 1608: the valid strings are compacted. ! 1609: If this compaction does not free enough space within the ! 1610: string space to satisfy the request, the heap must be moved in order ! 1611: to provide more room in the string space. ! 1612: An attempt is also made to provide ! 1613: some additional ! 1614: ``breathing room'' ! 1615: in the string space to permit future expansion. ! 1616: .PP ! 1617: The heap cannot be moved until after the valid pointers into it are ! 1618: adjusted and the storage is compacted. ! 1619: The pointer adjustment and heap compaction phases are two linear passes through ! 1620: the heap which must be performed during standard heap garbage collection. ! 1621: The only difference when the heap is to be moved is that the adjusted ! 1622: pointers point to where that data will be after the heap has been moved. ! 1623: If not enough breathing room is freed in the heap, then ! 1624: more space is requested from the operating system. ! 1625: As a last step, if the string space needs more room, the heap is ! 1626: relocated. ! 1627: .PP ! 1628: This method has proved to be quite satisfactory for most applications. ! 1629: A shortcoming of the implementation is the absence of a process for ! 1630: decreasing the size of a data region, should it become too large. ! 1631: It is also possible that insufficient room would be available for storing ! 1632: the pointers to the string qualifiers, even though enough storage ! 1633: would become available if the heap were collected separately. ! 1634: In practice, this has not been a problem. ! 1635: The source code for the garbage collector is contained in the files ! 1636: \fMrt/gcollect.s\fR, \fMrt/gc.c\fR, and \fMrt/sweep.c\fR. ! 1637: .NH 2 ! 1638: Coding Conventions ! 1639: .PP ! 1640: The calling conventions for functions and operators have been mentioned ! 1641: earlier. ! 1642: Several other aspects of the run-time system are explained here. ! 1643: .PP ! 1644: All header files for the run-time system are in the directory ! 1645: \fMh\fR. ! 1646: The file ! 1647: \fMh/rt.h\fR ! 1648: (or, for assembly-language routines, ! 1649: \fMh/defs.s\fR) ! 1650: is included by almost every source file in the run-time system, and contains ! 1651: machine-dependent defined constants, run-time data structure declarations, ! 1652: and defined constants and macros for flags, ! 1653: type codes, argument accessing, and bit manipulations. ! 1654: .PP ! 1655: During the execution of an Icon program, many type ! 1656: conversions are done on temporary values, where data storage is not ! 1657: required beyond the bounds of the current operation. ! 1658: For this reason, the type conversion routines all operate with pointers passed ! 1659: to them that reference ! 1660: buffers in the calling procedure. ! 1661: Any routine calling for type conversion must determine if ! 1662: heap or string space storage is needed, and perform the ! 1663: allocation. ! 1664: Most of the conversion routines return the type of the result or ! 1665: \fMNULL\fR ! 1666: if the conversion cannot be performed. ! 1667: One exception is ! 1668: \fMcvstr\fR ! 1669: which, in addition to ! 1670: \fMNULL\fR, ! 1671: returns ! 1672: \fM2\fR ! 1673: if the object was already a string, and ! 1674: \fM1\fR ! 1675: if the object ! 1676: had to be converted to a string. ! 1677: This distinction makes it possible to avoid a large number of ! 1678: predictive-need checks. ! 1679: The second exception is ! 1680: \fMcvnum\fR, ! 1681: which returns either ! 1682: real numbers or integers ! 1683: and makes no attempt to distinguish between ! 1684: short and long integers. ! 1685: .PP ! 1686: As mentioned in Section 3.3, there is space set aside to hold ! 1687: temporary descriptors and to protect the validity of these ! 1688: descriptors during garbage collection. ! 1689: The garbage collector knows about this region, and ! 1690: .I tends ! 1691: it during storage reclamation. ! 1692: The region is defined in the file ! 1693: \fMh/gc.h\fR, ! 1694: and is bounded by the labels ! 1695: \fMtended\fR ! 1696: and ! 1697: \fMetended\fR. ! 1698: This area can be referenced from C by considering ! 1699: \fMtended\fR ! 1700: to be an array of descriptors. ! 1701: Since a garbage collection can occur ! 1702: only during a call to ! 1703: \fMsneed\fR ! 1704: or ! 1705: \fMhneed\fR, ! 1706: or between suspension and reactivation, ! 1707: the only places where C routines need to ensure that all pointers ! 1708: into the heap or string space are tended ! 1709: are just before calls to ! 1710: \fMsneed\fR, ! 1711: \fMhneed\fR, ! 1712: or ! 1713: \fMsuspend\fR. ! 1714: .PP ! 1715: All function names are preceded by the letter ! 1716: \fMX\fR, ! 1717: and their procedure blocks are preceded by the letter ! 1718: \fMB\fR. ! 1719: This prevents name collisions between Icon procedures and other routines, ! 1720: such as those for operators, type conversions, and storage management. ! 1721: Reference from the generated code to functions is made entirely through ! 1722: the procedure block; ! 1723: the entry point field of the procedure block references the function itself. ! 1724: .NH ! 1725: Modifying the Implementation ! 1726: .PP ! 1727: This section is intended to serve as a brief guide ! 1728: for those who wish to modify the Icon system. ! 1729: It is not comprehensive; ! 1730: it only points to various parts of the ! 1731: implementation that need to be considered ! 1732: when making various kinds of changes. ! 1733: .PP ! 1734: Perhaps the most common kind of change that one might expect to make ! 1735: is to add new functions (built-in procedures). ! 1736: To add a function, first write it according to the conventions ! 1737: described in Section 3.4. ! 1738: (Use an existing function similar to the new one as a prototype. ! 1739: Appendix F contains several example functions.) Be ! 1740: especially careful to observe the rules ! 1741: concerning storage allocation and tended descriptors. ! 1742: .PP ! 1743: .c note PI here ! 1744: Prepare to add the new function to the run-time library ! 1745: by moving the source code into the ! 1746: \fMfunctions\fR ! 1747: directory and adding its name to ! 1748: \fMfunctions/Makefile\fR ! 1749: (the name must be added in three places \(em ! 1750: there are many examples already in the \fMMakefile\fR). ! 1751: Then add the name to \fMh/pdef.h\fR ! 1752: in proper alphabetical order. ! 1753: Use other functions as a guide to the format of the entry. ! 1754: .PP ! 1755: When all changes have been made to the source code, ! 1756: go to the Icon root directory and run ! 1757: .Ds ! 1758: make Icon ! 1759: .De ! 1760: This runs ! 1761: \fMmake\fR ! 1762: in each of the system directories \(em ! 1763: \fMtran\fR, ! 1764: \fMlink\fR, ! 1765: \fMfunctions\fR, ! 1766: \fMoperators\fR, ! 1767: \fMlib\fR, ! 1768: \fMrt\fR, and \fMiconx\fR ! 1769: \(em and then copies the new versions into the ! 1770: \fMbin\fR ! 1771: directory [10]. ! 1772: .PP ! 1773: Adding a new operator is more complicated. ! 1774: Again, the first step is to write the routine, place it in the ! 1775: \fMoperators\fR ! 1776: directory, and add the appropriate entry to the ! 1777: \fMMakefile\fR ! 1778: there. ! 1779: Next, the operator must be added to the translator, as follows: ! 1780: .IP (1) \w'(1)'u+1n ! 1781: Add the operator to the operator table in ! 1782: \fMtran/optab\fR; ! 1783: the structure of the table is described in Section 1.1. ! 1784: .IP (2) ! 1785: Create a unique name for the new token ! 1786: and make a new token table entry in ! 1787: \fMtran/tokens\fR ! 1788: in the operators section of the table. ! 1789: Although the operators section of the table is in alphabetical order ! 1790: by token name ! 1791: as distributed, ! 1792: there is no need to preserve this order. ! 1793: .IP (3) ! 1794: If a running Version 5 of Icon is not available, ! 1795: edit the files ! 1796: \fMtran/optab.c\fR ! 1797: and ! 1798: \fMtran/toktab.c\fR ! 1799: to correspond to the changes made in steps 1 and 2. ! 1800: This sometimes involves a renumbering of token table entries ! 1801: in both files (but nowhere else). ! 1802: If a running Version 5 of Icon is available, ! 1803: a ! 1804: \fMmake\fR ! 1805: in ! 1806: \fMtran\fR ! 1807: executes ! 1808: \fMmktoktab\fR ! 1809: to produce the new token tables. ! 1810: .IP (4) ! 1811: Add the operator to the grammar in ! 1812: \fMtran/icon.g\fR. ! 1813: The token name must be added to the list of terminal symbols ! 1814: at the beginning of the grammar file, ! 1815: and the operator must be inserted into the syntax at the ! 1816: appropriate precedence level. ! 1817: If the precedence is the same as that of an existing operator, ! 1818: simply add the operator as an alternative to the existing production; ! 1819: otherwise, insert a new production, ! 1820: and change the production at the next lower precedence level ! 1821: to refer to the new one. ! 1822: The semantic action should create either a ! 1823: .I BINOP ! 1824: or a ! 1825: .I UNOP ! 1826: node in the parse tree; ! 1827: use existing actions as a prototype. ! 1828: .IP (5) ! 1829: The new operator must now be added to the code generator in ! 1830: \fMtran/code.c\fR. ! 1831: Insert a case in either of the routines ! 1832: \fMbinop\fR ! 1833: or ! 1834: \fMunop\fR ! 1835: for the new token name ! 1836: that assigns a new intermediate code opcode to ! 1837: \fMname\fR, ! 1838: as for other operators \(em ! 1839: this causes the new opcode to be emitted into the ucode. ! 1840: The opcode should have the same name as the library routine ! 1841: that performs the operation. ! 1842: .IP (6) ! 1843: The new intermediate code opcode must also be added to the linker. ! 1844: Add a defined constant to ! 1845: \fMlink/opcode.h\fR; ! 1846: order here is not important. ! 1847: Then add the opcode name and the defined constant to ! 1848: \fMlink/opcode.c\fR; ! 1849: alphabetical order must be preserved here, ! 1850: since a binary search is used. ! 1851: Then edit the code generator in ! 1852: \fMlink/lcode.c\fR, ! 1853: adding a case in the routine ! 1854: \fMgencode\fR ! 1855: with either the binary or the unary operators. ! 1856: The standard processing here emits code that ! 1857: evaluates the operand(s), ! 1858: then calls a library routine ! 1859: with the same name as the intermediate code opcode. ! 1860: .IP (7) ! 1861: Add entries for the operator to \fMh/pnames.h\fR, using other operators as a guide. ! 1862: .LP ! 1863: The system is then be ready to be made as described above. ! 1864: .PP ! 1865: Adding a new control structure is similar in nature to ! 1866: adding a new operator. ! 1867: Most often, a new reserved word must be added to ! 1868: \fMtran/tokens\fR; ! 1869: this part of the token table must be kept in alphabetical order. ! 1870: The new token must be added to the grammar, ! 1871: and productions must be added, ! 1872: usually at the highest precedence level ! 1873: (the same as ! 1874: \fMif\fR, ! 1875: for example). ! 1876: The semantic action for the new production probably will ! 1877: involve creating a parse tree node of a new type. ! 1878: The new node type should be added to ! 1879: \fMtran/tree.h\fR ! 1880: and a new case in the routine ! 1881: \fMtraverse\fR ! 1882: (in ! 1883: \fMtran/code.c\fR) ! 1884: should be added to generate intermediate code. ! 1885: The intermediate code generated can use any of the existing ! 1886: opcodes or can use new ones created specifically for the new control structure. ! 1887: If new opcodes are created, ! 1888: they must be added to the linker as described above, ! 1889: and a new case in the routine ! 1890: \fMgencode\fR ! 1891: must generate code for it. ! 1892: The generated code can be either entirely in-line ! 1893: or can call a new library routine. ! 1894: If new code generation templates are needed, ! 1895: modify the code emission routines ! 1896: in ! 1897: \fMlink/lcode.c\fR. ! 1898: If the code calls a new library routine, ! 1899: add it to the ! 1900: \fMlib\fR ! 1901: directory and the ! 1902: \fMMakefile\fR ! 1903: there. ! 1904: Then the system is ready to be made. ! 1905: .PP ! 1906: Modifying the semantics of existing control structures, ! 1907: operators, or functions, ! 1908: often involves changing only the generated in-line code ! 1909: or a library routine. ! 1910: Modifying the syntax without disturbing any semantics ! 1911: usually requires only a change to the grammar. ! 1912: .PP ! 1913: Adding a new datatype means making many of the above changes. ! 1914: A new datatype code must be added to ! 1915: \fMh/rt.h\fR ! 1916: and ! 1917: \fMh/defs.s\fR, ! 1918: and a new data block format must be defined in \fMh/rt.h\fR. ! 1919: The size and location of the first descriptor of the new data block ! 1920: must be entered in ! 1921: \fMrt/dblocks.c\fR ! 1922: so that the garbage collector knows how to treat the block. ! 1923: The routines in ! 1924: \fMfunctions/image.c\fR ! 1925: and ! 1926: \fMrt/outimage.c\fR ! 1927: must be extended so that images of the new datatype can be produced. ! 1928: The routines in \fMfunctions/type.c\fR and \fMrt/equiv.c\fR must ! 1929: also be modified to account for the new type. ! 1930: In addition, \fMrt/anycmp.c\fR must be extended so that objects ! 1931: of the new type can be sorted relative to other types. ! 1932: New functions and operators on the new type may be needed, ! 1933: and possibly new coercion routines must be added to ! 1934: \fMrt\fR. ! 1935: .PP ! 1936: Adding a new keyword entails a change to ! 1937: \fMtran/keywords\fR ! 1938: (and, if a running Version 5 of Icon is not available, to ! 1939: \fMtran/keyword.h\fR) ! 1940: and a new case in ! 1941: \fMlib/keyword.c\fR. ! 1942: A ! 1943: \fMmake\fR ! 1944: in ! 1945: \fMtran\fR ! 1946: runs the program ! 1947: \fMmkkeytab\fR ! 1948: to produce both ! 1949: \fMtran/keyword.h\fR ! 1950: and ! 1951: \fMtran/keyword.c\fR. ! 1952: Many keywords require trapped variables, ! 1953: which requires changes to ! 1954: \fMh/rt.h\fR, ! 1955: \fMoperators/asgn.c\fR, ! 1956: and ! 1957: \fMrt/deref.c\fR; ! 1958: the trapped variable for ! 1959: \fM&random\fR ! 1960: is a good model. ! 1961: .PP ! 1962: As mentioned above, ! 1963: the examples in this section are intended ! 1964: to identify what parts of the system are affected ! 1965: by certain kinds of changes or extensions. ! 1966: A thorough understanding of the system is ! 1967: suggested, however, for other than minor changes. ! 1968: .bp ! 1969: .SH ! 1970: Acknowledgements ! 1971: .PP ! 1972: This report is a revision of earlier descriptions of the C ! 1973: implementation of Icon [11, 12], which were co-authored by Cary Coutant ! 1974: and Steve Wampler. ! 1975: Much of the material in this report is taken from the earlier ones. ! 1976: .PP ! 1977: Many features of the current implementation of Icon are based ! 1978: upon the original Ratfor implementation by ! 1979: Dave Hanson, Tim Korb, and Walt Hansen [13, 14]. ! 1980: .sp 1 ! 1981: .SH ! 1982: References ! 1983: .IP 1. 5n ! 1984: Griswold, Ralph E. and Madge T. Griswold. ! 1985: .I "The Icon Programming Language" . ! 1986: Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1983. ! 1987: .IP 2. 5n ! 1988: Kernighan, Brian W., and Dennis M. Ritchie. ! 1989: .I "The C Programming Language" . ! 1990: Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1978. ! 1991: .IP 3. 5n ! 1992: Mitchell, William W. \fIPorting the UNIX Implementation of Icon\fR. ! 1993: Technical Report TR 83-10d, Department of Computer Science, The ! 1994: University of Arizona, Tucson, Arizona, August 1984. ! 1995: .IP 4. 5n ! 1996: Griswold, Ralph E., Robert K. McConeghy, and William H. Mitchell. \fIVersion 5.9 of Icon\fR. ! 1997: Technical Report, Department of Computer Science, The University of ! 1998: Arizona, Tucson, Arizona, July 1984. ! 1999: .IP 5. 5n ! 2000: Johnson, Stephen C. ! 2001: ``Yacc: Yet Another Compiler-Compiler'' ! 2002: .I "Unix Programmer's Manual, Seventh Edition" . ! 2003: Bell Telephone Laboratories, Inc., Murray Hill, New Jersey, 1979. ! 2004: .IP 6. 5n ! 2005: Aho, A. V., and S. C. Johnson. ! 2006: ``LR Parsing''" ! 2007: .I "Computing Surveys" ! 2008: .B 6 , ! 2009: 2 (June 1974), 99-124. ! 2010: .IP 7. 5n ! 2011: Hanson, David R. ! 2012: ``Storage Management for an Implementation of SNOBOL4'', ! 2013: .I "Software\(emPractice and Experience" ! 2014: .B 7 , ! 2015: 2 (March 1977), 179-192. ! 2016: .IP 8. 5n ! 2017: Hanson, David R. ! 2018: ``Variable Associations in SNOBOL4'', ! 2019: .I "Software\(emPractice and Experience" ! 2020: .B 6 , ! 2021: 2 (April 1976), 245-254. ! 2022: .IP 9. 5n ! 2023: Hanson, David R. ! 2024: .I "The Manipulation of Variable-Length String Data in Fortran IV" . ! 2025: Technical Report, ! 2026: Department of Computer Science, ! 2027: The University of Arizona, Tucson, Arizona, ! 2028: May 1975. ! 2029: .IP 10. 5n ! 2030: Griswold, Ralph E. and William H. Mitchell. \fIInstallation and Maintenance ! 2031: Guide for Version 5.9 of Icon\fR. Technical Report TR 84-13, ! 2032: Department of Computer Science, ! 2033: The University ! 2034: of Arizona, ! 2035: Tucson, Arizona. August 1984. ! 2036: .IP 11. 5n ! 2037: Coutant, Cary A. and Stephen B. Wampler. \fIA Tour Through the ! 2038: C Implementation of Icon; Version 5\fR. Technical Report TR 81-11a, ! 2039: Department of Computer Science, The University of Arizona, Tucson, ! 2040: Arizona, December 1981. ! 2041: .IP 12. 5n ! 2042: Griswold, Ralph E., William H. Mitchell, and Stephen B. Wampler. ! 2043: \fIThe C Implementation of Icon; A Tour Through Version 5\fR. ! 2044: Technical Report TR 83-11a, Department of Computer Science, ! 2045: The University of Arizona, Tucson, Arizona. December 1983. ! 2046: .IP 13. 5n ! 2047: Korb, John Timothy. ! 2048: .I "The Design and Implementation of a Goal-Directed Programming Language" . ! 2049: Ph.D. Dissertation, Technical Report TR 79-11, ! 2050: Department of Computer Science, ! 2051: The University of Arizona, Tucson, Arizona, ! 2052: June 1979. ! 2053: .IP 14. 5n ! 2054: Hanson, David R., and Walter J. Hansen. ! 2055: .I "Icon Implementation Notes" . ! 2056: Technical Report TR 79-12a, ! 2057: Department of Computer Science, ! 2058: The University of Arizona, Tucson, Arizona, ! 2059: February 1980.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.