Annotation of 43BSDTahoe/ucb/lisp/doc/franz.n, revision 1.1.1.1

1.1       root        1: .\" Copyright (c) 1980 Regents of the University of California.
                      2: .\" All rights reserved.  The Berkeley software License Agreement
                      3: .\" specifies the terms and conditions for redistribution.
                      4: .\"
                      5: .\"    @(#)franz.n     6.1 (Berkeley) 4/29/86
                      6: .\"
                      7: ."  franz.n                            -[Thu Jun 17 11:01:27 1982 by jkf]-
                      8: ." this file will contain a description of the franz lisp system.
                      9: ." sort of a systems manual.
                     10: .de Fr
                     11: F\s-2RANZ\s0 L\s-2ISP\s0
                     12: ..
                     13: .tp
                     14: .(l C
                     15: .sz +2
                     16: \fBThe Franz Lisp System\fP
                     17: .sz -2
                     18: .sp 2v
                     19: \fIby
                     20: \fIJohn K. Foderaro\fP
                     21: .)l
                     22: .sp 3v
                     23: .tl ''Abstract''
                     24: .br
                     25: This document describes the 
                     26: .Fr
                     27: system written at The University of California 
                     28: at Berkeley.
                     29: Included are descriptions of the memory management, interpreter
                     30: and compiler, the conventions used within the C coded kernel
                     31: and those within the compiled code.
                     32: This does not duplicate the information found in The Franz Lisp Manual.
                     33: .++ C '\s-2Draft\s+2'\fBThe\ Franz\ Lisp\ System\fP'%'
                     34: .+c Characteristics\ of\ \F\s-2RANZ\ \s0L\s-2ISP\s0
                     35: .ls 1
                     36: .pp
                     37: There is no formal standard for lisp systems, thus each lisp
                     38: system is almost guaranteed to be different from any other lisp system.
                     39: In this section we will examine the design decisions which most often
                     40: characterize a lisp system.
                     41: This
                     42: focus does  not imply that all other parts of the language are generally
                     43: agreed upon.
                     44: In fact, there is nothing sacred to lisp system designers.
                     45: For example, one usually identifies the symbol 
                     46: .i nil 
                     47: with the empty list,
                     48: and one usually thinks of lisp as a dynamically scoped language.
                     49: Both of these conventions are are not true in the lisp dialect 
                     50: .i NIL
                     51: currently being written at MIT.
                     52: As another example, one imagines that a lisp system must 
                     53: use  garbage collection to reclaim storage.
                     54: The lisp dialect
                     55: .i ZetaLisp
                     56: that is running on Lisp Machines doesn't
                     57: normally use a garbage collector
                     58: because of the way in which it allocates its address space.
                     59: It is faster to reboot the machine than to do a garbage collection.
                     60: In most lisp dialects the lisp expressions are written in 
                     61: parenthesised form.  In 
                     62: .i Portable
                     63: .i Standard
                     64: .i Lisp
                     65: (PSL) at the University of Utah, programs are written 
                     66: primarily 
                     67: in an Algol-like
                     68: syntax.
                     69: .pp
                     70: Thus some of the discussion to follow indicates not so much 
                     71: .i unique
                     72: charateristics of 
                     73: .Fr
                     74: but instead, how decisions were made.
                     75: .sh 2 Scoping\ and\ Binding
                     76: .pp
                     77: The 
                     78: .Fr
                     79: interpreter
                     80: uses 
                     81: .i dynamic 
                     82: .i scoping , 
                     83: that is, after A calls B, B can access all of the
                     84: variables that  A allocated
                     85: as well
                     86: as those that A can access, provided B doesn't allocate variables of the same
                     87: names.
                     88: There are two popular ways of implementing dynamic scoping: 
                     89: .i deep
                     90: .i binding
                     91: and 
                     92: .i shallow
                     93: .i binding .
                     94: Note that we will use the terms variable allocation and lambda
                     95: binding interchangeably in this report.
                     96: The former term is less language-specific than the latter.
                     97: .sh +1 Deep\ Binding
                     98: .pp
                     99: In a deep binding implementation, when a variable is  allocated,
                    100: the name of the variable and a space for its value are pushed on the
                    101: top of a stack.
                    102: When a program asks for the value of a variable, 
                    103: the lisp system must search down the stack for the first occurrence of
                    104: the variable.
                    105: This system offers great flexibility for the 
                    106: programmer since  the state of his program 
                    107: can be described by the contents of the stack.
                    108: It is thus possible to save the context of a program by just saving
                    109: the stack or in some cases just a pointer to
                    110: a place in  the stack.
                    111: The problem with deep binding is that it is time-consuming to search
                    112: down the stack for the value of a variable, and, 
                    113: as a result, most systems 
                    114: modify the deep binding model by giving variables a global value
                    115: cell and allowing programs to set and access the global cell.
                    116: Some implementations of Interlisp use deep binding.
                    117: .sh +0 Shallow\ Binding
                    118: .pp
                    119: In a shallow binding implementation, each variable has a global value cell
                    120: which contains the current value of the variable.
                    121: When a variable is allocated inside a program, its name 
                    122: and old value are pushed on a stack called a
                    123: .i bindstack .
                    124: The variable is then given a new value. 
                    125: Throughout the lifetime of the allocation, the current value of the
                    126: variable can be found in the global value cell associated with the variable.
                    127: When leaving the context of the variable's allocation, the old value
                    128: is restored from the 
                    129: .i bindstack .
                    130: A shallow binding scheme makes it much cheaper to access 
                    131: the values of variables, however it is much more difficult to 
                    132: save the state and to restore it. 
                    133: .pp
                    134: .Fr
                    135: and most other lisps which are not derived from Interlisp
                    136: use shallow binding.
                    137: Some versions of Interlisp use shallow binding.
                    138: .sh -1 Lisp\ Compiler
                    139: .pp
                    140: Dynamic scoping often is not necessary and it is never cheap, even
                    141: in a shallow binding implementation.
                    142: Thus the 
                    143: .Fr  
                    144: compiler prefers to do lexical scoping rather than dynamic
                    145: scoping.
                    146: If the user does not specify otherwise,  when a 
                    147: function is compiled, all variables 
                    148: allocated will be 
                    149: put on a local stack and will 
                    150: .i not 
                    151: be accessible by any
                    152: function that this function calls.
                    153: This
                    154: convention results in faster code being generated by
                    155: the compiler, but if the
                    156: user is not careful, it can result in compiled and interpreted code
                    157: working differently.
                    158: In some of the new lisp designs, such as 
                    159: .i NIL 
                    160: and
                    161: .i Common
                    162: .i Lisp
                    163: the semantics of compiled and interpreted code have been
                    164: unified by
                    165: transferring the compiler semantics (lexical scoping) to the interpreter.
                    166: This is a rather large step
                    167: if dynamic scoping is no
                    168: longer an option, and it is not clear whether the resulting
                    169: language should be called 'lisp'.
                    170: .sh +0 Data\ Types
                    171: .pp
                    172: A complete list of data types in 
                    173: .Fr 
                    174: can be found in the first chapter of the 
                    175: .Fr
                    176: Manual.
                    177: The most important ones are described below.
                    178: .pp
                    179: Lisp is a list processing language and the basic data type is the list
                    180: cell. 
                    181: A list cell also goes by the name of cons cell, pair, or dotted pair (dtpr).
                    182: It contains pointers to two lisp objects, and these pointers can
                    183: be accessed via the 
                    184: .i car
                    185: and
                    186: .i cdr 
                    187: functions.
                    188: Each pointer requires four (8-bit) bytes and thus the list cell is 
                    189: eight bytes large.
                    190: The cdr pointer is stored first since this makes it easier 
                    191: to 'cdr down a list' using the addressing modes of the VAX.
                    192: .pp
                    193: The symbol cell is used to implement variables.  It contains  a pointer
                    194: to a string that is the print name, a cell to store a value, a cell to
                    195: store the function definition, a cell to store a pointer to  the property
                    196: list and one pointer that the list system uses if it stores
                    197: the symbol in the oblist.
                    198: .sh +0 Memory\ Management
                    199: .pp
                    200: A lisp system must be able to determine the type of an
                    201: object at run time.
                    202: The method used to determine the type influences the
                    203: way storage is managed and garbage collection is done.
                    204: Next, we will look at three possible places
                    205: to store the type information.
                    206: .sh +1 Typed\ data
                    207: .pp
                    208: The type of the data object could be placed in the object itself, much
                    209: as Pascal stores the type of a variant record in the record itself.
                    210: This could result is a large amount of storage being used to store
                    211: type information.
                    212: No lisp system that we know of uses this method exclusively.
                    213: .sh +0 Typed\ pointers
                    214: .pp
                    215: Every reference to a lisp object could contain the type of the object
                    216: referenced.
                    217: This is a good idea in a machine like an IBM 370 where only
                    218: part of  machine word is used by the addressing hardware.
                    219: Lisps that use typed pointers generally use a
                    220: .i heap
                    221: memory management scheme and a 
                    222: .i copying 
                    223: garbage collector.
                    224: In a heap scheme, 
                    225: all memory is divided by a pointer (called the
                    226: heap pointer) separating lisp
                    227: data and free space.  
                    228: When a new lisp object is needed, it is formed at the
                    229: dividing line and then  
                    230: the heap pointer is incremented
                    231: past the new object.
                    232: Garbage collection is done by maintaining two separate spaces for lisp
                    233: data, only one of which is used at any one time.
                    234: When one fills up, all active lisp objects are copied to the 
                    235: other space, leaving the first space totally free. 
                    236: Subsequent allocations are done from the second space, until it fills up,
                    237: at which point 
                    238: all active data in the second space is copied to the first space.
                    239: The advantage of the copying garbage collector is that the data space
                    240: is compacted, which will improve virtual memory behavior.
                    241: The disadvantage is that
                    242: half the data space is always unused.
                    243: .pp
                    244: PSL on a PDP-10 uses this scheme, as does Lisp 370 on 
                    245: an IBM 370.
                    246: PSL  and NIL on the VAX will also use this scheme.  
                    247: Since the VAX hardware 
                    248: uses the entire word for address 
                    249: calculation, 
                    250: PSL and NIL 
                    251: must mask off the type bits
                    252: of a pointer before using it to reference memory.
                    253: .sh +0 Typed\ pages
                    254: .pp
                    255: The final alternative is to allocate data objects by pages rather than
                    256: individually.
                    257: Each page contains only one type of object and there is a table that 
                    258: keeps track of the type of data object in each page.
                    259: The address of an object tells which page the object is on and 
                    260: the page number tells which type the object is.
                    261: Since a whole page of objects is allocated when only one 
                    262: object is necessary,
                    263: the rest of the objects in the page are put on a free list.
                    264: This method of allocation is sometimes referred to as
                    265: .i bibop
                    266: for BIg Bag Of Pages.
                    267: The garbage collector for bibop is usually
                    268: a traditional
                    269: .i mark
                    270: and
                    271: .i sweep
                    272: algorithm.
                    273: All active data objects are marked  and
                    274: then each page is swept linearly with
                    275: the free cells being put on the free list and the used cells 
                    276: left alone.
                    277: The advantage of mark and sweep over a copying garbage collector is that
                    278: the mark phase is cheaper because data objects do not have to be copied.
                    279: Also, all of memory can be used since there is no requirement for two spaces.
                    280: The disadvantage is that a sweep phase is required in a mark and sweep
                    281: but one is not required in a copying garbage collector.
                    282: The sweep phase is expensive because it has to alter most data pages 
                    283: while building a free list.
                    284: This operation 
                    285: can be expensive in a virtual memory environment.  
                    286: Alternatives to the sweep phase have been proposed in
                    287: [Foderaro+Fateman Characterization of Vax Macsyma].
                    288: .pp
                    289: .Fr
                    290: uses a bibop memory allocator and a mark and sweep garbage collector,
                    291: as does Maclisp (on a PDP-10).
                    292: The reason that
                    293: .Fr 
                    294: uses bibop is primarily due to the VAX architecture,
                    295: which makes typed pointers
                    296: expensive to implement.
                    297: Also, typed pointers would make it difficult to pass lisp values
                    298: to  programs written in other languages, some of which may not have
                    299: the ability to extract the address of a lisp object from a typed pointer.
                    300: .sh -1 Construction
                    301: .pp
                    302: The
                    303: .Fr
                    304: system is built by adding lisp code to a C-coded kernel.
                    305: The C-coded kernel is a working lisp interpreter, although it has few of
                    306: the functions a lisp system needs.
                    307: The lisp code provides most of the functions, the top level and error
                    308: handlers.
                    309: The lisp compiler is also written in lisp, 
                    310: but is usually 
                    311: run as a separate process.
                    312: .sh +0 Unique\ features
                    313: .pp
                    314: .Fr
                    315: can dynamically load and execute functions defined in the other languages
                    316: which run on the VAX.
                    317: It uses two different dynamic loaders.
                    318: One is part of the lisp system and is used for lisp only, it is called 
                    319: .i fasl
                    320: (for fast loader).
                    321: The other is the Unix loader, ld, which is used for loading in 
                    322: C, Fortran or Pascal code.
                    323: Loading code with fasl
                    324: is much faster than with ld
                    325: since fasl benefits from 
                    326: the simplicity of a lisp object file.
                    327: .+c Data\ Structures
                    328: .sh 0 _
                    329: .nr $1 \n(ch
                    330: .pp
                    331: In this chapter we shall examine the data structures which are most 
                    332: important to the 
                    333: .Fr
                    334: system.
                    335: First, though we will see how the lisp system uses the address space.
                    336: .sh 2 Physical\ layout
                    337: .pp
                    338: As a Unix process, lisp has three address regions: text, data and stack.
                    339: Text begins at location 0 and continues up to 0x3b73c (0x means hex number).
                    340: The data segment begins on the next page boundary and continues
                    341: up to a limit set by the configuration parameters (currently 0x2fd000).
                    342: Lisp does not begin running with such a large data segment,  rather it grows
                    343: when necessary.
                    344: The stack segment begins at address 0x7fffffff and grows downward to a 
                    345: maximum size of one-half megabyte.
                    346: .pp
                    347: The text segment stores the instructions for the C kernel as well as read-only
                    348: data.
                    349: The read-only data for lisp are 
                    350: the symbol nil, the i/o ports, and the small integer
                    351: table.
                    352: The symbol nil is stored at location 0 which makes it very easy to test
                    353: whether a value is nil.
                    354: The problem with storing nil in read-only space is that a special case
                    355: must be made for nil's property list, which is the only thing in the
                    356: nil symbol that the user is permitted to alter.
                    357: .pp
                    358: The fixnums -1024 through 1023 are stored sequentially, with 0 being
                    359: stored at 0x1400.
                    360: .Fr
                    361: doesn't have any 'immediate' lisp data, that is data whose value is 
                    362: stored as part of the reference to the data. 
                    363: But, by storing some of the fixnums in a known place, we can achieve
                    364: some of the benefits of immediate data: 
                    365: A program  can use 
                    366: .i eq
                    367: as a test for a fixnum in the range -1024 to 1023.
                    368: In the majority of cases, when asked
                    369: to allocate a fixnum, the system can  return a pointer into this table
                    370: and bypass the memory allocator.
                    371: .sh +0 Typetable
                    372: .pp
                    373: .Fr
                    374: uses the typed pages (or bibop)
                    375: method of type determination.
                    376: The 
                    377: .i typetable
                    378: is an array of bytes (
                    379: .i chars 
                    380: in  C lingo).
                    381: This table describes the type of all pages, from page 0 where nil is stored,
                    382: up to the end of data space.
                    383: Thus there are many entries that describe the types of the pages which
                    384: make up the C kernel.
                    385: In order to reduce the wasted space in the typetable,
                    386: we could have made the typetable begin typing pages at the start of data
                    387: space and  make a special case of the pages that contain nil and
                    388: the small integer table.
                    389: However, the
                    390: effect of this change would be that type checks 
                    391: (which are done in-line in compiled code) would
                    392: be longer and slower.  Since type checking happens quite
                    393: frequently, we felt it was better to waste the space in the
                    394: typetable in order to save execution time  and instruction space.
                    395: .pp
                    396: Each page on a VAX is 512 bytes, and thus to find the type of an 
                    397: object given the address of it, 
                    398: it is only necessary to shift the address right 9 bits
                    399: and index the typetable array offset by one.
                    400: The offset by one is required because the value -4, which is an illegal
                    401: address, is used as a sentinel value to indicate an illegal value.
                    402: Thus when we take the type of -4 we will be indexing the table by -1
                    403: and we want to retrieve the first byte in the table (which contains
                    404: the value UNBOUND).
                    405: The C macro which retrieves the type is this (from file h/global.h):
                    406: .(b I
                    407: 
                    408: #define        TYPE(a1)        ((typetable+1)[(int)(a1) >> 9])
                    409: 
                    410: .)b
                    411: This is code generated by the lisp compiler to check if the type
                    412: code of an argument (stored at 0(r10)) is a symbol (which is type
                    413: code 1):
                    414: .(b I
                    415: 
                    416: ashl   $-9,0(r10),r0
                    417: cmpb   _typetable+1[r0],$1
                    418: 
                    419: .)b
                    420: .pp
                    421: The type codes which are stored in the typetable are these:
                    422: .ts 2i 4i 6i
                    423: .(b I
                    424: UNBO     -1\tSTRNG     0\tATOM     1\tINT       2
                    425: DTPR      3\tDOUB      4\tBCD      5\tPORT      6
                    426: ARRAY     7\tOTHER     8\tSDOT     9\tVALUE    10
                    427: HUNK2    11\tHUNK4    12\tHUNK8   13\tHUNK16   14
                    428: HUNK32   15\tHUNK64   16\tHUNK128 17
                    429: .)b
                    430: The names given above are the C kernel internal names.
                    431: ATOM is symbol, INT is fixnum, DTPR is list cell, 
                    432: DOUB is flonum, BCD is binary,
                    433: SDOT is bignum and all the HUNKn types are just known as hunk to
                    434: the user.
                    435: .sh +0 Stacks
                    436: .pp
                    437: .Fr
                    438: uses three stacks: the 
                    439: .i C 
                    440: .i runtime 
                    441: stack, the 
                    442: .i namestack
                    443: and the
                    444: .i bindstack .
                    445: The C runtime stack is the stack part of the address space 
                    446: and the other two stacks are stored in the data space.
                    447: .sh +1 C\ runtime\ stack
                    448: The C-coded kernel uses this
                    449: stack in the same way as a typical  C program
                    450: use the stack:
                    451: storing return addresses
                    452: and non-lisp-object arguments to  subroutines,
                    453: saving registers,
                    454: and storing  local variables within a function.
                    455: The lisp system stores 
                    456: .i catch
                    457: .i frames
                    458: on this stack as well (to be described later).
                    459: .pp
                    460: The lisp system assumes that there are no lisp data on the stack and
                    461: thus  the use of this stack 
                    462: by a program is unrestricted.
                    463: As will be discussed later on, the 
                    464: .b calls
                    465: and 
                    466: .b ret
                    467: instructions are used for most subroutine calls and returns.
                    468: These instructions expect the stack to look a certain way.
                    469: .sh +0 Namestack
                    470: .pp
                    471: The namestack contains only lisp data.
                    472: It is used to pass arguments to functions and to  hold
                    473: local (lexically scoped) data within lisp functions.
                    474: It is also used as a temporary storage spot for lisp data
                    475: which must be protected from garbage collection.
                    476: .pp
                    477: A slight digression on the garbage collector:
                    478: The person who writes code for the lisp system must always be aware of
                    479: his greatest enemy: the garbage collector.  
                    480: Whenever a function is called that could possibly allocate more lisp
                    481: data, one must assume that it 
                    482: when it attempts to allocate space,  the garbage collector
                    483: will be invoked and that it will take away everything that isn't protected
                    484: from garbage collection.
                    485: The objects that are protected from garbage collection are: the
                    486: namestack, the bindstack, the oblist (table of interned symbols),
                    487: and the compiler literals.  Objects that are only 
                    488: referred to by values in registers or
                    489: the C runtime stack will not be seen by the mark phase of the
                    490: garbage collector and will be swept up during the sweep phase.
                    491: .pp
                    492: Back to the namestack:
                    493: The first free element on the namestack is pointed to by the
                    494: C variable 
                    495: .i np .
                    496: This variable is always stored in the VAX register r6.
                    497: Another pointer into the namestack is the C variable
                    498: .i lbot .
                    499: It is always stored in VAX register r7.
                    500: Its use will be described in the section on calling conventions.
                    501: .sh +0 Bindstack
                    502: .pp
                    503: The bindstack is a stack of lisp object pairs: symbol and
                    504: value.
                    505: It is used to implement shallow binding.
                    506: When a symbol is lambda-bound, the symbol and its current value are put
                    507: on this stack.
                    508: Then the symbol can be given a new value.
                    509: When the context of the lambda binding is over, the symbol and value pair
                    510: are popped from the stack and the symbol is given its old value.
                    511: The C variable
                    512: .i bnp
                    513: points to the first free entry on the bindstack.
                    514: In the C code, the following macros lambda-bind a symbol to a value
                    515: and restore the old value:
                    516: .(b
                    517: #define PUSHDOWN(atom,value)\e
                    518:        {bnp->atm=(atom);\e
                    519:          bnp++->val=(atom)->a.clb;\e
                    520:         (atom)->a.clb=value;\e
                    521:        if(bnp>bnplim) binderr();}
                    522: 
                    523: #define POP\e
                    524:        {--bnp;  bnp->atm->a.clb=bnp->val;}
                    525: .)b
                    526: .sh -1 Bitmap
                    527: .pp
                    528: The bitmap is used in garbage collection to hold
                    529: the mark bits for the mark and sweep garbage collector.
                    530: As its names implies, it is viewed as a collection of bits.
                    531: The minimum size of a lisp data object is 4 bytes, thus
                    532: there are 128 of those on a VAX page of 512 bytes.
                    533: Since there are 8 bits in a byte, it takes 16 bytes to obtain 128 bits.
                    534: Therefore the size of the bitmap in bytes is 16 
                    535: times the maximum number of
                    536: pages.
                    537: Like the typetable, the bitmap  keeps track of marks from the
                    538: bottom of memory, not the bottom of data space.
                    539: The bitmap and the typetable are static structures.
                    540: It is their size, which is determined when the C kernel is built,
                    541: which determines the size to which the data segment can grow.
                    542: .sh +0 Transfer\ Tables
                    543: .pp
                    544: Transfer tables are used by compiled lisp code as a transfer vector to
                    545: other functions.
                    546: A transfer table consists of pairs of entries: symbol and pointer to
                    547: function.
                    548: When a compiled lisp function  calls another (non-local) function, it
                    549: calls indirectly through an entry in the transfer table.
                    550: Depending on the setting of certain status variables, the call may
                    551: bring control into a function linkage routine or directly to
                    552: the function desired (if that function is a compiled lisp or C function).
                    553: .pp
                    554: Transfer tables serve a number of useful purposes.
                    555: .np
                    556: They allow compiled code to call interpreted code
                    557: or compiled code using the same calling sequence.
                    558: .np
                    559: They allow the selection of which function to call to be determined
                    560: at runtime based on the name of the function.
                    561: In most other languages, this selection process is done at either
                    562: compile or load time.
                    563: .np
                    564: They allow the user to run in a debugging mode where all calls
                    565: from compiled code go through the interpreter.
                    566: Once control reaches the interpreter, it is easier to debug.
                    567: .np
                    568: They allow the user to run in a production mode, where all calls
                    569: from compiled to other compiled code are done without function
                    570: lookup overhead.
                    571: .np
                    572: They allow the user to switch between debugging and production modes
                    573: at runtime with one function call.
                    574: .pp
                    575: Transfer tables will be described further  in the section on 
                    576: the lisp compiler.
                    577: .sh +0 Catch\ frames
                    578: .pp
                    579: Lisp has a number of forms of non-local transfers.
                    580: Among them are 
                    581: .i throw ,
                    582: .i error ,
                    583: .i return 
                    584: and
                    585: .i go .
                    586: If a program is willing to accept a non-local transfer, it puts a 
                    587: .i catch
                    588: .i frame
                    589: on the stack which describes which type of transfer it 
                    590: accepts.
                    591: The catch frame describes the current state of the registers,
                    592: the variables np, lbot, and bnp, and also contains entries that describe
                    593: what kinds of non-local transfers the function will accept.
                    594: After creating a catch frame, the program  goes on to evaluate forms.
                    595: Should the evaluation of one of those forms result in a non-local
                    596: transfer to the catch frame that this program has set up, the 
                    597: system will restore the namestack and bindstack to the way
                    598: they were when the program put the catch frame on the stack (by using
                    599: np and bnp) and will return control to the program (setting 
                    600: the variable retval to describe why it returned).
                    601: This is described further in the 
                    602: section on interpreter conventions.
                    603: .pp
                    604: The C variable 
                    605: .i errp
                    606: points to the most recent catch frame, and then each catch frame points
                    607: to the previous catch frame.
                    608: .sh +0 oblist
                    609: .pp
                    610: Normally when symbols are created they are
                    611: .i interned ,
                    612: that is they are put in a hash table called an oblist (or obarray).
                    613: The purpose of the oblist is to insure that if you type a
                    614: symbol's name to the reader, you will always get  the  same symbol.
                    615: Also it protects the symbol from garbage collection.
                    616: The oblist is simply a hash table with buckets, with a hash link being
                    617: part of the symbol data structure.
                    618: Currently the hash table is not accessible to a lisp program, but with
                    619: a little work it could be.
                    620: .+c Interpreter
                    621: .sh 0 _
                    622: .nr $1 \n(ch
                    623: .pp
                    624: The interpreter is composed of these parts:
                    625: .ip \fIcore:\fP
                    626: The functions eval, apply and funcall.
                    627: .ip \fIstack\ management:\fP
                    628: The code to create catch frames and handle non-local transfers.
                    629: .ip \fImemory\ management:\fP
                    630: The code to allocate and garbage collect memory.
                    631: .ip \fIthe\ functions:\fP
                    632: The user callable functions that expect lisp arguments and return
                    633: lisp values.
                    634: .pp
                    635: In the above list, the first three are written mainly in C (with a little
                    636: assembler) and the last is written mainly in Lisp with a little 
                    637: bit in C
                    638: and a very small amount in assembler.
                    639: .pp
                    640: The core functions are the center of activity  in the interpreter.
                    641: The 
                    642: .i eval 
                    643: function given a lisp expression  to evaluate.  
                    644: It must determine if it is a simple expression such as a symbol or number
                    645: whose value eval can determine immediately, or if it is an function
                    646: calling expression.
                    647: If the form is a function call, eval must determine if the arguments should
                    648: be evaluated and if so eval must recursively call itself to evaluate the
                    649: arguments and then stack them.
                    650: If the function called is to be interpreted, eval must lambda-bind the
                    651: formal variables of the function to the arguments just stacked.
                    652: If the function being called is compiled, eval just calls the function and
                    653: lets the function do the lambda binding if it wants to.
                    654: .pp
                    655: The 
                    656: .i apply 
                    657: function is passed two lisp objects: 
                    658: a function to call (either a symbol or a functional object)
                    659: and a list of arguments already evaluated.
                    660: It will do lambda binding if the function to call is to
                    661: be interpreted.
                    662: .pp
                    663: The 
                    664: .i funcall
                    665: function is passed any number of lisp objects,
                    666: the first of which is the function to call, and the rest are the
                    667: arguments to the function 
                    668: which have already been evaluated and stacked.
                    669: Funcall will  do lambda binding if the function to call is
                    670: to be interpreted.
                    671: .pp
                    672: When compiled lisp code calls a function 
                    673: which must be interpreted, it enters the interpreter through the
                    674: funcall function.
                    675: The interpreter may go to compiled code through either eval, apply or
                    676: funcall, though most often it goes through eval.
                    677: .sh 2 Conventions
                    678: .pp
                    679: These are the conventions used in interpreted and compiled code.
                    680: .sh +1 C\ conventions
                    681: .pp
                    682: Our conventions are extensions of the C calling conventions, so we
                    683: describe here the conventions for C.
                    684: The VAX has 16 general
                    685: purpose registers.
                    686: Registers r12 through r15 are reserved
                    687: for use by the hardware because we use the
                    688: .i calls
                    689: subroutine call.
                    690: Registers r0-r5 can be used by a program without saving them first.
                    691: The result of a function call is returned in r0.
                    692: Registers r11-r6 are allocated  (from high to low)
                    693: by the C compiler when a 
                    694: .i register
                    695: declaration is made in the C code.
                    696: Registers r11-r6 must be
                    697: saved upon entry and restored upon exit if they are used within a function.
                    698: On the VAX it is very easy to preserve registers
                    699: since it is done automatically
                    700: by the hardware if the
                    701: .i calls
                    702: (or 
                    703: .i callg ) 
                    704: instruction is used.
                    705: The first short word (two bytes) of a subroutine is a 
                    706: register save mask which
                    707: tells which registers should be saved (on the C runtime stack) upon
                    708: entry and restored when a 
                    709: .i ret
                    710: instruction is done.
                    711: .sh +0 np\ and\ lbot
                    712: .pp
                    713: Earlier we  mentioned that the C variables np and lbot are 
                    714: always stored in
                    715: registers r6 and r7.
                    716: It is very difficult to globally assign a variable to a register
                    717: in the C language (no external register declarations
                    718: are permitted)
                    719: and thus we have to filter
                    720: the output of the  C compiler and convert all occurrences of 'np' to 'r6'
                    721: and all occurrences of 'lbot' to 'r7'.
                    722: This is only half the job, though.
                    723: We also must modify  the register save masks for those routines which 
                    724: alter the value of np or lbot to insure that those registers get 
                    725: saved and restored upon function entry and exit.
                    726: This is done in the C code by putting 
                    727: .(b C
                    728: Savestack(n)
                    729: .)b
                    730: at the beginning of a routine which will alter np or lbot and which
                    731: also allocates n register variables.
                    732: Also in that routine, before the function returns, we put 
                    733: .(b C
                    734: Restorestack()
                    735: .)b
                    736: This is not really necessary in the VAX, but it is there for other 
                    737: machine implementations which don't have a 
                    738: .i ret
                    739: function which restores registers.
                    740: The calls to Savestack are recognized by a filter which processes
                    741: the output
                    742: of the C compiler and fixes up 
                    743: the save masks.
                    744: .sh +0 Function\ calling
                    745: .pp
                    746: The following text describes what the conventions are for 
                    747: calling
                    748: compiled lisp functions, whether they were written in lisp or C.
                    749: We look at it from the viewpoint of the called function.
                    750: .pp
                    751: Upon entry
                    752: to the compiled functon,
                    753: lbot points to the first argument and np points to the
                    754: first free spot on the namestack.
                    755: If np equals lbot then there are no arguments.
                    756: Recall that np will be in r6 and lbot in r7.
                    757: The function is free to alter registers r0 through r5 and should return
                    758: the result in r0.
                    759: The function may alter registers r6 through r11 as long as their 
                    760: original values
                    761: are restored when the function returns.
                    762: The value of np should always 
                    763: point to the first free entry in the namestack.
                    764: This is all that is required.
                    765: The extra conventions followed by 
                    766: the lisp compiler 
                    767: in the code it generates are described in the next chapter.
                    768: .sh +0 Catch\ frames
                    769: .pp
                    770: A catch frame saves the state of execution that a program
                    771: might  want to return to at some later time.
                    772: A catch frame  is stored on the C runtime stack, with the most recent 
                    773: frame pointed to by the C global variable errp.
                    774: The information saved is 
                    775: .ip \fIargs\fP
                    776: - one or two optional arguments. The lisp compiler always 
                    777: stacks two 
                    778: arguments since it must know exactly how large the frame is.
                    779: .ip \fIclass\fP
                    780: - the class describes what type of frame this is (described below).
                    781: .ip \fIretaddr\fP
                    782: - address to return to if returning from this frame
                    783: .ip \fIolderrp\fP
                    784: - pointer to next older catch frame on the stack
                    785: .ip \fIbnp\fP
                    786: - value of bnp (bindstack pointer) when the frame was created
                    787: .ip \fInp\fP
                    788: - value of np when the frame was created
                    789: .ip \fIlbot\fP
                    790: - value of lbot when the frame was created. 
                    791: .ip \fIsystem\ dependent\fP
                    792: - the rest of the information stacked depends on the particular machine.
                    793: In the case of the VAX, registers r13 through r8 are stacked. (r14 and
                    794: r15 are the stack pointer and program counter; they are not saved since
                    795: they can be calculated from the other information).
                    796: .pp
                    797: The information in a catch frame is put on the C runtime stack
                    798: in the precise order given above, and the variable errp points not
                    799: at the beginning of the entire frame, but to the lbot entry.
                    800: Thus errp\ +\ 4 points to np.
                    801: The classes of frames are described below.
                    802: Each class is defined as a constant in the C code (h/frame.h) whose value
                    803: is a small integer. 
                    804: .ip F_PROG\ [1]
                    805: - takes no arguments.  This frame marks the beginning of a piece of
                    806: code which can accept a 
                    807: .i return
                    808: or a 
                    809: .i go .
                    810: The interpreter uses this to implement
                    811: .i prog
                    812: and 
                    813: .i do
                    814: and for its primative break loop.
                    815: The lisp compiler does not use catch frames for progs since it only
                    816: permits 
                    817: .i returns
                    818: and
                    819: .i gos
                    820: to occur within
                    821: .i progs 
                    822: or 
                    823: .i dos
                    824: and thus it can determine how to do the non-local transfer
                    825: at compile time.
                    826: .ip F_CATCH\ [2]
                    827: - this takes one or two arguments and is used to implement both
                    828: .i catch
                    829: and 
                    830: .i errset .
                    831: In both cases 
                    832: the first argument is the tag which is being caught,
                    833: which in the case of an 
                    834: .i errset
                    835: is symbol ER%all.
                    836: An 
                    837: .i errset
                    838: also
                    839: supplies a second argument which is non nil if the error message
                    840: should be printed.
                    841: .ip F_RESET\ [3]
                    842: - in the C kernel, the reset function 
                    843: is implemented as a non local transfer to a F_RESET catchframe.
                    844: When the lisp system is built, the reset function is redefined to
                    845: do a 
                    846: .i throw.
                    847: Thus this type of catch frame is rarely used.
                    848: .ip F_EVAL\ [4]
                    849: - this has one argument, the form being evaluated. 
                    850: Since stacking this
                    851: on every eval would be expensive,
                    852: this type of catch frame is only stacked if a \fI(*rset\ t)\fP
                    853: has been done and if the value of the symbol 
                    854: .i *rset
                    855: is non nil.
                    856: The form being evaluated is stacked  so that 
                    857: the necessary information for the
                    858: .i evalframe
                    859: function is available and so that 
                    860: .i freturn
                    861: can return from the context of any pending evaluation.
                    862: .ip F_FUNCALL\ [5]
                    863: - this is much like F_EVAL, 
                    864: except the one argument it takes is the
                    865: name of the function to call.  
                    866: It has the same restrictions on when it is stacked as  F_EVAL
                    867: and for the same reasons.
                    868: .pp
                    869: In C, a frame is pushed on the stack with Pushframe, with a call
                    870: of one of these forms:
                    871: .(b
                    872: errp = Pushframe(class);
                    873: errp = Pushframe(class,arg1);
                    874: errp = Pushframe(class,arg1,arg2);
                    875: .)b
                    876: After the call the C variable 
                    877: .i retval
                    878: contains the value which describes why this function returned.
                    879: You must remember that it is possible for this one function call to
                    880: return more than once! 
                    881: The reasons it can return are these (from h/frame.h):
                    882: .ip C_INITIAL\ [0]
                    883: This is the initial call to set up the frame.
                    884: .ip C_GO\ [1]
                    885: This will only happen for F_PROG frames. 
                    886: In this case, 
                    887: the C variable lispretval contains a lisp symbol which is the tag
                    888: to which control should be transferred.
                    889: If the tag cannot be found in this prog or do body, the
                    890: tag should be again thrown up the stack to a next higher prog or do.
                    891: .ip C_RET\ [2]
                    892: This will only happen for F_PROG frames.
                    893: In this case, lispretval contains the value to return from this prog.
                    894: .ip C_THROW\ [3]
                    895: This will only happen for F_CATCH frames.
                    896: In this case lispretval contains the value to return from this catch.
                    897: .ip C_RESET\ [4]
                    898: This will only happen for F_RESET frames.
                    899: It simply means that a reset is being done.
                    900: .ip C_FRETURN\ [5]
                    901: This will only happen for F_EVAL and F_FRETURN frames.
                    902: The variable lispretval contains the value to return from this
                    903: call to 
                    904: .i eval
                    905: or
                    906: .i funcall .
                    907: .pp
                    908: The call to Pushframe is turned into a simple subroutine call (using
                    909: the 
                    910: .i jsb
                    911: instruction instead of the
                    912: .i calls )
                    913: by the filters which alter the code coming out of the C compiler.
                    914: Thus it may be useful to see what stacking a catch frame really looks
                    915: like.  
                    916: Here is what the lisp compiler generates 
                    917: to stack the frame for \fI(*catch\ 'tag\ x)\fP
                    918: .(b
                    919: movl   0(r8),r0        #move 'tag to r0
                    920: pushl  $0              # dummy argument
                    921: pushl  r0              # put tag argument on C runtime stack
                    922: pushl  $2              # push F_CATCH
                    923: jsb    _qpushframe     # call Pushframe
                    924: movl   r0,_errp        # move result into errp
                    925: .)b
                    926: .pp
                    927: Every function which does a Pushframe() must also do a Popframe()
                    928: before it returns to its caller.
                    929: This simply removes the frame from the stack.
                    930: In C it is written:
                    931: .br
                    932: .tl ''errp = Popframe()''
                    933: in the code generated by the lisp compiler it looks like:
                    934: .(b
                    935: movl   _errp,r1        # put current errp in r1
                    936: movl   12(r1),_errp    # put previous errp in errp
                    937: addl3  $32,r1,sp       # pop frame from stack
                    938: .)b
                    939: .pp
                    940: Non-local transfers are done with the Inonlocalgo
                    941: function. This function always takes three arguments. 
                    942: The
                    943: first is the return type (one of the types mentioned
                    944: above which begin with 'C_'). It will be assigned to retval.
                    945: The second argument is the value to be assigned to lispretval,
                    946: except in the case of a C_THROW, where the second argument
                    947: is the tag to throw to and the third argument is the value
                    948: to assign to lispretval.
                    949: This function never returns. 
                    950: If it doesn't find a catch frame
                    951: which matches what it is looking for, it signals an error.
                    952: The function is called with 
                    953: .i calls
                    954: and the arguments are stacked on the C runtime stack, not the 
                    955: namestack.
                    956: .+c Liszt:\ The\ Lisp\ Compiler
                    957: .sh 0 _
                    958: .nr $1 \n(ch
                    959: .pp
                    960: The purpose of compiling a lisp function is to create a representation of 
                    961: the function which can be evaluated in less time and perhaps take
                    962: up less space.
                    963: There are two approaches to lisp compilers.
                    964: One is to convert the functions to a compact form, often called 
                    965: .i bytecodes
                    966: which can be rapidly interpreted.
                    967: Each bytecode represents a primitive operation in the lisp system.
                    968: This approach is simple to implement but there is an time penalty
                    969: in using an  interpreter.
                    970: The other approach is to compiled to  machine code.
                    971: In general, this is not as portable as the bytecode  approach but the result
                    972: generally runs faster.
                    973: There are two ways of compiling to  machine code. 
                    974: One is to place arguments to functions in registers. 
                    975: If there are more arguments than registers, the arguments are put on
                    976: a stack and a special type of call is made.
                    977: This method is used in Maclisp.
                    978: The other method is to assume a stack model, in which
                    979: arguments to a function are placed on a stack.
                    980: This is what the 
                    981: .Fr
                    982: compiler Liszt does.
                    983: The stack model made it much easier to write parts of the lisp
                    984: system in the C langauge.
                    985: It also simplifies the garbage collector since the mark phase doesn't
                    986: have to locate and peruse all saved registers looking for lisp data.
                    987: .sh 2 File\ Compiler
                    988: .pp
                    989: When a file of lisp expressions is loaded,
                    990: the 
                    991: .i load 
                    992: function repeatedly reads and evaluates the forms in the
                    993: file.
                    994: Some of these evaluations may result in functions being defined, 
                    995: and others may use the functions just defined or previously defined.
                    996: The job of the lisp compiler is to create an object file, which,
                    997: when read in 
                    998: by 
                    999: .i fasl,
                   1000: acts just like it was 
                   1001: .i load ed
                   1002: in, except when a function is defined it is defined in machine
                   1003: code, not as a lisp expression.
                   1004: This is quite a bit different from what compilers do in other languages
                   1005: and it is done this way to make it easier to 
                   1006: switch between compiled and
                   1007: interpreted code.
                   1008: .sh +0 Differences\ between\ compiled\ and\ interpreted
                   1009: .pp
                   1010: There are some differences, though, between compiled and interpreted code.
                   1011: Local variables in compiled code are lexically scoped (put on the
                   1012: namestack and inaccessible to other functions) unless the variable
                   1013: has been declared 
                   1014: .i special.
                   1015: The declaration:
                   1016: .(b
                   1017: \fI(declare (special x y))\fP
                   1018: .)b
                   1019: declares both x and y to be special from this point in the file on.
                   1020: The declaration 
                   1021: .(b
                   1022: \fI(declare (specials t))\fP
                   1023: .)b
                   1024: declares all variables to be special.
                   1025: .pp
                   1026: Lisp has a very powerful macro definition system.
                   1027: The compiler will macro expand all it can, whereas the interpreter
                   1028: expands macros when necessary but never replaces a macro call with
                   1029: its expansion.
                   1030: Thus if you redefine a macro, the compiled code that uses it will
                   1031: still work the same way, but the interpreted code will use the 
                   1032: new definition. 
                   1033: Also, when compiling a file, macro definitions must be
                   1034: done before any call on the macro  is encountered during compiling.
                   1035: In the interpreter, 
                   1036: macros can be defined or redefined anytime up until
                   1037: they are used.
                   1038: Thus the interpreter is far freer about macro definitions than 
                   1039: the compiler.
                   1040: This could cause programs which work interpreted to
                   1041: fail compiled.
                   1042: .sh +0 Top\ level\ algorithm
                   1043: .pp
                   1044: The top level algorithm of the compiler is simply this: 
                   1045: .np 
                   1046: read a lisp expression from the input file
                   1047: .np 
                   1048: macro expand the top level of the form as much as possible
                   1049: .np 
                   1050: if the form is a function definition (a list beginning with
                   1051: the symbol 'def')
                   1052: then compile it.
                   1053: .np 
                   1054: if the form is not a function definition then put it on a list of 
                   1055: forms that will be evaluated when the file is 
                   1056: .i fasl ed
                   1057: in.
                   1058: .np
                   1059: if not at end of file, go to step 1.
                   1060: .pp
                   1061: In reality, step 3 is a bit more complex.
                   1062: If the definition is of a macro, then the form will be evaluated
                   1063: immediately, thus adding the macro definition to the compiler's 
                   1064: environment.
                   1065: If the lisp variable 
                   1066: .i macros 
                   1067: is t then the macro will also be compiled.
                   1068: There are also some forms like \fI(progn\ 'compile\ ...)\fP
                   1069: and \fI(eval-when\ ()\ )\fP which determine when the 
                   1070: forms get compiled and evaluated.  
                   1071: See the lisp manual for details.
                   1072: .sh +0 Expression\ Compilation
                   1073: .pp
                   1074: Just as the interpreter is centered around the 
                   1075: .i eval
                   1076: function, the lisp compiler is centered around the function
                   1077: .i d-exp
                   1078: whose job it is to compile the expression passed to it.
                   1079: The lisp compiler is one pass.
                   1080: Before a call to d-exp returns,
                   1081: all the compiled code for a form has been computed and written
                   1082: to a file.
                   1083: One reason that the lisp compiler can be a single pass compiler
                   1084: is that the assembler which reads the compiler's output is 
                   1085: a two pass assembler.
                   1086: .sh +1 global\ state\ variables
                   1087: .pp
                   1088: There are a number of variables that maintain the state of
                   1089: the compilation process.
                   1090: These variables are declared special and are thus dynamically scoped
                   1091: and visible to any function within the compiler.
                   1092: When d-exp 
                   1093: is called their meanings are:
                   1094: .ip \fIv-form\fP
                   1095: - contains the expression to be compiled.
                   1096: .ip \fIg-loc\fP
                   1097: - tells where the result of evaluating this expression should be put.
                   1098: If g-loc is nil, then the value returned is unimportant and shouldn't
                   1099: be put anywhere.
                   1100: .ip \fIg-cc\fP
                   1101: - controls conditional branches.
                   1102: If g-cc is non nil, then it is a list cell whose value has
                   1103: either a non-null car or non-null cdr but not both. 
                   1104: If the car is non-nil then
                   1105: if  the  value  of  the expression held in 
                   1106: v-form  is  non-nil, a branch should be done to the symbol
                   1107: stored  in  \fI(car\ g-cc)\fP. 
                   1108: If the cdr is non-nil  then if the value of v-form
                   1109: is  nil,  a branch should be done to the symbol stored in 
                   1110: \fI(cdr\ g-cc)\fP. Since at
                   1111: every  conditional  
                   1112: branch  control should  either jump or continue, the car and cdr will
                   1113: never both be specified.
                   1114: .ip   \fIg-ret\fP
                   1115: - is non nil if the result of evaluating v-form will be returned as the
                   1116: value of the function we are evaluating. 
                   1117: This is used to allow liszt to detect
                   1118: when tail recursion removal is possible.
                   1119: .ip  \fIg-locs\fP
                   1120: - maintains current information about the state of the stacks:
                   1121: the bindstack (for specials), the namestack (for locals) and the
                   1122: C runtime stack (for catch frames)
                   1123: The form of g-locs is a stack of tokens stored as a list.
                   1124: The meaning of each token is as follows:
                   1125: .br
                   1126: \fInil\fP - this represents an unnamed object on the namestack.  This happens
                   1127: when calling functions, when the arguments are stacked prior
                   1128: to a function call.
                   1129: .br
                   1130: \fI<symbol>\fP - the given symbol is the name of a local variable on the namestack.
                   1131: .br
                   1132: \fI(prog . <n>)\fP - prog statement which binds <n> special variables
                   1133: .br
                   1134: \fI(do . <n>)\fP - head of a do statement which binds <n> special variables
                   1135: .br
                   1136: \fI(catcherrset . 0)\fP - catch or errset frame on stack
                   1137: .br
                   1138: \fI(lambda . <n>)\fP - lambda expression which binds <n> special variables
                   1139: .ip \fIg-labs\fP
                   1140: - this is a stack of labels. 
                   1141: There is one entry in g-labs for every entry which is a list
                   1142: in g-locs.
                   1143: the forms of the entries are:
                   1144: .br
                   1145: \fInil\fP - no labels in this form
                   1146: .br
                   1147: \fI(endlab (real . gen) (real2 . gen2) ...)\fP - endlab is the label to go to
                   1148: to get out of this body.  After this label will be code
                   1149: to unbind specials and pop off locals.
                   1150: The 'real' labels are those found in the code. the gen
                   1151: labels are those generated and put in the assembler code.
                   1152: .sh +0 Function\ compilation
                   1153: .pp
                   1154: The action d-exp takes when compiling an expression depends on the
                   1155: type of expression.
                   1156: For atoms (symbols and numbers) the compilation is very simple, it
                   1157: is just a matter of putting the value where g-loc specifies and 
                   1158: then jumping if specified by g-cc.
                   1159: When the expression is a list, d-exp first macro
                   1160: expands the form and then looks at the first element of 
                   1161: the list (if the list has not macro expanded to an atom).
                   1162: If the first element is a symbol  then the list is
                   1163: is a function call.
                   1164: If the function is one of the functions which liszt knows how to open compile
                   1165: then liszt will call the particular routine designated to open
                   1166: compile this function.
                   1167: There are two classes of functions within liszt that
                   1168: do open compiling.
                   1169: The first class, the fl-expr class, are distinguished by names which
                   1170: begin with c-.
                   1171: These functions always generate
                   1172: code which returns the result in r0.
                   1173: They are not equipped to obey the contents of g-loc and g-cc.
                   1174: Thus d-exp, after calling one of these functions, must do what
                   1175: is necessary to obey g-loc and g-cc.
                   1176: The other class of open compiling functions, the fl-exprcc class (whose
                   1177: names begin with cc-),
                   1178: handle g-loc and g-cc.
                   1179: As a result these are usually much more complex and generate better code.
                   1180: .sh -1 Register\ Conventions
                   1181: .pp
                   1182: The register conventions used by liszt are an extension of those
                   1183: used by the C code.
                   1184: .ip \fIr0\fP
                   1185: - return value placed here
                   1186: .ip \fIr1,r2,r3,r4\fP
                   1187: - scratch registers.
                   1188: When long strings of 
                   1189: .i car's
                   1190: or
                   1191: .i cdr's
                   1192: are done (such as
                   1193: .i caddadr )
                   1194: these registers are used in a least-recently-used fashion to access down
                   1195: the list. 
                   1196: The compiler keeps track of the values in these registers but must
                   1197: assume that they are garbage after a function is called.
                   1198: .ip \fIr5\fP
                   1199: - fixnum accumulator.
                   1200: There a number of functions which work on fixnum's only and these
                   1201: usually put their result in r5.
                   1202: The assembler code function 
                   1203: .i qnewint
                   1204: which returns a pointer to a cell containing a fixnum value (after
                   1205: checking if it is in the fixnum table), expects its argument to be
                   1206: in r5.
                   1207: .ip \fIr6\fP
                   1208: np
                   1209: .ip \fIr7\fP
                   1210: lbot.
                   1211: When calling a function, this register is set just before the function
                   1212: call, and after the function call its value is used to reset the value
                   1213: of np in order to pop the arguments off the namestack.
                   1214: .ip \fIr8\fP
                   1215: the literal table pointer.
                   1216: The compiler generates a table of all the literal lisp data
                   1217: which the compiled code might access.
                   1218: Upon function entry a pointer to the base of this table is put in r8.
                   1219: For example, if we compile \fI(setq\ x\ 'foo)\fP then we will
                   1220: need a pointer to the lisp symbol
                   1221: .i foo
                   1222: and if the symbol
                   1223: .i x
                   1224: as been declared special we will also need a pointer to 
                   1225: .i x .
                   1226: .ip \fIr10\fP
                   1227: - upon function entry the value of lbot (r7) is put in r10.  
                   1228: This allows us to reference the arguments to our function while
                   1229: using lbot to call other function.
                   1230: .sh +0 Addresses
                   1231: There are four types of addresses used internally in Franz: 
                   1232: symbolic, intermediate addresses (iadr), extended intermediate (eiadr)
                   1233: and vax assembler format.
                   1234: .sh +1 Symbolic
                   1235: .pp
                   1236: These are the names of lisp objects, such as `a', `foo', 34,
                   1237: or 3.234234.
                   1238: A name is either a local variable, a special variable or a
                   1239: number.  A number is either a small fixnum, large fixnum,
                   1240: bignum or floating point number.
                   1241: .sh +0 Intermediate\ address\ (IADR)
                   1242: .pp
                   1243: This type of address is generated from a symbolic address by
                   1244: .i d-loc, 
                   1245: .i d-loclit 
                   1246: and the routines 
                   1247: .i d-simple
                   1248: and 
                   1249: .i d-rsimple 
                   1250: which
                   1251: call them. The forms are
                   1252: .ip \fINil\fP
                   1253: - the location or value of nil.
                   1254: .ip \fIT\fP   
                   1255: - the location or value of t.
                   1256: .ip \fIreg\fP
                   1257: - register r0, which is where the result of function calls
                   1258: are stored.
                   1259: .ip \fIstack\fP
                   1260: - the address pointed to by np with np incremented after
                   1261: the value is stored.  (i.e  (r6)+)
                   1262: .ip \fIunstack\fP
                   1263: - the address one word below np (np is decremented before
                   1264: accessing. (i.e. -(r6))
                   1265: .ip \fI<symbol>\fP
                   1266: - this is just <symbol>.  This allows strange forms to be
                   1267: represented directly.
                   1268: .ip \fI(stack\ n)\fP
                   1269: - The n'th value on the namestack for this function.
                   1270: The first value 0(r10) is (stack 1).
                   1271: .ip \fI(vstack\ n)\fP
                   1272: - The cdr of the n'th value on the namestack.  
                   1273: That is, (stack 1) is *0(r10)
                   1274: .ip \fI(bind\ n)\fP
                   1275: - The value of the n'th value in 
                   1276: the literal table.  If 
                   1277: this refers to a symbol, then this is the value
                   1278: of the symbol.
                   1279: If this refers to a list then this
                   1280: this is the cdr of the list (although this is a rare
                   1281: use). (bind 1) corresponds to *0(r8).
                   1282: .ip \fI(lbind\ n)\fP
                   1283: - The n'th value in the literal table.  If 
                   1284: this refers to 
                   1285: object foo then this is the address of foo
                   1286: in memory.
                   1287: .ip \fI(fixnum\ n)\fP
                   1288: - This is the address of small fixnum n in memory.
                   1289: A small fixnum is in the range -1024 to 1023.
                   1290: .ip \fI(immed\ n)\fP
                   1291: - The is the immediate constant n. 
                   1292: .sh +0 extended\ intermediate\ address\ (EIADR)
                   1293: .pp
                   1294: This address is generated from an IADR by e-cvt.  It 
                   1295: symbolically represents a vax address.  
                   1296: .ip \fI<atom>\fP
                   1297: - This corresponds to <atom>.
                   1298: .ip \fI(n\ <atom>)\fP
                   1299: - This corresponds to n(<atom>)
                   1300: (stack n+1) and (lbind n+1) are converted to these forms.
                   1301: .ip \fI(n\ <atom1>\ <atom2>)\fP
                   1302: - This corresponds to n(<atom1>)[<atom2>]
                   1303: There is currently no IADR which generates this.
                   1304: .ip \fI(*\ n\ <atom>)\fP
                   1305: -This corresponds to *n(<atom>)
                   1306: (vstack n+1) and (bind n+1) are converted to these forms.
                   1307: .ip \fI(+\ <atom>)\fP
                   1308: - This corresponds to (<atom>)+.
                   1309: stack is converted to this form.
                   1310: .ip \fI(-\ <atom>)\fP
                   1311: - This corresponds to -(<atom>)
                   1312: unstack is converted to this form.
                   1313: .ip \fI($\ <numb>)\fP
                   1314: - This corresponds to $<atom>
                   1315: (immed <numb>) is converted to this form.
                   1316: .ip \fI(#\ <numb>)\fP
                   1317: - This corresponds to $<loc> where <loc> equals 
                   1318: the base of the fixnums (0x1400) plus 4 * <numb>
                   1319: (fixnum <numb>) is converted to this form
                   1320: .ip \fI(*\ #\ <numb>)\fP
                   1321: - This corresponds to $<numb>.  It is generated
                   1322: by d-rsimple occasionally when you ask for the
                   1323: cdr of a number (which you do sometimes when you
                   1324: are compiling fixnum operators).
                   1325: .sh +0 Vax\ Unix\ assembler\ addresses
                   1326: .pp
                   1327: The addresses are printed from a EIADR by e-cvtas.  The conversions
                   1328: are shown in the above section.
                   1329: .sh -1 Function\ calling\ convention
                   1330: .sh +1 Function\ linkages
                   1331: .pp
                   1332: The function associated with a symbol is stored in the function 
                   1333: definition slot of the symbol.  If the function slot contains a
                   1334: list then the function is to be interpreted and its 'car' will 
                   1335: be lambda, nlambda, lexpr or macro.  If the function is compiled then 
                   1336: the function definition slot will contain a binary object. 
                   1337: A binary object 
                   1338: is composed of two independent parts: the discipline and the entry.
                   1339: The discipline is one of:
                   1340: .ip \fIlambda\fP
                   1341: - a function whose arguments should be evaluated.
                   1342: .ip \fInlambda\fP
                   1343: - a function whose arguments should not be evaluated but
                   1344: which should be passed as a list
                   1345: .ip  \fImacro\fP
                   1346: - a function which should be passed the unevaluated form
                   1347: being evaluated and whose result should be evaluated.
                   1348: .ip \fI\"subroutine\"\fP
                   1349: - a foreign function subroutine
                   1350: .ip  \fI\"integer-function\"\fP
                   1351: - a foreign function returning an integer
                   1352: .ip \fI\"real-function"\fP
                   1353: - a foreign function returning a flonum.
                   1354: .pp   
                   1355: A lexpr is not in the list as there is no difference to the caller
                   1356: between compiled lambda's and compiled lexprs.
                   1357: .sh +0 Transfer\ tables
                   1358: Most calls from compiled code to other lisp functions go through
                   1359: transfer tables.  A transfer table entry is a pair: symbol, address
                   1360: of routine. 
                   1361: When another lisp function is called it uses the 
                   1362: .i calls 
                   1363: instruction which will
                   1364: indirectly jump through the address in the transfer table.  This
                   1365: address may point to the desired function or it may point to the
                   1366: interface routine.  If a call ends up in the interface routine, 
                   1367: then that routine will trace back through the call stack and eventually
                   1368: reach the tranfer table entry that it was 'called through'. Since the
                   1369: transfer table entry contains a symbol which names the function
                   1370: to be called, the interface routine can determine which routine
                   1371: was to have been called.  If that routine is compiled then the 
                   1372: interface routine can modify the transfer table so that a call
                   1373: through that table entry will go directly to the desired function.
                   1374: If the routine to be called is interpreted, or if the user has
                   1375: requested that transfer linkages should be disabled, then the
                   1376: interface routine will go through the 'funcall' function 
                   1377: in the interpreter to continue the call.  
                   1378: .sh +0 calling\ sequence\ in\ the\ compiler:
                   1379: .pp
                   1380: When transfer tables are used
                   1381: .(b
                   1382:        \fBforeach\fP arg
                   1383:         \fBcompute\fP arg and stack result using (r6)+
                   1384:         for example: movl r0,(r6)+
                   1385:        movab -n(r6),r7    where n = 4*number of args to fcn
                   1386:        calls  $0,*trantb+m      where m is the index of the function
                   1387:                                 in the transfer table.
                   1388:        movl r7,r6              restore r6
                   1389: .)b
                   1390: .pp
                   1391: The compiler supports local functions, which are function accessible
                   1392: only within one file.
                   1393: Because they are not accessible from C code, we can use a very fast
                   1394: call and return sequence when calling them.
                   1395: To call a local function
                   1396: .(b
                   1397:        \fBforeach\fP arg
                   1398:         \fBcompute\fP and stack using (r6)+
                   1399:        jsb     LOCALNAME       go directly to the function, it will 
                   1400:                                restore r6 before it returns.
                   1401: .)b
                   1402: .pp
                   1403: The beginning of each function looks as follows. 
                   1404: First for a non-lexpr function called in the standard way
                   1405: (topsim is the label jumped to when tail merging, it will be unique
                   1406: for each function; the brackets indicate the optional code which
                   1407: exists if the -p switch is given to liszt);
                   1408: .(b
                   1409:        .word   0x5c0           # save r6, r7, r8, r10
                   1410:     [   movab  mcounts,r0      # if profiling, mcounts replaced by fasl
                   1411:        jsb     mcount  ]
                   1412:        movab   linker,r8       # set up r8
                   1413:        movl    r7,r10          # set up oldlbot
                   1414:        movab   n(r10),r6       # n = 4*Number of args expected.
                   1415: topsim:
                   1416: .)b
                   1417: .pp
                   1418: Now for lexprs:
                   1419: .(b
                   1420:        .word   0x5c0
                   1421:     [   movab  mcounts,r0      # if profiling. [mcounts replaced by fasl]
                   1422:        jsb     mcount  ]
                   1423:        movab   linker,r8       # set up r8
                   1424:        subl3   $4,r7,-(sp)     # address one word below bottom of args
                   1425: topsim:
                   1426:        movl    r6,r10          # first free addr to arg base
                   1427:        subl3   r7,r6,r0        # number of args * 4 into r0
                   1428:        movab   0x1400(r0),(r6)+ # stack boxed number of args
                   1429:        movl    0(r10),-(sp)    # also store on stack so user can't clobber
                   1430: .)b
                   1431: .pp
                   1432: And finally for local functions:
                   1433: .(b
                   1434:        movl    r10,-(sp)       # save r10
                   1435:        movab   -n(r6),r10      # set up arg base using arg top
                   1436: topsim:
                   1437: .)b
                   1438: .sh -1 Assembler\ file\ format
                   1439: .pp
                   1440: The liszt compiler generates a file which is in Unix assembler format.
                   1441: The Unix assembler converts that file into an object file which fasl
                   1442: then reads.
                   1443: Although the object file generated is a standard Unix object
                   1444: file (as defined in /usr/include/a.out.h), it is not of a 
                   1445: format that  the Unix ld loader can understand.
                   1446: This is because the requirements  of a lisp object file are different
                   1447: from an object file of other languages.
                   1448: The 
                   1449: run time semantics of lisp and the fact that lisp data must be protected
                   1450: from garbage collection are the principal differences.
                   1451: The unconventional object file created by the unix assembler 
                   1452: is a result of the
                   1453: unconventional assembler input file.
                   1454: Next we will look at what must be put in the assembler file and how
                   1455: it is put there.
                   1456: .pp
                   1457: The assembler file must contain
                   1458: .ip \fIinstructions\fP
                   1459: - vax assembler code for the compiled functions. 
                   1460: If there aren't any functions compiled, this can be empty.
                   1461: .ip \fIliterals\fP
                   1462: - lisp data which is referred to by compiled code
                   1463: .ip \fItransfer\ table\fP
                   1464: - the names of the functions which correspond to the calls through
                   1465: the transfer table.
                   1466: The instructions simply say 'call indirect through the nth transfer
                   1467: table entry'.
                   1468: .ip \fIfunction\ names\fP
                   1469: - the names of the functions which are being 
                   1470: defined.
                   1471: .ip \fIload\ time\ forms\fP
                   1472: - in order to mimic the 
                   1473: .i load
                   1474: function, fasl has to be able to evaluate lisp expressions at fasl 
                   1475: time, so we must be able to store lisp expressions in the assembler
                   1476: file and indicate when they should be evaluated.
                   1477: .pp
                   1478: Based on the requirements above, the form of the assembler file
                   1479: is as described below.
                   1480: The assembler  builds two segments: text and data.
                   1481: We put all of our information in the text segment.
                   1482: The compiler places some annotation strings in the data segment
                   1483: so that the object file can be identified, however the data segment
                   1484: is ignored by fasl.
                   1485: The format is
                   1486: .ip \fIcompiled\ instructions\fP
                   1487: The instructions for each compiled  (non-local) function begins with
                   1488: .(b
                   1489:     .globl F00003
                   1490:     F00003:
                   1491:    .word 0x5c0
                   1492: .)b
                   1493: The globl declaration and the fact that the symbol name begins
                   1494: with a F will tell fasl that this is the beginning of a lisp
                   1495: function.
                   1496: The symbols beginning with F must be unique within a file
                   1497: but may be duplicated in other files.
                   1498: The lisp  name of the function will appear later.
                   1499: Next the instructions for the function are given.
                   1500: Only a small fixed set of external symbols may be referenced.
                   1501: The list is found in the code for nfasl.c and the common
                   1502: ones will be described below (soon).
                   1503: Labels should be given a name beginning with L and should
                   1504: be unique within the file.
                   1505: .ip \fItable\ sizes\fP
                   1506: somewhere in the file there should be a pair of 'set' assembler
                   1507: pseudo ops which describe the sizes of the literal table and
                   1508: transfer table.
                   1509: The form is this
                   1510: .(b
                   1511:     .set linker_size  4
                   1512:     .set trans_size 3
                   1513: .)b
                   1514: where linker_size is the number of entries in the literal table
                   1515: which will be required and trans_size is the number of entries
                   1516: in the transfer table which will be required.
                   1517: Those tables will be built by  fasl.
                   1518: .ip \fIbinder\ table\fP
                   1519: - this table describes  the order that the functions should be
                   1520: defined and forms evaluated.
                   1521: It is a table of longwords  beginning at the symbol bind_org
                   1522: and ending when a -1 entry is seen.
                   1523: The meaning of the entries will be described below.
                   1524: .ip \fIlisp\ expression\ table\fP
                   1525: - this is a table of null terminated strings beginning at the 
                   1526: symbol lit_org and ending at the symbol lit_end.
                   1527: Each string is read by the lisp read function (using the
                   1528: raw readtable).
                   1529: The first linker_size expressions are put in the literal table.
                   1530: The next trans_size expressions are the names  of the
                   1531: functions to put in the transfer table.
                   1532: The remaining expressions correspond to the entries in the
                   1533: binder table.
                   1534: The binder entries are processed, one by one. 
                   1535: Provided that the binder entry is not -1,  an expression
                   1536: is read from the lisp expression table.
                   1537: Then if the binder table entry is 0, that expression is the name of
                   1538: a lambda type lisp function.
                   1539: A binary object is created, the discipline is set to lambda and
                   1540: the function address is set to the lexically next function 
                   1541: defined in the
                   1542: file.
                   1543: If the binder entry is 1 then this is an nlambda function, and
                   1544: if the entry is 2 then this is a macro function.
                   1545: If the entry is 99 then the expression just read should be evaluated
                   1546: by the lisp function eval.
                   1547: .pp
                   1548: The lisp compiler normally puts the assembler format output file in /tmp
                   1549: and removes it when it is done. 
                   1550: The -S switch will tell liszt to write the 
                   1551: assembler file in the same place as
                   1552: the source file, and with the same root name but a .s extension.
                   1553: The -C file will tell the lisp compiler to comment the file as it
                   1554: generates it, making it easier to understand what is going on.
                   1555: Assume that the following is file foo.l:
                   1556: .(b
                   1557: (defun foo (x) (bar y) (baz k))
                   1558: (print (foo 3))
                   1559: (def test (nlambda (l) (print 'hithere) (foo 3)))
                   1560: .)b
                   1561: we now compile it with -SC
                   1562: .(b
                   1563:     .globl     F00007  #(fcn lambda foo)
                   1564:     F00007:
                   1565:     .word      0x5c0
                   1566:     movab      linker,r8
                   1567:     movl       r7,r10
                   1568:     movab      4(r10),r6
                   1569:     L00008:
                   1570:     movl       *0(r8),(r6)+    #(calling bar)
                   1571:            #(from y to stack)
                   1572:     movab      -4(r6),r7
                   1573:     calls      $0,*trantb+0
                   1574:     movl       r7,r6
                   1575:     movl       *4(r8),(r6)+    #(calling baz)
                   1576:            #(from k to stack)
                   1577:     movab      -4(r6),r7
                   1578:     calls      $0,*trantb+8
                   1579:     movl       r7,r6
                   1580:     ret
                   1581:     .globl     F00009  #(fcn nlambda test)
                   1582:     F00009:
                   1583:     .word      0x5c0
                   1584:     movab      linker,r8
                   1585:     movl       r7,r10
                   1586:     movab      4(r10),r6
                   1587:     L00010:
                   1588:     movl       8(r8),(r6)+     #(calling print)
                   1589:            #(from 'hithere to stack)
                   1590:     movab      -4(r6),r7
                   1591:     calls      $0,*trantb+16
                   1592:     movl       r7,r6
                   1593:     movl       $5132,(r6)+     #(calling foo)
                   1594:            #(from (fixnum 3) to stack)
                   1595:     movab      -4(r6),r7
                   1596:     calls      $0,*trantb+24
                   1597:     movl       r7,r6
                   1598:     ret
                   1599:     bind_org:
                   1600:     .set linker_size,  3
                   1601:     .set trans_size,   4
                   1602:     .long      0
                   1603:     .long      99
                   1604:     .long      1
                   1605:     .long -1
                   1606:     lit_org:
                   1607:     .asciz "y"
                   1608:     .asciz "k"
                   1609:     .asciz "hithere"
                   1610:     .asciz "bar"
                   1611:     .asciz "baz"
                   1612:     .asciz "print"
                   1613:     .asciz "foo"
                   1614:     .asciz "foo"
                   1615:     .asciz "(print (foo 3))"
                   1616:     .asciz "test"
                   1617:     lit_end:
                   1618:     .data  # this is just for documentation 
                   1619:     .asciz "@(#)Compiled by Lisp Compiler 7.1 on Sun Feb 21 17:51:54 1982"
                   1620:     .asciz "@(#)decl.l 1.5     2/10/82"
                   1621:     .asciz "@(#)array.l        1.1     9/25/81"
                   1622:     .asciz "@(#)datab.l        1.1     9/25/81"
                   1623:     .asciz "@(#)expr.l 1.1     9/25/81"
                   1624:     .asciz "@(#)io.l   1.1     9/25/81"
                   1625:     .asciz "@(#)funa.l 1.3     2/10/82"
                   1626:     .asciz "@(#)funb.l 1.2     2/10/82"
                   1627:     .asciz "@(#)func.l 1.2     2/10/82"
                   1628:     .asciz "@(#)tlev.l 1.4     10/24/81"
                   1629:     .asciz "@(#)fixnum.l       1.6     10/21/81"
                   1630:     .asciz "@(#)util.l 1.2     10/7/81"
                   1631: .)b
                   1632: .sh +0 functions\ callable\ from\ compiled\ lisp\ code
                   1633: .sh +0 Object\ File\ Format
                   1634: .sh +0 Fasl
                   1635: 
                   1636: 
                   1637: 
                   1638: 
                   1639: 

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.