|
|
1.1 root 1: .\" Copyright (c) 1980 Regents of the University of California.
2: .\" All rights reserved. The Berkeley software License Agreement
3: .\" specifies the terms and conditions for redistribution.
4: .\"
5: .\" @(#)franz.n 6.1 (Berkeley) 4/29/86
6: .\"
7: ." franz.n -[Thu Jun 17 11:01:27 1982 by jkf]-
8: ." this file will contain a description of the franz lisp system.
9: ." sort of a systems manual.
10: .de Fr
11: F\s-2RANZ\s0 L\s-2ISP\s0
12: ..
13: .tp
14: .(l C
15: .sz +2
16: \fBThe Franz Lisp System\fP
17: .sz -2
18: .sp 2v
19: \fIby
20: \fIJohn K. Foderaro\fP
21: .)l
22: .sp 3v
23: .tl ''Abstract''
24: .br
25: This document describes the
26: .Fr
27: system written at The University of California
28: at Berkeley.
29: Included are descriptions of the memory management, interpreter
30: and compiler, the conventions used within the C coded kernel
31: and those within the compiled code.
32: This does not duplicate the information found in The Franz Lisp Manual.
33: .++ C '\s-2Draft\s+2'\fBThe\ Franz\ Lisp\ System\fP'%'
34: .+c Characteristics\ of\ \F\s-2RANZ\ \s0L\s-2ISP\s0
35: .ls 1
36: .pp
37: There is no formal standard for lisp systems, thus each lisp
38: system is almost guaranteed to be different from any other lisp system.
39: In this section we will examine the design decisions which most often
40: characterize a lisp system.
41: This
42: focus does not imply that all other parts of the language are generally
43: agreed upon.
44: In fact, there is nothing sacred to lisp system designers.
45: For example, one usually identifies the symbol
46: .i nil
47: with the empty list,
48: and one usually thinks of lisp as a dynamically scoped language.
49: Both of these conventions are are not true in the lisp dialect
50: .i NIL
51: currently being written at MIT.
52: As another example, one imagines that a lisp system must
53: use garbage collection to reclaim storage.
54: The lisp dialect
55: .i ZetaLisp
56: that is running on Lisp Machines doesn't
57: normally use a garbage collector
58: because of the way in which it allocates its address space.
59: It is faster to reboot the machine than to do a garbage collection.
60: In most lisp dialects the lisp expressions are written in
61: parenthesised form. In
62: .i Portable
63: .i Standard
64: .i Lisp
65: (PSL) at the University of Utah, programs are written
66: primarily
67: in an Algol-like
68: syntax.
69: .pp
70: Thus some of the discussion to follow indicates not so much
71: .i unique
72: charateristics of
73: .Fr
74: but instead, how decisions were made.
75: .sh 2 Scoping\ and\ Binding
76: .pp
77: The
78: .Fr
79: interpreter
80: uses
81: .i dynamic
82: .i scoping ,
83: that is, after A calls B, B can access all of the
84: variables that A allocated
85: as well
86: as those that A can access, provided B doesn't allocate variables of the same
87: names.
88: There are two popular ways of implementing dynamic scoping:
89: .i deep
90: .i binding
91: and
92: .i shallow
93: .i binding .
94: Note that we will use the terms variable allocation and lambda
95: binding interchangeably in this report.
96: The former term is less language-specific than the latter.
97: .sh +1 Deep\ Binding
98: .pp
99: In a deep binding implementation, when a variable is allocated,
100: the name of the variable and a space for its value are pushed on the
101: top of a stack.
102: When a program asks for the value of a variable,
103: the lisp system must search down the stack for the first occurrence of
104: the variable.
105: This system offers great flexibility for the
106: programmer since the state of his program
107: can be described by the contents of the stack.
108: It is thus possible to save the context of a program by just saving
109: the stack or in some cases just a pointer to
110: a place in the stack.
111: The problem with deep binding is that it is time-consuming to search
112: down the stack for the value of a variable, and,
113: as a result, most systems
114: modify the deep binding model by giving variables a global value
115: cell and allowing programs to set and access the global cell.
116: Some implementations of Interlisp use deep binding.
117: .sh +0 Shallow\ Binding
118: .pp
119: In a shallow binding implementation, each variable has a global value cell
120: which contains the current value of the variable.
121: When a variable is allocated inside a program, its name
122: and old value are pushed on a stack called a
123: .i bindstack .
124: The variable is then given a new value.
125: Throughout the lifetime of the allocation, the current value of the
126: variable can be found in the global value cell associated with the variable.
127: When leaving the context of the variable's allocation, the old value
128: is restored from the
129: .i bindstack .
130: A shallow binding scheme makes it much cheaper to access
131: the values of variables, however it is much more difficult to
132: save the state and to restore it.
133: .pp
134: .Fr
135: and most other lisps which are not derived from Interlisp
136: use shallow binding.
137: Some versions of Interlisp use shallow binding.
138: .sh -1 Lisp\ Compiler
139: .pp
140: Dynamic scoping often is not necessary and it is never cheap, even
141: in a shallow binding implementation.
142: Thus the
143: .Fr
144: compiler prefers to do lexical scoping rather than dynamic
145: scoping.
146: If the user does not specify otherwise, when a
147: function is compiled, all variables
148: allocated will be
149: put on a local stack and will
150: .i not
151: be accessible by any
152: function that this function calls.
153: This
154: convention results in faster code being generated by
155: the compiler, but if the
156: user is not careful, it can result in compiled and interpreted code
157: working differently.
158: In some of the new lisp designs, such as
159: .i NIL
160: and
161: .i Common
162: .i Lisp
163: the semantics of compiled and interpreted code have been
164: unified by
165: transferring the compiler semantics (lexical scoping) to the interpreter.
166: This is a rather large step
167: if dynamic scoping is no
168: longer an option, and it is not clear whether the resulting
169: language should be called 'lisp'.
170: .sh +0 Data\ Types
171: .pp
172: A complete list of data types in
173: .Fr
174: can be found in the first chapter of the
175: .Fr
176: Manual.
177: The most important ones are described below.
178: .pp
179: Lisp is a list processing language and the basic data type is the list
180: cell.
181: A list cell also goes by the name of cons cell, pair, or dotted pair (dtpr).
182: It contains pointers to two lisp objects, and these pointers can
183: be accessed via the
184: .i car
185: and
186: .i cdr
187: functions.
188: Each pointer requires four (8-bit) bytes and thus the list cell is
189: eight bytes large.
190: The cdr pointer is stored first since this makes it easier
191: to 'cdr down a list' using the addressing modes of the VAX.
192: .pp
193: The symbol cell is used to implement variables. It contains a pointer
194: to a string that is the print name, a cell to store a value, a cell to
195: store the function definition, a cell to store a pointer to the property
196: list and one pointer that the list system uses if it stores
197: the symbol in the oblist.
198: .sh +0 Memory\ Management
199: .pp
200: A lisp system must be able to determine the type of an
201: object at run time.
202: The method used to determine the type influences the
203: way storage is managed and garbage collection is done.
204: Next, we will look at three possible places
205: to store the type information.
206: .sh +1 Typed\ data
207: .pp
208: The type of the data object could be placed in the object itself, much
209: as Pascal stores the type of a variant record in the record itself.
210: This could result is a large amount of storage being used to store
211: type information.
212: No lisp system that we know of uses this method exclusively.
213: .sh +0 Typed\ pointers
214: .pp
215: Every reference to a lisp object could contain the type of the object
216: referenced.
217: This is a good idea in a machine like an IBM 370 where only
218: part of machine word is used by the addressing hardware.
219: Lisps that use typed pointers generally use a
220: .i heap
221: memory management scheme and a
222: .i copying
223: garbage collector.
224: In a heap scheme,
225: all memory is divided by a pointer (called the
226: heap pointer) separating lisp
227: data and free space.
228: When a new lisp object is needed, it is formed at the
229: dividing line and then
230: the heap pointer is incremented
231: past the new object.
232: Garbage collection is done by maintaining two separate spaces for lisp
233: data, only one of which is used at any one time.
234: When one fills up, all active lisp objects are copied to the
235: other space, leaving the first space totally free.
236: Subsequent allocations are done from the second space, until it fills up,
237: at which point
238: all active data in the second space is copied to the first space.
239: The advantage of the copying garbage collector is that the data space
240: is compacted, which will improve virtual memory behavior.
241: The disadvantage is that
242: half the data space is always unused.
243: .pp
244: PSL on a PDP-10 uses this scheme, as does Lisp 370 on
245: an IBM 370.
246: PSL and NIL on the VAX will also use this scheme.
247: Since the VAX hardware
248: uses the entire word for address
249: calculation,
250: PSL and NIL
251: must mask off the type bits
252: of a pointer before using it to reference memory.
253: .sh +0 Typed\ pages
254: .pp
255: The final alternative is to allocate data objects by pages rather than
256: individually.
257: Each page contains only one type of object and there is a table that
258: keeps track of the type of data object in each page.
259: The address of an object tells which page the object is on and
260: the page number tells which type the object is.
261: Since a whole page of objects is allocated when only one
262: object is necessary,
263: the rest of the objects in the page are put on a free list.
264: This method of allocation is sometimes referred to as
265: .i bibop
266: for BIg Bag Of Pages.
267: The garbage collector for bibop is usually
268: a traditional
269: .i mark
270: and
271: .i sweep
272: algorithm.
273: All active data objects are marked and
274: then each page is swept linearly with
275: the free cells being put on the free list and the used cells
276: left alone.
277: The advantage of mark and sweep over a copying garbage collector is that
278: the mark phase is cheaper because data objects do not have to be copied.
279: Also, all of memory can be used since there is no requirement for two spaces.
280: The disadvantage is that a sweep phase is required in a mark and sweep
281: but one is not required in a copying garbage collector.
282: The sweep phase is expensive because it has to alter most data pages
283: while building a free list.
284: This operation
285: can be expensive in a virtual memory environment.
286: Alternatives to the sweep phase have been proposed in
287: [Foderaro+Fateman Characterization of Vax Macsyma].
288: .pp
289: .Fr
290: uses a bibop memory allocator and a mark and sweep garbage collector,
291: as does Maclisp (on a PDP-10).
292: The reason that
293: .Fr
294: uses bibop is primarily due to the VAX architecture,
295: which makes typed pointers
296: expensive to implement.
297: Also, typed pointers would make it difficult to pass lisp values
298: to programs written in other languages, some of which may not have
299: the ability to extract the address of a lisp object from a typed pointer.
300: .sh -1 Construction
301: .pp
302: The
303: .Fr
304: system is built by adding lisp code to a C-coded kernel.
305: The C-coded kernel is a working lisp interpreter, although it has few of
306: the functions a lisp system needs.
307: The lisp code provides most of the functions, the top level and error
308: handlers.
309: The lisp compiler is also written in lisp,
310: but is usually
311: run as a separate process.
312: .sh +0 Unique\ features
313: .pp
314: .Fr
315: can dynamically load and execute functions defined in the other languages
316: which run on the VAX.
317: It uses two different dynamic loaders.
318: One is part of the lisp system and is used for lisp only, it is called
319: .i fasl
320: (for fast loader).
321: The other is the Unix loader, ld, which is used for loading in
322: C, Fortran or Pascal code.
323: Loading code with fasl
324: is much faster than with ld
325: since fasl benefits from
326: the simplicity of a lisp object file.
327: .+c Data\ Structures
328: .sh 0 _
329: .nr $1 \n(ch
330: .pp
331: In this chapter we shall examine the data structures which are most
332: important to the
333: .Fr
334: system.
335: First, though we will see how the lisp system uses the address space.
336: .sh 2 Physical\ layout
337: .pp
338: As a Unix process, lisp has three address regions: text, data and stack.
339: Text begins at location 0 and continues up to 0x3b73c (0x means hex number).
340: The data segment begins on the next page boundary and continues
341: up to a limit set by the configuration parameters (currently 0x2fd000).
342: Lisp does not begin running with such a large data segment, rather it grows
343: when necessary.
344: The stack segment begins at address 0x7fffffff and grows downward to a
345: maximum size of one-half megabyte.
346: .pp
347: The text segment stores the instructions for the C kernel as well as read-only
348: data.
349: The read-only data for lisp are
350: the symbol nil, the i/o ports, and the small integer
351: table.
352: The symbol nil is stored at location 0 which makes it very easy to test
353: whether a value is nil.
354: The problem with storing nil in read-only space is that a special case
355: must be made for nil's property list, which is the only thing in the
356: nil symbol that the user is permitted to alter.
357: .pp
358: The fixnums -1024 through 1023 are stored sequentially, with 0 being
359: stored at 0x1400.
360: .Fr
361: doesn't have any 'immediate' lisp data, that is data whose value is
362: stored as part of the reference to the data.
363: But, by storing some of the fixnums in a known place, we can achieve
364: some of the benefits of immediate data:
365: A program can use
366: .i eq
367: as a test for a fixnum in the range -1024 to 1023.
368: In the majority of cases, when asked
369: to allocate a fixnum, the system can return a pointer into this table
370: and bypass the memory allocator.
371: .sh +0 Typetable
372: .pp
373: .Fr
374: uses the typed pages (or bibop)
375: method of type determination.
376: The
377: .i typetable
378: is an array of bytes (
379: .i chars
380: in C lingo).
381: This table describes the type of all pages, from page 0 where nil is stored,
382: up to the end of data space.
383: Thus there are many entries that describe the types of the pages which
384: make up the C kernel.
385: In order to reduce the wasted space in the typetable,
386: we could have made the typetable begin typing pages at the start of data
387: space and make a special case of the pages that contain nil and
388: the small integer table.
389: However, the
390: effect of this change would be that type checks
391: (which are done in-line in compiled code) would
392: be longer and slower. Since type checking happens quite
393: frequently, we felt it was better to waste the space in the
394: typetable in order to save execution time and instruction space.
395: .pp
396: Each page on a VAX is 512 bytes, and thus to find the type of an
397: object given the address of it,
398: it is only necessary to shift the address right 9 bits
399: and index the typetable array offset by one.
400: The offset by one is required because the value -4, which is an illegal
401: address, is used as a sentinel value to indicate an illegal value.
402: Thus when we take the type of -4 we will be indexing the table by -1
403: and we want to retrieve the first byte in the table (which contains
404: the value UNBOUND).
405: The C macro which retrieves the type is this (from file h/global.h):
406: .(b I
407:
408: #define TYPE(a1) ((typetable+1)[(int)(a1) >> 9])
409:
410: .)b
411: This is code generated by the lisp compiler to check if the type
412: code of an argument (stored at 0(r10)) is a symbol (which is type
413: code 1):
414: .(b I
415:
416: ashl $-9,0(r10),r0
417: cmpb _typetable+1[r0],$1
418:
419: .)b
420: .pp
421: The type codes which are stored in the typetable are these:
422: .ts 2i 4i 6i
423: .(b I
424: UNBO -1\tSTRNG 0\tATOM 1\tINT 2
425: DTPR 3\tDOUB 4\tBCD 5\tPORT 6
426: ARRAY 7\tOTHER 8\tSDOT 9\tVALUE 10
427: HUNK2 11\tHUNK4 12\tHUNK8 13\tHUNK16 14
428: HUNK32 15\tHUNK64 16\tHUNK128 17
429: .)b
430: The names given above are the C kernel internal names.
431: ATOM is symbol, INT is fixnum, DTPR is list cell,
432: DOUB is flonum, BCD is binary,
433: SDOT is bignum and all the HUNKn types are just known as hunk to
434: the user.
435: .sh +0 Stacks
436: .pp
437: .Fr
438: uses three stacks: the
439: .i C
440: .i runtime
441: stack, the
442: .i namestack
443: and the
444: .i bindstack .
445: The C runtime stack is the stack part of the address space
446: and the other two stacks are stored in the data space.
447: .sh +1 C\ runtime\ stack
448: The C-coded kernel uses this
449: stack in the same way as a typical C program
450: use the stack:
451: storing return addresses
452: and non-lisp-object arguments to subroutines,
453: saving registers,
454: and storing local variables within a function.
455: The lisp system stores
456: .i catch
457: .i frames
458: on this stack as well (to be described later).
459: .pp
460: The lisp system assumes that there are no lisp data on the stack and
461: thus the use of this stack
462: by a program is unrestricted.
463: As will be discussed later on, the
464: .b calls
465: and
466: .b ret
467: instructions are used for most subroutine calls and returns.
468: These instructions expect the stack to look a certain way.
469: .sh +0 Namestack
470: .pp
471: The namestack contains only lisp data.
472: It is used to pass arguments to functions and to hold
473: local (lexically scoped) data within lisp functions.
474: It is also used as a temporary storage spot for lisp data
475: which must be protected from garbage collection.
476: .pp
477: A slight digression on the garbage collector:
478: The person who writes code for the lisp system must always be aware of
479: his greatest enemy: the garbage collector.
480: Whenever a function is called that could possibly allocate more lisp
481: data, one must assume that it
482: when it attempts to allocate space, the garbage collector
483: will be invoked and that it will take away everything that isn't protected
484: from garbage collection.
485: The objects that are protected from garbage collection are: the
486: namestack, the bindstack, the oblist (table of interned symbols),
487: and the compiler literals. Objects that are only
488: referred to by values in registers or
489: the C runtime stack will not be seen by the mark phase of the
490: garbage collector and will be swept up during the sweep phase.
491: .pp
492: Back to the namestack:
493: The first free element on the namestack is pointed to by the
494: C variable
495: .i np .
496: This variable is always stored in the VAX register r6.
497: Another pointer into the namestack is the C variable
498: .i lbot .
499: It is always stored in VAX register r7.
500: Its use will be described in the section on calling conventions.
501: .sh +0 Bindstack
502: .pp
503: The bindstack is a stack of lisp object pairs: symbol and
504: value.
505: It is used to implement shallow binding.
506: When a symbol is lambda-bound, the symbol and its current value are put
507: on this stack.
508: Then the symbol can be given a new value.
509: When the context of the lambda binding is over, the symbol and value pair
510: are popped from the stack and the symbol is given its old value.
511: The C variable
512: .i bnp
513: points to the first free entry on the bindstack.
514: In the C code, the following macros lambda-bind a symbol to a value
515: and restore the old value:
516: .(b
517: #define PUSHDOWN(atom,value)\e
518: {bnp->atm=(atom);\e
519: bnp++->val=(atom)->a.clb;\e
520: (atom)->a.clb=value;\e
521: if(bnp>bnplim) binderr();}
522:
523: #define POP\e
524: {--bnp; bnp->atm->a.clb=bnp->val;}
525: .)b
526: .sh -1 Bitmap
527: .pp
528: The bitmap is used in garbage collection to hold
529: the mark bits for the mark and sweep garbage collector.
530: As its names implies, it is viewed as a collection of bits.
531: The minimum size of a lisp data object is 4 bytes, thus
532: there are 128 of those on a VAX page of 512 bytes.
533: Since there are 8 bits in a byte, it takes 16 bytes to obtain 128 bits.
534: Therefore the size of the bitmap in bytes is 16
535: times the maximum number of
536: pages.
537: Like the typetable, the bitmap keeps track of marks from the
538: bottom of memory, not the bottom of data space.
539: The bitmap and the typetable are static structures.
540: It is their size, which is determined when the C kernel is built,
541: which determines the size to which the data segment can grow.
542: .sh +0 Transfer\ Tables
543: .pp
544: Transfer tables are used by compiled lisp code as a transfer vector to
545: other functions.
546: A transfer table consists of pairs of entries: symbol and pointer to
547: function.
548: When a compiled lisp function calls another (non-local) function, it
549: calls indirectly through an entry in the transfer table.
550: Depending on the setting of certain status variables, the call may
551: bring control into a function linkage routine or directly to
552: the function desired (if that function is a compiled lisp or C function).
553: .pp
554: Transfer tables serve a number of useful purposes.
555: .np
556: They allow compiled code to call interpreted code
557: or compiled code using the same calling sequence.
558: .np
559: They allow the selection of which function to call to be determined
560: at runtime based on the name of the function.
561: In most other languages, this selection process is done at either
562: compile or load time.
563: .np
564: They allow the user to run in a debugging mode where all calls
565: from compiled code go through the interpreter.
566: Once control reaches the interpreter, it is easier to debug.
567: .np
568: They allow the user to run in a production mode, where all calls
569: from compiled to other compiled code are done without function
570: lookup overhead.
571: .np
572: They allow the user to switch between debugging and production modes
573: at runtime with one function call.
574: .pp
575: Transfer tables will be described further in the section on
576: the lisp compiler.
577: .sh +0 Catch\ frames
578: .pp
579: Lisp has a number of forms of non-local transfers.
580: Among them are
581: .i throw ,
582: .i error ,
583: .i return
584: and
585: .i go .
586: If a program is willing to accept a non-local transfer, it puts a
587: .i catch
588: .i frame
589: on the stack which describes which type of transfer it
590: accepts.
591: The catch frame describes the current state of the registers,
592: the variables np, lbot, and bnp, and also contains entries that describe
593: what kinds of non-local transfers the function will accept.
594: After creating a catch frame, the program goes on to evaluate forms.
595: Should the evaluation of one of those forms result in a non-local
596: transfer to the catch frame that this program has set up, the
597: system will restore the namestack and bindstack to the way
598: they were when the program put the catch frame on the stack (by using
599: np and bnp) and will return control to the program (setting
600: the variable retval to describe why it returned).
601: This is described further in the
602: section on interpreter conventions.
603: .pp
604: The C variable
605: .i errp
606: points to the most recent catch frame, and then each catch frame points
607: to the previous catch frame.
608: .sh +0 oblist
609: .pp
610: Normally when symbols are created they are
611: .i interned ,
612: that is they are put in a hash table called an oblist (or obarray).
613: The purpose of the oblist is to insure that if you type a
614: symbol's name to the reader, you will always get the same symbol.
615: Also it protects the symbol from garbage collection.
616: The oblist is simply a hash table with buckets, with a hash link being
617: part of the symbol data structure.
618: Currently the hash table is not accessible to a lisp program, but with
619: a little work it could be.
620: .+c Interpreter
621: .sh 0 _
622: .nr $1 \n(ch
623: .pp
624: The interpreter is composed of these parts:
625: .ip \fIcore:\fP
626: The functions eval, apply and funcall.
627: .ip \fIstack\ management:\fP
628: The code to create catch frames and handle non-local transfers.
629: .ip \fImemory\ management:\fP
630: The code to allocate and garbage collect memory.
631: .ip \fIthe\ functions:\fP
632: The user callable functions that expect lisp arguments and return
633: lisp values.
634: .pp
635: In the above list, the first three are written mainly in C (with a little
636: assembler) and the last is written mainly in Lisp with a little
637: bit in C
638: and a very small amount in assembler.
639: .pp
640: The core functions are the center of activity in the interpreter.
641: The
642: .i eval
643: function given a lisp expression to evaluate.
644: It must determine if it is a simple expression such as a symbol or number
645: whose value eval can determine immediately, or if it is an function
646: calling expression.
647: If the form is a function call, eval must determine if the arguments should
648: be evaluated and if so eval must recursively call itself to evaluate the
649: arguments and then stack them.
650: If the function called is to be interpreted, eval must lambda-bind the
651: formal variables of the function to the arguments just stacked.
652: If the function being called is compiled, eval just calls the function and
653: lets the function do the lambda binding if it wants to.
654: .pp
655: The
656: .i apply
657: function is passed two lisp objects:
658: a function to call (either a symbol or a functional object)
659: and a list of arguments already evaluated.
660: It will do lambda binding if the function to call is to
661: be interpreted.
662: .pp
663: The
664: .i funcall
665: function is passed any number of lisp objects,
666: the first of which is the function to call, and the rest are the
667: arguments to the function
668: which have already been evaluated and stacked.
669: Funcall will do lambda binding if the function to call is
670: to be interpreted.
671: .pp
672: When compiled lisp code calls a function
673: which must be interpreted, it enters the interpreter through the
674: funcall function.
675: The interpreter may go to compiled code through either eval, apply or
676: funcall, though most often it goes through eval.
677: .sh 2 Conventions
678: .pp
679: These are the conventions used in interpreted and compiled code.
680: .sh +1 C\ conventions
681: .pp
682: Our conventions are extensions of the C calling conventions, so we
683: describe here the conventions for C.
684: The VAX has 16 general
685: purpose registers.
686: Registers r12 through r15 are reserved
687: for use by the hardware because we use the
688: .i calls
689: subroutine call.
690: Registers r0-r5 can be used by a program without saving them first.
691: The result of a function call is returned in r0.
692: Registers r11-r6 are allocated (from high to low)
693: by the C compiler when a
694: .i register
695: declaration is made in the C code.
696: Registers r11-r6 must be
697: saved upon entry and restored upon exit if they are used within a function.
698: On the VAX it is very easy to preserve registers
699: since it is done automatically
700: by the hardware if the
701: .i calls
702: (or
703: .i callg )
704: instruction is used.
705: The first short word (two bytes) of a subroutine is a
706: register save mask which
707: tells which registers should be saved (on the C runtime stack) upon
708: entry and restored when a
709: .i ret
710: instruction is done.
711: .sh +0 np\ and\ lbot
712: .pp
713: Earlier we mentioned that the C variables np and lbot are
714: always stored in
715: registers r6 and r7.
716: It is very difficult to globally assign a variable to a register
717: in the C language (no external register declarations
718: are permitted)
719: and thus we have to filter
720: the output of the C compiler and convert all occurrences of 'np' to 'r6'
721: and all occurrences of 'lbot' to 'r7'.
722: This is only half the job, though.
723: We also must modify the register save masks for those routines which
724: alter the value of np or lbot to insure that those registers get
725: saved and restored upon function entry and exit.
726: This is done in the C code by putting
727: .(b C
728: Savestack(n)
729: .)b
730: at the beginning of a routine which will alter np or lbot and which
731: also allocates n register variables.
732: Also in that routine, before the function returns, we put
733: .(b C
734: Restorestack()
735: .)b
736: This is not really necessary in the VAX, but it is there for other
737: machine implementations which don't have a
738: .i ret
739: function which restores registers.
740: The calls to Savestack are recognized by a filter which processes
741: the output
742: of the C compiler and fixes up
743: the save masks.
744: .sh +0 Function\ calling
745: .pp
746: The following text describes what the conventions are for
747: calling
748: compiled lisp functions, whether they were written in lisp or C.
749: We look at it from the viewpoint of the called function.
750: .pp
751: Upon entry
752: to the compiled functon,
753: lbot points to the first argument and np points to the
754: first free spot on the namestack.
755: If np equals lbot then there are no arguments.
756: Recall that np will be in r6 and lbot in r7.
757: The function is free to alter registers r0 through r5 and should return
758: the result in r0.
759: The function may alter registers r6 through r11 as long as their
760: original values
761: are restored when the function returns.
762: The value of np should always
763: point to the first free entry in the namestack.
764: This is all that is required.
765: The extra conventions followed by
766: the lisp compiler
767: in the code it generates are described in the next chapter.
768: .sh +0 Catch\ frames
769: .pp
770: A catch frame saves the state of execution that a program
771: might want to return to at some later time.
772: A catch frame is stored on the C runtime stack, with the most recent
773: frame pointed to by the C global variable errp.
774: The information saved is
775: .ip \fIargs\fP
776: - one or two optional arguments. The lisp compiler always
777: stacks two
778: arguments since it must know exactly how large the frame is.
779: .ip \fIclass\fP
780: - the class describes what type of frame this is (described below).
781: .ip \fIretaddr\fP
782: - address to return to if returning from this frame
783: .ip \fIolderrp\fP
784: - pointer to next older catch frame on the stack
785: .ip \fIbnp\fP
786: - value of bnp (bindstack pointer) when the frame was created
787: .ip \fInp\fP
788: - value of np when the frame was created
789: .ip \fIlbot\fP
790: - value of lbot when the frame was created.
791: .ip \fIsystem\ dependent\fP
792: - the rest of the information stacked depends on the particular machine.
793: In the case of the VAX, registers r13 through r8 are stacked. (r14 and
794: r15 are the stack pointer and program counter; they are not saved since
795: they can be calculated from the other information).
796: .pp
797: The information in a catch frame is put on the C runtime stack
798: in the precise order given above, and the variable errp points not
799: at the beginning of the entire frame, but to the lbot entry.
800: Thus errp\ +\ 4 points to np.
801: The classes of frames are described below.
802: Each class is defined as a constant in the C code (h/frame.h) whose value
803: is a small integer.
804: .ip F_PROG\ [1]
805: - takes no arguments. This frame marks the beginning of a piece of
806: code which can accept a
807: .i return
808: or a
809: .i go .
810: The interpreter uses this to implement
811: .i prog
812: and
813: .i do
814: and for its primative break loop.
815: The lisp compiler does not use catch frames for progs since it only
816: permits
817: .i returns
818: and
819: .i gos
820: to occur within
821: .i progs
822: or
823: .i dos
824: and thus it can determine how to do the non-local transfer
825: at compile time.
826: .ip F_CATCH\ [2]
827: - this takes one or two arguments and is used to implement both
828: .i catch
829: and
830: .i errset .
831: In both cases
832: the first argument is the tag which is being caught,
833: which in the case of an
834: .i errset
835: is symbol ER%all.
836: An
837: .i errset
838: also
839: supplies a second argument which is non nil if the error message
840: should be printed.
841: .ip F_RESET\ [3]
842: - in the C kernel, the reset function
843: is implemented as a non local transfer to a F_RESET catchframe.
844: When the lisp system is built, the reset function is redefined to
845: do a
846: .i throw.
847: Thus this type of catch frame is rarely used.
848: .ip F_EVAL\ [4]
849: - this has one argument, the form being evaluated.
850: Since stacking this
851: on every eval would be expensive,
852: this type of catch frame is only stacked if a \fI(*rset\ t)\fP
853: has been done and if the value of the symbol
854: .i *rset
855: is non nil.
856: The form being evaluated is stacked so that
857: the necessary information for the
858: .i evalframe
859: function is available and so that
860: .i freturn
861: can return from the context of any pending evaluation.
862: .ip F_FUNCALL\ [5]
863: - this is much like F_EVAL,
864: except the one argument it takes is the
865: name of the function to call.
866: It has the same restrictions on when it is stacked as F_EVAL
867: and for the same reasons.
868: .pp
869: In C, a frame is pushed on the stack with Pushframe, with a call
870: of one of these forms:
871: .(b
872: errp = Pushframe(class);
873: errp = Pushframe(class,arg1);
874: errp = Pushframe(class,arg1,arg2);
875: .)b
876: After the call the C variable
877: .i retval
878: contains the value which describes why this function returned.
879: You must remember that it is possible for this one function call to
880: return more than once!
881: The reasons it can return are these (from h/frame.h):
882: .ip C_INITIAL\ [0]
883: This is the initial call to set up the frame.
884: .ip C_GO\ [1]
885: This will only happen for F_PROG frames.
886: In this case,
887: the C variable lispretval contains a lisp symbol which is the tag
888: to which control should be transferred.
889: If the tag cannot be found in this prog or do body, the
890: tag should be again thrown up the stack to a next higher prog or do.
891: .ip C_RET\ [2]
892: This will only happen for F_PROG frames.
893: In this case, lispretval contains the value to return from this prog.
894: .ip C_THROW\ [3]
895: This will only happen for F_CATCH frames.
896: In this case lispretval contains the value to return from this catch.
897: .ip C_RESET\ [4]
898: This will only happen for F_RESET frames.
899: It simply means that a reset is being done.
900: .ip C_FRETURN\ [5]
901: This will only happen for F_EVAL and F_FRETURN frames.
902: The variable lispretval contains the value to return from this
903: call to
904: .i eval
905: or
906: .i funcall .
907: .pp
908: The call to Pushframe is turned into a simple subroutine call (using
909: the
910: .i jsb
911: instruction instead of the
912: .i calls )
913: by the filters which alter the code coming out of the C compiler.
914: Thus it may be useful to see what stacking a catch frame really looks
915: like.
916: Here is what the lisp compiler generates
917: to stack the frame for \fI(*catch\ 'tag\ x)\fP
918: .(b
919: movl 0(r8),r0 #move 'tag to r0
920: pushl $0 # dummy argument
921: pushl r0 # put tag argument on C runtime stack
922: pushl $2 # push F_CATCH
923: jsb _qpushframe # call Pushframe
924: movl r0,_errp # move result into errp
925: .)b
926: .pp
927: Every function which does a Pushframe() must also do a Popframe()
928: before it returns to its caller.
929: This simply removes the frame from the stack.
930: In C it is written:
931: .br
932: .tl ''errp = Popframe()''
933: in the code generated by the lisp compiler it looks like:
934: .(b
935: movl _errp,r1 # put current errp in r1
936: movl 12(r1),_errp # put previous errp in errp
937: addl3 $32,r1,sp # pop frame from stack
938: .)b
939: .pp
940: Non-local transfers are done with the Inonlocalgo
941: function. This function always takes three arguments.
942: The
943: first is the return type (one of the types mentioned
944: above which begin with 'C_'). It will be assigned to retval.
945: The second argument is the value to be assigned to lispretval,
946: except in the case of a C_THROW, where the second argument
947: is the tag to throw to and the third argument is the value
948: to assign to lispretval.
949: This function never returns.
950: If it doesn't find a catch frame
951: which matches what it is looking for, it signals an error.
952: The function is called with
953: .i calls
954: and the arguments are stacked on the C runtime stack, not the
955: namestack.
956: .+c Liszt:\ The\ Lisp\ Compiler
957: .sh 0 _
958: .nr $1 \n(ch
959: .pp
960: The purpose of compiling a lisp function is to create a representation of
961: the function which can be evaluated in less time and perhaps take
962: up less space.
963: There are two approaches to lisp compilers.
964: One is to convert the functions to a compact form, often called
965: .i bytecodes
966: which can be rapidly interpreted.
967: Each bytecode represents a primitive operation in the lisp system.
968: This approach is simple to implement but there is an time penalty
969: in using an interpreter.
970: The other approach is to compiled to machine code.
971: In general, this is not as portable as the bytecode approach but the result
972: generally runs faster.
973: There are two ways of compiling to machine code.
974: One is to place arguments to functions in registers.
975: If there are more arguments than registers, the arguments are put on
976: a stack and a special type of call is made.
977: This method is used in Maclisp.
978: The other method is to assume a stack model, in which
979: arguments to a function are placed on a stack.
980: This is what the
981: .Fr
982: compiler Liszt does.
983: The stack model made it much easier to write parts of the lisp
984: system in the C langauge.
985: It also simplifies the garbage collector since the mark phase doesn't
986: have to locate and peruse all saved registers looking for lisp data.
987: .sh 2 File\ Compiler
988: .pp
989: When a file of lisp expressions is loaded,
990: the
991: .i load
992: function repeatedly reads and evaluates the forms in the
993: file.
994: Some of these evaluations may result in functions being defined,
995: and others may use the functions just defined or previously defined.
996: The job of the lisp compiler is to create an object file, which,
997: when read in
998: by
999: .i fasl,
1000: acts just like it was
1001: .i load ed
1002: in, except when a function is defined it is defined in machine
1003: code, not as a lisp expression.
1004: This is quite a bit different from what compilers do in other languages
1005: and it is done this way to make it easier to
1006: switch between compiled and
1007: interpreted code.
1008: .sh +0 Differences\ between\ compiled\ and\ interpreted
1009: .pp
1010: There are some differences, though, between compiled and interpreted code.
1011: Local variables in compiled code are lexically scoped (put on the
1012: namestack and inaccessible to other functions) unless the variable
1013: has been declared
1014: .i special.
1015: The declaration:
1016: .(b
1017: \fI(declare (special x y))\fP
1018: .)b
1019: declares both x and y to be special from this point in the file on.
1020: The declaration
1021: .(b
1022: \fI(declare (specials t))\fP
1023: .)b
1024: declares all variables to be special.
1025: .pp
1026: Lisp has a very powerful macro definition system.
1027: The compiler will macro expand all it can, whereas the interpreter
1028: expands macros when necessary but never replaces a macro call with
1029: its expansion.
1030: Thus if you redefine a macro, the compiled code that uses it will
1031: still work the same way, but the interpreted code will use the
1032: new definition.
1033: Also, when compiling a file, macro definitions must be
1034: done before any call on the macro is encountered during compiling.
1035: In the interpreter,
1036: macros can be defined or redefined anytime up until
1037: they are used.
1038: Thus the interpreter is far freer about macro definitions than
1039: the compiler.
1040: This could cause programs which work interpreted to
1041: fail compiled.
1042: .sh +0 Top\ level\ algorithm
1043: .pp
1044: The top level algorithm of the compiler is simply this:
1045: .np
1046: read a lisp expression from the input file
1047: .np
1048: macro expand the top level of the form as much as possible
1049: .np
1050: if the form is a function definition (a list beginning with
1051: the symbol 'def')
1052: then compile it.
1053: .np
1054: if the form is not a function definition then put it on a list of
1055: forms that will be evaluated when the file is
1056: .i fasl ed
1057: in.
1058: .np
1059: if not at end of file, go to step 1.
1060: .pp
1061: In reality, step 3 is a bit more complex.
1062: If the definition is of a macro, then the form will be evaluated
1063: immediately, thus adding the macro definition to the compiler's
1064: environment.
1065: If the lisp variable
1066: .i macros
1067: is t then the macro will also be compiled.
1068: There are also some forms like \fI(progn\ 'compile\ ...)\fP
1069: and \fI(eval-when\ ()\ )\fP which determine when the
1070: forms get compiled and evaluated.
1071: See the lisp manual for details.
1072: .sh +0 Expression\ Compilation
1073: .pp
1074: Just as the interpreter is centered around the
1075: .i eval
1076: function, the lisp compiler is centered around the function
1077: .i d-exp
1078: whose job it is to compile the expression passed to it.
1079: The lisp compiler is one pass.
1080: Before a call to d-exp returns,
1081: all the compiled code for a form has been computed and written
1082: to a file.
1083: One reason that the lisp compiler can be a single pass compiler
1084: is that the assembler which reads the compiler's output is
1085: a two pass assembler.
1086: .sh +1 global\ state\ variables
1087: .pp
1088: There are a number of variables that maintain the state of
1089: the compilation process.
1090: These variables are declared special and are thus dynamically scoped
1091: and visible to any function within the compiler.
1092: When d-exp
1093: is called their meanings are:
1094: .ip \fIv-form\fP
1095: - contains the expression to be compiled.
1096: .ip \fIg-loc\fP
1097: - tells where the result of evaluating this expression should be put.
1098: If g-loc is nil, then the value returned is unimportant and shouldn't
1099: be put anywhere.
1100: .ip \fIg-cc\fP
1101: - controls conditional branches.
1102: If g-cc is non nil, then it is a list cell whose value has
1103: either a non-null car or non-null cdr but not both.
1104: If the car is non-nil then
1105: if the value of the expression held in
1106: v-form is non-nil, a branch should be done to the symbol
1107: stored in \fI(car\ g-cc)\fP.
1108: If the cdr is non-nil then if the value of v-form
1109: is nil, a branch should be done to the symbol stored in
1110: \fI(cdr\ g-cc)\fP. Since at
1111: every conditional
1112: branch control should either jump or continue, the car and cdr will
1113: never both be specified.
1114: .ip \fIg-ret\fP
1115: - is non nil if the result of evaluating v-form will be returned as the
1116: value of the function we are evaluating.
1117: This is used to allow liszt to detect
1118: when tail recursion removal is possible.
1119: .ip \fIg-locs\fP
1120: - maintains current information about the state of the stacks:
1121: the bindstack (for specials), the namestack (for locals) and the
1122: C runtime stack (for catch frames)
1123: The form of g-locs is a stack of tokens stored as a list.
1124: The meaning of each token is as follows:
1125: .br
1126: \fInil\fP - this represents an unnamed object on the namestack. This happens
1127: when calling functions, when the arguments are stacked prior
1128: to a function call.
1129: .br
1130: \fI<symbol>\fP - the given symbol is the name of a local variable on the namestack.
1131: .br
1132: \fI(prog . <n>)\fP - prog statement which binds <n> special variables
1133: .br
1134: \fI(do . <n>)\fP - head of a do statement which binds <n> special variables
1135: .br
1136: \fI(catcherrset . 0)\fP - catch or errset frame on stack
1137: .br
1138: \fI(lambda . <n>)\fP - lambda expression which binds <n> special variables
1139: .ip \fIg-labs\fP
1140: - this is a stack of labels.
1141: There is one entry in g-labs for every entry which is a list
1142: in g-locs.
1143: the forms of the entries are:
1144: .br
1145: \fInil\fP - no labels in this form
1146: .br
1147: \fI(endlab (real . gen) (real2 . gen2) ...)\fP - endlab is the label to go to
1148: to get out of this body. After this label will be code
1149: to unbind specials and pop off locals.
1150: The 'real' labels are those found in the code. the gen
1151: labels are those generated and put in the assembler code.
1152: .sh +0 Function\ compilation
1153: .pp
1154: The action d-exp takes when compiling an expression depends on the
1155: type of expression.
1156: For atoms (symbols and numbers) the compilation is very simple, it
1157: is just a matter of putting the value where g-loc specifies and
1158: then jumping if specified by g-cc.
1159: When the expression is a list, d-exp first macro
1160: expands the form and then looks at the first element of
1161: the list (if the list has not macro expanded to an atom).
1162: If the first element is a symbol then the list is
1163: is a function call.
1164: If the function is one of the functions which liszt knows how to open compile
1165: then liszt will call the particular routine designated to open
1166: compile this function.
1167: There are two classes of functions within liszt that
1168: do open compiling.
1169: The first class, the fl-expr class, are distinguished by names which
1170: begin with c-.
1171: These functions always generate
1172: code which returns the result in r0.
1173: They are not equipped to obey the contents of g-loc and g-cc.
1174: Thus d-exp, after calling one of these functions, must do what
1175: is necessary to obey g-loc and g-cc.
1176: The other class of open compiling functions, the fl-exprcc class (whose
1177: names begin with cc-),
1178: handle g-loc and g-cc.
1179: As a result these are usually much more complex and generate better code.
1180: .sh -1 Register\ Conventions
1181: .pp
1182: The register conventions used by liszt are an extension of those
1183: used by the C code.
1184: .ip \fIr0\fP
1185: - return value placed here
1186: .ip \fIr1,r2,r3,r4\fP
1187: - scratch registers.
1188: When long strings of
1189: .i car's
1190: or
1191: .i cdr's
1192: are done (such as
1193: .i caddadr )
1194: these registers are used in a least-recently-used fashion to access down
1195: the list.
1196: The compiler keeps track of the values in these registers but must
1197: assume that they are garbage after a function is called.
1198: .ip \fIr5\fP
1199: - fixnum accumulator.
1200: There a number of functions which work on fixnum's only and these
1201: usually put their result in r5.
1202: The assembler code function
1203: .i qnewint
1204: which returns a pointer to a cell containing a fixnum value (after
1205: checking if it is in the fixnum table), expects its argument to be
1206: in r5.
1207: .ip \fIr6\fP
1208: np
1209: .ip \fIr7\fP
1210: lbot.
1211: When calling a function, this register is set just before the function
1212: call, and after the function call its value is used to reset the value
1213: of np in order to pop the arguments off the namestack.
1214: .ip \fIr8\fP
1215: the literal table pointer.
1216: The compiler generates a table of all the literal lisp data
1217: which the compiled code might access.
1218: Upon function entry a pointer to the base of this table is put in r8.
1219: For example, if we compile \fI(setq\ x\ 'foo)\fP then we will
1220: need a pointer to the lisp symbol
1221: .i foo
1222: and if the symbol
1223: .i x
1224: as been declared special we will also need a pointer to
1225: .i x .
1226: .ip \fIr10\fP
1227: - upon function entry the value of lbot (r7) is put in r10.
1228: This allows us to reference the arguments to our function while
1229: using lbot to call other function.
1230: .sh +0 Addresses
1231: There are four types of addresses used internally in Franz:
1232: symbolic, intermediate addresses (iadr), extended intermediate (eiadr)
1233: and vax assembler format.
1234: .sh +1 Symbolic
1235: .pp
1236: These are the names of lisp objects, such as `a', `foo', 34,
1237: or 3.234234.
1238: A name is either a local variable, a special variable or a
1239: number. A number is either a small fixnum, large fixnum,
1240: bignum or floating point number.
1241: .sh +0 Intermediate\ address\ (IADR)
1242: .pp
1243: This type of address is generated from a symbolic address by
1244: .i d-loc,
1245: .i d-loclit
1246: and the routines
1247: .i d-simple
1248: and
1249: .i d-rsimple
1250: which
1251: call them. The forms are
1252: .ip \fINil\fP
1253: - the location or value of nil.
1254: .ip \fIT\fP
1255: - the location or value of t.
1256: .ip \fIreg\fP
1257: - register r0, which is where the result of function calls
1258: are stored.
1259: .ip \fIstack\fP
1260: - the address pointed to by np with np incremented after
1261: the value is stored. (i.e (r6)+)
1262: .ip \fIunstack\fP
1263: - the address one word below np (np is decremented before
1264: accessing. (i.e. -(r6))
1265: .ip \fI<symbol>\fP
1266: - this is just <symbol>. This allows strange forms to be
1267: represented directly.
1268: .ip \fI(stack\ n)\fP
1269: - The n'th value on the namestack for this function.
1270: The first value 0(r10) is (stack 1).
1271: .ip \fI(vstack\ n)\fP
1272: - The cdr of the n'th value on the namestack.
1273: That is, (stack 1) is *0(r10)
1274: .ip \fI(bind\ n)\fP
1275: - The value of the n'th value in
1276: the literal table. If
1277: this refers to a symbol, then this is the value
1278: of the symbol.
1279: If this refers to a list then this
1280: this is the cdr of the list (although this is a rare
1281: use). (bind 1) corresponds to *0(r8).
1282: .ip \fI(lbind\ n)\fP
1283: - The n'th value in the literal table. If
1284: this refers to
1285: object foo then this is the address of foo
1286: in memory.
1287: .ip \fI(fixnum\ n)\fP
1288: - This is the address of small fixnum n in memory.
1289: A small fixnum is in the range -1024 to 1023.
1290: .ip \fI(immed\ n)\fP
1291: - The is the immediate constant n.
1292: .sh +0 extended\ intermediate\ address\ (EIADR)
1293: .pp
1294: This address is generated from an IADR by e-cvt. It
1295: symbolically represents a vax address.
1296: .ip \fI<atom>\fP
1297: - This corresponds to <atom>.
1298: .ip \fI(n\ <atom>)\fP
1299: - This corresponds to n(<atom>)
1300: (stack n+1) and (lbind n+1) are converted to these forms.
1301: .ip \fI(n\ <atom1>\ <atom2>)\fP
1302: - This corresponds to n(<atom1>)[<atom2>]
1303: There is currently no IADR which generates this.
1304: .ip \fI(*\ n\ <atom>)\fP
1305: -This corresponds to *n(<atom>)
1306: (vstack n+1) and (bind n+1) are converted to these forms.
1307: .ip \fI(+\ <atom>)\fP
1308: - This corresponds to (<atom>)+.
1309: stack is converted to this form.
1310: .ip \fI(-\ <atom>)\fP
1311: - This corresponds to -(<atom>)
1312: unstack is converted to this form.
1313: .ip \fI($\ <numb>)\fP
1314: - This corresponds to $<atom>
1315: (immed <numb>) is converted to this form.
1316: .ip \fI(#\ <numb>)\fP
1317: - This corresponds to $<loc> where <loc> equals
1318: the base of the fixnums (0x1400) plus 4 * <numb>
1319: (fixnum <numb>) is converted to this form
1320: .ip \fI(*\ #\ <numb>)\fP
1321: - This corresponds to $<numb>. It is generated
1322: by d-rsimple occasionally when you ask for the
1323: cdr of a number (which you do sometimes when you
1324: are compiling fixnum operators).
1325: .sh +0 Vax\ Unix\ assembler\ addresses
1326: .pp
1327: The addresses are printed from a EIADR by e-cvtas. The conversions
1328: are shown in the above section.
1329: .sh -1 Function\ calling\ convention
1330: .sh +1 Function\ linkages
1331: .pp
1332: The function associated with a symbol is stored in the function
1333: definition slot of the symbol. If the function slot contains a
1334: list then the function is to be interpreted and its 'car' will
1335: be lambda, nlambda, lexpr or macro. If the function is compiled then
1336: the function definition slot will contain a binary object.
1337: A binary object
1338: is composed of two independent parts: the discipline and the entry.
1339: The discipline is one of:
1340: .ip \fIlambda\fP
1341: - a function whose arguments should be evaluated.
1342: .ip \fInlambda\fP
1343: - a function whose arguments should not be evaluated but
1344: which should be passed as a list
1345: .ip \fImacro\fP
1346: - a function which should be passed the unevaluated form
1347: being evaluated and whose result should be evaluated.
1348: .ip \fI\"subroutine\"\fP
1349: - a foreign function subroutine
1350: .ip \fI\"integer-function\"\fP
1351: - a foreign function returning an integer
1352: .ip \fI\"real-function"\fP
1353: - a foreign function returning a flonum.
1354: .pp
1355: A lexpr is not in the list as there is no difference to the caller
1356: between compiled lambda's and compiled lexprs.
1357: .sh +0 Transfer\ tables
1358: Most calls from compiled code to other lisp functions go through
1359: transfer tables. A transfer table entry is a pair: symbol, address
1360: of routine.
1361: When another lisp function is called it uses the
1362: .i calls
1363: instruction which will
1364: indirectly jump through the address in the transfer table. This
1365: address may point to the desired function or it may point to the
1366: interface routine. If a call ends up in the interface routine,
1367: then that routine will trace back through the call stack and eventually
1368: reach the tranfer table entry that it was 'called through'. Since the
1369: transfer table entry contains a symbol which names the function
1370: to be called, the interface routine can determine which routine
1371: was to have been called. If that routine is compiled then the
1372: interface routine can modify the transfer table so that a call
1373: through that table entry will go directly to the desired function.
1374: If the routine to be called is interpreted, or if the user has
1375: requested that transfer linkages should be disabled, then the
1376: interface routine will go through the 'funcall' function
1377: in the interpreter to continue the call.
1378: .sh +0 calling\ sequence\ in\ the\ compiler:
1379: .pp
1380: When transfer tables are used
1381: .(b
1382: \fBforeach\fP arg
1383: \fBcompute\fP arg and stack result using (r6)+
1384: for example: movl r0,(r6)+
1385: movab -n(r6),r7 where n = 4*number of args to fcn
1386: calls $0,*trantb+m where m is the index of the function
1387: in the transfer table.
1388: movl r7,r6 restore r6
1389: .)b
1390: .pp
1391: The compiler supports local functions, which are function accessible
1392: only within one file.
1393: Because they are not accessible from C code, we can use a very fast
1394: call and return sequence when calling them.
1395: To call a local function
1396: .(b
1397: \fBforeach\fP arg
1398: \fBcompute\fP and stack using (r6)+
1399: jsb LOCALNAME go directly to the function, it will
1400: restore r6 before it returns.
1401: .)b
1402: .pp
1403: The beginning of each function looks as follows.
1404: First for a non-lexpr function called in the standard way
1405: (topsim is the label jumped to when tail merging, it will be unique
1406: for each function; the brackets indicate the optional code which
1407: exists if the -p switch is given to liszt);
1408: .(b
1409: .word 0x5c0 # save r6, r7, r8, r10
1410: [ movab mcounts,r0 # if profiling, mcounts replaced by fasl
1411: jsb mcount ]
1412: movab linker,r8 # set up r8
1413: movl r7,r10 # set up oldlbot
1414: movab n(r10),r6 # n = 4*Number of args expected.
1415: topsim:
1416: .)b
1417: .pp
1418: Now for lexprs:
1419: .(b
1420: .word 0x5c0
1421: [ movab mcounts,r0 # if profiling. [mcounts replaced by fasl]
1422: jsb mcount ]
1423: movab linker,r8 # set up r8
1424: subl3 $4,r7,-(sp) # address one word below bottom of args
1425: topsim:
1426: movl r6,r10 # first free addr to arg base
1427: subl3 r7,r6,r0 # number of args * 4 into r0
1428: movab 0x1400(r0),(r6)+ # stack boxed number of args
1429: movl 0(r10),-(sp) # also store on stack so user can't clobber
1430: .)b
1431: .pp
1432: And finally for local functions:
1433: .(b
1434: movl r10,-(sp) # save r10
1435: movab -n(r6),r10 # set up arg base using arg top
1436: topsim:
1437: .)b
1438: .sh -1 Assembler\ file\ format
1439: .pp
1440: The liszt compiler generates a file which is in Unix assembler format.
1441: The Unix assembler converts that file into an object file which fasl
1442: then reads.
1443: Although the object file generated is a standard Unix object
1444: file (as defined in /usr/include/a.out.h), it is not of a
1445: format that the Unix ld loader can understand.
1446: This is because the requirements of a lisp object file are different
1447: from an object file of other languages.
1448: The
1449: run time semantics of lisp and the fact that lisp data must be protected
1450: from garbage collection are the principal differences.
1451: The unconventional object file created by the unix assembler
1452: is a result of the
1453: unconventional assembler input file.
1454: Next we will look at what must be put in the assembler file and how
1455: it is put there.
1456: .pp
1457: The assembler file must contain
1458: .ip \fIinstructions\fP
1459: - vax assembler code for the compiled functions.
1460: If there aren't any functions compiled, this can be empty.
1461: .ip \fIliterals\fP
1462: - lisp data which is referred to by compiled code
1463: .ip \fItransfer\ table\fP
1464: - the names of the functions which correspond to the calls through
1465: the transfer table.
1466: The instructions simply say 'call indirect through the nth transfer
1467: table entry'.
1468: .ip \fIfunction\ names\fP
1469: - the names of the functions which are being
1470: defined.
1471: .ip \fIload\ time\ forms\fP
1472: - in order to mimic the
1473: .i load
1474: function, fasl has to be able to evaluate lisp expressions at fasl
1475: time, so we must be able to store lisp expressions in the assembler
1476: file and indicate when they should be evaluated.
1477: .pp
1478: Based on the requirements above, the form of the assembler file
1479: is as described below.
1480: The assembler builds two segments: text and data.
1481: We put all of our information in the text segment.
1482: The compiler places some annotation strings in the data segment
1483: so that the object file can be identified, however the data segment
1484: is ignored by fasl.
1485: The format is
1486: .ip \fIcompiled\ instructions\fP
1487: The instructions for each compiled (non-local) function begins with
1488: .(b
1489: .globl F00003
1490: F00003:
1491: .word 0x5c0
1492: .)b
1493: The globl declaration and the fact that the symbol name begins
1494: with a F will tell fasl that this is the beginning of a lisp
1495: function.
1496: The symbols beginning with F must be unique within a file
1497: but may be duplicated in other files.
1498: The lisp name of the function will appear later.
1499: Next the instructions for the function are given.
1500: Only a small fixed set of external symbols may be referenced.
1501: The list is found in the code for nfasl.c and the common
1502: ones will be described below (soon).
1503: Labels should be given a name beginning with L and should
1504: be unique within the file.
1505: .ip \fItable\ sizes\fP
1506: somewhere in the file there should be a pair of 'set' assembler
1507: pseudo ops which describe the sizes of the literal table and
1508: transfer table.
1509: The form is this
1510: .(b
1511: .set linker_size 4
1512: .set trans_size 3
1513: .)b
1514: where linker_size is the number of entries in the literal table
1515: which will be required and trans_size is the number of entries
1516: in the transfer table which will be required.
1517: Those tables will be built by fasl.
1518: .ip \fIbinder\ table\fP
1519: - this table describes the order that the functions should be
1520: defined and forms evaluated.
1521: It is a table of longwords beginning at the symbol bind_org
1522: and ending when a -1 entry is seen.
1523: The meaning of the entries will be described below.
1524: .ip \fIlisp\ expression\ table\fP
1525: - this is a table of null terminated strings beginning at the
1526: symbol lit_org and ending at the symbol lit_end.
1527: Each string is read by the lisp read function (using the
1528: raw readtable).
1529: The first linker_size expressions are put in the literal table.
1530: The next trans_size expressions are the names of the
1531: functions to put in the transfer table.
1532: The remaining expressions correspond to the entries in the
1533: binder table.
1534: The binder entries are processed, one by one.
1535: Provided that the binder entry is not -1, an expression
1536: is read from the lisp expression table.
1537: Then if the binder table entry is 0, that expression is the name of
1538: a lambda type lisp function.
1539: A binary object is created, the discipline is set to lambda and
1540: the function address is set to the lexically next function
1541: defined in the
1542: file.
1543: If the binder entry is 1 then this is an nlambda function, and
1544: if the entry is 2 then this is a macro function.
1545: If the entry is 99 then the expression just read should be evaluated
1546: by the lisp function eval.
1547: .pp
1548: The lisp compiler normally puts the assembler format output file in /tmp
1549: and removes it when it is done.
1550: The -S switch will tell liszt to write the
1551: assembler file in the same place as
1552: the source file, and with the same root name but a .s extension.
1553: The -C file will tell the lisp compiler to comment the file as it
1554: generates it, making it easier to understand what is going on.
1555: Assume that the following is file foo.l:
1556: .(b
1557: (defun foo (x) (bar y) (baz k))
1558: (print (foo 3))
1559: (def test (nlambda (l) (print 'hithere) (foo 3)))
1560: .)b
1561: we now compile it with -SC
1562: .(b
1563: .globl F00007 #(fcn lambda foo)
1564: F00007:
1565: .word 0x5c0
1566: movab linker,r8
1567: movl r7,r10
1568: movab 4(r10),r6
1569: L00008:
1570: movl *0(r8),(r6)+ #(calling bar)
1571: #(from y to stack)
1572: movab -4(r6),r7
1573: calls $0,*trantb+0
1574: movl r7,r6
1575: movl *4(r8),(r6)+ #(calling baz)
1576: #(from k to stack)
1577: movab -4(r6),r7
1578: calls $0,*trantb+8
1579: movl r7,r6
1580: ret
1581: .globl F00009 #(fcn nlambda test)
1582: F00009:
1583: .word 0x5c0
1584: movab linker,r8
1585: movl r7,r10
1586: movab 4(r10),r6
1587: L00010:
1588: movl 8(r8),(r6)+ #(calling print)
1589: #(from 'hithere to stack)
1590: movab -4(r6),r7
1591: calls $0,*trantb+16
1592: movl r7,r6
1593: movl $5132,(r6)+ #(calling foo)
1594: #(from (fixnum 3) to stack)
1595: movab -4(r6),r7
1596: calls $0,*trantb+24
1597: movl r7,r6
1598: ret
1599: bind_org:
1600: .set linker_size, 3
1601: .set trans_size, 4
1602: .long 0
1603: .long 99
1604: .long 1
1605: .long -1
1606: lit_org:
1607: .asciz "y"
1608: .asciz "k"
1609: .asciz "hithere"
1610: .asciz "bar"
1611: .asciz "baz"
1612: .asciz "print"
1613: .asciz "foo"
1614: .asciz "foo"
1615: .asciz "(print (foo 3))"
1616: .asciz "test"
1617: lit_end:
1618: .data # this is just for documentation
1619: .asciz "@(#)Compiled by Lisp Compiler 7.1 on Sun Feb 21 17:51:54 1982"
1620: .asciz "@(#)decl.l 1.5 2/10/82"
1621: .asciz "@(#)array.l 1.1 9/25/81"
1622: .asciz "@(#)datab.l 1.1 9/25/81"
1623: .asciz "@(#)expr.l 1.1 9/25/81"
1624: .asciz "@(#)io.l 1.1 9/25/81"
1625: .asciz "@(#)funa.l 1.3 2/10/82"
1626: .asciz "@(#)funb.l 1.2 2/10/82"
1627: .asciz "@(#)func.l 1.2 2/10/82"
1628: .asciz "@(#)tlev.l 1.4 10/24/81"
1629: .asciz "@(#)fixnum.l 1.6 10/21/81"
1630: .asciz "@(#)util.l 1.2 10/7/81"
1631: .)b
1632: .sh +0 functions\ callable\ from\ compiled\ lisp\ code
1633: .sh +0 Object\ File\ Format
1634: .sh +0 Fasl
1635:
1636:
1637:
1638:
1639:
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.