|
|
1.1 root 1: \documentstyle[11pt]{article}
2: \bibliographystyle{plain} %changed from alpha
3: \documentstyle{openbib}
4: \makeindex
5: \oddsidemargin .5in % Left margin on odd-numbered pages.
6: \evensidemargin .5in % Left margin on even-numbered pages.
7: \topmargin .5in % Nominal distance from top of page to top of
8: \headheight 0pt % Height of box containing running head.
9: \headsep 0pt % Space between running head and text.
10: % Bottom of page:
11: \footheight 12pt % Height of box containing running foot.
12: \footskip 30pt % Distance from baseline of box containing
13: % foot to baseline of last line of text.
14: \textwidth 5.5in % width of text on page
15:
16: %\newcommand{\mysection}[1]{\section{\protect\large\bf #1}}
17: %\setcounter{secnumdepth}{0}
18:
19: \sloppy
20:
21: \begin{document}
22: \title{A Runtime System \\ (DRAFT)}
23: \author{Andrew W. Appel\thanks{
24: Supported in part by NSF Grants
25: DCR-8603453 and CCR-880-6121}
26: \\ Princeton University}
27: \date{February 1989}
28: \maketitle
29: \begin{abstract}
30: The runtime data structures of the Standard ML of New Jersey compiler
31: are simple yet general. As a result, code generators are easy to implement,
32: programs execute quickly, garbage collectors are easy to implement and
33: work efficiently, and a variety of runtime facilities can be provided with
34: ease.
35: \end{abstract}
36:
37: \section{Introduction}
38: \label{introduction}
39:
40: Some languages, like Lisp, Smalltalk, ML, Prolog, etc. rely heavily on a
41: {\em runtime system} to provide essential services. The most important
42: such service is the management of dynamically-allocated storage
43: (e.g. {\bf garbage collection}), but the runtime system may also provide
44: facilities like
45:
46: \begin{itemize}
47: \item {\bf Stream input/output:} on operating systems (like Unix) that do not
48: have a buffered input/output facility, the process must provide its own;
49: this might be handled in the runtime system.
50: \item {\bf Structured input/output:} the ability to automatically
51: write a large linked data structure to a file, and read it back in
52: with all links adjusted, is a great convenience that can be implemented
53: efficiently in the runtime system.
54: \item {\bf Process suspension:} a snapshot of an
55: executing process may be ``preserved'' in a file, so that
56: the execution of that file causes a new process to start exactly
57: where the saved one left off.
58: \item {\bf Operating system calls:} operating system services needed by
59: a program may be conveniently packaged by the runtime system.
60: \item {\bf Handling of interrupts and asynchronous events:} if the programming
61: language has a mechanism to handle asynchronous events, it relies on the
62: runtime system for its implementation.
63: \item {\bf Handling of arithmetic exceptions:} the programming language's
64: exception-handling mechanism must be implemented in cooperation with
65: the runtime system.
66: \item {\bf Assembly-language implementation of language primitives:}
67: it may be inconvenient for the compiler to generate code for some features
68: of the programming language; these functions can be implemented as calls
69: to runtime system routines.
70: \item {\bf Foreign-language procedure calls:} calls to subroutines written
71: in other languages may be mediated by the runtime system.
72: \item {\bf Fun with continuations:} Languages with features like {\em call with
73: current continuation}, which allows the explicit manipulation of threads
74: of control, require runtime-system cooperation.
75: \item {\bf Execution profiling:} Automatic measurement of the time spent in
76: different parts of the user program can be accomplished with the help of
77: the runtime system.
78: \item {\bf Debugging:} starting, stopping, and displaying the execution state
79: of user programs can be accomplished only by low-level routines.
80: \end{itemize}
81:
82: Since this list is rather long, and several of these features may interact,
83: it is evident that runtime systems can become nasty and complicated.
84: The proliferation of data types may make the implementation of
85: the garbage collector (and other programs that must traverse the data)
86: inefficient.
87:
88: We particularly wanted a runtime data layout that provided fast
89: allocation of records and fast garbage collection, since ML makes very
90: heavy use of dynamically-allocated storage. To this end, we eliminated
91: the runtime stack and use a very simple data format.
92: The only runtime data types are {\em integers}, {\em pointers},
93: {\em records}, and {\em strings}.
94: Then we found that this very simple structure allowed the easy implementation
95: of many of the services described above.
96:
97: Our runtime system has relatively few ML-specific
98: features; it could be used equally well for other languages. This paper
99: describes the design and implementation of the SML-NJ runtime system.
100: It should be read in conjunction with an earlier paper on the
101: SML-NJ generational garbage collector\cite{appel89:sggc}.
102:
103: \begin{quotation}{
104: \small Text in smaller type is of interest only to those who might
105: have to read, modify, or maintain the source code to the runtime system.
106: }\end{quotation}
107:
108: \section{Standard ML of New Jersey}
109: \label{standard}
110: The ML language originated as part of the Edinburgh LCF proof system
111: \cite{gordon78}, and was soon implemented as a stand-alone compiler
112: \cite{cardelli83:ml}\cite{cardelli84}. The language was
113: ``standardized'' \cite{milner84}\cite{macqueen84}
114: \cite{milner85}\cite{harper88}, and
115: Standard ML has been implemented at Edinburgh, Cambridge, and
116: New Jersey\cite{appel87:sml}; the New Jersey implementation is
117: a joint effort between researchers at AT\&T Bell Laboratories and
118: Princeton University.
119:
120: Though ML was first implemented as the meta-language of a theorem-proving
121: system, Standard ML is a general-purpose
122: programming language with several advantages over more conventional
123: languages. Its important characteristics are:
124:
125: \begin{itemize}
126: \item {\bf Automatic garbage collection:} this is a great convenience in
127: writing correct and readable programs.
128: \item {\bf Static, polymorphic types:} like Pascal, types are checked
129: at compile-time and not at runtime; but like Lisp, there is great flexibility
130: and re-usability of code.
131: \item {\bf Safety:} there are no runtime insecurities (i.e. ``core is never
132: dumped''); this is unlike the C language, where unsafe pointers run
133: rampant, and like Lisp (except when Lisp programmers turn off the runtime
134: type checking for efficiency). In ML, there is no run-time type checking,
135: but safety is guarateed by compile-time type checking.
136: \item {\bf Higher-order functions:} like Scheme (and lambda-calculus); this
137: can lead to a desirable conciseness of expression.
138: \item {\bf Typechecked Modules:} like Ada and Modula.
139: \item {\bf Exception handling:} a dynamically-scoped exception
140: mechanism allows both hardware- and software-generated exceptions
141: to be caught and handled by user programs.
142: \end{itemize}
143:
144: One important feature of a systems programming language
145: is not prescribed by the list above, and in fact cannot be specified
146: by a language definition: an efficient and robust implementation.
147: Standard ML of New Jersey\cite{appel87:sml} is meant to be a complete,
148: efficient, robust, and cleanly written compiler for the language.
149: It has an optimizing code generator based on continuation-passing style
150: \cite{appel89:cps}.
151:
152: A knowledge of ML is not necessary to understand its runtime system.
153:
154: \section{Tagging schemes}
155: \label{tagging}
156:
157: Almost all the pieces of the runtime system must deal with the data structures
158: of the executing program. Therefore, it is helpful to keep the format of
159: this data as simple and straightforward as possible.
160:
161: ML, like Lisp, allows polymorphic functions: a function that reverses a list
162: of objects (for example) need not know the type of the objects. In order that
163: the same piece of executable code can operate on objects of arbitrary type,
164: it is necessary that every object be represented in the same amount of space.
165: As in Lisp, this is achieved by making everything be the size of a pointer;
166: if an object's natural representation is larger (as for a record of $n$
167: objects), then it is represented by a pointer to storage on the heap.
168:
169: The garbage collector must be able to determine the size and layout of each
170: object it traverses. This can be accomplished in several ways:
171: \begin{itemize}
172: \item {\bf By encoding a type-tag\index{tag}
173: inside each pointer;} we chose not to do this
174: because it reduces the number of bits left for actual addressing.
175: In these days of 32-bit pointers and 100-megabyte memories, it is easy
176: to imagine the need for every bit of addressability we can muster.
177: \item {\bf By reserving different areas of memory for objects of different
178: types.}
179: BIBOP\index{BIBOP} (Big Bag Of Pages) schemes require that each page hold objects of a single
180: type, and a global table maps pages to types. This is relatively efficient,
181: and doesn't reduce the number of addressable words. But it complicates the
182: process of allocating and copying objects, since several free regions
183: are simultaneously required.
184: \item {\bf Statically-typed languages don't require any runtime-tagging} at all;
185: instead, the compiler can tell the garbage collectector about the type
186: system of the program. This has worked well in Pascal\cite{britton75}.
187: Even though it is also theoretically possible in ML, in practice the
188: polymorphic type system introduces overhead and complexity that make
189: this method unattractive\cite{appel89:tag}.
190: \item {\bf Each record can have a descriptor at the beginning} that explains
191: which fields are integers and which are pointers. This method works
192: well in non-polymorphic languages like Modula-2+ and Mesa, where
193: descriptors can be computed at compile-time and just inserted
194: in records as they are created. However,
195: in polymorphic languages like Lisp and ML
196: this descriptor would have to be laboriously
197: constructed each time a record is created, introducing unacceptable overhead.
198: \item {\bf Each record can have a descriptor\index{descriptor} at the
199: beginning} that tells
200: the length of the record, and {\bf each field can have a tag bit} that tells
201: whether it is a pointer or a non-pointer.
202: This is what we have done.
203: \end{itemize}
204:
205: To have a tag word on each record, and a tag bit on each
206: field, is not clever at all; but we are not always
207: striving for cleverness, we
208: want simplicity and efficiency. If all our records were {\em cons} cells,
209: then one-third of memory would be devoted to tag words; but memories
210: are large and cheap. And in Standard ML, records are of many different
211: sizes, and two-word records are not particularly predominant.
212:
213: By having only one free region, we are
214: able to do fast allocation\index{allocation}
215: and also manage the total amount of virtual
216: memory used\cite{appel89:sggc}; this would be much more difficult
217: with a BIBOP scheme.
218:
219: \section{Data formats in our runtime system}
220: \label{data}
221: Standard ML of New Jersey has
222: records, strings, procedures (machine code),
223: closures, constructors, arrays, byte-arrays, floating-point
224: numbers, references (modifiable one-word records),
225: modules, and integers. This large set of language primitives and
226: user-defined datatypes are represented
227: by just two runtime data formats: one for objects that contain pointers
228: (records, closures, constructors, arrays, references, modules), and
229: the other for objects containing no pointers (strings, procedures,
230: byte-arrays, floating point). The garbage collector (and other
231: parts of the runtime system) need to understand only these two formats,
232: not the many kinds of ML objects.
233:
234: A {\em field\index{field}} is either a
235: {\em pointer\index{pointer}} or an {\em integer\index{integer}}. Pointers
236: have a low-order bit of 0; integers have a low-order bit of 1.
237: It is necessary to distinguish pointers from non-pointers in order that
238: the garbage collector\index{garbage collector}
239: will know what structures to traverse.
240: On a byte-addressable machine, all pointers to aligned (4-byte) words
241: are multiples of 4 anyway, so pointers have a low-order bit of 0 in
242: their natural representation. The high-order 31 bits of an integer
243: field can represent an integer in the range $[-2^{30}, +2^{30}-1]$.
244:
245: Some Lisp implementations use a similar representation except that
246: pointers have a low-order 1 and integers have a low-order 0. This makes
247: arithmetic on tagged integers somewhat easier, and takes advantage
248: of the fact that pointers are usually used with a displacement addressing mode;
249: the displacement can be adjusted by 1 with no penalty in efficiency.
250: This is probably a better arrangement, but
251: either version of this scheme will work; and a simple analysis\cite{appel89:tag}
252: shows that the efficiency trade-offs are negligible, and that
253: doing arithmetic around the low-order integer tag is not very costly.
254:
255: An {\em object on the heap} may be either a {\em record\index{record}}
256: (containing fields,
257: i.e. pointers and tagged integers) or
258: a {\em string\index{string}}
259: (containing bytes of an arbitrary bit-pattern, but no pointers).
260:
261: A {\em record} is a sequence of $n>0$ fields numbered $0,1,...,n-1$.
262: Each record has a {\em descriptor} at position $-1$. The low-order bit
263: of a descriptor\index{descriptor} is 1
264: (making it look like an integer), the next three
265: bits identify the object as a record, and the high-order 28 bits give the
266: number of fields. Thus, each record is limited to one gigabyte, which
267: should not be a significant limitation (seriously, though, it is important
268: to avoid miserly restrictions on the sizes of objects).
269:
270: In the implementation of closures, it is useful to be able to point
271: at the interior of records. However, it is always necessary for the
272: garbage collector, given a pointer, to find the descriptor of an object.
273: If the
274: fields $0$ through $k-1$ of a record are all pointers, then a pointer
275: can point at the $k^{\rm th}$ field of the record; since the
276: descriptor is unboxed (has a low-order bit of 1) and the fields $0$ through
277: $k-1$ will all be boxed, the garbage collector can easily search backward
278: for the descriptor of the record. So, we allow pointers to the interior
279: of a record, providing all the previous fields are boxed; this is sufficiently
280: flexible for our needs in implementing closures, as will be described.
281: Otherwise,
282: no pointer may address the interior of a record.
283:
284: A {\em string\index{string}}
285: contains $n>0$ bytes (numbered from $0$ to $n-1$),
286: padded with trailing zero bytes
287: to a multiple of four.
288: Immediately preceding the $0^{\rm th}$ byte is a one-word
289: descriptor whose low-order bit is 1, whose next three bits describe
290: the type of string (and distinguish strings from records),
291: and whose upper 28 bits give the length in bytes.
292:
293: Strings are used for a variety of purposes;
294: they can hold printable characters, real numbers, machine code,
295: or any other kind of data that doesn't contain pointers. In fact,
296: a string object can hold several embedded string objects.
297: A pointer may point at the $0^{\rm th}$ byte of a string, and may point to
298: an embedded string at the
299: $k^{\rm th}$ byte provided that $k>0$ is a multiple of 4 and that the bytes
300: $k-4, k-3, k-2, k-1$ contain a {\em back-pointer\index{back-pointer}}
301: descriptor that
302: contains the number $k$ (the offset to the beginning of the record).
303: This allows the beginning of a string object
304: to be found quickly, given a pointer into it.
305: Note that a back-pointer couldn't be distinguished from data, except
306: for the fact that it is found just before the pointed-to byte of the
307: string.
308:
309: The three tag\index{tag} bits of a descriptor\index{descriptor}
310: can denote any of these kinds of records,
311: strings, etc:
312:
313: \begin{itemize}
314: \item[0] record\index{record}
315: \item[1] forwarding-pointer\index{forwarding pointer}
316: (for the garbage collector\index{garbage collector})
317: \item[2] back-pointer (that precedes an referenceable location in a string)
318: \item[3] embedded string\index{embedded string} length (described below)
319: \item[4] array\index{array} (just like a record).
320: \item[5] byte-array\index{byte-array}
321: (just like a string).
322: \item[6] this tag value is unused.
323: \item[7] string\index{string}
324: \end{itemize}
325:
326: Because of ML's static type system, it is not necessary to put type information
327: in the descriptors of objects. Therefore, tags are necessary only
328: for the garbage collector's benefit, since it must distinguish objects of
329: different formats.
330:
331: The only exception to this rule is that mutable objects (arrays, byte-arrays)
332: must be distinguishable from immutable objects (records, strings)
333: in order to implement the polymorphic
334: equality\index{equality}
335: feature. For all purposes of the runtime system, arrays
336: are identical to records, and byte-arrays are identical to strings.
337:
338: A dynamically-typed language like Lisp would require more than
339: three bits to distinguish types of objects; this would not pose a
340: serious problem. It would still be possible in Lisp to have just
341: two formats (pointer-containing and pointer-free), as in our runtime system.
342:
343: \section{Forwarding pointers}
344: \label{forwarding}
345: \index{forwarding}
346: The Standard ML of New Jersey runtime system has a generational garbage
347: collector\index{garbage collector}
348: that takes advantage of object lifetime and referencing
349: patterns\cite{appel89:sggc}.
350: But at the heart of any generational garbage collector is a simple
351: copying garbage collector as originally described by Cheney\cite{cheney70}.
352: Objects are copied from {\em fromspace} to {\em tospace} in a breadth-first
353: order, with the {\em tospace} itself serving as the queue for the breadth-first
354: traversal. The {\em fromspace} versions of objects are overwritten
355: with {\em forwarding pointers}, so that when other references to them are
356: found, it is easy to find the {\em tospace} copies of them.
357:
358: The fundamental operation in Cheney's algorithm is to {\em forward} a pointer.
359: This means taking a pointer into {\em fromspace} and making it point to {\em tospace}.
360: If the {\em fromspace} object it points to has already been copied, then its
361: forwarding pointer is taken as the new value; otherwise, the object must
362: be copied to {\em tospace} and a forwarding pointer installed.
363:
364: Forwarding is relatively easy using our runtime data format. We have a special
365: kind of descriptor {\bf forwarding-pointer}, that indicates that a {\em fromspace}
366: object has already been copied. If an object has this descriptor, then
367: the first word (after the descriptor) is to be interpreted as the address
368: of the copy. The only complications in forwarding are that pointers
369: may point into the middle of records and strings.
370:
371: Our runtime system (and garbage collector) is implemented in C.
372: In figures~\ref{forward}, \ref{trappc}, \ref{gc} we show the entire
373: core of our copying garbage collector. We could use pseudo-code
374: and describe the algorithm abstractly, but we want to emphasize that
375: our simple runtime data format does permit an efficient and easy-to-implement
376: garbage collector.
377:
378: We pretend that all ML values are integers, and make liberal use of
379: casts to maintain this pretense. Our \verb"forward" function is
380: shown in figure~\ref{forward}; it
381: takes an ML value (by reference) and modifies it (if a pointer into
382: {\em fromspace}) to point into {\em tospace}.
383: \begin{figure}[htbp]
384: \label{forward}
385: \begin{verbatim}
386: 1 forward(int *refloc)
387: 2 {register int *m = *((int**)(refloc)), len;
388: 3 if( (m&1)==0 && (m >= (int*)lowest && m < (int*)highest))
389: 4 { m--; /* make m point at the descriptor */
390: 5 while(1)
391: 6 {len = (*m)>>4;
392: 7 switch(m&15)
393: 8 {case tag_backptr:
394: 9 m -= len;
395: 10 continue;
396: 11 case tag_string:
397: 12 case tag_bytearray:
398: 13 len = (len+3)/4;
399:
400: 19 /* fall through */
401: 20 case tag_record:
402: 21 case tag_array:
403: 22 {int **i=(int**)m, **j=to_ptr;
404: 23 while (j+len >= to_lim)
405: 24 do to_lim=gmore();
406: 25 while (len-- >= 0)
407: 26 do {*j++ = *i++;}
408: 27 ((int**)m)[1]= 1+(int*)to_ptr;
409: 28 to_ptr = j;
410: 29 }
411: 30 (*m) = tag_forwarded;
412: 31 /* fall through */
413: 32 case tag_forwarded:
414: 33 *(int*)(refloc) += ((int*)m)[1] - ((int)(m+1));
415: 34 return;
416: 35 case tag_embedded:
417: 36 default:
418: 37 m--; continue;
419: 38 }} } }
420: \end{verbatim}
421: \caption{The {\tt forward} function}
422: \end{figure}
423: Line 2 establishes $m$ as a copy of the original value to be forwarded.
424: Line 3 considers only the case that $m$ is a pointer, and points into
425: {\em fromspace}. (Any pointers not into {\em fromspace} are treated as constants.)
426: Line 4 adjusts m to point at the descriptor of the {\em fromspace} object.
427:
428: Lines 5--38 loop until the beginning of the object is found.
429: If [line 8] $m$ points to a back-pointer\index{back-pointer},
430: the appropriate offset is subtracted
431: and we start again; similarly, if [lines 35,36] $m$ points to a pointer
432: (which can happen if we
433: point into the middle of a closure\index{closure} record) or an
434: embedded\index{embedded string} descriptor,
435: then $m$ is decremented and we try again.
436:
437: If $m$ is a string\index{string length} or byte array [line 13],
438: then we adjust its $len$ to count in words rather
439: than bytes (i.e. we divide by 4, rounding up).
440: (Lines 14--18 are explained below.)
441:
442: Now at line 22 we have either a string or record in {\em fromspace}
443: that needs to be copied. We verify that there's enough space remaining
444: in {\em tospace}; if not, we call the function \verb"gmore" that will either
445: \index{gmore}
446: get more space or die trying. Then we copy all the words of the
447: object. Finally [lines 27 and 30] we install a forwarding pointer
448: into the {\em fromspace} object and mark its descriptor as forwarded.
449:
450: For an already forwarded object (line 33), whether it was forwarded
451: in a previous call or just now, adjust the reference \verb|*refloc|
452: to point at the new object. Since the reference might have pointed
453: at the middle of the old object, we must take care to make it point
454: to the corresponding location in the new object; this is accomplished
455: by the computation:
456: \[
457: {\rm old\;pointer} + {\rm beginning\;of\;new\;object} -
458: {\rm beginning\;of\;old\;object.}
459: \]
460:
461: Then the \verb"forward" function returns.
462:
463: Almost without exception, all pointers to objects point to places where
464: an immediately preceding descriptor\index{descriptor}
465: explains the format of the object.
466: The exception is an artifact of the
467: very fast allocation mechanism that can create and initialize
468: a $k$-word object in $k+2$ instructions\cite{appel89:sggc}.
469: This mechanism relies on a page fault\index{fault}
470: trap to invoke the garbage collector
471: when the free space is exhausted; when this trap occurs, the program
472: \index{pc}
473: counter will point into the middle of a string object that might
474: be moved by the garbage collector. This is the only reference that
475: points to a place in a string that lacks a backpointer\index{back-pointer}.
476: It can be handled
477: relatively simply: as each string object is moved, we check to see
478: if the saved program counter points into the middle of it; if so,
479: we adjust the saved program counter. Since the number of string objects
480: is typically less than one percent of the number of record objects,
481: very costly overall. The test occurs at line 14, as shown in
482: figure~\ref{trappc}.
483: \index{trap_pc}
484: \begin{figure}[htbp]
485: \label{trappc}
486: \begin{verbatim}
487:
488: 14 if (!trap_pc_done
489: 15 && m < trap_pc && m+len >= trap_pc)
490: 16 {trap_pc_done=1;
491: 17 trap_pc += to_ptr - (int**)m;
492: 18 }
493: \end{verbatim}
494: \caption{Testing the (saved) program counter}
495: \end{figure}
496:
497: \section{Garbage collection}
498: \label{garbage}
499:
500: Once the \verb"forward"ing procedure is written, a simple copying garbage
501: collector is trivial (figure~\ref{gc}).
502: \index{garbage collector}
503: \begin{figure}[htbp]
504: \label{gc}
505: \begin{verbatim}
506: 1 gc(int *from_low, int *from_high,
507: 2 int *to_low, int *to_high,
508: 3 int **roots,
509: 4 int *trap_pcx, int *(*get_more)()
510: 5 )
511: 6 { gmore=get_more; trap_pc = *trap_pcx; to_ptr = to_low;
512: 7 trap_pc_done = !(trap_pc>=(int*)from_low
513: 8 && trap_pc<(int*)from_high);
514: 9 lowest=from_low; highest=from_high;
515:
516: 10 while (*roots) forward(*roots++);
517:
518: 11 { int *x = to_low;
519: 12 while (x<to_ptr)
520: 13 { int *p = x+1, descr = (*x), len=descr>>4;
521: 14 if (string_or_bytearray(descr))
522: 15 x += (len+3)/4 + 1;
523: 16 else {x += len+1;
524: 17 do {forward(p++);} while (p<x);
525: 18 } }
526:
527: 19 *trap_pcx = trap_pc;
528: 20 }
529: \end{verbatim}
530: \caption{The {\tt gc} function}
531: \end{figure}
532: The function \verb|gc| is parametrized by the lower and upper bounds
533: of {\em fromspace} and the lower and upper bounds of {\em tospace}. The \verb"roots"
534: parameter is a vector of pointers to the roots of accessible objects;
535: in the Standard ML system this is little more than the addresses of
536: saved machine registers. The last two parameters are the address
537: of the saved program counter (necessary as explained in
538: section~\ref{forwarding})
539: and the address of a function which can be called to expand the {\em tospace}
540: if necessary.
541:
542: The first step [lines 6--9] is to put various quantities into global
543: variables to make them accessible to the \verb"forward" procedure,
544: since C does not have nested procedures.
545:
546: The next step is to forward all the root variables (line 10).
547:
548: Finally [lines 11-18],
549: we forward each word of each record in {\em tospace}. Since the
550: \verb|forward| procedure may increment the \verb"to_ptr" variable
551: that denotes the end of the filled portion of {\em tospace}, we have to
552: keep iterating until \verb|x| catches up with it. In effect,
553: the {\em tospace} between \verb|x| and \verb|to_ptr| is the queue of the
554: breadth-first search. The queue must eventually become empty,
555: since there is a finite amount of accessible data to be copied.
556: In this phase, integers (and strings) are just skipped, since
557: they contain no pointers that need forwarding.
558:
559: This garbage collector is relatively simple, and therefore it's not difficult
560: to make it fast. Even the fanciest of generational or concurrent collectors
561: relies on a program like this as its inner loop; the very simple layout
562: of data in the Standard ML runtime system makes efficiency easy.
563:
564: \section{Fast allocation}
565: \label{fast}
566:
567: Copying garbage collection, because it takes time proportional only to the
568: live data and not to the amount of garbage, can be arbitrarily
569: efficient\cite{appel87:gc}. That is, there is no lower bound on the
570: amortized cost of garbage collection for each cell allocated.
571: Standard ML's generational copying garbage collector\cite{appel89:sggc}
572: expends on the order of one (amortized)
573: machine instruction for every cell allocated; the precise amortized
574: cost is proportional to the ratio of live data to heap size.
575:
576: Since garbage collection is so fast, it makes sense to make
577: allocation\index{allocation}
578: fast too. Standard ML of New Jersey allocates a new record every
579: 40 to 80 machine instructions, so we want a very low overhead on the
580: creation of an object.
581:
582: Since copying garbage collectors compact the live objects into consecutive
583: memory cells, the free region is all contiguous. This means that to create
584: an $n$-word object, we can just grab the next $n$ words of the free space.
585: Since objects are created so often, it makes sense to make allocation
586: in-line (with no procedure call) and to reserve a register \verb"fr" to point
587: to the beginning of the free region. Then it becomes very simple
588: to allocate a new object: \verb"A = CONS(B,C)" is implement as
589: \begin{enumerate}
590: \item mem[fr+2] := C
591: \item mem[fr+1] := B
592: \item mem[fr] := descriptor(2,tag\_record)
593: \item A := fr+1
594: \item fr := fr+3
595: \end{enumerate}
596: Each of these lines is one machine instruction. Thus, in five instructions
597: (plus one instruction of amortized garbage collection overhead)
598: we have made a new {\em cons} cell; it takes only twice as long to create
599: a data structure as it does to read it!
600: This fast allocation encourages a simple and clean programming style;
601: no longer do programmers have to stand on their heads to avoid {\em cons}ing.
602:
603: Of course, this won't work if the free region is exhausted. We can insert
604: an explicit test to make sure that \verb"fr" is not near the end of the
605: free region. But a more clever trick is to make the virtual memory page
606: at the end of the free region inaccessible, so that we will get a page
607: fault\index{fault}\index{page fault}
608: when the free region is exhausted. That's why we store the
609: last field (\verb"mem[fr+2]") first in the example above; it's simpler
610: for the garbage collector if the fault occurs at the very beginning
611: of creating the new cell.
612: (On the MC68020, the state of the machine at a page fault is complicated,
613: and it's not easy to restart the faulting instruction; so on that machine
614: we use an explicit comparison with a limit register.)
615:
616: The page fault will be caught by the hardware and handed to the operating
617: system, which can then pass control to the user process. The user process
618: then has to find all the registers of the faulting procedure; these
619: registers are the roots of the accessible data. Appel\cite{appel89:sggc}
620: and Cormack\cite{cormack88} both describe schemes for finding these
621: registers; Cormack's is simpler and more reliable.
622:
623: The ML program and the garbage collector behave like two processes
624: (threads\index{thread})
625: running in the same address space: while one thread executes, the
626: other's registers are saved. When ML suspends itself (either
627: because of a page fault indicating end of free space, or voluntarily),
628: it saves its registers into a static area that looks just like a process
629: control block, so that the garbage collector (and other parts of the
630: runtime system) can access them.
631:
632: \begin{quotation}
633: {\small Here's how Cormack's scheme works: A page fault arrives, causing
634: the operating system to invoke the C function \verb"ghandle".
635: \index{ghandle}
636: This function is passed (as an argument) a structure containing
637: the address of the faulting instruction. Other saved registers are
638: at undocumented locations on the stack. \verb"ghandle" saves the
639: \index{saved_pc}\index{pc}
640: faulting pc in \verb"saved_pc", and modifies its argument structure
641: to point to the assembly-language function \verb"saveregs". Then it
642: returns; the operating system restores the pc from the argument
643: structure, restores other registers (from those undocumented locations),
644: and resumes. But of course, we have fooled the operating system into
645: \index{saveregs}
646: resuming in \verb"saveregs", which stores all registers into known
647: global variables and returns to the C thread (i.e. in the function
648: \verb"runML").
649:
650: The C thread (typically) does garbage collection, then calls the
651: assembly-language function \verb"restoreregs", which loads registers
652: (and program counter) from their global variables and resumes execution
653: of the ML thread.
654:
655: Sometimes it's useful to invoke the C thread without a page fault,
656: e.g. to export the process state into a file or to do a structured
657: write. In this case, \verb"saveregs" can be called directly from assembly
658: code in the ML thread.
659: }
660: \end{quotation}
661: \section{Heap allocation of procedure call frames}
662: \label{heap}
663:
664: Since heap allocation is so cheap, we put procedure call frames on the
665: heap instead of on the stack\index{stack}.
666: This has many advantages\cite{appel89:cps},
667: but the one of interest here is that it greatly simplifies the runtime
668: system. Many of the facilities described in this paper
669: would be much more complicated to implement if runtime stacks had to be
670: dealt with.
671:
672: Any record on the stack must be initialized as it is created;
673: otherwise it will contain garbage data that could be interpreted
674: by the garbage collector as spurious pointers. However, typical
675: code generators will allocate a call frame on
676: entry to a procedure and spill registers into it as needed.
677: This must be avoided. One way to solve
678: this problem is to delay allocation of the frame; values can be accumulated
679: in registers until spilling is necessary, then all the registers can
680: be spilled at once into a newly-created frame, which is just a record
681: object in runtime data format. Thus, the frame is completely initialized
682: as it is created.
683:
684: The Standard ML of New Jersey compiler doesn't use ``procedure call frames.''
685: Since it uses continuation\index{continuation}-passing
686: style\cite{steele78}\cite{kranz86}\cite{appel89:cps},
687: what an ordinary mortal might call a ``frame'' is really just the
688: closure of a continuation. It is easy to create closures as ordinary
689: records that are completely initialized when they are created.
690: This simplifies both the code generator
691: and the runtime system, though the trick described in the previous paragraph
692: will work for more conventional code generators.
693:
694: Since there's no runtime stack, the
695: {\em call-with-current-continuation}\cite{rees86}
696: \index{call-with-current-continuation}
697: primitive can be implemented very efficiently. In implementations
698: with a runtime stack, the entire stack must be copied when {\em call/cc}
699: is evaluated (or else there must be a lot of extra complexity in stack
700: management); without a stack, the execution of
701: {\em call/cc}, and the execution of
702: saved continuations, take just a few instructions each. This makes
703: {\em call/cc} a practical programming tool, just as fast allocation
704: makes {\em cons} practical.
705:
706: \section{Representing ML structures in records and strings}
707: \label{representing}
708: Section~\ref{data} describes just two kinds of
709: objects---records\index{record} and
710: strings\index{string}---referenceable at their beginning and (in a limited way)
711: at interior points. All of the kinds of ML data can be represented
712: in records and strings.
713:
714: An ML value must be representable in one word. A larger value can be {\em
715: boxed} by putting it in several words on the heap and keeping a (one-word)
716: pointer to it. A small value (like a 31-bit integer)
717: can be kept {\em unboxed\index{unboxed values}} by representing it without indirection in
718: a machine word.
719: Boxed\index{boxed values}
720: values (pointers)
721: are distinguished from unboxed values (integers, or data represented as
722: integers)
723: by their low-order bit.
724:
725: An ML {\bf record\index{record}}
726: is an $n$-tuple of values. It has a natural representation
727: as a record in our runtime data format. ML has a {\it pro forma}
728: record of length 0; as our runtime data format does not allow objects of length
729: 0 (since that would leave no room to put a forwarding pointer, as described in
730: a section~\ref{forwarding}),
731: the empty record
732: is represented as the unboxed integer 0. This does no harm, since
733: no fields can be selected from an empty record anyway.
734:
735: An ML {\bf array\index{array}} is also an $n$-tuple of values.
736: In the ML language,
737: record-field offsets are determined at compile-time, whereas
738: arrays may be indexed by runtime values;
739: and arrays may be modified after
740: they are created, whereas records are immutable.
741: But neither of these
742: differences matters to the runtime system.
743: The polymorphic equality function of ML requires distinguishing between
744: mutable and immutable objects at runtime, so arrays and records
745: have different tags, however; this requires that they have different
746: tags.
747:
748: ML does permit
749: arrays\index{arrays of length 0} of length 0,
750: but our runtime data format does not.
751: Happily, all arrays of length 0 have
752: the same behaviour, so a special object of length 0 (located outside
753: the garbage-collectible region) serves to represent all the
754: empty arrays.
755:
756: {\bf References} in ML are mutable objects:
757: \verb"val a = ref 5" declares a reference\index{reference variables}
758: variable \verb"a" that may be
759: changed by a later assignment statement, unlike most variables which cannot
760: be modified once defined. Reference variables behave just like single-element
761: arrays, and that's how they are represented in the runtime system.
762:
763: ML has {\bf datatype\index{datatype}}
764: declarations to allow tagged variant types. For example,
765: the declaration
766: \begin{verbatim}
767: datatype t = NAME of string | NUMBER of int
768: \end{verbatim}
769: specifies a type \verb"t" that can be either a string or an integer,
770: depending on whether the constructor\index{constructor} \verb"NAME"
771: or \verb"NUMBER" is used to create it.
772: The program can examine any object of type \verb"t" and determine whether
773: it has the \verb"NAME" or \verb"NUMBER" representation---that is, it
774: is a {\em tagged} union. An object of type \verb"t" is represented
775: as a two-element record: one field contains the tag (a small integer),
776: and the other field contains the value.
777: The details of constructor representation may be of interest
778: only to those knowledgeable in ML, and are described in
779: section~\ref{constructor}.
780:
781: ML has {\bf strings\index{string}}
782: of characters. Strings can be of any non-negative length.
783: Since our runtime representation cannot support objects of length 0
784: (because one word is needed to store the forwarding pointer, as described
785: in a section~\ref{forwarding}),
786: the empty string\index{empty string}\index{null string}
787: \index{string of length 0}
788: must be treated specially. However, it turns out that
789: it is never necessary to create a new empty string; the only occurrences
790: of empty strings are as literals in the program text. String
791: literals (and back-pointers) will be discussed in section~\ref{function}.
792:
793: Strings\index{Strings of length one} of length one are treated specially by the compiler,
794: though this is not necessary either for the ML language or for our runtime
795: data format. Single-character strings\index{single-character strings}
796: \index{characters}
797: are treated as unboxed integers
798: between 0 and 255, to avoid heap-allocation for this (frequent) special case.
799: Though this technique may cause
800: less allocation, it requires special tests on every
801: string operation. It's not clear whether this special case saves more than
802: it costs. However, this special case is transparent to the runtime system
803: anyway; all conversions, etc. are handled explicitly by the compiler;
804: the single-character strings are not a new runtime data format, but
805: look like ordinary unboxed integers.
806:
807: {\bf Byte-arrays\index{byte-arrays}}
808: in ML are to strings as arrays are to records: they have the
809: same representation as strings, but their tag is different to facilitate
810: certain language features.
811:
812: {\bf Floating-point\index{floating-point}} numbers are too large to fit in
813: one word. Even if we chose
814: to use single-precision floating point, it would be difficult to store
815: them unboxed\index{unboxed}
816: because there is no bit available for use as a tag\index{tag} bit.
817: Thus, all floating-point numbers are stored boxed, with 8 bytes of data and
818: one word of descriptor.
819: The descriptor\index{descriptor}
820: must be one that indicates that the contents of the
821: object contain no pointers.
822: Since ML is statically typed, the language
823: never needs to distinguish (at runtime) floating-point numbers from
824: strings, so we can just represent a double-precision float as an 8-byte
825: string. This is not an ASCII string, it is the hardware representation;
826: a ``string'' descriptor does not denote that the contents are printable
827: characters, just that there is an arbitrary bit-pattern not containing
828: pointers.
829:
830: Similarly, {\bf machine-code procedures}
831: \index{machine code} do not require a separate class
832: of descriptor. From the garbage collector's point of view, machine
833: code is just like strings: it contains bits that are to be (perhaps)
834: moved from place to place but which never represent pointers.
835: Thus, machine code is just kept in string objects. The mechanism by
836: which we avoid placing pointers\index{pointers}
837: (or any reference to other objects)
838: in machine code will be described in section~\ref{modules}.
839:
840: \section{Closures}
841: \label{closures}
842: \index{closure}
843: In ML, as in many lexically scoped languages, functions may have free
844: variables. That is, the body of a function may reference not only its
845: own formal parameters\index{formal parameters}
846: (the bound variables of the function), but the
847: formal parameters of a statically enclosing function. Consider the
848: function
849: \begin{verbatim}
850: fun add(x) = let fun add1(y) = x+y
851: in add1
852: end
853: \end{verbatim}
854: The function \verb"add(x)" defines an internal function \verb"add1(y)" that
855: adds \verb"x" and \verb"y", and then \verb"add" returns \verb"add1" as its
856: result. The variable \verb"x" is bound by \verb"add", and
857: \verb"y" is bound by \verb"add1". Therefore, if we just consider the
858: function definition \verb"fun add1(y) = x+y", the variable \verb"x"
859: is a free variable\index{free variable} of this internal function.
860:
861: In order to evaluate \verb"add1" applied to some argument, we must have
862: a context in which the value \verb"x" is defined. It will not suffice
863: for the \verb"add" function just to return a pointer to the machine code
864: of \verb"add1" as its result. Instead, it must return a {\em closure}:
865: a combination of a code-pointer an some information about the values of
866: all the free variables (in this case, just \verb"x").
867:
868: For example, the result of evaluating \verb"add(5)" is (abstractly) the
869: function which will always add 5 to its argument.
870: Figure~\ref{add5} shows a possible representation
871: as a record object, with a descriptor at the beginning for the
872: garbage collector.
873: \begin{figure}[htbp]
874: \label{add5}
875: \begin{verbatim}
876: +------+
877: | desc |
878: +------+
879: p--> | o------------> machine code for add1
880: +------+
881: | 5 |
882: +------+
883: \end{verbatim}
884: \caption{A simple closure.}
885: \end{figure}
886: The pointer \verb"p" is the runtime representation of the particular version
887: of \verb"add1" in which \verb|x| is bound to 5. This pointer may be passed
888: as an argument to some other function that doesn't know anything about
889: the \verb"add1" function. All the other function has to know is how
890: to ``call'' a closure. The calling sequence might be:
891: \begin{enumerate}
892: \item Put \verb|p| in register 0.
893: \item Put the return address in register 1.
894: \item Fetch p[0] into register 2.
895: \item Jump to the address contained in register 2.
896: \end{enumerate}
897: Step 1 is necessary so that when \verb"add1" is entered, it can gain
898: access to the closure record where the binding of \verb"x" is stored.
899: Step 2 is just the normal saving of a return address, which might
900: in many cases be on a stack instead of in a register. Steps 3 and
901: 4 are just the typical code necessary to jump to a runtime-bound address.
902:
903: It is important to note that the function that calls a closure \verb"p"
904: doesn't need to access any part of it except the machine-code pointer
905: in \verb|p[0]|. This means that the free variables (the rest of the
906: closure) may have an arbitrary arrangement, known only to the
907: function that builds the closure (\verb"add", in this case) and the
908: function that executes it (\verb"add1", in this case). Closures can
909: be structured in many ways\cite{appel88:clo}, and can even point
910: to other closures for access to free variables (i.e., they can
911: have ``static links\index{static link},'' in Algol terminology).
912:
913: Clearly, closures are represented a lot like records. A closure has several
914: fields, each of which is either a machine-code pointer or a free variable.
915: The machine-code pointer is just a (boxed) string pointer,
916: and the free variables are
917: just one-word ML objects. Therefore, we use record\index{record}
918: objects to represent closures.
919:
920: When several functions share a set of free variables, it is convenient
921: to have a {\bf multiple-function closure.}
922: \index{closure for multiple functions}
923: \index{multiple functions closure}
924: \index{shared closure}
925: Consider the functions
926: \begin{verbatim}
927: fun test(y,z) =
928: let fun even(i) = if i=0 then y else odd(i-1)
929: and odd(i) = if i=0 then z else even(i-1)
930: in (even,odd)
931: end
932: \end{verbatim}
933: This function, when applied to arguments \verb"(3,7)", would return
934: two functions (let's call them \verb"even37" and \verb"odd37").
935: from integer to integer. \verb"Even37" applied to even integers
936: returns 3 and applied to odd integers returns 7, and \verb"odd37"
937: works the other way around.
938:
939: We could represent these functions by two different closures, as shown
940: in figure~\ref{twoclos}.
941: \begin{figure}[htbp]
942: \label{twoclos}
943: \begin{verbatim}
944:
945: +------+
946: | desc |
947: +------+
948: even37 --> | o------------> machine code for even
949: +------+
950: y | 3 |
951: +------+
952: odd | o-------> odd37
953: +------+
954:
955: +------+
956: | desc |
957: +------+
958: odd37 --> | o------------> machine code for odd
959: +------+
960: z | 7 |
961: +------+
962: even | o--------> even37
963: +------+
964: \end{verbatim}
965: \caption{Two mutually recursive closures}
966: \end{figure}
967: This is particularly unfortunate because \verb"even" and \verb"odd"
968: are free variables\index{free variables}
969: of each other, and this requires the construction
970: of a cyclic data structure of closures, which can get complicated.
971:
972: A more clever trick\cite{appel87:sml}\cite{kranz86} is to let these
973: two functions share a closure, as shown in figure~\ref{oneclos}.
974: \begin{figure}[htbp]
975: \label{oneclos}
976: \begin{verbatim}
977: +------+
978: | desc |
979: +------+
980: even37 --> | o------------> machine code for even
981: +------+
982: odd37 --> | o------------> machine code for odd
983: +------+
984: y | 3 |
985: +------+
986: z | 7 |
987: +------+
988: \end{verbatim}
989: \caption{Two functions sharing a closure}
990: \end{figure}
991: Now, when \verb"even37" is called, its closure-pointer will be loaded
992: in register 0, and it knows that it can access \verb:y: at offset 2 from
993: register 0. And when \verb"odd37" is called, its closure pointer
994: (which is really a pointer to the second field of the record) will be
995: loaded in register 0, and it can access \verb"z" at offset 2.
996: In some situations, the same free variable may be accessed by different
997: functions in the closure, and the code generator will have to remember
998: that the offsets\index{offset} from register 0 to a particular field
999: depend on which closure-pointer is loaded (i.e. which function we are
1000: generating code for).
1001:
1002: Finally, if \verb"even" wants to call \verb"odd", it can just generate
1003: the appropriate closure value in register 0 by adding an offset of 1
1004: to its own closure pointer. In general, all mutually recursive
1005: functions can be handled this way, and cyclic\index{cyclic closure}
1006: closure structures
1007: are never required. But even non-mutually-recursive sets of functions
1008: can save storage by sharing closures.
1009:
1010: It should now be clear why (in section~\ref{data}) we arranged for
1011: pointers into the middle of record objects. Given the pointer
1012: \verb"odd37", the garbage collector can easily search backward until
1013: it finds the descriptor\index{descriptor}
1014: of the record; the descriptor is unboxed,
1015: while all of the previous machine-code pointers are boxed fields.
1016:
1017: \section{Function entry points}
1018: \label{function}
1019:
1020: A typical compilation\index{compilation unit} unit
1021: will contain many functions, and it would be
1022: unwieldy to make a different string object for each function:
1023: since the functions in a compilation unit often call each other,
1024: these calls will go much faster as pc-relative jumps than if
1025: we had to fetch addresses from closures. But to achieve pc-relative
1026: jumps, the relative distance between two functions cannot change;
1027: and the garbage collector moves objects around.
1028: So to achieve
1029: this constant distance, we must put several functions in the
1030: same string object, and have several
1031: entry points.
1032:
1033: For example, the \verb"even", \verb"odd", and \verb"test"
1034: functions of section~\ref{closures} could all be placed a single string,
1035: \index{embedded string}
1036: with appropriate back-pointer descriptors (figure~\ref{embedded1}).
1037: \begin{figure}[htbp]
1038: \label{embedded1}
1039: \begin{verbatim}
1040: +------+
1041: | desc |
1042: +------+
1043: test --> | t |
1044: + e +
1045: | s |
1046: + t +
1047: | |
1048: +------+
1049: | [5] |
1050: +------+
1051: even --> | e |
1052: + v +
1053: | e |
1054: + n +
1055: | |
1056: +------+
1057: | [10] |
1058: +------+
1059: odd --> | o |
1060: + d +
1061: | d |
1062: +------+
1063: \end{verbatim}
1064: \caption{Three code strings embedded in a string}
1065: \end{figure}
1066: The numbers \verb"[5]" and \verb"[10]" are back-pointers, formatted
1067: as described in section~\ref{data}. They tell the garbage collector
1068: the offset to the beginning of the object, so it can find the true descriptor.
1069:
1070: The advantage to putting several functions in the same string is that
1071: they can refer to each other directly, without having to access each
1072: other through closures. Of course, it will be necessary to adjust
1073: the closure\index{closure}
1074: pointer in register 0. Here, \verb"even" could call \verb"odd"
1075: by this (simpler) calling sequence:
1076: \begin{enumerate}
1077: \item Add 1 to register 0 (to adjust p).
1078: \item Put the return address in register 1.
1079: \item Jump to \verb"odd".
1080: \end{enumerate}
1081:
1082: The garbage collector may move a code string from one place to another
1083: in memory, so it is necessary that the jump in step 3 be a
1084: pc\index{pc}-relative
1085: jump.
1086:
1087: ML programs may contain literal\index{literal}
1088: \index{string literal}
1089: strings and floating\index{floating point constant} point constants.
1090: These are embedded within string objects amongst the machine-code
1091: functions. For example,
1092: \begin{verbatim}
1093: fun show(b) = if b then "true" else "false"
1094: \end{verbatim}
1095: could almost have the representation shown in figure~\ref{embedded2}.
1096: \begin{figure}[htbp]
1097: \label{embedded2}.
1098: \begin{verbatim}
1099: +------+
1100: | desc |
1101: +------+
1102: show --> | s |
1103: + h +
1104: | o |
1105: + w +
1106: | |
1107: +------+
1108: | [5] |
1109: +------+
1110: true --> | true |
1111: +------+
1112: | [7] |
1113: +------+
1114: false --> | fals |
1115: + +
1116: | e000 |
1117: +------+
1118: \end{verbatim}
1119: \caption{String literals, oversimplified}
1120: \end{figure}
1121: The function \verb"show" returns a pointer to \verb"true" or \verb"false";
1122: this pointer points into the middle of the large string object at one
1123: of the embedded strings. Because the embedded string is preceded by a
1124: back-pointer descriptor, the garbage collector won't be confused.
1125:
1126: There's a slight problem with this. Although the garbage collector doesn't
1127: need to know the length of the string \verb|"true"|, the ML program might
1128: apply the \verb"length" function to it.
1129: String\index{string length} lengths are kept in the
1130: descriptor word, and are accessed by fetching field $-1$ and shifting
1131: off the tag bits. The problem here is that the descriptor for \verb"true"
1132: is a back-pointer\index{back-pointer},
1133: not a string descriptor. The solution---simple and
1134: inelegant---is to introduce a new kind of descriptor, an {\em embedded
1135: string}, which contains the length of the string and always is preceded
1136: by a backpointer. Thus, the actual representation of the \verb"show"
1137: function is more like figure~\ref{embedded3}.
1138: \begin{figure}[htbp]
1139: \label{embedded3}
1140: \begin{verbatim}
1141: +------+
1142: | desc |
1143: +------+
1144: show --> | s |
1145: + h +
1146: | o |
1147: + w +
1148: | |
1149: +------+
1150: | [6] |
1151: +------+
1152: | 4emb |
1153: +------+
1154: true --> | true |
1155: +------+
1156: | [9] |
1157: +------+
1158: | 5emb |
1159: +------+
1160: false --> | fals |
1161: + +
1162: | e000 |
1163: +------+
1164: \end{verbatim}
1165: \caption{String literals embedded in a code string}
1166: \end{figure}
1167:
1168: The treatment of embedded \index{floating point constant}
1169: floating point constants is just like that of
1170: embedded strings. However, the ML program will never need to take the
1171: length of a floating point constant, so the embedded-string descriptor
1172: can be omitted and a simple backpointer\index{backpointer} will suffice.
1173: Note that embedded strings of length 0 are permitted, since there
1174: is no need for a word to install a forwarding pointer (see
1175: section~\ref{forwarding},
1176: and note that forwarding pointers go only at the beginning of the
1177: entire string object, not at embedded strings).
1178:
1179: The address of string and floating constants may appear in machine code,
1180: but only in a pc\index{pc}-relative way, since code objects may be moved in memory
1181: by the garbage collector\index{garbage collector}.
1182:
1183: \section{Modules and compilation units}
1184: \label{modules}
1185:
1186: Standard ML has a powerful module\index{module}
1187: system, allowing nested modules,
1188: modules as parameters to other modules, thinning of a richer module
1189: to form a smaller module, etc. The module system can be represented
1190: entirely as records and functions in the runtime system\cite{appel87:sml};
1191: no new machinery is needed.
1192:
1193: A compilation unit\index{compilation unit}
1194: is just a sequence of function and value declarations that
1195: will be compiled together into one string object. But one compilation
1196: unit may refer to values defined in another, and a linkage\index{linkage}
1197: mechanism is required.
1198:
1199: We can use the power of higher-order functions to implement the linkage
1200: in a way that is transparent to the runtime system. If unit \verb"B"
1201: refers to a value in unit \verb"A", then the compiler will treat
1202: \verb"A" as an implicit parameter of \verb"B". For example, suppose we have
1203: two compilation units
1204: \begin{verbatim}
1205: fun f(x,y) = x+y
1206: \end{verbatim}
1207: and
1208: \begin{verbatim}
1209: fun g(z) = f(z,z)
1210: \end{verbatim}
1211: clearly \verb"f" is a free variable\index{free variable}
1212: of \verb"g". But the compiler can
1213: parametrize the second unit, as if it were written
1214: \begin{verbatim}
1215: fun g0(f) = let fun g(z) = f(z,z)
1216: in g
1217: end
1218: \end{verbatim}
1219: This compilation unit has no free variables; the ML system will simply
1220: apply the second unit to the first one, yielding the desired function
1221: \verb"g". The compiler must keep track of inter-module references
1222: in order to do this, but at least the runtime mechanism for linkage is simple.
1223:
1224: So, each compilation unit is a closed function with no free variables.
1225: No special linkage mechanism is built into the runtime system; the
1226: closure\index{closure}
1227: mechanism handles linkage very elegantly.
1228:
1229: Eliminating free variables from compilation units also simplifies the
1230: code generator\index{code generator}.
1231: Whenever the code generator must analyze local free variables,
1232: generate pc\index{pc}-relative references, etc.,
1233: its job is much simpler because there
1234: are no global references to other objects. The interface between the
1235: front end of the compiler and the code generator are much cleaner as a result.
1236:
1237: \section{Linkage to assembly language}
1238: \label{linkage}
1239:
1240: Some primitives of a programming language are best implemented in a different
1241: language (typically assembly language\index{assembly language}).
1242: Standard ML of New Jersey
1243: makes use of 8 functions written in assembly language, and 8 functions written
1244: in C\index{C}.
1245: (For comparison, more than 100 standard-library\index{standard library}
1246: functions are written
1247: in ML.) The assembly-language functions are:
1248: \begin{enumerate}
1249: \item {\bf array\index{array}(n,x)} creates an array of length $n$, each element initialized
1250: to $x$.
1251: \item {\bf callc\index{callc}(f,a)} calls a C-language function $f$ with argument $a$.
1252: \item {\bf create\_b(n)} creates\index{create\_b} a byte-array of length $n$.
1253: \item {\bf create\_s(n)} creates\index{create\_s} a string of length $n$.
1254: \item {\bf floor\index{floor}(x)} returns the smallest integer less than
1255: or equal to $x$.
1256: \item {\bf logb\index{logb}(x)} returns the exponent part of (the floating-point) $x$.
1257: \item {\bf scalb\index{scalb}(x)} inserts a new exponent into (floating-point) $x$.
1258: \item {\bf syscall\index{syscall}(i,args,k)} does operating system
1259: \index{operating system} kernel-call $i$ with
1260: $k$ arguments.
1261: \end{enumerate}
1262: The assembly-language functions are located at (constant) addresses within
1263: the runtime system. But it's a good idea to follow the rules about
1264: free variables\index{free variables}
1265: in module-linkages\index{linkage}: even the references to assembly-language
1266: primitives should not be ``hard-wired'' addresses in code objects.
1267: If this were done, then all ML code would have to be re-compiled whenever
1268: the runtime system was re-compiled.
1269:
1270: Instead, we just treat the assembly-language functions as a special
1271: record object containing nine elements. Other modules are parametrized
1272: by this \verb"Assembly" module just as if it were an ordinary one.
1273: All the assembly-language functions follow the ML calling conventions.
1274:
1275: Most of the C functions are to provide access to system calls with
1276: hard-to-manage interfaces, like \verb"fork", etc.
1277: There is an added complication that the C and ML calling conventions
1278: don't match.
1279: The assembly-language function \verb"callc" arranges the arguments
1280: for an ML function to call a C function. The details are uninteresting,
1281: but the point is that absolute references are again avoided.
1282:
1283: \begin{quotation}
1284: {\small The file \verb"boot/assembly.sig" contains the signature
1285: \verb"Assembly", approximately as in figure~\ref{assembly}.
1286: \begin{figure}
1287: \label{assembly}
1288: \begin{verbatim}
1289: signature ASSEMBLY =
1290: sig
1291: datatype datalist = DATANIL | DATACONS of (string * string * datalist)
1292: type func
1293: datatype funclist = FUNC of (func * string * funclist)
1294: type object
1295: structure A : sig
1296: val array : int * 'a -> 'a array
1297: val callc : 'b (* func*) * 'a -> 'c
1298: val create_b : int -> string
1299: val create_s : int -> string
1300: val floor : real -> int
1301: val logb : real -> int
1302: val scalb : real * int -> real
1303: val syscall : int * string list * int -> int
1304: end
1305: exception Div
1306: exception Float of string
1307: exception Interrupt
1308: exception Overflow
1309: exception SystemCall of string
1310: exception UnboundTable
1311: val array0 : 'a array
1312: val bytearray0 : string
1313: val collected : int ref
1314: val collectedfrom : int ref
1315: val current : string ref
1316: val datalist : datalist
1317: val external : funclist
1318: val gcmessages : int ref
1319: val gcprof : string ref
1320: val majorcollections : int ref
1321: val minorcollections : int ref
1322: val opsys : int (* 1 = vax bsd ultrix, 4.2, 4.3
1323: 2 = sunos 3.0, 4.0
1324: 3 = vax v9 (bell labs) *)
1325: val pstruct : object ref
1326: val ratio : int ref
1327: end
1328: \end{verbatim}
1329: \caption{The {\tt ASSEMBLY} signature}
1330: \end{figure}
1331: \index{Assembly structure}
1332: The substructure \verb"A" is the machine-dependent part (implemented
1333: in assembly language), and the other components (exceptions, constants,
1334: etc.) are just data structures that can be described machine-independently
1335: in C.
1336: \verb"A" is arranged as an ML record
1337: at the label \verb"runvec"\index{runvec}
1338: in the file \verb"VAX.prim.s", \verb"M68.prim.s",
1339: etc. The machine-independent part
1340: is in \verb"cstruct.c"\index{cstruct.c}.
1341:
1342: The exceptions in this structure are just those that can be raised
1343: from C or assembly language. Exceptions as elaborated by the compiler
1344: look like references to strings, and in \verb"cstruct.c" are implemented
1345: as initialized structures to mimic this
1346: with all the appropriate descriptors.
1347:
1348: The values \verb"array0" and \verb"bytearray0" are just the array and
1349: \index{array0}\index{array of length 0}
1350: \index{bytearray0}
1351: byte-array of length 0; this is necessary because objects of length 0
1352: are not permitted in the garbage-collectible region.
1353:
1354: \index{collected}\index{collectedfrom}
1355: \index{majorcollections}\index{minorcollections}\index{ratio}
1356: The references \verb"collected", \verb"collectedfrom",
1357: \verb"majorcollections", and \verb"minorcollections" are updated by
1358: the garbage collector with performance information. The \verb"ratio"
1359: variable can be set from ML to tell the garbage collector what
1360: ratio to maintain between heap size and amount of live data.
1361:
1362: \index{current}\index{profiling}
1363: The variable \verb"current" is used in execution profiling\cite{appel88:prof};
1364: it is a pointer
1365: to an array of length 2 of ML integers. When profiling is enabled,
1366: a timer interrupt will periodically
1367: cause current[1] to be incremented.
1368:
1369: \index{pstruct}
1370: \index{loader}
1371: \index{Initial}
1372: The \verb"pstruct" is a pointer to the \verb"Initial" (pervasive)
1373: structure. The ML loader (\verb"boot/loader.sml") needs this
1374: because other modules may reference the structure \verb"Initial".
1375:
1376: \index{datalist}
1377: The \verb"datalist" is used for linkage to sharable compiled ML code.
1378: In vanilla Unix, such code must be link-loaded into the text segment
1379: of the sml executable file, and the \verb"datalist" provides the ML
1380: loader a way to get to it. Each element of the data list contains
1381: two strings (the name of the module, and the executable code), along
1382: with a link to the next element.
1383:
1384: \index{funclist}
1385: The \verb"funclist" is similarly used for linkage to functions implemented
1386: in C. These functions could be described in the Assembly signature
1387: but this requires recompilation whenever a new function is added.
1388: The funclist is terminated by a function whose name is "xxxx". All functions
1389: in the list must have a name of exactly 4 characters.
1390: All C-functions must take exactly one argument in ML format and return
1391: a result in ML format. The functions implemented in C at this writing are:
1392: \begin{itemize}
1393: \item{\bf fion\index{fion}} tells how many characters may be read from an open
1394: file-descriptor without blocking.
1395: \item{\bf fork\index{fork}} does a Unix process fork.
1396: \item{\bf prof\index{prof}} is obsolete.
1397: \item{\bf syst\index{syst}} executes a shell command as a sub-process.
1398: \item{\bf time\index{time}} gets execution-time information from the operating system
1399: and the garbage collector.
1400: \item{\bf argv\index{argv}} gets the command-line arguments of the executing program
1401: (as a list of strings).
1402: \item{\bf envi\index{envi}} gets the Unix environment\index{environment}
1403: string (as a list of strings).
1404: \item{\bf blas\index{blas}} does a structured\index{structured write} write (as described section~\ref{structured}).
1405: \item{\bf salb\index{salb}} does a structured read.
1406: \end{itemize}
1407:
1408: These functions should be careful not to allocate memory using \verb"malloc".
1409: \index{malloc}
1410:
1411: All of these functions are called from the ML ``thread'' (see
1412: section~\ref{fast}).
1413: Some C functions may require the (temporary) suspension of the ML thread,
1414: e.g. to do a garbage collection for a structured write.
1415: By setting the global variable \verb"cause"\index{cause} to a nonzero
1416: value and then returning, a function may cause control to be grabbed
1417: immediately thereafter in the thread\index{thread} controller.
1418: }
1419: \end{quotation}
1420:
1421: \section{Access to operating system services}
1422: \label{access}
1423:
1424: A typical operating system\index{operating system}
1425: provides many services, each with one or
1426: more operations (``system calls''). A very early version of Standard ML of
1427: New Jersey made little use of these services, so it sufficed to write
1428: an assembly-language interface to each desired system call, and then
1429: call the assembly-language functions from ML programs. As the ML environment
1430: grew more sophisticated, it needed more operating system services, so the
1431: number of assembly-language interface functions grew. This eventually
1432: became intolerable.
1433:
1434: Now there is just one assembly-language interface function, \verb"syscall",
1435: \index{syscall}
1436: that takes a list of arguments and a system-call number. This function
1437: pushes the arguments onto the stack and makes the system call. The ML standard
1438: library hides the \verb"syscall" function behind a variety of typechecked
1439: functions. Implementing these functions in ML rather than assembly language
1440: is better for two reasons: ML is safer and easier to program in, and (more
1441: important) the ML code doesn't need to be rewritten for each target
1442: architecture.
1443:
1444: \begin{quotation}
1445: {\small All Unix system calls return integer values. These are
1446: converted into ML integers (by shifting and incrementing) and returned
1447: as the result of \verb"syscall". Many system calls take parameters
1448: by reference that they stuff results into. These results are (of course)
1449: not appropriately tagged for ML. A byte-array (of the
1450: appropriate length) is used
1451: as the actual parameter for such an argument, and then
1452: the results can be extracted from the byte-array after the system call.
1453:
1454: Integer arguments to system calls are shifted right by \verb"syscall"
1455: to convert them into their (untagged) representation.
1456:
1457: String and byte-array arguments are left alone, yielding their natural
1458: representation. However, strings\index{string}
1459: in C (and Unix) are terminated
1460: by a null character, whereas strings in ML have a length prefix.
1461: The prefix is ignored by the kernel, since it occurs at offset -4 (bytes).
1462: To achieve null termination of strings, you may take the actual ML argument
1463: and concatenate the string \verb|"\000\000"| of two zero characters (two
1464: characters are required because of the unboxed representation of
1465: single-character strings, and the possibility that the actual argument
1466: is an empty string).
1467: }
1468: \end{quotation}
1469: \section{Constructor representations}
1470: \label{constructor}
1471: Those unfamiliar with ML might wish to skip this section.
1472:
1473: Some of the constructors\index{constructor}
1474: in an ML datatype\index{datatype} may not carry values.
1475: For example, in the type
1476: \begin{verbatim}
1477: datatype 'a option = NONE | SOME of 'a
1478: \end{verbatim}
1479: the \verb"SOME" constructor carries value of type \verb"'a"
1480: and the
1481: \verb"NONE" constructor carries no value. Constructors that carry no value
1482: are represented not as two-element records, but as (unboxed) small integers.
1483: In this case, \verb"NONE"
1484: is represented by the integer 0 (with a low-order 1 tag bit),
1485: and \verb"SOME(x)" is represented by a two-element record containing the value
1486: \verb"x" and a small integer representing the \verb"SOME" constructor.
1487: The ML program can distinguish which constructor has been used by
1488: testing for ``boxity:''\index{boxity}
1489: \verb"NONE" is an unboxed value\index{unboxed value}, \verb"SOME(x)"
1490: is a boxed value\index{boxed value},
1491: and the low-order bit will distinguish them.
1492:
1493: Finally, in the case that there is only one value-carrying constructor,
1494: and the value carried always has a boxed representation, the extra
1495: indirection record can be eliminated. Thus, for the {\bf list} datatype
1496: \begin{verbatim}
1497: datatype 'a list = NIL | CONS of ('a * 'a list)
1498: \end{verbatim}
1499: the constructor \verb"NIL" can be represented as the unboxed integer 0, the
1500: constructor \verb"CONS(a,b)" can be represented as a two-element record
1501: containing \verb"a" and \verb"b". Again, the low-order ``boxity'' bit
1502: distinguishes the constructors. If there had been several value-carrying
1503: constructors, then the boxity bit alone could not distinguish them,
1504: and an indirection record (containing the carried value and a small
1505: integer denoting the constructor) would be required.
1506:
1507: It is tempting to play more elaborate tricks with the representation of
1508: constructors. For example, the {\em option} datatype described above
1509: might not seem to need an extra indirection; surely the boxity bit
1510: should be enough to distinguish between \verb"NONE" and \verb"SOME(x)"?
1511: The problem is that the value \verb"x" might itself be either boxed or
1512: unboxed; the indirection guarantees that \verb"SOME(x)" will indeed be boxed.
1513:
1514: Similarly, it is tempting to use a clever representation for the datatype
1515: \begin{verbatim}
1516: type t = a * b
1517: datatype atom = NUMBER of int | RECORD of t
1518: \end{verbatim}
1519: Here, the values carried by the NUMBER constructor are always unboxed, and
1520: the values carried by the RECORD constructor are always boxed; the
1521: boxity can serve to distinguish them.
1522:
1523: The problem with these schemes is caused by the abstraction provided by
1524: ML {\em functors\index{functor}}:
1525: a functor might take the atom datatype as a parameter
1526: with some of its structure hidden:
1527: \begin{verbatim}
1528: functor F(sig type t
1529: datatype atom = NUMBER of int | RECORD of t
1530: end)
1531: \end{verbatim}
1532: and here it is not at all clear that \verb"t" is always boxed.
1533: (Unfortunately, this problem applies to the ``clever'' representation of the {\em list}
1534: datatype as well, but we have chosen to ignore it there---we can
1535: detect the (rare) problems with partially-abstracted lists
1536: as functor parameters and give error messages, but we didn't want
1537: to use an extra indirection in the representation of lists.)
1538: The problem of functor/datatype interaction
1539: could be considered a defect in the language design.
1540:
1541: \begin{quotation}
1542: {\small Note that the constructor tag is element 1 of the record and
1543: the value is element 0. There's no particular reason for this; it's
1544: probably a bad idea. For exception constructors, the constructor
1545: tag is not a small integer, it is a \verb"string ref". A ref cell is
1546: used instead of just a string so that (generative)
1547: exceptions may be compared
1548: for equality in pattern-matching exception\index{exception} constructors.
1549: }
1550: \end{quotation}
1551: \section{Suspending a process}
1552: \label{suspending}
1553: \index{exportML}
1554: Having designed a simple layout of runtime objects, we can now implement
1555: easily a variety of runtime services.
1556:
1557: A convenient feature of a programming environment is the ability to save
1558: the current state of the computation in a file, so that by executing
1559: that file on a later date the computation can be resumed. In an interactive
1560: system, one might wish to compile and load some programs, and then
1561: ``save the world'' so that in the next session these programs don't
1562: have to be re-loaded. This feature could also be useful for program
1563: checkpoints.
1564:
1565: It's not too hard to suspend a process in this way. It's even possible
1566: to make the resulting
1567: file an ordinary executable file. An executable file
1568: contains a header, a text segment, and a data segment. The (read-only)
1569: text segment
1570: of the saved file will be identical to the text segment of the currently
1571: executing process, so a single \verb"write" system call will serve to
1572: write it to the file. The data segment of the file will consist of the
1573: original data segment of the process plus any newly allocated heap memory;
1574: again, one or two \verb"write"s will put it neatly into the file.
1575: (It is useful to do a garbage collection immediately before saving,
1576: to minimize the size of the file.)
1577: Finally, the header can be synthesized appropriately.
1578:
1579: An early version of Standard ML of New Jersey used a runtime
1580: stack\index{stack}, and
1581: we had no end of trouble getting the stack saved correctly. A Unix executable
1582: file doesn't have a stack segment, so the stack had to be copied into the
1583: heap before saving. And after restoring and copying back to the stack
1584: segment, we found that we had creamed the (new process's) command-line
1585: arguments, etc. When we eliminated the runtime stack, these annoyances
1586: ceased.
1587:
1588: \begin{quotation}
1589: {\small
1590: \index{magic number}
1591: The resulting executable file has a ZMAGIC magic number on
1592: Berkeley Unix, and NMAGIC on SunOs 4.0. This is because ZMAGIC
1593: implies dynamic loading of shared libraries on SunOs 4.0, which complicates
1594: the address map. Creating executable files is annoyingly difficult
1595: in operating systems like SunOs and Mach, whose notion of a process
1596: address map is not as simple as in Berkeley Unix.
1597: }
1598: \end{quotation}
1599: \section{Structured writes and reads}
1600: \label{structured}
1601:
1602: \index{structured write}
1603: It's often necessary to take a data structure (with many records, pointers,
1604: strings, etc.) and write it to a file in binary form so it can be read
1605: back in quickly. It's always possible to do this in an {\em ad hoc} way
1606: for each different kind of data structure that has to be written, but
1607: it's sometimes difficult to get the pointer-sharing relations right.
1608: Since the runtime data format has enough information for the garbage-collector
1609: to traverse the structure, then a writer/reader must be able to traverse
1610: it as well.
1611:
1612: In fact, we can view the saving of such a structure
1613: as just a garbage collection.
1614: The argument to the \verb"structured-write" function is just a root pointer,
1615: and \verb"structured-write" must traverse all the data accessible from this
1616: root. In doing so, it must copy the data to a file, being careful
1617: to make only one copy of each record.
1618:
1619: \index{garbage collection}
1620: We can, in fact, just invoke the \verb"gc" procedure, giving as the
1621: \verb"roots" argument just a singleton vector---the pointer that
1622: was handed to \verb"structured-write". Then, the resulting {\em tospace}
1623: will be exactly the desired contents of the file, which can be written
1624: out in a single operation.
1625:
1626: At this point, however, memory is in a bit of a mess; some objects have
1627: been forwarded and others have not. One solution is just to finish the
1628: garbage collection: each of the ``normal'' roots of the data is
1629: forwarded, and then (again) all the remaining unforwarded pointers in
1630: {\em tospace} are forwarded. Then memory is again consistent, and execution
1631: may continue.
1632:
1633: This is unattractive because the time required to do a structured write
1634: is now proportional to the amount of all live data, not proportional
1635: to the amount written. So a different solution is better: keep
1636: a separate list of
1637: repair\index{repair}
1638: information about the {\em fromspace} objects that were overwritten with
1639: forwarding pointers. Whenever a {\em fromspace} object is forwarded,
1640: two words of information must be kept about it: the location of the
1641: object itself, and
1642: the previous value
1643: of its first word (that was overwritten). Then, repair is easy;
1644: the list of repair information can be traversed, and each {\em fromspace}
1645: object is restored to its original state.
1646: The descriptor can be recovered by copying the descriptor of the
1647: {\em tospace} copy (accessed via the forwarding pointer), and the first
1648: word (which had been overwritten by the forwarding pointer) can then
1649: be recovered from the repair information.
1650: Saving the repair information is fast, and performing the repairs is also
1651: fast. The time to do a structured write is now proportional (with a small
1652: constant of proportionality!) to the amount written.
1653:
1654: There's one last problem. Where do we keep the repair information, and
1655: what if it grows too large? The repair information can be kept at the
1656: far end of {\em tospace}. Then, as long as the amount written is fairly small,
1657: we'll never run into it. And if we do run into it, that means that
1658: the amount written must be at least half the size of {\em tospace} (where {\em tospace}
1659: is always at least the size of the total accessible data). What we can do
1660: in this case is just to toss away the repair information, and go ahead
1661: with a garbage collection!
1662:
1663: \begin{quotation}
1664: {\small In a multi-generation system (as in SML-NJ's runtime system),
1665: we do a (cheap) minor collection before the structured write; the collection
1666: referred to in the paragraphs above is a major collection.
1667: }
1668: \end{quotation}
1669: \section{Buffered input/output}
1670: \label{buffered}
1671:
1672: Some operating systems (like Unix) don't provide buffered\index{buffered I/O}
1673: input and output
1674: as a primitive, and even in operating systems that do, the overhead
1675: of one system call per character may be too high. What the runtime
1676: system can do is provide this service for user programs.
1677:
1678: The Unix/C\index{C} standard library\cite{kernighan78} demonstrates that
1679: \index{standard library}
1680: the runtime system need not provide this service; instead, library
1681: routines can implement buffered I/O. We have taken the same approach
1682: in our system; the buffered I/O functions are implemented in ML as library
1683: functions, and the runtime system is completely ignorant of buffered I/O.
1684:
1685:
1686: \section{Execution profiling}
1687: \label{execution}
1688:
1689: \index{profiling}
1690: In large software systems it can become difficult to tell where
1691: the inefficiencies are. Execution profilers can help with this process.
1692: A typical profiler might provided information about the amount of time
1693: spent in each procedure and the number of times it was called.
1694:
1695: Standard ML of New Jersey has an execution profiler\cite{appel88:prof},
1696: which will only briefly be described here. Call counts are handled
1697: completely independently of the runtime system; the compiler inserts
1698: program variables that are incremented on entry to procedures, and the
1699: profiler examines these variables to generate call-count information.
1700:
1701: Execution-time estimates are more difficult. The Unix \verb"prof" system
1702: call starts a timer interrupt in the operating system that periodically
1703: samples the program's program counter. An affine function is applied
1704: to the PC and the corresponding element of a ``histogram'' array is
1705: incremented. Then the histogram can be compared against the object
1706: file to see how many interrupts occurred in each procedure, which gives
1707: a good estimate of actual time spend in each procedure.
1708:
1709: With a garbage collector that moves procedures around in memory, this
1710: method becomes unwieldy. Instead, the runtime system has a variable
1711: \verb"current" that points at a sample-count cell for the currently
1712: executing procedure. Whenever (profiled) code begins executing a procedure
1713: (or returns from a call), it assigns the address of the new procedure's
1714: \index{current}
1715: sample-count variable into \verb"current" (the compiler inserts code to do
1716: this). The timer interrupt, instead
1717: of examining the PC, just increments the cell pointed to by \verb"current".
1718:
1719: This very simple mechanism does not greatly complicate the runtime system,
1720: but provides profiling information with a relatively low overhead.
1721:
1722: \section{Debugging}
1723: \label{debugging}
1724:
1725: We are currently developing a high-level
1726: debugger\index{debugger} based on the same principles as our
1727: profiler---namely, minimal interaction with the runtime system.
1728: The work is still in its early stages, but we believe that
1729: it is possible to make a debugger this way, and that the runtime system
1730: won't have to grow much in size to support it. A detailed description
1731: of our plans is beyond the scope of this paper.
1732:
1733: \section{Handling of interrupts and exceptions}
1734: \label{handling}
1735:
1736: The ML language has an exception\index{exception}-handling
1737: mechanism with dynamically-nested
1738: handlers. Some exceptions are raised by the program itself, some are related
1739: to synchronous hardware exceptions (like floating point errors), and
1740: some are generated asynchronously (like the \verb"Interrupt" exception
1741: raised when the user presses the interrupt key).
1742:
1743: The ML program has a ``current-exception-handler'' register that just
1744: contains a pointer to a continuation\index{continuation}
1745: (i.e. a closure).
1746: Raising an exception just
1747: corresponds to invoking this continuation with the exception-object
1748: as an argument. Dynamic nesting of handlers is handled completely by the
1749: compiler: it generates code that saves the previous handler (in the
1750: new handler's closure) and restores it upon exit from the scope of the
1751: new handler.
1752:
1753: This makes it particularly easy for the programming language's exception
1754: mechanism to be attached to the hardware's notion of fault or interrupt.
1755: When a fault or interrupt\index{interrupt}
1756: occurs, the operating system calls a special
1757: (assembly-language)
1758: function in the runtime system. This function then determines what
1759: caused the fault, clears it if necessary, then {\em does not} return to the
1760: operating system but just invokes the current-exception-handler continuation
1761: on the appropriate exception object.
1762:
1763: \section{Fancier garbage collectors and object boundaries}
1764: \label{fancier}
1765:
1766: Our current system uses a good, simple generational garbage
1767: collector\cite{appel89:sggc}. Generation garbage
1768: collection\cite{lieberman83}\cite{ungar84} is based on the observations
1769: that newer records tend to become garbage more quickly, so that the
1770: collector should concentrate its effort on the newer records; and that
1771: newer records tend to point to older records, but not vice versa, so that
1772: the older records need not be searched for roots into the newer area.
1773:
1774: Sometimes, however, older records are modified (by a \verb"rplaca" operation
1775: in Lisp or an assignment operation in ML)
1776: to point to newer ones.
1777: Generational garbage collectors require
1778: the maintenance of a list of all modified records.
1779: To ensure portability,
1780: we use a very simple scheme to maintain this list: the compiler
1781: inserts instructions after each \verb"update" operation to put the
1782: address of the updated cell on a special list for the garbage collector
1783: to use.
1784:
1785: \index{page fault}\index{virtual memory}\index{garbage collector}
1786: A clever trick is to use the virtual memory system in maintaining this list.
1787: It is only updates to records in older generations that must be remembered;
1788: we can just make all the memory occupied by older generations read-only,
1789: and an update will cause a page fault. The fault-handler can then put the
1790: page on a list of updated pages, and mark the page as writeable.
1791: During garbage collection, the entire page can be searched for
1792: roots into the newer region.
1793:
1794: A different virtual-memory trick can be used to achieve concurrent garbage
1795: collection\cite{appel88:gc}. By making the pages of the {\em tospace} inaccessible
1796: until they have been scanned and forwarded, we can allow the ML program
1797: to continue executing even while the garbage collector is running.
1798: An access to an unscanned page of {\em tospace} causes a page fault, and the
1799: fault handler scans that page and makes it accessible.
1800: Then the latency is bounded by the product of the page size and the
1801: record size (in practice, to just a few milliseconds).
1802: This is very desirable in a real-time or interactive system.
1803:
1804: Both of these algorithms require that pages be scanned (and forwarded)
1805: out of order. When objects cross page boundaries, it is difficult
1806: to know where the first object on a page begins, so that its descriptor
1807: may be found. There are solutions to this problem that involve keeping
1808: track of objects that cross page boundaries\cite{appel88:gc}, but the
1809: SML-NJ runtime system has a runtime data format that allows a very
1810: simple solution.
1811:
1812: Suppose we need to scan a particular page (to forward all its pointers).
1813: When we start at the beginning of the page, we don't know whether
1814: we are in the middle of a record or of a string; and we don't know
1815: where the descriptor for the object is.
1816: But if we knew that the page contained only records, then we wouldn't need
1817: to find the descriptor;
1818: the important thing about records is that their boundaries can be ignored.
1819: If a page consists of the last part of one record, followed by several
1820: complete records, followed by the first part of the last record, the
1821: pointers in all of those records can be scanned and forwarded
1822: without regard to the boundaries. Since the descriptors are unboxed
1823: (they look like integers), they won't be touched; and all the other
1824: fields are tagged pointers or integers which will be forwarded (or not)
1825: appropriately. (See figure~\ref{boundaries} for an illustration.)
1826: \begin{figure}[htbp]
1827: \label{boundaries}
1828: \begin{verbatim}
1829: +------+--object boundary
1830: | desc |
1831: +------+
1832: | o------------>
1833: -----+------+-----page boundary
1834: | 5 |
1835: +------+--object boundary
1836: | desc |
1837: +------+
1838: | o------------>
1839: +------+
1840: | o------------>
1841: +------+--object boundary
1842: | desc |
1843: +------+
1844: | 7 |
1845: +------+
1846: | o------------>
1847: -----+------+-----page boundary
1848: | 6 |
1849: +------+
1850: | o------------>
1851: +------+--object boundary
1852: \end{verbatim}
1853: \caption{Object boundaries may be ignored by the forwarding algorithm}
1854: \end{figure}
1855:
1856: And if we knew that the page contained only strings, then it wouldn't
1857: need to be scanned at all, since strings contain no pointers.
1858:
1859: Thus, a simple arrangement that's useful when pages need to be scanned
1860: in arbitrary order is
1861: to segregate pointer-containing objects (records and arrays) from
1862: non-pointer-containing objects (strings and byte-arrays). This isn't necessary
1863: for the newly-allocated objects; the compiler can continue to maintain
1864: just one register that points at the beginning of the free space. But
1865: the copying garbage collector (the functions \verb"gc" and \verb"forward")
1866: must copy records and strings into different parts of memory.
1867: This greatly simplifies the implementation of page-based garbage collection
1868: algorithms.
1869:
1870: \section{Size of the SML-NJ runtime system}
1871: \label{size}
1872:
1873: Our runtime system is implemented in 1548 lines of C and 232 lines of
1874: assembly language (the assembly language must be duplicated for each
1875: target architecture). This is small indeed, considering the functionality
1876: of our Standard ML system. We have made it small by moving as much as possible
1877: out of the runtime system, and by having a simple format for ML data.
1878:
1879: This 1780 lines of code can be divided approximately as follows:
1880: \begin{enumerate}
1881: \item[291] lines of C to initialize and link the Standard ML library
1882: and loader, which then loads the rest of the system.
1883: {\tiny (run.c)}
1884: \item[299] lines to implement the copying garbage collector with repairs.
1885: {\tiny (gc.c)}
1886: \item[55] lines of C to implement the simulated process-switching required
1887: for garbage-collector access to ML registers.
1888: {\tiny (callgc.c)}
1889: \item[400] lines for the management of generational garbage collection,
1890: and for deciding when to ask the operating system for more memory.
1891: {\tiny (callgc.c)}
1892: \item[304] lines to implement C-language functions and data structures
1893: used as primitives by the ML program.
1894: {\tiny (cstruct.c)}
1895: \item[117] lines of C to implement the suspending of process states into
1896: Unix executable files.
1897: {\tiny (export.c)}
1898: \item[72] lines of utility functions for C functions that manipulate ML
1899: objects.
1900: {\tiny (objects.c)}
1901: \item[38] lines of assembly language to handle simulated process switching.
1902: {\tiny (*.prim.s)}
1903: \item[195] lines of assembly language to implement primitive functions
1904: callable by ML programs.
1905: {\tiny (*.prim.s)}
1906: \end{enumerate}
1907: For comparison, the Icon runtime system\cite{griswold86} (excluding
1908: its interpreter) is 18,000 lines of C; the T3\cite{kranz86} runtime system is
1909: 1,900(?) lines of C and assembly language; the FranzLisp runtime system
1910: (including interpreter)
1911: is 19,500 lines of C; the Poly/ML runtime system is ??? lines of ???.
1912:
1913: \section{Conclusion}
1914: \label{conclusion}
1915:
1916: It's easy to implement a hi-tech runtime system if you do it cleanly.
1917:
1918: \section{Acknowlegements}
1919:
1920: David MacQueen, Trevor Jim, and Bruce Duba have all worked on the runtime
1921: system, and their assistance has been valuable. Norman Ramsey provided
1922: many useful suggestions about the organization of this paper.
1923:
1924: %\cleardoublepage
1925: \bibliographystyle{plain}
1926: \bibliography{appel}
1927: %\cleardoublepage
1928: %\printindex
1929: \end{document}
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.