|
|
1.1 ! root 1: \documentstyle[11pt]{article} ! 2: \bibliographystyle{plain} %changed from alpha ! 3: \documentstyle{openbib} ! 4: \makeindex ! 5: \oddsidemargin .5in % Left margin on odd-numbered pages. ! 6: \evensidemargin .5in % Left margin on even-numbered pages. ! 7: \topmargin .5in % Nominal distance from top of page to top of ! 8: \headheight 0pt % Height of box containing running head. ! 9: \headsep 0pt % Space between running head and text. ! 10: % Bottom of page: ! 11: \footheight 12pt % Height of box containing running foot. ! 12: \footskip 30pt % Distance from baseline of box containing ! 13: % foot to baseline of last line of text. ! 14: \textwidth 5.5in % width of text on page ! 15: ! 16: %\newcommand{\mysection}[1]{\section{\protect\large\bf #1}} ! 17: %\setcounter{secnumdepth}{0} ! 18: ! 19: \sloppy ! 20: ! 21: \begin{document} ! 22: \title{A Runtime System \\ (DRAFT)} ! 23: \author{Andrew W. Appel\thanks{ ! 24: Supported in part by NSF Grants ! 25: DCR-8603453 and CCR-880-6121} ! 26: \\ Princeton University} ! 27: \date{February 1989} ! 28: \maketitle ! 29: \begin{abstract} ! 30: The runtime data structures of the Standard ML of New Jersey compiler ! 31: are simple yet general. As a result, code generators are easy to implement, ! 32: programs execute quickly, garbage collectors are easy to implement and ! 33: work efficiently, and a variety of runtime facilities can be provided with ! 34: ease. ! 35: \end{abstract} ! 36: ! 37: \section{Introduction} ! 38: \label{introduction} ! 39: ! 40: Some languages, like Lisp, Smalltalk, ML, Prolog, etc. rely heavily on a ! 41: {\em runtime system} to provide essential services. The most important ! 42: such service is the management of dynamically-allocated storage ! 43: (e.g. {\bf garbage collection}), but the runtime system may also provide ! 44: facilities like ! 45: ! 46: \begin{itemize} ! 47: \item {\bf Stream input/output:} on operating systems (like Unix) that do not ! 48: have a buffered input/output facility, the process must provide its own; ! 49: this might be handled in the runtime system. ! 50: \item {\bf Structured input/output:} the ability to automatically ! 51: write a large linked data structure to a file, and read it back in ! 52: with all links adjusted, is a great convenience that can be implemented ! 53: efficiently in the runtime system. ! 54: \item {\bf Process suspension:} a snapshot of an ! 55: executing process may be ``preserved'' in a file, so that ! 56: the execution of that file causes a new process to start exactly ! 57: where the saved one left off. ! 58: \item {\bf Operating system calls:} operating system services needed by ! 59: a program may be conveniently packaged by the runtime system. ! 60: \item {\bf Handling of interrupts and asynchronous events:} if the programming ! 61: language has a mechanism to handle asynchronous events, it relies on the ! 62: runtime system for its implementation. ! 63: \item {\bf Handling of arithmetic exceptions:} the programming language's ! 64: exception-handling mechanism must be implemented in cooperation with ! 65: the runtime system. ! 66: \item {\bf Assembly-language implementation of language primitives:} ! 67: it may be inconvenient for the compiler to generate code for some features ! 68: of the programming language; these functions can be implemented as calls ! 69: to runtime system routines. ! 70: \item {\bf Foreign-language procedure calls:} calls to subroutines written ! 71: in other languages may be mediated by the runtime system. ! 72: \item {\bf Fun with continuations:} Languages with features like {\em call with ! 73: current continuation}, which allows the explicit manipulation of threads ! 74: of control, require runtime-system cooperation. ! 75: \item {\bf Execution profiling:} Automatic measurement of the time spent in ! 76: different parts of the user program can be accomplished with the help of ! 77: the runtime system. ! 78: \item {\bf Debugging:} starting, stopping, and displaying the execution state ! 79: of user programs can be accomplished only by low-level routines. ! 80: \end{itemize} ! 81: ! 82: Since this list is rather long, and several of these features may interact, ! 83: it is evident that runtime systems can become nasty and complicated. ! 84: The proliferation of data types may make the implementation of ! 85: the garbage collector (and other programs that must traverse the data) ! 86: inefficient. ! 87: ! 88: We particularly wanted a runtime data layout that provided fast ! 89: allocation of records and fast garbage collection, since ML makes very ! 90: heavy use of dynamically-allocated storage. To this end, we eliminated ! 91: the runtime stack and use a very simple data format. ! 92: The only runtime data types are {\em integers}, {\em pointers}, ! 93: {\em records}, and {\em strings}. ! 94: Then we found that this very simple structure allowed the easy implementation ! 95: of many of the services described above. ! 96: ! 97: Our runtime system has relatively few ML-specific ! 98: features; it could be used equally well for other languages. This paper ! 99: describes the design and implementation of the SML-NJ runtime system. ! 100: It should be read in conjunction with an earlier paper on the ! 101: SML-NJ generational garbage collector\cite{appel89:sggc}. ! 102: ! 103: \begin{quotation}{ ! 104: \small Text in smaller type is of interest only to those who might ! 105: have to read, modify, or maintain the source code to the runtime system. ! 106: }\end{quotation} ! 107: ! 108: \section{Standard ML of New Jersey} ! 109: \label{standard} ! 110: The ML language originated as part of the Edinburgh LCF proof system ! 111: \cite{gordon78}, and was soon implemented as a stand-alone compiler ! 112: \cite{cardelli83:ml}\cite{cardelli84}. The language was ! 113: ``standardized'' \cite{milner84}\cite{macqueen84} ! 114: \cite{milner85}\cite{harper88}, and ! 115: Standard ML has been implemented at Edinburgh, Cambridge, and ! 116: New Jersey\cite{appel87:sml}; the New Jersey implementation is ! 117: a joint effort between researchers at AT\&T Bell Laboratories and ! 118: Princeton University. ! 119: ! 120: Though ML was first implemented as the meta-language of a theorem-proving ! 121: system, Standard ML is a general-purpose ! 122: programming language with several advantages over more conventional ! 123: languages. Its important characteristics are: ! 124: ! 125: \begin{itemize} ! 126: \item {\bf Automatic garbage collection:} this is a great convenience in ! 127: writing correct and readable programs. ! 128: \item {\bf Static, polymorphic types:} like Pascal, types are checked ! 129: at compile-time and not at runtime; but like Lisp, there is great flexibility ! 130: and re-usability of code. ! 131: \item {\bf Safety:} there are no runtime insecurities (i.e. ``core is never ! 132: dumped''); this is unlike the C language, where unsafe pointers run ! 133: rampant, and like Lisp (except when Lisp programmers turn off the runtime ! 134: type checking for efficiency). In ML, there is no run-time type checking, ! 135: but safety is guarateed by compile-time type checking. ! 136: \item {\bf Higher-order functions:} like Scheme (and lambda-calculus); this ! 137: can lead to a desirable conciseness of expression. ! 138: \item {\bf Typechecked Modules:} like Ada and Modula. ! 139: \item {\bf Exception handling:} a dynamically-scoped exception ! 140: mechanism allows both hardware- and software-generated exceptions ! 141: to be caught and handled by user programs. ! 142: \end{itemize} ! 143: ! 144: One important feature of a systems programming language ! 145: is not prescribed by the list above, and in fact cannot be specified ! 146: by a language definition: an efficient and robust implementation. ! 147: Standard ML of New Jersey\cite{appel87:sml} is meant to be a complete, ! 148: efficient, robust, and cleanly written compiler for the language. ! 149: It has an optimizing code generator based on continuation-passing style ! 150: \cite{appel89:cps}. ! 151: ! 152: A knowledge of ML is not necessary to understand its runtime system. ! 153: ! 154: \section{Tagging schemes} ! 155: \label{tagging} ! 156: ! 157: Almost all the pieces of the runtime system must deal with the data structures ! 158: of the executing program. Therefore, it is helpful to keep the format of ! 159: this data as simple and straightforward as possible. ! 160: ! 161: ML, like Lisp, allows polymorphic functions: a function that reverses a list ! 162: of objects (for example) need not know the type of the objects. In order that ! 163: the same piece of executable code can operate on objects of arbitrary type, ! 164: it is necessary that every object be represented in the same amount of space. ! 165: As in Lisp, this is achieved by making everything be the size of a pointer; ! 166: if an object's natural representation is larger (as for a record of $n$ ! 167: objects), then it is represented by a pointer to storage on the heap. ! 168: ! 169: The garbage collector must be able to determine the size and layout of each ! 170: object it traverses. This can be accomplished in several ways: ! 171: \begin{itemize} ! 172: \item {\bf By encoding a type-tag\index{tag} ! 173: inside each pointer;} we chose not to do this ! 174: because it reduces the number of bits left for actual addressing. ! 175: In these days of 32-bit pointers and 100-megabyte memories, it is easy ! 176: to imagine the need for every bit of addressability we can muster. ! 177: \item {\bf By reserving different areas of memory for objects of different ! 178: types.} ! 179: BIBOP\index{BIBOP} (Big Bag Of Pages) schemes require that each page hold objects of a single ! 180: type, and a global table maps pages to types. This is relatively efficient, ! 181: and doesn't reduce the number of addressable words. But it complicates the ! 182: process of allocating and copying objects, since several free regions ! 183: are simultaneously required. ! 184: \item {\bf Statically-typed languages don't require any runtime-tagging} at all; ! 185: instead, the compiler can tell the garbage collectector about the type ! 186: system of the program. This has worked well in Pascal\cite{britton75}. ! 187: Even though it is also theoretically possible in ML, in practice the ! 188: polymorphic type system introduces overhead and complexity that make ! 189: this method unattractive\cite{appel89:tag}. ! 190: \item {\bf Each record can have a descriptor at the beginning} that explains ! 191: which fields are integers and which are pointers. This method works ! 192: well in non-polymorphic languages like Modula-2+ and Mesa, where ! 193: descriptors can be computed at compile-time and just inserted ! 194: in records as they are created. However, ! 195: in polymorphic languages like Lisp and ML ! 196: this descriptor would have to be laboriously ! 197: constructed each time a record is created, introducing unacceptable overhead. ! 198: \item {\bf Each record can have a descriptor\index{descriptor} at the ! 199: beginning} that tells ! 200: the length of the record, and {\bf each field can have a tag bit} that tells ! 201: whether it is a pointer or a non-pointer. ! 202: This is what we have done. ! 203: \end{itemize} ! 204: ! 205: To have a tag word on each record, and a tag bit on each ! 206: field, is not clever at all; but we are not always ! 207: striving for cleverness, we ! 208: want simplicity and efficiency. If all our records were {\em cons} cells, ! 209: then one-third of memory would be devoted to tag words; but memories ! 210: are large and cheap. And in Standard ML, records are of many different ! 211: sizes, and two-word records are not particularly predominant. ! 212: ! 213: By having only one free region, we are ! 214: able to do fast allocation\index{allocation} ! 215: and also manage the total amount of virtual ! 216: memory used\cite{appel89:sggc}; this would be much more difficult ! 217: with a BIBOP scheme. ! 218: ! 219: \section{Data formats in our runtime system} ! 220: \label{data} ! 221: Standard ML of New Jersey has ! 222: records, strings, procedures (machine code), ! 223: closures, constructors, arrays, byte-arrays, floating-point ! 224: numbers, references (modifiable one-word records), ! 225: modules, and integers. This large set of language primitives and ! 226: user-defined datatypes are represented ! 227: by just two runtime data formats: one for objects that contain pointers ! 228: (records, closures, constructors, arrays, references, modules), and ! 229: the other for objects containing no pointers (strings, procedures, ! 230: byte-arrays, floating point). The garbage collector (and other ! 231: parts of the runtime system) need to understand only these two formats, ! 232: not the many kinds of ML objects. ! 233: ! 234: A {\em field\index{field}} is either a ! 235: {\em pointer\index{pointer}} or an {\em integer\index{integer}}. Pointers ! 236: have a low-order bit of 0; integers have a low-order bit of 1. ! 237: It is necessary to distinguish pointers from non-pointers in order that ! 238: the garbage collector\index{garbage collector} ! 239: will know what structures to traverse. ! 240: On a byte-addressable machine, all pointers to aligned (4-byte) words ! 241: are multiples of 4 anyway, so pointers have a low-order bit of 0 in ! 242: their natural representation. The high-order 31 bits of an integer ! 243: field can represent an integer in the range $[-2^{30}, +2^{30}-1]$. ! 244: ! 245: Some Lisp implementations use a similar representation except that ! 246: pointers have a low-order 1 and integers have a low-order 0. This makes ! 247: arithmetic on tagged integers somewhat easier, and takes advantage ! 248: of the fact that pointers are usually used with a displacement addressing mode; ! 249: the displacement can be adjusted by 1 with no penalty in efficiency. ! 250: This is probably a better arrangement, but ! 251: either version of this scheme will work; and a simple analysis\cite{appel89:tag} ! 252: shows that the efficiency trade-offs are negligible, and that ! 253: doing arithmetic around the low-order integer tag is not very costly. ! 254: ! 255: An {\em object on the heap} may be either a {\em record\index{record}} ! 256: (containing fields, ! 257: i.e. pointers and tagged integers) or ! 258: a {\em string\index{string}} ! 259: (containing bytes of an arbitrary bit-pattern, but no pointers). ! 260: ! 261: A {\em record} is a sequence of $n>0$ fields numbered $0,1,...,n-1$. ! 262: Each record has a {\em descriptor} at position $-1$. The low-order bit ! 263: of a descriptor\index{descriptor} is 1 ! 264: (making it look like an integer), the next three ! 265: bits identify the object as a record, and the high-order 28 bits give the ! 266: number of fields. Thus, each record is limited to one gigabyte, which ! 267: should not be a significant limitation (seriously, though, it is important ! 268: to avoid miserly restrictions on the sizes of objects). ! 269: ! 270: In the implementation of closures, it is useful to be able to point ! 271: at the interior of records. However, it is always necessary for the ! 272: garbage collector, given a pointer, to find the descriptor of an object. ! 273: If the ! 274: fields $0$ through $k-1$ of a record are all pointers, then a pointer ! 275: can point at the $k^{\rm th}$ field of the record; since the ! 276: descriptor is unboxed (has a low-order bit of 1) and the fields $0$ through ! 277: $k-1$ will all be boxed, the garbage collector can easily search backward ! 278: for the descriptor of the record. So, we allow pointers to the interior ! 279: of a record, providing all the previous fields are boxed; this is sufficiently ! 280: flexible for our needs in implementing closures, as will be described. ! 281: Otherwise, ! 282: no pointer may address the interior of a record. ! 283: ! 284: A {\em string\index{string}} ! 285: contains $n>0$ bytes (numbered from $0$ to $n-1$), ! 286: padded with trailing zero bytes ! 287: to a multiple of four. ! 288: Immediately preceding the $0^{\rm th}$ byte is a one-word ! 289: descriptor whose low-order bit is 1, whose next three bits describe ! 290: the type of string (and distinguish strings from records), ! 291: and whose upper 28 bits give the length in bytes. ! 292: ! 293: Strings are used for a variety of purposes; ! 294: they can hold printable characters, real numbers, machine code, ! 295: or any other kind of data that doesn't contain pointers. In fact, ! 296: a string object can hold several embedded string objects. ! 297: A pointer may point at the $0^{\rm th}$ byte of a string, and may point to ! 298: an embedded string at the ! 299: $k^{\rm th}$ byte provided that $k>0$ is a multiple of 4 and that the bytes ! 300: $k-4, k-3, k-2, k-1$ contain a {\em back-pointer\index{back-pointer}} ! 301: descriptor that ! 302: contains the number $k$ (the offset to the beginning of the record). ! 303: This allows the beginning of a string object ! 304: to be found quickly, given a pointer into it. ! 305: Note that a back-pointer couldn't be distinguished from data, except ! 306: for the fact that it is found just before the pointed-to byte of the ! 307: string. ! 308: ! 309: The three tag\index{tag} bits of a descriptor\index{descriptor} ! 310: can denote any of these kinds of records, ! 311: strings, etc: ! 312: ! 313: \begin{itemize} ! 314: \item[0] record\index{record} ! 315: \item[1] forwarding-pointer\index{forwarding pointer} ! 316: (for the garbage collector\index{garbage collector}) ! 317: \item[2] back-pointer (that precedes an referenceable location in a string) ! 318: \item[3] embedded string\index{embedded string} length (described below) ! 319: \item[4] array\index{array} (just like a record). ! 320: \item[5] byte-array\index{byte-array} ! 321: (just like a string). ! 322: \item[6] this tag value is unused. ! 323: \item[7] string\index{string} ! 324: \end{itemize} ! 325: ! 326: Because of ML's static type system, it is not necessary to put type information ! 327: in the descriptors of objects. Therefore, tags are necessary only ! 328: for the garbage collector's benefit, since it must distinguish objects of ! 329: different formats. ! 330: ! 331: The only exception to this rule is that mutable objects (arrays, byte-arrays) ! 332: must be distinguishable from immutable objects (records, strings) ! 333: in order to implement the polymorphic ! 334: equality\index{equality} ! 335: feature. For all purposes of the runtime system, arrays ! 336: are identical to records, and byte-arrays are identical to strings. ! 337: ! 338: A dynamically-typed language like Lisp would require more than ! 339: three bits to distinguish types of objects; this would not pose a ! 340: serious problem. It would still be possible in Lisp to have just ! 341: two formats (pointer-containing and pointer-free), as in our runtime system. ! 342: ! 343: \section{Forwarding pointers} ! 344: \label{forwarding} ! 345: \index{forwarding} ! 346: The Standard ML of New Jersey runtime system has a generational garbage ! 347: collector\index{garbage collector} ! 348: that takes advantage of object lifetime and referencing ! 349: patterns\cite{appel89:sggc}. ! 350: But at the heart of any generational garbage collector is a simple ! 351: copying garbage collector as originally described by Cheney\cite{cheney70}. ! 352: Objects are copied from {\em fromspace} to {\em tospace} in a breadth-first ! 353: order, with the {\em tospace} itself serving as the queue for the breadth-first ! 354: traversal. The {\em fromspace} versions of objects are overwritten ! 355: with {\em forwarding pointers}, so that when other references to them are ! 356: found, it is easy to find the {\em tospace} copies of them. ! 357: ! 358: The fundamental operation in Cheney's algorithm is to {\em forward} a pointer. ! 359: This means taking a pointer into {\em fromspace} and making it point to {\em tospace}. ! 360: If the {\em fromspace} object it points to has already been copied, then its ! 361: forwarding pointer is taken as the new value; otherwise, the object must ! 362: be copied to {\em tospace} and a forwarding pointer installed. ! 363: ! 364: Forwarding is relatively easy using our runtime data format. We have a special ! 365: kind of descriptor {\bf forwarding-pointer}, that indicates that a {\em fromspace} ! 366: object has already been copied. If an object has this descriptor, then ! 367: the first word (after the descriptor) is to be interpreted as the address ! 368: of the copy. The only complications in forwarding are that pointers ! 369: may point into the middle of records and strings. ! 370: ! 371: Our runtime system (and garbage collector) is implemented in C. ! 372: In figures~\ref{forward}, \ref{trappc}, \ref{gc} we show the entire ! 373: core of our copying garbage collector. We could use pseudo-code ! 374: and describe the algorithm abstractly, but we want to emphasize that ! 375: our simple runtime data format does permit an efficient and easy-to-implement ! 376: garbage collector. ! 377: ! 378: We pretend that all ML values are integers, and make liberal use of ! 379: casts to maintain this pretense. Our \verb"forward" function is ! 380: shown in figure~\ref{forward}; it ! 381: takes an ML value (by reference) and modifies it (if a pointer into ! 382: {\em fromspace}) to point into {\em tospace}. ! 383: \begin{figure}[htbp] ! 384: \label{forward} ! 385: \begin{verbatim} ! 386: 1 forward(int *refloc) ! 387: 2 {register int *m = *((int**)(refloc)), len; ! 388: 3 if( (m&1)==0 && (m >= (int*)lowest && m < (int*)highest)) ! 389: 4 { m--; /* make m point at the descriptor */ ! 390: 5 while(1) ! 391: 6 {len = (*m)>>4; ! 392: 7 switch(m&15) ! 393: 8 {case tag_backptr: ! 394: 9 m -= len; ! 395: 10 continue; ! 396: 11 case tag_string: ! 397: 12 case tag_bytearray: ! 398: 13 len = (len+3)/4; ! 399: ! 400: 19 /* fall through */ ! 401: 20 case tag_record: ! 402: 21 case tag_array: ! 403: 22 {int **i=(int**)m, **j=to_ptr; ! 404: 23 while (j+len >= to_lim) ! 405: 24 do to_lim=gmore(); ! 406: 25 while (len-- >= 0) ! 407: 26 do {*j++ = *i++;} ! 408: 27 ((int**)m)[1]= 1+(int*)to_ptr; ! 409: 28 to_ptr = j; ! 410: 29 } ! 411: 30 (*m) = tag_forwarded; ! 412: 31 /* fall through */ ! 413: 32 case tag_forwarded: ! 414: 33 *(int*)(refloc) += ((int*)m)[1] - ((int)(m+1)); ! 415: 34 return; ! 416: 35 case tag_embedded: ! 417: 36 default: ! 418: 37 m--; continue; ! 419: 38 }} } } ! 420: \end{verbatim} ! 421: \caption{The {\tt forward} function} ! 422: \end{figure} ! 423: Line 2 establishes $m$ as a copy of the original value to be forwarded. ! 424: Line 3 considers only the case that $m$ is a pointer, and points into ! 425: {\em fromspace}. (Any pointers not into {\em fromspace} are treated as constants.) ! 426: Line 4 adjusts m to point at the descriptor of the {\em fromspace} object. ! 427: ! 428: Lines 5--38 loop until the beginning of the object is found. ! 429: If [line 8] $m$ points to a back-pointer\index{back-pointer}, ! 430: the appropriate offset is subtracted ! 431: and we start again; similarly, if [lines 35,36] $m$ points to a pointer ! 432: (which can happen if we ! 433: point into the middle of a closure\index{closure} record) or an ! 434: embedded\index{embedded string} descriptor, ! 435: then $m$ is decremented and we try again. ! 436: ! 437: If $m$ is a string\index{string length} or byte array [line 13], ! 438: then we adjust its $len$ to count in words rather ! 439: than bytes (i.e. we divide by 4, rounding up). ! 440: (Lines 14--18 are explained below.) ! 441: ! 442: Now at line 22 we have either a string or record in {\em fromspace} ! 443: that needs to be copied. We verify that there's enough space remaining ! 444: in {\em tospace}; if not, we call the function \verb"gmore" that will either ! 445: \index{gmore} ! 446: get more space or die trying. Then we copy all the words of the ! 447: object. Finally [lines 27 and 30] we install a forwarding pointer ! 448: into the {\em fromspace} object and mark its descriptor as forwarded. ! 449: ! 450: For an already forwarded object (line 33), whether it was forwarded ! 451: in a previous call or just now, adjust the reference \verb|*refloc| ! 452: to point at the new object. Since the reference might have pointed ! 453: at the middle of the old object, we must take care to make it point ! 454: to the corresponding location in the new object; this is accomplished ! 455: by the computation: ! 456: \[ ! 457: {\rm old\;pointer} + {\rm beginning\;of\;new\;object} - ! 458: {\rm beginning\;of\;old\;object.} ! 459: \] ! 460: ! 461: Then the \verb"forward" function returns. ! 462: ! 463: Almost without exception, all pointers to objects point to places where ! 464: an immediately preceding descriptor\index{descriptor} ! 465: explains the format of the object. ! 466: The exception is an artifact of the ! 467: very fast allocation mechanism that can create and initialize ! 468: a $k$-word object in $k+2$ instructions\cite{appel89:sggc}. ! 469: This mechanism relies on a page fault\index{fault} ! 470: trap to invoke the garbage collector ! 471: when the free space is exhausted; when this trap occurs, the program ! 472: \index{pc} ! 473: counter will point into the middle of a string object that might ! 474: be moved by the garbage collector. This is the only reference that ! 475: points to a place in a string that lacks a backpointer\index{back-pointer}. ! 476: It can be handled ! 477: relatively simply: as each string object is moved, we check to see ! 478: if the saved program counter points into the middle of it; if so, ! 479: we adjust the saved program counter. Since the number of string objects ! 480: is typically less than one percent of the number of record objects, ! 481: very costly overall. The test occurs at line 14, as shown in ! 482: figure~\ref{trappc}. ! 483: \index{trap_pc} ! 484: \begin{figure}[htbp] ! 485: \label{trappc} ! 486: \begin{verbatim} ! 487: ! 488: 14 if (!trap_pc_done ! 489: 15 && m < trap_pc && m+len >= trap_pc) ! 490: 16 {trap_pc_done=1; ! 491: 17 trap_pc += to_ptr - (int**)m; ! 492: 18 } ! 493: \end{verbatim} ! 494: \caption{Testing the (saved) program counter} ! 495: \end{figure} ! 496: ! 497: \section{Garbage collection} ! 498: \label{garbage} ! 499: ! 500: Once the \verb"forward"ing procedure is written, a simple copying garbage ! 501: collector is trivial (figure~\ref{gc}). ! 502: \index{garbage collector} ! 503: \begin{figure}[htbp] ! 504: \label{gc} ! 505: \begin{verbatim} ! 506: 1 gc(int *from_low, int *from_high, ! 507: 2 int *to_low, int *to_high, ! 508: 3 int **roots, ! 509: 4 int *trap_pcx, int *(*get_more)() ! 510: 5 ) ! 511: 6 { gmore=get_more; trap_pc = *trap_pcx; to_ptr = to_low; ! 512: 7 trap_pc_done = !(trap_pc>=(int*)from_low ! 513: 8 && trap_pc<(int*)from_high); ! 514: 9 lowest=from_low; highest=from_high; ! 515: ! 516: 10 while (*roots) forward(*roots++); ! 517: ! 518: 11 { int *x = to_low; ! 519: 12 while (x<to_ptr) ! 520: 13 { int *p = x+1, descr = (*x), len=descr>>4; ! 521: 14 if (string_or_bytearray(descr)) ! 522: 15 x += (len+3)/4 + 1; ! 523: 16 else {x += len+1; ! 524: 17 do {forward(p++);} while (p<x); ! 525: 18 } } ! 526: ! 527: 19 *trap_pcx = trap_pc; ! 528: 20 } ! 529: \end{verbatim} ! 530: \caption{The {\tt gc} function} ! 531: \end{figure} ! 532: The function \verb|gc| is parametrized by the lower and upper bounds ! 533: of {\em fromspace} and the lower and upper bounds of {\em tospace}. The \verb"roots" ! 534: parameter is a vector of pointers to the roots of accessible objects; ! 535: in the Standard ML system this is little more than the addresses of ! 536: saved machine registers. The last two parameters are the address ! 537: of the saved program counter (necessary as explained in ! 538: section~\ref{forwarding}) ! 539: and the address of a function which can be called to expand the {\em tospace} ! 540: if necessary. ! 541: ! 542: The first step [lines 6--9] is to put various quantities into global ! 543: variables to make them accessible to the \verb"forward" procedure, ! 544: since C does not have nested procedures. ! 545: ! 546: The next step is to forward all the root variables (line 10). ! 547: ! 548: Finally [lines 11-18], ! 549: we forward each word of each record in {\em tospace}. Since the ! 550: \verb|forward| procedure may increment the \verb"to_ptr" variable ! 551: that denotes the end of the filled portion of {\em tospace}, we have to ! 552: keep iterating until \verb|x| catches up with it. In effect, ! 553: the {\em tospace} between \verb|x| and \verb|to_ptr| is the queue of the ! 554: breadth-first search. The queue must eventually become empty, ! 555: since there is a finite amount of accessible data to be copied. ! 556: In this phase, integers (and strings) are just skipped, since ! 557: they contain no pointers that need forwarding. ! 558: ! 559: This garbage collector is relatively simple, and therefore it's not difficult ! 560: to make it fast. Even the fanciest of generational or concurrent collectors ! 561: relies on a program like this as its inner loop; the very simple layout ! 562: of data in the Standard ML runtime system makes efficiency easy. ! 563: ! 564: \section{Fast allocation} ! 565: \label{fast} ! 566: ! 567: Copying garbage collection, because it takes time proportional only to the ! 568: live data and not to the amount of garbage, can be arbitrarily ! 569: efficient\cite{appel87:gc}. That is, there is no lower bound on the ! 570: amortized cost of garbage collection for each cell allocated. ! 571: Standard ML's generational copying garbage collector\cite{appel89:sggc} ! 572: expends on the order of one (amortized) ! 573: machine instruction for every cell allocated; the precise amortized ! 574: cost is proportional to the ratio of live data to heap size. ! 575: ! 576: Since garbage collection is so fast, it makes sense to make ! 577: allocation\index{allocation} ! 578: fast too. Standard ML of New Jersey allocates a new record every ! 579: 40 to 80 machine instructions, so we want a very low overhead on the ! 580: creation of an object. ! 581: ! 582: Since copying garbage collectors compact the live objects into consecutive ! 583: memory cells, the free region is all contiguous. This means that to create ! 584: an $n$-word object, we can just grab the next $n$ words of the free space. ! 585: Since objects are created so often, it makes sense to make allocation ! 586: in-line (with no procedure call) and to reserve a register \verb"fr" to point ! 587: to the beginning of the free region. Then it becomes very simple ! 588: to allocate a new object: \verb"A = CONS(B,C)" is implement as ! 589: \begin{enumerate} ! 590: \item mem[fr+2] := C ! 591: \item mem[fr+1] := B ! 592: \item mem[fr] := descriptor(2,tag\_record) ! 593: \item A := fr+1 ! 594: \item fr := fr+3 ! 595: \end{enumerate} ! 596: Each of these lines is one machine instruction. Thus, in five instructions ! 597: (plus one instruction of amortized garbage collection overhead) ! 598: we have made a new {\em cons} cell; it takes only twice as long to create ! 599: a data structure as it does to read it! ! 600: This fast allocation encourages a simple and clean programming style; ! 601: no longer do programmers have to stand on their heads to avoid {\em cons}ing. ! 602: ! 603: Of course, this won't work if the free region is exhausted. We can insert ! 604: an explicit test to make sure that \verb"fr" is not near the end of the ! 605: free region. But a more clever trick is to make the virtual memory page ! 606: at the end of the free region inaccessible, so that we will get a page ! 607: fault\index{fault}\index{page fault} ! 608: when the free region is exhausted. That's why we store the ! 609: last field (\verb"mem[fr+2]") first in the example above; it's simpler ! 610: for the garbage collector if the fault occurs at the very beginning ! 611: of creating the new cell. ! 612: (On the MC68020, the state of the machine at a page fault is complicated, ! 613: and it's not easy to restart the faulting instruction; so on that machine ! 614: we use an explicit comparison with a limit register.) ! 615: ! 616: The page fault will be caught by the hardware and handed to the operating ! 617: system, which can then pass control to the user process. The user process ! 618: then has to find all the registers of the faulting procedure; these ! 619: registers are the roots of the accessible data. Appel\cite{appel89:sggc} ! 620: and Cormack\cite{cormack88} both describe schemes for finding these ! 621: registers; Cormack's is simpler and more reliable. ! 622: ! 623: The ML program and the garbage collector behave like two processes ! 624: (threads\index{thread}) ! 625: running in the same address space: while one thread executes, the ! 626: other's registers are saved. When ML suspends itself (either ! 627: because of a page fault indicating end of free space, or voluntarily), ! 628: it saves its registers into a static area that looks just like a process ! 629: control block, so that the garbage collector (and other parts of the ! 630: runtime system) can access them. ! 631: ! 632: \begin{quotation} ! 633: {\small Here's how Cormack's scheme works: A page fault arrives, causing ! 634: the operating system to invoke the C function \verb"ghandle". ! 635: \index{ghandle} ! 636: This function is passed (as an argument) a structure containing ! 637: the address of the faulting instruction. Other saved registers are ! 638: at undocumented locations on the stack. \verb"ghandle" saves the ! 639: \index{saved_pc}\index{pc} ! 640: faulting pc in \verb"saved_pc", and modifies its argument structure ! 641: to point to the assembly-language function \verb"saveregs". Then it ! 642: returns; the operating system restores the pc from the argument ! 643: structure, restores other registers (from those undocumented locations), ! 644: and resumes. But of course, we have fooled the operating system into ! 645: \index{saveregs} ! 646: resuming in \verb"saveregs", which stores all registers into known ! 647: global variables and returns to the C thread (i.e. in the function ! 648: \verb"runML"). ! 649: ! 650: The C thread (typically) does garbage collection, then calls the ! 651: assembly-language function \verb"restoreregs", which loads registers ! 652: (and program counter) from their global variables and resumes execution ! 653: of the ML thread. ! 654: ! 655: Sometimes it's useful to invoke the C thread without a page fault, ! 656: e.g. to export the process state into a file or to do a structured ! 657: write. In this case, \verb"saveregs" can be called directly from assembly ! 658: code in the ML thread. ! 659: } ! 660: \end{quotation} ! 661: \section{Heap allocation of procedure call frames} ! 662: \label{heap} ! 663: ! 664: Since heap allocation is so cheap, we put procedure call frames on the ! 665: heap instead of on the stack\index{stack}. ! 666: This has many advantages\cite{appel89:cps}, ! 667: but the one of interest here is that it greatly simplifies the runtime ! 668: system. Many of the facilities described in this paper ! 669: would be much more complicated to implement if runtime stacks had to be ! 670: dealt with. ! 671: ! 672: Any record on the stack must be initialized as it is created; ! 673: otherwise it will contain garbage data that could be interpreted ! 674: by the garbage collector as spurious pointers. However, typical ! 675: code generators will allocate a call frame on ! 676: entry to a procedure and spill registers into it as needed. ! 677: This must be avoided. One way to solve ! 678: this problem is to delay allocation of the frame; values can be accumulated ! 679: in registers until spilling is necessary, then all the registers can ! 680: be spilled at once into a newly-created frame, which is just a record ! 681: object in runtime data format. Thus, the frame is completely initialized ! 682: as it is created. ! 683: ! 684: The Standard ML of New Jersey compiler doesn't use ``procedure call frames.'' ! 685: Since it uses continuation\index{continuation}-passing ! 686: style\cite{steele78}\cite{kranz86}\cite{appel89:cps}, ! 687: what an ordinary mortal might call a ``frame'' is really just the ! 688: closure of a continuation. It is easy to create closures as ordinary ! 689: records that are completely initialized when they are created. ! 690: This simplifies both the code generator ! 691: and the runtime system, though the trick described in the previous paragraph ! 692: will work for more conventional code generators. ! 693: ! 694: Since there's no runtime stack, the ! 695: {\em call-with-current-continuation}\cite{rees86} ! 696: \index{call-with-current-continuation} ! 697: primitive can be implemented very efficiently. In implementations ! 698: with a runtime stack, the entire stack must be copied when {\em call/cc} ! 699: is evaluated (or else there must be a lot of extra complexity in stack ! 700: management); without a stack, the execution of ! 701: {\em call/cc}, and the execution of ! 702: saved continuations, take just a few instructions each. This makes ! 703: {\em call/cc} a practical programming tool, just as fast allocation ! 704: makes {\em cons} practical. ! 705: ! 706: \section{Representing ML structures in records and strings} ! 707: \label{representing} ! 708: Section~\ref{data} describes just two kinds of ! 709: objects---records\index{record} and ! 710: strings\index{string}---referenceable at their beginning and (in a limited way) ! 711: at interior points. All of the kinds of ML data can be represented ! 712: in records and strings. ! 713: ! 714: An ML value must be representable in one word. A larger value can be {\em ! 715: boxed} by putting it in several words on the heap and keeping a (one-word) ! 716: pointer to it. A small value (like a 31-bit integer) ! 717: can be kept {\em unboxed\index{unboxed values}} by representing it without indirection in ! 718: a machine word. ! 719: Boxed\index{boxed values} ! 720: values (pointers) ! 721: are distinguished from unboxed values (integers, or data represented as ! 722: integers) ! 723: by their low-order bit. ! 724: ! 725: An ML {\bf record\index{record}} ! 726: is an $n$-tuple of values. It has a natural representation ! 727: as a record in our runtime data format. ML has a {\it pro forma} ! 728: record of length 0; as our runtime data format does not allow objects of length ! 729: 0 (since that would leave no room to put a forwarding pointer, as described in ! 730: a section~\ref{forwarding}), ! 731: the empty record ! 732: is represented as the unboxed integer 0. This does no harm, since ! 733: no fields can be selected from an empty record anyway. ! 734: ! 735: An ML {\bf array\index{array}} is also an $n$-tuple of values. ! 736: In the ML language, ! 737: record-field offsets are determined at compile-time, whereas ! 738: arrays may be indexed by runtime values; ! 739: and arrays may be modified after ! 740: they are created, whereas records are immutable. ! 741: But neither of these ! 742: differences matters to the runtime system. ! 743: The polymorphic equality function of ML requires distinguishing between ! 744: mutable and immutable objects at runtime, so arrays and records ! 745: have different tags, however; this requires that they have different ! 746: tags. ! 747: ! 748: ML does permit ! 749: arrays\index{arrays of length 0} of length 0, ! 750: but our runtime data format does not. ! 751: Happily, all arrays of length 0 have ! 752: the same behaviour, so a special object of length 0 (located outside ! 753: the garbage-collectible region) serves to represent all the ! 754: empty arrays. ! 755: ! 756: {\bf References} in ML are mutable objects: ! 757: \verb"val a = ref 5" declares a reference\index{reference variables} ! 758: variable \verb"a" that may be ! 759: changed by a later assignment statement, unlike most variables which cannot ! 760: be modified once defined. Reference variables behave just like single-element ! 761: arrays, and that's how they are represented in the runtime system. ! 762: ! 763: ML has {\bf datatype\index{datatype}} ! 764: declarations to allow tagged variant types. For example, ! 765: the declaration ! 766: \begin{verbatim} ! 767: datatype t = NAME of string | NUMBER of int ! 768: \end{verbatim} ! 769: specifies a type \verb"t" that can be either a string or an integer, ! 770: depending on whether the constructor\index{constructor} \verb"NAME" ! 771: or \verb"NUMBER" is used to create it. ! 772: The program can examine any object of type \verb"t" and determine whether ! 773: it has the \verb"NAME" or \verb"NUMBER" representation---that is, it ! 774: is a {\em tagged} union. An object of type \verb"t" is represented ! 775: as a two-element record: one field contains the tag (a small integer), ! 776: and the other field contains the value. ! 777: The details of constructor representation may be of interest ! 778: only to those knowledgeable in ML, and are described in ! 779: section~\ref{constructor}. ! 780: ! 781: ML has {\bf strings\index{string}} ! 782: of characters. Strings can be of any non-negative length. ! 783: Since our runtime representation cannot support objects of length 0 ! 784: (because one word is needed to store the forwarding pointer, as described ! 785: in a section~\ref{forwarding}), ! 786: the empty string\index{empty string}\index{null string} ! 787: \index{string of length 0} ! 788: must be treated specially. However, it turns out that ! 789: it is never necessary to create a new empty string; the only occurrences ! 790: of empty strings are as literals in the program text. String ! 791: literals (and back-pointers) will be discussed in section~\ref{function}. ! 792: ! 793: Strings\index{Strings of length one} of length one are treated specially by the compiler, ! 794: though this is not necessary either for the ML language or for our runtime ! 795: data format. Single-character strings\index{single-character strings} ! 796: \index{characters} ! 797: are treated as unboxed integers ! 798: between 0 and 255, to avoid heap-allocation for this (frequent) special case. ! 799: Though this technique may cause ! 800: less allocation, it requires special tests on every ! 801: string operation. It's not clear whether this special case saves more than ! 802: it costs. However, this special case is transparent to the runtime system ! 803: anyway; all conversions, etc. are handled explicitly by the compiler; ! 804: the single-character strings are not a new runtime data format, but ! 805: look like ordinary unboxed integers. ! 806: ! 807: {\bf Byte-arrays\index{byte-arrays}} ! 808: in ML are to strings as arrays are to records: they have the ! 809: same representation as strings, but their tag is different to facilitate ! 810: certain language features. ! 811: ! 812: {\bf Floating-point\index{floating-point}} numbers are too large to fit in ! 813: one word. Even if we chose ! 814: to use single-precision floating point, it would be difficult to store ! 815: them unboxed\index{unboxed} ! 816: because there is no bit available for use as a tag\index{tag} bit. ! 817: Thus, all floating-point numbers are stored boxed, with 8 bytes of data and ! 818: one word of descriptor. ! 819: The descriptor\index{descriptor} ! 820: must be one that indicates that the contents of the ! 821: object contain no pointers. ! 822: Since ML is statically typed, the language ! 823: never needs to distinguish (at runtime) floating-point numbers from ! 824: strings, so we can just represent a double-precision float as an 8-byte ! 825: string. This is not an ASCII string, it is the hardware representation; ! 826: a ``string'' descriptor does not denote that the contents are printable ! 827: characters, just that there is an arbitrary bit-pattern not containing ! 828: pointers. ! 829: ! 830: Similarly, {\bf machine-code procedures} ! 831: \index{machine code} do not require a separate class ! 832: of descriptor. From the garbage collector's point of view, machine ! 833: code is just like strings: it contains bits that are to be (perhaps) ! 834: moved from place to place but which never represent pointers. ! 835: Thus, machine code is just kept in string objects. The mechanism by ! 836: which we avoid placing pointers\index{pointers} ! 837: (or any reference to other objects) ! 838: in machine code will be described in section~\ref{modules}. ! 839: ! 840: \section{Closures} ! 841: \label{closures} ! 842: \index{closure} ! 843: In ML, as in many lexically scoped languages, functions may have free ! 844: variables. That is, the body of a function may reference not only its ! 845: own formal parameters\index{formal parameters} ! 846: (the bound variables of the function), but the ! 847: formal parameters of a statically enclosing function. Consider the ! 848: function ! 849: \begin{verbatim} ! 850: fun add(x) = let fun add1(y) = x+y ! 851: in add1 ! 852: end ! 853: \end{verbatim} ! 854: The function \verb"add(x)" defines an internal function \verb"add1(y)" that ! 855: adds \verb"x" and \verb"y", and then \verb"add" returns \verb"add1" as its ! 856: result. The variable \verb"x" is bound by \verb"add", and ! 857: \verb"y" is bound by \verb"add1". Therefore, if we just consider the ! 858: function definition \verb"fun add1(y) = x+y", the variable \verb"x" ! 859: is a free variable\index{free variable} of this internal function. ! 860: ! 861: In order to evaluate \verb"add1" applied to some argument, we must have ! 862: a context in which the value \verb"x" is defined. It will not suffice ! 863: for the \verb"add" function just to return a pointer to the machine code ! 864: of \verb"add1" as its result. Instead, it must return a {\em closure}: ! 865: a combination of a code-pointer an some information about the values of ! 866: all the free variables (in this case, just \verb"x"). ! 867: ! 868: For example, the result of evaluating \verb"add(5)" is (abstractly) the ! 869: function which will always add 5 to its argument. ! 870: Figure~\ref{add5} shows a possible representation ! 871: as a record object, with a descriptor at the beginning for the ! 872: garbage collector. ! 873: \begin{figure}[htbp] ! 874: \label{add5} ! 875: \begin{verbatim} ! 876: +------+ ! 877: | desc | ! 878: +------+ ! 879: p--> | o------------> machine code for add1 ! 880: +------+ ! 881: | 5 | ! 882: +------+ ! 883: \end{verbatim} ! 884: \caption{A simple closure.} ! 885: \end{figure} ! 886: The pointer \verb"p" is the runtime representation of the particular version ! 887: of \verb"add1" in which \verb|x| is bound to 5. This pointer may be passed ! 888: as an argument to some other function that doesn't know anything about ! 889: the \verb"add1" function. All the other function has to know is how ! 890: to ``call'' a closure. The calling sequence might be: ! 891: \begin{enumerate} ! 892: \item Put \verb|p| in register 0. ! 893: \item Put the return address in register 1. ! 894: \item Fetch p[0] into register 2. ! 895: \item Jump to the address contained in register 2. ! 896: \end{enumerate} ! 897: Step 1 is necessary so that when \verb"add1" is entered, it can gain ! 898: access to the closure record where the binding of \verb"x" is stored. ! 899: Step 2 is just the normal saving of a return address, which might ! 900: in many cases be on a stack instead of in a register. Steps 3 and ! 901: 4 are just the typical code necessary to jump to a runtime-bound address. ! 902: ! 903: It is important to note that the function that calls a closure \verb"p" ! 904: doesn't need to access any part of it except the machine-code pointer ! 905: in \verb|p[0]|. This means that the free variables (the rest of the ! 906: closure) may have an arbitrary arrangement, known only to the ! 907: function that builds the closure (\verb"add", in this case) and the ! 908: function that executes it (\verb"add1", in this case). Closures can ! 909: be structured in many ways\cite{appel88:clo}, and can even point ! 910: to other closures for access to free variables (i.e., they can ! 911: have ``static links\index{static link},'' in Algol terminology). ! 912: ! 913: Clearly, closures are represented a lot like records. A closure has several ! 914: fields, each of which is either a machine-code pointer or a free variable. ! 915: The machine-code pointer is just a (boxed) string pointer, ! 916: and the free variables are ! 917: just one-word ML objects. Therefore, we use record\index{record} ! 918: objects to represent closures. ! 919: ! 920: When several functions share a set of free variables, it is convenient ! 921: to have a {\bf multiple-function closure.} ! 922: \index{closure for multiple functions} ! 923: \index{multiple functions closure} ! 924: \index{shared closure} ! 925: Consider the functions ! 926: \begin{verbatim} ! 927: fun test(y,z) = ! 928: let fun even(i) = if i=0 then y else odd(i-1) ! 929: and odd(i) = if i=0 then z else even(i-1) ! 930: in (even,odd) ! 931: end ! 932: \end{verbatim} ! 933: This function, when applied to arguments \verb"(3,7)", would return ! 934: two functions (let's call them \verb"even37" and \verb"odd37"). ! 935: from integer to integer. \verb"Even37" applied to even integers ! 936: returns 3 and applied to odd integers returns 7, and \verb"odd37" ! 937: works the other way around. ! 938: ! 939: We could represent these functions by two different closures, as shown ! 940: in figure~\ref{twoclos}. ! 941: \begin{figure}[htbp] ! 942: \label{twoclos} ! 943: \begin{verbatim} ! 944: ! 945: +------+ ! 946: | desc | ! 947: +------+ ! 948: even37 --> | o------------> machine code for even ! 949: +------+ ! 950: y | 3 | ! 951: +------+ ! 952: odd | o-------> odd37 ! 953: +------+ ! 954: ! 955: +------+ ! 956: | desc | ! 957: +------+ ! 958: odd37 --> | o------------> machine code for odd ! 959: +------+ ! 960: z | 7 | ! 961: +------+ ! 962: even | o--------> even37 ! 963: +------+ ! 964: \end{verbatim} ! 965: \caption{Two mutually recursive closures} ! 966: \end{figure} ! 967: This is particularly unfortunate because \verb"even" and \verb"odd" ! 968: are free variables\index{free variables} ! 969: of each other, and this requires the construction ! 970: of a cyclic data structure of closures, which can get complicated. ! 971: ! 972: A more clever trick\cite{appel87:sml}\cite{kranz86} is to let these ! 973: two functions share a closure, as shown in figure~\ref{oneclos}. ! 974: \begin{figure}[htbp] ! 975: \label{oneclos} ! 976: \begin{verbatim} ! 977: +------+ ! 978: | desc | ! 979: +------+ ! 980: even37 --> | o------------> machine code for even ! 981: +------+ ! 982: odd37 --> | o------------> machine code for odd ! 983: +------+ ! 984: y | 3 | ! 985: +------+ ! 986: z | 7 | ! 987: +------+ ! 988: \end{verbatim} ! 989: \caption{Two functions sharing a closure} ! 990: \end{figure} ! 991: Now, when \verb"even37" is called, its closure-pointer will be loaded ! 992: in register 0, and it knows that it can access \verb:y: at offset 2 from ! 993: register 0. And when \verb"odd37" is called, its closure pointer ! 994: (which is really a pointer to the second field of the record) will be ! 995: loaded in register 0, and it can access \verb"z" at offset 2. ! 996: In some situations, the same free variable may be accessed by different ! 997: functions in the closure, and the code generator will have to remember ! 998: that the offsets\index{offset} from register 0 to a particular field ! 999: depend on which closure-pointer is loaded (i.e. which function we are ! 1000: generating code for). ! 1001: ! 1002: Finally, if \verb"even" wants to call \verb"odd", it can just generate ! 1003: the appropriate closure value in register 0 by adding an offset of 1 ! 1004: to its own closure pointer. In general, all mutually recursive ! 1005: functions can be handled this way, and cyclic\index{cyclic closure} ! 1006: closure structures ! 1007: are never required. But even non-mutually-recursive sets of functions ! 1008: can save storage by sharing closures. ! 1009: ! 1010: It should now be clear why (in section~\ref{data}) we arranged for ! 1011: pointers into the middle of record objects. Given the pointer ! 1012: \verb"odd37", the garbage collector can easily search backward until ! 1013: it finds the descriptor\index{descriptor} ! 1014: of the record; the descriptor is unboxed, ! 1015: while all of the previous machine-code pointers are boxed fields. ! 1016: ! 1017: \section{Function entry points} ! 1018: \label{function} ! 1019: ! 1020: A typical compilation\index{compilation unit} unit ! 1021: will contain many functions, and it would be ! 1022: unwieldy to make a different string object for each function: ! 1023: since the functions in a compilation unit often call each other, ! 1024: these calls will go much faster as pc-relative jumps than if ! 1025: we had to fetch addresses from closures. But to achieve pc-relative ! 1026: jumps, the relative distance between two functions cannot change; ! 1027: and the garbage collector moves objects around. ! 1028: So to achieve ! 1029: this constant distance, we must put several functions in the ! 1030: same string object, and have several ! 1031: entry points. ! 1032: ! 1033: For example, the \verb"even", \verb"odd", and \verb"test" ! 1034: functions of section~\ref{closures} could all be placed a single string, ! 1035: \index{embedded string} ! 1036: with appropriate back-pointer descriptors (figure~\ref{embedded1}). ! 1037: \begin{figure}[htbp] ! 1038: \label{embedded1} ! 1039: \begin{verbatim} ! 1040: +------+ ! 1041: | desc | ! 1042: +------+ ! 1043: test --> | t | ! 1044: + e + ! 1045: | s | ! 1046: + t + ! 1047: | | ! 1048: +------+ ! 1049: | [5] | ! 1050: +------+ ! 1051: even --> | e | ! 1052: + v + ! 1053: | e | ! 1054: + n + ! 1055: | | ! 1056: +------+ ! 1057: | [10] | ! 1058: +------+ ! 1059: odd --> | o | ! 1060: + d + ! 1061: | d | ! 1062: +------+ ! 1063: \end{verbatim} ! 1064: \caption{Three code strings embedded in a string} ! 1065: \end{figure} ! 1066: The numbers \verb"[5]" and \verb"[10]" are back-pointers, formatted ! 1067: as described in section~\ref{data}. They tell the garbage collector ! 1068: the offset to the beginning of the object, so it can find the true descriptor. ! 1069: ! 1070: The advantage to putting several functions in the same string is that ! 1071: they can refer to each other directly, without having to access each ! 1072: other through closures. Of course, it will be necessary to adjust ! 1073: the closure\index{closure} ! 1074: pointer in register 0. Here, \verb"even" could call \verb"odd" ! 1075: by this (simpler) calling sequence: ! 1076: \begin{enumerate} ! 1077: \item Add 1 to register 0 (to adjust p). ! 1078: \item Put the return address in register 1. ! 1079: \item Jump to \verb"odd". ! 1080: \end{enumerate} ! 1081: ! 1082: The garbage collector may move a code string from one place to another ! 1083: in memory, so it is necessary that the jump in step 3 be a ! 1084: pc\index{pc}-relative ! 1085: jump. ! 1086: ! 1087: ML programs may contain literal\index{literal} ! 1088: \index{string literal} ! 1089: strings and floating\index{floating point constant} point constants. ! 1090: These are embedded within string objects amongst the machine-code ! 1091: functions. For example, ! 1092: \begin{verbatim} ! 1093: fun show(b) = if b then "true" else "false" ! 1094: \end{verbatim} ! 1095: could almost have the representation shown in figure~\ref{embedded2}. ! 1096: \begin{figure}[htbp] ! 1097: \label{embedded2}. ! 1098: \begin{verbatim} ! 1099: +------+ ! 1100: | desc | ! 1101: +------+ ! 1102: show --> | s | ! 1103: + h + ! 1104: | o | ! 1105: + w + ! 1106: | | ! 1107: +------+ ! 1108: | [5] | ! 1109: +------+ ! 1110: true --> | true | ! 1111: +------+ ! 1112: | [7] | ! 1113: +------+ ! 1114: false --> | fals | ! 1115: + + ! 1116: | e000 | ! 1117: +------+ ! 1118: \end{verbatim} ! 1119: \caption{String literals, oversimplified} ! 1120: \end{figure} ! 1121: The function \verb"show" returns a pointer to \verb"true" or \verb"false"; ! 1122: this pointer points into the middle of the large string object at one ! 1123: of the embedded strings. Because the embedded string is preceded by a ! 1124: back-pointer descriptor, the garbage collector won't be confused. ! 1125: ! 1126: There's a slight problem with this. Although the garbage collector doesn't ! 1127: need to know the length of the string \verb|"true"|, the ML program might ! 1128: apply the \verb"length" function to it. ! 1129: String\index{string length} lengths are kept in the ! 1130: descriptor word, and are accessed by fetching field $-1$ and shifting ! 1131: off the tag bits. The problem here is that the descriptor for \verb"true" ! 1132: is a back-pointer\index{back-pointer}, ! 1133: not a string descriptor. The solution---simple and ! 1134: inelegant---is to introduce a new kind of descriptor, an {\em embedded ! 1135: string}, which contains the length of the string and always is preceded ! 1136: by a backpointer. Thus, the actual representation of the \verb"show" ! 1137: function is more like figure~\ref{embedded3}. ! 1138: \begin{figure}[htbp] ! 1139: \label{embedded3} ! 1140: \begin{verbatim} ! 1141: +------+ ! 1142: | desc | ! 1143: +------+ ! 1144: show --> | s | ! 1145: + h + ! 1146: | o | ! 1147: + w + ! 1148: | | ! 1149: +------+ ! 1150: | [6] | ! 1151: +------+ ! 1152: | 4emb | ! 1153: +------+ ! 1154: true --> | true | ! 1155: +------+ ! 1156: | [9] | ! 1157: +------+ ! 1158: | 5emb | ! 1159: +------+ ! 1160: false --> | fals | ! 1161: + + ! 1162: | e000 | ! 1163: +------+ ! 1164: \end{verbatim} ! 1165: \caption{String literals embedded in a code string} ! 1166: \end{figure} ! 1167: ! 1168: The treatment of embedded \index{floating point constant} ! 1169: floating point constants is just like that of ! 1170: embedded strings. However, the ML program will never need to take the ! 1171: length of a floating point constant, so the embedded-string descriptor ! 1172: can be omitted and a simple backpointer\index{backpointer} will suffice. ! 1173: Note that embedded strings of length 0 are permitted, since there ! 1174: is no need for a word to install a forwarding pointer (see ! 1175: section~\ref{forwarding}, ! 1176: and note that forwarding pointers go only at the beginning of the ! 1177: entire string object, not at embedded strings). ! 1178: ! 1179: The address of string and floating constants may appear in machine code, ! 1180: but only in a pc\index{pc}-relative way, since code objects may be moved in memory ! 1181: by the garbage collector\index{garbage collector}. ! 1182: ! 1183: \section{Modules and compilation units} ! 1184: \label{modules} ! 1185: ! 1186: Standard ML has a powerful module\index{module} ! 1187: system, allowing nested modules, ! 1188: modules as parameters to other modules, thinning of a richer module ! 1189: to form a smaller module, etc. The module system can be represented ! 1190: entirely as records and functions in the runtime system\cite{appel87:sml}; ! 1191: no new machinery is needed. ! 1192: ! 1193: A compilation unit\index{compilation unit} ! 1194: is just a sequence of function and value declarations that ! 1195: will be compiled together into one string object. But one compilation ! 1196: unit may refer to values defined in another, and a linkage\index{linkage} ! 1197: mechanism is required. ! 1198: ! 1199: We can use the power of higher-order functions to implement the linkage ! 1200: in a way that is transparent to the runtime system. If unit \verb"B" ! 1201: refers to a value in unit \verb"A", then the compiler will treat ! 1202: \verb"A" as an implicit parameter of \verb"B". For example, suppose we have ! 1203: two compilation units ! 1204: \begin{verbatim} ! 1205: fun f(x,y) = x+y ! 1206: \end{verbatim} ! 1207: and ! 1208: \begin{verbatim} ! 1209: fun g(z) = f(z,z) ! 1210: \end{verbatim} ! 1211: clearly \verb"f" is a free variable\index{free variable} ! 1212: of \verb"g". But the compiler can ! 1213: parametrize the second unit, as if it were written ! 1214: \begin{verbatim} ! 1215: fun g0(f) = let fun g(z) = f(z,z) ! 1216: in g ! 1217: end ! 1218: \end{verbatim} ! 1219: This compilation unit has no free variables; the ML system will simply ! 1220: apply the second unit to the first one, yielding the desired function ! 1221: \verb"g". The compiler must keep track of inter-module references ! 1222: in order to do this, but at least the runtime mechanism for linkage is simple. ! 1223: ! 1224: So, each compilation unit is a closed function with no free variables. ! 1225: No special linkage mechanism is built into the runtime system; the ! 1226: closure\index{closure} ! 1227: mechanism handles linkage very elegantly. ! 1228: ! 1229: Eliminating free variables from compilation units also simplifies the ! 1230: code generator\index{code generator}. ! 1231: Whenever the code generator must analyze local free variables, ! 1232: generate pc\index{pc}-relative references, etc., ! 1233: its job is much simpler because there ! 1234: are no global references to other objects. The interface between the ! 1235: front end of the compiler and the code generator are much cleaner as a result. ! 1236: ! 1237: \section{Linkage to assembly language} ! 1238: \label{linkage} ! 1239: ! 1240: Some primitives of a programming language are best implemented in a different ! 1241: language (typically assembly language\index{assembly language}). ! 1242: Standard ML of New Jersey ! 1243: makes use of 8 functions written in assembly language, and 8 functions written ! 1244: in C\index{C}. ! 1245: (For comparison, more than 100 standard-library\index{standard library} ! 1246: functions are written ! 1247: in ML.) The assembly-language functions are: ! 1248: \begin{enumerate} ! 1249: \item {\bf array\index{array}(n,x)} creates an array of length $n$, each element initialized ! 1250: to $x$. ! 1251: \item {\bf callc\index{callc}(f,a)} calls a C-language function $f$ with argument $a$. ! 1252: \item {\bf create\_b(n)} creates\index{create\_b} a byte-array of length $n$. ! 1253: \item {\bf create\_s(n)} creates\index{create\_s} a string of length $n$. ! 1254: \item {\bf floor\index{floor}(x)} returns the smallest integer less than ! 1255: or equal to $x$. ! 1256: \item {\bf logb\index{logb}(x)} returns the exponent part of (the floating-point) $x$. ! 1257: \item {\bf scalb\index{scalb}(x)} inserts a new exponent into (floating-point) $x$. ! 1258: \item {\bf syscall\index{syscall}(i,args,k)} does operating system ! 1259: \index{operating system} kernel-call $i$ with ! 1260: $k$ arguments. ! 1261: \end{enumerate} ! 1262: The assembly-language functions are located at (constant) addresses within ! 1263: the runtime system. But it's a good idea to follow the rules about ! 1264: free variables\index{free variables} ! 1265: in module-linkages\index{linkage}: even the references to assembly-language ! 1266: primitives should not be ``hard-wired'' addresses in code objects. ! 1267: If this were done, then all ML code would have to be re-compiled whenever ! 1268: the runtime system was re-compiled. ! 1269: ! 1270: Instead, we just treat the assembly-language functions as a special ! 1271: record object containing nine elements. Other modules are parametrized ! 1272: by this \verb"Assembly" module just as if it were an ordinary one. ! 1273: All the assembly-language functions follow the ML calling conventions. ! 1274: ! 1275: Most of the C functions are to provide access to system calls with ! 1276: hard-to-manage interfaces, like \verb"fork", etc. ! 1277: There is an added complication that the C and ML calling conventions ! 1278: don't match. ! 1279: The assembly-language function \verb"callc" arranges the arguments ! 1280: for an ML function to call a C function. The details are uninteresting, ! 1281: but the point is that absolute references are again avoided. ! 1282: ! 1283: \begin{quotation} ! 1284: {\small The file \verb"boot/assembly.sig" contains the signature ! 1285: \verb"Assembly", approximately as in figure~\ref{assembly}. ! 1286: \begin{figure} ! 1287: \label{assembly} ! 1288: \begin{verbatim} ! 1289: signature ASSEMBLY = ! 1290: sig ! 1291: datatype datalist = DATANIL | DATACONS of (string * string * datalist) ! 1292: type func ! 1293: datatype funclist = FUNC of (func * string * funclist) ! 1294: type object ! 1295: structure A : sig ! 1296: val array : int * 'a -> 'a array ! 1297: val callc : 'b (* func*) * 'a -> 'c ! 1298: val create_b : int -> string ! 1299: val create_s : int -> string ! 1300: val floor : real -> int ! 1301: val logb : real -> int ! 1302: val scalb : real * int -> real ! 1303: val syscall : int * string list * int -> int ! 1304: end ! 1305: exception Div ! 1306: exception Float of string ! 1307: exception Interrupt ! 1308: exception Overflow ! 1309: exception SystemCall of string ! 1310: exception UnboundTable ! 1311: val array0 : 'a array ! 1312: val bytearray0 : string ! 1313: val collected : int ref ! 1314: val collectedfrom : int ref ! 1315: val current : string ref ! 1316: val datalist : datalist ! 1317: val external : funclist ! 1318: val gcmessages : int ref ! 1319: val gcprof : string ref ! 1320: val majorcollections : int ref ! 1321: val minorcollections : int ref ! 1322: val opsys : int (* 1 = vax bsd ultrix, 4.2, 4.3 ! 1323: 2 = sunos 3.0, 4.0 ! 1324: 3 = vax v9 (bell labs) *) ! 1325: val pstruct : object ref ! 1326: val ratio : int ref ! 1327: end ! 1328: \end{verbatim} ! 1329: \caption{The {\tt ASSEMBLY} signature} ! 1330: \end{figure} ! 1331: \index{Assembly structure} ! 1332: The substructure \verb"A" is the machine-dependent part (implemented ! 1333: in assembly language), and the other components (exceptions, constants, ! 1334: etc.) are just data structures that can be described machine-independently ! 1335: in C. ! 1336: \verb"A" is arranged as an ML record ! 1337: at the label \verb"runvec"\index{runvec} ! 1338: in the file \verb"VAX.prim.s", \verb"M68.prim.s", ! 1339: etc. The machine-independent part ! 1340: is in \verb"cstruct.c"\index{cstruct.c}. ! 1341: ! 1342: The exceptions in this structure are just those that can be raised ! 1343: from C or assembly language. Exceptions as elaborated by the compiler ! 1344: look like references to strings, and in \verb"cstruct.c" are implemented ! 1345: as initialized structures to mimic this ! 1346: with all the appropriate descriptors. ! 1347: ! 1348: The values \verb"array0" and \verb"bytearray0" are just the array and ! 1349: \index{array0}\index{array of length 0} ! 1350: \index{bytearray0} ! 1351: byte-array of length 0; this is necessary because objects of length 0 ! 1352: are not permitted in the garbage-collectible region. ! 1353: ! 1354: \index{collected}\index{collectedfrom} ! 1355: \index{majorcollections}\index{minorcollections}\index{ratio} ! 1356: The references \verb"collected", \verb"collectedfrom", ! 1357: \verb"majorcollections", and \verb"minorcollections" are updated by ! 1358: the garbage collector with performance information. The \verb"ratio" ! 1359: variable can be set from ML to tell the garbage collector what ! 1360: ratio to maintain between heap size and amount of live data. ! 1361: ! 1362: \index{current}\index{profiling} ! 1363: The variable \verb"current" is used in execution profiling\cite{appel88:prof}; ! 1364: it is a pointer ! 1365: to an array of length 2 of ML integers. When profiling is enabled, ! 1366: a timer interrupt will periodically ! 1367: cause current[1] to be incremented. ! 1368: ! 1369: \index{pstruct} ! 1370: \index{loader} ! 1371: \index{Initial} ! 1372: The \verb"pstruct" is a pointer to the \verb"Initial" (pervasive) ! 1373: structure. The ML loader (\verb"boot/loader.sml") needs this ! 1374: because other modules may reference the structure \verb"Initial". ! 1375: ! 1376: \index{datalist} ! 1377: The \verb"datalist" is used for linkage to sharable compiled ML code. ! 1378: In vanilla Unix, such code must be link-loaded into the text segment ! 1379: of the sml executable file, and the \verb"datalist" provides the ML ! 1380: loader a way to get to it. Each element of the data list contains ! 1381: two strings (the name of the module, and the executable code), along ! 1382: with a link to the next element. ! 1383: ! 1384: \index{funclist} ! 1385: The \verb"funclist" is similarly used for linkage to functions implemented ! 1386: in C. These functions could be described in the Assembly signature ! 1387: but this requires recompilation whenever a new function is added. ! 1388: The funclist is terminated by a function whose name is "xxxx". All functions ! 1389: in the list must have a name of exactly 4 characters. ! 1390: All C-functions must take exactly one argument in ML format and return ! 1391: a result in ML format. The functions implemented in C at this writing are: ! 1392: \begin{itemize} ! 1393: \item{\bf fion\index{fion}} tells how many characters may be read from an open ! 1394: file-descriptor without blocking. ! 1395: \item{\bf fork\index{fork}} does a Unix process fork. ! 1396: \item{\bf prof\index{prof}} is obsolete. ! 1397: \item{\bf syst\index{syst}} executes a shell command as a sub-process. ! 1398: \item{\bf time\index{time}} gets execution-time information from the operating system ! 1399: and the garbage collector. ! 1400: \item{\bf argv\index{argv}} gets the command-line arguments of the executing program ! 1401: (as a list of strings). ! 1402: \item{\bf envi\index{envi}} gets the Unix environment\index{environment} ! 1403: string (as a list of strings). ! 1404: \item{\bf blas\index{blas}} does a structured\index{structured write} write (as described section~\ref{structured}). ! 1405: \item{\bf salb\index{salb}} does a structured read. ! 1406: \end{itemize} ! 1407: ! 1408: These functions should be careful not to allocate memory using \verb"malloc". ! 1409: \index{malloc} ! 1410: ! 1411: All of these functions are called from the ML ``thread'' (see ! 1412: section~\ref{fast}). ! 1413: Some C functions may require the (temporary) suspension of the ML thread, ! 1414: e.g. to do a garbage collection for a structured write. ! 1415: By setting the global variable \verb"cause"\index{cause} to a nonzero ! 1416: value and then returning, a function may cause control to be grabbed ! 1417: immediately thereafter in the thread\index{thread} controller. ! 1418: } ! 1419: \end{quotation} ! 1420: ! 1421: \section{Access to operating system services} ! 1422: \label{access} ! 1423: ! 1424: A typical operating system\index{operating system} ! 1425: provides many services, each with one or ! 1426: more operations (``system calls''). A very early version of Standard ML of ! 1427: New Jersey made little use of these services, so it sufficed to write ! 1428: an assembly-language interface to each desired system call, and then ! 1429: call the assembly-language functions from ML programs. As the ML environment ! 1430: grew more sophisticated, it needed more operating system services, so the ! 1431: number of assembly-language interface functions grew. This eventually ! 1432: became intolerable. ! 1433: ! 1434: Now there is just one assembly-language interface function, \verb"syscall", ! 1435: \index{syscall} ! 1436: that takes a list of arguments and a system-call number. This function ! 1437: pushes the arguments onto the stack and makes the system call. The ML standard ! 1438: library hides the \verb"syscall" function behind a variety of typechecked ! 1439: functions. Implementing these functions in ML rather than assembly language ! 1440: is better for two reasons: ML is safer and easier to program in, and (more ! 1441: important) the ML code doesn't need to be rewritten for each target ! 1442: architecture. ! 1443: ! 1444: \begin{quotation} ! 1445: {\small All Unix system calls return integer values. These are ! 1446: converted into ML integers (by shifting and incrementing) and returned ! 1447: as the result of \verb"syscall". Many system calls take parameters ! 1448: by reference that they stuff results into. These results are (of course) ! 1449: not appropriately tagged for ML. A byte-array (of the ! 1450: appropriate length) is used ! 1451: as the actual parameter for such an argument, and then ! 1452: the results can be extracted from the byte-array after the system call. ! 1453: ! 1454: Integer arguments to system calls are shifted right by \verb"syscall" ! 1455: to convert them into their (untagged) representation. ! 1456: ! 1457: String and byte-array arguments are left alone, yielding their natural ! 1458: representation. However, strings\index{string} ! 1459: in C (and Unix) are terminated ! 1460: by a null character, whereas strings in ML have a length prefix. ! 1461: The prefix is ignored by the kernel, since it occurs at offset -4 (bytes). ! 1462: To achieve null termination of strings, you may take the actual ML argument ! 1463: and concatenate the string \verb|"\000\000"| of two zero characters (two ! 1464: characters are required because of the unboxed representation of ! 1465: single-character strings, and the possibility that the actual argument ! 1466: is an empty string). ! 1467: } ! 1468: \end{quotation} ! 1469: \section{Constructor representations} ! 1470: \label{constructor} ! 1471: Those unfamiliar with ML might wish to skip this section. ! 1472: ! 1473: Some of the constructors\index{constructor} ! 1474: in an ML datatype\index{datatype} may not carry values. ! 1475: For example, in the type ! 1476: \begin{verbatim} ! 1477: datatype 'a option = NONE | SOME of 'a ! 1478: \end{verbatim} ! 1479: the \verb"SOME" constructor carries value of type \verb"'a" ! 1480: and the ! 1481: \verb"NONE" constructor carries no value. Constructors that carry no value ! 1482: are represented not as two-element records, but as (unboxed) small integers. ! 1483: In this case, \verb"NONE" ! 1484: is represented by the integer 0 (with a low-order 1 tag bit), ! 1485: and \verb"SOME(x)" is represented by a two-element record containing the value ! 1486: \verb"x" and a small integer representing the \verb"SOME" constructor. ! 1487: The ML program can distinguish which constructor has been used by ! 1488: testing for ``boxity:''\index{boxity} ! 1489: \verb"NONE" is an unboxed value\index{unboxed value}, \verb"SOME(x)" ! 1490: is a boxed value\index{boxed value}, ! 1491: and the low-order bit will distinguish them. ! 1492: ! 1493: Finally, in the case that there is only one value-carrying constructor, ! 1494: and the value carried always has a boxed representation, the extra ! 1495: indirection record can be eliminated. Thus, for the {\bf list} datatype ! 1496: \begin{verbatim} ! 1497: datatype 'a list = NIL | CONS of ('a * 'a list) ! 1498: \end{verbatim} ! 1499: the constructor \verb"NIL" can be represented as the unboxed integer 0, the ! 1500: constructor \verb"CONS(a,b)" can be represented as a two-element record ! 1501: containing \verb"a" and \verb"b". Again, the low-order ``boxity'' bit ! 1502: distinguishes the constructors. If there had been several value-carrying ! 1503: constructors, then the boxity bit alone could not distinguish them, ! 1504: and an indirection record (containing the carried value and a small ! 1505: integer denoting the constructor) would be required. ! 1506: ! 1507: It is tempting to play more elaborate tricks with the representation of ! 1508: constructors. For example, the {\em option} datatype described above ! 1509: might not seem to need an extra indirection; surely the boxity bit ! 1510: should be enough to distinguish between \verb"NONE" and \verb"SOME(x)"? ! 1511: The problem is that the value \verb"x" might itself be either boxed or ! 1512: unboxed; the indirection guarantees that \verb"SOME(x)" will indeed be boxed. ! 1513: ! 1514: Similarly, it is tempting to use a clever representation for the datatype ! 1515: \begin{verbatim} ! 1516: type t = a * b ! 1517: datatype atom = NUMBER of int | RECORD of t ! 1518: \end{verbatim} ! 1519: Here, the values carried by the NUMBER constructor are always unboxed, and ! 1520: the values carried by the RECORD constructor are always boxed; the ! 1521: boxity can serve to distinguish them. ! 1522: ! 1523: The problem with these schemes is caused by the abstraction provided by ! 1524: ML {\em functors\index{functor}}: ! 1525: a functor might take the atom datatype as a parameter ! 1526: with some of its structure hidden: ! 1527: \begin{verbatim} ! 1528: functor F(sig type t ! 1529: datatype atom = NUMBER of int | RECORD of t ! 1530: end) ! 1531: \end{verbatim} ! 1532: and here it is not at all clear that \verb"t" is always boxed. ! 1533: (Unfortunately, this problem applies to the ``clever'' representation of the {\em list} ! 1534: datatype as well, but we have chosen to ignore it there---we can ! 1535: detect the (rare) problems with partially-abstracted lists ! 1536: as functor parameters and give error messages, but we didn't want ! 1537: to use an extra indirection in the representation of lists.) ! 1538: The problem of functor/datatype interaction ! 1539: could be considered a defect in the language design. ! 1540: ! 1541: \begin{quotation} ! 1542: {\small Note that the constructor tag is element 1 of the record and ! 1543: the value is element 0. There's no particular reason for this; it's ! 1544: probably a bad idea. For exception constructors, the constructor ! 1545: tag is not a small integer, it is a \verb"string ref". A ref cell is ! 1546: used instead of just a string so that (generative) ! 1547: exceptions may be compared ! 1548: for equality in pattern-matching exception\index{exception} constructors. ! 1549: } ! 1550: \end{quotation} ! 1551: \section{Suspending a process} ! 1552: \label{suspending} ! 1553: \index{exportML} ! 1554: Having designed a simple layout of runtime objects, we can now implement ! 1555: easily a variety of runtime services. ! 1556: ! 1557: A convenient feature of a programming environment is the ability to save ! 1558: the current state of the computation in a file, so that by executing ! 1559: that file on a later date the computation can be resumed. In an interactive ! 1560: system, one might wish to compile and load some programs, and then ! 1561: ``save the world'' so that in the next session these programs don't ! 1562: have to be re-loaded. This feature could also be useful for program ! 1563: checkpoints. ! 1564: ! 1565: It's not too hard to suspend a process in this way. It's even possible ! 1566: to make the resulting ! 1567: file an ordinary executable file. An executable file ! 1568: contains a header, a text segment, and a data segment. The (read-only) ! 1569: text segment ! 1570: of the saved file will be identical to the text segment of the currently ! 1571: executing process, so a single \verb"write" system call will serve to ! 1572: write it to the file. The data segment of the file will consist of the ! 1573: original data segment of the process plus any newly allocated heap memory; ! 1574: again, one or two \verb"write"s will put it neatly into the file. ! 1575: (It is useful to do a garbage collection immediately before saving, ! 1576: to minimize the size of the file.) ! 1577: Finally, the header can be synthesized appropriately. ! 1578: ! 1579: An early version of Standard ML of New Jersey used a runtime ! 1580: stack\index{stack}, and ! 1581: we had no end of trouble getting the stack saved correctly. A Unix executable ! 1582: file doesn't have a stack segment, so the stack had to be copied into the ! 1583: heap before saving. And after restoring and copying back to the stack ! 1584: segment, we found that we had creamed the (new process's) command-line ! 1585: arguments, etc. When we eliminated the runtime stack, these annoyances ! 1586: ceased. ! 1587: ! 1588: \begin{quotation} ! 1589: {\small ! 1590: \index{magic number} ! 1591: The resulting executable file has a ZMAGIC magic number on ! 1592: Berkeley Unix, and NMAGIC on SunOs 4.0. This is because ZMAGIC ! 1593: implies dynamic loading of shared libraries on SunOs 4.0, which complicates ! 1594: the address map. Creating executable files is annoyingly difficult ! 1595: in operating systems like SunOs and Mach, whose notion of a process ! 1596: address map is not as simple as in Berkeley Unix. ! 1597: } ! 1598: \end{quotation} ! 1599: \section{Structured writes and reads} ! 1600: \label{structured} ! 1601: ! 1602: \index{structured write} ! 1603: It's often necessary to take a data structure (with many records, pointers, ! 1604: strings, etc.) and write it to a file in binary form so it can be read ! 1605: back in quickly. It's always possible to do this in an {\em ad hoc} way ! 1606: for each different kind of data structure that has to be written, but ! 1607: it's sometimes difficult to get the pointer-sharing relations right. ! 1608: Since the runtime data format has enough information for the garbage-collector ! 1609: to traverse the structure, then a writer/reader must be able to traverse ! 1610: it as well. ! 1611: ! 1612: In fact, we can view the saving of such a structure ! 1613: as just a garbage collection. ! 1614: The argument to the \verb"structured-write" function is just a root pointer, ! 1615: and \verb"structured-write" must traverse all the data accessible from this ! 1616: root. In doing so, it must copy the data to a file, being careful ! 1617: to make only one copy of each record. ! 1618: ! 1619: \index{garbage collection} ! 1620: We can, in fact, just invoke the \verb"gc" procedure, giving as the ! 1621: \verb"roots" argument just a singleton vector---the pointer that ! 1622: was handed to \verb"structured-write". Then, the resulting {\em tospace} ! 1623: will be exactly the desired contents of the file, which can be written ! 1624: out in a single operation. ! 1625: ! 1626: At this point, however, memory is in a bit of a mess; some objects have ! 1627: been forwarded and others have not. One solution is just to finish the ! 1628: garbage collection: each of the ``normal'' roots of the data is ! 1629: forwarded, and then (again) all the remaining unforwarded pointers in ! 1630: {\em tospace} are forwarded. Then memory is again consistent, and execution ! 1631: may continue. ! 1632: ! 1633: This is unattractive because the time required to do a structured write ! 1634: is now proportional to the amount of all live data, not proportional ! 1635: to the amount written. So a different solution is better: keep ! 1636: a separate list of ! 1637: repair\index{repair} ! 1638: information about the {\em fromspace} objects that were overwritten with ! 1639: forwarding pointers. Whenever a {\em fromspace} object is forwarded, ! 1640: two words of information must be kept about it: the location of the ! 1641: object itself, and ! 1642: the previous value ! 1643: of its first word (that was overwritten). Then, repair is easy; ! 1644: the list of repair information can be traversed, and each {\em fromspace} ! 1645: object is restored to its original state. ! 1646: The descriptor can be recovered by copying the descriptor of the ! 1647: {\em tospace} copy (accessed via the forwarding pointer), and the first ! 1648: word (which had been overwritten by the forwarding pointer) can then ! 1649: be recovered from the repair information. ! 1650: Saving the repair information is fast, and performing the repairs is also ! 1651: fast. The time to do a structured write is now proportional (with a small ! 1652: constant of proportionality!) to the amount written. ! 1653: ! 1654: There's one last problem. Where do we keep the repair information, and ! 1655: what if it grows too large? The repair information can be kept at the ! 1656: far end of {\em tospace}. Then, as long as the amount written is fairly small, ! 1657: we'll never run into it. And if we do run into it, that means that ! 1658: the amount written must be at least half the size of {\em tospace} (where {\em tospace} ! 1659: is always at least the size of the total accessible data). What we can do ! 1660: in this case is just to toss away the repair information, and go ahead ! 1661: with a garbage collection! ! 1662: ! 1663: \begin{quotation} ! 1664: {\small In a multi-generation system (as in SML-NJ's runtime system), ! 1665: we do a (cheap) minor collection before the structured write; the collection ! 1666: referred to in the paragraphs above is a major collection. ! 1667: } ! 1668: \end{quotation} ! 1669: \section{Buffered input/output} ! 1670: \label{buffered} ! 1671: ! 1672: Some operating systems (like Unix) don't provide buffered\index{buffered I/O} ! 1673: input and output ! 1674: as a primitive, and even in operating systems that do, the overhead ! 1675: of one system call per character may be too high. What the runtime ! 1676: system can do is provide this service for user programs. ! 1677: ! 1678: The Unix/C\index{C} standard library\cite{kernighan78} demonstrates that ! 1679: \index{standard library} ! 1680: the runtime system need not provide this service; instead, library ! 1681: routines can implement buffered I/O. We have taken the same approach ! 1682: in our system; the buffered I/O functions are implemented in ML as library ! 1683: functions, and the runtime system is completely ignorant of buffered I/O. ! 1684: ! 1685: ! 1686: \section{Execution profiling} ! 1687: \label{execution} ! 1688: ! 1689: \index{profiling} ! 1690: In large software systems it can become difficult to tell where ! 1691: the inefficiencies are. Execution profilers can help with this process. ! 1692: A typical profiler might provided information about the amount of time ! 1693: spent in each procedure and the number of times it was called. ! 1694: ! 1695: Standard ML of New Jersey has an execution profiler\cite{appel88:prof}, ! 1696: which will only briefly be described here. Call counts are handled ! 1697: completely independently of the runtime system; the compiler inserts ! 1698: program variables that are incremented on entry to procedures, and the ! 1699: profiler examines these variables to generate call-count information. ! 1700: ! 1701: Execution-time estimates are more difficult. The Unix \verb"prof" system ! 1702: call starts a timer interrupt in the operating system that periodically ! 1703: samples the program's program counter. An affine function is applied ! 1704: to the PC and the corresponding element of a ``histogram'' array is ! 1705: incremented. Then the histogram can be compared against the object ! 1706: file to see how many interrupts occurred in each procedure, which gives ! 1707: a good estimate of actual time spend in each procedure. ! 1708: ! 1709: With a garbage collector that moves procedures around in memory, this ! 1710: method becomes unwieldy. Instead, the runtime system has a variable ! 1711: \verb"current" that points at a sample-count cell for the currently ! 1712: executing procedure. Whenever (profiled) code begins executing a procedure ! 1713: (or returns from a call), it assigns the address of the new procedure's ! 1714: \index{current} ! 1715: sample-count variable into \verb"current" (the compiler inserts code to do ! 1716: this). The timer interrupt, instead ! 1717: of examining the PC, just increments the cell pointed to by \verb"current". ! 1718: ! 1719: This very simple mechanism does not greatly complicate the runtime system, ! 1720: but provides profiling information with a relatively low overhead. ! 1721: ! 1722: \section{Debugging} ! 1723: \label{debugging} ! 1724: ! 1725: We are currently developing a high-level ! 1726: debugger\index{debugger} based on the same principles as our ! 1727: profiler---namely, minimal interaction with the runtime system. ! 1728: The work is still in its early stages, but we believe that ! 1729: it is possible to make a debugger this way, and that the runtime system ! 1730: won't have to grow much in size to support it. A detailed description ! 1731: of our plans is beyond the scope of this paper. ! 1732: ! 1733: \section{Handling of interrupts and exceptions} ! 1734: \label{handling} ! 1735: ! 1736: The ML language has an exception\index{exception}-handling ! 1737: mechanism with dynamically-nested ! 1738: handlers. Some exceptions are raised by the program itself, some are related ! 1739: to synchronous hardware exceptions (like floating point errors), and ! 1740: some are generated asynchronously (like the \verb"Interrupt" exception ! 1741: raised when the user presses the interrupt key). ! 1742: ! 1743: The ML program has a ``current-exception-handler'' register that just ! 1744: contains a pointer to a continuation\index{continuation} ! 1745: (i.e. a closure). ! 1746: Raising an exception just ! 1747: corresponds to invoking this continuation with the exception-object ! 1748: as an argument. Dynamic nesting of handlers is handled completely by the ! 1749: compiler: it generates code that saves the previous handler (in the ! 1750: new handler's closure) and restores it upon exit from the scope of the ! 1751: new handler. ! 1752: ! 1753: This makes it particularly easy for the programming language's exception ! 1754: mechanism to be attached to the hardware's notion of fault or interrupt. ! 1755: When a fault or interrupt\index{interrupt} ! 1756: occurs, the operating system calls a special ! 1757: (assembly-language) ! 1758: function in the runtime system. This function then determines what ! 1759: caused the fault, clears it if necessary, then {\em does not} return to the ! 1760: operating system but just invokes the current-exception-handler continuation ! 1761: on the appropriate exception object. ! 1762: ! 1763: \section{Fancier garbage collectors and object boundaries} ! 1764: \label{fancier} ! 1765: ! 1766: Our current system uses a good, simple generational garbage ! 1767: collector\cite{appel89:sggc}. Generation garbage ! 1768: collection\cite{lieberman83}\cite{ungar84} is based on the observations ! 1769: that newer records tend to become garbage more quickly, so that the ! 1770: collector should concentrate its effort on the newer records; and that ! 1771: newer records tend to point to older records, but not vice versa, so that ! 1772: the older records need not be searched for roots into the newer area. ! 1773: ! 1774: Sometimes, however, older records are modified (by a \verb"rplaca" operation ! 1775: in Lisp or an assignment operation in ML) ! 1776: to point to newer ones. ! 1777: Generational garbage collectors require ! 1778: the maintenance of a list of all modified records. ! 1779: To ensure portability, ! 1780: we use a very simple scheme to maintain this list: the compiler ! 1781: inserts instructions after each \verb"update" operation to put the ! 1782: address of the updated cell on a special list for the garbage collector ! 1783: to use. ! 1784: ! 1785: \index{page fault}\index{virtual memory}\index{garbage collector} ! 1786: A clever trick is to use the virtual memory system in maintaining this list. ! 1787: It is only updates to records in older generations that must be remembered; ! 1788: we can just make all the memory occupied by older generations read-only, ! 1789: and an update will cause a page fault. The fault-handler can then put the ! 1790: page on a list of updated pages, and mark the page as writeable. ! 1791: During garbage collection, the entire page can be searched for ! 1792: roots into the newer region. ! 1793: ! 1794: A different virtual-memory trick can be used to achieve concurrent garbage ! 1795: collection\cite{appel88:gc}. By making the pages of the {\em tospace} inaccessible ! 1796: until they have been scanned and forwarded, we can allow the ML program ! 1797: to continue executing even while the garbage collector is running. ! 1798: An access to an unscanned page of {\em tospace} causes a page fault, and the ! 1799: fault handler scans that page and makes it accessible. ! 1800: Then the latency is bounded by the product of the page size and the ! 1801: record size (in practice, to just a few milliseconds). ! 1802: This is very desirable in a real-time or interactive system. ! 1803: ! 1804: Both of these algorithms require that pages be scanned (and forwarded) ! 1805: out of order. When objects cross page boundaries, it is difficult ! 1806: to know where the first object on a page begins, so that its descriptor ! 1807: may be found. There are solutions to this problem that involve keeping ! 1808: track of objects that cross page boundaries\cite{appel88:gc}, but the ! 1809: SML-NJ runtime system has a runtime data format that allows a very ! 1810: simple solution. ! 1811: ! 1812: Suppose we need to scan a particular page (to forward all its pointers). ! 1813: When we start at the beginning of the page, we don't know whether ! 1814: we are in the middle of a record or of a string; and we don't know ! 1815: where the descriptor for the object is. ! 1816: But if we knew that the page contained only records, then we wouldn't need ! 1817: to find the descriptor; ! 1818: the important thing about records is that their boundaries can be ignored. ! 1819: If a page consists of the last part of one record, followed by several ! 1820: complete records, followed by the first part of the last record, the ! 1821: pointers in all of those records can be scanned and forwarded ! 1822: without regard to the boundaries. Since the descriptors are unboxed ! 1823: (they look like integers), they won't be touched; and all the other ! 1824: fields are tagged pointers or integers which will be forwarded (or not) ! 1825: appropriately. (See figure~\ref{boundaries} for an illustration.) ! 1826: \begin{figure}[htbp] ! 1827: \label{boundaries} ! 1828: \begin{verbatim} ! 1829: +------+--object boundary ! 1830: | desc | ! 1831: +------+ ! 1832: | o------------> ! 1833: -----+------+-----page boundary ! 1834: | 5 | ! 1835: +------+--object boundary ! 1836: | desc | ! 1837: +------+ ! 1838: | o------------> ! 1839: +------+ ! 1840: | o------------> ! 1841: +------+--object boundary ! 1842: | desc | ! 1843: +------+ ! 1844: | 7 | ! 1845: +------+ ! 1846: | o------------> ! 1847: -----+------+-----page boundary ! 1848: | 6 | ! 1849: +------+ ! 1850: | o------------> ! 1851: +------+--object boundary ! 1852: \end{verbatim} ! 1853: \caption{Object boundaries may be ignored by the forwarding algorithm} ! 1854: \end{figure} ! 1855: ! 1856: And if we knew that the page contained only strings, then it wouldn't ! 1857: need to be scanned at all, since strings contain no pointers. ! 1858: ! 1859: Thus, a simple arrangement that's useful when pages need to be scanned ! 1860: in arbitrary order is ! 1861: to segregate pointer-containing objects (records and arrays) from ! 1862: non-pointer-containing objects (strings and byte-arrays). This isn't necessary ! 1863: for the newly-allocated objects; the compiler can continue to maintain ! 1864: just one register that points at the beginning of the free space. But ! 1865: the copying garbage collector (the functions \verb"gc" and \verb"forward") ! 1866: must copy records and strings into different parts of memory. ! 1867: This greatly simplifies the implementation of page-based garbage collection ! 1868: algorithms. ! 1869: ! 1870: \section{Size of the SML-NJ runtime system} ! 1871: \label{size} ! 1872: ! 1873: Our runtime system is implemented in 1548 lines of C and 232 lines of ! 1874: assembly language (the assembly language must be duplicated for each ! 1875: target architecture). This is small indeed, considering the functionality ! 1876: of our Standard ML system. We have made it small by moving as much as possible ! 1877: out of the runtime system, and by having a simple format for ML data. ! 1878: ! 1879: This 1780 lines of code can be divided approximately as follows: ! 1880: \begin{enumerate} ! 1881: \item[291] lines of C to initialize and link the Standard ML library ! 1882: and loader, which then loads the rest of the system. ! 1883: {\tiny (run.c)} ! 1884: \item[299] lines to implement the copying garbage collector with repairs. ! 1885: {\tiny (gc.c)} ! 1886: \item[55] lines of C to implement the simulated process-switching required ! 1887: for garbage-collector access to ML registers. ! 1888: {\tiny (callgc.c)} ! 1889: \item[400] lines for the management of generational garbage collection, ! 1890: and for deciding when to ask the operating system for more memory. ! 1891: {\tiny (callgc.c)} ! 1892: \item[304] lines to implement C-language functions and data structures ! 1893: used as primitives by the ML program. ! 1894: {\tiny (cstruct.c)} ! 1895: \item[117] lines of C to implement the suspending of process states into ! 1896: Unix executable files. ! 1897: {\tiny (export.c)} ! 1898: \item[72] lines of utility functions for C functions that manipulate ML ! 1899: objects. ! 1900: {\tiny (objects.c)} ! 1901: \item[38] lines of assembly language to handle simulated process switching. ! 1902: {\tiny (*.prim.s)} ! 1903: \item[195] lines of assembly language to implement primitive functions ! 1904: callable by ML programs. ! 1905: {\tiny (*.prim.s)} ! 1906: \end{enumerate} ! 1907: For comparison, the Icon runtime system\cite{griswold86} (excluding ! 1908: its interpreter) is 18,000 lines of C; the T3\cite{kranz86} runtime system is ! 1909: 1,900(?) lines of C and assembly language; the FranzLisp runtime system ! 1910: (including interpreter) ! 1911: is 19,500 lines of C; the Poly/ML runtime system is ??? lines of ???. ! 1912: ! 1913: \section{Conclusion} ! 1914: \label{conclusion} ! 1915: ! 1916: It's easy to implement a hi-tech runtime system if you do it cleanly. ! 1917: ! 1918: \section{Acknowlegements} ! 1919: ! 1920: David MacQueen, Trevor Jim, and Bruce Duba have all worked on the runtime ! 1921: system, and their assistance has been valuable. Norman Ramsey provided ! 1922: many useful suggestions about the organization of this paper. ! 1923: ! 1924: %\cleardoublepage ! 1925: \bibliographystyle{plain} ! 1926: \bibliography{appel} ! 1927: %\cleardoublepage ! 1928: %\printindex ! 1929: \end{document}
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.