|
|
1.1 root 1: .EQ
2: delim $$
3: define |-> ' down 3 size -3 | back 25 -> '
4: .EN
5: .nr PS 11
6: .nr VS 13
7: .fp 5 CW
8: .ND August 5, 1987
9: .TL
10: A Standard ML Compiler
11: .AU
12: Andrew W. Appel \fR*\fP
13: .FS
14: * Supported by NSF Grant DCR-8603453 and by
15: a Digital Equipment Corporation Faculty Incentive Grant.
16: .FE
17: .AI
18: Dept. of Computer Science
19: Princeton University
20: Princeton, NJ 08544
21: .AU
22: David B. MacQueen \fR\(dg\fP
23: .FS
24: \(dg Part of this author's work was done while an SERC Senior Visiting
25: Fellow at the University of Edinburgh.
26: .FE
27: .AI
28: AT&T Bell Laboratories
29: Murray Hill, NJ 07974
30: .AB
31: .LP
32: Standard ML is a major revision of earlier dialects of the functional
33: language ML. We describe the first compiler written for Standard ML
34: in Standard ML. The compiler incorporates a number of novel features
35: and techniques, and is probably the largest system written to date in
36: Standard ML.
37: .LP
38: Great attention was paid to modularity in the construction of the
39: compiler, leading to a successful large-scale test of the modular
40: capabilities of Standard ML. The front end
41: is useful for purposes other than compilation, and the back end is
42: easily retargetable (we have code generators for the VAX and MC68020).
43: The module facilities of Standard ML were taken into account early in
44: the design of the compiler, and they particularly influenced the
45: environment management component of the front end. For example, the
46: symbol table structure is designed for fast access to opened
47: structures.
48: .LP
49: The front end of the compiler is a single phase that integrates
50: parsing, environment management, and type checking. The middle end
51: uses a sophisticated decision tree scheme to produce efficient pattern
52: matching code for functions and case expressions.
53: The abstract syntax produced by the front end is translated into a simple
54: lambda-calculus-based intermediate representation that lends itself to
55: easy case analysis and optimization in the code generator. Special
56: care was taken in designing the runtime data structures for fast
57: allocation and garbage collection.
58: .LP
59: We describe the overall organization of the compiler and present some
60: of the data representations and algorithms used in its various phases.
61: We conclude with some lessons learned about the ML language itself and
62: about compilers for modern functional languages.
63: .AE
64: .pn 1
65: .nr VS 17
66: .np
67: .NH
68: Introduction
69: .LP
70: The ML language is a typed functional language roughly based on
71: Landin's ISWIM [1].
72: It was originally designed in the
73: mid-1970s as the metalanguage of Edinburgh LCF, a machine-assisted
74: reasoning system, and its features were in part motivated by its
75: intended use to express proof tactics. However, these features made
76: it an attractive vehicle for general purpose symbolic programming, and
77: it wasn't long before free-standing implementations appeared, such as
78: Cardelli's compiler[2],
79: which was written in Pascal. In 1983 a
80: group of interested parties began work on an extensive revision of the
81: language design that lead to Standard ML [3,\|4,\|5,\|6].
82: Standard ML
83: extended the earlier versions in certain ways and incorporated ideas
84: from Hope[7],
85: another language developed in Edinburgh in the late 1970s.
86: It also included a module facility that significantly extends the
87: basic polymorphic type system of earlier ML versions and supports
88: large-scale program development.
89:
90: Several implementations of Standard ML have been under development in
91: recent years.
92: Luca Cardelli's compiler was modified to be partially
93: compatible with Standard ML. Meanwhile, Cardelli's original compiler
94: had been reimplemented in its own variant of ML by Kevin Mitchell and
95: Alan Mycroft at Edinburgh, and this compiler was also modified with
96: the help of Robert Harper to make it conform fairly closely to the
97: Standard ML definition. A new compiler was developed in Cambridge by
98: David Matthews, using his Poly language [8,\|9]
99: as the implementation
100: language, and sharing the Poly back end.
101: At INRIA, a group headed by Gerard Huet and Guy Cousineau
102: have been implementing an ML variant called CAML[10]
103: that is intermediate
104: between the LCF version and Standard ML.
105:
106: Yet there was still justification for another Standard ML
107: compiler. One of the main motivations was to build a compiler that
108: would itself be implemented in Standard ML. Such a compiler would
109: have several advantages. It would be a good basis for building
110: meta-level tools (pretty-printers, program analyzers, debuggers, etc.)
111: that could share data types and code with the compiler itself and
112: might be used in ML application programs or in an ML programming environment.
113: It would also serve as a large-scale test of the Standard ML design,
114: particularly the module facility, which was the most novel and untested
115: part of the language. Finally, a new compiler provides an
116: opportunity to try out new implementation strategies (for code
117: generation, optimization, type checking, and modules, for instance).
118: A simple lambda-calculus intermediate representation
119: makes the compiler much cleaner, and could also
120: provide a base for
121: research in evaluation algorithms (call-by-value, normal-order, lazy,
122: combinator-based, parallel, and others) for lambda-calculus.
123:
124: As a call-by-value, higher-order functional language, many of the
125: conventional compilation techniques for functional languages apply to
126: Standard ML. The distinguishing features of ML are its polymorphic
127: type system, the notion of concrete data types for expressing disjoint
128: unions and recursion, the use of pattern matching based on the
129: constructor-functions associated with data types, the use of typed,
130: value-carrying exceptions, the fact that all updateable variables are
131: introduced explicitly by application of the
132: .B ref
133: constructor, and
134: modular programming based on signatures (interface specifications),
135: structures (implementations of interfaces) and functors (functions
136: from structures to structures). These features, singly or in combination,
137: produce novel challenges in compiler design. They also, it turns
138: out, can be effectively exploited in the writing of a compiler.
139:
140: The compiler was written using the Edinburgh Standard ML compiler and
141: then bootstrapped to compile itself. However, the compiler shares
142: no code with previous compilers; and the runtime system,
143: written in C with the help of Peter Weinberger, is also completely
144: new.
145: .NH
146: Architecture
147: .LP
148: The ML compiler described here consists of three phases. The front
149: end consists of a recursive descent parser into which environment
150: management (determining scopes, associating binding and applied
151: occurrences of variables) and type checking are integrated. The
152: output of the front end is a fully typed abstract syntax tree.
153: The front end makes no decisions about runtime
154: locations of variables. It simply assigns them unique identifiers,
155: and the back end makes representation decisions.
156:
157: The middle end performs a simple translation of this abstract tree
158: into an untyped lambda calculus representation,
159: and produces an optimized representation of pattern-matching as one-dimensional
160: switch expressions.
161:
162: The back end transforms the lambda
163: form into target machine code using a largely machine-independent code
164: generator based on an ``abstract machine'' interface. The expressions
165: given to the back end never contain free variables; this simplifies
166: the interface between the code generator and the runtime system.
167: .NH
168: The Front End
169: .LP
170: The front end of the compiler consists of a conventional lexical
171: analyzer, abstract syntax datatypes, environment management machinery,
172: a type checker, and a recursive descent parser that drives the lexical
173: analysis, environment management and type checking.
174: .NH 2
175: Abstract syntax
176: .LP
177: The abstract syntax is implemented in two layers. There is a
178: collection of data types defining the ``bare'' abstract syntax
179: that forms a minimal kernel of the language. For instance, the
180: \fIexpression\fP data type is defined as follows:
181: .DS
182: .ft CW
183: .vs 13
184: datatype exp = VARexp of var ref
185: | CONexp of datacon
186: | INTexp of int
187: | REALexp of real
188: | STRINGexp of string
189: | RECORDexp of (numberedLabel * exp) list
190: | SEQexp of exp list
191: | APPexp of exp * exp
192: | CONSTRAINTexp of exp * ty
193: | HANDLEexp of exp * handler
194: | RAISEXexp of exp
195: | LETexp of dec * exp
196: | CASEexp of exp * rule list
197: | FNexp of rule list
198: .vs 17
199: .ft R
200: .DE
201: The full abstract syntax consists of these bare syntax types augmented
202: with a small number of derived forms, such as clausal function definitions.
203: These derived forms are implemented as functions that generate the
204: appropriate expansion into bare syntax forms.
205: .NH 2
206: Environment management
207: .LP
208: We separate the idea of symbol-table manipulation
209: from the details of the kinds of bindings found in ML.
210: There is a generic environment mechanism that performs binding,
211: lookup, and scope management functions; it
212: is implemented as a functor that takes the
213: binding type as a parameter.
214: The binding type is a disjoint union of the specific categories of identifiers
215: (variables, constructors, type constructors, etc.):
216: .DS
217: .ft CW
218: .vs 13
219: datatype binding
220: = VARbind of var \fI(* variables *)\fP
221: | CONbind of datacon \fI(* data constructors *)\fP
222: | EXNbind of datacon \fI(* exceptions *)\fP
223: | TYCbind of tycon ref \fI(* type constructors (patchable) *)\fP
224: | TYVbind of tyvar \fI(* type variables *)\fP
225: | SIGbind of signatureVar \fI(* signatures *)\fP
226: | STRbind of structureVar \fI(* structures *)\fP
227: | FCTbind of functorVar \fI(* functors *)\fP
228: | FIXbind of fixity \fI(* infix attributes of variables *)\fP
229: .vs 17
230: .ft R
231: .DE
232: The generic environment uses hash tables, which we call binding tables
233: or simply tables, to map symbols to bindings. The environment
234: consists roughly of a stack of tables to hold the actual bindings and
235: a stack of ``remarks'' that records information about bindings and
236: scopes. Only the stack of tables is used when looking up symbols,
237: while the remarks are used when entering and leaving scopes and when
238: binding symbols. There are two types of tables in the stack of
239: tables: those representing the bindings in a previously defined
240: structure that has been ``opened'', and those representing bindings in
241: currently open scopes (i.e. top-level, let-bound, and function
242: parameter bindings).
243:
244: As indicated above, the binding type is a union of all the
245: various kinds of bindings a symbol may have. The tables therefore
246: hold bindings of all kinds, and the specialized access functions for
247: variables, constructors, etc. search only for bindings of the desired
248: kind. Thus it is possible for a symbol to simultaneously have several
249: bindings of different kinds, e.g. as a variable, a type constructor,
250: and an exception. However, variables and constructors share the same
251: name space, so it is not possible for a symbol to be simultaneously
252: bound as a variable and a constructor, despite the fact that these are
253: two disjoint kinds of bindings. The information associated with a
254: symbol, such as the type of a variable or the signature of a
255: structure, is contained in the projected value of its binding (e.g. the
256: var or structureVar).
257: .NH 2
258: Parsing
259: .LP
260: The lexical analyzer is conventional, turning the input
261: character stream into a stream of tokens, recognizing keywords,
262: identifiers, and string and numeric literals.
263: The parser reads tokens from the lexical analyzer and produces an
264: abstract syntax tree containing type and binding information.
265: Parsing is done by recursive descent, with a precedence parser
266: for infix operators in expressions. Unlike previous ML compilers, but
267: like most compilers for Algol-like languages, management of the
268: compile-time environment is done at the same time as parsing: identifiers
269: are entered into the environment as they are defined, and looked up
270: in the environment as they are used.
271:
272: Two particular problems crop up in parsing ML: identifiers may be declared
273: infix, or have their precedence changed, in lexically-scoped declarations;
274: and constructors cannot be syntactically distinguished from identifiers.
275: To handle the infix-operator problem, the lexical analyzer makes no distinction
276: between ordinary identifiers and identifiers that are declared as infix operators.
277: Instead, the operator-precedence expression-parser looks up each identifier
278: in the symbol table to determine if it has a fixity binding, and if it does,
279: what its precedence is. Recognizing constructors can be done similarly
280: by looking up identifiers
281: in the environment.
282: This scheme works nicely in the presence of
283: modules and
284: .B open
285: declarations: if a module $A$ is opened in a scope (even a local scope),
286: and an infix identifier $i$ is used from $A$, the precedence parser will
287: automatically find the precedence declaration of $i$ in the environment.
288:
289: Forward reference to identifiers is legal in mutually recursive definitions.
290: For example, in the function declaration:
291: .DS
292: .ft CW
293: .vs 13
294: fun f(a,b) = if a=0 then b else g(a-1)
295: and g(x) = f(x,x)
296: .vs 17
297: .ft R
298: .DE
299: the first occurrence of the identifier \f5g\fP is in the body of \f5f\fP,
300: which is before the function \f5g\fP is declared. To handle this problem,
301: all free identifiers found in the body of a recursive function definition are
302: kept on a ``backpatch list;'' when the mutually recursive declarations
303: are completely parsed, any references to identifiers defined in those
304: declarations are patched. This extends straightforwardly to nested
305: mutually recursive definitions. Note that constructors are not treated
306: this way; constructors cannot be redefined by
307: .B fun
308: or
309: .B
310: val rec
311: .R
312: definitions, so they are not put on the backpatch list. This means, of course,
313: that all identifiers must be looked up just to determine if they are constructors.
314: .NH 2
315: Type Checking
316: .LP
317: The conventional polymorphic type checking algorithm[11,\|12]
318: is used at present.
319: The basis of the algorithm
320: is the unification of type terms by destructively instantiating type
321: variables so that they become indirections to other types. The
322: administration of generic or bound type variables is done more
323: systematically than in the Cardelli and Edinburgh compilers, but
324: generic types are still copied for each applied occurrence of a
325: variable. A new type representation and type checking algorithm based
326: on structure sharing (in the Boyer-Moore sense [13])
327: has been prototyped with the help of Nick Rothwell. In this new scheme,
328: a polymorphic type is represented as a scheme whose bound variables are
329: indices into an environment vector; different instantiations share the
330: scheme but have different environment vectors.
331:
332: Overloading is accommodated to a limited extent. Currently only
333: certain predefined primitives such as ``+'' are overloaded, and there
334: are no facilities provided by the language for defining new overloaded
335: identifiers. Overloading is implemented as in the Edinburgh compiler,
336: where a type scheme is associated with the overloaded identifier and
337: each overloading (variant) has a type that is an instance of that
338: scheme. The contextual type of an occurrence of the identifier is
339: matched against the scheme and the resulting instantiations of the
340: scheme type variables is used to choose the appropriate variant.
341: This is simpler though more restrictive than the technique used in Hope[7],
342: where variants could have arbitrary types, as long as they
343: were incomparable (i.e. one was not an instance of another).
344: .NH 2
345: Reference and exception types
346: .LP
347: The treatment of references and exceptions with ``open'' types
348: is new, and is based on the fact that the contents of a reference
349: cell cannot be constrained to be polymorphic, and so must be considered
350: to be monomorphic. The following example illustrates the problem.
351: .DS
352: .ft CW
353: .vs 13
354: let val s = ref (fn x => x)
355: in s := (fn x => x+1); (!s) true
356: end
357: .vs 17
358: .ft R
359: .DE
360: If s were given the polymorphic type
361: $roman "ALL"^ alpha "." ( alpha -> alpha ) bold roman "ref"$, then this
362: expression would type check, permitting an obvious type error. To prevent
363: this, we insist that the type of an applied occurrence of the ref
364: constructor should always be given a ``ground'' type (one with no locally-bound
365: type variables).
366:
367: However, functions whose application can create
368: reference variables can still have polymorphic types of a restricted
369: kind. Consider the declaration
370: .DS
371: .ft CW
372: .vs 13
373: val F = fn x => let val r = ref x
374: in !r
375: end
376: .vs 17
377: .ft R
378: .DE
379: Here the function F can be given polymorphic type
380: $roman "ALL"^ alpha sup 1 roman "."
381: alpha sup 1 -> alpha sup 1$ where $alpha sup 1$
382: is a special kind of type variable
383: called a \fIweak\fP type variable (the superscript ``1'' indicates that
384: there is one lambda abstraction suspending the creation of the
385: ref cell). When F is applied to an argument, a reference value
386: of type $alpha sup 1$ is created, and hence this weak type
387: variable must be instantiated to a ground type. This means that
388: an expression like (F nil) would not be properly typed. In contrast,
389: the type
390: $alpha sup 1 bold roman "ref"$
391: assigned to r is permissible because $alpha sup 1$ is implicitly
392: bound in an outer scope and within the scope of its binding is treated as
393: a constant type.
394:
395: Exception declarations raise similar problems, which are handled
396: by an analogous use of weak type variables.
397:
398: The Cardelli and Edinburgh compilers used an earlier form of this
399: treatment proposed by Damas[14].
400: This earlier version was looser in one respect (it allowed unbound
401: weak type variables in types) and more restrictive
402: in another (multiple levels of lambda abstraction were not
403: allowed), and it had some rather counterintuitive effects.
404: .NH 2
405: Modules
406: .LP
407: The simplest kind of module in Standard ML is called a structure. It is
408: most simply defined as an encapsulated set of declarations, as in
409: .DS
410: .ft CW
411: .vs 13
412: structure S =
413: struct
414: type t = int * int
415: val x = (13, 18)
416: end
417: .vs 17
418: .ft R
419: .DE
420: The compile-time symbol table representation of such a structure is
421: a binding table containing a binding for each identifier
422: declared in the structure. There is also an abstract syntax tree for
423: the structure, from which code is generated.
424:
425: A signature is an interface specification for a structure. For
426: instance, the structure above will match the following signature:
427: .DS
428: .ft CW
429: .vs 13
430: signature SS =
431: sig
432: type t
433: val x : t
434: end
435: .vs 17
436: .ft R
437: .DE
438: Signatures are also represented by binding tables, but an abstract
439: syntax tree is not needed since there is no code generated for
440: signatures.
441:
442: A functor is a structure abstracted with respect to one or more
443: structure parameters, which are characterized by signatures. It
444: follows that the representation of a functor consists of a structure
445: representation (the body) and a parameter specification. As part of
446: the parameter specification one can impose sharing constraints among
447: the parameters; these constraints are represented by transforming the
448: parameter specification into a directed acyclic graph of parameter
449: structures.
450:
451: Functors introduce some additional forms of structures, namely formal
452: parameter structures and structures defined by functor applications.
453: For formal parameters, the parameter signature serves as a virtual
454: template defining the components of the structure. In the case of
455: functor applications, the functor body (which may itself be a functor
456: application) is closed with respect to the environment formed by
457: binding actual parameters to the formals.
458:
459: Functor application in principle involves a beta-reduction in which
460: the actual structure parameters are substituted for the formal
461: parameters in the functor body to create the result structure. This
462: beta-reduction acts at two levels\(emthe static level, involving the
463: type aspects of the structures involved, and the dynamic level,
464: involving the value and exception components. The dynamic aspect of
465: the reduction is realized by generating code that performs an ordinary
466: function application. The static aspect is performed at compile time,
467: producing a static representation of the result structure by
468: ``instantiating'' the body of the functor using the formal/actual
469: parameter binding environment.
470:
471: .NH 3
472: Naive copying
473: .LP
474: This static instantiation of functor bodies can be implemented in
475: several ways. The simplest and most direct is to perform a
476: straightforward nondestructive substitution, copying the
477: representation of the functor body in the process. This is roughly
478: the approach followed by Harper and Matthews in their implementations,
479: but the space consumed has been found to be excessive.
480: .NH 3
481: Functor application closures
482: .LP
483: A second
484: alternative initially employed in our compiler was to represent
485: structures formed by functor applications as closures, consisting of
486: the unmodified functor body structure together with the parameter
487: binding environment. This approach saves some copying, but it turns
488: out to be rather unwieldy because functor applications can appear
489: within the body of a functor definition, and consequently the actual
490: parameters may be expressions that themselves refer to other formal
491: parameter variables. For instance, consider
492: .DS
493: .ft CW
494: .vs 13
495: functor F(X:sigX) = bodyF
496:
497: functor G(Y:sigY) =
498: struct
499: structure S = F(Y)
500: ...
501: end
502:
503: structure B = G(A)
504: .vs 17
505: .ft R
506: .DE
507: In this example, the interpretation of \f5B.S\fP is given in terms of the
508: closure of \f5F(X)\fP, which is a pair $< bodyF , ~ roman "{" X |->
509: Y roman "}" > $. Since the
510: binding of \f5X\fP involves \f5Y\fP, this must be interpreted in the additional
511: context of the binding environment
512: $ roman "{" Y |-> A roman "}" $ produced by the
513: application \f5G(A)\fP. In general, a functor application closure may need
514: several layers of such contexts to be properly defined.
515:
516: Another drawback of this approach
517: is that it still performs the
518: copying that is inherent in the naive implementation
519: of signature matching, where a declaration such as
520: .DS
521: .ft CW
522: .vs 13
523: structure S': sig1 = S
524: .vs 17
525: .ft R
526: .DE
527: may cause a partial copy of \f5S\fP to be constructed and bound to \f5S'\fP.
528: .NH 3
529: Structure-sharing
530: .LP
531: The final approach is inspired, like Harper's prototype implementation,
532: by the semantic model developed by Harper, Milner, and Tofte[15].
533: We return to the straightforward idea of
534: actually performing the static reductions to obtain instantiated
535: copies of the functor body, but a structure-sharing representation is
536: used to minimize the amount of copying. In this representation, which
537: is similar in principle to the structure-sharing representation of
538: polytypes alluded to above, the statically ``interesting'' components
539: of each structure (i.e. types and substructures) are represented by
540: indices into an environment vector that is associated with the binding
541: table in another, simpler form of closure. The copying of structures that is
542: entailed by functor application and signature matching can then be
543: reduced to copying the closure objects and their associated
544: environment vectors, leaving the binding tables themselves unaffected.
545:
546: Each type or structure component also has an identifying ``name'',
547: which is basically just a unique number or time stamp. These names
548: serve a couple of purposes: representing sharing constraints, and
549: identifying two sorts of ``bound'' components. To capture sharing
550: specifications in signatures, components that are required to ``share''
551: (i.e. represent views of the same structure)
552: are given the same name. There are two forms of bound components:
553: those incorporated in functor parameter structures, and those
554: representing the components of a functor body that are created each
555: time the functor is applied. All other names are free, and represent
556: actual structures. The first sort of bound names are mapped to
557: components by the process of signature matching between formal and
558: actual functor parameters. The second sort are replaced by new unique
559: names during the static elaboration of functor applications.
560:
561: This third approach is currently under development and will eventually
562: replace the implementation based on functor application closures.
563: .NH
564: Translating abstract syntax into lambda calculus
565: .LP
566: The middle end of the compiler translates type-checked abstract
567: syntax trees into lambda-calculus trees. Because all of the environment
568: and scope manipulation has been done by the front end, and all of the
569: abstract-machine manipulation will be done by the back end, the
570: translator is simple, small, and fast.
571:
572: Though most optimization is purposely left for the back end, some
573: simplification
574: is done in the middle end. Formally recursive
575: .B fun
576: definitions are examined to see if they really contain references to
577: themselves;
578: if not they are replaced by (simpler) nonrecursive definitions.
579: The composition of structure-creation and structure-thinning is evaluated
580: at compile-time rather than run-time.
581: The ML equality predicate is a special
582: function that can be applied to any concrete data type (one built up from
583: primitive types and reference types using record and datatype constructors).
584: The algorithm for testing equality is a recursive tree traversal that varies
585: in its details for each type to which it is applied.
586: For each such occurrence, the translator builds an equality predicate
587: appropriate to the instance.
588:
589: Recent changes to Standard ML will require the use of an ``equality
590: interpreter'' which traverses arbitrary structures in the runtime system,
591: distinguishing certain kinds of cells (like
592: .B ref
593: cells) by special descriptor tags. The present implementation can still
594: be used as an optimized version in those cases where enough is known about
595: the type at compile time.
596: .NH 2
597: Translation of pattern-matching
598: .LP
599: One important and nontrivial job of the middle end
600: is to select optimal comparison
601: sequences for the compilation of pattern-matching. A \fImatch\fP in ML
602: is a sequence of pattern-expression pairs, called \fIrules\fP. When
603: a match is applied to an argument, the argument is matched against the
604: patterns, and the first rule with a matching pattern
605: is selected and its expression
606: is evaluated. A pattern is either a constant, which must match the argument
607: exactly; a variable, which matches any argument (and is bound to it for
608: the purposes of evaluating the expression); a tuple of patterns, which matches
609: a corresponding tuple argument whose components match the components of the
610: pattern-tuple; or a constructor applied to a pattern, which matches
611: an argument built using that constructor if the rest of the pattern matches.
612:
613: As an example, consider the case statement:
614: .DS
615: .ft CW
616: .vs 13
617: case a
618: of (false, nil) => nil
619: | (true, W) => W
620: | (false, cons(X, nil)) => cons(X, cons(X, nil))
621: | (false, cons(Y, Z)) => Z
622: .vs 17
623: .ft R
624: .DE
625: The argument (false, cons(4,nil)) matches the third pattern, while the
626: argument (true, cons(4,nil)) matches the second pattern.
627:
628: One could imagine a naive compilation of matches just by testing the
629: rules in turn as called for by the semantics. Our approach is to transform a
630: sequence of patterns into a decision tree[16].
631: Each internal node of the decision tree corresponds to a matching test and
632: each branch is labeled with one of the possible results of the matching test
633: and with a list of the patterns that remain potential candidates in that
634: case.
635: It is then straightforward to translate the decision tree into code for
636: pattern matching. During the construction of the decision tree it is also
637: easy to determine whether the pattern set is ``exhaustive,'' meaning that
638: every possible argument value matches at least one pattern; and whether
639: there are any ``redundant'' patterns that only match arguments covered
640: by previous rules. Nonexhaustive and redundant patterns result in warning
641: messages by the compiler.
642:
643: Our goal in constructing the decision tree is simply to minimize the total
644: number of test-nodes. This minimizes the size of the generated code and
645: also generally reduces the number of tests performed on value terms.
646: However, finding the decision tree with the minimum number of nodes is
647: an NP-complete problem[16];
648: so a set of efficient heuristics is used that in practice produces an optimal
649: decision tree in almost all cases.
650:
651: In the example above, testing the first component of the pair for truth
652: or falsity suffices to distinguish the second rule from the others;
653: then testing the second component to see whether it is \f5cons\fP or
654: \f5nil\fP distinguishes the first rule from the last two; one more test
655: suffices to separate the last two rules. Thus, in just two or three
656: tests, the appropriate rule can be selected; instead of two or three
657: tests \fIper rule\fP that the naive algorithm would use.
658:
659: The details of this algorithm were originally worked out by Marianne Baudinet
660: and have been implemented in our compiler by Trevor Jim.
661: .NH 2
662: The Lambda Language
663: .LP
664: The front end of the compiler translates ML source into
665: lambda calculus; the back end translates lambda-calculus into machine
666: code for the VAX or MC68020.
667: A significant advantage of having a very simple lambda-calculus
668: as the intermediate representation is that many compiler
669: optimizations can be cleanly described. This is the approach
670: successfully taken in the Rabbit[17]
671: and Orbit[18]
672: compilers for Scheme.
673:
674: The ``lambda language'' is simply an
675: ML datatype, as follows:
676: .DS
677: .ft CW
678: .vs 13
679: datatype lexp
680: = VAR of lvar
681: | FN of lvar * lexp
682: | FIX of lvar list * lexp list * lexp
683: | APP of lexp * lexp
684: | CON of con * lexp
685: | DECON of con * lexp
686: | SWITCH of lexp * (con*lexp) list * lexp Option
687: | RECORD of lexp list
688: | SELECT of int * lexp
689: | RAISE of lexp
690: | HANDLE of lexp * lexp
691: .vs 17
692: .ft R
693: .DE
694: The elements of the lambda language are variables (VAR),
695: abstraction (FN), simultaneous recursive definition (FIX),
696: application (APP), constructors (CON, DECON, and SWITCH),
697: tuples (RECORD and SELECT), and exception handling (HANDLE and
698: RAISE).
699: Exception names, and integer, real, and string literals,
700: are represented as constructors.
701:
702: There are no built-in library functions, at least from the point of
703: view of the compiler; instead, the library module is lambda-bound
704: at top level to some variable $v$, and functions from the
705: library are just components of the structure $v$. Certain functions
706: that are compiled in-line are represented as fields of a special structure
707: bound to a distinguished variable $v sub 0$.
708:
709: Constructor-expressions are explicitly represented in the lambda
710: language using $roman "CON" ( c, e) $ representing the application
711: of the constructor $c$ to the expression $e$, and $roman "DECON"
712: (c,e)$ representing the removal of $c$ from the constructed
713: expression $e$.
714: The lambda language could be simplified by
715: representing constructor-expressions as pairs of (tag,value),
716: and removing the CON and DECON operators. The advantage of using
717: CON and DECON is that the lambda language can be a typed lambda
718: calculus; the disadvantage is that the code generator is somewhat
719: more complicated. Since it's not clear why the low-level
720: representation needs to be typable, we may change the representation
721: of constructors in the lambda language.
722:
723: A structure resembles a record of components;
724: in the lambda language, it is treated exactly as a RECORD. There
725: are no special forms in the lambda language for structures and functors.
726: Functors are treated as functions from structures (RECORDs) to structures.
727: Actually, a structure definition is not just a record; it is an expression
728: that \fIevaluates\fP to a record.
729:
730: Structure $A$ may refer to an element of structure $B$; in the lambda language
731: this will mean that $B$ is a free variable of the expression representing
732: $A$. To keep things simple, it is important to eliminate all free variables
733: from an expression before code is generated for it; so the variable $B$ will
734: be lambda-bound at top level in the definition of $A$, and the runtime system
735: will be instructed to apply the resulting pseudo-functor to the structure
736: $B$ to initialize it.
737: This is a form of automatic,
738: compiler-controlled link-loading. For a link-loading scheme
739: it is extremely simple; our runtime structure-manager and link-loader
740: is less than a page of ML code.
741: .NH
742: Code generation
743: .LP
744: The back end of the compiler is organized as the composition of
745: functors; these functors are most naturally described starting at the end and working
746: toward the front. The last phase is the back-patching of jumps and
747: other relative addresses in a machine-language program. Relative
748: jump instructions on many machines are of different sizes depending on the distance
749: jumped, and several iterations of estimating jump sizes may be
750: required before a fixed point is found[19].
751: This is handled in a machine-independent way by the Backpatch
752: functor:
753: .DS
754: .ft CW
755: .vs 13
756: signature RelativeAddresses =
757: sig type JumpKind
758: val sizeJump : JumpKind -> int
759: val emitJump : (int -> unit) -> JumpKind -> unit
760: end
761: .ft R
762: .vs 17
763: .DE
764:
765: .DS
766: .ft CW
767: .vs 13
768: signature BackPatch =
769: sig
770: type Label
771: val newlabel : unit -> Label
772: type JumpKind
773: val emitbyte : int -> unit
774: val align : unit -> unit
775: val define : Label -> unit
776: val jump : JumpKind*Label -> unit
777: . . .
778: end
779:
780: functor Backpatch( Rel : RelativeAddresses ) : BackPatch
781: = struct
782: . . .
783: end
784: .vs 17
785: .ft R
786: .DE
787: For a given machine with a particular kind of relative address, one
788: builds a small structure explaining what size of relative address can
789: reach what distance, and how to emit code for that sort of relative
790: address. For example, on the Vax, a short conditional branch
791: takes 2 bytes, but a medium-size conditional
792: branch requires 6 bytes (a short conditional branch around a longer
793: unconditional jump), and a longer conditional jump takes 8 bytes.
794:
795: When the Backpatch functor is applied to a structure of type RelativeAddresses,
796: a new structure is created that understands how to
797: backpatch the code for a particular machine. A code generator for
798: that machine can make use of the \fIemit\fP and \fIlabel\fP primitives provided
799: by the specialized Backpatch structure.
800:
801: The signature VaxCode (not shown here)
802: provides an interface to a useful subset of the
803: Vax instruction set. The various instructions and addressing modes
804: are defined in this signature. There are two structures that
805: implement the VaxCode signature in different ways: one that
806: generates machine code when the instructions in the signature are
807: called for, and one that generates assembly code.
808: The VaxMcode structure matches the VaxCode signature, and
809: makes use of the Backpatch functor:
810: .DS
811: .ft CW
812: .vs 13
813: structure VaxMcode : VaxCode =
814: struct
815: structure R : RelativeAddresses =
816: struct
817: . . . functions to compile relative jumps, etc.
818: end
819: structure B = Backpatch(R)
820: datatype AddressMode = . . .
821: fun movl (src : AddressMode, dst: AddressMode) = . . .
822: end
823: .vs 17
824: .ft R
825: .DE
826: There is a peephole optimization module that operates on abstract Vax
827: instructions. It is implemented as a functor taking any VaxCode
828: structure into an ``optimized'' VaxCode structure:
829: .DS
830: .ft CW
831: .vs 13
832: functor Peephole ( V : VaxCode ) : VaxCode = struct . . . end
833: structure OptVaxMcode = Peephole(VaxMcode)
834: .vs 17
835: .ft R
836: .DE
837: If some other structure asks the module OptVaxMcode to produce
838: a particular instruction sequence, it may be that some optimized version of
839: the sequence will be generated instead.
840:
841: The code generator translates programs from
842: lambda language into machine code by
843: means of an abstract machine intermediate form. The abstract machine
844: is similar in spirit to Cardelli's[20].
845: The abstract machine interface is written as a signature in ML:
846: .DS
847: .ft CW
848: .vs 13
849: signature machine =
850: sig type Label
851: val select : int -> unit
852: val apply : unit -> unit
853: val tailrecur : int -> unit
854: val startrecord : int -> unit
855: val endrecord : unit -> unit
856: . . .
857: end;
858: .vs 17
859: .ft R
860: .DE
861: Each of these functions, when called, generates machine code for the
862: corresponding operation.
863:
864: There is a functor Vax that transforms a VaxCode structure into
865: a Machine structure:
866: .DS
867: .ft CW
868: .vs 13
869: functor Vax(C : VaxCode) : Machine =
870: struct
871: fun select j = C.movl(Displace(r0,4*j),Direct(r0))
872: .
873: .
874: .
875: end
876:
877: structure VaxM = Vax(OptVaxMcode)
878: structure VaxA = Vax(OptVaxAcode)
879: .vs 17
880: .ft R
881: .DE
882: This Vax functor can be applied to the structure OptVaxMcode, which will
883: result in an abstract machine that generates machine code for the
884: Vax; or to the structure OptVaxAcode, which generates assembly
885: language. We have a similar set of structures and functors that generate
886: code for the Motorola MC68020 architecture.
887:
888: Finally we can take this implementation of the abstract Machine, and
889: apply the Codegen functor to it:
890: .DS
891: .ft CW
892: .vs 13
893: functor Codegen(M : Machine) =
894: struct
895: fun codegen(APP(a,b)) = (codegen a; codegen b; M.apply())
896: | codegen(FN(v,b)) = . . .
897: \fI(The codegen function is not quite as simple as this, of course.)\fP
898: end
899:
900: structure VaxMCodegen = Codegen(VaxM)
901: structure M68MCodegen = Codegen(M68M)
902: .vs 17
903: .ft R
904: .DE
905: Now the function VaxMCodegen.codegen, when given a lambda expression,
906: generates optimized, backpatched machine code for the Vax.
907:
908: This arrangement of functors is quite satisfactory. However, the
909: details of the Machine signature might
910: be changed to make
911: it less like a stack machine. This might improve the generated
912: code, especially for architectures that are not naturally
913: stack-oriented. We are considering a re-implementation of the
914: code generator using continuation-passing style[17]
915: and Orbit[18].
916: .NH 2
917: Pattern-matching in code generation
918: .LP
919: Many
920: modern code generators are written in the form of tree-pattern matchers[21,\|22].
921: By writing the lambda-calculus
922: tree data structure as an ML datatype (with constructors),
923: the tree pattern matching can be directly specified as an ML function.
924: There is one pattern for each constructor, e.g.
925: .DS
926: .ft CW
927: .vs 13
928: | APP(f,a) => (gen(f); gen(a); machine.apply())
929: .vs 17
930: .ft R
931: .DE
932: but there are also cases that match certain ``idioms;'' like
933: .B let
934: expressions, which are represented as the application of a lambda-function:
935: .DS
936: .ft CW
937: .vs 13
938: | APP(FN(w,b), a) => \fIgenerate code for \fBlet \fIw\fB=\fIa\fB in \fIb\fP
939: .vs 17
940: .ft R
941: .DE
942: There are about a dozen simple cases that handle small patterns with just one
943: constructor, and a dozen more complicated cases that recognize idioms.
944:
945: Some of these cases are applicable in only certain situations. For example,
946: the pattern \f5APP(VAR\ w,\ a)\fP can be compiled without a closure only if
947: \fIw\fP refers to a ``known'' function. One might like to write the
948: clause for this case as:
949: .DS
950: .ft CW
951: .vs 13
952: | APP(VAR w, a) => if knownfunc(w) then applyknown(w,a) else ???
953: .vs 17
954: .ft R
955: .DE
956: The problem is that if $w$ is not a known function, then this pattern-match
957: is not useful, and the pattern \f5APP(f,a)\fP should be matched. By
958: the time the else clause is reached, it is impossible to transfer control
959: to the \f5APP(f,a)\fP clause.
960:
961: One solution to this problem is to attach boolean conditions to patterns,
962: as is done in Miranda[23]:
963: .DS
964: .ft CW
965: .vs 13
966: | APP(VAR w, a) when knownfunc(w) =>
967: .vs 17
968: .ft R
969: .DE
970: Though would add some ``syntactic sludge'' to the language, it is worth
971: considering as a future extension.
972: .NH 2
973: Flat versus linked environments
974: .LP
975: An
976: .I environment
977: is a mapping from some sort of
978: .I identifiers
979: to some sort of
980: .I bindings,
981: on which the fundamental operations are
982: .EQ
983: update ~~ ":" ~~ roman "Env" ~ times ~ roman "Ide" ~ times ~ roman "Bdg" ~~
984: -> ~~roman "Env"
985: .EN
986: .EQ
987: overlay ~~ ":" ~~ roman "Env" ~ times ~ roman "Env" ~~ -> ~~ roman "Env"
988: .EN
989: .EQ
990: access ~~ ":" ~~ roman "Env" ~ times ~ roman "Ide" ~~ -> ~~roman "Bdg"
991: .EN
992: The update function adds an identifier and binding to an
993: environment; the access function looks up an identifier; and the
994: overlay function
995: adds all of the bindings of one environment to
996: another. The overlay function might be used to implement the ML
997: .B open
998: primitive, for example.
999:
1000: Environments occur both in the compiler, where the identifiers are
1001: those of the source text, and the bindings are types and other
1002: information; and at runtime, in the form of function closures.
1003: A function closure, in implementations of lambda calculus, is a pair
1004: consisting of a function-code and the values to be associated with
1005: all of its free variables.
1006: Though these two kinds of environments are used in different ways,
1007: they share many of the same problems of implementation.
1008:
1009: We know of no data structure that implements environments with both a very
1010: fast \fIaccess\fP function and a very fast \fIoverlay\fP function.
1011: For compile-time environments,
1012: a fast \fIaccess\fP can be achieved by using a hash table; but
1013: then to overlay one hash table onto another takes time proportional
1014: to the size of the smaller table. A fast \fIoverlay\fP can be
1015: achieved by representing environments as trees of \fIoverlay\fP
1016: operations, but then \fIaccess\fP requires a tree search.
1017: In the compiled code, closures can be represented as flat
1018: or linked structures. A flat closure[2]
1019: is just a vector with one slot
1020: for each free variable, containing the value to which it is bound.
1021: A linked closure[17]
1022: has the binding of one free variable at the
1023: innermost level, along with a pointer to the closure for the
1024: enclosing scope.
1025: Flat closures have a fast \fIaccess\fP, but to build the vector takes
1026: time proportional to the number of free variables in the function.
1027: Linked closures can be built quickly: since the enclosing linked
1028: closure has already been built, just one \fIcons\fP operation is required.
1029: However, linked closures have a slower \fIaccess\fP function.
1030:
1031: The question of whether to make \fIaccess\fP or \fIoverlay\fP fast
1032: is a difficult one, since examples can be found to support either
1033: decision. Consequently, we have adopted a compromise: a mixed
1034: representation will be used, so that the time required for a typical
1035: sequence of \fIaccess\fP and \fIoverlay\fP operations will probably be
1036: smaller than if either the flat (hash-table) or linked (tree)
1037: representation were used.
1038:
1039: In the front end, environments are represented as linked lists of
1040: hash tables. The operation of \fBopen\fPing a structure just adds
1041: the (already-built) hash-table for that structure onto the list, but
1042: a typical \fIupdate\fP operation just adds a binding to an existing
1043: hash table.
1044: In the generated code, closures are represented as trees\(em
1045: a compromise between flat and linked closures. For each free variable
1046: in a closure, the back end heuristically decides whether to add
1047: another slot to the closure for this variable, or to access this
1048: variable through the (already existing) linked closure. The former
1049: is more expensive to build, but cheaper to use. When we have more
1050: experience with this compiler, we hope to measure the relative
1051: efficiencies of flat and linked closures in practice.
1052: .NH 2
1053: Data structures in the run-time system
1054: .LP
1055: The representations of ML structures in the compiled code are designed
1056: to be simple and general. We took care to avoid arbitrary restrictions
1057: on the size of the address space, or of individual objects. Except for
1058: .B ref
1059: cells, the graph of runtime objects is acyclic\(em
1060: even mutually recursive closures are represented without cycles.
1061:
1062: Because of ML's polymorphic type system, all values must be the same size.
1063: We take this size to be one 32-bit word (this discussion of runtime
1064: data structures is specific to implementations on 32-bit computers).
1065: Any value that is naturally represented in more than one word will
1066: be manipulated as a (one-word) pointer to a (multi-word) object;
1067: this is known as a
1068: .I boxed
1069: value. An
1070: .I unboxed
1071: value is one which fits in one word and is not a pointer.
1072: The garbage collector must be able to tell which words are boxed (pointers)
1073: and which words are not. We use the low-order bit of the word as the
1074: indicator. Since all of our pointers (on byte-addressable machines)
1075: point to word boundaries, the low-order bit of any pointer is zero.
1076: Unboxed values are represented with their low-order bits turned on.
1077: Unboxed values include constructor tags,
1078: one-character strings, and integers (the integer $k$ is
1079: represented as $2k+1$).
1080:
1081: All records in the runtime system contain an integral number of 32-bit words,
1082: and are prefaced by a descriptor describing their size and type. There are two
1083: types of records: those that contain pointers (and unboxed values), and
1084: those that contain no pointers. All ML tuples and closures are of the
1085: former variety (Figure 1);
1086: strings and machine-code fragments are of the latter.
1087: The fact that machine-code fragments contain no pointers is a consequence
1088: of the fact that the back-end code generator is never applied to any
1089: lambda-expression with free variables.
1090: .KF
1091: .PS
1092: boxht = .2; boxwid = .35; a = .4
1093: [down; A: box; B: box; C: box; D: box
1094: "desc" at A.c
1095: spline -> from B.c right a then up a/2 then right a+a
1096: "78" at C.c
1097: spline -> from D.c right a then down a/4 then right a
1098: line <- from B.w left a
1099: "Figure 1. A triple of two" "pointers and an integer" \
1100: at 2.2 between A.c and D.c
1101: ]
1102: move right 4*a
1103: [down; A: box; B: box; C: box; D: box; E: box
1104: "desc" at A.c
1105: line <- from B.w left a
1106: line <- from C.w left a
1107: line <- from D.w left a
1108: spline -> from B.c right a then up a/2 then right a+a
1109: spline -> from C.c right a then up a/4 then right a
1110: spline -> from D.c right a then down a/2 then right a+a
1111: "62" at E.c
1112: "Figure 2. Three mutually recursive" \
1113: "functions with one free variable" \
1114: at 2.2 between A.c and D.c
1115: ]
1116: .PE
1117: .KE
1118:
1119: There are several different representations used for constructors,
1120: just as in Cardelli's compiler[2].
1121: Constructors that carry values are usually represented as pairs of
1122: (tag, value). Constant constructors (link \fBnil\fP) are represented
1123: as small unboxed integers. Value-carrying constructors can be
1124: represented completely transparently if they carry an always-boxed value,
1125: and there are no other value-carrying constructors in the datatype
1126: (like \fBcons\fP), or if there is only one constructor in the datatype.
1127: Finally, \fBref\fP cells look like one-word records.
1128:
1129: Earlier ML compilers[20]
1130: represented mutually recursive functions as a set of closures that point
1131: to each other in a cyclic data structure. We wanted to avoid gratuitous
1132: circularity, because many garbage collection algorithms are more efficient
1133: on acyclic data structures[24].
1134: We represent mutually recursive functions with just one closure
1135: record that has
1136: several function-part entries (Figure 2). To represent function $i$,
1137: one just uses a pointer to the $i sup roman "th"$ field. If the
1138: $i sup roman "th"$ function wants to refer to the closure for the $j sup
1139: roman "th"$ function, it just adds $ j - i $ to the pointer value for
1140: its own closure. This creates no cycles of pointers; although there
1141: is an implicit cycle because each segment of machine code can refer
1142: to the machine's registers, which in turn point to the closure.
1143:
1144: Several machine-code fragments may be stored in the same record.
1145: Because the record may be moved by the garbage collector, references within
1146: the block must be by relative addresses. Many computers have PC-relative
1147: addressing modes to facilitate this; for other machines, a pointer to
1148: the currently-executing block would have to be kept in a register.
1149: A typical record contains all of the function and string-literal fragments
1150: for an entire ML structure (module). In order not to confuse the garbage
1151: collector, just before the start of each fragment within the record
1152: is a (relative) pointer back to the beginning of the record, where
1153: the record's descriptor may be found; and all fragments must begin
1154: on even byte addresses, so that pointers to fragments within the record
1155: won't look like unboxed values.
1156: .NH 2
1157: Fast consing
1158: .LP
1159: The operation of allocating and initializing a record\(em
1160: known in Lisp
1161: as \fIcons\fP\(em
1162: is a fundamental operation in ML. The creation of a tuple value,
1163: the application of a constructor, and the creation of a function-closure
1164: all rely on this operation. Our implementation strives to make this operation
1165: as fast as possible.
1166:
1167: We use a copying (and compacting) garbage collection algorithm. One
1168: consequence of this is that the free space available for storage
1169: allocation is a contiguous region of memory. We reserve a register
1170: to point at the lowest address in the free region. To allocate an
1171: $n$-word initialized record, the $n sup roman "th"$ word is stored
1172: first, followed by the other words and the descriptor. Then the
1173: free-space register is incremented by $n$ words.
1174:
1175: At some point the free space is exhausted. We arrange to have an
1176: inaccessible page of the address space immediately beyond the free
1177: space region. When there is an attempt to allocate a record that
1178: crosses into this page, a page fault occurs; and since the page is
1179: marked as inaccessible, the operating system transfers control to the
1180: ML runtime system instead of handling the page fault itself. Because
1181: the last word of a record is stored first, the creation of
1182: any record that crosses
1183: the boundary into the inaccessible region will fault at the beginning
1184: (where it is easy to patch things up and recover) rather than halfway
1185: through its initialization. (A special case is required for those
1186: records larger than a page; but these are rare and identifiable at
1187: compile-time.)
1188: Thus, the creation of an $n$-word record takes just $n$ stores
1189: (plus one for the descriptor) and an
1190: add instruction; the $n$ stores are required in any case to do the
1191: initialization. This is an extremely low-overhead
1192: .I cons
1193: operation.
1194:
1195: A fast
1196: .I cons
1197: is not helpful if garbage collection is slow. When physical memory
1198: is large, then the cost of copying garbage collection, amortized over
1199: the
1200: .I cons
1201: operations, can come to less than one instruction per \fIcons\fP $~~$[25].
1202: We are using a variant of generational garbage collection[24,\|26,\|27]
1203: that promises to be very fast even in moderate-size memories.
1204:
1205: In writing our compiler, we have aimed for clarity and
1206: straightforwardness; we have eschewed the coding tricks that enable
1207: programmers to avoid using dynamically allocated memory. As a
1208: result, our compiler probably \fIcons\fPes a lot more than other compilers;
1209: it is fortunate that we have made
1210: .I cons
1211: so fast.
1212: .NH 2
1213: Exception Handling
1214: .LP
1215: Exception names in Standard ML behave a bit like constructors, except that
1216: they are generatively (dynamically) defined. The statements
1217: .DS
1218: .ft CW
1219: .vs 13
1220: exception E : int and R : real
1221: \fIand\fP
1222: ... handle E with 0 => 10
1223: | i => 2*i
1224: || R with 0.0 => 10
1225: | r => 2*floor(r)
1226: .vs 17
1227: .ft R
1228: .DE
1229: are, if one considers $E$ and $R$ to be constructors,
1230: a bit like the (hypothetical) statements
1231: .DS
1232: .ft CW
1233: .vs 13
1234: datatype exception = . . . E of int | R of real . . .
1235: \fIand\fP
1236: ... handle E(0) => 10
1237: | E(i) => 2*i
1238: | R(0.0) => 10
1239: | R(r) => 2*floor(r)
1240: .vs 17
1241: .ft R
1242: .DE
1243: Generative data constructors cannot be given small integer tags like ordinary
1244: constructors. The tag for the constructor \f5E\fP must be constructed
1245: at run time. Its representation will be that of a
1246: .B ref
1247: cell: \f5ref("E")\fP. Any match applied to the exception type can
1248: compare the address of the ref cell against the tag of the exception
1249: object; and if the exception is propagated out of the user program into
1250: the bootstrap system, then the name of the exception can be determined
1251: (for debugging) by dereferencing the tag.
1252:
1253: The ML statement \f5raise e with v\fP is compiled into the lambda language
1254: as \f5RAISE(CON(e,v))\fP; and the handle statement applies a match
1255: (with exceptions as constructors) to the exception raised. That is,
1256: \f5HANDLE(e,h)\fP executes expression \f5e\fP; if any exception is raised
1257: in \f5e\fP, then the handler \f5h\fP (which is a
1258: \fImatch\fP as above) is applied to that exception.
1259: If \f5e\fP has type $t$, then \f5h\fP must have type $"exception" ^ -> ^ t$.
1260:
1261: The implementation of \f5HANDLE(e,h)\fP in the code generator is
1262: straightforward. An exception handler is pushed on the stack;
1263: expression \f5e\fP is executed; if no exception is raised, the
1264: exception handler is popped from the stack. An exception handler
1265: has two components: the address of the machine-code for the match
1266: \f5h\fP, and the address (on the stack) of the enclosing
1267: exception handler. Thus, entering the scope of a
1268: .B handle
1269: takes about two instructions, as does leaving its scope.
1270:
1271: To raise an exception \f5E(v)\fP, first the constructor-expression
1272: \f5E(v)\fP is evaluated and put into a register;
1273: then the stack pointer is reset to the position
1274: of the current exception handler; the current handler is removed; and
1275: control passes to the text address found in the current handler.
1276: The match \f5h\fP found at this address is thereby applied to the
1277: exception \f5E(v)\fP previously evaluated. Raising an exception
1278: therefore takes just three or four instructions, not including the
1279: match that may have to be done upon arrival at the handler to determine
1280: which exception was raised.
1281:
1282: The constructor view of exception
1283: names is not just a convenience in implementation.
1284: After some discussion in the Standard ML community, exceptions-as-constructors
1285: have been incorporated into the language.
1286: We no longer need
1287: an awkward notation (using \f5|\fP and \f5||\fP) for exception matches;
1288: and such things as re-raising an arbitrary expression are now possible:
1289: .DS
1290: .ft CW
1291: .vs 13
1292: handle e => (clean up; raise e) \fI(e is a variable of type ``exception'')\fP
1293: .vs 17
1294: .ft R
1295: .DE
1296: .NH
1297: The runtime system
1298: .LP
1299: The ML standard library is implemented as a single module in the runtime
1300: system. The library functions are mostly written in ML, with some
1301: references to primitive functions written in assembly language.
1302: All references to library functions are treated as references
1303: to fields of this structure. Thus, a typical expression might have
1304: only one free variable\(em
1305: the library structure itself. In order to ensure that the code generator
1306: is given only closed expressions, this variable is lambda-bound; the
1307: machine code resulting from this closed expression can then be treated as
1308: a function whose argument will be the standard library. Access to user-defined
1309: structures is handled similarly.
1310:
1311: This general plan is used both for the ``bootstrap'' system (in which
1312: the machine code for each
1313: structure is written to an external file), and
1314: the ``interactive'' system
1315: (in which the code for each structure or expression is
1316: put in memory and executed). In the interactive system, each expression
1317: when compiled is represented as a function whose argument is the
1318: current top-level environment, and whose result is the new binding.
1319: Thus, the interface between the compiler and the runtime system is
1320: very narrow and clean.
1321: .NH
1322: Conclusion
1323: .LP
1324: Standard ML is a complicated language. We manage the complexity by using
1325: a sequence of well-defined intermediate representations: tokens, abstract
1326: syntax, lambda-calculus, abstract machine, symbolic machine instructions.
1327: The datatypes and signatures of ML allow these interfaces to be cleanly
1328: specified, which is a great help.
1329:
1330: Lexical analysis\(em
1331: the translation from source programs to tokens\(em
1332: we make as simple as possible (760 lines). All complications of recognizing
1333: constructors and infix operators are handled in the parser, which
1334: has better access to the environment mechanisms.
1335:
1336: The translation from tokens to
1337: fully typed abstract syntax is the most complicated phase of our
1338: compiler (4500 lines).
1339: Because compile-time environments have some effect on parsing
1340: (constructors and infix operators), we merged the compilation
1341: of syntax, scopes, and types.
1342: This is where most of the complexity of the module features
1343: appear. But since structures and functors impinge on scopes and types
1344: in fundamental ways, we decided to build them into this phase from the
1345: beginning.
1346:
1347: Since the abstract syntax is so well-digested and complete, the next phase\(em
1348: translation into lambda-calculus\(em
1349: is relatively simple (1480 lines).
1350: The major task of this phase is the compilation of patterns into
1351: decision trees.
1352:
1353: A simple lambda-calculus representation of programs allows a simple
1354: and consistent code generator that can recognize idioms in lambda calculus
1355: and optimize them (750 lines).
1356: The lambda-calculus can also be useful for other kinds
1357: of analysis, like in-line function expansion.
1358: The back end needs to know very little about front-end data structures;
1359: thus, we found it simplest to have an environment manager for the back
1360: end that is completely separate from the front-end environment manager.
1361: Back-end environments map applied occurrences of variables
1362: in the lambda-calculus to their bindings, just as the front-end environment
1363: does that job for abstract syntax trees.
1364:
1365: The last two phases\(em
1366: translation from abstract machine instructions into
1367: symbolic (Vax or MC68020) instructions, and the translation of those
1368: into backpatched byte sequences\(em
1369: are the only machine-dependent phases.
1370: These phases have a largely straightforward structure (950 lines per machine).
1371: In the current
1372: implementation, our code generators do not make adequate use of registers,
1373: and we may redesign the abstract machine interface.
1374:
1375: Finally, the runtime system is kept very simple (450 lines of C,
1376: including the garbage collector; 300 lines
1377: of assembly language).
1378: As a statically-typed
1379: language, ML by its nature is oriented to compile-time analysis rather
1380: than complicated runtime systems; and we push this as far as possible.
1381: Our standard library is mostly implemented in ML
1382: (750 lines), and those parts
1383: implemented in assembly language are arranged to match a Standard ML
1384: signature. We pay particular attention to fast allocation of dynamic
1385: storage, since the features of ML (constructors, function closures) necessitate
1386: an efficient memory allocator and garbage collector.
1387:
1388: Our compiler is concise, efficient, and (we think) well-designed.
1389: It should prove to be a useful tool for the functional programming community.
1390: .sp 2
1391: .nr PS 10
1392: .nr VS 12
1393: .LP
1394: .]<
1395: .ds [F 1
1396: .]-
1397: .ds [A P. J. Landin
1398: .ds [U landin
1399: .ds [K ISWIM
1400: .ds [T The next 700 programming languages
1401: .ds [J Comm. ACM
1402: .ds [V 9
1403: .ds [N 3
1404: .ds [P 157-166
1405: .nr [P 1
1406: .ds [D 1966
1407: .nr [T 0
1408: .nr [A 0
1409: .nr [O 0
1410: .][ 1 journal-article
1411: .ds [F 2
1412: .]-
1413: .ds [A Luca Cardelli
1414: .ds [U cardelli
1415: .ds [T Compiling a functional language
1416: .ds [I ACM
1417: .ds [J 1984 Symp. on LISP and Functional Programming
1418: .ds [P 208-217
1419: .nr [P 1
1420: .ds [D 1984
1421: .nr [T 0
1422: .nr [A 0
1423: .nr [O 0
1424: .][ 1 journal-article
1425: .ds [F 3
1426: .]-
1427: .ds [K milner84
1428: .ds [K SML
1429: .ds [U milner
1430: .ds [A Robin Milner
1431: .ds [T A proposal for Standard ML
1432: .ds [I ACM
1433: .ds [J ACM Symposium on LISP and Functional Programming
1434: .ds [P 184-197
1435: .nr [P 1
1436: .ds [D 1984
1437: .nr [T 0
1438: .nr [A 0
1439: .nr [O 0
1440: .][ 1 journal-article
1441: .ds [F 4
1442: .]-
1443: .ds [K milner85
1444: .ds [K SML
1445: .ds [u milner
1446: .ds [A Robin Milner
1447: .ds [T The Standard ML Core Language
1448: .ds [J Polymorphism
1449: .ds [V 2
1450: .ds [N 2
1451: .ds [D October 1985
1452: .nr [T 0
1453: .nr [A 0
1454: .nr [O 0
1455: .][ 1 journal-article
1456: .ds [F 5
1457: .]-
1458: .ds [A David MacQueen
1459: .ds [T Modules for Standard ML
1460: .ds [I ACM
1461: .ds [J Proc. 1984 ACM Conf. on LISP and Functional Programming
1462: .ds [C Austin, Texas
1463: .ds [P 198-207
1464: .nr [P 1
1465: .ds [D 1984
1466: .nr [T 0
1467: .nr [A 0
1468: .nr [O 0
1469: .][ 1 journal-article
1470: .ds [F 6
1471: .]-
1472: .ds [A David MacQueen
1473: .ds [T Modules for Standard ML
1474: .ds [J Polymorphism
1475: .ds [V 2
1476: .ds [N 2
1477: .ds [D October 1985
1478: .nr [T 0
1479: .nr [A 0
1480: .nr [O 0
1481: .][ 1 journal-article
1482: .ds [F 7
1483: .]-
1484: .ds [A R. Burstall
1485: .as [A ", D. MacQueen
1486: .as [A ", and D. Sannella
1487: .ds [T Hope: an Experimental Applicative Language
1488: .ds [D 1980
1489: .ds [P 136-143
1490: .nr [P 1
1491: .ds [K burstall80
1492: .ds [C Stanford
1493: .ds [J Proceedings of the 1980 LISP Conference
1494: .nr [T 0
1495: .nr [A 0
1496: .nr [O 0
1497: .][ 1 journal-article
1498: .ds [F 8
1499: .]-
1500: .ds [A David C. J. Matthews
1501: .ds [T The Poly manual
1502: .ds [J SIGPLAN Notices
1503: .ds [D September 1985
1504: .nr [T 0
1505: .nr [A 0
1506: .nr [O 0
1507: .][ 1 journal-article
1508: .ds [F 9
1509: .]-
1510: .ds [A David C. J. Matthews
1511: .ds [T An implementation of Standard ML in Poly
1512: .ds [D May 1986
1513: .nr [T 0
1514: .nr [A 0
1515: .nr [O 0
1516: .][ 0 other
1517: .ds [F 10
1518: .]-
1519: .ds [A G. Cousineau
1520: .as [A ", P. L. Curien
1521: .as [A ", and M. Mauny
1522: .ds [T The Categorical Abstract Machine
1523: .ds [P 50-64
1524: .nr [P 1
1525: .ds [I Springer-Verlag
1526: .ds [B Functional Programming Languages and Computer Architecture, LNCS Vol 201
1527: .ds [E J. P. Jouannaud
1528: .ds [D 1985
1529: .nr [T 0
1530: .nr [A 0
1531: .nr [O 0
1532: .][ 3 article-in-book
1533: .ds [F 11
1534: .]-
1535: .ds [K milner78
1536: .ds [A Robin Milner
1537: .ds [U milner
1538: .ds [T A Theory of Type Polymorphism in Programming
1539: .ds [J J. CSS
1540: .ds [V 17
1541: .ds [P 348-375
1542: .nr [P 1
1543: .ds [D 1978
1544: .nr [T 0
1545: .nr [A 0
1546: .nr [O 0
1547: .][ 1 journal-article
1548: .ds [F 12
1549: .]-
1550: .ds [A Luca Cardelli
1551: .ds [U cardelli
1552: .ds [T Basic polymorphic typechecking
1553: .ds [J Polymorphism
1554: .ds [V 2
1555: .ds [N 1
1556: .ds [D January 1985
1557: .nr [T 0
1558: .nr [A 0
1559: .nr [O 0
1560: .][ 1 journal-article
1561: .ds [F 13
1562: .]-
1563: .ds [A R. S. Boyer
1564: .as [A " and J Moore
1565: .ds [U boyer
1566: .ds [T The sharing of structure in theorem-proving programs
1567: .ds [B Machine Intelligence 7
1568: .ds [E B. Meltzer
1569: .ds [E D. Michie
1570: .ds [I Edinburgh University Press
1571: .ds [D 1972
1572: .nr [T 0
1573: .nr [A 0
1574: .nr [O 0
1575: .][ 3 article-in-book
1576: .ds [F 14
1577: .]-
1578: .ds [A Luis Damas
1579: .ds [T Type Assignment in Programming Languages
1580: .ds [R PhD Thesis
1581: .ds [I Department of Computer Science, University of Edinburgh
1582: .ds [D 1985
1583: .nr [T 0
1584: .nr [A 0
1585: .nr [O 0
1586: .][ 4 tech-report
1587: .ds [F 15
1588: .]-
1589: .ds [A Robert Harper
1590: .as [A ", Robin Milner
1591: .as [A ", and Mads Tofte
1592: .ds [T A type discipline for program modules
1593: .ds [U harper
1594: .ds [I Univ. of Edinburgh
1595: .ds [R ECS-LFCS-87-28
1596: .ds [D 1987
1597: .nr [T 0
1598: .nr [A 0
1599: .nr [O 0
1600: .][ 4 tech-report
1601: .ds [F 16
1602: .]-
1603: .ds [A Marianne Baudinet
1604: .as [A " and David MacQueen
1605: .ds [T Tree Pattern Matching for ML
1606: .ds [D 1986
1607: .nr [T 0
1608: .nr [A 0
1609: .nr [O 0
1610: .][ 0 other
1611: .ds [F 17
1612: .]-
1613: .ds [K steele78
1614: .ds [A Guy L. Steele
1615: .ds [T Rabbit: a compiler for Scheme
1616: .ds [U steele
1617: .ds [I MIT
1618: .ds [R AI-TR-474
1619: .ds [D 1978
1620: .nr [T 0
1621: .nr [A 0
1622: .nr [O 0
1623: .][ 4 tech-report
1624: .ds [F 18
1625: .]-
1626: .ds [A D. Kranz
1627: .as [A ", R. Kelsey
1628: .as [A ", J. Rees
1629: .as [A ", P. Hudak
1630: .as [A ", J. Philbin
1631: .as [A ", and N. Adams
1632: .ds [T ORBIT: An optimizing compiler for Scheme
1633: .ds [P 219-233
1634: .nr [P 1
1635: .ds [D July 1986
1636: .ds [J Proc. Sigplan '86 Symp. on Compiler Construction
1637: .ds [V 21 (Sigplan Notices)
1638: .ds [N 7
1639: .nr [T 0
1640: .nr [A 0
1641: .nr [O 0
1642: .][ 1 journal-article
1643: .ds [F 19
1644: .]-
1645: .ds [A W. Wulf
1646: .as [A ", R. K. Johnsson
1647: .as [A ", C. B. Weinstock
1648: .as [A ", C. B. Hobbs
1649: .as [A ", and C. M. Geschke
1650: .ds [T Design of an Optimizing Compiler
1651: .ds [I Elsevier North-Holland
1652: .ds [C New York
1653: .ds [D 1975
1654: .ds [k wulf75
1655: .nr [T 0
1656: .nr [A 0
1657: .nr [O 0
1658: .][ 2 book
1659: .ds [F 20
1660: .]-
1661: .ds [A Luca Cardelli
1662: .ds [U cardelli
1663: .ds [T The functional abstract machine
1664: .ds [J Polymorphism
1665: .ds [V 1
1666: .ds [N 1
1667: .ds [D January 1983
1668: .nr [T 0
1669: .nr [A 0
1670: .nr [O 0
1671: .][ 1 journal-article
1672: .ds [F 21
1673: .]-
1674: .ds [A R. G. G. Cattell
1675: .ds [T Formalization and automatic derivation of code generators
1676: .ds [I Carnegie-Mellon University
1677: .ds [C Pittsburgh, PA
1678: .ds [D April 1978
1679: .ds [K cattell78
1680: .ds [R Ph.D. Thesis
1681: .nr [T 0
1682: .nr [A 0
1683: .nr [O 0
1684: .][ 4 tech-report
1685: .ds [F 22
1686: .]-
1687: .ds [K agt86
1688: .ds [K twig
1689: .ds [A A. V. Aho
1690: .as [A ", M. Ganapathi
1691: .as [A ", and S. W. K. Tjiang
1692: .ds [U aho
1693: .ds [T Code generation using tree matching and dynamic programming
1694: .ds [D 1986
1695: .nr [T 0
1696: .nr [A 0
1697: .nr [O 0
1698: .][ 0 other
1699: .ds [F 23
1700: .]-
1701: .ds [A D. A. Turner
1702: .ds [T Miranda: a non-strict functional language with polymorphic types
1703: .ds [I Springer-Verlag
1704: .ds [P 1-16
1705: .nr [P 1
1706: .ds [B Functional Programming Languages and Computer Architecture, LNCS Vol 201
1707: .ds [E J. P. Jouannaud
1708: .ds [D 1985
1709: .nr [T 0
1710: .nr [A 0
1711: .nr [O 0
1712: .][ 3 article-in-book
1713: .ds [F 24
1714: .]-
1715: .ds [K lieberman83
1716: .ds [U lieberman
1717: .ds [A Henry Lieberman
1718: .as [A " and Carl Hewitt
1719: .ds [T A real-time garbage collector based on the lifetimes of objects
1720: .ds [J Communications of the ACM
1721: .ds [I ACM
1722: .ds [V 23
1723: .ds [N 6
1724: .ds [P 419-429
1725: .nr [P 1
1726: .ds [D 1983
1727: .nr [T 0
1728: .nr [A 0
1729: .nr [O 0
1730: .][ 1 journal-article
1731: .ds [F 25
1732: .]-
1733: .ds [U appel
1734: .ds [A A. W. Appel
1735: .ds [T Garbage collection can be faster than stack allocation
1736: .ds [J Information Processing Letters
1737: .ds [V (to appear)
1738: .ds [D 1987
1739: .nr [T 0
1740: .nr [A 0
1741: .nr [O 0
1742: .][ 1 journal-article
1743: .ds [F 26
1744: .]-
1745: .ds [U moon
1746: .ds [K moon84
1747: .ds [A David A. Moon
1748: .ds [T Garbage collection in a large LISP system
1749: .ds [D 1984
1750: .ds [P 235-246
1751: .nr [P 1
1752: .ds [I ACM
1753: .ds [J ACM Symposium on LISP and Functional Programming
1754: .nr [T 0
1755: .nr [A 0
1756: .nr [O 0
1757: .][ 1 journal-article
1758: .ds [F 27
1759: .]-
1760: .ds [A David Ungar
1761: .ds [T Generation scavenging: a non-disruptive high performance storage
1762: .as [T " reclamation algorithm
1763: .ds [I ACM
1764: .ds [J SIGPLAN Notices (Proc. ACM SIGSOFT/SIGPLAN Software Eng. Symp. on Practical Software
1765: .as [J " Development Environments)
1766: .ds [V 19
1767: .ds [N 5
1768: .ds [D 1984
1769: .ds [P 157-167
1770: .nr [P 1
1771: .nr [T 0
1772: .nr [A 0
1773: .nr [O 0
1774: .][ 1 journal-article
1775: .]>
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.