|
|
1.1 root 1: .RP
2: .ND March 11, 1988
3: .TL
4: An Implementation of Standard ML Modules
5: .AU
6: David MacQueen
7: .AI
8: AT&T Bell Laboratories
9: Murray Hill, NJ 07974
10: .AB
11: Standard ML includes a set of module constructs that support
12: programming in the large. These constructs extend ML's basic
13: polymorphic type system by introducing the dependent types of Martin
14: L\o'o"'f's Intuitionistic Type Theory. This paper discusses the
15: problems involved in implementing Standard ML's modules and describes
16: a practical, efficient solution to these problems. The
17: representations and algorithms of this implementation were inspired by
18: a detailed formal semantics of Standard ML developed by Milner, Tofte,
19: and Harper. The implementation is part of a new Standard ML compiler
20: that is written in Standard ML using the module system.
21: .AE
22: .EQ
23: delim %%
24: .EN
25: .de M1 \" start of a program display
26: .DS
27: .ft CW
28: ..
29: .de M2
30: .ft R
31: .DE
32: ..
33: .de M3
34: .if "\\$2"" \\&\f(CW\\$1\f1
35: .if !"\\$2"" \\&\f(CW\\$1\f1\\&\\$2
36: ..
37: .fp 5 CW
38: .NH
39: Introduction
40: .PP
41: An important part of the revision of ML that led to
42: the Standard ML language was the inclusion of module
43: facilities for the support of ``programming in the large.''
44: The design of these facilities went through several
45: versions [8] and was supported by concurrent investigations
46: of the type theory of ML and related systems [9,11].
47: The central idea behind the design was to support modularity
48: by introducing a stratified system of dependent types as suggested
49: by Martin L\o'o"'f's Intuitionistic Type Theory [10].
50: In late 1985 Bob Harper added a prototype implementation of
51: most of the module facilities to the Edinburgh ML compiler,
52: which was serving as a test-bed for the evolving Standard ML design.
53: .PP
54: Starting in the spring of 1986, Andrew Appel, Trevor Jim, and I have
55: implemented a new Standard ML compiler, written in Standard ML,
56: and initially bootstrapped from the Edinburgh compiler. An overview of
57: this new compiler, known as Standard ML of New Jersey, is given in [1].
58: The implementation of modules in this new compiler went through
59: two generations.
60: A first version was done in the fall of 1986, but it was completely
61: rewritten in the summer of 1987 following discussions of the
62: operational static semantics of modules with Robin Milner, Mads Tofte,
63: and Bob Harper [6,7,12].
64: Like Harper's prototype implementation, the new modules implementation
65: was inspired by the static semantics, but it uses a \fIstructure sharing\fP
66: strategy [3,13]
67: to avoid serious performance problems associated with a naive
68: implementation of the static semantics.
69: Although precise comparative measurements are not yet available, our
70: experience shows that the symbol table size for a large ML program
71: such as the ML compiler is several times smaller with the
72: new compiler than with the old compiler.
73: .PP
74: The objective of this paper is to describe our implementation of the
75: Standard ML module facilities, with particular emphasis on the
76: techniques used to minimize the space consumed by static representations
77: of modules (\fIi.e.\fP symbol table structures).
78: We begin by reviewing the elements of the module language.
79: .NH
80: Summary of the module constructs
81: .PP
82: Before describing the basic issues concerning implementation
83: of Standard ML modules, we need to review the main elements
84: of the module language.
85: There are three principal notions:
86: .in .3i
87: .IP (1)
88: \f2signature\f1 \- interface specification or ``type'' of modules.
89: .IP (2)
90: \f2structure\f1 \- an environment; a collection of type,
91: structure, value, and exception bindings; corresponds to
92: the conventional idea of a module.
93: .IP (3)
94: \f2functor\f1 \- function mapping structures to structures;
95: a form of parametric module.
96: .in -0.3i
97: .LP
98: Figure 1 below contains simple examples of each of these constructs.
99: .M1
100: signature ORD =
101: sig
102: type t
103: val le : t*t -> bool
104: end
105:
106: structure S =
107: struct
108: datatype t = A | B of t
109: val x = A
110: fun le(A,_) = true
111: | le(_,A) = false
112: | le(B x, B y) = le(x,y)
113: end
114:
115: signature LEXORD =
116: sig
117: structure A : ORD
118: val lexord : A.t list * A.t list -> bool
119: end
120:
121: functor LexOrd(O: ORD) : LEXORD =
122: struct
123: structure A = O
124: fun lexord([],_) = true
125: | lexord(_,[]) = false
126: | lexord(x::l,y::m) = ... O.le(x,y) ...
127: end
128:
129: structure LS = LexOrd(S)
130: .M2
131: .tl ''\f2Figure 1\f1''
132: .PP
133: This example contains declarations of two signatures,
134: .M3 ORD
135: and
136: .M3 LEXORD ,
137: two structures,
138: .M3 S
139: and
140: .M3 LS ,
141: and one functor,
142: \&\f5LexOrd\fP, mapping a structure of signature \&\f5ORD\fP
143: to a new structure of signature \&\f5LEXORD\fP.
144: The structure
145: .M3 LS
146: is defined as the result of applying
147: .M3 LexOrd
148: to
149: .M3 S .
150: We refer to components of a structure using qualified names or
151: paths formed with the usual ``dot'' notation: \fIe.g.\fP \&\f5S.t\fP,
152: \&\f5S.x\fP, \&\f5LS.A.le\fP.
153: .PP
154: A signature can be regarded as a form of ``type''
155: for structures, or as a schematic representation
156: of a class of structures, and a
157: structure \f2matches\f1 a signature if it satisfies the specifications
158: given in the signature. A structure does not have to agree exactly with
159: a signature in order for it to match the signature;
160: in this example the structure
161: .M3 S
162: matches the signature
163: .M3 ORD ,
164: even though
165: .M3 S
166: has an additional value component
167: .M3 x
168: not specified in
169: .M3 ORD .
170: In such cases signature matching has a coercive
171: effect, producing a ``thinned'' structure that exactly
172: agrees with the signature in terms of number of
173: components and their types.
174: .PP
175: Signature matching is performed in two contexts:
176: (1) when a signature constraint is given in a structure
177: declaration, as in:
178: .M1
179: structure R : ORD = S
180: .M2
181: and (2) when a functor is applied to an argument structure,
182: which must match the signature specified for the formal
183: parameter, as in
184: .M1
185: structure LS = LexOrd(S)
186: .M2
187: where
188: .M3 S
189: must match
190: .M3 ORD .
191: Actually, these two contexts are closely related under Landin's
192: principle of correspondence. In the first case, \&\f5R\fP is bound
193: to a thinned version of \&\f5S\fP that does not contain an \&\f5x\fP
194: component, and in the second case, the formal parameter \&\f5O\fP, and hence
195: the substructure \&\f5LS.A\fP, is also bound to a thinned version of \&\f5S\fP.
196: .PP
197: Another kind of specification that may appear in signatures
198: is the \f2sharing constraint\f1, the purpose of which is to
199: insure a kind of type coherence among functor parameters.
200: The program sketch in Figure 2 illustrates the use of sharing constraints.
201: .M1
202: signature SYMBOL = sig type symbol ... end
203:
204: signature LEX =
205: sig
206: structure Symbol : SYMBOL
207: val next : unit -> Symbol.symbol
208: ...
209: end
210:
211: signature SYMBOLTABLE =
212: sig
213: structure Symbol : SYMBOL
214: type var
215: val bind : Symbol.symbol * var -> unit
216: ...
217: end
218:
219: signature PARSE_ARGS =
220: sig
221: structure Lex : LEX
222: structure SymTab : SYMBOLTABLE
223: sharing Lex.Symbol = SymTab.Symbol
224: end
225:
226: functor Parse(A: PARSE_ARGS) =
227: struct ... A.SymTab.bind(A.Lex.next(), v) ... end
228: .M2
229: .tl ''\f2Figure 2\f1''
230: .PP
231: The functor
232: .M3 Parse
233: essentially takes two structure arguments,
234: .M3 Lex
235: (implementing a lexical analyzer) and
236: .M3 SymTab
237: (implementing a symbol table), which are bundled as components
238: of a single parameter structure.
239: The sharing specification in
240: .M3 PARSE_ARGS
241: requires that the same
242: .M3 Symbol
243: structure be used in both
244: .M3 Lex
245: and
246: .M3 SymTab .
247: This insures that
248: .M3 Lex
249: and
250: .M3 SymTab
251: can consistently interact, as in the expression
252: .M3 "A.Symtab.bind(A.Lex.next(),v)" ,
253: which is well-typed only if
254: .M3 A.Lex.Symbol.symbol
255: and
256: .M3 A.Symtab.Symbol.symbol
257: are the same type.
258: .PP
259: An important point about datatype and structure declarations
260: is that they are \f2generative\f1, meaning
261: that each time they are elaborated (\fIe.g.\fP in a functor body as a result
262: of functor applications) a new, distinct structure or type is created.
263: For example, in
264: .M1
265: functor F () =
266: struct
267: datatype t = A | B of t
268: end
269:
270: structure S1 = F()
271: structure S2 = F()
272: .M2
273: .M3 S1
274: and
275: .M3 S2
276: are distinct structures and
277: .M3 S1.t
278: and
279: .M3 S2.t
280: are distinct types, so
281: .M3 S1.B(S2.A)
282: is an ill-typed expression.
283: .PP
284: On the other hand, simple type definitions (whether occurring inside or
285: outside of structures) are \f2transparent\f1 rather than generative.
286: For instance, in
287: .M1
288: structure IntOrd =
289: struct
290: type t = int
291: fun le(x,y) = x <= y
292: end
293: .M2
294: the type
295: .M3 S.t
296: is identical to \&\f5int\fP.
297: In other words, there is no information hiding or abstraction
298: inherent in the formation of structures.
299: This applies even to the results of functor applications; type
300: information is propagated through functor applications, so that after
301: the declaration
302: .M1
303: structure IntLexOrd = LexOrd (IntOrd)
304: .M2
305: .M3 IntLexOrd.le
306: has type
307: .M3 "int list * int list -> bool" .
308: This reflects the dependent product nature of functor signatures, and
309: the fact that structures represent a form of \fIstrong\fP dependent sum
310: (see [9,11] for discussion of the relation between ML modules and dependent
311: types).
312: .NH
313: Implementation of modules
314: .PP
315: The principal tasks that an implementation must deal with are as follows:
316: .IP (1)
317: representation of signatures, structures, and functors.
318: .IP (2)
319: signature matching, including instantiation of the signature
320: template and possible thinning of the matched structure.
321: .IP (3)
322: functor application, including
323: .RS
324: .IP (a)
325: matching formal signature to actual parameter, with possible thinning
326: of the parameter.
327: .IP (b)
328: creation of the result structure, including propagation of type information from
329: parameter to result and generation of new instances of datatypes and
330: structures.
331: .RE
332: .IP (4)
333: representation and checking of sharing constraints.
334: .PP
335: Most of these tasks have two parts, the \f2static\f1 or compile-time
336: task and the \f2dynamic\f1 or run-time task.
337: The run-time problems are straightforward and are discussed in the
338: next subsection.
339: Our main focus will be on the static aspects of the module language,
340: for which our principal implementation goals are:
341: .IP (1)
342: compact representation of structures having a given signature
343: .IP (2)
344: efficient signature matching and functor application, with minimal
345: duplication of static (\fIi.e.\fP symbol table) information
346: .IP (3)
347: efficient representation and checking of sharing constraints.
348: .NH 2
349: Dynamic representations and processes.
350: .PP
351: The run-time representations of modules are remarkably simple [1].
352: Signatures and types have no run-time representation \(em they exist
353: only at the static level. A structure is represented as a record
354: whose components represent the dynamic
355: structure components (\fIi.e.\fP substructures, values, and exceptions) in a
356: canonical order. A functor is represented as an ordinary function
357: closure, and functor application corresponds to the normal application
358: of this function to a record representing the argument structure. The
359: thinning coercions associated with signature matching give rise to
360: in-line code to construct the thinned record.
361: .PP
362: In the middle-end of the compiler, all module constructs are reduced
363: to the same simple lambda-calculus based intermediate language that is
364: used for the core ML constructs of value declarations and expressions.
365: In effect, the back-end of the compiler is unaware of the existence of
366: the module constructs \(em they have been reduced to common notions of
367: records and functions.
368: .NH 2
369: Static Representations
370: .PP
371: A naive representation of signatures and structures can be modeled
372: more or less directly on the semantic constructs used in the operational static
373: semantics [5,6].
374: There a structure is modeled by an environment %E%
375: that maps component identifiers to the appropriate sort of
376: static binding (type, structure, variable, etc.), and a
377: \fIstamp\fP,* %n%, that uniquely identifies the structure:
378: %str^=^(n,E)%.
379: .FS
380: *We prefer the term ``stamp'' for this purpose, rather than the term
381: ``name'' used in [6,7],
382: since ``name'' could also refer to the identifier to which a structure
383: is bound or an identifier bound within a structure.
384: .FE
385: We can view a structure as a tree or dag with nodes labeled by stamps
386: and edges labeled by component names.
387: A signature is then a structure together with a designated set of
388: \fIbound\fP or \fIschematic\fP stamps occurring within the structure:
389: %sig^=^(N)(n,E)%.
390: .PP
391: We illustrate this with the definitions in Figure 3
392: and the corresponding graphs in Figure 4 (adapted from [6]), in which
393: \fI(a)\fP represents the structure \&\f5C\fP and \fI(b)\fP represents
394: the signature \&\f5SIGC\fP.
395: Our convention for distinguishing between constant and bound stamps
396: is that metavariables %k sub i%
397: range over constant stamps, while metavariables %x sub i% range over
398: bound stamps in a signature. This is a more concise alternative to the
399: separate specification of the graph and the set of bound stamps
400: We emphasize the distinction by using solid
401: circles for nodes with constant stamps and open circles for nodes with
402: bound stamps.
403: A structure will always contain only constant
404: stamps, while a signature will typically contain only bound stamps. The
405: graphs are simplified by showing only structure components, but type
406: components are dealt with similarly.
407:
408: .DS
409: .ft 5
410: .ta 3i
411: structure A = signature SIGA =
412: struct sig
413: type t = int type t
414: fun f n = 2 * n val f : t -> t
415: end end
416:
417: structure B = signature SIGB =
418: struct sig
419: structure BA = A structure BA: SIGA
420: fun g x = BA.f(x) + 1 val g: BA.t -> BA.t
421: end end
422:
423: structure C = signature SIGC =
424: struct sig
425: structure CA = A structure CA : SIGA
426: structure CB = B structure CB : SIGB
427: fun h x = CB.g(CA.f x) val h: CA.t -> CA.t
428: end end
429: .ft R
430: .DE
431: .tl ''\f2Figure 3\f1''
432: .PS
433: circlerad = 0.05i
434: define str '
435: {C1: circle
436: move to C1
437: move down right
438: C2: circle
439: move to C2
440: move down left
441: C3: circle
442: A1: arrow from C1.c to C2.c chop
443: A2: arrow from C2.c to C3.c chop
444: A3: arrow from C1.c to C3.c chop
445: "%k sub 1% " at C1.w rjust
446: " %k sub 3%" at C2.e ljust
447: "%k sub 2% " at C3.w rjust
448: " CB" at A1.center ljust
449: " BA" at A2.center ljust
450: "CA " at A3.center rjust
451: move to C1
452: move down 1.3i
453: "\f5structure C\fR" center
454: }
455: '
456: define sig '
457: {C1: circle
458: move to C1
459: move down left
460: C2: circle
461: move to C1
462: move down right
463: C3: circle
464: move to C3
465: move down
466: C4: circle
467: A1: arrow from C1.c to C2.c chop
468: A2: arrow from C1.c to C3.c chop
469: A3: arrow from C3.c to C4.c chop
470: " %x sub 1%" at C1.e ljust
471: "%x sub 2% " at C2.w rjust
472: " %x sub 3%" at C3.e ljust
473: " %x sub 4%" at C4.e ljust
474: "CA " at A1.center rjust
475: " CB" at A2.center ljust
476: " BA" at A3.center ljust
477: move to C1
478: move down 1.3i
479: "\f5signature SIGC\fR" center
480: }
481: '
482: H: str; move to H; move right 2.25i; sig
483: .PE
484: .tl ''\f2Figure 4\f1''
485:
486: .PP
487: The purpose of a signature matching \&\f5S: SIG\fP is to produce a
488: structure \&\f5S'\fP that has exactly the form specified by
489: \&\f5SIG\fP and yet shares the identity (\fIi.e.\fP the stamps) of
490: \&\f5S\fP. In some cases, \&\f5S\fP and \&\f5S'\fP are identical, as
491: when \&\f5S\fP had already matched the signature \&\f5SIG\fP. In
492: other cases \&\f5S'\fP is a thinned version of \&\f5S\fP having fewer
493: components or components whose types are generic instances of their
494: types in \&\f5S\fP. Another product of matching is the realization
495: map, whose use in functor applications is explained in Section 3.5.
496: .PP
497: We can think of \&\f5SIG\fP as a scheme analogous to a generic type scheme or
498: polytype in the core ML type system [4], with the bound stamps
499: playing the role of generic type variables in a type scheme. The
500: product of matching is then an instance of this scheme under the
501: substitution represented by the realization map. The details of this
502: analogy have been worked out by Mads Tofte, including a version of the
503: principal typing theorem of [4].
504: .PP
505: A naive implementation of matching would make a copy of the signature
506: \&\f5SIG\fP, in the process replacing each bound stamp by the corresponding
507: constant stamp from \&\f5S\fP. This would involve copying most of the
508: environment part of the signature, since we have to instantiate the
509: type specifications of values and exceptions as well as instantiating
510: the types and substructures themselves. However, the environment or
511: symbol table part of the signature can be regarded as a template
512: relative to its type and substructure components, which are the only
513: parts that need to change during signature matching. For instance, a
514: type specification like \&\f5f: t->t\fP can remain fixed if it is interpreted
515: relative to the type component \&\f5t\fP. We can abstract out the type and
516: structure components carrying the bound stamps and use the rest of the
517: information in a signature as an unchanging template that can be
518: shared by all instances of the signature. This is the familiar
519: \fIstructure sharing\fP idea first proposed by Boyer and Moore in the
520: context of resolution theorem proving [3] and later exploited in the
521: implementation of Prolog [13]. The use of structure sharing in
522: the basic ML type system has been considered, but in that context it
523: does not appear to have a clear advantage over the simpler approach of
524: instantiation by copying. In the case of signature matching, however,
525: the shared information in the template is typically of considerable
526: volume, so structure sharing is quite effective in saving space
527: relative to copying.
528: .PP
529: The definitions of the basic datatypes used to represent type
530: constructors, structures and signatures are given in Figure 5. The
531: representations of structures and signatures both use the \&\f5Structure\fP
532: datatype, and differ only in the value of the \&\f5kind\fP field. The \&\f5stamp\fP
533: field contains the identifying stamp, the \&\f5table\fP is the environment
534: component represented as a hash table mapping symbols to the various
535: sorts of bindings, and the \&\f5env\fP field contains a pair of instance
536: vectors for type and structure components. The \&\f5sign\fP field in
537: a signature to identify the signature (the \&\f5stamp\fP field
538: will have a formal value representing a bound stamp); in
539: a structure it identifies the signature the structure is an instantiation of,
540: if any. Bound and constant stamps are both represented as integers;
541: stamps greater than some base value are constant stamps, while stamps
542: less than that value are bound. Within a given signature, bound
543: stamps are canonically numbered starting from 0.
544: .M1
545: datatype tycon
546: = TYCON of {stamp : stamp, ...}
547: | INDtyc of int list
548:
549: datatype Structure
550: = STRstr of
551: {stamp : stamp,
552: sign : stamp option,
553: table : symtable,
554: env : strenv,
555: kind : strkind}
556: | INDstr of int
557:
558: and strkind
559: = STRkind
560: | SIGkind of
561: {share : sharespec,
562: bindings : binding list,
563: stampcounts : {s : int, t : int}}
564: .M2
565: .tl ''\f2Figure 5\f1''
566:
567: .PP
568: The \&\f5INDtyc\fP and \&\f5INDstr\fP forms of type constructors and structures are
569: used within the symbol table to refer indirectly to components stored
570: in the instance vectors. The term \&\f5INDtyc[%i%]\fR refers to the ith
571: element of the type instance vector, while \&\f5INDtyc[%i%,%j%]\fR refers to the
572: jth element of the type vector of the ith element of the structure
573: vector. The type specifications
574: .M1
575: f: t -> t
576: h: CA.t -> CA.t
577: .M2
578: from Figure 3 are represented internally as
579: .M1
580: f: INDtyc[0] -> INDtyc[0]
581: h: INDtyc[0,0] -> INDtyc[0,0]
582: .M2
583: The representation of the entire signature SIGC from Figure 3 can be
584: summarized as follows:
585: .M1
586: SIGC:
587: stamp: 0
588: table: CA => INDstr 0
589: CB => INDstr 1
590: h => VAR: INDtyc[0,0] -> INDtyc[0,0]
591: strenv: structures = <SIGA',SIGB'>
592: types = <>
593:
594: SIGA':
595: stamp: 1
596: table: t => INDtyc[0]
597: f => VAR: INDtyc[0] -> INDtyc[0]
598: strenv: structures = <>
599: types = <DUMMY 0>
600:
601: .M2
602: .M1
603: SIGB':
604: stamp: 2
605: table: BA => INDstr 0
606: g => VAR: BA.t -> BA.t
607: strenv: structures = <SIGA''>
608: types = <>
609:
610: SIGA'':
611: stamp: 3
612: table: t => INDtyc[0]
613: f => VAR: INDtyc[0] -> INDtyc[0]
614: strenv: structures = <>
615: types = <DUMMY 1>
616: .M2
617: .LP
618: Note that there are two copies of the signature \&\f5SIGA\fP, identified
619: as \&\f5SIGA'\fP and \&\f5SIGA''\fP, each with its own stamp. This
620: duplication is required to get the canonical numbering of stamps for
621: each component of \&\f5SIGC\fP, but each of these copies shares the
622: original symbol table component from \&\f5SIGA\fP. \&\f5DUMMY 0\fP
623: and \&\f5DUMMY 1\fP are dummy type constructor components, which have
624: their own separate numbering within the context of \&\f5SIGC\fP.
625: .NH 2
626: Signature Matching
627: .PP
628: We now describe the process of signature matching in terms of the
629: representation described above. Given a signature \&\f5sig\fP and
630: structure \&\f5str\fP represented as
631: .M1
632: sig = STRstr{stamp = x, sign = n, table = sigtab, env = sigenv,
633: kind = SIGkind{bindings,stampcounts,sharing}}
634:
635: str = STRstr{stamp = k, sign = s, table = strtab, env = strenv,
636: kind=STRkind}
637: .M2
638: we first check whether \&\f5s=n\fP , and if so
639: return \&\f5str\fP, because \&\f5str\fP is already an instance of
640: \&\f5sig\fP. Otherwise we attempt to construct a new instance of
641: \&\f5sig\fP. We start by allocating a new pair of instance vectors,
642: \&\f5newenv={s=sNew,t=tNew}\fP, based on the size information in the
643: \&\f5stampcounts\fP field. Then we iterate through the list of all
644: of \&\f5sig\fP's bindings (\fIi.e.\fP specifications), which is
645: available in the field \&\f5bindings\fP. For each structure binding
646: \&\f5(id,INDstr i)\fP in \&\fIsig\fP, we look up a structure named
647: \&\fIid\fP in \&\f5strtab\fP. If it does not exist, matching fails.
648: If it does exist, we recursively match it against the substructure
649: signature bound to \&\f5id\fP in \&\f5sig\fP (obtained as the ith
650: element of the structure vector in \&\f5sigenv\fP), and if successful
651: use the result to define the ith element of \&\f5sNew\fP. Similarly
652: for type bindings, where we check that the type constructor bound in
653: \&\f5str\fP agrees with the specification in sig (\fIe.g.\fP they must have
654: the same arity). For value specifications like \&\f5x: ty\fP, we
655: interpret indirect type constructors in \&\f5ty\fP with respect to
656: \&\f5newenv\fP and check that we have a generic instance of the type
657: of the corresponding component of \&\f5str\fP. Checking value
658: components has no effect on the instance vectors, but if necessary
659: (\fIi.e.\fP if \&\f5sig\fP has fewer value components than
660: \&\f5str\fP) we calculate the translation between the old and new
661: runtime positions of the components and collect this information in a
662: thinning specification to be applied at runtime.
663: .PP
664: When we have successfully matched all the bindings in \&\f5sig\fP we build the
665: result structure
666: .M3 str\'
667: using
668: .M3 sig \'s
669: table and
670: .M3 sign ,
671: .M3 str \'s
672: stamp ,
673: and
674: .M3 newenv
675: .M1
676: str' = STRstr {stamp=n, sign=SOME k, table=sigtab,
677: env=newenv, kind=STRkind}
678: .M2
679: Finally any sharing constraints (from sharing) are checked as
680: described in Section 3.5, and we return \&\f5str'\fP and a thinning
681: specification as the result of the match. Note that
682: the bulk of the information in the signature is in
683: .M3 sigtab ,
684: and this is directly shared with the instantiation
685: .M3 str\' .
686: .PP
687: As a shortcut, when elaborating a declaration like
688: .M1
689: structure S: SIG = struct \fIdeclarations\fP end
690: .M2
691: we do not build the structure on the right-hand side before doing
692: the signature match. Instead we elaborate the body declarations in
693: the top-level environment and then do the signature matching using
694: the top-level environment in the place of the target structure.
695: .NH 2
696: Functors and functor application
697: .PP
698: In the static semantics a functor %F% is modeled by a pair of
699: structures representing the parameter and body of the functor, and
700: two sets of bound stamps.
701: .EQ I
702: F ~ = ~ ( N ) ( S sub p , ^ ( N ' ) S sub b )
703: .EN
704: %N% is the set of bound stamps in the parameter structure, which may
705: also occur in the body %S sub b%, while %N '%
706: is the set of stamps associated with generative elements of the
707: functor body. To apply %F% to an argument structure %A% we perform
708: the following steps
709: .EQ I
710: r ~ = ~ match ( ( N ) S sub p , ^ A )
711: .EN
712: .EQ I
713: g ~ = ~ generate ( N ' )
714: .EN
715: .EQ I
716: F ( A ) ~ = ~ g ( r ( S sub b ) )
717: .EN
718: That is, we match the parameter signature and the argument to produce
719: a realization map %r%, then we generate a realization map %g% that maps
720: each bound stamp in %N '% to a unique, new constant stamp, and finally
721: we produce the result structure by using %r% and %g% to instantiate
722: the body structure.
723: .PP
724: The implementation of functors follows this scheme closely. The
725: datatype used to represent functors is defined by
726: .M1
727: datatype Functor = FUNCTOR of
728: {param : Structure,
729: body : Structure,
730: tycCount : int}
731: .M2
732: The bound stamps in the \&\f5param\fP structure are numbered from 0 to
733: %n% and these may also occur in the \&\f5body\fP structure.
734: Generative stamps in the body are numbered from %n+1% to %n+m%, which
735: is the value of \&\f5tycCount\fP.
736: .PP
737: If the functor declaration provides an explicit result signature, as in
738: .M1
739: functor F(X : SIGP) : SIGR = struct ... end
740: .M2
741: the body will naturally be schematic (i.e. the parts with bound stamps will
742: be isolated in instance vectors) as a result of the signature
743: matching between the body and the result signature. However, if there
744: is no result signature, we explicitly abstract these ``volatile'' parts of
745: the body structure to get an instantiable scheme so that the body's
746: symbol table may be shared by all structures produced by the functor.
747: .PP
748: To apply the functor, signature matching is performed between the
749: parameter signature and the argument to build a realization map for
750: the bound stamps in the parameter. Then the body is instantiated
751: using this realization map and introducing new constant stamps to
752: replace generative bound stamps as required. The actual algorithm
753: is more complicated than this because functor application can occur
754: within the body of functor declaration, as in
755: .M1
756: functor F(X : SIGP) =
757: struct
758: structure A = G(X)
759: structure B = H(A)
760: ...
761: end
762: .M2
763: In cases like the applications of \&\f5G\fP and \&\f5H\fP in this
764: example, the actual parameter may contain parameter bound and even
765: generatively bound stamps, and the realization of the generative
766: stamps in the body of \&\f5G\fP and \&\f5H\fP will themselves be
767: generatively bound stamps.
768: .NH 2
769: Sharing
770: .PP
771: The purpose of sharing constraints is to insure a kind of compatibility
772: between several parameter structures of a functor, as illustrated in
773: Figure 2. The sharing constraints are expressed as sets of equations
774: between paths designating structures or types (there are two kinds of
775: sharing specifications: structure sharing and type sharing) and they
776: determine an equivalence relation amongst the components of the
777: signature. The strategy for incorporating sharing constraints in the
778: representation of a signature is to force all components of an
779: equivalence class to have the same stamp.
780: .PP
781: Two components may be required to share either because they are
782: directly equated in a sharing specification, or because they are
783: corresponding components of structures that are required to share.
784: Thus if
785: .M1
786: X : sig
787: structure C1 : SIGC
788: structure C2 : SIGC
789: sharing C1.CB = C2.CB
790: end
791: .M2
792: then \&\f5X.C1.CB = X.C2.CB\fP is directly specified, and \&\f5X.C1.CB.BA =
793: X.C2.CB.BA\fP is an inferred consequence. This simply says that the
794: complete sharing relation must be a congruence with respect to the
795: operation of selecting a named substructure or type.
796: .PP
797: Under what circumstances may two structures be constrained to share?
798: They must be \fIconsistent\fP, in the sense that it is possible to
799: find a structure that could simultaneously match both of them. In
800: particular, the structures that are forced to share do not necessarily
801: have to share the same signature, and the fact that they share does
802: not have any effect on their signature. The idea is that various
803: thinned versions of a given structure may have different signatures,
804: but they can still share because they are actually restricted views
805: of their common ancestor structure. This approach is supported by the
806: fact that in signature matching, the stamps of the matched structure
807: are inherited by the resultant structure. Note that this is not the
808: approach described in [6], where signatures of sharing
809: structures are forced to agree by formation of a kind of union
810: signature. We do not actually verify that signatures that are
811: specified to share are consistent, but if they are not, the signature
812: containing the sharing specification can never be successfully matched.
813: .PP
814: The processing of the sharing constraints is performed in two stages.
815: First, a union-find algorithm is used to determine all sharing
816: relations, direct and inferred, and to construct the equivalence
817: classes for the sharing relation. At this stage it is also possible
818: to detect certain pathological sharing specifications, such as trying
819: to identify a structure with one of its substructures. Second, the
820: signature is copied and each element of a given equivalence class is
821: given the same representative stamp.
822: .PP
823: There are two ways in which sharing information is used. (1) When a
824: signature with sharing constraints is used as a functor parameter, the
825: identification of stamps in the signature will automatically insure
826: that the sharing has the desired effect during type checking, \fIi.e.\fP
827: types that are specified to share will be seen to be identical by the
828: type checker. (2) During signature matching, any sharing relations
829: specified in the signature must also hold in the matched structure.
830: One way to check this would be to make sure that the realization map
831: was well defined, because a failure of sharing in the target structure
832: would cause a single bound stamp to be mapped to more than one target
833: stamp. However, the realization map is only explicitly constructed in
834: the matching of functor parameters, where it is needed to help
835: instantiate the functor body. Hence it is more convenient to simply
836: save the original sharing constraints as equations in the signature
837: and check them explicitly in the target structure as part of signature
838: matching.
839: .NH 2
840: Relation with type checking
841: .PP
842: What is the relationship between the structures and signatures and the
843: underlying ML type checking mechanism? Obviously signatures and structures
844: are carriers of type information \(em that is one of their principle purposes.
845: When we look up a value component of a structure we get the same sort of bindings
846: as in the top-level environment, except that in some circumstances the
847: type has been relativized to the structure's instance vectors and it
848: contains \&\f5INDtyc\fP type constructors. The basic variable lookup
849: functions have been defined to eliminate these indirections by replacing
850: them with the referenced type constructors from the instance vectors,
851: at the expense of partially copying the type. This would appear to undo
852: some of the savings achieved by the structure sharing representation of
853: structures, but these copies tend to be ephemeral, and they are
854: quickly and efficiently garbage-collected. The type information in
855: structures and signatures is, on the other hand, long-lived, so it is
856: more critical to minimize their space requirements.
857: .NH
858: Conclusions
859: .PP
860: The challenge of implementing the Standard ML module features is to
861: perform the compile-time matchings and instantiations necessary to
862: propagate and check type information with a minimum of duplication of
863: that information. Experience has shown that a naive approach leads to
864: an explosion in the size of the static representations.
865: .PP
866: The implementation strategy described in this paper uses a structure
867: sharing instantiation technique instead of instantiation by copying,
868: and has proved to be reasonably modest in its space requirements. It
869: also has the advantage that it remains quite close in spirit to the
870: formal static semantics.
871: .PP
872: Work on the Standard ML module facilities continues, and current
873: topics of interest include explicit functor signatures and the relation
874: of the module constructs to separate compilation.
875: .SH
876: Acknowledgements
877: .PP
878: This implementation is inspired directly by the work done on the
879: operational semantics of Standard ML by Robin Milner, Mads Tofte, and
880: Bob Harper, and I have benefited from many discussions with them.
881: Bob Harper passed on useful ideas from his prototype implementation
882: of modules. Nick Rothwell helped me explore the use of structure sharing
883: in type checking. Andrew Appel provided valuable advice throughout and
884: had a direct hand in the implementation of sharing specifications, and
885: he has been an equal partner in the creation of Standard ML of New Jersey.
886: .SH
887: References
888: .IP 1.
889: A. Appel and D. MacQueen, \fIA Standard ML compiler\fP, Proceedings of the
890: Conference on Functional Programming and Computer Architecture, Portland,
891: September 1987, G. Kahn, ed., LNCS Vol. 274, Springer-Verlag, 1987.
892: .IP 2.
893: H.-J. Boehm and A. Demers, \fIImplementing Russell\fP,
894: Proceedings of SIGPLAN 86 Symposium on Compiler Construction, Palo Alto,
895: 1986, 186-195.
896: .IP 3.
897: R. S. Boyer and J Moore, \fIThe sharing of structure in theorem-proving
898: programs\fP, Machine Intelligence 7, B. Meltzer and D. Michie, eds., Edinburgh
899: University Press, 1972, 101-116.
900: .IP 4.
901: L. Damas and R. Milner, \fIPrincipal type schemes for functional programs\fP,
902: Proceedings of 9th ACM Symposium on Principles of Programming Languages,
903: Albuquerque, 1982, 207-212.
904: .IP 5.
905: R. Harper, D. MacQueen, and R. Milner, \fIStandard ML\fP, Laboratory for
906: Foundations of Computer Science, Dept. of Computer Science, University of
907: Edinburgh, ECS-LFCS-86-2, 1986. (Also Polymorphism II, 2, October 1985.)
908: .IP 6.
909: R. Harper, R. Milner, and M. Tofte, \fIA type discipline for program modules\fP,
910: Proceedings TAPSOFT 87, LNCS Vol. 250, Springer-Verlag, New York, 1987, 308-319.
911: .IP 7.
912: R. Harper, R. Milner, and M. Tofte, \fIThe semantics of Standard ML,
913: Version I\fP, Laboratory for Foundations of Computer Science, Dept. of
914: Computer Science, University of Edinburgh, ECS-LFCS-87-36, 1986.
915: .IP 8.
916: D. MacQueen, \fIModules for Standard ML\fP, in [5]. (An earlier version
917: appeared in Proceedings ACM Symposium on Lisp and Functional Programming,
918: Austin, 1984.)
919: .IP 9.
920: D. MacQueen, \fIUsing dependent types to express modular structure\fP,
921: Proceedings 13th ACM Symposium on Principles of Programming Languages, St.
922: Petersburg Beach, 1986, 277-286.
923: .IP 10.
924: P. Martin-L\o'o"'f, \fIConstructive mathematics and computer programming\fP,
925: Sixth International Congress for Logic, Methodology, and Philosophy of Science,
926: North Holland, Amsterdam, 1982, 153-175.
927: .IP 11.
928: J. C. Mitchell and R. Harper, \fIThe essence of ML\fP, Proceedings 15th
929: ACM Symposium on Principles of Programming Languages, San Diego, 1988,
930: 28-46.
931: .IP 12.
932: M. Tofte, Operational Semantics and Polymorphic Type Inference, Ph.D.
933: Dissertation, Dept. of Computer Science, University of Edinburgh, 1987.
934: .IP 13.
935: D. H. D. Warren, \fIImplementing PROLOG - Compiling Predicate Logic Programs,
936: Vol. I\fP, Dept. of Artificial Intelligence Report No. 39, University of
937: Edinburgh, 1977.
This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.