Annotation of researchv10dc/cmd/sml/doc/papers/modimp/paper.ms, revision 1.1

1.1     ! root        1: .RP
        !             2: .ND March 11, 1988
        !             3: .TL
        !             4: An Implementation of Standard ML Modules
        !             5: .AU 
        !             6: David MacQueen
        !             7: .AI
        !             8: AT&T Bell Laboratories
        !             9: Murray Hill, NJ 07974
        !            10: .AB
        !            11: Standard ML includes a set of module constructs that support
        !            12: programming in the large.  These constructs extend ML's basic
        !            13: polymorphic type system by introducing the dependent types of Martin
        !            14: L\o'o"'f's Intuitionistic Type Theory.  This paper discusses the
        !            15: problems involved in implementing Standard ML's modules and describes
        !            16: a practical, efficient solution to these problems.  The
        !            17: representations and algorithms of this implementation were inspired by
        !            18: a detailed formal semantics of Standard ML developed by Milner, Tofte,
        !            19: and Harper.  The implementation is part of a new Standard ML compiler
        !            20: that is written in Standard ML using the module system.
        !            21: .AE
        !            22: .EQ
        !            23: delim %%
        !            24: .EN
        !            25: .de M1         \" start of a program display
        !            26: .DS
        !            27: .ft CW
        !            28: ..
        !            29: .de M2
        !            30: .ft R
        !            31: .DE
        !            32: ..
        !            33: .de M3
        !            34: .if "\\$2"" \\&\f(CW\\$1\f1
        !            35: .if !"\\$2"" \\&\f(CW\\$1\f1\\&\\$2
        !            36: ..
        !            37: .fp 5 CW
        !            38: .NH
        !            39: Introduction
        !            40: .PP
        !            41: An important part of the revision of ML that led to
        !            42: the Standard ML language was the inclusion of module
        !            43: facilities for the support of ``programming in the large.''
        !            44: The design of these facilities went through several
        !            45: versions [8] and was supported by concurrent investigations
        !            46: of the type theory of ML and related systems [9,11].
        !            47: The central idea behind the design was to support modularity
        !            48: by introducing a stratified system of dependent types as suggested
        !            49: by Martin L\o'o"'f's Intuitionistic Type Theory [10].
        !            50: In late 1985 Bob Harper added a prototype implementation of
        !            51: most of the module facilities to the Edinburgh ML compiler,
        !            52: which was serving as a test-bed for the evolving Standard ML design.
        !            53: .PP
        !            54: Starting in the spring of 1986, Andrew Appel, Trevor Jim, and I have
        !            55: implemented a new Standard ML compiler, written in Standard ML,
        !            56: and initially bootstrapped from the Edinburgh compiler.  An overview of
        !            57: this new compiler, known as Standard ML of New Jersey, is given in [1].
        !            58: The implementation of modules in this new compiler went through
        !            59: two generations.
        !            60: A first version was done in the fall of 1986, but it was completely
        !            61: rewritten in the summer of 1987 following discussions of the
        !            62: operational static semantics of modules with Robin Milner, Mads Tofte,
        !            63: and Bob Harper [6,7,12].
        !            64: Like Harper's prototype implementation, the new modules implementation
        !            65: was inspired by the static semantics, but it uses a \fIstructure sharing\fP
        !            66: strategy [3,13]
        !            67: to avoid serious performance problems associated with a naive
        !            68: implementation of the static semantics.
        !            69: Although precise comparative measurements are not yet available, our
        !            70: experience shows that the symbol table size for a large ML program
        !            71: such as the ML compiler is several times smaller with the
        !            72: new compiler than with the old compiler.
        !            73: .PP
        !            74: The objective of this paper is to describe our implementation of the
        !            75: Standard ML module facilities, with particular emphasis on the
        !            76: techniques used to minimize the space consumed by static representations
        !            77: of modules (\fIi.e.\fP symbol table structures).
        !            78: We begin by reviewing the elements of the module language.
        !            79: .NH
        !            80: Summary of the module constructs
        !            81: .PP
        !            82: Before describing the basic issues concerning implementation
        !            83: of Standard ML modules, we need to review the main elements
        !            84: of the module language.
        !            85: There are three principal notions:
        !            86: .in .3i
        !            87: .IP (1)
        !            88: \f2signature\f1 \- interface specification or ``type'' of modules.
        !            89: .IP (2)
        !            90: \f2structure\f1 \- an environment; a collection of type,
        !            91: structure, value, and exception bindings; corresponds to
        !            92: the conventional idea of a module.
        !            93: .IP (3)
        !            94: \f2functor\f1 \- function mapping structures to structures;
        !            95: a form of parametric module.
        !            96: .in  -0.3i
        !            97: .LP
        !            98: Figure 1 below contains simple examples of each of these constructs.
        !            99: .M1
        !           100: signature ORD =
        !           101:   sig
        !           102:     type t
        !           103:     val le : t*t -> bool
        !           104:   end
        !           105: 
        !           106: structure S =
        !           107:   struct
        !           108:     datatype t = A | B of t
        !           109:     val x = A
        !           110:     fun le(A,_) = true
        !           111:       | le(_,A) = false
        !           112:       | le(B x, B y) = le(x,y)
        !           113:   end
        !           114: 
        !           115: signature LEXORD =
        !           116:   sig
        !           117:     structure A : ORD
        !           118:     val lexord : A.t list * A.t list -> bool
        !           119:   end
        !           120: 
        !           121: functor LexOrd(O: ORD) : LEXORD =
        !           122:   struct
        !           123:     structure A = O
        !           124:     fun lexord([],_) = true
        !           125:       | lexord(_,[]) = false
        !           126:       | lexord(x::l,y::m) = ... O.le(x,y) ...
        !           127:   end
        !           128: 
        !           129: structure LS = LexOrd(S)
        !           130: .M2
        !           131: .tl ''\f2Figure 1\f1''
        !           132: .PP
        !           133: This example contains declarations of two signatures,
        !           134: .M3 ORD
        !           135: and
        !           136: .M3 LEXORD ,
        !           137: two structures,
        !           138: .M3 S
        !           139: and
        !           140: .M3 LS ,
        !           141: and one functor,
        !           142: \&\f5LexOrd\fP, mapping a structure of signature \&\f5ORD\fP
        !           143: to a new structure of signature \&\f5LEXORD\fP.
        !           144: The structure
        !           145: .M3 LS
        !           146: is defined as the result of applying
        !           147: .M3 LexOrd
        !           148: to
        !           149: .M3 S .
        !           150: We refer to components of a structure using qualified names or
        !           151: paths formed with the usual ``dot'' notation: \fIe.g.\fP \&\f5S.t\fP,
        !           152: \&\f5S.x\fP, \&\f5LS.A.le\fP. 
        !           153: .PP
        !           154: A signature can be regarded as a form of ``type''
        !           155: for structures, or as a schematic representation
        !           156: of a class of structures, and a
        !           157: structure \f2matches\f1 a signature if it satisfies the specifications
        !           158: given in the signature.  A structure does not have to agree exactly with
        !           159: a signature in order for it to match the signature;
        !           160: in this example the structure
        !           161: .M3 S
        !           162: matches the signature
        !           163: .M3 ORD ,
        !           164: even though
        !           165: .M3 S
        !           166: has an additional value component
        !           167: .M3 x
        !           168: not specified in
        !           169: .M3 ORD .
        !           170: In such cases signature matching has a coercive
        !           171: effect, producing a ``thinned'' structure that exactly
        !           172: agrees with the signature in terms of number of
        !           173: components and their types.
        !           174: .PP
        !           175: Signature matching is performed in two contexts:
        !           176: (1) when a signature constraint is given in a structure
        !           177: declaration, as in:
        !           178: .M1
        !           179: structure R : ORD = S
        !           180: .M2
        !           181: and (2) when a functor is applied to an argument structure,
        !           182: which must match the signature specified for the formal
        !           183: parameter, as in
        !           184: .M1
        !           185: structure LS = LexOrd(S)
        !           186: .M2
        !           187: where
        !           188: .M3 S
        !           189: must match
        !           190: .M3 ORD .
        !           191: Actually, these two contexts are closely related under Landin's
        !           192: principle of correspondence.  In the first case, \&\f5R\fP is bound
        !           193: to a thinned version of \&\f5S\fP that does not contain an \&\f5x\fP
        !           194: component, and in the second case, the formal parameter \&\f5O\fP, and hence
        !           195: the substructure \&\f5LS.A\fP, is also bound to a thinned version of \&\f5S\fP.
        !           196: .PP
        !           197: Another kind of specification that may appear in signatures
        !           198: is the \f2sharing constraint\f1, the purpose of which is to
        !           199: insure a kind of type coherence among functor parameters.
        !           200: The program sketch in Figure 2 illustrates the use of sharing constraints.
        !           201: .M1
        !           202: signature SYMBOL = sig type symbol ... end
        !           203: 
        !           204: signature LEX =
        !           205:   sig
        !           206:     structure Symbol : SYMBOL
        !           207:     val next : unit -> Symbol.symbol
        !           208:     ...
        !           209:   end
        !           210: 
        !           211: signature SYMBOLTABLE =
        !           212:   sig
        !           213:     structure Symbol : SYMBOL
        !           214:     type var
        !           215:     val bind : Symbol.symbol * var -> unit
        !           216:     ...
        !           217:   end
        !           218: 
        !           219: signature PARSE_ARGS =
        !           220:   sig
        !           221:     structure Lex : LEX
        !           222:     structure SymTab : SYMBOLTABLE
        !           223:     sharing Lex.Symbol = SymTab.Symbol
        !           224:   end
        !           225: 
        !           226: functor Parse(A: PARSE_ARGS) =
        !           227:   struct ... A.SymTab.bind(A.Lex.next(), v) ... end
        !           228: .M2
        !           229: .tl ''\f2Figure 2\f1''
        !           230: .PP
        !           231: The functor
        !           232: .M3 Parse
        !           233: essentially takes two structure arguments,
        !           234: .M3 Lex
        !           235: (implementing a lexical analyzer) and
        !           236: .M3 SymTab
        !           237: (implementing a symbol table), which are bundled as components
        !           238: of a single parameter structure.
        !           239: The sharing specification in
        !           240: .M3 PARSE_ARGS
        !           241: requires that the same
        !           242: .M3 Symbol
        !           243: structure be used in both
        !           244: .M3 Lex
        !           245: and
        !           246: .M3 SymTab .
        !           247: This insures that
        !           248: .M3 Lex
        !           249: and
        !           250: .M3 SymTab
        !           251: can consistently interact, as in the expression
        !           252: .M3 "A.Symtab.bind(A.Lex.next(),v)" ,
        !           253: which is well-typed only if
        !           254: .M3 A.Lex.Symbol.symbol
        !           255: and
        !           256: .M3 A.Symtab.Symbol.symbol
        !           257: are the same type.
        !           258: .PP
        !           259: An important point about datatype and structure declarations
        !           260: is that they are \f2generative\f1, meaning
        !           261: that each time they are elaborated (\fIe.g.\fP in a functor body as a result
        !           262: of functor applications) a new, distinct structure or type is created.
        !           263: For example, in
        !           264: .M1
        !           265: functor F () =
        !           266:   struct
        !           267:     datatype t = A | B of t
        !           268:   end
        !           269: 
        !           270: structure S1 = F()
        !           271: structure S2 = F()
        !           272: .M2
        !           273: .M3 S1
        !           274: and
        !           275: .M3 S2
        !           276: are distinct structures and 
        !           277: .M3 S1.t
        !           278: and
        !           279: .M3 S2.t
        !           280: are distinct types, so 
        !           281: .M3 S1.B(S2.A)
        !           282: is an ill-typed expression.
        !           283: .PP
        !           284: On the other hand, simple type definitions (whether occurring inside or
        !           285: outside of structures) are \f2transparent\f1 rather than generative.
        !           286: For instance, in 
        !           287: .M1
        !           288: structure IntOrd =
        !           289:   struct
        !           290:     type t = int
        !           291:     fun le(x,y) = x <= y
        !           292:   end
        !           293: .M2
        !           294: the type
        !           295: .M3 S.t
        !           296: is identical to \&\f5int\fP.
        !           297: In other words, there is no information hiding or abstraction
        !           298: inherent in the formation of structures.
        !           299: This applies even to the results of functor applications; type
        !           300: information is propagated through functor applications, so that after
        !           301: the declaration
        !           302: .M1
        !           303: structure IntLexOrd = LexOrd (IntOrd)
        !           304: .M2
        !           305: .M3 IntLexOrd.le
        !           306: has type
        !           307: .M3 "int list * int list -> bool" .
        !           308: This reflects the dependent product nature of functor signatures, and
        !           309: the fact that structures represent a form of \fIstrong\fP dependent sum
        !           310: (see [9,11] for discussion of the relation between ML modules and dependent
        !           311: types).
        !           312: .NH
        !           313: Implementation of modules
        !           314: .PP
        !           315: The principal tasks that an implementation must deal with are as follows:
        !           316: .IP (1)
        !           317: representation of signatures, structures, and functors.
        !           318: .IP (2)
        !           319: signature matching, including instantiation of the signature
        !           320: template and possible thinning of the matched structure.
        !           321: .IP (3)
        !           322: functor application, including 
        !           323: .RS
        !           324: .IP (a)
        !           325: matching formal signature to actual parameter, with possible thinning
        !           326: of the parameter.
        !           327: .IP (b)
        !           328: creation of the result structure, including propagation of type information from 
        !           329: parameter to result and generation of new instances of datatypes and
        !           330: structures.
        !           331: .RE
        !           332: .IP (4)
        !           333: representation and checking of sharing constraints.
        !           334: .PP
        !           335: Most of these tasks have two parts, the \f2static\f1 or compile-time
        !           336: task and the \f2dynamic\f1 or run-time task.
        !           337: The run-time problems are straightforward and are discussed in the
        !           338: next subsection.
        !           339: Our main focus will be on the static aspects of the module language,
        !           340: for which our principal implementation goals are:
        !           341: .IP (1)
        !           342: compact representation of structures having a given signature
        !           343: .IP (2)
        !           344: efficient signature matching and functor application, with minimal 
        !           345: duplication of static (\fIi.e.\fP symbol table) information
        !           346: .IP (3)
        !           347: efficient representation and checking of sharing constraints.
        !           348: .NH 2
        !           349: Dynamic representations and processes.
        !           350: .PP
        !           351: The run-time representations of modules are remarkably simple [1].
        !           352: Signatures and types have no run-time representation \(em they exist
        !           353: only at the static level.  A structure is represented as a record
        !           354: whose components represent the dynamic
        !           355: structure components (\fIi.e.\fP substructures, values, and exceptions) in a
        !           356: canonical order.  A functor is represented as an ordinary function
        !           357: closure, and functor application corresponds to the normal application
        !           358: of this function to a record representing the argument structure.  The
        !           359: thinning coercions associated with signature matching give rise to
        !           360: in-line code to construct the thinned record.
        !           361: .PP
        !           362: In the middle-end of the compiler, all module constructs are reduced
        !           363: to the same simple lambda-calculus based intermediate language that is
        !           364: used for the core ML constructs of value declarations and expressions.
        !           365: In effect, the back-end of the compiler is unaware of the existence of
        !           366: the module constructs \(em they have been reduced to common notions of
        !           367: records and functions.
        !           368: .NH 2
        !           369: Static Representations
        !           370: .PP
        !           371: A naive representation of signatures and structures can be modeled
        !           372: more or less directly on the semantic constructs used in the operational static
        !           373: semantics [5,6].
        !           374: There a structure is modeled by an environment %E%
        !           375: that maps component identifiers to the appropriate sort of
        !           376: static binding (type, structure, variable, etc.), and a 
        !           377: \fIstamp\fP,* %n%, that uniquely identifies the structure:
        !           378: %str^=^(n,E)%.
        !           379: .FS
        !           380: *We prefer the term ``stamp'' for this purpose, rather than the term
        !           381: ``name'' used in [6,7],
        !           382: since ``name'' could also refer to the identifier to which a structure
        !           383: is bound or an identifier bound within a structure.
        !           384: .FE
        !           385: We can view a structure as a tree or dag with nodes labeled by stamps
        !           386: and edges labeled by component names.
        !           387: A signature is then a structure together with a designated set of
        !           388: \fIbound\fP or \fIschematic\fP stamps occurring within the structure:
        !           389: %sig^=^(N)(n,E)%.
        !           390: .PP
        !           391: We illustrate this with the definitions in Figure 3
        !           392: and the corresponding graphs in Figure 4 (adapted from [6]), in which
        !           393: \fI(a)\fP represents the structure \&\f5C\fP and \fI(b)\fP represents
        !           394: the signature \&\f5SIGC\fP.
        !           395: Our convention for distinguishing between constant and bound stamps
        !           396: is that metavariables %k sub i%
        !           397: range over constant stamps, while metavariables %x sub i% range over 
        !           398: bound stamps in a signature.  This is a more concise alternative to the
        !           399: separate specification of the graph and the set of bound stamps
        !           400: We emphasize the distinction by using solid
        !           401: circles for nodes with constant stamps and open circles for nodes with 
        !           402: bound stamps.
        !           403: A structure will always contain only constant
        !           404: stamps, while a signature will typically contain only bound stamps.  The
        !           405: graphs are simplified by showing only structure components, but type
        !           406: components are dealt with similarly.
        !           407: 
        !           408: .DS
        !           409: .ft 5
        !           410: .ta 3i
        !           411: structure A =  signature SIGA =
        !           412:   struct         sig
        !           413:     type t = int           type t
        !           414:     fun f n = 2 * n        val f : t -> t
        !           415:   end    end
        !           416: 
        !           417: structure B =  signature SIGB  =
        !           418:   struct         sig
        !           419:     structure BA = A       structure BA: SIGA
        !           420:     fun g x = BA.f(x) + 1          val g: BA.t -> BA.t
        !           421:   end    end
        !           422: 
        !           423: structure C =  signature SIGC =
        !           424:   struct         sig
        !           425:     structure CA = A       structure CA : SIGA
        !           426:     structure CB = B       structure CB : SIGB
        !           427:     fun h x = CB.g(CA.f x)         val h: CA.t  -> CA.t
        !           428:   end    end
        !           429: .ft R
        !           430: .DE
        !           431: .tl ''\f2Figure 3\f1''
        !           432: .PS
        !           433: circlerad = 0.05i
        !           434: define str '
        !           435: {C1: circle
        !           436:  move to C1
        !           437:  move down right
        !           438:  C2: circle
        !           439:  move to C2
        !           440:  move down left
        !           441:  C3: circle
        !           442:  A1: arrow from C1.c to C2.c chop
        !           443:  A2: arrow from C2.c to C3.c chop
        !           444:  A3: arrow from C1.c to C3.c chop
        !           445:  "%k sub 1% " at C1.w rjust
        !           446:  " %k sub 3%" at C2.e ljust
        !           447:  "%k sub 2% " at C3.w rjust
        !           448:  " CB" at A1.center ljust
        !           449:  " BA" at A2.center ljust
        !           450:  "CA " at A3.center rjust
        !           451:  move to C1
        !           452:  move down 1.3i
        !           453:  "\f5structure C\fR" center
        !           454:  }
        !           455: '
        !           456: define sig '
        !           457: {C1: circle
        !           458:  move to C1
        !           459:  move down left
        !           460:  C2: circle
        !           461:  move to C1
        !           462:  move down right
        !           463:  C3: circle
        !           464:  move to C3
        !           465:  move down
        !           466:  C4: circle
        !           467:  A1: arrow from C1.c to C2.c chop
        !           468:  A2: arrow from C1.c to C3.c chop
        !           469:  A3: arrow from C3.c to C4.c chop
        !           470:  " %x sub 1%" at C1.e ljust
        !           471:  "%x sub 2% " at C2.w rjust
        !           472:  " %x sub 3%" at C3.e ljust
        !           473:  " %x sub 4%" at C4.e ljust
        !           474:  "CA " at A1.center rjust
        !           475:  " CB" at A2.center ljust
        !           476:  " BA" at A3.center ljust
        !           477:  move to C1
        !           478:  move down 1.3i
        !           479:  "\f5signature SIGC\fR" center
        !           480: }
        !           481: '
        !           482: H: str; move to H; move right 2.25i; sig
        !           483: .PE
        !           484: .tl ''\f2Figure 4\f1''
        !           485: 
        !           486: .PP
        !           487: The purpose of a signature matching \&\f5S: SIG\fP is to produce a
        !           488: structure \&\f5S'\fP that has exactly the form specified by
        !           489: \&\f5SIG\fP and yet shares the identity (\fIi.e.\fP the stamps) of
        !           490: \&\f5S\fP.  In some cases, \&\f5S\fP and \&\f5S'\fP are identical, as
        !           491: when \&\f5S\fP had already matched the signature \&\f5SIG\fP.  In
        !           492: other cases \&\f5S'\fP is a thinned version of \&\f5S\fP having fewer
        !           493: components or components whose types are generic instances of their
        !           494: types in \&\f5S\fP.  Another product of matching is the realization
        !           495: map, whose use in functor applications is explained in Section 3.5.
        !           496: .PP
        !           497: We can think of \&\f5SIG\fP as a scheme analogous to a generic type scheme or
        !           498: polytype in the core ML type system [4], with the bound stamps
        !           499: playing the role of generic type variables in a type scheme.  The
        !           500: product of matching is then an instance of this scheme under the
        !           501: substitution represented by the realization map.  The details of this
        !           502: analogy have been worked out by Mads Tofte, including a version of the
        !           503: principal typing theorem of [4].
        !           504: .PP
        !           505: A naive implementation of matching would make a copy of the signature
        !           506: \&\f5SIG\fP, in the process replacing each bound stamp by the corresponding
        !           507: constant stamp from \&\f5S\fP.  This would involve copying most of the
        !           508: environment part of the signature, since we have to instantiate the
        !           509: type specifications of values and exceptions as well as instantiating
        !           510: the types and substructures themselves.  However, the environment or
        !           511: symbol table part of the signature can be regarded as a template
        !           512: relative to its type and substructure components, which are the only
        !           513: parts that need to change during signature matching.  For instance, a
        !           514: type specification like \&\f5f: t->t\fP can remain fixed if it is interpreted
        !           515: relative to the type component \&\f5t\fP.  We can abstract out the type and
        !           516: structure components carrying the bound stamps and use the rest of the
        !           517: information in a signature as an unchanging template that can be
        !           518: shared by all instances of the signature.  This is the familiar
        !           519: \fIstructure sharing\fP idea first proposed by Boyer and Moore in the
        !           520: context of resolution theorem proving [3] and later exploited in the
        !           521: implementation of Prolog [13].  The use of structure sharing in
        !           522: the basic ML type system has been considered, but in that context it
        !           523: does not appear to have a clear advantage over the simpler approach of
        !           524: instantiation by copying.  In the case of signature matching, however,
        !           525: the shared information in the template is typically of considerable
        !           526: volume, so structure sharing is quite effective in saving space
        !           527: relative to copying.
        !           528: .PP
        !           529: The definitions of the basic datatypes used to represent type
        !           530: constructors, structures and signatures are given in Figure 5.  The
        !           531: representations of structures and signatures both use the \&\f5Structure\fP
        !           532: datatype, and differ only in the value of the \&\f5kind\fP field.  The \&\f5stamp\fP
        !           533: field contains the identifying stamp, the \&\f5table\fP is the environment
        !           534: component represented as a hash table mapping symbols to the various
        !           535: sorts of bindings, and the \&\f5env\fP field contains a pair of instance
        !           536: vectors for type and structure components.  The \&\f5sign\fP field in
        !           537: a signature to identify the signature (the \&\f5stamp\fP field
        !           538: will have a formal value representing a bound stamp); in
        !           539: a structure it identifies the signature the structure is an instantiation of,
        !           540: if any.  Bound and constant stamps are both represented as integers;
        !           541: stamps greater than some base value are constant stamps, while stamps
        !           542: less than that value are bound.  Within a given signature, bound
        !           543: stamps are canonically numbered starting from 0.
        !           544: .M1
        !           545: datatype tycon
        !           546:   = TYCON of {stamp : stamp, ...}
        !           547:   | INDtyc of int list
        !           548: 
        !           549: datatype Structure
        !           550:   = STRstr of
        !           551:       {stamp : stamp,
        !           552:        sign  : stamp option,
        !           553:        table : symtable,
        !           554:        env   : strenv,
        !           555:        kind  : strkind}
        !           556:   | INDstr of int
        !           557: 
        !           558: and strkind
        !           559:   = STRkind
        !           560:   | SIGkind of
        !           561:       {share : sharespec,
        !           562:        bindings : binding list,
        !           563:        stampcounts : {s : int, t : int}}
        !           564: .M2
        !           565: .tl ''\f2Figure 5\f1''
        !           566: 
        !           567: .PP
        !           568: The \&\f5INDtyc\fP and \&\f5INDstr\fP forms of type constructors and structures are
        !           569: used within the symbol table to refer indirectly to components stored
        !           570: in the instance vectors.  The term \&\f5INDtyc[%i%]\fR refers to the ith
        !           571: element of the type instance vector, while \&\f5INDtyc[%i%,%j%]\fR refers to the
        !           572: jth element of the type vector of the ith element of the structure
        !           573: vector.  The type specifications
        !           574: .M1
        !           575: f: t -> t
        !           576: h: CA.t -> CA.t
        !           577: .M2
        !           578: from Figure 3 are represented internally as
        !           579: .M1
        !           580: f: INDtyc[0] -> INDtyc[0]
        !           581: h: INDtyc[0,0] -> INDtyc[0,0]
        !           582: .M2
        !           583: The representation of the entire signature SIGC from Figure 3 can be
        !           584: summarized as follows:
        !           585: .M1
        !           586: SIGC:
        !           587:   stamp:  0
        !           588:   table:  CA => INDstr 0
        !           589:           CB => INDstr 1
        !           590:           h  => VAR: INDtyc[0,0] -> INDtyc[0,0]
        !           591:   strenv: structures = <SIGA',SIGB'>
        !           592:           types      = <>
        !           593: 
        !           594: SIGA':
        !           595:   stamp:  1
        !           596:   table:  t  => INDtyc[0]
        !           597:           f  => VAR: INDtyc[0] -> INDtyc[0]
        !           598:   strenv: structures = <>
        !           599:           types      = <DUMMY 0>
        !           600: 
        !           601: .M2
        !           602: .M1
        !           603: SIGB':
        !           604:   stamp:  2
        !           605:   table:  BA => INDstr 0
        !           606:           g  => VAR: BA.t -> BA.t
        !           607:   strenv: structures = <SIGA''>
        !           608:           types      = <>
        !           609: 
        !           610: SIGA'':
        !           611:   stamp:  3
        !           612:   table:  t  => INDtyc[0]
        !           613:           f  => VAR: INDtyc[0] -> INDtyc[0]
        !           614:   strenv: structures = <>
        !           615:           types      = <DUMMY 1>
        !           616: .M2
        !           617: .LP
        !           618: Note that there are two copies of the signature \&\f5SIGA\fP, identified
        !           619: as \&\f5SIGA'\fP and \&\f5SIGA''\fP, each with its own stamp.  This 
        !           620: duplication is required to get the canonical numbering of stamps for
        !           621: each component of \&\f5SIGC\fP, but each of these copies shares the
        !           622: original symbol table component from \&\f5SIGA\fP.  \&\f5DUMMY 0\fP
        !           623: and \&\f5DUMMY 1\fP are dummy type constructor components, which have
        !           624: their own separate numbering within the context of \&\f5SIGC\fP.
        !           625: .NH 2
        !           626: Signature Matching
        !           627: .PP
        !           628: We now describe the process of signature matching in terms of the
        !           629: representation described above.  Given a signature \&\f5sig\fP and
        !           630: structure \&\f5str\fP represented as
        !           631: .M1
        !           632: sig = STRstr{stamp = x, sign = n, table = sigtab, env = sigenv,
        !           633:              kind = SIGkind{bindings,stampcounts,sharing}}
        !           634: 
        !           635: str = STRstr{stamp = k, sign = s, table = strtab, env = strenv,
        !           636:              kind=STRkind}
        !           637: .M2
        !           638: we first check whether \&\f5s=n\fP , and if so
        !           639: return \&\f5str\fP, because \&\f5str\fP is already an instance of
        !           640: \&\f5sig\fP.  Otherwise we attempt to construct a new instance of
        !           641: \&\f5sig\fP.  We start by allocating a new pair of instance vectors,
        !           642: \&\f5newenv={s=sNew,t=tNew}\fP, based on the size information in the
        !           643: \&\f5stampcounts\fP field.  Then we iterate through the list of all
        !           644: of \&\f5sig\fP's bindings (\fIi.e.\fP specifications), which is
        !           645: available in the field \&\f5bindings\fP.  For each structure binding
        !           646: \&\f5(id,INDstr i)\fP in \&\fIsig\fP, we look up a structure named
        !           647: \&\fIid\fP in \&\f5strtab\fP.  If it does not exist, matching fails.
        !           648: If it does exist, we recursively match it against the substructure
        !           649: signature bound to \&\f5id\fP in \&\f5sig\fP (obtained as the ith
        !           650: element of the structure vector in \&\f5sigenv\fP), and if successful
        !           651: use the result to define the ith element of \&\f5sNew\fP.  Similarly
        !           652: for type bindings, where we check that the type constructor bound in
        !           653: \&\f5str\fP agrees with the specification in sig (\fIe.g.\fP they must have
        !           654: the same arity).  For value specifications like \&\f5x: ty\fP, we
        !           655: interpret indirect type constructors in \&\f5ty\fP with respect to
        !           656: \&\f5newenv\fP and check that we have a generic instance of the type
        !           657: of the corresponding component of \&\f5str\fP.  Checking value
        !           658: components has no effect on the instance vectors, but if necessary
        !           659: (\fIi.e.\fP if \&\f5sig\fP has fewer value components than
        !           660: \&\f5str\fP) we calculate the translation between the old and new
        !           661: runtime positions of the components and collect this information in a
        !           662: thinning specification to be applied at runtime.
        !           663: .PP
        !           664: When we have successfully matched all the bindings in \&\f5sig\fP we build the
        !           665: result structure
        !           666: .M3 str\'
        !           667: using
        !           668: .M3 sig \'s
        !           669: table and
        !           670: .M3 sign ,
        !           671: .M3 str \'s
        !           672: stamp ,
        !           673: and
        !           674: .M3 newenv
        !           675: .M1
        !           676: str' = STRstr {stamp=n, sign=SOME k, table=sigtab,
        !           677:                env=newenv, kind=STRkind}
        !           678: .M2
        !           679: Finally any sharing constraints (from sharing) are checked as
        !           680: described in Section 3.5, and we return \&\f5str'\fP and a thinning
        !           681: specification as the result of the match.  Note that
        !           682: the bulk of the information in the signature is in
        !           683: .M3 sigtab ,
        !           684: and this is directly shared with the instantiation
        !           685: .M3 str\' .
        !           686: .PP
        !           687: As a shortcut, when elaborating a declaration like
        !           688: .M1
        !           689: structure S: SIG = struct \fIdeclarations\fP end
        !           690: .M2
        !           691: we do not build the structure on the right-hand side before doing
        !           692: the signature match.  Instead we elaborate the body declarations in
        !           693: the top-level environment and then do the signature matching using
        !           694: the top-level environment in the place of the target structure.
        !           695: .NH 2
        !           696: Functors and functor application
        !           697: .PP
        !           698: In the static semantics a functor %F% is modeled by a pair of
        !           699: structures representing the parameter and body of the functor, and
        !           700: two sets of bound stamps.
        !           701: .EQ I
        !           702: F ~ = ~ ( N ) ( S sub p , ^ ( N ' ) S sub b )
        !           703: .EN
        !           704: %N% is the set of bound stamps in the parameter structure, which may
        !           705: also occur in the body %S sub b%, while %N '%
        !           706: is the set of stamps associated with generative elements of the
        !           707: functor body.  To apply %F% to an argument structure %A% we perform
        !           708: the following steps
        !           709: .EQ I
        !           710: r ~ = ~ match ( ( N ) S sub p , ^ A )
        !           711: .EN
        !           712: .EQ I
        !           713: g ~ = ~ generate ( N ' )
        !           714: .EN
        !           715: .EQ I
        !           716: F ( A ) ~ = ~ g ( r ( S sub b ) )
        !           717: .EN
        !           718: That is, we match the parameter signature and the argument to produce
        !           719: a realization map %r%, then we generate a realization map %g% that maps
        !           720: each bound stamp in %N '% to a unique, new constant stamp, and finally
        !           721: we produce the result structure by using %r% and %g% to instantiate
        !           722: the body structure.
        !           723: .PP
        !           724: The implementation of functors follows this scheme closely.  The
        !           725: datatype used to represent functors is defined by
        !           726: .M1
        !           727: datatype Functor = FUNCTOR of
        !           728:                      {param : Structure,
        !           729:                       body  : Structure,
        !           730:                       tycCount : int}
        !           731: .M2
        !           732: The bound stamps in the \&\f5param\fP structure are numbered from 0 to
        !           733: %n% and these may also occur in the \&\f5body\fP structure.
        !           734: Generative stamps in the body are numbered from %n+1% to %n+m%, which
        !           735: is the value of \&\f5tycCount\fP.
        !           736: .PP
        !           737: If the functor declaration provides an explicit result signature, as in
        !           738: .M1
        !           739: functor F(X : SIGP) : SIGR = struct ... end
        !           740: .M2
        !           741: the body will naturally be schematic (i.e. the parts with bound stamps will
        !           742: be isolated in instance vectors) as a result of the signature
        !           743: matching between the body and the result signature.  However, if there
        !           744: is no result signature, we explicitly abstract these ``volatile'' parts of
        !           745: the body structure to get an instantiable scheme so that the body's
        !           746: symbol table may be shared by all structures produced by the functor.
        !           747: .PP
        !           748: To apply the functor, signature matching is performed between the
        !           749: parameter signature and the argument to build a realization map for
        !           750: the bound stamps in the parameter.  Then the body is instantiated
        !           751: using this realization map and introducing new constant stamps to
        !           752: replace generative bound stamps as required.  The actual algorithm
        !           753: is more complicated than this because functor application can occur
        !           754: within the body of functor declaration, as in
        !           755: .M1
        !           756: functor F(X : SIGP) =
        !           757:   struct
        !           758:     structure A = G(X)
        !           759:     structure B = H(A)
        !           760:     ...
        !           761:   end
        !           762: .M2
        !           763: In cases like the applications of \&\f5G\fP and \&\f5H\fP in this
        !           764: example, the actual parameter may contain parameter bound and even
        !           765: generatively bound stamps, and the realization of the generative
        !           766: stamps in the body of \&\f5G\fP and \&\f5H\fP will themselves be
        !           767: generatively bound stamps.
        !           768: .NH 2
        !           769: Sharing
        !           770: .PP
        !           771: The purpose of sharing constraints is to insure a kind of compatibility
        !           772: between several parameter structures of a functor, as illustrated in
        !           773: Figure 2.  The sharing constraints are expressed as sets of equations
        !           774: between paths designating structures or types (there are two kinds of
        !           775: sharing specifications: structure sharing and type sharing) and they
        !           776: determine an equivalence relation amongst the components of the
        !           777: signature.  The strategy for incorporating sharing constraints in the
        !           778: representation of a signature is to force all components of an
        !           779: equivalence class to have the same stamp.
        !           780: .PP
        !           781: Two components may be required to share either because they are
        !           782: directly equated in a sharing specification, or because they are
        !           783: corresponding components of structures that are required to share.
        !           784: Thus if 
        !           785: .M1
        !           786: X : sig
        !           787:       structure C1 : SIGC
        !           788:       structure C2 : SIGC
        !           789:       sharing C1.CB = C2.CB
        !           790:     end
        !           791: .M2
        !           792: then \&\f5X.C1.CB = X.C2.CB\fP is directly specified, and \&\f5X.C1.CB.BA =
        !           793: X.C2.CB.BA\fP is an inferred consequence.  This simply says that the
        !           794: complete sharing relation must be a congruence with respect to the
        !           795: operation of selecting a named substructure or type.
        !           796: .PP
        !           797: Under what circumstances may two structures be constrained to share?
        !           798: They must be \fIconsistent\fP, in the sense that it is possible to
        !           799: find a structure that could simultaneously match both of them.  In
        !           800: particular, the structures that are forced to share do not necessarily
        !           801: have to share the same signature, and the fact that they share does
        !           802: not have any effect on their signature.  The idea is that various
        !           803: thinned versions of a given structure may have different signatures,
        !           804: but they can still share because they are actually restricted views
        !           805: of their common ancestor structure.  This approach is supported by the
        !           806: fact that in signature matching, the stamps of the matched structure
        !           807: are inherited by the resultant structure.  Note that this is not the
        !           808: approach described in [6], where signatures of sharing
        !           809: structures are forced to agree by formation of a kind of union
        !           810: signature.  We do not actually verify that signatures that are
        !           811: specified to share are consistent, but if they are not, the signature
        !           812: containing the sharing specification can never be successfully matched.
        !           813: .PP
        !           814: The processing of the sharing constraints is performed in two stages.
        !           815: First, a union-find algorithm is used to determine all sharing
        !           816: relations, direct and inferred, and to construct the equivalence
        !           817: classes for the sharing relation.  At this stage it is also possible
        !           818: to detect certain pathological sharing specifications, such as trying
        !           819: to identify a structure with one of its substructures.  Second, the
        !           820: signature is copied and each element of a given equivalence class is
        !           821: given the same representative stamp.
        !           822: .PP
        !           823: There are two ways in which sharing information is used.  (1) When a
        !           824: signature with sharing constraints is used as a functor parameter, the
        !           825: identification of stamps in the signature will automatically insure
        !           826: that the sharing has the desired effect during type checking, \fIi.e.\fP
        !           827: types that are specified to share will be seen to be identical by the
        !           828: type checker.  (2) During signature matching, any sharing relations
        !           829: specified in the signature must also hold in the matched structure.
        !           830: One way to check this would be to make sure that the realization map
        !           831: was well defined, because a failure of sharing in the target structure
        !           832: would cause a single bound stamp to be mapped to more than one target
        !           833: stamp.  However, the realization map is only explicitly constructed in
        !           834: the matching of functor parameters, where it is needed to help
        !           835: instantiate the functor body.  Hence it is more convenient to simply
        !           836: save the original sharing constraints as equations in the signature
        !           837: and check them explicitly in the target structure as part of signature
        !           838: matching.
        !           839: .NH 2
        !           840: Relation with type checking
        !           841: .PP
        !           842: What is the relationship between the structures and signatures and the
        !           843: underlying ML type checking mechanism?  Obviously signatures and structures
        !           844: are carriers of type information \(em that is one of their principle purposes.
        !           845: When we look up a value component of a structure we get the same sort of bindings
        !           846: as in the top-level environment, except that in some circumstances the
        !           847: type has been relativized to the structure's instance vectors and it
        !           848: contains \&\f5INDtyc\fP type constructors.  The basic variable lookup
        !           849: functions have been defined to eliminate these indirections by replacing
        !           850: them with the referenced type constructors from the instance vectors,
        !           851: at the expense of partially copying the type.  This would appear to undo
        !           852: some of the savings achieved by the structure sharing representation of
        !           853: structures, but these copies tend to be ephemeral, and they are
        !           854: quickly and efficiently garbage-collected.  The type information in
        !           855: structures and signatures is, on the other hand, long-lived, so it is
        !           856: more critical to minimize their space requirements.
        !           857: .NH
        !           858: Conclusions
        !           859: .PP
        !           860: The challenge of implementing the Standard ML module features is to
        !           861: perform the compile-time matchings and instantiations necessary to
        !           862: propagate and check type information with a minimum of duplication of
        !           863: that information.  Experience has shown that a naive approach leads to
        !           864: an explosion in the size of the static representations.
        !           865: .PP
        !           866: The implementation strategy described in this paper uses a structure
        !           867: sharing instantiation technique instead of instantiation by copying,
        !           868: and has proved to be reasonably modest in its space requirements.  It
        !           869: also has the advantage that it remains quite close in spirit to the
        !           870: formal static semantics.
        !           871: .PP
        !           872: Work on the Standard ML module facilities continues, and current
        !           873: topics of interest include explicit functor signatures and the relation
        !           874: of the module constructs to separate compilation.
        !           875: .SH
        !           876: Acknowledgements
        !           877: .PP
        !           878: This implementation is inspired directly by the work done on the
        !           879: operational semantics of Standard ML by Robin Milner, Mads Tofte, and
        !           880: Bob Harper, and I have benefited from many discussions with them.
        !           881: Bob Harper passed on useful ideas from his prototype implementation
        !           882: of modules.  Nick Rothwell helped me explore the use of structure sharing
        !           883: in type checking.  Andrew Appel provided valuable advice throughout and
        !           884: had a direct hand in the implementation of sharing specifications, and
        !           885: he has been an equal partner in the creation of Standard ML of New Jersey.
        !           886: .SH
        !           887: References
        !           888: .IP 1.
        !           889: A. Appel and D. MacQueen, \fIA Standard ML compiler\fP, Proceedings of the
        !           890: Conference on Functional Programming and Computer Architecture, Portland,
        !           891: September 1987, G. Kahn, ed., LNCS Vol. 274, Springer-Verlag, 1987.
        !           892: .IP 2.
        !           893: H.-J. Boehm and A. Demers, \fIImplementing Russell\fP, 
        !           894: Proceedings of SIGPLAN 86 Symposium on Compiler Construction, Palo Alto,
        !           895: 1986, 186-195.
        !           896: .IP 3.
        !           897: R. S. Boyer and J Moore, \fIThe sharing of structure in theorem-proving
        !           898: programs\fP, Machine Intelligence 7, B. Meltzer and D. Michie, eds., Edinburgh
        !           899: University Press, 1972, 101-116.
        !           900: .IP 4.
        !           901: L. Damas and R. Milner, \fIPrincipal type schemes for functional programs\fP,
        !           902: Proceedings of 9th ACM Symposium on Principles of Programming Languages,
        !           903: Albuquerque, 1982, 207-212.
        !           904: .IP 5.
        !           905: R. Harper, D. MacQueen, and R. Milner, \fIStandard ML\fP, Laboratory for
        !           906: Foundations of Computer Science, Dept. of Computer Science, University of
        !           907: Edinburgh, ECS-LFCS-86-2, 1986. (Also Polymorphism II, 2, October 1985.)
        !           908: .IP 6.
        !           909: R. Harper, R. Milner, and M. Tofte, \fIA type discipline for program modules\fP,
        !           910: Proceedings TAPSOFT 87, LNCS Vol. 250, Springer-Verlag, New York, 1987, 308-319.
        !           911: .IP 7.
        !           912: R. Harper, R. Milner, and M. Tofte, \fIThe semantics of Standard ML,
        !           913: Version I\fP, Laboratory for Foundations of Computer Science, Dept. of
        !           914: Computer Science, University of Edinburgh, ECS-LFCS-87-36, 1986.
        !           915: .IP 8.
        !           916: D. MacQueen, \fIModules for Standard ML\fP, in [5].  (An earlier version
        !           917: appeared in Proceedings ACM Symposium on Lisp and Functional Programming,
        !           918: Austin, 1984.)
        !           919: .IP 9.
        !           920: D. MacQueen, \fIUsing dependent types to express modular structure\fP,
        !           921: Proceedings 13th ACM Symposium on Principles of Programming Languages, St.
        !           922: Petersburg Beach, 1986, 277-286.
        !           923: .IP 10.
        !           924: P. Martin-L\o'o"'f, \fIConstructive mathematics and computer programming\fP,
        !           925: Sixth International Congress for Logic, Methodology, and Philosophy of Science,
        !           926: North Holland, Amsterdam, 1982, 153-175.
        !           927: .IP 11.
        !           928: J. C. Mitchell and R. Harper, \fIThe essence of ML\fP, Proceedings 15th
        !           929: ACM Symposium on Principles of Programming Languages, San Diego, 1988,
        !           930: 28-46.
        !           931: .IP 12.
        !           932: M. Tofte, Operational Semantics and Polymorphic Type Inference, Ph.D.
        !           933: Dissertation, Dept. of Computer Science, University of Edinburgh, 1987.
        !           934: .IP 13.
        !           935: D. H. D. Warren, \fIImplementing PROLOG - Compiling Predicate Logic Programs,
        !           936: Vol. I\fP, Dept. of Artificial Intelligence Report No. 39, University of
        !           937: Edinburgh, 1977.

unix.superglobalmegacorp.com

This archive runs on limited infrastructure. Preserving old code on modern bandwidth. Automated agents are requested to crawl responsibly.